Wave propagation is widely used to solve the inverse scattering problem[1] (e.g., retrieving the shape of arbitrary objects from scattering data) in a variety of real-life engineering applications, including remote sensing,[2] medical imaging,[3] tomography,[4] underwater robotics,[5] geophysical prospecting,[6] and defect detection in the oil and gas industry.[7] In the past decade, underwater activities such as biological research, target recognition, exploration of seabed resources, and monitoring of the underwater environment have increased the demand for subsea inspections. Compared to other waves, such as electromagnetic waves and visible light, sound waves have a distinct advantage for long-range target detection in water because of their superior transmission characteristics. Several sonar devices have been developed to detect submarine objects using sound waves.[8] Sonar uses transmitter and sensor elements to send and receive acoustic waves. The waves are reflected by objects on the seabed and detected by the sensors; these measurements over time can be used to reconstruct sonar images with the delay-and-sum method to acquire underwater information. Despite its success in detecting certain underwater objects, this method does not adapt well to complex and constantly changing environments. Consequently, the recognition of underwater targets still depends primarily on the judgment of trained sonar operators,[9] which can be highly imprecise given the need for continuous manual monitoring. As a result, an automatic and reliable recognition approach is desired to take over these human tasks. In addition, scattering information is complex-valued radiated data, whose intensity and phase may not always be accurately measured. In particular, it is difficult to retrieve the phase information from the scattered field data, so only the intensity information is generally available.
Several analytical and numerical methods have been proposed to reconstruct the scatterer's shape from the scattering data, such as regularization,[10] factorization,[11] the linear sampling method,[12] and many others.[13–15] Yet, these computation-driven, brute-force optimization methods are time-consuming and require tedious trial-and-error efforts to achieve the desired performance.
Recent advances in machine learning (ML) have made deep learning approaches an efficient way to solve forward and inverse scattering problems.[16,17] Deep learning networks can approximate the true solution for on-demand inverse designs, owing to their ability to learn nonlinear mappings in datasets. Several studies focused on discriminative and generative design methods have shown excellent performance beyond human capability.[18–20] Meng et al. presented a linear sampling method (LSM) with neural networks to reconstruct the shape of obstacles from acoustic far-field data.[21] The LSM relies on selecting a contour line to obtain the shape information of an object; this solution is still somewhat imprecise and requires further improvement. Fan et al. used convolutional neural networks (CNNs) to determine the forward scattering properties of 2D convex prism geometries but did not address the inverse problem.[22] A neural-network-based compressive sampling matching pursuit (CoSaMP) method has been developed to reconstruct the shape of arbitrary objects, but it only works for scatterers with low contrast relative to the surrounding environment.[23] In addition, probability-density-based deep learning approaches have been developed to solve the inverse problem for acoustic metasurfaces[24] and acoustic cloaks.[25,26] In both forward and inverse designs, the proposed solutions work well for simple geometries, but their performance decreases as the number of degrees of freedom in the design space increases, making scaling to complex models difficult. In this scenario, generative models are used to reduce the dimensionality of the design space and efficiently learn the relations between design parameters and system responses.[19,27] For an arbitrary configuration, the design space is significantly enlarged, and the inverse design process becomes extremely challenging due to the one-to-many structure–property mapping, which allows diverse predictions. Zhang et al.
presented an inverse design method for random metasurfaces, but the method is unable to entirely eliminate nonunique space because patterns and electromagnetic responses do not have one-to-one correspondence.[28] Lai et al. studied the inverse scattering problem that determines the configuration of 2D rigid cylinders for given total scattering cross sections using Wasserstein generative adversarial networks but had limited success in reducing the nonunique predictions.[29] In some situations, these disparate predictions may make the implementation easier; however, nonunique solutions drastically decrease the performance of the shape recognition algorithm in the far-field scattering problem. Metaneural networks have also been proposed for real-time recognition of complex objects such as handwritten digits and misaligned orbital angular momentum (OAM) beams from acoustic scattering.[30] Different configurations of deep neural networks have been designed and tested to predict the far-field scattering from 2D and 3D arbitrary objects,[31,32] but the inverse scattering problem remains to be investigated.
This study aims to solve the inverse scattering problem with deep learning that “uniquely” determines the shape of a 2D object solely from its phaseless far-field information. To address the intrinsic one-to-many mapping problem,[33] we feed multifrequency far-field data into the training process, thereby eliminating the nonunique solution space in the final predictions. The inverse design procedure is as follows. First, we encode the structural properties of the arbitrarily shaped object in a lower-dimensional latent vector, z, using the adversarial autoencoder, whose decoder part acts as a generator, G, in the inverse design strategy (see Figure 1b). The designed adversarial autoencoder imposes a Gaussian distribution on the latent space of each randomly generated geometry and expedites the learning process from the shape of the object to a given far-field profile. Second, we employ a forward neural network (FNN) that acts as a physics predictor to evaluate the inverse design process (see Figure 1c). In the final step, we probabilistically train an inverse neural network (INN) followed by the pretrained generator and forward simulator. After all networks are trained, the far-field patterns are fed as input to determine the latent space, which the generator then decodes into the corresponding geometry (see Figure 1c). This study presents a powerful method for modeling inverse far-field scattering that allows fast and accurate prediction of the random shapes of objects from scattering data for various applications.
Figure 1. Machine learning-assisted design process for the acoustic far-field scattering problem. a) Forward and inverse mapping between the shape of the object and the multifrequency phaseless far-field amplitudes. b) Architecture of the Adversarial Autoencoder (AAE) used to learn the latent space of the arbitrary object with a predefined model distribution. The typical autoencoder reconstructs the image, x, from the latent space, z. A second, discriminative network is trained to predict whether a sample originates from the hidden code of the autoencoder or from a user-specified distribution. c) Inverse design approach in which the pretrained Generator (G) and forward neural network (FNN) are cascaded behind the inverse neural network (INN). The generator G extracted from the AAE and the trained FNN remain fixed, and the INN is then trained stochastically to learn the mapping between the far-field patterns and the latent distribution of the object. After training, the latent space, z, is the design layer, which is passed into G to predict the shape of the object for the given far-field information. In our study, we consider far-field amplitudes at five different frequencies in the low-frequency regime to uniquely identify the object.
We consider a 2D arbitrary-shaped scatterer made of steel immersed in water. Our goal is to build a deep learning model that can instantaneously provide information on the shape of the scatterer when it is illuminated by an acoustic plane wave. The elastic scatterer in the water environment, with dimensions of around 40 cm, is discretized into a pixel-based binary image in which we assign the pixel value 0 (water) or 1 (steel). The neural network is fed a binary image of the scatterer's shape as the input. The output of the network represents the directivity of the radiation pattern over the full angular range from 0° to 360°. A neural network can be designed to learn the isomorphic relation between the input and the output of the considered acoustic system, as schematically described in Figure 1a.
To train the network, we need to prepare training samples. We randomly generate 20 000 geometries to ensure diversity in the training dataset. The far-field radiation properties of these random geometries are simulated with COMSOL Multiphysics[34] at five different frequencies: 1, 1.5, 2, 2.5, and 3 kHz. The far-field dataset contains 87 discrete points equally distributed over the full angular range (0° to 360°). Each training sample consists of a pair of model input and the expected corresponding output, that is, the 2D geometry and the far-field radiation pattern. The left panel in Figure 1a illustrates an example of a structure, with the red area representing steel, and the corresponding radiation characteristics. The matrix of the structure is 64 × 64. The full dataset contains 16 000 (80%) training samples, 2000 (10%) validation samples, and the remaining 2000 (10%) testing samples. The validation dataset monitors overfitting during training and helps to tune the hyperparameters of the network. The testing set evaluates the performance of the trained network.
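The dataset preparation above can be sketched as follows. This is a minimal NumPy illustration with random stand-in arrays (not the actual COMSOL simulation data); only the shapes (64 × 64 binary geometries, 5 × 87 = 435 far-field points) and the 80/10/10 split follow the text.

```python
import numpy as np

# Random stand-ins with the dimensions described in the text.
rng = np.random.default_rng(0)
n_samples = 20_000
geometries = rng.integers(0, 2, size=(n_samples, 64, 64))  # 0 = water, 1 = steel
far_fields = rng.random((n_samples, 5 * 87))               # amplitudes at 5 frequencies x 87 angles

# Shuffle and split into 80% train, 10% validation, 10% test.
idx = rng.permutation(n_samples)
train_idx, val_idx, test_idx = np.split(idx, [16_000, 18_000])
```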
The proposed deep learning model involves the training of an adversarial autoencoder (AAE), a forward neural network (FNN), and an inverse neural network (INN). The AAE deals with the dimensionality reduction problem and learns the latent distribution of the arbitrary-shaped objects to generate geometric patterns. The FNN deals with the regression problem between the 2D object, with a matrix of dimension 64 × 64 (equivalent to a 1 × 4096-dimensional vector), and the multifrequency far-field scattering amplitudes with dimension 1 × 435. In fact, we found that the information contained in the total scattering cross section is not enough to unequivocally determine the shape of the object due to the non-one-to-one correspondence. Therefore, we included the angular-dependent scattering amplitudes (i.e., the radiation patterns) at multiple frequency points. The INN relies on the generator of the AAE and the FNN to predict the shape of an arbitrary structure of dimension 64 × 64 for the given multifrequency far-field amplitudes of dimension 1 × 435. We discuss the training of these three networks in the following sections.
Adversarial Autoencoder (AAE)

We use an AAE[35] to learn the distribution of random geometries, whose training is based on traditional reconstruction and an adversarial regularization. The architecture of the AAE is composed of three fully connected coupled neural networks: encoder, generator, and discriminator, as shown in Figure 1b. In the AAE, the encoder transforms a given input geometry into a compressed continuous design space (a latent space); the generator reconstructs the real-space geometry from a given latent space; and the discriminator forces the latent space to follow a specified prior distribution in an adversarial manner.
The training process of the AAE includes both the adversarial learning of generative adversarial networks (GANs)[36] and the latent distribution learning of variational autoencoders (VAEs).[37] In this sense, the AAE architecture incorporates the properties of VAEs and GANs. Compared to other deep generative networks, such as VAEs and GANs, AAE networks possess several advantages. The AAE is more flexible than VAEs because it offers the freedom to control the latent space distribution without imposing any restrictions. Unlike vanilla GANs that skip representation learning in a latent space, the AAE produces a dense (continuous) latent space that can be used to generate diverse designs. In addition, AAE networks are easier to train than GANs because the target of adversarial learning is the representation in the latent space rather than the generated output. The AAE training procedure involves updating the autoencoder (encoder + generator) and the discriminator network by optimizing the loss functions of each network. The weights of the discriminator network are updated to minimize a binary cross-entropy (BCE) loss so that it can distinguish whether a sample comes from the encoder's latent space or from the sampled prior distribution. Specifically, the discriminator minimizes $\mathcal{L}_D = -\left[\log D(z') + \log\left(1 - D(z)\right)\right]$, where $D(z')$ is the prediction made by the discriminator for a sample $z'$ from the prior distribution (expected to be 1), and $D(z)$ is the prediction made by the discriminator for an encoded input $z$ in the latent space (expected to be 0). Alternately, the autoencoder weights are updated to minimize $\mathcal{L}_{AE} = -\log D(z) + \mathcal{L}_{rec}$. The first term makes the autoencoder produce encoding vectors similar to samples from the prior distribution (fooling the discriminator into predictions close to 1). The second term, $\mathcal{L}_{rec}$, measures how well the autoencoder can reconstruct the arbitrary input object from the latent space.
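The two adversarial losses described above can be sketched numerically. This is a minimal NumPy illustration of the standard BCE-style objectives (the function names are ours, not from the paper); the reconstruction term is passed in as a precomputed scalar.

```python
import numpy as np

def discriminator_loss(d_prior, d_encoded, eps=1e-7):
    # BCE: push predictions for prior samples toward 1
    # and predictions for encoded latent vectors toward 0.
    return -np.mean(np.log(np.clip(d_prior, eps, 1.0))
                    + np.log(np.clip(1.0 - d_encoded, eps, 1.0)))

def autoencoder_loss(d_encoded, reconstruction_error, eps=1e-7):
    # First term fools the discriminator (push d_encoded toward 1);
    # second term is the autoencoder's reconstruction error.
    return -np.mean(np.log(np.clip(d_encoded, eps, 1.0))) + reconstruction_error
```

At the adversarial equilibrium, the discriminator outputs 0.5 everywhere, giving a discriminator loss of 2 log 2 ≈ 1.39.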
Concretely, the reconstruction error is defined based on a binary cross-entropy loss expressed as
$$\mathcal{L}_{rec} = -\frac{1}{N}\sum_{i=1}^{N}\left[x_i \log \hat{x}_i + (1 - x_i)\log\left(1 - \hat{x}_i\right)\right]$$
where N is the size of the input (also the size of the output) and $x_i$ is the ground truth of the ith pixel in the input; $x_i = 0$ represents an ith pixel belonging to the water background, $x_i = 1$ represents one belonging to the steel object, and $\hat{x}_i \in (0,1)$ is the predicted value of the ith pixel in the reconstruction output. When $\hat{x}_i$ approaches 1, the ith pixel has a greater likelihood of belonging to the steel object.
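The pixel-wise BCE reconstruction error above can be written as a short NumPy function (a sketch with a hypothetical name; the clipping guard is our addition to avoid log(0)):

```python
import numpy as np

def bce_reconstruction(x, x_hat, eps=1e-7):
    # Mean binary cross-entropy between the ground-truth binary image x
    # (0 = water, 1 = steel) and the predicted pixel probabilities x_hat.
    x_hat = np.clip(x_hat, eps, 1.0 - eps)  # guard against log(0)
    return -np.mean(x * np.log(x_hat) + (1.0 - x) * np.log(1.0 - x_hat))
```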
When the AAE network is trained, the generator is able to generate designs in real space based on the latent vector. The trained generator is extracted from the AAE and then used in the inverse design process. The training and prediction results for the AAE network are shown in Figure 2. Figure 2a,b shows the losses of the generator and the discriminator network during the training process, respectively. The loss function of the generator converges smoothly as the number of epochs increases; however, the loss of the discriminator fluctuates due to the adversarial learning. Ideally, the discriminator prediction should converge toward 0.5, since at equilibrium it can no longer distinguish between real and generated latent samples. To evaluate the performance of the trained AAE, we apply it to the testing samples and show the BCE error and the structure similarity index measure (SSIM) of the testing samples in Figure 2c,d, respectively. The details of the SSIM are provided in the Supporting Information.[38] The mean values of the BCE and SSIM are found to be 0.02 and 0.94, respectively. Two representative examples of reconstructed geometries are shown in Figure 2e,f. The results clearly indicate the excellent prediction performance of the trained network.
Figure 2. Designed AAE to learn the latent space of arbitrary binary structures. a) The binary cross-entropy loss function of the generator over training epochs. b) The binary cross-entropy loss function of the discriminator over training epochs. c) The binary cross-entropy error of testing samples. d) The structure similarity index measure (SSIM) on the test samples where the red dashed line shows the average error. e–f) Representative examples of original and reconstructed shapes of arbitrary objects from trained AAE network.
Forward modeling consists of predicting the response of a given physical system. Unlike traditional approaches, data-driven approaches predict the acoustic response of a given structure without explicitly solving the acoustic wave equations. More specifically, we design a deep neural network that solves the regression problem deterministically by determining the far-field radiation patterns for a given arbitrary structure. The network learns the complex relation between arbitrary 2D binary structures and their associated multifrequency far-field radiation patterns. These patterns are invariant under translation of the object, so each binary structure is flattened before being fed to the fully connected network as an input.
For the training process, the mean squared error (MSE) is used as the loss function, which computes the average squared difference between the actual and the estimated values, that is, $\mathcal{L}_{\mathrm{MSE}} = \frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2$, where N is the size of the output spectrum and $y_i$ and $\hat{y}_i$ are the ground-truth and the predicted far-field pattern, respectively. The architecture of the FNN contains 4096 − 1000 − 1000 − 800 − 800 − 800 − 800 − 600 − 600 − 600 − 435 nodes, and the details of the training process and network hyperparameters are provided in the Supporting Information.[38] The learning behavior as a function of training epochs is shown in Figure 3a. To verify the validity and accuracy of the designed network, we use the test dataset to calculate the relative absolute error $e = \sum_i |y_i - \hat{y}_i| \,/\, \sum_i |y_i|$, where $y_i$ and $\hat{y}_i$ correspond to the target and the predicted far-field response, respectively. Figure 3b illustrates the distribution of this error, and the corresponding mean relative error of the test-set predictions is below 0.04, as indicated by the dashed vertical red line. A typical example shows a good match between the COMSOL Multiphysics simulation and the FNN prediction results (see Figure 3c). Accurate training of the FNN is crucial because the following inverse design approach uses the trained FNN as an essential component for accurate and instant prediction of far-field patterns for a given object.
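The two error metrics used for the FNN can be sketched in NumPy. The MSE follows the text directly; the normalization in the relative absolute error is one plausible form (total absolute deviation over total magnitude), since the paper's exact expression is not recoverable here.

```python
import numpy as np

def mse_loss(y, y_hat):
    # Average squared difference over the far-field spectrum.
    return np.mean((y - y_hat) ** 2)

def relative_abs_error(y, y_hat):
    # Assumed normalization: sum of absolute deviations over sum of magnitudes.
    return np.sum(np.abs(y - y_hat)) / np.sum(np.abs(y))
```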
Figure 3. Designed FNN for arbitrary-shaped scatterer to multifrequency far-field mapping. a) Evolution of the MSE loss function over training epochs. b) The distribution of absolute prediction error on the test dataset where the red dashed line shows the average error. c) Representative examples of predicted multifrequency far-field amplitudes for the trained networks: i) an arbitrary binary structure and ii–vi) far-field profiles at different frequencies where solid blue and dashed red curves indicate the target and the predicted results.
The next step extends deep learning to solve the inverse design problem, which is essentially the reverse of forward modeling. While forward modeling deals with a one-to-one mapping between a physical system and the resulting response, the inverse design faces difficulties induced by nonunique solution spaces. Therefore, a single discriminative network is not able to learn the complex relation in the inverse design. To circumvent this problem, auxiliary training approaches, generative models, and optimization strategies are combined in the inverse design process.[39–44] The most common method to overcome the nonuniqueness issue exploits a tandem network that incorporates the forward modeling network into the inverse design DNN architecture. In conventional tandem networks, direct learning of the pixel-binary structure is not trivial due to the high number of degrees of freedom in the parameter design. In addition, conventional tandem networks do not guarantee the complete elimination of the nonunique solution space. Here, we propose a new generative tandem network that learns the mapping between the far-field patterns and the latent distribution of a binary structure by incorporating an additional pretrained network, that is, the shape generator extracted from the trained AAE, into our inverse design strategy. The INN translates the input far-field patterns into the latent representation z, sampled from the Gaussian prior distribution parametrized by mean μ and standard deviation σ, to approximate the latent space of an arbitrary geometry, as illustrated in Figure 1c.
The loss function for the INN is constructed from an absolute loss $\mathcal{L}_{rec}$ for the reconstruction of far-field patterns and the Kullback–Leibler (KL) divergence $D_{KL}$ between the latent space distribution and the prior Gaussian distribution $\mathcal{N}(0, I)$. To be specific, the training of the INN minimizes the loss function
$$\mathcal{L}_{\mathrm{INN}} = \frac{1}{N}\sum_{n=1}^{N}\left(\mathcal{L}_{rec}^{(n)} + \alpha\, D_{KL}^{(n)}\right)$$
where N is the total number of training samples, n indexes the nth data sample, and α is the relative weight between deterministic and generative learning. $\mathcal{L}_{rec}$ is the reconstruction loss computed as $\mathcal{L}_{rec} = \frac{1}{K}\sum_{k=1}^{K} |y_k - \hat{y}_k|$, where K is the length of the input far-field pattern, and $y_k$ and $\hat{y}_k$ are the ground-truth and the predicted far-field pattern from the generated object, respectively. The KL divergence for the Gaussian latent distribution is expressed as
$$D_{KL} = \frac{1}{2}\sum_{j=1}^{J}\left(\sigma_j^2 + \mu_j^2 - 1 - \log \sigma_j^2\right)$$
where J is the length of the latent vector and $\sigma_j$ and $\mu_j$ are the standard deviation and mean of the distribution, respectively. The details about the stochastic tuning of the model are provided in the Experimental Section.
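The combined INN objective, an absolute reconstruction term plus a KL regularizer against the standard normal prior, can be sketched in NumPy. The function names are ours, and the KL term uses the usual closed form for a diagonal Gaussian parameterized by its log-variance.

```python
import numpy as np

def kl_gaussian(mu, log_var):
    # Closed-form KL divergence between N(mu, sigma^2) and the standard
    # normal prior, summed over the latent dimensions.
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

def inn_loss(y, y_hat, mu, log_var, alpha):
    # Absolute reconstruction loss on the far-field pattern plus the
    # KL regularizer weighted by alpha.
    recon = np.mean(np.abs(y - y_hat))
    return recon + alpha * kl_gaussian(mu, log_var)
```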
The INN architecture is composed of eight layers with 435 − 800 − 800 − 500 − 500 − 500 − 400 − 100 nodes. The intermediate latent space layer acts as a design layer, which is then passed into the generator and the forward modeling part to calculate the corresponding far-field pattern. The learning curve is shown in Figure 4a. Its rapidly decreasing behavior demonstrates that training is highly effective. To evaluate the performance of the INN, we calculate the prediction errors on the inversely generated far-field patterns (relative absolute error) and on the predicted binary structures (binary cross-entropy and structure similarity index measure). The mean relative error of the inverse-designed far-field patterns is around 0.03 (see Figure 4b), while the average BCE error and SSIM are 0.077 and 0.85, respectively (see Figure 4c,d). In the inverse process, the learning behavior of the network depends on the intrinsically degenerate solution spaces. When the degenerate solution spaces are reduced, the prediction accuracy of the network in determining the unique arbitrary object increases. A similarity metric of 0.85 over the test data indicates that the nonunique solution space has been reduced dramatically and that the target object is generated effectively.
Figure 4. Designed INN for shape recognition from given multifrequency far-field patterns. a) Evolution of the loss function of the generative inverse model over training epochs. b) The distribution of absolute prediction error for reconstructed far-field patterns on testing samples. c) Binary cross-entropy error of testing samples. d) Structure similarity index on predicted binary structures of testing samples. The red dashed lines in (b–d) indicate the average errors.
Figure 5 illustrates two examples to showcase how the trained inverse network accurately determines the shape of an object for the given multifrequency far-field patterns. The predicted far-field patterns from the generated shapes are provided to compare with the given far-field patterns. As we can see, inversely designed objects (generated) for the given far-field patterns match well with the target objects, which confirms the efficacy of our method. The small variations around the edges barely alter the corresponding far-field information (see Figure 5a,b: (iii)–(vii)).
Figure 5. Examples of generated shapes from given multifrequency far-field patterns using INN. i) Target structure. ii) Inversely generated structure from far-field patterns. iii–vii) Comparison of the simulated (solid black curve) and predicted far-field pattern (dashed red curves) from the generated geometries in (a,b).
Next, we study the effect of feeding multifrequency data for “unique” shape recognition. We investigate the performance of the INN by starting with single-frequency far-field data at 1 kHz and then successively increasing the number of frequencies in discrete steps of 0.5 kHz. For brevity, this study limits the INN results to five frequencies. The INN results are shown in Figure 6. We observe that the ability of the network to uniquely identify a structure improves as the number of frequencies considered for the far-field data increases. Figure 6a clearly shows that the binary cross-entropy error used to identify an arbitrary shape decreases sharply when the second frequency is added and then slowly saturates as more frequencies are taken into account. A similar trend is observed for the SSIM, where more frequencies push the SSIM toward unity (see Figure 6b). Thus, we can conclude that adding more frequencies to the network training gives better results. We found five frequencies to be sufficient for optimal performance (further increasing the number of frequencies does not noticeably improve the performance). The detailed results for the trained FNN and INN with multifrequency data are provided in the Supporting Information.[38]
Figure 6. Comparison of INN performance with multifrequency far-field data. a) Frequency-dependent binary cross-entropy (BCE) error and b) structure similarity index measure (SSIM). The BCE error decreases and the SSIM increases as multifrequency far-field data are fed to the neural network, where f1 = 1 kHz, f2 = 1.5 kHz, f3 = 2 kHz, f4 = 2.5 kHz, and f5 = 3 kHz. The accuracy of the INN in recognizing the shape is significantly improved with multifrequency data.
In reality, it may not be feasible to measure multifrequency far-field data over the entire angular range from 0° to 360°. Therefore, analyzing the performance of the network with partial angular field data is crucial. For the sake of demonstration, we provide far-field data during training in a reduced angular range from 0° to 180° (see inset in Figure 7a), and the results of training the network are shown in Figure 7. Although the average prediction error is slightly compromised because of the reduced information used for learning, the proposed method can still accurately identify the key features of the considered arbitrary objects. The detailed results are shown in the Supporting Information.[38]
Figure 7. FNN and INN results for multifrequency half-plane far-field data. The distribution of absolute relative error to predict the far-field through trained a) FNN and b) INN. The inset in (a) shows the half-plane multifrequency far-field patterns for an arbitrary-shaped object. The binary cross-entropy error and structure similarity index measure of INN to recognize the shape are shown in (c,d), respectively. e–f) Examples of generated geometries for the given far-field patterns. (See the Supporting Information for more details).
The inverse scattering problem of retrieving the shape from far-field information is highly nonlinear and extremely challenging to solve with conventional approaches due to nonuniqueness issues. Here, we propose a generative deep learning approach as a practical design tool to uniquely determine the shape of an arbitrary object from multifrequency phaseless data. We exploit generative adversarial learning to encode the true features of objects into the latent space through an adversarial autoencoder and further integrate its generator into the inverse design process to create random shapes. The forward network is designed to learn the relation between a given structure and the corresponding far-field profile for instant and accurate predictions. The inverse design strategy is based on a generative encoder–decoder-like architecture, where the encoder (i.e., the inverse network) is trained while fixing the decoder, which consists of the pretrained forward network and generator. We study the influence of feeding multifrequency far-field data on shape recognition and show that multifrequency data rule out the nonunique solution spaces in the inverse architecture. In addition, we demonstrate that half-angular far-field data (i.e., from 0° to 180°) are still capable of uniquely determining the shape of the arbitrary object. To accelerate inverse learning, the proposed approach uses a latent distribution rather than a higher-dimensional object, which differs from conventional tandem architectures. The designed network instantly predicts the shape of the object for the given far-field information with unprecedented speed, reducing the design time by four orders of magnitude over traditional methods. It is important to emphasize that all our simulations are scalable, and the choice of frequencies depends on the scattering properties of the scatterer.
The scattering properties are directly related to the size of the scatterer and the physical parameters (density, elastic moduli) of both the scatterer and the surrounding environment. The proposed approach has potential applications in the automatic detection of underwater objects such as submarines and fish species, and in the monitoring of other underwater activities. Our approach is quite generic and may be readily extended to other physical systems, such as optical and plasmonic systems, and to the design of more complex and random 3D geometries, as 3D scattering does not differ inherently from its 2D counterpart, albeit with longer computational times. Moreover, in this work, we considered the material of the scatterer to be known a priori. Yet, in principle, our technique can also recognize both the shape and the material. In fact, material properties may be represented by discrete data structures that can be incorporated into the forward model as a new input during training; consequently, the generative network will predict them along with the recognition of the shape. As long as the material contrast with the host medium (water in this case) is such that the scattering patterns can be distinguished from those of other materials, the method can learn the material properties from the scattering features. Moreover, a specific material set can be treated as a classification problem to predict the shape based on a particular material label. For instance, in our previous work,[25] we predicted materials within a bounded modulus and density range. However, if we specify the material set (such as aluminum, steel, rubber, etc.), then we can treat it as a classification problem by assigning label values 0, 1, 2 to each material instead of the bulk modulus or density. As a result, the output will predict the shape along with the target label (0, 1, 2, …) corresponding to real materials.
Experimental Section

Adversarial Learning Method

In the AAE, the encoder takes a real image x as input, compresses it into a p-dimensional latent space z (where p is much smaller than the input dimension), and the generator then reconstructs the image $\hat{x}$. Let $q(z)$ be the posterior distribution of the latent space, z, in the autoencoder and $p(z)$ be the user-specified prior distribution, which is assumed to be a Gaussian distribution. The encoder generates the latent space from the posterior distribution, and the discriminator D distinguishes whether z is a real sample from the prior distribution or was generated by the encoder, following the adversarial objective
$$\min_{E}\,\max_{D}\; \mathbb{E}_{z' \sim p(z)}\left[\log D(z')\right] + \mathbb{E}_{x \sim p_{\mathrm{data}}}\left[\log\left(1 - D(E(x))\right)\right]$$
The AAE attempts to generate z analogous to real latent samples drawn from the prior distribution via adversarial learning.
Variational Inference Method

The inverse network exploits the variational inference method to learn the distribution of the far-field response and its mapping to the latent space, z, of the arbitrary object. The reconstruction loss is the absolute error between the input far-field patterns and the patterns reconstructed by the INN. The learned distribution is kept equivalent to the predefined distribution through the KL divergence term in the loss function, expressed as
$$\mathcal{L} = \frac{1}{N}\sum_{n=1}^{N}\left(\mathcal{L}_{rec}^{(n)} + \alpha\, D_{KL}^{(n)}\right)$$
where n indexes the data sample, p is the dimensionality of the latent space z, and α is the relative weight between the deterministic and stochastic learning. In our study, we tuned α for accurate prediction of the shape of the object via the INN. The KL divergence between the latent Gaussian distribution and the standard normal prior can be defined as
$$D_{KL} = \frac{1}{2}\sum_{j=1}^{p}\left(\sigma_j^2 + \mu_j^2 - 1 - \log \sigma_j^2\right)$$
where μ and σ are the mean and the standard deviation of the generated latent space distribution, respectively. The reparameterization trick, $z = \mu + \sigma \odot \varepsilon$, where ε is a sample from the prior Gaussian distribution, allows the sampling step to remain differentiable during the optimization process. The reconstruction term is expressed as the error between the input far-field pattern and the reconstructed pattern output by the decoder.
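The reparameterization trick described above can be sketched in a few lines of NumPy (the function name is ours). Isolating the randomness in ε is what makes μ and σ trainable by gradient descent.

```python
import numpy as np

def reparameterize(mu, sigma, rng):
    # z = mu + sigma * eps with eps ~ N(0, I): the randomness is isolated
    # in eps, so gradients can flow through mu and sigma during training.
    eps = rng.standard_normal(np.shape(mu))
    return mu + sigma * eps
```

With sigma = 0 the sample collapses deterministically to mu, which is a quick sanity check.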
Numerical Modeling

The finite-element method was used to perform full-wave simulations in COMSOL Multiphysics. We assumed water as the background medium and modeled the arbitrary object as steel, using the respective mass densities and bulk moduli of the two media. Because a steel object in water supports shear waves, the steel was modeled with the solid mechanics module and the background medium with the acoustics module. The two modules were coupled through the acoustic–structure boundary in the Multiphysics module for full-wave modeling. The arbitrary object was excited with an acoustic plane wave of unit amplitude to determine the far-field response at different operating frequencies. (See Supporting Information for details on the derivation and the governing equations.)
Acknowledgements

The work described here was supported by King Abdullah University of Science and Technology (KAUST) Artificial Intelligence Initiative Fund, KAUST Office of Sponsored Research (OSR) under grant no. OSR-2020-CRG9-4374, and KAUST Baseline Research Fund no. BAS/1/1626-01-01.
Conflict of Interest

The authors declare no conflict of interest.
Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.
© 2023. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
A generative deep learning approach for shape recognition of an arbitrary object from its acoustic scattering properties is proposed and demonstrated. The strategy exploits deep neural networks to learn the mapping between the latent space of a 2D acoustic object and the far-field scattering amplitudes. A neural network is designed as an adversarial autoencoder and trained via unsupervised learning to determine the latent space of the acoustic object. Important structural features of the object are embedded in a lower-dimensional latent space, which supports the modeling of a shape generator and accelerates learning in the inverse design process. The proposed inverse design uses the variational inference approach with an encoder- and decoder-like architecture, where the decoder is composed of two pretrained neural networks: the generator and the forward model. The data-driven framework finds an accurate solution to the ill-posed inverse scattering problem, where the nonuniqueness of the solution space is overcome by multifrequency phaseless far-field patterns. This inverse method is a powerful design tool that does not require complex analytical calculation and opens up new avenues for practical realization, automatic recognition of arbitrarily shaped submarines or large fish, and other underwater applications.
Details
1 Division of Computer, Electrical and Mathematical Sciences and Engineering, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
2 Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, IL, USA
3 Division of Computer, Electrical and Mathematical Sciences and Engineering, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia; Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, USA
4 Division of Computer, Electrical and Mathematical Sciences and Engineering, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia; Division of Physical Sciences and Engineering, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia




