1. Introduction
Synthetic Aperture Radar (SAR) is an active microwave remote-sensing system that has been widely applied in both military and civilian fields due to its day-and-night and all-weather observation capabilities [1]. However, the imaging quality of SAR is usually degraded by undesired Phase Errors (PEs), which mainly arise from trajectory deviations and instability of the platform velocity [2]. Uncompensated PEs cause serious blurring and geometric distortion in the SAR imagery [3], and the navigation system cannot provide sufficiently precise information about these motion errors [4]. For high-quality imaging, especially high-resolution imaging, it is therefore important to compensate for these PEs. Autofocus is a data-driven technique that estimates the phase error directly from the backscattered signals [5].
In recent decades, many autofocus algorithms have been developed. These methods can be classified into three categories: sub-aperture-based, inverse-filtering-based, and metric-optimization-based algorithms. The sub-aperture autofocus algorithm is also called Map Drift Autofocus (MDA) [6]. MDA divides the full-aperture range-compressed data into equal-width sub-apertures, images each sub-aperture separately to obtain a sub-map, and determines the position offset by locating the cross-correlation peak between sub-maps [7]. The more sub-apertures the data are divided into, the higher the order of phase error that can be estimated [8]. Consequently, sub-aperture-based algorithms cannot correct high-order phase errors, since the estimable order is limited by the number of sub-apertures. The original MDA was developed to correct phase errors in azimuth, and recent works focus on two-dimensional phase-error correction. In [9], MDA was extended to highly squinted SAR by introducing a squint-range-dependent map drift estimator to correct range-variant PEs. In [10], a novel two-dimensional spatially variant MDA was proposed for unmanned aerial vehicle SAR autofocus.
The Phase Gradient Autofocus (PGA) is a widely utilized inverse-filtering-based autofocus method [11]. PGA consists of four main steps: center-shifting the dominant scatterers, windowing, phase gradient estimation, and iterative correction. Maximum Likelihood (ML) [12] and Linear Unbiased Minimum Variance (LUMV) [13] are two of the methods used to estimate the phase gradient. PGA can quickly estimate and correct phase errors of any order through iteration. However, its performance heavily depends on the existence of isolated dominant scatterers in the scene [14]; the algorithm fails in scenes without dominant scatterers. In addition, the window width also affects performance [15] and must be set carefully. The original PGA method was proposed for spotlight SAR autofocus [16]. When applied to stripmap SAR, the full-aperture data must first be divided into smaller apertures along the azimuth direction (each sub-aperture cannot exceed the size of a synthetic aperture) [17,18], and PGA is then applied to each sub-aperture separately. In [19], a generalized PGA algorithm suitable for the backprojection algorithm was developed. Evers et al. [20] extended the PGA algorithm to SAR with arbitrary flight paths, including both near-field and bistatic collection geometries.
The metric-optimization-based autofocus algorithms estimate the unknown phase errors by optimizing metrics such as entropy [21,22,23,24], contrast [25,26], or sharpness [27,28]. The most commonly used metric-based autofocus method is the Minimum-Entropy-based Autofocus (MEA) method. Usually, the phase error is modeled as a polynomial to reduce the number of optimization variables [29]. These algorithms can achieve a higher focusing quality than the two categories above. However, they have high computational complexity and require many iterations to converge [30]. Moreover, it is difficult to set an appropriate learning rate: too small a learning rate increases the number of iterations, while too large a learning rate causes convergence to a non-optimal solution.
The Artificial Neural Network (ANN) is a promising machine-learning technique used for classification and regression tasks. The Extreme Learning Machine (ELM) is a single-hidden-layer feedforward neural network, first proposed by Huang et al. [31] in 2004, that can also be used to solve classification and regression problems [32]. As is widely known, a traditional ANN requires thousands of training iterations to minimize the objective function. Unlike a traditional ANN, the training process of an ELM is non-iterative and very fast: the weights from the input layer to the hidden layer are randomly generated and do not need to be adjusted [33], and the optimization of an ELM reduces to a minimum-norm least-squares problem with a closed-form solution [34]. The ELM retains universal classification and approximation abilities and can approximate arbitrary functions [35,36]. In recent years, several ensemble-based ELM methods have been proposed [37,38,39]. Due to its fast training and robust performance, the ELM is very suitable for ensemble learning.
In this paper, a fast, machine-learning-based autofocus algorithm is proposed. The problem of SAR autofocus can be regarded as a regression problem: predicting the phase error. To reduce the difficulty of the regression, the phase errors are modeled as a polynomial of a given degree, and a machine-learning model is used to predict the polynomial coefficients. To handle the two-dimensional SAR image data, a Convolutional Extreme Learning Machine (CELM) is constructed to predict the polynomial coefficients. To improve on the performance of a single CELM, multiple individual CELMs are integrated by a novel metric-based combination strategy, and the bagging-based ensemble learning method is utilized to train the model. The main contributions of this paper can be summarized as follows: (1) To the best of our knowledge, this is the first use of machine learning to solve the SAR autofocus problem. (2) A metric-based combination strategy is proposed. (3) A novel SAR autofocus scheme, based on the proposed ensemble convolutional extreme learning machine, is presented.
The remainder of this paper is organized as follows. In Section 2, the fundamental background of SAR autofocus is explained. Section 3 presents our approach to SAR autofocus. Section 4 describes the dataset, outlines the experimental setup, and presents the results. In Section 5, the results obtained in the performed experiments, the practical implications of the proposed method, and future research directions are discussed. Finally, Section 6 concludes the paper.
2. Fundamental Background
SAR autofocus is a data-driven parameter-estimation technique that aims to automatically estimate the phase error from the SAR-received data. The residual phase error in the range direction is generally so small that it can be ignored after range cell migration correction; the phase errors that need to be corrected mainly occur in the azimuth direction [40]. Azimuth phase error estimation and compensation are usually carried out in the range-Doppler domain. Suppose we have a complex-valued defocused image $\mathbf{G} \in \mathbb{C}^{N_a \times N_r}$, where $N_a$ and $N_r$ are the number of pixels in azimuth and range, respectively. Denote $\mathbf{S}$ as the range-Doppler domain data matrix of $\mathbf{G}$. The one-dimensional azimuth phase error compensation problem can be formulated as [41]

$\hat{G}(m, n) = \frac{1}{N_a} \sum_{k=0}^{N_a - 1} S(k, n)\, e^{-j \phi_e(k)}\, e^{j 2\pi k m / N_a}$ (1)

where $\hat{\mathbf{G}}$ is the compensated image matrix; $k$ is the frequency index in azimuth; $m$ and $n$ are the azimuth and range index subscripts of the matrix, respectively; and $\phi_e(k)$ is the $k$-th element of the phase error vector $\boldsymbol{\phi}_e \in \mathbb{R}^{N_a}$. Let $\mathbf{E} = \mathrm{diag}(e^{-j \boldsymbol{\phi}_e})$ be the square diagonal matrix composed of the elements of the vector $e^{-j \boldsymbol{\phi}_e}$ on the main diagonal, where $\mathrm{diag}(\cdot)$ represents the diagonalization operation. Thus, Equation (1) can be expressed in the form of matrix multiplication as follows:

$\hat{\mathbf{G}} = \mathcal{F}_a^{-1}\left( \mathbf{E}\, \mathcal{F}_a(\mathbf{G}) \right)$ (2)

where $\mathcal{F}_a(\cdot)$ and $\mathcal{F}_a^{-1}(\cdot)$ represent the Fourier transform and the inverse Fourier transform in azimuth, respectively.
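To make Equation (2) concrete, the following PyTorch sketch compensates an azimuth phase error on a complex image; the sign convention of the exponent and the function name are assumptions of this illustration, not taken from the authors' code.

```python
import torch

def compensate_phase_error(G: torch.Tensor, phi: torch.Tensor) -> torch.Tensor:
    """Equation (2): correct an azimuth phase error on a complex image.

    G   : (Na, Nr) complex defocused image, azimuth along dim 0.
    phi : (Na,) real phase error vector in the azimuth frequency domain.
    """
    S = torch.fft.fft(G, dim=0)                # to the range-Doppler domain
    E = torch.exp(-1j * phi).unsqueeze(1)      # diagonal of the correction matrix E
    return torch.fft.ifft(E * S, dim=0)        # back to the image domain
```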
The key problem of autofocus is how to estimate $\boldsymbol{\phi}_e$ from the defocused image $\mathbf{G}$. Phase Gradient Autofocus is a simple autofocus algorithm and has been widely used. First, the dominant scatterers (targets with large intensities) of each range line are found. Then, these strong scatterers are center-shifted along the azimuth direction to obtain a center-shifted image $\mathbf{G}_s$. This method assumes that the complex reflectivities, except for the dominant scatterers, are distributed as zero-mean Gaussian random noise [41]. To accurately estimate the phase error gradient from these dominant targets, the center-shifted image is windowed. Denote $\mathbf{S}_s$ as the range-Doppler domain data of the windowed image (obtained by applying the azimuth Fourier transform). The phase gradient estimation based on Maximum Likelihood (ML) can be formulated as

$\Delta\hat{\phi}_e(k) = \angle\left( \sum_{n=1}^{N_r} S_s^*(k-1, n)\, S_s(k, n) \right)$ (3)

where $(\cdot)^*$ is the complex conjugation, $\Delta\hat{\boldsymbol{\phi}}_e$ is the estimated phase error gradient vector, and $\angle$ is the phase operation. Another commonly used gradient estimation method is the Linear Unbiased Minimum Variance (LUMV) algorithm. Let $\dot{\mathbf{S}}_s$ be the gradient matrix of $\mathbf{S}_s$ in azimuth, i.e., $\dot{S}_s(k, n) = S_s(k+1, n) - S_s(k, n)$, where $k = 1, \dots, N_a - 1$ and $n = 1, \dots, N_r$. The LUMV-based phase error gradient estimation is expressed by

$\Delta\hat{\phi}_e(k) = \frac{ \sum_{n=1}^{N_r} \mathrm{Im}\left[ S_s^*(k, n)\, \dot{S}_s(k, n) \right] }{ \sum_{n=1}^{N_r} \left| S_s(k, n) \right|^2 }$ (4)

where $\mathrm{Im}[\cdot]$ represents taking the imaginary part of a complex number. Different from PGA, the metric-based autofocus algorithms estimate the phase errors by optimizing a cost function, i.e., a metric that evaluates the focus quality of the image. In the field of radar imaging, entropy is usually used to evaluate the focusing quality: the better the focus, the smaller the entropy. Denote $\mathbf{G}$ as a complex-valued image; the entropy is defined as
$E(\mathbf{G}) = \ln C - \frac{1}{C} \sum_{i=1}^{H} \sum_{j=1}^{W} \left| G(i, j) \right|^2 \ln \left| G(i, j) \right|^2$ (5)

where $H$ and $W$ are the height and width of the image, respectively, $|G(i, j)|$ is the element in the $i$-th row and $j$-th column of the amplitude image $|\mathbf{G}|$, $\ln$ is the natural logarithm, and the scalar $C$ can be computed by [24]

$C = \sum_{i=1}^{H} \sum_{j=1}^{W} \left| G(i, j) \right|^2$ (6)
Contrast is another metric used to evaluate an image's focusing quality. In [30], contrast is defined as the ratio of the root mean square deviation of the target energy to the mean value of the target energy

$\mathcal{C}(\mathbf{G}) = \frac{ \sqrt{ \mathbb{E}\left[ \left( |\mathbf{G}|^2 - \mathbb{E}\left[ |\mathbf{G}|^2 \right] \right)^2 \right] } }{ \mathbb{E}\left[ |\mathbf{G}|^2 \right] }$ (7)

where $\mathbb{E}[\cdot]$ denotes the mathematical expectation operation. The better the image focus quality, the greater the contrast, and vice versa.
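As a reference point, the entropy of Equations (5) and (6) and the contrast of Equation (7) can be computed as below; this is a minimal PyTorch sketch, and the normalized form used for the entropy is algebraically equivalent to Equation (5).

```python
import torch

def image_entropy(G: torch.Tensor) -> torch.Tensor:
    """Equations (5)-(6): image entropy; lower values indicate better focus."""
    p = G.abs() ** 2          # pixel energies |G(i, j)|^2
    p = p / p.sum()           # normalize by the total energy C
    # -sum(p * ln p) expands to ln C - (1/C) * sum(|G|^2 * ln|G|^2), i.e., Eq. (5)
    return -(p * torch.log(p + 1e-12)).sum()   # small eps guards ln(0)

def image_contrast(G: torch.Tensor) -> torch.Tensor:
    """Equation (7): image contrast; higher values indicate better focus."""
    p = G.abs() ** 2
    return torch.sqrt(((p - p.mean()) ** 2).mean()) / p.mean()
```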
The Minimum-Entropy-based Autofocus (MEA) algorithm aims at minimizing
$\hat{\boldsymbol{\phi}}_e = \arg\min_{\boldsymbol{\phi}_e} \left( \ln C - \frac{1}{C} \sum_{m=1}^{N_a} \sum_{n=1}^{N_r} \left| \hat{G}(m, n) \right|^2 \ln \left| \hat{G}(m, n) \right|^2 \right)$ (8)

where $\boldsymbol{\phi}_e$ is the phase error vector and $\hat{\mathbf{G}}$ is the compensated image, which can be computed using Equation (1). Since $C$ is a constant (the compensation preserves the total energy), minimizing Equation (8) is equivalent to minimizing the following equation

$\hat{\boldsymbol{\phi}}_e = \arg\min_{\boldsymbol{\phi}_e} \left( - \sum_{m=1}^{N_a} \sum_{n=1}^{N_r} \left| \hat{G}(m, n) \right|^2 \ln \left| \hat{G}(m, n) \right|^2 \right)$ (9)
Utilizing the gradient descent method, one can optimize Equation (9); the iterative update formula can be expressed as

$\phi_e^{(t+1)}(k) = \phi_e^{(t)}(k) - \mu \frac{\partial E}{\partial \phi_e(k)}, \quad t = 1, 2, \dots, T$ (10)

where $\mu$ is the learning rate, $\phi_e^{(t+1)}(k)$ is the updated phase error, $t$ is the iteration counter, and $T$ is the maximum iteration number. The partial derivative of $E$ with respect to $\phi_e(k)$ can be formulated as
$\frac{\partial E}{\partial \phi_e(k)} = -\frac{1}{C} \sum_{m=1}^{N_a} \sum_{n=1}^{N_r} \left( 1 + \ln \left| \hat{G}(m, n) \right|^2 \right) \frac{\partial \left| \hat{G}(m, n) \right|^2}{\partial \phi_e(k)}$ (11)

where $\partial |\hat{G}(m, n)|^2 / \partial \phi_e(k) = 2\,\mathrm{Re}\{ \hat{G}^*(m, n)\, \partial \hat{G}(m, n) / \partial \phi_e(k) \}$. According to [24], the final expression is

$\frac{\partial E}{\partial \phi_e(k)} = -\frac{2}{N_a C}\, \mathrm{Im}\left\{ e^{-j \phi_e(k)} \sum_{n=1}^{N_r} S(k, n)\, Z^*(k, n) \right\}$ (12)

where $Z(k, n)$ can be calculated by the azimuth Fourier transform

$Z(k, n) = \sum_{m=0}^{N_a - 1} \left( 1 + \ln \left| \hat{G}(m, n) \right|^2 \right) \hat{G}(m, n)\, e^{-j 2\pi k m / N_a}$ (13)
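In practice, the gradient of Equations (11)-(13) can also be obtained by automatic differentiation, which sidesteps the analytic bookkeeping. The sketch below, reusing compensate_phase_error and image_entropy from the earlier listings, performs one update of Equation (10); it assumes a PyTorch version with complex autograd support and is an illustration rather than the authors' implementation.

```python
import torch

def mea_step(G: torch.Tensor, phi: torch.Tensor, lr: float):
    """One gradient-descent update of Equation (10), with the entropy gradient
    computed by autograd instead of the closed form of Equations (11)-(13)."""
    phi = phi.clone().detach().requires_grad_(True)
    entropy = image_entropy(compensate_phase_error(G, phi))  # Eqs. (2) and (5)
    entropy.backward()
    with torch.no_grad():
        phi_next = phi - lr * phi.grad                       # Equation (10)
    return phi_next.detach(), entropy.item()
```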
In general, for different types of phase error, $\boldsymbol{\phi}_e$ can be modeled in different forms. Modeling reduces the number of parameters that need to be optimized and the complexity of the problem. In this paper, we focus on the polynomial-type phase error, which can be formulated as

$\boldsymbol{\phi}_e = \sum_{q=2}^{Q} a_q\, \mathbf{f}^q$ (14)

where $\mathbf{f}$ is the azimuth frequency vector, which can be normalized to $[-1, 1]$ or $[-0.5, 0.5]$, $\mathbf{a} = [a_2, a_3, \dots, a_Q]^T$ is the polynomial coefficient vector, and $Q$ is the order of the polynomial (the constant and linear terms are omitted, since they do not defocus the image). The minimum-entropy-based methods are not restricted by the assumptions in PGA but require many iterations to converge. As a result, these methods are more robust than PGA and have a higher focus quality, but suffer from slow speed. In this paper, we focus on the development of a non-iterative autofocus algorithm based on machine learning. An ensemble-based machine-learning model is proposed to predict the polynomial coefficients, the azimuth phase errors are computed according to Equation (14), and the SAR image is focused by compensating for the errors in azimuth using Equation (2).
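A direct implementation of Equation (14), under the reconstruction above (orders 2 through Q on a frequency axis normalized to [-1, 1]), might look as follows; the function name is illustrative.

```python
import torch

def polynomial_phase_error(coeffs: torch.Tensor, na: int) -> torch.Tensor:
    """Equation (14): azimuth phase error from polynomial coefficients a_2..a_Q.

    coeffs : (Q - 1,) coefficient vector; constant and linear terms omitted.
    na     : number of azimuth samples.
    """
    f = torch.linspace(-1.0, 1.0, na)                        # normalized azimuth frequency
    orders = torch.arange(2, 2 + coeffs.numel()).unsqueeze(1)
    return (coeffs.unsqueeze(1) * f.unsqueeze(0) ** orders).sum(dim=0)
```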
3. Materials and Methods
In this section, ensemble learning and extreme learning machine are briefly introduced, and the proposed ensemble-learning-based autofocus method is described in detail.
3.1. Ensemble Scheme
Ensemble learning combines several weak but diverse models under a combination rule to form a strong model. The keys to ensemble learning are individual learners with diversity and the combination strategy. In ensemble learning, the individual learners can be homogeneous or heterogeneous: a homogeneous ensemble consists of members built with a single type of base learning algorithm, such as the decision tree, support vector machine, or neural network, while a heterogeneous ensemble consists of members built with different base learning algorithms. Homogeneous learners are the most commonly used [42].
Classical ensemble methods include bagging-, boosting-, and stacking-based methods. These methods have been well studied in recent years and widely applied [43]. The key idea of a boosting-based algorithm is that the samples used to train the current individual learner are weighted according to the learning errors of the previous individual learner: the larger the error the previous learner made on a sample, the greater the weight set for this sample, and vice versa [44]. Therefore, in a boosting-based algorithm, there is a strong dependence among individual learners; it is not suitable for parallel processing and has a low training efficiency. The bagging (bootstrap aggregating) ensemble method is based on bootstrap sampling [37]. Suppose there are N training samples and M individual learners; then, N samples are randomly sampled with replacement from the original samples to form a training set, and M training sets for the M individual learners are obtained by repeating this sampling M times. Therefore, in the bagging-based method, there is no strong dependence between individual learners, which makes it suitable for parallel training. In this paper, the bagging-based ensemble method is utilized to create data diversity.
In ensemble learning, three combination strategies have been widely used: averaging, voting, and learning-based strategies [45]. For regression problems, the first method is usually utilized, i.e., averaging the outputs of the M individual learners to obtain the final output. The second strategy is usually used for classification problems: the winner is the candidate with the maximum total number of votes [46]. The learning-based method differs from the above two; it takes the outputs of the M individual learners as the inputs of a new learner, and the combination rules are learned automatically. To combine the results of multiple individual autofocus learners, we propose a metric-based combination strategy: the winner is the candidate with the optimal metric value (such as minimum entropy or maximum contrast). The framework of our proposed ensemble-learning-based autofocus algorithm is illustrated in Figure 1, where "PEC" represents the phase error compensation module, which is formulated by Equation (2).
In Figure 1, there are M homogeneous individual learners, each a Convolutional Extreme Learning Machine (CELM). Denote $\mathbf{G} \in \mathbb{C}^{N_a \times N_r}$ as a defocused SAR image, where $N_a$ and $N_r$ are the number of pixels in azimuth and range, respectively. We obtain M estimated phase error vectors $\hat{\boldsymbol{\phi}}_e^{(1)}, \hat{\boldsymbol{\phi}}_e^{(2)}, \dots, \hat{\boldsymbol{\phi}}_e^{(M)}$. These vectors are used to compensate the defocused image $\mathbf{G}$, and M focused candidate images $\hat{\mathbf{G}}_1, \hat{\mathbf{G}}_2, \dots, \hat{\mathbf{G}}_M$ are obtained. Finally, our proposed metric-based combination strategy is applied to these images to obtain the final result. For example, if entropy is utilized as the metric, then the final focused image can be expressed as
$\hat{\mathbf{G}} = \arg\min_{\hat{\mathbf{G}}_m,\; m = 1, \dots, M} E(\hat{\mathbf{G}}_m)$ (15)
Similarly, if contrast is utilized as the metric, then the final focused image can be expressed as
$\hat{\mathbf{G}} = \arg\max_{\hat{\mathbf{G}}_m,\; m = 1, \dots, M} \mathcal{C}(\hat{\mathbf{G}}_m)$ (16)
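Equations (15) and (16) amount to scoring each candidate and keeping the best one. A compact sketch, reusing compensate_phase_error and the metric functions defined earlier (names illustrative):

```python
def combine_by_metric(G, phase_errors, metric=image_entropy, smaller_is_better=True):
    """Metric-based combination (Equations (15)/(16)): compensate the defocused
    image with each predicted phase error and keep the best-scoring candidate."""
    candidates = [compensate_phase_error(G, phi) for phi in phase_errors]
    scores = [metric(c).item() for c in candidates]
    pick = min if smaller_is_better else max
    return candidates[pick(range(len(scores)), key=scores.__getitem__)]
```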
3.2. Convolutional Extreme Learning Machine
The original ELM is a three-layer neural network (input, hidden, output) designed for processing one-dimensional data. Denote $\mathbf{x} \in \mathbb{R}^{d}$ as the input vector and $L$ as the number of neurons in the hidden layer. Let $\mathbf{w}_i \in \mathbb{R}^{d}$ represent the weights between the input and the $i$-th neuron of the hidden layer, and let $b_i$ be the bias. The output of the $i$-th hidden-layer neuron can be expressed as

$h_i = g\left( \mathbf{w}_i^T \mathbf{x} + b_i \right)$ (17)

where $g$ is a nonlinear piecewise continuous function (the activation function in traditional neural networks). The outputs of the $L$ hidden-layer neurons can be represented as $\mathbf{h} = [h_1, h_2, \dots, h_L]$. Denote $\boldsymbol{\beta} \in \mathbb{R}^{L \times K}$ as the weights from the hidden layer to the output layer, where $K$ is the number of neurons in the output layer. For a classification problem, $K$ is the number of classes; for a regression problem, $K$ is the dimension of the vector to be regressed. The output of the ELM can be formulated as

$\mathbf{y} = \mathbf{h} \boldsymbol{\beta}$ (18)
Suppose there is a training set with $N$ training samples $\{(\mathbf{x}_n, \mathbf{t}_n)\}_{n=1}^{N}$, where $\mathbf{t}_n \in \mathbb{R}^{K}$ is the truth-value vector (for a classification problem, $\mathbf{t}_n$ is the one-hot class label vector). The hidden-layer feature matrix of these $N$ samples is $\mathbf{H} \in \mathbb{R}^{N \times L}$. The classification or regression problem for the ELM is to optimize

$\min_{\boldsymbol{\beta}} \; \frac{1}{2} \left\| \boldsymbol{\beta} \right\|^2 + \frac{\lambda}{2} \left\| \mathbf{H} \boldsymbol{\beta} - \mathbf{T} \right\|^2$ (19)

where $\lambda$ is the regularization factor and $\mathbf{T} = [\mathbf{t}_1, \mathbf{t}_2, \dots, \mathbf{t}_N]^T \in \mathbb{R}^{N \times K}$ is the truth-value matrix of the $N$ samples. Equation (19) can be solved by an iterative method, the orthogonal projection method, or singular value decomposition [34,47]. When $N \geq L$, Equation (19) has the following closed-form solution [32]

$\boldsymbol{\beta} = \left( \frac{\mathbf{I}}{\lambda} + \mathbf{H}^T \mathbf{H} \right)^{-1} \mathbf{H}^T \mathbf{T}$ (20)

where $\mathbf{I}$ is the $L \times L$ identity matrix. Solving for $\boldsymbol{\beta}$ does not require iterative training and is very fast.
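Equation (20) is an ordinary ridge-regression solve; a minimal PyTorch sketch (the function name is illustrative):

```python
import torch

def solve_output_weights(H: torch.Tensor, T: torch.Tensor, lam: float) -> torch.Tensor:
    """Equation (20): closed-form output weights of a (C)ELM.

    H : (N, L) hidden-layer feature matrix.
    T : (N, K) truth-value matrix.
    """
    L = H.shape[1]
    A = H.t() @ H + torch.eye(L, dtype=H.dtype, device=H.device) / lam
    return torch.linalg.solve(A, H.t() @ T)   # beta, shape (L, K)
```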
The original ELM can only deal with one-dimensional data. Two-dimensional or higher-dimensional inputs are usually flattened to a vector. This flattening operation destroys the original spatial structure of the input data and leads ELMs to perform poorly in image-processing tasks. To overcome this problem, Huang et al. [48] proposed the Local Receptive-Fields-Based Extreme Learning Machine (ELM-LRF). Differing from the traditional Convolutional Neural Network (CNN), the size and shape of the receptive field (convolutional kernel) of ELM-LRF can be generated according to a probability distribution. In addition, a CNN uses the back-propagation algorithm to iteratively adjust the weights of all layers, while ELM-LRF has a closed-form solution.
In this paper, we propose a Convolutional Extreme Learning Machine (CELM) for phase error estimation. The network structure of a single CELM is illustrated in Figure 2. It contains a convolutional (Conv) layer, an Instance Normalization (IN) layer [49], a Leaky Rectified Linear Unit (LeakyReLU) nonlinearity [50], a Global Average Pooling (GAP) layer in range, a flattening layer, and an output layer. As mentioned above, in order to simplify the prediction problem, we use the CELM to estimate the polynomial coefficients instead of the phase errors themselves. In Figure 2, $K$ denotes the number of polynomial coefficients and equals $Q - 1$, where $Q$ is the order of the polynomial in Equation (14).
The detailed configuration of the CELM is shown in Table 1. Suppose there is a complex SAR image of 256 pixels in both height and width. Denote $C_o$ as the number of channels produced by the convolution and $n$ as the number of images in a batch. The output size of each layer in the CELM is also displayed in Table 1. As shown in Figure 2 and Table 1, there is only one convolutional layer in a CELM, and the convolution stride is set to 1. In Figure 2, the convolution kernel sizes for azimuth and range are 63 and 1, respectively.
Let $\mathbf{X} \in \mathbb{R}^{N \times C_i \times H \times W}$ be the convolution input, where $N$ is the number of inputs and $H$, $W$, and $C_i$ are the height, width, and channels of $\mathbf{X}$, respectively. In this paper, the convolution kernels between channels do not share weights. Denote $\mathbf{W} \in \mathbb{R}^{C_o \times C_i \times K_h \times K_w}$ as the weight matrix of the convolution kernels, where $K_h$ and $K_w$ are the height and width of the convolution kernel and $C_o$ is the number of channels produced by the convolution. The convolution between $\mathbf{X}$ and $\mathbf{W}$ can be formulated as

$\mathbf{Y}_{n, c_o} = \sum_{c_i = 1}^{C_i} \mathbf{W}_{c_o, c_i} * \mathbf{X}_{n, c_i}$ (21)

where $c_o = 1, \dots, C_o$, $*$ represents the classic two-dimensional convolution operation, and $\mathbf{X}_{n, c_i}$ is the $c_i$-th channel of the $n$-th image of $\mathbf{X}$, $n = 1, \dots, N$. In this paper, $C_i$ equals 2, since the defocused complex-valued SAR image is first converted into a two-channel image (a real-channel image and an imaginary-channel image) before being fed into the CELM. As the phase distortion is in azimuth, we use azimuth convolution to extract features. Thus, the weight of the convolutional layer is a matrix of size $C_o \times 2 \times K_a \times 1$, where $C_o$ is the number of channels produced by the convolution, 2 is the number of channels of the input image, and $K_a$ is the kernel size in azimuth. The instance normalization of the convolutional features $\mathbf{Y}$ can be expressed as

$\hat{Y}_{n, c, h, w} = \frac{ Y_{n, c, h, w} - \mu_{n, c} }{ \sqrt{ \sigma_{n, c}^2 + \epsilon } }$ (22)

where $c$, $h$, and $w$ index the channels, height, and width of $\mathbf{Y}$, respectively, and $\epsilon$ is a small constant for numerical stability. The mean value $\mu_{n, c}$ and standard variance $\sigma_{n, c}^2$ can be calculated by

$\mu_{n, c} = \frac{1}{HW} \sum_{h=1}^{H} \sum_{w=1}^{W} Y_{n, c, h, w}, \qquad \sigma_{n, c}^2 = \frac{1}{HW} \sum_{h=1}^{H} \sum_{w=1}^{W} \left( Y_{n, c, h, w} - \mu_{n, c} \right)^2$ (23)
After convolution and instance normalization, a LeakyReLU activation is applied to the normalized features $\hat{\mathbf{Y}}$. Mathematically, the LeakyReLU function is expressed as

$f(x) = \begin{cases} x, & x \geq 0 \\ \alpha x, & x < 0 \end{cases}$ (24)

where $\alpha$ is the negative slope, set to 0.01 in this paper. Denote $\mathbf{Z}$ as the output features of the LeakyReLU nonlinearity. By applying the GAP operation to $\mathbf{Z}$ in the range direction for dimension reduction, the features after pooling can be expressed as

$p_{n, c, h} = \frac{1}{W} \sum_{w=1}^{W} Z_{n, c, h, w}$ (25)

where $\mathbf{P}$ is the feature tensor after the range GAP. Thus, each feature map is reduced to a feature vector, and $C_o$ feature vectors are generated for each image. These feature vectors are flattened to one long feature vector $\mathbf{h}_n \in \mathbb{R}^{C_o H}$ by the flatten operation. Combining the $N$ feature vectors into a feature matrix gives

$\mathbf{H} = \left[ \mathbf{h}_1, \mathbf{h}_2, \dots, \mathbf{h}_N \right]^T \in \mathbb{R}^{N \times C_o H}$ (26)

Similar to ELM-LRF, the convolution-layer weights are fixed after random initialization. The weights from the hidden layer to the output (the polynomial coefficients) can be solved by Equation (20).
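Putting Equations (21)-(26) together, the fixed feature extractor of one CELM can be sketched in PyTorch as follows; "same" azimuth padding is an assumption of this illustration (the paper only states a stride of 1).

```python
import torch
import torch.nn.functional as F

def celm_features(x: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    """Hidden-layer features of one CELM (Figure 2 / Table 1).

    x : (N, 2, H, W) real/imaginary channels of defocused image patches.
    w : (Co, 2, Ka, 1) fixed random convolution kernels (azimuth x range).
    """
    ka = w.shape[2]
    y = F.conv2d(x, w, stride=1, padding=(ka // 2, 0))  # Eq. (21), azimuth-only kernel
    y = F.instance_norm(y)                              # Eqs. (22)-(23)
    y = F.leaky_relu(y, negative_slope=0.01)            # Eq. (24)
    y = y.mean(dim=3)                                   # Eq. (25): GAP over range
    return y.flatten(start_dim=1)                       # rows of H in Eq. (26)
```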
3.3. Model Training and Testing
In this paper, the classical bagging ensemble-learning method is applied to generate diverse data and train the CELMs; the model trained in this way is called Bagging-ECELMs. Suppose there is a training dataset $\{(\mathbf{G}_n, \mathbf{a}_n)\}$ and a validation dataset, where $\mathbf{G}_n \in \mathbb{C}^{N_a \times N_r}$ is the $n$-th defocused image, $\mathbf{a}_n$ is the polynomial phase error coefficient vector of $\mathbf{G}_n$, and $N_a$ and $N_r$ are the number of pixels in azimuth and range, respectively. Denote $M$ as the number of CELMs. In order to train the $M$ CELMs, $N$ samples are randomly selected from the training set as the training samples of a single CELM, and $M$ training sets are obtained by repeating this process $M$ times. The validation dataset is utilized to select the best regularization factor $\lambda$ in Equation (19). Assuming that $L_\lambda$ candidate regularization factors are set in the experiment, each CELM will be trained $L_\lambda$ times.
The training of a single CELM consists of two main steps: randomly initializing the input weights (the weights of the convolution layer) and calculating the output weights (Equation (20)). The input weights are randomly generated and then orthogonalized using singular value decomposition (SVD) [48]. Assuming that there are $C_o$ convolutional output channels, the convolution kernel size is $K_a \times 1$, where $K_a$ is the kernel size in azimuth and 1 is the kernel size in range. Firstly, generate the convolution kernel weights with a standard Gaussian distribution. Secondly, combine these weights, in order, into a matrix

$\mathbf{W}' = \left[ \mathbf{w}_1, \mathbf{w}_2, \dots, \mathbf{w}_{2 C_o} \right]^T \in \mathbb{R}^{2 C_o \times K_a}$ (27)

Thirdly, orthogonalize the weight matrix with SVD to obtain the orthogonalized weights $\hat{\mathbf{W}}'$. Finally, reshape the weights into a matrix of size $C_o \times 2 \times K_a \times 1$ to obtain the final input weights $\mathbf{W}$.
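A sketch of this initialization under the reconstruction above; note that exactly orthonormal rows are only possible when $2 C_o \leq K_a$, so the SVD step should be read as projecting onto the nearest orthogonal factor.

```python
import torch

def random_orthogonal_kernels(co: int, ka: int) -> torch.Tensor:
    """Random input weights, SVD-orthogonalized as in ELM-LRF [48].

    Stacks 2*co Gaussian kernels of length ka row-wise (Equation (27)),
    replaces the matrix by its orthogonal factor U @ Vh, and reshapes
    to the (Co, 2, Ka, 1) layout used by the convolution layer.
    """
    w = torch.randn(2 * co, ka)                        # standard Gaussian kernels
    u, s, vh = torch.linalg.svd(w, full_matrices=False)
    w = u @ vh                                         # orthonormal rows if 2*co <= ka
    return w.reshape(co, 2, ka, 1)
```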
The pseudocode for training Bagging-ECELMs is summarized in Algorithm 1, where the entropy-based combination strategy (Equation (15)) is utilized. The testing process of the Bagging-ECELMs model is very simple; see Algorithm 2 for details.
Algorithm 1: Training CELMs based on bagging.
Algorithm 2: Testing CELMs.
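The algorithm listings do not survive in this version of the text. As a hedged sketch of what Algorithms 1 and 2 describe, the following reuses the helpers defined above (random_orthogonal_kernels, celm_features, solve_output_weights, polynomial_phase_error, combine_by_metric); the validation criterion shown (coefficient mean squared error) is an assumption, since the paper only states that the validation set selects the regularization factor.

```python
import torch

def train_bagging_ecelms(imgs, coeffs, val_imgs, val_coeffs, M, N, co, ka_list, lambdas):
    """Algorithm 1 (sketch): train M CELMs on bootstrap samples.

    imgs : (Ntr, 2, H, W) two-channel training patches; coeffs : (Ntr, K) targets.
    ka_list : azimuth kernel size per CELM; lambdas : candidate regularization factors.
    """
    models = []
    for m in range(M):
        idx = torch.randint(imgs.shape[0], (N,))        # bootstrap sample for CELM m
        w = random_orthogonal_kernels(co, ka_list[m])   # fixed random input weights
        H, T = celm_features(imgs[idx], w), coeffs[idx]
        Hv = celm_features(val_imgs, w)
        beta = min((solve_output_weights(H, T, lam) for lam in lambdas),
                   key=lambda b: ((Hv @ b - val_coeffs) ** 2).mean().item())
        models.append((w, beta))
    return models

def ecelms_autofocus(G, models):
    """Algorithm 2 (sketch): predict with every CELM, compensate, and keep
    the minimum-entropy candidate (Equation (15))."""
    x = torch.stack((G.real, G.imag)).unsqueeze(0)      # complex image -> 2-channel batch
    phis = [polynomial_phase_error((celm_features(x, w) @ beta).squeeze(0), G.shape[0])
            for w, beta in models]
    return combine_by_metric(G, phis)
```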
4. Experimental Results
This section presents the results obtained with the proposed autofocus method. Firstly, the datasets used are described in detail. Secondly, the implementation details, together with the obtained results, are presented and discussed. All experiments were run in PyTorch 1.8.1 on a workstation equipped with an Intel E5-2696 2.3 GHz CPU, 64 GB of RAM, and an NVIDIA 1080 Ti GPU. Our code is available at
4.1. Dataset Description
The data used in this work were acquired by the Advanced Land Observing Satellite (ALOS) in fine mode. The ALOS satellite was developed by the Earth Observation Research Center of the Japan Aerospace Exploration Agency; it entered service in 2006 and was retired in 2011. ALOS is equipped with a Phased Array L-band Synthetic Aperture Radar (PALSAR).
PALSAR has three working modes: fine mode, scanning mode, and polarization mode. Specific parameters of PALSAR in fine mode are shown in Table 2, where PRF represents the Pulse Repetition Frequency, i.e., the sampling rate in azimuth. As shown in Table 2, fine mode has two resolution settings: high-resolution (HR) and low-resolution (LR). At high resolution, the azimuth resolution is about 5 m, the slant-range resolution is up to 5 m, and the ground resolution is about 7 m.
Nine groups of SAR raw data were used in the experiment, covering the areas of Vancouver, Xi'an, Hawarden, Hefei, Langley, Florida, Kaliganj, Simi Valley, and Toledo. More detailed information, including the scene name, acquisition date, effective velocity ($V_r$), and Pulse Repetition Frequency (PRF), is given in Table 3. All the raw data can be acquired from
The range-Doppler algorithm was utilized to process the raw data. Since each original image is very large, we selected a subregion from each image. The imaging results of the nine sub-images, processed by the range-Doppler algorithm, are shown in Figure 3. The selected areas include sea surface, urban areas, rural areas, mountains, and other terrains with varying texture complexity, which is important for verifying the performance of the autofocus algorithms.
We generated azimuth phase errors by simulating an estimation error of the equivalent velocity. (The phase errors could, of course, also be generated by directly drawing polynomial coefficients.) The velocity estimation error was set within the interval [−25, 25] m/s with a sampling interval of 2 m/s, and the range-Doppler algorithm was used for imaging. Thus, for every SAR raw data matrix, 25 defocused complex-valued SAR images were generated. The images corresponding to sequence numbers 2, 3, 4, 5, and 8 in Table 3 were used to construct the training dataset; the images corresponding to sequence numbers 6 and 7 were used to construct the validation dataset; and the images corresponding to sequence numbers 1 and 9 were used to construct the testing dataset. Image patches of size 256 × 256 were selected from these images to create the dataset. We randomly selected 20,000 image patches from the defocused training images for training, 8000 validation patches from the defocused validation images, and 8000 testing patches from the defocused testing images.
The entropies of the above unfocused training, validation, and testing images were 9.9876, 10.2911, and 10.0474, respectively. The contrast levels in the above unfocused training, validation, and testing images were 3.3820, 1.9860, and 3.4078, respectively.
4.2. Performance of the Proposed Method
In this experiment, the degree Q of the polynomial (Equation (14)) was fixed, so each CELM had K = Q − 1 output neurons, and the entropy-based combination strategy was used. To analyze the influence of the number of CELMs on focusing performance, M was chosen from {1, 2, 4, 8, 16, 32, 64}. All CELMs had the same modules, as illustrated in Figure 2. The number of convolution kernels $C_o$ was fixed for all CELMs, and the regularization factor $\lambda$ was chosen from a set of candidate values on the validation data. For each CELM, 3000 samples were randomly chosen from the above training dataset. The batch size was set to 10, and the NVIDIA 1080 Ti GPU was utilized for training and testing.
Firstly, we analyzed the influence of the convolution kernel size (CKS) on the performance of the proposed model. In this experiment, the number of CELMs was set to 1, and several azimuth kernel sizes $K_a$ were evaluated. After training, the entropy and contrast metrics were computed on the training, validation, and testing datasets. The results are illustrated in Figure 4. As can be seen from Figure 4a,b, the performance was best when $K_a = 17$; the corresponding entropy and contrast on the testing dataset were 9.9931 and 3.7952, respectively.
Secondly, the influence of the number of CELMs with the same CKS on focusing performance was analyzed. In this experiment, the number of CELMs was chosen from {1, 2, 4, 8, 16, 32, 64}, and the azimuth CKS of all CELMs was set to 3 and to 17, respectively. The training time of the model on the 1080 Ti GPU (see Algorithm 1 for training details) is displayed in Table 4 and Table 5. After training, we evaluated the trained model on the testing dataset; the entropy, contrast, and testing time are also shown in Table 4 and Table 5. It can be seen that the greater the number of CELMs, the better the focusing quality, but the focusing time increases. Furthermore, regardless of the number of CELMs, the performance of Bagging-ECELMs with CKS 17 is much better than that with CKS 3.
Thirdly, the influence of the number of CELMs with different CKSs on focusing performance was analyzed. Supposing there are M CELMs, the azimuth CKS of the m-th CELM is set as

$K_a^{(m)} = \frac{64 (M - m + 1)}{M} - 1, \quad m = 1, 2, \dots, M$ (28)

Equation (28) generates well-spread kernel sizes. Here are a few examples: if M = 2, the azimuth CKSs are 63 and 31; if M = 4, they are 63, 47, 31, and 15; if M = 8, they are 63, 55, 47, 39, 31, 23, 15, and 7.
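Equation (28), as reconstructed here, is easy to check against the quoted examples:

```python
def azimuth_kernel_sizes(M: int) -> list:
    """Equation (28): azimuth CKS schedule for M CELMs (largest size 63)."""
    return [64 * (M - m + 1) // M - 1 for m in range(1, M + 1)]

assert azimuth_kernel_sizes(2) == [63, 31]
assert azimuth_kernel_sizes(4) == [63, 47, 31, 15]
assert azimuth_kernel_sizes(8) == [63, 55, 47, 39, 31, 23, 15, 7]
```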
After training all the CELMs, our proposed model was evaluated on the above training, validation, and testing datasets. The results are illustrated in Figure 5 and Table 6.
In Figure 5, a CELM count of 0 corresponds to no autofocus. Recall that smaller entropy and greater contrast indicate better focusing quality. We can conclude that the greater the number of individual learners (CELMs), the higher the focusing quality, and that the autofocus time of the proposed model is approximately linear in the number of CELMs. However, when the number of CELMs is already large, adding further individual learners has little effect on the focus quality.
The detailed numerical results are given in Table 6. The entropy, contrast, and testing time (Algorithm 2) were evaluated on the testing dataset; the training time was evaluated on the training and validation datasets (see Algorithm 1 for details). As can be seen from Table 6, the training time of the proposed model is directly proportional to the number of individual learners. Comparing the results in Table 4, Table 5, Table 6, and Figure 4, it can be seen that the convolution kernel size has a great influence on the performance of the model. When the optimal kernel size is unknown, using different kernel sizes can yield solutions closer to the optimum.
Finally, to verify the effectiveness of the proposed combination strategy, the classical average combination strategy, which averages the outputs of the M CELMs, was tested. In this experiment, different CKSs were used, computed by Equation (28). The performance with different numbers of CELMs on the testing dataset is shown in Table 7; the training time, evaluated on the training and validation datasets, is also provided. From Table 6 and Table 7, we can conclude that our proposed entropy-based combination strategy obtains a higher focus quality. The reason the averaging method does not work well is that the phase errors predicted by different CELMs may cancel each other out.
4.3. Comparison with Existing Autofocus Algorithms
In this experiment, we compared the proposed method with the existing autofocus methods PGA-ML, PGA-LUMV [16], and MEA [51]. The training, validation, and testing datasets described in Section 4.1 were used. In the original PGA algorithm, the window size is set manually; if it is not set properly, the algorithm will not converge. However, it is difficult to manually set the window size for the above 8000 test images, so we implemented an adaptive method to determine it. Denote $\mathbf{G}_s$ as the complex-valued image data in which the dominant scatterers have been center-shifted. First, the azimuth energy profile is computed by summing over range

$s(m) = \sum_{n=1}^{N_r} \left| G_s(m, n) \right|^2, \quad m = 1, \dots, N_a$ (29)

The threshold value $T$, which determines the window size, is then taken relative to the peak of this profile

$T = \alpha \max_{m} s(m)$ (30)

where $N_a$ and $N_r$ are the number of pixels in azimuth and range and $\alpha$ is a fixed cutoff ratio. Denote $m_1$ and $m_2$ as the first and last positions that satisfy $s(m_1) \geq T$ and $s(m_2) \geq T$, respectively. Thus, the window size is computed by $W = m_2 - m_1 + 1$. The maximum numbers of iterations of PGA-ML, PGA-LUMV, and MEA were set to 20, 20, and 400, respectively. The tolerance errors of PGA-ML, PGA-LUMV, and MEA were all set to 1 × 10. The learning rates of MEA were set to 1, 10, and 100, respectively. The number of CELMs was 64, and the convolution kernel sizes of the CELMs were computed by Equation (28). The LeakyReLU nonlinear activation function was utilized in all CELMs. See Section 4.2 for the detailed experimental settings.
The results of the different autofocus algorithms on the testing dataset are shown in Table 8, where MEA-1, MEA-10, and MEA-100 represent the MEA algorithm with learning rates of 1, 10, and 100, respectively. As noted above, an image with lower entropy and higher contrast has better focus quality. As shown in Table 8, our proposed method and MEA achieve better focus quality than the PGA-based methods.
In order to intuitively show the focusing performance of the different methods, three scenes with different texture complexities and defocusing levels were selected. Figure 6 shows the autofocus results of PGA-LUMV, MEA, and the proposed algorithm. It can be seen that the proposed algorithm and the MEA algorithm are suitable for different scenes. However, the phase-gradient-based methods depend on strong scattering points, so PGA-LUMV fails for the scene without strong scattering points, as shown in Figure 6j.
The phase error curves of the three scenes, estimated by the above three methods, are shown in Figure 7, Figure 8, and Figure 9, respectively. It can be seen from Figure 7 and Figure 9 that the 1st and 3rd images have large phase errors and are seriously defocused, whereas the 2nd image has small phase errors. We can see that the phase errors estimated by our proposed method are the closest to the results of MEA.
In the experiment, we also evaluated the focusing speed of the above four algorithms on the testing dataset, using the NVIDIA 1080 Ti GPU and the Intel E5-2696 CPU. The results are shown in Table 9 and Table 10, respectively. It should be noted that the PGA-based algorithms ran more slowly on the GPU than on the CPU, because the center-shifting of dominant scatterers cannot be effectively parallelized.
It is well-known that PGA has fast convergence and a sufficient performance for low-frequency errors, but is not suitable for estimating high-frequency phase error [41]. Meanwhile, MEA requires more iterations and more time to converge, but can obtain a more accurate phase error estimation. From the results in Table 8, Table 9 and Table 10, we can conclude that our proposed algorithm has a good trade-off between focusing speed and quality.
5. Discussion
SAR autofocus is a key technique for obtaining high-resolution SAR images. The minimum-entropy-based algorithm usually has a high focusing quality but suffers from a slow focusing speed. The phase-gradient-based method has a fast focusing speed but performs poorly (or even fails) in a scene where no dominant scatterer exists. Our proposed machine-learning, ensemble-based autofocus algorithm (Bagging-ECELMs) achieves a good trade-off between focusing quality and speed; the experimental results presented in Section 4.3 support these conclusions. In Section 4.2, the performance of the proposed method was thoroughly analyzed. Firstly, we found that the convolution kernel size has a great influence on the performance of the model. Traversing all convolution kernel sizes is often inefficient and sometimes impossible; utilizing different kernel sizes can obtain a performance close to the optimal solution (see Table 4, Table 5, and Table 6). Secondly, our proposed metric-based combination strategy is much more effective than the classical average-based combination strategy; the phase errors predicted by different CELMs may have different signs, which leads to phase error cancellation under averaging. Last but not least, the proposed Bagging-ECELMs method performs much better than a single CELM.
However, the proposed Bagging-ECELMs method has three disadvantages. Firstly, the model can only be utilized for phase errors that can be modeled as a polynomial. Secondly, a large number of samples is needed for training. Finally, the focusing quality is slightly worse than that of the minimum-entropy-based method. Bagging-ECELMs can replace PGA when it is used to correct polynomial-type phase errors. When a higher image focusing quality is required and the type of phase error is unknown, the MEA method should be used; the prediction results of Bagging-ECELMs can also be used as the initial values of MEA to accelerate its convergence. In summary, Bagging-ECELMs is more suitable for real-time autofocus applications, while MEA is better suited to high-quality autofocus applications. Unlike MEA and PGA, Bagging-ECELMs requires no manual parameter tuning at the testing phase and is easier to use.
In future research, our work will focus on three aspects: extending the proposed algorithm to correct sinusoidal phase errors, developing boosting- or divide-and-conquer-based ECELMs, and further improving the trade-off between focusing quality and speed by refining the combination strategy and network structure.
6. Conclusions
In this paper, we propose a machine-learning-based SAR autofocus algorithm. A Convolutional Extreme Learning Machine (CELM) is constructed to predict the polynomial coefficients of the azimuth phase error. In order to improve on the prediction accuracy of a single CELM, a bagging-based ensemble learning method is applied. Experiments conducted on real SAR data show that this ensemble scheme can effectively improve the accuracy of phase error estimation, and that the proposed algorithm achieves a good trade-off between focus quality and focus speed. Future works will focus on sinusoidal phase error correction, novel combination strategies, and ECELMs based on boosting or divide-and-conquer. Faster and more accurate SAR autofocus algorithms based on deep learning will also be studied.
Author Contributions
Conceptualization, Z.L. and S.Y.; methodology, Z.L.; software, Z.L.; validation, Q.G., Z.F. and M.W.; formal analysis, Z.L.; investigation, Z.L.; resources, S.Y.; data curation, M.W.; writing—original draft preparation, Z.L.; writing—review and editing, S.Y.; visualization, Z.L.; supervision, S.Y.; project administration, S.Y.; funding acquisition, S.Y. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the National Natural Science Foundation of China under Grant 61771376, Grant 61771380, Grant 61836009, Grant U1701267, Grant 61906145, Grant U1730109, Grant 61703328, Grant 91438201 and Grant 9183830; the Major Research Plan in Shaanxi Province of China under Grant 2017ZDXMGY-103 and Grant 2017ZDCXL-GY-03-02; the Science and Technology Innovation Team in Shaanxi Province of China under Grant 2020TD-017; the Science Basis Research Program in Shaanxi Province of China under Grant 2016JK1823, Grant 2017JM6086 and Grant 2019JQ-663.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Public ALOS SAR data are acquired from
Acknowledgments
The authors wish to acknowledge the anonymous reviewers for providing helpful suggestions that greatly improved the manuscript.
Conflicts of Interest
The authors declare no conflict of interest.
Abbreviations
The following abbreviations are used in this manuscript:
ANN | Artificial Neural Network |
APE | Azimuth Phase Error |
CELM | Convolutional Extreme Learning Machine |
CKS | Convolution Kernel Size |
ELM | Extreme Learning Machine |
LUMV | Linear Unbiased Minimum Variance |
MDA | Map Drift Autofocus |
MEA | Minimum Entropy Autofocus |
ML | Maximum Likelihood |
PEs | Phase Errors |
PGA | Phase Gradient Autofocus |
SAR | Synthetic Aperture Radar |
PRF | Pulse Repetition Frequency |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Figures and Tables
Figure 1. The framework of our proposed ensemble-learning-based autofocus algorithm.
Figure 2. The structure of a single convolutional, extreme-learning machine for autofocus. The CKS in azimuth is set to 63; the convolution stride is 1.
Figure 3. The SAR images utilized to construct the dataset. Each image was formed by the range-Doppler algorithm with the accurate equivalent velocity. The images are down-sampled to 512×512 for display.
Figure 5. The focusing performance versus the number of CELMs. The entropy, contrast and time metrics evaluated on the training, validation and testing datasets are illustrated. The kernel size of each CELM is different.
Figure 6. The focus results of different autofocus algorithms. Three scenes with different defocusing level are illustrated.
Figure 7. The azimuth phase error curves of the 1st scene estimated by different algorithms.
Figure 8. The azimuth phase error curves of the 2nd scene estimated by different algorithms.
Figure 9. The azimuth phase error curves of the 3rd scene estimated by different algorithms.
Table 1. Configuration of a single convolutional extreme learning machine.

Layer Number | Layer Type | Output Size
---|---|---
1 | Conv+IN+LeakyReLU | 
2 | Range GAP (256) | 
3 | Flatten | 
4 | FC | 
Table 2. Platform parameters of ALOS PALSAR in fine mode.

Parameter | Notation | Value | Unit
---|---|---|---
Platform height | H | not fixed, e.g., 691,500 | m
Platform velocity | V | not fixed, e.g., 7172 | m/s
Antenna length (range) |  | 2.9 | m
Antenna length (azimuth) |  | 8.9 | m
Wavelength |  | 236.057 | mm
Carrier frequency |  | 1.27 | GHz
Pulse width |  | 27.0 | μs
Chirp rate (range) |  | −1037.0370 (HR), −518.5186 (LR) | GHz/s
Bandwidth (range) |  | 28 (HR), 14 (LR) | MHz
Sampling rate (range) |  | 32 (HR), 16 (LR) | MHz
Number of samples (range) |  | 10,344 (HR), 5616 (LR) | -
Chirp rate (azimuth) |  | 2122.96 | Hz/s
Pulse Repetition Frequency | PRF | <2700, not fixed | Hz
Number of samples (azimuth) |  | not fixed | -
Resolution |  | about (HR), (LR) | m
Swath width |  | about 40–70 | km
Incident angle |  | 8–60 | degree
Squint angle |  | 0 | degree
Data rate |  | 240 | Mbps
Bit width |  | 5 | bit
Table 3. Detailed information of the acquired SAR data.

No. | Area | Scene Name | Acquisition Date | $V_r$ (m/s) | PRF (Hz)
---|---|---|---|---|---
1 | Vancouver | ALPSRP020160970 | 11 June 2006 | 7153 | 1912.0459 |
2 | Xi’an | ALPSRP054200670 | 30 January 2007 | 7185 | 2159.8272 |
3 | Hawarden | ALPSRP103336310 | 2 January 2008 | 7211 | 2105.2632 |
4 | Hefei | ALPSRP110940620 | 23 February 2008 | 7188 | 2145.9227 |
5 | Langley | ALPSRP115120970 | 23 March 2008 | 7174 | 2155.1724 |
6 | Florida | ALPSRP268560540 | 8 February 2011 | 7190 | 2159.8272 |
7 | Kaliganj | ALPSRP269950430 | 17 February 2011 | 7195 | 2159.8272 |
8 | SimiValley | ALPSRP273680670 | 15 March 2011 | 7185 | 2155.1724 |
9 | Toledo | ALPSRP278552780 | 17 April 2011 | 7178 | 2141.3276 |
Table 4. The influence of the number of CELMs with CKS 3 on focusing performance.

M | 0 | 1 | 2 | 4 | 8 | 16 | 32 | 64
---|---|---|---|---|---|---|---|---
Entropy | 10.0474 | 10.0435 | 10.0071 | 9.9739 | 9.9490 | 9.9238 | 9.9069 | 9.8965 |
Contrast | 3.4078 | 3.4333 | 3.7135 | 3.9798 | 4.2039 | 4.4202 | 4.5721 | 4.6723 |
Training (s) | - | 82.01 | 166.95 | 329.76 | 673.71 | 1325.90 | 2681.57 | 5293.01 |
Testing (s) | - | 6.26 | 10.38 | 18.94 | 35.96 | 70.00 | 136.13 | 271.78 |
Table 5. The influence of the number of CELMs with CKS 17 on focusing performance.

M | 0 | 1 | 2 | 4 | 8 | 16 | 32 | 64
---|---|---|---|---|---|---|---|---
Entropy | 10.0474 | 9.9931 | 9.9564 | 9.9231 | 9.8981 | 9.8792 | 9.8693 | 9.8628 |
Contrast | 3.4078 | 3.7952 | 4.0938 | 4.3873 | 4.6313 | 4.8170 | 4.9197 | 4.9800 |
Training (s) | - | 57.51 | 152.41 | 289.57 | 534.42 | 1291.96 | 2301.05 | 5151.04 |
Testing (s) | - | 6.12 | 10.05 | 18.49 | 35.51 | 69.29 | 134.85 | 268.55 |
Table 6. The influence of the number of CELMs with different CKSs on focusing performance.

M | 0 | 1 | 2 | 4 | 8 | 16 | 32 | 64
---|---|---|---|---|---|---|---|---
Entropy | 10.0474 | 10.0387 | 9.9706 | 9.9319 | 9.9023 | 9.8808 | 9.8711 | 9.8623 |
Contrast | 3.4078 | 3.4639 | 3.9824 | 4.3190 | 4.6011 | 4.8025 | 4.9085 | 4.9880 |
Training (s) | - | 80.57 | 141.94 | 303.39 | 503.34 | 1324.85 | 2605.70 | 4982.68 |
Testing (s) | - | 5.94 | 9.99 | 18.25 | 34.98 | 67.83 | 130.98 | 262.96 |
Table 7. The performance of Bagging-ECELMs with the average combination strategy.

M | 0 | 1 | 2 | 4 | 8 | 16 | 32 | 64
---|---|---|---|---|---|---|---|---
Entropy | 10.0474 | 10.0387 | 10.0065 | 9.9950 | 9.9943 | 9.9850 | 9.9868 | 9.9852 |
Contrast | 3.4078 | 3.4639 | 3.6926 | 3.7810 | 3.7851 | 3.8486 | 3.8554 | 3.8537 |
Training (s) | - | 81.68 | 149.13 | 299.52 | 608.76 | 1363.78 | 2376.71 | 4208.93 |
Testing (s) | - | 5.19 | 7.98 | 14.57 | 27.31 | 52.25 | 101.18 | 199.57 |
Table 8. The results of different autofocus algorithms on the testing dataset.

Metric | PGA-ML | PGA-LUMV | MEA-1 | MEA-10 | MEA-100 | Bagging-ECELMs
---|---|---|---|---|---|---
Entropy | 9.8913 | 9.8879 | 9.8564 | 9.8510 | 9.8565 | 9.8623 |
Contrast | 4.7447 | 4.7726 | 5.0416 | 5.0944 | 5.0416 | 4.9880 |
Table 9. The focusing speed (unit: s) of different autofocus algorithms on the GPU.

Batchsize | PGA-ML | PGA-LUMV | MEA-10 | Bagging-ECELMs
---|---|---|---|---
1 | 3682.72 | 3751.82 | 15,545.53 | 675.25 |
10 | 3426.21 | 3460.97 | 1600.66 | 262.26 |
20 | 3263.72 | 3419.52 | 768.21 | 239.09 |
40 | 3214.08 | 3282.10 | 572.62 | 239.09 |
Table 10. The focusing speed (unit: s) of different autofocus algorithms on the CPU.

Batchsize | PGA-ML | PGA-LUMV | MEA-10 | Bagging-ECELMs
---|---|---|---|---
1 | 2353.78 | 2372.80 | 36,376.94 | 3637.47 |
10 | 1672.91 | 1749.18 | 7566.39 | 2856.29 |
20 | 1653.33 | 1740.23 | 7634.37 | 2987.71 |
40 | 1647.03 | 1734.39 | 7815.35 | 2966.18 |
© 2021 by the authors.
Abstract
Inaccurate Synthetic Aperture Radar (SAR) navigation information will lead to unknown phase errors in SAR data. Uncompensated phase errors can blur the SAR images. Autofocus is a technique that can automatically estimate phase errors from data. However, existing autofocus algorithms either have poor focusing quality or a slow focusing speed. In this paper, an ensemble learning-based autofocus method is proposed. Convolutional Extreme Learning Machine (CELM) is constructed and utilized to estimate the phase error. However, the performance of a single CELM is poor. To overcome this, a novel, metric-based combination strategy is proposed, combining multiple CELMs to further improve the estimation accuracy. The proposed model is trained with the classical bagging-based ensemble learning method. The training and testing process is non-iterative and fast. Experimental results conducted on real SAR data show that the proposed method has a good trade-off between focusing quality and speed.
1 School of Artificial Intelligence, Xidian University, Xi’an 710071, China;
2 School of Electronic Engineering, Xidian University, Xi’an 710071, China;