Age estimation from facial images based on Gabor



INTRODUCTION

With the rapid development of artificial intelligence, people increasingly wish to obtain facial attribute information rather than merely recognise a face in an image. A person's age or age group, for example, can be estimated by extracting facial features. Automatic age estimation is valuable in many real-life applications, such as human–computer interaction, demographic information collection, and marketing intelligence [1]. However, age estimation can be extremely difficult even for humans, let alone computers, which makes automatic age estimation a considerable challenge.

Ageing manifests itself in various ways, such as changes in facial skin, soft tissue, and skeletal structure [2]. This diversity of age-related changes makes it difficult to extract facial age features comprehensively. Depending on how facial features are extracted, facial age estimation methods fall into two categories: those based on traditional hand-crafted features and those based on deep learning. In Kwon's research, face images were divided into three age groups, namely infants, young people, and old people, using an anthropometric model. By calculating six distance ratios on frontal face images, images of children and adults could be distinguished [3]. However, this method derived age features from the positions of and distances between facial landmarks, which required the key points to be located very accurately; this led to poor classification performance and made the method difficult to implement. Cootes et al. proposed active appearance models (AAM), building on the statistical analysis of the active shape model [4]. AAM modelled the shape and texture of the human face by applying principal component analysis (PCA) to the training images. Although AAM could model 2-D shape and texture simultaneously, it did not efficiently extract features from the non-contour regions of the face. As a result, AAM did not achieve high classification accuracy and required a large amount of computation in practice. Besides these methods, many texture descriptors have been applied to age estimation, such as local binary patterns (LBP), histograms of oriented gradients (HOG), biologically inspired features (BIF), binarised statistical image features (BSIF), and local phase quantisation (LPQ) [5–9]. These descriptors are widely used in computer vision owing to their real-time performance, straightforward implementation, and low computational cost. For example, Eidinger et al. extracted facial LBP features and used a one-vs-one support vector machine (SVM), achieving a classification accuracy of 45.1% on the Adience age dataset [10]. Considering that a single feature extraction algorithm cannot capture age features comprehensively, Choi et al. used LBP and the Gabor wavelet transform to obtain facial skin and wrinkle features, respectively. SVM and support vector regression (SVR) were used to estimate age, and the mean absolute error (MAE) was 4.66 on the FG-Net dataset [11]. However, this method did not fuse the features extracted by the Gabor wavelet transform, leaving strong correlations among the features that affected the subsequent age regression. Bekhouche et al. combined pyramid multi-level (PML) representations with LBP, LPQ, and BSIF to obtain age features and performed age regression with SVR; the combination of PML and LPQ achieved the best MAE of 5.30 [12]. However, LPQ could not extract multi-scale texture features and represented local texture poorly.

Recently, with the development of deep learning, researchers have paid more attention to extracting age features with convolutional neural networks (CNNs). Levi and Hassner were the first to use a CNN with a small number of layers for feature extraction and age classification, achieving an accuracy of 50.7% on the unconstrained Adience dataset [13]. However, because the CNN had few layers, this method could obtain only shallow age features. To improve the diversity of the extracted age features, Gurpinar et al. used the deeper VGG-Face network for age feature extraction and a kernel-based extreme learning machine for age classification, obtaining an MAE of 3.85 on the LAP-2016 dataset. The VGG-Face model contains 37 layers, the last of which is a softmax layer with 2622 outputs, and its large number of parameters makes training very time-consuming [14]. Liu et al. designed a network model, GA-DFL, which contained three parallel CNN models using convolution kernels of different sizes to extract various age features; its MAE on the MORPH dataset was 3.25 [15]. From the above methods, we can conclude that the accuracy of age estimation depends, to a certain degree, on the diversity of the extracted age features. However, increasing the width or depth of a model enlarges the network architecture and sharply increases the number of parameters. To address this problem, some researchers have combined traditional feature extraction methods with CNN models. Hosseini et al. used the responses of Gabor filters as the input of a CNN model, which then learnt the weights of the Gabor filter responses through backpropagation [16]; this method achieved an accuracy of 61.3% on the Adience dataset. Compared with traditional methods, deep learning models perform better at age estimation. However, such models can be difficult to train because they require a large number of training samples, careful hyperparameter tuning, and complex network design, which also makes them difficult to deploy on mobile devices.

In summary, although there are similarities between deep learning methods and traditional feature extraction methods, the physical meaning of the features extracted by a CNN can differ dramatically from that of the original features. In many problems, features that are difficult to interpret are unacceptable. The current state-of-the-art deep-learning approach to age estimation, proposed in Ref. [17], uses a deep expectation (DEX) system and requires a complex pretraining process involving multiple datasets. Gabor filters are favoured in feature extraction for useful properties such as invariance to illumination, rotation, scale, and translation [18]. Therefore, our research focusses on traditional feature extraction and uses the Gabor wavelet transform to obtain multi-scale facial texture features.

In traditional feature extraction methods, features are usually high-dimensional while sample sizes are small, so the performance of machine learning reaches a bottleneck. As the feature dimension increases, redundancy between features becomes a severe problem that can decrease classification accuracy or cause overfitting. It is therefore necessary to reduce the feature dimension by selecting representative features or feature combinations. In Choi's research, PCA was used to reduce the dimension of the components extracted by the Gabor filter and LBP [11]. However, PCA can only remove correlations between features when those correlations are linear; for non-linearly correlated features it does not reduce the dimension well, and it may also discard essential information in the data. To address this problem, Kernel-PCA (KPCA) takes the non-linear relationships between samples into account. The key to KPCA lies in the choice of kernel function; unfortunately, there is no unified guideline for selecting it, so the kernel has to be chosen by trial and error. Moreover, PCA and KPCA both map features from the original space into a new low-dimensional space, and the transformed features have no physical meaning [19, 20]. In addition, most feature selection (FS) algorithms proposed in recent years have been evaluated on small datasets with low feature dimensions. For instance, Mayfly-Harmony Search (MA-HS) was applied to 18 UCI datasets with at most 60 dimensions [21], and the FS method based on ensemble learning proposed in Ref. [22] was applied to network anomaly detection on UNSW-NB15, which has 49 features. Such methods often do not work well on high-dimensional feature datasets.

Atom search optimisation (ASO) is a recent intelligent optimisation algorithm based on molecular dynamics [23]. Compared with traditional FS methods, ASO shows advantages in both accuracy and the number of selected features. Nonetheless, ASO is not robust to its initial values, converges slowly in the later stage, and has difficulty escaping local optima, all of which significantly affect its accuracy and speed. Li et al. proposed improved atom search optimisation (IASO) [24]. IASO corrects the acceleration of each atom using its historical best position, balances local and global search with two adaptive coefficients, and introduces a Gaussian mutation strategy to help atoms escape premature convergence. These changes improve the convergence speed and the ability to escape local optima in the later stage of the algorithm. Still, IASO converges slowly and gives unstable solutions in high-dimensional spaces. To address these problems of ASO and IASO, we propose chaotic improved atom search optimisation with simulated annealing (CIASO-SA). The algorithm initialises the atoms with chaotic maps, which improves robustness to the initial values and the accuracy of the solution. In addition, simulated annealing (SA) is used to strengthen the atoms' ability to escape premature convergence and to improve convergence in high-dimensional feature spaces. CIASO-SA is used as a stochastic optimisation algorithm for feature selection to improve age classification accuracy, allowing the discriminative power of the Gabor fusion features to be exploited more fully.

To the best of our knowledge, this is the first time that a heuristic optimisation algorithm has been used in the field of age estimation. In short, the contributions of this paper can be summarised as follows:

The multi-scale and multi-direction Gabor wavelet transform is used to extract age characteristics;
A feature fusion method based on histogram and maximum index encoding is proposed;
To address the high dimensionality of the age feature space, an improved version of IASO, CIASO-SA, is proposed, which integrates chaotic mapping and SA;
Three FS algorithms are evaluated on the Adience dataset, using test images at three resolutions.

The overall framework of the proposed age estimation algorithm is shown in Figure 1. It consists of four stages: face cropping and preprocessing, feature extraction, feature selection, and age estimation. The remainder of the paper is organised as follows. Section 2 introduces feature extraction and fusion based on the Gabor wavelet transform. Section 3 describes CIASO-SA in detail and adapts it to feature selection. Section 4 describes the experimental setup, verifies the effectiveness of the proposed algorithm on the Adience dataset, and discusses the experimental results. Finally, Section 5 concludes the paper.

[Figure 1 omitted: overall framework of the proposed age estimation algorithm. See PDF.]

FEATURE EXTRACTION

Preprocessing of face images

The Adience dataset [10], shown in Figure 2, includes 26,580 images of 2284 different subjects. The dataset divides age labels into eight groups: 0–2, 4–6, 8–13, 15–20, 25–32, 38–43, 48–53, and 60–100. Its images contain variations in lighting, head pose, and position, which makes it a classic unconstrained facial dataset for age estimation. Table 1 shows the number of images in each age group.

[Figure 2 omitted: example images from the Adience dataset. See PDF.]

TABLE 1 The Adience dataset benchmark

Age group	0–2	4–6	8–13	15–20	25–32	38–43	48–53	60–100	Total
Number of images	1427	2162	2294	1653	4897	2350	825	869	19,487

Preprocessing detects the face in each image and then denoises and pose-normalises the detected face to produce a cropped face image. In this paper, the Viola–Jones face detector was used to extract the face region; it uses a cascade of AdaBoost classifiers, which improves the speed and accuracy of face detection [25]. To reduce the effect of different shooting angles, multi-scale face detection based on Haar features was applied to obtain the rectangular face region. We then corrected the face pose by locating the two pupils. Images in which no face could be detected, owing to low resolution or severe occlusion, were discarded.
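The text does not detail how the pupil positions are used for pose correction. A common approach, assumed here, is an in-plane rotation that makes the line joining the pupils horizontal. The sketch below computes that angle and rotates points about the midpoint between the eyes; the pupil coordinates are assumed to come from a separate detector, and `eye_alignment_angle` and `rotate_point` are hypothetical helper names, not part of the authors' pipeline.

```python
import math


def eye_alignment_angle(left_pupil, right_pupil):
    """Angle in degrees between the inter-pupil line and the horizontal axis."""
    dx = right_pupil[0] - left_pupil[0]
    dy = right_pupil[1] - left_pupil[1]
    return math.degrees(math.atan2(dy, dx))


def rotate_point(point, center, angle_deg):
    """Rotate `point` about `center` by -angle_deg so the eye line becomes horizontal.

    Assumption: pose correction is modelled as an in-plane rotation only.
    """
    a = math.radians(-angle_deg)
    x, y = point[0] - center[0], point[1] - center[1]
    return (center[0] + x * math.cos(a) - y * math.sin(a),
            center[1] + x * math.sin(a) + y * math.cos(a))


# Usage: level the eyes of a face whose pupils were found at these (x, y) positions.
left, right = (30, 40), (70, 60)
centre = ((left[0] + right[0]) / 2, (left[1] + right[1]) / 2)
angle = eye_alignment_angle(left, right)
left_aligned = rotate_point(left, centre, angle)
right_aligned = rotate_point(right, centre, angle)
```

In a full pipeline, the same rotation would be applied to every pixel of the detected face region before cropping.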

Gabor feature extraction

For human faces, age-related features appear at different scales. For instance, the facial wrinkles of older people can be more pronounced and need to be analysed at a large scale, whereas the facial wrinkles of infants are not apparent and need to be analysed at a small scale.

A Gabor filter is a Gaussian kernel modulated by a sinusoidal plane wave, and its frequency and orientation representations are similar to those of the human visual system. Different textures in facial images generally have different centre frequencies and bandwidths, so a set of Gabor filters can be designed according to these frequencies and bandwidths to filter the image. Each Gabor filter passes only the texture corresponding to its own frequency and suppresses the energy of other textures, so texture features of different frequencies and bandwidths are output by different filters. That is, facial texture features at different orientations and scales are analysed and extracted for the subsequent classification task. Because of these advantages, this paper adopts Gabor features for facial age estimation. The 2-D Gabor kernel function is defined as

$$g(x, y; λ, θ, ψ, σ, γ) = \exp\left(-\frac{(x\cos θ + y\sin θ)^{2} + γ^{2}(-x\sin θ + y\cos θ)^{2}}{2σ^{2}}\right) \exp\left(i\left(2π\frac{x\cos θ + y\sin θ}{λ} + ψ\right)\right) \tag{1}$$

where $λ$ is the wavelength of the sinusoidal factor, $θ$ is the orientation of the parallel stripes of the kernel, $ψ$ is the phase offset, $σ$ is the standard deviation of the Gaussian envelope, and $γ$ is the spatial aspect ratio.
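Equation (1) can be sampled directly on a square grid centred at the origin. The following pure-Python sketch builds the complex kernel; the odd grid size and the parameter values used to check it are illustrative assumptions, and the sketch is not the authors' implementation.

```python
import cmath
import math


def gabor_kernel(size, lam, theta, psi, sigma, gamma):
    """Complex 2-D Gabor kernel of Eq. (1) sampled on a size x size grid (size assumed odd)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            x_r = x * math.cos(theta) + y * math.sin(theta)    # rotated coordinate x'
            y_r = -x * math.sin(theta) + y * math.cos(theta)   # rotated coordinate y'
            envelope = math.exp(-(x_r ** 2 + gamma ** 2 * y_r ** 2) / (2 * sigma ** 2))
            carrier = cmath.exp(1j * (2 * math.pi * x_r / lam + psi))
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel
```

Taking `.real`, `.imag`, or `abs()` of each entry gives the real part, imaginary part, and modulus that are later fused separately.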

Gabor filters with different parameters can capture local structural information at different spatial frequencies, spatial positions, and orientations in the image. These properties make the filters insensitive to illumination and face pose, which benefits texture representation and discrimination. Thus, our research uses a bank of Gabor filters with five scales and eight orientations, whose scale and orientation parameters are set as in Equations (2) and (3).

$$λ = \sqrt{2}\, n, \quad n \in \{2, …, 6\} \tag{2}$$

$$θ = m \cdot \frac{2π}{8}, \quad m \in \{0, …, 7\} \tag{3}$$
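The parameter grid of Equations (2) and (3) can be enumerated as follows. This sketch only lists the 40 (λ, θ) pairs; the text does not specify σ, γ, or ψ, so any values chosen for them when building the actual kernels are additional assumptions. `gabor_bank_parameters` is a hypothetical helper name.

```python
import math


def gabor_bank_parameters():
    """(wavelength, orientation) pairs for 5 scales x 8 orientations, Eqs. (2) and (3)."""
    wavelengths = [math.sqrt(2) * n for n in range(2, 7)]    # n = 2, ..., 6
    orientations = [m * 2 * math.pi / 8 for m in range(8)]   # m = 0, ..., 7
    return [(lam, theta) for lam in wavelengths for theta in orientations]
```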

Through the Gabor transform, each face image is transformed into 40 feature images of different scales and orientations. The resulting feature dimension is 40 times that of the original image, which introduces redundancy into the feature data. Therefore, in our research, the real part, the imaginary part, and the modulus of the Gabor features were each fused across the eight orientations at each of the five scales. As shown in Equation (4), for each pixel, the index of the orientation with the maximum response is taken as the new feature code, which yields five fused feature maps. Figure 3 illustrates the feature extraction and fusion of the real component of the Gabor features.

$$F_{n}(x, y) = \underset{m \in \{0, …, 7\}}{\arg\max}\, ‖G_{m, n}(x, y)‖ \tag{4}$$

where $G_{m, n}(x, y)$ is the Gabor response at orientation $m$ and scale $n$.
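The maximum-index encoding of Equation (4) can be sketched for one scale as below. It assumes that the eight orientation responses of that scale (real part, imaginary part, or modulus) are already available as equally sized 2-D lists and that ties are broken in favour of the lowest orientation index; `max_index_fusion` is a hypothetical name.

```python
def max_index_fusion(responses):
    """Fuse one scale's orientation responses into a map of argmax orientation indices.

    responses: list of 8 equally sized 2-D lists, one per orientation m.
    Returns a 2-D list holding, at each pixel, the index m with the largest |response|.
    """
    rows, cols = len(responses[0]), len(responses[0][0])
    fused = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            mags = [abs(r[y][x]) for r in responses]
            # max() returns the first maximal index, so ties go to the lowest m.
            fused[y][x] = max(range(len(mags)), key=lambda m: mags[m])
    return fused
```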

[Figure 3 omitted: feature extraction and fusion of the real component of the Gabor features. See PDF.]

Because the Gabor features alone do not fully represent the whole face image, we use histograms to capture spatial information from each of the five fused feature maps. First, each feature map is divided into 8 × 8 grids; then the histogram of the fused feature values in each grid cell is calculated.
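The phrase "8 × 8 grids" admits more than one reading. One reading, assumed here, is 8 × 8-pixel cells with one histogram bin per orientation index (eight bins). Under that reading, the per-component feature length for an R × R image is 5 × (R/8)² × 8, which gives 160, 640, and 1440 for the three resolutions and matches Table 2. The sketch below implements that reading; `grid_histogram` and `feature_dimension` are hypothetical names.

```python
def grid_histogram(fused_map, cell=8, bins=8):
    """Concatenate per-cell histograms of orientation indices (values 0..bins-1).

    Assumption: cells are `cell` x `cell` pixels; leftover border pixels are ignored.
    """
    rows, cols = len(fused_map), len(fused_map[0])
    feature = []
    for top in range(0, rows - rows % cell, cell):
        for left in range(0, cols - cols % cell, cell):
            hist = [0] * bins
            for y in range(top, top + cell):
                for x in range(left, left + cell):
                    hist[fused_map[y][x]] += 1
            feature.extend(hist)
    return feature


def feature_dimension(resolution, n_scales=5, cell=8, bins=8):
    """Length of the concatenated histogram vector for one Gabor component."""
    return n_scales * (resolution // cell) ** 2 * bins
```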

The feature information of an image changes with image resolution. Thus, before Gabor features are extracted, bilinear interpolation is used to build a three-level image pyramid of the facial region. The corresponding Gabor feature dimensions are shown in Table 2.
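Bilinear interpolation for the three-level pyramid can be sketched in pure Python as follows. The pyramid sizes are taken from Table 2. The align-corners coordinate mapping is an assumption, since other mappings (for example, pixel-centre alignment) give slightly different values, and `bilinear_resize` and `build_pyramid` are hypothetical names.

```python
def bilinear_resize(image, out_h, out_w):
    """Resize a 2-D list of grey levels by bilinear interpolation (align-corners mapping)."""
    in_h, in_w = len(image), len(image[0])
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            top = (1 - wx) * image[y0][x0] + wx * image[y0][x1]
            bottom = (1 - wx) * image[y1][x0] + wx * image[y1][x1]
            out[i][j] = (1 - wy) * top + wy * bottom
    return out


def build_pyramid(face, sizes=(16, 32, 48)):
    """Three-level pyramid of the face region at the resolutions listed in Table 2."""
    return {s: bilinear_resize(face, s, s) for s in sizes}
```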

TABLE 2 Gabor feature dimensions under different resolutions

Resolution	Gabor feature dimension
16 × 16	160
32 × 32	640
48 × 48	1440

FEATURE SELECTION

Principal component analysis (PCA), Kernel-PCA (KPCA), and atom search optimisation (ASO) are used to reduce the dimension of the Gabor fusion features obtained in Section 2, removing irrelevant features, redundant features, and outliers from the original features. PCA and KPCA are commonly used for facial feature dimension reduction. To tackle high feature dimensions and heavy computation, Xiao and Yin proposed PCA-SIFT, a feature extraction algorithm that introduces PCA-based dimension reduction [26] and greatly improved face recognition. Compared with PCA and KPCA, the recent intelligent optimisation algorithm ASO shows advantages in both accuracy and the number of selected features.

We compare ASO feature selection with PCA and KPCA. An SVM is used as the classifier, and its best-performing hyperparameters are determined by grid search. The dataset is divided into five parts, and the results are averaged over five-fold cross-validation. Table 3 shows that, with the same number of features, ASO achieves higher accuracy and 1-off accuracy than PCA and KPCA.
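A generic five-fold protocol can be sketched as below. This version shuffles sample indices at random and averages a caller-supplied score; `evaluate` is a hypothetical callback standing in for training and testing the grid-searched SVM. Note that the Adience benchmark also provides predefined subject-exclusive folds, which avoid placing images of the same person in both the training and test sets.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and split them into k disjoint folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[f::k] for f in range(k)]


def cross_validate(n_samples, evaluate, k=5, seed=0):
    """Average evaluate(train_idx, test_idx) over the k folds.

    `evaluate` is a hypothetical callback, e.g. one that fits a grid-searched SVM on
    train_idx and returns its accuracy on test_idx.
    """
    folds = k_fold_indices(n_samples, k, seed)
    scores = []
    for f, test_idx in enumerate(folds):
        train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
        scores.append(evaluate(train_idx, test_idx))
    return sum(scores) / k
```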

TABLE 3 Comparison of ASO, PCA, and KPCA

Algorithm	Number of features	Accuracy (%)	1-off accuracy (%)
ASO	684	52.84	79.35
PCA	684	49.72	77.12
KPCA	684	49.43	77.64

However, ASO is not robust to its initial values, converges slowly, and has a weak ability to escape local optima. The IASO algorithm proposed by Li et al. improved the convergence speed and the ability of ASO to escape local optima [24]. However, IASO is still not robust to its initial values, and its solution quality in high-dimensional spaces is poor. In this paper, we propose the CIASO-SA algorithm by improving IASO. CIASO-SA initialises atoms by chaotic mapping, which improves the robustness of the algorithm to its initial values. In addition, SA is used to greatly improve the algorithm's ability to escape local optima in the later phase. Next, we introduce the basic idea of IASO and some of its core formulas.

Basic principle of IASO

The motion of atoms follows classical mechanics. Let $F_{i}$ be the interaction force exerted on the $i$th atom by the other atoms, $G_{i}$ the binding force, and $m_{i}$ the mass of the atom. $F_{i}$ is derived from the Lennard-Jones (L-J) potential, and the force exerted by atom $j$ on atom $i$ in dimension $d$ is given by Equation (5).

$$F_{ij}^{d}(t) = -η(t)\left[2\left(h_{ij}(t)\right)^{13} - \left(h_{ij}(t)\right)^{7}\right] \tag{5}$$

The depth function $η$ is defined in Equation (6), where $α$ is the depth weight.

$$η(t) = α\left(1 - \frac{t - 1}{T}\right)^{3} e^{-\frac{20t}{T}} \tag{6}$$

The definitions of $r_{ij}(t)$ and $σ(t)$ are shown in Equations (7) and (8), respectively.

$$r_{ij}(t) = \left\| x_{i}(t) - x_{j}(t) \right\|_{2} \tag{7}$$

$$σ(t) = \left\| x_{i}(t) - \frac{\sum_{j \in K_{best}(t)} x_{j}(t)}{K_{best}(t)} \right\|_{2} \tag{8}$$

In Equations (7) and (8), $x_{i}$ and $x_{j}$ represent the positions of atoms $i$ and $j$, respectively. $K_{best}$ denotes the subset of atoms with the best fitness values, and its size governs the global search ability of the atoms. As the number of iterations grows, atoms move closer to the atom with the best fitness, so the number of best atoms that strongly influence the others should decrease. The size of $K_{best}$ is defined in Equation (9), where $N$ is the number of atoms and $T$ is the maximum number of iterations of the algorithm.

$$K_{best}(t) = N - (N - 2) \times \sqrt{\frac{t}{T}} \tag{9}$$

A sine perturbation function $g(t) = 0.1 \times \sin\left(\frac{π}{2} \times \frac{t}{T}\right)$ is also added to $h_{\min}$. The definition of $h_{ij}$ is shown in Equation (10).

$$h_{ij}(t) = \begin{cases} h_{\min} + g(t), & \frac{r_{ij}(t)}{σ(t)} < h_{\min} \\ \frac{r_{ij}(t)}{σ(t)}, & h_{\min} \leq \frac{r_{ij}(t)}{σ(t)} \leq h_{\max} \\ h_{\max}, & \frac{r_{ij}(t)}{σ(t)} > h_{\max} \end{cases} \tag{10}$$
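The quantities in Equations (5)–(10) can be computed with small helper functions. This sketch computes η(t) with a positive sign, as in the original ASO formulation [23], so that the force in Equation (5) matches the first term of Equation (13). The values h_min = 1.1 and h_max = 1.24 and rounding K_best down to an integer are assumptions, because the text does not state them; α = 50 follows Table 5. All function names are hypothetical.

```python
import math


def depth(t, T, alpha=50.0):
    """Depth function eta(t), Eq. (6) (positive form, as in ASO)."""
    return alpha * (1 - (t - 1) / T) ** 3 * math.exp(-20 * t / T)


def k_best(t, T, N):
    """Number of best atoms attracting the others, Eq. (9); shrinks from N to 2."""
    return int(N - (N - 2) * math.sqrt(t / T))   # rounding down is an assumption


def distance(a, b):
    """Euclidean distance r_ij of Eq. (7)."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def sigma_length(x_i, best_positions):
    """Length scale sigma(t) of Eq. (8): distance from x_i to the centroid of K_best."""
    centroid = [sum(c) / len(best_positions) for c in zip(*best_positions)]
    return distance(x_i, centroid)


def scaled_distance(r, sigma, t, T, h_min=1.1, h_max=1.24):
    """h_ij of Eq. (10), with the sine perturbation g(t) in the lower branch."""
    h = r / sigma
    if h < h_min:
        return h_min + 0.1 * math.sin(math.pi / 2 * t / T)
    return min(h, h_max)


def lj_force(h, t, T, alpha=50.0):
    """Interaction force of Eq. (5): -eta(t) * (2 h^13 - h^7)."""
    return -depth(t, T, alpha) * (2 * h ** 13 - h ** 7)
```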

The covalent bond binding force $G_{i}^{d}$ describes the effect of geometric constraints on atomic motion. To emphasise the guidance of the best atom in the population, each atom is assumed to have a covalent bond with the best atom and is therefore bound by it. $G_{i}^{d}$ is calculated as in Equation (11).

$$G_{i}^{d} = β e^{-\frac{20t}{T}}\left(x_{best}^{d}(t) - x_{i}^{d}(t)\right) \tag{11}$$

where $β$ is a coefficient, $x_{best}^{d}(t)$ is the position of the best atom in the population at iteration $t$, and $x_{i}^{d}(t)$ is the current position of atom $i$ at iteration $t$.

The covalent bond force $P_{i}^{d}$, produced by each atom's historical best solution, combines population information with the atom's own experience to improve global search. $P_{i}^{d}$ is defined in Equation (12).

$$P_{i}^{d} = λ e^{-\frac{20t}{T}}\left(x_{p}^{d}(t) - x_{i}^{d}(t)\right) \tag{12}$$

where $λ$ is a coefficient and $x_{p}^{d}(t)$ is the historical best position of atom $i$ at iteration $t$.

Combining the interaction force with the two covalent bond forces, the acceleration of atom $i$ in dimension $d$ at iteration $t$ is

$$a_{i}^{d}(t) = \frac{F_{i}^{d} + G_{i}^{d} + P_{i}^{d}}{m_{i}(t)} = -α\left(1 - \frac{t - 1}{T}\right)^{3} e^{-\frac{20t}{T}} \sum_{j \in K_{best}} \frac{\mathrm{rand}_{j}\left[2\left(h_{ij}(t)\right)^{13} - \left(h_{ij}(t)\right)^{7}\right]}{m_{i}(t)} + β e^{-\frac{20t}{T}} \frac{x_{best}^{d}(t) - x_{i}^{d}(t)}{m_{i}(t)} + λ e^{-\frac{20t}{T}} \frac{x_{p}^{d}(t) - x_{i}^{d}(t)}{m_{i}(t)} \tag{13}$$

where the coefficients $β$ and $λ$ are updated adaptively as

$$β = β_{\max} + (β_{\min} - β_{\max}) \times \frac{T - t}{T - 1}, \qquad λ = λ_{\max} + (λ_{\max} - λ_{\min}) \times \frac{T - t}{T - 1} \tag{14}$$

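Here, the mass of atom $i$, $m_{i}(t)$, is calculated as

$$m_{i}(t) = \frac{e^{-\frac{f_{i}(t) - f_{\min}(t)}{f_{\max}(t) - f_{\min}(t)}}}{\sum_{j = 1}^{N} e^{-\frac{f_{j}(t) - f_{\min}(t)}{f_{\max}(t) - f_{\min}(t)}}} \tag{15}$$

where $f_{i}(t)$ is the fitness of atom $i$, and $f_{\min}(t)$ and $f_{\max}(t)$ are the minimum and maximum fitness values in the population at iteration $t$.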
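The normalised masses of Equation (15) can be computed as follows for a minimisation problem, in which a lower fitness value yields a heavier atom. The sketch sums over all atoms j in the denominator. It returns uniform masses when every atom has the same fitness; that guard against division by zero is an assumption. `atom_masses` is a hypothetical name.

```python
import math


def atom_masses(fitness):
    """Normalised masses of Eq. (15): lower (better) fitness gives a heavier atom."""
    f_min, f_max = min(fitness), max(fitness)
    span = f_max - f_min
    if span == 0:                       # all atoms equally fit: uniform masses (assumption)
        return [1.0 / len(fitness)] * len(fitness)
    raw = [math.exp(-(f - f_min) / span) for f in fitness]
    total = sum(raw)
    return [m / total for m in raw]
```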

In each iteration, the velocity and position of atom $i$ are updated as follows:

$$v_{i}^{d}(t + 1) = \mathrm{rand}_{i}^{d} \times v_{i}^{d}(t) + a_{i}^{d}(t), \qquad x_{i}^{d}(t + 1) = x_{i}^{d}(t) + v_{i}^{d}(t + 1) \tag{16}$$
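Equation (16) translates into a per-dimension update. In this sketch, rand is drawn uniformly from [0, 1) with Python's `random` module (an assumption). The acceleration vector is taken as given, for example from Equation (13), and no bound handling is applied. `update_atom` is a hypothetical name.

```python
import random


def update_atom(position, velocity, acceleration, rng=random):
    """One step of Eq. (16): v <- rand * v + a, then x <- x + v, per dimension."""
    new_velocity = [rng.random() * v + a for v, a in zip(velocity, acceleration)]
    new_position = [x + v for x, v in zip(position, new_velocity)]
    return new_position, new_velocity
```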

Chaotic improved atom search optimisation with simulated annealing

IASO initialises atom positions with a pseudo-random number generator, as shown in Equation (17). However, this may leave the atoms unevenly distributed and clustered far from the optimal solution. The problem worsens in high-dimensional spaces and increases the instability of the solution.

$$\mathrm{Atom\_position} = \mathrm{rand}(\mathrm{Atom\_num}, \mathrm{Dim}) \times (\mathrm{Bound}_{up} - \mathrm{Bound}_{down}) + \mathrm{Bound}_{down} \tag{17}$$

Chaos theory originates from the study of non-linear dynamical systems and examines the behaviour of dynamic systems that are sensitive to initial values. Exploiting the ergodicity and randomness of chaotic mapping, Ma and Yan obtained the phase–space distribution of the vibration signal of an on-load tap changer (OLTC) from its chaotic dynamic behaviour [27]. This enriched the information available for monitoring the mechanical fault state of the OLTC. The logistic map and the tent map are commonly used chaotic maps. Because of the simplicity and ergodicity of the logistic map, we use logistic chaotic mapping to initialise the positions and velocities of the atoms in IASO.

First, chaotic initialisation produces a $D$-dimensional vector $z_{1} = (z_{11}, z_{12}, …, z_{1D})$ with $z_{1j} \in (0,1)$. Logistic chaotic mapping then generates the subsequent vectors as $z_{i+1, j} = μ z_{ij}(1 - z_{ij})$, $i = 1, 2, …, N - 1$, $j = 1, 2, …, D$, where $μ$ is the control parameter of the logistic map. Equation (17) is then applied with these chaotic values to obtain the positions of the initialised atoms. Finally, the top $n$ atoms with the best fitness are selected from the generated atoms as the initial population.
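Logistic-map initialisation can be sketched as below. The text does not give μ, so μ = 4, the fully chaotic regime of the logistic map, is assumed, as is the seed range (0.01, 0.99). The chaotic values are assumed to take the place of `rand` in Equation (17). Only positions are generated here, although the text also initialises velocities chaotically. Function names are hypothetical.

```python
import random


def logistic_chaotic_init(n_atoms, dim, lower, upper, mu=4.0, seed=None):
    """Chaotic initial positions: z_{i+1} = mu * z_i * (1 - z_i), scaled as in Eq. (17)."""
    rng = random.Random(seed)
    z = [rng.uniform(0.01, 0.99) for _ in range(dim)]      # z_1 with entries in (0, 1)
    positions = []
    for _ in range(n_atoms):
        positions.append([lower + zj * (upper - lower) for zj in z])
        z = [mu * zj * (1 - zj) for zj in z]               # next chaotic vector
    return positions


def select_initial_population(candidates, fitness_fn, n):
    """Keep the n candidate atoms with the lowest (best) fitness."""
    return sorted(candidates, key=fitness_fn)[:n]
```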

IASO uses Gaussian mutation as its mutation mechanism. This mechanism has a strong local search ability but does not guide atoms out of local optima well, which leads to low solution accuracy. We therefore introduce simulated annealing, which produces new solutions through perturbations near the local optimum. It then uses the Metropolis criterion, which accepts a new state with a certain probability, to decide whether each new solution is accepted. The acceptance probability $p$ is given by Equation (18). In the early stage of the algorithm, if the fitness change rate $m$, calculated as in Equation (20), does not change for $n$ consecutive iterations, a new solution is generated by Gaussian mutation. In the later stage, the Cauchy mutation mechanism in Equation (19) is used to generate new solutions, where $η$ is the mutation step (set to 1) and $C(0,1)$ is a random number drawn from the Cauchy distribution with scale parameter 1.

$$p = \begin{cases} 1, & \mathrm{fitness}_{new} < \mathrm{fitness} \\ e^{\frac{\mathrm{max\_fitness} - \mathrm{fitness}_{new}}{\mathrm{temperature}}}, & \mathrm{fitness}_{new} \geq \mathrm{fitness} \end{cases} \tag{18}$$

$$X_{ij} = X_{ij} + η \times C(0,1) \tag{19}$$

$$m = \frac{\mathrm{fitness}_{best}^{i} - \mathrm{fitness}_{best}^{i-1}}{\mathrm{fitness}_{best}^{i-1}} \tag{20}$$
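The simulated-annealing step can be sketched with the Cauchy mutation of Equation (19) and a Metropolis acceptance test. The acceptance rule below uses the standard Metropolis form exp(-(fitness_new - fitness)/temperature) for a minimisation problem. This is an assumption, because Equation (18) is written in terms of max_fitness. The temperature schedule is not given in the text and is left to the caller.

```python
import math
import random


def cauchy_mutation(position, step=1.0, rng=random):
    """Eq. (19): add step * C(0, 1) to every coordinate.

    C(0, 1) is a standard Cauchy sample drawn by inverse-CDF sampling: tan(pi * (U - 0.5)).
    """
    return [x + step * math.tan(math.pi * (rng.random() - 0.5)) for x in position]


def metropolis_accept(fitness, fitness_new, temperature, rng=random):
    """Metropolis criterion for minimisation (standard form, assumed for Eq. (18))."""
    if fitness_new < fitness:
        return True                      # improvements are always accepted
    return rng.random() < math.exp(-(fitness_new - fitness) / temperature)
```

The heavy tails of the Cauchy distribution occasionally produce large jumps, which is why it suits the late-stage escape from local optima better than Gaussian noise of the same scale.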

The workflow of CIASO-SA is shown in Figure 4.

[Figure 4 omitted: workflow of CIASO-SA. See PDF.]

The steps of CIASO-SA are as follows:

Step 1: Set the initial parameters, including the number of atoms, $α$, $β$, $m$, and the maximum number of iterations.

Step 2: Chaotically initialise the positions and velocities of the atoms, and calculate the fitness of each atom with the fitness function.

Step 3: Determine whether the algorithm has fallen into a local optimum. If $m$ does not change for $n$ consecutive iterations, update the positions of the atoms by Gaussian mutation, and update the best fitness of all atoms.

Step 4: Determine whether the number of iterations exceeds the specified number $k$. If so, generate a new solution for each atom by perturbation and calculate its fitness, ${fitness}_{new}$; otherwise, skip to Step 6.

Step 5: If ${fitness}_{new} < {fitness}$, set the position of the atom to the new solution, $x_{i} = x_{i}^{new}$. Otherwise, calculate the acceptance probability $p$ following Equation (18); if $p > \mathrm{rand}(0, 1)$, accept the new solution, otherwise reject it.

Step 6: Determine whether the algorithm has reached the maximum number of iterations. If so, terminate and output the position of the atom with the best fitness. Otherwise, update the parameters $β$ and $λ$ by Equation (14) and the velocities and positions of the atoms by Equation (16); then return to Step 2, calculate the fitness of each atom, and start the next iteration.
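Step 3 can be read as a stagnation test on the change rate m of Equation (20). The sketch below treats the algorithm as stagnant when the last n values of m are equal within a small tolerance. That interpretation, the tolerance, and the zero-division guard are assumptions, and the function names are hypothetical.

```python
def fitness_change_rate(best_prev, best_curr):
    """Eq. (20): relative change of the best fitness between consecutive iterations."""
    if best_prev == 0:
        return 0.0          # assumption: avoid division by zero when the previous best is 0
    return (best_curr - best_prev) / best_prev


def is_stagnant(best_history, n, tol=1e-12):
    """True if the change rate m has stayed the same over the last n iterations (Step 3)."""
    if len(best_history) < n + 1:
        return False
    rates = [fitness_change_rate(a, b)
             for a, b in zip(best_history[:-1], best_history[1:])]
    recent = rates[-n:]
    return max(recent) - min(recent) <= tol
```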

Feature selection of CIASO-SA

Feature selection aims to remove redundant or irrelevant features from the dataset and to improve learning speed. It can be regarded as a discrete optimisation problem. Therefore, this paper converts continuous CIASO-SA into binary CIASO-SA and uses it to solve the feature selection problem.

IASO updates the positions and velocities of atoms through their accelerations in the continuous real-number domain. Feature selection, however, is a binary problem: when a feature subset is selected, each feature of the original feature vector is either included in the subset or discarded.

At initialisation, the position of each atom is a vector of dimension $N$, where $N$ is the dimension of the original features. In a discrete binary search space, updating a position means switching its elements between 0 and 1. Hence, the position update in Equation (16) must be adjusted, and a transfer function is needed to map velocity to a probability. According to Ref. [28], the output of the transfer function must lie in the range 0–1 because it is interpreted as a probability. Moreover, the mapped probability should increase with the atom's velocity; that is, the transfer function must be monotonically increasing. In this paper, we use the logistic function shown in Equation (21).

$$p_{i}^{d}(t) = \frac{1}{1 + e^{-v_{i}^{d}(t)}} \tag{21}$$

By applying a probability threshold, we binarise this probability and use the result to update the position of the atom, as shown in Equations (22) and (23), where $\oplus$ denotes bitwise XOR.

$$\mathrm{Binary\_coding}_{i}^{d}(t) = \begin{cases} 0, & p_{i}^{d}(t) \leq ϕ_{threshold} \\ 1, & p_{i}^{d}(t) > ϕ_{threshold} \end{cases} \tag{22}$$

$$x_{i}^{d}(t + 1) = x_{i}^{d}(t) \oplus \mathrm{Binary\_coding}_{i}^{d}(t) \tag{23}$$
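Equations (21)–(23) can be combined into a single binary position update. The threshold φ_threshold is not stated in the text, so 0.5 is assumed here. The sigmoid is written in a numerically stable form to avoid overflow for large negative velocities. Function names are hypothetical.

```python
import math


def transfer(v):
    """Logistic transfer function of Eq. (21), computed stably for large |v|."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    z = math.exp(v)
    return z / (1.0 + z)


def binary_update(position_bits, velocity, threshold=0.5):
    """Eqs. (22)-(23): threshold the transfer probability, then XOR with the current bits."""
    flips = [1 if transfer(v) > threshold else 0 for v in velocity]
    return [x ^ b for x, b in zip(position_bits, flips)]
```

Under Equation (23), a bit is flipped whenever its probability exceeds the threshold. A large positive velocity therefore toggles a feature's inclusion rather than forcing it to be selected.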

In age classification, test accuracy and the number of selected features can be used to evaluate the whole algorithm. An ideal feature subset should not only eliminate irrelevant, weakly relevant, and redundant features but also retain the most useful information. We therefore define the optimisation goal, or fitness function, as

$$\mathrm{fitness} = ω \times R_{error} + (1 - ω) \times R_{select} \tag{24}$$

where $R_{error}$ is the age classification error rate, $R_{select}$ is the ratio of the number of selected features to the total number of features, and $ω \in [0,1]$ is the weight of the error rate.
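Equation (24) can be evaluated for a binary feature mask as follows. ω = 0.9 follows Table 5. `error_of_subset` is a hypothetical callback that would train and test a classifier on the selected feature indices. Assigning infinite fitness to an empty subset is an assumption.

```python
def subset_fitness(error_rate, n_selected, n_total, omega=0.9):
    """Eq. (24): weighted sum of classification error rate and selected-feature ratio."""
    return omega * error_rate + (1 - omega) * n_selected / n_total


def mask_fitness(mask, error_of_subset, omega=0.9):
    """Fitness of a 0/1 feature mask; `error_of_subset` maps feature indices to an error rate."""
    selected = [i for i, bit in enumerate(mask) if bit]
    if not selected:
        return float("inf")          # assumption: an empty subset is invalid
    return subset_fitness(error_of_subset(selected), len(selected), len(mask), omega)
```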

The optimal solution is defined in Equation (25).

$$x_{best} = \underset{x}{\arg\min}\, (\mathrm{fitness}) \tag{25}$$

Figure 5 shows the flowchart of the complete wrapper feature selection method based on CIASO-SA.

[Figure 5 omitted: flowchart of the wrapper feature selection method based on CIASO-SA. See PDF.]

EXPERIMENTS AND EVALUATION

Experimental setup and parameter tuning

To obtain the best performance from the proposed algorithm and to compare it fairly with other methods, this section tunes the experimental settings and the algorithm parameters. First, the effect of the population size and the maximum number of iterations was studied. Table 4 shows the results for different population sizes and maximum numbers of iterations.

TABLE 4 Effect of population size and maximum number of iterations

Population size	Max iterations	Best fitness	Training time (h)
10	30	0.517	0.7
20	30	0.509	1.48
30	30	0.467	2.81
50	30	0.476	6.06
10	50	0.517	1.38
20	50	0.504	2.53
30	50	0.467	3.86
50	50	0.464	13.57

As Table 4 shows, for a maximum of either 30 or 50 iterations, the best fitness does not necessarily decrease as the number of atoms increases, whereas the training time grows. Furthermore, beyond 30 iterations the best fitness changes little. Considering both fitness and training time, the proposed algorithm performs best with 30 atoms and a maximum of 30 iterations. For a fair comparison with other methods, the population size and the maximum number of iterations are both set to 30 in the subsequent experiments.

The parameters $α$ and $β$ in Equation (13) were set following Ref. [23]. The settings of all parameters of the algorithm are shown in Table 5. The maximum number of iterations was 30, and simulated annealing was introduced after iteration 15. The dataset was divided into five parts, and the results were averaged over five-fold cross-validation.

TABLE 5 Parameter settings

Parameters	Number of atoms	$α$	$β$	$ω$	$k$
Values	30	50	0.2	0.9	15

Assessment of the proposed algorithm

Two performance indicators, accuracy and 1-off accuracy, are used to evaluate the age classification results. Accuracy is the proportion of correctly classified samples among all samples, as shown in Equation (26), where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively. 1-off accuracy is the proportion of samples whose predicted age group is either the correct group or one immediately adjacent to it.

$$accuracy = \frac{TP + TN}{TP + TN + FP + FN} \tag{26}$$
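For the eight Adience age groups, both indicators can be computed directly from group indices ordered from youngest to oldest, so that "adjacent" means an index difference of one. The sketch below is a minimal illustration with toy labels. The exact-match count is the multi-class counterpart of Equation (26).

```python
from typing import Sequence

def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted age-group index equals the true index."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def one_off_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples predicted in the true group or an adjacent one
    (indices 0..7 follow the age groups 0-2, 4-6, ..., 60-100)."""
    return sum(abs(t - p) <= 1 for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy labels: two exact hits, two adjacent misses, one miss by two groups.
y_true = [0, 1, 2, 4, 7]
y_pred = [0, 2, 2, 6, 6]
print(accuracy(y_true, y_pred), one_off_accuracy(y_true, y_pred))  # 0.4 0.8
```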

For the face Gabor features at the three resolutions described in Section 2, fusion was performed separately on the real part, the imaginary part, and the modulus. Figure 6 plots fitness against the number of iterations for ASO, IASO, and CIASO-SA. At all three scales, the fitness curve of ASO converged slowly in the early iterations.
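The fusion step can be sketched in pure Python. The paper uses five scales and eight orientations and keeps the real part, imaginary part, or modulus of each response. At each scale, every pixel is labelled with the index of the orientation whose response is largest, and those labels are then counted in histograms. Several details are assumptions not given in this section: the kernel form and its parameters (5 × 5 support, `gamma = 0.5`, `sigma = 0.56 * wavelength`), the five wavelengths, and the pooling of histograms over 8 × 8 tiles. With 8 × 8 tiles, a 16 × 16 map gives 5 scales × 4 tiles × 8 bins = 160 values, and 32 × 32 and 48 × 48 maps give 640 and 1440. These totals equal the full feature lengths implied by Tables 6–8 (selected length plus reduction), but the tiling remains our assumption.

```python
import cmath
import math
from typing import Callable, Dict, List, Sequence

Image = List[List[float]]

def gabor_kernel(wavelength: float, theta: float, size: int = 5,
                 gamma: float = 0.5) -> List[List[complex]]:
    """Complex Gabor kernel: Gaussian envelope times a complex carrier along theta.
    sigma = 0.56 * wavelength (about one octave of bandwidth) is an assumed default."""
    sigma = 0.56 * wavelength
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            row.append(envelope * cmath.exp(2j * math.pi * xr / wavelength))
        kernel.append(row)
    return kernel

def filter_image(img: Image, kernel: List[List[complex]]) -> List[List[complex]]:
    """Same-size 2-D correlation with zero padding."""
    h, w, half = len(img), len(img[0]), len(kernel) // 2
    out = [[0j] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0j
            for di in range(-half, half + 1):
                for dj in range(-half, half + 1):
                    if 0 <= i + di < h and 0 <= j + dj < w:
                        acc += img[i + di][j + dj] * kernel[di + half][dj + half]
            out[i][j] = acc
    return out

# The three Gabor components that are fused separately: real, imaginary, modulus.
COMPONENTS: Dict[str, Callable[[complex], float]] = {
    "real": lambda z: z.real, "imag": lambda z: z.imag, "modulus": abs}

def orientation_index_histograms(img: Image, wavelengths: Sequence[float],
                                 component: str = "real", n_orient: int = 8,
                                 block: int = 8) -> List[int]:
    """Per scale: label each pixel with the orientation index whose response (of the
    chosen component) is largest, then count the labels in each block x block tile."""
    pick = COMPONENTS[component]
    h, w = len(img), len(img[0])
    features: List[int] = []
    for lam in wavelengths:
        responses = [filter_image(img, gabor_kernel(lam, k * math.pi / n_orient))
                     for k in range(n_orient)]
        labels = [[max(range(n_orient), key=lambda k: pick(responses[k][i][j]))
                   for j in range(w)] for i in range(h)]
        for bi in range(0, h, block):
            for bj in range(0, w, block):
                hist = [0] * n_orient
                for i in range(bi, min(bi + block, h)):
                    for j in range(bj, min(bj + block, w)):
                        hist[labels[i][j]] += 1
                features.extend(hist)
    return features

# Toy 16 x 16 image and five hypothetical wavelengths (one per scale).
img = [[float((3 * i + 5 * j) % 11) for j in range(16)] for i in range(16)]
feats = orientation_index_histograms(img, wavelengths=[2, 3, 4, 6, 8])
print(len(feats))  # 160
```

Pure Python keeps the example self-contained. In practice, a vectorised library filter would replace `filter_image`.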

[IMAGE OMITTED. SEE PDF]

After 15 iterations, the fitness curve of ASO converged prematurely. IASO adds a covalent bond force to ASO, so atom positions are updated using both the individual best and the global best solution. IASO also introduces an adaptive updating strategy for the parameters that balance global and local search. As a result, IASO continued to perform well in the middle and late stages of the iterations. Comparing the fitness curves at the three resolutions shows that IASO converges well on small images, suggesting that it is better suited to low-dimensional feature spaces. Because CIASO-SA initialises atoms with a chaotic mechanism, it is robust to the choice of initial values. IASO converges faster than ASO and CIASO-SA in the early stage. In the later stage, however, simulated annealing makes CIASO-SA better at escaping local optima, and in most cases it reaches better fitness values than IASO. Overall, CIASO-SA outperforms ASO and IASO on the feature maps at every scale.
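The two mechanisms that distinguish CIASO-SA from IASO can be sketched as follows. This section does not specify the chaotic map, the continuous-to-binary transfer function, or the annealing schedule, so the choices below are assumptions:

- a logistic map with μ = 4 for initialisation, scaled into an assumed search range of [−4, 4];
- an S-shaped (sigmoid) transfer function of the kind compared by Mirjalili and Lewis in the reference list;
- the standard Metropolis acceptance rule for minimisation.

```python
import math
import random
from typing import List

def chaotic_population(n_atoms: int, dim: int, lower: float = -4.0, upper: float = 4.0,
                       x0: float = 0.7, mu: float = 4.0) -> List[List[float]]:
    """Chaotic initialisation (assumed logistic map): iterate x <- mu * x * (1 - x)
    and scale each chaotic value from [0, 1] into [lower, upper]."""
    x, population = x0, []
    for _ in range(n_atoms):
        row = []
        for _ in range(dim):
            x = mu * x * (1.0 - x)
            row.append(lower + (upper - lower) * x)
        population.append(row)
    return population

def to_feature_mask(position: List[float], rng: random.Random) -> List[int]:
    """S-shaped (sigmoid) transfer: keep feature d with probability 1 / (1 + exp(-x_d))."""
    return [1 if rng.random() < 1.0 / (1.0 + math.exp(-x)) else 0 for x in position]

def metropolis_accept(current_fitness: float, candidate_fitness: float,
                      temperature: float, rng: random.Random) -> bool:
    """Simulated-annealing acceptance for minimisation: always accept improvements and
    accept a worse candidate with probability exp(-delta / T)."""
    delta = candidate_fitness - current_fitness
    if delta <= 0:
        return True
    return rng.random() < math.exp(-delta / temperature)

rng = random.Random(42)
atoms = chaotic_population(n_atoms=30, dim=10)
masks = [to_feature_mask(a, rng) for a in atoms]
```

In a full run, `metropolis_accept` would apply from iteration 15 onwards, as reported above. The temperature would be lowered each iteration, for example geometrically as T ← cT with 0 < c < 1. The cooling factor is not given in this section.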

Figure 7 shows bar charts, with error bars, of the classification accuracy. On the 16 × 16 feature maps, CIASO-SA performed best on the imaginary-part fusion map. On the 32 × 32 feature maps, the imaginary part and the modulus of the Gabor features contributed similarly to age classification. CIASO-SA achieved the highest accuracy at this resolution: 47.7 ± 1.1% for the imaginary part and 45.66 ± 2.6% for the modulus. On the 48 × 48 feature maps, the real part of the Gabor features contributed the most to age classification, with CIASO-SA reaching 60.4 ± 1.2%. Overall, classification accuracy tends to increase with the resolution of the feature maps.

[IMAGE OMITTED. SEE PDF]

Figure 8 compares the number of features selected by each algorithm at different scales. For the real-part feature maps at 48 × 48 resolution, CIASO-SA, ASO, and IASO selected 714, 721, and 730 features, respectively, so CIASO-SA selected the fewest. On the 48 × 48 imaginary-part Gabor feature maps, they selected 684, 745, and 715 features, respectively. Tables 6–8 report the results of ASO, IASO, and the proposed CIASO-SA, showing the effect of each feature selection algorithm on classification accuracy and on the length of the selected feature vector. These results show that CIASO-SA searches high-dimensional spaces for the optimal solution more effectively than ASO and IASO. On the smaller 16 × 16 and 32 × 32 feature maps, the numbers of features selected by CIASO-SA, ASO, and IASO differ little. Therefore, for the final facial age classification stage, we use the real part of the 48 × 48 Gabor features as the age feature vector and select features from it with CIASO-SA.

[IMAGE OMITTED. SEE PDF]

TABLE 6 Recognition accuracy with No feature selection (FS) versus atom search optimisation (ASO)

Resolution	Gabor component	Recognition accuracy using total features (in %)	Recognition accuracy using selected features (in %)	Length of selected features	Improvement in recognition accuracy (in %)	Reduction in feature length
16 × 16	Real	29.56	27	62	−2.56	98
16 × 16	Imaginary	25.91	30.2	62	4.29	98
16 × 16	Modulus	22.99	29.93	78	6.94	82
32 × 32	Real	43.07	43.21	345	0.14	295
32 × 32	Imaginary	45.99	44.25	317	−1.74	323
32 × 32	Modulus	38.32	40.32	269	2	371
48 × 48	Real	45.07	46.32	721	1.25	719
48 × 48	Imaginary	44.34	50.21	745	5.87	695
48 × 48	Modulus	43.34	48.99	724	5.65	716

TABLE 7 Recognition accuracy with No FS versus improved atom search optimisation (IASO)

Resolution	Gabor component	Recognition accuracy using total features (in %)	Recognition accuracy using selected features (in %)	Length of selected features	Improvement in recognition accuracy (in %)	Reduction in feature length
16 × 16	Real	29.56	28.71	84	−0.85	76
16 × 16	Imaginary	25.91	33.12	77	7.21	83
16 × 16	Modulus	22.99	28.93	75	5.94	85
32 × 32	Real	43.07	42.33	333	−0.74	307
32 × 32	Imaginary	45.99	48.7	317	2.71	323
32 × 32	Modulus	38.32	43.86	321	5.54	319
48 × 48	Real	45.07	56.32	730	11.25	710
48 × 48	Imaginary	44.34	53.99	715	9.65	725
48 × 48	Modulus	43.34	53.68	729	10.34	711

TABLE 8 Recognition accuracy with No FS versus chaotic improved atom search optimisation with simulated annealing (CIASO-SA)

Resolution	Gabor component	Recognition accuracy using total features (in %)	Recognition accuracy using selected features (in %)	Length of selected features	Improvement in recognition accuracy (in %)	Reduction in feature length
16 × 16	Real	29.56	35.24	80	5.68	80
16 × 16	Imaginary	25.91	35.69	82	9.78	78
16 × 16	Modulus	22.99	35.24	85	12.25	75
32 × 32	Real	43.07	45.26	310	2.19	330
32 × 32	Imaginary	45.99	49.64	335	3.65	305
32 × 32	Modulus	38.32	48.26	310	9.94	330
48 × 48	Real	45.07	60.4	714	15.33	726
48 × 48	Imaginary	44.34	56.32	684	11.98	756
48 × 48	Modulus	43.34	59.37	654	16.03	786
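The derived columns of Tables 6–8 follow from the other columns. The improvement is the accuracy with selected features minus the accuracy with all features. The reduction is the number of discarded features, so selected length plus reduction gives the full feature length. The sketch below re-derives both for Table 8, using values copied from the table. It finds no inconsistencies, and it yields full lengths of 160, 640, and 1440 for the 16 × 16, 32 × 32, and 48 × 48 maps. The same check can be run on Tables 6 and 7.

```python
from typing import Dict, List, Set, Tuple

# Table 8 rows: (resolution, component, acc_all, acc_selected, n_selected,
#                reported_improvement, reported_reduction)
TABLE8 = [
    ("16x16", "Real", 29.56, 35.24, 80, 5.68, 80),
    ("16x16", "Imaginary", 25.91, 35.69, 82, 9.78, 78),
    ("16x16", "Modulus", 22.99, 35.24, 85, 12.25, 75),
    ("32x32", "Real", 43.07, 45.26, 310, 2.19, 330),
    ("32x32", "Imaginary", 45.99, 49.64, 335, 3.65, 305),
    ("32x32", "Modulus", 38.32, 48.26, 310, 9.94, 330),
    ("48x48", "Real", 45.07, 60.40, 714, 15.33, 726),
    ("48x48", "Imaginary", 44.34, 56.32, 684, 11.98, 756),
    ("48x48", "Modulus", 43.34, 59.37, 654, 16.03, 786),
]

def check_table(rows) -> Tuple[List[Tuple[str, str]], Dict[str, Set[int]]]:
    """Return the rows whose reported improvement differs from acc_selected - acc_all,
    and the full feature length (n_selected + reduction) seen at each resolution."""
    mismatches: List[Tuple[str, str]] = []
    full_length: Dict[str, Set[int]] = {}
    for res, comp, acc_all, acc_sel, n_sel, improvement, reduction in rows:
        if abs((acc_sel - acc_all) - improvement) > 0.005:
            mismatches.append((res, comp))
        full_length.setdefault(res, set()).add(n_sel + reduction)
    return mismatches, full_length

mismatches, full_length = check_table(TABLE8)
print(mismatches, full_length)
```

The largest selected-feature accuracy in Table 8 belongs to the 48 × 48 real part (60.4%). At that resolution, CIASO-SA keeps 714 of 1440 features, about half.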

Eidinger et al. used LBP and FPLBP features as facial age features, with a linear one-vs-one SVM as the classifier [10]. Levi and Hassner constructed a five-layer CNN to extract facial features and augmented the Adience dataset with random crops [13]. Hosseini et al. fused the Gabor features of face images with the original images and fed them into a six-layer CNN, which produces class probabilities with a softmax layer [16]. As Table 9 shows, our method achieves an average accuracy of 60.4% and a 1-off accuracy of 85.9%, outperforming [10] and [13]. Table 10 gives the confusion matrix. Our algorithm performs best on the 0–2, 8–13, and 25–32 age groups, with accuracies of 70%, 63%, and 64%, respectively. Accuracy on the 4–6, 15–20, and 60–100 groups is rather low, and the 4–6 and 60–100 groups in particular are often misclassified into neighbouring age groups. Hosseini et al. achieved a slightly higher accuracy of 61.3%, but their six-layer CNN has a large number of parameters and takes longer to train (132.2 h). The state-of-the-art result in Table 9 was reported in [17] and is based on DEX with pretraining on IMDB-WIKI. That model was trained on Nvidia Titan X GPUs, and training on IMDB-WIKI alone took several days, before any fine-tuning on Adience. In contrast, the CIASO-SA iterations take only 3.1 h on a CPU. The proposed age estimation model is also only 15.7 MB, about 1/35 the size of DEX (552 MB). In conclusion, our facial age classification algorithm is simple in structure, easy to train, and less time-consuming, and it is robust to complex changes in lighting, pose, and background.

TABLE 9 Result comparison of methods on Adience benchmark

Reference	Method	Accuracy (in %)	1-off Accuracy (in %)	Model file size (in MB)
[10]	LBP + FPLBP + SVM (dropout 0.8)	45.1 ± 2.6	79.5 ± 1.4	2.5
[13]	Oversampling + CNN1 + Softmax	50.7 ± 5.1	84.7 ± 2.2	44
[16]	Gabor + CNN2	61.3 ± 0.0	N/A	18
[17]	DEX w/IMDB-WIKI pretrain	64.0 ± 4.2	96.6 ± 0.9	552
[17]	DEX w/o IMDB-WIKI pretrain	55.6 ± 6.1	89.7 ± 1.8	552
Our method	Gabor feature fusion + PCA + SVM	49.72 ± 2.5	81.3 ± 3.2	15.6
Our method	Gabor feature fusion + KPCA + SVM	49.43 ± 2.3	80.6 ± 2.7	15.6
Our method	Gabor feature fusion + CIASO-SA + SVM	60.4 ± 1.2	85.9 ± 2.1	15.7

TABLE 10 The confusion matrix of age classification

True \ Predicted	0–2	4–6	8–13	15–20	25–32	38–43	48–53	60–100
0–2	0.7	0.15	0.04	0.0	0.04	0.0	0.0	0.07
4–6	0.21	0.51	0.23	0.02	0.01	0.0	0.0	0.02
8–13	0.04	0.15	0.63	0.02	0.02	0.09	0.04	0.01
15–20	0.08	0.08	0.15	0.51	0.14	0.03	0.0	0.01
25–32	0.04	0.0	0.1	0.03	0.64	0.08	0.0	0.01
38–43	0.02	0.04	0.06	0.02	0.08	0.59	0.1	0.09
48–53	0.04	0.04	0.08	0.02	0.09	0.16	0.57	0.0
60–100	0.02	0.02	0.07	0.0	0.08	0.17	0.18	0.48
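The discussion of Table 10 above can be reproduced with a few lines of code. The sketch assumes that the rows of Table 10 are true age groups and the columns are predicted groups. It ranks groups by their diagonal entry. For each true group, it then finds the predicted group that absorbs the largest share of that group's errors.

```python
from typing import Dict, List

GROUPS = ["0-2", "4-6", "8-13", "15-20", "25-32", "38-43", "48-53", "60-100"]
# Table 10, assuming rows are true age groups and columns are predicted groups.
CM: List[List[float]] = [
    [0.70, 0.15, 0.04, 0.00, 0.04, 0.00, 0.00, 0.07],
    [0.21, 0.51, 0.23, 0.02, 0.01, 0.00, 0.00, 0.02],
    [0.04, 0.15, 0.63, 0.02, 0.02, 0.09, 0.04, 0.01],
    [0.08, 0.08, 0.15, 0.51, 0.14, 0.03, 0.00, 0.01],
    [0.04, 0.00, 0.10, 0.03, 0.64, 0.08, 0.00, 0.01],
    [0.02, 0.04, 0.06, 0.02, 0.08, 0.59, 0.10, 0.09],
    [0.04, 0.04, 0.08, 0.02, 0.09, 0.16, 0.57, 0.00],
    [0.02, 0.02, 0.07, 0.00, 0.08, 0.17, 0.18, 0.48],
]

def per_group_accuracy(cm: List[List[float]]) -> Dict[str, float]:
    """Diagonal entries: the fraction of each true group classified correctly."""
    return {GROUPS[i]: cm[i][i] for i in range(len(cm))}

def most_confused_with(cm: List[List[float]]) -> Dict[str, str]:
    """For each true group, the predicted group with the largest off-diagonal share."""
    result = {}
    for i, row in enumerate(cm):
        j = max((k for k in range(len(row)) if k != i), key=lambda k: row[k])
        result[GROUPS[i]] = GROUPS[j]
    return result

acc = per_group_accuracy(CM)
ranked = sorted(acc, key=acc.get, reverse=True)
print("best:", ranked[:3], "worst:", ranked[-3:])
print(most_confused_with(CM))
```

With these values, the three best groups are 0–2, 25–32, and 8–13, and the three weakest are 4–6, 15–20, and 60–100. Most groups lose the largest share of their errors to a neighbouring group. The main exception is 25–32, whose largest error share goes to 8–13.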

Analysis of the misclassified images, examples of which are shown in Figure 9, shows that they tend to have low resolution, large pose angles, or exaggerated facial expressions. These factors degrade the extraction and fusion of age features in later stages, leading to misclassification.

[IMAGE OMITTED. SEE PDF]

CONCLUSIONS

We propose a facial age classification method based on Gabor feature fusion and the CIASO-SA algorithm. The method first applies Gabor filters with five scales and eight orientations to extract facial age features. It then uses a histogram to encode and fuse, at each scale, the index of the orientation with the largest response, which improves the representational power of the extracted features. To address the slow convergence and weak high-dimensional search ability of ASO, we propose CIASO-SA, which incorporates chaos theory and the simulated annealing algorithm. Compared with ASO and IASO, CIASO-SA generally selects fewer features while achieving higher accuracy, and it is better suited to high-dimensional optimisation problems. We evaluated the method on face images at three different resolutions and found that the real part of the Gabor features performs best at 48 × 48 resolution, with an accuracy of 60.4% and a 1-off accuracy of 85.9%. The experiments also show that the algorithm does not require a complex training process, which shortens processing time. It therefore provides a useful reference for deployment on mobile terminals.

CONFLICT OF INTEREST STATEMENT

The authors declare no conflicts of interest.

DATA AVAILABILITY STATEMENT

Data used to support the findings of this study are available from the corresponding author upon request.

References

Garain, A., et al.: GRA_Net: a deep learning model for classification of age and gender from facial images. IEEE Access. 9, 85672–85689 (2021)

Rizwan, S.A., et al.: Robust active shape model via hierarchical feature extraction with SFS‐optimized convolution neural network for invariant human age classification. Electronics. 10(4), 465 (2021)

Kwon, Y.H., Lobo, N.V.: Age classification from facial images. Comput. Vis. Image Understand. 74(1), 1–21 (1999)

Cootes, T.F., Edwards, G.J., Taylor, C.J.: Active appearance models. IEEE Trans. Pattern Anal. Mach. Intell. 23(6), 681–685 (2001)

Ahonen, T., Hadid, A., Pietikainen, M.: Face description with local binary patterns: application to face recognition. IEEE Trans. Pattern Anal. Mach. Intell. 28(12), 2037–2041 (2006)

Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: Proc. IEEE Conf. Computer Vision and Pattern Recognition, San Diego, CA, USA, pp. 886–893 (2005)

Guo, G., et al.: Human age estimation using bio‐inspired features. In: Proc. IEEE Conf. Computer Vision and Pattern Recognition, Miami, FL, USA, pp. 112–119 (2009)

Kannala, J., Rahtu, E.: BSIF: binarized statistical image features. In: Proc. Int. Conf. on Pattern Recognition, Tsukuba, Japan, pp. 1363–1366 (2012)

Ahonen, T., et al.: Recognition of blurred faces using local phase quantization. In: Proc. Int. Conf. on Pattern Recognition, Tampa, FL, USA, pp. 1–4 (2008)

Eidinger, E., Enbar, R., Hassner, T.: Age and gender estimation of unfiltered faces. IEEE Trans. Inf. Forensics Secur. 9(12), 2170–2179 (2014)

Choi, S.E., et al.: Age estimation using a hierarchical classifier based on global and local facial features. Pattern Recogn. 44(6), 1262–1281 (2011)

Bekhouche, S.E., et al.: A comparative study of human facial age estimation: handcrafted features vs. deep features. Multimed. Tool. Appl. 79(35–36), 26605–26622 (2020)

Levi, G., Hassner, T.: Age and gender classification using convolutional neural networks. In: Proc. IEEE Conf. Computer Vision and Pattern Recognition Workshops, Boston, MA, USA, pp. 34–42 (2015)

Gurpinar, F., et al.: Kernel ELM and CNN based facial age estimation. In: Proc. IEEE Conf. Computer Vision and Pattern Recognition Workshops, Las Vegas, NV, USA, pp. 80–86 (2016)

Liu, H., et al.: Group‐aware deep feature learning for facial age estimation. Pattern Recogn. 66, 82–94 (2017)

Hosseini, S., et al.: Age and gender classification using wide convolutional neural network and Gabor filter. In: Int. Workshop on Advanced Image Technology, Chiang Mai, Thailand, pp. 1–3 (2018)

Rothe, R., Timofte, R., Van Gool, L.: Deep expectation of real and apparent age from a single image without facial landmarks. Int. J. Comput. Vis. 126(2), 144–157 (2018)

Kyrki, V., Kamarainen, J.‐K., Kälviäinen, H.: Simple Gabor feature space for invariant object recognition. Pattern Recogn. Lett. 25(3), 311–318 (2004)

Alelyani, S., Tang, J., Liu, H.: Feature selection for clustering: a review. In: Aggarwal, C.C., Reddy, C.K. (eds.) Data Clustering, pp. 29–60. Chapman and Hall/CRC, UK (2018)

Cheng, K., et al.: Feature selection. ACM Comput. Surv. 50(6), 1–45 (2018)

Bhattacharyya, T., et al.: Mayfly in harmony: a new hybrid meta‐heuristic feature selection algorithm. IEEE Access. 8, 195929–195945 (2020)

Doreswamy, F., Ibrahim, G., Gad, I.: Feature selection approach using ensemble learning for network anomaly detection. CAAI Trans. Intell. Technol. 5(4), 283–293 (2020)

Zhao, W., Wang, L., Zhang, Z.: A novel atom search optimization for dispersion coefficient estimation in groundwater. Future Generat. Comput. Syst. 91, 601–610 (2019)

Li, J., Lu, D., Li, H.: An improved atomic search algorithm. J. Syst. Simul. 1–13 (2021)

Viola, P., Jones, M.J.: Robust real‐time face detection. Int. J. Comput. Vis. 57(2), 137–154 (2004)

Xiao, P., Yin, Y.: A dimension reduction SIFT algorithm based on PCA. Modern Computer. (34), 15–18 (2020)

Ma, H., Yan, Y.: Analysis and calculation method of on‐load tap changers state characteristics based on chaos theory and grasshopper optimization algorithm‐K‐means algorithm. Trans. China Electrotech. Soc. 36(07), 1399–1406 (2021)

Mirjalili, S., Lewis, A.: S‐shaped versus V‐shaped transfer functions for binary particle swarm optimization. Swarm Evol. Comput. 9, 1–14 (2013)


© 2023. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the "License"). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract


To address the long training times and low accuracy of existing age estimation approaches, this paper proposes a new age estimation method that combines Gabor feature fusion with an improved atomic search algorithm for feature selection. First, texture features at five scales and eight orientations are extracted from the face region by the Gabor wavelet transform. A statistical histogram is then used to encode and fuse, at each Gabor scale, the index of the orientation with the largest response. Second, a new hybrid feature selection algorithm is presented: chaotic improved atom search optimisation with simulated annealing (CIASO‐SA), which combines an improved atomic search algorithm with the simulated annealing algorithm. In addition, CIASO‐SA introduces a chaos mechanism during atom initialisation, significantly improving the convergence speed and accuracy of the algorithm. Finally, a support vector machine (SVM) is used to classify the age group. To verify the performance of the proposed algorithm, face images from the Adience dataset are tested at three resolutions. Using the fused Gabor real-part feature at 48 × 48 resolution, the average accuracy and 1‐off accuracy of age classification reach a maximum of 60.4% and 85.9%, respectively. The results show that the proposed algorithm is competitive with state‐of‐the‐art methods while requiring a much smaller model and far less training time, which gives it practical value for mobile terminals.

Details

Title: Age estimation from facial images based on Gabor feature fusion and the CIASO‐SA algorithm

Authors: Lu, Di¹; Wang, Dapeng¹; Zhang, Kaiyu¹; Zeng, Xiangyuan²

¹ School of Measurement and Communication Engineering, Harbin University of Science and Technology, Harbin, China
² Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada

Pages: 518–531
Section: Regular articles
Publication year: 2023
Publication date: Jun 1, 2023
Publisher: John Wiley & Sons, Inc.
e-ISSN: 2468-2322
Source type: Scholarly Journal
Language of publication: English
DOI: https://doi.org/10.1049/cit2.12084
