ABSTRACT
This article focuses on the effectiveness and robustness of facial classification systems in the field of biometric identification. Artificial intelligence is increasingly becoming a part of everyday life, with more and more users employing it across various domains. In the field of security, AI is used, for instance, in cybersecurity and risk analysis. It is also integrated into surveillance systems, particularly for facial recognition. A comparative analysis of three convolutional neural networks (GoogLeNet, ResNet-101, and DenseNet-201) was conducted in this study using the MATLAB simulation environment. These CNNs were pre-trained and subsequently tested from several perspectives, including performance, training time, and validation accuracy. The collected data served as a basis for comparing the networks with one another and were also used for further analysis of training and output evaluation. The results can form the basis for further research and can be compared with a possible future study using real photographs with higher noise. The results can also be applied to enhance electronic security systems, such as access control for mines and geologically significant sites.
Keywords: Biometrics, CNN, GoogLeNet, ResNet-101, DenseNet-201
INTRODUCTION
Artificial intelligence is increasingly penetrating everyday life, and more and more users are employing it across various sectors. In the field of security, it is used, for example, in cybersecurity, and it can also be found in surveillance systems for facial recognition. Mines are often highly secured areas where facial recognition powered by artificial intelligence presents an interesting option for ensuring security. Facial recognition has gained significant attention from both research communities and the market, leading to a growing demand for robust facial recognition algorithms capable of working with real-world facial images [1].
The aim of this article is to present the results of a comparative analysis focused on the performance and robustness of various artificial intelligence-based classification methods for face detection and recognition from image recordings within the context of biometric identification.
MATERIALS AND METHODS
In general, facial recognition systems can be divided into two main categories: image-based methods and video-based methods. Image processing techniques (also known as single-image methods) recognize a person based on their current physical appearance; this is referred to as static facial recognition. Video-based techniques, on the other hand, consider not only the current physical appearance but also the dynamics of facial recordings, i.e., changes in appearance over time; these are known as dynamic methods [2]. This article focuses solely on image-based (static) methods.
It is also important to mention the use of so-called neural networks. Deep learning is the process of training multilayer neural networks; as a general rule, the more layers a network has, the deeper it is. Mathematical neurons, inspired by the biological neurons found in the human brain, form the fundamental building blocks of these networks [3].
A neural network is composed of individual layers. A layer is a group of neurons that performs a specific transformation on the data and passes the result to the next layer. Different types of layers are distinguished based on the type of transformation they apply.
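The following minimal sketch (illustrative only, not taken from the study) shows how such layers are composed in MATLAB's Deep Learning Toolbox; the layer sizes are assumptions chosen to match the 100-subject, 224x224x3 setting described later.

% Illustrative layer stack; each layer applies one transformation to the data.
layers = [
    imageInputLayer([224 224 3])                    % input: RGB image
    convolution2dLayer(3, 16, 'Padding', 'same')    % feature extraction
    batchNormalizationLayer
    reluLayer                                       % non-linear activation
    maxPooling2dLayer(2, 'Stride', 2)               % spatial down-sampling
    fullyConnectedLayer(100)                        % one output neuron per subject
    softmaxLayer                                    % class probabilities
    classificationLayer];                           % cross-entropy output layer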
The MATLAB programming environment provides access to several pre-trained deep neural networks focused on image classification. These networks have been trained on over a million images and are capable of classifying images into up to a thousand categories.
As part of the tests described in this article, three types of pre-trained convolutional neural networks (CNNs) were selected: GoogLeNet, ResNet-101, and DenseNet-201. The testing itself was conducted on a standard consumer laptop, specifically an HP Pavilion Gaming 15.
All tests were carried out in the MATLAB environment.
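As a brief illustration of how these pre-trained networks are obtained in MATLAB, the following sketch loads all three and reports their expected input size; it assumes the corresponding Deep Learning Toolbox support packages are installed.

% Sketch: loading the three pre-trained networks compared in this study.
names = {'GoogLeNet', 'ResNet-101', 'DenseNet-201'};
nets  = {googlenet, resnet101, densenet201};
for k = 1:numel(nets)
    inSize = nets{k}.Layers(1).InputSize;           % e.g. [224 224 3]
    fprintf('%s expects %dx%dx%d input images\n', names{k}, inSize(1), inSize(2), inSize(3));
end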
All tests followed the same procedure, using an identical algorithm consisting of the following sequence of steps: 1. Image loading (photographs, details below), 2. Dataset splitting, 3. CNN loading, 4. Image augmentation, 5. CNN parameter configuration, 6. CNN training, 7. Evaluation.
The entire algorithm used for CNN training remained the same for all three pre-trained neural networks. The only variable was the specific pre-trained neural network applied in each test.
The first step involves preparing the data to be used in the algorithm. A total of 5,000 facial images were used, organized into 100 folders (1 folder = 1 subject), each containing 50 images of the same individual. The initial step of the algorithm is to load these images from the CMU Multi-PIE database. Technically, this step is performed using the imageDatastore function.
For certain tests, only 10% of the dataset was used (500 images); for others, 50% (2,500 images); and finally, the full dataset was used, i.e., 100% (5,000 images).
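A minimal sketch of this loading and subsetting step is shown below; the folder name 'multipie_faces' and the 50% fraction are illustrative assumptions.

% Step 1 sketch: load the face images; folder names provide the identity labels.
imds = imageDatastore('multipie_faces', ...         % 100 subfolders, one per subject
    'IncludeSubfolders', true, ...
    'LabelSource', 'foldernames');

% Optionally keep only a fraction of each subject's images (10 %, 50 %, or 100 %).
imdsSubset = splitEachLabel(imds, 0.5, 'randomized');   % e.g. 50 % = 2,500 images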
The second step is dataset splitting: in a specific ratio, e.g., 40/60, 40% of the dataset is used as the training set, and the remaining 60% serves as the validation set. The third step involves loading the pre-trained neural network, specifically GoogLeNet, ResNet-101, or DenseNet-201. The fourth step is image augmentation, which primarily involves resizing the images to a resolution of 224x224x3 pixels. The fifth step is setting the CNN parameters. Before that, however, transfer learning needed to be performed: the two layers located at the end of each pre-trained network, namely the fully connected layer and the classification layer, were replaced using the replaceLayer function. The next step was to set the appropriate network hyperparameters. The penultimate step of the algorithm was the actual training process; depending on the parameter settings, different levels of accuracy and training durations could be achieved. The final step was the evaluation of the process.
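The sketch below outlines steps 2 to 6 for GoogLeNet, continuing from the imdsSubset datastore in the previous sketch. The layer names ('loss3-classifier', 'output'), the 40/60 split, and all hyperparameter values are assumptions for illustration, not the exact settings used in the reported tests.

% Step 2: split into training (40 %) and validation (60 %) sets.
[imdsTrain, imdsVal] = splitEachLabel(imdsSubset, 0.4, 'randomized');

% Step 3: load the pre-trained network and convert it to an editable layer graph.
net    = googlenet;
lgraph = layerGraph(net);
numClasses = numel(categories(imdsTrain.Labels));    % 100 subjects

% Transfer learning: replace the final fully connected and classification layers
% (layer names as in MATLAB's GoogLeNet; verify with analyzeNetwork if unsure).
lgraph = replaceLayer(lgraph, 'loss3-classifier', fullyConnectedLayer(numClasses, 'Name', 'fc_faces'));
lgraph = replaceLayer(lgraph, 'output', classificationLayer('Name', 'out_faces'));

% Step 4: image augmentation, here only resizing to the 224x224x3 network input.
augTrain = augmentedImageDatastore([224 224 3], imdsTrain);
augVal   = augmentedImageDatastore([224 224 3], imdsVal);

% Step 5: hyperparameters (placeholder values).
opts = trainingOptions('sgdm', ...
    'MiniBatchSize', 32, ...
    'MaxEpochs', 6, ...
    'InitialLearnRate', 1e-4, ...
    'ValidationData', augVal, ...
    'ValidationFrequency', 30, ...
    'Plots', 'training-progress', ...
    'Verbose', false);

% Step 6: training; info stores the accuracy and loss curves used for evaluation.
[trainedNet, info] = trainNetwork(augTrain, lgraph, opts);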
The result of the process includes several outputs. It is important to understand that there are only four possible outcomes when recognizing an identity:
* TP (True Positive) - indicates correctly recognized subjects, meaning the algorithm correctly identified the target individual.
* TN (True Negative) - indicates correctly recognized non-target subjects, meaning the algorithm correctly determined that the individual is not the target.
* FP (False Positive) - indicates incorrectly recognized subjects, meaning the algorithm mistakenly identified someone as the target individual.
* FN (False Negative) - indicates misclassified target subjects, meaning the algorithm failed to identify the correct individual and assigned them an incorrect identity.
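A short sketch of how these four outcomes can be extracted per subject from the validation predictions is given below; it continues from trainedNet, augVal, and imdsVal in the earlier sketches and illustrates one possible evaluation approach rather than the study's exact code.

% Predict identities on the validation set and build the confusion matrix.
predicted = classify(trainedNet, augVal);            % predicted identities
actual    = imdsVal.Labels;                          % ground-truth identities
C         = confusionmat(actual, predicted);         % one row/column per subject

% One-vs-rest view for subject i:
i  = 1;
TP = C(i, i);                                        % correctly identified as subject i
FN = sum(C(i, :)) - TP;                              % subject i labelled as someone else
FP = sum(C(:, i)) - TP;                              % others labelled as subject i
TN = sum(C(:)) - TP - FN - FP;                       % correctly rejected non-targets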
One of the outcomes of the testing process is the so-called performance metrics, which include:
Accuracy - This is the ratio of correctly identified subjects to the total dataset of individuals. It answers the question: how many subjects were correctly identified by the CNN out of all identification attempts. [4]
Precision - This is the ratio of correctly identified subjects to all subjects for whom the prediction was positive, i.e., to all individuals the algorithm thought it had recognized (regardless of whether the identification was correct). [4]
Sensitivity (Recall) - This is the ratio of correctly identified individuals with a positive identity assignment to all individuals who are actual targets (e.g., terrorists in a security context).
Specificity - This refers to the ratio of correctly rejected (non-identified) individuals to all those who are truly non-targets (e.g., civilians), whether they were labeled or not.
F-score - This metric considers both precision and sensitivity; it is the harmonic mean of precision and recall. The formulas for all five metrics are written out below.
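For reference, the standard formulas behind these metrics, expressed in terms of the four outcomes defined above, are:

Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision = TP / (TP + FP)
Sensitivity (Recall) = TP / (TP + FN)
Specificity = TN / (TN + FP)
F-score = 2 * (Precision * Recall) / (Precision + Recall)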
To find the optimal testing configuration, exactly 20 tests were designed for each pre-trained neural network. The specific settings of all varying parameters are presented in Table 1.
RESULTS
Since the testing process generated a vast amount of data, only the most important results are presented. These can be categorized into three groups: training process performance results, training time evaluation, and validation accuracy.
Training process performance results can be seen in Figure 1. The comparison was based on five indicators, which is why a pentagon is shown in Figure 1. The decisive value for each performance metric is the arithmetic mean of all achieved values. Each metric has its own axis, with the center of the graph representing a value of 94%, and the outer edge of each axis representing the maximum value of 100%.
The best overall results were achieved by the DenseNet-201 network. It performed the best in three categories: accuracy, sensitivity, and F-score. The ResNet-101 network also achieved strong results and was the top performer in two categories: precision and specificity. The GoogLeNet network significantly underperformed in training compared to the other two networks.
The only standout training metric for GoogLeNet was sensitivity, although even here, it did not reach the values achieved by the other networks. Conversely, the worst performance indicator for GoogLeNet was specificity.
Training Time Evaluation
An overview of the time requirements of the tested models is presented in Table 2 below.
Validation Evaluation
The main evaluation metric is the final validation accuracy. An overview of the achieved validation accuracies is presented in Table 3.
For all three neural networks, an absolute validation accuracy of 100% was achieved in at least one case. Conversely, the worst results were recorded in Test No. 16 across all three neural networks. In some cases, where the final validation accuracy is identical, it is interesting to observe the evolution of validation accuracy over time. This phenomenon could be further analyzed in a separate article.
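One way such an analysis could be carried out is sketched below: the info structure returned by trainNetwork records the validation accuracy at each validation iteration (with NaN elsewhere), so its evolution can be plotted directly. This is an illustrative assumption, not part of the reported tests.

% Sketch: inspecting how validation accuracy evolved during training.
valAcc = info.ValidationAccuracy;                    % NaN at non-validation iterations
idx    = find(~isnan(valAcc));
plot(idx, valAcc(idx), '-o');
xlabel('Iteration'); ylabel('Validation accuracy (%)');
finalValAcc = valAcc(idx(end));                      % final validation accuracy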
DISCUSSION
The presented results were obtained using photographs captured under controlled conditions, despite variations in brightness and positioning. For all tests conducted, the rule "1 image = 1 face" was strictly followed. The question remains: what results would be achieved under real-world conditions? It would therefore be beneficial to test the models that achieved the highest validation accuracy, ideally those that reached an absolute validation accuracy of 100%, on real-world data that contains higher noise.
It is possible that a new dataset with higher noise levels could serve as a foundation for improving the system's applicability in real-world scenarios. This dataset should include images containing multiple faces per frame or individuals captured from greater distances. This would also introduce varying input image resolutions, making the system even more robust.
Another way to achieve better results may be through the use of a different pre-trained neural network. The choice of network type will depend on the desired outcome. If the researcher aims to obtain approximate results within a short time frame, then, based on the observations and findings of this study, it is recommended to use a smaller and faster network, such as SqueezeNet. On the other hand, if the goal is to achieve highly accurate results over a longer period, it would be more advantageous to utilize a more complex pre-trained neural network, such as NASNet-Large.
CONCLUSION
When comparing the achieved final validation accuracy with the computational time requirements, it is evident that the DenseNet-201 network achieved the highest final validation accuracy in most cases. However, its computation time was significantly longer than that of the ResNet-101 network, which reached a comparable level of final validation accuracy but with considerably lower computational demands. Although ResNet-101 achieved slightly lower validation accuracy, it proved to be more advantageous in terms of the efficiency-to-computation-time ratio compared to DenseNet-201.
The third network, GoogLeNet, was the weakest in terms of validation accuracy results. However, it was the fastest among the three networks, meaning that the computation time was the shortest.
This network is suitable for obtaining approximate results in a short time frame. A slight parallel can be observed between the results of GoogLeNet and the other two networks: in cases where GoogLeNet performed very poorly in terms of validation accuracy, the performance of the other two networks also decreased, though they still achieved significantly better results than GoogLeNet.
We recommend continuing the research using real-world images with higher levels of noise.
REFERENCES
[1] Hassaballah, M., & Aly, S. (2015). Face recognition: challenges, achievements and future directions. IET Computer Vision. Retrieved December 27, 2022, from https://doi.org/10.1049/iet-cvi.2014.0084
[2] Taskiran, M., Kahraman, N., & Erdem, C. E. (2020). Face recognition: Past, present and future (a review). Digital Signal Processing. Retrieved January 13, 2023, from https://doi.org/10.1016/j.dsp.2020.102809
[3] Pazderka, R. (2021). Introduction to Convolutional Neural Networks. Neuronové sité CZ. Retrieved January 15, 2023, from https://neuronove.site/video-1-page
[4] Ghoneim, S. (2019). Accuracy, Recall, Precision, F-Score & Specificity, which to optimize on? Towards Data Science. Retrieved February 21, 2023, from https://towardsdatascience.com/accuracy-recall-precision-f-score-specificity-which-to-optimize-on-867d3f11124