1. Introduction
Deep learning is an artificial intelligence technique that follows the structure of the human brain and imitates its neural cells [1]. Over the past decades, deep learning has made significant progress in speech recognition, natural language processing [2], computer vision [3], image classification [4], and privacy protection [5–7]. In particular, as data volumes increase, traditional machine learning algorithms such as SVM [8] and NB [9] hit a performance bottleneck: adding more training data no longer improves their classification accuracy. In contrast, deep learning classifiers keep improving as more data is fed to them. However, it has been revealed that artificial perturbations can easily make deep learning models misclassify. A number of effective methods have been proposed to produce so-called "adversarial samples" that fool the models [10, 11], and some work has focused on defending against adversarial attacks [12–14].
1.1. One-Pixel Attack
Among the existing works, the one-pixel attack considers an extreme scenario, where only one pixel of an image is allowed to be modified to mislead the classification models of a Deep Neural Network (DNN), such that the perturbed image is classified into a label different from its original one [15]. As shown in Figure 1, modifying a single pixel changes the classification of the images to totally irrelevant labels.
[figure omitted; refer to PDF]
The one-pixel attack is harmful to the performance guarantee of DNN-based information systems. By modifying only one pixel in an image, the classification of the image may change to an irrelevant label, leading to performance degradation of DNN-based applications/services and even other serious consequences. For example, in medical image systems, the one-pixel attack may cause a doctor to misjudge a patient's disease, and in autonomous driving, the one-pixel attack may cause serious traffic accidents on roads.
More importantly, the one-pixel attack is more threatening than other types of adversarial attacks because it can be implemented easily and effectively to damage system security. Since the one-pixel attack is a black-box attack, it does not require any additional information about the DNN models. In practice, the one-pixel attack only needs the probabilities of the different labels rather than the inner information of the target DNN models, such as gradients and hyperparameters. The effectiveness of the one-pixel attack against DNNs was validated in [15], where the attack success rate is 31.40% on the original CIFAR-10 image dataset and 16.04% on the ImageNet dataset. Such success rates are large enough to break down an image classification system.
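To make the black-box setting concrete, the following minimal sketch shows that an attacker only needs to overwrite one pixel and read the returned probability vector. It assumes a Keras-style classifier exposing a `predict` method and normalized H x W x 3 images; all function and variable names here are illustrative, not taken from the paper.

```python
# Minimal sketch of the black-box setting: apply a one-pixel change and
# observe only the model's output probabilities (no gradients, no weights).
import numpy as np

def apply_one_pixel(image, x, y, rgb):
    """Return a copy of `image` (H x W x 3, floats in [0, 1]) with pixel (x, y) replaced."""
    perturbed = image.copy()
    perturbed[y, x, :] = rgb
    return perturbed

def query_probabilities(model, image):
    """Black-box query: only the output probability vector is observed."""
    return model.predict(image[np.newaxis, ...], verbose=0)[0]

def attack_succeeds(model, image, true_label, x, y, rgb):
    """Check whether the single-pixel change flips the predicted label."""
    probs = query_probabilities(model, apply_one_pixel(image, x, y, rgb))
    return int(np.argmax(probs)) != true_label
```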
Therefore, to avoid the loss of system performance, detecting the one-pixel attack becomes an essential task.
1.2. Technical Challenges
The following two facts make it difficult to detect a one-pixel attack in images.
(1) Extremely Small Modification. The one-pixel attack modifies only one pixel in an image, a far smaller perturbation than that of other types of adversarial attacks. This makes detecting the one-pixel attack very challenging.
(2) Randomness of Pixel Modification. For an image, there may be more than one feasible pixel whose modification can change the classification. In [15], the one-pixel attack randomly selects one of those feasible pixels to mislead the classifiers. Such randomness makes the detection of the one-pixel attack even harder.
1.3. Contributions
In this paper, we develop two methods to detect the one-pixel attack on images: trigger detection and candidate detection. In the trigger detection method, based on the concept of a "trigger" [16], we use gradient information about the distance between pixels and target labels to find the pixel modified by the one-pixel attack (a brief illustrative sketch of this gradient idea is given after the contribution list below). In the candidate detection method, by exploiting the properties of the one-pixel attack, we design a differential evolution-based heuristic algorithm to find a set of candidate victim pixels that may contain the modified pixel. Extensive real-data experiments are conducted to evaluate the performance of the two detection methods. To sum up, this paper makes the following contributions.
(i) To the best of our knowledge, this is the first work in the literature to study the detection of the one-pixel attack, which can contribute to defenses against the one-pixel attack in future research
(ii) Two novel detection mechanisms are proposed, in which the modified pixel can be identified in two different ways based on our thorough analysis of the one-pixel attack
(iii) The results of real-data experiments validate that our two detection methods can effectively detect the one-pixel attack with satisfactory detection success rates
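To make the gradient idea behind trigger detection more concrete, the sketch below ranks the pixels of an image by the magnitude of the input gradient of a chosen target-class score. This is our own illustration under generic assumptions (a PyTorch classifier named `model` mapping a (1, 3, H, W) tensor to class logits), not the exact algorithm described later in the paper.

```python
# Illustrative sketch only: rank pixels by input-gradient magnitude with
# respect to a chosen target class and return the most suspected pixel.
import torch

def most_suspected_pixel(model, image, target_class):
    """Return (row, col) of the pixel whose change most affects the target-class score."""
    image = image.clone().detach().requires_grad_(True)   # shape (1, 3, H, W)
    score = model(image)[0, target_class]
    score.backward()
    # Aggregate gradient magnitude over the colour channels.
    saliency = image.grad.abs().sum(dim=1)[0]              # shape (H, W)
    flat_idx = int(torch.argmax(saliency))
    _, w = saliency.shape
    return divmod(flat_idx, w)                              # (row, col)
```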
The rest of this paper is organized as follows. In “Related Works,” the existing works on adversarial attacks and detection schemes are briefly summarized. The attack model and the detection model are presented in “System Models.” Our two detection methods are demonstrated in “Design of Detection Methods.” After analyzing the performance of our methods in “Performance Validation,” we conclude this paper and discuss our future work in “Conclusion and Future Work.”
2. Related Works
A one-pixel attack is a special type of adversarial attack and is designed based on a differential evolution scheme. Thus, this section summarizes the most related literature from the following two aspects: adversarial attacks and the detection of adversarial attacks.
2.1. Adversarial Attack
In an adversarial attack, attackers aim to mislead classifiers by constructing adversarial samples. Nguyen et al. studied fooling machine learning algorithms [17] and found that DNNs give high-confidence predictions to random noise, which indicates that a single crafted universal adversarial perturbation can cause misclassification on multiple samples. In [10, 18], back-propagation is used to obtain gradient information of machine learning models, and gradient descent methods are used to build adversarial samples.
Since it may be hard or impossible to learn the internal information of a DNN model in practice, some approaches have been proposed to generate adversarial samples without using the internal characteristics of DNN models. Such approaches are called black-box attacks [19–21]. A special type of black-box attack is the one-pixel attack, in which only one pixel is allowed to be modified. In the one-pixel attack of [15], an algorithm based on differential evolution, which has a higher probability of finding an expected solution than gradient-based optimization methods, was developed to find a feasible pixel for malicious modification. Because only one pixel is covertly modified, the one-pixel attack is more difficult to detect. As noted in [15], the one-pixel attack requires only black-box feedback, i.e., the probability labels, without any inner information about the target network, such as gradients or structure.
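For illustration, the following sketch shows how a generic differential-evolution optimizer can search for a one-pixel perturbation in the spirit of [15]. It uses SciPy's `differential_evolution` rather than the authors' implementation; the `model` object, image shape, and normalized pixel range are assumptions.

```python
# Sketch: minimize the true-class probability over (x, y, r, g, b) with
# differential evolution, using only black-box probability queries.
import numpy as np
from scipy.optimize import differential_evolution

def one_pixel_search(model, image, true_label, height=32, width=32):
    """Search for a pixel position and colour that minimize the true-class probability."""

    def objective(z):
        x, y = int(z[0]), int(z[1])
        perturbed = image.copy()
        perturbed[y, x, :] = z[2:5]                     # overwrite one pixel
        probs = model.predict(perturbed[np.newaxis, ...], verbose=0)[0]
        return probs[true_label]                        # lower is better for the attacker

    bounds = [(0, width - 1), (0, height - 1), (0, 1), (0, 1), (0, 1)]
    result = differential_evolution(objective, bounds, maxiter=30, popsize=10, seed=0)
    return result.x, result.fun                         # candidate pixel and residual confidence
```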
2.2. Detection of Adversarial Attack
On the other hand, research attention has also been paid to developing detection methods for adversarial attacks. Papernot et al. provided a comprehensive investigation of the security problems of machine learning and deep learning, in which they established a threat model and presented the "no free lunch" theory showing the tradeoff between the accuracy and robustness of deep learning models. Inspired by the fact that most current datasets consist of compressed JPG images, some researchers designed a method to defend against image adversarial attacks using image compression. However, in their method, strong compression may also cause a large loss of classification accuracy on the attacked images, while weak compression cannot work well against adversarial attacks. In [16], Neural Cleanse was developed to detect backdoor attacks in neural networks, and some methods were designed to mitigate backdoor attacks as well.
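As a rough illustration of the compression-based idea mentioned above (our own sketch of the general principle, not the cited authors' method), an image can be flagged as suspicious if its predicted label changes after mild JPEG compression. The sketch again assumes a Keras-style classifier trained on inputs scaled to [0, 1].

```python
# Sketch: flag an image as suspicious if mild JPEG compression flips its label.
import io
import numpy as np
from PIL import Image

def is_suspicious(model, image, quality=75):
    """`image` is an H x W x 3 uint8 array; `model` is a Keras-style classifier."""
    buffer = io.BytesIO()
    Image.fromarray(image).save(buffer, format="JPEG", quality=quality)
    buffer.seek(0)
    compressed = np.asarray(Image.open(buffer))

    label_original = np.argmax(model.predict(image[np.newaxis, ...] / 255.0, verbose=0)[0])
    label_compressed = np.argmax(model.predict(compressed[np.newaxis, ...] / 255.0, verbose=0)[0])
    return label_original != label_compressed
```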
Compared with the existing works, this paper is the first to focus on the detection of the one-pixel attack. In particular, two novel detection mechanisms are proposed, one using a gradient calculation-based method and the other using a differential evolution-based method.
3. System Models
The attack model and the detection model of the DNN-based information systems considered in this paper are introduced as follows.
3.1. Attack Model
In this paper, we adopt the attack model of [15], in which an adversarial image is generated by modifying only one pixel of the victim image. The purpose of the one-pixel attack is to maliciously change the classification result of a victim image from its original label to a target label. As shown in Figure 2, the image is correctly classified by a given DNN model as its original label, "sheep." After one pixel is modified, the label with the highest confidence becomes the target label, "airplane," leading to a wrong classification result. The attacker performs a black-box attack only, which means that it has access to the probability labels but cannot obtain the inner information of the network. In addition, since the attacker aims to make the attack as efficient as possible, we assume that all the images in the dataset are altered.
[figure omitted; refer to PDF]
From Figure 3, one can find that the
[figure omitted; refer to PDF]
The average detection success rate of our trigger detection method is 9.1%.
5.4. Performance of Candidate Detection
In our candidate detection method, the initial number of candidate solutions and the number of produced candidate solutions are both set to 400.
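For illustration, the sketch below shows one differential-evolution step that maintains a population of 400 candidate pixel positions and colours, matching the population size used above. The scale factor `F` and the classic rand/1 mutation rule are generic DE choices made for this sketch, not parameters taken from the paper.

```python
# Illustrative sketch: keep 400 candidate (x, y, r, g, b) solutions and
# produce 400 new candidates with one rand/1 differential-evolution mutation.
import numpy as np

POP_SIZE = 400          # initial and produced number of candidate solutions
F = 0.5                 # DE scale factor (assumed)

def init_population(height, width, rng):
    """Each candidate is (x, y, r, g, b) with colour values in [0, 1]."""
    xs = rng.integers(0, width, size=(POP_SIZE, 1))
    ys = rng.integers(0, height, size=(POP_SIZE, 1))
    rgb = rng.random((POP_SIZE, 3))
    return np.hstack([xs, ys, rgb]).astype(float)

def de_mutation(population, height, width, rng):
    """Produce POP_SIZE new candidates via the rand/1 mutation rule."""
    idx = rng.integers(0, POP_SIZE, size=(POP_SIZE, 3))
    a, b, c = population[idx[:, 0]], population[idx[:, 1]], population[idx[:, 2]]
    mutants = a + F * (b - c)
    # Keep coordinates and colours inside their valid ranges.
    mutants[:, 0] = np.clip(mutants[:, 0], 0, width - 1)
    mutants[:, 1] = np.clip(mutants[:, 1], 0, height - 1)
    mutants[:, 2:] = np.clip(mutants[:, 2:], 0.0, 1.0)
    return mutants

rng = np.random.default_rng(0)
population = init_population(32, 32, rng)
new_candidates = de_mutation(population, 32, 32, rng)   # 400 new candidate solutions
```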
Table 5. Detection success rate of candidate detection.

| Experiment | 1 | 2 | 3 | 4 | 5 |
| --- | --- | --- | --- | --- | --- |
| Success rate (%) | 20.4 | 24.3 | 30.1 | 21.9 | 26.7 |
As shown in Figure 5, when
[figure omitted; refer to PDF]
When
However, when
6. Conclusion and Future Work
This paper proposes two novel methods, i.e., the trigger detection method and the candidate detection method, to detect the one-pixel attack, one of the most concealed attack models. The trigger detection method gives the exact pixel that may have been modified by the one-pixel attack, while the candidate detection method outputs a set of pixels that may have been changed. Extensive real-data experiments confirm the effectiveness of the two methods; in particular, the detection success rate of the candidate detection method reaches 30.1%.
As a preliminary exploration of one-pixel attack detection, this paper assumes that all images are attacked, so the detection is implemented on a dataset consisting entirely of modified images. In our future work, we will pursue two directions: (i) distinguishing benign images from attacked images in the presence of the one-pixel attack and (ii) mitigating the impact of the one-pixel attack by enhancing the resistance of DNNs to adversarial samples.
Acknowledgments
This work was partly supported by the U.S. National Science Foundation (grants 1704287, 1829674, 1912753, and 2011845).
[1] Y. LeCun, Y. Bengio, G. Hinton, "Deep learning," Nature, vol. 521 no. 7553, pp. 436-444, DOI: 10.1038/nature14539, 2015.
[2] R. Socher, C. C. Lin, C. Manning, A. Y. Ng, "Parsing natural scenes and natural language with recursive neural networks," Proceedings of the 28th International Conference on Machine Learning (ICML-11), pp. 129-136, 2011.
[3] A. Kendall, Y. Gal, What uncertainties do we need in Bayesian deep learning for computer vision?, 2017.
[4] T.-H. Chan, K. Jia, S. Gao, J. Lu, Z. Zeng, Y. Ma, "PCANet: a simple deep learning baseline for image classification?," IEEE Transactions on Image Processing, vol. 24 no. 12, pp. 5017-5032, DOI: 10.1109/TIP.2015.2475625, 2015.
[5] X. Zheng, Z. Cai, Y. Li, "Data linkage in smart internet of things systems: a consideration from a privacy perspective," IEEE Communications Magazine, vol. 56 no. 9, pp. 55-61, DOI: 10.1109/MCOM.2018.1701245, 2018.
[6] Y. Liang, Z. Cai, J. Yu, Q. Han, Y. Li, "Deep learning based inference of private information using embedded sensors in smart devices," IEEE Network, vol. 32 no. 4,DOI: 10.1109/MNET.2018.1700349, 2018.
[7] Z. Cai, Z. He, X. Guan, Y. Li, "Collective data-sanitization for preventing sensitive information inference attacks in social networks," IEEE Transactions on Dependable and Secure Computing, vol. 15 no. 4,DOI: 10.1109/TDSC.2016.2613521, 2016.
[8] V. Vapnik, The Nature of Statistical Learning Theory, 2013.
[9] D. D. Lewis, "Naive (Bayes) at forty: the independence assumption in information retrieval," European conference on machine learning, 1998.
[10] N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, A. Swami, "The limitations of deep learning in adversarial settings," 2016 IEEE European Symposium on Security and Privacy (EuroS&P), pp. 372-387, DOI: 10.1109/EuroSP.2016.36, 2016.
[11] X. Yuan, P. He, Q. Zhu, X. Li, "Adversarial examples: attacks and defenses for deep learning," IEEE Transactions on Neural Networks and Learning Systems, vol. 30 no. 9, pp. 2805-2824, DOI: 10.1109/TNNLS.2018.2886017, 2019.
[12] Z. Cai, X. Zheng, J. Yu, "A differential-private framework for urban traffic flows estimation via taxi companies," IEEE Transactions on Industrial Informatics, vol. 15 no. 12, pp. 6492-6499, DOI: 10.1109/TII.2019.2911697, 2019.
[13] Z. Xiong, Z. Cai, Q. Han, A. Alrawais, W. Li, "Adgan: protect your location privacy in camera data of auto-driving vehicles," IEEE Transactions on Industrial Informatics,DOI: 10.1109/TII.2020.3032352, 2020.
[14] Z. Xiong, W. Li, Q. Han, Z. Cai, "Privacy-preserving auto-driving: a GAN-based approach to protect vehicular camera data," 2019 IEEE International Conference on Data Mining (ICDM), pp. 668-677, DOI: 10.1109/ICDM.2019.00077, 2019.
[15] J. Su, D. V. Vargas, K. Sakurai, "One pixel attack for fooling deep neural networks," IEEE Transactions on Evolutionary Computation, vol. 23 no. 5, pp. 828-841, DOI: 10.1109/TEVC.2019.2890858, 2019.
[16] B. Wang, Y. Yao, S. Shan, H. Li, B. Viswanath, H. Zheng, B. Y. Zhao, "Neural cleanse: identifying and mitigating backdoor attacks in neural networks," 2019 IEEE Symposium on Security and Privacy (SP), pp. 707-723, DOI: 10.1109/SP.2019.00031, 2019.
[17] A. Nguyen, J. Yosinski, J. Clune, "Deep neural networks are easily fooled: high confidence predictions for unrecognizable images," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 427-436, 2015.
[18] S.-M. Moosavi-Dezfooli, A. Fawzi, P. Frossard, "DeepFool: a simple and accurate method to fool deep neural networks," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2574-2582, 2016.
[19] N. Narodytska, S. Kasiviswanathan, "Simple black-box adversarial attacks on deep neural networks," 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 1310-1318, 2017.
[20] N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. B. Celik, A. Swami, "Practical black-box attacks against machine learning," Proceedings of the 2017 ACM Asia Conference on Computer and Communications Security, pp. 506-519, DOI: 10.1145/3052973.3053009, 2017.
[21] J. Gao, J. Lanchantin, M. L. Soffa, Y. Qi, "Black-box generation of adversarial text sequences to evade deep learning classifiers," 2018 IEEE Security and Privacy Workshops (SPW), pp. 50-56, DOI: 10.1109/SPW.2018.00016, 2018.
[22] I. Ben-Gal, "Outlier detection," Data mining and knowledge discovery handbook, pp. 131-146, 2005.
[23] S. Wu, G. Li, L. Deng, L. Liu, D. Wu, Y. Xie, L. Shi, "L1-norm batch normalization for efficient training of deep neural networks," IEEE Transactions on Neural Networks and Learning Systems, vol. 30 no. 7, pp. 2043-2051, DOI: 10.1109/TNNLS.2018.2876179, 2019.
Copyright © 2021 Peng Wang et al. This work is licensed under the Creative Commons Attribution 4.0 License (http://creativecommons.org/licenses/by/4.0/).
Abstract
In recent years, a series of studies has revealed that the Deep Neural Network (DNN) is vulnerable to adversarial attacks, and a number of attack methods have been proposed. Among those methods, an extremely sly type of attack named the one-pixel attack can mislead DNNs into misclassifying an image by modifying only one pixel of the image, posing severe security threats to DNN-based information systems. Currently, no existing method can effectively detect the one-pixel attack; this paper fills that gap. We propose two detection methods, trigger detection and candidate detection. The trigger detection method analyzes the vulnerability of DNN models and outputs the most suspected pixel modified by the one-pixel attack. The candidate detection method identifies a set of the most suspected pixels using a differential evolution-based heuristic algorithm. Real-data experiments show that the trigger detection method achieves a detection success rate of 9.1% and the candidate detection method achieves a detection success rate of 30.1%, validating the effectiveness of our methods.