1. Introduction
Deep learning is an artificial intelligence technique that follows the structure of the human brain and imitates its neural cells [1]. Over the past decades, deep learning has made significant progress in speech recognition, natural language processing [2], computer vision [3], image classification [4], and privacy protection [5–7]. In particular, as data volumes increase, traditional machine learning algorithms such as SVM [8] and NB [9] hit a performance bottleneck: adding more training data no longer improves their classification accuracy. In contrast, deep learning classifiers keep improving as more data is fed to them. However, it has been revealed that artificial perturbations can easily make deep learning models misclassify. A number of effective methods have been proposed to produce so-called "adversarial samples" that fool the models [10, 11], and some work has focused on defending against adversarial attacks [12–14].
1.1. One-Pixel Attack
Among the existing works, the one-pixel attack considers an extreme scenario, where only one pixel of an image is allowed to be modified to mislead the classification models of a Deep Neural Network (DNN), such that the perturbed image is classified into a label different from its original one [15]. As shown in Figure 1, modifying a single pixel changes the classification of the images to totally irrelevant labels.
[figure omitted; refer to PDF]
The one-pixel attack is harmful to the performance guarantee of DNN-based information systems. By modifying only one pixel in an image, the classification of the image may change to an irrelevant label, leading to performance degradation of DNN-based applications/services and even other serious consequences. For example, in medical image systems, the one-pixel attack may cause a doctor to misjudge a patient's disease, and in autonomous driving, the one-pixel attack may cause serious traffic accidents on roads.
More importantly, the one-pixel attack is more threatening than other types of adversarial attacks because it can be implemented easily and effectively to damage system security. Since the one-pixel attack is a black-box attack, it does not require any additional information about the DNN models. In practice, the one-pixel attack only needs the probabilities of the different labels rather than the inner information of the target DNN models, such as gradients and hyperparameters. The effectiveness of the one-pixel attack against DNNs was validated in [15], where the attack success rate is 31.40% on the original CIFAR-10 image dataset and 16.04% on the ImageNet dataset. Such success rates are large enough to break down an image classification system.
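To make the black-box setting concrete, the following minimal sketch shows that an attacker only needs to overwrite one pixel and read the returned probability vector. It assumes a Keras-style classifier exposing a `predict` method and normalized H x W x 3 images; all function and variable names here are illustrative, not taken from the paper.

```python
# Minimal sketch of the black-box setting: apply a one-pixel change and
# observe only the model's output probabilities (no gradients, no weights).
import numpy as np

def apply_one_pixel(image, x, y, rgb):
    """Return a copy of `image` (H x W x 3, floats in [0, 1]) with pixel (x, y) replaced."""
    perturbed = image.copy()
    perturbed[y, x, :] = rgb
    return perturbed

def query_probabilities(model, image):
    """Black-box query: only the output probability vector is observed."""
    return model.predict(image[np.newaxis, ...], verbose=0)[0]

def attack_succeeds(model, image, true_label, x, y, rgb):
    """Check whether the single-pixel change flips the predicted label."""
    probs = query_probabilities(model, apply_one_pixel(image, x, y, rgb))
    return int(np.argmax(probs)) != true_label
```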
Therefore, to avoid the loss of system performance, detecting the one-pixel attack becomes an essential task.
1.2. Technical Challenges
The following two facts make it difficult to detect a one-pixel attack in images.
(1) Extremely Small Modification. The one-pixel attack modifies only one pixel in an image, a far smaller perturbation than that of other types of adversarial attacks. This makes detecting the one-pixel attack very challenging.
(2) Randomness of Pixel Modification. For an image, there may be more than one feasible pixel whose modification can change the classification. In [15], the one-pixel attack randomly selects one of those feasible pixels to mislead the classifiers. Such randomness makes the detection of the one-pixel attack even harder.
1.3. Contributions
In this paper, we develop two methods to detect the one-pixel attack on images: trigger detection and candidate detection. In the trigger detection method, based on the concept of a "trigger" [16], we use gradient information about the distance between pixels and target labels to find the pixel modified by the one-pixel attack (a brief illustrative sketch of this gradient idea is given after the contribution list below). In the candidate detection method, by exploiting the properties of the one-pixel attack, we design a differential evolution-based heuristic algorithm to find a set of candidate victim pixels that may contain the modified pixel. Extensive real-data experiments are conducted to evaluate the performance of the two detection methods. To sum up, this paper makes the following contributions.
(i) To the best of our knowledge, this is the first work in the literature to study the detection of the one-pixel attack, which can contribute to defenses against the one-pixel attack in future research
(ii) Two novel detection mechanisms are proposed, in which the modified pixel can be identified in two different ways based on our thorough analysis of the one-pixel attack
(iii) The results of real-data experiments validate that our two detection methods can effectively detect the one-pixel attack with satisfactory detection success rates
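To make the gradient idea behind trigger detection more concrete, the sketch below ranks the pixels of an image by the magnitude of the input gradient of a chosen target-class score. This is our own illustration under generic assumptions (a PyTorch classifier named `model` mapping a (1, 3, H, W) tensor to class logits), not the exact algorithm described later in the paper.

```python
# Illustrative sketch only: rank pixels by input-gradient magnitude with
# respect to a chosen target class and return the most suspected pixel.
import torch

def most_suspected_pixel(model, image, target_class):
    """Return (row, col) of the pixel whose change most affects the target-class score."""
    image = image.clone().detach().requires_grad_(True)   # shape (1, 3, H, W)
    score = model(image)[0, target_class]
    score.backward()
    # Aggregate gradient magnitude over the colour channels.
    saliency = image.grad.abs().sum(dim=1)[0]              # shape (H, W)
    flat_idx = int(torch.argmax(saliency))
    _, w = saliency.shape
    return divmod(flat_idx, w)                              # (row, col)
```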
The rest of this paper is organized as follows. In “Related Works,” the existing works on adversarial attacks and detection schemes are briefly summarized. The attack model and the detection model are presented in “System Models.” Our two detection methods are demonstrated in “Design of Detection Methods.” After analyzing the performance of our methods in “Performance Validation,” we conclude this paper and discuss our future work in “Conclusion and Future Work.”
2. Related Works
A one-pixel attack is a special type of adversarial attack and is designed based on a differential evolution scheme. Thus, this section summarizes the most related literature from the following two aspects: adversarial attacks and the detection of adversarial attacks.
2.1. Adversarial Attack
In an adversarial attack, attackers aim to mislead classifiers by constructing adversarial samples. Nguyen et al. studied fooling machine learning algorithms [17] and found that DNNs give high-confidence predictions to random noise, which indicates that a single crafted universal adversarial perturbation can cause misclassification on multiple samples. In [10, 18], back-propagation is used to obtain gradient information of machine learning models, and gradient descent methods are used to build adversarial samples.
Since it may be hard or impossible to learn the internal information of a DNN model in practice, some approaches have been proposed to generate adversarial samples without using the internal characteristics of DNN models. Such approaches are called black-box attacks [19–21]. A special type of black-box attack is the one-pixel attack, in which only one pixel is allowed to be modified. In the one-pixel attack of [15], an algorithm based on differential evolution, which has a higher probability of finding an expected solution than gradient-based optimization methods, was developed to find a feasible pixel for malicious modification. Because only one pixel is covertly modified, the one-pixel attack is more difficult to detect. As noted in [15], the one-pixel attack requires only black-box feedback, i.e., the probability labels, without any inner information about the target network, such as gradients or structure.
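For illustration, the following sketch shows how a generic differential-evolution optimizer can search for a one-pixel perturbation in the spirit of [15]. It uses SciPy's `differential_evolution` rather than the authors' implementation; the `model` object, image shape, and normalized pixel range are assumptions.

```python
# Sketch: minimize the true-class probability over (x, y, r, g, b) with
# differential evolution, using only black-box probability queries.
import numpy as np
from scipy.optimize import differential_evolution

def one_pixel_search(model, image, true_label, height=32, width=32):
    """Search for a pixel position and colour that minimize the true-class probability."""

    def objective(z):
        x, y = int(z[0]), int(z[1])
        perturbed = image.copy()
        perturbed[y, x, :] = z[2:5]                     # overwrite one pixel
        probs = model.predict(perturbed[np.newaxis, ...], verbose=0)[0]
        return probs[true_label]                        # lower is better for the attacker

    bounds = [(0, width - 1), (0, height - 1), (0, 1), (0, 1), (0, 1)]
    result = differential_evolution(objective, bounds, maxiter=30, popsize=10, seed=0)
    return result.x, result.fun                         # candidate pixel and residual confidence
```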
2.2. Detection of Adversarial Attack
On the other hand, research attention has also been paid to developing detection methods for adversarial attacks. Papernot et al. provided a comprehensive investigation of the security problems of machine learning and deep learning, in which they established a threat model and presented the "no free lunch" theory showing the tradeoff between the accuracy and robustness of deep learning models. Inspired by the fact that most current datasets consist of compressed JPG images, some researchers designed a method to defend against image adversarial attacks using image compression. However, in their method, strong compression may also cause a large loss of classification accuracy on the attacked images, while weak compression cannot work well against adversarial attacks. In [16], Neural Cleanse was developed to detect backdoor attacks in neural networks, and some methods were designed to mitigate backdoor attacks as well.
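As a rough illustration of the compression-based idea mentioned above (our own sketch of the general principle, not the cited authors' method), an image can be flagged as suspicious if its predicted label changes after mild JPEG compression. The sketch again assumes a Keras-style classifier trained on inputs scaled to [0, 1].

```python
# Sketch: flag an image as suspicious if mild JPEG compression flips its label.
import io
import numpy as np
from PIL import Image

def is_suspicious(model, image, quality=75):
    """`image` is an H x W x 3 uint8 array; `model` is a Keras-style classifier."""
    buffer = io.BytesIO()
    Image.fromarray(image).save(buffer, format="JPEG", quality=quality)
    buffer.seek(0)
    compressed = np.asarray(Image.open(buffer))

    label_original = np.argmax(model.predict(image[np.newaxis, ...] / 255.0, verbose=0)[0])
    label_compressed = np.argmax(model.predict(compressed[np.newaxis, ...] / 255.0, verbose=0)[0])
    return label_original != label_compressed
```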
Compared with the existing works, this paper is the first to focus on the detection of the one-pixel attack. In particular, two novel detection mechanisms are proposed, one using a gradient calculation-based method and the other using a differential evolution-based method.
3. System Models
The attack model and the detection model of the DNN-based information systems considered in this paper are introduced as follows.
3.1. Attack Model
In this paper, we adopt the attack model of [15], in which an adversarial image is generated by modifying only one pixel of the victim image. The purpose of the one-pixel attack is to maliciously change the classification result of a victim image from its original label to a target label. As shown in Figure 2, the image is correctly classified by a given DNN model as its original label, "sheep." After one pixel is modified, the label with the highest confidence becomes the target label, "airplane," leading to a wrong classification result. The attacker performs a black-box attack only, which means that it has access to the probability labels but cannot obtain the inner information of the network. In addition, since the attacker aims to make the attack as efficient as possible, we assume that all the images in the dataset are altered.
[figure omitted; refer to PDF]
From Figure 3, one can find that the
[figure omitted; refer to PDF]
The average detection success rate of our trigger detection method is 9.1%.
5.4. Performance of Candidate Detection
In our candidate detection method, the initial number of candidate solutions and the number of produced candidate solutions are both set to 400.
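For illustration, the sketch below shows one differential-evolution step that maintains a population of 400 candidate pixel positions and colours, matching the population size used above. The scale factor `F` and the classic rand/1 mutation rule are generic DE choices made for this sketch, not parameters taken from the paper.

```python
# Illustrative sketch: keep 400 candidate (x, y, r, g, b) solutions and
# produce 400 new candidates with one rand/1 differential-evolution mutation.
import numpy as np

POP_SIZE = 400          # initial and produced number of candidate solutions
F = 0.5                 # DE scale factor (assumed)

def init_population(height, width, rng):
    """Each candidate is (x, y, r, g, b) with colour values in [0, 1]."""
    xs = rng.integers(0, width, size=(POP_SIZE, 1))
    ys = rng.integers(0, height, size=(POP_SIZE, 1))
    rgb = rng.random((POP_SIZE, 3))
    return np.hstack([xs, ys, rgb]).astype(float)

def de_mutation(population, height, width, rng):
    """Produce POP_SIZE new candidates via the rand/1 mutation rule."""
    idx = rng.integers(0, POP_SIZE, size=(POP_SIZE, 3))
    a, b, c = population[idx[:, 0]], population[idx[:, 1]], population[idx[:, 2]]
    mutants = a + F * (b - c)
    # Keep coordinates and colours inside their valid ranges.
    mutants[:, 0] = np.clip(mutants[:, 0], 0, width - 1)
    mutants[:, 1] = np.clip(mutants[:, 1], 0, height - 1)
    mutants[:, 2:] = np.clip(mutants[:, 2:], 0.0, 1.0)
    return mutants

rng = np.random.default_rng(0)
population = init_population(32, 32, rng)
new_candidates = de_mutation(population, 32, 32, rng)   # 400 new candidate solutions
```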
Table 5. Detection success rate of candidate detection.

| Experiment | 1 | 2 | 3 | 4 | 5 |
| --- | --- | --- | --- | --- | --- |
| Success rate (%) | 20.4 | 24.3 | 30.1 | 21.9 | 26.7 |
As shown in Figure 5, when
[figure omitted; refer to PDF]
When
However, when
6. Conclusion and Future Work
This paper proposes two novel methods, i.e., the trigger detection method and the candidate detection method, to detect the one-pixel attack, one of the most concealed attack models. The trigger detection method gives the exact pixel that may have been modified by the one-pixel attack, while the candidate detection method outputs a set of pixels that may have been changed. Extensive real-data experiments confirm the effectiveness of the two methods; in particular, the detection success rate of the candidate detection method reaches 30.1%.
As a preliminary exploration of one-pixel attack detection, this paper assumes that all images are attacked, so the detection is implemented on a dataset consisting entirely of modified images. In our future work, we will pursue two directions: (i) distinguishing benign images from attacked images in the presence of the one-pixel attack and (ii) mitigating the impact of the one-pixel attack by enhancing the resistance of DNNs to adversarial samples.
Acknowledgments
This work was partly supported by the U.S. National Science Foundation (grants 1704287, 1829674, 1912753, and 2011845).
[1] Y. LeCun, Y. Bengio, G. Hinton, "Deep learning," Nature, vol. 521 no. 7553, pp. 436-444, DOI: 10.1038/nature14539, 2015.
[2] R. Socher, C. C. Lin, C. Manning, A. Y. Ng, "Parsing natural scenes and natural language with recursive neural networks," Proceedings of the 28th International Conference on Machine Learning (ICML-11), pp. 129-136, 2011.
[3] A. Kendall, Y. Gal, What uncertainties do we need in Bayesian deep learning for computer vision?, 2017.
[4] T.-H. Chan, K. Jia, S. Gao, J. Lu, Z. Zeng, Y. Ma, "PCANet: a simple deep learning baseline for image classification?," IEEE Transactions on Image Processing, vol. 24 no. 12, pp. 5017-5032, DOI: 10.1109/TIP.2015.2475625, 2015.
[5] X. Zheng, Z. Cai, Y. Li, "Data linkage in smart internet of things systems: a consideration from a privacy perspective," IEEE Communications Magazine, vol. 56 no. 9, pp. 55-61, DOI: 10.1109/MCOM.2018.1701245, 2018.
[6] Y. Liang, Z. Cai, J. Yu, Q. Han, Y. Li, "Deep learning based inference of private information using embedded sensors in smart devices," IEEE Network, vol. 32 no. 4,DOI: 10.1109/MNET.2018.1700349, 2018.
[7] Z. Cai, Z. He, X. Guan, Y. Li, "Collective data-sanitization for preventing sensitive information inference attacks in social networks," IEEE Transactions on Dependable and Secure Computing, vol. 15 no. 4,DOI: 10.1109/TDSC.2016.2613521, 2016.
[8] V. Vapnik, The Nature of Statistical Learning Theory, 2013.
[9] D. D. Lewis, "Naive (Bayes) at forty: the independence assumption in information retrieval," European conference on machine learning, 1998.
[10] N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, A. Swami, "The limitations of deep learning in adversarial settings," 2016 IEEE European Symposium on Security and Privacy (EuroS&P), pp. 372-387, DOI: 10.1109/EuroSP.2016.36, 2016.
[11] X. Yuan, P. He, Q. Zhu, X. Li, "Adversarial examples: attacks and defenses for deep learning," IEEE Transactions on Neural Networks and Learning Systems, vol. 30 no. 9, pp. 2805-2824, DOI: 10.1109/TNNLS.2018.2886017, 2019.
[12] Z. Cai, X. Zheng, J. Yu, "A differential-private framework for urban traffic flows estimation via taxi companies," IEEE Transactions on Industrial Informatics, vol. 15 no. 12, pp. 6492-6499, DOI: 10.1109/TII.2019.2911697, 2019.
[13] Z. Xiong, Z. Cai, Q. Han, A. Alrawais, W. Li, "Adgan: protect your location privacy in camera data of auto-driving vehicles," IEEE Transactions on Industrial Informatics,DOI: 10.1109/TII.2020.3032352, 2020.
[14] Z. Xiong, W. Li, Q. Han, Z. Cai, "Privacy-preserving auto-driving: a GAN-based approach to protect vehicular camera data," 2019 IEEE International Conference on Data Mining (ICDM), pp. 668-677, DOI: 10.1109/ICDM.2019.00077, 2019.
[15] J. Su, D. V. Vargas, K. Sakurai, "One pixel attack for fooling deep neural networks," IEEE Transactions on Evolutionary Computation, vol. 23 no. 5, pp. 828-841, DOI: 10.1109/TEVC.2019.2890858, 2019.
[16] B. Wang, Y. Yao, S. Shan, H. Li, B. Viswanath, H. Zheng, B. Y. Zhao, "Neural cleanse: identifying and mitigating backdoor attacks in neural networks," 2019 IEEE Symposium on Security and Privacy (SP), pp. 707-723, DOI: 10.1109/SP.2019.00031, 2019.
[17] A. Nguyen, J. Yosinski, J. Clune, "Deep neural networks are easily fooled: high confidence predictions for unrecognizable images," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 427-436, 2015.
[18] S.-M. Moosavi-Dezfooli, A. Fawzi, P. Frossard, "DeepFool: a simple and accurate method to fool deep neural networks," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2574-2582, 2016.
[19] N. Narodytska, S. Kasiviswanathan, "Simple black-box adversarial attacks on deep neural networks," 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 1310-1318, 2017.
[20] N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. B. Celik, A. Swami, "Practical black-box attacks against machine learning," Proceedings of the 2017 ACM Asia Conference on Computer and Communications Security, pp. 506-519, DOI: 10.1145/3052973.3053009, 2017.
[21] J. Gao, J. Lanchantin, M. L. Soffa, Y. Qi, "Black-box generation of adversarial text sequences to evade deep learning classifiers," 2018 IEEE Security and Privacy Workshops (SPW), pp. 50-56, DOI: 10.1109/SPW.2018.00016, 2018.
[22] I. Ben-Gal, "Outlier detection," Data mining and knowledge discovery handbook, pp. 131-146, 2005.
[23] S. Wu, G. Li, L. Deng, L. Liu, D. Wu, Y. Xie, L. Shi, "L1-norm batch normalization for efficient training of deep neural networks," IEEE Transactions on Neural Networks and Learning Systems, vol. 30 no. 7, pp. 2043-2051, DOI: 10.1109/TNNLS.2018.2876179, 2019.
Copyright © 2021 Peng Wang et al. This work is licensed under the Creative Commons Attribution 4.0 License (http://creativecommons.org/licenses/by/4.0/).
Abstract
In recent years, a series of studies has revealed that the Deep Neural Network (DNN) is vulnerable to adversarial attacks, and a number of attack methods have been proposed. Among those methods, an extremely sly type of attack named the one-pixel attack can mislead DNNs into misclassifying an image by modifying only one pixel of the image, posing severe security threats to DNN-based information systems. Currently, no existing method can effectively detect the one-pixel attack; this paper fills that gap. We propose two detection methods, trigger detection and candidate detection. The trigger detection method analyzes the vulnerability of DNN models and outputs the most suspected pixel modified by the one-pixel attack. The candidate detection method identifies a set of the most suspected pixels using a differential evolution-based heuristic algorithm. Real-data experiments show that the trigger detection method achieves a detection success rate of 9.1% and the candidate detection method achieves a detection success rate of 30.1%, validating the effectiveness of our methods.