
Abstract

Machine learning (ML) algorithms play a critical role in automated decision-making systems across domains such as healthcare, finance, and autonomous systems. However, these models are increasingly vulnerable to adversarial threats, particularly poisoning attacks that manipulate training data without the knowledge of the ML developers. Because ML models are often trained on publicly available data, data poisoning is trivial for attackers to perform, and the current ML training pipeline offers no way to determine whether training data is legitimate, poisoned during data collection, or poisoned during training.

This dissertation investigates data poisoning attacks, focusing on label flipping and gradient manipulation, two techniques capable of compromising the integrity and performance of ML systems. It also addresses the impact of these attacks on key performance metrics, including accuracy, computational efficiency, and prediction time, across multiple ML algorithms. To establish a foundation for evaluation, I benchmark Decision Trees, K-Nearest Neighbors (KNN), Logistic Regression, Random Forest, and Support Vector Machines (SVM) on manipulated datasets, generating strong baseline performance metrics that enable direct comparison of poisoning attacks.
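The abstract does not specify the poisoning procedure, but a label-flipping attack is straightforward to simulate: randomly reassign a fraction of training labels to a different class before fitting each classifier and compare accuracy against a clean baseline. The following is a minimal sketch assuming scikit-learn; the dataset, poison_rate, and flip strategy are illustrative stand-ins, not the dissertation's exact experimental setup.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

def flip_labels(y, poison_rate, n_classes, rng):
    """Randomly reassign a fraction of labels to a different class."""
    y_poisoned = y.copy()
    n_flip = int(poison_rate * len(y))
    idx = rng.choice(len(y), size=n_flip, replace=False)
    # Shifting by a random nonzero offset (mod n_classes) guarantees
    # each selected label actually changes class.
    offsets = rng.integers(1, n_classes, size=n_flip)
    y_poisoned[idx] = (y_poisoned[idx] + offsets) % n_classes
    return y_poisoned

rng = np.random.default_rng(0)
X, y = load_digits(return_X_y=True)  # small stand-in image dataset
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
y_tr_poisoned = flip_labels(y_tr, poison_rate=0.2, n_classes=10, rng=rng)

models = {
    "DecisionTree": DecisionTreeClassifier(random_state=0),
    "KNN": KNeighborsClassifier(),
    "LogisticRegression": LogisticRegression(max_iter=5000),
    "RandomForest": RandomForestClassifier(random_state=0),
    "SVM": SVC(),
}
for name, model in models.items():
    clean_acc = model.fit(X_tr, y_tr).score(X_te, y_te)
    poisoned_acc = model.fit(X_tr, y_tr_poisoned).score(X_te, y_te)
    print(f"{name}: clean={clean_acc:.3f} poisoned={poisoned_acc:.3f}")
```

Running all five classifiers on the same clean/poisoned split is what makes the per-algorithm accuracy drops directly comparable.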

Building on this foundation, I introduce DynaDetect (Perry, 2024), a novel KNN-based algorithm designed to detect data poisoning attacks in real time. I further develop DynaDetect2.0, an improved version that integrates Convolutional Neural Networks (CNNs) for feature extraction and the Mahalanobis distance for more accurate detection in high-dimensional data. I show the viability of DynaDetect2.0 on the CIFAR-10, ImageNet, and GTSRB datasets, where it outperforms both DynaDetect and traditional KNN in detecting label-flipping and gradient poisoning attacks.
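The abstract names CNN feature embeddings and the Mahalanobis distance as the ingredients of DynaDetect2.0 without spelling out the detection rule. A common construction, presented here as an assumption rather than the dissertation's actual method, scores each training sample by its Mahalanobis distance to its class centroid in embedding space and flags high-distance outliers as suspected label flips:

```python
import numpy as np

def mahalanobis_scores(features, labels):
    """Score samples by squared Mahalanobis distance to their class centroid.

    `features` is assumed to be a (n_samples, n_dims) array of CNN
    embeddings; a shared, regularized covariance keeps the estimate
    stable when n_dims is large relative to per-class sample counts.
    """
    n, d = features.shape
    cov = np.cov(features, rowvar=False) + 1e-3 * np.eye(d)  # regularize
    cov_inv = np.linalg.inv(cov)
    scores = np.empty(n)
    for c in np.unique(labels):
        mask = labels == c
        diff = features[mask] - features[mask].mean(axis=0)
        # d_M(x)^2 = (x - mu_c)^T  Sigma^{-1}  (x - mu_c)
        scores[mask] = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return scores

def flag_suspects(scores, percentile=95):
    """Flag the highest-scoring samples; the threshold is an assumption."""
    return scores > np.percentile(scores, percentile)
```

Unlike the plain Euclidean distances used by traditional KNN, the inverse-covariance weighting accounts for scale differences and correlations between CNN feature dimensions, which is what makes the Mahalanobis distance better suited to high-dimensional embeddings.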

To better understand the impact of DynaDetect2.0, I assess the vulnerability of multiple ML algorithms to poisoning attacks and examine potential new detection methods. This work emphasizes computational overhead, efficiency, and latency, ensuring these algorithms can rapidly and accurately detect data poisoning in real-world scenarios. The results provide insights into the conditions under which ML algorithms are most vulnerable to poisoning attacks and offer effective strategies for identifying these threats.

This dissertation's anticipated contributions include advancements in detection mechanisms, such as DynaDetect2.0, and the application of these techniques to other traditional ML algorithms. By improving ML systems' resilience, this work aims to strengthen their security and reliability, ensuring they can withstand sophisticated malicious attacks in diverse application environments.

Details

Title
Improving Detection Capabilities of Traditional Machine Learning (ML) Algorithms Against Data Poisoning Attacks on Image Data
Number of pages
179
Publication year
2025
Degree date
2025
School code
0131
Source
DAI-B 87/1(E), Dissertation Abstracts International
ISBN
9798288861055
Committee member
Brown, Joshua; Chen, Yixin; Hua, Yi; Wang, Feng
University/institution
The University of Mississippi
Department
Computer Science
University location
United States -- Mississippi
Degree
Ph.D.
Source type
Dissertation or Thesis
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
32000834
ProQuest document ID
3232316093
Document URL
https://www.proquest.com/dissertations-theses/improving-detection-capabilities-traditional/docview/3232316093/se-2?accountid=208611
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database
ProQuest One Academic