Abstract

With the rise of technology, data and its analysis have evolved from scattered values and attributes in spreadsheets into a means of revolutionizing any substantial industry. Data can be corrupted in many unethical and unlawful ways, so it is important to find a way to effectively detect and highlight all corrupted data in a dataset. Detecting corrupted data, or restoring information from a corrupted dataset, is not an easy task. The problem is crucial: if it is not handled early enough, it can cause issues in later data processing with machine learning or deep learning methods. Rather than focusing on outlier identification, this study introduces PAACDA (Presence driven Adamic Adar Corruption identification Algorithm) for identifying corrupted data and then consolidates the findings. Current state-of-the-art models such as Isolation Forest and DBSCAN (Density-Based Spatial Clustering of Applications with Noise) rely on parameter tuning to achieve high accuracy and recall, yet they still show considerable uncertainty when they factor in corrupted data. This study investigates the specific performance problems of several unsupervised learning methods on corrupted linear and clustered datasets. In addition, we provide a new PAACDA technique that achieves a higher precision, 96.35% for clustered data and 99.04% for linear data, than previous unsupervised learning benchmarks on 15 prominent baselines, including K-means clustering, Isolation Forest, and LOF (Local Outlier Factor). The paper also reviews the relevant literature in depth from these angles. Finally, we identify the problems with current methods and suggest ways forward for research in this area.

Details

Title
Comprehensive Data Corruption Identification Using Machine Learning Algorithms (PAACDA)
Author
Vanitha, M 1; Maneesha, K 2; Sri, K Uma Renu 2; Nancy, K 2

1 Professor, Department of CSE, Malla Reddy Engineering College for Women, Autonomous, Hyderabad
2 Student, Department of CSE, Malla Reddy Engineering College for Women, Autonomous, Hyderabad
Volume
15
Issue
3
Pages
144-153
Publication year
2024
Publication date
2024
Publisher
Ninety Nine Publication
Place of publication
Gurgaon
Country of publication
India
Publication subject
e-ISSN
13094653
Source type
Scholarly Journal
Language of publication
English
Document type
Journal Article
ProQuest document ID
3114535215
Document URL
https://www.proquest.com/scholarly-journals/comprehensive-data-corruption-identification/docview/3114535215/se-2?accountid=208611
Copyright
© 2024. This work is published under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2024-10-17
Database
ProQuest One Academic