Abstract

With the rapid growth of the Internet of Things (IoT) and the emergence of big data, handling massive amounts of data has become a major challenge. Traditional approaches send raw data to cloud data centers for cleaning, processing, and interpretation using data warehouse tools. This study introduces BlueEdge, a fog edge mobile application that shifts the cleaning and preprocessing tasks from the cloud to the edge. We compare BlueEdge with four popular data cleaning tools (WinPure, DoubleTake, WizSame, and DQGlobal) that operate within data warehouse architectures, such as Hadoop servers. The comparison considers time consumption, resource utilization (memory and CPU), and tool performance. BlueEdge uses Natural Language Processing (NLP) techniques, including those from the Natural Language Toolkit (NLTK) and other Python packages, and connects to a real-time database. In our results, BlueEdge achieved accuracy between 72% and 95% across six categories of name-based duplicate detection tasks, indicating competitive performance in mobile edge environments. The framework was validated on a larger dataset of 146 error cases, with confidence interval margins between 3.4% and 5.8%. Statistical comparison shows consistently significant differences (p < 0.05) relative to the baseline settings of the four commercial tools, with large effect sizes (Cohen's d: 0.89–1.34). BlueEdge eliminates duplicates arising from different spellings and pronunciations (78.4%, CI: 73.1–83.7%), misspellings (72.0%, CI: 66.2–77.8%), name abbreviations (90.5%, CI: 86.1–94.9%), honorific prefixes (95.2%, CI: 91.8–98.6%), common nicknames (76.2%, C […] Cross-validation analysis (81.7% ± 2.3%) confirms that edge-based data cleaning performs reliably and consistently.
BlueEdge also uses minimal bandwidth, only 5000 bytes per edge on mobile phones, whereas data warehouses require 10,000–60,000 bytes on Hadoop machines. It is designed to reduce data cleaning time to 1 s at the data edge, compared with the 4–30 s typical of data warehouses. BlueEdge is easy to use, requires no authorization on the mobile device, and is free of charge. The framework was validated through controlled experimental testing and real-world deployment at an IT services company, achieving an overall ITSQM quality score of 8.9/10 and demonstrating practical effectiveness in organizational settings. This foundation has since been extended with neural network-based classification approaches, which are currently under peer review.

Details

1009240
Title
BlueEdge: application design for big data cleaning processing using mobile edge computing environments
Author
Elmobark, Nagwa 1; El-ghareeb, Haitham 1; Elhishi, Sara 1

1 Mansoura University, Faculty of Computer Science, Mansoura, Egypt (GRID:grid.10251.37) (ISNI:0000 0001 0342 6662)
Publication title
Volume
12
Issue
1
Pages
204
Publication year
2025
Publication date
Aug 2025
Publisher
Springer Nature B.V.
Place of publication
Heidelberg
Country of publication
Netherlands
e-ISSN
21961115
Source type
Scholarly Journal
Language of publication
English
Document type
Journal Article
Publication history
 
 
Online publication date
2025-08-21
Milestone dates
2025-08-03 (Registration); 2023-06-11 (Received); 2025-08-03 (Accepted)
First posting date
21 Aug 2025
ProQuest document ID
3241754801
Document URL
https://www.proquest.com/scholarly-journals/blueedge-application-design-big-data-cleaning/docview/3241754801/se-2?accountid=208611
Copyright
© The Author(s) 2025. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2025-11-14
Database
ProQuest One Academic