
Abstract

Distributed big data analytics has become increasingly critical in modern industrial environments, driven by data-intensive applications and the need for real-time processing at the edge. High-Performance Computing (HPC) systems offer robust petabyte-scale capabilities for efficient big data analytics, yet the performance of big data frameworks on ARM-based HPC systems remains underexplored. This paper presents an extensive experimental study of deploying Apache Spark 3.0.2, the de facto standard in-memory processing system, on an ARM-based HPC system. We evaluate Spark with representative big data workloads, including K-means clustering, to assess how latency variations, such as those induced by network delays, memory bottlenecks, or computational overheads, affect application performance in industrial IoT and edge computing environments. Our findings clarify how big data frameworks like Apache Spark can be effectively deployed and optimized on ARM-based HPC systems, particularly when leveraging vectorized instruction sets such as SVE, and so support the broader goal of integrating cloud–edge computing paradigms in modern industrial environments. We also discuss potential improvements and strategies for leveraging ARM-based architectures to support scalable, efficient, and real-time data processing in Industry 4.0 and beyond.

Details

Title
Efficient Parallel Processing of Big Data on Supercomputers for Industrial IoT Environments
Author
Al Jawarneh, Isam Mashhour 1; Rosa, Lorenzo 2; Venanzi, Riccardo 2; Foschini, Luca 2; Bellavista, Paolo 2

1 Department of Computer Science, University of Sharjah, Sharjah P.O. Box 27272, United Arab Emirates
2 Dipartimento di Informatica, Scienza e Ingegneria, University of Bologna, Viale Risorgimento 2, 40136 Bologna, Italy; [email protected] (L.R.); [email protected] (R.V.); [email protected] (L.F.); [email protected] (P.B.)
Publication title
Volume
14
Issue
13
First page
2626
Number of pages
26
Publication year
2025
Publication date
2025
Publisher
MDPI AG
Place of publication
Basel
Country of publication
Switzerland
Publication subject
e-ISSN
2079-9292
Source type
Scholarly Journal
Language of publication
English
Document type
Journal Article
Publication history
 
 
Online publication date
2025-06-29
Milestone dates
2025-05-31 (Received); 2025-06-23 (Accepted)
First posting date
29 Jun 2025
ProQuest document ID
3229142773
Document URL
https://www.proquest.com/scholarly-journals/efficient-parallel-processing-big-data-on/docview/3229142773/se-2?accountid=208611
Copyright
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2025-07-11
Database
ProQuest One Academic