Content area
E-government applications generate and process large volumes of heterogeneous data that demand high-throughput and low-latency computation. Although Hadoop MapReduce is commonly used for such tasks, its performance is often limited by disk I/O constraints and network delays during the shuffle phase. This study proposes a data address-based shuffle mechanism optimized for Hadoop clusters equipped with Solid-State Drives (SSDs), aiming to enhance data processing performance in e-government applications. The mechanism introduces three key components: address-based sorting, address-based merging, and pre-transmission of intermediate data, which collectively reduce disk I/O and network transfer overhead. Experimental evaluations using Terasort and Wordcount benchmarks demonstrate execution time reductions of 8% and 1%, respectively, with statistical significance confirmed through 95% confidence intervals. Scalability assessments on a simulated 50-node cluster and energy profiling further validate the approach, showing improved performance, reduced network congestion, and a 31% decrease in energy consumption compared to HDD-based systems. The findings establish the proposed mechanism as a cost-effective and efficient solution for large-scale data processing in public sector computing environments.
Details
1 Department of Information Systems and Technology, College of Informatics and Virtual Education, The University of Dodoma, Dodoma, Tanzania (ROR: https://ror.org/009n8zh45) (GRID: grid.442459.a) (ISNI: 0000 0001 1998 2954)