The growing adoption of machine learning (ML) algorithms in healthcare big data has revolutionized clinical decision-making, predictive analytics, and real-time medical diagnostics. However, applying ML to healthcare big data poses computational challenges, particularly in efficiently processing and training on the large-scale, high-velocity data generated by healthcare organizations worldwide. In response, this study critically reviews state-of-the-art advances in ML algorithms and big data frameworks for healthcare analytics, with particular emphasis on solutions that address data volume and velocity. The reviewed literature is categorized into three key areas: (1) efficient techniques, arithmetic operations, and dimensionality reduction; (2) advanced and specialized processing hardware; and (3) clustering and parallel processing methods. Based on an evaluation of the literature across these categories, key research gaps and open challenges are identified, and important future research directions are discussed in detail. The proposed solutions include federated learning and decentralized data processing, efficient parallel processing through big data frameworks such as Apache Spark, neuromorphic computing, and multi-swarm large-scale optimization algorithms. These solutions highlight the importance of interdisciplinary innovation in algorithm design, hardware efficiency, and distributed computing frameworks, which collectively contribute to faster, more accurate, and more resource-efficient AI-driven healthcare big data analytics and applications. This research supports UNSDG 3 (Good Health and Well-Being) and UNSDG 9 (Industry, Innovation and Infrastructure) by integrating machine learning into healthcare big data and by promoting product innovation in the healthcare industry, respectively.
Details
Accuracy;
Data processing;
Predictive analytics;
Datasets;
Big Data;
Hardware;
Optimization;
Feature selection;
Data analysis;
Machine learning;
Health care industry;
Medical imaging;
Field programmable gate arrays;
Innovations;
Distributed processing;
Efficiency;
Velocity;
Electronic health records;
Health care;
Costs;
Clustering;
Clinical decision making;
Algorithms;
Real time;
Federated learning
Shaker Khalid 3; Abdul Latif Aliza 2; Muda, Zakaria Che 4
1 Department of Informatics, College of Computing & Information Technology, Universiti Tenaga Nasional, Putrajaya Campus, Jalan IKRAM-UNITEN, Kajang 43000, Selangor, Malaysia; Departments of Artificial Intelligence & Information Technology, College of Computer Science and Information Technology, University of Anbar, Ramadi 31001, Iraq
2 Department of Informatics, College of Computing & Information Technology, Universiti Tenaga Nasional, Putrajaya Campus, Jalan IKRAM-UNITEN, Kajang 43000, Selangor, Malaysia
3 Departments of Artificial Intelligence & Information Technology, College of Computer Science and Information Technology, University of Anbar, Ramadi 31001, Iraq
4 Faculty of Engineering and Quantity Surveying, INTI International University, Nilai 71800, Negeri Sembilan, Malaysia