Lung cancer remains one of the most widespread and deadly cancers worldwide. Early detection reduces mortality; however, several challenges hinder the development and deployment of effective predictive models: the high computational cost of evaluating large-scale medical data, the privacy and security of medical data, limited data sharing between medical organizations, and the black-box nature of AI-based models. These limitations have made conventional approaches to lung cancer prediction difficult to apply, especially in real-time clinical practice settings. To address these challenges, this research introduces a novel lung cancer prediction model built on an integrated framework combining MapReduce, Private Blockchain, Federated Learning (FL), and Explainable Artificial Intelligence (XAI). The framework improves lung cancer detection by using MapReduce to handle large lung cancer datasets, supporting rapid and scalable learning. A Private Blockchain provides secure, tamper-proof, and immutable processing of patient information, whereas FL allows healthcare institutions to train models together without compromising patients’ privacy. Moreover, XAI improves the model’s interpretability so clinicians can understand and rely on AI predictions. Together, these methods improve AI’s efficiency and trustworthiness in medical applications. The proposed model delivers more accurate and secure lung cancer predictions while ensuring interpretability and collaboration. With an accuracy of 98.21% and a miss rate of just 1.79%, it outperforms previously published approaches, establishing a new benchmark for privacy-preserving, explainable, and scalable AI models in healthcare.
Introduction
Healthcare is one of the most crucial determinants of people’s and communities’ quality of life worldwide. Access to affordable, quality healthcare has become a fundamental human right essential for disease prevention and overall well-being. Ongoing technological and medical advances continue to improve healthcare technologies, thereby reducing costs and improving the provision of treatment. Rapid advancements in medicine have led to the development of new treatments for serious illnesses, including cancer. Cancer is the second biggest killer worldwide, after heart disease. In the United States alone, a projected total of 1,865,190 new cases and 609,360 deaths were expected in 2022. Further, the probability that a new candidate drug for cancer treatment succeeds is less than 10%, which makes cancer even more difficult to combat1.
Lung cancer remains the leading cause of cancer-related deaths worldwide, accounting for 18% of all cancer fatalities2. Smoking, the leading cause of lung cancer, has either plateaued or is increasing in some countries, which implies that lung cancer incidence will continue to grow for at least several decades3. It has been shown that if lung cancer is detected at an early stage and appropriately diagnosed, the fatality rate can be reduced substantially4,5. Only 10–20% of lung cancer patients survive for five years after diagnosis. Computed Tomography (CT) and Magnetic Resonance Imaging (MRI) are among the tests that patients frequently undergo during the early stage of the disease to enhance their survival rates6,7.
The advances in Deep Learning (DL) have made it possible for Computer-Assisted Diagnosis (CAD) systems to analyze and identify visual features on their own8; therefore, many medical image-processing techniques have been implemented. Convolutional Neural Networks (CNNs) can effectively extract features using both shallow and deep models, demonstrating that relevant information can be captured at different levels9. There are two major lung cancer classifications: Small-Cell Lung Cancer (SCLC) and Non-SCLC (NSCLC). In general, lung cancer has its roots in several factors, such as smoking, carcinogens in the air, gender, genetics, and age10. Chronic smoking is known to be a leading cause of lung cancer11. Smoking-related lung cancer is not limited to individuals who smoke; it also affects those exposed to second-hand smoke. The early warning signs of this killer disease include yellow fingers, anxiety, chronic illness, fatigue, reactions to allergens, wheezing, coughing up blood, hoarseness, breathing difficulty, bone pain, headache, swallowing problems, and chest discomfort12.
With the growing availability of CT and other medical imaging data, advanced computational techniques have become increasingly essential. MapReduce, a distributed computing framework, enables efficient processing of large-scale lung cancer datasets by parallelizing data-intensive tasks, such as feature extraction and classification, across multiple nodes13,14. Furthermore, private blockchain technology ensures data security, integrity, and privacy by creating an immutable ledger for managing sensitive patient data during collaborative research or multi-institutional studies15, 16–17. Together, these technologies facilitate the development of robust, scalable, and privacy-preserving lung cancer diagnostic systems. However, prior works using MapReduce or Blockchain in healthcare have focused mainly on scalability and security in isolation, with little attention to interpretability, limiting clinical adoption.
In general, detecting a cancer case at an early stage, through accurate diagnosis and subsequent appropriate treatment, improves the possibility of a cure18. However, regardless of the medical tools used, only authorized specialists are allowed to interpret medical information to identify diseases. Specialists’ opinions can also differ, because interpreting medical images is challenging. Consequently, there is a great need for intelligent, assisted diagnosis systems in the medical domain. Recently, Machine Learning (ML) and Deep Learning (DL) algorithms have been widely applied to medical image analysis19, 20–21, and are also finding applications in other fields such as Intelligent Transportation Systems (ITS)22, 23–24, smart healthcare systems25, 26, 27, 28–29, smart agricultural systems29, 30–31 and secure communication mechanisms33,34.
In practice, ML algorithms perform numerous specific tasks by recognizing precise correlations in data. Raw inputs such as images are difficult for ML algorithms to process35, 36, 37–38 because of the work entailed in feature engineering and feature selection. DL algorithms, a subcategory of ML, are instead modelled with one or more hidden layers between the input and output layers. These models require a considerable amount of data and high-end workstations to run correctly. The scientific community is turning its attention to DL for reasons including flexibility, high performance and throughput, and applicability across multiple disciplines39. DL methods outperform ML methods in areas such as the classification of biological images40,41. Nevertheless, both ML and DL models are known as ‘Black Box’ models, as neither reveals the exact analysis performed during prediction42,43. Therefore, end users such as doctors cannot untangle the complexity of the prediction process or assess how the results were produced. This has reduced the utilization of artificial intelligence models by healthcare providers, who still rely on medically grounded diagnoses to inform a prognosis. Thus, justification and interpretation of model output become necessary to increase trust in and adoption of such systems, especially in fields of application where a single incorrect prediction can endanger human lives (autonomous driving, medical applications). Furthermore, when a medical professional discusses a therapeutic option with a patient, an explanation of the decision is, as a rule, a prerequisite for building trust.
However, while ML and DL algorithms have shown remarkable performance in analyzing medical images, their black-box nature limits trust and adoption in clinical settings44. To address this, XAI methods are increasingly employed to provide interpretability and transparency by highlighting the critical regions of medical images influencing the model’s predictions45. By enabling clinicians to understand and validate AI-driven decisions, XAI bridges the gap between complex algorithms and real-world clinical applications, paving the way for trustworthy and interpretable lung cancer diagnostic systems. Yet, most existing XAI-based studies do not combine explainability with federated privacy and scalability, leaving a critical research gap in the development of trustworthy and secure clinical AI systems.
The growing reliance on AI in healthcare highlights the need for privacy-preserving and interpretable decision-making, especially when sharing sensitive medical data. As healthcare institutions increasingly collaborate, ensuring patient privacy becomes critical due to data sensitivity and legal constraints. At the same time, for AI to be trusted by clinicians, it must provide transparent, understandable predictions. This motivates the integration of privacy-preserving technologies, like private blockchain, MapReduce, and FL, with interpretable XAI models, ensuring secure data sharing while allowing healthcare professionals to understand and trust the decisions made by AI systems, thereby paving the way for ethical and reliable AI-driven solutions in healthcare. In addition, the framework is designed to comply with healthcare data protection regulations such as the Health Insurance Portability and Accountability Act (HIPAA) and the General Data Protection Regulation (GDPR), ensuring ethical and secure handling of sensitive patient information. This research’s key contributions include using MapReduce, Private Blockchain, Federated Learning, and XAI to address the challenges of scalability, data security, collaborative learning, and interpretability in lung cancer prediction.
The rest of the manuscript is organized as follows: Sect. 2 provides a literature review of lung cancer prediction with identified challenges; Sect. 3 highlights the limitations of previous approaches; Sect. 4 outlines the detailed contributions of the proposed model; Sect. 5 describes the proposed method; and Sect. 6 presents the simulation results and performance analysis. Finally, Sect. 7 concludes the paper and outlines future research directions.
Literature review
Lung cancer, which is considered one of the most fatal types of cancer, has contributed to the development of new diagnostic tools. Advanced imaging analysis through ML and DL technology, including hybrid models and gene expression-based approaches, has enhanced the early and accurate identification of diseases21,46. However, some issues are the privacy of the patient data, regulatory constraints, and obtaining trust through interpretability. Several research works have investigated new AI achievements to solve the problems in lung cancer detection. However, despite such advancements, significant challenges like data privacy, interpretability, and regulatory compliance remain, as shown by the subsequent studies.
The researchers in47 proposed a two-step approach for detecting lung cancer from histopathology images. The first step involved image preprocessing techniques, including contrast enhancement, histogram equalization, median filtering, adaptive thresholding, and image segmentation, to improve image quality and prepare it for analysis. In the second step, a basic CNN model was used for image categorization, characterized by three layers of convolution, two layers of max-pooling, and a final layer of fully connected neurons. The model was trained on a large dataset of histopathology images from different sources, proving the effectiveness of the proposed model in lung cancer detection. However, the model lacked interpretability and privacy considerations, which are addressed in the proposed model.
In48, the authors presented a DL model in the encoder-decoder form to segment and detect lung cancer in chest radiographs. To improve the approach’s stability and accuracy, normal and inverted Chest Radiographs were incorporated into the design. The method showed high sensitivity and low false positivity when detecting lung cancer. However, there are some drawbacks to the technique. For example, it was not compared with other methods that have been proposed so far. In addition, the study did not incorporate explainability or secure distributed learning, unlike the proposed approach.
In49, an intelligent lung cancer detection system was presented, combining many techniques to detect lung cancer with high accuracy. The system integrates multilevel brightness-preserving methods for image enhancement, an advanced deep neural network for region segmentation, blended spiral optimization with an intelligent generalized rough set for feature selection, and an ensemble classifier for precise cancer classification. It shows that efficient, more thorough lung cancer detection is possible through this multifaceted strategy. However, this system did not ensure interpretability or incorporate secure collaborative learning mechanisms, both of which are central to the proposed model.
In50, a weakly supervised lung cancer detection and diagnosis network (WS-LungNet) was introduced. It employs semi-supervised learning for nodule segmentation and a cross-attention mechanism for patient-level malignancy evaluation. WS-LungNet effectively leverages unlabelled data and explores correlations among detected nodules, resulting in improved performance in both nodule detection and malignancy evaluation at the patient level. The method addresses key challenges such as label scarcity and annotation inconsistencies in nodule segmentation tasks. Yet, WS-LungNet lacked privacy-preserving federated approaches and explainability features, which limit its clinical applicability compared to the proposed integrated solution.
In51, Xu reimplemented a CNN model based on ResNet to address the issue of low learning efficiency. The study also compared three CNN architectures, revealing that deeper neural networks exhibit superior learning efficiency and achieve higher accuracy in classifying lung CT images. This highlighted the potential of deeper architectures in enhancing performance for lung cancer detection tasks.
References52,53 applied DL algorithms for lung cancer detection, achieving an accuracy of up to 77%. In contrast, the authors of54 proposed a multimodal late fusion approach that integrates hand-crafted features from radiomics, pathomics, and clinical data to predict radiotherapy outcomes for NSCLC patients. Tested on a cohort of 33 patients, this method achieved an AUC of 90.9%, outperforming unimodal approaches and highlighting the potential of data integration to improve precision medicine in lung cancer treatment.
In55, the authors developed a 3-year lung cancer risk prediction model using substantial real-world data and ML. The model demonstrated transportability to Electronic Health Records (EHR) and claims data, identifying a high-risk group with a lung cancer incidence nine times higher than the average cohort. By detecting high-risk patients and integrating with an open-source platform, the model can potentially decrease physician burden. Similarly, the authors of56 endorsed the application of CXR-LC, an open-source DL tool, to identify high-risk people for lung cancer screening by incorporating Electronic Medical Records (EMR). CXR-LC successfully detected high-risk individuals, including those ineligible under the latest CMS recommendations. This tool could enhance screening participation and potentially reduce lung cancer mortality rates. Table 1 shows a comprehensive analysis of a few related studies.
Recent research works have produced improved analysis of lung diseases through hybrid CNN models57 and refined lung section segmentation through UNet by applying two-fold training58. A detailed analysis of omics data processing through AI methods has identified scalability and interpretability as primary hurdles59.
In21, feature imbalance and computational complexity issues were addressed through the development of a multi-layered deep learning perceptron (MLP) in health risk prediction. By utilizing feature extraction and selection on benchmark datasets like Wisconsin Breast Cancer, SaHeart, and Pima Indians Diabetes, they were able to outperform traditional classifiers (SVM, KNN, K-means) and exhibit improved performance when it comes to the ability to process large-scale medical data. Following this trend, Bikku et al.46 concentrated on the inability of traditional methods to detect temporal relationships inherent in genomic data. They proposed a hybrid LSTM-SVM biclustering algorithm by integrating sequential learning and robust pattern recognition, attaining significant relative gains of 10% accuracy, 7% recall, and 8% F-measure over existing methods like Hidden Markov Models, standalone SVM, and Recurrent Neural Networks. The combination of these studies underscores the increasing importance of hybrid and deep learning models in the analysis of critical healthcare data challenges, including scalability, temporal dependency, and predictive accuracy.
Table 1. Comprehensive overview of various lung cancer prediction models.
| Reference | Model | Outcome | Computational Efficiency | MapReduce | Private Blockchain | FL | XAI | Predictive Model | Limitations |
|---|---|---|---|---|---|---|---|---|---|
Nannapaneni, D. et al., 202347 | CNN (3 conv, 2 max-pooling, 1 FC) | Detected lung cancer from histopathology images | Preprocessing improved image quality and analysis | × | × | × | × | Image classification (histopathology) | Limited comparison with advanced methods or interpretability |
Shakeel, P.M. et al., 202248 | Enhanced DNN, hybrid optimization, ensemble classifier | Robust lung cancer detection | Feature selection and image enhancement improved efficiency | × | × | × | × | Multilevel brightness preservation, region segmentation | Lacks interpretability and real-world validation |
Shen, Z. et al., 202350 | WS-LungNet (semi-supervised DL) | Improved nodule detection and malignancy evaluation | Semi-supervised learning utilized unlabelled data | × | × | × | × | Nodule segmentation, malignancy prediction | Needs further validation and comparison with supervised methods |
Xu, H., 202351 | CNN (ResNet-based, deeper architectures) | Improved accuracy in lung CT image classification | Addressed low learning efficiency using deeper networks | × | × | × | × | Classification of lung CT images | Limited privacy and interpretability |
Chandran, U. et al., 202355 | ML-based lung cancer risk prediction model | Identified a high-risk group with 9x higher lung cancer incidence | Scalable predictions using real-world EHR data | × | × | × | × | Risk prediction for high-risk groups using EHR and claims data | Lacks advanced privacy and interpretability mechanisms |
Shaffie, A. et al., 201960 | MGRF with 3D HOG, stacked auto-encoder | Feature extraction and fusion for lung cancer classification | Efficiently analyzed large datasets using fusion techniques | × | × | × | × | Diagnosis, prognosis, survival prediction (CT/MRI) | Lacks privacy-preserving techniques and interpretability |
Agrawal, A. et al., 201161 | Combined voting classifier | Predicted survival with 90% accuracy at 6 months, 1 year, and 5 years | Utilized the SEER dataset for efficient survival prediction | × | × | × | × | Survival prediction using SEER | Limited external validation and interpretability |
Mohalder, R.D. et al., 202262 | CNN (ReLU, softmax) | Detected abnormal tumor patterns in HPI | Effectively analyzed complex tumor data | × | × | × | × | Tumor pattern recognition in CRC (HPI) | Limited comparison with state-of-the-art methods, and lacks interpretability. |
Su, Y. et al., 202263 | WGCNA with Lasso, DT, RF, SVM | Detected colon cancer, staging via differential gene expression | Reduced complexity using Lasso and WGCNA | × | × | × | × | Gene-based classification and staging | Lack of privacy, collaborative learning, and interpretability |
Garg, S. et al., 202064 | Pre-trained CNNs (MobileNet, InceptionResNetV2, etc.) | Detected colon and lung cancer (HPI) | Enhanced efficiency with pre-trained models and augmentation | × | × | × | × | Cancer detection using histopathology | Lacks privacy, collaborative learning, and interpretability |
Li, L. et al., 201965 | DL-CAD system | Detected lung nodules (< 3 mm), malignancy prediction | Efficiently analyzed small nodules (LIDC-IDRI, NLST datasets) | × | × | × | × | Lung nodule detection and malignancy prediction | Moderate accuracy (86.2%), lacks privacy and interpretability |
Teramoto, A. et al., 201766 | Deep CNN (DCNN) | Automated classification of lung cancer with 71% accuracy | Small dataset (76 cases), limited computational demands | × | × | × | × | Lung cancer classification | Low accuracy, small dataset, lacks privacy, and interpretability. |
Proposed interpretable global model | CNN, EfficientNetB0, InceptionV3, DenseNet121 | High predictive accuracy (98.21%) in lung cancer diagnosis | Enhanced through MapReduce for distributed processing | ✔ | ✔ | ✔ | ✔ | Lung cancer prediction using aggregated and decentralized learning | Requires synchronization and computational resources |
Limitations of the previous approaches
Lung cancer prediction systems face several critical challenges that impede their efficiency, security, and clinical acceptance. These systems often struggle to process large-scale datasets efficiently, ensure the privacy and security of sensitive medical data, foster collaborative learning among institutions, and provide interpretability that builds trust among clinicians. Overcoming these limitations is vital for the widespread adoption of AI-driven solutions in lung cancer diagnosis and prediction. The key limitations in the field, summarized in Table 1, are outlined below:
Computational inefficiency
Existing systems struggle to process vast lung cancer images and patient datasets due to the large datasets’ demand for significant computational power. This often leads to delays in model training and slow diagnostic support, which can negatively impact timely clinical intervention and patient outcomes in real-world hospital workflows47,49,51,60,66.
Privacy and security concerns
Centralized data storage methods expose sensitive patient information to potential security breaches and cyber threats, while privacy concerns over unauthorized use further restrict data-sharing among healthcare sectors. In real-world practice, these issues prevent hospitals from pooling datasets, which hinders collaborative AI development and leads to fragmented and potentially biased AI models55,60,63.
Lack of collaborative learning
Privacy regulations limit the direct sharing of patient data between healthcare sectors, leading to fragmented data collection. This fragmentation reduces the diversity of data needed to train robust models that generalize well to different patient populations and clinical settings, ultimately limiting the model’s reliability when deployed across various hospitals50,55,63,64,66.
Limited model interpretability
Many AI models operate as opaque “black boxes,” offering predictions without clear explanations for clinicians. In practical terms, this lack of transparency means clinicians cannot verify whether the AI’s decisions align with known medical criteria, which can slow adoption of AI in lung cancer diagnosis and limit its usefulness in supporting clinical decisions47,49, 50–51,55,65.
These real-world challenges highlight the urgent need for an integrated solution that ensures computational efficiency, data privacy, collaborative learning, and interpretability to facilitate safe and effective AI deployment in lung cancer diagnosis.
Contribution of the proposed model
To overcome the challenges faced by existing lung cancer prediction systems, this study proposes a comprehensive framework that addresses key limitations. The contributions of this work are outlined below:
Improving computational efficiency
To address the challenges of handling large-scale lung cancer datasets, this study utilizes the MapReduce framework, a distributed computing approach that significantly enhances processing efficiency. MapReduce shortens long computing times by distributing computational tasks across different machines, supports large-scale data preprocessing and efficient distribution of medical imaging data, and avoids overloading specialized system resources, improving large-scale learning for AI model training.
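The map, shuffle, and reduce phases described above can be sketched in plain Python. This is an illustrative toy, not the paper's actual pipeline: the record layout and per-class counting task are assumptions, standing in for the distributed feature-extraction jobs a real cluster would run.

```python
from collections import defaultdict

def map_phase(records):
    """Mapper: emit one (key, value) pair per image record."""
    for record in records:
        yield record["label"], 1

def shuffle_phase(pairs):
    """Shuffle: group values by key, as the framework does across nodes."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reducer: aggregate each key's values (here, a simple count)."""
    return {key: sum(values) for key, values in groups.items()}

# Toy records standing in for distributed CT-scan metadata (hypothetical).
records = [
    {"image": "ct_001.png", "label": "malignant"},
    {"image": "ct_002.png", "label": "benign"},
    {"image": "ct_003.png", "label": "malignant"},
    {"image": "ct_004.png", "label": "normal"},
]

counts = reduce_phase(shuffle_phase(map_phase(records)))
print(counts)  # per-class record counts
```

In a production setting the same map/reduce functions would be submitted to a Hadoop or Spark cluster, which handles partitioning and node failures transparently.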
Enhancing privacy and security
Understanding the importance of protecting patient information, this research incorporates Private Blockchain technology to prevent alteration of shared information. The blockchain framework also protects data and keeps it from unauthorized access, which is essential for healthcare sectors that need to share data with partners but are bound by privacy rules.
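The tamper-evidence property described above rests on hash chaining: each block stores the hash of its predecessor, so altering any record invalidates every later link. The sketch below is a minimal stdlib-only illustration of that principle, not the paper's blockchain implementation; a production private blockchain (e.g. Hyperledger Fabric) adds consensus and access control on top.

```python
import hashlib
import json

def block_hash(block):
    """Deterministic SHA-256 digest of a block's contents."""
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def append_block(chain, record):
    """Link a new block to the current chain tip."""
    prev = chain[-1]
    block = {
        "index": prev["index"] + 1,
        "prev_hash": block_hash(prev),
        "record": record,  # e.g. a hash of the patient scan, not raw data
    }
    chain.append(block)
    return block

def verify(chain):
    """Tamper check: every block must reference its predecessor's hash."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )

chain = [{"index": 0, "prev_hash": "0" * 64, "record": "genesis"}]
append_block(chain, {"scan_id": "ct_001", "sha256": "ab12..."})
append_block(chain, {"scan_id": "ct_002", "sha256": "cd34..."})
assert verify(chain)

chain[1]["record"] = {"scan_id": "ct_001", "sha256": "tampered"}
print(verify(chain))  # False: any alteration breaks the chain
```

Note that only fingerprints of patient data would be stored on-chain; the raw scans stay in the institutions' own storage.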
Fostering collaborative learning
To address the problem of data isolation in healthcare, FL is implemented to enable distributed model training across healthcare sectors. This approach allows the model to learn from multiple and decentralized datasets, without transferring specific patient data, thus avoiding violations of privacy legislation, as well as enhancing the model’s cross-site transferability. This conceptually aligns with privacy standards (HIPAA/GDPR) by ensuring that raw patient data remains on local devices while only model parameters are shared.
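The federated training loop above can be sketched with FedAvg-style aggregation, in which each site trains locally and the server averages the resulting weights in proportion to local sample counts. The logistic-regression local model and the toy data below are assumptions for illustration; only model parameters, never patient records, cross site boundaries.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One site's local training: logistic-regression gradient steps."""
    w = weights.copy()
    for _ in range(epochs):
        preds = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (preds - y) / len(y)
    return w

def federated_average(site_weights, site_sizes):
    """Server step: weight each site's model by its local sample count."""
    total = sum(site_sizes)
    return sum(w * (n / total) for w, n in zip(site_weights, site_sizes))

rng = np.random.default_rng(0)
global_w = np.zeros(3)
# Three hypothetical hospitals, each with 40 local samples.
sites = [(rng.normal(size=(40, 3)), rng.integers(0, 2, 40).astype(float))
         for _ in range(3)]

for _ in range(10):  # communication rounds
    local_ws = [local_update(global_w, X, y) for X, y in sites]
    global_w = federated_average(local_ws, [len(y) for _, y in sites])
print(global_w.shape)
```

The same pattern scales to deep networks: `local_update` becomes a few epochs of SGD on the site's GPU, and the server averages the resulting state dictionaries.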
Improving model interpretability
To promote trust and transparency in AI systems, this study incorporates XAI approaches to address the ‘black box’ issue prevalent in AI solutions. XAI enables clinicians to inspect the significant factors contributing to the AI’s output, which may make AI systems more acceptable in clinical practice.
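One simple way to surface the image regions driving a prediction is occlusion sensitivity: slide a neutral patch over the image and record how much the model's score drops. This is an illustrative stand-in (the framework could equally use Grad-CAM, LIME, or SHAP); the toy "model" below, which scores a fixed bright region, is an assumption for demonstration.

```python
import numpy as np

def occlusion_map(image, predict_fn, patch=4):
    """Slide a grey patch over the image; score drops mark important regions."""
    base = predict_fn(image)
    h, w = image.shape
    heat = np.zeros((h // patch, w // patch))
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = image.mean()
            heat[i // patch, j // patch] = base - predict_fn(occluded)
    return heat

def toy_predict(img):
    # Hypothetical classifier that keys on a bright nodule-like blob.
    return img[8:12, 8:12].mean()

img = np.zeros((16, 16))
img[8:12, 8:12] = 1.0  # bright "nodule"
heat = occlusion_map(img, toy_predict, patch=4)
print(heat.argmax())  # flattened index of the most influential patch
```

Overlaying `heat` on the CT slice gives clinicians a direct visual check of whether the model attends to the nodule rather than background artifacts.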
This work addresses the critical limitations of lung cancer prediction systems by employing a multi-faceted approach that integrates scalable computational methods, privacy-preserving frameworks, decentralized learning, and enhanced interpretability. This study provides the foundation for secure, efficient, and trustworthy AI-driven solutions in lung cancer diagnosis by tackling these challenges.
This research develops a novel AI-driven framework incorporating MapReduce, Private Blockchain, FL, and XAI capabilities. This framework delivers performance improvements, privacy features, scalability, collaborative abilities, and clinical interpretability, proving it as a forward-looking real-world medical AI application solution.
Proposed methodology
Lung cancer prediction remains a significant challenge due to the complex, sensitive nature of medical data and the need for secure, collaborative, and interpretable solutions suitable for clinical use. Addressing these challenges is essential to developing AI-driven solutions that provide accurate predictions and gain healthcare professionals’ trust. The proposed methodology builds upon these imperatives, introducing a robust framework designed to overcome these obstacles and deliver practical, real-world applicability in lung cancer diagnosis, as shown in Fig. 1.
Fig. 1 [Images not available. See PDF.]
Overview of the proposed lung cancer prediction framework.
In Fig. 1, a comprehensive lung cancer prediction framework is illustrated, integrating advanced technologies to enhance data collection, processing, and validation. The Lung Cancer Data Collection System gathers medical imaging data, such as CT scans, from various healthcare sources, including hospitals, healthcare mobile devices, and remote monitoring enabled by the Internet of Medical Things (IoMT). The collected data is transmitted via secure communication networks for centralized management. The system processes collected data through multiple stages. Initially, Data Pre-Processing ensures image quality and prepares it for analysis. MapReduce distributes and processes large-scale lung cancer data across various nodes, significantly improving computational efficiency. To ensure security and privacy, Private Blockchain Technology is integrated, creating a tamper-proof and secure data-sharing environment. The Data Processing and Data Post-Processing stages are performed within a Cloud Computing environment, enabling scalable storage and computation.
The Validation Phase ensures the model’s robustness and reliability through comprehensive testing of processed data. The system combines cloud computing capabilities with blockchain security, enabling secure, efficient, and trustworthy lung cancer diagnosis and prediction.
Fig. 2 [Images not available. See PDF.]
Abstract view of the proposed framework.
Figure 2 presents a cloud-centric lung cancer diagnosis and prediction framework, leveraging cloud big data storage as the central hub for efficient, scalable, and secure data management. The process begins with collecting lung cancer data from various sources, such as healthcare systems, IoT devices, and imaging tools, which are then uploaded to the cloud. The Data Pre-Processing module ensures that the collected data undergoes histogram, noise reduction, edge detection, and colour space transformation, preparing it for further analysis. Cloud-based MapReduce efficiently handles large-scale data by distributing computational tasks (Map, Reduce, and Shuffle), significantly enhancing processing speed and scalability. To ensure privacy and data integrity, private blockchain technology facilitates tamper-proof, secure data sharing across institutions. The data processing phase integrates XAI, providing interpretable results to enhance trust and transparency in AI predictions. Post-processed outputs, such as ranked alerts and false-positive reductions, are stored and visualized through cloud-based dashboards, enabling real-time access to actionable insights. This cloud-driven framework ensures seamless interconnectivity, scalability, and security, positioning cloud computing as the backbone of advanced lung cancer diagnosis systems.
Fig. 3 [Images not available. See PDF.]
Local server-side of the proposed framework for lung cancer prediction.
Figure 3 illustrates the flow of the proposed model for lung cancer prediction. The process begins with the Lung Cancer Data Collection phase, where data is gathered from a Kaggle dataset83 that includes CT scans and X-rays. The dataset comprises 120 benign, 561 malignant, and 416 normal cases, offering a reliable foundation for model training and validation. Preprocessing and augmentation techniques were applied to enhance data quality and address dataset limitations. Figure 4 shows a few random sample images, including benign cases, malignant cases, and normal images. This raw data is then passed into the Preprocessing Layer, where techniques such as normalization, noise reduction, edge detection, and colour space transformation are applied to clean and prepare the data for analysis.
Fig. 4 [Images not available. See PDF.]
Random sample images used for training and testing from the dataset83.
Figure 5 shows the distribution of pixel intensities in the normalized images, where the x-axis represents pixel intensity values (ranging from 0 to 1), and the y-axis indicates their frequency. Most pixel intensities are concentrated near 0 (darker regions), corresponding to background areas or low-density tissues. At the same time, smaller peaks appear at higher intensity values (0.7–0.9), representing brighter regions like dense tissues or lung nodules. The normalization process scales the pixel values uniformly, enhancing contrast and ensuring consistency in the dataset, which is essential for accurate analysis and improved performance of DL models.
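The normalization and histogram described above can be sketched as a min-max rescaling of 8-bit grayscale pixels into [0, 1], followed by a binned intensity count. The tiny 2×2 array below is a placeholder for a real CT slice.

```python
import numpy as np

def normalize(img):
    """Min-max scale pixel intensities into [0, 1]."""
    lo, hi = img.min(), img.max()
    if hi == lo:
        return np.zeros_like(img, dtype=float)
    return (img - lo) / (hi - lo)

# Toy 8-bit "image" standing in for a CT slice (assumption).
raw = np.array([[0, 64], [128, 255]], dtype=np.uint8)
norm = normalize(raw.astype(float))

# Intensity histogram like the one in Fig. 5: counts per intensity bin.
hist, edges = np.histogram(norm, bins=10, range=(0.0, 1.0))
print(norm.min(), norm.max())  # 0.0 1.0
```

Applying the same rescaling uniformly across the dataset ensures all images share the intensity range the DL models were trained to expect.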
Fig. 5 [Images not available. See PDF.]
Distribution of pixel intensities in normalized images.
Fig. 6 [Images not available. See PDF.]
Sample normalized image.
Fig. 7 [Images not available. See PDF.]
Noise reduction using Gaussian Blur and Median Filtering on lung images.
Figure 6 shows a normalized lung CT image, resized to an input resolution of 224 × 224, where pixel intensities are scaled to enhance contrast. This highlights key structures like tissues and nodules, making it easier for DL models to identify abnormalities, such as potential tumors, for accurate lung cancer prediction.
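The scaling described above can be sketched in a few lines of NumPy. This is an illustrative min-max normalization to the [0, 1] range, under the assumption (consistent with the pixel-intensity range in Fig. 5) that the pipeline rescales each image by its own minimum and maximum; the toy "scan" below is synthetic.

```python
import numpy as np

def normalize_image(img: np.ndarray) -> np.ndarray:
    """Min-max scale pixel intensities to the [0, 1] range."""
    img = img.astype(np.float64)
    lo, hi = img.min(), img.max()
    if hi == lo:                      # guard against constant images
        return np.zeros_like(img)
    return (img - lo) / (hi - lo)

# Toy 8-bit "CT slice": dark background with a brighter nodule-like patch.
scan = np.full((224, 224), 30, dtype=np.uint8)
scan[100:120, 100:120] = 200
norm = normalize_image(scan)
print(norm.min(), norm.max())   # 0.0 1.0
```

After normalization, the dark background maps to 0 and the densest region to 1, which matches the intensity distribution discussed for Fig. 5.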
Figure 7 demonstrates the application of noise reduction techniques—Gaussian Blur and Median Filtering—on lung CT images to smooth out noise while preserving key structures. Each row represents one original image followed by its corresponding Gaussian-blurred and median-blurred versions. The Gaussian Blur reduces noise by averaging pixel intensities with a weighted kernel, resulting in a smoother appearance but slightly blurring edges. In contrast, Median Filtering preserves edges more effectively by replacing pixel values with their neighborhood’s median, making it particularly useful for reducing speckle noise. These techniques enhance image quality, which is critical for improving the accuracy of lung cancer detection models.
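In practice these two filters are typically applied with OpenCV (`cv2.GaussianBlur` and `cv2.medianBlur`). As a self-contained sketch of the median-filtering idea, the snippet below implements a 3×3 median filter in NumPy and removes a single speckle from a flat toy image; a Gaussian blur would instead compute a weighted average with a Gaussian kernel, smoothing noise at the cost of slightly softer edges.

```python
import numpy as np

def median_filter3(img: np.ndarray) -> np.ndarray:
    """3x3 median filter: replace each pixel with the median of its
    neighborhood (borders handled by reflection padding)."""
    padded = np.pad(img, 1, mode="reflect")
    out = np.empty_like(img)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = np.median(padded[i:i + 3, j:j + 3])
    return out

# A flat image corrupted with one speck of salt noise.
img = np.full((5, 5), 10, dtype=np.uint8)
img[2, 2] = 255                       # speckle / salt noise
clean = median_filter3(img)
print(clean[2, 2])   # 10 -- the outlier is removed
```

Because the median ignores extreme neighborhood values, the speckle disappears while uniform regions (and, on real scans, sharp edges) are left intact.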
Fig. 8 [Images not available. See PDF.]
Edge detection using Canny and Sobel filters for lung CT images.
Figure 8 shows edge detection techniques applied to lung CT images. The first column displays the original images, the second column highlights fine edges using Canny Edge Detection, and the third column enhances broader structural features with Sobel Edge Detection. These edge detection techniques are essential for isolating lung regions, detecting nodules, and identifying tissue abnormalities, thereby serving as a vital step in preprocessing for accurate lung cancer prediction models.
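Both filters are available directly in OpenCV (`cv2.Canny`, `cv2.Sobel`); to make the Sobel operation concrete, the sketch below computes the gradient magnitude with the standard 3×3 Sobel kernels on a synthetic step edge. The input is a toy image, not dataset data.

```python
import numpy as np

# Standard Sobel kernels for horizontal and vertical intensity gradients.
KX = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
KY = KX.T

def sobel_magnitude(img: np.ndarray) -> np.ndarray:
    """Gradient magnitude via 3x3 Sobel convolution (zero padding)."""
    p = np.pad(img.astype(float), 1)
    gx = np.zeros(img.shape)
    gy = np.zeros(img.shape)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            win = p[i:i + 3, j:j + 3]
            gx[i, j] = (win * KX).sum()
            gy[i, j] = (win * KY).sum()
    return np.hypot(gx, gy)

# Vertical step edge: left half dark, right half bright.
img = np.zeros((6, 6))
img[:, 3:] = 1.0
mag = sobel_magnitude(img)
# Response is strongest along the boundary columns and zero in flat regions.
```

On lung CT slices, the same operation produces strong responses at tissue boundaries and nodule contours, which is why it is useful for isolating lung regions.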
Figure 9 illustrates the transformation of lung CT scan images into different color spaces for enhanced feature analysis. The first column displays the original grayscale images, which retain the raw structure of lung regions. The second column shows the images converted into the HSV (Hue, Saturation, Value) color space, allowing finer control over brightness, contrast, and intensity variations. The third column represents the images in the LAB (Lightness, A, B) color space, where lightness is separated from chromatic components, making it especially effective for noise reduction, contrast enhancement, and texture-based feature extraction. These color space transformations support improved image preprocessing and strengthen the model’s ability to detect subtle features in lung cancer diagnosis. The preprocessed data is split into training (70%) and testing (30%) datasets.
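The 70/30 split over the 1,097 images (120 benign + 561 malignant + 416 normal) can be sketched as below. This is a plain shuffled split for illustration; the paper does not state whether the split is stratified per class (in practice a stratified split, e.g. scikit-learn's `train_test_split(..., stratify=labels)`, is common).

```python
import random

def split_70_30(items, seed=42):
    """Shuffle and split a list of samples into 70% train / 30% test."""
    rng = random.Random(seed)
    items = list(items)
    rng.shuffle(items)
    cut = round(0.7 * len(items))
    return items[:cut], items[cut:]

# Class counts from the dataset: 120 benign, 561 malignant, 416 normal.
labels = ["benign"] * 120 + ["malignant"] * 561 + ["normal"] * 416
train, test = split_70_30(labels)
print(len(train), len(test))   # 768 329
```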
In the next step, the MapReduce framework is applied to handle large-scale lung cancer datasets efficiently. This distributed processing technique splits the data into smaller chunks, processes them in parallel, and combines the results, enabling fast and scalable analysis.
Fig. 9 [Images not available. See PDF.]
Color space transformations (HSV and LAB) for enhanced feature analysis.
Fig. 10 [Images not available. See PDF.]
MapReduce framework.
Figure 10 illustrates the MapReduce framework used for processing large datasets efficiently, particularly in lung cancer prediction. The process begins with the Input Data, which is divided into smaller chunks for parallel processing. In the Map phase, the input data is processed using a Map function $M$, transforming each chunk into intermediate key-value pairs:

$M(k_1, v_1) \rightarrow \mathrm{list}(k_2, v_2)$  (1)

where $k_2$ represents the key (e.g., feature label, patient ID) and $v_2$ denotes the value (e.g., pixel intensity or extracted feature). The intermediate results then move to the Shuffle phase, where the key-value pairs are sorted and grouped by key:

$\mathrm{Shuffle}(\mathrm{list}(k_2, v_2)) \rightarrow (k_2, \mathrm{list}(v_2))$  (2)

In the Reduce phase, the grouped data is aggregated using a Reduce function $R$, which applies an operation (e.g., summation, averaging):

$R(k_2, \mathrm{list}(v_2)) \rightarrow \mathrm{list}(v_3)$  (3)

The final output is represented as:

$\mathrm{Output} = \mathrm{list}(k_2, v_3)$  (4)

The output consists of aggregated results ready for analysis. This MapReduce approach enhances computational efficiency by enabling parallel processing of large-scale lung cancer datasets, such as medical imaging (CT scans, X-rays) or patient records, ensuring faster and scalable data analysis. Once the training and testing datasets have been processed and analyzed using MapReduce, they are stored on the private blockchain layer, as shown in Fig. 11.
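The Map–Shuffle–Reduce flow can be illustrated with a minimal pure-Python sketch (real deployments would use a framework such as Hadoop or Spark; the patient IDs and intensity values below are made up). Here the Map phase emits (patient, intensity) pairs, the Shuffle phase groups them by patient, and the Reduce phase averages each group.

```python
from collections import defaultdict

# Toy records: (patient_id, pixel_intensity) pairs split into chunks.
chunks = [
    [("p1", 82), ("p2", 10)],
    [("p1", 78), ("p2", 14), ("p1", 80)],
]

# Map phase: emit intermediate key-value pairs per chunk.
def map_fn(chunk):
    return [(k, v) for k, v in chunk]

mapped = [pair for chunk in chunks for pair in map_fn(chunk)]

# Shuffle phase: sort/group values by key.
groups = defaultdict(list)
for k, v in mapped:
    groups[k].append(v)

# Reduce phase: aggregate each group, here by averaging.
def reduce_fn(key, values):
    return key, sum(values) / len(values)

output = dict(reduce_fn(k, vs) for k, vs in groups.items())
print(output)   # {'p1': 80.0, 'p2': 12.0}
```

Because each chunk is mapped independently, the Map work can run in parallel across nodes, which is the source of MapReduce's scalability.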
Fig. 11 [Images not available. See PDF.]
Framework for private blockchain.
Figure 11 illustrates a private blockchain architecture specifically adapted for secure and decentralized management of hospital transactions within a healthcare system. The process begins with hospital nodes, which are data sources for medical imaging and patient records. These transactions, such as patient diagnoses, medical imaging results, or treatment records, are converted into secure blocks using cryptographic hashing. The cryptographic process ensures data integrity and prevents unauthorized tampering.
The blocks are distributed across decentralized hospital databases, where each transaction—such as Hospital Transaction A and Transaction B—is validated and stored. Trusted peers in the blockchain network maintain this distributed ledger, enabling secure and transparent sharing of healthcare data without central control. Smart contracts further automate and govern the transaction processes, ensuring compliance with privacy policies and facilitating trusted interactions among hospital nodes. This design conceptually aligns with healthcare privacy regulations such as HIPAA and GDPR by maintaining data integrity, decentralization, and access control while preventing unauthorized data sharing. This system enhances data security and integrity, fostering collaboration among healthcare institutions while maintaining patient confidentiality.
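The hash-chaining principle behind this tamper resistance can be sketched with the standard library. This is a conceptual toy, not the paper's blockchain implementation: the transaction contents are hypothetical, and a real private blockchain adds consensus among peers and smart contracts on top of this linking.

```python
import hashlib
import json

def make_block(transaction: dict, prev_hash: str) -> dict:
    """Create a block whose hash covers its payload and its predecessor."""
    payload = json.dumps(transaction, sort_keys=True) + prev_hash
    return {
        "transaction": transaction,
        "prev_hash": prev_hash,
        "hash": hashlib.sha256(payload.encode()).hexdigest(),
    }

# Two hypothetical hospital transactions chained together.
genesis = make_block({"hospital": "A", "record": "diagnosis-001"}, "0" * 64)
block2 = make_block({"hospital": "B", "record": "imaging-017"}, genesis["hash"])

def chain_is_valid(chain):
    """Recompute every hash; any tampering breaks the links."""
    for prev, blk in zip(chain, chain[1:]):
        if blk["prev_hash"] != prev["hash"]:
            return False
    return all(
        blk["hash"] == make_block(blk["transaction"], blk["prev_hash"])["hash"]
        for blk in chain
    )

chain = [genesis, block2]
print(chain_is_valid(chain))          # True
chain[0]["transaction"]["record"] = "tampered"
print(chain_is_valid(chain))          # False
```

Altering any stored transaction invalidates its block's hash, so peers holding copies of the ledger can detect the change immediately.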
After applying Private Blockchain Technology, the dataset transactions are stored on the cloud for testing, and the training data is passed to DL models that predict lung cancer for the local clients. If the predictions meet the required learning criterion, the trained patterns are stored on the local server's cloud system for centralized access; if not, the model retrains using the local client data. Once the local models are trained, they are aggregated on the global cloud. During this phase, the global model combines inputs from all the local models to optimize its decision-making process. The updated global model is then synchronized with all local clients in each iteration, ensuring consistent model updates and improved accuracy across the distributed system. Specifically, Federated Learning is implemented at this stage by collecting the trained model parameters from each local client. These parameters comprise the weights (W) and biases (b) of the deep learning models trained on each hospital's data. The global cloud server securely aggregates these weights and biases to create a new global model. No gradients or optimizer states are transmitted; only the trained weights and biases are exchanged, preserving patient privacy while enabling collaborative training across institutions.
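A minimal sketch of the weight-and-bias aggregation step is given below, in the style of FedAvg. The paper states only that weights and biases are aggregated; weighting each client by its local dataset size is an assumption of this sketch, and the hospital parameters are synthetic.

```python
import numpy as np

def federated_average(client_params, client_sizes):
    """FedAvg-style aggregation: weight each client's parameters by its
    local dataset size, then sum. Only weights/biases are exchanged."""
    total = sum(client_sizes)
    agg = {}
    for name in client_params[0]:
        agg[name] = sum(
            (n / total) * p[name] for p, n in zip(client_params, client_sizes)
        )
    return agg

# Two hospitals with different data volumes and locally trained parameters.
hospital1 = {"W": np.array([1.0, 2.0]), "b": np.array([0.5])}
hospital2 = {"W": np.array([3.0, 4.0]), "b": np.array([1.5])}
global_model = federated_average([hospital1, hospital2], [300, 100])

print(global_model["W"])   # [1.5 2.5]
print(global_model["b"])   # [0.75]
```

The aggregated parameters are then pushed back to every local client, so each round leaves all hospitals with the same synchronized global model while raw patient data never leaves its institution.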
In the Validation Phase, the trained patterns of the local model stored on the local client-server are imported from the cloud alongside the testing dataset to predict lung cancer. The system checks the predictions: if lung cancer is detected, a confirmation message is displayed for further analysis; otherwise, the process exits or re-trains for improved accuracy. This ensures a robust workflow for efficient lung cancer detection, leveraging cloud computing for scalability and blockchain for secure data sharing.
Fig. 12 [Images not available. See PDF.]
Proposed model for lung cancer prediction using XAI (Global Server).
Figure 12 illustrates a comprehensive lung cancer prediction framework, integrating decentralized processing, secure data management, and XAI to ensure transparency in decision-making. The process begins with Lung Cancer Data Collection using the Internet of Medical Things (IoMT) across multiple hospitals (e.g., Hospital 1, Hospital 2, and so on). The data undergoes Data Preprocessing, including cleaning, normalization, and feature extraction, to ensure high-quality inputs. Next, MapReduce is applied to enable distributed and parallel data processing, efficiently handling large medical datasets. The Private Blockchain Technology layer ensures secure and tamper-proof data sharing, maintaining patient privacy and facilitating hospital collaboration. Local DL models are trained on the pre-processed data at each hospital.
The local models are then sent to the Global Cloud, where model aggregation occurs, integrating updates from all hospitals to build an improved global model. The global model is validated and redistributed to local hospitals.
To ensure trust and transparency in the lung cancer predictions, XAI techniques, including Local Interpretable Model-agnostic Explanations (LIME) and Gradient-weighted Class Activation Mapping (Grad-CAM), are employed:
LIME works by creating a simplified, interpretable model that approximates the behavior of a complex, global model in a small region around a specific input. It does this by slightly altering the input and observing how these changes affect the model’s predictions. The simplified model is mathematically represented as:
$\xi(x) = \operatorname*{arg\,min}_{g \in G} \; \mathcal{L}(f, g, \pi_x) + \Omega(g)$  (5)

where $f$ is the original complex model, $g$ is the local surrogate model, $\mathcal{L}$ is the loss function measuring the difference between $f$ and $g$, $\pi_x$ is the proximity measure of perturbed samples to the original sample $x$, and $\Omega(g)$ represents the complexity of $g$, ensuring interpretability. LIME highlights superpixels or regions in the input images (e.g., lung nodules in CT scans) that significantly influence the model's decision, making the output explainable for clinicians.
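The core of this optimization (a locality-weighted fit of a simple surrogate $g$ around one instance) can be shown in a toy NumPy sketch. The "complex model" $f$, the instance, and the kernel width here are all assumed for illustration; restricting $g$ to a linear form plays the role of the complexity penalty $\Omega(g)$, and image explanations in practice would use the `lime` package over superpixels.

```python
import numpy as np

# Stand-in for the complex global model f (feature 0 dominates by design).
def f(X):
    return 3.0 * X[:, 0] + 0.1 * X[:, 1]

rng = np.random.default_rng(0)
x0 = np.array([0.5, 0.5])                     # instance to explain
Z = x0 + rng.normal(0, 0.2, size=(200, 2))    # perturbed samples around x0

# Proximity kernel pi_x: nearby perturbations get larger weight.
pi = np.exp(-np.sum((Z - x0) ** 2, axis=1) / 0.25)

# Fit the interpretable surrogate g by weighted least squares,
# i.e. minimize the locality-weighted loss L(f, g, pi_x).
A = np.hstack([Z, np.ones((len(Z), 1))])      # linear model with intercept
w = np.sqrt(pi)
coef, *_ = np.linalg.lstsq(A * w[:, None], f(Z) * w, rcond=None)

print(coef[:2])   # ~[3.0, 0.1]: feature 0 drives the local prediction
```

The recovered coefficients show which inputs most influence the prediction near $x_0$, which is exactly the information LIME visualizes as highlighted superpixels on a CT slice.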
Grad-CAM visualizes the regions of the image that contributed most to a prediction by analyzing the gradients of the target class with respect to the convolutional feature maps. The heatmap is generated using the following equation:

$L^{c}_{\mathrm{Grad\text{-}CAM}} = \mathrm{ReLU}\!\left(\sum_{k} \alpha_k^{c} A^{k}\right)$  (6)

where $L^{c}_{\mathrm{Grad\text{-}CAM}}$ is the Grad-CAM heatmap for class $c$, $A^{k}$ represents the $k$-th activation map of the convolutional layer, and $\alpha_k^{c}$ are the importance weights, calculated as:

$\alpha_k^{c} = \frac{1}{Z} \sum_{i} \sum_{j} \frac{\partial y^{c}}{\partial A^{k}_{ij}}$  (7)

Here, $\partial y^{c} / \partial A^{k}_{ij}$ represents the gradient of the class score $y^{c}$ with respect to the feature maps $A^{k}$, and $Z$ is the normalization factor (the number of spatial locations). Grad-CAM produces a heatmap highlighting the areas (e.g., nodules or abnormalities) most relevant to the model's prediction, making it visually interpretable for clinicians.
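Given the activation maps and their gradients, Eqs. (6)–(7) reduce to a spatial average followed by a weighted sum and a ReLU, as the sketch below shows. The activations and gradients here are synthetic placeholders; in a real pipeline they would come from the trained CNN's last convolutional layer.

```python
import numpy as np

def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Eqs. (6)-(7): heatmap = ReLU(sum_k alpha_k^c * A^k), where alpha_k^c
    is the spatial average (1/Z * sum_ij) of dy^c/dA^k_ij.
    activations, gradients: shape (K, H, W)."""
    alpha = gradients.mean(axis=(1, 2))               # Eq. (7), Z = H * W
    cam = np.tensordot(alpha, activations, axes=1)    # Eq. (6), weighted sum
    return np.maximum(cam, 0.0)                       # ReLU

# Toy layer with 2 feature maps of size 4x4 and matching gradients.
A = np.stack([np.ones((4, 4)), np.full((4, 4), 2.0)])
dYdA = np.stack([np.full((4, 4), 0.5), np.full((4, 4), -1.0)])
heat = grad_cam(A, dYdA)
print(heat[0, 0])   # ReLU(0.5*1 + (-1.0)*2) = 0.0
```

The resulting map is upsampled to the input resolution and overlaid on the CT slice, producing the warm/cool heatmaps shown in Fig. 16.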
Finally, the trained global model is used to analyze new data in the Validation Phase. If lung cancer is detected, the results are communicated for further diagnostics. Otherwise, the predictions are discarded or refined for enhanced accuracy. This multi-layered approach combines scalability (via MapReduce), security (via blockchain), and interpretability (via XAI) to create a reliable and transparent lung cancer prediction system. Table 2 shows the pseudocode of the proposed Local Server Model Training and Global Server Model Aggregation.
Table 2
Pseudocode for local server model training and global server model aggregation.
Steps | Description |
|---|---|
1 | Start |
2 | Collect lung cancer data (CT scans, X-rays) from the dataset83 |
3 | Perform data preprocessing: • Histogram Normalization • Noise Reduction • Edge Detection • Color Space Transformation |
4 | Split data into Training Dataset (70%) and Testing Dataset (30%). |
5 | Apply MapReduce for distributed processing: • Map Phase: Divide training data into smaller chunks and process locally. • Shuffle Phase: Group intermediate results by key. • Reduce Phase: Aggregate the processed data into meaningful outputs. |
6 | Store the preprocessed Training Dataset and Testing Dataset securely using Private Blockchain Technology to ensure data integrity and secure sharing. |
7 | Initialize layer weights $W$ and biases $b$, the error $E$, the learning rate $\eta$, and the number of epochs |
8 | For each training pattern $p$, do: |
9 | a. Feedforward phase (calculate layer activations): Calculate the hidden layer output using the equation: $h = f(W_1 p + b_1)$ (8) Calculate the network output using the equation: $\hat{y} = f(W_2 h + b_2)$ (9) |
10 | b. Backpropagation (update weights using): $W \leftarrow W - \eta \, \partial E / \partial W$ (10) $b \leftarrow b - \eta \, \partial E / \partial b$ (11) |
11 | Increment the epoch count: $epoch \leftarrow epoch + 1$ |
12 | Check Learning Rate: If the desired learning rate is achieved, store the trained model in the Cloud System. If not, retrain the model using local data. |
13 | Send the optimized local model weights $W$ and biases $b$ to the Global Cloud for model aggregation. |
14 | Perform Global Model Aggregation in the cloud using inputs from all local models. |
15 | Synchronize the aggregated global model back to local nodes. |
16 | Validation Phase: • Import the trained global model from the cloud. • Use the Testing Dataset to predict lung cancer. |
17 | If lung cancer is predicted, display a “Lung Cancer Found” message. If not, discard the results or refine the model. |
18 | Stop |
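Steps 7–11 of the pseudocode, from initialization through the feedforward pass of Eqs. (8)–(9) and the gradient-descent updates of Eqs. (10)–(11), can be sketched as a minimal NumPy training loop. The sigmoid activation, squared-error loss, and toy OR-style patterns are illustrative assumptions; the actual local models are the deep networks described in the results.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Step 7: initialize layer weights W, biases b, learning rate eta, epochs.
W1, b1 = rng.normal(0, 0.5, (3, 2)), np.zeros(3)
W2, b2 = rng.normal(0, 0.5, (1, 3)), np.zeros(1)
eta, epochs = 0.5, 200

# Tiny toy training set (logical-OR patterns, purely illustrative).
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 1.0, 1.0, 1.0])

losses = []
for epoch in range(epochs):                      # step 11: epoch counter
    loss = 0.0
    for x, t in zip(X, y):                       # step 8: each pattern
        # Step 9: feedforward, Eqs. (8)-(9).
        h = sigmoid(W1 @ x + b1)
        out = sigmoid(W2 @ h + b2)[0]
        loss += 0.5 * (out - t) ** 2
        # Step 10: backpropagation, Eqs. (10)-(11): W <- W - eta * dE/dW.
        d_out = (out - t) * out * (1 - out)
        d_h = (W2[0] * d_out) * h * (1 - h)
        W2 -= eta * d_out * h[None, :]
        b2 -= eta * d_out
        W1 -= eta * np.outer(d_h, x)
        b1 -= eta * d_h
    losses.append(loss)

print(losses[0] > losses[-1])   # True: error decreases over training
```

After local training converges (step 12), only the final $W$ and $b$ arrays would be shipped to the global cloud for aggregation (steps 13–15).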
Simulation results
This research primarily addresses the critical challenges in lung cancer prediction systems, including computational inefficiency, data privacy concerns, and limited model interpretability. By leveraging MapReduce, the framework efficiently processes large-scale medical imaging datasets, overcoming delays and resource bottlenecks. Private Blockchain Technology is integrated to ensure the secure sharing of sensitive patient information across healthcare sectors, providing a tamper-proof and decentralized data management system. Additionally, FL is employed to enable collaborative model training across institutions without compromising patient confidentiality, addressing data siloing issues. XAI techniques, such as LIME and Grad-CAM, highlight the key regions influencing the AI’s decision-making process to enhance transparency and trust in AI predictions. This research applies the proposed model to a lung cancer dataset83, split into 70% for training and 30% for testing, emphasizing a scalable, secure, and interpretable solution for accurate lung cancer prediction.
To evaluate the effectiveness of the proposed model, the performance of various DL models (local models) was analyzed using the lung cancer dataset. These local models, including CNN, EfficientNetB0, InceptionV3, and DenseNet121, were trained and tested on the dataset and evaluated for accuracy and miss-rate to determine the most effective approach for lung cancer prediction. The CNN model consisted of three convolutional layers with ReLU activation, each followed by max pooling; a flatten layer was then connected to a dense layer with 128 units, concluding with a softmax output layer. The training protocol utilized the Adam optimizer with a batch size of 32 for 10 epochs using categorical cross-entropy loss. The models were implemented using Python 3.8 and TensorFlow 2.10 in a GPU-enabled environment (Google Colab GPU), with a learning rate set to 0.0001 to ensure optimal convergence. The results highlight the varying performance of different DL models in handling complex medical imaging data, offering insights into their strengths and limitations for accurately detecting lung cancer. The performance metrics of these local models are shown in Table 3.
Table 3. Performance analysis of the local client models on training and testing.
Performance metrics | ||||
|---|---|---|---|---|
Proposed local models | Training | Testing | ||
Accuracy | Miss-rate | Accuracy | Miss-rate | |
CNN | 98.56 | 1.44 | 98.21 | 1.79 |
EfficientNetB0 | 90.35 | 9.65 | 90.30 | 9.7 |
InceptionV3 | 94.13 | 5.87 | 93.31 | 6.69 |
DenseNet121 | 94.93 | 5.07 | 94.64 | 5.36 |
Table 3 and Fig. 13 provide the performance analysis of the proposed local client models on both training and testing datasets, including CNN, EfficientNetB0, InceptionV3, and DenseNet121. Each model’s performance is measured using accuracy and miss rate. The CNN model achieved a training accuracy of 98.56% and a testing accuracy of 98.21%, with corresponding miss rates of 1.44% and 1.79%, respectively. The EfficientNetB0 model demonstrated a training accuracy of 90.35% and a testing accuracy of 90.30%, with miss rates of 9.65% and 9.7%. The InceptionV3 model achieved a training accuracy of 94.13% and a testing accuracy of 93.31%, with miss rates of 5.87% and 6.69%, respectively. Lastly, the DenseNet121 model recorded a training accuracy of 94.93% and a testing accuracy of 94.64%, with miss rates of 5.07% and 5.36%. These results reflect the training and testing performance of the individual local models applied to the dataset.
Fig. 13 [Images not available. See PDF.]
Performance analysis of the local client models on training and testing.
After implementing the local client models, their results are aggregated on the global cloud server using FL. FL ensures data privacy by combining model insights without sharing raw patient data. Among the evaluated local DL models, the CNN model emerged as the best-performing global model due to its superior accuracy of 98.21% and lower miss rate of 1.79%, making it the most reliable for lung cancer prediction.
Fig. 14 [Images not available. See PDF.]
LIME-based interpretation of the global model’s lung cancer prediction.
Figure 14 illustrates the application of LIME on the global model after FL. The left panel shows the original lung CT scan image, while the right panel highlights the critical regions identified by the global model using LIME. These highlighted areas (marked in yellow) represent the regions most influential in the AI’s decision-making process for predicting lung cancer. This explanation ensures model transparency and provides interpretable insights for clinicians, enabling them to understand and validate the prediction outcomes.
Figure 15 illustrates the interpretability of the global model’s predictions using LIME for lung cancer detection. The figure showcases three sets of images, where each row corresponds to a different input CT scan. The leftmost column displays the Original Images, which serve as the input to the global model. The middle column highlights critical regions contributing to the model’s decision, marked by LIME highlighted features (yellow boundaries), emphasizing the areas the model considers most significant for detecting lung cancer. The rightmost column shows the Focused Mask Overlay, which isolates the highlighted regions to emphasize only the areas influencing the model’s prediction. This figure demonstrates how LIME enhances the transparency of the model’s decisions, making it easier for clinicians to interpret and validate the results for accurate lung cancer diagnosis.
Fig. 15 [Images not available. See PDF.]
LIME feature highlights with mask overlays for lung cancer detection.
Fig. 16 [Images not available. See PDF.]
Grad-CAM heatmaps highlighting key regions contributing to lung cancer predictions.
Figure 16 presents the Grad-CAM visualizations applied to the global model for lung cancer detection. The left column shows the original CT scan images, while the right column highlights the regions of interest using heatmaps. Warmer regions (red, yellow) indicate areas the model focused on when making its decision, whereas cooler regions (blue, green) reflect less significant areas. These visualizations, along with the broader XAI-based analysis employed in this study, provide an interpretable explanation of the model’s predictions. Specifically, we utilized LIME and Grad-CAM to identify and highlight the most influential features contributing to the model’s decisions. LIME perturbs the input data to build a simple local model that approximates the global model’s behavior, identifying the key superpixels affecting the output, while Grad-CAM uses gradients to generate heatmaps that localize important areas in the image. These techniques help clinicians verify that the AI is focusing on medically relevant lung regions—such as nodules or tumors—rather than irrelevant artifacts, thereby enhancing trust, transparency, and acceptance of the system in clinical settings.
Table 4. Comparative analysis of the proposed global model with previous works.
References | Models | Performance metrics | |
|---|---|---|---|
Accuracy (%) | Miss-rate (%) | ||
Dutta, A.K., 202266 | Convolutional neural network (CNN) and Random Forest (RF) classifier | 93.25 | 6.75 |
Wang, H. et al., 201768 | CNN | 86 | 14 |
Dirik, M., 202369 | NB and SVM | 91 | 9 |
Mohammed, K.K. et al., 202170 | VNet | 80 | 20 |
Zhang, G. et al., 202271 | UNet | 83.2 | 16.8 |
Bhattacharyya, D. et al., 202372 | Hierarchical attention UNet (HAUNet-3D) | 83.3 | 16.7 |
Liu, F. et al., 202273 | Deep Neural Network VNet | 88.3 | 16.7 |
Boubnovski, M.M. et al., 202274 | A multi-task learning VNet | 94 | 6 |
Dodia, S. et al., 202275 | Regularized VNet and NCNet | 95 | 5 |
Xiao, Z. et al., 202076 | UNet and Res2Net | 95.3 | 4.7 |
Bansal, G. et al., 202077 | Deep 3D Segmentation work (Deep3DSCan) | 95.8 | 4.2 |
Sathish, R. et al., 202078 | 2D CNN | 98.4 | 1.6 |
Ye, Y. et al., 202079 | VNet model and SVM classifier | 66.7 | |
Liao, F. et al., 201980 | 3D CNN and a leaky noisy-OR gate | 81.4 | 18.6 |
Zhou, Y. et al., 202181 | VNet | 84.8 | 15.2 |
Jiang, H. et al., 202082 | Ensembling 3D-Dual Path Networks (DPNs) | 90.2 | 9.8 |
Proposed interpretable global model | FL with XAI | 98.21 | 1.79 |
Table 4 compares the proposed interpretable global model with recent state-of-the-art techniques, highlighting its superior performance and architectural strength. The proposed model achieves an accuracy of 98.21% with a miss rate of 1.79%, surpassing models that lack integrated privacy, scalability, and interpretability features68–82. This demonstrates that the proposed approach is highly accurate and aligns with current trends in ethical and practical AI, making it suitable for real-world clinical adoption.
Conclusion
The challenges in lung cancer prediction systems, such as computational inefficiency, data privacy concerns, and limited model interpretability, were addressed through a robust and scalable framework. MapReduce was integrated to overcome computational bottlenecks and efficiently process large-scale medical imaging datasets. Data security and privacy issues were mitigated using Private Blockchain Technology, ensuring secure and tamper-proof data sharing across institutions. FL enabled decentralized model training, preserving patient confidentiality while allowing collaborative learning across healthcare organizations. Advanced DL models, including CNN, EfficientNetB0, InceptionV3, and DenseNet121, were employed as local models, with their outputs aggregated into a global model achieving an accuracy of 98.21% and a minimal miss rate of 1.79%. Furthermore, XAI techniques, such as LIME and Grad-CAM, enhanced model transparency by highlighting critical regions influencing predictions. This comprehensive approach ensures efficient, secure, and interpretable lung cancer prediction, achieving superior performance compared to previously published works, and laying the groundwork for reliable AI-driven healthcare diagnostics.
Practical advantages
This study emphasizes the critical need for secure, adaptable, and interpretable AI systems capable of operating effectively within real-world clinical environments, where sensitive data must be managed, and rapid decision-making is essential.
Research limitations
The framework shows strong results but is limited by reliance on a single publicly available dataset, which may introduce dataset-specific bias and affect generalizability. Future validation using multi-center or real-world hospital data is therefore essential. It is also limited by a lack of adversarial testing and by the communication overheads inherent in federated learning. Furthermore, the integration of multiple components (MapReduce, Blockchain, FL, and XAI) introduces additional computational cost, which may hinder deployment in resource-constrained healthcare environments. From an ethical perspective, the risk of misdiagnosis remains a critical concern. While the inclusion of explainable methods such as LIME and Grad-CAM enhances transparency and clinician trust, broader validation and governance safeguards will be required to ensure safe real-world deployment.
Future research directions
In the future, the proposed framework can be further enhanced through the integration of multi-modal data sources and optimization across diverse healthcare settings to maximize its generalizability and impact. Future developments may focus on optimizing FL to mitigate communication overhead, support cross-dataset generalization, address data heterogeneity, and incorporate expert-annotated overlays in LIME and Grad-CAM to improve clinical interpretability and decision support. We also plan to conduct a formal ablation study to quantify the individual contributions of each integrated component—MapReduce, Blockchain, FL, and XAI—to performance, scalability, privacy, and interpretability. Future studies will also include additional evaluation metrics (e.g., sensitivity, specificity, F1-score) and statistical validation through t-tests to confirm that observed improvements are statistically significant.
Acknowledgements
This research received funding from the Centre for Secure Cyber-Physical Systems (C2PS), Department of Computer Science, Khalifa University, Abu Dhabi, United Arab Emirates.
Author contributions
Khan Muhammad Adnan, Muhammad Sajid Farooq, and Muhammad Saleem collected data from different resources and contributed to writing—original draft preparation; Khan Muhammad Adnan, Muhammad Sajid Farooq, and Muhammad Saleem performed formal analysis and simulation; Taher M. Ghazal, Chan Yeob Yeun, and Sang-Woong Lee performed writing—review and editing; Munir Ahmad and Sang-Woong Lee performed supervision; Muhammad Sajid Farooq, Taher M. Ghazal, and Chan Yeob Yeun drafted pictures and tables; Khan Muhammad Adnan, Muhammad Saleem, and Munir Ahmad performed revisions and improved the quality of the draft. All authors have read and agreed to the published version of the manuscript.
Funding
This research received funding from the Centre for Secure Cyber-Physical Systems (C2PS), Department of Computer Science, Khalifa University, Abu Dhabi, United Arab Emirates.
Data availability
The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding authors.
Declarations
Competing interests
The authors declare no competing interests.
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
1. Ahmed, F., Samantasinghar, A., Soomro, A. M., Kim, S. & Choi, K. H. A systematic review of computational approaches to understand cancer biology for informed drug repurposing. Journal of Biomedical Informatics, 142, p.104373. (2023).
2. Siegel, RL et al. Colorectal cancer statistics, 2020. Cancer J. Clin.; 2020; 70,
3. Sung, H et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. Cancer J. Clin.; 2021; 71,
4. Gandhi, Z. et al. Artificial intelligence and lung cancer: impact on improving patient outcomes. Cancers, 15(21), p.5236. (2023).
5. Li, C. et al. Advances in lung cancer screening and early detection. Cancer biology & medicine, 19(5), p.591. (2022).
6. Kuan, K. et al. Deep learning for lung cancer detection: tackling the kaggle data science bowl 2017 challenge. ArXiv Preprint ArXiv:1705.09435. (2017).
7. Ciompi, F. et al. Towards automatic pulmonary nodule management in lung cancer screening with deep learning. Scientific reports, 7(1), p.46479. (2017).
8. Chan, HP; Hadjiiski, LM; Samala, RK. Computer-aided diagnosis in the era of deep learning. Med. Phys.; 2020; 47,
9. Amin, SU; Alsulaiman, M; Muhammad, G; Mekhtiche, MA; Hossain, MS. Deep learning for EEG motor imagery classification based on multi-layer CNNs feature fusion. Future Generation Comput. Syst.; 2019; 101, pp. 542-554. [DOI: https://dx.doi.org/10.1016/j.future.2019.06.027]
10. Jiang, H; Ma, H; Qian, W; Gao, M; Li, Y. An automatic detection system of lung nodule based on multigroup patch-based deep learning network. IEEE J. Biomedical Health Inf.; 2017; 22,
11. Skourt, BA; El Hassani, A; Majda, A. Lung CT image segmentation using deep neural networks. Procedia Comput. Sci.; 2018; 127, pp. 109-113. [DOI: https://dx.doi.org/10.1016/j.procs.2018.01.104]
12. Prado, MG et al. Symptoms and signs of lung cancer prior to diagnosis: case–control study using electronic health records from ambulatory care within a large US-based tertiary care centre. BMJ Open.; 2023; 13,
13. Sujitha, R; Seenivasagam, V. Extraction with map-reduce framework and correlation-based feature selection in lung cancer towards big data. Indian J. Sci. Technol.; 2020; 13,
14. Renuka Devi, D. & Sasikala, S. Feature selection and classification of big data using mapreduce framework. In Intelligent Computing, Information and Control Systems: ICICCS 2019 (666–673). Springer International Publishing. (2020).
15. Gupta, M., Jain, R., Kumari, M. & Narula, G. Securing Healthcare Data by Using Blockchainpp.93–114 (Applications of blockchain in healthcare, 2021).
16. Zaabar, B., Cheikhrouhou, O., Jamil, F., Ammi, M. & Abid, M. HealthBlock: A secure blockchain-based healthcare data management system. Computer Networks, 200, p.108500. (2021).
17. Chen, Z; Xu, W; Wang, B; Yu, H. A blockchain-based preserving and sharing system for medical data privacy. Future Generation Comput. Syst.; 2021; 124, pp. 338-350. [DOI: https://dx.doi.org/10.1016/j.future.2021.05.023]
18. Begum, S; Sarkar, R; Chakraborty, D; Maulik, U. Identification of biomarker on biological and gene expression data using fuzzy preference based rough set. J. Intell. Syst.; 2020; 30,
19. Razzak, M. I., Naz, S. & Zaib, A. Deep learning for medical image processing: Overview, challenges and the future (Lect. Notes Comput. Vis) (2017).
20. Khan, SUR; Asif, S; Bilal, O; Ali, S. Deep hybrid model for Mpox disease diagnosis from skin lesion images. Int. J. Imaging Syst. Technol.; 2024; 34,
21. Bikku, T. Multi-layered deep learning perceptron approach for health risk prediction. Journal of Big Data, 7(1), p.50. (2020).
22. Saleem, M. et al. Secure and Transparent Mobility in Smart Cities: Revolutionizing AVNs To Predict Traffic Congestion Using MapReduce (Private Blockchain and XAI. IEEE Access, 2024).
23. Saharan, S; Kumar, N; Bawa, S. DyPARK: A dynamic pricing and allocation scheme for smart on-street parking system. IEEE Trans. Intell. Transp. Syst.; 2023; 24,
24. Saleem, M et al. Smart cities: Fusion-based intelligent traffic congestion control system for vehicular networks using machine learning techniques. Egypt. Inf. J.; 2022; 23,
25. Aslam, M. A. et al. October. Neurological Disorder Detection Using OCT Scan Image of Eye. In 2022 International Conference on Cyber Resilience (ICCR) (pp. 01–13). IEEE. (2022).
26. Athar, A. et al. March. Improving pneumonia detection in chest X-rays using transfer learning approach (AlexNet) and adversarial training. In 2023 International Conference on Business Analytics for Technology and Security (ICBATS) (pp. 1–7). IEEE. (2023).
27. Sajjad, G. et al. March. An early diagnosis of brain tumor using fused transfer learning. In 2023 International Conference on Business Analytics for Technology and Security (ICBATS) (pp. 1–5). IEEE. (2023).
28. Abbas, S. et al. Smart Vision Transparency: Efficient Ocular Disease Prediction Model Using Explainable Artificial Intelligence. Sensors, 24(20), p.6618. (2024).
29. Shahzad, T. et al. Developing a Transparent Diagnosis Model for Diabetic Retinopathy Using Explainable AI (IEEE Access, 2024).
30. Malik, J. A. et al. Optimizing agricultural risk management with hybrid Block-Chain and fog computing architectures for secure and efficient data handling. In Computational Intelligence in Internet of Agricultural Things 309–337 (Springer Nature Switzerland, 2024). https://dx.doi.org/10.1007/978-3-031-67450-1_12
31. Qadri, S. et al. Innovating with quantum computing approaches in Block-Chain for enhanced security and data privacy in agricultural IoT systems. In Computational Intelligence in Internet of Agricultural Things 339–370 (Springer Nature Switzerland, 2024). https://dx.doi.org/10.1007/978-3-031-67450-1_13
32. Malik, J. A. et al. Implementing fog computing in precision agriculture for real-time soil health monitoring and data management. In Computational Intelligence in Internet of Agricultural Things 371–400 (Springer Nature Switzerland, 2024). https://dx.doi.org/10.1007/978-3-031-67450-1_14
33. Malik, J. A. & Saleem, M. Blockchain and cyber-physical system for security engineering in the smart industry. In Security Engineering for Embedded and Cyber-Physical Systems 51–70 (CRC, 2022).
34. Ali, A. et al. Enhancing cybersecurity with artificial neural networks: A study on threat detection and mitigation strategies. In 2024 2nd International Conference on Cyber Resilience (ICCR) 1–5 (IEEE, 2024).
35. Saharan, S., Kumar, N. & Bawa, S. An efficient smart parking pricing system for smart city environment: A machine-learning based approach. Future Gener. Comput. Syst. 106, 622–640 (2020). https://dx.doi.org/10.1016/j.future.2020.01.031
36. Saharan, S., Bawa, S. & Kumar, N. OP3S: On-street occupancy based parking prices prediction system for ITS. In 2020 IEEE Globecom Workshops (GC Wkshps) 1–6 (IEEE, 2020).
37. Mirza, B. et al. Machine learning and integrative analysis of biomedical big data. Genes 10, 87 (2019).
38. Bikku, T., Jain, A., Changala, R., Kumar, K. S. & Dhyani, B. Data extraction to identify and analyze the symptoms of mental illness. In Demystifying the Role of Natural Language Processing (NLP) in Mental Health 181–198 (IGI Global Scientific Publishing, 2025).
39. Loh, H. W. et al. Application of deep learning models for automated identification of Parkinson's disease: A review (2011–2021). Sensors 21, 7034 (2021).
40. Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 1–9 (2012).
41. Bilal, O. et al. Differential evolution optimization based ensemble framework for accurate cervical cancer diagnosis. Appl. Soft Comput. 167, 112366 (2024).
42. Lee, J. G. et al. Deep learning in medical imaging: General overview. Korean J. Radiol. 18 (2017).
43. Varghese, J. Artificial intelligence in medicine: Chances and challenges for wide clinical adoption. Visc. Med. 36 (2020).
44. Hekmat, A., Zhang, Z., Khan, S. U. R., Shad, I. & Bilal, O. An attention-fused architecture for brain tumor diagnosis. Biomed. Signal Process. Control 101, 107221 (2025).
45. Srinivasu, P. N. et al. An interpretable approach with explainable AI for heart stroke prediction. Diagnostics 14, 128 (2024).
46. Bikku, T. et al. Healthcare biclustering of predictive gene expression using LSTM based support vector machine. Informing Sci. 28, 12 (2025). https://dx.doi.org/10.28945/5446
47. Nannapaneni, D., Saikam, V. R. S. V., Siddu, R., Challapalli, V. M. & Rachapudi, V. Enhanced image-based histopathology lung cancer detection. In 2023 7th International Conference on Computing Methodologies and Communication (ICCMC) 620–625 (IEEE, 2023).
48. Shimazaki, A. et al. Deep learning-based algorithm for lung cancer detection on chest radiographs using the segmentation method. Sci. Rep. 12, 727 (2022).
49. Shakeel, P. M., Burhanuddin, M. A. & Desa, M. I. Automatic lung cancer detection from CT image using improved deep neural network and ensemble classifier. Neural Comput. Appl. 1–14 (2022).
50. Shen, Z., Cao, P., Yang, J. & Zaiane, O. R. WS-LungNet: A two-stage weakly-supervised lung cancer detection and diagnosis network. Comput. Biol. Med. 154, 106587 (2023).
51. Xu, H. Comparison of CNN models in non-small lung cancer diagnosis. In 2023 IEEE 3rd International Conference on Power, Electronics and Computer Applications (ICPECA) 1169–1174 (IEEE, 2023).
52. Bherje, A. et al. Design of deep learning-based approach to predict lung cancer on CT scan images. In 2024 5th International Conference on Innovative Trends in Information Technology (ICITIIT) 1–5 (IEEE, 2024).
53. Wu, Q. et al. Immunotherapy efficacy prediction for non-small cell lung cancer using multi-view adaptive weighted graph convolutional networks. IEEE J. Biomed. Health Inform. (2023).
54. Tortora, M. et al. RadioPathomics: Multimodal learning in non-small cell lung cancer for adaptive radiotherapy. IEEE Access 11, 47563–47578 (2023). https://dx.doi.org/10.1109/ACCESS.2023.3275126
55. Chandran, U. et al. Machine learning and real-world data to predict lung cancer risk in routine care. Cancer Epidemiol. Biomarkers Prev. 32 (2023).
56. Raghu, V. K. et al. Validation of a deep learning–based model to predict lung cancer risk using chest radiographs and electronic medical record data. JAMA Netw. Open 5 (2022).
57. Soni, M. et al. Hybridizing convolutional neural network for classification of lung diseases. Int. J. Swarm Intell. Res. 13 (2022).
58. Rajinikanth, V. et al. UNet with two-fold training for effective segmentation of lung section in chest X-ray. In 2022 Third International Conference on Intelligent Computing Instrumentation and Control Technologies (ICICICT) 977–981 (IEEE, 2022).
59. Ali, A. M. & Mohammed, M. A. A comprehensive review of artificial intelligence approaches in omics data processing: Evaluating progress and challenges. Int. J. Math. Stat. Comput. Sci. 2, 114–167 (2024). https://dx.doi.org/10.59543/ijmscs.v2i.8703
60. Shaffie, A. et al. Radiomic-based framework for early diagnosis of lung cancer. In 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019) 1293–1297 (IEEE, 2019).
61. Agrawal, A., Misra, S., Narayanan, R., Polepeddi, L. & Choudhary, A. A lung cancer outcome calculator using ensemble data mining on SEER data. In Proceedings of the Tenth International Workshop on Data Mining in Bioinformatics 1–9 (2011).
62. Mohalder, R. D., Ali, F. B., Paul, L. & Talukder, K. H. Deep learning-based colon cancer tumor prediction using histopathological images. In 2022 25th International Conference on Computer and Information Technology (ICCIT) 629–634 (IEEE, 2022).
63. Su, Y. et al. Colon cancer diagnosis and staging classification based on machine learning and bioinformatics analysis. Comput. Biol. Med. 145, 105409 (2022). https://dx.doi.org/10.1016/j.compbiomed.2022.105409
64. Garg, S. & Garg, S. Prediction of lung and colon cancer through analysis of histopathological images by utilizing pre-trained CNN models with visualization of class activation and saliency maps. In Proceedings of the 2020 3rd Artificial Intelligence and Cloud Computing Conference 38–45 (2020).
65. Li, L., Liu, Z., Huang, H., Lin, M. & Luo, D. Evaluating the performance of a deep learning-based computer-aided diagnosis (DL-CAD) system for detecting and characterizing lung nodules: Comparison with the performance of double reading by radiologists. Thorac. Cancer 10 (2019).
66. Teramoto, A., Tsukamoto, T., Kiriyama, Y. & Fujita, H. Automated classification of lung cancer types from cytological images using deep convolutional neural networks. Biomed. Res. Int. 2017 (2017).
67. Dutta, A. K. Detecting lung cancer using machine learning techniques. Intell. Autom. Soft Comput. 31, 1007–1023 (2022). https://doi.org/10.32604/iasc.2022.019778
68. Wang, H. et al. Comparison of machine learning methods for classifying mediastinal lymph node metastasis of non-small cell lung cancer from 18F-FDG PET/CT images. EJNMMI Res. 7, 1–11 (2017).
69. Dirik, M. Machine learning-based lung cancer diagnosis. Turkish J. Eng. 7 (2023).
70. Mohammed, K. K., Hassanien, A. E. & Afify, H. M. A 3D image segmentation for lung cancer using V.Net architecture based deep convolutional networks. J. Med. Eng. Technol. 45 (2021).
71. Zhang, G., Yang, Z. & Jiang, S. Automatic lung tumor segmentation from CT images using improved 3D densely connected UNet. Med. Biol. Eng. Comput. 60 (2022).
72. Bhattacharyya, D., Thirupathi Rao, N., Joshua, E. S. N. & Hu, Y. C. A bi-directional deep learning architecture for lung nodule semantic segmentation. Visual Comput. 39 (2023).
73. Liu, F., Chen, Z. & Sun, P. Detection and segmentation of pulmonary nodules based on improved 3D VNet algorithm. In International Conference on Algorithms, Microchips and Network Applications Vol. 12176, 51–59 (SPIE, 2022).
74. Boubnovski, M. M. et al. Development of a multi-task learning V-Net for pulmonary lobar segmentation on CT and application to diseased lungs. Clin. Radiol. 77 (2022).
75. Dodia, S., Basava, A. & Padukudru Anand, M. A novel receptive field-regularized V-net and nodule classification network for lung nodule detection. Int. J. Imaging Syst. Technol. 32 (2022).
76. Xiao, Z., Liu, B., Geng, L., Zhang, F. & Liu, Y. Segmentation of lung nodules using improved 3D-UNet neural network. Symmetry 12, 1787 (2020).
77. Bansal, G., Chamola, V., Narang, P., Kumar, S. & Raman, S. Deep3DSCan: Deep residual network and morphological descriptor based framework for lung cancer classification and 3D segmentation. IET Image Proc. 14 (2020).
78. Sathish, R., Sathish, R., Sethuraman, R. & Sheet, D. Lung segmentation and nodule detection in computed tomography scan using a convolutional neural network trained adversarially using Turing test loss. In 2020 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC) 1331–1334 (IEEE, 2020).
79. Ye, Y., Tian, M., Liu, Q. & Tai, H. M. Pulmonary nodule detection using V-net and high-level descriptor based SVM classifier. IEEE Access 8, 176033–176041 (2020). https://dx.doi.org/10.1109/ACCESS.2020.3026168
80. Liao, F., Liang, M., Li, Z., Hu, X. & Song, S. Evaluate the malignancy of pulmonary nodules using the 3-D deep leaky noisy-OR network. IEEE Trans. Neural Netw. Learn. Syst. 30 (2019).
81. Zhou, Y. et al. Classification of benign and malignant lung nodules based on residuals and 3D VNet network. In 2021 China Automation Congress (CAC) 1555–1559 (IEEE, 2021).
82. Jiang, H., Gao, F., Xu, X., Huang, F. & Zhu, S. Attentive and ensemble 3D dual path networks for pulmonary nodules classification. Neurocomputing 398, 422–430 (2020).
83. IQ-OTH/NCCD lung cancer dataset. Kaggle. https://www.kaggle.com/datasets/adityamahimkar/iqothnccd-lung-cancer-dataset/data
© The Author(s) 2025. This work is published under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License (http://creativecommons.org/licenses/by-nc-nd/4.0/).