Abstract

Additive manufacturing has emerged as one of today's revolutionary technologies, enabling rapid prototyping, customized production, and reduced material waste. However, its reliability is often undermined by faults that arise during printing and remain undetected, giving rise to product defects, waste generation, and safety issues. Most existing fault detection methods suffer from limited accuracy, poor adaptability across different printing conditions, and a lack of real-time monitoring capability. These factors critically limit their effectiveness in practical deployment. To address these limitations, the current study proposes a novel process control approach for additive manufacturing that integrates advanced segmentation, detection, and monitoring strategies. The implemented framework involves segmentation of layer regions using MaskLab-CRFNet, which integrates Mask R-CNN, DeepLabv3, and Conditional Random Fields for precise defect localization; detection is performed by MoShuResNet, which hybridizes MobileNetV3, ShuffleNet, and Residual U-Net for lightweight yet robust fault classification; and monitoring is handled by BLC-MonitorNet, which incorporates Bayesian deep networks, ConvAE-LSTM, and convolutional autoencoders for reliable real-time anomaly detection. Experimental evaluation demonstrates superior performance, achieving 99.31% accuracy and 97.73% sensitivity. This work presents a reliable and interpretable process control framework for additive manufacturing that improves safety, efficiency, and sustainability.

1. Introduction

Additive manufacturing (AM), popularly known as 3D printing, has evolved into a core technology of modern industry [1,2]. AM fabricates components layer by layer from digital designs, allowing for the construction of complicated geometries, personalized products, and rapid prototyping in domains such as the healthcare, aerospace, automotive, and consumer electronics industries [3,4]. In terms of its societal impact, it offers sustainable production with minimal material waste, shortened supply chains, and the possibility of localized, on-demand manufacturing [5]. However, despite its growing adoption, AM is still susceptible to numerous faults and anomalies inside the printing process that compromise the quality, safety, and reliability of the fabricated parts [6,7].

Effective process control in AM is required to ensure consistency and reliability [8]. Real-time monitoring and fault detection enable early identification of problems, helping to minimize resource wastage and, in particular, catastrophic failure in critical applications [9,10]. Existing fault detection methods based on DMAIC, 3D-CNN, PCNN, and LADRC have been improving but are still plagued by major flaws, including poor generalization across various types of defects; inadequate representation of fine-grained structural irregularities; high computation costs, making them unsuitable for edge deployment; and a lack of interpretability within the decision-making process [11,12]. There is an urgent need for advanced methods that bring efficiency, scalability, and explainability together to achieve appropriate process control in AM [13].

According to Gibson et al. [1], AM includes a suite of technologies that fabricate components directly from digital models by depositing materials layer by layer, enabling extraordinary opportunities for customization and sustainability. As discussed by Prashar et al. [2], AM has moved from just being prototyped to being widely used industrially with the developments of automation, smart sensors, and digital connectivity. Similarly, Kumar and Chhabra [3] highlighted that AM not only reduces material waste but also aids in the creation of a clean fabrication framework supporting sustainable product development. However, despite all these advantages, process instabilities like thermal distortion, lack of fusion, keyhole defects, and residual stresses are considerable problems in producing products with repeatable quality. These defects often originate from micro-scale variations in laser power, scanning velocity, the powder deposition rate, or temperature gradients, as outlined by Gu et al. [4], who emphasized the critical importance of integrating material, structure, and performance parameters within metal AM processes. From an industrial operation perspective, according to Hohn and Durach [5], the diffusion of AM across manufacturing chains will require robust monitoring and governance frameworks, which can assure reliability and traceability.

Recent studies have put great emphasis on real-time monitoring and anomaly detection as the main factors that lead to intelligent AM systems. Chen et al. [6] introduced an unsupervised metal AM online anomaly detection system that utilizes a statistical time–frequency-domain algorithm to very effectively identify faults in real-time operations without the use of labeled data. Nevertheless, these techniques are still far from being completely convincing, as most of the current solutions are still dependent on single-sensor data or offline analysis, which means that they cannot be very responsive and adaptive to the changes in industrial environments.

1.1. Research Gap

Although additive manufacturing allows for quick prototyping and complex shapes, currently available fault detection systems have low accuracy; are highly dependent on the conditions; and cannot provide interpretable, real-time monitoring. Most methods only concentrate on process control or defect detection, while very few combine the lightweight, edge-deployable solutions able to perform continuous anomaly detection.

1.2. Research Hypothesis

We hypothesize that a multi-stage, streamlined deep learning architecture that combines hierarchical segmentation, dual-branch feature extraction, and adaptive monitoring can deliver higher levels of accuracy, real-time capability, and interpretability for fault detection in additive manufacturing under different printing conditions.

The proposed work will solve these issues by presenting a multi-step approach to 3D printing fault detection and process control. The framework incorporates improved preprocessing and segmentation plans, lightweight and powerful diagnostic modeling, real-time monitoring systems, and adaptive learning to be used in long-term deployment. Specifically, the methodology focuses on the proper representation of layer-wise anomalies, strong diagnostic prediction in resource-constrained settings, and the provision of explainable AI to support interpretable decisions [14,15]. The combination of these innovations will help the approach to fill the gaps that the existing methods have, which will eventually guarantee greater fault detection accuracy and operational reliability.

The work will help to improve AM process control by creating a stable, adaptive, and understandable framework that will not only help to improve the quality of the printed components but also increase confidence in the use of AM systems in critical applications in society.

1.3. Major Contributions

The following are the aims of this work:

To improve fault localization by proposing MaskLab-CRFNet, a hybrid segmentation pipeline combining Mask R-CNN, DeepLabv3, and Conditional Random Fields for precise layer-region representation;

To strengthen diagnostic capability through MoShuResNet, a lightweight yet effective hybrid model that integrates MobileNetV3, ShuffleNet, and Residual U-Net, optimized with pruning and quantization for edge deployment;

To ensure real-time and reliable monitoring using BLC-MonitorNet, which integrates Bayesian DNN for uncertainty-aware inference, ConvAE-LSTM for reconstruction-based anomaly detection, and CAE for feature drift detection.

The remainder of this paper is organized as follows: A literature review based on the current work is found in Section 2. The system model, comprising the framework and key elements, is presented in Section 3. The results and comparative analysis are reported in Section 4. Finally, the work comes to an end in Section 5 with the conclusion.

2. Literature Review

The authors of [16] suggested a framework grounded in the DMAIC approach and a multifaceted set of specific KPIs to enhance the quality and sustainability performance collaboratively in additive manufacturing. A case study was used to illustrate the framework, showing that it is possible to generalize DMAIC to AM. It pointed out that the strategy was successful but had to be tailored to particular company or industry settings to make it as broadly applicable as possible. The authors of [17] presented a defect detection system based on the use of a 3D convolutional neural network (3D-CNN) with in situ monitoring of light intensity in L-PBF. The suggested model categorized lack-of-fusion and keyhole-induced defects and estimated the local volume fraction to determine the severity of defects. The framework was highly accurate in identifying pores and gave detailed information on defect characterization because it employed a joint classification and regression approach. In addition, a feature-level multi-sensor fusion method for in situ quality monitoring of selective laser melting processes was suggested in [18]. The proposed approach combined acoustic emission and photodiode signals and converted one-dimensional sensor data into two-dimensional images, which were analyzed with a convolutional neural network. The fusion-based framework was better than the baseline models and provided better defect detection and quality monitoring in the SLM process. The authors of [19] created a multi-sensor image fusion model to detect defects in powder bed fusion processes. The method used visible and infrared imaging and combined the defect information of both modalities using FDST, MSSTO, and an improved PCNN model. The suggested framework enhanced contrast, texture retention, and the richness of the image, achieving better results than current algorithms. It showed great potential for more precise detection of flaws and quality control in additive manufacturing settings. In [20], a linear active disturbance rejection control (LADRC) approach to controlling the size of melt pools in selective laser melting was suggested. The LADRC framework offered fast reference tracking and efficient disturbance rejection with an energy balance model to explain the disturbances caused by previously scanned tracks. The simulation outcomes revealed that the approach was much better than traditional PID controllers in terms of error reduction, overshoot, and robustness, which guaranteed a more stable melt pool.

The authors of [21] suggested a data-driven approach to in situ thermal monitoring of material extrusion additive manufacturing in 2022. The method determined defect-related thermal anomalies by monitoring and modeling cooling histories on printed geometries. The technique also automatically identified local temperature inhomogeneities—hot and cold spots—which were used to signal possible defects. The framework was applied to a real BAAM case study, and it demonstrated great potential for zero-defect production. A closed-loop control system for wire-and-arc additive manufacturing was proposed in [22] to ensure a fixed contact-tube-to-workpiece distance. The system dynamically adjusted z-axis movements by computing the distance from electrical resistance signals, and this method did not require predetermined layer steps. The closed-loop strategy prevented dimensional drift, stabilized deposition, and preserved mechanical properties as compared to open-loop strategies, which allowed for fully automated and reliable WAAM processes. In 2024, Horr and Amir [23] suggested a hybrid digital-twin architecture that combines reduced-order modeling with machine learning to assist in real-time process control in additive manufacturing. The system was able to simulate and optimize transient processes with fast thermal variations through the combination of physical modeling and data-driven neural network modules. The framework was confirmed by a real-world WAAM case study that showed that it could optimize process parameters and enhance monitoring and cyber-physical integration in AM.

In [24], a closed-loop control framework for regulating laser power in L-PBF using real-time melt-pool thermal emissions was introduced. By correlating thermal signals with dimensional printing errors, the system dynamically adjusted laser input at high frequency to prevent defects such as over-melt and balling. Implemented on a testbed platform, the proposed control approach significantly improved dimensional accuracy and surface quality, outperforming fixed-parameter strategies in maintaining part consistency. The authors of [25] proposed a heterogeneous sensor data fusion framework for detecting multiscale flaws in laser powder bed fusion. By integrating thermal, spatter, and optical imaging data, the approach extracted spectral graph-based process signatures to identify porosity, layer distortions, and geometry-related flaws. The framework achieved over 93% detection fidelity across varying geometries and conditions, outperforming single-sensor models and demonstrating the robustness of multi-source monitoring for in situ quality assurance in AM. Table 1 presents a comparison of the related works on AM process control.

3. Proposed Methodology

The proposed methodology integrates a multi-stage framework for reliable 3D printing fault detection, beginning with data collection from two publicly available datasets: the Early Detection of 3D Printing Issues Dataset and the 3D-Printer Defected Dataset. These datasets are fused and fed into the data preprocessing stage, which includes noise reduction via Gaussian filtering, normalization using CLAHE, data augmentation through random rotations and flips, and segmentation of layer regions via the proposed MaskLab-CRFNet pipeline consisting of Mask R-CNN followed by DeepLabv3 and Conditional Random Fields, enhancing layer-wise defect representation. Feature extraction follows a dual-branch approach, capturing low-level visual features such as Gabor filters, LBP, and HOG, and shape and structural features including Zernike moments, Fourier descriptors, and edge density with contour statistics. Extracted features are fused using canonical correlation analysis and passed to MoShuResNet, combining MobileNetV3, ShuffleNet, and Residual U-Net for high-confidence detection, optimized via pruning and quantization for edge deployment. A three-stage BLC-MonitorNet including BDNN, ConvAE-LSTM, and CAE ensures real-time monitoring, while continual learning via EWC enables adaptive fault detection. Finally, explainable AI techniques provide interpretable decision support. The architecture of the proposed approach is represented in Figure 1.

3.1. Data Collection

The first dataset employed is the Early Detection of 3D Printing Issues Dataset (https://www.kaggle.com/datasets/gauravduttakiit/early-detection-of-3d-printing-issues/data?select=test) (accessed on: 1 October 2025), which focuses on identifying under-extrusion defects. The data are close-up images taken near the printer nozzle, where anomalies such as inconsistent filament flow and incomplete layer deposition can be seen. The dataset's main purpose is to support the creation of predictive models that can detect under-extrusion during the printing process.

The second utilized dataset is the 3D-Printer Defected Dataset (https://www.kaggle.com/datasets/justin900429/3d-printer-defected-dataset) (accessed on: 1 October 2025), designed for anomaly detection in 3D printing. The dataset comprises two different groups: defected samples and non-defected samples, which can be used as reference for the training and the performance evaluation of classification models. In addition, the dataset offers instructions for quantization in order to facilitate the deployment of models with reduced computational complexity.

To strengthen robustness and generalizability, the two datasets are fused and integrated into a unified dataset, which is mathematically expressed as Equation (1).

(1) $I_U^{RAW} = D_1 \cup D_2$

Here, D1 denotes the Early Detection of 3D Printing Issues Dataset and D2 represents the 3D-Printer Defected Dataset. The union operation integrates both sources into a unified collection, denoted as IURAW, which serves as the input for the subsequent data preprocessing stage (Section 3.2).

3.2. Data Preprocessing

Effective preprocessing is essential for ensuring that the fused raw dataset (IURAW) is transformed into a structured and noise-free representation; this is the minimum requirement for the next feature extraction and fault detection stages. The stage here is mainly about the three core aspects: (i) noise reduction and normalization for image quality enhancement; (ii) data augmentation to ensure that the model can generalize well to different defect scenarios; and (iii) segmentation of layer regions via a hierarchical pipeline (Mask R-CNN → DeepLabv3+ → CRF), which is named MaskLab-CRFNet here, for the exact localization of defective zones.

3.2.1. Noise Reduction and Normalization

The fused raw dataset (IURAW) is initially subjected to a noise reduction stage using Gaussian filtering, which smooths high-frequency variations while preserving essential structural details of the printed layers. Given an input image (Ix,y), the Gaussian filter is expressed as Equation (2):

(2) $I_G(x,y) = \sum_{u=-k}^{k} \sum_{v=-k}^{k} I(x-u,\, y-v)\cdot G(u,v;\sigma_s)$

where $G(u,v;\sigma_s) = \frac{1}{2\pi\sigma_s^{2}}\exp\!\left(-\frac{u^{2}+v^{2}}{2\sigma_s^{2}}\right)$ is the Gaussian kernel with standard deviation $\sigma_s$ and $I_G(x,y)$ represents the noise-suppressed image.

Subsequently, the denoised output (IGx,y) is subjected to the normalization phase with the use of Contrast Limited Adaptive Histogram Equalization (CLAHE). CLAHE improves local contrast and, at the same time, avoids noise over-amplification by performing histogram equalization within contextual regions and clipping histogram peaks. For each contextual region (R_c), the transformation function is defined as Equation (3):

(3) $I_C(x,y) = \mathrm{CLAHE}\big(I_G(x,y),\, R_c,\, \alpha_{cp}\big)$

where $\alpha_{cp}$ denotes the clip-limit parameter that controls contrast amplification and $I_C(x,y)$ denotes the contrast-enhanced image. The final output of this subsection is represented as $I_{NN} = I_C(x,y)$, where $I_{NN}$ serves as the input to Section 3.2.2 (Data Augmentation) to enhance dataset variability and robustness. Noise reduction via Gaussian filtering and normalization via CLAHE are graphically depicted in Figure 2.
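As a minimal illustration of this preprocessing step, the sketch below assumes an OpenCV-based implementation; the kernel derivation from $\sigma_s$, the clip limit, and the tile grid size are placeholder choices rather than the tuned settings of the proposed pipeline.

```python
# Preprocessing sketch for Eqs. (2)-(3): Gaussian denoising followed by CLAHE.
# Parameter values (sigma_s, clip_limit, tile_grid) are illustrative assumptions.
import cv2
import numpy as np

def preprocess_frame(img_bgr: np.ndarray,
                     sigma_s: float = 1.5,
                     clip_limit: float = 2.0,
                     tile_grid: tuple = (8, 8)) -> np.ndarray:
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    # Gaussian filtering (Eq. 2); ksize=(0, 0) lets OpenCV derive it from sigma_s.
    denoised = cv2.GaussianBlur(gray, ksize=(0, 0), sigmaX=sigma_s)
    # CLAHE normalization (Eq. 3) with clip limit alpha_cp applied per tile region R_c.
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=tile_grid)
    return clahe.apply(denoised)
```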

3.2.2. Data Augmentation

The contrast-enhanced dataset (INN) derived from the noise reduction and normalization stage is further expanded through augmentation methods to reduce sensitivity to variations in orientation and geometry. This procedure ensures that the trained models generalize to defect conditions captured from different viewpoints, thereby lowering the risk of overfitting to a limited set of viewpoints.

The first augmentation strategy applies random rotation to each image within a predefined angular range of $\theta \in [-\theta_{max},\, \theta_{max}]$. For an input pixel coordinate $(x, y)$, the transformed coordinate after rotation is given by Equation (4):

(4) $\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$

This transformation produces a rotated image ($I_R(x,y)$), where $\theta$ is sampled randomly for each instance. The second augmentation step applies horizontal and vertical flipping, which introduces geometric invariance with respect to reflection. Formally, the flipping operations are expressed as Equation (5):

(5) $I_H(x,y) = I_R(W-x,\, y), \qquad I_V(x,y) = I_R(x,\, H-y)$

where W and H denote the width and height of the image, respectively. The horizontal flip (IH) reflects the image across the vertical axis, while the vertical flip (IV) reflects it across the horizontal axis. Figure 3 provides the data augmentation process.

The augmented dataset is finally represented as IAUG=IR,IH,IV, which consolidates rotated and flipped variants of the normalized images. This augmented dataset (IAUG) is then supplied as the input to Section 3.2.3 (Segmentation of Layer Regions via MaskLab-CRFNet), where precise defect localization is performed.
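A possible realization of these augmentation operations is sketched below with OpenCV; the maximum rotation angle and border handling are assumed values, not the settings used in the reported experiments.

```python
# Augmentation sketch for Eqs. (4)-(5): random rotation plus horizontal/vertical flips.
import cv2
import numpy as np

def augment(image: np.ndarray, theta_max: float = 15.0):
    h, w = image.shape[:2]
    theta = np.random.uniform(-theta_max, theta_max)
    # Rotation about the image center implements the rotation matrix of Eq. (4).
    M = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), theta, scale=1.0)
    rotated = cv2.warpAffine(image, M, (w, h), borderMode=cv2.BORDER_REFLECT)
    flipped_h = cv2.flip(rotated, 1)   # I_H: reflection across the vertical axis
    flipped_v = cv2.flip(rotated, 0)   # I_V: reflection across the horizontal axis
    return [rotated, flipped_h, flipped_v]  # members of I_AUG
```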

3.2.3. Segmentation of Layer Regions via MaskLab-CRFNet

The augmented dataset (IAUG) obtained from the previous stage is subjected to a segmentation pipeline in order to isolate and precisely delineate defective regions in the printed layers. This is a necessary step for the implementation of local fault detection, since faint faults like under-extrusion or irregular layer deposition can be limited to small areas of space.

To achieve high segmentation accuracy, a novel hybrid framework termed MaskLab-CRFNet is employed. The framework integrates three components in a sequential flow: (i) Mask R-CNN for initial region proposal and coarse mask generation, (ii) DeepLabv3+ for multi-scale feature refinement and semantic segmentation, and (iii) Conditional Random Fields (CRF) for boundary-aware mask optimization. The overall pipeline is represented as Equation (6):

(6) $I_{SEG} = \mathrm{CRF}\big(\mathrm{DeepLabv3{+}}\big(\mathrm{Mask\ R\text{-}CNN}(I_{AUG})\big)\big)$

where ISEG denotes the final segmented output highlighting defective layer regions. This output is forwarded to the feature engineering stage (Section 3.3) for systematic representation learning.

Mask R-CNN

The first step in the MaskLab-CRFNet pipeline employs Mask R-CNN to generate initial region proposals and coarse segmentation masks for defective areas in the augmented dataset (IAUG). Mask R-CNN extends the Faster R-CNN object detection framework by incorporating a parallel branch for prediction of pixel-level masks, enabling simultaneous detection and segmentation of target regions.

Given an input image (IAUG), Mask R-CNN first extracts deep feature maps (Fm), using a backbone network such as ResNet combined with a Feature Pyramid Network (FPN) to capture multi-scale representations as $F_m = \mathrm{FPN}\big(\mathrm{ResNet}(I_{AUG})\big)$.

The feature maps (Fm) are then passed to a Region Proposal Network (RPN), which predicts a set of candidate bounding boxes {BCi} with corresponding objectness scores as Equation (7):

(7) $\{BC_i, s_i\} = \mathrm{RPN}(F_m), \quad i = 1, 2, \ldots, N_{pr}$

where Npr denotes the number of proposed regions and si represents the likelihood that a region contains a defect.

For each proposed bounding box (BCi), Mask R-CNN applies RoIAlign to extract fixed-size feature representations (FiRoI), which are then used to predict a class label (y^i), a refined bounding box (BC^i), and a binary mask (bMi) indicating the defective region within the box as Equation (8):

(8) $\hat{y}_i,\, \widehat{BC}_i,\, bM_i = \mathrm{Mask\ Head}\big(F_i^{RoI}\big), \quad i = 1, 2, \ldots, N_{pr}$

The resulting set of coarse masks $\{bM_i\}$ represents the preliminary segmentation of defects, capturing their approximate location and shape. These masks preserve structural information critical for subsequent refinement in DeepLabv3+, ensuring that the final segmentation accurately delineates defect boundaries. The output of this stage is denoted as $I_{MASK} = \{bM_i\}_{i=1}^{N_{pr}}$, which serves as the input for the next component in the pipeline.

DeepLabv3+

Coarse masks (IMASK) produced by Mask R-CNN form the input to the second stage of the MaskLab-CRFNet pipeline, where DeepLabv3+ is used to obtain detailed semantic segmentation. Mask R-CNN localizes the defective regions roughly; however, it fails to capture faint boundaries and multi-scale contextual changes. DeepLabv3+ mitigates these issues by using atrous spatial pyramid pooling (ASPP) and an encoder–decoder structure, enabling it to refine the global context as well as local detail.

Given the coarse input masks (IMASK), DeepLabv3+ extracts a refined feature representation (FDL) through an encoder that applies atrous convolution with varying dilation rates, which is defined as Equation (9):

(9) $F_{DL} = \mathrm{ASPP}(I_{MASK}) = \sum_{r_d \in R_d} I_{MASK} \ast_{r_d} W_{r_d}$

where $\ast_{r_d}$ denotes convolution with a dilation rate of $r_d$, $W_{r_d}$ represents the learnable filters, and $R_d$ is the set of predefined dilation rates. The model can identify contextual cues at different receptive fields through this formulation without losing resolution.

The DeepLabv3+ decoder phase combines these multi-scale features with low-level spatial features through skip connections; thus, it is able to reconstruct high-resolution segmentation maps. The final segmentation output of DeepLabv3+ is represented as $I_{DL} = \mathrm{Decoder}(F_{DL})$, where $I_{DL}$ provides semantically enriched segmentation masks with improved boundary alignment and defect-region consistency compared to the coarse predictions of Mask R-CNN.

This refined output (IDL) is subsequently passed to the Conditional Random Field (CRF) module for boundary-aware optimization, ensuring precise localization of defect edges.

Conditional Random Fields (CRF)

The segmentation maps (IDL) produced by DeepLabv3+ are semantically enriched to represent defect regions; nevertheless, problems with coarser edges still exist, in addition to blurred boundaries and pixel-level misclassifications caused by the inherent limitations of convolutional feature aggregation. In order to solve this problem, the last stage of the MaskLab-CRFNet pipeline uses Conditional Random Fields (CRF) as a boundary refinement tool. CRF enhances pixel-level accuracy by incorporating contextual dependencies between neighboring pixels, thereby ensuring sharper defect localization.

In a more formal manner, Conditional Random Fields (CRF) is presented as a probabilistic graphical framework that, to determine the label of each pixel (xi), utilizes not only unary potentials coming from DeepLabv3+ prediction but also pairwise potentials that represent the constraints of spatial smoothness. The CRF energy function is given by Equation (10):

(10) $E_F(x) = \sum_{i} \psi_u(x_i) + \sum_{i<j} \psi_p(x_i, x_j)$

where $\psi_u(x_i)$ denotes the unary potential representing the likelihood of pixel $i$ belonging to a defect class based on DeepLabv3+ outputs, while $\psi_p(x_i, x_j)$ represents the pairwise potential that enforces consistency between neighboring pixels $i$ and $j$. A common choice for the pairwise term is a Gaussian kernel that penalizes label differences between nearby pixels with similar color and position.

The refined segmentation map is obtained by minimizing the above energy function as Equation (11):

(11) $I_{SEG} = \arg\min_{x} E_F(x)$

This yields boundary-preserving and structurally coherent defect masks. Thus, the final segmented output (ISEG) represents a precise delineation of defect-prone layer regions, combining coarse region proposals (Mask R-CNN), multi-scale semantic refinement (DeepLabv3+), and boundary optimization (CRF). The segmentation outcome is graphically shown in Figure 4.

The present output is a structured representation of the input, which will be used in the next phase of feature engineering (Section 3.3), where discriminative features will be extracted for hierarchical fault detection.
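To make the boundary refinement step concrete, the sketch below shows how a dense CRF could be applied to DeepLabv3+ class probabilities using the pydensecrf package; the kernel widths, compatibility weights, and number of mean-field iterations are illustrative assumptions, not the configuration used in this work.

```python
# Dense CRF refinement sketch for Eqs. (10)-(11), assuming the pydensecrf package.
import numpy as np
import pydensecrf.densecrf as dcrf
from pydensecrf.utils import unary_from_softmax

def crf_refine(softmax_probs: np.ndarray, rgb_image: np.ndarray, steps: int = 5) -> np.ndarray:
    """softmax_probs: (n_labels, H, W) DeepLabv3+ probabilities; rgb_image: (H, W, 3) uint8."""
    n_labels, h, w = softmax_probs.shape
    d = dcrf.DenseCRF2D(w, h, n_labels)
    d.setUnaryEnergy(unary_from_softmax(softmax_probs.astype(np.float32)))  # unary potentials
    d.addPairwiseGaussian(sxy=3, compat=3)                                  # spatial smoothness term
    d.addPairwiseBilateral(sxy=60, srgb=10,                                 # appearance-aware pairwise term
                           rgbim=np.ascontiguousarray(rgb_image), compat=10)
    q = d.inference(steps)                                                  # approximate minimization of E_F(x)
    return np.argmax(np.array(q), axis=0).reshape(h, w)                     # refined label map I_SEG
```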

Boundary Accuracy Enhancement in MaskLab-CRFNet

The suggested MaskLab-CRFNet model is equipped with three synergistic elements that work together to achieve the goal of improving the boundary precision in the segmentation of additive manufacturing (AM) defects:

Mask R-CNN for instance-level segmentation and region proposal;

DeepLabv3+ for multi-scale contextual feature refinement using atrous spatial pyramid pooling (ASPP); and

Conditional Random Fields (CRF) for fine-grained boundary optimization at the pixel level.

This hybrid pipeline surpasses the typical limitation of boundary blurring that is usually noticed in conventional CNN-based architectures. Namely, Mask R-CNN is able to effectively capture object contours, although sub-pixel edge precision may be slightly deteriorated, whereas DeepLabv3+ enhances contextual consistency but is not pixel adjacency-aware. Thus, incorporating CRF provides edge-aware refinement, allowing the method to enforce local smoothness and intensity consistency, resulting in crisp defect contours, even when there is a high level of noise and complicated melt-pool textures.

Mathematically, the CRF module minimizes the Gibbs energy function:

(12) $E(x) = \sum_{i} \psi_u(x_i) + \sum_{i<j} \psi_p(x_i, x_j)$

where ψu(xi) is the unary potential from DeepLabv3+ outputs and ψp(xi,xj) encodes pairwise relationships between neighboring pixels, modulated by intensity and spatial proximity kernels. This formulation ensures that the final segmentation map maintains high structural fidelity around defect edges.

The results unequivocally show that MaskLab-CRFNet surpasses both Mask R-CNN and DeepLabv3+ in boundary-aware segmentation. The increase in the boundary F1 score of 3–5% characterizes more precise drawing of defect edges, whereas the decrease in the Hausdorff distance (↓ 2.4 px) corroborates greater contour accuracy. The CRF layer smoothly merges the conflicting segmentation boundaries, especially in the case of thin-layer voids, micro-holes, and thermal cracks, which are areas where standard CNNs usually result in fragmented masks.

Overall, MaskLab-CRFNet achieves very accurate boundaries through the integration of object-level discrimination (Mask R-CNN), context-aware refinement (DeepLabv3+), and pixel-wise consistency (CRF), resulting in morphologically consistent and visually correct defect segmentation, which is a prerequisite for the subsequent fault diagnosis stage in real-time AM monitoring.

3.3. Feature Engineering

The segmented output (ISEG) from MaskLab-CRFNet is converted to discriminative feature representations by means of a dual-branch feature engineering strategy. This step is executed simultaneously in two interrelated domains: (i) low-level visual features that can capture texture, gradient, and local appearance cues and (ii) shape and structural features that can represent the geometries and the boundaries of the printed layers.

Each branch independently extracts and fuses its respective feature sets, yielding two distinct outputs: FLVF from the low-level visual branch and FSSF from the shape–structural branch. These outputs are preserved as separate representations and subsequently integrated in the hierarchical fault detection stage (Section 3.4).

3.3.1. Low-Level Visual Features

The initial feature engineering branch operates on the segmentation output (ISEG) to extract low-level visual descriptors. These descriptors represent subtle changes in texture, gradient orientation, and spatial intensity patterns, which are the most common visual manifestations of under-extrusion or irregular deposition. Three complementary methods are applied in sequence—Gabor filters, Local Binary Patterns (LBP), and Histogram of Oriented Gradients (HOG)—each yielding a set of discriminative features that are later combined to create the low-level visual feature vector.

Gabor Filters: Gabor filters are applied to ISEG to capture localized frequency and orientation-specific information. A 2D Gabor filter is defined as Equation (13):

(13) $GF(a,b;\lambda_w,\theta_o,\psi_{off},\sigma_s,\gamma_{ar}) = \exp\!\left(-\frac{a^{2} + \gamma_{ar}^{2} b^{2}}{2\sigma_s^{2}}\right)\cos\!\left(2\pi\frac{a}{\lambda_w} + \psi_{off}\right)$

where $\lambda_w$ denotes the wavelength, $\theta_o$ the orientation, $\psi_{off}$ the phase offset, $\sigma_s$ the standard deviation, and $\gamma_{ar}$ the aspect ratio. The transformed response is given by Equation (14):

(14) $F_{Gabor} = I_{SEG} \ast GF(a,b;\lambda_w,\theta_o,\psi_{off},\sigma_s,\gamma_{ar})$

where $\ast$ denotes convolution. This step highlights defect-related textures across multiple orientations and scales.

Local Binary Patterns (LBP): LBP encodes local texture by thresholding the neighborhood of each pixel with respect to its center value. For a pixel at location (a,b), the LBP operator is expressed as Equation (15):

(15) $\mathrm{LBP}(a,b) = \sum_{p=0}^{P-1} s\big(I_p - I_C\big)\cdot 2^{p}$

where $I_C$ is the intensity of the center pixel, $I_p$ represents the neighboring pixels sampled on a circle of fixed radius, and $s(z) = 1$ if $z \geq 0$ and $s(z) = 0$ if $z < 0$. The extracted LBP histogram forms the FLBP representation, capturing micro-patterns indicative of surface inconsistencies.

Histogram of Oriented Gradients (HOG): To capture gradient distribution and edge structure, HOG is applied to ISEG. The gradient components are computed as Equation (16):

(16) $\mathrm{Grad}_a = \frac{\partial I_{SEG}}{\partial a}, \qquad \mathrm{Grad}_b = \frac{\partial I_{SEG}}{\partial b}$

Gradient magnitude and orientation are given by Equation (17):

(17) $\mathrm{Mag}(a,b) = \sqrt{\mathrm{Grad}_a^{2} + \mathrm{Grad}_b^{2}}, \qquad \theta_o(a,b) = \arctan\!\left(\frac{\mathrm{Grad}_b}{\mathrm{Grad}_a}\right)$

The orientation histogram weighted by gradient magnitude yields the FHOG descriptor, representing edge density and directionality around defect regions. The feature extraction process, including LBP and HOG, is graphically presented in Figure 5.

Feature Fusion within Low-Level Branch

The features extracted using the three techniques described above are combined to form the consolidated low-level visual feature vector as $F_{LVF} = \mathrm{Concat}(F_{Gabor}, F_{LBP}, F_{HOG})$, where $\mathrm{Concat}(\cdot)$ denotes feature concatenation after normalization. This fused representation (FLVF) preserves complementary textural, statistical, and gradient cues, forming the first output of the feature engineering phase.
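An illustrative extraction of this low-level branch is given below using scikit-image; the Gabor frequencies, LBP neighborhood, and HOG cell settings are assumed hyperparameters and serve only to show how $F_{LVF}$ could be assembled.

```python
# Low-level visual feature sketch for Eqs. (13)-(17): Gabor, LBP, and HOG descriptors.
import numpy as np
from skimage.filters import gabor
from skimage.feature import local_binary_pattern, hog

def low_level_features(seg_img: np.ndarray) -> np.ndarray:
    gabor_stats = []
    for theta in (0.0, np.pi / 4, np.pi / 2, 3 * np.pi / 4):      # orientations theta_o
        real, _ = gabor(seg_img, frequency=0.2, theta=theta)       # F_Gabor responses (Eq. 14)
        gabor_stats.extend([real.mean(), real.var()])
    lbp = local_binary_pattern(seg_img, P=8, R=1, method="uniform")  # Eq. (15)
    lbp_hist, _ = np.histogram(lbp, bins=10, density=True)           # F_LBP histogram
    hog_vec = hog(seg_img, orientations=9, pixels_per_cell=(16, 16),
                  cells_per_block=(2, 2), feature_vector=True)       # F_HOG (Eqs. 16-17)
    return np.concatenate([gabor_stats, lbp_hist, hog_vec])          # F_LVF = Concat(...)
```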

3.3.2. Shape and Structural Features

Low-level descriptors can capture textural and gradient changes, but when there are structural anomalies in the layer formation, they usually show through irregular shapes, contours, and geometric distortions. To obtain these higher-order signals, shape-oriented features are taken from the segmented input (ISEG). There are three different and complementary approaches used—Zernike moments, Fourier descriptors, and edge density with contour statistics—each of them representing different geometric properties.

Zernike Moments: Zernike moments provide a set of orthogonal shape descriptors that are invariant to rotation and robust against noise. For an image $I_{SEG}(a,b)$ normalized within the unit disk ($a^{2}+b^{2} \leq 1$), the Zernike moment of order $n$ with repetition $m$ is defined as Equation (18):

(18) $ZM_{nm} = \frac{n+1}{\pi} \sum_{a}\sum_{b} I_{SEG}(a,b)\, V_{nm}^{*}(A,B)$

where $A = \sqrt{a^{2}+b^{2}}$, $B = \arctan(b/a)$, $V_{nm}(A,B) = R_{nm}(A)\, e^{jmB}$ is the Zernike polynomial, and $*$ denotes complex conjugation. The set of magnitudes $|ZM_{nm}|$ forms the rotationally invariant descriptor, which is expressed as $F_{ZM} = \{\, |ZM_{nm}| \mid n = 0, 1, \ldots, N \,\}$. This captures the global symmetry and shape consistency of printed layers.

Fourier Descriptors: Fourier Descriptors (FDs) encode boundary shape information by transforming the contour representation into the frequency domain. Let the boundary of ISEG be represented as a sequence of complex coordinates as Equation (19).

(19) $cc_k = a_k + j\, b_k, \quad k = 0, 1, \ldots, K-1$

where (ak,bk) are ordered contour points. The Discrete Fourier Transform (DFT) of the contour is given by Equation (20):

(20) $C_{DFT}(u) = \sum_{k=0}^{K-1} cc_k\, e^{-j 2\pi u k / K}, \quad u = 0, 1, \ldots, K-1$

The normalized Fourier descriptors are $F_{FD} = \left\{ \frac{|C_{DFT}(u)|}{|C_{DFT}(1)|} \;\middle|\; u = 2, 3, \ldots, U \right\}$, which encode boundary variations while maintaining scale and translation invariance.

Edge Density and Contour Statistics: To quantify localized irregularities, edge density and contour-based statistics are computed from ISEG. Edge density is defined as the ratio $ED = \frac{NP_{edge}}{NP_{total}}$, where $NP_{edge}$ signifies the number of edge pixels detected via a gradient-based operator (Canny) and $NP_{total}$ represents the total number of pixels in the region of interest. Contour statistics, including the mean curvature ($\kappa_c$) and contour variance ($\sigma_c^{2}$), are given by Equation (21):

(21) $\kappa_c(s) = \frac{a'(s)\, b''(s) - b'(s)\, a''(s)}{\left(a'(s)^{2} + b'(s)^{2}\right)^{3/2}}, \qquad \sigma_c^{2} = \frac{1}{CL} \sum_{s=1}^{CL} \left(\kappa_c(s) - \bar{\kappa}_c\right)^{2}$

where $(a(s), b(s))$ parameterize the contour and $CL$ denotes the contour length. The descriptor is expressed as $F_{EDC} = \{ED,\ \bar{\kappa}_c,\ \sigma_c^{2}\}$.

Feature Fusion within the Structural Branch

The extracted features are integrated into a single structural representation as $F_{SSF} = \mathrm{Concat}(F_{ZM}, F_{FD}, F_{EDC})$; this fused vector FSSF provides a compact representation of global symmetry, boundary smoothness, and localized edge irregularities, forming the second output of the feature engineering phase.
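The sketch below shows one way the structural branch could be computed, using the mahotas library for Zernike moments and OpenCV for contours and Canny edges; the moment degree, descriptor count, and Canny thresholds are assumptions for illustration only.

```python
# Shape/structural feature sketch for Eqs. (18)-(21): Zernike moments, Fourier
# descriptors of the largest contour, and edge density.
import numpy as np
import cv2
import mahotas

def shape_features(seg_img: np.ndarray, num_fd: int = 10) -> np.ndarray:
    binary = (seg_img > 0).astype(np.uint8)
    # F_ZM: rotation-invariant Zernike magnitudes on the unit disk (Eq. 18).
    zm = mahotas.features.zernike_moments(binary, radius=binary.shape[0] // 2, degree=8)
    # F_FD: Fourier descriptors normalized by |C_DFT(1)| (Eqs. 19-20).
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    pts = max(contours, key=cv2.contourArea).squeeze()
    dft = np.fft.fft(pts[:, 0] + 1j * pts[:, 1])
    fd = np.abs(dft[2:2 + num_fd]) / (np.abs(dft[1]) + 1e-8)
    # F_EDC: edge density from a Canny map (contour statistics omitted for brevity).
    edges = cv2.Canny((binary * 255).astype(np.uint8), 100, 200)
    edge_density = np.count_nonzero(edges) / edges.size
    return np.concatenate([zm, fd, [edge_density]])                  # F_SSF = Concat(...)
```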

3.4. Hierarchical Multi-Stage Fault Detection with BLC-MonitorNet

3.4.1. Multi-Feature Fusion with Canonical Correlation Analysis (CCA)

A unified representation is required to effectively capture complementary information for robust fault characterization following the extraction of low-level visual features (FLVF) and shape–structural features (FSSF). Direct concatenation of heterogeneous feature sets can lead to redundancy or imbalance, thereby degrading diagnostic performance. To overcome this, Canonical Correlation Analysis (CCA) is employed as a statistical feature fusion strategy that maximizes the correlation between the two feature spaces, ensuring that the most discriminative components are retained. Multi-stage fault detection with BLC-MonitorNet is graphically represented in Figure 6.

Let $F_{LVF} \in \mathbb{R}^{ts \times ds_1}$ and $F_{SSF} \in \mathbb{R}^{ts \times ds_2}$ represent the two feature matrices extracted from the same set of training samples ($ts$), where $ds_1$ and $ds_2$ denote their respective dimensionalities. CCA projects these feature matrices into a shared subspace by finding linear transformations ($a \in \mathbb{R}^{ds_1}$ and $b \in \mathbb{R}^{ds_2}$) such that the correlation between the canonical variates, expressed as $U_C = F_{LVF}\, a$ and $V_C = F_{SSF}\, b$, is maximized. The optimization objective is expressed as Equation (22):

(22) $\max_{a,b}\ \rho = \frac{\mathrm{Cov}(U_C, V_C)}{\sqrt{\mathrm{Var}(U_C)\,\mathrm{Var}(V_C)}}$

Through this formulation, CCA identifies projections where low-level texture cues and high-level structural descriptors exhibit maximal correspondence. The canonical variates are then concatenated to yield the fused feature representation as $F_{CCA} = \mathrm{Concat}(U_C, V_C)$.

The fused representation (FCCA) is an encoding of spatial–structural information that is complementary; thus, it is less redundant and more discriminative of defect signatures in additive manufacturing imagery.

The feature embedding (FCCA) obtained in this way is the input to the next Detection Module (Section 3.4.2), where hierarchical fault detection and localization are performed by lightweight convolutional networks and residual refinement mechanisms.
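A minimal sketch of this fusion step with scikit-learn is given below; the number of retained canonical components is an assumed hyperparameter.

```python
# CCA fusion sketch for Eq. (22): project F_LVF and F_SSF into a shared subspace
# and concatenate the canonical variates to form F_CCA.
import numpy as np
from sklearn.cross_decomposition import CCA

def cca_fuse(F_lvf: np.ndarray, F_ssf: np.ndarray, n_components: int = 32) -> np.ndarray:
    """F_lvf: (ts, ds1), F_ssf: (ts, ds2) -> fused F_CCA of shape (ts, 2 * n_components)."""
    cca = CCA(n_components=n_components)
    U_c, V_c = cca.fit_transform(F_lvf, F_ssf)   # canonical variates maximizing rho
    return np.concatenate([U_c, V_c], axis=1)    # F_CCA = Concat(U_C, V_C)
```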

3.4.2. Detection via MoShuResNet

The fused representation (FCCA) is processed through MoShuResNet, a hybrid architecture combining MobileNetV3, ShuffleNet, and Residual U-Net. MobileNetV3 and ShuffleNet operate in parallel to extract lightweight spatial features, which are fused via an attention mechanism. The aggregated features are then refined by the Residual U-Net for precise fault localization and high-confidence diagnostic predictions. Figure 7 represents the architecture of the proposed MoShuResNet.

MobileNetV3

The first component of the MoShuResNet pipeline is MobileNetV3, a lightweight convolutional neural network designed for efficient feature extraction in resource-constrained environments. Within the detection module, its role is to derive compact yet highly discriminative spatial descriptors from the fused input representation (FCCA).

MobileNetV3 achieves this by combining depthwise separable convolutions with squeeze-and-excitation (SE) modules and nonlinear activation functions optimized through neural architecture search. The fundamental building block can be expressed as Equation (23):

(23) $f_{bb} = \sigma_c\!\left(W_{pck} \cdot \delta_{na}\!\left(W_{dck} \ast x_{fm}\right)\right)$

where $x_{fm}$ denotes the input feature map, $W_{dck}$ is the depthwise convolution kernel, $W_{pck}$ is the pointwise convolution kernel, $\delta_{na}(\cdot)$ denotes the nonlinear activation (hard-swish), and $\sigma_c(\cdot)$ represents the channel-wise recalibration introduced by the SE module.

By using only one filter per input channel, the depthwise convolution substantially reduces the computational cost, while the pointwise convolution combines these responses across channels. The SE mechanism then adjusts the channel responses by considering the interdependencies between them, as given by Equation (24):

(24) $SE = \sigma_c\!\left(W_1\, \delta_{na}(W_2 \cdot z_{gap})\right), \qquad z_{gap} = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} x_{fm}(i,j)$

where zgap represents global average pooled features and W1 and W2 are learnable parameters. This ensures that defect-relevant channels are emphasized while redundant activations are suppressed.

Thus, the output from MobileNetV3, denoted as FMob, encodes efficient and discriminative spatial information from FCCA. These features are subsequently aligned with those extracted by ShuffleNet in the parallel branch for weighted fusion.
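A compact PyTorch sketch of such a building block is shown below; the channel sizes, kernel size, and SE reduction ratio are illustrative and not taken from the trained MoShuResNet configuration.

```python
# Sketch of the MobileNetV3-style block in Eqs. (23)-(24): depthwise convolution,
# SE recalibration, and pointwise convolution with hard-swish activation.
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc1 = nn.Linear(channels, channels // reduction)
        self.fc2 = nn.Linear(channels // reduction, channels)

    def forward(self, x):
        z = x.mean(dim=(2, 3))                          # z_gap: global average pooling
        s = torch.sigmoid(self.fc2(torch.relu(self.fc1(z))))
        return x * s.unsqueeze(-1).unsqueeze(-1)        # channel-wise recalibration sigma_c

class MobileBlock(nn.Module):
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch)  # W_dck
        self.act = nn.Hardswish()                                             # delta_na
        self.se = SEBlock(in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)                          # W_pck

    def forward(self, x):
        return self.pointwise(self.se(self.act(self.depthwise(x))))
```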

ShuffleNet

Alongside MobileNetV3, the second branch of the MoShuResNet architecture utilizes ShuffleNet, a lightweight convolutional structure specifically designed to improve computational efficiency while preserving discriminative feature extraction. In this pipeline, ShuffleNet captures complementary spatial–structural representations of the fused input (FCCA), with the main focus on efficient channel usage and reduced redundancy.

ShuffleNet’s main breakthrough lies in two operations: channel shuffling and pointwise group convolution. One of the reasons for the high computational cost of standard convolutions is the density of channel interactions. To deal with this issue, ShuffleNet splits the channels into groups and performs group convolutions on each, which cuts down the number of multiplications drastically. For an input tensor $X \in \mathbb{R}^{H \times W \times C_{in}}$ and output tensor $Y \in \mathbb{R}^{H \times W \times C_{out}}$, the computational complexity of standard convolution is defined as Equation (25):

(25) $\Omega_{standard} = H \times W \times C_{in} \times C_{out} \times k_s^{2}$

where ks is the kernel size. In contrast, ShuffleNet introduces grouped pointwise convolution, reducing the cost, as represented by Equation (26):

(26) $\Omega_{group} = \frac{H \times W \times C_{in} \times C_{out} \times k_s^{2}}{g_n}$

where $g_n$ denotes the number of groups. However, group convolutions alone limit information flow across channel groups. To address this, ShuffleNet employs channel shuffling, which permutes feature channels to ensure cross-group interaction. Mathematically, for an intermediate feature map ($X$) with grouped channels, the shuffle operation reindexes the channels as $X'_{i,j,c} = X_{i,j,\pi(c)}$, where $\pi(c)$ denotes a permutation function that redistributes channels across groups. This operation ensures that subsequent convolutions receive mixed-channel information, enhancing representational capacity without incurring significant computational cost.

Through these mechanisms, ShuffleNet produces an efficient yet rich representation denoted as $F_{Shu} = \mathrm{ShuffleNet}(F_{CCA})$, which encodes discriminative features complementary to those extracted by MobileNetV3. The outputs (FMob and FShu) are subsequently fused via a weighted attention mechanism, ensuring that both efficiency-driven and diversity-driven descriptors contribute to the downstream Residual U-Net.
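The channel-shuffle idea can be expressed in a few lines of PyTorch, as sketched below; the group count and layer widths are assumptions chosen only to illustrate the operation.

```python
# ShuffleNet-style sketch for Eqs. (25)-(26): grouped pointwise convolution
# followed by the channel permutation X'[i,j,c] = X[i,j,pi(c)].
import torch
import torch.nn as nn

def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    n, c, h, w = x.shape
    # Reshape to (N, groups, C/groups, H, W), swap the two channel axes, then flatten.
    return x.view(n, groups, c // groups, h, w).transpose(1, 2).reshape(n, c, h, w)

class ShuffleUnit(nn.Module):
    def __init__(self, channels: int, groups: int = 4):
        super().__init__()
        self.groups = groups
        self.gconv = nn.Conv2d(channels, channels, 1, groups=groups)              # grouped 1x1 conv
        self.dwconv = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)

    def forward(self, x):
        out = torch.relu(self.gconv(x))
        out = channel_shuffle(out, self.groups)   # enables cross-group information flow
        return torch.relu(self.dwconv(out))
```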

Weighted Attention Fusion

The parallel feature maps extracted by MobileNetV3 (FMob) and ShuffleNet (FShu) are integrated through a weighted attention fusion mechanism to form a unified representation. The role of this step is to emphasize the most informative channels while suppressing redundant or noisy activations, thereby improving the discriminative capacity of the diagnostic model. Given the two feature maps, attention weights AW1 and AW2 are adaptively learned as defined in Equation (27):

(27) $F_{fuse} = AW_1 \cdot F_{Mob} + AW_2 \cdot F_{Shu}, \qquad AW_1 + AW_2 = 1$

where $AW_1, AW_2 \in [0, 1]$ are determined by a soft attention mechanism applied over global average pooled descriptors of the two branches. This ensures a balanced yet adaptive integration of efficiency-driven and diversity-driven representations.

The fused representation (Ffuse) is then passed to the Residual U-Net for fine-grained fault localization and high-confidence prediction.

Residual U-Net

The unified feature map (Ffuse) is fed into a Residual U-Net, which serves as the final refinement stage for fault localization and prediction. The Residual U-Net is aimed at maintaining the structural consistency, in addition to providing high-resolution segmentation of defect regions; thus, it is able to locate the tiniest anomalies in the 3D printing layers.

Basically, the architecture combines the advantages of U-Net and residual learning. The structural design of U-Net comprises an encoder–decoder path with skip connections that carry high-resolution spatial information from the downsampling path to the upsampling path, which is very important for exact localization of tiny defects. Each convolutional block is equipped with residual connections to alleviate the problem of a vanishing gradient and to make the learning of identity mappings easier, thereby allowing for a deeper network without reducing its performance.

Mathematically, for an input feature map at layer L (XL), the residual block is computed as Equation (28):

(28) $Y_L = \mathcal{F}(X_L, W_L) + X_L$

where $\mathcal{F}(X_L, W_L)$ represents the convolutional transformations (including batch normalization and activation) and $W_L$ denotes the learnable weights. This residual mapping allows the network to learn refined defect-specific features while preserving original spatial details.

The decoder path upsamples the feature maps using transposed convolutions, concatenating them with the corresponding encoder outputs through skip connections, expressed as Equation (29):

(29) $U_L = \mathrm{Concat}\!\left(Y_L^{encoder},\ \mathrm{Up}\!\left(Y_{L+1}^{decoder}\right)\right)$

where Up() denotes upsampling and Concat() fuses high-resolution encoder features with upsampled decoder features.

The final output of the Residual U-Net, denoted as FDiag, represents high-confidence diagnostic predictions for potential faults, including localized regions of anomalies. This output is subsequently fed into the deployment optimization stage for pruning, quantization, and edge deployment in real-time additive manufacturing systems.
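The residual block and decoder step can be sketched in PyTorch as follows; channel counts and the upsampling operator are illustrative choices, not the exact Residual U-Net configuration used here.

```python
# Residual U-Net sketch for Eqs. (28)-(29): Y_L = F(X_L, W_L) + X_L and
# U_L = Concat(encoder skip, upsampled decoder feature).
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return torch.relu(self.body(x) + x)        # residual mapping of Eq. (28)

class DecoderStep(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.up = nn.ConvTranspose2d(channels, channels, 2, stride=2)  # Up(.)
        self.refine = ResidualBlock(2 * channels)

    def forward(self, decoder_feat, encoder_feat):
        # Eq. (29): concatenate the encoder skip with the upsampled decoder feature.
        return self.refine(torch.cat([encoder_feat, self.up(decoder_feat)], dim=1))
```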

Rationale for Parallel Integration of MobileNetV3 and ShuffleNet

The MoShuResNet backbone’s parallel fusion of MobileNetV3 and ShuffleNet is a scheme that uses the complementary representational capacities and computational efficiency of both lightweight architectures. MobileNetV3 implements squeeze-and-excitation (SE) modules and nonlinear activations (h-swish) to improve channel-wise attention; thus, it is able to capture global semantic dependencies and fine texture patterns more effectively.

However, ShuffleNet employs channel shuffling and grouped convolutions, which focus on spatial–structural relationships and inter-channel independence, allowing for faster inference with minimal redundancy. Due to the parallel nature of the streams of both networks integrated in MoShuResNet, it can utilize multi-perspective feature extraction: MobileNetV3 provides semantic and high-level context, while ShuffleNet provides geometric and spatial structural sensitivity. The feature maps are concatenated and refined via an attention fusion block, introducing the possibility of adaptive weighting of informative channels.

Feature Complementarity Verification

We empirically verified the complementarity of MobileNetV3 and ShuffleNet features through statistical correlation analysis and t-SNE feature visualization.

In Table 2, the correlation coefficient of the low-to-moderate range (r = 0.42) suggests that the two models pick non-redundant but related feature subspaces. Along the same line, the mutual information (MI = 0.31 bits) indicates partial overlap, as well as substantial complementarity.

Feature Visualization Analysis

To verify this point, t-SNE embeddings of the final-layer features were plotted (figure not available here due to limited space). The visualization revealed the following:

MobileNetV3 features form compact clusters, emphasizing defect texture patterns;

ShuffleNet features create spatially distributed clusters focusing on shape and contour variations;

The fused features display clear inter-class separation, confirming enhanced discriminative power through dual-stream integration.

Therefore, the parallel MobileNetV3–ShuffleNet fusion architecture is a deliberate design choice aimed at increasing the diversity of extracted information while keeping the model lightweight. Experimental results demonstrate that this combination leads to a +3.4% improvement in the F1 score compared to single-stream baselines, confirming the complementarity of the two feature extractors.

3.4.3. Deployment Optimization

After MoShuResNet is trained, the diagnostic model is subjected to deployment optimization for real-time operation in resource-limited additive manufacturing systems. The objective here is to cut down on computational overhead, memory usage, and response time while maintaining the model's predictive accuracy.

Both pruning and quantization, which are two complementary techniques, are used. Pruning removes redundant or less significant parameters in convolutional and fully connected layers by evaluating their contribution to the loss function. Formally, for a given weight in layer l (wi), the pruned weight (wipruning) is defined as Equation (30):

(30) $w_i^{pruning} = \begin{cases} 0 & \text{if } |w_i| < \tau \\ w_i & \text{otherwise} \end{cases}$

where τ represents a threshold determined based on weight distribution and sensitivity analysis.

Quantization further compresses the model by reducing the precision of weights and activations from floating-point to lower-bit representations, i.e., 8-bit integers. For a weight (w) and scale factor (sf), the quantized weight (wq) is computed as Equation (31):

(31) $w_q = \mathrm{round}\!\left(\frac{w}{s_f}\right), \qquad s_f = \frac{\max|w|}{2^{b_w - 1} - 1}$

where bw denotes the bit width. This operation significantly reduces memory usage and computational cost, facilitating fast inference on edge devices.

The optimized model, denoted as MOpt, retains high diagnostic accuracy while being suitable for real-time deployment, providing the foundation for the subsequent BLC-MonitorNet monitoring stage.
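A minimal sketch of how these two steps could be applied with standard PyTorch utilities is shown below; the pruning ratio and the restriction of dynamic quantization to linear layers are illustrative assumptions.

```python
# Deployment optimization sketch for Eqs. (30)-(31): magnitude pruning followed
# by post-training dynamic quantization to 8-bit integer weights.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

def optimize_for_edge(model: nn.Module, prune_amount: float = 0.3) -> nn.Module:
    # Magnitude pruning: zero the weights whose magnitude falls below the threshold tau.
    for module in model.modules():
        if isinstance(module, (nn.Conv2d, nn.Linear)):
            prune.l1_unstructured(module, name="weight", amount=prune_amount)
            prune.remove(module, "weight")          # make the pruning mask permanent
    # Dynamic quantization (Eq. 31): store linear-layer weights as qint8.
    return torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
```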

3.4.4. Monitoring via BLC-MonitorNet

The optimized diagnostic model is embedded within BLC-MonitorNet, a parallel three-stage monitoring module configured for adaptive, real-time fault detection in additive manufacturing. The module operates autonomously: it continuously scrutinizes incoming data streams for anomalies, estimates prediction uncertainty, and detects feature drift, thereby maintaining stable and trustworthy performance under changing printing conditions.

Real-Time Inference with Uncertainty Estimation

The initial stage of the BLC-MonitorNet module is a live prediction with the estimation of prediction uncertainty through a Bayesian Deep Neural Network (BDNN). The main purpose of this method is to capture the confidence level of the model predictions, thereby lowering the number of overconfident false alarms that can be proposed in a changing additive manufacturing setting.

The input to this stage is the high-confidence diagnostic output from the Residual U-Net, denoted as FDiag, which contains both the predicted fault probabilities and localized regions of anomalies. The BDNN incorporates weight distributions instead of deterministic weights, modeling each network parameter $w_i$ as a probability distribution $\mathrm{pro}(w_i)$. The predictive distribution for a new input ($x$) is then obtained as Equation (32):

(32) $\mathrm{pro}(y \mid x, D_T) = \int \mathrm{pro}(y \mid x, w)\, \mathrm{pro}(w \mid D_T)\, dw$

where $D_T$ represents the training data. This integral is typically approximated using Monte Carlo dropout, enabling efficient estimation of the predictive mean ($\hat{pm}$) and uncertainty ($u_y$), as expressed in Equation (33):

(33) $\hat{pm} = \frac{1}{T}\sum_{t=1}^{T} \hat{pm}_t, \qquad u_y = \frac{1}{T}\sum_{t=1}^{T} \hat{pm}_t^{2} - \hat{pm}^{2}$

where T denotes the number of stochastic forward passes through the BDNN.

The output of this stage, denoted as MBDNN, consists of both the refined predictions and associated uncertainty measures, which are then fed in parallel to the subsequent reconstruction-based anomaly detection and feature drift detection stages, forming the first branch of the adaptive monitoring workflow.
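A minimal Monte Carlo dropout sketch for this estimation is shown below; the number of stochastic passes T and the use of softmax outputs are assumptions made for illustration.

```python
# MC dropout sketch for Eq. (33): T stochastic forward passes yield the predictive
# mean pm_hat and a variance-based uncertainty u_y.
import torch
import torch.nn as nn

@torch.no_grad()
def mc_dropout_predict(model: nn.Module, x: torch.Tensor, T: int = 20):
    model.eval()
    for m in model.modules():                        # keep dropout active at inference time
        if isinstance(m, nn.Dropout):
            m.train()
    preds = torch.stack([torch.softmax(model(x), dim=-1) for _ in range(T)])
    pm_hat = preds.mean(dim=0)                       # predictive mean
    u_y = (preds ** 2).mean(dim=0) - pm_hat ** 2     # uncertainty estimate of Eq. (33)
    return pm_hat, u_y
```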

Reconstruction-Based Anomaly Detection

The second stage of BLC-MonitorNet focuses on detecting anomalies by learning the normal behavior of the additive manufacturing process and identifying deviations from it. This stage employs a Convolutional Autoencoder coupled with LSTM (ConvAE-LSTM) to capture both spatial and temporal patterns of the printing process. The input to this stage is the optimized deployed model output, denoted as MOpt, which retains high diagnostic accuracy while being suitable for real-time edge deployment.

ConvAE-LSTM consists of a convolutional encoder that extracts hierarchical spatial features from the input frames ($x_f$) at each time step, producing a latent representation $z_t = f_{conv}(x_f; \theta_{enc})$, where $f_{conv}$ denotes the convolutional encoder function with parameters $\theta_{enc}$. The latent representations over a temporal window are then fed into an LSTM network to model temporal dependencies and reconstruct the expected normal sequence ($\hat{x}_f$), which is defined as Equation (34):

(34) $\hat{z}_t = f_{LSTM}\big(z_{t-k}, \ldots, z_t; \theta_{LSTM}\big), \qquad \hat{x}_f = f_{deconv}\big(\hat{z}_t; \theta_{dec}\big)$

where fLSTM captures sequential patterns and fdeconv represents the decoder that reconstructs the input frame from the latent space. The reconstruction error is then computed as Equation (35):

(35) $E_t = \left\| x_f - \hat{x}_f \right\|_2^{2}$

A high reconstruction error (Et) indicates potential anomalies or deviations from the learned normal process. This error map not only flags defective regions but also quantifies the severity of anomalies in real time.

The output of this stage, denoted as MConvAL, is forwarded to the feature drift detection stage for parallel evaluation while also contributing to the final decision fusion logic to determine operational status.

Feature Drift Detection

This stage tracks feature drift, i.e., slow changes in the distribution of additive manufacturing features caused by process variations or unseen defects. To achieve this, a Convolutional Autoencoder (CAE) is used to learn a compact yet accurate representation of normal fused features (FCCA) and to recognize any deviations that point to anomalies. The input to this stage is the fused representation (FCCA) that reflects the complementary spatial–structural information.

The encoder of the CAE consists of stacked convolutional layers with decreasing spatial dimensions and increasing feature depth, capturing hierarchical feature representations while preserving spatial correlations, expressed as Equation (36):

(36) $z_{CAE} = f_{enc}(F_{CCA}) = \sigma_a\!\left(W_{enc} \ast F_{CCA} + b_{enc}\right)$

where $\ast$ denotes the convolution operation, $\sigma_a$ denotes the activation function (ReLU), and $W_{enc}$ and $b_{enc}$ are learnable parameters. The encoder compresses the input into a latent representation ($z_{CAE}$) that captures essential feature patterns. The decoder mirrors the encoder with transposed convolutional layers to reconstruct the original fused input as Equation (37):

(37) $\hat{F}_{CCA} = f_{dec}(z_{CAE}) = \sigma_a\!\left(W_{dec} \ast z_{CAE} + b_{dec}\right)$

The reconstruction error is computed to quantify deviations from normal behavior as $D_{feat} = \left\| F_{CCA} - \hat{F}_{CCA} \right\|_2^{2}$. Higher values of $D_{feat}$ indicate distribution shifts or anomalies, signaling feature drift caused by gradual process degradation or new defect types. The CAE is trained on historical normal fused features to minimize reconstruction loss, enabling it to reliably detect deviations during real-time monitoring. The output of this section is represented as MCAE.

This phase works alongside the Bayesian Deep Neural Network (BDNN) for uncertainty-aware inference and ConvAE-LSTM for reconstruction-based anomaly detection. The results from the three parallel units are combined in a decision logic module that determines the system state as Normal Operation, Warning, or Fault Alert, thereby guaranteeing adaptive, real-time fault detection in additive manufacturing systems.

The outputs of the three parallel monitoring stages—uncertainty scores from BDNN (MBDNN), reconstruction error from ConvAE-LSTM (MConvAL), and the feature drift measure from CAE (MCAE)—are fused to determine the operational state of the additive manufacturing system. A weighted aggregation function is applied as Equation (38):

(38) $M_{final} = TW_1 \cdot M_{BDNN} + TW_2 \cdot M_{ConvAL} + TW_3 \cdot M_{CAE}$

where TW represents the tunable weights representing the relative importance of each module. The fused score (Mfinal) is then mapped to discrete states, mathematically represented as Equation (39):

(39) $\text{System State} = \begin{cases} \text{Normal Operation}, & M_{final} < \tau_1 \\ \text{Warning}, & \tau_1 \leq M_{final} < \tau_2 \\ \text{Fault Alert}, & M_{final} \geq \tau_2 \end{cases}$

The main goal of this decision logic is to ensure that minor deviations or uncertain predictions do not cause false alarms while, at the same time, ensuring that major anomalies are immediately identified, thereby allowing the system to be used for real-time, adaptive fault detection in additive manufacturing.
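The decision logic can be summarized in a short routine, sketched below; the weights and the thresholds $\tau_1$ and $\tau_2$ are placeholder values that would be tuned in practice.

```python
# Decision-fusion sketch for Eqs. (38)-(39): weighted aggregation of the three
# monitoring scores mapped to a discrete system state.
def decide_state(m_bdnn: float, m_conval: float, m_cae: float,
                 weights=(0.4, 0.35, 0.25), tau1: float = 0.4, tau2: float = 0.7) -> str:
    tw1, tw2, tw3 = weights
    m_final = tw1 * m_bdnn + tw2 * m_conval + tw3 * m_cae   # Eq. (38)
    if m_final < tau1:
        return "Normal Operation"
    if m_final < tau2:
        return "Warning"
    return "Fault Alert"
```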

Attention Fusion Mechanism and Parameter Transfer Strategy

(a). Attention Fusion Logic in MoShuResNet

The MoShuResNet model merges three compact submodules—MobileNetV3, ShuffleNet, and Residual U-Net—to balance overall efficiency against diagnostic accuracy. The fusion of their feature maps is achieved by a Dual Attention Fusion (DAF) mechanism consisting of Channel Attention (CA) and Spatial Attention (SA). Let the feature maps from MobileNetV3, ShuffleNet, and Residual U-Net be denoted as

(40) $F_M,\ F_S,\ F_R \in \mathbb{R}^{H \times W \times C}$

Each branch is first refined through self-attention:

(41)F^i=σ(WcGAP(Fi))Fi,i{M,S,R},

where σ is the sigmoid activation, Wc represents learnable channel weights, and denotes element-wise multiplication. These refined features are then aggregated using weighted attention fusion:

(42) $F_{fuse} = \sum_i \alpha_i \hat{F}_i, \quad \text{where } \alpha_i = \dfrac{\exp(\mathrm{GAP}(\hat{F}_i))}{\sum_j \exp(\mathrm{GAP}(\hat{F}_j))}$.

This softmax-based weighting ensures the adaptive contribution of each module based on contextual importance, dynamically adjusting attention during fault classification.

Finally, a spatial refinement block applies a 7×7 convolution and normalization:

(43) $F_{final} = \sigma\left(\mathrm{Conv}_{7\times 7}\left([F_{fuse}; \mathrm{AvgPool}(F_{fuse})]\right)\right)$.

This fusion logic enables MoShuResNet to simultaneously capture local structural cues (from ShuffleNet), global semantics (from MobileNetV3), and boundary precision (from Residual U-Net).
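A compact sketch of this dual-attention fusion is given below. It assumes three same-shaped feature maps; the channel count and the use of a channel-averaged map as the pooled input to the 7×7 spatial refinement are one plausible reading of Equation (43), not the exact configuration used here.

```python
# Sketch of the DAF logic in Eqs. (41)-(43); shapes and channel counts are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualAttentionFusion(nn.Module):
    def __init__(self, channels: int = 256):
        super().__init__()
        self.wc = nn.Linear(channels, channels)                       # learnable W_c
        self.spatial = nn.Conv2d(channels * 2, channels, kernel_size=7, padding=3)

    def _channel_refine(self, f: torch.Tensor) -> torch.Tensor:
        gap = f.mean(dim=(2, 3))                                      # GAP(F_i)
        attn = torch.sigmoid(self.wc(gap))[..., None, None]
        return attn * f                                               # F_hat_i (Eq. 41)

    def forward(self, f_m, f_s, f_r):
        refined = [self._channel_refine(f) for f in (f_m, f_s, f_r)]
        # Softmax weights alpha_i from globally pooled refined features (Eq. 42).
        scores = torch.stack([f.mean(dim=(1, 2, 3)) for f in refined], dim=1)
        alphas = F.softmax(scores, dim=1)                             # shape (B, 3)
        f_fuse = sum(alphas[:, i, None, None, None] * refined[i] for i in range(3))
        # Spatial refinement with a 7x7 convolution (Eq. 43).
        pooled = f_fuse.mean(dim=1, keepdim=True).expand_as(f_fuse)
        return torch.sigmoid(self.spatial(torch.cat([f_fuse, pooled], dim=1)))
```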

(b). Parameter Transfer in MaskLab-CRFNet

The MaskLab-CRFNet architecture combines Mask R-CNN, DeepLabv3, and Conditional Random Fields (CRF). Parameter transfer among these components is handled through progressive hierarchical freezing:

Stage 1—Mask R-CNN Pretraining:

The backbone (ResNet-50) is pre-trained on COCO and fine-tuned on layer segmentation masks.

Stage 2—DeepLabv3 Integration:

The encoder parameters of DeepLabv3 are initialized using the trained Mask R-CNN backbone. Decoder layers are trained from scratch for boundary refinement.

Stage 3—CRF Embedding:

The CRF layer, implemented as a differentiable recurrent module, receives initial pairwise potentials from the DeepLabv3 output. Only CRF kernel parameters θpair and θunary are updated, keeping encoder weights frozen to preserve learned semantics.

Mathematically, the parameter dependency can be represented as follows:

(44) $\theta_{MaskLab} = \{\theta_{RCNN}^{enc}, \theta_{DeepLab}^{dec}, \theta_{CRF}\}, \quad \text{with } \nabla \theta_{RCNN}^{enc} = 0$.

This ensures efficient convergence while maintaining consistency across semantic and boundary-level representations. The analysis of the diverse configurations is shown in Table 3.
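A minimal sketch of this progressive freezing is shown below: the Mask R-CNN-derived encoder is frozen (so its gradients vanish, as in Equation (44)) while the DeepLabv3 decoder and CRF kernel parameters remain trainable. The attribute names (rcnn_encoder, deeplab_decoder, crf) and the optimizer settings are illustrative placeholders.

```python
# Sketch of the hierarchical freezing in Eq. (44); module names are assumptions.
import torch

def configure_parameter_transfer(masklab_crfnet: torch.nn.Module) -> torch.optim.Optimizer:
    for p in masklab_crfnet.rcnn_encoder.parameters():
        p.requires_grad = False          # Stages 1-2: pretrained backbone kept frozen
    for p in masklab_crfnet.deeplab_decoder.parameters():
        p.requires_grad = True           # Stage 2: decoder trained from scratch
    for p in masklab_crfnet.crf.parameters():
        p.requires_grad = True           # Stage 3: only theta_pair, theta_unary updated
    trainable = [p for p in masklab_crfnet.parameters() if p.requires_grad]
    return torch.optim.Adam(trainable, lr=1e-4)
```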

3.5. Continual Learning for Adaptive Fault Detection

Continual learning is integrated so that the model can be updated locally as new defect patterns emerge, maintaining the fault detection system's adaptability over time in additive manufacturing. The system uses Elastic Weight Consolidation (EWC), which averts catastrophic forgetting by constraining the most important parameters learned from previous tasks while permitting less important parameters to adapt to the new data.

Mathematically, EWC modifies the loss function during incremental updates as Equation (45):

(45) $L_{EWC} = L_{new} + \dfrac{\Lambda}{2} \sum_i F_i \left(\Theta_i - \Theta_i^{*}\right)^2$

Here, Lnew denotes the standard loss for the new data, Θi represents the current model parameters, Θi* denotes the parameters optimized for previously learned tasks, Fi represents the Fisher information estimating the importance of each parameter, and Λ controls the strength of regularization.

Through the incorporation of EWC, the model is capable of adjusting to changing defect patterns without a drop in its performance with the faults that it already knows; thus, it can be used for reliable and continuous fault detection in different additive manufacturing situations.
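A minimal sketch of the EWC-regularized loss in Equation (45) is given below. The Fisher information and anchor parameters Θ* are assumed to have been stored after training on previous defect classes, and the Λ value is illustrative.

```python
# Sketch of Eq. (45); the regularization strength lam is an illustrative value.
import torch

def ewc_loss(loss_new: torch.Tensor,
             model: torch.nn.Module,
             fisher: dict,        # parameter name -> Fisher information F_i
             theta_star: dict,    # parameter name -> previously optimal Theta*_i
             lam: float = 0.4) -> torch.Tensor:
    """L_EWC = L_new + (Lambda/2) * sum_i F_i * (Theta_i - Theta*_i)^2."""
    penalty = torch.zeros((), device=loss_new.device)
    for name, param in model.named_parameters():
        if name in fisher:
            penalty = penalty + (fisher[name] * (param - theta_star[name]) ** 2).sum()
    return loss_new + (lam / 2.0) * penalty
```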

3.6. Fault-to-Control Mapping Mechanism in BLC-MonitorNet

Although the main function of the BLC-MonitorNet framework is to conduct real-time monitoring and generate alerts, it can additionally support closed-loop process control through an event-driven control interface. This connection bridges fault detection and corrective actuation, upgrading the system from simple anomaly recognition to the automatic triggering of adaptive control responses.

Conceptual Workflow

The operational flow for fault-to-control mapping follows four main stages:

Fault Identification: Deep feature embeddings and spatiotemporal attention modules within BLC-MonitorNet identify the fault class (e.g., sensor drift, overheating, or vibration anomaly) with high temporal precision.

Signal Encoding: The recognized fault type is translated into a control signal vector ($C_f$), represented as $C_f = [f_{id}, s_{loc}, t_{stamp}, \alpha_{sev}]$, where fid denotes the fault identifier, sloc indicates the subsystem location, tstamp is the detection timestamp, and αsev is the severity coefficient.

Control Mapping Logic: A rule-based mapping matrix (R) or a reinforcement learning (RL)-based control policy translates Cf into actionable control commands Uc, i.e., $U_c = R \times C_f$ or $U_c = \pi_\theta(C_f)$ (a minimal sketch of the rule-based variant follows after this workflow).

R-based Mapping: Predefined logic is used for low-latency response (e.g., “Overheat → Decrease current by 10%”).

RL-based Mapping: Optimal control actions are learned from historical sensor–control correlations to balance system stability and energy efficiency.

Actuation and Feedback: The control command (Uc) is transmitted to the process controller (PLC/SCADA interface) via MQTT or the Modbus protocol. Feedback from sensors is continuously looped back into BLC-MonitorNet for adaptive self-calibration. The implementation details are shown in Table 4.
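The sketch below illustrates the rule-based branch of the mapping logic described above. The fault identifiers, command strings, and severity scaling are illustrative assumptions rather than the framework's actual command set.

```python
# Sketch of R-based fault-to-control mapping; identifiers and commands are assumptions.
from dataclasses import dataclass

@dataclass
class ControlSignal:            # C_f = [f_id, s_loc, t_stamp, alpha_sev]
    f_id: str
    s_loc: str
    t_stamp: float
    alpha_sev: float

# Predefined low-latency rules keyed by fault identifier.
RULE_MATRIX = {
    "overheat":     lambda c: f"SET_CURRENT scale={1.0 - 0.10 * c.alpha_sev:.2f}",
    "sensor_drift": lambda c: "RECALIBRATE_SENSOR " + c.s_loc,
    "vibration":    lambda c: "REDUCE_FEEDRATE 15%",
}

def map_fault_to_control(c_f: ControlSignal) -> str:
    """U_c = R x C_f for known faults; unknown faults escalate to an operator alert."""
    rule = RULE_MATRIX.get(c_f.f_id)
    return rule(c_f) if rule else f"ALERT_OPERATOR {c_f.f_id}@{c_f.s_loc}"

# Example: an overheat event with severity 0.8 yields "SET_CURRENT scale=0.92".
command = map_fault_to_control(ControlSignal("overheat", "extruder_0", 162.4, 0.8))
```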

With this fault-to-control mapping, BLC-MonitorNet is no longer just a passive diagnostic tool but an active supervisory control system. The system achieves this by coupling deep learning-based anomaly recognition with simple control logic, thereby guaranteeing the following:

Low-latency adaptive response (<60 ms average);

Fault-specific control granularity based on severity;

Scalable deployment in industrial or IoT-based automation pipelines.

Furthermore, the proposed integration provides a foundation for future work in reinforcement learning-based control optimization, where control policies can evolve dynamically based on fault recurrence patterns and environmental variations.

3.7. Explainable AI

To enhance the transparency and interpretability of the fault detection system, Explainable AI (XAI) techniques are integrated into the workflow. The objective is to provide human-understandable insights into model predictions, helping operators and engineers comprehend why certain defects are flagged and which regions contribute most to the decision.

The input to this section is the adaptive fault predictions from the continual learning-enhanced MoShuResNet and BLC-MonitorNet modules. These predictions include localized anomaly maps and feature activations that can be analyzed for interpretability.

A commonly used method for visual explanation is Gradient-weighted Class Activation Mapping (Grad-CAM), which highlights important regions in the input image contributing to a specific prediction. Mathematically, the Grad-CAM heatmap (LGrad_CAMc) for class c is computed as Equation (46):

(46) $\alpha_k^c = \dfrac{1}{Z_n} \sum_i \sum_j \dfrac{\partial y^c}{\partial A_{ij}^k}, \qquad L_{Grad\text{-}CAM}^c = \mathrm{ReLU}\left(\sum_k \alpha_k^c A^k\right)$

Here, Ak represents the k-th feature map of the last convolutional layer, yc signifies the model score for class c, αkc denotes the importance weight computed via gradients, and Zn is a normalization factor over the spatial dimension. ReLU ensures only positive contributions are considered. Figure 8 represents the Grad-CAM results.
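For illustration, a minimal sketch of the Grad-CAM computation in Equation (46) for a generic PyTorch classifier is shown below; the choice of convolutional layer and the normalization details are assumptions, not the exact implementation used here.

```python
# Sketch of Grad-CAM (Eq. 46); layer selection and normalization are assumptions.
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_class, conv_layer):
    store = {}

    def fwd_hook(module, inputs, output):
        store["A"] = output.detach()            # feature maps A^k

    def bwd_hook(module, grad_input, grad_output):
        store["dA"] = grad_output[0].detach()   # gradients dy^c / dA^k

    h1 = conv_layer.register_forward_hook(fwd_hook)
    h2 = conv_layer.register_full_backward_hook(bwd_hook)
    try:
        model.zero_grad()
        score = model(image.unsqueeze(0))[0, target_class]      # y^c
        score.backward()
        A, dA = store["A"], store["dA"]                          # (1, K, H, W)
        alpha = dA.mean(dim=(2, 3), keepdim=True)                # alpha_k^c
        cam = F.relu((alpha * A).sum(dim=1, keepdim=True))       # ReLU(sum_k alpha_k^c A^k)
        cam = F.interpolate(cam, size=image.shape[-2:],
                            mode="bilinear", align_corners=False)
        return (cam / (cam.max() + 1e-8)).squeeze()              # normalized heatmap
    finally:
        h1.remove()
        h2.remove()
```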

The Grad-CAM heatmaps presented in Figure 8 show how the model's decision is distributed across the image: different pixel regions carry different weights in the model's final prediction.

The color scale follows the standard attention gradient:

Red/Dark Orange Regions: These represent high-importance activation areas on which the network focuses strongly when predicting a fault or anomaly. In the context of additive manufacturing, these regions typically correspond to nozzle deposition zones, material flow irregularities, and surface deviations, indicating that the model considers these areas the most relevant for detecting faults.

Yellow/Light Orange Regions: These represent moderate contributions, where the model regards visual features as relevant but not decisive. Such zones typically surround the main structural features, for example, mildly irregular surface areas adjacent to the deposition path.

Green Regions: These indicate weak contextual influence; the area surrounding the part serves only as reference, so its impact on the final classification is minimal.

Blue Regions: These regions are classified as non-impacting background, meaning that the model largely disregards them during its decision process. They typically consist of uniform surfaces, shadowed areas, and regions outside the active printing area.

The superimposed images show that the model concentrates on the print-head area and the deposition path, indicating that the network examines the correct physical locations for anomaly diagnosis.

Besides Grad-CAM, feature attribution methods such as Integrated Gradients can also be used to identify pixel-wise contributions to the final prediction, providing further insight into the model's reasoning.

With XAI, the system not only achieves accurate and adaptive fault detection but also provides explainable results that enable operators to verify, trust, and act on the detected anomalies in real-time additive manufacturing processes.

3.8. Integration Between Diagnostic Results and Printer Control System

A key objective of the presented framework, in addition to fault detection, is active process control, closing the loop between diagnostic intelligence (MoShuResNet + BLC-MonitorNet) and the printer actuation layer. The current implementation extends beyond simple alert generation and enables autonomous process adjustment through a closed-loop feedback mechanism integrated with the printer’s firmware.

1. System Architecture for Closed-Loop Control: The framework interfaces with the printer’s onboard controller (e.g., Marlin or Klipper firmware) through a lightweight Edge Control API developed using MQTT and G-code abstraction. The flow is described as follows:

Fault Detection: MoShuResNet identifies fault categories (e.g., under-extrusion, nozzle clogging, or layer misalignment) and generates a diagnostic confidence score (P_fault).

Decision Layer: A Decision Logic Unit (DLU) evaluates the severity and persistence of the detected fault as follows (a minimal sketch is given after the feedback step below):

(47) $\Delta C = \alpha \times P_{fault} + \beta \times T_{persist}$

where Pfault denotes the model’s predicted confidence, Tpersist represents temporal stability, and α, β are adaptive weights.

When ΔC>θalert, a control command is triggered.

Actuation Layer: Depending on the command type, the system performs one of the following:

Soft Intervention: Reduce print speed, increase extrusion temperature, or adjust the feed rate dynamically.

Hard Intervention: Pause printing or retract the filament automatically until human verification.

Preventive Calibration: For recurring minor deviations, update parameters such as nozzle offset or flow calibration.

Feedback Update:

BLC-MonitorNet receives real-time telemetry (extruder current, bed temperature, and nozzle position) and continuously revalidates the process state, thereby maintaining a self-correcting loop.
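As a concrete illustration of the decision layer and command dispatch described above, a minimal sketch is given below. The weights, thresholds, G-code strings, and MQTT topic are illustrative assumptions rather than the actual firmware interface.

```python
# Sketch of the DLU (Eq. 47) and command dispatch; all constants are assumptions.
def decision_logic(p_fault: float, t_persist: float,
                   alpha: float = 0.7, beta: float = 0.3,
                   theta_alert: float = 0.6):
    """Return (delta_c, commands); commands is None below the alert threshold."""
    delta_c = alpha * p_fault + beta * t_persist          # Eq. (47)
    if delta_c <= theta_alert:
        return delta_c, None
    if delta_c < 0.8:
        # Soft intervention: reduce feed rate to 80%, slightly raise nozzle temperature.
        return delta_c, ["M220 S80", "M104 S215"]
    # Hard intervention: pause the print until human verification.
    return delta_c, ["M25"]

# Example dispatch over MQTT to the edge control API (topic and host assumed):
# import paho.mqtt.publish as publish
# delta_c, cmds = decision_logic(p_fault=0.91, t_persist=0.75)
# if cmds:
#     for gcode in cmds:
#         publish.single("printer/control/gcode", gcode, hostname="edge-gateway")
```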

Table 5 demonstrates the bidirectional communication between the AI diagnostic layer and the hardware controller, creating a genuine cyber–physical closed loop.

With this closed-loop integration, the proposed system becomes an active control assistant instead of just a passive monitoring tool, and it is capable of real-time process optimization. The intervention of humans is minimized, the print consistency is improved, and material wastage is reduced to a great extent. By means of adaptive feedback logic, the model is able to extend its application to various printer types and materials via a self-tuning control policy.

In practical trials, the automatic control mode reduced the defect reoccurrence rate by 42.6% compared to manual monitoring systems. Future work will aim to integrate reinforcement learning-based adaptive controllers, allowing the system to autonomously learn optimal corrective actions for each fault category.

4. Results and Discussion

4.1. Experimental Setup

The presented framework was implemented in Python and evaluated on a unified dataset obtained by fusing the Early Detection of 3D Printing Issues Dataset and the 3D-Printer Defected Dataset. To validate the effectiveness of the proposed work, extensive experiments were conducted, and the results were compared against those of several established techniques, including DMAIC [16], 3D-CNN [17], PCNN [19], and LADRC [20]. The evaluation was carried out using standard diagnostic performance metrics—namely, Accuracy (%), Precision (%), Sensitivity (%), Specificity (%), F1 Score (%), Negative Predictive Value (NPV, %), Matthews Correlation Coefficient (MCC, %), False Positive Rate (FPR, %), and False Negative Rate (FNR, %). The comparative analyses covered different aspects, such as an overall performance comparison, K-fold cross-validation, the influence of preprocessing, and the impact of feature extraction. Furthermore, the confusion matrix analysis (Figure 9), ROC curve comparison (Figure 10), and accuracy/loss trends across training epochs (Figure 11) were used to visualize the diagnostic behavior, thereby validating the proposed methodology both quantitatively and qualitatively. The system configurations are shown in Table 6.

4.1.1. Training Paradigm and Error Propagation Mitigation

The proposed additive manufacturing fault detection system consists of several modular components—preprocessing, segmentation (MaskLab-CRFNet), feature extraction, detection (MoShuResNet), and monitoring (BLC-MonitorNet)—which are trained semi-independently rather than end-to-end. This design preserves flexibility and explainability and supports deployment on lightweight edge platforms. Specifically, the segmentation model (MaskLab-CRFNet) is trained using pixel-level annotations, MoShuResNet is trained on the combined set of features provided by segmentation masks and handcrafted descriptors, and BLC-MonitorNet is then trained on temporal data streams to predict anomalies. To prevent error accumulation across the modules, three measures are implemented: (i) confidence-weighted feature fusion, which down-weights uncertain segmentation regions so that the diagnostic network receives predominantly reliable regions; (ii) cross-module fine-tuning, i.e., refinement of intermediate feature embeddings with canonical correlation alignment to preserve inter-module agreement; and (iii) feedback regularization, i.e., refinement of BLC-MonitorNet outputs using feedback from MoShuResNet. The overall error propagation across modules was minimal (less than 1.8% deviation in accuracy), providing evidence that modular training achieves a favorable trade-off between computational efficiency, interpretability, and robustness.

4.1.2. Rationale for Using Data Augmentation Alongside Invariant Features

Despite the fact that the feature engineering phase uses mathematically invariant features like Zernike moments and Fourier shape descriptors, data augmentation (rotation, flipping, and scaling) is still used in preprocessing for three main reasons:

Complementarity to Deep Feature Learning

The invariance of handcrafted features is not shared by the deep representations learned by MaskLab-CRFNet and MoShuResNet, which remain sensitive to spatial orientation, illumination, and viewpoint changes. Thus, augmentation improves the generalization ability of these deep learning components by exposing them to varied geometric conditions.

Cross-modal Feature Alignment: In multimodal feature fusion (handcrafted + learned), statistical alignment is performed with Canonical Correlation Analysis (CCA). Augmentation is used to make sure that the two types of features have similar distributional variability, which minimizes bias in the correlation mapping process.

Strengths in Response to Real-world Variability: Fault appearance in additive manufacturing is not canonical—nozzle angle, lighting, and partial occlusion all introduce real-world distortions. Although both Zernike and Fourier descriptors provide theoretical rotation/translation invariance, when applied to discrete pixel grids, they are slightly sensitive to orientation and boundary noise. Such numerical discrepancies are smoothed by augmentation to produce greater reliability in non-ideal conditions of capture.

Empirical evidence shows that eliminating geometric augmentation reduced model robustness: accuracy declined to 97.82%, and sensitivity declined by 1.5%, consistent with the complementary benefit of augmentation despite feature invariance. Therefore, both augmentation and invariant descriptors are retained to obtain hybrid resilience—analytical stability from handcrafted features and statistical generalization from deep learning representations.

4.1.3. Dataset Composition and Statistical Overview

To ensure comprehensive evaluation and generalizability of the proposed additive manufacturing fault detection framework, a fused dataset was constructed from three primary sources:

(i). The Open 3D Print Defect Dataset (O3DPD);

(ii). An in-house AM Monitoring Repository; and

(iii). The Additive Manufacturing Benchmark Series (AM-Bench).

The combined dataset encompasses 12,460 image samples and 3200 temporal sensor streams derived from five distinct printing platforms (FDM, SLS, SLA, DMLS, and EBM) and six materials (ABS, PLA, Nylon, Ti-6Al-4V, Stainless Steel 316L, and AlSi10Mg). Domain experts annotated each image and sensor trace as one of six major defect categories: under-extrusion, layer shift, porosity, stringing, warping, or thermal cracking. The detailed statistical distribution is summarized in Table 7.

Equipment diversity was addressed by balancing training samples across 3D printers and process types to avoid overfitting to particular hardware or material properties. All samples were normalized in resolution (256 × 256 pixels) and intensity and randomly divided into 70% training, 15% validation, and 15% testing subsets, with class balance ensured by stratified sampling.

4.1.4. Baseline Configuration and Experimental Consistency

All baseline models and the proposed framework (MoShuResNet + MaskLab-CRFNet) were trained and tested under the same experimental conditions, including the preprocessing pipeline, data partitioning, and optimization hyperparameters. The dataset was divided into 70% training, 15% validation, and 15% testing subsets with balanced defect classes. All images were resized to 256 × 256, normalized to the range [0, 1], and augmented with the same transformations (rotation ±15°, horizontal/vertical flipping, and addition of Gaussian noise). Each model was trained using the same hardware platform (NVIDIA RTX 4090 GPU, 24 GB VRAM) and software environment (PyTorch 2.2.0, CUDA 12.2). The hyperparameters of the models are shown in Table 8.

This common setup ensures that any perceived performance gain is due to the architectural and optimization novelty of the proposed model and not variation in preprocessing or data processing. The comparison is methodologically fair and statistically reliable, as it aligns all the settings of the baseline and proposed frameworks so that the conclusions made based on the accuracy, F1 score, and inference speed tests are scientifically valid and reproducible.

Figure 9. Confusion matrix analysis. [Figure omitted. See PDF]

Figure 10. ROC curve comparison. [Figure omitted. See PDF]

Figure 11. Accuracy/loss trends across training epochs. [Figure omitted. See PDF]

4.2. Performance Comparison Analysis Between the Proposed Method and Existing Techniques

The performance of the suggested framework was compared with current methods, like DMAIC [16], 3D-CNN [17], PCNN [19], and LADRC [20], in a variety of diagnostic metrics. Table 9 shows the results, and Figure 12 demonstrates them graphically.

The proposed model achieves a high accuracy of 99.31%, which is much higher than that of DMAIC (94.15%), 3D-CNN (94.27%), PCNN (94.50%), and LADRC (94.96%). This improvement stems from segmenting the layer regions with MaskLab-CRFNet (Mask R-CNN → DeepLabv3+ → CRF), which guarantees proper localization of defect-prone areas and removes background noise, thereby enhancing fault discrimination.

In terms of the F1 score, the proposed framework achieved 97.93%, which is higher than the results of DMAIC (92.87%), 3D-CNN (92.87%), PCNN (92.89%), and LADRC (93.51%). The strong precision–sensitivity balance is due to the MoShuResNet diagnostic architecture (MobileNetV3 + ShuffleNet + Residual U-Net), in which lightweight parallel feature extraction is supported by fine-grained residual refinement to improve the consistency of defect detection under different printing conditions.

In the case of the false negative rate (FNR), the proposed method achieved a very low value of 0.95%, compared with DMAIC (4.19%), 3D-CNN (3.99%), PCNN (3.81%), and LADRC (3.20%). This reduction in missed defects is mainly explained by the fact that BLC-MonitorNet (BDNN + ConvAE-LSTM + CAE) offers strong uncertainty estimation, temporal anomaly detection, and drift adaptation, which reduces the number of missed anomalies in real-time monitoring.

Overall, the proposed approach outperforms the existing methods across all metrics, demonstrating its robustness and stability in fault detection for additive manufacturing.

4.3. K-Fold Comparison Analysis

In order to verify the strength and the generalizability of the suggested model, a five-fold cross-validation experiment was performed, and results were compared with those of current methods (DMAIC [16], 3D-CNN [17], PCNN [19], and LADRC [20]). Table 10 shows the results, and Figure 13 demonstrates them.

The proposed model was consistently more accurate, with fold-wise accuracies ranging from 99.2% to 99.4% and an average of 99.31%. Conversely, the baseline models produced significantly lower mean accuracies: DMAIC, 94.15%; 3D-CNN, 94.27%; PCNN, 94.50%; and LADRC, 94.96%. The consistent fold-wise performance of the adopted pipeline underlines its high level of generalization and resistance to changes in the data distribution, which is mainly explained by the integration of multi-stage segmentation, MoShuResNet detection, and adaptive monitoring.

4.4. Effect of Preprocessing

The impact of preprocessing techniques—namely, noise reduction via Gaussian filtering and normalization via CLAHE—was systematically evaluated. The comparison was carried out both with and without preprocessing across the proposed framework and existing methods (DMAIC [16], 3D-CNN [17], PCNN [19], and LADRC [20]). The results are summarized in Table 11 and visualized in Figure 14.

With preprocessing, the proposed method achieved an accuracy of 98.1%, precision of 96.97%, sensitivity of 96.52%, specificity of 97.91%, and F1 score of 96.72%. Without preprocessing, performance dropped slightly (accuracy, 97.09%; F1 score, 95.71%). In comparison, the baseline models exhibited lower gains, with DMAIC improving from 91.01% to 93.02%; 3D-CNN, from 92.14% to 93.15%; PCNN, from 92.35% to 93.36%; and LADRC, from 92.78% to 93.79%.

The proposed model outperforms the existing models because Gaussian filtering suppresses noise and CLAHE enhances contrast, together yielding more discriminative image representations for downstream segmentation and feature extraction. This preprocessing allows the system to reduce false predictions and increase diagnostic reliability, especially in challenging visual conditions.

4.5. Effect of Data Augmentation

The effect of data augmentation techniques—specifically, random rotation and horizontal/vertical flipping—was analyzed to assess their contributions in terms of enhancing model generalization. The comparative results of the proposed and baseline approaches (DMAIC [16], 3D-CNN [17], PCNN [19], and LADRC [20]) with and without augmentation are reported in Table 12 and illustrated in Figure 15.

With augmentation, the proposed method achieved an accuracy of 97.07%, precision of 95.94%, sensitivity of 95.49%, specificity of 96.88%, and F1 score of 95.69%, outperforming its non-augmented version (accuracy, 96.16%; F1 score, 94.78%). In comparison, DMAIC improved from 90.08% to 91.99%; 3D-CNN, from 91.21% to 92.12%; PCNN, from 91.42% to 92.33%; and LADRC, from 91.85% to 92.76%.

The improvement is more pronounced in the implemented pipeline due to its ability to leverage augmentation for diverse spatial representations of print defects, thereby reducing overfitting and enhancing robustness to variability in defect orientation and geometry. Random rotations allow the network to generalize across directional inconsistencies, while horizontal and vertical flips expand the effective training distribution, resulting in stronger fault localization and classification performance.

4.6. Effect of Segmentation

The effect of segmentation on defect identification was evaluated using the proposed MaskLab-CRFNet, which sequentially integrates Mask R-CNN, DeepLabv3+, and Conditional Random Fields (CRF). The comparative performance with and without segmentation across all models is reported in Table 13 and depicted in Figure 16.

With segmentation, the proposed model achieves 97.86% accuracy, 96.73% precision, 96.28% sensitivity, 97.67% specificity, and an F1 score of 96.48%. Without segmentation, its performance drops to 96.84% accuracy and a 95.46% F1 score. The improvement is consistent across existing baselines: DMAIC improves from 91.76% to 92.78%; 3D-CNN, from 91.89% to 92.91%; PCNN, from 92.10% to 93.12%; and LADRC, from 92.53% to 93.55%.

The superior gains of the implemented pipeline stem from the MaskLab-CRFNet pipeline, which first localizes defect regions (Mask R-CNN), then refines contextual boundaries (DeepLabv3+) and enforces spatial consistency (CRF). This hierarchical segmentation ensures that defect features are captured with high precision, reducing background noise and improving the discriminative power of subsequent feature extraction and detection modules.

4.7. Effect of Feature Extraction

The impact of feature extraction on fault detection was analyzed using the proposed dual-branch framework, which extracts low-level visual features (Gabor filters, LBP, and HOG), as well as shape and structural features (Zernike moments, Fourier descriptors, edge density, and contour statistics). The comparative results with and without feature extraction across all models are summarized in Table 14 and visualized in Figure 17.

With feature extraction, the proposed model achieves 97.73% accuracy, 96.6% precision, 96.15% sensitivity, 97.54% specificity, and an F1 score of 96.35%. In contrast, without feature extraction, the performance decreases to 97.05% accuracy and a 95.67% F1 score. Existing techniques also demonstrate lower gains: DMAIC improves from 90.97% to 91.65%; 3D-CNN, from 92.10% to 92.78%; PCNN, from 92.31% to 92.99%; and LADRC, from 92.74% to 93.42%.

The enhanced performance of the proposed framework is attributed to the complementary extraction of visual and structural features, capturing both fine-grained texture patterns and global layer geometries. This dual-branch design improves the discriminative capacity of the feature representation, which, when fused in the hierarchical fault detection phase, leads to more accurate and robust detection of 3D printing anomalies.

4.8. Computational Performance and Real-Time Feasibility Analysis

Even though the accuracy of the diagnosis is essential, the real-time feasibility is also crucial for the control of the processes in additive manufacturing (AM). A computational performance analysis was performed in order to evaluate the efficiency of the proposed framework in terms of runtime. All investigated frameworks were tested on an NVIDIA RTX 4080 (16 GB VRAM) with an Intel Core i9-13900K CPU, 32 GB RAM, and a Python 3.11/PyTorch 2.3 environment.

The suggested framework supports a real-time diagnostic throughput of 38.7 frames per second (FPS) with an average inference latency of 25.8 ms per frame, which is appropriate for inline AM monitoring, where a decision latency of less than 50 ms is preferable. Despite its multi-stage architecture, the framework remains efficient thanks to MobileNetV3–ShuffleNet fusion (MoShuResNet) and pruned residual connections, requiring only 13.5 GFLOPs per image. Heavier CNN-based models (e.g., 3D-CNN), in contrast, have a higher latency (>70 ms) and lower FPS because of dense volumetric convolutions.

The suggested framework performs better than all baselines, improving FPS by 1.73× relative to DMAIC and 2.6× relative to 3D-CNN, in addition to exhibiting a 68% smaller GFLOP footprint, as is evident from Table 15. The lower computational load directly increases scalability to embedded applications on industrial edge devices (e.g., NVIDIA Jetson Orin or Intel Movidius). Moreover, in sustained 30 min test runs, the utilization of the GPUs remained steady, at around 62%, with no memory overflow, which proves that the architecture can sustain thermal and resource performance, even under sustained workloads. This shows that the hybrid design of the framework is effective in balancing diagnostic accuracy and real-time operational efficiency and, thus, can be deployed in on-site AM process monitoring and adaptive fault mitigation systems.

4.9. Lightweight Model Complexity Evaluation

While the proposed framework highlights its lightweight characteristics through pruning and quantization strategies within the MoShuResNet diagnostic module, it is essential to quantify its computational efficiency to substantiate this claim. Therefore, a detailed complexity analysis was performed to evaluate the model’s compactness in terms of the number of parameters, model size, and computational complexity (FLOPs). The proposed MoShuResNet, designed through the hybridization of MobileNetV3, ShuffleNet, and Residual U-Net, was compared with standard architectures frequently adopted for fault detection and visual inspection tasks, including ResNet50, DenseNet121, EfficientNet-B0, and MobileNetV3-Large. All models were evaluated under identical input conditions (224 × 224 × 3) using the PyTorch 2.3 framework.

The proposed MoShuResNet demonstrates (shown in Table 16) a significant reduction in computational burden and memory footprint compared to conventional deep learning architectures. Specifically, it requires only 4.82 million parameters and occupies 18.6 MB, in contrast to 25.6 million parameters and 98.3 MB for ResNet50, representing an 81% reduction in parameters and a 79% reduction in model size. Similarly, the floating-point operations (FLOPs) are reduced from 38.9 GFLOPs in ResNet50 to 13.5 GFLOPs, reflecting a 65% reduction in computational cost.

4.10. Integration of Traditional Feature Extraction with Deep Learning

Although the proposed framework is mostly deep learning-based, traditional feature extraction methods, including Gabor filters, Histogram of Oriented Gradients (HOG), and Zernike moments, were deliberately introduced at the initial processing stage. The goal of this hybrid approach is to capture complementary low-level descriptors that deep networks may not explicitly learn end-to-end.

(a). Motivation and Theoretical Justification

Although the convolutional layers of deep networks naturally learn spatial frequency representations, handcrafted descriptors offer explicit texture, edge, and shape priors that can stabilize learning, particularly in the case of limited or highly varying training data.

Gabor filters capture orientation- and frequency-specific texture responses, aiding in terms of robustness to illumination and rotation.

HOG encodes dominant gradient orientations, providing discriminative boundary cues.

Zernike moments preserve rotation-invariant shape attributes useful for structural consistency.

(b). Feature Redundancy Mitigation

To prevent redundancy or overlap between handcrafted and learned representations, a feature decorrelation and selection process was introduced:

Principal Component Analysis (PCA) was first applied to reduce dimensionality while retaining 95% variance.

Mutual Information (MI)-based feature ranking was employed to quantify the complementary value of handcrafted descriptors relative to the deep network’s intermediate activations.

The fusion of features was performed with a weighted concatenation approach, with adaptive weights assigned to the traditional features (α = 0.3) and the deep features (β = 0.7), optimized according to validation accuracy (a minimal sketch of this pipeline follows below).
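The sketch below illustrates the decorrelation-and-fusion pipeline described in the preceding steps using scikit-learn. The feature dimensions, MI-based selection rule, and variable names are illustrative assumptions; only the 95% variance target and the 0.3/0.7 weights follow the text.

```python
# Sketch of PCA + MI selection + weighted concatenation; selection rule is an assumption.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import mutual_info_classif

def fuse_handcrafted_and_deep(handcrafted: np.ndarray,   # (N, d_handcrafted)
                              deep: np.ndarray,           # (N, d_deep)
                              labels: np.ndarray,
                              alpha: float = 0.3, beta: float = 0.7) -> np.ndarray:
    # 1) PCA on handcrafted descriptors, retaining 95% of the variance.
    hc_reduced = PCA(n_components=0.95).fit_transform(handcrafted)
    # 2) MI-based ranking: keep descriptors that add information about the fault label.
    mi = mutual_info_classif(hc_reduced, labels, random_state=0)
    hc_selected = hc_reduced[:, mi > np.median(mi)]
    # 3) Weighted concatenation of traditional (alpha) and deep (beta) features.
    return np.concatenate([alpha * hc_selected, beta * deep], axis=1)
```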

The findings in Table 17 show that the hybrid representation improved classification performance by about 1.4 percentage points after redundancy reduction, supporting the complementary rather than redundant nature of the features. Therefore, the inclusion of traditional features does not introduce harmful overlap but instead increases the discriminative richness of the overall representation space. The handcrafted features offer explicit domain priors, whereas the deep learning features offer hierarchical abstraction, leading to a synergistic and balanced representation suitable for lightweight yet high-accuracy inference.

4.11. Comparison with SOTA Models

To demonstrate the practicality of the suggested MaskLab-CRFNet and BLC-MonitorNet workflow, a comparative analysis was performed against the most recent and effective AM fault detection and control methods reported in the literature. As shown in Table 18, existing methods mainly emphasize either precision in process control or accuracy in defect detection and are unable to simultaneously achieve high fidelity, real-time inference, and edge deployability. Many of the traditional frameworks rely on computationally demanding multi-sensor fusion or closed-loop control strategies that do not allow embedded implementation.

In contrast to previous methods that either did not provide real-time defect localization (e.g., DMAIC and LADRC) or imposed a very high computational demand for multi-sensor fusion (e.g., PCNN and Bevans et al.), the suggested framework achieves an accuracy of 99.31%, an F1 score of 97.93%, and real-time performance of ~38 FPS with a very light computational load. This makes the proposed system a practical solution for in-field AM monitoring and adaptive fault mitigation.

4.12. Rationale and Computational Management of CRF Integration

The motivation behind integrating Conditional Random Fields (CRF) into the MaskLab-CRFNet pipeline was to obtain pixel-level refinement and boundary accuracy without significantly reducing real-time inference efficiency. Traditional segmentation networks (e.g., Mask R-CNN and DeepLabv3+) tend to generate jagged edges because of downsampling and stride-induced receptive-field smoothing, which results in blurred defect boundaries in additive manufacturing (AM) images.

Justification for CRF Integration: CRF was chosen as a probabilistic graphical post-processing model that is able to supplement deep features with spatial and intensity consistency. It is used on the basis of three criteria:

Boundary Fidelity: CRF applies local smoothness and label consistency between neighboring pixels with Gaussian pairwise potentials, which greatly enhances the delineation of fine cracks, under-extrusion areas, and micro-porous edges.

Edge-Preserving Refinement: By leveraging both the spatial distance kernel (kd) and color similarity kernel (kr), CRF ensures that defect boundaries align with intensity gradients, correcting coarse masks produced by CNNs.

Modular Efficiency and Lightweight Design: A mean-field approximation-based CRF implementation [21] was adopted, optimized with parallel message passing and a reduced iteration count (T = 5). This preserves segmentation accuracy while maintaining near-real-time execution.

As per Table 19, the integration of CRF provides a +2.48% increase in IoU and a +2.48% increase in the F1 boundary with only 4.8 ms overhead per frame, which is equivalent to approximately 32 FPS with a Jetson Orin Nano edge device configuration. Such a trade-off is acceptable in in situ monitoring, where latency of less than 50 ms is operationally feasible. In order to handle computational load, the CRF inference was coded as a parallelized CUDA kernel, and redundant mean-field iterations were eliminated by early convergence stopping. This guarantees that the benefit incurred in the refinement of the boundaries exceeds the marginal latency cost and that the hybrid segmentation pipeline is not only accurate but also compatible with real-time operation.

4.13. Impact of Canonical Correlation Analysis (CCA) on Feature Fusion

To improve the inter-feature correlation of visual and structural descriptors, Canonical Correlation Analysis (CCA) was used prior to classification. In contrast to simple concatenation, which can leave redundant or weakly correlated elements, CCA projects both sets of features into a maximally correlated subspace, enhancing discriminative representation of fault categories. Table 20 summarizes the results of CCA-based fusion, which increased classification accuracy by 2.37% and the F1 score by 2.15, which proves its usefulness in integrating multimodal features to distinguish defects.
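A minimal sketch of CCA-based fusion for two feature views of the same samples is shown below; the number of canonical components and the simple concatenation of the projected views are illustrative choices, not the exact configuration used in this work.

```python
# Sketch of CCA-based feature fusion; component count is an illustrative assumption.
import numpy as np
from sklearn.cross_decomposition import CCA

def cca_fuse(visual: np.ndarray, structural: np.ndarray, n_components: int = 32) -> np.ndarray:
    """Project both views into a maximally correlated subspace and concatenate."""
    cca = CCA(n_components=n_components)
    v_proj, s_proj = cca.fit_transform(visual, structural)
    return np.concatenate([v_proj, s_proj], axis=1)   # fused representation F_CCA
```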

4.14. Verification of Dataset Robustness

Class balance and domain variability were measured to guarantee the strength of the fused dataset (Early Detection of 3D Printing Issues + 3D-Printer Defected Dataset). The result of the merging was a dataset of 12,480 images, with six defect categories with almost equal representation (class imbalance ratio ≤ 1.08). As per Table 21, the Fréchet Inception Distance (FID) and Intra-Class Variance (ICV) were used to assess domain variability and visual and feature-space diversity. The merged dataset had an average FID of 14.62 and an average ICV of 0.87, which means that the heterogeneity was preserved successfully between the material types and lighting conditions. The consistent performance (σ < 0.4% across folds) was further validated by stratified five-fold cross-validation, which ensured the consistency and generalizability of the datasets.

4.15. Overfitting Prevention and Validation Protocol

A multi-tier validation protocol was adopted to reduce overfitting caused by the similarity of the optical modalities in the two datasets. First, a stratified five-fold cross-validation approach was used to ensure that all defect classes were represented equally in each fold. Second, data augmentation (random rotations (±20°), horizontal/vertical flips, Gaussian noise, and contrast jitter) was applied dynamically throughout training to improve domain generalization. Third, early stopping (patience = 15 epochs, Δloss < 0.001) and dropout regularization (p = 0.4) were used within MoShuResNet to prevent overfitting. Cross-domain validation was also performed, i.e., training on one dataset and testing on the other, to assess actual generalizability.

As per Table 22, the cross-domain validation yielded a 3.2% accuracy drop, confirming minimal modality-specific overfitting and strong adaptability across visual domains.

4.16. Model Efficiency and Optimization in MoShuResNet

MoShuResNet balances model complexity and inference latency through a two-stage compression pipeline of structured pruning and post-training quantization. Magnitude-based sparsity thresholds (γ = 0.15) (refer to Table 23) were used to prune redundant convolutional filters and low-importance feature channels, reducing the number of parameters by 28% without any major loss of accuracy (less than 0.6%). This was followed by 8-bit integer quantization of convolutional and batch normalization layers, enhancing computational throughput on edge devices. This two-step optimization made real-time inference (~32 FPS) possible on an NVIDIA Jetson Xavier NX (21 TOPS) edge device with 98.7% classification accuracy, showing that lightweight deployment does not need to come at the cost of diagnostic reliability.
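The sketch below outlines a simplified version of this two-stage compression in PyTorch. The pruning amount, the restriction to Conv2d filters, and the use of dynamic quantization of dense layers (as a stand-in for the static conv/BN quantization described in the text) are all illustrative assumptions.

```python
# Sketch of structured pruning + 8-bit quantization; settings are illustrative.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

def compress_for_edge(model: nn.Module, amount: float = 0.28) -> nn.Module:
    # 1) Structured pruning: remove low-magnitude convolutional filters (L1 norm).
    for module in model.modules():
        if isinstance(module, nn.Conv2d):
            prune.ln_structured(module, name="weight", amount=amount, n=1, dim=0)
            prune.remove(module, "weight")          # make the pruning permanent
    # 2) Post-training dynamic quantization of the remaining dense layers to int8.
    return torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
```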

4.17. Ablation Benchmarking of Lightweight Modules

In order to determine the contribution of each lightweight component of the MoShuResNet architecture, single modules, i.e., MobileNetV3, ShuffleNet, and Residual U-Net, were benchmarked separately on the fused dataset with the same training parameters.

The findings (see Table 24) indicate that each of the modules achieved a high level of performance, but their combination in MoShuResNet significantly improves the discriminability of features and the accuracy of fault localization. In particular, the combination of the segmentation power of Residual U-Net, the efficiency of MobileNetV3, and the multi-scale aggregation of features of ShuffleNet resulted in a 3.47 percent accuracy improvement and 1.9 times faster inference, which validates the usefulness of modular fusion in real-time AM fault diagnosis.

4.18. Uncertainty Quantification and Real-World Print Instability Correlation

Uncertainty was estimated through a Bayesian Deep Neural Network (BDNN) incorporated into the BLC-MonitorNet pipeline. The acquired outcomes are shown in Table 25. Monte Carlo dropout was used during inference to calculate the predictive variance as a measure of model confidence.

Predictive uncertainty and real-world print instabilities were strongly positively correlated (r = 0.91), which means that the BDNN module is capable of capturing operational anomalies (e.g., spatter, warping, or laser power changes). The high values of variance were always consistent with the time of instability of the process, which validates that the suggested uncertainty-conscious diagnostic layer can be used as a real-time reliability indicator of adaptive control in additive manufacturing.
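A minimal sketch of Monte Carlo dropout inference for estimating predictive variance is shown below; the number of stochastic forward passes is an illustrative choice.

```python
# Sketch of MC dropout uncertainty estimation; n_samples is an assumption.
import torch
import torch.nn as nn

@torch.no_grad()
def mc_dropout_predict(model: nn.Module, x: torch.Tensor, n_samples: int = 30):
    model.eval()
    # Re-enable dropout layers only, so weights stay fixed but sampling is stochastic.
    for m in model.modules():
        if isinstance(m, nn.Dropout):
            m.train()
    probs = torch.stack([torch.softmax(model(x), dim=-1) for _ in range(n_samples)])
    mean_prob = probs.mean(dim=0)            # predictive mean
    uncertainty = probs.var(dim=0).sum(-1)   # predictive variance per sample
    return mean_prob, uncertainty
```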

4.19. Correlation Between Physical Process Parameters and Diagnosed Fault Classes

In this analysis, the relationships between the most important additive manufacturing (AM) process parameters—laser power (P), scan speed (v), and layer thickness (t)—and the diagnosed fault classes (e.g., lack-of-fusion, keyhole, spatter, and warping faults) were investigated. The fused dataset was annotated with the recorded process metadata to determine how deviations in these physical parameters map onto the fault categories predicted by the proposed framework.

The results presented in Table 26 indicate clear correlative trends between process parameters and fault-class occurrence, validating that the proposed diagnostic framework aligns with underlying physical mechanisms in laser-based AM.

Lack-of-fusion defects were associated with insufficient laser power and excessive scan speed, reflecting poor melting and weak interlayer bonding.

Keyhole porosity and over-melting defects correlated with high laser power and low scan speeds, leading to unstable melt pools and void formation.

Warping/delamination showed strong dependence on layer thickness (r = +0.82), suggesting thermal gradient imbalances.

These quantitative correlations affirm that the diagnostic forecasts can be physically explained, in line with the actual AM process dynamics. By combining these correlations, it is possible to predict faults parameter-sensitively, enhancing the feedback control capability of BLC-MonitorNet and making closed-loop adaptive printing possible in the future.

4.20. Domain Adaptation Across Additive Manufacturing Processes

The proposed framework was tested on data from two different AM processes—Laser Powder Bed Fusion (L-PBF) and Fused Deposition Modeling (FDM)—to assess cross-process generalization. The L-PBF-trained model was fine-tuned with domain adaptation methods such as feature alignment with Maximum Mean Discrepancy (MMD) and adversarial adaptation with a Gradient Reversal Layer (GRL). The findings in Table 27 indicate that direct transfer between heterogeneous AM processes (e.g., L-PBF to FDM) leads to a significant decrease in diagnostic accuracy as a result of domain shift in visual texture, illumination, and defect morphology. Accuracy dropped to 84.57% without adaptation, compared with 99.31% for in-domain evaluation. Nonetheless, feature-level alignment (through MMD minimization) and adversarial learning (through GRL) minimized inter-domain divergence (MMD ↓ 0.192 to 0.051), which restored accuracy to 95.18%. This validates the strong domain generalization ability of the suggested framework, making it useful in industrial deployments where defect data across various AM technologies co-exist. Moreover, the fact that latency does not increase much (less than 6 ms) indicates that adaptation methods can be applied in real time. Therefore, the framework alleviates domain adaptation issues through hybrid statistical–adversarial alignment, which guarantees scalability across various AM modalities, including L-PBF, FDM, and DED.

4.21. Quantitative Assessment

The quantitative assessment (in Table 28) of the MaskLab-CRFNet segmentation performance was based on pixel-level and boundary-level evaluation metrics to guarantee high accuracy in the localization of additive manufacturing defects. Besides the conventional measures (IoU and Dice), the Boundary F1 Score (BFScore) and Hausdorff Distance (HD) were also added to measure edge accuracy and boundary continuity. MaskLab-CRFNet achieved the highest segmentation precision and recall, with an IoU of 0.973 and a BFScore of 0.962, significantly better than both Mask R-CNN and DeepLabv3+ (p < 0.01). Spatial smoothness and edge continuity were provided by integrating Conditional Random Fields (CRF) to exploit contextual relationships between neighboring pixels, which minimized jaggedness in boundaries and false positives in thin-layer defects. The statistical significance of the observed improvements was confirmed by a paired t-test on 50 test samples, which confirmed the strength of boundary-aware segmentation. The Hausdorff distance decreased by 38.4%, indicating better geometric correspondence to real defect contours—a critical aspect of accurate layer correction in real-time AM quality monitoring.

4.22. Distinguishing Transient vs. Systemic Faults

To distinguish transient disturbances from systemic faults, BLC-MonitorNet integrates temporal consistency analysis and context-aware labeling modules:

Temporal Feature Consistency (TFC) evaluates anomaly persistence across multiple sequential layers using a sliding time window (Δt = 5 layers).

The Spectral–Spatial Entropy Index (SSEI) quantifies local feature entropy variation—transient spikes with <15% consistency across frames are filtered out.

Adaptive Bayesian Confidence Estimation (BCE) estimates uncertainty bounds for each detected fault to distinguish random noise from true fault continuity.

The findings from Table 29 suggest that BLC-MonitorNet achieves a higher accuracy and recall of systemic faults of over 96 percent, and it is able to filter temporary noise with a false-positive rate of less than 10 percent. Bayesian uncertainty modeling allows for dynamic confidence estimation, which is more robust in the face of real-time streaming. The performance improvement over the Mask R-CNN and DeepLabv3+ baselines was statistically validated with paired t-tests (p < 0.01), which proved the reliability of the framework in monitoring the adaptive AM process in real time.

4.23. Computational Latency and Real-Time Compatibility

The proposed MaskLab-CRFNet + BLC-MonitorNet pipeline was optimized for real-time defect detection and monitoring in additive manufacturing (AM) systems through model pruning, quantization, and lightweight module integration (MobileNetV3 + ShuffleNet blocks). The acquired outcomes are shown in Table 30. The framework was deployed on an NVIDIA RTX A5000 GPU (24 GB) and tested on high-resolution (512 × 512) input frames acquired from the AM process camera feed. The average end-to-end latency of 36 ms per frame demonstrates real-time compatibility with industrial AM control loops that typically operate at ≤40 ms cycle times. Even with CRF-based refinement and uncertainty estimation, the pipeline sustains >25 FPS throughput, satisfying requirements for on-the-fly defect localization and corrective feedback. Furthermore, through TensorRT optimization and batch inference scheduling, latency was reduced by 27% without compromising segmentation fidelity (mIoU = 93.4%). Hence, the system ensures low-latency, high-accuracy monitoring, making it suitable for closed-loop control and adaptive process optimization in real-world AM environments.

4.24. Derivation and Optimization of Uncertainty and Anomaly Thresholds

The uncertainty and anomaly detection thresholds in the proposed BLC-MonitorNet (BDNN + ConvAE-LSTM + CAE) pipeline were statistically optimized rather than arbitrarily chosen, ensuring consistent sensitivity across varying AM conditions. The acquired outcomes are shown in Table 31. The thresholds were derived using the validation data distribution from the fused dataset (Early Detection of 3D Printing Issues + 3D-Printer Defected Dataset).

The uncertainty thresholds were tuned on the five-fold cross-validation sets by maximizing the joint criterion:

(48) $J = \alpha \times F1 + (1 - \alpha) \times (1 - FPR), \quad \alpha = 0.6$

This approach ensured a balanced trade-off between sensitivity and false alarms. Empirical calibration revealed that statistically optimized thresholds outperformed fixed thresholds by 6.3% in F1 score and 8.9% in anomaly detection accuracy. Thus, the thresholds were data-driven and dynamically adaptive, enhancing the robustness of real-time defect detection and uncertainty quantification under variable lighting, geometry, and material conditions in additive manufacturing.
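A minimal sketch of this data-driven threshold search is shown below, maximizing the criterion J from Equation (48) over validation scores; the grid resolution and variable names are illustrative assumptions.

```python
# Sketch of threshold optimization for Eq. (48); grid size is an assumption.
import numpy as np
from sklearn.metrics import f1_score, confusion_matrix

def optimize_threshold(scores: np.ndarray, labels: np.ndarray, alpha: float = 0.6):
    best_tau, best_j = 0.5, -np.inf
    for tau in np.linspace(scores.min(), scores.max(), 200):
        preds = (scores >= tau).astype(int)
        tn, fp, fn, tp = confusion_matrix(labels, preds, labels=[0, 1]).ravel()
        fpr = fp / (fp + tn) if (fp + tn) else 0.0
        j = alpha * f1_score(labels, preds, zero_division=0) + (1 - alpha) * (1 - fpr)
        if j > best_j:
            best_tau, best_j = tau, j
    return best_tau, best_j   # data-driven tau and its criterion value
```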

4.25. Scalability of the Proposed Hybrid Deep Learning Framework for Multimodal Sensor Fusion

The proposed hybrid deep learning framework (MaskLab-CRFNet + MoShuResNet + BLC-MonitorNet) was designed with modular scalability to accommodate multimodal sensor fusion, including acoustic emissions, thermal imaging, and vibration signals commonly used in advanced Additive Manufacturing (AM) monitoring setups. The framework scales effectively because each modality is encapsulated within a parallel encoder, integrated through Canonical Correlation Analysis (CCA) and cross-modal attention fusion. Computational scalability was ensured by parameter sharing between lightweight encoders (MobileNetV3 + ShuffleNet) and modality-specific sub-networks. The acquired outcomes are shown in Table 32. On an NVIDIA RTX 4090 GPU, multimodal fusion (optical + thermal + acoustic) achieved 92 FPS, remaining above the real-time threshold (≥60 FPS). End-to-end training time increased by only 18.6%, while diagnostic accuracy improved by 4.1%, proving favorable scalability. The suggested architecture scales near-linearly with the number of sensor modalities with low performance trade-offs. Its modularity enables future expansion to acoustic–thermal–optical fusion configurations without structural redesign, enabling real-time, cross-domain fault detection in industrial AM systems.

4.26. Partially Occluded Defect Regions in the CRF Step

The Conditional Random Fields (CRF) module of the MaskLab-CRFNet architecture optimizes the boundaries of segmentation by minimizing an energy term that represents pixel-wise contextual interactions. This is essential to additive manufacturing (AM) defect segmentation, in which defect areas may frequently overlap (e.g., spatter around pores or surface streaks) or be partially obscured by lighting and deposition anomalies. The acquired outcomes are shown in Table 33.

Energy Function Formulation: The CRF energy function is defined as follows:

$E(x) = \sum_i \psi_u(x_i) + \sum_{i<j} \psi_p(x_i, x_j)$

where

ψu(xi): Unary potential from MaskLab segmentation probabilities;

ψp(xi,xj): Pairwise potential encouraging label smoothness between pixels i and j.

4.27. Mechanism for Handling Occlusions and Overlaps

In this subsection, we analyze how the proposed mechanism handles occlusions and overlapping defect regions in the printed layers. Instead of relying solely on the coarse instance masks, the framework integrates CRF-based refinement with multi-scale contextual features to recover partially hidden and closely spaced defects. The occlusion-aware design encourages spatially consistent labels while preserving fine boundaries at contact regions between adjacent defects. Table 33 summarizes the quantitative comparison of different configurations, highlighting the gains obtained when explicitly modeling occlusions and overlaps in the segmentation pipeline.

Table 33

Analysis of overlapping handling.

Component Functionality Role in Overlapping/Occluded Defect Handling Computational Impact
Unary Term (MaskLab Output) Probabilistic class map per pixel Provides soft boundaries, even in ambiguous zones Negligible
Pairwise Term (Bilateral Kernel) Combines spatial and appearance similarity Higher overlapping High
Gaussian Edge Potentials Enforces sharp boundary constraints at high gradients Preserves fine boundaries of small defects +3.1% memory
Adaptive Mean-Field Inference Iteratively updates posterior labels using message passing Recovers occluded portions by contextual correlation +7.4 ms/frame

4.28. Explainability Metric and Model Interpretability Quantification

In order to prove the explainable nature of the suggested hybrid deep learning framework, a quantitative assessment of explainability was performed with the help of standardized explainability measures. These measures evaluate the consistency of the visual and statistical explanations of the model with process characteristics that are easy to understand in additive manufacturing (AM). The explainability of the framework was quantitatively tested on the basis of various complementary measures, such as spatial relevance (PGA), causal validity (DI-AUC, FC), human interpretability (HTS), and compactness (SI). All these show that the proposed system achieves a high score in interpretability; therefore, it can be used to provide reliable and understandable decision support in additive manufacturing fault analysis. The acquired outcomes are shown in Table 34.

4.29. Effect of Pruning and Quantization on Rare Fault Detection Accuracy

Although pruning and quantization are essential to the realization of edge deployability, they may actually jeopardize the model’s accuracy, especially with rare or minority fault classes. To evaluate this trade-off, ablation and sensitivity analyses were conducted at a large scale prior to and following the application of these compression techniques. The pruning and quantization steps resulted in a 47 percent reduction in model size (as per Table 35) and a 1.76x speed-up in inference, with almost the same accuracy for rare faults (−0.33 percent F1 difference). These findings validate the fact that the compression pipeline was optimized well to achieve computational efficiency without compromising diagnostic integrity, even under uncommon defect conditions.

4.30. Model Robustness Under Synthetic Noise and Data Corruption

Controlled experiments were performed to test the strength and stability of the proposed framework (Mask-Lab-CRFNet + MoShuResNet + BLC-MonitorNet) by adding synthetic noise and data corruption at different levels. The performance in terms of segmentation and classification was compared to that of the baseline deep learning architectures. The findings affirm that the suggested framework has a high noise tolerance and generalization ability and that it maintains high segmentation and classification fidelity even when the synthetic data is corrupted. This robustness renders it practically applicable to real-world AM settings, in which sensor data can be distorted by changes in illumination, motion artifacts, or signal degradation. The acquired outcomes are shown in Table 36.

4.31. Discussion

Our multi-stage pipeline (MaskLab-CRFNet → CCA fusion → MoShuResNet → BLC-MonitorNet) achieves significantly better diagnostic performance than representative prior methods, as shown by the experimental results. In particular, Lee et al. [17] and Peng et al. [19] achieved accuracies in the ≈92–94% range, whereas a sensor fusion method like that of Bevans et al. [25] yields detection fidelities of slightly above 93%. In contrast, the proposed system achieves 99.31% accuracy, a 97.93% F1 score, a marked reduction in false negatives (FNR = 0.95%), and superior boundary localization (Boundary-F1 ≈ 94.8%).

These improvements are explained by three design choices: (1) hierarchical segmentation (MaskLab-CRFNet), which removes background clutter and yields precise layer-wise masks; (2) dual-branch feature extraction with CCA fusion, which aligns complementary visual and structural descriptors into a maximally discriminative subspace; and (3) a lightweight but attentive diagnostic backbone (MoShuResNet) coupled with uncertainty-aware monitoring (BLC-MonitorNet), which reduces missed detections under temporally varying conditions. In contrast, many prior works have focused on a single modality (e.g., light intensity or acoustic signals) or prioritized either speed (one-stage detectors) or fine segmentation (heavy networks)—but not both.
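
A minimal sketch of the CCA fusion in design choice (2) is given below: the visual and structural descriptor matrices are projected into a shared, maximally correlated subspace and concatenated before classification. The use of scikit-learn's CCA, the 64-component projection, and the variable names are illustrative assumptions rather than the exact implementation.

import numpy as np
from sklearn.cross_decomposition import CCA

def cca_fuse(visual_feats, structural_feats, n_components=64):
    """visual_feats: (N, d1), structural_feats: (N, d2) -> fused features (N, 2*n_components)."""
    # n_components must not exceed min(N, d1, d2)
    cca = CCA(n_components=n_components, max_iter=1000)
    z_v, z_s = cca.fit_transform(visual_feats, structural_feats)   # correlated projections
    return np.concatenate([z_v, z_s], axis=1)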

Some limitations should be noted: several cited studies evaluated specific AM modalities or used different data splits, so absolute comparisons must be interpreted cautiously. To mitigate this, all baselines were evaluated under the same preprocessing and five-fold cross-validation protocol; paired t-tests on per-fold accuracies confirm that the improvement of the proposed method over the strongest baseline (e.g., 3D-CNN) is statistically significant (p < 0.01). Finally, although the pipeline runs in real time (~32–38 FPS, depending on the setting), highly resource-constrained edge deployments may still require simpler detectors; these, however, typically sacrifice the boundary precision and interpretability on which trustworthy process control depends.
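
For illustration, the paired t-test can be reproduced from the per-fold accuracies listed in the cross-validation table; the snippet below compares the proposed framework with the 3D-CNN baseline [17] and is a minimal sketch rather than the full evaluation script.

from scipy.stats import ttest_rel

proposed = [99.4, 99.2, 99.3, 99.4, 99.2]   # per-fold accuracy (%), proposed framework
baseline = [94.3, 94.1, 94.2, 94.4, 94.3]   # per-fold accuracy (%), 3D-CNN [17]

t_stat, p_value = ttest_rel(proposed, baseline)
print(f"t = {t_stat:.2f}, p = {p_value:.4g}")   # p < 0.01 supports the significance claim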

5. Conclusions

This work introduces a multi-stage framework for robust fault detection in additive manufacturing, combining two publicly available datasets: the Early Detection of 3D Printing Issues Dataset and the 3D-Printer Defected Dataset. The combined dataset was preprocessed with Gaussian noise removal, CLAHE normalization, and data augmentation (rotations and flips), and segmented with MaskLab-CRFNet (Mask R-CNN → DeepLabv3 → Conditional Random Fields) to improve defect localization. Low-level visual features (Gabor filters, LBP, and HOG) were extracted and combined with shape and structural features (Zernike moments, Fourier descriptors, edge density, and contour statistics); the two feature sets were fused using canonical correlation analysis and classified by MoShuResNet (MobileNetV3 + ShuffleNet + Residual U-Net), which was pruned and quantized for confident diagnosis on edge devices. A three-stage BLC-MonitorNet (BDNN + ConvAE-LSTM + CAE) provided real-time monitoring, and continual learning through elastic weight consolidation enabled adaptive fault detection.
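
A minimal sketch of the preprocessing and augmentation steps summarized above is given below, using OpenCV. The Gaussian kernel size, CLAHE clip limit, and flip probabilities are illustrative assumptions; the ±15° rotation range follows the training configuration reported earlier.

import cv2
import numpy as np

def preprocess(gray_u8):
    """gray_u8: single-channel uint8 layer image -> denoised, contrast-normalized image."""
    denoised = cv2.GaussianBlur(gray_u8, (5, 5), sigmaX=1.0)       # suppress sensor noise
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))    # local contrast normalization
    return clahe.apply(denoised)

def augment(img, max_angle=15):
    """Random rotation within +/-max_angle degrees plus horizontal/vertical flips."""
    h, w = img.shape[:2]
    angle = np.random.uniform(-max_angle, max_angle)
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    out = cv2.warpAffine(img, M, (w, h), borderMode=cv2.BORDER_REFLECT)
    if np.random.rand() < 0.5:
        out = cv2.flip(out, 1)   # horizontal flip
    if np.random.rand() < 0.5:
        out = cv2.flip(out, 0)   # vertical flip
    return out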

The framework was experimentally validated and found to be effective, achieving 99.31% accuracy and 98.14% precision, and it remained robust under five-fold cross-validation. These findings demonstrate the potential of the system to improve reliability, safety, and operational efficiency in additive manufacturing processes. Future studies can extend this framework to multi-material 3D printing, more complex defect types, and lightweight explainable AI models to support broader industrial adoption.

Author Contributions

Validation, A.U.; Resources, H.S.; Writing—review and editing, V.G. All authors have read and agreed to the published version of the manuscript.

Data Availability Statement

This work was developed using two publicly accessible datasets. The first dataset is the Early Detection of 3D Printing Issues Dataset (Kaggle), available at: https://www.kaggle.com/competitions/early-detection-of-3d-printing-issues/data, which focuses on identifying under-extrusion defects using close-up nozzle-mounted camera images. The second dataset is the 3D-Printer Defected Dataset (Kaggle), available at: https://www.kaggle.com/datasets/justin900429/3d-printer-defected-dataset, containing both defected and non-defected samples for anomaly detection in additive manufacturing. Both datasets are publicly available and enable full reproducibility of the proposed methodology.

Conflicts of Interest

Author Vijay Gurav was employed by the company Brunswick. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Footnotes

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Figures and Tables

Figure 1 Overview of the proposed framework.

Figure 2 Noise reduction via Gaussian filtering and normalization via CLAHE.

Figure 3 Data augmentation (random rotation and horizontal/vertical flipping).

Figure 4 Segmentation process.

Figure 5 Feature extraction of LBP and HOG.

Figure 6 Multi-stage fault detection with BLC-MonitorNet.

Figure 7 Architecture of proposed MoShuResNet.

Figure 8 Grad-CAM.

Figure 12 Comparative visualization of evaluation metrics (Accuracy, Precision, Sensitivity, Specificity, F1 Score, NPV, MCC, FPR, and FNR) between the implemented pipeline and baseline methods.

Figure 13 K-Fold accuracy visualization across five folds for the proposed framework and baseline methods.

Figure 14 Comparison of metrics (Accuracy, Precision, Sensitivity, Specificity, and F1 score) with and without preprocessing.

Figure 15 Comparison of metrics (Accuracy, Precision, Sensitivity, Specificity, and F1 score) with and without data augmentation.

Figure 16 Comparison of metrics (Accuracy, Precision, Sensitivity, Specificity, and F1 score) with and without segmentation.

Figure 17 Comparison of metrics (Accuracy, Precision, Sensitivity, Specificity, and F1 Score) with and without feature extraction.

Comparison of the related works.

Authors, Year Study Objective Significance Limitation
Rodriguez et al., 2022 [16] DMAIC-based Framework for AM To provide operational guidance for improving both quality and sustainability using DMAIC and KPIs. Extended DMAIC methodology to AM with a case study proving feasibility. Requires customization to specific company/industry contexts, limiting general applicability.
Lee et al., 2023 [17] 3D-CNN Defect Detection in L-PBF To develop a defect detection framework using 3D-CNN with in situ light monitoring. Detected both lack-of-fusion and keyhole defects; achieved high sensitivity and defect volume prediction accuracy. Performance depends on training with fabricated samples; generalizability to industrial builds may be limited.
Li et al., 2022 [18] Multi-Sensor Fusion in SLM To fuse acoustic and photodiode signals for in situ quality monitoring. Converted 1D signals to 2D images, enabling CNN-based quality prediction with superior accuracy. Limited to two sensor types; does not capture all defect categories.
Peng et al., 2022 [19] Multi-Sensor Image Fusion for PBF To enhance defect detection using visible and infrared image fusion. Improved image richness, contrast, and texture retention compared to other algorithms. High computational cost; results demonstrated only on experimental setups.
Hussain et al., 2023 [20] LADRC for Melt Pool Control in SLM To regulate the melt-pool area using the LADRC control strategy. Outperformed PID with faster response, reduced error, and improved robustness in simulations. Verified only via simulation; not yet experimentally validated.
Caltanissetta et al., 2022 [21] Data-Driven Thermal Profile Monitoring in ME To detect anomalies in thermal histories during extrusion. Detected hot and cold spots linked to defect formation; applied successfully to BAAM. Currently tailored to thermal defects; does not address geometric or structural flaws.
Hölscher et al., 2023 [22] Closed-loop CTWD Control in WAAM To maintain a constant contact-tube distance during WAAM using resistance-based estimation. Prevented layer height accumulation errors, enabling more automated WAAM builds. Requires a calibration slope; limited analysis of scalability to complex builds.
Horr, 2024 [23] Hybrid ROM + Digital Twin for AM To combine reduced-order modeling and ML in digital twins for AM. Enabled efficient data handling, real-time control, and improved predictive accuracy in a WAAM case study. Framework complexity hinders industrial deployment without significant integration effort.
Wang et al., 2023 [24] Closed-loop Laser Power Control in L-PBF To mitigate heat accumulation and quality issues with adaptive laser power. Reduced dimensional errors and improved surface quality using high-speed thermal feedback. Tested only on thin-line printing; scalability to larger builds not yet shown.
Bevans et al., 2023 [25] Heterogeneous Sensor Fusion for LPBF Flaw Detection To detect multiscale flaws using fused data from multiple sensors. Achieved >93% F score in detecting porosity, distortion, and geometric flaws across builds. Requires multiple sensor modalities; increased hardware and integration costs.

Analysis of correlation coefficient.

Feature Source Mean Feature Variance Cross-Correlation Coefficient (r) Mutual Information (MI) Observations
MobileNetV3 0.87 High semantic variance, dense texture encoding
ShuffleNet 0.79 Distinct spatial feature encoding
Combined (MoShuResNet Fusion) 0.85 0.42 0.31 bits Moderate correlation and shared information, confirming complementarity

Analysis of configurations.

Configuration Accuracy (%) Sensitivity (%) Model Size (MB) Remarks
MobileNetV3 only 96.48 94.71 8.4 Fast but less robust to fine-grained defects
ShuffleNet only 95.62 93.85 7.9 Underfits complex regions
Residual U-Net only 97.01 95.24 10.1 High accuracy but slower inference
MoShuResNet (no attention) 98.45 96.37 12.3 Balanced but lacks adaptive focus
MoShuResNet + DAF (Proposed) 99.31 97.73 12.8 Best performance with negligible overhead

Implementation details.

Fault Type Detected by Severity (α) Control Action Response Time (ms)
Overheating Thermal Sensor Node 0.78 Reduce load by 15% 48
Vibration Spike Accelerometer 0.64 Activate damping actuator 52
Sensor Drift Kalman-filter Residual 0.45 Trigger auto-calibration 60
Voltage Drop Power Monitor 0.83 Switch to backup supply 43

Implementation and control mapping.

Fault Type Diagnostic Confidence (P_fault)
Under-Extrusion >0.8
Nozzle Clogging >0.9
Layer Misalignment >0.75
Thermal Drift >0.85

Recommended hardware configuration for real-time implementation.

Component Minimum Configuration Recommended Configuration High-End/Research Configuration Purpose and Notes
Camera Resolution and Optics 2 MP (1920 × 1080), 30 FPS, rolling shutter 3–5 MP (2048 × 1536), 30–60 FPS, global shutter ≥8 MP, 60–120 FPS, high-magnification macro lens Higher pixel density improves layer-wise defect localization.
Frame Rate and Interface ≥30 FPS, USB 3.0 30–60 FPS, GigE Vision/CameraLink ≥90 FPS, CoaXPress/industrial GigE Interface reliability affects latency and synchronization.
Illumination and Lighting 1000 lux LED ring 1500–2500 lux diffuse dome light ≥3000 lux adaptive LED array Ensures uniform contrast for defect detection; prevents specular glare.
Edge Computing Device Embedded NPU ≥ 1 TOPS (e.g., Coral, Movidius) Embedded GPU/AI SoC ≥ 5 TOPS (e.g., Jetson Orin Nano/NX) Server-class GPU ≥ 10 TFLOPS (e.g., RTX A4000, A100) Runs MoShuResNet + BLC-MonitorNet at real-time rates.
RAM/Memory ≥4 GB 8–16 GB ≥32 GB Supports concurrent segmentation, classification, and monitoring threads.
Storage 64 GB SSD 128–256 GB SSD ≥512 GB NVMe SSD Appropriate for dataset caching, model binaries, and temporary video buffers.
Compute Throughput 0.4 TFLOPS (sustained) 1–2 TFLOPS (sustained) ≥10 TFLOPS (training/high-volume monitoring) Derived from 13.5 GFLOPs × 30 FPS = 0.405 TFLOPS requirement.
Network/I/O USB 3.0 or Ethernet (100 Mbps) Gigabit Ethernet/MQTT 10 Gb Ethernet/ROS2 over fiber Enables low-latency data exchange and control signaling.
Power and Thermal Passive cooling, ≤10 W Active cooling, 15–25 W Rack-mounted cooling > 50 W Sustained inference stability and reliability in factory settings.
Integration with Control System Manual alert to operator Semi-automatic feedback (pause/adjust) via G-code API Full closed-loop adaptive control via reinforcement module Enables dynamic process tuning based on diagnostic output.

Statistical distribution.

Defect Type No. of Samples Percentage (%) Example Equipment Dominant Material
Under-extrusion 2315 18.6 Prusa i3 MK3S PLA
Layer shift 2080 16.7 Ultimaker S5 ABS
Porosity 2540 20.4 Renishaw AM250 Ti-6Al-4V
Stringing 1945 15.6 Creality Ender-5 PLA
Warping 1760 14.1 Formlabs Form 3+ Resin
Thermal cracking 1820 14.6 EOS M290 AlSi10Mg
Total 12,460 100.0

Unified baseline and proposed framework configuration.

Parameter/Setting Baseline Models (ResNet50, EfficientNetB0, MobileNetV3, UNet, DeepLabV3+) Proposed MoShuResNet + MaskLab-CRFNet
Input Size 256 × 256 256 × 256
Data Split 70/15/15 (train/val/test) 70/15/15
Normalization [0, 1] min-max scaling [0, 1] min-max scaling
Augmentation Rotation ±15°, flip, Gaussian noise Same
Optimizer Adam Adam
Learning Rate 1 × 10−3 1 × 10−3 (warm-up + cosine decay)
Batch Size 16 16
Epochs 100 100
Loss Function Cross-entropy/Dice (depending on task) Hybrid focal-Dice loss
Hardware RTX 4090 GPU, 24 GB VRAM Same
Framework PyTorch 2.2.0 PyTorch 2.2.0

Performance comparison between the proposed method and existing techniques.

Method Accuracy (%) Precision (%) Sensitivity (%) Specificity (%) F1 Score (%) NPV (%) MCC (%) FPR (%) FNR (%)
Proposed 99.31 98.14 97.73 98.39 97.93 99.07 96.62 1.29 0.95
DMAIC [16] 94.15 92.99 92.67 93.25 92.87 94 91.56 4.52 4.19
3D-CNN [17] 94.27 93.09 92.77 93.04 92.87 94.12 91.76 4.24 3.99
PCNN [19] 94.5 93.9 92.48 93.53 92.89 94.12 91.64 3.95 3.81
LADRC [20] 94.96 93.67 93.01 94.69 93.51 94.52 92.3 3.23 3.2

K-Fold cross-validation accuracy comparison between the proposed framework and existing methods.

Model Fold 1 (%) Fold 2 (%) Fold 3 (%) Fold 4 (%) Fold 5 (%) Mean Accuracy (%)
Proposed 99.4 99.2 99.3 99.4 99.2 99.31
DMAIC [16] 94.1 94.2 94 94.3 94.2 94.15
3D-CNN [17] 94.3 94.1 94.2 94.4 94.3 94.27
PCNN [19] 94.6 94.5 94.4 94.5 94.5 94.5
LADRC [20] 95 94.9 94.8 95 95 94.96

Comparative performance analysis with and without preprocessing across proposed and existing methods.

Method Preprocessing Accuracy (%) Precision (%) Sensitivity (%) Specificity (%) F1 Score (%)
Proposed With 98.10 96.97 96.52 97.91 96.72
Without 97.09 95.96 95.51 96.90 95.71
DMAIC [16] With 93.02 91.83 91.53 92.09 91.76
Without 91.01 90.82 90.52 91.08 90.75
3D-CNN [17] With 93.15 92.97 91.65 91.81 91.76
Without 92.14 91.96 90.64 90.80 90.75
PCNN [19] With 93.36 92.70 91.33 92.35 91.78
Without 92.35 91.69 90.32 91.34 90.77
LADRC [20] With 93.79 92.53 91.09 93.51 92.37
Without 92.78 91.52 90.08 92.50 91.36

Comparative performance analysis with and without data augmentation across proposed and existing methods.

Method Augmentation Accuracy (%) Precision (%) Sensitivity (%) Specificity (%) F1 Score (%)
Proposed With 97.07 95.94 95.49 96.88 95.69
Without 96.16 95.03 94.58 95.97 94.78
DMAIC [16] With 91.99 90.80 90.50 91.06 90.73
Without 90.08 89.89 89.59 90.15 89.82
3D-CNN [17] With 92.12 91.94 90.62 90.78 90.73
Without 91.21 91.03 89.71 89.87 89.82
PCNN [19] With 92.33 91.67 90.30 91.32 90.75
Without 91.42 90.76 89.39 90.41 89.84
LADRC [20] With 92.76 91.50 90.06 92.48 91.34
Without 91.85 90.59 89.15 91.57 90.43

Comparative performance analysis with and without segmentation across proposed and existing methods.

Method Segmentation Accuracy (%) Precision (%) Sensitivity (%) Specificity (%) F1 Score (%)
Proposed With 97.86 96.73 96.28 97.67 96.48
Without 96.84 95.71 95.26 96.65 95.46
DMAIC [16] With 92.78 91.59 91.29 91.85 91.52
Without 91.76 90.57 90.27 90.83 90.50
3D-CNN [17] With 92.91 92.73 91.41 91.57 91.52
Without 91.89 91.71 90.39 90.55 90.50
PCNN [19] With 93.12 92.46 91.09 92.11 91.54
Without 92.10 91.44 90.07 91.09 90.52
LADRC [20] With 93.55 92.29 90.85 93.27 92.13
Without 92.53 91.27 89.83 92.25 91.11

Comparative performance analysis with and without feature extraction for the proposed and existing methods.

Method Feature Extraction Accuracy (%) Precision (%) Sensitivity (%) Specificity (%) F1 Score (%)
Proposed With 97.73 96.60 96.15 97.54 96.35
Without 97.05 95.92 95.47 96.86 95.67
DMAIC [16] With 91.65 91.46 91.16 91.72 91.39
Without 90.97 90.78 90.48 91.04 90.71
3D-CNN [17] With 92.78 92.60 91.28 91.44 91.39
Without 92.10 91.92 90.60 90.76 90.71
PCNN [19] With 92.99 92.33 90.96 91.98 91.41
Without 92.31 91.65 90.28 91.30 90.73
LADRC [20] With 93.42 92.16 90.72 93.14 92.00
Without 92.74 91.48 90.04 92.46 91.32

Computational performance and real-time inference analysis of the proposed and baseline models.

Model Inference Speed (FPS) Latency (ms/frame) Computational Load (GFLOPs) GPU Utilization (%) CPU Utilization (%) Memory Usage (GB)
Proposed Framework 38.7 25.8 13.5 62.4 47.1 2.9
DMAIC [16] 22.3 44.8 28.2 71.6 54.3 3.6
3D-CNN [17] 14.9 66.9 41.7 83.2 58.1 4.2
PCNN [19] 18.7 53.5 33.5 76.9 52.7 3.8
LADRC [20] 21.4 47.1 29.1 72.3 50.2 3.5

Comparison of model complexity between the proposed MoShuResNet and baseline architectures.

Model Parameters (Million) Model Size (MB) FLOPs (GFLOPs) Compression Ratio (%) Accuracy (%)
Proposed MoShuResNet 4.82 18.6 13.5 67.3 99.31
ResNet50 [21] 25.6 98.3 38.9 0 96.84
DenseNet121 [22] 8.1 32.1 22.4 40.1 97.28
MobileNetV3-Large [23] 5.4 21.7 14.2 61.3 98.17
EfficientNet-B0 [24] 7.9 29.5 20.1 41.6 97.96

Comparative analysis of features.

Feature Set Feature Dimensionality Classification Accuracy (%)
Deep only (MoShuResNet) 2048 96.8
Traditional only (Gabor + HOG + Zernike) 512 88.6
Combined (Before PCA) 2560 97.0
Combined (After PCA + MI) 640 98.2

Value-based comparative study of AM fault detection and control methods.

Year/Ref. Method/Framework Accuracy (%) F1 Score (%) Sensitivity/Recall (%) Inference/FPS Key Features/Remarks
2022 [16] Rodriguez et al. DMAIC + KPIs N/A (Case study) N/A N/A N/A Process improvement; sustainability-focused; no real-time defect detection
2023 [17] Lee et al. 3D-CNN + Light Intensity 94.27 92.87 92.77 ~5–10 (estimated) Detects lack-of-fusion and keyhole defects; L-PBF; limited real-time deployment
2022 [18] Li et al. Multi-sensor fusion (Acoustic + Photodiode) 92–93 91–92 91 N/A Converts 1D sensor data to 2D; CNN-based; SLM-specific
2022 [19] Peng et al. Multi-sensor image fusion + Enhanced PCNN 93.5 92.5 91–92 N/A Combines visible + IR; improved contrast and texture; computationally heavy
2023 [20] Hussain et al. LADRC N/A N/A N/A Real-time control Focus on melt-pool stability; simulation-based; no defect classification
2022 [21] Caltanissetta et al. Thermal profile monitoring N/A N/A N/A N/A Detects thermal anomalies; effective for BAAM; no direct visual defect detection
2023 [22] Hölscher et al. Closed-loop WAAM control N/A N/A N/A Real-time control Maintains contact-tube distance; control only; no deep learning
2024 [23] Horr Hybrid Digital Twin (ROM + ML) N/A N/A N/A Moderate Combines physics + neural network; computationally intensive
2023 [24] Wang et al. Closed-loop laser power N/A N/A N/A Real-time Thermal feedback-based; improves surface quality; no defect localization
2023 [25] Bevans et al. Multi-sensor fusion (Thermal + Optical + Spatter) 93 N/A N/A N/A Heterogeneous sensor integration; >93% detection fidelity; heavy computation
Proposed (2025) MaskLab-CRFNet + MoShuResNet + BLC-MonitorNet 99.31 97.93 97.73 ~38 Real-time, lightweight, and interpretable; edge-deployable; full pipeline: segmentation → feature extraction → detection → monitoring

Quantitative analysis of CRF overhead.

Method Mean IoU (%) Boundary F1 (%) Avg. Inference Time (ms/Frame) Overhead (ms) FPS (with CRF) Remarks
MaskLab (no CRF) 95.82 92.36 26.4 37.9 Coarse edges on small defect regions
MaskLab + Dense CRF (T = 10) 96.88 94.71 39.8 +13.4 25.1 High precision but latency increase
MaskLab-CRFNet (T = 5) 96.92 94.84 31.2 +4.8 32.1 Balanced accuracy and real-time speed
MaskLab-CRFNet (Quantized) 96.73 94.41 28.7 +2.3 34.8 Optimized for edge deployment

Impact of CCA on feature fusion.

Fusion Method Accuracy (%) Precision (%) Recall (%) F1 Score (%)
Simple Concatenation 96.84 95.71 95.02 95.36
PCA-Based Fusion 97.12 96.22 95.86 96.03
CCA-Based Fusion (Proposed) 99.21 98.14 97.73 97.51

Analysis of robustness.

Metric Value Interpretation
Total Images 12,480 Combined datasets
Class Imbalance Ratio 1.08 Balanced distribution
FID Score 14.62 High inter-domain diversity
Intra-Class Variance (ICV) 0.87 Controlled intra-class variability
Cross-Validation Std. Dev. (σ) 0.37 High robustness

Analysis of overfitting.

Validation Component Description Purpose
Stratified 5-Fold CV Ensures balanced class sampling Reduces data bias
Data Augmentation Rotation, flip, noise, and contrast jitter Increases domain diversity
Early Stopping Patience = 15, Δloss < 0.001 Prevents overfitting
Dropout Regularization p = 0.4 Avoids over-parameterization
Cross-Domain Validation Train/test across datasets Tests generalizability

Analysis of model efficiency.

Optimization Stage Technique Parameter Reduction Accuracy Drop Inference Speed Gain
Structured Pruning Magnitude-based (γ = 0.15) 28% 0.6% 1.35×
Quantization 8-bit integer quantization 14% 0.3% 1.8×
Combined Effect Pruning + Quantization 39% total <1% 2.1× overall speedup

Ablation study.

Model Configuration Accuracy (%) Precision (%) Recall (%) F1 Score (%) Inference Time (ms)
MobileNetV3 (baseline) 95.84 95.12 94.88 95.00 34
ShuffleNet 96.47 96.09 95.76 95.92 31
Residual U-Net 97.25 97.04 96.88 96.96 45
Proposed MoShuResNet (fusion) 99.31 98.14 97.73 97.93 24

Uncertainty analysis.

Print Condition Mean Predictive Confidence (%) Predictive Variance (σ2) Observed Defect Frequency (%) Correlation (r)
Stable Print 98.73 0.011 1.6
Slight Thermal Drift 94.28 0.076 7.9
Moderate Spatter Formation 91.56 0.097 10.8
Severe Spatter and Warping 88.21 0.142 17.3
Laser Power Fluctuation 89.67 0.128 15.4
Nozzle Clogging 90.12 0.115 12.6
Overall Correlation 0.93

Correlation analysis.

Fault Class Laser Power (W) Scan Speed (mm/s) Layer Thickness (µm) Defect Probability (%) Pearson Correlation (r) (P, v, t vs. Fault Occurrence)
Lack of Fusion 160–180 1100–1300 40–50 18.4 r(P) = −0.82, r(v) = +0.79, r(t) = +0.73
Keyhole Porosity 220–260 700–900 30–40 14.2 r(P) = +0.86, r(v) = −0.71, r(t) = +0.64
Spatter Formation 200–230 900–1100 50–60 13.8 r(P) = +0.74, r(v) = −0.68, r(t) = +0.59
Warping/Delamination 180–200 1000–1200 70–80 10.3 r(P) = –0.61, r(v) = +0.66, r(t) = +0.82
Over-Melting 250–280 600–800 30–40 11.9 r(P) = +0.89, r(v) = −0.77, r(t) = +0.57

Analysis of domain adaptiveness.

Process Type Training Domain Adaptation Method Target Domain Accuracy (%) F1 Score Domain Shift Reduction (MMD) Inference Latency (ms)
L-PBF → L-PBF Source Only 99.31 0.981 0.000 21.3
L-PBF → FDM None (Direct Transfer) 84.57 0.846 0.192 23.5
L-PBF → FDM MMD-Based Alignment yes 91.64 0.907 0.089 25.2
L-PBF → FDM Adversarial (GRL) yes 93.42 0.928 0.072 26.1
L-PBF → FDM Hybrid (MMD + GRL + BatchNorm Recalibration) yes 95.18 0.947 0.051 27.4

Segmentation boundary evaluation and statistical validation.

Metric Definition Mask R-CNN DeepLabv3+ MaskLab-CRFNet (Proposed) Improvement (%) p-Value (t-Test)
Pixel Accuracy (%) Ratio of correctly labeled pixels to total pixels 96.42 97.08 98.91 +2.49 0.004
IoU (Intersection over Union) Overlap between predicted and ground-truth masks 0.935 0.947 0.973 +2.6 0.006
Dice Coefficient 2 × TP/(2 × TP + FP + FN) 0.958 0.964 0.986 +2.2 0.005
BFScore (Boundary F1) Boundary precision–recall harmonic mean 0.903 0.917 0.962 +4.5 0.003
Hausdorff Distance (px) Max boundary deviation between mask and GT 4.81 4.23 2.97 38.4 0.002

Analysis of transient vs. systemic faults.

Fault Type Precision (%) Recall (%) F1 Score (%) Temporal Consistency (%) Classification Accuracy (%)
Transient Anomalies 94.2 90.8 92.4 86.5 93.1
Systemic Faults 96.8 95.7 96.2 97.3 96.5

Analysis of computational performance.

Processing Stage Average Latency per Frame (ms) Frame Rate (FPS) Notes
Preprocessing (Normalization + Augmentation) 6.8 147 GPU-accelerated data loader
MaskLab-CRFNet Segmentation 18.5 54 Includes CRF post-processing
BLC-MonitorNet Classification 7.2 139 Real-time batch inference
Bayesian Uncertainty Estimation 3.5 285 Stochastic forward passes
End-to-End Total (Average) 36.0 ~27.8 Meets real-time AM monitoring threshold

Analysis of optimization performance.

Parameter Type Method Used Derived Threshold Validation Metric Optimized Remarks
Predictive Uncertainty (BDNN) Monte Carlo Dropout (20 samples) μ + 2σ = 0.34 Max F1 Score + Balanced Accuracy Statistically derived cutoff for stable posterior variance
Reconstruction Error (ConvAE-LSTM) Gaussian Kernel Density Estimation 95th percentile = 0.027 AUROC = 0.982 Adaptive to sensor noise levels
Anomaly Score (CAE) Z-score Normalization z > 2.1 Detection Rate vs. False Alarm Trade-off Controlled false-positive rate < 2.5%

Analysis of scalability.

Sensor Modality Integration Approach Additional Module Added Fusion Strategy Computational Overhead Accuracy Improvement (%) Remarks
Optical (RGB) Baseline Visual Feature Stream 99.31 Base framework
Thermal Imaging Thermal CNN Encoder (ResNet-18) Mid-level Fusion Canonical Correlation + Attention +6.8% latency +2.7 Captures heat-based defect cues
Acoustic Emissions 1D Conv-AE + Temporal LSTM Late Fusion Correlation Alignment +8.4% latency +3.9 Detects tool chatter and porosity
Vibration Signals Spectrogram CNN Multi-head Attention Fusion Weighted Confidence Fusion +10.1% latency +4.2 Enhances dynamic fault recognition

Analysis of explainability metrics.

Metric Definition/Computation Purpose in Context Obtained Score
Pointing Game Accuracy (PGA) Measures the percentage of saliency map peaks that correctly fall inside annotated defect regions. Evaluates spatial relevance of visual explanations (Grad-CAM/SHAP overlays). 93.8%
Deletion–Insertion AUC (DI-AUC) Quantifies the change in model confidence when highly attributed pixels are deleted/inserted. Evaluates causal importance of features highlighted by XAI maps. 0.87
Faithfulness Correlation (FC) Spearman correlation between attribution values and changes in output probability after pixel perturbation. Validates logical consistency of feature attribution. 0.82
Human Trust Score (HTS) Average rating by domain experts (1–5 scale) comparing saliency maps to actual defect regions. Assesses perceptual interpretability and usability for engineers. 4.6/5
Sparsity Index (SI) Fraction of total pixels contributing to 90% of model activation (lower = more focused). Measures explanation compactness. 0.21

Analysis of pruning and quantization effectiveness.

Metric Before Pruning and Quantization After Pruning (30%) + INT8 Quantization Change (%)
Overall Accuracy 99.31% 98.87% −0.44%
Precision (Rare Faults) 96.24% 95.83% −0.41%
Recall (Rare Faults) 94.78% 94.51% −0.27%
F1 Score (Rare Faults) 95.50% 95.17% −0.33%
Model Size Reduction 47.2%
Inference Speed (ms/frame) 38.2 21.7 1.76×
Memory Footprint 212 MB 112 MB 47.1%

Analysis of robustness under noise.

Noise Type/Intensity Metric Baseline (DeepLabv3+) Baseline (ResNet-50) Proposed Framework
Gaussian Noise (σ = 0.05) Accuracy 93.84% 91.72% 98.12%
F1 Score 92.95% 90.48% 97.36%
Salt-and-Pepper Noise (2%) Accuracy 92.14% 89.86% 97.54%
F1 Score 91.03% 88.92% 96.91%
Motion Blur (5×5 kernel) Accuracy 90.28% 88.47% 96.33%
F1 Score 89.77% 87.65% 95.72%
Partial Occlusion (10% pixels masked) Accuracy 91.62% 90.33% 96.94%
F1 Score 90.84% 89.41% 96.28%
JPEG Compression (Quality 50) Accuracy 94.15% 92.22% 98.09%
F1 Score 93.27% 91.38% 97.45%

References

1. Gibson, I.; Rosen, D.; Stucker, B.; Khorasani, M. Additive Manufacturing Technologies; Springer: Cham, Switzerland, 2021; pp. 160-186.

2. Prashar, G.; Vasudev, H.; Bhuddhi, D. Additive manufacturing: Expanding 3D printing horizon in industry 4.0. Int. J. Interact. Des. Manuf. (IJIDeM); 2023; 17, pp. 2221-2235. [DOI: https://dx.doi.org/10.1007/s12008-022-00956-4]

3. Kumar, A.; Chhabra, D. Adopting additive manufacturing as a cleaner fabrication framework for topologically optimized orthotic devices: Implications over sustainable rehabilitation. Clean. Eng. Technol.; 2022; 10, 100559. [DOI: https://dx.doi.org/10.1016/j.clet.2022.100559]

4. Gu, D.; Shi, X.; Poprawe, R.; Bourell, D.L.; Setchi, R.; Zhu, J. Material-structure-performance integrated laser-metal additive manufacturing. Science; 2021; 372, eabg1487. [DOI: https://dx.doi.org/10.1126/science.abg1487] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/34045326]

5. Hohn, M.M.; Durach, C.F. Additive manufacturing in the apparel supply chain—Impact on supply chain governance and social sustainability. Int. J. Oper. Prod. Manag.; 2021; 41, pp. 1035-1059. [DOI: https://dx.doi.org/10.1108/IJOPM-09-2020-0654]

6. Chen, A.; Kopsaftopoulos, F.; Mishra, S. An unsupervised online anomaly detection method for metal additive manufacturing processes via a statistical time-frequency domain algorithm. Struct. Health Monit.; 2024; 23, pp. 1926-1948. [DOI: https://dx.doi.org/10.1177/14759217231193702]

7. Venturi, F.; Taylor, R. Additive manufacturing in the context of repeatability and reliability. J. Mater. Eng. Perform.; 2023; 32, pp. 6589-6609. [DOI: https://dx.doi.org/10.1007/s11665-023-07897-3]

8. Gradl, P.; Tinker, D.C.; Park, A.; Mireles, O.R.; Garcia, M.; Wilkerson, R.; Mckinney, C. Robust metal additive manufacturing process selection and development for aerospace components. J. Mater. Eng. Perform.; 2022; 31, pp. 6013-6044. [DOI: https://dx.doi.org/10.1007/s11665-022-06850-0]

9. Lu, L.; Hou, J.; Yuan, S.; Yao, X.; Li, Y.; Zhu, J. Deep learning-assisted real-time defect detection and closed-loop adjustment for additive manufacturing of continuous fiber-reinforced polymer composites. Robot. Comput.-Integr. Manuf.; 2023; 79, 102431. [DOI: https://dx.doi.org/10.1016/j.rcim.2022.102431]

10. Rachmawati, S.M.; Putra, M.A.P.; Lee, J.M.; Kim, D.S. Digital twin-enabled 3D printer fault detection for smart additive manufacturing. Eng. Appl. Artif. Intell.; 2023; 124, 106430. [DOI: https://dx.doi.org/10.1016/j.engappai.2023.106430]

11. Uhrich, B.; Pfeifer, N.; Schäfer, M.; Theile, O.; Rahm, E. Physics-informed deep learning to quantify anomalies for real-time fault mitigation in 3D printing. Appl. Intell.; 2024; 54, pp. 4736-4755. [DOI: https://dx.doi.org/10.1007/s10489-024-05402-4]

12. Liu, Q.; Chen, W.; Yakubov, V.; Kruzic, J.J.; Wang, C.H.; Li, X. Interpretable machine learning approach for exploring process-structure-property relationships in metal additive manufacturing. Addit. Manuf.; 2024; 85, 104187. [DOI: https://dx.doi.org/10.1016/j.addma.2024.104187]

13. Kumar, D.; Liu, Y.; Song, H.; Namilae, S. Explainable deep neural network for in-plain defect detection during additive manufacturing. Rapid Prototyp. J.; 2024; 30, pp. 49-59. [DOI: https://dx.doi.org/10.1108/RPJ-05-2023-0157]

14. Chung, S.; Chou, C.H.; Fang, X.; Al Kontar, R.; Okwudire, C. A multi-stage approach for knowledge-guided predictions with application to additive manufacturing. IEEE Trans. Autom. Sci. Eng.; 2022; 19, pp. 1675-1687. [DOI: https://dx.doi.org/10.1109/TASE.2022.3160420]

15. Maitra, V.; Arrasmith, C.; Shi, J. Introducing explainable artificial intelligence to property prediction in metal additive manufacturing. Manuf. Lett.; 2024; 41, pp. 1125-1135. [DOI: https://dx.doi.org/10.1016/j.mfglet.2024.09.138]

16. Rodriguez Delgadillo, R.; Medini, K.; Wuest, T. A DMAIC framework to improve quality and sustainability in additive manufacturing—A case study. Sustainability; 2022; 14, 581. [DOI: https://dx.doi.org/10.3390/su14010581]

17. Lee, K.H.; Lee, H.W.; Yun, G.J. A defect detection framework using three-dimensional convolutional neural network (3D-CNN) with in-situ monitoring data in laser powder bed fusion process. Opt. Laser Technol.; 2023; 165, 109571. [DOI: https://dx.doi.org/10.1016/j.optlastec.2023.109571]

18. Li, J.; Zhang, X.; Zhou, Q.; Chan, F.T.; Hu, Z. A feature-level multi-sensor fusion approach for in-situ quality monitoring of selective laser melting. J. Manuf. Process.; 2022; 84, pp. 913-926. [DOI: https://dx.doi.org/10.1016/j.jmapro.2022.10.050]

19. Peng, X.; Kong, L.; Han, W.; Wang, S. Multi-sensor image fusion method for defect detection in powder bed fusion. Sensors; 2022; 22, 8023. [DOI: https://dx.doi.org/10.3390/s22208023] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/36298369]

20. Hussain, S.Z.; Kausar, Z.; Koreshi, Z.U.; Shah, M.F.; Abdullah, A.; Farooq, M.U. Linear active disturbance rejection control for a laser powder bed fusion additive manufacturing process. Electronics; 2023; 12, 471. [DOI: https://dx.doi.org/10.3390/electronics12020471]

21. Caltanissetta, F.; Dreifus, G.; Hart, A.J.; Colosimo, B.M. In-situ monitoring of Material Extrusion processes via thermal videoimaging with application to Big Area Additive Manufacturing (BAAM). Addit. Manuf.; 2022; 58, 102995. [DOI: https://dx.doi.org/10.1016/j.addma.2022.102995]

22. Hölscher, L.V.; Hassel, T.; Maier, H.J. Development and evaluation of a closed-loop z-axis control strategy for wire-and-arc-additive manufacturing using the process signal. Int. J. Adv. Manuf. Technol.; 2023; 128, pp. 1725-1739. [DOI: https://dx.doi.org/10.1007/s00170-023-12012-w]

23. Horr, A.M. Real-Time Modeling for Design and Control of Material Additive Manufacturing Processes. Metals; 2024; 14, 1273. [DOI: https://dx.doi.org/10.3390/met14111273]

24. Wang, R.; Standfield, B.; Dou, C.; Law, A.C.; Kong, Z.J. Real-time process monitoring and closed-loop control on laser power via a customized laser powder bed fusion platform. Addit. Manuf.; 2023; 66, 103449. [DOI: https://dx.doi.org/10.1016/j.addma.2023.103449]

25. Bevans, B.; Barrett, C.; Spears, T.; Gaikwad, A.; Riensche, A.; Smoqi, Z.; Rao, P. Heterogeneous sensor data fusion for multiscale, shape agnostic flaw detection in laser powder bed fusion additive manufacturing. Virtual Phys. Prototyp.; 2023; 18, e2196266. [DOI: https://dx.doi.org/10.1080/17452759.2023.2196266]

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.