Purpose
Lean manufacturing has been pivotal in emphasizing the reduction of cycle times, minimizing manufacturing costs and diminishing inventories. This research endeavors to formulate a lean data management paradigm, through the design and execution of a strategic edge-cloud data governance approach. This study aims to discern anomalous or unforeseen patterns within data sets, enabling an efficacious examination of product shortcomings within manufacturing processes, while concurrently minimizing the redundancy associated with the storage, access and processing of nonvalue-adding data.
Design/methodology/approach
Adopting a lean data management framework within both edge and cloud computing contexts, this study ensures the preservation of significant time series sequences, while ascertaining the optimal quantity of normal time series data to retain. The efficacy of detected anomalous patterns, both at the edge and in the cloud, is assessed. A comparative analysis between traditional data management practices and the strategic edge-cloud data governance approach facilitates an exploration into the equilibrium between anomaly detection and space conservation in cloud environments for aggregated data analysis.
Findings
Evaluation of the proposed framework through a real-world inspection case study revealed its capability to navigate alternative strategies for harmonizing anomaly detection with data storage efficiency in cloud-based analysis. Contrary to the conventional belief that retaining comprehensive data in the cloud maximizes anomaly detection rates, our findings suggest that a strategic edge-cloud data governance model, which retains a specific subset of normal data, can achieve comparable or superior accuracy with less normal data relative to traditional methods. This approach further demonstrates enhanced space efficiency and mitigates various forms of waste, including temporal delays, storage of noncontributory normal data, costs associated with the analysis of such data and excess data transmission.
Originality/value
By treating inspected normal data as nonvalue-added, this study probes the intricacies of maintaining an optimal balance of such data in light of anomaly detection performance from aggregated data sets. Our proposed methodology augments existing research by integrating a strategic edge-cloud data governance model within a lean data analytical framework, thereby ensuring the retention of adequate data for effective anomaly detection.
1. Introduction
The genesis of lean manufacturing management, which aspires to enhance value creation by reducing waste and thereby elevating productivity (Koskela et al., 2002), can be traced back to Toyota in the 1950s, where it became known as the Toyota Production System (Ballard and Howell, 1998). This strategy emphasizes continuous process improvement, concentrating on the elimination of wasteful practices such as overproduction, waiting time, unnecessary movement, defects, excess inventory, unused talent and over-processing (Ohno, 1988). By adhering to this approach, organizations aim to refine their operational efficiency, curtail costs and escalate customer satisfaction (Sundar et al., 2014).
Since 2010, the rapid evolution of technology, characterized by the rise of artificial intelligence, big data and the Internet of Things (IoT), has provided unprecedented opportunities for manufacturing industries to amplify their quality and efficiency (Kusiak, 2023). The advances in information and communication technology have been a pivotal force in this transformation, catalyzing the shift from traditional computer-aided manufacturing to a more data-centric, intelligent approach. With IoT infrastructure and simulation, a manufacturing system that automatically analyzes data to aid decision-making in lean manufacturing can improve operational efficiency (Wang, 2020). The implementation of machine learning algorithms with various computing models also enhances the incorporation of advanced sensors in the context of Industry 4.0 (Md et al., 2022).
However, the introduction of these disruptive technologies is not without its challenges, particularly from a data management perspective in a lean manufacturing process. These include handling the heterogeneous nature of the data, managing its enormous volume and addressing the need for real-time processing (Dai et al., 2020). Furthermore, dealing with the vast quantities of data being generated daily, and the corresponding demand for efficient data storage, access and computation resources, has proven to be a critical management issue. Consider, for example, the data value (DV) chain as shown in Figure 1 (Jarr, 2012). Within this chain, individual DVs are derived from interactive real-time manufacturing data, such as digitalized manufacturing machine analog signals, measurements, settings and collected images. In contrast, aggregated time series manufacturing data, subjected to exploratory data analytics based on the accumulated data, can yield significant business value. Analyzing individual data can enable real-time interventions to prevent defects, while studying aggregated data can uncover manufacturing patterns that can aid predictive maintenance strategies. Despite the obvious need for both types of data within smart manufacturing, current literature seldom explores lean data analysis that concurrently addresses real-time individual and long-term aggregated data analysis.
Take, for instance, the quality inspection process in electronic device manufacturing. In the discourse of management studies, particularly when examining the intricacies of quality control within manufacturing processes, the utilization of time series data such as Solder Paste Inspection (SPI) data emerges as a quintessential example. The SPI process is pivotal in ensuring that the application of solder paste on electronic boards is executed with precision concerning both quantity and positioning. From a managerial standpoint, the discernment of anomalies within SPI data acquires profound significance. Such irregularities in real-time data necessitate an immediate cessation of the ongoing process to facilitate a thorough investigation and requisite adjustments. This approach underscores a proactive management strategy, aiming to mitigate potential disruptions in production efficiency and maintain stringent quality standards. In manufacturing, real-time data are produced daily, in volumes that fluctuate with the specific industry and its applications, which mandates strategies for both prompt analysis and incremental storage of such data. The prevalent methodology within the industry for the analysis of aggregated time series data entails an initial phase of comprehensive data storage, followed by the execution of analytical procedures. This conventional approach, however, precipitates substantial inefficiencies due to the extraneous retrieval, storage, processing and transmission of time series data. The accumulation of such data, not all of which is pertinent or necessary for insightful analysis, underscores a critical area for optimization. Addressing these inefficiencies by adopting more discerning data management and analysis techniques could significantly enhance operational efficiency and resource allocation within the manufacturing sector.
To mitigate the aforementioned inefficiencies inherent in the current paradigm of time series data management within the manufacturing industry, our research introduces the concept of strategic edge-cloud data governance (SECDG). SECDG embodies a sophisticated management approach aimed at the strategic governance of data, leveraging the capabilities of edge-cloud computing to achieve an optimal equilibrium between edge and cloud data storage from a managerial perspective. This innovative framework is predicated on the premise that by strategically distributing data processing and storage tasks between the edge of the network and the cloud, organizations can significantly enhance their operational efficiency and data management efficacy.
The novelty of our research lies in its empirical exploration of a real-world manufacturing case study, aimed at scrutinizing the equilibrium between data storage and anomaly detection performance under the edge-cloud computing paradigm. The findings of this study illuminate the potential of SECDG to reconcile the dual objectives of anomaly detection efficacy and storage efficiency. Specifically, the results suggest that by judiciously reducing the volume of ‘normal’ data retained, it is feasible to streamline the data manufacturing process, thereby facilitating more rapid data analysis, and achieving more efficient data storage solutions. This revelation points to the broader applicability of SECDG as a means to foster lean data manufacturing practices, characterized by enhanced speed and efficiency of data analysis, coupled with a reduction in unnecessary data storage burdens.
The sections following this introduction provide a thorough literature review to establish the research context and highlight its significance. We then present our proposed SECDG model, detailing its theoretical foundation and methodological approach for real-world application in manufacturing. This includes a description of the experimental design, data collection and analysis methods to evaluate the SECDG framework’s effectiveness. The results section critically examines our findings, discussing their practical and theoretical implications for enhancing manufacturing data management and processing. Finally, the conclusion summarizes the study’s contributions, outlines future research directions and reflects on the impact of our findings on the field, particularly in advancing strategic data governance with edge-cloud computing.
2. Proposition development
Today’s manufacturing industry has introduced digital machine monitoring sensors and process equipment, which enable the rapid, large-scale accumulation of observation and operation data during the production process. In such a production environment, whether to store all the data generated by the machines and sensors on the production line before analyzing it becomes an important issue. Storing and analyzing everything consumes substantial resources, including considerable time to transfer the data to the computing analysis host and considerable computing power to obtain results such as material quality monitoring, early warning for production process monitoring or life cycle prediction for production machine parts. The concept of traditional lean manufacturing, which seeks to limit waste, has been extended to lean information management in the big data era (Hicks, 2007). Corbett and Chen (2015) summarized how the concepts of traditional manufacturing wastes can be mapped to information management wastes and big data wastes. As can be seen in Table 1, as the size of big data increases, it can lead to several excesses, such as data collection excess, data transmission excess, data cleansing waste and data storage excess.
In the manufacturing industry, these data are very large, accumulate very quickly and consist mostly of normal data. Storing all of them and uploading them to a powerful host such as a cloud computing server for analysis drives costs ever higher, yet the benefits may not be optimal. The question, therefore, is whether data storage and transmission can be reduced while achieving the same or even better analytical results.
Traditional and strategic edge-cloud data governance approaches
In the contemporary discourse of data management within the industrial sector, the traditional methodology for the analysis of aggregated time series data – a process which indiscriminately mandates the preliminary storage of the entirety of generated data (100%) prior to its transmission to databases for subsequent analysis on more potent host computing platforms, such as cloud computing servers – has been identified as resource-intensive. This approach necessitates substantial allocations of time, storage capacity and computational resources to process and preserve vast quantities of data, a significant portion of which may ultimately be deemed superfluous to analytical requirements.
Edge computing emerges as a salient technological paradigm, distinguished by its capability to store and process data proximate to its origin, such as within manufacturing machinery. Positioned at the periphery of the network, edge computing is characterized by its immediacy and locational cognizance, thereby facilitating the delivery of services that are both rapid and secure, through real-time data processing capabilities (Cao et al., 2020). Conceived as a “thin layer of computation,” edge computing infrastructure is inherently lightweight, equipped to store only a limited quantum of data and endowed with comparatively modest computational prowess. Conversely, cloud computing entails the transference of data to centralized computing centers via the network, thereby centralizing the resolution of computational and storage challenges. This dichotomy underscores the substantial costs associated with the storage, access and processing of aggregated time series data, particularly in light of the ongoing accumulation of real-time data.
Within the manufacturing industry, the synergistic deployment of edge and cloud computing technologies has been heralded for its potential to substantially augment the operational efficiency of manufacturing plants, enhance product quality and bolster manufacturers’ competitiveness (Georgakopoulos et al., 2016). The integration of edge and cloud computing is posited to offer significant reductions in latency and network congestion, alongside diminished bandwidth costs, through the provision of more agile resource allocation mechanisms. However, the efficacy of the edge-cloud model is contingent upon the equitable management of value-adding (VA) and nonvalue-adding (NVA) data. Ideally, the objective is for the edge-cloud architecture to retain all VA data while minimizing the presence of NVA data, thus optimizing the balance between data utility and resource efficiency. This balance is pivotal in harnessing the full potential of edge-cloud computing, thereby enabling industries to navigate the complexities of modern data management and analysis with greater agility and foresight.
Propositions
Within the ambit of this scholarly endeavor, we meticulously scrutinize three foundational propositions that are premised on the hypothesis that the manufacturing anomaly detection process exemplifies a data-intensive task, necessitating extensive data sets for both immediate reaction and comprehensive aggregated analysis. This investigation draws upon the dualistic data collection framework, wherein immediate data are captured in real-time at the edge-side machinery, and aggregated data are consolidated within a cloud database.
The initial proposition postulates that, within the context of a lean data manufacturing paradigm, cloud data management systems offer a distinct advantage over traditional data management solutions by significantly reducing waste generation throughout the manufacturing process. Traditional methodologies, which indiscriminately amass all collected data, whether from edge-side or cloud databases, inherently result in the storage of vast quantities of data pertaining to normal manufacturing operations – data which, upon analysis, may not yield actionable value. This approach, by its very nature, engenders the accumulation of unnecessary waste, given the low utility of maintaining comprehensive data sets of normal operational data for analysis. The proposition thus advocates for a more discerning approach to data storage within cloud data management systems, emphasizing the selective retention of data that genuinely contributes to the monitoring and optimization of the manufacturing process.
The second proposition explores the premise that in the realm of lean data manufacturing, the adoption of an SECDG model necessitates significantly less physical space for data storage in comparison to conventional data management practices. This model is characterized by its judicious use of edge-side devices for immediate data analysis and robust cloud servers for managing aggregated data. The effectiveness of this approach becomes particularly pronounced as the volume of real-time data escalates. Through edge computing’s capacity for anomaly detection at the source, and its selective transmission of both anomalous and a fraction of normal data to the cloud, this proposition highlights the spatial and computational efficiencies inherent in the strategic edge-cloud model, especially under conditions of data volume expansion.
The third proposition delineates the enhanced operational flexibility afforded by cloud data management systems within a lean manufacturing framework, characterized by their scalability, real-time data processing capabilities and improved accessibility. This increased operational flexibility is posited to act as a mediator in the reduction of operational costs, thereby bolstering the efficacy of cost-saving measures within the lean manufacturing process. The cloud’s inherent adaptability, in stark contrast to the rigidity of traditional, on-premises data management solutions, facilitates a dynamic environment conducive to cost-efficient manufacturing operations.
Operational flexibility and cost reduction via cloud data management systems – in a lean data manufacturing process, cloud data management systems, characterized by their scalability, real-time data processing and enhanced accessibility, provide greater operational flexibility than traditional, on-premises data management solutions. This increased flexibility in managing data is hypothesized to mediate a reduction in operational costs, thereby contributing to the effectiveness of cost-saving measures in the lean manufacturing process.
Our research extends beyond the theoretical exploration of these propositions to include an empirical validation within a real-world electronic device manufacturing scenario. This empirical study aims to rigorously evaluate the efficacy of the proposed SECDG model in enhancing anomaly detection, data management efficiency and overall operational flexibility. The ensuing section delves into the comprehensive details of this empirical investigation, setting the stage for a profound exploration of the transformative potential of SECDG within the manufacturing domain.
3. Method
The processing framework
In this research, an SECDG approach is proposed to handle anomaly detection based on real-world SPI data, which is commonly collected in electronic device manufacturing.
From the vantage point of data management, the discernment between VA and NVA data represents a critical juncture for many enterprises. In this context, VA data, often exemplified by SPI anomaly data, is earmarked for retention due to its potential to catalyze improvements. Conversely, data indicative of “normal” real-time manufacturing operations is categorized as NVA, on the premise that it does not precipitate further enhancement of processes or outcomes (Eaton et al., 2023). Given that a predominant share of real-time data mirrors standard operational parameters rather than deviations, a substantial fraction of such NVA data is construed as contributing to inefficiencies, thereby engendering resource depletion without commensurate value creation.
In the wake of these predicaments, the paradigm of lean data engineering principles (LDEP) has been propounded as a methodological countermeasure, predicated on the optimization of data processing practices to forestall the squandering of resources, thereby circumscribing operational expenditures to those quintessentially requisite for the fulfillment of the task at hand (Marttonen-Arola et al., 2020). LDEP synthesizes insights from data engineering, software engineering and DevOps, further augmenting this amalgam with the strategic utilization of cloud computing technologies. In alignment with the tenets of LDEP, our research delineates a data analysis framework designed to minimize the retention of NVA data by integrating an anomaly detection mechanism within the ambit of edge and cloud computing paradigms (Pan and McElhannon, 2018).
This study leverages empirical data from a real-world manufacturing scenario as a fulcrum for evaluating the efficacy of the proposed framework. The deployment of an edge-cloud-based anomaly detection methodology facilitates the identification of aberrant signals amidst the corpus of collected real-time data. The overarching aim of this investigation is to manifest the operational benefits derived from the application of our framework, specifically in terms of enhancing data utility, curtailing the proliferation of NVA data and thereby effectuating a paradigmatic shift toward leaner, more efficient data engineering practices within the manufacturing sector. This approach not only exemplifies the practical application of LDEP but also underscores the transformative potential of leveraging edge and cloud computing technologies to refine data management strategies in contemporary manufacturing environments.
To verify the impact of minimizing normal data, we combine detected anomalies (VA data) with a certain quantity of normal data and transmit them to the cloud database for further analysis. We then evaluate the performance of the anomaly detection method under different normal data keeping rates (NDKR), which offers insight into the balance between the data storage saving rate (DSSR) and the anomaly detection rate (ADR).
The SECDG approach embodies a synergistic integration of cloud computing’s expansive capabilities with the localized processing power of edge computing (Pan and McElhannon, 2018). This paradigm is anchored in the edge infrastructure, which constitutes the network layer proximal to terminal devices, facilitating the execution of storage, computation and data analysis operations at the network’s periphery. Such a configuration enables immediate responses after data analysis, leveraging the edge’s proximity to data sources. Concurrently, the cloud computing framework, recognized for its robust and scalable computing resources, accommodates more complex analytical tasks and expansive data storage requirements. Predominantly, scholarly inquiries within this domain have concentrated on applications related to real-time data transmission, data security, privacy and energy consumption (Cao et al., 2020), with a limited focus on the inefficiencies associated with the management of NVA data in the context of edge-cloud computing.
In our investigation, we devised an edge-cloud computing methodology tailored for the detection of anomalous signals within both real-time and aggregated time series data. According to this methodology, anomalies identified at the edge are transmitted to a cloud database, which is better equipped in terms of computational and storage capacities, for comprehensive analysis (Pan and McElhannon, 2018). This process necessitates the differentiation between normal and anomalous data, acknowledging that the volume of normal data typically surpasses that of anomalies within real-time data streams. Thus, determining the optimal quantity of normal data to be retained in the cloud emerges as a critical consideration in data management practices.
This discussion underscores the imperative to judiciously manage the balance between the retention of normal and anomalous data within the cloud. By meticulously calibrating this balance, organizations can optimize their resource utilization, enhancing both the efficiency and effectiveness of their data governance strategies. The strategic deployment of edge-cloud computing, as advocated in the SECDG approach, represents a forward-thinking solution to the challenges posed by the prevalent surplus of NVA data, thereby contributing to the advancement of lean and strategic data management methodologies within the evolving landscape of digital technologies.
To handle the vast volume of the collected data, an anomaly detection technique is applied at the edge to detect unusual signals, which are regarded as VA data. All detected anomalies are sent to a cloud database, which has greater computation and storage resources, for further investigation (Pan and McElhannon, 2018). These anomalies can appear in many forms, such as extreme values or unusual time series patterns (Prasad, 1997), and may be correlated with a quality defect in production. Proper analysis of anomaly data can detect real-time manufacturing problems as early as possible.
In this research, anomaly detection on edge computing allows preliminary analysis to be conducted in real time, without waiting for the transmission of all collected data to a cloud database. Only the detected anomaly sequences (VA data) and a certain percentage of the normal time series data are kept and sent to a cloud computing environment for further investigation. Because not all of the data are transmitted from edge to cloud, data transmission and storage are saved, in keeping with a lean data analysis process.
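For concreteness, the sketch below illustrates this edge-side step: anomalous windows of a time series are flagged and always retained, while only a fraction of the normal windows is kept for transmission to the cloud. The paper does not specify the anomaly detection algorithm, so a simple sliding-window z-score rule stands in for it here; `window_size`, `z_threshold` and the random sampling of normal windows are illustrative assumptions rather than the authors’ implementation.

```python
# Minimal sketch of the edge-side filtering step, assuming a simple
# sliding-window z-score detector in place of the paper's (unspecified)
# anomaly detection method. window_size, z_threshold and the random
# sampling of normal windows are illustrative assumptions.
import numpy as np

def edge_filter(series, window_size=50, z_threshold=3.0, ndkr=0.6, seed=0):
    """Split a 1-D NumPy array into anomalous windows (always kept) and a
    sampled fraction of normal windows (kept according to the NDKR)."""
    rng = np.random.default_rng(seed)
    mu, sigma = np.mean(series), np.std(series)

    anomalies, normals = [], []
    for start in range(0, len(series) - window_size + 1, window_size):
        window = series[start:start + window_size]
        # Flag the window if any point deviates strongly from the overall mean.
        if np.any(np.abs(window - mu) > z_threshold * sigma):
            anomalies.append(window)   # VA data: always transmitted to the cloud
        else:
            normals.append(window)     # NVA data: only a fraction is retained

    # Retain a share of the normal windows according to the NDKR.
    rng.shuffle(normals)
    kept_normals = normals[: int(round(ndkr * len(normals)))]
    return anomalies, kept_normals
```

Under this sketch, with NDKR = 60% the edge would forward the anomalous windows plus roughly 60% of the normal windows, matching the retention levels examined in the experiments below.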
To quantify the level of information lost under this framework, the anomaly subsequences (VA data) received in the cloud are combined with normal data to form “synthetic” data. The NDKR is defined as the quantity of normal data included during the anomaly detection process on the synthetic data; it ranges from 0% to 100%. The synthetic data are then analyzed by the same anomaly detection method in the cloud for comparison. The ADR, which measures the accuracy of detecting anomalous patterns, and the DSSR, which indicates the percentage of space saved, can then be examined from the experimental results. The ADR is defined as the ratio between the total length of anomalous time series subsequences detected at the edge from the original data and the total length of anomalous time series subsequences detected in the cloud from the synthetic data. A higher ADR means that anomaly detection on the synthetic data in the cloud is closer to anomaly detection on the original data at the edge; in other words, “synthetic” data generated by combining the detected anomaly sequences with normal data can represent the original data without losing detection capability. The DSSR is the proportion of data storage released by not sending the original normal data to the cloud. A higher DSSR means that less normal data is needed for anomaly detection in the cloud; when more normal data are included in the cloud (higher NDKR), the DSSR is naturally lower. The goal of this data management approach under edge-cloud computing is to achieve a high ADR together with a high DSSR.
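For readability, the two metrics can be formalized as shown below. This is an editorial sketch rather than the authors’ own notation: the orientation of the ADR ratio (cloud-side detections relative to the edge-side reference) and the decomposition of DSSR into the non-retained share of normal data are assumptions, chosen to be consistent with the verbal definitions above and with the roughly linear DSSR trend in Table 2.

```latex
% Compact formalization of the verbal definitions above (editorial reading;
% the ADR orientation and the DSSR decomposition are assumptions consistent
% with the text and with the trend in Table 2).
\[
  \mathrm{ADR} = \frac{\text{total length of anomalous subsequences detected in the cloud (synthetic data)}}
                      {\text{total length of anomalous subsequences detected at the edge (original data)}}
\]
\[
  \mathrm{DSSR} \approx (1 - \mathrm{NDKR}) \cdot \frac{L_{\mathrm{normal}}}{L_{\mathrm{total}}}
\]
% L_normal / L_total is the share of normal data in the original series,
% roughly 0.859 for the case-study component (the DSSR at NDKR = 0% in Table 2).
```

For example, with NDKR = 50% and a normal-data share of roughly 85.9%, the expression gives a DSSR of about 43%, close to the 43.2% reported in Table 2.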
Experimental data of case study
To validate the trade-off between ADR and DSSR at the edge and in the cloud, we took multiple sets of real-world SPI data, collected from a manufacturing production line at an electronic device manufacturer in Taiwan from May 3, 2020 to August 25, 2020, as a case study. Figure 2 shows an overview of the height, volume and area of the SPI data for a resistance component as an example. Please note that the collected height, volume and area values are time series data.
Figure 3 illustrates an example of the detected anomaly patterns (red ink) and normal data (blue ink). Basically, the detected anomaly data are the VA data for investigation, while the normal data are auxiliary to detection. Some normal data are needed for detection and model training. However, retaining much more normal data does not enhance the investigation, because its goal is to find out what went wrong in the manufacturing process. The detected anomalous time series patterns can help in analyzing the root cause of manufacturing defects.
Experiment design
In this work, an experiment was conducted to study the ADR and the corresponding DSSR achieved. In contrast to the traditional approach, the proposed edge-cloud approach analyzes the original data at the edge, while “synthetic” data are analyzed in the cloud. The “synthetic” data are composed of two parts: one is the detected anomaly patterns transmitted from the edge, and the other is the normal data kept in the cloud based on the data collected at the edge. The ADR and DSSR were compared between the original data collected at the edge and the “synthetic” data generated in the cloud, using the same anomaly detection method. Figure 4 illustrates the conceptual model of the comparison process that the experiment followed.
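To make the comparison procedure concrete, the following sketch outlines the loop implied by Figure 4. The helper functions `detect_anomalies` and `build_synthetic` are hypothetical placeholders for the (unspecified) anomaly detection method and the synthetic-data assembly step, and the ADR orientation follows the editorial reading introduced earlier.

```python
# Sketch of the experiment loop implied by Figure 4: for each NDKR, build a
# synthetic series in the "cloud", re-run detection and compare against the
# edge-side reference. detect_anomalies() and build_synthetic() are
# hypothetical placeholders, not the authors' implementation.
import numpy as np

def total_length(subsequences):
    """Total number of points covered by a list of subsequences."""
    return sum(len(s) for s in subsequences)

def run_experiment(original, detect_anomalies, build_synthetic,
                   ndkr_grid=np.arange(0.0, 1.05, 0.05), repetitions=20):
    edge_anomalies = detect_anomalies(original)            # edge-side reference detections
    results = {}
    for ndkr in ndkr_grid:
        adr_samples = []
        for rep in range(repetitions):
            # Assemble edge-detected anomalies plus a sampled NDKR share of normal data.
            synthetic = build_synthetic(original, edge_anomalies, ndkr, seed=rep)
            cloud_anomalies = detect_anomalies(synthetic)   # re-detection "in the cloud"
            adr_samples.append(total_length(cloud_anomalies) /
                               total_length(edge_anomalies))
        results[round(float(ndkr), 2)] = adr_samples
    return results
```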
4. Results
Table 2 shows the experimental results for ADR and DSSR across different NDKR values based on the SPI data of one component. For each NDKR from 0% to 100% in 5% increments, 20 repetitions were conducted to measure ADR and DSSR. Table 2 also lists the p-value of the t-test comparing the mean ADR at each space-saving NDKR against the mean ADR at 100% NDKR (no space saving). As can be seen, the mean ADR values at NDKR of 95%, 90%, 70%, 65% and 60% are not statistically different from the ADR at 100%, which uses all of the normal data. This means that using 95%, 90%, 70%, 65% or 60% of the normal data can reach the same ADR level as using 100% of it. Choosing any of these NDKR values saves storage cost while maintaining the same anomaly detection level.
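The significance column of Table 2 could be reproduced along the following lines. The table header refers to a paired t-test; because only the mean ADR at 100% NDKR serves as the baseline in the text, the sketch below assumes a one-sample t-test of each NDKR’s 20 ADR measurements against that baseline mean, and the exact test variant used by the authors remains an assumption.

```python
# Sketch of the per-NDKR significance test against the 100% NDKR baseline,
# assuming a one-sample t-test of each NDKR's repeated ADR measurements
# against the baseline mean (the exact test variant is an assumption).
import numpy as np
from scipy import stats

def compare_to_baseline(adr_by_ndkr, baseline_ndkr=1.0):
    """Return {NDKR: p-value} for the mean ADR versus the baseline mean ADR."""
    baseline_mean = float(np.mean(adr_by_ndkr[baseline_ndkr]))
    p_values = {}
    for ndkr, samples in adr_by_ndkr.items():
        if ndkr == baseline_ndkr:
            continue
        _, p_value = stats.ttest_1samp(samples, popmean=baseline_mean)
        p_values[ndkr] = p_value
    return p_values

# Example reading: NDKR values whose p-value exceeds 0.05 (60%, 65%, 70%,
# 90% and 95% in Table 2) are treated as statistically indistinguishable
# from the 100% NDKR baseline.
```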
Figure 5 visualizes the result with two curves: the blue curve shows the DSSR across NDKR values from 0% to 100%, and the orange curve shows the mean ADR with its 95% confidence intervals. Note that the ADR of 97.3% at 100% NDKR represents the traditional approach, which uses 100% of the normal data in the anomaly detection model. Figure 5 shows that the ADR obtained with the SECDG framework varies with the NDKR. When the NDKR is less than 65%, the ADR results cannot be used because the anomaly detection method loses detectability. Therefore, the ADR values of the edge-cloud approach can be compared with the traditional approach only if NDKR ≥ 65% (i.e. DSSR ≤ 30.3%).
Specifically, for the edge-cloud approach, when the NDKR is 95%, 90%, 70%, 65% and 60%, the associated ADR values are 96.1%, 96.6%, 96.8%, 96.3% and 93.7%, respectively, none of which differs significantly from the ADR at 100% NDKR (97.3%) at the 95% confidence level. This means that using 60% of the normal data yields anomaly detection performance no different from using 100% of the normal data, while saving 34.6% of the data storage cost.
The empirical analysis based on the real-world SPI data can be used to verify Propositions 1 to 3. As addressed above, the traditional data management approach generates more space waste because it keeps more normal data without achieving a better ADR. The more normal data are kept, the more computational time is needed; this extra normal data also creates computational time waste. Therefore, Propositions 1 and 2 are supported by the experimental findings. In addition, as shown in Table 2, the edge-cloud approach provides more than one option with different DSSR values whose ADR is statistically comparable to that of the traditional approach. This finding also supports Proposition 3, as the edge-cloud approach offers greater selectivity than the traditional approach.
5. Implications and conclusion
Theoretical implication
In the ambit of contemporary academic inquiry within management scholarship, the present study delineates a series of theoretical contributions that resonate profoundly across the domains of lean manufacturing, production efficiency and the integration of advanced computational technologies within operations management.
Initially, this investigation advances the discourse within the sphere of lean data manufacturing, as articulated by Stojanovic et al. (2016), through the pragmatic application of LDEP as proposed by Marttonen-Arola et al. (2020). The introduction of a nuanced data analysis framework underpins our thesis, demonstrating how the strategic management of data within lean manufacturing contexts can markedly enhance productivity and mitigate waste. This is achieved through the employment of a sophisticated anomaly detection methodology that enables the expeditious identification and remediation of production discrepancies, thereby exemplifying the seamless fusion of lean manufacturing principles with contemporary data management practices.
Furthermore, this research endeavors to address the enduring quandaries related to production deficiencies that have perennially beleaguered the scholarly community, as highlighted by Trojanowska et al. (2018). By embracing the latest conceptualizations of VA and NVA data distinctions (Eaton et al., 2023), our study proposes a paradigm shift in data management strategies, advocating for a discerning approach toward data retention that prioritizes value-additive information. This methodology fosters a more dynamic and responsive data management framework, thereby equipping organizations with the strategic agility required to rectify operational deficiencies efficaciously.
Finally, our contribution to the intersecting realms of cloud computing, operations management and data management (Xu, 2012) responds to the scholarly impetus for leveraging cutting-edge technological advancements to address traditional operational challenges (Wang et al., 2015). The application of an edge-cloud computing architecture for anomaly detection facilitates the real-time analysis and surveillance of manufacturing operations, endowing manufacturers with the capability to monitor production processes with unprecedented immediacy. This approach not only optimizes responsiveness to operational dynamics but also aligns with the imperative to mitigate big data waste (Corbett and Chen, 2015) by streamlining the management of data through the minimization of redundant collection, transmission, cleansing and storage processes.
In summation, the contributions of this study to the corpus of management scholarship are manifold, spanning theoretical enhancements and practical innovations. By interweaving the principles of lean manufacturing with advanced data management and computational technologies, this research not only enriches the academic discourse but also delineates a path forward for the pragmatic application of these insights in enhancing the efficiency and adaptability of manufacturing operations.
Managerial implication
This study offers pivotal managerial implications by empirically substantiating a prevalent dilemma within the domain of data management: the necessity – or lack thereof – of storing all generated data prior to analysis. The act of preserving extraneous data, as exemplified through the SPI process in the manufacturing of electronic devices, engenders several forms of waste. Anomaly detection methodologies reveal that the value derived from cloud-analyzed data can be significantly enhanced during the data collection and storage phases by retaining only the data essential to the system’s requirements. This entails maintaining a selective portion of “normal” data alongside identified anomaly data to achieve a cost-efficient analysis performance.
The edge-cloud approach proposed in this investigation facilitates a superior degree of selectivity, offering enhanced flexibility in determining the portion of normal data to be retained in the cloud while preserving optimal ADR. Experimental outcomes suggest that operational managers are empowered to make judicious decisions under varying resource and condition constraints, embodying the principles of lean data management within the edge-cloud framework advanced by this study. For instance, in scenarios characterized by limited computing or transmission resources, managerial decisions can be calibrated to assess the necessity of using the entirety of normal data with minimal resource expenditure.
The adoption of lean data management practices within the manufacturing sector proffers at least two distinct managerial advantages. First, by furnishing real-time data and analysis, lean data management aids managers in making well-informed decisions that catalyze continuous improvement and augment competitive advantage. Second, the emphasis on waste reduction aligns with the burgeoning global movement toward minimizing the consumption of scarce resources and energy through energy-efficient and carbon mitigation strategies, concurrently bolstering corporate social responsibility in relation to environmental sustainability. The imperative to mitigate data processing waste emerges as a salient research focus within the broader discourse on resource optimization, underscoring the strategic significance of lean data management in navigating the complexities of contemporary operational environments.
Limitations and future research direction
This research elucidates the efficacy of using algorithm-based anomaly detection within an edge-cloud computing framework as a strategy for optimizing data storage and minimizing various forms of waste. The experimental analysis further investigates the dynamics between NDKR, ADR and DSSR, revealing a negative correlation between NDKR and DSSR. Interestingly, the association between ADR and NDKR does not exhibit a uniformly positive correlation across different components, suggesting nuanced interplays that merit deeper examination.
Looking ahead, there exists a fertile ground for future scholarly endeavors to refine the methodologies and technological frameworks underpinning cloud data analysis. One promising avenue involves the adjustment of computational modules and data processing methodologies to enhance the DV rate while maintaining a low rate of normal data retention. The exploration of diverse algorithms promises not only to augment the computational efficacy within edge computing environments but also to facilitate more precise real-time anomaly detection and subsequent corrective measures, thereby fostering cost reductions and heightened operational efficiency.
Moreover, while this study primarily focuses on the application of the edge-cloud computing approach to SPI data in electronic device manufacturing, the transferability and applicability of these findings to other manufacturing contexts warrant rigorous investigation. Industries such as vehicle assembly and construction offer potential landscapes for validating the universality and adaptability of this approach. Such inquiries would not only expand the empirical evidence base but also contribute to a more comprehensive understanding of the scalability and generalizability of edge-cloud computing strategies across diverse manufacturing sectors.
6. Conclusion
This research contributes to theory in multiple ways. First, we extend knowledge in lean data management by introducing the concept of big data waste, which covers data collection excess, data transmission excess, data cleansing waste and data storage excess. Using the anomaly detection approach in an edge-cloud paradigm as a case study, our findings provide electronic device manufacturing practitioners with an actual example of improving production efficiency. Our empirical study based on real-world data shows that the cost of streaming data storage and the waste of computation and transmission in the cloud can be saved by detecting anomalous patterns from the original data at the edge, without sending too much “normal” data to the cloud. This finding contradicts the traditional approach, which keeps all of the normal data and thereby creates more waste in terms of data management. A prior investigation of the balance between retaining normal data and saving storage is therefore suggested for anomaly detection in manufacturing.
The authors appreciate the support from Garmin Ltd. They also thank the National Science and Technology Council, Taiwan, R.O.C. (Contract No. 109-2221-E-011-101) and the Intelligent Manufacturing Innovation Center (IMIC), National Taiwan University of Science and Technology, Taipei 10607, Taiwan, which is a Featured Areas Research Center in the Higher Education Sprout Project of the Ministry of Education (MOE), Taiwan (since 2023), for their financial support.
Figure 1.Data value chain adapted from Jarr (2012)
Figure 2.Examples of SPI data with height (top), area (middle) and volume (bottom) measurements
Figure 3.Examples of manufacturing data where red ink indicates the anomaly patterns while blue ink denotes the normal data
Figure 4.Comparison of anomaly detection between traditional and strategic edge-cloud data governance approach
Figure 5.Illustration of experimental results on ADR and DSSR across different NDKR values based on resistance testing data
Table 1.
Analogies of the seven essential source wastes of manufacturing in information management and big data (Corbett and Chen, 2015)
| Manufacturing waste | Information management waste | Big data waste |
|---|---|---|
| Overproduction | Flow excess | Flow excess; excess data collection: activities associated with collecting and creating data beyond value-added needs |
| Waiting | Flow demand | Flow demand; data congestion: queue in data processing due to insufficient processing capacity/capability |
| Transport | Unnecessary transfer | Data transmission excess: unnecessary transmission of data during collection, storage and processing |
| Extra processing | Failure demand | Failure demand; data cleansing waste: resources spent on cleansing and mining “dirty data” |
| Inventory | Excessive information | Data storage excess; data entropy: over time, the usability of data diminishes |
| Motion | Incompatibility | Data integration failure: extra processing and data integration efforts taken to accommodate inefficient layout, defects, reprocessing and excess data |
| Defects | Flawed flow | Flawed flow; flawed data waste: activities resulting from poor data quality |
Source: Table created by author
Table 2.
Experimental results on ADR and DSSR across different NDKR values based on resistance testing data
| Normal data keeping rate (NDKR) | Data storage saving rate (DSSR) | Mean anomaly detection rate (ADR) | p-value (paired t-test against mean ADR at 100% NDKR) |
|---|---|---|---|
| 0% | 85.9% | 76.3% | <0.0001 |
| 5% | 81.6% | 72.5% | <0.0001 |
| 10% | 77.3% | 75.2% | <0.0001 |
| 15% | 73.1% | 74.9% | <0.0001 |
| 20% | 68.8% | 73.1% | <0.0001 |
| 25% | 64.5% | 71.9% | <0.0001 |
| 30% | 60.2% | 74.6% | <0.0001 |
| 35% | 56.0% | 73.9% | <0.0001 |
| 40% | 51.7% | 85.4% | <0.0001 |
| 45% | 47.4% | 89.6% | 0.0028 |
| 50% | 43.2% | 89.7% | 0.0010 |
| 55% | 38.9% | 86.1% | 0.0001 |
| 60% | 34.6% | 93.7% | 0.0564 |
| 65% | 30.3% | 96.3% | 0.2592 |
| 70% | 26.1% | 96.8% | 0.2727 |
| 75% | 21.8% | 94.5% | 0.0208 |
| 80% | 17.5% | 94.2% | 0.0141 |
| 85% | 13.3% | 92.9% | 0.0071 |
| 90% | 9.0% | 95.6% | 0.0626 |
| 95% | 4.7% | 96.1% | 0.1194 |
| 100% | 0.0% | 97.3% | – |
Source: Table created by author
