Abstract: The large number of sensors in the field and the usage of IoT-based equipment generate big collections of data, some of them being useful to address maintenance policies, replacement requirements, calibration, research on new data analysis models, etc. Both industrial and social systems are increasing in complexity due to new technologies applied to information processing. Recent developments in embedding and integration have offered new opportunities to collect, filter, analyse, and interpret the huge collections of data, named "Big Data", generated by large populations of sensors, special devices, or people. This work focuses on conceptual issues and methodological aspects related to data registration, filtering, smoothing, and analysis in order to predict important indicators of the quality of life. The field of systems reliability engineering is revisited taking into account both the data sources and the new methodologies used for reliability data. The software reliability of applications for smart cities is also addressed. The following frameworks are considered: Systems of Systems (SoS), Big Data, System Operating/Environmental (SOE) data, and smart cities reliability. SoS reliability engineering depends on the SoS nature: virtual (based on resource sharing), collaborative (based on agreements), acknowledged (based on collaborative management through a well-defined interface), and directed (based on centralized management). SoS reliability is estimated differently depending on the specific architecture and the particular reliability requirements. When reliability is considered in the Big Data context, both technologies are relevant: batch processing (analytics on "data at rest") and stream processing (analytics on "data in motion"). The adequacy of existing reliability models to Big Data reliability concerns, taking into account the "curse of dimensionality", is considered in the last section.
Keywords: Systems of Systems; IoT; Big Data; Smart Cities; Reliability Data; Reliability Models.
I. INTRODUCTION
The spread of IoT-based equipment generates new opportunities but also new situations to be managed during equipment exploitation. Moreover, many IoT systems are embedded structures, using both hardware and software components. Complex applications, like smart cities, grids, meteorological monitoring, pollution monitoring, etc., need a new view on testing and evaluation before deployment in the field. This is also critical for Systems of Systems, mainly used as defence equipment.
Not only the large volume of data collected from sensors and various instruments, but also the requirements for filtering, analysing, and making decisions in real time, call for further investigation of methodological aspects and of appropriate models to evaluate the dependability of the new software applications.
This paper investigates some reliability aspects in the Big Data context, with a close view on SoS, S°C, SOC, and SOE data, working towards a specific architecture for monitoring the reliability of distributed systems.
II. THE WORLD OF BIG DATA
The study described in [2], on developing a modern course on new data challenges, considers Big Data one of the most challenging issues; not only the three Vs model and the technological approaches are considered, but also newer extensions.
Recently, the world of Big Data was upgraded to the five Vs [11]: Volume, Velocity, Variety, Variability, and Value. "Volume" refers to the large size of the data collections generated by a large number of actors: people (in various societal activities), sensors, and smart devices used in applications that produce real-time data streams. The last category, also called the Internet of Things (IoT), is able to generate huge volumes of data. "Velocity" can be seen both as the speed of data generation and as the minimum time needed to produce the best decision; for industrial applications, real-time results should be provided. Due to the complexity of the problems to be solved, for instance in the case of environment monitoring, not only the volume or the velocity is important, but also the "variety" of data coming from different sources. "Variability" emphasizes the context of data registration and the data filtering methods used before "mining" the data. Finally, reliable data collected from the field have "value", being useful to derive appropriate decisions and to learn about the process behavior. The report [20] mentions a further V, for veracity: "This refers to data accuracy as well as source reliability, the context out of which the data comes, the methods for sorting and storing information, and a range of factors that can influence the data's validity". Big Data technologies can be of the batch processing type (analytics on "data at rest") or of the stream processing type (analytics on "data in motion") [4].
In data analysis we are interested both in the size of the "sample" and in the dimension of the data (quantitative, categorical, univariate, multivariate). Sometimes, the second aspect is referred to as complexity (C). In the multivariate case, a Big Data challenge is generated by the "curse of dimensionality" [3], which states that the performance of an algorithm that behaves well in low dimensions deteriorates when the dimension of the input space increases. Another aspect outlines the difficulty of modelling non-linear data.
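To make the distance concentration behind the "curse of dimensionality" concrete, the following minimal sketch (in Python with NumPy, our own illustration with an arbitrarily chosen sample size) shows that the ratio between the nearest and the farthest neighbour distance approaches 1 as the dimension grows, which degrades any distance-based algorithm:

```python
# Minimal sketch of distance concentration in high dimensions:
# the contrast between nearest and farthest neighbours vanishes.
import numpy as np

rng = np.random.default_rng(0)
n = 1000  # sample size, a hypothetical choice for illustration

for d in (2, 10, 100, 1000):
    x = rng.uniform(size=(n, d))          # n points in the unit hypercube
    q = rng.uniform(size=d)               # a random query point
    dist = np.linalg.norm(x - q, axis=1)  # Euclidean distances to the query
    # As d grows, min/max approaches 1: all points look equally far away.
    print(f"d={d:5d}  min/max distance ratio = {dist.min() / dist.max():.3f}")
```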
Characteristics of maintenance data in the Big Data context are identified by Zhang in [24] based on the 3Vs model: Variety (structured data, semi-structured data, unstructured data), Velocity (transactional data, multidimensional data, stream data), and Volume (directly related data, indirectly related data). Moreover, the study [1] adds new attributes to be considered when dealing with Big Data: privacy (P) and usability (U). Hence, the most recent Big Data model becomes 5V+CPU.
The Big Data tower can be strengthened if the following challenges are well supported: (1) Big Data representation models: supporting "not only SQL" data is compulsory; (2) Big Data confidentiality assurance: data breaches [25] are reported more and more in our era; (3) data life cycle management: with the increasing rate of IoT data generation, it is important to filter the data to be stored for analysis; (4) redundancy reduction and data reliability assurance: register only "centroid data", or outlier-free samples.
Following [2], the Big Data analysis pipeline includes the following five major steps: (1) acquisition and recording; (2) extraction, cleaning, and annotation; (3) integration, aggregation, and representation; (4) analysis and modelling, and (5) interpretation.
The first step relies on a Big Data generation module supporting large database management and large-scale data acquisition. The Big Data acquisition pipeline follows steps like: data collection (from log files, sensors, mobile items, vehicular items, etc.), data transmission (depending on the communication infrastructure, protocols, data protection assurance, etc.), and data pre-processing (outlier identification, noise elimination, data fitting and smoothing, etc.). The second step depends on methods for identifying patterns in a large collection of data and labeling such patterns for future use in learning-from-data and prediction oriented tasks. The third step asks for a unique representation of data coming from various and heterogeneous sources; the current solution is based on RDF (Resource Description Framework [26]). The fourth step is a milestone: various models should be checked and tested for validity. Learning from data is a difficult, but very interesting, task, and many tools can be used to interactively assist the researcher in discovering patterns in data. Finally, the researcher has to interpret the findings. The experience of the researchers will either prove the advantage of the selected pathway or declare the results "unsuccessful" research.
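As an illustration of the pre-processing phase of the first step (outlier identification, noise elimination, smoothing), the sketch below applies a rolling-median rule to a sensor stream; pandas is our tool choice, and the window size and deviation threshold are illustrative assumptions, not values prescribed by the pipeline in [2]:

```python
# Minimal pre-processing sketch: flag outliers against a rolling median,
# replace them by interpolation, then smooth the cleaned stream.
import pandas as pd

def preprocess(readings: pd.Series, window: int = 11, k: float = 3.0) -> pd.Series:
    med = readings.rolling(window, center=True, min_periods=1).median()
    resid = readings - med
    # Robust local spread estimate (median absolute deviation of residuals).
    mad = resid.abs().rolling(window, center=True, min_periods=1).median()
    # Points far from the local median, relative to local spread, are outliers.
    outliers = resid.abs() > k * (1.4826 * mad + 1e-9)
    cleaned = readings.mask(outliers).interpolate()  # drop outliers, fill gaps
    return cleaned.rolling(window, center=True, min_periods=1).mean()  # smooth
```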
III. MAINTENANCE AND RELIABILITY SOLUTIONS IN BIG DATA CONTEXT
According to [21], the sources of big data are: Internet Data (represents all digital data hosted on the Internet, including user-generated data from social media platforms), Industrial and Sensor Data (data generated by machines, mobiles, GPS data, and IoT data), Enterprise Data (all business data, including customer, inventory, and transactional data), and Public Data (generated and collected by government agencies, educational institutions and all non-government organizations).
In the following we are interested in data registration from sources such as machines, sensors, and IoT, which address reliability and maintainability.
The most recent solution is IBM PMO, "IBM Predictive Maintenance and Optimization" [23]. It collects data from machines and analyzes it to learn about failures and to predict equipment failure. A detailed view into equipment performance is generated, to be used for optimizing maintenance efforts.
IBM PMO allows the reliability engineer to identify and manage the risks of failure or of a halt in operations, through six functionalities:
1) A health score for every source of data is calculated according to specific models, and the future life is predicted;
2) The assets and processes are monitored in real-time;
3) The asset failures and quality issues are detected earlier;
4) By mining procedures the cause of failure can be identified;
5) A recommendation plan on maintenance operations is generated by an optimization module;
6) Custom solutions can be generated depending on specific maintenance particular procedures.
In order to learn from reliability big data, various artificial intelligence methodologies can be used: data clustering [6], context-based analytics [18], statistical inference [13, 15], and deep learning [20].
The most challenging aspect is connected to the new concepts: SoS - System of Systems, SOC - Self Organized Criticality, and SOE (System Operating/Environmental) data.
Moving from Systems of Components (S°C) to Systems of Systems (SoS) means that engineering systems, previously mastered by a "divide and conquer" approach, can now be regarded as larger cooperating systems, more similar to natural systems [21]; more emphasis is therefore necessary on understanding "complex events" under imprecise and/or uncertain information. According to [14], the distinctive factor between a system and an SoS is "understanding the aspect of the environment or otherwise stated the differences between a system or a group of systems that constitute the SoS". High levels of safety, reliability, maintainability, and dependability will be required for any SoS, mainly for those designed for critical missions [8]. The application of test and evaluation (T&E) to an SoS is highly influenced by the complexity of the SoS.
The SoS reliability engineering depends on its nature:
* virtual - based on resource sharing,
* collaborative - based on agreements,
* acknowledged - based on collaborative management through a well defined interface, and
* directed SoS - based on centralized management.
The SoS reliability is estimated differently depending on the specific architecture and the particular reliability requirements. A detailed analysis of T&E for acknowledged SoS is given in [8]. Based on the systems engineering (SE) processes planned for SoS, Dahmann et al. [8] found that a framework for T&E should consider evidence-based approaches and continuous assessment of the SoS, and should learn about SoS performance by extending T&E "to include continual feedback processes".
A strong impact on SoS reliability is generated by the software systems developed for the SoS [9]. The most difficult task is to integrate software components from different vendors to fulfil a unique objective of the SoS.
SOC, a well-known concept in physics, has recently been considered in reliability engineering [21]. SOC combines self-organization and criticality to describe complexity, and denotes a state of the system formed by self-organization, over a long transient period, at the border of stability and chaos. It has also been identified in neuroscience [10].
The SOE concept is used in [15] to describe data collected from sensors or smart chips installed in a product or equipment to measure different variables, like environmental parameters, the usage rate, the system load, etc. Big SOE data are generated by: transportation engines (locomotives, aircraft, automobiles), power distribution transformers, wind energy devices, solar energy devices, and medical systems (computed tomography scanners, pressure sensors in infusion pumps or sleep apnea machines, airflow sensors in anesthesia delivery systems, etc.).
Sometimes it is useful to follow a fault-tolerant approach in order to maintain the integrity of the data sequence over an interval of time. For instance, the sensors installed at a meteorological point can be organized in groups of three for every parameter to be measured; at least two values should be available to report a "centroid" value to the prediction model, and the faulty sensor should be replaced. This example illustrates that some individual systems (components) can have low operational reliability, together with a mechanism to rebuild the "initial" state when recovery from failure is activated by replacement or repair, as sketched below.
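A minimal sketch of this two-out-of-three voting scheme (our own illustration; the deviation tolerance is an assumed value, and in practice it would depend on the measured parameter):

```python
# Two-out-of-three sensor voting: report the median ("centroid") of the
# available readings and flag deviating or silent sensors for replacement.
import statistics
from typing import Optional

def vote(readings: list[Optional[float]], tol: float = 0.5):
    """Return (centroid, faulty_indices) for a group of 3 sensors."""
    valid = [(i, r) for i, r in enumerate(readings) if r is not None]
    if len(valid) < 2:
        return None, list(range(len(readings)))  # fewer than 2 values: group unusable
    centroid = statistics.median(r for _, r in valid)
    faulty = [i for i, r in valid if abs(r - centroid) > tol]
    faulty += [i for i, r in enumerate(readings) if r is None]
    return centroid, sorted(faulty)

# Example: sensor 2 is faulty; the group still reports a usable value.
print(vote([20.1, 20.3, 3.8]))  # -> (20.1, [2])
```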
From a computational point of view, SOE data are vectors of time series [15] and should be analyzed by specific methods and well-suited tools.
An important concept, suitable for analyzing SOE data, is "dynamic covariate information". In this respect, for every item not only the failure time is registered, but also its history and the current environment variables, as a tuple (t, f, Ht, E), with t in some interval. The registration may use the fixed or the variable clock model. When the fixed clock model is used, at the mentioned time t a binary variable f gives the information on the existence of a failure (1) or the absence of a failure (0). Analyzing data collected from IoT with a huge history (Ht) will increase the processing time, which matters mainly when real-time analysis is required. Depending on the application, the most relevant historical aspects will be selected and used during the decision making process.
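A minimal data structure for such a record, written as a Python dataclass, may look as follows; the field names are our own, and only the tuple layout (t, f, Ht, E) follows the description above:

```python
# Sketch of a dynamic covariate record under the fixed clock model.
from dataclasses import dataclass, field

@dataclass
class CovariateRecord:
    t: float                  # registration time (fixed clock tick)
    f: int                    # 1 = failure observed at t, 0 = no failure
    history: list[float] = field(default_factory=list)   # H_t: past covariate values
    environment: dict[str, float] = field(default_factory=dict)  # E: current variables

    def relevant_history(self, depth: int) -> list[float]:
        # Keep only the most recent entries to bound real-time processing cost.
        return self.history[-depth:]
```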
Such an approach is given in [6], for a large number of assets operating in the field. For every asset a, with a in {1, 2, ..., A} and A the number of assets, the following data are available: the values of K control variables arranged into a vector Xa = (va[1], va[2], ..., va[K]). Every asset a is observed over time, at moments M[0], M[1], ..., M[Na], and the Na observations of its behavior are registered along an information vector F[0], F[1], ..., F[Na], telling us that M[i] is a failure time if F[i] = 1; if F[i] = 0, the asset is functioning well. If there are many types of assets under monitoring, a partitioning into a number of classes is run first; then, the reliability analysis can be applied to every class.
To support experiments, the Weibull distribution function can be used (or, in some cases, the exponential distribution). The case of rare events is considered by Chen [7] for power systems. When studying systems with rare reliability problems, one needs simulation models for the probability distribution function of rare events [5].
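As a hedged illustration of the Weibull-based analysis (SciPy is our tool choice, not one prescribed by [6], and the failure times are hypothetical), the shape and scale parameters can be estimated from observed failure times within a class and then used to evaluate the reliability function R(t) = exp(-(t/η)^β):

```python
# Fit a two-parameter Weibull model to failure times and evaluate R(t).
import numpy as np
from scipy import stats

failure_times = np.array([412., 608., 795., 941., 1130., 1420., 1783.])  # hypothetical data

# Fix the location at 0 so only shape (beta) and scale (eta) are estimated.
shape, loc, scale = stats.weibull_min.fit(failure_times, floc=0)
print(f"shape (beta) = {shape:.2f}, scale (eta) = {scale:.1f}")

t = 1000.0
print("R(1000 h) =", np.exp(-(t / scale) ** shape))  # reliability at t = 1000 h
```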
IV. SOFTWARE RELIABILITY IN THE NEW CONTEXT
The measurement/prediction of the degradation of an item is based on the covariate information collected in Ht. Many models for observed degradation, depending on the specific field, can be proposed and used to understand the future behavior of the items under discussion: additive models (based on the dynamic covariate contribution, the random variability, and the error term), linear models, exponential models, etc.
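One plausible written form of such an additive model, given here only as a sketch consistent with the enumeration above (the notation is ours, not taken from [15]), is:

```latex
% Sketch of an additive degradation path model: observed degradation of
% item i at time t combines a covariate-driven trend, a random item
% effect, and a measurement error term.
\[
  D_i(t) \;=\; \beta_0 \;+\; \int_0^{t} \beta_1\, x_i(s)\, \mathrm{d}s \;+\; u_i \;+\; \varepsilon_i(t),
\]
% where $x_i(s)$ is the dynamic covariate (e.g., use rate),
% $u_i \sim N(0,\sigma_u^2)$ captures unit-to-unit random variability, and
% $\varepsilon_i(t) \sim N(0,\sigma_\varepsilon^2)$ is the error term.
```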
Following past experience, an update of the already known algorithms for reliability growth and for optimal maintenance planning should be considered.
Designing software in the Big Data context asks for the management of distributed sources of data and for reliable communication links in the network, in order to offer services with high availability and dependability [12]. Sources of data are organized as databases. This is important because, if all databases have the same structure, we can address the reliability of the database by already known approaches; if the software works with multiple structures, it should be tested and analyzed for every type of structure. Database reliability is assured by means of local recovery managers that maintain the atomicity and durability of local transactions at each node of the data network. Out-of-place and in-place strategies can be used by the local recovery managers when updating the local database.
The reliability of the distributed sources of data depends on the protocols used for transaction commitment and for data replication. After a failure, the inconsistencies should be detected and resolved. Some protocols are quorum based [17], but a blockchain approach [27] (distributed ledger technology) can also be used to register all transactions carried out by multiple players over the network.
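The intersection requirement at the heart of quorum-based protocols can be stated in a few lines. The sketch below is our own simplified illustration in the read/write replication form, which differs in detail from the commit quorums of [17]: with N replicas, choosing quorum sizes R and W such that R + W > N guarantees that every read quorum overlaps every write quorum, so a read always observes the latest committed value.

```python
# Quorum intersection check for an N-replica data store.
def quorums_consistent(n: int, r: int, w: int) -> bool:
    overlap_rw = r + w > n   # every read quorum intersects every write quorum
    overlap_ww = 2 * w > n   # any two write quorums intersect (no conflicting commits)
    return overlap_rw and overlap_ww

print(quorums_consistent(n=5, r=3, w=3))  # True: reads see the latest write
print(quorums_consistent(n=5, r=2, w=3))  # False: a read quorum may miss the latest write
```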
Starting from [16], an updated architecture can be developed to deal with the reliability of distributed systems in the Big Data context. The following modules assure the reliability monitoring of Big Data based distributed systems: 1) M1 - reliability big data acquisition, recording, and preprocessing; 2) M2 - data analytics, including smart clustering; 3) M3 - reliability prediction; 4) M4 - adaptive algorithms for fault management.
Data collection is assured by M1, which is responsible for outlier detection and data cleansing; the automatic detection of outliers in time series is based on [28]. The module M2 is a data-mining component responsible for classification, pattern discovery, and clustering. Software reliability growth algorithms are used to estimate the reliability characteristics in the framework of M3. Based on the obtained results and specific procedures, various plans are offered by M4. For SoS T&E, the fourth module provides testing plans and associated tests generated by field-based algorithms or genetic algorithms. A skeleton of how the four modules could be chained is sketched below.
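The following skeleton is our own minimal illustration, not the architecture of [16]; each class only names the responsibility the text assigns to its module, with placeholder logic standing in for the real algorithms:

```python
# Minimal M1-M4 pipeline skeleton for reliability monitoring.
class M1Acquisition:
    def run(self, raw):
        # acquisition, recording, outlier detection, and data cleansing
        return [x for x in raw if x is not None]

class M2Analytics:
    def run(self, data):
        # classification, pattern discovery, smart clustering
        return {"cluster-0": data}

class M3Prediction:
    def run(self, clusters):
        # software reliability growth estimation per cluster (placeholder values)
        return {name: 0.99 for name in clusters}

class M4FaultManagement:
    def run(self, predictions):
        # adaptive maintenance / test plans based on predicted reliability
        return [f"plan for {name}" for name in predictions]

pipeline = [M1Acquisition(), M2Analytics(), M3Prediction(), M4FaultManagement()]
payload = [0.1, None, 0.3]  # raw readings with a missing value
for module in pipeline:
    payload = module.run(payload)
print(payload)  # -> ['plan for cluster-0']
```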
This architecture considers the "curse of dimensionality" an important challenge. Not only is the volume of data large, but the number of parameters to be estimated as explanatory variables is also increasing. Therefore, many algorithms used for pre-processing, data analytics, reliability prediction, and maintenance planning should be redesigned to support the distributed computing paradigm.
The proposed architecture is useful for monitoring the reliability of S°C, SOC, SoS, and SOE settings, of mechatronic projects [29, 30], and of software applications supporting the "life" in smart cities.
V. CONCLUSIONS
The new complexity, generated both by the size of the data collections from the field (SOE) and by the integration of systems into SoS to assure the fulfilment of a unique objective with respect to the management of SOC, generates new challenges when considering the reliability, availability, and dependability of a complex system, including those based on large collections of sensors or devices generating Big Data collections.
The paper has considered both the world of Big Data and the world of systems from the reliability point of view. Future investigation will be dedicated to the adequacy of existing reliability models to Big Data reliability concerns, taking into account the "curse of dimensionality".
Acknowledgements
The second author acknowledges the support of the Scientific Research Center in Mathematics and Computer Science of "Spiru Haret" University in the framework of the project "Recent computing methodologies".
References
[1] Agrawal, D., Bernstein, P., Bertino, E., Davidson, S., Dayal, U., Franklin, M., Widom, J., 2012. Challenges and Opportunities with Big Data: A white paper prepared for the Computing Community Consortium committee of the Computing Research Association, http://cra.org/ccc/resources/ccc-led-whitepapers/
[2] Albeanu, G., 2017. Building an Undergraduate Course in Data-Driven Methodologies, The International Scientific Conference eLearning and Software for Education; Bucharest, Vol. 3, pp. 62-67.
[3] Bellman, R.E., 1957. Dynamic Programming, Princeton University Press
[4] Big Data Analytics for Security Intelligence, Cloud Security Alliance, https://downloads.cloudsecurityalliance.org/initiatives/bdwg/Big_Data_Analytics_for_Security_Intelligence.pdf (2013)
[5] Blanchet, J., Lam, H., 2011. Rare Event Simulation Techniques, In Proceedings of the 2011 Winter Simulation Conference (S. Jain, R. R. Creasey, J. Himmelspach, K. P. White, and M. Fu, eds.), pp. 146-160
[6] Cannarile, F., Compare, M., Di Maio, F., Zio, E., 2015. Handling reliability big data: a similarity-based approach for clustering a large fleet of assets, In: Podofillini, L., Sudret, B., Stojadinovic, B., Zio, E., Kröger, W. (eds) ESREL 2015, Safety and Reliability of Complex Engineered Systems, pp. 891-896
[7] Chen, Q., 2004. The probability, identification, and prevention of rare events in power systems, http://lib.dr.iastate.edu/rtd/1149/
[8] Dahmann, J., Rebovich, G., Lane, J.A., Lowry, R., Palmer, J., 2010. Systems of Systems Test and Evaluation Challenges, 5th IEEE International Conference on System of Systems Engineering, DOI: 10.1109/SYSOSE.2010.5543979.
[9] Goodenough, J.B., 2010. Evaluating Software's Impact on System and System of Systems Reliability, https://www.sei.cmu.edu/library/assets/SW%20impact%20on %20 system%20reliability.pdf, SEI
[10] Hesse, J., Gross, T., 2014. Self-organized criticality as a fundamental property of neural systems, DOI: 10.3389/fnsys.2014.00166
[11] Jain, A., 2016. The 5 Vs of Big Data, https://www.ibm.com/blogs/watson-health/the-5-vs-of-big-data/
[12] Kumar, V. D., 2017. Software Engineering for Big Data Systems, PhD thesis, University of Waterloo.
[13] Letot, Ch., Dehombreux, P., 2009. Degradation models for reliability estimation and mean residual lifetime, Proceedings of the 8th National Congress on Theoretical and Applied Mechanics, pp. 618-625
[14] Lubas, D.G., 2017. Department of defence system of systems reliability challenges, RAMS, DOI: 10.1109/RAM.2017.7889676
[15] Meeker, W.Q., Hong, Y., 2013. Reliability Meets Big Data: Opportunities and Challenges, http://lib.dr.iastate.edu/stat_las_preprints/82/, Iowa State University.
[16] Popentiu, Fl., Sens, P., 1999. A Software Architecture for Monitoring the Reliability in Distributed Systems. European Safety and Reliability Conference (ESREL '99), Munchen.
[17] Skeen, D., 1982. A quorum-based commit protocol, TR 82-483, Cornell University, https://ecommons.cornell.edu/bitstream/handle/1813/6323/82-483.pdf?sequence=1.
[18] Sokol, L., Chan, S., 2013. Context-Based Analytics in a Big Data World: Better Decisions, IBM Redbooks, pp. 1-8
[19] Statistical Package for Reliability Data Analysis (SPREDA), https://cran.r-project.org/web/packages/SPREDA/SPREDA.pdf (2015)
[20] Tamura, Y., Matsumoto, M., Yamada, S., 2016. Software Reliability Model Selection Based on Deep Learning, DOI: 10.1109/ICIMSA.2016.7504034
[21] The Future of Data-Driven Innovation, US Chamber of Commerce Foundation, https://www.uschamberfoundation.org/sites/default/files/Data%20Report%20Final%2010.23.pdf (2014)
[22] Volovoi, V., 2016. Big Data for Reliability Engineering. Threat and Opportunity, Reliability, February, pp. 11-15, http://volovoi.com/pubs/bdre16.pdf
[23] Watson IoT - IBM Predictive Maintenance and Optimization, https://www-01.ibm.com/common/ssi/cgi-bin/ssialias?htmlfid=WWS12362USEN& (2017)
[24] Zhang, L., 2016. Big Data Analytics for Fault Detection and its Application in Maintenance, PhD Thesis, Luleå University of Technology.
[25] ···, Data breaches (2017), https://www.identityforce.com/blog/2017-data-breaches.
[26] ···, Resource Description Framework (RDF): Model and Syntax.
[27] ···, Distributed Ledger Technology: Implications of Blockchain for the Securities Industry, Report from FIRA, https://www.finra.org/sites/default/files/FINRA_Blockchain_Report.pdf.
[28] ···, Automatic detection of outliers in time series, https://cran.r-project.org/web/packages/tsoutliers/tsoutliers.pdf.
[29] Tarca, R., Csokmai, L., Vesselenyi, T., Tarca, I., Vladicescu, F.P., 2008. Augmented Reality Used to Control a Robot System via Internet, International Joint Conference on Computer, Information, Systems Sciences and Engineering, Bridgeport, pp. 539-544.
[30] Albeanu, G., Tarca, R.C., Popentiu-Vladicescu, Fl., Pasc, I., 2010. Interoperability Assurance for Remote Mechatronic Laboratories Used for Virtual Training, 6th International Scientific Conference eLearning and Software for Education, pp. 249-256.