1. Introduction
Process monitoring refers to various methods used for the detection, diagnosis, and prognosis of faults in industrial plants [1,2]. In the literature, the term “fault” has been defined as any unpermitted deviation of at least one process parameter or variable in the plant [3]. Although controls are already in place to compensate for process upsets and disturbances, process faults can still occur [1]. These faults include sensor faults (e.g., measurement bias), actuator faults (e.g., valve stiction), fouling, loss of material, drifting reaction kinetics, pipe blockages, etc. Fault detection, diagnosis, and prognosis methods aim to, respectively, determine the presence, identify the cause, and predict the future behavior of these process anomalies [2,4]. Thus, process monitoring is a key layer of safety for maintaining efficient and reliable operation of industrial plants [5].
In general, process monitoring can be performed using a physics-driven, knowledge-driven, or data-driven approach (see Figure 1) [1,6]. Among these, the data-driven approach may be preferred for the following reasons. Physics-driven methods rely on a first-principles model of the system, i.e., mass-and-energy balances and physical/chemical principles, which is used to check how well the theory agrees with the observed plant data. However, these models are difficult to construct given the complexity of modern industrial plants [6]. Similarly, knowledge-driven methods rely on expert knowledge and the experience of plant operators to judge process conditions, but a comprehensive knowledge base may be too time-consuming to accumulate and codify precisely [6]. In contrast, data-driven methods rely only on plant data, from which statistical models can be built to distinguish normal from faulty conditions. Nowadays, plant data sets are generated in abundance [7]. Samples are collected from online sensors on hundreds to thousands of process variables every few seconds [8] via Supervisory Control and Data Acquisition (SCADA) systems. Many researchers have long recognized the opportunity to exploit these data sets for process monitoring, and this led to the development of Multivariate Statistical Process Monitoring (MSPM) methods. Data-driven methods and MSPM provide the context for this review paper. However, in the larger context, process monitoring researchers must still aim for the right synergy between physics-, knowledge-, and data-driven technologies.
The popularity of data-driven MSPM methods has increased in the past few decades, especially with the advent of the Industry 4.0 era. Applications of machine learning [9,10,11], Big Data [12,13], artificial intelligence (AI) [14], and process data analytics [15,16] to the process systems engineering (PSE) field are now gaining acceptance. Deep neural nets, support vector machines, fuzzy systems, principal components analysis, k-nearest neighbors, K-means clustering, etc., are now being deployed to analyze plant data, generate useful information, and translate results into key operational decisions. For instance, Patwardhan et al. [17] recently reported real-world applications of these methods for predictive maintenance, alarm analytics, image analytics, and control performance monitoring, among others. Applications of the MSPM methods to an industrial-scale multiphase flow facility at Cranfield University have also been reported in [18,19]. New methods are still being developed within the machine learning and AI communities, and so are their applications in PSE. This means that it may be difficult to select which data-driven methods to use. Nevertheless, chemical engineers can apply their domain expertise to match the right solutions to the right engineering problems.
Despite the benefits of data-driven techniques, it is still challenging to use them for process monitoring due to many issues that arise in practice. One key issue that is highlighted in this paper is the fact that real-world systems are nonlinear [20]. More precisely, the relationships between the process variables are nonlinear. For example, pressure drop and flow rate have a squared relationship according to Bernoulli’s equation, the outlet stream temperature and composition in a chemical reactor are nonlinearly related due to complex reaction kinetics, and so on. These patterns must be learned and taken into account in the statistical models. If the analysis of data involves linear methods alone, fault detection may be inaccurate, yielding many false alarms and missed alarms. Note, however, that linear methods can still be applied provided that the plant conditions are kept sufficiently close to a single operating point. This is because a first-degree (linear) Taylor series approximation of the variable relationships can be assumed close to a fixed point. Linear methods are attractive because they rely only on simple linear algebra and matrix theory, which are elegant and computationally accessible. However, if the plant is operating at a wide range of conditions, the resulting nonlinear dynamic behavior must be addressed with more advanced techniques.
Kernel methods or kernel machines are a class of machine learning methods that can be used to handle the nonlinear issue. The main idea behind kernel methods is to pre-process the data by projecting them onto higher-dimensional spaces where linear methods are more likely to be applicable [21]. Thus, kernel methods can discover nonlinear patterns from the data while retaining the computational elegance of matrix algebra [22]. In the process monitoring context, kernel learning is mostly used in the feature extraction step of the analysis of plant data. In this paper, we review the applications of kernel methods for feature extraction in nonlinear process monitoring.
In detail, the objectives of this review are: (1) To motivate the use of kernel methods for process monitoring; (2) To identify the issues regarding the use of kernel methods to perform feature extraction for nonlinear process monitoring; (3) To review the literature on how these issues were addressed by researchers; and (4) To suggest future research directions on kernel-based process monitoring. This work is mainly dedicated to a review of kernel-based process monitoring methods, which, to the best of the authors’ knowledge, has not appeared before. Other related reviews that may be of interest to the reader are also available, as listed in Table 1, along with their relationship to this paper.
This review paper is timely for two reasons. First, kernel principal components analysis (KPCA), the first kernel feature learner, was proposed by Bernhard Schölkopf, together with Alexander Smola and Klaus-Robert Müller, in a 1998 paper [22]. KPCA paved the way for further kernel extensions of linear machines, known today as kernel methods. For his contributions, Schölkopf was awarded the Körber Prize in September 2019, which is “the scientific distinction with the highest prize money in Germany” [23]. This recognition highlights the impact kernel methods have made on the field of data analytics, and the purpose of this paper is to showcase this impact in the process monitoring field. Second, Lee et al. [24] were the first to use KPCA for nonlinear process monitoring in 2004. Hence, this paper is timely as it reviews the development of kernel-based process monitoring research over the 15 years since that first application.
This paper is organized as follows. In Section 2, we first motivate the use of kernel methods and situate them among other machine learning tools. Section 3 provides the methodology on how the literature review was conducted, and also includes a brief summary of review results. The main body of this paper is Section 4, where we detail the issues surrounding the use of kernel methods in practice, and the many ways researchers have addressed them through the years. A future outlook on this area of research is given in Section 5. Finally, the paper is concluded in Section 6.
2. Motivation for Using Kernel Methods
To motivate the use of kernel methods, we first discuss how a typical data-driven fault detection framework works (see Figure 2). A plant data set for model training usually consists of N samples of M variables collected at normal operating conditions. This data is normalized so that the analysis is unbiased to any one variable, i.e., all variables are treated equally. Firstly, the data set undergoes a feature extraction step. We refer to feature extraction as any method of transforming the data in order to reveal a reduced set of mutually independent signals, called features, that are most sensitive to process faults. In Figure 2, this step is carried out by multiplying a projection matrix of weight vectors, $\mathbf{W}$, to a vector of samples, $\mathbf{x}_k$, at the kth instant. Secondly, a statistical index is built from the features, which serves as a health indicator of the process. The most commonly used index is Hotelling’s $T^2$, which is computed as shown in the figure as well. Finally, the actual anomaly detector is trained by analyzing the distribution of $T^2$. In this step, the aim is to find an upper bound or threshold on the normal $T^2$ values, called the upper control limit or UCL. This threshold is based on a user-defined confidence level, e.g., 95%, which represents the fraction of the area under the distribution of $T^2$ that is below the UCL. During the online phase, an alarm is triggered whenever the computed $T^2$ exceeds the fixed UCL, signifying the presence of a fault.
When a fault is detected, fault diagnosis is usually achieved by identifying the variables with the largest contributions to the value of $T^2$ at that instant. Lastly, fault prognosis can be performed by predicting the future evolution of the faulty variables or the $T^2$ index itself.
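To make this framework concrete, the following minimal sketch (using NumPy; the function names and the empirical-quantile UCL are illustrative choices, not the exact procedure of Figure 2) trains a PCA-based $T^2$ detector on normal data and raises an alarm when a new sample exceeds the UCL:

```python
import numpy as np

def train_pca_t2(X, n_components, confidence=0.95):
    """Train a PCA-based T^2 detector on normal operating data X (N x M)."""
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    Xn = (X - mu) / sigma                           # normalize each variable
    S = np.cov(Xn, rowvar=False)                    # sample covariance matrix
    eigval, eigvec = np.linalg.eigh(S)              # eigenvalues in ascending order
    idx = np.argsort(eigval)[::-1][:n_components]
    W, lam = eigvec[:, idx], eigval[idx]            # projection matrix and component variances
    t2_train = np.sum((Xn @ W) ** 2 / lam, axis=1)  # Hotelling's T^2 of the training samples
    ucl = np.quantile(t2_train, confidence)         # empirical UCL (F- or KDE-based limits are also common)
    return dict(mu=mu, sigma=sigma, W=W, lam=lam, ucl=ucl)

def monitor_sample(model, x_new):
    """Return the T^2 value of a new sample and whether an alarm is triggered."""
    xn = (x_new - model["mu"]) / model["sigma"]
    t2 = float(np.sum((xn @ model["W"]) ** 2 / model["lam"]))
    return t2, t2 > model["ucl"]
```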
2.1. Feature Extraction Using Kernel Methods
Among the three basic steps in Figure 2, feature extraction is found to have the greatest impact on process monitoring performance. Even in other contexts, feature engineering is regarded as the one aspect of machine learning that is domain-specific and, hence, requires creativity from the user [39,40]. As such, traditional MSPM methods mainly differ in how the weight vectors, $\mathbf{W}$, are obtained. Weights can be computed via principal components analysis (PCA), partial least squares (PLS), independent components analysis (ICA), Fisher/linear discriminant analysis (FDA or LDA), or canonical correlation analysis (CCA) [1]. However, only a linear transformation of the data is involved in these methods. Mathematically, a linear transformation can be written as:
$\mathbf{t}_k = \mathbf{W}^{\top} \mathbf{x}_k$ (1)

where $\mathbf{W}$ is the projection matrix, $\mathbf{t}_k$ are the features, and $\mathbf{x}_k$ are the normalized raw data at the kth instant. For the case of PCA, the $\mathbf{W}$ can be computed by diagonalizing the sample covariance matrix, $\mathbf{S}$, as [1]:

$\mathbf{S} = \dfrac{1}{N-1} \sum_{k=1}^{N} \mathbf{x}_k \mathbf{x}_k^{\top}$ (2)

$\mathbf{S} = \mathbf{V} \boldsymbol{\Lambda} \mathbf{V}^{\top}$ (3)

where $\mathbf{V}$ contains the eigenvectors with corresponding eigenvalues in $\boldsymbol{\Lambda}$. Only the first n columns of $\mathbf{V}$ are taken to finally yield $\mathbf{W}$. The weights from PCA are orthogonal basis vectors that describe directions of maximum variance in the data set [1].

In order to generate nonlinear features, a nonlinear mapping $\Phi(\cdot)$ can be used to transform the data, $\mathbf{x}_k \mapsto \Phi(\mathbf{x}_k)$, so that Equation (1) becomes $\mathbf{t}_k = \mathbf{W}^{\top} \Phi(\mathbf{x}_k)$. However, the mapping $\Phi$ is unknown and difficult to design. In 1998, Schölkopf et al. [22] proposed to replace the sample covariance matrix, $\mathbf{S}$, by a kernel matrix $\mathbf{K}$ whose elements are computed by a kernel function, $k(\mathbf{x}_i, \mathbf{x}_j)$. They have shown that if the kernel function satisfies certain properties, it can act as a dot product in the feature space. That is, $\mathbf{K}$ can take the role of a covariance matrix of nonlinear features. By adopting a kernel function, the need to specify $\Phi$ has now been avoided, and this realization has been termed the kernel trick [22]. The result is a method called kernel principal components analysis (KPCA) [22], a nonlinear learner trained by merely solving the eigenvalue decomposition of $\mathbf{K}$, analogous to Equation (3). As mentioned in Section 1, KPCA is the first kernel method applied to process monitoring as a feature extractor [24].
Upon using kernel methods, the nonlinear transformation is now equivalent to [22]:
$t_j(\mathbf{x}) = \displaystyle\sum_{i=1}^{N} \alpha_{j,i}\, k(\mathbf{x}_i, \mathbf{x}), \quad j = 1, \ldots, n$ (4)

where $\boldsymbol{\alpha}_j$ is a column weight vector, $t_j$ are the features, $\mathbf{x}$ is the new data to be projected, $\{\mathbf{x}_i\}_{i=1}^{N}$ is the training data set, and $k(\cdot,\cdot)$ is the kernel function. The kernel function is responsible for projecting the data onto high-dimensional spaces where, according to Cover’s theorem [21], the features are more likely to be linearly separable. This high-dimensional space is known in functional analysis as a Reproducing Kernel Hilbert Space (RKHS) [22]. Usual choices of kernel functions found from this review are as follows:

RBF: $k(\mathbf{x}, \mathbf{y}) = \exp\!\left(-\dfrac{\|\mathbf{x} - \mathbf{y}\|^2}{c}\right)$ (5)

POLY: $k(\mathbf{x}, \mathbf{y}) = \langle \mathbf{x}, \mathbf{y} \rangle^{d}$ (6)

SIG: $k(\mathbf{x}, \mathbf{y}) = \tanh\!\left(a \langle \mathbf{x}, \mathbf{y} \rangle + b\right)$ (7)

where $c$, $d$, $a$, and $b$ are kernel parameters to be determined by various selection routes. To understand what happens in the kernel mapping, Figure 3 shows three sample data sets and their projections in the kernel feature space. The red and blue data points belong to different classes, and evidently, it is impossible to separate them by a straight line in the original data space. However, after a kernel transformation onto a higher dimensional space, it is now possible to separate them using a linear plane (white contour), which translates to a nonlinear boundary in the original space. In these examples, an RBF kernel of various c values was used, Equation (5), and the transformation was computed using Support Vector Machines (SVM). More theoretical details on kernel methods, KPCA, and SVM can be found in other articles [22,41,42], as well as books such as Kernel Methods for Pattern Analysis by Shawe-Taylor and Cristianini [43], Support Vector Machines and Other Kernel-based Learning Methods by Cristianini and Shawe-Taylor [44], and Pattern Recognition and Machine Learning by Bishop [45].
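To illustrate Equations (4) and (5), the sketch below implements a bare-bones KPCA feature extractor with the RBF kernel: the training data are mapped implicitly through the (centered) kernel matrix, and a new sample is projected without ever forming $\Phi$ explicitly. This is a simplified sketch for illustration only; the function names and the choices of c and the number of components are arbitrary.

```python
import numpy as np

def rbf_kernel(X, Y, c):
    """RBF kernel of Equation (5): k(x, y) = exp(-||x - y||^2 / c)."""
    d2 = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2 * X @ Y.T
    return np.exp(-d2 / c)

def kpca_fit(X, n_components, c):
    """Fit KPCA on training data X (N x M) by eigendecomposing the centered kernel matrix."""
    N = X.shape[0]
    K = rbf_kernel(X, X, c)
    one_n = np.ones((N, N)) / N
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n      # centering in the feature space
    eigval, eigvec = np.linalg.eigh(Kc)
    idx = np.argsort(eigval)[::-1][:n_components]
    alphas = eigvec[:, idx] / np.sqrt(eigval[idx])           # scale so feature-space weights have unit norm
    return dict(X=X, K=K, one_n=one_n, alphas=alphas, c=c)

def kpca_transform(model, x_new):
    """Project a new sample using Equation (4): t_j = sum_i alpha_{j,i} k(x_i, x_new)."""
    X, K, one_n = model["X"], model["K"], model["one_n"]
    N = X.shape[0]
    k_t = rbf_kernel(x_new[None, :], X, model["c"])           # 1 x N kernel evaluations
    ones_t = np.ones((1, N)) / N
    k_tc = k_t - ones_t @ K - k_t @ one_n + ones_t @ K @ one_n
    return (k_tc @ model["alphas"]).ravel()                    # nonlinear features of x_new
```

The extracted features can then be fed to the same $T^2$ and UCL machinery of Figure 2.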
2.2. Kernel Methods in the Machine Learning Context
Aside from kernel methods, other tools from machine learning can also be applied to process monitoring. Figure 4 gives an overview of learning methods that are relevant to process monitoring, from the authors’ perspective. Each method in this figure represents a body of associated techniques, and so the reader can search using these keywords to learn more. More importantly, the methods that were marked with an asterisk (*) have a “kernelized” version, and so they belong to the family of kernel methods. To kernelize means to apply the kernel trick to a previously linear machine. For example, PCA becomes Kernel PCA, Ridge Regression becomes Kernel Ridge Regression, K-means clustering becomes Kernel K-means, and so on. All these methods were developed to solve a particular learning problem or learning task, such as classification, regression, clustering, etc.
Supervised and unsupervised learning are the two main categories of learning tasks (although semi-supervised, reinforcement, and self-supervised learning categories also exist [9,11,46]). According to Murphy [47], learning is supervised if the goal is to learn a mapping from inputs to outputs, given a labeled set of input-output pairs. On the other hand, learning is unsupervised if the goal is to discover patterns from a data set without any label information. In the context of process monitoring, examples of learning problems under each category can be listed as follows:
Supervised learning
Classification: Given data samples labeled as normal and faulty, find a boundary between the two classes; or, given samples from various fault types, find a boundary between the different types.
Regression: Given samples of regressors (e.g., process variables) and targets (e.g., key performance indicators), find a function of the former that predicts the latter; or, find a model for predicting the future evolution of process variables whether at normal or faulty conditions.
Ensemble methods: Find a strategy to combine results from several models.
Unsupervised learning
Dimensionality reduction: Extract low-dimensional features from the original data set that can enable process monitoring or data visualization.
Clustering: Find groups of similar samples within the data set, without knowing beforehand whether they are normal or faulty.
Density Estimation: Find the probability distribution of the data set.
In relation to the framework in Figure 2, one possible correspondence would be the following: (1) Use dimensionality reduction or clustering for feature extraction; (2) Use density estimation for threshold setting; (3) Use classification for diagnosis; and, (4) Use regression for prognosis and other predictive tasks. It is clear from Figure 4 that kernel methods can participate in any stage of the process monitoring procedure, not just in the feature extraction step. In fact, many existing frameworks already used kernel support vector machines (SVM) for fault classification, kernel density estimation (KDE) for threshold setting, etc. We also note that many other alternatives to kernel methods can be used to perform each learning task. For instance, an early nonlinear extension of PCA for process monitoring was based on principal curves and artificial neural networks (ANN) by Dong and McAvoy [48] in 1996. Even today, ANNs are still a popular alternative to kernel methods.
2.3. Relationship between Kernel Methods and Neural Networks
Neural networks are attractive due to their universal approximation property [49], that is, they can theoretically approximate any function to an arbitrary degree of accuracy [45]. Both ANNs and kernel methods can be used for nonlinear process monitoring. However, one important difference between them is in the computational aspect. Kernel methods such as KPCA are faster to train (see Section 2.1), whereas ANNs require an iterative process for training (i.e., gradient descent) because of the need to solve a nonlinear optimization problem [44]. But during the online phase, kernel methods may be slower since they need to store a copy of the training data in order to make predictions for new test data (see Equation (4)) [45]. In ANNs, once the parameters have been learned, the training data set can be discarded [45]. Thus, kernel methods have issues with scalability. Another distinction is provided by Pedro Domingos in his book The Master Algorithm [50] in terms of learning philosophy: If ANNs learn by mimicking the structure of the brain, kernel methods learn by analogy. Indeed, the reason why kernel methods need to store a copy of the training data is so that they can compute the similarity between any test sample and the training samples. The similarity measure is provided by the kernel function, $k(\cdot,\cdot)$ [44]. However, selecting a kernel function is also a long-standing issue. Later on, this review includes a survey of the commonly used kernel functions for process monitoring.
Despite the many distinctions between kernel methods and ANNs, neither of them is clearly superior to the other. Presently, many of the drawbacks of each are already being addressed, and their unique benefits are also being enhanced. Also, these two approaches are connected in some ways, as explained in [45]. For instance, the nonlinear kernel transformation in Equation (4) can be interpreted as a two-layer network [51]: the first layer corresponds to the kernel evaluations, $k(\mathbf{x}_i, \mathbf{x})$, against the training samples, while the second layer corresponds to their linear combination with the weights, $\alpha_{j,i}$.
ANNs have found success in many areas, especially in computer vision where deep ANNs [52] have reportedly surpassed human-level performance for image recognition [53]. Opportunities for applying deep ANNs to the field of PSE were also given in [9]. Meanwhile, kernel methods were shown to have matched the accuracy of deep ANNs for speech recognition [54]. In the real world, kernel methods have been applied successfully to wind turbine performance assessment [55], machinery prognostics [56], and objective flow regime identification [57], to name a few.
In the AI community, methods that combine kernel methods with deep learning are now being developed, such as neural kernel networks [58,59], deep neural kernel blocks [60], and deep kernel learning [61,62]. A soft sensor based on deep kernel learning was recently applied in a polymerization process [63]. Based on these recent advances, Wilson et al. [62] have concluded that the relationship between kernel methods and deep ANNs should not be one of competition, but rather of complementarity. Perhaps a more forward-looking claim would be that of Belkin et al. [51], who said that “in order to understand deep learning we need to understand kernel learning”. Therefore, kernel methods are powerful and important machine learning tools that are worthwhile to consider in practice.
3. Methodology and Results Summary
Having motivated the importance of kernel methods in the previous section, the rest of the paper is dedicated to a review of their applications to process monitoring.
3.1. Methodology
The scope of this review is limited to the applications of kernel methods in the feature extraction step of process monitoring. This is because we are after the important issues in feature extraction that may drive future research directions. Papers that used kernelized MSPM tools such as kernel PCA, kernel ICA, kernel PLS, kernel FDA, kernel SFA, kernel CCA, kernel LPP, kernel CVA, etc. were included, although their details are not given here. Meanwhile, papers that used kernel methods in other stages of process monitoring (e.g., SVMs for fault classification, Gaussian Processes (GP) for fault prediction, and KDE for threshold setting) may also appear, but these are not the main focus. Moreover, this review only includes papers with industrial process case studies, such as the Tennessee Eastman Plant benchmark. A review of literature on the condition monitoring of electro-mechanical system case studies (e.g., rotating machinery) can be found elsewhere [64,65]. Interested practitioners are also referred to Wang et al. [34] for a survey of patents related to process monitoring.
For this review, an extensive literature search was conducted on the following journals: (1) IEEE Transactions on Industrial Informatics; (2) IEEE Transactions on Industrial Electronics; (3) IEEE Transactions on Control Systems Technology; (4) IEEE Transactions on Automation Science and Engineering; (5) IEEE Access; (6) Chemical Engineering Science; (7) Chemometrics and Intelligent Laboratory Systems; (8) Computers and Chemical Engineering; (9) Chemical Engineering Research and Design; (10) Journal of Process Control; (11) Control Engineering Practice; (12) ISA Transactions; (13) Expert Systems with Applications; (14) Chinese Journal of Chemical Engineering; (15) Industrial and Engineering Chemistry Research; (16) Process Safety and Environmental Protection; (17) Journal of Chemometrics; (18) AIChE Journal; and, (19) Canadian Journal of Chemical Engineering. The keywords used for searching were “kernel and fault”. Keywords such as “monitoring”, “detection”, and “diagnosis” were not used because not all intended papers contain these words in the text. From the search results, only the papers that fit the aforementioned scope were included; 155 papers were found this way. Also, selected papers from other journals and conference proceedings were found by following citations forwards and backwards. However, a comprehensive search is not guaranteed. The entire search process was performed in October 2019, and hence, only published works until this time were found. In the end, a total of 230 papers were included in this review.
3.2. Results Summary
Figure 5 shows the distribution of the reviewed papers by year of publication. The overall increasing trend in the number of papers indicates that kernel-based feature extraction is being adopted by more and more process monitoring researchers. Figure 6a then shows the most commonly used kernelized feature extractors for nonlinear process monitoring. Kernel PCA is most widely used, followed by kernel PLS, kernel ICA, kernel FDA, kernel CVA, and so on. The widespread use of kernel PCA can be attributed to the fact that linear algorithms can be kernelized by performing kernel PCA followed by the linear algorithm itself. For instance, kernel ICA is equivalent to kernel PCA + ICA [66]. Likewise, kernel CVA can be performed as kernel PCA + CVA [67]. Hence, kernel PCA was cited more frequently than other techniques.
In the reviewed papers, application case studies were also used for evaluating the effectiveness of the proposed kernel methods for process monitoring. Figure 6b shows the breakdown of papers according to the type of case study they used: simulated or real-world. As shown, only 27% of the papers have indicated the use of at least one real-world data set, taken from either industrial processes or laboratory experiments. On the other hand, the rest of the papers used simulated data sets alone for testing. The Tennessee Eastman Plant (TEP) is found to be the most commonly used simulated case study. It may still be advantageous to use simulated case studies since the characteristics of the simulated data are usually known or can be built in the simulator. Hence, the user can highlight the strengths of a particular method by its ability to handle certain data characteristics. Another advantage of using simulated data is that tests can be repeated many times by performing many Monte Carlo simulations. Nevertheless, the ultimate goal should still be to assess the proposed methods on real-world data. For instance, in a paper by Fu et al. [68], kernel PCA and kernel PLS were applied to 3 different real-world data sets: two from the chemical process industry and one from a laboratory mixing experiment. Among the chemical processes is a butane distillation system. Vitale et al. [69] also used real-world data sets from the pharmaceutical industry to test kernel methods. Results from these examples have proven that handling the nonlinear issue is important for monitoring real-world industrial processes.
However, issues arise in the application of kernel methods for nonlinear process monitoring. After a careful study of the papers, 12 major issues were identified and listed in Table 2. The table includes the number of papers that addressed each of them. Although some of these issues are not unique to kernel methods alone, we review them within the context of kernel-based feature extraction. The bulk of this paper is dedicated to the discussion of these issues.
A list of all the reviewed papers is then given in Table 3. The table also shows the kernelized method they used, the case studies they used, the kernel functions they used, and more importantly, the issues they addressed. The purpose of this table is to help the reader choose a specific issue of interest (A to L) and peruse down the column for papers that addressed it. In the column on case studies, we have also highlighted in bold the ones that are real-world or industrial applications. The reader is referred to the appendix for the list of all abbreviations in this table.
4. Review Findings
In this section, the major issues on kernel-based process monitoring, as identified and presented in Table 2, are discussed one by one. We first motivate why they are important and then give examples of how they were addressed by many researchers through the years.
4.1. Batch Process Monitoring
Monitoring batch processes is important so as to reduce batch-to-batch variability and maintain the quality of products [70]. The first application of kernel PCA to process monitoring was in a continuous process [24], wherein the plant data set is a matrix of M variables × N samples (2-D) (see Section 2). In contrast, for a batch process, the plant data set is a tensor of K batches × M variables × N samples (3-D) and, hence, must be handled differently. A multi-way approach is commonly adopted, where the tensor data is unfolded into matrix data either variable-wise or batch-wise so that the kernel MSPM methods can then be applied (see the sketch below). This led to multi-way kernel PCA [71], multi-way kernel ICA [72,73], multi-way kernel FDA [74,75], and so on. Variable correlation analysis (VCA) and its kernelized version were also proposed for batch process monitoring in [76,77]. Common batch process case studies include the fed-batch fermentation process for producing penicillin (PenSim) available as a simulation package from Birol et al. [78], the hot strip mill process (HSMP) as detailed in [79], the injection moulding process (IMP) [80], and other pharmaceutical processes [69,81].
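As a brief illustration of the multi-way unfolding step, the following minimal NumPy sketch shows the two reshapes; the tensor layout of K batches × M variables × N samples and the random data are only for illustration:

```python
import numpy as np

# Hypothetical batch data tensor: K batches x M variables x N time samples
K_batches, M_vars, N_samples = 30, 10, 200
data = np.random.randn(K_batches, M_vars, N_samples)

# Batch-wise unfolding: one row per batch, K x (M*N),
# so batch-to-batch variation is what gets modelled.
batch_wise = data.reshape(K_batches, M_vars * N_samples)

# Variable-wise unfolding: one row per time sample, (K*N) x M,
# so the within-batch trajectories are what get modelled.
variable_wise = data.transpose(0, 2, 1).reshape(K_batches * N_samples, M_vars)

# Either unfolded matrix can then be passed to a kernel MSPM method,
# e.g., multi-way kernel PCA.
print(batch_wise.shape, variable_wise.shape)   # (30, 2000) (6000, 10)
```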
If batch data sets have uneven lengths, the trajectories must be synchronized prior to analysis. Dynamic time warping (DTW) is one such technique to handle this issue, as adopted by Yu [75] and Rashid and Yu [82]. Another problem is related to the multi-phase characteristic of batch process data. Since a whole batch consists of steady-state and transition phases, each phase must be modelled differently. Phase division has been employed to address this issue, as did Tang et al. [77] and Peng et al. [83]. In all these studies, the RBF and POLY kernels were mostly used to generate nonlinear features for process monitoring. In particular, Jia et al. [84] found that the POLY kernel is optimal for the PenSim case study, as determined by a genetic algorithm (GA).
We refer the reader to the reviews by Yao and Gao [297] and Rendall et al. [298] for more information on batch process data analytics beyond the application of kernel methods.
4.2. Dynamics, Multi-Scale, and Multi-Mode Monitoring
Recall that in the framework of Figure 2, a column vector of samples at instant k is used to generate the statistical index for that instant. This scheme is merely static, however. It does not account for the trends and dynamic behaviors of the plant in the statistical model. Dynamic behaviors manifest in the data as serial correlations or trends at multiple time scales, which can arise from varying operating conditions. It is important to address both the nonlinear and dynamic issues, as doing so can improve the accuracy of fault detection significantly [25].
To address dynamics, features must be extracted from time-windows of data samples at once (lagged samples) rather than sample vectors at one instant only. Dynamic extensions of kernel PCA [85,96,115,116,260], kernel PLS [101], and kernel ICA [66] have used this approach. In addition, some MSPM tools are inherently capable of extracting dynamic features effectively, such as canonical variate analysis (CVA) [299], slow feature analysis (SFA) [300], and dynamic latent variable models (DLV). Kernel CVA is the kernelized version of CVA and is used in many works [67,166,172,177,178,223,224,281,290,291]. Meanwhile, kernel slow feature analysis has appeared in [174,215,216,259], and more recently, the kernel dynamic latent variable model was proposed in [225]. The details of kernel CVA, kernel SFA, and kernel DLV can be found in these references. For mining the trends in the data at multiple time scales, wavelet analysis is commonly used. Multi-scale kernel PCA was first proposed by Deng and Tian [91], followed by similar works in [94,95,134,169,210], which includes multi-scale kernel PLS and multi-scale kernel FDA. A wavelet kernel was also proposed by Guo et al. [137], which was applied to the Tennessee Eastman Plant (TEP).
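As an illustration of the lagged, time-window augmentation used by the dynamic extensions above, the following minimal sketch stacks past samples next to the current one before any (kernel) feature extraction; the lag order is a user choice and the function name is illustrative:

```python
import numpy as np

def add_lags(X, lags):
    """Augment an N x M data matrix with 'lags' past samples per row.

    Row k of the result is [x_k, x_{k-1}, ..., x_{k-lags}], so that serial
    correlations can be captured by an otherwise static monitoring method.
    """
    N, M = X.shape
    rows = []
    for k in range(lags, N):
        window = X[k - lags:k + 1][::-1]      # current sample first, then the past samples
        rows.append(window.ravel())
    return np.asarray(rows)                    # shape: (N - lags) x (M * (lags + 1))

# Example: 2 lags turn a 1000 x 5 data matrix into a 998 x 15 augmented matrix
X = np.random.randn(1000, 5)
print(add_lags(X, lags=2).shape)               # (998, 15)
```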
Multi-modality is a related issue found in processes that are designed to work at multiple operating points [38]. Figure 7 shows an example of a data set taken from the multiphase flow facility at Cranfield University [18], which exhibits multi-modality on the air flow measurements. The challenge is having to distinguish if transitions in the data are due to a change in operating mode or due to a fault. If this issue is not addressed, the changes in operating mode will trigger false alarms [38]. To address this issue, Yu [75] used k-nearest neighbors to classify the data prior to performing localized kernel FDA for batch process monitoring. Meanwhile, Khediri et al. [131] used kernel K-means clustering to identify the modes, and then support vector data description (SVDD) to detect faults in each cluster. Other ways to identify modes include a kernel Gaussian mixture model [136], hierarchical clustering [139,142], and kernel fuzzy C-means [199,234]. More recently, Tan et al. [295,296] proposed a new kernel design, called non-stationary discrete convolution kernel (NSDC), for multi-mode monitoring (see Section 4.7). The NSDC kernel was found to yield better detection performance than the RBF kernel based on the multiphase flow facility data [18].
4.3. Fault Diagnosis in the Kernel Feature Space
Diagnosis is a key process monitoring task. When a fault is detected in the plant, it is imperative to determine where it occurred, what type of fault it is, and how large its magnitude is. The actual issue is that when nonlinear feature extraction is employed, fault diagnosis is more difficult to perform.
4.3.1. Diagnosis by Fault Identification
The usual practice is to first identify the faulty variables based on their influence on the value of the statistical index. This scheme is called fault identification. It is beneficial to identify which variables are associated with the fault, especially when the plant is highly integrated and the number of process variables is large [1]. There are two major ways to perform fault identification: variable contributions and variable reconstructions. Variable contributions are computed by taking the first-order Taylor series expansion of the statistical index to reveal which variables contribute the most to its value [87]. In the other approach, each variable is reconstructed in terms of the remaining variables to estimate the fault magnitude (the amount of reconstruction) along that direction [117]. Hence, variables with the largest amounts of reconstruction are associated with the fault. Results can be visualized in contribution plots or contribution maps [301] to convey the diagnosis.
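For the linear case, the sketch below shows one simple contribution definition (several alternatives exist in the literature): each variable’s contribution is defined so that the contributions sum exactly to the $T^2$ value, using the PCA projection matrix W and retained eigenvalues. This is an illustrative choice, not the specific formula of any reviewed paper.

```python
import numpy as np

def t2_contributions(x, W, lam):
    """Split T^2 = x' D x, with D = W diag(1/lam) W', into per-variable contributions.

    Each contribution is x_j * (D x)_j, so the contributions sum exactly to T^2.
    Reconstruction-based contributions are a common alternative definition.
    """
    D = W @ np.diag(1.0 / lam) @ W.T
    contrib = x * (D @ x)
    return contrib, contrib.sum()              # the sum equals T^2

# Usage after a detection: rank variables by contribution to find fault suspects
# contrib, t2 = t2_contributions(x_fault_normalized, W, lam)
# suspects = np.argsort(contrib)[::-1]
```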
Fault identification is straightforward if the feature extraction involves only a linear machine. For kernel methods, however, it is complicated by the fact that the data went through a nonlinear mapping. This is because both approaches entail differentiating the statistical index, which is difficult when the chain of transformations involves a kernel function [86]. Nevertheless, many researchers have derived analytical expressions for either kernel contributions-based diagnosis [66,79,81,83,87,94,119,127,133,136,146,150,156,157,162,164,194,213,241,268,275,276,278,279,288,289,293] or kernel reconstructions-based diagnosis [86,117,140,155,161,163,176,217,236,254,265,285]. However, most derivations are applicable only when the kernel function is the RBF, Equation (5). In a different approach, Tan and Cao [251] proposed a new deviation contribution plot to perform fault identification for any nonlinear feature extractor.
4.3.2. Diagnosis by Fault Classification
The fault identification approach assumes that no prior fault information is available for making a diagnosis. If fault information is available, then the learning problem becomes that of finding the boundary between normal and faulty samples or the boundary between different fault types, within the feature space (see Section 2.2). This learning problem pertains to fault classification, and the three common approaches are similarity factors, discriminant analysis, and SVMs.
The similarity factor method (SFM) was proposed by Krzanowski [302] to measure the similarity of two data sets using PCA. For fault classification, the idea is to compute the similarity between the test samples against a historical database of fault samples, and find the fault type that is most similar. A series of works by Deng and Tian [91,95,148] used SFM for diagnosis, after performing multi-scale KPCA for fault detection. Ge and Song [303] also proposed the ICA similarity factor, although it was not performed in a kernel feature space. SFM was also applied to features derived from kernel slow feature analysis (SFA) [175] and serial PCA [257].
Discriminant analysis, notably Fisher discriminant analysis (FDA), is a linear MSPM method that transforms the data as in Equation (1) where the weights are obtained by maximizing the separation of samples from different classes while minimizing the scatter within each class [1]. This means that the generated features from FDA are discriminative in nature. Kernel FDA, its nonlinear extension, is used extensively such as in [74,75,80,92,98,102,103,105,118,130,151,169,175,183,195,204,222,232,238,258,266,294]. One variant of FDA is exponential discriminant analysis (EDA) which solves the singularity problem in the FDA covariance matrices by taking their exponential forms [281,283]. Another variant is scatter-difference-based discriminant analysis (SDA), whose kernel version first appeared in [99], and then in [104,124]. SDA differs from FDA in that the difference of between-class scatter and within-class scatter matrices is maximized rather than their ratio, and hence avoids any matrix inversion or singularity problems [99]. Lastly, a kernel PLS discriminant analysis variant is used in batch process monitoring in [69].
SVM is a well-known method of choice for classification in machine learning, originally proposed by Cortes and Vapnik [304]. It is also regarded as the most popular kernel method, according to Domingos [50], although he also advocates that simpler classifiers (e.g., kNN) be tried first before SVM [40]. In this regard, Zhang [106,305] used SVM on kernel PCA and kernel ICA features to perform diagnosis. Xu and Hu [121] and Xiao and Zhang [203] used a similar approach for classification, but also employed multiple kernel learning [306]. Meanwhile, Md Nor et al. [232] used SVM on the features from multi-scale kernel FDA. Aside from SFM, FDA, and SVM, an ANN-based fault classifier was also used by Bernal de Lazaro [183] on kernel PCA and kernel FDA features.
The Tennessee Eastman Plant (TEP) is usually the case study in most of these papers, as it contains samples at normal plant operation as well as from each of 20 different fault scenarios. Once the fault classifier is trained, it can automatically assign every new test sample to the normal class or to any of the fault scenarios it was trained on. However, the fault classification methods require a database of samples from many different fault scenarios a priori, in order to provide a comprehensive diagnosis.
4.3.3. Diagnosis by Causality Analysis
So far, the above methods are unable to perform a root cause diagnosis. Root cause diagnosis is valuable for cases when the fault has already propagated to multiple locations, making it difficult to locate its origin. To perform such a task, the causal relationships between process variables must be known so that the fault propagation can be traced throughout the plant [307]. Causal information can be supplied by process knowledge, plant operator experience, or model-based principles. One such work is by Lu and Wang [101], who used a signed digraph (SDG) model of the TEP consisting of 127 nodes and 15 root-cause nodes, and then used 20 local dynamic kernel PLS models for the subsystems. However, as a consequence of the kernel mapping, traversing the SDG backwards is difficult since it is impossible to find the inverse function from the kernel feature space to the original space [101]. Hence, the diagnosis was only performed qualitatively in that work [101].
Bayesian networks provide another architecture for causality analysis, in which concepts such as Granger causality and transfer entropy are used to decide whether one variable is caused by another based on their time series data. In 2017, Gharahbagheri et al. [236,237] used these concepts together with the residuals from kernel PCA models to generate a causal map for a fluid catalytic cracking unit (FCCU) and the TEP. The statistical software EViews was used to perform the causality analysis.
In the future, fault diagnosis by causality analysis can potentially benefit from the combination of knowledge-, physics-, and data-driven approaches [1].
4.4. Handling Non-Gaussian Noise and Outliers
Recall that in the feature extraction step in Figure 2, it is desired to yield features that are mutually independent so that the statistical index can be built. However, previous methods such as PCA and PLS (even their kernelized versions) may fail to yield such features, especially if the data is laden with non-Gaussian noise or outliers. This issue is widely recognized in practice [25]. Instinctively, MSPM methods can be used for detecting outliers. However, if outliers are present in the training data itself, the accuracy of MSPM algorithms will be seriously affected.
Independent components analysis (ICA) and its kernelized version, kernel ICA, are widely used MSPM methods that can handle the non-Gaussianity issue. Here, the data is treated as a mixture of independent source signals, so that the aim of ICA is to de-mix the data and recover these sources [308]. To do this, the projection matrix in ICA, $\mathbf{W}$ (also known as the de-mixing matrix), is chosen so that the ICA features are as statistically independent as possible [308]. More concretely, the goal is usually to maximize negentropy, which is a measure of the distance of a distribution from Gaussianity [309]. Kernel ICA can be performed by doing kernel PCA for whitening, followed by linear ICA, as did many researchers [66,72,73,82,90,97,100,106,107,133,140,145,154,155,157,188,203,213,233,239,265,275,276,283,305]; a minimal sketch of this route is given below. A variant of kernel ICA that avoids the usual KPCA-ICA combination was also proposed by Feng et al. [262]. Aside from kernel ICA, the non-Gaussianity issue can also be handled using a kernel Gaussian mixture model [136], the use of the statistical local approach for building the statistical index [112], and kernel density estimation (KDE) for threshold setting [67,194,251].
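A minimal sketch of the KPCA-plus-ICA route to kernel ICA mentioned above, using scikit-learn; the kernel width (gamma corresponds to 1/c relative to Equation (5)), the component counts, and the random data are illustrative choices only:

```python
import numpy as np
from sklearn.decomposition import KernelPCA, FastICA

# Illustrative normal operating data: N = 500 samples of M = 8 variables
X_train = np.random.randn(500, 8)

# Step 1: kernel PCA maps the data to the kernel feature space and whitens it
kpca = KernelPCA(n_components=6, kernel="rbf", gamma=0.1)
Z = kpca.fit_transform(X_train)
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)      # scale the kernel scores to unit variance

# Step 2: linear ICA de-mixes the whitened kernel features into
# statistically independent components (maximizing non-Gaussianity)
ica = FastICA(n_components=6, random_state=0)
S = ica.fit_transform(Z)                       # independent kernel features

# Monitoring statistics (e.g., I^2) can then be built from S
print(S.shape)                                 # (500, 6)
```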
To handle outliers in the data, Zhang et al. [134] and Deng and Wang [255] incorporated a sliding median filter and a local outlier factor method, respectively, into kernel PCA. Other outlier-robust methods include the spherical kernel PLS [153], the joint kernel FDA [204] and the kernel probabilistic latent variable regression model [235].
4.5. Improved Sensitivity and Incipient Fault Detection
Despite the use of advanced MSPM tools, it may be desired to improve their detection sensitivity further. This is beneficial in particular for detecting incipient faults, which are small-magnitude faults with a drifting behavior. These faults are difficult to detect at the initial stage because they are masked by noise and process control [67]. Yet because they are drifting, they can seriously escalate if no action takes place. Kernel MSPM solutions to these issues already exist, which we review as follows.
An early approach for improved detection is dissimilarity analysis (DISSIM), proposed by Kano et al. [310]. DISSIM is mathematically equivalent to PCA, but its statistical index is different from the $T^2$ in that it quantifies the dissimilarity between data distributions. Its kernel version, kernel DISSIM, was developed by Zhao et al. [113], and further used in Zhao and Huang [263]. The concept of dissimilarity was also adopted by Pilario et al. [67] and Xiao [291] for kernel CVA and Rashid and Yu [311] for kernel ICA. Related to DISSIM is statistical pattern analysis (SPA), used in [148,221,258] for kernel PCA. The idea of SPA, as proposed by He and Wang [312], is to build a statistical index from the dissimilarity between the higher-order statistics of two data sets.
Another approach is to use an exponentially weighted moving average filter (EWMA) to increase the sensitivity for drifting faults, as did Yoo and Lee [88], Cheng et al. [116], Fan et al. [154], and Peng et al. [283]. The shadow variables by Feng et al. [262] also involve applying EWMA on the statistical indices for smoothing purposes as well. For batch processes, a method for detecting weak faults is also proposed by Wang et al. [139]. The works of Jiang and Yan [143,144] improved the sensitivity of kernel PCA by investigating the rate of change of the statistical index and by giving a weight to each feature. Lastly, a new statistic based on the generalized likelihood ratio test (GLRT) can also improve detection for kernel PCA and kernel PLS, as shown by Mansouri et al. [192,193,210,270,271].
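As a simple illustration of the EWMA idea for incipient faults mentioned above, the sketch below smooths a monitoring statistic so that small, persistent drifts accumulate above the threshold sooner; the smoothing factor is a tuning choice and the function name is illustrative:

```python
import numpy as np

def ewma_filter(stat, beta=0.2):
    """Exponentially weighted moving average of a monitoring statistic.

    filtered[k] = beta * stat[k] + (1 - beta) * filtered[k - 1].
    A smaller beta smooths more heavily, raising sensitivity to slow drifts
    at the cost of a slower response to abrupt faults.
    """
    filtered = np.empty(len(stat), dtype=float)
    filtered[0] = stat[0]
    for k in range(1, len(stat)):
        filtered[k] = beta * stat[k] + (1 - beta) * filtered[k - 1]
    return filtered
```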
4.6. Quality-Relevant Monitoring
Before the widespread use of MSPM methods, the traditional approach to process monitoring was to monitor only the quality variables [8], as embodied by statistical quality control. MSPM methods are more beneficial in that they utilize the entire plant data set rather than just the quality variables to perform fault detection. However, as noted by Qin [25], it is imperative to link the results from MSPM methods to the quality variables. The kernel MSPM methods discussed thus far have not yet established this link. This issue can be addressed by performing quality-relevant monitoring.
Partial least squares (PLS) is an MSPM method associated with quality-relevant monitoring, as it finds a relationship between the process and quality variables. The first kernel PLS application was in a biological anaerobic filter process (BAFP) by Lee et al. [89], where the quality variables are the total oxygen demand of the effluent and the flow rate of exiting methane gas. Zhang and Zhang [107] combined ICA and kernel PLS for monitoring the well-known penicillin fermentation (PenSim) process and predicting its key quality variables. Hierarchical kernel PLS, dynamic hierarchical kernel PLS, and multi-scale kernel PLS were introduced in [128,135], and [129], respectively. Total PLS (T-PLS) was proposed to make PLS more comprehensive, and its kernel version was developed by Peng et al. [79,141]. The application was in the HSMP, wherein both quality-related and non-quality-related faults were investigated. Further developments on kernel PLS can be found in [146,160,163,164,168,173,196,197,199,206,229,231,242,243,268,284]. Concurrent PLS was also proposed to solve some drawbacks of the T-PLS. Kernel concurrent PLS was developed by Zhang et al. [176] and Sheng et al. [205].
The other more recent MSPM tool for relating process and quality variables is canonical correlation analysis (CCA). CCA is different from PLS in that it finds projections that maximize the correlation between two data sets. Kernel CCA first appeared in process monitoring as a modified ICA by Wang and Shi [123], but it was not utilized for quality-relevant monitoring. The same is true in Cai et al. [181], where kernel CCA was merely used to build a complex network for the process. In 2017, Zhu et al. [240] first proposed the kernel concurrent CCA for quality-relevant monitoring. Liu et al. [241] followed with its dynamic version. In a very recent work by Yu et al. [277], a faster version of kernel CCA was proposed, to be discussed later in Section 4.8.
4.7. Kernel Design and Kernel Parameter Selection
The issue of kernel design is often cited as the reason why researchers would prefer to use other nonlinear techniques over kernel methods. It is difficult to decide which kernel function to use (see Equations (5)–(7)) and how kernel parameters should be chosen. (Note, however, that decisions like these also exist in ANNs, e.g., how to set the depth of the network, number of hidden neurons, and learning rate, which activation function to use and which regularization method to use.) These choices also depend on the decisions made at other stages of process monitoring. For instance, choosing one kernel function over another may change the number of retained kernel principal components necessary for good performance. Moreover, the quality of the training data can influence all these decisions. Even if these parameters were carefully tuned based on fixed data sets for training and validation, the detection model may still yield too many false alarms if the data sets are not representative of all behaviors of the normal process. Process monitoring performance greatly depends on these aspects. We review existing efforts that address these issues, as follows.
4.7.1. Choice of Kernel Function
The main requirement for a kernel function to be valid is to satisfy Mercer’s condition [22]. According to Mercer’s theorem, as quoted from [313]: A necessary and sufficient condition for a symmetric function $k(\mathbf{x}_i, \mathbf{x}_j)$ to be a kernel is that for any set of samples $\{\mathbf{x}_1, \ldots, \mathbf{x}_N\}$ and any set of real numbers $\{a_1, \ldots, a_N\}$, the function must satisfy:

$\displaystyle\sum_{i=1}^{N} \sum_{j=1}^{N} a_i a_j \, k(\mathbf{x}_i, \mathbf{x}_j) \geq 0$ (8)

which translates to the function $k$ being positive definite. This means that if a function satisfies the condition in Equation (8), it can act as a dot product in the mapping $\Phi$ defined by $k(\mathbf{x}, \mathbf{y}) = \langle \Phi(\mathbf{x}), \Phi(\mathbf{y}) \rangle$, and hence, it is a valid Mercer kernel function. If $k$ acts as a dot product, then for any two samples, $\mathbf{x}$ and $\mathbf{y}$, the function is symmetric, i.e., $k(\mathbf{x}, \mathbf{y}) = k(\mathbf{y}, \mathbf{x})$, and it also satisfies the Cauchy-Schwarz inequality: $k(\mathbf{x}, \mathbf{y})^2 \leq k(\mathbf{x}, \mathbf{x})\, k(\mathbf{y}, \mathbf{y})$ [313].
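In practice, the condition in Equation (8) amounts to requiring that the kernel (Gram) matrix built from any sample set be positive semidefinite. A quick numerical check on a given data set (a sketch only, not a proof of Mercer validity) is to inspect the eigenvalues of the Gram matrix:

```python
import numpy as np

def gram_is_psd(kernel_fn, X, tol=-1e-8):
    """Numerically check Equation (8): the Gram matrix K_ij = k(x_i, x_j)
    of a valid Mercer kernel must be positive semidefinite."""
    N = X.shape[0]
    K = np.array([[kernel_fn(X[i], X[j]) for j in range(N)] for i in range(N)])
    return np.linalg.eigvalsh(K).min() >= tol   # symmetric matrix -> real eigenvalues

# Example: the RBF kernel of Equation (5) passes; a sigmoid kernel with a poor
# choice of (a, b) may fail this check on a given data set.
rbf = lambda x, y, c=1.0: np.exp(-np.sum((x - y) ** 2) / c)
print(gram_is_psd(rbf, np.random.randn(50, 3)))  # True (up to round-off)
```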
Although many kernel functions exist [44,314], only a few common ones are being used in process monitoring, namely, Equations (5)–(7). We identified the kernels used in each of the 230 papers included in this review. In the tally shown in Figure 8a, the RBF kernel is found to be the most popular choice, by a wide margin. Even outside the process monitoring community, the Gaussian RBF kernel (also known as the squared exponential kernel) is the most widely used kernel in the field of kernel machines [314], possibly owing to its smoothness and flexibility. Other kernels found from the review are the cosine kernel [105], wavelet kernel [137], the recent non-stationary discrete convolution kernel (NSDC) [295,296], and the heat kernel [182,266,290] for manifold learning (see Section 4.9).
Other advances are related to the kernel design itself. For instance, Shao et al. [108] and Luo et al. [182] proposed data-dependent kernels for kernel PCA, which are used to learn manifolds. A robust alternative to kernel PLS was proposed by Hu et al. [153], which uses a sphered kernel matrix. Meanwhile, Zhao and Xue [163] used a mixed kernel for kernel T-PLS to discover both local and global patterns. The mixed kernel consists of a convex addition of the RBF and POLY kernels. Mixed kernels were also used by Pilario et al. [67] for kernel CVA, but motivated by monitoring incipient faults. This additive principle was also used to design a kernel for batch processes by Yao and Wang [170]. More recently, Wang et al. [288,289] proposed to use the first-order expansion of the RBF kernel to save computational cost. However, it is not clear whether the new design retains the flexibility of the original RBF kernel to handle nonlinearity, or how it compares to polynomial kernels of the same order.
4.7.2. Kernel Parameter Selection
The kernel parameters for the RBF, POLY, and SIG kernels in Equations (5)–(7) are the kernel bandwidth, c, the polynomial degree, d, and the sigmoid scale a and bias b. These kernels satisfy Mercer’s conditions for c > 0, for positive integer d, and for only some combinations of a and b [22,67]. There is currently no theoretical basis on how to specify the values of these parameters, yet they must be specified prior to performing any kernel method. We review some of the existing ways to obtain their values, as follows.
We have tallied the various parameter selection routes used by the 230 papers included in this review. Based on the results in Figure 8b, the most popular approach is to select them empirically. For the RBF kernel, c is usually computed based on the data variance, $\sigma^2$, and dimensionality, m, i.e., $c = r m \sigma^2$ [24,72,96,97], where r is an empirical constant. Another heuristic is based solely on the dimensionality, as in [86,87,88] or in [66,118,130,204] for the TEP case study. For the TEP alone, many different values of c have been used [157,177,205,213,220], and so on. However, note that the appropriate value of c does not depend on the case study, but rather on the characteristics of the data that enters the kernel mapping. Hence, various choices will differ upon using different data pre-processing steps, even for the same case study. Other notable heuristics for c can be found in [68,126,131,164,248,280].
A smaller number of papers have used cross-validation to decide kernel parameter values. In this scheme, the detection model is tuned according to some objective, such as minimizing false alarms, using a validation data set that must be independent from the training data [67]. Another scheme is to perform k-fold cross-validation, as did [85], in which the data set is split into k groups: k − 1 groups are used for training, while the remaining group is used for validation, and this is repeated k times for different held-out data. Typically, k = 5 or 10. Grid search is a common approach for the tuning stage, where the kernel parameters are chosen from a grid of candidates, as did [67,79,98,121,124,141,151,170,171,195,201,215,259]; a minimal sketch of this route is given below. Based on a recent study by Fu et al. [68], cross-validation was found to yield better estimates of the kernel parameters than the empirical approach.
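A minimal grid-search sketch in the spirit of the cross-validation route; the candidate grids, the false-alarm objective, and the fit_model/false_alarm_rate helpers are placeholders for whichever kernel MSPM method and criterion the user adopts:

```python
import itertools
import numpy as np

def grid_search_kernel_params(X_train, X_valid, c_grid, n_comp_grid,
                              fit_model, false_alarm_rate):
    """Jointly choose the RBF width c and the number of kernel components by
    minimizing the false alarm rate on an independent validation set."""
    best_far, best_params = np.inf, None
    for c, n_comp in itertools.product(c_grid, n_comp_grid):
        model = fit_model(X_train, n_components=n_comp, c=c)   # e.g., KPCA + T^2 + UCL
        far = false_alarm_rate(model, X_valid)                  # fraction of normal samples alarmed
        if far < best_far:
            best_far, best_params = far, dict(c=c, n_components=n_comp)
    return best_params, best_far

# Example grids: powers of ten for c and a handful of component counts
# params, far = grid_search_kernel_params(X_train, X_valid,
#                                         c_grid=10.0 ** np.arange(-1, 4),
#                                         n_comp_grid=[5, 10, 20, 40],
#                                         fit_model=..., false_alarm_rate=...)
```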
A more detailed approach to compute kernel parameters is via optimization. It is known that if certain objectives are set, these parameters will have an optimal value. For instance, as explained by Bernal de Lazaro [184], if the RBF kernel bandwidth c is too large, the model loses the ability to discover nonlinear patterns, but if it is too small, the model will become too sensitive to the noise in the training data. Hence, the value of c can be searched such that the false alarm rate is minimum and the detection rate is maximum [184]. Exploring these trade-offs is key to the optimization procedure. Other criteria for optimizing kernel parameters were proposed in [183]. Some search techniques include the bisection method [162], Tabu search [247,250,274], particle swarm optimization [184,276], differential evolution [184], and genetic algorithm [84,93,102,108,154]. More recent studies have emphasized that kernel parameters must be optimized simultaneously with the choice of latent components (e.g., no. of kernel principal components) since these choices depend on each other [67,68].
Finally, there are also some papers that investigated the effect of varying the kernel parameters and presented their results (see [67,80,98,165,185,256,295,296]). In case the reader is interested in this investigation, we have provided a MATLAB code for visualizing the contours of kernel PCA statistical indices for any 2-D data set, available in [315]. This code was used to generate one of the figures in [67]. Understanding the effect of kernel parameters and the kernel function is important, especially as process monitoring methods become more sophisticated in the future.
4.8. Fast Computation of Kernel Features
Recall from Section 2.3 that one of the issues of kernel methods is scalability. This is because the computational complexity of kernel methods grows in proportion to the size of the training data. Hence, although they are fast to train, they are slow in making predictions [45]. Addressing the scalability of kernel methods is important, especially since samples are now being generated at large volumes in the plant [8]. In the online testing phase, naïve kernel PCA must evaluate the kernel function against all N training samples for every new sample, so its cost grows with the training set size. Given the limited number of operations a typical CPU can perform per second [316], this places a practical ceiling on the allowable number of training samples if a prediction is desired within a second as well. In the following, we review the many approaches adopted by process monitoring researchers to compute kernel features faster.
An early approach to reduce the computational cost of kernel MSPM methods is to select only a subset of the training samples so that their mapping is as expressive as if the entire data set was used. By reducing the number of samples, the kernel matrix reduces in size, and hence the transformation in Equation (4) can be computed faster. Feature vector selection (FVS) is one such method in this regard, as proposed by Baudat and Anouar [317], and then adopted by Cui et al. [98] for kernel PCA based process monitoring. FVS aims to preserve the geometric structure of the kernel feature space by an iterative error minimization process. Cui et al. [98] have shown that for the TEP, even if only 30 out of the 480 samples were selected by FVS and stored by the model, the average fault detection rate changed by only 0.7%. FVS was further adopted in [77,104,105,125,149,256]. A related feature points extraction scheme by Wang et al. [142] was also proposed for batch processes. Another idea is similarity analysis, wherein a sample is rejected from the mapping if it is found to be similar to the current set by some criteria (this is not to be confused with the similarity factor method, SFM, discussed in Section 4.3.2). Similarity analysis was adopted by Zhang and Qin [100] and Zhang [106]. Meanwhile, Guo et al. [278] reformulated kernel PCA itself to sparsify the projection matrix using elastic net regression. Other techniques for sample subset selection include feature sample extraction [73], the use of fuzzy C-means clustering [159], reduced KPCA [207], partial KPCA [249], and dictionary learning [246,250,270,271,274]. These methods are efficient enough to warrant an online adaptive implementation (see Section 4.10).
The other set of approaches involves a low-rank approximation of the kernel matrix for large-scale learning. The Nyström approximation and random Fourier features are the typical approaches in this set. The Nyström method approximates the kernel matrix by sampling a subset of its columns. It was adopted recently by Yu et al. [277] for kernel CCA. Meanwhile, random Fourier features were adopted by Wu et al. [279] for kernel PCA. This scheme exploits Bochner's theorem [59,279], in which the kernel mapping is approximated by passing the data through a randomized projection and cosine functions. This results in a lower-dimensional map, which saves computational cost. For more information, see the theoretical and empirical comparison of the Nyström method and random Fourier features by Yang et al. [318]. Other related low-rank approximation schemes include that of Peng et al. [283], which applies to kernel ICA, and that of Zhou et al. [286], called randomized kernel PCA. Lastly, a different approximation using the Taylor expansion of the RBF kernel, called kernel sample equivalent replacement, was derived by Wang et al. [288,289].
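As a concrete illustration of the random Fourier feature idea, the following is a minimal Python sketch (not the implementation of Wu et al. [279]): by Bochner's theorem, the RBF kernel can be approximated by inner products of low-dimensional random cosine features. The data size, bandwidth convention (here exp(-||x-y||²/(2σ²))), and number of features D are illustrative.

```python
import numpy as np

def rbf_kernel(X, Y, sigma):
    """RBF kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-sq / (2.0 * sigma ** 2))

def random_fourier_features(X, W, b, D):
    """Map X to D random cosine features whose inner products approximate the RBF kernel."""
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
sigma, D = 2.0, 2000

# Bochner's theorem: sample frequencies from the kernel's spectral density N(0, sigma^-2 I)
W = rng.normal(scale=1.0 / sigma, size=(X.shape[1], D))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

Z = random_fourier_features(X, W, b, D)      # the same W, b must be reused for test data
K_exact = rbf_kernel(X, X, sigma)
K_approx = Z @ Z.T
print("max abs error:", np.abs(K_exact - K_approx).max())
```

Because the feature map is explicit and low-dimensional, linear PCA (or PLS, CCA, etc.) on Z approximates the corresponding kernel method at a fraction of the prediction cost.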
4.9. Manifold Learning and Local Structure Analysis
The kernel MSPM methods described thus far are limited in their ability to learn local structure. A famous example that exhibits local structure is the S-curve data set, described in [319], which is a sheet of points forming an "S" in 3-D space (see Figure 9a). In this case, manifold learning methods are more appropriate for dimensionality reduction. While kernel PCA aims to preserve nonlinear global directions with the maximum variance, manifold learning methods are constrained to preserve the distances between data points in their local neighborhoods [320]. For the S-curve data, this means that manifold learning methods are able to "unfold" the curve in a 2-D mapping so that the points from either end of the curve become farthest apart, whereas kernel PCA would undesirably map them close together. In Figure 9c, local linear embedding (LLE) was used as the manifold learner. The concept of manifold learning, sometimes called local structure analysis, has already been adopted by many process monitoring researchers, whose work we review below.
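To make the S-curve example concrete, here is a minimal Python sketch using scikit-learn; the sample size, neighborhood size (k = 15), and kernel bandwidth (c = 10) mirror the settings quoted for Figure 9, but this is not the code used to generate that figure.

```python
from sklearn.datasets import make_s_curve
from sklearn.decomposition import KernelPCA
from sklearn.manifold import LocallyLinearEmbedding

# S-curve data: a 2-D sheet of points folded into an "S" shape in 3-D space
X, color = make_s_curve(n_samples=1500, random_state=0)

# Kernel PCA with an RBF kernel; scikit-learn uses k(x, y) = exp(-gamma * ||x - y||^2),
# so gamma = 1/c under the bandwidth convention k(x, y) = exp(-||x - y||^2 / c)
Z_kpca = KernelPCA(n_components=2, kernel="rbf", gamma=1.0 / 10.0).fit_transform(X)

# LLE preserves local neighborhoods (k nearest neighbors) and "unfolds" the sheet
Z_lle = LocallyLinearEmbedding(n_neighbors=15, n_components=2).fit_transform(X)

print(Z_kpca.shape, Z_lle.shape)   # both are (1500, 2) embeddings
```

Plotting the two embeddings colored by position along the curve reproduces the qualitative contrast described above: LLE spreads the two ends of the "S" apart, while the RBF kernel PCA projection folds them together.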
The first few efforts to learn nonlinear manifolds via kernels for process monitoring were done by Shao et al. [108,109] in 2009. The techniques in [108,109] are related to maximum variance unfolding (MVU), which is a variant of kernel PCA that does not require selecting a kernel function a priori. Instead, MVU automatically learns the kernel matrix from the training data [109,320]. However, a parameter for defining the neighborhood must still be adjusted, for instance, the number of nearest neighbors, k. The strategy in [109] is to set k as the smallest integer that makes the entire neighborhood graph fully connected. Shao and Rong [109] have shown that the spectrum of the kernel matrix from MVU reveals a sharper contrast between the dominant and non-dominant eigenvalues than that from kernel PCA for the TEP case study. This result is important as it indicates that the salient features were separated from the noise more effectively. Other than MVU, a more popular technique is locality preserving projections (LPP), originally proposed by He and Niyogi [321] and then adopted by Hu and Yuan [322] for batch process monitoring. MVU only computes an embedding for the training data, hence, it requires a regression step to find the explicit mapping function for any test data. In contrast, the explicit mapping is readily available for LPP. The kernel version of LPP was adopted by Deng et al. [149,150] for process monitoring. Meanwhile, generalized LPP and discriminative LPP (and its kernel version) were proposed by Shao et al. [110] and Rong et al. [151], respectively. Other works that adopted variants of LPP can be found in [218,234,252,258,266,273,290]. The heat kernel (HK) is commonly used as a weighting function in LPP.
More recently, researchers have recognized that both global and local structure must be learned rather than focusing on one or the other. Hence, Luo et al. [182,187] proposed the kernel global-local preserving projections (GLPP). The projections from GLPP lie between those from LPP and PCA because the local (LPP) and global (PCA) structures are preserved simultaneously. Other works in this regard can be found in [204,215,222,279,282]. To learn more about manifold learning, we refer the reader to a comparative review of dimensionality reduction methods by Van der Maaten et al. [320]. The connection between manifold learning and kernel PCA is also discussed by Ham et al. [323].
4.10. Time-Varying Behavior and Adaptive Kernel Computation
When an MSPM method is successfully trained and deployed for process monitoring, it is usually assumed that the normal process behavior represented in the training data is the same behavior to be monitored during the testing phase. This means that the computed projection matrices and upper control limits (UCLs) are fixed or time-invariant. However, in practice, the process behavior continuously changes. Even if sophisticated detection models were used, a changing process behavior would require the model to be adaptive. That is, the model must adapt to changes in the normal behavior without accommodating any fault behavior. However, it would be time-consuming for the model to be re-trained from scratch every time a new sample arrives. Hence, a recurrence relation or a recursive scheme must be formulated to make the model adaptive. For kernel methods, the actual issue is that kernel matrix adaptation is not straightforward. As noted by Hoegaerts et al. [324], adapting a linear PCA covariance matrix to a new data point will not change its size, whereas doing so for a kernel matrix would expand both its row and column dimensions. Hence, to keep its size, the kernel matrix must be updated and downdated at the same time. In addition, the eigendecomposition of the kernel matrix must also be adapted, wherein the number of retained principal components may change. These notions are important for addressing the time-varying process behavior.
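A naive sketch of the update/downdate step for a moving window of fixed size N is given below; the cited works do this far more efficiently, for instance by also updating the eigendecomposition incrementally, so this is only meant to make the bookkeeping explicit. The window size, bandwidth, and data are illustrative.

```python
import numpy as np

def rbf_kernel(X, Y, c):
    """Gaussian RBF kernel matrix with bandwidth c."""
    sq = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-sq / c)

def moving_window_update(window, K, x_new, c):
    """Slide the window by one sample while keeping the kernel matrix at size N x N:
    downdate (drop the oldest row/column), then update (append the newest sample)."""
    window = np.vstack([window[1:], x_new])                 # drop oldest, append newest
    K = K[1:, 1:]                                           # downdate the kernel matrix
    k_new = rbf_kernel(window[:-1], x_new[None, :], c).ravel()
    K = np.block([[K, k_new[:, None]],
                  [k_new[None, :], np.array([[1.0]])]])     # k(x_new, x_new) = 1 for RBF
    return window, K

rng = np.random.default_rng(0)
N, c = 100, 5.0
window = rng.normal(size=(N, 4))
K = rbf_kernel(window, window, c)
for _ in range(20):                                         # simulate a stream of new samples
    window, K = moving_window_update(window, K, rng.normal(size=4), c)
print(K.shape)                                              # remains (100, 100)
```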
In 2009, Liu et al. [111] proposed a moving window kernel PCA by implementing the adaptive schemes from Hoegaerts et al. [324] and Hall et al. [325]. It was applied to a butane distillation process where the fresh feed flow and the fresh feed temperature are time-varying. During implementation, adaptive control charts were produced, where the UCLs vary with time and the number of retained principal components varied between 8 and 13 as well. Khediri et al. [126] then proposed a variable moving window scheme where the model can be updated with a block of new data instead of a single data point. Meanwhile, Jaffel et al. [191] proposed a moving window reduced kernel PCA, where “reduced” pertains to an approach for easing the computational burden as discussed in Section 4.8. Other related works that utilize the moving window concept can be found in [190,207,208,209,238,293]. A different adaptive approach is to use multivariate EWMA to update any part of the model, such as the kernel matrix, its eigen-decomposition, or the statistical indices [116,132,179,224,253,281,283,292]. Finally, for the dictionary learning approach by Fezai et al. [246,247] (see Section 4.8), the Woodbury matrix identity is required to update the inverse of the kernel matrix, thereby updating the dictionary of kernel features as well. This scheme was adopted later in [250,270,271].
4.11. Multi-Block and Distributed Monitoring
Due to the enormous scale of industrial plants nowadays, having a centralized process monitoring system for the entire plant has its limitations. According to Jiang and Huang [326], a centralized system may be limited in terms of: (1) fault tolerance, since it may fail to recognize faults if many of them occur simultaneously at different locations; (2) reliability, since it handles all data channels and is thus more likely to fail if one of the channels becomes unavailable; (3) economic efficiency, since it does not account for geographically distant process units that should naturally be monitored separately; and (4) performance, since its monitoring performance can still be improved by decomposing the plant into blocks. These reasons have led to the rise of multi-block, distributed, or decentralized process monitoring methods, of which the kernel-based ones are reviewed as follows.
Kernel PLS is widely applied to decentralized process monitoring, as found in [101,119,129,206,284]. Lu and Wang [101] utilized a signed digraph which, as mentioned in Section 4.3.3, achieves fault diagnosis by incorporating causality. Zhang et al. [119] proposed multi-block kernel PLS to monitor the continuous annealing process (CAP) case study, exploiting the fact that each of the 18 rolls in the process constitutes a block of variables. By monitoring each of the 18 blocks rather than the entire process as one, it becomes easier to diagnose the fault location. An equivalent multi-block multi-scale kernel PLS was used by Zhang and Hu [129] in the PenSim and electro-fused magnesia furnace (EFMF) case studies. Multi-block kernel ICA was proposed by Zhang and Ma [133] to monitor the CAP case study as well. Enhanced results for the CAP were achieved by Liu et al. [241] by using dynamic concurrent kernel CCA with multi-block analysis for fault isolation. Peng et al. [283] also used prior process knowledge of the TEP to partition the 33 process variables into 3 sub-blocks, each monitored by adaptive dynamic kernel ICA.
To perform block division when process knowledge is not available, Jiang and Yan [327] proposed mutual information (MI) based clustering. This idea was fused with kernel PCA based process monitoring by Jiang and Yan [180], Huang and Yan [245], and Deng et al. [287]. All these works used the TEP as a case study, and they consistently arrived at 4 sub-blocks for the TEP. For instance, in [245], the method initially produced 12 sub-blocks of variables, but 7 of these contained only one variable; hence, some sub-blocks were fused into others, yielding only 4 sub-blocks in the end. Another approach is to divide the process into blocks that give optimal fault detection performance, as proposed by Jiang et al. [198], who used the genetic algorithm and kernel PCA for optimization and performance evaluation, respectively. Differently from the above, Cai et al. [181] used kernel CCA to model the plant as a complex network and then used PCA for process monitoring. Li et al. [80,267] also proposed a hierarchical process modelling concept that separates the monitoring of linearly related variables from that of nonlinearly related variables. More recently, Yan et al. [284] used self-organizing maps (SOM) for block division, where the quality-related variables are monitored by kernel PLS and the quality-unrelated variables by kernel PCA.
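As a rough illustration of MI-based block division (not the exact procedure of Jiang and Yan [327]), the sketch below estimates pairwise mutual information between process variables and then groups them by hierarchical clustering; the synthetic data and the number of blocks are illustrative.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def mi_blocks(X, n_blocks, random_state=0):
    """Group process variables into blocks by clustering a pairwise MI matrix."""
    M = X.shape[1]
    mi = np.zeros((M, M))
    for j in range(M):
        mi[:, j] = mutual_info_regression(X, X[:, j], random_state=random_state)
    mi = (mi + mi.T) / 2.0                     # symmetrize the MI estimates
    np.fill_diagonal(mi, 0.0)
    dist = mi.max() - mi                       # high MI -> small distance between variables
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="average")
    return fcluster(Z, t=n_blocks, criterion="maxclust")

# Illustrative data: three latent sources, each driving a pair of measured variables
rng = np.random.default_rng(0)
s = rng.normal(size=(500, 3))
X = np.column_stack([s[:, 0], s[:, 0] + 0.1 * rng.normal(size=500),
                     s[:, 1], s[:, 1] ** 2,
                     s[:, 2], np.sin(s[:, 2])])
print(mi_blocks(X, n_blocks=3))                # variables driven by the same source share a label
```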
For a systematic review of plant-wide monitoring methods, the reader can refer to Ge [33].
4.12. Advanced Methods: Ensembles and Deep Learning
Ensemble learning and deep learning are two emerging concepts that have now become standard in the AI community [40]. The idea of ensemble learning is to build an enhanced model by combining the strengths of many simpler models [308]. The case for using ensembles is strong due to the many data science competitions that were won by exploiting the concept. For example, the winner of the Netflix Prize for a video recommender system was an ensemble of more than 100 learners [40], the winner of the Higgs Boson machine learning challenge was an ensemble of 70 deep neural networks that differed in initialization and training data sets [328], and it was reported that 17 out of the 29 challenges published on the machine learning competition site Kaggle in 2015 alone were won by an ensemble learner called XGBoost [329]. Meanwhile, deep learning methods are general-purpose learning procedures for the automatic extraction of features using a multi-layer stack of input-output mappings [52]. Because features are learned automatically, the task of designing feature extractors by hand, which would otherwise require domain expertise, is avoided. The case for using deep learners is strengthened by the fact that they have beaten many records in computer vision tasks, natural language processing tasks, video games, etc. [52,330]. In the process monitoring community, ensemble and deep architectures have also started appearing among kernel-based methods.
In 2015, Li and Yang [167] proposed an ensemble kernel PCA strategy wherein the base learners are kernel PCA models with various RBF kernel widths. For the TEP, 11 base models with different kernel widths were used, giving better detection rates than a single RBF kernel alone. Later on, Deng et al. [220] proposed Deep PCA by stacking together linear PCA and kernel PCA mappings. Bayesian inference was used to consolidate the monitoring statistics from each layer, so that a single final result is obtained. Using the TEP as a case study, the detection rates of a 2-layer Deep PCA model were shown to improve upon those of linear PCA and kernel PCA alone. Further work in [256] used more layers in Deep PCA, as well as the FVS scheme (see Section 4.8) for reducing the computational cost. Deng et al. [257] also proposed serial PCA, where kernel PCA is performed on the residual space of an initial linear PCA transformation. In that work, the similarity factor method was also used for fault classification (see Section 4.3.2). A different way to hybridize PCA and kernel PCA is by parallel instead of serial means, as proposed by Jiang and Yan [261]. Meanwhile, Li et al. [80,267] also used multi-level hierarchical models involving both linear PCA and kernel PCA. More recently, ensemble kernel PCA was fused with local structure analysis by Cui et al. [273] for manifold learning (see Section 4.9).
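The following minimal sketch shows one simple way to combine the outputs of several base detectors, either by voting or by averaging their UCL-normalized statistics; it is not the Bayesian inference scheme of Deng et al. [220], and all names and values are illustrative.

```python
import numpy as np

def ensemble_alarm(stats, ucls, rule="vote"):
    """Combine monitoring statistics from several base detectors.
    stats: array of shape (n_detectors, n_samples); ucls: array of shape (n_detectors,).
    'vote' raises an alarm when a majority of detectors exceed their own limits;
    'mean' averages the UCL-normalized statistics and compares the result against 1."""
    exceed = stats > ucls[:, None]
    if rule == "vote":
        return exceed.mean(axis=0) > 0.5
    return (stats / ucls[:, None]).mean(axis=0) > 1.0

# Illustrative use with three hypothetical base detectors on ten test samples
rng = np.random.default_rng(0)
stats = np.abs(rng.normal(size=(3, 10))) * np.array([1.0, 2.0, 0.5])[:, None]
ucls = np.array([2.0, 4.0, 1.0])
print(ensemble_alarm(stats, ucls, rule="vote"))
```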
We refer the reader to Lee et al. [9] for a more general outlook of the implications of advanced learning models to the process systems engineering field.
5. A Future Outlook on Kernel-Based Process Monitoring
Despite the many advances in kernel-based process monitoring research, more challenges are still emerging. It is likely that kernel methods, and other machine learning tools, as presented in Figure 4, will have a role in addressing these challenges towards safer operations in the industry. A few of these challenges are discussed as follows.
5.1. Handling Heterogeneous and Multi-Rate Data
As introduced in Section 2, plant data sets are said to consist of N samples of M process variables. However, process measurements are not the only source of plant data. Process monitoring can also benefit from image data analytics, video data analytics, and alarm analytics. One notable work by Feng et al. [262] used kernel ICA to analyze video information for process monitoring. A more recent integration of alarm analytics into fault detection and identification was also developed by Lucke et al. [331]. Aside from these, spectroscopic data could be another information source from the plant, since such data are used for elucidating chemical structure. In addition, process monitoring can also be improved by combining information from both low- and high-frequency process measurements. Most of the case studies in the papers reviewed here generate only low-frequency data, e.g., a 3 min sampling interval for the TEP. However, there also exist data from pressure transducers (5 kHz), vibration measurements (0.5 Hz–10 kHz), and so on. Ruiz-Carcel et al. [332], for instance, have combined such multi-rate data to perform fault detection and diagnosis using CVA. It is projected that more efforts to handle heterogeneous and multi-rate data will appear in the future.
Although the above issues are recognized, the way forward is to first establish benchmark case studies that exhibit heterogeneous and multi-rate data. This will help ensure that new methods for handling these issues can be compared fairly. One such data set, collected from a real-world multiphase flow facility, has been generated and made publicly available by Stief et al. [333]. For more details about the data set and how to acquire it, see the above reference.
5.2. Performing Fault Prognosis
Fault detection and diagnosis are the main objectives of the papers found in this review. As noted in Section 1, the third component of process monitoring is fault prognosis. After detecting and localizing the fault, prognosis methods aim to predict the future behavior of the process under faulty conditions. If the fault can lead to process failure, it is important to know in advance when that failure would happen, along with a measure of the uncertainty of this estimate. This quantity is known as the remaining useful life or time-to-failure of the process [334]. Once these quantities are computed, the appropriate maintenance or repair actions can be performed, and hence failure or emergency situations can be prevented.
To perform prognosis, the first step is to extract an incipient fault signal from the measured variables that is separated from noise and other disturbances as clearly as possible. This means that the method used for feature extraction should handle the incipient fault detection issue very well (see Section 4.5). Secondly, the drifting behavior of the incipient fault must be extrapolated into the future using a predictive model. This predictive element is key to the prognosis performance. The model must have a satisfactory extrapolation ability, that is, the ability to make reliable predictions beyond the data space where it was initially trained [20]. For instance, a detection model based on the widely used RBF kernel would have poor extrapolation abilities, as noted in Pilario et al. [67]. To solve this, a mixture of the RBF and the POLY kernels was used to improve both interpolation and extrapolation abilities. These kernels were adopted into kernel CVA for incipient fault monitoring. Another kernel method for prediction is Gaussian Processes (GP), which was used by Ge [335] under the PCA framework. Also, Ma et al. [265] used the fault reconstruction approach in kernel ICA to generate fault signals for prediction. Meanwhile, Xu et al. [186] used a neural network for prediction, together with local kernel PCA based monitoring.
Despite these efforts, predictive tasks are generally considered difficult, especially in nonlinear dynamic processes. For nonlinear processes, predictions will be inaccurate if the hypothesis space of the assumed predictive model is not sufficient to capture the complex process behavior. And even if the hypothesis space is sufficient, enough training data must be acquired to search the correct model within the hypothesis space. However, training data is scarce during the initial stage of process degradation. In other words, it is difficult to determine whether the future trend would be linear, exponential, or any other shape on the basis of only a few degradation samples. Furthermore, a process is dynamic if its behavior at one point in time depends on its behavior at a previous time. This means that if the current prediction is fed into a dynamic model to serve as input for the next prediction, then small errors will accumulate as predictions are made farther into the future. It is important to be aware of these issues when developing fault prognosis strategies for industrial processes.
5.3. Developing More Advanced Methods and Improving Kernel Designs
Due to the recent advances in AI research, more and more process monitoring methods that rely on ensembles and deep architectures are expected to appear in the future (see Section 4.12). As mentioned in Section 2.3, both kernel methods and deep ANNs can be exploited, possibly in combined form, in order to create more expressive models. In addition, more creative kernel designs can be used, especially via the multiple kernel learning approach as noted in [67,163,277]. Multiple kernels can be created by combining single kernels additively or multiplicatively while still satisfying Mercer’s conditions [44,306]. The combination of kernels can be done in series, in parallel, or both. For instance, the proposed serial PCA [257] and deep PCA [220] architectures can pave the way for deep kernel learning for process monitoring. Also, the concept of automatic relevance determination [314] can be considered in future works, wherein the Gaussian kernel width is allowed to have different values in each dimension of the data space. New kernel designs can also be inspired by the challenge of handling heterogeneous data, as mentioned in Section 5.1. Many examples of kernel designs for other types of data have already been used [44], such as for strings of text, images, gene expressions (bioinformatics), and categorical data. Hence, new kernel designs for heterogeneous process data may be inspired by these examples.
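Returning to the idea of combining kernels, the following is a minimal sketch of a mixed RBF + POLY kernel (parameter values are illustrative, and the bandwidth/degree conventions are assumptions of this sketch): non-negative sums and products of Mercer kernels remain valid Mercer kernels, which can be checked numerically by verifying that the resulting kernel matrix is symmetric positive semi-definite.

```python
import numpy as np

def rbf_kernel(X, Y, c):
    """Gaussian RBF kernel with bandwidth c."""
    sq = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-sq / c)

def poly_kernel(X, Y, d=2, r=1.0):
    """Polynomial kernel of degree d with offset r."""
    return (X @ Y.T + r) ** d

def mixed_kernel(X, Y, alpha=0.5, c=10.0, d=2, r=1.0):
    """Convex combination of RBF and POLY kernels; non-negative sums (and products)
    of Mercer kernels remain valid Mercer kernels."""
    return alpha * rbf_kernel(X, Y, c) + (1.0 - alpha) * poly_kernel(X, Y, d, r)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
K = mixed_kernel(X, X)
# A valid kernel matrix is symmetric and positive semi-definite
print(np.allclose(K, K.T), np.linalg.eigvalsh(K).min() >= -1e-8)
```

Any kernel MSPM method can accept such a composite kernel in place of a single kernel, with the mixing weight alpha treated as an additional parameter to be selected.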
In parallel with these developments, more careful approaches to kernel parameter selection, such as cross-validation and optimization techniques, must be adopted. To ensure that new results can be replicated and verified, we encourage researchers to always state the kernel functions chosen, the kernel parameter selection route, and how all other settings were obtained in their methods. The repeatability of results strengthens the understanding of new concepts, which in turn leads to newer concepts more quickly. Hence, these efforts are necessary to further the development of the next generation of methods for fault detection, fault diagnosis, and fault prognosis in industrial plants.
It is important to note, however, that the development of new methods must be driven by the needs of the industry rather than for the sake of simply implementing new techniques. This means that, although it is tempting to develop a sophisticated method that can handle all the issues discussed in this article, it is more beneficial to understand the case study and the characteristics of the plant data at hand so that the right solutions are delivered to the end users.
6. Conclusions
In this paper, we reviewed the applications of kernel methods to nonlinear process monitoring. This paper first discussed the relationship between kernel methods and other techniques from machine learning, most notably neural networks. Within this context, we gave motivations for why kernel methods are worth considering for nonlinear feature extraction from industrial plant data.
Based on 230 collected papers from 2004 to 2019, this article then identified 12 major issues that researchers aim to address regarding the use of kernel methods as feature extractors. We discussed issues such as how to choose the kernel function, how to select kernel parameters, how to perform fault diagnosis in the kernel feature space, how to compute kernel mappings faster, how to make the kernel computation adaptive, how to learn manifolds or local structures, and how to benefit from ensembles and deep architectures. The remaining topics include how to handle batch process data, how to account for process dynamics, how to monitor quality variables, how to improve detection, and how to distribute the monitoring task across the whole plant. In addressing these issues, nonlinear process monitoring research has progressed extensively in the last 15 years through the impact of kernel methods.
Finally, potential future directions for kernel-based process monitoring research were presented. Emerging topics on new kernel designs, handling heterogeneous data, and performing fault prognosis were deemed worthwhile to investigate. To move forward, we encourage more researchers to venture into this area of process monitoring. For interested readers, this article is also supplemented by MATLAB codes for SVM and kernel PCA (see Figure 3 and Ref. [315]), which have been made available to the public. We hope that this article can contribute to the further understanding of the role of kernel methods in process monitoring and provide new insights for researchers in the field.
Author Contributions
Conceptualization, K.E.P.; data curation, K.E.P.; formal analysis, K.E.P.; funding acquisition, K.E.P. and Y.C.; investigation, K.E.P.; methodology, K.E.P.; project administration, K.E.P. and M.S.; resources, K.E.P. and M.S.; software, K.E.P.; supervision, M.S., Y.C., and L.L.; validation, K.E.P.; visualization, K.E.P.; writing—original draft preparation, K.E.P.; writing—review and editing, M.S., Y.C., L.L., and S.-H.Y. All authors have read and agreed to the published version of the manuscript.
Funding
This research is supported by the Faculty Development Fund of the Engineering Research and Development for Technology (ERDT) program of the Department of Science and Technology (DOST), Philippines. Support from the National Key Research and Development Plan (2018YFC0214102) of P. R. China is also acknowledged.
Conflicts of Interest
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.
Abbreviations
The following abbreviations are used in the manuscript text:
AI | Artificial Intelligence | MI | Mutual Information |
ANN | Artificial Neural Network | MSPM | Multivariate Statistical Process Monitoring |
CNKI | China National Knowledge Infrastructure | PSE | Process Systems Engineering |
DTW | Dynamic Time Warping | RKHS | Reproducing Kernel Hilbert Space |
EWMA | Exponentially Weighted Moving Average | SCADA | Supervisory Control and Data Acquisition |
FVS | Feature Vector Selection | SDG | Signed Digraph |
GA | Genetic Algorithm | SFM | Similarity Factor Method |
GLRT | Generalized Likelihood Ratio Test | SOM | Self-organizing Maps |
GP | Gaussian Processes | SPA | Statistical Pattern Analysis |
KDE | Kernel Density Estimation | SVDD | Support Vector Data Description |
kNN | k-Nearest Neighbors | SVM | Support Vector Machine |
KPCA | Kernel Principal Components Analysis | UCL | Upper Control Limit |
Abbreviations of the kernelized methods in Table 3 are as follows:
AMD | Augmented Mahalanobis distance | ICA | Independent components analysis |
C-PLS | Concurrent partial least squares | K-means | K-means clustering |
CCA | Canonical correlation analysis | LLE | Local linear embedding |
CVA | Canonical variate analysis | LPP | Locality preserving projections |
DD | Direct decomposition | LS | Least squares |
DISSIM | Dissimilarity analysis | MVU | Maximum variance unfolding |
DL | Dictionary learning | NNMF | Non-negative matrix factorization |
DLV | Dynamic latent variable model | PCA | Principal components analysis |
ECA | Entropy components analysis | PCR | Principal component regression |
EDA | Exponential discriminant analysis | PLS | Partial least squares |
ELM | Extreme learning machine | RPLVR | Robust probability latent variable regression |
FDA | Fisher discriminant analysis | SDA | Scatter-difference-based discriminant analysis |
FDFDA | Fault-degradation-oriented FDA | SFA | Slow feature analysis |
GLPP | Global-local preserving projections | T-PLS | Total partial least squares |
GMM | Gaussian mixture model | VCA | Variable correlations analysis |
Abbreviations of the case studies in Table 3 are as follows:
AEP | Aluminum electrolysis process | HGPWLTP | Hot galvanizing pickling waste liquor treatment process |
AIRLOR | Air quality monitoring network | | |
BAFP | Biological anaerobic filter process | HSMP | Hot strip mill process |
BDP | Butane distillation process | IGT | Industrial gas turbine |
CAP | Continuous annealing process | IMP | Injection moulding process |
CFPP | Coal-fired power plant | IPOP | Industrial p-xylene oxidation process |
CLG | Cyanide leaching of gold | MFF | Multiphase flow facility |
CPP | Cigarette production process | NE | Numerical example |
CSEC | Cad System in E. coli | NPP | Nosiheptide production process |
CSTH | Continuous stirred-tank heater | PCBP | Polyvinyl chloride batch process |
CSTR | Continuous stirred-tank reactor | PenSim | Penicillin fermentation process |
DMCP | Dense medium coal preparation | PP | Polymerization process |
DP | Drying process | PV | Photovoltaic systems |
DTS | Dissolution tank system | RCP | Real chemical process |
EFMF | Electro-fused magnesia furnace | SEP | Semiconductor etch process |
FCCU | Fluid catalytic cracking unit | TEP | Tennessee Eastman plant |
GCND | Genomic copy number data | TPP | Thermal power plant |
GHP | Gold hydrometallurgy process | TTP | Three-tank process |
GMP | Glass melter process | WWTP | Wastewater treatment plant |
Abbreviations of kernel functions in Table 3 are as follows:
RBF | Gaussian radial basis function kernel | HK | Heat kernel |
POLY | Polynomial kernel | SIG | Sigmoid kernel |
COS | Cosine kernel | NSDC | Non-stationary discrete convolution kernel |
WAV | Wavelet kernel |
Figures and Tables
Figure 1. Three categories of process monitoring methods. See [1,6] for more details.
Figure 2. Basic steps of typical Multivariate Statistical Process Monitoring (MSPM) methods to achieve fault detection. Here, the feature extraction step shows only a linear transformation of data.
Figure 3. Illustration of the kernel nonlinear transformation. These plots were generated with code available at https://uk.mathworks.com/matlabcentral/fileexchange/65232-binary-and-multi-class-svm.
Figure 4. Machine learning methods relevant to process monitoring (from the authors’ perspective). Those with (*) have versions that belong to the family of kernel methods.
Figure 6. (a) Commonly used kernelized methods found in the review; (b) Breakdown of the type of case studies found in the review.
Figure 8. (a) Number of papers citing the use of each kernel function; (b) Number of papers citing the use of each kernel parameter selection route. Note: Papers can appear in more than one column; hence, the numbers will not add up to 230 (the total number of reviewed papers).
Figure 9. Illustration of manifold learning: (a) S-curve data set; (b) 2-D Kernel principal components analysis (PCA) projection using radial basis function (RBF) kernel, c=10; (c) 2-D Local linear embedding (LLE) using kNN, k=15. See [319] for more details.
Table 1. Other recent reviews and their relationship to the present review.
Year | Reference | Remark |
---|---|---|
2012 | Qin [25] | Discusses the general issues and explains how basic data-driven process monitoring (MSPM) methods work. |
2012 | MacGregor and Cinar [26] | Reviews data-driven models not only in process monitoring, but also in optimization and control. |
2013 | Ge et al. [6] | Reviews data-driven process monitoring using recent MSPM tools and discusses more recent issues. |
2014 | Yin et al. [27] | Reviews data-driven process monitoring but from an application point of view; it also provides a basic monitoring framework. |
2014 | Ding et al. [28] | Reviews data-driven process monitoring methods with specific focus on dynamic processes. |
2014 | Qin [15] | Gives an overview of process data analytics, in which process monitoring is only one of the applications. |
2015 | Yin et al. [29] | Reviews data-driven methods not only in industrial processes, but also in smart grids, energy, and power systems, etc. |
2015 | Severson et al. [30] | Gives an overview of process monitoring in a larger context than just data-driven methods, and advocates hybrid methods. |
2016 | Tidriri et al. [31] | Compares physics-driven and data-driven process monitoring methods, and reviews recent hybrid approaches. |
2016 | Yin and Hou [32] | Reviews process monitoring methods that used support vector machines (SVM) for electro-mechanical systems. |
2017 | Lee et al. [9] | Reviews recent progresses and implications of machine learning to the field of PSE. |
2017 | Ge et al. [11] | Reviews data-driven methods in the process industries from the point of view of machine learning. |
2017 | Ge [33] | Reviews data-driven process monitoring methods with specific focus on dealing with the issues on the plant-wide scale. |
2017 | Wang et al. [34] | Reviews MSPM algorithms from 2008 to 2017, including both papers and patents in Web of Science, IEEE Xplore, and the China National Knowledge Infrastructure (CNKI) databases. |
2018 | Md Nor et al. [35] | Reviews data-driven process monitoring methods with guidelines for choosing which MSPM and machine learning tools to use. |
2018 | Alauddin et al. [36] | Gives a bibliometric review and analysis of the literature on data-driven process monitoring. |
2019 | Qin and Chiang [16] | Reviews machine learning and AI in PSE and advocates the integration of data analytics to chemical engineering curricula. |
2019 | Jiang et al. [37] | Reviews data-driven process monitoring methods with specific focus on distributed MSPM tools for plant-wide monitoring. |
2019 | Quiñones-Grueiro et al. [38] | Reviews data-driven process monitoring methods with specific focus on handling the multi-mode issue. |
This paper | Reviews data-driven process monitoring methods that applied kernel methods for feature extraction. |
Table 2. Issues surrounding the use of kernel methods for process monitoring.
Label | Name of Issue | No. of Papers That Addressed It |
---|---|---|
A | Batch process monitoring | 30 |
B | Dynamics, multi-scale, and multi-mode monitoring | 72 |
C | Fault diagnosis in the kernel feature space | 100 |
D | Handling non-Gaussian noise and outliers | 41 |
E | Improved sensitivity and incipient fault detection | 39 |
F | Quality-relevant monitoring | 37 |
G | Kernel design and kernel parameter selection | 30 |
H | Fast computation of kernel features | 34 |
I | Manifold learning and local structure analysis | 20 |
J | Time-varying behavior and adaptive kernel computation | 26 |
K | Multi-block and distributed monitoring | 15 |
L | Advanced methods: Ensembles and Deep Learning | 8 |
Table 3. Summary of papers: the issues they addressed and the kernelized methods, case studies, and kernel functions they used.
No. | Year | Reference | Kernelized Method/s | A | B | C | D | E | F | G | H | I | J | K | L | Case Studies | Kernel/s Used |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 2004 | Lee et al. [24] | PCA | First application | NE, WWTP | RBF | |||||||||||
2 | 2004 | Lee et al. [71] | PCA | ✓ | PenSim | POLY | |||||||||||
3 | 2004 | Choi and Lee [85] | PCA | ✓ | NE, WWTP | RBF | |||||||||||
4 | 2005 | Choi et al. [86] | PCA | ✓ | NE, CSTR | RBF | |||||||||||
5 | 2005 | Cho et al. [87] | PCA | ✓ | NE, CSTR | RBF | |||||||||||
6 | 2006 | Yoo and Lee [88] | PCA | ✓ | ✓ | NE, WWTP | RBF | ||||||||||
7 | 2006 | Lee et al. [89] | PCA, PLS | ✓ | BAFP | RBF | |||||||||||
8 | 2006 | Zhang et al. [90] | ICA | ✓ | FCCU | - | |||||||||||
9 | 2006 | Deng and Tian [91] | PCA | ✓ | ✓ | CSTR | RBF | ||||||||||
10 | 2007 | Zhang and Qin [72] | PCA, ICA | ✓ | ✓ | NPP | RBF | ||||||||||
11 | 2007 | Cho [74] | FDA | ✓ | ✓ | PCBP, PenSim | POLY | ||||||||||
12 | 2007 | Cho [92] | FDA | ✓ | TEP | RBF | |||||||||||
13 | 2007 | Sun et al. [93] | PCA | ✓ | ✓ | NE, Rot. Machines | RBF | ||||||||||
14 | 2008 | Choi et al. [94] | PCA | ✓ | ✓ | CSTR | RBF | ||||||||||
15 | 2008 | Tian and Deng [95] | PCA | ✓ | ✓ | TEP | RBF | ||||||||||
16 | 2008 | Wang et al. [96] | PCA | ✓ | ✓ | NPP | RBF | ||||||||||
17 | 2008 | Lee et al. [97] | ICA | ✓ | NE, TEP | RBF | |||||||||||
18 | 2008 | Cui et al. [98] | FDA | ✓ | NE, TEP | RBF, POLY | |||||||||||
19 | 2008 | Cui et al. [99] | SDA | ✓ | TEP | POLY | |||||||||||
20 | 2008 | Zhang and Qin [100] | ICA | ✓ | ✓ | TEP, WWTP, PenSim | RBF | ||||||||||
21 | 2008 | Lu and Wang [101] | PLS | ✓ | ✓ | ✓ | TEP | - | |||||||||
22 | 2008 | He et al. [102] | FDA | ✓ | ✓ | TEP | RBF | ||||||||||
23 | 2008 | Cho [103] | FDA | ✓ | TEP | POLY | |||||||||||
24 | 2008 | Li and Cui [104] | SDA | ✓ | ✓ | TEP | POLY | ||||||||||
25 | 2009 | Li and Cui [105] | FDA | ✓ | ✓ | ✓ | TEP, PenSim | POLY, COS | |||||||||
26 | 2009 | Zhang [106] | ICA | ✓ | ✓ | ✓ | TEP | RBF | |||||||||
27 | 2009 | Zhang and Zhang [107] | ICA, PLS | ✓ | ✓ | TEP, PenSim | RBF | ||||||||||
28 | 2009 | Shao et al. [108] | PCA | ✓ | ✓ | NE, TEP | RBF | ||||||||||
29 | 2009 | Shao and Rong [109] | MVU | ✓ | ✓ | TEP | Manifold | ||||||||||
30 | 2009 | Shao et al. [110] | LPP | ✓ | ✓ | NE, TEP | Manifold | ||||||||||
31 | 2009 | Tian et al. [73] | ICA | ✓ | ✓ | ✓ | PenSim | RBF, POLY | |||||||||
32 | 2009 | Liu et al. [111] | PCA | ✓ | ✓ | ✓ | NE, BDP | RBF | |||||||||
33 | 2009 | Ge et al. [112] | PCA | ✓ | ✓ | NE, TEP | RBF | ||||||||||
34 | 2009 | Zhao et al. [113] | DISSIM | ✓ | NE, TEP | RBF | |||||||||||
35 | 2009 | Zhao et al. [114] | ICA | ✓ | ✓ | ✓ | TTP, PenSim | RBF | |||||||||
36 | 2010 | Jia et al. [115] | PCA | ✓ | ✓ | NE, PenSim | RBF | ||||||||||
37 | 2010 | Cheng et al. [116] | PCA | ✓ | ✓ | ✓ | NE, TEP | RBF | |||||||||
38 | 2010 | Alcala and Qin [117] | PCA | ✓ | CSTR | RBF | |||||||||||
39 | 2010 | Zhu and Song [118] | FDA | ✓ | TEP | RBF | |||||||||||
40 | 2010 | Zhang et al. [119] | PLS | ✓ | ✓ | CAP | RBF | ||||||||||
41 | 2010 | Zhang et al. [120] | PCA | ✓ | ✓ | ✓ | NE, PenSim | RBF | |||||||||
42 | 2010 | Xu and Hu [121] | PCA | ✓ | ✓ | TEP | RBF | ||||||||||
43 | 2010 | Ge and Song [122] | PCA | ✓ | TEP | RBF | |||||||||||
44 | 2010 | Wang and Shi [123] | ICA (CCA) | ✓ | WWTP, TEP | RBF | |||||||||||
45 | 2010 | Sumana et al. [124] | SDA | ✓ | ✓ | NE, TEP | RBF | ||||||||||
46 | 2011 | Sumana et al. [125] | PCA | ✓ | TEP | RBF | |||||||||||
47 | 2011 | Khediri et al. [126] | PCA | ✓ | NE, TEP | RBF | |||||||||||
48 | 2011 | Zhang and Ma [127] | PCA, PLS | ✓ | CAP, EFMF | RBF | |||||||||||
49 | 2011 | Zhang and Hu [128] | PLS | ✓ | ✓ | CAP, PenSim | RBF | ||||||||||
50 | 2011 | Zhang and Hu [129] | PLS | ✓ | ✓ | ✓ | NE, PenSim, EFMF | RBF | |||||||||
51 | 2011 | Zhu and Song [130] | FDA | ✓ | TEP | RBF | |||||||||||
52 | 2011 | Yu [75] | FDA | ✓ | ✓ | ✓ | PenSim | RBF | |||||||||
53 | 2012 | Khediri et al. [131] | K-means | ✓ | NE, SEP | RBF | |||||||||||
54 | 2012 | Rashid and Yu [82] | ICA | ✓ | ✓ | ✓ | ✓ | PenSim | RBF | ||||||||
55 | 2012 | Zhang et al. [132] | PCA | ✓ | ✓ | CAP, PenSim | RBF | ||||||||||
56 | 2012 | Zhang and Ma [133] | ICA | ✓ | ✓ | ✓ | CAP | RBF | |||||||||
57 | 2012 | Zhang et al. [134] | PCA | ✓ | ✓ | NE, TEP, EFMF | RBF | ||||||||||
58 | 2012 | Zhang et al. [135] | PLS | ✓ | ✓ | ✓ | ✓ | PenSim | - | ||||||||
59 | 2012 | Yu [136] | GMM | ✓ | ✓ | ✓ | WWTP | RBF | |||||||||
60 | 2012 | Guo et al. [137] | PCA | ✓ | ✓ | TEP | WAV | ||||||||||
61 | 2012 | Jia et al. [84] | PCA | ✓ | NE, PenSim | RBF, POLY, SIG | |||||||||||
62 | 2012 | Sumana et al. [138] | PCA | ✓ | TEP | POLY | |||||||||||
63 | 2012 | Wang et al. [139] | PCA | ✓ | ✓ | ✓ | PenSim | POLY | |||||||||
64 | 2013 | Liu et al. [140] | ICA | ✓ | ✓ | ✓ | CLG | RBF | |||||||||
65 | 2013 | Peng et al. [141] | T-PLS | ✓ | NE, TEP, HSMP | RBF | |||||||||||
66 | 2013 | Peng et al. [79] | T-PLS | ✓ | ✓ | HSMP | RBF | ||||||||||
67 | 2013 | Wang et al. [142] | PCA | ✓ | ✓ | ✓ | ✓ | PenSim | POLY | ||||||||
68 | 2013 | Jiang and Yan [143] | PCA | ✓ | ✓ | NE, CSTR, TEP | RBF | ||||||||||
69 | 2013 | Jiang and Yan [144] | PCA | ✓ | NE, TEP | RBF | |||||||||||
70 | 2013 | Zhang et al. [145] | ICA | ✓ | ✓ | CAP | RBF | ||||||||||
71 | 2013 | Zhang et al. [146] | PLS | ✓ | ✓ | NE, EFMF | RBF | ||||||||||
72 | 2013 | Zhang et al. [147] | PCA | ✓ | ✓ | PenSim, EFMF | - | ||||||||||
73 | 2013 | Zhang et al. [76] | VCA | ✓ | ✓ | EFMF | RBF | ||||||||||
74 | 2013 | Deng and Tian [148] | PCA | ✓ | ✓ | ✓ | NE, TEP | RBF | |||||||||
75 | 2013 | Deng and Tian [149] | LPP | ✓ | ✓ | CSTR | RBF | ||||||||||
76 | 2013 | Deng et al. [150] | PCA | ✓ | ✓ | TEP | RBF | ||||||||||
77 | 2013 | Rong et al. [151] | LPP, FDA | ✓ | ✓ | ✓ | TEP, WWTP | RBF | |||||||||
78 | 2013 | Hu et al. [152] | PLS | ✓ | ✓ | PP, PenSim | RBF | ||||||||||
79 | 2013 | Hu et al. [153] | PLS | ✓ | ✓ | NE, TEP | RBF | ||||||||||
80 | 2014 | Fan and Wang [66] | ICA | ✓ | ✓ | ✓ | TEP | RBF | |||||||||
81 | 2014 | Fan et al. [154] | ICA | ✓ | ✓ | ✓ | ✓ | NE, TEP | RBF | ||||||||
82 | 2014 | Zhang et al. [155] | ICA | ✓ | ✓ | EFMF | - | ||||||||||
83 | 2014 | Zhang and Li [156] | PCA | ✓ | ✓ | EFMF | RBF | ||||||||||
84 | 2014 | Cai et al. [157] | ICA | ✓ | ✓ | NE, TEP | RBF | ||||||||||
85 | 2014 | Wang and Shi [158] | PLS | ✓ | ✓ | TEP | - | ||||||||||
86 | 2014 | Elshenawy and Mohamed [159] | PCA | ✓ | TEP | RBF | |||||||||||
87 | 2014 | Mori and Yu [160] | PCA, ICA, PLS | ✓ | ✓ | ✓ | PenSim | RBF | |||||||||
88 | 2014 | Castillo et al. [161] | PCA | ✓ | Air Heater | RBF | |||||||||||
89 | 2014 | Vitale et al. [69] | PCA, PLS, FDA | ✓ | ✓ | NE, PP, DP | RBF, POLY | ||||||||||
90 | 2014 | Peng et al. [162] | PCA | ✓ | ✓ | CSTR | RBF | ||||||||||
91 | 2014 | Zhao and Xue [163] | T-PLS | ✓ | ✓ | ✓ | TEP | RBF+POLY | |||||||||
92 | 2014 | Godoy et al. [164] | PLS | ✓ | ✓ | ✓ | NE | RBF | |||||||||
93 | 2014 | Kallas et al. [165] | PCA | ✓ | NE, CSTR | RBF | |||||||||||
94 | 2015 | Ciabattoni et al. [166] | CVA | ✓ | Microgrid | RBF | |||||||||||
95 | 2015 | Vitale et al. [81] | PCA | ✓ | ✓ | NE, DP, RCP | RBF, POLY | ||||||||||
96 | 2015 | Li and Yang [167] | PCA | ✓ | ✓ | NE, TEP | RBF | ||||||||||
97 | 2015 | Liu and Zhang [168] | PLS | ✓ | ✓ | NE, PenSim | RBF | ||||||||||
98 | 2015 | Md Nor et al. [169] | FDA | ✓ | ✓ | TEP | - | ||||||||||
99 | 2015 | Yao and Wang [170] | PCA | ✓ | ✓ | PenSim | RBF | ||||||||||
100 | 2015 | Wang and Yao [171] | PCA | ✓ | ✓ | NE, SEP | RBF | ||||||||||
101 | 2015 | Huang et al. [172] | CVA | ✓ | ✓ | TEP | RBF | ||||||||||
102 | 2015 | Zhang et al. [173] | PLS | ✓ | NE, EFMF | RBF | |||||||||||
103 | 2015 | Zhang et al. [174] | SFA | ✓ | NE, TEP | RBF | |||||||||||
104 | 2015 | Zhang et al. [175] | SFA, FDA | ✓ | CSTR | RBF | |||||||||||
105 | 2015 | Zhang et al. [176] | C-PLS | ✓ | PenSim | - | |||||||||||
106 | 2015 | Samuel and Cao [177] | CVA | ✓ | TEP | RBF | |||||||||||
107 | 2015 | Samuel and Cao [178] | CVA | ✓ | TEP | RBF | |||||||||||
108 | 2015 | Chakour et al. [179] | PCA | ✓ | ✓ | ✓ | TEP | RBF | |||||||||
109 | 2015 | Jiang and Yan [180] | PCA | ✓ | NE, TEP | RBF | |||||||||||
110 | 2015 | Cai et al. [181] | CCA | ✓ | ✓ | NE, TEP | RBF | ||||||||||
111 | 2015 | Luo et al. [182] | GLPP | ✓ | ✓ | NE, TEP | RBF, HK | ||||||||||
112 | 2015 | Tang et al. [77] | VCA | ✓ | ✓ | ✓ | PenSim | RBF | |||||||||
113 | 2015 | Bernal de Lazaro et al. [183] | PCA, FDA | ✓ | ✓ | TEP | RBF | ||||||||||
114 | 2016 | Bernal de Lazaro et al. [184] | PCA, ICA | ✓ | ✓ | ✓ | TEP | RBF | |||||||||
115 | 2016 | Ji et al. [185] | PCA | ✓ | NE | RBF | |||||||||||
116 | 2016 | Xu et al. [186] | PCA | ✓ | ✓ | ✓ | NE, TEP | - | |||||||||
117 | 2016 | Luo et al. [187] | GLPP | ✓ | NE, TEP | RBF | |||||||||||
118 | 2016 | Zhang et al. [188] | ICA | ✓ | ✓ | TEP | - | ||||||||||
119 | 2016 | Taouali et al. [189] | PCA | ✓ | CSTR | RBF | |||||||||||
120 | 2016 | Fazai et al. [190] | PCA | ✓ | CSTR, TEP | RBF | |||||||||||
121 | 2016 | Jaffel et al. [191] | PCA | ✓ | ✓ | TEP | RBF | ||||||||||
122 | 2016 | Mansouri et al. [192] | PCA | ✓ | NE, CSTR | - | |||||||||||
123 | 2016 | Botre et al. [193] | PLS | ✓ | CSTR | - | |||||||||||
124 | 2016 | Samuel and Cao [194] | PCA | ✓ | ✓ | TEP | RBF | ||||||||||
125 | 2016 | Ge et al. [195] | FDA | ✓ | CSTH, TEP | RBF | |||||||||||
126 | 2016 | Jia et al. [196] | PLS | ✓ | ✓ | ✓ | NE, HGPWLTP | RBF | |||||||||
127 | 2016 | Jia and Zhang [197] | PLS | ✓ | ✓ | NE, TEP | RBF | ||||||||||
128 | 2016 | Jiang et al. [198] | PCA | ✓ | ✓ | TEP, CSTR | RBF | ||||||||||
129 | 2016 | Peng et al. [199] | PLS, Fuzzy C-means | ✓ | ✓ | ✓ | ✓ | HSMP | RBF | ||||||||
130 | 2016 | Xie et al. [200] | PCA | ✓ | ✓ | NE, BDP | RBF | ||||||||||
131 | 2016 | Wang et al. [201] | PCR | ✓ | NE | RBF | |||||||||||
132 | 2016 | Huang and Yan [202] | PCA | ✓ | NE, TEP | RBF | |||||||||||
133 | 2016 | Xiao and Zhang [203] | PCA, ICA | ✓ | ✓ | TEP | RBF | ||||||||||
134 | 2016 | Feng et al. [204] | FDA | ✓ | ✓ | ✓ | TEP | RBF | |||||||||
135 | 2016 | Sheng et al. [205] | C-PLS | ✓ | NE, TEP | RBF | |||||||||||
136 | 2016 | Zhang et al. [206] | PLS, PCA | ✓ | ✓ | ✓ | CAP | RBF | |||||||||
137 | 2017 | Jaffel et al. [207] | PCA | ✓ | ✓ | CSTR, TEP | RBF | ||||||||||
138 | 2017 | Lahdhiri et al. [208] | PCA | ✓ | ✓ | NE, CSTR, AIRLOR | RBF | ||||||||||
139 | 2017 | Lahdhiri et al. [209] | PCA | ✓ | ✓ | NE, CSTR | RBF | ||||||||||
140 | 2017 | Mansouri et al. [210] | PLS | ✓ | ✓ | CSEC, GCND | RBF | ||||||||||
141 | 2017 | Mansouri et al. [211] | PCA | ✓ | CSEC | - | |||||||||||
142 | 2017 | Sheriff et al. [212] | PCA | ✓ | CSTR | RBF | |||||||||||
143 | 2017 | Cai et al. [213] | ICA | ✓ | ✓ | NE, TEP | RBF | ||||||||||
144 | 2017 | Zhang et al. [214] | ECA | ✓ | ✓ | ✓ | TEP | RBF | |||||||||
145 | 2017 | Zhang et al. [215] | SFA | ✓ | ✓ | ✓ | NE, PenSim | RBF | |||||||||
146 | 2017 | Zhang and Tian [216] | SFA | ✓ | ✓ | PenSim | POLY | ||||||||||
147 | 2017 | Zhang et al. [217] | PCA | ✓ | EFMF | - | |||||||||||
148 | 2017 | Zhang et al. [218] | PCA, LLE | ✓ | EFMF | - | |||||||||||
149 | 2017 | Zhang et al. [219] | PCA | ✓ | NE, SEP | RBF | |||||||||||
150 | 2017 | Deng et al. [220] | PCA | ✓ | ✓ | TEP | RBF | ||||||||||
151 | 2017 | Deng et al. [221] | PCA | ✓ | ✓ | NE, CSTR | RBF | ||||||||||
152 | 2017 | Deng et al. [222] | PCA, FDA | ✓ | ✓ | NE, CSTR | RBF | ||||||||||
153 | 2017 | Tan et al. [223] | CVA | ✓ | MFF | - | |||||||||||
154 | 2017 | Shang et al. [224] | CVA | ✓ | ✓ | CSTR | RBF | ||||||||||
155 | 2017 | Li et al. [225] | DLV | ✓ | HSMP | RBF | |||||||||||
156 | 2017 | Wang and Jiao [226] | LS | ✓ | NE, TEP | RBF | |||||||||||
157 | 2017 | Wang et al. [227] | DD | ✓ | NE, TEP | RBF | |||||||||||
158 | 2017 | Wang et al. [228] | EDA | ✓ | ✓ | PenSim | RBF | ||||||||||
159 | 2017 | Jiao et al. [229] | PLS | ✓ | NE, TEP | RBF | |||||||||||
160 | 2017 | Huang and Yan [230] | PCA | ✓ | NE, TEP | RBF | |||||||||||
161 | 2017 | Yi et al. [231] | PLS | ✓ | ✓ | TEP, AEP | - | ||||||||||
162 | 2017 | Md Nor et al. [232] | FDA | ✓ | ✓ | TEP | RBF | ||||||||||
163 | 2017 | Du et al. [233] | ICA | ✓ | EFMF | - | |||||||||||
164 | 2017 | Zhang and Zhao [234] | PCA, Fuzzy C-means | ✓ | ✓ | TEP, MFF | RBF | ||||||||||
165 | 2017 | Zhou et al. [235] | RPLVR | ✓ | ✓ | NE, TEP | - | ||||||||||
166 | 2017 | Gharahbagheri et al. [236] | PCA | ✓ | DTS, FCCU, TEP | RBF | |||||||||||
167 | 2017 | Gharahbagheri et al. [237] | PCA | ✓ | NE, FCCU, TEP | RBF | |||||||||||
168 | 2017 | Fu et al. [68] | PCA, PLS | ✓ | NE, GMP, BDP, Mixing | RBF | |||||||||||
169 | 2017 | Galiaskarov et al. [238] | FDA | ✓ | ✓ | Pyrolysis gas furnace | POLY | ||||||||||
170 | 2017 | Zhu et al. [239] | ICA | ✓ | ✓ | ✓ | TEP | RBF, POLY, SIG | |||||||||
171 | 2017 | Zhu et al. [240] | CCA | ✓ | TEP | RBF | |||||||||||
172 | 2018 | Liu et al. [241] | CCA | ✓ | ✓ | ✓ | ✓ | CAP | RBF | ||||||||
173 | 2018 | Wang and Jiao [242] | PLS | ✓ | NE, TEP | RBF | |||||||||||
174 | 2018 | Wang [243] | PLS | ✓ | ✓ | NE, CSTR | RBF | ||||||||||
175 | 2018 | Huang and Yan [244] | PCA | ✓ | NE, TEP | RBF | |||||||||||
176 | 2018 | Huang and Yan [245] | PCA | ✓ | ✓ | ✓ | NE, TEP, IPOP | RBF | |||||||||
177 | 2018 | Fezai et al. [246] | PCA | ✓ | ✓ | NE, TEP | RBF | ||||||||||
178 | 2018 | Fezai et al. [247] | PCA | ✓ | ✓ | ✓ | ✓ | AIRLOR | RBF | ||||||||
179 | 2018 | Mansouri et al. [248] | PCA | ✓ | ✓ | NE, CSEC | RBF | ||||||||||
180 | 2018 | Jaffel et al. [249] | PCA | ✓ | ✓ | ✓ | CSTR | RBF | |||||||||
181 | 2018 | Lahdhiri et al. [250] | PCA | ✓ | ✓ | ✓ | NE, TEP | RBF | |||||||||
182 | 2018 | Tan and Cao [251] | PCA | ✓ | NE, TEP | RBF | |||||||||||
183 | 2018 | He et al. [252] | LPP | ✓ | ✓ | ✓ | PenSim, HSMP | RBF | |||||||||
184 | 2018 | Navi et al. [253] | PCA | ✓ | ✓ | ✓ | IGT | RBF | |||||||||
185 | 2018 | Chakour et al. [254] | PCA | ✓ | TEP, Weather station | RBF | |||||||||||
186 | 2018 | Deng and Wang [255] | PCA | ✓ | NE, TEP | RBF | |||||||||||
187 | 2018 | Deng et al. [256] | PCA | ✓ | ✓ | ✓ | NE, TEP | RBF, POLY | |||||||||
188 | 2018 | Deng et al. [257] | PCA | ✓ | ✓ | NE, TEP | RBF | ||||||||||
189 | 2018 | Deng et al. [258] | FDA | ✓ | ✓ | ✓ | TEP | RBF | |||||||||
190 | 2018 | Zhang et al. [259] | SFA | ✓ | ✓ | ✓ | NE, CSTR | RBF | |||||||||
191 | 2018 | Shang et al. [260] | AMD | ✓ | NE, TEP | POLY | |||||||||||
192 | 2018 | Jiang and Yan [261] | PCA | ✓ | NE, CSTR | - | |||||||||||
193 | 2018 | Feng et al. [262] | ICA | ✓ | ✓ | EFMF | RBF | ||||||||||
194 | 2018 | Zhao and Huang [263] | PCA, DISSIM | ✓ | TPP, CPP | RBF | |||||||||||
195 | 2018 | Zhai et al. [264] | NNMF | ✓ | PenSim | - | |||||||||||
196 | 2018 | Ma et al. [265] | ICA | ✓ | ✓ | TEP | RBF | ||||||||||
197 | 2018 | Lu et al. [266] | CVA, LPP, FDA | ✓ | ✓ | ✓ | TEP | HK | |||||||||
198 | 2018 | Li et al. [267] | PCA | ✓ | ✓ | NE, CPP | - | ||||||||||
199 | 2018 | Chu et al. [268] | PLS | ✓ | ✓ | DMCPP | RBF | ||||||||||
200 | 2019 | Zhai and Jia [269] | NNMF | ✓ | NE, PenSim | RBF | |||||||||||
201 | 2019 | Fezai et al. [270] | PCA | ✓ | ✓ | ✓ | PV | RBF | |||||||||
202 | 2019 | Fazai et al. [271] | PLS | ✓ | ✓ | ✓ | TEP | RBF | |||||||||
203 | 2019 | Deng and Deng [272] | PCA | ✓ | ✓ | NE, TEP | RBF | ||||||||||
204 | 2019 | Cui et al. [273] | PCA | ✓ | ✓ | NE, TEP | RBF, Manifold | ||||||||||
205 | 2019 | Pilario et al. [67] | CVA | ✓ | ✓ | ✓ | ✓ | NE, CSTR | RBF+POLY | ||||||||
206 | 2019 | Lahdhiri et al. [274] | PCA | ✓ | ✓ | ✓ | ✓ | AIRLOR | RBF | ||||||||
207 | 2019 | Liu et al. [275] | ICA | ✓ | ✓ | ✓ | GHP | RBF | |||||||||
208 | 2019 | Liu et al. [276] | ICA | ✓ | ✓ | ✓ | TEP | RBF | |||||||||
209 | 2019 | Yu et al. [277] | CCA | ✓ | ✓ | ✓ | NE, TEP | RBF | |||||||||
210 | 2019 | Guo et al. [278] | PCA | ✓ | ✓ | NE, TEP | RBF | ||||||||||
211 | 2019 | Wu et al. [279] | PCA | ✓ | ✓ | ✓ | NE, TEP | RBF | |||||||||
212 | 2019 | Harkat et al. [280] | PCA | ✓ | NE, TEP | RBF | |||||||||||
213 | 2019 | Ma et al. [281] | CVA, EDA | ✓ | ✓ | ✓ | ✓ | HSMP | - | ||||||||
214 | 2019 | Zhang et al. [282] | ELM | ✓ | NE, CSTR | RBF | |||||||||||
215 | 2019 | Peng et al. [83] | ECA | ✓ | ✓ | ✓ | NE, PenSim | RBF | |||||||||
216 | 2019 | Peng et al. [283] | ICA, EDA | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | TEP | - | ||||||
217 | 2019 | Yan et al. [284] | PCA, PLS | ✓ | ✓ | NE, TEP | RBF | ||||||||||
218 | 2019 | Huang et al. [285] | DL | ✓ | NE, CSTH, AEP | RBF | |||||||||||
219 | 2019 | Li and Zhao [80] | FDFDA | ✓ | ✓ | ✓ | NE, IMP, CFPP | RBF | |||||||||
220 | 2019 | Zhou et al. [286] | PCA | ✓ | NE, TEP | RBF | |||||||||||
221 | 2019 | Deng et al. [287] | PCA | ✓ | ✓ | TEP | RBF | ||||||||||
222 | 2019 | Wang et al. [288] | PCA | ✓ | ✓ | ✓ | CSTR, HSMP | RBF | |||||||||
223 | 2019 | Zhu et al. [289] | PLS | ✓ | ✓ | ✓ | TEP | RBF | |||||||||
224 | 2019 | Xiao [290] | CVA, LPP | ✓ | ✓ | TEP | HK | ||||||||||
225 | 2019 | Xiao [291] | CVA | ✓ | ✓ | TEP | RBF | ||||||||||
226 | 2019 | Shang et al. [292] | PCA | ✓ | TEP | RBF | |||||||||||
227 | 2019 | Geng et al. [293] | PCA | ✓ | ✓ | TEP | RBF | ||||||||||
228 | 2019 | Md Nor et al. [294] | FDA | ✓ | ✓ | TEP | - | ||||||||||
229 | 2019 | Tan et al. [295] | PCA | ✓ | ✓ | NE, MFF | NSDC | ||||||||||
230 | 2019 | Tan et al. [296] | PCA | ✓ | ✓ | NE, MFF | NSDC |
© 2019 by the authors.
Abstract
Kernel methods are a class of learning machines for the fast recognition of nonlinear patterns in any data set. In this paper, the applications of kernel methods for feature extraction in industrial process monitoring are systematically reviewed. First, we describe the reasons for using kernel methods and contextualize them among other machine learning tools. Second, by reviewing a total of 230 papers, this work has identified 12 major issues surrounding the use of kernel methods for nonlinear feature extraction. Each issue was discussed as to why they are important and how they were addressed through the years by many researchers. We also present a breakdown of the commonly used kernel functions, parameter selection routes, and case studies. Lastly, this review provides an outlook into the future of kernel-based process monitoring, which can hopefully instigate more advanced yet practical solutions in the process industries.
1 Department of Energy and Power, Cranfield University, Bedfordshire MK43 0AL, UK;
2 Department of Energy and Power, Cranfield University, Bedfordshire MK43 0AL, UK;
3 College of Chemical and Biological Engineering, Zhejiang University, Hangzhou 310027, China;
4 Department of Energy and Power, Cranfield University, Bedfordshire MK43 0AL, UK;