1. Introduction
Prognostics and health management (PHM) is an important contributor to maintaining manufacturing productivity and has rapidly evolved from corrective maintenance to condition-based maintenance (CBM). Also known as predictive or data-driven maintenance, CBM bases maintenance decisions on the analysis of data gathered from equipment and sensors [1].
The established applications of CBM are monitoring, fault diagnosis, maintenance scheduling, and remaining useful life (RUL) prediction [2]. Among them, RUL prediction is the subject of growing interest because of its effectiveness in maintenance scheduling. The RUL can be defined as the difference between the time failure occurs and the current time, and RUL prediction estimates the RUL (or failure time) at the current time on the basis of collected signals, conditions, and so forth [3]. RUL prediction techniques can be categorized into physical model-driven and data-driven techniques [4].
Zhu and Yang [5] computed the thermo-elasto-plastic stress and strain fields of a turbine blade using finite element methods and predicted the fatigue life through the Basquin and Manson–Coffin formulae. They performed the thermal stress field analysis using Ansys software, diagnosed the thermal–mechanical stress for selected nodes, and then predicted the fatigue life from the maximum and mean stresses. The remaining lifetime was predicted by extracting the formula parameters from controlled stress test data. While they introduced the influence of limited external factors such as thermal and mechanical stress on the remaining life, we treat the state as a time series and predict the remaining life in a continuous situation probabilistically. Taheri and Taheri [6] studied the feasibility and technical design of implementing a combined heat and power system for the Mashhad power plant. They estimated the remaining life of gas turbines on the basis of available data using an approximate approach whose predictions are derived from predetermined elements or formulae. In addition, they used operation data reflecting the current status for prediction, whereas the method in this paper adopts a probabilistic approach and uses obtainable data rather than deriving a closed-form formula.
Even though a physical model is easy to understand and very accurate, it is almost impossible to formulate one for a modern industrial system. In contrast, data-driven techniques using machine learning (ML) and deep learning (DL) can model a very complex system with data collected from it. For this reason, this paper focuses on data-driven RUL prediction.
Recently, ML and DL models have been frequently applied to predict RUL after training them with run-to-failure data [7]. These models learn the degradation patterns and relationships between the pattern and the RUL from the data collected until the end of the life of the components and predict the RUL of target components. In many recent studies, ML and DL models have shown superior performance for RUL problems [8,9,10,11,12,13,14,15].
Data insufficiency is a major challenge when training ML- or DL-based RUL models. Because a time-series sample is collected only after a component fails, collecting training data tends to be time-consuming and expensive. A model trained with insufficient data can be overfitted [16] and fail to accurately predict the RUL of a new component. Collecting more data can mitigate overfitting but may not be an option because of the required time and cost.
In this paper, we propose a time-series data-generation method that avoids overfitting when training an RUL model. The proposed method generates a sample in a probabilistic manner on the basis of two existing samples, called parents. The generated samples are added to the training data, so that an RUL prediction model is trained with the union of the original and generated samples, which helps avoid overfitting and increases prediction performance. This approach is similar to SMOTE (Synthetic Minority Over-sampling Technique [17]), one of the most widely used oversampling methods, which generates minority-class samples between two randomly selected samples; accordingly, a generated sample may not resemble the original samples in terms of the relationships among features while still improving classification performance measures such as recall and F1-score.
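As a point of comparison, the core interpolation idea behind SMOTE can be sketched as follows. This is a minimal illustration, not the library implementation; the function name `smote_like_sample` is ours.

```python
import random

def smote_like_sample(x1, x2, rng=None):
    """Generate a synthetic sample on the line segment between two
    existing samples x1 and x2 (the core idea behind SMOTE)."""
    rng = rng or random.Random()
    t = rng.random()  # interpolation factor in [0, 1)
    return [a + t * (b - a) for a, b in zip(x1, x2)]

parents = ([1.0, 2.0], [3.0, 6.0])
child = smote_like_sample(*parents)
# each feature of the child lies between the corresponding parent features
```

Unlike this feature-space interpolation, the method proposed below operates on symbolic representations of whole time series, so the generated sample preserves temporal patterns of its parents.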
To the best of our knowledge, ours is the first attempt to address the data insufficiency problem when training an RUL prediction model. In this paper, we propose a method to transform a time series into an alphabetical sequence, and to inversely transform such a sequence, that can also be applied to time-series classification and clustering. The proposed data-generation method is not only efficient (i.e., inexpensive in terms of computational time) but also effective (i.e., the generated time series can help avoid overfitting).
The remainder of this paper is organized as follows. Section 2 introduces background theory and related works of RUL prediction and symbolic aggregate approximation. Section 3 proposes a time-series data-generation method for RUL prediction in detail, and Section 4 illustrates the process of the proposed method using a small example. Section 5 experimentally verifies the performance of the proposed method, and Section 6 draws conclusions and suggests future research directions.
2. Background Theory and Related Works
2.1. Remaining Useful Life Prediction
Training and usage process of the RUL prediction model proceeds as follows. First, a set of training time series samples is transformed by extracting features and the RUL such that:
(1)
where is a set of transformed time series in , and is a set of pairs (feature, RUL) extracted from :(2)
where is a vector of feature functions, is a part of the time series collected from time 1 to of sample , is the RUL of sample measured at , and is the length of . Second, a model is trained using , and finally, the RUL of the new sample at is predicted as . Many researchers have developed RUL prediction models based on ML or DL. For example, Ali et al. [18] introduced a root mean square entropy estimator (RMSEE), which is the entropy of the root mean square (RMS) of the windows, to capture bearing degradation and used it as a feature. They also converted the RUL prediction problem into a classification problem with seven classes according to the degradation rate (that is, class 1: under 10%, class 2: 10%–25%, class 3: 25%–40%, class 4: 40%–55%, class 5: 55%–70%, class 6: 70%–85%, and class 7: 85%–100%). Finally, they used a multi-layered perceptron (MLP) combined with a simplified fuzzy adaptive resonance theory map as a prediction model. Zheng et al. [19] pointed out that a traditional regression model using features from a window is not appropriate for RUL prediction because it does not fully consider sequence information, and that other sequence learning models also have flaws (e.g., hidden Markov models and recurrent neural networks do not consider long-term dependency among nodes). They proposed a deep long short-term memory (LSTM) network consisting of four layers: an input layer, a multi-layer LSTM, a multi-layer perceptron (MLP), and an output layer. In their experiment, MLP, support vector regression (SVR), relevance vector regression, and a convolutional neural network were compared in terms of root mean squared error (RMSE), with the deep LSTM exhibiting the smallest RMSE for four datasets.
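The seven-class binning of the degradation rate used by Ali et al. [18] can be sketched as follows, a hypothetical minimal version based only on the class boundaries quoted above:

```python
from bisect import bisect_right

def degradation_class(rate):
    """Map a degradation rate in [0, 1] to the seven classes of Ali et al. [18]:
    class 1 under 10%, classes 2-6 in 15%-wide bands, class 7 for 85%-100%."""
    bounds = [0.10, 0.25, 0.40, 0.55, 0.70, 0.85]  # upper edges of classes 1-6
    return min(bisect_right(bounds, rate) + 1, 7)
```

Such binning turns the continuous RUL regression target into a label a standard classifier can predict.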
Table 1 summarizes previous research that developed RUL prediction models using ML and DL, including convolutional neural networks (CNNs), in terms of domain, feature, and base model. As seen in Table 1, statistics such as RMS, kurtosis, and skewness are frequently used as features, and MLP, LSTM, and SVR are primarily used as base models.
2.2. Symbolic Aggregate Approximation
Pattern extraction is very important for time-series analysis tasks such as classification and clustering, but it is hard to extract patterns directly from a time series because of the huge search space [25]. Thus, representation methods such as symbolic aggregate approximation (SAX), discrete cosine transform (DCT), and discrete wavelet transform (DWT) are usually used to represent time series as sequences before pattern extraction. In this paper, each time series is discretized into an alphabetical sequence using SAX, patterns are extracted from the sequences, and a new sequence is generated considering the extracted patterns.
SAX can convert a time series into an alphabetical sequence for efficient time-series data mining in the following manner [26]. First, the element of is normalized as , where and denote the mean and standard deviation of , respectively. Second, is split into windows and , where is a representative value such as the mean of the values in the window, is calculated. One can introduce the standard deviation as an alternative representative value. Third, break points for are computed, where is the number of alphabetical strings defined by the user, satisfying . Finally, an alphabet is assigned to each window on the basis of the break points. That is, if , then the alphabet string is assigned to the window.
Figure 1 illustrates the SAX application process when the time series is assumed to be normalized. The time series is split into windows, and the mean values in each window are calculated. Three () alphabetical strings , , and are introduced, and three break points , , for the means are obtained. Then, the alphabet is assigned to those means less than , to those greater than , and to those between and . We thereby obtain a sequence S = − − − − − of alphabetical strings converted from the time series.
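The SAX steps described above can be sketched as follows. This is a minimal version; note that the break points here are empirical quantiles of the window means, matching the equiprobable-interval scheme used in Section 3.1 of this paper, whereas classic SAX derives break points from the Gaussian distribution.

```python
from statistics import mean, pstdev

def sax(ts, n_windows, alphabet="abc"):
    """Convert a time series into an alphabetical sequence via SAX.

    Break points are empirical quantiles of the window means, so each
    alphabet covers roughly the same number of windows."""
    mu, sd = mean(ts), pstdev(ts)
    z = [(v - mu) / sd for v in ts]                  # step 1: z-normalize
    w = len(z) // n_windows                          # step 2: split into windows
    means = [mean(z[i * w:(i + 1) * w]) for i in range(n_windows)]
    ranked = sorted(means)
    a = len(alphabet)
    # step 3: empirical-quantile break points (upper edge of each bin)
    breaks = [ranked[(len(ranked) * (i + 1)) // a - 1] for i in range(a - 1)]
    # step 4: assign an alphabet to each window mean
    return "".join(alphabet[sum(m > b for b in breaks)] for m in means)
```

For instance, a monotonically increasing series split into three windows yields the sequence "abc".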
SAX has been used to extract features from time series for various tasks such as classification and clustering. For example, Georgoulas et al. [27] extracted alphabetical features to represent the vibration of bearings and used them to detect bearing faults with various classifiers. Park and Jung [28] proposed a method to reveal rules from multivariate time series; it transforms each time series into an alphabetical sequence through SAX and identifies frequent patterns among the sequences using association rule mining. Notaristefano et al. [29] grouped electrical load patterns after reducing the data size using SAX.
3. Proposed Data Generation Method
This section explains the proposed three-phase data-generation method: preprocessing, generating an alphabetical sequence, and generating time-series values. In the first phase, every time-series sample is transformed into a pair of vectors, one of window means and another of window standard deviations, and then each vector is transformed into an alphabetical sequence. In the second phase, two arbitrarily selected pairs of alphabetical sequences form a new sequence pair, with a pattern similar to those of the originally selected sequences. In the third phase, time-series values for each window are generated from the generated pair. Table 2 presents the mathematical notations used in this paper.
3.1. Preprocessing
The objective of the preprocessing phase is to express as a tuple , where and are window mean vector and window standard deviation vector , respectively. The preprocessing phase consists of four steps: z-normalization, segmentation, calculation of break points, and conversion into an alphabetical sequence, as illustrated in Figure 2.
In the first step, for is normalized to with its mean and standard deviation as:
(3)
In the second step, for is split into windows, where is the number of windows set by the user, and the mean and standard deviation of each window are calculated by:
(4)
(5)
where is the number of elements in the window of , which equals when , and is expressed as a pair of vectors: a window mean vector and a window standard deviation vector . In the third step, break points and for and (, ), respectively, are obtained according to the size of the set of alphabetical strings, , which is also a user parameter. As explained in Section 2.2, the break points and are used as criteria to convert the mean and standard deviation of each window into alphabets. and are calculated over all samples rather than individual samples so that each sample's scale is considered when generating a new sample. The break point is obtained from , which implies that each interval , , contains the same number of values, and is therefore .
In the fourth step, and are expressed as alphabetical sequences and , respectively, as follows:
(6)
(7)
where and are negative infinity, and and indicate the predefined alphabets (e.g., , ) for the window mean and standard deviation, respectively. The first phase is summarized in Algorithm A1 in Appendix A.
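A compact sketch of this preprocessing phase might look as follows, under the simplifying assumptions that each sample is a plain Python list and that the alphabet is a string of characters; the function names are ours, not the paper's notation:

```python
from statistics import mean, pstdev

def preprocess(samples, n_windows, alphabet="abc"):
    """Phase 1 sketch: z-normalize each sample, split it into windows, and
    convert window means and standard deviations into alphabet sequences.
    Break points are empirical quantiles computed over the windows of ALL
    samples, so that each sample's scale is considered."""
    stats = []
    for x in samples:
        mu, sd = mean(x), pstdev(x)
        z = [(v - mu) / sd for v in x]
        w = len(z) // n_windows
        wins = [z[i * w:(i + 1) * w] for i in range(n_windows)]
        stats.append(([mean(win) for win in wins], [pstdev(win) for win in wins]))

    def breaks_of(values, a):
        ranked = sorted(values)
        return [ranked[(len(ranked) * (i + 1)) // a - 1] for i in range(a - 1)]

    a = len(alphabet)
    mean_breaks = breaks_of([m for mu, _ in stats for m in mu], a)
    std_breaks = breaks_of([s for _, sd in stats for s in sd], a)

    def to_seq(values, breaks):
        return "".join(alphabet[sum(v > b for b in breaks)] for v in values)

    return [(to_seq(mu, mean_breaks), to_seq(sd, std_breaks)) for mu, sd in stats]
```

Each sample thus becomes a pair of alphabet sequences, one for window means and one for window standard deviations, ready for the generation phase.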
3.2. Generating Alphabetical Sequences
The second phase generates artificial sequences of alphabets for and on the basis of two randomly selected parental samples and . Figure 3 illustrates an example of generating an alphabetical sequence based on and . Note that the generating process of is the same as that of .
In this figure, an edge from alphabet A to alphabet B denotes that A influences the generation of B.
Sequence for the window mean is generated by sequentially selecting an element from either or on the basis of the probabilities and , where is a partial sequence of . Sequence for the window standard deviation is similarly generated on the basis of and . and are assumed to be independent of each other for simplicity.
Probabilities and , which are used to select the first element , can be calculated by:
(8)
(9)
where is an indicator function, which returns 1 if is satisfied and 0 otherwise, and is a Laplace smoothing parameter to prevent and from becoming either zero or one. That is, and are calculated as the smoothed ratios of and among the first alphabets of the mean sequences, respectively. Because and are selected with the probabilities in (8) and (9), respectively, these probabilities should be normalized by dividing each by their sum, as presented in Equations (10) and (11):
(10)
(11)
Probabilities and , used to select the element , can be calculated under the Markovian assumption:
(12)
(13)
where and are calculated by:(14)
(15)
In Equations (14) and (15), is a parameter that restricts the search space by determining the number of alphabets matching , , in for all values of . The parent samples are randomly selected, and to produce the next sample we adopt a Markov process for randomness and variability. Under the Markov assumption, we can add more variability by properly setting the value of , which represents the size of the search space (Equations (14) and (15)). These probabilities should be normalized as follows:
(16)
(17)
An algorithm to generate the alphabetical sequences of an artificial sample's mean and standard deviation is presented as Algorithm A2. It is based on sampling from a categorical distribution to select either or . For example, the first element of follows the categorical distribution , implying that is selected from and with probabilities and .
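A simplified sketch of Algorithm A2 might look as follows, assuming first-order (Markov) transition counts over the whole corpus of sequences, a Laplace smoothing parameter `alpha`, and a fixed random seed for reproducibility; the function name and signature are ours:

```python
import random
from collections import Counter

def generate_sequence(p1, p2, corpus, alpha=1.0, rng=None):
    """Phase 2 sketch: build a child sequence by choosing, position by
    position, the alphabet of parent p1 or parent p2.  Under the Markov
    assumption the choice probability is the Laplace-smoothed frequency of
    the transition (previous child alphabet -> candidate alphabet) observed
    across the whole corpus of alphabet sequences."""
    rng = rng or random.Random(0)
    # first element: smoothed frequency of each candidate as a first alphabet
    firsts = Counter(s[0] for s in corpus)
    w1, w2 = firsts[p1[0]] + alpha, firsts[p2[0]] + alpha
    child = [p1[0] if rng.random() < w1 / (w1 + w2) else p2[0]]
    # remaining elements: smoothed first-order transition frequencies
    trans = Counter((s[t - 1], s[t]) for s in corpus for t in range(1, len(s)))
    for t in range(1, min(len(p1), len(p2))):
        w1 = trans[(child[-1], p1[t])] + alpha
        w2 = trans[(child[-1], p2[t])] + alpha
        child.append(p1[t] if rng.random() < w1 / (w1 + w2) else p2[t])
    return "".join(child)
```

Because every position of the child is copied from one of its parents, the generated sequence interleaves patterns of both while favoring transitions that are frequent in the training corpus.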
3.3. Generating Time-Series Values
In this phase, time-series values in window are generated from and for . Let be , implying that , and be , implying that . We assume that follows a normal distribution with mean and standard deviation , where and are uniformly distributed in and , respectively, when and are not 1. When is 1, we set , and likewise when is 1. We also assume that the length of follows a uniform distribution on , where and are the indices of the parents of , and and denote the number of elements in each window of and , respectively.
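A sketch of this value-generation step might look as follows, with the alphabet intervals and window-length bounds passed in explicitly as tuples; the function names and signatures are ours, and the inverse transformation back to the original scale simply inverts the z-normalization of Equation (3):

```python
import random

def generate_window(mean_interval, std_interval, len_interval, rng=None):
    """Phase 3 sketch: draw a window mean and standard deviation uniformly
    from the intervals implied by the generated alphabets, draw the window
    length uniformly between the parents' window lengths, then sample
    normally distributed values."""
    rng = rng or random.Random(0)
    mu = rng.uniform(*mean_interval)
    sigma = rng.uniform(*std_interval)
    n = rng.randint(*len_interval)
    return [rng.gauss(mu, sigma) for _ in range(n)]

def inverse_transform(z_values, mean, std):
    """Map generated z-scores back to the original scale."""
    return [v * std + mean for v in z_values]
```

Concatenating the generated windows and inverse-transforming them with the weighted parental mean and standard deviation yields the final artificial time series.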
After a for every value of is generated, it should be inversely transformed to using its mean and standard deviation . We set them as weighted averages of and , and of and , respectively, as presented in Equations (18) and (19):
(18)
(19)
where is randomly chosen. That is, Equation (18) means is randomly selected between and , and Equation (19) means is randomly selected between and .

4. Numerical Example
This section describes an example of how a new time series is generated using the proposed method. Table 3 shows the example dataset , which consists of five samples. Each sample is a time series collected by a sensor from the start of a part's operation to its failure. Each sample contains 18 numeric values, implying that the life of each corresponding part is 18. Note that the proposed method can also be applied to data that include samples with different lengths.
Phase 1. Preprocessing
-
(1). z-normalization
Each sample in Table 3 is normalized according to its mean and standard deviation as follows.
(−1.79, −1.79, −1.22, −1.22, −0.66, −0.09, −0.09, 0.47, −0.09, −0.09, −0.09, 0.47, 1.03, 0.47, 0.47, 1.03, 1.6, 1.6),
(1.15, 0.16, 0.16, 0.16, 1.15, 0.16, 1.15, 1.15, 1.15, 0.16, 0.16, −0.82, 0.16, 0.16, −0.82, −1.81, −1.81, −1.81),
(0.33, 0.33, 1.08, 1.82, 1.08, 1.08, 1.08, 0.33, 0.33, 0.33, −0.41, −1.16, −1.9, −1.16, −1.16, −0.41, −1.16, −0.41),
(−0.41, −1.46, −0.41, −0.41, −0.41, −1.46, −0.41, −0.41, −0.41, 0.64, 1.69, 1.69, 1.69, 0.64, 0.64, 0.64, −0.41, −1.46),
(−1.07, −1.07, −1.07, −0.27, −0.27, −0.27, −0.27, 0.53, −0.27, −1.07, −1.07, −0.27, −0.27, 0.53, 0.53, 1.34, 2.14, 2.14).
-
(2). Segmentation
Each normalized sample is split into windows, and the mean and standard deviation of each window are calculated. For example, the first window of is (1.15, 0.16, 0.16) and its mean and standard deviation and are 0.49 and 0.47, respectively. Values for and for all and are calculated as follows.
(−1.60, −0.66, 0.10, 0.10, 0.66, 1.41), (0.27, 0.46, 0.26, 0.26, 0.26, 0.27),
(0.49, 0.49, 1.15, −0.17, −0.17, −1.81), (0.47, 0.47, 0.00, 0.46, 0.46, 0.00),
(0.58, 1.33, 0.58, −0.41, −1.41, −0.66), (0.35, 0.35, 0.35, 0.61, 0.35, 0.35),
(−0.76, −0.76, −0.41, 1.34, 0.99, −0.41), (0.49, 0.49, 0.00, 0.49, 0.49, 0.86),
(−1.07, −0.27, −0.00, −0.80, 0.26, 1.87), (0.00, 0.00, 0.38, 0.38, 0.38, 0.38).
-
(3). Calculation of break points
We set , and break points and are those that divide all values of and , respectively, into three equal parts. For example, is the 1/3 quantile of {−1.60, −0.66, 0.10, 0.10, 0.66, 1.41, 0.49, 0.49, 1.15, −0.17, −0.17, −1.81, 0.58, 1.33, 0.58, −0.41, −1.41, −0.66, −0.76, −0.76, −0.41, 1.34, 0.99, −0.41, −1.07, −0.27, −0.00, −0.80, 0.26, 1.87}. In this manner, all break points can be calculated by:
(−0.41, 0.49, 1.87),
(0.32, 0.46, 0.86).
-
(4). Conversion into an alphabetical sequence
On the basis of and , and are converted to alphabetical sequences and for all , respectively. For example, is converted to the alphabet because , and is converted to the alphabet because . Thus, (−1.60, −0.66, 0.10, 0.10, 0.66, 1.41) is converted to . and for all are obtained as follows.
= , ,
= , ,
= , ,
= , ,
= , .
Phase 2. Generating an alphabetical sequence
Suppose samples 1 and 3 are randomly selected as parents. The first mean alphabetical strings of samples 1 and 3 are and , and the first standard deviation alphabetical strings are and . Thus, , , , and should be calculated and normalized using Equations (8)–(11).
is sampled from , and is sampled from ; as a result, and are selected. The second mean alphabets of samples 1 and 3 are and , and the second standard deviation alphabets are and . Therefore, , , , and should be calculated and normalized using Equations (12)–(17). For convenience, we set to 1, and accordingly, and for all were used for the calculation.
is sampled from , and is sampled from ; as a result, and are selected. This process repeats until becomes . From this phase, we obtain and .
Phase 3. Generating time-series values
In this phase, are generated from and for , and we obtain (−0.6, −0.11, 0.16, −0.23, −1.32, −0.12, −0.04, 0.44, 0.66, 0.64, 1.23, 0.39, 0.88, 0.9, 1.86, 0.45, −0.2, 1.09).
The generation process for the first window is as follows. Since and , samples are generated, where each sample follows a normal distribution with mean and standard deviation , because and are the first alphabets, and we use the constant mean and standard deviation. As a result, we obtain . As another example, for the fourth window (), and , and three samples are generated, where each sample follows a normal distribution with a mean in [ , ] and a standard deviation in [0.32, 0.46]. As a result, we obtain .
Finally, is obtained by inversely transforming with and , where and 0.18 are randomly chosen weights.
(11.4, 12.1, 12.48, 11.93, 10.38, 12.08, 12.2, 12.88, 13.19, 13.16, 14.00, 12.81, 13.5, 13.53, 14.89, 12.89, 11.97, 13.80).
Figure 4 shows the generated sample and its parents (samples 1 and 3). Dashed and dotted lines denote samples 1 and 3 in Table 3, respectively, and the solid line denotes the sample generated from them. The Y-axis denotes the sensor value and the X-axis denotes time; thus, the horizontal length of a line indicates the whole life. As explained before, the length (i.e., whole life) of every sample in X is 18, and thus the length of the generated sample is also 18. More precisely, the length of the generated sample follows a uniform distribution on [minimum length of parents, maximum length of parents].
As seen in this graph, the generated sample follows a pattern similar to those of samples 1 and 3, which implies that it contains the characteristics of the existing samples. At the same time, the generated sample should not be too close to the existing samples, in order to ensure variability. The proposed method selects the two parent samples at random, and all alphabet sequences are created on the basis of Equations (12)–(17), which ensure enough randomness and variability in the generated samples when, e.g., selecting the time-series size for each alphabet, selecting the first alphabet, and so forth.
5. Experiment and Results
In this section, we describe an experiment to verify that the samples generated by the proposed method contribute to training an RUL model without overfitting. Two RUL prediction models were compared in terms of mean absolute percent error (MAPE), one with an original dataset and the other with a dataset , where is an artificially generated dataset. Section 5.1 explains the procedure of the experiment, Section 5.2 introduces the datasets and hyperparameters used in the experiment, and Section 5.3 shows the results.
5.1. Procedure
First, an original sample , is reserved for the test, and the others (i.e., ) are used for training. Second, an RUL prediction model is trained with , to which the transformation for RUL prediction is applied, . Third, the MAPE of the model for , is calculated. Fourth, we repeat times the generation of using the proposed algorithm for under hyperparameters , , , and ; train with ; and then calculate . Finally, and the mean of are compared. This procedure is repeated for all possible values of , , , , , and the models.
The specific procedure is described in Algorithm A3, and the flowchart illustrating the calculation of and is presented in Figure 5.
In Step 6 and Step 7 of this algorithm, MAPE is calculated by:
(20) MAPE = (100/n) Σ_{t=1}^{n} |(r_t − r̂_t)/r_t|, where r_t and r̂_t denote the actual and predicted RUL at time t, and n is the number of predictions.
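The MAPE of Equation (20) is the standard mean absolute percentage error and can be computed as follows; the function name is ours:

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be equal-length and non-empty")
    return 100.0 * sum(abs((a - p) / a)
                       for a, p in zip(actual, predicted)) / len(actual)
```

For example, predictions of 90 and 220 against actual values of 100 and 200 give a MAPE of 10%.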
This figure illustrates an example of the procedure to calculate and . The specific process illustrated in this figure is as follows:
-
(1). ,,, and are transformed into , , , and by applying feature functions, respectively.
-
(2). RUL prediction model, , is trained with .
-
(3). is used to validate . That is, for all is obtained and the prediction results are used to calculate .
-
(4). Three new samples, , , and , are generated by means of the proposed method.
-
(5). ,, and are also transformed into , , and .
-
(6). RUL prediction model, is trained with .
-
(7). is used to validate . That is, for all is obtained, and the prediction results are used to calculate .
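The leave-one-out procedure in steps (1)–(7) can be sketched as follows. The callables `train_and_predict` and `generate` are hypothetical stand-ins for the RUL model pipeline and the proposed generation method, respectively:

```python
def leave_one_out_mapes(samples, train_and_predict, generate=None):
    """Experiment sketch: for each sample held out as the test set, train on
    the remaining samples (optionally augmented with generated samples) and
    record the MAPE of the trained model on the held-out sample.

    train_and_predict(train, test) must return the MAPE for the held-out
    sample; generate(train) must return a list of artificial samples."""
    mapes = []
    for i, test in enumerate(samples):
        train = samples[:i] + samples[i + 1:]
        if generate is not None:
            train = train + generate(train)  # augment with generated samples
        mapes.append(train_and_predict(train, test))
    return mapes
```

Comparing the two returned lists, with and without `generate`, corresponds to comparing the blue and orange bars in Figures 6–8.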
5.2. Experiment Setting
Datasets were obtained from the prognostics data repository of the U.S. National Aeronautics and Space Administration (NASA). Table 4 shows information on the datasets.
All three datasets are well known and have been widely used in the literature for verifying the performance of machine learning methods. Each sample in the first and second datasets is a time series of the capacity of a lithium-ion battery until its end of life. Discharge was carried out at 24 °C, and each battery was regarded as dead when its capacity had faded by 30%. Each sample of the third dataset is a signal from a vibration sensor attached to a bearing. The operating condition of the bearing was 1800 rpm and 4000 N, and the sampling frequency of the sensor was 25.6 kHz.
Hyperparameters for each experiment are given in Table 5.
Battery: Battery #1 and #6; bearing: FEMTO Bearing Set #1; MLP(, ): multi-layered perceptron with two hidden layers with and nodes, respectively; LSTM(, , , ): long short-term memory with neurons, timestamps, batch size, and epochs; SVR(, , ): support vector regression with regularization parameter , epsilon , and kernel function .
5.3. Results
Figure 6, Figure 7 and Figure 8 show the experiment results for the battery #1, battery #6, and bearing datasets. In the figures, blue bars denote the MAPEs of the model trained with the original samples except for the test sample, and orange bars denote the MAPEs of the model trained with the original samples plus the generated samples.
From the results presented in Figure 6, Figure 7 and Figure 8, we found the following. First, when the generated samples were included for training, the MAPEs were smaller than those of the cases with only original samples, except for the LSTM with battery #1 shown in Figure 6. In the case of the MLP with the bearing dataset, shown in Figure 8, MAPE decreased by 20.6%, the largest improvement. This shows that the proposed method of artificially generating training samples is effective and can be used to improve model performance. Second, the MAPE of a model trained using the original samples without the test sample could be very high. For example, the MAPEs of MLP and SVR for the bearing dataset were 35.82% and 37.21%, respectively. This may be because the features of some test samples (i.e., cumulative root mean square and kurtosis) are quite different from those of the other samples; in other words, the relationships between feature vectors and the label (i.e., RUL) differ markedly from each other. In this case, the proposed method effectively decreased the MAPE, as for the MLP on the bearing dataset. Third, the proposed method showed a larger MAPE when using the LSTM for battery #1 (Figure 6), contrary to the other results. In essence, the LSTM considers previous feature values (i.e., cumulative RMS and kurtosis at time ) to predict the current label (i.e., RUL at t), but the proposed method does not consider the relationship between two consecutive values. Instead, it takes the relationship between two consecutive windows into consideration, and the values within a window are aggregated into a single value, either the mean or the standard deviation. We think this may sometimes worsen model performance when using more samples for training, which in turn results in a larger MAPE. However, we obtained smaller MAPEs for battery #6 and the bearing dataset, as shown in Figure 7 and Figure 8.
From the experiment, we verified that the proposed method can solve the data insufficiency problem that is common for RUL prediction and often leads to overfitting. In other words, the RUL prediction model trained with original samples and generated samples is more generalized than the one trained with original samples only.
6. Conclusions
Due to time and cost, it is often difficult to collect sufficient run-to-failure data to train ML- and DL-based RUL models. This data insufficiency can result in overfitting, undermining a model's performance. In this paper, we proposed a time-series data-generation method that identifies patterns in alphabetical sequences converted from the original time-series samples using SAX, generates new sequences on the basis of these patterns, and finally generates time-series values from each alphabet in the generated sequence. In an experiment using three benchmark datasets, we found that the samples generated by the proposed method effectively increased the performance of RUL prediction models.
Future efforts to improve the proposed method should take into account the relationship between consecutive values when generating time-series values. In addition, the proposed method was designed for univariate time series and may not be appropriate for multivariate time series, which are common in datasets used for RUL prediction; it should therefore be extended to handle multivariate time series. Finally, the proposed method has several parameters, such as the numbers of windows, alphabets, and generated samples, which affect the prediction performance of the RUL model. Future research should therefore conduct a sensitivity analysis of these parameters and develop a method to choose proper parameter values.
Conceptualization, G.A., S.H. and S.L.; methodology, G.A.; software, G.A. and H.Y.; data curation, H.Y.; original draft preparation, G.A. and H.Y.; review and editing, G.A., S.H. and S.L.; funding acquisition, S.H. All authors have read and agreed to the published version of the manuscript.
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government ((MSIT)2019R1A2C1088255).
Not applicable.
Not applicable.
The authors declare no conflict of interest.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Previous studies on RUL prediction.
| Research | Domain | Feature | Base Model |
|---|---|---|---|
| [18] | Bearing | RMSEE | MLP |
| [19] | General | Original | LSTM |
| [ ] | Bearing | Original | CNN |
| [ ] | Rotating machine | RMS, kurtosis, etc. | MLP |
| [ ] | Bearing | Spectral kurtosis | SVR |
| [ ] | Data | RMS, peak, kurtosis, crest factor, etc. | SVR |
| [ ] | Bearing | RMS and kurtosis | LSTM |
General: not considering specific industry domain; original: not using specific feature functions but using raw values.
Mathematical notations used in this paper.
| Notation | Meaning |
|---|---|
|  | Time series sample |
|  | RUL of |
|  | Vector of feature functions to extract features from |
|  | Dataset generated by transforming |
|  | Supervised model for RUL prediction |
|  | Generated time series |
|  | Mean and standard deviation of |
|  | Normalized |
|  | Number of windows |
|  | Window mean vector |
|  | Window standard deviation vector |
|  | Alphabetical set for |
|  | Alphabetical set for |
|  | The |
|  | The |
|  | Alphabetical sequence of window mean vector |
|  | Alphabetical sequence of window standard deviation vector |
Example original dataset.
| Index | Data | Mean | Standard Deviation |
|---|---|---|---|
| 1 | (10, 10, 11, 11, 12, 13, 13, 14, 13, 13, 13, 14, 15, 14, 14, 15, 16, 16) | 13.17 | 1.77 |
| 2 | (10, 9, 9, 9, 10, 9, 10, 10, 10, 9, 9, 8, 9, 9, 8, 7, 7, 7) | 8.83 | 1.01 |
| 3 | (11, 11, 12, 13, 12, 12, 12, 11, 11, 11, 10, 9, 8, 9, 9, 10, 9, 10) | 10.56 | 1.34 |
| 4 | (9, 8, 9, 9, 9, 8, 9, 9, 9, 10, 11, 11, 11, 10, 10, 10, 9, 8) | 9.39 | 0.95 |
| 5 | (11, 11, 11, 12, 12, 12, 12, 13, 12, 11, 11, 12, 12, 13, 13, 14, 15, 15) | 12.33 | 1.25 |
Used datasets.
| Dataset | Feature | Number of Samples | Mean Length | Reference |
|---|---|---|---|---|
| Battery #1 | Capacity | 4 | 159.00 | [ ] |
| Battery #6 | Capacity | 4 | 90.75 |  |
| FEMTO Bearing Set #1 | Vibration signal | 4 | 44,154,880.00 | [ ] |
Hyper parameters for each experiment.
| Dataset | Model |  |  |  |  | Feature |
|---|---|---|---|---|---|---|
| Battery | MLP (5, 5) | 50 | 3 | 3 | 3 | Cumulative RMS and kurtosis |
| Bearing | MLP (10, 10) | 10,000 | 6 | 4 | 4 |  |
Appendix A
| Algorithm A1. Preprocessing phase. |  |
|---|---|
| Input |  |
| Notation |  |
| Output |  |

| Algorithm A2. Alphabetical sequence generation. |  |
|---|---|
| Input |  |
| Procedure |  |
| Output |  |

| Algorithm A3. Procedure of the experiment. |  |
|---|---|
| Input |  |
| Procedure |  |
| Output |  |
References
1. Xia, T.; Dong, Y.; Xiao, L.; Du, S.; Pan, E.; Xi, L. Recent advances in prognostics and health management for advanced manufacturing paradigms. Reliab. Eng. Syst. Saf.; 2018; 178, pp. 255-268. [DOI: https://dx.doi.org/10.1016/j.ress.2018.06.021]
2. Ahmad, R.; Kamaruddin, S. An overview of time-based and condition-based maintenance in industrial application. Comput. Ind. Eng.; 2012; 63, pp. 135-149. [DOI: https://dx.doi.org/10.1016/j.cie.2012.02.002]
3. Si, X.S.; Wang, W.; Hu, C.H.; Zhou, D.H. Remaining useful life estimation–a review on the statistical data driven approaches. Eur. J. Oper. Res.; 2011; 213, pp. 1-14. [DOI: https://dx.doi.org/10.1016/j.ejor.2010.11.018]
4. Okoh, C.; Roy, R.; Mehnen, J.; Redding, L. Overview of remaining useful life prediction techniques in through-life engineering services. Procedia Cirp; 2014; 16, pp. 158-163. [DOI: https://dx.doi.org/10.1016/j.procir.2014.02.006]
5. Zhu, J.; Yang, Z. Thermo-elasto-plastic stress and strain analysis and life prediction of gas turbine. Proc. Int. Conf. Meas. Technol. Mechatron. Autom.; 2010; pp. 1019-1022.
6. Taheri, M.J.; Taheri, P. Feasibility study of cogeneration for a gas power plant. Proceedings of the 2017 IEEE Electrical Power and Energy Conference; Saskatoon, SK, Canada, 22–25 October 2017.
7. Liao, L.; Köttig, F. Review of hybrid prognostics approaches for remaining useful life prediction of engineered systems, and an application to battery life prediction. IEEE Trans. Reliab.; 2014; 63, pp. 191-207. [DOI: https://dx.doi.org/10.1109/TR.2014.2299152]
8. Lyu, Y.; Gao, J.; Chen, C.; Jiang, Y.; Li, H.; Chen, K.; Zhang, Y. Joint model for residual life estimation based on Long-Short Term Memory network. Neurocomputing; 2020; 410, pp. 284-294. [DOI: https://dx.doi.org/10.1016/j.neucom.2020.06.052]
9. Ruiz-Tagle Palazuelos, A.; Droguett, E.L.; Pascual, R. A novel deep capsule neural network for remaining useful life estimation. Proc. Inst. Mech. Eng. Part O J. Risk Reliab.; 2020; 234, pp. 151-167.
10. Sun, H.; Zhang, J.; Mo, R.; Zhang, X. In-process tool condition forecasting based on a deep learning method. Robot. Comput. Integr. Manuf.; 2020; 64, 101924. [DOI: https://dx.doi.org/10.1016/j.rcim.2019.101924]
11. Mo, Y.; Wu, Q.; Li, X.; Huang, B. Remaining useful life estimation via transformer encoder enhanced by a gated convolutional unit. J. Intell. Manuf.; 2021; pp. 1-10. [DOI: https://dx.doi.org/10.1007/s10845-021-01750-x]
12. An, D.; Choi, J.H.; Kim, N.H. Prediction of remaining useful life under different conditions using accelerated life testing data. J. Mech. Sci. Technol.; 2018; 32, pp. 2497-2507. [DOI: https://dx.doi.org/10.1007/s12206-018-0507-z]
13. Borst, N.G. Adaptations for CNN-LSTM Network for Remaining Useful Life Prediction: Adaptable Time Window and Sub-Network Training. Master’s Thesis; Delft University of Technology: Delft, The Netherlands, August 2020.
14. Xie, Z.; Du, S.; Lv, J.; Deng, Y.; Jia, S. A Hybrid Prognostics Deep Learning Model for Remaining Useful Life Prediction. Electronics; 2021; 10, 39. [DOI: https://dx.doi.org/10.3390/electronics10010039]
15. Ramasso, E.; Gouriveau, R. Remaining useful life estimation by classification of predictions based on a neuro-fuzzy system and theory of belief functions. IEEE Trans. Reliab.; 2014; 63, pp. 555-566. [DOI: https://dx.doi.org/10.1109/TR.2014.2315912]
16. Huang, S.; Guo, Y.; Liu, D.; Zha, S.; Fang, W. A two-stage transfer learning-based deep learning approach for production progress prediction in iot-enabled manufacturing. IEEE Internet Things J.; 2019; 6, pp. 10627-10638. [DOI: https://dx.doi.org/10.1109/JIOT.2019.2940131]
17. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic Minority Over-Sampling Technique. J. Artif. Intell. Res.; 2002; 16, pp. 321-357. [DOI: https://dx.doi.org/10.1613/jair.953]
18. Ali, J.B.; Chebel-Morello, B.; Saidi, L.; Malinowski, S.; Fnaiech, F. Accurate bearing remaining useful life prediction based on Weibull distribution and artificial neural network. Mech. Syst. Signal Process.; 2015; 56, pp. 150-172.
19. Zheng, S.; Ristovski, K.; Farahat, A.; Gupta, C. Long short-term memory network for remaining useful life estimation. Proceedings of the 2017 IEEE International Conference on Prognostics and Health Management; Dallas, TX, USA, 19–21 June 2017.
20. Zhu, J.; Chen, N.; Peng, W. Estimation of bearing remaining useful life based on multiscale convolutional neural network. IEEE Trans. Ind. Electron.; 2018; 66, pp. 3208-3216. [DOI: https://dx.doi.org/10.1109/TIE.2018.2844856]
21. Deutsch, J.; He, D. Using deep learning-based approach to predict remaining useful life of rotating components. IEEE Trans. Syst. Man Cybern. Syst.; 2017; 48, pp. 11-20. [DOI: https://dx.doi.org/10.1109/TSMC.2017.2697842]
22. Saidi, L.; Ali, J.B.; Bechhoefer, E.; Benbouzid, M. Wind turbine high-speed shaft bearings health prognosis through a spectral Kurtosis-derived indices and SVR. Appl. Acoust.; 2017; 120, pp. 1-8. [DOI: https://dx.doi.org/10.1016/j.apacoust.2017.01.005]
23. Sutrisno, E.; Oh, H.; Vasan, A.S.S.; Pecht, M. Estimation of remaining useful life of ball bearings using data driven methodologies. Proceedings of the 2012 IEEE Conference on Prognostics and Health Management; Denver, CO, USA, 18–21 June 2012.
24. Zhang, B.; Zhang, S.; Li, W. Bearing performance degradation assessment using long short-term memory recurrent network. Comput. Ind.; 2019; 106, pp. 14-29. [DOI: https://dx.doi.org/10.1016/j.compind.2018.12.016]
25. Sun, Y.; Li, J.; Liu, J.; Sun, B.; Chow, C. An improvement of symbolic aggregate approximation distance measure for time series. Neurocomputing; 2014; 138, pp. 189-198. [DOI: https://dx.doi.org/10.1016/j.neucom.2014.01.045]
26. Lin, J.; Keogh, E.; Lonardi, S.; Chiu, B. A symbolic representation of time series, with implications for streaming algorithms. Proceedings of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery; New York, NY, USA, 13 June 2003.
27. Georgoulas, G.; Karvelis, P.; Loutas, T.; Stylios, C.D. Rolling element bearings diagnostics using the Symbolic Aggregate approXimation. Mech. Syst. Signal Process.; 2015; 60, pp. 229-242. [DOI: https://dx.doi.org/10.1016/j.ymssp.2015.01.033]
28. Park, H.; Jung, J.Y. SAX-ARM: Deviant event pattern discovery from multivariate time series using symbolic aggregate approximation and association rule mining. Expert Syst. Appl.; 2020; 141, 112950. [DOI: https://dx.doi.org/10.1016/j.eswa.2019.112950]
29. Notaristefano, A.; Chicco, G.; Piglione, F. Data size reduction with symbolic aggregate approximation for electrical load pattern grouping. IET Gener. Transm. Distrib.; 2013; 7, pp. 108-117. [DOI: https://dx.doi.org/10.1049/iet-gtd.2012.0383]
30. Saha, B.; Goebel, K. Battery Data Set. NASA Ames Prognostics Data Repository. Available online: http://ti.arc.nasa.gov/project/prognostic-data-repository (accessed on 16 May 2020).
31. Nectoux, P.; Gouriveau, R.; Medjaher, K.; Ramasso, E.; Chebel-Morello, B.; Zerhouni, N.; Varnier, C. PRONOSTIA: An experimental platform for bearings accelerated degradation tests. Proceedings of the IEEE International Conference on Prognostics and Health Management; Denver, CO, USA, 18–21 June 2012.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Abstract
Accurate prediction of the remaining useful life (RUL) of equipment with machine learning (ML) or deep learning (DL) models is crucial for maintenance scheduling, but such models are trained on data collected up to the point of equipment failure. Because these data are unavailable until the equipment fails, collecting enough of them to train a model without overfitting can be challenging. Here, we propose a method of generating time-series data for RUL models to resolve the problems posed by insufficient data. The proposed method converts every training time series into a sequence of alphabetical strings by symbolic aggregate approximation and identifies occurrence patterns in the converted sequences. The method then generates a new sequence and inversely transforms it into a new time series. Experiments with various RUL prediction datasets and ML/DL models verified that the proposed data-generation method can help avoid overfitting in RUL prediction models.
Details
; Hyungseok Yun 2; Hur, Sun 2; Lim, Siyeong 3
1 Data Analytic Team 1, Hyundai Motors Company, Seoul 06797, Korea
2 Department of Industrial and Management Engineering, Hanyang University, Ansan 15588, Korea;
3 Korean Research Institute for Human Settlements, Sejong 30147, Korea