Functional Principal Components Analysis of

Academic Editor: Seenith Sivasundaram

College of Mathematics and Informatics, North China University of Water Conservancy and Hydroelectric Power, Zhengzhou 450000, China

Received 28 February 2014; Revised 23 May 2014; Accepted 4 July 2014; Published 22 July 2014

This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. Introduction

In conventional data analysis, the data under study are usually either cross-sectional or panel data. In practice, however, we often encounter data with functional characteristics. Functional data are multivariate data with an ordering on the dimensions [1]. Such data deserve the label "functional" because they so clearly reflect the smooth curves that we assume generated them. A typical dataset of this sort combines time series and cross-sectional data, such as the time series of stock prices, and some datasets take the form of curves or images. Advances in data collection and storage have tremendously increased the presence of such functional data, whose graphical representations are curves, images, or shapes. The theoretical and practical developments in functional data analysis (FDA) date mainly from the last four decades and owe much to the rapid development of computer recording and storage facilities. As a new area of statistics, functional data analysis extends existing methodologies and theories from data analysis, generalized linear models, multivariate data analysis, nonparametric statistics, and many other fields. Recently, there have been several impressive attempts to analyze functional datasets, such as those of Ramsay et al. [2-5], who proposed new concepts and methods in the field of FDA.

Functional principal component analysis (FPCA) is the functional analogue of principal component analysis (PCA), the well-known dimension reduction technique of multivariate statistical analysis, and is useful for determining the common factors or trends present in the dynamics of the underlying recovered functions.

The development of FPCA can be traced back to Karhunen [7] and Loève [8], who independently developed a theory on the optimal series expansion of a continuous stochastic process. Motivated by a dataset of growth curves, Rao [9] developed some preliminary ideas on FPCA and proposed statistical tests for the equality of average growth curves over a period of time. Much later, Dauxois et al. [10] introduced a functional exposition of PCA with applications to statistical inference. Several other notable developments have arisen from the systematic research of the functional data analysis group known as the Toulouse School of Functional Data Analysis [11].

In recent years, Hall and Hosseini-Nasab [12, 13] showed how the properties of functional principal component analysis can be elucidated through stochastic expansions and related results. Yao et al. [14] proposed an FPCA procedure based on a conditional expectation method, which aims to estimate functional principal component scores for sparse longitudinal data. Hall and Vial [15] investigated the properties of FPCA and gave some insights into its methodology and convergence rates. Di et al. [16] introduced multilevel FPCA, which is designed to extract the intra- and intersubject geometric components of multilevel functional data. Based on FPCA, Hyndman and Shang [17] proposed graphical tools for visualizing functional data and detecting functional outliers.

Owing to these theoretical and practical developments, FPCA has been successfully applied to many practical problems, such as the analysis of cornea curvature in the human eye [18], electronic commerce [19], growth curves [20], income density [21], the implied volatility surface in finance [22], longitudinal primary biliary liver cirrhosis [23], and spectroscopy data [24]. Furthermore, Hyndman and Shahid Ullah [25] proposed a smoothed and robust FPCA and used it to forecast age-specific mortality and fertility rates.

The objective of this paper is to study the monthly volatility of returns of the Shanghai 50 index, which consists of 50 stocks. Treating the stock price series as random functions in a space spanned by a finite-dimensional functional basis, we explore methods of functional data analysis in depth, especially functional principal component analysis.

In finance, some impressive papers using functional data analysis can be found, such as Ramsay and Ramsey [26], Muller and Ulrich [27], and Miao [28]. However, few publications address the increasingly flourishing Chinese financial market. This paper aims to fill this gap in both theory and application.

Our study can be described as an exploratory data approach:

Data collection → Data analysis → Conclusions.

This paper is organized as follows. In Section 2, we describe the functional principal component analysis (FPCA), which plays a significant role in the development of functional data analysis. It is also an essential ingredient of functional principal component regression (FPCR). Section 3 will illustrate the empirical study with the application of the theory in Section 2. Some further discussion and a conclusion are presented in Section 4.

2. Methodology

As mentioned above, an important tool in the functional data analysis toolbox is functional principal component analysis (FPCA). The main idea of FPCA is the same as that of multivariate principal component analysis (PCA), except that its principal component weights, or harmonics, are functions of time. They carry the main features of the functional data object and can be interpreted separately.

The differences in notation between PCA and FPCA are summarized in Table 1.

Table 1: The differences in notation between PCA and FPCA [6].

	PCA	FPCA
Variables	$X = [x_1, x_2, \ldots, x_n]$, $x_i = [x_{1i}, \ldots, x_{pi}]'$, $i = 1, \ldots, n$	$f(t) = [f_1(t), f_2(t), \ldots, f_n(t)]$, $t \in [x_1, x_p]$
Data	Vectors $\in R^p$	Curves $\in L_2[x_1, x_p]$
Covariance	Matrix $V = \mathrm{Cov}(X) \in R^p \times R^p$	Operator $V$ bounded between $x_1$ and $x_p$, $V : L_2[x_1, x_p] \to L_2[x_1, x_p]$
Eigen structure	Vector $\Phi_k \in R^p$, $V\Phi_k = \lambda_k \Phi_k$, for $1 \le k \le \min(n, p)$	Function $\varphi_k(t) \in L_2[x_1, x_p]$, $\int_{x_1}^{x_p} V\varphi_k(t)\,dt = \lambda_k \varphi_k(t)$, for $1 \le k \le n$
Components	Random variables in $R^p$	Random variables in $L_2[x_1, x_p]$

The basic assumption of FDA is that the data generating process can be described by a smooth function. FPCA finds a set of orthogonal principal component functions by maximizing the variance along each component.

The first functional principal component $\varphi_1(t)$ is defined by [figure omitted; refer to PDF] subject to [figure omitted; refer to PDF] The $k$th functional principal component $\varphi_k(t)$ can be found analogously, subject to the additional constraint [figure omitted; refer to PDF] The sample covariance function of $f(x) = [f_1(x), f_2(x), \ldots, f_n(x)]$, $x \in [x_1, x_p]$, is given by [figure omitted; refer to PDF] where each function $f_i(t)$ has usually been centered first.
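The displayed equations are omitted in this copy. For orientation only, a standard textbook formulation that is consistent with the surrounding definitions is sketched below. The symbol $v(s,t)$ for the covariance function, the normalization $1/(n-1)$, and the explicit integral form of the operator $V$ are assumptions and may differ from the original displays.

```latex
\begin{align}
\varphi_1 &= \arg\max_{\varphi}\ \operatorname{Var}\!\left[\int_{x_1}^{x_p} \varphi(t)\, f(t)\, dt\right]
  \quad \text{subject to} \quad \int_{x_1}^{x_p} \varphi^2(t)\, dt = 1, \\
\int_{x_1}^{x_p} \varphi_k(t)\, \varphi_j(t)\, dt &= 0, \qquad j = 1, \ldots, k-1, \\
v(s,t) &= \frac{1}{n-1} \sum_{i=1}^{n} f_i(s)\, f_i(t), \\
(V\varphi)(s) &= \int_{x_1}^{x_p} v(s,t)\, \varphi(t)\, dt .
\end{align}
```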

The covariance operator $V$ extends the concept of a sample covariance matrix to functional data; it is easy to show that $V$ is a positive, compact, symmetric linear operator. It is obvious that [figure omitted; refer to PDF] The detailed calculation procedure is as follows.

Step 1.

The data used in this paper are collected from public sources such as the WIND database.

Step 2.

The collected data may be dirty, so preprocessing is necessary. The raw data are therefore cleaned and organized.

Step 3.

The data are next converted to functional form. In this step, the raw data for observation $i$ are used to define a function $f_i$ that can be evaluated at all values of $t$ over the interval $[x_1, x_p]$. To do this, a basis must be specified. A nonparametric method is used to estimate $f_i(t)$ for $t \in [x_1, x_p]$, $i = 1, \ldots, n$.

We then express each function as a linear combination of basis functions, approximating it by a finite number of basis functions $\phi_k$. Consider [figure omitted; refer to PDF] Popular choices of basis include polynomial, Bernstein polynomial, Fourier, wavelet, and B-spline bases. The B-spline basis is our first choice because it fits the nonperiodic data in our study well.
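The basis expansion in Step 3 can be illustrated with a minimal sketch. It assumes curves observed on a common grid, a clamped cubic B-spline basis with equally spaced interior knots, and an ordinary least-squares fit. The function names (`bspline_basis`, `fit_curves`), the number of knots, and the synthetic data are hypothetical choices for illustration, not the settings used in this study.

```python
import numpy as np

def bspline_basis(t, knots, degree):
    """Evaluate all B-spline basis functions at points t (Cox-de Boor recursion).

    Returns an array of shape (len(t), len(knots) - degree - 1).
    """
    t = np.asarray(t, dtype=float)
    knots = np.asarray(knots, dtype=float)
    # Degree 0: indicator of each half-open knot interval [knots[j], knots[j+1]).
    B = np.zeros((t.size, knots.size - 1))
    for j in range(knots.size - 1):
        B[:, j] = (knots[j] <= t) & (t < knots[j + 1])
    # Close the last non-empty interval on the right so t == knots[-1] is covered.
    last = np.max(np.nonzero(knots[:-1] < knots[1:])[0])
    B[t == knots[-1], last] = 1.0
    # Raise the degree step by step; terms with a zero denominator are dropped.
    for d in range(1, degree + 1):
        B_new = np.zeros((t.size, knots.size - d - 1))
        for j in range(knots.size - d - 1):
            left_den = knots[j + d] - knots[j]
            right_den = knots[j + d + 1] - knots[j + 1]
            if left_den > 0:
                B_new[:, j] += (t - knots[j]) / left_den * B[:, j]
            if right_den > 0:
                B_new[:, j] += (knots[j + d + 1] - t) / right_den * B[:, j + 1]
        B = B_new
    return B

def fit_curves(t, Y, n_interior=6, degree=3):
    """Least-squares B-spline coefficients for each row of Y (curves observed at t)."""
    interior = np.linspace(t[0], t[-1], n_interior + 2)[1:-1]
    knots = np.concatenate([[t[0]] * (degree + 1), interior, [t[-1]] * (degree + 1)])
    Phi = bspline_basis(t, knots, degree)                  # shape (p, K)
    coefs, *_ = np.linalg.lstsq(Phi, Y.T, rcond=None)      # shape (K, n)
    return knots, coefs.T

# Synthetic example (not the SSE50 data): 5 noisy curves on 60 grid points.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 60)
Y = rng.normal(1.0, 0.3, size=(5, 1)) * np.sin(2 * np.pi * t) + rng.normal(0.0, 0.05, size=(5, 60))
knots, C = fit_curves(t, Y)
Phi = bspline_basis(t, knots, 3)
fitted = C @ Phi.T                                         # smoothed curves on the grid
```

Each row of `C` holds the basis coefficients of one curve, so `fitted` is the functional approximation evaluated back on the observation grid. In practice a roughness penalty is often added to the least-squares criterion; it is left out here to keep the sketch short.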

Step 4.

The functions may also need to be registered, or aligned, to reveal important features. This step separates vertical (amplitude) variation from horizontal (phase) variation. In our study, this step is not used because of the characteristics of our data.

Step 5.

Next, a variety of preliminary displays and summary statistics are produced. For example, the first and second derivative curves estimated from the data using the techniques discussed above are displayed, from which we can conclude that some curves vary more strongly than others.
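The summaries in Step 5 can be sketched as follows, assuming the smoothed curves have already been evaluated on a common grid. Derivatives are approximated here by finite differences with `numpy.gradient` rather than by differentiating a basis expansion, and the integrated squared second derivative is used as one possible measure of how strongly a curve varies. The data and all variable names are hypothetical.

```python
import numpy as np

def trapezoid_weights(t):
    """Trapezoidal quadrature weights for a (possibly uneven) grid t."""
    w = np.zeros(len(t))
    dt = np.diff(t)
    w[:-1] += dt / 2
    w[1:] += dt / 2
    return w

# Synthetic smooth curves (not the SSE50 data): 8 curves on 101 grid points.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 101)
amp = rng.normal(1.0, 0.3, size=(8, 1))
curves = amp * np.sin(2 * np.pi * t)

mean_curve = curves.mean(axis=0)             # pointwise mean function
sd_curve = curves.std(axis=0, ddof=1)        # pointwise standard deviation
d1 = np.gradient(curves, t, axis=1)          # first derivative curves
d2 = np.gradient(d1, t, axis=1)              # second derivative curves
# Integrated squared second derivative: larger values mean stronger variation.
roughness = (d2 ** 2) @ trapezoid_weights(t)
```

Sorting the curves by `roughness` gives a simple numerical counterpart to the visual comparison of derivative plots.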

Step 6.

Then exploratory analyses such as FPCA can be carried out.

The first principal component can be found by solving [figure omitted; refer to PDF]

Step 7.

The k th functional principal component is a solution of [figure omitted; refer to PDF]

subject to [figure omitted; refer to PDF]
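Steps 6 and 7 can be illustrated by a discretized version of the eigenproblem. The sketch below assumes curves sampled on a common grid, approximates the integrals with trapezoidal weights, and solves a symmetrized matrix eigenproblem. This is one common numerical route and not necessarily the computation used in this study; `fpca_grid`, `trapezoid_weights`, and the synthetic data are hypothetical.

```python
import numpy as np

def trapezoid_weights(t):
    """Trapezoidal quadrature weights for a (possibly uneven) grid t."""
    w = np.zeros(len(t))
    dt = np.diff(t)
    w[:-1] += dt / 2
    w[1:] += dt / 2
    return w

def fpca_grid(curves, t, n_components=2):
    """Discretized FPCA for curves sampled on a common grid t.

    curves has shape (n, p), one curve per row. Returns the leading
    eigenvalues, eigenfunctions of shape (n_components, p) with unit L2 norm,
    and principal component scores of shape (n, n_components).
    """
    curves = np.asarray(curves, dtype=float)
    n = curves.shape[0]
    w = trapezoid_weights(t)
    centered = curves - curves.mean(axis=0)
    cov = centered.T @ centered / (n - 1)          # sample covariance v(s, t) on the grid
    # The discretized problem V W phi = lambda phi is rewritten with u = W^(1/2) phi
    # as the symmetric problem W^(1/2) V W^(1/2) u = lambda u.
    sw = np.sqrt(w)
    evals, evecs = np.linalg.eigh(sw[:, None] * cov * sw[None, :])
    order = np.argsort(evals)[::-1][:n_components]  # largest eigenvalues first
    evals = evals[order]
    phis = (evecs[:, order] / sw[:, None]).T        # back-transform to eigenfunctions
    scores = (centered * w) @ phis.T                # integral of centered curve times phi_k
    return evals, phis, scores

# Synthetic example (not the SSE50 data): two modes of variation.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 101)
a = rng.normal(0.0, 2.0, size=(200, 1))
b = rng.normal(0.0, 0.5, size=(200, 1))
curves = a * np.sqrt(2) * np.sin(2 * np.pi * t) + b * np.sqrt(2) * np.cos(2 * np.pi * t)
evals, phis, scores = fpca_grid(curves, t, n_components=2)
```

Because the symmetric eigenvectors are orthonormal, the recovered eigenfunctions automatically satisfy the unit-norm and orthogonality constraints with respect to the quadrature weights, so the constraint for the $k$th component does not have to be imposed separately.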

Step 8.

The cumulative percentage of explained variance is calculated, and finally some discussion and an economic interpretation of the functional principal components are provided.
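The cumulative percentage in Step 8 is a simple function of the eigenvalues. The sketch below uses made-up eigenvalues, and the 85% threshold for choosing the number of components is an assumption for illustration, not a value taken from this study.

```python
import numpy as np

def cumulative_explained_variance(eigenvalues):
    """Cumulative percentage of variance explained by the leading components."""
    lam = np.clip(np.asarray(eigenvalues, dtype=float), 0.0, None)  # drop tiny negatives
    lam = np.sort(lam)[::-1]                                        # largest first
    return 100.0 * np.cumsum(lam) / lam.sum()

def n_components_for(eigenvalues, threshold=85.0):
    """Smallest number of components whose cumulative percentage reaches threshold."""
    cum = cumulative_explained_variance(eigenvalues)
    return int(np.searchsorted(cum, threshold) + 1)

# Hypothetical eigenvalues, for illustration only.
example_eigenvalues = [6.0, 3.0, 1.0]
cum_pct = cumulative_explained_variance(example_eigenvalues)
k = n_components_for(example_eigenvalues, threshold=85.0)
```

Note that the percentages are relative to the sum of the eigenvalues supplied, so all eigenvalues, not only the retained ones, should be passed in.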

3. Application

Figure 1 shows the monthly rates of return of the 50 stocks that constitute the SSE50 index.

Figure 1: The monthly rate of return of 50 stocks.