1. Introduction
The interest in developing and improving statistical methods and models is driven by the ever increasing volume and variety of data. Making sense of data often requires uncovering the patterns and relationships the data contain. One of the most commonly used universal tools for data exploration is the evaluation of cross-correlation among different data sets and of autocorrelation of a data sequence with itself. The correlation values can be utilized to carry out sensitivity analysis, to forecast future values, to visualize complex relationships among subsystems, and to evaluate other important spatio-temporal statistical properties. However, it is well established that correlation does not imply causation. Correlations measure linear dependencies between pairs of random variables, which is implicitly exploited in linear regression. The correlation values rely on estimated or empirical statistical moments such as the sample mean and sample variance. If the mean values are removed from the data, the correlations are referred to as covariances. The pairwise correlations can be represented as a fully connected graph with edges parameterized by time shifts and possibly by amplitude adjustments between the corresponding data sequences. However, the graph may become excessively complex when the number of random variables considered is large, for example, when analyzing multiple long sequences of random observations.
The problem of extending the notion of correlations and covariances to more than two random variables has been considered previously in the literature. In particular, the multirelation has been defined as a measure of linearity for multiple random variables in [1]. This measure is based on a geometric analysis of linear regression. Under the assumption of a multivariate Student-t distribution, the statistical significance of the multirelation coefficient is defined in terms of the eigenvalues of the correlation matrix in [2]. A univariate correlation measure for multiple random variables is defined in [3] as the sum of the elements on the main diagonal of the covariance matrix. Linear regression is again utilized in [4] to define the multiple correlation coefficient. It is derived from the coefficient of determination of linear regression as the proportion of the variance in the dependent variable which is predictable from the independent variables.
The distance correlation measure between two random vectors, based on hypothesis testing of the independence of random variables, is proposed in [5]. Different operations on random variables, including ordering, weighting, and a nonlinear transformation, are assumed in [6] to define a family of Minkowski distance measures for random vectors. A maximal correlation measure for random vectors is introduced in [7]. It generalizes the concept of the maximal information coefficient, while nonlinear transformations are also used to allow the assessment of nonlinear correlations. Similarly to [3], the sample cross-correlation between multiple realizations of two random vectors is shown in [8] to be proportional to the sum of the diagonal elements of the product of the corresponding correlation matrices. Reference [9] investigates two-dimensional general and central moments for random matrices which are invariant to similarity transformations. Two new distance measures for random vectors are defined in [10] assuming the joint characteristic function. These measures can also be used to test for statistical independence of random vectors. The most recent paper [11] derives a linear correlation measure for multiple random variables using determinants of the correlation submatrices.
This brief literature survey indicates that there are still no commonly accepted correlation measures for multiple random variables. The measures considered in the literature are either rigorously derived but mathematically rather complicated, or they are modifications of existing, simpler, but well understood measures. In this paper, it is shown that by constraining the complexity of the multivariate Taylor series to reduce the number of its parameters or degrees of freedom, the Taylor series can be rewritten as a finite-degree univariate polynomial. The independent variable of this polynomial is a simple sum of the random variables considered. The polynomial coefficients are real-valued constants, and they are application dependent. The polynomial defines a many-to-one transformation of multiple random variables to another scalar random variable. The mean value of the polynomial represents a broad class of polynomial measures which can be used for any number of random variables. The mean value of each polynomial element corresponds to a general or central moment of the sum of random variables. Therefore, these moments are referred to here as sum-moments. In the case of multiple random vectors, similarly to computing the correlation or covariance matrix by first concatenating the vectors into one long vector, the sum-moments can be readily defined and computed for such a concatenated random vector. The main advantages of assuming sum-moments to study the statistical properties of multiple random variables or multiple random vectors are the clarity in understanding their statistical significance, and the mathematical simplicity of their definition. Moreover, as long as the distribution of the sum of random variables can be found, a closed-form expression for the sum-moments can be obtained.
Before introducing the polynomial representations of random vectors in Section 4 together with the central and general sum-moments, a number of auxiliary results are presented in Section 2 and Section 3. In particular, Section 2 summarizes the key results and concepts from the literature concerning stationary random processes and the estimation of their parameters via linear regression and the method of moments, and describes how to generate the 1st and the 2nd order Markov processes. Section 3 extends the results from Section 2 by deriving additional results which are used in Section 4 and in Section 5, such as a low complexity approximation of linear regression and a procedure to generate multiple Gaussian processes with defined autocorrelation and cross-correlation. The main results of the paper are obtained in Section 4, including the definition of a class of polynomial statistical measures and sum-moments for multiple random variables and random processes. Other related concepts involving sums of random variables are also reviewed. Section 5 provides several examples to illustrate and evaluate the results obtained in the paper. The paper is concluded in Section 6.
2. Background
This section reviews key concepts and results from the literature which are used to develop new results in the subsequent sections. Specifically, the following concepts are briefly summarized: stationarity of random processes, estimation of general and central moments and of correlation and covariance using the method of moments, the definitions of cosine similarity and Minkowski distance, parameter estimation via linear regression, generation of the 1st and the 2nd order Markov processes, and selected properties of polynomials and multivariate functions. Note that the more straightforward proofs for some lemmas are only indicated and not fully elaborated.
2.1. Random Processes
Consider a real-valued one-dimensional random process $x(t)\in\mathbb{R}$ over a continuous time $t\in\mathbb{R}$. The process is observed at $N$ discrete time instances $t_1<t_2<\ldots<t_N$ corresponding to $N$ random variables $X_i=x(t_i)$, $i=1,\ldots,N$. The random variables $\mathbf{X}=\{X_1,\ldots,X_N\}\in\mathbb{R}^N$ are completely statistically described by their joint density function $f_x(\mathbf{X})$, so that $f_x(\mathbf{X})\ge 0$ and $\int_{\mathbb{R}^N}f_x(\mathbf{X})\,d\mathbf{X}=1$. The process $x(t)$ is further assumed to be $K$-th order stationary [12].
Definition 1.
A random process $x(t)$ is $K$-th order stationary, if,
$$f_x(X_1,\ldots,X_K;t_1,\ldots,t_K)=f_x(X_1,\ldots,X_K;t_1+t_0,\ldots,t_K+t_0)\quad\forall t_0\in\mathbb{R}.$$
Lemma 1.
The $K$-th order stationary process is also $(K-k)$-th order stationary, $k=1,2,\ldots,K-1$, for any subset $\{X_{i_1},\ldots,X_{i_{K-k}}\}\subseteq\{X_1,\ldots,X_N\}$.
Proof. The unwanted random variables can be integrated out from the joint density. □
Unless otherwise stated, in the sequel, all observations of random processes are assumed to be stationary.
The expectation, $E[X]=\int_{\mathbb{R}}x f_x(x)\,dx$, of a random variable $X$ is a measure of its mean value. A linear correlation between two random variables $x(t_1)$ and $x(t_2)$ is defined as [12],
$$R_x(t_1,t_2)=\operatorname{corr}[x(t_1)x(t_2)]=R_x(t_2-t_1)=R_x(\tau).$$
The (auto-) covariance measures a linear dependency between two zero-mean random variables, i.e.,
$$C_x(t_1,t_2)=\operatorname{cov}[x(t_1),x(t_2)]=E[(x(t_1)-E[x(t_1)])(x(t_2)-E[x(t_2)])]=C_x(t_2-t_1).$$
It can be shown that the maximum of $C_x(\tau)$, $\tau=t_2-t_1$, occurs for $\tau=0$, corresponding to the variance of a stationary process $x(t)$, i.e.,
$$C_x(0)=\operatorname{var}[x(t)]=E[(x(t)-E[x(t)])^2]\quad\forall t\in\mathbb{R}.$$
The covariance $C_x(\tau)$ can be normalized, so that $-1\le C_x(\tau)/C_x(0)\le 1$. Furthermore, for real-valued regular processes, the covariance has an even symmetry, i.e., $C_x(\tau)=C_x(-\tau)$ [12].
For $N>2$, it is convenient to define a random vector, $\mathbf{X}=[X_1,\ldots,X_N]^T$, where $(\cdot)^T$ denotes the transpose, and the corresponding mean vector, $\bar{\mathbf{X}}=E[\mathbf{X}]$. Then, the covariance matrix,
$$\mathbf{C}_x=E[(\mathbf{X}-\bar{\mathbf{X}})(\mathbf{X}-\bar{\mathbf{X}})^T]\in\mathbb{R}^{N\times N}$$
has as its elements the pairwise covariances, $[\mathbf{C}_x]_{ij}=C_x(t_j-t_i)$, $i,j=1,2,\ldots,N$.
Assume now the case of two $K$-th order stationary random processes $x_1(t)$ and $x_2(t)$, and the discrete time observations,
$$X_{1i}=x_1(t_{1i}),\quad i=1,2,\ldots,N_1,\qquad X_{2i}=x_2(t_{2i}),\quad i=1,2,\ldots,N_2.$$
Using the set notation, $\{X_i\}_{1:K}=\{X_1,X_2,\ldots,X_K\}$, jointly or mutually stationary processes imply the time-shift invariance of their joint density function.
Definition 2.
Random processes $x_1(t)$ and $x_2(t)$ are $K$-th order jointly stationary, if,
$$f_x(\{X_{1i}\}_{1:K},\{X_{2i}\}_{1:K};\{t_{1i}\}_{1:K},\{t_{2i}\}_{1:K})=f_x(\{X_{1i}\}_{1:K},\{X_{2i}\}_{1:K};\{t_{1i}+t_0\}_{1:K},\{t_{2i}+t_0\}_{1:K})\quad\forall t_0\in\mathbb{R}$$
is satisfied for all subsets, $\{X_{1i}\}_{1:K}\subseteq\{X_{1i}\}_{1:N_1}$, $K\le N_1$, and $\{X_{2i}\}_{1:K}\subseteq\{X_{2i}\}_{1:N_2}$, $K\le N_2$.
Lemma 2. The K-th order joint stationarity implies the joint stationarity of all orders smaller than K.
Proof. The claim follows from marginalization of the joint density function to remove unwanted variables. □
The cross-covariance of random variables $X_1=x_1(t_1)$ and $X_2=x_2(t_2)$, being discrete time observations of the jointly stationary random processes $x_1(t)$ and $x_2(t)$, respectively, is defined as,
$$C_{x_1x_2}(t_1,t_2)=\operatorname{cov}[X_1,X_2]=E[(x_1(t_1)-E[x_1(t_1)])(x_2(t_2)-E[x_2(t_2)])]=C_{x_1x_2}(t_2-t_1).$$
The cross-covariance can again be normalized, so it is bounded as,
$$-1\le C_{x_1x_2}(\tau)\Big/\sqrt{\operatorname{var}[x_1(t)]\operatorname{var}[x_2(t)]}\le 1.$$
Note that, unlike for the autocovariance, the maximum of $C_{x_1x_2}(\tau)$ can occur for any value of the argument $\tau$.
The covariance matrix for the random vectors $\mathbf{X}_1=[X_{11},\ldots,X_{1N_1}]^T$ and $\mathbf{X}_2=[X_{21},\ldots,X_{2N_2}]^T$ having the means, $\bar{\mathbf{X}}_1=E[\mathbf{X}_1]$ and $\bar{\mathbf{X}}_2=E[\mathbf{X}_2]$, respectively, is computed as,
$$\mathbf{C}_{x_1x_2}=E[(\mathbf{X}_1-\bar{\mathbf{X}}_1)(\mathbf{X}_2-\bar{\mathbf{X}}_2)^T]\in\mathbb{R}^{N_1\times N_2}.$$
Its elements are the covariances, $[\mathbf{C}_{x_1x_2}]_{ij}=C_{x_1x_2}(t_{1i}-t_{2j})$, $i=1,\ldots,N_1$, $j=1,\ldots,N_2$.
In addition to the first order (mean value) and the second order (covariance) statistical moments, the higher order statistics of a random variable $X$ are given by the general and the central moments defined, respectively, as [13],
$$g_m(X)=E[|X|^m]\quad\text{and}\quad\mu_m(X)=E[|X-E[X]|^m],\quad m=1,2,\ldots$$
where $|X|$ denotes the absolute value of the scalar variable $X$. The positive integer-valued moments (1) facilitate mathematically tractable integration, and prevent producing complex numbers if the argument is negative. Note also that the absolute value in (1) is necessary if $X$ is complex-valued, or if $m$ is odd, in order to make the moments real-valued and convex. The central moment can be normalized by the variance as,
$$\mu_m(X)=E[|X-E[X]|^m]\big/E[(X-E[X])^2]^{m/2}.$$
The cosine similarity between two equal-length vectors, $\mathbf{X}_1=[X_{11},\ldots,X_{1N}]^T$ and $\mathbf{X}_2=[X_{21},\ldots,X_{2N}]^T$, is defined as [14],
$$S_{\cos}(\mathbf{X}_1,\mathbf{X}_2)=\frac{\sum_{i=1}^N X_{1i}X_{2i}}{\sqrt{\sum_{i=1}^N X_{1i}^2}\sqrt{\sum_{i=1}^N X_{2i}^2}}.$$
The Minkowski distance between two equal-length vectors $\mathbf{X}_1$ and $\mathbf{X}_2$ is defined as the $l_m$-norm [6], i.e.,
$$S_{\mathrm{mnk}}(\mathbf{X}_1,\mathbf{X}_2)=\|\mathbf{X}_1-\mathbf{X}_2\|_m=\Big(\sum_{i=1}^N|X_{1i}-X_{2i}|^m\Big)^{1/m},\quad m=1,2,\ldots$$
2.2. Estimation Methods
Statistical moments can be empirically estimated from measured data using sample moments. Such an inference strategy is referred to as the method of moments [15]. In particular, under the ergodicity assumption, a natural estimator of the mean value of a random variable $X$ from its measurements $X_i$ is the sample moment,
$$\bar X=E[X]\approx\frac{1}{N}\sum_{i=1}^N X_i.$$
The sample mean estimator is unbiased and consistent [15]. More generally, the sample mean estimator of the first moment of a transformed random variable $h(X)$ is,
$$E[h(X)]\approx\frac{1}{N-d}\sum_{i=1}^N h(X_i)$$
where $d$ is the number of degrees of freedom used in the transformation $h$, i.e., the number of other parameters which must be estimated. For example, the variance of $X$ is estimated as,
$$\operatorname{var}[X]=E[|X-\bar X|^2]\approx\frac{1}{N-1}\sum_{i=1}^N(X_i-\hat{\bar X})^2$$
where $\hat{\bar X}$ is the estimate of the mean value $\bar X$ of $X$.
Assuming random sequences $\{X_{1i}\}_{1:N_1}$ and $\{X_{2i}\}_{1:N_2}$, their autocovariance and cross-covariance, respectively, are estimated as [15],
$$C_x(k)=E[X_iX_{i+k}]\approx\frac{1}{N-k}\sum_{i=1}^{N-k}X_iX_{i+k},\quad k\ll N,$$
$$C_{x_1x_2}(k)=E[X_{1i}X_{2(i+k)}]\approx\frac{1}{N-k}\sum_{i=1}^{N-k}X_{1i}X_{2(i+k)},\quad k\ll N=\min(N_1,N_2).$$
Since these estimators are consistent, the condition $k\ll N$ is necessary to combine a sufficient number of samples and achieve an acceptable estimation accuracy.
The parameters of a random process can be estimated by fitting a suitable data model to the measurements $X_i$. Denote such a data model as, $E[X_i]=M_i(\mathbf{P})$, $i=1,2,\ldots,N$. Assuming the least-squares (LS) criterion, the vector of parameters, $\mathbf{P}=[P_1,\ldots,P_D]^T\in\mathbb{R}^D$, is estimated as,
$$\hat{\mathbf{P}}=\arg\min_{\mathbf{P}}\sum_{i=1}^N(X_i-M_i(\mathbf{P}))^2.$$
For continuous parameters, the minimum is obtained using the derivatives, i.e., let $\frac{d}{d\mathbf{P}}M_i(\mathbf{P})=\dot M_i(\mathbf{P})$, and,
$$\frac{d}{d\mathbf{P}}\sum_{i=1}^N(X_i-M_i(\mathbf{P}))^2\stackrel{!}{=}0\iff\sum_{i=1}^N\dot M_i(\hat{\mathbf{P}})M_i(\hat{\mathbf{P}})=\sum_{i=1}^N\dot M_i(\hat{\mathbf{P}})X_i.$$
For a linear data model, $M_i(\mathbf{P})=\mathbf{w}_i^T\mathbf{P}$ where $\mathbf{w}_i=[w_{1i},\ldots,w_{Di}]^T$ are known coefficients, the LS estimate can be obtained in the closed form, i.e.,
$$\hat{\mathbf{P}}=\Big(\sum_{i=1}^N\mathbf{w}_i\mathbf{w}_i^T\Big)^{-1}\sum_{i=1}^N\mathbf{w}_iX_i$$
where $(\cdot)^{-1}$ denotes the matrix inverse.
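For concreteness, the closed-form solution above can be implemented in a few lines. The following Python sketch (function and variable names are illustrative, not from the paper) accumulates the normal equations and solves them for a straight-line model:

```python
import numpy as np

def ls_fit(W, X):
    """Closed-form LS estimate: P_hat = (sum_i w_i w_i^T)^{-1} sum_i w_i X_i.

    W : (N, D) array whose rows are the known coefficient vectors w_i.
    X : (N,) array of measurements.
    """
    A = W.T @ W          # sum_i w_i w_i^T  (D x D)
    b = W.T @ X          # sum_i w_i X_i    (D,)
    return np.linalg.solve(A, b)

# Straight-line model M_i(P) = P_1 + P_2 * t_i as a sanity check
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
W = np.column_stack([np.ones_like(t), t])
X = 1.5 + 0.3 * t + 0.1 * rng.standard_normal(t.size)
print(ls_fit(W, X))      # approximately [1.5, 0.3]
```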
2.3. Generating Random Processes
The task is to generate a discrete time stationary random process with a given probability density and a given autocovariance. The usual strategy is to generate a correlated Gaussian process followed by a nonlinear memoryless transformation. For instance, the autoregressive (AR) process described by the second order difference equation with constant coefficients $a_1$, $a_2$, and $b>0$ [12], i.e.,
$$x(n)+a_1x(n-1)+a_2x(n-2)=b\,u(n)$$
generates the 2nd order Markov process from a zero-mean white (i.e., uncorrelated) process $u(n)$. For $a_1=-2\sqrt{\frac{1-a}{1+a}}$ and $a_2=\frac{1-a}{1+a}$, $0<a<1$, this process has the autocovariance,
$$C_{2MP}(k)=\underbrace{\frac{b^2(1+a)^2}{4a^3}}_{\sigma^2}\underbrace{\left(\frac{1-a}{1+a}\right)^{|k|/2}}_{\doteq(1-a)^{|k|}}(1+a|k|)\doteq\sigma^2(1-a)^{|k|}(1+a|k|)=\sigma^2e^{-|k|(-\ln(1-a))}(1+a|k|).$$
On the other hand, for $a_1=-a$, $0<a<1$, and $a_2=0$, the AR process,
$$x(n)-a\,x(n-1)=b\,u(n)$$
generates the 1st order Markov process with the autocovariance,
$$C_{1MP}(k)=\underbrace{\frac{b^2}{1-a^2}}_{\sigma^2}a^{|k|}=\sigma^2e^{-\alpha|k|},\quad a=e^{-\alpha}.$$
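Both Markov processes are easy to simulate. A minimal Python sketch for the 1st order process is given below; the recursion is written as $x(n)=a\,x(n-1)+b\,u(n)$ so that the autocovariance $\sigma^2a^{|k|}$ above is reproduced, and the first sample is drawn from the stationary distribution:

```python
import numpy as np

def markov1(N, a, b=1.0, seed=0):
    """Generate N samples of the 1st order Markov (AR(1)) process
    x(n) = a*x(n-1) + b*u(n), driven by white Gaussian noise u(n)."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(N)
    x = np.empty(N)
    x[0] = u[0] * b / np.sqrt(1.0 - a**2)   # start in the stationary distribution
    for n in range(1, N):
        x[n] = a * x[n - 1] + b * u[n]
    return x

a, b, N = np.exp(-0.1), 1.0, 200_000
x = markov1(N, a, b)
# Empirical autocovariance at lag k vs. the theory sigma^2 * a^|k|
sigma2 = b**2 / (1.0 - a**2)
for k in (0, 1, 5):
    print(np.mean(x[:N - k] * x[k:]), sigma2 * a**k)
```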
Lemma 3.
[16] The stationary random process $x(k)$ with autocovariance $C_x(k)$ is transformed by a linear time-invariant system with real-valued impulse response $h(k)$ into another stationary random process $y(k)=\sum_{i=-\infty}^{\infty}h(i)x(k-i)\equiv h(k)\circledast x(k)$ with autocovariance, $C_y(k)=h(k)\circledast h(-k)\circledast C_x(k)$. The symbol, $\circledast$, denotes (discrete time) convolution.
Proof.
By definition, the output covariance, $C_y(k,l)=E[(y(k)-E[y(k)])(y(l)-E[y(l)])]$. Substituting $y(k)=\sum_{i=-\infty}^{\infty}h(i)x(k-i)$, and rearranging, the covariance, $C_y(k-l)=\sum_i\sum_m h(m)C_x(i-m)h(k-i)=h(k)\circledast h(-k)\circledast C_x(k)$. □
Lemma 4.A stationary random process at the output of a linear or nonlinear time-invariant system remains stationary.
Proof.
For any multivariate function $h(\mathbf x)\in\mathbb{R}$ and any $t_0\in\mathbb{R}$, the expectation,
$$E[h(\mathbf x)]=\int_{\mathbb{R}^N}h(\mathbf x)f_X(\mathbf x;t_1,\ldots,t_N)\,d\mathbf x=\int_{\mathbb{R}^N}h(\mathbf x)f_X(\mathbf x;t_1-t_0,\ldots,t_N-t_0)\,d\mathbf x$$
assuming Definition 1 and provided that the dimension $N$ of the observations is at most equal to the order of stationarity $K$. □
For shorter sequences, the linear transformation, $\mathbf X=\mathbf T\mathbf U$, can be used to generate a normally distributed vector $\mathbf X\in\mathbb{R}^N$ having the covariance matrix, $\mathbf C_x=\mathbf T\mathbf T^T$, from an uncorrelated Gaussian vector $\mathbf U\in\mathbb{R}^N$. The mean, $E[\mathbf X]=\mathbf T\,E[\mathbf U]$. For longer sequences, a linear time-invariant filter can be equivalently used as indicated by Lemma 3.
2.4. Polynomials and Multivariate Functions
Lemma 5.
A univariate twice differentiable function $p(x)$ is convex, if and only if, $\frac{d^2}{dx^2}p(x)=\ddot p(x)\ge 0$ for $\forall x\in\mathbb{R}$. More generally, a twice differentiable multivariate function $f:\mathbb{R}^N\mapsto\mathbb{R}$ is convex, if and only if, the domain of $f$ is convex, and its Hessian is positive semidefinite, i.e., $\nabla^2f\succeq 0$ for $\forall x\in\operatorname{dom}f$.
Proof.
See [17] (Sec. 3.1). □
Consequently, convex polynomials can be generated as follows.
Lemma 6.
Let $\mathbf Q\in\mathbb{R}^{m\times m}$ be a positive semidefinite matrix, and assume a polynomial, $\ddot p(x)=\sum_{i=0}^{2m-2}p_ix^i$ for $\forall x\in\mathbb{R}$ where $p_i=\sum_{k+l=i}Q_{kl}$. Then, for any $q_0,q_1\in\mathbb{R}$, the polynomial $p(x)$ of degree $2m$,
$$p(x)=q_0+q_1x+\sum_{i=0}^{2m-2}\frac{p_i}{(i+1)(i+2)}x^{2+i}$$
is convex.
Proof.
Let $\mathbf x=[x^0,x^1,\ldots,x^{m-1}]^T$. Then, $\mathbf x^T\mathbf Q\mathbf x=\sum_{i=0}^{2m-2}p_ix^i=\ddot p(x)\ge 0$ for $\forall x$, since $\mathbf Q$ is positive semidefinite. Using Lemma 5 concludes the proof. □
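Lemma 6 can be spot-checked numerically. The sketch below (an illustration, not a proof) draws a random positive semidefinite $\mathbf Q$, forms the coefficients $p_i=\sum_{k+l=i}Q_{kl}$, and verifies that the reconstructed second derivative is nonnegative on a grid:

```python
import numpy as np

m = 3
rng = np.random.default_rng(1)
A = rng.standard_normal((m, m))
Q = A @ A.T                                  # positive semidefinite by construction

# Coefficients of the second derivative: p_i = sum_{k+l=i} Q_kl
p = np.array([sum(Q[k, i - k] for k in range(m) if 0 <= i - k < m)
              for i in range(2 * m - 1)])

# Full polynomial p(x) = q0 + q1*x + sum_i p_i / ((i+1)(i+2)) * x^(2+i)
q0, q1 = 0.5, -2.0
i = np.arange(2 * m - 1)
coeffs = np.concatenate(([q0, q1], p / ((i + 1) * (i + 2))))  # ascending powers

# Convexity spot check: p''(x) = sum_i p_i x^i >= 0 on a grid
xs = np.linspace(-3.0, 3.0, 61)
assert np.all(np.polyval(p[::-1], xs) >= -1e-9)
print(np.polyval(coeffs[::-1], 1.0))         # evaluate p(1)
```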
For a non-negative integer $m\in\{0\}\cup\mathbb{N}^+=\{0,1,2,\ldots\}$, assume the following notations to simplify the subsequent mathematical expressions [18]:
$$\mathbf n=\{n_1,n_2,\ldots,n_N\}\in\{\mathbb{N}^+\cup 0\}^N,\qquad\mathbf x=\{x_1,x_2,\ldots,x_N\}\in\mathbb{R}^N,$$
$$\|\mathbf x\|_p=\big(x_1^p+x_2^p+\cdots+x_N^p\big)^{1/p},\qquad\|\mathbf x\|_1=|x_1|+|x_2|+\cdots+|x_N|,\qquad|\mathbf x|_1=x_1+x_2+\cdots+x_N,$$
$$\mathbf h=\{h_1,h_2,\ldots,h_N\}\in\mathbb{R}^N,\qquad\mathbf h^{\mathbf n}=h_1^{n_1}h_2^{n_2}\cdots h_N^{n_N},$$
$$\partial^{\mathbf n}f(\mathbf x)=\partial_1^{n_1}\partial_2^{n_2}\cdots\partial_N^{n_N}f(\mathbf x)=\frac{\partial^{|\mathbf n|_1}}{\partial x_1^{n_1}\partial x_2^{n_2}\cdots\partial x_N^{n_N}}f(\mathbf x),$$
$$m!=\prod_{i=1}^m i,\qquad\mathbf n!=n_1!\,n_2!\cdots n_N!$$
Note that $|\mathbf x|_1$ denotes the sum of the elements of $\mathbf x$ whereas $\|\mathbf x\|_1$ is the sum of the absolute values of its elements.
Lemma 7.
The $m$-th power of a finite sum can be expanded as [19],
$$|\mathbf x|_1^m=\sum_{|\mathbf n|_1=m}\frac{m!}{\mathbf n!}\mathbf x^{\mathbf n}.$$
Proof.
See [18]. □
Theorem 1.
The multivariate Taylor expansion of an $(m+1)$-th order differentiable function $f:\mathbb{R}^N\mapsto\mathbb{R}$ about the point $\mathbf x\in\mathbb{R}^N$ is written as [19],
$$f(\mathbf x+\mathbf h)=f(\mathbf x)+\sum_{|\mathbf n|_1\le m}\frac{\partial^{\mathbf n}f(\mathbf x)}{\mathbf n!}\mathbf h^{\mathbf n}+\sum_{|\mathbf n|_1=m+1}\frac{\partial^{\mathbf n}f(\mathbf x+t\mathbf h)}{\mathbf n!}\mathbf h^{\mathbf n}$$
for some $t\in(0,1)$.
Proof.
See [18,19]. □
Definition 3.
A multivariate function $f(\mathbf x)=f(x_1,\ldots,x_N)$ is said to be symmetric, if, for any permutation of its arguments $\mathbf x$ denoted as $\mathbf x'$, $f(\mathbf x)=f(\mathbf x')$.
3. Background Extensions
In this section, additional results are obtained which are used in the next section to introduce statistical sum-moments for random vectors. In particular, the mean cosine similarity, the mean Minkowski distance, as well as the higher central moments for random vectors are defined. A polynomial approximation of univariate functions is shown to be a linear regression problem. A numerically efficient solution of the LS problem is derived. Finally, a procedure to generate multiple Gaussian processes with defined autocovariance and cross-covariance is devised.
Recall the second moment of a random variable $X$, i.e.,
$$\mu_2=E[(X-c)^2],\quad c\in\mathbb{R}.$$
It is straightforward to show that $\mu_2$ is minimized for $c=E[X]$, giving the variance of $X$. On the other hand, $\mu_2=0$, if and only if $c=E[X]\pm\sqrt{-\operatorname{var}[X]}$.
For random vectors, both the cosine similarity and the Minkowski distance are random variables. Assuming that the vectors are jointly stationary, and their elements are identically distributed, the mean cosine similarity can be defined as,
$$\bar S_{\cos}(\mathbf X_1,\mathbf X_2)=\frac{\sum_{i=1}^N E[X_{1i}X_{2i}]}{\sqrt{\sum_{i=1}^N\big(E[X_{1i}^2]-\bar X_{1i}^2\big)}\sqrt{\sum_{i=1}^N\big(E[X_{2i}^2]-\bar X_{2i}^2\big)}}=\frac{1}{N}\sum_{i=1}^N\frac{E[X_{1i}X_{2i}]}{\sigma_1\sigma_2}=\frac{1}{N}\sum_{i=1}^N\rho_{1i,2i}$$
where $\sigma_1^2=\operatorname{var}[X_{1i}]$, $\sigma_2^2=\operatorname{var}[X_{2i}]$, and $\rho_{1i,2i}$ denotes the Pearson correlation coefficient of the $i$-th elements of the vectors $\mathbf X_1$ and $\mathbf X_2$. It should be noted that this definition of the mean cosine similarity does not account for the other correlations, $E[X_{1i}X_{2j}]$, $i\ne j$.
The mean Minkowski distance for random vectors can be defined as,
$$\bar S_{\mathrm{mnk}}(\mathbf X_1,\mathbf X_2)=\Big(\sum_{i=1}^N E[|X_{1i}-X_{2i}|^m]\Big)^{1/m},\quad m=1,2,\ldots$$
Recognizing the $m$-th general moment in (5), the $m$-th power of the mean Minkowski distance can be normalized as,
$$\tilde S_{\mathrm{mnk}}^m(\mathbf X_1,\mathbf X_2)=\frac{\sum_{i=1}^N E[|X_{1i}-X_{2i}|^m]}{N\,E[(X_{1i}-X_{2i})^2]^{m/2}}=\bar\mu_m(\mathbf X_1-\mathbf X_2),\quad m=1,2,\ldots$$
where the average Minkowski distance between two random vectors is,
$$\bar\mu_m(\mathbf X_1-\mathbf X_2)=\frac{1}{N}\sum_{i=1}^N\mu_m(X_{1i}-X_{2i}),\quad m=1,2,\ldots$$
Furthermore, note that for $m=2$,
$$\tfrac12\big(E[\|\mathbf X_1-\mathbf X_2\|_2^2]+E[\|\mathbf X_1+\mathbf X_2\|_2^2]\big)=E[\|\mathbf X_1\|_2^2]+E[\|\mathbf X_2\|_2^2].$$
Assuming positive integers $\mathbf m=\{m_1,m_2,\ldots,m_N\}$, the higher order joint central moments of a random vector $\mathbf X=\{X_i\}_{1:N}$ can be defined as,
$$\mu_{m_1,\ldots,m_N}(X_1,\ldots,X_N)=E\Big[\prod_{i=1}^N(X_i-\bar X_i)^{m_i}\Big]$$
or, using the more compact index notation, as,
$$\mu_{\mathbf m}(\mathbf X)=E[(\mathbf X-\bar{\mathbf X})^{\mathbf m}].$$
3.1. Linear LS Estimation
The linear LS estimation can be used to fit a degree-$(D-1)$ polynomial to $N$ samples of a random process $x(t)$ at discrete time instances $t_1,t_2,\ldots,t_N$. Hence, consider the polynomial data model,
$$E[x(t)]\approx M(t;\mathbf P)=\sum_{k=1}^D P_k t^{k-1}.$$
Denoting $\mathbf w_i=[t_i^0,t_i^1,\ldots,t_i^{D-1}]^T$, the linear LS solution (2) gives the estimates,
$$\hat{\mathbf P}=\Bigg(\sum_{i=1}^N\begin{bmatrix}t_i^0&t_i^1&\cdots&t_i^{D-1}\\ t_i^1&t_i^2&\cdots&t_i^{D}\\ \vdots&\vdots&\ddots&\vdots\\ t_i^{D-1}&t_i^{D}&\cdots&t_i^{2D-2}\end{bmatrix}\Bigg)^{-1}\sum_{i=1}^N x(t_i)\begin{bmatrix}t_i^0\\ t_i^1\\ \vdots\\ t_i^{D-1}\end{bmatrix}.$$
Assuming $D=2$ parameters, i.e., the linear LS regression for a straight line, the parameters $P_1$ and $P_2$ to be estimated must satisfy the following equality:
$$\frac{d}{dP_1}\sum_{i=1}^N\big(X_i-w_{1i}P_1-w_{2i}P_2\big)^2=-2\sum_{i=1}^N w_{1i}X_i+2\sum_{i=1}^N w_{1i}^2P_1+2\sum_{i=1}^N w_{1i}w_{2i}P_2\stackrel{!}{=}0.$$
Denoting the weighted averages, $\bar X=\sum_{i=1}^N w_{1i}X_i$, $\|w_1\|_2^2=\sum_{i=1}^N w_{1i}^2$, and $\bar w_{12}=\sum_{i=1}^N w_{1i}w_{2i}$, a necessary but not sufficient condition for the linear LS estimation of the parameters $P_1$ and $P_2$ is,
$$\bar X=\|w_1\|_2^2\,P_1+\bar w_{12}\,P_2.$$
In the LS terminology, the values $\|w_1\|_2^2$ and $\bar w_{12}$ represent independent variables whereas $\bar X$ is a dependent variable.
Note that all $N$ measurements are used in (7). However, if $N$ is sufficiently large, the data points can be split into two disjoint sets of $N_1$ and $N_2$ elements, respectively, $N_1+N_2=N$. For convenience, denote the sums,
$$\bar X_1=\frac{1}{a_1}\sum_{i\in I_1}w_{1i}X_i,\quad\bar W_{11}=\frac{1}{a_1}\sum_{i\in I_1}w_{1i}^2,\quad\bar W_{12}=\frac{1}{a_1}\sum_{i\in I_1}w_{1i}w_{2i},$$
$$\bar X_2=\frac{1}{a_2}\sum_{i\in I_2}w_{1i}X_i,\quad\bar W_{21}=\frac{1}{a_2}\sum_{i\in I_2}w_{1i}^2,\quad\bar W_{22}=\frac{1}{a_2}\sum_{i\in I_2}w_{1i}w_{2i}$$
where $a_1,a_2>1$ are some constants (to be determined later), and $I_1$ and $I_2$ are two disjoint index sets, such that $I_1\cup I_2=\{1,2,\ldots,N\}$, and the cardinalities, $|I_1|=N_1$ and $|I_2|=N_2$. Note also that,
$$\|w_1\|_2^2=a_1\bar W_{11}+a_2\bar W_{21},\qquad\bar w_{12}=a_1\bar W_{12}+a_2\bar W_{22},\qquad\bar X=a_1\bar X_1+a_2\bar X_2.$$
Consequently, the approximate LS estimates of the parameters $P_1$ and $P_2$ are readily computed as,
$$\begin{bmatrix}\hat P_1\\ \hat P_2\end{bmatrix}=\begin{bmatrix}\bar W_{11}&\bar W_{12}\\ \bar W_{21}&\bar W_{22}\end{bmatrix}^{-1}\begin{bmatrix}\bar X_1\\ \bar X_2\end{bmatrix}.$$
There are $2^N$ possibilities how to split the $N$ data points into two disjoint subsets indexed by $I_1$ and $I_2$. More importantly, the estimates (9) do not guarantee the optimum LS fit, i.e., achieving the minimum sum of squared errors. However, the complexity of performing the LS fit is greatly reduced by splitting the data, since independently of the value of $N\gg 1$, only a $2\times 2$ matrix needs to be inverted. The optimum LS fit and the approximate LS fit (9) are depicted in Figure 1. The points $A_1$ and $A_2$ in Figure 1 correspond to the data subsets indexed by $I_1$ and $I_2$, respectively. The mid-point, $B=a_1A_1+a_2A_2$, follows from (8). Note that $B$ is always located at the intersection of the optimum and the approximate lines $LS_{\mathrm{opt}}$ and $LS_{\mathrm{apr}}$, respectively. The vertical arrows at the points $A_1$ and $A_2$ in Figure 1 indicate that the dependent values $\bar X_1$ and $\bar X_2$ are random variables.
The larger the variation of the gradient of the line $LS_{\mathrm{apr}}$ in Figure 1, the larger the uncertainty and the probability that the line $LS_{\mathrm{apr}}$ deviates from the optimum regression line $LS_{\mathrm{opt}}$. Since the line $LS_{\mathrm{apr}}$ is defined by the points $\bar X_1$ and $\bar X_2$, and always, $B\in LS_{\mathrm{apr}}$, the spread of the random variables $\bar X_1$ and $\bar X_2$ about their mean values affects the likelihood that $LS_{\mathrm{apr}}$ deviates from $LS_{\mathrm{opt}}$. In particular, given $0<p<1$, there exists $\xi>0$, such that the probability of the gradient $G_{\mathrm{apr}}$ of $LS_{\mathrm{apr}}$ being within the given bounds is at least,
$$\Pr\big[b_l(\xi)<G_{\mathrm{apr}}<b_u(\xi)\big]\ge p$$
where
$$b_l(\xi)=\frac{\big(E[\bar X_2]-\xi\sqrt{\operatorname{var}[\bar X_2]}\big)-\big(E[\bar X_1]+\xi\sqrt{\operatorname{var}[\bar X_1]}\big)}{\bar W_{22}-\bar W_{12}},\qquad b_u(\xi)=\frac{\big(E[\bar X_2]+\xi\sqrt{\operatorname{var}[\bar X_2]}\big)-\big(E[\bar X_1]-\xi\sqrt{\operatorname{var}[\bar X_1]}\big)}{\bar W_{22}-\bar W_{12}}.$$
For stationary measurements (cf. (9)), the means and the variances in (10) are equal to,
$$E[\bar X_1]=E[\bar X]\frac{N_1}{a_1N},\quad\operatorname{var}[\bar X_1]=\operatorname{var}[\bar X]\frac{N_1}{a_1^2N},\qquad E[\bar X_2]=E[\bar X]\frac{N_2}{a_2N},\quad\operatorname{var}[\bar X_2]=\operatorname{var}[\bar X]\frac{N_2}{a_2^2N}.$$
The uncertainty in computing $G_{\mathrm{apr}}$ from the data, and thus, also the probability of the line $LS_{\mathrm{apr}}$ deviating from the line $LS_{\mathrm{opt}}$, is proportional to the width of the interval in (10), i.e.,
$$b_u(\xi)-b_l(\xi)=\frac{2\xi\big(\sqrt{\operatorname{var}[\bar X_1]}+\sqrt{\operatorname{var}[\bar X_2]}\big)}{\bar W_{22}-\bar W_{12}}\propto\frac{\sqrt{N_1}/a_1+\sqrt{N_2}/a_2}{\bar W_{22}-\bar W_{12}}.$$
Consequently, the numerator of (11) must be minimized, and the denominator maximized.
In order to minimize the numerator in (11), it is convenient to choose, $a_1=N_1^\nu$ and $a_2=N_2^\nu$, where $\nu\in\mathbb{R}^+$ is a constant to be optimized. The numerator is then proportional to $(N_1^{1/2-\nu}+N_2^{1/2-\nu})$, which is convex in $\nu$ and decreases for $\nu>1/2$. Hence, a necessary condition to reduce the approximation error is that $a_1>\sqrt{N_1}$ and $a_2>\sqrt{N_2}$. For numerical convenience, let $a_1=N_1$ and $a_2=N_2$. Then, the dependent and independent variables assumed in (9) become arithmetic averages, and the optimum index subsets have the cardinalities,
$$|I_1|=|I_2|=N/2,\quad N\ \text{even};\qquad|I_1|=|I_2|\pm 1=(N\pm 1)/2,\quad N\ \text{odd}.$$
In order to maximize the denominator in (11), assume that the independent variables $(\|w_1\|_2^2,\bar w_{12})$ in (9) are sorted by $w_{1i}$, i.e., let $w_{11}<w_{12}<\ldots<w_{1N}$. This ordering and the condition (12) suggest that the disjoint index sets $I_1$ and $I_2$ maximizing the difference, $\bar W_{22}-\bar W_{12}$, are,
$$I_1=\{1,2,\ldots,N/2\},\quad I_2=\{N/2+1,\ldots,N\}\qquad N\ \text{even},$$
$$I_1=\{1,2,\ldots,(N-1)/2\},\quad I_2=\{(N+1)/2,\ldots,N\}\qquad N\ \text{odd},$$
$$\text{or},\quad I_1=\{1,2,\ldots,(N+1)/2\},\quad I_2=\{(N+3)/2,\ldots,N\}.$$
Such a partitioning corresponds to splitting the data into two equal (N-even) or approximately equal (N-odd) sized subsets by the median (2-quantile) index point.
In summary, the approximate linear LS regression can be efficiently computed with good accuracy by splitting the data into multiple disjoint subsets, calculating the average data point in each of these subsets, and then solving the set of linear equations with the same (or a smaller) number of unknown parameters. The data splitting should exploit the ordering of the data points by one of the independent variables. It can be expected that the accuracy of the approximate LS regression improves with the number of data points $N$. A numerical evaluation of the approximate LS regression is considered in Section 5.
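A minimal Python sketch of this median-split procedure for the straight-line model (two subsets, each represented by its arithmetic-average data point, cf. (9)); the function name and test data are illustrative:

```python
import numpy as np

def approx_ls_line(t, X):
    """Approximate LS fit of X ~ P1 + P2*t by splitting the data at the
    median of the independent variable and averaging within each half."""
    order = np.argsort(t)
    t, X = t[order], X[order]
    half = len(t) // 2
    I1, I2 = slice(0, half), slice(half, None)
    # Average data point of each subset: (mean regressor row, mean response)
    A = np.array([[1.0, t[I1].mean()],
                  [1.0, t[I2].mean()]])
    b = np.array([X[I1].mean(), X[I2].mean()])
    return np.linalg.solve(A, b)     # [P1_hat, P2_hat]

rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 40)
X = 1.5 + 0.3 * t + 0.1 * rng.standard_normal(t.size)
print(approx_ls_line(t, X))          # approximately [1.5, 0.3]
```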
3.2. Generating Pairwise-Correlated Gaussian Processes
How to generate a single correlated Gaussian process is well established in the literature, and it has been described in Section 2.3. Moreover, the linear transformation to generate correlated Gaussian variables from uncorrelated Gaussian variables does not have to be square. A sufficient condition on the rank of linear transformation to obtain a positive definite covariance matrix is given by the following lemma.
Lemma 8.
The matrix, $\mathbf T^T\mathbf T$, is positive definite, provided that the matrix, $\mathbf T\in\mathbb{R}^{N_1\times N_2}$, has rank $N_2$.
Proof.
The rank $N_2$ of $\mathbf T$ implies that $\mathbf T$ consists of $N_2$ linearly independent columns and that $N_1\ge N_2$. The matrix $\mathbf T^T\mathbf T$ is positive definite, provided that $\mathbf U^T\mathbf T^T\mathbf T\mathbf U=\|\mathbf T\mathbf U\|_2^2>0$ for all $\mathbf U\ne\mathbf 0$, $\mathbf U\in\mathbb{R}^{N_2}$, where $\|\cdot\|_2$ denotes the Euclidean norm of a vector. Since the columns of $\mathbf T$ are linearly independent, $\|\mathbf T\mathbf U\|_2=0$, if and only if $\mathbf U=\mathbf 0$. □
Corollary 1.
The matrix, $\mathbf T\mathbf T^T$, is positive definite, provided that the matrix, $\mathbf T\in\mathbb{R}^{N_1\times N_2}$, has rank $N_1$.
Corollary 2.
A rank-$N_1$ linear transformation $\mathbf T$ of an uncorrelated Gaussian vector $\mathbf U\in\mathbb{R}^{N_2}$ generates $N_1\le N_2$ correlated Gaussian variables having the (positive definite) covariance matrix, $\mathbf T\mathbf T^T$.
Furthermore, it is often necessary to generate multiple mutually correlated Gaussian processes with given autocorrelation as well as cross-correlation.
Lemma 9.
The linear transformation,
$$\begin{bmatrix}\mathbf x_1\\ \mathbf x_2\end{bmatrix}=\begin{bmatrix}\mathbf T_1&\mathbf K\\ \mathbf 0&\mathbf T_2\end{bmatrix}\begin{bmatrix}\mathbf u_1\\ \mathbf u_2\end{bmatrix}$$
generates a pair of correlated Gaussian vectors $\mathbf x_1\in\mathbb{R}^{N_1}$ and $\mathbf x_2\in\mathbb{R}^{N_2}$ from uncorrelated zero-mean Gaussian vectors $\mathbf u_1,\mathbf u_2\in\mathbb{R}^N$ where $\mathbf 0$ denotes a zero matrix, and according to Corollary 2, it is necessary that, $\max(N_1,N_2)\le N$. The corresponding (auto-) correlation and cross-correlation matrices are,
$$E[\mathbf u_1\mathbf u_1^T]=\sigma_1^2\mathbf I,\qquad E[\mathbf u_2\mathbf u_2^T]=\sigma_2^2\mathbf I,\qquad E[\mathbf u_1\mathbf u_2^T]=\mathbf 0,$$
$$\mathbf C_{x_1}=E[\mathbf x_1\mathbf x_1^T]=\mathbf T_1\mathbf T_1^T+\mathbf K\mathbf K^T,\qquad\mathbf C_{x_2}=E[\mathbf x_2\mathbf x_2^T]=\mathbf T_2\mathbf T_2^T,\qquad\mathbf C_{x_1x_2}=E[\mathbf x_1\mathbf x_2^T]=\mathbf K\mathbf T_2^T=\mathbf C_{x_2x_1}^T$$
where $\mathbf I$ denotes an identity matrix.
Proof.
The proof is straightforward by substituting (13) into the definitions of $\mathbf C_{x_1}$, $\mathbf C_{x_2}$ and $\mathbf C_{x_1x_2}$. □
The following corollary details the procedure described in Lemma 9.
Corollary 3.
Given $\mathbf T_2$, calculate the (auto-) correlation matrix, $\mathbf C_{x_2}=\mathbf T_2\mathbf T_2^T$, or vice versa. Then, given the cross-correlation matrix, $\mathbf C_{x_1x_2}$, compute $\mathbf K=\mathbf C_{x_1x_2}\mathbf T_2\mathbf C_{x_2}^{-T}$. Finally, given $\mathbf T_1$, calculate the (auto-) correlation matrix, $\mathbf C_{x_1}=\mathbf T_1\mathbf T_1^T+\mathbf K\mathbf K^T$, or, obtain $\mathbf T_1$ by solving the matrix equation, $\mathbf T_1\mathbf T_1^T=\mathbf C_{x_1}-\mathbf K\mathbf K^T$. Note that the matrix equation, $\mathbf T\mathbf T^T=\mathbf C$, can be solved for $\mathbf T$ using the singular value decomposition, $\mathbf C=\mathbf U\boldsymbol\Lambda\mathbf U^T$ where $\mathbf U$ is a unitary matrix and $\boldsymbol\Lambda$ is a diagonal matrix of eigenvalues. Then, $\mathbf T=\mathbf U\sqrt{\boldsymbol\Lambda}$.
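The following Python sketch implements the procedure of Corollary 3 for square, full-rank factors (all names are illustrative). Instead of the intermediate formula for $\mathbf K$, it solves $\mathbf K\mathbf T_2^T=\mathbf C_{x_1x_2}$ directly, and it assumes that $\mathbf C_{x_1}-\mathbf K\mathbf K^T$ remains positive semidefinite:

```python
import numpy as np

def sqrt_psd(C):
    """A matrix square root T with T @ T.T == C, via eigendecomposition."""
    lam, U = np.linalg.eigh(C)
    return U * np.sqrt(np.clip(lam, 0.0, None))    # U @ diag(sqrt(lam))

def correlated_pair(C1, C2, C12, n_samples, seed=0):
    """Generate x1, x2 with covariances C1, C2 and cross-covariance C12."""
    rng = np.random.default_rng(seed)
    T2 = sqrt_psd(C2)
    K = C12 @ np.linalg.inv(T2).T        # solves K @ T2.T = C12
    T1 = sqrt_psd(C1 - K @ K.T)          # residual covariance of x1
    U1 = rng.standard_normal((C1.shape[0], n_samples))
    U2 = rng.standard_normal((C2.shape[0], n_samples))
    return T1 @ U1 + K @ U2, T2 @ U2     # x1, x2
```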
4. Polynomial Statistics and Sum-Moments for Vectors of Random Variables
The main objective of this section is to define a universal function to effectively measure the statistics of random vectors and random processes observed at multiple discrete time instances. The measure function should: (1) be universally applicable for an arbitrary number of random variables and random vectors; (2) be symmetric, so that all random variables are considered equally; (3) lead to mathematically tractable expressions; and (4) be convex to allow defining convex optimization problems.
Let $f:\mathbb{R}^N\mapsto\mathbb{R}$ denote such a mapping or transformation of $N$ random variables $X_i$ to a scalar random variable $Y$, i.e.,
$$Y=f(X_1,X_2,\ldots,X_N)=f(\mathbf x).$$
In order to satisfy the symmetry requirement, the random variables $X_i$ can first be combined as,
$$Y=f(X_1\circ X_2\circ\cdots\circ X_N)$$
using a binary commutative operator, $\circ$, such as addition or multiplication. In the case of addition, it is important that the function $f$ is nonlinear. The nonlinearity can also be used to limit the extreme values of the combined variables.
For a random process $x(t)$, the random variables are defined as, $X_i=x(t_i)$. Define a vector of discrete time instances, $\mathbf t=(t_1,\ldots,t_N)$, and assume the index notation, $x(\mathbf t)=\{X_1=x(t_1),\ldots,X_N=x(t_N)\}$. Then the mapping (14) can be rewritten as,
$$Y=f(x(\mathbf t))=F(\mathbf t)=F(t_1,t_2,\ldots,t_N).$$
The mean value is the most important statistical property of the random variable $Y$. In addition, if the process $x(t)$ is $K$-th order stationary, the dimension of the mean mapping for $N$ observations is reduced by one.
Lemma 10.
The mean of $Y=f(x(\mathbf t))$ for $N$ discrete time observations of a $K$-th order stationary random process has dimension $(N-1)$, i.e., if $N\le K$,
$$\bar Y=E[f(x(\mathbf t))]=C(t_2-t_1,t_3-t_1,\ldots,t_N-t_1)=C(\tau_1,\tau_2,\ldots,\tau_{N-1})=C(\boldsymbol\tau)$$
where $\tau_i=t_{i+1}-t_1$, $i=1,2,\ldots,N-1$.
Proof.
Assuming Lemma 1 and taking the first sample $X_1=x(t_1)$ as a reference, the joint probability density of the $N$ process observations becomes, $f_X(\mathbf x;0,t_2-t_1,\ldots,t_N-t_1)\equiv\tilde f_X(\mathbf x;t_2-t_1,\ldots,t_N-t_1)$, so the corresponding statistical moments have dimension $(N-1)$. □
In optimization problems, it is useful to consider the gradient of $\bar Y$, i.e.,
$$\nabla\bar Y=\Big[\frac{\partial\bar Y}{\partial\tau_1},\frac{\partial\bar Y}{\partial\tau_2},\ldots,\frac{\partial\bar Y}{\partial\tau_{N-1}}\Big]=\frac{\partial\bar Y}{\partial T}\Big[\frac{\partial T}{\partial\tau_1},\frac{\partial T}{\partial\tau_2},\ldots,\frac{\partial T}{\partial\tau_{N-1}}\Big]$$
where $T$ is an application dependent measure of the vector $\boldsymbol\tau$ such as the norms, $T=\|\boldsymbol\tau\|_1=|\tau_1|+\cdots+|\tau_{N-1}|$, or $T=\|\boldsymbol\tau\|_\infty=\max(\tau_1,\ldots,\tau_{N-1})$.
In general, assuming the Taylor series defined in Theorem 1, a multivariate function of a random vector $\mathbf x$ can be expanded about the mean $\bar{\mathbf x}=E[\mathbf x]$ as,
$$f(\bar{\mathbf x}+\mathbf h)\approx f(\bar{\mathbf x})+\sum_{l=1}^m\sum_{|\mathbf n|_1=l}\frac{\partial^{\mathbf n}f(\mathbf x)}{\mathbf n!}\mathbf h^{\mathbf n}.$$
Thus, the value of $f(\bar{\mathbf x}+\mathbf h)$ is a weighted sum of $\mathbf h^{\mathbf n}$ plus an offset $f(\bar{\mathbf x})$. More importantly, if the partial derivatives $\partial^{\mathbf n}f(\mathbf x)$ are replaced with coefficients $(l!\,p_l)$ which are independent of $\mathbf n$, the number of parameters in (15) is greatly reduced. Moreover, instead of precisely determining the values of $p_l\in\mathbb{R}$ to obtain the best possible approximation of the original function $f$, it is useful as well as sufficient to constrain the Taylor expansion (15) to the class of functions that are exactly constructed as,
$$f(\bar{\mathbf x}+\mathbf h)=f(\bar{\mathbf x})+\sum_{l=1}^m p_l\sum_{|\mathbf n|_1=l}\frac{l!}{\mathbf n!}\mathbf h^{\mathbf n}=\sum_{l=0}^m p_l\,(h_1+\cdots+h_N)^l$$
where $p_0=f(\bar{\mathbf x})$. The function expansion (16) represents an $m$-th degree polynomial in the variable $|\mathbf h|_1$. The coefficients $p_l$ of this polynomial can be set using Lemma 6, so that the polynomial is convex.
The key realization is that the polynomial functions (16) have all the desired properties specified at the beginning of this section.
Claim 1.
A multivariate function $f:\mathbb{R}^N\mapsto\mathbb{R}$ having the desirable properties to measure the statistics of a random vector $\mathbf x$ is the $m$-th degree polynomial,
$$Y=f(\mathbf x)=\sum_{l=0}^m p_l\Big(\sum_{i=1}^N(X_i-E[X_i])\Big)^l$$
where $p_0=f(\bar{\mathbf x})$, and the values of $m$ and of the coefficients $p_l\in\mathbb{R}$ are determined by the application requirements.
The polynomial function (17) can be used for any number of observations $N$. It is symmetric, so all observations are treated equally. Moreover, it is amenable to integration and to applying the expectation operator. Convexity can be achieved by Lemma 6.
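As a small numerical illustration of Claim 1 (with arbitrarily chosen coefficients $p_l$, an assumption for this sketch), the code below evaluates the polynomial statistic for i.i.d. Gaussian samples and compares its Monte Carlo mean with the exact value implied by the moments of the sum:

```python
import numpy as np

def poly_statistic(X, p, mean=None):
    """Y = sum_l p[l] * (sum_i (X_i - E[X_i]))**l for one realization X."""
    mean = np.zeros_like(X) if mean is None else mean
    z = np.sum(X - mean)                   # the single polynomial variable
    return sum(pl * z**l for l, pl in enumerate(p))

rng = np.random.default_rng(3)
p = [0.0, 0.0, 1.0, 0.0, 0.5]              # Y = z^2 + 0.5*z^4
N, trials = 8, 100_000
samples = rng.standard_normal((trials, N)) # i.i.d. N(0,1): z ~ N(0, N)
Y_bar = np.mean([poly_statistic(x, p) for x in samples])
print(Y_bar, N + 0.5 * 3 * N**2)           # E[z^2] = N, E[z^4] = 3*N^2
```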
4.1. Related Concepts
Assume the scalar function $f$ defined in (17) for a $K$-th order stationary random process $x(t)$. Define the auxiliary random variable,
$$Z(a)=\frac{1}{a}\sum_{i=1}^N(X_i-\bar X_i)$$
where $a$ is a normalization constant, $a\ne 0$. The expression (17) can then be rewritten as, $Y(a)=\sum_{l=0}^m p_lZ^l(a)$. For $a=1$, $Z(1)$ has a zero mean, and the variance, $\operatorname{var}[Z(1)]=\sum_{i,j}E[(X_i-\bar X_i)(X_j-\bar X_j)]$. For $a=N$, $Z(N)$ represents a sample mean, and its variance is equal to, $\operatorname{var}[Z(N)]=\operatorname{var}[Z(1)]/N^2$. For $a=\sqrt N$, the variance of $Z(\sqrt N)$ is normalized by the number of dimensions, i.e., $\operatorname{var}[Z(\sqrt N)]=\operatorname{var}[Z(1)]/N$.
For $p_l=s^l/l!$, in the limit of large $m$, Equation (17) gives,
$$Y=\lim_{m\to\infty}\sum_{l=0}^m\frac{s^l}{l!}(Z(a))^l=e^{sZ(a)}.$$
The mean, $E[Y]=E[e^{sZ(a)}]$, is the moment generating function of the random variable $Z(a)$.
In data processing, the sample mean is intended to be an estimate of the true population mean, i.e., $N\gg 1$ is required. Here, $Z(a)$ is calculated over a finite number of vector or process dimensions, so it is a random variable for $\forall a\in\mathbb{R}\setminus\{0\}$. The variable $Z(N)$ should then be referred to as an arithmetic average or a center of gravity of the random vector $\mathbf X$ in the Euclidean space $\mathbb{R}^N$, i.e.,
$$Z(N)\triangleq\bar X=\frac{1}{N}\sum_{i=1}^N X_i\in\mathbb{R}.$$
Note that (18) is not an $l_1$-norm, since the variables $X_i$ are not summed with absolute values.
If the random variables $X_i$ are independent, the distribution of $Z(a)$ is given by the convolution of their marginal distributions. For correlated observations, if the characteristic function, $\tilde f(s/a)=E[e^{j|\mathbf X|_1s/a}]$, of the sum $|\mathbf X|_1$ can be obtained, the distribution of $Z(a)=|\mathbf X|_1/a$ is computed as the inverse transform,
$$f_Z(Z(a))=\frac{1}{2\pi}\int_{-\infty}^{\infty}e^{-jsZ(a)}\tilde f\Big(\frac{s}{a}\Big)ds.$$
Many other properties involving the sums of random variables can be obtained. For instance, if the random variables $X_i$ are independent and have zero mean, then,
$$E[Z^m(1)]-E[(Z(1)-X_N)^m]=\begin{cases}E[X_N^m]&m=2,3\\[2pt] E[X_N^4]+6\sum_{i=1}^{N-1}E[X_N^2]E[X_i^2]&m=4\\[2pt] E[X_N^5]+10\sum_{i=1}^{N-1}\big(E[X_N^2]E[X_i^3]+E[X_N^3]E[X_i^2]\big)&m=5.\end{cases}$$
Considering Claim 1, an important statistic for a random vector can be defined as the $m$-th central sum-moment.
Definition 4.
The $m$-th central sum-moment of a random vector $\mathbf X\in\mathbb{R}^N$ is computed as,
$$\Sigma\mu_m(\mathbf X)=\mu_m(|\mathbf X|_1)=E\Big[\Big(\sum_{i=1}^N(X_i-\bar X_i)\Big)^m\Big],\quad m=1,2,\ldots$$
Lemma 11.
The second central sum-moment of a random vector is equal to the sum of all elements of its covariance matrix, i.e.,
$$\Sigma\mu_2(\mathbf X)=\mu_2(|\mathbf X-\bar{\mathbf X}|_1)=E\Big[\Big(\sum_{i=1}^N(X_i-\bar X_i)\Big)^2\Big]=\sum_{i,j=1}^N\operatorname{cov}[X_i,X_j]=|\mathbf C_x|_1.$$
Furthermore, the second central sum-moment is also equal to the variance of $|\mathbf X|_1$, i.e.,
$$\Sigma\mu_2(\mathbf X)=\operatorname{var}[|\mathbf X|_1]=\operatorname{var}\Big[\sum_{i=1}^N(X_i-\bar X_i)\Big].$$
Proof.
The first equality is shown by expanding the expectation, and substituting for the elements of the covariance matrix, $[\mathbf C_x]_{i,j}=\operatorname{cov}[X_i,X_j]$. The second expression follows by noting that $\sum_{i=1}^N(X_i-\bar X_i)$ has zero mean. □
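Lemma 11 is straightforward to confirm by simulation. Assuming an arbitrary covariance matrix, the empirical second central sum-moment should approach the sum of all entries of $\mathbf C_x$ (a sketch with illustrative parameter values):

```python
import numpy as np

rng = np.random.default_rng(4)
N, trials = 5, 200_000
A = rng.standard_normal((N, N))
C = A @ A.T                                # target covariance matrix
T = np.linalg.cholesky(C)
X = T @ rng.standard_normal((N, trials))   # zero-mean samples with cov(X) = C

sum_moment2 = np.mean(np.sum(X, axis=0)**2)
print(sum_moment2, C.sum())                # the two values should agree
```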
In the literature, there are other measures involving sums of random variables. For instance, in Mean-Field Theory, the model dimensionality is reduced by representing $N$-dimensional vectors by their center of gravity [20]. The central point of a vector is also used in the first order approximation of multivariate functions in [21], and in the overall model variance in [22].
In Measure Theory [23], the total variation (TV) of a real-valued univariate function, $x:(t_0,t_N)\mapsto\mathbb{R}$, is defined as the supremum over all possible partitions $P:t_0<t_1<\cdots<t_N$ of the interval $(t_0,t_N)$, i.e.,
$$TV(x)=\sup_P\sum_{i=0}^{N-1}|x(t_{i+1})-x(t_i)|.$$
The TV concept can be adopted for the observations $X_i=x(t_i)$ of a stationary random process $x(t)$ at discrete time instances, $\{t_i\}_{0:N}$. A mathematically tractable mean TV measure can be defined as,
$$\overline{TV}^2(x)=\sup_P E\Big[\sum_{i=0}^{N-1}|X_{i+1}-X_i|^2\Big]=\sup_P 2N\big(E[X_i^2]-\operatorname{cov}[X_{i+1},X_i]\big).$$
Jensen's inequality for a random vector assuming equal weights can be stated as [17],
$$E\Big[\Big|\frac{1}{N}\sum_{i=1}^N(X_i-\bar X_i)\Big|^m\Big]\le\frac{1}{N}\sum_{i=1}^N E[|X_i-\bar X_i|^m].$$
Alternatively, exchanging the expectation and summation, Jensen's inequality becomes,
$$\sum_{i=1}^N|E[X_i-\bar X_i]|^m\le E\Big[\sum_{i=1}^N|X_i-\bar X_i|^m\Big].$$
Furthermore, if the right-hand side of (19) is to be minimized and $m=2$, the inequality in (19) changes to an equality. In particular, consider the minimum mean square error (MMSE) estimation of a vector of random parameters $\mathbf P=\{P_i\}_{1:N}$ from the measurements $\mathbf X$. Denoting $\bar P_i(\mathbf X)=E[P_i|\mathbf X]$, conditioned on $\mathbf X$, the MMSE estimator $\hat{\mathbf P}(\mathbf X)$ minimizes [15],
$$\min_{\hat{\mathbf P}|\mathbf X}E\Big[\sum_{i=1}^N(\hat P_i(\mathbf X)-P_i)^2\Big]=\min_{\hat{\mathbf P}|\mathbf X}E\Big[\sum_{i=1}^N\big((\hat P_i(\mathbf X)-\bar P_i(\mathbf X))-(P_i-\bar P_i(\mathbf X))\big)^2\Big]$$
$$=\min_{\hat{\mathbf P}|\mathbf X}\Big[\sum_{i=1}^N(\hat P_i(\mathbf X)-\bar P_i(\mathbf X))^2\Big]+E\Big[\sum_{i=1}^N(P_i-\bar P_i(\mathbf X))^2\Big]$$
$$=\min_{\hat{\mathbf P}|\mathbf X}\sum_{i=1}^N\big(E[\hat P_i(\mathbf X)-P_i\,|\,\mathbf X]\big)^2=\min_{\hat{\mathbf P}|\mathbf X}\sum_{i=1}^N\big(\hat P_i(\mathbf X)-E[P_i|\mathbf X]\big)^2$$
where the expectations are over the conditional distribution $f_{\mathbf P|\mathbf X}$.
In signal processing, a length-$N$ moving average (MA) filter transforms the input sequence $X_i$ into an output sequence $Y_i$ by the discrete-time convolution $\circledast$, i.e.,
$$Y_i=\sum_{j=0}^{N-1}X_{i-j}=[\underbrace{1\ 1\ \ldots\ 1}_{N}]\circledast X_i.$$
The (auto-) correlations of the input and output sequences are related by Lemma 3, i.e.,
$$C_Y(i)=\mathbf 1_N\circledast\mathbf 1_N\circledast C_x(i)=\sum_{j=-N+1}^{N-1}(N-|j|)C_x(i-j).$$
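This covariance relation can be checked by discrete convolution. For a white input with $C_x(k)=\delta(k)$ (a simplifying assumption for the sketch), the output covariance is the triangular sequence $N-|j|$:

```python
import numpy as np

N = 4
h = np.ones(N)                       # length-N moving-average impulse response
# C_Y = h * h(-) * C_x; for white input C_x(k) = delta(k), so:
C_Y = np.convolve(h, h[::-1])        # triangular: N - |j| for |j| < N
print(C_Y)                           # [1. 2. 3. 4. 3. 2. 1.]
```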
Note that if the input process is stationary, then the input and output processes are jointly stationary.
4.2. Multiple Random Processes
The major complication with observing, evaluating, and processing multiple random processes is how to achieve their time alignment and amplitude normalization (scaling). Focusing here on the time alignment problem only, denote the discrete time observation instances of $L$ random processes as,
$$\mathbf t_l=\{t_{l1}<t_{l2}<\ldots<t_{lN_l}\},\quad l=1,2,\ldots,L.$$
Assume that the first time instance $t_{l1}$ of every process serves as a reference. Then, there are $(L-1)$ uncertainties in the time alignment of the $L$ processes, i.e.,
$$\Delta_l=(t_{l1}-t_{11})\in\mathbb{R},\quad l=2,3,\ldots,L.$$
The $(L-1)$ values $\Delta_l$ are unknown parameters which must be estimated. Note also that the difference,
$$\Delta_l-\Delta_k=t_{l1}-t_{k1}$$
represents an unknown time shift between the processes $x_l(t)$ and $x_k(t)$.
For any multivariate stationary distribution of the observations of two random processes, the corresponding cross-correlation normally attains a maximum for some time shift between these processes [12]. Hence, a usual strategy for aligning the observed sequences is to locate the maximum value of their cross-correlation. The time shifts $\Delta_l$, $l=1,2,\ldots,L$, are then estimated as,
$$\hat\Delta_l=\arg\max_{\Delta}C_{x_1x_l}(\Delta),\quad\Delta\in\{(t_{li}-t_{11})\}_{i=1,\ldots,N_l}.$$
Assuming the center values $\bar X_1$ and $\bar X_2$ in (18) as scalar representations of the vectors $\mathbf X_1$ and $\mathbf X_2$, their cross-covariance can be computed as,
$$\operatorname{cov}[|\mathbf X_1|_1,|\mathbf X_2|_1]=N^2\operatorname{cov}[\bar X_1,\bar X_2]=N^2E[(\bar X_1-E[\bar X_1])(\bar X_2-E[\bar X_2])].$$
The task now is how to generalize the pairwise cross-covariance (22) to the case of multiple random vectors having possibly different lengths. If all random vectors of interest are concatenated into one single vector, the m-th joint central sum-moment can be defined by utilizing Claim 1.
Definition 5.
The $m$-th central sum-moment for $L$ random processes with $N_l$ observations, $l=1,2,\ldots,L$, is computed as,
$$\Sigma\mu_m(\mathbf X_1,\ldots,\mathbf X_L)=\mu_m(|\mathbf X_1|_1+\ldots+|\mathbf X_L|_1)=E\Big[\Big(\sum_{l=1}^L\sum_{i=1}^{N_l}(X_{li}-\bar X_{li})\Big)^m\Big].$$
Lemma 12.
The second central sum-moment for $L$ random processes with $N_l$ observations, $l=1,2,\ldots,L$, is equal to the sum of all pairwise covariances, i.e.,
$$\Sigma\mu_2(\mathbf X_1,\ldots,\mathbf X_L)=\sum_{l,k=1}^L\sum_{i=1}^{N_l}\sum_{j=1}^{N_k}\operatorname{cov}[X_{li},X_{kj}]=\sum_{l,k=1}^L|\operatorname{cov}[\mathbf X_l,\mathbf X_k]|_1=\operatorname{var}[|\mathbf X_1|_1+\ldots+|\mathbf X_L|_1].$$
Proof. The expression can be obtained by expanding the sum and then applying the expectation. □
Many other properties of the central and noncentral sum-moments can be obtained. For example, assuming two equal-length vectors $\mathbf X_1$ and $\mathbf X_2$, it is straightforward to show that,
$$E[(|\mathbf X_1|_1+|\mathbf X_2|_1)^2]-E[|\mathbf X_1|_1^2+|\mathbf X_2|_1^2]=2\sum_{i,j=1}^N E[X_{1i}X_{2j}],$$
$$E[(|\mathbf X_1|_1+|\mathbf X_2|_1)^2]-E[\|\mathbf X_1-\mathbf X_2\|_2^2]=\sum_{\substack{i,j=1\\ i\ne j}}^N E[X_{1i}X_{1j}+X_{2i}X_{2j}]+2\sum_{i,j=1}^N E[X_{1i}X_{2j}]+2\sum_{i=1}^N E[X_{1i}X_{2i}].$$
5. Illustrative Examples
This section provides examples to quantitatively evaluate the results developed in the previous sections. In particular, the accuracy of approximate linear LS regression proposed in Section 3.1 is assessed to justify its lower computational complexity. The central sum-moments introduced in Section 4 are compared assuming correlated Gaussian processes. Finally, several signal processing problems involving the 1st order Markov process are investigated.
5.1. Linear Regression
Consider a classical one-dimensional linear LS regression problem with independent and identically normally distributed errors. The errors are also assumed to be independent of all other data model parameters. The data points are generated as,
$$X_i=\Delta\,i:\quad Y_i=P_2X_i+P_1+E_i,\quad i=1,2,\ldots,N$$
where $E_i$ are zero-mean, uncorrelated Gaussian samples having the equal variance $\sigma_e^2$, and $P_1$ and $P_2$ are unknown parameters to be estimated. This LS problem can be solved exactly using the expression (2), and substituting $w_{1i}=1$ and $w_{2i}=\Delta i$, $\forall i=1,2,\ldots,N$. Alternatively, to avoid inverting the $N\times N$ data matrix, the procedure devised in Section 3.1 suggests splitting the data into two equal-size subsets, computing the average data point for each subset, and then solving the corresponding set of two equations with two unknowns. Specifically, the set of two equations with the unknown parameters $\hat P_1$ and $\hat P_2$ is,
$$\frac{2}{N}\sum_{i=1}^{N/2}Y_i=\hat P_1+\hat P_2\frac{\Delta}{4}(2+N),\qquad\frac{2}{N}\sum_{i=N/2+1}^{N}Y_i=\hat P_1+\hat P_2\frac{\Delta}{4}(2+3N)$$
assuming $N$ is even, and using, $\sum_{i=1}^{N/2}\Delta i=\frac{\Delta}{8}N(2+N)$, and $\sum_{i=N/2+1}^{N}\Delta i=\frac{\Delta}{8}N(2+3N)$. Denoting $\bar Y_1=\frac{2}{N}\sum_{i=1}^{N/2}Y_i$ and $\bar Y_2=\frac{2}{N}\sum_{i=N/2+1}^{N}Y_i$, the closed-form solution of (23) is,
$$\hat P_1=\frac{1}{2N}\big[(2+3N)\bar Y_1-(2+N)\bar Y_2\big],\qquad\hat P_2=\frac{2}{\Delta N}(\bar Y_2-\bar Y_1).$$
As a numerical example, assume the true values $P_1=1.5$, $P_2=0.3$, $E[E_i]=0$, $\operatorname{var}[E_i]=1$, and $N=40$ data points. Figure 2 shows the intervals $(\bar T-\sqrt{\operatorname{var}[T]},\ \bar T+\sqrt{\operatorname{var}[T]})$ versus the subset size $1\le N_1\le N/2$ for the random variable $T$ defined as,
$$T=100\,\frac{S_{\mathrm{apr}}(N)-S_{\mathrm{opt}}(N)}{S_{\mathrm{opt}}(N)}$$
where $S_{\mathrm{apr}}(N)=\sum_{i=1}^N(Y_i-\hat P_1-\hat P_2X_i)^2$ and $S_{\mathrm{opt}}(N)=\sum_{i=1}^N(Y_i-P_1-P_2X_i)^2$ are the total MSEs. In the limit, $\lim_{N\to\infty}(S_{\mathrm{apr}}(N)-S_{\mathrm{opt}}(N))=0$, since a sufficiently large subset of the data is as good as the complete set of data. For finite $N$, it is likely that $S_{\mathrm{apr}}(N)>S_{\mathrm{opt}}(N)$, so the lower bounds in Figure 2 converge much faster to zero than the upper bounds.
5.2. Comparison of Central Moments
Assuming Lemma 11 and Equation (3), the second central sum-moment of the 2nd order Markov process of length $N$ is,
$$\Sigma\mu_2(\mathbf X)=\sum_{i,j=1}^N C_{2MP}(i-j)=\sigma^2\Big(N+2\sum_{i=1}^{N-1}i\,\big(1+(N-i)\alpha\big)e^{-\alpha(N-i)}\Big).$$
These moments are compared in Figure 3 for different values of the sequence length $N$, three values of the parameter $\alpha$, and assuming $\sigma^2=1$. It can be observed that the values of the second central sum-moment increase with $N$ and with the level of correlation, $e^{-\alpha}$.
Consider now the following three central moments of order $m=1,2,\ldots$, i.e.,
$$\bar S_{\mathrm{mnk}}(N)=\sum_{i=1}^N E[|NX_i|^m],\qquad\Sigma\mu_m(N)=E\Big[\Big(\sum_{i=1}^N X_i\Big)^m\Big],\qquad\Sigma\tilde\mu_m(N)=E\Big[\Big(\sum_{i=1}^N|X_i|\Big)^m\Big].$$
The moment, $\bar S_{\mathrm{mnk}}$, is the mean Minkowski distance; the scaling by $N$ is introduced to facilitate the comparison with the other two moments, i.e., the mean sum-moment $\Sigma\mu_m$, and the mean sum-moment $\Sigma\tilde\mu_m$ having the samples summed as the $l_1$-norm. More importantly, assuming a correlated Gaussian process $X_i=x(t_i)$, the central sum-moment can be readily obtained in a closed form whereas obtaining the closed-form expressions for the other two metrics may be mathematically intractable. In particular, by factoring the covariance matrix as, $\mathbf C_x=\mathbf T\mathbf T^T$, the correlated Gaussian vector can be expressed as, $\mathbf X=\mathbf T\mathbf U$. Then the sum of the elements, $|\mathbf X|_1=\mathbf 1^T\mathbf T\mathbf U$, and the $m$-th central sum-moment can be computed as,
$$\Sigma\mu_m(N)=E\big[|\mathbf 1^T\mathbf T\mathbf U|^m\big]=\|\mathbf 1^T\mathbf T\|_2^m\,E[|U|^m]=\|\mathbf 1^T\mathbf T\|_2^m\,\frac{2^{m/2}}{\sqrt\pi}\,\Gamma\Big(\frac{m+1}{2}\Big)$$
where $U$ is a zero-mean, unit-variance Gaussian random variable, and $\Gamma$ denotes the gamma function.
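The closed form above is easy to cross-check by simulation. A sketch for the 1st order Markov covariance with $\sigma^2=1$ (parameter values arbitrary):

```python
import numpy as np
from math import gamma, pi, sqrt

alpha, N, m = 0.2, 16, 4
k = np.arange(N)
C = np.exp(-alpha * np.abs(k[:, None] - k[None, :]))  # 1st order Markov, sigma^2 = 1
T = np.linalg.cholesky(C)
s = np.linalg.norm(np.ones(N) @ T)                    # ||1^T T||_2

closed_form = s**m * 2**(m / 2) / sqrt(pi) * gamma((m + 1) / 2)

rng = np.random.default_rng(5)
U = rng.standard_normal((N, 200_000))
mc = np.mean(np.abs(np.ones(N) @ (T @ U))**m)         # Monte Carlo estimate
print(closed_form, mc)                                # should agree closely
```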
Figure 4 shows all three moments as a function of the sequence length $N$ for three values of the parameter $\alpha$ assuming the 1st order Markov process. The vertical axis in Figure 4 is scaled by $1/N$ for convenience. Note that, for uncorrelated, i.e., independent Gaussian samples, the moments $\Sigma\mu_m$ and $\Sigma\tilde\mu_m$ are identical. More importantly, all three moments are strictly increasing with the number of samples $N$ and with the moment order $m$.
Finally, the second central sum-moment can be visualized in two dimensions. Consider $N=3$ observations of a zero-mean real-valued stationary random process, $X_i=x(t_i)$, $i=1,2,3$. Let $\Sigma\mu_2=E[(X_1+X_2+X_3)^2]=\sum_{i,j=1}^3E[X_iX_j]$. Assuming the 1st order Markov correlation model, $E[X_iX_j]\propto e^{-0.5|\tau|}$, Figure 5 shows the values of the central sum-moment $\Sigma\mu_2$ versus the sample distances $\tau_1=(t_2-t_1)$ and $\tau_2=(t_3-t_1)$. Several symmetries can be observed in Figure 5. In particular, the central sum-moment $\Sigma\mu_2$ is symmetric about the axis $\tau_1=\tau_2$ as well as about the axis $\tau_1=-\tau_2$. These symmetries are consequences of the following equalities:
$$\Sigma\mu_2(\tau_1,\tau_2)=\Sigma\mu_2(\tau_2,\tau_1),\qquad\Sigma\mu_2(\tau_1,\tau_2)=\Sigma\mu_2(-\tau_1,-\tau_2)\quad\forall\tau_1,\tau_2.$$
5.3. Signal Processing Problems for the 1st Order Markov Process
Consider the 1st order Markov process observed at the output of a length-$N$ MA filter. According to (21), the (auto-) covariance of the output process is,
$$C_{1MP+MA}(k;\alpha)=\sum_{j=-N+1}^{N-1}(N-|j|)\,\sigma^2e^{-\alpha|k-j|}.$$
Conjecture 1. The MA filtering of the 1st order Markov process generates nearly a 2nd order Markov process.
The parameter of the 2nd order Markov process approximating the combined (auto-) covariance (24) can be obtained using the LS regression fit, i.e.,
$$\hat\alpha=\arg\min_{\tilde\alpha}\sum_k\big(C_{1MP+MA}(k;\alpha)-C_{2MP}(k;\tilde\alpha)\big)^2.$$
Substituting (3) and (24) into (25), and setting the first derivative equal to zero, the LS estimate $\hat\alpha$ must satisfy the linear equation,
$$\sum_k\Big(\hat\alpha|k|+1+W_{-1}\Big({-}\frac{C_{1MP+MA}(k;\alpha)}{e}\Big)\Big)=0$$
which can be readily solved for $\hat\alpha$; here, $W_{-1}$ denotes the lower branch of the Lambert function [24].
A discrete time sequence of $N$ elements has the (auto-) covariance constrained to $(2N-1)$ time indexes as indicated in (24). Assuming the length-$N$ MA filter, and that there are $(n_xN)$ samples, $n_x=1,2,\ldots$, of the 1st order Markov process available, the (auto-) covariance (24) has the overall length of $2N(n_x+1)-3$ samples. Figure 6 compares the MSE,
$$\mathrm{MSE}(n_x)=100\times\frac{\sum_k\big(C_{1MP+MA}(k;\alpha)-C_{2MP}(k;\hat\alpha)\big)^2}{\sum_k C_{1MP+MA}^2(k;\alpha)}$$
of the LS fit of the (auto-) covariance of the 2nd order Markov process to the combined (auto-) covariance of the 1st order Markov process and the impulse response of the MA filter, assuming three values of $\alpha$ and two values of $n_x$. Given $\alpha$ and $n_x$, Figure 6 shows that the best LS fit occurs for a certain value of the MA filter length $N$. It can be concluded that, in general, the 1st order Markov process changes to the 2nd order Markov process at the output of the MA filter.
The second problem to investigate is the linear MMSE (LMMSE) prediction of the 1st order Markov process observed at the output of an MA filter. In particular, given $N$ samples $X_i=x(t_i)$, $i=1,2,\ldots,N$, of a random process having the (auto-) covariance (24), the task is to predict its future value, $X_{N+1}=x(t_{N+1})$, $t_{N+1}>t_N$.
In general, the impulse response $\mathbf h$ of the LMMSE filter to estimate an unknown scalar parameter $P$ from the measurements $\mathbf X$ is computed as [15],
$$\mathbf h=\big(E[(\mathbf X-\bar{\mathbf X})(\mathbf X-\bar{\mathbf X})^T]\big)^{-1}E[P\,\mathbf X].$$
Here, the unknown parameter $P=X_{N+1}$, and $E[X_{N+1}X_i]=C_{1MP+MA}(N+1-i)$ and $E[X_iX_j]=C_{1MP+MA}(i-j)$ in (26), which gives the length-$N$ LMMSE filter,
$$\mathbf h=\big[\underbrace{0\ 0\ \ldots\ 0}_{N-1}\ \ C_{1MP+MA}(1)/\sigma_X^2\big].$$
Consequently, the predicted value, $\hat X_{N+1}=X_N\,C_{1MP+MA}(1)/\sigma_X^2$. Note that the same procedure, but excluding the MA filter, gives the LMMSE estimate, $\hat X_{N+1}=X_N\,C_{1MP}(1)/\sigma_X^2$.
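A generic one-step LMMSE predictor following the formula (26) can be sketched as below (names illustrative); any valid autocovariance function can be supplied, and the 1st order Markov case recovers the expected single-tap predictor:

```python
import numpy as np

def lmmse_predictor(cov, N):
    """Coefficients h of the one-step LMMSE predictor X_{N+1} ~ h^T X.

    cov : function returning the (auto-)covariance C(k) at integer lag k.
    """
    i = np.arange(1, N + 1)
    C_XX = np.array([[cov(a - b) for b in i] for a in i])   # E[X_i X_j]
    c_XP = np.array([cov(N + 1 - a) for a in i])            # E[X_{N+1} X_i]
    return np.linalg.solve(C_XX, c_XP)

# 1st order Markov example: C(k) = sigma^2 * a^|k|  =>  h = [0, ..., 0, a]
a = 0.8
h = lmmse_predictor(lambda k: a ** abs(k), N=5)
print(np.round(h, 6))   # last coefficient a, all others ~ 0
```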
The last problem to consider is the time alignment of two zero-mean, jointly stationary processes. It is assumed that the normalized cross-covariance of these two processes is,
$$\frac{E[X_{1i}X_{2j}]}{\sqrt{E[X_{1i}^2]E[X_{2i}^2]}}=e^{-\alpha|i-j|}\big(1+\alpha|i-j|\big).$$
Denote the uncertainty in determining the difference, $(i-j)$, as $\Delta$. In order to estimate the unknown parameters $\alpha$ and $\Delta$, the left-hand side of (27) can be estimated by the method of moments, i.e., let $E[X_i^2]\approx\frac{1}{N}\sum_{i=1}^N X_i^2$, and, $E[X_{1i}X_{2j}]\approx\frac{1}{N}\sum_{i=1}^N X_{1i}X_{2|i-\Delta|}$. The cross-covariance (27) can then be rewritten as,
$$v_k=\frac{\sum_{i=1}^N X_{1i}X_{2|i-\Delta-k|}}{\sqrt{\sum_{i=1}^N X_{1i}^2\sum_{i=1}^N X_{2i}^2}}=e^{-\alpha|\Delta+k|}\big(1+\alpha|\Delta+k|\big),\quad k=0,1,2,\ldots$$
Utilizing the Lambert function $W_{-1}$, the cross-covariance can be rewritten further as,
$$\alpha|\Delta+k|=-1-W_{-1}\Big({-}\frac{v_k}{e}\Big)\triangleq\tilde v_k,\quad k=0,1,2,\ldots$$
Assuming, without loss of generality, that $\Delta\ge 0$, the absolute value in (28) can be ignored. Consequently, the unknown parameters $\alpha$ and $\Delta$ can be obtained as a linear LS fit to the $N$ measured and calculated values $\tilde v_k$ in the linear model (28).
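Assuming the normalized cross-covariances $v_k$ have already been measured, the final estimation step can be sketched as follows; scipy.special.lambertw provides the $W_{-1}$ branch, and $\alpha$ and $\Delta$ follow from an ordinary straight-line fit of $\tilde v_k=\alpha\Delta+\alpha k$:

```python
import numpy as np
from scipy.special import lambertw

def estimate_alpha_delta(v):
    """Fit v_k = exp(-alpha(Delta+k))(1 + alpha(Delta+k)), k = 0, 1, ...,
    assuming Delta >= 0, by linearizing with the Lambert W_{-1} branch."""
    k = np.arange(len(v))
    v_tilde = np.real(-1.0 - lambertw(-np.asarray(v) / np.e, k=-1))
    # v_tilde = alpha*Delta + alpha*k : an ordinary straight-line LS fit
    slope, intercept = np.polyfit(k, v_tilde, 1)
    return slope, intercept / slope          # alpha_hat, Delta_hat

# Synthetic check with alpha = 0.3, Delta = 2.0
alpha, Delta = 0.3, 2.0
k = np.arange(10)
v = np.exp(-alpha * (Delta + k)) * (1 + alpha * (Delta + k))
print(estimate_alpha_delta(v))               # approximately (0.3, 2.0)
```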
6. Conclusions
The development of a novel statistical measure to enable the correlation analysis of multiple random vectors commenced by summarizing background knowledge on the statistical description of discrete time random processes. This was then extended with the derivation of several supporting results which were used in the subsequent sections. Specifically, it was shown that linear regression can be effectively approximated by splitting the data into disjoint subsets and assuming only one average data point within each subset. In addition, a procedure for generating multiple Gaussian processes with the prescribed autocovariance and cross-covariance was devised. The main result of the paper was obtained by assuming the Taylor expansion of multivariate symmetric scalar functions, and then approximating the Taylor expansion by a univariate polynomial. The single polynomial variable is a simple sum of the variables of the original multivariate function. The polynomial approximation represents a mapping from multiple discrete time observations of a random process to a multidimensional scalar field. The mean field value is a weighted sum of canonical central moments with increasing orders. These moments were named central sum-moments to reflect how they are defined. The sum-moments were then discussed in light of other similar concepts such as the total variation, Mean-Field Theory, and moving average sequence filtering. Illustrative examples were studied in the last section of the paper. In particular, the accuracy of the approximate linear regression was evaluated quantitatively assuming two disjoint data subsets. Assuming the 1st and the 2nd order Markov processes, the central sum-moments were compared with the mean Minkowski distance. For Gaussian processes, the central sum-moments can be obtained in closed form. The remaining problems investigated the moving average filtering of the 1st order Markov process and its prediction using a linear MMSE filter.
Figure 1. The exact ($LS_{\mathrm{opt}}$) and the reduced complexity ($LS_{\mathrm{apr}}$) linear LS regression.
Figure 2. The relative total mean-square error of the approximate linear LS regression.
Figure 3. The second central sum-moments of the 2nd order Markov process with parameter $\alpha$ and length $N$.
Figure 4. The Minkowski (blue), sum-moment (black), and sum-moment of absolute values (red) mean statistics for the 1st order Markov sequence of length $N$. Columns: different values of $\alpha$. Rows: different values of $m$.
Figure 5. The second central sum-moment as a function of the time differences between $N=3$ observations of a stationary random process.
Figure 6. The MSE of approximating the (auto-) covariance of the 1st order Markov process at the output of a length-$N$ MA filter by the (auto-) covariance of the 2nd order Markov process.
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Not applicable.
Conflicts of Interest
The author declares no conflict of interest.
Abbreviations
The following abbreviations are used in the paper:
| 1MP | 1st order Markov process |
| 2MP | 2nd order Markov process |
| AR | autoregressive |
| LMMSE | linear minimum mean square error |
| LS | least squares |
| MMSE | minimum mean square error |
| MSE | mean square error |
| MA | moving average |
| TV | total variation |
| $\lvert\cdot\rvert$ | absolute value, set cardinality, sum of vector elements |
| $E[\cdot]$ | expectation |
| $\operatorname{corr}[\cdot]$ | correlation |
| $\operatorname{cov}[\cdot]$ | covariance |
| $\operatorname{var}[\cdot]$ | variance |
| $(\cdot)^{-1}$ | matrix inverse |
| $(\cdot)^T$ | matrix/vector transpose |
| $f_x$ | distribution of a random variable $X$ |
| $f,\dot f,\ddot f$ | function $f$, and its first and second derivatives |
| $\mathbb{N}^+$ | positive non-zero integers |
| $\mathbb{R},\mathbb{R}^+$ | real numbers, positive real numbers |
| $\bar X$ | mean value of random variable $X$ |
| $X_{ij}$ | $j$-th sample of process $i$ |
| $W_{-1}$ | Lambert function |
1. Drezner, Z. Multirelation-A correlation among more than two variables. Comput. Stat. Data Anal. 1995, 19, 283-292.
2. Dear, R.; Drezner, Z. On the significance level of the multirelation coefficient. J. Appl. Math. Decis. Sci. 1997, 1, 119-130.
3. Geiß, S.; Einax, J. Multivariate correlation analysis-A method for the analysis of multidimensional time series in environmental studies. Chemom. Intell. Lab. Syst. 1996, 32, 57-65.
4. Abdi, H. Chapter Multiple correlation coefficient. In Encyclopedia of Measurements and Statistics; SAGE: Los Angeles, CA, USA, 2007; pp. 648-655.
5. Székely, G.J.; Rizzo, M.L.; Bakirov, N.K. Measuring and testing dependence by correlation of distances. Ann. Stat. 2007, 35, 2769-2794.
6. Merigó, J.M.; Casanovas, M. A New Minkowski Distance Based on Induced Aggregation Operators. Int. J. Comput. Intell. Syst. 2011, 4, 123-133.
7. Nguyen, H.V.; Müller, E.; Vreeken, J.; Efros, P.; Böhm, K. Multivariate maximal correlation analysis. In Proceedings of the International Conference on Machine Learning, Beijing, China, 21-26 June 2014; pp. II-775-II-783.
8. Josse, J. Measuring multivariate association and beyond. Stat. Surv. 2016, 10, 132-167.
9. Shu, X.; Zhang, Q.; Shi, J.; Qi, Y. A comparative study on weighted central moment and its application in 2D shape retrieval. Information 2016, 7, 10.
10. Böttcher, B.; Keller-Ressel, M.; Schilling, R.L. Distance multivariance: New dependence measures for random vectors. Ann. Stat. 2019, 47, 2757-2789.
11. Wang, J.; Zheng, N. Measures of correlation for multiple variables. arXiv 2020, arXiv:1401.4827.
12. Gardner, W.A. Introduction to Random Processes With Applications, 2nd ed.; McGraw-Hill: New York, NY, USA, 1990.
13. Papoulis, A.; Pillai, S.U. Probability, Random Variables, and Stochastic Processes, 4th ed.; McGraw-Hill: New York, NY, USA, 2002.
14. Giller, G.L. The Statistical Properties of Random Bitstreams and the Sampling Distribution of Cosine Similarity. SSRN Preprint 2012.
15. Kay, S.M. Fundamentals of Statistical Signal Processing: Estimation Theory; Prentice Hall: Upper Saddle River, NJ, USA, 1993; Volume I.
16. Oppenheim, A.V.; Schafer, R.W.; Buck, J.R. Discrete-Time Signal Processing, 3rd ed.; Prentice Hall: Upper Saddle River, NJ, USA, 2009.
17. Boyd, S.; Vandenberghe, L. Convex Optimization; Cambridge University Press: Cambridge, UK, 2004.
18. Folland, G.B. Higher-Order Derivatives and Taylor's Formula in Several Variables. Available online: https://sites.math.washington.edu/~folland/Math425/taylor2.pdf (accessed on 1 December 2020).
19. Apostol, T. Mathematical Analysis, 2nd ed.; Pearson: London, UK, 1973.
20. Yedidia, J.S. chapter An idiosyncratic journey beyond mean field theory. In Advanced Mean Field Methods; MIT Press: Cambridge, MA, USA, 2001; pp. 21-36.
21. Sobol, I.M. Global sensitivity indices for nonlinear mathematical models and their Monte Carlo estimates. Math. Comput. Simul. 2001, 55, 271-280.
22. Saltelli, A.; Ratto, M.; Tarantola, S.; Campolongo, F. Sensitivity analysis for chemical models. Chem. Rev. 2005, 105, 2811-2828.
23. Shirali, S. A Concise Introduction to Measure Theory; Springer: Berlin/Heidelberg, Germany, 2018.
24. Abramowitz, M.; Stegun, I.A. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables; Dover: Mineola, NY, USA, 1974.
Pavel Loskot
Zhejiang University/University of Illinois at Urbana-Champaign Institute, Haining 314400, China
Abstract
The paper investigates the problem of performing a correlation analysis when the number of observations is large. In such a case, it is often necessary to combine random observations to achieve dimensionality reduction of the problem. A novel class of statistical measures is obtained by approximating the Taylor expansion of a general multivariate scalar symmetric function by a univariate polynomial in the variable given as a simple sum of the original random variables. The mean value of the polynomial is then a weighted sum of statistical central sum-moments with the weights being application dependent. Computing the sum-moments is computationally efficient and amenable to mathematical analysis, provided that the distribution of the sum of random variables can be obtained. Among several auxiliary results also obtained, the first order sum-moments corresponding to sample means are used to reduce the numerical complexity of linear regression by partitioning the data into disjoint subsets. Illustrative examples provided assume the first and the second order Markov processes.




