1. Introduction
The interest in developing and improving statistical methods and models is driven by the ever increasing volume and variety of data. Making sense of data often requires uncovering the patterns and relationships the data contain. One of the most commonly used universal tools for data exploration is the evaluation of cross-correlation among different data sets and of autocorrelation of a data sequence with itself. The correlation values can be utilized to carry out sensitivity analysis, to forecast future values, to visualize complex relationships among subsystems, and to evaluate other important spatio-temporal statistical properties. However, it is well established that correlation does not imply causation. Correlations measure linear dependencies between pairs of random variables, which is implicitly exploited in linear regression. The correlation values rely on estimated or empirical statistical moments such as the sample mean and sample variance. If the mean values are removed from the data, the correlations are referred to as covariances. The pairwise correlations can be represented as a fully connected graph with edges parameterized by time shifts and possibly by amplitude adjustments between the corresponding data sequences. However, the graph may become excessively complex when the number of random variables considered is large, for example, when analyzing multiple long sequences of random observations.
The problem of extending the notion of correlations and covariances to more than two random variables has been considered previously in the literature. In particular, the multirelation has been defined as a measure of linearity for multiple random variables in [1]. This measure is based on a geometric analysis of linear regression. Under the assumption of a multivariate Student-t distribution, the statistical significance of the multirelation coefficient is defined in terms of the eigenvalues of the correlation matrix in [2]. A univariate correlation measure for multiple random variables is defined in [3] as the sum of the elements on the main diagonal of the covariance matrix. Linear regression is again utilized in [4] to define the multiple correlation coefficient. It is derived from the coefficient of determination of linear regression as the proportion of the variance in the dependent variable which is predictable from the independent variables.
The distance correlation measure between two random vectors, based on hypothesis testing of the independence of random variables, is proposed in [5]. Different operations on random variables, including ordering, weighting, and a nonlinear transformation, are assumed in [6] to define a family of Minkowski distance measures for random vectors. A maximal correlation measure for random vectors is introduced in [7]. It generalizes the concept of the maximal information coefficient, while nonlinear transformations are also used to allow the assessment of nonlinear correlations. Similarly to [3], the sample cross-correlation between multiple realizations of two random vectors is shown in [8] to be proportional to the sum of the diagonal elements of the product of the corresponding correlation matrices. Reference [9] investigates two-dimensional general and central moments for random matrices which are invariant to similarity transformations. Two new distance measures for random vectors are defined in [10] assuming the joint characteristic function. These measures can also be used to test for statistical independence of random vectors. The most recent paper [11] derives a linear correlation measure for multiple random variables using determinants of the correlation submatrices.
This brief literature survey indicates that there are still no commonly accepted correlation measures for multiple random variables. The measures considered in the literature are either rigorously derived but mathematically rather complicated, or they are modifications of existing, simpler, but well understood measures. In this paper, it is shown that by constraining the complexity of the multivariate Taylor series to reduce the number of its parameters or degrees of freedom, the Taylor series can be rewritten as a finite-degree univariate polynomial. The independent variable of this polynomial is a simple sum of the random variables considered. The polynomial coefficients are real-valued constants, and they are application dependent. The polynomial defines a many-to-one transformation of multiple random variables to another scalar random variable. The mean value of the polynomial represents a broad class of polynomial measures which can be used for any number of random variables. The mean value of each polynomial element corresponds to a general or central moment of the sum of random variables. Therefore, these moments are referred to here as sum-moments. In the case of multiple random vectors, similarly to computing the correlation or covariance matrix by first concatenating the vectors into one long vector, the sum-moments can be readily defined and computed for such a concatenated random vector. The main advantages of assuming sum-moments to study the statistical properties of multiple random variables or multiple random vectors are the clarity in understanding their statistical significance, and the mathematical simplicity of their definition. Moreover, as long as the distribution of the sum of random variables can be found, a closed-form expression for the sum-moments can be obtained.
Before introducing the polynomial representations of random vectors in Section 4 together with the central and general sum-moments, a number of auxiliary results are presented in Section 2 and Section 3. In particular, Section 2 summarizes the key results and concepts from the literature concerning stationary random processes and the estimation of their parameters via linear regression and the method of moments, and describes how to generate the 1st and the 2nd order Markov processes. Section 3 extends the results from Section 2 by deriving additional results which are used in Section 4 and in Section 5, such as a low complexity approximation of linear regression and a procedure to generate multiple Gaussian processes with defined autocorrelation and cross-correlation. The main results of the paper are obtained in Section 4, including the definition of a class of polynomial statistical measures and sum-moments for multiple random variables and random processes. Other related concepts involving sums of random variables are also reviewed. Section 5 provides several examples to illustrate and evaluate the results obtained in the paper. The paper is concluded in Section 6.
2. Background
This section reviews key concepts and results from the literature which are used to develop new results in the subsequent sections. Specifically, the following concepts are briefly summarized: stationarity of random processes, estimation of general and central moments and of correlation and covariance using the method of moments, the definitions of cosine similarity and Minkowski distance, parameter estimation via linear regression, generation of the 1st and the 2nd order Markov processes, and selected properties of polynomials and multivariate functions. Note that the more straightforward proofs for some lemmas are only indicated and not fully elaborated.
2.1. Random Processes
Consider a real-valued one-dimensional random process $x(t)\in\mathbb{R}$ over a continuous time $t\in\mathbb{R}$. The process is observed at $N$ discrete time instances $t_1<t_2<\ldots<t_N$ corresponding to $N$ random variables $X_i=x(t_i)$, $i=1,\ldots,N$. The random variables $\mathbf{X}=\{X_1,\ldots,X_N\}\in\mathbb{R}^N$ are completely statistically described by their joint density function $f_x(\mathbf{X})$, so that $f_x(\mathbf{X})\ge 0$ and $\int_{\mathbb{R}^N}f_x(\mathbf{X})\,d\mathbf{X}=1$. The process $x(t)$ is further assumed to be $K$-th order stationary [12].
Definition 1.
A random process $x(t)$ is $K$-th order stationary, if,
$$f_x(X_1,\ldots,X_K;t_1,\ldots,t_K)=f_x(X_1,\ldots,X_K;t_1+t_0,\ldots,t_K+t_0)\quad\forall t_0\in\mathbb{R}.$$
Lemma 1.
The $K$-th order stationary process is also $(K-k)$-th order stationary, $k=1,2,\ldots,K-1$, for any subset $\{X_{i_1},\ldots,X_{i_{K-k}}\}\subseteq\{X_1,\ldots,X_N\}$.
Proof. The unwanted random variables can be integrated out from the joint density. □
Unless otherwise stated, in the sequel, all observations of random processes are assumed to be stationary.
The expectation, $E[X]=\int_{\mathbb{R}}x f_x(x)\,dx$, of a random variable $X$ is a measure of its mean value. A linear correlation between two random variables $x(t_1)$ and $x(t_2)$ is defined as [12],
$$R_x(t_1,t_2)=\operatorname{corr}[x(t_1)x(t_2)]=R_x(t_2-t_1)=R_x(\tau).$$
The (auto-) covariance measures a linear dependency between two zero-mean random variables, i.e.,
$$C_x(t_1,t_2)=\operatorname{cov}[x(t_1),x(t_2)]=E[(x(t_1)-E[x(t_1)])(x(t_2)-E[x(t_2)])]=C_x(t_2-t_1).$$
It can be shown that the maximum of $C_x(\tau)$, $\tau=t_2-t_1$, occurs for $\tau=0$, corresponding to the variance of a stationary process $x(t)$, i.e.,
$$C_x(0)=\operatorname{var}[x(t)]=E[(x(t)-E[x(t)])^2]\quad\forall t\in\mathbb{R}.$$
The covariance $C_x(\tau)$ can be normalized, so that $-1\le C_x(\tau)/C_x(0)\le 1$. Furthermore, for real-valued regular processes, the covariance has an even symmetry, i.e., $C_x(\tau)=C_x(-\tau)$ [12].
For $N>2$, it is convenient to define a random vector, $\mathbf{X}=[X_1,\ldots,X_N]^T$, where $(\cdot)^T$ denotes the transpose, and the corresponding mean vector, $\bar{\mathbf{X}}=E[\mathbf{X}]$. Then, the covariance matrix,
$$\mathbf{C}_x=E[(\mathbf{X}-\bar{\mathbf{X}})(\mathbf{X}-\bar{\mathbf{X}})^T]\in\mathbb{R}^{N\times N}$$
has as its elements the pairwise covariances, $[\mathbf{C}_x]_{ij}=C_x(t_j-t_i)$, $i,j=1,2,\ldots,N$.
Assume now the case of two $K$-th order stationary random processes $x_1(t)$ and $x_2(t)$, and the discrete time observations,
$$X_{1i}=x_1(t_{1i}),\quad i=1,2,\ldots,N_1,\qquad X_{2i}=x_2(t_{2i}),\quad i=1,2,\ldots,N_2.$$
Using the set notation, $\{X_i\}_{1:K}=\{X_1,X_2,\ldots,X_K\}$, jointly or mutually stationary processes imply the time-shift invariance of their joint density function.
Definition 2.
Random processes $x_1(t)$ and $x_2(t)$ are $K$-th order jointly stationary, if,
$$f_x(\{X_{1i}\}_{1:K},\{X_{2i}\}_{1:K};\{t_{1i}\}_{1:K},\{t_{2i}\}_{1:K})=f_x(\{X_{1i}\}_{1:K},\{X_{2i}\}_{1:K};\{t_{1i}+t_0\}_{1:K},\{t_{2i}+t_0\}_{1:K})\quad\forall t_0\in\mathbb{R}$$
is satisfied for all subsets, $\{X_{1i}\}_{1:K}\subseteq\{X_{1i}\}_{1:N_1}$, $K\le N_1$, and $\{X_{2i}\}_{1:K}\subseteq\{X_{2i}\}_{1:N_2}$, $K\le N_2$.
Lemma 2. The K-th order joint stationarity implies the joint stationarity of all orders smaller than K.
Proof. The claim follows from marginalization of the joint density function to remove unwanted variables. □
The cross-covariance of random variables $X_1=x_1(t_1)$ and $X_2=x_2(t_2)$, being discrete time observations of the jointly stationary random processes $x_1(t)$ and $x_2(t)$, respectively, is defined as,
$$C_{x_1x_2}(t_1,t_2)=\operatorname{cov}[X_1,X_2]=E[(x_1(t_1)-E[x_1(t_1)])(x_2(t_2)-E[x_2(t_2)])]=C_{x_1x_2}(t_2-t_1).$$
The cross-covariance can again be normalized, so it is bounded as,
$$-1\le C_{x_1x_2}(\tau)\Big/\sqrt{\operatorname{var}[x_1(t)]\operatorname{var}[x_2(t)]}\le 1.$$
Note that, unlike for the autocovariance, the maximum of $C_{x_1x_2}(\tau)$ can occur for any value of the argument $\tau$.
The covariance matrix for the random vectors $\mathbf{X}_1=[X_{11},\ldots,X_{1N_1}]^T$ and $\mathbf{X}_2=[X_{21},\ldots,X_{2N_2}]^T$ having the means, $\bar{\mathbf{X}}_1=E[\mathbf{X}_1]$ and $\bar{\mathbf{X}}_2=E[\mathbf{X}_2]$, respectively, is computed as,
$$\mathbf{C}_{x_1x_2}=E[(\mathbf{X}_1-\bar{\mathbf{X}}_1)(\mathbf{X}_2-\bar{\mathbf{X}}_2)^T]\in\mathbb{R}^{N_1\times N_2}.$$
Its elements are the covariances, $[\mathbf{C}_{x_1x_2}]_{ij}=C_{x_1x_2}(t_{1i}-t_{2j})$, $i=1,\ldots,N_1$, $j=1,\ldots,N_2$.
In addition to the first order (mean value) and the second order (covariance) statistical moments, the higher order statistics of a random variable $X$ are given by the general and the central moments defined, respectively, as [13],
$$g_m(X)=E[|X|^m]\quad\text{and}\quad\mu_m(X)=E[|X-E[X]|^m],\quad m=1,2,\ldots$$
where $|X|$ denotes the absolute value of the scalar variable $X$. The positive integer-valued moments (1) facilitate mathematically tractable integration, and prevent producing complex numbers if the argument is negative. Note also that the absolute value in (1) is necessary if $X$ is complex-valued, or if $m$ is odd, in order to make the moments real-valued and convex. The central moment can be normalized by the variance as,
$$\mu_m(X)=E[|X-E[X]|^m]\big/E[(X-E[X])^2]^{m/2}.$$
The cosine similarity between two equal-length vectors, $\mathbf{X}_1=[X_{11},\ldots,X_{1N}]^T$ and $\mathbf{X}_2=[X_{21},\ldots,X_{2N}]^T$, is defined as [14],
$$S_{\cos}(\mathbf{X}_1,\mathbf{X}_2)=\frac{\sum_{i=1}^N X_{1i}X_{2i}}{\sqrt{\sum_{i=1}^N X_{1i}^2}\sqrt{\sum_{i=1}^N X_{2i}^2}}.$$
The Minkowski distance between two equal-length vectors $\mathbf{X}_1$ and $\mathbf{X}_2$ is defined as the $l_m$-norm [6], i.e.,
$$S_{\mathrm{mnk}}(\mathbf{X}_1,\mathbf{X}_2)=\|\mathbf{X}_1-\mathbf{X}_2\|_m=\Big(\sum_{i=1}^N|X_{1i}-X_{2i}|^m\Big)^{1/m},\quad m=1,2,\ldots$$
2.2. Estimation Methods
Statistical moments can be empirically estimated from measured data using sample moments. Such an inference strategy is referred to as the method of moments [15]. In particular, under the ergodicity assumption, a natural estimator of the mean value of a random variable $X$ from its measurements $X_i$ is the sample moment,
$$\bar X=E[X]\approx\frac{1}{N}\sum_{i=1}^N X_i.$$
The sample mean estimator is unbiased and consistent [15]. More generally, the sample mean estimator of the first moment of a transformed random variable $h(X)$ is,
$$E[h(X)]\approx\frac{1}{N-d}\sum_{i=1}^N h(X_i)$$
where $d$ is the number of degrees of freedom used in the transformation $h$, i.e., the number of other parameters which must be estimated. For example, the variance of $X$ is estimated as,
$$\operatorname{var}[X]=E[|X-\bar X|^2]\approx\frac{1}{N-1}\sum_{i=1}^N(X_i-\hat{\bar X})^2$$
where $\hat{\bar X}$ is the estimate of the mean value $\bar X$ of $X$.
Assuming random sequences $\{X_{1i}\}_{1:N_1}$ and $\{X_{2i}\}_{1:N_2}$, their autocovariance and cross-covariance, respectively, are estimated as [15],
$$C_x(k)=E[X_iX_{i+k}]\approx\frac{1}{N-k}\sum_{i=1}^{N-k}X_iX_{i+k},\quad k\ll N,$$
$$C_{x_1x_2}(k)=E[X_{1i}X_{2(i+k)}]\approx\frac{1}{N-k}\sum_{i=1}^{N-k}X_{1i}X_{2(i+k)},\quad k\ll N=\min(N_1,N_2).$$
Since these estimators are consistent, the condition $k\ll N$ is necessary to combine a sufficient number of samples and achieve an acceptable estimation accuracy.
The parameters of a random process can be estimated by fitting a suitable data model to the measurements $X_i$. Denote such a data model as, $E[X_i]=M_i(\mathbf{P})$, $i=1,2,\ldots,N$. Assuming the least-squares (LS) criterion, the vector of parameters, $\mathbf{P}=[P_1,\ldots,P_D]^T\in\mathbb{R}^D$, is estimated as,
$$\hat{\mathbf{P}}=\arg\min_{\mathbf{P}}\sum_{i=1}^N(X_i-M_i(\mathbf{P}))^2.$$
For continuous parameters, the minimum is obtained using the derivatives, i.e., let $\frac{d}{d\mathbf{P}}M_i(\mathbf{P})=\dot M_i(\mathbf{P})$, and,
$$\frac{d}{d\mathbf{P}}\sum_{i=1}^N(X_i-M_i(\mathbf{P}))^2\stackrel{!}{=}0\iff\sum_{i=1}^N\dot M_i(\hat{\mathbf{P}})M_i(\hat{\mathbf{P}})=\sum_{i=1}^N\dot M_i(\hat{\mathbf{P}})X_i.$$
For a linear data model, $M_i(\mathbf{P})=\mathbf{w}_i^T\mathbf{P}$ where $\mathbf{w}_i=[w_{1i},\ldots,w_{Di}]^T$ are known coefficients, the LS estimate can be obtained in the closed form, i.e.,
$$\hat{\mathbf{P}}=\Big(\sum_{i=1}^N\mathbf{w}_i\mathbf{w}_i^T\Big)^{-1}\sum_{i=1}^N\mathbf{w}_iX_i$$
where $(\cdot)^{-1}$ denotes the matrix inverse.
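For concreteness, the closed-form solution above can be implemented in a few lines. The following Python sketch (function and variable names are illustrative, not from the paper) accumulates the normal equations and solves them for a straight-line model:

```python
import numpy as np

def ls_fit(W, X):
    """Closed-form LS estimate: P_hat = (sum_i w_i w_i^T)^{-1} sum_i w_i X_i.

    W : (N, D) array whose rows are the known coefficient vectors w_i.
    X : (N,) array of measurements.
    """
    A = W.T @ W          # sum_i w_i w_i^T  (D x D)
    b = W.T @ X          # sum_i w_i X_i    (D,)
    return np.linalg.solve(A, b)

# Straight-line model M_i(P) = P_1 + P_2 * t_i as a sanity check
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
W = np.column_stack([np.ones_like(t), t])
X = 1.5 + 0.3 * t + 0.1 * rng.standard_normal(t.size)
print(ls_fit(W, X))      # approximately [1.5, 0.3]
```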
2.3. Generating Random Processes
The task is to generate a discrete time stationary random process with a given probability density and a given autocovariance. The usual strategy is to generate a correlated Gaussian process followed by a nonlinear memoryless transformation. For instance, the autoregressive (AR) process described by the second order difference equation with constant coefficients $a_1$, $a_2$, and $b>0$ [12], i.e.,
$$x(n)+a_1x(n-1)+a_2x(n-2)=b\,u(n)$$
generates the 2nd order Markov process from a zero-mean white (i.e., uncorrelated) process $u(n)$. For $a_1=-2\sqrt{\frac{1-a}{1+a}}$ and $a_2=\frac{1-a}{1+a}$, $0<a<1$, this process has the autocovariance,
$$C_{2MP}(k)=\underbrace{\frac{b^2(1+a)^2}{4a^3}}_{\sigma^2}\underbrace{\left(\frac{1-a}{1+a}\right)^{|k|/2}}_{\doteq(1-a)^{|k|}}(1+a|k|)\doteq\sigma^2(1-a)^{|k|}(1+a|k|)=\sigma^2e^{-|k|(-\ln(1-a))}(1+a|k|).$$
On the other hand, for $a_1=-a$, $0<a<1$, and $a_2=0$, the AR process,
$$x(n)-a\,x(n-1)=b\,u(n)$$
generates the 1st order Markov process with the autocovariance,
$$C_{1MP}(k)=\underbrace{\frac{b^2}{1-a^2}}_{\sigma^2}a^{|k|}=\sigma^2e^{-\alpha|k|},\quad a=e^{-\alpha}.$$
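Both Markov processes are easy to simulate. A minimal Python sketch for the 1st order process is given below; the recursion is written as $x(n)=a\,x(n-1)+b\,u(n)$ so that the autocovariance $\sigma^2a^{|k|}$ above is reproduced, and the first sample is drawn from the stationary distribution:

```python
import numpy as np

def markov1(N, a, b=1.0, seed=0):
    """Generate N samples of the 1st order Markov (AR(1)) process
    x(n) = a*x(n-1) + b*u(n), driven by white Gaussian noise u(n)."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(N)
    x = np.empty(N)
    x[0] = u[0] * b / np.sqrt(1.0 - a**2)   # start in the stationary distribution
    for n in range(1, N):
        x[n] = a * x[n - 1] + b * u[n]
    return x

a, b, N = np.exp(-0.1), 1.0, 200_000
x = markov1(N, a, b)
# Empirical autocovariance at lag k vs. the theory sigma^2 * a^|k|
sigma2 = b**2 / (1.0 - a**2)
for k in (0, 1, 5):
    print(np.mean(x[:N - k] * x[k:]), sigma2 * a**k)
```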
Lemma 3.
[16] The stationary random process $x(k)$ with autocovariance $C_x(k)$ is transformed by a linear time-invariant system with real-valued impulse response $h(k)$ into another stationary random process $y(k)=\sum_{i=-\infty}^{\infty}h(i)x(k-i)\equiv h(k)\circledast x(k)$ with autocovariance, $C_y(k)=h(k)\circledast h(-k)\circledast C_x(k)$. The symbol, $\circledast$, denotes (discrete time) convolution.
Proof.
By definition, the output covariance, $C_y(k,l)=E[(y(k)-E[y(k)])(y(l)-E[y(l)])]$. Substituting $y(k)=\sum_{i=-\infty}^{\infty}h(i)x(k-i)$, and rearranging, the covariance, $C_y(k-l)=\sum_i\sum_m h(m)C_x(i-m)h(k-i)=h(k)\circledast h(-k)\circledast C_x(k)$. □
Lemma 4.A stationary random process at the output of a linear or nonlinear time-invariant system remains stationary.
Proof.
For any multivariate function $h(\mathbf x)\in\mathbb{R}$ and any $t_0\in\mathbb{R}$, the expectation,
$$E[h(\mathbf x)]=\int_{\mathbb{R}^N}h(\mathbf x)f_X(\mathbf x;t_1,\ldots,t_N)\,d\mathbf x=\int_{\mathbb{R}^N}h(\mathbf x)f_X(\mathbf x;t_1-t_0,\ldots,t_N-t_0)\,d\mathbf x$$
assuming Definition 1 and provided that the dimension $N$ of the observations is at most equal to the order of stationarity $K$. □
For shorter sequences, the linear transformation, $\mathbf X=\mathbf T\mathbf U$, can be used to generate a normally distributed vector $\mathbf X\in\mathbb{R}^N$ having the covariance matrix, $\mathbf C_x=\mathbf T\mathbf T^T$, from an uncorrelated Gaussian vector $\mathbf U\in\mathbb{R}^N$. The mean, $E[\mathbf X]=\mathbf T\,E[\mathbf U]$. For longer sequences, a linear time-invariant filter can be equivalently used as indicated by Lemma 3.
2.4. Polynomials and Multivariate Functions
Lemma 5.
A univariate twice differentiable function $p(x)$ is convex, if and only if, $\frac{d^2}{dx^2}p(x)=\ddot p(x)\ge 0$ for $\forall x\in\mathbb{R}$. More generally, a twice differentiable multivariate function $f:\mathbb{R}^N\mapsto\mathbb{R}$ is convex, if and only if, the domain of $f$ is convex, and its Hessian is positive semidefinite, i.e., $\nabla^2f\succeq 0$ for $\forall x\in\operatorname{dom}f$.
Proof.
See [17] (Sec. 3.1). □
Consequently, convex polynomials can be generated as follows.
Lemma 6.
Let $\mathbf Q\in\mathbb{R}^{m\times m}$ be a positive semidefinite matrix, and assume a polynomial, $\ddot p(x)=\sum_{i=0}^{2m-2}p_ix^i$ for $\forall x\in\mathbb{R}$ where $p_i=\sum_{k+l=i}Q_{kl}$. Then, for any $q_0,q_1\in\mathbb{R}$, the polynomial $p(x)$ of degree $2m$,
$$p(x)=q_0+q_1x+\sum_{i=0}^{2m-2}\frac{p_i}{(i+1)(i+2)}x^{2+i}$$
is convex.
Proof.
Let $\mathbf x=[x^0,x^1,\ldots,x^{m-1}]^T$. Then, $\mathbf x^T\mathbf Q\mathbf x=\sum_{i=0}^{2m-2}p_ix^i=\ddot p(x)\ge 0$ for $\forall x$, since $\mathbf Q$ is positive semidefinite. Using Lemma 5 concludes the proof. □
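Lemma 6 can be spot-checked numerically. The sketch below (an illustration, not a proof) draws a random positive semidefinite $\mathbf Q$, forms the coefficients $p_i=\sum_{k+l=i}Q_{kl}$, and verifies that the reconstructed second derivative is nonnegative on a grid:

```python
import numpy as np

m = 3
rng = np.random.default_rng(1)
A = rng.standard_normal((m, m))
Q = A @ A.T                                  # positive semidefinite by construction

# Coefficients of the second derivative: p_i = sum_{k+l=i} Q_kl
p = np.array([sum(Q[k, i - k] for k in range(m) if 0 <= i - k < m)
              for i in range(2 * m - 1)])

# Full polynomial p(x) = q0 + q1*x + sum_i p_i / ((i+1)(i+2)) * x^(2+i)
q0, q1 = 0.5, -2.0
i = np.arange(2 * m - 1)
coeffs = np.concatenate(([q0, q1], p / ((i + 1) * (i + 2))))  # ascending powers

# Convexity spot check: p''(x) = sum_i p_i x^i >= 0 on a grid
xs = np.linspace(-3.0, 3.0, 61)
assert np.all(np.polyval(p[::-1], xs) >= -1e-9)
print(np.polyval(coeffs[::-1], 1.0))         # evaluate p(1)
```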
For a non-negative integer $m\in\{0\}\cup\mathbb{N}^+=\{0,1,2,\ldots\}$, assume the following notations to simplify the subsequent mathematical expressions [18]:
$$\mathbf n=\{n_1,n_2,\ldots,n_N\}\in\{\mathbb{N}^+\cup 0\}^N,\qquad\mathbf x=\{x_1,x_2,\ldots,x_N\}\in\mathbb{R}^N,$$
$$\|\mathbf x\|_p=\big(x_1^p+x_2^p+\cdots+x_N^p\big)^{1/p},\qquad\|\mathbf x\|_1=|x_1|+|x_2|+\cdots+|x_N|,\qquad|\mathbf x|_1=x_1+x_2+\cdots+x_N,$$
$$\mathbf h=\{h_1,h_2,\ldots,h_N\}\in\mathbb{R}^N,\qquad\mathbf h^{\mathbf n}=h_1^{n_1}h_2^{n_2}\cdots h_N^{n_N},$$
$$\partial^{\mathbf n}f(\mathbf x)=\partial_1^{n_1}\partial_2^{n_2}\cdots\partial_N^{n_N}f(\mathbf x)=\frac{\partial^{|\mathbf n|_1}}{\partial x_1^{n_1}\partial x_2^{n_2}\cdots\partial x_N^{n_N}}f(\mathbf x),$$
$$m!=\prod_{i=1}^m i,\qquad\mathbf n!=n_1!\,n_2!\cdots n_N!$$
Note that $|\mathbf x|_1$ denotes the sum of the elements of $\mathbf x$ whereas $\|\mathbf x\|_1$ is the sum of the absolute values of its elements.
Lemma 7.
The $m$-th power of a finite sum can be expanded as [19],
$$|\mathbf x|_1^m=\sum_{|\mathbf n|_1=m}\frac{m!}{\mathbf n!}\mathbf x^{\mathbf n}.$$
Proof.
See [18]. □
Theorem 1.
The multivariate Taylor expansion of an $(m+1)$-th order differentiable function $f:\mathbb{R}^N\mapsto\mathbb{R}$ about the point $\mathbf x\in\mathbb{R}^N$ is written as [19],
$$f(\mathbf x+\mathbf h)=f(\mathbf x)+\sum_{|\mathbf n|_1\le m}\frac{\partial^{\mathbf n}f(\mathbf x)}{\mathbf n!}\mathbf h^{\mathbf n}+\sum_{|\mathbf n|_1=m+1}\frac{\partial^{\mathbf n}f(\mathbf x+t\mathbf h)}{\mathbf n!}\mathbf h^{\mathbf n}$$
for some $t\in(0,1)$.
Proof.
See [18,19]. □
Definition 3.
A multivariate function $f(\mathbf x)=f(x_1,\ldots,x_N)$ is said to be symmetric, if, for any permutation of its arguments $\mathbf x$ denoted as $\mathbf x'$, $f(\mathbf x)=f(\mathbf x')$.
3. Background Extensions
In this section, additional results are obtained which are used in the next section to introduce statistical sum-moments for random vectors. In particular, the mean cosine similarity, the mean Minkowski distance, as well as the higher central moments for random vectors are defined. A polynomial approximation of univariate functions is shown to be a linear regression problem. A numerically efficient solution of the LS problem is derived. Finally, a procedure to generate multiple Gaussian processes with defined autocovariance and cross-covariance is devised.
Recall the second moment of a random variable $X$, i.e.,
$$\mu_2=E[(X-c)^2],\quad c\in\mathbb{R}.$$
It is straightforward to show that $\mu_2$ is minimized for $c=E[X]$, giving the variance of $X$. On the other hand, $\mu_2=0$, if and only if $c=E[X]\pm\sqrt{-\operatorname{var}[X]}$.
For random vectors, both the cosine similarity and the Minkowski distance are random variables. Assuming that the vectors are jointly stationary, and their elements are identically distributed, the mean cosine similarity can be defined as,
$$\bar S_{\cos}(\mathbf X_1,\mathbf X_2)=\frac{\sum_{i=1}^N E[X_{1i}X_{2i}]}{\sqrt{\sum_{i=1}^N\big(E[X_{1i}^2]-\bar X_{1i}^2\big)}\sqrt{\sum_{i=1}^N\big(E[X_{2i}^2]-\bar X_{2i}^2\big)}}=\frac{1}{N}\sum_{i=1}^N\frac{E[X_{1i}X_{2i}]}{\sigma_1\sigma_2}=\frac{1}{N}\sum_{i=1}^N\rho_{1i,2i}$$
where $\sigma_1^2=\operatorname{var}[X_{1i}]$, $\sigma_2^2=\operatorname{var}[X_{2i}]$, and $\rho_{1i,2i}$ denotes the Pearson correlation coefficient of the $i$-th elements of the vectors $\mathbf X_1$ and $\mathbf X_2$. It should be noted that this definition of the mean cosine similarity does not account for the other correlations, $E[X_{1i}X_{2j}]$, $i\ne j$.
The mean Minkowski distance for random vectors can be defined as,
$$\bar S_{\mathrm{mnk}}(\mathbf X_1,\mathbf X_2)=\Big(\sum_{i=1}^N E[|X_{1i}-X_{2i}|^m]\Big)^{1/m},\quad m=1,2,\ldots$$
Recognizing the $m$-th general moment in (5), the $m$-th power of the mean Minkowski distance can be normalized as,
$$\tilde S_{\mathrm{mnk}}^m(\mathbf X_1,\mathbf X_2)=\frac{\sum_{i=1}^N E[|X_{1i}-X_{2i}|^m]}{N\,E[(X_{1i}-X_{2i})^2]^{m/2}}=\bar\mu_m(\mathbf X_1-\mathbf X_2),\quad m=1,2,\ldots$$
where the average Minkowski distance between two random vectors is,
$$\bar\mu_m(\mathbf X_1-\mathbf X_2)=\frac{1}{N}\sum_{i=1}^N\mu_m(X_{1i}-X_{2i}),\quad m=1,2,\ldots$$
Furthermore, note that for $m=2$,
$$\tfrac12\big(E[\|\mathbf X_1-\mathbf X_2\|_2^2]+E[\|\mathbf X_1+\mathbf X_2\|_2^2]\big)=E[\|\mathbf X_1\|_2^2]+E[\|\mathbf X_2\|_2^2].$$
Assuming positive integers $\mathbf m=\{m_1,m_2,\ldots,m_N\}$, the higher order joint central moments of a random vector $\mathbf X=\{X_i\}_{1:N}$ can be defined as,
$$\mu_{m_1,\ldots,m_N}(X_1,\ldots,X_N)=E\Big[\prod_{i=1}^N(X_i-\bar X_i)^{m_i}\Big]$$
or, using the more compact index notation, as,
$$\mu_{\mathbf m}(\mathbf X)=E[(\mathbf X-\bar{\mathbf X})^{\mathbf m}].$$
3.1. Linear LS Estimation
The linear LS estimation can be used to fit a degree-$(D-1)$ polynomial to $N$ samples of a random process $x(t)$ at discrete time instances $t_1,t_2,\ldots,t_N$. Hence, consider the polynomial data model,
$$E[x(t)]\approx M(t;\mathbf P)=\sum_{k=1}^D P_k t^{k-1}.$$
Denoting $\mathbf w_i=[t_i^0,t_i^1,\ldots,t_i^{D-1}]^T$, the linear LS solution (2) gives the estimates,
$$\hat{\mathbf P}=\Bigg(\sum_{i=1}^N\begin{bmatrix}t_i^0&t_i^1&\cdots&t_i^{D-1}\\ t_i^1&t_i^2&\cdots&t_i^{D}\\ \vdots&\vdots&\ddots&\vdots\\ t_i^{D-1}&t_i^{D}&\cdots&t_i^{2D-2}\end{bmatrix}\Bigg)^{-1}\sum_{i=1}^N x(t_i)\begin{bmatrix}t_i^0\\ t_i^1\\ \vdots\\ t_i^{D-1}\end{bmatrix}.$$
Assuming $D=2$ parameters, i.e., the linear LS regression for a straight line, the parameters $P_1$ and $P_2$ to be estimated must satisfy the following equality:
$$\frac{d}{dP_1}\sum_{i=1}^N\big(X_i-w_{1i}P_1-w_{2i}P_2\big)^2=-2\sum_{i=1}^N w_{1i}X_i+2\sum_{i=1}^N w_{1i}^2P_1+2\sum_{i=1}^N w_{1i}w_{2i}P_2\stackrel{!}{=}0.$$
Denoting the weighted averages, $\bar X=\sum_{i=1}^N w_{1i}X_i$, $\|w_1\|_2^2=\sum_{i=1}^N w_{1i}^2$, and $\bar w_{12}=\sum_{i=1}^N w_{1i}w_{2i}$, a necessary but not sufficient condition for the linear LS estimation of the parameters $P_1$ and $P_2$ is,
$$\bar X=\|w_1\|_2^2\,P_1+\bar w_{12}\,P_2.$$
In the LS terminology, the values $\|w_1\|_2^2$ and $\bar w_{12}$ represent independent variables whereas $\bar X$ is a dependent variable.
Note that all $N$ measurements are used in (7). However, if $N$ is sufficiently large, the data points can be split into two disjoint sets of $N_1$ and $N_2$ elements, respectively, $N_1+N_2=N$. For convenience, denote the sums,
$$\bar X_1=\frac{1}{a_1}\sum_{i\in I_1}w_{1i}X_i,\quad\bar W_{11}=\frac{1}{a_1}\sum_{i\in I_1}w_{1i}^2,\quad\bar W_{12}=\frac{1}{a_1}\sum_{i\in I_1}w_{1i}w_{2i},$$
$$\bar X_2=\frac{1}{a_2}\sum_{i\in I_2}w_{1i}X_i,\quad\bar W_{21}=\frac{1}{a_2}\sum_{i\in I_2}w_{1i}^2,\quad\bar W_{22}=\frac{1}{a_2}\sum_{i\in I_2}w_{1i}w_{2i}$$
where $a_1,a_2>1$ are some constants (to be determined later), and $I_1$ and $I_2$ are two disjoint index sets, such that $I_1\cup I_2=\{1,2,\ldots,N\}$, and the cardinalities, $|I_1|=N_1$ and $|I_2|=N_2$. Note also that,
$$\|w_1\|_2^2=a_1\bar W_{11}+a_2\bar W_{21},\qquad\bar w_{12}=a_1\bar W_{12}+a_2\bar W_{22},\qquad\bar X=a_1\bar X_1+a_2\bar X_2.$$
Consequently, the approximate LS estimates of the parameters $P_1$ and $P_2$ are readily computed as,
$$\begin{bmatrix}\hat P_1\\ \hat P_2\end{bmatrix}=\begin{bmatrix}\bar W_{11}&\bar W_{12}\\ \bar W_{21}&\bar W_{22}\end{bmatrix}^{-1}\begin{bmatrix}\bar X_1\\ \bar X_2\end{bmatrix}.$$
There are $2^N$ possibilities how to split the $N$ data points into two disjoint subsets indexed by $I_1$ and $I_2$. More importantly, the estimates (9) do not guarantee the optimum LS fit, i.e., achieving the minimum sum of squared errors. However, the complexity of performing the LS fit is greatly reduced by splitting the data, since independently of the value of $N\gg 1$, only a $2\times 2$ matrix needs to be inverted. The optimum LS fit and the approximate LS fit (9) are depicted in Figure 1. The points $A_1$ and $A_2$ in Figure 1 correspond to the data subsets indexed by $I_1$ and $I_2$, respectively. The mid-point, $B=a_1A_1+a_2A_2$, follows from (8). Note that $B$ is always located at the intersection of the optimum and the approximate lines $LS_{\mathrm{opt}}$ and $LS_{\mathrm{apr}}$, respectively. The vertical arrows at the points $A_1$ and $A_2$ in Figure 1 indicate that the dependent values $\bar X_1$ and $\bar X_2$ are random variables.
The larger the variation of the gradient of the line $LS_{\mathrm{apr}}$ in Figure 1, the larger the uncertainty and the probability that the line $LS_{\mathrm{apr}}$ deviates from the optimum regression line $LS_{\mathrm{opt}}$. Since the line $LS_{\mathrm{apr}}$ is defined by the points $\bar X_1$ and $\bar X_2$, and always, $B\in LS_{\mathrm{apr}}$, the spread of the random variables $\bar X_1$ and $\bar X_2$ about their mean values affects the likelihood that $LS_{\mathrm{apr}}$ deviates from $LS_{\mathrm{opt}}$. In particular, given $0<p<1$, there exists $\xi>0$, such that the probability of the gradient $G_{\mathrm{apr}}$ of $LS_{\mathrm{apr}}$ being within the given bounds is at least,
$$\Pr\big[b_l(\xi)<G_{\mathrm{apr}}<b_u(\xi)\big]\ge p$$
where
$$b_l(\xi)=\frac{\big(E[\bar X_2]-\xi\sqrt{\operatorname{var}[\bar X_2]}\big)-\big(E[\bar X_1]+\xi\sqrt{\operatorname{var}[\bar X_1]}\big)}{\bar W_{22}-\bar W_{12}},\qquad b_u(\xi)=\frac{\big(E[\bar X_2]+\xi\sqrt{\operatorname{var}[\bar X_2]}\big)-\big(E[\bar X_1]-\xi\sqrt{\operatorname{var}[\bar X_1]}\big)}{\bar W_{22}-\bar W_{12}}.$$
For stationary measurements (cf. (9)), the means and the variances in (10) are equal to,
$$E[\bar X_1]=E[\bar X]\frac{N_1}{a_1N},\quad\operatorname{var}[\bar X_1]=\operatorname{var}[\bar X]\frac{N_1}{a_1^2N},\qquad E[\bar X_2]=E[\bar X]\frac{N_2}{a_2N},\quad\operatorname{var}[\bar X_2]=\operatorname{var}[\bar X]\frac{N_2}{a_2^2N}.$$
The uncertainty in computing $G_{\mathrm{apr}}$ from the data, and thus, also the probability of the line $LS_{\mathrm{apr}}$ deviating from the line $LS_{\mathrm{opt}}$, is proportional to the width of the interval in (10), i.e.,
$$b_u(\xi)-b_l(\xi)=\frac{2\xi\big(\sqrt{\operatorname{var}[\bar X_1]}+\sqrt{\operatorname{var}[\bar X_2]}\big)}{\bar W_{22}-\bar W_{12}}\propto\frac{\sqrt{N_1}/a_1+\sqrt{N_2}/a_2}{\bar W_{22}-\bar W_{12}}.$$
Consequently, the numerator of (11) must be minimized, and the denominator maximized.
In order to minimize the numerator in (11), it is convenient to choose, $a_1=N_1^\nu$ and $a_2=N_2^\nu$, where $\nu\in\mathbb{R}^+$ is a constant to be optimized. The numerator is then proportional to $(N_1^{1/2-\nu}+N_2^{1/2-\nu})$, which is convex in $\nu$ and decreases for $\nu>1/2$. Hence, a necessary condition to reduce the approximation error is that $a_1>\sqrt{N_1}$ and $a_2>\sqrt{N_2}$. For numerical convenience, let $a_1=N_1$ and $a_2=N_2$. Then, the dependent and independent variables assumed in (9) become arithmetic averages, and the optimum index subsets have the cardinalities,
$$|I_1|=|I_2|=N/2,\quad N\ \text{even};\qquad|I_1|=|I_2|\pm 1=(N\pm 1)/2,\quad N\ \text{odd}.$$
In order to maximize the denominator in (11), assume that the independent variables $(\|w_1\|_2^2,\bar w_{12})$ in (9) are sorted by $w_{1i}$, i.e., let $w_{11}<w_{12}<\ldots<w_{1N}$. This ordering and the condition (12) suggest that the disjoint index sets $I_1$ and $I_2$ maximizing the difference, $\bar W_{22}-\bar W_{12}$, are,
$$I_1=\{1,2,\ldots,N/2\},\quad I_2=\{N/2+1,\ldots,N\}\qquad N\ \text{even},$$
$$I_1=\{1,2,\ldots,(N-1)/2\},\quad I_2=\{(N+1)/2,\ldots,N\}\qquad N\ \text{odd},$$
$$\text{or},\quad I_1=\{1,2,\ldots,(N+1)/2\},\quad I_2=\{(N+3)/2,\ldots,N\}.$$
Such a partitioning corresponds to splitting the data into two equal (N-even) or approximately equal (N-odd) sized subsets by the median (2-quantile) index point.
In summary, the approximate linear LS regression can be efficiently computed with good accuracy by splitting the data into multiple disjoint subsets, calculating the average data point in each of these subsets, and then solving the set of linear equations with the same (or a smaller) number of unknown parameters. The data splitting should exploit the ordering of the data points by one of the independent variables. It can be expected that the accuracy of the approximate LS regression improves with the number of data points $N$. A numerical evaluation of the approximate LS regression is considered in Section 5.
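A minimal Python sketch of this median-split procedure for the straight-line model (two subsets, each represented by its arithmetic-average data point, cf. (9)); the function name and test data are illustrative:

```python
import numpy as np

def approx_ls_line(t, X):
    """Approximate LS fit of X ~ P1 + P2*t by splitting the data at the
    median of the independent variable and averaging within each half."""
    order = np.argsort(t)
    t, X = t[order], X[order]
    half = len(t) // 2
    I1, I2 = slice(0, half), slice(half, None)
    # Average data point of each subset: (mean regressor row, mean response)
    A = np.array([[1.0, t[I1].mean()],
                  [1.0, t[I2].mean()]])
    b = np.array([X[I1].mean(), X[I2].mean()])
    return np.linalg.solve(A, b)     # [P1_hat, P2_hat]

rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 40)
X = 1.5 + 0.3 * t + 0.1 * rng.standard_normal(t.size)
print(approx_ls_line(t, X))          # approximately [1.5, 0.3]
```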
3.2. Generating Pairwise-Correlated Gaussian Processes
How to generate a single correlated Gaussian process is well established in the literature, and it has been described in Section 2.3. Moreover, the linear transformation to generate correlated Gaussian variables from uncorrelated Gaussian variables does not have to be square. A sufficient condition on the rank of linear transformation to obtain a positive definite covariance matrix is given by the following lemma.
Lemma 8.
The matrix, $\mathbf T^T\mathbf T$, is positive definite, provided that the matrix, $\mathbf T\in\mathbb{R}^{N_1\times N_2}$, has rank $N_2$.
Proof.
The rank $N_2$ of $\mathbf T$ implies that $\mathbf T$ consists of $N_2$ linearly independent columns and that $N_1\ge N_2$. The matrix $\mathbf T^T\mathbf T$ is positive definite, provided that $\mathbf U^T\mathbf T^T\mathbf T\mathbf U=\|\mathbf T\mathbf U\|_2^2>0$ for all $\mathbf U\ne\mathbf 0$, $\mathbf U\in\mathbb{R}^{N_2}$, where $\|\cdot\|_2$ denotes the Euclidean norm of a vector. Since the columns of $\mathbf T$ are linearly independent, $\|\mathbf T\mathbf U\|_2=0$, if and only if $\mathbf U=\mathbf 0$. □
Corollary 1.
The matrix, $\mathbf T\mathbf T^T$, is positive definite, provided that the matrix, $\mathbf T\in\mathbb{R}^{N_1\times N_2}$, has rank $N_1$.
Corollary 2.
A rank-$N_1$ linear transformation $\mathbf T$ of an uncorrelated Gaussian vector $\mathbf U\in\mathbb{R}^{N_2}$ generates $N_1\le N_2$ correlated Gaussian variables having the (positive definite) covariance matrix, $\mathbf T\mathbf T^T$.
Furthermore, it is often necessary to generate multiple mutually correlated Gaussian processes with given autocorrelation as well as cross-correlation.
Lemma 9.
The linear transformation,
$$\begin{bmatrix}\mathbf x_1\\ \mathbf x_2\end{bmatrix}=\begin{bmatrix}\mathbf T_1&\mathbf K\\ \mathbf 0&\mathbf T_2\end{bmatrix}\begin{bmatrix}\mathbf u_1\\ \mathbf u_2\end{bmatrix}$$
generates a pair of correlated Gaussian vectors $\mathbf x_1\in\mathbb{R}^{N_1}$ and $\mathbf x_2\in\mathbb{R}^{N_2}$ from uncorrelated zero-mean Gaussian vectors $\mathbf u_1,\mathbf u_2\in\mathbb{R}^N$ where $\mathbf 0$ denotes a zero matrix, and according to Corollary 2, it is necessary that, $\max(N_1,N_2)\le N$. The corresponding (auto-) correlation and cross-correlation matrices are,
$$E[\mathbf u_1\mathbf u_1^T]=\sigma_1^2\mathbf I,\qquad E[\mathbf u_2\mathbf u_2^T]=\sigma_2^2\mathbf I,\qquad E[\mathbf u_1\mathbf u_2^T]=\mathbf 0,$$
$$\mathbf C_{x_1}=E[\mathbf x_1\mathbf x_1^T]=\mathbf T_1\mathbf T_1^T+\mathbf K\mathbf K^T,\qquad\mathbf C_{x_2}=E[\mathbf x_2\mathbf x_2^T]=\mathbf T_2\mathbf T_2^T,\qquad\mathbf C_{x_1x_2}=E[\mathbf x_1\mathbf x_2^T]=\mathbf K\mathbf T_2^T=\mathbf C_{x_2x_1}^T$$
where $\mathbf I$ denotes an identity matrix.
Proof.
The proof is straightforward by substituting (13) into the definitions of $\mathbf C_{x_1}$, $\mathbf C_{x_2}$ and $\mathbf C_{x_1x_2}$. □
The following corollary details the procedure described in Lemma 9.
Corollary 3.
Given $\mathbf T_2$, calculate the (auto-) correlation matrix, $\mathbf C_{x_2}=\mathbf T_2\mathbf T_2^T$, or vice versa. Then, given the cross-correlation matrix, $\mathbf C_{x_1x_2}$, compute $\mathbf K=\mathbf C_{x_1x_2}\mathbf T_2\mathbf C_{x_2}^{-T}$. Finally, given $\mathbf T_1$, calculate the (auto-) correlation matrix, $\mathbf C_{x_1}=\mathbf T_1\mathbf T_1^T+\mathbf K\mathbf K^T$, or, obtain $\mathbf T_1$ by solving the matrix equation, $\mathbf T_1\mathbf T_1^T=\mathbf C_{x_1}-\mathbf K\mathbf K^T$. Note that the matrix equation, $\mathbf T\mathbf T^T=\mathbf C$, can be solved for $\mathbf T$ using the singular value decomposition, $\mathbf C=\mathbf U\boldsymbol\Lambda\mathbf U^T$ where $\mathbf U$ is a unitary matrix and $\boldsymbol\Lambda$ is a diagonal matrix of eigenvalues. Then, $\mathbf T=\mathbf U\sqrt{\boldsymbol\Lambda}$.
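The following Python sketch implements the procedure of Corollary 3 for square, full-rank factors (all names are illustrative). Instead of the intermediate formula for $\mathbf K$, it solves $\mathbf K\mathbf T_2^T=\mathbf C_{x_1x_2}$ directly, and it assumes that $\mathbf C_{x_1}-\mathbf K\mathbf K^T$ remains positive semidefinite:

```python
import numpy as np

def sqrt_psd(C):
    """A matrix square root T with T @ T.T == C, via eigendecomposition."""
    lam, U = np.linalg.eigh(C)
    return U * np.sqrt(np.clip(lam, 0.0, None))    # U @ diag(sqrt(lam))

def correlated_pair(C1, C2, C12, n_samples, seed=0):
    """Generate x1, x2 with covariances C1, C2 and cross-covariance C12."""
    rng = np.random.default_rng(seed)
    T2 = sqrt_psd(C2)
    K = C12 @ np.linalg.inv(T2).T        # solves K @ T2.T = C12
    T1 = sqrt_psd(C1 - K @ K.T)          # residual covariance of x1
    U1 = rng.standard_normal((C1.shape[0], n_samples))
    U2 = rng.standard_normal((C2.shape[0], n_samples))
    return T1 @ U1 + K @ U2, T2 @ U2     # x1, x2
```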
4. Polynomial Statistics and Sum-Moments for Vectors of Random Variables
The main objective of this section is to define a universal function to effectively measure the statistics of random vectors and random processes observed at multiple discrete time instances. The measure function should: (1) be universally applicable for an arbitrary number of random variables and random vectors; (2) be symmetric, so that all random variables are considered equally; (3) lead to mathematically tractable expressions; and (4) be convex to allow defining convex optimization problems.
Let $f:\mathbb{R}^N\mapsto\mathbb{R}$ denote such a mapping or transformation of $N$ random variables $X_i$ to a scalar random variable $Y$, i.e.,
$$Y=f(X_1,X_2,\ldots,X_N)=f(\mathbf x).$$
In order to satisfy the symmetry requirement, the random variables $X_i$ can first be combined as,
$$Y=f(X_1\circ X_2\circ\cdots\circ X_N)$$
using a binary commutative operator, $\circ$, such as addition or multiplication. In the case of addition, it is important that the function $f$ is nonlinear. The nonlinearity can also be used to limit the extreme values of the combined variables.
For a random process $x(t)$, the random variables are defined as, $X_i=x(t_i)$. Define a vector of discrete time instances, $\mathbf t=(t_1,\ldots,t_N)$, and assume the index notation, $x(\mathbf t)=\{X_1=x(t_1),\ldots,X_N=x(t_N)\}$. Then the mapping (14) can be rewritten as,
$$Y=f(x(\mathbf t))=F(\mathbf t)=F(t_1,t_2,\ldots,t_N).$$
The mean value is the most important statistical property of the random variable $Y$. In addition, if the process $x(t)$ is $K$-th order stationary, the dimension of the mean mapping for $N$ observations is reduced by one.
Lemma 10.
The mean of $Y=f(x(\mathbf t))$ for $N$ discrete time observations of a $K$-th order stationary random process has dimension $(N-1)$, i.e., if $N\le K$,
$$\bar Y=E[f(x(\mathbf t))]=C(t_2-t_1,t_3-t_1,\ldots,t_N-t_1)=C(\tau_1,\tau_2,\ldots,\tau_{N-1})=C(\boldsymbol\tau)$$
where $\tau_i=t_{i+1}-t_1$, $i=1,2,\ldots,N-1$.
Proof.
Assuming Lemma 1 and taking the first sample $X_1=x(t_1)$ as a reference, the joint probability density of the $N$ process observations becomes, $f_X(\mathbf x;0,t_2-t_1,\ldots,t_N-t_1)\equiv\tilde f_X(\mathbf x;t_2-t_1,\ldots,t_N-t_1)$, so the corresponding statistical moments have dimension $(N-1)$. □
In optimization problems, it is useful to consider the gradient of $\bar Y$, i.e.,
$$\nabla\bar Y=\Big[\frac{\partial\bar Y}{\partial\tau_1},\frac{\partial\bar Y}{\partial\tau_2},\ldots,\frac{\partial\bar Y}{\partial\tau_{N-1}}\Big]=\frac{\partial\bar Y}{\partial T}\Big[\frac{\partial T}{\partial\tau_1},\frac{\partial T}{\partial\tau_2},\ldots,\frac{\partial T}{\partial\tau_{N-1}}\Big]$$
where $T$ is an application dependent measure of the vector $\boldsymbol\tau$ such as the norms, $T=\|\boldsymbol\tau\|_1=|\tau_1|+\cdots+|\tau_{N-1}|$, or $T=\|\boldsymbol\tau\|_\infty=\max(\tau_1,\ldots,\tau_{N-1})$.
In general, assuming the Taylor series defined in Theorem 1, a multivariate function of a random vector $\mathbf x$ can be expanded about the mean $\bar{\mathbf x}=E[\mathbf x]$ as,
$$f(\bar{\mathbf x}+\mathbf h)\approx f(\bar{\mathbf x})+\sum_{l=1}^m\sum_{|\mathbf n|_1=l}\frac{\partial^{\mathbf n}f(\mathbf x)}{\mathbf n!}\mathbf h^{\mathbf n}.$$
Thus, the value of $f(\bar{\mathbf x}+\mathbf h)$ is a weighted sum of $\mathbf h^{\mathbf n}$ plus an offset $f(\bar{\mathbf x})$. More importantly, if the partial derivatives $\partial^{\mathbf n}f(\mathbf x)$ are replaced with coefficients $(l!\,p_l)$ which are independent of $\mathbf n$, the number of parameters in (15) is greatly reduced. Moreover, instead of precisely determining the values of $p_l\in\mathbb{R}$ to obtain the best possible approximation of the original function $f$, it is useful as well as sufficient to constrain the Taylor expansion (15) to the class of functions that are exactly constructed as,
$$f(\bar{\mathbf x}+\mathbf h)=f(\bar{\mathbf x})+\sum_{l=1}^m p_l\sum_{|\mathbf n|_1=l}\frac{l!}{\mathbf n!}\mathbf h^{\mathbf n}=\sum_{l=0}^m p_l\,(h_1+\cdots+h_N)^l$$
where $p_0=f(\bar{\mathbf x})$. The function expansion (16) represents an $m$-th degree polynomial in the variable $|\mathbf h|_1$. The coefficients $p_l$ of this polynomial can be set using Lemma 6, so that the polynomial is convex.
The key realization is that the polynomial functions (16) have all the desired properties specified at the beginning of this section.
Claim 1.
A multivariate function $f:\mathbb{R}^N\mapsto\mathbb{R}$ having the desirable properties to measure the statistics of a random vector $\mathbf x$ is the $m$-th degree polynomial,
$$Y=f(\mathbf x)=\sum_{l=0}^m p_l\Big(\sum_{i=1}^N(X_i-E[X_i])\Big)^l$$
where $p_0=f(\bar{\mathbf x})$, and the values of $m$ and of the coefficients $p_l\in\mathbb{R}$ are determined by the application requirements.
The polynomial function (17) can be used for any number of observations $N$. It is symmetric, so all observations are treated equally. Moreover, it is amenable to integration and to applying the expectation operator. Convexity can be achieved by Lemma 6.
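As a small numerical illustration of Claim 1 (with arbitrarily chosen coefficients $p_l$, an assumption for this sketch), the code below evaluates the polynomial statistic for i.i.d. Gaussian samples and compares its Monte Carlo mean with the exact value implied by the moments of the sum:

```python
import numpy as np

def poly_statistic(X, p, mean=None):
    """Y = sum_l p[l] * (sum_i (X_i - E[X_i]))**l for one realization X."""
    mean = np.zeros_like(X) if mean is None else mean
    z = np.sum(X - mean)                   # the single polynomial variable
    return sum(pl * z**l for l, pl in enumerate(p))

rng = np.random.default_rng(3)
p = [0.0, 0.0, 1.0, 0.0, 0.5]              # Y = z^2 + 0.5*z^4
N, trials = 8, 100_000
samples = rng.standard_normal((trials, N)) # i.i.d. N(0,1): z ~ N(0, N)
Y_bar = np.mean([poly_statistic(x, p) for x in samples])
print(Y_bar, N + 0.5 * 3 * N**2)           # E[z^2] = N, E[z^4] = 3*N^2
```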
4.1. Related Concepts
Assume the scalar function $f$ defined in (17) for a $K$-th order stationary random process $x(t)$. Define the auxiliary random variable,
$$Z(a)=\frac{1}{a}\sum_{i=1}^N(X_i-\bar X_i)$$
where $a$ is a normalization constant, $a\ne 0$. The expression (17) can then be rewritten as, $Y(a)=\sum_{l=0}^m p_lZ^l(a)$. For $a=1$, $Z(1)$ has a zero mean, and the variance, $\operatorname{var}[Z(1)]=\sum_{i,j}E[(X_i-\bar X_i)(X_j-\bar X_j)]$. For $a=N$, $Z(N)$ represents a sample mean, and its variance is equal to, $\operatorname{var}[Z(N)]=\operatorname{var}[Z(1)]/N^2$. For $a=\sqrt N$, the variance of $Z(\sqrt N)$ is normalized by the number of dimensions, i.e., $\operatorname{var}[Z(\sqrt N)]=\operatorname{var}[Z(1)]/N$.
For $p_l=s^l/l!$, in the limit of large $m$, Equation (17) gives,
$$Y=\lim_{m\to\infty}\sum_{l=0}^m\frac{s^l}{l!}(Z(a))^l=e^{sZ(a)}.$$
The mean, $E[Y]=E[e^{sZ(a)}]$, is the moment generating function of the random variable $Z(a)$.
In data processing, the sample mean is intended to be an estimate of the true population mean, i.e., $N\gg 1$ is required. Here, $Z(a)$ is calculated over a finite number of vector or process dimensions, so it is a random variable for $\forall a\in\mathbb{R}\setminus\{0\}$. The variable $Z(N)$ should then be referred to as an arithmetic average or a center of gravity of the random vector $\mathbf X$ in the Euclidean space $\mathbb{R}^N$, i.e.,
$$Z(N)\triangleq\bar X=\frac{1}{N}\sum_{i=1}^N X_i\in\mathbb{R}.$$
Note that (18) is not an $l_1$-norm, since the variables $X_i$ are not summed with absolute values.
If the random variables $X_i$ are independent, the distribution of $Z(a)$ is given by the convolution of their marginal distributions. For correlated observations, if the characteristic function, $\tilde f(s/a)=E[e^{j|\mathbf X|_1s/a}]$, of the sum $|\mathbf X|_1$ can be obtained, the distribution of $Z(a)=|\mathbf X|_1/a$ is computed as the inverse transform,
$$f_Z(Z(a))=\frac{1}{2\pi}\int_{-\infty}^{\infty}e^{-jsZ(a)}\tilde f\Big(\frac{s}{a}\Big)ds.$$
Many other properties involving the sums of random variables can be obtained. For instance, if the random variables $X_i$ are independent and have zero mean, then,
$$E[Z^m(1)]-E[(Z(1)-X_N)^m]=\begin{cases}E[X_N^m]&m=2,3\\[2pt] E[X_N^4]+6\sum_{i=1}^{N-1}E[X_N^2]E[X_i^2]&m=4\\[2pt] E[X_N^5]+10\sum_{i=1}^{N-1}\big(E[X_N^2]E[X_i^3]+E[X_N^3]E[X_i^2]\big)&m=5.\end{cases}$$
Considering Claim 1, an important statistic for a random vector can be defined as the $m$-th central sum-moment.
Definition 4.
The $m$-th central sum-moment of a random vector $\mathbf X\in\mathbb{R}^N$ is computed as,
$$\Sigma\mu_m(\mathbf X)=\mu_m(|\mathbf X|_1)=E\Big[\Big(\sum_{i=1}^N(X_i-\bar X_i)\Big)^m\Big],\quad m=1,2,\ldots$$
Lemma 11.
The second central sum-moment of a random vector is equal to the sum of all elements of its covariance matrix, i.e.,
$$\Sigma\mu_2(\mathbf X)=\mu_2(|\mathbf X-\bar{\mathbf X}|_1)=E\Big[\Big(\sum_{i=1}^N(X_i-\bar X_i)\Big)^2\Big]=\sum_{i,j=1}^N\operatorname{cov}[X_i,X_j]=|\mathbf C_x|_1.$$
Furthermore, the second central sum-moment is also equal to the variance of $|\mathbf X|_1$, i.e.,
$$\Sigma\mu_2(\mathbf X)=\operatorname{var}[|\mathbf X|_1]=\operatorname{var}\Big[\sum_{i=1}^N(X_i-\bar X_i)\Big].$$
Proof.
The first equality is shown by expanding the expectation, and substituting for the elements of the covariance matrix, $[\mathbf C_x]_{i,j}=\operatorname{cov}[X_i,X_j]$. The second expression follows by noting that $\sum_{i=1}^N(X_i-\bar X_i)$ has zero mean. □
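Lemma 11 is straightforward to confirm by simulation. Assuming an arbitrary covariance matrix, the empirical second central sum-moment should approach the sum of all entries of $\mathbf C_x$ (a sketch with illustrative parameter values):

```python
import numpy as np

rng = np.random.default_rng(4)
N, trials = 5, 200_000
A = rng.standard_normal((N, N))
C = A @ A.T                                # target covariance matrix
T = np.linalg.cholesky(C)
X = T @ rng.standard_normal((N, trials))   # zero-mean samples with cov(X) = C

sum_moment2 = np.mean(np.sum(X, axis=0)**2)
print(sum_moment2, C.sum())                # the two values should agree
```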
In the literature, there are other measures involving sums of random variables. For instance, in Mean-Field Theory, the model dimensionality is reduced by representing $N$-dimensional vectors by their center of gravity [20]. The central point of a vector is also used in the first order approximation of multivariate functions in [21], and in the overall model variance in [22].
In Measure Theory [23], the total variation (TV) of a real-valued univariate function, $x:(t_0,t_N)\mapsto\mathbb{R}$, is defined as the supremum over all possible partitions $P:t_0<t_1<\cdots<t_N$ of the interval $(t_0,t_N)$, i.e.,
$$TV(x)=\sup_P\sum_{i=0}^{N-1}|x(t_{i+1})-x(t_i)|.$$
The TV concept can be adopted for the observations $X_i=x(t_i)$ of a stationary random process $x(t)$ at discrete time instances, $\{t_i\}_{0:N}$. A mathematically tractable mean TV measure can be defined as,
$$\overline{TV}^2(x)=\sup_P E\Big[\sum_{i=0}^{N-1}|X_{i+1}-X_i|^2\Big]=\sup_P 2N\big(E[X_i^2]-\operatorname{cov}[X_{i+1},X_i]\big).$$
Jensen's inequality for a random vector assuming equal weights can be stated as [17],
$$E\Big[\Big|\frac{1}{N}\sum_{i=1}^N(X_i-\bar X_i)\Big|^m\Big]\le\frac{1}{N}\sum_{i=1}^N E[|X_i-\bar X_i|^m].$$
Alternatively, exchanging the expectation and summation, Jensen's inequality becomes,
$$\sum_{i=1}^N|E[X_i-\bar X_i]|^m\le E\Big[\sum_{i=1}^N|X_i-\bar X_i|^m\Big].$$
Furthermore, if the right-hand side of (19) is to be minimized and $m=2$, the inequality in (19) changes to an equality. In particular, consider the minimum mean square error (MMSE) estimation of a vector of random parameters $\mathbf P=\{P_i\}_{1:N}$ from the measurements $\mathbf X$. Denoting $\bar P_i(\mathbf X)=E[P_i|\mathbf X]$, conditioned on $\mathbf X$, the MMSE estimator $\hat{\mathbf P}(\mathbf X)$ minimizes [15],
$$\min_{\hat{\mathbf P}|\mathbf X}E\Big[\sum_{i=1}^N(\hat P_i(\mathbf X)-P_i)^2\Big]=\min_{\hat{\mathbf P}|\mathbf X}E\Big[\sum_{i=1}^N\big((\hat P_i(\mathbf X)-\bar P_i(\mathbf X))-(P_i-\bar P_i(\mathbf X))\big)^2\Big]$$
$$=\min_{\hat{\mathbf P}|\mathbf X}\Big[\sum_{i=1}^N(\hat P_i(\mathbf X)-\bar P_i(\mathbf X))^2\Big]+E\Big[\sum_{i=1}^N(P_i-\bar P_i(\mathbf X))^2\Big]$$
$$=\min_{\hat{\mathbf P}|\mathbf X}\sum_{i=1}^N\big(E[\hat P_i(\mathbf X)-P_i\,|\,\mathbf X]\big)^2=\min_{\hat{\mathbf P}|\mathbf X}\sum_{i=1}^N\big(\hat P_i(\mathbf X)-E[P_i|\mathbf X]\big)^2$$
where the expectations are over the conditional distribution $f_{\mathbf P|\mathbf X}$.
In signal processing, a length-$N$ moving average (MA) filter transforms the input sequence $X_i$ into an output sequence $Y_i$ by the discrete-time convolution $\circledast$, i.e.,
$$Y_i=\sum_{j=0}^{N-1}X_{i-j}=[\underbrace{1\ 1\ \ldots\ 1}_{N}]\circledast X_i.$$
The (auto-) correlations of the input and output sequences are related by Lemma 3, i.e.,
$$C_Y(i)=\mathbf 1_N\circledast\mathbf 1_N\circledast C_x(i)=\sum_{j=-N+1}^{N-1}(N-|j|)C_x(i-j).$$
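This covariance relation can be checked by discrete convolution. For a white input with $C_x(k)=\delta(k)$ (a simplifying assumption for the sketch), the output covariance is the triangular sequence $N-|j|$:

```python
import numpy as np

N = 4
h = np.ones(N)                       # length-N moving-average impulse response
# C_Y = h * h(-) * C_x; for white input C_x(k) = delta(k), so:
C_Y = np.convolve(h, h[::-1])        # triangular: N - |j| for |j| < N
print(C_Y)                           # [1. 2. 3. 4. 3. 2. 1.]
```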
Note that if the input process is stationary, then the input and output processes are jointly stationary.
4.2. Multiple Random Processes
The major complication with observing, evaluating, and processing multiple random processes is how to achieve their time alignment and amplitude normalization (scaling). Focusing here on the time alignment problem only, denote the discrete time observation instances of $L$ random processes as,
$$\mathbf t_l=\{t_{l1}<t_{l2}<\ldots<t_{lN_l}\},\quad l=1,2,\ldots,L.$$
Assume that the first time instance $t_{l1}$ of every process serves as a reference. Then, there are $(L-1)$ uncertainties in the time alignment of the $L$ processes, i.e.,
$$\Delta_l=(t_{l1}-t_{11})\in\mathbb{R},\quad l=2,3,\ldots,L.$$
The $(L-1)$ values $\Delta_l$ are unknown parameters which must be estimated. Note also that the difference,
$$\Delta_l-\Delta_k=t_{l1}-t_{k1}$$
represents an unknown time shift between the processes $x_l(t)$ and $x_k(t)$.
For any multivariate stationary distribution of the observations of two random processes, the corresponding cross-correlation normally attains a maximum for some time shift between these processes [12]. Hence, a usual strategy for aligning the observed sequences is to locate the maximum value of their cross-correlation. The time shifts $\Delta_l$, $l=1,2,\ldots,L$, are then estimated as,
$$\hat\Delta_l=\arg\max_{\Delta}C_{x_1x_l}(\Delta),\quad\Delta\in\{(t_{li}-t_{11})\}_{i=1,\ldots,N_l}.$$
Assuming the center values $\bar X_1$ and $\bar X_2$ in (18) as scalar representations of the vectors $\mathbf X_1$ and $\mathbf X_2$, their cross-covariance can be computed as,
$$\operatorname{cov}[|\mathbf X_1|_1,|\mathbf X_2|_1]=N^2\operatorname{cov}[\bar X_1,\bar X_2]=N^2E[(\bar X_1-E[\bar X_1])(\bar X_2-E[\bar X_2])].$$
The task now is how to generalize the pairwise cross-covariance (22) to the case of multiple random vectors having possibly different lengths. If all random vectors of interest are concatenated into one single vector, the m-th joint central sum-moment can be defined by utilizing Claim 1.
Definition 5.
The $m$-th central sum-moment for $L$ random processes with $N_l$ observations, $l=1,2,\ldots,L$, is computed as,
$$\Sigma\mu_m(\mathbf X_1,\ldots,\mathbf X_L)=\mu_m(|\mathbf X_1|_1+\ldots+|\mathbf X_L|_1)=E\Big[\Big(\sum_{l=1}^L\sum_{i=1}^{N_l}(X_{li}-\bar X_{li})\Big)^m\Big].$$
Lemma 12.
The second central sum-moment for $L$ random processes with $N_l$ observations, $l=1,2,\ldots,L$, is equal to the sum of all pairwise covariances, i.e.,
$$\Sigma\mu_2(\mathbf X_1,\ldots,\mathbf X_L)=\sum_{l,k=1}^L\sum_{i=1}^{N_l}\sum_{j=1}^{N_k}\operatorname{cov}[X_{li},X_{kj}]=\sum_{l,k=1}^L|\operatorname{cov}[\mathbf X_l,\mathbf X_k]|_1=\operatorname{var}[|\mathbf X_1|_1+\ldots+|\mathbf X_L|_1].$$
Proof. The expression can be obtained by expanding the sum and then applying the expectation. □
Many other properties of the central and noncentral sum-moments can be obtained. For example, assuming two equal-length vectors $\mathbf X_1$ and $\mathbf X_2$, it is straightforward to show that,
$$E[(|\mathbf X_1|_1+|\mathbf X_2|_1)^2]-E[|\mathbf X_1|_1^2+|\mathbf X_2|_1^2]=2\sum_{i,j=1}^N E[X_{1i}X_{2j}],$$
$$E[(|\mathbf X_1|_1+|\mathbf X_2|_1)^2]-E[\|\mathbf X_1-\mathbf X_2\|_2^2]=\sum_{\substack{i,j=1\\ i\ne j}}^N E[X_{1i}X_{1j}+X_{2i}X_{2j}]+2\sum_{i,j=1}^N E[X_{1i}X_{2j}]+2\sum_{i=1}^N E[X_{1i}X_{2i}].$$
5. Illustrative Examples
This section provides examples to quantitatively evaluate the results developed in the previous sections. In particular, the accuracy of approximate linear LS regression proposed in Section 3.1 is assessed to justify its lower computational complexity. The central sum-moments introduced in Section 4 are compared assuming correlated Gaussian processes. Finally, several signal processing problems involving the 1st order Markov process are investigated.
5.1. Linear Regression
Consider a classical one-dimensional linear LS regression problem with independent and identically normally distributed errors. The errors are also assumed to be independent of all other data model parameters. The data points are generated as,
$$X_i=\Delta\,i:\quad Y_i=P_2X_i+P_1+E_i,\quad i=1,2,\ldots,N$$
where $E_i$ are zero-mean, uncorrelated Gaussian samples having the equal variance $\sigma_e^2$, and $P_1$ and $P_2$ are unknown parameters to be estimated. This LS problem can be solved exactly using the expression (2), and substituting $w_{1i}=1$ and $w_{2i}=\Delta i$, $\forall i=1,2,\ldots,N$. Alternatively, to avoid inverting the $N\times N$ data matrix, the procedure devised in Section 3.1 suggests splitting the data into two equal-size subsets, computing the average data point for each subset, and then solving the corresponding set of two equations with two unknowns. Specifically, the set of two equations with the unknown parameters $\hat P_1$ and $\hat P_2$ is,
$$\frac{2}{N}\sum_{i=1}^{N/2}Y_i=\hat P_1+\hat P_2\frac{\Delta}{4}(2+N),\qquad\frac{2}{N}\sum_{i=N/2+1}^{N}Y_i=\hat P_1+\hat P_2\frac{\Delta}{4}(2+3N)$$
assuming $N$ is even, and using, $\sum_{i=1}^{N/2}\Delta i=\frac{\Delta}{8}N(2+N)$, and $\sum_{i=N/2+1}^{N}\Delta i=\frac{\Delta}{8}N(2+3N)$. Denoting $\bar Y_1=\frac{2}{N}\sum_{i=1}^{N/2}Y_i$ and $\bar Y_2=\frac{2}{N}\sum_{i=N/2+1}^{N}Y_i$, the closed-form solution of (23) is,
$$\hat P_1=\frac{1}{2N}\big[(2+3N)\bar Y_1-(2+N)\bar Y_2\big],\qquad\hat P_2=\frac{2}{\Delta N}(\bar Y_2-\bar Y_1).$$
As a numerical example, assume the true values $P_1=1.5$, $P_2=0.3$, $E[E_i]=0$, $\operatorname{var}[E_i]=1$, and $N=40$ data points. Figure 2 shows the intervals $(\bar T-\sqrt{\operatorname{var}[T]},\ \bar T+\sqrt{\operatorname{var}[T]})$ versus the subset size $1\le N_1\le N/2$ for the random variable $T$ defined as,
$$T=100\,\frac{S_{\mathrm{apr}}(N)-S_{\mathrm{opt}}(N)}{S_{\mathrm{opt}}(N)}$$
where $S_{\mathrm{apr}}(N)=\sum_{i=1}^N(Y_i-\hat P_1-\hat P_2X_i)^2$ and $S_{\mathrm{opt}}(N)=\sum_{i=1}^N(Y_i-P_1-P_2X_i)^2$ are the total MSEs. In the limit, $\lim_{N\to\infty}(S_{\mathrm{apr}}(N)-S_{\mathrm{opt}}(N))=0$, since a sufficiently large subset of the data is as good as the complete set of data. For finite $N$, it is likely that $S_{\mathrm{apr}}(N)>S_{\mathrm{opt}}(N)$, so the lower bounds in Figure 2 converge much faster to zero than the upper bounds.
5.2. Comparison of Central Moments
Assuming Lemma 11 and Equation (3), the second central sum-moment of the 2nd order Markov process of length $N$ is,
$$\Sigma\mu_2(\mathbf X)=\sum_{i,j=1}^N C_{2MP}(i-j)=\sigma^2\Big(N+2\sum_{i=1}^{N-1}i\,\big(1+(N-i)\alpha\big)e^{-\alpha(N-i)}\Big).$$
These moments are compared in Figure 3 for different values of the sequence length $N$, three values of the parameter $\alpha$, and assuming $\sigma^2=1$. It can be observed that the values of the second central sum-moment increase with $N$ and with the level of correlation, $e^{-\alpha}$.
Consider now the following three central moments of order $m=1,2,\ldots$, i.e.,
$$\bar S_{\mathrm{mnk}}(N)=\sum_{i=1}^N E[|NX_i|^m],\qquad\Sigma\mu_m(N)=E\Big[\Big(\sum_{i=1}^N X_i\Big)^m\Big],\qquad\Sigma\tilde\mu_m(N)=E\Big[\Big(\sum_{i=1}^N|X_i|\Big)^m\Big].$$
The moment, $\bar S_{\mathrm{mnk}}$, is the mean Minkowski distance; the scaling by $N$ is introduced to facilitate the comparison with the other two moments, i.e., the mean sum-moment $\Sigma\mu_m$, and the mean sum-moment $\Sigma\tilde\mu_m$ having the samples summed as the $l_1$-norm. More importantly, assuming a correlated Gaussian process $X_i=x(t_i)$, the central sum-moment can be readily obtained in a closed form whereas obtaining the closed-form expressions for the other two metrics may be mathematically intractable. In particular, by factoring the covariance matrix as, $\mathbf C_x=\mathbf T\mathbf T^T$, the correlated Gaussian vector can be expressed as, $\mathbf X=\mathbf T\mathbf U$. Then the sum of the elements, $|\mathbf X|_1=\mathbf 1^T\mathbf T\mathbf U$, and the $m$-th central sum-moment can be computed as,
$$\Sigma\mu_m(N)=E\big[|\mathbf 1^T\mathbf T\mathbf U|^m\big]=\|\mathbf 1^T\mathbf T\|_2^m\,E[|U|^m]=\|\mathbf 1^T\mathbf T\|_2^m\,\frac{2^{m/2}}{\sqrt\pi}\,\Gamma\Big(\frac{m+1}{2}\Big)$$
where $U$ is a zero-mean, unit-variance Gaussian random variable, and $\Gamma$ denotes the gamma function.
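The closed form above is easy to cross-check by simulation. A sketch for the 1st order Markov covariance with $\sigma^2=1$ (parameter values arbitrary):

```python
import numpy as np
from math import gamma, pi, sqrt

alpha, N, m = 0.2, 16, 4
k = np.arange(N)
C = np.exp(-alpha * np.abs(k[:, None] - k[None, :]))  # 1st order Markov, sigma^2 = 1
T = np.linalg.cholesky(C)
s = np.linalg.norm(np.ones(N) @ T)                    # ||1^T T||_2

closed_form = s**m * 2**(m / 2) / sqrt(pi) * gamma((m + 1) / 2)

rng = np.random.default_rng(5)
U = rng.standard_normal((N, 200_000))
mc = np.mean(np.abs(np.ones(N) @ (T @ U))**m)         # Monte Carlo estimate
print(closed_form, mc)                                # should agree closely
```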
Figure 4 shows all three moments as a function of the sequence length $N$ for three values of the parameter $\alpha$ assuming the 1st order Markov process. The vertical axis in Figure 4 is scaled by $1/N$ for convenience. Note that, for uncorrelated, i.e., independent Gaussian samples, the moments $\Sigma\mu_m$ and $\Sigma\tilde\mu_m$ are identical. More importantly, all three moments are strictly increasing with the number of samples $N$ and with the moment order $m$.
Finally, the second central sum-moment can be visualized in two dimensions. Consider $N=3$ observations of a zero-mean real-valued stationary random process, $X_i=x(t_i)$, $i=1,2,3$. Let $\Sigma\mu_2=E[(X_1+X_2+X_3)^2]=\sum_{i,j=1}^3E[X_iX_j]$. Assuming the 1st order Markov correlation model, $E[X_iX_j]\propto e^{-0.5|\tau|}$, Figure 5 shows the values of the central sum-moment $\Sigma\mu_2$ versus the sample distances $\tau_1=(t_2-t_1)$ and $\tau_2=(t_3-t_1)$. Several symmetries can be observed in Figure 5. In particular, the central sum-moment $\Sigma\mu_2$ is symmetric about the axis $\tau_1=\tau_2$ as well as about the axis $\tau_1=-\tau_2$. These symmetries are consequences of the following equalities:
$$\Sigma\mu_2(\tau_1,\tau_2)=\Sigma\mu_2(\tau_2,\tau_1),\qquad\Sigma\mu_2(\tau_1,\tau_2)=\Sigma\mu_2(-\tau_1,-\tau_2)\quad\forall\tau_1,\tau_2.$$
5.3. Signal Processing Problems for the 1st Order Markov Process
Consider the 1st order Markov process observed at the output of a length-$N$ MA filter. According to (21), the (auto-) covariance of the output process is,
$$C_{1MP+MA}(k;\alpha)=\sum_{j=-N+1}^{N-1}(N-|j|)\,\sigma^2e^{-\alpha|k-j|}.$$
Conjecture 1. The MA filtering of the 1st order Markov process generates nearly a 2nd order Markov process.
The parameter of the 2nd order Markov process approximating the combined (auto-) covariance (24) can be obtained using the LS regression fit, i.e.,
$$\hat\alpha=\arg\min_{\tilde\alpha}\sum_k\big(C_{1MP+MA}(k;\alpha)-C_{2MP}(k;\tilde\alpha)\big)^2.$$
Substituting (3) and (24) into (25), and setting the first derivative equal to zero, the LS estimate $\hat\alpha$ must satisfy the linear equation,
$$\sum_k\Big(\hat\alpha|k|+1+W_{-1}\Big({-}\frac{C_{1MP+MA}(k;\alpha)}{e}\Big)\Big)=0$$
which can be readily solved for $\hat\alpha$; here, $W_{-1}$ denotes the lower branch of the Lambert function [24].
A discrete time sequence of $N$ elements has the (auto-) covariance constrained to $(2N-1)$ time indexes as indicated in (24). Assuming the length-$N$ MA filter, and that there are $(n_xN)$ samples, $n_x=1,2,\ldots$, of the 1st order Markov process available, the (auto-) covariance (24) has the overall length of $2N(n_x+1)-3$ samples. Figure 6 compares the MSE,
$$\mathrm{MSE}(n_x)=100\times\frac{\sum_k\big(C_{1MP+MA}(k;\alpha)-C_{2MP}(k;\hat\alpha)\big)^2}{\sum_k C_{1MP+MA}^2(k;\alpha)}$$
of the LS fit of the (auto-) covariance of the 2nd order Markov process to the combined (auto-) covariance of the 1st order Markov process and the impulse response of the MA filter, assuming three values of $\alpha$ and two values of $n_x$. Given $\alpha$ and $n_x$, Figure 6 shows that the best LS fit occurs for a certain value of the MA filter length $N$. It can be concluded that, in general, the 1st order Markov process changes to the 2nd order Markov process at the output of the MA filter.
The second problem to investigate is the linear MMSE (LMMSE) prediction of the 1st order Markov process observed at the output of an MA filter. In particular, given $N$ samples $X_i=x(t_i)$, $i=1,2,\ldots,N$, of a random process having the (auto-) covariance (24), the task is to predict its future value, $X_{N+1}=x(t_{N+1})$, $t_{N+1}>t_N$.
In general, the impulse response $\mathbf h$ of the LMMSE filter to estimate an unknown scalar parameter $P$ from the measurements $\mathbf X$ is computed as [15],
$$\mathbf h=\big(E[(\mathbf X-\bar{\mathbf X})(\mathbf X-\bar{\mathbf X})^T]\big)^{-1}E[P\,\mathbf X].$$
Here, the unknown parameter $P=X_{N+1}$, and $E[X_{N+1}X_i]=C_{1MP+MA}(N+1-i)$ and $E[X_iX_j]=C_{1MP+MA}(i-j)$ in (26), which gives the length-$N$ LMMSE filter,
$$\mathbf h=\big[\underbrace{0\ 0\ \ldots\ 0}_{N-1}\ \ C_{1MP+MA}(1)/\sigma_X^2\big].$$
Consequently, the predicted value, $\hat X_{N+1}=X_N\,C_{1MP+MA}(1)/\sigma_X^2$. Note that the same procedure, but excluding the MA filter, gives the LMMSE estimate, $\hat X_{N+1}=X_N\,C_{1MP}(1)/\sigma_X^2$.
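A generic one-step LMMSE predictor following the formula (26) can be sketched as below (names illustrative); any valid autocovariance function can be supplied, and the 1st order Markov case recovers the expected single-tap predictor:

```python
import numpy as np

def lmmse_predictor(cov, N):
    """Coefficients h of the one-step LMMSE predictor X_{N+1} ~ h^T X.

    cov : function returning the (auto-)covariance C(k) at integer lag k.
    """
    i = np.arange(1, N + 1)
    C_XX = np.array([[cov(a - b) for b in i] for a in i])   # E[X_i X_j]
    c_XP = np.array([cov(N + 1 - a) for a in i])            # E[X_{N+1} X_i]
    return np.linalg.solve(C_XX, c_XP)

# 1st order Markov example: C(k) = sigma^2 * a^|k|  =>  h = [0, ..., 0, a]
a = 0.8
h = lmmse_predictor(lambda k: a ** abs(k), N=5)
print(np.round(h, 6))   # last coefficient a, all others ~ 0
```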
The last problem to consider is the time alignment of two zero-mean, jointly stationary processes. It is assumed that the normalized cross-covariance of these two processes is,
$$\frac{E[X_{1i}X_{2j}]}{\sqrt{E[X_{1i}^2]E[X_{2i}^2]}}=e^{-\alpha|i-j|}\big(1+\alpha|i-j|\big).$$
Denote the uncertainty in determining the difference, $(i-j)$, as $\Delta$. In order to estimate the unknown parameters $\alpha$ and $\Delta$, the left-hand side of (27) can be estimated by the method of moments, i.e., let $E[X_i^2]\approx\frac{1}{N}\sum_{i=1}^N X_i^2$, and, $E[X_{1i}X_{2j}]\approx\frac{1}{N}\sum_{i=1}^N X_{1i}X_{2|i-\Delta|}$. The cross-covariance (27) can then be rewritten as,
$$v_k=\frac{\sum_{i=1}^N X_{1i}X_{2|i-\Delta-k|}}{\sqrt{\sum_{i=1}^N X_{1i}^2\sum_{i=1}^N X_{2i}^2}}=e^{-\alpha|\Delta+k|}\big(1+\alpha|\Delta+k|\big),\quad k=0,1,2,\ldots$$
Utilizing the Lambert function $W_{-1}$, the cross-covariance can be rewritten further as,
$$\alpha|\Delta+k|=-1-W_{-1}\Big({-}\frac{v_k}{e}\Big)\triangleq\tilde v_k,\quad k=0,1,2,\ldots$$
Assuming, without loss of generality, that $\Delta\ge 0$, the absolute value in (28) can be ignored. Consequently, the unknown parameters $\alpha$ and $\Delta$ can be obtained as a linear LS fit to the $N$ measured and calculated values $\tilde v_k$ in the linear model (28).
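Assuming the normalized cross-covariances $v_k$ have already been measured, the final estimation step can be sketched as follows; scipy.special.lambertw provides the $W_{-1}$ branch, and $\alpha$ and $\Delta$ follow from an ordinary straight-line fit of $\tilde v_k=\alpha\Delta+\alpha k$:

```python
import numpy as np
from scipy.special import lambertw

def estimate_alpha_delta(v):
    """Fit v_k = exp(-alpha(Delta+k))(1 + alpha(Delta+k)), k = 0, 1, ...,
    assuming Delta >= 0, by linearizing with the Lambert W_{-1} branch."""
    k = np.arange(len(v))
    v_tilde = np.real(-1.0 - lambertw(-np.asarray(v) / np.e, k=-1))
    # v_tilde = alpha*Delta + alpha*k : an ordinary straight-line LS fit
    slope, intercept = np.polyfit(k, v_tilde, 1)
    return slope, intercept / slope          # alpha_hat, Delta_hat

# Synthetic check with alpha = 0.3, Delta = 2.0
alpha, Delta = 0.3, 2.0
k = np.arange(10)
v = np.exp(-alpha * (Delta + k)) * (1 + alpha * (Delta + k))
print(estimate_alpha_delta(v))               # approximately (0.3, 2.0)
```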
6. Conclusions
The development of a novel statistical measure to enable the correlation analysis of multiple random vectors commenced by summarizing background knowledge on the statistical description of discrete time random processes. This was then extended with the derivation of several supporting results which were used in the subsequent sections. Specifically, it was shown that linear regression can be effectively approximated by splitting the data into disjoint subsets and assuming only one average data point within each subset. In addition, a procedure for generating multiple Gaussian processes with the prescribed autocovariance and cross-covariance was devised. The main result of the paper was obtained by assuming the Taylor expansion of multivariate symmetric scalar functions, and then approximating the Taylor expansion by a univariate polynomial. The single polynomial variable is a simple sum of the variables of the original multivariate function. The polynomial approximation represents a mapping from multiple discrete time observations of a random process to a multidimensional scalar field. The mean field value is a weighted sum of canonical central moments with increasing orders. These moments were named central sum-moments to reflect how they are defined. The sum-moments were then discussed in light of other similar concepts such as the total variation, Mean-Field Theory, and moving average sequence filtering. Illustrative examples were studied in the last section of the paper. In particular, the accuracy of the approximate linear regression was evaluated quantitatively assuming two disjoint data subsets. Assuming the 1st and the 2nd order Markov processes, the central sum-moments were compared with the mean Minkowski distance. For Gaussian processes, the central sum-moments can be obtained in closed form. The remaining problems investigated the moving average filtering of the 1st order Markov process and its prediction using a linear MMSE filter.
Figure 1. The exact ($LS_{\mathrm{opt}}$) and the reduced complexity ($LS_{\mathrm{apr}}$) linear LS regression.
Figure 2. The relative total mean-square error of the approximate linear LS regression.
Figure 3. The second central sum-moments of the 2nd order Markov process with parameter $\alpha$ and length $N$.
Figure 4. The Minkowski (blue), sum-moment (black), and sum-moment of absolute values (red) mean statistics for the 1st order Markov sequence of length $N$. Columns: different values of $\alpha$. Rows: different values of $m$.
Figure 5. The second central sum-moment as a function of the time differences between $N=3$ observations of a stationary random process.
Figure 6. The MSE of approximating the (auto-) covariance of the 1st order Markov process at the output of a length-$N$ MA filter by the (auto-) covariance of the 2nd order Markov process.
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Not applicable.
Conflicts of Interest
The author declares no conflict of interest.
Abbreviations
The following abbreviations are used in the paper:
| 1MP | 1st order Markov process |
| 2MP | 2nd order Markov process |
| AR | autoregressive |
| LMMSE | linear minimum mean square error |
| LS | least squares |
| MMSE | minimum mean square error |
| MSE | mean square error |
| MA | moving average |
| TV | total variation |
| $\lvert\cdot\rvert$ | absolute value, set cardinality, sum of vector elements |
| $E[\cdot]$ | expectation |
| $\operatorname{corr}[\cdot]$ | correlation |
| $\operatorname{cov}[\cdot]$ | covariance |
| $\operatorname{var}[\cdot]$ | variance |
| $(\cdot)^{-1}$ | matrix inverse |
| $(\cdot)^T$ | matrix/vector transpose |
| $f_x$ | distribution of a random variable $X$ |
| $f,\dot f,\ddot f$ | function $f$, and its first and second derivatives |
| $\mathbb{N}^+$ | positive non-zero integers |
| $\mathbb{R},\mathbb{R}^+$ | real numbers, positive real numbers |
| $\bar X$ | mean value of random variable $X$ |
| $X_{ij}$ | $j$-th sample of process $i$ |
| $W_{-1}$ | Lambert function |
1. Drezner, Z. Multirelation-A correlation among more than two variables. Comput. Stat. Data Anal. 1995, 19, 283-292.
2. Dear, R.; Drezner, Z. On the significance level of the multirelation coefficient. J. Appl. Math. Decis. Sci. 1997, 1, 119-130.
3. Geiß, S.; Einax, J. Multivariate correlation analysis-A method for the analysis of multidimensional time series in environmental studies. Chemom. Intell. Lab. Syst. 1996, 32, 57-65.
4. Abdi, H. Chapter Multiple correlation coefficient. In Encyclopedia of Measurements and Statistics; SAGE: Los Angeles, CA, USA, 2007; pp. 648-655.
5. Székely, G.J.; Rizzo, M.L.; Bakirov, N.K. Measuring and testing dependence by correlation of distances. Ann. Stat. 2007, 35, 2769-2794.
6. Merigó, J.M.; Casanovas, M. A New Minkowski Distance Based on Induced Aggregation Operators. Int. J. Comput. Intell. Syst. 2011, 4, 123-133.
7. Nguyen, H.V.; Müller, E.; Vreeken, J.; Efros, P.; Böhm, K. Multivariate maximal correlation analysis. In Proceedings of the International Conference on Machine Learning, Beijing, China, 21-26 June 2014; pp. II-775-II-783.
8. Josse, J. Measuring multivariate association and beyond. Stat. Surv. 2016, 10, 132-167.
9. Shu, X.; Zhang, Q.; Shi, J.; Qi, Y. A comparative study on weighted central moment and its application in 2D shape retrieval. Information 2016, 7, 10.
10. Böttcher, B.; Keller-Ressel, M.; Schilling, R.L. Distance multivariance: New dependence measures for random vectors. Ann. Stat. 2019, 47, 2757-2789.
11. Wang, J.; Zheng, N. Measures of correlation for multiple variables. arXiv 2020, arXiv:1401.4827.
12. Gardner, W.A. Introduction to Random Processes With Applications, 2nd ed.; McGraw-Hill: New York, NY, USA, 1990.
13. Papoulis, A.; Pillai, S.U. Probability, Random Variables, and Stochastic Processes, 4th ed.; McGraw-Hill: New York, NY, USA, 2002.
14. Giller, G.L. The Statistical Properties of Random Bitstreams and the Sampling Distribution of Cosine Similarity. SSRN Preprint 2012.
15. Kay, S.M. Fundamentals of Statistical Signal Processing: Estimation Theory; Prentice Hall: Upper Saddle River, NJ, USA, 1993; Volume I.
16. Oppenheim, A.V.; Schafer, R.W.; Buck, J.R. Discrete-Time Signal Processing, 3rd ed.; Prentice Hall: Upper Saddle River, NJ, USA, 2009.
17. Boyd, S.; Vandenberghe, L. Convex Optimization; Cambridge University Press: Cambridge, UK, 2004.
18. Folland, G.B. Higher-Order Derivatives and Taylor's Formula in Several Variables. Available online: https://sites.math.washington.edu/~folland/Math425/taylor2.pdf (accessed on 1 December 2020).
19. Apostol, T. Mathematical Analysis, 2nd ed.; Pearson: London, UK, 1973.
20. Yedidia, J.S. chapter An idiosyncratic journey beyond mean field theory. In Advanced Mean Field Methods; MIT Press: Cambridge, MA, USA, 2001; pp. 21-36.
21. Sobol, I.M. Global sensitivity indices for nonlinear mathematical models and their Monte Carlo estimates. Math. Comput. Simul. 2001, 55, 271-280.
22. Saltelli, A.; Ratto, M.; Tarantola, S.; Campolongo, F. Sensitivity analysis for chemical models. Chem. Rev. 2005, 105, 2811-2828.
23. Shirali, S. A Concise Introduction to Measure Theory; Springer: Berlin/Heidelberg, Germany, 2018.
24. Abramowitz, M.; Stegun, I.A. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables; Dover: Mineola, NY, USA, 1974.
Pavel Loskot
Zhejiang University/University of Illinois at Urbana-Champaign Institute, Haining 314400, China
Abstract
The paper investigates the problem of performing a correlation analysis when the number of observations is large. In such a case, it is often necessary to combine random observations to achieve dimensionality reduction of the problem. A novel class of statistical measures is obtained by approximating the Taylor expansion of a general multivariate scalar symmetric function by a univariate polynomial in the variable given as a simple sum of the original random variables. The mean value of the polynomial is then a weighted sum of statistical central sum-moments with the weights being application dependent. Computing the sum-moments is computationally efficient and amenable to mathematical analysis, provided that the distribution of the sum of random variables can be obtained. Among several auxiliary results also obtained, the first order sum-moments corresponding to sample means are used to reduce the numerical complexity of linear regression by partitioning the data into disjoint subsets. Illustrative examples provided assume the first and the second order Markov processes.




