Introduction
Many natural processes are sufficiently complex that a stochastic model is essential, or at the very least an efficient description. Such a process is specified by several properties, of which a particularly important one is the degree of memory in a time series, often expressed through a characteristic autocorrelation time over which fluctuations decay in magnitude. In this paper, however, we are concerned with specific types of stochastic processes that are capable of possessing long memory (LM): the notion of there being correlation between the present and all points in the past. A standard definition is that a (finite variance, stationary) process has long memory if its autocorrelation function (ACF) has power-law decay: $\rho(k) \sim c_\rho k^{2d-1}$ as $k \to \infty$, for some non-zero constant $c_\rho$, and where $0 < d < \frac{1}{2}$. The parameter $d$ is the memory parameter; if $d = 0$ the process does not exhibit long memory, while if $-\frac{1}{2} < d < 0$ the process is said to be anti-persistent.
The asymptotic power-law form of the ACF corresponds to an absence of a characteristic decay timescale, in striking contrast to many standard (stationary) stochastic processes where the effect of each data point decays so fast that it rapidly becomes indistinguishable from noise. An example of the latter is the exponential ACF, where the e-folding timescale sets a characteristic correlation time. The study of processes that do possess long memory is important because they exhibit unusual properties, because many familiar mathematical results fail to hold, and because of the numerous examples of data sets where LM is seen.
The study of long memory originated in the 1950s in the field of hydrology, where studies of the levels of the Nile demonstrated anomalously fast growth of the rescaled range of the time series. After protracted debates about whether this was a transient (finite time) effect (for a detailed exposition of this period of mathematical history, see ), the mathematical pioneer Benoît B. Mandelbrot showed that if one retained the assumption of stationarity, novel mathematics would be essential to explain the Hurst effect. In doing so he rigorously defined the concept of long memory. Most research into long memory and its properties has been based on classical statistical methods, spanning parametric, semi-parametric, and non-parametric modeling.
Towards easing the computational burden, we focus on the autoregressive fractionally integrated moving average (ARFIMA) class of processes as the basis of developing a systematic and unifying Bayesian framework for modeling a variety of common time series phenomena, with particular emphasis on (marginally) detecting potential long-memory effects (i.e., averaging over short-memory and seasonal effects). ARFIMA has become very popular in statistics and econometrics because it is generalizable and its connection to the autoregressive moving average (ARMA) family and to fractional Gaussian noise is relatively transparent. A key property of ARFIMA is its ability to simultaneously yet separately model long and short memory.
Here we present a Bayesian framework for the efficient and systematic estimation of the ARFIMA parameters. We provide a new approximate likelihood for ARFIMA processes that can be computed quickly for repeated evaluation on large time series, and which underpins an efficient Markov chain Monte Carlo (MCMC) scheme for Bayesian inference. Our sampling scheme is best described as a modernization of a blocked MCMC scheme proposed by – adapting it to the approximate likelihood and extending it to handle a richer form of (known) short-memory effects. We then further extend the analysis to the case where the short-memory form is unknown, which requires trans-dimensional MCMC, in which the model order (the $p$ and $q$ parameters in the ARFIMA model) varies and, thus, so does the dimension of the problem. This aspect is similar to the work of , who considered the simpler autoregressive integrated moving average (ARIMA) model class, and to , who worked with a non-parametric long-memory process. Our contribution has aspects in common with , who presented a more limited method focused on model selection rather than averaging. The advantage of averaging is that the unknown form of short-memory effects can be integrated out, focusing on long memory without conditioning on nuisance parameters.
The aim of this paper is to introduce an efficient Bayesian algorithm for the inference of the parameters of the ARFIMA($p$,$d$,$q$) model, with particular emphasis on the LM parameter $d$. Our Bayesian inference algorithm has been designed in a flexible fashion so that, for instance, the innovations can come from a wide class of different distributions, e.g., $\alpha$-stable or $t$ distributions (to be published in a companion paper). The remainder of the paper is organized as follows. Section discusses the important numerical calculation of likelihoods, representing a hybrid between earlier classical statistical methods and our new contributions towards a fully Bayesian approach. Section describes our proposed Bayesian framework and methodology in detail, focusing on long memory only. Then, in Sect. , we consider extensions for additional short memory and the computational techniques required to integrate them out. Empirical illustration and comparison of all methods is provided in Sect. via synthetic and real data, including the Nile water level data and the central England temperature (CET) time series, with favorable comparison to the standard estimators. In the case of the Nile data, we find strong evidence for long memory. The CET analysis requires a slight extension to handle seasonal long memory, and we find that the situation there is more nuanced in terms of evidence for long memory. The paper concludes with a discussion in Sect. focused on the potential for further extension.
Likelihood evaluation for Bayesian inference
ARFIMA model
We provide here a brief review of the ARFIMA model. More details are given in Appendix .
An ARFIMA model is given by $\Phi(B)(1-B)^d X_t = \Theta(B)\varepsilon_t$. We define the backshift operator $B$ by $BX_t = X_{t-1}$, and powers of $B$ are defined iteratively: $B^k X_t = B^{k-1}(BX_t) = X_{t-k}$. $\Phi(B)$ is the autoregressive component and $\Theta(B)$ is the moving average component; together they constitute the short-memory components of the ARFIMA model. These are defined in more detail in Appendix and in .
Likelihood function
For now, we restrict our attention to a Bayesian analysis of an ARFIMA(0,$d$,0) process, having no short-ranged ARMA components ($p = q = 0$), placing emphasis squarely on the memory parameter $d$. As we explain in our Appendix, the resulting process is identical to a fractionally integrated process with memory parameter $d$.
Here we develop an efficient new scheme for evaluating the (log) likelihood, via approximation. Throughout, the reader should suppose that we have observed the vector $x = (x_1, \dots, x_n)$ as a realization of a stationary, causal and invertible ARFIMA(0,$d$,0) process with mean $\mu$. The innovations will be assumed to be independent, and taken from a zero-mean location-scale probability density $f(\cdot\,; 0, \sigma, \lambda)$, which means the density can be written as $f(x; \mu, \sigma, \lambda) = \frac{1}{\sigma} f\big(\frac{x - \mu}{\sigma}; 0, 1, \lambda\big)$. The parameters $\mu$ and $\sigma$ are called the location and scale parameters, respectively. The $m$-dimensional $\lambda$ is a shape parameter (if it exists, i.e., $m > 0$). A common example is the Gaussian $N(\mu, \sigma^2)$, where $m = 0$ and there is no $\lambda$. We classify the four parameters $\mu$, $\sigma$, $d$, and $\lambda$ into three distinct classes: (1) the mean of the process, $\mu$; (2) innovation distribution parameters, $(\sigma, \lambda)$; and (3) memory structure, $\bar{d}$. Together, $\psi = (\mu, \sigma, \lambda, \bar{d})$, where $\bar{d}$ will later encompass the short-range parameters $\phi$ and $\theta$; for now $\bar{d} = d$.
Our proposed likelihood approximation uses a truncated autoregressive AR($\infty$) approximation (cf. ). We first re-write the AR($\infty$) representation of ARFIMA(0,$d$,0) to incorporate the unknown parameter $\mu$: $X_t - \mu = \sum_{j=1}^{\infty} \pi_j (X_{t-j} - \mu) + \varepsilon_t$, where the coefficients $\pi_j$ are functions of $d$. Then we truncate this AR($\infty$) representation to obtain an AR($P$) one, with $P$ large enough to retain low frequency effects, e.g., $P = n$. Rearranging terms, we obtain the following modified model: $X_t = \mu + \sum_{j=1}^{P} \pi_j (X_{t-j} - \mu) + \varepsilon_t$.
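The coefficients $\pi_j$ follow a simple one-term recursion arising from the binomial expansion of $(1-B)^d$ (see the Appendix), so the truncated coefficient vector is cheap to compute even for $P = n$. A minimal R sketch (our illustration, not the authors' code):

```r
## AR(P) coefficients of an ARFIMA(0,d,0) process. The binomial expansion
## (1 - B)^d = sum_j pi_j B^j has pi_0 = 1 and pi_j = pi_{j-1} (j - 1 - d) / j,
## so that X_t = sum_{j>=1} (-pi_j) X_{t-j} + eps_t.
arfima_ar_coefs <- function(d, P) {
  pi_j <- numeric(P)
  prev <- 1                      # pi_0
  for (j in 1:P) {
    prev <- prev * (j - 1 - d) / j
    pi_j[j] <- prev
  }
  -pi_j                          # AR coefficients in the representation above
}

## Example: the coefficients decay like a power law, which is why a large
## truncation point P (e.g., P = n) is needed to retain low-frequency effects.
a <- arfima_ar_coefs(0.25, 1000)
```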
It is now possible to write down a conditional likelihood. For convenience the notation $x_{s:t} = (x_s, \dots, x_t)$ for $s \le t$ will be used (and interpreted as appropriate where necessary). Denote the unobserved vector of random variables $(X_{1-P}, \dots, X_0)$ by $X_A$ (in the Bayesian context these will be auxiliary, hence "A"). Consider the likelihood as a joint density, which can be factorized as a product of conditionals. Writing $f_{X_t \mid X_{1:t-1}}$ for the density of $X_t$ conditional on $X_{1:t-1}$, we obtain $L(\psi; x) = \prod_{t=1}^{n} f_{X_t \mid X_{1:t-1}}(x_t \mid x_{1:t-1}, \psi)$.
This is still of little use because the conditional densities may have a complicated form. However, by further conditioning on $X_A$, and writing $f_{X_t \mid X_{1:t-1}, X_A}$ for the density of $X_t$ conditional on $X_{1:t-1}$ and $X_A$, we obtain $L(\psi; x \mid x_A) = \prod_{t=1}^{n} f_{X_t \mid X_{1:t-1}, X_A}(x_t \mid x_{1:t-1}, x_A, \psi)$. Returning to Eq. () observe that, conditional on both the observed and unobserved past values, $X_t$ is simply distributed according to the innovations' density with a suitable change in location: $X_t \mid x_{1:t-1}, x_A \sim f\big(\cdot\,;\, \mu + \sum_{j=1}^{P} \pi_j (x_{t-j} - \mu),\, \sigma,\, \lambda\big)$. Then, using the location-scale representation, the conditional log-likelihood is a sum of $n$ standardized innovation density evaluations.
Evaluating this expression efficiently depends upon efficient calculation of the conditional means $m_t = \mu + \sum_{j=1}^{P} \pi_j (x_{t-j} - \mu)$, for $t = 1, \dots, n$. From Eq. (), each $m_t$ is a convolution of the augmented data $(x_A, x)$ with coefficients depending on $d$, which can be evaluated quickly in the R language for statistical computing via fast convolution routines.
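For the Gaussian case the whole conditional log-likelihood then takes only a few vectorized lines. A minimal sketch, assuming the hypothetical helper arfima_ar_coefs() from above, and with x_aux denoting the auxiliary pre-sample vector $x_A$:

```r
## Conditional log-likelihood of an ARFIMA(0,d,0) series under the AR(P)
## approximation, Gaussian innovations. stats::filter() performs the
## one-sided convolution of the augmented series with the AR coefficients.
arfima_loglik <- function(x, x_aux, d, mu, sigma) {
  n <- length(x); P <- length(x_aux)
  a <- arfima_ar_coefs(d, P)                        # pi_j, j = 1..P
  z <- c(x_aux, x) - mu                             # augmented, centered series
  y <- stats::filter(z, a, method = "convolution", sides = 1)
  m <- mu + y[P:(P + n - 1)]                        # conditional means of x_1..x_n
  sum(dnorm(x, mean = m, sd = sigma, log = TRUE))
}
```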
A Bayesian approach to long-memory inference
We are now ready to consider Bayesian inference for ARFIMA(0,$d$,0) processes. Our method can be succinctly described as a modernization of the blocked MCMC method of . Isolating parameters by blocking provides significant scope for modularization, which helps to accommodate our extensions for short memory. Pairing with efficient likelihood evaluations allows much longer time series to be entertained than ever before. Our description begins with the appropriate specification of priors, which are more general than previous choices, yet still encourage tractable inference. We then provide the relevant updating calculations for all parameters, including those for the auxiliary variables $x_A$.
We follow earlier work and assume a priori independence for the components of $\psi$. Each component leverages familiar prior forms with diffuse versions as limiting cases. Specifically, we use a diffuse Gaussian prior on $\mu$: $\mu \sim N(0, \sigma_\mu^2)$, with $\sigma_\mu^2$ large. The improper flat prior is obtained as the limiting distribution when $\sigma_\mu^2 \to \infty$: $p(\mu) \propto 1$. We place a gamma prior on the precision $\sigma^{-2}$, implying a root-inverse gamma distribution for $\sigma$, with density $p(\sigma) \propto \sigma^{-(2\alpha + 1)} e^{-\beta/\sigma^2}$, $\sigma > 0$. A diffuse/improper prior is obtained as the limiting distribution when $\alpha, \beta \to 0$: $p(\sigma) \propto \sigma^{-1}$, which, in the asymptotic limit, is equivalent to a log-uniform prior. Finally, we specify $d \sim U(-\frac{1}{2}, \frac{1}{2})$.
Updating $\mu$: following , we use a symmetric random walk (RW) Metropolis–Hastings (MH) update with proposals $\mu^* \sim N(\mu, \sigma_{\mathrm{prop}}^2)$, for some $\sigma_{\mathrm{prop}}^2$. Because the proposal is symmetric, the acceptance ratio reduces to the ratio of posterior densities, evaluated under the approximate likelihood.
Updating $\sigma$: we diverge from here, who suggest independent MH with moment-matched inverse gamma proposals, having found poor performance under poor moment estimates. We instead prefer a RW MH approach, which we conduct in log space since the domain is $(0, \infty)$. Specifically, we set $\log \sigma^* = \log \sigma + \nu$, where $\nu \sim N(0, \gamma^2)$ for some $\gamma^2$. The proposal $\sigma^*$ is log-normal, and the resulting proposal-density ratio contributes a Jacobian factor $\sigma^*/\sigma$. Recalling Eq. (), the MH acceptance ratio under the approximate likelihood is then the likelihood ratio multiplied by the prior ratio and this Jacobian term.
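In code the update is compact. The following sketch (our illustration, assuming Gaussian innovations, the limiting prior $p(\sigma) \propto 1/\sigma$, and a generic log-likelihood function such as the one above) makes the cancellation explicit:

```r
## Log-space random-walk MH update for sigma. With p(sigma) ∝ 1/sigma the
## prior ratio log(sigma/sigma_star) and the log-normal proposal Jacobian
## log(sigma_star/sigma) cancel, leaving only the likelihood ratio.
update_sigma <- function(sigma, gamma, loglik, ...) {
  sigma_star <- exp(log(sigma) + rnorm(1, 0, gamma))
  log_alpha  <- loglik(sigma_star, ...) - loglik(sigma, ...)
  if (log(runif(1)) < log_alpha) sigma_star else sigma
}
```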
The MH algorithm, applied alternately in a Metropolis-within-Gibbs fashion to the parameters $\mu$ and $\sigma$, works well. However, actual Gibbs sampling is an efficient alternative in this two-parameter case (i.e., for known $d$; see ).
Update of $d$: updating the memory parameter is far less straightforward than either $\mu$ or $\sigma$. Regardless of the innovations' distribution, the conditional posterior is not amenable to Gibbs sampling. We use RW proposals from a truncated Gaussian $N_{(-1/2,\,1/2)}(d, \sigma_d^2)$, whose density is the Gaussian density renormalized by $\Phi\big(\frac{1/2 - d}{\sigma_d}\big) - \Phi\big(\frac{-1/2 - d}{\sigma_d}\big)$, where $\Phi$ and $\phi$ are the standard normal cumulative density function (CDF) and probability density function (PDF), respectively. In particular, we sample via rejection from $N(d, \sigma_d^2)$ until the variate lies in $(-\frac{1}{2}, \frac{1}{2})$. Although this may seem inefficient, it is perfectly acceptable; for example, with $\sigma_d = 0.5$ the expected number of required variates is only around 2 at worst, regardless of $d$. More refined methods of directly sampling from truncated normal distributions exist – see for example – but we find little added benefit in our context.
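A minimal sketch of the rejection sampler, together with the log proposal density (needed because truncation makes the random walk asymmetric, so the normalizing constants enter the MH ratio):

```r
## Rejection sampler for the truncated Gaussian RW proposal on d in (-1/2, 1/2).
rtnorm_reject <- function(mean, sd, a = -0.5, b = 0.5) {
  repeat {
    x <- rnorm(1, mean, sd)
    if (x > a && x < b) return(x)
  }
}

## Log proposal density, required in the MH acceptance ratio.
dtnorm_log <- function(x, mean, sd, a = -0.5, b = 0.5) {
  dnorm(x, mean, sd, log = TRUE) -
    log(pnorm(b, mean, sd) - pnorm(a, mean, sd))
}
```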
A useful cancellation in the ratio of conditional densities obtained from Eq. () simplifies the acceptance computation: denoting by $\pi_j^*$, $j = 1, \dots, P$, the coefficients implied by the proposed $d^*$, only the two convolution sums (under $\pi$ and $\pi^*$) are needed, so in the approximate case a proposal costs little more than one extra likelihood evaluation.
Optional update of $x_A$: when using the approximate likelihood method, one must account for the auxiliary variables $x_A$, a vector of length $P$ (e.g., $P = n$). We find that, in practice, it is not necessary to update all the auxiliary parameters at each iteration. In fact the method can be shown to work perfectly well, empirically, if we never update them, provided they are given a sensible initial value (such as the sample mean of the observed data, $\bar{x}$). This is not an uncommon tactic in the long-memory (big-$n$) context.
For a full-MH approach, we recommend an independence sampler that backward projects the observed time series. Specifically, first relabel the observed data by reversing its time order; then use the reversed vector to generate, via the AR($P$) recursion of Eq. (), a new vector of $P$ one-step-ahead values, where the coefficients are determined by the current value of the memory parameter(s). The proposed auxiliary vector $x_A^*$ is then taken as the reverse of this generated sequence. Since this is an independence sampler, calculation of the acceptance probability is straightforward: it is only necessary to evaluate the proposal densities of $x_A^*$ and $x_A$, which is easy using the results from Sect. . For simplicity, we prefer a uniform prior for $x_A$.
Besides simplicity, justification for this approach lies primarily in its preservation of the autocorrelation structure – this is clear since the ACF is symmetric in time. The proposed vector has a low acceptance rate, and the potential remedies (e.g., multiple-try methods) seem unnecessarily complicated given the success of the simpler method.
Extensions to accommodate short memory
Simple ARFIMA(0,$d$,0) models are mathematically convenient but have limited practical applicability because the entire memory structure is determined by just one parameter, $d$. Although $d$ is often of primary interest, it may be unrealistic to assume no short-memory effects. This issue is often implicitly acknowledged, since semi-parametric estimation methods, such as those used as comparators in Sect. , are motivated by a desire to circumvent the problem of specifying precisely (and inferring) the form of short memory (i.e., the values of $p$ and $q$ in an ARFIMA model). Full parametric Bayesian modeling of ARFIMA($p$,$d$,$q$) processes represents an essentially untried alternative, primarily due to computational challenges. Related, more discrete, alternatives show potential: considered all four models with $p, q \le 1$, whereas considered the 16 with $p, q \le 3$.
Such approaches, especially ones allowing larger $p$, $q$, can be computationally burdensome, as much effort is spent modeling unsuitable processes towards a goal (inferring $p$, $q$) that is not of primary interest ($d$ is). To develop an efficient, fully parametric, Bayesian method of inference that properly accounts for varying models, and to marginalize out these nuisance quantities, we use reversible-jump (RJ) MCMC . We extend the parameter space to include the set of models ($p$ and $q$), with chains moving between (i.e., changing $p$ and/or $q$) and within (sampling $\phi$ and $\theta$ given particular fixed $p$ and $q$) models, and focus on the marginal posterior distribution of $d$ obtained by (Monte Carlo) integration over all models and parameters therein. RJ methods, which mix so-called trans-dimensional, between-model moves with the conventional within-model ones, have previously been applied to both autoregressive models and full-ARMA models . In the long-memory context, applied RJ to fractional exponential processes (FEXP). However, for ARFIMA, the only related work we are aware of is by , who demonstrated a promising if limited alternative.
Below we show how the likelihood may be calculated with extra short-memory components when $p$ and $q$ are known, and subsequently how Bayesian inference can be applied in this case. Then, the more general case of unknown $p$ and $q$ via RJ is described. The result is a Monte Carlo inferential scheme that allows short-memory effects to be marginalized out when summarizing inferences for the main parameter of interest: $d$, for long memory.
Likelihood derivation and inference for known short memory
Recall that the short-memory components of an ARFIMA process are defined by the AR and moving average (MA) polynomials, $\Phi(\cdot)$ and $\Theta(\cdot)$, respectively (see Sect. ). Here, we distinguish between the polynomial, $\Phi(\cdot)$, and the vector of its coefficients, $\phi = (\phi_1, \dots, \phi_p)$. When the polynomial degree is required explicitly, bracketed superscripts will be used: $\Phi^{(p)}$, $\phi^{(p)}$, $\Theta^{(q)}$, $\theta^{(q)}$.
We combine the short-memory parameters $\phi$ and $\theta$ with $d$ to create a single memory parameter, $\bar{d} = (d, \phi, \theta)$. For a given unit-variance ARFIMA($p$,$d$,$q$) process, we denote its autocovariance (ACV) by $\gamma_{\bar{d}}$, with $\gamma_d$ and $\gamma_{\phi,\theta}$ those of the relevant unit-variance ARFIMA(0,$d$,0) and ARMA($p$,$q$) processes, respectively. The spectral density function (SDF) of the unit-variance ARFIMA($p$,$d$,$q$) process is written as $f_{\bar{d}}$, and its covariance matrix is $\Sigma_{\bar{d}}$.
An exact likelihood evaluation requires an explicit calculation of the ACV $\gamma_{\bar{d}}$; however, there is no simple closed form for arbitrary ARFIMA processes. Fortunately, our proposed approximate likelihood method of Sect. can be ported over directly. Given the coefficients and polynomials $\Phi$ and $\Theta$, it is straightforward to calculate the AR($P$) coefficients required, by again applying the numerical methods of .
To focus the exposition, consider the simple, yet useful, ARFIMA(1,$d$,0) model, where the full memory parameter is $\bar{d} = (d, \phi)$. Because the parameter spaces of $d$ and $\phi$ are independent, it is simplest to update each of these parameters separately: $d$ with the methods of Sect. , and $\phi$ similarly via $\phi^* \sim N_{(-1,1)}(\phi, \sigma_\phi^2)$, for some $\sigma_\phi^2$. In practice, however, the posteriors of $d$ and $\phi$ typically exhibit significant correlation, so independent proposals are inefficient. One solution would be to reparametrize to an orthogonal pair, but the interpretation of the new parameters would not be clear. An alternative to explicit reparametrization is to update the parameters jointly, but in such a way that proposals are aligned with the correlation structure. This will ensure a reasonable acceptance rate and mixing.
To propose parameters in the manner described above, a two-dimensional, suitably truncated Gaussian random walk, with covariance matrix aligned with the posterior covariance, is required. To make proposals of this sort, and indeed for arbitrary $\bar{d}$ in larger $p$ and $q$ cases, requires sampling from a hypercuboid-truncated multivariate normal (MVN) $N_H(\bar{d}, \Sigma)$, where $H$ describes the coordinates of the hypercuboid. We find that rejection sampling based on unconstrained, similarly parametrized MVN samples is adequate for this purpose.
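A minimal sketch of such a rejection sampler (our illustration; the box bounds are those used in the text, e.g., $(-\frac{1}{2}, \frac{1}{2})$ for $d$ and $(-1, 1)$ for each short-memory coordinate):

```r
## Rejection sampling from a hypercuboid-truncated MVN: draw unconstrained
## MVN variates via a Cholesky factor and keep the first one inside the box.
## Efficient when the box captures most of the proposal mass.
rmvnorm_box <- function(mean, Sigma, lower, upper) {
  L <- chol(Sigma)                                   # upper-triangular factor
  repeat {
    x <- mean + drop(crossprod(L, rnorm(length(mean))))
    if (all(x > lower) && all(x < upper)) return(x)
  }
}

## Example: joint (d, phi) proposal for an ARFIMA(1,d,0) model.
## prop <- rmvnorm_box(c(d, phi), Sigma_prop, c(-0.5, -1), c(0.5, 1))
```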
The only technical difficulty is the choice of proposal covariance matrix $\Sigma$. Ideally, it would be aligned with the posterior covariance; however, this is not a priori known. We find that running a pilot chain with independent proposals can help choose a $\Sigma$. A rescaled version of the sample covariance matrix from the pilot posterior chain, following , works well (see Sect. ).
Unknown short-memory form
We now expand the parameter space to include models $M_{p,q}$, the set of ARFIMA models with $p$ and $q$ short-memory parameters indexing the size of the parameter space. For our trans-dimensional moves, we only consider adjacent models, on which we will be more specific later. For now, note that the choice of bijective function mapping between model spaces (whose Jacobian term appears in the acceptance ratio) is crucial to the success of the sampler. To illustrate, consider transforming from $M_{p+1,q}$ down to $M_{p,q}$. This turns out to be a non-trivial problem, however, because the stationarity region of $\phi$ (for $p > 1$) has a very complicated shape. The most natural map would be $(\phi_1, \dots, \phi_p, \phi_{p+1}) \mapsto (\phi_1, \dots, \phi_p)$. However, there is no guarantee that the image will lie in the lower-dimensional stationarity region. Even if the model dimension is fixed, difficulties are still encountered; a natural proposal method would be to update each component of $\phi$ separately but, because of the awkward shape of the stationarity region, the allowable values for each component are a complicated function of the others. Non-trivial proposals are required.
A potential approach is to parametrize in terms of the inverse roots (poles) of $\Phi$, as advocated by : by writing $\Phi(B) = \prod_{i=1}^{p}(1 - \alpha_i B)$, stationarity requires $|\alpha_i| < 1$ for all $i$. This looks attractive because it transforms the stationarity region into $\mathbb{D}^p$, where $\mathbb{D}$ is the open unit disc, which is easy to sample from. But this method has serious drawbacks when we consider the RJ step. To decrease dimension, the natural map would be to remove one of the roots from the polynomial. But because it is assumed that $\Phi$ has real coefficients (otherwise the model has no realistic interpretation), any complex $\alpha_i$ must appear in conjugate pairs. There is then no obvious way to remove a root; a contrived method might be to remove a conjugate pair and replace it with a real root of the same modulus; however, it is unclear how this new polynomial is related to the original, and to other aspects of the process, like the ACV.
Reparametrization of $\phi$ and $\theta$
We therefore propose reparametrizing $\phi$ (and $\theta$) using the bijection between the stationarity region and $(-1, 1)^p$ advocated by various authors, e.g., and . To our knowledge, these methods have not previously been deployed towards integrating out short-memory components in Bayesian analysis of ARFIMA processes.
defined a mapping recursively as follows: given $\bar{\phi} = (\bar{\phi}_1, \dots, \bar{\phi}_p) \in (-1, 1)^p$, set $\phi_k^{(k)} = \bar{\phi}_k$ and $\phi_i^{(k)} = \phi_i^{(k-1)} - \bar{\phi}_k\, \phi_{k-i}^{(k-1)}$ for $i = 1, \dots, k-1$. Then set $\phi_i = \phi_i^{(p)}$ for $i = 1, \dots, p$. The reverse recursion is given by $\phi_i^{(k-1)} = \big(\phi_i^{(k)} + \phi_k^{(k)}\, \phi_{k-i}^{(k)}\big) \big/ \big(1 - (\phi_k^{(k)})^2\big)$. Note that $\bar{\phi}_k = \phi_k^{(k)}$, the $k$th partial autocorrelation. Moreover, if $p = 1$, the two parametrizations are the same, i.e., $\bar{\phi}_1 = \phi_1$ (consequently the brief study of ARFIMA(1,$d$,0) in Sect. fits in this framework). The equivalent parametrized form for $\theta$ is $\bar{\theta}$. The full memory parameter is parametrized as $\bar{d} = (d, \bar{\phi}, \bar{\theta})$ (the image of $(d, \phi, \theta)$). In practice the parameter space is therefore effectively $(-\frac{1}{2}, \frac{1}{2}) \times (-1, 1)^{p+q}$.
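Both directions of the recursion take only a few lines of R. The following sketch (our illustration) maps $\bar{\phi} \in (-1,1)^p$ to the AR coefficients $\phi$ and back:

```r
## Levinson-Durbin-style bijection between partial autocorrelations
## r in (-1,1)^p and the coefficients of a stationary AR(p).
pacf_to_ar <- function(r) {
  phi <- numeric(0)
  for (k in seq_along(r)) {
    if (k > 1) phi <- phi - r[k] * rev(phi)
    phi <- c(phi, r[k])
  }
  phi
}

ar_to_pacf <- function(phi) {
  p <- length(phi)
  r <- numeric(p)
  for (k in p:1) {
    r[k] <- phi[k]
    if (k > 1) phi <- (phi[-k] + r[k] * rev(phi[-k])) / (1 - r[k]^2)
  }
  r
}

## Round trip check: ar_to_pacf(pacf_to_ar(c(0.5, -0.3))) returns c(0.5, -0.3).
```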
Besides mathematical convenience, this bijection has a very useful property: it maps the awkwardly shaped stationarity region onto the hypercube $(-1, 1)^p$, so that componentwise constraints become trivial and dimension-changing moves remain valid.
Application of RJ MCMC to ARFIMA($p$,$d$,$q$) processes
We now use this reparametrization to efficiently propose new parameter values. Firstly, it is necessary to propose a new memory parameter while keeping the model fixed. Attempts at updating each component individually suffer from the same problems of excessive posterior correlation that were encountered in Sect. . Therefore, the simultaneous update of the entire $(p + q + 1)$-dimensional parameter $\bar{d}$ is performed using the hypercuboid-truncated Gaussian distribution introduced above, $N_H(\bar{d}, \Sigma)$, where $H = (-\frac{1}{2}, \frac{1}{2}) \times (-1, 1)^{p+q}$ defines the hypercuboid. The covariance matrix $\Sigma$ is discussed in some detail below. The choice of prior is arbitrary. used a uniform prior for $\phi$, which has an explicit expression in the parametrization $\bar{\phi}$. However, their expression is unnecessarily complicated, since a uniform prior over $\phi$ holds no special interpretation. We therefore prefer a uniform prior over $\bar{\phi}$: $p(\bar{\phi}) = 2^{-p}$, $\bar{\phi} \in (-1, 1)^p$.
Now consider the between-model transition. We must first choose a model prior $p(M_{p,q})$. A variety of priors are possible; the simplest option would be a uniform prior over all $(p, q)$, but this would of course be improper. We may in practice want to restrict the possible values to $0 \le p \le p_{\max}$ and $0 \le q \le q_{\max}$ for some $p_{\max}$, $q_{\max}$ (say 5), which would render the uniform prior proper. However, even in this formulation, a lot of prior weight is being put onto (larger) more complicated models that, in the interests of parsimony, might be undesired. As a simple representative of potential priors that give greater weight to smaller models, we prefer a truncated joint Poisson distribution with parameter $\xi$: $p(M_{p,q}) \propto \xi^{p+q} / (p!\, q!)$, for $0 \le p \le p_{\max}$, $0 \le q \le q_{\max}$.
Now, denote the probability of jumping from model $M_{p,q}$ to model $M_{p',q'}$ by $j(M_{p,q} \to M_{p',q'})$. One could allocate non-zero probability to every model pair, but for convenience we severely restrict the possible jumps (while retaining irreducibility) using a two-dimensional bounded birth and death process. Consider the subgraph of $\mathbb{Z}^2$ given by $\{(p, q) : 0 \le p \le p_{\max},\ 0 \le q \le q_{\max}\}$, and allocate uniform non-zero probability only to neighboring values, i.e., $j(M_{p,q} \to M_{p',q'}) > 0$ if and only if $|p - p'| + |q - q'| = 1$. Each point in the body of the graph has four neighbors, each point on the line boundaries has three, and each of the four corner points has only two neighbors. Therefore, the model transition probabilities are either $\frac{1}{4}$, $\frac{1}{3}$, $\frac{1}{2}$, or 0.
Now suppose the current $(p + q + 3)$-dimensional parameter is $\psi = (\mu, \sigma, d, \bar{\phi}^{(p)}, \bar{\theta}^{(q)})$, using a slight abuse of notation. Because the mathematical details of the AR and MA components are almost identical, we consider only the case of decreasing/increasing $p$ by 1 here; all of the following remains valid if $p$ is replaced by $q$, and $\bar{\phi}$ replaced by $\bar{\theta}$. We therefore seek to propose a parameter $\psi^* = (\mu^*, \sigma^*, d^*, \bar{\phi}^{(p+1)}, \bar{\theta}^{(q)})$ that is somehow based on $\psi$. We further simplify by regarding the other three parameters ($\mu$, $\sigma$, and $d$) as having the same interpretation in every model, choosing $\mu^* = \mu$, $\sigma^* = \sigma$, and $d^* = d$. For simplicity we also set $\bar{\theta}^* = \bar{\theta}$. Now consider the map for $\bar{\phi}$. To specify a bijection, we match dimensions by adding in a random scalar $u$. The most obvious map is to specify $u$ so that its support is the interval $(-1, 1)$ and then set $\bar{\phi}^{(p+1)} = (\bar{\phi}^{(p)}, u)$. The corresponding map for decreasing the dimension drops the final component. We either add or remove the final parameter, while keeping all others fixed with the identity map, so the Jacobian is unity. The proposal for $u$ can be made in many ways – we prefer the simple $U(-1, 1)$. With these choices the RJ acceptance ratio, which applies to both increasing and decreasing dimensional moves, involves only the likelihood ratio, the model prior ratio, and the model transition probabilities.
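In this parametrization the dimension change is simply appending or dropping the final coordinate; a minimal sketch (our illustration):

```r
## Birth/death moves in the PACF parametrization. All other parameters are
## mapped by the identity and u enters directly as the new last coordinate,
## so the Jacobian of the transformation is 1.
birth_move <- function(r) c(r, runif(1, -1, 1))   # M_{p,q} -> M_{p+1,q}
death_move <- function(r) r[-length(r)]           # M_{p+1,q} -> M_{p,q}
```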
Posterior outputs: (a) Bayesian estimates $\hat{d}$ on the $y$ axis against the true $d$ on the $x$ axis; (b) residuals of the Bayesian estimate from the truth, $\hat{d} - d$, against that truth. Each "x" plotted represents one estimate or residual.
[Figure omitted. See PDF]
Construction of $\Sigma$: much of the efficiency of the above scheme, including within- and between-model moves, depends on the choice of $\Sigma$, the within-model move RW proposal covariance matrix. We first seek an appropriate matrix for the ARFIMA(1,$d$,0) case, as in Sect. , with a pilot tuning scheme, blocking it out (with each block a scalar) so that the idea extends to the general ($p$, $q$) case in the obvious way, with the $\bar{\phi}$ block a $p \times p$ matrix, the $\bar{\theta}$ block a $q \times q$ matrix, etc. If either (or both) $p$, $q = 0$ then the relevant blocks are simply omitted. To specify the various sub-matrices, we propose $\bar{\phi}_1, \dots, \bar{\phi}_p$ with equal variances, and independently of $d$ and $\bar{\theta}$ (and similarly for $\bar{\theta}_1, \dots, \bar{\theta}_q$), so the corresponding off-diagonal blocks are zero. This choice of $\Sigma$ is conceptually simple, computationally easy, and preserves positive definiteness as required (see ).
Empirical illustration and comparison
Here we provide empirical illustrations for the methods above: for classical and Bayesian analysis of long-memory models, and extensions for short memory. To ensure consistency throughout, the location and scale parameters will always be chosen as $\mu = 0$ and $\sigma = 1$. Furthermore, unless stated otherwise, the simulated series will be of length $n = 2^{10} = 1024$. This is a reasonable size for many applications; it is equivalent to 85 years of monthly observations. When using the approximate likelihood method we set $P = n$.
Long memory
Standard MCMC diagnostics were used throughout to ensure, and tune for, good mixing. Because $d$ is the parameter of primary interest, the initial values were chosen to systematically cover its parameter space, usually starting five chains at the regularly spaced points $-0.4, -0.2, 0, 0.2, 0.4$. Initial values for other parameters are not varied: $\mu$ starts at the sample mean $\bar{x}$; $\sigma$ at the sample standard deviation of the observed series.
Efficacy of approximate likelihood method
We start with the null case; i.e., how does the algorithm perform when the data are not from a long-memory process? One hundred independent ARFIMA(0,0,0), or Gaussian white noise, processes are simulated, from which marginal posterior means, standard deviations, and credibility interval end points are extracted. Table shows averages over the runs.
The average estimate for each of the three parameters is less than a quarter of a standard deviation away from the truth. Credibility intervals are nearly symmetric about the estimate, and the marginal posteriors are, to a good approximation, locally Gaussian (not shown). Upon applying a proxy credible-interval-based hypothesis test, one would conclude in 98 of the 100 cases that $d = 0$ could not be ruled out. A similar analysis for $\mu$ and $\sigma$ shows that the hypotheses $\mu = 0$ and $\sigma = 1$ would each have been accepted 96 times. These results indicate that the 95 % credibility intervals are approximately correctly sized.
Next, consider the more interesting case of $d \neq 0$. We repeat the above experiment, except that 10 processes are generated with $d$ set to each of $-0.45, -0.35, \dots, 0.45$, giving 100 series in total. Figure shows a graphical analog of results from this experiment. The plot axes involve a Bayesian residual estimate of $d$, $e_d$, defined as $e_d = \hat{d} - d$, where $\hat{d}$ is the Bayesian estimate of $d$.
Posterior outputs: (a) Bayesian estimated standard deviation $\hat{\sigma}$ against the true $d$ values; (b) Bayesian estimated mean $\hat{\mu}$ against $d$; and (c) uncertainty in the posterior for $\mu$ (its standard deviation) against $d$ (semi-log scale). Each "x" plotted corresponds to an estimate.
[Figure omitted. See PDF]
Posterior summary statistics for an ARFIMA(0,0,0) process. Results are based on averaging over 100 independent ARFIMA(0,0,0) simulations, for the long-memory parameter $d$, mean $\mu$ and noise scale $\sigma$.

 | Mean | SD | 95 % CI lower | 95 % CI upper
---|---|---|---|---
$d$ | 0.006 | 0.025 | -0.042 | 0.055
$\mu$ | 0.004 | 0.035 | -0.073 | 0.063
$\sigma$ | 1.002 | 0.022 | 0.956 | 1.041
From the figure it is clear that the estimator for $d$ is performing well. Figure a shows how tight the estimates of $d$ are around the input value – recall that the parameter space for $d$ is the whole interval $(-\frac{1}{2}, \frac{1}{2})$. Moreover, Fig. b indicates that there is no significant change of posterior bias or variance as $d$ is varied.
Next, the corresponding plots for the parameters $\sigma$ and $\mu$ are shown in Fig. . We see from Fig. a that the estimate of $\sigma$ also appears to be unaffected by the input value $d$. The situation is different, however, in Fig. b for the location parameter $\mu$. Although the bias appears to be roughly zero for all $d$, the posterior variance clearly is affected by $d$. To ascertain the precise functional dependence, consider Fig. c, which shows, on a semi-log scale, the marginal posterior standard deviation of $\mu$ against $d$.
It appears that the marginal posterior standard deviation of $\mu$ is an exponential function of $d$; specifically $\sigma_\mu \propto c^d$, for some $c$. The constant could be estimated via least-squares regression. Instead, however, inspired by asymptotic results in the literature concerning classical estimation of long-memory processes , we set $c = n$ and plotted the best-fitting such line (shown in Fig. c). Observe that, although not fitting exactly, the relation holds reasonably well for $d \in (-\frac{1}{2}, \frac{1}{2})$. Indeed, classical asymptotic consistency results for optimum (likelihood-based) estimators show that the standard error of the mean is proportional to $n^{d - 1/2}$, while the standard errors of all other parameters are proportional to $n^{-1/2}$.
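This scaling has a simple heuristic justification (our sketch, not from the original text): with ACF $\rho(k) \sim c_\rho k^{2d-1}$, the variance of the sample mean satisfies

$$\operatorname{Var}(\bar{X}_n) \;=\; \frac{\gamma_0}{n} \sum_{|k| < n} \Big(1 - \frac{|k|}{n}\Big)\rho(k) \;\asymp\; \frac{1}{n}\int_1^n k^{2d-1}\,\mathrm{d}k \;\asymp\; n^{2d-1},$$

so its standard deviation scales as $n^{d - 1/2} = n^{d}\cdot n^{-1/2}$, matching the fitted $c = n$ above.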
Posterior outputs from ARFIMA(0,0,0) series: (a) the posterior standard deviation of $d$ against the sample size $n$; (b) posterior standard deviation of $\sigma$ against $n$; and (c) posterior standard deviation of $\mu$ against $n$ (log–log scale).
[Figure omitted. See PDF]
Table: mean difference of estimates under alternative prior assumptions. Plots: comparison of posteriors (solid lines) obtained under different priors (dotted lines). Time series used: ARFIMA(0,0.25,0) – (a) $n = 2^7 = 128$, (b) $n = 2^{10} = 1024$.
[Figure omitted. See PDF]
Effect of varying time series length
We now analyze the effect of changing the time series length. For this we conduct a similar experiment but fix $d = 0$ and vary $n$. The posterior statistics of interest are the posterior standard deviations of $d$, $\mu$, and $\sigma$. For each $n = 2^7 = 128, 2^8, \dots, 2^{14} = 16\,384$, 10 independent ARFIMA(0,0,0) time series are generated. The resulting posterior standard deviations are plotted against $n$ (on a log–log scale) in Fig. .
Observe that all three marginal posterior standard deviations are proportional to $n^{-1/2}$, although one of the three fits is less reliable. Combining these observations with our earlier deduction that $\sigma_\mu \propto n^d$, we conclude that for an ARFIMA(0,$d$,0) process of length $n$, the marginal posterior standard deviations scale as $n^{-1/2}$ for $d$ and $\sigma$, and as $n^{d - 1/2}$ for $\mu$.
Comparison with common estimators
In many practical applications, the long-memory parameter is estimated using
non-/semi-parametric methods. These may be appropriate in many situations,
where the exact form of the underlying process is unknown. However, when a
specific model form is known (or at least assumed) they tend to perform
poorly compared with fully parametric alternatives . Our
aim here is to demonstrate, via a short Monte Carlo study involving
ARFIMA(0,,0) data, that our Bayesian likelihood-based method
significantly outperforms other common methods in that case. We consider the
following comparators: (i) rescaled adjusted range, or
– we use the R implementation in the
Each of these four methods is applied to the same 100 time series with varying $d$ as were used in the earlier experiments above. We extend the idea of a residual to the new comparators, $e_{R/S}$, $e_{\mathrm{GPH}}$, $e_{\mathrm{DFA}}$, and $e_{\mathrm{wav}}$, respectively, and plot them against $d$ in Fig. .
Observe that all four methods have a much larger variance than our Bayesian method, and moreover the $R/S$ estimator is positively biased. Actually, the bias in some cases would seem to depend on $d$: $R/S$ is significantly (i.e., by more than 0.25) biased for $d$ below about $-0.3$ but slightly negatively biased for $d$ above about 0.3 (not shown); DFA is only unbiased for $d = 0$; both the GPH and wavelet methods are unbiased for all $d \in (-\frac{1}{2}, \frac{1}{2})$.
Extensions for short memory
Known form: we first consider the MCMC algorithm from Sect. for sampling under an ARFIMA(1,$d$,0) model, where the full memory parameter is $\bar{d} = (d, \phi)$. Recall that the method involves proposals from a hypercuboid-truncated MVN using a pilot-tuned covariance matrix, and that it is a special case of the reparametrized method from Sect. .
In general, this method works very well; two example outputs are presented in Fig. , under two similar data-generating mechanisms.
Comparison of the Bayesian estimator with common classical estimators: (a) $R/S$, (b) GPH, (c) DFA, and (d) wavelet.
[Figure omitted. See PDF]
Posterior samples of $(d, \phi)$: input time series (a) $(1 + 0.92B)(1 - B)^{0.25}X_t = \varepsilon_t$, (b) $(1 - 0.83B)(1 - B)^{-0.35}X_t = \varepsilon_t$.
[Figure omitted. See PDF]
Spectra for the processes in Fig. . Green line is the relevant ARMA(1,0) process, red line is the relevant ARFIMA(0,$d$,0) process, black line is the ARFIMA(1,$d$,0) process: (a) $(1 + 0.92B)(1 - B)^{0.25}X_t = \varepsilon_t$; (b) $(1 - 0.83B)(1 - B)^{-0.35}X_t = \varepsilon_t$. (c) Shows posterior samples of $(d, \phi)$ from the series considered in (b), with credibility sets: red is the 95 % credibility set for $(d, \phi)$, green is the 95 % credibility interval for $d$, blue is the 95 % credibility interval for $\phi$.
[Figure omitted. See PDF]
Figure a shows relatively mild posterior correlation between $d$ and $\phi$ (about 0.21 in magnitude) compared with Fig. b, which shows strong correlation (about 0.91 in magnitude). This differential behavior can be explained heuristically by considering the differing data-generating values. For the process in Fig. a the short-memory and long-memory components exhibit their effects at opposite ends of the spectrum; see Fig. a. The resulting ARFIMA spectrum, with peaks at either end, makes it easy to distinguish between short- and long-memory effects, and consequently the posteriors of $d$ and $\phi$ are largely uncorrelated. In contrast, the parameters of the process in Fig. b express their behavior at the same end of the spectrum. With negative $d$ these effects partially cancel each other out, except very near the origin where the negative memory effect dominates; see Fig. b. Distinguishing between the effects of $d$ and $\phi$ is much more difficult in this case; consequently the posteriors are much more dependent.
Marginal posterior density of $d$ from the series in Fig. , (a, b). Solid line is the density obtained using the reversible-jump algorithm. Dotted line is the density obtained using fixed $p = 1$ and $q = 0$. The true values are $d = 0.25$ and $d = -0.35$, respectively. (c, d) Show the posterior densities for $\mu$ and $\sigma$, respectively, corresponding to the series in Fig. a; those for Fig. b look similar. The true values are $\mu = 0$ and $\sigma = 1$. True values are marked by an X.
[Figure omitted. See PDF]
In cases where there is significant correlation between $d$ and $\phi$, it arguably makes little sense to consider only the marginal posterior distribution of $d$. For example, the 95 % credibility interval for $d$ from Fig. b is $(-0.473, -0.247)$, and the corresponding interval for $\phi$ is $(0.753, 0.910)$, yet these clearly give a rather pessimistic view of our joint knowledge about $d$ and $\phi$; see Fig. c. In theory an ellipsoidal credibility set could be constructed, although this is clearly less practical in more than two dimensions.
Unknown form: the RJ scheme outlined in Sect. works well for data simulated with $p$ and $q$ up to 3. The marginal posteriors for $d$ are generally roughly centered around the data-generating value, and the modal posterior model probability is usually the correct one. To illustrate, consider again the two example data-generating contexts used above.
For both series, kernel density estimates of the marginal posterior for $d$ are plotted in Fig. a and b, together with the equivalent density estimated assuming known model orders.
Notice how the densities obtained via the RJ method are very close to those obtained assuming $p = 1$ and $q = 0$. The former are slightly more heavy-tailed, reflecting a greater level of uncertainty about the model. Interestingly, the corresponding plots for the posteriors of $\mu$ and $\sigma$ do not appear to exhibit this effect; see Fig. c and d. The posterior model probabilities are presented in Table , showing that the correct modes are being picked up consistently.
Posterior model probabilities for the time series from Figs. a, b and a, b, for autoregressive order $p$ and moving average order $q$. Bold numbers denote the true model.
$p \backslash q$ | 0 | 1 | 2 | 3 | 4 | 5 | Marginal
---|---|---|---|---|---|---|---|
(a) | |||||||
0 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
1 | **0.805** | 0.101 | 0.003 | 0.000 | 0.000 | 0.000 | 0.908
2 | 0.038 | 0.043 | 0.001 | 0.000 | 0.000 | 0.000 | 0.082 |
3 | 0.005 | 0.004 | 0.000 | 0.000 | 0.000 | 0.000 | 0.009 |
4 | 0.000 | 0.001 | 0.000 | 0.000 | 0.000 | 0.000 | 0.001 |
5 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
Marginal | 0.848 | 0.148 | 0.004 | 0.000 | 0.000 | 0.000 | |
(b) | |||||||
0 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
1 | **0.829** | 0.125 | 0.002 | 0.000 | 0.000 | 0.000 | 0.956
2 | 0.031 | 0.013 | 0.000 | 0.000 | 0.000 | 0.000 | 0.044 |
3 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
4 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
5 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
Marginal | 0.860 | 0.138 | 0.002 | 0.000 | 0.000 | 0.000 |
Marginal posterior densities (a) $d$, (b) $\alpha$, from the model in Eq. ().
[Figure omitted. See PDF]
As a test of the robustness of the method, consider a complicated short-memory input combined with a heavy-tailed $\alpha$-stable innovations distribution. Specifically, the time series used is the ARFIMA(2,$d$,1) process of Eq. (). For more details, see .
Performance looks good despite the complicated structure. The posterior estimate for $d$ is 0.22, with 95 % CI $(0.04, 0.41)$. Although this interval is admittedly rather wide, it is reasonably clear that long memory is present in the signal. The corresponding interval for the stable tail index $\alpha$ is $(1.71, 1.88)$, with estimate 1.79. Finally, we see from Table that the algorithm is very rarely in the wrong model.
Observational data analysis
We conclude with the application of our method to two long data sets: the Nile water level minima data and the CET. The Nile data are part of the R package "longmemo", and the CET time series is freely available online.
Posterior model probabilities based on simulations of the model in Eq. (), for autoregressive order $p$ and moving average order $q$. Bold numbers denote the true model.
$p \backslash q$ | 0 | 1 | 2 | 3 | 4 | 5 | Marginal
---|---|---|---|---|---|---|---|
0 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
1 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
2 | 0.000 | **0.822** | 0.098 | 0.001 | 0.000 | 0.000 | 0.921
3 | 0.014 | 0.056 | 0.004 | 0.000 | 0.000 | 0.000 | 0.075 |
4 | 0.003 | 0.001 | 0.000 | 0.000 | 0.000 | 0.000 | 0.004 |
5 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
Marginal | 0.017 | 0.880 | 0.102 | 0.002 | 0.000 | 0.000 |
Annual Nile minima time series.
[Figure omitted. See PDF]
Marginal posterior densities for Nile minima; (a) , (b) .
[Figure omitted. See PDF]
The Nile data
Because of the fundamental importance of the Nile river to the civilizations it has supported, local rulers kept measurements of the annual maximal and minimal heights attained by the river at certain points (called gauges). The longest uninterrupted sequence of recordings is from the Roda gauge (near Cairo), between AD 622 and 1284 ($n = 663$).
There is evidence
We immediately observe the apparent low frequency component of the data. The data appear to be on the “verge” of being stationary; however, the general consensus amongst the statistical community is that the series is stationary. The posterior summary statistics are presented in Table , density estimates of the marginal posteriors of and are presented in Fig. , and the posterior model probabilities are presented in Table .
Table: summary posterior statistics for Nile minima. Plots: marginal posterior densities for Nile minima – (a) , (b) .
[Figure omitted. See PDF]
Posterior model probabilities for Nile minima time series for the autoregressive parameter and moving average parameter . Bold numbers denote the best fit model.
$p \backslash q$ | 0 | 1 | 2 | 3 | 4 | 5 | Marginal
---|---|---|---|---|---|---|---|
0 | **0.638** | 0.101 | 0.010 | 0.000 | 0.000 | 0.000 | 0.750
1 | 0.097 | 0.124 | 0.011 | 0.000 | 0.000 | 0.000 | 0.232 |
2 | 0.007 | 0.010 | 0.000 | 0.000 | 0.000 | 0.000 | 0.018 |
3 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
4 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
5 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
Marginal | 0.742 | 0.236 | 0.022 | 0.000 | 0.000 | 0.000 |
Summary posterior statistics for the Nile minima time series for the long-memory parameter $d$, mean $\mu$, and noise scale $\sigma$.

 | Mean | SD | 95 % CI lower | 95 % CI upper
---|---|---|---|---
$d$ | 0.402 | 0.039 | 0.336 | 0.482
$\mu$ | 1158 | 62 | 1037 | 1284
$\sigma$ | 70.15 | 1.91 | 66.46 | 73.97
The posterior summary statistics and marginal densities for the Nile data are presented in Fig. . Posterior model probabilities are presented in Table . We see that the model with the highest posterior probability is the ARFIMA(0,$d$,0) model, with $\hat{d} \approx 0.4$. This suggests a strong, pure, long-memory feature. Our results compare favorably with other studies .
It is interesting to compare these findings with other literature. used a semi-parametric Bayesian method on the first 512 observations of the sequence and obtained an estimate for $d$ of 0.278. used a similar method to estimate $d$ (within an ARFIMA(0,$d$,0) model) at 0.416, with an approximate credibility interval of $(0.315, 0.463)$. similarly found, using wavelets, $\hat{d} = 0.379$ with a credibility interval of $(0.327, 0.427)$. obtained 0.420. obtained 0.387 with a credibility interval of $(0.316, 0.475)$ using their Bayesian FEXP method.
We note that the interpretation as persistence of the $d \approx 0.4$ ($H \approx 0.9$) value that we and others have obtained has been challenged by . In his view the analysis should be applied to the increments of the level heights rather than the level heights themselves, giving an anti-persistent time series with a negative $d$ value. The need for a short-range-dependent component that he argues for is, however, automatically included in the use of an ARFIMA model. Although ARFIMA was originally introduced in econometrics as a phenomenological model of LM, very recent progress is being made in statistics and physics on building a bridge between it and continuous time linear dynamical systems (see e.g., ).
CET time series (deseasonalized).
[Figure omitted. See PDF]
CET time series: (a) assumed deterministic seasonal component, (b) spectrum of the deseasonalized index.
[Figure omitted. See PDF]
In conclusion, our findings agree with all published Bayesian long-memory results (except for the anomalous finding of ). Moreover, these findings agree with numerous classical methods of analysis .
Central England temperature
There is increasing evidence that surface air temperatures possess long memory, but long time series are needed to get robust results. The CET index is a famous measure of the monthly mean temperature in an area of southern-central England dating back to 1659 . Given to a precision of 0.5 °C prior to 1699 and 0.1 °C thereafter, the index is considered to be the longest reliable temperature record from station data. As expected, the CET exhibits a significant seasonal signal, at least some of which must be considered deterministic. Following the approach of , the index is first deseasonalized using the additive "STL" method . This deseasonalized CET index is shown in Fig. .
The estimated seasonal function that was removed is shown in Fig. a. The spectrum of the deseasonalized process is shown in Fig. b; $d_s$ denotes the seasonal long-memory parameter. Notice that, in addition to the obvious spectral peak at the origin, there still remains a noticeable peak at the monthly (annual-cycle) frequency. However, there are no further peaks in the spectrum, which would appear to rule out a seasonal ARFIMA (SARFIMA) model. These observations therefore suggest that a simple two-frequency Gegenbauer($d$,$d_s$;$\nu$) process might be an appropriate model. See the Appendix for more details about seasonal long memory.
Applying this model, the marginal posterior statistics are presented in Table , and the joint posterior samples of $(d, d_s)$ from this model are plotted in Fig. . These clearly indicate that both $d$ and $d_s$ are non-zero (albeit small in the case of $d_s$), suggesting the presence of long memory in both the conventional and seasonal sense.
Joint posterior samples of $(d, d_s)$ with 95 % credibility set in red, for the CET time series.
[Figure omitted. See PDF]
CET time series: posterior estimate (solid line) and 95 % credibility interval (dotted lines) for four blocks (black) and the whole index (red), for (a) $d$, (b) $d_s$.
[Figure omitted. See PDF]
Posterior summary statistics for the CET index for the long-memory parameter $d$, seasonal long-memory parameter $d_s$, mean $\mu$, and noise scale $\sigma$.

 | Mean | SD | 95 % CI lower | 95 % CI upper
---|---|---|---|---
$d$ | 0.209 | 0.013 | 0.186 | 0.235
$d_s$ | 0.040 | 0.011 | 0.018 | 0.062
$\mu$ | 9.266 | 0.144 | 9.010 | 9.576
$\sigma$ | 1.322 | 0.015 | 1.294 | 1.353
In order to compare these results with other publications', it is important to note that, to remove annual seasonality from the CET, the series of annual means is often used instead of the monthly series. This of course reduces the fidelity of the analysis. found (using rather crude estimation procedures) that the best-fitting model for the annual means of the CET was an ARFIMA(1,0.33,0) model with an AR coefficient of 0.16. used the same series as test data for their Bayesian method; they fitted each of the ARFIMA models with $p, q \le 1$ and found that all models were suitable. Their estimates of $d$ ranged from 0.24 to 0.34, depending on the short-memory specification.
Of course, all these studies assume the time series is stationary and, in particular, has a constant mean. The validity of this assumption was considered by , who used formal hypothesis testing to consider trend models of the form $X_t = \beta t + Y_t$, where $Y_t$ is an ARFIMA(0,$d$,0) process. For values of $d = 0, 0.05, 0.10, 0.15$, the trend $\beta$ was found to be significantly non-zero (at about 0.23 °C per century), but for $d \ge 0.20$ statistical significance was not found. later extended this work by replacing the ARFIMA(0,$d$,0) process in Eq. () with a Gegenbauer($d$;$\nu$) process, obtaining similar results. However, the choice of $d$ was rather ad hoc, likely influencing the results.
In order to consider the stationarity of the time series, we divided the series up into four blocks of length 1024 months (chosen to maximize efficiency of the fast Fourier transform) and analyzed each block independently. The posterior statistics for each block are presented in Table with some results presented graphically in Fig. .
Posterior summary statistics for four blocks of the CET index for the long-memory parameter $d$, seasonal long-memory parameter $d_s$, mean $\mu$, and noise scale $\sigma$.

Block | | Mean | SD | 95 % CI lower | 95 % CI upper
---|---|---|---|---|---
1659–1744 | $d$ | 0.277 | 0.026 | 0.231 | 0.332
 | $d_s$ | 0.054 | 0.022 | 0.013 | 0.097
 | $\mu$ | 9.036 | 0.347 | 8.332 | 9.702
 | $\sigma$ | 1.217 | 0.027 | 1.167 | 1.271
1744–1829 | $d$ | 0.204 | 0.028 | 0.151 | 0.259
 | $d_s$ | 0.017 | 0.023 | -0.028 | 0.063
 | $\mu$ | 9.107 | 0.216 | 8.671 | 9.533
 | $\sigma$ | 1.348 | 0.031 | 1.290 | 1.409
1829–1914 | $d$ | 0.172 | 0.027 | 0.118 | 0.223
 | $d_s$ | 0.036 | 0.022 | -0.010 | 0.076
 | $\mu$ | 9.172 | 0.168 | 8.859 | 9.517
 | $\sigma$ | 1.364 | 0.030 | 1.312 | 1.429
1914–2000 | $d$ | 0.163 | 0.027 | 0.108 | 0.213
 | $d_s$ | 0.063 | 0.022 | 0.023 | 0.109
 | $\mu$ | 9.591 | 0.152 | 9.314 | 9.906
 | $\sigma$ | 1.348 | 0.030 | 1.291 | 1.406
It is interesting to note that the degree of (conventional) long memory is roughly constant over the last three blocks but appears to be larger in the first block. Of particular concern is that there is no value of $d$ that is included in all four 95 % credibility intervals; this would suggest non-stationarity. Although this phenomenon may indeed have a physical explanation, it is more likely caused by the inhomogeneity of the time series. Recall that the first 50 years of the index are only given to an accuracy of 0.5 °C, compared to 0.1 °C afterwards; this lack of resolution clearly has the potential to bias the estimate of $d$ in favor of strong autocorrelation when compared with later periods.
Interestingly, the seasonal long-memory parameter $d_s$ has 95 % credibility intervals that include zero for both the second and third blocks. Finally, note that the 95 % credibility intervals for $\mu$ all include the range (9.314, 9.517); in other words it is entirely credible that the mean is non-varying over the time period.
Conclusions
We have provided a systematic treatment of efficient Bayesian inference for ARFIMA models, the most popular parametric model combining long- and short-memory effects. Through a mixture of theoretical and empirical work we have demonstrated that our method can handle the sorts of time series data with possible long memory that we are typically confronted with.
Many of the choices made throughout, but in particular those leading to our likelihood approximation, stem from a need to accommodate further extension. For example, in future work we intend to extend them to cope with heavy-tailed innovation distributions. For more evidence of potential in this context, see .
Finally, an advantage of the Bayesian approach is that it provides a natural mechanism for dealing with missing data, via data augmentation. This is particularly relevant for long historical time series, which may, for a myriad of reasons, have recording gaps. For example, some of the data recorded at other gauges along the Nile have missing observations although otherwise span a similarly long time frame. For a demonstration of how this might fit within our framework, see Sect. 5.6 of .
ARFIMA model
We define the autocovariance (ACV) of a weakly stationary process as $\gamma_k = \mathrm{Cov}(X_t, X_{t+k})$, the lag-$k$ covariance. The (normalized) ACF is defined as $\rho_k = \gamma_k / \gamma_0$. A stationary process is said to be causal if there exists a sequence of coefficients $\{\psi_k\}$, with finite total mean square $\sum_k \psi_k^2 < \infty$, such that for all $t$ a given member of the process can be expanded as a power series in the backshift operator acting on the innovations: $X_t = \Psi(B)\varepsilon_t = \sum_{k=0}^{\infty} \psi_k \varepsilon_{t-k}$. The innovations are a white (i.e., stationary, zero mean, iid) noise process with variance $\sigma^2$. Causality specifies that for every $t$, $X_t$ can only depend on the past and present values of the innovations $\varepsilon_s$, $s \le t$.
A process is said to be an autoregressive process of order $p$, AR($p$), if for all $t$: $\Phi(B)X_t = \varepsilon_t$. AR($p$) processes are invertible, and are stationary and causal if and only if $\Phi(z) \neq 0$ for all complex $z$ such that $|z| \le 1$. A process is said to be a moving average process of order $q$, MA($q$), if for all $t$: $X_t = \Theta(B)\varepsilon_t$.
Many authors define $\Theta(B) = 1 - \theta_1 B - \dots - \theta_q B^q$. Our version emphasizes the symmetry between the AR and MA components in Eqs. ()–().
MA($q$) processes are stationary and causal, and are invertible if and only if $\Theta(z) \neq 0$ for all complex $z$ such that $|z| \le 1$. A natural extension of the AR and MA classes arises by combining them. The process is said to be an ARMA process of orders $p$ and $q$, ARMA($p$,$q$), if for all $t$: $\Phi(B)X_t = \Theta(B)\varepsilon_t$. Although there is no simple closed form for the ACV of an ARMA process with arbitrary $p$ and $q$, so long as the process is causal and invertible the ACF satisfies $|\rho_k| \le C r^{-k}$ for $k > 0$ and some $C > 0$, $r > 1$; i.e., it decays exponentially fast. In other words, although correlation between nearby points may be high, dependence between distant points is negligible.
Before turning to long memory, we require one further result. Under some extra conditions, stationary processes with ACV $\{\gamma_k\}$ possess a SDF $f(\omega)$ defined such that $\gamma_k = \int_{-\pi}^{\pi} e^{ik\omega} f(\omega)\, \mathrm{d}\omega$, for $k \in \mathbb{Z}$. This can be inverted to obtain an explicit expression for the SDF: $f(\omega) = \frac{1}{2\pi} \sum_{k=-\infty}^{\infty} \gamma_k\, e^{-ik\omega}$.
Since the ACV of a stationary process is an even function of the lag, the above equation implies that the associated SDF is an even function. One therefore only needs to be interested in positive arguments: $0 \le \omega \le \pi$.
Finally, the SDF of an ARMA process is $f(\omega) = \frac{\sigma^2}{2\pi} \frac{|\Theta(e^{-i\omega})|^2}{|\Phi(e^{-i\omega})|^2}$.

For an ARFIMA process (Eq. ) the restriction $d < \frac{1}{2}$ is necessary to ensure stationarity; clearly if $d \ge \frac{1}{2}$ the ACF would not decay. The continuity between stationary and non-stationary processes around $d = \frac{1}{2}$ is similar to that which occurs for the AR(1) process as $\phi \to 1$ (such processes are stationary for $|\phi| < 1$, but the case $\phi = 1$ is the non-stationary random walk).
There are a number of alternative definitions of LM, one of which is particularly useful as it considers the frequency domain: a stationary process has long memory when its SDF follows $f(\omega) \sim c_f\, \omega^{-2d}$ as $\omega \to 0$, for some positive constant $c_f$, and where $0 < d < \frac{1}{2}$.
The simplest way of creating a process that exhibits long memory is through the SDF. Consider $f(\omega) = \frac{\sigma^2}{2\pi} |1 - e^{-i\omega}|^{-2d}$, where $0 < d < \frac{1}{2}$. By simple algebraic manipulation, this is equivalently $f(\omega) = \frac{\sigma^2}{2\pi} \big(2\sin\frac{\omega}{2}\big)^{-2d}$, from which we deduce that $f(\omega) \sim \frac{\sigma^2}{2\pi}\, \omega^{-2d}$ as $\omega \to 0$. Therefore, assuming stationarity, the process that has this SDF (or any scalar multiple of it) is a long-memory process. More generally, a process having spectral density proportional to $|1 - e^{-i\omega}|^{-2d}$ is called fractionally integrated with memory parameter $d$, FI($d$). The full trichotomy of negative, short, and long memory is determined solely by $d$.
In practice this model is of limited appeal to time series analysts because the entire memory structure is determined by just one parameter, $d$. One often therefore generalizes it by taking any short-memory SDF $f^*(\omega)$ and defining a new SDF: $f(\omega) = f^*(\omega)\, |1 - e^{-i\omega}|^{-2d}$, $0 < |d| < \frac{1}{2}$. An obvious class of short-memory processes to use this way is ARMA. Taking $f^*$ from Eq. () yields the so-called autoregressive fractionally integrated moving average process with parameter $d$ and orders $p$ and $q$ (ARFIMA($p$,$d$,$q$)), having SDF: $f(\omega) = \frac{\sigma^2}{2\pi} \frac{|\Theta(e^{-i\omega})|^2}{|\Phi(e^{-i\omega})|^2}\, |1 - e^{-i\omega}|^{-2d}$. Choosing $p = q = 0$ recovers FI($d$) $=$ ARFIMA(0,$d$,0).
Practical utility from the perspective of (Bayesian) inference demands finding a representation in the temporal domain. To obtain this, consider the operator $(1 - B)^d$ for real $d > -1$, which is formally defined using the generalized form of the binomial expansion: $(1 - B)^d = \sum_{k=0}^{\infty} \binom{d}{k} (-B)^k = \sum_{k=0}^{\infty} \pi_k B^k$, where $\pi_k = \frac{\Gamma(k - d)}{\Gamma(k + 1)\, \Gamma(-d)}$.
Finally, to connect back to our first definition of long memory, consider the ACV of the ARFIMA(0,$d$,0) process. By using the definition of spectral density to directly integrate Eq. (), and an alternative expression for the coefficients in Eq. (), one can obtain the following representation of the ACV of the ARFIMA(0,$d$,0) process: $\gamma_k = \sigma^2\, \frac{\Gamma(1 - 2d)}{\Gamma(d)\, \Gamma(1 - d)}\, \frac{\Gamma(k + d)}{\Gamma(k + 1 - d)}$. Because the parameter $\sigma^2$ is just a scalar multiplier, we may simplify notation by defining $\gamma_k' = \gamma_k / \sigma^2$. Then the ACF is $\rho_k = \frac{\Gamma(1 - d)}{\Gamma(d)}\, \frac{\Gamma(k + d)}{\Gamma(k + 1 - d)}$, from which Stirling's approximation gives $\rho_k \sim \frac{\Gamma(1 - d)}{\Gamma(d)}\, k^{2d - 1}$ as $k \to \infty$, confirming a power-law relationship for the ACF. Finally, note that Eq. () can be used to represent ARFIMA(0,$d$,0) as an AR($\infty$) process, as $\varepsilon_t = \sum_{k=0}^{\infty} \pi_k X_{t-k}$. Furthermore, noting that $X_t = (1 - B)^{-d}\varepsilon_t$ in this case leads to the following MA($\infty$) analog: $X_t = \sum_{k=0}^{\infty} \psi_k \varepsilon_{t-k}$, with $\psi_k = \frac{\Gamma(k + d)}{\Gamma(k + 1)\, \Gamma(d)}$.
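This power law is easy to verify numerically; a short R sketch (our illustration) compares the exact ACF with its Stirling approximation:

```r
## Exact ARFIMA(0,d,0) ACF via log-gamma (numerically stable for large k),
## compared with the asymptotic power law from Stirling's formula.
arfima_acf <- function(d, k) {
  exp(lgamma(1 - d) - lgamma(d) + lgamma(k + d) - lgamma(k + 1 - d))
}

d <- 0.25
k <- 10^(1:4)
exact  <- arfima_acf(d, k)
approx <- gamma(1 - d) / gamma(d) * k^(2 * d - 1)
## The ratio exact/approx tends to 1 as k grows.
```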
Seasonal long-memory models
We define a seasonal differencing operator $(1 - B^s)^{d_s}$, leading to a natural extension, the SARFIMA process, by combining seasonal and non-seasonal fractional differencing operators: $(1 - B)^d (1 - B^s)^{d_s} X_t = \varepsilon_t$.
The generalization to include both seasonal and non-seasonal short-memory components is obvious: $\Phi(B)\, \Phi_s(B^s)\, (1 - B)^d (1 - B^s)^{d_s} X_t = \Theta(B)\, \Theta_s(B^s)\, \varepsilon_t$.
Focusing on the first of these issues, considered generalizing the ARFIMA(0,$d$,0) process in a different manner by retaining only one pole, but at any given frequency in $[0, \pi]$. The model he suggested was later studied and popularized by and , and became known as the "Gegenbauer process".
A process is a Gegenbauer($d$;$\nu$) process if for all $t$: $(1 - 2\cos(\nu) B + B^2)^d X_t = \varepsilon_t$, where $\nu$ is called the Gegenbauer frequency. The obvious extension to include short-memory components $\Phi$ and $\Theta$ is denoted GARMA($p$,$q$;$d$;$\nu$).
The term "Gegenbauer" derives from the close relationship to the Gegenbauer polynomials, a set of orthogonal polynomials useful in applied mathematics. The Gegenbauer polynomials are most usefully defined in terms of their generating function: the Gegenbauer polynomial of order $n$ with parameter $\lambda$, $C_n^{(\lambda)}(x)$, satisfies $(1 - 2xz + z^2)^{-\lambda} = \sum_{n=0}^{\infty} C_n^{(\lambda)}(x)\, z^n$.
The spectral density function of the Gegenbauer($d$;$\nu$) process is $f(\omega) = \frac{\sigma^2}{2\pi}\, |2(\cos\omega - \cos\nu)|^{-2d}$.
Note that Gegenbauer($d$;$\nu$) processes possess a pole at the Gegenbauer frequency $\nu$. Gegenbauer processes may be considered somewhat ambiguous in terms of long memory: non-trivial (i.e., $\nu \neq 0$) Gegenbauer processes have bounded spectral density functions at the origin, and therefore do not have long memory according to our strict definition. Consequently a more general Gegenbauer process was developed: let $d = (d_1, \dots, d_k)$ and $\nu = (\nu_1, \dots, \nu_k)$, with the $\nu_i \in [0, \pi]$ assumed distinct. Then a process is a $k$-factor Gegenbauer($d$;$\nu$) process if for all $t$: $\prod_{i=1}^{k} (1 - 2\cos(\nu_i) B + B^2)^{d_i} X_t = \varepsilon_t$.
The spectral density function of the $k$-factor Gegenbauer($d$;$\nu$) process is $f(\omega) = \frac{\sigma^2}{2\pi} \prod_{i=1}^{k} |2(\cos\omega - \cos\nu_i)|^{-2d_i}$.
Indeed, $k$-factor Gegenbauer models are very flexible, and include nearly all other seasonal variants of ARFIMA processes, such as the flexible-seasonal ARFIMA and fractional ARUMA processes. Importantly, they also include SARFIMA processes : a SARFIMA(0,$d$,0)$\times$(0,$d_s$,0)$_s$ process is equivalent to a $k$-factor Gegenbauer($d$;$\nu$) process where $k = \lfloor s/2 \rfloor + 1$, $\nu_1 = 0$ with $d_1 = d + d_s$, and $\nu_i = 2\pi(i - 1)/s$ with $d_i = d_s$ for $i = 2, \dots, k$, unless $s$ is even in which case $d_k = d_s/2$.
Although $k$-factor Gegenbauer models are very general, one particular sub-model is potentially very appealing. This is the two-factor model, with one pole at the origin and one at a non-zero frequency. In order to conform with notation for ARFIMA(0,$d$,0) processes, we slightly re-define this model: a process is a simple two-frequency Gegenbauer process with parameters $d$, $d_s$, and $\nu$, denoted Gegenbauer($d$,$d_s$;$\nu$), if for all $t$: $(1 - B)^d (1 - 2\cos(\nu) B + B^2)^{d_s} X_t = \varepsilon_t$. The Bayesian MCMC methodology developed here is easily extended to incorporate these seasonal fractional models. It is assumed that the frequency $\nu$, or seasonal period $s$, is a priori known.
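For reference, the implied SDF combines the FI($d$) factor at the origin with a Gegenbauer factor at $\nu$; a minimal R sketch (our illustration, unit scale):

```r
## SDF of the simple two-frequency Gegenbauer(d, d_s; nu) process.
gegenbauer2_sdf <- function(omega, d, ds, nu, sigma = 1) {
  sigma^2 / (2 * pi) *
    abs(2 * sin(omega / 2))^(-2 * d) *           # pole at the origin
    abs(2 * (cos(omega) - cos(nu)))^(-2 * ds)    # pole at the Gegenbauer frequency
}
```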
Acknowledgements
We thank one anonymous reviewer and M. Crucifix for their comments, which helped to improve this manuscript. C. L. E. Franzke is supported by the German Research Foundation (DFG) through the cluster of excellence CliSAP (EXC177), N. W. Watkins is supported by ONR NICOP grant N62909-15-1-N143, and both are supported by the Norwegian Research Council KLIMAFORSK project 229754. N. W. Watkins thanks the University of Potsdam for hospitality. Edited by: Z. Toth Reviewed by: M. Crucifix and another anonymous referee
Abstract
Many geophysical quantities, such as atmospheric temperature, water levels in rivers, and wind speeds, have shown evidence of long memory (LM). LM implies that these quantities experience non-trivial temporal memory, which potentially not only enhances their predictability, but also hampers the detection of externally forced trends. Thus, it is important to reliably identify whether or not a system exhibits LM. In this paper we present a modern and systematic approach to the inference of LM. We use the flexible autoregressive fractionally integrated moving average (ARFIMA) model, which is widely used in time series analysis, and of increasing interest in climate science. Unlike most previous work on the inference of LM, which is frequentist in nature, we provide a systematic treatment of Bayesian inference. In particular, we provide a new approximate likelihood for efficient parameter inference, and show how nuisance parameters (e.g., short-memory effects) can be integrated over in order to focus on long-memory parameters and hypothesis testing more directly. We illustrate our new methodology on the Nile water level data and the central England temperature (CET) time series, with favorable comparison to the standard estimators. For CET we also extend our method to seasonal long memory.
Details
1 URS Corporation, London, UK
2 The University of Chicago, Booth School of Business, Chicago, IL, USA
3 Meteorological Institute and Center for Earth System Research and Sustainability (CEN), University of Hamburg, Hamburg, Germany
4 Centre for the Analysis of Time Series, London School of Economics and Political Science, London, UK; Centre for Fusion Space and Astrophysics, University of Warwick, Coventry, UK; Max Planck Institute for the Physics of Complex Systems, Dresden, Germany; Faculty of Mathematics, Computing and Technology, Open University, Milton Keynes, UK