A comparison of two causal methods in the context

1 Introduction

One of the most commonly used methodologies to identify potential relationships between variables in climate research is correlation, with or without a lag (or time delay). For example, used an approach based on lead–lag correlations between sea-surface temperature (SST) and turbulent heat flux to discriminate between atmospheric-driven and ocean-led variability using both a stochastic energy balance model and satellite observations at monthly timescale. In another study, found a systematic large anticorrelation between Arctic sea-ice area and northward ocean heat transport in climate models at different resolutions, which confirmed previous observational findings showing that the latter is a driver of the former . Another example is the modeling analysis from , who used a regression analysis to quantify the dynamical and thermodynamical contributions to the ocean heat content tendency at the global scale.

However, such correlation (or linear regression) approaches, although useful for identifying potential relationships between variables, do not imply causation. A significant correlation simply means that there is a relationship, or synchronous behavior, between two variables, without explicitly confirming a causal link between the two. Correlation suffers from five key limitations. First, a significant correlation between variables could appear by chance (so-called “random coincidence”). Second, correlation does not allow us to identify the direction of the potential causal link, so this approach presupposes a priori knowledge of the processes at play. The problem of directional dependence is often addressed using lagged correlation or regression, but this method is prone to overstating causal relationships when one variable has significant memory . Third, there could be an external (hidden) variable (sometimes referred to as a “confounding variable”) that influences two correlated variables, as demonstrated in , and a simple correlation analysis would not allow for disentangling these causal links. Fourth, linear correlation cannot identify possible nonlinear relationships. Lastly, correlation is computed for pairs of variables and does not consider multivariate frameworks.

Hence, causal methods prove to be very useful. provide a detailed review of selected causal inference frameworks applied to Earth system sciences. Some of these methods are briefly described hereafter. Granger causality was the first formalization of causality for time series and is based on autoregressive modeling . It has been used in a series of climate studies, including several analyses focusing on air–sea interactions . Convergent cross mapping (CCM) attempts to uncover causal relationships based on Takens' theorem and nonlinear state-space reconstruction . For example, CCM has been used for analyzing the temperature–CO$_{2}$ relationship over glacial–interglacial timescales , the causal dependencies between different ocean basins , and the stratosphere–troposphere coupling . Transfer entropy and conditional mutual information (CMI; ) are two other widely used causal methods. have used a computationally fast alternative to transfer entropy, called pseudo-transfer entropy, to quantify causal dependencies between 13 climate indices representing large-scale climate patterns.

The Peter and Clark momentary conditional independence (PCMCI) method is a causal discovery method based on the Peter and Clark (PC) algorithm , combined with the momentary conditional independence (MCI) approach . It is based on the systematic exploitation of partial correlations, conditional mutual information, or any other conditional dependency measure. PCMCI has been used, for example, to analyze Arctic drivers of midlatitude winter circulation , relationships between Niño3.4 and extratropical air temperature over British Columbia , tropical and midlatitude drivers of the Indian summer monsoon , predictors for seasonal Atlantic hurricane activity , and interactions between tropical convection and midlatitude circulation .

The Liang–Kleeman information flow (LKIF; ) is based on the rate of information transfer in dynamical systems and has been rigorously derived from the propagation of information entropy between variables . This method has been applied to several climate studies, including the El Niño–Indian Ocean Dipole (IOD) link , the relationship between carbon dioxide and air temperature , dynamical dependencies between a set of observables and the Antarctic surface mass balance , identification of potential drivers of Arctic sea-ice changes , causal links between climate indices in the North Pacific and Atlantic regions and local Belgian time series , and ocean–atmosphere interactions .

Commonly, each study focuses on only one causal method. However, contradictory results might appear when using different causal methods, and it is thus important to compare them. Several studies have investigated differences between causal methods. One of the most comprehensive studies in this respect in the recent past is the intercomparison of , in which the authors compared six causal methods, namely, Granger causality, two extended versions of Granger causality, CMI, CCM, and predictability improvement . They used seven artificial datasets based on coupled systems. A key outcome of their analysis is that there is no single best causal method as results depend on the intrinsic characteristics of the used dataset. found that for simple autoregressive models, Granger causality and its extensions were the best tools to identify the right causal links, while CCM and predictability improvement failed. On the contrary, for more complex systems, Granger causality and its extensions failed, while the remaining methods were more successful, although they differed considerably in their ability to detect the presence and direction of coupling. showed that the Granger causality principle, that the cause precedes the effect, was violated in coupled chaotic dynamical systems using CMI, CCM, and predictability improvement. used CMI and CCM and showed that the detection of coupling delays in coupled nonlinear dynamical systems was challenging. compared CMI with LKIF and interventional causality , and they confirmed a robust influence of solar wind on geomagnetic indices using all causal methods. An advantage of interventional causality compared to other causal methods is the detection of indirect causal links (i.e., if $x$ influences $y$ and $y$ drives $z$ , then the indirect influence from $x$ to $z$ will be recovered).

The main goal of this study is to provide a detailed comparison between two independent causal methods, namely, LKIF and PCMCI, which have been widely used in the context of the JPI-Climate/JPI-Oceans ROADMAP project (Role of ocean dynamics and Ocean-Atmosphere interactions in Driving cliMAte variations and future Projections of impact-relevant extreme events; https://jpi-climate.eu/project/roadmap/, last access: 21 February 2024) and have not previously been compared systematically. In this analysis, we use these two methods in the same framework to allow for a fair comparison. We also compute the correlation coefficient to show the superiority of causal methods over a classical correlation analysis. In particular, we use four different artificial models with an increasing level of complexity and one real-world case study based on climate indices. These different datasets are described in Sect. , and our two causal methods are presented in Sect. . Results of our comparison are presented in Sect. , and a discussion is provided in Sect. , before concluding in Sect. .

2 Data

In order to apply the two causal methods described below (Sect. ), we use three different stochastic models (two linear models and one nonlinear model), one deterministic nonlinear model (the Lorenz (1963) model), and one real-world case study using climate indices in the Atlantic and Pacific regions. This allows us to test LKIF and PCMCI with an increasing level of complexity (from a simple two-dimensional model to a real-world case study).

2.1 Two-dimensional (2D) model

We first consider a two-dimensional (2D) stochastic linear model (Eq. 12 in ):

$\begin{aligned} d x_{1} & = (- x_{1} + 0.5 x_{2}) d t + 0.1 d w_{1}, \\ d x_{2} & = - x_{2} d t + 0.1 d w_{2}, \end{aligned} \qquad (1)$

where $x_{1}$ and $x_{2}$ are the two variables, $t$ is time, and $w_{1}$ and $w_{2}$ represent standard Wiener processes in $x_{1}$ and $x_{2}$, respectively ($w_{k, t + Δ t} - w_{k, t} \sim \sqrt{Δ t} N(0,1)$, with $N(0,1)$ being a normal distribution with zero mean and unit variance). In this simple system, $x_{2}$ drives $x_{1}$ but not vice versa (Fig. f).

We solve this system with the Euler–Maruyama method using a time step $Δ t = 0.001$ and 1000 unit times, which yields $10^{6}$ time steps. We initialize the system with $x_{1} (0) = 1$ and $x_{2} (0) = 2$. For our analysis, we discard the first 10 unit times (first $10^{4}$ time steps), which we consider to be the spin-up period.
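The integration described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `simulate_2d`, the random seed, and the shortened run length are assumptions made for the example, while the coefficients, time step, and initial conditions are those given in the text.

```python
import numpy as np

def simulate_2d(n_steps=1_000_000, dt=0.001, x0=(1.0, 2.0), seed=0):
    """Euler-Maruyama integration of the 2D linear stochastic model (Eq. 1)."""
    rng = np.random.default_rng(seed)          # seed is an arbitrary choice
    # Wiener increments: sqrt(dt) * N(0, 1), one column per variable
    dw = np.sqrt(dt) * rng.standard_normal((n_steps, 2))
    x = np.empty((n_steps + 1, 2))
    x[0] = x0
    for n in range(n_steps):
        x1, x2 = x[n]
        x[n + 1, 0] = x1 + (-x1 + 0.5 * x2) * dt + 0.1 * dw[n, 0]
        x[n + 1, 1] = x2 + (-x2) * dt + 0.1 * dw[n, 1]
    return x

# Shortened run for illustration (20 unit times instead of 1000)
series = simulate_2d(n_steps=20_000)
series = series[10_000:]                       # discard 10 unit times of spin-up
```

With the default arguments, the function reproduces the run length stated in the text; the shortened call only keeps the example fast.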

2.2 Six-dimensional (6D) model

Then, we investigate a six-dimensional (6D) stochastic linear vector autoregressive (VAR) model with only one lag (Eq. 21 in ):

$\begin{aligned} x_{1, t + 1} & = 0.1 - 0.6 x_{3, t} + u_{1, t + 1}, \\ x_{2, t + 1} & = 0.7 - 0.5 x_{1, t} + 0.8 x_{6, t} + u_{2, t + 1}, \\ x_{3, t + 1} & = 0.5 + 0.7 x_{2, t} + u_{3, t + 1}, \\ x_{4, t + 1} & = 0.2 + 0.7 x_{4, t} + 0.4 x_{5, t} + u_{4, t + 1}, \\ x_{5, t + 1} & = 0.8 + 0.2 x_{4, t} + 0.7 x_{6, t} + u_{5, t + 1}, \\ x_{6, t + 1} & = 0.3 - 0.5 x_{6, t} + u_{6, t + 1}, \end{aligned} \qquad (2)$

where $x_{k}$ ($k = 1, \dots, 6$) represents the six variables, and $u_{k}$ represents normal random noises in these six variables ($u_{k} \sim N(0,1)$). By construction, we have two directed cycles, i.e., $x_{1} \to x_{2} \to x_{3} \to x_{1}$ and $x_{4} \to x_{5} \to x_{4}$, and these cycles are driven by a common cause, i.e., $x_{6}$, which drives both $x_{2}$ and $x_{5}$ (Fig. d).

We solve this system using $10^{6}$ time steps ($Δ t = 1$). For our analysis, we discard the first $10^{4}$ time steps.
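As an illustration, the 6D VAR model can be written in matrix form and iterated directly. The sketch below is not the authors' code: the names `c`, `A`, `simulate_var6`, and `true_links`, the zero initial state, and the seed are assumptions, whereas the coefficients are copied from the model equations.

```python
import numpy as np

# Intercepts and lag-1 coefficients; A[i, j] multiplies x_{j+1,t}
# in the equation for x_{i+1,t+1}.
c = np.array([0.1, 0.7, 0.5, 0.2, 0.8, 0.3])
A = np.zeros((6, 6))
A[0, 2] = -0.6
A[1, 0], A[1, 5] = -0.5, 0.8
A[2, 1] = 0.7
A[3, 3], A[3, 4] = 0.7, 0.4
A[4, 3], A[4, 5] = 0.2, 0.7
A[5, 5] = -0.5

def simulate_var6(n_steps, seed=0):
    """Iterate the 6D VAR(1) model from a zero initial state (an assumption)."""
    rng = np.random.default_rng(seed)
    x = np.zeros((n_steps + 1, 6))
    for t in range(n_steps):
        x[t + 1] = c + A @ x[t] + rng.standard_normal(6)   # u_k ~ N(0, 1)
    return x

# Ground-truth cross-variable links as (source, target) pairs, 1-based
true_links = {(j + 1, i + 1) for i in range(6) for j in range(6)
              if A[i, j] != 0 and i != j}
```

Reading the nonzero off-diagonal entries of `A` gives the seven ground-truth links, and the nonzero diagonal entries (for $x_{4}$ and $x_{6}$) give the two self-influences.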

2.3 Nine-dimensional (9D) model

The next model is a nine-dimensional (9D) stochastic nonlinear VAR system with a maximum of four lags (Eq. 17 in ):

$\begin{aligned} x_{1, t} & = 3.4 x_{1, t - 1} (1 - x_{1, t - 1}^{2}) e^{- x_{1, t - 1}^{2}} + 2.5 x_{2, t - 4} \\ & \quad + 1.8 x_{3, t - 2} + 1.5 x_{4, t - 2} + 0.4 u_{1, t}, \\ x_{2, t} & = 3.4 x_{2, t - 1} (1 - x_{2, t - 1}^{2}) e^{- x_{2, t - 1}^{2}} + 0.4 u_{2, t}, \\ x_{3, t} & = 3.4 x_{3, t - 1} (1 - x_{3, t - 1}^{2}) e^{- x_{3, t - 1}^{2}} + 0.25 x_{1, t - 1} + 0.4 u_{3, t}, \\ x_{4, t} & = 3.4 x_{4, t - 1} (1 - x_{4, t - 1}^{2}) e^{- x_{4, t - 1}^{2}} + 1.5 x_{5, t - 3} \\ & \quad + 1.2 x_{6, t - 1} + 0.4 u_{4, t}, \\ x_{5, t} & = 3.4 x_{5, t - 1} (1 - x_{5, t - 1}^{2}) e^{- x_{5, t - 1}^{2}} + 0.4 u_{5, t}, \\ x_{6, t} & = 3.4 x_{6, t - 1} (1 - x_{6, t - 1}^{2}) e^{- x_{6, t - 1}^{2}} + 1.5 x_{7, t - 3} + 0.4 u_{6, t}, \\ x_{7, t} & = 3.4 x_{7, t - 1} (1 - x_{7, t - 1}^{2}) e^{- x_{7, t - 1}^{2}} + 0.4 u_{7, t}, \\ x_{8, t} & = 3.4 x_{8, t - 1} (1 - x_{8, t - 1}^{2}) e^{- x_{8, t - 1}^{2}} + 0.8 x_{7, t - 1} + 0.4 u_{8, t}, \\ x_{9, t} & = 3.4 x_{9, t - 1} (1 - x_{9, t - 1}^{2}) e^{- x_{9, t - 1}^{2}} + 1.8 x_{7, t - 1} + 0.4 u_{9, t}, \end{aligned} \qquad (3)$

where $x_{k}$ ($k = 1, \dots, 9$) represents the nine variables, $e$ is the exponential function, and $u_{k}$ represents normal random noises in these nine variables ($u_{k} \sim N(0,1)$). This system contains a directed chain $x_{7} \to x_{6} \to x_{4} \to x_{1} \to x_{3}$ and a fork, i.e., $x_{7}$ driving $x_{6}$, $x_{8}$, and $x_{9}$. There are also two colliders, with $x_{5}$ and $x_{6}$ both affecting $x_{4}$ on the one hand, and $x_{2}$, $x_{3}$, and $x_{4}$ driving $x_{1}$ on the other hand (Fig. d). A particularity of this system compared to the 6D model (Eq. ) is the presence of lags larger than one.

We solve this system using $10^{6}$ time steps ($Δ t = 1$). For our analysis, we discard the first $10^{4}$ time steps.
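A compact way to iterate the 9D model is to separate the shared nonlinear self-term from the lagged cross-variable couplings. The following is a sketch under assumptions: the names `f`, `LINKS`, and `simulate_9d`, the zero initial history, and the seed are illustrative choices, while the coefficients and lags are taken from the model equations.

```python
import numpy as np

def f(x):
    """Nonlinear self-term shared by all nine variables."""
    return 3.4 * x * (1.0 - x**2) * np.exp(-x**2)

# (target, source, lag, coefficient), with 1-based variable indices
LINKS = [(1, 2, 4, 2.5), (1, 3, 2, 1.8), (1, 4, 2, 1.5), (3, 1, 1, 0.25),
         (4, 5, 3, 1.5), (4, 6, 1, 1.2), (6, 7, 3, 1.5),
         (8, 7, 1, 0.8), (9, 7, 1, 1.8)]

def simulate_9d(n_steps, seed=0):
    """Iterate the 9D nonlinear VAR model from a zero history (an assumption)."""
    rng = np.random.default_rng(seed)
    max_lag = 4
    x = np.zeros((n_steps + max_lag, 9))
    for t in range(max_lag, n_steps + max_lag):
        # self-term at lag 1 plus scaled Gaussian noise
        x[t] = f(x[t - 1]) + 0.4 * rng.standard_normal(9)
        # lagged cross-variable couplings
        for tgt, src, lag, coef in LINKS:
            x[t, tgt - 1] += coef * x[t - lag, src - 1]
    return x[max_lag:]
```

Listing the couplings as tuples makes the ground-truth graph (chain, fork, and colliders) explicit and easy to compare with the links returned by a causal method.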

2.4 Lorenz (1963) model

We also use the three-dimensional (3D) Lorenz (1963) model, which is deterministic, nonlinear, and non-periodic; it is a simplified model representing atmospheric convection:

$\begin{aligned} \frac{d x}{d t} & = 10 (y - x), \\ \frac{d y}{d t} & = 28 x - y - x z, \\ \frac{d z}{d t} & = x y - \frac{8}{3} z, \end{aligned} \qquad (4)$

where $x$, $y$, and $z$ are the three variables and are proportional to the convection intensity, the horizontal temperature variation, and the vertical temperature variation, respectively. We use the standard parameters of the model.

We solve the model using the fourth-order Runge–Kutta scheme, a time step $Δ t = 0.01$, and 1000 unit times, which yields $10^{5}$ time steps. We initialize the system with $x (0) = 0$, $y (0) = 1$, and $z (0) = 0$. For our analysis, we discard the first 100 unit times (first $10^{4}$ time steps; the spin-up period).
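The integration can be sketched with a hand-written classical fourth-order Runge–Kutta step. The function names below are illustrative assumptions; the parameters (10, 28, 8/3), time step, and initial state are those stated in the text.

```python
import numpy as np

def lorenz_rhs(s, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz system with the standard parameters."""
    x, y, z = s
    return np.array([sigma * (y - x), rho * x - y - x * z, x * y - beta * z])

def rk4_step(s, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = lorenz_rhs(s)
    k2 = lorenz_rhs(s + 0.5 * dt * k1)
    k3 = lorenz_rhs(s + 0.5 * dt * k2)
    k4 = lorenz_rhs(s + dt * k3)
    return s + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def simulate_lorenz(n_steps=100_000, dt=0.01, s0=(0.0, 1.0, 0.0)):
    traj = np.empty((n_steps + 1, 3))
    traj[0] = s0
    for n in range(n_steps):
        traj[n + 1] = rk4_step(traj[n], dt)
    return traj
```

Calling `simulate_lorenz()[10_000:]` would correspond to the run length and spin-up removal described above.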

2.5 Climate indices

Finally, we use eight different regional climate indices affecting the Atlantic and Pacific regions, mainly in the Northern Hemisphere, following a similar approach as and . Four of these indices are based on atmospheric variables and four of them are based on oceanic ones. Time series of these indices were retrieved from the Physical Sciences Laboratory (PSL) of the National Oceanic and Atmospheric Administration (NOAA; https://psl.noaa.gov/data/climateindices/list/, last access: 20 January 2023). We use monthly values from January 1950 to December 2021 (864 months), and we remove the linear trend in order to get approximately stationary time series, which is a requirement for applying our causal methods.
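The linear detrending step can be illustrated as follows. This is only a sketch assuming an ordinary least-squares fit against the time index; the function name `remove_linear_trend` is hypothetical, and the source does not state how the trend was estimated.

```python
import numpy as np

def remove_linear_trend(y):
    """Subtract the least-squares linear fit from a 1D time series."""
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y), dtype=float)        # time index (e.g., months)
    slope, intercept = np.polyfit(t, y, deg=1)
    return y - (slope * t + intercept)
```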

The four atmospheric indices are computed from the National Centers for Environmental Prediction/National Center for Atmospheric Research (NCEP/NCAR) reanalysis:

The Pacific–North American (PNA) index is obtained by projecting the daily 500 hPa geopotential height anomalies over the Northern Hemisphere (0–90° N) onto the PNA loading pattern (second leading mode of rotated empirical orthogonal function (EOF) analysis of monthly mean 500 hPa height anomalies during the 1950–2000 period). A positive PNA features above-average heights in the vicinity of Hawaii and over the intermountain region of North America and below-average heights south of the Aleutian Islands and over the southeastern United States. A negative PNA reflects an opposite pattern of height anomalies over these regions.
The North Atlantic Oscillation (NAO) index is based on the difference in sea-level pressure between the subtropical high (Azores) and the subpolar low (Iceland). A positive NAO reflects above-normal pressure over the central North Atlantic, the eastern United States, and western Europe and below-normal pressure across high latitudes of the North Atlantic. A negative NAO features an opposite pattern of pressure anomalies over these regions.
The Arctic Oscillation (AO), or Northern Annular Mode (NAM), index is constructed by projecting the 1000 hPa geopotential height anomalies poleward of 20° N onto the leading EOF (using monthly mean 1000 hPa height anomalies from 1979 to 2000). When the AO is in its positive phase, strong westerlies act to confine colder air across polar regions. When the AO is negative, the westerly jet weakens and can become more meandering.
The Quasi-Biennial Oscillation (QBO) index is calculated from the zonal average of the 30 hPa zonal wind at the Equator. It is the most predictable mode of atmospheric variability that is not linked to changing seasons, with easterly and westerly winds alternating every 13 months.

Below are the four indices based on ocean conditions:

The Atlantic Multidecadal Oscillation (AMO) index is computed based on version 2 of the extended SST gridded dataset (which uses UK Met Office SST data) averaged over the North Atlantic (0–70° N; unsmoothed time series) and following the procedure described in . Cool and warm phases of the AMO may alternate every 20–40 years.
The Pacific Decadal Oscillation (PDO) index is obtained by projecting the Pacific SST anomalies from version 5 of the NOAA Extended Reconstructed SST (ERSST) dataset onto the dominant EOF from 20 to 60° N. The PDO is positive when SST is anomalously cold in the interior North Pacific and warm along the eastern Pacific Ocean. The PDO is negative when the climate anomaly patterns are reversed.
The Tropical North Atlantic (TNA) index is computed based on SST anomalies from the Hadley Centre Global Sea Ice and Sea Surface Temperature (HadISST) and NOAA Optimal Interpolation (OI) datasets averaged in the Tropical North Atlantic (5.5–23.5° N; 57.5–15° W), based on .
The Niño3.4 index is based on standardized SST anomalies (using ERSST v5) averaged over the eastern tropical Pacific (5° S–5° N; 170–120° W). The Niño3.4 index is in its warm phase when SST anomaly exceeds 0.5 °C, and it is in its cold phase when SST anomaly is below $- 0.5$ °C. For the remainder of the paper, we will refer to this index as “ENSO” (El Niño–Southern Oscillation), as it is closely associated with this oscillation.

3 Methods

In this section, we describe the two causal methods used in this study, namely, the Liang–Kleeman information flow (LKIF; Sect. ) and the Peter and Clark momentary conditional independence (PCMCI; Sect. ) methods. We compare our results to the more traditional Pearson correlation coefficient, which is the covariance between two variables divided by the product of their standard deviations. We also explain below the main differences between the two methods (Sect. ) and provide details about the comparison diagnostics used in our study (Sect. ).

3.1 Liang–Kleeman information flow (LKIF)

The LKIF method has been developed by . It has been first applied in bivariate cases and has subsequently been extended to multivariate cases . In our study, we use the multivariate formulation of LKIF. In this framework, causal inference is based on information flow, which has been recognized as a real physical notion, i.e., formulated from first principles of information theory .

Under the assumption of a linear model with additive noise, the maximum likelihood estimate of the information flow reads as follows :

$T_{j \to i} = \frac{1}{\det C} \cdot \sum_{k = 1}^{d} Δ_{j k} C_{k, d i} \cdot \frac{C_{i j}}{C_{i i}}, \qquad (5)$

where $T_{j \to i}$ is the absolute rate of information transfer from variable $x_{j}$ to variable $x_{i}$, $C$ is the covariance matrix, $d$ is the number of variables, $Δ_{j k}$ represents the cofactors of $C$ ($Δ_{j k} = (- 1)^{j + k} M_{j k}$, where $M_{j k}$ represents the minors), $C_{k, d i}$ is the sample covariance between all $x_{k}$ and the Euler forward difference approximation of $d x_{i} / d t$, $C_{i j}$ is the sample covariance between $x_{i}$ and $x_{j}$, and $C_{i i}$ is the sample variance of $x_{i}$. Note that a nonlinear version of LKIF has recently been developed but will not be used in this study .
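The estimator above can be sketched numerically as follows. This is an illustrative implementation, not the authors' code: the function name `lkif_absolute` and the array layout are assumptions. It uses the fact that, because $C$ is symmetric, the cofactor sum divided by $\det C$ equals the $j$th component of the solution of the linear system $C a = C_{\cdot, d i}$, which avoids computing cofactors explicitly.

```python
import numpy as np

def lkif_absolute(x, dt=1.0):
    """Absolute information flow; x has shape (n_samples, d).

    Returns T with T[j, i] = T_{j->i} (0-based indices).
    """
    x = np.asarray(x, dtype=float)
    dx = (x[1:] - x[:-1]) / dt                 # Euler forward difference
    xc = x[:-1] - x[:-1].mean(axis=0)          # centered states
    dxc = dx - dx.mean(axis=0)                 # centered tendencies
    n = xc.shape[0]
    C = xc.T @ xc / (n - 1)                    # C[i, j] = cov(x_i, x_j)
    C_d = xc.T @ dxc / (n - 1)                 # C_d[k, i] = cov(x_k, dx_i/dt)
    # a[j, i] = (1 / det C) * sum_k cofactor_{jk} * C_d[k, i]
    a = np.linalg.solve(C, C_d)
    # multiply by C_ij / C_ii, with the target i along the columns
    return a * C.T / np.diag(C)[None, :]
```

The diagonal of the returned matrix corresponds to the $j = i$ case of the same formula; the normalization into the relative rate $τ$ requires the additional noise term and is not covered by this sketch.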

To assess the importance of the different cause–effect relationships, we compute the relative rate of information transfer $τ_{j \to i}$ from variable $x_{j}$ to variable $x_{i}$ following the normalization procedure of :

$τ_{j \to i} = \frac{T_{j \to i}}{Z_{i}}, \qquad (6)$

where $Z_{i}$ is the normalizer, computed as follows:

$Z_{i} = \sum_{k = 1}^{d} \left| T_{k \to i} \right| + \left| \frac{d H_{i}^{\mathrm{noise}}}{d t} \right|, \qquad (7)$

where the first term on the right-hand side represents the information flowing from all the $x_{k}$ to $x_{i}$ (including the influence of $x_{i}$ on itself), and the last term is the effect of noise (taking stochastic effects into account), computed following .

In the following, we will only use the relative rate of information transfer $τ$ (expressed in %). When $τ_{j \to i}$ is significantly different from 0, $x_{j}$ has an influence on $x_{i}$; when $τ_{j \to i} = 0$, there is no influence. The absolute value of $τ$ indicates the strength of the causal influence. A positive (negative) value is indicative of an increase (decrease) in variability of the target variable $x_{i}$ due to the causal influence of the source $x_{j}$. However, we will mainly use the absolute value of $τ$ in this study and will only briefly discuss the sign in the case of the Lorenz (1963) model (Fig. ). Statistical significance of $τ_{j \to i}$ is computed via bootstrap resampling with replacement of all terms included in Eqs. ()–() and using a significance level $α = 5\%$. The number of bootstrap realizations varies depending on the case study: 100 for the 2D and Lorenz (1963) models, 300 for the 6D and 9D models, and 1000 for the real-world case study. This number is chosen to be sufficiently large to achieve convergence of results. The relative rate of information transfer $τ$ is computed for each bootstrap realization, and the error in $τ$, which we refer to as $ϵ_{τ}$, is calculated as the standard deviation across all bootstrapped $τ$ values. If the confidence interval $τ \pm 1.96 ϵ_{τ}$ does not contain zero, then $τ$ is significant at the 5 % level; otherwise, it is not significant.
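The significance rule can be written in a few lines. The sketch below assumes that the bootstrapped values of $τ$ are already available (how each realization is resampled is not reproduced here); the function name `significant_at_5pct` is hypothetical, and the population standard deviation is an assumption, since the text does not specify the estimator.

```python
import numpy as np

def significant_at_5pct(tau, tau_boot):
    """True where the interval tau +/- 1.96 * eps excludes zero.

    tau_boot holds the bootstrapped tau values along axis 0.
    """
    eps = np.std(np.asarray(tau_boot, dtype=float), axis=0)
    # the interval excludes zero exactly when |tau| > 1.96 * eps
    return np.abs(tau) > 1.96 * eps
```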

3.2 Peter and Clark momentary conditional independence (PCMCI)

The PCMCI method is a causal discovery method based on the Peter and Clark (PC) algorithm , combined with the momentary conditional independence (MCI) approach . Given a set of univariate time series (called “actors”), PCMCI estimates their causal graph representing the conditional dependencies among the time-lagged actors. In its linear application, PCMCI uses partial correlations to iteratively test conditional dependencies in a set of actors, distinguishing between true causal links and spurious links arising from autocorrelation effects, indirect links, or common drivers.

Note that the term “causal” rests upon a set of assumptions, which are described in . In general, the causal graph should represent a stationary (stable in time) set of causal links, in which causality is determined with a lag $l$ of at least one time step, and it is only true among the specific set of analyzed actors. The PCMCI algorithm is composed of two steps: the PC step and the MCI step. Each step is briefly described in this section.

In the first step, or PC step, for each actor in the (example) set of actors $P$, the algorithm identifies the initial set of parents $P^{0}$ based only on the simple correlation between each actor and all other actors up to a maximum lag $l_{\max}$. Let us assume that with $l_{\max} = 3$, $P = \{A, B, C, D, E\}$ and $P_{A}^{0} = \{A_{l = - 1}, B_{l = - 1}, D_{l = - 2}, C_{l = - 2}, E_{l = - 1}\}$, where actors $A$ to $E$ in the set of parents of $A$, $P_{A}^{0}$, are ordered based on the absolute value of their correlation coefficient with $A_{l = 0}$. Then, in the first iteration of the algorithm, the partial correlation $ρ$ between $A_{l = 0}$ and each actor in $P_{A}^{0}$ is calculated by conditioning on an additional actor taken from $P_{A}^{0}$. For example, $ρ(A_{l = 0}, A_{l = - 1} \mid B_{l = - 1}) = ρ(\mathrm{Res}(A_{l = 0}), \mathrm{Res}(A_{l = - 1}))$, where $\mathrm{Res}(A_{l = 0})$ and $\mathrm{Res}(A_{l = - 1})$ are the residuals of $A_{l = 0}$ and $A_{l = - 1}$ after removing the linear influence of $B_{l = - 1}$. The partial correlation is computed for each actor in $P_{A}^{0}$ by conditioning (only once) on the strongest available actor. This process is called “iterative conditioning”. At the end of this first iteration, the set of parents of $A$ is updated. Let us assume that in our example $P_{A}^{1} = \{A_{l = - 1}, B_{l = - 1}, C_{l = - 2}, E_{l = - 1}\}$; then in the second iteration the set of parents $P_{A}^{2}$ will be identified by conditioning on the first two strongest actors, e.g., $ρ(A_{l = 0}, A_{l = - 1} \mid B_{l = - 1}, C_{l = - 2})$. The PC step ends when the number of actors on which to condition equals the number of actors contained in $P_{A}^{n}$. Then, the same computation is repeated for each actor contained in $P$, until each actor has its own set of parents $P_{n}$.
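The residual-based definition of partial correlation used in the example above can be sketched as follows. This illustrates only the conditioning operation, not the full PC step; the helper names `residual` and `partial_corr` are assumptions, and an intercept is included in the regression as a design choice.

```python
import numpy as np

def residual(y, Z):
    """Residual of y after least-squares removal of the columns of Z."""
    Z1 = np.column_stack([np.ones(len(y)), Z])   # add an intercept column
    coef, *_ = np.linalg.lstsq(Z1, y, rcond=None)
    return y - Z1 @ coef

def partial_corr(x, y, Z):
    """Correlation between x and y after removing the linear effect of Z."""
    return np.corrcoef(residual(x, Z), residual(y, Z))[0, 1]
```

For instance, two series that share a common driver are correlated, but their partial correlation conditioned on that driver is close to zero, which is how spurious links due to common drivers are removed.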

In the second step, or MCI step, the partial correlation between each possible pair of actors is calculated a second time by regressing once on the combined set of parents. If we assume that $P_{A}^{4} = \{A_{l = - 1}, C_{l = - 2}, E_{l = - 1}\}$ and $P_{B}^{3} = \{B_{l = - 1}, A_{l = - 2}, D_{l = - 1}\}$, then a causal link between $A_{l = 0}$ and $B_{l = - 1}$ is detected if their partial correlation conditioned on their joint set of parents is significant for a certain threshold $α$. In this example, $ρ(A_{l = 0}, B_{l = - 1} \mid A_{l = - 1}, C_{l = - 2}, E_{l = - 1}, B_{l = - 2}, A_{l = - 3}, D_{l = - 2})$ is computed (note that the lag of $P_{B}^{3}$ is increased accordingly). At the end of the MCI step, each actor will have its own set of causal parents, and the causal effect of each link can be computed.

The strength of a causal link from variable $x_{j}$ at time $t - l$ to variable $x_{i}$ at time $t$ , noted $x_{j, t - l} \to x_{i, t}$ , is expressed in terms of the path coefficient $β$ , which measures the change in the expectation of $x_{i, t}$ following an increase of $x_{j, t - l}$ by 1 standard deviation, keeping all other parents of $x_{i, t}$ constant. The linear coefficients $β$ are calculated as follows:

$x_{i, t} = \sum_{k = 1}^{N} β_{k} x_{j, k} + η_{x_{i}}, \qquad (8)$

where $x_{j, k} \in P(x_{i})$ ($k = 1, \dots, N$) is the set of parents of $x_{i, t}$ ($N$ is the number of parents), and $η_{x_{i}}$ is the residual of $x_{i, t}$. Note that in order to allow for a meaningful comparison with correlation and LKIF based on a linear model, we use here the PCMCI algorithm along with a linear similarity measure (partial correlation). In principle, PCMCI could also be combined with other statistical association measures that allow for conditioning on the effects of any third variable (like CMI), the study of which is however beyond the scope of the present work. The $β$ coefficients are only calculated for causal links that are significant at the 5 % level, where each $p$ value obtained from the MCI step is corrected using the Benjamini–Hochberg false discovery rate correction method .
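The two ingredients of this last step, the Benjamini–Hochberg correction and the regression on the parents, can be sketched as follows. Both functions are illustrations with hypothetical names; in particular, standardizing the series before the regression is an assumption made here so that $β$ is expressed in units of standard deviation.

```python
import numpy as np

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # enforce monotonicity, starting from the largest p value
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted

def path_coefficients(target, parents):
    """Least-squares coefficients of the standardized target on its parents.

    target: shape (n,); parents: shape (n, N), one column per lagged parent.
    """
    zt = (target - target.mean()) / target.std()
    zp = (parents - parents.mean(axis=0)) / parents.std(axis=0)
    beta, *_ = np.linalg.lstsq(zp, zt, rcond=None)
    return beta
```

A link would then be retained when its adjusted $p$ value is below 0.05, and `path_coefficients` would be applied to the retained parents of each variable.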

3.3 Differences between the two methods

Before investigating results from the two causal methods, it is important to highlight the main differences between the two methods, which are summarized in Table . LKIF is directly derived from the propagation of information entropy and quantifies the rate of information transfer from one variable to the other . PCMCI, on the other hand, is a causal network algorithm starting with a fully connected graph from which non-causal links are iteratively removed based on conditioning sets of growing cardinality . The actual underlying PCMCI measure for directional statistical dependence is partial correlations, including the effect of possible causal parents. LKIF does not systematically test the latter but uses a different approach, in which the statistical dependence is measured via the information flowing from one variable to the other.

The metric used by LKIF is the rate of information transfer from variable $x_{j}$ to variable $x_{i}$ and can be expressed either in natural unit of information (nat) per unit time (for $T$ ; Eq. ) or in percent (for $τ$ ; Eq. ). For PCMCI, the path coefficient $β$ (Eq. ) measures the expected change in $x_{i}$ at time $t$ (in units of standard deviation) if $x_{j}$ is perturbed at time $t - l$ by 1 standard deviation. While time lags must be incorporated with PCMCI, LKIF has not been designed to work with such lags by default, although they can be used in principle . To this end, we can shift in time the time series of the leading variable and recompute LKIF based on the lagged time series.

While for both methods the strength of the metric, in absolute value, indicates how strongly two variables are causally linked (i.e., the larger $| τ |$ and $| β |$ , the larger the causal link), the sign has a different meaning. For LKIF, a positive (negative) value of $τ_{j \to i}$ means that the variability of the source $x_{j}$ increases (decreases) the variability of the target $x_{i}$ . For PCMCI, the sign of $β_{j \to i}$ is closely linked to the correlation between $x_{j}$ and $x_{i}$ (i.e., a positive (negative) value means that an increase in $x_{j}$ leads to an increase (a decrease) in $x_{i}$ in the subsequent time step).

Table 1

Main differences between the two causal methods used in this study.

	LKIF	PCMCI
Full name	Liang–Kleeman information flow	Peter and Clark momentary conditional independence
Type of method	Information flow	Causal discovery algorithm
Use of time lags	Not by default	Always
Use of iterative conditioning	No	Yes
Metric	Rate of information transfer: $T$ (absolute) or $τ$ (relative)	Path coefficient $β$
Unit	$T$: nat per unit time; $τ$: %	No unit
Sign meaning	$> 0$: $x_{j}$ variability increases $x_{i}$ variability; $< 0$: $x_{j}$ variability decreases $x_{i}$ variability	$> 0$: an increase in $x_{j}$ leads to an increase in $x_{i}$; $< 0$: an increase in $x_{j}$ leads to a decrease in $x_{i}$
Key references

3.4 Comparison diagnostics

Since the correct causal links are known for the first three artificial models (2D, 6D, and 9D models), we can check the performance of the two causal methods, as well as the correlation coefficient, in identifying the ground truth. The diagnostics presented here are not computed for the Lorenz (1963) model and the real-world case study, as no exact solution exists for these two cases. We compute true-positive, true-negative, false-positive, and false-negative rates. The true-positive rate is the percentage of causal links correctly detected by the method among the total number of ground truth causal links. The true-negative rate is the percentage of non-causal links correctly detected by the method among the total number of ground truth non-causal links. The false-positive rate represents the percentage of cases where the method incorrectly detects a causal link among the total number of ground truth non-causal links. The false-negative rate represents the percentage of cases where the method fails to find an existing causal link among the total number of ground truth causal links.

To summarize the results from the confusion matrix, we also compute the $ϕ$ coefficient based on true positives (denoted TP), true negatives (denoted TN), false positives (denoted FP), and false negatives (denoted FN):

$ϕ = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP) (TP + FN) (TN + FP) (TN + FN)}} . \qquad (9)$

The denominator is set to 1 if any of the four sums in the denominator is equal to 0, in which case $ϕ = 0$. A value of 1 represents a perfect prediction of ground truth causal and non-causal links by the method, while a value of 0 means that the result is not better than a random prediction. These diagnostics are presented in Table and discussed in Sect. .
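These diagnostics are straightforward to compute from the four counts. The sketch below uses hypothetical function names and assumes that each rate has a nonzero denominator; the counts used in the usage note are made up for illustration and are not results from this study.

```python
import math

def detection_rates(tp, tn, fp, fn):
    """True/false positive/negative rates in percent."""
    return {
        "TPR": 100.0 * tp / (tp + fn),
        "TNR": 100.0 * tn / (tn + fp),
        "FPR": 100.0 * fp / (tn + fp),
        "FNR": 100.0 * fn / (tp + fn),
    }

def phi_coefficient(tp, tn, fp, fn):
    """Phi coefficient; the denominator is set to 1 if any sum is zero."""
    sums = [tp + fp, tp + fn, tn + fp, tn + fn]
    denom = 1.0 if 0 in sums else math.sqrt(math.prod(sums))
    return (tp * tn - fp * fn) / denom
```

For example, a method that detects all 7 of 7 causal links and rejects all 23 of 23 non-causal links gives `phi_coefficient(7, 23, 0, 0) == 1.0`, whereas a method that detects no link at all gives `phi_coefficient(0, 23, 0, 7) == 0.0`.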

4 Results

We provide results from the four artificial models and the real-world case study hereafter. Table provides a summary of results for the first three models and will be discussed in Sect. .

4.1 2D model

For the 2D model, the numerical value of the correlation between $x_{1}$ and $x_{2}$ is significantly positive ( $R = 0.23$ ; Fig. a) and is similar to the analytical value (Fig. d), but it does not provide any indication on the direction of influence.

LKIF can accurately retrieve the correct causal link, i.e., from $x_{2}$ to $x_{1}$, as well as the absence of influence in the reverse direction (Fig. b), as was already demonstrated in . In addition, the numerical estimate of the rate of information transfer ($| τ_{2 \to 1} | = 5.72\%$; Fig. b) is very close to the analytical solution ($| τ_{2 \to 1} | = 5.56\%$; Fig. e), which provides confidence in the LKIF results found for this simple system.

PCMCI only captures the self-influences of $x_{1}$ and $x_{2}$ but is not able to capture any significant causal influence between $x_{1}$ and $x_{2}$ with the original time step (i.e., $Δ t = 0.001$ ) (Fig. c). This missed detection is partly due to the fact that PCMCI responds better for discrete maps with finite time steps. Indeed, the time step for discretization is too small, and if we recompute causal links with PCMCI taking every 100 time steps ( $Δ t = 0.1$ ), we can recover the influence from $x_{2}$ to $x_{1}$ , although the value of $β$ is relatively small (Fig. c).

This example shows that LKIF performs well for such a very simple 2D system, while PCMCI struggles with the original time step. In particular, the serial dependency in this particular model might overshadow the mutual dependency for a “typical” maximum lag considered by PCMCI, which has not been designed for such conditions.

Figure 1

Numerical results from the 2D model: (a) correlation coefficient, (b) rate of information transfer (LKIF, absolute value), and (c) maximum path coefficient (PCMCI) when using three lags (zero to two time steps). Analytical values of (d) correlation coefficient and (e) rate of information transfer (LKIF). (f) Correct causal links from Eq. (). For numerical results, only significant values at the $α = 5 %$ level are shown, and correct causal links are highlighted by black or blue contours. The dashed contour in panel (c) indicates a significant value with a larger time step ( $Δ t = 0.1$ ), while it is not significant with the original time step ( $Δ t = 0.001$ ).

[Figure omitted. See PDF]

4.2 6D model

For the 6D model, the correlations are significant for all 30 pairs of variables (excluding autocorrelations), despite relatively small values for many of them (Fig. a). This shows that a simple correlation analysis cannot isolate the seven causal links that should be identified in this system (Fig. d). The largest correlation of all pairs is between $x_{2}$ and $x_{5}$ ( $R = 0.37$ ), but no causal link should exist between the two variables (i.e., this is a false positive). This large correlation probably appears because $x_{6}$ influences both $x_{2}$ and $x_{5}$ by construction (Fig. d) and is thus a confounding variable. Correlations larger than 0.3 in absolute value appear for the two pairs $x_{6}$ – $x_{2}$ and $x_{6}$ – $x_{5}$ (Fig. a), which confirms the role of $x_{6}$ as a confounding variable, but these correlations do not indicate the direction of influences.
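The effect of a confounding variable on correlation can be illustrated with a toy example. The sketch below is not the 6D model of this study: it is a hypothetical three-variable system (coefficients 0.8 and 0.7 chosen arbitrarily) in which a driver `z` influences two variables `a` and `b` that do not influence each other. The two variables are nevertheless clearly correlated, and the correlation essentially vanishes once the linear effect of the driver is regressed out (a partial correlation, used here only as a simple stand-in for the conditioning performed by causal methods).

```python
import math
import random


def pearson(u, v):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    var_u = sum((p - mu) ** 2 for p in u)
    var_v = sum((q - mv) ** 2 for q in v)
    return cov / math.sqrt(var_u * var_v)


def residuals(y, x):
    """Residuals of a least-squares regression of y on x (with intercept)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((p - mx) * (q - my) for p, q in zip(x, y))
             / sum((p - mx) ** 2 for p in x))
    return [(q - my) - slope * (p - mx) for p, q in zip(x, y)]


rng = random.Random(0)
n = 20000

# Hypothetical confounder: first-order autoregressive process
z = [0.0]
for _ in range(n):
    z.append(0.8 * z[-1] + rng.gauss(0.0, 1.0))

# a and b both respond to the previous value of z, but not to each other
a = [0.7 * z[t - 1] + rng.gauss(0.0, 1.0) for t in range(1, n + 1)]
b = [0.7 * z[t - 1] + rng.gauss(0.0, 1.0) for t in range(1, n + 1)]
driver = z[:n]  # z at time t-1, aligned with a and b at time t

r_ab = pearson(a, b)
r_ab_given_z = pearson(residuals(a, driver), residuals(b, driver))
print(f"corr(a, b) = {r_ab:.2f}, partial corr given z = {r_ab_given_z:.2f}")
```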

Both LKIF (Fig. b; no lag is used) and PCMCI (Fig. c; use of four time lags) can capture the seven correct causal links (Fig. d), i.e., the directed cycle $x_{1} \to x_{2} \to x_{3} \to x_{1}$ , the two-way causal link between $x_{4}$ and $x_{5}$ , and the influence of $x_{6}$ on both $x_{2}$ and $x_{5}$ . Results from PCMCI in terms of self-influences are more accurate based on Eq. (), as it provides two significant self-influences, i.e., $x_{4}$ and $x_{6}$ , while LKIF identifies all six self-influences as significant. The latter result indicates that the LKIF method may fail at representing the correct self-influences, while PCMCI does not.

This example shows the strength of causal methods, which can capture the correct causal influences, while the correlation is not able to provide such information and cannot identify confounding variables and the direction of causality.

Figure 2

Results from the 6D model: (a) correlation coefficient, (b) rate of information transfer (LKIF, absolute value), and (c) maximum path coefficient (PCMCI) when using four lags (zero to three time steps). (d) Correct causal links from Eq. (). Only significant values at the $α = 5 %$ level are shown, and correct causal links are highlighted by black or blue contours.

[Figure omitted. See PDF]

4.3 9D model

For the 9D model, the correlation does a poor job at identifying correct causal influences (Fig. a). In particular, the largest correlation is between $x_{8}$ and $x_{9}$ , which is not a correct causal link by construction (Eq. ). As for the 6D model, this is due to the fact that $x_{7}$ should influence both variables (Fig. d). $x_{7}$ is indeed significantly correlated to both $x_{8}$ and $x_{9}$ , but the causal direction is not identified by the correlation analysis.

Using LKIF without any lag shows that the method can detect all correct links, except $x_{5} \to x_{4}$ , although only four causal influences have a rate of information transfer $| τ |$ larger than 1 % (Fig. b). These four influences are the ones that should appear at lag $- 1$ (Fig. d), i.e., $x_{1} \to x_{3}$ , $x_{6} \to x_{4}$ , $x_{7} \to x_{8}$ , and $x_{7} \to x_{9}$ . The method also wrongly identifies 13 causal influences, even if values of information transfer remain small.

The use of time lags up to $l = 3$ time steps with LKIF (we use 9 variables $\times$ 4 lags $= 36$ variables in total) allows us to improve results (Fig. b, where the maximum value of all lags is plotted). In particular, all nine correct causal links can now be identified with $| τ | > 3 %$ , except the influence of $x_{3}$ on $x_{1}$ , which is significant but has a much smaller value ( $| τ | = 0.68 %$ ). Five additional causal influences are wrongly identified by the method with lags up to 3 time steps, but with relatively small values ( $| τ | < 0.4 %$ ).
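The construction of lagged variables mentioned above (9 variables $\times$ 4 lags $= 36$ variables) can be sketched as follows. This is one possible way to build such an augmented dataset and is an assumption on our part; the function name `lag_embed` and the column ordering are hypothetical and may differ from the scripts accompanying this article.

```python
def lag_embed(series, n_lags):
    """Stack lagged copies of a multivariate time series.

    series : list of time steps, each a list of d values
    n_lags : number of lags, from 0 to n_lags - 1 time steps

    Each returned row is [x(t), x(t-1), ..., x(t-n_lags+1)] concatenated,
    so it contains d * n_lags values. The first n_lags - 1 time steps are
    dropped because their lagged values are not available.
    """
    rows = []
    for t in range(n_lags - 1, len(series)):
        row = []
        for lag in range(n_lags):
            row.extend(series[t - lag])
        rows.append(row)
    return rows


# Dummy data: 10 time steps of 9 variables, value = 100 * time + variable index
dummy = [[100 * t + i for i in range(9)] for t in range(10)]
embedded = lag_embed(dummy, n_lags=4)
print(len(embedded), len(embedded[0]))
```

The augmented series can then be passed to a method that has no built-in notion of lag, at the cost of a shorter record and a larger number of variables.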

Using PCMCI with lags up to $l = 4$ time steps also allows us to correctly reproduce all causal links, except that it wrongly identifies four additional causal influences but with very small values (Fig. c). All self-influences are also correctly identified by the two methods.

This example also demonstrates the power of causal methods compared to a correlation analysis when using an appropriate number of lags: all expected links are correctly identified. Although some wrong causal links are identified by both methods, the strength of the relationship remains small for these wrong influences.

Figure 3

Results from the 9D model without lags: (a) correlation coefficient and (b) rate of information transfer (LKIF, absolute value). Only significant values at the $α = 5 %$ level are shown, and correct causal links are highlighted by black or blue contours.

[Figure omitted. See PDF]

Figure 4

Results from the 9D model with lags: (a) maximum correlation coefficient when using four lags (zero to three time steps), (b) maximum rate of information transfer (LKIF, absolute value) when using four lags (zero to three time steps), and (c) maximum path coefficient (PCMCI) using five lags (zero to four time steps). (d) Correct causal links from Eq. (). Only significant values at the $α = 5 %$ level are shown, and correct causal links are highlighted by black or blue contours.

[Figure omitted. See PDF]

4.4 Lorenz (1963) model

The only large correlation (excluding autocorrelation) in this system is between $x$ and $y$ , with $R = 0.88$ (Fig. a). The other correlations are very small ( $R = - 0.01$ ) but significant, probably due to the length of the time series.

According to LKIF, a two-way causal link appears between $x$ and $y$ (Fig. b). This causal link is also identified by PCMCI with lags up to $l = 3$ time steps (Fig. c). PCMCI also identifies a significant two-way causal link between $y$ and $z$ , but the value is very close to 0.

Then, we investigate whether the results depend on the lag. For the correlation and LKIF, we repeat the computation by shifting the three variables one by one with a lag from 0 to 1 unit time (100 time steps) with 0.1 unit time increment (i.e., every 10 time steps). For example, we take $x_{t - l}$ at lag $l = 0.1$ unit time and keep $y_{t}$ and $z_{t}$ at lag 0, and we recompute the correlation and relative rate of information transfer. Then, we take $x_{t - l}$ at lag $l = 0.2$ unit time, keeping $y_{t}$ and $z_{t}$ at lag 0 and so on until lag $l = 1$ unit time. We do the same when $y$ leads $x$ and $z$ and when $z$ leads $x$ and $y$ . For PCMCI, all lags from 0 to 1 unit time with 0.01 unit time increment (i.e., every time step) are included in the same computation as the method is designed to work with multiple lags by default. Results are presented in Fig. .
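The lag scan described above can be sketched for the correlation as follows. The helper `lagged_correlation` is hypothetical, and the test signal (white noise delayed by 30 time steps with added noise) is synthetic and unrelated to the model used in this section; only the scanning range (0 to 100 time steps, every 10 time steps) follows the text.

```python
import math
import random


def pearson(u, v):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    var_u = sum((p - mu) ** 2 for p in u)
    var_v = sum((q - mv) ** 2 for q in v)
    return cov / math.sqrt(var_u * var_v)


def lagged_correlation(leader, follower, lag):
    """Correlation between leader[t - lag] and follower[t], lag >= 0 steps."""
    if lag == 0:
        return pearson(leader, follower)
    # Drop the last `lag` values of the leader and the first `lag` values
    # of the follower so that leader[i] is paired with follower[i + lag].
    return pearson(leader[:-lag], follower[lag:])


# Synthetic example: the follower is the leader delayed by 30 time steps
rng = random.Random(1)
n = 5000
leader = [rng.gauss(0.0, 1.0) for _ in range(n)]
follower = [(leader[t - 30] if t >= 30 else 0.0) + 0.5 * rng.gauss(0.0, 1.0)
            for t in range(n)]

# Scan lags from 0 to 100 time steps, every 10 time steps
scan = {lag: lagged_correlation(leader, follower, lag)
        for lag in range(0, 101, 10)}
best_lag = max(scan, key=scan.get)
print(best_lag)
```

The same shifting can be applied before any other bivariate or multivariate estimator, one leading variable at a time.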

The correlation coefficient between $x$ and $y$ decreases exponentially with increasing lag when $x$ leads $y$ (Fig. a), and it first increases from $l = 0$ to $l = 0.1$ unit time before decreasing exponentially when $y$ leads $x$ (Fig. b). No correlation appears between $z$ and any of the two other variables at any lag (Fig. a–c).

The LKIF rates of information transfer from $x$ to $y$ and from $y$ to $x$ also decrease with increasing lag between 0 and 1 unit time, but starting with a plateau of $| τ | \sim 50 %$ (Fig. d–e). This plateau lasts from $l = 0$ to $l = 0.2$ unit time for $τ_{x_{t - l} \to y_{t}}$ (Fig. d) and from $l = 0$ to $l = 0.4$ unit time for $τ_{y_{t - l} \to x_{t}}$ (Fig. e). No information transfer exists between $z$ and the two other variables at any lag (Fig. d–f), in agreement with the absence of correlation (Fig. a–c).

The PCMCI path coefficients between $x$ and $y$ also generally decrease (in the two directions) with increasing lag, although the decrease presents more variability than the correlation and LKIF, with $β = 0$ at lag 0, the largest $β$ value when $l = 0.1$ unit time, and then an oscillatory behavior until $l = 1$ unit time (Fig. g–h). As for the correlation and LKIF, no causal influence is found between $z$ and the two other variables at any lag (Fig. g–i).

If we replace $x$ by $x^{2}$ to take nonlinearities into account and look at the triplet ( $x^{2}$ , $y$ , $z$ ), a strong positive correlation now appears between $x^{2}$ and $z$ ( $R = 0.65$ ; Fig. d). In addition, a strong two-way causal link now appears between $x^{2}$ and $z$ with both LKIF ( $| τ | \sim 50 %$ in the two directions; Fig. e) and PCMCI ( $| β | = 1.7$ in the two directions; Fig. f). This shows that the linear versions of LKIF and PCMCI can detect causal links between nonlinear transformed variables in nonlinear models. In this case, the correlation between $x$ and $y$ , combined with the nonlinear forcing product $x y$ in the third equation of the model ( $z$ equation; Eq. ), results in a linear correlation between $z$ and the nonlinear non-invertible variable change $x^{2}$ .
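The emergence of a linear relationship between $x^{2}$ and $z$ can be checked with a short numerical experiment. The sketch below integrates the three-variable system with a fourth-order Runge–Kutta scheme and compares the correlations of $z$ with $x$ and with $x^{2}$. The parameter values (10, 28, and 8/3), the integration scheme, the initial condition, and the length of the run are assumptions made for this illustration (only the time step of 0.01 unit time is taken from the text), so the resulting numbers are not expected to match the values reported here exactly.

```python
import math


def pearson(u, v):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    var_u = sum((p - mu) ** 2 for p in u)
    var_v = sum((q - mv) ** 2 for q in v)
    return cov / math.sqrt(var_u * var_v)


def rhs(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the system (assumed standard parameter values)."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state, dt):
    """One fourth-order Runge-Kutta step."""
    k1 = rhs(state)
    k2 = rhs(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = rhs(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = rhs(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


dt = 0.01
state = (1.0, 1.0, 20.0)  # arbitrary initial condition
for _ in range(5000):     # spin-up, discarded
    state = rk4_step(state, dt)

xs, zs = [], []
for _ in range(50000):
    state = rk4_step(state, dt)
    xs.append(state[0])
    zs.append(state[2])

r_x_z = pearson(xs, zs)
r_x2_z = pearson([x * x for x in xs], zs)
print(f"corr(x, z) = {r_x_z:.2f}, corr(x^2, z) = {r_x2_z:.2f}")
```

Because the system is symmetric under the change of sign of $x$ and $y$ , the linear correlation between $x$ and $z$ stays close to zero, while the even transformation $x^{2}$ is not affected by this cancellation.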

The correlation between $x_{t + l}^{2}$ and $z_{t}$ oscillates between $R \sim - 0.7$ ( $l = 0.2$ unit time) and $R \sim 0.9$ ( $l = - 0.1$ unit time) with a period of $\sim 0.7$ unit time (Fig. a). The rates of information transfer from $x_{t + l}^{2}$ to $z_{t}$ and from $z_{t}$ to $x_{t + l}^{2}$ also show an oscillatory behavior with a period of $\sim 0.35$ unit time (Fig. b), i.e., half of the correlation oscillation. PCMCI does not exhibit such an oscillatory behavior but rather a quickly decreasing $β$ value for small lags (Fig. c).

Figure 5

Results from the model when using ( $x$ , $y$ , $z$ ): (a) correlation coefficient, (b) rate of information transfer (LKIF, absolute value), and (c) maximum path coefficient (PCMCI) when using four lags (zero to three time steps). Results from the model when using ( $x^{2}$ , $y$ , $z$ ): (d) correlation coefficient, (e) rate of information transfer (LKIF, absolute value), and (f) maximum path coefficient (PCMCI) when using four lags (zero to three time steps). Only significant values at the $α = 5 %$ level are shown.

[Figure omitted. See PDF]

Figure 6

Results from the model when using ( $x$ , $y$ , $z$ ): (a–c) correlation coefficient, (d–f) rate of information transfer (LKIF, absolute value), and (g–i) path coefficient (PCMCI) as a function of the lag $l$ when (a, d, g) $x$ leads; (b, e, h) $y$ leads; and (c, f, i) $z$ leads.

[Figure omitted. See PDF]

Figure 7

Results from the model when using ( $x^{2}$ , $y$ , $z$ ): (a) correlation coefficient, (b) rate of information transfer (LKIF), and (c) path coefficient (PCMCI) as a function of the lag $l$ in $x$ .

[Figure omitted. See PDF]

4.5 Climate indices

The real-world case study with climate indices shows that 54 % of the pairs of variables (excluding autocorrelations) are related by significant correlations when considering no lag (Fig. a). However, it is obvious that a large number of these pairs are correlated but not causally linked. The use of causal methods allows us to remove such spurious links, as demonstrated by the application of LKIF without any lag (Fig. b) and PCMCI with lags up to $l = 2$ months (Fig. c).

Results from the two causal methods present several similarities, including the AO influence on both PDO and TNA (Fig. b–c). Another similarity is the two-way causal link between AMO and TNA, in agreement with and . The AMO–TNA influence is not surprising as both indices are computed from SST anomalies in the North Atlantic, with AMO spanning the majority of the North Atlantic and TNA focusing on the tropical region. Values of the AMO–TNA influence in the two directions are relatively strong for LKIF ( $| τ_{AMO \to TNA} | = 22 %$ and $| τ_{TNA \to AMO} | = 38 %$ ) compared to other pairs of influence (Fig. b). In addition, ENSO influences PDO according to both methods (Fig. b–c), and the positive sign of the correlation means that a warm Niño3.4 phase results in a positive PDO (Fig. a). The ENSO influence on PDO was recently reported by , also using LKIF, and , based on the pseudo-transfer entropy. Spatial patterns of ENSO and PDO are very similar, and PDO is often viewed as an ENSO-like interdecadal climate variability, with PDO occurring at decadal timescales, while ENSO is predominantly an interannual phenomenon .

In terms of differences between the two causal methods, LKIF identifies additional causal influences of AO on PNA, NAO, and AMO, while PCMCI does not identify these causal links (Fig. b–c). It is well known that there is a clear relationship between AO and NAO and that NAO is often referred to as the local manifestation of the AO . Also, according to LKIF, there are two-way causal influences between ENSO and PNA and between ENSO and TNA, which do not appear with PCMCI with lags up to $l = 2$ months (Fig. b–c). It is well known that ENSO has a major influence on the extratropical Northern Hemisphere climate variability, in particular on PNA . However, the influence of ENSO on PNA is complicated by the fact that other mechanisms can affect this relationship, such as the position of the Pacific jet stream . Our results with LKIF suggest that PNA has a stronger influence on ENSO than the reverse, which would go in favor of more complex mechanisms in action. Finally, the influence of ENSO on TNA has also been reported in the literature, and different mechanisms have been proposed . It is interesting to find that the influence of TNA on ENSO is stronger than the reverse influence with LKIF (Fig. b).

The use of 12 time lags (0 to 11 months) with both methods (bringing 8 variables $\times 12$ lags $= 96$ variables in total for LKIF) provides additional insights (Figs. –). PNA influences ENSO with a 1-month lag with LKIF (Fig. a) and with a 4-month lag with PCMCI (Fig. a). Additionally, PNA influences PDO with a 4-month lag and AMO with an 11-month lag using LKIF (Fig. a). However, all PNA influences appear relatively weak in intensity ( $| τ | < 1 %$ with LKIF and $| β | < 0.1$ with PCMCI).

NAO influences PDO with both methods but with very different lags depending on the method, i.e., 11 months with LKIF (Fig. b) and 1 month with PCMCI (Fig. b). It also influences TNA with LKIF with a 1-month lag. As for PNA, all significant NAO influences remain limited in intensity ( $| τ | < 1 %$ with LKIF and $| β | < 0.1$ with PCMCI).

AO is by far the climate index that influences most variables with LKIF (Fig. c), in agreement with . When considering no lag, AO influences all other indices, except QBO and ENSO (Fig. c). The largest value of rate of information transfer is from AO to NAO with $| τ | = 4 %$ , in agreement with the value considering no lag (Fig. b). AO also influences TNA and AMO at larger lags with LKIF ( $l = 1$ , 2, and 4 months for TNA and $l = 2$ , 5, and 11 months for AMO). With PCMCI, AO only influences TNA at lags $l = 1$ to 4 months, PDO at lag $l = 1$ month, and QBO at lag $l = 4$ months (Fig. c). It is intriguing to notice that no AO influence on NAO appears with PCMCI.

QBO does not have any influence on any other climate indices with any of the methods (Figs. d and d).

The AMO–TNA two-way causal influence already identified in Fig. also appears in the lagged plots but with contrasting behaviors depending on the causal method. With LKIF, AMO only influences TNA at lag 0 (Fig. e) and TNA influences AMO at lags $l = 0$ and 11 months (Fig. g). With PCMCI, the AMO influence on TNA increases with increasing lag from $l = 0$ to 6 months, then decreases and stays relatively constant until $l = 11$ months (Fig. e), and TNA influences AMO at lags $l = 2$ , 4–6, 8, and 10–11 months (Fig. g). TNA has additional influences with PCMCI at lags $l \geq 2$ months (on NAO, QBO, PDO, and ENSO; Fig. g) and with LKIF at lags $l \geq 9$ months (on PDO and ENSO). The TNA influences on PDO and ENSO, appearing for both causal methods, remain limited to large lags (Figs. g–g).

PDO has an influence on PNA with LKIF at lag $l = 6$ months (Fig. f), which is consistent with using sensitivity experiments with a coupled model. According to PCMCI, PDO influences ENSO at lags $l \geq 3$ months (Fig. f).

Finally, ENSO influences PDO at lags $l = 0, 2$ and 6 months, and influences TNA at lags $l = 2$ and 6 months with LKIF (Fig. h). With PCMCI, ENSO is the climate index that influences most variables (all but NAO), especially PDO from $l = 2$ to 10 months, TNA from $l = 3$ to 11 months, and AMO from $l = 4$ to 11 months (Fig. h). The large role of ENSO was also reported using pseudo-transfer entropy using lags $l = 1$ to 9 months .

Figure 8

Results from the real-world case study: (a) correlation coefficient, (b) rate of information transfer (LKIF, absolute value), and (c) maximum path coefficient (PCMCI) when using three lags (0 to 2 months). Only significant values at the $α = 5 %$ level are shown.

[Figure omitted. See PDF]

Figure 9

Results from the real-world case study with LKIF (rate of information transfer; absolute value) as a function of the lag: (a) PNA influence on the other variables; (b) NAO influence; (c) AO influence; (d) QBO influence; (e) AMO influence; (f) PDO influence; (g) TNA influence; (h) ENSO influence. Only significant influences at the $α = 5 %$ level are shown as filled dots.

[Figure omitted. See PDF]

Figure 10

Results from the real-world case study with PCMCI (path coefficient; absolute value) as a function of the lag: (a) PNA influence on the other variables; (b) NAO influence; (c) AO influence; (d) QBO influence; (e) AMO influence; (f) PDO influence; (g) TNA influence; (h) ENSO influence. Only significant influences at the $α = 5 %$ level are shown as filled dots.

[Figure omitted. See PDF]

5 Discussion

Correlation is often used by the climate community to identify potential relationships between variables, but a statistically significant correlation does not necessarily imply causation. In our study, we used two causal methods, LKIF and PCMCI, to disentangle true causal links from spurious correlations, and we applied them to four artificial models and one real-world case study based on climate indices. Below we discuss our results compared to previous literature (Sect. for the artificial models and Sect. for the real-world case study).

5.1 Artificial models

For the simplest (2D) model used here, we show that LKIF can accurately reproduce the correct causal link, with relatively high accuracy compared to the analytical solution, while PCMCI fails to reproduce this link when using the original time step (Sect. and Fig. ). PCMCI provides the correct influence for the 2D model when taking every 100 time steps (although the $β$ value is small), which shows that PCMCI responds better for discrete maps with finite time steps. For the 6D model, both LKIF and PCMCI can detect the correct causal links (Sect. and Fig. ). For the 9D nonlinear model, PCMCI allows us to retrieve the correct causal relationships, while some care with the number of lags is needed with LKIF to achieve appropriate results (Sect. and Figs. –). This shows that LKIF performs better for simpler systems and presents a few more difficulties with more complex models with several lags. On the other hand, PCMCI does not work well in the presence of very strong autocorrelations but may be preferable to LKIF as the number of variables increases. Results from the model are more complicated to interpret as the system is highly nonlinear and chaotic. Both methods detect the same causal links (Sect. and Fig. ), although some differences appear in the dependence of the causal influence on the time lag (Figs. –). Moreover, the combination of model nonlinearities and nonlinear variable changes can result in linear causal links detectable by LKIF and PCMCI (Fig. ).

The above results are not entirely comparable to findings from from a methodological perspective, as the latter used other causal methods and different coupled systems. However, a similarity is the fact that some methods (Granger causality and its extensions) perform better with the simplest models, while other methods (CCM and predictability improvement) are better suited for more complex systems . This goes hand in hand with LKIF being better with the specific time-continuous 2D model studied here, while PCMCI is well suited for the time-discrete 9D model of our analysis. Thus, the key finding from , that “it is important to choose the right method for a particular type of data”, is also valid for our study.

The main novelties compared to are that (1) we use two causal methods that have not been compared yet, (2) we compare our causal methods to the classical correlation coefficient, (3) we assess causality between nonlinear variable changes, and (4) we apply the two methods to a real-world case study. Regarding (1), no definite conclusion can be provided as to which method is the best: it depends on the system used. For certain very simple models, LKIF appears to be preferable to PCMCI, although PCMCI has not been designed for the particular 2D model used here (Sect. ). For a more complex model involving more variables and several lags, like the 9D model, PCMCI may be better suited. In any case, we recommend using as many methods as possible for a specific problem to increase the robustness of results. Regarding (2), we show that both LKIF and PCMCI are superior to correlation, as they allow us to remove spurious links. Regarding (3), we show that the combination of model nonlinearities with nonlinear variable changes can result in linear causal links, detectable by both LKIF and PCMCI. Point (4) is discussed in Sect. .

Table 2

True-positive, true-negative, false-positive, and false-negative rates (in $%$ ), as well as $ϕ$ coefficient, for the correlation and the two causal methods (LKIF and PCMCI) for the first three artificial models (2D, 6D, and 9D models, the latter without and with lags), excluding self-influences. The number of ground truth correct (incorrect) links (without considering if the exact time lags are reproduced or not) is indicated in parentheses after “True positives” (“True negatives”) for each model. For the 2D model and PCMCI, numbers are also provided in parentheses for the case with larger sampling time step ( $Δ t = 0.1$ ).

		Correlation	LKIF	PCMCI
2D model	True positives (1) [ $%$ ]	100	100	0 (100)
	True negatives (1) [ $%$ ]	0	100	100 (100)
	False positives [ $%$ ]	100	0	0 (0)
	False negatives [ $%$ ]	0	0	100 (0)
	$ϕ$ coefficient	0	1	0 (1)
6D model	True positives (7) [ $%$ ]	100	100	100
	True negatives (23) [ $%$ ]	0	100	100
	False positives [ $%$ ]	100	0	0
	False negatives [ $%$ ]	0	0	0
	$ϕ$ coefficient	0	1	1
9D model without lag	True positives (9) [ $%$ ]	100	89	–
	True negatives (63) [ $%$ ]	60	79	–
	False positives [ $%$ ]	40	21	–
	False negatives [ $%$ ]	0	11	–
	$ϕ$ coefficient	0.40	0.50	–
9D model with lags	True positives (9) [ $%$ ]	100	100	100
	True negatives (63) [ $%$ ]	27	92	94
	False positives [ $%$ ]	73	8	6
	False negatives [ $%$ ]	0	0	0
	$ϕ$ coefficient	0.21	0.77	0.81

Table provides true-positive, true-negative, false-positive, and false-negative rates, as well as the $ϕ$ coefficient, for the correlation and the two causal methods used in this study and for the first three artificial models (2D, 6D, and 9D models). For the 9D model, a distinction is made between the case where lags are not considered (PCMCI is not used in this case) and the case where lags are considered. Results show that the correlation has a large chance of detecting false positives (i.e., incorrect detection of causal influences) for all models; thus, the correlation largely overestimates causal links. LKIF and PCMCI allow us to substantially reduce false positives, with 0 $%$ for the 2D and 6D models with both methods, 21 $%$ for the 9D model without lag with LKIF, and $< 10 %$ for the 9D model with lags with both methods. For the 2D model, LKIF perfectly reproduces the right causal links ( $ϕ = 1$ ), while the correlation coefficient and PCMCI (with the original time step) do no better than a random prediction ( $ϕ = 0$ ). Only when using a larger sampling time step can PCMCI reproduce the correct causal links. For the 6D model, both LKIF and PCMCI accurately reproduce the ground truth ( $ϕ = 1$ ), while the correlation coefficient again does no better than a random prediction and identifies all relationships as causal ( $ϕ = 0$ and false-positive rate $= 100 %$ ). For the 9D model without lag (PCMCI not included), the correlation does a better job at identifying a certain amount of true negatives (60 $%$ ) compared to the 2D and 6D models, but LKIF provides overall better results ( $ϕ = 0.5$ for LKIF vs. $ϕ = 0.4$ for correlation), despite the identification of one false negative with LKIF (Fig. b). For the 9D model with lags, the performance of the two causal methods is clearly better than the correlation ( $ϕ = 0.21$ ), with PCMCI ( $ϕ = 0.81$ ) performing slightly better than LKIF ( $ϕ = 0.77$ ).
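The rates discussed above can be recovered from raw counts with a few lines of code. In this sketch, we assume that true-positive and false-negative rates are normalized by the number of ground truth links and that true-negative and false-positive rates are normalized by the number of ground truth non-links, which is consistent with the numbers in parentheses in the table. The function name `confusion_rates` is hypothetical. The example counts correspond to LKIF on the 9D model without lag, as described in Sect. 4.3 (eight of the nine correct links detected and 13 influences wrongly identified among 63 non-links).

```python
def confusion_rates(tp, tn, fp, fn):
    """Return true/false positive/negative rates in percent.

    Rates for links (TP, FN) are relative to the number of ground truth
    links; rates for non-links (TN, FP) are relative to the number of
    ground truth non-links.
    """
    n_links = tp + fn
    n_non_links = tn + fp
    return {
        "true_positive": 100.0 * tp / n_links,
        "false_negative": 100.0 * fn / n_links,
        "true_negative": 100.0 * tn / n_non_links,
        "false_positive": 100.0 * fp / n_non_links,
    }


# LKIF, 9D model without lag: 8 of 9 links found, 13 wrong links out of 63
rates = confusion_rates(tp=8, tn=50, fp=13, fn=1)
rounded = {name: round(value) for name, value in rates.items()}
print(rounded)
```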

5.2 Climate indices

In our study, we extend previous analyses from and by using monthly time series of climate indices in the Atlantic and Pacific regions. We use the same seven climate indices as ; add QBO to the list to have four indices characterizing both the atmosphere and ocean; and do not use local air temperature, precipitation, or insolation. also computed LKIF based on these indices but focused on the dependence of the rate of information transfer on the timescale (using a time-moving window) and did not compare LKIF to another method. also used NAO, QBO, AMO, PDO, and ENSO (Niño3.4); they used a slightly different index for TNA, and they incorporated seven additional indices. The causal method used by is the pseudo-transfer entropy method .

Due to the small methodological differences in our analysis compared to (see above), some small differences appear, but key results with LKIF remain similar. In particular, we find that AO is the largest driver of all variables as it influences all other indices, except QBO and ENSO (Sect. and Fig. b). We show that the AO influence mainly occurs at lag $l = 0$ (Fig. c). This is in agreement with , who find that AO plays a key role at short timescale. PCMCI only identifies two AO influences with lags shorter than 2 months, i.e., to PDO and TNA (Fig. c). It is particularly intriguing to see that PCMCI does not detect the AO influence on NAO (Fig. c), while LKIF does (Fig. b), as NAO is often referred to as the local manifestation of AO . This discrepancy might hide seasonal differences, as for example winter and summer NAO have different spatial patterns .

ENSO has a relatively large influence on other climate indices, especially on PDO for both LKIF and PCMCI (Fig. b–c). The pivotal role of ENSO was already identified by and is not surprising due to its importance on the global climate . ENSO has a clear influence on PDO at lags 2 to 10 months for PCMCI (Fig. h), while it only appears at lags 0, 2 and 6 months for LKIF (Fig. h). This ENSO–PDO influence was detected from lags 1 to 7 with pseudo-transfer entropy , thus somewhere in between PCMCI and LKIF. The other clear ENSO influence according to PCMCI, LKIF and pseudo-transfer entropy is on TNA, at lags 2 and 6 months with LKIF (Fig. h), at lags 3–11 months with PCMCI (Fig. h), and at lags 1–9 months with pseudo-transfer entropy . According to PCMCI and pseudo-transfer entropy, ENSO also largely influences other climate indices than PDO and TNA at different lags, which is not the case for LKIF. More research would be needed to further investigate this difference between causal methods.

6 Conclusions

In this study, we compare two independent causal methods, namely, the Liang–Kleeman information flow (LKIF) and the Peter and Clark momentary conditional independence (PCMCI), and the Pearson correlation coefficient. We use five different datasets with an increasing level of complexity, including three stochastic models, one nonlinear deterministic model, and one real-world case study.

We show that both causal methods are superior to the correlation, which suffers from five key limitations: random coincidence, no identification of the direction of causality, external drivers not distinguished from direct drivers, no identification of potential nonlinear influences, and application to bivariate cases only. For most models and the real-world case study, the number of significant correlations is much larger than the number of significant causal links, which is incorrect from a causal perspective for the first three models. By extension, we assume that the correlation also suffers from this overestimation in the real-world case study, and causal methods allow us to improve results.

When comparing both causal methods together, LKIF can accurately reproduce the correct causal link in the 2D model, while PCMCI cannot with the original time step and needs to be computed with a larger sampling time step to provide correct causal links, although the influence remains small. For the 6D model, both methods can capture the seven correct causal links. For the 9D model, PCMCI correctly reproduces all causal links, and LKIF without any time lag is not totally accurate. When used with time lags, LKIF can identify the correct causal links.

For the model, results are more complicated to interpret as the system is time-continuous, nonlinear, and chaotic. Both causal methods show a strong two-way causal link between $x$ and $y$ , while no causal link appears between $z$ and the two other variables. However, when we replace $x$ by $x^{2}$ to take nonlinearities into account, $x^{2}$ and $z$ are causally linked (in the two directions) with both methods. We also show that both LKIF and PCMCI display a decrease in the two-way causal influence between $x$ and $y$ with increasing time lag, although the shape of this decrease is different between methods. Additionally, the oscillatory behavior in correlation coefficient and LKIF for the $x^{2}$ – $z$ pair as a function of lag is not displayed by PCMCI.

Finally, the real-world case study with climate indices provides some similarities but also important differences between the two methods. In terms of similarities, AO influences both PDO and TNA, there is a two-way causal link between AMO and TNA, and ENSO influences PDO. In terms of differences, LKIF identifies additional influences of AO on PNA, NAO, and AMO, as well as two-way causal links between ENSO and PNA and between ENSO and TNA. When using 12 time lags, the number of influences detected by PCMCI becomes larger compared to LKIF, e.g., ENSO has a large influence on all other variables except NAO, while AO remains the largest influencer (at smaller lags) with LKIF. More detailed analysis of the physical processes would be needed to identify correct causal links between these climate indices.

In summary, this analysis shows that both causal methods should be preferred to correlation when it comes to identifying causal links. Additionally, as both LKIF and PCMCI display strengths and weaknesses when used with relatively simple models in which correct causal links can be detected by construction, we do not recommend one or the other method but rather encourage the climate community to use several methods whenever possible. We highlight that both methods, as used here, assume linearity, so results need to be taken with caution for nonlinear problems, such as the system and the real-world case study. The use of extensions of the methods for which fully nonlinear terms are taken into account is necessary to complement the current results (e.g., ). Also, both LKIF and PCMCI deal with direct causal links, while other methods, such as interventional causality , can detect indirect influences. Further analysis would be needed to explore this aspect. Lastly, we could test the robustness of the methods to noise and their performance in the context of high-dimensional systems.

Code and data availability

The climate indices were retrieved from the Physical Sciences Laboratory (PSL) of the National Oceanic and Atmospheric Administration (NOAA; https://psl.noaa.gov/data/climateindices/list/). The Python scripts to produce the outputs and figures of this article, including the computation of LKIF, are available on Zenodo (DOI: 10.5281/zenodo.8383534).

Author contributions

DD, GDC, RVD, CALP, AS, and SV designed the study. DD generated the model datasets and retrieved the climate indices. DD applied the LKIF method and Pearson correlation to the datasets, and GDC ran the PCMCI algorithm. DD led the writing of the manuscript, with contributions from all co-authors. DD created all figures, with the help of GDC. All authors participated in the data analysis and interpretation.

Competing interests

At least one of the (co-)authors is a member of the editorial board of Nonlinear Processes in Geophysics. The peer-review process was guided by an independent editor, and the authors also have no other competing interests to declare.

Disclaimer

Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.

Acknowledgements

We thank X. San Liang for his feedback related to our analysis. We also thank the editor Stefano Pierini and two anonymous reviewers for their comments, which helped to improve our article.

Financial support

David Docquier, Giorgia Di Capua, Reik Donner, Carlos Pires, Amélie Simon, and Stéphane Vannitsem were supported by ROADMAP (Role of ocean dynamics and Ocean-Atmosphere interactions in Driving cliMAte variations and future Projections of impact-relevant extreme events; https://jpi-climate.eu/project/roadmap/, last access: 21 February 2024), a coordinated JPI-Climate/JPI-Oceans project. David Docquier and Stéphane Vannitsem received funding from the Belgian Federal Science Policy Office under contract B2/20E/P1/ROADMAP. Giorgia Di Capua and Reik Donner were supported by the German Federal Ministry for Education and Research (BMBF) via the ROADMAP project (grant no. 01LP2002B). Amélie Simon and Carlos Pires were supported by Portuguese funds: Fundação para a Ciência e a Tecnologia (FCT) I.P./MCTES through national funds (PIDDAC) – UIDB/50019/2020 (10.54499/UIDB/50019/2020), UIDP/50019/2020 (10.54499/UIDP/50019/2020) and LA/P/0068/2020 (10.54499/LA/P/0068/2020), and the project JPIOCEANS/0001/2019 (ROADMAP).

Review statement

This paper was edited by Stefano Pierini and reviewed by two anonymous referees.

© 2024. This work is published under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Correlation does not necessarily imply causation, and this is why causal methods have been developed to try to disentangle true causal links from spurious relationships. In our study, we use two causal methods, namely, the Liang–Kleeman information flow (LKIF) and the Peter and Clark momentary conditional independence (PCMCI) algorithm, and we apply them to four different artificial models of increasing complexity and one real-world case study based on climate indices in the Atlantic and Pacific regions. We show that both methods are superior to the classical correlation analysis, especially in removing spurious links. LKIF and PCMCI display some strengths and weaknesses for the three simplest models, with LKIF performing better with a smaller number of variables and with PCMCI being best with a larger number of variables. Detecting causal links from the fourth model is more challenging as the system is nonlinear and chaotic. For the real-world case study with climate indices, both methods present some similarities and differences at monthly timescale. One of the key differences is that LKIF identifies the Arctic Oscillation (AO) as the largest driver, while the El Niño–Southern Oscillation (ENSO) is the main influencing variable for PCMCI. More research is needed to confirm these links, in particular including nonlinear causal methods.

Details

Title

A comparison of two causal methods in the context of climate analyses

Author

Docquier, David¹; Di Capua, Giorgia²; Donner, Reik V.²; Pires, Carlos A. L.³; Simon, Amélie⁴; Vannitsem, Stéphane¹

¹ Meteorological and Climatological Information Service, Royal Meteorological Institute of Belgium, Brussels, Belgium
² Department of Water, Environment, Construction and Safety, Magdeburg-Stendal University of Applied Sciences, Magdeburg, Germany; Research Department I – Earth System Analysis, Potsdam Institute for Climate Impact Research – Member of the Leibniz Association, Potsdam, Germany
³ Instituto Dom Luiz, Faculdade de Ciências, Universidade de Lisboa, Lisbon, Portugal
⁴ Instituto Dom Luiz, Faculdade de Ciências, Universidade de Lisboa, Lisbon, Portugal; Department of Mathematical and Electrical Engineering, IMT Atlantique, Lab-STICC, UMR CNRS 6285, Brest, France

Pages

115-136

Publication year

2024

Publication date

2024

Publisher

Copernicus GmbH

ISSN

1023-5809

e-ISSN

1607-7946

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.5194/npg-31-115-2024

ProQuest document ID

2931887487
