Research on the Graphical Model Structure

Full text

Turn on search term navigation

1. Introduction

What would be the impact of increasing the amount of homework that a student does in preparation for his (or her) exam? Would job training program lead to an increase in employment rate? What is the probability that a certain case of disease is “attributable” to a particular exposure? What is the probability that defendant’s action was the cause of the plaintiff’s damage (or death)? Causation shapes how we view, understand, and react to the world around us. The study of causal inference is very important in epidemiology [1,2,3], artificial intelligence (AI) [4,5], biomedicine [6], law [7], and policy analysis [8,9,10]. Causal inference can be based on a counterfactual approach [11,12] or a decision analysis approach [13]. In this paper, we adopt the popular counterfactual approach. According to the definition of causal effects, for the inference of causal effects of intervention, we will need to compare the effect that one intervention would have had on a unit which, in fact, received some other intervention. We consider the case of two treatments—intervention 0 and intervention 1. Assuming that the ith of the N units under study has an outcome y_i₁ that would have resulted if it had received intervention 1, and an outcome y_i₀ that would have resulted if it had received intervention 0. The causal effects are comparisons of y_i₁ and y_i₀ for the same unit and under the same situation. Since each unit receives only one intervention under the same situation, either y_i₁ or y_i₀ can be observed, but not both; therefore, comparisons of y_i₁ or y_i₀ represent a missing data problem.

Decades ago, the statistician R. A. Fisher proposed a method to solve this problem—randomized controlled trials (RCT) [14]. In an RCT, a direct comparison is made between two treatment groups, one of which serves as a control for the other. The assignment of different units to different treatment groups is chosen randomly. This ensures that no unobservable characteristics of the units are reflected in the assignment, and hence that any difference between treatment and control units reflects the impact of the intervention. The randomized controlled trial is the most rigorous and robust research method of causal inference between an intervention and an outcome. However, in many analysis situations, the random assignment of participants to treatment and comparison conditions may be unethical or impractical. For example, unless researchers are genuinely uncertain about the potential harms or benefits of an intervention, it is unethical to assign it to one group of people while withholding it from others. Therefore, observational studies have long been a source of valuable information about causal inference in many fields of study.

However, observational data lack the benefit of randomization that exists within an experimental setting, as units may self-select into the treatment group or control group due to unobservable characteristics of the units, therefore, we can no longer infer the impact of the intervention from the difference between treatment and control units directly, which brings difficulty in the study of causal inference [15]. Because of this, additional assumptions must be made when working with such data. One of the standard assumptions is strong exogeneity, also called “strong ignorability” by Rosenbaum and Rubin [11], which means that outcome variable and corresponding counterfactual variable are all independent of the intervention variable. In the estimation of treatment effects, based on strong exogeneity assumption, the propensity score is based on the variables, conditioning on which uncounfoundedness holds, we can obtain an unbiased estimate of the treatment effect with a propensity score adjustment (PSA) [16]. In probability of causation estimation, with a strong exogeneity assumption, we can obtain the bounds and even a point estimate, for the probabilities of causation based on the observational data [17]. However, the identification of strong exogeneity is the premise of causal inference based on observational data.

Strong exogeneity relies on the identification of independence between an intervention variable in the actual world and a counterfactual outcome variable in the hypothetical world according to its definition [17], which is difficult in practical situations [18]. Due to a lack of identifying of strong exogeneity, sensitivity analysis is usually adopted to assess the plausibility of results in causal inference [19,20]. To date, there is little research literature on identifying strong exogeneity. In the existing three references on identifying of strong exogeneity [12,21,22], identifying methods can be roughly divided into two categories: the hypothesis test method and the graphical diagnosis method. Refs [21,22] provide a method with a hypothesis test, which can be useful in some cases. However, they are all based on other assumptions that may be impractical in application situations, which prevent us from applying these methods in practice. Ref [12] provides a graphical diagnosis method for strong exogeneity assumption. Compared with the hypothesis test method, the graphical diagnosis method is more convenient to practical situations. In the derivation of the graphical diagnosis method, the error terms in structural equation models, i.e., exogenous variables in a graph, are viewed as the cause of the variation of endogenous variables, including counterfactual variables. In other words, the exogenous variable can be interpreted as a modifier of the functional mapping from endogenous variable’s parents to itself. According to this interpretation, counterfactual variables corresponding to outcome variable Y represent the sum total of all exogenous variables which can influence Y through paths that avoid intervention variable T in the graph. Based on aforementioned analysis, the graphical diagnosis method is derived, i.e., strong exogeneity assumption holds when every path between an intervention variable and an outcome variable that contains an arrow into the intervention variable is blocked. However, the derivation in [12] involves a structural equation model and a graphical model at the same time; thus, the derivation procedure is complicated. More importantly, the derivation mainly uses the method of text description, but not mathematical derivation, the derivation process is not intuitive, not clear, and not easy to understand.

A graphical method to overcome this difficulty in the derivation of graphical diagnosis method is a twin network, which uses two networks (one to represent the actual world and one to represent the hypothetical world). Based on the twin network method, we can easily and succinctly analyze a variable relationship which involves variables in the actual world and the hypothetical world at the same time, and conveniently identify strong exogeneity between the intervention variable and the outcome variable, according to its definition. Compared with the derivation in [12], the method based on twin network is quite succinct and comprehensible.

In this paper, the graphical model structure characteristic of strong exogeneity is investigated based on the twin network method, which can be used in the identification of strong exogeneity. The paper is organized as follows. In Section 2, we present some useful preliminary aspects. In Section 3, the graphical model structure characteristic of strong exogeneity is investigated based on a twin network method. In Section 4, as an application example in causal inference, we illustrate the application of the graphical model structure, characteristic of strong exogeneity in the context of LUCAS [23,24]. Finally, Section 5 provides some concluding remarks.

2. Preliminary Results

In this section, we present some preliminary aspects that are useful for derivation, as discussed in Section 3.

2.1. Directed Acyclic Graph (DAG)

Wright [25] introduced graph to represent causal relation firstly, and nowadays Directed Acyclic Graph (DAG) [26,27,28,29] has become a useful mathematical tool for representing a causal relationship in statistical analysis. Based on DAG, we can address problems of causation and independencies with simple operations similar to those used to solve arithmetic problems, and DAG also is called a causal diagram model. DAG is a collection of vertices (or, as we will call them, nodes or variables) and directed edges. The variables (or nodes) in DAG can be divided into two categories—the exogenous variable and the endogenous variable. The endogenous variable is determined by variables in the model, and the exogenous variable is determined by factors outside the model which represents impact on endogenous variables in the background. Nodes in a graph are connected by the directed edge. The node that a directed edge starts from is called the parent of the node that the edge goes into; conversely, the node that the edge goes into is the child of the node that it comes from. For example, in Figure 1, node A is parent of node C, and node C is the child of node A. A path between two nodes is a sequence of nodes beginning with one node and ending with another node. A path between two nodes is a directed path if it can be traced along the arrows, i.e., if no node on the path has two edges on the path directed into it, or two edges directed out of it.

If two nodes are connected by a directed path, then the first node is the ancestor of every node on the path, and every node on the path is the descendant of the first node. If the nodes in a graph are connected by the directed path, the ancestor is the cause of descendent. For instance, in Figure 1, node A is the ancestor of node C and node D on the directed path $A \to C \to D$ . When a directed path exists from a node to itself, the path (and graph) is called cyclic. A directed graph with no cycles is acyclic. Because a node (or variable) cannot be caused by itself, the directed graph representing causal relation between nodes should be acyclic, i.e., a directed acyclic graph (DAG).

2.2. Independence between Variables in DAG

Armed with the tool of DAG, we can conveniently determine which variables are independent or dependent in DAG graphical models. A variable (or node) is a collider on the path if the path enters and leaves the variable via arrowheads (a term suggested by the collision of causal forces at the variable) [30]. Note that being a collider is relative to a path. For example, in Figure 1, C is a collider on the path $A \to C \leftarrow B$ and a non-collider on the path $A \to C \to D$ ; thus, a collider is represented with the corresponding path in DAG. For instance, collider E on the path $A \to E \leftarrow C \leftarrow B$ is represented with path $(A \to E \leftarrow C)$ . A path with a collider (such as $E \leftarrow A \to C \leftarrow B$ ) is blocked and a path with no collider (such as $E \leftarrow A \to C \to D$ ) is open or unblocked. Two variables in the graph are independent if there is no open path between them; in other words, the paths between the two variables are all blocked. Two variables are dependent if there is at least one open path between them. For example, in Figure 1, variable A and variable B are independent because all four paths between variable A and variable B, $A \to C \leftarrow B$ , $A \to C \to D \leftarrow B$ , $A \to E \leftarrow C \leftarrow B$ , and $A \to E \leftarrow C \to D \leftarrow B$ have collider $(A \to C \leftarrow B$ ), $(C \to D \leftarrow B)$ , and ( $A \to E \leftarrow C$ ), respectively. However, variable A and variable D are dependent, because path $A \to C \to D$ is open, although the other path between node A and node D, $A \to E \leftarrow C \to D$ is blocked with collider ( $A \to E \leftarrow C$ ).

2.3. Counterfactual Variable

In a causal relation model, the relation “Y would be y, had T been t in situation U = u”, is denoted by $Y_{T = t} (u) = y$ , which is often denoted by $Y_{t} (u) = y$ [31,32]. Y and T are endogenous variables in the model, T is the intervention variable and Y is the outcome variable, and U represents exogenous variables in the model. The value of Y in unit u, had T been t, is given by $Y_{t} (u)$ , which is called the counterfactual variable. Assuming that M is a causal relation model represented by DAG, that M_t is a modified version of M by removing all arrows entering the variable T, and that T is t, the value of counterfactual variable $Y_{t} (u)$ in model M is defined as the solution for Y in the modified model M_t [33].

2.4. Definition of Strong Exogeneity

The definition of exogeneity includes weak exogeneity and strong exogeneity [17].

An intervention variable T is said to be weakly exogenous relative to outcome variable Y in model M if, and only if, for any unit u,

(1) $P (Y_{t} (u) = y) = P (Y (u) = y | T (u) = t) .$

The condition of weak exogeneity, as defined in Equation (1), is testable by comparing experimental and nonexperimental data.

An intervention variable T is said to be strongly exogenous relative to outcome variable Y in model M if, and only if, for any unit u,

(2) ${Y_{t^{'}} (u), Y_{t} (u)} ⫫ T,$

which was also called “strong ignorability” in [11]. In other words, the counterfactual variables

Y_{t} (u)

and

Y_{t^{'}} (u)

are independent of the intervention variable T. Strong exogeneity is weak exogeneity, but the converse is not true [17]. Strong exogeneity is an important assumption in causal inference. Because the identification of strong exogeneity involves variable relation between the actual world and the hypothetical world at the same time, according to its definition, strong exogeneity seems to be untestable. However, armed with the twin network method, we can succinctly investigate its graphical model structure characteristic, and then identify strong exogeneity easily.

3. Graphical Model Structure Characteristic of Strong Exogeneity

The graphical model (excluding Semi-Markovian model) structure characteristic of strong exogeneity is as follows: A variable T is said to be strongly exogenous relative to Y in model M if and only if variable T and variable Y have not common ancestor in graphical model M i.e., that variable T and variable Y have not common ancestor is sufficient and necessary condition of strong exogeneity between variable T and variable Y in graphical model M.

3.1. Twin Network Method

The identification of strong exogeneity requires an analysis of independence with regards to the counterfactual variable and the intervention variable, which is difficult because it involves a study of the actual world and the hypothetical world at the same time. The twin network method is a useful graphical model method used to analyze a counterfactual variable introduced by Balke and Pearl [34], which uses two networks—one to represent the actual world and one to represent the hypothetical world. The two networks are identical in structure, save for the arrows entering intervention variable T in the hypothetical world network, which are deleted to denote the counterfactual hypothesis. The two networks share the background variables, i.e., exogenous variables (in our case, $U_{T}, U_{Y}, U_{A N}$ , and $U_{D E}$ in Figure 2 and Figure 3), since those remain invariant under the actual world and the hypothetical world. The endogenous variables are replicated and labeled distinctly by repeating the first letter of the name of an endogenous variable in the hypothetical world network, because they may obtain different values in the hypothetical world versus the actual world. For example, in Figure 2, the replica of intervention variable T in an actual world network is variable TT in the hypothetical world network, while the replica of outcome variable Y in an actual world network is variable YY in the hypothetical world network, which represents the counterfactual variable $Y_{t} (u)$ .

The twin network representation offers a useful way of testing independencies among variables in the actual world and counterfactual variables in the hypothetical world. In the twin network, the independence expression ${Y_{t^{'}} (u), Y_{t} (u)} ⫫ T$ equals the independence between the intervention variable T in the actual world network and the outcome variable YY in the hypothetical world network, i.e., $Y_{t} (u)$ (or $Y_{t^{'}} (u)$ ), which equals to the paths between intervention variable T and outcome variable YY are all blocked in the twin network.

In order to analyze the strong exogeneity between the intervention variable T and the outcome variable Y, we just consider the nodes on the paths between variable T and variable Y, while the others nodes are neglected. In Figure 2, the network to represent the actual world is simplified as four parts: the intervention variable T, the outcome variable Y, the ancestor nodes of node T and their descendant nodes which are on the path between variable T and variable Y denoted by $A N_{T}$ , the child nodes of node T and their descendant nodes which are on the path between variable T and variable Y denoted by $D E_{T}$ . The hypothetical world network has the same structure with a different endogenous variable name.

3.2. Sufficiency Proof

Intervention variable T is said to be strongly exogenous, relative to outcome variable Y in graphical model M if variable T and variable Y have no common ancestor in graphical model M.

When variable T and variable Y have no common ancestor, there are no edges from the ancestor of node T, i.e., $A N_{T}$ to variable Y in an actual world network, and it is the same in the hypothetical world network. The corresponding graphical model is depicted in Figure 2.

First, let us look at the paths starting from variable T. As can be easily seen, there are two alternative paths, $T \to D E_{T} \leftarrow U_{D E} \to D D E_{T} \leftrightarrow Y Y$ and $T \to D E_{T} \leftrightarrow Y \leftarrow U_{Y} \to Y Y$ . For the path $T \to D E_{T} \leftarrow U_{D E} \to D D E_{T} \leftrightarrow Y Y$ , it is blocked by the collider ( $T \to D E_{T} \leftarrow U_{D E}$ ). For the path $T \to D E_{T} \leftrightarrow Y \leftarrow U_{Y} \to Y Y$ , if the edge between $D E_{T}$ and Y is $D E_{T} \to Y$ , then the corresponding path $T \to D E_{T} \to Y \leftarrow U_{Y} \to Y Y$ is blocked by the collider ( $D E_{T} \to Y \leftarrow U_{Y}$ ). Otherwise, if the edge between $D E_{T}$ and Y is $D E_{T} \leftarrow Y$ , then the corresponding path $T \to D E_{T} \leftarrow Y \leftarrow U_{Y} \to Y Y$ is blocked by the collider ( $T \to D E_{T} \leftarrow Y$ ).

Let us now analyze the paths going into variable T between variable T and variable YY. There are three alternative paths— $T \leftarrow A N_{T} \leftarrow U_{A N} \to A A N_{T} \leftarrow Y Y$ , $T \leftarrow A N_{T} \leftarrow Y \leftarrow U_{Y} \to Y Y$ , and $T \leftarrow A N_{T} \leftarrow Y \leftrightarrow D E_{T} \leftarrow U_{D E} \to D D E_{T} \leftrightarrow Y Y$ . The path $T \leftarrow A N_{T} \leftarrow U_{A N} \to A A N_{T} \leftarrow Y Y$ is blocked by the collider ( $U_{A N} \to A A N_{T} \leftarrow Y Y$ ). For the path $T \leftarrow A N_{T} \leftarrow Y \leftarrow U_{Y} \to Y Y$ , there should be collider in the nodes $A N_{T}$ . Otherwise, a directed cycle $T \to D E_{T} \to Y \to A N_{T} \to T$ will be produced inevitably in the graph, because variable T is the intervention variable and variable Y is the outcome variable, as the directed path $T \to D E_{T} \to Y$ always exists. So, the path $T \leftarrow A N_{T} \leftarrow Y \leftarrow U_{Y} \to Y Y$ is blocked by the collider in the nodes $A N_{T}$ . The path $T \leftarrow A N_{T} \leftarrow Y \leftrightarrow D E_{T} \leftarrow U_{D E} \to D D E_{T} \leftrightarrow Y Y$ is blocked by the collider in the nodes $A N_{T}$ , too.

Hence, when variable T and variable Y have no common ancestor in a graphical model M (actual world), the paths between variable T and variable YY in the twin network are all blocked, i.e., ${Y_{t^{'}} (u), Y_{t} (u)} ⫫ T$ , variable T is said to be strongly exogenous relative to Y.

3.3. Necessity Proof

If variable T is said to be strongly exogenous relative to Y in graphical model M, then variable T and variable T have no common ancestor in graphical model M.

When variable T and variable Y have a common ancestor, if we can find one path between intervention variable T in actual world network and outcome variable YY in the hypothetical world network is open, i.e., expression ${Y_{t^{'}} (u), Y_{t} (u)} ⫫ T$ is violated, then variable T is not strongly exogenous relative to Y in model M, and necessity is established.

When variable T and variable Y have a common ancestor, then there are directed edges $A N_{T} \to Y$ between nodes $A N_{T}$ and node Y in the actual world network. The hypothetical world network has directed edges $A A N_{T} \to Y Y$ , accordingly. The corresponding graphical model is shown in Figure 3.

The paths between variable T and variable YY can be divided into two categories based on the direction of directed edge connected to variable T on the paths, the paths starting from variable T, and the paths going into variable T. Obviously, the paths starting from variable T are all blocked because of collider ( $T \to D E_{T} \leftarrow U_{D E}$ ), collider ( $T \to D E_{T} \leftarrow Y$ ), or collider ( $D E_{T} \to Y \leftarrow U_{Y})$ on the paths.

Let us analyze the paths going into variable T between variable T and variable YY. There are three alternative paths— $T \leftarrow A N_{T} \leftrightarrow Y \leftarrow U_{Y} \to Y Y$ , $T \leftarrow A N_{T} \leftrightarrow Y \leftrightarrow D E_{T} \leftarrow U_{D E} \to D D E_{T} \leftrightarrow Y Y,$ and $T \leftarrow A N_{T} \leftarrow U_{A N} \to A A N_{T} \leftrightarrow Y Y$ .

For the path $T \leftarrow A N_{T} \leftrightarrow Y \leftarrow U_{Y} \to Y Y,$ there are two alternative situations—path $T \leftarrow A N_{T} \leftarrow Y \leftarrow U_{Y} \to Y Y$ and path $T \leftarrow A N_{T} \to Y \leftarrow U_{Y} \to Y Y$ . The path $T \leftarrow A N_{T} \leftarrow Y \leftarrow U_{Y} \to Y Y$ is blocked by the collider in the nodes $A N_{T}$ . Otherwise, a directed cycle $T \to D E_{T} \to Y \to A N_{T} \to Y$ will be produced inevitably in the graph. The path $T \leftarrow A N_{T} \to Y \leftarrow U_{Y} \to Y Y$ is blocked by the collider $(A N_{T} \to Y \leftarrow U_{Y})$ . So, the path $T \leftarrow A N_{T} \leftrightarrow Y \leftarrow U_{Y} \to Y Y$ is blocked.

For the path $T \leftarrow A N_{T} \leftrightarrow Y \leftrightarrow D E_{T} \leftarrow U_{D E} \to D D E_{T} \leftrightarrow Y Y$ , there are four alternative situations—path $T \leftarrow A N_{T} \leftarrow Y \leftarrow D E_{T} \leftarrow U_{D E} \to D D E_{T} \to Y Y$ , path $T \leftarrow A N_{T} \leftarrow Y \to D E_{T} \leftarrow U_{D E} \to D D E_{X T} \leftarrow Y Y$ , path $T \leftarrow A N_{T} \to Y \leftarrow D E_{T} \leftarrow U_{D E} \to D D E_{T} \to Y Y$ , and path $T \leftarrow A N_{T} \to Y \to D E_{T} \leftarrow U_{D E} \to D D E_{T} \leftarrow Y Y$ . Path $T \leftarrow A N_{T} \leftarrow Y \leftarrow D E_{T} \leftarrow U_{D E} \to D D E_{T} \to Y Y$ and path $T \leftarrow A N_{T} \leftarrow Y \to D E_{T} \leftarrow U_{D E} \to D D E_{T} \leftarrow Y Y$ are all blocked by the collider in the nodes $A N_{T}$ . Otherwise, a directed cycle will be produced. Path $T \leftarrow A N_{T} \to Y \leftarrow D E_{T} \leftarrow U_{D E} \to D D E_{T} \to Y Y$ is blocked by the collider $(A N_{T} \to Y \leftarrow D E_{T})$ on the path. Path $T \leftarrow A N_{T} \to Y \to D E_{T} \leftarrow U_{D E} \to D D E_{T} \leftarrow Y Y$ is blocked by the collider $(Y \to D E_{T} \leftarrow U_{D E})$ . So, the path $T \leftarrow A N_{T} \leftrightarrow Y \leftrightarrow D E_{T} \leftarrow U_{D E} \to D D E_{T} \leftrightarrow Y Y$ is blocked.

However, for the path $T \leftarrow A N_{T} \leftarrow U_{A N} \to A A N_{T} \leftrightarrow Y Y$ , because variable T and variable Y have common ancestors, there must be directed edge $A N_{T} \to Y$ in the actual world, and there must be directed edge $A A N_{T} \to Y Y$ in the hypothetical world. Correspondingly, there must be a path $T \leftarrow A N_{T} \leftarrow U_{A N} \to A A N_{T} \to Y Y$ , which has no collider and connects variable T and variable YY.

Hence, when variable T and variable Y have a common ancestor, there must be a path between variable T and variable YY which is not blocked in twin a network. Because ${Y_{t^{'}} (u), Y_{t} (u)} ⫫ T$ no longer holds, variable T is not strongly exogenous relative to Y in model M.

4. Estimation for Probability of Causation on LUCAS

Quantitatively assessing the likelihood that one event was the cause of another is very important for decision-making. In this section, firstly, we present the definitions of probability of causation, as defined in [35], with the language of counterfactual, and its estimation under condition of exogenous or monotonic [17]. Then, we illustrate the estimation of probability of causation in the context of LUCAS using a graphical model structure characteristic of strong exogeneity.

4.1. Definition of Probabilities of Causation

Probability of causation is the odds that one event was the cause of another and it is quantization of causal relationship between events. For instance, epidemiologists are interested in estimating the probability that a certain case of disease is “attributable” to a particular exposure. In other words, “the probability that disease would not have occurred in the absence of exposure, given that disease and exposure did in fact occur.” This probability of causation measures how necessary the cause is for the happening of the effect, and is called the probability of necessary (PS).

Let T and Y be two binary variables in a causal model M. Let t and y stand for the propositions T = true and Y = true, respectively, and let $t^{'}$ and $y^{'}$ denote their complements.

The probability of necessity (PN) is defined as the expression

(3) $P N ≜ P (Y_{T = false} = false | T = true, Y = true) ≜ P (Y_{T = t^{’}} = y^{'} | T = t, Y = y) .$

PN stands for the probability of $Y_{T = t^{’}} = y^{'}$ (that event y would not have occurred in the absence of event t), under the condition that t and y did in fact occur.

The probability of sufficiency (PS) is defined as the expression

(4) $P S ≜ P (Y_{T = true} = true | T = false, Y = false) ≜ P (Y_{T = t} = y | T = t^{'}, Y = y^{'}) .$

PS stands for the probability of $Y_{T = t} = y$ (that event y would have occurred if event t occur), under the condition that event t and y did not occur in fact.

Tian and Pearl derived the bound or point estimates of probabilities of causation under the condition of exogeneity or monotonicity, as shown in the following expressions.

4.2. Definition of Monotonicity

An outcome variable Y is said to be monotonically relative to the intervention variable T in a causal model M if, and only if, the counterfactual variable $Y_{t} (u)$ is monotonic in t for all u.

(5) $Y_{t} (u) > Y_{t^{'}} (u) (i f t > t^{'}) .$

4.3. Estimation for Probabilities of Causation under Condition of Exogeneity

If intervention variable T is said to be weakly exogenous relative to outcome variable Y in model M, then the probability of causation, PN and PS, can be bounded with the following expressions.

(6) $\frac{\max [o, P (y | t) - P (y | t^{'})]}{P (y | t)} \leq P N \leq \frac{\min [P (y | t), P (y^{'} | t^{'})]}{P (y | t)},$

(7) $\frac{\max [o, P (y | t) - P (y | t^{'})]}{P (y^{'} | t^{'})} \leq P S \leq \frac{\min [P (y | t), P (y^{'} | t^{'})]}{P (y^{'} | t^{'})} .$

So, we can obtain the bounds of PN and PS based on the observational data under the condition of weak exogeneity. To simply the expression, Y = y is abbreviated as y, $Y = y^{'}$ , Z = z, T = t and $T = t^{'}$ are all in the same way.

4.4. Estimation for Probability of Causation under Condition of Exogeneity and Monotonicity

If intervention variable T is said to be weakly exogenous relative to outcome variable Y, and variable Y is said to be monotonic relative to variable T in model M, then the probability of causation, PN and PS, can be calculated with the following expressions.

(8) $P N = \frac{P (y | t) - P (y | t^{'})}{P (y | t)},$

(9) $P S = \frac{P (y | t) - P (y | t^{'})}{1 - P (y | t^{'})} .$

So, we can obtain the value of PN and PS based on the observational data under the condition of exogeneity and monotonicity.

4.5. Estimation for Probability of Causation in Context of LUCAS

LUCAS are synthetic datasets provided in the tutorial to evaluate causal modeling techniques, which is organized for IEEE World Congress on Computational Intelligence (WCCI) 2008 [24]. The datasets have 12 binary variables: (1) smoking, (2) yellow fingers, (3) anxiety, (4) peer pressure, (5) genetics, (6) attention disorder, (7) born on even day, (8) car accident, (9) fatigue, (10) allergy, (11) coughing, and (12) lung cancer. Assuming we only have the training dataset of observational LUCAS0, was the probability of smoking the cause of lung cancer?

All of the data in the training dataset of LUCAS0 are observational data; thus, we cannot identify weak exogeneity by comparing experimental and nonexperimental data according to its definition Equation (1). However, based on observational data, we can obtain the DAG, representing the causal relation between variables in LUCAS using bnlearn, pcalg, or other causal discovery tools.

In this paper, we will sidestep the procedure of obtaining the DAG representing the causal relation between variables in LUCAS0, and focus on the estimation for the probability of causation. The corresponding DAG is displayed Figure 4. From Figure 4, the smoking X variable and the lung cancer Y variable have no common ancestor, i.e., the smoking X variable is said to be strongly exogenous relative to the lung cancer Y variable.

The observational data associated with LUCAS0 training data are shown in Table 1, which provide the following estimates.

$P (y | z^{'}, x) = 0.83934, P (y | z^{'}, x^{'}) = 0.23146, P (y^{'} | z^{'}, x) = 0.16066, P (y^{'} | z^{'}, x^{'}) = 0.76854, P (y | z, x) = 0.99351, P (y | z, x^{'}) = 0.86996, P (y^{'} | z, x) = 0.00649, P (y^{'} | z, x^{'}) = 0.13004 .$

If variable z is a binary variable, then any conditional probability can satisfy the following relationship (see Appendix A).

(10) $(y | x) = P (y | z, x) P (z | x) + P (y | z^{'}, x) P (z^{'} | x) .$

As shown in Figure 4, Smoking(X) → Lung Cancer(Y) ← Genetics(Z) is a collider in the DAG; therefore, we can obtain the following:

$X ⫫ Z,$

$P (z | x) = P (z),$

and

$P (z^{'} | x) = P (z^{'}),$

then we have

$P (y | x) = P (y | z, x) P (z) + P (y | z^{'}, x) P (z^{'}) = 0.86393 .$

Similarly, we can obtain

$P (y | x^{'}) = P (y | z, x^{'}) P (z | x^{'}) + P (y | z^{'}, x^{'}) P (z^{'} | x^{'}) = 0.33332, P (y^{'} | x^{'}) = 1 - P (y | x^{'}) = 0.66668 .$

The smoking X variable is said to be strongly exogenous relative to the lung cancer Y variable; thus, the smoking X variable is said to be weakly exogenous relative to the lung cancer Y variable. Then, we can obtain bounds on probabilities of causation through Equations (6) and (7)

$0.61418 \leq P N \leq 0.77168, 0.79590 \leq P S \leq 1 .$

If we assume that smoking X can only cause, but never prevent, lung cancer Y, i.e., if the lung cancer Y variable is said to be monotonically relative to the smoking X variable, Equations (8) and (9) are applicable and can yield

$P N = 0.61418, P S = 0.79590 .$

5. Conclusions, Limitations, and Future Research

Strong exogeneity is an important assumption in the study of causal inference. Because of involving variables in the actual world and the hypothetical world at the same time, it is difficult to identify strong exogeneity, and there are very few relevant references about it to date. However, based on the twin network method, we can investigate the graphical model structure characteristic of strong exogeneity with simple operations similar to those used to solve arithmetic problems. Compared with the other derivation method of graphical diagnosis, the method based on the twin network is much more concise, clear, and understandable. With the graphical model structure characteristic of strong exogeneity, the identification of strong exogeneity and the estimation for probability of causation can be easily achieved based on observational data. In this paper, we have only considered the case of binary treatments. For future work, aiming at the case of multivalued treatments, the graphical model structure characteristic of strong exogeneity will be investigated based on the parallel worlds graph method, which is a generalization of the twin network method.

Author Contributions

Conceptualization and methodology, R.L.; analysis, L.S.; writing—original draft preparation, R.L., L.S. and M.L.; writing—review and editing, Y.K. and P.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Sichuan Science and Technology Program [grant numbers:2021YFG0169], [grant numbers: 2022YFG0190], Chengdu Normal University Science and Technology Project [no. 111/111159001], and the National Nature Science Foundation of China [No.61871332].

Institutional Review Board Statement

Not Applicable.

Informed Consent Statement

Not Applicable.

Data Availability Statement

The data used to support the findings of this study are available from the corresponding author upon request. The dataset is described at relevant places within the text as reference [24].

Acknowledgments

We would like to thank the reviewers for their valuable comments and pointing out possible further research topics.

Conflicts of Interest

The authors declare no conflict of interest.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Figure 1. An example of DAG.

Figure 2. Twin network for counterfactual analysis when variable T and variable Y have no common ancestors.

Figure 3. Twin network for counterfactual analysis when variable T and variable Y have common ancestors.

Figure 4. DAG of LUCAS0.

Table 1

Frequency data comparing lung cancer among smoking (x) and no smoking ( $x^{'}$ ) in LUCAS0.

	Genetics = F $(z^{'}$ $) P (z^{'} = 0.84047)$		Genetics = T (z) P(z = 0.15953)
	Smoking = T(x)	Smoking = F $(x^{'})$	Smoking = T(x)	Smoking = F $(x^{'})$
Lung cancer = T (y)	0.83934	0.23146	0.99351	0.86996
Lung cancer = F( $y^{'}$ )	0.16066	0.76854	0.00649	0.13004

Appendix A

Proof of Equation (9).

$\sum_{z} [P (y | z, x) P (z | x)]$ $= \sum_{z} \frac{P (y | z, x) P (z | x) P (x)}{P (x)}$ $= \sum_{z} \frac{P (y | z, x) P (z, x)}{P (x)}$ $= \sum_{z} \frac{P (y, z, x)}{P (x)}$ $= \frac{\sum_{z} P (y, z, x)}{P (x)}$ $= \frac{P (y, x)}{P (x)}$ $= P (y | x)$

If Z is binary variable with value z and $z^{'}$ , then we have $\sum_{z} [P (y | z, x) P (z | x)] = P (y | z, x) P (z | x) + P (y | z^{'}, x) P (z^{'} | x)$

So, we can obtain $P (y | x) = P (y | z, x) P (z | x) + P (y | z^{'}, x) P (z^{'} | x)$ □

References

1. Schuler, M.S.; Rose, S. Targeted Maximum Likelihood Estimation for Causal Inference in Observational Studies. Am. J. Epidemiol.; 2017; 185, pp. 65-73. [DOI: https://dx.doi.org/10.1093/aje/kww165]

2. Hernán, M.A.; Brumback, B.; Robins, J.M. Marginal structural models to estimate the causal effect of zidovudine on the survival of HIV-positive men. Epidemiology; 2000; 11, pp. 561-570. [DOI: https://dx.doi.org/10.1097/00001648-200009000-00012]

3. Tyrrell, J.; Mulugeta, A.; Wood, A.R.; Zhou, A.; Beaumont, R.N.; Tuke, M.A.; Jones, S.E.; Ruth, K.S.; Yaghootkar, H.; Sharp, S. et al. Using genetics to understand the causal influence of higher BMI on depression. Int. J. Epidemiol.; 2019; 48, pp. 834-848.

4. Schölkopf, B.; Locatello, F.; Bauer, S.; Ke, N.R.; Kalchbrenner, N.; Goyal, A.; Bengio, Y. Toward Causal Representation Learning. Proc. IEEE.; 2021; 109, pp. 612-634. [DOI: https://dx.doi.org/10.1109/JPROC.2021.3058954]

5. Bareinboim, E.; Forney, A.; Pearl, J. Bandits with Unobserved Confounders: A Causal Approach. Adv. Neural Inf. Process. Syst. (NIPS 2015); 2015; 28, pp. 1342-1350.

6. Boros, F.A.; Maszlag-Török, R.; Szűcs, M.; Annus, Á.; Klivényi, P.; Vécsei, L. Relationships of Ischemic Stroke Occurrence and Outcome with Gene Variants Encoding Enzymes of Tryptophan Metabolism. Biomedicines; 2021; 9, 1441. [DOI: https://dx.doi.org/10.3390/biomedicines9101441]

7. Ghafele, R.; Gibert, B. A Counterfactual Impact Analysis of Fair Use Policy on Copyright Related Industries in Singapore. Laws; 2014; 3, pp. 327-352. [DOI: https://dx.doi.org/10.3390/laws3020327]

8. Keele, L.; Stevenson, R.T.; Elwert, F. The causal interpretation of estimated associations in regression models. Politcal Sci. Res. Methods; 2019; 8, pp. 1-13. [DOI: https://dx.doi.org/10.1017/psrm.2019.31]

9. Criscuolo, C.; Martin, R.; Overman, H.G.; Reenen, J.V. Some Causal Effects of an Industrial Policy. Am. Econ. Rev.; 2019; 109, pp. 48-85. [DOI: https://dx.doi.org/10.1257/aer.20160034]

10. Nazarov, D. Causality: Intelligent Valuation Models in the Digital Economy. Mathematics; 2020; 8, 2174. [DOI: https://dx.doi.org/10.3390/math8122174]

11. Rosenbaum, P.R.; Rubin, D.B. The central role of propensity score in observational studies for causal effects. Biometrika; 1983; 70, pp. 41-55. [DOI: https://dx.doi.org/10.1093/biomet/70.1.41]

12. Pearl, J. Causality: Models, Reasoning, and Inference; Cambridge University Press: Cambridge, UK, 2009.

13. Dawid, A.P. Causal Inference without Counterfactuals. J. Am. Stat. Assoc.; 2000; 95, pp. 407-424. [DOI: https://dx.doi.org/10.1080/01621459.2000.10474210]

14. Fisher, R.A. Statistical Methods for Research Workers. Breakthroughs in Statistics; Springer Series in Statistics (Perspectives in Statistics) Kotz, S.; Johnson, N.L. Springer: New York, NY, USA, 1992; [DOI: https://dx.doi.org/10.1007/978-1-4612-4380-9_6]

15. Sauppe, J.; Jacobson, S. The Role of Covariate Balance in Observational Studies. Nav. Res. Logist.; 2017; 64, pp. 323-344. [DOI: https://dx.doi.org/10.1002/nav.21751]

16. Castro-Martín, L.; Rueda, M.d.M.; Ferri-García, R. Estimating General Parameters from Non-Probability Surveys Using Propensity Score Adjustment. Mathematics; 2020; 8, 2096. [DOI: https://dx.doi.org/10.3390/math8112096]

17. Tian, J.; Pearl, J. Probabilities of causation: Bounds and identification. Ann. Math. Artif. Intell.; 2000; 28, pp. 287-313. [DOI: https://dx.doi.org/10.1023/A:1018912507879]

18. Zhou, X.; Xie, Y. Propensity Score–Based Methods versus MTE-Based Methods in Causal Inference: Identification, Estimation, and Application. Sociol. Methods Res.; 2016; 45, pp. 3-40. [DOI: https://dx.doi.org/10.1177/0049124114555199]

19. Harding, D.J. Counterfactual Models of Neighborhood Effects: The Effect of Neighborhood Poverty on High School Dropout and Teenage Pregnancy. Am. J. Sociol.; 2003; 109, pp. 676-719. [DOI: https://dx.doi.org/10.1086/379217]

20. DiPrete, T.A.; Gangl, M. Assessing Bias in the Estimation of Causal Effects: Rosenbaum Bounds on Matching Estimators and Instrumental Variables Estimation with Imperfect Instruments. Sociol. Methodol.; 2004; 34, pp. 271-310. [DOI: https://dx.doi.org/10.1111/j.0081-1750.2004.00154.x]

21. Rosenbaum, P.R. From Association to Causation in Observational Studies: The Role of Strongly Ignorable Treatment Assignment. J. Am. Stat. Assoc.; 1984; 79, pp. 41-48. [DOI: https://dx.doi.org/10.1080/01621459.1984.10477060]

22. Emura, T.; Wang, J.F.; Katsuyama, H. Assessing the Assumption of the Strongly Ignorable Treatment Assignment Under Assumed Causal Models; Technical Reports of Mathematical Sciences Chiba University: Chiba, Japan, 2008; 24.

23. Cheng, L.; Guo, R.C.; Moraffah, R.; Candan, K.S.; Raglin, A.; Liu, H. A Practical Data Repository for Causal Learning with Big Data. Benchmarking, Measuring, and Optimizing. Bench 2019. Lecture Notes in Computer Science; Gao, W.; Zhan, J.; Fox, G.; Lu, X.; Stanzione, D. Springer: Cham, Switzerland, 2020; Volume 12093, [DOI: https://dx.doi.org/10.1007/978-3-030-49556-5_23]

24. Guyon, I.; Aliferis, C.; Cooper, G.; Elisseeff, A.; Pellet, J.P.; Spirtes, P.; Statnikov, A. Design and analysis of the causation and prediction challenge. Proceedings of the Workshop on the Causation and Prediction Challenge at WCCI 2008, Machine Learning Research; Hong Kong, China, 3–4 June 2008; Volume 3, pp. 1-33. Available online: http://proceedings.mlr.press/v3/guyon08a/guyon08a.pdf (accessed on 20 September 2021).

25. Wright, S. The relative importance of heredity and environment in determining the piebald pattern of guinea-pigs. Proc. Natl. Acad. Sci. USA; 1920; 6, pp. 320-332. [DOI: https://dx.doi.org/10.1073/pnas.6.6.320]

26. Elwert, F. Graphical Causal Models. Handbook of Causal Analysis for Social Research. Handbooks of Sociology and Social Research; Morgan, S. Springer: Dordrecht, The Netherlands, 2013; pp. 245-274. [DOI: https://dx.doi.org/10.1007/978-94-007-6094-3_13]

27. Glymour, M.M.; Greenland, S. Causal diagrams. Mod. Epidemiol.; 2008; 3, pp. 183-209.

28. Koller, D.; Friedman, N. Probabilistic Graphical Models: Principles and Techniques; MIT Press: Cambridge, MA, USA, 2009.

29. Pearl, J.; Glymour, M.; Jewell, N.P. Causal Inference in Statistics: A Primer; Wiley: New York, NY, USA, 2016.

30. Stigler, S.M. Statistics on the Table: The History of Statistical Concepts and Methods; Harvard University Press: Cambridge, MA, USA, Hoboken, NJ, USA, 1999.

31. Lewis, D. Causation as Influence. J. Philosophy; 2000; 97, pp. 182-197. [DOI: https://dx.doi.org/10.2307/2678389]

32. Neyman, J.S.; Dabrowska, D.M.; Speed, T.P. On the Application of Probability Theory to Agricultural Experiments. Essay on Principles. Stat. Sci.; 1990; 5, pp. 465-472.

33. Balke, A.; Pearl, J. Counterfactual probabilities: Computational methods, bounds, and application. Uncertainty Proceedings 1994; Elsevier: Amsterdam, The Netherlands, 1994; pp. 46-54.

34. Balke, A.; Pearl, J. Probabilistic Evaluation of Counterfactual Queries. Proceedings of the Twelfth National Conference on Artificial Intelligence; Seattle, WA, USA, 1–4 August 1994; Volume 1, pp. 230-237.

35. Pearl, J. Probabilities of causation: Three counterfactual interpretations and their identification. Synthese; 1999; 121, pp. 93-149. [DOI: https://dx.doi.org/10.1023/A:1005233831499]

Word count: 5924

Show less

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Translate

Strong exogeneity is an important assumption in the study of causal inference, but it is difficult to identify according to its definition. The twin network method provides a graphical model tool for analyzing the variable relationship, involving the actual world and the hypothetical world, which facilitates the investigating of strong exogeneity. In this paper, the graphical model structure characteristic of strong exogeneity is investigated based on the twin network method. Compared with other derivation methods of graphical diagnosis, the method based on the twin network is more concise, clearer, and easier to understand. Under the condition of strong exogeneity, it is easy to estimate the probability of causation based on observational data. As an example, the application of graphical model structure characteristic of strong exogeneity in causal inference in the context of lung cancer simple sets (LUCAS) is illustrated.

Details

Title

Research on the Graphical Model Structure Characteristic of Strong Exogeneity Based on Twin Network Method and Its Application in Causal Inference

Author

Luo, Rui¹; Sun, Lijia²; Yin Kuang¹; Deng, Ping³; Lu, Mengna⁴

¹ Key Lab of Interior Layout optimization and Security, Chengdu Normal University, Chengdu 611130, China; [email protected]
² School of Electronic Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China; [email protected]
³ Key Lab of Information Coding and Transmission, Southwest Jiaotong University, Chengdu 611756, China; [email protected]
⁴ School of Aeronautics and Astronautics, University of Electronic Science and Technology of China, Chengdu 611731, China; [email protected]

First page

957

Publication year

2022

Publication date

2022

Publisher

MDPI AG

e-ISSN

22277390

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.3390/math10060957

ProQuest document ID

2642485670

Research on the Graphical Model Structure Characteristic of Strong Exogeneity Based on Twin Network Method and Its Application in Causal Inference

Jump to:

Full text

Abstract

Details

Suggested sources