Abstract
In our recent work, we developed a novel dynamic programming algorithm to find optimal Bayesian networks with parent set constraints. This ‘generational orderings’ based dynamic programming algorithm—CausNet—efficiently searches the space of possible Bayesian networks. The method is designed for continuous as well as discrete data, and for continuous, discrete and survival outcomes. In the present work, we develop a variant of CausNet—CausNet-partial—where we introduce the space of ‘partial generational orderings’, a novel way to search for small and sparse optimal Bayesian networks in high dimensional data. We test this method on both simulated and real data. In simulations, CausNet-partial shows superior performance compared with three state-of-the-art algorithms. We also apply it to the benchmark discrete Bayesian network ALARM, a Bayesian network designed to provide an alarm message system for patient monitoring. We first apply the original CausNet and then CausNet-partial, varying the partial order from 5 to 2. CausNet-partial discovers small sparse networks with drastically reduced runtime, as expected from theory. To further demonstrate the efficacy of CausNet-partial, we apply it to an ovarian cancer gene expression dataset with 513 genes and a survival outcome. Our algorithm is able to find optimal Bayesian networks with different numbers of nodes as we vary the partial order. On a personal computer with a 2.3 GHz Intel Core i9 processor and 16 GB RAM, each run takes less than five minutes. Our ‘partial generational orderings’ based method, CausNet-partial, is an efficient and scalable method for finding optimal sparse and small Bayesian networks from high dimensional data.
1 Introduction
Optimal Bayesian Network (BN) Structure Discovery is a method to learn optimal Bayesian networks from data (see e.g. [1–6]). Finding a best BN is NP-hard [7, 8]—the number of possible structures for n variables is given by Robinson's recurrence [9], a(n) = Σ_{k=1}^{n} (−1)^{k+1} C(n,k) 2^{k(n−k)} a(n−k), with a(0) = 1. The super-exponential size of the search space makes BN Structure Discovery through exhaustive search impossible for high dimensional data.
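For concreteness, the count of labeled DAGs from Robinson's recurrence [9] can be computed directly; a minimal sketch:

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def num_dags(n: int) -> int:
    """Number of labeled DAGs on n nodes, via Robinson's recurrence:
    a(n) = sum_{k=1}^{n} (-1)^(k+1) C(n,k) 2^(k(n-k)) a(n-k), with a(0) = 1."""
    if n == 0:
        return 1
    return sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * num_dags(n - k)
               for k in range(1, n + 1))
```

Already `num_dags(10)` exceeds 4 × 10^18, which illustrates why exhaustive search is hopeless.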
Optimal BN Structure Discovery methods are broadly classified as score-based (exact and approximate), constraint-based, and hybrid methods [10]. In the present work, we focus our attention on the score-based exact methods. The first exact methods that searched the whole space were based on Dynamic Programming (DP)—these include the DP methods by Koivisto & Sood [11, 12], Silander and Myllymäki [13], and Singh and Moore [14]. But even with dynamic programming, which reduces the complexity to exponential, the search is feasible only for very small data sets—usually fewer than 30 variables [13, 14], and even that is achieved by severely restricting the number of parents for each node.
Later exact algorithms worked on pruning the search space to find an optimal BN. The main methods in this category are GOBNILP (Globally Optimal Bayesian Network learning using Integer Linear Programming) by Cussens, which uses integer linear programming [15, 16] to prune the search space; the A* search method by Yuan and Malone [17, 18]; and Branch-and-Bound methods (e.g. by de Campos et al. [19]). In [20], Fan et al. also implemented potentially optimal parent sets (POPS) and ancestral constraints to further prune the search space in the A* suite of algorithms. These exact methods improved upon the DP algorithms in terms of efficiency and speed and can be applied to a larger number of variables [10].
Along similar lines, to overcome the curse of dimensionality in large datasets, we recently developed a dynamic programming based method, CausNet [21], that identifies and implements ‘parent set constraints’ to prune the search space to find an optimal BN. In this method, we introduced a novel approach of ‘generational orderings’ based search, rather than the usual lexicographic search, for learning optimal BNs using dynamic programming.
In the current work, we introduce an extension of the CausNet approach where we consider ‘partial generational orderings’ rather than complete ‘generational orderings’, providing another novel way to tackle the curse of dimensionality in learning optimal BNs. This new method, CausNet-partial, further optimizes the search for small and sparse BNs in the subspace of ‘partial generational orderings’. This reduces the search space drastically and makes CausNet-partial work for large-dimensional data. The amount of reduction in the search space can be controlled by the user according to the problem and prior domain knowledge, using specifiable parameters.
Many available algorithms do not support both continuous and discrete data [22], and few support survival outcomes [23]. Among the score-based exact methods, none of the algorithms by Koivisto & Sood [11], Silander and Myllymäki [13], Singh and Moore [14] or GOBNILP [15] support survival outcomes. Even for other algorithms, little work has been done to support survival outcomes in optimal BN learning [23]. To our knowledge, only a few methods like [24–26] support survival outcomes. CausNet and our new method CausNet-partial support both continuous and discrete data (but not mixed data as of now), and also support survival outcomes in the case of continuous data. As survival outcomes are common in biomedical data, this fills an important gap among the available BN learning methods.
2 Background
A Bayesian network [27, 28] is a probabilistic graphical model represented by a directed acyclic graph (DAG). The vertices of the DAG correspond to random variables and the edges represent conditional dependences among the random variables. For each vertex v_i, a conditional probability distribution P(v_i | G_i) gives the dependence of the variable on its set of parents G_i.
A Bayesian network G can be compactly represented by a vector (G_1, …, G_p): here G_i is the subset of V whose vertices have directed edges to v_i. Any BN also corresponds to an ordering of the vertices, given by the ordered set (v_{σ(1)}, …, v_{σ(p)}), where σ is a permutation of [p], the set of the first p positive integers. A BN is consistent with an ordering if the parents of each vertex are a subset of the vertices preceding it, i.e. G_{σ(i)} ⊆ {v_{σ(1)}, …, v_{σ(i−1)}}.
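As an illustration of this consistency condition, here is a small sketch (the parent-set vector G below is a hypothetical example, not from the paper):

```python
def consistent_with_ordering(parents, ordering):
    """True iff every node's parent set lies entirely among its predecessors
    in the given ordering (the defining property of a consistent ordering)."""
    preceding = set()
    for v in ordering:
        if not set(parents[v]) <= preceding:
            return False
        preceding.add(v)
    return True

# Hypothetical BN on V = {1, 2, 3, 4}, written as its parent-set vector.
G = {1: set(), 2: {1}, 3: {1, 2}, 4: {2}}
```

Here G is consistent with the ordering (1, 2, 3, 4) but not with (4, 3, 2, 1), since node 4's parent 2 would then come after it.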
In the score-based exact approaches for optimal BN structure learning, a scoring function is used. The scoring function gives a real value (score) representing the goodness of fit of a BN to the data. To find a best BN, this score is maximized over all the possible BNs. There are a variety of scoring functions available—for example, BIC/MDL [29–31], BDeu [32, 33], and BGe [33–35]. In CausNet and CausNet-partial, we provide BIC (Bayesian information criterion) and BGe (Bayesian Gaussian equivalent) scoring function options. These two scoring functions have been shown to perform consistently well for large data, and for both discrete and continuous variables [31, 32].
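As a sketch of what a Gaussian local score looks like, the following computes a BIC-style score for a node given a candidate parent set (additive constants dropped and the penalty convention simplified; see [29–31] for the precise forms):

```python
import numpy as np

def bic_local_score(child, parents):
    """BIC-style Gaussian local score (higher is better, constants dropped):
    fit child ~ parents by least squares, then penalise model size.
    `parents` is an (n, k) array; k = 0 means the empty parent set."""
    n = len(child)
    X = np.column_stack([np.ones(n), parents]) if parents.size else np.ones((n, 1))
    beta, *_ = np.linalg.lstsq(X, child, rcond=None)
    rss = float(np.sum((child - X @ beta) ** 2))
    return -0.5 * n * np.log(rss / n) - 0.5 * X.shape[1] * np.log(n)
```

A true parent should raise the score relative to the empty parent set, which is the property the DP search exploits.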
3 CausNet
CausNet [21] uses the dynamic programming approach to finding a best BN, following the work by Koivisto & Sood [11, 12], Silander and Myllymäki [13], and Singh and Moore [14]. CausNet follows the algorithm proposed by Silander and Myllymäki [13] (the SM algorithm henceforth) and makes heuristic modifications to it to prune the search space. It uses a ‘generational orderings’ based search with parent set identification and the resulting parent set constraints, as well as in-degree constraints, for this purpose.
The SM method and CausNet use an important fact about DAGs: every DAG contains at least one sink (a node with no outgoing edges). The problem of finding a best BN given the data relies on finding a best sink for the whole set of nodes. This in turn requires finding a best sink for the set of nodes without that sink, leading to the following recursion:

bestScore(V) = max_{s ∈ V} [ bestScore(V \ {s}) + bestScore(s, V \ {s}) ]  (1)

where s is the best sink, bestScore(V) is the score of a best network with nodes V, and bestScore(s, U) is the best score of s with parents chosen from U. The result is an ordering of the nodes, with each best sink removed progressively, together with their best parents, from which the best BN can be recovered.
To implement the above recursion, we use a local score localscore(x_i, G_i) for a node x_i with parents G_i, using a scoring function (BIC or BGe). The score of a network G is then defined as:

score(G) = Σ_{i=1}^{p} localscore(x_i, G_i)  (2)

where localscore(x_i, G_i) gives the score of x_i with parents G_i in the network G. The best score for each node x_i with possible parents pp_i is then computed and the best parents bps_i found:

bestScore(x_i, pp_i) = max_{G_i ⊆ pp_i} localscore(x_i, G_i)  (3)
bps_i = argmax_{G_i ⊆ pp_i} localscore(x_i, G_i)  (4)
Then the best sink s and the best score for a best network in V can be found by the following equations:

bestSink(V) = argmax_{s ∈ V} [ bestScore(V \ {s}) + bestScore(s, V \ {s}) ]  (5)
bestScore(V) = max_{s ∈ V} [ bestScore(V \ {s}) + bestScore(s, V \ {s}) ]  (6)
In Fig 1, we show the subset lattice of four nodes {1, 2, 3, 4}; all the paths in the lattice denote node orderings that need to be searched to find a best network. Each arrow in the lattice also encodes a sink: if ps is the subset at the source and ph the subset at the end of a directed edge, the sink is given by ph \ ps. In the exhaustive search, there are 4! = 24 paths that need to be searched if we know the best parents for each node in the ordering (which is computed beforehand in the DP approach). The basic CausNet approach, without restricting the search space, has the following five steps:
1. Compute local scores for all (node, node set)-pairs.
2. Find best parents of each of the (node, possible parent set)-pairs using local scores.
3. Find a best sink for each of the node sets.
4. Using the results from Step 3, find a best ordering of the nodes.
5. Find a best network using results of Steps 2 and 4.
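Assuming a local_score function is given, the five steps above can be sketched end to end for a handful of nodes (brute force over parent subsets in steps 1–2; this is a toy illustration of the sink-based DP, not the optimized CausNet implementation):

```python
from itertools import combinations

def subsets(elems):
    """All subsets of elems, as frozensets, smallest first."""
    elems = sorted(elems)
    for r in range(len(elems) + 1):
        for c in combinations(elems, r):
            yield frozenset(c)

def best_bn(nodes, local_score):
    """Sink-based DP: local_score(v, parent_set) -> float, higher is better."""
    V = frozenset(nodes)
    bps, bscore = {}, {}                      # Steps 1-2: best parents within each set
    for v in V:
        for C in subsets(V - {v}):
            best_p = max(subsets(C), key=lambda P: local_score(v, P))
            bps[v, C], bscore[v, C] = best_p, local_score(v, best_p)
    net_score, sink = {frozenset(): 0.0}, {}  # Step 3: best sink of every subset
    for W in sorted(subsets(V), key=len):
        if W:
            s = max(W, key=lambda u: net_score[W - {u}] + bscore[u, W - {u}])
            sink[W] = s
            net_score[W] = net_score[W - {s}] + bscore[s, W - {s}]
    parents, W = {}, V                        # Steps 4-5: peel sinks, read off parents
    while W:
        s = sink[W]
        W = W - {s}
        parents[s] = bps[s, W]
    return parents
```

With a local score that rewards the true parent sets of a hypothetical chain 1 → 2 → 3, the DP recovers exactly that network.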
[Figure omitted. See PDF.]
The CausNet algorithm, however, doesn’t search the whole space of possible BNs but searches a much smaller subspace based on parent set identification and searches only the paths that are consistent with ‘generational orderings’ resulting from the parent set constraints. Here we reproduce the algorithms of the CausNet method. For details, the reader is referred to [21].
Algorithm 1. Find pp, Compute feasSet and feasSetData.
In Algorithm 1, we find the possible parents pp_i, and as a result also the possible offspring po_i, for each node, using the magnitude of a marginal association effect estimate or a False Discovery Rate (FDR) cutoff. This step may result in a much smaller subset of nodes, feasSet, to be considered. If the response is a survival outcome, we evaluate associations using Cox proportional hazards regression, independently for each feature. In the CausNet software, we also provide an option for a ‘phenotype driven search’ for possible parents, in which we consider only two or three levels of associations (similar to the 1-hop and 2-hop approaches in [36, 37]) starting with the outcome variable, i.e. we identify possible parents, ‘grandparents’ and ‘great-grandparents’ of the phenotype outcome.
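A minimal sketch of this screening step, with a hypothetical marginal-association p-value function pval(u, v) standing in for the regression-based associations, and a hand-rolled Benjamini-Hochberg adjustment:

```python
def bh_fdr(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q, running = [0.0] * m, 1.0
    for rank in range(m - 1, -1, -1):          # enforce monotonicity from the top
        i = order[rank]
        running = min(running, pvals[i] * m / (rank + 1))
        q[i] = running
    return q

def possible_parents(nodes, pval, fdr_cutoff=0.3):
    """pp[v]: nodes whose marginal association with v survives the FDR cutoff.
    `pval(u, v)` is a hypothetical marginal-association p-value function."""
    pp = {}
    for v in nodes:
        others = [u for u in nodes if u != v]
        q = bh_fdr([pval(u, v) for u in others])
        pp[v] = {u for u, qv in zip(others, q) if qv <= fdr_cutoff}
    return pp
```

The possible offspring po are then just the transpose of the pp relation.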
Algorithm 2. Compute local scores for feasSet nodes.
Algorithm 3. Compute best scores and best parents for feasSet nodes in all parent subsets.
Algorithm 4. Compute best sinks for feasSet subsets.
Algorithm 5. Find the best network(s).
4 CausNet-partial—Dynamic programming on the space of ‘partial generational orderings’
In CausNet-partial, we introduce the space of ‘partial generational orderings’ of nodes instead of the full generational orderings to find small and sparse optimal BNs. This space of ‘partial generational orderings’ is a much smaller space and different from the space of ‘generational orderings’ searched by CausNet. In this approach, in the algorithm for finding best sinks, we start not at the level of singletons as is done in basic CausNet [21] but at the level of subsets of cardinality p − pOrd, where pOrd is the length of the partial orderings that we want to consider. For example, if we want to search partial orderings of length 2 for our example of 4 variables, we will start at the level of subsets of cardinality 2 instead of starting at the empty set at the top of the subset lattice. This reduces the search space of orderings, and is useful when we are interested in smaller networks. CausNet-partial is a novel approach that utilizes the working of the DP algorithm and modifies it at the stage of finding best sinks. It considers smaller orderings of nodes instead of full orderings of nodes. Furthermore, these ‘partial generational orderings’ are consistent with the parent set constraints.
The bestSinks algorithm of basic CausNet (Algorithm 4) is modified to Algorithm 6 in the case of CausNet-partial. Here, we start at the level of subsets of cardinality p − pOrd. At this level, each element in a subset is considered as a sink with the rest of the elements as possible parents. The best sink is computed to be the one that gives the best score with parents among the rest of the elements. Following this, the rest of the bestSinks algorithm proceeds as in the original CausNet.
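A sketch of this modified bestSinks step, assuming the table bscore[v, U] (best score of v with parents chosen from U) has already been computed as in Algorithm 3:

```python
from itertools import combinations

def best_sinks_partial(nodes, bscore, p_ord):
    """Run the sink DP only over the bottom p_ord levels of the subset lattice.
    Returns net_score and sink tables for subsets of size >= |V| - p_ord."""
    V = frozenset(nodes)
    start = len(V) - p_ord
    net_score, sink = {}, {}
    # Initialisation: in each starting subset, the best sink is the element
    # scoring best with the remaining elements as its candidate parents.
    for c in combinations(sorted(V), start):
        W = frozenset(c)
        if W:
            s = max(W, key=lambda u: bscore[u, W - {u}])
            net_score[W], sink[W] = bscore[s, W - {s}], s
        else:
            net_score[W] = 0.0
    # Then proceed level by level exactly as in the full algorithm.
    for size in range(start + 1, len(V) + 1):
        for c in combinations(sorted(V), size):
            W = frozenset(c)
            s = max(W, key=lambda u: net_score[W - {u}] + bscore[u, W - {u}])
            sink[W] = s
            net_score[W] = net_score[W - {s}] + bscore[s, W - {s}]
    return net_score, sink
```

The ordering recovery then peels best sinks from the full set down to the starting level, yielding an ordering of length pOrd.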
The subset lattice to be explored for partial orderings of length 2 for our example of 4 variables reduces to the one in Fig 2. Observe that in this case, without parent set constraints, we are searching only 4 × 3 = 12 partial orderings, instead of the 4! = 24 full orderings as is the case with the original CausNet without parent set constraints.
[Figure omitted. See PDF.]
Now suppose that the possible parent sets for the four nodes are as follows: pp1 = {2,4}, pp2 = {1,3}, pp3 = {2}, pp4 = {1}. Factoring in these possible parent sets, we get the subset lattice as in Fig 3. Observe that the number of partial orderings of 2 nodes to be searched has further reduced to only 9. These 9 partial orderings are what we call ‘partial generational orderings’. Every node in a ‘partial generational ordering’ has a possible parent in the preceding set of nodes as per the possible parent set constraints. Observe that the missing arrows in Fig 3 compared with Fig 2 are the result of the parent set restrictions. For example, there is no arrow from the subset {2,3} to {2,3,4} because the node 4 as a sink has no parent in the set {2,3}: the only possible parent of node 4 is node 1. Observe that the total number of orderings to be searched has reduced from 24 (in CausNet) to 9 (in CausNet-partial).
[Figure omitted. See PDF.]
The parent set constraints are given by : pp1 = {2,4}, pp2 = {3,1}, pp3 = {2}, pp4 = {1}.
Now suppose the best sinks are given by the ones encoded by blue arrows as in Fig 4. Then the best BN is given by the reverse ordering (4,1) encoded by the complete path of blue arrows from top to bottom. The first blue arrow in this path encodes the best sink {1} in the set {1,2,3}, and the second blue arrow encodes the best sink {4} in the set {1,2,3,4}.
[Figure omitted. See PDF.]
The blue arrows encode the best sink for each subset in the lattice. As an example, a blue arrow that adds the node 2 to its source subset encodes the best sink {2}. The complete blue path represents the best network.
Search for sparse networks in the space of Partial Orderings reduces the run time of the best sinks algorithm substantially, and thus of the CausNet-partial method.
Algorithm 6. Compute best sinks for feasSet subsets—partial orderings.
Theorem 4.1. The total number of labeled DAGs with p nodes and partial order r is (p!/(p−r)!) · 2^{r(p−r) + r(r−1)/2}.

Proof: For p nodes, the total number of orderings is p!. With pOrd = r, the number of orderings with r nodes is p!/(p−r)!. For the node at position i (i = 1, …, r) in the partial ordering of r nodes, the number of possible parent sets is given by 2^{(p−r)+(i−1)}, since its parents can be any subset of the p − r starting nodes together with the i − 1 nodes preceding it in the ordering.

So, the total number of labeled DAGs with p nodes and partial order r is given by:

(p!/(p−r)!) · ∏_{i=1}^{r} 2^{(p−r)+(i−1)} = (p!/(p−r)!) · 2^{r(p−r) + r(r−1)/2}.

The number of distinct network structures is smaller than this count because there are many repeated structures in this combinatorial computation; e.g. the structure with all r nodes disconnected is counted once for every ordering.
Theorem 4.2. Without parent set and in-degree restrictions, CausNet-partial, the partial generational ordering based DP algorithm, explores all the network structures for p nodes.

Proof: Without parent set restrictions, every node is a possible parent of every other node. In Fig 1, let k be the cardinality of subsets in the subset lattice for p nodes, so that each row in the subset lattice is the kth level in the lattice. Adding a new element in Algorithm 4 corresponds to an edge between a subset of cardinality k−1 and one of cardinality k, which considers the added element as a sink in the subset of cardinality k. The number of edges to a subset of cardinality k from subsets of cardinality k−1 is given by k. The number of possible parent combinations for a sink in a subset of cardinality k, without in-degree restrictions, is given by 2^{k−1}. In Algorithm 3, we explore all these possible parent sets to find the best parents for each sink s in each subset at each level k. Algorithm 6 uses this information to get the best sink (possibly multiple) for each subset at level k, starting at level p − r. Thus every partial ordering of r nodes, together with every possible parent set for each of its sinks, is explored, i.e. all the network structures counted in Theorem 4.1.
5 Simulations
We compare CausNet and CausNet-partial with three state-of-the-art methods that are currently extensively used for optimal BN structure learning. One is an exact method, GOBNILP [15, 16], which is based on integer linear programming, while the other two—bnlearn's Hill Climbing (HC) [38] and Max-min Hill Climbing (MMHC) [39]—are approximate methods. While Hill-Climbing (HC) is a score-based method that uses greedy search on the space of BNs, Max-Min Hill-Climbing (MMHC) is a hybrid method that uses a constraint-based algorithm, Max-Min Parents and Children (MMPC), to find the skeleton of the optimal undirected graph, followed by a score-based search to orient the edges.
Bayesian networks were simulated as in [21] by generating N × p data matrices of continuous Gaussian data, with the dependencies between variables simulated using linear regression with varying effect sizes. For details of the simulation design, the reader is referred to [21]. Again as in [21], we use the False Discovery Rate (FDR) and Hamming Distance as the metrics to compare the methods. With FP being the number of false positives and TP the number of true positives,

FDR = FP / (FP + TP).

FDR is an especially useful metric for high dimensional data and for network analysis [40]. Reducing FDR is an important indicator of the quality of a BN given the data, where we want to identify as many true edges as possible while simultaneously reducing false positive edges.

The Hamming distance is a measure of structural similarity of BNs, and gives a measure of goodness of a predicted BN [41, 42]. With FP representing the number of false positives and FN the number of false negatives, the Hamming Distance is defined as:

Hamming Distance = FP + FN.
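Both metrics can be computed directly from the edge sets; a small sketch with hypothetical true and learned edge sets:

```python
def edge_metrics(true_edges, learned_edges):
    """Return (FDR, Hamming distance) for two sets of directed edges:
    FDR = FP / (FP + TP); Hamming distance = FP + FN."""
    tp = len(learned_edges & true_edges)
    fp = len(learned_edges - true_edges)
    fn = len(true_edges - learned_edges)
    fdr = fp / (fp + tp) if (fp + tp) else 0.0
    return fdr, fp + fn

# Hypothetical true and learned networks over 4 nodes.
true_e = {(1, 2), (2, 3), (2, 4)}
learned_e = {(1, 2), (2, 3), (3, 4)}
```

Here one learned edge is spurious and one true edge is missed, giving FDR = 1/3 and Hamming distance 2.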
In the first set of simulations, we ran the methods on simulated BNs with the number of variables below 100 and varying numbers of data samples. For CausNet and CausNet-partial, we used an FDR cutoff of 0.3 for parent set identification and an in-degree of 2 for all the experiments. In [21], we have shown how CausNet compares with the three other methods. Here we include CausNet-partial in Tables 1 and 2. We use pOrd = 3 for the search on partial orderings in CausNet-partial. In these tables, we show the average FDR for the different methods in finding both directed and undirected graphs. The results for CausNet and CausNet-partial with the BIC scoring function are given. The results show that our methods—both CausNet and CausNet-partial—outperform the other three methods for the most part.
[Figure omitted. See PDF.]
[Figure omitted. See PDF.]
For the number of variables up to 40 (Table 1), on the metric of FDR, CausNet-partial performs best for directed graphs, while GOBNILP is second best and CausNet third best. For undirected graphs, CausNet-partial and CausNet are the first and second best performers, followed by GOBNILP. For the number of variables between 50 and 100 (Table 2), our methods are the two best for both directed and undirected graphs.
Figs 5 and 6 show the comparison of CausNet-partial with the three methods on the metrics of FDR and Hamming distance. The FDR for the partial orderings based CausNet (CausNet-partial) is the best for more than 10 variables; GOBNILP is the best for 10 variables and the second best for more than 10 variables. In terms of Hamming distance, CausNet-partial again is the best for more than 20 variables; GOBNILP and MMHC have lower Hamming distance than CausNet for fewer than 20 variables, but their distances rise quickly beyond 40 variables; overall, GOBNILP is again the second best.
[Figure omitted. See PDF.]
Here CausNet-partial uses the partial orderings based search with parent set identification for dimensionality reduction, while the other algorithms do not use any parent set identification.
[Figure omitted. See PDF.]
Here CausNet-partial uses the partial orderings based search with parent set identification for dimensionality reduction, while the other algorithms do not use any parent set identification.
5.1 Number of variables up to 1000
For these simulations, we increase the number of variables up to 1000. We do not compare with the other three algorithms, as they either cannot handle such a high number of variables or take many orders of magnitude longer to process. Instead, we compare the three variants of CausNet—CausNet [21], phenotype-driven CausNet [21], and CausNet-partial. The results are shown in Figs 7 and 8. Here we observe that the partial orderings based CausNet gives the best results for both FDR and Hamming distance. Furthermore, both the FDR and the Hamming distance for CausNet-partial keep getting better as the number of variables increases. This can be explained by the fact that as we reduce the length of the orderings being searched, there is less chance of false positives. This also shows the effectiveness of using partial orderings where we expect sparse networks while the original data has a much higher number of features, a common scenario for high dimensional data.
[Figure omitted. See PDF.]
Here we compare the three versions of CausNet—basic CausNet, phenotype-driven CausNet and CausNet-partial.
[Figure omitted. See PDF.]
Here we compare the three versions of CausNet—basic CausNet, phenotype-driven CausNet and CausNet-partial.
5.2 Runtime
Search for sparse and small networks in the space of partial orderings reduces the run time of the best sinks algorithm substantially, and thus of the CausNet-partial method. To show the runtime gains expected from the theory in Sect 4, we compare the runtimes of the five algorithms. During the simulations, it was observed that GOBNILP mostly terminates without producing any output BN for p > 100. For this reason, we split the comparison into p ≤ 100 and p > 100. The average runtimes are shown in Table 3. CausNet-partial and CausNet have the best runtimes for p ≤ 100. For p > 100, CausNet-partial and CausNet are faster than GOBNILP and MMHC. Although HC is faster for more than 100 variables, we have shown in the sections above that its accuracy is poorer than that of CausNet-partial and CausNet. Between the two, CausNet-partial gives substantially reduced runtime, further validating its utility for finding sparse small optimal BNs.
[Figure omitted. See PDF.]
6 Application to benchmark Bayesian network—ALARM
Having shown the superior performance of CausNet and CausNet-partial, we now show the efficacy of our method in finding sparse BNs based on partial orderings. To this end, we apply our method to a benchmark Bayesian network, ALARM [43]. ALARM (“A Logical Alarm Reduction Mechanism”) is a Bayesian network designed to provide an alarm message system for patient monitoring. The data set contains 37 variables, all discrete, with the number of categories between 2 and 4. The dataset was downloaded from the bnlearn website (https://www.bnlearn.com/bnrepository/). We used bootstrap samples of size 1000, taken from the alarm data set, which has a total sample size of 20,000.
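The bootstrap step itself is straightforward; a minimal sketch, with the 20,000-row data set stood in for by a plain list of rows:

```python
import random

def bootstrap_sample(rows, size=1000, seed=0):
    """Draw one bootstrap sample of `size` rows, with replacement."""
    rng = random.Random(seed)
    return [rng.choice(rows) for _ in range(size)]
```

Each such sample feeds one structure-learning run, so repeated samples give a view of the stability of the discovered edges.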
We first apply the original CausNet with the discrete option for the BIC score. Then we apply CausNet-partial, varying the partial order from 5 to 2. Fig 9 shows the BN discovered by CausNet, and Fig 10 shows the BNs discovered by CausNet-partial with different partial orders. CausNet discovers a BN with 15 nodes when using an in-degree of 2 and an FDR cut-off of 0.3 for possible parent set identification. With the same in-degree and FDR cut-off, we then apply CausNet-partial to obtain sparser BNs based on partial generational orderings. The results are shown in Table 4. As we can see from the figures and the table, we get smaller and smaller networks as we reduce pOrd. The FDR for CausNet-partial is a bit higher than that of CausNet, but it keeps improving as pOrd is reduced; the initial rise in FDR is due to the substantial reduction in the number of nodes from CausNet to CausNet-partial. The FDR for undirected networks is consistently 0 for CausNet-partial. The nodes and the directed edges in the partial order 5 network are subsets of the nodes and the directed edges in the CausNet network, as expected. The lower partial order networks maintain the same consistency of the directed edges. The gain in speed of computation is illustrated well by the runtime decreasing by many orders of magnitude going from CausNet to CausNet-partial.
[Figure omitted. See PDF.]
[Figure omitted. See PDF.]
[Figure omitted. See PDF.]
7 Application to clinical trial data
To demonstrate the effectiveness of CausNet-partial in the case of a survival outcome and continuous data, we applied our method to recently published data on ovarian cancer. This dataset [44] includes gene expression data for 513 genes collected from participants in multiple clinical trials. The study aimed to develop a prognostic signature based on gene expression for overall survival (OS) in patients with high-grade serous ovarian cancer (HGSOC). Expression of 513 genes, selected from a meta-analysis of 1455 tumors, was measured using NanoString technology from formalin-fixed paraffin-embedded tumor tissue collected from 3769 women with HGSOC. Results of this study showed that expression levels of 276 genes were associated with OS (false discovery rate < 0.05) in covariate-adjusted single-gene analyses.
We first reduce the dimensionality of the data by finding parent sets using FDR and correlation cut-offs. For this, we used a three-level phenotype driven search. For the disease node ‘Status’, dead or alive at censoring or end of follow-up, we carried out Cox proportional hazards regression for each gene separately, adjusted for age, stage, and stratified site. Then we computed analysis of variance (ANOVA) tables for the fitted models and obtained a list of p-values. FDR was then computed using the Benjamini & Hochberg (BH) method. We chose the 5 genes, ZFHX4, TIMP3, COL5A2, FBN1, and COL3A1, with the most significant p-values as possible parents of the disease node. For the rest of the nodes, we use correlation cut-offs to identify possible parent sets. Using an in-degree of 2 and tuning the correlation cut-off parameter, we arrive at a ‘feasSet’ of 16 genes with non-null parent sets. We then apply first CausNet and then CausNet-partial, varying the partial order from 5 to 3. The resulting optimal BNs are shown in Fig 11. On a personal computer with a 2.3 GHz Intel Core i9 processor with 16 GB RAM, the processing for each network took less than 5 minutes. As expected, there is substantial overlap of the genes and edges in the BNs discovered by CausNet and CausNet-partial. But there are also differences, with some different genes and some different edges in the networks. Some differences are explained by the fact that there are multiple best BNs for each of the cases, and we pick the first one given by the system. These differences can be further analyzed over multiple best BNs for statistical uncertainty quantification.
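The correlation-based screening for the non-outcome nodes can be sketched as follows (the function name, cutoff and tie-breaking are our own illustrative choices, not the CausNet implementation):

```python
import numpy as np

def correlation_parent_sets(X, names, cutoff=0.5, indegree=2):
    """Possible parents of each variable: up to `indegree` other variables
    with absolute Pearson correlation at or above `cutoff`."""
    C = np.corrcoef(X, rowvar=False)
    pp = {}
    for j, name in enumerate(names):
        r = np.abs(C[:, j])
        r[j] = 0.0                              # a variable is not its own parent
        cand = [i for i in np.argsort(-r) if r[i] >= cutoff][:indegree]
        pp[name] = {names[i] for i in cand}
    feas_set = {v for v, s in pp.items() if s}  # nodes with non-null parent sets
    return pp, feas_set
```

Tightening the cutoff shrinks the feasSet, which is the tuning knob mentioned above.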
[Figure omitted. See PDF.]
In a recent work [45] on finding an ovarian cancer signature, Ye et al. found that the tumors highly express a panel of genes, namely POSTN, LUM, THBS2, COL3A1, COL5A1, COL5A2, FAP1 and FBN1. All eight genes are in the BN found by CausNet, while the smaller partial-order BNs contain a subset of these genes, as expected. Observe that FAP and FAP1 in [45] are the same gene (while the abstract mentions FAP, Table 1 mentions it as FAP1). The role of the pool of genes in the BNs learned by CausNet and CausNet-partial is further supported by recent research in [46] (FBN1) and [47] (COL5A1). This further shows the efficacy of CausNet and CausNet-partial in learning small and sparse networks leading to disease outcome. The networks produced by CausNet-partial can be used for further experimental design and validation.
8 Discussion
In this work, we implemented an extension to our earlier method CausNet. This extension, CausNet-partial, is a novel, simple and highly efficient method to find optimal small and sparse BNs using dynamic programming with parent set identification and constraints. Our main novel contribution is the revision of the SM algorithm [13] to identify possible parent sets, to reduce the search space with parent set constraints, and to use a ‘partial generational orderings’ based search. This search method is a more efficient and cost-effective way to explore the search space for small and sparse BNs than the original approach based on lexicographical ordering. This enables our method to be applied to large dimensional data effectively, while the SM algorithm cannot be applied to more than 30 variables [13].
In simulations, our algorithm outperforms the three state-of-the-art algorithms that are widely used currently for optimal BN discovery. We further show its application to a benchmark discrete Bayesian network ALARM, a Bayesian network designed to provide an alarm message system for patient monitoring. We first apply the original CausNet and then CausNet-partial, varying the partial order from 5 to 2. CausNet-partial discovers small sparse networks with drastically reduced runtime as expected from theory developed as part of this work.
We also showed a use-case application of CausNet-partial to recently published data involving gene expression of 513 genes and ovarian cancer prognosis with a survival outcome. The BNs learned by CausNet and CausNet-partial included most of the genes that have been shown to be associated with ovarian cancer as reported in [45–47]. While the BN learned by CausNet included all the genes reported in these papers, the smaller BNs learned by CausNet-partial had a subset of these genes, as expected. As most existing methods, including the ones in our study, do not provide support for survival outcomes in BN structure learning, our method fills an important gap by providing a method that works for large data with a survival outcome.
In conclusion, our partial generational orderings based search for small optimal BNs, CausNet-partial, is an efficient and highly scalable approach for finding optimal sparse and small Bayesian Networks and can be applied to high dimensional data. Using specifiable parameters—correlation, FDR cutoffs, in-degree, and partial order—one can increase or decrease the number of nodes and density of the networks, as required by the application domain.
References
1. Friedman N, Linial M, Nachman I, Pe’er D. Using Bayesian networks to analyze expression data. In: Proceedings of the fourth annual international conference on Computational molecular biology. ACM. 2000. p. 127–135. https://doi.org/10.1145/332306.332355
2. Bielza C, Larrañaga P. Bayesian networks in neuroscience: a survey. Front Comput Neurosci. 2014;8:131. pmid:25360109
3. Agrahari R, Foroushani A, Docking TR, Chang L, Duns G, Hudoba M, et al. Applications of Bayesian network models in predicting types of hematological malignancies. Sci Rep. 2018;8(1):6951. pmid:29725024
4. Su C, Andrew A, Karagas MR, Borsuk ME. Using Bayesian networks to discover relations between genes, environment, and disease. BioData Min. 2013;6(1):6. pmid:23514120
5. Cheng J, Greiner R, Kelly J, Bell D, Liu W. Learning Bayesian networks from data: An information-theory based approach. Artif Intell. 2002;137(1–2):43–90.
6. Collins R, Fenton N. Bayesian network modelling for early diagnosis and prediction of Endometriosis. Cold Spring Harbor Laboratory. 2020. https://doi.org/10.1101/2020.11.04.20225946
7. Chickering DM, Heckerman D, Meek C. Large-sample learning of Bayesian networks is NP-hard. J Mach Learn Res. 2004;5:1287–330.
8. Cooper GF. The computational complexity of probabilistic inference using Bayesian belief networks. Artif Intell. 1990;42(2–3):393–405.
9. Robinson R. Counting labeled acyclic digraphs. In: Harary F, editor. New directions in the theory of graphs. New York: Academic Press. 1973. p. 239–273.
10. Kitson N, Constantinou A, Guo Z. A survey of Bayesian network structure learning. Artif Intell Rev. 2023;56:8721–814.
11. Koivisto M, Sood K. Exact Bayesian structure discovery in Bayesian networks. J Mach Learn Res. 2004;5:549–73.
12. Koivisto M. Advances in exact Bayesian structure discovery in Bayesian networks. In: Proceedings of the twenty-second conference on uncertainty in artificial intelligence. AUAI Press. 2006. p. 241–248.
13. Silander T, Myllymäki P. A simple approach for finding the globally optimal Bayesian network structure. In: Proceedings of the twenty-second conference on uncertainty in artificial intelligence. AUAI Press. 2006. p. 445–452.
14. Singh A, Moore A. Finding optimal Bayesian networks by dynamic programming. 2018. https://doi.org/10.1184/R1/6605669.v1
15. Cussens J. Bayesian network learning with cutting planes. In: Proceedings of the twenty-seventh conference on uncertainty in artificial intelligence. AUAI Press. 2011. p. 153–160.
16. Bartlett M, Cussens J. Integer linear programming for the Bayesian network structure learning problem. Artif Intell. 2017;244:258–71.
17. Yuan C, Malone B. Learning optimal Bayesian networks: a shortest path perspective. J Artif Int Res. 2013;48(1):23–65.
18. Yuan C, Malone B, Wu X. Learning optimal Bayesian networks using A* search. In: Proceedings of the twenty-second international joint conference on artificial intelligence. AAAI Press. 2011. p. 2186–2191.
19. de Campos CP, Ji Q. Efficient structure learning of Bayesian networks using constraints. J Mach Learn Res. 2011;12(20):663–89.
20. Fan X, Malone B, Yuan C. Finding optimal Bayesian network structures with constraints learned from data. In: Proceedings of the thirtieth conference on uncertainty in artificial intelligence. AUAI Press. 2014. p. 200–209.
21. Sharma N, Millstein J. CausNet: generational orderings based search for optimal Bayesian networks via dynamic programming with parent set constraints. BMC Bioinform. 2023;24(1):46. pmid:36788490
22. Scanagatta M, Salmerón A, Stella F. A survey on Bayesian network structure learning from data. Prog Artif Intell. 2019;8(4):425–39.
23. Lucas PJF, van der Gaag LC, Abu-Hanna A. Bayesian networks in biomedicine and health-care. Artif Intell Med. 2004;30(3):201–14. pmid:15081072
24. Sierra B, Larrañaga P. Predicting survival in malignant skin melanoma using Bayesian networks automatically induced by genetic algorithms. An empirical comparison between different approaches. Artif Intell Med. 1998;14(1–2):215–30. pmid:9779891
25. Marshall A, McClean S, Shapcott M, Millard P. Learning dynamic Bayesian belief networks using conditional phase-type distributions. Vol. 1910. 2000. p. 516–523.
26. Stajduhar I, Dalbelo-Basić B. Learning Bayesian networks from survival data using weighting censored instances. J Biomed Inform. 2010;43(4):613–22. pmid:20332035
27. Darwiche A. Modeling and reasoning with Bayesian networks. Cambridge: Cambridge University Press; 2009.
28. Korb KB, Nicholson AE. Bayesian artificial intelligence. Boca Raton, FL, USA: Chapman & Hall/CRC. 2004.
29. Schwarz G. Estimating the dimension of a model. Ann Statist. 1978;6(2).
30. Neath AA, Cavanaugh JE. The Bayesian information criterion: background, derivation, and applications. WIREs Comput Stat. 2011;4(2):199–203.
31. Liu Z, Malone B, Yuan C. Empirical evaluation of scoring functions for Bayesian network model selection. BMC Bioinform. 2012;13(Suppl 15):S14. pmid:23046392
32. Carvalho A, Avancados T. Scoring functions for learning Bayesian networks. 2009. https://api.semanticscholar.org/CorpusID:14829394
33. Heckerman D, Geiger D. Learning Bayesian networks: a unification for discrete and Gaussian domains. In: Proceedings of the eleventh conference on uncertainty in artificial intelligence. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc. 1995. p. 274–284.
34. Geiger D, Heckerman D. Learning Gaussian networks. In: Proceedings of the tenth international conference on uncertainty in artificial intelligence. Morgan Kaufmann Publishers Inc. 1994. p. 235–243.
35. Kuipers J, Moffa G, Heckerman D. Addendum on the scoring of Gaussian directed acyclic graphical models. Ann Statist. 2014;42(4):1689–1691.
36. Mosca E, Bersanelli M, Gnocchi M, Moscatelli M, Castellani G, Milanesi L, et al. Network diffusion-based prioritization of autism risk genes identifies significantly connected gene modules. Front Genet. 2017;8:129. pmid:28993790
37. Bersanelli M, Mosca E, Remondini D, Castellani G, Milanesi L. Network diffusion-based analysis of high-throughput data for the detection of differentially enriched modules. Sci Rep. 2016;6:34841. pmid:27731320
38. Margaritis D. Learning Bayesian network model structure from data. PhD Thesis. Pittsburgh: Carnegie Mellon University, School of Computer Science. 2003.
39. Tsamardinos I, Brown LE, Aliferis CF. The max-min hill-climbing Bayesian network structure learning algorithm. Mach Learn. 2006;65(1):31–78.
40. Bhattacharjee M, Dhar SK, Subramanian S. Recent advances in biostatistics: false discovery rates, survival analysis, and related topics. World Scientific; 2011. https://www.worldscientific.com/doi/abs/10.1142/8010
41. Butts CT, Carley KM. Some simple algorithms for structural comparison. Comput Math Organiz Theor. 2005;11(4):291–305.
42. Hamming RW. Error detecting and error correcting codes. Bell Syst Tech J. 1950;29(2):147–60.
43. Beinlich I, Suermondt H, Chavez R, Cooper G. The alarm monitoring system: A case study with two probabilistic inference techniques for belief networks. In: Hunter J, Cookson J, Wyatt J, editors. AIME 89. Berlin: Springer; 1989. p. 247–256.
44. Millstein J, Budden T, Goode EL, Anglesio MS, Talhouk A, Intermaggio MP, et al. Prognostic gene expression signature for high-grade serous ovarian cancer. Ann Oncol. 2020;31(9):1240–50. pmid:32473302
45. Yue H, Wang J, Chen R, Hou X, Li J, Lu X. Gene signature characteristic of elevated stromal infiltration and activation is associated with increased risk of hematogenous and lymphatic metastasis in serous ovarian cancer. BMC Cancer. 2019;19(1):1266. pmid:31888563
46. Wang Z, Chen W, Zuo L, Xu M, Wu Y, Huang J, et al. The Fibrillin-1/VEGFR2/STAT2 signaling axis promotes chemoresistance via modulating glycolysis and angiogenesis in ovarian cancer organoids and cells. Cancer Commun (Lond). 2022;42(3):245–65. pmid:35234370
47. Zhang J, Zhang J, Wang F, Xu X, Li X, Guan W, et al. Overexpressed COL5A1 is correlated with tumor progression, paclitaxel resistance, and tumor-infiltrating immune cells in ovarian cancer. J Cell Physiol. 2021;236(10):6907–19. pmid:33655494
Citation: Sharma N, Millstein J (2025) CausNet-partial: ‘Partial Generational Orderings’ based search for optimal sparse Bayesian networks via dynamic programming with parent set constraints. PLoS One 20(6): e0324622. https://doi.org/10.1371/journal.pone.0324622
About the Authors:
Nand Sharma
Contributed equally to this work with: Nand Sharma, Joshua Millstein
Roles: Conceptualization, Formal analysis, Methodology, Software, Writing – original draft, Writing – review & editing
E-mail: [email protected]
Affiliation: Division of Biostatistics, Department of Population and Public Health Sciences, University of Southern California, Los Angeles, California, United States of America
ORCID: https://orcid.org/0009-0002-3274-2247
Joshua Millstein
Contributed equally to this work with: Nand Sharma, Joshua Millstein
Roles: Conceptualization, Methodology, Writing – review & editing
Affiliation: Division of Biostatistics, Department of Population and Public Health Sciences, University of Southern California, Los Angeles, California, United States of America
ORCID: https://orcid.org/0000-0001-7961-8943
© 2025 Sharma, Millstein. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.