1. Introduction
Current research in the fields of neuroscience [1,2,3,4], biology [5], social science [6], and physics [7] confirms that considering higher-order relationships provides highly valuable insights into the structure and dynamics of complex systems. At the same time, the predominant practice of constructing association networks in many fields is still based on pairwise correlations, which are known to miss most or all higher-order relationships [8].
In particular, synergy is a measure used to quantify the higher-order relationships within a system. In a synergistic system, variables tend to be pairwise independent, while the system follows global constraints [4]. An example of a maximally synergistic system is the XOR gate, where X1 and X2 are i.i.d. binary variables and Y = X1 XOR X2. This is maximally synergistic, as X1 and X2 together fully determine Y, while still being pairwise independent of each other and of Y [8]. Therefore, information about any single variable in the system can only be accessed by looking at the system as a whole. As variables in synergistic systems tend to be more independent on a pairwise level, synergistic higher-order relationships are exactly the type that are elusive in a pairwise analysis. A targeted search for synergy is therefore necessary to uncover these higher-order relationships.
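This pairwise elusiveness is easy to verify computationally. The following minimal Python sketch (our illustration; names and structure are not from any referenced implementation) checks both properties of the XOR gate by exhaustive enumeration:

```python
# Sketch: the XOR triplet is pairwise independent, yet X1 and X2 jointly
# determine Y. All probabilities are exact, since the four equally likely
# (x1, x2) outcomes are enumerated exhaustively.
from collections import Counter
from itertools import product
from math import log2

def entropy(counts):
    """Shannon entropy (bits) of a distribution given as outcome counts."""
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values() if c)

# Joint distribution of (X1, X2, Y) with Y = X1 XOR X2, uniform over inputs.
triples = [(x1, x2, x1 ^ x2) for x1, x2 in product((0, 1), repeat=2)]

def mutual_info(i, j):
    """Mutual information (bits) between coordinates i and j of the triplet."""
    h_i = entropy(Counter(t[i] for t in triples))
    h_j = entropy(Counter(t[j] for t in triples))
    h_ij = entropy(Counter((t[i], t[j]) for t in triples))
    return h_i + h_j - h_ij

# Every pair is independent: all pairwise mutual informations are 0 bits.
print(mutual_info(0, 2), mutual_info(1, 2), mutual_info(0, 1))
# Jointly, (X1, X2) fully determine Y: I((X1, X2); Y) = H(Y) = 1 bit.
h_pair = entropy(Counter((t[0], t[1]) for t in triples))
h_y = entropy(Counter(t[2] for t in triples))
h_all = entropy(Counter(triples))
print(h_pair + h_y - h_all)
```

No pairwise analysis sees any of the 1 bit of information that the pair (X1, X2) carries about Y.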
On the opposite side of synergy we have redundancy. In a redundant system, the variables are highly correlated with each other, which can be pairwise, but also at a higher level between sets of variables. An example of a system containing redundancy would be a system where variables X1 and X2 are both fully determined by a third variable, X3. In this case, both X1 and X2 would inherently carry information about each other through X3. This introduces redundancy, as each variable contains information about the remaining variables of the system.
There is currently no agreed-upon method of calculating synergy [9]. One popular framework is the Partial Information Decomposition (PID) method, introduced in 2010 by Williams and Beer [10], which postulates that information can be decomposed into unique, redundant, and synergistic information atoms [11]. However, no measure that satisfies these axioms, or similar sets of axioms, has been found [12], and the search continues [13]. Even if one is found, the calculations needed for the full decomposition may in general increase super-exponentially with the size of the system [12]. Other approaches exist but have their own drawbacks. For instance, Quax et al. [8] propose a method to directly quantify synergy by using intermediate so-called synergistic random variables. However, the method has an extremely high computational complexity. Furthermore, it remains to be seen whether its concept of orthogonal decomposition actually works well for variables with a small number of states, such as binary variables. Other approaches include localized formulations [14,15] and formulations based on information geometry [16,17].
In the absence of an agreed-upon synergy measure, an easy-to-compute heuristic that is gaining popularity is the O-information measure, introduced in 2019 by Rosas et al. [9]. It quantifies the balance between synergy and redundancy in a system of variables; while the O-information loses the ability to quantify synergy directly, and may miss synergistic associations altogether if outweighed by redundant effects, it can nevertheless determine whether a system is “synergy-dominated” or “redundancy-dominated”. Moreover, it is calculated through a linear combination of Shannon entropy measures, making its interpretation intuitive and its calculation computationally efficient. Its computation also scales very well with the system size, easily allowing higher-order interactions of more than three variables to be analyzed [9]. These attributes motivate the use of the O-information to quantify the “synergy-dominance” within systems of variables.
Current research on the O-information highlights its potential to identify synergy- and redundancy-dominated variable groups, demonstrating its broad applicability. For example, the authors of [9,12] present proof of concept through musical score analysis. It has also been used in the neuroscience domain, as seen in [3,4]. Dynamic O-information, explored by [3], measures system evolution via synergistic interactions. Meanwhile, the authors of [18] analyze higher-order fMRI data, while [19,20] extend the O-information to the O-information rate (OIR) for studying dynamical interactions in the frequency domain, which is further applied to post-stroke brain networks [21]. The authors of [22] investigate O-information gradients in an Ising spin model and macro-economic data, while [12] introduce the local O-information for granular synergy-redundancy analysis.
However, a caveat that has so far gone unnoticed is the statistical bias that occurs when estimating the O-information, which is introduced through the estimation of its Shannon entropy term components. As shown in this paper, these biases are exacerbated by the estimation of the joint entropy, and are especially strong when the sample size is small relative to the number of bins that the variables are discretized into. This can make the interpretation of the O-information misleading, as the biases cause higher-order relationships to appear more redundancy- or synergy-dominated than they are in reality. Although one obvious way to avoid this particular type of bias is to directly compute the O-information from continuous data, for which methods indeed exist [23], this is not always possible: some or all of the variables in the data may already have been discretized, or may be discrete by nature. No mixed method for estimating the O-information exists, so if some of the data are discretized, it seems unavoidable to discretize all variables, since the continuous method will have entirely different biases as well as a potentially different order of magnitude.
If the O-information is to become a widely used measure to identify synergistic higher-order relationships, it is crucial to be aware of the bias in its naive estimation. With this paper, we hope to provide an intuition for the direction and magnitude of the bias in the naive O-information estimation, and for how this bias behaves across different sample sizes N, numbers of discretization bins K, and various joint distributions for systems of variables. Moreover, an O-information bias approximation based on the Miller–Maddow entropy estimation is derived. We explore the cases in which this bias approximation does (and does not) improve the O-information estimation for systems of n = 3 variables, and aim to inspire future work into the bias correction of the O-information. For simplicity, our analysis focuses on these variable “triplets”, but all analyses can be extended to larger systems.
2. Theoretical Background
2.1. The O-Information
The O-information, denoted by Ω(X^n), quantifies the balance between redundancy and synergy within a system X^n = (X_1, …, X_n) of n random variables, and is calculated through a linear combination of Shannon’s entropy measures [9]:
Ω(X^n) = (n − 2) H(X^n) + Σ_{i=1}^{n} [ H(X_i) − H(X^n_{−i}) ]   (1)
where H(X_i) is the entropy of variable X_i, H(X^n) is the joint entropy of the system X^n, and H(X^n_{−i}) is the joint entropy of the system without variable X_i. If the O-information Ω(X^n) > 0, this implies the system is redundancy dominated; meanwhile, if Ω(X^n) < 0, this implies synergy domination. If the O-information is zero, the variables may be fully independent, or the synergy and redundancy in the system cancel each other out perfectly [9].
2.2. Properties of the O-Information
The O-information is a symmetric measure [9], and does not require the specification of a target variable. It is used on cross-sectional data, but extensions of the O-information to time-series data have been explored in the previous literature [22].
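For intuition, a naive plug-in estimator of (1) can be sketched in a few lines of Python. This is a minimal illustration under our own naming; it simply replaces each entropy term with its empirical (plug-in) estimate:

```python
# Sketch: a naive (plug-in) O-information estimator for a system of n
# discrete variables, following the entropy formulation of Rosas et al.
# Function and variable names are illustrative, not from the paper.
import numpy as np
from collections import Counter

def plugin_entropy(rows):
    """Plug-in Shannon entropy (bits) of the empirical distribution of rows."""
    counts = np.array(list(Counter(map(tuple, rows)).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def o_information(data):
    """Naive O-information (bits) of an (N, n) array of discrete samples."""
    n = data.shape[1]
    total = (n - 2) * plugin_entropy(data)          # (n - 2) H(X^n)
    for i in range(n):
        h_i = plugin_entropy(data[:, [i]])          # H(X_i)
        h_rest = plugin_entropy(np.delete(data, i, axis=1))  # H(X^n_{-i})
        total += h_i - h_rest
    return total

# Redundant triplet (three copies of one K = 4 variable): O-info = log2(4) = 2.
rng = np.random.default_rng(0)
x = rng.integers(0, 4, size=10_000)
redundant = np.stack([x, x, x], axis=1)
print(o_information(redundant))  # close to 2 bits
```

With a synergistic construction (e.g., a modular sum of two independent inputs), the same function returns values close to −log2(K), matching the sign convention above.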
Tight bounds on the O-information are provided in [9]. Let K_i be the cardinality of the discrete random variable X_i. Then, let K_max be the largest cardinality among the variables in system X^n. The upper and lower bounds on the O-information are then:
−(n − 2) log K_max ≤ Ω(X^n) ≤ log K_max   (2)
where the log is always calculated with base 2 unless specified otherwise. The authors of [9] further provide proof that when all variables are discretized into the same number of bins K, the maximum O-information Ω(X^n) = log K can be achieved by a system if—and only if—the outcome of each random variable can be inferred from any other variable in the system. Then, the system is maximally redundant. The minimum O-information Ω(X^n) = −(n − 2) log K can be achieved if—and only if—X_1, …, X_{n−1} are independent and uniformly distributed, and X_n is fully determined by them such that any n − 1 of the variables remain independent, as in the XOR gate [9]. Then, the system is maximally synergistic.
2.3. O-Information Estimation
When the underlying marginal and joint probability distributions are not known, as in most empirical datasets, the O-information must be estimated from the available data. Let the estimation of the O-information and its entropy terms be denoted as:
Ω̂(X^n) = (n − 2) Ĥ(X^n) + Σ_{i=1}^{n} [ Ĥ(X_i) − Ĥ(X^n_{−i}) ]   (3)
It is well known that a limited sample size results in a downward bias in the estimation of the entropy terms Ĥ(X_i), Ĥ(X^n), and Ĥ(X^n_{−i}). This can be proven using Jensen’s Inequality, but can also be explained intuitively: with a finite sample size, we observe rare events less often, or maybe not at all. With the naive entropy estimation, we then assign these events a smaller probability than they actually have. Therefore, the probability distribution seems less spread out, and by definition this results in a smaller entropy estimation. It is then clear to see that these biases may subsequently introduce a bias into the estimation of the O-information (3), such that
E[Ω̂(X^n)] = Ω(X^n) + B(Ω̂)   (4)
where B(Ω̂) is the bias in the O-information estimation. The balance of the biases introduced by each of the entropy terms in (3) determines whether the O-information is under- or overestimated. The marginal and joint distributions of the variables in the system will ultimately determine the sign of the O-information bias. If the bias is not accounted for, a sample size that is too small may cause a system of variables to appear more redundancy- or synergy-dominated than it is in reality.
3. Previous Literature
3.1. Entropy Bias Approximation Techniques
The bias in the naive entropy estimation has been explored extensively in the previous literature. The most prominent methods include the Miller–Maddow bias approximation [24], the Jackknife entropy estimation [25], the Grassberger estimation [26], and several Bayesian methods initiated by Wolpert and Wolf [27]. In this paper, we extend the Miller–Maddow method from the entropy to the O-information due to its low computational cost in comparison to the Jackknife method, its simplicity in comparison to the Grassberger method, and its accuracy in comparison to the Bayesian methods.
The Miller–Maddow Entropy Bias Approximation
The Miller–Maddow method [24] approximates the expected value of the naive entropy estimation by making a Taylor approximation of the logarithm around each bin’s respective probability p_i. This is repeated for all K possible outcomes of the random variable, and the expected value over all Taylor approximations per bin i is taken. See Appendix A for the full derivation.
A requirement of the Miller–Maddow method is that N is in the asymptotic sampling regime, which means that the sample size is large enough such that each outcome of the random variable is observed many times [28].
When taking the Taylor approximation to the second order, the resulting Miller–Maddow entropy estimator is
Ĥ_MM(X) = Ĥ(X) + (K − 1) / (2N)   (5)
Let the Miller–Maddow bias approximation term of the entropy be denoted by B̂_H, such that Ĥ_MM(X) = Ĥ(X) + B̂_H, with
B̂_H = (K − 1) / (2N)   (6)
This entropy bias approximation (6) only depends on the number of bins, K, and the number of observations, N. We see that as N → ∞, the bias approximation B̂_H → 0. This makes sense intuitively, as the naive plug-in estimator Ĥ(X) → H(X) as N → ∞. Moreover, as K → ∞, the bias approximation B̂_H → ∞, as increasing the number of bins decreases the mean number of observations per bin.
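The following short simulation (a sketch with illustrative parameter choices) makes the downward bias and the effect of the correction term visible. Note that (K − 1)/(2N) is derived for natural logarithms and must be divided by ln 2 when entropies are measured in bits:

```python
# Sketch: downward bias of the plug-in entropy of a uniform K-state
# variable at small N, and the Miller-Maddow correction (K - 1)/(2N),
# averaged over many repeated samples. Parameter values are illustrative.
import numpy as np

def plugin_entropy(samples, k):
    """Plug-in Shannon entropy (bits) of a sample of integers in 0..k-1."""
    counts = np.bincount(samples, minlength=k).astype(float)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(1)
K, N, trials = 8, 50, 2000
true_h = np.log2(K)  # uniform 8-state variable: 3 bits
naive = np.array([plugin_entropy(rng.integers(0, K, N), K) for _ in range(trials)])
# Miller-Maddow term, converted from nats to bits:
mm = naive + (K - 1) / (2 * N) / np.log(2)

print(true_h, naive.mean(), mm.mean())  # naive mean sits clearly below 3 bits
```

On average, the naive estimate falls short of the true 3 bits by roughly (K − 1)/(2N ln 2) ≈ 0.10 bits, and the corrected mean lands much closer to the truth.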
3.2. Mutual Information Bias Approximation Techniques
Beyond the bias approximations for entropy estimations, several bias approximation methods have been explored for the estimation of other information theoretic measures as well, such as the mutual information.
In this paper, however, we derive the O-information bias approximation from the Miller–Maddow entropy bias approximation. As the mutual information is a composite measure of various entropy terms, each with their own biases, it would be more difficult to attribute the behavior of the O-information estimation bias to its individual mutual information components. The entropy formulation is preferred over a mutual information formulation because it makes it clearer to separate the bias of each entropy term in the O-information estimation (3). Especially as we investigate the O-information estimation bias for various joint distributions, each with distinct behaviors of H(X^n) and H(X^n_{−i}), it is important to keep the sources of the bias in each of these terms as clear as possible. Moreover, it is more convenient to generalize the bias of the joint entropy of a system with n = 3 variables to systems of any other number of variables. This makes the entropy formulation of the O-information estimation more compatible with extending this analysis to various system sizes in future work.
3.3. Permutation Tests
Further, we do not use permutation tests to quantify the bias of O-information. For a single triplet, the permutation test involves the permutation of data for each variable, forcing independence between the variables, and then calculating the new O-information from the resulting joint probability distributions. This process is repeated many times per triplet, and finally the original O-information estimation is compared to the distribution of the O-information estimations of the permuted data. As the permutation destroys any relationships between the three variables, the mean O-information after permutation should, in theory, be zero. If the mean is not zero, the deviation from zero represents the bias in the O-information estimation that persists. However, since this process is repeated for each triplet individually, estimating the bias using permutation tests is computationally very expensive. This is particularly true when trying to identify synergistic triplets in a larger dataset. Therefore, we develop a bias approximation in this paper that can be reasonably applied to the entire dataset, making the process much more convenient for exploratory analyses.
4. Methods
4.1. Derivation of the Miller–Maddow O-Information Bias Approximation
The Miller–Maddow entropy estimation (5) applies to the entropy of a single random variable. In order to approximate the bias in the O-information estimation, we must extend the entropy estimations to the joint entropy terms Ĥ(X^n) and Ĥ(X^n_{−i}).
The Taylor expansion process is the same for the joint entropy estimation Ĥ(X^n) as for the marginal entropy estimation Ĥ(X_i); we now simply consider the probabilities of possible bin combinations, rather than the probabilities of individual variables’ bins. The estimated joint entropy therefore becomes
Ĥ_MM(X^n) = Ĥ(X^n) + (K_{X^n} − 1) / (2N)   (7)
where K_{X^n} is the number of possible bin combinations the n variables in the system can have. We see that this yields the following joint entropy bias approximation term:
B̂_{H(X^n)} = (K_{X^n} − 1) / (2N)   (8)
Similarly, we can derive that the estimation bias of the joint entropy of the system X^n_{−i} can be approximated by:
B̂_{H(X^n_{−i})} = (K_{X^n_{−i}} − 1) / (2N)   (9)
where K_{X^n_{−i}} is the number of possible bin combinations of all variables in the system except for variable X_i. As the estimation Ω̂(X^n) is a random variable, we can take the expected value of (3):
E[Ω̂(X^n)] = (n − 2) E[Ĥ(X^n)] + Σ_{i=1}^{n} ( E[Ĥ(X_i)] − E[Ĥ(X^n_{−i})] )   (10)
Substituting the bias approximations (6), (8), and (9) and simplifying we obtain the final O-information estimation derived from the Miller–Maddow method:
Ω̂_MM(X^n) = Ω̂(X^n) + (n − 2)(K_{X^n} − 1)/(2N) + Σ_{i=1}^{n} [ (K_i − 1)/(2N) − (K_{X^n_{−i}} − 1)/(2N) ]   (11)
Thus, we find that by extending the Miller–Maddow entropy estimation to the joint and conditional entropy, the O-information bias can be approximated and corrected with the term B̂_Ω:
B̂_Ω = [ (n − 2)(K_{X^n} − 1) + Σ_{i=1}^{n} (K_i − 1) − Σ_{i=1}^{n} (K_{X^n_{−i}} − 1) ] / (2N)   (12)
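As a sketch of how such a correction term can be applied in practice, the following Python function computes it with the numbers of bin combinations estimated by counting the combinations that actually occur in the sample (one simple choice; names are ours):

```python
# Sketch: a Miller-Maddow-style O-information correction term, with the
# joint-bin counts estimated from the occupied bin combinations.
# The result is added to the naive estimate; the division by ln(2)
# converts the nat-based term to bits.
import numpy as np
from math import log

def occupied(rows):
    """Number of distinct bin combinations observed in the data."""
    return len({tuple(r) for r in rows})

def mm_bias(data):
    """Approximate O-information correction (bits) from observed bin counts."""
    N, n = data.shape
    k_joint = occupied(data)                        # estimate of K_{X^n}
    term = (n - 2) * (k_joint - 1)
    for i in range(n):
        k_i = occupied(data[:, [i]])                # estimate of K_i
        k_minus_i = occupied(np.delete(data, i, axis=1))  # estimate of K_{X^n_{-i}}
        term += (k_i - 1) - (k_minus_i - 1)
    return term / (2 * N * log(2))

rng = np.random.default_rng(7)
triplet = rng.integers(0, 5, size=(500, 3))  # independent triplet, K = 5
print(mm_bias(triplet))  # positive: pushes the synergy-biased estimate toward 0
```

A bias-corrected estimate is then the naive estimate plus this term; in simulations, where the true numbers of bin combinations are known, those can be passed in place of the occupied counts.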
4.2. Simulating the Bias in the Naive O-Information Estimation
The effect of the sample size, N, and the number of bins, K, on the bias of the naive O-information estimation can be investigated in a controlled environment through simulations. In these simulations, we construct variable triplets with joint distributions for which we know the true O-information Ω(X^3), and compare this to our estimation Ω̂(X^3), which is based solely on the available, simulated samples.
4.2.1. The Discretization Strategy
As we explore the discrete case in this paper, the sampled variables must be discretized before estimating their O-information. In empirical data, the true underlying bin widths are not known, and the best we can do is to construct the bins based on the available sample we observe. In our case, we use the quantile binning strategy, where the bin widths are determined such that the number of observations in each bin stays the same. For instance, with K = 10, the lowest 10% of observations are allocated to bin 1, and the highest 10% of observations to bin 10. This results in the marginal distribution of each variable being approximately uniform.
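A quantile binning of this kind can be sketched as follows (assuming continuous inputs without ties; the function name is ours):

```python
# Sketch of the quantile binning strategy: bin edges are taken from the
# sample quantiles, so each of the K bins receives roughly N/K observations.
import numpy as np

def quantile_bin(x, k):
    """Discretize a 1-D sample into k roughly equally populated bins 0..k-1."""
    edges = np.quantile(x, np.linspace(0, 1, k + 1)[1:-1])  # k - 1 interior edges
    return np.searchsorted(edges, x, side="left")

rng = np.random.default_rng(4)
x = rng.normal(size=1000)
bins = quantile_bin(x, 10)
print(np.bincount(bins))  # close to 100 observations in each of the 10 bins
```

Whatever the shape of the input distribution, the resulting marginal over bins is close to uniform, which is exactly the property exploited in the analysis below.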
Of course, in our simulations, we do know the true underlying marginal and joint probability distributions of the variables. However, as we want our simulations to reflect the scenario of working with real empirical data, we construct the bins for each simulated variable according to the observed outcomes we have sampled, rather than the true underlying distribution. While this likely reduces the accuracy of the bias correction in these simulations, it allows us to obtain an idea of the performance of the Miller–Maddow bias correction on real, empirical data.
4.2.2. The Simulated Distributions
We construct three different systems: The first system achieves the theoretical maximum O-information (a fully redundant system), the second achieves a theoretical O-information of zero (a system of fully independent variables), and the third achieves the theoretical minimum O-information (a fully synergistic system).
These three distributions are constructed as follows:
Redundant triplet: each variable is a copy of the others, X1 = X2 = X3. Each variable is uniformly distributed and discretized into K bins. This achieves the maximum possible O-information of a triplet with K bins: Ω = log K [9].
Independent triplet: each variable is independent and uniformly distributed, discretized into K bins. Due to the variables’ independence, the true O-information Ω = 0 regardless of N and K.
Synergistic triplet: X1 and X2 are independent and uniformly distributed, discretized into K bins, and X3 = (X1 + X2) mod K. This achieves the minimum possible O-information, Ω = −log K [9].
We set the number of bins to be equal across the three variables in the system, such that K1 = K2 = K3 = K. This allows us to apply the same bias approximation method across all triplets in the dataset.
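The three constructions can be sketched as follows. The modular-sum form of the synergistic triplet is one standard K-ary generalization of the XOR gate, used here for illustration:

```python
# Sketch: generating the three benchmark triplets (redundant, independent,
# synergistic) as (N, 3) arrays of integer bin labels in 0..k-1.
import numpy as np

def make_triplet(kind, n_samples, k, rng):
    if kind == "redundant":        # X1 = X2 = X3, true O-information = log2(k)
        x = rng.integers(0, k, n_samples)
        return np.stack([x, x, x], axis=1)
    if kind == "independent":      # true O-information = 0
        return rng.integers(0, k, size=(n_samples, 3))
    if kind == "synergistic":      # X3 = (X1 + X2) mod k, true O-info = -log2(k)
        x1 = rng.integers(0, k, n_samples)
        x2 = rng.integers(0, k, n_samples)
        return np.stack([x1, x2, (x1 + x2) % k], axis=1)
    raise ValueError(kind)

rng = np.random.default_rng(5)
for kind in ("redundant", "independent", "synergistic"):
    data = make_triplet(kind, 5000, 4, rng)
    # Joint support sizes for k = 4: 4, 4^3 = 64, and 4^2 = 16, respectively.
    print(kind, len({tuple(r) for r in data}))
```

The printed joint support sizes directly foreshadow the different minimum sample sizes discussed next.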
It is important to realize that these three systems require different minimum sample sizes in order to theoretically be able to estimate their joint probability distributions. This is simple to see for a system of independent variables: the number of samples needed, K^n, increases exponentially with the number of variables in the system n. In contrast, for a system of fully redundant variables, we only need a minimum sample size of K to be able to estimate the joint probability distribution, regardless of the number of variables in the system. In a fully synergistic system, all variables are independent from each other, except for the last, which is fully determined by all others. Thus, the theoretical minimum sample size needed to estimate the joint distribution of all variables except the last increases exponentially (just like in the fully independent system). To estimate the joint distribution of the entire system, however, only K^{n−1} samples are needed.
4.2.3. Simulation Setup
The tested number of bins K increases in intervals of 1, and the tested sample sizes N increase in intervals of 500. Due to the stochasticity in the data generation, 30 trials are carried out, leading to 30 naive O-information estimations per (N, K) combination per system:
Let the mean naive O-information estimation over the 30 trials be denoted as ⟨Ω̂⟩, and the resulting mean bias be denoted as
B_naive = ⟨Ω̂⟩ − Ω(X^3)   (13)
As the data are generated artificially, we know the underlying numbers of possible bin combinations K_{X^3} and K_{X^3_{−i}}. Note that K is known by definition of the quantile binning strategy. These values can then be substituted into the equation for the O-information bias correction (12). Again, we obtain 30 bias-corrected O-information estimations Ω̂_MM,
and we let the mean observed bias be denoted as
B_MM = ⟨Ω̂_MM⟩ − Ω(X^3)   (14)
However, in real empirical data, the true K_{X^3} and K_{X^3_{−i}} values are usually not known, and have to be estimated from the available data. We estimate them here by simply counting the number of joint bin combinations that occur in the data. Let Ω̃_MM denote the bias-corrected estimation using these estimated values, and let the corresponding mean observed bias be denoted as
B̃_MM = ⟨Ω̃_MM⟩ − Ω(X^3)   (15)
The mean error of the naive O-information estimation can then be compared to the mean error of the bias-corrected O-information,
ΔE = |B̃_MM| − |B_naive|   (16)
We focus on the bias approximation using B̃_MM rather than B_MM, as we usually do not know the true K_{X^3} and K_{X^3_{−i}} values and must estimate them from the available data.
4.3. Application to the Young Finns Study
Next, we aim to investigate how our knowledge of the bias in the O-information estimation affects the conclusions we draw about the higher-order interactions of variable triplets in an empirical dataset. This is explored on the empirical dataset of the Young Finns Study (YFS), which is one of the largest and longest-standing observational cohort studies globally, having followed a cohort of individuals from childhood to adulthood [29]. Specifically, we are interested in whether we can discover synergistic higher-order relationships between cardiovascular disease (CVD) and depression variables. The subset of data we use is the 2007 wave of the YFS, which contains lipidomic data, metabolomic data, depression scores, and CVD-related phenotypes. For more information on the YFS dataset, see Appendix B. Only the participants who appear in all four datasets are kept in our final dataset, ensuring that lipidome-, NMR-, depression-, and CVD-related phenotype data are available for all participants. This leaves a total of N = 1684 participants. Any missing values are imputed with the median of the feature.
The dataset contains a mix of continuous and discrete variables. Variables are discretized using the quantile K-bins method, where all variables are discretized into bins with an equal number of samples in each bin. Just like in our simulations, this results in the marginal probability distributions of the variables being approximately uniform after discretization; they are perfectly uniform if N is a multiple of K. As the number of bins, K, is one of the main components in the O-information estimation and the bias approximation B̂_Ω, it is crucial that all variables are discretized into the same number of bins, K. The quantile binning strategy ensures that this is the case, given that K ≤ N. An equal-width binning does not guarantee this, as empty bins can remain even if K ≤ N.
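The contrast between the two strategies can be sketched on a skewed variable, where equal-width binning tends to leave bins empty while quantile binning fills all K bins whenever K ≤ N:

```python
# Sketch: occupied-bin counts under equal-width vs. quantile binning for a
# heavily right-skewed sample. Data and parameters are illustrative.
import numpy as np

rng = np.random.default_rng(6)
x = rng.lognormal(mean=0.0, sigma=1.5, size=500)  # right-skewed sample
K = 10

# Equal-width bins over [min, max]: most mass falls into the lowest bins.
width_ids = np.clip(((x - x.min()) / (x.max() - x.min()) * K).astype(int), 0, K - 1)

# Quantile bins: edges at the sample quantiles, ~N/K observations per bin.
edges = np.quantile(x, np.linspace(0, 1, K + 1)[1:-1])
quant_ids = np.searchsorted(edges, x)

print("equal-width occupied:", len(np.unique(width_ids)))  # often < K
print("quantile occupied:  ", len(np.unique(quant_ids)))   # all K bins
```

Because the bias approximation depends explicitly on K, an effective per-variable bin count that silently differs from K (via empty bins) would make the correction inconsistent across variables.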
After a feature selection described in Appendix C, we compute the naive and bias-corrected O-information estimations for all possible sets of variable triplets. The resulting distribution of O-information estimations from the empirical data are then compared to the O-information estimations of our simulations with the same N and K values.
5. Experiments and Results
5.1. Bias of the Naive O-Information Estimation in Simulations
First, we explore the effect of varying the sample size, N, and the number of bins, K, on the mean naive O-information estimation ⟨Ω̂⟩ and its bias (13). Figure 1 shows ⟨Ω̂⟩ for all simulated distributions, and Figure 2 shows B_naive for all simulated distributions. Note that the bias of the redundant triplet is shown for a range of much smaller sample sizes. As the joint entropy of the fully redundant triplet is equal to the marginal entropies, the O-information bias is much smaller; thus, this smaller scale is needed to observe the bias across the tested numbers of bins.
Figure 2 can be read as follows: if B_naive < 0 (red), then ⟨Ω̂⟩ is more synergistic than the true Ω(X^3). If B_naive > 0 (blue), then ⟨Ω̂⟩ is more redundant than Ω(X^3). In this figure, we also plot the N = K line for the redundant triplet, the N = K^2 and N = K^3 lines for the independent triplet, and the N = K^2 line for the synergistic triplet. As explained in Section 4.2.2, these boundary lines reflect the theoretical minimum sample sizes that are needed to observe all possible joint bin combinations.
The bias of the naive O-information estimation, B_naive, is particularly strong for the triplet of independent random variables, being strongly biased towards synergy, as seen in Figure 2. The most severe bias occurs approximately in the space where K^2 < N < K^3, demonstrated by the red shaded area between the dashed lines of N = K^2 and N = K^3. Interestingly, despite the even lower ratio of samples to joint bin combinations, a decreased (but still present) bias is observed in the region where N < K^2, demonstrated by the lighter shades below the N = K^2 line.
For the redundant triplet, we observe that ⟨Ω̂⟩ is biased strongly towards synergy, the bias being most severe when N is smallest; as N grows, however, the bias fades to zero. For the synergistic triplet, we observe that the N = K^2 line approximately traces the boundary of the region in which ⟨Ω̂⟩ is biased. The bias has around the same magnitude as in the redundant and independent triplets, but the opposite direction: while the redundant and independent triplets were biased towards synergy, the synergistic triplet is biased towards redundancy.
Looking at the standard deviations of the naive O-information estimations in Figure 3, we observe that they are quite low for all three simulated systems. The lowest standard deviations are seen for the redundant triplet, for which they are approximately zero for all (N, K) combinations. The standard deviations of the synergistic and independent triplets are comparatively higher, especially for small N.
5.2. The Miller–Maddow O-Information Bias Approximation
Next, we explore the accuracy of the O-information bias approximation derived in (12) on our simulations. Recall that K_{X^3} and K_{X^3_{−i}} are estimated from the data by simply counting the number of unique joint bin occurrences. We are mainly interested in the decrease in the estimation error after the bias correction, which is given by ΔE in (16).
Figure 4 shows ΔE for all simulated (N, K) combinations. A negative value (blue) implies that the bias-corrected O-information estimation Ω̃_MM is more accurate than the naive O-information estimation Ω̂. A positive value (red) implies that the naive estimation is more accurate. This analysis highlights the situations in which the bias approximation (12) works well, and those in which it fails.
To better visualize the behavior of Ω̂, Ω̃_MM, and the true O-information Ω, Figure 5 shows these values for a slice of the heatmap in Figure 4 at a smaller K, and Figure 6 for a slice at a larger K. These two K values are chosen because they exhibit quite different behaviors, which provides an interesting contrast.
For the independent triplet, four separate sections can be identified in Figure 4. There is close to no bias in the naive estimation when N is large relative to K^3, and the bias correction is also approximately zero in this case. In Figure 5, this is equivalent to the section of large N where both Ω̂ and Ω̃_MM have almost converged to the true O-information Ω = 0. For somewhat smaller N, the bias correction indeed reduces the bias, as indicated by the negative ΔE. In Figure 6, this is the section in which both Ω̂ and Ω̃_MM approach Ω, but Ω̃_MM approaches it faster. Then, as N decreases further, there is a small section where the naive estimation and the bias-corrected estimation have approximately equal errors. In Figure 6, this is equivalent to the part where the two O-information curves intersect. This only happens when the number of bins is sufficiently large, so we do not observe this intersection in Figure 5. For small N and large K, ΔE > 0, implying that the naive estimation is more accurate than the bias-corrected estimation. This is also reflected in Figure 6 for small sample sizes, where the bias-corrected estimation Ω̃_MM briefly dips below the naive estimation Ω̂.
For the redundant triplet, we also observe an interesting pattern. The bias correction seems to improve the accuracy of the O-information estimation for most (N, K) combinations. In Figure 5 and Figure 6, this is reflected in the part where both Ω̂ and Ω̃_MM approach Ω, with Ω̃_MM approaching it faster. On and around the N = K line, the naive estimation becomes more accurate than the bias-corrected estimation. In Figure 5 and Figure 6, this is reflected by Ω̂ intersecting with the true Ω, while Ω̃_MM overshoots the true value there. Gradually, however, Ω̃_MM approaches the true value again as N increases further. There is also an intriguing pattern of white dots along this line in the heatmap in Figure 4. For the larger sample sizes, the error fades to approximately zero.
For the synergistic triplet, the pattern is much more straightforward. The bias correction improves the accuracy of the O-information estimation wherever there is a bias in the naive estimation to begin with, which is where N < K^2. If N > K^2, the bias approximation is also around zero.
5.3. Application to the Young Finns Study
Applying this exploration to an empirical dataset, the O-information of all possible triplets in the feature-selected YFS dataset is estimated. Figure 7 compares results from the empirical YFS data with the results from our simulations. For the empirical data, it shows the distribution of the naive O-information estimations and of the estimations with the bias correction (12) applied. For the simulations, dashed vertical lines show the mean naive O-information estimation and the mean bias-corrected O-information estimation of the simulated independent triplets with the same N and K values as the empirical data. The dark shaded distributions around these dashed lines additionally show the distribution of these O-information estimations over all 30 trials.
Taking a closer look at the 20 most synergistic triplets in the YFS dataset, we note two important results. Firstly, 90% of these highly synergistic triplets contain ratio variables. For example, a ratio triplet might look like “MHDLC, MHDLTG, MHDLTGPCT”, where MHDLC is the total cholesterol in medium HDL, MHDLTG the triglycerides in medium HDL, and MHDLTGPCT the ratio of triglycerides to total lipids in medium HDL. Note that lipids include both cholesterol and triglycerides (as well as other types of lipids) [30]; thus, the MHDLTGPCT variable carries information about the ratio of cholesterol to triglycerides. This provides a proof of concept of the O-information in a technical sense, as these ratios embody the core concept of synergy: after all, a ratio can only be computed if both input variables are known. A table of the 20 most synergistic triplets, indicating which ones contain such a ratio variable, can be found in Appendix D.
However, in a medical context, the synergy of ratio triplets is not relevant. Therefore, we exclude such triplets from our further analysis. We additionally remove variables that are sums or means of other variables. Triplets containing these composite variables are likely to appear highly synergistic due to the inherent mathematical relationships among their constituent variables. Consequently, these triplets are not medically relevant, as they do not represent actual synergistic biological pathways.
On this remaining dataset, we now explore whether the bias correction gives any further insight on the synergistic triplets. Specifically, whether correcting for the bias changes the order of synergistic triplets in the dataset, or if it reveals any new synergistic triplets. The result is that the top five most synergistic triplets remain unchanged before and after the bias correction. When looking at the overall composition of the top 50 remaining synergistic triplets, we find that 36 of the triplets are present both before and after the bias correction (see Appendix E). Therefore, the bias correction introduces 14 new synergistic triplets which can be analyzed. None of the most synergistic triplets contain the depression score.
The most synergistic remaining triplet, both before and after the bias correction, is [“UnSatDeg”: estimated degree of unsaturation, “SLDLTG”: triglycerides in small LDL (mmol/L), “FAw3”: omega-3 fatty acids (mmol/L)].
6. Discussion
6.1. The Naive O-Information Estimation in Simulations
The mean naive O-information estimation over 30 trials and the corresponding standard deviations are shown in Figure 1 and Figure 3, respectively. The values of the fully redundant triplet and the fully synergistic triplet behave as expected, as they closely follow their theoretical values of log K and -log K, respectively. The slight deviation from the theoretical value in the synergistic triplet is due to the bias, discussed in depth in the following sections. The estimation for the independent triplet very clearly shows the presence of a bias, as we see negative values for a large section of the heatmap, specifically where the ratio N/K^3 is small.
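To make these theoretical values concrete, the following sketch (in Python, whereas the paper's implementation is in Julia; the function names are ours) computes the plug-in O-information of a triplet from the n = 3 entropy decomposition, and reproduces Omega = log K for a fully redundant triplet and Omega = -log K for a fully synergistic one when the joint distribution is enumerated exactly:

```python
import math
from collections import Counter

def entropy(samples):
    # Plug-in (naive) entropy in bits from a list of discrete outcomes.
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def o_information(x1, x2, x3):
    # O-information of a triplet for n = 3:
    # sum_i H(X_i) + H(X1,X2,X3) - sum over pairs H(X_i,X_j)
    triples = list(zip(x1, x2, x3))
    pairs = [list(zip(x1, x2)), list(zip(x1, x3)), list(zip(x2, x3))]
    return (entropy(x1) + entropy(x2) + entropy(x3)
            + entropy(triples) - sum(entropy(p) for p in pairs))

K = 4
# Fully synergistic triplet: X3 = (X1 + X2) mod K, enumerated exhaustively,
# so the plug-in estimate equals the theoretical value -log2(K).
x1 = [a for a in range(K) for b in range(K)]
x2 = [b for a in range(K) for b in range(K)]
x3 = [(a + b) % K for a, b in zip(x1, x2)]
print(o_information(x1, x2, x3))   # -2.0 == -log2(4)

# Fully redundant triplet: X1 = X2 = X3, theoretical value +log2(K).
r = list(range(K))
print(o_information(r, r, r))      # 2.0 == log2(4)
```
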
The standard deviations in Figure 3 also align with our expectations. In the redundant triplet, the joint distributions are identical to the marginal distributions. Moreover, the quantile binning strategy ensures that we have a near-perfectly uniform marginal distribution in every trial. Together, this causes the standard deviation of the estimation to be approximately zero. The standard deviations for the synergistic and independent triplets are larger, as there are two independent variables in the synergistic triplet and three independent variables in the independent triplet. However, overall, the standard deviations are still quite low.
6.2. Bias of the Naive O-Information Estimation in Simulations
Before discussing the behavior of the bias, we point out that the quantile binning strategy results in the naive entropy estimation having zero bias if N >= K and N is a multiple of K. Even if N is not an exact multiple of K, the bias of the entropy estimation is still very close to zero. This happens because the bins for each variable are constructed based on the observed samples (even though we know the ground truth quantiles of the simulated variables), resulting in quasi-uniform marginal distributions. We do this to make the simulation comparable to the procedure of binning empirical data, for which the ground truth quantiles are not known. If a different binning strategy is used, we can also expect the entropy estimation to become biased, thus increasing the bias of the naive O-information estimation. However, it is not trivial to see how the bias in the entropy estimation would affect the bias of the joint entropy estimation.
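The quantile (equal-frequency) binning described above can be sketched in a few lines: rank the samples and hand out bins in blocks of equal size, so the marginal distribution over bins is as uniform as the sample size allows. This is our own minimal implementation, not the paper's code:

```python
import random

def quantile_bin(samples, k):
    # Empirical quantile binning: sort the sample indices by value, then
    # assign the lowest n/k ranks to bin 0, the next n/k to bin 1, etc.
    order = sorted(range(len(samples)), key=lambda i: samples[i])
    bins = [0] * len(samples)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(samples)
    return bins

random.seed(0)
x = [random.gauss(0, 1) for _ in range(1000)]
b = quantile_bin(x, 10)
counts = [b.count(j) for j in range(10)]
print(counts)  # [100, 100, ..., 100]: exactly uniform marginal over bins
```

Because 1000 is a multiple of 10, every bin receives exactly 100 samples, which is why the marginal entropy estimate is unbiased in this case.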
6.2.1. Bias in the Independent Triplet
A strong bias towards synergy is observed when N &lt; K^3. A sample size smaller than K^3 implies that not all of the K^3 jointly possible bin combinations can be observed. Thus, by definition, an accurate estimation of the joint probability distribution is impossible. This leads to a severe underestimation of the joint entropy H(X1, X2, X3). From the equation of the O-information estimation (3), it is clear that an underestimation of H(X1, X2, X3) biases the O-information downwards, making it appear more synergistic than it is in reality. Moreover, even if the sample size is equal to or just slightly larger than K^3, it is unlikely that every unique bin combination is observed at least once. Thus, even above the N = K^3 line, some bias remains, which slowly fades to zero as the ratio N/K^3 increases.
We also see a decreased (but still positive) bias when N &lt; K^2. Here, the sample size is too small to observe all possible bin combinations between two of the three variables. Thus, it is impossible to even estimate the pairwise joint distributions accurately, resulting in a severe underestimation of both the pairwise joint entropies H(Xi, Xj) and the full joint entropy H(X1, X2, X3). As seen by the opposite signs of these two terms in (3), their underestimations cancel each other out to some extent. This results in a smaller bias, despite the very small sample size, N, and large number of bins, K.
Clearly, this smaller bias when N &lt; K^2 should not be taken as an improvement. In Figure 3, we see a much higher standard deviation of the O-information estimations for the case that N &lt; K^2. The pairwise and full joint entropies are both still severely underestimated, and achieving a careful balance between their two biases is not a reliable method of reducing the bias in the naive estimation.
It is important to emphasize that a significant bias can still be observed with relatively large sample sizes. Even at the larger sample sizes in our simulations, the triplet of independent variables still has a markedly negative naive O-information estimation, while its theoretical O-information should be zero. To put this into perspective, the minimum bound of the O-information for a system of n variables with K bins each is equal to -(n - 2) log K. In such cases, the naive O-information estimation is thus closer to the minimum O-information bound than to its true O-information of zero.
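The severity of this small-sample bias is easy to reproduce. The following self-contained sketch (names ours; plain uniform sampling rather than the paper's quantile binning) estimates the naive O-information of independent triplets at N &lt; K^3 and shows a clearly negative mean, even though the true O-information is zero:

```python
import math, random
from collections import Counter

def H(xs):
    # Naive plug-in entropy in bits.
    n = len(xs)
    return -sum(c / n * math.log2(c / n) for c in Counter(xs).values())

def naive_o_info(x1, x2, x3):
    # n = 3 form: sum H(X_i) + H(X1,X2,X3) - sum over pairs H(X_i,X_j).
    return (H(x1) + H(x2) + H(x3) + H(list(zip(x1, x2, x3)))
            - H(list(zip(x1, x2))) - H(list(zip(x1, x3))) - H(list(zip(x2, x3))))

random.seed(1)
K, N = 10, 500   # N = 500 < K**3 = 1000: not every joint bin can be observed
trials = [naive_o_info(*[[random.randrange(K) for _ in range(N)] for _ in range(3)])
          for _ in range(30)]
mean = sum(trials) / len(trials)
print(mean)  # clearly negative, i.e., spuriously "synergistic"
```
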
6.2.2. Bias in the Redundant Triplet
Figure 2 shows that the redundant triplet is biased towards synergy when N &lt; K. Due to the fact that X1 = X2 = X3, we have that H(Xi) = H(Xi, Xj) = H(X1, X2, X3). Thus, the strengths of the underestimations of all three of these entropy terms are equal in the estimation of the O-information (3). The underestimations of the marginal entropies H(Xi) and the pairwise joint entropies H(Xi, Xj) cancel each other out in the sum in (3). Therefore, the only underestimation that remains is the underestimation of H(X1, X2, X3). If N &lt; K, then by definition H(X1, X2, X3) must be underestimated, and we see the bias towards synergy in the naive O-information estimation. We also see that this bias decreases to zero very quickly as soon as N increases above K. As the quantile binning eliminates the bias in the entropy estimation as long as N >= K, this also eliminates the bias of the joint entropy terms. Thus, the naive O-information estimation becomes much more accurate as soon as N increases above K.
6.2.3. Bias in the Synergistic Triplet
For the fully synergistic triplet, the minimum sample size which is theoretically needed to accurately estimate the O-information is N = K^2. However, we see that, even when N is larger than K^2, the bias is not zero. Again, this occurs because it is unlikely that each of the N samples observes a unique bin combination of the K^2 total possible combinations.
The bias is towards redundancy in the synergistic triplet due to the balance of the full joint entropy H(X1, X2, X3) and the pairwise joint entropies H(Xi, Xj) in the naive O-information estimation (3). The joint entropy H(X1, X2, X3) appears once, adding to the O-information estimation, while H(Xi, Xj) appears three times, subtracting from the O-information estimation. In the synergistic triplet, the magnitudes of the underestimations of both H(X1, X2, X3) and H(Xi, Xj) are the same per sample size N. Thus, the net effect on the entropy sum is negative, and the effect on the resulting O-information estimation is positive. Therefore, the balance of their underestimations makes the naive O-information estimation biased towards redundancy.
Moreover, we do not see the O-information bias fade to zero as quickly as in the redundant triplet. This occurs because the quantile binning only eliminates the bias in the marginal entropy estimations H(Xi); unlike in the redundant triplet, the joint entropy terms are no longer equal to the marginal entropy terms.
6.3. The O-Information Bias Approximation in Simulations
In this section, the accuracy of the mean naive estimation is compared to the accuracy of the bias-corrected estimation. This is performed through the error-difference term introduced in Equation (16). Recall that a negative value of this term implies that the bias-corrected estimation is more accurate than the naive one, and vice versa for a positive value. The term for each (N, K) combination is shown in Figure 4. Figure 5 and Figure 6 show the behavior of the naive and bias-corrected estimations for the two slices of Figure 4 at fixed numbers of bins K. These two figures are helpful to visualize many of the behaviors discussed in this section.
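Our reading of the comparison term can be written as a difference of absolute errors (a sketch; the paper's exact definition is in Equation (16), and the function name is ours):

```python
def error_difference(true_o, naive_est, corrected_est):
    # Negative: the bias-corrected estimate is closer to the true
    # O-information; positive: the naive estimate is more accurate.
    return abs(corrected_est - true_o) - abs(naive_est - true_o)

print(error_difference(0.0, -0.5, -0.25))  # -0.25: the correction helped
```
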
6.3.1. Bias Approximation of the Independent Triplet
The regions of negative values in Figure 4 for large enough N are expected. Here, the bias correction partially corrects for the underestimation of the entropy terms in the O-information estimation, which provides a more accurate estimation of the O-information.
We also see that the error difference is positive for small N/K^3 ratios, implying that the naive estimation is more accurate than the bias-corrected one. As discussed in Section 6.2, the mean bias of the naive estimation actually decreases when N &lt; K^2 due to the canceling out of underestimations of various entropy terms. This effect is not captured by the bias approximation (12), where a smaller N always results in a larger bias approximation. The bias approximation is not able to capture this effect because we use the second-order expansion of the Taylor approximation, instead of the third-order expansion, as given by [31]. The second-order expansion works well for uniform distributions; however, in the independent triplet, it is not guaranteed that the estimated joint probability distributions of the pairs and of the full triplet are uniform. Therefore, the balance of biases in the H(Xi, Xj) and H(X1, X2, X3) terms results in the overestimation of the bias we observe.
6.3.2. Bias Approximation of the Redundant Triplet
Recall that, in the fully redundant system, we only need to concern ourselves with the bias in the entropy estimations. In the region where N is slightly larger than but still close to K, the entropy underestimations are partially corrected for. From the O-information estimation (3), we see that increasing the estimate of H(X1, X2, X3) has the effect of increasing the O-information estimation as a whole to reflect the underlying redundancy.
Around the N = K line, however, the naive estimation is more accurate than the bias-corrected estimation. This is due to the quantile binning strategy used. The quantile binning results in almost perfectly uniform marginal distributions as long as N >= K. Moreover, due to the system constraint that X1 = X2 = X3, the marginal distributions are equal to the joint distributions. Thus, the naive estimation has approximately zero bias in the fully redundant case. The bias correction, however, is derived under slightly different assumptions than the quantile binning (discussed in more depth in Section 6.4), and therefore tries to correct for a nonexistent bias in the entropy estimation. In the redundant triplet, this results in the bias-corrected estimation overshooting the true O-information.
The white dots along the N = K line in Figure 4 are the points where the bias-corrected estimation has overshot the true O-information by the same distance as the naive estimation has yet to approach it. Therefore, their absolute errors cancel each other out, resulting in an error difference of zero.
6.3.3. Bias Approximation of the Synergistic Triplet
The bias correction works much better for the fully synergistic triplet than for the independent triplet when N &lt; K^2. For the synergistic triplet, the bias approximation formula (12) reduces to a concave parabola in K. This parabola is centered around approximately zero and is very wide, as its leading coefficient, proportional to 1/N, is so close to zero. The larger the sample size N, the wider the parabola becomes, and thus a higher K is needed in order for the bias approximation to noticeably deviate from zero. Only then can the bias correction take effect for the synergistic triplet. This explains why the bias approximation works particularly well when N is small relative to K^2, and does not have a noticeable effect when N is larger.
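Under the Miller–Madow view, with support sizes K per marginal and K^2 for each pair and for the full triplet, the net bias of the naive estimate for the fully synergistic triplet works out to a parabola in K with a 1/N leading coefficient. This is a sketch of our reading of how (12) reduces (up to sign convention), not the paper's exact expression:

```python
import math

def synergistic_o_info_bias(k, n):
    # Sum of Miller-Madow entropy biases, -(M - 1) / (2 N ln 2) bits, as they
    # enter the n = 3 O-information: three marginals (M = k, sign +), one
    # full joint (M = k**2, sign +), three pairwise joints (M = k**2, sign -).
    b = lambda m: -(m - 1) / (2 * n * math.log(2))
    return 3 * b(k) + b(k**2) - 3 * b(k**2)

print(synergistic_o_info_bias(10, 1000))  # small positive: bias toward redundancy
```

Algebraically this simplifies to (K - 1)(2K - 1) / (2 N ln 2): a parabola in K whose vertex lies near zero and whose width grows with N, matching the behavior described above.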
6.4. A Note on the Quantile Binning Strategy and the Bias Approximation
While the experiments on the simulated data show that the bias-correction process is far from perfect, we point out that our procedure of bin allocation is not completely in line with the assumptions under which the O-information bias correction (12) was derived. As stated above, the bins are constructed based on the observed samples, even though we know the ground truth quantiles of the generated variables. The bias-correction process (12), however, assumes that the bins are created from the true underlying distribution, and that the observed samples are assigned to the bins they happen to fall into.
This is illustrated by the fact that, in the fully redundant triplet with N >= K, the naive O-information estimation has zero error while the bias-correction process does not: it tries to correct for the underestimation in the entropy that we would see if the samples were assigned to bins constructed from the true underlying distribution.
Therefore, the bias correction might have performed better in this simulated experiment if we had constructed the bins from the known underlying distribution, and assigned the observed samples to these existing bins. As this is impossible with empirical data, we decided to keep the binning procedure of the simulation experiment comparable to the empirical data.
6.5. When Can the Bias-Correction Process Be Applied to Empirical Data?
The redundant triplet shows next to no bias for sample sizes N that are larger than the number of bins, K, to begin with; while the bias correction is less accurate than the naive estimation around N = K, this inaccuracy fades to zero relatively quickly as N/K increases further. As it is generally recommended to have a sample size much larger than the number of bins, it is unlikely that the errors of the bias correction in redundant triplets pose any serious problems when applying this bias correction. The situation is more tricky for triplets of independent variables. If the sample size is N &lt; K^2, then the bias correction does more harm than good. However, despite the naive estimation performing better in this case, it is still quite biased itself. The bias correction only improves the O-information estimation for triplets of independent variables given that N > K^2. However, it only partially corrects for the bias. If the triplet is synergistic, the bias correction works very well for N &lt; K^2, substantially improving the O-information estimation. If N >= K^2, then there is very little bias to begin with, and the bias correction can only very slightly improve the estimation further.
Assuming we have no prior knowledge about the joint distribution of the variables, the prudent approach is to assume independence between the variables in a triplet, as these triplets show the largest errors of both the naive estimation and the bias-corrected estimation. Under this assumption, the bias-correction process should be applied only if N > K^2, and it is important to keep in mind that it does not correct the bias fully. If N >= K^3, the naive O-information estimation should be quite good to begin with, and the bias correction simply improves it slightly further.
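Collecting this guidance into a rule of thumb (assuming our reading of the thresholds, N versus K^2 and K^3, for the prudent independence assumption; the function name is ours):

```python
def apply_bias_correction(n, k):
    # Prudent default: treat the triplet as independent. For n <= k**2 the
    # correction tends to do more harm than good; for n > k**2 it partially
    # removes the bias; for n >= k**3 little bias is left either way.
    return n > k ** 2

print(apply_bias_correction(50, 10))    # False: too few samples, skip correction
print(apply_bias_correction(2000, 10))  # True: correction partially removes bias
```
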
6.6. Application to the Young Finns Study
The simulated independent triplets have a mean O-information estimation (both naive and bias-corrected) that aligns almost perfectly with the peak of the O-information distribution of the empirical data. This reveals that a large majority of triplets in the YFS dataset are independent of each other (or their synergy and redundancy cancel each other out perfectly), rather than synergistic as their negative O-information might imply.
The result that almost all of the top synergistic triplets are ratios makes sense as well. For triplets which are exact ratios, the pairwise constraints are much weaker than the constraint on the system as a whole, which is completely governed by the equation X3 = X1/X2. This perfectly aligns with the idea that a synergistic system has local independence but global cohesion. Two of the three variables in the triplet must be known in order to derive the third, but knowing just one does not give much information about either of the other two variables. Even though the ratio triplets in the YFS dataset are not exact ratios, due, for example, to the discretization, they were still identified as highly synergistic. This provides a proof of concept of the O-information, as it indeed assigns the most negative O-information to the systems in the empirical data which embody this concept of synergy.
After removing the triplets containing inherent mathematical relationships (ratios, sums, and means), the fact that the bias correction does not change the order of the top five synergistic triplets implies that there is a level of robustness in the O-information. It suggests that the strength of the bias does not drastically change across similarly synergistic triplets. This allows us to make interpretations (with caution) about the relative strength of higher-order synergistic relationships. It also allows us to single out specific variables that frequently occur across highly synergistic triplets, with the reasonable assumption that these variables remain relevant after correcting for the bias. However, the bias correction does change the ordering beyond the top five synergistic triplets, and introduces 14 new triplets to the 50 most synergistic ranking. Thus, in combination with the simulations, the bias correction could provide valuable insights into synergistic triplets that may previously have been overlooked, for instance when performing centrality analyses on this hyper-network.
The most synergistic remaining triplet both before and after bias correction is [“UnSatDeg”: Estimated degree of unsaturation, “SLDLTG”: Triglycerides in small LDL; mmol/L, “FAw3”: Omega-3 fatty acids; mmol/L], with and . The degree of unsaturation is determined by the total number of double bonds and rings present in molecules such as triglycerides. The number of double bonds present in a triglyceride is in turn influenced by its fatty acid composition [32]. Omega-3 fatty acids are a type of polyunsaturated fatty acid which can be found in triglycerides [33]. The “poly” signifies the existence of multiple double bonds, and each of these double bonds increases the overall degree of unsaturation of the triglyceride. The third variable SLDLTG represents the triglyceride content in small LDL particles. A potential explanation for the synergy between these three variables is that the degree of unsaturation cannot be determined by the amount of Omega-3 fatty acids or the amount of triglycerides in LDLs alone. However, together they provide more information on the overall composition of saturated and unsaturated fatty acids in LDLs, and thus the degree of unsaturation can be more accurately estimated.
7. Conclusions
An important finding is that, in triplets of independent variables, the bias in the naive O-information estimation can persist even with relatively large sample sizes. In contrast, the O-information estimation of fully redundant and fully synergistic triplets is less biased, as their joint probability distributions can be estimated with fewer samples.
For systems of independent variables, the naive O-information estimation is severely biased towards synergy if N &lt; K^3. At this sample size, the joint entropy in particular is severely underestimated, resulting in a strong downward bias of the naive O-information estimation. Without taking this bias into account, higher-order relationships may be classified as highly synergistic while in reality being close to independent. In fact, our simulations show that, even with relatively large sample sizes, the naive O-information estimation is so biased that it is closer to the minimum bound of the O-information than to its true value of zero.
This implies that highly synergistic groups of variables identified in empirical data using the O-information may in reality be closer to independent, if the ratio N/K^3 is small enough and no bias correction has been attempted. Indeed, the findings illustrated by the Young Finns Study and our simulations suggest that a large majority of triplets in the dataset which would naively be labeled as synergistic due to their negative O-information are actually independent. When looking for synergistic higher-order relationships in empirical data, it is therefore crucial to be aware of the existence of this bias. It may also be helpful to use simulations to establish the O-information estimation value which fully independent triplets with the same N and K values would have, in order to benchmark the boundary between synergistic and redundant in the empirical dataset.
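Such a simulation-based benchmark could be sketched as follows: draw many fully independent triplets with the same N and K, and use a low quantile of their naive O-information estimates as the boundary below which an empirical triplet is more synergistic than estimation bias alone can explain (a self-contained sketch with our own names; uniform sampling stands in for the paper's quantile binning):

```python
import math, random
from collections import Counter

def H(xs):
    # Naive plug-in entropy in bits.
    n = len(xs)
    return -sum(c / n * math.log2(c / n) for c in Counter(xs).values())

def naive_o_info(x1, x2, x3):
    return (H(x1) + H(x2) + H(x3) + H(list(zip(x1, x2, x3)))
            - H(list(zip(x1, x2))) - H(list(zip(x1, x3))) - H(list(zip(x2, x3))))

def null_synergy_threshold(n, k, trials=200, quantile=0.05, seed=0):
    # Empirical null benchmark: the `quantile`-level value of the naive
    # O-information over fully independent uniform triplets with the same
    # N and K as the data at hand.
    rng = random.Random(seed)
    vals = sorted(naive_o_info(*[[rng.randrange(k) for _ in range(n)]
                                 for _ in range(3)])
                  for _ in range(trials))
    return vals[int(quantile * trials)]

print(null_synergy_threshold(500, 5))  # negative: the null itself looks "synergistic"
```
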
Future work could improve the Miller–Madow bias-correction process derived in this paper by exploring better ways to estimate the numbers of possible pairwise and triple-wise bin combinations from the data, rather than simply counting the number of joint bin combinations we observe. Moreover, other O-information bias-correction methods should be explored. The jackknife method may be a better alternative than the Miller–Madow method for small sample sizes, if one is willing to take on the additional computational cost. Other alternatives with lower computational cost should also be explored, such as Harris’ third-order Taylor approximation or Bayesian approaches.
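As a sketch of the jackknife alternative mentioned above (our implementation, not from the paper), the leave-one-out entropies can be computed incrementally from the bin counts rather than by re-binning N times:

```python
import math
from collections import Counter

def plug_in_entropy(counts, n):
    # Naive entropy in bits from a dict of bin counts.
    return -sum(c / n * math.log2(c / n) for c in counts.values() if c > 0)

def jackknife_entropy(samples):
    # Jackknife bias correction: H_jack = N*H_full - (N-1)*mean(leave-one-out H).
    # All samples falling in the same bin yield the same leave-one-out
    # estimate, so only one recomputation per occupied bin is needed.
    n = len(samples)
    counts = Counter(samples)
    h_full = plug_in_entropy(counts, n)
    loo_mean = 0.0
    for value, c in counts.items():
        counts[value] -= 1
        # c of the N leave-one-out estimates drop one sample from this bin
        loo_mean += c * plug_in_entropy(counts, n - 1) / n
        counts[value] += 1
    return n * h_full - (n - 1) * loo_mean
```

Note that the jackknife corrects the plug-in entropy upward, and for very small samples it can overshoot the true entropy, which is part of the trade-off alluded to above.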
Moreover, future work should explore the effect of different discretization methods on the entropy, O-information, and bias estimates, in order to apply these results to cases where quantile binning may not be appropriate. While it is out of scope for this paper to compare the simulation results for various discretization methods, it is certainly important to keep in mind that the discretization method will affect the behavior of information-theoretic measures [34], including these estimates.
Different discretization methods also open the door to exploring the estimation bias when variables have marginal distributions other than uniform. For instance, in the case of Gaussian marginal distributions, the inherent relationship between the entropy and the variance may be exploited to estimate the O-information, following the method of [18]. While this would still require the estimation of the variables’ variances and the system’s covariance matrices, it would provide an interesting comparison to the bias correction presented in this paper.
Lastly, the analyses carried out in this paper for systems of three variables could be extended to consider systems of more than three variables. The O-information’s power lies in its scalability to systems of higher dimensions, which should be taken advantage of. While the derived O-information bias approximation term is flexible enough to accommodate a higher number of variables, it would be important to explore the effect of increasing the number of variables on the approximation’s accuracy.
Conceptualization, J.G., J.L. and R.Q.; methodology, J.G., J.L. and R.Q.; software, J.G.; validation, J.G. and R.Q.; formal analysis, J.G., J.L. and R.Q.; investigation, J.G., J.L., C.H., S.T. and R.Q.; data curation, P.P.M., T.L., M.K. and O.R.; writing—original draft preparation, J.G.; writing—review and editing, J.G., J.L., C.H., P.P.M., O.R. and R.Q.; visualization, J.G.; supervision, J.L. and R.Q.; project administration, J.A.B. All authors have read and agreed to the published version of the manuscript.
Not applicable.
Informed consent was obtained from all subjects involved in the study.
The GitHub repository to run the simulations can be found at
The authors declare no conflicts of interest.
Figure 1. The mean naive O-information estimation per (N, K) combination over the 30 trials. This analysis is performed for the fully redundant triplet (left panel), the triplet of independent variables (middle panel), and the fully synergistic triplet (right panel).
Figure 2. The bias of the naive O-information estimation per (N, K) combination. Boundary lines indicate the theoretically minimum sample size, N, needed to observe the number of bins, K, as well as the joint bin combinations K^2 and K^3.
Figure 3. The standard deviation of the naive O-information estimations per (N, K) combination over the 30 trials, for each of the three simulated systems.
Figure 4. The difference in estimation error between the naive and bias-corrected O-information estimations. A negative value (blue) implies that the bias-corrected estimation is more accurate than the naive estimation. A positive value (red) implies that the naive estimation is more accurate.
Figure 5. Behavior of the true O-information, the mean naive O-information estimation, and the mean bias-corrected O-information estimation for a fixed number of bins K (first slice of Figure 4).
Figure 6. Behavior of the true O-information, the mean naive O-information estimation, and the mean bias-corrected O-information estimation for a fixed number of bins K (second slice of Figure 4).
Figure 7. Distribution of the naive estimated O-information and the bias-corrected estimated O-information for all triplets in the feature-selected dataset. The dashed lines indicate the mean naive O-information estimation and the mean bias-corrected O-information estimation of the simulated independent triplets. The darker shaded distributions around the dashed lines are the distributions of the simulated O-information estimations.
Appendix A. Miller–Madow Derivation
The idea of the Miller–Madow entropy estimation is to make a second-order Taylor approximation of the function f(p) = -p ln p around the true bin probabilities p_i.
It can be shown that E[Ĥ] ≈ -Σ_i p_i ln p_i - Σ_i Var(p̂_i) / (2 p_i).
The first term in this expression is the theoretical entropy H. Simplifying the second term further, using Var(p̂_i) = p_i (1 - p_i) / N, we obtain the final Miller–Madow entropy bias estimation: E[Ĥ] - H ≈ -(M - 1) / (2N), where M is the number of bins with nonzero probability.
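The resulting estimator is one line on top of the plug-in entropy; a minimal sketch in Python (our own function name; working in bits, hence the 1/ln 2 factor on the correction):

```python
import math
from collections import Counter

def miller_madow_entropy(samples):
    # Plug-in entropy plus the Miller-Madow correction (M - 1) / (2N),
    # with M the number of observed (nonzero-count) bins; the 1/ln(2)
    # factor converts the correction from nats to bits.
    n = len(samples)
    counts = Counter(samples)
    naive = -sum(c / n * math.log2(c / n) for c in counts.values())
    return naive + (len(counts) - 1) / (2 * n * math.log(2))

print(miller_madow_entropy(list(range(4))))  # 2.0 bits plus the 3/(8 ln 2) correction
```
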
Appendix B. Young Finns Study Data
High-throughput NMR spectroscopy was used for quantification of metabolites from serum [
Lipidome quantification for the stored plasma samples was performed at Zora Biosciences Oy (Espoo, Finland) using liquid chromatography–tandem mass spectrometry (LC-MS/MS). Lipid extraction was based on a previously described method [
Appendix C. Feature Selection on the Young Finns Study Dataset
Due to the high computational cost of calculating the O-information for every possible triplet in the YFS dataset, the following feature selection steps are implemented:
First, dropping constant and quasi-constant features. Quasi-constant features are defined as features which have the same value in 95% of the available observations; 681 of the 690 total features remain.
Then, identifying blocks of highly correlated features (absolute Pearson’s R value above 0.85), and keeping only the feature per block which has the highest absolute correlation with “imtka”, the carotid IMT average, and “beckpisteet”, a depression score ranging from 0 for no depression to 63 for the most severe depression. The carotid IMT average is chosen as a representative value of cardiovascular health, as has been proposed by experts in the field [38]; 374 of the 690 total features remain.
This feature selection filters the YFS dataset based on relevance to the morbidities associated with cardiovascular disease and depression, as these are the two health disorders represented in the YFS dataset. With the remaining 374 features, slightly above 8.6 million possible triplets remain in the dataset. The brute-force calculation of the O-information for each remaining triplet is implemented in Julia, due to its speed advantages over Python.
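The block-wise correlation filter described above could be sketched as follows (pure-Python Pearson correlation; the greedy block construction and the function names are ours, and the real pipeline combines correlations with both target variables rather than a single one):

```python
import math

def pearson(a, b):
    # Pearson correlation coefficient of two equal-length sequences.
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = math.sqrt(sum((x - ma) ** 2 for x in a))
    vb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (va * vb)

def select_block_representatives(features, target, threshold=0.85):
    # Greedy block filter: group features whose absolute pairwise correlation
    # exceeds the threshold, then keep only the member of each block most
    # correlated (in absolute value) with the target variable.
    names = list(features)
    kept, assigned = [], set()
    for name in names:
        if name in assigned:
            continue
        block = [name] + [other for other in names
                          if other != name and other not in assigned
                          and abs(pearson(features[name], features[other])) > threshold]
        assigned.update(block)
        kept.append(max(block, key=lambda f: abs(pearson(features[f], target))))
    return kept

features = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8.1], "c": [4, 1, 3, 2]}
print(select_block_representatives(features, target=[1, 2, 3, 4]))  # ['a', 'c']
```

Here "a" and "b" form one highly correlated block, of which only "a" (the member most correlated with the target) survives, while the uncorrelated "c" is kept as its own block.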
Appendix D. Top 20 Synergistic Triplets before Bias Correction; Which Ones Are Ratios?
Top 20 most synergistic triplets before bias correction. The ratio triplets are highlighted in gray.
| O-information | Variable 1 | Variable 2 | Variable 3 |
|---|---|---|---|
−1.462 | MHDLC | MHDLTGPCT | MHDLTG |
−1.331 | LHDLTG | LHDLTGPCT | LHDLCE |
−1.276 | LHDLTG | LHDLTGPCT | LHDLPL |
−1.061 | SLDLTG | MLDLPL | SLDLTGPCT |
−1.022 | LVLDLCEPCT | LVLDLFCPCT | LVLDLCPCT |
−0.978 | XLHDLTG | XLHDLTGPCT | HDLD |
−0.887 | SHDLP | HDLD | LHDLPL |
−0.881 | SLDLTG | LLDLCEPCT | SLDLTGPCT |
−0.867 | SHDLP | SHDLPLPCT | SHDLCE |
−0.864 | TotFA | FAw3 | DHAFA |
−0.830 | LHDLTG | LHDLTGPCT | HDLD |
−0.829 | MLDLPL | MHDLFCPCT | ApoBApoA1 |
−0.814 | RemnantC | SLDLTG | IDLCEPCT |
−0.814 | LLDLCEPCT | SLDLTGPCT | MLDLFCPCT |
−0.8 | MHDLC | SHDLPLPCT | SHDLCE |
−0.794 | IDLFCPCT | MLDLFCPCT | VLDLTG |
−0.789 | MHDLC | SHDLP | HDLD |
−0.784 | LLDLCEPCT | MLDLFCPCT | IDLCEPCT |
−0.773 | SHDLFCPCT | TotFA | MLDLFCPCT |
−0.770 | IDLPLPCT | MLDLPL | SHDLCE |
Top 10 synergistic triplets after removing ratio, mean, and sum triplets and before bias correction.
| O-information | Variable 1 | Variable 2 | Variable 3 |
|---|---|---|---|
−0.691 | UnSatDeg | SLDLTG | FAw3 |
−0.638 | UnSatDeg | TAG.16.0.18.0.18.1...1 | FAw3 |
−0.616 | PC.34.5 | LPC.20.5_sn1 | PC.32.2 |
−0.595 | PC.34.5 | LPC.20.5_sn2 | PC.32.2 |
−0.588 | UnSatDeg | VLDLTG | FAw3 |
−0.546 | LPC.18.2_sn2 | LPC.20.3_sn1 | PC.38.3..1 |
−0.536 | UnSatDeg | FAw3 | PE.40.4 |
−0.534 | UnSatDeg | FAw3 | XLVLDLTG |
−0.531 | LPC.20.5_sn1 | LPC.18.2_sn2 | PC.36.5b..1 |
−0.529 | PG.36.1 | UnSatDeg | FAw3 |
Top 10 synergistic triplets after removing ratio, mean, and sum triplets and after bias correction.
| O-information | Variable 1 | Variable 2 | Variable 3 |
|---|---|---|---|
−0.592 | UnSatDeg | SLDLTG | FAw3 |
−0.527 | UnSatDeg | TAG.16.0.18.0.18.1...1 | FAw3 |
−0.524 | PC.34.5 | LPC.20.5_sn1 | PC.32.2 |
−0.503 | PC.34.5 | LPC.20.5_sn2 | PC.32.2 |
−0.463 | UnSatDeg | VLDLTG | FAw3 |
−0.443 | LPC.18.2_sn2 | LPC.20.5_sn2 | PC.36.5b..1 |
−0.441 | LPC.20.5_sn1 | LPC.18.2_sn2 | PC.36.5b..1 |
−0.422 | UnSatDeg | FAw3 | XLVLDLTG |
−0.420 | LPC.18.2_sn2 | LPC.20.3_sn1 | PC.38.3..1 |
−0.403 | LPC.18.2_sn2 | LPC.22.4_sn1 | PC.40.4 |
Appendix E. Top 20 Synergistic Triplets after Bias Correction, with Indication Which Triplets Are Not Present before Bias Correction
Top 20 synergistic triplets after removing ratio, mean, and sum triplets and after bias correction. The newly introduced triplets in the top 20 rank after the bias correction are highlighted in green.
| O-information | Variable 1 | Variable 2 | Variable 3 |
|---|---|---|---|
−0.592 | UnSatDeg | SLDLTG | FAw3 |
−0.527 | UnSatDeg | TAG.16.0.18.0.18.1...1 | FAw3 |
−0.524 | PC.34.5 | LPC.20.5_sn1 | PC.32.2 |
−0.503 | PC.34.5 | LPC.20.5_sn2 | PC.32.2 |
−0.463 | UnSatDeg | VLDLTG | FAw3 |
−0.443 | LPC.18.2_sn2 | LPC.20.5_sn2 | PC.36.5b..1 |
−0.441 | LPC.20.5_sn1 | LPC.18.2_sn2 | PC.36.5b..1 |
−0.422 | UnSatDeg | FAw3 | XLVLDLTG |
−0.420 | LPC.18.2_sn2 | LPC.20.3_sn1 | PC.38.3..1 |
−0.403 | LPC.18.2_sn2 | LPC.22.4_sn1 | PC.40.4 |
−0.403 | UnSatDeg | FAw3 | PE.40.4 |
−0.399 | XXLVLDLTG | MLDLPL | SHDLCE |
−0.391 | PG.36.1 | UnSatDeg | FAw3 |
−0.389 | UnSatDeg | FAw3 | DAG.14.0.18.1. |
−0.386 | UnSatDeg | XXLVLDLTG | FAw3 |
−0.385 | UnSatDeg | PE.36.1 | FAw3 |
−0.382 | PC.34.5 | PC.32.2 | PC.36.5b..1 |
−0.375 | LacCer.d18.2.16.0. | PC.36.6 | PC.38.7 |
−0.373 | LPE.18.1_sn2 | PE.40.6 | PE.36.1 |
−0.365 | UnSatDeg | TAG.17.0.18.1.18.1. | FAw3 |
Appendix F. Variable Descriptions
The variable descriptions provided by the Young Finns Study are the following:
XLVLDLTG: Triglycerides in very large VLDL; mmol/L.
MLDLPL: Phospholipids in medium LDL; mmol/L.
SLDLTG: Triglycerides in small LDL; mmol/L.
XLHDLTG: Triglycerides in very large HDL; mmol/L.
LHDLPL: Phospholipids in large HDL; mmol/L.
LHDLCE: Cholesterol esters in large HDL; mmol/L.
LHDLTG: Triglycerides in large HDL; mmol/L.
MHDLC: Total cholesterol in medium HDL; mmol/L.
MHDLTG: Triglycerides in medium HDL; mmol/L.
SHDLP: Concentration of small HDL particles; mol/L.
SHDLCE: Cholesterol esters in small HDL; mmol/L.
LVLDLCPCT: Total cholesterol to total lipids ratio in large VLDL.
LVLDLCEPCT: Cholesterol esters to total lipids ratio in large VLDL.
LVLDLFCPCT: Free cholesterol to total lipids ratio in large VLDL.
IDLPLPCT: Phospholipids to total lipids ratio in IDL.
IDLCEPCT: Cholesterol esters to total lipids ratio in IDL.
IDLFCPCT: Free cholesterol to total lipids ratio in IDL.
LLDLCEPCT: Cholesterol esters to total lipids ratio in large LDL.
MLDLFCPCT: Free cholesterol to total lipids ratio in medium LDL.
SLDLTGPCT: Triglycerides to total lipids ratio in small LDL.
XLHDLTGPCT: Triglycerides to total lipids ratio in very large HDL.
LHDLTGPCT: Triglycerides to total lipids ratio in large HDL.
MHDLFCPCT: Free cholesterol to total lipids ratio in medium HDL.
MHDLTGPCT: Triglycerides to total lipids ratio in medium HDL.
SHDLPLPCT: Phospholipids to total lipids ratio in small HDL.
SHDLFCPCT: Free cholesterol to total lipids ratio in small HDL.
HDLD: Mean diameter for HDL particles; nm.
RemnantC: Remnant cholesterol (non-HDL, non-LDL cholesterol); mmol/L.
VLDLTG: Triglycerides in VLDL; mmol/L.
ApoBApoA1: Ratio of apolipoprotein B to apolipoprotein A-I.
TotFA: Total fatty acids; mmol/L.
UnSatDeg: Estimated degree of unsaturation.
FAw3: Omega-3 fatty acids; mmol/L.
DHAFA: Ratio of 22:6 docosahexaenoic acid to total fatty acids.
LPC.18.2_sn2: Lysophosphatidylcholine.
LPC.20.3_sn1: Lysophosphatidylcholine.
LPC.20.5_sn1: Lysophosphatidylcholine.
LPC.20.5_sn2: Lysophosphatidylcholine.
PC.38.3..1: Phosphatidylcholine.
PC.36.5b..1: Phosphatidylcholine.
PC.32.2: Phosphatidylcholine.
PC.34.5: Phosphatidylcholine.
TAG.16.0.18.0.18.1...1: Triacylglycerol.
References
1. Yu, S. A Simple Extended-Cavity Diode Laser. J. Neurosci.; 2011; 69, pp. 1236-1239.
2. Giusti, C.; Ghrist, R.; Bassett, D.S. Two’s company, three (or more) is a simplex: Algebraic-topological tools for understanding higher-order structure in neural data. J. Comput. Neurosci.; 2016; 41, pp. 1-14. [DOI: https://dx.doi.org/10.1007/s10827-016-0608-6] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/27287487]
3. Stramaglia, S.; Scagliarini, T.; Daniels, B.C.; Marinazzo, D. Quantifying Dynamical High-Order Interdependencies From the O-Information: An Application to Neural Spiking Dynamics. Front. Physiol.; 2021; 11, 595736. [DOI: https://dx.doi.org/10.3389/fphys.2020.595736] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/33519503]
4. Gatica, M.; Cofré, R.; Mediano, P.A.; Rosas, F.E.; Orio, P.; Diez, I.; Swinnen, S.P.; Cortes, J.M. High-Order Interdependencies in the Aging Brain. Brain Connect.; 2021; 11, pp. 734-744. [DOI: https://dx.doi.org/10.1089/brain.2020.0982] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/33858199]
5. Sanchez-Gorostiaga, A.; Bajić, D.; Osborne, M.L.; Poyatos, J.F.; Sanchez, A. High-order interactions distort the functional landscape of microbial consortia. PLoS Biol.; 2019; 17, e3000550. [DOI: https://dx.doi.org/10.1371/journal.pbio.3000550]
6. Wasserman, S.; Faust, K. Social Network Analysis: Methods and Applications; Cambridge University Press: Cambridge, UK, 1994.
7. Lucas, M.; Cencetti, G.; Battiston, F. Multiorder Laplacian for synchronization in higher-order networks. Phys. Rev. Res.; 2020; 2, 033410. [DOI: https://dx.doi.org/10.1103/PhysRevResearch.2.033410]
8. Quax, R.; Har-Shemesh, O.; Sloot, P.M.A. Quantifying Synergistic Information Using Intermediate Stochastic Variables. Entropy; 2017; 19, 85. [DOI: https://dx.doi.org/10.3390/e19020085]
9. Rosas, F.E.; Mediano, P.A.M.; Gastpar, M.; Jensen, H.J. Quantifying high-order interdependencies via multivariate extensions of the mutual information. Phys. Rev. E; 2019; 100, 032305. [DOI: https://dx.doi.org/10.1103/PhysRevE.100.032305]
10. Williams, P.L.; Beer, R.D. Nonnegative Decomposition of Multivariate Information. arXiv; 2010; arXiv: 1004.2515
11. Kunert-Graf, J.; Sakhanenko, N.; Galas, D. Partial Information Decomposition and the Information Delta: A Geometric Unification Disentangling Non-Pairwise Information. Entropy; 2020; 22, 1333. [DOI: https://dx.doi.org/10.3390/e22121333]
12. Scagliarini, T.; Marinazzo, D.; Guo, Y.; Stramaglia, S.; Rosas, F.E. Quantifying high-order interdependencies on individual patterns via the local O-information: Theory and applications to music analysis. Phys. Rev. Res.; 2022; 4, 013184. [DOI: https://dx.doi.org/10.1103/PhysRevResearch.4.013184]
13. Kolchinsky, A. A Novel Approach to the Partial Information Decomposition. Entropy; 2022; 24, 403. [DOI: https://dx.doi.org/10.3390/e24030403] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/35327914]
14. Finn, C.; Lizier, J.T. Generalised Measures of Multivariate Information Content. Entropy; 2020; 22, 216. [DOI: https://dx.doi.org/10.3390/e22020216] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/33285991]
15. Finn, C.; Lizier, J.T. Pointwise Partial Information Decomposition Using the Specificity and Ambiguity Lattices. Entropy; 2018; 20, 297. [DOI: https://dx.doi.org/10.3390/e20040297] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/33265388]
16. Ay, N. Information Geometry on Complexity and Stochastic Interaction. Entropy; 2015; 17, pp. 2432-2458. [DOI: https://dx.doi.org/10.3390/e17042432]
17. Niu, X.; Quinn, C.J. A Measure of Synergy, Redundancy, and Unique Information using Information Geometry. Proceedings of the 2019 IEEE International Symposium on Information Theory (ISIT); Paris, France, 7–12 July 2019; pp. 3127-3131. [DOI: https://dx.doi.org/10.1109/ISIT.2019.8849724]
18. Sparacino, L.; Faes, L.; Mijatović, G.; Parla, G.; Lo Re, V.; Miraglia, R.; de Ville de Goyet, J.; Sparacia, G. Statistical Approaches to Identify Pairwise and High-Order Brain Functional Connectivity Signatures on a Single-Subject Basis. Life; 2023; 13, 2075. [DOI: https://dx.doi.org/10.3390/life13102075]
19. Antonacci, Y.; Minati, L.; Nuzzi, D.; Mijatovic, G.; Pernice, R.; Marinazzo, D.; Stramaglia, S.; Faes, L. Measuring high-order interactions in rhythmic processes through multivariate spectral information decomposition. IEEE Access; 2021; 9, pp. 149486-149505. [DOI: https://dx.doi.org/10.1109/ACCESS.2021.3124601]
20. Faes, L.; Mijatovic, G.; Antonacci, Y.; Pernice, R.; Barà, C.; Sparacino, L.; Sammartino, M.; Porta, A.; Marinazzo, D.; Stramaglia, S. A new framework for the time- and frequency-domain assessment of high-order interactions in networks of random processes. IEEE Trans. Signal Process.; 2022; 70, pp. 5766-5777. [DOI: https://dx.doi.org/10.1109/TSP.2022.3221892]
21. Pirovano, I.; Antonacci, Y.; Mastropietro, A.; Bara, C.; Sparacino, L.; Guanziroli, E.; Molteni, F.; Tettamanti, M.; Faes, L.; Rizzo, G. Rehabilitation Modulates High-Order Interactions Among Large-Scale Brain Networks in Subacute Stroke. IEEE Trans. Neural Syst. Rehabil. Eng.; 2023; 31, pp. 4549-4560. [DOI: https://dx.doi.org/10.1109/TNSRE.2023.3332114]
22. Scagliarini, T.; Nuzzi, D.; Antonacci, Y.; Faes, L.; Rosas, F.E.; Marinazzo, D.; Stramaglia, S. Gradients of O-information: Low-order descriptors of high-order dependencies. arXiv; 2022; arXiv: 2207.03581
23. Beirlant, J.; Dudewicz, E.; Gyor, L.; Meulen, E. Nonparametric Entropy Estimation: An Overview. Int. J. Math. Stat. Sci.; 1997; 6, pp. 17-39.
24. Miller, G. Note on the Bias of Information Estimates. Information Theory in Psychology: Problems and Methods; Quastler, H., Ed.; Free Press: Glencoe, IL, USA, 1955; p. 95.
25. Zahl, S. Jackknifing An Index of Diversity. Ecology; 1977; 58, pp. 907-913. [DOI: https://dx.doi.org/10.2307/1936227]
26. Grassberger, P. Entropy Estimates from Insufficient Samplings. arXiv; 2008; arXiv: physics/0307138
27. Wolpert, D.H.; Wolf, D.R. Estimating Functions of Probability Distributions from a Finite Set of Samples, Part 1: Bayes Estimators and the Shannon Entropy. arXiv; 1994; arXiv: comp-gas/9403001
28. Panzeri, S.; Senatore, R.; Montemurro, M.A.; Petersen, R.S. Correcting for the Sampling Bias Problem in Spike Train Information Measures. J. Neurophysiol.; 2007; 98, pp. 1064-1072. [DOI: https://dx.doi.org/10.1152/jn.00559.2007] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/17615128]
29. Raitakari, O.T.; Juonala, M.; Rönnemaa, T.; Keltikangas-Järvinen, L.; Räsänen, L.; Pietikäinen, M.; Hutri-Kähönen, N.; Taittonen, L.; Jokinen, E.; Marniemi, J. et al. Cohort profile: The Cardiovascular Risk in Young Finns Study. Int. J. Epidemiol.; 2008; 37, pp. 1220-1226. [DOI: https://dx.doi.org/10.1093/ije/dym225]
30. Patient Education: High Cholesterol and Lipids (Beyond the Basics). Available online: https://www.uptodate.com/contents/high-cholesterol-and-lipids-beyond-the-basics/ (accessed on 3 April 2024).
31. Harris, B. Colloquia Mathematica Societatis János Bolyai; North-Holland, János Bolyai Mathematical Society, Elsevier Science Publishing Company Inc.: Amsterdam, The Netherlands, 1975.
32. Morsch, L.; Farmer, S.; Cunningham, K.; Sharrett, Z.; Shea, K.M. 7.3: Calculating Degree of Unsaturation. Organic Chemistry; John Wiley & Sons: Hoboken, NJ, USA, 2015; Chapter 7
33. Oluk, C.A.; Karaca, O.B. 18-Functional food ingredients and nutraceuticals, milk proteins as nutraceuticals nanoScience and food industry. Nutraceuticals; Grumezescu, A.M. Nanotechnology in the Agri-Food Industry, Academic Press: Cambridge, MA, USA, 2016; pp. 715-759.
34. Barà, C.; Pernice, R.; Catania, C.A.; Hilal, M.; Porta, A.; Humeau-Heurtier, A.; Faes, L. Comparison of entropy rate measures for the evaluation of time series complexity: Simulations and application to heart rate and respiratory variability. Biocybern. Biomed. Eng.; 2024; 44, pp. 380-392. [DOI: https://dx.doi.org/10.1016/j.bbe.2024.04.004]
35. Soininen, P.; Kangas, A.J.; Würtz, P.; Tukiainen, T.; Tynkkynen, T.; Laatikainen, R.; Järvelin, M.R.; Kähönen, M.; Lehtimäki, T.; Viikari, J. et al. High-throughput serum NMR metabonomics for cost-effective holistic studies on systemic metabolism. Analyst; 2009; 134, pp. 1781-1785. [DOI: https://dx.doi.org/10.1039/b910205a]
36. Wong, G.; Barlow, C.K.; Weir, J.M.; Jowett, J.B.; Magliano, D.J.; Zimmet, P.; Shaw, J.; Meikle, P.J. Inclusion of plasma lipid species improves classification of individuals at risk of type 2 diabetes. PLoS ONE; 2013; 8, e76577. [DOI: https://dx.doi.org/10.1371/journal.pone.0076577]
37. Braicu, E.I.; Darb-Esfahani, S.; Schmitt, W.D.; Koistinen, K.M.; Heiskanen, L.; Pöhö, P.; Budczies, J.; Kuhberg, M.; Dietel, M.; Frezza, C. et al. High-grade ovarian serous carcinoma patients exhibit profound alterations in lipid metabolism. Oncotarget; 2017; 8, 102912. [DOI: https://dx.doi.org/10.18632/oncotarget.22076]
38. Ebrahim, S.; Papacosta, O.; Whincup, P.; Wannamethee, G.; Walker, M.; Nicolaides, A.N.; Dhanjil, S.; Griffin, M.; Belcaro, G.; Rumley, A. et al. Carotid Plaque, Intima Media Thickness, Cardiovascular Risk Factors, and Prevalent Cardiovascular Disease in Men and Women. Stroke; 1999; 30, pp. 841-850. [DOI: https://dx.doi.org/10.1161/01.STR.30.4.841]
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Abstract
Higher-order relationships are a central concept in the science of complex systems. A popular method of attempting to estimate the higher-order relationships of synergy and redundancy from data is through the O-information. It is an information–theoretic measure composed of Shannon entropy terms that quantifies the balance between redundancy and synergy in a system. However, bias is not yet taken into account in the estimation of the O-information of discrete variables. In this paper, we explain where this bias comes from and explore it for fully synergistic, fully redundant, and fully independent simulated systems of
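The O-information described in the abstract is, following Rosas et al. [9], the combination of Shannon entropies Ω(Xⁿ) = (n − 2)H(Xⁿ) + Σⱼ [H(Xⱼ) − H(Xⁿ₋ⱼ)], negative for synergy-dominated systems and positive for redundancy-dominated ones. As a minimal illustration (a naive plug-in estimator, not the bias-corrected estimation the paper studies), it can be computed from discrete samples as:

```python
from collections import Counter
from math import log2

def entropy(rows):
    # Plug-in (maximum-likelihood) Shannon entropy in bits of a list of
    # hashable joint outcomes.
    n = len(rows)
    return -sum(c / n * log2(c / n) for c in Counter(rows).values())

def o_information(data):
    # data: list of samples, each a tuple of n discrete variable values.
    # Implements O(X^n) = (n-2) H(X^n) + sum_j [H(X_j) - H(X^n_{-j})].
    n = len(data[0])
    omega = (n - 2) * entropy([tuple(s) for s in data])
    for j in range(n):
        h_marginal = entropy([(s[j],) for s in data])          # H(X_j)
        h_rest = entropy([s[:j] + s[j + 1:] for s in data])    # H(X^n_{-j})
        omega += h_marginal - h_rest
    return omega

# Fully synergistic system: X3 = X1 XOR X2, all four joint states equiprobable.
xor_data = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
print(o_information(xor_data))   # -1.0 (synergy-dominated)

# Fully redundant system: three identical copies of one fair bit.
copy_data = [(0, 0, 0), (1, 1, 1)]
print(o_information(copy_data))  # 1.0 (redundancy-dominated)
```

With finitely many samples this plug-in estimate inherits the entropy-estimation bias that the paper analyses; it is shown here only to make the sign convention of the measure concrete.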
Details
1 Computational Science Lab, Informatics Institute, University of Amsterdam, 1098 Amsterdam, The Netherlands;
2 Unit of Medical Technology and Intelligent Information Systems (MEDLAB), Department of Material Science and Engineering, University of Ioannina, 45110 Ioannina, Greece;
3 Department of Clinical Chemistry, Faculty of Medicine and Health Technology, Tampere University, 33720 Tampere, Finland;
4 Finnish Cardiovascular Research Center Tampere, Faculty of Medicine and Health Technology, Tampere University, 33720 Tampere, Finland;
5 Centre for Population Health Research, University of Turku and Turku University Hospital, 20520 Turku, Finland;
6 Clinical Psychology, Faculty of Social and Behavioural Sciences, University of Amsterdam, 1018 Amsterdam, The Netherlands;
7 Computational Science Lab, Informatics Institute, University of Amsterdam, 1098 Amsterdam, The Netherlands;