This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1. Introduction
In genome-wide association studies (GWAS), genotype data from a large number of single nucleotide polymorphisms (SNPs) are collected to associate SNPs with traits of interest [1]. Not only single-gene effects but also interaction effects between genes play important roles in complex diseases such as hypertension, diabetes, and autism. By identifying gene-gene interactions (GGIs), we expect to increase the statistical power to detect associations. Moreover, we hope to clarify the biological pathways underlying human diseases by detecting interactions between loci [2].
In many cases, a single phenotype is considered, and there are various statistical methods for finding GGIs for univariate phenotypes. For qualitative traits, as in case-control studies, one simple way to identify genetic interactions is to fit a logistic regression model (LRM) that includes main effects and relevant interaction terms. However, LRMs perform poorly in high dimensions, where many genotype combinations contain few or no observations. Another well-known approach is the multifactor dimensionality reduction (MDR) method [3, 4], which reduces dimensionality by converting a high-dimensional model to a one-dimensional one. Each genotype combination is classified as either “high-risk” or “low-risk,” depending on its ratio of cases to controls. Thus, MDR avoids the issues of sparse data cells and overparameterization [2] and can outperform LRMs in detecting higher-order GGIs [5]. Recently, further approaches have been developed, such as one using multiple contingency tables (MODENDR) [6] and one using particle swarm optimization (PBMDR) [7].
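The high-risk/low-risk labeling at the core of MDR can be illustrated with a minimal sketch. The function name `mdr_labels` and the choice of the overall case:control ratio as the threshold are assumptions made for illustration, not details taken from [3, 4].

```python
from collections import Counter


def mdr_labels(genotypes, is_case):
    """Label each multilocus genotype cell 'high' or 'low' risk.

    genotypes: list of tuples, one tuple of genotype codes per sample.
    is_case:   list of booleans (True = case, False = control).
    Assumption: a cell is 'high' when its case:control ratio is at
    least the overall case:control ratio.
    """
    cases = Counter(g for g, c in zip(genotypes, is_case) if c)
    controls = Counter(g for g, c in zip(genotypes, is_case) if not c)
    n_cases = sum(cases.values())
    n_controls = sum(controls.values())
    labels = {}
    for cell in set(genotypes):
        # Cross-multiply so that empty control cells do not divide by zero.
        high = cases[cell] * n_controls >= controls[cell] * n_cases
        labels[cell] = "high" if high else "low"
    return labels
```

The returned dictionary is the one-dimensional attribute: every sample is mapped, through its genotype cell, to one of two labels.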
Owing to its superior performance, MDR now has various extensions, covering ordinal phenotypes, quantitative phenotypes, survival information, and odds-ratio-based analysis [8–11]. One specific extension, generalized MDR (GMDR), which is applicable to both dichotomous and continuous traits, was proposed in [12]. However, GMDR does not provide a computationally efficient algorithm that is easy to implement, and it still requires a dichotomous outcome in the data file [9]. As an alternative, quantitative MDR (QMDR) modified MDR’s constructive induction algorithm: it assigns a genotype to either the high- or low-risk group by comparing the local and global means and then applies a t-test to compare the means of the two groups. More recently, cluster-based MDR (CL-MDR), which is less sensitive to outliers and distributional assumptions, was also developed [13, 14]. Compared to QMDR, CL-MDR was shown to yield higher power when the phenotype distribution is skewed. However, CL-MDR was developed only for univariate phenotypes, not for multivariate phenotypes.
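As a rough illustration of the QMDR idea described above, the following sketch pools genotype cells by comparing each local mean with the global mean and then computes a two-sample t statistic. The function name is hypothetical, and the use of a pooled-variance (Student) t statistic is an assumption; the original QMDR software may differ in detail.

```python
import math
from collections import defaultdict
from statistics import mean, variance


def qmdr_t_statistic(genotypes, phenotype):
    """Pool cells into high/low groups by local vs. global mean,
    then return a pooled-variance two-sample t statistic."""
    overall = mean(phenotype)
    cells = defaultdict(list)
    for g, y in zip(genotypes, phenotype):
        cells[g].append(y)
    high, low = [], []
    for values in cells.values():
        # A cell is high-risk when its local mean reaches the global mean.
        (high if mean(values) >= overall else low).extend(values)
    if len(high) < 2 or len(low) < 2:
        raise ValueError("each group needs at least two observations")
    n1, n2 = len(high), len(low)
    pooled = ((n1 - 1) * variance(high) + (n2 - 1) * variance(low)) / (n1 + n2 - 2)
    return (mean(high) - mean(low)) / math.sqrt(pooled * (1 / n1 + 1 / n2))
```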
When considering multiple phenotypes, it becomes more difficult to find GGIs. Thus, most GWAS still focus on one trait to identify genetic variants associated with common complex traits, even though multiple phenotypes or repeated measurements of phenotypes are available. However, in the study of a complex disease, several correlated traits are often measured at the same time as risk factors for the disease. For example, it is known that intermediately correlated phenotypes, such as Factors VII, VIII, IX, XI, and XII and von Willebrand factor, jointly predict the risk of developing thrombosis [1, 9, 20]. By modeling multivariate disease-related traits, the power to detect associations between genes and diseases is expected to increase. Analyses of multiple traits have been successful for various complex diseases. In general, the multivariate approach has several advantages over the univariate approach, which considers one trait at a time. For example, the multivariate approach can consider several traits simultaneously in one model and hence can take into account the correlation among traits. As a result, the multivariate approach would have higher power to detect pleiotropic genes, and it can identify genetic variants not easily detected by the univariate approach [21].
Relatively little GGI research has addressed the multivariate-trait case. To deal with multiple phenotypes, generalized estimating equations (GEE)-GMDR extends the GMDR method using the GEE model [22]. Multi-QMDR, which extends QMDR to multivariate cases, has also been proposed [5]. Multi-QMDR classifies samples into high- vs. low-risk groups by using summary statistics, based mainly on principal component scores. After classification, the two groups’ mean vectors are compared using Hotelling’s T² statistic.
Recently, several MDR extensions based on fuzzy set theory were proposed [23–27]. These fuzzy set-based MDR methods replace the hard classification into high- or low-risk groups with a degree of membership in each group, thereby taking the uncertainty of the binary classification into account. They allow partial membership in the high- and low-risk groups through a membership function, which transforms the degree of uncertainty into a
Here, we propose a new method to detect GGIs for multiple quantitative traits. Its main idea lies in combining fuzzy clustering with a modified multifactor dimensionality reduction (MDR) approach; we name it “multivariate cluster MDR” (multi-CMDR). Like other MDR-based methods, multi-CMDR pools multiple genotype combinations into two groups and uses them as a new attribute, reducing a multidimensional space to one dimension. To classify genotype combinations, we first perform fuzzy k-means clustering and compute a threshold, the global ratio, as the ratio of the sums of the membership degrees of the two groups. Each multilocus genotype is then labeled by comparing its local ratio to the global ratio. Finally, multi-CMDR identifies the best genotype model using Hotelling’s T² statistic.
We first introduce the multi-CMDR method in detail in Section 2. We next present a simulation study in Section 3 to show the performance of the proposed method in comparison with other methods, such as multi-QMDR. For the phenotype distribution, multivariate normal and multivariate gamma distributions are considered. In Section 4, we apply our method, as an illustration, to data on three lipid-related phenotypes extracted from the GWA study of the Korean Association Resource (KARE) project. We end with some conclusions in Section 5.
2. Materials and Methods
In this section, we introduce a new procedure, multi-CMDR, for finding GGIs for multiple continuous phenotypes. Similar to other MDR-based methods, multi-CMDR pools multiple genotype combinations into two groups and uses them as a new attribute that reduces a multidimensional space into only one dimension. The detailed algorithm is described in Figure 1 and the multi-CMDR pseudocode is presented in Pseudocode 1.
Pseudocode 1: Pseudocode of multi-CMDR.
perform fuzzy k-means clustering with noise cluster for phenotypes
remove samples in noise cluster
compute global ratio
get all combinations of SNPs
divide samples into N folds
for k = 1 to N
    set samples in kth fold as test dataset and the other samples as training dataset
    for i = 1 to number of all combinations of SNPs
        get all combinations of genotypes
        for j = 1 to number of all combinations of genotypes
            compute local ratio
            classify each genotype combination into one of the two groups
        end j
        compute Hotelling’s T² statistic
    end i
    select the best SNP combination for the kth fold
end k
compute CVC and select SNP combination with highest CVC as the best SNP combination
compute p-value by permutation test for the best SNP combination
Step 0. Preprocessing.
(i) Suppose there are
(ii) Standardize all the phenotypes to have a mean of zero and unit variance.
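The standardization in Step 0 can be sketched as follows. The function names are hypothetical, and the use of the sample standard deviation (divisor n − 1) is an assumption, since the text does not say which divisor is used.

```python
from statistics import mean, stdev


def standardize(values):
    """Return z-scores: (x - mean) / sample standard deviation."""
    m = mean(values)
    s = stdev(values)
    if s == 0:
        raise ValueError("cannot standardize a constant phenotype")
    return [(x - m) / s for x in values]


def standardize_phenotypes(rows):
    """Standardize each phenotype column of a samples-by-phenotypes table."""
    columns = [standardize(col) for col in zip(*rows)]
    return [list(sample) for sample in zip(*columns)]
```

Standardizing first keeps phenotypes measured on different scales from dominating the distances used in the clustering step.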
Step 1. Perform fuzzy k-means clustering.
(i) Perform fuzzy k-means clustering with
such that
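For readers unfamiliar with fuzzy k-means with a noise cluster, the following minimal sketch follows the noise-clustering idea of Davé [28]: the noise cluster sits at a constant distance from every sample, so samples far from all regular centers receive most of their membership from it. The function name, the noise distance `delta`, the fuzzifier `m = 2`, the fixed iteration count, and the user-supplied initial centers are all assumptions made for illustration; the settings used by the authors are not recoverable from the text.

```python
def fuzzy_kmeans_noise(points, centers, delta=2.0, m=2.0, n_iter=50):
    """Fuzzy k-means with one extra noise cluster at constant distance delta.

    points:  list of equal-length tuples (standardized phenotypes).
    centers: initial centers, one tuple per regular cluster.
    Returns (memberships, centers). Each memberships[i] holds one degree
    per regular cluster followed by the noise degree; the row sums to 1.
    The memberships are those used for the final center update.
    """
    power = 1.0 / (m - 1.0)
    eps = 1e-12  # avoids division by zero when a sample sits on a center
    centers = [tuple(c) for c in centers]
    dim = len(points[0])
    memberships = []
    for _ in range(n_iter):
        memberships = []
        for x in points:
            d2 = [sum((a - b) ** 2 for a, b in zip(x, c)) + eps for c in centers]
            inv = [(1.0 / d) ** power for d in d2]
            inv.append((1.0 / delta ** 2) ** power)  # noise-cluster term
            total = sum(inv)
            memberships.append([v / total for v in inv])
        # Update regular centers with weights u^m; the noise cluster has none.
        new_centers = []
        for k in range(len(centers)):
            w = [u[k] ** m for u in memberships]
            sw = sum(w)
            new_centers.append(tuple(
                sum(wi * x[j] for wi, x in zip(w, points)) / sw
                for j in range(dim)))
        centers = new_centers
    return memberships, centers
```

With two regular clusters, each sample ends up with three degrees: one for each group and one for noise.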
Step 2. Trim the data and calculate the global ratio.
(i) Data are trimmed by removing all the samples in the noise cluster. The remaining samples have membership degrees for each of the two groups. Denote these two groups as
(ii) Calculate global ratio
where
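A minimal sketch of the trimming and global-ratio computation is given below, following the description in the Introduction that the threshold is the ratio of the sums of the membership degrees of the two groups. Treating a sample as belonging to the noise cluster when its noise degree is its largest degree is an assumption, as are the function name and the layout of each membership row (group 1, group 2, noise).

```python
def global_ratio(memberships):
    """Trim noise samples and return (ratio, kept_rows).

    memberships: rows of [u_group1, u_group2, u_noise].
    Assumption: a sample is in the noise cluster when the noise degree
    is strictly its largest degree.
    """
    noise = 2  # index of the noise degree in each row
    kept = [u for u in memberships
            if max(range(len(u)), key=lambda k: u[k]) != noise]
    s1 = sum(u[0] for u in kept)
    s2 = sum(u[1] for u in kept)
    return s1 / s2, kept
```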
Step 3. Divide the samples into N folds.
(i) For N-fold cross-validation (CV), split the samples randomly into N subgroups of equal size. Let N-1 subgroups be the training dataset and let the remaining subgroup be the test dataset used for evaluating the model.
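The fold construction in Step 3 can be sketched as below. The function name and the seeded shuffle are assumptions for illustration; when the sample size is not a multiple of N, the fold sizes in this sketch differ by at most one.

```python
import random


def n_fold_splits(n_samples, n_folds, seed=0):
    """Yield (train_indices, test_indices) for each of n_folds folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin into the folds.
    folds = [indices[k::n_folds] for k in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test
```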
Step 4. Calculate the local ratio.
(i) To find the
where
(ii) Label each genotype combination either “
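The labeling in Step 4 can be sketched as follows: within each genotype combination, the membership degrees of the two groups are summed, and their ratio is compared with the global ratio. The function name, the group labels 1 and 2, and the direction of the comparison (a local ratio at or above the global ratio gives group 1) are assumptions, because the original labels are not recoverable from the text.

```python
from collections import defaultdict


def label_cells(genotypes, memberships, global_ratio):
    """Assign each genotype combination to group 1 or group 2.

    genotypes:   one tuple of genotype codes per training sample.
    memberships: (u_group1, u_group2) for the same samples.
    """
    sums = defaultdict(lambda: [0.0, 0.0])
    for g, (u1, u2) in zip(genotypes, memberships):
        sums[g][0] += u1
        sums[g][1] += u2
    labels = {}
    for cell, (s1, s2) in sums.items():
        # Compare s1 / s2 with the global ratio by cross-multiplication.
        labels[cell] = 1 if s1 >= global_ratio * s2 else 2
    return labels
```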
Step 5. Calculate the test statistic.
(i) Calculate Hotelling’s
(ii) The model with the largest statistic in the training data is chosen as the best model. The statistic for the test data is computed later.
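For reference, the standard two-sample Hotelling’s T² statistic with a pooled covariance matrix can be computed as in the sketch below. This is the textbook form; whether the authors use exactly this variant is an assumption. The helper names are hypothetical, and a small Gaussian-elimination solver is used so that the example needs only the standard library.

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def scatter_matrix(rows, center):
    """Sum of outer products of deviations from the center."""
    p = len(center)
    s = [[0.0] * p for _ in range(p)]
    for r in rows:
        d = [a - b for a, b in zip(r, center)]
        for i in range(p):
            for j in range(p):
                s[i][j] += d[i] * d[j]
    return s


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def hotelling_t2(group1, group2):
    """Two-sample Hotelling's T^2 with pooled covariance."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = mean_vector(group1), mean_vector(group2)
    s1, s2 = scatter_matrix(group1, m1), scatter_matrix(group2, m2)
    p = len(m1)
    pooled = [[(s1[i][j] + s2[i][j]) / (n1 + n2 - 2) for j in range(p)]
              for i in range(p)]
    diff = [a - b for a, b in zip(m1, m2)]
    z = solve(pooled, diff)  # pooled^{-1} (m1 - m2)
    return n1 * n2 / (n1 + n2) * sum(d * v for d, v in zip(diff, z))
```

With a single phenotype, the statistic reduces to the square of the pooled two-sample t statistic.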
Step 6. Find the final best model and obtain the empirical p-value.
(i) Repeat Steps 4 and 5 N times, once for each fold, and count the number of times each SNP combination is selected as the best model. We call this count the cross-validation consistency (CVC).
(ii) Find the final best interaction model, i.e., the one with the largest CVC.
(iii) Derive the final statistic for the best model by averaging N
(iv) To evaluate the statistical significance of the best model, perform a permutation test and obtain the empirical p-value. Generate
where
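The CVC count and an empirical permutation p-value can be sketched as below. The function names are hypothetical. The p-value estimator shown, (1 + number of permuted statistics at least as large as the observed one) / (1 + number of permutations), is a common conservative choice and is an assumption; the authors' exact formula is not recoverable from the text.

```python
from collections import Counter


def cross_validation_consistency(best_per_fold):
    """Return (best SNP combination, its CVC) from per-fold winners."""
    counts = Counter(best_per_fold)
    best, cvc = max(counts.items(), key=lambda kv: kv[1])
    return best, cvc


def empirical_p_value(observed, permuted):
    """Share of permuted statistics at least as large as the observed one,
    with the add-one correction so that the estimate is never zero."""
    exceed = sum(1 for t in permuted if t >= observed)
    return (1 + exceed) / (1 + len(permuted))
```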
3. Results and Discussion
3.1. Simulation Analysis
In this section, we conducted simulations to compare the performance of the proposed multi-CMDR method with that of the multi-QMDR and univariate QMDR methods. We also evaluated two variants of multi-CMDR. One is a nontrimmed variant, in which no noise cluster is generated in the fuzzy clustering step. The other uses k-means clustering, without considering membership scores. For multi-QMDR, the first principal component (FPC) was used to classify each cell into the high- or low-risk group, as previously described [5]. For the univariate approach, QMDR was performed for each phenotype separately. All of these methods were compared in terms of their hit-ratios, i.e., the proportion of simulated datasets in which the best model is the true causal SNP pair.
We generated phenotypes from a multivariate normal distribution and from a multivariate gamma distribution. We used 70 different penetrance functions that define a probabilistic relationship with the disease-causal interaction. The models covered 7 heritability values (0.01, 0.025, 0.05, 0.1, 0.2, 0.3, and 0.4) and 2 minor allele frequencies (MAFs; 0.2 and 0.4). Five models were considered for each of the 14 heritability-MAF combinations, giving a total of 70 models. The details of the 70 penetrance functions are given in [29]. For each of the 70 models, 100 datasets were generated. For each dataset, the sample size was 400, and we considered 20 SNPs and 2 continuous phenotypes. SNP1 and SNP2 were the disease-causal interacting SNPs. We used 10-fold cross-validation to determine the best overall model.
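The hit-ratio used to compare the methods can be computed as in this short sketch. The function name is hypothetical; the pair is compared without regard to order, which is an assumption about how a match is scored.

```python
def hit_ratio(best_models, causal_pair=("SNP1", "SNP2")):
    """Fraction of datasets whose best model equals the causal SNP pair."""
    target = frozenset(causal_pair)
    hits = sum(1 for model in best_models if frozenset(model) == target)
    return hits / len(best_models)
```

With 100 datasets per model, as in the design above, the hit-ratio for a given method and model is the number of hits divided by 100.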
3.1.1. Multivariate Normal Distribution
For the multivariate normal distributed case, two continuous phenotype values,
The hit-ratios for each heritability value are reported in Figure 2. In the bivariate normal case, all the multivariate methods were generally more powerful than the univariate QMDR methods. As the correlation increased, however, the difference between the multivariate and univariate methods decreased. All multivariate methods showed similar performance. In the case of zero correlation, multi-QMDR performed slightly better than multi-CMDR. The hit-ratios of multi-CMDR with trimming were similar to those of multi-CMDR without trimming; that is, trimming outliers had no effect on multi-CMDR in the bivariate normal case. The lower the correlation, the higher the hit-ratio when the heritability was 0.05, 0.1, or 0.2. This is because the lower the correlation, the more unique information each phenotype carries. Similarly, when the correlation was high, the hit-ratios of the multivariate and univariate methods were similar.
[figure omitted; refer to PDF]
3.1.2. Multivariate Gamma Distribution
For the skewed distribution, we generated a bivariate gamma distribution using a Gaussian copula [30]. In the Gaussian copula, the correlation matrix is responsible for the dependence. We used the same correlation structure as for the bivariate normal case. When the marginal distributions are continuous, a bivariate distribution can be defined by a density of the following form:
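For orientation, the standard Gaussian copula construction is stated here in generic notation, which may differ from the notation used by the authors. The symbols are assumptions: F_1, F_2 and f_1, f_2 denote the marginal (here gamma) distribution and density functions, R the 2 × 2 correlation matrix, I the identity matrix, and Φ⁻¹ the standard normal quantile function.

```latex
f(y_1, y_2) = c\bigl(F_1(y_1), F_2(y_2)\bigr)\, f_1(y_1)\, f_2(y_2),
\qquad
c(u_1, u_2) = |R|^{-1/2}
  \exp\!\Bigl(-\tfrac{1}{2}\, q^{\top} \bigl(R^{-1} - I\bigr)\, q\Bigr),
\qquad
q = \bigl(\Phi^{-1}(u_1),\, \Phi^{-1}(u_2)\bigr)^{\top}
```

When R is the identity matrix, the copula density equals 1 and the two phenotypes are independent.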
In Figure 2, we observed that the proposed multi-CMDR outperformed QMDR and multi-QMDR over the whole range of heritability in the bivariate gamma case. Multi-CMDR without trimming also performed better than multi-QMDR. For the bivariate gamma distribution, the lower the correlation, the higher the overall hit-ratio. The difference in hit-ratios between multi-CMDR and the other methods was greatest when the heritability was 0.1. As the correlation increased, the differences between the hit-ratios of the multivariate methods other than multi-CMDR decreased.
To sum up, the power of the proposed multi-CMDR was similar to that of multi-QMDR for symmetric distributions, whereas it outperformed multi-QMDR for skewed distributions. Moreover, the two variants of multi-CMDR also had slightly higher power than multi-QMDR for skewed phenotype distributions. In all situations, multivariate methods performed better than univariate methods. Results for each combination of the two minor allele frequency (MAF) values and 5 models are presented in the supplemental materials (Supplemental Figures 1-6).
3.1.3. Empirical False Positive Rate
We also computed the empirical false positive rate. To do so, we permuted phenotypes over individuals in each case to generate null data. The selection rate of each SNP pair in null data is
3.2. Real Biological Data Analysis
For the real data analysis, data on three lipid-related phenotypes, retrieved from the Korean Association Resource (KARE) project [31], were used to evaluate the proposed multi-CMDR. The three phenotypes were high-density lipoprotein cholesterol (HDL), low-density lipoprotein cholesterol (LDL), and triglyceride (TG). After removing observations with at least one missing phenotype value, 8,581 samples remained. The largest absolute correlation between the three phenotypes was 0.39 (Figure 3). Among 344,596 SNPs, we used the 324 SNPs selected in [5] for this analysis.
[figure omitted; refer to PDF]
We then applied the proposed multi-CMDR to search for the best second-order interaction model, again using 10-fold CV. Table 1 displays the best
Table 1
Best models from
Order | rs ID | Chr. | CVC | Hotelling’s T² | p-value | Ref. |
---|---|---|---|---|---|---|
1 | rs11066280 | 12 | 4 | 2.86 | <0.001 | [5, 15] |
1 | rs10503669 | 8 | 4 | 2.79 | <0.001 | [16] |
1 | rs2074356 | 12 | 2 | 2.82 | <0.001 | [1] |
2 | rs11216126, rs4244457 | 11, 8 | 4 | 3.86 | <0.001 | [5, 17] |
2 | rs11600380, rs10503669 | 11, 8 | 3 | 3.54 | <0.001 | [16, 18] |
2 | rs11216126, rs10503669 | 11, 8 | 1 | 3.29 | <0.001 | [17, 18] |
2 | rs16940212, rs10503669 | 15, 8 | 1 | 3.57 | <0.001 | [18, 19] |
2 | rs16940212, rs4244457 | 15, 8 | 1 | 2.78 | <0.001 | [5, 19] |
For
For
4. Discussion
For GGI analysis of multiple quantitative traits, we proposed multi-CMDR. Analyzing correlated multivariate phenotypes has been shown to have higher power to detect susceptibility genes and GGIs by using more information from the data [32]. The main difference between multi-QMDR and multi-CMDR lies in how groups are defined for each genotype combination cell. Multi-QMDR uses summary scores obtained by principal component analysis to classify high-risk and low-risk groups. The observations of each cell are assigned to the high-risk group if the local mean is greater than or equal to the global mean; otherwise, the observations are assigned to the low-risk group. Multi-CMDR, on the other hand, divides groups using clustering. By comparing the global and local ratios, calculated from the membership degrees obtained through fuzzy k-means clustering, the observations of each cell are assigned to
The proposed multi-CMDR was shown to be less sensitive to outliers and nonsymmetric distributions than other methods. 10-fold cross-validation and Hotelling’s
In terms of computation time, multi-QMDR was slightly faster than multi-CMDR. Using an AMD Ryzen 2700x desktop machine with 16 GB of RAM, multi-QMDR took 145.8841 seconds on average (100 repetitions) to conduct the real data analysis for the first-order interaction, whereas multi-CMDR took 162.7906 seconds on average. For a simulated dataset with a sample size of 400 and 20 SNPs, multi-QMDR took 17.3334 seconds on average to conduct the
5. Conclusion
For the analysis of GGIs associated with multiple quantitative traits, we proposed a new extension of the MDR algorithm that includes clustering. Using fuzzy k-means clustering, we divided samples into two groups and trimmed the outliers assigned to the noise cluster. Fuzzy k-means clustering can capture numerous attributes of multivariate data, so the proposed multi-CMDR uses values calculated from the clusters to set the threshold for assigning observations to specific groups. Unlike k-means clustering, where each observation is assigned to only one cluster, fuzzy k-means clustering provides each observation with a degree of membership in each cluster. Fuzzy k-means clustering is especially useful when the cluster boundary is not clear; it also allows outliers to be collected in a noise cluster and reflects the individual membership degrees of elements in the same cluster. We expect that multi-CMDR will improve the identification of gene-gene interactions associated with numerous multifactorial human pathologies.
Disclosure
This paper was presented at the 2018 annual meeting of the Western North American Region of the International Biometric Society (WNAR), Edmonton, Canada. Our earlier work on univariate CL-MDR was presented at the 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Kansas City, USA.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this article.
[1] S. Basu, Y. Zhang, D. Ray, M. B. Miller, W. G. Iacono, M. McGue, "A Rapid Gene-Based Genome-Wide Association Test with Multivariate Traits," Human Heredity, vol. 76 no. 2, pp. 53-63, DOI: 10.1159/000356016, 2013.
[2] H. J. Cordell, "Detecting gene-gene interactions that underlie human diseases," Nature Reviews Genetics, vol. 10 no. 6, pp. 392-404, DOI: 10.1038/nrg2579, 2009.
[3] M. D. Ritchie, L. W. Hahn, N. Roodi, L. R. Bailey, W. D. Dupont, F. F. Parl, J. H. Moore, "Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer," American Journal of Human Genetics, vol. 69 no. 1, pp. 138-147, DOI: 10.1086/321276, 2001.
[4] L. W. Hahn, M. D. Ritchie, J. H. Moore, "Multifactor dimensionality reduction software for detecting gene-gene and gene-environment interactions," Bioinformatics, vol. 19 no. 3, pp. 376-382, DOI: 10.1093/bioinformatics/btf869, 2003.
[5] W. Yu, M. Kwon, T. Park, "Multivariate Quantitative Multifactor Dimensionality Reduction for Detecting Gene-Gene Interactions," Human Heredity, vol. 79 no. 3-4, pp. 168-181, DOI: 10.1159/000377723, 2015.
[6] C. Yang, L. Chuang, Y. Lin, "Multiobjective differential evolution-based multifactor dimensionality reduction for detecting gene-gene interactions," 2017.
[7] C. Yang, H. Yang, L. Chuang, "PBMDR: A particle swarm optimization-based multifactor dimensionality reduction for the detection of multilocus interactions," Journal of Theoretical Biology, vol. 461, pp. 68-75, DOI: 10.1016/j.jtbi.2018.10.012, 2019.
[8] D. Gola, J. M. Mahachie John, K. van Steen, I. R. König, "A roadmap to multifactor dimensionality reduction methods," Briefings in Bioinformatics, vol. 17 no. 2, pp. 293-308, DOI: 10.1093/bib/bbv038, 2016.
[9] M. Germain, N. Saut, N. Greliche, C. Dina, J.-C. Lambert, C. Perret, W. Cohen, T. Oudot-Mellakh, G. Antoni, M.-C. Alessi, D. Zelenika, F. Cambien, L. Tiret, M. Bertrand, A.-M. Dupuy, L. Letenneur, M. Lathrop, J. Emmerich, P. Amouyel, D.-A. Trégouët, P.-E. Morange, "Genetics of venous thrombosis: insights from a new genome wide association study," PLoS ONE, vol. 6 no. 9, DOI: 10.1371/journal.pone.0025581, 2011.
[10] Y. Chung, S. Y. Lee, R. C. Elston, T. Park, "Odds ratio based multifactor-dimensionality reduction method for detecting gene-gene interactions," Bioinformatics, vol. 23 no. 1, pp. 71-76, DOI: 10.1093/bioinformatics/btl557, 2007.
[11] S. Yeoun Lee, Y. Chung, R. C. Elston, Y. Kim, T. Park, "Log-linear model-based multifactor dimensionality reduction method to detect gene-gene interactions," Bioinformatics, vol. 23 no. 19, pp. 2589-2595, DOI: 10.1093/bioinformatics/btm396, 2007.
[12] X.-Y. Lou, G.-B. Chen, L. Yan, J. Z. Ma, J. Zhu, R. C. Elston, M. D. Li, "A generalized combinatorial approach for detecting gene-by-gene and gene-by-environment interactions with application to nicotine dependence," American Journal of Human Genetics, vol. 80 no. 6, pp. 1125-1137, DOI: 10.1086/518312, 2007.
[13] Y. Lee, H. Kim, T. Park, M. Park, "Cluster-based multifactor dimensionality reduction method to identify gene-gene interactions for quantitative traits in genome-wide studies," Proceedings of the 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM '17), pp. 1772-1776.
[14] Y. Lee, H. Kim, T. Park, M. Park, "Gene-gene interaction analysis for quantitative trait using cluster-based multifactor dimensionality reduction method," International Journal of Data Mining and Bioinformatics, vol. 20 no. 1, DOI: 10.1504/IJDMB.2018.092155, 2018.
[15] N. Kato, F. Takeuchi, Y. Tabara, T. N. Kelly, M. J. Go, X. Sim, W. T. Tay, C.-H. Chen, Y. Zhang, K. Yamamoto, T. Katsuya, M. Yokota, Y. J. Kim, R. T. H. Ong, T. Nabika, D. Gu, L.-C. Chang, Y. Kokubo, W. Huang, K. Ohnaka, Y. Yamori, E. Nakashima, C. E. Jaquish, J.-Y. Lee, M. Seielstad, M. Isono, J. E. Hixson, Y.-T. Chen, T. Miki, X. Zhou, T. Sugiyama, J.-P. Jeon, J. J. Liu, R. Takayanagi, S. S. Kim, T. Aung, Y. J. Sung, X. Zhang, T. Y. Wong, B.-G. Han, S. Kobayashi, T. Ogihara, D. Zhu, N. Iwai, J.-Y. Wu, Y. Y. Teo, E. S. Tai, Y. S. Cho, J. He, "Meta-analysis of genome-wide association studies identifies common variants associated with blood pressure variation in east Asians," Nature Genetics, vol. 43 no. 6, pp. 531-538, DOI: 10.1038/ng.834, 2011.
[16] C. J. Willer, S. Sanna, A. U. Jackson, "Newly identified loci that influence lipid concentrations and risk of coronary artery disease," Nature Genetics, vol. 40 no. 2, pp. 161-169, DOI: 10.1038/ng.76, 2008.
[17] Y. J. Kim, M. J. Go, C. Hu, C. B. Hong, Y. K. Kim, J. Y. Lee, J.-Y. Hwang, J. H. Oh, D.-J. Kim, N. H. Kim, S. Kim, E. J. Hong, J.-H. Kim, H. Min, R. Zhang, W. Jia, Y. Okada, A. Takahashi, M. Kubo, T. Tanaka, N. Kamatani, K. Matsuda, T. Park, B. Oh, K. Kimm, D. Kang, C. Shin, N. H. Cho, H.-L. Kim, B.-G. Han, J.-Y. Lee, Y. S. Cho, "Large-scale genome-wide association studies in East Asians identify new genetic loci influencing metabolic traits," Nature Genetics, vol. 43 no. 10, pp. 990-995, DOI: 10.1038/ng.939, 2011.
[18] F. Asselbergs, Y. Guo, E. van Iperen, S. Sivapalaratnam, V. Tragante, M. Lanktree, L. Lange, B. Almoguera, Y. Appelman, J. Barnard, J. Baumert, A. Beitelshees, T. Bhangale, Y. Chen, T. Gaunt, Y. Gong, J. Hopewell, T. Johnson, M. Kleber, T. Langaee, M. Li, Y. Li, K. Liu, C. McDonough, M. Meijs, R. Middelberg, K. Musunuru, C. Nelson, J. O’Connell, S. Padmanabhan, J. Pankow, N. Pankratz, S. Rafelt, R. Rajagopalan, S. Romaine, N. Schork, J. Shaffer, H. Shen, E. Smith, S. Tischfield, P. van der Most, J. van Vliet-Ostaptchouk, N. Verweij, K. Volcik, L. Zhang, K. Bailey, K. Bailey, F. Bauer, J. Boer, P. Braund, A. Burt, P. Burton, S. Buxbaum, W. Chen, R. Cooper-DeHoff, L. Cupples, J. deJong, C. Delles, D. Duggan, M. Fornage, C. Furlong, N. Glazer, J. Gums, C. Hastie, M. Holmes, T. Illig, S. Kirkland, M. Kivimaki, R. Klein, B. Klein, C. Kooperberg, K. Kottke-Marchant, M. Kumari, A. LaCroix, L. Mallela, G. Murugesan, J. Ordovas, W. Ouwehand, W. Post, R. Saxena, H. Scharnagl, P. Schreiner, T. Shah, D. Shields, D. Shimbo, S. Srinivasan, R. Stolk, D. Swerdlow, H. Taylor, E. Topol, E. Toskala, J. van Pelt, J. van Setten, S. Yusuf, J. Whittaker, A. Zwinderman, S. Anand, A. Balmforth, G. Berenson, C. Bezzina, B. Boehm, E. Boerwinkle, J. Casas, M. Caulfield, R. Clarke, J. Connell, K. Cruickshanks, K. Davidson, I. Day, P. de Bakker, P. Doevendans, A. Dominiczak, A. Hall, C. Hartman, C. Hengstenberg, H. Hillege, M. Hofker, S. Humphries, G. Jarvik, J. Johnson, B. Kaess, S. Kathiresan, W. Koenig, D. Lawlor, W. März, O. Melander, B. Mitchell, G. Montgomery, P. Munroe, S. Murray, S. Newhouse, N. Onland-Moret, N. Poulter, B. Psaty, S. Redline, S. Rich, J. Rotter, H. Schunkert, P. Sever, A. Shuldiner, R. Silverstein, A. Stanton, B. Thorand, M. Trip, M. Tsai, P. van der Harst, E. van der Schoot, Y. van der Schouw, W. Verschuren, H. Watkins, A. Wilde, B. Wolffenbuttel, J. Whitfield, G. Hovingh, C. Ballantyne, C. Wijmenga, M. Reilly, N. Martin, J. Wilson, D. Rader, N. Samani, A. Reiner, R. Hegele, J. Kastelein, A. Hingorani, P. Talmud, H. Hakonarson, C. Elbers, B. Keating, F. Drenos, "Large-Scale Gene-Centric Meta-analysis across 32 Studies Identifies Multiple Lipid Loci," American Journal of Human Genetics, vol. 91 no. 5, pp. 823-838, DOI: 10.1016/j.ajhg.2012.08.032, 2012.
[19] M. J. Go, J. Hwang, D. Kim, "Effect of Genetic Predisposition on Blood Lipid Traits," Genomics & Informatics, vol. 10 no. 2, pp. 99-105, DOI: 10.5808/GI.2012.10.2.99, 2012.
[20] J. C. Souto, L. Almasy, M. Borrell, F. Blanco-Vaca, J. Mateo, J. M. Soria, I. Coll, R. Felices, W. Stone, J. Fontcuberta, J. Blangero, "Genetic susceptibility to thrombosis and its relationship to physiological risk factors: the GAIT study. Genetic Analysis of Idiopathic Thrombophilia," American Journal of Human Genetics, vol. 67 no. 6, pp. 1452-1459, DOI: 10.1086/316903, 2000.
[21] S. Oh, I. Huh, S. Y. Lee, T. Park, "Analysis of multiple related phenotypes in genome-wide association studies," Journal of Bioinformatics and Computational Biology, vol. 14 no. 05, DOI: 10.1142/S0219720016440054, 2016.
[22] H. Xu, X. Sun, T. Qi, W. Lin, N. Liu, X. Lou, Z. Yu, "Multivariate Dimensionality Reduction Approaches to Identify Gene-Gene and Gene-Environment Interactions Underlying Multiple Complex Traits," PLoS ONE, vol. 9 no. 9, DOI: 10.1371/journal.pone.0108103, 2014.
[23] H. Jung, S. Leem, S. Lee, T. Park, "A novel fuzzy set based multifactor dimensionality reduction method for detecting gene–gene interaction," Computational Biology and Chemistry, vol. 65, pp. 193-202, DOI: 10.1016/j.compbiolchem.2016.09.006, 2016.
[24] S. Leem, T. Park, "An empirical fuzzy multifactor dimensionality reduction method for detecting gene-gene interactions," BMC Genomics, vol. 18, DOI: 10.1186/s12864-017-3496-x, 2017.
[25] H. Jung, S. Leem, T. Park, "Fuzzy set-based generalized multifactor dimensionality reduction analysis of gene-gene interactions," BMC Medical Genomics, vol. 11 no. S2, pp. 11-20, DOI: 10.1186/s12920-018-0343-0, 2018.
[26] S. Leem, T. Park, "EFMDR-Fast: An Application of Empirical Fuzzy Multifactor Dimensionality Reduction for Fast Execution," Genomics & Informatics, vol. 16 no. 4, DOI: 10.5808/GI.2018.16.4.e37, 2018.
[27] C.-H. Yang, L.-Y. Chuang, Y.-D. Lin, "Epistasis Analysis using an Improved Fuzzy C-means-based Entropy Approach," IEEE Transactions on Fuzzy Systems, vol. PP no. L, 2019.
[28] R. N. Davé, "Characterization and detection of noise in clustering," Pattern Recognition Letters, vol. 12 no. 11, pp. 657-664, DOI: 10.1016/0167-8655(91)90002-4, 1991.
[29] D. R. Velez, B. C. White, A. A. Motsinger, W. S. Bush, M. D. Ritchie, S. M. Williams, J. H. Moore, "A balanced accuracy function for epistasis modeling in imbalanced datasets using multifactor dimensionality reduction," Genetic Epidemiology, vol. 30 no. 8, pp. 718-727, DOI: 10.1002/gepi.20211, 2007.
[30] Y. Stitou, N. Lasmar, Y. Berthoumieu, "Copulas based multivariate Gamma modeling for texture classification," Proceedings of the IEEE Int. Conf. Data Min, pp. 1045-1048.
[31] Y. S. Cho, M. J. Go, Y. J. Kim, J. Y. Heo, J. H. Oh, H.-J. Ban, D. Yoon, M. H. Lee, D.-J. Kim, M. Park, S.-H. Cha, J.-W. Kim, B.-G. Han, H. Min, Y. Ahn, M. S. Park, H. R. Han, H.-Y. Jang, E. Y. Cho, J.-E. Lee, N. H. Cho, C. Shin, T. Park, J. W. Park, J.-K. Lee, L. Cardon, G. Clarke, M. I. McCarthy, J.-Y. Lee, J.-K. Lee, B. Oh, H.-L. Kim, "A large-scale genome-wide association study of Asian populations uncovers genetic factors influencing eight quantitative traits," Nature Genetics, vol. 41 no. 5, pp. 527-534, DOI: 10.1038/ng.357, 2009.
[32] J. Choi, T. Park, "Multivariate generalized multifactor dimensionality reduction to detect gene-gene interactions," BMC Systems Biology, vol. 7 no. Suppl 6, DOI: 10.1186/1752-0509-7-S6-S15, 2013.
Copyright © 2019 Hyein Kim et al. This is an open access article distributed under the Creative Commons Attribution License (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. https://creativecommons.org/licenses/by/4.0/
Abstract
In complex diseases, including hypertension, diabetes, and autism, deleterious phenotypes are unlikely to be due to the effects of single genes but rather to gene-gene interactions (GGIs), which are widely analyzed by multifactor dimensionality reduction (MDR). Early MDR methods mainly focused on binary traits. More recently, several extensions of MDR have been developed for analyzing various traits such as quantitative traits and survival times. Newer technologies, such as genome-wide association studies (GWAS), have now been developed for assessing multiple traits, to simultaneously identify genetic variants associated with various pathological phenotypes. It has also been well demonstrated that analyzing multiple traits has several advantages over single-trait analysis. While there remains a need to find GGIs for multiple traits, such studies have become more difficult due to a lack of novel methods and software. Herein, we propose a novel multi-CMDR method, which combines fuzzy clustering and MDR, to find GGIs for multiple traits. Multi-CMDR showed power similar to that of existing methods when phenotypes followed bivariate normal distributions and better power than the others for skewed distributions. The validity of multi-CMDR was confirmed by analyzing real-life Korean GWAS data.
1 Department of Statistics, Korea University, Seoul, 02841, Republic of Korea
2 Department of Statistics, Seoul National University, Seoul, 08826, Republic of Korea
3 Department of Preventive Medicine, Eulji University, Daejeon, 34824, Republic of Korea