1. Introduction
The genome-wide association study (GWAS) has become the most popular strategy for dissecting complex traits of agronomic importance and human diseases, driven by the rise of microarray and next-generation sequencing (NGS) technologies and the development of linear mixed models [1,2]. GWAS has been applied to many complex human traits, including diabetes, cancer, and several inflammatory diseases, and has detected hundreds of novel genes [3,4]. Many studies have also been carried out successfully in plants, including the model plant Arabidopsis [5] and other species [6,7,8,9,10]. Several factors have contributed to this success, such as advances in high-throughput technology [11,12], the HapMap project [13], and the development of advanced statistical methodologies for GWAS [14]. However, for most traits, the loci identified by GWAS explain only a small part of the genetic variation [15]. Possible reasons include incomplete penetrance of single variants, poor power to identify rare variants associated with disease, and epistatic and/or gene–environment (G × E) interactions [16]. Moreover, significant single nucleotide polymorphisms (SNPs) can account for only a small part of the genetic contribution to complex traits or diseases [17].
The major problems in GWAS are confounding factors, including population stratification, familial correlation, and relatedness among individuals [18,19,20,21,22]. Statistical methods for correcting these confounders include the linear mixed model (LMM), also called the mixed linear model (MLM), as well as genomic control (GC), family-based association tests, structured association, and principal components analysis. LMMs handle these confounders better than the other methods [22]. They have proved useful in adjusting the inflation caused by many small genetic effects and in correcting the bias due to population structure [19,23,24]. LMM approaches model phenotypes with a combination of fixed and random effects: the effect of a candidate SNP is treated as fixed, and the random effects account for the polygenic background through a covariance matrix across individuals [22]. LMMs are extensively applied in the genetic analysis of quantitative traits in plants and humans [25]; they are attractive, familiar, and adaptable because they provide estimates of the individuals' genetic effects in GWAS [25,26]. Previous studies showed that mixed models accommodate population stratification well by modeling the phenotypic covariance that results from genetic relatedness among individuals, and that they perform well in GWAS [18,20,21,27,28]. A study that investigated epistatic and G × E interactions using an LMM showed that these interactions are crucial components of the genetic architecture of complex diseases [29].
Recently, several papers have highlighted the importance of LMMs in GWAS [26,30,31]. Each of these studies concentrated on a specific topic or field. However, no study has covered all of the LMM methods for GWAS that are currently available in the literature. Therefore, we aim to provide a thorough review of the available LMM methods for GWAS. First, we discuss diverse LMM approaches, including single-locus, multi-locus, multivariate/multi-trait, epistasis (G × G) and gene–environment (G × E) interaction, TWAS (transcriptome-wide association study), and longitudinal GWAS models. Then we present different packages and web-based software/server tools that use LMMs. We also discuss the advantages and weaknesses of the linear mixed models used in GWAS. Finally, we discuss future perspectives and the conclusions of the present study. Existing publications were collected from PubMed, Google Scholar, Web of Science, and other search engines, including Bing. Publications not associated with LMMs applied in GWAS were excluded from the present review.
2. Linear Mixed Models
LMMs can solve different problems, including population stratification, family structure, cryptic relatedness, estimating polygenic effect, and missing heritability (Figure 1).
LMMs were originally introduced to account for multiple levels of relatedness by using a kinship matrix, which greatly enhanced the performance of GWAS by reducing both the false-positive and the false-negative rates [18]. LMMs increase the power to discover QTNs (quantitative trait nucleotides) while controlling the false-positive rate in the presence of confounding factors, including population structure and cryptic relatedness [18]. Different types of LMMs, including single-locus, multi-locus, multi-trait/multivariate, gene-by-gene (G × G), and gene-by-environment (G × E) models, have been used in GWAS for dissecting complex traits (Figure 2). We briefly discuss each type of LMM in the following subsections.
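To make the role of the kinship matrix concrete, the following is a minimal sketch, assuming genotypes are coded as 0/1/2 allele counts and kinship is estimated as the average cross-product of standardized marker scores. This is one common estimator and not necessarily the one used in [18]; the function name `kinship` and the toy data are illustrative assumptions.

```python
from statistics import mean, pstdev


def kinship(genotypes):
    """Estimate an n x n kinship matrix from 0/1/2 genotype codes.

    genotypes: list of n individuals, each a list of m allele counts.
    Each marker is standardized to mean 0 and variance 1, and kinship is the
    average cross-product of standardized scores over polymorphic markers.
    """
    n = len(genotypes)
    columns = list(zip(*genotypes))           # one tuple per marker
    z = [[] for _ in range(n)]                # standardized scores per individual
    for col in columns:
        mu, sd = mean(col), pstdev(col)
        if sd == 0:                           # monomorphic marker: no information
            continue
        for i in range(n):
            z[i].append((col[i] - mu) / sd)
    used = len(z[0])
    if used == 0:
        raise ValueError("no polymorphic markers")
    return [[sum(a * b for a, b in zip(z[i], z[k])) / used for k in range(n)]
            for i in range(n)]


# Toy example: individuals 0 and 1 are identical, individual 2 is the opposite
# homozygote at every polymorphic marker (the second marker is monomorphic).
toy = [[0, 1, 2, 0],
       [0, 1, 2, 0],
       [2, 1, 0, 2]]
K = kinship(toy)
```

With standardized markers the diagonal of the matrix averages to one, so off-diagonal entries can be read as relatedness relative to the sample average: positive for pairs that share more alleles than average and negative for pairs that share fewer.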
2.1. LMMs for Single Locus Analysis
Since the first studies [18,33], many LMM methods have been proposed and applied in GWAS [19,20,21,23,27,32]. Following a previous study [19], a single-locus LMM for a phenotype measured across inbred strains can be written as follows:
$$y = X\beta + Zu + e \tag{1}$$

where $y$ is an $n \times 1$ vector of observed phenotypes; $X$ is an $n \times q$ matrix of fixed effects including the mean, SNPs, and other confounding variables; $\beta$ is a $q \times 1$ vector of fixed-effect coefficients; $Z$ is an $n \times t$ incidence matrix mapping every observed phenotype to one of the $t$ inbred strains; $u$ is the random effect with $\mathrm{Var}(u) = \sigma_g^2 K$, where $K$ is the $t \times t$ kinship matrix; and $e$ is an $n \times 1$ vector of residual effects such that $\mathrm{Var}(e) = \sigma_e^2 I$. Details about parameter estimation and the strategy for controlling the polygenic background in each single-locus model can be found in the respective papers.

Studies have confirmed that methods that control population structure and confounding factors perform better than those that do not [20,21]. For example, EMMA (efficient mixed-model association) is an LMM method used for adjusting for genetic relatedness and population structure in GWAS [19]. EMMA is more efficient than the classical LMM method because it uses spectral decomposition to simplify the computation. EMMAX (EMMA eXpedited), a variance component approach, decreased the computational time for analyzing big GWAS data sets [21]. CMLM (compressed MLM) and P3D (population parameters previously determined) avoid re-calculating the variance components for each marker, resulting in significantly decreased computational time and improved statistical power [20]. CMLM replaces the genetic effects of individuals with those of clusters of similar individuals, where similarity is derived from all available genetic markers [20]. Compared with the conventional LMM method, CMLM enhanced statistical power by 5–15% and decreased computational time. FaST-LMM (factored spectrally transformed linear mixed models) uses a subset of markers to model the polygenic effect, resulting in higher speed and lower memory use [27].
FaST-LMM substantially improves computational speed by using a rank-reduced kinship matrix, which is computed from a subset of genetic markers that is smaller than the number of individuals [34]. GRAMMAR (genome-wide rapid association using mixed model and regression) first calculates residuals using an LMM and then tests marker associations on these residuals [32].
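The ideas above (one spectral decomposition of the kinship matrix, and variance components estimated once rather than per marker) can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation of EMMA, EMMAX, or P3D: it assumes one phenotype per individual (so $Z = I$), an intercept as the only covariate, a grid search for the variance ratio $\delta = \sigma_e^2 / \sigma_g^2$, and a Wald test with a normal approximation. All function names and the simulated data are hypothetical.

```python
import math
import numpy as np


def kinship(G):
    """Kinship as the mean cross-product of column-standardized genotypes."""
    Z = (G - G.mean(axis=0)) / G.std(axis=0)
    return Z @ Z.T / G.shape[1]


def null_delta(y, s, U, grid=np.logspace(-3, 3, 61)):
    """Grid-search ML estimate of delta = sigma_e^2 / sigma_g^2 under the
    intercept-only model, working in the eigenbasis of K (K = U diag(s) U^T)."""
    n = len(y)
    yr, one = U.T @ y, U.T @ np.ones(n)
    best_ll, best_delta = -np.inf, None
    for delta in grid:
        w = 1.0 / (s + delta)                      # diagonal of (K + delta I)^-1
        mu = (one * w) @ yr / ((one * w) @ one)    # GLS estimate of the intercept
        sg2 = np.mean(w * (yr - one * mu) ** 2)    # ML estimate of sigma_g^2
        ll = -0.5 * (n * math.log(2 * math.pi * sg2)
                     + np.sum(np.log(s + delta)) + n)
        if ll > best_ll:
            best_ll, best_delta = ll, delta
    return best_delta


def lmm_scan(y, G, K):
    """Test each SNP in turn, keeping delta fixed at its null-model estimate."""
    n = len(y)
    s, U = np.linalg.eigh(K)
    s = np.clip(s, 0.0, None)                      # guard against tiny negative eigenvalues
    w = 1.0 / (s + null_delta(y, s, U))
    yr, one, Gr = U.T @ y, U.T @ np.ones(n), U.T @ G
    pvals = np.empty(G.shape[1])
    for j in range(G.shape[1]):
        X = np.column_stack([one, Gr[:, j]])
        XtWX = (X.T * w) @ X
        beta = np.linalg.solve(XtWX, (X.T * w) @ yr)
        resid = yr - X @ beta
        sg2 = np.sum(w * resid ** 2) / (n - 2)
        se = math.sqrt(sg2 * np.linalg.inv(XtWX)[1, 1])
        # two-sided p-value from the normal approximation to the Wald statistic
        pvals[j] = math.erfc(abs(beta[1]) / se / math.sqrt(2))
    return pvals


def simulate(n=300, m=200, causal=5, effect=1.5, seed=0):
    """Simulated genotypes and a phenotype with one causal SNP plus a polygenic term."""
    rng = np.random.default_rng(seed)
    G = rng.binomial(2, 0.3, size=(n, m)).astype(float)
    Z = (G - G.mean(axis=0)) / G.std(axis=0)
    u = Z @ rng.normal(0.0, math.sqrt(0.5 / m), size=m)   # Var(u) = 0.5 * K
    y = effect * G[:, causal] + u + rng.normal(size=n)
    return y, G, kinship(G)
```

Rotating the phenotype and genotypes by the eigenvectors of the kinship matrix makes the covariance diagonal, so each marker test reduces to a weighted least-squares fit. Fixing $\delta$ from the null model is what removes the per-marker re-estimation of variance components.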
RMLM (random-SNP-effect MLM) treats the SNP effect as random and allows a Bonferroni correction to be used to set the threshold p-value for significance tests [24]. In the next phase of GWAS, the identified markers are assessed simultaneously in a single model using an EM empirical Bayes approach [35]. ECMLM (enriched CMLM) allows researchers to choose among numerous algorithms for clustering individuals into groups and among several ways of deriving group kinship from individual kinship, resulting in increased statistical power in GWAS for complex traits [36]. FaST-LMM-Select is a simple empirical method that gives enhanced power and calibration [37]. First, it ranks the SNPs from the lowest to the highest p-value obtained by linear regression; it then constructs genetic similarity matrices from an increasing number of the top-ranked SNPs until it detects the first minimum in the GC factor (λGC). The GRAMMAR-Gamma method has been proposed as an analytical approximation within the framework of the score test [38]. This method gives unbiased estimates of the SNP effect, has power close to that of the LRT-based method, and can be used for large human cohorts in GWAS. Its computational burden is near the theoretical minimum, and its running time scales linearly with the sample size [38].
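Two quantities mentioned above, the Bonferroni threshold and the genomic control factor λGC, are simple to compute from a list of p-values. The sketch below assumes 1-df association tests and uses the usual definition of λGC as the median observed chi-square statistic divided by the median of the 1-df chi-square distribution (about 0.455). The function names are illustrative.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()


def chi2_1df_from_p(p):
    """Convert a p-value from a 1-df test into the equivalent chi-square statistic."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p-value must lie in (0, 1]")
    z = _STD_NORMAL.inv_cdf(p / 2.0)      # lower-tail quantile; its square is the statistic
    return z * z


def genomic_control_lambda(p_values):
    """lambda_GC: median observed statistic over the null median of chi-square(1)."""
    observed = median(chi2_1df_from_p(p) for p in p_values)
    expected = chi2_1df_from_p(0.5)       # approximately 0.455
    return observed / expected


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold that keeps the family-wise error at alpha."""
    return alpha / n_tests
```

A value of λGC close to 1 indicates well-calibrated test statistics, whereas values well above 1 suggest inflation, for example from uncorrected population structure. For one million tests at a family-wise level of 0.05, `bonferroni_threshold(1_000_000)` gives a per-test threshold of 5 × 10⁻⁸.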
SUPER (Settlement of MLM Under Progressively Exclusive Relationship) extracts a small subset of SNPs and uses them in FaST-LMM [39]. SUPER proceeds in several steps. First, the whole genome is split into small bins, each represented by its most significant marker. Subsequently, only the influential bins are selected, and an ML (maximum likelihood) method is used to optimize the size and the number of bins taken as possible QTNs underlying the phenotype. Finally, kinship among individuals is defined from this small set of markers, excluding those in LD (linkage disequilibrium) with the marker being tested, irrespective of physical distance [39]. SUPER is computationally fast and gains considerable statistical power compared with using the whole set of SNPs [39]. WarpedLMM (warped linear mixed model) extends the ordinary LMM by estimating an optimal transformation of the observed phenotype data for genetic analysis [25]. It can also be adapted to more specific tasks, such as the analysis of multiple loci or multiple phenotypes, and results demonstrated that transformations derived by WarpedLMM enhanced power and accuracy in GWAS. Recently, GMMAT (generalized linear mixed model association test) has been proposed, which is computationally efficient for analyzing binary traits in GWAS using a logistic mixed model [40]. GMMAT fits the logistic mixed model once per GWAS under the null hypothesis and then performs score tests, and it successfully controlled for population structure and relatedness when examining binary traits in various study designs [40]. LMM-Score (LMM employing the score test) is a new method proposed to identify the genetic loci of complex traits [1]. It employs a score test, which does not require estimating parameters under the full model. This method has more power and requires less computing time than the traditional LMM method in calculating trait heritability.
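The first two steps of SUPER described above (binning the genome and keeping the most significant marker per bin) can be sketched as follows. This is a simplified illustration, assuming a single chromosome, a fixed bin size, and a fixed number of retained bins; according to the description above, SUPER itself optimizes the bin size and number by maximum likelihood, which is not shown. Function names and the toy data are hypothetical.

```python
def top_marker_per_bin(positions, p_values, bin_size):
    """Map each bin index to the index of its most significant marker."""
    best = {}
    for idx, (pos, p) in enumerate(zip(positions, p_values)):
        b = pos // bin_size
        if b not in best or p < p_values[best[b]]:
            best[b] = idx
    return best


def select_pseudo_qtns(positions, p_values, bin_size, n_bins):
    """Keep the representatives of the n_bins most significant bins."""
    representatives = top_marker_per_bin(positions, p_values, bin_size).values()
    ranked = sorted(representatives, key=lambda i: p_values[i])
    return ranked[:n_bins]


# Toy example: six markers on one chromosome, bins of 1000 bp.
positions = [100, 900, 1500, 1800, 2500, 3100]
p_values = [0.2, 0.01, 0.5, 0.03, 1e-6, 0.4]
pseudo_qtns = select_pseudo_qtns(positions, p_values, bin_size=1000, n_bins=2)
# pseudo_qtns == [4, 1]: marker 4 represents bin 2, marker 1 represents bin 0
```

The selected markers would then be used to build the reduced kinship matrix, leaving out any marker in LD with the SNP under test.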
To help interested readers choose among the single-locus LMM methods, we list the top methods in descending order of their number of citations in Google Scholar:
EMMAX > EMMA > CMLM/P3D > FaST-LMM > GRAMMAR > GMMAT > RMLM > GRAMMAR-Gamma. The same ordering, from the most cited to the least cited method, is used in the other subsections of this manuscript to suggest methods that researchers may choose. The single-locus models and their respective software and packages for GWAS using LMM are given in Table 1.
2.2. LMMs for Multilocus Analysis
Most methods perform a one-dimensional genome scan by testing a single marker at a time, which requires a multiple-testing adjustment of the significance threshold. Several single-locus methods, such as EMMAX [21], P3D [20], FaST-LMM [27], and GEMMA [23], have been proposed to reduce the computational load. Most quantitative traits are controlled by a few genes with large effects and many polygenes with small effects [26]. However, most studies have used single-locus GWAS methods, including LMMs, and only a limited number of algorithms have been applied to multi-locus GWAS [41].
Multi-locus methods consider the information from all loci simultaneously and, because of their multi-locus nature, do not need multiple-testing correction [24]. Several LMM-based multi-locus methods, including MLMM, mrMLM, FASTmrEMMA, FASTmrMLM, and FarmCPU, have been proposed and demonstrate more statistical power than single-locus methods [24,26,42,43,44].
Following previous studies [24,43], a multi-locus model can be written as an extension of Equation (1) as follows:
$$y = X\beta + \sum_{i=1}^{m} z_i \gamma_i + u + e \tag{2}$$

where $z_i$ is a vector of genotype indicators for the $i$-th SNP; $\gamma_i$ is the effect of marker $i$, with $\gamma_i \sim N(0, \sigma_i^2)$; $u \sim \mathrm{MVN}(0, \sigma_g^2 K)$ is a vector of polygenic effects with a multivariate normal distribution with mean zero and variance described by the kinship matrix $K$; $e \sim \mathrm{MVN}(0, \sigma_e^2 I)$ is the residual error with identity matrix $I$; and the other notations are the same as in Equation (1). Details about parameter estimation and the strategy for controlling the polygenic background in each multi-locus model can be found in the respective papers. For example, a multi-locus model named MLMM (multi-locus mixed-model) uses forward inclusion and backward elimination to select loci [42]. Results showed that MLMM outperforms existing methods in terms of power and FDR (false discovery rate) when analyzing GWAS data for complex traits [42]. LMM-Lasso combines multivariate association analysis with correction for population structure [45]. It allows multiple loci with small effects to be detected jointly while accounting for potential structure between samples [45]. It is conceptually simple, computationally efficient, and scales to genome-wide settings. PUMA (Penalized Unified Multiple-locus Association) is a family of statistical procedures for GWAS data developed to discover weak associations that are not detected by conventional analytical approaches [46]. It can handle thousands of genetic markers in a single statistical model by employing a penalized ML framework based on a generalized linear model. Results showed that PUMA had improved power to identify weak associations compared with the usual GWAS analysis and earlier penalized methods [46].

Table 1. Single-locus models and their respective software and packages for GWAS using LMM.
Tool | Description | Link | Effect | Polygenic Background | Reference | ||||
---|---|---|---|---|---|---|---|---|---|
a | d | α | a | d | α | ||||
GRAMMAR | GRAMMAR is an alternate method to pedigree-founded QTL association mapping, which is quick and easy. It can handle millions of markers and is significantly faster than the evaluated genotype approach for association analysis. | ✓ | ✓ | [32] | |||||
EMMA | EMMA is an LMM-based method used to control population structure and genetic relatedness in GWAS. | ✓ | ✓ | [19] | |||||
CMLM/P3D | CMLM (compressed MLM) diminished the sample size into groups using the clustering method, P3D (population parameters previously determined), which removes the re-calculation of variance components. The combined application of these two methods prominently abridged computing time and retained/enhanced statistical power. | ✓ | ✓ | [20] | |||||
EMMAX | EMMAX is a variance component approach founded on the LMM method, which decreases the computational time for analysis of big GWAS data sets and is used for fixing sample structure in GWASs. | ✓ | [21] | ||||||
FaST-LMM | FaST-LMM, an LMM-based method, used the subset of markers to manage the polygenic effect, resulting in accelerated speed and less required memory for GWAS. | ✓ | ✓ | [27] | |||||
FaST-LMM-Select | FaST-LMM-Select is a simple method that shows that wisely choosing a reduced number of SNPs consistently enhances power, expands standardization, and decreases computational time. | ✓ | ✓ | [37] | |||||
GRAMMAR-Gamma | GRAMMAR-Gamma is an exceptionally fast variance component-based method that can be used for the massive human cohort in GWAS. It is established based on the analytical approximation within the context of the score test method. | ✓ | ✓ | [38] | |||||
WarpedLMM | WarpedLMM is an extension of the ordinary LMM that estimates an optimal transformation of the observed phenotype data for genetic analysis. This transformation increases power and accuracy in GWAS. | ✓ | ✓ | [25] | |||||
ECMLM | ECMLM, enriched CMLM, evaluates various kinship and clustering algorithms and then selects the most effective combination of kinship algorithm and grouping algorithm, resulting in increased power; it can be applied to complex traits. | ✓ | ✓ | [36] | |||||
SUPER | The SUPER method greatly decreases the number of genetic markers used to define individual relationships, resulting in fast computation and increased statistical power compared with using the whole set of SNPs. | ✓ | ✓ | [39] | |||||
RMLM | RMLM, random-SNP-effect MLM, treats the SNP-effect as random and uses Bonferroni correction to determine the p-value for significance. | ✓ | ✓ | [24] | |||||
GMMAT | GMMAT is an R package for carrying out association tests using GLMMs in GWAS and sequencing association studies. | ✓ | [40] | ||||||
LMM-Score | LMM-Score is a new method proposed to identify the genetic loci of complex traits. The simulation study showed that this method’s power increased and needed less computing time than the traditional LMM methods. | ✓ | ✓ | [1] |
Note: a: additive effect; d: dominant effect; α: allelic substitution effect, α = a + d(q − p), where p and q are the frequencies of alleles A and a, respectively. The effect and polygenic background in all tables were partially adopted from another study, described elsewhere [47].
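As a small worked example of the formula in the table note, the allelic substitution effect α can be computed directly from the additive effect, the dominance effect, and the allele frequency. The sketch assumes a biallelic locus, so that q = 1 − p; the function name is illustrative.

```python
def allelic_substitution_effect(a, d, p):
    """Return alpha = a + d(q - p) for a biallelic locus, where q = 1 - p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency p must lie in [0, 1]")
    q = 1.0 - p
    return a + d * (q - p)
```

For example, with a = 2, d = 1, and p = 0.3 (so q = 0.7), α = 2 + 1 × (0.7 − 0.3) = 2.4. When the two alleles are equally frequent (p = q = 0.5), or when there is no dominance (d = 0), α reduces to the additive effect a.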
mrMLM (multi-locus RMLM) uses markers selected by the RMLM method with a flexible selection criterion and is more reliable in QTN discovery and more precise in QTN effect estimation than RMLM and EMMA [24]. Recently, FASTmrEMMA, a fast multi-locus random-SNP-effect EMMA, has been proposed to improve on existing multi-locus GWAS methods [26]. It combines the MLM with the EMEB (expectation-maximization empirical Bayes) method: marker effects are treated as random in the scan, and the selected markers are then fitted jointly in a multi-locus model using EMEB [26]. The results showed that FASTmrEMMA is more reliable in QTN identification, has a smaller bias in QTN effect estimation, and needs less computation time than current methods, including SUPER, EMMA, CMLM, and ECMLM. The FarmCPU (Fixed and random model Circulating Probability Unification) model was proposed to remove confounding factors and is now frequently used in GWAS [44]. It improves statistical power, controls the false-positive rate, and requires less computing time than existing methods [44]. FASTmrMLM is a more robust method that integrates the previously proposed mrMLM [24] with GEMMA and a matrix transformation [43]. Compared with GEMMA, mrMLM, and FarmCPU, FASTmrMLM reduced computing time by more than 50%, improved statistical power in QTN discovery, and lowered the false-positive rate [43].
StepLMM (stepwise LMM) is a consistent, versatile, and computationally efficient method that can be applied to both GS (genomic selection) and GWAS [48]. It uses an LMM with a kinship matrix to control for population stratification, and the variance components are re-estimated by an efficient mixed-model method at each regression step. StepLMM uses the Bayesian information criterion as its convergence condition, a useful and rigorous measure for model assessment in GWAS [49]. A new multi-marker method called SGL-LMM was recently proposed, which combines SGL (sparse group lasso) and LMM to control for confounding factors in GWAS [50]. Results showed that SGL-LMM has improved power to detect marker associations in many settings and is suitable for GWAS [50]. To help interested readers choose among the multi-locus LMM methods, we list the top methods in descending order of their number of citations in Google Scholar:
BOLT-LMM > MLMM > FarmCPU > mrMLM > FASTmrEMMA > LMM-Lasso. Researchers could use these methods based on their research interests or data types. The multi-locus models and their respective software and packages for GWAS using LMM are given in Table 2.
Table 2. Multi-locus models and their respective software and packages for GWAS using LMM.
Tool | Description | Link | Effect | Polygenic Background | Reference | ||||
---|---|---|---|---|---|---|---|---|---|
a | d | α | a | d | α | ||||
MLMM | MLMM, a multi-locus mixed-model, is an LMM-based method for complex traits, which is computationally effective and shows excellent performance regarding power and FDR compared with existing methods. | ✓ | ✓ | ✓ | ✓ | [42] | |||
LMM-Lasso | LMM-Lasso links the benefits of LMM with Lasso regression, which is free of tuning parameters and efficiently corrects population structure. LMM-Lasso instantaneously detects potential causal variants and provides multi-marker-founded phenotype prediction from genotype. | ✓ | ✓ | [45] | |||||
PUMA | PUMA, a family of statistical procedures for GWAS data, has been proposed to detect weak associations that traditional methods cannot identify. It uses a penalized maximum likelihood framework based on a generalized linear model to handle thousands of markers simultaneously in a single statistical model. | ✓ | ✓ | [46] | |||||
BOLT-LMM | BOLT-LMM is an efficient LMM that is computationally fast and gains power by modeling more realistic, non-infinitesimal genetic architectures through a Bayesian mixture prior on marker effects. | ✓ | ✓ | [51] | |||||
mrMLM | mrMLM (multi-locus RMLM) uses markers selected by the RMLM method with a flexible selection criterion, and simulation results showed that mrMLM is more reliable in QTN discovery and more precise in QTN effect estimation than RMLM and EMMA. | ✓ | ✓ | [24] | |||||
FarmCPU | FarmCPU was formulated to control confounding factors, significantly enhance statistical power, and decrease computing time. | ✓ | ✓ | [44] | |||||
FASTmrEMMA | FASTmrEMMA is a multi-locus model that is more powerful in QTN identification and model fit, has a lower bias in QTN effect estimation, and needs less running time than existing single- and multi-locus methods. | ✓ | ✓ | [26] | |||||
StepLMM | StepLMM is a consistent, versatile, and computationally proficient method that can be applied to GS and GWAS. StepLMM has excellent efficiency in both GWAS and GS and is workable for agronomic breeding and human genomic studies. | ✓ | ✓ | [48] | |||||
FASTmrMLM | FASTmrMLM is a multi-locus method, which is a fast and authentic algorithm in GWAS and assures superior statistical power, high accuracy of estimates, and low false-positive rate. | ✓ | ✓ | [43] | |||||
SGL-LMM | SGL-LMM, a multi-marker method, combined SGL and LMM for controlling confounding factors in GWAS. It includes the effect of multiple markers and integrates biological group info as preceding evidence in the model. | ✓ | ✓ | [50] |
Note: a: additive effect; d: dominant effect; α: allelic substitution effect, α = a + d(q − p), where p and q are the frequencies of alleles A and a, respectively.
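The stepwise idea shared by methods such as MLMM and StepLMM in Section 2.2 (add one marker at a time and stop when an information criterion no longer improves) can be sketched as below. This is a deliberately simplified illustration and not the published algorithm of either method: it assumes the variance ratio `delta` is known and held fixed, whereas the text above notes that the variance components are re-estimated at each step, and it performs forward inclusion only. All names are hypothetical.

```python
import numpy as np


def whiten(K, delta):
    """Return W with W (K + delta I) W^T = I, built from the eigendecomposition of K."""
    s, U = np.linalg.eigh(K)
    return (U / np.sqrt(np.clip(s, 0.0, None) + delta)).T


def bic(y, X):
    """Bayesian information criterion of an ordinary least-squares fit."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * np.log(rss / n) + X.shape[1] * np.log(n)


def forward_select(y, G, K, delta, max_steps=5):
    """Forward inclusion of SNPs on kinship-whitened data, stopping when BIC stops improving."""
    W = whiten(K, delta)
    yw, Gw = W @ y, W @ G
    X = (W @ np.ones(len(y)))[:, None]        # whitened intercept column
    selected, best = [], bic(yw, X)
    for _ in range(max_steps):
        candidates = [j for j in range(G.shape[1]) if j not in selected]
        scores = [bic(yw, np.column_stack([X, Gw[:, j]])) for j in candidates]
        k = int(np.argmin(scores))
        if scores[k] >= best:                 # no candidate lowers the BIC: stop
            break
        best = scores[k]
        selected.append(candidates[k])
        X = np.column_stack([X, Gw[:, candidates[k]]])
    return selected
```

Whitening by the kinship-based covariance turns the mixed model into an ordinary regression, so every candidate marker is evaluated conditional on the markers already in the model. This conditioning is the reason multi-locus methods can separate nearby or correlated signals that a single-locus scan would test independently.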
2.3. Multivariate/Multi-Traits LMMs
Multivariate LMMs are commonly used in genetics to assess the association between SNPs and multiple correlated phenotypes because they are effective in controlling for relatedness among samples [23]. Many multi-trait models have long been used in quantitative genetics [52,53,54,55], but these approaches have rarely been used for GWAS. Multivariate LMMs are widely used in different fields of genetics, such as the identification of QTL [56], the evaluation of pleiotropy and genetic correlation among complex phenotypes [57,58,59], and the understanding of evolutionary patterns [60]. These models are attractive for GWAS not only because they account for sample relatedness and population stratification, but also because of the potential gain in power of multivariate GWAS [53,54,57,61,62,63] over univariate GWAS [18,19,20,21,23,27,64,65,66]. A multivariate/multi-trait model for analyzing associations between the $i$-th SNP and the $j$-th phenotype can be written as follows:
$$y_j = \beta_{ij} x_i + u_j + e_j \tag{3}$$

where $y_j$ is a vector of length $n$ with the $j$-th phenotype; $x_i$ is a vector of length $n$ with the genotypes of the $i$-th SNP; $\beta_{ij}$ is the effect of the $i$-th SNP on the $j$-th phenotype; $u_j$ contains the effect of population structure on the $j$-th phenotype; and $e_j$ is the residual error of the $j$-th phenotype. According to the single-locus LMM in Equation (1), each phenotype follows a multivariate normal distribution with mean $\beta_{ij} x_i$ and variance $\sigma_{g_j}^2 K + \sigma_{e_j}^2 I$, where $\sigma_{g_j}^2 + \sigma_{e_j}^2$ is the variance of the $j$-th phenotype. Details about the multi-trait model and the calculation of the variance-covariance matrix can be found elsewhere [67].

Korte et al. [57] first used multivariate LMMs for pairwise quantitative trait analysis in a human cohort. They proposed MTMM (multi-trait mixed model) for correlated phenotypes, which considers both the between- and within-trait variance components simultaneously for multiple traits to adjust for population stratification in GWAS [57]. MTMM performed better than single-trait LMMs in identifying loci and could also decompose the overall trait covariance into genetic and environmental components. Fitting multivariate LMMs requires a computationally demanding parameter estimation process, so their application has so far been limited to two traits [57,67,68]. GEMMA (genome-wide efficient mixed-model association) has been proposed for fitting multivariate LMMs; it offers more power and higher computational speed than previous methods such as GCTA [69] and WOMBAT [70] and can include more than two phenotypes in the model [23]. GEMMA also fits the Bayesian sparse LMM (BSLMM), which combines the benefits of LMMs and sparse regression, is robust across settings when estimating the proportion of variance in phenotypes explained (PVE), and performs well in phenotype prediction. It thus handles three types of models: univariate LMM, multivariate LMM, and BSLMM. GEMMA can accommodate a moderate number of phenotypes (2–10) and is considerably faster than MTMM.
mvLMM (matrix-variate linear mixed model), a further advance, uses a data transformation to reduce the computational time needed for ML inference in a multi-trait model [67]. Analysis of human data showed that mvLMM increased computational speed tenfold, making it practical for large populations in GWAS [67].
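As a worked illustration of the covariance structure that multi-trait methods have to estimate, consider two traits measured on the same $n$ individuals. The block below stacks the two per-trait models of Equation (3). The symbols $\mu_j$ (the mean vector of trait $j$, containing the SNP effect), $V_g$, $V_e$, and $\rho_g$, and the Kronecker-product layout, are notational assumptions made here for illustration and are not taken from the cited papers.

```latex
\begin{aligned}
\begin{pmatrix} y_1 \\ y_2 \end{pmatrix}
&\sim \mathrm{MVN}\!\left(
\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix},\;
V_g \otimes K + V_e \otimes I_n \right), \\[4pt]
V_g &= \begin{pmatrix} \sigma_{g_1}^2 & \sigma_{g_{12}} \\ \sigma_{g_{12}} & \sigma_{g_2}^2 \end{pmatrix},
\qquad
V_e = \begin{pmatrix} \sigma_{e_1}^2 & \sigma_{e_{12}} \\ \sigma_{e_{12}} & \sigma_{e_2}^2 \end{pmatrix}, \\[4pt]
\rho_g &= \frac{\sigma_{g_{12}}}{\sqrt{\sigma_{g_1}^2 \, \sigma_{g_2}^2}} .
\end{aligned}
```

The diagonal entries of $V_g$ and $V_e$ are the per-trait variance components of Equation (3), while the off-diagonal entries capture the genetic and environmental covariance between traits. The stacked covariance matrix has dimension $2n \times 2n$ (and $dn \times dn$ for $d$ traits), which is why fitting these models becomes computationally demanding as the number of traits grows.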
However, although various multivariate methods have been proposed to discover variants linked to more than one phenotype, many existing approaches do not account for population structure [71]. GAMMA (generalized analysis of molecular variance for mixed-model analysis) includes population structure in the model and can therefore analyze multiple phenotypes simultaneously while adjusting for population structure [71]. Results indicated that GAMMA is an improvement over former methods [19,72], which either fail to detect true signals or generate numerous false positives. Existing methods are each tailored to a particular setting in order to make the computations required by multi-trait mixed models tractable. LIMIX is a simple and effective LMM-based software package with a Python interface for multi-trait genetic analysis [73]. It allows genomic or environmental components to be modeled by combining different fixed effects. Its mixed models can easily be adapted to various applications with different observed and hidden covariates and flexible study designs. Results showed that LIMIX enhances power and prediction accuracy, particularly when stepwise multi-locus regression is incorporated into multi-trait models and when large numbers of traits are examined [73]. WOMBAT is software for the quantitative genetic analysis of continuous multi-trait data using REML (restricted maximum likelihood) [70]. It supports a variety of models, fitting several traits, fixed and random effects, specified genetic covariance structures, and reduced-rank approximations. WOMBAT is well suited to the analysis of big GWAS data sets, ensuring both computational efficiency and reliable maximization of the likelihood function [70].
Methods for assessing a set of variants are crucial for GWAS of complex traits [74]. A set test is a regression-based approach for assessing statistical dependence between a set of genetic variants and a quantitative trait of interest. Such a test can be carried out with an LMM by aggregating the additive effects of multiple variants into a single variance component [75]. Compared with single-variant methods, set tests reduce the number of genome-wide tests and are efficient when the causal variant is unobserved or when many causal variants are present [76]. However, earlier set tests did not account for confounding factors, which is a central problem when large genomic data sets are analyzed to increase statistical power. FaST-LMM-Set has been proposed to handle confounding in set tests; it is based on the LMM and uses two random effects [74]. It supports both the LRT (likelihood ratio test) and the score test, and the results showed that the LRT gives more power while controlling the type I error. A second random effect has recently been included in set tests to control for confounding factors, heritable background effects, and relatedness [74,77,78]. A useful set test named mtSet (multi-trait set test) has been proposed for joint analysis across multiple correlated traits while accounting for population structure and relatedness, and it can be applied to one or several traits in GWAS [75]. mtSet is based on a multivariate LMM with two variance components and is computationally efficient, facilitating genetic analysis in large cohorts [75]. SMMAT (set mixed-model association test) is a computationally efficient variant set test for continuous and binary traits [79]. It can be used in structured and related samples with various possible correlation structures from large-scale whole-genome sequencing studies.
Together with technological advances and analytical approaches for large-scale GWAS, SMMAT is expected to help researchers better understand complex traits and diseases in human genetic studies [79].
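The two-random-effect structure of LMM-based set tests described above can be written out as follows. This is a generic sketch and not the exact formulation of FaST-LMM-Set, mtSet, or SMMAT; the symbols $g_S$, $K_S$, and $\sigma_S^2$ are notation assumed here, with $K_S$ denoting a similarity matrix built only from the variants in the tested set $S$, and $K$ the genome-wide kinship matrix.

```latex
\begin{aligned}
y &= X\beta + g_S + u + e, \\
g_S &\sim \mathrm{MVN}(0,\ \sigma_S^2 K_S), \qquad
u \sim \mathrm{MVN}(0,\ \sigma_g^2 K), \qquad
e \sim \mathrm{MVN}(0,\ \sigma_e^2 I), \\
H_0 &: \sigma_S^2 = 0 \quad \text{versus} \quad H_1 : \sigma_S^2 > 0, \\
\mathrm{LRT} &= 2\,\bigl[\,\ell(\hat{\sigma}_S^2, \hat{\sigma}_g^2, \hat{\sigma}_e^2)
 - \ell(0, \tilde{\sigma}_g^2, \tilde{\sigma}_e^2)\,\bigr].
\end{aligned}
```

Here $\ell$ is the maximized log-likelihood, with hats denoting estimates under the alternative and tildes estimates under the null. The first random effect, $g_S$, carries the aggregated signal of the variant set, while the second, $u$, absorbs confounding from population structure and relatedness, so a single test of $\sigma_S^2$ replaces one test per variant. A score test evaluates the same hypothesis using only the null-model fit.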
LRT-based variance component (VC) methods [18,19,27] are the standard for genetic association analysis. VC methods have gained attention in the analysis of human complex traits and have been applied to various questions, including the heritable phenotypic variation explained by SNPs [69,80], its partitioning across chromosomes, allele frequencies, and functional annotations [81], and its correlation across traits [58]. Because LRT-based VC methods need to estimate all model parameters for each tested genetic marker, existing VC methods such as GCTA [69] become computationally intensive when the sample size exceeds 50,000. To overcome this problem, a two-step approach was suggested instead of the ordinary LRT [64]. The two-step approach approximates the LRT well when many loci of small effect contribute to the trait [21,64] and is computationally faster than the LRT-based approach. The BOLT-REML method is a much faster VC method and can handle large samples [82]. It uses a Monte Carlo average information REML algorithm [83], which approximates Newton-type maximization of the restricted log-likelihood with respect to the variance parameters being estimated [82]. GCTA and BOLT-REML use REML to estimate the genetic correlation between two traits of any kind, whereas mvLMM, which is similar to GEMMA, can accommodate only normally distributed traits [82]. Although all three approaches apply similar algorithms, BOLT-REML and mvLMM are more efficient than GCTA in terms of run time and memory use [67,82]. Another efficient LMM method, BOLT-LMM, needs just a few O(MN)-time iterations and gains power by modeling more realistic, non-infinitesimal genetic architectures through a Bayesian mixture prior on marker effects [51]. Results revealed that the gain in power increases with cohort size, which makes BOLT-LMM attractive for GWAS in large cohorts. Penalized-MTMM combines both the within- and between-trait variance components for multiple traits [84].
This method uses AI-REML to estimate the variance components and performs variable selection by applying the group MCP (minimax concave penalty) and point estimation by applying the sparse group MCP [84]. LiMMBo (linear mixed models with bootstrapping) has been proposed to enable computationally efficient joint genetic analysis of high-dimensional phenotypes [85]. It reduces the number of effective model parameters by introducing an intermediate subsampling step while still controlling well for population structure. It can be used to handle big GWAS data with hundreds of traits. Several of these multi-trait LMM methods are widely used. To help interested readers choose among the multivariate/multi-trait LMM methods, we list the top methods in descending order of their number of citations in Google Scholar:
GEMMA > WOMBAT > BOLT-REML > MTMM > LIMIX. The widely used method’s recommendation could help the users and readers make a quick decision and save time in analyzing their GWAS data using a multivariate/multi-traits LMM model. Multi-trait/multivariate model and their respective software and packages for GWAS using LMM are given in Table 3.
2.4. Linear Mixed Models in Epistasis (G × G) and Gene-Environment (G × E) Interaction
Although the different GWAS methods have produced many promising results, it is recognized that additive effects explain only part of the genetic variation [86]. Epistasis is considered a plausible source of the unexplained variation [87,88]. Much research on epistatic interactions has been carried out for complex human traits [89], suggesting that many interactions among genetic variants remain to be uncovered. Software such as INTERSNP [90], EpiGPU [91], FastEpistasis [92], EPIBLASTER [93], and TEAM [94], as well as other methods [88,95], has been proposed to test interactions between two loci in large omics datasets. An epistasis (G × G) and gene–environment (G × E) interaction model for mapping SNPs in a homozygous population, or transcripts/proteins/metabolites in a homozygous/heterozygous population, can be written for the k-th subject in the h-th environment as the following LMM [96]:
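$$y_{hk} = \mu + e_h + \sum_{i} q_i\,x_{ik} + \sum_{i<j} qq_{ij}\,x_{ijk} + \sum_{i} qe_{ih}\,u_{ihk} + \sum_{i<j} qqe_{ijh}\,u_{ijhk} + \varepsilon_{hk} \tag{4}$$
where $y_{hk}$ is the phenotype of the k-th individual in the h-th environment; $\mu$ is the population mean; $e_h$ is the fixed effect of the h-th ethnic population (environment); $q_i$ is the i-th locus effect with coefficient $x_{ik}$ (the genotype code in QTS mapping, or the expression value in QTT/P/M mapping); $qq_{ij}$ is the epistasis effect of locus $i$ × locus $j$ with coefficient $x_{ijk}$ (defined from the genotype codes in QTS mapping, or from the expression values in QTT/P/M mapping); $qe_{ih}$ is the locus × environment interaction effect of the i-th locus and the h-th environment with coefficient $u_{ihk}$; $qqe_{ijh}$ is the epistasis × environment interaction effect of locus $i$ × locus $j$ in the h-th environment with coefficient $u_{ijhk}$; and $\varepsilon_{hk}$ is the residual effect of the k-th individual in the h-th environment. The details of parameter estimation and the test statistics for the G × G and G × E interaction models can be found elsewhere [96].
Table 3. Multi-trait/multivariate models and their respective software and packages using LMM.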
Tool | Description | Link | Effect | Polygenic Background | Reference | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
a | d | α | ae | de | αe | a | d | α | ae | de | αe | ||||
WOMBAT | WOMBAT is a software package that analyzes multiple quantitative traits using REML. It is well-fitted to investigate big GWAS data sets and assure both computational effectiveness and accurate boosting of the likelihood function. | ✓ | ✓ | [70] | |||||||||||
GEMMA | GEMMA (genome-wide efficient mixed-model association) is used to calculate precise values of test statistics and is constructed on EMMA software. It can handle three types of models such as univariate and multivariate LMM and Bayesian sparse LMM. | ✓ | ✓ | [23] | |||||||||||
MTMM | MTMM is an LMM method for associated phenotypes considering both between and within-trait variance components concurrently for multiple traits for adjusting population stratification in GWAS. | ✓ | ✓ | ✓ | [57] | ||||||||||
FaST-LMM-Set | FaST-LMM-Set, a novel approach for set tests, can handle the confounding problem. It is based on the LMM and uses two random effects: the first random effect is used to capture the set association signal, and the second is used to control confounding factors. | ✓ | ✓ | [74] | |||||||||||
mtSet | Set tests are an effective approach for testing association between groups of genetic variants and a single quantitative trait. mtSet implements efficient set-test algorithms for joint analysis across multiple traits while accounting for confounding factors, including relatedness, and can be used for both single-trait and multi-trait GWAS. | ✓ | ✓ | [75] | |||||||||||
LIMIX | LIMIX, a simple and effective LMM-based software, can execute a wide range of genetic analyses for multi-trait using GWAS data. It can handle diverse functions, including single-locus and interaction association studies and variance decomposition studies with LMMs. | ✓ | ✓ | ✓ | [73] | ||||||||||
BOLT-REML | BOLT-REML uses the REML approach to estimate variance parameters for models with multiple variance components and multiple traits, overcoming the computational barriers that otherwise make large data sets impossible to analyze. | ✓ | ✓ | [82] | |||||||||||
mvLMM | mvLMM (matrix-variate linear mixed model) is a multiple-trait association mapping approach that uses a data transformation to reduce the computational time of inference in a multi-trait model, giving a ten-fold speed increase for large cohort analysis. | ✓ | ✓ | [67] | |||||||||||
GAMMA | GAMMA, a multivariate method, can simultaneously analyze many phenotypes while adjusting for population structure. It was reported to outperform alternative methods that either fail to find true effects or have a higher false-positive rate. | ✓ | ✓ | [71] | |||||||||||
LiMMBo | LiMMBo is a very easy and flexible method based on LMMs for multi-dimensional GWAS data with hundreds of phenotypes. It combines LMMs and bootstrapping for estimates of large trait covariance matrices. | ✓ | ✓ | [85] | |||||||||||
SGL-LMM | SGL-LMM combined SGL (sparse group lasso) and LMM for multivariate GWAS analysis. Results showed that the SGL-LMM improved the power to detect marker association in various settings. | ✓ | ✓ | [50] | |||||||||||
SMMAT | SMMAT is a computationally effective variant test for continuous and binary traits. SMMAT can be used in structured and related samples with various possible origins of correlations from large-scale whole-genome sequencing studies. | ✓ | ✓ | [79] |
Note: a: additive effect; d: dominant effect; α: allelic substitution effect, α = a + d(q − p), where p and q are the frequencies of alleles A and a, respectively; e: environmental effect; ae: additive-environment interaction effect; aa (aae): additive-additive epistatic effect (or interaction effect between aa and environment); ad: additive-dominant effect; da: dominant-additive effect; dd: dominant-dominant effect.
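As a rough illustration of how the terms of Equation (4) enter a design matrix, the following sketch builds the coefficient row for one individual observed in one environment. The −1/+1 genotype coding, the function name `design_row`, and the column ordering are assumptions made for this example only; they are not taken from QTXNetwork [96].

```python
from itertools import combinations


def design_row(genotypes, env, n_env):
    """Coefficient row for one individual observed in environment `env`.

    genotypes: locus codes (assumed -1/+1 for the two homozygotes).
    env:       0-based index of the environment.
    n_env:     total number of environments.
    Column order: mean, environment indicators, loci (q), locus pairs (qq),
    locus-by-environment (qe), pair-by-environment (qqe).
    A full set of environment indicators next to the mean is rank deficient,
    so estimation would need a constraint (e.g. effects summing to zero).
    """
    env_ind = [1 if h == env else 0 for h in range(n_env)]
    # Epistasis coefficients: products of the two locus codes.
    pairs = [genotypes[i] * genotypes[j]
             for i, j in combinations(range(len(genotypes)), 2)]
    # Interaction-with-environment coefficients: non-zero only in `env`.
    qe = [g * e for e in env_ind for g in genotypes]
    qqe = [p * e for e in env_ind for p in pairs]
    return [1] + env_ind + list(genotypes) + pairs + qe + qqe


# Three loci, second of two environments (hypothetical toy input).
row = design_row([1, -1, 1], env=1, n_env=2)
print(len(row), row)
```

With m loci and H environments the row has 1 + H + m + m(m − 1)/2 + H·m + H·m(m − 1)/2 columns, which shows why markers usually have to be screened before a full interaction model is fitted.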
FAM-MDR, a multifactor dimensionality reduction technique, detects epistasis in small or extended pedigrees [97]. It combines features of GRAMMAR with model-based multifactor dimensionality reduction, and it can handle large, complex pedigrees together with additional unrelated individuals [97]. The FAM-MDR methodology comprises two steps. In the first step, residuals are derived from a polygenic model, which removes both additive polygenic and confounding effects. In the second step, these residuals are treated as new traits, and a model-based MDR method tests the association between them and the multi-locus genotypes [98]. In this step, the p-value of the best model is estimated by randomly permuting the traits, under the assumption that the new traits are free of familial correlation [97]. Simulation and real data analyses showed that, compared with PGMDR, FAM-MDR handles multiple testing better, has higher power, and makes efficient use of all available information [97]. Zhang, Zhu, Tong, Zhu, Qi, and Zhu [96] developed an association analysis method based on a mixed linear model that can analyze epistasis (G × G) and G × E (genotype-by-environment) interaction. However, it cannot be applied directly to high-density SNP marker data, and the markers must first be screened before analysis. They implemented their method in software named QTXNetwork, which uses graphics processing units to analyze diverse genetic effects concurrently. QTXNetwork provides three functional modules: QTL identification, QTS (quantitative trait SNP) detection, and QTT/P/M (quantitative trait transcript/protein/metabolite) analysis. Simulation and real data analyses showed that QTXNetwork gives unbiased estimates of genetic effects [96].
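The permutation step in the second stage of FAM-MDR can be illustrated with a minimal sketch. It assumes that the residual traits from the polygenic model are already available and exchangeable, and it replaces the model-based MDR statistic with a much simpler stand-in: the absolute difference in mean residual between one multi-locus genotype cell and all other individuals. The function names and the toy data are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def group_contrast(residuals, in_cell):
    """Absolute difference in mean residual between one genotype cell and the rest."""
    inside = [r for r, flag in zip(residuals, in_cell) if flag]
    outside = [r for r, flag in zip(residuals, in_cell) if not flag]
    return abs(mean(inside) - mean(outside))


def permutation_p_value(residuals, in_cell, n_perm=999, seed=1):
    """Permute the residual traits and count statistics at least as extreme."""
    rng = random.Random(seed)
    observed = group_contrast(residuals, in_cell)
    shuffled = list(residuals)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if group_contrast(shuffled, in_cell) >= observed:
            hits += 1
    # Add-one correction so that the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Toy residuals: the first four individuals carry the multi-locus genotype.
residuals = [1.2, 0.9, 1.1, 1.4, -0.8, -1.0, -1.3, -0.7, -0.9, -1.1]
in_cell = [True] * 4 + [False] * 6
p_value = permutation_p_value(residuals, in_cell)
print(p_value)
```

Permuting the residuals rather than the raw phenotypes is what makes the exchangeability assumption plausible, because the familial correlation has been removed in the first step.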
One study showed that LMMs that model only genetic relatedness can control population structure but cannot control the inflation of test statistics for G × E [99]. To overcome this problem, the authors modelled, in addition to conventional genetic similarity, the similarity among related individuals who share the same environment, which otherwise produces spurious G × E interactions [99]. Another LMM-based method, iSet, includes G × E in the model while accounting for polygenic effects [98]. That study showed that modelling interactions with variants increased power, so that the method detected many previously unknown interactions [98]. Epistasis offers a practical route to investigating the genetic mechanisms underlying complex traits. However, computational efficiency is a major barrier to identifying interaction effects in real data, particularly when LMMs are used to control type I error, population structure, and cryptic relatedness [100]. REMMA (rapid epistatic mixed-model association) was proposed to address these issues; it builds on the relationship between GBLUP (genomic best linear unbiased prediction) and SNP-BLUP [100]. This model offers computational efficiency, a lower type I error rate, and higher QTL discovery power [100]. However, its computational complexity is O(n²), where n is the population size. The same group therefore proposed REMMAX (REMMA eXpedited) to reduce the computing time of epistatic GWAS. REMMAX can simultaneously test additive × additive, additive × dominance, and dominance × dominance interactions while controlling the background through several polygenic effects and individual-specific residual effects in the model [97]. In addition, the approximate REMMAX algorithm first filters out non-significant interactions and then applies a Wald test to the remainder, which shortens computation. As a result, the time complexity becomes approximately linear in the population size, and real data analysis showed that REMMAX is an efficient method for interpreting the genetic architecture of complex traits [101].
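The filter-then-test strategy described above can be sketched in a few lines. This is not the REMMAX algorithm itself: no polygenic background is modelled, the cheap screening statistic (n·r²) and the lenient threshold are assumptions chosen for illustration, and genotypes are assumed to be coded −1/+1 so that the product of two loci serves as the interaction term.

```python
from itertools import combinations


def centered(v):
    m = sum(v) / len(v)
    return [x - m for x in v]


def screen_stat(z, y):
    """Cheap score-type statistic n * r^2 between interaction term z and trait y."""
    zc, yc = centered(z), centered(y)
    szz = sum(a * a for a in zc)
    syy = sum(b * b for b in yc)
    szy = sum(a * b for a, b in zip(zc, yc))
    if szz == 0 or syy == 0:
        return 0.0
    return len(y) * szy * szy / (szz * syy)


def wald_stat(z, y):
    """Wald chi-square (beta / se)^2 from the regression y ~ 1 + z.

    Assumes z varies and the fit is not perfect (non-zero residual variance).
    """
    zc, yc = centered(z), centered(y)
    szz = sum(a * a for a in zc)
    syy = sum(b * b for b in yc)
    szy = sum(a * b for a, b in zip(zc, yc))
    beta = szy / szz
    resid_var = (syy - beta * szy) / (len(y) - 2)
    return beta * beta * szz / resid_var


def screen_then_test(genotypes, y, lenient=2.0):
    """Pass 1 drops weak pairs cheaply; pass 2 runs the Wald test on the rest."""
    kept = {}
    for i, j in combinations(range(len(genotypes)), 2):
        z = [a * b for a, b in zip(genotypes[i], genotypes[j])]
        if screen_stat(z, y) > lenient:
            kept[(i, j)] = wald_stat(z, y)
    return kept


# Toy data: the trait is driven by the SNP0 x SNP1 interaction plus small noise.
snps = [
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, 1, -1, -1, 1, 1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
]
trait = [1.1, 0.8, -0.95, -0.85, -1.1, -0.8, 0.95, 0.85]
results = screen_then_test(snps, trait)
print(results)
```

Only the pair that survives the screen incurs the cost of the second test, which is the source of the speed-up when most of the m(m − 1)/2 pairs carry no signal.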
G × E interaction analysis can detect genetic effects that are missed by standard linear models, increase GWAS power, and partly explain the missing heritability [102,103]. Another group proposed GxEMM, an integrative mixed model for polygenic interactions, to capture the aggregate effect of small G × E effects spread throughout the genome [104]. Importantly, the environmental variables need not be categorical, and different amounts of heritability can be assigned to different environments [105]. GxEMM can be applied to any GWAS dataset with relevant environmental variables and is especially useful for partitioning heritability into environment-specific components. It also addresses key biases of earlier methods for estimating G × E heritability; for example, GxEMM can accommodate general environmental variables, heterogeneous noise, and binary traits [104].
A wide range of phenotypes and environmental variables, such as nutrition, physical activity, or lifestyle covariates, can inform G × E interaction studies [103]. Phenotypes controlled by a single locus have been shown to interact with multiple environments. However, powerful approaches for the joint G × E analysis of multiple environmental variables have been lacking. StructLMM (structured LMM) was proposed for this purpose and is computationally efficient in detecting loci that interact with hundreds of environmental variables [103]. Compared with conventional fixed-effect G × E tests with single or multiple degrees of freedom, it is more powerful and more robust when large numbers of environmental variables are analyzed [103]. The method also provides per-individual estimates of the allelic effect sizes that reflect G × E interaction. Recently, the deep mixed model was proposed to model interactions between SNPs while adjusting for confounding factors in GWAS [106]. Grid-LMM is a scalable algorithm for repeatedly fitting complex LMMs that can include several sources of heterogeneity, including additive and dominance genetic variance and G × E interactions [107]. It has been applied to G × E analysis and to association testing for phenotypes shaped by non-additive genetic variation, both of which benefit from modelling multiple random effects [107]. Simulation and real data analyses showed that Grid-LMM increases the accuracy of association tests and the power to discover causal variants in GWAS [107]. It markedly reduces the computational load of genome-wide analyses and lets users choose the statistical model best suited to their data [107]. FFselect is an advanced LMM-based method for analyzing GWAS data that incorporates shared environmental effects in the model [108]. It partitions phenotypic variance into large genetic effects, small genetic effects, and environmental effects, which lets the user estimate the environmental variance [108]. It also provides insight into the genetic architecture of a trait based on the loci with larger genetic effects. Furthermore, the method adds auxiliary stopping criteria to the forward feature selection of pseudo QTNs to avoid overfitting [108]. It showed enhanced power, controlled the FDR effectively, and simultaneously adjusted for environmental factors, improving the effectiveness of GWAS. Another study evaluated overall G × E interaction using LMMs [109]. The authors simultaneously scored specific and general environmental effects for the fixed-effect terms representing G × E effects. The genomic inflation factor was controlled by including both G × E and G × T (genotype-by-trial) effects as random-effect terms [109]. The LMM approach was applied to tomato phenotype data collected in two seasons. It identified both QTLs with consistent effects across growing seasons and QTLs with G × E effects, and it discovered more QTLs with G × E effects than other LMM methods [109]. Recently, Li, et al. [110] established a compressed variance component mixed model framework, 3VmrMLM (three-variance-component mixed model), to detect QTNs, QTN × E interactions, and QTN × QTN interactions and to estimate all their effects while controlling all possible polygenic backgrounds. Simulation and real data analyses showed that 3VmrMLM has high power and accuracy and a low FDR [110]. Moreover, the model can handle multiple environments when detecting QTN × E interactions and performs variable selection under a polygenic background when detecting QTN × QTN interactions [110].
Many LMM methods for G × G and G × E interaction have been proposed, but the results obtained by different methods across environments are not stable. Researchers can use the recently developed 3VmrMLM, which considers all possible interactions and controls all possible polygenic backgrounds and may therefore give better results. The accompanying software, IIIVmrMLM [47], makes the analysis of GWAS data straightforward. G × G and G × E interaction methods and their respective software and packages for GWAS using LMM are given in Table 4.
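The core idea behind Grid-LMM, evaluating the likelihood on a fixed grid of variance-component proportions instead of optimizing them iteratively for every marker, can be sketched for the simplest case of one random effect. The sketch assumes that the phenotype has already been rotated by the eigenvectors of the relationship matrix K, that fixed effects have been removed, and that all eigenvalues are positive; the function names, the grid step, and the toy data are illustrative and not part of the Grid-LMM software.

```python
import math


def profile_loglik(h2, y_rot, eigvals):
    """Profile log-likelihood of y ~ N(0, s2 * (h2 * K + (1 - h2) * I)).

    y_rot are the phenotypes rotated into the eigenbasis of K, so the
    covariance is diagonal with entries s2 * (h2 * lambda_i + 1 - h2).
    The scale s2 is profiled out at its maximum-likelihood value.
    """
    n = len(y_rot)
    weights = [h2 * lam + (1.0 - h2) for lam in eigvals]
    s2 = sum(y * y / w for y, w in zip(y_rot, weights)) / n
    log_det = sum(math.log(w) for w in weights)
    return -0.5 * (n * math.log(2 * math.pi * s2) + log_det + n)


def grid_search_h2(y_rot, eigvals, step=0.05):
    """Return the grid value of h2 with the highest profile log-likelihood."""
    n_steps = int(round(1 / step))
    grid = [k * step for k in range(n_steps)]  # 0, step, ..., 1 - step
    return max(grid, key=lambda h2: profile_loglik(h2, y_rot, eigvals))


# Idealized toy data: squared rotated phenotypes equal their expected
# variance at h2 = 0.6, so the grid search should recover that value.
eigvals = [4.0, 2.5, 1.0, 0.3, 0.2]
y_rot = [math.sqrt(0.6 * lam + 0.4) for lam in eigvals]
h2_hat = grid_search_h2(y_rot, eigvals)
print(h2_hat)
```

Because the grid is shared by all markers, the expensive matrix operations for each grid point need to be done only once, and extending the grid to several random effects (for example additive, dominance, and G × E components) turns it into a multi-dimensional lattice of proportions.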
2.5. Linear Mixed Models in Transcriptome-Wide Association Studies (TWAS) and Longitudinal GWAS
GWAS has been effectively used to discover genetic variants associated with complex traits and diseases [111]. However, the mechanisms by which these variants act on complex traits remain unclear [111]. LMMs can handle many types of data, including omics, clustered, longitudinal, family-based GWAS, expression, TWAS, and meta-data (Figure 3). Recent studies assume that genetic variants regulate complex traits by affecting cellular traits such as protein abundance and gene expression [112,113]. The LSMM (latent sparse mixed model) method integrates genic and cell-type-specific functional annotations with GWAS data [114]. It uses the EM algorithm for parameter estimation and statistical inference. LSMM was shown to have more power than existing methods to detect risk SNPs and relevant cell-type-specific functional annotations, thereby improving understanding of the genetic architecture of complex traits [114].
SMART (Scalable Multiple Annotation integration for trait-Relevant Tissue identification) is an extension of the LMM [115]. The model treats all SNP effects as random. SMART integrates multiple SNP functional annotations from omics studies with GWAS summary statistics to facilitate the identification of trait-relevant tissues and to construct SNP set tests [115]. CoMM (collaborative mixed model) has been proposed to investigate the mechanistic role of associated variants in complex traits [111]. CoMM is computationally fast and statistically efficient in analyzing genetic contributions to complex traits because it makes fuller use of the information in transcriptome data.
Table 4. Epistasis (G × G) and gene–environment (G × E) interaction and their respective software and packages for GWAS using LMM.
Tool | Description | Link | Effect | Polygenic Background | Reference | |||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
a | d | α | e | aa/aae/ae | ad/ade | da/dae/de | dd/dde | qqe | a | d | α | e | aa/ | ad | da/ | dd | ||||
FAM-MDR | FAM-MDR, a flexible family-based method for detecting epistasis, has more power than the existing method PGMDR (Pedigree-based Generalized MDR) and deals adequately with multiple testing in epistasis screening. | ✓ | ✓ | [97] | ||||||||||||||||
QTXNetwork | QTXNetwork is an LMM-based software that uses GPU to analyze diverse genetic effects concurrently. It can be used for calculating main genetic effects, G × G and G × E interaction effects on big omics data for complex traits and for calculating the heritability of specific genetic component effects. | ✓ | ✓ | ✓ | ✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓/✓ | ✓ | [96] | |||||||||
iSet | The interaction set test, iSet, is an LMMs-based method that explains the polygenic effects and has more power to detect the interaction between environment and variants. | ✓ | ✓ | [98] | ||||||||||||||||
REMMA | REMMA has been proposed to overcome the computational efficiency problem for handling epistatic effects in GWAS. It is more computationally efficient, has a lower type I error rate, and has higher QTL discovery power than other existing models. | ✓ | ✓ | [100] | ||||||||||||||||
GxEMM | GxEMM is an integrative mixed model for polygenic interactions that captures the total effect of small G × E effects spread throughout the genome. | ✓ | ✕/✓ | ✕/✓ | [104] | |||||||||||||||
StructLMM | StructLMM (structured linear mixed model) is a computationally effective method to detect and illustrate loci that relate to one or more environments. Hundreds of environmental variables can be used to study interactions using this model. | ✓ | ✕/✓ | ✓ | [103] | |||||||||||||||
Grid-LMM | Grid-LMM is a scalable algorithm for repeatedly fitting complex LMMs that can include several sources of heterogeneity, including additive and dominance genetic variance, uneven distribution of traits, and G × E interactions. | ✓ | ✕/✓ | [107] | ||||||||||||||||
FFselect | FFselect is an LMM based advanced method for the analysis of GWAS data incorporating shared environmental effects in the model. This method demonstrated enhanced power, controlled FDR (false discovery rate), and simultaneously adapted to environmental factors to enhance GWAS’s effectiveness. | ✓ | ✓ | [108] | ||||||||||||||||
REMMAX | REMMAX, REMMA eXpedited, is a proficient method for GWAS by adjusting numerous polygenic effects, and the time complexity is almost linear with the population size. | ✓ | ✓ | ✓ | ✓ | ✓ | Polygenic background with normal distribution | [101] | ||||||||||||
3VmrMLM | 3VmrMLM, a three-variance-component mixed model, was integrated with the mrMLM method. It has more power and accuracy to discover all kinds of loci and gives unbiased estimates of their effects. | ✓ | ✓ | ✓ | ✓/✕ | ✓/✕ | ✓/✕/✓ | ✓/✕ | ✓ | ✓ | ✓ | ✓/✓ | ✓ | ✓/✓ | ✓ | [110] |
Note: a: additive effect; d: dominant effect; α: allelic substitution effect, α = a + d(q − p), where p and q are the frequencies of alleles A and a, respectively; e: environmental effect; aa/aae/ae: additive-additive epistatic/interaction effect between aa and environment/additive-environment interaction effect; ad/ade: additive-dominant effect/interaction effect between ad and environment; da/dae/de: dominant-additive effect/interaction effect between da and environment/dominant-environment interaction effect; dd/dde: dominant-dominant effect/interaction effect between dd and environment; qqe: interaction effect between qq and environment.
Real data analysis demonstrated that CoMM could identify more genetically regulated genes associated with complex traits without inflating the type I error. Although CoMM is effective, it requires individual-level GWAS data and cannot make full use of the widely available GWAS summary statistics [116]. CoMM-S2 was therefore proposed to use GWAS summary statistics rather than individual-level data [117]. Apart from its use of summary statistics, the method follows the same approach as CoMM. CoMM-S2 has some benefits over CoMM; for example, it is computationally more efficient for larger sample sizes. The authors showed that CoMM-S2 performed better when the cellular heritability was small [116]. However, CoMM-S2 cannot be applied in cross-tissue studies. In addition, it cannot distinguish whether the discovered genes are merely correlated with the complex trait or have genuine causal effects [116].
Many GWASs have been carried out in population cohorts with repeated measures at multiple time points for each individual [117,118,119], but standard association methods consider only a single time point. Furlotte et al. [120] proposed a mixed-model-based longitudinal GWAS that uses multiple phenotype measurements for every individual. Their model accounts for temporal trends in the phenotype and uses a kinship coefficient matrix-based LMEM (linear mixed-effects model), named KIN-LMEM, to control population structure. The results demonstrated improved power compared with conventional methods [120]. In addition, when multiple measurements are available for each individual, KIN-LMEM makes it possible to separate the genetic effect from the environmental effect. Although the method was tested under a specific set of assumptions, it may also be applicable to a larger class of problems [120]. Another method, based on a conditional two-step approach, was proposed for longitudinal data and offers a computationally feasible way to test the association between a given SNP and the longitudinal trait of interest [121]. Sikorska et al. [122] applied a fast conditional two-step method, which fits an LMEM and then a linear regression, as a computationally efficient solution for LMEMs with a random intercept and slope. Sung et al. [123] proposed two-stage approaches for family-based data to detect pleiotropic effects on multiple longitudinal traits. Among the TWAS and longitudinal LMM methods, KIN-LMEM is very popular and widely used. We suggest choosing among these models in the following order, based on the number of citations in Google Scholar: KIN-LMEM > SMART > CoMM > CoMM-S2 > LSMM. TWAS and longitudinal-related LMM models, software, and packages for GWAS are given in Table 5.
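The logic of the two-step strategies for longitudinal traits can be illustrated with a deliberately simplified sketch. In the published methods the first step fits an LMEM with a random intercept and slope; here, as a labelled simplification, each individual's slope is obtained by ordinary least squares on that individual's own measurements, and the second step regresses these slopes on the allele count. Function names and data are hypothetical.

```python
def ols_slope(x, y):
    """Slope of the least-squares line y ~ 1 + x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def two_step_slope_effect(times, trajectories, genotypes):
    """Step 1: one time slope per individual; step 2: regress slopes on allele count."""
    slopes = [ols_slope(times, traj) for traj in trajectories]
    return ols_slope(genotypes, slopes)


# Toy cohort: every copy of the allele adds 0.2 units per time point to the slope.
times = [0, 1, 2, 3]
genotypes = [0, 1, 2, 0, 1, 2]
baselines = [10.0, 11.0, 9.0, 10.5, 9.5, 10.0]
trajectories = [[b + (0.5 + 0.2 * g) * t for t in times]
                for b, g in zip(baselines, genotypes)]
effect = two_step_slope_effect(times, trajectories, genotypes)
print(effect)
```

The attraction of this design is that the expensive longitudinal model is fitted once, without any SNP, while the genome-wide scan reduces to one simple regression per marker.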
2.6. LMM-Based Packages in GWAS
Many LMM-based packages have been developed for GWAS (Table 6). DMU is a widely used package in quantitative genetics for estimating variance components and fixed effects and for predicting random effects [124]. The package analyzes MMM (multivariate mixed models) and has been under continuous development for over 30 years. Many high-performance methods, developed both for specific projects and for common applications in genetics and genomics research, have been integrated into the DMU package [124]. GenABEL is another widely used R library for GWAS; it is useful for quality control of genetic data, for screening associations between SNPs and binary or quantitative traits, for displaying results, and for providing convenient interfaces to standard statistical and graphical procedures [125]. lrgpr is a high-performance and convenient R interface for fitting LMMs [126]. It was designed for interactive, large-scale GWAS analysis that accounts for the confounding effects of family relationships and population stratification.
Table 5. Transcriptome-wide association studies (TWAS) and longitudinal-related LMM models, software, and packages for GWAS.
Tool | Description | Link | Effect | Polygenic Background | Reference | ||||||
---|---|---|---|---|---|---|---|---|---|---|---|
a | d | α | e | a | d | α | e | ||||
SMART | SMART is an extension of the LMM that integrates multiple SNP functional annotations. It can be used to construct SNP set tests and to identify trait-relevant tissues and informative annotations. | ✓ | ✓ | [115] | |||||||
LSMM | LSMM integrates both genic and cell-type-specific functional annotations in GWAS. It uses the EM algorithm for parameter estimation and statistical inference, and it has more power than existing methods to detect risk SNPs and relevant cell-type-specific functional annotations. | ✓ | ✓ | [114] | |||||||
CoMM | CoMM, a collaborative mixed model, investigates the mechanistic role of associated variants in complex traits. CoMM is computationally fast and statistically efficient in analyzing genetic contributions to complex traits because it makes fuller use of the information in transcriptome data. | ✓ | ✓ | [111] | |||||||
CoMM-S2 | CoMM-S2 uses GWAS summary statistics to study the mechanisms of genetic variants. Apart from its use of summary statistics, it follows the same approach as CoMM; simulation and real data analyses showed that its efficiency is equivalent to that of CoMM. CoMM-S2 is implemented in the CoMM package. | ✓ | ✓ | [116] | |||||||
KIN-LMEM | KIN-LMEM is a mixed-model-based approach for executing association mapping, which utilizes numerous phenotype measurements for each individual. | ✓ | ✓ | [120] |
Note: a: additive effect; d: dominant effect; α: allelic substitution effect, α = a + d(q − p), where p and q are the frequencies of alleles A and a, respectively; e: environmental effect.
Table 6. LMM-based packages in GWAS.
Tool | Description | Link | Effect | Polygenic Background | Reference | ||||
---|---|---|---|---|---|---|---|---|---|
a (aa/ad) | d (dd/da) | α | a | d | α | ||||
DMU | DMU is a broadly employed package for analyzing MMM in quantitative genetics and genomics. It applies advanced tools to calculate variance components and fixed effects and predict random effects. | ✓ | [124] | ||||||
ASREML | ASReml utilizes LMMs to analyze big and complex data, and many variance models for random effects are available in the LMM in the ASReml package. | ✓ | ✓ | [127] | |||||
GenABEL | GenABEL is an R package for GWAS that provides efficient storage and handling of GWA data, fast procedures for quality control of genetic data, statistical analysis, and visualization of GWAS results. | [125] | |||||||
lrgpr | lrgpr is computationally powerful and efficient for analyzing large GWAS and NGS datasets. It provides an interactive model-fitting framework that supports exploratory data analysis from the perspective of the LMM. | ✓ | ✓ | [126] | |||||
lme4qtl | lme4qtl, an extension of lme4, adds new models for genetic studies. It offers a flexible framework for settings with multiple levels of relatedness and is efficient when covariance matrices are sparse. | ✓ | ✓ | [128] | |||||
Sci-LMM | Sci-LMM is a framework for analyzing the pedigrees of millions of individuals. It fits LMMs in the presence of the dependencies encoded by the relationship matrices it constructs. The tool is adaptable, can be extended in various ways, and is valuable for GWAS. | ✓ (✓/✓) | ✓ (✓/✓) | [129] | |||||
Single-RunKing | Single-RunKing is an R package that speeds up LMM-based GWAS. It uses R/fastLmPure to estimate the genetic effects of screened SNPs and concentrates on the significant SNPs found by the EMMAX algorithm. | ✓ | ✓ | [130] | |||||
LiMMBo | LiMMBo is a very easy and flexible method based on LMMs for multi-dimensional GWAS data with hundreds of phenotypes. It combines LMMs and bootstrapping for estimates of large trait covariance matrices. | ✓ | ✓ | [85] | |||||
SGL-LMM | SGL-LMM combined SGL (sparse group lasso) and LMM for multivariate GWAS analysis, with improved power to detect marker association in various settings. | ✓ | ✓ | [50] | |||||
SMMAT | SMMAT is a computationally effective variant test for continuous and binary traits. SMMAT can be used in structured and related samples with various possible origins of correlations from large-scale whole-genome sequencing studies. | ✓ | ✓ | [79] |
Note: a (aa/ad): additive effect (or additive-additive epistatic effect or additive-dominant effect); d (dd/da): dominant effect (or dominant–dominant effect or dominant–additive effect); α: allelic substitution effect, α = a + d(q − p), where p and q are the frequencies of alleles A and a, respectively.
Linear and logistic regression models can be fitted with lrgpr, which can fit millions of regression models on a desktop computer through efficient implementation, parallelization, and out-of-core processing of large datasets. ASReml, a statistical package, fits LMMs by REML for large datasets with complex variance structures [127], and it offers many variance models for random effects. Another package, lme4qtl, an extension of lme4, is an efficient tool for QTL mapping [128]. It provides a flexible framework for settings with multiple levels of relatedness and is efficient when covariance matrices are sparse. Family-based data were used to show that lme4qtl is computationally efficient and useful.
Single-RunKing, an efficient R package, has been proposed to speed up LMM-based GWAS [130]. It uses R/fastLmPure to estimate the genetic effects of screened SNPs and concentrates on the significant SNPs found by the EMMAX algorithm. LMMs and their extensions have gained wide acceptance in human genetics research for estimating heritability [69,80,83,131,132,133] and genetic correlation [58,134], predicting phenotypes [66,135,136,137], and accounting for sample relatedness [22,51,138]. Nevertheless, LMMs had not been used to study population-scale human genealogies. Shor, Kalka, Geiger, Erlich, and Weissbrod [129] proposed Sci-LMM (Sparse Cholesky factorization LMM), a framework for analyzing pedigrees with millions of individuals. Sci-LMM can build a matrix of relationships among trillions of pairs of individuals and fit the corresponding LMM in a few hours. It offers a unified framework for investigating the epidemiological history of human populations through pedigree records and is useful for GWAS [129]. To help readers choose among the LMM-based packages for GWAS, the authors rank the top packages by number of citations in Google Scholar as follows: ASREML > GenABEL > DMU > lme4qtl > SMMAT. In short, ASREML is widely used for large and complex GWAS data, and GenABEL is well known for quality control and visualization of GWAS data. DMU is usually used for estimating variance components and fixed effects and for predicting random effects, lme4qtl is appropriate when the covariance matrix is sparse, and SMMAT is mostly used for continuous and binary traits. Each package has its own advantages and disadvantages. Interested readers may find other packages and their details in Table 6.
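The pedigree-derived relationship matrix that tools such as Sci-LMM construct at scale can be illustrated on a toy pedigree with the classical tabular method. This dense version is only a sketch for a handful of individuals and does not reflect the sparse implementation of Sci-LMM; the function name and the example pedigree are hypothetical.

```python
def relationship_matrix(pedigree):
    """Additive (numerator) relationship matrix via the tabular method.

    pedigree: list of (sire, dam) index pairs, one per individual, with None
    for an unknown parent. Parents must be listed before their offspring.
    """
    n = len(pedigree)
    a = [[0.0] * n for _ in range(n)]
    for i, (sire, dam) in enumerate(pedigree):
        for j in range(i):
            # Relationship with an earlier individual: average over the parents.
            from_sire = a[j][sire] if sire is not None else 0.0
            from_dam = a[j][dam] if dam is not None else 0.0
            a[i][j] = a[j][i] = 0.5 * (from_sire + from_dam)
        # Diagonal: 1 plus the inbreeding coefficient (half the parents' relationship).
        both_known = sire is not None and dam is not None
        a[i][i] = 1.0 + (0.5 * a[sire][dam] if both_known else 0.0)
    return a


# Two founders (0, 1), two full sibs (2, 3), and the offspring of the sibs (4).
pedigree = [(None, None), (None, None), (0, 1), (0, 1), (2, 3)]
A = relationship_matrix(pedigree)
for row in A:
    print(row)
```

In large genealogies most pairs of individuals are unrelated, so the matrix is overwhelmingly zero, which is why sparse factorization makes LMM fitting feasible at population scale.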
2.7. Web-Based Software/Server Tools Using Linear Mixed Models
Many software and server-based tools have been developed for multi-omics data analysis in GWAS. Qxpak is a versatile mixed-model-based tool for QTL mapping in various populations, including crosses between inbred lines and within-population analyses [139]. It can also be used for association studies between SNPs and a trait of interest. The most computationally demanding step is fitting an LMM for every SNP in succession across the genome, which has motivated numerous faster approximations for testing the fixed SNP effects in the LMM [20,21,32,38,64]. These approximate tests have been implemented in various packages such as GenABEL [125], EMMAX [21], TASSEL (Trait Analysis by aSSociation, Evolution and Linkage) [140], and MMM [65]. TASSEL is a widely used software package that applies standard linear model and LMM methodologies to control for population stratification and family structure [140]. TASSEL can also be used for trait association analysis, evolutionary pattern analysis, LD estimation, and principal components analysis.
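The idea behind such approximate tests, estimating the variance components once under the null model and reusing them for every SNP, can be sketched as follows. This is a minimal illustration on simulated data, not the implementation used by EMMAX or any other package: the function names `fit_null_h2` and `scan_snps` are hypothetical, the variance ratio is found by a simple grid search on the maximum likelihood (not REML), and the kinship matrix is assumed to be built from standardized genotypes.

```python
import numpy as np

def fit_null_h2(y, K, grid=np.linspace(0.01, 0.99, 99)):
    """Estimate h2 = sg2 / (sg2 + se2) once, under an intercept-only null model.

    Uses K = U diag(s) U^T so that the covariance is diagonal after rotation,
    then maximizes the profile log-likelihood over a grid of h2 values.
    """
    n = len(y)
    s, U = np.linalg.eigh(K)
    yr = U.T @ y                      # rotated phenotype
    xr = U.T @ np.ones(n)             # rotated intercept
    best_ll, best_h2 = -np.inf, None
    for h2 in grid:
        d = h2 * s + (1.0 - h2)       # eigenvalues of V, up to a scale factor
        w = 1.0 / d
        beta = np.sum(w * xr * yr) / np.sum(w * xr * xr)
        r = yr - xr * beta
        sigma2 = np.sum(w * r * r) / n
        ll = -0.5 * (n * np.log(sigma2) + np.sum(np.log(d)))
        if ll > best_ll:
            best_ll, best_h2 = ll, h2
    return best_h2, s, U

def scan_snps(y, G, h2, s, U):
    """Wald statistic for each SNP by generalized least squares, reusing h2."""
    n = len(y)
    w = 1.0 / np.sqrt(h2 * s + (1.0 - h2))
    yw = w * (U.T @ y)                               # whitened phenotype
    ones_w = w * (U.T @ np.ones(n))                  # whitened intercept
    Gw = w[:, None] * (U.T @ G)                      # whitened genotypes
    stats = np.empty(G.shape[1])
    for j in range(G.shape[1]):
        X = np.column_stack([ones_w, Gw[:, j]])
        beta, *_ = np.linalg.lstsq(X, yw, rcond=None)
        resid = yw - X @ beta
        sigma2 = resid @ resid / (n - 2)
        cov = sigma2 * np.linalg.inv(X.T @ X)
        stats[j] = beta[1] ** 2 / cov[1, 1]
    return stats

# Simulated example: one causal SNP with a large effect (illustrative values).
rng = np.random.default_rng(1)
n, n_snps, causal = 300, 200, 17
maf = rng.uniform(0.1, 0.5, size=n_snps)
G = rng.binomial(2, maf, size=(n, n_snps)).astype(float)
Z = (G - G.mean(axis=0)) / G.std(axis=0)
K = Z @ Z.T / n_snps                                 # marker-based kinship
y = 1.0 * Z[:, causal] + rng.normal(size=n)

h2_hat, s, U = fit_null_h2(y, K)
stats = scan_snps(y, G, h2_hat, s, U)
print("top SNP:", int(np.argmax(stats)), "estimated h2:", round(float(h2_hat), 2))
```

The eigendecomposition is computed once, so each additional SNP costs only a small regression on rotated data; the price of the approximation is that the variance ratio is not re-estimated for each SNP.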
QTLNetwork is a widely used software package for linkage mapping and for visualizing the genetic architecture of complex traits in experimental populations derived from crosses between inbred lines [141]. It can accommodate QTLs with individual effects, epistasis, and Q × E (QTL-environment interaction) effects. QTLNetwork provides a GUI and can deal with data from diverse types of experimental populations. Although thousands of SNPs associated with complex traits have been identified using GWAS [142], the identified genome-wide significant SNPs explain only a portion of the heritability, because numerous SNPs with minor effects are still to be identified [17]. GCTA (genome-wide complex trait analysis) is a flexible tool for estimating and partitioning complex trait variation using big GWAS data sets [69]. It was developed to tackle the “missing heritability” problem. Instead of testing the association of a single specific SNP with the trait, GCTA estimates the variance explained by all the SNPs on a chromosome or across the genome. GCTA now covers many other analyses that help to investigate the genetic architecture of complex traits [69]. GAPIT (Genome Association and Prediction Integrated Tool) applies advanced statistical procedures, including the CMLM (compressed mixed linear model) and CMLM-based genomic prediction [143]. GAPIT offers multiple options for association tests, and its improved version, GAPIT version 2, implements computationally efficient and powerful methods, including MLM, CMLM, ECMLM [36], FaST-LMM, FaST-LMM-Select [28], and SUPER [39] [34]. The updated version is relatively easy to run and produces publication-ready summary tables and figures.
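Estimating the variance explained by all SNPs relies on a genetic relationship matrix (GRM) computed from the genotypes. The following is a minimal sketch of a commonly used SNP-based GRM, in which each SNP is centered by twice its allele frequency and scaled by its expected variance under Hardy-Weinberg equilibrium. The function name `grm` and the simulated genotypes are illustrative assumptions, and allele frequencies are estimated from the sample itself.

```python
import numpy as np

def grm(genotypes):
    """GRM from an (individuals x SNPs) matrix of 0/1/2 allele counts.

    A[j, k] = mean over SNPs of
              (x_ij - 2 p_i) * (x_ik - 2 p_i) / (2 p_i (1 - p_i)).
    """
    X = np.asarray(genotypes, dtype=float)
    p = X.mean(axis=0) / 2.0                 # sample allele frequencies
    keep = (p > 0.0) & (p < 1.0)             # drop monomorphic SNPs
    X, p = X[:, keep], p[keep]
    Z = (X - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    return Z @ Z.T / Z.shape[1]

# Simulated unrelated individuals (illustrative sizes and frequencies).
rng = np.random.default_rng(0)
freqs = rng.uniform(0.1, 0.5, size=2000)
G = rng.binomial(2, freqs, size=(50, 2000))
A = grm(G)
print("mean diagonal:", A.diagonal().mean())
```

For unrelated individuals the diagonal entries are close to one and the off-diagonal entries are close to zero; the variance components are then estimated from a model whose genetic covariance is proportional to this matrix.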
MASTOR (mixed-model association score test on related individuals) has been proposed for genetic association mapping of a quantitative trait [144]. It can handle samples with related individuals and attains high power by using full kinship information to incorporate partially missing data into the analysis while adjusting for dependence [144]. Another widely used package, MMM, fits an LMM with one random effect whose covariance structure can be easily specified by the user for GWAS [65]. It can handle more than 20,000 individuals and 500,000 genetic variants and can be used with other types of data. MMM and FaST-LMM have been implemented in the GEMMA package, and these methods use the exact model, with the gain in power depending on the true underlying level of relatedness [23,65]. OmicABEL addresses mixed-model-based GWAS for an arbitrary number of traits [145]. The results showed that different computational algorithms are best suited to single-trait and multi-trait mixed-model-based GWAS, and that OmicABEL attains significant speed-ups compared with existing methods.
PEPIS (Pipeline for estimating EPIStatic effects) has been proposed to estimate polygenic epistatic effects based on the LMM [146]. PEPIS is written in C/C++ and integrates publicly available mathematical functions and optimized libraries, which helps to tackle existing problems in epistasis analysis in GWAS [146]. MTG2 is based on the LMM approach and uses GWAS data for analyzing complex traits [131]. MTG2 combines the average information algorithm with the eigendecomposition of the genomic relationship matrix, which makes it considerably faster than other REML methods [131]. It can be applied to a wider range of statistical models than GEMMA, including MLMMs, random regression models, and models with numerous variance components. It can be a valuable and resourceful tool for complex trait studies, especially for multivariate analyses such as estimating genetic variance-covariance and G × E. PopPAnTe, a versatile and straightforward software, has been proposed for pairwise association studies in related samples with a wide range of predictors and responses. It uses an exact LMM corresponding to that applied in the QTDT software [147]. It is very convenient for biobank data, where extensive pedigree information is missing [145]. GREML is a dominant LMM-based method in which all SNP effects are fitted jointly as random effects, and it has been used for many traits, including height [80]. However, the GREML and Bayesian MLM methods did not examine the relationship between effect size and MAF (minor allele frequency) for complex traits. A Bayesian LMM method named BayesS has been proposed, which can simultaneously estimate SNP-based heritability, polygenicity, and the relationship between effect size and MAF in nominally unrelated individuals using GWAS data [148]. BayesS is implemented in a software tool called GCTB (genome-wide complex trait Bayesian analyses), and summary-data-based Bayesian LMMs were recently integrated into GCTB version 2.0.
OSCA was proposed to manage omics data from high-throughput experiments in big cohorts and to help analyze complex traits using omics data [149]. OSCA implements MLM-based omics association and multi-component MLM-based omics association excluding the target, which are used to discover omics measures associated with complex traits while accounting for unobserved confounders, and it estimates the proportion of phenotypic variance captured by all measures of one or multiple omics profiles [150]. Recently, a computationally fast and efficient LMM-based method, fastGWA, was proposed to analyze biobank data [150]. This method is robust, reliable, and resource-efficient in controlling false positives in the presence of confounding factors, and it is implemented in the GCTA software package [150]. To help interested users and readers choose among the web-based software and server tools using LMMs in GWAS, the authors suggest the top tools in descending order of their number of citations in Google Scholar: TASSEL > GCTA > GAPIT > MMM > QTLNetwork > GAPIT Version 2 > GCTB > fastGWA > QxPak. These tools are very popular and widely used, and they support various kinds of LMM-based association mapping for analyzing complex traits in GWAS. Interested users can find more details on the other LMM-based software and server tools in Table 7.
Table 7. Web software and server-based tools using LMMs.
Tool | Description | Effect (a, d, α, e, ae, aa (aae), ad, da, dd) and polygenic background (a, d, α, e) marks | Reference
---|---|---|---
QxPak | QxPak is a versatile mixed-model-based software for QTL mapping in various populations and can be used for multi-trait and multi-QTL analysis in genomic studies. | ✓ ✓ | [139]
TASSEL | TASSEL is software for analyzing trait associations, evolutionary patterns, and LD. Database browsing and importing are supported by integrated middleware. | ✓ ✓ | [140]
QTLNetwork | QTLNetwork is software for mapping and displaying the genetic structure underlying complex traits for experimental populations derived from a cross between two inbred lines. QTLNetwork provides a GUI and can deal with data from diverse types of experimental populations. | ✓ ✓ ✓ ✓ (✓) | [141]
GCTA | GCTA, genome-wide complex trait analysis, is a widely used software incorporating many methods for analyzing complex traits using GWAS. | ✓ ✓ | [69]
GAPIT | GAPIT applies advanced statistical approaches, including the CMLM and CMLM-based genomic prediction. | Several methods, including EMMA, P3D/CMLM, ECMLM, MLMM, SUPER, and FarmCPU, are implemented in GAPIT. See the effect and polygenic background in the respective methods tables. | [143]
MASTOR | MASTOR is a mixed-model-based approach for analyzing GWAS data using the score test for genetic association with a quantitative trait, where sample individuals are related. MASTOR attains high power by using full kinship information to incorporate partially missing data into the analysis while adjusting for dependence. | ✓ ✓ | [144]
MMM | MMM, a software package, uses an LMM with one random effect whose covariance structure can be easily specified by the user for GWAS. It can handle more than 20,000 individuals and 500,000 genetic variants and can be used with other data. | ✓ ✓ | [65]
OmicABEL | OmicABEL is freely accessible software that carries out fast mixed-model-based GWAS. It can handle single and multiple traits; it uses CLAK-CHOL to explore significant complex traits and CLAK-EIG to investigate the genomic control of various omics in GWAS. | ✓ | [151]
GAPIT Version 2 | GAPIT version 2 includes some powerful LMMs, including FaST-LMM-Select, ECMLM, and SUPER. | GAPIT version 2 is an updated version of GAPIT. Several methods, including FaST-LMM and FaST-LMM-Select, along with the other methods mentioned for GAPIT, are implemented in GAPIT version 2. See the effect and polygenic background in the respective methods tables. | [34]
PEPIS | PEPIS is a web-based tool for studying polygenic epistatic effects, founded on an LMM employed to predict the performance of hybrid rice. PEPIS was specifically designed to estimate epistatic effects and will help tackle the obstacles in genetic epistasis studies. | ✓ ✓ ✓(✕) ✓ ✓ | [146]
MTG2 | MTG2 is an LMM-based software for analyzing complex traits using GWAS data. It combines the average information (AI) algorithm with eigendecomposition, which makes it considerably faster than other REML methods. | ✓ ✓ | [131]
PopPAnTe | PopPAnTe, an easy Java program based on the exact LMM, allows a flexible permutation method to end the propagation of arbitrarily permuted samples. It could be used for the exact relationship between significant quantitative response and independent variables in family-based GWAS data. | ✓ ✓ | [145]
GCTB | GCTB is a software tool that includes a class of Bayesian LMMs for dissecting complex traits using genome-wide SNPs. It offers users many functions to reveal signatures of evolution. | ✓ | [148]
OSCA | OSCA, a multipurpose software tool, manages omic data produced from high-throughput experiments in big cohorts and helps analyze complex traits using omic data. | ✓ | [149]
fastGWA | fastGWA, an LMM-based method, controls for population structure by PCA and for relatedness by a sparse GRM (genetic relationship matrix) when analyzing big data such as biobank-scale data in GWAS. | ✓ ✓ | [150]
Note: a: additive effect; d: dominant effect; α: allelic substitution effect, α = a + d(q − p), where p and q are the frequencies of alleles A and a, respectively; e: environmental effect; ae: additive-environment interaction effect; aa (aae): additive-additive epistatic effect (or interaction effect between aa and environment); ad: additive-dominant effect; da: dominant-additive effect; dd: dominant-dominant effect.
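As a small worked example of the allelic substitution effect defined in the note, the sketch below evaluates α = a + d(q − p) for two allele frequencies. The function name and the values of a, d, and p are illustrative assumptions and are not taken from any of the tools in the table.

```python
def allelic_substitution_effect(a, d, p):
    """Return alpha = a + d * (q - p), where q = 1 - p.

    a: additive effect; d: dominance effect; p: frequency of allele A.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency p must lie in [0, 1]")
    q = 1.0 - p
    return a + d * (q - p)

# With equal allele frequencies the dominance term vanishes.
alpha_equal = allelic_substitution_effect(a=2.0, d=1.0, p=0.5)

# When allele A is the less frequent allele (p < q), dominance adds to alpha.
alpha_rare = allelic_substitution_effect(a=2.0, d=1.0, p=0.25)

print(alpha_equal, alpha_rare)
```

The example shows that α depends on the allele frequencies whenever d is non-zero, so the same locus can have different substitution effects in different populations.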
3. Advantages and Weaknesses of Linear Mixed Models Used in GWAS
LMMs are attractive because they can control for population structure and account for the polygenic background in typical single-variant analysis in GWAS [18,19,20,21,22,23,27,65]. Different LMM approaches have distinct practical benefits. For example, the key benefit of single-locus models is the ability to handle very many markers, even millions. However, a single-locus method that tests one locus at a time cannot capture the true genetic model of complex traits governed by many loci simultaneously [152]. Multiple-testing correction is another problem when setting the significance threshold, because the traditional Bonferroni correction is very stringent, so numerous important loci fail to reach the critical value of the significance test [24]. Importantly, complex traits are generally regulated by multiple loci, which single-locus methods cannot detect when each locus has a small effect [153]. Multi-locus LMMs are improved methods for GWAS because their multi-locus nature removes the need for Bonferroni correction, and they have shown more statistical power than single-locus methods [24,26,42,43,44]. Despite their usefulness in GWAS, multi-locus LMMs fail when the number of markers is many times larger than the sample size, because of limits on memory allocation or computational complexity. For example, BOLT-LMM gains power over existing methods under some conditions through its flexible prior on SNP effect sizes, but the gain depends on the exact genetic architecture and on whether the sample size is sufficiently large. This method can also lose power when used to analyze large case-control data sets for low-incidence diseases. Data quality control is also vital to avoid false positives when correcting for confounding factors. BOLT-LMM also has other limitations, such as being computationally slower than GRAMMAR-Gamma, not analyzing plant and animal data, and considering only one random genetic effect in the model [51].
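The stringency of the Bonferroni correction mentioned above can be made concrete with a short calculation. The sketch below assumes a family-wise error rate of 0.05 and one million tested markers, both illustrative values; the function name is hypothetical.

```python
import math

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests

# One million markers tested at a family-wise error rate of 0.05.
threshold = bonferroni_threshold(0.05, 1_000_000)

# A locus with a marginal p-value of 1e-6 does not pass this threshold.
p_value = 1e-6
passes = p_value < threshold

print(f"threshold = {threshold:.1e}, -log10 = {-math.log10(threshold):.2f}")
print("locus declared significant:", passes)
```

Under these assumptions the per-test threshold is 5 × 10^-8, so a locus with a small effect and a p-value of 10^-6 is discarded even though it would be striking in a single test.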
Recently, multi-trait association mapping has received more attention because these methods provide more power and deeper insight into the genetic architecture of complex traits [154]. Single-trait analysis may miss many unmeasured aspects of the underlying biological network. Multi-trait analysis increases the power to capture these unmeasured aspects and to identify more variants [71].
The statistical power of multi-trait LMMs increases by combining small genetic effects across traits [57] and by accounting for correlated background variation, which simplifies the decomposition of phenotypic variation into the different variance components (VC) [63]. For example, GAMMA (generalized analysis of molecular variance for mixed-model analysis) can analyze numerous phenotypes simultaneously while controlling for population structure [71]. SGL-LMM controls for confounding effects, considers the joint effects of multiple markers, and integrates biological group information as prior knowledge [50]. Consequently, SGL-LMM can recover true genetic associations and improve phenotypic prediction in cases of weak marker effects, strong confounding effects, and complex underlying genetic models [50]. Moreover, robust covariance matrix estimation remains a statistical challenge for multi-trait analysis, from statistical genetics to single-cell studies. More informative and scalable methods are needed to analyze large plant phenotyping data sets with thousands of individuals from structured crosses and hundreds to thousands of image-based phenotypes [7]. LiMMBo extends LMMs to this setting, permitting the discovery of new composite genetic associations and a more informative investigation of the underlying biology [85]. Nonetheless, the practical use of these methods is limited because they are computationally demanding for big sample sizes [154].
G × G and G × E interaction methods offer many benefits, including detecting genetic effects that are missed by linear models, enhancing the power of GWAS, and providing a partial answer to the missing heritability problem. However, different G × G and G × E methods have limitations. For example, GxEMM is computationally very intensive; it assumes Gaussian random effects, which reduces power; and it does not correct for G–E correlation, which is a familiar source of bias in the fixed-effect setting [155]. Additionally, GxEMM does not fit the full model, and random effects are not permitted at present [104]. Another G × E method, StructLMM, is robust and powerful but also has limitations. First, it does not consider the heritable component of the environmental variables, which may produce spurious associations. Second, it preselects variants that strongly affect the phenotype to reduce the multiple-testing burden; however, this screening strategy is not suitable for genome-wide testing of G × E interactions [104]. Furthermore, this method is computationally intensive compared with traditional LMMs and does not support controlling for relatedness. For Grid-LMM, the computational cost increases with the size of the grid, which grows exponentially with the number of random effects. Grid-LMM estimates are also not precise enough for posterior inference of variance component sizes, and the method is restricted to Gaussian LMMs. Furthermore, Grid-LMM has not been investigated for LMMs with correlated random effects [106]. Thus, more novel methods are needed to analyze G × G and G × E effects. Moreover, data of various types need to be integrated to fully understand the interaction between genetic and environmental factors.
TWAS integrates gene expression data with GWAS to discover gene-trait associations. TWAS methods are needed to overcome the limitations of other methods. For example, most GWAS hits lie in non-coding regions, and their biological interpretation is unclear. Additionally, the evidence from GWAS suggests that complex traits are often controlled by many variants with minor or moderate effects, yet a large proportion of risk variants with minor effects remain unidentified [114]. A TWAS method named LSMM was proposed to integrate functional annotation data with GWAS, and the results showed that it had higher statistical power than other methods for identifying risk variants and uncovering cell-type-related annotations [114]. Another method, SMART, integrates multiple binary and continuous annotations to facilitate the detection of trait-associated tissues for GWAS traits [115]. In the future, improved SNP annotation tools and larger sample sizes might help in adapting diverse annotation-integration methods. Many powerful LMM-based software packages and tools have been developed for dissecting complex traits, and many of them are freely available, for example as R packages. Moreover, applying LMMs in biology has challenges; for example, interpreting the model output for the variance components of random effects and selecting among candidate LMMs can be complicated [30]. Furthermore, G × G and G × E effects need to be investigated when different omics data, including transcriptomics, metabolomics, proteomics, and genomics, are incorporated into GWAS to depict the genetic architecture of variants for complex traits. These big omics data offer great opportunities for biological knowledge, but integrating the various omics information and environmental effects is challenging.
However, current LMM approaches have some major limitations. First, LMMs are computationally costly and take a long time to analyze big datasets compared with simple models. For example, the run time and memory needed by LMMs scale as the cube and the square of the cohort size, respectively [57]. Second, existing LMM methods fail to achieve maximum statistical power because of suboptimal modeling assumptions about the genetic architecture of phenotypes [51]. Third, the ordinary LMM implicitly assumes that all variants are causal with small effects drawn from independent Gaussian distributions, whereas in reality complex traits do not always follow a normal distribution [156,157]. Moreover, LMMs perform poorly when many rare variants are included in the analysis, particularly when population stratification is driven by recent demographic changes [158]. Furthermore, the extreme polygenicity of many traits can pose challenges when revealing the underlying biological processes, especially when thousands of variants individually have a slight effect on a trait [159,160]. Therefore, novel approaches are required to tackle polygenicity and to help explain the outcome of GWAS through mechanistic insight [160].
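The cubic and quadratic scaling noted above can be illustrated with a back-of-the-envelope calculation. The sketch below assumes a dense relationship matrix stored in double precision (8 bytes per entry) and ignores all constant factors, so it shows only growth rates rather than actual run times; the function names and cohort sizes are illustrative.

```python
def dense_matrix_gb(n, bytes_per_entry=8):
    """Memory in gigabytes (10^9 bytes) for a dense n x n matrix."""
    return n * n * bytes_per_entry / 1e9

def relative_cubic_cost(n_small, n_large):
    """Factor by which an O(n^3) computation grows from n_small to n_large."""
    return (n_large / n_small) ** 3

# Illustrative cohort sizes: a typical study versus a biobank-scale cohort.
mem_small = dense_matrix_gb(10_000)
mem_large = dense_matrix_gb(500_000)
cost_ratio = relative_cubic_cost(10_000, 500_000)

print(f"n = 10,000:  {mem_small:.1f} GB for the relationship matrix")
print(f"n = 500,000: {mem_large:.0f} GB for the relationship matrix")
print(f"cubic-time work grows by a factor of {cost_ratio:,.0f}")
```

A fifty-fold increase in cohort size thus multiplies memory by 2,500 and cubic-time work by 125,000, which is why sparse relationship matrices and approximate tests are attractive for biobank-scale data.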
4. Future Perspective
LMMs have been applied to most aspects of GWAS, including the control of population stratification and relatedness, resulting in computational efficiency and increased statistical power. However, genome sequencing has expanded rapidly with the development of NGS, and genomic datasets keep growing [161]. This growth has brought new research fields, including pan-genomics, venomics, phenomics, single-cell genomics, and many others, into contact with GWAS (Figure 4). Pan-genomics compares the genetic content of diverse strains of the same species or genus; few methods exist for pan-genome data, and they give biased estimates and impose severe limitations in their models [161,162]. Another field, phenomics, uses high-throughput data alongside genomics and offers many ways to acquire more valuable information than conventional plant phenotyping procedures [163,164]. Furthermore, venomics is an interdisciplinary field investigating venoms, in which different omics data, such as transcriptomics, genomics, and proteomics, are used [165]. Another new approach, called pharmacometrics, which incorporates different omics approaches, has been developed to investigate dynamic molecular states in disease conditions and drug responses [166].
Moreover, artificial intelligence (AI) is growing quickly because of its robust and stable performance on problems that conventional computing methods struggle to solve [167]. Furthermore, machine learning and other methods can yield new insights from meta-analyses of various datasets [164]. Likewise, deep learning is widely used in many fields because it can discover complicated and nonlinear patterns in big data [168,169]. Many modern technologies, including AI, robotics, remote sensing, and others, accelerate digital agriculture and help agriculturalists obtain complete, precise, and clear information on crop and animal breeding products globally [170]. Although AI has received significant attention in agriculture and health research, its real-world application encounters problems. Difficulties and deficiencies, including methodologies for handling big data, storage, and computational bottlenecks, must be overcome to use these technologies successfully in the digital revolution in agriculture [170]. Therefore, it is urgent and crucial to develop novel LMM-based methods or software to analyze big omics data and dissect complex traits.
5. Conclusions
This review introduced the available LMM methods for GWAS, including single-locus, multi-locus, multi-trait, TWAS, and longitudinal GWAS methods, as well as packages and software for omics data. It provides practical explanations and guides the reader to fundamental references that give more methodological detail and a better understanding of GWAS. It also helps in finding appropriate LMM methods for dissecting complex traits in GWAS and in further investigating these methods using diverse NGS and omics datasets. This review could guide both new scholars and those wishing to update their knowledge of GWAS by applying LMMs to omics data. However, there is still no single software that is flexible and easy to use, combines different types of omics data, and handles big GWAS data analysis much faster than the existing methods. Software and packages should therefore be developed for analyzing big GWAS data sets and marker-derived kinship matrices. Overall, there is much scope to use LMMs in diverse fields, including biostatistics, bioinformatics, and statistical genetics, which could help medical scientists, agriculturists, technologists, and data scientists to solve real-world problems.
Author Contributions: Conceptualization, M.A. and H.X.; writing (original draft preparation), M.A. and X.L.; writing (review and editing), M.H.S., W.J. and H.X.; visualization, M.A. All authors have read and agreed to the published version of the manuscript.
Acknowledgments: We thank the reviewers and academic editor for their valuable comments and suggestions that helped us improve the manuscript.
Conflicts of Interest: The authors have no conflict of interest.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Figure 3. Different types of data can be analyzed by LMMs in GWAS for dissecting complex traits.
Figure 4. Linear mixed models (LMMs) could be used in the above potential fields currently developed.
References
1. Chang, T.; Wei, J.; Wang, X.; Miao, J.; Xu, L.; Zhang, L.; Gao, X.; Chen, Y.; Li, J.; Gao, H. A rapid and efficient linear mixed model approach using the score test and its application to GWAS. Livest. Sci.; 2019; 220, pp. 37-45. [DOI: https://dx.doi.org/10.1016/j.livsci.2018.12.012]
2. Wang, Q.; Tang, J.; Han, B.; Huang, X. Advances in genome-wide association studies of complex traits in rice. Theor. Appl. Genet.; 2020; 133, pp. 1415-1425. [DOI: https://dx.doi.org/10.1007/s00122-019-03473-3] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/31720701]
3. Altshuler, D.; Daly, M.J.; Lander, E.S. Genetic mapping in human disease. Science; 2008; 322, pp. 881-888. [DOI: https://dx.doi.org/10.1126/science.1156409] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/18988837]
4. Manolio, T.A. Cohort studies and the genetics of complex disease. Nat. Genet.; 2009; 41, pp. 5-6. [DOI: https://dx.doi.org/10.1038/ng0109-5] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/19112455]
5. Atwell, S.; Huang, Y.S.; Vilhjalmsson, B.J.; Willems, G.; Horton, M.; Li, Y.; Meng, D.; Platt, A.; Tarone, A.M.; Hu, T.T. et al. Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines. Nature; 2010; 465, pp. 627-631. [DOI: https://dx.doi.org/10.1038/nature08800] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/20336072]
6. Shang, Y.; Ma, Y.; Zhou, Y.; Zhang, H.; Duan, L.; Chen, H.; Zeng, J.; Zhou, Q.; Wang, S.; Gu, W. et al. Biosynthesis, regulation, and domestication of bitterness in cucumber. Science; 2014; 346, pp. 1084-1088. [DOI: https://dx.doi.org/10.1126/science.1259215]
7. Yang, W.; Guo, Z.; Huang, C.; Duan, L.; Chen, G.; Jiang, N.; Fang, W.; Feng, H.; Xie, W.; Lian, X. et al. Combining high-throughput phenotyping and genome-wide association studies to reveal natural genetic variation in rice. Nat. Commun.; 2014; 5, 5087. [DOI: https://dx.doi.org/10.1038/ncomms6087]
8. Wu, X.; Li, Y.X.; Shi, Y.S.; Song, Y.C.; Zhang, D.F.; Li, C.H.; Buckler, E.S.; Li, Y.; Zhang, Z.W.; Wang, T.Y. Joint-linkage mapping and GWAS reveal extensive genetic loci that regulate male inflorescence size in maize. Plant Biotechnol. J.; 2016; 14, pp. 1551-1562. [DOI: https://dx.doi.org/10.1111/pbi.12519] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/26801971]
9. Fan, Y.; Zhou, G.F.; Shabala, S.; Chen, Z.H.; Cai, S.G.; Li, C.D.; Zhou, M.X. Genome-Wide Association Study Reveals a New QTL for Salinity Tolerance in Barley (Hordeum vulgare L.). Front. Plant Sci.; 2016; 7, 946. [DOI: https://dx.doi.org/10.3389/fpls.2016.00946]
10. Guo, Z.; Chen, D.; Alqudah, A.M.; Roder, M.S.; Ganal, M.W.; Schnurbusch, T. Genome-wide association analyses of 54 traits identified multiple loci for the determination of floret fertility in wheat. New Phytol.; 2017; 214, pp. 257-270. [DOI: https://dx.doi.org/10.1111/nph.14342]
11. Matsuzaki, H.; Dong, S.; Loi, H.; Di, X.; Liu, G.; Hubbell, E.; Law, J.; Berntsen, T.; Chadha, M.; Hui, H. et al. Genotyping over 100,000 SNPs on a pair of oligonucleotide arrays. Nat. Methods; 2004; 1, pp. 109-111. [DOI: https://dx.doi.org/10.1038/nmeth718] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/15782172]
12. Gunderson, K.L.; Steemers, F.J.; Lee, G.; Mendoza, L.G.; Chee, M.S. A genome-wide scalable SNP genotyping assay using microarray technology. Nat. Genet.; 2005; 37, pp. 549-554. [DOI: https://dx.doi.org/10.1038/ng1547] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/15838508]
13. Altshuler, D.; Brooks, L.D.; Chakravarti, A.; Collins, F.S.; Daly, M.J.; Donnelly, P.; Gibbs, R.A.; Belmont, J.W.; Boudreau, A.; Leal, S.M. et al. A haplotype map of the human genome. Nature; 2005; 437, pp. 1299-1320.
14. de Bakker, P.I.; Yelensky, R.; Pe’er, I.; Gabriel, S.B.; Daly, M.J.; Altshuler, D. Efficiency and power in genetic association studies. Nat. Genet.; 2005; 37, pp. 1217-1223. [DOI: https://dx.doi.org/10.1038/ng1669]
15. Hardy, J.; Singleton, A. Genomewide association studies and human disease. N. Engl. J. Med.; 2009; 360, pp. 1759-1768. [DOI: https://dx.doi.org/10.1056/NEJMra0808700] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/19369657]
16. Cohen, J.C.; Kiss, R.S.; Pertsemlidis, A.; Marcel, Y.L.; McPherson, R.; Hobbs, H.H. Multiple rare alleles contribute to low plasma levels of HDL cholesterol. Science; 2004; 305, pp. 869-872. [DOI: https://dx.doi.org/10.1126/science.1099870]
17. Manolio, T.A.; Collins, F.S.; Cox, N.J.; Goldstein, D.B.; Hindorff, L.A.; Hunter, D.J.; McCarthy, M.I.; Ramos, E.M.; Cardon, L.R.; Chakravarti, A. et al. Finding the missing heritability of complex diseases. Nature; 2009; 461, pp. 747-753. [DOI: https://dx.doi.org/10.1038/nature08494] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/19812666]
18. Yu, J.M.; Pressoir, G.; Briggs, W.H.; Bi, I.V.; Yamasaki, M.; Doebley, J.F.; McMullen, M.D.; Gaut, B.S.; Nielsen, D.M.; Holland, J.B. et al. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat. Genet.; 2006; 38, pp. 203-208. [DOI: https://dx.doi.org/10.1038/ng1702]
19. Kang, H.M.; Zaitlen, N.A.; Wade, C.M.; Kirby, A.; Heckerman, D.; Daly, M.J.; Eskin, E. Efficient control of population structure in model organism association mapping. Genetics; 2008; 178, pp. 1709-1723. [DOI: https://dx.doi.org/10.1534/genetics.107.080101] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/18385116]
20. Zhang, Z.; Ersoz, E.; Lai, C.Q.; Todhunter, R.J.; Tiwari, H.K.; Gore, M.A.; Bradbury, P.J.; Yu, J.; Arnett, D.K.; Ordovas, J.M. et al. Mixed linear model approach adapted for genome-wide association studies. Nat. Genet.; 2010; 42, pp. 355-360. [DOI: https://dx.doi.org/10.1038/ng.546] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/20208535]
21. Kang, H.M.; Sul, J.H.; Service, S.K.; Zaitlen, N.A.; Kong, S.-Y.; Freimer, N.B.; Sabatti, C.; Eskin, E. Variance component model to account for sample structure in genome-wide association studies. Nat. Genet.; 2010; 42, pp. 348-354. [DOI: https://dx.doi.org/10.1038/ng.548] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/20208533]
22. Price, A.L.; Zaitlen, N.A.; Reich, D.; Patterson, N. New approaches to population stratification in genome-wide association studies. Nat. Rev. Genet.; 2010; 11, pp. 459-463. [DOI: https://dx.doi.org/10.1038/nrg2813] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/20548291]
23. Zhou, X.; Stephens, M. Genome-wide efficient mixed-model analysis for association studies. Nat. Genet.; 2012; 44, pp. 821-824. [DOI: https://dx.doi.org/10.1038/ng.2310]
24. Wang, S.B.; Feng, J.Y.; Ren, W.L.; Huang, B.; Zhou, L.; Wen, Y.J.; Zhang, J.; Dunwell, J.M.; Xu, S.; Zhang, Y.M. Improving power and accuracy of genome-wide association studies via a multi-locus mixed linear model methodology. Sci. Rep.; 2016; 6, 19444. [DOI: https://dx.doi.org/10.1038/srep19444] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/26787347]
25. Fusi, N.; Lippert, C.; Lawrence, N.D.; Stegle, O. Warped linear mixed models for the genetic analysis of transformed phenotypes. Nat. Commun.; 2014; 5, 4890. [DOI: https://dx.doi.org/10.1038/ncomms5890]
26. Wen, Y.J.; Zhang, H.; Ni, Y.L.; Huang, B.; Zhang, J.; Feng, J.Y.; Wang, S.B.; Dunwell, J.M.; Zhang, Y.M.; Wu, R. Methodological implementation of mixed linear models in multi-locus genome-wide association studies. Brief. Bioinform.; 2017; 18, 906. [DOI: https://dx.doi.org/10.1093/bib/bbx028]
27. Lippert, C.; Listgarten, J.; Liu, Y.; Kadie, C.M.; Davidson, R.I.; Heckerman, D. FaST linear mixed models for genome-wide association studies. Nat. Methods; 2011; 8, pp. 833-835. [DOI: https://dx.doi.org/10.1038/nmeth.1681]
28. Listgarten, J.; Lippert, C.; Kadie, C.M.; Davidson, R.I.; Eskin, E.; Heckerman, D. Improved linear mixed models for genome-wide association studies. Nat. Methods; 2012; 9, pp. 525-526. [DOI: https://dx.doi.org/10.1038/nmeth.2037]
29. Alamin, M.; Zhu, J.; Lou, X.; Xu, H. Dissecting Impacts of Nutrition on Epistasis and Ethnicity-Specific Effects of Calibrated Factor VIII Level in the Multiethnic Study of Atherosclerosis. Res. Sq.; 2021. [DOI: https://dx.doi.org/10.21203/rs.3.rs-965091/v1]
30. Harrison, X.A.; Donaldson, L.; Correa-Cano, M.E.; Evans, J.; Fisher, D.N.; Goodwin, C.E.D.; Robinson, B.S.; Hodgson, D.J.; Inger, R. A brief introduction to mixed effects modelling and multi-model inference in ecology. PeerJ; 2018; 6, e4794. [DOI: https://dx.doi.org/10.7717/peerj.4794]
31. Zhang, Y.M.; Jia, Z.; Dunwell, J.M. Editorial: The Applications of New Multi-Locus GWAS Methodologies in the Genetic Dissection of Complex Traits. Front. Plant Sci.; 2019; 10, 100. [DOI: https://dx.doi.org/10.3389/fpls.2019.00100] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/30804969]
32. Aulchenko, Y.S.; de Koning, D.J.; Haley, C. Genomewide rapid association using mixed model and regression: A fast and simple method for genomewide pedigree-based quantitative trait loci association analysis. Genetics; 2007; 177, pp. 577-585. [DOI: https://dx.doi.org/10.1534/genetics.107.075614] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/17660554]
33. Zhang, Y.M.; Mao, Y.; Xie, C.; Smith, H.; Luo, L.; Xu, S. Mapping quantitative trait loci using naturally occurring genetic variance among commercial inbred lines of maize (Zea mays L.). Genetics; 2005; 169, pp. 2267-2275. [DOI: https://dx.doi.org/10.1534/genetics.104.033217] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/15716509]
34. Tang, Y.; Liu, X.; Wang, J.; Li, M.; Wang, Q.; Tian, F.; Su, Z.; Pan, Y.; Liu, D.; Lipka, A.E. et al. GAPIT Version 2: An Enhanced Integrated Tool for Genomic Association and Prediction. Plant Genome; 2016; 9, pp. 1-9. [DOI: https://dx.doi.org/10.3835/plantgenome2015.11.0120]
35. Xu, S. An expectation-maximization algorithm for the Lasso estimation of quantitative trait locus effects. Heredity; 2010; 105, pp. 483-494. [DOI: https://dx.doi.org/10.1038/hdy.2009.180] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/20051978]
36. Li, M.; Liu, X.; Bradbury, P.; Yu, J.; Zhang, Y.M.; Todhunter, R.J.; Buckler, E.S.; Zhang, Z. Enrichment of statistical power for genome-wide association studies. BMC Biol.; 2014; 12, 73. [DOI: https://dx.doi.org/10.1186/s12915-014-0073-5]
37. Listgarten, J.; Lippert, C.; Heckerman, D. FaST-LMM-Select for addressing confounding from spatial structure and rare variants. Nat. Genet.; 2013; 45, pp. 470-471. [DOI: https://dx.doi.org/10.1038/ng.2620]
38. Svishcheva, G.R.; Axenovich, T.I.; Belonogova, N.M.; van Duijn, C.M.; Aulchenko, Y.S. Rapid variance components-based method for whole-genome association analysis. Nat. Genet.; 2012; 44, pp. 1166-1170. [DOI: https://dx.doi.org/10.1038/ng.2410]
39. Wang, Q.; Tian, F.; Pan, Y.; Buckler, E.S.; Zhang, Z. A SUPER powerful method for genome wide association study. PLoS ONE; 2014; 9, e107684. [DOI: https://dx.doi.org/10.1371/journal.pone.0107684]
40. Chen, H.; Wang, C.; Conomos, M.P.; Stilp, A.M.; Li, Z.; Sofer, T.; Szpiro, A.A.; Chen, W.; Brehm, J.M.; Celedon, J.C. et al. Control for Population Structure and Relatedness for Binary Traits in Genetic Association Studies via Logistic Mixed Models. Am. J. Hum. Genet.; 2016; 98, pp. 653-666. [DOI: https://dx.doi.org/10.1016/j.ajhg.2016.02.012]
41. Peng, Y.; Liu, H.; Chen, J.; Shi, T.; Zhang, C.; Sun, D.; He, Z.; Hao, Y.; Chen, W. Genome-Wide Association Studies of Free Amino Acid Levels by Six Multi-Locus Models in Bread Wheat. Front. Plant Sci.; 2018; 9, 1196. [DOI: https://dx.doi.org/10.3389/fpls.2018.01196]
42. Segura, V.; Vilhjálmsson, B.J.; Platt, A.; Korte, A.; Seren, Ü.; Long, Q.; Nordborg, M. An efficient multi-locus mixed-model approach for genome-wide association studies in structured populations. Nat. Genet.; 2012; 44, pp. 825-830. [DOI: https://dx.doi.org/10.1038/ng.2314]
43. Tamba, C.L.; Zhang, Y.-M. A fast mrMLM algorithm for multi-locus genome-wide association studies. bioRxiv; 2018; 341784. [DOI: https://dx.doi.org/10.1101/341784]
44. Liu, X.; Huang, M.; Fan, B.; Buckler, E.S.; Zhang, Z. Iterative Usage of Fixed and Random Effect Models for Powerful and Efficient Genome-Wide Association Studies. PLoS Genet.; 2016; 12, e1005767. [DOI: https://dx.doi.org/10.1371/journal.pgen.1005767]
45. Rakitsch, B.; Lippert, C.; Stegle, O.; Borgwardt, K. A Lasso multi-marker mixed model for association mapping with population structure correction. Bioinformatics; 2013; 29, pp. 206-214. [DOI: https://dx.doi.org/10.1093/bioinformatics/bts669]
46. Hoffman, G.E.; Logsdon, B.A.; Mezey, J.G. PUMA: A unified framework for penalized multiple regression analysis of GWAS data. PLoS Comput. Biol.; 2013; 9, e1003101. [DOI: https://dx.doi.org/10.1371/journal.pcbi.1003101]
47. Li, M.; Zhang, Y.W.; Xiang, Y.; Liu, M.H.; Zhang, Y.M. IIIVmrMLM: The R and C++ tools associated with 3VmrMLM, a comprehensive GWAS method for dissecting quantitative traits. Mol. Plant; 2022; 15, pp. 1251-1253. [DOI: https://dx.doi.org/10.1016/j.molp.2022.06.002]
48. Li, H.; Su, G.; Jiang, L.; Bao, Z. An efficient unified model for genome-wide association studies and genomic selection. Genet. Sel. Evol.; 2017; 49, 64. [DOI: https://dx.doi.org/10.1186/s12711-017-0338-x] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/28836943]
49. Chen, J.H.; Chen, Z.H. Extended Bayesian information criteria for model selection with large model spaces. Biometrika; 2008; 95, pp. 759-771. [DOI: https://dx.doi.org/10.1093/biomet/asn034]
50. Guo, Y.; Wu, C.; Guo, M.; Zou, Q.; Liu, X.; Keinan, A. Combining Sparse Group Lasso and Linear Mixed Model Improves Power to Detect Genetic Variants Underlying Quantitative Traits. Front. Genet.; 2019; 10, 271. [DOI: https://dx.doi.org/10.3389/fgene.2019.00271]
51. Loh, P.R.; Tucker, G.; Bulik-Sullivan, B.K.; Vilhjálmsson, B.J.; Finucane, H.K.; Salem, R.M.; Chasman, D.I.; Ridker, P.M.; Neale, B.M.; Berger, B. et al. Efficient Bayesian mixed-model analysis increases association power in large cohorts. Nat. Genet.; 2015; 47, pp. 284-290. [DOI: https://dx.doi.org/10.1038/ng.3190]
52. Jiang, C.; Zeng, Z.B. Multiple trait analysis of genetic mapping for quantitative trait loci. Genetics; 1995; 140, pp. 1111-1127. [DOI: https://dx.doi.org/10.1093/genetics/140.3.1111] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/7672582]
53. Ferreira, M.A.; Purcell, S.M. A multivariate test of association. Bioinformatics; 2009; 25, pp. 132-133. [DOI: https://dx.doi.org/10.1093/bioinformatics/btn563] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/19019849]
54. Zhang, L.; Pei, Y.F.; Li, J.; Papasian, C.J.; Deng, H.W. Univariate/Multivariate Genome-Wide Association Scans Using Data from Families and Unrelated Samples. PLoS ONE; 2009; 4, e6502. [DOI: https://dx.doi.org/10.1371/journal.pone.0006502] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/19652719]
55. Knott, S.A.; Haley, C.S. Multitrait least squares for quantitative trait loci detection. Genetics; 2000; 156, pp. 899-911. [DOI: https://dx.doi.org/10.1093/genetics/156.2.899]
56. Amos, C.I. Robust variance-components approach for assessing genetic linkage in pedigrees. Am. J. Hum. Genet.; 1994; 54, pp. 535-543.
57. Korte, A.; Vilhjálmsson, B.J.; Segura, V.; Platt, A.; Long, Q.; Nordborg, M. A mixed-model approach for genome-wide association studies of correlated traits in structured populations. Nat. Genet.; 2012; 44, pp. 1066-1071. [DOI: https://dx.doi.org/10.1038/ng.2376]
58. Lee, S.H.; Yang, J.; Goddard, M.E.; Visscher, P.M.; Wray, N.R. Estimation of pleiotropy between complex diseases using single-nucleotide polymorphism-derived genomic relationships and restricted maximum likelihood. Bioinformatics; 2012; 28, pp. 2540-2542. [DOI: https://dx.doi.org/10.1093/bioinformatics/bts474]
59. Vattikuti, S.; Guo, J.; Chow, C.C. Heritability and genetic correlations explained by common SNPs for metabolic syndrome traits. PLoS Genet.; 2012; 8, e1002637. [DOI: https://dx.doi.org/10.1371/annotation/61bb5924-6688-4ee5-a37f-d48aa09ad66a]
60. Kruuk, L.E.B. Estimating genetic parameters in natural populations using the “animal model”. Philos. Trans. R. Soc. London. Ser. B Biol. Sci.; 2004; 359, pp. 873-890. [DOI: https://dx.doi.org/10.1098/rstb.2003.1437]
61. Kim, S.; Sohn, K.A.; Xing, E.P. A multivariate regression approach to association analysis of a quantitative trait network. Bioinformatics; 2009; 25, pp. i204-i212. [DOI: https://dx.doi.org/10.1093/bioinformatics/btp218]
62. O’Reilly, P.F.; Hoggart, C.J.; Pomyen, Y.; Calboli, F.C.F.; Elliott, P.; Jarvelin, M.-R.; Coin, L.J.M. MultiPhen: Joint model of multiple phenotypes can increase discovery in GWAS. PLoS ONE; 2012; 7, e34861. [DOI: https://dx.doi.org/10.1371/journal.pone.0034861]
63. Stephens, M. A unified framework for association analysis with multiple related phenotypes. PLoS ONE; 2013; 8, e65245. [DOI: https://dx.doi.org/10.1371/journal.pone.0065245]
64. Chen, W.M.; Abecasis, G.R. Family-based association tests for genomewide association scans. Am. J. Hum. Genet.; 2007; 81, pp. 913-926. [DOI: https://dx.doi.org/10.1086/521580]
65. Pirinen, M.; Donnelly, P.; Spencer, C.C. Efficient computation with a linear mixed model on large-scale data sets with applications to genetic studies. Ann. Appl. Stat.; 2013; 7, pp. 369-390. [DOI: https://dx.doi.org/10.1214/12-AOAS586]
66. Zhou, X.; Carbonetto, P.; Stephens, M. Polygenic modeling with Bayesian sparse linear mixed models. PLoS Genet.; 2013; 9, e1003264. [DOI: https://dx.doi.org/10.1371/journal.pgen.1003264]
67. Furlotte, N.A.; Eskin, E. Efficient Multiple-Trait Association and Estimation of Genetic Correlation Using the Matrix-Variate Linear Mixed Model. Genetics; 2015; 200, pp. 59-68. [DOI: https://dx.doi.org/10.1534/genetics.114.171447]
68. Zhou, X.; Stephens, M. Efficient multivariate linear mixed model algorithms for genome-wide association studies. Nat. Methods; 2014; 11, pp. 407-409. [DOI: https://dx.doi.org/10.1038/nmeth.2848]
69. Yang, J.; Lee, S.H.; Goddard, M.E.; Visscher, P.M. GCTA: A tool for genome-wide complex trait analysis. Am. J. Hum. Genet.; 2011; 88, pp. 76-82. [DOI: https://dx.doi.org/10.1016/j.ajhg.2010.11.011]
70. Meyer, K. WOMBAT: A tool for mixed model analyses in quantitative genetics by restricted maximum likelihood (REML). J. Zhejiang Univ. Sci. B; 2007; 8, pp. 815-821. [DOI: https://dx.doi.org/10.1631/jzus.2007.B0815]
71. Joo, J.W.; Kang, E.Y.; Org, E.; Furlotte, N.; Parks, B.; Hormozdiari, F.; Lusis, A.J.; Eskin, E. Efficient and Accurate Multiple-Phenotype Regression Method for High Dimensional Data Considering Population Structure. Genetics; 2016; 204, pp. 1379-1390. [DOI: https://dx.doi.org/10.1534/genetics.116.189712] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/27770036]
72. Zapala, M.A.; Schork, N.J. Statistical properties of multivariate distance matrix regression for high-dimensional data analysis. Front. Genet.; 2012; 3, 190. [DOI: https://dx.doi.org/10.3389/fgene.2012.00190] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/23060897]
73. Lippert, C.; Casale, F.P.; Rakitsch, B.; Stegle, O. LIMIX: Genetic analysis of multiple traits. bioRxiv; 2014; 003905. [DOI: https://dx.doi.org/10.1101/003905]
74. Listgarten, J.; Lippert, C.; Kang, E.Y.; Xiang, J.; Kadie, C.M.; Heckerman, D. A powerful and efficient set test for genetic markers that handles confounders. Bioinformatics; 2013; 29, pp. 1526-1533. [DOI: https://dx.doi.org/10.1093/bioinformatics/btt177] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/23599503]
75. Casale, F.P.; Rakitsch, B.; Lippert, C.; Stegle, O. Efficient set tests for the genetic analysis of correlated traits. Nat. Methods; 2015; 12, pp. 755-758. [DOI: https://dx.doi.org/10.1038/nmeth.3439]
76. Wu, M.C.; Kraft, P.; Epstein, M.P.; Taylor, D.M.; Chanock, S.J.; Hunter, D.J.; Lin, X. Powerful SNP-set analysis for case-control genome-wide association studies. Am. J. Hum. Genet.; 2010; 86, pp. 929-942. [DOI: https://dx.doi.org/10.1016/j.ajhg.2010.05.002]
77. Lippert, C.; Xiang, J.; Horta, D.; Widmer, C.; Kadie, C.; Heckerman, D.; Listgarten, J. Greater power and computational efficiency for kernel-based association testing of sets of genetic variants. Bioinformatics; 2014; 30, pp. 3206-3214. [DOI: https://dx.doi.org/10.1093/bioinformatics/btu504]
78. Schifano, E.D.; Epstein, M.P.; Bielak, L.F.; Jhun, M.A.; Kardia, S.L.; Peyser, P.A.; Lin, X. SNP set association analysis for familial data. Genet. Epidemiol.; 2012; 36, pp. 797-810. [DOI: https://dx.doi.org/10.1002/gepi.21676]
79. Chen, H.; Huffman, J.E.; Brody, J.A.; Wang, C.; Lee, S.; Li, Z.; Gogarten, S.M.; Sofer, T.; Bielak, L.F.; Bis, J.C. et al. Efficient Variant Set Mixed Model Association Tests for Continuous and Binary Traits in Large-Scale Whole-Genome Sequencing Studies. Am. J. Hum. Genet.; 2019; 104, pp. 260-274. [DOI: https://dx.doi.org/10.1016/j.ajhg.2018.12.012]
80. Yang, J.; Benyamin, B.; McEvoy, B.P.; Gordon, S.; Henders, A.K.; Nyholt, D.R.; Madden, P.A.; Heath, A.C.; Martin, N.G.; Montgomery, G.W. et al. Common SNPs explain a large proportion of the heritability for human height. Nat. Genet.; 2010; 42, pp. 565-569. [DOI: https://dx.doi.org/10.1038/ng.608]
81. Yang, J.; Manolio, T.A.; Pasquale, L.R.; Boerwinkle, E.; Caporaso, N.; Cunningham, J.M.; de Andrade, M.; Feenstra, B.; Feingold, E.; Hayes, M.G. et al. Genome partitioning of genetic variation for complex traits using common SNPs. Nat. Genet.; 2011; 43, pp. 519-525. [DOI: https://dx.doi.org/10.1038/ng.823]
82. Loh, P.-R.; Bhatia, G.; Gusev, A.; Finucane, H.K.; Bulik-Sullivan, B.K.; Pollack, S.J.; de Candia, T.R.; Lee, S.H.; Wray, N.R.; Kendler, K.S. et al. Contrasting genetic architectures of schizophrenia and other complex diseases using fast variance-components analysis. Nat. Genet.; 2015; 47, pp. 1385-1392. [DOI: https://dx.doi.org/10.1038/ng.3431]
83. Matilainen, K.; Mantysaari, E.A.; Lidauer, M.H.; Stranden, I.; Thompson, R. Employing a Monte Carlo algorithm in Newton-type methods for restricted maximum likelihood estimation of genetic parameters. PLoS ONE; 2013; 8, e80821. [DOI: https://dx.doi.org/10.1371/journal.pone.0080821]
84. Liu, J.; Yang, C.; Shi, X.J.; Li, C.; Huang, J.; Zhao, H.Y.; Ma, S.G. Analyzing Association Mapping in Pedigree-Based GWAS Using a Penalized Multitrait Mixed Model. Genet. Epidemiol.; 2016; 40, pp. 382-393. [DOI: https://dx.doi.org/10.1002/gepi.21975]
85. Hannah, M.V.; Casale, F.P.; Stegle, O.; Birney, E. LiMMBo: A simple, scalable approach for linear mixed models in high-dimensional genetic association studies. bioRxiv; 2018; 255497. [DOI: https://dx.doi.org/10.1101/255497]
86. Maki-Tanila, A.; Hill, W.G. Influence of gene interaction on complex trait variation with multilocus models. Genetics; 2014; 198, pp. 355-367. [DOI: https://dx.doi.org/10.1534/genetics.114.165282]
87. Eichler, E.E.; Flint, J.; Gibson, G.; Kong, A.; Leal, S.M.; Moore, J.H.; Nadeau, J.H. Missing heritability and strategies for finding the underlying causes of complex disease. Nat. Rev. Genet.; 2010; 11, pp. 446-450. [DOI: https://dx.doi.org/10.1038/nrg2809]
88. Wei, W.H.; Hemani, G.; Haley, C.S. Detecting epistasis in human complex traits. Nat. Rev. Genet.; 2014; 15, pp. 722-733. [DOI: https://dx.doi.org/10.1038/nrg3747]
89. Hemani, G.; Shakhbazov, K.; Westra, H.J.; Esko, T.; Henders, A.K.; McRae, A.F.; Yang, J.; Gibson, G.; Martin, N.G.; Metspalu, A. et al. Detection and replication of epistasis influencing transcription in humans. Nature; 2014; 508, pp. 249-253. [DOI: https://dx.doi.org/10.1038/nature13005]
90. Herold, C.; Steffens, M.; Brockschmidt, F.F.; Baur, M.P.; Becker, T. INTERSNP: Genome-wide interaction analysis guided by a priori information. Bioinformatics; 2009; 25, pp. 3275-3281. [DOI: https://dx.doi.org/10.1093/bioinformatics/btp596]
91. Hemani, G.; Theocharidis, A.; Wei, W.; Haley, C. EpiGPU: Exhaustive pairwise epistasis scans parallelized on consumer level graphics cards. Bioinformatics; 2011; 27, pp. 1462-1465. [DOI: https://dx.doi.org/10.1093/bioinformatics/btr172] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/21471009]
92. Schupbach, T.; Xenarios, I.; Bergmann, S.; Kapur, K. FastEpistasis: A high performance computing solution for quantitative trait epistasis. Bioinformatics; 2010; 26, pp. 1468-1469. [DOI: https://dx.doi.org/10.1093/bioinformatics/btq147] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/20375113]
93. Kam-Thong, T.; Czamara, D.; Tsuda, K.; Borgwardt, K.; Lewis, C.M.; Erhardt-Lehmann, A.; Hemmer, B.; Rieckmann, P.; Daake, M.; Weber, F. et al. EPIBLASTER-fast exhaustive two-locus epistasis detection strategy using graphical processing units. Eur. J. Hum. Genet.; 2011; 19, pp. 465-471. [DOI: https://dx.doi.org/10.1038/ejhg.2010.196]
94. Zhang, X.; Huang, S.; Zou, F.; Wang, W. TEAM: Efficient two-locus epistasis tests in human genome-wide association study. Bioinformatics; 2010; 26, pp. i217-i227. [DOI: https://dx.doi.org/10.1093/bioinformatics/btq186] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/20529910]
95. Evans, D.M.; Marchini, J.; Morris, A.P.; Cardon, L.R. Two-stage two-locus models in genome-wide association. PLoS Genet.; 2006; 2, e157. [DOI: https://dx.doi.org/10.1371/journal.pgen.0020157]
96. Zhang, F.T.; Zhu, Z.H.; Tong, X.R.; Zhu, Z.X.; Qi, T.; Zhu, J. Mixed Linear Model Approaches of Association Mapping for Complex Traits Based on Omics Variants. Sci. Rep.; 2015; 5, 10298. [DOI: https://dx.doi.org/10.1038/srep10298]
97. Cattaert, T.; Urrea, V.; Naj, A.C.; De Lobel, L.; De Wit, V.; Fu, M.; John, J.M.M.; Shen, H.; Calle, M.L.; Ritchie, M.D. FAM-MDR: A flexible family-based multifactor dimensionality reduction technique to detect epistasis using related individuals. PLoS ONE; 2010; 5, e10304. [DOI: https://dx.doi.org/10.1371/journal.pone.0010304]
98. Casale, F.P.; Horta, D.; Rakitsch, B.; Stegle, O. Joint genetic analysis using variant sets reveals polygenic gene-context interactions. PLoS Genet.; 2017; 13, e1006693. [DOI: https://dx.doi.org/10.1371/journal.pgen.1006693]
99. Sul, J.H.; Bilow, M.; Yang, W.Y.; Kostem, E.; Furlotte, N.; He, D.; Eskin, E. Accounting for Population Structure in Gene-by-Environment Interactions in Genome-Wide Association Studies Using Mixed Models. PLoS Genet.; 2016; 12, e1005849. [DOI: https://dx.doi.org/10.1371/journal.pgen.1005849]
100. Ning, C.; Wang, D.; Kang, H.M.; Mrode, R.; Zhou, L.; Xu, S.Z.; Liu, J.F. A rapid epistatic mixed-model association analysis by linear retransformations of genomic estimated values. Bioinformatics; 2018; 34, pp. 1817-1825. [DOI: https://dx.doi.org/10.1093/bioinformatics/bty017]
101. Wang, D.; Tang, H.; Liu, J.F.; Xu, S.; Zhang, Q.; Ning, C. Rapid epistatic mixed-model association studies by controlling multiple polygenic effects. Bioinformatics; 2020; 36, pp. 4833-4837. [DOI: https://dx.doi.org/10.1093/bioinformatics/btaa610]
102. Robinson, M.R.; English, G.; Moser, G.; Lloyd-Jones, L.R.; Triplett, M.A.; Zhu, Z.; Nolte, I.M.; van Vliet-Ostaptchouk, J.V.; Snieder, H.; LifeLines Cohort Study et al. Genotype-covariate interaction effects and the heritability of adult body mass index. Nat. Genet.; 2017; 49, pp. 1174-1181. [DOI: https://dx.doi.org/10.1038/ng.3912]
103. Moore, R.; Casale, F.P.; Bonder, M.J.; Horta, D.; Franke, L.; Barroso, I.; Stegle, O.; BIOS Consortium. A linear mixed-model approach to study multivariate gene-environment interactions. Nat. Genet.; 2019; 51, pp. 180-186. [DOI: https://dx.doi.org/10.1038/s41588-018-0271-0]
104. Dahl, A.; Nguyen, K.; Cai, N.; Gandal, M.J.; Flint, J.; Zaitlen, N. A Robust Method Uncovers Significant Context-Specific Heritability in Diverse Complex Traits. Am. J. Hum. Genet.; 2020; 106, pp. 71-91. [DOI: https://dx.doi.org/10.1016/j.ajhg.2019.11.015]
105. Dahl, A.; Cai, N.; Flint, J.; Zaitlen, N. GxEMM: Extending linear mixed models to general gene-environment interactions. bioRxiv; 2018; 397638. [DOI: https://dx.doi.org/10.1101/397638]
106. Wang, H.; Yue, T.; Yang, J.; Wu, W.; Xing, E.P. Deep mixed model for marginal epistasis detection and population stratification correction in genome-wide association studies. BMC Bioinform.; 2019; 20, 656. [DOI: https://dx.doi.org/10.1186/s12859-019-3300-9]
107. Runcie, D.E.; Crawford, L. Fast and flexible linear mixed models for genome-wide genetics. PLoS Genet.; 2019; 15, e1007978. [DOI: https://dx.doi.org/10.1371/journal.pgen.1007978]
108. Schultz, N.; Weigel, K. FFselect: An improved linear mixed model for genome-wide association study in populations featuring shared environments confounded by relatedness. bioRxiv; 2020; 892455. [DOI: https://dx.doi.org/10.1101/2020.01.01.892455]
109. Yamamoto, E.; Matsunaga, H. Exploring efficient linear mixed models to detect quantitative trait locus-by-environment interactions. G3; 2021; 11, jkab119. [DOI: https://dx.doi.org/10.1093/g3journal/jkab119]
110. Li, M.; Zhang, Y.W.; Zhang, Z.C.; Xiang, Y.; Liu, M.H.; Zhou, Y.H.; Zuo, J.F.; Zhang, H.Q.; Chen, Y.; Zhang, Y.M. A compressed variance component mixed model for detecting QTNs and QTN-by-environment and QTN-by-QTN interactions in genome-wide association studies. Mol. Plant; 2022; 15, pp. 630-650. [DOI: https://dx.doi.org/10.1016/j.molp.2022.02.012]
111. Yang, C.; Wan, X.; Lin, X.; Chen, M.; Zhou, X.; Liu, J. CoMM: A collaborative mixed model to dissecting genetic contributions to complex traits by leveraging regulatory information. Bioinformatics; 2019; 35, pp. 1644-1652. [DOI: https://dx.doi.org/10.1093/bioinformatics/bty865] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/30295737]
112. Albert, F.W.; Kruglyak, L. The role of regulatory variation in complex traits and disease. Nat. Rev. Genet.; 2015; 16, pp. 197-212. [DOI: https://dx.doi.org/10.1038/nrg3891] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/25707927]
113. Zhang, X.; Joehanes, R.; Chen, B.H.; Huan, T.; Ying, S.; Munson, P.J.; Johnson, A.D.; Levy, D.; O’Donnell, C.J. Identification of common genetic variants controlling transcript isoform variation in human whole blood. Nat. Genet.; 2015; 47, pp. 345-352. [DOI: https://dx.doi.org/10.1038/ng.3220] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/25685889]
114. Ming, J.; Dai, M.; Cai, M.; Wan, X.; Liu, J.; Yang, C. LSMM: A statistical approach to integrating functional annotations with genome-wide association studies. Bioinformatics; 2018; 34, pp. 2788-2796. [DOI: https://dx.doi.org/10.1093/bioinformatics/bty187]
115. Hao, X.; Zeng, P.; Zhang, S.; Zhou, X. Identifying and exploiting trait-relevant tissues with multiple functional annotations in genome-wide association studies. PLoS Genet.; 2018; 14, e1007186. [DOI: https://dx.doi.org/10.1371/journal.pgen.1007186] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/29377896]
116. Yang, Y.; Shi, X.; Jiao, Y.; Huang, J.; Chen, M.; Zhou, X.; Sun, L.; Lin, X.; Yang, C.; Liu, J. CoMM-S2: A collaborative mixed model using summary statistics in transcriptome-wide association studies. Bioinformatics; 2019; 36, pp. 2009-2016. [DOI: https://dx.doi.org/10.1093/bioinformatics/btz880]
117. Sabatti, C.; Service, S.K.; Hartikainen, A.L.; Pouta, A.; Ripatti, S.; Brodsky, J.; Jones, C.G.; Zaitlen, N.A.; Varilo, T.; Kaakinen, M. et al. Genome-wide association analysis of metabolic traits in a birth cohort from a founder population. Nat. Genet.; 2009; 41, pp. 35-46. [DOI: https://dx.doi.org/10.1038/ng.271]
118. Aulchenko, Y.S.; Ripatti, S.; Lindqvist, I.; Boomsma, D.; Heid, I.M.; Pramstaller, P.P.; Penninx, B.W.; Janssens, A.C.; Wilson, J.F.; Spector, T. et al. Loci influencing lipid levels and coronary heart disease risk in 16 European population cohorts. Nat. Genet.; 2009; 41, pp. 47-55. [DOI: https://dx.doi.org/10.1038/ng.269]
119. Kamatani, Y.; Matsuda, K.; Okada, Y.; Kubo, M.; Hosono, N.; Daigo, Y.; Nakamura, Y.; Kamatani, N. Genome-wide association study of hematological and biochemical traits in a Japanese population. Nat. Genet.; 2010; 42, pp. 210-215. [DOI: https://dx.doi.org/10.1038/ng.531]
120. Furlotte, N.A.; Eskin, E.; Eyheramendy, S. Genome-wide association mapping with longitudinal data. Genet. Epidemiol.; 2012; 36, pp. 463-471. [DOI: https://dx.doi.org/10.1002/gepi.21640]
121. Sikorska, K.; Rivadeneira, F.; Groenen, P.J.; Hofman, A.; Uitterlinden, A.G.; Eilers, P.H.; Lesaffre, E. Fast linear mixed model computations for genome-wide association studies with longitudinal data. Stat. Med.; 2013; 32, pp. 165-180. [DOI: https://dx.doi.org/10.1002/sim.5517]
122. Sikorska, K.; Montazeri, N.M.; Uitterlinden, A.; Rivadeneira, F.; Eilers, P.H.; Lesaffre, E. GWAS with longitudinal phenotypes: Performance of approximate procedures. Eur. J. Hum. Genet.; 2015; 23, pp. 1384-1391. [DOI: https://dx.doi.org/10.1038/ejhg.2015.1]
123. Sung, Y.; Feng, Z.; Subedi, S. A genome-wide association study of multiple longitudinal traits with related subjects. Stat; 2016; 5, pp. 22-44. [DOI: https://dx.doi.org/10.1002/sta4.102]
124. Madsen, P.; Sørensen, P.; Su, G.; Damgaard, L.H.; Thomsen, H.; Labouriau, R. DMU—A package for analyzing multivariate mixed models. Proceedings of the 8th World Congress on Genetics Applied to Livestock Production; Belo Horizonte, Brazil, 13–18 August 2006.
125. Aulchenko, Y.S.; Ripke, S.; Isaacs, A.; van Duijn, C.M. GenABEL: An R library for genome-wide association analysis. Bioinformatics; 2007; 23, pp. 1294-1296. [DOI: https://dx.doi.org/10.1093/bioinformatics/btm108]
126. Hoffman, G.E.; Mezey, J.G.; Schadt, E.E. lrgpr: Interactive linear mixed model analysis of genome-wide association studies with composite hypothesis testing and regression diagnostics in R. Bioinformatics; 2014; 30, pp. 3134-3135. [DOI: https://dx.doi.org/10.1093/bioinformatics/btu435]
127. Gilmour, A.; Gogel, B.; Cullis, B.; Thompson, R. ASReml User Guide Release 2.0; VSN International Ltd.: Hemel Hempstead, UK, 2006.
128. Ziyatdinov, A.; Vazquez-Santiago, M.; Brunel, H.; Martinez-Perez, A.; Aschard, H.; Soria, J.M. lme4qtl: Linear mixed models with flexible covariance structure for genetic studies of related individuals. BMC Bioinform.; 2018; 19, 68. [DOI: https://dx.doi.org/10.1186/s12859-018-2057-x]
129. Shor, T.; Kalka, I.; Geiger, D.; Erlich, Y.; Weissbrod, O. Estimating variance components in population scale family trees. PLoS Genet.; 2019; 15, e1008124. [DOI: https://dx.doi.org/10.1371/journal.pgen.1008124]
130. Gao, J.; Zhou, X.; Hao, Z.; Jiang, L.; Yang, R. Genome-wide barebones regression scan for mixed-model association analysis. Theor. Appl. Genet.; 2020; 133, pp. 51-58. [DOI: https://dx.doi.org/10.1007/s00122-019-03439-5]
131. Lee, S.H.; van der Werf, J.H. MTG2: An efficient algorithm for multivariate linear mixed model analysis based on genomic information. Bioinformatics; 2016; 32, pp. 1420-1422. [DOI: https://dx.doi.org/10.1093/bioinformatics/btw012]
132. Golan, D.; Lander, E.S.; Rosset, S. Measuring missing heritability: Inferring the contribution of common variants. Proc. Natl. Acad. Sci. USA; 2014; 111, pp. E5272-E5281. [DOI: https://dx.doi.org/10.1073/pnas.1419064111]
133. Ge, T.; Chen, C.Y.; Neale, B.M.; Sabuncu, M.R.; Smoller, J.W. Phenome-wide heritability analysis of the UK Biobank. PLoS Genet.; 2017; 13, e1006711. [DOI: https://dx.doi.org/10.1371/journal.pgen.1006711]
134. Weissbrod, O.; Flint, J.; Rosset, S. Estimating SNP-Based Heritability and Genetic Correlation in Case-Control Studies Directly and with Summary Statistics. Am. J. Hum. Genet.; 2018; 103, pp. 89-99. [DOI: https://dx.doi.org/10.1016/j.ajhg.2018.06.002] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/29979983]
135. Speed, D.; Balding, D.J. MultiBLUP: Improved SNP-based prediction for complex traits. Genome Res.; 2014; 24, pp. 1550-1557. [DOI: https://dx.doi.org/10.1101/gr.169375.113] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/24963154]
136. Golan, D.; Rosset, S. Effective Genetic-Risk Prediction Using Mixed Models. Am. J. Hum. Genet.; 2014; 95, pp. 383-393. [DOI: https://dx.doi.org/10.1016/j.ajhg.2014.09.007] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/25279982]
137. Vilhjálmsson, B.J.; Yang, J.; Finucane, H.K.; Gusev, A.; Lindstrom, S.; Ripke, S.; Genovese, G.; Loh, P.R.; Bhatia, G.; Do, R. et al. Modeling Linkage Disequilibrium Increases Accuracy of Polygenic Risk Scores. Am. J. Hum. Genet.; 2015; 97, pp. 576-592. [DOI: https://dx.doi.org/10.1016/j.ajhg.2015.09.001]
138. Loh, P.R.; Kichaev, G.; Gazal, S.; Schoech, A.P.; Price, A.L. Mixed-model association for biobank-scale datasets. Nat. Genet.; 2018; 50, pp. 906-908. [DOI: https://dx.doi.org/10.1038/s41588-018-0144-6]
139. Perez-Enciso, M.; Misztal, I. Qxpak: A versatile mixed model application for genetical genomics and QTL analyses. Bioinformatics; 2004; 20, pp. 2792-2798. [DOI: https://dx.doi.org/10.1093/bioinformatics/bth331]
140. Bradbury, P.J.; Zhang, Z.; Kroon, D.E.; Casstevens, T.M.; Ramdoss, Y.; Buckler, E.S. TASSEL: Software for association mapping of complex traits in diverse samples. Bioinformatics; 2007; 23, pp. 2633-2635. [DOI: https://dx.doi.org/10.1093/bioinformatics/btm308]
141. Yang, J.; Hu, C.; Hu, H.; Yu, R.; Xia, Z.; Ye, X.; Zhu, J. QTLNetwork: Mapping and visualizing genetic architecture of complex traits in experimental populations. Bioinformatics; 2008; 24, pp. 721-723. [DOI: https://dx.doi.org/10.1093/bioinformatics/btm494]
142. Visscher, P.M.; Wray, N.R.; Zhang, Q.; Sklar, P.; McCarthy, M.I.; Brown, M.A.; Yang, J. 10 Years of GWAS Discovery: Biology, Function, and Translation. Am. J. Hum. Genet.; 2017; 101, pp. 5-22. [DOI: https://dx.doi.org/10.1016/j.ajhg.2017.06.005]
143. Lipka, A.E.; Tian, F.; Wang, Q.; Peiffer, J.; Li, M.; Bradbury, P.J.; Gore, M.A.; Buckler, E.S.; Zhang, Z. GAPIT: Genome association and prediction integrated tool. Bioinformatics; 2012; 28, pp. 2397-2399. [DOI: https://dx.doi.org/10.1093/bioinformatics/bts444] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/22796960]
144. Jakobsdottir, J.; McPeek, M.S. MASTOR: Mixed-model association mapping of quantitative traits in samples with related individuals. Am. J. Hum. Genet.; 2013; 92, pp. 652-666. [DOI: https://dx.doi.org/10.1016/j.ajhg.2013.03.014] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/23643379]
145. Visconti, A.; Al-Shafai, M.; Al Muftah, W.A.; Zaghlool, S.B.; Mangino, M.; Suhre, K.; Falchi, M. PopPAnTe: Population and pedigree association testing for quantitative data. BMC Genom.; 2017; 18, 150. [DOI: https://dx.doi.org/10.1186/s12864-017-3527-7] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/28187711]
146. Zhang, W.; Dai, X.; Wang, Q.; Xu, S.; Zhao, P.X. PEPIS: A Pipeline for Estimating Epistatic Effects in Quantitative Trait Locus Mapping and Genome-Wide Association Studies. PLoS Comput. Biol.; 2016; 12, e1004925. [DOI: https://dx.doi.org/10.1371/journal.pcbi.1004925] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/27224861]
147. Abecasis, G.R.; Cardon, L.R.; Cookson, W.O. A general test of association for quantitative traits in nuclear families. Am. J. Hum. Genet.; 2000; 66, pp. 279-292. [DOI: https://dx.doi.org/10.1086/302698]
148. Zeng, J.; de Vlaming, R.; Wu, Y.; Robinson, M.R.; Lloyd-Jones, L.R.; Yengo, L.; Yap, C.X.; Xue, A.; Sidorenko, J.; McRae, A.F. et al. Signatures of negative selection in the genetic architecture of human complex traits. Nat. Genet.; 2018; 50, pp. 746-753. [DOI: https://dx.doi.org/10.1038/s41588-018-0101-4]
149. Zhang, F.T.; Chen, W.H.; Zhu, Z.H.; Zhang, Q.; Nabais, M.F.; Qi, T.; Deary, I.J.; Wray, N.R.; Visscher, P.M.; McRae, A.F. et al. OSCA: A tool for omic-data-based complex trait analysis. Genome Biol.; 2019; 20, 107. [DOI: https://dx.doi.org/10.1186/s13059-019-1718-z]
150. Jiang, L.; Zheng, Z.; Qi, T.; Kemper, K.E.; Wray, N.R.; Visscher, P.M.; Yang, J. A resource-efficient tool for mixed model association analysis of large-scale data. Nat. Genet.; 2019; 51, pp. 1749-1755. [DOI: https://dx.doi.org/10.1038/s41588-019-0530-8]
151. Fabregat-Traver, D.; Sharapov, S.; Hayward, C.; Rudan, I.; Campbell, H.; Aulchenko, Y.; Bientinesi, P. High-Performance Mixed Models Based Genome-Wide Association Analysis with omicABEL software. F1000Research; 2014; 3, 200. [DOI: https://dx.doi.org/10.12688/f1000research.4867.1] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/25717363]
152. Xu, Y.; Yang, T.; Zhou, Y.; Yin, S.; Li, P.; Liu, J.; Xu, S.; Yang, Z.; Xu, C. Genome-Wide Association Mapping of Starch Pasting Properties in Maize Using Single-Locus and Multi-Locus Models. Front. Plant Sci.; 2018; 9, 1311. [DOI: https://dx.doi.org/10.3389/fpls.2018.01311]
153. Scheinfeldt, L.B.; Tishkoff, S.A. Recent human adaptation: Genomic approaches, interpretation and insights. Nat. Rev. Genet.; 2013; 14, pp. 692-702. [DOI: https://dx.doi.org/10.1038/nrg3604]
154. Hackinger, S.; Zeggini, E. Statistical methods to detect pleiotropy in human complex traits. Open Biol.; 2017; 7, 170125. [DOI: https://dx.doi.org/10.1098/rsob.170125] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/29093210]
155. Dudbridge, F.; Fletcher, O. Gene-environment dependence creates spurious gene-environment interaction. Am. J. Hum. Genet.; 2014; 95, pp. 301-307. [DOI: https://dx.doi.org/10.1016/j.ajhg.2014.07.014] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/25152454]
156. Yang, J.; Weedon, M.N.; Purcell, S.; Lettre, G.; Estrada, K.; Willer, C.J.; Smith, A.V.; Ingelsson, E.; O’Connell, J.R.; Mangino, M. et al. Genomic inflation factors under polygenic inheritance. Eur. J. Hum. Genet.; 2011; 19, pp. 807-812. [DOI: https://dx.doi.org/10.1038/ejhg.2011.39] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/21407268]
157. Stahl, E.A.; Wegmann, D.; Trynka, G.; Gutierrez-Achury, J.; Do, R.; Voight, B.F.; Kraft, P.; Chen, R.; Kallberg, H.J.; Kurreeman, F.A. et al. Bayesian inference analyses of the polygenic architecture of rheumatoid arthritis. Nat. Genet.; 2012; 44, pp. 483-489. [DOI: https://dx.doi.org/10.1038/ng.2232] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/22446960]
158. Zaidi, A.A.; Mathieson, I. Demographic history mediates the effect of stratification on polygenic scores. Elife; 2020; 9, e61548. [DOI: https://dx.doi.org/10.7554/eLife.61548]
159. Uffelmann, E.; Posthuma, D. Emerging Methods and Resources for Biological Interrogation of Neuropsychiatric Polygenic Signal. Biol. Psychiatry; 2021; 89, pp. 41-53. [DOI: https://dx.doi.org/10.1016/j.biopsych.2020.05.022] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/32736792]
160. Uffelmann, E.; Huang, Q.Q.; Munung, N.S.; de Vries, J.; Okada, Y.; Martin, A.R.; Martin, H.C.; Lappalainen, T.; Posthuma, D. Genome-wide association studies. Nat. Rev. Methods Primers; 2021; 1, 59. [DOI: https://dx.doi.org/10.1038/s43586-021-00056-9]
161. Guimaraes, L.C.; de Jesus, L.B.; Viana, M.V.C.; Silva, A.; Ramos, R.T.J.; Soares, S.D.; Azevedo, V. Inside the Pan-genome—Methods and Software Overview. Curr. Genom.; 2015; 16, pp. 245-252. [DOI: https://dx.doi.org/10.2174/1389202916666150423002311]
162. Snipen, L.; Almoy, T.; Ussery, D.W. Microbial comparative pan-genomics using binomial mixture models. BMC Genom.; 2009; 10, 385. [DOI: https://dx.doi.org/10.1186/1471-2164-10-385] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/19691844]
163. Rahaman, M.M.; Chen, D.; Gillani, Z.; Klukas, C.; Chen, M. Advanced phenotyping and phenotype data analysis for the study of plant growth and development. Front. Plant Sci.; 2015; 6, 619. [DOI: https://dx.doi.org/10.3389/fpls.2015.00619] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/26322060]
164. Bolger, A.M.; Poorter, H.; Dumschott, K.; Bolger, M.E.; Arend, D.; Osorio, S.; Gundlach, H.; Mayer, K.F.X.; Lange, M.; Scholz, U. et al. Computational aspects underlying genome to phenome analysis in plants. Plant J.; 2019; 97, pp. 182-198. [DOI: https://dx.doi.org/10.1111/tpj.14179] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/30500991]
165. Wilson, D.; Daly, N.L. Venomics: A Mini-Review. High Throughput; 2018; 7, 19. [DOI: https://dx.doi.org/10.3390/ht7030019] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/30041430]
166. Milward, E.A.; Daneshi, N.; Johnstone, D.M. Emerging real-time technologies in molecular medicine and the evolution of integrated ‘pharmacomics’ approaches to personalized medicine and drug discovery. Pharmacol. Ther.; 2012; 136, pp. 295-304. [DOI: https://dx.doi.org/10.1016/j.pharmthera.2012.08.008]
167. Das, S.; Ghosh, I.; Banerjee, G.; Sarkar, U. Artificial Intelligence in Agriculture: A Literature Survey. Int. J. Sci. Res. Comput. Sci. Appl. Manag. Stud.; 2018; 7, pp. 1-6.
168. Jiang, F.; Jiang, Y.; Zhi, H.; Dong, Y.; Li, H.; Ma, S.; Wang, Y.; Dong, Q.; Shen, H.; Wang, Y. Artificial intelligence in healthcare: Past, present and future. Stroke Vasc. Neurol.; 2017; 2, pp. 230-243. [DOI: https://dx.doi.org/10.1136/svn-2017-000101] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/29507784]
169. Min, S.; Lee, B.; Yoon, S. Deep learning in bioinformatics. Brief. Bioinform.; 2017; 18, pp. 851-869. [DOI: https://dx.doi.org/10.1093/bib/bbw068]
170. Fountas, S.; Espejo-Garcia, B.; Kasimati, A.; Mylonas, N.; Darra, N. The Future of Digital Agriculture: Technologies and Opportunities. IT Prof.; 2020; 22, pp. 24-28. [DOI: https://dx.doi.org/10.1109/MITP.2019.2963412]
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
Genome-wide association studies (GWAS) are the most popular approach to dissecting complex traits in plants, humans, and animals. Numerous methods and tools have been proposed for analyzing GWAS data to discover causal variants. Among them, linear mixed models (LMMs) are widely used statistical methods for controlling confounding factors, including population structure, and they have brought gains in computational efficiency and statistical power in GWAS. Recently, with the growing availability of large-scale GWAS data and relevant phenotype samples, more attention has been paid to pleiotropy, multi-trait, gene–gene interaction, gene–environment interaction, and multi-locus methods. In this review, we survey the LMM-based methods available in the literature for GWAS. We briefly discuss the different LMM methods, software packages, and available open-source applications for GWAS. We then outline the advantages and weaknesses of LMMs in GWAS. Finally, we discuss future perspectives and conclusions. This review should help researchers quickly select appropriate LMM models and methods for GWAS data analysis and should benefit the scientific community.
Details


1 Institute of Bioinformatics, Zhejiang University, Hangzhou 310058, China; Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
2 Institute of Bioinformatics, Zhejiang University, Hangzhou 310058, China
3 Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL 35294, USA
4 Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China