Introduction
To date, more than 50 loci associated with COVID-19 susceptibility, hospitalization, and severity have been identified using genome-wide association studies (GWAS) (Kanai et al., 2023; Pairo-Castineira et al., 2023). The COVID-19 Host Genetics Initiative (HGI) has made significant efforts (Niemi et al., 2021) to increase the power to identify disease loci by recruiting individuals from diverse populations and conducting a trans-ancestry meta-analysis. Despite this, the studies still lack genetic diversity and remain focused on cases of European ancestries (Popejoy and Fullerton, 2016; Sirugo et al., 2019). In addition, while trans-ancestry meta-analyses are a powerful approach for discovering shared genetic risk variants with similar effects across populations (Li and Keating, 2014), they may fail to identify risk variants that have larger effects in particular underrepresented populations. Genetic disease risk has been shaped by the particular evolutionary history of populations and by environmental exposures (Rosenberg et al., 2010). These factors are particularly important for infectious diseases because of the selective constraints imposed by host‒pathogen interactions (Karlsson et al., 2014; Kwok et al., 2021). Literature examples of this in COVID-19 severity include a
Including diverse populations in case‒control GWAS with unrelated participants usually requires a prior classification of individuals into genetically homogeneous groups, which are typically analyzed separately to control for population stratification (Peterson et al., 2019). Populations with recent admixture pose an additional challenge to GWAS because of their complex genetic diversity and linkage disequilibrium (LD) patterns, requiring the development of alternative approaches and a careful inspection of results to reduce false positives due to population structure (Rosenberg et al., 2010). In fact, study power benefits from modeling the admixed ancestries either locally, at the regional scale in the chromosomes, or globally, across the genome, depending on factors such as the heterogeneity of the risk variant in frequencies or effects among the ancestry strata (Mester et al., 2023). Despite the development of novel methods specifically tailored for the analysis of admixed populations (Atkinson et al., 2021), the lack of a standardized analysis framework and the difficulties in confidently clustering admixed individuals into particular genetic groups often lead to their exclusion from GWAS.
The Spanish Coalition to Unlock Research on Host Genetics on COVID-19 (SCOURGE) recruited COVID-19 patients between March and December 2020 from hospitals across Spain and from March 2020 to July 2021 in Latin America (https://www.scourge-covid.org). A first GWAS of COVID-19 severity among Spanish patients of European descent revealed novel disease loci and explored age- and sex-varying effects of genetic factors (Cruz et al., 2022). Here, we present the findings of a GWAS meta-analysis in admixed Latin American (AMR) populations, comprising individuals from the SCOURGE Latin American cohort and the HGI studies, which allowed us to identify two novel severe COVID-19 loci,
Results
Meta-analysis of COVID-19 hospitalization in admixed Americans
Study cohorts
Within the SCOURGE consortium, we included 1625 hospitalized cases and 1887 controls (non-hospitalized COVID-19 patients) from Latin American countries and from recruitments of individuals of Latin American descent conducted in Spain (Supplementary file 1). Quality control details and estimation of global genetic inferred ancestry (GIA) (Figure 1—figure supplement 1) are described in ‘Materials and methods’, whereas clinical and demographic characteristics of patients included in the analysis are shown in Table 1. Summary statistics from the SCOURGE cohort were obtained under a logistic mixed model fitted with SAIGE (‘Materials and methods’). Another seven studies participating in the COVID-19 HGI consortium were included in the meta-analysis of COVID-19 hospitalization in admixed Americans (Figure 1).
Figure 1.
Flow chart of this study.
Stage I of the study involved a meta-analysis of the Latin American genome-wide association studies (GWAS) from SCOURGE and the COVID-19 Host Genetics Initiative. The resulting meta-analysis was leveraged to prioritize genes by using a transcriptome-wide association study (TWAS), Bayesian fine-mapping and functional annotations, and to assess the generalizability of polygenic risk score (PGS) cross-population models in Latin Americans. Stage II involved two additional cross-population GWAS meta-analyses to further investigate the replicability of findings.
Figure 1—figure supplement 1.
Global genetic inferred ancestry (GIA) composition in the SCOURGE Latin American cohort.
European (EUR), African (AFR), and Native American (AMR) GIA was derived with ADMIXTURE from a reference panel composed of Aymaran, Mayan, Nahuan, and Quechuan individuals of Native American genetic ancestry and randomly selected samples from the EUR and AFR 1KGP populations. The colors represent the different geographical sampling regions from which the admixed American individuals from SCOURGE were recruited.
Table 1.
Demographic characteristics of the SCOURGE Latin American cohort.
Variable | Non-hospitalized (N = 1887) | Hospitalized (N = 1625) |
---|---|---|
Age, mean years ± SD | 39.1 ± 11.9 | 54.1 ± 14.5 |
Sex, N (%) | | |
Female | 1253 (66.4) | 668 (41.1) |
Global genetic inferred ancestry, % mean ± SD | | |
European | 54.4 ± 16.2 | 39.4 ± 20.7 |
African | 15.3 ± 12.7 | 9.1 ± 11.6 |
Native American | 30.3 ± 19.8 | 51.3 ± 26.5 |
Comorbidities, N (%) | | |
Vascular/endocrinological | 488 (25.9) | 888 (64.5) |
Cardiac | 60 (3.2) | 151 (9.3) |
Nervous | 15 (0.8) | 61 (3.8) |
Digestive | 14 (0.7) | 33 (2.0) |
Onco-hematological | 21 (1.1) | 48 (3.0) |
Respiratory | 76 (4.0) | 118 (7.3) |
GWAS meta-analysis
We performed a fixed-effects GWAS meta-analysis using the inverse of the variance as weights for the overlapping markers. The combined GWAS sample size consisted of 4702 admixed AMR hospitalized cases and 68,573 controls.
This GWAS meta-analysis revealed genome-wide significant associations at four risk loci (Table 2, Figure 2; a quantile‒quantile plot is shown in Figure 2—figure supplement 1), two of which (
Table 2.
Lead independent variants in the admixed AMR genome-wide association studies (GWAS) meta-analysis.
SNP rsID | chr:pos | EA | NEA | OR (95% CI) | p-Value | EAF cases | EAF controls | Nearest gene | Mamba PPR |
---|---|---|---|---|---|---|---|---|---|
| 2:159407982 | T | C | 1.20 (1.12–1.27) | 3.66E-08 | 0.563 | 0.429 | | 0.30 |
| 3:45848457 | T | C | 1.65 (1.47–1.85) | 6.30E-17 | 0.087 | 0.056 | | 0.95 |
| 6:41535254 | A | T | 0.84 (0.79–0.89) | 1.89E-08 | 0.453 | 0.517 | | 0.18 |
| 11:82906875 | G | A | 2.27 (1.7–3.04) | 2.26E-08 | 0.016 | 0.011 | | 0.95 |
EA: effect allele; NEA: non-effect allele; EAF: effect allele frequency in the SCOURGE study; PPR: posterior probability of replicability.
Figure 2.
Manhattan plot for the admixed AMR genome-wide association studies (GWAS) meta-analysis.
Probability thresholds at p=5 × 10–8 and p=5 × 10–5 are indicated by the horizontal lines. Genome-wide significant associations with COVID-19 hospitalizations were found on chromosome 2 (within
Figure 2—figure supplement 1.
Quantile–quantile plot for the AMR genome-wide association studies (GWAS) meta-analysis.
A lambda inflation factor of 1.015 was obtained.
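An inflation factor of this kind can be recomputed from the meta-analysis p-values. The sketch below assumes the conventional definition (median observed 1-df chi-square statistic divided by its expected median under the null, about 0.455); the function name and the toy p-values are illustrative and are not taken from the study.

```python
from statistics import NormalDist, median

def genomic_inflation(p_values):
    """Lambda: median observed 1-df chi-square divided by its expected median."""
    nd = NormalDist()
    # Convert each two-sided p-value into a 1-df chi-square statistic.
    chi2 = [nd.inv_cdf(1 - p / 2) ** 2 for p in p_values]
    # Expected median of a 1-df chi-square, i.e. the statistic at p = 0.5.
    expected_median = nd.inv_cdf(0.75) ** 2
    return median(chi2) / expected_median

# Toy input: evenly spaced p-values mimic a well-calibrated null distribution.
null_p = [(i + 0.5) / 1000 for i in range(1000)]
lam = genomic_inflation(null_p)
```

Values close to 1, as reported here, suggest little residual inflation from population structure.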
Table 3.
Novel variants in the SC-HGIALL and SC-HGI3POP meta-analyses (with respect to HGIv7).
Independent signals after LD clumping.
SNP rsID | chr:pos | EA | NEA | OR (95% CI) | p-Value | Nearest gene | Analysis |
---|---|---|---|---|---|---|---|
| 16:3892266 | T | G | 1.31 (1.19–1.44) | 9.64E-09 | | SC-HGI3POP |
| 19:4063488 | T | C | 0.94 (0.92–0.96) | 1.89E-08 | | SC-HGI3POP |
| 19:4063488 | T | C | 0.94 (0.92–0.96) | 2.50E-08 | | SC-HGIALL |
| 20:6492834 | A | T | 0.95 (0.93–0.97) | 2.83E-08 | | SC-HGIALL |
EA: effect allele; NEA: non-effect allele.
Located within the
Figure 3.
New loci associated with COVID-19 hospitalization in admixed American populations.
(A) Regional association plots for rs13003835 at chromosome 2 and rs77599934 at chromosome 11. (B) Allele frequency distribution across the 1000 Genomes Project populations for the lead variants rs13003835 and rs77599934. Retrieved from
Figure 3—figure supplement 1.
Regional association plots for the fine mapped loci in chromosomes 2 (A) and 16 (B).
Variants allocated to the 95% credible set by Bayesian fine mapping are shown in red, and the sentinel variant is shown in blue.
The other novel risk locus is led by the sentinel variant rs77599934 (Figure 3), a rare intronic variant located in chromosome 11 within
We also observed a suggestive association with rs2601183 in chromosome 15, which is located between
The GWAS meta-analysis also pinpointed two significant variants at known loci,
None of the lead variants was associated with the comorbidities included in Table 1.
Functional mapping of novel risk variants
Variants belonging to the lead loci were prioritized by positional and expression quantitative trait loci (eQTL) mapping with FUMA, resulting in 31 mapped genes (Supplementary file 5). Within the region surrounding the lead variant in chromosome 2, FUMA prioritized four genes in addition to
Bayesian fine mapping
We used several approaches to narrow down the prioritized loci to the set of genes most likely driving the associations. First, we computed 95% credible sets for causal variants and annotated them with VEP and V2G aggregate scoring. The 95% credible set from the region of chromosome 2 around rs13003835 included 76 variants, which are listed in Supplementary file 6; a regional plot is shown in Figure 3—figure supplement 1 (VEP and V2G annotations are included in Supplementary files 7 and 8). The V2G score prioritized
Transcriptome-wide association study (TWAS)
Five novel genes, namely,
Likewise, we carried out TWAS analyses using the models trained in the admixed populations. However, no significant gene pairs were detected in this case. The top 10 genes with the lowest p-values for each of the datasets (Puerto Ricans, Mexicans, African Americans, and pooled cohorts) are shown in Supplementary file 10. Although not significant,
All mapped genes from analyses conducted in AMR populations are shown in Figure 4, and associations for the two novel variants with expression are shown in Figure 4—figure supplement 1.
Figure 4.
Summary of the results from gene prioritization strategies used for genetic associations in AMR populations.
Genome-wide association studies (GWAS) catalog association for
Figure 4—figure supplement 1.
Gene‒tissue pairs for which either rs13003835 or rs60606421 is a significant expression quantitative trait locus (eQTL) at false discovery rate (FDR) < 0.05 (data retrieved from https://gtexportal.org/home/snp/).
rs13003835 (chromosome 2) maps to
Genetic architecture of COVID-19 hospitalization in AMR populations
Allele frequencies of rs13003835 and rs77599934 across ancestries
Neither rs13003835 (
According to gnomAD v3.1.2, the T allele at rs13003835 (
Figure 5.
Forest plot showing effect sizes and the corresponding confidence intervals for the sentinel variants identified in the AMR meta-analysis across populations.
All beta values with their corresponding CIs were retrieved from the B2 population-specific meta-analysis from the HGI v7 release, except for AMR, for which the beta value and CI from the HGIAMR-SCOURGE meta-analysis are represented.
rs77599934 (
Cross-population meta-analyses
We carried out two cross-ancestry inverse variance-weighted fixed-effects meta-analyses with the admixed AMR GWAS meta-analysis results to evaluate whether the discovered risk loci replicated when considering other population groups. In doing so, we also identified novel cross-population COVID-19 hospitalization risk loci.
First, we combined the SCOURGE Latin American GWAS results with the HGI B2 ALL analysis (Supplementary file 11). We refer to this analysis as the SC-HGIALL meta-analysis. Out of the 40 genome-wide significant loci associated with COVID-19 hospitalization in the last HGI release (Kanai et al., 2023), this study replicated 39, and the association was stronger than in the original study in 29 of those (Supplementary file 12). However, the variant rs13003835 located in
In this cross-ancestry meta-analysis, we replicated two associations that were not found in HGIv7, albeit they were sentinel variants in the latest GenOMICC meta-analysis (Pairo-Castineira et al., 2023). We found an association at the
In a second analysis, we also explored the associations across the defined admixed AMR, EUR, and AFR ancestral sources by combining through meta-analysis the SCOURGE Latin American GWAS results with the HGI studies in EUR, AFR, and admixed AMR and excluding those from EAS and SAS (Supplementary file 13). We refer to this as the SC-HGI3POP meta-analysis. The association at rs13003835 (
Polygenic risk score models
Using the 49 variants associated with disease severity that are shared across populations according to the HGIv7, we constructed a PGS model to assess its generalizability in the admixed AMR (Supplementary file 14). First, we calculated the PGS for the SCOURGE Latin Americans and explored the association with COVID-19 hospitalization under a logistic regression model. The PGS model was associated with a 1.48-fold increase in COVID-19 hospitalization risk per PGS standard deviation. It also explained slightly more variance (∆R2 = 1.07%) than the baseline model.
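As an illustration of how a score of this kind is typically built, the sketch below computes a weighted sum of effect-allele dosages and standardizes it across individuals. The three weights and the four genotype vectors are hypothetical placeholders, not the 49 HGIv7 variants or SCOURGE data, and the helper names are ours.

```python
from math import exp, log
from statistics import mean, pstdev

def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0-2) using log-odds weights."""
    return sum(d * w for d, w in zip(dosages, weights))

def standardize(scores):
    """Center and scale the scores so that one unit equals one standard deviation."""
    m, s = mean(scores), pstdev(scores)
    return [(x - m) / s for x in scores]

def odds_ratio_for_sd_shift(or_per_sd, k):
    """Odds ratio implied by a shift of k standard deviations under a log-linear model."""
    return exp(k * log(or_per_sd))

# Hypothetical per-variant weights (log odds ratios) and toy genotype dosages.
weights = [log(1.20), log(1.65), log(0.84)]
cohort = [[0, 0, 2], [1, 0, 1], [2, 1, 0], [1, 1, 1]]
raw = [polygenic_score(ind, weights) for ind in cohort]
z = standardize(raw)

# Under this model, a per-SD odds ratio of 1.48 compounds multiplicatively.
or_two_sd = odds_ratio_for_sd_shift(1.48, 2)
```

Standardizing the score is what allows the effect to be expressed per standard deviation, as in the estimate reported above.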
Subsequently, we divided the individuals into PGS deciles and percentiles to assess risk stratification. The median percentile was 40 among controls and 63 among cases. Individuals in the top PGS decile had a 2.89-fold (95% CI = 2.37–3.54, p=1.29 × 10–7) greater risk than those in deciles 4 to 6 (the middle of the score distribution).
We also examined the distribution of PGS across a five-level severity scale to further determine if there was any correspondence between clinical severity and genetic risk. Median PGS were lower in the asymptomatic and mild groups, whereas higher median scores were observed in the moderate, severe, and critical patients (Figure 6). We fitted a multinomial model using the asymptomatic class as a reference and calculated the OR for each category (Supplementary file 15), observing that the disease genetic risk was similar among asymptomatic, mild, and moderate patients. Given that the PGS was built with variants associated with critical disease and/or hospitalization and that the categories severe and critical correspond to hospitalized patients, these results underscore the ability of cross-ancestry PGS for risk stratification even in an admixed population.
Figure 6.
Polygenic risk distribution for COVID-19 hospitalization.
(A) Polygenic risk stratified by polygenic risk score (PGS) deciles comparing each risk group against the lowest risk group (OR–95% CI). (B) Distribution of the PGS in each of the severity scale classes. 0, asymptomatic; 1, mild disease; 2, moderate disease; 3, severe disease; 4, critical disease.
Discussion
We have conducted the largest GWAS meta-analysis of COVID-19 hospitalization in admixed AMR to date. While the genetic risk basis discovered for COVID-19 is largely shared among populations, trans-ancestry meta-analyses on this disease have primarily included EUR samples. This dominance of GWAS in Europeans and the subsequent bias in sample sizes can mask population-specific genetic risks (i.e., variants that are monomorphic in some populations) or be less powered to detect risk variants having higher allele frequencies in population groups other than Europeans. In this sense, after combining data from admixed AMR patients, we found two risk loci that were first discovered in a GWAS of Latin American populations. Interestingly, the sentinel variant rs77599934 in the
Fine mapping of the region harboring
The risk region found in chromosome 2 harbors more than one gene. The lead variant rs13003835 is located within
A third novel risk region was observed on chromosome 15 between the
Secondary analyses revealed five TWAS-associated genes, some of which have already been linked to severe COVID-19. In a comprehensive multitissue gene expression profiling study (Gómez-Carballa et al., 2022), decreased expression of
To explore the genetic architecture of the trait among admixed AMR populations, we performed two cross-ancestry meta-analyses including the SCOURGE Latin American cohort GWAS findings. We found that the two novel risk variants were not associated with COVID-19 hospitalization outside the population-specific meta-analysis, highlighting the importance of complementing trans-ancestry meta-analyses with group-specific analyses. Notably, this analysis did not replicate the association at the
Moreover, these cross-ancestry meta-analyses pointed to three loci that were not genome-wide significant in the HGIv7 ALL meta-analysis: a novel locus at
We developed a cross-population PGS model, which effectively stratified individuals based on their genetic risk and was consistent with the clinical severity classification of the patients. Only a few polygenic scores have been derived from COVID-19 GWAS data. Horowitz et al., 2022 developed a score using 6 and 12 associated variants (PGS ID: PGP000302) and reported an associated OR (top 10% vs rest) of 1.38 for risk of hospitalization in European populations, whereas the OR for Latin American populations was 1.56. Since their sample size and the number of variants included in the PGS were lower, direct comparisons are not straightforward. Nevertheless, our analysis provides the first results for a PGS applied to a relatively large AMR cohort, which will be of value for future analyses of PGS transferability.
This study is subject to limitations, mostly concerning sample recruitment and composition. The SCOURGE Latin American sample size is small, and the GWAS is likely underpowered. Another limitation is the difference in case‒control recruitment across sampling regions, which, although controlled for, may reduce the ability to observe significant associations driven by different compositions of the populations. In this sense, the identified risk loci might not replicate in a cohort lacking any of the parental population sources from the three-way admixture. Likewise, we could not explicitly control for socioenvironmental factors that could have affected COVID-19 spread and hospitalization rates, although genetic principal components are known to capture nongenetic factors. Finally, we must acknowledge the lack of a replication cohort. Because few studies have been conducted, we used all the available GWAS data for COVID-19 hospitalization in admixed AMR in this meta-analysis and therefore had no independent studies with which to replicate or validate the results. These concerns may be addressed in the future by including more AMR GWAS in the meta-analysis, both by involving diverse populations in study designs and by supporting research from countries in Latin America.
This study provides novel insights into the genetic basis of COVID-19 severity, emphasizing the importance of considering host genetic factors by using non-European populations, especially of admixed sources. Such complementary efforts can pin down new variants and increase our knowledge on the host genetic factors of severe COVID-19.
Materials and methods
Key resources table
Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information |
---|---|---|---|---|
Commercial assay or kit | Chemagic DNA Blood 100 kit | PerkinElmer Chemagen Technologies GmbH | ||
Software, algorithm | Axiom Analysis Suite | Thermo Fisher Scientific | Version 4.0.3.3 | |
Software, algorithm | PLINK | Purcell et al., 2007; https://www.cog-genomics.org/plink/ | RRID:SCR_001757 | Version 1.9; v2 |
Software, algorithm | TOPMed Imputation Server | https://imputation.biodatacatalyst.nhlbi.nih.gov/ | Version 2 | |
Software, algorithm | ADMIXTURE | Alexander et al., 2009; https://dalexander.github.io/admixture/ | RRID:SCR_001263 | Version 1.3.0 |
Software, algorithm | SAIGEgds | Zheng and Davis, 2021; https://www.bioconductor.org/packages/release/bioc/html/SAIGEgds.html | Version 1.10.0 | |
Software, algorithm | METAL | Willer et al., 2010; https://csg.sph.umich.edu/abecasis/metal/ | RRID:SCR_002013 | Version 2011-03-25 |
Software, algorithm | FUMA | Watanabe et al., 2017; https://fuma.ctglab.nl/ | RRID:SCR_017521 | Version 1.5.2 |
Software, algorithm | MAMBA | McGuire et al., 2021; https://github.com/dan11mcguire/mamba | Version 1 | |
Software, algorithm | S-PrediXcan; S-MultiXcan | Barbeira et al., 2018; https://github.com/hakyimlab/MetaXcan | RRID:SCR_016739 | Version 1 |
Software, algorithm | GTEx v8 mashr prediction models | https://predictdb.org/post/2021/07/21/gtex-v8-models-on-eqtl-and-sqtl/ | ||
Other | GWAS Catalog | https://www.ebi.ac.uk/gwas/ | RRID:SCR_012745 | Section ‘Definition of the genetic risk loci and putative functional impact’ |
GWAS in Latin Americans from SCOURGE
The SCOURGE Latin American cohort
A total of 3729 COVID-19-positive cases were recruited across five countries from Latin America (Mexico, Brazil, Colombia, Paraguay, and Ecuador) by 13 participating centers (Supplementary file 1) from March 2020 to July 2021. In addition, we included 1082 COVID-19-positive individuals recruited between March and December 2020 in Spain who either had evidence of origin from a Latin American country or showed inferred genetic admixture between AMR, EUR, and AFR (with <0.05% SAS/EAS). These individuals were excluded from a previous SCOURGE study that focused on participants with European genetic ancestries (Cruz et al., 2022). We used hospitalization as a proxy for disease severity and defined COVID-19-positive patients who underwent hospitalization as a consequence of the infection as cases and those who did not need hospitalization due to COVID-19 as controls.
Samples and data were collected with informed consent after the approval of the Ethics and Scientific Committees from the participating centers and by the Galician Ethics Committee Ref 2020/197. Recruitment of patients from IMSS (in Mexico City) was approved by the National Committee of Clinical Research from Instituto Mexicano del Seguro Social, Mexico (protocol R-2020-785-082).
Samples and data were processed following standardized procedures. The REDCap electronic data capture tool (Harris et al., 2009; Harris et al., 2019), hosted at Centro de Investigación Biomédica en Red (CIBER) from the Instituto de Salud Carlos III (ISCIII), was used to collect and manage demographic, epidemiological, and clinical variables. Subjects were diagnosed with COVID-19 based on quantitative PCR tests (79.3%) or according to clinical (2.2%) or laboratory procedures (antibody tests: 16.3%; other microbiological tests: 2.2%).
SNP array genotyping
Genomic DNA was obtained from peripheral blood and isolated using the Chemagic DNA Blood 100 kit (PerkinElmer Chemagen Technologies GmbH), following the manufacturer’s recommendations.
Samples were genotyped with the Axiom Spain Biobank Array (Thermo Fisher Scientific) following the manufacturer’s instructions in the Santiago de Compostela Node of the National Genotyping Center (CeGen-ISCIII; http://www.usc.es/cegen). This array contains probes for genotyping a total of 757,836 SNPs. Clustering and genotype calling were performed using Axiom Analysis Suite v4.0.3.3 software.
Quality control steps and variant imputation
A quality control (QC) procedure using PLINK 1.9 (Purcell et al., 2007) was applied to both samples and the genotyped SNPs. We excluded variants with a minor allele frequency (MAF) <1%, a call rate <98%, and markers strongly deviating from Hardy–Weinberg equilibrium expectations (p<1 × 10–6) with mid-p adjustment. We also explored the excess of heterozygosity to discard potential cross-sample contamination. Samples missing >2% of the variants were filtered out. Subsequently, we kept the autosomal SNPs, removed high-LD regions, and conducted LD pruning (windows of 1000 SNPs, with a step size of 80 and an r2 threshold of 0.1) to assess kinship and estimate the global ancestral proportions. Kinship was evaluated based on IBD values, removing one individual from each pair with PI_HAT > 0.25 that showed a Z0, Z1, and Z2 coherent pattern (according to the theoretical expected values for each relatedness level). Genetic principal components (PCs) were calculated with PLINK with the subset of LD pruned variants.
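The variant- and sample-level thresholds described above can be summarized as simple predicates. This is a sketch of the filtering logic only, not the PLINK commands used in the study; the class and function names are illustrative.

```python
from dataclasses import dataclass

@dataclass
class VariantQC:
    maf: float        # minor allele frequency
    call_rate: float  # fraction of samples with a genotype call
    hwe_p: float      # Hardy-Weinberg equilibrium test p-value

def passes_variant_qc(v, min_maf=0.01, min_call_rate=0.98, min_hwe_p=1e-6):
    """Keep variants with MAF >= 1%, call rate >= 98%, and HWE p >= 1e-6."""
    return v.maf >= min_maf and v.call_rate >= min_call_rate and v.hwe_p >= min_hwe_p

def passes_sample_qc(missing_rate, max_missing=0.02):
    """Keep samples missing at most 2% of the variants."""
    return missing_rate <= max_missing

def is_related_pair(pi_hat, max_pi_hat=0.25):
    """Flag pairs whose PI_HAT exceeds 0.25 for removal of one individual."""
    return pi_hat > max_pi_hat

common = VariantQC(maf=0.12, call_rate=0.995, hwe_p=0.4)
rare = VariantQC(maf=0.004, call_rate=0.999, hwe_p=0.7)
kept = [v for v in (common, rare) if passes_variant_qc(v)]
```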
Genotypes were imputed with the TOPMed version r2 reference panel (GRCh38) using the TOPMed Imputation Server, and variants with Rsq < 0.3 or with MAF <1% were filtered out. A total of 4348 individuals and 10,671,028 genetic variants were included in the analyses.
Genetic admixture estimation
Global GIA, which refers to the genetic similarity to the reference individuals used, was estimated with ADMIXTURE (Alexander et al., 2009) v1.3 software following a two-step procedure. First, we randomly sampled 79 European (EUR) and 79 African (AFR) samples from The 1000 Genomes Project (1KGP) (Auton et al., 2015) and merged them with the 79 Native American (AMR) samples from Mao et al., 2007, keeping the biallelic SNPs. LD-pruned variants were selected from this merge using the same parameters as in the QC. We then ran an unsupervised analysis with K = 3 to redefine and homogenize the clusters and to compose a refined reference for the analyses by applying a threshold of ≥95% membership in a particular cluster. As a result, 20 AFR, 18 EUR, and 38 AMR individuals were removed. The same LD-pruned variant data from the remaining individuals were merged with the SCOURGE Latin American cohort to perform supervised clustering and estimate admixture proportions. A total of 471 samples from the SCOURGE cohort with >80% estimated European GIA were removed to reduce the weight of the European ancestral component, leaving a total of 3512 admixed Latin American (AMR) subjects for downstream analyses.
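The two thresholds in this procedure (≥95% cluster membership for reference individuals and >80% European GIA for exclusion from the cohort) can be expressed as follows. The sketch operates on already estimated ancestry fractions; it does not run ADMIXTURE, and the sample identifiers and fractions are invented for illustration.

```python
def refine_reference(memberships, threshold=0.95):
    """Keep reference individuals whose largest cluster fraction is >= threshold.

    memberships maps a sample ID to its K cluster fractions; the returned
    dictionary maps each retained ID to the index of its assigned cluster.
    """
    kept = {}
    for sample_id, fractions in memberships.items():
        top = max(fractions)
        if top >= threshold:
            kept[sample_id] = fractions.index(top)
    return kept

def keep_admixed(eur_fraction, max_eur=0.80):
    """Retain cohort samples unless their European GIA exceeds 80%."""
    return eur_fraction <= max_eur

# Hypothetical unsupervised K = 3 output for three reference individuals.
reference = {
    "ref1": [0.97, 0.02, 0.01],
    "ref2": [0.60, 0.30, 0.10],
    "ref3": [0.01, 0.04, 0.95],
}
refined = refine_reference(reference)
```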
Association analysis
The results for the SCOURGE Latin American GWAS were obtained by testing for COVID-19 hospitalization as a surrogate of severity. To accommodate the continuum of GIA in the cohort, we opted for a joint testing of all the individuals as a single study using a mixed regression model as this approach has demonstrated a greater power and sufficient control of population structure (Wojcik et al., 2019). The SCOURGE cohort consisted of 3512 COVID-19-positive patients: cases (n = 1625) were defined as hospitalized COVID-19 patients, and controls (n = 1887) were defined as non-hospitalized COVID-19-positive patients.
Logistic mixed regression models were fitted using the SAIGEgds (Zheng and Davis, 2021) package in R, which implements the two-step mixed SAIGE (Zhou et al., 2018) model methodology and the SPA test. Baseline covariables included sex, age (continuous), and the first 10 PCs. To account for potential heterogeneity in the recruitment and hospitalization criteria across the participating countries, we adjusted the models by groups of the recruitment areas classified into six categories: Brazil, Colombia, Ecuador, Mexico, Paraguay, and Spain. This dataset has not been used in any previously published GWAS of COVID-19.
Meta-analysis of Latin American populations
The results of the SCOURGE Latin American cohort were meta-analyzed with the AMR HGI-B2 data, constituting our primary analysis. Summary results from the HGI freeze 7 B2 analysis corresponding to the admixed AMR population were obtained from the public repository (April 8, 2022: https://www.covid19hg.org/results/r7/), totaling 3077 cases and 66,686 controls from seven contributing studies. We selected the B2 phenotype definition because it offered more power, and the presence of population controls not ascertained for COVID-19 does not have a drastic impact on the association results.
The meta-analysis was performed using an inverse-variance weighting method in METAL (Willer et al., 2010). The average allele frequency was calculated, and variants with low imputation quality (Rsq < 0.3) were filtered out, leaving 10,121,172 variants for meta-analysis.
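For reference, the inverse-variance weighting scheme combines per-study effect estimates as sketched below. This is the textbook fixed-effects formula rather than METAL's own code; the Cochran's Q statistic is included on the assumption that it is the heterogeneity measure intended, and the two study estimates are made-up numbers.

```python
from math import exp, sqrt

def ivw_fixed_effects(betas, ses):
    """Fixed-effects inverse-variance meta-analysis of log odds ratios.

    Returns the pooled beta, its standard error, and Cochran's Q.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = sqrt(1.0 / total)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    return beta, se, q

# Two hypothetical studies reporting log odds ratios and standard errors.
beta, se, q = ivw_fixed_effects([0.20, 0.10], [0.05, 0.10])
odds_ratio = exp(beta)
ci_low, ci_high = exp(beta - 1.96 * se), exp(beta + 1.96 * se)
```

The more precise study (smaller standard error) dominates the pooled estimate, which is why large contributing cohorts carry most of the weight in such analyses.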
Heterogeneity between studies was evaluated with Cochran’s
Replicability of associations
The model-based method MAMBA (McGuire et al., 2021) was used to calculate the posterior probability of replication for each of the lead variants (PPR; the posterior probability that an SNP has a non-zero replicable effect). We defined PPR <0.1 as a low posterior probability of replication, following the original paper, whereas variants with PPR >0.9 were considered consistent and likely to replicate in future studies. Variants with p<1 × 10–5 were clumped and combined with random pruned variants from the 1KGP AMR reference panel. Then, MAMBA was applied to the set of significant and non-significant variants.
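The interpretation rule for PPR values can be written as a small helper. The cut-offs of 0.1 and 0.9 come from the text; the label "intermediate" for values in between is our own shorthand and is not defined by MAMBA or by the study.

```python
def classify_ppr(ppr, low=0.1, high=0.9):
    """Label a posterior probability of replicability using the stated cut-offs."""
    if not 0.0 <= ppr <= 1.0:
        raise ValueError("PPR must lie between 0 and 1")
    if ppr < low:
        return "low"
    if ppr > high:
        return "likely to replicate"
    return "intermediate"  # assumed label for values between the two cut-offs

# PPR values as listed for the four lead variants in Table 2.
labels = [classify_ppr(p) for p in (0.30, 0.95, 0.18, 0.95)]
```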
Each of the lead variants was also tested for association with the main comorbidities in the SCOURGE cohort with logistic regression models (adjusted by the same base covariables as the GWAS).
Definition of the genetic risk loci and putative functional impact
Definition of lead variant and novel loci
To define the lead variants in the loci that were genome-wide significant, LD-clumping was performed on the meta-analysis data using a threshold p-value<5 × 10–8, clump distance = 1500 kb, independence set at a threshold r2 = 0.1 and the SCOURGE cohort genotype data as the LD reference panel. Independent loci were deemed as a novel finding if they met the following criteria: (1) p-value<5 × 10–8 in the meta-analysis and p-value>5 × 10–8 in the HGI B2 ALL meta-analysis or in the HGI B2 AMR and AFR and EUR analyses when considered separately; (2) Cochran’s
Annotation and initial mapping
Functional annotation was performed with FUMA (Watanabe et al., 2017) for those variants with a p-value<5 × 10–8 or in moderate-to-strong LD (r2 > 0.6) with the lead variants, where the LD was calculated from the 1KGP AMR panel. Genetic risk loci were defined by collapsing LD blocks within 250 kb. Then, genes, scaled CADD v1.4 scores, and RegulomeDB v1.1 scores were annotated for the resulting variants with ANNOVAR in FUMA (Watanabe et al., 2017). Gene-based analysis was also performed using MAGMA (de Leeuw et al., 2015) as implemented in FUMA under the SNP-wide mean model using the 1KGP AMR reference panel. Significance was set at a threshold p<2.66 × 10–6 (which assumes that variants can be mapped to a total of 18,817 genes).
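The gene-based significance threshold follows from a Bonferroni correction. A short check, assuming a family-wise alpha of 0.05 (the value that reproduces the stated threshold):

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests

# 18,817 genes, as assumed for the MAGMA gene-based analysis.
gene_threshold = bonferroni_threshold(18817)
formatted = f"{gene_threshold:.2e}"
```

The same function applies to any analysis where significance is set at 0.05 divided by the number of genes tested.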
FUMA allowed us to perform initial gene mapping by two approaches: (1) positional mapping, which assigns variants to genes by physical distance using 10 kb windows; and (2) eQTL mapping based on GTEx v.8 data from whole blood, lungs, lymphocytes, and esophageal mucosa tissues, establishing a false discovery rate (FDR) of 0.05 to declare significance for variant–gene pairs.
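Positional mapping with a 10 kb window can be pictured as follows. This is a simplified sketch of the idea, not FUMA's implementation; the gene names and coordinates are hypothetical and are assumed to lie on the same chromosome as the variant.

```python
def positional_map(variant_pos, genes, window=10_000):
    """Return genes whose body, extended by `window` bp on each side, contains the variant.

    genes maps a gene name to its (start, end) coordinates.
    """
    return sorted(
        name
        for name, (start, end) in genes.items()
        if start - window <= variant_pos <= end + window
    )

# Hypothetical gene coordinates for illustration only.
genes = {"GENE_A": (100_000, 150_000), "GENE_B": (165_000, 200_000)}
both = positional_map(155_000, genes)
only_b = positional_map(162_000, genes)
none = positional_map(300_000, genes)
```

A variant lying between two nearby genes can therefore be assigned to both, which is one reason fine mapping is needed afterwards to refine the candidates.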
Subsequently, to assign the variants to the most likely gene driving the association, we refined the candidate genes by fine mapping the discovered regions.
Bayesian fine-mapping
To conduct a Bayesian fine mapping, credible sets for the genetic loci considered novel findings were calculated on the results from each of the three meta-analyses to identify a subset of variants most likely containing the causal variant at the 95% confidence level, assuming that there is a single causal variant and that it has been tested. We used
VEP and V2G annotation
We used the Variant-to-Gene (V2G) score to prioritize the genes that were most likely affected by the functional evidence based on eQTL, chromatin interactions, in silico functional predictions, and distance between the prioritized variants and transcription start site (TSS), based on data from the Open Targets Genetics portal (Ghoussaini et al., 2021). Details of the data integration and the weighting of each of the datasets are described in detail at https://genetics-docs.opentargets.org/our-approach/data-pipeline. V2G is a score for ranking the functional genomics evidence that supports the connection between variants and genes (the higher the score the more likely the variant to be functionally implicated on the assigned gene). We used VEP release 111 (https://www.ensembl.org/info/docs/tools/vep/index.html; accessed April 10, 2024; McLaren et al., 2016) to annotate the following: gene symbol, function (exonic, intronic, intergenic, non-coding RNA, etc.), impact, feature type, feature, and biotype.
We queried the GWAS Catalog (date of accession: 1/07/2024) for evidence of association of each of the prioritized genes with traits related to lung diseases or phenotypes. Lastly, genes that were linked to COVID-19, infection, or lung diseases in the reviewed literature were classified as ‘literature evidence’.
Transcriptome-wide association studies
TWAS were conducted using the pretrained prediction models with MASHR-computed effect sizes on GTEx v8 datasets (Barbeira et al., 2019a; Barbeira et al., 2021). The results from the Latin American meta-analysis were harmonized and integrated with the prediction models through S-PrediXcan (Barbeira et al., 2018) for lung, whole blood, lymphocyte, and esophageal mucosal tissues. Statistical significance was set at p-value<0.05 divided by the number of genes that were tested for each tissue. Subsequently, we leveraged results for all 49 tissues and ran a multitissue TWAS (S-MultiXcan) to improve the power for association, as demonstrated recently (Barbeira et al., 2019b). TWAS was also performed using recently published gene expression datasets derived from a cohort of African Americans, Puerto Ricans, and Mexican Americans (GALA II-SAGE) (Kachuri et al., 2023).
Cross-population meta-analyses
We conducted two additional meta-analyses to investigate whether combining populations allowed our discovered risk loci to be replicated. This approach enabled us to compare the effects and the significance of the associations at the novel risk loci between analyses that included or excluded other population groups.
The first meta-analysis comprised the five populations analyzed within HGI (B2-ALL). Additionally, to evaluate the three GIA components within the SCOURGE Latin American cohort (Bryc et al., 2010), we conducted a meta-analysis of the admixed AMR, EUR, and AFR cohorts (B2). All summary statistics were retrieved from the HGI repository. We applied the same meta-analysis methodology and filters as in the admixed AMR meta-analysis.
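For readers unfamiliar with how per-cohort summary statistics are combined, the sketch below shows a fixed-effect, inverse-variance-weighted meta-analysis of a single variant. This is an assumption made purely for illustration: the meta-analysis method itself is not specified in this section, and the function name and toy effect sizes are hypothetical.

```python
import math


def fixed_effect_meta(betas, ses):
    """Combine per-cohort effect sizes by inverse-variance weighting.

    betas: per-cohort log odds ratios for one variant.
    ses:   matching standard errors.
    Returns (pooled beta, pooled standard error, z-score, two-sided p-value).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and of equal length")
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p


# Two toy cohorts with identical effect sizes and standard errors.
beta, se, z, p = fixed_effect_meta([0.2, 0.2], [0.1, 0.1])
print(f"beta={beta:.3f} se={se:.4f} z={z:.2f} p={p:.4f}")
```

With two equally sized toy cohorts, the pooled effect is unchanged while the standard error shrinks by a factor of √2, which is why adding cohorts with concordant effects strengthens the association signal.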
Cross-population polygenic risk score
A PGS for critical COVID-19 was derived by combining the variants associated with hospitalization or disease severity that have been discovered to date. We curated a list of lead variants that were (1) associated with either severe disease or hospitalization in the latest HGIv7 release (Kanai et al., 2023) (using the hospitalization weights) or (2) associated with severe disease in the latest GenOMICC meta-analysis (Pairo-Castineira et al., 2023) that were not reported in the latest HGI release. A total of 48 markers were used in the PGS model (see Supplementary file 13) since two variants were absent from our study.
Scores were calculated and normalized for the SCOURGE Latin American cohort with PLINK 1.9. This cross-ancestry PGS was used as a predictor of hospitalization (COVID-19-positive patients who were hospitalized vs COVID-19-positive patients who did not require hospital admission) by fitting a logistic regression model. The models were fit with and without the PGS, adjusting for sex, age, the first 10 PCs, and the sampling region (in the admixed AMR cohort). Prediction accuracy for the PGS was assessed as the increase in the pseudo-R² over 500 bootstrap resamples, with the partial pseudo-R² computed and averaged among the resamples. We also divided the sample into deciles and percentiles to assess risk stratification.
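As an illustration of the scoring and stratification steps, the following minimal sketch computes a PGS as a weighted sum of effect-allele dosages, standardizes it, and assigns individuals to deciles. It is not a reproduction of the PLINK 1.9 output; it assumes complete genotype data, and the variant identifiers, weights, and function names are hypothetical.

```python
from statistics import mean, pstdev


def polygenic_scores(dosages, weights):
    """Weighted sum of effect-allele dosages (0 to 2) for each individual.

    dosages: list of dicts mapping variant ID -> dosage, one dict per individual.
    weights: dict mapping variant ID -> effect size (for example, a log odds ratio).
    """
    return [sum(weights[v] * person[v] for v in weights) for person in dosages]


def standardize(scores):
    """Convert raw scores to z-scores (mean 0, standard deviation 1)."""
    mu = mean(scores)
    sd = pstdev(scores)
    if sd == 0:
        raise ValueError("scores have zero variance and cannot be standardized")
    return [(s - mu) / sd for s in scores]


def assign_deciles(scores):
    """Assign each individual to a decile (1 = lowest, 10 = highest) by rank."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    deciles = [0] * n
    for rank, idx in enumerate(order):
        deciles[idx] = rank * 10 // n + 1
    return deciles


# Hypothetical weights and genotypes for three individuals and two variants.
toy_weights = {"var1": 0.2, "var2": -0.1}
toy_dosages = [
    {"var1": 0, "var2": 0},
    {"var1": 1, "var2": 1},
    {"var1": 2, "var2": 0},
]
raw_scores = polygenic_scores(toy_dosages, toy_weights)
z_scores = standardize(raw_scores)
print(raw_scores, z_scores)
```

The standardized score can then be entered as a covariate in the logistic regression, so that its coefficient is interpreted per standard deviation of the PGS.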
A clinical severity scale was used in a multinomial regression model to further evaluate the power of this cross-ancestry PGS for risk stratification. These severity strata were defined as follows: (0) asymptomatic; (1) mild, that is, with symptoms, but without pulmonary infiltrates or need of oxygen therapy; (2) moderate, that is, with pulmonary infiltrates affecting <50% of the lungs or need of supplemental oxygen therapy; (3) severe disease, that is, with hospital admission and PaO2 <65 mmHg or SaO2 <90%, PaO2/FiO2<300, SaO2/FiO2<440, dyspnea, respiratory frequency ≥22 bpm, and infiltrates affecting >50% of the lungs; and (4) critical disease, that is, with an admission to the ICU or need of mechanical ventilation (invasive or noninvasive).
© 2024, Diz-de Almeida, Cruz et al. This work is published under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
The genetic basis of severe COVID-19 has been thoroughly studied, and many genetic risk factors shared between populations have been identified. However, reduced sample sizes from non-European groups have limited the discovery of population-specific common risk loci. In this second study nested in the SCOURGE consortium, we conducted a genome-wide association study (GWAS) for COVID-19 hospitalization in admixed Americans, comprising a total of 4702 hospitalized cases recruited by SCOURGE and seven other participating studies in the COVID-19 Host Genetics Initiative. We identified four genome-wide significant associations, two of which constitute novel loci and were first discovered in Latin American populations (