Introduction
Since its emergence, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has been evolving, with evidence for increased transmissibility and ability to evade existing immune responses elicited by prior infection or COVID-19 vaccination [1,2]. Multiple mutations that increase transmission and immune evasion were identified when the Delta and subsequent Omicron variants (and sub-variants) emerged. However, few in-depth analyses characterizing the genomic variation within a circulating variant and identifying potential mutations associated with immune evasion have been conducted. In addition, during the Omicron wave, individual immunity could be due to prior infection or multiple levels of vaccination, due to the availability of vaccine boosters in this timeframe, and any immune evasion characteristics found among mutations could be tied to a specific type of immunity, derived from, for example, prior infection, immunization dose(s), or both.
To address these data gaps, we conducted a retrospective cohort study of persons with nucleic-acid amplification test (NAAT)-confirmed SARS-CoV-2 infection with the Omicron variant confirmed by whole genome sequencing. Patients were members of a large, integrated health system that had detailed epidemiologic and clinical data on all participants. The study assessed overall SARS-CoV-2 genetic variation over time as well as the relationship between mutations and the individual’s source(s) of potential COVID-19 immunity. The key assumption in this study is that after adjusting for sociodemographic characteristics and timeframe, an association between mutation presence and vaccine- or infection-derived immunity could imply greater immune evasion for that mutation.
Methods
Setting
Kaiser Permanente Northern California (KPNC) is an integrated health system that serves over 4.5 million members in Northern and Central California and provides comprehensive preventive and curative care in inpatient and outpatient settings across 266 medical offices and 21 hospitals. Members receive most clinical services, including laboratory testing, outpatient, and inpatient care in KPNC facilities. Members have similar sociodemographic characteristics to the population of Northern and Central California [3]. SARS-CoV-2 NAAT for outpatients and hospitalized patients is conducted at a regional laboratory or local hospital laboratories using various platforms and specimens are sent to the California Department of Public Health (CDPH) for whole genome sequencing.
Study design
We conducted a retrospective cohort study of persons with NAAT-confirmed SARS-CoV-2 infection and whole genome sequencing-confirmed Omicron variant infection and assessed patterns among non-lineage-defining mutations as well as the association between SARS-CoV-2 genetic variation and immunity source (if any). This immunity was defined by prior reported SARS-CoV-2 infection or vaccination. Epidemiologic and clinical data were obtained from the KPNC Virtual Data Warehouse, a common data model into which standardized data are extracted from clinical and administrative databases including an integrated electronic health record database (Epic, Verona, WI, USA), and from other sources [4,5]. The study was approved by the KPNC and University of California, Berkeley Institutional Review Boards with waivers of the requirement for informed consent. This activity was reviewed by the Centers for Disease Control and Prevention (CDC) and conducted consistent with applicable federal law and CDC policy (see, e.g., 45 C.F.R. part 46.102(l)(2), 21 C.F.R. part 56; 42 U.S.C. §241(d); 5 U.S.C. §552a; 44 U.S.C. §3501 et seq.).
Inclusion criteria
We included all persons with an incident NAAT-confirmed SARS-CoV-2 infection and whole genome sequencing-confirmed Omicron variant infection from January 1, 2022 to October 31, 2022 who met the following criteria: 1) KPNC membership for at least one year prior to SARS-CoV-2 diagnosis; 2) aged ≥5 years and <90 years; 3) no prior receipt of COVID-19 vaccine or the completion of a primary COVID-19 vaccine series of two doses of BNT162b2 [Pfizer/BioNTech] or m-RNA-1973 [Moderna/National Institutes of Health] or one dose of Ad.26.COV2.s [Johnson & Johnson] more than 14 days prior to index date; 4) incident infection defined as the first NAAT-confirmed SARS-CoV-2 infection during the study period. The index date was defined as the date of NAAT-confirmation of SARS-CoV-2 infection.
SARS-CoV-2 whole genome sequencing and processing
KPNC is a partner in COVIDNet with CDPH and submits a majority of NAAT-confirmed SARS-CoV-2 specimens to COVIDNet for genomic surveillance; a subset of specimens undergo whole genome sequencing at CDPH or affiliated laboratories based on CDPH priorities, including prioritizing hospitalized cases and a convenience sample of the first specimens received per week (number varies by burden) [6]. COVIDNet consists of a set of (primarily University of California-affiliated) laboratories tapped to assist CDPH because they had surplus sequencing capacity. Illumina sequencing was used by most of the sites, with about 10% of genomes being sequenced by Oxford Nanopore. For this study, we only included Illumina-sequenced genomes because the error rate of Nanopore is too high to accurately identify low-frequency mutations. Varying sets of primers were used by different institutions, but the most common for the Omicron genomes were the ARTICv3 and ARTICv4 primers. The University of California, San Francisco lab used their own custom set of primers with more primers and shorter amplicons. After sequencing, the CDPH Viral and Rickettsial Disease Laboratory links lineage information with patient identifiers and shares the results with KPNC. The raw sequence files are shared with the University of California, Berkeley (UCB); KPNC and UCB collaborate further to link demographic, clinical, and epidemiologic data from KPNC with subsequent sequence analysis data from UCB.
The raw sequencing data were processed through a SARS-CoV-2 analysis pipeline that was modified for this work as follows. Adapter removal and trimming were performed using bbduk (http://sourceforge.net/projects/bbmap/). The reads were then aligned to the Wuhan reference genome (GISAID ID: EPI_ISL_402125, GenBank accession number MN908947) using minimap2 [7] followed by primer trimming using iVar [8]. Samtools was used to create a pileup file from which a consensus sequence was generated. This consensus genome was created with iVar using a minimum depth of 10 reads and majority rule for base calling. We next used iVar to call variants from the pileup file where we set the sensitivity threshold for calling a mutation to 2%. This process calls mutations for any loci where at least 2% of the reads are non-reference. Using this very low threshold allowed us to capture even very low frequency mutations in the sequencing data. This threshold resulted in many spurious mutations from sequencing and alignment errors and therefore we later set a threshold of 5% when processing mutations. This original list of variants was then annotated with the gene and amino acid change (if there was one) or insertion or deletion.
After calling with iVar using a low-frequency threshold, mutations were compiled into a master frequency matrix listing all mutations that appeared in any genome in rows, and all the genomes that each mutation appeared in columns (https://github.com/staciawyman/SC2_GenVar). The cell values were frequency of the alternate reads at the location. A mutation had to have an allele frequency of at least 5% to be included to rule out sequencing errors. This matrix was partitioned into variant-specific parts to facilitate analysis. We constructed an Omicron matrix including all individuals that had the Omicron lineage [9] between January 1, 2022 and October 31, 2022 including non-synonymous SNPs, insertions and deletions (indels). Pipeline code is available at https://github.com/staciawyman/SC2_GenVar. For research purposes, data was accessed between January 1, 2023 and June 30, 2024.
Outcome measure
The primary outcome measure was, among persons with an incident NAAT-confirmed SARS-CoV-2 infection, the presence of individual mutations (nonsynonymous SNPs, insertions and deletions) with ≥5% allelic frequency and ≥5% and ≤95% population frequency, as well as mutation groups created by grouping SNPs with a pairwise Pearson correlation coefficient > 0.7. Additionally, patterns in the prevalence of these mutations were assessed as follows: as noted earlier, pairwise Pearson correlations with other mutations in each infection; presence within Omicron subvariant (BA.1, BA.2, BA.4, BA.5, BCD, BE, BF, Bdot), and prevalence over time.
Primary exposure: SARS-CoV-2 immunity status
The primary exposure of interest in this analysis was immunity level, defined in mutually exclusive categories: i) no recorded vaccination or recorded prior infection; ii) recorded prior infection but no recorded vaccination; and iii) recorded vaccination (primary series complete and/or more than primary series), regardless of prior infection status.
Statistical analysis
Adjusted relative prevalence, shown as an adjusted rate ratio (aRR), and 95% confidence intervals (CIs) for the estimates for our exposure and adjustment variables were obtained from modified Poisson regression models with robust standard errors using PROC GENMOD with a log link function in SAS (Version 9.4, Cary, North Carolina); statistical significance was defined as a p-value <0.05. All p-values were adjusted for multiple comparisons to control the false discovery rate at 5%, using the Benjamini-Hochberg procedure [10]. To adjust for confounding, all models included age category, sex, race/ethnicity, Charlson comorbidity index score [11], and index month. Adjusting for these demographic and clinical factors accounted for differential vaccination status of these subgroups, increasing vaccination rates over time, and the inherent uncertainty as to where in a population a new mutation might arise.
Results
From January 1, 2022 to October 31, 2022, 312,249 SARS-CoV-2-positive specimens were submitted for sequencing, 18,603 (6.0%) underwent sequencing (S1 Fig), and 18,476 (5.9%) had reported lineages consistent with the Omicron variant. After applying selection criteria, 15,556 persons with SARS-CoV-2 Omicron variant infection were included in the analysis: 9,172 (58%) vaccinated with a booster; 4,209 (26%) vaccinated with a complete primary series [two doses of Pfizer or Moderna vaccine]; and 2,185 (14%) with no vaccination (S1 Fig). The prevalence of Omicron subvariants in our specimens, aggregated to the week level, showed increasing diversity over time (Fig 1).
[Figure omitted. See PDF.]
SARS-CoV-2-positive specimens tested by Kaiser Permanente Northern California, subvariant proportions by week, January 1, 2022 to October 31, 2022.
After applying selection criteria to the sequences, we identified 78 non-lineage defining mutations (all non-synonymous SNPs) in 1,456 SARS-CoV-2 SNPs included in this analysis, shown in S2 Fig. Identified mutations were in the spike gene (n = 44), nucleocapsid gene (n = 8) open reading frame 1a (ORF1a) (n = 19), ORF1b (n = 4), ORF3a (n = 1), ORF6c (n = 1), ORF10 (n = 1). The population frequency within our genomes of individual identified mutations ranged from 5.0% to 95.0% and 46 out of 79 (58.2%) mutations were present in <50% of the overall population. Although we called low-frequency mutations with our pipeline, none of them met the criteria for inclusion in our analysis.
Of the 78 identified SNPs that met our inclusion criteria, many co-occurred on the same specimens. S2 Fig shows the pairwise correlations of SNP presence in our dataset. Given the strong pairwise correlations observed, we formed groups of SNPs that had a pairwise Pearson correlation > .7, reducing the dimension to 33 SNPs/SNP groups, also shown in S2 Fig. These mutation groups are largely monophyletic and segregate together across the phylogeny of Omicron genomes (S3 Fig).
The demographics of persons with SARS-CoV-2 infection included in this analysis, stratified by vaccination and prior infection status, are shown in Table 1, and the number of total reported infections and those sequenced over time is shown in S4 Fig. Significant differences were found between groups, thus supporting their inclusion in our covariate adjustment set. Figure 2 details some of the relationships between SNP groups, time, and subvariant among the specimens in our data set. While certain mutations/groups appear essentially only in a single subvariant (for example, 99.1% of S:H69L mutations appear in specimens classified as BA.1 in Figure 2c), the majority of mutations/groups appear in many subvariants, suggesting that they are not simply proxy measures of subvariant. Further, the presence of mutations/groups do not follow clear temporal patterns (Figure 2b). While some mutations such as Group 12 arise, become plentiful, then wane, others disappear and re-appear over time without a clear pattern (for example, S:D1259H). Smoothed over time, grouped SNP prevalence by month is shown in S5 Fig.
[Figure omitted. See PDF.]
[Figure omitted. See PDF.]
SNP, group, and sublineage appearance, January 1, 2022 to October 31, 2022: (a) Proportion of specimens in each sublineage by week; (b) Proportion of specimens with a given SNP/group by week; (c) Proportion of sublineage by SNP/group (rows sum to 100%).
Using modified Poisson regression models with robust standard errors, we found several of our 33 SNP groupings (Table 2) and/or 78 individual mutations (S1 Table) that were significantly associated with immunity status; however, after correcting p-values to adjust for multiple comparisons, none of the associations were statistically significant. These findings were not meaningfully changed when people with a booster vaccine were/were not aggregated with the complete primary series group (S2 Table, S3 Table). The SNPs/groups with the largest absolute associations with vaccination (Table 2) included some that were strongly associated with a single variant (for example, ORF10:L37F, which occurred almost exclusively in BA.5 subvariants) as well as some that frequently appeared in multiple variants (for example, ORF1b:N498I and group 45).
[Figure omitted. See PDF.]
When looking at the evolutionary relationships with the genomes colored by vaccination status (S6 Fig), vaccination status did not show a correlation with a clade/subvariant or distance from the root (number of mutations with respect to the reference SARS-CoV-2 genome).
Discussion
Characterizing the continuing evolution of SARS-CoV-2, along with active surveillance of infections, is critical to targeting public health measures and vaccine development. Large-scale global whole genome sequencing efforts have contributed substantially to our understanding of the genetic variation of SARS-CoV-2 and its associations with transmissibility and immune evasion. Our current study adds to this evidence base by assessing within-variant (Omicron) variation in the genome beyond the already-identified subvariants such as BA.1 and BA.2 using a unique whole genome surveillance system linking genomic data with detailed data on demographic, clinical and epidemiologic characteristics in a well-defined population.
Understanding the potential for immune evasion based on mutations beyond the spike protein is critical for vaccine development and understanding the mechanisms of the human immune response. In this detailed analysis of genomic variation within the Omicron variant over a 10-month period in California, we identified only 78 mutations that met our inclusion criteria among 15,566 genomes. These mutations occurred across the entire genome, not just in the spike protein. Spike protein mutations previously implicated with immune evasion were excluded based on population frequency. We encourage future research to examine the entire viral genome for potential immune evasion or increased replication or infectivity.
Our results showed that there appear to be some mutations that are tightly grouped with certain subvariants, while a number of other mutations appear across multiple subvariants. While we did not find evidence for immune evasion among our identified mutations/groups after correction for multiple testing, it is possible that we are underpowered to detect small effects over a short time period [12]. Many of the SNPs with the highest adjusted risk ratios appeared in a small number of specimens (S1 Table). While one would expect more variability from the smallest table cells if we were simply observing statistical noise, it is also possible that a larger sample could confirm these stronger adjusted risk ratios.
Our analysis has several limitations. First, not all possible demographic and clinical variables were included in our models, and if those variables are associated with immunity status and SNP presence, unmeasured confounding could bias our analysis. Second, there is a risk of misclassification of immunity status; though the KPNC surveillance system for vaccination is quite robust, prior infections (say, diagnosed by a home test) may have gone unreported. Third, we focused on a set of specimens from Northern and Central California, limiting the generalizability of our findings to this context. Finally, there may be hidden bias as to which specimens were successfully sequenced based on the nucleic acid viral load in the specimen (e.g., specimens with lower viral loads might not be successfully sequenced and thus would have been excluded from this analysis). Fourth, although our surveillance system has a very robust system for assessing individual-level vaccination status, there is a small risk of misclassification of persons as having incorrect immunity status.
As part of the global effort to conduct near real-time genomic surveillance for SARS-CoV-2, there has been tremendous progress in developing national genomic surveillance systems with nearly real-time public reporting of circulating SARS-CoV-2 variants and uploading of consensus sequence data to publicly available genomic databases [13–16]. However, these efforts can be improved. First, enhancing methods to define consensus sequences to also assess for minor variation with <50% allelic frequency would better define the true genomic variation of circulating SARS-CoV-2, including capturing within-host genetic diversity [17,18]. Second, current SARS-CoV-2 genomic surveillance systems in the United States have limited metadata on human demographics, clinical factors, epidemiologic factors, and immunity status (e.g., COVID-19 vaccination and documented prior infection), particularly when compared to the United Kingdom [19]. This limits the ability to correlate noted patterns in SARS-CoV-2 genetic variation and real-world clinical and epidemiologic outcomes in the United States. We present one potential analysis that uses this data linkage to better understand how SARS-CoV-2 genetic variation might be associated with human disease, in this case evidence for immune evasion associated with specific mutation clusters. Improving the clinical metadata associated with SARS-CoV-2 genomic data should be a priority as we continue to improve and expand genomic surveillance. We have demonstrated this capability through partnership and collaboration between a large healthcare system and a state public health department, namely KPNC and CDPH/COVIDNet.
In summary, we demonstrate that linking SARS-CoV-2 genomic sequence data with detailed clinical data revealed no significant associations between genetic variation among Omicron SARS-CoV-2 mutations and immunity status. Further research using larger sample sets that link SARS-CoV-2 genetic data with clinical data is warranted for the potential to reveal whether SARS-CoV-2 non-lineage defining genetic variation might contribute to the fitness of SARS-CoV-2 by either immune evasion or other mechanism.
Supporting information
S1 Fig. Cohort.
Cohort eligibility flowchart with prior recorded infection and vaccination status.
https://doi.org/10.1371/journal.pone.0319218.s001
(TIF)
S2 Fig. SNP correlation.
Pearson correlations of SNP presence in specimens, before and after clustering.
https://doi.org/10.1371/journal.pone.0319218.s002
(TIF)
S3 Fig. Phylogenies.
Phylogenies of 15,045 Omicron genomes colored by group. Mutations in group 52 (top) represent a clade defined by the BA.2.12.1 subvariant (in blue). Mutations in group 4 (bottom) fall into a subclade of BA.4 (in yellow).
https://doi.org/10.1371/journal.pone.0319218.s003
(TIF)
S4 Fig. Total and sequenced SARS-CoV-2 infections.
Persons with recorded SARS-CoV-2 infection at KPNC by week from January 1, 2022 to October 31, 2022.
https://doi.org/10.1371/journal.pone.0319218.s004
(TIF)
S5 Fig. SNP prevalence.
Proportion of specimens with a given SNP or group, by month, January–October 2022.
https://doi.org/10.1371/journal.pone.0319218.s005
(TIF)
S6 Fig. Phylogenies by vaccination status.
Phylogeny of 15,045 Omicron genomes sampled between 1/1/2022 and 9/25/22. Branch lengths represent divergence from Wuhan reference genome and nodes are colored by vaccination status. Vaccinated category includes primary series and primary plus booster.
https://doi.org/10.1371/journal.pone.0319218.s006
(TIF)
S1 Table. Individual SNP associations with vaccination status.
Association between SNP presence and vaccine status (pooling over complete and boosted) among persons included in this analysis with SARS-CoV-2 infection with Omicron variant in Northern California, January 1, 2022 - October 31, 2022 (N = 15,566).
https://doi.org/10.1371/journal.pone.0319218.s007
(TIF)
S2 Table. Individual SNP associations with vaccination substatus.
Association between SNP presence and vaccine status (separating out complete and boosted) among persons included in this analysis with SARS-CoV-2 infection with Omicron variant in Northern California, January 1, 2022 – October 31, 2022 (N = 15,566).
https://doi.org/10.1371/journal.pone.0319218.s008
(TIF)
S3 Table. SNP and group associations with vaccination substatus.
Association between mutation groups and vaccine status (separating complete and boosted) among persons included in this analysis with SARS-CoV-2 infection with Omicron variant in Northern California, January 2022–October 2022 (N = 15,566). Adjusted risk ratios derived from Poisson regression models to assess the association of immunity status with each mutation group adjusting for age, sex, race/ethnicity, Charlson comorbidity index score, and month of SARS-CoV-2 infection.
https://doi.org/10.1371/journal.pone.0319218.s009
(TIF)
Acknowledgments
The authors are grateful to all Kaiser Permanente members without whom this study would not have been possible. In addition, we wish to acknowledge our partnership with the California Department of Public Health and participation in the California SARS-CoV-2 Whole Genome Sequencing Initiative (COVIDNet). We thank the VRDL and COVIDNet Teams: Summer Adams, Allison Bailey, Matt Bacinskas, Nikki Baumrind, Elizabeth Baylis, John Bell, Ricardo Berumen III, Yocelyn Cruz, Mojgan Deldari, Alex Espinosa, Sabrina Gilliam, Madeline Glenn, Bianca Gonzaga, Melanie Greengard, Jill Hacker, Kim Hansard, Monica Haw, Thalia Huynh, Chantha Kath, Ruth Lopez, Sharon Messenger, Alexa Quintana, Chris Preas, Clarence Reyes, Maria Salas, Hilary Tamnanchit, Serena Ting, Kathy Jacobson, MD, and Carol Glaser, MD.
The content is solely the responsibility of the authors and does not represent the official views or opinions of the National Institutes of Health, Kaiser Permanente, California Department of Public Health or the California Health and Human Services Agency.
Use of trade names and commercial sources is for identification only and does not imply endorsement by the California Department of Public Health or the California Health and Human Services Agency.
References
1. 1. Lou F, Li M, Pang Z, Jiang L, Guan L, Tian L, et al. Understanding the secret of SARS-CoV-2 variants of concern/interest and immune escape. Front Immunol. 2021;12:744242. PMID: 34804024
* View Article
* PubMed/NCBI
* Google Scholar
2. 2. Harvey WT, Carabelli AM, Jackson B, Gupta RK, Thomson EC, Harrison EM, et al; COVID-19 Genomics UK (COG-UK) Consortium. SARS-CoV-2 variants, spike mutations and immune escape. Nat Rev Microbiol. 2021;19(7):409–24. pmid:34075212
* View Article
* PubMed/NCBI
* Google Scholar
3. 3. Davis AC, Voelkel JL, Remmers CL, Adams JL, McGlynn EA. Comparing kaiser permanente members to the general population: implications for generalizability of research. Perm J. 2023;27(2):87–98. pmid:37170584
* View Article
* PubMed/NCBI
* Google Scholar
4. 4. Ross TR, Ng D, Brown JS, Pardee R, Hornbrook Mark C, Hart G, et al. The HMO research network virtual data warehouse: a public data model to support collaboration. EGEMS (Washington, DC). 2014;2(1):1049. PMID: 25848584
* View Article
* PubMed/NCBI
* Google Scholar
5. 5. Weeks J, Pardee R. Learning to share health care data: a brief timeline of influential common data models and distributed health data networks in U.S. Health care research. EGEMS (Washington, DC). 2019;7(1):4. PMID: 30937326
* View Article
* PubMed/NCBI
* Google Scholar
6. 6. Wadford DA, Baumrind N, Baylis EF, Bell JM, Bouchard EL, Crumpler M, et al. Implementation of California COVIDNet—a multi-sector collaboration for statewide SARS-CoV-2 genomic surveillance. Front Public Health. 2023;11:1249614. pmid:37937074
* View Article
* PubMed/NCBI
* Google Scholar
7. 7. Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics (Oxford, England). 2018;34(18):3094–100. PMID: 29750242
* View Article
* PubMed/NCBI
* Google Scholar
8. 8. Grubaugh ND, Gangavarapu K, Quick J, Matteson NL, De Jesus JG, Main BJ, et al. An amplicon-based sequencing framework for accurately measuring intrahost virus diversity using PrimalSeq and iVar. Genome Biol. 2019;20(1):8. pmid:30621750
* View Article
* PubMed/NCBI
* Google Scholar
9. 9. O’Toole A, Scher E, Underwood A, Jackson B, Hill V, McCrone JT, et al. Assignment of epidemiological lineages in an emerging pandemic using the pangolin tool. Virus Evol. 2021;7(2):veab064. pmid:34527285
* View Article
* PubMed/NCBI
* Google Scholar
10. 10. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Series B Stat Methodol. 1995;57(1):289–300.
* View Article
* Google Scholar
11. 11. Charlson ME, Pompei P, Ales KL, MacKenzie CR. A new method of classifying prognostic comorbidity in longitudinal studies: development and validation. J Chronic Dis. 1987;40(5):373–83. pmid:3558716
* View Article
* PubMed/NCBI
* Google Scholar
12. 12. Volz E. Fitness, growth and transmissibility of SARS-CoV-2 genetic variants. Nat Rev Genet. 2023;24(10):724–34. pmid:37328556
* View Article
* PubMed/NCBI
* Google Scholar
13. 13. Chen Z, Azman AS, Chen X, Zou J, Tian Y, Sun R, et al. Global landscape of SARS-CoV-2 genomic surveillance and data sharing. Nat Genet. 2022;54(4):499–507. pmid:35347305
* View Article
* PubMed/NCBI
* Google Scholar
14. 14. World Health Organization. Guidance for surveillance of SARS-CoV-2 variants: interim guidance, 9 august 2021. World Health Organization; 2021. https://apps.who.int/iris/handle/10665/343775
15. 15. Lambrou AS, Shirk P, Steele MK, Paul P, Paden CR, Cadwell B, et al; Strain Surveillance and Emerging Variants Bioinformatic Working Group. Genomic surveillance for SARS-CoV-2 variants: predominance of the delta (B.1.617.2) and omicron (B.1.1.529) variants—United States, June 2021–January 2022. MMWR Morb Mortal Wkly Rep. 2022;71(6):206–11. PMID: 35143464
* View Article
* PubMed/NCBI
* Google Scholar
16. 16. Robishaw JD, Alter SM, Solano JJ, Shih RD, DeMets David L, Maki DG, et al. Genomic surveillance to combat COVID-19: challenges and opportunities. Lancet Microbe. 2021;2(9):e481–4. pmid:34337584
* View Article
* PubMed/NCBI
* Google Scholar
17. 17. Tonkin-Hill G, Martincorena I, Amato R, Lawson ARJ, Gerstung M, Johnston I, et al; COVID-19 Genomics UK (COG-UK) Consortium. Patterns of within-host genetic diversity in SARS-CoV-2. eLife. 2021;10:e66857. pmid:34387545
* View Article
* PubMed/NCBI
* Google Scholar
18. 18. Gao R, Zu W, Liu Y, Li J, Li Z, Wen Y, et al. Quasispecies of SARS-CoV-2 revealed by single nucleotide polymorphisms (SNPs) analysis. Virulence. 2021;12(1):1209–26. pmid:34030593
* View Article
* PubMed/NCBI
* Google Scholar
19. 19. Nicholls SM, Poplawski R, Bull MJ, Underwood A, Chapman M, Abu-Dahab K, et al; COVID-19 Genomics UK (COG-UK) Consortium. CLIMB-COVID: continuous integration supporting decentralised sequencing for SARS-CoV-2 genomic surveillance. Genome Biol. 2021;22(1):196. pmid:34210356
* View Article
* PubMed/NCBI
* Google Scholar
Citation: Nugent JR, Wood MS, Liu L, Bullick T, Schapiro JM, Arunleung P, et al. (2025) SARS-CoV-2 Omicron subvariant genomic variation associations with immune evasion in Northern California: A retrospective cohort study. PLoS ONE 20(2): e0319218. https://doi.org/10.1371/journal.pone.0319218
About the Authors:
Joshua R. Nugent
Roles: Formal analysis, Investigation, Methodology, Visualization, Writing – original draft, Writing – review & editing
E-mail: [email protected]
Affiliation: Division of Research, Kaiser Permanente Northern California, Oakland, California, United States of America
ORICD: https://orcid.org/0000-0002-4479-9673
Mariah S. Wood
Roles: Conceptualization, Formal analysis, Methodology, Writing – review & editing
Affiliation: Division of Research, Kaiser Permanente Northern California, Oakland, California, United States of America
Liyan Liu
Roles: Conceptualization, Formal analysis, Methodology, Writing – review & editing
Affiliation: Division of Research, Kaiser Permanente Northern California, Oakland, California, United States of America
Teal Bullick
Roles: Formal analysis, Writing – review & editing
Affiliation: Viral and Rickettsial Disease Laboratory, Center for Laboratory Sciences, California Department of Public Health, Richmond, California, United States of America
ORICD: https://orcid.org/0000-0001-9043-6155
Jeffrey M. Schapiro
Roles: Conceptualization, Methodology, Writing – review & editing
Affiliation: The Permanente Medical Group, Kaiser Permanente Northern California, Oakland, California, United States of America
Phacharee Arunleung
Roles: Writing – review & editing
Affiliation: Viral and Rickettsial Disease Laboratory, Center for Laboratory Sciences, California Department of Public Health, Richmond, California, United States of America
ORICD: https://orcid.org/0000-0002-2446-4227
Gautham Gautham
Roles: Writing – review & editing
Affiliation: Viral and Rickettsial Disease Laboratory, Center for Laboratory Sciences, California Department of Public Health, Richmond, California, United States of America
ORICD: https://orcid.org/0000-0002-6858-5346
Shiffen Getabecha
Roles: Writing – review & editing
Affiliation: Viral and Rickettsial Disease Laboratory, Center for Laboratory Sciences, California Department of Public Health, Richmond, California, United States of America
Christina Morales
Roles: Writing – review & editing
Affiliation: Viral and Rickettsial Disease Laboratory, Center for Laboratory Sciences, California Department of Public Health, Richmond, California, United States of America
Laura B. Amsden
Roles: Conceptualization, Methodology, Writing – review & editing
Affiliation: Division of Research, Kaiser Permanente Northern California, Oakland, California, United States of America
ORICD: https://orcid.org/0000-0003-1178-2792
Crystal A. Hsiao
Roles: Conceptualization, Methodology, Writing – review & editing
Affiliation: Division of Research, Kaiser Permanente Northern California, Oakland, California, United States of America
ORICD: https://orcid.org/0000-0003-0806-2671
Debra A. Wadford
Roles: Supervision, Writing – review & editing
Affiliation: Viral and Rickettsial Disease Laboratory, Center for Laboratory Sciences, California Department of Public Health, Richmond, California, United States of America
ORICD: https://orcid.org/0000-0002-8630-427X
Stacia K. Wyman
Roles: Conceptualization, Formal analysis, Methodology, Writing – original draft, Writing – review & editing
Affiliation: Innovative Genomics Institute, University of California, Berkeley, California, United States of America
ORICD: https://orcid.org/0000-0002-8937-8397
Jacek Skarbinski
Roles: Conceptualization, Formal analysis, Methodology, Supervision, Writing – original draft, Writing – review & editing
Affiliations: Division of Research, Kaiser Permanente Northern California, Oakland, California, United States of America, The Permanente Medical Group, Kaiser Permanente Northern California, Oakland, California, United States of America, Department of Infectious Diseases, Oakland Medical Center, Kaiser Permanente Northern California, Oakland, California, United States of America, Physician Researcher Program, Kaiser Permanente Northern California, Oakland, California, United States of America
ORICD: https://orcid.org/0000-0003-1630-5733
[/RAW_REF_TEXT]
1. Lou F, Li M, Pang Z, Jiang L, Guan L, Tian L, et al. Understanding the secret of SARS-CoV-2 variants of concern/interest and immune escape. Front Immunol. 2021;12:744242. PMID: 34804024
2. Harvey WT, Carabelli AM, Jackson B, Gupta RK, Thomson EC, Harrison EM, et al; COVID-19 Genomics UK (COG-UK) Consortium. SARS-CoV-2 variants, spike mutations and immune escape. Nat Rev Microbiol. 2021;19(7):409–24. pmid:34075212
3. Davis AC, Voelkel JL, Remmers CL, Adams JL, McGlynn EA. Comparing kaiser permanente members to the general population: implications for generalizability of research. Perm J. 2023;27(2):87–98. pmid:37170584
4. Ross TR, Ng D, Brown JS, Pardee R, Hornbrook Mark C, Hart G, et al. The HMO research network virtual data warehouse: a public data model to support collaboration. EGEMS (Washington, DC). 2014;2(1):1049. PMID: 25848584
5. Weeks J, Pardee R. Learning to share health care data: a brief timeline of influential common data models and distributed health data networks in U.S. Health care research. EGEMS (Washington, DC). 2019;7(1):4. PMID: 30937326
6. Wadford DA, Baumrind N, Baylis EF, Bell JM, Bouchard EL, Crumpler M, et al. Implementation of California COVIDNet—a multi-sector collaboration for statewide SARS-CoV-2 genomic surveillance. Front Public Health. 2023;11:1249614. pmid:37937074
7. Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics (Oxford, England). 2018;34(18):3094–100. PMID: 29750242
8. Grubaugh ND, Gangavarapu K, Quick J, Matteson NL, De Jesus JG, Main BJ, et al. An amplicon-based sequencing framework for accurately measuring intrahost virus diversity using PrimalSeq and iVar. Genome Biol. 2019;20(1):8. pmid:30621750
9. O’Toole A, Scher E, Underwood A, Jackson B, Hill V, McCrone JT, et al. Assignment of epidemiological lineages in an emerging pandemic using the pangolin tool. Virus Evol. 2021;7(2):veab064. pmid:34527285
10. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Series B Stat Methodol. 1995;57(1):289–300.
11. Charlson ME, Pompei P, Ales KL, MacKenzie CR. A new method of classifying prognostic comorbidity in longitudinal studies: development and validation. J Chronic Dis. 1987;40(5):373–83. pmid:3558716
12. Volz E. Fitness, growth and transmissibility of SARS-CoV-2 genetic variants. Nat Rev Genet. 2023;24(10):724–34. pmid:37328556
13. Chen Z, Azman AS, Chen X, Zou J, Tian Y, Sun R, et al. Global landscape of SARS-CoV-2 genomic surveillance and data sharing. Nat Genet. 2022;54(4):499–507. pmid:35347305
14. World Health Organization. Guidance for surveillance of SARS-CoV-2 variants: interim guidance, 9 august 2021. World Health Organization; 2021. https://apps.who.int/iris/handle/10665/343775
15. Lambrou AS, Shirk P, Steele MK, Paul P, Paden CR, Cadwell B, et al; Strain Surveillance and Emerging Variants Bioinformatic Working Group. Genomic surveillance for SARS-CoV-2 variants: predominance of the delta (B.1.617.2) and omicron (B.1.1.529) variants—United States, June 2021–January 2022. MMWR Morb Mortal Wkly Rep. 2022;71(6):206–11. PMID: 35143464
16. Robishaw JD, Alter SM, Solano JJ, Shih RD, DeMets David L, Maki DG, et al. Genomic surveillance to combat COVID-19: challenges and opportunities. Lancet Microbe. 2021;2(9):e481–4. pmid:34337584
17. Tonkin-Hill G, Martincorena I, Amato R, Lawson ARJ, Gerstung M, Johnston I, et al; COVID-19 Genomics UK (COG-UK) Consortium. Patterns of within-host genetic diversity in SARS-CoV-2. eLife. 2021;10:e66857. pmid:34387545
18. Gao R, Zu W, Liu Y, Li J, Li Z, Wen Y, et al. Quasispecies of SARS-CoV-2 revealed by single nucleotide polymorphisms (SNPs) analysis. Virulence. 2021;12(1):1209–26. pmid:34030593
19. Nicholls SM, Poplawski R, Bull MJ, Underwood A, Chapman M, Abu-Dahab K, et al; COVID-19 Genomics UK (COG-UK) Consortium. CLIMB-COVID: continuous integration supporting decentralised sequencing for SARS-CoV-2 genomic surveillance. Genome Biol. 2021;22(1):196. pmid:34210356
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer
© 2025 Nugent et al. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
Background
The possibility of association between SARS-CoV-2 genomic variation and immune evasion is not known among persons with Omicron variant SARS-CoV-2 infection.
Methods
In a retrospective cohort, using Poisson regression adjusting for sociodemographic variables and month of infection, we examined associations between individual non-lineage defining mutations and SARS-CoV-2 immunity status, defined as a) no prior recorded infection, b) not vaccinated but with at least one prior recorded infection, c) complete primary series vaccination, and/or d) primary series vaccination and ≥1 booster. We identified all non-synonymous single nucleotide polymorphisms (SNPs), insertions and deletions in SARS-CoV-2 genomes with ≥5% allelic frequency and population frequency of ≥5% and ≤95%. We also examined correlations between the presence of SNPs with each other, with subvariants, and over time.
Results
Seventy-nine mutations met inclusion criteria. Among 15,566 persons infected with Omicron SARS-CoV-2, 1,825 (12%) were unvaccinated with no prior recorded infection, 360 (2%) were unvaccinated with a recorded prior infection, 13,381 (86%) had a complete primary series vaccination, and 9,172 (58%) had at least one booster. After examining correlation between SNPs, 79 individual non-lineage defining mutations were organized into 38 groups. After correction for multiple testing, no individual SNPs or SNP groups were significantly associated with immunity status levels.
Conclusions
Genomic variation identified within SARS-CoV-2 Omicron specimens was not significantly associated with immunity status, suggesting that contribution of non-lineage defining SNPs to immune evasion is minimal. Larger-scale surveillance of SARS-CoV-2 genomes linked with clinical data can help provide information to inform future vaccine development.
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer