Abstract

With the rapid spread and evolution of SARS-CoV-2, the ability to monitor its transmission and distinguish among viral lineages is critical for pandemic response efforts. The most commonly used software for the lineage assignment of newly isolated SARS-CoV-2 genomes is pangolin, which offers two methods of assignment, pangoLEARN and pUShER. PangoLEARN rapidly assigns lineages using a machine-learning algorithm, while pUShER performs a phylogenetic placement to identify the lineage corresponding to a newly sequenced genome. In a preliminary study, we observed that pangoLEARN (using its decision tree model), while substantially faster than pUShER, gave less consistent assignments across different releases of pangolin v3. Here, we expand this analysis to cover both v3 and v4 of pangolin, between which the default lineage assignment algorithm changed from pangoLEARN (v3) to pUShER (v4), and perform a thorough analysis confirming that pUShER is not only more stable across versions but also more accurate. Our findings suggest that future lineage assignment algorithms for various pathogens should consider the value of phylogenetic placement.

Details

Title
SARS-CoV-2 lineage assignments using phylogenetic placement/UShER are superior to pangoLEARN machine-learning method
Author
de Bernardi Schneider, Adriano 1; Su, Michelle 2; Hinrichs, Angie S 1; Wang, Jade 2; Amin, Helly 2; Bell, John 3; Wadford, Debra A 3; O’Toole, Áine 4; Scher, Emily 4; Perry, Marc D 1; Turakhia, Yatish 5; De Maio, Nicola 6; Hughes, Scott 2; Corbett-Detig, Russ 1

1 Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
2 Department of Health and Mental Hygiene, New York City Public Health Laboratory, New York, NY 10016, USA
3 California Department of Public Health (CDPH), VRDL/COVIDNet, Richmond, CA 94804, USA
4 Institute of Evolutionary Biology, University of Edinburgh, Edinburgh EH9 3FL, UK
5 Department of Electrical and Computer Engineering, University of California San Diego, San Diego, CA 92093, USA
6 European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton CB10 1SD, UK
Publication year
2024
Publication date
2024
Publisher
Oxford University Press
e-ISSN
20571577
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
3168695800
Copyright
© The Author(s) 2024. Published by Oxford University Press. This work is published under https://creativecommons.org/licenses/by-nc/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.