Abstract

H37Rv is the most widely used Mycobacterium tuberculosis strain, and its genome is used globally as the M. tuberculosis reference sequence. Here, we present Bact-Builder, a pipeline that uses consensus building to generate complete and accurate bacterial genome sequences, and apply it to three independently cultured and sequenced H37Rv aliquots of a single laboratory stock. Two of the 4,417,942 base-pair H37Rv assemblies are 100% identical, with the third differing by a single nucleotide. Compared to the existing H37Rv reference, the new sequence contains ~6.4 kb of additional DNA in ten new regions, which include insertions in PE/PPE genes and new paralogs of esxN and esxJ that are differentially expressed compared to the reference genes. New sequencing and de novo assemblies with Bact-Builder confirm that all ten regions, plus small additional polymorphisms, are also present in the commonly used H37Rv strains NR123, TMC102, and H37Rv1998. Thus, Bact-Builder shows promise as an improved method to perform accurate and reproducible de novo assemblies of bacterial genomes, and our work provides important updates to the primary M. tuberculosis reference genome.

H37Rv is the most widely used Mycobacterium tuberculosis strain, and its genome is the reference sequence for this pathogen. Here, Chitale et al. present a bioinformatic pipeline for accurate assembly of bacterial genome sequences, and use it to provide important updates to the M. tuberculosis reference genome.

Details

Title
A comprehensive update to the Mycobacterium tuberculosis H37Rv reference genome
Author
Chitale, Poonam 1; Lemenze, Alexander D. 2; Fogarty, Emily C. 3; Shah, Avi 4; Grady, Courtney 1; Odom-Mabey, Aubrey R. 5; Johnson, W. Evan 6; Yang, Jason H. 4; Eren, A. Murat 7; Brosch, Roland 8; Kumar, Pradeep 1; Alland, David 1

1 Rutgers University – New Jersey Medical School, Ray V. Lourenco Center for the Study of Emerging and Re-emerging Pathogens, Newark, USA (GRID:grid.430387.b) (ISNI:0000 0004 1936 8796); Rutgers University – New Jersey Medical School, Public Health Research Institute, Newark, USA (GRID:grid.430387.b) (ISNI:0000 0004 1936 8796)
2 Rutgers, The State University of New Jersey, Department of Pathology, Immunology and Laboratory Medicine, New Jersey Medical School, Newark, USA (GRID:grid.430387.b) (ISNI:0000 0004 1936 8796)
3 University of Chicago, Department of Medicine, Chicago, USA (GRID:grid.170205.1) (ISNI:0000 0004 1936 7822); University of Chicago, Committee on Microbiology, Chicago, USA (GRID:grid.170205.1) (ISNI:0000 0004 1936 7822)
4 Rutgers University – New Jersey Medical School, Ray V. Lourenco Center for the Study of Emerging and Re-emerging Pathogens, Newark, USA (GRID:grid.430387.b) (ISNI:0000 0004 1936 8796); Rutgers University – New Jersey Medical School, Department of Microbiology, Biochemistry and Molecular Genetics, Newark, USA (GRID:grid.430387.b) (ISNI:0000 0004 1936 8796)
5 Boston University School of Medicine and Bioinformatics Program, Boston University, Division of Computational Biomedicine, Boston, USA (GRID:grid.189504.1) (ISNI:0000 0004 1936 7558); Boston University, Bioinformatics Program, Boston, USA (GRID:grid.189504.1) (ISNI:0000 0004 1936 7558)
6 Rutgers University – New Jersey Medical School, Ray V. Lourenco Center for the Study of Emerging and Re-emerging Pathogens, Newark, USA (GRID:grid.430387.b) (ISNI:0000 0004 1936 8796); Rutgers University – New Jersey Medical School, Public Health Research Institute, Newark, USA (GRID:grid.430387.b) (ISNI:0000 0004 1936 8796); Rutgers University – New Jersey Medical School, Center for Data Science, Newark, USA (GRID:grid.430387.b) (ISNI:0000 0004 1936 8796)
7 Helmholtz Institute for Functional Marine Biodiversity (HIFMB), Oldenburg, Germany (GRID:grid.511218.e); Bay Paul Center, Marine Biological Laboratory, Woods Hole, USA (GRID:grid.144532.5) (ISNI:000000012169920X)
8 Université Paris Cité, Unit for Integrated Mycobacterial Pathogenomics, Institut Pasteur, Paris, France (GRID:grid.508487.6) (ISNI:0000 0004 7885 7602)
Publication year
2022
Publication date
2022
Publisher
Nature Publishing Group
e-ISSN
20411723
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2737612129
Copyright
© The Author(s) 2022. Corrected publication 2022. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.