Abstract

Temperate phages (active prophages induced from bacteria) help control pathogenicity, modulate community structure, and maintain gut homeostasis. Complete phage genome sequences are indispensable for understanding phage biology. Traditional plaque techniques are inapplicable to temperate phages because of their lysogenicity, which hinders their identification and characterization. Existing bioinformatics tools for prophage prediction usually fail to recover accurate and complete temperate phage genomes. This study proposes a novel computational temperate phage detection method (TemPhD) that mines both integrated active prophages and their spontaneously induced forms (temperate phages) from raw next-generation sequencing data. Applying the method to the available dataset yielded 192,326 complete temperate phage genomes from different host species, expanding the existing number of complete temperate phage genomes more than 100-fold. Wet-lab experiments demonstrated that TemPhD can accurately determine the complete genome sequences of temperate phages, with exact flanking sites, outperforming other state-of-the-art prophage prediction methods. Our analysis indicates that temperate phages are likely to function in microbial evolution by (i) cross-infecting different bacterial host species; (ii) transferring antibiotic resistance and virulence genes; and (iii) interacting with hosts through restriction-modification and CRISPR/anti-CRISPR systems. This work provides a comprehensive database of complete temperate phage genomes and related information, which can serve as a valuable resource for phage research.

Details

Title
Mining bacterial NGS data vastly expands the complete genomes of temperate phages
Author
Zhang, Xianglilan 1; Wang, Ruohan 2; Xie, Xiangcheng 3; Hu, Yunjia 4; Wang, Jianping 2; Sun, Qiang 5; Feng, Xikang 6; Lin, Wei 4; Tong, Shanwei 7; Yan, Wei 8; Wen, Huiqi 1; Wang, Mengyao 2; Zhai, Shixiang 9; Sun, Cheng 10; Wang, Fangyi 11; Niu, Qi 10; Kropinski, Andrew M 12; Cui, Yujun 1; Jiang, Xiaofang 8; Peng, Shaoliang 10; Li, Shuaicheng 2; Tong, Yigang 4

1 State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing 100071, People's Republic of China
2 Department of Computer Science, City University of Hong Kong, Hong Kong 999077, People's Republic of China
3 College of Computer, National University of Defense Technology, Changsha 410073, People's Republic of China
4 Beijing Advanced Innovation Center for Soft Matter Science and Engineering (BAIC-SM), College of Life Science and Technology, Beijing University of Chemical Technology, Beijing 100029, People's Republic of China
5 The 964th Hospital, Changchun 130021, People's Republic of China
6 School of Software, Northwestern Polytechnical University, Xi’an 710072, People's Republic of China
7 Bioinformatics Graduate Program, University of British Columbia, Vancouver BC V6T 1Z4, Canada
8 National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
9 Yantai Institute of Coastal Zone Research, Chinese Academy of Sciences, Yantai 264003, People's Republic of China
10 School of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, People's Republic of China
11 Department of Statistics, The Ohio State University, Columbus, OH 43210, USA
12 Departments of Food Science, and Pathobiology, University of Guelph, Guelph, ON N1G 2W1, Canada
Publication year
2022
Publication date
Sep 2022
Publisher
Oxford University Press
e-ISSN
2631-9268
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
3170908910
Copyright
© The Author(s) 2022. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. This work is published under https://creativecommons.org/licenses/by-nc/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.