Abstract

Haplotype-resolved genome assembly plays a crucial role in understanding allele-specific functions. However, obtaining a haplotype-resolved assembly for auto-polyploid genomes remains challenging. Existing methods can be classified into reference-based phasing, assembly-based phasing, and gamete binning, but cost-effective and efficient methods for haplotyping auto-polyploid genomes are still lacking. In this study, we propose a novel phasing algorithm called PolyGH, which combines Hi-C and gametic data. We evaluated the method on tetraploid potato cultivars. PolyGH proceeds in three steps. First, gametic data are used to bin non-collapsed contigs, and adjacent fragments of the same type within the same contig are merged. Second, accurate Hi-C signals related to differential genomic regions are acquired using unique k-mers. Finally, collapsed fragments are assigned to haplotigs based on the combined Hi-C and gametic signals. In a comparison with Hi-C-based and gametic data-based methods, PolyGH, which integrates both data types, exhibited superior performance in haplotyping auto-polyploid genomes. This approach has the potential to enhance haplotype-resolved assembly for auto-polyploid genomes.
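
The role of unique k-mers in the second step can be illustrated with a small sketch. The abstract does not define how PolyGH selects them, so the code below assumes one common reading: a k-mer is "unique" if it occurs in exactly one contig of the assembly. The function name `unique_kmers`, the toy sequences, and the choice of k are hypothetical, and reverse complements are ignored for brevity. This is not the PolyGH implementation.

```python
from collections import defaultdict


def unique_kmers(contigs, k):
    """Map each contig name to the set of k-mers found in that contig only.

    Assumption (not from the paper): "unique" means the k-mer appears in
    exactly one contig. Reverse complements are not considered here.
    """
    owners = defaultdict(set)  # k-mer -> names of contigs containing it
    for name, seq in contigs.items():
        for i in range(len(seq) - k + 1):
            owners[seq[i:i + k]].add(name)

    unique = {name: set() for name in contigs}
    for kmer, names in owners.items():
        if len(names) == 1:
            unique[next(iter(names))].add(kmer)
    return unique


# Toy example: two hypothetical contigs that differ at a single base.
toy = {"ctgA": "ACGTAC", "ctgB": "ACGTTC"}
result = unique_kmers(toy, 4)
```

In this toy case the shared k-mer `ACGT` is discarded, and only the k-mers that span the differing base remain. Under the stated assumption, only a read carrying such a k-mer could be attributed to one contig rather than to both.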

Details

Title
Haplotype-resolved assembly of auto-polyploid genomes via combining Hi-C and gametic data
Author
Zhang, Xiaohui 1 ; Li, Dongxi 2 ; Pan, Weihua 3 

1 Taiyuan University of Technology, College of Computer Science and Technology, Taiyuan, China (GRID:grid.440656.5) (ISNI:0000 0000 9491 9632); Chinese Academy of Agricultural Sciences, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Shenzhen, China (GRID:grid.410727.7) (ISNI:0000 0001 0526 1937)
2 Taiyuan University of Technology, College of Computer Science and Technology, Taiyuan, China (GRID:grid.440656.5) (ISNI:0000 0000 9491 9632)
3 Chinese Academy of Agricultural Sciences, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Shenzhen, China (GRID:grid.410727.7) (ISNI:0000 0001 0526 1937)
Pages
7892
Publication year
2024
Publication date
2024
Publisher
Nature Publishing Group
e-ISSN
20452322
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
3031478086
Copyright
© The Author(s) 2024. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.