Abstract

Inspired by the production of reference datasets in the Genome in a Bottle project, we sequenced one Charolais heifer with several technologies: Illumina paired-end, Oxford Nanopore, Pacific Biosciences (HiFi and CLR), 10X Genomics linked reads, and Hi-C. To generate haplotype-resolved assemblies, we also sequenced both parents with short reads. From these data, we built two high-quality, trio-based haplotype reference genomes and a consensus assembly using up-to-date software packages. The assemblies obtained with PacBio HiFi reach a size of 3.2 Gb, significantly larger than the 2.7 Gb ARS-UCD1.2 reference. The consensus assembly reaches a BUSCO completeness of 95.8% on highly conserved mammalian genes. We also identified 35,866 structural variants larger than 50 base pairs. This assembly is a contribution to the bovine pangenome for the Charolais breed. These datasets will prove to be useful resources, enabling the community to gain additional insight into sequencing technologies for applications such as SNP, indel, or structural variant calling and de novo assembly.
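
The structural variant count above depends on a size threshold (larger than 50 base pairs). As an illustration only, the following minimal Python sketch shows how such a size filter can be applied to VCF records. It assumes biallelic records (multi-allelic sites already split) and relies on the VCF convention that symbolic alleles such as <DEL> carry their length in INFO/SVLEN. The function names, the strict "> 50" cut-off, and the example records are hypothetical; the abstract does not state which callers or filters the authors used.

```python
"""Hypothetical sketch: count VCF records whose variant size exceeds 50 bp."""

MIN_SV_SIZE = 50  # threshold quoted in the abstract ("larger than 50 base pairs")


def variant_size(ref: str, alt: str, info: str) -> int:
    """Return the size of a biallelic variant from its REF, ALT and INFO fields.

    Symbolic alleles (e.g. <DEL>, <INS>) are sized from INFO/SVLEN;
    sequence-resolved alleles are sized by the REF/ALT length difference.
    """
    if alt.startswith("<"):
        for field in info.split(";"):
            key, _, value = field.partition("=")
            if key == "SVLEN":
                return abs(int(value))
        return 0  # no SVLEN present: size unknown, treated as 0 in this sketch
    return abs(len(alt) - len(ref))


def is_structural_variant(ref: str, alt: str, info: str) -> bool:
    """Assumed rule for this sketch: size strictly greater than MIN_SV_SIZE."""
    return variant_size(ref, alt, info) > MIN_SV_SIZE


def count_svs(vcf_lines) -> int:
    """Count data lines (header lines skipped) that pass the size filter."""
    count = 0
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        ref, alt, info = cols[3], cols[4], cols[7]  # VCF columns REF, ALT, INFO
        if is_structural_variant(ref, alt, info):
            count += 1
    return count


# Made-up example records (CHROM POS ID REF ALT QUAL FILTER INFO)
records = [
    "##fileformat=VCFv4.2",
    "1\t100\t.\tA\tG\t.\tPASS\t.",                            # SNP, size 0
    "1\t200\t.\tA\t" + "A" + "T" * 60 + "\t.\tPASS\t.",       # 60 bp insertion
    "1\t300\t.\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;SVLEN=-1200",   # 1,200 bp deletion
    "1\t400\t.\t" + "C" * 51 + "\tC\t.\tPASS\t.",             # 50 bp deletion, not > 50
]
print(count_svs(records))  # 2
```

With a strict "greater than" comparison, a variant of exactly 50 bp is excluded; many pipelines use "at least 50 bp" instead, so the comparison operator is worth checking against the actual study methods.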

Details

Title
A Bos taurus sequencing methods benchmark for assembly, haplotyping, and variant calling
Author
Eché, Camille 1; Iampietro, Carole 1; Birbes, Clément 2; Dréau, Andreea 2; Kuchly, Claire 1; Di Franco, Arnaud 2; Klopp, Christophe 2; Faraut, Thomas 3; Djebali, Sarah 4; Castinel, Adrien 1; Zytnicki, Matthias 5; Denis, Erwan 1; Boussaha, Mekki 6; Grohs, Cécile 6; Boichard, Didier 6; Gaspin, Christine 7; Milan, Denis 8; Donnadieu, Cécile 1

1 INRAE, US 1426, GeT-PlaGe, Genotoul, France Genomique, Université Fédérale de Toulouse, Castanet-Tolosan, France (GRID:grid.507621.7)
2 Université Fédérale de Toulouse, INRAE, BioinfOmics, GenoToul Bioinformatics facility, Castanet-Tolosan, France (GRID:grid.507621.7)
3 GenPhySE, Université de Toulouse, INRAE, INPT, ENVT, Castanet-Tolosan, France (GRID:grid.507621.7)
4 GenPhySE, Université de Toulouse, INRAE, INPT, ENVT, Castanet-Tolosan, France (GRID:grid.507621.7); IRSD, Université de Toulouse, INSERM, INRAE, ENVT, UPS, Toulouse, France (GRID:grid.503230.7) (ISNI:0000 0004 9129 4840)
5 Université Fédérale de Toulouse, INRAE, MIAT, Castanet-Tolosan, France (GRID:grid.507621.7)
6 Université Paris-Saclay, INRAE, AgroParisTech, GABI, Jouy-en-Josas, France (GRID:grid.420312.6) (ISNI:0000 0004 0452 7969)
7 Université Fédérale de Toulouse, INRAE, BioinfOmics, GenoToul Bioinformatics facility, Castanet-Tolosan, France (GRID:grid.507621.7); Université Fédérale de Toulouse, INRAE, MIAT, Castanet-Tolosan, France (GRID:grid.507621.7)
8 INRAE, US 1426, GeT-PlaGe, Genotoul, France Genomique, Université Fédérale de Toulouse, Castanet-Tolosan, France (GRID:grid.507621.7); GenPhySE, Université de Toulouse, INRAE, INPT, ENVT, Castanet-Tolosan, France (GRID:grid.507621.7)
Pages
369
Publication year
2023
Publication date
2023
Publisher
Nature Publishing Group
e-ISSN
2052-4463
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2825537614
Copyright
© The Author(s) 2023. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.