Copyright: © 2022 Beier S et al. This work is published under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

In this opinion article, we discuss the formatting of files from (plant) genotyping studies, in particular the formatting of metadata in Variant Call Format (VCF) files. The flexibility of the VCF specification facilitates its use as a generic interchange format across domains but can lead to inconsistency between files in the presentation of metadata. To enable fully autonomous, machine-actionable data flow, generic elements need to be further specified.

We strongly support the FAIR principles and see the need to facilitate them through technical implementation specifications as well. They form the basis for the VCF extensions proposed here. Existing applications of VCF have taught us that two things are particularly necessary: the definition of relevant metadata using controlled standards and vocabularies, and the consistent use of cross-references via resolvable, machine-readable identifiers. We propose an encoding for both.

VCF is an established standard for the exchange and publication of genotyping data. Other data formats are also used to capture variant data (for example, the HapMap and the gVCF formats), but none currently has the reach of VCF. For the sake of simplicity, we discuss only VCF and our recommendations for its use, although these recommendations could also be applied to gVCF. However, the part of the VCF standard relating to metadata (as opposed to the actual variant calls) defines a syntactic format but no vocabulary, unique identifiers or recommended content. In practice, often only sparse descriptive metadata is included. When descriptive metadata is provided, it frequently includes proprietary fields that have not been agreed upon within the community, which may limit long-term and comprehensive interoperability. To address this, we propose recommendations for supplying and encoding metadata, focusing on use cases from plant sciences. We expect there to be overlap, but also divergence, with the needs of other domains.
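
To illustrate why a purely syntactic metadata format is not enough for machine-actionable use, the following minimal Python sketch reads the `##key=value` meta-information lines of a VCF header and reports which keys from a caller-supplied list are absent. It is an illustration under stated assumptions, not the encoding proposed in the article: the example header values, the `ExternalId` attribute, the `species` key and the helper names `parse_meta` and `missing_keys` are all hypothetical.

```python
import re

# Hypothetical example header. The URLs, the ExternalId attribute and the
# sample ID are illustrative only; they are not taken from the article.
EXAMPLE_HEADER = """\
##fileformat=VCFv4.3
##reference=https://example.org/assembly/ABC123
##SAMPLE=<ID=S1,ExternalId="https://example.org/samples/S1">
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
"""

# A structured meta-information value is wrapped in angle brackets.
STRUCTURED = re.compile(r'^<(.*)>$')
# One key=value pair; the value is either a double-quoted string or a
# run of characters up to the next comma.
FIELD = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|[^,]*)')


def parse_meta(text):
    """Collect the ## meta-information lines of a VCF header into a dict.

    Each key maps to a list, because a key such as SAMPLE may repeat.
    Structured values (<...>) become dicts; plain values stay strings.
    """
    meta = {}
    for line in text.splitlines():
        if not line.startswith("##"):
            break  # the #CHROM column header ends the meta-information block
        key, _, value = line[2:].partition("=")
        match = STRUCTURED.match(value)
        if match:
            value = {k: v.strip('"') for k, v in FIELD.findall(match.group(1))}
        meta.setdefault(key, []).append(value)
    return meta


def missing_keys(meta, required):
    """Return the keys from `required` that the header does not provide."""
    return [key for key in required if key not in meta]


if __name__ == "__main__":
    meta = parse_meta(EXAMPLE_HEADER)
    # The required list is an assumption made for this example only.
    print(missing_keys(meta, ["fileformat", "reference", "species"]))
```

A check like this can only confirm that a key is present. Whether its value is a resolvable identifier or a term from a controlled vocabulary is exactly what a syntax-only specification leaves open.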

Details

Title
Recommendations for the formatting of Variant Call Format (VCF) files to make plant genotyping data FAIR
Author
Beier, Sebastian; Fiebig, Anne; Pommier, Cyril; Liyanage, Isuru; Lange, Matthias; Kersey, Paul J; Weise, Stephan; Finkers, Richard; Baron, Koylass; Cezard, Timothee; Courtot, Mélanie; Contreras-Moreira, Bruno; Naamati, Guy; Dyer, Sarah; Scholz, Uwe
University/institution
U.S. National Institutes of Health/National Library of Medicine
Publication year
2022
Publication date
2022
Publisher
Faculty of 1000 Ltd.
e-ISSN
20461402
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2696827670