Abstract

Motivation: Recent advances in sequencing technologies have led to the discovery of numerous variants in the human genome. However, understanding their precise roles in disease remains challenging because their functional mechanisms are complex. Various methods have emerged to predict the pathogenic significance of these genetic variants. Typically, these methods take an integrative approach, leveraging diverse data sources that provide critical insights into genomic function. Despite the abundance of publicly available data sources and databases, navigating, extracting, and pre-processing features for machine learning models can be daunting. Furthermore, researchers often invest substantial effort in feature extraction, only to discover later that the features are uninformative.

Results: In this paper, we present DrivR-Base, an innovative resource that efficiently extracts and integrates molecular information (features) for single nucleotide variants from a wide range of databases and tools, including AlphaFold, ENCODE, and Variant Effect Predictor. The resulting features can be used as input for machine learning models designed to predict the pathogenic impact of human genome variants in disease. These feature sets also have applications beyond pathogenicity prediction, including haploinsufficiency prediction and the development of drug repurposing tools. We describe the resource's development, practical applications, and potential for future expansion and enhancement.

Availability and Implementation: DrivR-Base source code is available at https://github.com/amyfrancis97/DrivR-Base.

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

* https://github.com/amyfrancis97/DrivR-Base

Details

Title
DrivR-Base: A Feature Extraction Toolkit For Variant Effect Prediction Model Construction
Author
Francis, Amy; Campbell, Colin; Gaunt, Tom R
University/institution
Cold Spring Harbor Laboratory Press
Section
New Results
Publication year
2024
Publication date
Jan 17, 2024
Publisher
Cold Spring Harbor Laboratory Press
ISSN
2692-8205
Source type
Working Paper
Language of publication
English
ProQuest document ID
2915800132
Copyright
© 2024. This article is published under http://creativecommons.org/licenses/by/4.0/ (“the License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.