© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Data obtained by massively parallel sequencing (MPS) can be valuable in population genetics studies. In particular, such data can help distinguish samples from different populations, especially adjacent populations of common origin. Machine learning (ML) techniques appear especially well suited to analyzing the large datasets that MPS produces. The Slavic populations constitute about a third of the population of Europe and inhabit a large area of the continent, while being relatively closely related in population genetics terms. In this proof-of-concept study, various ML techniques were used to classify DNA samples from Slavic and non-Slavic individuals. The primary objective was to evaluate empirically whether the genetic provenance of genetically similar individuals of Slavic descent can be discerned, with the broader goal of classifying DNA samples from representatives of different Slavic populations. Raw sequencing data were pre-processed to obtain a binary vector 1200 characters long. Three classifiers were used: Random Forest, Support Vector Machine (SVM), and XGBoost. The most promising results were obtained using SVM with a linear kernel, with 99.9% accuracy and F1-scores of 0.9846–1.000 for all classes.
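
The abstract reports accuracy and per-class F1-scores. As a minimal sketch of how these two metrics are commonly computed from true and predicted class labels, the Python below uses only the standard library. The function names, the class labels "A", "B", "C", and the toy label lists are assumptions made for illustration; they are not the study's code or data.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_f1(y_true, y_pred):
    """Return {label: F1}, with F1 = 2*TP / (2*TP + FP + FN) per class."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1   # correct prediction for class t
        else:
            fp[p] += 1   # class p was predicted wrongly
            fn[t] += 1   # class t was missed
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


# Toy labels invented for illustration only (hypothetical classes A, B, C).
y_true = ["A", "A", "A", "B", "B", "C"]
y_pred = ["A", "A", "B", "B", "B", "C"]

print(accuracy(y_true, y_pred))
print(per_class_f1(y_true, y_pred))
```

Reporting F1 per class alongside overall accuracy shows whether any single class is classified noticeably worse than the others, which accuracy alone can hide.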

Details

Title
A Machine-Learning-Based Approach to Prediction of Biogeographic Ancestry within Europe
Author
Kloska, Anna 1 ; Giełczyk, Agata 2 ; Grzybowski, Tomasz 3 ; Płoski, Rafał 4 ; Kloska, Sylwester M 1 ; Marciniak, Tomasz 2 ; Pałczyński, Krzysztof 2 ; Rogalla-Ładniak, Urszula 3 ; Malyarchuk, Boris A 5 ; Derenko, Miroslava V 5 ; Kovačević-Grujičić, Nataša 6 ; Stevanović, Milena 7 ; Drakulić, Danijela 6 ; Davidović, Slobodan 8 ; Spólnicka, Magdalena 9 ; Zubańska, Magdalena 10 ; Woźniak, Marcin 3

1  Department of Forensic Medicine, The Ludwik Rydygier Collegium Medicum in Bydgoszcz, Nicolaus Copernicus University in Torun, 85067 Bydgoszcz, Poland; Faculty of Medical Sciences, Bydgoszcz University of Science and Technology, 85796 Bydgoszcz, Poland
2  Faculty of Telecommunications, Computer Science and Electrical Engineering, Bydgoszcz University of Science and Technology, 85796 Bydgoszcz, Poland
3  Department of Forensic Medicine, The Ludwik Rydygier Collegium Medicum in Bydgoszcz, Nicolaus Copernicus University in Torun, 85067 Bydgoszcz, Poland
4  Department of Medical Genetics, Warsaw Medical University, 02106 Warsaw, Poland
5  Institute of Biological Problems of the North, Russian Academy of Sciences, 685000 Magadan, Russia
6  Institute of Molecular Genetics and Genetic Engineering, University of Belgrade, 11042 Belgrade, Serbia
7  Institute of Molecular Genetics and Genetic Engineering, University of Belgrade, 11042 Belgrade, Serbia; Faculty of Biology, University of Belgrade, 11000 Belgrade, Serbia; Serbian Academy of Sciences and Arts, 11000 Belgrade, Serbia
8  Institute for Biological Research “Siniša Stanković”, National Institute of Republic of Serbia, University of Belgrade, 11060 Belgrade, Serbia
9  Center of Forensic Sciences, University of Warsaw, 00927 Warsaw, Poland
10  Faculty of Law and Administration, Department of Criminology and Forensic Sciences, University of Warmia and Mazury, 10726 Olsztyn, Poland
First page
15095
Publication year
2023
Publication date
2023
Publisher
MDPI AG
ISSN
16616596
e-ISSN
14220067
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2882573189