Abstract

RNAseq data can be used to infer genetic variants, yet its use for estimating genetic population structure remains underexplored. Here, we construct a freely available computational tool (RGStraP) to estimate RNAseq-based genetic principal components (RG-PCs) and assess whether RG-PCs can be used to control for population structure in gene expression analyses. Using whole blood samples from understudied Nepalese populations and the Geuvadis study, we show that RG-PCs gave results comparable to those from paired array-based genotypes, with high genotype concordance and high correlations of genetic principal components, and that they captured subpopulations within the dataset. In differential gene expression analysis, we found that inclusion of RG-PCs as covariates reduced test statistic inflation. Our paper demonstrates that genetic population structure can be directly inferred and controlled for using RNAseq data, thus facilitating improved retrospective and future analyses of transcriptomic data.

RGStraP is a computational pipeline to estimate RNA-seq-based genetic principal components and assess whether these can be used to control for population structure in gene expression analyses.

Details

Title
Direct inference and control of genetic population structure from RNA sequencing data
Author
Fachrul, Muhamad 1 ; Karkey, Abhilasha 2 ; Shakya, Mila 2 ; Judd, Louise M. 3 ; Harshegyi, Taylor 3 ; Sim, Kar Seng 4 ; Tonks, Susan 5 ; Dongol, Sabina 2 ; Shrestha, Rajendra 6 ; Salim, Agus 7 ; Adhikari, Anup 8 ; Banda, Happy Chimphako 9 ; Blohmke, Christoph 5 ; Darton, Thomas C. 5 ; Farooq, Yama 5 ; Ghimire, Maheshwar 8 ; Hill, Jennifer 5 ; Hoang, Nhu Tran 10 ; Jere, Tikhala Makhaza 11 ; Kamzati, Moses 11 ; Kao, Yu-Han 12 ; Masesa, Clemens 13 ; Mbewe, Maurice 13 ; Msuku, Harrison 13 ; Munthali, Patrick 13 ; Nga, Tran Vu Thieu 10 ; Nkhata, Rose 11 ; Saad, Neil J. 12 ; Van Tan, Trinh 10 ; Thindwa, Deus 11 ; Khanam, Farhana 14 ; Meiring, James 15 ; Clemens, John D. 16 ; Dougan, Gordon 17 ; Pitzer, Virginia E. 12 ; Qadri, Firdausi 14 ; Heyderman, Robert S. 18 ; Gordon, Melita A. 19 ; Voysey, Merryn 5 ; Baker, Stephen 20 ; Pollard, Andrew J. 5 ; Khor, Chiea Chuen 4 ; Dolecek, Christiane 21 ; Basnyat, Buddha 22 ; Dunstan, Sarah J. 23 ; Holt, Kathryn E. 24 ; Inouye, Michael 25

1  Baker Heart and Diabetes Institute, Cambridge Baker Systems Genomics Initiative, Melbourne, Australia (GRID:grid.1051.5) (ISNI:0000 0000 9760 5620); University of Melbourne, Department of Clinical Pathology, Parkville, Australia (GRID:grid.1008.9) (ISNI:0000 0001 2179 088X); The University of Melbourne, School of BioSciences, Parkville, Australia (GRID:grid.1008.9) (ISNI:0000 0001 2179 088X) 
2  Patan Academy of Health Sciences, Oxford University Clinical Research Unit, Kathmandu, Nepal (GRID:grid.452690.c) (ISNI:0000 0004 4677 1409); Patan Hospital, Patan Academy of Health Sciences, Lalitpur, Nepal (GRID:grid.417187.c) (ISNI:0000 0004 0644 2774) 
3  Monash University, Department of Infectious Diseases, Central Clinical School, Melbourne, Australia (GRID:grid.1002.3) (ISNI:0000 0004 1936 7857) 
4  Genome Institute of Singapore, Singapore, Singapore (GRID:grid.418377.e) (ISNI:0000 0004 0620 715X) 
5  University of Oxford, and the NIHR Oxford Biomedical Research Centre, Oxford Vaccine Group, Department of Paediatrics, Oxford, UK (GRID:grid.4991.5) (ISNI:0000 0004 1936 8948) 
6  Patan Hospital, Patan Academy of Health Sciences, Lalitpur, Nepal (GRID:grid.417187.c) (ISNI:0000 0004 0644 2774) 
7  The University of Melbourne, Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, Melbourne, Australia (GRID:grid.1008.9) (ISNI:0000 0001 2179 088X); The University of Melbourne, School of Mathematics and Statistics, Melbourne, Australia (GRID:grid.1008.9) (ISNI:0000 0001 2179 088X); Baker Heart and Diabetes Institute, Department of Population Health, Melbourne, Australia (GRID:grid.1051.5) (ISNI:0000 0000 9760 5620) 
8  Patan Academy of Health Sciences, Oxford University Clinical Research Unit, Kathmandu, Nepal (GRID:grid.452690.c) (ISNI:0000 0004 4677 1409) 
9  Malawi-Liverpool Wellcome Programme, Blantyre, Malawi (GRID:grid.452690.c) 
10  Oxford University Clinical Research Unit, The Hospital for Tropical Diseases, Wellcome Trust Major Overseas Programme, Ho Chi Minh City, Vietnam (GRID:grid.412433.3) (ISNI:0000 0004 0429 6814) 
11  Malawi-Liverpool Wellcome Programme, Blantyre, Malawi (GRID:grid.412433.3) 
12  Yale University, Department of Epidemiology of Microbial Diseases and the Public Health Modeling Unit, Yale School of Public Health, New Haven, USA (GRID:grid.47100.32) (ISNI:0000000419368710) 
13  Malawi-Liverpool Wellcome Programme, Blantyre, Malawi (GRID:grid.47100.32) 
14  International Centre for Diarrhoeal Disease Research, Dhaka, Bangladesh (GRID:grid.414142.6) (ISNI:0000 0004 0600 7174) 
15  University of Oxford, and the NIHR Oxford Biomedical Research Centre, Oxford Vaccine Group, Department of Paediatrics, Oxford, UK (GRID:grid.4991.5) (ISNI:0000 0004 1936 8948); Malawi-Liverpool Wellcome Programme, Blantyre, Malawi (GRID:grid.4991.5); University of Sheffield, Department of Infection, Immunity and Cardiovascular Disease, Sheffield, UK (GRID:grid.11835.3e) (ISNI:0000 0004 1936 9262) 
16  International Centre for Diarrhoeal Disease Research, Dhaka, Bangladesh (GRID:grid.414142.6) (ISNI:0000 0004 0600 7174); International Vaccine Institute, Seoul, South Korea (GRID:grid.30311.30) (ISNI:0000 0000 9629 885X) 
17  University of Cambridge, Department of Medicine, Cambridge Institute of Therapeutic Immunology and Infectious Diseases (CITIID), Cambridge, UK (GRID:grid.5335.0) (ISNI:0000000121885934) 
18  University College London, National Institute for Health Research Global Health Research Unit on Mucosal Pathogens, Division of Infection and Immunity, London, UK (GRID:grid.83440.3b) (ISNI:0000000121901201) 
19  Malawi-Liverpool Wellcome Programme, Blantyre, Malawi (GRID:grid.83440.3b); University of Liverpool, Institute of Infection, Veterinary & Ecological Sciences, Liverpool, UK (GRID:grid.10025.36) (ISNI:0000 0004 1936 8470); Kamuzu University of Health Sciences, Blantyre, Malawi (GRID:grid.517969.5); Liverpool School of Tropical Medicine, Department of Clinical Sciences, Liverpool, UK (GRID:grid.48004.38) (ISNI:0000 0004 1936 9764) 
20  University of Cambridge, Department of Medicine, Cambridge, UK (GRID:grid.5335.0) (ISNI:0000000121885934) 
21  University of Oxford, Nuffield Department of Medicine, Centre for Tropical Medicine and Global Health, Oxford, UK (GRID:grid.4991.5) (ISNI:0000 0004 1936 8948); Mahidol University, Mahidol Oxford Tropical Medicine Research Unit, Bangkok, Thailand (GRID:grid.10223.32) (ISNI:0000 0004 1937 0490) 
22  Patan Academy of Health Sciences, Oxford University Clinical Research Unit, Kathmandu, Nepal (GRID:grid.452690.c) (ISNI:0000 0004 4677 1409); University of Oxford, Nuffield Department of Medicine, Centre for Tropical Medicine and Global Health, Oxford, UK (GRID:grid.4991.5) (ISNI:0000 0004 1936 8948) 
23  The University of Melbourne, The Peter Doherty Institute for Infection and Immunity, Melbourne, Australia (GRID:grid.1008.9) (ISNI:0000 0001 2179 088X) 
24  Monash University, Department of Infectious Diseases, Central Clinical School, Melbourne, Australia (GRID:grid.1002.3) (ISNI:0000 0004 1936 7857); London School of Hygiene & Tropical Medicine, Department of Infection Biology, London, UK (GRID:grid.8991.9) (ISNI:0000 0004 0425 469X) 
25  Baker Heart and Diabetes Institute, Cambridge Baker Systems Genomics Initiative, Melbourne, Australia (GRID:grid.1051.5) (ISNI:0000 0000 9760 5620); University of Melbourne, Department of Clinical Pathology, Parkville, Australia (GRID:grid.1008.9) (ISNI:0000 0001 2179 088X); University of Cambridge, Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, Cambridge, UK (GRID:grid.5335.0) (ISNI:0000000121885934); Wellcome Genome Campus and University of Cambridge, Health Data Research UK Cambridge, Cambridge, UK (GRID:grid.5335.0) (ISNI:0000000121885934); University of Cambridge, British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, Cambridge, UK (GRID:grid.5335.0) (ISNI:0000000121885934); University of Cambridge, British Heart Foundation Centre of Research Excellence, Cambridge, UK (GRID:grid.5335.0) (ISNI:0000000121885934); University of Cambridge, Victor Phillip Dahdaleh Heart and Lung Research Institute, Cambridge, UK (GRID:grid.5335.0) (ISNI:0000000121885934) 
Pages
804
Publication year
2023
Publication date
2023
Publisher
Nature Publishing Group
e-ISSN
23993642
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2844932456
Copyright
© The Author(s) 2023. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.