Abstract

Annotations of evolutionary sequence constraint based on multi-species genome alignments and genome-wide maps of epigenomic marks and transcription factor binding provide important complementary information for understanding the human genome and genetic variation. Here we developed the Constrained Non-Exonic Predictor (CNEP) to quantify the evidence of each base in the genome being in an evolutionarily constrained non-exonic element from an input of over 60,000 epigenomic and transcription factor binding features. We find that the CNEP score outperforms baseline and related existing scores at predicting evolutionarily constrained non-exonic bases from such data. However, a subset of constrained non-exonic bases is still not well predicted by CNEP. We developed a complementary Conservation Signature Score by CNEP (CSS-CNEP) that is predictive of those bases. We further characterize the nature of constrained non-exonic bases with low CNEP scores using additional types of information. CNEP and CSS-CNEP are resources for analyzing constrained non-exonic bases in the genome.

Genome-wide maps of evolutionary constraint and large-scale compendia of epigenomic and transcription factor data provide complementary information for genome annotation. Here, the authors develop the Constrained Non-Exonic Predictor (CNEP) that enables better understanding of their relationship.

Details

Title
Identification and characterization of constrained non-exonic bases lacking predictive epigenomic and transcription factor binding annotations
Author
Grujic, Olivera 1; Phung, Tanya N 2; Kwon, Soo Bin 3; Arneson, Adriana 3; Lee, Yuju 4; Lohmueller, Kirk E 5; Ernst, Jason 6

1 University of California, Los Angeles, Computer Science Department, Los Angeles, USA (GRID:grid.19006.3e) (ISNI:0000 0000 9632 6718); University of California, Los Angeles, Department of Biological Chemistry, Los Angeles, USA (GRID:grid.19006.3e) (ISNI:0000 0000 9632 6718)
2 University of California, Los Angeles, Interdepartmental Program in Bioinformatics, Los Angeles, USA (GRID:grid.19006.3e) (ISNI:0000 0000 9632 6718)
3 University of California, Los Angeles, Department of Biological Chemistry, Los Angeles, USA (GRID:grid.19006.3e) (ISNI:0000 0000 9632 6718); University of California, Los Angeles, Interdepartmental Program in Bioinformatics, Los Angeles, USA (GRID:grid.19006.3e) (ISNI:0000 0000 9632 6718)
4 University of California, Los Angeles, Computer Science Department, Los Angeles, USA (GRID:grid.19006.3e) (ISNI:0000 0000 9632 6718)
5 University of California, Los Angeles, Interdepartmental Program in Bioinformatics, Los Angeles, USA (GRID:grid.19006.3e) (ISNI:0000 0000 9632 6718); University of California, Los Angeles, Department of Ecology and Evolutionary Biology, Los Angeles, USA (GRID:grid.19006.3e) (ISNI:0000 0000 9632 6718); University of California, Los Angeles, Department of Human Genetics, Los Angeles, USA (GRID:grid.19006.3e) (ISNI:0000 0000 9632 6718)
6 University of California, Los Angeles, Computer Science Department, Los Angeles, USA (GRID:grid.19006.3e) (ISNI:0000 0000 9632 6718); University of California, Los Angeles, Department of Biological Chemistry, Los Angeles, USA (GRID:grid.19006.3e) (ISNI:0000 0000 9632 6718); University of California, Los Angeles, Interdepartmental Program in Bioinformatics, Los Angeles, USA (GRID:grid.19006.3e) (ISNI:0000 0000 9632 6718); Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research at University of California, Los Angeles, Los Angeles, USA (GRID:grid.19006.3e) (ISNI:0000 0000 9632 6718); University of California, Los Angeles, Jonsson Comprehensive Cancer Center, Los Angeles, USA (GRID:grid.19006.3e) (ISNI:0000 0000 9632 6718); University of California, Los Angeles, Molecular Biology Institute, Los Angeles, USA (GRID:grid.19006.3e) (ISNI:0000 0000 9632 6718)
Publication year
2020
Publication date
2020
Publisher
Nature Publishing Group
e-ISSN
20411723
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2473286496
Copyright
© The Author(s) 2020. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.