About the Authors:
Shiran Abadi
Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing - original draft, Writing - review & editing
Affiliation: Department of Molecular Biology and Ecology of Plants, Tel Aviv University, Tel Aviv, Israel
ORCID http://orcid.org/0000-0002-3932-6310
Winston X. Yan
Roles Data curation, Resources, Writing - review & editing
Affiliations Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America, Graduate Program in Biophysics, Harvard Medical School, Boston, Massachusetts, United States of America, Harvard-MIT Division of Health Sciences and Technology, Harvard Medical School, Boston, Massachusetts, United States of America
David Amar
Roles Investigation, Software, Writing - review & editing
Affiliations Blavatnik School of Computer Science, Tel-Aviv University, Tel Aviv, Israel, Division of Cardiovascular Medicine, Department of Medicine, Stanford University, Stanford, CA, United States of America
Itay Mayrose
Roles Conceptualization, Funding acquisition, Investigation, Methodology, Project administration, Writing - original draft, Writing - review & editing
* E-mail: [email protected]
Affiliation: Department of Molecular Biology and Ecology of Plants, Tel Aviv University, Tel Aviv, Israel
ORCID http://orcid.org/0000-0002-8460-1502
Abstract
The adaptation of the CRISPR-Cas9 system as a genome editing technique has generated much excitement in recent years owing to its ability to manipulate targeted genes and genomic regions that are complementary to a programmed single guide RNA (sgRNA). However, the efficacy of a given sgRNA is not determined solely by exact sequence homology to the target site; thus, unintended off-target sites may also be cleaved. Current methods for sgRNA design are mainly concerned with predicting off-targets for a given sgRNA using basic sequence features, and they employ elementary rules for ranking candidate sgRNAs. Here, we introduce CRISTA (CRISPR Target Assessment), a novel machine learning algorithm that determines the propensity of a genomic site to be cleaved by a given sgRNA. We show that the predictions made with CRISTA are more accurate than those of other available methods. We further demonstrate that bulges are not rare and should be accounted for in the prediction process. Beyond predicting cleavage efficiencies, the learning process yields insights into the patterns that underlie the mechanism of action of the CRISPR-Cas9 system. We find that attributes describing the spatial structure and rigidity of the entire genomic site as well as those...
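The abstract notes that sgRNA activity is not determined by exact complementarity alone and that bulges occur often enough to matter. Purely as an illustration, and not a description of how CRISTA itself aligns or scores sites, the following Python sketch counts mismatches and single-nucleotide DNA and RNA bulges between a spacer and a candidate protospacer. It uses a standard edit-distance (Needleman-Wunsch-style) dynamic program. The assumptions are as follows: both sequences are written 5'->3' in the DNA alphabet, the PAM is excluded, mismatches and bulges all have unit cost, and the canonical SpCas9 PAM is NGG. The names `Alignment`, `align_sgrna_to_site` and `has_ngg_pam` are hypothetical.

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    cost: int        # total penalty (mismatches + weighted bulges)
    mismatches: int  # paired positions with non-matching nucleotides
    dna_bulges: int  # extra nucleotide in the genomic site (gap in the sgRNA)
    rna_bulges: int  # extra nucleotide in the sgRNA (gap in the genomic site)


def align_sgrna_to_site(sgrna: str, site: str, bulge_cost: int = 1) -> Alignment:
    """Minimal-cost global alignment of an sgRNA spacer to a candidate protospacer.

    Hypothetical helper: sequences are 5'->3' in DNA letters (T instead of U),
    without the PAM. Each mismatch costs 1; each single-nucleotide bulge costs
    `bulge_cost`. Ties are broken by comparing the remaining tuple fields.
    """
    sgrna, site = sgrna.upper(), site.upper()
    n, m = len(sgrna), len(site)
    # best[i][j] holds the best alignment of sgrna[:i] against site[:j]
    best = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = Alignment(0, 0, 0, 0)
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            candidates = []
            if i > 0 and j > 0:
                # Pair sgRNA[i-1] with site[j-1]: a match or a mismatch
                prev = best[i - 1][j - 1]
                mm = int(sgrna[i - 1] != site[j - 1])
                candidates.append(prev._replace(cost=prev.cost + mm,
                                                mismatches=prev.mismatches + mm))
            if i > 0:
                # sgRNA nucleotide left unpaired: RNA bulge
                prev = best[i - 1][j]
                candidates.append(prev._replace(cost=prev.cost + bulge_cost,
                                                rna_bulges=prev.rna_bulges + 1))
            if j > 0:
                # Genomic nucleotide left unpaired: DNA bulge
                prev = best[i][j - 1]
                candidates.append(prev._replace(cost=prev.cost + bulge_cost,
                                                dna_bulges=prev.dna_bulges + 1))
            best[i][j] = min(candidates)
    return best[n][m]


def has_ngg_pam(pam: str) -> bool:
    """Assumes SpCas9: the 3-nt PAM directly 3' of the protospacer must be NGG."""
    return len(pam) == 3 and pam[1:].upper() == "GG"


if __name__ == "__main__":
    guide = "GAGTCCGAGCAGAAGAAGAA"  # illustrative 20-nt spacer
    print(align_sgrna_to_site(guide, "GAGTCCGAGCTAGAAGAAGAA"))  # one extra T in the site
```

In this sketch, an exact match has cost 0. A site with one substituted base yields a single mismatch, and a site with one inserted base yields a single DNA bulge. Counts such as these are the kind of "basic sequence features" the abstract refers to. The learned model described in the abstract goes beyond them.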