NEWS AND VIEWS
Deep learning for regulatory genomics
Yongjin Park & Manolis Kellis
Computational modeling of DNA and RNA targets of regulatory proteins is improved by a deep-learning approach.
A fundamental unit of gene-regulatory control is the contact between a regulatory protein and its target DNA or RNA molecule. Biophysical models that directly predict these interactions are incomplete and confined to specific types of structures, but computational analysis of large-scale experimental datasets allows regulatory motifs to be identified by their over-representation in target sequences. In this issue, Alipanahi et al.1 describe the use of a deep-learning strategy to predict protein–nucleic acid interactions from diverse experimental datasets. They show that their algorithm, called DeepBind, is broadly applicable and offers greater predictive power than traditional single-domain methods, and they use its predictions to discover regulatory motifs, to predict RNA editing and alternative splicing, and to interpret genetic variants.
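To make the idea of motif discovery by over-representation concrete, the following minimal Python sketch ranks k-mers by their frequency in bound sequences relative to a background set. It is a deliberate simplification of what dedicated motif-discovery tools do; the function names, pseudocount scheme and toy sequences are all illustrative, not taken from the paper.

```python
from collections import Counter

def kmer_counts(seqs, k):
    """Count all k-mers across a set of sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts

def enrichment(bound, background, k=6, pseudo=1.0):
    """Rank k-mers by their pseudocount-smoothed frequency ratio
    in bound versus background sequences (illustrative scoring)."""
    fg, bg = kmer_counts(bound, k), kmer_counts(background, k)
    fg_total = sum(fg.values()) + pseudo * len(fg)
    bg_total = sum(bg.values()) + pseudo * len(bg)
    ratios = {m: ((fg[m] + pseudo) / fg_total) /
                 ((bg.get(m, 0) + pseudo) / bg_total)
              for m in fg}
    return sorted(ratios.items(), key=lambda kv: kv[1], reverse=True)

# Toy usage: the CACGTG 'E-box' motif is planted in the bound set
# and surfaces as the most enriched 6-mer.
bound = ["TTCACGTGAA", "GGCACGTGTT", "ACCACGTGCA"]
background = ["TTAGTCCGAA", "GGTATAGCTT", "ACGGATCCCA"]
print(enrichment(bound, background, k=6)[:3])
```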
Diverse statistical models have been proposed for regulatory motif discovery2, but current models still have considerable limitations3, especially for RNA-binding proteins that recognize both sequence components and secondary (or tertiary) structural components. Moreover, regulatory proteins bind in the context of dozens of other proteins that compete for occupancy or exert synergistic effects by binding nearby or at partly overlapping positions. This results in higher-order structures and motif combinations that are not easily recognized by traditional methods.
Deep learning, a recent advance in multilayered artificial neural networks, is a particularly powerful approach for learning complex patterns at multiple layers of abstraction. Originally inspired by the layers of neurons that receive and combine information in the human brain, neural networks have proved remarkably adept at learning complex tasks from relatively simple building blocks arranged in intricate networks. However, their internal representations have generally been difficult to interpret, and training deeply layered models has been algorithmically intractable and statistically prone to overfitting.
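The layered architecture can be illustrated with a minimal sketch of a DeepBind-style forward pass in Python with NumPy: one-hot-encoded DNA is scanned by learned motif detectors (a convolution), rectified, max-pooled and combined by a linear layer into a single binding score. This is not the authors' implementation; all names, shapes and parameter values below are illustrative, and a real model would learn its parameters by gradient descent on experimental binding data.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA sequence as a (length x 4) one-hot matrix."""
    x = np.zeros((len(seq), 4))
    for i, b in enumerate(seq):
        x[i, BASES.index(b)] = 1.0
    return x

def binding_score(seq, filters, weights, bias):
    """Convolve motif detectors over the sequence, rectify,
    max-pool, and combine pooled activations into one score."""
    x = one_hot(seq)                        # (L, 4)
    n_filters, width, _ = filters.shape     # each filter is (width, 4)
    L = len(seq)
    pooled = np.empty(n_filters)
    for k in range(n_filters):
        # activation of filter k at every position (the 'convolution')
        acts = np.array([np.sum(x[i:i + width] * filters[k])
                         for i in range(L - width + 1)])
        acts = np.maximum(acts, 0.0)        # rectification (ReLU)
        pooled[k] = acts.max()              # max-pool: best match anywhere
    return float(pooled @ weights + bias)   # linear output layer

# Toy example with random (untrained) parameters, purely to show shapes.
rng = np.random.default_rng(0)
filters = rng.normal(size=(8, 6, 4))        # 8 motif detectors of width 6
weights = rng.normal(size=8)
print(binding_score("ACGTACGTGGATTACA", filters, weights, bias=0.0))
```

Each filter plays the role of a simple building block (a motif detector), and the pooling and output layers combine these blocks into a more complex pattern than any single motif model could capture.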
So-called 'belief networks' have recently combined the learning architectures of neural networks with generative models framed in the context of an internal 'representation' of the world, from which samples are drawn with varying probabilities. Model parameters that match observed data are fit with Bayesian statistics and are used to classify...
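The generative character of such models can be illustrated with a restricted Boltzmann machine, a common building block of belief networks. The sketch below (illustrative only, and not drawn from the paper; biases are omitted for brevity) infers hidden 'representation' units from visible data, samples a reconstruction from the model, and nudges the weights so that the model's samples better match the observations.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, lr=0.1):
    """One step of contrastive divergence on a restricted Boltzmann
    machine: infer hidden units from data, sample a reconstruction,
    and move the weights toward the data and away from the model."""
    h0 = (sigmoid(v0 @ W) > rng.random(W.shape[1])).astype(float)  # hidden sample
    v1 = sigmoid(h0 @ W.T)                                         # reconstruction
    h1 = sigmoid(v1 @ W)
    W += lr * (np.outer(v0, h0) - np.outer(v1, h1))                # weight update
    return W

# Toy: 6 binary visible units, 3 hidden 'representation' units.
W = rng.normal(scale=0.1, size=(6, 3))
data = (rng.random((20, 6)) > 0.5).astype(float)
for v in data:
    W = cd1_step(v, W)
```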