Untangling the Complexity of Nature: Machine-Learning for Accelerated Life-Sciences

Yaari, Adam U. Massachusetts Institute of Technology, ProQuest Dissertations & Theses, 2023. 30672338.

Abstract (summary)

The fundamental understanding of living processes is one of the main pillars of modern medicine and technology. Biological mechanisms are convoluted and stochastic systems that remain poorly understood despite centuries of rigorous scientific work. In recent years, machine-learning (ML) has resurfaced as a powerful framework for identifying patterns of interest in complex datasets. Yet the impact of such methods remains limited in the broad context of the life-sciences. This work optimizes the utility of ML to accelerate research on fundamental biological problems. First, we propose a paradigm shift from siloed data curation to multi-purpose cohorts at scale, even in the most restrictive case of human experimentation. The potential of this approach is revealed through the Brain TreeBank, a multi-modal dataset of naturalistic language aligned to intracranial neural recordings. The TreeBank provides the resolution and breadth required to probe the spatio-temporal dynamics of language context dependence and representation in the brain. Second, we argue for the importance of ML interpretability in accelerating the understanding of biology. We develop an explainable general-purpose tool for modeling discrete stochastic processes at multiple resolutions with output certainty estimation. We demonstrate the utility of the method by modeling patterns of somatic mutations across the entire cancer genome and extend it to map mutation rates in 37 types of cancer. The confidence intervals and increased sensitivity of the method identify sets of mutations that likely drive cancer growth in both coding and noncoding regions of the genome. Broadly, this work demonstrates how computational approaches can overcome unique challenges in biological data and how biological problems can drive advances in computational methodologies.

Indexing (details)

Subject
Epigenetics;
Genes;
Neural networks;
Computer science;
Artificial intelligence
Classification
0984: Computer science
0800: Artificial intelligence
URL
https://hdl.handle.net/1721.1/150069
Title
Untangling the Complexity of Nature: Machine-Learning for Accelerated Life-Sciences
Author
Yaari, Adam U.
Number of pages
273
Publication year
2023
Degree date
2023
School code
0753
Source
DAI-B 85/2(E), Dissertation Abstracts International
ISBN
9798380096850
Advisor
Katz, Boris; Berger, Bonnie; Barbu, Andrei
University/institution
Massachusetts Institute of Technology
Department
Department of Electrical Engineering and Computer Science
University location
United States -- Massachusetts
Degree
Ph.D.
Source type
Dissertation or Thesis
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
30672338
ProQuest document ID
2849923816
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Document URL
https://www.proquest.com/docview/2849923816