Abstract

Life's molecules, ranging from small-molecule ligands to large protein polymers, carry out the biomolecular functions that sustain life within and beyond a single cell. However, these biomolecules and their structural roles in cellular biology remain poorly understood at the genomic scale owing to their complex inter-atomic interactions, necessitating the development of new computational methods for studying biomolecules at the atomic level.

To address this issue, in this dissertation, I describe the development of a collection of deep learning methods (Geometric Transformers, GCPNet, GCDM, and FlowDock) for modeling increasingly complex biomolecular structures and interactions. These methods have advanced the state of the art in deep learning for protein and biomolecular representation learning, generative modeling of 3D molecules, and protein-ligand structure and affinity prediction. Additionally, I detail the design and results of a new deep learning benchmark (PoseBench) and an ensembling prediction method (MULTICOM_ligand) for standardized and broadly applicable protein-ligand docking and structure prediction. The benchmark's findings suggest that future work in deep learning for 3D biomolecules may benefit from stronger dataset splitting and out-of-distribution evaluation. Further, the ensembling method ranked among the top five methods in the ligand prediction category of the 16th Critical Assessment of Techniques for Structure Prediction (CASP16).

Taken together, this dissertation advances our understanding of life's molecules through the lens of deep learning and offers new insights and directions for future deep learning research in the physical and life sciences. All methods, benchmarks, and datasets described in this dissertation have been open-sourced and made freely available to the scientific community.

Details

Title
Geometric Deep Learning and Generative Modeling of 3D Biomolecules
Author
Morehead, Alex
Publication year
2025
Publisher
ProQuest Dissertations & Theses
ISBN
9798265472274
Source type
Dissertation or Thesis
Language of publication
English
ProQuest document ID
3280272557
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.