Abstract

Deep learning and generative modeling have achieved impressive feats but still face challenges in high-stakes settings and applications such as health and science. This thesis investigates the reliability and alignment of generative models and uses the resulting insights to motivate alternative approaches for scientific discovery. A common thread in this work is to conceptualize tasks and objects using the language of probability, with distributions as the central objects of interest. The thesis is divided into two parts. Part one focuses on the reliability and alignment of generative models. First, we examine the failures of a wide range of generative models in out-of-distribution detection. Then, concluding that the issue is estimation error from standard training of existing architectures, we turn to strategies for finetuning autoregressive models beyond maximum likelihood estimation on examples from the target distribution of interest. Part two focuses on machine learning for scientific discovery. First, drawing on theoretical and practical insights on out-of-distribution detection from part one, we propose a method for robust anomaly detection and apply it to the detection of novel jets in particle physics. Second, motivated by the goal of directly processing empirical distributions as inputs, we introduce improved permutation-invariant neural network architectures and employ them in a pipeline for mechanism and biomarker discovery from single-cell data. The thesis concludes with thoughts on advancing generative modeling and machine learning for science.

Details

1010268
Title
A Probabilistic Perspective on the Reliability and Alignment of Generative Models & Machine Learning for Scientific Discovery
Number of pages
368
Publication year
2025
Degree date
2025
School code
0146
Source
DAI-B 86/12(E), Dissertation Abstracts International
ISBN
9798286424153
Committee member
Cho, Kyunghyun; Fergus, Rob; Cranmer, Kyle; Snoek, Jasper
University/institution
New York University
Department
Center for Data Science
University location
United States -- New York
Degree
Ph.D.
Source type
Dissertation or Thesis
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
31848169
ProQuest document ID
3223786997
Document URL
https://www.proquest.com/dissertations-theses/probabilistic-perspective-on-reliability/docview/3223786997/se-2?accountid=208611
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database
ProQuest One Academic