
Advancing Causal Machine Learning for Metabolomic Biomarker Discovery and Cross-Species Gene Regulation

Ebeid, Mark Maher. University of Pittsburgh ProQuest Dissertations & Theses, 2025. 32278155.

Abstract (summary)

Extracting robust causal signals from high-dimensional “omics” data remains a central challenge in the post-genomic era. In metabolomics, traditional statistical techniques often recover confounded associations between biomarkers and disease that later fail to validate. In regulatory genomics, sequence-to-function (S2F) deep learning models are powerful, yet often opaque and prone to poor out-of-distribution generalization. Viewed together, both challenges stem from the same underlying issue: a reliance on observational data, where correlative signals obscure causal mechanisms. This dissertation introduces two computational frameworks that advance causal machine learning for biomedical data. The Integrative Model for Atherosclerotic Disease (IMAD) combines automated causal structure learning with statistical modeling to map dependencies among metabolomic, clinical, and demographic variables. Applied to a case-control study of cardiovascular disease (CVD), specifically atherosclerotic cardiovascular disease (ASCVD), in Japan, IMAD improved classification AUROC (area under the receiver operating characteristic curve) relative to association-based models and isolated glutamic acid and trigonelline as putative direct effectors of these outcomes. Mean and Correlation Alignment (MORALE) learns species-invariant sequence representations for transcription factor binding prediction by aligning distributional moments in latent space. The alignment is easily embedded into any architecture and provides a foundation for a more advanced framework aimed at causally disentangling conserved regulatory signals from species-specific elements. Evaluated on liver ChIP-seq data from up to five mammals, MORALE yielded consistent gains in area under the precision-recall curve (auPRC) and avoided the performance degradation observed with adversarial domain adaptation by way of a gradient reversal layer (GRL).
Collectively, these methods encourage the integration of causal principles, and those amenable to them, to yield models that are both robust and generalizable, facilitating biomarker discovery and regulatory inference.
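
The abstract describes MORALE only as aligning distributional moments (mean and correlation) of latent representations across species. As a rough illustration of that general idea, and not the dissertation's actual implementation, the sketch below computes a penalty from the gap between the feature means and the feature correlation matrices of two batches of embeddings. The function name `moment_alignment_loss`, the equal weighting of the two terms, the `eps` stabilizer, and the use of NumPy are all assumptions made for this example.

```python
import numpy as np


def moment_alignment_loss(source, target, eps=1e-8):
    """Hypothetical moment-alignment penalty between two embedding batches.

    Both inputs have shape (n_samples, n_features). The result is the
    squared gap between per-feature means plus the squared Frobenius gap
    between the two feature correlation matrices. The equal weighting of
    the two terms is an assumption for illustration.
    """
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)

    # First moment: squared difference of the per-feature means.
    mean_gap = np.sum((source.mean(axis=0) - target.mean(axis=0)) ** 2)

    def correlation(x):
        # Standardize each feature, then average the outer products.
        centered = x - x.mean(axis=0)
        z = centered / (centered.std(axis=0) + eps)
        return z.T @ z / x.shape[0]

    # Second moment: squared difference of the correlation matrices.
    corr_gap = np.sum((correlation(source) - correlation(target)) ** 2)
    return mean_gap + corr_gap


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    species_a = rng.normal(size=(256, 8))      # stand-in embeddings, species A
    species_b = species_a + 2.0                # same structure, shifted mean
    print(moment_alignment_loss(species_a, species_a))
    print(moment_alignment_loss(species_a, species_b))
```

In a training setting, a term of this kind would typically be added to the binding-prediction loss so that embeddings from different species are pulled toward matching statistics; how the dissertation weights or schedules such a term is not stated in the abstract.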

Indexing (details)


Subject
Bioinformatics; Statistics; Artificial intelligence
Classification
0715: Bioinformatics
0800: Artificial intelligence
0463: Statistics
Identifier / keyword
Causal discovery; Cross-species; Domain adaptation; Machine learning; Representation learning
Title
Advancing Causal Machine Learning for Metabolomic Biomarker Discovery and Cross-Species Gene Regulation
Author
Ebeid, Mark Maher
Number of pages
159
Publication year
2025
Degree date
2025
School code
0178
Source
DAI-B 87/7(E), Dissertation Abstracts International
ISBN
9798270254780
Advisor
Benos, Panayiotis V.; Kostka, Dennis
Committee member
Sekikawa, Akira; Wu, Wei
University/institution
University of Pittsburgh
Department
Computational Biology
University location
United States -- Pennsylvania
Degree
Ph.D.
Source type
Dissertation or Thesis
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
32278155
ProQuest document ID
3290492319
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Document URL
https://www.proquest.com/docview/3290492319/$N