Advancing Causal Machine Learning for Metabolomic Biomarker Discovery and Cross-Species Gene Regulation

Ebeid, Mark Maher. University of Pittsburgh ProQuest Dissertations & Theses, 2025. 32278155.

Abstract (summary)

The elucidation of robust, causal signals from high-dimensional “omics” data remains a central challenge in the post-genomic era. In metabolomics, traditional statistical techniques often recover confounded associations between biomarkers and disease that later fail to validate. In regulatory genomics, sequence-to-function (S2F) deep learning models are powerful, yet often opaque and prone to poor out-of-distribution generalization. From this perspective, both challenges stem from the same underlying issue: a reliance on observational data, in which correlative signals obscure causal mechanisms. This dissertation introduces two computational frameworks that advance causal machine learning for biomedical data. The Integrative Model for Atherosclerotic Disease (IMAD) combines automated causal structure learning with statistical modeling to map dependencies among metabolomic, clinical, and demographic variables. Applied to a Japanese case-control study of cardiovascular disease (CVD), and more specifically of atherosclerotic cardiovascular disease (ASCVD), IMAD improved classification AUROC relative to association-based models and isolated glutamic acid and trigonelline as putative direct effectors of these outcomes. Mean and Correlation Alignment (MORALE) learns species-invariant sequence representations for transcription factor binding prediction by aligning distributional moments in latent space. Because it can be easily embedded into any architecture, MORALE provides a foundation for a more advanced framework aimed at causally disentangling conserved regulatory signals from species-specific elements. Evaluated on liver ChIP-seq data from up to five mammals, MORALE yielded consistent gains in area under the precision-recall curve (auPRC) and avoided the performance degradation observed with adversarial domain adaptation via a gradient reversal layer (GRL).
Collectively, these methods encourage integrating causal principles, along with techniques amenable to them, to yield models that are both robust and generalizable, facilitating biomarker discovery and regulatory inference.
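To make "aligning distributional moments in latent space" concrete, the following is a minimal sketch of a generic mean-and-correlation alignment penalty between embeddings from two species. It is not the dissertation's MORALE implementation. The function name `moment_alignment_loss`, the use of a squared distance between feature means, the mean squared difference between feature correlation matrices, and the equal weighting of the two terms are all illustrative assumptions.

```python
import numpy as np


def moment_alignment_loss(z_a, z_b, eps=1e-8):
    """Hypothetical moment-alignment penalty (an assumption, not MORALE itself).

    z_a, z_b: latent embeddings from two species, shape (n_sequences, n_features).
    Returns the squared distance between feature means plus the mean squared
    difference between feature correlation matrices.
    """
    z_a = np.asarray(z_a, dtype=float)
    z_b = np.asarray(z_b, dtype=float)
    if z_a.shape[1] != z_b.shape[1]:
        raise ValueError("embeddings must share the same feature dimension")

    # First moment: squared Euclidean distance between per-feature means.
    mean_term = np.sum((z_a.mean(axis=0) - z_b.mean(axis=0)) ** 2)

    def corr(z):
        # Feature-by-feature correlation matrix from the sample covariance.
        zc = z - z.mean(axis=0)
        cov = zc.T @ zc / (z.shape[0] - 1)
        std = np.sqrt(np.diag(cov)) + eps  # eps guards against zero variance
        return cov / np.outer(std, std)

    # Second moment: mean squared difference between correlation matrices.
    corr_term = np.mean((corr(z_a) - corr(z_b)) ** 2)
    return mean_term + corr_term


rng = np.random.default_rng(0)
z_species_1 = rng.normal(size=(256, 16))
z_species_2 = rng.normal(loc=0.5, size=(256, 16))
print(moment_alignment_loss(z_species_1, z_species_1))  # identical batches give 0.0
print(moment_alignment_loss(z_species_1, z_species_2))  # shifted batch gives a positive penalty
```

In an actual training loop, a penalty of this kind would presumably be written with the deep-learning framework's differentiable tensor operations and added, with some weight, to the binding-prediction loss. That way the encoder is pushed toward species-invariant features without the adversarial min-max game that a gradient reversal layer introduces.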

Indexing (details)

Business indexing term

Subject:

Artificial intelligence

Subject

Bioinformatics;
Statistics;
Artificial intelligence

Classification

0715: Bioinformatics
0800: Artificial intelligence
0463: Statistics

Identifier / keyword

Causal discovery; Cross-species; Domain adaptation; Machine learning; Representation learning

Title

Advancing Causal Machine Learning for Metabolomic Biomarker Discovery and Cross-Species Gene Regulation

Author

Ebeid, Mark Maher

Number of pages

159

Publication year

2025

Degree date

2025

School code

0178

Source

DAI-B 87/7(E), Dissertation Abstracts International

ISBN

9798270254780

Advisor

Benos, Panayiotis V.; Kostka, Dennis

Committee member

Sekikawa, Akira; Wu, Wei

University/institution

University of Pittsburgh

Department

Computational Biology

University location

United States -- Pennsylvania

Degree

Ph.D.

Source type

Dissertation or Thesis

Language

English

Document type

Dissertation/Thesis

Dissertation/thesis number

32278155

ProQuest document ID

3290492319

Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.

Document URL

https://www.proquest.com/docview/3290492319/$N