Abstract

Machine Learning (ML) models are increasingly adopted as components in software systems. Creating and specializing ML models from scratch has become more difficult as state-of-the-art architectures grow more complex. Following the path of traditional software engineering, ML engineers have begun to reuse pre-trained models (PTMs) and fine-tune them for downstream tasks and environments. This practice gives rise to the ML model supply chain. Traditional software reuse practices and challenges are well understood. However, the foundations for trustworthiness and reusability in the ML supply chain are still largely unexplored.

To investigate the challenges and practices in the ML model supply chain, this dissertation combines empirical analyses, repository mining studies, and automated tool development to characterize PTM ecosystems in detail. Using mining software repositories techniques, I extracted, analyzed, and interpreted rich data from the deep learning reengineering process and from PTM packages. My work first adopts traditional software engineering methodologies to understand the challenges and practices of deep learning software. I also characterized PTM naming practices and developed a Deep Neural Network (DNN) architecture assessment pipeline (DARA) to enhance trust and promote more effective reuse in the ML model supply chain. My findings indicate that ML model naming conventions differ from those of traditional software packages. Building on these findings, I developed a package confusion detection system and adapted it to the ML model supply chain. To enable further research, I released two open-source datasets of PTM packages.

This dissertation compares the PTM supply chain with the traditional software supply chain across multiple dimensions. The findings reveal that while the ML model supply chain shares many challenges with traditional software, it also introduces unique issues and practices. This work informs future research in ML supply chain analysis, model recommendation systems, model and dataset lineage tracking, and the automated simplification of reengineering processes.

Details

1010268
Title
Trustworthy Reuse in the Machine Learning Model Supply Chain
Number of pages
356
Publication year
2025
Degree date
2025
School code
0183
Source
DAI-B 87/6(E), Dissertation Abstracts International
ISBN
9798265489371
Committee member
Inouye, David I; Qiu, Xiaokang; Lu, Yung-Hsiang; Ghodsi, Zahra
University/institution
Purdue University
University location
United States -- Indiana
Degree
Ph.D.
Source type
Dissertation or Thesis
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
32420223
ProQuest document ID
3283379482
Document URL
https://www.proquest.com/dissertations-theses/trustworthy-reuse-machine-learning-model-supply/docview/3283379482/se-2?accountid=208611
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database
ProQuest One Academic