Abstract

Standard data selection relies on opaque metrics like loss, offering little insight into the specific knowledge a model acquires. This thesis proposes interpretable concept-based data selection, a framework that treats high-level semantic concepts as the primary units of curation. We first demonstrate that characterizing the conceptual composition of data is essential for robust generalization: failing to explicitly capture the concepts in data introduces unintended artifacts into the selection process. To operationalize this, we introduce a gradient-based methodology that quantifies the influence of specific concepts at the instance level. Finally, we apply this framework to Continual Learning, showing that selecting rehearsal data based on "threatened" concepts significantly mitigates catastrophic forgetting compared to random baselines. This work establishes interpretability not merely as an analytic lens, but as a rigorous, actionable tool for efficient model training.
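The abstract does not specify how concept influence is computed, but a minimal sketch of one plausible reading follows: score each instance by how well its loss gradient aligns with a concept direction in parameter space. Everything here is an assumption for illustration; `concept_influence` and `concept_direction` are hypothetical names, and the concept direction is assumed to come from some external source such as a trained concept probe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def concept_influence(model: nn.Module,
                      x: torch.Tensor,
                      y: torch.Tensor,
                      concept_direction: torch.Tensor) -> float:
    """Score a single instance by the alignment of its loss gradient
    with a concept direction in parameter space.

    `concept_direction` is a hypothetical pre-computed vector with the
    same length as the model's flattened trainable parameters (e.g.,
    derived from a concept probe); it is not defined in the source.
    """
    params = [p for p in model.parameters() if p.requires_grad]
    loss = F.cross_entropy(model(x.unsqueeze(0)), y.unsqueeze(0))
    grads = torch.autograd.grad(loss, params)
    flat = torch.cat([g.reshape(-1) for g in grads])
    # Cosine alignment: a large positive value means this instance
    # moves the parameters in the same direction the concept does.
    return F.cosine_similarity(flat, concept_direction, dim=0).item()
```

Under these assumptions, a rehearsal buffer for continual learning could be filled by ranking a candidate pool with this score and keeping the instances that most strongly exercise a threatened concept, rather than sampling at random.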

Details

Title: Operationalizing Interpretability: Characterizing and Measuring Concepts for Data Selection
Number of pages: 127
Publication year: 2025
Degree date: 2025
School code: 0031
Source: DAI-B 87/6(E), Dissertation Abstracts International
ISBN: 9798270224523
Committee members: Chang, Kai-Wei; Grover, Aditya; Peng, Nanyun; Sun, Yizhou
University/institution: University of California, Los Angeles
Department: Computer Science 0201
University location: United States -- California
Degree: Ph.D.
Source type: Dissertation or Thesis
Language: English
Document type: Dissertation/Thesis
Dissertation/thesis number: 32399743
ProQuest document ID: 3283091631
Document URL: https://www.proquest.com/dissertations-theses/operationalizing-interpretability-characterizing/docview/3283091631/se-2?accountid=208611
Copyright: Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database: ProQuest One Academic