
Abstract

We consider two prevalent data-centric constraints in modern machine learning: (a) restricted data access with potential computational constraints, and (b) poor data quality. Our goal is to provide theoretically sound algorithms/practices for such settings.

Under (a), we focus on federated learning (FL) where data is stored locally on decentralized clients, each with individual computational constraints, and on differentially private training where data access is impaired due to the privacy-preservation requirement. Specifically, we propose an accelerated FL algorithm attaining the best known complexity for smooth non-convex functions under arbitrary client heterogeneity and compressed communication. We also provide a theoretically justified recommendation for setting the clip norm in differentially private stochastic gradient descent (DP-SGD) and derive new convergence results for DP-SGD with heavy-tailed gradients. We validate the effectiveness of our methods via extensive experimentation.
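To make the DP-SGD component concrete, here is a minimal sketch of a single DP-SGD step with per-example gradient clipping and Gaussian noise. The names and constants (`clip_norm`, `noise_multiplier`, `lr`) are placeholders for illustration; the dissertation's actual clip-norm recommendation and privacy calibration are not reproduced here.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, clip_norm, noise_multiplier, lr):
    """One illustrative DP-SGD step: clip each per-example gradient to
    `clip_norm`, average, add Gaussian noise scaled to the clip norm,
    and take a gradient step. All constants are placeholders."""
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Rescale the gradient so its L2 norm is at most clip_norm.
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    avg = np.mean(clipped, axis=0)
    # Gaussian noise proportional to the clip norm (the sensitivity of the
    # summed gradients), divided by the batch size because we averaged.
    noise = np.random.normal(
        0.0, noise_multiplier * clip_norm / len(per_example_grads), size=avg.shape
    )
    return params - lr * (avg + noise)
```

In practice the noise multiplier would be calibrated by a privacy accountant for a target (epsilon, delta); the sketch only shows where the clip norm enters the update.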

Under (b), we consider the problem of learning with noisy labels. Specifically, we focus on the idea of retraining a model on its own hard predictions (1/0 labels) or soft predictions (raw, unrounded scores) over the same training set on which it was initially trained. Surprisingly, this simple idea improves the model's performance, even though retraining provides no extra information. We theoretically characterize this surprising phenomenon for linear models; to our knowledge, our results are the first of their kind. Empirically, we show the efficacy of selective retraining in improving training with local label differential privacy, where the goal is to safeguard the privacy of only the labels by injecting label noise.
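To make the retraining idea concrete, below is a minimal sketch for a linear classifier, assuming scikit-learn's LogisticRegression. The hard-label variant refits on the model's own 1/0 predictions; the soft variant here weights examples by the predicted probabilities, which is only one possible rendering of "raw unrounded scores" and not necessarily the dissertation's exact procedure.

```python
from sklearn.linear_model import LogisticRegression

def retrain_on_own_predictions(X, y_noisy, use_hard_labels=True):
    """Fit on noisy labels, then refit the same model class on its own
    predictions over the same training set (illustrative sketch only)."""
    base = LogisticRegression(max_iter=1000).fit(X, y_noisy)
    if use_hard_labels:
        # Hard relabeling: replace the noisy labels with the model's 1/0 predictions.
        return LogisticRegression(max_iter=1000).fit(X, base.predict(X))
    # Soft variant (one interpretation): relabel with the argmax class and
    # weight each example by the model's confidence in that class.
    proba = base.predict_proba(X)
    return LogisticRegression(max_iter=1000).fit(
        X, proba.argmax(axis=1), sample_weight=proba.max(axis=1)
    )
```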

Details

Title
Principled Machine Learning Under Constraints on Data Access, Quality, and Computations
Number of pages
351
Publication year
2025
Degree date
2025
School code
0227
Source
DAI-A 87/6(E), Dissertation Abstracts International
ISBN
9798270232450
Committee member
Liu, Qiang; Kale, Satyen
University/institution
The University of Texas at Austin
Department
Computer Science
University location
United States -- Texas
Degree
Ph.D.
Source type
Dissertation or Thesis
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
32460826
ProQuest document ID
3284362952
Document URL
https://www.proquest.com/dissertations-theses/principled-machine-learning-under-constraints-on/docview/3284362952/se-2?accountid=208611
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database
ProQuest One Academic