Rapid advances in Artificial Intelligence (AI) and Machine Learning (ML) in recent years have delivered significant benefits across many applications. At the same time, privacy has captured the attention of the public, industry, and the academic community, especially in the context of developing and using AI/ML-enabled systems. Growing privacy concerns have led to new privacy laws, extensive research on identifying and mitigating privacy threats, and the development of domain-specific privacy requirements and solutions. Incorporating ML models into systems that lack the necessary foundations to protect privacy exacerbates privacy risks and can jeopardize the privacy of the systems' data.

Emerging large-scale systems such as National Digital ID (NDID) have privacy requirements specific to their needs while offering opportunities to improve the quality of life of citizens and communities by providing legal means to verify identities and deliver services online. Governments are adopting NDID systems and exploring how ML models can support tasks in them, including proactively applying for citizen benefits and services, powering digital assistants, creating personalized recommendations, and predicting societal risks related to housing and food security. These ML models are trained on privacy-sensitive and unbalanced data from the NDID ecosystem, and they can be biased because of the training algorithm or the dataset used to train them. Care is therefore needed when developing privacy-preserving ML systems so that existing bias is not retained or even amplified. Furthermore, the absence of robust privacy protection measures and the privacy concerns inherent in ML models can lead to additional privacy and fairness issues.
This dissertation examines the challenges and needs of building privacy-preserving ML systems. First, I study privacy threats in NDID ecosystems by analyzing real-world case studies and propose a privacy requirements elicitation framework for NDID systems. Next, I examine how privacy (specifically differential privacy) and fairness mechanisms interact when applied to ML models that can be incorporated into systems such as NDID. Through an empirical evaluation of these mechanisms at three stages of learning (pre-, in-, and post-processing), I aim to determine the settings that best balance privacy and fairness and to analyze their trade-offs. This dissertation provides valuable insights into optimizing the performance of these models while accounting for privacy and fairness concerns. Finally, I investigate ML practitioners' perspectives on privacy by (1) analyzing publicly available data from Community Question Answering (CQA) websites and (2) conducting a user study to understand developers' practices, challenges, and needs when developing privacy-preserving ML models.
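To make the notion of differential privacy referred to above concrete, the sketch below shows the classic Laplace mechanism for releasing a numeric query result with calibrated noise. It is only a minimal illustration of the general concept: the function name, the count query, the sensitivity of 1, and the choice of epsilon are illustrative assumptions and do not correspond to the specific mechanisms or settings evaluated in this dissertation.

```python
import numpy as np

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
    """Return a differentially private estimate of `true_value`.

    Noise is drawn from a Laplace distribution with scale = sensitivity / epsilon,
    which satisfies epsilon-differential privacy for a query with the given
    L1 sensitivity.
    """
    scale = sensitivity / epsilon
    return true_value + np.random.laplace(loc=0.0, scale=scale)

# Illustrative use: privately release a count query (sensitivity 1) at epsilon = 0.5.
private_count = laplace_mechanism(true_value=128, sensitivity=1.0, epsilon=0.5)
print(private_count)
```

Smaller values of epsilon yield stronger privacy but noisier answers, which is the kind of utility trade-off the empirical evaluation above analyzes alongside fairness.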
