Abstract

Clustering is a fundamental unsupervised learning method and classic facility location problem where given a set of items along with pair-wise distances, the goal is to group the items into clusters, with close items falling inside the same cluster and far items separated across different clusters. Clustering and its many variations are used to decide on locations to establish facilities such as fire stations or hospitals in an area, where items represent clients that will be served by these facilities. In general, an item is a point in a dataset, the item distances are given similarity measures, and the goal is to cluster items based on their similarities. This is a machine learning method with applications from recommender systems and document analysis to anomaly detection in networks and detecting fake news. Fairness in machine learning and facility location is a major concern and there is a rapidly growing literature on fair clustering. Additionally, it is imperative to ensure algorithms are robust to outliers and anomalies in data and there is a long line of research on robust clustering. In this thesis, we devise algorithms for various fair and robust clustering problems with provable quality guarantees using the power of Linear Program (LP)s for LP-aware reductions, LP-rounding, and the round-or-cut framework.

We consider two definitions of fair clustering: group fairness and individual fairness. In group fairness, the data points are individuals from various protected groups and the goal is to ensure that the outcome of the clustering is not biased towards or against any specific group. In individual fairness, each item comes with a criterion on how well it needs to be represented by its cluster, which is defined with certain fairness considerations in mind. For both definitions, we bring approximation algorithms based on LP-rounding that guarantees high clustering quality with essentially fair clusters. Our theoretical results are accompanied by experiments that empirically establish that our algorithms perform much better in practice than our theory suggests.

In robust clustering, the goal is to discard up to a certain number of data points as outliers in a way that improves the clustering quality the most. We design algorithms for robust clustering with various constraints on the clusters and the points, such as fairness. With this, we investigate whether such constraints make the problem qualitatively harder. Our main tool is LP-rounding using the powerful round-or-cut framework. In some instances, we prove that our results are the best possible unless P=NP.

Details

Title
Approximation Algorithms for Clustering: Fairness and Outlier Detection
Author
Negahbani, Maryam
Publication year
2022
Publisher
ProQuest Dissertation & Theses
ISBN
9798819366257
Source type
Dissertation or Thesis
Language of publication
English
ProQuest document ID
2673346881
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.