It appears you don't have support to open PDFs in this web browser. To view this file, Open with your PDF reader
Abstract
Clustering is a fundamental unsupervised learning method and classic facility location problem where given a set of items along with pair-wise distances, the goal is to group the items into clusters, with close items falling inside the same cluster and far items separated across different clusters. Clustering and its many variations are used to decide on locations to establish facilities such as fire stations or hospitals in an area, where items represent clients that will be served by these facilities. In general, an item is a point in a dataset, the item distances are given similarity measures, and the goal is to cluster items based on their similarities. This is a machine learning method with applications from recommender systems and document analysis to anomaly detection in networks and detecting fake news. Fairness in machine learning and facility location is a major concern and there is a rapidly growing literature on fair clustering. Additionally, it is imperative to ensure algorithms are robust to outliers and anomalies in data and there is a long line of research on robust clustering. In this thesis, we devise algorithms for various fair and robust clustering problems with provable quality guarantees using the power of Linear Program (LP)s for LP-aware reductions, LP-rounding, and the round-or-cut framework.
We consider two definitions of fair clustering: group fairness and individual fairness. In group fairness, the data points are individuals from various protected groups and the goal is to ensure that the outcome of the clustering is not biased towards or against any specific group. In individual fairness, each item comes with a criterion on how well it needs to be represented by its cluster, which is defined with certain fairness considerations in mind. For both definitions, we bring approximation algorithms based on LP-rounding that guarantees high clustering quality with essentially fair clusters. Our theoretical results are accompanied by experiments that empirically establish that our algorithms perform much better in practice than our theory suggests.
In robust clustering, the goal is to discard up to a certain number of data points as outliers in a way that improves the clustering quality the most. We design algorithms for robust clustering with various constraints on the clusters and the points, such as fairness. With this, we investigate whether such constraints make the problem qualitatively harder. Our main tool is LP-rounding using the powerful round-or-cut framework. In some instances, we prove that our results are the best possible unless P=NP.
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer