Content area
Optimal transport, as a quantification of distance between distributions, has become a pivotal tool across statistics and machine learning. This dissertation addresses fundamental statistical challenges arising in discrete optimal transport, specifically focusing on asymptotic laws and statistical inference methods for empirical optimal transportation plans. By formulating discrete optimal transport problems as linear programs, the dissertation develops results applicable to general random linear programs, with the empirical discrete optimal transport problem serving as a notable example.
In the first part, motivated by discrete optimal transport, we develop novel asymptotic distributional limits for linear programs with random constraints. Existing results by Klatt, Munk, & Zemel 2022 characterize these limits via a computationally intractable decomposition of R𝑛 into a possibly exponential number of convex cones. We overcome this challenge by expressing the distributional limits through auxiliary linear programs solvable in polynomial time, thereby making it practically feasible to sample from the limit law. We also leverage tools from random convex geometry to give distributional limits for the entire set of random optimal solutions, when the optimum is not unique. Most importantly, we describe a simple, data-driven method to construct asymptotically valid confidence sets in polynomial time.
In the second part, we propose a new estimator for the discrete optimal transport plan that enjoys a central limit theorem (CLT) type of convergence and naive bootstrap consistency. Previous work by Klatt, Tameling, & Munk 2020 showed that the regularized empirical optimal transport plan exhibits CLT-type weak convergence. However, their limit law centers at a regularized optimal plan, which introduces a fixed amount of bias compared to the true plan. We suggest a new regularization scheme and develop a debiasing technique inspired by Richardson-extrapolation. This estimator leads to an asymptotically unbiased Gaussian estimator, which allows statistical inferences for the true optimal plan via bootstrap.