Probab. Theory Relat. Fields (2015) 161:651–686
DOI 10.1007/s00440-014-0556-x
The topology of probability distributions on manifolds
Omer Bobrowski · Sayan Mukherjee
Received: 5 July 2013 / Revised: 26 February 2014 / Published online: 22 March 2014
© Springer-Verlag Berlin Heidelberg 2014
Abstract Let $P$ be a set of $n$ random points in $\mathbb{R}^d$, generated from a
probability measure on an $m$-dimensional manifold $M \subset \mathbb{R}^d$. In this
paper we study the homology of $U(P, r)$, the union of $d$-dimensional balls of radius
$r$ around $P$, as $n \to \infty$ and $r \to 0$. In addition, we study the critical
points of $d_P$, the distance function from the set $P$. These two objects are known
to be related via Morse theory. We present limit theorems for the Betti numbers of
$U(P, r)$, as well as for the number of critical points of index $k$ for $d_P$.
Depending on how fast $r$ decays to zero as $n$ grows, these two objects exhibit
different types of limiting behavior. In one particular case ($n r^m \ge C \log n$),
we show that the Betti numbers of $U(P, r)$ perfectly recover the Betti numbers of the
original manifold $M$, a result which is of significant interest in topological
manifold learning.
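As an informal illustration of the objects named in the abstract (not the paper's method or proof technique), the following minimal sketch samples points on the unit circle, so that $m = 1$ and $d = 2$, and counts the connected components of the union of balls, i.e. its zeroth Betti number only. The choice of the circle, the sample size, the values of $C$, and the helper names `sample_circle`, `betti0_union_of_balls`, and `radius_for` are all assumptions made for this example. It relies on the elementary fact that two closed balls of radius $r$ intersect exactly when their centers are at distance at most $2r$.

```python
import math
import random


def sample_circle(n, rng):
    """Draw n i.i.d. uniform points on the unit circle in R^2 (m = 1, d = 2)."""
    angles = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    return [(math.cos(t), math.sin(t)) for t in angles]


def betti0_union_of_balls(points, r):
    """Count connected components of the union of closed balls of radius r.

    Two closed balls of radius r intersect iff their centers are at distance
    <= 2r, so the components of the union match those of that proximity graph.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= 2.0 * r:
                parent[find(i)] = find(j)

    return len({find(i) for i in range(len(points))})


def radius_for(n, m, C):
    """Hypothetical helper: the radius r solving n * r**m = C * log(n)."""
    return (C * math.log(n) / n) ** (1.0 / m)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, chosen arbitrarily
    n = 500
    P = sample_circle(n, rng)
    for C in (0.05, 0.5, 5.0):  # illustrative constants, not from the paper
        r = radius_for(n, m=1, C=C)
        print(f"C={C}: r={r:.4f}, beta_0={betti0_union_of_balls(P, r)}")
```

The circle is connected, so its zeroth Betti number is 1; the printed counts show how the number of components of $U(P, r)$ depends on the constant $C$ for one random sample. Higher Betti numbers would require building a full simplicial complex and are outside this sketch.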
Keywords Random complexes · Point process · Random Betti numbers · Stochastic topology
Mathematics Subject Classification (2000) Primary 60D05 · 60F15 · 60G55; Secondary 55U10
OB was supported by DARPA: N66001-11-1-4002Sub#8. SM is pleased to acknowledge the support of NIH (Systems Biology): 5P50-GM081883, AFOSR: FA9550-10-1-0436, NSF CCF-1049290, and NSF DMS-1209155.
O. Bobrowski (✉)
Department of Mathematics, Duke University, Durham, NC 27708, USA
e-mail: [email protected]
S. Mukherjee
Departments of Statistical Science, Computer Science, and Mathematics, Institute for Genome Sciences and Policy, Duke University, Durham, NC 27708, USA
e-mail: [email protected]
1 Introduction
The incorporation of geometric and topological concepts for statistical inference is at the heart of spatial point process models, manifold learning, and topological data analysis. The motivating principle behind manifold learning is using low dimensional geometric summaries of the data for statistical inference [4,10,21,47,50]. In topological data analysis, topological summaries of data are used to infer or extract underlying structure in the data [19,25,39,49,52]. In the analysis of spatial point processes, limiting distributions of integral-geometric quantities such as area and boundary length [23,35,41,48], Euler characteristic of patterns of discs centered at random...