Probab. Theory Relat. Fields (2015) 161:651–686 DOI 10.1007/s00440-014-0556-x

The topology of probability distributions on manifolds

Omer Bobrowski · Sayan Mukherjee

Received: 5 July 2013 / Revised: 26 February 2014 / Published online: 22 March 2014 © Springer-Verlag Berlin Heidelberg 2014

Abstract Let $P$ be a set of $n$ random points in $\mathbb{R}^d$, generated from a probability measure on an $m$-dimensional manifold $M \subset \mathbb{R}^d$. In this paper we study the homology of $U(P, r)$, the union of $d$-dimensional balls of radius $r$ around $P$, as $n \to \infty$ and $r \to 0$. In addition we study the critical points of $d_P$, the distance function from the set $P$. These two objects are known to be related via Morse theory. We present limit theorems for the Betti numbers of $U(P, r)$, as well as for the number of critical points of index $k$ for $d_P$. Depending on how fast $r$ decays to zero as $n$ grows, these two objects exhibit different types of limiting behavior. In one particular case ($n r^m \ge C \log n$), we show that the Betti numbers of $U(P, r)$ perfectly recover the Betti numbers of the original manifold $M$, a result which is of significant interest in topological manifold learning.
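As an illustration of the object $U(P, r)$ described in the abstract, the following minimal Python sketch counts the connected components (the zeroth Betti number only) of a union of closed balls. It is not the authors' method: the function names `betti0_union_of_balls` and `circle_points` are hypothetical, and the evenly spaced points on the unit circle ($m = 1$, $d = 2$) are a deterministic stand-in for the random sample studied in the paper. It relies on the fact that two closed balls of radius $r$ intersect exactly when their centers are at distance at most $2r$.

```python
import math


def betti0_union_of_balls(points, r):
    """Count connected components of the union of closed balls of radius r.

    Two closed balls of radius r intersect exactly when their centers are at
    distance at most 2r, so the components of the union correspond to the
    components of the graph that joins such pairs of centers.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    components = n
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= 2 * r:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj
                    components -= 1
    return components


def circle_points(n):
    """Return n evenly spaced points on the unit circle (illustrative input)."""
    return [
        (math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]
```

With 12 evenly spaced points, neighbouring centers are about 0.518 apart, so a radius of 0.3 merges the balls into a single component (matching the one component of the circle), while a radius of 0.2 leaves every ball isolated. Higher Betti numbers would require a simplicial complex construction that this sketch does not attempt.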

Keywords Random complexes · Point process · Random Betti numbers · Stochastic topology

Mathematics Subject Classification (2000) Primary 60D05 · 60F15 · 60G55; Secondary 55U10

OB was supported by DARPA: N66001-11-1-4002Sub#8. SM is pleased to acknowledge the support of NIH (Systems Biology): 5P50-GM081883, AFOSR: FA9550-10-1-0436, NSF CCF-1049290, and NSF DMS-1209155.

O. Bobrowski (B)

Department of Mathematics, Duke University, Durham, NC 27708, USA e-mail: [email protected]

S. Mukherjee

Departments of Statistical Science, Computer Science, and Mathematics, Institute for Genome Sciences and Policy, Duke University, Durham, NC 27708, USA e-mail: [email protected]

1 Introduction

The incorporation of geometric and topological concepts for statistical inference is at the heart of spatial point process models, manifold learning, and topological data analysis. The motivating principle behind manifold learning is using low-dimensional geometric summaries of the data for statistical inference [4,10,21,47,50]. In topological data analysis, topological summaries of data are used to infer or extract underlying structure in the data [19,25,39,49,52]. In the analysis of spatial point processes, limiting distributions of integral-geometric quantities such as area and boundary length [23,35,41,48], Euler characteristic of patterns of discs centered at random...
