A Fractal Dimension for Measures via
Persistent Homology
Henry Adams, Manuchehr Aminian, Elin Farnell, Michael Kirby,
Joshua Mirth, Rachel Neville, Chris Peterson, and Clayton Shonkwiler
Abstract We use persistent homology in order to define a family of fractal
dimensions, denoted dim^i_PH(μ) for each homological dimension i ≥ 0, assigned to
a probability measure μ on a metric space. The case of zero-dimensional homology
(i = 0) relates to work by Steele (Ann Probab 16(4): 1767–1787, 1988) studying
the total length of a minimal spanning tree on a random sampling of points. Indeed,
if μ is supported on a compact subset of Euclidean space Rm for m ≥ 2, then
Steele’s work implies that dim^0_PH(μ) = m if the absolutely continuous part of μ
has positive mass, and otherwise dim^0_PH(μ) < m. Experiments suggest that similar
results may be true for higher-dimensional homology 0 < i < m, though this is
an open question. Our fractal dimension is defined by considering a limit, as the
number of points n goes to infinity, of the total sum of the i-dimensional persistent
homology interval lengths for n random points selected from μ in an i.i.d. fashion.
To some measures μ, we are able to assign a finer invariant, a curve measuring
the limiting distribution of persistent homology interval lengths as the number of
points goes to infinity. We prove this limiting curve exists in the case of zero-dimensional homology when μ is the uniform distribution over the unit interval, and
This work was completed while Elin Farnell was a research scientist in the Department of
Mathematics at Colorado State University.
H. Adams () · M. Aminian · M. Kirby · J. Mirth · C. Peterson · C. Shonkwiler
Colorado State University, Fort Collins, CO, USA
e-mail:
[email protected];
[email protected];
[email protected];
[email protected];
[email protected];
[email protected]
E. Farnell
Amazon, Seattle, WA, USA
e-mail:
[email protected]
R. Neville
University of Arizona, Tucson, AZ, USA
e-mail:
[email protected]
© Springer Nature Switzerland AG 2020
N. A. Baas et al. (eds.), Topological Data Analysis, Abel Symposia 15,
https://doi.org/10.1007/978-3-030-43408-3_1
conjecture that it exists when μ is the rescaled probability measure for a compact
set in Euclidean space with positive Lebesgue measure.
1 Introduction
Let X be a metric space equipped with a probability measure μ. While fractal
dimensions are most classically defined for a space, there are a variety of fractal
dimension definitions for a measure, including the Hausdorff or packing dimension
of a measure [24, 30, 54]. In this paper we use persistent homology to define a fractal
dimension dim^i_PH(μ) associated to a measure μ for each homological dimension
i ≥ 0. Roughly speaking, dim^i_PH(μ) is determined by how the lengths of the
persistent homology intervals for a random sample, Xn , of n points from X vary
as n tends to infinity.
Our definition should be thought of as a generalization, to higher homological
dimensions, of fractal dimensions related to minimal spanning trees, as studied, for
example, in [63]. Indeed, the lengths of the zero-dimensional (reduced) persistent
homology intervals corresponding to the Vietoris–Rips complex of a sample Xn are
equal to the lengths of the edges in a minimal spanning tree with Xn as the set of
vertices. In particular, if X is a subset of Euclidean space Rm with m ≥ 2, then [63,
Theorem 1] by Steele implies that dim^0_PH(μ) ≤ m, with equality when the absolutely
continuous part of μ has positive mass (Proposition 1). Independent generalizations
of Steele’s work to higher homological dimensions are considered in [26, 61, 62].
To some metric spaces X equipped with a measure μ we are able to assign a finer
invariant that contains more information than just the fractal dimension. Consider
the set of the lengths of all intervals in the i-dimensional persistent homology for
Xn. Experiments suggest that when the probability measure μ is absolutely continuous
with respect to the Lebesgue measure on X ⊆ Rm , the scaled set of interval
lengths in each homological dimension i converges distribution-wise to some fixed
probability distribution (depending on μ and i). This is easy to prove in the simple
case of zero-dimensional homology when μ is the uniform distribution over the unit
interval, in which case we can also derive a formula for the limiting distribution.
Experiments suggest that when μ is the rescaled probability measure corresponding
to a compact set X ⊆ Rm of positive Lebesgue measure, then a limiting rescaled
distribution exists that depends only on m, i, and the volume of μ (see Conjecture 2).
We would be interested to know the formulas for the limiting distributions with
higher Euclidean and homological dimensions.
Whereas Steele in [63] studies minimal spanning trees on random subsets of a
space, Kozma et al. in [42] study minimal spanning trees built on extremal subsets.
Indeed, they define a fractal dimension for a metric space X as the infimum, over
all powers d, such that for any minimal spanning tree T on a finite number of
points in X, the sum of the edge lengths in T each raised to the power d is
bounded. They relate this extremal minimal spanning tree dimension to the box
counting dimension. Their work is generalized to higher homological dimensions by
Schweinhart [60]. By contrast, we instead generalize Steele’s work [63] on measures
to higher homological dimensions. Three differences between [42, 60] and our work
are the following.
• The former references define a fractal dimension for metric spaces, whereas we
define a fractal dimension for measures.
• The fractal dimension in [42, 60] is defined using extremal subsets, whereas we
define our fractal dimension using random subsets.
• We can estimate our fractal dimension computationally using log-log plots as in
Sect. 5, whereas we do not know a computational technique for estimating the
fractal dimensions in [42, 60].
After describing related work in Sect. 2, we give preliminaries on fractal
dimensions and on persistent homology in Sect. 3. We present the definition of
our fractal dimension and prove some basic properties in Sect. 4. We demonstrate
example experimental computations in Sect. 5; our code is publicly available
at https://github.com/CSU-PHdimension/PHdimension. Section 6 describes how
limiting distributions, when they exist, form a finer invariant. Sects. 7 and 8 discuss
the computational details involved in sampling from certain fractals and estimating
asymptotic behavior, respectively. Finally we present our conclusion in Sect. 9. One
of the main goals of this paper is to pose questions and conjectures, which are shared
throughout.
2 Related Work
2.1 Minimal Spanning Trees
The paper [63] studies the total length of a minimal spanning tree for random subsets
of Euclidean space. Let Xn be a random sample of points from a compact subset of
Rd according to some probability distribution. Let Mn be the sum of all the edge
lengths of a minimal spanning tree on vertex set Xn . Then for d ≥ 2, Theorem 1
of [63] says that
Mn ∼ Cn^{(d−1)/d}   as n → ∞,   (1.1)
where the relation ∼ denotes asymptotic convergence, with the ratio of the terms
approaching one in the specified limit. Here, C is a fixed constant depending on d
and on the volume of the absolutely continuous part of the probability distribution.1
There has been a wide variety of related work, including for example [5–7, 38, 64–
67]. See [41] for a version of the central limit theorem in this context. The
papers [51, 52] study the length of the longest edge in the minimal spanning tree
¹ If the compact subset has Hausdorff dimension less than d, then [63] implies C = 0.
for points sampled uniformly at random from the unit square, or from a torus of
dimension at least two. By contrast, [42] studies Euclidean minimal spanning trees
built on extremal finite subsets, as opposed to random subsets.
2.2 Umbrella Theorems for Euclidean Functionals
As Yukich explains in his book [72], there are a wide variety of Euclidean
functionals, such as the length of the minimal spanning tree, the length of the
traveling salesperson tour, and the length of the minimal matching, which all have
scaling asymptotics analogous to (1.1). To prove such results, one needs to show that
the Euclidean functional of interest satisfies translation invariance, subadditivity,
superadditivity, and continuity, as in [21, Page 4]. Superadditivity does not always
hold, for example it does not hold for the minimal spanning tree length functional,
but there is a related “boundary minimal spanning tree functional” that does satisfy
superadditivity. Furthermore, the boundary functional has the same asymptotics as
the original functional, which is enough to prove scaling results. It is intriguing to
ask if these techniques will work for functionals defined using higher-dimensional
homology.
2.3 Random Geometric Graphs
In this paper we consider simplicial complexes (say Vietoris–Rips or Čech) with
randomly sampled points as the vertex set. The 1-skeleta of these simplicial
complexes are random geometric graphs. We recommend the book [50] by Penrose
as an introduction to random geometric graphs; related families of random graphs
are also considered in [53]. Random geometric graphs are often studied when the
scale parameter r(n) is a function of the number of vertices n, with r(n) tending to
zero as n goes to infinity. Instead, in this paper we are more interested in the behavior
over all scale parameters simultaneously. From a slightly different perspective,
the paper [40] studies the expected Euler characteristic of the union of randomly
sampled balls (potentially of varying radii) in the plane.
2.4 Persistent Homology
Vanessa Robins’ thesis [58] contains many related ideas; we describe one such
example here. Given a set X ⊆ Rm and a scale parameter ε ≥ 0, let
Xε = {y ∈ Rm | there exists some x ∈ X with d(y, x) ≤ ε}
denote the ε-offset of X. The ε-offset of X is equivalently the union of all closed
ε balls centered at points in X. Furthermore, let C(Xε ) ∈ N denote the number
of connected components of Xε . In Chapter 5, Robins shows that for a generalized
Cantor set X in R with Lebesgue measure 0, the box-counting dimension of X is
equal to the limit

lim_{ε→0} log(C(Xε))/log(1/ε).
Here Robins considers the entire Cantor set, whereas we study random subsets
thereof.
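For intuition, Robins’ limit can be checked numerically on a finite approximation of the standard Cantor set. The sketch below is our illustration, not code from [58], and the helper names are ours: it represents the level-k Cantor set by the left endpoints of its 2^k intervals, and counts components of the ε-offset by counting the gaps wider than 2ε.

```python
import numpy as np

def cantor_left_endpoints(k):
    """Left endpoints of the 2^k intervals at level k of the Cantor set construction."""
    pts = np.array([0.0])
    for j in range(1, k + 1):
        pts = np.concatenate([pts, pts + 2 / 3**j])
    return np.sort(pts)

def offset_components(pts, interval_len, eps):
    """Components of the eps-offset of the union of intervals [p, p + interval_len],
    p in pts (sorted): a gap between consecutive intervals survives iff wider than 2*eps."""
    gaps = pts[1:] - (pts[:-1] + interval_len)
    return 1 + int(np.sum(gaps > 2 * eps))

k = 10
pts = cantor_left_endpoints(k)
for j in (4, 6, 8):
    eps = 0.49 / 3**j                         # just below half the width of a level-j gap
    C = offset_components(pts, 3.0**-k, eps)  # exactly 2^j components survive
    print(j, C, np.log(C) / np.log(1 / eps))  # ratio tends to log_3(2) ~ 0.6309
```

For ε just below half the width of a level-j removed gap, exactly 2^j components survive, and the ratio log(C(Xε))/log(1/ε) tends to log_3(2), the box-counting dimension of the Cantor set.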
The paper [46], which heavily influenced our work, introduces a fractal dimension defined using persistent homology. This fractal dimension depends on thickenings of the entire metric space X, as opposed to random or extremal subsets
thereof. As a consequence, the computed dimension of some fractal shapes (such
as the Cantor set cross the interval) disagrees significantly with the Hausdorff or
box-counting dimension.
Schweinhart’s paper [60] takes a slightly different approach from ours, considering extremal (as opposed to random) subsets. After fixing a homological dimension
i, Schweinhart assigns a fractal dimension to each metric space X equal to the
infimum over all powers d such that for any finite subset X′ ⊆ X, the sum of the
i-dimensional persistent homology bar lengths for X′ , each raised to the power d, is
bounded. For low-dimensional metric spaces Schweinhart relates this dimension to
the box counting dimension.
More recently, Divol and Polonik [26] obtain generalizations of [63, 72] to higher
homological dimensions in the case when X is a cube. Related results are obtained
in [62] when X is a ball or sphere, and afterwards in [61] when points are sampled
according to an Ahlfors regular measure.
There is a growing literature on the topology of random geometric simplicial
complexes, including in particular the homology of Vietoris–Rips and Čech complexes built on top of random points in Euclidean space [3, 13, 39]. The paper [14]
shows that for n points sampled from the unit cube [0, 1]d with d ≥ 2, the
maximally persistent cycle in dimension 1 ≤ k ≤ d − 1 has persistence of order
Θ((log n/log log n)^{1/k}), where the asymptotic notation big Theta means both big O and big
Omega. The homology of Gaussian random fields is studied in [4], which gives the
expected k-dimensional Betti numbers in the limit as the number of points increases
to infinity, and also in [12]. The paper [29] studies the number of simplices and
critical simplices in the alpha and Delaunay complexes of Euclidean point sets
sampled according to a Poisson process. An open problem about the birth and death
times of the points in a persistence diagram coming from sublevelsets of a Gaussian
random field is stated in Problem 1 of [28]. The paper [18] shows that the expected
persistence diagram, from a wide class of random point clouds, has a density with
respect to the Lebesgue measure.
The paper [15] explores what attributes of an algebraic variety can be estimated
from a random sample, such as the variety’s dimension, degree, number of irreducible components, and defining polynomials; one of their estimates of dimension
is inspired by our work.
In an experiment in [1], persistence diagrams are produced from random
subsets of a variety of synthetic metric space classes. Machine learning tools, with
these persistence diagrams as input, are then used to classify the metric spaces
corresponding to each random subset. The authors obtain high classification rates
between the different metric spaces. It is likely that the discriminating power is
based not only on the underlying homotopy types of the shape classes, but also on
the shapes’ dimensions as detected by persistent homology.
3 Preliminaries
This section contains background material and notation on fractal dimensions and
persistent homology.
3.1 Fractal Dimensions
The concept of fractal dimension was introduced by Hausdorff to describe spaces
like the Cantor set, and it later found extensive application in the study of dynamical
systems. The attracting set of a simple dynamical system is often a submanifold,
with an obvious dimension, but in non-linear and chaotic dynamical systems the
attracting set may not be a manifold. The Cantor set, defined by removing the middle
third from the interval [0, 1], and then recursing on the remaining pieces, is a typical
example. It has the same cardinality as R, but it is nowhere dense, meaning that at no
point does it resemble a line. Typical fractal dimensions assign the Cantor set the value log_3(2).
Intuitively, the Cantor set has “too many” points to have dimension zero, but also
should not have dimension one.
We speak of fractal dimensions in the plural because there are many different
definitions. In particular, fractal dimensions can be divided into two classes, which
have been called “metric” and “probabilistic” [31]. The former describe only the
geometry of a metric space. Two widely-known definitions of this type, which often
agree on well-behaved fractals, but are not in general equal, are the box-counting
and Hausdorff dimensions. For an inviting introduction to fractal dimensions
see [30]. Dimensions of the latter type take into account both the geometry of a
given set and a probability distribution supported on that set—originally the “natural
measure” of the attractor given by the associated dynamical system, but in principle
any probability distribution can be used. The information dimension is the best
known example of this type. For detailed comparisons, see [32]. Our persistent
homology fractal dimension, Definition 6, is of the latter type.
For completeness, we exhibit some of the common definitions of fractal dimension. The primary definition for sets is given by the Hausdorff dimension [33].
Definition 1 Let S be a subset of a metric space X, let d ∈ [0, ∞), and let δ > 0.
The Hausdorff measure of S is
Hd(S) = inf_{δ>0} inf { Σ_{j=1}^∞ diam(Bj)^d | S ⊆ ∪_{j=1}^∞ Bj and diam(Bj) ≤ δ },
where the inner infimum is over all coverings of S by balls Bj of diameter at most
δ. The Hausdorff dimension of S is
dimH(S) = inf{ d | Hd(S) = 0 }.
The Hausdorff dimension of the Cantor set, for example, is log3 (2).
In practice it is difficult to compute the Hausdorff dimension of an arbitrary
set, which has led to a number of alternative fractal dimension definitions in the
literature. These dimensions tend to agree on well-behaved fractals, such as the
Cantor set, but they need not coincide in general. Two worth mentioning are the
box-counting dimension, which is relatively simple to define, and the correlation
dimension.
Definition 2 Let S be a subset of a metric space X, and let Nε denote the infimum of the
number of closed balls of radius ε required to cover S. Then the box-counting
dimension of S is

dimB(S) = lim_{ε→0} log(Nε)/log(1/ε),
provided this limit exists. Replacing the limit with a lim sup gives the upper box-counting dimension, and a lim inf gives the lower box-counting dimension.
The box-counting definition is unchanged if Nε is instead defined by taking the
number of open balls of radius ε, or the number of sets of diameter at most ε, or (for
S a subset of Rn ) the number of cubes of side-length ε [70, Definition 7.8], [30,
Equivalent Definitions 2.1]. It can be shown that dimB (S) ≥ dimH (S). This
inequality can be strict; for example if S = Q ∩ [0, 1] is the set of all rational
numbers between zero and one, then dimH (S) = 0 < 1 = dimB (S) [30, Chapter 3].
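Definition 2 can be checked exactly on the Cantor set using integer arithmetic. In the sketch below (our illustration, with hypothetical helper names), the level-k left endpoints are represented as integer numerators over 3^k, so the grid boxes of side 3^{-j} that meet the set are exactly the 2^j distinct integer quotients by 3^{k−j}, with no floating-point boundary issues.

```python
import numpy as np

def cantor_numerators(k):
    """Integer numerators a such that a / 3^k is a left endpoint
    of one of the 2^k level-k Cantor intervals."""
    nums = np.array([0], dtype=np.int64)
    for j in range(1, k + 1):
        nums = np.concatenate([nums, nums + 2 * 3**(k - j)])
    return nums

k = 12
nums = cantor_numerators(k)               # 2^k Cantor points, scaled by 3^k
for j in (2, 4, 6):
    # a grid box of side 3^-j corresponds to an integer quotient by 3^(k-j)
    N = len(np.unique(nums // 3**(k - j)))      # exactly 2^j occupied boxes
    print(j, N, np.log(N) / np.log(3.0**j))     # ratio equals log_3(2) ~ 0.6309
```

Since N = 2^j at every scale 3^{-j}, the ratio log(N)/log(3^j) equals log_3(2) for each j, matching the known box-counting dimension.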
In Sect. 4 we introduce a fractal dimension based on persistent homology which
shares key similarities with the Hausdorff and box-counting dimensions. It can also
be easily estimated via log-log plots, and it is defined for arbitrary metric spaces
(though our examples will tend to be subsets of Euclidean space). A key difference,
however, will be that ours is a fractal dimension for measures, rather than for
subsets.
There are a variety of classical notions of a fractal dimension for a measure,
including the Hausdorff, packing, and correlation dimensions of a measure [24, 30,
54]. We give the definitions of two of these.
Definition 3 ((13.16) of [30]) The Hausdorff dimension of a measure μ with total
mass one is defined as
dimH (μ) = inf{dimH (S) | S is a Borel subset with μ(S) > 0}.
We have dimH (μ) ≤ dimH (supp(μ)), and it is possible for this inequality to be
strict [30, Exercise 3.10].2 We also give the example of the correlation dimension of
a measure.
Definition 4 Let X be a subset of Rm equipped with a measure μ, and let Xn be
a random sample of n points from X. Let θ : R → R denote the Heaviside step
function, meaning θ (x) = 0 for x < 0 and θ (x) = 1 for x ≥ 0. The correlation
integral of μ is defined (for example in [35, 69]) to be
C(r) = lim_{n→∞} (1/n²) Σ_{x,x′∈Xn, x≠x′} θ(r − ‖x − x′‖).

It can be shown that C(r) ∝ r^ν, and the exponent ν is defined to be the correlation
dimension of μ.
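The correlation integral lends itself to direct estimation. The sketch below (our illustration; the constant in front of the pair count does not affect the slope) samples from the uniform measure on the unit square and reads off ν as the slope of log C(r) against log r.

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
X = rng.random((2000, 2))              # n points from the uniform measure on [0,1]^2

d = pdist(X)                           # all pairwise distances (each unordered pair once)
radii = np.logspace(-1.5, -0.7, 8)     # small radii, where C(r) ~ r^nu
C = np.array([np.mean(d <= r) for r in radii])  # empirical correlation integral

# the slope of log C(r) against log r estimates the correlation dimension nu
nu = np.polyfit(np.log(radii), np.log(C), 1)[0]
print(nu)                              # close to 2 for this absolutely continuous planar measure
```

For an absolutely continuous measure on a planar region the estimate is close to ν = 2, with boundary effects pulling it slightly below 2 at the larger radii.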
In [35, 36] it is shown that the correlation dimension gives a lower bound on
the Hausdorff dimension of a measure. The correlation dimension can be easily
estimated from a log-log plot, similar to the methods we use in Sect. 5. A different
definition of the correlation dimension is given and studied in [23, 47]. The
correlation dimension is a particular example of the family of Rényi dimensions,
which also includes the information dimension as a particular case [56, 57]. A
collection of possible axioms that one might like to have such a fractal dimension
satisfy is given in [47].
3.2 Persistent Homology
The field of applied and computational topology has grown rapidly in recent years,
with the topic of persistent homology gaining particular prominence. Persistent
homology has enjoyed a wealth of meaningful applications to areas such as image
analysis, chemistry, natural language processing, and neuroscience, to name just a
² See also [31] for an example of a measure whose information dimension is less than the Hausdorff
dimension of its support.
few examples [2, 10, 20, 25, 44, 45, 71, 73]. The strength of persistent homology
lies in its ability to characterize important features in data across multiple scales.
Roughly speaking, homology provides the ability to count the number of independent k-dimensional holes in a space, and persistent homology provides a means
of tracking such features as the scale increases. We provide a brief introduction
to persistent homology in this preliminaries section, but we point the interested
reader to [8, 27, 37] for thorough introductions to homology, and to [16, 22, 34]
for excellent expository articles on persistent homology.
Geometric complexes, which are at the heart of the work in this paper, associate
to a set of data points a simplicial complex—a combinatorial space that serves as a
model for an underlying topological space from which the data has been sampled.
The building blocks of simplicial complexes are called simplices, which include
vertices as 0-simplices, edges as 1-simplices, triangles as 2-simplices, tetrahedra as
3-simplices, and their higher-dimensional analogues as k-simplices for larger values
of k. An important example of a simplicial complex is the Vietoris–Rips complex.
Definition 5 Let X be a set of points in a metric space and let r ≥ 0 be a scale
parameter. We define the Vietoris–Rips simplicial complex VR(X; r) to have as its
k-simplices those collections of k + 1 points in X that have diameter at most r.
In constructing the Vietoris–Rips simplicial complex we translate our collection of
points in X into a higher-dimensional complex that models topological features of
the data. See Fig. 1 for an example of a Vietoris–Rips complex constructed from a
set of data points, and see [27] for an extended discussion.
It is readily observed that for various data sets, there is not necessarily an ideal
choice of the scale parameter so that the associated Vietoris–Rips complex captures
the desired features in the data. The perspective behind persistence is to instead
allow the scale parameter to increase and to observe the corresponding appearance
and disappearance of topological features. To be more precise, each hole appears
at a certain scale and disappears at a larger scale. Those holes that persist across a
wide range of scales often reflect topological features in the shape underlying the
data, whereas the holes that do not persist for long are often considered to be noise.
Fig. 1 An example of a set of data points in Rm with an associated Vietoris–Rips complex at a
fixed scale
However, in the context of this paper (estimating fractal dimensions), the holes that
do not persist are perhaps better described as measuring the local geometry present
in a random finite sample.
For a fixed set of points, we note that as scale increases, simplices can only be
added and cannot be removed. Thus, for r0 < r1 < r2 < · · · , we obtain a filtration
of Vietoris–Rips complexes
VR(X; r0 ) ⊆ VR(X; r1 ) ⊆ VR(X; r2 ) ⊆ · · · .
The associated inclusion maps induce linear maps between the corresponding
homology groups Hk (VR(X; ri )), which are algebraic structures whose ranks count
the number of independent k-dimensional holes in the Vietoris–Rips complex. A
technical remark is that homology depends on the choice of a group of coefficients;
it is simplest to use field coefficients (for example R, Q, or Z/pZ for p prime), in
which case the homology groups are furthermore vector spaces. The corresponding
collection of vector spaces and linear maps is called a persistent homology module.
A useful tool for visualizing and extracting meaning from persistent homology
is a barcode. The basic idea is that each generator of persistent homology can be
represented by an interval, whose start and end times are the birth and death scales
of a homological feature in the data. These intervals can be arranged as a barcode
graph in which the x-axis corresponds to the scale parameter. See Fig. 2 for an
example. If Y is a finite metric space, then we let PHi (Y ) denote the corresponding
collection of i-dimensional persistent homology intervals.
Fig. 2 An example of Vietoris–Rips complexes at increasing scales, along with associated
persistent homology intervals. The zero-dimensional persistent homology intervals show how 21
connected components merge into a single connected component as the scale increases. The one-dimensional persistent homology intervals show two one-dimensional holes, one short-lived and
the other long-lived
Zero-dimensional barcodes always produce one infinite interval, as in Fig. 2,
which is problematic for our purposes. Therefore, in the remainder of this paper
we will always use reduced homology, which has the effect of simply eliminating
the infinite interval from the zero-dimensional barcode while leaving everything
else unchanged. As a consequence, there will never be any infinite intervals in the
persistent homology of a Vietoris–Rips simplicial complex, even in homological
dimension zero.
Remark 1 It is well-known (see for example [58]) and easy to verify that for any
finite metric space X, the lengths of the zero-dimensional (reduced) persistent
homology intervals of the Vietoris–Rips complex of X correspond exactly to the
lengths of the edges in a minimal spanning tree with vertex set X.
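Remark 1 can be verified numerically: compute the minimal spanning tree edge lengths with an off-the-shelf routine, and independently compute the reduced zero-dimensional death times by a Kruskal-style union-find sweep over pairs in order of increasing distance (each merge of two components kills one class at that distance). A sketch, with hypothetical helper names:

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def h0_death_times(D):
    """Death times of reduced zero-dimensional persistence for a distance matrix D:
    sweep pairs by increasing distance; each union of two components records a death."""
    n = D.shape[0]
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i
    deaths = []
    for d, i, j in sorted((D[i, j], i, j) for i in range(n) for j in range(i + 1, n)):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(d)
    return sorted(deaths)

rng = np.random.default_rng(1)
X = rng.random((50, 2))
D = squareform(pdist(X))

mst = minimum_spanning_tree(D)            # MST of the complete distance graph
mst_lengths = sorted(mst.data)            # its n - 1 edge lengths
assert np.allclose(mst_lengths, h0_death_times(D))  # the multisets agree (Remark 1)
print(sum(mst_lengths))                   # this sum is L0(Xn), used in Sect. 4
```

Both routines construct a minimal spanning tree of the same complete graph, and the sorted multiset of MST edge weights is the same for every minimal spanning tree, so the agreement is exact.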
4 Definition of the Persistent Homology Fractal Dimension
for Measures
Let X be a metric space equipped with a probability measure μ, and let Xn ⊆ X
be a random sample of n points from X distributed independently and identically
according to μ. Build a filtered simplicial complex K on top of vertex set Xn ,
for example a Vietoris–Rips complex VR(Xn; r) (Definition 5), an intrinsic Čech
complex Č(Xn, X; r), or an ambient Čech complex Č(Xn, Rm; r) if X is a subset of
Rm [17]. Denote the i-dimensional persistent homology of this filtered simplicial
complex by PHi (Xn ). This persistent homology barcode decomposes as a direct
sum of interval summands; we let Li (Xn ) be the sum of the lengths of the intervals
in PHi (Xn ). In the case of homological dimension zero, the sum L0 (Xn ) is simply
the sum of all the edge lengths in a minimal spanning tree with Xn as its vertex set
(since we are using reduced homology).
Definition 6 (Persistent Homology Fractal Dimension) Let X be a metric space
equipped with a probability measure μ, let Xn ⊆ X be a random sample of n
points from X distributed according to μ, and let Li (Xn ) be the sum of the lengths
of the intervals in the i-dimensional persistent homology for Xn. We define the i-dimensional persistent homology fractal dimension of μ to be
dim^i_PH(μ) = inf_{d>0} { d | there exists a constant C(i, μ, d) such that Li(Xn) ≤ Cn^{(d−1)/d} with probability one as n → ∞ }.

The constant C can depend on i, μ, and d. Here “Li(Xn) ≤ Cn^{(d−1)/d} with
probability one as n → ∞” means that we have lim_{n→∞} P[Li(Xn) ≤ Cn^{(d−1)/d}] =
1. This dimension may depend on the choices of filtered simplicial complex (say
Vietoris–Rips or Čech), and on the choice of field coefficients for homology
computations; for now those choices are suppressed from the definition.
Proposition 1 Let μ be a measure on X ⊆ Rm with m ≥ 2. Then dim^0_PH(μ) ≤ m,
with equality if the absolutely continuous part of μ has positive mass.
Proof By Theorem 2 of [63], we have that lim_{n→∞} n^{−(m−1)/m} L0(Xn) =
c ∫_{Rm} f(x)^{(m−1)/m} dx, where c is a constant depending on m, and where f is
the absolutely continuous part of μ. To see that dim^0_PH(μ) ≤ m, note that

L0(Xn) ≤ ( c ∫_{Rm} f(x)^{(m−1)/m} dx + ε ) n^{(m−1)/m}

with probability one as n → ∞ for any ε > 0. ⊓⊔
We conjecture that the i-dimensional persistent homology of compact subsets of
Rm has the same scaling properties as the functionals in [63, 72].
Conjecture 1 Let μ be a probability measure on a compact set X ⊆ Rm with m ≥ 2,
and let μ be absolutely continuous with respect to the Lebesgue measure. Then for
all 0 ≤ i < m, there is a constant C ≥ 0 (depending on μ, m, and i) such that
Li(Xn) ∼ Cn^{(m−1)/m} with probability one as n → ∞.
Let μ be a probability measure with compact support that is absolutely continuous with respect to Lebesgue measure in Rm for m ≥ 2. Note that Conjecture 1
would imply that the persistent homology fractal dimension of μ is equal to m.
The tools of subadditivity and superadditivity behind the umbrella theorems for
Euclidean functionals, as described in [72] and Sect. 2.2, may be helpful towards
proving this conjecture. In some limited cases, for example when X is a cube or ball,
or when μ is Ahlfors regular, then Conjecture 1 is closely related to [26, 61, 62].
One could alternatively define birth-time or death-time fractal dimensions by
replacing Li (Xn ) with the sum of the birth times, or alternatively the sum of the
death times, in the persistent homology barcodes PHi (Xn ).
5 Experiments
A feature of Definition 6 is that we can use it to estimate the persistent homology
fractal dimension of a measure μ. Indeed, suppose we can sample from X according
to the probability distribution μ. We can therefore sample collections of points Xn
of size n, compute the statistic Li (Xn ), and then plot the results in a log-log fashion
as n increases. In the limit as n goes to infinity, we expect the plotted points to
be well-modeled by a line of slope (d − 1)/d, where d is the i-dimensional persistent
homology fractal dimension of μ. In many of the experiments in this section, the
measures μ are simple enough (or self-similar enough) that we would expect the
persistent homology fractal dimension of μ to be equal to the Hausdorff dimension
of μ.
In our computational experiments, we have used the persistent homology
software packages Ripser [9], Javaplex [68], and code from Duke (see the acknowledgements). For the case of zero-dimensional homology, we can alternatively use
well-known algorithms for computing minimal spanning trees, such as Kruskal’s
algorithm or Prim’s algorithm [43, 55]. We estimate the slope of our log-log plots (of
Li (Xn ) as a function of n) using both a line of best fit, and alternatively a technique
designed to approximate the asymptotic scaling described in Sect. 8. Our code is
publicly available at https://github.com/CSU-PHdimension/PHdimension.
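The slope-estimation procedure just described can be sketched for dim^0_PH of the uniform measure on the unit square, where the expected log-log slope is (d − 1)/d = 1/2. This is our minimal illustration using a minimal spanning tree for L0, not the code from the repository above.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def L0(X):
    """Sum of MST edge lengths, i.e. the total reduced zero-dimensional
    persistent homology bar length for the Vietoris-Rips filtration on X."""
    return minimum_spanning_tree(squareform(pdist(X))).sum()

rng = np.random.default_rng(0)
ns = [100, 200, 400, 800, 1600]
lengths = [L0(rng.random((n, 2))) for n in ns]   # uniform samples from the unit square

# slope of the log-log plot of L0(Xn) against n
slope = np.polyfit(np.log(ns), np.log(lengths), 1)[0]
print(slope)   # expect roughly (d - 1)/d = 1/2 for d = 2
```

Solving (d − 1)/d = slope for d then gives the estimated zero-dimensional persistent homology fractal dimension.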
5.1 Estimates of Persistent Homology Fractal Dimensions
We display several experimental results, for shapes of both integral and non-integral
fractal dimension. In Fig. 3, we show the log-log plots of Li (Xn ) as a function of n,
where Xn is sampled uniformly at random from a disk, a square, and an equilateral
triangle, each of unit area in the plane R2 . Each of these spaces constitutes a
manifold of dimension two, and we thus expect these shapes to have persistent
homology fractal dimension d = 2 as well. Experimentally, this appears to be the
case, both for homological dimensions i = 0 and i = 1. Indeed, our asymptotically
estimated slopes lie in the range 0.49–0.54, which is fairly close to the expected
slope of (d − 1)/d = 1/2.
In Fig. 4 we perform a similar experiment for the cube in R3 of unit volume. We
expect the cube to have persistent homology fractal dimension d = 3, corresponding
to a slope in the log-log plot of (d − 1)/d = 2/3. This appears to be the case for homological
dimension i = 0, where the slope is approximately 0.65. However, for i = 1 and
i = 2, our estimated slope is far from 2/3, perhaps because our computational limits
do not allow us to take n, the number of randomly chosen points, to be sufficiently
large.
In Fig. 5 we use log-log plots to estimate some persistent homology fractal
dimensions of the Cantor set cross the interval (expected dimension d = 1 +
log3 (2)), of the Sierpiński triangle (expected dimension d = log2 (3)), of Cantor
dust in R2 (expected dimension d = log3 (4)), and of Cantor dust in R3 (expected
dimension d = log3 (8)). As noted in Sect. 3, various notions of fractal dimension
tend to agree for well-behaved fractals. Thus, in each case above, we provide the
Hausdorff dimension d in order to define an expected persistent homology fractal
dimension. The Hausdorff dimension is well-known for the Sierpiński triangle,
Cantor dust in R2 , and Cantor dust in R3 . The Hausdorff dimension for the Cantor
set cross the interval can be shown to be 1 + log3 (2), which follows from [30, Theorem 9.3] or [48, Theorem III]. In Sect. 5.2 we define these fractal shapes in
detail, and we also explain our computational technique for sampling points from
them at random.
Summarizing the experimental results for self-similar fractals, we find reasonably good estimates of fractal dimension for homological dimension i = 0. More
Fig. 3 Log scale plots and slope estimates of the number n of sampled points versus L0 (Xn ) (left) or L1 (Xn ) (right). Subsets Xn are drawn uniformly at random from (top) the unit disc in R2 , (middle) the unit square, and (bottom) the unit triangle. All cases have slope estimates close to 1/2, which is consistent with the expected dimension. The asymptotic scaling estimates of the slope are computed as described in Sect. 8. (Linear fit/asymptotic slope estimates: disk 0.4942/0.49998 for PH0 and 0.58686/0.4925 for PH1 ; square 0.49392/0.49249 and 0.5943/0.53521; triangle 0.49133/0.48066 and 0.5919/0.49755)
Fig. 4 Log scale plots of the number n of sampled points from the cube versus L0 (Xn ) (left), L1 (Xn ) (right), and L2 (Xn ) (bottom). The dimension estimate from zero-dimensional persistent homology is reasonably good, while the one- and two-dimensional cases are less accurate, likely due to computational limitations. (Linear fit slopes: 0.65397 for PH0 , 0.85188 for PH1 , 1.0526 for PH2 )
specifically, for the Cantor set cross the interval, we expect (d − 1)/d ≈ 0.3869, and we find slope estimates from a linear fit of all data and an asymptotic fit to be 0.3799 and 0.36488, respectively. In the case of the Sierpiński triangle, the estimate is quite good: we expect (d − 1)/d ≈ 0.3691, and the slope estimates from both a linear fit and an asymptotic fit are approximately 0.37. Similarly, the estimates for Cantor dust in R2 and R3 are close to the expected values: (1) For Cantor dust in R2 , we expect (d − 1)/d ≈ 0.2075 and estimate (d − 1)/d ≈ 0.25. (2) For Cantor dust in R3 , we expect (d − 1)/d ≈ 0.4717 and estimate (d − 1)/d ≈ 0.49. For i > 0 many of these estimates of the persistent homology fractal dimension are not close to the expected (Hausdorff) dimensions, perhaps because the number of points n is not large enough. The experiments in R2 are related to [61, Corollary 1], although our experiments are with the Vietoris–Rips complex instead of the Čech complex.
It is worth commenting on the Cantor set, which is a self-similar fractal in R.
Even though the Hausdorff dimension of the Cantor set is log3 (2), it is not hard to
Fig. 5 (Top) Cantor set cross the unit interval for i = 0, 1. (Second row) Sierpiński triangle in R2 for i = 0, 1. (Third row) Cantor dust in R2 for i = 0, 1. (Bottom) Cantor dust in R3 for i = 0, 1, 2. In each case, the zero-dimensional estimate is close to the expected dimension. The higher-dimensional estimates are not as accurate; we speculate that this is due to computational limitations
see that the zero-dimensional persistent homology fractal dimension of the Cantor
set is 1. This is because as n → ∞ a random sample of points from the Cantor set
will contain points in R arbitrarily close to 0 and to 1, and hence L0 (Xn ) → 1 as
n → ∞. This is not surprising—we do not necessarily expect to be able to detect a
fractional dimension less than one by using minimal spanning trees (which are one-dimensional graphs). For this reason, if a measure μ is defined on a subset of Rm , we sometimes restrict attention to the case m ≥ 2. See Fig. 6 for our experimental computations on the Cantor set.
Fig. 6 Log scale plot of the number n of sampled points from the Cantor set versus L0 (Xn ). Note that L0 (Xn ) approaches one, as expected
Finally, we include one example with data drawn from a two-dimensional
manifold in R3 . We sample points from a torus with major radius 5 and minor
radius 3. We expect the persistent homology fractal dimensions to be 2, and this is supported by the experimental evidence for zero-dimensional homology shown in Fig. 7.
5.2 Randomly Sampling from Self-Similar Fractals
The Cantor set C = ∩_{l=0}^∞ Cl is a countable intersection of nested sets C0 ⊇ C1 ⊇ C2 ⊇ · · · , where the set Cl at level l is a union of 2^l closed intervals, each of length 1/3^l. More precisely, C0 = [0, 1] is the closed unit interval, and Cl is defined recursively via

Cl = Cl−1 /3 ∪ (2/3 + Cl−1 /3)    for l ≥ 1.
In our experiment for the Cantor set (Fig. 6), we do not sample from the Cantor distribution on the entire Cantor set C, but instead from the left endpoints of level Cl of the Cantor set, where l is chosen to be very large (we use l = 100,000). More precisely, in order to sample points, we choose a binary sequence {ai }_{i=1}^l uniformly at random, meaning that each term ai is equal to either 0 or 1 with probability 1/2, and furthermore the value ai is independent from the value of aj for i ≠ j . The
Fig. 7 Log scale plot of the number n of sampled points from a torus with major radius 5 and minor radius 3 versus L0 (Xn ). Estimated lines of best fit from L0 (Xn ) have slope approximately equal to 1/2, suggesting a dimension estimate of d = 2. We restrict to zero-dimensional homology in this setting due to computational limitations. (Linear fit 0.50165, asymptotic estimate 0.50034)
corresponding random point in the Cantor set is ∑_{i=1}^l 2ai /3^i . Note that this point is in
C and furthermore is the left endpoint of some interval in Cl . So we are selecting
left endpoints of intervals in Cl uniformly at random, but since l is large this is a
good approximation to sampling from the entire Cantor set according to the Cantor
distribution.
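The sampling scheme just described can be sketched as follows; this is a simplified version in which the truncation level l is far smaller than the paper's 100,000, since increments below roughly 3^{−40} are lost to double precision anyway:

```python
import random

def sample_cantor_point(l=40, rng=random):
    """Draw binary digits a_1, ..., a_l uniformly and return
    sum_{i=1}^{l} 2 a_i / 3^i, the left endpoint of an interval of C_l.
    (The paper truncates at l = 100,000; terms beyond roughly l = 40
    fall below double precision, so a smaller l is used here.)"""
    x, scale = 0.0, 1.0
    for _ in range(l):
        scale /= 3.0
        x += 2 * rng.randint(0, 1) * scale
    return x

random.seed(0)
points = [sample_cantor_point() for _ in range(5)]
print(points)  # every point lies in C_l, hence avoids the middle third (1/3, 2/3)
```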
We use a similar procedure to sample at random for our experiments on the
Cantor set cross the interval, on Cantor dust in R2 , on Cantor dust in R3 , and on the
Sierpiński triangle (Fig. 5). The Cantor set cross the interval is C × [0, 1] ⊆ R2 ,
equipped with the Euclidean metric. We computationally sample by choosing a
point from Cl as described in the paragraph above for l = 100,000, and by also
sampling a point from the unit interval [0, 1] uniformly at random. Cantor dust
is the subset C × C of R2 , which we sample by choosing two points from Cl as
described previously. The same procedure is done for the Cantor dust C × C × C in
R3 . The Sierpiński triangle S ⊆ R2 is defined in a similar way to the Cantor set, with S = ∩_{l=0}^∞ Sl a countable intersection of nested sets S0 ⊇ S1 ⊇ S2 ⊇ · · · . Here each Sl is a union of 3^l triangles. We choose l = 100,000 to be large, and then sample points uniformly at random from the bottom left endpoints of the triangles in Sl . More precisely, we choose a ternary sequence {ai }_{i=1}^l uniformly at random, meaning that each term ai is equal to either 0, 1, or 2 with probability 1/3. The corresponding
random point in the Sierpiński triangle is ∑_{i=1}^l vi /2^i ∈ R2 , where vector vi is given by

vi = (0, 0)T if ai = 0;  (1, 0)T if ai = 1;  (1/2, √3/2)T if ai = 2.

Note this point is in S and furthermore is the bottom left endpoint of some triangle in Sl .
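A sketch of this sampler, under the same caveat that the truncation level l is reduced from the paper's 100,000 because double precision cannot resolve deeper levels:

```python
import math
import random

# Corners v_0, v_1, v_2 of the base triangle S_0.
CORNERS = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]

def sample_sierpinski_point(l=40, rng=random):
    """Draw ternary digits a_1, ..., a_l uniformly and return
    sum_{i=1}^{l} v_{a_i} / 2^i, the bottom left corner of a
    triangle in S_l."""
    x = y = 0.0
    scale = 1.0
    for _ in range(l):
        scale /= 2.0
        vx, vy = CORNERS[rng.randrange(3)]
        x += scale * vx
        y += scale * vy
    return (x, y)

random.seed(0)
print(sample_sierpinski_point())  # a point of the Sierpinski triangle
```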
6 Limiting Distributions
To some metric measure spaces, (X, μ), we are able to assign a finer invariant
that contains more information than just the persistent homology fractal dimension.
Consider the set of the lengths of all intervals in PHi (Xn ), for each homological
dimension i. Experiments suggest that for some X ⊆ Rm , the scaled set of interval
lengths in each homological dimension converges distribution-wise to some fixed
probability distribution which depends on μ and on i.
More precisely, for a fixed probability measure μ, let Fn(i) be the cumulative
distribution function of the i-dimensional persistent homology interval lengths in
PHi (Xn ), where Xn is a sample of n points from X drawn in an i.i.d. fashion
according to μ. If μ is absolutely continuous with respect to the Lebesgue measure
on some compact set, then the function Fn(i) (t) converges pointwise to the Heaviside
step function as n → ∞, since the fraction of interval lengths less than any fixed
ε > 0 is converging to one as n → ∞. More interestingly, for μ a sufficiently nice
measure on X ⊆ Rm , the rescaled cumulative distribution function Fn(i) (n−1/m t)
may converge to a non-constant curve. A back-of-the-envelope motivation for
this rescaling is that if Li (Xn ) = Cn(m−1)/m with probability one as n → ∞ (Conjecture 1), then the average length of a persistent homology interval is

Li (Xn ) / (# intervals) = Cn(m−1)/m / (# intervals),

which is proportional to n−1/m if the number of intervals is proportional to n. We
make this precise in the following conjectures.
Conjecture 2 Let μ be a probability measure on a compact set X ⊆ Rm , and let μ
be absolutely continuous with respect to the Lebesgue measure. Then the limiting
distribution F (i) (t) = limn→∞ Fn(i) (n−1/m t), which depends on μ and i, exists.
In Sect. 6.1 we show that Conjecture 2 holds when μ is the uniform distribution
on an interval, and in Sect. 6.2 we perform experiments in higher dimensions.
Question 1 Assuming Conjecture 2 is true, what is the limiting rescaled distribution when μ is the uniform distribution on an m-dimensional ball, or alternatively an m-dimensional cube?
Conjecture 3 Let the compact set X ⊆ Rm have positive Lebesgue measure, and let
μ be the corresponding probability measure (i.e., μ is the restriction of the Lebesgue
measure to X, rescaled to have mass one). Then the limiting distribution F (i) (t) = limn→∞ Fn(i) (n−1/m t) exists and depends only on m, i, and the volume of X.
Question 2 Assuming Conjecture 3 is true, what is the limiting rescaled distribution when X has unit volume?
Remark 2 Conjecture 3 is false if μ is not a uniform measure (i.e. a rescaled
Lebesgue measure). Indeed, the uniform measure on a square (experimentally) has
a different limiting rescaled distribution than a (nonconstant) beta distribution on
the same unit square, as seen in Fig. 8.
6.1 The Uniform Distribution on the Interval
When μ is the uniform distribution on the unit interval [0, 1], Conjecture 2 is known to be true, and furthermore a formula for the limiting rescaled distribution is known. If Xn is a subset of [0, 1] drawn uniformly at random, then
(with probability one) the points in Xn divide [0, 1] into n + 1 pieces. The joint
probability distribution function for the lengths of these pieces is given by the flat
Dirichlet distribution, which can be thought of as the uniform distribution on the n-simplex (the set of all (t0 , . . . , tn ) with ti ≥ 0 for all i, such that ∑_{i=0}^n ti = 1). Note
that the intervals in PH0 (Xn ) have lengths t1 , . . . , tn−1 , omitting t0 and tn which
correspond to the two subintervals on the boundary of the interval.
The probability distribution function of each ti , and therefore of each interval
length in PH0 (Xn ), is the marginal of the Dirichlet distribution, which is given by
the Beta distribution B(1, n) [11]. After simplifying, the cumulative distribution
function of B(1, n) is given by [59]
Fn(0) (t) = B(t; 1, n)/B(1, n) = (∫_0^t s^0 (1 − s)^{n−1} ds) / (Γ(1)Γ(n)/Γ(n + 1)) = 1 − (1 − t)^n .
Fig. 8 Empirical CDF’s for the H0 and H1 interval lengths computed from 10,000 points sampled from the unit square according to the uniform distribution and beta distribution with shape and size parameter both set to 2. The limiting distributions appear to be different
As n goes to infinity, Fn(0) (t) converges pointwise to the constant function 1. However, after rescaling, Fn(0) (n−1 t) converges to a more interesting distribution independent of n. Indeed, we have Fn(0) (t/n) = 1 − (1 − t/n)^n , and the limit as n → ∞ is

lim_{n→∞} Fn(0) (t/n) = 1 − e^{−t} .

This is the cumulative distribution function of the exponential distribution with rate parameter one. Therefore, the rescaled interval lengths in the limit as n → ∞ are distributed according to the exponential distribution Exp(1).
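This limit is easy to check numerically: the PH0 interval lengths for n uniform points on [0, 1] are the interior gaps between consecutive order statistics, and after rescaling by n their empirical CDF should approach 1 − e^{−t}. A sketch (the sample size is an illustrative choice):

```python
import math
import random

def rescaled_gap_lengths(n, rng=random):
    """PH_0 interval lengths for n uniform points on [0, 1] are the n - 1
    interior gaps t_1, ..., t_{n-1}; rescale each by n."""
    xs = sorted(rng.random() for _ in range(n))
    return [n * (b - a) for a, b in zip(xs, xs[1:])]

def empirical_cdf(sample, t):
    """Fraction of the sample that is at most t."""
    return sum(1 for s in sample if s <= t) / len(sample)

random.seed(1)
gaps = rescaled_gap_lengths(20000)
for t in (0.5, 1.0, 2.0):
    # empirical CDF versus the Exp(1) CDF 1 - exp(-t)
    print(t, empirical_cdf(gaps, t), 1 - math.exp(-t))
```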
6.2 Experimental Evidence for Conjecture 2 in the Plane
We now move to the case where μ is the uniform distribution on the unit square in
R2 . It is known that the sum of the edge lengths of the minimal spanning tree, given by L0 (Xn ) where Xn is a random sample of n points from the unit square, converges as n → ∞ to Cn^{1/2} , for a constant C [63]. However, to our knowledge the limiting
Fig. 9 Empirical CDF’s for H0 interval lengths, H1 birth times, H1 death times, and H1 interval lengths computed from an increasing number of n points drawn uniformly from the two-dimensional unit square, and rescaled by n^{1/2}
distribution of all (rescaled) edge lengths is not known. We instead analyze this
example empirically. The experiments in Fig. 9 suggest that as n increases, it is
plausible that both Fn(0) (n−1/2 t) and Fn(1) (n−1/2 t) converge in distribution to a
limiting probability distribution.
6.3 Examples where a Limiting Distribution Does Not Exist
In this section we give experimental evidence that the absolute continuity assumption in Conjecture 2 is necessary. Our example computation is done on a separated Sierpiński triangle.
For a given separation value δ ≥ 0, the separated Sierpiński triangle can be defined as the set of all points in R2 of the form ∑_{i=1}^∞ vi /(2 + δ)^i , where each vector vi ∈ R2 is either (0, 0), (1, 0), or (1/2, √3/2). The Hausdorff dimension of this self-similar fractal shape is log2+δ (3) ([30, Theorem 9.3] or [48, Theorem III]), and note that when δ = 0, we recover the standard (non-separated) Sierpiński triangle. See Fig. 10 for a picture when δ = 2. Computationally, when we sample a point from the separated Sierpiński triangle, we sample a point of the form ∑_{i=1}^l vi /(2 + δ)^i , where in our experiments we use l = 100,000.
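A sketch of this sampler (again with a truncation level far below 100,000, since double precision cannot resolve deeper levels):

```python
import math
import random

# Corners of the base triangle; delta = 0 recovers the ordinary sampler.
CORNERS = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]

def sample_separated_sierpinski(delta=2.0, l=40, rng=random):
    """Point of the form sum_{i=1}^{l} v_{a_i} / (2 + delta)^i with a_i
    uniform in {0, 1, 2}.  (The paper truncates at l = 100,000; l = 40
    already exhausts double precision.)"""
    x = y = 0.0
    scale = 1.0
    for _ in range(l):
        scale /= 2.0 + delta
        vx, vy = CORNERS[rng.randrange(3)]
        x += scale * vx
        y += scale * vy
    return (x, y)

random.seed(0)
print(sample_separated_sierpinski())  # a point of the delta = 2 fractal
```

For δ = 2 the geometric factor is 1/4, so all sampled points land in a copy of the fractal scaled into [0, 1/3] × [0, √3/6].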
Fig. 10 Plot of 20,000 points sampled at random from the Sierpiński triangle of separation δ = 2
In the following experiment we sample random points from the separated Sierpiński triangle with δ = 2. As the number of random points n goes to infinity, it appears that the rescaled³ CDFs of H0 interval lengths are not converging to a fixed probability distribution, but instead to a periodic family of distributions, in the following sense. If you fix k ∈ N then the distributions on n = k, 3k, 9k, 27k, . . . , 3^j k, . . . points appear to converge as j → ∞ to a fixed distribution. Indeed, see Fig. 11 for the limiting distribution on 3^j points, and for the limiting distribution on 3^j · 2 points. However, the limiting distribution for 3^j k points and the limiting distribution for 3^j k′ points appear to be the same if and only if k and k′ differ by a power of 3. See Fig. 12, which shows four snapshots from one full periodic orbit.
Here is an intuitively plausible explanation for why the rescaled CDFs for the
separated Sierpiński triangle converge to a periodic family of distributions, rather
than a fixed distribution: Imagine focusing a camera at the origin of the Sierpiński
triangle and zooming in. Once you get to (2 + δ)× magnification, you see the same
image again. This is one full period. However, for magnifications between 1× and
(2 + δ)× you see a different image. In our experiments sampling random points,
zooming in by a factor of (2+δ)× is the same thing as sampling three times as many
points (indeed, the Hausdorff dimension is log2+δ (3)). When zooming in you see the
same image only when the magnification is at a multiple of 2 + δ, and analogously
when sampling random points perhaps we should expect to see the same probability
³ Since the separated Sierpiński triangle has Hausdorff dimension log2+δ (3), the rescaled distributions we plot are Fn(0) (n−1/m t) with m = log2+δ (3).
Fig. 11 This figure shows the empirical rescaled CDFs of H0 interval lengths for n = 3^j points (left) and for n = 3^j · 2 points (right) sampled from the separated Sierpiński triangle with δ = 2. Each figure appears to converge to a fixed limiting distribution as j → ∞, but the two limiting distributions are not equal
Fig. 12 Empirical rescaled CDF’s for H0 interval lengths and H1 interval lengths computed from an increasing number of n = k · 3^6 points (k = 1, 1.25, 1.5, . . . , 3) from the separated Sierpiński triangle with δ = 2, moving left to right. Note that as k increases between adjacent powers of three, the “bumps” in the distribution shift to the right, until the starting distribution reappears
distribution of interval lengths only when the number of points is multiplied by a
power of 3.
7 Another Way to Randomly Sample from the Sierpiński
Triangle
An alternate approach to constructing a sequence of measures converging to the Sierpiński triangle is to use a particular Lindenmayer system, which generates a sequence of instructions in a recursive fashion [49, Figure 7.16]. Halting the
recursion at any particular level l will give a (non-fractal) approximation to the
Sierpiński triangle as a piecewise linear curve with a finite number of segments; see
Fig. 13.
Fig. 13 The Sierpiński triangle as the limit of a sequence of curves. We can uniformly randomly
sample from the curve at level l to generate a sequence of measures μl converging to the Sierpinski
triangle measure as l → ∞
Fig. 14 Scaling behaviors for various “depths” of the Sierpinski arrowhead curves visualized in
Fig. 13
Let μl be the uniform measure on the piecewise linear curve at level l. In Fig. 14
we sample n points from μl and compute Li (Xn ), displayed in a log-log plot.
Since each μl for l fixed is non-fractal (and one-dimensional) in nature, the ultimate
asymptotic behavior will be d = 1 once the number of points n is sufficiently large
(depending on the level l). However, for level l sufficiently large (depending on the
number of points n) we see that there is an intermediate regime in the log-log plots
which scale with the expected fractal dimension near log2 (3). We expect a similar
relationship between the number of points n and the level l to hold for many types of self-similar fractals.
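As a sketch, the curves in Fig. 13 can be generated with the standard arrowhead L-system rules A → B−A−B and B → A+B+A with 60° turns (we assume these rules here; [49, Figure 7.16] may present the construction differently), and μl can then be sampled uniformly segment by segment:

```python
import math
import random

def arrowhead_curve(level):
    """Vertices of the level-`level` Sierpinski arrowhead curve, generated
    by the L-system A -> B-A-B, B -> A+B+A with 60-degree turns; the
    curve consists of 3**level unit-length segments."""
    s = "A"
    rules = {"A": "B-A-B", "B": "A+B+A"}
    for _ in range(level):
        s = "".join(rules.get(c, c) for c in s)
    x = y = angle = 0.0
    pts = [(x, y)]
    for c in s:
        if c in "AB":  # draw a unit segment in the current direction
            x += math.cos(angle)
            y += math.sin(angle)
            pts.append((x, y))
        elif c == "+":
            angle += math.pi / 3
        elif c == "-":
            angle -= math.pi / 3
    return pts

def sample_from_curve(pts, rng=random):
    """Uniform sample from the piecewise linear curve; all segments have
    equal length, so pick a segment uniformly, then a point along it."""
    i = rng.randrange(len(pts) - 1)
    t = rng.random()
    (x0, y0), (x1, y1) = pts[i], pts[i + 1]
    return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))

pts = arrowhead_curve(4)
print(len(pts) - 1)  # number of segments at level 4
```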
8 Asymptotic Approximation of the Scaling Exponent
From Definition 6 we consider how to estimate the exponent (d − 1)/d numerically
for a given metric measure space (X, μ). For a fixed number of points n, a pair of
values (n, ℓn ) is produced, where ℓn = Li (Xn ) for a sampling Xn from (X, μ) of
cardinality n. If the scaling holds asymptotically for n sampled past a sufficiently
large point, then we can approximate the exponent by sampling for a range of n
values and observing the rate of growth of ℓn . A common technique used to estimate
power law behavior (see for example [19]) is to fit a linear function to the log-transformed data. The reason for doing this is that a hypothesized asymptotic scaling y ∼ e^C x^α as x → ∞ becomes a linear function after taking the logarithm: log(y) ∼ C + α log(x).
However, the expected power law in the data only holds asymptotically for n →
∞. We observe in practice that the trend for small n is subdominant to its asymptotic
scaling. Intuitively we would like to throw out the non-asymptotic portion of the
sequence, but deciding where to threshold depends on the sequence. We propose
the following approach to address this issue.
Suppose in general we have a countable set of measurements (n, ℓn ), with n
ranging over some subset of the positive integers. Create a sequence in monotone increasing order of n so that we have (nk , ℓnk )_{k=1}^∞ with nk > nj for k > j . For any pair of integers p, q with 1 ≤ p < q, we denote the log-transformed data of the corresponding terms in the sequence as

Spq = { (log(nk ), log(ℓnk )) | p ≤ k ≤ q } ⊆ R2 .
Each finite collection of points Spq has an associated pair of linear least-squares
coefficients (Cpq , αpq ), where the line of best fit to the set Spq is given by y =
Cpq + αpq x. For our purposes we are more interested in the slope αpq than the
intercept Cpq . We expect that we can obtain the fractal dimension by considering
the joint limits in p and q: if we define α as

α = lim_{p,q→∞} αpq ,

then we can recover the dimension by solving α = (d − 1)/d. A possibly overly restrictive
assumption is that the asymptotic behavior of ℓnk is monotone. If this is the case, we
may expect any valid joint limit p, q → ∞ will be defined and produce the same
value. For example, setting q = p + r we expect the following to hold:

α = lim_{p→∞} lim_{r→∞} αp,p+r .
In general, the joint limit may exist under a wider variety of ways in which one
allows q to grow relative to p.
Now define a function A : R2 → R which takes on values A(1/p, 1/q) = αpq , and define A(0, 0) so that A is continuous at the origin. Assuming αpq → α as above, any sequence (xk , yk ) → (0, 0) will produce the same limiting value A(0, 0), and the limit lim_{(x,y)→(0,0)} A(x, y) is well-defined. This suggests an algorithm for finite data:
1. Obtain a collection of estimates αpq for various values of p and q, and then
2. use the data {(1/p, 1/q, A(1/p, 1/q))} to extrapolate an estimate for A(0, 0) = α, from which we can solve for the fractal dimension d.
For simplicity, we currently fix q = nmax and collect estimates varying only p; i.e., we only collect estimates of the form αp,nmax . In practice it is safest to use a low-order estimator to limit the risks of extrapolation. We use a linear fit to the two-dimensional data A(1/p, 1/q) to produce a linear approximation Â(ξ, η) = a + bξ + cη, giving an approximation α = A(0, 0) ≈ Â(0, 0) = a.
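The procedure can be sketched as follows; as in the text we fix q = nmax, vary p, and extrapolate the estimates αpq linearly in 1/p to 1/p = 0 (the window size and the synthetic data are illustrative choices):

```python
import math

def slope(xs, ys):
    """Least-squares slope of ys against xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def asymptotic_alpha(ns, ells, min_window=5):
    """Extrapolated scaling exponent: fixing q = len(ns), compute the
    tail slope alpha_{p,q} of the log-log data for each p, then fit
    alpha_{p,q} ~ a + b*(1/p) and return the intercept a, i.e. the
    extrapolation to 1/p = 0."""
    logn = [math.log(n) for n in ns]
    logl = [math.log(e) for e in ells]
    inv_p, alphas = [], []
    for p in range(1, len(ns) - min_window + 2):
        inv_p.append(1.0 / p)
        alphas.append(slope(logn[p - 1:], logl[p - 1:]))
    b = slope(inv_p, alphas)
    return sum(alphas) / len(alphas) - b * (sum(inv_p) / len(inv_p))

# Synthetic check: ell_n = n^(2/3) (1 + 5/n) has a subdominant term that
# biases a naive fit over all n toward too small a slope.
ns = list(range(10, 410, 10))
ells = [n ** (2 / 3) * (1 + 5.0 / n) for n in ns]
naive = slope([math.log(n) for n in ns], [math.log(e) for e in ells])
print(naive, asymptotic_alpha(ns, ells))  # extrapolation is closer to 2/3
```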
Shown in Fig. 15 is an example applied to the function

f (x) = 100x + (1/10) x^2 + 0.1 ε(x),    (1.2)

where ε(x) = dW (x), with W (x) standard Brownian noise. The theoretical asymptotic
is α = 2 and should be attainable for sufficiently large x and enough sample points
to overcome noise. Note that there is a balance needed to both keep a sufficient
number of points to have a robust estimation (we want q − p to be large) and to
Fig. 15 Left panel: approximations αpq for selections of (p, q) for the artificial function 100x + (1/10)x^2 (1 + ε(x)). Center panel: log-absolute-error of the coefficients. Note that the approximation is generally poor for |p − q| small, due to a small number of sample points. Right panel: same values, with the coordinates mapped as ξ = 1/p, η = 1/q. The value to be extrapolated is at (ξ, η) = (0, 0)
avoid including data in the pre-asymptotic regime (thus p must be relatively large).
Visually, this is seen near the top side of the triangular region, where the error
drops to roughly the order of 10−3 . The challenge for an arbitrary function is not
knowing precisely where this balance is; see [19, Sections 1, 3.3–3.4] in the context
of estimating xmin (in their language) for the tails of probability density functions.
9 Conclusion
When points are sampled at random from a subset of Euclidean space, there
are a wide variety of Euclidean functionals (such as the minimal spanning tree,
the traveling salesperson tour, the optimal matching) which scale according to
the dimension of Euclidean space [72]. In this paper we explore whether similar
properties are true for persistent homology, and how one might use these scalings in
order to define a persistent homology fractal dimension for measures. We provide
experimental evidence for some of our conjectures, though that evidence is limited
by the sample sizes on which we are able to compute. Our hope is that our
experiments are only a first step toward inspiring researchers to further develop the
theory underlying the scaling properties of persistent homology.
Acknowledgements We would like to thank Visar Berisha, Vincent Divol, Al Hero, Sara Kališnik,
Benjamin Schweinhart, and Louis Scharf for their helpful conversations. We would like to
acknowledge the research group of Paul Bendich at Duke University for allowing us access to
a persistent homology package, which can be accessed via GitLab after submitting a request to
Paul Bendich.
References
1. Henry Adams, Sofya Chepushtanova, Tegan Emerson, Eric Hanson, Michael Kirby, Francis
Motta, Rachel Neville, Chris Peterson, Patrick Shipman, and Lori Ziegelmeier. Persistence
images: A stable vector representation of persistent homology. The Journal of Machine
Learning Research, 18(1):218–252, 2017.
2. Aaron Adcock, Daniel Rubin, and Gunnar Carlsson. Classification of hepatic lesions using the
matching metric. Computer Vision and Image Understanding, 121:36–42, 2014.
3. Robert J Adler, Omer Bobrowski, Matthew S Borman, Eliran Subag, and Shmuel Weinberger.
Persistent homology for random fields and complexes. In Borrowing strength: theory powering
applications—a Festschrift for Lawrence D. Brown, pages 124–143. Institute of Mathematical
Statistics, 2010.
4. Robert J Adler, Omer Bobrowski, and Shmuel Weinberger. Crackle: The persistent homology
of noise. arXiv preprint arXiv:1301.1466, 2013.
5. David Aldous and J Michael Steele. Asymptotics for Euclidean minimal spanning trees on
random points. Probability Theory and Related Fields, 92(2):247–258, 1992.
6. David Aldous and J Michael Steele. The objective method: probabilistic combinatorial
optimization and local weak convergence. In Probability on discrete structures, pages 1–72.
Springer, 2004.
7. Kenneth S Alexander. The RSW theorem for continuum percolation and the CLT for Euclidean
minimal spanning trees. The Annals of Applied Probability, 6(2):466–494, 1996.
8. Mark A Armstrong. Basic topology. Springer Science & Business Media, 2013.
9. Ulrich Bauer. Ripser: A lean C++ code for the computation of Vietoris–Rips persistence
barcodes. Software available at https://github.com/Ripser/ripser, 2017.
10. Paul Bendich, J S Marron, Ezra Miller, Alex Pieloch, and Sean Skwerer. Persistent homology
analysis of brain artery trees. The Annals of Applied Statistics, 10(1):198–218, 2016.
11. Martin Bilodeau and David Brenner. Theory of multivariate statistics. Springer Science &
Business Media, 2008.
12. Omer Bobrowski and Matthew Strom Borman. Euler integration of Gaussian random fields
and persistent homology. Journal of Topology and Analysis, 4(01):49–70, 2012.
13. Omer Bobrowski and Matthew Kahle. Topology of random geometric complexes: A survey.
Journal of Applied and Computational Topology, 2018.
14. Omer Bobrowski, Matthew Kahle, and Primoz Skraba. Maximally persistent cycles in random
geometric complexes. arXiv preprint arXiv:1509.04347, 2015.
15. Paul Breiding, Sara Kalisnik Verovsek, Bernd Sturmfels, and Madeleine Weinstein. Learning
algebraic varieties from samples. arXiv preprint arXiv:1802.09436, 2018.
16. Gunnar Carlsson. Topology and data. Bulletin of the American Mathematical Society,
46(2):255–308, 2009.
17. Frédéric Chazal, Vin de Silva, and Steve Oudot. Persistence stability for geometric complexes.
Geometriae Dedicata, pages 1–22, 2013.
18. Frédéric Chazal and Vincent Divol. The density of expected persistence diagrams and its kernel
based estimation. arXiv preprint arXiv:1802.10457, 2018.
19. Aaron Clauset, Cosma Rohilla Shalizi, and Mark EJ Newman. Power-law distributions in
empirical data. SIAM review, 51(4):661–703, 2009.
20. Anne Collins, Afra Zomorodian, Gunnar Carlsson, and Leonidas J. Guibas. A barcode shape
descriptor for curve point cloud data. Computers & Graphics, 28(6):881–894, 2004.
21. Jose A Costa and Alfred O Hero. Determining intrinsic dimension and entropy of high-dimensional shape spaces. In Statistics and Analysis of Shapes, pages 231–252. Springer,
2006.
22. Justin Michael Curry. Topological data analysis and cosheaves. Japan Journal of Industrial
and Applied Mathematics, 32(2):333–371, 2015.
23. Colleen D Cutler. Some results on the behavior and estimation of the fractal dimensions of
distributions on attractors. Journal of Statistical Physics, 62(3–4):651–708, 1991.
24. Colleen D Cutler. A review of the theory and estimation of fractal dimension. In Dimension
estimation and models, pages 1–107. World Scientific, 1993.
25. Yuri Dabaghian, Facundo Mémoli, Loren Frank, and Gunnar Carlsson. A topological paradigm
for hippocampal spatial map formation using persistent homology. PLoS computational
biology, 8(8):e1002581, 2012.
26. Vincent Divol and Wolfgang Polonik. On the choice of weight functions for linear representations of persistence diagrams. arXiv preprint arXiv:1807.03678, 2018.
27. Herbert Edelsbrunner and John L Harer. Computational Topology: An Introduction. American
Mathematical Society, Providence, 2010.
28. Herbert Edelsbrunner, A Ivanov, and R Karasev. Current open problems in discrete and
computational geometry. Modelirovanie i Analiz Informats. Sistem, 19(5):5–17, 2012.
29. Herbert Edelsbrunner, Anton Nikitenko, and Matthias Reitzner. Expected sizes of Poisson–
Delaunay mosaics and their discrete Morse functions. Advances in Applied Probability,
49(3):745–767, 2017.
30. Kenneth Falconer. Fractal geometry: Mathematical foundations and applications, 3rd edition. Wiley, Hoboken, NJ, 2013.
31. J.D. Farmer. Information dimension and the probabilistic structure of chaos. Zeitschrift für
Naturforschung A, 37(11):1304–1326, 1982.
32. J.D. Farmer, Edward Ott, and James Yorke. The dimension of chaotic attractors. Physica D:
Nonlinear Phenomena, 7(1):153–180, 1983.
33. Gerald Folland. Real Analysis. John Wiley & Sons, 1999.
34. Robert Ghrist. Barcodes: The persistent topology of data. Bulletin of the American
Mathematical Society, 45(1):61–75, 2008.
35. Peter Grassberger and Itamar Procaccia. Characterization of strange attractors. Physical Review Letters, 50(5):346–349, 1983.
36. Peter Grassberger and Itamar Procaccia. Measuring the Strangeness of Strange Attractors. In
The Theory of Chaotic Attractors, pages 170–189. Springer, New York, NY, 2004.
37. Allen Hatcher. Algebraic Topology. Cambridge University Press, Cambridge, 2002.
38. Patrick Jaillet. On properties of geometric random problems in the plane. Annals of Operations
Research, 61(1):1–20, 1995.
39. Matthew Kahle. Random geometric complexes. Discrete & Computational Geometry,
45(3):553–573, 2011.
40. Albrecht M Kellerer. On the number of clumps resulting from the overlap of randomly placed
figures in a plane. Journal of Applied Probability, 20(1):126–135, 1983.
41. Harry Kesten and Sungchul Lee. The central limit theorem for weighted minimal spanning
trees on random points. The Annals of Applied Probability, pages 495–527, 1996.
42. Gady Kozma, Zvi Lotker, and Gideon Stupp. The minimal spanning tree and the upper box
dimension. Proceedings of the American Mathematical Society, 134(4):1183–1187, 2006.
43. Joseph B Kruskal. On the shortest spanning subtree of a graph and the traveling salesman
problem. Proceedings of the American Mathematical Society, 7(1):48–50, 1956.
44. H Lee, H Kang, M K Chung, B N Kim, and D S Lee. Persistent brain network homology from
the perspective of dendrogram. IEEE Transactions on Medical Imaging, 31(12):2267–2277,
2012.
45. Javier Lamar Leon, Andrea Cerri, Edel Garcia Reyes, and Rocio Gonzalez Diaz. Gait-based gender classification using persistent homology. In José Ruiz-Shulcloper and Gabriella Sanniti di Baja, editors, Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications, pages 366–373, Berlin, Heidelberg, 2013. Springer Berlin Heidelberg.
46. Robert MacPherson and Benjamin Schweinhart. Measuring shape with topology. Journal of
Mathematical Physics, 53(7):073516, 2012.
47. Pertti Mattila, Manuel Morán, and José-Manuel Rey. Dimension of a measure. Studia Math,
142(3):219–233, 2000.
48. Pat A. P. Moran. Additive functions of intervals and Hausdorff measure. Proceedings of the
Cambridge Philosophical Society, 42(1):15–23, 1946.
49. Heinz-Otto Peitgen, Hartmut Jürgens, and Dietmar Saupe. Chaos and fractals: New frontiers
of science. Springer Science & Business Media, 2006.
50. Mathew Penrose. Random geometric graphs, volume 5. Oxford University Press, Oxford,
2003.
51. Mathew D Penrose. The longest edge of the random minimal spanning tree. The Annals of Applied Probability, pages 340–361, 1997.
52. Mathew D Penrose. A strong law for the longest edge of the minimal spanning tree. The
Annals of Probability, 27(1):246–260, 1999.
53. Mathew D Penrose and Joseph E Yukich. Central limit theorems for some graphs in
computational geometry. Annals of Applied Probability, pages 1005–1041, 2001.
54. Yakov B Pesin. Dimension theory in dynamical systems: contemporary views and applications.
University of Chicago Press, 2008.
55. Robert Clay Prim. Shortest connection networks and some generalizations. Bell Labs Technical
Journal, 36(6):1389–1401, 1957.
56. Alfréd Rényi. On the dimension and entropy of probability distributions. Acta Mathematica
Hungarica, 10(1–2):193–215, 1959.
57. Alfréd Rényi. Probability Theory. North Holland, Amsterdam, 1970.
58. Vanessa Robins. Computational topology at multiple resolutions: foundations and applications
to fractals and dynamics. PhD thesis, University of Colorado, 2000.
59. M.J. Schervish. Theory of Statistics. Springer Series in Statistics. Springer New York, 1996.
60. Benjamin Schweinhart. Persistent homology and the upper box dimension. arXiv preprint
arXiv:1802.00533, 2018.
61. Benjamin Schweinhart. The persistent homology of random geometric complexes on fractals.
arXiv preprint arXiv:1808.02196, 2018.
62. Benjamin Schweinhart. Weighted persistent homology sums of random Čech complexes. arXiv
preprint arXiv:1807.07054, 2018.
63. J Michael Steele. Growth rates of Euclidean minimal spanning trees with power weighted
edges. The Annals of Probability, pages 1767–1787, 1988.
64. J Michael Steele. Probability and problems in Euclidean combinatorial optimization. Statistical
Science, pages 48–56, 1993.
65. J Michael Steele. Minimal spanning trees for graphs with random edge lengths. In Mathematics
and Computer Science II, pages 223–245. Springer, 2002.
66. J Michael Steele, Lawrence A Shepp, and William F Eddy. On the number of leaves of a
Euclidean minimal spanning tree. Journal of Applied Probability, 24(4):809–826, 1987.
67. J Michael Steele and Luke Tierney. Boundary domination and the distribution of the largest
nearest-neighbor link in higher dimensions. Journal of Applied Probability, 23(2):524–528,
1986.
68. Andrew Tausz, Mikael Vejdemo-Johansson, and Henry Adams. Javaplex: A research software
package for persistent (co)homology. In International Congress on Mathematical Software,
pages 129–136, 2014. Software available at http://appliedtopology.github.io/javaplex/.
69. James Theiler. Estimating fractal dimension. JOSA A, 7(6):1055–1073, 1990.
70. Robert W Vallin. The elements of Cantor sets: with applications. John Wiley & Sons, 2013.
71. Kelin Xia and Guo-Wei Wei. Multidimensional persistence in biomolecular data. Journal of
Computational Chemistry, 36(20):1502–1520, 2015.
72. Joseph E Yukich. Probability theory of classical Euclidean optimization problems. Springer,
2006.
73. Xiaojin Zhu. Persistent homology: An introduction and a new text representation for natural
language processing. In IJCAI, pages 1953–1959, 2013.