A Fast and Robust Algorithm to Count Topologically Persistent Holes in Noisy Clouds

Preprocessing a 2D image often produces a noisy cloud of interest points. We study the problem of counting holes in noisy clouds in the plane. The holes in a given cloud are quantified by the topological persistence of their boundary contours when the cloud is analyzed at all possible scales. We design the algorithm to count holes that are most persistent in the filtration of offsets (neighborhoods) around given points. The input is a cloud of n points in the plane without any user-defined parameters. The algorithm has a near linear time and a linear space O(n). The output is the array (number of holes, relative persistence in the filtration). We prove theoretical guarantees when the algorithm finds the correct number of holes (components in the complement) of an unknown shape approximated by a cloud.


Introduction: counting holes in noisy clouds
We apply methods from the new area of topological data analysis to counting persistent holes in a noisy cloud of points.Such a cloud can be obtained by selecting interest points in a gray scale or RGB image.Our region-based method uses global topological properties of contours.
By a shape we mean any subset X ⊂ R 2 that can be split into finitely many (topological) triangles.Hence X is bounded, but may not be connected.Then a hole in a shape X ⊂ R 2 is a bounded connected component of the complement R 2 − X.Such a hole can be a disk, a ring or may have a more complicated topological form, see Fig. 1.
The α-offset X α is the union ∪ p∈X B(p; α) of disks with the radius α ≥ 0 and centers at all p ∈ X.For instance, X 0 is the original shape X ⊂ R 2 .When α is increasing, the holes of R 2 − X α are shrinking, may split into smaller newborn holes and will eventually die, each at its own death time α, see Fig. 3.The persistence of a hole is its life span death − birth in the filtration {X α } of all α-offsets.So we quantify holes by their persistence at different scales α.Hole counting problem.Let a shape X be represented by a finite sample C of points in R 2 .Find conditions on X and its sample when one can quickly count persistent holes.
We solve the problem by the algorithm HOCTOP : Hole Counting based on Topological Persistence.The only input is a finite cloud C of n points approximating an unknown shape X ⊂ R 2 .The algorithm outputs the relative persistence of k holes in the filtration {C α } for all k ≥ 0. If the scale α is random and uniform, this output gives probabilities P (k holes).The boundary edges of persistent holes can be quickly post-processed to extract all boundary contours.Theorems 1, 4 say that the algorithm HOCTOP quickly and correctly finds all persistent holes using only a good enough sample C of an unknown shape X, see section 2.

Main results: the algorithm and guarantees
We start from a high-level description of our algorithm.The topological persistence of contours in the filtration {C α } is computed by using a Delaunay triangulation Del(C) of a given cloud C ⊂ R 2 of n points.By Nerve Lemma 8 the α-offsets C α can be continuously deformed to the α-complexes C(α), which filter Del(C) as follows: Each C(α) has some edges and triangles from Del(C).
. The big hole in the green offset C α is born at α = 1.5, splits into 2 smaller holes at α = 2 and dies at α ≈ 2.577, so the topological persistence of this hole is death − birth ≈ 1.077.
The graph dual to Del(C) is filtered by the subgraphs C * (α) whose connected components correspond to holes in C(α).When α is decreasing, C(α) is shrinking, so its holes are growing and corresponding components of C * (α) merge at critical values of α, see Fig. 6.The persistence of cycles in the filtration {C α } corresponds to the persistence of components in {C * (α)}, see Duality Lemma 14.
The pairs (birth, death) of connected components in {C * (α)} are found via a union-find structure by adding edges and merging components.So computing the 1dimensional persistence of cycles in {C α } reduces to the 0-dimensional persistence of components in {C * (α)}.
Starting from a given cloud C ⊂ R 2 of n points with real coordinates (x i , y i ), i = 1, . . ., n, we find a Delaunay triangulation Del(C) in O(n log n) time with O(n) space.Then we remove each edge of Del(C) one by one in the decreasing order of their length.Removing an edge may break a contour when adjacent regions in C(α) and the corresponding components of C * (α) merge.In the case of a merger, a younger component of C * (α) and the corresponding hole in C(α) die.We note the birth and death of each dead hole.We get the probability of k holes as the relative length of all intervals of the scale α when C α ⊂ R 2 has k holes.
Theorem 1.The algorithm HOCTOP counts all holes in a given cloud C ⊂ R 2 of n points in O(n log n) time with O(n) space.All holes are ordered by their topological persistence in the ascending filtration {C α } of the α-offsets.

Definition 2 (ε-sample).
A cloud C is an ε-sample of a shape X ⊂ R 2 if X ⊂ C α and C ⊂ X α .So any point of C is within the distance ε from a point of X and any point of X is at most ε away from a point of C. Hence ε can be considered as the upper bound of some arbitrary noise.Definition 3 (min and max homological feature sizes).For any shape X ⊂ R 2 , let α = minhfs(X) be the minimum homological feature size when a first hole is born or dies in X α .Let α = maxhfs(X) be the maximum homological feature size after which no holes are born or die in X α .
Theorem 4 gives sufficient (not necessary) conditions when the algorithm finds the correct number of holes in an unknown shape X ⊂ R 2 that is represented by its finite sample C. We extend the Homology Inference Theorem [4] to the case when the upper bound ε of noise is unknown.
Theorem 4. Let a cloud C be an ε-sample of a shape X ⊂ R 2 with an unknown parameter ε such that minhfs(X) > 1 2 maxhfs(X) + 4ε.If no new holes are appear in X α when α is increasing, then the algorithm HOCTOP finds the correct number of holes in X by using only the cloud C.
The condition minhfs(X) > 1  2 maxhfs(X) + 4ε means that all holes of X, which are bounded components of R 2 − X, have comparable sizes (neither tiny nor huge).
Even if the conditions of Theorem 4 are not satisfied, we can always find the number k of holes with the highest probability.The algorithm HOCTOP can also accept a signalto-noise ratio τ and output all holes whose persistence is larger than τ .Alternatively, the user may prefer to get most likely outputs ordered by the probability P (k holes).

Previous work on computing persistence
The offsets C α of a finite cloud C are usually studied through the Cech or Rips complexes, which may contain up to The fastest algorithm [8] for computing persistence of a filtration in all dimensions has the same running time O(m 2.376 ) in the number m of simplices as the best known time for the multiplication of two m × m matrices.
In dimension 0 the persistence can be computed in almost linear time [6, p. 6-8], which was used for simplifying functions on surfaces [1] and for approximating persistence of an unknown scalar field from its values on a sample [3].
Two extra parameters were used in a Delaunay-based image segmentation [7]: α for the radius of disks centered at points of a cloud C and p for a desired level of persistence.

Delaunay triangulation and α-complexes
Definition 5 (simplicial complex).A simplicial 2-complex is a finite set of simplices (vertices, edges, triangles): • the sides of any triangle are included in the complex; • the endpoints of any edge are included in the complex; • two triangles can intersect only along a common edge; • edges can meet only at a common endpoint (a vertex); • an edge can not pierce through the interior of a triangle.
If a complex S is drawn in R n without self-intersections, we may call this image |S| a geometric realization of S. We have defined a shape X ⊂ R 2 as a geometric realization of a 2-complex.For instance, a round disk whose boundary is split into 3 edges by 3 vertices is a topological triangle.
A cycle in a complex is a sequence of edges e 1 , . . ., e m such that any consecutive edges e i , e i+1 (in the cyclic order) have a common vertex.Any loop in a geometric realization |S| continuously deforms to a cycle of edges in S.

Definition 6 (Delaunay triangulation Del
is the set of all points q that are (non-strictly) closer to p i than to other points of C. The Delaunay triangulation Del(C) is the nerve of the Voronoi diagram ∪ p∈C V (p).Namely, p, q, r ∈ C span a triangle if and only if If α > 0 is very small, all points of C are disjoint in C(α), while C(α) = Del(C) for any large enough α, see examples in Fig. 3.So all α-complexes form the filtration Edges or triangles are added only at critical values of α.
Lemma 8 (Nerve of a ball covering [5]).The union of balls C α = ∪ p∈C B(p; α) continuously deforms to (has the homotopy type of) a geometric realization of C(α).

Persistent homology: definitions, examples
Definition 9 (1-dimensional homology H 1 ).We consider the 1-dimensional homology group H 1 (S) only with coefficients in Z/2Z = {0, 1}.Cycles of a 2-dimensional complex S can be algebraically written as linear combinations of edges (with coefficients 0 or 1) and generate the vector space C 1 of cycles.The boundaries of all triangles in S (as cycles of 3 edges) generate the subspace By a filtration {S(α)} we mean a sequence of nested complexes S(0) ⊂ • • • ⊂ S(α) ⊂ . . .that change only at finitely many critical values α 1 , . . ., α m .Then we get the induced linear maps ) for any α < α i .The class γ dies at the first time α j = death(γ) ≥ α i when the image of γ under H 1 (S(α i )) → H 1 (S(α j )) merges into the image of H 1 (S(α)) → H 1 (S(α j )) for some α < α i .The class γ has the persistence death(γ)−birth(γ).The point (α i , α j ) has the multiplicity µ ij equal to the number of independent classes that are born at α i and die at α j .The persistence diagram PD{S(α)} in {(x, y) ∈ R 2 : x ≤ y} is the multiset consisting of all points (α i , α j ) with the multiplicity µ ij and all diagonal points (x, x) with the infinite multiplicity.
Pairs with a low persistence death − birth (close to the diagonal {x = y} in PD) are treated as noise.Pairs with a high persistence represent persistent cycles in {S(α)}.
We shall consider the filtrations of α-offsets {X α } and {C α } for a shape X ⊂ R  We suggest one more way to visualize persistence.Each pair (birth, death) defines the function f (α) = 1 for birth ≤ α < death and f (α) = 0 otherwise.The sum of these functions over all pairs gives the persistence staircase PS{C α }.The value of this piecewise constant function of α is the number of holes in the offset C α .We have connected consecutive horizontal segments of PS{C α } to get a 'continuous' staircase as in the right picture of Fig. 4.
For the cloud C of 10 points in Fig. 3, the full range of the scale α is from the smallest critical value α = 1.5 (when a first hole is born) to the largest critical value α =

Persistent homology: stability and duality
Definition 11 (bottleneck distance d B ).Let the distance between p = (x 1 , y 1 ), Stability Theorem 12 implies for barcodes PB that the endpoints of all bars are perturbed by at most ε.So a long bar can become only a bit shorter after adding noise.
To every triangle in the Delaunay triangulation Del(C), let us associate a single abstract vertex v i , i = 1, . . ., k.It will be convenient to call the external region of Del(C) also a 'triangle' and represent it by an extra vertex v 0 .
Definition 13 (graphs C * (α)).For any vertices v i , v j representing adjacent triangles in Del(C), let d ij be the length of the (longest) common edge of the triangles.The metric graph C * dual to Del(C) has the vertices v 0 , v 1 , . . ., v k and edges of the length d ij connecting vertices v i , v j that represent adjacent triangles, see Fig. 6.The graph C * is filtered by the subgraphs C * (α) that have only the edges of a length d ij > 2α.We remove any isolated node v (except v 0 ) from C * (α) if the corresponding triangle T v is not acute or has a small circumradius rad(v) ≤ α.We get the filtration C(1/ 2) Components of C * (α) are called white, because they represent regions in , so γ 'encloses' the corresponding white component of C * (α).Lemma 14 is an analogue of the Symmetry Theorem [6, p. 164] for a function on a closed manifold.
Lemma 14 (Duality).All contours of the complex C(α) are in a 1-1 correspondence with all connected components of the graph C * (α) not containing the vertex v 0 .When α is decreasing, the contours of C(α) and the white components of C * (α) have the corresponding critical moments: • a birth of a contour ↔ a birth of a white component, • a death of a contour ↔ a death of a white component.

The algorithm HOCTOP for counting holes
We build the union-find structure Forest(α) on the vertices of the graph C * (α).All nodes and trees of Forest(α) will be in a 1-1 correspondence with all vertices and white components of C * (α).Every node v in Forest(α) has • a pointer to a unique parent of the node v in Forest(α); • a pointer to the Delaunay triangle dual to the node v; • the weight (the number of nodes below v in its tree); If a node v is a self-parent, we call v a root.We can find root(v) of any node v by going up along parent links.If α is decreasing, α v can be considered as the birth time when the vertex v joins C * (α).The algorithm initializes Forest(α) as the set of isolated nodes v 0 , . . ., v k .If the triangle corresponding to v k is acute, the birth time of v k is the circumradius of the triangle, otherwise 0. We will go through all edges of Del(C) in the decreasing order of their length and will update α v when v enters the ascending filtration All triangles of C(α) and the corresponding nodes of Forest(α) are called gray.The remaining triangles and the external region of Del(C) are called white.The external region has birth time +∞ and is called a 'triangle' for simplicity.Initially all triangles with birth time 0 are gray.
The while loop.For each edge e ⊂ Del(C) arriving in the decreasing order of length, we find two triangles T u , T v attached to e and check if they are gray or white.To determine if a triangle T v represented by a node v is gray, we go up along parent links from v to root(v).If the birth time of root(v) is 0, the triangle T v is still gray, otherwise white.
To distinguish Cases 1 and 4 below, we also check if the triangles T u , T v attached to the current edge e are in the same region of R 2 − C(α).Case 1 means that the nodes u, v ∈ Forest(α) belong to the same tree, so root(u) = root(v).In all 4 cases the scale α goes down through the half-length 1  2 length(e) of the current edge e from Del(C).To decide which white component dies, we find the roots root(u), root(v) ∈ Forest(α) of the trees representing R u , R v and compare the birth times α root(u) , α root(v) when a first node of each tree was born.By the elder rule [6, p. 150], the older white component (say, with u) survives and keeps its larger birth time α root(u) .The younger white component R v dies and we get (birth, death) = ( 12 length(e), α root(v) ) for the life of the white component in the ascending filtration {C * (α)} and of the corresponding contour in the descending filtration {C(α)}.
We swapped the birth and death times, because the persistence is usually defined when the scale α is increasing.However, we need the ascending filtration {C * (α)} to use a union-find structure, so α is decreasing in the algorithm.
Finally, to merge the trees with root(u), root(v) in Forest(α), we compare the weights of the roots and set the root of the (non-strictly) larger tree as the parent for the root of another tree.So the size of any subtree grows by a factor of at least 2 each time when we pass to the parent.We get Lemma 15.By the above construction the longest path in any tree of size k from Forest(α) has length O(log k).Proof of Theorem 4. The important critical values of α for the 1-dimensional homology of the filtration {X α } are • α = minhfs(X) is the 1st value when H 1 (X α ) changes; • α = maxhfs(X) is the last value when H 1 (X α ) changes.

Proofs of main results and our conclusion
No new holes appear in offsets X α of the shape X with original k holes.Then PD{X α } contains only points (0, d i ).The smallest death is d 1 = minhfs(X).The largest death is d k = maxhfs(X).If a cloud C is an ε-sample of a shape X ⊂ R 2 , the perturbed diagram PD{C α } has only points ε-close to (0,

Figure 1 .
Figure 1.The orange shape X ⊂ R 2 with 3 white holes of different forms: a small disk, a ring-like hole, a 'figure-eight' hole.
By another definition [2, section 9.1] the circumcircle of any Delaunay triangle in Del(C) encloses no points of C. For a cloud C ⊂ R 2 of n points, let Del(C) have k triangles and b boundary edges in the external region.Counting all E edges over triangles, we get 3k + b

2
and a finite cloud C ⊂ R 2 .Figures 4 and 5 show the persistence diagram PD for the filtration of the α-offsets C α equivalent to C(α) by Lemma 8.

Figure 4 .
Figure 4. Extra outputs for the cloud C of 10 points in Fig. 3. Left: persistence diagram, middle: barcode, right: persistence staircase.We can convert the persistence diagram into the persistence barcode PB{C α }.All pairs (birth, death) give horizontal bars ordered by their length death − birth.Usually the bars are drawn from the left endpoint 0 to the right endpoint death − birth, see the middle picture in Fig. 4.We suggest one more way to visualize persistence.Each pair (birth, death) defines the function f (α) = 1 for

Figure 5 .
Figure 5. Extra outputs for the cloud C of 1251 points in Fig. 2. Left: persistence diagram PD, middle: barcode PB, right: staircase PS.

Figure 6 .
Figure 6.The complexes C(α) have solid edges and gray triangles.The graphs C * (α) have circled vertices and red dashed edges.

Case 1 :
e has the same white region on both sides of e. C(α) loses only the open edge e.The white components of C * (α) are unchanged.Fig. 6 illustrates Case 1 for α = 1 when C(α) loses the edge connecting (1, 0) to (1, 2).

Proof of Theorem 1 .
Constructing the Delaunay triangulation Del(C) on a cloud of n points requires O(n log n) time [2, Chapter 9].Sorting O(n) edges of Del(C) needs O(n log n) time.Then we go through the while loop analyzing each of the O(n) edges of Del(C).For the nodes u, v ∈ Forest(α) of triangles attached to each edge e, we find the roots of u, v by going up along O(log n) parent links by Lemma 15.All other steps in the while loop require only O(1) time.Hence the total time is O(n log n).The sizes of all data structures are proportional to the numbers of edges or triangles in Del(C), so we use O(n) space.The careful analysis of a union-find structure says that Forest(α) can be built in time O(nA −1 (n, n)) time, where A −1 (n, n) is the extremely slowly growing inverse Ackermann function.Our time O(n log n) is dominated by the construction of Del(C) and sorting all O(n) edges.
d i ) or to the diagonal {x = y} in the L ∞ distance on the plane by Stability Theorem 12.The strip {2ε < y − x < d 1 − 2ε} is the largest empty strip in PD{C α } due to the given conditiond 1 > 1 2 d k + 4ε or (d 1 −2ε)−2ε > (d k +2ε)−(d 1 −2ε).Then we can detect this strip in PD{C α } without using ε.Hence PD{C α } has exactly k points above y − x = d 1 − 2ε close to (0, d i ) corresponding to k holes of the unknown shape X.Conclusion.Here are the key advantages of our approach: • a cloud C ⊂ R 2 of n points is simultaneously analyzed at all scales α without any extra user-defined parameters;• the algorithm HOCTOP counts persistent holes of any topological form in O(n log n) time, see Theorem 1;• theoretical guarantees for a correct number of holes are proved for ε-samples of unknown shapes, see Theorem 4;• the output is stable under perturbations of a cloud C and the only parameter of noise is an unknown upper bound ε.