{"title": "Large Scale computation of Means and Clusters for Persistence Diagrams using Optimal Transport", "book": "Advances in Neural Information Processing Systems", "page_first": 9770, "page_last": 9780, "abstract": "Persistence diagrams (PDs) are now routinely used to summarize the underlying topology of complex data. Despite several appealing properties, incorporating PDs in learning pipelines can be challenging because their natural geometry is not Hilbertian. Indeed, this was recently exemplified in a string of papers which show that the simple task of averaging a few PDs can be computationally prohibitive. We propose in this article a tractable framework to carry out standard tasks on PDs at scale, notably evaluating distances, estimating barycenters and performing clustering. This framework builds upon a reformulation of PD metrics as optimal transport (OT) problems. Doing so, we can exploit recent computational advances: the OT problem on a planar grid, when regularized with entropy, is convex can be solved in linear time using the Sinkhorn algorithm and convolutions. This results in scalable computations that can stream on GPUs. We demonstrate the efficiency of our approach by carrying out clustering with diagrams metrics on several thousands of PDs, a scale never seen before in the literature.", "full_text": "Large Scale computation of Means and Clusters for\n\nPersistence Diagrams using Optimal Transport\n\nTh\u00e9o Lacombe\n\nDatashape\nInria Saclay\n\ntheo.lacombe@inria.fr\n\nMarco Cuturi\n\nGoogle Brain, and\nCREST, ENSAE\n\ncuturi@google.com\n\nSteve Oudot\nDatashape\nInria Saclay\n\nsteve.oudot@inria.fr\n\nAbstract\n\nPersistence diagrams (PDs) are now routinely used to summarize the underlying\ntopology of complex data. Despite several appealing properties, incorporating\nPDs in learning pipelines can be challenging because their natural geometry is not\nHilbertian. 
Indeed, this was recently exemplified in a string of papers which show that the simple task of averaging a few PDs can be computationally prohibitive. We propose in this article a tractable framework to carry out standard tasks on PDs at scale, notably evaluating distances, estimating barycenters and performing clustering. This framework builds upon a reformulation of PD metrics as optimal transport (OT) problems. Doing so, we can exploit recent computational advances: the OT problem on a planar grid, when regularized with entropy, is convex and can be solved in linear time using the Sinkhorn algorithm and convolutions. This results in scalable computations that can stream on GPUs. We demonstrate the efficiency of our approach by carrying out clustering with diagram metrics on several thousand PDs, a scale never seen before in the literature.

1 Introduction

Topological data analysis (TDA) has been used successfully in a wide array of applications, for instance in medical (Nicolau et al., 2011) or material (Hiraoka et al., 2016) sciences, computer vision (Li et al., 2014), or to classify NBA players (Lum et al., 2013). The goal of TDA is to exploit and account for the complex topology (connectivity, loops, holes, etc.) seen in modern data. The tools developed in TDA are built upon persistent homology theory (Edelsbrunner et al., 2000; Zomorodian & Carlsson, 2005; Edelsbrunner & Harer, 2010), whose main output is a descriptor called a persistence diagram (PD), which encodes in a compact form—roughly speaking, a point cloud in the upper triangle of the square [0, 1]^2—the topology of a given space or object at all scales.

Statistics on PDs. Persistence diagrams have appealing properties: in particular they have been shown to be stable with respect to perturbations of the input data (Cohen-Steiner et al., 2007; Chazal et al., 2009, 2014). 
This stability is measured either in the so-called bottleneck metric or in the p-th diagram distance, which are both distances that compute optimal partial matchings. While theoretically motivated and intuitive, these metrics are by definition very costly to compute. Furthermore, these metrics are not Hilbertian, preventing a faithful application of a large class of standard machine learning tools (k-means, PCA, SVM) on PDs.

Related work. To circumvent the non-Hilbertian nature of the space of PDs, one can of course map diagrams onto simple feature vectors. Such features can be either finite dimensional (Carrière et al., 2015; Adams et al., 2017), or infinite through kernel functions (Reininghaus et al., 2015; Bubenik, 2015; Carrière et al., 2017). A known drawback of kernel approaches on a rich geometric space such as that formed by PDs is that once PDs are mapped as feature vectors, any ensuing analysis remains in the space of such features (the "inverse image" problem inherent to kernelization). They are therefore

32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.

Figure 1: Illustration of differences between Fréchet means with Wasserstein and Euclidean geometry. The top row represents input data, namely persistence diagrams (left), vectorization of PDs as persistence images in R^{100×100} (middle, (Adams et al., 2017)), and discretization of PDs as histograms (right). The bottom row represents the estimated barycenters (orange scale) with input data (shaded), using the approach of Turner et al. (2014) (left), the arithmetic mean of persistence images (middle) and our optimal transport based approach (right).

not helpful to carry out simple tasks in the space of PDs, such as that of averaging PDs, namely computing the Fréchet mean of a family of PDs. 
Such problems call for algorithms that are able to optimize directly in the space of PDs, and were first addressed by Mileyko et al. (2011) and Turner (2013). Turner et al. (2014) provide an algorithm that converges to a local minimum of the Fréchet function by successive iterations of the Hungarian algorithm. However, the Hungarian algorithm does not scale well with the size of diagrams, and non-convexity means the iterations can converge to bad local minima.

Contributions. We reformulate the computation of diagram metrics as an optimal transport (OT) problem, opening several perspectives, among them the ability to benefit from entropic regularization (Cuturi, 2013). We provide a new numerical scheme to bound OT metrics, and therefore diagram metrics, with additive guarantees. Unlike previous approximations of diagram metrics, ours can be parallelized and implemented efficiently on GPUs. These approximations are also differentiable, leading to a scalable method to compute barycenters of persistence diagrams. In exchange for a discretized approximation of PDs, we recover a convex problem, unlike previous formulations of the barycenter problem for PDs. We demonstrate the scalability of these two advances (accurate approximation of the diagram metric at scale and barycenter computation) by providing the first tractable implementation of the k-means algorithm in the space of PDs.

Notations for matrix and vector manipulations. When applied to matrices or vectors, the operators exp, log and division are always meant element-wise. u ⊙ v denotes element-wise multiplication (Hadamard product), while Ku denotes the matrix-vector product of K ∈ R^{d×d} and u ∈ R^d.

2 Background on OT and TDA

OT. Optimal transport is now widely seen as a central tool to compare probability measures (Villani, 2003, 2008; Santambrogio, 2015). 
Given a space X endowed with a cost function c : X × X → R_+, we consider two discrete measures μ and ν on X, namely measures that can be written as positive combinations of diracs, $\mu = \sum_{i=1}^n a_i \delta_{x_i}$ and $\nu = \sum_{j=1}^m b_j \delta_{y_j}$, with weight vectors $a \in \mathbb{R}_+^n$, $b \in \mathbb{R}_+^m$ satisfying $\sum_i a_i = \sum_j b_j$ and all $x_i, y_j$ in X. The n × m cost matrix $C = (c(x_i, y_j))_{ij}$ and the transportation polytope $\Pi(a, b) := \{P \in \mathbb{R}_+^{n \times m} \mid P \mathbf{1}_m = a,\ P^T \mathbf{1}_n = b\}$ define an optimal transport problem whose optimum L_C can be computed using either of two linear programs, dual to each other:

$$L_C(\mu, \nu) := \min_{P \in \Pi(a,b)} \langle P, C \rangle = \max_{(\alpha, \beta) \in \Psi_C} \langle \alpha, a \rangle + \langle \beta, b \rangle \qquad (1)$$

where ⟨·, ·⟩ is the Frobenius dot product and Ψ_C is the set of pairs of vectors (α, β) in R^n × R^m such that their tensor sum α ⊕ β is smaller than C, namely ∀i, j, α_i + β_j ≤ C_{ij}. Note that when n = m and all weights a and b are uniform and equal, the problem above reduces to the computation of an optimal matching, that is, a permutation σ ∈ S_n (with a resulting optimal plan P taking the form P_{ij} = 1_{σ(i)=j}). That problem has clear connections with diagram distances, as shown in §3.

Figure 2: Sketch of persistent homology. X = R^3 and f(x) = min_{p∈P} ‖x − p‖, so that sublevel sets of f are unions of balls centered at the points of P. The first (resp. second) coordinate of a point in the persistence diagram encodes the appearance (resp. disappearance) scale of a cavity in the sublevel sets of f. The isolated red point accounts for the presence of a persistent hole in the sublevel sets, inferring the underlying spherical geometry of the input point cloud.

Entropic Regularization. 
Solving the optimal transport problem is intractable for large data. Cuturi proposes to consider a regularized formulation of that problem using entropy, namely:

$$L_C^\gamma(a, b) := \min_{P \in \Pi(a,b)} \langle P, C \rangle - \gamma h(P) \qquad (2)$$

$$= \max_{\alpha \in \mathbb{R}^n,\ \beta \in \mathbb{R}^m} \langle \alpha, a \rangle + \langle \beta, b \rangle - \gamma \sum_{i,j} e^{\frac{\alpha_i + \beta_j - C_{ij}}{\gamma}} \qquad (3)$$

where γ > 0 and $h(P) := -\sum_{ij} P_{ij}(\log P_{ij} - 1)$. Because the negentropy is 1-strongly convex, that problem has a unique solution P^γ which takes the form, using first order conditions,

$$P^\gamma = \mathrm{diag}(u_\gamma)\, K\, \mathrm{diag}(v_\gamma) \in \mathbb{R}^{n \times m}, \qquad (4)$$

where $K = e^{-C/\gamma}$ (term-wise exponentiation), and $(u_\gamma, v_\gamma) \in \mathbb{R}^n \times \mathbb{R}^m$ is a fixed point of the Sinkhorn map (term-wise divisions):

$$S : (u, v) \mapsto \left( \frac{a}{K v},\ \frac{b}{K^T u} \right). \qquad (5)$$

Note that this fixed point is the limit of any sequence $(u_{t+1}, v_{t+1}) = S(u_t, v_t)$, yielding a straightforward algorithm to estimate P^γ. Cuturi considers the transport cost of the optimal regularized plan, $S_C^\gamma(a, b) := \langle P^\gamma, C \rangle = (u_\gamma)^T (K \odot C)\, v_\gamma$, to define a Sinkhorn divergence between a, b (here ⊙ is the term-wise multiplication). One has that $S_C^\gamma(a, b) \to L_C(a, b)$ as γ → 0, and more precisely P^γ converges to the optimal transport plan solution of (1) with maximal entropy. That approximation can be readily applied to any problem that involves terms in L_C, notably barycenters (Cuturi & Doucet, 2014; Solomon et al., 2015; Benamou et al., 2015).

Eulerian setting. When the set X is finite with cardinality d, μ and ν are entirely characterized by their probability weights a, b ∈ R_+^d and are often called histograms in a Eulerian setting. 
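To make the scheme above concrete, here is a minimal NumPy sketch of the Sinkhorn map (5) and of the resulting Sinkhorn divergence; the function name is ours, and the test case exploits the matching reduction noted around (1) (uniform weights, n = m, coinciding point sets, so the unregularized optimum is the identity matching with zero cost).

```python
import numpy as np

def sinkhorn_divergence(a, b, C, gamma, n_iter=100):
    """Entropic OT of Eq. (2): iterate the Sinkhorn map (5) and return the
    regularized plan P^gamma = diag(u) K diag(v) of Eq. (4) together with
    its transport cost <P^gamma, C> (the Sinkhorn divergence)."""
    K = np.exp(-C / gamma)              # Gibbs kernel, term-wise exponential
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iter):             # (u, v) <- S(u, v)
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]
    return P, float((P * C).sum())

# Uniform weights with n = m: the unregularized problem reduces to an
# optimal matching, here the identity since the two point sets coincide.
x = np.array([[0.1, 0.1], [0.2, 0.8], [0.7, 0.3], [0.9, 0.9]])
C = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
a = b = np.ones(4) / 4
P, s_gamma = sinkhorn_divergence(a, b, C, gamma=1e-2)
```

For small γ the plan concentrates on the diagonal of C and the divergence approaches the true (here zero) matching cost, illustrating the γ → 0 limit stated above.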
When X is not discrete, as when considering the plane [0, 1]^2, we therefore have a choice of representing measures as sums of diracs, encoding their information through locations, or as discretized histograms on a planar grid of arbitrary granularity. Because the latter setting is more effective for entropic regularization (Solomon et al., 2015), this is the approach we will favor in our computations.

Persistent homology and Persistence Diagrams. Given a topological space X and a real-valued function f : X → R, persistent homology provides—under mild assumptions on f, taken for granted in the remainder of this article—a topological signature of f built on its sublevel sets $(f^{-1}((-\infty, t]))_{t \in \mathbb{R}}$, called a persistence diagram (PD) and denoted by Dgm(f). In practice, it is of the form $\mathrm{Dgm}(f) = \sum_{i=1}^n \delta_{x_i}$, namely a point measure with finite support included in $\mathbb{R}^2_> := \{(s, t) \in \mathbb{R}^2 \mid s < t\}$. Each point (s, t) in Dgm(f) can be understood as a topological feature (connected component, loop, hole...) which appears at scale s and disappears at scale t in the sublevel sets of f. 
Comparing the persistence diagrams of two functions f, g measures their difference from a topological perspective: presence of some topological features, difference in appearance scales, etc. The space of PDs is naturally endowed with a partial matching metric, defined for p ≥ 1 as:

$$d_p(D_1, D_2) := \left( \min_{\zeta \in \Gamma(D_1, D_2)} \sum_{(x, y) \in \zeta} \|x - y\|_p^p + \sum_{s \in D_1 \cup D_2 \setminus \zeta} \|s - \pi_\Delta(s)\|_p^p \right)^{\frac{1}{p}}, \qquad (6)$$

where Γ(D_1, D_2) is the set of all partial matchings between points in D_1 and points in D_2, and π_Δ(s) denotes the orthogonal projection of an (unmatched) point s onto the diagonal {(x, x) ∈ R^2, x ∈ R}. The mathematics of OT and diagram distances share a key idea, that of matching, but differ on an important aspect: diagram metrics can cope, using the diagonal as a sink, with measures that have a varying total number of points. We bridge this gap by leveraging an unbalanced formulation for OT.

3 Fast estimation of diagram metrics using Optimal Transport

In the following, we start by explicitly formulating (6) as an optimal transport problem. Entropic smoothing provides us with a way to approximate (6) with controllable error. In order to benefit most from that regularization (matrix parallel execution, convolution, GPU—as showcased in (Solomon et al., 2015)), the implementation requires specific attention, as described in Propositions 2, 3 and 4.

PD metrics as Optimal Transport. The main differences between (6) and (1) are that PDs do not generally have the same mass, i.e. 
number of points (counted with multiplicity), and that the diagonal\nplays a special role by allowing to match any point x in a given diagram with its orthogonal projection\n\u03c0\u2206(x) onto the diagonal. Guittet\u2019s formulation for partial transport (2002) can be used to account for\nthis by creating a \u201csink\u201d bin corresponding to that diagonal and allowing for different total masses.\nThe idea of representing the diagonal as a single point already appears in the bipartite graph problem\nof Edelsbrunner & Harer (2010) (Ch.VIII). The important aspect of the following proposition is the\nclari\ufb01cation of the partial matching problem (6) as a standard OT problem (1).\nLet R2\nthe linear operator R which, to a \ufb01nite non-negative measure \u00b5 supported on R2\non \u2206 with mass equal to the total mass of \u00b5, namely R : \u00b5 (cid:55)\u2192 |\u00b5|\u03b4\u2206.\ntively n1 points x1 . . . xn1 and n2 points y1 . . . yn2. Let p \u2265 1. Then:\n\n> extended with a unique virtual point {\u2206} encoding the diagonal. We introduce\n>, associates a dirac\n\nProposition 1. Let D1 =(cid:80)n1\n\nj=1 \u03b4yj be two persistence diagrams with respec-\n\n> \u222a {\u2206} be R2\n\ndp(D1, D2)p = LC(D1 + RD2, D2 + RD1),\n\nwhere C is the cost matrix with block structure\n\ni=1 \u03b4xi and D2 =(cid:80)n2\n(cid:18) (cid:98)C u\n(cid:19)\n\nC =\n\nvT\n\n0\n\n\u2208 R(n1+1)\u00d7(n2+1),\np,(cid:98)Cij = (cid:107)xi \u2212 yj(cid:107)p\n\n(7)\n\n(8)\n\n(9)\n\nt = diag(u\u03b3\n\nt ) where (u\u03b3\n\nt , v\u03b3\n\n|dp(D1, D2)p \u2212 (cid:104)P \u03b3\n\np, for i \u2264 n1, j \u2264 n2.\n\np, vj = (cid:107)yj \u2212 \u03c0\u2206(yj)(cid:107)p\n\nwhere ui = (cid:107)xi \u2212 \u03c0\u2206(xi)(cid:107)p\nThe proof seamlessly relies on the fact that, when transporting point measures with the same mass\n(number of points counted with multiplicity), the optimal transport problem is equivalent to an\noptimal matching problem (see \u00a72). 
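As a sanity check of Proposition 1, the augmented problem can be assembled and solved exactly on toy diagrams with a generic LP solver. The sketch below (function name ours) builds the block cost matrix (8) with the diagonal sink row/column and the masses of D1 + R(D2) and D2 + R(D1), then solves the LP (1) with `scipy.optimize.linprog`, rather than the Sinkhorn-based solver developed in the rest of this section.

```python
import numpy as np
from scipy.optimize import linprog

def diagram_distance_pp(D1, D2, p=2.0):
    """d_p(D1, D2)^p via Proposition 1: L_C(D1 + R(D2), D2 + R(D1)).

    D1, D2: (n1, 2) and (n2, 2) arrays of off-diagonal points (birth < death).
    A virtual point {Delta} absorbs unmatched points at the cost of
    projecting them orthogonally onto the diagonal."""
    D1, D2 = np.asarray(D1, float), np.asarray(D2, float)
    n1, n2 = len(D1), len(D2)
    # Block cost matrix (8): pairwise costs plus a row/column for {Delta};
    # for x = (s, t), ||x - pi_Delta(x)||_p^p = 2 * ((t - s) / 2)^p.
    C = np.zeros((n1 + 1, n2 + 1))
    C[:n1, :n2] = (np.abs(D1[:, None, :] - D2[None, :, :]) ** p).sum(-1)
    C[:n1, n2] = 2.0 * ((D1[:, 1] - D1[:, 0]) / 2.0) ** p
    C[n1, :n2] = 2.0 * ((D2[:, 1] - D2[:, 0]) / 2.0) ** p
    a = np.concatenate([np.ones(n1), [n2]])   # masses of D1 + R(D2)
    b = np.concatenate([np.ones(n2), [n1]])   # masses of D2 + R(D1)
    n, m = C.shape
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):
        A_eq[i, i * m:(i + 1) * m] = 1.0      # row marginals P 1 = a
    for j in range(m):
        A_eq[n + j, j::m] = 1.0               # column marginals P^T 1 = b
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=np.concatenate([a, b]),
                  bounds=(0, None), method="highs")
    return res.fun

# Matching (0,1) to (0.4,0.6) costs 0.32; deleting both would cost 0.52.
d22 = diagram_distance_pp([[0.0, 1.0]], [[0.4, 0.6]])
```

This brute-force LP is only usable for tiny diagrams, which is precisely the scalability gap the entropic approximation below addresses.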
Details are left to the supplementary material.\nEntropic approximation of diagram distances. Following the correspondance established in\nProposition 1, entropic regularization can be used to approximate the diagram distance dp(\u00b7,\u00b7).\nGiven two persistence diagrams D1, D2 with respective masses n1 and n2, let n := n1 + n2,\na = (1n1 , n2) \u2208 Rn1+1, b = (1n2 , n1) \u2208 Rn2+1, and P \u03b3\nt ) is\nthe output after t iterations of the Sinkhorn map (5). Adapting the bounds provided by Altschuler\net al. (2017), we can bound the error of approximating dp(D1, D2)p by (cid:104)P \u03b3\n\nt )Kdiag(v\u03b3\nt , C(cid:105):\nt , \u03a0(a, b))(cid:107)C(cid:107)\u221e\nwhere dist(P, \u03a0(a, b)) := (cid:107)P 1 \u2212 a(cid:107)1 + (cid:107)P T 1 \u2212 b(cid:107)1 (that is, error on marginals).\nDvurechensky et al. (2018) prove that iterating the Sinkhorn map (5) gives a plan P \u03b3\n(cid:17)\ndist(P \u03b3\n\nt satisfying\niterations. Given (9), a natural choice is thus to take\n\nt , \u03a0(a, b)) < \u03b5 in O\nn ln(n) for a desired precision \u03b5, which lead to a total of O\n\niterations in the\n\u03b3 =\nSinkhorn loop. These results can be used to pre-tune parameters t and \u03b3 to control the approximation\nerror due to smoothing. However, these are worst-case bounds, controlled by max-norms, and are\noften too pessimistic in practice. To overcome this phenomenon, we propose on-the-\ufb02y error control,\nusing approximate solutions to the smoothed primal (2) and dual (3) optimal transport problems,\nwhich provide upper and lower bounds on the optimal transport cost.\nUpper and Lower Bounds. 
The Sinkhorn algorithm, after at least one iteration (t ≥ 1), produces feasible dual variables $(\alpha_t^\gamma, \beta_t^\gamma) = (\gamma \log(u_t^\gamma), \gamma \log(v_t^\gamma)) \in \Psi_C$ (see below (1) for a definition).

Figure 3: (Left) $M_t^\gamma := \langle R_t^\gamma, C \rangle$ (red) and $m_t^\gamma := \langle (\alpha_t^\gamma)^{c\bar{c}}, a \rangle + \langle (\alpha_t^\gamma)^{c}, b \rangle$ (green) as a function of t, the number of iterations of the Sinkhorn map (t ranges from 1 to 500, with fixed γ = 10^{-3}). (Middle) Final M^γ (red) and m^γ (green) provided by Alg. 1, computed for decreasing γs, ranging from 10^{-1} to 5·10^{-4}. For each value of γ, the Sinkhorn loop is run until dist(P_t^γ, Π(a, b)) < 10^{-3}. Note that the γ-axis is flipped. (Right) Influence of the c̄c-transform on the Sinkhorn dual cost. (orange) The dual cost $\langle \alpha_t^\gamma, a \rangle + \langle \beta_t^\gamma, b \rangle$, where $(\alpha_t^\gamma, \beta_t^\gamma)$ are Sinkhorn dual variables (before the C-transform). (green) Dual cost after the C-transform, i.e. with $((\alpha_t^\gamma)^{c\bar{c}}, (\alpha_t^\gamma)^{c})$. Experiment run with γ = 10^{-3} and 500 iterations.

Their objective value, as measured by $\langle \alpha_t^\gamma, a \rangle + \langle \beta_t^\gamma, b \rangle$, performs poorly as a lower bound of the true optimal transport cost (see Fig. 3 and §5 below) in most of our experiments. 
To improve on this,\nwe compute the so called C-transform (\u03b1\u03b3\nt (Santambrogio, 2015, \u00a71.6), de\ufb01ned as:\n\nt , a(cid:105) + (cid:104)\u03b2\u03b3\nt )c of \u03b1\u03b3\ni {Cij \u2212 \u03b1i}, j \u2264 n2 + 1.\n\n\u2200j, (\u03b1\u03b3\nt )c\n\nj = max\n\nt )c\u00afc \u2208 Rn1+1, (\u03b1\u03b3\nApplying a C T -transform on (\u03b1\u03b3\ncan show that for any feasible \u03b1, \u03b2, we have that (Peyr\u00e9 & Cuturi, 2018, Prop 3.1)\n\nt )c, we recover two vectors (\u03b1\u03b3\n\nt )c \u2208 Rn2+1. One\n\n(cid:104)\u03b1, a(cid:105) + (cid:104)\u03b2, b(cid:105) \u2264 (cid:104)\u03b1c\u00afc, a(cid:105) + (cid:104)\u03b1c, b(cid:105) .\n\nWhen C\u2019s top-left block is the squared Euclidean metric, this problem can be cast as that of computing\nthe Moreau envelope of \u03b1. In a Eulerian setting and when X is a \ufb01nite regular grid which we\nwill consider, we can use either the linear-time Legendre transform or the Parabolic Envelope\nalgorithm (Lucet, 2010, \u00a72.2.1,\u00a72.2.2) to compute the C-transform in linear time with respect to the\ngrid resolution d.\nUnlike dual iterations, the primal iterate P \u03b3\nt does not belong to the transport polytope \u03a0(a, b) after a\n\ufb01nite number t of iterations. We use the rounding_to_feasible algorithm provided by Altschuler\net al. (2017) to compute ef\ufb01ciently a feasible approximation R\u03b3\nt that does belong to \u03a0(a, b).\nPutting these two elements together, we obtain\n\nt of P \u03b3\n\n(cid:124)\n(cid:104)(\u03b1\u03b3\n\nt )c\u00afc, a(cid:105) + (cid:104)(\u03b1\u03b3\n\n(cid:123)(cid:122)\n\nm\u03b3\nt\n\n(cid:125)\nt )c, b(cid:105)\n\n(cid:124) (cid:123)(cid:122) (cid:125)\n\u2264 LC(a, b) \u2264 (cid:104)R\u03b3\n\nt , C(cid:105)\nM \u03b3\nt\n\n.\n\n(10)\n\nt \u2212 m\u03b3\n\nt \u2212 m\u03b3\n\nt . 
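In matrix form, the C-transform is a single row (or column) minimization: following (Santambrogio, 2015, §1.6), the tightest complement keeping the pair dual-feasible is $\alpha^c_j = \inf_i (C_{ij} - \alpha_i)$, and tightening never decreases the dual objective. A small NumPy sketch (function names ours) illustrating the improved lower bound:

```python
import numpy as np

def c_transform(alpha, C):
    """alpha^c_j = min_i (C_ij - alpha_i): the tightest beta such that
    (alpha, beta) stays feasible for the dual of (1), alpha_i + beta_j <= C_ij."""
    return (C - alpha[:, None]).min(axis=0)

def cbar_transform(beta, C):
    """Symmetric transform on the first factor: alpha_i = min_j (C_ij - beta_j)."""
    return (C - beta[None, :]).min(axis=1)

rng = np.random.default_rng(1)
C = rng.random((5, 6))
a, b = np.ones(5) / 5, np.ones(6) / 6
alpha = 0.1 * rng.normal(size=5)        # any rough dual iterate
beta = c_transform(alpha, C)            # alpha^c
alpha_cc = cbar_transform(beta, C)      # alpha^{c cbar}, pointwise >= alpha
lb_before = alpha @ a + beta @ b
lb_after = alpha_cc @ a + beta @ b      # improved lower bound on L_C(a, b)
```

On a d × d grid with a squared Euclidean cost, the same minimizations can be done in linear time via the Moreau-envelope algorithms cited above; the O(nm) version here is only meant to make the definition explicit.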
Note that m\u03b3\n\nt might be negative but can always be replaced by max(m\u03b3\n\nTherefore, after iterating the Sinkhorn map (5) t times, we have that if M \u03b3\nt is below a certain\ncriterion \u03b5, then we can guarantee that (cid:104)R\u03b3\nt , C(cid:105) is a fortiori an \u03b5-approximation of LC(a, b). Observe\nt , then (1 \u2212 \u03b5)M \u03b3\nthat one can also have a relative error control: if one has M \u03b3\nt \u2264 \u03b5M \u03b3\nt \u2264\nLC(a, b) \u2264 M \u03b3\nt , 0) since\nwe know C has non-negative entries (and therefore LC(a, b) \u2265 0), while M \u03b3\nt is always non-negative.\nDiscretization. For simplicity, we assume in the remaining that our diagrams have their support in\n[0, 1]2 \u2229 R2\n>. From a numerical perspective, encoding persistence diagrams as histograms on the\nsquare offers numerous advantages. Given a uniform grid of size d \u00d7 d on [0, 1]2, we associate to a\ngiven diagram D a matrix-shaped histogram a \u2208 Rd\u00d7d such that aij is the number of points in D\nbelonging to the cell located at position (i, j) in the grid (we transition to bold-faced small letters to\ninsist on the fact that these histograms must be stored as square matrices). To account for the total\nmass, we add an extra dimension encoding mass on {\u2206}. We extend the operator R to histograms,\nassociating to a histogram a \u2208 Rd\u00d7d its total mass on the (d2 + 1)-th coordinate. One can show that\nthe approximation error resulting from that discretization is bounded above by 1\np + |D2|\np )\n(see the supplementary material).\nConvolutions. In the Eulerian setting, where diagrams are matrix-shaped histograms of size d \u00d7 d =\nd2, the cost matrix C has size d2 \u00d7 d2. Since we will use large values of d to have low discretization\nerror (typically d = 100), instantiating C is usually intractable. However, Solomon et al. 
(2015)\n\nd (|D1|\n\n1\n\n1\n\n5\n\n0100200300400500Nbiterationst012345Transportcost(centered)uppercost(primal)lowercost(dual)truecost0.000.020.040.060.080.10Parameter\u03b3\u221220246Transportcost(centered)uppercost(primal)lowercost(dual)truecost0100200300400500Nbiterationst\u22121.75\u22121.50\u22121.25\u22121.00\u22120.75\u22120.50\u22120.250.00Transportcost(centered)sinkhorn(dual)costlowercost(dual)truecost\fC =\n\n\u2212\u2192c\u2206\n\nT\n\n(cid:19)\n\n\u2212\u2192c\u2206\n0\n\nshowed that for regular grids endowed with a separable cost, each Sinkhorn iteration (as well as\nother key operations such as evaluating Sinkhorn\u2019s divergence S\u03b3\nC) can be performed using Gaussian\nconvolutions, which amounts to performing matrix multiplications of size d \u00d7 d, without having\nto manipulate d2 \u00d7 d2 matrices. Our framework is slightly different due to the extra dimension\n{\u2206}, but we show that equivalent computational properties hold. This observation is crucial from a\nnumerical perspective. Our ultimate goal being to ef\ufb01ciently evaluate (11), (12) and (14), we provide\nimplementation details.\nLet (u, u\u2206) be a pair where u \u2208 Rd\u00d7d is a matrix-shaped histogram and u\u2206 \u2208 R+ is a real number\naccounting for the mass located on the virtual point {\u2206}. We denote by \u2212\u2192u the d2 \u00d7 1 column vector\nobtained when reshaping u. The (d2 + 1) \u00d7 (d2 + 1) cost matrix C and corresponding kernel K are\ngiven by\n\n(cid:18) (cid:98)C\nwhere (cid:98)C = ((cid:107)(i, i(cid:48)) \u2212 (j, j(cid:48))(cid:107)p\np)ii(cid:48). C and K as de\ufb01ned above\nwill never be instantiated, because we can rely instead on c \u2208 Rd\u00d7d de\ufb01ned as cij = |i \u2212 j|p and\nk = e\u2212 c\n\u03b3 .\nProposition 2 (Iteration of Sinkhorn map). 
The application of K to (u, u\u2206) can be performed as:\n(11)\n\n(cid:0)k(kuT )T + u\u2206k\u2206,(cid:104)u, k\u2206(cid:105) + u\u2206\n\np)ii(cid:48),jj(cid:48), c\u2206 = ((cid:107)(i, i(cid:48)) \u2212 \u03c0\u2206((i, i(cid:48)))(cid:107)p\n\n(cid:32)(cid:98)K := e\u2212\n\n(cid:98)C\n\u03b3 \u2212\u2192k\u2206 := e\u2212\n\nwhere (cid:104)\u00b7,\u00b7(cid:105) denotes the Froebenius dot-product in Rd\u00d7d.\nWe now introduce m := k (cid:12) c and m\u2206 := k\u2206 (cid:12) c\u2206 ((cid:12) denotes term-wise multiplication).\nProposition 3 (Computation of S\u03b3\ndiag(\u2212\u2192u , u\u2206)Kdiag(\u2212\u2192v , v\u2206) can be computed as:\n\nC). Let (u, u\u2206), (v, v\u2206) \u2208 Rd\u00d7d+1. The transport cost of P :=\n(cid:104)diag(\u2212\u2192u , u\u2206)Kdiag(\u2212\u2192v , v\u2206), C(cid:105) = (cid:104)diag(\u2212\u2192u )(cid:98)Kdiag(\u2212\u2192v ),(cid:98)C(cid:105)+u\u2206 (cid:104)v, m\u2206(cid:105)+v\u2206 (cid:104)u, m\u2206(cid:105) , (12)\n\n(u, u\u2206) (cid:55)\u2192\n\n, K =\n\n\u2212\u2192k\u2206\n\nT\n\n(cid:33)\n\n\u2212\u2192c\u2206\n\n\u03b3\n\n,\n\n(cid:1)\n\n1\n\nwhere the \ufb01rst term can be computed as:\n\n(cid:104)diag(\u2212\u2192u )(cid:98)Kdiag(\u2212\u2192v ),(cid:98)C(cid:105) = (cid:107)u (cid:12)\n\n(13)\nFinally, consider two histograms (a, a\u2206), (b, b\u2206) \u2208 Rd\u00d7d \u00d7 R, let R \u2208 \u03a0((a, a\u2206), (b, b\u2206)) be the\nrounded matrix of P (see the supplementary material or (Altschuler et al., 2017)). Let r(P ), c(P ) \u2208\nRd\u00d7d \u00d7 R denote the \ufb01rst and second marginal of P respectively. 
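A quick numerical check of the separable structure behind Proposition 2: assuming the grid cost factors as $c((i,i'),(j,j')) = |i-j|^p + |i'-j'|^p$, the big kernel is a tensor product and can be applied with two d × d matrix products plus one inner product for the {Δ} coordinate. The sketch below (variable names ours) verifies this against the materialized d² × d² kernel on a small grid.

```python
import numpy as np

d, p, gamma = 6, 2.0, 0.5
i = np.arange(d)
c1 = np.abs(i[:, None] - i[None, :]) ** p          # 1-d cost factor c
k = np.exp(-c1 / gamma)                            # 1-d kernel factor k
grid = (i + 0.5) / d                               # cell centers in [0, 1]
# Cost from cell (i, i') to its diagonal projection: 2 * (|y - x| / 2)^p.
c_delta = 2.0 * (np.abs(grid[None, :] - grid[:, None]) / 2.0) ** p
k_delta = np.exp(-c_delta / gamma)

def apply_K(u, u_delta):
    """One product with the (d^2 + 1)-sized kernel K, done with two d x d
    matrix products plus one inner product (k is symmetric)."""
    return k @ u @ k + u_delta * k_delta, float((u * k_delta).sum()) + u_delta

u = np.random.default_rng(3).random((d, d))
out, out_delta = apply_K(u, 0.7)

# Brute-force check: materialize the full separable kernel this one time.
C_big = c1[:, None, :, None] + c1[None, :, None, :]   # cost((i,i'),(j,j'))
K_big = np.exp(-C_big.reshape(d * d, d * d) / gamma)
ref = K_big @ u.ravel() + 0.7 * k_delta.ravel()
ref_delta = float(u.ravel() @ k_delta.ravel()) + 0.7
```

The separable version costs O(d^3) per application instead of O(d^4), which is what makes d ≈ 100 grids and GPU batching practical.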
We introduce (using term-wise\nmin and divisions):\n\n(cid:107)1.\n\n(cid:0)m(kvT )T + k(mvT )T(cid:1)\n\n(cid:19)\n\n(cid:18) (a, a\u2206)\n\nr(P )\n\n(cid:18) (b, b\u2206)\n\nc(diag(X)P )\n\n(cid:19)\n\n, 1\n\n,\n\nX = min\n\n, 1\n\n,\n\nY = min\n\nalong with P (cid:48) = diag(X)P diag(Y ) and the marginal errors:\n\n(er, (er)\u2206) = (a, a\u2206) \u2212 r(P (cid:48)),\n\n(ec, (ec)\u2206) = (b, b\u2206) \u2212 c(P (cid:48)),\n\nProposition 4 (Computation of upper bound (cid:104)R, C(cid:105)). The transport cost induced by R can be\ncomputed as:\n\n(cid:104)R, C(cid:105) =(cid:104)diag(X (cid:12) (u, u\u2206))Kdiag(Y (cid:12) (v, v\u2206)), C(cid:105)\n\n1\n\n+\n\nr cec(cid:107)1 + (cid:107)erceT\nNote that the \ufb01rst term can be computed using (12)\n\n(cid:107)ec(cid:107)1 + (ec)\u2206\n\n(cid:107)eT\n\n(cid:0)\n\nc (cid:107)1 + (ec)\u2206 (cid:104)er, c\u2206(cid:105) + (er)\u2206 (cid:104)ec, c\u2206(cid:105)\n\n(cid:1).\n\n(14)\n\nParallelization and GPU. Using a Eulerian representation is particularly bene\ufb01cial when ap-\nplying Sinkhorn\u2019s algorithm, as shown by Cuturi (2013).\nIndeed, the Sinkhorn map (5) only\ninvolves matrix-vector operations. When dealing with a large number of histograms, concate-\nnating these histograms and running Sinkhorn\u2019s iterations in parallel as matrix-matrix product\nresults in signi\ufb01cant speedup that can exploit GPGPU to compare a large number of pairs simul-\ntaneously. This makes our approach especially well-suited for large sets of persistence diagrams.\n\n6\n\n\fFigure 4: Barycenter estimation for different \u03b3s with a simple set of 3 PDs (red, blue and green). The smaller\nthe \u03b3, the better the estimation (E decreases, note the \u03b3-axis is \ufb02ipped on the right plot), at the cost of more\niterations in Alg. 2. 
The mass appearing along the diagonal is a consequence of entropic smoothing: it does not\ncost much to delete while it increases the entropy of transport plans.\n\nWe can now estimate distances be-\ntween persistence diagrams with\nAlg. 1 in parallel by performing only\n(d \u00d7 d)-sized matrix multiplications,\nleading to a computational scaling in\nd3 where d is the grid resolution pa-\nrameter. Note that a standard stopping\nthreshold in Sinkhorn iteration pro-\ncess is to check the error to marginals\ndist(P, \u03a0(a, b)), as motivated by (9).\n\nAlgorithm 1 Sinkhorn divergence for persistence diagrams\nInput: Pairs of PDs (Di, D(cid:48)i)i, smoothing parameter \u03b3 >\n0, grid step d \u2208 N, stopping criterion, initial (u, v).\nOutput: Approximation of all (dp(Di, D(cid:48)i)p)i, upper and\nlower bounds if wanted.\ninit Cast Di, D(cid:48)i as histograms ai, bi on a d \u00d7 d grid\nwhile stopping criterion not reached do\nIterate in parallel (5) (u, v) (cid:55)\u2192 S(u, v) using (11)\n\nend while\nCompute all S\u03b3\nif Want a upper bound then\n\nC(ai + Rbi, bi + Rai) using (12)\n\nend if\nif Want a lower bound then\n\nCompute (cid:104)Ri, C(cid:105) in parallel using (14)\n\n4 Smoothed barycenters\nfor persistence diagrams\nOT formulation for barycenters.\nWe show in this section that the bene-\n\ufb01ts of entropic regularization also ap-\nply to the computation of barycenters of PDs. As the space of PD is not Hilbertian but only a metric\nspace, the natural de\ufb01nition of barycenters is to formulate them as Fr\u00e9chet means for the dp metric,\nas \ufb01rst introduced (for PDs) in (Mileyko et al., 2011).\nDe\ufb01nition. Given a set of persistence diagrams D1, . . . , DN , a barycenter of D1 . . . 
DN is any\nsolution of the following minimization problem:\n\nt )c, bi(cid:105) using (Lucet, 2010)\n\nt )c\u00afc, ai(cid:105)+(cid:104)(\u03b1\u03b3\n\nCompute (cid:104)(\u03b1\u03b3\n\nend if\n\nN(cid:88)\n\ni=1\n\nminimize\n\u00b5\u2208M+(R2\n\n>)E(\u00b5) :=\n\nLC(\u00b5 + RDi, Di + R\u00b5)\n\n(15)\n\n>) denotes the set of non-negative \ufb01nite measures supported on R2\n\net al. (2014) proved the existence of minimizers of \u02c6E and proposed an algorithm that converges to a\nlocal minimum of the functional, using the Hungarian algorithm as a subroutine. Their algorithm\n\nwhere C is de\ufb01ned as in (8) with p = 2 (but our approach adapts easily to any \ufb01nite p \u2265 1), and\nM+(R2\nLet (cid:98)E denotes the restriction of E to the space of persistence diagrams (\ufb01nite point measures). Turner\n>. E(\u00b5) is the energy of \u00b5.\nwill be referred to as the B-Munkres Algorithm. The non-convexity of (cid:98)E can be a real limitation in\npractice since (cid:98)E can have arbitrarily bad local minima (see Lemma 1 in the supplementary material).\nNote that minimizing E instead of (cid:98)E will not give strictly better minimizers (see Proposition 6 in the\n\nsupplementary material). We then apply entropic smoothing to this problem. This relaxation offers\ndifferentiability and circumvents both non-convexity and numerical scalability.\nEntropic smoothing for PD barycenters. In addition to numerical ef\ufb01ciency, an advantage of\nsmoothed optimal transport is that a (cid:55)\u2192 L\u03b3\nC(a, b) is differentiable. In the Eulerian setting, its gradient\nis given by centering the vector \u03b3 log(u\u03b3) where u\u03b3 is a \ufb01xed point of the Sinkhorn map (5), see\n(cid:80)N\n(Cuturi & Doucet, 2014). This result can be adapted to our framework, namely:\nProposition 5. Let D1 . . . DN be PDs, and (ai)i the corresponding histograms on a d \u00d7 d grid. 
The gradient of the functional E^γ : z ↦ Σ_{i=1}^N L^γ_C(z + R a_i, a_i + R z) is given by

    ∇_z E^γ = γ Σ_{i=1}^N ( log(u^γ_i) + R^T log(v^γ_i) ),        (16)

where R^T denotes the adjoint operator of R and (u^γ_i, v^γ_i) is a fixed point of the Sinkhorn map obtained while transporting z + R a_i onto a_i + R z.

As in (Cuturi & Doucet, 2014), this result follows from the envelope theorem, with the added subtlety that z appears in both terms, which depend on u and v. This formula can be exploited to compute barycenters via gradient descent, yielding Algorithm 2. Following (Cuturi & Doucet, 2014, §4.2), we use a multiplicative update. This is a particular case of mirror descent (Beck & Teboulle, 2003) and is equivalent to a (Bregman) projected gradient descent on the positive orthant, retaining positive coefficients throughout the iterations.

Algorithm 2 Smoothed approximation of PD barycenter
Input: PDs D_1, . . . , D_N, learning rate λ, smoothing parameter γ > 0, grid step d ∈ N.
Output: Estimated barycenter z
Init: z uniform measure above the diagonal
Cast each D_i as a histogram a_i on a d × d grid
while z changes do
    Iterate S defined in (5) in parallel between all the pairs (z + R a_i)_i and (a_i + R z)_i, using (11)
    ∇ := γ Σ_i ( log(u^γ_i) + R^T log(v^γ_i) )
    z := z ⊙ exp(−λ∇)
end while
if the energy is wanted then
    Compute (1/N) Σ_i S^γ_C(z + R a_i, a_i + R z) using (12)
end if
Return z

[Figure: energy reached as a function of the smoothing parameter γ.]

As can be seen in Fig.
4, the barycentric persistence diagrams are smeared. If one wishes to recover more spiked diagrams, quantization and/or entropic sharpening (Solomon et al., 2015, §6.1) can be applied, as can smaller values of γ, at the expense of computational speed or numerical stability. We will consider these extensions in future work.

A comparison with linear representations. When doing statistical analysis with PDs, a standard approach is to transform a diagram into a finite-dimensional vector, in a stable way, and then perform statistical analysis and learning with a Euclidean structure. This approach does not preserve the Wasserstein-like geometry of the diagram space and thus loses the algebraic interpretability of PDs. Fig. 1 gives a qualitative illustration of the difference between Wasserstein barycenters (Fréchet means) of PDs and Euclidean barycenters (linear means) of persistence images (Adams et al., 2017), a commonly used vectorization for PDs (Makarenko et al., 2016; Zeppelzauer et al., 2016; Obayashi et al., 2018).

5 Experiments

All experiments are run with p = 2, but would work with any finite p ≥ 1. This choice is consistent with the work of Turner et al. (2014) for barycenter estimation.

A large-scale approximation. Iterations of the Sinkhorn map (5) yield a transport cost whose value converges to the true transport cost as γ → 0 and the number of iterations t → ∞ (Cuturi, 2013). We quantify in Fig. 3 this convergence experimentally, using the upper and lower bounds provided in (10), through t and for decreasing γ. We consider a set of N = 100 pairs of diagrams randomly generated with 100 to 150 points in each diagram, discretized on a 100 × 100 grid. We run Alg. 1 for different γ ranging from 10⁻¹ to 5·10⁻⁴, along with the corresponding upper and lower bounds described in (10).
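The distance-estimation pipeline above (cast each PD as a histogram on a d × d grid with a diagonal bin, augment each histogram with the total mass of the other, run Sinkhorn scaling iterations) can be sketched in a few lines of NumPy. This is a simplified illustration under our own conventions, not the authors' implementation: the helper names are ours, the cost matrix is formed densely instead of using the convolution trick of (11), and the upper/lower bounds of (10) are omitted.

```python
import numpy as np

def pd_to_hist(points, d=20):
    """Cast a persistence diagram (points in [0,1]^2 above the diagonal)
    as a histogram on a d x d grid, with one extra bin for the diagonal."""
    h = np.zeros(d * d + 1)
    for i, j in np.minimum((np.asarray(points) * d).astype(int), d - 1):
        h[i * d + j] += 1.0
    return h

def pd_cost_matrix(d=20):
    """Squared Euclidean cost between grid cells; the last row/column holds
    the squared distance from a cell to its projection on the diagonal."""
    xs, ys = np.divmod(np.arange(d * d), d)
    pts = np.stack([xs, ys], axis=1) / d
    C = np.zeros((d * d + 1, d * d + 1))
    C[:-1, :-1] = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
    gap2 = (pts[:, 0] - pts[:, 1]) ** 2 / 2.0  # dist^2 to the diagonal
    C[:-1, -1] = C[-1, :-1] = gap2
    return C

def sinkhorn_plan(a, b, C, gamma=0.05, n_iter=500):
    """Plain entropy-regularized Sinkhorn scaling between histograms a, b;
    returns the primal transport plan u . K . v."""
    K = np.exp(-C / gamma)  # Gibbs kernel
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]

# Two toy diagrams; each histogram is augmented with the total mass of the
# other on the diagonal bin, which balances the two measures (the mu + R nu
# construction used throughout the paper).
a = pd_to_hist([(0.2, 0.8), (0.3, 0.5)])
b = pd_to_hist([(0.25, 0.75)])
a_aug, b_aug = a.copy(), b.copy()
a_aug[-1] += b.sum()
b_aug[-1] += a.sum()
C = pd_cost_matrix()
P = sinkhorn_plan(a_aug, b_aug, C)
cost = (P * C).sum()  # smoothed approximation of the squared d_2 distance
```

In the paper, the dense `K @ v` products are replaced by the separable convolutions of (11), which avoids ever forming C and yields the d³ scaling and GPU streaming discussed above; smaller values of γ tighten the approximation at the cost of more iterations.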
For each pair of diagrams, we center our estimates by removing the true distance, so that the target cost is 0 across all pairs. We plot the median, as well as the top-90% and bottom-10% percentiles, for both bounds. Using the C-transform provides a much better lower bound in our experiments. It is, however, inefficient in practice: despite a theoretical complexity linear in the grid size, the sequential structure of the algorithms described in (Lucet, 2010) makes them unsuited for GPGPU, to our knowledge.

We then compare the scalability of Alg. 1, with respect to the number of points in the diagrams, with that of Kerber et al. (2017), which provides a state-of-the-art algorithm with publicly available code, referred to as Hera, to estimate distances between diagrams. For both algorithms, we compute the average time t_n to estimate a distance between two random diagrams having from n to 2n points, where n ranges from 10 to 5000. In order to compare their scalability, we plot in Fig. 5 the ratio t_n/t_10 of both algorithms, with γ_n = 10⁻¹/n in Alg. 1.

Figure 5: Comparison of the scalings of Hera and Sinkhorn (Alg. 1) as the number of points in the diagrams increases (log-log scale).

Figure 7: Qualitative comparison of B-Munkres and our Alg. 2. (a) Input set of N = 3 diagrams with n = 20 points each. (b) Output of B-Munkres when initialized on the blue diagram (orange squares) and input data (grey scale). (c) Output of B-Munkres initialized on the green diagram. (d) Output of Alg. 2 on a 100 × 100 grid, γ = 5·10⁻⁴, learning rate λ = 5, Sinkhorn stopping criterion (distance to marginals): 0.001, gradient descent performed until |E(z_{t+1})/E(z_t) − 1| < 0.01. As one can see, the localization of masses is similar. Initialization of B-Munkres is made on one of the input diagrams, as indicated in (Turner et al., 2014, Alg.
1), and leads to convergence to different local minima. Our convex approach (Alg. 2) performs better (lower energy). As a baseline, the energy of the naive arithmetic mean of the three diagrams is 0.72.

Figure 8: Illustration of our k-means algorithm. From left to right: 20 diagrams extracted from horses and camels plotted together (one color for each diagram); the centroid they are matched with, provided by our algorithm; 20 diagrams of heads and faces, along with their centroid; decrease of the objective function. Running time depends on many parameters, along with the random initialization of k-means. As an order of magnitude, it takes from 40 to 80 minutes with this 5000-PD dataset on a P100 GPU.

Fast barycenters and k-means on large PD sets. We compare our Alg. 2 (referred to as Sinkhorn) to the combinatorial algorithm of Turner et al. (2014) (referred to as B-Munkres). We use the script munkres.py provided on the website of K. Turner for their implementation. We record in Fig. 6 the running times of both algorithms on a set of 10 diagrams having from n to 2n points, with n ranging from 1 to 500, on an Intel Xeon 2.3 GHz (CPU) and a P100 (GPU, Sinkhorn only). When running Alg. 2, the gradient descent is performed until |E(z_{t+1})/E(z_t) − 1| < 0.01, with γ = 10⁻¹/n and d = 50. Our experiment shows that Alg. 2 drastically outperforms B-Munkres as the number of points n increases. We interrupt B-Munkres at n = 30, after which computational time becomes an issue.

Aside from the computational efficiency, we highlight the benefits of operating with a convex formulation in Fig. 7. Due to non-convexity, the B-Munkres algorithm is only guaranteed to converge to a local minimum, and its output depends on initialization. We illustrate on a toy set of N = 3 diagrams how our algorithm avoids local minima thanks to the Eulerian approach we take.

We now merge Alg. 1 and Alg. 2 in order to perform unsupervised clustering via k-means on PDs. We work with the
We work with the\n3D-shape database provided by Sumner & Popovi\u00b4c and\ngenerate diagrams in the same way as in (Carri\u00e8re et al.,\n2015), working in practice with 5000 diagrams with 50 to\n100 points each. The database contains 6 classes: camel, cat, elephant, horse, head and face.\nIn practice, this unsupervised clustering algorithm detects two main clusters: faces and heads on one\nhand, camels and horses on the other hand are systematically grouped together. Fig. 8 illustrates the\nconvergence of our algorithm and the computed centroids for the aforementioned clusters.\n6 Conclusion\nIn this work, we took advantage of a link between PD metrics and optimal transport to leverage and\nadapt entropic regularization for persistence diagrams. Our approach relies on matrix manipulations\nrather than combinatorial computations, providing parallelization and ef\ufb01cient use of GPUs. We\nprovide bounds to control approximation errors. We use these differentiable approximations to\nestimate barycenters of PDs signi\ufb01cantly faster than existing algorithm, and showcase their application\nby clustering thousand diagrams built from real data. We believe this \ufb01rst step will open the way for\nnew statistical tools for TDA and ambitious data analysis applications of persistence diagrams.\n\nFigure 6: Average running times for B-\nMunkres (blue) and Sinkhorn (red) algo-\nrithms (log-log scale) to average 10 PDs.\n\n9\n\n(a)Diagramset(b)B-Munkres(c)B-Munkres(d)Alg2FinalenergyB-Munkres(b)B-Munkres(c)Alg.2(d)0.5890.5550.54205101520NbIterationink-means0.1250.1300.1350.1400.1450.1500.1550.1600.165Energyofk-means100101102Nbpointsindiagramsn(log-scale)10\u22121100101102103Timetoconverge(s)(log-scale)Sinkhorn(cpu)Sinkhorn(gpu)B-Munkres\fAcknowledgments. We thank the anonymous reviewers for the fruitful discussion. TL was sup-\nported by the AMX, \u00c9cole polytechnique. 
MC acknowledges the support of a Chaire d'Excellence de l'Idex Paris-Saclay.

References

Adams, H., Emerson, T., Kirby, M., Neville, R., Peterson, C., Shipman, P., Chepushtanova, S., Hanson, E., Motta, F., and Ziegelmeier, L. Persistence images: a stable vector representation of persistent homology. Journal of Machine Learning Research, 18(8):1–35, 2017.

Agueh, M. and Carlier, G. Barycenters in the Wasserstein space. SIAM Journal on Mathematical Analysis, 43(2):904–924, 2011.

Altschuler, J., Weed, J., and Rigollet, P. Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration. In Advances in Neural Information Processing Systems, pp. 1961–1971, 2017.

Anderes, E., Borgwardt, S., and Miller, J. Discrete Wasserstein barycenters: optimal transport for discrete data. Mathematical Methods of Operations Research, 84(2):389–409, 2016.

Beck, A. and Teboulle, M. Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters, 31(3):167–175, 2003.

Benamou, J.-D., Carlier, G., Cuturi, M., Nenna, L., and Peyré, G. Iterative Bregman projections for regularized transportation problems. SIAM Journal on Scientific Computing, 37(2):A1111–A1138, 2015.

Bubenik, P. Statistical topological data analysis using persistence landscapes. The Journal of Machine Learning Research, 16(1):77–102, 2015.

Carlier, G., Oberman, A., and Oudet, E. Numerical methods for matching for teams and Wasserstein barycenters. ESAIM: Mathematical Modelling and Numerical Analysis, 49(6):1621–1642, 2015.

Carrière, M., Oudot, S. Y., and Ovsjanikov, M. Stable topological signatures for points on 3D shapes. In Computer Graphics Forum, volume 34, pp. 1–12. Wiley Online Library, 2015.

Carrière, M., Cuturi, M., and Oudot, S. Sliced Wasserstein kernel for persistence diagrams. In 34th International Conference on Machine Learning, 2017.

Chazal, F., Cohen-Steiner, D., Glisse, M., Guibas, L. J., and Oudot, S. Y. Proximity of persistence modules and their diagrams. In Proceedings of the twenty-fifth annual symposium on Computational geometry, pp. 237–246. ACM, 2009.

Chazal, F., De Silva, V., and Oudot, S. Persistence stability for geometric complexes. Geometriae Dedicata, 173(1):193–214, 2014.

Cohen-Steiner, D., Edelsbrunner, H., and Harer, J. Stability of persistence diagrams. Discrete & Computational Geometry, 37(1):103–120, 2007.

Cuturi, M. Sinkhorn distances: lightspeed computation of optimal transport. In Advances in Neural Information Processing Systems, pp. 2292–2300, 2013.

Cuturi, M. and Doucet, A. Fast computation of Wasserstein barycenters. In International Conference on Machine Learning, pp. 685–693, 2014.

Dvurechensky, P., Gasnikov, A., and Kroshnin, A. Computational optimal transport: complexity by accelerated gradient descent is better than by Sinkhorn's algorithm. In Dy, J. and Krause, A. (eds.), Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pp. 1367–1376, Stockholmsmässan, Stockholm, Sweden, 10–15 Jul 2018. PMLR. URL http://proceedings.mlr.press/v80/dvurechensky18a.html.

Edelsbrunner, H. and Harer, J. Computational topology: an introduction. American Mathematical Soc., 2010.

Edelsbrunner, H., Letscher, D., and Zomorodian, A. Topological persistence and simplification. In Foundations of Computer Science, 2000. Proceedings. 41st Annual Symposium on, pp. 454–463. IEEE, 2000.

Fréchet, M. Les éléments aléatoires de nature quelconque dans un espace distancié. In Annales de l'institut Henri Poincaré, volume 10, pp. 215–310. Presses universitaires de France, 1948.

Guittet, K. Extended Kantorovich norms: a tool for optimization. PhD thesis, INRIA, 2002.

Hiraoka, Y., Nakamura, T., Hirata, A., Escolar, E. G., Matsue, K., and Nishiura, Y. Hierarchical structures of amorphous solids characterized by persistent homology. Proceedings of the National Academy of Sciences, 113(26):7035–7040, 2016.

Kerber, M., Morozov, D., and Nigmetov, A. Geometry helps to compare persistence diagrams. Journal of Experimental Algorithmics (JEA), 22(1):1–4, 2017.

Li, C., Ovsjanikov, M., and Chazal, F. Persistence-based structural recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1995–2002, 2014.

Lucet, Y. What shape is your conjugate? A survey of computational convex analysis and its applications. SIAM Review, 52(3):505–542, 2010.

Lum, P., Singh, G., Lehman, A., Ishkanov, T., Vejdemo-Johansson, M., Alagappan, M., Carlsson, J., and Carlsson, G. Extracting insights from the shape of complex data using topology. Scientific Reports, 3:1236, 2013.

Makarenko, N., Kalimoldayev, M., Pak, I., and Yessenaliyeva, A. Texture recognition by the methods of topological data analysis. Open Engineering, 6(1), 2016.

Mileyko, Y., Mukherjee, S., and Harer, J. Probability measures on the space of persistence diagrams. Inverse Problems, 27(12):124007, 2011.

Moreau, J.-J. Proximité et dualité dans un espace hilbertien. Bull. Soc. Math. France, 93(2):273–299, 1965.

Nicolau, M., Levine, A. J., and Carlsson, G. Topology based data analysis identifies a subgroup of breast cancers with a unique mutational profile and excellent survival. Proceedings of the National Academy of Sciences, 108(17):7265–7270, 2011.

Obayashi, I., Hiraoka, Y., and Kimura, M. Persistence diagrams with linear machine learning models. Journal of Applied and Computational Topology, 1(3-4):421–449, 2018.

Peyré, G. and Cuturi, M. Computational Optimal Transport, 2018. URL http://arxiv.org/abs/1803.00567.

Reininghaus, J., Huber, S., Bauer, U., and Kwitt, R. A stable multi-scale kernel for topological machine learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4741–4748, 2015.

Santambrogio, F. Optimal transport for applied mathematicians. Birkhäuser, NY, 2015.

Schrijver, A. Theory of linear and integer programming. John Wiley & Sons, 1998.

Solomon, J., De Goes, F., Peyré, G., Cuturi, M., Butscher, A., Nguyen, A., Du, T., and Guibas, L. Convolutional Wasserstein distances: efficient optimal transportation on geometric domains. ACM Transactions on Graphics (TOG), 34(4):66, 2015.

Sumner, R. W. and Popović, J. Deformation transfer for triangle meshes. In ACM Transactions on Graphics (TOG), volume 23, pp. 399–405. ACM, 2004.

Turner, K. Means and medians of sets of persistence diagrams. arXiv preprint arXiv:1307.8300, 2013.

Turner, K., Mileyko, Y., Mukherjee, S., and Harer, J. Fréchet means for distributions of persistence diagrams. Discrete & Computational Geometry, 52(1):44–70, 2014.

Villani, C. Topics in optimal transportation. Number 58. American Mathematical Soc., 2003.

Villani, C. Optimal transport: old and new, volume 338. Springer Science & Business Media, 2008.

Zeppelzauer, M., Zieliński, B., Juda, M., and Seidl, M. Topological descriptors for 3D surface analysis. In International Workshop on Computational Topology in Image Context, pp. 77–87. Springer, 2016.

Zomorodian, A. and Carlsson, G. Computing persistent homology. Discrete & Computational Geometry, 33(2):249–274, 2005.