{"title": "Persistence Fisher Kernel: A Riemannian Manifold Kernel for Persistence Diagrams", "book": "Advances in Neural Information Processing Systems", "page_first": 10007, "page_last": 10018, "abstract": "Algebraic topology methods have recently played an important role for statistical analysis with complicated geometric structured data such as shapes, linked twist maps, and material data. Among them, \\textit{persistent homology} is a well-known tool to extract robust topological features, and outputs as \\textit{persistence diagrams} (PDs). However, PDs are point multi-sets which can not be used in machine learning algorithms for vector data. To deal with it, an emerged approach is to use kernel methods, and an appropriate geometry for PDs is an important factor to measure the similarity of PDs. A popular geometry for PDs is the \\textit{Wasserstein metric}. However, Wasserstein distance is not \\textit{negative definite}. Thus, it is limited to build positive definite kernels upon the Wasserstein distance \\textit{without approximation}. In this work, we rely upon the alternative \\textit{Fisher information geometry} to propose a positive definite kernel for PDs \\textit{without approximation}, namely the Persistence Fisher (PF) kernel. Then, we analyze eigensystem of the integral operator induced by the proposed kernel for kernel machines. Based on that, we derive generalization error bounds via covering numbers and Rademacher averages for kernel machines with the PF kernel. Additionally, we show some nice properties such as stability and infinite divisibility for the proposed kernel. Furthermore, we also propose a linear time complexity over the number of points in PDs for an approximation of our proposed kernel with a bounded error. Throughout experiments with many different tasks on various benchmark datasets, we illustrate that the PF kernel compares favorably with other baseline kernels for PDs.", "full_text": "Persistence Fisher Kernel: A Riemannian Manifold\n\nKernel for Persistence Diagrams\n\nRIKEN Center for Advanced Intelligence Project, Japan\n\nTam Le\n\ntam.le@riken.jp\n\nMakoto Yamada\n\nKyoto University, Japan\n\nRIKEN Center for Advanced Intelligence Project, Japan\n\nmakoto.yamada@riken.jp\n\nAbstract\n\nAlgebraic topology methods have recently played an important role for statistical\nanalysis with complicated geometric structured data such as shapes, linked twist\nmaps, and material data. Among them, persistent homology is a well-known tool\nto extract robust topological features, and outputs as persistence diagrams (PDs).\nHowever, PDs are point multi-sets which can not be used in machine learning\nalgorithms for vector data. To deal with it, an emerged approach is to use kernel\nmethods, and an appropriate geometry for PDs is an important factor to measure the\nsimilarity of PDs. A popular geometry for PDs is the Wasserstein metric. However,\nWasserstein distance is not negative de\ufb01nite. Thus, it is limited to build positive\nde\ufb01nite kernels upon the Wasserstein distance without approximation. In this work,\nwe rely upon the alternative Fisher information geometry to propose a positive\nde\ufb01nite kernel for PDs without approximation, namely the Persistence Fisher (PF)\nkernel. Then, we analyze eigensystem of the integral operator induced by the\nproposed kernel for kernel machines. Based on that, we derive generalization error\nbounds via covering numbers and Rademacher averages for kernel machines with\nthe PF kernel. 
Additionally, we show some nice properties such as stability and infinite divisibility for the proposed kernel. Furthermore, we also propose a linear time complexity over the number of points in PDs for an approximation of our proposed kernel with a bounded error. In experiments with many different tasks on various benchmark datasets, we illustrate that the PF kernel compares favorably with other baseline kernels for PDs.

1 Introduction

Using algebraic topology methods for statistical data analysis has recently received a lot of attention from the machine learning community [Chazal et al., 2015, Kwitt et al., 2015, Bubenik, 2015, Kusano et al., 2016, Chen and Quadrianto, 2016, Carriere et al., 2017, Hofer et al., 2017, Adams et al., 2017, Kusano et al., 2018]. Algebraic topology methods can produce a robust descriptor which can give useful insight when one deals with complicated geometric structured data such as shapes, linked twist maps, and material data. More specifically, algebraic topology methods are applied in various research fields such as biology [Kasson et al., 2007, Xia and Wei, 2014, Cang et al., 2015], brain science [Singh et al., 2008, Lee et al., 2011, Petri et al., 2014], and information science [De Silva et al., 2007, Carlsson et al., 2008], to name a few.

32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.

Figure 1: An illustration of a persistence diagram of a real-valued function f. The orange horizontal lines are the boundaries of the sublevel sets f^{-1}((-∞, t]). For the 0-dimensional topological features (connected components), the birth events happen at t = t1, t2, t3 and the corresponding death events occur at t = +∞, t5, t4 respectively. Therefore, the persistence diagram of f is Dg_f = {(t1, +∞), (t2, t5), (t3, t4)}.

In algebraic topology, persistent homology is an important method to extract robust topological information; it outputs point multisets, called persistence diagrams (PDs) [Edelsbrunner et al., 2000]. Since PDs can have different numbers of points, it is not straightforward to plug PDs into traditional statistical machine learning algorithms, which often assume a vector representation for data.

Related work. There are two main approaches in topological data analysis: (i) explicit vector representations for PDs, such as computing and sampling functions built from PDs (i.e. persistence landscapes [Bubenik, 2015], tangent vectors from the mean of the square-root framework with principal geodesic analysis [Anirudh et al., 2016], or persistence images [Adams et al., 2017]), using points in PDs as roots of a complex polynomial for concatenated-coefficient vector representations [Di Fabio and Ferri, 2015], or using distance matrices of points in PDs for sorted-entry vector representations [Carriere et al., 2015]; (ii) implicit representations via kernels, such as the Persistence Scale Space (PSS) kernel, motivated by a heat diffusion problem with a Dirichlet boundary condition [Reininghaus et al., 2015], the Persistence Weighted Gaussian (PWG) kernel via kernel mean embedding [Kusano et al., 2016], or the Sliced Wasserstein (SW) kernel under Wasserstein geometry [Carriere et al., 2017]. In particular, the geometry on PDs plays an important role. One of the most popular geometries for PDs is the Wasserstein metric [Villani, 2003, Peyre and Cuturi, 2017].
However, it is well-known that the Wasserstein distance is not negative definite [Reininghaus et al., 2015] (Appendix A). Consequently, one may not obtain positive definite kernels built directly upon the Wasserstein distance. Thus, it may be necessary to approximate the Wasserstein distance to achieve positive definiteness for kernels relying on Wasserstein geometry. For example, Carriere et al. [2017] used the SW distance, an approximation of the Wasserstein distance, to construct the positive definite SW kernel.

Contributions. In this work, we focus on the implicit kernel representation approach for PDs, and follow Anirudh et al. [2016] to explore an alternative Riemannian geometry, namely the Fisher information metric [Amari and Nagaoka, 2007, Lee, 2006] for PDs. Our contribution is two-fold: (i) we propose a positive definite kernel, namely the Persistence Fisher (PF) kernel for PDs. The proposed kernel preserves the geometry of the Riemannian manifold well since it is directly built upon the Fisher information metric for PDs without approximation. (ii) We analyze the eigensystem of the integral operator induced by the PF kernel for kernel machines. Based on that, we derive generalization error bounds via covering numbers and Rademacher averages for kernel machines with the PF kernel. Additionally, we provide some nice properties such as a bound for the proposed kernel induced squared distance with respect to the geodesic distance, which can be interpreted as stability in a similar sense as the work of Kwitt et al. [2015] and Reininghaus et al. [2015] with Wasserstein geometry, and infinite divisibility for the proposed kernel. Furthermore, we describe a linear time complexity over the number of points in PDs for an approximation of the PF kernel with a bounded error via the Fast Gauss Transform [Greengard and Strain, 1991, Morariu et al., 2009].

2 Background

Persistence diagrams. Persistent homology (PH) [Edelsbrunner and Harer, 2008] is a popular technique to extract robust topological features (i.e. connected components, rings, cavities) of real-valued functions.
Given f : X → R, PH considers the family of sublevel sets of f (i.e. f^{-1}((-∞, t]), t ∈ R) and records all topological events (i.e. births and deaths of topological features) in f^{-1}((-∞, t]) as t goes from -∞ to +∞. PH outputs a 2-dimensional point multiset, called a persistence diagram (PD), illustrated in Figure 1, where each 2-dimensional point represents the lifespan of a particular topological feature, with its birth and death time as its coordinates.
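To make the sublevel-set picture of Figure 1 concrete, the following is a minimal Python sketch (our own illustration, not part of the paper and not the DIPHA pipeline used later in the experiments) that computes the 0-dimensional persistence diagram of a function sampled on a line, using a union-find sweep and the elder rule.

```python
import numpy as np

def sublevel_persistence_0d(f):
    """0-dimensional PD of the sublevel-set filtration of a function
    sampled on a path graph (vertices 0..n-1, edges between neighbors).
    Returns (birth, death) pairs; the global minimum pairs with +inf."""
    n, order = len(f), np.argsort(f)
    parent, birth, dgm = {}, {}, []

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    for v in order:                         # sweep t from -inf to +inf
        parent[v], birth[v] = v, f[v]       # a component is born at f(v)
        for u in (v - 1, v + 1):
            if 0 <= u < n and u in parent:  # neighbor already in sublevel set
                ru, rv = find(u), find(v)
                if ru != rv:                # two components merge at f(v)
                    old, young = sorted((ru, rv), key=lambda r: birth[r])
                    if birth[young] < f[v]: # elder rule: the younger one dies
                        dgm.append((birth[young], f[v]))
                    parent[young] = old
    dgm += [(birth[r], np.inf) for r in {find(v) for v in parent}]
    return dgm

# e.g. a function with three local minima, as in Figure 1
xs = np.linspace(0.0, 1.0, 400)
print(sublevel_persistence_0d(np.sin(6 * np.pi * xs) + 0.5 * xs))
```

On this example the output contains two finite pairs and one essential pair (birth, +∞), mirroring Dg_f = {(t1, +∞), (t2, t5), (t3, t4)} in Figure 1.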
Wasserstein geometry. A persistence diagram Dg can be considered as a discrete measure \mu_{Dg} = \sum_{u \in Dg} \delta_u, where \delta_u is the Dirac unit mass on u. Therefore, the bottleneck metric (a.k.a. the ∞-Wasserstein metric) is a popular choice to measure distances on the set of PDs with bounded cardinalities. Given two PDs Dg_i and Dg_j, the bottleneck distance W_∞ [Cohen-Steiner et al., 2007, Carriere et al., 2017, Adams et al., 2017] is defined as

W_\infty(Dg_i, Dg_j) = \inf_{\gamma} \sup_{x \in Dg_i \cup \Delta} \| x - \gamma(x) \|_\infty,

where \Delta := \{(a, a) \mid a \in \mathbb{R}\} is the diagonal set, and \gamma : Dg_i \cup \Delta \to Dg_j \cup \Delta is bijective.

Fisher information geometry. Given a bandwidth σ > 0 and a set Θ, one can smooth and normalize \mu_{Dg} as follows,

\rho_{Dg} := \Big[ \frac{1}{Z} \sum_{u \in Dg} N(x; u, \sigma I) \Big]_{x \in \Theta},    (1)

where Z = \int_\Theta \sum_{u \in Dg} N(x; u, \sigma I)\, dx, N is a Gaussian function and I is an identity matrix. Therefore, each PD can be regarded as a point in a probability simplex P := \{\rho \mid \int \rho(x) dx = 1, \rho(x) \ge 0\}.¹ In case one chooses Θ to be an entire Euclidean space, each PD turns into a probability distribution as in [Anirudh et al., 2016, Adams et al., 2017].

¹ If Θ is an infinite set, then the corresponding probability simplex P has infinite dimensions.
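As an illustration of Equation (1) on a finite set Θ, here is a minimal numpy sketch (the function name and array conventions are our own); the normalizing constant of the Gaussian cancels in the division by Z, so it is omitted.

```python
import numpy as np

def smooth_pd(Dg, Theta, sigma):
    """Discretized version of Eq. (1): smooth the Dirac measure of a PD
    with an isotropic Gaussian and normalize over the finite set Theta.
    Dg and Theta are (k, 2) and (m, 2) arrays of planar points."""
    d2 = ((Theta[:, None, :] - Dg[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    rho = np.exp(-d2 / (2.0 * sigma ** 2)).sum(axis=1)        # sum of Gaussians N(x; u, sigma I)
    return rho / rho.sum()                                    # a point in the simplex P_{m-1}
```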
The Fisher information metric (FIM)² is a well-known Riemannian geometry on the probability simplex P, especially in information geometry [Amari and Nagaoka, 2007]. Given two points ρ_i and ρ_j in P, the Fisher information metric is defined as

d_P(\rho_i, \rho_j) = \arccos\Big( \int \sqrt{\rho_i(x)\,\rho_j(x)}\, dx \Big).    (2)

² FIM is also known as a particular pull-back metric on a Riemannian manifold [Le and Cuturi, 2015b].

3 Persistence Fisher Kernel (PF Kernel)

In this section, we propose the Persistence Fisher (PF) kernel for persistence diagrams (PDs).

For the bottleneck distance, two PDs Dg_i and Dg_j may be two discrete measures with different masses. So, the transportation plan γ is bijective between Dg_i ∪ Δ and Dg_j ∪ Δ instead of between Dg_i and Dg_j. Carriere et al. [2017], for instance, used the Wasserstein distance between Dg_i and Dg_j where its transportation plans operate between Dg_i ∪ Dg_{jΔ} and Dg_j ∪ Dg_{iΔ} (nonnegative, not necessarily normalized measures with the same masses). Here, we denote Dg_{iΔ} := {Π_Δ(u) | u ∈ Dg_i}, where Π_Δ(u) is the projection of a point u on the diagonal set Δ. Following this line of work, we also consider a distance between the two measures Dg_i ∪ Dg_{jΔ} and Dg_j ∪ Dg_{iΔ} as a distance between Dg_i and Dg_j for the Fisher information metric.

Definition 1. Let Dg_i, Dg_j be two finite and bounded persistence diagrams. The Fisher information metric between Dg_i and Dg_j is defined as follows,

d_{FIM}(Dg_i, Dg_j) := d_P\big( \rho_{(Dg_i \cup Dg_{j\Delta})}, \rho_{(Dg_j \cup Dg_{i\Delta})} \big).    (3)

Lemma 3.1. Let D be the set of bounded and finite persistence diagrams. Then, (d_{FIM} - τ) is negative definite on D for all τ ≥ π/2.

Proof. Consider the function τ - arccos(ξ) where τ ≥ π/2 and ξ ∈ [0, 1]; applying the Taylor series expansion of arccos(ξ) at 0, we have

\tau - \arccos(\xi) = \tau - \frac{\pi}{2} + \sum_{i=0}^{\infty} \frac{(2i)!}{2^{2i}\,(i!)^2\,(2i+1)}\, \xi^{2i+1}.

So, all coefficients of the Taylor series expansion are nonnegative. Following [Schoenberg, 1942] (Theorem 2, p. 102), for τ ≥ π/2 and ξ ∈ [0, 1], τ - arccos(ξ) is positive definite. Consequently, arccos(ξ) - τ is negative definite. Furthermore, for any PDs Dg_i and Dg_j in D, we have

0 \le \int \sqrt{\bar\rho_i(x)\,\bar\rho_j(x)}\, dx \le 1,

where we denote \bar\rho_i = \rho_{(Dg_i \cup Dg_{j\Delta})} and \bar\rho_j = \rho_{(Dg_j \cup Dg_{i\Delta})}. The lower bound is due to the nonnegativity of the probability simplex, while the upper bound follows from the Cauchy-Schwarz inequality. Hence, d_{FIM} - τ is negative definite on D for all τ ≥ π/2.

Based on Lemma 3.1, we propose a positive definite kernel for PDs under the Fisher information geometry by following [Berg et al., 1984] (Theorem 3.2.2, p. 74), namely the Persistence Fisher kernel,

k_{PF}(Dg_i, Dg_j) := \exp\big( -t\, d_{FIM}(Dg_i, Dg_j) \big),    (4)

where t is a positive scalar, since we can rewrite the Persistence Fisher kernel as k_{PF}(Dg_i, Dg_j) = \alpha \exp\big( -t\,(d_{FIM}(Dg_i, Dg_j) - \tau) \big) where τ ≥ π/2 and α = exp(-tτ) > 0. To the best of our knowledge, k_{PF} is the first kernel relying on the Fisher information geometry for measuring the similarity of PDs. Moreover, k_{PF} is positive definite without any approximation.
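A quick numerical illustration of Lemma 3.1 (a sketch under our own assumption of random points in a finite-dimensional simplex, not part of the paper): the Gram matrix of exp(-t d_P) is positive semi-definite up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, t = 50, 40, 2.0
rho = rng.dirichlet(np.ones(m), size=n)      # n random points in the simplex
B = np.sqrt(rho) @ np.sqrt(rho).T            # Bhattacharyya coefficients in [0, 1]
D = np.arccos(np.clip(B, -1.0, 1.0))         # Fisher information metric d_P, Eq. (2)
K = np.exp(-t * D)                           # Persistence Fisher kernel, Eq. (4)
print(np.linalg.eigvalsh(K).min())           # nonnegative up to numerical error
```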
Remark 1. Let S^+ := \{\nu \mid \int \nu^2(x) dx = 1, \nu(x) \ge 0\} be the positive orthant of the sphere, and define the Hellinger mapping h(·) := √·, where the square root is an element-wise function which transforms the probability simplex P into S^+. The Fisher information metric between ρ_i and ρ_j in P (Equation (2)) is equivalent to the geodesic distance between h(ρ_i) and h(ρ_j) in S^+. From [Levy and Loeve, 1965], the geodesic distance in S^+ is a measure definite kernel distance. Following [Istas, 2012] (Proposition 2.8), the geodesic distance in S^+ is negative definite. This result is also noted in [Feragen et al., 2015]. From [Berg et al., 1984] (Theorem 3.2.2, p. 74), the Persistence Fisher kernel is positive definite. Therefore, our proof technique is not only independent and direct for the Fisher information metric on the probability simplex, without relying on the geodesic distance on S^+, but is also valid for the case of infinite dimensions due to [Schoenberg, 1942] (Theorem 2, p. 102).

Remark 2. A closely related kernel to the Persistence Fisher kernel is the diffusion kernel [Lafferty and Lebanon, 2005] (p. 140), based on the heat equation on the Riemannian manifold defined by the Fisher information metric to exploit the geometric structure of statistical manifolds. A generalized family of kernels for the diffusion kernel is exploited in [Jayasumana et al., 2015, Feragen et al., 2015]. To the best of our knowledge, the diffusion kernel has not been used for measuring the similarity of PDs. If one uses the Fisher information metric (Definition 1) for PDs and then plugs the distance into the diffusion kernel, one obtains a form similar to our proposed Persistence Fisher kernel. A slight difference is that the diffusion kernel relies on d²_{FIM} while the Persistence Fisher kernel is built upon d_{FIM} itself. However, the Persistence Fisher kernel is positive definite while it is unclear whether the diffusion kernel is positive definite.³

³ Although the heat kernel is positive definite, the diffusion kernel on the probability simplex (the heat kernel on the multinomial manifold) does not have an explicit form. In practice, the diffusion kernel equation [Lafferty and Lebanon, 2005] (p. 140) is only its first-order approximation.

Computation. Given two finite PDs Dg_i and Dg_j with cardinalities bounded by N, in practice we consider a finite set Θ := Dg_i ∪ Dg_{jΔ} ∪ Dg_j ∪ Dg_{iΔ} without multiplicity in R² for the smoothed and normalized measures ρ(·) (Equation (1)).⁴ Letting m be the cardinality of Θ, we have m ≤ 4N. Consequently, the time complexity of ρ(·) is O(Nm). For acceleration, we propose to apply the Fast Gauss Transform [Greengard and Strain, 1991, Morariu et al., 2009] to approximate the sum of Gaussian functions in ρ(·) with a bounded error. The time complexity of ρ(·) is then reduced from O(Nm) to O(N + m). Due to the low dimension of points in PDs (R²), this approximation by the Fast Gauss Transform is very efficient in practice. Additionally, d_P (Equation (2)) is evaluated between two points in the m-dimensional probability simplex P_{m-1}, where P_{m-1} := \{x \mid x \in R^m_+, \|x\|_1 = 1\}. So, the time complexity of the Persistence Fisher kernel k_{PF} between two smoothed and normalized measures is O(m). Hence, the time complexity of k_{PF} between Dg_i and Dg_j is O(N²), or O(N) for the accelerated version with the Fast Gauss Transform.

⁴ We leave the computation with an infinite set Θ for future work.
We summarize the computation of d_{FIM} in Algorithm 1, where the second and third steps can be approximated with a bounded error via the Fast Gauss Transform with a linear time complexity O(N). Source code for Algorithm 1 can be obtained at http://github.com/lttam/PersistenceFisher. We recall that the time complexity of the Wasserstein distance between Dg_i and Dg_j is O(N³ log N) [Pele and Werman, 2009] (§2.1). For the Sliced Wasserstein distance (an approximation of the Wasserstein distance), the time complexity is O(N² log N) [Carriere et al., 2017], or O(MN log N) for its approximation with M projections [Carriere et al., 2017]. We also summarize a comparison of the time complexities and metric preservation of k_{PF} and related kernels for PDs in Table 1.

Table 1: A comparison of time complexities and metric preservation of kernels for PDs. Note that the SW kernel is built upon the SW distance (an approximation of the Wasserstein metric) while the PF kernel uses the Fisher information metric without approximation.

                                      kPSS     kPWG     kSW            kPF
Time complexity                       O(N²)    O(N²)    O(N² log N)    O(N²)
Time complexity with approximation    O(N)     O(N)     O(MN log N)    O(N)
Metric preservation                   -        -        ✓              ✓

Algorithm 1 Compute d_{FIM} for persistence diagrams
Input: Persistence diagrams Dg_i, Dg_j, and a bandwidth σ > 0 for smoothing
Output: d_{FIM}
1: Let Θ ← Dg_i ∪ Dg_{jΔ} ∪ Dg_j ∪ Dg_{iΔ} (a set for the smoothed and normalized measures)
2: Compute \bar\rho_i = \rho_{(Dg_i \cup Dg_{j\Delta})} \leftarrow \Big[ \frac{1}{Z} \sum_{u \in Dg_i \cup Dg_{j\Delta}} N(x; u, \sigma I) \Big]_{x \in \Theta} where Z \leftarrow \sum_{x \in \Theta} \sum_{u \in Dg_i \cup Dg_{j\Delta}} N(x; u, \sigma I)
3: Compute \bar\rho_j = \rho_{(Dg_j \cup Dg_{i\Delta})} similarly to \bar\rho_i.
4: Compute d_{FIM} \leftarrow \arccos\big( \langle \sqrt{\bar\rho_i}, \sqrt{\bar\rho_j} \rangle \big), where ⟨·,·⟩ is the dot product and √· is element-wise.
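A direct O(N²) sketch of Algorithm 1 follows (our own illustration, without the Fast Gauss Transform acceleration; it reuses smooth_pd from the sketch after Equation (1), and proj_diag is a hypothetical helper implementing the projection Π_Δ onto the diagonal). The official implementation linked above is the reference.

```python
import numpy as np

def proj_diag(P):
    """Orthogonal projection of planar points onto the diagonal {(a, a)}."""
    m = (P[:, 0] + P[:, 1]) / 2.0
    return np.stack([m, m], axis=1)

def d_fim(Dgi, Dgj, sigma):
    """Algorithm 1, direct variant (no Fast Gauss Transform)."""
    Dgi_d, Dgj_d = proj_diag(Dgi), proj_diag(Dgj)
    Theta = np.unique(np.vstack([Dgi, Dgj_d, Dgj, Dgi_d]), axis=0)  # step 1
    rho_i = smooth_pd(np.vstack([Dgi, Dgj_d]), Theta, sigma)        # step 2, Eq. (1)
    rho_j = smooth_pd(np.vstack([Dgj, Dgi_d]), Theta, sigma)        # step 3
    b = np.sqrt(rho_i) @ np.sqrt(rho_j)                             # step 4
    return np.arccos(np.clip(b, -1.0, 1.0))

def k_pf(Dgi, Dgj, sigma, t):
    return np.exp(-t * d_fim(Dgi, Dgj, sigma))                      # Eq. (4)
```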
4 Theoretical Analysis

In this section, we analyze the Persistence Fisher kernel k_{PF} (Equation (4)) where the Hellinger mapping h of a smoothed and normalized measure ρ(·) lies on the positive orthant of the d-dimensional unit sphere S^+_{d-1}, where S^+_{d-1} := \{x \mid x \in R^d_+, \|x\|_2 = 1\}.⁵ Let Dg_i, Dg_j be PDs in the set D of bounded and finite PDs, and let μ be the uniform probability distribution on S^+_{d-1}. We denote x_i and x_j ∈ S^+_{d-1} as the corresponding mapped points, through the Hellinger mapping h, of the smoothed and normalized measures \rho_{(Dg_i \cup Dg_{j\Delta})} and \rho_{(Dg_j \cup Dg_{i\Delta})} respectively. Then, we rewrite the Persistence Fisher kernel between x_i and x_j as follows,

k_{PF}(x_i, x_j) = \exp\big( -t \arccos( \langle x_i, x_j \rangle ) \big).    (5)

⁵ This corresponds to a finite set Θ.

Eigensystem. Let T_{kPF} : L²(S^+_{d-1}, μ) → L²(S^+_{d-1}, μ) be the integral operator induced by the Persistence Fisher kernel k_{PF}, defined as

(T_{kPF} f)(\cdot) := \int k_{PF}(x, \cdot) f(x)\, d\mu(x).

Following [Smola et al., 2001] (Lemma 4), we derive an eigensystem of the integral operator T_{kPF} as in Proposition 1.

Proposition 1. Let \{a_i\}_{i \ge 0} be the coefficients of the Legendre polynomial expansion of the Persistence Fisher kernel k_{PF}(x, z) defined on S^+_{d-1} × S^+_{d-1} as in Equation (5),

k_{PF}(x, z) = \sum_{i=0}^{\infty} a_i\, P^d_i(\langle x, z \rangle),    (6)

where P^d_i is the associated Legendre polynomial of degree i. Let |S_{d-1}| := \frac{2\pi^{d/2}}{\Gamma(d/2)} denote the surface of S_{d-1}, where Γ(·) is the Gamma function, let N(d, i) := \frac{(d + 2i - 2)(d + i - 3)!}{(d-2)!\, i!} denote the multiplicity of spherical harmonics of order i on S_{d-1}, and let \{Y^d_{i,j}\}_{1 \le j \le N(d,i)} denote any fixed orthonormal basis for the subspace of all homogeneous harmonics of order i on S_{d-1}. Then the eigensystem (λ_{i,j}, φ_{i,j}) of the integral operator T_{kPF} induced by the Persistence Fisher kernel k_{PF} is

\phi_{i,j} = Y^d_{i,j},    (7)
\lambda_{i,j} = \frac{a_i\, |S_{d-1}|}{N(d, i)},    (8)

of multiplicity N(d, i).

Proof. From the Addition Theorem [Muller, 2012] (Theorem 2, p. 18) and the Funk-Hecke formula [Muller, 2012] (§4, p. 29), we have \sum_{j=1}^{N(d,i)} Y^d_{i,j}(x) Y^d_{i,j}(z) = \frac{N(d,i)}{|S_{d-1}|} P^d_i(\langle x, z \rangle); substituting for P^d_i in Equation (6), and noting that \int_{S_{d-1}} Y^d_{i,j}(x) Y^d_{i',j'}(x)\, dx = \delta_{i,i'} \delta_{j,j'}, we complete the proof.

Proposition 2. All coefficients of the Legendre polynomial expansion of the Persistence Fisher kernel are nonnegative.

Proof. From Lemma 3.1, k_{PF} is positive definite. Applying Schoenberg [1942] (Theorem 1, p. 101) for k_{PF} defined on S^+_{d-1} × S^+_{d-1} as in Equation (5), we obtain the result.
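For d = 3 the P^d_i reduce to the ordinary Legendre polynomials, and the coefficients a_i in Equation (6) can be estimated by an L² projection on [-1, 1]. The following sketch (our own numerical illustration, not part of the paper) checks Proposition 2 by Gauss-Legendre quadrature; all printed coefficients should be nonnegative up to quadrature error.

```python
import numpy as np
from numpy.polynomial import legendre

t = 1.0
x, w = legendre.leggauss(200)                  # Gauss-Legendre nodes/weights on [-1, 1]
k = np.exp(-t * np.arccos(x))                  # k_PF as a function of <x, z>, Eq. (5)
for i in range(8):
    Pi = legendre.legval(x, np.eye(i + 1)[i])  # Legendre polynomial P_i
    a_i = (2 * i + 1) / 2.0 * np.dot(w, k * Pi)
    print(i, a_i)                              # nonnegative, as Proposition 2 states
```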
The eigensystem of the integral operator T_{kPF} induced by the PF kernel plays an important role in deriving generalization error bounds for kernel machines with the proposed PF kernel, via covering numbers and Rademacher averages as in Proposition 3 and Proposition 4 respectively.

Covering numbers. Given a finite set of points S = \{x_i \mid x_i \in S^+_{d-1}, d \ge 3\}, the Persistence Fisher kernel hypothesis class with R-bounded weight vectors for S is defined as

F_R(S) = \{ f \mid f(x_i) = \langle w, \phi(x_i) \rangle_H,\ \|w\|_H \le R \},

where \langle \phi(x_i), \phi(x_j) \rangle_H = k_{PF}(x_i, x_j), and ⟨·,·⟩_H and ‖·‖_H are the inner product and norm in the corresponding Hilbert space respectively. Following [Guo et al., 1999], we derive bounds on the generalization performance of the PF kernel on kernel machines via the covering numbers N(·, F_R(S)) [Shalev-Shwartz and Ben-David, 2014] (Definition 27.1, p. 337), as in Proposition 3.

Proposition 3. Assume the number of non-zero coefficients \{a_i\} in Equation (6) is finite, and let r be the maximum index of the non-zero coefficients. Let q := \arg\max_i \lambda_{i,\cdot}, choose α ∈ N such that \alpha < (\lambda_{q,\cdot} / \lambda_{i,\cdot})^{N(d,q)/2} with i ≠ q, and define

\varepsilon := 6R \sqrt{ N(d, r) \Big( a_q\, \alpha^{-2/N(d,q)} + \sum_{i=0, i \ne q}^{\infty} a_i \Big) }.

Then,

N(\varepsilon, F_R(S)) \le \alpha.

Proof. From [Minh et al., 2006] (Lemma 3), we have \sup_{x_i \in S} \|Y^d_{i,j}\|_\infty \le \sqrt{N(d,i)/|S_{d-1}|}. It is easy to check that ∀d ≥ 3, i ≥ j ≥ 0, we have N(d, i) ≥ N(d, j). Therefore, following Proposition 1, all eigenfunctions of k_{PF} satisfy \|Y^d_{i,j}\|_\infty \le \sqrt{N(d,r)/|S_{d-1}|}. Additionally, the multiplicity of λ_{i,·} is N(d, i), and N(d, i)\,\lambda_{i,\cdot} = a_i |S_{d-1}| (Equation (8)). Hence, from [Guo et al., 1999] (Theorem 1), we obtain the result.

Rademacher averages. We provide a different family of generalization error bounds via Rademacher averages [Bartlett et al., 2005]. By plugging the eigensystem of the PF kernel from Proposition 1 into the localized averages of function classes based on the PF kernel with respect to the uniform probability distribution μ on S^+_{d-1} [Mendelson, 2003] (Theorem 2.1), we obtain a bound as in Proposition 4.

Proposition 4. Let \{x_i\}_{1 \le i \le m} be independent, distributed according to the uniform probability distribution μ on S^+_{d-1}, let \{\sigma_i\}_{1 \le i \le m} be independent Rademacher random variables, let H_{kPF} be the unit ball of the reproducing kernel Hilbert space corresponding to the Riemannian manifold kernel k_{PF}, and let q := \arg\max_i \lambda_{i,\cdot}. If \lambda_{q,\cdot} \ge 1/m, for \tau \ge 1/(m |S_{d-1}|), let

\Psi(\tau) := \sqrt{ |S_{d-1}| \Big( \sum_{a_i < \tau N(d,i)} a_i + \tau \sum_{a_i \ge \tau N(d,i)} N(d, i) \Big) },

then there are absolute constants C_\ell and C_u which satisfy

C_\ell\, \Psi(\tau) \le E \sup_{f \in H_{kPF}:\ \frac{E_\mu f^2}{|S_{d-1}|} \le \tau} \Big| \sum_{i=1}^{m} \sigma_i f(x_i) \Big| \le C_u\, \Psi(\tau),    (9)

where E is an expectation.

From Proposition 3 and Proposition 4, the decay rate of the eigenvalues of the integral operator T_{kPF} relates to the capacity of the kernel learning machines. When the decay rate of the eigenvalues is large, the capacity of the kernel machines is reduced. So, if the training error of the kernel machines is small, this can lead to better bounds on the generalization error. The resulting bounds for both the covering numbers (Proposition 3) and the Rademacher averages (Proposition 4) are essentially the same as the standard ones for a Gaussian kernel on a Euclidean space.
Bounding the k_{PF}-induced squared distance with respect to d_{FIM}. The squared distance induced by the PF kernel, denoted d²_{kPF}, can be computed via the Hilbert norm of the difference between the two corresponding mappings. Given two persistence diagrams Dg_i and Dg_j, we have

d^2_{kPF}(Dg_i, Dg_j) := k_{PF}(Dg_i, Dg_i) + k_{PF}(Dg_j, Dg_j) - 2\, k_{PF}(Dg_i, Dg_j).

We recall that k_{PF} is based on the Fisher information geometry. So, it is of interest to bound the PF kernel induced squared distance d²_{kPF} with respect to the corresponding Fisher information metric d_{FIM} between PDs, as in Lemma 4.1.

Lemma 4.1. Let D be the set of bounded and finite persistence diagrams. Then, ∀Dg_i, Dg_j ∈ D,

d^2_{kPF}(Dg_i, Dg_j) \le 2t\, d_{FIM}(Dg_i, Dg_j),

where t is a parameter of k_{PF}.

Proof. We have d^2_{kPF}(Dg_i, Dg_j) = 2\big(1 - k_{PF}(Dg_i, Dg_j)\big) = 2\big(1 - \exp(-t\, d_{FIM}(Dg_i, Dg_j))\big) \le 2t\, d_{FIM}(Dg_i, Dg_j), since 1 - \exp(-a) \le a, ∀a ≥ 0.

From Lemma 4.1, the Persistence Fisher kernel is stable with respect to the Riemannian geometry, in a similar sense as the work of Kwitt et al. [2015] and Reininghaus et al. [2015] on Wasserstein geometry.

Infinite divisibility of the Persistence Fisher kernel.

Lemma 4.2. The Persistence Fisher kernel k_{PF} is infinitely divisible.

Proof. For m ∈ N*, let k_{PF_m} := \exp\big(-\frac{t}{m} d_{FIM}\big), so (k_{PF_m})^m = k_{PF}, and note that k_{PF_m} is positive definite. Hence, following Berg et al. [1984] (§3, Definition 2.6, p. 76), we have the result.

As for infinitely divisible kernels, the Gram matrix of the PF kernel does not need to be recomputed for each choice of t (Equation (4)), since it suffices to compute the Fisher information metric between the PDs in the training set only once. This property is shared with the Sliced Wasserstein kernel [Carriere et al., 2017]. However, neither the Persistence Scale Space kernel [Reininghaus et al., 2015] nor the Persistence Weighted Gaussian kernel [Kusano et al., 2016] has this property.

5 Experimental Results

We evaluated the Persistence Fisher kernel with support vector machines (SVM) on many benchmark datasets. We consider five baselines: (i) the Persistence Scale Space kernel (kPSS), (ii) the Persistence Weighted Gaussian kernel (kPWG), (iii) the Sliced Wasserstein kernel (kSW), (iv) the smoothed and normalized measures in the probability simplex with the Gaussian kernel (Prob + kG), and (v) the tangent vector representation [Anirudh et al., 2016] with the Gaussian kernel (Tang + kG). In practice, the Euclidean metric is not a suitable geometry for the probability simplex [Le and Cuturi, 2015a,b], so the (Prob + kG) approach may not work well for PDs. Hyper-parameters are typically chosen through cross validation. For the baseline kernels, we follow their corresponding authors to form the sets of hyper-parameter candidates, and the bandwidth of the Gaussian kernel in (Prob + kG) and (Tang + kG) is chosen from 10^{-3:1:3}. For the Persistence Fisher kernel, there are 2 hyper-parameters: t (Equation (4)) and σ for smoothing measures (Equation (1)). We choose 1/t from {q_1, q_2, q_5, q_10, q_20, q_50}, where q_s is the s% quantile of a subset of Fisher information metric values between PDs observed on the training set, and σ from {10^{-3:1:3}} (see the sketch after Table 2). For SVM, we use Libsvm (one-vs-one) [Chang and Lin, 2011] for multi-class classification, and choose the regularization parameter of SVM from {10^{-2:1:2}}. For PDs, we used the DIPHA toolbox (https://github.com/DIPHA/dipha).

Table 2: Results of SVM classification. The averaged accuracy (%) and standard deviation are shown.

            MPEG7           Orbit
kPSS        73.33 ± 4.17    72.38 ± 2.41
kPWG        74.83 ± 4.36    76.63 ± 0.66
kSW         76.83 ± 3.75    83.60 ± 0.87
Prob + kG   55.83 ± 5.45    72.89 ± 0.62
Tang + kG   66.17 ± 4.01    77.32 ± 0.72
kPF         80.00 ± 4.08    85.87 ± 0.77
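A sketch of the quantile-based selection of 1/t described above, on toy diagrams (the data and names are hypothetical; d_fim is reused from the Algorithm 1 sketch): the pairwise d_FIM matrix is computed once per σ and, by Lemma 4.2, reused for every candidate t.

```python
import numpy as np

rng = np.random.default_rng(1)
# toy training PDs (birth <= death), standing in for real diagrams
train_pds = [np.sort(rng.random((8, 2)), axis=1) for _ in range(12)]

sigma = 0.1                                    # smoothing bandwidth, Eq. (1)
n = len(train_pds)
D = np.zeros((n, n))                           # pairwise d_FIM, computed once per sigma
for a in range(n):
    for b in range(a + 1, n):
        D[a, b] = D[b, a] = d_fim(train_pds[a], train_pds[b], sigma)

vals = D[np.triu_indices(n, k=1)]              # observed d_FIM values
for q in (0.01, 0.02, 0.05, 0.10, 0.20, 0.50): # the quantiles q_1, ..., q_50
    t = 1.0 / np.quantile(vals, q)
    K = np.exp(-t * D)                         # Lemma 4.2: reuse D for every t
    # ... cross-validate an SVM with this precomputed Gram matrix ...
```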
5.1 Orbit Recognition

Table 3: Computational time (seconds) with approximation. For each dataset, the first number in parentheses is the number of PDs while the second one is the maximum number of points in the PDs.

        Orbit (5K/300)   MPEG7 (200/80)   Granular (35/20.4K)   SiO2 (80/30K)
kSW     249              1.55             8.30                  6473
kPWG    288              5.23             17.44                 8756
kPSS    515              7.51             38.14                 11024
kPF     318              6.63             22.70                 9891

This is a synthesized dataset proposed by Adams et al. [2017] (§6.4.1) for the linked twist map, a discrete dynamical system modeling flows; the linked twist map is used to model flows in DNA microarrays [Hertzsch et al., 2007]. Given a parameter r > 0 and initial positions (s_0, t_0) ∈ [0, 1]², an orbit is described as

s_{i+1} = s_i + r\, t_i (1 - t_i) \bmod 1, \qquad t_{i+1} = t_i + r\, s_{i+1} (1 - s_{i+1}) \bmod 1.

Adams et al. [2017] proposed 5 classes, corresponding to the 5 parameters r = 2.5, 3.5, 4, 4.1, 4.3. For each parameter r, we generated 1000 orbits, where each orbit has 1000 points with random initial positions (see the sketch below). We randomly split 70%/30% for training and test, and repeated this 100 times. We extract only 1-dimensional topological features with the Vietoris-Rips complex filtration [Edelsbrunner and Harer, 2008] for PDs. The accuracy results with SVM are summarized in the third column of Table 2. The PF kernel outperforms all other baselines. As expected, the (Prob + kG) approach does not perform well. Moreover, kPF and kSW, which enjoy the Fisher information geometry and the Wasserstein geometry for PDs respectively, clearly outperform the other approaches. As in the second column of Table 3, the computational time of kPF is faster than kPSS, but slower than kSW and kPWG.
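A short sketch generating such orbits (our own illustration of the recurrence above; note that the t-update uses the already-updated s):

```python
import numpy as np

def linked_twist_orbit(r, n=1000, rng=None):
    """Generate one orbit of the linked twist map with parameter r."""
    rng = rng or np.random.default_rng()
    s, t = rng.random(), rng.random()      # random initial position in [0, 1]^2
    pts = np.empty((n, 2))
    for i in range(n):
        s = (s + r * t * (1 - t)) % 1.0
        t = (t + r * s * (1 - s)) % 1.0    # uses the new s, as in the recurrence
        pts[i] = s, t
    return pts

# five classes, as in [Adams et al., 2017]
orbits = {r: [linked_twist_orbit(r) for _ in range(1000)]
          for r in (2.5, 3.5, 4.0, 4.1, 4.3)}
```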
5.2 Object Shape Classification

We consider a 10-class subset of the MPEG7 object shape dataset [Latecki et al., 2000]; the 10 classes are: apple, bell, bottle, car, classic, cup, device0, face, Heart and key. Each class has 20 samples. We resize each image such that its length is at most 256, and extract a boundary for the object shapes before computing PDs. For simplicity, we only consider 1-dimensional topological features with the traditional Vietoris-Rips complex filtration [Edelsbrunner and Harer, 2008] for PDs (a more advanced filtration for this task was proposed in [Turner et al., 2014]). We also randomly split 70%/30% for training and test, and repeated this 100 times. The accuracy results with SVM are summarized in the second column of Table 2. The Persistence Fisher kernel compares favorably with the other baseline kernels for PDs. All approaches based on the implicit kernel representation for PDs outperform the ones based on an explicit vector representation with a Gaussian kernel by a large margin. Additionally, kPF and kSW also compare favorably with the other approaches. As in the third column of Table 3, the computational time of kPF is comparable with kPWG and kPSS, but slower than kSW.

5.3 Change Point Detection for Material Data Analysis

We evaluated the proposed kernel on the change point detection problem for material data analysis on the granular packing system [Francois et al., 2013] and SiO2 [Nakamura et al., 2015] datasets. We use the kernel Fisher discriminant ratio [Harchaoui et al., 2009] (KFDR) as a statistical quantity and set the regularization of KFDR to 10^{-3}, as in [Kusano et al., 2018]. We use the ball model filtration to extract the 2-dimensional topological features of PDs for the granular packing system dataset, and 1-dimensional topological features of PDs for the SiO2 dataset. We illustrate the KFDR graphs for the granular packing system and SiO2 datasets in Figure 2. For the granular packing system dataset, all methods obtain the change point at index 23, which supports the observation result in [Anonymous, 1972] (corresponding to id = 23). For the SiO2 dataset, all methods obtain results within the range (35 ≤ id ≤ 50) supported by the traditional physical approach [Elliott, 1983]. The kPF compares favorably with the other baseline approaches, as in Figure 2. As in the fourth and fifth columns of Table 3, kPF is faster than kPSS, but slower than kSW and kPWG.

[Figure 2: The kernel Fisher discriminant ratio (KFDR) graphs. Detected change points: granular packing system, id = 23 for all six methods; SiO2, id = 46 (kPSS), 37 (kPWG), 43 (kSW), 35 (Prob + kG), 35 (Tang + kG), 42 (kPF).]

6 Conclusions

In this work, we propose the positive definite Persistence Fisher (PF) kernel for persistence diagrams (PDs). The PF kernel relies on the Fisher information geometry without approximation for PDs. Moreover, the proposed kernel has many nice properties from both theoretical and practical aspects, such as stability, infinite divisibility, a linear time complexity over the number of points in PDs, and improved performance over other baseline kernels for PDs, as well as over explicit vector representations with a Gaussian kernel for PDs, in many different tasks on various benchmark datasets.

Acknowledgments

We thank Ha Quang Minh and the anonymous reviewers for their comments. TL acknowledges the support of JSPS KAKENHI Grant number 17K12745. MY was supported by the JST PRESTO program JPMJPR165A.

References

Henry Adams, Tegan Emerson, Michael Kirby, Rachel Neville, Chris Peterson, Patrick Shipman, Sofya Chepushtanova, Eric Hanson, Francis Motta, and Lori Ziegelmeier. Persistence images: A stable vector representation of persistent homology. The Journal of Machine Learning Research, 18(1):218-252, 2017.

Shun-ichi Amari and Hiroshi Nagaoka. Methods of information geometry, volume 191. American Mathematical Soc., 2007.

Rushil Anirudh, Vinay Venkataraman, Karthikeyan Natesan Ramamurthy, and Pavan Turaga. A Riemannian framework for statistical analysis of topological persistence diagrams. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 68-76, 2016.

Anonymous. What is random packing? Nature, 239:488-489, 1972.

Peter L Bartlett, Olivier Bousquet, Shahar Mendelson, et al. Local Rademacher complexities. The Annals of Statistics, 33(4):1497-1537, 2005.

Christian Berg, Jens Peter Reus Christensen, and Paul Ressel. Harmonic analysis on semigroups. Springer-Verlag, 1984.

Peter Bubenik. Statistical topological data analysis using persistence landscapes. The Journal of Machine Learning Research, 16(1):77-102, 2015.

Zixuan Cang, Lin Mu, Kedi Wu, Kristopher Opron, Kelin Xia, and Guo-Wei Wei.
A topological approach for protein classification. Molecular Based Mathematical Biology, 3(1), 2015.

Gunnar Carlsson, Tigran Ishkhanov, Vin De Silva, and Afra Zomorodian. On the local behavior of spaces of natural images. International Journal of Computer Vision, 76(1):1-12, 2008.

Mathieu Carriere, Steve Y Oudot, and Maks Ovsjanikov. Stable topological signatures for points on 3d shapes. In Computer Graphics Forum, volume 34, pages 1-12. Wiley Online Library, 2015.

Mathieu Carriere, Marco Cuturi, and Steve Oudot. Sliced Wasserstein kernel for persistence diagrams. In Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 664-673, 2017.

Chih-Chung Chang and Chih-Jen Lin. Libsvm: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST), 2(3):27, 2011.

Frederic Chazal, Brittany Fasy, Fabrizio Lecci, Bertrand Michel, Alessandro Rinaldo, and Larry Wasserman. Subsampling methods for persistent homology. In International Conference on Machine Learning, pages 2143-2151, 2015.

Chao Chen and Novi Quadrianto. Clustering high dimensional categorical data via topographical features. In International Conference on Machine Learning, pages 2732-2740, 2016.

David Cohen-Steiner, Herbert Edelsbrunner, and John Harer. Stability of persistence diagrams. Discrete & Computational Geometry, 37(1):103-120, 2007.

Vin De Silva, Robert Ghrist, et al. Coverage in sensor networks via persistent homology. Algebraic & Geometric Topology, 7(1):339-358, 2007.

Barbara Di Fabio and Massimo Ferri. Comparing persistence diagrams through complex vectors. In International Conference on Image Analysis and Processing, pages 294-305. Springer, 2015.

Herbert Edelsbrunner and John Harer. Persistent homology: a survey. Contemporary Mathematics, 453:257-282, 2008.

Herbert Edelsbrunner, David Letscher, and Afra Zomorodian. Topological persistence and simplification. In Proceedings 41st Annual Symposium on Foundations of Computer Science, pages 454-463, 2000.

Stephen Richard Elliott. Physics of amorphous materials. Longman Group, 1983.

Aasa Feragen, Francois Lauze, and Soren Hauberg. Geodesic exponential kernels: When curvature and linearity conflict. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3032-3042, 2015.

Nicolas Francois, Mohammad Saadatfar, R Cruikshank, and A Sheppard. Geometrical frustration in amorphous and partially crystallized packings of spheres. Physical Review Letters, 111(14):148001, 2013.

Leslie Greengard and John Strain. The fast Gauss transform. SIAM Journal on Scientific and Statistical Computing, 12(1):79-94, 1991.

Ying Guo, Peter L Bartlett, John Shawe-Taylor, and Robert C Williamson. Covering numbers for support vector machines. In Proceedings of the Twelfth Annual Conference on Computational Learning Theory, pages 267-277, 1999.

Zaid Harchaoui, Eric Moulines, and Francis R Bach. Kernel change-point analysis. In Advances in Neural Information Processing Systems, pages 609-616, 2009.

Jan-Martin Hertzsch, Rob Sturman, and Stephen Wiggins. DNA microarrays: design principles for maximizing ergodic, chaotic mixing.
Small, 3(2):202-218, 2007.

Christoph Hofer, Roland Kwitt, Marc Niethammer, and Andreas Uhl. Deep learning with topological signatures. In Advances in Neural Information Processing Systems, pages 1633-1643, 2017.

Jacques Istas. Manifold indexed fractional fields. ESAIM: Probability and Statistics, 16:222-276, 2012.

Sadeep Jayasumana, Richard Hartley, Mathieu Salzmann, Hongdong Li, and Mehrtash Harandi. Kernel methods on Riemannian manifolds with Gaussian RBF kernels. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(12):2464-2477, 2015.

Peter M Kasson, Afra Zomorodian, Sanghyun Park, Nina Singhal, Leonidas J Guibas, and Vijay S Pande. Persistent voids: a new structural metric for membrane fusion. Bioinformatics, 23(14):1753-1759, 2007.

Genki Kusano, Yasuaki Hiraoka, and Kenji Fukumizu. Persistence weighted Gaussian kernel for topological data analysis. In International Conference on Machine Learning, pages 2004-2013, 2016.

Genki Kusano, Kenji Fukumizu, and Yasuaki Hiraoka. Kernel method for persistence diagrams via kernel embedding and weight factor. Journal of Machine Learning Research, 18(189):1-41, 2018.

Roland Kwitt, Stefan Huber, Marc Niethammer, Weili Lin, and Ulrich Bauer. Statistical topological data analysis: a kernel perspective. In Advances in Neural Information Processing Systems, pages 3070-3078, 2015.

John Lafferty and Guy Lebanon. Diffusion kernels on statistical manifolds. Journal of Machine Learning Research, 6(Jan):129-163, 2005.

Longin Jan Latecki, Rolf Lakamper, and T Eckhardt. Shape descriptors for non-rigid shapes with a single closed contour. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), volume 1, pages 424-429, 2000.

Tam Le and Marco Cuturi. Adaptive Euclidean maps for histograms: generalized Aitchison embeddings. Machine Learning, 99(2):169-187, 2015a.

Tam Le and Marco Cuturi. Unsupervised Riemannian metric learning for histograms using Aitchison transformations. In International Conference on Machine Learning, pages 2002-2011, 2015b.

Hyekyoung Lee, Moo K Chung, Hyejin Kang, Bung-Nyun Kim, and Dong Soo Lee. Discriminative persistent homology of brain networks. In International Symposium on Biomedical Imaging: From Nano to Macro, pages 841-844, 2011.

John M Lee. Riemannian manifolds: an introduction to curvature, volume 176. Springer Science & Business Media, 2006.

Paul Levy and Michel Loeve. Processus stochastiques et mouvement brownien. Gauthier-Villars Paris, 1965.

Shahar Mendelson. On the performance of kernel classes. Journal of Machine Learning Research, 4(Oct):759-771, 2003.

Ha Quang Minh, Partha Niyogi, and Yuan Yao. Mercer's theorem, feature maps, and smoothing. In International Conference on Computational Learning Theory, pages 154-168. Springer, 2006.

Vlad I Morariu, Balaji V Srinivasan, Vikas C Raykar, Ramani Duraiswami, and Larry S Davis. Automatic online tuning for fast Gaussian summation. In Advances in Neural Information Processing Systems, pages 1113-1120, 2009.

Claus Muller. Analysis of spherical symmetries in Euclidean spaces, volume 129. Springer Science & Business Media, 2012.

Takenobu Nakamura, Yasuaki Hiraoka, Akihiko Hirata, Emerson G Escolar, and Yasumasa Nishiura. Persistent homology and many-body atomic structure for medium-range order in the glass.
Nanotechnology, 26(30):304001, 2015.

Ofir Pele and Michael Werman. Fast and robust earth mover's distances. In International Conference on Computer Vision, pages 460-467. IEEE, 2009.

Giovanni Petri, Paul Expert, Federico Turkheimer, Robin Carhart-Harris, David Nutt, Peter J Hellyer, and Francesco Vaccarino. Homological scaffolds of brain functional networks. Journal of The Royal Society Interface, 11(101), 2014.

Gabriel Peyre and Marco Cuturi. Computational Optimal Transport. 2017. URL http://optimaltransport.github.io.

Jan Reininghaus, Stefan Huber, Ulrich Bauer, and Roland Kwitt. A stable multi-scale kernel for topological machine learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 4741-4748, 2015.

I. J. Schoenberg. Positive definite functions on spheres. Duke Mathematical Journal, 9:96-108, 1942.

Shai Shalev-Shwartz and Shai Ben-David. Understanding machine learning: From theory to algorithms. Cambridge University Press, 2014.

Gurjeet Singh, Facundo Memoli, Tigran Ishkhanov, Guillermo Sapiro, Gunnar Carlsson, and Dario L Ringach. Topological analysis of population activity in visual cortex. Journal of Vision, 8(8):11-11, 2008.

Alex J Smola, Zoltan L Ovari, and Robert C Williamson. Regularization with dot-product kernels. In Advances in Neural Information Processing Systems, pages 308-314, 2001.

Katharine Turner, Sayan Mukherjee, and Doug M Boyer. Persistent homology transform for modeling shapes and surfaces. Information and Inference: A Journal of the IMA, 3(4):310-344, 2014.

Cédric Villani. Topics in optimal transportation. Number 58. American Mathematical Soc., 2003.

Kelin Xia and Guo-Wei Wei. Persistent homology analysis of protein structure, flexibility, and folding. International Journal for Numerical Methods in Biomedical Engineering, 30(8):814-844, 2014.