{"title": "Probabilistic Principal Geodesic Analysis", "book": "Advances in Neural Information Processing Systems", "page_first": 1178, "page_last": 1186, "abstract": "Principal geodesic analysis (PGA) is a generalization of principal component analysis (PCA) for dimensionality reduction of data on a Riemannian manifold. Currently PGA is defined as a geometric fit to the data, rather than as a probabilistic model. Inspired by probabilistic PCA, we present a latent variable model for PGA that provides a probabilistic framework for factor analysis on manifolds. To compute maximum likelihood estimates of the parameters in our model, we develop a Monte Carlo Expectation Maximization algorithm, where the expectation is approximated by Hamiltonian Monte Carlo sampling of the latent variables. We demonstrate the ability of our method to recover the ground truth parameters in simulated sphere data, as well as its effectiveness in analyzing shape variability of a corpus callosum data set from human brain images.", "full_text": "Probabilistic Principal Geodesic Analysis\n\nMiaomiao Zhang\nSchool of Computing\nUniversity of Utah\nSalt Lake City, UT\n\nmiaomiao@sci.utah.edu\n\nP. Thomas Fletcher\nSchool of Computing\nUniversity of Utah\nSalt Lake City, UT\n\nfletcher@sci.utah.edu\n\nAbstract\n\nPrincipal geodesic analysis (PGA) is a generalization of principal component anal-\nysis (PCA) for dimensionality reduction of data on a Riemannian manifold. Cur-\nrently PGA is de\ufb01ned as a geometric \ufb01t to the data, rather than as a probabilistic\nmodel. Inspired by probabilistic PCA, we present a latent variable model for PGA\nthat provides a probabilistic framework for factor analysis on manifolds. To com-\npute maximum likelihood estimates of the parameters in our model, we develop\na Monte Carlo Expectation Maximization algorithm, where the expectation is ap-\nproximated by Hamiltonian Monte Carlo sampling of the latent variables. 
We\ndemonstrate the ability of our method to recover the ground truth parameters in\nsimulated sphere data, as well as its effectiveness in analyzing shape variability of\na corpus callosum data set from human brain images.\n\n1\n\nIntroduction\n\nPrincipal component analysis (PCA) [12] has been widely used to analyze high-dimensional data.\nTipping and Bishop proposed probabilistic PCA (PPCA) [22], which is a latent variable model for\nPCA. A similar formulation was proposed by Roweis [18]. Their work opened up the possibility\nfor probabilistic interpretations for different kinds of factor analyses. For instance, Bayesian PCA\n[3] extended PPCA by adding a prior on the factors, resulting in automatic selection of model di-\nmensionality. Other examples of latent variable models include probabilistic canonical correlation\nanalysis (CCA) [1] and Gaussian process latent variable models [15]. Such latent variable models\nhave not, however, been extended to handle data from a Riemannian manifold.\nManifolds arise naturally as the appropriate representations for data that have smooth constraints.\nFor example, when analyzing directional data [16], i.e., vectors of unit length in Rn, the correct rep-\nresentation is the sphere, Sn\u22121. Another important example of manifold data is in shape analysis,\nwhere the de\ufb01nition of the shape of an object should not depend on its position, orientation, or scale.\nKendall [14] was the \ufb01rst to formulate a mathematically precise de\ufb01nition of shape as equivalence\nclasses of all translations, rotations, and scalings of point sets. The result is a manifold represen-\ntation of shape, or shape space. Linear operations violate the natural constraints of manifold data,\ne.g., a linear average of data on a sphere results in a vector that does not have unit length. 
As shown\nrecently [5], using the kernel trick with a Gaussian kernel maps data onto a Hilbert sphere, and\nutilizing Riemannian distances on this sphere rather than Euclidean distances improves clustering\nand classi\ufb01cation performance. Other examples of manifold data include geometric transformations,\nsuch as rotations and af\ufb01ne transforms, symmetric positive-de\ufb01nite tensors [9, 24], Grassmannian\nmanifolds (the set of m-dimensional linear subspaces of Rn), and Stiefel manifolds (the set of or-\nthonormal m-frames in Rn) [23]. There has been some work on density estimation on Riemannian\nmanifolds. For example, there is a wealth of literature on parametric density estimation for direc-\ntional data [16], e.g., spheres, projective spaces, etc. Nonparametric density estimation based on\nkernel mixture models [2] was proposed for compact Riemannian manifolds. Methods for sam-\npling from manifold-valued distributions have also been proposed [4, 25]. It\u2019s important to note\n\n1\n\n\fthe distinction between manifold data, where the manifold representation is known a priori, versus\nmanifold learning and nonlinear component analysis [15, 20], where the data lies in Euclidean space\non some unknown, lower-dimensional manifold that must be learned.\nPrincipal geodesic analysis (PGA) [10] generalizes PCA to nonlinear manifolds. It describes the\ngeometric variability of manifold data by \ufb01nding lower-dimensional geodesic subspaces that mini-\nmize the residual sum-of-squared geodesic distances to the data. While [10] originally proposed an\napproximate estimation procedure for PGA, recent contributions [19, 21] have developed algorithms\nfor exact solutions to PGA. Related work on manifold component analysis has introduced variants of\nPGA. This includes relaxing the constraint that geodesics pass through the mean of the data [11] and,\nfor spherical data, replacing geodesic subspaces with nested spheres of arbitrary radius [13]. 
All of these methods are based on geometric, least-squares estimation procedures, i.e., they find subspaces that minimize the sum-of-squared geodesic distances to the data. Much like the original formulation of PCA, current component analysis methods on manifolds lack a probabilistic interpretation. In this paper, we propose a latent variable model for PGA, called probabilistic PGA (PPGA). The model definition applies to generic manifolds. However, due to the lack of an explicit formulation for the normalizing constant, our estimation is limited to symmetric spaces, which include many common manifolds such as Euclidean space, spheres, Kendall shape spaces, Grassmann/Stiefel manifolds, and more. Analogous to PPCA, our method recovers low-dimensional factors as maximum likelihood estimates.\n\n2 Riemannian Geometry Background\n\nIn this section we briefly review some necessary facts about Riemannian geometry (see [6] for more details). Recall that a Riemannian manifold is a differentiable manifold M equipped with a metric g, which is a smoothly varying inner product on the tangent spaces of M. Given two vector fields v, w on M, the covariant derivative \u2207_v w gives the change of the vector field w in the v direction. The covariant derivative is a generalization of the Euclidean directional derivative to the manifold setting. Consider a curve \u03b3 : [0, 1] \u2192 M and let \u02d9\u03b3 = d\u03b3/dt be its velocity. Given a vector field V(t) defined along \u03b3, we can define the covariant derivative of V to be DV/dt = \u2207_{\u02d9\u03b3} V. A vector field is called parallel if the covariant derivative along the curve \u03b3 is zero. A curve \u03b3 is geodesic if it satisfies the equation \u2207_{\u02d9\u03b3} \u02d9\u03b3 = 0. 
In other words, geodesics are curves with zero acceleration.\nRecall that for any point p \u2208 M and tangent vector v \u2208 TpM, the tangent space of M at p, there is a unique geodesic curve \u03b3, with initial conditions \u03b3(0) = p and \u02d9\u03b3(0) = v. This geodesic is only guaranteed to exist locally. When \u03b3 is defined over the interval [0, 1], the Riemannian exponential map at p is defined as Exp_p(v) = \u03b3(1). In other words, the exponential map takes a position and velocity as input and returns the point at time 1 along the geodesic with these initial conditions. The exponential map is locally diffeomorphic onto a neighbourhood of p. Let V(p) be the largest such neighbourhood. Then within V(p) the exponential map has an inverse, the Riemannian log map, Log_p : V(p) \u2192 TpM. For any point q \u2208 V(p), the Riemannian distance function is given by d(p, q) = \u2016Log_p(q)\u2016. It will be convenient to include the point p as a parameter in the exponential and log maps, i.e., define Exp(p, v) = Exp_p(v) and Log(p, q) = Log_p(q). The gradient of the squared distance function is \u2207_p d(p, q)\u00b2 = \u22122 Log(p, q).\n\n3 Probabilistic Principal Geodesic Analysis\n\nBefore introducing our PPGA model for manifold data, we first review PPCA. The main idea of PPCA is to model an n-dimensional Euclidean random variable y as\n\ny = \u00b5 + Bx + \u03b5,  (1)\n\nwhere \u00b5 is the mean of y, x is a q-dimensional latent variable with x \u223c N(0, I), B is an n \u00d7 q factor matrix that relates x and y, and \u03b5 \u223c N(0, \u03c3\u00b2I) represents error. We will find it convenient to model the factors as B = W\u039b, where the columns of W are mutually orthogonal, and \u039b is a diagonal matrix of scale factors. 
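As a concrete illustration (not from the paper; the dimensions, W, and \u039b below are made up), sampling from the PPCA model (1) with B = W\u039b can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(0)

n, q, sigma = 3, 2, 0.1
mu = np.zeros(n)
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])        # mutually orthogonal columns in R^3
Lam = np.diag([2.0, 0.5])         # diagonal matrix of scale factors
B = W @ Lam

def sample_ppca(num):
    """Draw y = mu + B x + eps with x ~ N(0, I) and eps ~ N(0, sigma^2 I)."""
    x = rng.standard_normal((num, q))
    eps = sigma * rng.standard_normal((num, n))
    return mu + x @ B.T + eps

Y = sample_ppca(5000)
```

The sample covariance of Y approaches B B^T + \u03c3\u00b2I, the marginal covariance implied by (1).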
This removes the rotation ambiguity of the latent factors and makes them analogous to the eigenvectors and eigenvalues of standard PCA (there is still of course an ambiguity of the ordering of the factors). We now generalize this model to random variables on Riemannian manifolds.\n\n3.1 Probability Model\n\nFollowing [8, 17], we use a generalization of the normal distribution for a Riemannian manifold as our noise model. Consider a random variable y taking values on a Riemannian manifold M, defined by the probability density function (pdf)\n\np(y|\u00b5, \u03c4) = (1/C(\u00b5, \u03c4)) exp(\u2212(\u03c4/2) d(\u00b5, y)\u00b2),  C(\u00b5, \u03c4) = \u222b_M exp(\u2212(\u03c4/2) d(\u00b5, y)\u00b2) dy.  (2)\n\nWe term this distribution a Riemannian normal distribution, and use the notation y \u223c NM(\u00b5, \u03c4\u207b\u00b9) to denote it. The parameter \u00b5 \u2208 M acts as a location parameter on the manifold, and the parameter \u03c4 \u2208 R\u207a acts as a dispersion parameter, similar to the precision of a Gaussian. This distribution has the advantages that (1) it is applicable to any Riemannian manifold, (2) it reduces to a multivariate normal distribution (with isotropic covariance) when M = R^n, and (3) much like the Euclidean normal distribution, maximum-likelihood estimation of parameters gives rise to least-squares methods (see [8] for details). 
We note that this noise model could be replaced with a different distribution, perhaps specific to the type of manifold or application, and the inference procedure presented in the next section could be modified accordingly.\nThe PPGA model for a random variable y on a smooth Riemannian manifold M is\n\ny|x \u223c NM(Exp(\u00b5, z), \u03c4\u207b\u00b9),  z = W\u039bx,  (3)\n\nwhere x \u223c N(0, I) are again latent random variables in R^q, \u00b5 here is a base point on M, W is a matrix with q columns of mutually orthogonal tangent vectors in T\u00b5M, \u039b is a q \u00d7 q diagonal matrix of scale factors for the columns of W, and \u03c4 is a scale parameter for the noise. In this model, a linear combination of W\u039b and the latent variables x forms a new tangent vector z \u2208 T\u00b5M. Next, the exponential map shoots the base point \u00b5 by z to generate the location parameter of a Riemannian normal distribution, from which the data point y is drawn. Note that in Euclidean space, the exponential map is an addition operation, Exp(\u00b5, z) = \u00b5 + z. Thus, our model coincides with (1), the standard PPCA model, when M = R^n.\n\n3.2 Inference\n\nWe develop a maximum likelihood procedure to estimate the parameters \u03b8 = (\u00b5, W, \u039b, \u03c4) of the PPGA model defined in (3). 
Given observed data {y1, . . ., yN} on M, with associated latent variables xi \u2208 R^q and zi = W\u039bxi, we formulate an expectation maximization (EM) algorithm. Since the expectation step over the latent variables does not yield a closed-form solution, we develop a Hamiltonian Monte Carlo (HMC) method to sample xi from the posterior p(x|y; \u03b8), the log of which is given by\n\nlog \u220f_{i=1}^N p(xi|yi; \u03b8) \u221d \u2212N log C \u2212 \u2211_{i=1}^N ( (\u03c4/2) d(Exp(\u00b5, zi), yi)\u00b2 + \u2016xi\u2016\u00b2/2 ),  (4)\n\nand use this in a Monte Carlo Expectation Maximization (MCEM) scheme to estimate \u03b8. The procedure contains two main steps:\n\n3.2.1 E-step: HMC\n\nFor each xi, we draw a sample of size S from the posterior distribution (4) using HMC with the current estimated parameters \u03b8k. Denoting xij as the jth sample for xi, the Monte Carlo approximation of the Q function is given by\n\nQ(\u03b8|\u03b8k) = E_{xi|yi;\u03b8k}[ log \u220f_{i=1}^N p(xi|yi; \u03b8k) ] \u2248 (1/S) \u2211_{j=1}^S \u2211_{i=1}^N log p(xij|yi; \u03b8k).  (5)\n\nIn our HMC sampling procedure, the potential energy of the Hamiltonian H(xi, m) = U(xi) + V(m) is defined as U(xi) = \u2212log p(xi|yi; \u03b8), and the kinetic energy V(m) is a typical isotropic Gaussian distribution on a q-dimensional auxiliary momentum variable, m. This gives us a Hamiltonian system to integrate: dxi/dt = \u2202H/\u2202m = m, and dm/dt = \u2212\u2202H/\u2202xi = \u2212\u2207xi U. Because xi is a Euclidean variable, we use a standard \u201cleap-frog\u201d numerical integration scheme, which approximately conserves the Hamiltonian and results in high acceptance rates.\nThe computation of the gradient term \u2207xi U(xi) requires that we compute dv Exp(p, v), i.e., the derivative operator (Jacobian matrix) of the exponential map with respect to the initial velocity v. To derive this, consider a variation of geodesics c(s, t) = Exp(p, su + tv), where u \u2208 TpM. The variation c produces a \u201cfan\u201d of geodesics; this is illustrated for a sphere on the left side of Figure 1. Taking the derivative of this variation results in a Jacobi field: Jv(t) = dc/ds(0, t). Finally, this gives an expression for the exponential map derivative as\n\ndv Exp(p, v)u = Jv(1).  (6)\n\nFor a general manifold, computing the Jacobi field Jv requires solving a second-order ordinary differential equation. However, Jacobi fields can be evaluated in closed form for the class of manifolds known as symmetric spaces. For the sphere and Kendall shape space examples, we provide explicit formulas for these computations in Section 4. For more details on the derivation of the Jacobi field equation and symmetric spaces, see for instance [6].\nNow, the gradient with respect to each xi is\n\n\u2207xi U = xi \u2212 \u03c4 \u039bW^T { dzi Exp(\u00b5, zi)\u2020 Log(Exp(\u00b5, zi), yi) },  (7)\n\nwhere \u2020 represents the adjoint of a linear operator, i.e., \u27e8dzi Exp(\u00b5, zi)\u00fb, v\u0302\u27e9 = \u27e8\u00fb, dzi Exp(\u00b5, zi)\u2020 v\u0302\u27e9.\n\n3.2.2 M-step: Gradient Ascent\n\nIn this section, we derive the maximization step for updating the parameters \u03b8 = (\u00b5, W, \u039b, \u03c4) by maximizing the HMC approximation of the Q function in (5). 
This turns out to be a gradient ascent scheme for all the parameters, since there are no closed-form solutions.\n\nGradient for \u03c4: The gradient of the Q function with respect to \u03c4 requires evaluation of the derivative of the normalizing constant in the Riemannian normal distribution (2). When M is a symmetric space, this constant does not depend on the mean parameter, \u00b5, because the distribution is invariant to isometries (see [8] for details). Thus, the normalizing constant can be written as\n\nC(\u03c4) = \u222b_M exp(\u2212(\u03c4/2) d(\u00b5, y)\u00b2) dy.\n\nWe can rewrite this integral in normal coordinates, which can be thought of as a polar coordinate system in the tangent space, T\u00b5M. The radial coordinate is defined as r = d(\u00b5, y), and the remaining n \u2212 1 coordinates are parametrized by a unit vector v, i.e., a point on the unit sphere S^{n\u22121} \u2282 T\u00b5M. Thus we have the change of variables \u03c6(rv) = Exp(\u00b5, rv). Now the integral for the normalizing constant becomes\n\nC(\u03c4) = \u222b_{S^{n\u22121}} \u222b_0^{R(v)} exp(\u2212(\u03c4/2) r\u00b2) |det(d\u03c6(rv))| dr dv,  (8)\n\nwhere R(v) is the maximum distance for which \u03c6(rv) is defined. Note that this formula is only valid if M is a complete manifold, which guarantees that normal coordinates are defined everywhere except possibly a set of measure zero on M.\nThe integral in (8) is difficult to compute for general manifolds, due to the presence of the determinant of the Jacobian of \u03c6. However, for symmetric spaces this change-of-variables term has a simple form. If M is a symmetric space, there exists an orthonormal basis u1, . . ., un, with u1 = v, such that\n\n|det(d\u03c6(rv))| = \u220f_{k=2}^n \u03bak\u207b\u00b9\u141f\u00b2 fk(\u221a\u03bak r),  (9)\n\nwhere \u03bak = K(u1, uk) denotes the sectional curvature, and fk is defined as\n\nfk(x) = sin(x) if \u03bak > 0,  sinh(x) if \u03bak < 0,  x if \u03bak = 0.\n\nNotice that with this expression for the Jacobian determinant there is no longer a dependence on v inside the integral in (8). Also, if M is simply connected, then R(v) = R does not depend on the direction v, and we can write the normalizing constant as\n\nC(\u03c4) = A_{n\u22121} \u222b_0^R exp(\u2212(\u03c4/2) r\u00b2) \u220f_{k=2}^n \u03bak^{\u22121/2} fk(\u221a\u03bak r) dr,\n\nwhere A_{n\u22121} is the surface area of the (n \u2212 1)-hypersphere, S^{n\u22121}. The remaining integral is one-dimensional, and can be quickly and accurately approximated by numerical integration. While this formula works only for simply connected symmetric spaces, other symmetric spaces could be handled by lifting to the universal cover, which is simply connected, or by restricting the definition of the Riemannian normal pdf in (2) to have support only up to the injectivity radius, i.e., R = min_v R(v).\nThe gradient term for estimating \u03c4 is\n\n\u2207\u03c4 Q = (1/S) \u2211_{i=1}^N \u2211_{j=1}^S [ (A_{n\u22121}/C(\u03c4)) \u222b_0^R (r\u00b2/2) exp(\u2212(\u03c4/2) r\u00b2) \u220f_{k=2}^n \u03bak^{\u22121/2} fk(\u221a\u03bak r) dr \u2212 (1/2) d(Exp(\u00b5, zij), yi)\u00b2 ].\n\nGradient for \u00b5: From (4) and (5), the gradient term for updating \u00b5 is\n\n\u2207\u00b5 Q = (1/S) \u2211_{i=1}^N \u2211_{j=1}^S \u03c4 d\u00b5 Exp(\u00b5, zij)\u2020 Log(Exp(\u00b5, zij), yi).\n\nHere the derivative d\u00b5 Exp(\u00b5, v) is with respect to the base point, \u00b5. 
Similar to (6), this derivative can be derived from a variation of geodesics: c(s, t) = Exp(Exp(\u00b5, su), tv(s)), where v(s) comes from parallel translating v along the geodesic Exp(\u00b5, su). Again, the derivative of the exponential map is given by a Jacobi field satisfying J\u00b5(t) = dc/ds(0, t), and we have d\u00b5 Exp(\u00b5, v) = J\u00b5(1).\n\nGradient for \u039b: For updating \u039b, we take the derivative with respect to each ath diagonal element \u039ba as\n\n\u2202Q/\u2202\u039ba = (1/S) \u2211_{i=1}^N \u2211_{j=1}^S \u03c4 (W^a x^a_ij)^T { dzij Exp(\u00b5, zij)\u2020 Log(Exp(\u00b5, zij), yi) },\n\nwhere W^a denotes the ath column of W, and x^a_ij is the ath component of xij.\n\nGradient for W: The gradient with respect to W is\n\n\u2207W Q = (1/S) \u2211_{i=1}^N \u2211_{j=1}^S \u03c4 dzij Exp(\u00b5, zij)\u2020 Log(Exp(\u00b5, zij), yi) x_ij^T \u039b.  (10)\n\nTo preserve the mutual orthogonality constraint on the columns of W, we represent W as a point on the Stiefel manifold Vq(T\u00b5M), i.e., the space of orthonormal q-frames in T\u00b5M. We project the gradient in (10) onto the tangent space TW Vq(T\u00b5M), and then update W by taking a small step along the geodesic in the projected gradient direction. For details on the geodesic computations for Stiefel manifolds, see [7].\nThe MCEM algorithm for PPGA is an iterative procedure for finding the subspace spanned by q principal components, shown in Algorithm 1. The computation time per iteration depends on the complexity of the exponential map, log map, and Jacobi field, which may vary for different manifolds. Note that the cost of the gradient ascent algorithm also depends linearly on the data size, dimensionality, and the number of samples drawn. 
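As a sketch of that numerical integration (an illustration, not the paper's code): on S\u00b2 all \u03bak = 1, R = \u03c0, and A_1 = 2\u03c0, so C(\u03c4) = 2\u03c0 \u222b_0^\u03c0 exp(\u2212(\u03c4/2) r\u00b2) sin(r) dr, which a trapezoid rule approximates well:

```python
import numpy as np

def normalizing_constant_s2(tau, num=20000):
    """C(tau) = 2*pi * integral_0^pi exp(-tau*r^2/2) * sin(r) dr on the unit 2-sphere."""
    r = np.linspace(0.0, np.pi, num)
    f = np.exp(-0.5 * tau * r**2) * np.sin(r)
    dr = r[1] - r[0]
    # trapezoid rule on a uniform grid
    return 2.0 * np.pi * np.sum(0.5 * (f[:-1] + f[1:])) * dr

c = normalizing_constant_s2(100.0)
```

For large \u03c4 the density concentrates near \u00b5, and C(\u03c4) approaches 2\u03c0/\u03c4, the normalizer of the corresponding isotropic Gaussian in the tangent plane.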
An advantage of MCEM is that it can run in parallel for each data point.\n\nAlgorithm 1 Monte Carlo Expectation Maximization for Probabilistic Principal Geodesic Analysis\n\nInput: Data set Y, reduced dimension q.\nInitialize \u00b5, W, \u039b, \u03c4.\nrepeat\n  Sample X by HMC using the gradient (7),\n  Update \u00b5, W, \u039b, \u03c4 by gradient ascent as in Section 3.2.2.\nuntil convergence\n\n4 Experiments\n\nIn this section, we demonstrate the effectiveness of PPGA and our ML estimation using both simulated data on the 2D sphere and a real corpus callosum data set. Before presenting the experiments with PPGA, we briefly review the necessary computations for the specific types of manifolds used, including the Riemannian exponential map, log map, and Jacobi fields.\n\n4.1 Simulated Sphere Data\n\nSphere geometry overview: Let p be a point on an n-dimensional sphere embedded in R^{n+1}, and let v be a tangent at p. The inner product between tangents at a base point p is the usual Euclidean inner product. The exponential map is given by a 2D rotation of p by an angle given by the norm of the tangent, i.e.,\n\nExp(p, v) = cos \u03b8 \u00b7 p + (sin \u03b8 / \u03b8) \u00b7 v,  \u03b8 = \u2016v\u2016.  (11)\n\nThe log map between two points p, q on the sphere can be computed by finding the initial velocity of the rotation between the two points. Let \u03c0p(q) = p \u00b7 \u27e8p, q\u27e9 denote the projection of the vector q onto p. Then,\n\nLog(p, q) = \u03b8 \u00b7 (q \u2212 \u03c0p(q)) / \u2016q \u2212 \u03c0p(q)\u2016,  \u03b8 = arccos(\u27e8p, q\u27e9).  (12)\n\nAll sectional curvatures of S^n are equal to one. 
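A direct transcription of (11) and (12) (a sketch; the function names are ours, and a small-angle guard stands in for proper handling of \u03b8 = 0 and the cut locus):

```python
import numpy as np

def sphere_exp(p, v):
    """Exp(p, v) = cos(theta) p + (sin(theta)/theta) v, theta = ||v||  (eq. 11)."""
    theta = np.linalg.norm(v)
    if theta < 1e-12:
        return p
    return np.cos(theta) * p + (np.sin(theta) / theta) * v

def sphere_log(p, q):
    """Log(p, q) = theta (q - <p,q> p) / ||q - <p,q> p||, theta = arccos(<p,q>)  (eq. 12)."""
    rem = q - p * np.dot(p, q)          # q minus its projection onto p
    norm_rem = np.linalg.norm(rem)
    if norm_rem < 1e-12:
        return np.zeros_like(p)
    theta = np.arccos(np.clip(np.dot(p, q), -1.0, 1.0))
    return theta * rem / norm_rem

# Round trip: inside the injectivity radius, Log(p, Exp(p, v)) recovers v.
p = np.array([0.0, 0.0, 1.0])
v = np.array([0.3, -0.2, 0.0])          # tangent at p: <p, v> = 0
q = sphere_exp(p, v)
```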
The adjoint derivatives of the exponential map are given by\n\ndp Exp(p, v)\u2020 w = cos(\u2016v\u2016) w\u22a5 + w\u22a4,\ndv Exp(p, v)\u2020 w = (sin(\u2016v\u2016)/\u2016v\u2016) w\u22a5 + w\u22a4,\n\nwhere w\u22a5, w\u22a4 denote the components of w that are orthogonal and tangent to v, respectively. An illustration of geodesics and the Jacobi fields that give rise to the exponential map derivatives is shown in Figure 1.\n\nParameter estimation on the sphere: Using our generative model for PGA (3), we forward simulated a random sample of 100 data points on the unit sphere S\u00b2, with known parameters \u03b8 = (\u00b5, W, \u039b, \u03c4), shown in Table 1. Next, we ran our maximum likelihood estimation procedure to test whether we could recover those parameters. We initialized \u00b5 from a random uniform point on the sphere. We initialized W as a random Gaussian matrix, to which we then applied the Gram-Schmidt algorithm to ensure its columns were orthonormal. Figure 1 compares the ground truth principal geodesics and the MLE principal geodesic analysis using our algorithm. The close overlap of the first principal geodesics shows that PPGA recovers the model parameters.\nOne advantage that our PPGA model has over the least-squares PGA formulation is that the mean point is estimated jointly with the principal geodesics. In the standard PGA algorithm, the mean is estimated first (using geodesic least-squares), then the principal geodesics are estimated second. This does not make a difference in the Euclidean case (principal components must pass through the mean), but it does in the nonlinear case. We compared our model with PGA and standard PCA (in the Euclidean embedding space). The estimation error of the principal geodesics turned out to be larger for PGA than for our model. Furthermore, standard PCA converges to an incorrect solution due to its inappropriate use of a Euclidean metric on Riemannian data. 
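The forward simulation described above can be sketched as follows (a toy version with q = 1 and made-up parameters; we approximate the Riemannian normal noise by an isotropic Gaussian in the tangent space at each location, which is reasonable for large \u03c4 but is our simplification, not the paper's sampler):

```python
import numpy as np

rng = np.random.default_rng(1)

def sphere_exp(p, v):
    # geodesic shooting on the sphere: Exp(p, v) = cos(|v|) p + (sin(|v|)/|v|) v
    theta = np.linalg.norm(v)
    if theta < 1e-12:
        return p
    return np.cos(theta) * p + (np.sin(theta) / theta) * v

def tangent_basis(p):
    # any orthonormal basis of the tangent plane at p (hypothetical helper)
    a = np.array([1.0, 0.0, 0.0])
    if abs(np.dot(a, p)) > 0.9:
        a = np.array([0.0, 1.0, 0.0])
    e1 = a - np.dot(a, p) * p
    e1 /= np.linalg.norm(e1)
    e2 = np.cross(p, e1)
    return e1, e2

mu = np.array([0.0, 0.0, 1.0])       # base point
e1, e2 = tangent_basis(mu)
w = e1                               # single principal direction (q = 1)
lam, tau = 0.4, 100.0

def sample_ppga(num):
    x = rng.standard_normal(num)     # latent x ~ N(0, 1)
    data = []
    for xi in x:
        loc = sphere_exp(mu, lam * xi * w)
        # approximate NM(loc, 1/tau) by a tangent-space Gaussian at loc
        f1, f2 = tangent_basis(loc)
        eps = rng.standard_normal(2) / np.sqrt(tau)
        data.append(sphere_exp(loc, eps[0] * f1 + eps[1] * f2))
    return np.asarray(data)

Y = sample_ppga(100)
```

Every simulated point stays exactly on the unit sphere, since both the latent displacement and the noise are applied through the exponential map.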
A comparison of the ground truth parameters and these methods is given in Table 1. Note that the noise precision \u03c4 is not a part of either the PGA or PCA models.\n\nTable 1: Comparison between the ground truth parameters for the simulated data and the MLE of PPGA, non-probabilistic PGA, and standard PCA.\n\nGround truth: \u00b5 = (\u22120.78, 0.48, \u22120.37), w = (\u22120.59, \u22120.42, 0.68), \u039b = 0.40, \u03c4 = 100\nPPGA:         \u00b5 = (\u22120.78, 0.48, \u22120.40), w = (\u22120.59, \u22120.43, 0.69), \u039b = 0.41, \u03c4 = 102\nPGA:          \u00b5 = (\u22120.79, 0.46, \u22120.41), w = (\u22120.59, \u22120.38, 0.70), \u039b = 0.41, \u03c4 = N/A\nPCA:          \u00b5 = (\u22120.70, 0.41, \u22120.46), w = (\u22120.62, \u22120.37, 0.69), \u039b = 0.38, \u03c4 = N/A\n\nFigure 1: Left: Jacobi fields; Right: the principal geodesic of randomly generated data on the unit sphere. Blue dots: randomly generated sphere data set. Yellow line: ground truth principal geodesic. Red line: estimated principal geodesic using PPGA.\n\n4.2 Shape Analysis of the Corpus Callosum\n\nShape space geometry: A configuration of k points in the 2D plane is considered as a complex k-vector, z \u2208 C^k. Removing translation, by requiring the centroid to be zero, projects this point to the linear complex subspace V = {z \u2208 C^k : \u2211 zi = 0}, which is equivalent to the space C^{k\u22121}. Next, points in this subspace are deemed equivalent if they are a rotation and scaling of each other, which can be represented as multiplication by a complex number, \u03c1e^{i\u03b8}, where \u03c1 is the scaling factor and \u03b8 is the rotation angle. The set of such equivalence classes forms the complex projective space, CP^{k\u22122}.\nWe think of a centered shape p \u2208 V as representing the complex line Lp = {z \u00b7 p : z \u2208 C \\ {0}}, i.e., Lp consists of all point configurations with the same shape as p. A tangent vector at Lp is a complex vector, v \u2208 V, such that \u27e8p, v\u27e9 = 0. 
The exponential map is given by rotating (within V) the complex line Lp by the initial velocity v, that is,\n\nExp(p, v) = cos \u03b8 \u00b7 p + (\u2016p\u2016 sin \u03b8 / \u03b8) \u00b7 v,  \u03b8 = \u2016v\u2016.  (13)\n\nLikewise, the log map between two shapes p, q \u2208 V is given by finding the initial velocity of the rotation between the two complex lines Lp and Lq. Let \u03c0p(q) = p \u00b7 \u27e8p, q\u27e9/\u2016p\u2016\u00b2 denote the projection of the vector q onto p. Then the log map is given by\n\nLog(p, q) = \u03b8 \u00b7 (q \u2212 \u03c0p(q)) / \u2016q \u2212 \u03c0p(q)\u2016,  \u03b8 = arccos(|\u27e8p, q\u27e9| / (\u2016p\u2016\u2016q\u2016)).  (14)\n\nThe sectional curvatures of CP^{k\u22122}, \u03bai = K(ui, v), used in (9), can be computed as follows. Let u1 = i \u00b7 v, where we treat v as a complex vector and i = \u221a\u22121. The remaining u2, . . ., un can be chosen arbitrarily to construct an orthonormal frame with v and u1. Then we have K(u1, v) = 4 and K(ui, v) = 1 for i > 1. The adjoint derivatives of the exponential map are given by\n\ndp Exp(p, v)\u2020 w = cos(2\u2016v\u2016) w\u22a5_1 + cos(\u2016v\u2016) w\u22a5_2 + w\u22a4,\ndv Exp(p, v)\u2020 w = (sin(2\u2016v\u2016)/(2\u2016v\u2016)) w\u22a5_1 + (sin(\u2016v\u2016)/\u2016v\u2016) w\u22a5_2 + w\u22a4,\n\nwhere w\u22a5_1 = \u27e8w, u1\u27e9u1 denotes the component of w parallel to u1, w\u22a5_2 denotes the remaining orthogonal component of w, and w\u22a4 denotes the component tangent to v.\n\nShape variability of corpus callosum data: As a demonstration of PPGA on Kendall shape space, we applied it to corpus callosum shape data derived from the OASIS database (www.oasis-brains.org). The data consisted of magnetic resonance images (MRI) from 32 healthy adult subjects. 
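Equations (13) and (14) translate directly into complex arithmetic (a sketch with our own function names; shapes are centered complex k-vectors, and the tangent v satisfies \u27e8p, v\u27e9 = 0):

```python
import numpy as np

def shape_exp(p, v):
    """Exp(p, v) = cos(theta) p + (||p|| sin(theta)/theta) v, theta = ||v||  (eq. 13)."""
    theta = np.linalg.norm(v)
    if theta < 1e-12:
        return p
    return np.cos(theta) * p + (np.linalg.norm(p) * np.sin(theta) / theta) * v

def shape_log(p, q):
    """Log(p, q) with theta = arccos(|<p, q>| / (||p|| ||q||))  (eq. 14)."""
    inner = np.vdot(p, q)                       # Hermitian inner product <p, q>
    rem = q - p * inner / np.linalg.norm(p)**2  # q minus its projection onto p
    norm_rem = np.linalg.norm(rem)
    if norm_rem < 1e-12:
        return np.zeros_like(p)
    cos_t = abs(inner) / (np.linalg.norm(p) * np.linalg.norm(q))
    theta = np.arccos(np.clip(cos_t, -1.0, 1.0))
    return theta * rem / norm_rem

# Round trip on a centered 3-point shape; v is centered and <p, v> = 0.
p = np.array([1 + 0j, -1 + 0j, 0 + 0j])
v = np.array([0.1j, 0.1j, -0.2j])
q = shape_exp(p, v)
```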
The corpus callosum was segmented in a midsagittal slice using the ITK SNAP program (www.itksnap.org). An example of a segmented corpus callosum in an MRI is shown in Figure 2. The boundaries of these segmentations were sampled with 64 points using ShapeWorks (www.sci.utah.edu/software.html). This algorithm generates a sampling of a set of shape boundaries while enforcing correspondences between the different point models within the population. Figure 2 displays the first two modes of corpus callosum shape variation, generated as points along the estimated principal geodesics: Exp(\u00b5, \u03b1i wi), where \u03b1i = \u22123\u03bbi, \u22121.5\u03bbi, 0, 1.5\u03bbi, 3\u03bbi, for i = 1, 2.\n\nFigure 2: Left: example corpus callosum segmentation from an MRI slice. Middle to right: first and second PGA modes of shape variation with \u22123, \u22121.5, 1.5, and 3 \u00d7 \u03bb.\n\n5 Conclusion\n\nWe presented a latent variable model of PGA on Riemannian manifolds. We developed a Monte Carlo Expectation Maximization algorithm for maximum likelihood estimation of the parameters, which uses Hamiltonian Monte Carlo to integrate over the posterior distribution of the latent variables. This work takes a first step toward bringing latent variable models to Riemannian manifolds. It opens up several possibilities for new factor analyses on Riemannian manifolds, including a rigorous formulation for mixture models of PGA and automatic dimensionality selection with a Bayesian formulation of PGA.\n\nAcknowledgments This work was supported in part by NSF CAREER Grant 1054057.\n\nReferences\n[1] F. R. Bach and M. I. Jordan. A probabilistic interpretation of canonical correlation analysis. Technical Report 608, Department of Statistics, University of California, Berkeley, 2005.\n[2] A. Bhattacharya and D. B. Dunson. Nonparametric Bayesian density estimation on manifolds with applications to planar shapes. Biometrika, 97(4):851\u2013865, 2010.\n[3] C. M. 
Bishop. Bayesian PCA. Advances in neural information processing systems, pages 382\u2013388, 1999.\n[4] S. Byrne and M. Girolami. Geodesic Monte Carlo on embedded manifolds. arXiv preprint arXiv:1301.6064, 2013.\n[5] N. Courty, T. Burger, and P. F. Marteau. Geodesic analysis on the Gaussian RKHS hypersphere. In Machine Learning and Knowledge Discovery in Databases, pages 299\u2013313, 2012.\n[6] M. do Carmo. Riemannian Geometry. Birkh\u00e4user, 1992.\n[7] A. Edelman, T. A. Arias, and S. T. Smith. The geometry of algorithms with orthogonality constraints. SIAM Journal on Matrix Analysis and Applications, 20(2):303\u2013353, 1998.\n[8] P. T. Fletcher. Geodesic regression and the theory of least squares on Riemannian manifolds. International Journal of Computer Vision, pages 1\u201315, 2012.\n[9] P. T. Fletcher and S. Joshi. Principal geodesic analysis on symmetric spaces: statistics of diffusion tensors. In Workshop on Computer Vision Approaches to Medical Image Analysis (CVAMIA), 2004.\n[10] P. T. Fletcher, C. Lu, and S. Joshi. Statistics of shape via principal geodesic analysis on Lie groups. In Computer Vision and Pattern Recognition, pages 95\u2013101, 2003.\n[11] S. Huckemann and H. Ziezold. Principal component analysis for Riemannian manifolds, with an application to triangular shape spaces. Advances in Applied Probability, 38(2):299\u2013319, 2006.\n[12] I. T. Jolliffe. Principal Component Analysis, volume 487. Springer-Verlag New York, 1986.\n[13] S. Jung, I. L. Dryden, and J. S. Marron. Analysis of principal nested spheres. Biometrika, 99(3):551\u2013568, 2012.\n[14] D. G. Kendall. Shape manifolds, Procrustean metrics, and complex projective spaces. Bulletin of the London Mathematical Society, 16:81\u2013121, 1984.\n[15] N. D. Lawrence. Gaussian process latent variable models for visualisation of high dimensional data. 
Advances in neural information processing systems, 16:329\u2013336, 2004.\n[16] K. V. Mardia. Directional Statistics. John Wiley and Sons, 1999.\n[17] X. Pennec. Intrinsic statistics on Riemannian manifolds: basic tools for geometric measurements. Journal of Mathematical Imaging and Vision, 25(1), 2006.\n[18] S. Roweis. EM algorithms for PCA and SPCA. Advances in neural information processing systems, pages 626\u2013632, 1998.\n[19] S. Said, N. Courty, N. Le Bihan, and S. J. Sangwine. Exact principal geodesic analysis for data on SO(3). In Proceedings of the 15th European Signal Processing Conference, pages 1700\u20131705, 2007.\n[20] B. Sch\u00f6lkopf, A. Smola, and K. R. M\u00fcller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5):1299\u20131319, 1998.\n[21] S. Sommer, F. Lauze, S. Hauberg, and M. Nielsen. Manifold valued statistics, exact principal geodesic analysis and the effect of linear approximations. In Proceedings of the European Conference on Computer Vision, pages 43\u201356, 2010.\n[22] M. E. Tipping and C. M. Bishop. Probabilistic principal component analysis. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 61(3):611\u2013622, 1999.\n[23] P. Turaga, A. Veeraraghavan, A. Srivastava, and R. Chellappa. Statistical computations on Grassmann and Stiefel manifolds for image and video-based recognition. IEEE Trans. Pattern Analysis and Machine Intelligence, 33(11):2273\u20132286, 2011.\n[24] O. Tuzel, F. Porikli, and P. Meer. Pedestrian detection via classification on Riemannian manifolds. IEEE Trans. Pattern Analysis and Machine Intelligence, 30(10):1713\u20131727, 2008.\n[25] M. Zhang, N. Singh, and P. T. Fletcher. Bayesian estimation of regularization and atlas building in diffeomorphic image registration. In Information Processing in Medical Imaging, pages 37\u201348. 
Springer, 2013.\n", "award": [], "sourceid": 618, "authors": [{"given_name": "Miaomiao", "family_name": "Zhang", "institution": "University of Utah"}, {"given_name": "Tom", "family_name": "Fletcher", "institution": "University of Utah"}]}