{"title": "Learning spatiotemporal trajectories from manifold-valued longitudinal data", "book": "Advances in Neural Information Processing Systems", "page_first": 2404, "page_last": 2412, "abstract": "We propose a Bayesian mixed-effects model to learn typical scenarios of changes from longitudinal manifold-valued data, namely repeated measurements of the same objects or individuals at several points in time. The model allows to estimate a group-average trajectory in the space of measurements. Random variations of this trajectory result from spatiotemporal transformations, which allow changes in the direction of the trajectory and in the pace at which trajectories are followed. The use of the tools of Riemannian geometry allows to derive a generic algorithm for any kind of data with smooth constraints, which lie therefore on a Riemannian manifold. Stochastic approximations of the Expectation-Maximization algorithm is used to estimate the model parameters in this highly non-linear setting.The method is used to estimate a data-driven model of the progressive impairments of cognitive functions during the onset of Alzheimer's disease. Experimental results show that the model correctly put into correspondence the age at which each individual was diagnosed with the disease, thus validating the fact that it effectively estimated a normative scenario of disease progression. 
Random effects provide unique insights into the variations in the ordering and timing of the succession of cognitive impairments across different individuals.", "full_text": "Learning spatiotemporal trajectories from manifold-valued longitudinal data

Jean-Baptiste Schiratti2,1, Stéphanie Allassonnière2, Olivier Colliot1, Stanley Durrleman1
1 ARAMIS Lab, INRIA Paris, Inserm U1127, CNRS UMR 7225, Sorbonne Universités, UPMC Univ Paris 06 UMR S 1127, Institut du Cerveau et de la Moelle épinière, ICM, F-75013, Paris, France
2 CMAP, Ecole Polytechnique, Palaiseau, France
jean-baptiste.schiratti@cmap.polytechnique.fr, stephanie.allassonniere@polytechnique.edu, olivier.colliot@upmc.fr, stanley.durrleman@inria.fr

Abstract

We propose a Bayesian mixed-effects model to learn typical scenarios of change from longitudinal manifold-valued data, namely repeated measurements of the same objects or individuals at several points in time. The model estimates a group-average trajectory in the space of measurements. Random variations of this trajectory result from spatiotemporal transformations, which allow changes in the direction of the trajectory and in the pace at which trajectories are followed. The tools of Riemannian geometry allow us to derive a generic algorithm for any kind of data with smooth constraints, which therefore lie on a Riemannian manifold. A stochastic approximation of the Expectation-Maximization algorithm is used to estimate the model parameters in this highly non-linear setting. The method is used to estimate a data-driven model of the progressive impairment of cognitive functions during the onset of Alzheimer's disease. Experimental results show that the model correctly puts into correspondence the ages at which individuals were diagnosed with the disease, thus validating the fact that it effectively estimated a normative scenario of disease progression. 
Random effects provide unique insights into the variations in the ordering and timing of the succession of cognitive impairments across different individuals.

1 Introduction

Age-related brain diseases, such as Parkinson's or Alzheimer's disease (AD), are complex diseases with multiple effects on the metabolism, structure and function of the brain. Models of disease progression showing the sequence and timing of these effects during the course of the disease remain largely hypothetical [3, 13]. Large databases have been collected recently in the hope of giving experimental evidence of the patterns of disease progression based on the estimation of data-driven models. These databases are longitudinal, in the sense that they contain repeated measurements of several subjects at multiple time-points, which do not necessarily correspond across subjects. Learning models of disease progression from such databases raises great methodological challenges. The main difficulty lies in the fact that the age of a given individual gives no information about the stage of disease progression of this individual. The onset of clinical symptoms of AD may vary between forty and eighty years of age, and the duration of the disease from a few years to decades. Moreover, the onset of the disease does not coincide with the onset of the symptoms: according to recent studies, symptoms are likely to be preceded by a silent phase of the disease, about which little is known. As a consequence, statistical models based on the regression of measurements with age are inadequate to model disease progression.

The set of measurements of a given individual at a specific time-point belongs to a high-dimensional space. Building a model of disease progression amounts to estimating continuous subject-specific trajectories in this space and averaging those trajectories among a group of individuals. 
Trajectories need to be registered in space, to account for the fact that individuals follow different trajectories, and in time, to account for the fact that individuals, even if they follow the same trajectory, may be at a different position on this trajectory at the same age.

The framework of mixed-effects models seems to be well suited to deal with this hierarchical problem. Mixed-effects models for longitudinal measurements were introduced in the seminal paper of Laird and Ware [15] and have been widely developed since then (see [6], [16] for instance). However, this kind of model suffers from two main drawbacks regarding our problem. These models are built on the estimation of the distribution of the measurements at a given time point. In many situations, this reference time is given by the experimental set-up: the date at which treatment begins, the date of seeding in studies of plant growth, etc. In studies of ageing, using these models would require registering the data of each individual to a common stage of disease progression before they can be compared. Unfortunately, this stage is unknown, and such a temporal registration is actually what we wish to estimate. Another limitation of usual mixed-effects models is that they are defined for data lying in Euclidean spaces. However, measurements with smooth constraints usually cannot be summed or scaled, such as normalized scores of neuropsychological tests, symmetric positive-definite matrices, or shapes encoded as images or meshes. These data are naturally modeled as points on Riemannian manifolds. Although the development of statistical models for manifold-valued data is a blooming topic, the construction of statistical models for longitudinal data on a manifold remains an open problem.

The concept of "time-warp" was introduced in [8] to allow for temporal registration of trajectories of shape changes. 
Nevertheless, the combination of the time-warps with the intrinsic variability of shapes across individuals comes at the expense of a simplifying approximation: the variance of shapes does not depend on time, whereas it should adapt with the average scenario of shape changes. Moreover, the estimation of the parameters of the statistical model is made by minimizing a sum of squares which results from an uncontrolled likelihood approximation. In [18], time-warps are used to define a metric between curves that is invariant under time reparameterization. This invariance, by definition, prevents the estimation of correspondences across trajectories, and therefore the estimation of the distribution of trajectories in the spatiotemporal domain. In [17], the authors proposed a model for longitudinal image data, but it is not built on the inference of a statistical model and does not include a time reparameterization of the estimated trajectories.

In this paper, we propose a generic statistical framework for the definition and estimation of mixed-effects models for longitudinal manifold-valued data. Using the tools of geometry allows us to derive a method that makes few assumptions about the data and the problem at hand. Modeling choices boil down to the definition of the metric on the manifold. This geometrical modeling also allows us to introduce the concept of parallel curves on a manifold, which is key to uniquely decomposing differences seen in the data into a spatial and a temporal component. Because of the non-linearity of the model, the estimation of the parameters should be based on an adequate maximization of the observed likelihood. 
To address this issue, we propose to use a stochastic version of the Expectation-Maximization algorithm [5], namely the MCMC SAEM [2], for which theoretical convergence results have been proved in [4], [2]. Experimental results on neuropsychological test scores and estimates of scenarios of AD progression are given in section 4.

2 Spatiotemporal mixed-effects model for manifold-valued data

2.1 Riemannian geometry setting

The observed data consist of repeated multivariate measurements of p individuals. For a given individual, the measurements are obtained at time points ti,1 < . . . < ti,ni. The j-th measurement of the i-th individual is denoted by yi,j. We assume that each observation yi,j is a point on an N-dimensional Riemannian manifold M embedded in RP (with P ≥ N) and equipped with a Riemannian metric gM. We denote by ∇M the covariant derivative. We assume that the manifold is geodesically complete, meaning that geodesics are defined for all time.

We recall that a geodesic is a curve drawn on the manifold, γ : R → M, which has no acceleration: ∇M_γ̇ γ̇ = 0. For a point p ∈ M and a vector v ∈ TpM, the mapping ExpM_p(v) denotes the Riemannian exponential, namely the point that is reached at time 1 by the geodesic starting at p with velocity v. The parallel transport of a vector X0 ∈ Tγ(t0)M in the tangent space at point γ(t0) on a curve γ is a time-indexed family of vectors X(t) ∈ Tγ(t)M which satisfies ∇M_γ̇(t) X(t) = 0 and X(t0) = X0. We denote by Pγ,t0,t(X0) the isometry that maps X0 to X(t). In order to describe our model, we need to introduce the notion of "parallel curves" on the manifold:

Definition 1. Let γ be a curve on M defined for all time, t0 ∈ R a time-point and w ∈ Tγ(t0)M, w ≠ 0. 
One defines the curve s → ηw(γ, s), called parallel to the curve γ, as:

ηw(γ, s) = ExpM_γ(s)( Pγ,t0,s(w) ), s ∈ R.

The idea is illustrated in Fig. 1. One uses the parallel transport to move the vector w from γ(t0) to γ(s) along γ. At the point γ(s), a new point on M is obtained by taking the Riemannian exponential of Pγ,t0,s(w). This new point is denoted by ηw(γ, s). As s varies, one describes a curve ηw(γ, ·) on M, which can be understood as a "parallel" to the curve γ. It should be pointed out that, even if γ is a geodesic, ηw(γ, ·) is, in general, not a geodesic of M. In the Euclidean case (i.e. a flat manifold), the curve ηw(γ, ·) is the translation of the curve γ: ηw(γ, s) = γ(s) + w.

Figure 1: Model description on a schematic manifold. Figure a) (left): a non-zero vector wi is chosen in Tγ(t0)M. Figure b) (middle): the tangent vector wi is transported along the geodesic γ and a point ηwi(γ, s) is constructed at time s by use of the Riemannian exponential. Figure c) (right): the curve ηwi(γ, ·) is the parallel resulting from the construction.

2.2 Generic spatiotemporal model for longitudinal data

Our model is built in a hierarchical manner: data points are seen as samples along individual trajectories, and these trajectories derive from a group-average trajectory. The model writes yi,j = ηwi(γ, ψi(ti,j)) + εi,j, where we assume the group-average trajectory to be a geodesic, denoted γ from now on. Individual trajectories derive from the group average by spatiotemporal transformations. They are defined as a time re-parameterization of a trajectory that is parallel to the group average: t → ηwi(γ, ψi(t)). 
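The Euclidean special case mentioned above can be checked directly. The following minimal Python sketch is our illustration, not code from the paper: in a flat manifold, parallel transport is the identity and the Riemannian exponential is vector addition, so the parallel curve reduces to a translation.

```python
import numpy as np

def eta_w_flat(gamma, w, s):
    """Parallel curve of Definition 1, specialized to Euclidean space."""
    transported_w = w                 # parallel transport: identity in R^n
    return gamma(s) + transported_w   # Riemannian exponential: addition

gamma = lambda s: np.array([s, s ** 2])  # an arbitrary smooth curve in R^2
w = np.array([0.0, 1.0])
for s in (-1.0, 0.0, 2.5):
    # eta_w(gamma, s) = gamma(s) + w in the flat case
    assert np.allclose(eta_w_flat(gamma, w, s), gamma(s) + w)
```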
For the i-th individual, wi denotes a non-zero tangent vector in Tγ(t0)M, for some specific time point t0 that needs to be estimated, which is orthogonal to the tangent vector γ̇(t0) for the inner product given by the metric (⟨·,·⟩γ(t0) = gM_γ(t0)). The time-warp function ψi is defined as: ψi(t) = αi(t − t0 − τi) + t0. The parameter αi is an acceleration factor which encodes whether the i-th individual is progressing faster or slower than the average, τi is a time-shift which characterizes the advance or delay of the i-th individual with respect to the average, and wi is a space-shift which encodes the variability in the measurements across individuals at the same stage of progression.

The normal tubular neighborhood theorem ([11]) ensures that parallel shifting defines a spatiotemporal coordinate system as long as the vectors wi are chosen orthogonal and sufficiently small. The orthogonality condition on the tangent vectors wi is necessary to ensure the identifiability of the model. Indeed, if a vector wi were not chosen orthogonal, its orthogonal projection would play the same role as the acceleration factor. The spatial and temporal transformations commute, in the sense that one may re-parameterize the average trajectory before building the parallel curve, or vice versa. Mathematically, this writes ηwi(γ ∘ ψi, s) = ηwi(γ, ψi(s)). This relation also explains the particular form of the affine time-warp ψi. The geodesic γ is characterized by the fact that it passes at time-point t0 by point p0 = γ(t0) with velocity v0 = γ̇(t0). 
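As a small numerical illustration of the affine time-warp ψi(t) = αi(t − t0 − τi) + t0, here is a Python sketch with illustrative (hypothetical) values of αi, τi and t0, not estimates from the paper:

```python
def time_warp(t, alpha_i, tau_i, t0):
    """Affine time-warp psi_i(t) = alpha_i * (t - t0 - tau_i) + t0.

    alpha_i > 1 encodes faster-than-average progression;
    tau_i > 0 encodes a delayed onset relative to the average."""
    return alpha_i * (t - t0 - tau_i) + t0

t0 = 70.0
# A hypothetical individual progressing twice as fast, onset delayed by 5 years:
alpha_i, tau_i = 2.0, 5.0
# At age t0 + tau_i the individual sits at the average position reached at t0:
assert time_warp(t0 + tau_i, alpha_i, tau_i, t0) == t0
# One year later, the individual has moved two years along the average trajectory:
assert time_warp(t0 + tau_i + 1.0, alpha_i, tau_i, t0) == t0 + 2.0
```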
Then, γ ∘ ψi is the same trajectory, except that it passes by point p0 at time t0 + τi with velocity αiv0.

The fixed effects of the model are the parameters of the average geodesic: the point p0 on the manifold, the time-point t0 and the velocity v0. The random effects are the acceleration factors αi, the time-shifts τi and the space-shifts wi. The first two random effects are scalars. One assumes the acceleration factors to follow a log-normal distribution (they need to be positive in order not to reverse time), and the time-shifts to follow a zero-mean Gaussian distribution. Space-shifts are vectors of dimension N − 1 in the hyperplane γ̇(t0)⊥ in Tγ(t0)M. In the spirit of independent component analysis [12], we assume that the wi's result from the superposition of Ns < N statistically independent components. This writes wi = Asi, where A is an N × Ns matrix of rank Ns, si a vector of Ns independent sources following a heavy-tailed Laplace distribution with fixed parameter, and each column cj(A) (1 ≤ j ≤ Ns) of A satisfies the orthogonality condition ⟨cj(A), γ̇(t0)⟩γ(t0) = 0. For the dataset (ti,j, yi,j) (1 ≤ i ≤ p, 1 ≤ j ≤ ni), the model may be summarized as:

yi,j = ηwi(γ, ψi(ti,j)) + εi,j, (1)

with ψi(t) = αi(t − t0 − τi) + t0, αi = exp(ξi), wi = Asi and

ξi i.i.d. ∼ N(0, σξ²), τi i.i.d. ∼ N(0, στ²), εi,j i.i.d. ∼ N(0, σ²IN), si,l i.i.d. ∼ Laplace(1/2).

Finally, the parameters of the model one needs to estimate are the fixed effects and the variances of the random effects, namely θ = (p0, t0, v0, σξ, στ, σ, vec(A)).

2.3 Propagation model in a product manifold

We wish to use these developments to study the temporal progression of a 
family of biomarkers. We assume that each component of yi,j is a scalar measurement of a given biomarker and belongs to a geodesically complete one-dimensional manifold (M, g). Therefore, each measurement yi,j is a point in the product manifold M = M^N, which we assume to be equipped with the Riemannian product metric gM = g + . . . + g. We denote by γ0 the geodesic of the one-dimensional manifold M which goes through the point p0 ∈ M at time t0 with velocity v0 ∈ Tp0M. In order to determine the relative progression of the biomarkers among themselves, we consider a parametric family of geodesics of M: γδ(t) = (γ0(t), γ0(t + δ1), . . . , γ0(t + δN−1)). We assume here that all biomarkers have on average the same dynamics, but shifted in time. This hypothesis allows us to model a temporal succession of effects during the course of the disease. The relative timing in biomarker changes is measured by the vector δ = (0, δ1, . . . , δN−1), which becomes a fixed effect of the model.

In this setting, a curve that is parallel to a geodesic γ is given by the following lemma:

Lemma 1. Let γ be a geodesic of the product manifold M = M^N and let t0 ∈ R. If ηw(γ, ·) denotes a parallel to the geodesic γ with w = (w1, . . . , wN) ∈ Tγ(t0)M and γ(t) = (γ1(t), . . . , γN(t)), we have:

ηw(γ, s) = ( γ1( w1/γ̇1(t0) + s ), . . . , γN( wN/γ̇N(t0) + s ) ), s ∈ R.

As a consequence, a parallel to the average trajectory γδ has the same form as the geodesic, but with randomly perturbed delays. The model (1) writes: for all k ∈ {1, . . . 
, N},

yi,j,k = γ0( wk,i/γ̇0(t0 + δk−1) + αi(ti,j − t0 − τi) + t0 + δk−1 ) + εi,j,k, (2)

where wk,i denotes the k-th component of the space-shift wi and yi,j,k the measurement of the k-th biomarker, at the j-th time point, for the i-th individual.

2.4 Multivariate logistic curves model

The propagation model given in (2) is now described for normalized biomarkers, such as scores of neuropsychological tests. In this case, we assume the manifold to be M = ]0, 1[ equipped with the Riemannian metric g given by: for p ∈ ]0, 1[, (u, v) ∈ TpM × TpM, gp(u, v) = uG(p)v with G(p) = 1/(p²(1 − p)²). The geodesics given by this metric in the one-dimensional Riemannian manifold M are logistic curves of the form:

γ0(t) = ( 1 + (1/p0 − 1) exp( −v0(t − t0)/(p0(1 − p0)) ) )⁻¹,

which leads to the multivariate logistic curves model in M. Note the quite unusual parameterization of the logistic curve. This parameterization arises naturally because γ0 satisfies γ0(t0) = p0 and γ̇0(t0) = v0. In this case, the model (1) writes:

yi,j,k = ( 1 + (1/p0 − 1) exp( −[ v0αi(ti,j − t0 − τi) + v0δk + v0(Asi)k/γ̇0(t0 + δk) ] / (p0(1 − p0)) ) )⁻¹ + εi,j,k, (3)

where (Asi)k denotes the k-th component of the vector Asi. Note that (3) is not equivalent to a linear model on the logit of the observations. The logit transform corresponds to the Riemannian logarithm at p0 = 0.5. In our framework, p0 is not fixed, but estimated as a parameter of our model. 
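The stated properties γ0(t0) = p0 and γ̇0(t0) = v0 of this parameterization can be checked numerically. A minimal Python sketch (our illustration), using as example values the fixed effects reported in section 4.2:

```python
import math

def gamma0(t, p0, t0, v0):
    """Logistic geodesic of the metric G(p) = 1/(p^2 (1-p)^2) on ]0,1[,
    parameterized so that gamma0(t0) = p0 and gamma0'(t0) = v0."""
    return 1.0 / (1.0 + (1.0 / p0 - 1.0)
                  * math.exp(-v0 * (t - t0) / (p0 * (1.0 - p0))))

p0, t0, v0 = 0.3, 72.0, 0.04   # example values from section 4.2
# The curve passes through p0 at time t0:
assert abs(gamma0(t0, p0, t0, v0) - p0) < 1e-12
# Its numerical derivative at t0 is close to v0:
h = 1e-6
deriv = (gamma0(t0 + h, p0, t0, v0) - gamma0(t0 - h, p0, t0, v0)) / (2 * h)
assert abs(deriv - v0) < 1e-6
```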
Even with a fixed p0 = 0.5, the model is still non-linear due to the multiplication between the random effects αi and τi, and therefore does not boil down to the usual linear model [15].

3 Parameters estimation

In this section, we explain how to use a stochastic version of the Expectation-Maximization (EM) algorithm [5] to produce estimates of the parameters θ = (p0, t0, v0, δ, σξ, στ, σ, vec(A)) of the model. The algorithm detailed in this section is essentially the same as in [2]. Its scope of application is not limited to statistical models on product manifolds: the MCMC-SAEM algorithm can actually be used for the inference of a very large family of statistical models.

The random effects z = (ξi, τi, sj,i) (1 ≤ i ≤ p and 1 ≤ j ≤ Ns) are considered as hidden variables. With the observed data y = (yi,j,k)i,j,k, (y, z) form the complete data of the model. In this context, the Expectation-Maximization (EM) algorithm proposed in [5] is very efficient for computing the maximum likelihood estimate of θ. Due to the nonlinearity and complexity of the model, the E step is intractable. As a consequence, we considered a stochastic version of the EM algorithm, namely the Monte Carlo Markov Chain Stochastic Approximation Expectation-Maximization (MCMC-SAEM) algorithm [2], based on [4]. This algorithm is an EM-like algorithm which alternates between three steps: simulation, stochastic approximation and maximization. If θ(t) denotes the current parameter estimate of the algorithm, in the simulation step a sample z(t) of the missing data is obtained from the transition kernel of an ergodic Markov chain whose stationary distribution is the conditional distribution of the missing data z knowing y and θ(t), denoted by q(z | y, θ(t)). This simulation step is achieved using a Metropolis-Hastings-within-Gibbs sampler. 
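The three alternating steps of the MCMC-SAEM iteration can be sketched as follows. This is a generic skeleton with placeholder routines (`sample_z`, `suff_stats` and `maximize` are hypothetical names standing in for the model-specific sampler, sufficient statistics and closed-form M-step), not the authors' implementation; the toy check at the end estimates the mean of Gaussian data through a dummy latent variable.

```python
import numpy as np

def mcmc_saem(y, theta0, sample_z, suff_stats, maximize,
              n_iter=500, n_burnin=100):
    """Generic MCMC-SAEM skeleton: simulation, stochastic approximation,
    maximization."""
    theta, z, S = theta0, None, None
    for k in range(n_iter):
        # 1) Simulation: one MCMC transition targeting q(z | y, theta),
        #    e.g. Metropolis-Hastings within Gibbs.
        z = sample_z(y, z, theta)
        # 2) Stochastic approximation of the sufficient statistics, with
        #    step sizes eps_k = 1 during burn-in, then decreasing so that
        #    sum(eps_k) diverges while sum(eps_k^2) converges.
        eps = 1.0 if k < n_burnin else (k - n_burnin + 1) ** -0.65
        S_new = suff_stats(y, z)
        S = S_new if S is None else S + eps * (S_new - S)
        # 3) Maximization: update theta from the approximated statistics.
        theta = maximize(S)
    return theta

# Toy check: the latent variable is a noisy copy of the observations and the
# sufficient statistic is its mean, so theta should converge near mean(y).
rng = np.random.default_rng(0)
y = rng.normal(3.0, 1.0, size=200)
theta_hat = mcmc_saem(
    y,
    theta0=0.0,
    sample_z=lambda y, z, theta: y + rng.normal(0.0, 0.1, size=y.shape),
    suff_stats=lambda y, z: z.mean(),
    maximize=lambda S: S,
)
assert abs(theta_hat - 3.0) < 0.3
```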
Note that the high complexity of our model prevents us from resorting to sampling methods as in [10], as they would require heavy computations, such as the Fisher information matrix. The stochastic approximation step consists in a stochastic approximation of the complete log-likelihood log q(y, z | θ), summarized as follows: Qt(θ) = Qt−1(θ) + εt [log q(y, z | θ) − Qt−1(θ)], where (εt)t is a decreasing sequence of positive step-sizes in ]0, 1] which satisfies Σt εt = +∞ and Σt εt² < +∞. Finally, the parameter estimates are updated in the maximization step according to: θ(t+1) = argmaxθ∈Θ Qt(θ).

The theoretical convergence of the MCMC SAEM algorithm is proved only if the model belongs to the curved exponential family or, equivalently, if the complete log-likelihood of the model may be written: log q(y, z | θ) = −φ(θ) + S(y, z)⊤ψ(θ), where S(y, z) is a sufficient statistic of the model. In this case, the stochastic approximation of the complete log-likelihood can be replaced with a stochastic approximation of the sufficient statistics of the model. Note that the multivariate logistic curves model does not belong to the curved exponential family. A usual workaround consists in regarding the parameters of the model as realizations of independent Gaussian random variables ([14]): θ ∼ N(θ̄, D), where D is a diagonal matrix with very small diagonal entries, and the estimation now targets θ̄. This yields: p0 ∼ N(p̄0, σp0²), t0 ∼ N(t̄0, σt0²), v0 ∼ N(v̄0, σv0²) and, for all k, δk ∼ N(δ̄k, σδ²). To ensure the orthogonality condition on the columns of A, we assumed that A follows a normal distribution on the space Σ = {A = (c1(A), . . . 
, cNs(A)) ∈ (Tγδ(t0)M)^Ns ; ∀j, ⟨cj(A), γ̇δ(t0)⟩γδ(t0) = 0}. Equivalently, we assume that the matrix A writes: A = Σ_{k=1}^{(N−1)Ns} βk Bk, where, for all k, βk i.i.d. ∼ N(β̄k, σβ²) and (B1, . . . , B(N−1)Ns) is an orthonormal basis of Σ obtained by application of the Gram-Schmidt process to a basis of Σ. The random variables β1, . . . , β(N−1)Ns are considered as new hidden variables of the model. The parameters of the model are θ = (p̄0, t̄0, v̄0, (δ̄k)1≤k≤N−1, (β̄k)1≤k≤(N−1)Ns, σξ, στ, σ), whereas the hidden variables of the model are z = (p0, t0, v0, (δk)1≤k≤N−1, (βk)1≤k≤(N−1)Ns, (ξi)1≤i≤p, (τi)1≤i≤p, (sj,i)1≤j≤Ns, 1≤i≤p). Algorithm 1 given below summarizes the SAEM algorithm for this model. The MCMC-SAEM algorithm was tested on synthetic data generated according to (3); it allowed us to recover the parameters used to generate the synthetic dataset.

Algorithm 1 Overview of the MCMC SAEM algorithm for the multivariate logistic curves model.

If z(k) = (p0(k), t0(k), v0(k), (δj(k))j, (βj(k))j, (ξi(k))i, (τi(k))i, (sj,i(k))j,i) denotes the vector of hidden variables obtained in the simulation step of the k-th iteration of the MCMC SAEM, let fi,j = [fi,j,l] ∈ RN, with fi,j,l the l-th component of η_wi(k)((γδ)(k), exp(ξi(k))(ti,j − t0(k) − τi(k)) + t0(k)) and wi(k) = Σ_{l=1}^{(N−1)Ns} βl(k) Bl.

Initialization: θ ← θ(0) ; z(0) ← random ; S ← 0 ; (εk)k≥0.
repeat
  Simulation step: z(k) ← Gibbs Sampler(z(k−1), y, θ(k−1)).
  Compute the sufficient statistics (with 1 ≤ i ≤ p, 1 ≤ j ≤ ni and K = Σ_{i=1}^p ni):
  S1(k) = [yi,j⊤ fi,j] ∈ RK ; S2(k) = [‖fi,j‖²] ∈ RK ; S3(k) = [(τi(k))²] ∈ Rp ; S4(k) = [(ξi(k))²] ∈ Rp ;
  S5(k) = p0(k) ; S6(k) = t0(k) ; S7(k) = v0(k) ; S8(k) = [δj(k)] ∈ R^{N−1} ; S9(k) = [βj(k)] ∈ R^{(N−1)Ns}.
  Stochastic approximation step: Sj(k+1) ← Sj(k) + εk (Sj(y, z(k)) − Sj(k)) for j ∈ {1, . . . , 9}.
  Maximization step: p̄0(k+1) ← S5(k) ; t̄0(k+1) ← S6(k) ; v̄0(k+1) ← S7(k) ; δ̄j(k+1) ← (S8(k))j for all 1 ≤ j ≤ N − 1 ; β̄j(k+1) ← (S9(k))j for all 1 ≤ j ≤ (N − 1)Ns ; στ(k+1) ← ( (S3(k))⊤1p / p )^{1/2} ; σξ(k+1) ← ( (S4(k))⊤1p / p )^{1/2} ; σ(k+1) ← ( (Σi,j,k yi,j,k² − 2(S1(k))⊤1K + (S2(k))⊤1K) / (NK) )^{1/2}.
until convergence.
return θ.

4 Experiments

4.1 Data

We use the neuropsychological assessment test "ADAS-Cog 13" from the ADNI1, ADNIGO or ADNI2 cohorts of the Alzheimer's Disease Neuroimaging Initiative (ADNI) [1]. 
The "ADAS-Cog 13" consists of 13 questions, which assess the impairment of several cognitive functions. For the purpose of our analysis, these items are grouped into four categories: memory (5 items), language (5 items), praxis (2 items) and concentration (1 item). Scores within each category are added and normalized by the maximum possible score. Consequently, each data point consists of four normalized scores, which can be seen as a point on the manifold M = ]0, 1[⁴.

We included 248 individuals in the study, who were diagnosed with mild cognitive impairment (MCI) at their first visit and whose diagnosis changed to AD before their last visit. There is an average of 6 visits per subject (min: 3, max: 11), with an average duration of 6 or 12 months between consecutive visits. The multivariate logistic curves model was used to analyze this longitudinal data.

4.2 Experimental results

The model was applied with Ns = 1, 2 or 3 independent sources. In each experiment, the MCMC SAEM was run five times with different initial parameter values, and the run which returned the smallest residual variance σ² was kept. The maximum number of iterations was arbitrarily set to 5000 and the number of burn-in iterations was set to 3000. The limit of 5000 iterations is enough to observe the convergence of the sequences of parameter estimates. Two and three sources decreased the residual variance more than one source did (σ² = 0.012 for one source, σ² = 0.08 for two sources and σ² = 0.084 for three sources). The residual variance σ² = 0.012 (resp. σ² = 0.08, σ² = 0.084) means that the model explains 79% (resp. 84%, 85%) of the total variance. We implemented our algorithm in MATLAB without any particular optimization scheme; the 5000 iterations require approximately one day.

The number of parameters to be estimated is equal to 9 + 3Ns. 
Therefore, the number of sources does not dramatically impact the runtime. Simulation is the most computationally expensive part of our algorithm. For each run of the Metropolis-Hastings algorithm, the proposal distribution is the prior distribution. As a consequence, the acceptance ratio simplifies [2], and one computation of the acceptance ratio requires two computations of the likelihood of the observations, conditionally on different vectors of latent variables and the vector of current parameter estimates. The runtime could be improved by parallelizing the sampling over individuals.

For clarity, and because the results obtained with three sources were similar to the results with two sources, we report here the experimental results obtained with two independent sources. The average model of disease progression γδ is plotted in Fig. 2. The estimated fixed effects are p0 = 0.3, t0 = 72 years, v0 = 0.04 unit per year, and δ = [0; −15; −13; −5] years. This means that, on average, the memory score (first coordinate) reaches the value p0 = 0.3 at t0 = 72 years, followed by concentration, which reaches the same value at t0 + 5 = 77 years, and then by praxis and language at ages 85 and 87 years respectively.

Random effects show the variability of this average trajectory within the studied population. The standard deviation of the time-shift equals στ = 7.5 years, meaning that the disease progression model in Fig. 2 is shifted by ±7.5 years to account for the variability in the age of disease onset. The effects of the variance of the acceleration factors and of the two independent components of the space-shifts are illustrated in Fig. 4. The acceleration factors show the variability in the pace of disease progression, which ranges between 7 times faster and 7 times slower than the average. 
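The ages quoted above follow directly from the estimated delays: component k of the average trajectory is γ0(t + δk−1), so biomarker k reaches the value p0 when t = t0 − δk−1. A quick check in Python, using the fixed effects reported above:

```python
# Fixed effects reported in the text: t0 = 72 years, delta = [0; -15; -13; -5]
# for the categories (memory, language, praxis, concentration).
t0 = 72.0
delta = {"memory": 0.0, "language": -15.0, "praxis": -13.0, "concentration": -5.0}

# gamma_k reaches p0 when t + delta_k = t0, i.e. at age t0 - delta_k:
age_at_p0 = {name: t0 - d for name, d in delta.items()}

# Matches the scenario described above: memory at 72, concentration at 77,
# then praxis at 85 and language at 87 years.
assert age_at_p0 == {"memory": 72.0, "language": 87.0,
                     "praxis": 85.0, "concentration": 77.0}
```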
The first independent component shows variability in the relative timing of the cognitive impairments: in one direction, memory and concentration are impaired nearly at the same time, followed by language and praxis; in the other direction, memory is followed by concentration, and then language and praxis are nearly superimposed. The second independent component keeps the timing of memory and concentration almost fixed, and shows a great variability in the relative timing of praxis and language impairment. It shows that the ordering of the last two may be inverted in different individuals. Overall, these space-shift components show that the onset of cognitive impairment tends to occur by pairs: memory & concentration, followed by language & praxis.

Individual estimates of the random effects are obtained from the simulation step of the last iteration of the algorithm and are plotted in Fig. 5. The figure shows that the estimated individual time-shifts correspond well to the ages at which individuals were diagnosed with AD. This means that the value p0 estimated by the model is a good threshold to determine diagnosis (a fact that occurred by chance), and, more importantly, that the time-warp correctly registers the dynamics of the individual trajectories, so that the normalized age corresponds to the same stage of disease progression across individuals. This fact is corroborated by Fig. 3, which shows that the normalized age of conversion to AD peaks at 77 years with a small variance compared to the real distribution of ages of conversion.

Figure 2: The four curves represent the estimated average trajectory. A vertical line is drawn at t0 = 72 years and a horizontal line is drawn at p0 = 0.3.

Figure 3: In blue (resp. red): histogram of the ages of conversion to AD (ti_diag) (resp. 
normalized ages of conversion to AD\n(\u03c8i(tdiag\n\n))), with \u03c8i time-warp as in (1).\n\ni\n\n\fFigure 4: Variability in disease progression superimposed with the average trajectory \u03b3\u03b4 (dotted\nlines): effects of the acceleration factor with plots of \u03b3\u03b4\n\ufb01rst and second independent component of space-shift with plots of \u03b7\u00b1\u03c3si ci(A)(\u03b3\u03b4,\u00b7) for i = 1 or\n2 (second and third column respectively).\n\n(cid:0) exp(\u00b1\u03c3\u03be)(t \u2212 t0) + t0\n\n(cid:1) (\ufb01rst column),\n\nFigure 5: Plots of individual random effects:\nlog-acceleration factor \u03bei = log(\u03b1i) against\ntime-shifts t0 + \u03c4i. Color corresponds to the\nage of conversion to AD.\n\n4.3 Discussion and perspectives\n\nWe proposed a generic spatiotemporal model to analyze longitudinal manifold-valued measure-\nments. The \ufb01xed effects de\ufb01ne a group-average trajectory, which is a geodesic on the data manifold.\nRandom effects are subject-speci\ufb01c acceleration factor, time-shift and space-shift which provide in-\nsightful information about the variations in the direction of the individual trajectories and the relative\npace at which they are followed.\nThis model was used to estimate a normative scenario of Alzheimer\u2019s disease progression from\nneuropsychological tests. We validated the estimates of the spatiotemporal registration between\nindividual trajectories by the fact that they put into correspondence the same event on individual\ntrajectories, namely the age at diagnosis. Alternatives to estimate model of disease progression\ninclude the event-based model [9], which estimates the ordering of categorical variables. Our model\nmay be seen as a generalization of this model for continuous variables, which do not only estimate\nthe ordering of the events but also the relative timing between them. Practical solutions to combine\nspatial and temporal sources of variations in longitudinal data are given in [7]. 
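As a minimal illustration of this difference with event-based models, the hypothetical sketch below recovers both the ordering and the inter-event delays from the onset ages reported in the results (the score names and ages are taken from the text; the helper itself is not part of the authors' method):

```python
# From estimated onset ages, recover both the ordering of cognitive
# impairments (as in an event-based model) and the time elapsed between
# successive events. Ages are the fixed effects reported in the results.
ONSETS = {"memory": 72.0, "concentration": 77.0, "praxis": 85.0, "language": 87.0}

def ordering_and_timing(onsets):
    """Return events sorted by onset age, together with the gaps
    (in years) between successive events."""
    events = sorted(onsets.items(), key=lambda item: item[1])
    gaps = [later[1] - earlier[1] for earlier, later in zip(events, events[1:])]
    return events, gaps
```

On the estimates above, this yields the ordering memory, concentration, praxis, language with gaps of 5, 8 and 2 years, whereas an event-based model would return the ordering only.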
Our goal here was to propose theoretical and algorithmic foundations for the systematic treatment of such questions.

References
[1] The Alzheimer's Disease Neuroimaging Initiative, https://ida.loni.usc.edu/
[2] Allassonnière, S., Kuhn, E., Trouvé, A.: Construction of Bayesian deformable models via a stochastic approximation algorithm: a convergence study. Bernoulli 16(3), 641–678 (2010)
[3] Braak, H., Braak, E.: Staging of Alzheimer's disease-related neurofibrillary changes. Neurobiology of Aging 16(3), 271–278 (1995)
[4] Delyon, B., Lavielle, M., Moulines, E.: Convergence of a stochastic approximation version of the EM algorithm. Annals of Statistics, pp. 94–128 (1999)
[5] Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), pp. 1–38 (1977)
[6] Diggle, P., Heagerty, P., Liang, K.Y., Zeger, S.: Analysis of longitudinal data. Oxford University Press (2002)
[7] Donohue, M.C., Jacqmin-Gadda, H., Le Goff, M., Thomas, R.G., Raman, R., Gamst, A.C., Beckett, L.A., Jack, C.R., Weiner, M.W., Dartigues, J.F., Aisen, P.S., the Alzheimer's Disease Neuroimaging Initiative: Estimating long-term multivariate progression from short-term data. Alzheimer's & Dementia 10(5), 400–410 (2014)
[8] Durrleman, S., Pennec, X., Trouvé, A., Braga, J., Gerig, G., Ayache, N.: Toward a comprehensive framework for the spatiotemporal statistical analysis of longitudinal shape data.
International Journal of Computer Vision 103(1), 22–59 (2013)
[9] Fonteijn, H.M., Modat, M., Clarkson, M.J., Barnes, J., Lehmann, M., Hobbs, N.Z., Scahill, R.I., Tabrizi, S.J., Ourselin, S., Fox, N.C., et al.: An event-based model for disease progression and its application in familial Alzheimer's disease and Huntington's disease. NeuroImage 60(3), 1880–1889 (2012)
[10] Girolami, M., Calderhead, B.: Riemann manifold Langevin and Hamiltonian Monte Carlo methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 73(2), 123–214 (2011)
[11] Hirsch, M.W.: Differential topology. Springer Science & Business Media (2012)
[12] Hyvärinen, A., Karhunen, J., Oja, E.: Independent component analysis, vol. 46. John Wiley & Sons (2004)
[13] Jack, C.R., Knopman, D.S., Jagust, W.J., Shaw, L.M., Aisen, P.S., Weiner, M.W., Petersen, R.C., Trojanowski, J.Q.: Hypothetical model of dynamic biomarkers of the Alzheimer's pathological cascade. The Lancet Neurology 9(1), 119–128 (2010)
[14] Kuhn, E., Lavielle, M.: Maximum likelihood estimation in nonlinear mixed effects models. Computational Statistics & Data Analysis 49(4), 1020–1038 (2005)
[15] Laird, N.M., Ware, J.H.: Random-effects models for longitudinal data. Biometrics, pp. 963–974 (1982)
[16] Singer, J.D., Willett, J.B.: Applied longitudinal data analysis: modeling change and event occurrence. Oxford University Press (2003)
[17] Singh, N., Hinkle, J., Joshi, S., Fletcher, P.T.: A hierarchical geodesic model for diffeomorphic longitudinal shape analysis. In: Information Processing in Medical Imaging, pp. 560–571. Springer (2013)
[18] Su, J., Kurtek, S., Klassen, E., Srivastava, A., et al.: Statistical analysis of trajectories on Riemannian manifolds: bird migration, hurricane tracking and video surveillance.
The Annals of Applied Statistics 8(1), 530–552 (2014)