{"title": "Spatial and anatomical regularization of SVM for brain image analysis", "book": "Advances in Neural Information Processing Systems", "page_first": 460, "page_last": 468, "abstract": "Support vector machines (SVM) are increasingly used in brain image analyses since they allow capturing complex multivariate relationships in the data. Moreover, when the kernel is linear, SVMs can be used to localize spatial patterns of discrimination between two groups of subjects. However, the features' spatial distribution is not taken into account. As a consequence, the optimal margin hyperplane is often scattered and lacks spatial coherence, making its anatomical interpretation difficult. This paper introduces a framework to spatially regularize SVM for brain image analysis. We show that Laplacian regularization provides a flexible framework to integrate various types of constraints and can be applied to both cortical surfaces and 3D brain images. The proposed framework is applied to the classification of MR images based on gray matter concentration maps and cortical thickness measures from 30 patients with Alzheimer's disease and 30 elderly controls. The results demonstrate that the proposed method enables natural spatial and anatomical regularization of the classifier.", "full_text": "Spatial and anatomical regularization of SVM for brain image analysis\n\nR\u00e9mi Cuingnet\n\nCRICM (UPMC/Inserm/CNRS), Paris, France\n\nInserm - LIF (UMR S 678), Paris, France\nremi.cuingnet@imed.jussieu.fr\n\nMarie Chupin\n\nCRICM, Paris, France\n\nmarie.chupin@upmc.fr\n\nHabib Benali\n\nInserm - LIF, Paris, France\n\nhabib.benali@imed.jussieu.fr\n\nOlivier Colliot\n\nCRICM, Paris, France\n\nolivier.colliot@upmc.fr\n\nAbstract\n\nSupport vector machines (SVM) are increasingly used in brain image analyses\nsince they allow capturing complex multivariate relationships in the data. 
Moreover, when the kernel is linear, SVMs can be used to localize spatial patterns\nof discrimination between two groups of subjects. However, the features\u2019 spatial\ndistribution is not taken into account. As a consequence, the optimal margin\nhyperplane is often scattered and lacks spatial coherence, making its anatomical\ninterpretation difficult. This paper introduces a framework to spatially regularize\nSVM for brain image analysis. We show that Laplacian regularization provides a\nflexible framework to integrate various types of constraints and can be applied to\nboth cortical surfaces and 3D brain images. The proposed framework is applied\nto the classification of MR images based on gray matter concentration maps and\ncortical thickness measures from 30 patients with Alzheimer\u2019s disease and 30 elderly\ncontrols. The results demonstrate that the proposed method enables natural\nspatial and anatomical regularization of the classifier.\n\n1 Introduction\n\nBrain image analyses have widely relied on univariate voxel-wise analyses, such as voxel-based\nmorphometry (VBM) for structural MRI [1]. In such analyses, brain images are first spatially registered\nto a common stereotaxic space, and then mass univariate statistical tests are performed in\neach voxel to detect significant group differences. However, the sensitivity of these approaches\nis limited when the differences are spatially complex and involve a combination of different voxels\nor brain structures [2]. Recently, there has been a growing interest in support vector machine\n(SVM) methods [3, 4] to overcome the limits of these univariate analyses. These approaches allow\ncapturing complex multivariate relationships in the data and have been successfully applied to the\nindividual classification of a variety of neurological conditions [5, 6, 7, 8]. 
Moreover, the output of\nthe SVM can also be analyzed to localize spatial patterns of discrimination, for example by drawing\nthe coefficients of the optimal margin hyperplane (OMH) \u2013 which, in the case of a linear SVM, live\nin the same space as the MRI data [7, 8]. However, one of the problems with directly analyzing\nthe OMH coefficients is that the corresponding maps are scattered and lack spatial coherence. This\nmakes it difficult to give a meaningful interpretation of the maps, for example to localize the brain\nregions altered by a given pathology.\nIn this paper, we address this issue by proposing a framework to introduce spatial consistency into\nSVMs by using regularization operators. Section 2 provides some background information on SVMs\nand regularization operators. We then show that the regularization operator framework provides a\nflexible approach to model different types of proximity (section 3). Section 4 presents the first type\nof regularization, which models spatial proximity, i.e. two features are close if they are spatially\nclose. We then present in section 5 a more complex type of constraint, called anatomical proximity.\nIn the latter case, two features are considered close if they belong to the same brain network;\nfor instance two voxels are close if they belong to the same anatomical or functional region or if\nthey are anatomically or functionally connected (based on fMRI networks or white matter tracts).\nFinally, in section 6, the proposed framework is illustrated on the analysis of MR images using gray\nmatter concentration maps and cortical thickness measures from 30 patients with AD and 30 elderly\ncontrols from the ADNI database (www.adni-info.org).\n\n2 Priors in SVM\n\nIn this section, we first describe the neuroimaging data that we consider in this paper. 
Then, after\nsome background on SVMs and on how to add prior knowledge in SVMs, we describe the framework\nof regularization operators.\n\n2.1 Brain imaging data\n\nIn this contribution, we consider any feature computed either at each voxel of a 3D brain image\nor at any vertex of the cortical surface. Typically, for anatomical studies, the features could be\ntissue concentration maps such as gray matter (GM) or white matter (WM) for the 3D case or\ncortical thickness maps for the surface case. The proposed methods are also applicable to functional\nor diffusion-weighted MRI. We further assume that 3D images or cortical surfaces were spatially\nnormalized to a common stereotaxic space (e.g. [9]) as in many group studies or classification\nmethods [5, 6, 7, 8, 10].\nLet V be the domain of the 3D images or surfaces. v will denote an element of V (i.e. a voxel or a\nvertex). Thus, $\\mathcal{X} = \\mathbb{R}^{\\mathcal{V}}$, together with the canonical dot product, will be the input space.\nLet $x_s \\in \\mathcal{X}$ be the data of a given subject s. In the case of 3D images, $x_s$ can be considered in two\ndifferent ways: (i) as an element of $\\mathbb{R}^d$ where d denotes the number of voxels, (ii) as a real-valued\nfunction defined on a compact subset of $\\mathbb{R}^3$. Both finite and continuous viewpoints will be studied in\nthis paper because they allow different types of regularization. Similarly, in the surface case, $x_s$ can\nbe viewed either as an element of $\\mathbb{R}^d$ where d denotes the number of vertices or as a real-valued\nfunction on a 2-dimensional compact Riemannian manifold.\nWe consider a group of N subjects with their corresponding data $(x_s)_{s \\in [1,N]} \\in \\mathcal{X}^N$. Each subject\nis associated with a group $(y_s)_{s \\in [1,N]} \\in \\{-1, 1\\}^N$ (typically his diagnosis, i.e. 
diseased or healthy).\n\n2.2 Linear SVM\n\nThe linear SVM solves the following optimization problem [3, 4, 11]:\n\n$$(w^{\\mathrm{opt}}, b^{\\mathrm{opt}}) = \\arg\\min_{w \\in \\mathcal{X},\\, b \\in \\mathbb{R}} \\sum_{s=1}^{N} l_{\\mathrm{hinge}}\\left(y_s \\left[\\langle w, x_s \\rangle + b\\right]\\right) + \\lambda \\| w \\|^2 \\qquad (1)$$\n\nwhere $\\lambda \\in \\mathbb{R}^{+}$ is the regularization parameter and $l_{\\mathrm{hinge}}$ the hinge loss function defined as:\n$l_{\\mathrm{hinge}} : u \\in \\mathbb{R} \\mapsto \\max(0, 1 - u)$.\nWith a linear SVM, the feature space is the same as the input space. Thus, when the input features\nare the voxels of a 3D image, each element of $w^{\\mathrm{opt}} = (w^{\\mathrm{opt}}_v)_{v \\in \\mathcal{V}}$ also corresponds to a voxel.\nSimilarly, for the surface-based methods, the elements of $w^{\\mathrm{opt}}$ can be represented on the vertices\nof the cortical surface. To be anatomically consistent, if $v^{(1)} \\in \\mathcal{V}$ and $v^{(2)} \\in \\mathcal{V}$ are close according\nto the topology of V, their weights in the SVM classifier, $w^{\\mathrm{opt}}_{v^{(1)}}$ and $w^{\\mathrm{opt}}_{v^{(2)}}$ respectively, should be\nsimilar. In other words, if $v^{(1)}$ and $v^{(2)}$ correspond to two neighboring regions, they should have a\nsimilar role in the classifier function. However, this is not guaranteed with the standard linear SVM\n(as for example in [7]) because the regularization term is not a spatial regularization. The aim of\nthe present paper is to propose methods to ensure that $w^{\\mathrm{opt}}$ is spatially regularized.\n\n2.3 How to include priors in SVM\n\nTo spatially regularize the SVM, one has to include some prior knowledge on the proximity of\nfeatures. In the literature, three main ways have been considered in order to include priors in SVMs.\nIn an SVM, all the information used for classification is encoded in the kernel. 
Hence, the first way\nto include priors is to directly design the kernel function [4]. But this implies knowing a metric on\nthe input space X consistent with the prior knowledge.\nAnother way is to force the classifier function to be locally invariant to some transformations. 
This\ncan be done: (i) by directly engineering a kernel which leads to locally invariant SVM, (ii) by\ngenerating artificially transformed examples from the training set to create virtual support vectors\n(virtual SV), (iii) by using a combination of both these approaches called kernel jittering [12, 13,\n14]. But the main difficulty here is how to define the transformations to which we would like the\nkernel to be invariant.\nThe last way is to consider SVM from the regularization viewpoint [15, 4]. The idea is to force the\nclassifier function to be smooth with respect to some criteria. This is the viewpoint which is adopted\nin this paper.\n\n2.4 Regularization operators\n\nOur aim is to introduce a spatial regularization on the classifier function of the SVM which can be\nwritten as $\\mathrm{sgn}(f(x_s) + b)$ where $f \\in \\mathbb{R}^{\\mathcal{X}}$. This is done through the definition of a regularization\noperator P on f. Following [15, 4], P is defined as a linear map from a space $\\mathcal{F} \\subset \\mathbb{R}^{\\mathcal{X}}$ into a dot\nproduct space $(\\mathcal{D}, \\langle \\cdot, \\cdot \\rangle_{\\mathcal{D}})$.\n$G : \\mathcal{X} \\times \\mathcal{X} \\to \\mathbb{R}$ is a Green\u2019s function of a regularization operator P iff:\n\n$$\\forall f \\in \\mathcal{F},\\ \\forall x \\in \\mathcal{X}, \\quad f(x) = \\langle P(G(x, \\cdot)), P(f) \\rangle_{\\mathcal{D}} \\qquad (2)$$\n\nIf P admits at least a Green\u2019s function called G, then G is a positive semi-definite kernel and the\nminimization problem:\n\n$$(f^{\\mathrm{opt}}, b^{\\mathrm{opt}}) = \\arg\\min_{f \\in \\mathcal{F},\\, b \\in \\mathbb{R}} \\sum_{s=1}^{N} l_{\\mathrm{hinge}}\\left(y_s \\left[f(x_s) + b\\right]\\right) + \\lambda \\| P(f) \\|_{\\mathcal{D}}^2 \\qquad (3)$$\n\nis equivalent to the SVM minimization problem with kernel G.\nSince in linear SVM, the feature space is the input space, f lies in the input space. 
Therefore, the\noptimization problem (3) is very convenient to include spatial regularization on f via the definition of\nP. 
Note that, usually, F is a Reproducing Kernel Hilbert Space (RKHS) with kernel K and D = F.\nHence, if P is bounded, injective and compact, P admits a Green\u2019s function $G = (P^{\\dagger} P)^{-1} K$ where\n$P^{\\dagger}$ denotes the adjoint of P.\nOne has to define the regularization operator P so as to obtain the suitable regularization for the\nproblem.\n\n3 Laplacian regularization\n\nSpatial regularization requires the notion of proximity between elements of V. This can be done\nthrough the definition of a graph in the discrete case or a metric in the continuous case. In this section,\nwe propose spatial regularizations based on the Laplacian for both of these proximity models.\nThis penalizes the high-frequency components with respect to the topology of V.\n\n3.1 Graphs\n\nWhen V is finite, weighted graphs are a natural framework to take spatial information into consideration.\nVoxels of a brain image can be considered as nodes of a graph which models the voxels\u2019\nproximity. This graph can be the voxel connectivity (6, 18 or 26) or a more sophisticated graph.\nWe chose the following regularization operator:\n\n$$P : w^* \\in \\mathcal{F} = \\mathcal{L}(\\mathbb{R}^{\\mathcal{V}}, \\mathbb{R}) \\mapsto \\left( e^{\\frac{1}{2} \\beta L} w \\right)^* \\in \\mathcal{F} \\qquad (4)$$\n\nwhere L denotes the graph Laplacian [16] and $w^*$ the dual vector of w. 
\u03b2 controls the size of the\nregularization. The optimization problem then becomes:\n\n$$(w^{\\mathrm{opt}}, b^{\\mathrm{opt}}) = \\arg\\min_{w \\in \\mathcal{X},\\, b \\in \\mathbb{R}} \\sum_{s=1}^{N} l_{\\mathrm{hinge}}\\left(y_s \\left[\\langle w, x_s \\rangle + b\\right]\\right) + \\lambda \\| e^{\\frac{1}{2} \\beta L} w \\|^2 \\qquad (5)$$\n\nSuch a regularization exponentially penalizes the high-frequency components and thus forces the\nclassifier to consider as similar voxels highly connected according to the graph adjacency matrix.\nAccording to the previous section, this new minimization problem (5) is equivalent to an SVM\noptimization problem. The new kernel $K_{\\beta}$ is given by:\n\n$$K_{\\beta}(x_1, x_2) = x_1^T e^{-\\beta L} x_2 \\qquad (6)$$\n\nThis is a heat or diffusion kernel on a graph. 
Our approach differs from the diffusion kernels introduced\nby Kondor et al. [17] because the nodes of the graph are the features, here the voxels, whereas\nin [17], the nodes were the objects to classify. Laplacian regularization was also used in satellite\nimaging [18] but, again, the nodes were the objects to classify. Our approach can also be considered\nas a spectral regularization on the graph [19]. To our knowledge, such spectral regularization has\nnot been applied to brain images but only to the classification of microarray data [20].\n\n3.2 Compact Riemannian manifolds\n\nIn this paper, when V is continuous, it can be considered as a 2-dimensional (e.g. surfaces) or a\n3-dimensional (e.g. 3D Euclidean or more complex) compact Riemannian manifold. The metric\nthen models the notion of proximity. On such spaces, the heat kernel exists [21, 22]. Therefore, the\nLaplacian regularization presented in the previous paragraph can be extended to compact Riemannian\nmanifolds [22]. Similarly to the graphs, we chose the following regularization operator:\n\n$$P : w^* \\in \\mathcal{F} = \\mathcal{L}(\\mathbb{R}^{\\mathcal{V}}, \\mathbb{R}) \\mapsto \\left( e^{\\frac{1}{2} \\beta \\Delta} w \\right)^* \\in \\mathcal{F} \\qquad (7)$$\n\nwhere \u2206 denotes the Laplace-Beltrami operator. The optimization problem is also equivalent to\nan SVM optimization problem with kernel $K_{\\beta}(x_1, x_2) = x_1^T e^{-\\beta \\Delta} x_2$. Note the difference between\nour approach and that of Lafferty and Lebanon [22]. In our case, the points of the manifolds are the\nfeatures, whereas in [22], they were the objects to classify.\nIn sections 4 and 5, we present different types of proximity models which correspond to different\ntypes of graphs or distances.\n\n4 Spatial proximity\n\nIn this section, we consider the case of regularization based on spatial proximity, i.e. 
two voxels (or\nvertices) are close if they are spatially close.\n\n4.1 The 3D case\n\nWhen V is the set of image voxels (discrete case), the simplest option to encode the spatial proximity\nis to use the image connectivity (e.g. 6-connectivity) as a regularization graph. Similarly, when V\nis a compact subset of $\\mathbb{R}^3$ (continuous case), the proximity is encoded by a Euclidean distance. In\nboth cases, this is equivalent to pre-processing the data with a Gaussian smoothing kernel with standard\ndeviation $\\sigma = \\sqrt{\\beta}$ [17].\nHowever, smoothing the data with a Gaussian kernel would mix gray matter (GM), white matter\n(WM) and cerebrospinal fluid (CSF). Instead, we propose a graph which takes into consideration\nboth the spatial localization and the tissue types. Based on tissue probability maps, in each\nvoxel v, we have the set of probabilities $p_v$ that this voxel belongs to GM, WM or CSF. We considered\nthe following graph. Two voxels are connected if and only if they are neighbors in the\nimage (6-connectivity). The weight $a_{u,v}$ of the edge between two connected voxels u and v is\n$a_{u,v} = e^{-d_{\\chi^2}(p_u, p_v)^2 / (2\\sigma^2)}$, where $d_{\\chi^2}$ is the $\\chi^2$-distance between two distributions. We chose\nbeforehand \u03c3 equal to the standard deviation of $d_{\\chi^2}(p_u, p_v)$.\nTo compute the kernel, we computed $e^{-\\beta L} x_s$ for each subject s in the training set by scaling the\nLaplacian and using the Taylor series expansion.\n\n4.2 The surface case\n\nThe connectivity graph is not directly applicable to surfaces. Indeed, the regularization would then\nstrongly depend on the mesh used to discretize the surface. This shortcoming can be overcome\nby reweighting the graph with conformal weights. 
In this paper, we chose a different approach by\nadopting the continuous viewpoint: we consider the cortical surface as a 2-dimensional Riemannian\nmanifold and use the regularization operator defined by equation (7). 
Indeed, the Laplacian is an\nintrinsic operator and does not depend on the chosen surface parameterization. The heat kernel has\nalready been used for cortical smoothing for example in [23, 24, 25, 26]. We will therefore not detail\nthis part. We used the implementation described in [26].\n\n5 Anatomical proximity\n\nIn this section, we consider a different type of proximity, which we call anatomical proximity. Two\nvoxels are considered close if they belong to the same brain network. For example, two voxels\ncan be close if they belong to the same anatomical or functional region (defined for example by\na probabilistic atlas). This can be seen as a \u201cshort-range\u201d connectivity. Another example is that\nof \u201clong-range\u201d proximity which models the fact that distant voxels can be anatomically (through\nwhite matter tracts) or functionally connected (based on fMRI networks).\nWe first focus on the discrete case. The presented framework can be used either for 3D images or\nsurfaces and computed very efficiently. However, such an efficient implementation was obtained at\nthe cost of the spatial proximity. Therefore, we then show a continuous formulation which makes it\npossible to consider both spatial and anatomical proximity.\n\n5.1 On graphs: atlas and connectivity\n\nLet $(\\mathcal{A}_1, \\cdots, \\mathcal{A}_R)$ be the R regions of interest (ROI) of an atlas and $p(v \\in \\mathcal{A}_r)$ the probability\nthat the voxel v belongs to region $\\mathcal{A}_r$. Then the probability that two voxels $v^{(i)}$ and $v^{(j)}$\nbelong to the same region is: $\\sum_{r=1}^{R} p\\left((v^{(i)}, v^{(j)}) \\in \\mathcal{A}_r^2\\right)$. We assume that if $v^{(i)} \\neq v^{(j)}$ then:\n$p\\left((v^{(i)}, v^{(j)}) \\in \\mathcal{A}_r^2\\right) = p\\left(v^{(i)} \\in \\mathcal{A}_r\\right) p\\left(v^{(j)} \\in \\mathcal{A}_r\\right)$. 
Let $E \\in \\mathbb{R}^{d \\times R}$ be the right stochastic matrix\ndefined by: $E_{i,r} = p\\left(v^{(i)} \\in \\mathcal{A}_r\\right)$. 
Then, for $i \\neq j$, the (i, j)-th entry of the adjacency matrix $E E^t$\nis the probability that the voxels $v^{(i)}$ and $v^{(j)}$ belong to the same region.\nFor \u201clong-range\u201d connections (structural or functional), one can consider an R-by-R matrix C with\nthe $(r_1, r_2)$-th entry being the probability that $\\mathcal{A}_{r_1}$ and $\\mathcal{A}_{r_2}$ are connected. Then the adjacency\nmatrix becomes: $E C E^t$. We considered the normalized Laplacian $\\tilde{L}$ [16], to be sure that the two\nterms commute:\n\n$$\\tilde{L} = I_d - D^{-\\frac{1}{2}} E C E^t D^{-\\frac{1}{2}} \\qquad (8)$$\n\nwhere D is a diagonal matrix. Hence, if $C E^t D^{-1} E$ is not singular, we have:\n\n$$e^{-\\beta \\tilde{L}} = e^{-\\beta} \\left[ I_d + D^{-\\frac{1}{2}} E \\left( e^{\\beta C E^t D^{-1} E} - I_R \\right) \\left( C E^t D^{-1} E \\right)^{-1} C E^t D^{-\\frac{1}{2}} \\right] \\qquad (9)$$\n\nThe computation requires only the computation of $D^{-\\frac{1}{2}}$, which is done efficiently since D is a\ndiagonal matrix, and the computation of the inverse and the matrix exponential of an R-by-R matrix,\nwhich is also efficient since $R \\sim 10^2$.\nThis method can be directly applied to both 3D images and cortical surfaces. Unfortunately, the\nefficient implementation was obtained at the cost of the spatial proximity. The next section presents\na combination of anatomical and spatial proximity using the continuous viewpoint.\n\n5.2 On statistical manifolds\n\nIn this section, the goal is to take into account various types of prior information such as tissue information,\natlas information and spatial proximity. We first show that this can be done by considering the\nimages or surfaces as statistical manifolds together with the Fisher metric. We then give some\ndetails about the computation of the kernel.\n\nFisher metric We assume that we are given an anatomical or a functional atlas A composed of\nR regions: $\\{\\mathcal{A}_r\\}_{r=1 \\cdots R}$. 
Similarly, $\\mathcal{T} = \\{T_{\\mathrm{GM}}, T_{\\mathrm{WM}}, T_{\\mathrm{CSF}}\\}$ denotes the set of brain tissues. At\neach point $v \\in \\mathcal{V}$, we have a probability distribution $p_{\\mathrm{atlas}}(\\cdot|v) \\in \\mathbb{R}^{\\mathcal{T} \\times \\mathcal{A}}$ which informs about the\ntissue type and the atlas region in v. Without any loss of generality, one can assume that the tissue\ninformation is encoded in the atlas. Therefore, we consider the probability $p_{\\mathrm{atlas}}(\\cdot|v) \\in \\mathbb{R}^{\\mathcal{A}}$.\nWe also consider a probability distribution $p_{\\mathrm{loc}}(\\cdot|v) \\in \\mathbb{R}^{\\mathcal{V}}$ which encodes the spatial proximity.\nA simple example is $p_{\\mathrm{loc}}(\\cdot|v) \\sim \\mathcal{N}(v, \\sigma_{\\mathrm{loc}}^2)$. Therefore, we consider the probability family:\n\n$$\\mathcal{M} = \\left\\{ p(\\cdot|v) \\in \\mathbb{R}^{\\mathcal{A} \\times \\mathcal{V}} \\right\\}_{v \\in \\mathcal{V}} \\quad \\text{where} \\quad p(\\cdot|v) = p_{\\mathrm{atlas}}(\\cdot|v)\\, p_{\\mathrm{loc}}(\\cdot|v).$$\n\nA natural way to encode proximity on M is to use the Fisher metric as in [22]. With some smoothness\nassumption about p, M together with this metric is a compact Riemannian manifold [27]. For\nclarity, we present this framework only for 3D images but it could be applied to cortical surfaces\nwith minor changes. 
The metric tensor g is then given for all $v \\in \\mathcal{V}$ by:\n\n$$g_{ij}(v) = E_v \\left[ \\frac{\\partial \\log p(\\cdot|v)}{\\partial v_i} \\frac{\\partial \\log p(\\cdot|v)}{\\partial v_j} \\right], \\quad 1 \\leq i, j \\leq 3 \\qquad (10)$$\n\nIf we further assume that $p_{\\mathrm{loc}}(\\cdot|v)$ is isotropic we have:\n\n$$g_{ij}(v) = g^{\\mathrm{atlas}}_{ij}(v) + \\delta_{ij} \\int_{u \\in \\mathcal{V}} p_{\\mathrm{loc}}(u|v) \\left( \\frac{\\partial \\log p_{\\mathrm{loc}}(u|v)}{\\partial v_i} \\right)^2 du \\qquad (11)$$\n\nwhere $\\delta_{ij}$ is the Kronecker delta and $g^{\\mathrm{atlas}}$ is the metric tensor when $p(\\cdot|v) = p_{\\mathrm{atlas}}(\\cdot|v)$. When\n$p_{\\mathrm{loc}}(\\cdot|v) \\sim \\mathcal{N}(v, \\sigma_{\\mathrm{loc}}^2 I_3)$, we have: $g_{ij}(v) = g^{\\mathrm{atlas}}_{ij}(v) + \\frac{\\delta_{ij}}{\\sigma_{\\mathrm{loc}}^2}$.\n\nComputing the kernel Once the notion of proximity is defined, one has to compute the kernel\nmatrix. The computation of the kernel matrix requires the computation of $e^{-\\beta \\Delta} x_s$ for all the subjects\nof the training set. 
The eigendecomposition of the Laplace-Beltrami operator is intractable since the\nnumber of voxels in a brain image is about $10^6$. Hence $e^{-\\beta \\Delta} x_s$ is considered as the solution at\n$t = \\beta$ of the heat equation with the Dirichlet homogeneous boundary conditions:\n\n$$\\left\\{ \\begin{array}{l} \\frac{\\partial u}{\\partial t} - \\Delta u = 0 \\\\ u(t = 0) = x_s \\end{array} \\right. \\qquad (12)$$\n\nThe Laplace-Beltrami operator is given by [21]: $\\Delta u = \\frac{1}{\\sqrt{\\det g}} \\sum_{j=1}^{3} \\frac{\\partial}{\\partial v_j} \\left( \\sum_{i=1}^{3} h_{ij} \\sqrt{\\det g}\\, \\frac{\\partial u}{\\partial v_i} \\right)$\nwhere h is the inverse tensor of g.\nTo solve equation (12), one can use a variational approach [28]. We used rectangular finite\nelements in space and the explicit finite difference scheme for the time discretization. \u2206x and \u2206t\ndenote the space step and the time step respectively. \u2206x is fixed by the MRI spatial resolution. \u2206t\nis then chosen so as to respect the Courant-Friedrichs-Lewy (CFL) condition, which can be written\nin this case as: $\\Delta t \\leq 2 (\\max_i \\lambda_i)^{-1}$, where $\\lambda_i$ are the eigenvalues of the generalized eigenproblem:\n$K U = \\lambda M U$ with K the stiffness matrix and M the mass matrix. To compute the optimal time\nstep \u2206t, we estimated the largest eigenvalue with the power iteration method.\n\n6 Experiments and results\n\n6.1 Material\n\nSubjects and MRI acquisition Data were obtained from the Alzheimer\u2019s Disease Neuroimaging\nInitiative (ADNI) database (www.loni.ucla.edu/ADNI). The Principal Investigator of this initiative is Michael W. Weiner,\nM.D., VA Medical Center and University of California - San Francisco. For up-to-date information\nsee www.adni-info.org. 
We studied 30 patients with probable AD (age \u00b1 standard deviation (SD) =\n74\u00b14, range = 60-80 years, mini-mental score (MMS) = 23\u00b12) and 30 elderly controls (age \u00b1 SD =\n73\u00b14, range = 60-80, MMS = 29\u00b11) who were selected from the ADNI database according to the\nfollowing criteria. 
Subjects were excluded if their scan revealed major artifacts or gross structural\nabnormalities of the white matter, since these make the tissue segmentation step fail. Subjects aged\n80 years or older were also excluded. The MR scans are T1-weighted MR images. MRI acquisition was done\naccording to the ADNI acquisition protocol in [29].\n\nFeatures extraction For the 3D image analyses, all T1-weighted MR images were segmented into\ngray matter (GM), white matter (WM) and cerebrospinal fluid (CSF) using the SPM5 (Statistical\nParametric Mapping, London, UK) unified segmentation routine [30] and spatially normalized with\nDARTEL [9]. The features are the GM probability maps in the MNI space. For the surface-based\nanalyses, the features are the cortical thickness values at each vertex of the cortical surface. Cortical\nthickness measures were performed with Freesurfer (Massachusetts General Hospital, Boston, MA).\n\n6.2 Proposed experiments\n\nAs an illustration of the method, we present the results of the AD versus controls analysis. We\npresent the maps associated with the optimal margin hyperplane (OMH). The classification function\nobtained with a linear SVM is the sign of the inner product of the features with $w^{\\mathrm{opt}}$, a vector\northogonal to the OMH [3, 4]. Therefore, if the absolute value of the ith component of $w^{\\mathrm{opt}}$, $|w^{\\mathrm{opt}}_i|$,\nis small compared to the other components $(|w^{\\mathrm{opt}}_j|)_{j \\neq i}$, the ith feature will have a small influence\non the classification. Conversely, if $|w^{\\mathrm{opt}}_i|$ is relatively large, the ith feature will play an important\nrole in the classifier. Thus the optimal weights $w^{\\mathrm{opt}}$ allow us to evaluate the anatomical consistency\nof the classifier. In all experiments, the C parameter of the SVM was fixed to one ($\\lambda = \\frac{1}{2NC}$ [4]).\n\n6.3 Results: spatial proximity\n\nIn this section, we present the results for the spatial proximity in the 3D case (method presented in\nsection 4.1). 
Due to space limitations, the surface case is not presented. Fig. 1(a) presents the OMH\nwhen no spatial regularization is performed. Fig. 1(b) shows the results with spatial proximity but\nwithout tissue probability maps. w becomes smoother and spatially consistent. However, it mixes\ntissues and does not respect the topology of the cortex. For instance, it mixes tissues of the temporal\nlobe with tissues of the frontal and parietal lobes. The results with both spatial proximity and tissue\nmaps are shown in Fig. 1(c). The OMH is much more consistent with the brain anatomy. \u03b2 controls\nthe size of the spatial regularization and was chosen to be equivalent to a 4mm-FWHM of the\nGaussian smoothing. The classification accuracy was estimated by a leave-one-out cross validation.\nThe classifiers were able to distinguish AD from controls (CN) with similar accuracies (83% with no spatial\npriors and 85% with spatial priors).\n\n6.4 Results: anatomical proximity\n\nIn this section, we present the results for the anatomical proximity. We first present the discrete\nsurface case. The discrete 3D case leads to comparable results but is omitted here due to space\nlimitations. We then present the continuous 3D case. Extension to surfaces is left for future work.\n\nDiscrete case For the discrete case, we used \u201cshort-range\u201d proximity, defined by the cortical atlas\nof Desikan et al. [31] with binary probabilities. We tested different values for \u03b2 = 0, 1,\u00b7\u00b7\u00b7 , 5.\nThe accuracies ranged between 80% and 85%. The highest accuracy was reached for \u03b2 = 3. The\noptimal SVM weights w are shown in Fig. 2. When no regularization has been carried out, they\nare noisy and scattered (Fig. 2(a)). When the amount of regularization is increased, voxels of the\nsame region tend to be considered as similar by the classifier (Fig. 2(b-d)). 
Note how the anatomical\ncoherence of the OMH varies with \u03b2.\n\nContinuous case We then present the results of the 3D continuous case (section 5.2). The atlas\ninformation used was only the tissue types. We chose $\\sigma_{\\mathrm{loc}}$ = 10mm for the spatial confidence.\n\u03b2 was chosen to be equivalent to a 4mm-FWHM of the Gaussian smoothing. The classifier reached\n87% accuracy. The optimal SVM weights w are shown in Fig. 1(d). The tissue knowledge enables\nthe classifier to be more consistent with the anatomy. For instance, note the difference with the\nGaussian smoothing (Fig. 1(b)) and how the proposed method avoids mixing the temporal lobe with\nthe parietal and frontal lobes.\n\nFigure 1: Normalized w coefficients: (a) no spatial prior, (b) spatial proximity: FWHM=4mm, (c) spatial\nproximity and tissues: FWHM\u223c4mm, (d) Fisher metric using tissue maps.\n\nFigure 2: Normalized w of the left hemisphere when the SVM is regularized with a cortical atlas [31]:\n(a) \u03b2 = 0 (no prior), (b) \u03b2 = 1, (c) \u03b2 = 2, (d) \u03b2 = 3.\n\n7 Discussion\n\nIn this contribution, we proposed to use regularization operators to add spatial consistency to SVMs\nfor brain image analysis. We showed that this provides a flexible approach to model different types of\nproximity between the features. We proposed derivations for both 3D image features, such as tissue\nmaps, and surface characteristics, such as cortical thickness. We considered two different types of\nformulations: a discrete viewpoint in which the proximity is encoded via a graph, and a continuous\nviewpoint in which the data lies on a Riemannian manifold. In particular, the latter viewpoint is\nuseful for surface cases because it overcomes problems due to surface parameterization. This paper\nintroduced two different types of proximity. 
We first considered the case of regularization based on\nspatial proximity, which results in spatially consistent OMH making their anatomical interpretation\nmore meaningful. We then considered a different type of proximity which allows modeling higher-level\nknowledge, which we call anatomical proximity. In this model, two voxels are considered close\nif they belong to the same brain network. For example, two voxels can be close if they belong to the\nsame anatomical region. This can be seen as a \u201cshort-range\u201d connectivity. Another example is that\nof \u201clong-range\u201d proximity which models the fact that distant voxels can be anatomically connected,\nthrough white matter tracts, or functionally connected, based on fMRI networks.\nPreliminary evaluation was performed on 30 patients with AD and 30 age-matched controls. The\nresults demonstrate that the proposed approaches allow obtaining spatially and anatomically coherent\ndiscrimination patterns. In particular, the obtained hyperplanes are largely consistent with the\nneuropathology of AD, with highly discriminant features in the medial temporal lobe, as well as\nlateral temporal, parietal associative and frontal areas. As for the classification results, they were\ncomparable to those reported in the literature for AD classification (e.g. [5, 8, 7]). The use of regularization\ndid not substantially improve the accuracy. However, the most important point is that the\nproposed approach makes the results more consistent with the anatomy, making their interpretation\nmore meaningful.\nFinally, it should be noted that the proposed approach is not specific to structural MRI, and can be\napplied to other pathologies and other types of data (e.g. 
functional or diffusion-weighted MRI).\n\nAcknowledgments\n\nThis work was supported by ANR (project HM-TC, number ANR-09-EMER-006).\nData collection and sharing for this project was funded by the Alzheimer\u2019s Disease Neuroimaging\nInitiative (ADNI; Principal Investigator: Michael Weiner; NIH grant U01 AG024904). ADNI data\nare disseminated by the Laboratory of Neuro Imaging at the University of California, Los Angeles.\n\nReferences\n[1] J. Ashburner and K.J. Friston. Voxel-based morphometry\u2013the methods. NeuroImage, 11(6):805\u201321, 2000.\n[2] C. Davatzikos. Why voxel-based morphometric analysis should be used with great caution when characterizing group differences. NeuroImage, 23(1):17\u201320, 2004.\n[3] V.N. Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag, 1995.\n[4] B. Sch\u00f6lkopf and A.J. Smola. Learning with Kernels. MIT Press, 2001.\n[5] Z. Lao et al. Morphological classification of brains via high-dimensional shape transformations and machine learning methods. NeuroImage, 21(1):46\u201357, 2004.\n[6] Y. Fan et al. COMPARE: classification of morphological patterns using adaptive regional elements. IEEE TMI, 26(1):93\u2013105, 2007.\n[7] S. Kl\u00f6ppel et al. Automatic classification of MR scans in Alzheimer\u2019s disease. Brain, 131(3):681\u20139, 2008.\n[8] P. Vemuri et al. Alzheimer\u2019s disease diagnosis in individual subjects using structural MR images: validation studies. NeuroImage, 39(3):1186\u201397, 2008.\n[9] J. Ashburner et al. A fast diffeomorphic image registration algorithm. NeuroImage, 38(1):95\u2013113, 2007.\n[10] O. Querbes et al. Early diagnosis of Alzheimer\u2019s disease using cortical thickness: impact of cognitive reserve. Brain, 132(8):2036, 2009.\n[11] J. Shawe-Taylor and N. Cristianini. Kernel methods for pattern analysis. Cambridge Univ Pr, 2004.\n[12] D. Decoste and B. Sch\u00f6lkopf. Training invariant support vector machines. 
Machine Learning, 46(1):161\u201390, 2002.\n[13] B. Sch\u00f6lkopf et al. Incorporating invariances in support vector learning machines. In Proc. ICANN 1996, page 47. Springer Verlag, 1996.\n[14] B. Sch\u00f6lkopf et al. Prior knowledge in support vector kernels. In Proc. conference on Advances in neural information processing systems\u201997, pages 640\u201346. MIT Press, 1998.\n[15] A.J. Smola and B. Sch\u00f6lkopf. On a kernel-based method for pattern recognition, regression, approximation, and operator inversion. Algorithmica, 22(1/2):211\u201331, 1998.\n[16] F.R.K. Chung. Spectral Graph Theory. Number 92. AMS, 1992.\n[17] R. I. Kondor and J.D. Lafferty. Diffusion kernels on graphs and other discrete input spaces. In Proc. International Conference on Machine Learning, pages 315\u201322, 2002.\n[18] L. G\u00f3mez-Chova et al. Semi-supervised image classification with Laplacian support vector machines. IEEE Geo Rem Sens Let, 5(3):336\u201340, 2008.\n[19] A.J. Smola and R. Kondor. Kernels and regularization on graphs. In Proc. COLT, page 144. Springer Verlag, 2003.\n[20] F. Rapaport et al. Classification of microarray data using gene networks. BMC bioinformatics, 8(1):35, 2007.\n[21] J. Jost. Riemannian geometry and geometric analysis. Springer Verlag, 2008.\n[22] J. Lafferty and G. Lebanon. Diffusion kernels on statistical manifolds. JMLR, 6:129\u201363, 2005.\n[23] A. Andrade et al. Detection of fMRI activation using cortical surface mapping. Hum Brain Mapp, 12(2):79\u201393, 2001.\n[24] A. Cachia et al. A primal sketch of the cortex mean curvature: a morphogenesis based approach to study the variability of the folding patterns. IEEE TMI, 22(6):754\u2013765, 2003.\n[25] M.K. Chung. Heat kernel smoothing and its application to cortical manifolds. Technical report, 1090. Department of Statistics, Univ of Wisconsin, Madison, 2004.\n[26] M.K. Chung et al. 
Cortical thickness analysis in autism with heat kernel smoothing. NeuroImage, 25(4):1256\u201365, 2005.\n[27] S.-I. Amari et al. Differential Geometry in Statistical Inference, volume 10. Institute of Mathematical Statistics, 1987.\n[28] O. Druet et al. Blow-up theory for elliptic PDEs in Riemannian geometry. Princeton Univ Pr, 2004.\n[29] C.R. Jack Jr. et al. The Alzheimer\u2019s Disease Neuroimaging Initiative (ADNI): MRI methods. J Magn Reson Imaging, 27(4):685\u201391, 2008.\n[30] J. Ashburner and K.J. Friston. Unified segmentation. NeuroImage, 26(3):839\u201351, 2005.\n[31] R. S. Desikan et al. An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. NeuroImage, 31(3):968\u2013980, 2006.\n", "award": [], "sourceid": 185, "authors": [{"given_name": "Remi", "family_name": "Cuingnet", "institution": null}, {"given_name": "Marie", "family_name": "Chupin", "institution": null}, {"given_name": "Habib", "family_name": "Benali", "institution": null}, {"given_name": "Olivier", "family_name": "Colliot", "institution": null}]}