{"title": "Efficient Kernel Discriminant Analysis via QR Decomposition", "book": "Advances in Neural Information Processing Systems", "page_first": 1529, "page_last": 1536, "abstract": null, "full_text": "     Efficient Kernel Discriminant Analysis via QR\n                                Decomposition\n\n\n\n           Tao Xiong                     Jieping Ye                       Qi Li\n       Department of ECE             Department of CSE             Department of CIS\n     University of Minnesota      University of Minnesota        University of Delaware\n     txiong@ece.umn.edu          jieping@cs.umn.edu             qili@cis.udel.edu\n\n\n\n              Vladimir Cherkassky                           Ravi Janardan\n                Department of ECE                         Department of CSE\n              University of Minnesota                   University of Minnesota\n           cherkass@ece.umn.edu                        janardan@cs.umn.edu\n\n\n\n\n                                         Abstract\n\n          Linear Discriminant Analysis (LDA) is a well-known method for fea-\n          ture extraction and dimension reduction. It has been used widely in\n          many applications such as face recognition. Recently, a novel LDA algo-\n          rithm based on QR Decomposition, namely LDA/QR, has been proposed,\n          which is competitive in terms of classification accuracy with other LDA\n          algorithms, but it has much lower costs in time and space. However,\n          LDA/QR is based on linear projection, which may not be suitable for data\n          with nonlinear structure. This paper first proposes an algorithm called\n          KDA/QR, which extends the LDA/QR algorithm to deal with nonlin-\n          ear data by using the kernel operator. Then an efficient approximation of\n          KDA/QR called AKDA/QR is proposed. 
Experiments on face image data\n          show that the classification accuracy of both KDA/QR and AKDA/QR\n          is competitive with Generalized Discriminant Analysis (GDA), a gen-\n          eral kernel discriminant analysis algorithm, while AKDA/QR has much\n          lower time and space costs.\n\n\n\n1     Introduction\n\nLinear Discriminant Analysis [3] is a well-known method for dimension reduction. It has\nbeen used widely in many applications such as face recognition [2]. Classical LDA aims\nto find an optimal transformation by minimizing the within-class distance and maximizing\nthe between-class distance simultaneously, thus achieving maximum discrimination. The\noptimal transformation can be readily computed by applying an eigen-decomposition to\nthe scatter matrices.\n\nAlthough LDA works well for linear problems, it may be less effective when severe non-\nlinearity is involved. To deal with such a limitation, nonlinear extensions through kernel\nfunctions have been proposed. The main idea of kernel-based methods is to map the input\ndata to a feature space through a nonlinear mapping, where the inner products in the feature\nspace can be computed by a kernel function without knowing the nonlinear mapping explic-\nitly [9]. Kernel Principal Component Analysis (KPCA) [10], Kernel Fisher Discriminant\nAnalysis (KFDA) [7] and Generalized Discriminant Analysis (GDA) [1] are, respectively,\nkernel-based nonlinear extensions of the well-known PCA, FDA and LDA methods.\n\nTo our knowledge, there are few efficient algorithms for general kernel-based discriminant\nanalysis -- most known algorithms effectively scale as O(n^3), where n is the sample\nsize. In [6, 8], S. Mika et al. made a first attempt to speed up KFDA through a greedy\napproximation technique. However, the algorithm was developed to handle the binary clas-\nsification problem. For the multi-class problem, the authors suggested the one-against-the-rest\nscheme, which considers all two-class problems.\n\nRecently, an efficient variant of LDA, namely LDA/QR, was proposed in [11, 12]. The\nessence of LDA/QR is the utilization of QR-decomposition on a small-size matrix. The\ntime complexity of LDA/QR is linear in the size of the training data, as well as in the number\nof dimensions of the data. Moreover, experiments in [11, 12] show that the classification\naccuracy of LDA/QR is competitive with that of other LDA algorithms.\n\nIn this paper, we first propose an algorithm, namely KDA/QR¹, which is a nonlinear exten-\nsion of LDA/QR. Since KDA/QR involves the whole kernel matrix, which is not scalable\nfor large datasets, we also propose an approximation of KDA/QR, namely AKDA/QR. A\ndistinct property of AKDA/QR is that it scales as O(ndc), where n is the size of the data,\nd is the dimension of the data, and c is the number of classes.\n\nWe apply the proposed algorithms to face image datasets and compare them with LDA/QR\nand Generalized Discriminant Analysis (GDA) [1], a general method for kernel discrim-\ninant analysis. Experiments show that: (1) AKDA/QR is competitive with KDA/QR and\nGDA in classification; (2) both KDA/QR and AKDA/QR outperform LDA/QR in classifi-\ncation; and (3) AKDA/QR has much lower costs in time and space than GDA.\n\n\n2     LDA/QR\n\nIn this section, we give a brief review of the LDA/QR algorithm [11, 12]. This algorithm\nhas two stages. The first stage maximizes the separation between different classes via QR\nDecomposition [4]. 
The second stage addresses the issue of minimizing the within-class\ndistance, while maintaining low time/space complexity.\n\nLet A ∈ R^{d×n} be the data matrix, where each column a_i is a vector in d-dimensional space.\nAssume A is partitioned into c classes {Π_i}_{i=1}^c, and the size of the ith class is |Π_i| = n_i.\nDefine the between-class, within-class, and total scatter matrices S_b, S_w, and S_t, respectively,\nas follows [3]: S_b = H_b H_b^t, S_w = H_w H_w^t, and S_t = H_t H_t^t, where H_b = [√n_1 (m_1 -\nm), ..., √n_c (m_c - m)] ∈ R^{d×c}, H_w = A - [m_1 e_1^t, ..., m_c e_c^t] ∈ R^{d×n}, and H_t =\nA - m e^t ∈ R^{d×n}, with e_i = (1, ..., 1)^t ∈ R^{n_i×1}, e = (1, ..., 1)^t ∈ R^{n×1}, m_i the mean of\nthe ith class, and m the global mean. It is easy to check that S_t = S_b + S_w.\n\nThe first stage of LDA/QR aims to solve the following optimization problem,\n\n                                G = arg max_{G^t G = I} trace(G^t S_b G).                                    (1)\n\nNote that this optimization only addresses the issue of maximizing the between-class dis-\ntance. The solution can be obtained by solving the eigenvalue problem on S_b. The solution\ncan also be obtained through QR Decomposition on the centroid matrix C [12], where\nC = [m_1, m_2, ..., m_c] ∈ R^{d×c} consists of the c centroids. More specifically, let C = QR be\nthe QR Decomposition of C, where Q ∈ R^{d×c} has orthonormal columns and R ∈ R^{c×c}\nis upper triangular. Then G = QV, for any orthogonal matrix V, solves the optimiza-\ntion problem in Eq. (1). Note that the choice of the orthogonal matrix V is arbitrary, since\ntrace(G^t S_b G) = trace(V^t G^t S_b G V), for any orthogonal matrix V.\n\n     ¹KDA/QR stands for Kernel Discriminant Analysis via QR-decomposition\n\n     Algorithm 1: LDA/QR\n     /* Stage I: */\n     1. Construct the centroid matrix C;\n     2. Compute the QR Decomposition of C as C = QR, where Q ∈ R^{d×c}, R ∈ R^{c×c};\n     /* Stage II: */\n     3. Y ← H_b^t Q;\n     4. Z ← H_t^t Q;\n     5. B ← Y^t Y; /* Reduced between-class scatter matrix */\n     6. T ← Z^t Z; /* Reduced total scatter matrix */\n     7. Compute the c eigenvectors v_i of (T + λI_c)^{-1} B with decreasing eigenvalues;\n     8. G ← QV, where V = [v_1, ..., v_c].\n\nThe second stage of LDA/QR refines the first stage by addressing the issue of minimizing\nthe within-class distance. It incorporates the within-class scatter information by applying a\nrelaxation scheme on V (relaxing V from an orthogonal matrix to an arbitrary matrix). In\nthe second stage of LDA/QR, we look for a transformation matrix G such that G = QV, for\nsome V. Note that V is not required to be orthogonal. The original problem of computing G\nis equivalent to computing V. Since G^t S_b G = V^t (Q^t S_b Q) V, G^t S_w G = V^t (Q^t S_w Q) V,\nand G^t S_t G = V^t (Q^t S_t Q) V, the original problem of finding the optimal G is equivalent to\nfinding V, with B = Q^t S_b Q, W = Q^t S_w Q, and T = Q^t S_t Q as the \"reduced\" between-\nclass, within-class and total scatter matrices, respectively. Note that B has much smaller\nsize than the original scatter matrix S_b (similarly for W and T).\n\nThe optimal V can be computed efficiently using many existing LDA-based methods, since\nwe are dealing with matrices B, W, and T of size c by c. We can compute the optimal V\nby simply applying regularized LDA; that is, we compute V by solving a small eigenvalue\nproblem on (W + λI_c)^{-1} B or (T + λI_c)^{-1} B (note that T = B + W), for some positive\nconstant λ [3]. The pseudo-code for this algorithm is given in Algorithm 1. 
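For concreteness, the two stages above can be sketched in NumPy as follows (this is our own illustrative reimplementation, not the authors' MATLAB code; the data layout and the regularization value lam are assumptions for the example):

```python
import numpy as np

def lda_qr(A, labels, lam=0.1):
    """Sketch of Algorithm 1 (LDA/QR). A is d x n; labels[i] is the class of
    column i. Returns the d x c transformation matrix G = QV."""
    d, n = A.shape
    classes = np.unique(labels)
    c = len(classes)
    # Stage I: QR decomposition of the d x c centroid matrix C.
    C = np.stack([A[:, labels == k].mean(axis=1) for k in classes], axis=1)
    Q, R = np.linalg.qr(C)
    # Stage II: reduced scatter matrices B = Y^t Y and T = Z^t Z.
    m = A.mean(axis=1)
    Hb = np.stack([np.sqrt(np.sum(labels == k)) * (C[:, j] - m)
                   for j, k in enumerate(classes)], axis=1)   # d x c
    Y = Hb.T @ Q                                              # c x c
    Z = (A - m[:, None]).T @ Q                                # n x c
    B, T = Y.T @ Y, Z.T @ Z
    # Eigenvectors of (T + lam*I)^{-1} B with decreasing eigenvalues.
    evals, evecs = np.linalg.eig(np.linalg.solve(T + lam * np.eye(c), B))
    V = evecs[:, np.argsort(-evals.real)].real
    return Q @ V
```

Forming Z is the dominant step here, consistent with the linear-in-n complexity of LDA/QR reported in [11, 12].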
We use the\ntotal scatter instead of the within-class scatter in Lines 4, 6, and 7, mainly for convenience\nof presentation of the kernel methods in Section 3 and Section 4.\n\n\n3    Kernel discriminant analysis via QR-decomposition (KDA/QR)\n\nIn this section, the KDA/QR algorithm, a nonlinear extension of LDA/QR through kernel\nfunctions, is presented. Let Φ be a mapping to the feature space and A^Φ = Φ(A) be the data\nmatrix in the feature space. Then, the centroid matrix C^Φ in the feature space is\n\n        C^Φ = [m_1^Φ, ..., m_c^Φ] = [(1/n_1) Σ_{i∈Π_1} Φ(a_i), ..., (1/n_c) Σ_{i∈Π_c} Φ(a_i)].        (2)\n\nThe global centroid in the feature space can be computed as m^Φ = (1/n) Σ_i n_i m_i^Φ. To maxi-\nmize the between-class distance in the feature space, as discussed in Section 2, we perform the QR\ndecomposition of C^Φ, i.e., C^Φ = Q^Φ R^Φ. A key observation is that R^Φ can be computed\nfrom (C^Φ)^t C^Φ = (R^Φ)^t R^Φ by applying the Cholesky decomposition to (C^Φ)^t C^Φ [4].\n\nNote that C^Φ = A^Φ M, where A^Φ = Φ(A) = [Φ(a_1) ... Φ(a_n)], and the ith column\nof M is (0, ..., 0, 1/n_i, ..., 1/n_i, 0, ..., 0)^t. Let K be the kernel matrix with K(i, j) =\n⟨Φ(a_i), Φ(a_j)⟩. Then\n\n                                (C^Φ)^t C^Φ = M^t K M.                                (3)\n\n Algorithm 2: KDA/QR\n /* Stage I: */\n 1.    Construct the kernel matrix K;\n 2.    Compute (C^Φ)^t C^Φ = M^t (KM) as in Eq. (3);\n 3.    Compute R^Φ from the Cholesky Decomposition of (C^Φ)^t C^Φ;\n /* Stage II: */\n 4.    Y^Φ ← N^t M^t K M (R^Φ)^{-1};\n 5.    Z^Φ ← E^t K M (R^Φ)^{-1};\n 6.    B^Φ ← (Y^Φ)^t Y^Φ;\n 7.    T^Φ ← (Z^Φ)^t Z^Φ;\n 8.    Compute the c eigenvectors φ_i of (T^Φ + λI_c)^{-1} B^Φ, with decreasing eigenvalues;\n 9.    V^Φ ← [φ_1, φ_2, ..., φ_c];\n 10.   G^Φ ← C^Φ (R^Φ)^{-1} V^Φ;\n\n\nWith the computed R^Φ, Q^Φ = C^Φ (R^Φ)^{-1}. The matrices Y^Φ, Z^Φ, B^Φ, and T^Φ in the\nfeature space (corresponding to the second stage in LDA/QR) can be computed as follows.\n\nIn the feature space, we have H_b^Φ = C^Φ N, where the ith column of N is\n√n_i ((0, ..., 0, 1, 0, ..., 0)^t - (1/n)(n_1, ..., n_c)^t), with the 1 in the ith position. It follows that\nY^Φ = (H_b^Φ)^t Q^Φ = N^t (C^Φ)^t C^Φ (R^Φ)^{-1} = N^t M^t K M (R^Φ)^{-1}. Similarly, H_t^Φ = A^Φ E and Z^Φ =\n(H_t^Φ)^t Q^Φ = E^t (A^Φ)^t C^Φ (R^Φ)^{-1} = E^t (A^Φ)^t A^Φ M (R^Φ)^{-1} = E^t K M (R^Φ)^{-1}, where\nE = I - (1/n) e e^t.\n\nSince S_b^Φ = H_b^Φ (H_b^Φ)^t and S_t^Φ = H_t^Φ (H_t^Φ)^t, we have\n\n        B^Φ = (Q^Φ)^t S_b^Φ Q^Φ = (Q^Φ)^t H_b^Φ (H_b^Φ)^t Q^Φ = (Y^Φ)^t Y^Φ,\n        T^Φ = (Q^Φ)^t S_t^Φ Q^Φ = (Q^Φ)^t H_t^Φ (H_t^Φ)^t Q^Φ = (Z^Φ)^t Z^Φ.\n\nWe proceed by computing the c eigenvectors {φ_i}_{i=1}^c of (T^Φ + λI_c)^{-1} B^Φ. Define V^Φ =\n[φ_1, φ_2, ..., φ_c]. The final transformation matrix can be computed as\n\n                        G^Φ = Q^Φ V^Φ = C^Φ (R^Φ)^{-1} V^Φ.                        (4)\n\nFor a given data point z, its projection by G^Φ is (G^Φ)^t Φ(z) =\n(V^Φ)^t ((R^Φ)^{-1})^t (C^Φ)^t Φ(z) = (V^Φ)^t ((R^Φ)^{-1})^t M^t K_{tz}, where K_{tz} ∈ R^n and\nK_{tz}(i) = ⟨Φ(a_i), Φ(z)⟩.\n\nThe pseudo-code for the KDA/QR algorithm is given in Algorithm 2.\n\n\n3.1      Complexity analysis of KDA/QR\n\nThe cost to form the kernel matrix in Line 1 is O(n^2 d). The computation of (C^Φ)^t C^Φ\nin Line 2 takes O(n^2), taking advantage of the sparse structure of M. The Cholesky decom-\nposition in Line 3 takes O(c^3) [4]. Line 4 takes O(c^3), as M^t K M is already computed in\nLine 2. In Line 5, the computation of Z^Φ = E^t K M (R^Φ)^{-1} = (I - (1/n) e e^t) K M (R^Φ)^{-1} =\nK M (R^Φ)^{-1} - (1/n) e (e^t K M)(R^Φ)^{-1} in the given order takes O(nc^2), assuming KM\nis kept from Line 2. Lines 6, 7, and 8 take O(c^3), O(nc^2) and O(c^3), respectively. Hence,\nthe total complexity of the KDA/QR algorithm is O(n^2 d). Omitting the cost of\nevaluating the kernel matrix K, which is required in all kernel-based algorithms, the total\ncost is O(n^2). Note that all other general discriminant analysis algorithms scale as O(n^3).\n\n\n4      Approximate KDA/QR (AKDA/QR)\n\nIn this section, we present the AKDA/QR algorithm, which is an efficient approximation\nof the KDA/QR algorithm from the last section. Note that the bottleneck of KDA/QR is the\nexplicit formation of the large kernel matrix K for the computation of (C^Φ)^t C^Φ in Line\n2 of Algorithm 2. 
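This bottleneck can be seen directly in code: a small NumPy sketch (our illustration, assuming the Gaussian kernel adopted later in the paper) of Line 2 of Algorithm 2, where the full n × n kernel matrix must be formed before being collapsed to the c × c matrix M^t K M:

```python
import numpy as np

def centroid_gram(A, labels, sigma=10.0):
    """Illustration of Line 2 of Algorithm 2: (C_Phi)^t C_Phi = M^t K M.
    Forming the full n x n Gaussian kernel matrix K dominates the cost."""
    sq = (A * A).sum(axis=0)
    K = np.exp(-(sq[:, None] + sq[None, :] - 2.0 * A.T @ A) / sigma)  # n x n
    classes = np.unique(labels)
    # Column j of M holds 1/n_j in the rows of class j and zeros elsewhere.
    M = np.stack([(labels == k) / np.sum(labels == k) for k in classes], axis=1)
    return M.T @ (K @ M)  # c x c; its Cholesky factor is R_Phi
```

The returned matrix is symmetric positive semi-definite, so the Cholesky step of Line 3 applies directly.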
The AKDA/QR algorithm presented in this section avoids the explicit\nconstruction of K, thus reducing the computational cost significantly.\n\nThe key to AKDA/QR is the efficient computation of (C^Φ)^t C^Φ, where C^Φ =\n[m_1^Φ, ..., m_c^Φ] and m_j^Φ = (1/n_j) Σ_{i∈Π_j} Φ(a_i). AKDA/QR aims to find x_j in the original\nspace such that Φ(x_j) approximates m_j^Φ. Mathematically, the optimal x_j can be computed\nby solving the following optimization problem:\n\n        min_{x_j ∈ R^d} ‖Φ(x_j) - (1/n_j) Σ_{i∈Π_j} Φ(a_i)‖^2 for j = 1, ..., c.        (5)\n\nTo proceed, we only consider Gaussian kernels for AKDA/QR, as they are the most widely\nused ones in the literature [9]. Furthermore, the optimization problem in (5) can be simpli-\nfied by focusing on the Gaussian kernels, as shown in the following lemma.\n\nLemma 4.1. Consider the Gaussian kernel function exp(-‖x - y‖^2/σ), where σ is the band-\nwidth parameter. The optimization problem in (5) is convex if\n\n        (2/σ)‖x_j - a_i‖^2 ≤ 1 for each j = 1, ..., c and for all i ∈ Π_j.        (6)\n\nProof. It is easy to check that, for the Gaussian kernel, the optimization problem in (5)\nreduces to:\n\n        min_{x_j ∈ R^d} f(x_j) for j = 1, ..., c,        (7)\n\nwhere f(x) = Σ_{i∈Π_j} f_i(x) and f_i(x) = -exp(-‖x - a_i‖^2/σ). The Hessian matrix of\nf_i(x) is H(f_i) = (2/σ) exp(-‖x - a_i‖^2/σ)(I - (2/σ)(x - a_i)(x - a_i)^t). It is easy to show that\nif (2/σ)‖x - a_i‖^2 ≤ 1, for all i ∈ Π_j, then H(f_i) is positive semi-definite, that is, f_i(x) is\nconvex. Thus f(x), being a sum of convex functions, is also convex.\n\nFor applications involving high-dimensional data, such as face recognition, σ is usually\nlarge (typically ranging from thousands to hundreds of thousands [13]), and the condition\nin Lemma 4.1 holds if we restrict our search space to the convex hull of each class in the\noriginal space. Therefore, the global minimum of the optimization problem in (7) can be\nfound very efficiently using Newton's method or gradient descent. 
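As an illustration, a simple gradient-descent sketch for the per-class objective in (7) (the step size and iteration count are arbitrary choices of ours, not from the paper):

```python
import numpy as np

def minimize_f(Ai, sigma, steps=200, lr=None):
    """Gradient descent on f(x) = -sum_i exp(-||x - a_i||^2 / sigma) for one
    class, started from the class centroid (Ai is d x n_j)."""
    x = Ai.mean(axis=1)                                   # centroid start
    lr = sigma / (4.0 * Ai.shape[1]) if lr is None else lr
    for _ in range(steps):
        diff = x[:, None] - Ai                            # d x n_j
        w = np.exp(-(diff ** 2).sum(axis=0) / sigma)      # Gaussian weights
        x = x - lr * (2.0 / sigma) * (diff * w).sum(axis=1)  # grad of f
    return x
```

Started from the class centroid with a large σ, the weights are nearly uniform and the gradient is nearly zero, so the iterate barely moves from the centroid.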
A key observation is\nthat for relatively large σ, the centroid of each class in the original space will map very\nclose to the centroid in the feature space [9], and can therefore serve as an approximate solution\nof the optimization problem in (7). Experiments show that choosing x_j = (1/n_j) Σ_{i∈Π_j} a_i\nproduces results close to those obtained by solving the optimization problem in (7). We thus use\nit in all the following experiments.\n\nWith the computed x_j, for j = 1, ..., c, the centroid matrix C^Φ can be approximated by\n\n        C^Φ ≈ [Φ(x_1) ... Φ(x_c)] (≜ Ĉ^Φ)        (8)\n\nand\n\n        (Ĉ^Φ)^t Ĉ^Φ = K̂,        (9)\n\nwhere K̂(i, j) = ⟨Φ(x_i), Φ(x_j)⟩ and K̂ ∈ R^{c×c}. The Cholesky decomposition of K̂\ngives us R̂ via K̂ = (R̂)^t R̂.\n\nIt follows that Ĥ_b = Ĉ^Φ N, and Ŷ = N^t K̂ (R̂)^{-1}. Similarly, Ẑ = E^t K̂_tc (R̂)^{-1},\nwhere N and E are defined as in Section 3, and K̂_tc(i, j) = ⟨Φ(a_i), Φ(x_j)⟩.\n\nThe remaining steps are the same as in the KDA/QR algorithm. The pseudo-code for\nAKDA/QR is given in Algorithm 3.\n\n Algorithm 3: AKDA/QR\n /* Stage I: */\n 1.    Compute x_j = (1/n_j) Σ_{i∈Π_j} a_i, for j = 1, ..., c;\n 2.    Construct the kernel matrix K̂ as in Eq. (9);\n 3.    Compute R̂ from the Cholesky Decomposition of K̂;\n /* Stage II: */\n 4.    Ŷ ← N^t K̂ (R̂)^{-1};\n 5.    Ẑ ← E^t K̂_tc (R̂)^{-1};\n 6.    B̂ ← Ŷ^t Ŷ;\n 7.    T̂ ← Ẑ^t Ẑ;\n 8.    Compute the c eigenvectors φ̂_i of (T̂ + λI_c)^{-1} B̂, with decreasing eigenvalues;\n 9.    V̂ ← [φ̂_1, φ̂_2, ..., φ̂_c];\n 10.   Ĝ ← Ĉ^Φ (R̂)^{-1} V̂;\n\n\n4.1     Complexity analysis of AKDA/QR\n\nLine 1 takes O(dn). The construction of the matrix K̂ in Line 2 takes O(c^2 d). The\nCholesky Decomposition in Line 3 takes O(c^3) [4]. Lines 4 and 5 take O(c^3) and O(ndc),\nrespectively. It then takes O(c^3) and O(nc^2) for the matrix multiplications in Lines 6 and 7,\nrespectively. Line 8 computes the eigen-decomposition of a c by c matrix, and hence takes\nO(c^3) [4]. Thus, the most expensive step in Algorithm 3 is Line 5, which takes O(ndc).\n\n              PCA        LDA/QR    GDA              KDA/QR    AKDA/QR\n    time      O(n^2 d)   O(ndc)    O(n^2 d + n^3)   O(n^2 d)  O(ndc)\n    space     O(nd)      O(nc)     O(n^2)           O(n^2)    O(nc)\n\nTable 1: Comparison of time & space complexities of several dimension reduction algo-\nrithms: n is the size of the data, d is the dimension, and c is the number of classes.\n\nTable 1 lists the time and space complexities of several dimension reduction algorithms. It\nis clear from the table that AKDA/QR is more efficient than the other kernel-based methods.\n\n\n5       Experimental results\n\nIn this section, we evaluate both the KDA/QR and AKDA/QR algorithms. The perfor-\nmance is measured by classification accuracy. Note that both KDA/QR and AKDA/QR\nhave two parameters: the bandwidth σ of the kernel function and the regularization parameter λ.\nExperiments show that choosing σ = 100000 and λ = 0.15 for KDA/QR, and σ = 100000\nand λ = 0.10 for AKDA/QR produces good overall results. We thus use these values\nin all the experiments. 
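Putting the pieces of Algorithm 3 together, a compact NumPy sketch of AKDA/QR (our own illustration; a smaller bandwidth than the paper's σ = 100000 is used by default here only so that the toy kernel matrix K̂ stays well conditioned for the Cholesky step):

```python
import numpy as np

def akda_qr(A, labels, sigma=10.0, lam=0.10):
    """Sketch of Algorithm 3 (AKDA/QR). A is d x n. Returns V_hat, the class
    pre-images X, and inv(R_hat); a new point z is then projected as
    V_hat.T @ inv(R_hat).T @ k(X, z)."""
    d, n = A.shape
    classes = np.unique(labels)
    c = len(classes)
    ns = np.array([np.sum(labels == k) for k in classes])
    gauss = lambda U, W: np.exp(
        -((U[:, :, None] - W[:, None, :]) ** 2).sum(axis=0) / sigma)
    # Line 1: class centroids as approximate kernel pre-images x_j.
    X = np.stack([A[:, labels == k].mean(axis=1) for k in classes], axis=1)
    K_hat = gauss(X, X)                                  # Eq. (9), c x c
    R_inv = np.linalg.inv(np.linalg.cholesky(K_hat).T)   # K_hat = R^t R
    # Column j of N is sqrt(n_j) * (e_j - (n_1, ..., n_c)^t / n), so Hb = C N.
    N = np.sqrt(ns) * (np.eye(c) - np.outer(ns, np.ones(c)) / n)
    K_tc = gauss(A, X)                                   # n x c
    Y = N.T @ K_hat @ R_inv                              # Line 4
    Z = (K_tc - K_tc.mean(axis=0)) @ R_inv               # Line 5: E^t K_tc R^{-1}
    B, T = Y.T @ Y, Z.T @ Z                              # Lines 6-7
    evals, evecs = np.linalg.eig(np.linalg.solve(T + lam * np.eye(c), B))
    V_hat = evecs[:, np.argsort(-evals.real)].real       # Lines 8-9
    return V_hat, X, R_inv
```

Only the n × c matrix K_tc touches all n samples, which is where the O(ndc) cost of Line 5 arises.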
The 1-Nearest Neighbor (1-NN) method is used as the classifier. We\nrandomly select p samples of each person from the dataset for training and the rest for\ntesting.\n\nFigure 1: Comparison of classification accuracy on PIX (left) and AR (right).\n\nWe repeat the experiments 20 times and report the average recognition accuracy\nof each method. 
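The random-split protocol just described can be sketched as follows (our illustration; `transform` stands in for any of the compared feature extractors and defaults to the identity; in the actual experiments each method is of course trained on the training split only):

```python
import numpy as np

def evaluate_1nn(X, y, p=4, trials=20, transform=None, seed=0):
    """Sketch of the protocol: p random training samples per class, 1-NN
    classification of the rest, accuracy averaged over random trials.
    X is n x d; transform (identity if None) maps X to reduced features."""
    rng = np.random.default_rng(seed)
    F = X if transform is None else transform(X)
    accs = []
    for _ in range(trials):
        train = np.hstack([rng.choice(np.flatnonzero(y == k), size=p,
                                      replace=False) for k in np.unique(y)])
        test = np.setdiff1d(np.arange(len(y)), train)
        d2 = ((F[test, None, :] - F[None, train, :]) ** 2).sum(axis=2)
        pred = y[train[np.argmin(d2, axis=1)]]            # nearest neighbor
        accs.append(np.mean(pred == y[test]))
    return float(np.mean(accs))
```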
The MATLAB codes for the KDA/QR and AKDA/QR algorithms may be\naccessed at http://www.cs.umn.edu/jieping/Kernel.\n\nDatasets: We use the following three publicly available datasets in our study:\nPIX contains 300 face images of 30 persons. The size of each PIX image is 512 × 512.\nWe subsample the images down to a size of 100 × 100 = 10000; ORL is a well-known\ndataset for face recognition. It contains ten different face images of each of 40 persons, for a total\nof 400 images. The image size is 92 × 112 = 10304; AR is a large face image dataset. We\nuse a subset of AR. This subset contains 1638 face images of 126 persons. Its image size\nis 768 × 576. We subsample the images down to a size of 60 × 40 = 2400. Each dataset is\nnormalized to have zero mean and unit variance.\n\nKDA/QR and AKDA/QR vs. LDA/QR: In this experiment, we compare the perfor-\nmance of AKDA/QR and KDA/QR with that of several linear dimension reduction\nalgorithms, including PCA and LDA/QR, on two face datasets (PIX and AR). We use 100 principal compo-\nnents for PCA, as this produces good overall results. The results are summarized in Fig. 1,\nwhere the x-axis denotes the number of samples per class in the training set and the y-axis\ndenotes the classification accuracy. Fig. 1 shows that KDA/QR and AKDA/QR consis-\ntently outperform LDA/QR and PCA. The most interesting result lies in the AR dataset,\nwhere AKDA/QR and KDA/QR outperform LDA/QR by a large margin. It is known that\nthe images in the AR dataset contain large areas of occlusion due to sunglasses and\nscarves, which makes linear algorithms such as LDA/QR less effective. Another interest-\ning observation is that the approximate AKDA/QR algorithm is competitive with its exact\nversion KDA/QR in all cases.\n\nKDA/QR and AKDA/QR vs. GDA: In this experiment, we compare the performance of\nAKDA/QR and KDA/QR with Generalized Discriminant Analysis (GDA) [1]. The com-\nparison is made on the ORL face dataset, as the results of GDA on ORL are available in\n[5]. We also include results for PCA and LDA/QR. The results are summarized in Ta-\nble 2. The main observation from this experiment is that both KDA/QR and AKDA/QR are\ncompetitive with GDA, while AKDA/QR is much more efficient than GDA (see Table 1).\nSimilar to the first experiment, Table 2 shows that KDA/QR and AKDA/QR consistently\noutperform the PCA and LDA/QR algorithms in terms of recognition accuracy.\n\n                 p     PCA       LDA/QR      GDA       KDA/QR        AKDA/QR\n                 3     0.8611    0.8561      0.8782    0.9132        0.9118\n                 4     0.8938    0.9083      0.9270    0.9321        0.9300\n                 5     0.9320    0.9385      0.9535    0.9625        0.9615\n                 6     0.9512    0.9444      0.9668    0.9737        0.9744\n                 7     0.9633    0.9692      0.9750    0.9825        0.9815\n                 8     0.9713    0.9713      0.9938    0.9875        0.9875\n\nTable 2: Comparison of classification accuracy on the ORL face image dataset. p is the number\nof training samples per class. The results on GDA are taken from [5].\n\n\n6       Conclusions\n\nIn this paper, we first present a general kernel discriminant analysis algorithm, called\nKDA/QR. Using Gaussian kernels, we then propose an approximate algorithm for\nKDA/QR, which we call AKDA/QR. Our experimental results show that the accuracy\nachieved by the two algorithms is very competitive with that of GDA, a general kernel discrimi-\nnant analysis algorithm, while AKDA/QR is much more efficient. In particular, the computational\ncomplexity of AKDA/QR is linear in the number of data points in the training set, as\nwell as in the number of dimensions and the number of classes.\n\nAcknowledgment Research of J. Ye and R. 
Janardan is sponsored, in part, by the Army High Per-\nformance Computing Research Center under the auspices of the Department of the Army, Army\nResearch Laboratory cooperative agreement number DAAD19-01-2-0014, the content of which does\nnot necessarily reflect the position or the policy of the government, and no official endorsement\nshould be inferred.\n\n\nReferences\n\n [1] G. Baudat and F. Anouar. Generalized discriminant analysis using a kernel approach. Neural\n     Computation, 12(10):2385–2404, 2000.\n\n [2] P.N. Belhumeur, J.P. Hespanha, and D.J. Kriegman. Eigenfaces vs. fisherfaces: Recognition\n     using class specific linear projection. IEEE TPAMI, 19(7):711–720, 1997.\n\n [3] K. Fukunaga. Introduction to Statistical Pattern Recognition. Academic Press, San Diego,\n     California, USA, 1990.\n\n [4] G. H. Golub and C. F. Van Loan. Matrix Computations. The Johns Hopkins University Press,\n     Baltimore, MD, USA, third edition, 1996.\n\n [5] Q. Liu, R. Huang, H. Lu, and S. Ma. Kernel-based optimized feature vectors selection and\n     discriminant analysis for face recognition. In ICPR Proceedings, pages 362–365, 2002.\n\n [6] S. Mika, G. Rätsch, and K.-R. Müller. A mathematical programming approach to the kernel\n     fisher algorithm. In NIPS Proceedings, pages 591–597, 2001.\n\n [7] S. Mika, G. Rätsch, J. Weston, B. Schölkopf, and K.-R. Müller. Fisher discriminant analysis\n     with kernels. In IEEE Neural Networks for Signal Processing Workshop, pages 41–48, 1999.\n\n [8] S. Mika, A.J. Smola, and B. Schölkopf. An improved training algorithm for kernel fisher dis-\n     criminants. In AISTATS Proceedings, pages 98–104, 2001.\n\n [9] B. Schölkopf and A. Smola. Learning with Kernels: Support Vector Machines, Regularization,\n     Optimization and Beyond. MIT Press, 2002.\n\n[10] B. Schölkopf, A. Smola, and K. Müller. Nonlinear component analysis as a kernel eigenvalue\n     problem. Neural Computation, 10(5):1299–1319, 1998.\n\n[11] J. Ye and Q. Li. LDA/QR: An efficient and effective dimension reduction algorithm and its\n     theoretical foundation. Pattern Recognition, pages 851–854, 2004.\n\n[12] J. Ye, Q. Li, H. Xiong, H. Park, R. Janardan, and V. Kumar. IDR/QR: An incremental dimension\n     reduction algorithm via QR decomposition. In ACM SIGKDD Proceedings, pages 364–373,\n     2004.\n\n[13] W. Zheng, L. Zhao, and C. Zou. A modified algorithm for generalized discriminant analysis.\n     Neural Computation, 16(6):1283–1297, 2004.\n", "award": [], "sourceid": 2686, "authors": [{"given_name": "Tao", "family_name": "Xiong", "institution": null}, {"given_name": "Jieping", "family_name": "Ye", "institution": null}, {"given_name": "Qi", "family_name": "Li", "institution": null}, {"given_name": "Ravi", "family_name": "Janardan", "institution": null}, {"given_name": "Vladimir", "family_name": "Cherkassky", "institution": null}]}