{"title": "Support Vector Machines Applied to Face Recognition", "book": "Advances in Neural Information Processing Systems", "page_first": 803, "page_last": 809, "abstract": null, "full_text": "Support Vector Machines Applied to Face \n\nRecognition \n\nP. Jonathon Phillips \n\nNational Institute of Standards and Technology \n\nBldg 225/ Rm A216 \n\nGaithersburg. MD 20899 \n\nTel 301.975.5348; Fax 301.975.5287 \n\njonathon@nist.gov \n\nAbstract \n\nFace recognition is a K class problem. where K is the number of known \nindividuals; and support vector machines (SVMs) are a binary classi(cid:173)\nfication method. By reformulating the face recognition problem and re(cid:173)\ninterpreting the output of the SVM classifier. we developed a SVM -based \nface recognition algorithm. The face recognition problem is formulated \nas a problem in difference space. which models dissimilarities between \ntwo facial images. In difference space we formulate face recognition as a \ntwo class problem. The classes are: dissimilarities between faces of the \nsame person. and dissimilarities between faces of different people. By \nmodifying the interpretation of the decision surface generated by SVM. \nwe generated a similarity metric between faces that is learned from ex(cid:173)\namples of differences between faces. The SVM-based algorithm is com(cid:173)\npared with a principal component analysis (PeA) based algorithm on a \ndifficult set of images from the FEREf database. Performance was mea(cid:173)\nsured for both verification and identification scenarios. The identification \nperformance for SVM is 77-78% versus 54% for PCA. For verification. \nthe equal error rate is 7% for SVM and 13 % for PCA. \n\n1 Introduction \n\nFace recognition has developed into a major research area in pattern recognition and com(cid:173)\nputer vision. Face recognition is different from classical pattern-recognition problems such \nas character recognition. In classical pattern recognition. 
there are relatively few classes, and many samples per class. With many samples per class, algorithms can classify samples not previously seen by interpolating among the training samples. On the other hand, in face recognition, there are many individuals (classes), and only a few images (samples) per person, and algorithms must recognize faces by extrapolating from the training samples. In numerous applications there can be only one training sample (image) of each person. \nSupport vector machines (SVMs) are formulated to solve a classical two class pattern recognition problem. We adapt SVM to face recognition by modifying the interpretation of the output of a SVM classifier and devising a representation of facial images that is concordant with a two class problem. A traditional SVM returns a binary value, the class of the object. To train our SVM algorithm, we formulate the problem in a difference space, which explicitly captures the dissimilarities between two facial images. This is a departure from traditional face space or view-based approaches, which encode each facial image as a separate view of a face. \nIn difference space, we are interested in the following two classes: the dissimilarities between images of the same individual, and dissimilarities between images of different people. These two classes are the input to a SVM algorithm. A SVM algorithm generates a decision surface separating the two classes. For face recognition, we re-interpret the decision surface to produce a similarity metric between two facial images. This allows us to construct face-recognition algorithms. The work of Moghaddam et al. [3] uses a Bayesian method in a difference space, but they do not derive a similarity distance from both positive and negative samples. \nWe demonstrate our SVM-based algorithm on both verification and identification applications.
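The construction of the two difference-space classes can be sketched as follows. This is a minimal illustration only; the function and variable names are our own, not from the paper:

```python
import numpy as np

def difference_sets(images, labels):
    """Build the two difference-space classes.

    images: (M, N) array, one vectorized face per row.
    labels: length-M sequence of person identifiers.
    Returns (within, between): within-class and between-class
    difference vectors, one row per image pair.
    """
    within, between = [], []
    M = len(labels)
    for i in range(M):
        for j in range(i + 1, M):
            d = images[i] - images[j]
            (within if labels[i] == labels[j] else between).append(d)
    return np.array(within), np.array(between)
```

For example, four images of two people (two each) yield two within-class differences and four between-class differences.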
In identification, the algorithm is presented with an image of an unknown person. The algorithm reports its best estimate of the identity of the unknown person from a database of known individuals. In a more general response, the algorithm will report a list of the most similar individuals in the database. In verification (also referred to as authentication), the algorithm is presented with an image and a claimed identity of the person. The algorithm either accepts or rejects the claim. Or, the algorithm can return a confidence measure of the validity of the claim. \nTo provide a benchmark for comparison, we compared our algorithm with a principal component analysis (PCA) based algorithm. We report results on images from the FERET database of images, which is the de facto standard in the face recognition community. From our experience with the FERET database, we selected harder sets of images on which to test the algorithms. Thus, we avoided saturating the performance of either algorithm, providing a robust comparison between the algorithms. To test the ability of our algorithm to generalize to new faces, we trained and tested the algorithms on separate sets of faces. \n\n2 Background \n\nIn this section we give a brief overview of SVM to present the notation used in this paper. For details of SVM see Vapnik [7], or for a tutorial see Burges [1]. SVM is a binary classification method that finds the optimal linear decision surface based on the concept of structural risk minimization. The decision surface is a weighted combination of elements of the training set. These elements are called support vectors and characterize the boundary between the two classes. The input to a SVM algorithm is a set {(x_i, y_i)} of labeled training data, where x_i is the data and y_i = -1 or 1 is the label.
The output of a SVM algorithm is a set of Ns support vectors s_i, coefficient weights a_i, class labels y_i of the support vectors, and a constant term b. The linear decision surface is \n\nw \u00b7 z + b = 0, where w = sum_{i=1}^{Ns} a_i y_i s_i. \n\nSVM can be extended to nonlinear decision surfaces by using a kernel K(\u00b7, \u00b7) that satisfies Mercer's condition [1, 7]. The nonlinear decision surface is \n\nsum_{i=1}^{Ns} a_i y_i K(s_i, z) + b = 0. \n\nA facial image is represented as a vector p in R^N, where R^N is referred to as face space. Face space can be the original pixel values vectorized or another feature space; for example, projecting the facial image on the eigenvectors generated by performing PCA on a training set of faces [6] (also referred to as eigenfaces). \nWe write p_1 ~ p_2 if p_1 and p_2 are images of the same face, and p_1 \u2241 p_2 if they are images of different faces. To avoid confusion we adopted the following terminology for identification and verification. The gallery is the set of images of known people and a probe is an unknown face that is presented to the system. In identification, the face in a probe is identified. In verification, a probe is the facial image presented to the system whose identity is to be verified. The set of unknown faces is called the probe set. \n\n3 Verification as a two class problem \n\nVerification is fundamentally a two class problem. A verification algorithm is presented with an image p and a claimed identity. Either the algorithm accepts or rejects the claim. A straightforward method for constructing a classifier for person X is to feed a SVM algorithm a training set with one class consisting of facial images of person X and the other class consisting of facial images of other people.
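The support-vector expansion of the decision surface can be evaluated directly. The sketch below uses a radial basis kernel, the kernel family used in the experiments of section 7; the helper names and the gamma value are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def rbf_kernel(a, b, gamma=0.1):
    # Radial basis kernel K(a, b) = exp(-gamma * ||a - b||^2).
    return np.exp(-gamma * np.sum((np.asarray(a) - np.asarray(b)) ** 2))

def decision_value(z, support_vectors, alphas, ys, b, kernel=rbf_kernel):
    # f(z) = sum_i a_i * y_i * K(s_i, z) + b, the nonlinear
    # decision-surface value for a query point z.
    return sum(a * y * kernel(s, z)
               for a, y, s in zip(alphas, ys, support_vectors)) + b
```

The sign of f(z) gives the binary class in the pure SVM paradigm; sections 3-6 instead reinterpret the value itself.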
A SVM algorithm generates a linear decision surface, and the identity of the face in image p is accepted if \n\nw \u00b7 p + b <= 0, \n\notherwise the claim is rejected. \nThis classifier is designed to minimize the structural risk. Structural risk is an overall measure of classifier performance. However, verification performance is usually measured by two statistics, the probability of correct verification, Pv, and the probability of false acceptance, PF. There is a tradeoff between Pv and PF. At one extreme all claims are rejected and Pv = PF = 0; and at the other extreme, all claims are accepted and Pv = PF = 1. The operating values for Pv and PF are dictated by the application. \nUnfortunately, the decision surface generated by a SVM algorithm produces a single performance point for Pv and PF. To allow for adjusting Pv and PF, we parameterize a SVM decision surface by \u0394. The parameterized decision surface is \n\nw \u00b7 z + b = \u0394, \n\nand the identity of the face in image p is accepted if \n\nw \u00b7 p + b <= \u0394. \n\nIf \u0394 = -infinity, then all claims are rejected and Pv = PF = 0; if \u0394 = +infinity, all claims are accepted and Pv = PF = 1. By varying \u0394 between negative and positive infinity, all possible combinations of Pv and PF are found. \nNonlinear parameterized decision surfaces are described by \n\nsum_{i=1}^{Ns} a_i y_i K(s_i, z) + b = \u0394. \n\n4 Representation \n\nIn a canonical face recognition algorithm, each individual is a class and the distribution of each face is estimated or approximated. In this method, for a gallery of K individuals, the identification problem is a K class problem, and the verification problem is K instances of a two class problem. To reduce face recognition to a single instance of a two class problem, we introduce a new representation. We model the dissimilarities between faces. Let T = {t_1, ..., t_M} be a training set of faces of K individuals,
with multiple images of each of the K individuals. From T, we generate two classes. The first is the within-class differences set, which are the dissimilarities in facial images of the same person. Formally, the within-class difference set is \n\nC1 = {t_i - t_j | t_i ~ t_j}. \n\nThe set C1 contains within-class differences for all K individuals in T, not dissimilarities for one of the K individuals in the training set. The second is the between-class differences set, which are the dissimilarities among images of different individuals in the training set. Formally, \n\nC2 = {t_i - t_j | t_i \u2241 t_j}. \n\nClasses C1 and C2 are the inputs to our SVM algorithm, which generates a decision surface. In the pure SVM paradigm, given the difference between facial images p_1 and p_2, the classifier estimates if the faces in the two images are from the same person. In the modification described in section 3, the classification returns a measure of similarity \u03b4 = w \u00b7 (p_1 - p_2) + b. This similarity measure is the basis for the SVM-based verification and identification algorithms presented in this paper. \n\n5 Verification \n\nIn verification, there is a gallery {g_j} of m known individuals. The algorithm is presented with a probe p and a claim to be person j in the gallery. The first step of the verification algorithm computes the similarity score \n\n\u03b4 = sum_{i=1}^{Ns} a_i y_i K(s_i, g_j - p) + b. \n\nThe second step accepts the claim if \u03b4 <= \u0394. Otherwise, the claim is rejected. The value of \u0394 is set to meet the desired tradeoff between Pv and PF. \n\n6 Identification \n\nIn identification, there is a gallery {g_j} of m known individuals. The algorithm is presented with a probe p to be identified. The first step of the identification algorithm computes a similarity score between the probe and each of the gallery images. The similarity score between p and g_j is \n\n\u03b4_j = sum_{i=1}^{Ns} a_i y_i K(s_i, g_j - p) + b. \n\nIn the second step,
the probe is identified as the person j that has the minimum similarity score \u03b4_j. An alternative method of reporting identification results is to order the gallery by the similarity measure \u03b4_j. \n\nFigure 1: (a) Original image from the FERET database. (b) Image after preprocessing. \n\n7 Experiments \n\nWe demonstrate our SVM-based verification and identification algorithms on 400 frontal images from the FERET database of facial images [5]. To provide a benchmark for algorithm performance, we provide performance for a PCA-based algorithm on the same set of images. The PCA algorithm identifies faces with an L2 nearest neighbor classifier. For the SVM-based algorithms, a radial basis kernel was used. \n\nThe 400 images consisted of two images of 200 individuals, and were divided into disjoint training and testing sets. Each set consisted of two images of 100 people. All 400 images were preprocessed to normalize geometry and illumination, and to remove background and hair (figure 1). The preprocessing procedure consisted of manually locating the centers of the eyes; translating, rotating, and scaling the faces to place the centers of the eyes on specific pixels; masking the faces to remove background and hair; histogram equalizing the non-masked facial pixels; and scaling the non-masked facial pixels to have zero mean and unit variance. \n\nPCA was performed on 100 preprocessed images (one image of each person in the training set). This produced 99 eigenvectors {e_i} and eigenvalues {\u03bb_i}. The eigenvectors were ordered so that \u03bb_i >= \u03bb_j when i < j. Thus, the low order eigenvectors encode the majority of the variance in the training set. The faces were represented by projecting them on a subset of the eigenvectors and this is the face space. We varied the dimension of face space by changing the number of eigenvectors in the representation.
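The eigenface construction described above can be sketched with a singular value decomposition, a standard way to obtain the PCA eigenvectors without forming the full covariance matrix. The function names are ours, and this is an illustration rather than the paper's implementation:

```python
import numpy as np

def eigenfaces(train, n_components):
    """PCA face space from an (M, N) matrix of M vectorized faces.

    Returns the mean face and the top n_components eigenvectors
    (as rows), ordered by decreasing eigenvalue, so the leading
    eigenvectors capture most of the training-set variance.
    """
    mean = train.mean(axis=0)
    centered = train - mean
    # SVD of the centered data: rows of vt are the covariance
    # eigenvectors, already sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]

def project(face, mean, components):
    # Eigenfeatures: projections of the centered face onto the
    # retained eigenvectors.
    return components @ (face - mean)
```

Varying `n_components` corresponds to varying the dimension of face space as in the experiments.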
\nIn all experiments, the SVM training set consisted of the same images. The SVM training set T consisted of two images of 50 individuals from the general training set of 100 individuals. The set C1 consisted of all 50 within-class differences from faces of the same individuals. The set C2 consisted of 50 randomly selected between-class differences. \n\nThe verification and identification algorithms were tested on a gallery consisting of 100 images from the test set, with one image per person. The probe set consisted of the remaining images in the test set (100 individuals, with one image per person). \n\nWe report results for verification on a face space that consisted of the first 30 eigenfeatures (an eigenfeature is the projection of the image onto an eigenvector). The results are reported as a receiver operator curve (ROC) in figure 2. \n\nFigure 2: ROC for verification (using first 30 eigenfeatures). \n\nThe ROC in figure 2 was computed by averaging the ROC for each of the 100 individuals in the gallery. For person g_j, the probe set consisted of one image of person g_j and 99 faces of different people. A summary statistic for verification is the equal error rate. The equal error rate is the point where the probability of false acceptance is equal to the probability of false verification, or mathematically, PF = 1 - Pv. For the SVM-based algorithm the equal error rate is 0.07, and for the PCA-based algorithm it is 0.13.
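The equal error rate can be read off a sampled ROC as the point where PF = 1 - Pv. A minimal sketch (nearest sampled point, no interpolation; the names are ours):

```python
import numpy as np

def equal_error_rate(p_false_accept, p_verify):
    """Approximate the equal error rate from sampled ROC points.

    The EER is where the false-acceptance rate PF equals the
    false-rejection rate 1 - Pv.  We take the sampled point that
    minimizes |PF - (1 - Pv)|; a finer implementation might
    interpolate between adjacent ROC points.
    """
    pf = np.asarray(p_false_accept, dtype=float)
    fr = 1.0 - np.asarray(p_verify, dtype=float)
    i = np.argmin(np.abs(pf - fr))
    return (pf[i] + fr[i]) / 2.0
```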
\nFor identification, the algorithm estimated the identity of each of the probes in the probe set. We computed the probability of correctly identifying the probes for a set of face spaces parametrized by the number of eigenfeatures. We always use the first n eigenfeatures, thus we are slowly increasing the amount of information, as measured by variance, available to the classifier. Figure 3 shows the probability of identification as a function of representing faces by the first n eigenfeatures. PCA achieves a correct identification rate of 54% and SVM achieves an identification rate of 77-78%. (The PCA results we report are significantly lower than those reported in the literature [2, 3]. This is because we selected a set of images that are more difficult to recognize. The results are consistent with experiments in our group with PCA-based algorithms on the FERET database [4]. We selected this set of images so that the performance of neither the PCA nor the SVM algorithm is saturated.) \n\nFigure 3: Probability of identification as a function of the number of eigenfeatures. \n\n8 Conclusion \n\nWe introduced a new technique for applying SVM to face recognition. We demonstrated the algorithm on both verification and identification applications. We compared the performance of our algorithm to a PCA-based algorithm. For verification, the equal error rate of our algorithm was almost half that of the PCA algorithm, 7% versus 13%. For identification, the error of SVM was half that of PCA, 22-23% versus 46%. This indicates that SVM is making more efficient use of the information in face space than the baseline PCA algorithm. \nOne of the major concerns in practical face recognition applications is the ability of the algorithm to generalize from a training set of faces to faces outside of the training set. We demonstrated the ability of the SVM-based algorithm to generalize by training and testing on separate sets. \nFuture research directions include varying the kernel K, changing the representation space, and expanding the size of the gallery and probe set. There is nothing in our method that is specific to faces, and it should generalize to other biometrics such as fingerprints. \n\nReferences \n\n[1] C. J. C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, (submitted), 1998. \n\n[2] B. Moghaddam and A. Pentland. Face recognition using view-based and modular eigenspaces. In Proc. SPIE Conference on Automatic Systems for the Identification and Inspection of Humans, volume SPIE Vol. 2277, pages 12-21, 1994. \n\n[3] B. Moghaddam, W. Wahid, and A. Pentland. Beyond eigenfaces: probabilistic matching for face recognition. In 3rd International Conference on Automatic Face and Gesture Recognition, pages 30-35, 1998. \n\n[4] H. Moon and P. J. Phillips. Analysis of PCA-based face recognition algorithms. In K. W. Bowyer and P. J. Phillips, editors, Empirical Evaluation Techniques in Computer Vision. IEEE Computer Society Press, Los Alamitos, CA, 1998. \n\n[5] P. J. Phillips, H. Wechsler, J. Huang, and P. Rauss. The FERET database and evaluation procedure for face-recognition algorithms. Image and Vision Computing Journal, 16(5):295-306, 1998. \n\n[6] M. Turk and A. Pentland. Eigenfaces for recognition. J. Cognitive Neuroscience, 3(1):71-86, 1991. \n\n[7] V. Vapnik. The Nature of Statistical Learning Theory. Springer,
New York, 1995.\n", "award": [], "sourceid": 1609, "authors": [{"given_name": "P.", "family_name": "Phillips", "institution": null}]}