{"title": "Insights from Machine Learning Applied to Human Visual Classification", "book": "Advances in Neural Information Processing Systems", "page_first": 905, "page_last": 912, "abstract": "", "full_text": "Insights from Machine Learning Applied to Human Visual Classification

Arnulf B. A. Graf and Felix A. Wichmann
Max Planck Institute for Biological Cybernetics
Spemannstraße 38, 72076 Tübingen, Germany
{arnulf.graf, felix.wichmann}@tuebingen.mpg.de

Abstract

We attempt to understand visual classification in humans using both psychophysical and machine learning techniques. Frontal views of human faces were used for a gender classification task. Human subjects classified the faces, and their gender judgment, reaction time and confidence rating were recorded. Several hyperplane learning algorithms were used on the same classification task, operating on the Principal Components of the texture and shape representation of the faces. The classification performance of the learning algorithms was estimated using the face database with the true gender of the faces as labels, and also with the gender estimated by the subjects. We then correlated the human responses with the distance of the stimuli to the separating hyperplane of the learning algorithms. Our results suggest that human classification can be modeled by some hyperplane algorithms in the feature space we used. For classification, the brain needs more processing for stimuli close to that hyperplane than for those further away.

1 Introduction

The last decade has seen tremendous technological advances in neuroscience, from the microscopic to the macroscopic scale (e.g. from multi-unit recordings to functional magnetic resonance imaging). On an algorithmic level, however, methods for and understanding of brain processes are still limited.
Here we report on a study combining psychophysical and machine learning techniques in order to improve our understanding of human classification of visual stimuli. What algorithms best describe the way the human brain classifies? Might humans use something akin to hyperplanes for classification? If so, is the learning rule as simple as in mean-of-class prototype learners, or are more sophisticated algorithms better candidates?

In our experiments, subjects and machines classified human faces according to gender. The stimuli were presented and we collected the subjects' responses: the estimated gender, reaction time and confidence rating (sec. 2). For every subject, two new datasets were created from the original faces: one with the true labels and one with that subject's labels (true or estimated gender response). We then applied a Principal Component Analysis to a texture and shape representation of the faces. Various algorithms such as Support Vector Machines, Relevance Vector Machines, Prototype and K-means Learners (sec. 3) were applied to this low-dimensional dataset with either the true or the subjects' labels. The resulting classification performances were compared, the corresponding decision hyperplanes were computed, and the distances of the faces to the hyperplanes were correlated with the subjects' responses, the data being pooled across all subjects and stimuli or analysed on a stimulus-by-stimulus basis (sec. 4).

2 Human Classification Behaviour

We used grey-scale frontal views of human faces taken from the MPI face database [1]. Because of technical inhomogeneities of the faces in the database we post-processed each face such that all faces have the same mean intensity, the same pixel-surface area, and are centred [2]. This processing stage is followed by a slight low-pass filtering of each face in the database in order to eliminate, as much as possible, scanning artifacts.
The database is gender-balanced and contains 200 Caucasian faces (see Fig. 1). Twenty-seven human subjects were asked to classify the faces according to their gender, and we recorded three responses: estimated class (i.e. female/male), reaction time (RT) and, after each estimated-class response, a confidence rating (CR) on a scale from 1 (unsure) to 3 (sure).

[Figure 1: Female and male faces from the processed database (left). Eigenvalue spectrum (log eigenvalue vs. index of component $i$) from the PCA of our texture-shape representation (see sec. 4): $\lambda_{min} = 1.01 \cdot 10^3$ (the last eigenvalue, being 0, is not plotted) and $\lambda_{max} = 2.47 \cdot 10^6$ (right).]

The stimuli were presented sequentially to the subjects on a carefully calibrated display using a modified Hanning window (a raised cosine function with a raising time of $t_{transient} = 500$ ms and a plateau time of $t_{steady} = 1000$ ms, for a total presentation time of $t = 2000$ ms per face). Subjects were asked to answer as fast as possible in order to obtain perceptual, rather than cognitive, judgements. Most of the time they responded well before the presentation of the stimulus had ended (mean RT over all stimuli and subjects was approximately 900 ms). All subjects had normal or corrected-to-normal vision and were paid for their participation. Most of them were students from the University of Tübingen, and all of them were naive to the purpose of the experiment.

Analysis of the classification performance of humans is based on signal detection theory [3]: we assume that, on the decision axis, the internal signal and noise distributions are Gaussian with the same unit variance but different means.
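Under this equal-variance Gaussian assumption, discriminability and bias follow from the two correct-response rates via the inverse cumulative normal. A minimal sketch (our own illustration, using the standard-library `statistics.NormalDist`; the input rates are hypothetical):

```python
from statistics import NormalDist

def dprime_and_bias(p_plus, p_minus):
    """Equal-variance signal detection model:
    d' = Z(P+) + Z(P-) and bias eta = Z(P+)^2 - Z(P-)^2,
    where Z is the inverse standard-normal CDF."""
    z = NormalDist().inv_cdf           # Z = Phi^{-1}
    z_plus, z_minus = z(p_plus), z(p_minus)
    return z_plus + z_minus, z_plus ** 2 - z_minus ** 2

# hypothetical correct-response rates for the two classes
d, eta = dprime_and_bias(0.98, 0.85)
```

With these example rates, d' is close to 3 and the bias is positive, i.e. the "+"-class is detected more reliably.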
We define the correct-response probabilities for males ($+$) and females ($-$) as $P_+ = P(\hat y = 1 | y = 1)$ and $P_- = P(\hat y = -1 | y = -1)$, where $\hat y$ is the estimated class and $y$ the true class of the stimulus. The discriminability of the two stimulus classes can then be computed as $d' = Z(P_+) + Z(P_-)$, where $Z = \Phi^{-1}$ and $\Phi$ is the cumulative normal distribution with zero mean and unit variance. Averaged across subjects we obtain $d' = 2.85 \pm 0.73$. This value indicates that the classification task is comparatively easy for the subjects, although not trivial (no ceiling effect). We observe a strong male bias (a large number of females classified as males but very few males classified as females) and express this bias as $\eta = Z^2(P_+) - Z^2(P_-) = 3.14 \pm 2.61$. The subplots of Fig. 2 show the correlations of (a) RT and classification error, (b) classification error and CR, and (c) RT and CR.

[Figure 2: Human classification behaviour: mutual dependencies of the subjects' responses.]

First, RTs are longer for incorrect answers than for correct ones (a). Second, a high CR is correlated with a low classification error (b), and thus subjects have veridical knowledge about the difficulty of individual responses; this is certainly not the case in many low-level psychophysical settings. Third, the RT decreases as the CR increases (c), i.e. stimuli that are easy to classify are also classified rapidly. It may thus be concluded that a high error (or, equivalently, a low CR) implies higher RTs. This may suggest that patterns difficult to classify need more computation, i.e.
longer processing by the brain than patterns that are easy to classify.

3 Machine Learning Classifiers

In the following, various hyperplane classification algorithms are expressed as weighted dual-space learners with different learning rules. Given a dataset $\{\vec x_i, y_i\}_{i=1}^p$, we assume classification is done in the input space, i.e. we consider linear kernels. Moreover, the input space is normalised, since this has proved to be effective for some classifiers [4]. The hyperplanes can be written using a weight (or normal) vector $\vec w$ and an offset $b$, yielding a classification rule $y(\vec x) = \mathrm{sign}(\langle \vec w | \vec x \rangle + b)$ for the first three classifiers, whereas for the last one the decision rule is a collection of hyperplanes. These classifiers are compared on a two-dimensional toy dataset in Fig. 3.

Support Vector Machine (SVM, [5]). The weight vector is given as $\vec w = \sum_i \alpha_i y_i \vec x_i$, where $\vec\alpha$ is obtained by maximising $\sum_i \alpha_i - \frac{1}{2}\sum_{ij} y_i y_j \alpha_i \alpha_j \langle \vec x_i | \vec x_j \rangle$ subject to $\sum_i \alpha_i y_i = 0$ and $0 \le \alpha_i \le C$, where $C$ is a regularisation parameter determined, for instance, using cross-validation. The offset is computed as $b = \langle y_i - \langle \vec w | \vec x_i \rangle \rangle_{\{i \,|\, 0 < \alpha_i < C\}}$.

Relevance Vector Machine (RVM, [6]). The weight vector (incorporating here the offset) is expressed as $\vec w = \sum_i \alpha_i \vec x_i$. A Bernoulli distribution describes $P(\vec y | X, \vec\alpha)$, where $X = \{\vec x_i\}_{i=1}^p$. A hyperparameter $\vec\beta$ is introduced in order to obtain a sparse and smooth solution for $\vec\alpha$, using a Gaussian distribution for $P(\vec\alpha | \vec\beta)$. Learning amounts to maximising $P(\vec y | X, \vec\beta) = \int P(\vec y | X, \vec\alpha)\, P(\vec\alpha | \vec\beta)\, d\vec\alpha$ with respect to $\vec\beta$. Since this integral is not tractable analytically, the Laplace approximation (a local approximation of the integrand by a Gaussian) is used, yielding an iterative update scheme for $\vec\beta$.

Prototype Learner (Prot, [7]).
Defining the prototypes $\vec p_\pm = \frac{\sum_{i=1}^p \vec x_i (y_i \pm 1)}{\sum_{i=1}^p (y_i \pm 1)} = \sum_{\{i | y_i = \pm 1\}} \alpha_i \vec x_i$ as the centre of mass of each class, the weight vector is then expressed as $\vec w = \vec p_+ - \vec p_- = \sum_i \alpha_i y_i \vec x_i$ and the offset as $b = \frac{\|\vec p_-\|^2 - \|\vec p_+\|^2}{2} = -\frac{\langle \vec w \,|\, \sum_i \alpha_i \vec x_i \rangle}{2}$.

K-means Clustering with Nearest-neighbour Learner (Kmean, [8]). Once the $K$ centres of the clusters of each class have been computed using the K-means algorithm, one mean $\vec k_\pm(\vec x) = \sum_i \varphi_i^\pm(\vec x)\, \vec x_i$ per class is selected for a pattern $\vec x$ using the nearest-neighbour rule. The weight is then computed as $\vec w(\vec x) = \vec k_+(\vec x) - \vec k_-(\vec x) = \sum_i (\varphi_i^+(\vec x) - \varphi_i^-(\vec x))\, \vec x_i$, the offset being given by $b(\vec x) = \frac{\|\vec k_-(\vec x)\|^2 - \|\vec k_+(\vec x)\|^2}{2}$. Since the nearest-neighbour rule is applied to each pattern, the decision function is piecewise linear. The appropriate value of $K$ is determined, for instance, using cross-validation.

[Figure 3: Two-dimensional toy example illustrating classification by a SVM, RVM, Prot and Kmean: the lines indicate the separating hyperplanes and the circles show the SVs, RVs, prototypes or means, respectively.]

4 Human Classification Behaviour Revisited by Machine

Each face taken from the MPI database is represented by three vectors: an intensity-standardised texture map, and space-standardised x- and y-flowfields representing the shape. The texture and shape vectors contain the information required to generate a specific face from an "average" reference face by putting each face of the database into correspondence.
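The mean-of-class prototype rule from sec. 3 reduces to a few lines of code. A minimal sketch (our own illustration on made-up toy data, not the authors' code), classifying by the sign of $\langle \vec w | \vec x \rangle + b$ with $\vec w = \vec p_+ - \vec p_-$:

```python
import numpy as np

def prototype_classifier(X, y):
    """Mean-of-class prototype learner: w = p+ - p-,
    b = (||p-||^2 - ||p+||^2) / 2.
    X is (p, d), y has entries in {-1, +1}; returns x -> predicted label."""
    p_plus = X[y == 1].mean(axis=0)    # centre of mass of class +1
    p_minus = X[y == -1].mean(axis=0)  # centre of mass of class -1
    w = p_plus - p_minus
    b = (p_minus @ p_minus - p_plus @ p_plus) / 2.0
    return lambda x: int(np.sign(x @ w + b))

# toy data: two Gaussian blobs around (+2, +2) and (-2, -2)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(+2, 1, (20, 2)), rng.normal(-2, 1, (20, 2))])
y = np.array([1] * 20 + [-1] * 20)
predict = prototype_classifier(X, y)
```

Note that $\mathrm{sign}(\langle \vec w | \vec x \rangle + b)$ with this choice of $b$ is exactly the nearest-prototype rule, since $\langle \vec w | \vec x \rangle + b = (\|\vec x - \vec p_-\|^2 - \|\vec x - \vec p_+\|^2)/2$.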
This format makes intensity and structural information about the faces explicit. For the sake of numerical tractability, especially when using cross-validation methods, the dimensionality of the image vectors has to be reduced to be usable by machine learning algorithms. We use Principal Component Analysis (PCA) to represent the concatenated texture and shape vectors of each face, of size $3 \cdot 256^2$, in only 200 dimensions. In contrast to [9], where PCA is applied only to the intensity (or pixel) information of standard images, the use of PCA on the texture-shape representation forces the learning machines to encode information about local structure and spatial correspondences.

It may be argued that the Principal Components of faces form a biologically-plausible basis for the representation of faces [10], the so-called eigenfaces. Standard PCA on the images themselves may thus be considered a biologically-plausible representation of faces. Given that we use PCA on texture and shape, however, any claim of biological plausibility for our representation is somewhat tenuous.

The variant of PCA considered in this paper seeks to express the eigenvectors as linear combinations of the data vectors [10, 11]. It has the computational advantage over classic PCA that it does not require the computation of correlations between the dimensions of the input, but only between the patterns of the input. For the stimuli considered here, the eigenvalue spectrum shown in Fig. 1 is a monotonically decreasing function with no flat regions. PCA thus seems to be a sensible choice for representing the human face stimuli used in this study (for a comparative study of PCA against Locally Linear Embedding, in which PCA is clearly superior for machine learning purposes, see [2]).

4.1 Classification Performance of Man and Machine

We compare the classification performance of man and machine in plot (a) of Fig. 4.
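The PCA variant described above, working on the $p \times p$ matrix of inner products between patterns rather than the $d \times d$ covariance between dimensions, is advantageous when $d \gg p$ (here $d = 3 \cdot 256^2$ and $p = 200$). A sketch of this standard construction (our own illustration, not the authors' code; the data are random placeholders with a smaller $d$):

```python
import numpy as np

def pca_via_gram(X, n_components):
    """PCA computed from the p x p Gram matrix of the centred patterns,
    instead of the d x d covariance; useful when d >> p.
    X is (p, d); returns (projections, eigenvalues)."""
    Xc = X - X.mean(axis=0)              # centre the patterns
    G = Xc @ Xc.T                        # p x p Gram matrix
    evals, evecs = np.linalg.eigh(G)     # eigh returns ascending order
    order = np.argsort(evals)[::-1][:n_components]
    evals, evecs = evals[order], evecs[:, order]
    # Covariance eigenvectors are linear combinations of the data:
    # v_k = Xc^T u_k / sqrt(lambda_k), so the projections of the data
    # onto v_k are simply u_k * sqrt(lambda_k).
    return evecs * np.sqrt(np.maximum(evals, 0)), evals

# e.g. 200 hypothetical "faces"; d kept small here for the demo
X = np.random.default_rng(1).normal(size=(200, 50))
Z, lam = pca_via_gram(X, 5)
```

Only a 200 x 200 eigenproblem is solved, yet the resulting projections agree (up to sign) with those of classic PCA on the covariance matrix.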
For humans, the classification error on the true dataset is obtained by comparing the estimated gender (class) to the true one. The classification error on the subject dataset, which can be seen as a measure of the mean consistency between subjects, is computed for each subject as the mean classification error the other subjects made on the stimuli presented to that subject, a response being counted as an error whenever another subject responded differently from the considered subject.

[Figure 4: Classification performance of man and machine on the true and subject datasets (a) and correlation of the behaviour of man (classification error, RT and CR) with machine ($|\delta|$) for data pooled across subjects and stimuli (b-d).]

For machines, the classification error is obtained on the dataset with either the true or the subject's labels using a single 10-fold cross-validation for RVM and Prot, and a double 10-fold cross-validation, needed to determine $C$ for SVM and $K$ for Kmean. Since every subject sees a different set of 148 faces chosen randomly from the 200 available, we plot the mean and standard error of the classification errors of man or machine for each dataset.

When classifying the dataset with the true labels, the combination of PCA with Kmean yields a classification performance comparable to that of humans.
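The subject-dataset consistency measure just described can be made concrete. The sketch below (our own formalisation, with made-up response data and the simplification that all subjects see the same stimuli) computes, for each subject, the mean rate at which the other subjects gave a different label:

```python
import numpy as np

def consistency_error(responses):
    """responses: (n_subjects, n_stimuli) array of labels in {-1, +1}.
    Returns, per subject, the mean fraction of responses on which the
    *other* subjects disagreed with that subject."""
    n_subj = responses.shape[0]
    errs = np.empty(n_subj)
    for s in range(n_subj):
        others = np.delete(responses, s, axis=0)      # all other subjects
        errs[s] = (others != responses[s]).mean()     # fraction differing
    return errs

# toy example: 3 subjects, 4 stimuli
R = np.array([[1, 1, -1, -1],
              [1, -1, -1, -1],
              [1, 1, -1, 1]])
e = consistency_error(R)
```

A subject whose labels coincide with the majority gets a low consistency error; an idiosyncratic subject gets a high one.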
The better classification performance of Kmean compared to the simple prototype classifier may be explained by its piecewise linear decision function. The prototype classifier, popular in neuroscience, psychology and philosophy, performs on average worse than humans. Either humans do not classify gender using prototypes in the linear PCA space, or they use prototypes but not the PCA representation, or, of course, they use neither.

An intriguing fact is that SVMs and RVMs perform better than man, which is contrary to what is reported in [5, 12], where human experts and machines were tested on digits from the postal service database USPS. The context of the present study is different, however. Our subjects were presented with human faces from which some high-level features such as hair, beards or glasses had been removed. Such features, however, were most likely used by the subjects when creating their representation of gender-space during their lifetime. The subjects are thus trained on one type of data and tested on another, whereas the machines are trained and tested on the same type of stimuli. This may explain the somewhat disappointing performance of man, relative to machine, on such a biologically-relevant task.

Thus, while humans learn gender classification during their lifetime, it seems that they solve the problem in a manner that is not as optimal, from a statistical point of view, as SVMs or RVMs, but similar to Kmean and better than prototype learners.

Classification on the subject's labels represents the ability of a classifier to learn what we, based on the responses of the subjects, presume to be their internal representation of face-space. The machines have more difficulty learning the dataset with the subject's labels than the one with the true labels.
Given our aim of re-creating the subjects' decision boundaries using artificial classifiers, in order to compare human response patterns to machine learning concepts, this makes SVM and RVM good candidates, Kmean a mediocre one, and the prototype learner a rather poor one for this enterprise using the PCA representation.

4.2 Correlation of the Behaviour of Man with Machine

Here we correlate the classification behaviours of man and machine. The results are summarised in plots (b-d) of Fig. 4 and in Fig. 5, where the parameters are averaged over the subjects as before. This type of data analysis simply correlates the subject's classification error, RT and CR with the distance $|\delta(\vec x_i)| = \frac{|\langle \vec w | \vec x_i \rangle + b|}{\|\vec w\|}$ of the face stimuli to the separating hyperplane (SH) obtained for the four types of hyperplane classifiers (in the case of Kmean this distance is computed for each pattern with respect to the SH constructed from its nearest mean of each class). The hyperplanes are determined using cross-validation (see above) on the dataset with the subject's labels. The distance of a pattern $\vec x$ to the SH is then calculated using the hyperplane computed from the training set corresponding to the test set to which $\vec x$ belongs. Notice that $|\delta|$ reflects the construction rule of the classification hyperplane rather than the generalisation ability of the algorithm. SVMs maximise the distance to the nearest points but not the average distance to all points, which may yield a small value of $|\delta|$. Moreover, the number of SVs, here $\#(SV) = 74 \pm 1$ out of 148 patterns, indicates that most patterns are close to the SH, since classification is done in a space of dimensionality 200.
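The quantity $|\delta|$ is simply a point-to-hyperplane distance; a minimal sketch matching the formula above (our own notation, with a hypothetical 2-D hyperplane):

```python
import numpy as np

def distance_to_hyperplane(X, w, b):
    """|delta(x)| = |<w|x> + b| / ||w||: distance of each pattern
    (row of X) to the separating hyperplane {x : <w|x> + b = 0}."""
    return np.abs(X @ w + b) / np.linalg.norm(w)

# hypothetical hyperplane x + y = 1, i.e. w = (1, 1), b = -1
w, b = np.array([1.0, 1.0]), -1.0
X = np.array([[0.5, 0.5],    # lies on the hyperplane -> distance 0
              [1.0, 1.0]])   # <w|x> + b = 1 -> distance 1/sqrt(2)
d = distance_to_hyperplane(X, w, b)
```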
The number of RVs, $\#(RV) = 9 \pm 0$, is comparatively small, this sparsity being a well-known feature of RVMs.

Looking at Fig. 4 (b-d), where the data are averaged across subjects and stimuli, we observe, first, that the error of the subjects is high where $|\delta|$ is low, suggesting that elements near the SH are more difficult to classify. Second, $|\delta|$ is low for high RTs: the elements near the SH seem to require more processing in the brain, resulting in a higher RT. Third, the high CR for high $|\delta|$ indicates that the subjects are sure of their response when stimuli are far from the SH. Thus elements far from the SH are classified more accurately, faster and with higher confidence than those near to the SH. In order to compare the classifiers, we proceed as follows.

Thus far we have only considered data averaged across all face stimuli. In the following we assess the relation between the distance of each face representation to the SH and the mean, across all subjects, of one of their responses (classification error, RT or CR) for that face. We perform a non-parametric rank correlation analysis using the tied ranks of the subject's response and of $|\delta|$ across the set of 200 faces. Fig. 5 presents the resulting scatter plots for each classifier and each type of response. Qualitatively, it seems that RVMs show the most, and prototype learners the least, correlation between the subject's response and $|\delta|$. In order to compare these behaviours in a more quantitative manner, we indicate in Fig. 5 Spearman's rank correlation coefficient $r$ (the linear correlation between the tied ranks of one variable and the tied ranks of the other) between the parameter of the machine (distance of a face to the SH) and the responses of man (classification error, RT and CR).
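Spearman's coefficient with tied ranks, together with the normal approximation $z = r\sqrt{N-1}$ used for the significance test, can be sketched as follows (our own implementation, not the authors'; the significance helper reports a one-sided tail probability for $|z|$, whereas the paper writes $P = \Phi(z)$):

```python
from statistics import NormalDist

def tied_ranks(a):
    """1-based average ranks; ties share the mean of their rank positions."""
    order = sorted(range(len(a)), key=lambda i: a[i])
    ranks = [0.0] * len(a)
    i = 0
    while i < len(a):
        j = i
        while j + 1 < len(a) and a[order[j + 1]] == a[order[i]]:
            j += 1                      # extend over the tie group
        avg = (i + j) / 2 + 1           # mean of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's r = Pearson correlation of the tied ranks of x and y."""
    rx, ry = tied_ranks(x), tied_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

def significance(r, n):
    """Tail probability under H0 (no correlation): z = r*sqrt(n-1) ~ N(0,1)."""
    return 1.0 - NormalDist().cdf(abs(r) * (n - 1) ** 0.5)

r = spearman([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])   # imperfect monotone relation
```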
Under the null hypothesis of no correlation between man and machine, the variable $z = r\sqrt{N - 1}$ follows a standard normal distribution, $N = 200$ being the number of points in the scatter plots, and the significance of the hypothesis test is computed as $P = \Phi(z)$, where $\Phi$ is the cumulative normal distribution with zero mean and unit variance. In all cases we get $P < 5 \cdot 10^{-4}$, which allows us to reject the null hypothesis with a high degree of confidence.

[Figure 5: Scatter plots relating the subjects' responses (classification error, RT and CR) to the distance $|\delta|$ to the SH for each face in the database, the pooling being done across subjects. Spearman's rank correlation coefficients $r$:

                 SVM           RVM           Prot          Kmean
subject error    -0.60 ± 0.02  -0.65 ± 0.01  -0.29 ± 0.02  -0.39 ± 0.02
RT               -0.69 ± 0.01  -0.71 ± 0.01  -0.35 ± 0.02  -0.45 ± 0.02
CR                0.59 ± 0.02   0.67 ± 0.01   0.24 ± 0.02   0.40 ± 0.02]

From these results it can be seen
that, of the four classifiers, the RVM's distances of the stimuli to the SH correlate best with all of the subjects' responses. RT seems to be the performance measure for which most correlation between man and machine can be asserted, although all performance measures are related, as shown in sec. 2. The prototype algorithm again behaves in the least human-like manner of the four classifiers. The correlation between the classification behaviour of man and machine indicates, for RVMs and to some extent SVMs, that heads far from the SH are more easily processed by humans. It may be concluded that the brain needs to do more processing (higher RT) to classify stimuli close to the decision hyperplane, while stimuli far from it are classified more accurately (low error) and with higher confidence (high CR). Human classification behaviour can thus be modeled by hyperplane algorithms; a piecewise linear decision function as found in Kmean, however, does not seem to be biologically plausible.

5 Conclusions

Our study compared the classification of faces by man and machine. Psychophysically, we noted that a high classification error and a low CR for humans are accompanied by longer processing of information by the brain (a longer RT). Moreover, elements far from the SH are classified more accurately, faster and with higher confidence than those near to the SH. We also find three noteworthy results. First, SVMs and RVMs can learn to classify faces using the subjects' labels, but perform much better when using the true labels. Second, correlating the average response of humans (classification error, RT or CR) with the distance to the SH on a face-by-face basis using Spearman's rank correlation coefficient shows that RVMs recreate human performance most closely in every respect.
Third, the mean-of-class prototype, its popularity in neuroscience notwithstanding, is the least human-like classifier in all cases examined.

Obviously our results rely on a number of crucial assumptions: first, all measurements were done in a linear space; second, the conclusions are only valid given the PCA representation (pre-processing). Third, when rejecting the prototype learner as a plausible candidate for human classification, we assume the representativeness of our face space: we assume that the mean face of our human subjects is close to the sample mean of our database. Clearly, a larger face database would be welcome, but obtaining one is not trivial, as we need texture maps and the corresponding shapes. Finally, there is the different learning regime: machines were trained on the dataset proper, whereas humans were assumed to have extracted the relevant information during their lifetime, and they were tested on faces with some cues removed. However, the representation we used does allow the genders to be separated well, as shown by the SVM classification performance on the true labels. As a first attempt to extend the neuroscience community's toolbox with machine learning methods, we believe to have shown the fruitfulness of this approach.

Acknowledgements

The authors would like to thank Volker Blanz for providing the face database and the flowfield algorithms. In addition we are grateful to Gökhan Bakır, Heinrich Bülthoff, Jez Hill, Carl Rasmussen, Gunnar Rätsch, Bernhard Schölkopf and Vladimir Vapnik for helpful comments and suggestions. AG was supported by a grant from the European Union (IST 2000-29375 COGVIS).

References

[1] V. Blanz and T. Vetter. A Morphable Model for the Synthesis of 3D Faces. Proc. Siggraph99, pp. 187-194. Los Angeles: ACM Press, 1999.

[2] A. B. A. Graf and F. A. Wichmann. Gender Classification of Human Faces.
Proceedings of the BMCV, Springer LNCS 2525, 491-501, 2002.

[3] T. D. Wickens. Elementary Signal Detection Theory. Oxford University Press, 2002.

[4] A. B. A. Graf, A. J. Smola, and S. Borer. Classification in a Normalized Feature Space using Support Vector Machines. IEEE Transactions on Neural Networks 14(3), 597-605, 2003.

[5] V. N. Vapnik. The Nature of Statistical Learning Theory. Springer, 1995.

[6] M. E. Tipping. Sparse Bayesian Learning and the Relevance Vector Machine. Journal of Machine Learning Research 1, 211-244, 2001.

[7] S. K. Reed. Pattern Recognition and Categorization. Cognitive Psychology 3, 382-407, 1972.

[8] R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. John Wiley & Sons, 2001.

[9] L. Sirovich and M. Kirby. Low-Dimensional Procedure for the Characterization of Human Faces. Journal of the Optical Society of America A, 4(3), 519-524, 1987.

[10] M. Turk and A. Pentland. Eigenfaces for Recognition. Journal of Cognitive Neuroscience, 3(1), 1991.

[11] B. Schölkopf, A. Smola, and K.-R. Müller. Nonlinear Component Analysis as a Kernel Eigenvalue Problem. Neural Computation, 10, 1299-1319, 1998.

[12] J. Bromley and E. Säckinger. Neural-network and K-nearest-neighbor Classifiers. Technical Report 11359-910819-16TM, AT&T, 1991.
", "award": [], "sourceid": 2484, "authors": [{"given_name": "Felix A.", "family_name": "Wichmann", "institution": null}, {"given_name": "Arnulf", "family_name": "Graf", "institution": null}]}