{"title": "Machine Learning Applied to Perception: Decision Images for Gender Classification", "book": "Advances in Neural Information Processing Systems", "page_first": 1489, "page_last": 1496, "abstract": null, "full_text": "Machine Learning Applied to Perception:\nDecision-Images for Gender Classi\ufb01cation\n\nFelix A. Wichmann and Arnulf B. A. Graf\n\nMax Planck Institute for Biological Cybernetics\n\nT\u00fcbingen, Germany\n\nfelix.wichmann@tuebingen.mpg.de\n\nEero P. Simoncelli\n\nHoward Hughes Medical Institute\n\nCenter for Neural Science\nNew York University, USA\n\nHeinrich H. B\u00fclthoff and Bernhard Sch\u00f6lkopf\nMax Planck Institute for Biological Cybernetics\n\nT\u00fcbingen, Germany\n\nAbstract\n\nWe study gender discrimination of human faces using a combination\nof psychophysical classi\ufb01cation and discrimination experiments together\nwith methods from machine learning. We reduce the dimensionality of\na set of face images using principal component analysis, and then train a\nset of linear classi\ufb01ers on this reduced representation (linear support vector\nmachines (SVMs), relevance vector machines (RVMs), Fisher linear\ndiscriminant (FLD), and prototype (prot) classi\ufb01ers) using human classi\ufb01cation\ndata. Because we combine a linear preprocessor with linear\nclassi\ufb01ers, the entire system acts as a linear classi\ufb01er, allowing us to visualise\nthe decision-image corresponding to the normal vector of the separating\nhyperplanes (SH) of each classi\ufb01er. We predict that the female-to-\nmaleness transition along the normal vector for classi\ufb01ers closely mimicking\nhuman classi\ufb01cation (SVM and RVM [1]) should be faster than\nthe transition along any other direction. 
A psychophysical discrimination\nexperiment using the decision images as stimuli is consistent with\nthis prediction.\n\n1 Introduction\n\nOne of the central problems in vision science is to identify the features used by human\nsubjects to classify visual stimuli. We combine machine learning and psychophysical techniques\nto gain insight into the algorithms used by human subjects during visual classi\ufb01cation\nof faces. Comparing gender classi\ufb01cation performance of humans to that of machines\nhas attracted considerable attention in the past [2, 3, 4, 5]. The main novel aspect of our\nstudy is to analyse the machine algorithms to make inferences about the features used by\nhuman subjects, thus providing an alternative to psychophysical feature extraction techniques\nsuch as the \u201cbubbles\u201d [6] or the noise classi\ufb01cation image [7] techniques. In this\n\u201cmachine-learning-psychophysics research\u201d we \ufb01rst train machine learning classi\ufb01ers\non the responses (labels) of human subjects to re-create the human decision boundaries by\nlearning machines. Then we look for correlations between machine classi\ufb01ers and several\ncharacteristics of subjects\u2019 responses to the stimuli\u2014proportion correct, reaction times\n(RT) and con\ufb01dence ratings. Ideally this allows us to \ufb01nd preprocessor-classi\ufb01er pairings\nthat are closely aligned with the algorithm employed by the human brain for the task at\nhand. Thereafter we analyse properties of the machine closest to the human\u2014in our case\nsupport vector machines (SVMs), and to a slightly lesser degree, relevance vector machines\n(RVMs)\u2014and make predictions about human behaviour based on machine properties.\n\nIn the current study we \ufb01rst extract a decision-image containing the information relevant for\nclassi\ufb01cation by the machine classi\ufb01ers. 
The decision-image $\\vec{W}$ is the image corresponding\nto a vector $\\vec{w}$ orthogonal to the SH of the classi\ufb01er. The decision-image has the same\ndimensionality as the (input-) images\u2014in our case 256 \u00d7 256\u2014whereas the normal vector\nlives in the (reduced dimensionality) space after preprocessing\u2014in our case in 200 \u00d7 1\nafter Principal Component Analysis (PCA). Second, we use $\\vec{w}$ of the classi\ufb01ers to generate\nnovel stimuli by adding (or subtracting) various \u201camounts\u201d ($\\lambda \\vec{w}$) to a genderless face in\nPCA space. The novel stimulus images $I(\\lambda)$ are generated as $I(\\lambda) = \\mathrm{PCA}^{-1}\\left(\\lambda \\frac{\\vec{w}}{\\|\\vec{w}\\|}\\right)$. We\npredict that the female-to-maleness transition along the vectors normal to the SHs, $\\vec{w}_{\\mathrm{SVM}}$\nand $\\vec{w}_{\\mathrm{RVM}}$, should be signi\ufb01cantly faster than those along the normal vectors of machine\nclassi\ufb01ers that do not correlate as well with human subjects. A psychophysical gender\ndiscrimination experiment con\ufb01rms our predictions: the female-to-maleness axes of the\nSVM and, to a smaller extent, RVM, are more closely aligned with the human female-to-\nmaleness axis than those of the prototype (Prot) and a Fisher linear discriminant (FLD)\nclassi\ufb01er.\n\n2 Preprocessing and Machine Learning Methods\n\nWe preprocessed the faces using PCA. PCA is a good preprocessor in the current context\nfor two reasons. First, we have previously shown that in PCA-space strong correlations exist between\nman and machine [1]. Second, there is evidence that the PCA representation may be\nbiologically plausible [8]. The face stimuli were taken from the gender-balanced Max\nPlanck Institute (MPI) face database (footnote 1) composed of 200 greyscale 256 \u00d7 256-pixel frontal\nviews of human faces, yielding a data matrix $X \\in \\mathbb{R}^{200 \\times 256^2}$. For the gender discrimination\ntask we adhere to the following convention for the class labels: $y = -1$ for females\nand $y = +1$ for males. 
We consider no dimensionality reduction and keep all 200 components\nof the PCA. This implies that the reconstruction of the data from the PCA\nis perfect and we can write: $E = \\bar{X} B^T \\Leftrightarrow \\bar{X} = E B$, where $E \\in \\mathbb{R}^{200 \\times 200}$ is the matrix\nof the encodings (each row is a PCA vector in the space of reduced dimensionality),\n$B \\in \\mathbb{R}^{200 \\times 256^2}$ is the orthogonal basis matrix and $\\bar{X}$ the centered data matrix. The combination\nof the encoding matrix $E$ with the true class labels $y$ of the MPI database yields\nthe true dataset, whereas its combination with the class labels $y_{\\mathrm{est}}$ estimated by the subjects yields\nthe subject dataset.\n\nTo model classi\ufb01cation in human subjects we use methods from supervised machine learning.\nIn particular, we consider linear classi\ufb01ers where classi\ufb01cation is done using a SH\nde\ufb01ned by its normal vector $\\vec{w}$ and offset $b$. Furthermore the normal vector $\\vec{w}$ of our\nclassi\ufb01ers can then be written as a linear combination of the input patterns $\\vec{x}_i$ with suitable\ncoef\ufb01cients $\\alpha_i$ as $\\vec{w} = \\sum_i \\alpha_i \\vec{x}_i$. We de\ufb01ne the distance of a pattern to the SH as\n$\\delta(\\vec{x}) = \\frac{\\langle \\vec{w} | \\vec{x} \\rangle + b}{\\|\\vec{w}\\|}$. Note that in our experiments the $\\vec{x}_i$ are the PCA coef\ufb01cients of the images,\nthat is $\\vec{x}_i \\in \\mathbb{R}^{200}$, whereas the images themselves are in $\\mathbb{R}^{256^2}$. For the subject dataset\nwe chose the mean values of $\\vec{w}$, $b$ and $\\vec{w}_\\pm$ over all subjects.\n\n1 The MPI face database is located at http://faces.kyb.tuebingen.mpg.de\n\n2.1 Machine Classi\ufb01ers\nThe Support Vector Machine (SVM, [9, 10]) is a state-of-the-art maximum margin algorithm\nbased on statistical learning theory. SVMs have an intuitive geometrical interpretation:\nthey classify by maximizing the margin separating both classes while minimizing\nthe classi\ufb01cation error.\n\nThe Relevance Vector Machine (RVM, [11]) is a probabilistic Bayesian classi\ufb01er. 
It optimises\nthe expansion coef\ufb01cients of a SV-style decision function using a hyperprior which\nfavours sparse solutions.\n\nCommon classi\ufb01ers in neuroscience, cognitive science and psychology are variants of the\nPrototype classi\ufb01er (Prot, [12]). Their popularity is due to their simplicity: they classify\naccording to the nearest mean-of-class prototype; in the simplest form all dimensions are\nweighted equally but variants exist that weight the dimensions inversely proportional to the\nclass variance along the dimensions. As we cannot estimate class variance along all 200\ndimensions from only 200 stimuli, we chose to implement the simplest Prot with equal\nweight along all dimensions.\n\nThe Fisher linear discriminant classi\ufb01er (FLD, [13]) \ufb01nds a direction in the dataset which\nallows best linear separation of the two classes. This direction is then used as the normal\nvector of the separating hyperplane. In fact, FLD is arguably a more principled whitened\nvariant of the Prot classi\ufb01er: its weight vector can be written as $\\vec{w} = S_W^{-1}(\\vec{\\mu}_+ - \\vec{\\mu}_-)$, where\n$S_W$ is the within-class covariance matrix of the two classes, and $\\vec{\\mu}_\\pm$ are the class means.\nConsequently, if we disregard the constant offset $b$, we can write the decision function as\n$\\langle \\vec{w} | \\vec{x} \\rangle = \\langle S_W^{-1}(\\vec{\\mu}_+ - \\vec{\\mu}_-) | \\vec{x} \\rangle = \\langle S_W^{-1/2}(\\vec{\\mu}_+ - \\vec{\\mu}_-) | S_W^{-1/2} \\vec{x} \\rangle$, which is a prototype classi\ufb01er\nusing the prototypes $\\vec{\\mu}_\\pm$ after whitening the space with $S_W^{-1/2}$.\n\n2.2 Decision-Images and Generalised Portraits\nWe combine the linear preprocessor (PCA) $\\bar{X} = E B$ and the linear classi\ufb01er (SVM, RVM,\nProt, FLD) $y(\\vec{x}) = \\langle \\vec{w} | \\vec{x} \\rangle + b$ to yield a linear classi\ufb01cation system: $\\vec{y} = \\vec{w}^T E^T + \\vec{b}$ where\n$\\vec{b} = b\\vec{1}$. We de\ufb01ne the decision-image as the vector $\\vec{W}$ effectively used for classi\ufb01cation as:\n$\\vec{y} = \\vec{W}^T \\bar{X}^T + \\vec{b}$. 
We then have $\\vec{w}^T E^T = \\vec{W}^T \\bar{X}^T \\Leftrightarrow \\vec{w}^T B^{-T} \\bar{X}^T = \\vec{W}^T \\bar{X}^T$ where $B^{-1}$\nis the pseudo-inverse of $B$. From the last condition, we obtain a de\ufb01nition of the decision-\nimage $\\vec{W} = B^{-1} \\vec{w} \\in \\mathbb{R}^{256^2}$. In the case of PCA where $B^{-1} = B^T$, we simply have\n$\\vec{W} = B^T \\vec{w}$.\n\nFigure 1 shows the decision-images $\\vec{W}$ for the four classi\ufb01ers, SVM, RVM, Prot and FLD.\nThe decision-images in the \ufb01rst row are those obtained if the classi\ufb01ers are trained on the\ntrue dataset; those in the second row if trained on the subject dataset, marked on the right\nhand side of the \ufb01gure by \u201ctrue data\u201d and \u201csubj data\u201d, respectively. Decision-images are\nrepresented by a vector pointing to the positive class and can thus be expected to have male\nattributes (the negative of it looks female). Both dark and light regions are more important\nfor classi\ufb01cation than the grey regions. Inspection of the decision-images is instructive. For\nthe prototype learner, the eye and beard regions are most important. SVM, RVM and FLD\nhave somewhat more \u201cholistic\u201d decision-images. Equally instructive is the comparison of\nthe optimal decision-images of the machine classi\ufb01ers in row one (0 to 1% classi\ufb01cation\nerror for SVM, RVM and FLD) and those trained on the subject labels in row two (the\naverage subject error is 16% when classifying the faces; the machines attempt to re-create\nthe decision boundaries of the subjects and thus show similar mis-classi\ufb01cation errors).\nThe decision-images for the subject dataset are slightly more \u201cface-like\u201d and less holistic\nthan those obtained using the true labels; the eye and mouth regions are more strongly\nemphasised. This trend holds across all classi\ufb01ers. 
This suggests that human subjects base\ntheir gender classi\ufb01cation strongly on the eye and mouth regions of the face\u2014clearly a\nsub-optimal strategy as revealed by the more holistic true dataset SVM, RVM and FLD\ndecision-images.\n\nA decision-image thus represents a way to extract the visual cues and features used by human\nsubjects during visual classi\ufb01cation without using a priori assumptions or knowledge\nabout the task at hand.\n\nFigure 1: Decision-images $\\vec{W}$ for each classi\ufb01er (SVM, RVM, Prot, FLD) for both the true and the subject dataset; all\nimages are rescaled to [0, 1] and their means set to 128 for illustration purposes (different\nscalers for different images).\n\nWe can also de\ufb01ne generalised portraits (footnote 2) $\\vec{W}_\\pm$. The generalised portraits $\\vec{W}_\\pm$ can be\nseen as \u201csummary\u201d faces in each class re\ufb02ecting the decision rule of the classi\ufb01er. They\ncan be viewed as an extension of the concept of a prototype: they are the prototype\nof the faces the classi\ufb01er bases its decision on. We note that $\\vec{w}$ can be written as:\n$\\vec{w} = \\sum_i \\alpha_i \\vec{x}_i = \\sum_{i | \\mathrm{sign}(\\alpha_i) = +1} \\alpha_i \\vec{x}_i - \\sum_{i | \\mathrm{sign}(\\alpha_i) = -1} |\\alpha_i| \\vec{x}_i$. This allows us to de\ufb01ne\nthe generalised portraits $\\vec{W}_\\pm$, which are computed by inverting the PCA transformation\non the patterns $\\vec{w}_\\pm = \\frac{\\sum_{i | \\mathrm{sign}(\\alpha_i) = \\pm 1} \\alpha_i \\vec{x}_i}{\\sum_{i | \\mathrm{sign}(\\alpha_i) = \\pm 1} \\alpha_i}$. The vector $\\vec{w}_\\pm$ is constrained to be in the convex\nhull of the respective data in order to yield a \u201cviewable\u201d portrait. The generalised portraits\nfor the SVM, RVM and FLD together with the Prot, where the prototype is the same\nas the generalised portrait, are shown in \ufb01gure 2.\n\n2 This term was introduced by [14] with the idea in mind that when trained on a set of portraits of\nmembers of a family, one would obtain a \u201cgeneralized\u201d portrait which captures the essential features\nof the family as a superposition of all family members.\n\nThe generalised portraits can be associated with the correct class: $\\vec{W}_+$ are males whereas\n$\\vec{W}_-$ are females. The SVM and the FLD use patterns close to the SH for classi\ufb01cation\nand hence their decision-images appear androgynous, whereas Prot and RVM tend to use\npatterns distant from the SH resulting in more female and male generalised portraits. Comparison\nof the optimal, true, generalised portraits to those based on the subject labels shows\nthat classi\ufb01cation has become more dif\ufb01cult: generalised portraits have moved closer to\neach other in gender space, narrowing the distance between the classes and thereby diminishing\nthe gender typicality of the generalised portraits for all classi\ufb01ers.\n\nFigure 2: Generalised portraits $\\vec{W}_\\pm$ for each classi\ufb01er (SVM, RVM, Prot, FLD) for both the true and the subject\ndataset; all images are rescaled to [0, 1] and their means set to 128 for illustration purposes\n(different scalers for different images). [Unfortunately the downsampling (low-pass \ufb01ltering)\nof the faces necessary to \ufb01t them in the \ufb01gure makes all the faces somewhat more\nandrogynous than they are viewed at full resolution.]\n\n3 Human Gender Discrimination along the Decision-Image Axes\n\nThe decision-images introduced in section 2.2 are based purely on machine learning, albeit\non labels provided by human subjects in the case of the subject dataset. Our previous paper\n[1] reported that the subjects\u2019 responses to the faces\u2014proportion correct, reaction times\n(RT) and con\ufb01dence ratings\u2014correlated very well with the distance of the stimuli to their\nseparating hyperplane (SH) for support and relevance vector machines (SVMs, RVMs) but\nnot for the simple prototype (Prot) classi\ufb01er. If these correlations really implied that SVM\nand RVM capture some crucial aspects of human internal face representation, the following\nprediction must hold: already for small $|\\lambda|$, $I_{\\mathrm{SVM}}(\\lambda)$ and $I_{\\mathrm{RVM}}(\\lambda)$ should look male/female\nwhereas $I_{\\mathrm{Prot}}(\\lambda)$ and $I_{\\mathrm{FLD}}(\\lambda)$ should only be perceptually male/female for larger $|\\lambda|$.\nIn other words: the female-to-maleness axis of SVM and RVM should be closely aligned\nto those of our subjects whereas that is not expected to be the case for FLD and Prot.\n\n3.1 Psychophysical Methods\nFour observers\u2014one of the authors (FAW) with extensive psychophysical training and\nthree na\u00efve subjects paid for their participation\u2014took part in a standard, spatial (left versus\nright) two-alternative forced-choice (2AFC) discrimination experiment. Subjects were\npresented with two faces $I(-\\lambda)$ and $I(\\lambda)$ and had to indicate which face looked more\nmale. Stimuli were presented against the mean luminance (50 cd/m$^2$) of a carefully linearised\nClinton Monoray CRT driven by a Cambridge Research Systems VSG 2/5 display\ncontroller. Neither male nor female faces changed the mean luminance. Subjects viewed\nthe screen binocularly with their head stabilised by a headrest. The temporal envelope of\nstimulus presentation was a modi\ufb01ed Hanning window (a raised cosine function with rise\nand fall times of 500 ms and a plateau time of 1000 ms). 
The probability of the female\nface being presented on the left was 0.5 on each trial and observers indicated whether they\nthought the left or right face was female by touching the corresponding location on an Elo\nTouchSystems touch-screen immediately in front of the display; no feedback was provided.\n\nTrials were run in blocks of 256 in which eight repetitions of eight stimulus levels\n($\\pm\\lambda_1 \\ldots \\pm\\lambda_8$) for each of the four classi\ufb01ers were randomly intermixed. The na\u00efve subjects\nrequired approximately 2000 trials before their performance stabilised; thereafter they\ndid another \ufb01ve to six blocks of 256 trials. All results presented below are based on the\ntrials after training; all training trials were discarded.\n\nFigure 3: a. Raw data and \ufb01tted psychometric functions for one observer (FAW): proportion\ncorrect gender identi\ufb01cation plotted against the length of the normalised decision-image vector\n$\\lambda \\vec{W} / \\|\\vec{W}\\|$ for the SVM, RVM, Prot and FLD.\nb\u2013e. For each of four observers (FAW, FJ, HM, KT) the threshold elevation for the RVM, Prot and FLD\ndecision-image relative to that of the SVM; results are shown for both 75 and 90% correct\ntogether with 68%-CIs.\nf. Same as in b\u2013e but pooled across observers.\n\n3.2 Results and Discussion\nFigure 3a shows the raw data and \ufb01tted psychometric functions for one of the observers.\nProportion correct gender identi\ufb01cation on the y-axis is plotted against $\\lambda$ on the x-axis\non semi-logarithmic coordinates. Psychometric functions were \ufb01tted using the psigni\ufb01t\ntoolbox for Matlab which implements the constrained maximum-likelihood method described\nin [15]. 68%-con\ufb01dence intervals (CIs), indicated by horizontal lines at 75 and\n90% correct in \ufb01gure 3a, were estimated by the BCa bootstrap method also implemented\nin psigni\ufb01t [16]. The raw data appear noisy because each data point is based on only eight\ntrials. However, none of the \ufb01tted psychometric functions failed various Monte Carlo based\ngoodness-of-\ufb01t tests [15].\n\nTo summarise the data we extracted the $\\lambda$ required for two performance levels\n(\u201cthresholds\u201d), 75 and 90% correct, together with their corresponding 68%-CIs. Figure 3b\u2013e\nshows the thresholds for all four observers normalised by $\\lambda_{\\mathrm{SVM}}$ (the \u201cthreshold elevation\u201d\nre. SVM). 
Thus values larger than 1.0 for RVM, Prot and FLD indicate that more of the\ncorresponding decision-images had to be added for the human observers to be able to discriminate\nfemales from males. In \ufb01gure 3f we pool the data across observers as the main\ntrend, poorer performance for Prot and FLD compared to SVM and RVM, is apparent for\nall four observers. The difference between SVM and RVM is small; going along the direction\nof both Prot and FLD, however, results in a much \u201cslower\u201d transition from female-to-\nmaleness.\n\nThe psychophysical data are very clear: all observers require a larger $\\lambda$ for Prot and FLD;\nthe length ratio ranges from 1.2 to nearly 3.0, and averages to around 1.7 across observers.\nIn the pooled data all the differences are statistically signi\ufb01cant but even at the individual\nsubject level all differences are signi\ufb01cant at the 90% performance level, and \ufb01ve of eight\nare signi\ufb01cant at the 75% performance level. It thus appears that SVM and RVM capture\nmore of the psychological face-space of our human observers than Prot and FLD. From\nour results we cannot exclude the possibility that some other direction might have yielded\neven steeper psychometric functions, i.e. faster female-to-maleness transitions, but we can\nconclude that the decision-images of SVM and RVM are closer to the decision-images\nused by human subjects than those of Prot and FLD. This is exactly as predicted by the\ncorrelations between proportion correct, RTs and con\ufb01dence ratings versus distance to the\nhyperplane reported in [1]\u2014high correlations for SVM and RVM, low correlations for Prot.\n\n4 Summary and Conclusions\n\nWe studied classi\ufb01cation and discrimination of human faces both psychophysically and\nusing methods from machine learning. 
The combination of linear preprocessor (PCA)\nand classi\ufb01er (SVM, RVM, Prot and FLD) allowed us to visualise the decision-images of\na classi\ufb01er corresponding to the vector normal to the SH of the classi\ufb01er. Decision-images\ncan be used to determine the regions of the stimuli most useful for classi\ufb01cation simply\nby analysing the distribution of light and dark regions in the decision-image. In addition\nwe de\ufb01ned the generalised portraits to be the prototypes of all faces used by the classi\ufb01er\nto obtain its classi\ufb01cation. For the SVM this is the weighted average of all the support\nvectors (SVs), for the RVM the weighted average of all the relevance vectors (RVs), and\nfor the Prot it is the prototype itself. The generalised portraits are, like the decision-images,\nanother useful visualisation of the categorisation algorithm of the machine classi\ufb01er.\n\nHowever, the central result of our paper is the corroboration of the machine-learning-\npsychophysics research methodology. In the machine-learning-psychophysics research we\nsubstitute a very hard to analyse complex system (the human brain) by a reasonably complex\nsystem (learning machine) that is complex enough to capture essentials of our human\nsubjects\u2019 behaviour but is nonetheless amenable to close analysis. From the analysis of\nthe machines we then derive predictions for human subjects which we subsequently test\npsychophysically.\n\nGiven the success in predicting the steepness of the female-to-male transition of the $\\vec{w}_{\\mathrm{SVM}}$-axis\nwe believe that the decision-image $\\vec{W}_{\\mathrm{SVM}}$ captures some of the essential characteristics\nof the human decision algorithm.\n\nAcknowledgements The authors would like to thank Bruce Henning, Frank J\u00e4kel, Ulrike\nvon Luxburg and Christian Wallraven for helpful comments and suggestions. In addition\nwe thank Frank J\u00e4kel for supplying us with the code to run the touch-screen experiment.\n\nReferences\n[1] A.B.A. 
Graf and F.A. Wichmann. Insights from machine learning applied to human visual\nclassi\ufb01cation. In Advances in Neural Information Processing Systems 16. MIT Press, 2004.\n\n[2] M.S. Gray, D.T. Lawrence, B.A. Golomb, and T.S. Sejnowski. A perceptron reveals the face of\nsex. Neural Computation, 7(6):1160\u20131164, 1995.\n\n[3] P.J.B. Hancock, V. Bruce, and A.M. Burton. A comparison of two computer-based face recognition\nsystems with human perceptions of faces. Vision Research, 38:2277\u20132288, 1998.\n\n[4] A.J. O\u2019Toole, P.J. Phillips, Y. Cheng, B. Ross, and H.A. Wild. Face recognition algorithms as\nmodels of human face processing. In Proceedings of the 4th IEEE International Conference on\nAutomatic Face and Gesture Recognition, 2000.\n\n[5] B. Moghaddam and M.-H. Yang. Learning gender with support faces. IEEE Transactions on\nPattern Analysis and Machine Intelligence, 24(5):707\u2013711, 2002.\n\n[6] F. Gosselin and P.G. Schyns. Bubbles: a technique to reveal the use of information in recognition\ntasks. Vision Research, 41:2261\u20132271, 2001.\n\n[7] A.J. Ahumada Jr. Classi\ufb01cation image weights and internal noise level estimation. Journal of\nVision, 2:121\u2013131, 2002.\n\n[8] M. Turk and A. Pentland. Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3(1),\n1991.\n\n[9] V.N. Vapnik. The Nature of Statistical Learning Theory. Springer, second edition, 2000.\n\n[10] B. Sch\u00f6lkopf and A.J. Smola. Learning with Kernels. MIT Press, 2002.\n\n[11] M.E. Tipping. Sparse Bayesian learning and the relevance vector machine. Journal of Machine\nLearning Research, 1:211\u2013214, 2001.\n\n[12] S.K. Reed. Pattern recognition and categorization. Cognitive Psychology, 3:382\u2013407, 1972.\n\n[13] R. A. Fisher. The use of multiple measurements in taxonomic problems. Annals of Eugenics,\n7(2):179\u2013188, 1936.\n\n[14] V. Vapnik and A. Lerner. Pattern recognition using generalized portrait method. 
Automation\nand Remote Control, 24:774\u2013780, 1963.\n\n[15] F.A. Wichmann and N.J. Hill. The psychometric function: I. \ufb01tting, sampling and goodness-of-\ufb01t.\nPerception and Psychophysics, 63(8):1293\u20131313, 2001.\n\n[16] F.A. Wichmann and N.J. Hill. The psychometric function: II. bootstrap-based con\ufb01dence intervals\nand sampling. Perception and Psychophysics, 63(8):1314\u20131329, 2001.\n", "award": [], "sourceid": 2543, "authors": [{"given_name": "Felix A.", "family_name": "Wichmann", "institution": null}, {"given_name": "Arnulf", "family_name": "Graf", "institution": null}, {"given_name": "Heinrich", "family_name": "B\u00fclthoff", "institution": null}, {"given_name": "Eero", "family_name": "Simoncelli", "institution": null}, {"given_name": "Bernhard", "family_name": "Sch\u00f6lkopf", "institution": null}]}