{"title": "Facial Memory Is Kernel Density Estimation (Almost)", "book": "Advances in Neural Information Processing Systems", "page_first": 24, "page_last": 30, "abstract": null, "full_text": "Facial Memory is Kernel Density Estimation (Almost) \n\nMatthew N. Dailey, Garrison W. Cottrell \nDepartment of Computer Science and Engineering \nU.C. San Diego \nLa Jolla, CA 92093-0114 \n{mdailey,gary}@cs.ucsd.edu \n\nThomas A. Busey \nDepartment of Psychology \nIndiana University \nBloomington, IN 47405 \nbusey@indiana.edu \n\nAbstract \n\nWe compare the ability of three exemplar-based memory models, each using three different face stimulus representations, to account for the probability a human subject responded \"old\" in an old/new facial memory experiment. The models are 1) the Generalized Context Model, 2) SimSample, a probabilistic sampling model, and 3) MMOM, a novel model related to kernel density estimation that explicitly encodes stimulus distinctiveness. The representations are 1) positions of stimuli in MDS \"face space,\" 2) projections of test faces onto the \"eigenfaces\" of the study set, and 3) a representation based on response to a grid of Gabor filter jets. Of the 9 model/representation combinations, only the distinctiveness model in MDS space predicts the observed \"morph familiarity inversion\" effect, in which the subjects' false alarm rate for morphs between similar faces is higher than their hit rate for many of the studied faces. This evidence is consistent with the hypothesis that human memory for faces is a kernel density estimation task, with the caveat that distinctive faces require larger kernels than do typical faces. 
\n\n1 Background \n\nStudying the errors subjects make during face recognition memory tasks aids our understanding of the mechanisms and representations underlying memory, face processing, and visual perception. One way of evoking such errors is by testing subjects' recognition of new faces created from studied faces that have been combined in some way (e.g. Solso and McCarthy, 1981; Reinitz, Lammers, and Cochran, 1992). Busey and Tunnicliff (submitted) have recently examined the extent to which image-quality morphs between unfamiliar faces affect subjects' tendency to make recognition errors. \n\nFigure 1: Three normalized morphs from the database. \n\nTheir experiments used facial images of bald males and morphs between these images (see Figure 1) as stimuli. In one study, Busey (in press) had subjects rate the similarity of all pairs in a large set of faces and morphs, then performed a multidimensional scaling (MDS) of these similarity ratings to derive a 6-dimensional \"face space\" (Valentine and Endo, 1992). In another study, \"Experiment 3\" (Busey and Tunnicliff, submitted), 179 subjects studied 68 facial images, including 8 similar pairs and 8 dissimilar pairs, as determined in a pilot study. These pairs were included in order to study how morphs between similar faces and dissimilar faces evoke false alarms. 
We call the pair of images from which a morph is derived its \"parents,\" and the morph itself their \"child.\" In the experiment's test phase, the subjects were asked to make new/old judgments in response to 8 of the 16 morphs, 20 completely new distractor faces, the 36 non-parent targets and one of the parents of each of the 8 morphs. The results were that, for many of the morph/parent pairs, subjects responded \"old\" to the unstudied morph more often than to its studied parent. However, this effect (a morph familiarity inversion) only occurred for the morphs with similar parents. It seems that the similar parents are so similar to their \"child\" morphs that they both contribute toward an \"old\" (false alarm) response to the morph. \n\nResearchers have proposed many models to account for data from explicit memory experiments. Although we have applied other types of models to Busey and Tunnicliff's data with largely negative results (Dailey et al., 1998), in this paper we limit discussion to exemplar-based models, such as the Generalized Context Model (Nosofsky, 1986) and SAM (Gillund and Shiffrin, 1984). These models rely on the assumption that subjects explicitly store representations of each of the stimuli they study. Busey and Tunnicliff applied several exemplar-based models to the Experiment 3 data, but none of these models has been able to fully account for the observed similar morph familiarity inversion without positing that the similar parents are explicitly blended in memory, producing prototypes near the morphs. \n\nWe extend Busey and Tunnicliff's (submitted) work by applying two of their exemplar models to additional image-based face stimulus representations, and we propose a novel exemplar model that accounts for the similar morphs' familiarity inversion. 
The results are consistent with the hypothesis that facial memory is a kernel density estimation (Bishop, 1995) task, except that distinctive exemplars require larger kernels. Also, on the basis of our model, we can predict that distinctiveness with respect to the study set is the critical factor influencing kernel size, as opposed to a context-free notion of distinctiveness. We can easily test this prediction empirically. \n\n2 Experimental Methods \n\n2.1 Face Stimuli and Normalization \n\nThe original images were 104 digitized 560x662 grayscale images of bald men, with consistent lighting and background and fairly consistent position. The subjects varied in race and extent of facial hair. We automatically located the left and right eyes on each face using a simple template correlation technique, then translated, rotated, scaled and cropped each image so the eyes were aligned in each image. We then scaled each image to 114x143 to speed up image processing. Figure 1 shows three examples of the normalized morphs (the original images are copyrighted and cannot be published). \n\n2.2 Representations \n\nPositions in multidimensional face space. Many researchers have used a multidimensional scaling approach to model various phenomena in face processing (e.g. Valentine and Endo, 1992). Busey (in press) had 343 subjects rate the similarity of pairs of faces in the test set and performed a multidimensional scaling on the similarity matrix for 100 of the faces (four non-parent target faces were dropped from this analysis). The process resulted in a 6-dimensional solution with $r^2 = 0.785$ and a stress of 0.13. In the MDS modeling results described below, we used the 6-dimensional vector associated with each stimulus as its representation. 
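As an illustration of how a matrix of pairwise dissimilarities can be turned into low-dimensional coordinates, the following is a minimal sketch of classical (Torgerson) MDS in Python with NumPy. The text does not say which MDS variant or software produced the 6-dimensional solution, so classical MDS is swapped in here purely for illustration. The function name classical_mds, and the assumption that the similarity ratings have already been converted into a symmetric dissimilarity matrix, are illustrative assumptions rather than details of the original study.

```python
import numpy as np


def classical_mds(dissim, n_dims):
    # Hypothetical helper: classical (Torgerson) scaling, an assumed stand-in
    # for whatever MDS procedure was actually used in the study.
    # dissim is assumed to be a symmetric (n, n) matrix of dissimilarities.
    n = dissim.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-center the squared dissimilarities to get an inner-product matrix.
    gram = -0.5 * centering @ (dissim ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)      # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_dims]     # indices of the largest ones
    # Negative eigenvalues (non-Euclidean input) are clipped to zero.
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale               # one coordinate row per stimulus
```

Under these assumptions, classical_mds(dissim, 6) would return one 6-dimensional row per stimulus. When the input dissimilarities are exact Euclidean distances, the recovered coordinates reproduce them up to rotation and reflection.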
\n\nPrincipal component projections. \"Eigenfaces,\" or the eigenvectors of the covariance matrix for a set of face images, are a common basis for face representations (e.g. Turk and Pentland, 1991). We performed a principal components analysis on the 68 face images used in the study set for Busey and Tunnicliff's experiment to get the 67 non-zero eigenvectors of their covariance matrix. We then projected each of the 104 test set images onto the 30 most significant eigenfaces to obtain a 30-dimensional vector representing each face.[1] \n\nGabor filter responses. von der Malsburg and colleagues have made effective use of banks of Gabor filters at various orientations and spatial frequencies in face recognition systems. We used one form of their wavelet (Buhmann, Lades, and von der Malsburg, 1990) at 5 scales and 8 orientations in an 8x8 square grid over each normalized face image as the basis for a third face stimulus representation. However, since this representation resulted in a 2560-dimensional vector for each face stimulus, we performed a principal components analysis to reduce the dimensionality to 30, keeping this representation's dimensionality the same as the eigenface representation's. Thus we obtained a 30-dimensional vector based on Gabor filter responses to represent each test set face image. \n\n2.3 Models \n\nThe Generalized Context Model (GCM). There are several different flavors of the GCM. We only consider a simple sum-similarity form that will lead directly to our distinctiveness-modulated density estimation model. 
Our version of GCM's predicted P(old), given a representation $y$ of a test stimulus and representations $x \\in X$ of the studied exemplars, is \n\n$$\\mathrm{pred}_y = \\alpha + \\beta \\sum_{x \\in X} e^{-c\\,(d_{x,y})^2}$$ \n\nwhere $\\alpha$ and $\\beta$ linearly convert the probe's summed similarity to a probability, $X$ is the set of representations of the study set stimuli, $c$ is used to widen or narrow the similarity function, and $d_{x,y}$ is either $\\|x - y\\|$, the Euclidean distance between $x$ and $y$, or the weighted Euclidean distance $\\sqrt{\\sum_k w_k (x_k - y_k)^2}$, where the \"attentional weights\" $w_k$ are constants that sum to 1. Intuitively, this model simply places a Gaussian-shaped function over each of the studied exemplars, and the predicted familiarity of a test probe is simply the summed height of these surfaces at the probe's location. \n\nRecall that two of our representations, PC projection space and Gabor filter space, are 30-dimensional, whereas the other, MDS, is only 6-dimensional. Thus allowing adaptive weights for the MDS representation is reasonable, since the resulting model only uses 8 parameters to fit 100 points, but it is clearly unreasonable to allow adaptive weights in PC and Gabor space, where the resulting models would be fitting 32 parameters to 100 points. Thus, for all models, we report results in MDS space both with and without adaptive weights, but do not report adaptive weight results for models in PC and Gabor space. \n\nFootnote 1: We used 30 eigenfaces because with this number, our theoretical \"distinctiveness\" measure was best correlated with the same measure in MDS space. \n\nSimSample. Busey and Tunnicliff (submitted) proposed SimSample in an attempt to remedy the GCM's poor predictions of the human data. It is related to both GCM, in that it 
uses representations in MDS space, and SAM (Gillund and Shiffrin, 1984), in that it involves sampling exemplars. The idea behind the model is that when a subject is shown a test stimulus, instead of a summed comparison to all of the exemplars in memory, the test probe probabilistically samples a single exemplar in memory, and the subject responds \"old\" if the probe's similarity to the exemplar is above a noisy criterion. The model has a similarity scaling parameter and two parameters describing the noisy threshold function. Due to space limitations, we cannot provide the details of the model here. \n\nBusey and Tunnicliff were able to fit the human data within the SimSample framework, but only when they introduced prototypes at the locations of the morphs in MDS space and made the probability of sampling the prototype proportional to the similarity of the parents. Here, however, we only compare with the basic version that does not blend exemplars. \n\nMixture Model of Memory (MMOM). In this model, we assume that subjects, at study time, implicitly create a probability density surface corresponding to the training set. The subjects' probability of responding \"old\" to a probe is then taken to be proportional to the height of this surface at the point corresponding to the probe. The surface must be robust in the face of the variability or noise typically encountered in face recognition (lighting changes, perspective changes, etc.) yet also provide some level of discrimination support (i.e. even when the intervals of possible representations for a single face could overlap due to noise, some rational decision boundary must still be constructed). 
If we assume a Gaussian mixture model, in which the density surface is built from Gaussian \"blobs\" centered on each studied exemplar, the task is a form of kernel density estimation (Bishop, 1995). \n\nWe can formulate the task of predicting the human subjects' P(old) in this framework, then, as optimizing the priors and widths of the kernel functions to minimize the mean squared error of the prediction. However, we also want to minimize the number of free parameters in the model: parsimonious methods for setting the priors and kernel function widths potentially lead to more useful insights into the principles underlying the human data. If the priors and widths were held constant, we would have a simple two-parameter model predicting the probability a subject responds \"old\" to a test stimulus $y$: \n\n$$\\mathrm{pred}_y = \\sum_{x \\in X} \\alpha\\, e^{-\\frac{\\|x - y\\|^2}{2\\sigma^2}}$$ \n\nwhere $\\alpha$ folds together the uniform prior and normalization constants, and $\\sigma$ is the standard deviation of the Gaussian kernels. If we ignore the constants, however, this model is essentially the same as the version of the GCM described above. As the results section will show, this model cannot fully account for the human familiarity data in any of our representational spaces. \n\nTo improve the model, we introduce two parameters to allow the prior (kernel function height) and standard deviation (kernel function width) to vary with the distinctiveness of the studied exemplar. This modification has two intuitive motivations. First, when humans are asked which of two parent faces a 50% morph is most similar to, if one parent is distinctive and the other parent is typical, subjects tend to choose the more distinctive parent (Tanaka et al., submitted). 
Second, we hypothesize that when a human is asked to study and remember a set of faces for a recognition test, faces with few neighbors will likely have more relaxed (wider) discrimination boundaries than faces with many nearby neighbors. \n\nThus in each representation space, for each studied face $x$, we computed $d(x)$, the theoretical distinctiveness of each face, as the Z-scored average distance to the five nearest studied faces. We then allowed the height and width of each kernel function to vary with $d(x)$: \n\n$$\\mathrm{pred}_y = \\sum_{x \\in X} \\alpha\\,(1 + c_\\alpha\\, d(x))\\; e^{-\\frac{\\|x - y\\|^2}{2\\,(\\sigma\\,(1 + c_\\sigma\\, d(x)))^2}}$$ \n\nAs was the case for GCM and SimSample, we report the results of using a weighted Euclidean distance between $y$ and $x$ in MDS space only. \n\nModel | MDS space | MDS + weights | PC projections | Gabor jets \nGCM | 0.1633 | 0.1417 | 0.1745 | 0.1624 \nSimSample | 0.1521 | 0.1404 | 0.1756 | 0.1704 \nMMOM | 0.1601 | 0.1528 | 0.1992 | 0.1668 \n\nTable 1: RMSE for the three models and three representations. Quality of fit for models with adaptive attentional weights is only reported for the low-dimensional representation (\"MDS + weights\"). The baseline RMSE, achievable with a constant prediction, is 0.2044. \n\n2.4 Parameter fitting and model evaluation \n\nFor each of the twelve combinations of models with face representations, we searched parameter space by simple hill climbing for the parameter settings that minimized the mean squared error between the model's predicted P(old) and the actual human P(old) data. \n\nWe rate each model's effectiveness with two criteria. First, we measure the models' global fit with RMSE over all test set points. A model's RMSE can be compared to the baseline performance of the \"dumbest\" model, which simply predicts the mean human P(old) of 0.5395, and achieves an RMSE of 0.2044. 
Second, we evaluate the extent to which a model predicts the mean human response for each of the six categories of test set stimuli: 1) non-parent targets, 2) non-morph distractors, 3) similar parents, 4) dissimilar parents, 5) similar morphs, and 6) dissimilar morphs. If a model correctly predicts the rank ordering of these category means, it accounts for the similar morph familiarity inversion pattern in the human data. As long as models do an adequate job of fitting the human data overall, as measured by RMSE, we prefer models that predict the morph familiarity inversion effect as a natural consequence of minimizing RMSE. \n\n3 Results \n\nTable 1 shows the global fit of each model/representation pair. The SimSample model in MDS space provides the best quantitative fit. GCM generally outperforms MMOM, indicating that for a tight quantitative fit, having parameters for a linear transformation built into the model is more important than allowing the kernel function to vary with distinctiveness. Also of note is that the PC projection representation is consistently outperformed by both the Gabor jet representation and the MDS space representation. \n\nBut for our purposes, the degree to which a model predicts the mean human responses for each of the six categories of stimuli is more important, given that it is doing a reasonably good job globally. Figure 2 takes a more detailed look at how well each model predicts the human category means. Even though SimSample in MDS space has the best global fit to the human familiarity ratings, it does not predict the familiarity inversion for similar morphs. Only the mixture model in weighted MDS space correctly predicts the morph familiarity inversion effect. All of the other models underpredict the human responses to the similar morphs. 
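The model definitions in Section 2.3 can be made concrete with a short NumPy sketch. It assumes unweighted Euclidean distance, and every function name in it (pairwise_sq_dists, distinctiveness, gcm_predict, mmom_predict, rmse) is a hypothetical name chosen for illustration. The hill-climbing parameter search and the SimSample model, whose details are not given in the text, are not shown.

```python
import numpy as np


def pairwise_sq_dists(probes, exemplars):
    # Squared Euclidean distance between every probe (rows) and exemplar (columns).
    diff = probes[:, None, :] - exemplars[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def distinctiveness(exemplars, k=5):
    # d(x): Z-scored mean distance from each studied face to its k nearest
    # studied faces. Assumes more than k exemplars and non-identical scores.
    d = np.sqrt(pairwise_sq_dists(exemplars, exemplars))
    np.fill_diagonal(d, np.inf)                  # a face is not its own neighbor
    mean_knn = np.sort(d, axis=1)[:, :k].mean(axis=1)
    return (mean_knn - mean_knn.mean()) / mean_knn.std()


def gcm_predict(probes, exemplars, a, b, c):
    # Sum-similarity GCM: a + b * sum over exemplars of exp(-c * distance^2).
    return a + b * np.exp(-c * pairwise_sq_dists(probes, exemplars)).sum(axis=1)


def mmom_predict(probes, exemplars, alpha, sigma, c_alpha, c_sigma, k=5):
    # Kernel height and width both vary with exemplar distinctiveness d(x).
    # Assumes 1 + c_sigma * d(x) is nonzero for every exemplar.
    d = distinctiveness(exemplars, k)
    height = alpha * (1.0 + c_alpha * d)
    width = sigma * (1.0 + c_sigma * d)
    sq = pairwise_sq_dists(probes, exemplars)
    return (height * np.exp(-sq / (2.0 * width ** 2))).sum(axis=1)


def rmse(pred, target):
    # Root mean squared error between predicted and observed P(old).
    return float(np.sqrt(np.mean((pred - target) ** 2)))
```

Setting c_alpha and c_sigma to zero gives every kernel the same height and width. In that case mmom_predict matches gcm_predict with a = 0, b = alpha, and c = 1 / (2 * sigma ** 2), mirroring the observation in Section 2.3 that the constant-kernel mixture model is essentially the GCM.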
\n\n4 Discussion \n\nThe results for the mixture model are consistent with the hypothesis that facial memory is a kernel density estimation task, with the caveat that distinctive exemplars require larger kernels. Whereas true density estimation would tend to deemphasize outliers in sparse areas of the face space, the human data show that the priors and kernel function widths for outliers should actually be increased. Two potentially significant problems with the work presented here are first, we experimented with several models before finding that MMOM was able to predict the morph familiarity inversion effect, and second, we are fitting a single \n\n[Figure 2 panels, one per model/representation pair: GCM, SimSample, and MMOM in each of MDS, MDS+wts, PC, and Gabor. Each panel plots actual and predicted values for the categories DP, SM, T, SP, DM, and D.] \n\nFigure 2: Average actual/predicted responses to the faces in each category. 
Key: DP = Dissimilar parents; SM = Similar morphs; T = Non-parent targets; SP = Similar parents; DM = Dissimilar morphs; D = Distractors. \n\nexperiment. The model thus must be carefully tested against new data, and its predictions empirically validated. \n\nSince a theoretical distinctiveness measure based on the sparseness of face space around an exemplar was sufficient to account for the similar morphs' familiarity inversion, we predict that distinctiveness with respect to the study set is the critical factor influencing kernel size, rather than context-free human distinctiveness judgments. We can easily test this prediction by having subjects rate the distinctiveness of the stimuli without prior exposure and then determining whether their distinctiveness ratings improve or degrade the model's fit. \n\nA somewhat disappointing (though not particularly surprising) aspect of our results is that the model requires a representation based on human similarity judgments. Ideally, we would prefer to provide an information-processing account using image-based representations like eigenface projections or Gabor filter responses. Interestingly, the efficacy of the image-based representations seems to depend on how similar they are to the MDS representations. The PC projection representation performed the worst, and distances between pairs of PC representations had a correlation of 0.388 with the distances between pairs of MDS representations. For the Gabor filter representation, which performed better, the correlation was 0.517. In future work, we plan to investigate how the MDS representation (or a representation like it) might be derived directly from the face images. \n\nBesides providing an information-processing account of the human data, there are several other avenues for future research. 
These include empirical testing of our distinctiveness predictions, evaluating the applicability of the distinctiveness model in domains other than face processing, and evaluating the ability of other modeling paradigms to account for these data. \n\nAcknowledgements \n\nWe thank Chris Vogt for comments on a previous draft, and other members of Gary's Unbelievable Research Unit (GURU) for earlier comments on this work. This research was supported in part by NIMH grant MH57075 to GWC. \n\nReferences \n\nBishop, C. M. (1995). Neural networks for pattern recognition. Oxford University Press, Oxford. \n\nBusey, T. A. (1999). Where are morphed faces in multi-dimensional face space? Psychological Science. In press. \n\nBusey, T. A. and Tunnicliff, J. (submitted). Accounts of blending, distinctiveness and typicality in face recognition. Journal of Experimental Psychology: Learning, Memory, and Cognition. \n\nDailey, M. N., Cottrell, G. W., and Busey, T. A. (1998). Eigenfaces for familiarity. In Proceedings of the Twentieth Annual Conference of the Cognitive Science Society, pages 273-278, Mahwah, NJ. Erlbaum. \n\nGillund, G. and Shiffrin, R. (1984). A retrieval model for both recognition and recall. Psychological Review, 93(4):411-428. \n\nBuhmann, J., Lades, M., and von der Malsburg, C. (1990). Size and distortion invariant object recognition by hierarchical graph matching. In Proceedings of the IJCNN International Joint Conference on Neural Networks, volume II, pages 411-416. \n\nNosofsky, R. M. (1986). Attention, similarity, and the identification-categorization relationship. Journal of Experimental Psychology: General, 116(1):39-57. \n\nReinitz, M., Lammers, W., and Cochran, B. (1992). Memory-conjunction errors: Miscombination of stored stimulus features can produce illusions of memory. Memory & Cognition, 20(1):1-11. 
\n\nSolso, R. L. and McCarthy, J. E. (1981). Prototype formation of faces: A case of pseudo-memory. British Journal of Psychology, 72(4):499-503. \n\nTanaka, J., Giles, M., Kremen, S., and Simon, V. (submitted). Mapping attractor fields in face space: The atypicality bias in face recognition. \n\nTurk, M. and Pentland, A. (1991). Eigenfaces for recognition. The Journal of Cognitive Neuroscience, 3:71-86. \n\nValentine, T. and Endo, M. (1992). Towards an exemplar model of face processing: The effects of race and distinctiveness. The Quarterly Journal of Experimental Psychology, 44A(4):671-703. \n", "award": [], "sourceid": 1527, "authors": [{"given_name": "Matthew", "family_name": "Dailey", "institution": null}, {"given_name": "Garrison", "family_name": "Cottrell", "institution": null}, {"given_name": "Thomas", "family_name": "Busey", "institution": null}]}