{"title": "Pattern Recognition from One Example by Chopping", "book": "Advances in Neural Information Processing Systems", "page_first": 371, "page_last": 378, "abstract": null, "full_text": "Pattern Recognition from One Example by\n\nChopping\n\nFranc\u00b8ois Fleuret\n\nCVLAB/LCN \u2013 EPFL\nLausanne, Switzerland\n\nGilles Blanchard\u2217\nFraunhofer FIRST\nBerlin, Germany\n\nfrancois.fleuret@epfl.ch\n\nblanchar@first.fhg.de\n\nAbstract\n\nWe investigate the learning of the appearance of an object from a single\nimage of it. Instead of using a large number of pictures of the object to\nrecognize, we use a labeled reference database of pictures of other ob-\njects to learn invariance to noise and variations in pose and illumination.\nThis acquired knowledge is then used to predict if two pictures of new\nobjects, which do not appear on the training pictures, actually display the\nsame object.\nWe propose a generic scheme called chopping to address this task. It\nrelies on hundreds of random binary splits of the training set chosen to\nkeep together the images of any given object. Those splits are extended\nto the complete image space with a simple learning algorithm. Given\ntwo images, the responses of the split predictors are combined with a\nBayesian rule into a posterior probability of similarity.\nExperiments with the COIL-100 database and with a database of 150 de-\ngraded LATEX symbols compare our method to a classical learning with\nseveral examples of the positive class and to a direct learning of the sim-\nilarity.\n\n1\n\nIntroduction\n\nPattern recognition has so far mainly focused on the following task: given many training\nexamples labelled with their classes (the object they display), guess the class of a new sam-\nple which was not available during training. 
The various approaches all consist of going to some invariant feature space, and there using a classification method such as neural networks, decision trees, kernel techniques, Bayesian estimation based on parametric density models, etc. Providing a large number of examples results in good statistical estimates of the model parameters. Although such approaches have been successful in applications to many problems, their performance is still far from what biological visual systems can do, namely one-sample learning. This can be defined as the ability, given one picture of an object, to spot instances of the same object, under the assumption that these new views can be induced by the single available example.\n\n∗Supported in part by the IST Programme of the European Community, under the PASCAL Network of Excellence, IST-2002-506778\n\nBeing able to perform that type of one-sample learning corresponds to the ability, given one example, to sort out which elements of a test set are of the same class (i.e. one class vs. the rest of the world). This can be done by comparing one by one all the elements of the test set with the reference example, and labelling as belonging to the same class those which are similar enough. Learning techniques can be used to choose the similarity measure, which could be adaptive and learned from a large number of examples of classes not involved in the test.\n\nThus, given a large number of training images of a large number of objects labeled with their actual classes, and provided two pictures of unknown objects (objects which do not appear in the training pictures), we want to decide whether these two pictures actually display the same object. The first image of such a pair can be seen as a single training example, and the second image as a test example. 
Averaging the error rate by repeating that test several times provides an estimate of a one-sample learning (OSL) error rate.\n\nThe idea of “learning how to learn” is not new and has been applied in various settings [12]. Taking into account and/or learning relevant geometric invariances for a given task has been studied in various forms [1, 8, 11], and in [7] with the goal of achieving learning from very few examples. Finally, the precise one-sample learning setting considered here has been the object of recent research [4, 3, 5] proposing different methods (hyperfeature learning, distance learning) for finding invariant features from a set of training reference objects distinct from the test objects. This principle has also been dubbed interclass transfer.\n\nThe present study proposes a generic approach, and avoids an explicit description of the space of deformations. We propose to build a large number of binary splits of the image space, designed to assign the same binary label to all the images of any given object. The binary mapping associated with such a split is thus highly invariant across the images of a given object while highly variant across images of different objects. We can define such a split on the training images, and train a predictor to extend it to the complete image space by induction. We expect the predictor to respond similarly on two images of the same object, and differently, with probability 1/2, on two images of two different objects. The global criterion to compare two images consists roughly of counting how many such split predictors respond similarly and comparing the result to a fixed threshold.\n\nThe principle of transforming a multiclass learning problem into several binary ones by class grouping has a long history in Machine Learning [10]. 
From this point of view, the collected output of several binary classifiers is used as a way of coding class membership. In [2] it was proposed to carefully choose the class groupings so as to yield optimal separation of codewords (the ECOC methodology). While our method is related to this general principle, our goal is different since we are interested in recognizing yet-unseen objects. Hence, the goal is not to code multiclass membership; our focus is not on designing efficient codes – splits are chosen randomly and we take a large number of them – but rather on how to use the learned mappings for learning unknown objects.\n\n2 Data and features\n\nTo make the rest of the paper clearer to the reader, we now introduce the data and feature sets we are using for our proof-of-concept experiments. However, note that while we have focused on image classification, our approach is generic and could be applied to any signals for which adaptive binary classifiers are available.\n\n2.1 Data\n\nWe use two databases of pictures for our experiments. The first one is the standard COIL-100 database [9]. It contains 7,200 images corresponding to 100 different objects seen from 72 angles of view. We down-sample these images from their original resolution to 38 × 38 pixels and convert them to grayscale. Examples are given in figure 1 (left). The second database contains images of 150 LaTeX symbols. We generated 1,000 images of each symbol by applying a random rotation (angle taken between −20 and +20 degrees) and a random scaling factor (up to 1.25). Noise is then added in the form of random line segments of various grayscales, locations and orientations. The final database contains 150,000 images. Examples of these degraded images are given in figure 1 (right).\n\nFigure 1: Four objects from the 100 objects of the COIL-100 database (downsampled to 38 × 38 grayscale pixels) and four symbols from the 150 symbols of our LaTeX symbol database (A, Φ, ⋖ and ⋔, resolution 28 × 28). Each image of the latter is generated by applying a rotation and a scaling, and by adding lines of random grayscales at random locations and orientations.\n\nFigure 2: The figure on the left shows how a horizontal edge ξx,y,4 is detected: the six differences between pixels connected by a thin segment must all be smaller in absolute value than the difference between the pixels connected by the thick segment. The relative values of the two pixels connected by the thick segment define the polarity of the edge (dark to light or light to dark). On the right are shown the eight different types of edges (d = 0, . . . , 7).\n\n2.2 Features\n\nAll the classification processes in the rest of the paper are based on edge-based Boolean features. Let ξx,y,d denote a basic edge detector indexed by a location (x, y) in the image frame and an orientation d which can take eight different values, corresponding to four orientations and two polarities (see figure 2). Such an edge detector is equal to 1 if and only if an edge of the given orientation is detected at the specified location, and 0 otherwise. A feature fx0,y0,x1,y1,d is a disjunction of the ξ's in the rectangle defined by x0, y0, x1, y1. Thus, it is equal to one if and only if ∃x, y, x0 ≤ x ≤ x1, y0 ≤ y ≤ y1, ξx,y,d = 1. 
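For concreteness, these rectangle features can be sketched as follows (a minimal illustration under our own assumptions, not the paper's implementation; edge_map is a hypothetical, simplified stand-in for the detector ξx,y,d of figure 2, which additionally requires the central difference to dominate six neighbouring differences):

```python
import numpy as np

def edge_map(img, d, threshold=16):
    # Hypothetical stand-in for the detector xi_{x,y,d}: flags a pixel when
    # the intensity difference along direction d exceeds a threshold.
    # d in {0..3} and {4..7} give the two polarities of four orientations.
    dy, dx = [(0, 1), (1, 0), (1, 1), (1, -1)][d % 4]
    diff = img.astype(int) - np.roll(np.roll(img, dy, axis=0), dx, axis=1).astype(int)
    return diff > threshold if d < 4 else diff < -threshold

def feature(xi, x0, y0, x1, y1):
    # f_{x0,y0,x1,y1,d}: disjunction of xi over the rectangle, i.e. equals 1
    # iff some (x, y) with x0 <= x <= x1, y0 <= y <= y1 has xi[y, x] = 1.
    return bool(xi[y0:y1 + 1, x0:x1 + 1].any())
```

Any family of binary detectors with a similar OR-pooling over rectangles would fit the same role.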
For pictures of size 32 × 32 there is a total of N = (1/4) (32 × 32)^2 × 8 ≈ 2 × 10^6 features.\n\nFigure 3: These two histograms are representative of the responses of two split predictors, for the negative and positive classes, conditionally on the true arbitrary labelling P(L | S).\n\n3 Chopping\n\nThe main idea we propose in this paper consists of learning a large number of binary splits of the image space which would ideally assign the same binary label to all the images of any given object. In this section we define these splits and describe and justify how they are combined into a global rule.\n\n3.1 Splits\n\nA split is a binary labelling of the image space, with the property of giving the same label to all images of a given object. We can trivially produce a labelling with that property on the training examples, but we need to be able to extend it to images not appearing in the training data, including images of other objects. We suppose that it is possible to infer a relevant split function on the complete image space, including images of other objects, by looking at the problem as a binary classification problem. Inference is done by means of a simple learning scheme: a combination of a fast feature selection based on conditional mutual information (CMIM) [6] and a linear perceptron.\n\nThus, we create M arbitrary splits on the training sample by randomly assigning the label 1 to half of the NT objects appearing in the training set, and 0 to the others. 
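The generation of such arbitrary balanced labellings can be sketched as follows (a minimal illustration, not the paper's code; the subsequent training of a CMIM + perceptron predictor on each split is omitted):

```python
import random

def make_splits(object_ids, M, seed=0):
    # Each split assigns label 1 to a random half of the training objects
    # and 0 to the other half; training images then inherit the label of
    # the object they picture.
    rng = random.Random(seed)
    splits = []
    for _ in range(M):
        ids = list(object_ids)
        rng.shuffle(ids)
        half = set(ids[: len(ids) // 2])
        splits.append({o: int(o in half) for o in object_ids})
    return splits
```

With a few tens of objects, the number of distinct balanced labellings is astronomically larger than any M used in practice, so sampling them independently at random is adequate.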
Since there are C(NT, NT/2) such balanced arbitrary labellings, with NT of the order of a few tens, a very large number of splits is available and only a small subset of them will actually be used for learning. For each of those splits, we train a predictor using the scheme described above. Let (S1, . . . , SM) denote the family of arbitrary splits and (L1, . . . , LM) the split predictors. The continuous outputs of these predictors before thresholding will be combined in the final classification.\n\n3.2 Combining splits\n\nTo combine the responses of the various split predictors, we rely on a set of simple conditional independence assumptions (comparable to the “naive Bayes” setting) on the distribution of the true class label C (each class corresponds to an object), the split labels (Si) and the predictor outputs (Li) for a single image. We do not assume that for test image pairs (I1, I2) the two images are independent, because we want to encompass the case where pairs of images of the same object are much more frequent than they would be if they were independent (typically, in our test data we have arranged to have 50% of test pairs picturing the same object). We however still need some conditional independence assumptions for the drawing of test image pairs. To simplify the notation, we denote by L1 = (L1_i) and L2 = (L2_i) the collections of predictor outputs for images 1 and 2, by S1 = (S1_i) and S2 = (S2_i) the collections of their split labels, and by C1, C2 their true classes. The conditional independence assumptions we make are summed up in the following Markov dependency structure (with C1 and C2 possibly dependent):\n\n(L2_1, . . . , L2_M) ← (S2_1, . . . , S2_M) ← C2 — C1 → (S1_1, . . . , S1_M) → (L1_1, . . . , L1_M)\n\nIn words, for each split i, the predictor output Li is assumed to be independent of the true class C conditionally on the split label Si; and conditionally on the split labels (S1, S2) of both images, the outputs of the predictors on test pair images are assumed to be independent.\n\nFinally, we make the additional symmetry hypothesis that conditionally on C1 = C2, for all i, S1_i = S2_i = Si and the (Si) are independent Bernoulli variables with parameter 0.5, while conditionally on C1 ≠ C2 all split labels (S1_i, S2_i) are independent Bernoulli(0.5). Under these assumptions we then want to compute the log-odds ratio\n\nlog [ P(C1 = C2 | L1, L2) / P(C1 ≠ C2 | L1, L2) ] = log [ P(L1, L2 | C1 = C2) / P(L1, L2 | C1 ≠ C2) ] + log [ P(C1 = C2) / P(C1 ≠ C2) ].   (1)\n\nIn this formula and the next ones, when handling the real-valued variables L1, L2 we implicitly assume that they have a density with respect to the Lebesgue measure, and probabilities are to be interpreted as densities with some abuse of notation. We assume that the second term above is either known or can be reliably estimated. For the first term, under the aforementioned independence assumptions, the following holds (see appendix):\n\nlog [ P(L1, L2 | C1 = C2) / P(L1, L2 | C1 ≠ C2) ] = N log 2 + Σ_i log ( α1_i α2_i + (1 − α1_i)(1 − α2_i) ),   (2)\n\nwhere αj_i = P(Sj_i = 1 | Lj_i). As a quick check, note that if the predictor outputs (Li) are uninformative (i.e. every probability αj_i is 0.5), then the above formula gives an odds ratio of 1, which is what we expect. If they are perfectly informative (i.e. all αj_i are 0 or 1), the odds ratio can take the value 0 (if for some i we can ensure S1_i ≠ S2_i, which excludes the case C1 = C2) or 2^N (if for all i we have S1_i = S2_i; there is still a tiny chance that C1 ≠ C2 if by chance C1 and C2 are on the same side of each split).\n\nTo estimate the probabilities P(Sj | Lj), we use a simple 1D Gaussian model for the output of the predictor given the true split label. Mean and variance are estimated from the training set for each predictor. Experimental findings show that this Gaussian modelling is realistic (see figure 3).\n\n4 Experiments\n\nWe estimate the performance of the chopping approach by comparing it to classical learning with several examples of the positive class and to a direct learning of the similarity of two objects on different images. For every experiment, we use a family of 10,000 features sampled uniformly from the complete set of features (see section 2.2).\n\n4.1 Multiple example learning\n\nIn this procedure, we train a predictor with several pictures of a positive class and with a very large number of pictures of a negative class. 
The number of positive examples depends on the experiment (from 1 to 32) and the number of negative examples is 2,000 for both the COIL-100 and the LaTeX symbol databases.\n\nFigure 4: Error rates of chopping, smart chopping (see section 4.2), multi-example learning and directly learnt similarity on the LaTeX symbol database (left) and the COIL-100 database (right). Each curve shows the average error and a two-standard-deviation interval, both estimated over ten experiments for each setting. The x-axis shows either the number of splits for chopping (from 1 to 1024) or the number of samples of the positive class for multi-example learning (from 1 to 32).\n\nNote that to handle the unbalanced positive and negative populations, the perceptron bias is chosen to minimize a balanced error rate. 
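One way of choosing such a bias can be sketched as follows (our own assumption about the procedure, not the paper's implementation: sweep candidate biases and keep the one minimizing the average of the two per-class error rates):

```python
def balanced_error(scores_pos, scores_neg, bias):
    # A sample is predicted positive when score + bias > 0; the balanced
    # error averages the error rates of the two classes, so the large
    # negative class does not dominate the criterion.
    fn = sum(s + bias <= 0 for s in scores_pos) / len(scores_pos)
    fp = sum(s + bias > 0 for s in scores_neg) / len(scores_neg)
    return (fn + fp) / 2

def best_bias(scores_pos, scores_neg):
    # Exhaustive sweep over candidate biases placed at the observed scores.
    candidates = [-s for s in scores_pos + scores_neg]
    return min(candidates, key=lambda b: balanced_error(scores_pos, scores_neg, b))
```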
In each case, and for each number of positive samples, we run 10 experiments. Each experiment consists of several cross-validation cycles, so that the total number of test pictures is roughly the same as the number of pairs in the one-sample experiments below.\n\n4.2 One-sample learning\n\nFor each experiment, whatever the predictor, we first select 80 training objects from the COIL-100 database (respectively 100 symbols from the LaTeX symbol database). The test error is computed with 500 pairs of images of the 20 unseen objects for COIL-100, and 1,000 pairs of images of the 50 unseen objects for the LaTeX symbols. These test sets are built to contain as many pairs of images of the same object as pairs of images of different objects.\n\nLearnt similarity: Note that one-sample learning can also simply be cast as a standard binary classification problem of pairs of images into the classes {same, different}. We therefore want to compare the chopping method to a more standard learning method applied directly to pairs of images, using a comparable set of features. For every single feature f on single images, we consider three features of a pair of images standing for the conjunction, disjunction and equality of the feature responses on the two images. From the 10,000 features on single images, we thus create a set of 30,000 features on pairs of images. We generate a training set of 2,000 pairs of pictures for the experiments with the COIL-100 database and 5,000 for the LaTeX symbols, half picturing the same object twice, half picturing two different objects. We then train a predictor similar to those used for the splits in the chopping scheme: feature selection with CMIM, and linear combination with a perceptron (see section 3.1), using the 30,000 features described above.\n\nChopping: The performance of the chopping approach is estimated for several numbers of splits (from 1 to 1024). 
For each split we select 50 objects from the training objects, and select at random 1,000 training images of these objects. We generate an arbitrary balanced binary labelling of these 50 objects and label the training images accordingly. We then build a predictor by selecting 2,000 features with the CMIM algorithm and combining them with a perceptron (see section 3.1).\n\nTo compensate for the limitations of our conditional independence assumptions, we allow the addition of a fixed bias to the log-odds ratio (1). This type of correction is common when using naive-Bayes type assumptions. Using the remaining training objects as a validation set, we compute this bias so as to minimize the validation error. We insist that no objects of the test classes be used for training.\n\nTo improve the performance of the splits, we also test a “smart” version of chopping for which each split is built in two steps. The first step is similar to what is described above. From that first step, we remove the 10 objects for which the labelling prediction has the highest error rate, and re-build the split with the 40 remaining objects. This gets rid of problematic objects or inconsistent labellings (for instance, trying to force two similar objects into different halves of the split).\n\n4.3 Results\n\nThe experiments demonstrate the good performance of chopping when only one example is available. Its optimal error rate, obtained for the largest number of splits, is 7.41% on the LaTeX symbol database and 11.42% on the COIL-100 database. By contrast, a direct learning of the similarity (see section 4.2) reaches respectively 15.54% and 18.1% with 8,192 features.\nOn both databases, the classical multi-sample learning scheme requires 32 samples to reach the same level of performance (10.51% on the COIL-100 and 10.7% on the LaTeX symbols).\n\nThe error curves (see figure 4) are all monotonic. 
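For concreteness, the combination rule of equation (2) underlying all the chopping results above can be sketched as follows (a minimal illustration, not the paper's code; alpha1 and alpha2 stand for the per-split posteriors P(S_i = 1 | L_i) of the two images, obtained in the paper from the 1D Gaussian model of section 3.2):

```python
import math

def log_odds(alpha1, alpha2):
    # Equation (2): log P(L1, L2 | C1 = C2) / P(L1, L2 | C1 != C2).
    # Each summand is the log-probability that the two split labels agree;
    # an alpha pair of exactly (0, 1) would give log(0), i.e. certainty
    # that the objects differ.
    n = len(alpha1)
    return n * math.log(2) + sum(
        math.log(a1 * a2 + (1 - a1) * (1 - a2))
        for a1, a2 in zip(alpha1, alpha2))
```

Uninformative predictors (all alphas equal to 0.5) give a log-odds of 0, i.e. an odds ratio of 1, matching the sanity check of section 3.2.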
There is no overfitting when the number of splits increases, which is consistent with the absence of global learning: splits are combined with an ad hoc Bayesian rule, without optimizing a global functional, which generally also results in better robustness.\n\nThe smart splits (see section 4.2) achieve better performance initially but eventually reach the same error rates as the standard splits. There is no visible degradation of the asymptotic performance due to either reduced independence between splits or a diminution of their separation power. However, the computational cost is twice as high, since every predictor has to be built twice.\n\n5 Conclusion\n\nIn this paper we have proposed an original approach to learning the appearance of an object from a single image. Our method relies on a large number of individual splits of the image space designed to keep together the images of any of the training objects. These splits are learned from a training set of examples and combined within a Bayesian framework to estimate the posterior probability that two images show the same object.\n\nThis approach is very generic since it never makes the space of admissible perturbations explicit and relies on the generalization properties of the family of predictors. It can be applied to predict the similarity of two signals whenever a family of binary predictors exists on the space of individual signals.\n\nSince the learning is decomposed into the independent training of several splits, it can easily be parallelized. Also, because the combination rule is symmetric with respect to the splits, the learning can be incremental: splits can be added to the global rule progressively as they become available.\n\nAppendix: Proof of formula (2). 
For the first factor, we have\n\nP(L1, L2 | C1 = C2)\n= Σ_{s1,s2} P(L1, L2 | C1 = C2, S1 = s1, S2 = s2) P(S1 = s1, S2 = s2 | C1 = C2)\n= Σ_{s1,s2} P(L1, L2 | S1 = s1, S2 = s2) P(S1 = s1, S2 = s2 | C1 = C2)\n= Σ_{s1,s2} Π_i P(L1_i | S1_i = s1_i) P(L2_i | S2_i = s2_i) P((S1_i, S2_i) = (s1_i, s2_i) | C1 = C2)\n= 2^{-N} Π_i ( P(L1_i | S1_i = 1) P(L2_i | S2_i = 1) + P(L1_i | S1_i = 0) P(L2_i | S2_i = 0) ).\n\nIn the second equality, we have used that L is independent of C given S. In the third equality, we have used that the (Lj_i) are independent given S. In the last equality, we have used the symmetry assumption on the distribution of (S1, S2) given C1 = C2. Similarly,\n\nP(L1, L2 | C1 ≠ C2)\n= 4^{-N} Π_i Σ_{s1,s2} P(L1_i | S1_i = s1) P(L2_i | S2_i = s2)\n= 4^{-N} Π_i P(L1_i) P(L2_i) Σ_{s1,s2} [ P(S1_i = s1 | L1_i) P(S2_i = s2 | L2_i) ] / [ P(S1_i = s1) P(S2_i = s2) ]\n= Π_i P(L1_i) P(L2_i),\n\nsince P(Sj_i = s) ≡ 1/2 by the symmetry hypothesis. Taking the ratio of the two factors and using the latter property again, in the form P(Lj_i | Sj_i = s) = 2 P(Sj_i = s | Lj_i) P(Lj_i), leads to the conclusion.\n\nReferences\n\n[1] Y. Bengio and M. Monperrus. Non-local manifold tangent learning. In Advances in Neural Information Processing Systems 17, pages 129–136. MIT Press, 2005.\n[2] T. Dietterich and G. Bakiri. Solving multiclass learning problems via error-correcting output codes. Journal of Artificial Intelligence Research, 2:263–286, 1995.\n[3] A. Ferencz, E. Learned-Miller, and J. Malik. Learning hyper-features for visual identification. In Advances in Neural Information Processing Systems 17, pages 425–432. MIT Press, 2004.\n[4] A. Ferencz, E. Learned-Miller, and J. Malik. Building a classification cascade for visual identification from one example. In International Conference on Computer Vision (ICCV), 2005.\n[5] M. Fink. Object classification from a single example utilizing class relevance metrics. In Advances in Neural Information Processing Systems 17, pages 449–456. MIT Press, 2005.\n[6] F. Fleuret. Fast binary feature selection with conditional mutual information. Journal of Machine Learning Research, 5:1531–1555, November 2004.\n[7] F. Li, R. Fergus, and P. Perona. A Bayesian approach to unsupervised one-shot learning of object categories. In Proceedings of ICCV, volume 2, page 1134, 2003.\n[8] E. G. Miller, N. E. Matsakis, and P. A. Viola. Learning from one example through shared densities on transforms. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, volume 1, pages 464–471, 2000.\n[9] S. A. Nene, S. K. Nayar, and H. Murase. Columbia Object Image Library (COIL-100). Technical Report CUCS-006-96, Columbia University, 1996.\n[10] T. Sejnowski and C. Rosenberg. Parallel networks that learn to pronounce English text. Complex Systems, 1:145–168, 1987.\n[11] P. Simard, Y. Le Cun, and J. Denker. Efficient pattern recognition using a new transformation distance. In S. Hanson, J. Cowan, and C. Giles, editors, Advances in Neural Information Processing Systems 5, pages 50–68. Morgan Kaufmann, 1993.\n[12] S. Thrun and L. Pratt, editors. Learning to Learn. Kluwer, 1997.\n", "award": [], "sourceid": 2838, "authors": [{"given_name": "Francois", "family_name": "Fleuret", "institution": null}, {"given_name": "Gilles", "family_name": "Blanchard", "institution": null}]}