{"title": "Algebraic Set Kernels with Application to Inference Over Local Image Representations", "book": "Advances in Neural Information Processing Systems", "page_first": 1257, "page_last": 1264, "abstract": null, "full_text": " Algebraic Set Kernels with Application to\n Inference Over Local Image Representations\n\n\n\n Amnon Shashua and Tamir Hazan \n\n\n\n\n Abstract\n\n This paper presents a general family of algebraic positive definite simi-\n larity functions over spaces of matrices with varying column rank. The\n columns can represent local regions in an image (whereby images have\n varying number of local parts), images of an image sequence, motion tra-\n jectories in a multibody motion, and so forth. The family of set kernels\n we derive is based on a group invariant tensor product lifting with param-\n eters that can be naturally tuned to provide a cook-book of sorts covering\n the possible \"wish lists\" from similarity measures over sets of varying\n cardinality. We highlight the strengths of our approach by demonstrat-\n ing the set kernels for visual recognition of pedestrians using local parts\n representations.\n\n\n\n1 Introduction\n\nIn the area of learning from observations there are two main paths that are often mutually\nexclusive: (i) the design of learning algorithms, and (ii) the design of data representations.\nThe algorithm designers take pride in the fact that their algorithm can generalize well given\nstraightforward data representations (most notable example is SVM [11]), whereas those\nwho work on data representations demonstrate often remarkable results with sophisticated\ndata representations using only straightforward learning algorithms (e.g. [5, 10, 6]). This\ndichotomy is probably most emphasized in the area of computer vision, where image under-\nstanding from observations involve data instances of images or image sequences containing\nhuge amounts of data. 
A straightforward representation treating all the measurements as a single vector, such as the raw pixel data or a transformed raw-pixel data, places unreasonable demands on the learning algorithm. The \"holistic\" representations also suffer from sensitivity to occlusions, lack of invariance to local and global transformations, non-rigidity of local parts of the object, and so forth.

Practitioners in the area of data representations have long noticed that a collection of local representations (part-based representations) can be most effective in ameliorating changes of appearance [5, 10, 6]. The local data representations vary in their sophistication, but share the same principle whereby an image corresponds to a collection of points, each in a relatively small dimensional space -- instead of a single point in the high-dimensional space induced by holistic representations. In general, the number of points (local parts) per image may vary and the dimension of each point may vary as well. The local representations tend to be robust against occlusions and local and global transformations, and preserve the original resolution of the image (the higher the resolution, the more parts are generated per image).

School of Engineering and Computer Science, Hebrew University of Jerusalem, Jerusalem 91904, Israel

The key for unifying local and holistic representations for inference engines is to design positive definite similarity functions (a.k.a. kernels) over sets (of vectors) of varying cardinalities. A Support Vector Machine (SVM) [11] can then handle sets of vectors as a single instance via application of those \"set kernels\".
A set kernel would also be useful to other types of inference engines such as kernel versions of PCA, LDA, CCA, ridge regression, and any algorithm which can be mapped onto inner-products between pairs of data instances (see [8] for details on kernel methods).

Formally, we consider an instance being represented by a collection of vectors which, for the sake of convenience, form the columns of a matrix. We would like to find an algebraic family of similarity functions sim(A, B) over matrices A, B which satisfy the following requirements: (i) sim(A, B) is an inner product, i.e., sim(A, B) = φ(A)^T φ(B) for some mapping φ() from matrices to vectors; (ii) sim(A, B) is built over local kernel functions k(a_i, b_j) over columns a_i and b_j of A, B respectively; (iii) the column cardinality (rank of the column space) of A and B need not be the same (the number of local parts may differ from image to image); and (iv) the parameters of sim(A, B) should induce the properties of invariance to order (alignment) of parts, part occlusions, and degree of interactions between local parts. In a nutshell, our work provides a cook-book of sorts which fundamentally covers the possible algebraic kernels over collections of local representations built on top of local kernels, by combining (linearly and non-linearly) local kernels to form a family of global kernels over local representations.

The design of a kernel over sets of vectors has recently been attracting much attention in the computer vision and machine learning literature. A possible approach is to fit a distribution to the set of vectors and define the kernel as a distribution matching measure [9, 12, 4]. This has the advantage that the number of local parts can vary, but at the expense of fitting a distribution to the variation over parts.
The variation could be quite complex at times, unlikely to fit into a known family of distributions in many situations of interest, and in practice the sample size (the number of columns of A) is not sufficiently large to reliably fit a distribution. The alternative, which is the approach taken in this paper, is to create a kernel over sets of vectors in a direct manner. When the column cardinality is equal it is possible to model the similarity measure as a function over the principal angles between the two column spaces ([14] and references therein), while for varying column cardinality only heuristic similarity measures (which are not positive definite) have so far been introduced [13].

It is important to note that although we chose SVM over local representations as the application to demonstrate the use of set kernels, the need for adequately working with instances made out of sets of various cardinalities spans many other application domains. For example, an image sequence may be represented by a set (ordered or unordered) of vectors where each vector stands for an image, the pixels in an image can be represented as tuples consisting of position, intensity and other attributes, motion trajectories of multiple moving bodies can be represented as a collection of vectors, and so on. Therefore, the problem addressed in this paper is fundamental both from a theoretical and from a practical perspective.

2 The General Family of Inner-Products over Matrices

We wish to derive the general family of positive definite similarity measures sim(A, B) over matrices A, B which have the same number of rows but possibly different column rank (in particular, a different number of columns). Let A be of dimension n × k and B of dimension n × q, where n is fixed and k, q can vary at will over the application of sim(·,·) on pairs of matrices. Let m = max{n, k, q} be the upper bound over all values of k, q encountered by the data.
Let a_i, b_j be the column vectors of matrices A, B and let k(a_i, b_j) be the local kernel function. For example, in the context where the column vectors represent local parts of an image, the matching function k(·,·) between pairs of local parts provides the building blocks of the overall similarity function. The local kernel is some positive definite function k(x, y) = φ(x)^T φ(y), the inner-product between the \"feature\"-mapped vectors x, y for some feature map φ(). For example, if φ() is the polynomial map of degree up to d, then k(x, y) = (1 + x^T y)^d.

The local kernels can be combined in a linear or non-linear manner. When the combination is linear the similarity becomes the analogue of the inner-product between vectors extended to matrices. We will refer to the linear family as sim(A, B) = <A, B>, which will be the focus of this section. In the next section we will derive the general (algebraic) non-linear family, which is based on \"lifting\" the input matrices A, B onto higher dimensional spaces and feeding the result into the <·,·> machinery developed in this section, i.e., sim(A, B) = <φ(A), φ(B)>.

We start by embedding A, B into m × m matrices by zero padding as follows. Let e_i denote the i'th standard basis vector (0, ..., 0, 1, 0, ..., 0) of R^m. The embedding is represented by linear combinations of tensor products:

    A ≅ Σ_{i=1}^n Σ_{j=1}^k a_ij (e_i ⊗ e_j),    B ≅ Σ_{l=1}^n Σ_{t=1}^q b_lt (e_l ⊗ e_t).

Note that A, B are the upper-left blocks of the zero-padded matrices. Let S be a positive semi-definite m^2 × m^2 matrix represented by S = Σ_{r=1}^p G_r ⊗ F_r, where G_r, F_r are m × m matrices(1). Let F̂_r be the q × k upper-left sub-matrix of F_r, and let Ĝ_r be the n × n upper-left sub-matrix of G_r. We will be using the following three identities:

    Gx_1 ⊗ Fx_2 = (G ⊗ F)(x_1 ⊗ x_2),
    (G ⊗ F)(G' ⊗ F') = GG' ⊗ FF',
    <x_1 ⊗ x_2, y_1 ⊗ y_2> = (x_1^T y_1)(x_2^T y_2).

The inner-product <A, B> over all p.s.d.
matrices S has the form:

    <A, B> = <Σ_{i,j} a_ij (e_i ⊗ e_j), (Σ_r G_r ⊗ F_r) Σ_{l,t} b_lt (e_l ⊗ e_t)>
           = Σ_r Σ_{i,j,l,t} a_ij b_lt <e_i ⊗ e_j, G_r e_l ⊗ F_r e_t>
           = Σ_r Σ_{i,j,l,t} a_ij b_lt (e_i^T G_r e_l)(e_j^T F_r e_t)
           = Σ_r Σ_{i,j,l,t} a_ij b_lt (G_r)_il (F_r)_jt
           = Σ_r Σ_{j,t} (A^T Ĝ_r B)_jt (F_r)_jt
           = Σ_r trace((A^T Ĝ_r B) F̂_r)

We have represented the inner product <A, B> using the choice of m × m matrices G_r, F_r instead of the choice of a single m^2 × m^2 p.s.d. matrix S. The matrices G_r, F_r must be selected such that Σ_{r=1}^p G_r ⊗ F_r is positive semi-definite. The problem of deciding on the necessary conditions on F_r and G_r such that the sum over tensor products is p.s.d. is difficult; even deciding whether a given S has a separable decomposition is known to be NP-hard [3]. The sufficient conditions are easy -- choosing G_r, F_r to be positive semi-definite would make Σ_{r=1}^p G_r ⊗ F_r positive semi-definite as well.

(1) Any S can be represented as a sum over tensor products: given column-wise ordering, the matrix G ⊗ F is composed of n × n blocks of the form f_ij G. Therefore, take G_r to be the n × n blocks of S and F_r to be the elemental matrices which have \"1\" in coordinate r = (i, j) and zero everywhere else.
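The derivation above can be checked numerically. The following is a minimal numpy sketch (not from the paper) for a single separable term S = G ⊗ F with k = q = n = m, so that no zero padding is needed; under the e_i ⊗ e_j embedding, A corresponds to its row-major flattening.

```python
import numpy as np

# Check: vec(A)^T (G kron F) vec(B) = trace((A^T G B) F) for symmetric p.s.d. G, F.
rng = np.random.default_rng(0)
n = 4
A, B = rng.standard_normal((n, n)), rng.standard_normal((n, n))

# make G, F symmetric p.s.d., the sufficient condition for S = G kron F to be p.s.d.
M1, M2 = rng.standard_normal((n, n)), rng.standard_normal((n, n))
G, F = M1.T @ M1, M2.T @ M2

lhs = A.flatten() @ np.kron(G, F) @ B.flatten()   # inner product in the lifted space
rhs = np.trace(A.T @ G @ B @ F)                   # trace form from the derivation
assert np.isclose(lhs, rhs)
```

Note that `A.flatten()` (row-major) matches the e_i ⊗ e_j indexing convention, under which the (i*n+j, l*n+t) entry of np.kron(G, F) is G_il F_jt.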
In this context (of separable S) we need one more constraint in order to work with non-linear local kernels k(x, y) = φ(x)^T φ(y): the matrices Ĝ_r = M_r^T M_r must \"distribute with the kernel\", namely there should exist matrices M̃_r such that

    k(M_r x, M_r y) = φ(M_r x)^T φ(M_r y) = φ(x)^T M̃_r^T M̃_r φ(y) = φ(x)^T G̃_r φ(y).

To summarize the results so far, the most general, but separable, analogue of the inner-product over vectors to the inner-product of matrices of varying column cardinality has the form:

    <A, B> = Σ_r trace(H_r F̂_r)     (1)

where the entries of H_r consist of k(M_r a_i, M_r b_j) over the columns of A, B after possibly undergoing global coordinate changes by M_r (the role of Ĝ_r), and F̂_r is the q × k upper-left sub-matrix of a positive definite m × m matrix F_r.

The role of the matrices Ĝ_r is to perform global coordinate changes of R^n before application of the kernel k() to the columns of A, B. These global transformations include projections (say, onto prototypical \"parts\") that may be given or \"learned\" from a training set. The matrices F̂_r determine the range of interaction between columns of A and columns of B. For example, when Ĝ_r = I then <A, B> = trace(A^T B F̂), where F̂ is the upper-left submatrix, of the appropriate dimension, of the fixed m × m p.s.d. matrix F = Σ_r F_r. Note that the entries of A^T B are k(a_i, b_j). In other words, when Ĝ_r = I, <A, B> boils down to a simple linear superposition of the local kernels, Σ_ij k(a_i, b_j) f_ij, where the entries f_ij are part of the upper-left block of a fixed positive definite matrix F, the block dimensions being commensurate with the number of columns of A and those of B. The various choices of F determine the type of invariances one could obtain from the similarity measure.
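As an illustration, the Ĝ_r = I case of eqn. (1) reduces to the weighted superposition Σ_ij k(a_i, b_j) f_ij. The following numpy sketch (not from the paper; a single symmetric F term and a linear local kernel are assumed for concreteness) makes the reduction concrete:

```python
import numpy as np

# Linear set kernel of eqn. (1) with G_r = I and one symmetric F term:
# <A, B> = sum_ij k(a_i, b_j) f_ij over the upper-left block of F.
def set_kernel_linear(A, B, F, k):
    kA, qB = A.shape[1], B.shape[1]
    # local kernel matrix, entry (i, j) = k(a_i, b_j)
    K = np.array([[k(A[:, i], B[:, j]) for j in range(qB)] for i in range(kA)])
    return np.sum(K * F[:kA, :qB])   # entrywise weighting by the upper-left block

rng = np.random.default_rng(1)
A, B = rng.standard_normal((5, 3)), rng.standard_normal((5, 4))
m = 6
lin = lambda x, y: x @ y             # linear local kernel

# F = I sums aligned part matches; F = all-ones sums all part interactions
assert np.isclose(set_kernel_linear(A, B, np.eye(m), lin),
                  sum(A[:, i] @ B[:, i] for i in range(3)))
assert np.isclose(set_kernel_linear(A, B, np.ones((m, m)), lin), np.sum(A.T @ B))
```

Swapping `lin` for any other positive definite local kernel (polynomial, RBF) leaves the surrounding machinery unchanged.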
For example, when F = I the similarity is simply the sum (average) of the local kernels, Σ_i k(a_i, b_i), thereby assuming we have a strict alignment between the local parts represented by A and the local parts represented by B. On the other end of the invariance spectrum, when F = 11^T (all entries are \"1\") the similarity measure averages over all interactions of local parts k(a_i, b_j), thereby achieving an invariance to the order of the parts. A decaying weighted interaction such as f_ij = λ^{-|i-j|} would provide a middle ground between the assumption of strict alignment and the assumption of complete lack of alignment. In the section below we will derive the non-linear version of sim(A, B) based on the basic machinery of <A, B> of eqn. (1) and lifting operations on A, B.

3 Lifting Matrices onto Higher Dimensions

The family sim(A, B) = <A, B> forms a weighted linear superposition of the local kernels k(a_i, b_j). Non-linear combinations of local kernels emerge using mappings φ(A) from the input matrices onto other higher-dimensional matrices, thus forming sim(A, B) = <φ(A), φ(B)>. Additional invariance properties and parameters controlling the performance of sim(A, B) emerge with the introduction of non-linear combinations of local kernels, and those will be discussed later on in this section.

Consider the general d-fold lifting φ(A) = A^{⊗d}, which can be viewed as an n^d × k^d matrix. Let F_r be a p.s.d. matrix of dimension m^d × m^d and F̂_r be the upper-left q^d × k^d block of F_r. Let G_r = (Ĝ_r)^{⊗d} be a p.s.d. matrix of dimension n^d × n^d, where Ĝ_r is a p.s.d. n × n matrix. Using the identity (A^{⊗d})^T B^{⊗d} = (A^T B)^{⊗d} we obtain the inner-product in the lifted space:

    <A^{⊗d}, B^{⊗d}> = Σ_r trace((A^T Ĝ_r B)^{⊗d} F̂_r).

By taking linear combinations of <A^{⊗l}, B^{⊗l}>, l = 1, ..., d, we get the general non-homogeneous d-fold inner-product sim_d(A, B). At this point the formulation is general but somewhat unwieldy computationally.
The key for computational simplification lies in the fact that choices of F_r determine not only local interactions (as in the linear case) but also group invariances. The group invariances are a result of applying symmetric operators on the tensor product space -- we will consider two of those operators here, known as the d-fold alternating tensor A^{∧d} = A ∧ ... ∧ A and the d-fold symmetric tensor A^{⊙d} = A ⊙ ... ⊙ A. These lifting operations introduce the determinant and permanent operations on submatrices of A^T Ĝ_r B, as described below.

The alternating tensor is a multilinear map of R^n, (A ∧ ... ∧ A)(x_1 ∧ ... ∧ x_d) = Ax_1 ∧ ... ∧ Ax_d, where

    x_1 ∧ ... ∧ x_d = (1/d!) Σ_{σ ∈ S_d} sign(σ) x_{σ(1)} ⊗ ... ⊗ x_{σ(d)},

where S_d is the symmetric group over d letters and σ ∈ S_d are the permutations of the group. If x_1, ..., x_n form a basis of R^n, then the (n choose d) elements x_{i_1} ∧ ... ∧ x_{i_d}, where 1 ≤ i_1 < ... < i_d ≤ n, form a basis of the alternating d-fold tensor product of R^n, denoted ∧^d R^n. If A ∈ R^{n×k} is a linear map sending points of R^k to points of R^n, then A^{∧d} is a linear map sending x_1 ∧ ... ∧ x_d to Ax_1 ∧ ... ∧ Ax_d, i.e., sending points in ∧^d R^k to points in ∧^d R^n. The matrix representation of A^{∧d} is called the \"d'th compound matrix\" C_d(A), whose (i_1, ..., i_d | j_1, ..., j_d) entry has the value det(A[i_1, ..., i_d ; j_1, ..., j_d]), the determinant of the d × d block constructed by choosing the rows i_1, ..., i_d and the columns j_1, ..., j_d of A. In other words, C_d(A) has (n choose d) rows and (k choose d) columns (instead of the n^d × k^d necessary for A^{⊗d}), whose entries are equal to the d × d minors of A. When k = d, C_k(A) is a vector known as the Grassmannian of A, and when n = k = d then C_d(A) = det(A). Finally, the identity (A^{⊗d})^T B^{⊗d} = (A^T B)^{⊗d} specializes to (A^{∧d})^T B^{∧d} = (A^T B)^{∧d}, which translates to the identity C_d(A)^T C_d(B) = C_d(A^T B), known as the Binet-Cauchy theorem [1].
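The compound matrix and the Binet-Cauchy identity can be sketched directly; the following brute-force numpy construction (illustrative only, exponential in d) enumerates the d × d minors in lexicographic order:

```python
import numpy as np
from itertools import combinations

# d'th compound matrix: entries are the d x d minors of A,
# rows/columns indexed by lexicographically ordered index subsets.
def compound(A, d):
    n, k = A.shape
    rows = list(combinations(range(n), d))
    cols = list(combinations(range(k), d))
    C = np.empty((len(rows), len(cols)))
    for a, I in enumerate(rows):
        for b, J in enumerate(cols):
            C[a, b] = np.linalg.det(A[np.ix_(I, J)])   # d x d minor
    return C

rng = np.random.default_rng(2)
A, B = rng.standard_normal((5, 3)), rng.standard_normal((5, 3))
d = 2

# Binet-Cauchy: C_d(A)^T C_d(B) = C_d(A^T B)
assert np.allclose(compound(A, d).T @ compound(B, d), compound(A.T @ B, d))
```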
Taken together, the \"d-fold alternating kernel\" ∧_d(A, B) is defined by:

    ∧_d(A, B) = <A^{∧d}, B^{∧d}> = <C_d(A), C_d(B)> = Σ_r trace(C_d(A^T Ĝ_r B) F̂_r),     (2)

where F̂_r is the (q choose d) × (k choose d) upper-left submatrix of the p.s.d. (m choose d) × (m choose d) matrix F_r. Note that the local kernel plugs in as the entries (A^T Ĝ_r B)_ij = k(M_r a_i, M_r b_j), where Ĝ_r = M_r^T M_r.

Another symmetric operator on the tensor product space is the d-fold symmetric tensor space Sym^d R^n, whose points are:

    x_1 ⊙ ... ⊙ x_d = (1/d!) Σ_{σ ∈ S_d} x_{σ(1)} ⊗ ... ⊗ x_{σ(d)}.

The analogue of C_d(A) is the \"d'th power matrix\" R_d(A), whose (i_1, ..., i_d | j_1, ..., j_d) entry has the value perm(A[i_1, ..., i_d ; j_1, ..., j_d]) and which stands for the map A^{⊙d}:

    (A ⊙ ... ⊙ A)(x_1 ⊙ ... ⊙ x_d) = Ax_1 ⊙ ... ⊙ Ax_d.

In other words, R_d(A) has (n+d-1 choose d) rows and (k+d-1 choose d) columns, whose entries are equal to the d × d permanents of A. The analogue of the Binet-Cauchy theorem is R_d(A)^T R_d(B) = R_d(A^T B). The ensuing kernel similarity function, referred to as the \"d-fold symmetric kernel\", is:

    Sym_d(A, B) = <A^{⊙d}, B^{⊙d}> = <R_d(A), R_d(B)> = Σ_r trace(R_d(A^T Ĝ_r B) F̂_r),     (3)

where F̂_r is the (q+d-1 choose d) × (k+d-1 choose d) upper-left submatrix of the positive definite (m+d-1 choose d) × (m+d-1 choose d) matrix F_r. Due to lack of space we stop here and spend the remainder of this section describing, in layman's terms, the properties of these similarity measures and how they can be constructed in practice in a computationally efficient manner (despite the combinatorial element in their definition).

3.1 Practical Considerations

To recap, the family of similarity functions sim(A, B) comprises the linear version <A, B> (eqn. 1) and the non-linear versions ∧_l(A, B), Sym_l(A, B) (eqns. 2, 3), which are group projections of the general kernel <A^{⊗d}, B^{⊗d}>. These different similarity functions are controlled by the choice of three items: G_r, F_r and the parameter d representing the degree of the tensor product operator.
Specifically, we will focus on the case Ĝ_r = I and on ∧_d(A, B) as a representative of the non-linear family. The role of Ĝ_r is fairly interesting, as it can be viewed as a projection operator from \"parts\" to prototypical parts that can be learned from a training set, but we leave this to the full-length article that will appear later.

Practically, to compute ∧_d(A, B) one needs to run over all d × d blocks of the k × q matrix A^T B (whose entries are k(a_i, b_j)) and compute the determinant of each block. The similarity function is a weighted sum of all those determinants, weighted by the f_ij. By appropriate selection of F one can control both the complexity of the computation (avoiding the run over all possible d × d blocks) and the degree of interaction between the determinants. These determinants have an interesting geometric interpretation if they are computed over unitary matrices -- as described next.

Let A = Q_A R_A and B = Q_B R_B be the QR factorizations of the matrices, i.e., Q_A has orthonormal columns which span the column space of A. It has recently been shown [14] that R_A^{-1} can be computed from A using only operations over k(a_i, a_j). Therefore, the product Q_A^T Q_B, which is equal to R_A^{-T} (A^T B) R_B^{-1}, can be computed using only local kernel applications. In other words, for each A compute R_A^{-1} (this can be done using only inner-products over columns of A); then, when it comes to computing A^T B, compute instead R_A^{-T} (A^T B) R_B^{-1}, which is equivalent to computing Q_A^T Q_B. Thus effectively we have replaced every A with Q_A (a unitary matrix).

Now, ∧_d(Q_A, Q_B) for unitary matrices is the sum over products of the cosines of the principal angles between d-dim subspaces spanned by columns of A and B. The value of each determinant of the d × d blocks of Q_A^T Q_B is equal to the product of the cosines of the principal angles between the respective d-dim subspaces determined by the corresponding selection of d columns from A and d columns from B.
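One concrete way to realize the trick above (a sketch, not the exact procedure of [14]; it assumes full column rank and, for concreteness, the linear local kernel so that the Gram entries are plain inner products): since A^T A = R_A^T R_A, the triangular factor R_A is the Cholesky factor of the Gram matrix, which is built from local-kernel evaluations only.

```python
import numpy as np

# Compute Q_A^T Q_B = R_A^{-T} (A^T B) R_B^{-1} from Gram matrices alone.
def orthonormalized_gram(KAA, KAB, KBB):
    # KAA = A^T A, KAB = A^T B, KBB = B^T B (entries are local-kernel values)
    RA = np.linalg.cholesky(KAA).T        # upper-triangular, A = Q_A R_A
    RB = np.linalg.cholesky(KBB).T
    return np.linalg.solve(RA.T, KAB) @ np.linalg.inv(RB)

rng = np.random.default_rng(3)
A, B = rng.standard_normal((6, 3)), rng.standard_normal((6, 4))
M = orthonormalized_gram(A.T @ A, A.T @ B, B.T @ B)

# M equals Q_A^T Q_B up to column signs; its singular values -- the cosines
# of the principal angles -- must match those from an explicit QR.
QA, _ = np.linalg.qr(A)
QB, _ = np.linalg.qr(B)
assert np.allclose(np.linalg.svd(M, compute_uv=False),
                   np.linalg.svd(QA.T @ QB, compute_uv=False))
```

The sign ambiguity arises because numpy's QR does not force a positive diagonal on R; singular values are unaffected.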
For example, the case k = q = d produces ∧_d(Q_A, Q_B) = det(Q_A^T Q_B), which is the product of the eigenvalues of the matrix Q_A^T Q_B. Those eigenvalues are the cosines of the principal angles between the column space of A and the column space of B [2]. Therefore, det(Q_A^T Q_B) measures the \"angle\" between the two subspaces spanned by the respective columns of the input matrices -- in particular it is invariant to the order of the columns. For smaller values of d we obtain the sum over such products between subspaces spanned by subsets of d columns of A and of B.

The advantage of smaller values of d is twofold: first, it enables computing the similarity when k ≠ q, and second, it breaks down the similarity between subspaces into smaller pieces. The entries of the matrix F determine which subspaces are being considered and the interaction between subspaces of A and B. A diagonal F compares corresponding subspaces between A and B, whereas off-diagonal entries would enable comparisons between different choices of subspaces in A and in B. For example, we may want to consider choices of d columns arranged in a \"sliding\" fashion, i.e., the column sets {1, ..., d}, {2, ..., d+1}, ... and so forth, instead of the combinatorial number of all possible choices. This selection is associated with a sparse diagonal F whose non-vanishing entries along the diagonal have the value \"1\" and correspond to the sliding window selections.

Figure 1: (a) The configuration of the nine sub-regions is displayed over the gradient image. (b) Some of the positive examples -- note the large variation in appearance, pose and articulation.

To conclude, in the linear version <A, B> the role of F is to determine the range of interaction between columns of A and columns of B, whereas in the non-linear version it is the interaction between d-dim subspaces rather than individual columns.
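The subspace interpretation above can be sketched numerically (illustrative, not from the paper); up to sign, det(Q_A^T Q_B) is the product of the cosines of the principal angles, which are the singular values of Q_A^T Q_B:

```python
import numpy as np

# For k = q = d, |det(Q_A^T Q_B)| equals the product of the cosines of the
# principal angles between the two column spaces.
rng = np.random.default_rng(4)
A, B = rng.standard_normal((6, 3)), rng.standard_normal((6, 3))
QA, _ = np.linalg.qr(A)
QB, _ = np.linalg.qr(B)

cosines = np.linalg.svd(QA.T @ QB, compute_uv=False)   # cosines of principal angles
assert np.isclose(abs(np.linalg.det(QA.T @ QB)), np.prod(cosines))
```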
We could select all possible interactions (an exponential number) or any reduced interaction set such as the sliding-window rule (a linear number of choices) described above.

4 Experiments

We examined the performance of sim(A, B) on part-based representations for pedestrian detection using SVM as the inference engine. The dataset we used (courtesy of Mobileye Ltd.) covers a challenging variability of appearance, viewing position and body articulation (see Fig. 1). We ran a suite of comparative experiments using sim(A, B) = <A, B> with three versions of F = {I, 11^T, decay} and with local kernels covering the linear, d'th degree polynomial (d = 2, 6) and RBF kernels, and likewise with sim(A, B) = ∧_d(A, B) with d = 2, a sparse diagonal F (covering a sliding-window configuration) and linear, polynomial and RBF local kernels. We compared our results to the conventional down-sampled holistic representation, where the raw images were down-sampled to sizes 20 × 20 and 32 × 32. Our tests also included simulation of occlusions (in the test images) in order to examine the sensitivity of our sim(A, B) family to occlusions. For the local part representation, the input image was divided into 9 fixed regions, where for each region local orientation statistics were generated following [5, 7] with a total of 22 numbers per region (see Fig. 1a), thereby making a 22 × 9 matrix representation to be fed into sim(A, B). The size of the training set was 4000, split evenly between positive and negative examples, and a test set of 4000 examples was used to evaluate the performance of each trial. The table below summarizes the accuracy results for the raw-pixel (holistic) representation over three trials: (i) images down-sampled to 20 × 20, (ii) images down-sampled to 32 × 32, and (iii) test images partially occluded (32 × 32 version).
The accuracy figures are the ratio between the sum of the true positives and true negatives and the total number of test examples.

    raw          linear   poly d = 2   poly d = 6   RBF
    20 × 20      78%      83%          84%          86%
    32 × 32      78%      84%          85%          82%
    occlusion    73.5%    72%          77%          76.5%

The table below displays sim(A, B) with linear and RBF local kernels.

    local kernel   <A, B>, F = I   <A, B>, F = 11^T   <A, B>, f_ij = 2^{-|i-j|}   ∧_2(A, B)
    linear         90.8%           85%                90.6%                       88%
    RBF            91.2%           85%                90.4%                       90%

One can see that the local part representation provides a sharp increase in accuracy compared to the raw-pixel holistic representation. The added power of invariance to the order of parts induced by <A, B>, F = 11^T is not required, since the parts are aligned, and therefore the accuracy is highest for the linear combination of local RBF kernels, <A, B>, F = I. The same applies for the non-linear version ∧_d(A, B) -- the additional invariances that come with a non-linear combination of local parts are apparently not required. The power of non-linearity associated with the combination of local parts comes to bear when the test images have occluded parts, i.e., when at random one of the columns of the input matrix is removed (or replaced with a random vector), as shown in the table below:

    local kernel   <A, B>, F = I   ∧_2(A, B)
    linear         62%             87%
    RBF            83%             88%

One can notice that a linear combination of local parts suffers from reduced accuracy, whereas the non-linear combination maintains a stable accuracy (compare the right-most columns of the two tables above). Although the experiments above are still preliminary, they show the power and potential of the sim(A, B) family of kernels defined over local kernels. With the principles laid down in Section 3 one can construct a large number (we touched on only a few) of algebraic kernels which combine the local kernels in non-linear ways, thus creating invariances to order and increased performance against occlusion.
Further research is required for sifting through the various possibilities with this new family of kernels and extracting their properties, their invariances and their behavior under changing parameters (F_r, G_r, d).

References

[1] A.C. Aitken. Determinants and Matrices. Interscience Publishers Inc., 4th edition, 1946.
[2] G.H. Golub and C.F. Van Loan. Matrix Computations. Johns Hopkins University Press, 1989.
[3] L. Gurvits. Classical deterministic complexity of Edmonds' problem and quantum entanglement. In ACM Symp. on Theory of Computing, 2003.
[4] R. Kondor and T. Jebara. A kernel between sets of vectors. In International Conference on Machine Learning, ICML, 2003.
[5] D.G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 2004.
[6] C. Schmid and R. Mohr. Local grey-value invariants for image retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(5):530-535, 1997.
[7] A. Shashua, Y. Gdalyahu, G. Hayun and L. Mann. Pedestrian detection for driving assistance systems. IEEE Intelligent Vehicles Symposium (IV2004), June 2004, Parma, Italy.
[8] B. Scholkopf and A.J. Smola. Learning with Kernels. MIT Press, Cambridge, MA, 2002.
[9] G. Shakhnarovich, J.W. Fisher, and T. Darrell. Face recognition from long-term observations. In Proceedings of the European Conference on Computer Vision, 2002.
[10] S. Ullman, M. Vidal-Naquet, and E. Sali. Visual features of intermediate complexity and their use in classification. Nature Neuroscience, 5(7), 2002.
[11] V.N. Vapnik. The Nature of Statistical Learning. Springer, 2nd edition, 1998.
[12] N. Vasconcelos, P. Ho, and P. Moreno. The Kullback-Leibler kernel as a framework for discriminant and localized representations for visual recognition. In Proceedings of the European Conference on Computer Vision, pages 430-441, Prague, Czech Republic, May 2004.
[13] C. Wallraven, B.
Caputo, and A. Graf. Recognition with local features: the kernel recipe. In Proceedings of the International Conference on Computer Vision, 2003.
[14] L. Wolf and A. Shashua. Learning over sets using kernel principal angles. Journal of Machine Learning Research (JMLR), 4(10):913-931, 2003.
", "award": [], "sourceid": 2701, "authors": [{"given_name": "Amnon", "family_name": "Shashua", "institution": null}, {"given_name": "Tamir", "family_name": "Hazan", "institution": null}]}