{"title": "Unsupervised Feature Selection for Accurate Recommendation of High-Dimensional Image Data", "book": "Advances in Neural Information Processing Systems", "page_first": 177, "page_last": 184, "abstract": null, "full_text": "Unsupervised Feature Selection for Accurate\n\nRecommendation of High-Dimensional Image Data\n\nSabri Boutemedjet\n\nDI, Universite de Sherbrooke\n2500 boulevard de l\u2019Universit\u00b4e\n\nSherbrooke, QC J1K 2R1, Canada\n\nsabri.boutemedjet@usherbrooke.ca\n\nDjemel Ziou\n\nDI, Universite de Sherbrooke\n2500 boulevard de l\u2019Universit\u00b4e\n\nSherbrooke, QC J1K 2R1, Canada\n\ndjemel.ziou@usherbrooke.ca\n\nNizar Bouguila\n\nCIISE, Concordia University\n1515 Ste-Catherine Street West\nMontreal, QC H3G 1T7, Canada\n\nbouguila@ciise.concordia.ca\n\nAbstract\n\nContent-based image suggestion (CBIS) targets the recommendation of products\nbased on user preferences on the visual content of images. In this paper, we mo-\ntivate both feature selection and model order identi\ufb01cation as two key issues for\na successful CBIS. We propose a generative model in which the visual features\nand users are clustered into separate classes. We identify the number of both user\nand image classes with the simultaneous selection of relevant visual features us-\ning the message length approach. The goal is to ensure an accurate prediction\nof ratings for multidimensional non-Gaussian and continuous image descriptors.\nExperiments on a collected data have demonstrated the merits of our approach.\n\n1 Introduction\n\nProducts in today\u2019s e-market are described using both visual and textual information. From con-\nsumer psychology, the visual information has been recognized as an important factor that in\ufb02uences\nthe consumer\u2019s decision making and has an important power of persuasion [4]. 
Furthermore, it is well recognized that consumer choice is also influenced by the external environment or context, such as time and location [4]. For example, a consumer could express an information need while traveling that is different from the situation when she or he is working or even at home. \u201cContent-Based Image Suggestion\u201d (CBIS) [4] motivates the modeling of user preferences with respect to visual information under the influence of the context. Therefore, CBIS aims at the suggestion of products whose relevance is inferred from the history of users, in different contexts, on images of the previously consumed products. The domains considered by CBIS are a set of users U = {1, 2, . . . , N_u}, a set of visual documents V = {v_1, v_2, . . . , v_{N_v}}, and a set of possible contexts E = {1, 2, . . . , N_e}. Each v_k is an arbitrary descriptor (visual, textual, or categorical) used to represent images or products. In this work, we consider an image as a D-dimensional vector v = (v_1, v_2, . . . , v_D). The visual features may be local, such as interest points, or global, such as color, texture, or shape. The relevance is expressed explicitly on an ordered voting (or rating) scale defined as R = {r_1, r_2, . . . , r_{N_r}}. For example, the five star scale (i.e. N_r = 5) used by Amazon allows consumers to give different degrees of appreciation. The history of each user u \u2208 U is defined as D_u = {< u, e(j), v(j), r(j) > | e(j) \u2208 E, v(j) \u2208 V, r(j) \u2208 R, j = 1, . . . , |D_u|}.\n\n\fFigure 1: The VCC-FMM identifies like-mindedness from similar appreciations on similar images represented in 3-dimensional space. 
Notice the inter-relation between the number of image clusters and the considered feature subset.\n\nIn the literature, the modeling of user preferences has been addressed mainly within the collaborative filtering (CF) and content-based filtering (CBF) communities. On the one hand, CBF approaches [12] build a separate model of \u201cliked\u201d and \u201cdisliked\u201d discrete data (word features) from each D_u taken individually. On the other hand, CF approaches predict the relevance of a given product for a given user based on the preferences provided by a set of \u201clike-minded\u201d (similar tastes) users. The data set used by CF is the user-product matrix (\u222a_{u=1}^{N_u} D_u), which is discrete since each product is represented by a categorical index. The Aspect model [7] and the flexible mixture model (FMM) [15] are examples of model-based CF approaches. Recently, the authors in [4] have proposed a statistical model for CBIS which uses both visual and contextual information in modeling user preferences with respect to multidimensional, non-Gaussian and continuous data. Users with similar preferences are considered in [4] as those who appreciated similar images with similar degrees. Therefore, instead of considering products as categorical variables (CF), visual documents are represented by richer visual information in the form of a vector of visual features (texture, shape, and interest points). The similarity between images and between user preferences is modeled in [4] through a single graphical model which clusters users and images separately into homogeneous groups, in a similar way to the flexible mixture model (FMM) [15]. 
In addition, since image data are generally non-Gaussian [1], class-conditional distributions of visual features are assumed to be Dirichlet densities. In this way, the like-mindedness in user preferences is captured at the level of visual features.\n\nStatistical models for CBIS are useful modeling tools for many reasons. First, once the model is learned from training data (the union of user histories), it can be used to \u201csuggest\u201d unknown (possibly unrated) images efficiently, i.e. little effort is required at the prediction phase. Second, the model can be updated from new data (images or ratings) in an online fashion in order to handle changes in either image clusters and/or user preferences. Third, model selection approaches can be employed to identify \u201cwithout supervision\u201d both the number of user preferences and the number of image clusters (i.e. the model order) from the statistical properties of the data. It should be stressed that the unsupervised selection of the model order was not addressed in the CF/CBF literature. Indeed, the model order in many well-founded statistical models such as the Aspect model [7] or FMM [15] was set \u201cempirically\u201d as a compromise between the model's complexity and the accuracy of prediction, but not from the data.\n\nFrom an \u201cimage collection modeling\u201d point of view, the work in [4] has focused on modeling user preferences with respect to non-Gaussian image data. However, since CBIS generally employs high-dimensional image descriptors, the problem of modeling image collections accurately needs to be addressed in order to overcome the curse of dimensionality and provide accurate suggestions. Indeed, the presence of many irrelevant features degrades substantially the performance of the modeling and prediction [6], in addition to increasing the computational complexity. 
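This dilution effect can be made concrete with a small self-contained toy experiment (ours, not from the paper): two clusters that are well separated along one relevant feature lose almost all of their distance contrast once many irrelevant (uniform-noise) features are appended. All sizes and constants below are arbitrary illustrative choices.

```python
import math
import random

random.seed(0)

def avg_dist(pts_a, pts_b):
    # mean Euclidean distance over all cross pairs of two point sets
    return sum(math.dist(p, q) for p in pts_a for q in pts_b) / (len(pts_a) * len(pts_b))

def contrast(num_noise_dims):
    # Two clusters separated only along a single relevant dimension
    # (centers 0.2 and 0.8); every extra dimension is pure uniform noise.
    def make(center):
        return [[random.gauss(center, 0.05)] +
                [random.random() for _ in range(num_noise_dims)]
                for _ in range(30)]
    a, b = make(0.2), make(0.8)
    within = (avg_dist(a, a) + avg_dist(b, b)) / 2
    between = avg_dist(a, b)
    return between / within  # >> 1 means clearly separated clusters

# The between/within contrast collapses toward 1 as irrelevant
# dimensions are added, even though the relevant feature is unchanged.
print(contrast(0), contrast(50))
```

With no noise dimensions the contrast is large; with 50 noise dimensions it approaches 1, i.e. the clusters become nearly indistinguishable by distance alone.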
To achieve a better modeling, we consider feature selection and extraction as another \u201ckey issue\u201d for CBIS. In the literature [6], the process of feature selection in mixture models has not received as much attention as in supervised learning. The main reason is the absence of class labels that may guide the selection process [6]. In this paper, we address the issue of feature selection in CBIS through a new generative model which we call the Visual Content Context-aware Flexible Mixture Model (VCC-FMM). Because of the inter-relation between feature subsets and the model order, i.e. different feature subsets correspond to different natural groupings of images, we propose to learn the VCC-FMM from unlabeled data using the Minimum Message Length (MML) approach [16]. The next section details the VCC-FMM model with an integrated feature selection. After that, we discuss the identification of the model order using the MML approach in Section 3. Experimental results are presented in Section 4. Finally, we conclude the paper with a summary of the work.\n\n2 The Visual Content Context Flexible Mixture Model\n\nThe data set D used to learn a CBIS system is the union of all user histories, i.e. D = \u222a_{u \u2208 U} D_u. From this data set we model both the like-mindedness shared by user groups as well as the visual and semantic similarity between images [4]. To that end, we introduce two latent variables z and c to label each observation < u, e, v, r > with information about user classes and image classes, respectively. In order to make predictions on unseen images, we need to model the joint event p(v, r, u, e) = \u2211_{z,c} p(v, r, u, e, z, c). Then, the rating r for a given user u, context e and visual document v can be predicted on the basis of probabilities p(r|u, e, v) that can be derived by conditioning the generative model p(u, e, v, r). 
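As a concrete toy illustration of this prediction rule, the sketch below scores each rating r proportionally to the sum over (z, c) of p(z)p(c)p(u|z)p(e|z)p(x|c)p(r|z, c), mirroring the factorization of the model. All probability tables are hypothetical toy numbers, not learned parameters.

```python
# Toy CBIS-style model: K=2 user classes, M=2 image classes, NR=3 ratings.
K, M, NR = 2, 2, 3

p_z = [0.6, 0.4]                              # p(z)
p_c = [0.5, 0.5]                              # p(c)
p_u_z = [[0.7, 0.3], [0.2, 0.8]]              # p(u|z), indexed [z][u]
p_e_z = [[0.5, 0.5], [0.9, 0.1]]              # p(e|z), indexed [z][e]
p_r_zc = [                                    # p(r|z,c), indexed [z][c][r]
    [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]],
    [[0.2, 0.5, 0.3], [0.3, 0.4, 0.3]],
]

def rating_posterior(u, e, x_lik):
    """p(r|u,e,x), where x_lik[c] = p(x|c); normalized over the NR ratings."""
    scores = []
    for r in range(NR):
        s = sum(p_z[z] * p_c[c] * p_u_z[z][u] * p_e_z[z][e]
                * x_lik[c] * p_r_zc[z][c][r]
                for z in range(K) for c in range(M))
        scores.append(s)
    total = sum(scores)
    return [s / total for s in scores]

def predict_rating(u, e, x_lik):
    # suggest the rating with maximal posterior probability
    post = rating_posterior(u, e, x_lik)
    return max(range(NR), key=post.__getitem__)
```

Here `x_lik` stands in for the class-conditional image likelihoods p(x|c) that the full model computes from the visual features.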
We notice that the full factorization of p(v, r, u, e, z, c) using the chain rule leads to quantities with a huge number of parameters which are difficult to interpret in terms of the data [4]. To overcome this problem, we make use of some conditional independence assumptions that constitute our statistical approximation of the joint event p(v, r, u, e). These assumptions are illustrated by the graphical representation of the model in figure 2. Let K and M be the numbers of user classes and image classes, respectively; an initial model for CBIS can be derived as [4]:\n\np(v, r, u, e) = \u2211_{z=1}^{K} \u2211_{c=1}^{M} p(z) p(c) p(u|z) p(e|z) p(v|c) p(r|z, c)   (1)\n\nThe quantities p(z) and p(c) denote the a priori weights of user and image classes. p(u|z) and p(e|z) denote the likelihood of a user and of a context, respectively, to belong to the user class z. p(r|z, c) is the probability to sample a rating for a given user class and image class. All these quantities are modeled from discrete data. On the other hand, image descriptors are high-dimensional, continuous and generally non-Gaussian data [1]. Thus, the class-conditional densities p(v|c) should be modeled carefully in order to capture efficiently the added value of the visual information. In this work, we assume that p(v|c) is a Generalized Dirichlet distribution (GDD), which is more appropriate than other distributions such as the Gaussian or Dirichlet distributions in modeling image collections [1]. This distribution has a more general covariance structure and provides multiple shapes. The distribution of the c-th component \u0398*_c is given by equation (2); the * superscript is used to denote the unknown true GDD distribution.\n\np(v|\u0398*_c) = \u220f_{l=1}^{D} [\u0393(\u03b1*_cl + \u03b2*_cl) / (\u0393(\u03b1*_cl) \u0393(\u03b2*_cl))] v_l^{\u03b1*_cl - 1} (1 - \u2211_{k=1}^{l} v_k)^{\u03b3*_cl}   (2)\n\nwhere \u2211_{l=1}^{D} v_l < 1 and 0 < v_l < 1 for l = 1, . . . , D. Here \u03b3*_cl = \u03b2*_cl - \u03b1*_{cl+1} - \u03b2*_{cl+1} for l = 1, . . . , D - 1 and \u03b3*_cD = \u03b2*_cD - 1. In equation (2) we have set \u0398*_c = (\u03b1*_c1, \u03b2*_c1, . . . , \u03b1*_cD, \u03b2*_cD).\n\n\fFigure 2: Graphical representation of VCC-FMM.\n\nFrom the mathematical properties of the GDD, we can transform, using a geometric transformation, the data point v into another data point x = (x_1, . . . , x_D) with independent features without loss of information [1]. In addition, each x_l of x generated by the c-th component follows a Beta distribution p_b(.|\u03b8*_cl) with parameters \u03b8*_cl = (\u03b1*_cl, \u03b2*_cl), which leads to the fact p(x|\u0398*_c) = \u220f_{l=1}^{D} p_b(x_l|\u03b8*_cl). The independence between the x_l makes the estimation of a GDD very efficient, i.e. D estimations of univariate Beta distributions without loss of accuracy. However, even with independent features, the unsupervised identification of image clusters based on high-dimensional descriptors remains a hard problem due to the omnipresence of noisy, redundant and uninformative features [6] that degrade the accuracy of the modeling and prediction. We consider feature selection and extraction as a \u201ckey\u201d methodology in order to remove that kind of features in our modeling. Since the x_l are independent, we can extract \u201crelevant\u201d features in the representation space X. 
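Because the transformed features x_l are independent, evaluating a class-conditional density reduces to summing D univariate Beta log-densities. A minimal sketch of this per-feature evaluation (the Beta parameter values are hypothetical):

```python
import math

def beta_logpdf(x, a, b):
    """log Beta(x; a, b), computed with lgamma for numerical stability."""
    if not (0.0 < x < 1.0):
        return float("-inf")  # Beta support is the open interval (0, 1)
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1.0) * math.log(x) + (b - 1.0) * math.log(1.0 - x)

def class_loglik(x, params):
    # log p(x | Theta_c) = sum_l log Beta(x_l; alpha_cl, beta_cl)
    return sum(beta_logpdf(xl, a, b) for xl, (a, b) in zip(x, params))
```

For example, with hypothetical parameters `[(2.0, 5.0), (5.0, 2.0)]` (densities peaked near 0.2 and 0.8), the point (0.2, 0.8) scores much higher under this class than the swapped point (0.8, 0.2).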
However, we need some definition of a feature's relevance. In figure 1, four well-separated image clusters can be identified from only two relevant features, 1 and 2, which are multimodal and influenced by class labels. On the other hand, feature 3 is unimodal (i.e. irrelevant) and can be approximated by a single Beta distribution p_b(.|\u03be_l) common to all components. This definition of feature relevance has been motivated in unsupervised learning [2][9]. Let \u03c6 = (\u03c6_1, . . . , \u03c6_D) be a set of missing binary variables denoting the relevance of all features. \u03c6_l is set to 1 when the l-th feature is relevant and 0 otherwise. The \u201ctrue\u201d Beta distribution \u03b8*_cl can be approximated as [2][9]:\n\np(x_l|\u03b8*_cl, \u03c6_l) \u2248 [p_b(x_l|\u03b8_cl)]^{\u03c6_l} [p_b(x_l|\u03be_l)]^{1 - \u03c6_l}   (3)\n\nBy considering each \u03c6_l as a Bernoulli variable with parameters p(\u03c6_l = 1) = \u03b5_l1 and p(\u03c6_l = 0) = \u03b5_l2 (\u03b5_l1 + \u03b5_l2 = 1), the distribution p(x_l|\u03b8*_cl) can be obtained after marginalizing over \u03c6_l [9] as: p(x_l|\u03b8*_cl) \u2248 \u03b5_l1 p_b(x_l|\u03b8_cl) + \u03b5_l2 p_b(x_l|\u03be_l). The VCC-FMM model is given by equation (4). We notice that both models [3] [4] are special cases of VCC-FMM.\n\np(x, r, u, e) = \u2211_{z=1}^{K} \u2211_{c=1}^{M} p(z) p(u|z) p(e|z) p(c) p(r|z, c) \u220f_{l=1}^{D} [\u03b5_l1 p_b(x_l|\u03b8_cl) + \u03b5_l2 p_b(x_l|\u03be_l)]   (4)\n\n3 A Unified Objective for Model and Feature Selection using MML\n\nWe denote by \u03b8^A_\u03c0 the parameter vector of the multinomial distribution of any discrete variable A of VCC-FMM conditioned on its parent \u03a0 (see figure 2). We have A|\u03a0=\u03c0 ~ Multi(1; \u03b8^A_\u03c0), where \u03b8^A_\u03c0a = p(A = a|\u03a0 = \u03c0) and \u2211_a \u03b8^A_\u03c0a = 1. 
Also, we employ the superscripts \u03b8 and \u03be to denote the parameters of the Beta distributions of relevant and irrelevant components, respectively, i.e. \u03b8_cl = (\u03b1^\u03b8_cl, \u03b2^\u03b8_cl) and \u03be_l = (\u03b1^\u03be_l, \u03b2^\u03be_l). The set \u0398 of all VCC-FMM parameters is defined by \u03b8^U_z, \u03b8^E_z, \u03b8^R_zc, \u03b8^{\u03c6_l} = (\u03b5_l1, \u03b5_l2), \u03b8^Z, \u03b8^C and \u03b8_cl, \u03be_l. The log-likelihood of a data set of N independent and identically distributed observations D = {< u(i), e(i), x(i), r(i) > | i = 1, . . . , N, u(i) \u2208 U, e(i) \u2208 E, x(i) \u2208 X, r(i) \u2208 R} is given by:\n\nlog p(D|\u0398) = \u2211_{i=1}^{N} log \u2211_{z=1}^{K} \u2211_{c=1}^{M} p(z) p(c) p(u(i)|z) p(e(i)|z) p(r(i)|z, c) \u220f_{l=1}^{D} [\u03b5_l1 p_b(x_l(i)|\u03b8_cl) + \u03b5_l2 p_b(x_l(i)|\u03be_l)]   (5)\n\nThe maximum likelihood (ML) approach, which optimizes equation (5) w.r.t. \u0398, is not appropriate for learning VCC-FMM since both K and M are unknown. In addition, the likelihood increases monotonically with the number of components and favors lower dimensions [5]. To overcome these problems, we define a message length objective [16] for both the estimation of \u0398 and the identification of K and M using MML [9][2]. This objective incorporates, in addition to the log-likelihood, a penalty term which encodes the data to penalize complex models as:\n\nMML(K, M) = -log p(\u0398) + (1/2) log |I(\u0398)| + (s/2)(1 + log(1/12)) - log p(D|\u0398)   (6)\n\nIn equation (6), |I(\u0398)|, p(\u0398), and s denote the Fisher information, the prior distribution and the total number of parameters, respectively. The Fisher information of a parameter is the expectation of the second derivatives of the minus log-likelihood with respect to the parameter. It is common sense to assume independence among the different groups of parameters, which factorizes both |I(\u0398)| and p(\u0398) over the Fisher information and prior distributions of the different groups of parameters, respectively. We approximate the Fisher information of the VCC-FMM from the complete likelihood, which assumes knowledge of the values of the hidden variables for each observation < u(i), e(i), x(i), r(i) > \u2208 D. The Fisher information of \u03b8_cl and \u03be_l can be computed by following a methodology similar to [1]. Also, we use the result found in [8] in computing the Fisher information of \u03b8^A_\u03c0 of a discrete variable A with N_A different values in a data set of N observations: |I(\u03b8^A_\u03c0)| = (N p(\u03a0 = \u03c0))^{N_A - 1} / \u220f_{a=1}^{N_A} \u03b8^A_\u03c0a [8], where p(\u03a0 = \u03c0) is the marginal probability of the parent \u03a0. The graphical representation of VCC-FMM does not involve variable ancestors (parents of parents). Therefore, the marginal probabilities p(\u03a0 = \u03c0) are simply the parameters of the multinomial distribution of the parent variable. For example, |I(\u03b8^R_zc)| is computed as |I(\u03b8^R_zc)| = (N \u03b8^Z_z \u03b8^C_c)^{N_r - 1} / \u220f_{r=1}^{N_r} \u03b8^R_zcr. In case of complete ignorance, it is common to employ Jeffreys' prior for the different groups of parameters. Replacing p(\u0398) and I(\u0398) in (6), and after discarding the first order terms, the MML objective is given by:\n\nMML(K, M) = (N_p/2) log N + M \u2211_{l=1}^{D} log \u03b5_l1 + \u2211_{l=1}^{D} log \u03b5_l2 + ((N_r - 1)/2) \u2211_{c=1}^{M} log \u03b8^C_c + (N^Z_p/2) \u2211_{z=1}^{K} log \u03b8^Z_z - log p(D|\u0398)   (7)\n\nwith N_p = 2D(M + 1) + K(N_u + N_e - 2) + MK(N_r - 1) and N^Z_p = N_r + N_u + N_e - 3. For fixed values of K, M and D, the minimization of the MML objective with respect to \u0398 is equivalent to a maximum a posteriori (MAP) estimate with the following improper Dirichlet priors [9]:\n\np(\u03b5_1, . . . , \u03b5_D) \u221d \u220f_{l=1}^{D} \u03b5_l1^{-M} \u03b5_l2^{-1},   p(\u03b8^Z) \u221d \u220f_{z=1}^{K} (\u03b8^Z_z)^{-N^Z_p/2},   p(\u03b8^C) \u221d \u220f_{c=1}^{M} (\u03b8^C_c)^{-(N_r - 1)/2}   (8)\n\n3.1 Estimation of parameters\n\nWe optimize the MML of the data set using the Expectation-Maximization (EM) algorithm in order to estimate the parameters. In the E-step, the joint posterior probabilities of the latent variables given the observations are computed as Q_zci = p(z, c|u(i), e(i), x(i), r(i), \u0398):\n\nQ_zci = \u03b8^Z_z \u03b8^C_c \u03b8^U_zu(i) \u03b8^E_ze(i) \u03b8^R_zcr(i) \u220f_l [\u03b5_l1 p_b(x_l(i)|\u03b8_cl) + \u03b5_l2 p_b(x_l(i)|\u03be_l)] / \u2211_{z,c} \u03b8^Z_z \u03b8^C_c \u03b8^U_zu(i) \u03b8^E_ze(i) \u03b8^R_zcr(i) \u220f_l [\u03b5_l1 p_b(x_l(i)|\u03b8_cl) + \u03b5_l2 p_b(x_l(i)|\u03be_l)]   (9)\n\nIn the M-step, the parameters are updated using the following equations:\n\n\u03b8^C_c = max(\u2211_i \u2211_z Q_zci - (N_r - 1)/2, 0) / \u2211_c max(\u2211_i \u2211_z Q_zci - (N_r - 1)/2, 0),   \u03b8^Z_z = max(\u2211_i \u2211_c Q_zci - N^Z_p/2, 0) / \u2211_z max(\u2211_i \u2211_c Q_zci - N^Z_p/2, 0)   (10)\n\n\u03b8^U_zu = \u2211_{i:u(i)=u} \u2211_c Q_zci / (N \u03b8^Z_z),   \u03b8^E_ze = \u2211_{i:e(i)=e} \u2211_c Q_zci / (N \u03b8^Z_z),   \u03b8^R_zcr = \u2211_{i:r(i)=r} Q_zci / \u2211_i Q_zci   (11)\n\n1/\u03b5_l1 = 1 + max(\u2211_{z,c,i} Q_zci \u03b5_l2 p_b(x_l(i)|\u03be_l) / [\u03b5_l1 p_b(x_l(i)|\u03b8_cl) + \u03b5_l2 p_b(x_l(i)|\u03be_l)] - 1, 0) / max(\u2211_{z,c,i} Q_zci \u03b5_l1 p_b(x_l(i)|\u03b8_cl) / [\u03b5_l1 p_b(x_l(i)|\u03b8_cl) + \u03b5_l2 p_b(x_l(i)|\u03be_l)] - M, 0)   (12)\n\nThe parameters of the Beta distributions \u03b8_cl and \u03be_l are updated using the Fisher scoring method based on the first and second order derivatives of the MML objective [1].\n\n\f4 Experiments\n\nThe benefits of using feature selection and the contextual information are evaluated by considering two variants, V-FMM and V-GD-FMM, in addition to the original VCC-FMM given by equation (4). V-FMM does not handle the contextual information and assumes \u03b8^E_ze constant for all e \u2208 E. On the other hand, feature selection is not considered for V-GD-FMM, which is obtained by setting \u03b5_l1 = 1 and pruning the uninformative components \u03be_l for l = 1, . . . , D.\n\n4.1 Data Set\n\nWe have collected ratings from 27 subjects who participated in the experiment (i.e. N_u = 27) during a period of three months. The participating subjects are graduate students in the faculty of science. Subjects received periodically (twice a day) a list of three images to which they assign relevance degrees expressed on a five star rating scale (i.e. N_r = 5). We define the context as a combination of two attributes: location L = {in-campus, out-campus}, inferred from the Internet Protocol (IP) address of the subject, and time T = {weekday, weekend}, i.e. N_e = 4. A data set D of 13446 ratings is collected (N = 13446). We have used a collection of 4775 (i.e. 
N_v = 4775) images collected from Washington University [10] and collections of free photographs, which we categorized manually into 41 categories. For visual content characterization, we have employed both local and global descriptors. For local descriptors, we use the 128-dimensional Scale Invariant Feature Transform (SIFT) [11] to represent image patches. We apply vector quantization to the SIFT descriptors and build a histogram for each image (\u201cbag of visual words\u201d). The size of the visual vocabulary is 500. For global descriptors, we used the color correlogram for image texture representation, and the edge histogram descriptor. Therefore, a visual feature vector is represented in a 540-dimensional space (D = 540). We measure the accuracy of the prediction by the Mean Absolute Error (MAE), which is the average of the absolute deviation between the actual and predicted ratings.\n\n4.2 First Experiment: Evaluating the influence of model order on the prediction accuracy\n\nThis experiment investigates the relationship between the assumed model order, defined by K and M, and the prediction accuracy of VCC-FMM. It should be noticed that the ground truth number of user classes K* is not known for our data set D. We therefore run this experiment on a ground truth (artificial) data set D_GT with known K and M. D_GT is sampled from the preferences P1 and P2 of the two most dissimilar subjects according to Pearson correlation coefficients [14]. We sample ratings for 100 simulated users from the preferences P1 and P2 only on images of four image classes. For each user, we generate 80 ratings (~20 ratings per context). Therefore, the ground truth model order is K* = 2 and M* = 4. The choice of M* is purely motivated by convenience of presentation since similar performance was reported for higher values of M*. 
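The evaluation metric itself is a two-line computation; a small sketch of MAE as used here:

```python
def mean_absolute_error(actual, predicted):
    """Average absolute deviation between actual and predicted ratings."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("rating lists must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```

On a five star scale, lower MAE is better, and an MAE of 1.0 means predictions are off by one star on average.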
We learn the VCC-FMM model using one half of D_GT for different choices of training and validation data. The model order defined by M = 15 and K = 15 is used to initialize the EM algorithm.\n\nFigure 3(a) shows that both K and M have been identified correctly on D_GT, since the lowest MML was reported for the model order defined by M = 4 and K = 2. The selection of the best model order is important since it influences the accuracy of the prediction (MAE), as illustrated by Figure 3(b). It should be noticed that the over-estimation of M (M > M*) leads to more errors than the over-estimation of K (K > K*).\n\n4.3 Second Experiment: Comparison with the state of the art\n\nThe aim of this experiment is to measure the contribution of the visual information and the user's context in making accurate predictions, compared with some existing CF approaches. We make comparisons with the Aspect model [7], Pearson Correlation (PCC) [14], the Flexible Mixture Model (FMM) [15], and User Rating Profiles (URP) [13]. For accurate estimators, we learn the URP model using Gibbs sampling. We retained for the previous algorithms the model order that ensured the lowest MAE.\n\n\f(a) MML   (b) MAE\n\nFigure 3: MML and MAE curves for different model orders on D_GT.\n\nTable 1: Averaged MAE over 10 runs of the different algorithms on D\n\n             PCC (baseline)  Aspect   FMM     URP     V-FMM   V-GD-FMM  VCC-FMM\nAvg MAE      1.327           1.201    1.145   1.116   0.890   0.754     0.646\nDeviation    0.040           0.051    0.036   0.042   0.034   0.027     0.014\nImprovement  0.00%           9.49%    13.71%  15.90%  32.94%  43.18%    55.84%\n\nThe first five columns of table 1 show the added value provided by the visual information compared with pure CF techniques. For example, the improvement in the rating prediction reported by V-FMM is 3.52% and 1.97% comparatively with FMM and URP, respectively. 
The algorithms with context information, shown in the last two columns, have also improved the accuracy of the prediction comparatively with the others (by at least 15.28%). This confirms the importance of the contextual information on user preferences. Feature selection is also important, since VCC-FMM reported a better accuracy (by 14.45%) than V-GD-FMM. Furthermore, figure 4(a) shows that VCC-FMM is less sensitive to data sparsity (number of ratings per user) than pure CF techniques. Finally, the evolution of the average MAE provided by VCC-FMM for different proportions of unrated images remains under 25% for up to 30% of unrated images, as shown in Figure 4(b). We explain the stability of the accuracy of VCC-FMM under data sparsity and new images by the visual information, since only cluster representatives need to be rated.\n\n(a) Data sparsity   (b) New images\n\nFigure 4: MAE curves with error bars on the data set D.\n\n\f5 Conclusions\n\nThis paper has motivated theoretically and empirically the importance of both feature selection and model order identification from unlabeled data as important issues in content-based image suggestion. Experiments on collected data also showed the importance of the visual information and the user's context in making accurate suggestions.\n\nAcknowledgements\n\nThe completion of this research was made possible thanks to the Natural Sciences and Engineering Research Council of Canada (NSERC), Bell Canada's support through its Bell University Laboratories R&D program, and a start-up grant from Concordia University.\n\nReferences\n\n[1] N. Bouguila and D. Ziou. High-Dimensional Unsupervised Selection and Estimation of a Finite Generalized Dirichlet Mixture Model Based on Minimum Message Length. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(10):1716\u20131731, 2007.\n\n[2] S. Boutemedjet, N. Bouguila, and D. Ziou. 
Unsupervised Feature and Model Selection for Generalized Dirichlet Mixture Models. In Proc. of the International Conference on Image Analysis and Recognition (ICIAR), pages 330\u2013341. LNCS 4633, 2007.\n\n[3] S. Boutemedjet and D. Ziou. Content-based Collaborative Filtering Model for Scalable Visual Document Recommendation. In Proc. of the IJCAI-2007 Workshop on Multimodal Information Retrieval, pages 11\u201318, 2007.\n\n[4] S. Boutemedjet and D. Ziou. A Graphical Model for Context-Aware Visual Content Recommendation. IEEE Transactions on Multimedia, 10(1):52\u201362, 2008.\n\n[5] J. G. Dy and C. E. Brodley. Feature Selection for Unsupervised Learning. Journal of Machine Learning Research, 5:845\u2013889, 2004.\n\n[6] I. Guyon and A. Elisseeff. An Introduction to Variable and Feature Selection. Journal of Machine Learning Research, 3:1157\u20131182, 2003.\n\n[7] T. Hofmann. Latent Semantic Models for Collaborative Filtering. ACM Transactions on Information Systems, 22(1):89\u2013115, 2004.\n\n[8] P. Kontkanen, P. Myllym\u00e4ki, T. Silander, H. Tirri, and P. Gr\u00fcnwald. On Predictive Distributions and Bayesian Networks. Statistics and Computing, 10(1):39\u201354, 2000.\n\n[9] M. H. C. Law, M. A. T. Figueiredo, and A. K. Jain. Simultaneous Feature Selection and Clustering Using Mixture Models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(9), 2004.\n\n[10] J. Li and J. Z. Wang. Automatic Linguistic Indexing of Pictures by a Statistical Modeling Approach. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(9):49\u201368, 2003.\n\n[11] D. G. Lowe. Distinctive Image Features From Scale-Invariant Keypoints. International Journal of Computer Vision, 60(2):91\u2013110, 2004.\n\n[12] M. Pazzani, J. Muramatsu, and D. Billsus. Syskill and Webert: Identifying Interesting Web Sites. In Proc. of the 13th National Conference on Artificial Intelligence (AAAI), 1996.\n\n[13] B. Marlin. 
Modeling User Rating Profiles for Collaborative Filtering. In Proc. of Advances in Neural Information Processing Systems 16 (NIPS), 2003.\n\n[14] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl. GroupLens: An Open Architecture for Collaborative Filtering of Netnews. In Proc. of the ACM Conference on Computer Supported Cooperative Work, 1994.\n\n[15] L. Si and R. Jin. Flexible Mixture Model for Collaborative Filtering. In Proc. of the 20th International Conference on Machine Learning (ICML), pages 704\u2013711, 2003.\n\n[16] C. Wallace. Statistical and Inductive Inference by Minimum Message Length. Information Science and Statistics. Springer, 2005.\n", "award": [], "sourceid": 3267, "authors": [{"given_name": "Sabri", "family_name": "Boutemedjet", "institution": null}, {"given_name": "Djemel", "family_name": "Ziou", "institution": null}, {"given_name": "Nizar", "family_name": "Bouguila", "institution": null}]}