{"title": "Spatial Latent Dirichlet Allocation", "book": "Advances in Neural Information Processing Systems", "page_first": 1577, "page_last": 1584, "abstract": "In recent years, the language model Latent Dirichlet Allocation (LDA), which clusters co-occurring words into topics, has been widely appled in the computer vision field. However, many of these applications have difficulty with modeling the spatial and temporal structure among visual words, since LDA assumes that a document is a ``bag-of-words''. It is also critical to properly design ``words'' and \u201cdocuments\u201d when using a language model to solve vision problems. In this paper, we propose a topic model Spatial Latent Dirichlet Allocation (SLDA), which better encodes spatial structure among visual words that are essential for solving many vision problems. The spatial information is not encoded in the value of visual words but in the design of documents. Instead of knowing the partition of words into documents \\textit{a priori}, the word-document assignment becomes a random hidden variable in SLDA. There is a generative procedure, where knowledge of spatial structure can be flexibly added as a prior, grouping visual words which are close in space into the same document. We use SLDA to discover objects from a collection of images, and show it achieves better performance than LDA.", "full_text": "Spatial Latent Dirichlet Allocation\n\nXiaogang Wang and Eric Grimson\n\nComputer Science and Arti\ufb01cial Intelligence Lab\n\nMassachusetts Institute of Technology, Cambridge, MA, 02139, USA\n\nxgwang@csail.mit.edu, welg@csail.mit.edu\n\nAbstract\n\nIn recent years, the language model Latent Dirichlet Allocation (LDA), which\nclusters co-occurring words into topics, has been widely applied in the computer\nvision \ufb01eld. 
However, many of these applications have dif\ufb01culty with modeling\nthe spatial and temporal structure among visual words, since LDA assumes that a\ndocument is a \u201cbag-of-words\u201d. It is also critical to properly design \u201cwords\u201d and\n\u201cdocuments\u201d when using a language model to solve vision problems. In this pa-\nper, we propose a topic model Spatial Latent Dirichlet Allocation (SLDA), which\nbetter encodes spatial structures among visual words that are essential for solving\nmany vision problems. The spatial information is not encoded in the values of\nvisual words but in the design of documents. Instead of knowing the partition of\nwords into documents a priori, the word-document assignment becomes a random\nhidden variable in SLDA. There is a generative procedure, where knowledge of\nspatial structure can be \ufb02exibly added as a prior, grouping visual words which are\nclose in space into the same document. We use SLDA to discover objects from a\ncollection of images, and show it achieves better performance than LDA.\n\n1 Introduction\n\nLatent Dirichlet Allocation (LDA) [1] is a language model which clusters co-occurring words into\ntopics. In recent years, LDA has been widely used to solve computer vision problems. For example,\nLDA was used to discover objects from a collection of images [2, 3, 4] and to classify images into\ndifferent scene categories [5]. [6] employed LDA to classify human actions. In visual surveillance,\nLDA was used to model atomic activities and interactions in a crowded scene [7]. 
In these applications, LDA clustered low-level visual words (which were image patches, spatial and temporal interest points, or moving pixels) into topics with semantic meanings (which corresponded to objects, parts of objects, human actions, or atomic activities) utilizing their co-occurrence information.\nEven with these promising achievements, however, directly borrowing a language model to solve vision problems has some difficulties. First, LDA assumes that a document is a bag of words, so that spatial and temporal structures among visual words, which are meaningless in a language model but important in many computer vision problems, are ignored. Second, users need to define the meaning of \u201cdocuments\u201d in vision problems. The design of documents often implies assumptions about the vision problem. For example, in order to cluster image patches, which are treated as words, into classes of objects, researchers treated images as documents [2]. This assumes that if two types of patches are from the same object class, they often appear in the same images. This assumption is reasonable, but not strong enough. As the example in Figure 1 shows, even though sky is far from vehicles, if they often appear in the same images in some data set, they would be clustered into the same topic by LDA. Furthermore, since in this image most of the patches are sky and building, a patch on a vehicle is likely to be labeled as building or sky as well.\n\n\fFigure 1: There will be some problems (see text) if the whole image is treated as one document when using LDA to discover classes of objects.\n\nThese problems could be solved if the document of a patch, such as the yellow patch in Figure 1, only includes other patches falling within its neighborhood, marked by the red dashed window in Figure 1, instead of the whole image. 
So a better assumption is that if two types of image patches are from the same object class, they are not only often in the same images but also close in space. We expect to utilize spatial information in a flexible way when designing documents for solving vision problems.\nIn this paper, we propose a Spatial Latent Dirichlet Allocation (SLDA) model which encodes the spatial structure among visual words. It clusters visual words (e.g., an eye patch and a nose patch) which often occur in the same images and are close in space into one topic (e.g., face). This is a more appropriate assumption for solving many vision problems, since images often contain several objects. It is also easy for SLDA to model activities and human actions by encoding temporal information. However, the spatial or temporal information is not encoded in the values of visual words, but in the design of documents. LDA and its extensions, such as the author-topic model [8], the dynamic topic model [9], and the correlated topic model [10], all assume that the partition of words into documents is known a priori. A key difference of SLDA is that the word-document assignment becomes a hidden random variable. There is a generative procedure to assign words to documents. When visual words are close in space or time, they have a high probability of being grouped into the same document. Some approaches such as [11, 3, 12, 4] could also capture some spatial structures among visual words. [11] assumed that the spatial distribution of an object class could be modeled as a Gaussian and that the number of objects in the image was known. Both [3] and [4] first roughly segmented images using graph cuts and added spatial constraints using these segments. 
[12]\nmodeled the spatial dependency among image patches as Markov random \ufb01elds.\nAs an example application, we use the SLDA model to discover objects from a collection of images.\nAs shown in Figure 2, there are different classes of objects, such as cows, cars, faces, grasses,\nsky, bicycles, etc., in the image set. And an image usually contains several objects of different\nclasses. The goal is to segment objects from images, and at the same time, to label these segments\nas different object classes in an unsupervised way. It integrates object segmentation and recognition.\nIn our approach images are divided into local patches. A local descriptor is computed for each\nimage patch and quantized into a visual word. Using topic models, the visual words are clustered\ninto topics which correspond to object classes. Thus an image patch can be labeled as one of the\nobject classes. Our work is related to [2] which used LDA to cluster image patches. As shown in\nFigure 2, SLDA achieves much better performance than LDA. We will compare more results of\nLDA and SLDA in the experimental section.\n\n2 Computation of Visual Words\n\nTo obtain the local descriptors, images are convolved with the \ufb01lter bank proposed in [13], which is\na combination of 3 Gaussians, 4 Laplacian of Gaussians, and 4 \ufb01rst order derivatives of Gaussians,\nand was shown to have good performance for object categorization.\nInstead of only computing\nvisual words at interest points as in [2], we divide an image into local patches on a grid and densely\nsample a local descriptor for each patch. A codebook of size W is created by clustering all the\nlocal descriptors in the image set using K-means. Each local patch is quantized into a visual word\naccording to the codebook. In the next step, these visual words (image patches) will be further\nclustered into classes of objects. 
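The codebook construction and quantization step described above can be sketched as follows. This is a minimal illustration using a plain k-means written with NumPy; the descriptor dimension, codebook size, and all function names are illustrative assumptions, not the paper's actual settings.

```python
import numpy as np

# A minimal sketch of the Section 2 pipeline: densely sampled patch
# descriptors are clustered with k-means into a codebook of size W, and
# each patch is then quantized to the index of its nearest codeword
# (its "visual word"). Sizes and names here are illustrative assumptions.

def build_codebook(descriptors, W, n_iter=10, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize codewords from randomly chosen descriptors.
    codebook = descriptors[rng.choice(len(descriptors), W, replace=False)]
    for _ in range(n_iter):
        # Assign every descriptor to its nearest codeword.
        d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        assign = d2.argmin(1)
        # Move each codeword to the mean of its assigned descriptors.
        for k in range(W):
            members = descriptors[assign == k]
            if len(members) > 0:
                codebook[k] = members.mean(0)
    return codebook

def quantize(descriptors, codebook):
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d2.argmin(1)  # one visual word index per patch

rng = np.random.default_rng(1)
descs = rng.normal(size=(500, 11))      # stand-in filter-bank responses
codebook = build_codebook(descs, W=20)
words = quantize(descs, codebook)       # visual word per patch, in [0, 20)
```

In practice the filter-bank responses of [13] would replace the random descriptors, but the cluster-then-quantize structure is the same.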
We will compare two clustering methods, LDA and SLDA.\n\n\fFigure 2: Given a collection of images as shown in the first row (selected from the MSRC image dataset [13]), the goal is to segment images into objects and cluster these objects into different classes. The second row uses manual segmentation and labeling as ground truth. The third row is the LDA result and the fourth row is the SLDA result. Within the same labeling method, image patches marked in the same color are in one object cluster, but the meaning of colors changes across different labeling methods.\n\n3 LDA\n\nWhen LDA is used to solve our problem, we treat local patches of images as words and the whole image as a document. The graphical model of LDA is shown in Figure 3 (a). There are M documents (images) in the corpus. Each document j has Nj words (image patches). wji is the observed value of word i in document j. All the words in the corpus will be clustered into K topics (classes of objects). Each topic k is modeled as a multinomial distribution over the codebook. \u03b1 and \u03b2 are Dirichlet prior hyperparameters. \u03c6k, \u03b8j, and zji are hidden variables to be inferred. The generative process of LDA is:\n\n1. For a topic k, a multinomial parameter \u03c6k is sampled from Dirichlet prior \u03c6k \u223c Dir(\u03b2).\n2. For a document j, a multinomial parameter \u03b8j over the K topics is sampled from Dirichlet prior \u03b8j \u223c Dir(\u03b1).\n3. For a word i in document j, a topic label zji is sampled from discrete distribution zji \u223c Discrete(\u03b8j).\n4. 
The value wji of word i in document j is sampled from the discrete distribution of topic zji, wji \u223c Discrete(\u03c6zji).\n\nzji can be sampled through a Gibbs sampling procedure which integrates out \u03b8j and \u03c6k [14]:\n\np(z_{ji} = k \\mid z_{-ji}, \\mathbf{w}, \\alpha, \\beta) \\propto \\frac{n^{(k)}_{-ji, w_{ji}} + \\beta_{w_{ji}}}{\\sum_{w=1}^{W} n^{(k)}_{-ji, w} + \\beta_{w}} \\cdot \\frac{n^{(j)}_{-ji, k} + \\alpha_{k}}{\\sum_{k'=1}^{K} n^{(j)}_{-ji, k'} + \\alpha_{k'}}   (1)\n\nwhere n^{(k)}_{-ji, w} is the number of words in the corpus with value w assigned to topic k, excluding word i in document j, and n^{(j)}_{-ji, k} is the number of words in document j assigned to topic k, excluding word i in document j. Eq 1 is the product of two ratios: the probability of word wji under topic k and the probability of topic k in document j. So LDA clusters visual words that often co-occur in the same images into one object class.\nAs shown by the examples in Figure 2 (see more results in the experimental section), there are two problems in using LDA for object segmentation and recognition.\n\n\fFigure 3: Graphical model of LDA (a) and SLDA (b). See text for details.\n\nThe segmentation result is noisy since spatial information is not considered. Although LDA assumes that one image contains multiple topics, from experimental results we observe that the patches in the same image are likely to have the same labels. Since the whole image is treated as one document, if one object class, e.g. car in Figure 2, is dominant in the image, the second ratio in Eq 1 will lead to a large bias towards the car class, and thus the patches of street are also likely to be labeled as car. 
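For concreteness, the collapsed Gibbs update of Eq 1 can be sketched in code. This is a hedged illustration of the standard collapsed Gibbs step for LDA; the count arrays, sizes, and names are hypothetical.

```python
import numpy as np

# A hedged sketch of the collapsed Gibbs update in Eq 1 (standard LDA
# sampling). n_kw[k, w]: words of value w assigned to topic k; n_jk[j, k]:
# words in document j assigned to topic k; both exclude the word being
# resampled. All names and sizes are hypothetical.

def sample_topic(w, j, n_kw, n_jk, alpha, beta, rng):
    # First ratio of Eq 1: probability of word value w under each topic.
    p_w = (n_kw[:, w] + beta[w]) / (n_kw.sum(axis=1) + beta.sum())
    # Second ratio of Eq 1: probability of each topic in document j.
    p_k = (n_jk[j] + alpha) / (n_jk[j].sum() + alpha.sum())
    p = p_w * p_k
    return rng.choice(len(p), p=p / p.sum())

rng = np.random.default_rng(0)
K, W, M = 3, 5, 2                      # topics, codebook size, documents
alpha = np.full(K, 0.5)
beta = np.full(W, 0.1)
n_kw = rng.integers(0, 10, size=(K, W)).astype(float)
n_jk = rng.integers(0, 10, size=(M, K)).astype(float)
z = sample_topic(w=2, j=0, n_kw=n_kw, n_jk=n_jk, alpha=alpha, beta=beta, rng=rng)
```

The second ratio is the source of the bias discussed above: when one topic dominates a document, `p_k` pulls every word in that document towards it.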
This problem could be solved if a local patch only considers its neighboring patches as being in the same document.\n\n4 SLDA\n\nWe assume that if visual words are from the same class of objects, they not only often co-occur in the same images but also are close in space. So we try to group image patches which are close in space into the same documents. One straightforward way is to divide the image into regions as shown in Figure 4 (a). Each region is treated as a document instead of the whole image. However, since these regions do not overlap, some patches, such as A (red patch) and B (cyan patch) in Figure 4 (a), even though very close in space, are assigned to different documents. In Figure 4 (a), patch A on the cow is likely to be labeled as grass, since most other patches in its document are grass. To solve this problem, we may place many overlapping regions, each of which is a document, on the images as shown in Figure 4 (b). If a patch is inside a region, it \u201ccould\u201d belong to that document. Any two patches whose distance is smaller than the region size \u201ccould\u201d belong to the same document if the regions are placed densely enough. We use the word \u201ccould\u201d because each local patch is covered by several regions, so we have to decide to which document it belongs. Unlike the LDA model, in which the word-document relationship is known a priori, we need a generative procedure assigning words to documents. If two patches are closer in space, they have a higher probability of being assigned to the same document, since there are more regions covering both of them. Actually we can go even further. As shown in Figure 4 (c), each document can be represented by a point (marked by a magenta circle) in the image, assuming its region covers the whole image. If an image patch is close to a document, it has a high probability of being assigned to that document.\nThe graphical model is shown in Figure 3 (b). 
In SLDA, there are M documents and N words in the corpus. A hidden variable di indicates which document word i is assigned to. For each document j there is a hyperparameter c^d_j = (g^d_j, x^d_j, y^d_j) known a priori. g^d_j is the index of the image where document j is placed and (x^d_j, y^d_j) is the location of the document. For a word i, in addition to the observed word value wi, its location (xi, yi) and image index gi are also observed and stored in variable ci = (gi, xi, yi).\n\n\fFigure 4: There are several ways to add spatial information among image patches when designing documents. (a): Divide the image into non-overlapping regions. Each region, marked by a dashed window, corresponds to a document. Image patches inside the region are assigned to the corresponding document. (b): Densely place overlapping regions over the images. One image patch is covered by multiple regions. (c): Each document is associated with a point (marked in magenta). These points are densely placed over the image. If an image patch is close to a document, it has a high probability of being assigned to that document.\n\nThe generative procedure of SLDA is:\n\n1. For a topic k, a multinomial parameter \u03c6k is sampled from Dirichlet prior \u03c6k \u223c Dir(\u03b2).\n2. For a document j, a multinomial parameter \u03b8j over the K topics is sampled from Dirichlet prior \u03b8j \u223c Dir(\u03b1).\n3. For a word (image patch) i, a random variable di is sampled from prior p(di | \u03b7) indicating to which document word i is assigned. We choose p(di | \u03b7) as a uniform prior.\n4. 
The image index and location of word i is sampled from the distribution p(ci | c^d_{di}, \u03c3). We choose this as a Gaussian kernel:\n\np((g_i, x_i, y_i) \\mid (g^d_{d_i}, x^d_{d_i}, y^d_{d_i}), \\sigma) \\propto \\delta_{g^d_{d_i}}(g_i) \\exp\\left(-\\frac{(x^d_{d_i} - x_i)^2 + (y^d_{d_i} - y_i)^2}{\\sigma^2}\\right)\n\np(ci | c^d_{di}, \u03c3) = 0 if the word and the document are not in the same image.\n\n5. The topic label zi of word i is sampled from the discrete distribution of document di, zi \u223c Discrete(\u03b8di).\n6. The value wi of word i is sampled from the discrete distribution of topic zi, wi \u223c Discrete(\u03c6zi).\n\n4.1 Gibbs Sampling\n\nzi and di can be sampled through a Gibbs sampling procedure integrating out \u03c6k and \u03b8j. In SLDA the conditional distribution of zi given di is the same as in LDA:\n\np(z_i = k \\mid d_i = j, d_{-i}, z_{-i}, \\mathbf{w}, \\alpha, \\beta) \\propto \\frac{n^{(k)}_{-i, w_i} + \\beta_{w_i}}{\\sum_{w=1}^{W} n^{(k)}_{-i, w} + \\beta_{w}} \\cdot \\frac{n^{(j)}_{-i, k} + \\alpha_{k}}{\\sum_{k'=1}^{K} n^{(j)}_{-i, k'} + \\alpha_{k'}}   (2)\n\nwhere n^{(k)}_{-i, w} is the number of words in the corpus with value w assigned to topic k excluding word i, and n^{(j)}_{-i, k} is the number of words in document j assigned to topic k excluding word i. This is\n
This is\n\nwhere n(k)\ni, and n(j)\neasy to understand since if the word-document assignment is \ufb01xed, SLDA is the same as LDA.\nIn addition, we also need to sample di from the conditional distribution given zi.\n\nj(cid:31)(cid:125) (cid:44) (cid:31)(cid:44) \u03c6(cid:44) \u03c3(cid:44) (cid:26)(cid:27)\n\ni(cid:124) di = j(cid:44) d(cid:31)\n\ni(cid:44) (cid:31))\n\ni(cid:44) d(cid:31)\n\ni(cid:44) ci(cid:44)(cid:123) cd\n\ni(cid:44) (cid:31)) is obtained by integrating out (cid:28) j(cid:31).\n\np(cid:28)di = j(cid:124) zi = k(cid:44) z(cid:31)\n(cid:31) p (di = j(cid:124) \u03c3) p(cid:28)ci(cid:124) cd\nj (cid:44) (cid:26)(cid:27)p (zi = k(cid:44) z(cid:31)\n(cid:23)\nM(cid:24)\n(cid:30)(cid:31)K\nM(cid:24)\n(cid:22) K\n\nk(cid:31)=1 (cid:31)k(cid:31)\nk(cid:31)=1 (cid:31)((cid:31)k(cid:31))\n\ni(cid:44) (cid:31)) =\n\n(cid:29)\n\nj(cid:31)=1\n\nj(cid:31)=1\n\n(cid:31)\n\n=\n\np((cid:28) j(cid:31)(cid:124) (cid:31))p(zj(cid:31)(cid:124) (cid:28) ji)d(cid:28) j(cid:31)\n\n(cid:30)\n(cid:22) K\n(cid:30)(cid:31)K\nk(cid:31) +(cid:31)K\nk(cid:31)=1 (cid:31)\nk(cid:31)=1 n(j(cid:31))\n\nn(j(cid:31))\nk(cid:31) + (cid:31)k(cid:31)\n\n(cid:29)\n\nk(cid:31)=1 (cid:31)k(cid:31)\n\n(cid:29)(cid:46)\n\n(cid:31)\n\np (zi = k(cid:44) z(cid:31)\n\ni(cid:124) di = j(cid:44) d(cid:31)\n\np (zi = k(cid:44) z(cid:31)\n\ni(cid:124) di = j(cid:44) d(cid:31)\n\n5\n\n(cid:31)\n(cid:31)\n(cid:31)\n(cid:31)\n(cid:31)\n(cid:31)\n(cid:31)\n(cid:183)\n\ftional distribution of di is\n\nj , \u03c3(cid:1) as a Gaussian kernel. 
Thus the condi-\nWe choose p (di = j|\u03b7) as a uniform prior and p(cid:0)ci|cd\np(cid:0)di = j|zi = k, z\u2212i, d\u2212i, ci,{cd\nj(cid:48)}, \u03b1, \u03b2, \u03b7, \u03c3(cid:1)\n(cid:80)K\n\nn(j)\u2212i,k + \u03b1k\n\n\u2212(xd\n\nj \u2212xi)2\n\n\u221d \u03b4gd\n\nj\n\n(gi) \u00b7 e\n\nj \u2212yi)2\n\n+(yd\n\u03c32\n\n\u00b7\n\n(3)\n\n(cid:16)\n\n(cid:17)\n\nk(cid:48)=1\n\nn(j)\u2212i,k(cid:48) + \u03b1k(cid:48)\n\nWord i is likely to be assigned to document j if they are in the same image, close in space and word\ni has the same topic label as other words in document j. In real applications, we only care about the\ndistribution of zi while dj can be marginalized by simply ignoring its samples. From Eq 2 and 3,\nwe observed that a word tends to have the same topic label as other words in its document and words\ncloser in space are more likely to be assigned to the same documents. So essentially under SLDA a\nword tends to be labeled as the same topic as other words close to it. This satis\ufb01es our assumption\nthat visual words from the same object class are closer in space.\nSince we densely place many documents over one image, during Gibbs sampling some documents\nare only assigned a few words and the distributions cannot be well estimated. To solve this problem\nwe replicate each image patch to get many particles. These particles have the same word value and\nlocation but can be assigned to different documents and have different labels. Thus each document\nwill have enough samples of words to estimate the distributions.\n\n4.2 Discussion\n\nSLDA is a \ufb02exible model intended to encode spatial structure among image patches and design\ndocuments. If there is only one document placed over one image, SLDA simply reduces to LDA.\nIf p(ci|cd\nj ) is an uniform distribution inside a local region, SLDA implements the scheme described\nin Figure 4 (b). If these local regions are not overlapped, it is the case of Figure 4 (a). 
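The document-resampling step of Eq 3 can be sketched as follows. This is a minimal illustration combining the same-image delta term, the Gaussian kernel term, and the topic-compatibility ratio; the document placements, names, and sizes are hypothetical assumptions.

```python
import numpy as np

# A hedged sketch of the document-resampling step in Eq 3. doc_pos[j] =
# (g, x, y) gives the image index and location of document j; the word
# being resampled has image index g_i, location (x_i, y_i), and current
# topic label k. All placements, names, and sizes are hypothetical.

def sample_document(g_i, x_i, y_i, k, doc_pos, n_jk, alpha, sigma, rng):
    same_image = (doc_pos[:, 0] == g_i)          # delta term: zero outside image
    d2 = (doc_pos[:, 1] - x_i) ** 2 + (doc_pos[:, 2] - y_i) ** 2
    spatial = np.exp(-d2 / sigma ** 2)           # Gaussian kernel term
    # Topic-compatibility term: how much of document j is already topic k.
    topic = (n_jk[:, k] + alpha[k]) / (n_jk.sum(axis=1) + alpha.sum())
    p = same_image * spatial * topic
    return rng.choice(len(p), p=p / p.sum())

rng = np.random.default_rng(0)
M, K = 6, 4                                      # documents, topics
doc_pos = np.array([[0, 10, 10], [0, 30, 10], [0, 50, 50],
                    [1, 10, 10], [1, 30, 30], [1, 50, 50]], dtype=float)
n_jk = rng.integers(0, 5, size=(M, K)).astype(float)
alpha = np.full(K, 0.5)
d = sample_document(g_i=0, x_i=12, y_i=11, k=2, doc_pos=doc_pos,
                    n_jk=n_jk, alpha=alpha, sigma=20.0, rng=rng)
```

Nearby documents in the same image get most of the probability mass, which is exactly what pulls spatially close patches into shared documents.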
There are also other possible ways to add spatial information by choosing different spatial priors p(ci | c^d_j). In SLDA, the spatial information is used when designing documents. However, the object class model \u03c6k, simply a multinomial distribution over the codebook, has no spatial structure. So the objects of a class could have any shape and appear anywhere in the images, as long as they distribute smoothly in space. By simply adding a time stamp to ci and c^d_j, it is easy for SLDA to encode temporal structure among visual words. So SLDA can also be applied to human action and activity analysis.\n\n5 Experiments\n\nWe test LDA and SLDA on the MSRC image dataset [13] with 240 images. Our codebook size is 200 and the topic number is 15. In Figure 2, we show some examples of results using LDA and SLDA. Colors are used to indicate different topics. The results of LDA are noisy, and within one image most of the patches are labeled as one topic. SLDA achieves much better results than LDA: the results are smoother and objects are well segmented. The detection rate and false alarm rate of four classes (cows, cars, faces, and bicycles) are shown in Table 1; they are counted in pixels. We use the manual segmentation and labeling in [13] as ground truth.\nThe two models are also tested on a tiger video sequence with 252 frames. We treat all the frames in the sequence as an image collection and ignore their temporal order. Figure 5 shows their results on two sampled frames. Please see the result on the whole video sequence on our website [15]. Using LDA, there are usually one or two dominant topics distributed like noise in a frame. Topics change as the video background changes. LDA cannot segment out any objects. SLDA clusters image patches into tigers, rock, water, and grass. 
If we choose the topic of tiger, as shown in the last row of Figure 5, all the tigers in the video can be segmented out.\n\n6 Conclusion\n\nWe propose a novel Spatial Latent Dirichlet Allocation model which clusters co-occurring and spatially neighboring visual words into the same topic. Instead of knowing the word-document assignment a priori, SLDA has a generative procedure partitioning visual words which are close in space into the same documents. It is also easy to extend SLDA to include temporal information.\n\n\fFigure 5: Discovering objects from a video sequence. The first column shows two frames in the video sequence. In the second column, we label the patches in the two frames as different topics using LDA. The third column plots the topic labels using SLDA. The red color indicates the topic of tigers. In the fourth column, we segment tigers out by choosing the topic marked in red.\n\nTable 1: Detection (D) rate and False Alarm (FA) rate of LDA and SLDA on the MSRC data set\n\n          cows    cars    faces   bicycles\nLDA(D)    0.3755  0.5552  0.7172  0.5563\nSLDA(D)   0.5662  0.6838  0.6973  0.5661\nLDA(FA)   0.5576  0.3963  0.5862  0.5285\nSLDA(FA)  0.0334  0.2437  0.3714  0.4217\n\n7 Acknowledgement\n\nThe authors wish to acknowledge DSO National Laboratory of Singapore for partially supporting this research.\n\nReferences\n\n[1] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993\u20131022, 2003.\n[2] J. Sivic, B. C. Russell, A. A. Efros, A. Zisserman, and W. T. Freeman. Discovering object categories in image collections. In Proc. ICCV, 2005.\n[3] B. C. Russell, A. A. Efros, J. Sivic, W. T. Freeman, and A. Zisserman. Using multiple segmentations to discover objects and their extent in image collections. In Proc. CVPR, 2006.\n[4] L. Cao and L. Fei-Fei. Spatially coherent latent topic model for concurrent object segmentation and classification. In Proc. 
ICCV, 2007.\n[5] L. Fei-Fei and P. Perona. A Bayesian hierarchical model for learning natural scene categories. In Proc. CVPR, 2005.\n[6] J. C. Niebles, H. Wang, and L. Fei-Fei. Unsupervised learning of human action categories using spatial-temporal words. In Proc. BMVC, 2006.\n[7] X. Wang, X. Ma, and E. Grimson. Unsupervised activity perception by hierarchical Bayesian models. In Proc. CVPR, 2007.\n[8] M. Rosen-Zvi, T. Griffiths, M. Steyvers, and P. Smyth. The author-topic model for authors and documents. In Proc. of Uncertainty in Artificial Intelligence, 2004.\n[9] D. Blei and J. Lafferty. Dynamic topic models. In Proc. ICML, 2006.\n[10] D. Blei and J. Lafferty. Correlated topic models. In Proc. NIPS, 2006.\n[11] E. B. Sudderth, A. Torralba, W. T. Freeman, and A. S. Willsky. Learning hierarchical models of scenes, objects, and parts. In Proc. ICCV, 2005.\n[12] J. Verbeek and B. Triggs. Region classification with Markov field aspect models. In Proc. CVPR, 2007.\n[13] J. Winn, A. Criminisi, and T. Minka. Object categorization by learned universal visual dictionary. In Proc. ICCV, 2005.\n[14] T. Griffiths and M. Steyvers. Finding scientific topics. Proceedings of the National Academy of Sciences, 2004.\n[15] http://people.csail.mit.edu/xgwang/slda.html.\n\n\fFigure 6: Examples of experimental results on the MSRC image data set. (a): original images; (b): LDA results; (c): SLDA results.\n", "award": [], "sourceid": 102, "authors": [{"given_name": "Xiaogang", "family_name": "Wang", "institution": null}, {"given_name": "Eric", "family_name": "Grimson", "institution": null}]}