{"title": "Group Anomaly Detection using Flexible Genre Models", "book": "Advances in Neural Information Processing Systems", "page_first": 1071, "page_last": 1079, "abstract": "An important task in exploring and analyzing real-world data sets is to detect unusual and interesting phenomena. In this paper, we study the group anomaly detection problem. Unlike traditional anomaly detection research that focuses on data points, our goal is to discover anomalous aggregated behaviors of groups of points. For this purpose, we propose the Flexible Genre Model (FGM). FGM is designed to characterize data groups at both the point level and the group level so as to detect various types of group anomalies. We evaluate the effectiveness of FGM on both synthetic and real data sets including images and turbulence data, and show that it is superior to existing approaches in detecting group anomalies.", "full_text": "Group Anomaly Detection using Flexible Genre Models\n\nLiang Xiong\nMachine Learning Department, Carnegie Mellon University\nlxiong@cs.cmu.edu\n\nBarnab\u00e1s P\u00f3czos\nRobotics Institute, Carnegie Mellon University\nbapoczos@cs.cmu.edu\n\nJeff Schneider\nRobotics Institute, Carnegie Mellon University\nschneide@cs.cmu.edu\n\nAbstract\n\nAn important task in exploring and analyzing real-world data sets is to detect unusual and interesting phenomena. In this paper, we study the group anomaly detection problem. Unlike traditional anomaly detection research that focuses on data points, our goal is to discover anomalous aggregated behaviors of groups of points. For this purpose, we propose the Flexible Genre Model (FGM). FGM is designed to characterize data groups at both the point level and the group level so as to detect various types of group anomalies. 
We evaluate the effectiveness of FGM on both synthetic and real data sets, including images and turbulence data, and show that it is superior to existing approaches in detecting group anomalies.\n\n1 Introduction\n\nAnomaly detection is a crucial problem in processing large-scale data sets when our goal is to find rare or unusual events. These events can either be outliers that should be ignored or novel observations that could lead to new discoveries. See [1] for a recent survey of this field. Traditional research often focuses on individual data points. In this paper, however, we are interested in finding group anomalies, where a set of points together exhibits unusual behavior. For example, consider text data where each article is considered to be a set (group) of words (points). While the phrases \u201cmachine learning\u201d or \u201cgummy bears\u201d will not surprise anyone on their own, an article containing both of them might be interesting.\n\nWe consider two types of group anomalies. A point-based group anomaly is a group of individually anomalous points. A distribution-based anomaly is a group where the points are relatively normal, but as a whole they are unusual. Most existing work on group anomaly detection focuses on point-based anomalies. A common way to detect point-based anomalies is to first identify anomalous points and then find their aggregations using scanning or segmentation methods [2, 3, 4]. This paradigm clearly does not work well for distribution-based anomalies, where the individual points are normal. To handle distribution-based anomalies, we can design features for groups and then treat them as points [5, 6]. However, this approach relies on feature engineering that is domain specific and can be difficult. Our contribution is a new method (FGM) that detects both types of group anomalies in a unified way.\n\nGroup anomalies exist in many real-world problems. 
In astronomical studies, modern telescope pipelines (for example, the Sloan Digital Sky Survey (SDSS), http://www.sdss.org) produce descriptions of a vast number of celestial objects. Given these data, we want to pick out scientifically valuable objects such as planetary nebulae, or special clusters of galaxies that could shed light on the development of the universe [7]. In physics, researchers often simulate the motion of particles or fluids. In these systems, a single particle is seldom interesting, but a group of particles can exhibit interesting motion patterns such as the interweaving of vortices. Other examples abound in computer vision, text processing, time series, and spatial data analysis.\n\nWe take a generative approach to this problem. If we have a model that generates normal data, then we can mark the groups that have small probabilities under this model as anomalies. Here we make the \u201cbag-of-points\u201d assumption, i.e., points in the same group are unordered and exchangeable. Under this assumption, mixture models are often used to generate the data, a choice justified by De Finetti\u2019s theorem [8]. The best-known class of mixture models for group data is the family of topic models [9, 10]. In topic models, the distributions of points in different groups are mixtures of components (\u201ctopics\u201d) that are shared among all the groups.\n\nOur proposed method is closely related to topic models, but it is designed specifically for detecting group anomalies. We use two levels of concepts (latent variables) to describe a group. At the group level, a flexible structure based on \u201cgenres\u201d is used to characterize the topic distributions so that complex normal behaviors are allowed and can be recognized. 
At the point level, each group has its own topics to accommodate and capture the variations of the points\u2019 distributions (while global topic information is still shared among groups). We call this model the Flexible Genre Model (FGM). Given a group of points, we can examine whether or not it conforms to the normal behavior defined by the learned genres and topics. We also propose scoring functions that can detect both point-based and distribution-based group anomalies. Exact inference and learning in FGM are intractable, so we resort to approximate methods. Inference is done by Gibbs sampling [11], which is efficient and simple to implement thanks to the use of conjugate distributions. Single-sample Monte Carlo EM [12] is used to learn the parameters from samples produced by the Gibbs sampler.\n\nWe demonstrate the effectiveness of FGM on synthetic data and on real-world data sets, including scene images and turbulence data. Empirical results show that FGM is superior to existing approaches in finding group anomalies.\n\nThe paper is structured as follows. In Section 2, we review related work, discuss the limitations of existing algorithms, and explain why a new method is needed for group anomaly detection. Section 3 introduces our proposed model. Learning and inference are explained in Section 4. Section 5 describes how to use our method for group anomaly detection. Experimental results are shown in Section 6. We finish the paper by drawing conclusions (Section 7).\n\n2 Background and Related Work\n\nIn this section, we provide background on topic models and explain the limitations of existing methods in detecting group anomalies. For intuition, we introduce the problem in the context of detecting anomalous images, rare galaxy clusters, and unusual motion in a dynamic fluid simulation.\n\nWe consider a data set with $M$ predefined groups $G_1, \\dots, G_M$ (e.g., 
spatial clusters of galaxies, patches in an image, or fluid motions in a local region). Group $G_m$ contains $N_m$ points (galaxies, image patches, simulation grid points). The features of these points are denoted by $\\mathbf{x}_m = \\{x_{m,n} \\in \\mathbb{R}^f\\}_{n=1,\\dots,N_m}$, where $f$ is the dimensionality of the points\u2019 features. These would be spectral features of each galaxy, SIFT features of each image patch, or velocities at each grid point of a simulation. We assume that points in the same group are unordered and exchangeable. Given these data, we ask whether the distribution of features $\\mathbf{x}_m$ in group $G_m$ looks anomalous.\n\nTopic models such as Latent Dirichlet Allocation (LDA) [10] are widely used to model data with this kind of group structure. The original LDA model was proposed for text processing. It represents the distribution of points (words) in a group (document) as a mixture of $K$ global topics $\\beta_1, \\dots, \\beta_K$, each of which is a distribution (i.e., $\\beta_i \\in \\mathcal{S}_f$, where $\\mathcal{S}_f$ is the $f$-dimensional probability simplex). Let $\\mathcal{M}(\\theta)$ be the multinomial distribution parameterized by $\\theta \\in \\mathcal{S}_K$ and $\\mathrm{Dir}(\\alpha)$ be the Dirichlet distribution with parameter $\\alpha \\in \\mathbb{R}_+^K$. LDA generates the $m$th group by first drawing its topic distribution $\\theta_m$ from the prior distribution $\\mathrm{Dir}(\\alpha)$. Then, for each point $x_{mn}$ in the $m$th group, it draws one of the $K$ topics from $\\mathcal{M}(\\theta_m)$ (i.e., $z_{mn} \\sim \\mathcal{M}(\\theta_m)$) and then generates the point according to this topic ($x_{mn} \\sim \\mathcal{M}(\\beta_{z_{mn}})$).\n\nIn our examples, the topics can represent galaxy types (e.g., \u201cblue\u201d, \u201cred\u201d, or \u201cemissive\u201d, with $K = 3$), image features (e.g., edge detectors representing various orientations), or common motion patterns in the fluid (fast left, slow right, etc.). Each point in the group has its own topic. We consider points that have multidimensional continuous feature vectors. 
In this case, topics can be modeled by Gaussian distributions, and each point is generated from one of the $K$ Gaussian topics. At a higher level, a group is characterized by its distribution of topics $\\theta_m$, i.e., the proportions of the different types in group $G_m$. The concepts of topic and topic distribution help us define group anomalies: a point-based anomaly contains points that do not belong to any of the normal topics, and a distribution-based anomaly has a topic distribution $\\theta_m$ that is uncommon.\n\nAlthough topic models are very useful for estimating the topics and topic distributions in groups, existing methods cannot detect group anomalies comprehensively. To detect anomalies, the model should be flexible enough to capture complex normal behaviors. For example, it should be able to model a complex, multimodal distribution over the topic distributions $\\theta$. LDA, however, uses only a single Dirichlet distribution to generate topic distributions and therefore cannot effectively define what normal and abnormal topic distributions are. It also uses the same $K$ topics for every group, which makes groups indistinguishable when looking at their topics. Moreover, these shared topics are not adapted to the individual groups.\n\nThe Mixture of Gaussian Mixture Model (MGMM) [13] was the first to use topic modeling for group anomaly detection. It allows groups to select their topic distributions from a dictionary of multinomials, which is learned from data to define what is normal. Keller and Bengio [14] employed the same idea but did not apply their model to anomaly detection. The problem with using multinomials is that they do not account for the uncertainty of topic distributions. The Theme Model (ThM) [15] lets a mixture of Dirichlets generate the topic distributions and then uses the memberships in this mixture to cluster the groups. 
This idea is useful for modeling group-level behaviors but fails to capture anomalous point-level behaviors, because the topics are still shared globally in the same way as in LDA. In contrast, Doyle and Elkan [16] proposed using different topics for different groups to account for the burstiness of words (points). These adaptive topics are useful for recognizing point-level anomalies but cannot be used to detect anomalous behavior at the group level. For the group anomaly detection problem, we propose a new method, the Flexible Genre Model, and demonstrate that it copes with the issues mentioned above and performs better than existing state-of-the-art algorithms.\n\n3 Model Specification\n\nThe Flexible Genre Model (FGM) extends LDA so that the generating processes of topics and topic distributions can model more complex distributions. To achieve this goal, we add two key components. (i) To model the behavior of topic distributions, we use several \u201cgenres\u201d, each of which is a typical distribution of topic distributions. (ii) We use \u201ctopic generators\u201d to generate adaptive topics for different groups; we also use them to learn how the normal topics are generated. The generative process of FGM is presented in Algorithm 1, and a graphical representation of FGM is given in Figure 1.\n\nAlgorithm 1 Generative process of FGM\n\nfor groups m = 1 to M do\n  \u2022 Draw a genre $y_m \\sim \\mathcal{M}(\\pi)$, where $y_m \\in \\{1, \\dots, T\\}$.\n  \u2022 Draw a topic distribution according to the genre $y_m$: $\\theta_m \\sim \\mathrm{Dir}(\\alpha_{y_m})$, where $\\theta_m \\in \\mathcal{S}_K$.\n  \u2022 Draw $K$ topics $\\{\\beta_{m,k} \\sim P(\\beta_{m,k} \\mid \\eta_k)\\}_{k=1,\\dots,K}$.\n  for points n = 1 to $N_m$ do\n    \u2022 Draw a topic membership $z_{m,n} \\sim \\mathcal{M}(\\theta_m)$, where $z_{m,n} \\in \\{1, \\dots, K\\}$.\n    \u2022 Generate a point $x_{m,n} \\sim P(x_{m,n} \\mid \\beta_{m,z_{mn}})$ (the topic $\\beta_{m,z_{mn}}$ is active).\n  end for\nend for\n\nWe assume there are $T$ genres and $K$ topics. $\\mathcal{M}(\\pi)$ denotes the global distribution of genres. 
Each genre is a Dirichlet distribution for generating the topic distributions, and $\\alpha = \\{\\alpha_t\\}_{t=1,\\dots,T}$ is the set of genre parameters. Each group has $K$ topics $\\beta_m = \\{\\beta_{m,k}\\}_{k=1,\\dots,K}$. The \u201ctopic generators\u201d $\\{P(\\cdot \\mid \\eta_k)\\}_{k=1,\\dots,K}$, with parameters $\\eta = \\{\\eta_k\\}$, are the global distributions for generating the corresponding topics. Given the topic distribution $\\theta_m$ and the topics $\\{\\beta_{m,k}\\}$, points are generated as in LDA.\n\nComparing FGM to LDA makes the advantages of FGM evident. (i) In FGM, each group has a latent genre attribute $y_m$, which determines what the topic distribution in this group should look like ($\\mathrm{Dir}(\\alpha_{y_m})$), and (ii) each group has its own topics $\\{\\beta_{m,k}\\}_{k=1}^K$, but they are still tied through the global distributions $P(\\cdot \\mid \\eta)$. Thus, the topics can be adapted to local group data while information is still shared globally. Moreover, the topic generators $P(\\cdot \\mid \\eta)$ determine what the topics $\\{\\beta_{m,k}\\}$ should look like. In turn, if a group uses unusual topics to generate its points, it can be identified.\n\nFigure 1: The Flexible Genre Model (FGM).\n\nTo handle real-valued multidimensional data, we set the point-generating distributions (i.e., the topics) to be Gaussians, $P(x_{m,n} \\mid \\beta_{m,k}) = \\mathcal{N}(x_{m,n} \\mid \\beta_{m,k})$, where $\\beta_{m,k} = \\{\\mu_{m,k}, \\Sigma_{m,k}\\}$ comprises the mean and covariance parameters. For computational convenience, the topic generators are Gaussian-Inverse-Wishart (GIW) distributions, which are conjugate to the Gaussian topics. Hence $\\eta_k = \\{\\mu_0, \\kappa_0, \\Lambda_0, \\nu_0\\}$ parameterizes the GIW distribution [17] (see the supplementary material for more details). Let $\\Theta = \\{\\pi, \\alpha, \\eta\\}$ denote the model parameters. 
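For concreteness, Algorithm 1 with Gaussian topics can be simulated as below. This is a minimal NumPy sketch under our own assumptions, not the authors' implementation. The GIW topic generator is taken in the standard normal-inverse-Wishart form (Sigma ~ InvWishart(Lambda_0, nu_0), then mu | Sigma ~ N(mu_0, Sigma / kappa_0)). nu_0 is restricted to an integer so the Wishart draw can be formed from outer products. The function names and all parameter values are hypothetical.

```python
import numpy as np


def sample_giw(rng, mu0, kappa0, lam0, nu0):
    # Draw one Gaussian topic (mu, Sigma) from an assumed normal-inverse-Wishart
    # topic generator: Sigma ~ InvWishart(lam0, nu0), mu | Sigma ~ N(mu0, Sigma / kappa0).
    # Sigma^-1 ~ Wishart(inv(lam0), nu0), which for an integer nu0 >= dim is the
    # sum of nu0 outer products of independent N(0, inv(lam0)) vectors.
    d = len(mu0)
    assert int(nu0) == nu0 and nu0 >= d, 'this sketch needs an integer nu0 >= dim'
    g = rng.multivariate_normal(np.zeros(d), np.linalg.inv(lam0), size=int(nu0))
    sigma = np.linalg.inv(g.T @ g)
    sigma = (sigma + sigma.T) / 2.0  # symmetrize away round-off
    mu = rng.multivariate_normal(mu0, sigma / kappa0)
    return mu, sigma


def sample_group(rng, pi, alpha, eta, n_points):
    # One outer iteration of Algorithm 1 (a single group m).
    #   pi:    (T,) global genre distribution
    #   alpha: (T, K) Dirichlet parameters, one row per genre
    #   eta:   K tuples (mu0, kappa0, lam0, nu0), one per topic generator
    y = rng.choice(len(pi), p=pi)                            # genre y_m
    theta = rng.dirichlet(alpha[y])                          # topic distribution theta_m
    topics = [sample_giw(rng, *eta_k) for eta_k in eta]      # group-specific topics beta_{m,k}
    z = rng.choice(len(theta), size=n_points, p=theta)       # topic memberships z_{m,n}
    x = np.array([rng.multivariate_normal(*topics[k]) for k in z])  # points x_{m,n}
    return y, theta, z, x


# Illustrative parameters (ours): T = 2 genres, K = 3 topics in 2-D.
rng = np.random.default_rng(0)
pi = np.array([0.5, 0.5])
alpha = np.array([[10.0, 10.0, 10.0], [20.0, 2.0, 2.0]])
centers = [(-1.7, -1.0), (1.7, -1.0), (0.0, 2.0)]
# With nu0 = 10 and dim 2, E[Sigma] = lam0 / (nu0 - 3) = 0.2 * I.
eta = [(np.array(c), 50.0, 1.4 * np.eye(2), 10) for c in centers]
y, theta, z, x = sample_group(rng, pi, alpha, eta, n_points=100)
```

Because every group draws its own topics from the shared generators, two groups with the same genre can still have slightly shifted Gaussian components. This is the point-level flexibility the text attributes to FGM.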
We can write the complete likelihood of the data and latent variables in group $G_m$ under FGM as follows:\n\n$$P(G_m, y_m, \\mathbf{z}_m, \\theta_m, \\beta_m \\mid \\Theta) = \\mathcal{M}(y_m \\mid \\pi)\\, \\mathrm{Dir}(\\theta_m \\mid \\alpha_{y_m}) \\prod_k \\mathrm{GIW}(\\beta_{m,k} \\mid \\eta_k) \\prod_n \\mathcal{M}(z_{mn} \\mid \\theta_m)\\, \\mathcal{N}(x_{mn} \\mid \\beta_{m,z_{mn}}).$$\n\nBy integrating out $\\theta_m$ and $\\beta_m$ and summing out $y_m$ and $\\mathbf{z}_m$, we get the marginal likelihood of $G_m$:\n\n$$P(G_m \\mid \\Theta) = \\sum_t \\pi_t \\int_{\\theta_m, \\beta_m} \\mathrm{Dir}(\\theta_m \\mid \\alpha_t) \\prod_k \\mathrm{GIW}(\\beta_{m,k} \\mid \\eta_k) \\prod_n \\sum_k \\theta_{mk}\\, \\mathcal{N}(x_{mn} \\mid \\beta_{m,k})\\, d\\beta_m\\, d\\theta_m.$$\n\nFinally, the data set\u2019s likelihood is simply the product of all groups\u2019 likelihoods.\n\n4 Inference and Learning\n\nTo learn FGM, we update the parameters $\\Theta$ to maximize the likelihood of the data. The inferred latent states, including the topic distributions $\\theta_m$, the topics $\\beta_m$, and the topic and genre memberships $\\mathbf{z}_m$ and $y_m$, can be used for detecting anomalies and exploring the data. However, exact inference and learning in FGM are intractable, so we train FGM using the approximate method described below.\n\n4.1 Inference\n\nApproximate inference of the latent variables can be done by Gibbs sampling [11]. In Gibbs sampling, we iteratively update one variable at a time by drawing a sample from its conditional distribution while all the other variables are fixed. Thanks to the use of conjugate distributions, Gibbs sampling in FGM is simple and easy to implement. The sampling distributions of the latent variables in group $m$ are given below. We use $P(\\cdot \\mid \\sim)$ to denote the distribution of one variable conditioned on all the others. For the genre membership $y_m$, we have\n\n$$P(y_m = t \\mid \\sim) \\propto P(\\theta_m \\mid \\alpha_t)\\, P(y_m = t \\mid \\pi) = \\pi_t\\, \\mathrm{Dir}(\\theta_m \\mid \\alpha_t).$$\n\nFor the topic distribution $\\theta_m$:\n\n$$P(\\theta_m \\mid \\sim) \\propto P(\\mathbf{z}_m \\mid \\theta_m)\\, P(\\theta_m \\mid \\alpha, y_m) = \\mathcal{M}(\\mathbf{z}_m \\mid \\theta_m)\\, \\mathrm{Dir}(\\theta_m \\mid \\alpha_{y_m}) \\propto \\mathrm{Dir}(\\theta_m \\mid \\alpha_{y_m} + \\mathbf{n}_m),$$\n\nwhere $\\mathbf{n}_m$ is the histogram of the $K$ topic values in the vector $\\mathbf{z}_m$. The last step follows from Dirichlet-multinomial conjugacy. 
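The two conditionals above translate directly into code. The following is a minimal sketch (ours, with hypothetical function names and toy parameters) of the y_m and theta_m updates for a single group. Normalizing the y_m probabilities in log space is our numerical-stability choice and is not discussed in the text.

```python
import math

import numpy as np


def log_dirichlet(theta, a):
    # log Dir(theta | a) = lgamma(sum a) - sum_k lgamma(a_k) + sum_k (a_k - 1) log theta_k
    return (math.lgamma(a.sum()) - sum(math.lgamma(v) for v in a)
            + float(np.dot(a - 1.0, np.log(theta))))


def sample_genre(rng, theta, pi, alpha):
    # P(y_m = t | rest) is proportional to pi_t * Dir(theta_m | alpha_t).
    # Normalize in log space to avoid underflow, then draw a categorical.
    logp = np.array([math.log(pi[t]) + log_dirichlet(theta, alpha[t])
                     for t in range(len(pi))])
    p = np.exp(logp - logp.max())
    p /= p.sum()
    return rng.choice(len(p), p=p), p


def sample_theta(rng, z, alpha_y, n_topics):
    # Dirichlet-multinomial conjugacy: theta_m | rest ~ Dir(alpha_{y_m} + n_m),
    # where n_m counts how many points of the group are assigned to each topic.
    n_m = np.bincount(z, minlength=n_topics)
    return rng.dirichlet(alpha_y + n_m), n_m


rng = np.random.default_rng(1)
pi = np.array([0.5, 0.5])
alpha = np.array([[10.0, 10.0, 10.0], [20.0, 2.0, 2.0]])  # illustrative genres (ours)
y, p = sample_genre(rng, np.array([0.84, 0.08, 0.08]), pi, alpha)
theta_new, n_m = sample_theta(rng, np.array([0, 0, 1, 2, 0]), alpha[y], 3)
```

With theta = [0.84, 0.08, 0.08] sitting near the mode of the second toy genre, almost all of the conditional mass for y_m goes to that genre. The theta_m draw then only needs the topic counts, here [3, 1, 1].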
For $\\beta_{m,k}$, the $k$th topic in group $m$, one can find that\n\n$$P(\\beta_{m,k} \\mid \\sim) \\propto P(\\mathbf{x}_m^{(k)} \\mid \\beta_{m,k})\\, P(\\beta_{m,k} \\mid \\eta_k) = \\mathcal{N}(\\mathbf{x}_m^{(k)} \\mid \\beta_{m,k})\\, \\mathrm{GIW}(\\beta_{m,k} \\mid \\eta_k) \\propto \\mathrm{GIW}(\\beta_{m,k} \\mid \\eta'_k),$$\n\nwhere $\\mathbf{x}_m^{(k)}$ are the points in group $G_m$ assigned to topic $k$, i.e., those with $z_{m,n} = k$. The last step follows from Gaussian-Inverse-Wishart-Gaussian conjugacy. $\\eta'_k$ is the parameter of the posterior GIW distribution given $\\mathbf{x}_m^{(k)}$; its exact form can be found in the supplementary material. For $z_{mn}$, the topic membership of point $n$ in group $m$, we have\n\n$$P(z_{mn} = k \\mid \\sim) \\propto P(x_{mn} \\mid z_{mn} = k, \\beta_m)\\, P(z_{mn} = k \\mid \\theta_m) = \\theta_{m,k}\\, \\mathcal{N}(x_{mn} \\mid \\beta_{m,k}).$$\n\n4.2 Learning\n\nLearning the parameters of FGM helps us identify the normal behaviors of groups and points. Each of the genres $\\alpha = \\{\\alpha_t\\}_{t=1,\\dots,T}$ captures one typical distribution of topic distributions, $\\theta \\sim \\mathrm{Dir}(\\alpha_t)$. The topic generators $\\eta = \\{\\eta_k\\}_{k=1,\\dots,K}$ determine what the normal topics $\\{\\beta_{m,k}\\}$ should look like. We use single-sample Monte Carlo EM [12] to learn the parameters from the samples provided by the Gibbs sampler. Given the sampled latent variables, we update the parameters to their maximum likelihood estimates (MLE): we learn $\\alpha$ from $y$ and $\\theta$, $\\eta$ from $\\beta$, and $\\pi$ from $y$. The parameter $\\pi$ can easily be estimated from the histogram of the $y$\u2019s. $\\alpha_t$ is learned as the MLE of a Dirichlet distribution given the multinomials $\\{\\theta_m \\mid y_m = t,\\ m = 1, \\dots, M\\}$ (i.e., the topic distributions with genre $t$), which can be computed using the Newton\u2013Raphson method [18]. The $k$th topic generator\u2019s parameter $\\eta_k = \\{\\mu_{0k}, \\kappa_{0k}, \\Lambda_{0k}, \\nu_{0k}\\}$ is the MLE of a GIW distribution given the parameters $\\{\\beta_{m,k} = (\\mu_{m,k}, \\Sigma_{m,k})\\}_{m=1,\\dots,M}$ (the $k$th topics of all groups). We have derived an efficient solution for this MLE problem. 
The details can be found in the supplementary material.\n\nThe overall learning algorithm repeats the following procedure until convergence: (1) run Gibbs sampling to infer the states of the latent variables; (2) update the model parameters using the estimates above. To select appropriate values for $T$ and $K$ (the numbers of genres and topics), we can apply the Bayesian information criterion (BIC) [19] or use the values that maximize the likelihood of a held-out validation set.\n\n5 Scoring Criteria\n\nThe learned FGM model can easily be used for anomaly detection on test data. Given a test group, we first infer its latent variables, including the topics and the topic distribution. We then treat these latent states as the group\u2019s characteristics and examine whether they are compatible with the normal behaviors defined by the FGM parameters.\n\nPoint-based group anomalies can be detected by examining the topics of the groups. If a group contains anomalous points with rare feature values $x_{mn}$, then the topics $\\{\\beta_{m,k}\\}_{k=1}^K$ that generate these points will deviate from the normal behavior defined by the topic generators $\\eta$. Let $P(\\beta_m \\mid \\Theta) = \\prod_{k=1}^K \\mathrm{GIW}(\\beta_{m,k} \\mid \\eta_k)$. The point-based anomaly score (PB score) of group $G_m$ is\n\n$$\\mathbb{E}_{\\beta_m}[-\\ln P(\\beta_m \\mid \\Theta)] = \\int_{\\beta_m} P(\\beta_m \\mid \\Theta, G_m)\\, (-\\ln P(\\beta_m \\mid \\Theta))\\, d\\beta_m.$$\n\nThe posterior $P(\\beta_m \\mid \\Theta, G_m)$ can again be approximated using Gibbs sampling, and the expectation can be computed by Monte Carlo integration.\n\nDistribution-based group anomalies can be detected by examining the topic distributions. The genres $\\{\\alpha_t\\}_{t=1,\\dots,T}$ capture the typical distributions of topic distributions. If a group\u2019s topic distribution $\\theta_m$ is unlikely to be generated by any of these genres, we call it anomalous. Let $P(\\theta_m \\mid \\Theta) = \\sum_{t=1}^T \\pi_t\\, \\mathrm{Dir}(\\theta_m \\mid \\alpha_t)$. 
The distribution-based anomaly score (DB score) of group $G_m$ is defined as\n\n$$\\mathbb{E}_{\\theta_m}[-\\ln P(\\theta_m \\mid \\Theta)] = \\int_{\\theta_m} P(\\theta_m \\mid \\Theta, G_m)\\, (-\\ln P(\\theta_m \\mid \\Theta))\\, d\\theta_m. \\qquad (1)$$\n\nAgain, this expectation can be approximated using Gibbs sampling and Monte Carlo integration. Using a combination of the point-based and distribution-based scores, we can detect both point-based and distribution-based group anomalies.\n\n6 Experiments\n\nIn this section, we present empirical results produced by FGM on synthetic and real data. We show that FGM outperforms several state-of-the-art competitors in the group anomaly detection task.\n\n6.1 Synthetic Data\n\nIn the first experiment, we compare FGM with the Mixture of Gaussian Mixture Model (MGMM) [13] and with an adaptation of the Theme Model (ThM) [15] on synthetic data sets. The original ThM handles only discrete data and was proposed for clustering. To handle continuous data and detect anomalies, we modified it to use Gaussian topics and applied the distribution-based anomaly scoring function (1). To detect both distribution-based and point-based anomalies with ThM, we can instead use the data\u2019s likelihood under ThM as the scoring function.\n\nUsing the synthetic data sets described below, we demonstrate the behavior of the different models and scoring functions. We generated the data using 2-dimensional GMMs as in [13]: each group has its own GMM that generates its points. All GMMs share three Gaussian components with covariance $0.2 \\times I_2$, centered at $(-1.7, -1)$, $(1.7, -1)$, and $(0, 2)$, respectively. A group\u2019s mixing weights are randomly chosen to be either $w_1 = [0.33, 0.33, 0.33]$ or $w_2 = [0.84, 0.08, 0.08]$. Thus, a group is normal if its points are sampled from these three Gaussians and its mixing weights are close to either $w_1$ or $w_2$. To test the detectors, we injected both point-based and distribution-based anomalies. Point-based anomalies were groups of points sampled from $\\mathcal{N}((0, 0), I_2)$. 
Distribution-based anomalies were generated by GMMs consisting of the normal Gaussian components but with mixing weights $[0.33, 0.64, 0.03]$ and $[0.08, 0.84, 0.08]$, which differ from $w_1$ and $w_2$. We generated $M = 50$ groups, each with $N_m \\sim \\mathrm{Poisson}(100)$ points. One point-based anomalous group and two distribution-based anomalous groups were injected into the data set.\n\nFigure 2: Detection results on synthetic data (panels: MGMM, FGM, ThM, ThM-Likelihood).\n\nThe detection results of MGMM, ThM, and FGM are shown in Fig. 2, which displays 12 of the 50 groups. Normal groups are surrounded by solid black boxes, point-based anomalies by green dashed boxes, and distribution-based anomalies by red/magenta dashed boxes. Points are colored by the anomaly scores of their groups (darker means more anomalous). An ideal detector would make the points in dashed boxes dark and the points in solid boxes light gray. We can see that all the models can find the distribution-based anomalies, since they are able to learn the topic distributions. However, MGMM and ThM miss the point-based anomaly. The explanation is simple: the anomalous points are distributed in the middle of the topics, so the inferred topic distribution is around $[0.33, 0.33, 0.33]$, which is exactly $w_1$. As a result, MGMM and ThM infer this group to be normal, although it is not. This example shows one possible problem with scoring groups based on topic distributions only. In contrast, using the sum of the point-based and distribution-based scores, FGM found all of the group anomalies, thanks to its ability to characterize groups at both the point level and the group level. We also show the result of scoring the groups by the ThM likelihood: only the point-based anomaly is found. 
This is because the data likelihood under ThM is dominated by the anomalousness of individual points, so a few eccentric points overshadow group-level behaviors.\n\nFigures 3(a)\u20133(c) show the density estimates given by MGMM, ThM, and FGM, respectively, for the point-based anomalous group. FGM gives a better estimate thanks to its adaptive topics, while MGMM and ThM are restricted to their global topics. Figure 3(d) shows the learned genres, visualized as the distribution $\\sum_{t=1}^T \\pi_t\\, \\mathrm{Dir}(\\cdot \\mid \\alpha_t)$ on the topic simplex. This distribution summarizes the normal topic distributions in this data set. Observe that the two peaks on the probability simplex are indeed very close to $w_1$ and $w_2$.\n\nFigure 3: (a), (b), (c) show the density of the point-based anomaly estimated by MGMM, ThM, and FGM, respectively. In MGMM and ThM, topics must be shared globally; therefore, they perform badly. (d) The genres in the synthetic data set learned by FGM.\n\n6.2 Image Data\n\nIn this experiment, we test the performance of our method on detecting anomalous scene images. We use the data set from [15]. We selected the first 100 images from each of the categories \u201cmountain\u201d, \u201ccoast\u201d, and \u201cinside city\u201d. These 300 images are randomly divided: 80% are used for training and the rest for testing. We created anomalies by stitching together random normal test images from different categories. For example, an anomaly may be a picture that is half mountain and half city street. These anomalies are challenging, since they have the same local patches as the normal images. We mixed the anomalies with normal test images and asked the detectors to find them. Some examples are shown in Fig. 4(a). 
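The data construction just described can be sketched as follows. This is a hypothetical reconstruction, not the authors' script. Splitting at the vertical midline and taking the left half from one image and the right half from the other are our assumptions, since the text only says the images are stitched half and half. The helper names are also ours.

```python
import numpy as np


def stitch_halves(img_a, img_b):
    # Hypothetical construction: keep the left half of img_a and the right half
    # of img_b (e.g., half mountain, half city street). The split direction and
    # position are assumptions; the text only says the images are stitched.
    if img_a.shape != img_b.shape:
        raise ValueError('images must have the same shape')
    mid = img_a.shape[1] // 2
    out = img_a.copy()
    out[:, mid:] = img_b[:, mid:]
    return out


def train_test_split(n_images, train_frac, rng):
    # Random split of image indices into training and test sets.
    idx = rng.permutation(n_images)
    n_train = int(round(train_frac * n_images))
    return idx[:n_train], idx[n_train:]


rng = np.random.default_rng(0)
train_idx, test_idx = train_test_split(300, 0.8, rng)  # 300 images, 80% for training
a = np.zeros((4, 6))  # tiny stand-ins for two grayscale test images
b = np.ones((4, 6))
s = stitch_halves(a, b)
```

Because each half comes from a real image, every local patch of the stitched picture looks normal on its own. Only the mixture of patch types is unusual, which is what makes these distribution-based anomalies hard for point-level detectors.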
The images are represented as in [15]: we treat each of them as a group of local points. On each image, we randomly sample 100 patches, extract the 128-dimensional SIFT feature [20] of each patch, and then reduce its dimension to 2 using PCA. Points near the stitching boundaries are discarded to avoid boundary artifacts.\n\nWe compare FGM with several other methods. We implemented a simple detector based on Gaussian mixture models (GMM), which is able to detect point-based anomalies. This method fits a GMM to all data points, scores each point by its likelihood under this GMM, and finally scores a group by averaging these numbers. To detect distribution-based anomalies, we also implemented two other competitors. The first, called LDA-KNN, uses LDA to estimate the topic distributions of the groups and treats these topic distributions (the parameter vectors of multinomials) as the groups\u2019 features. A k-nearest-neighbor (KNN) based point detector [21] is then used to score the groups\u2019 features. The second method uses symmetrized Kullback-Leibler (KL) divergences between densities (DD). For each group, DD uses a GMM to estimate the distribution of its points. The KL divergences between these GMMs are then estimated by Monte Carlo, and the KNN-based detector is used to find anomalous GMMs (i.e., groups).\n\nFor all algorithms, we used $K = 8$ topics and $T = 6$ genres, as suggested by BIC searches. We set $\\kappa_0 = \\nu_0 = 200$ for FGM. Performance is measured by the area under the ROC curve (AUC) for retrieving the anomalies from the test set; the supplementary material also reports results using average precision. The results from 30 random runs are shown in Figure 4(b). GMM cannot detect the group anomalies that do not have anomalous points. The performance of LDA-KNN was also close to the 50% random baseline. 
A possible reason is that the KNN detector does not perform well in the $K = 8$-dimensional space. MGMM, ThM, and FGM show improvements over the random baseline, and FGM achieves significantly better results than the others: the paired t-test gives a p-value of $1.6 \\times 10^{-5}$ for FGM vs. ThM. The DD method performs poorly, possibly due to its many error-prone steps, including fitting the GMMs and estimating divergences by Monte Carlo.\n\nFigure 4: Detection of stitched images. (a) Sample images and stitched anomalies: green boxes (first row) contain natural images, and yellow boxes (second row) contain stitched anomalies. (b) Detection performance: AUCs of the compared methods (LDA-KNN, MGMM, ThM, FGM-DB, DD).\n\n6.3 Turbulence Data\n\nWe present an exploratory study of detecting group anomalies in turbulence data from the JHU Turbulence Database Cluster (TDC) [22] (http://turbulence.pha.jhu.edu). TDC simulates fluid motion through time on a 3-dimensional grid; here we perform our experiment on a contiguous $128^3$ sub-grid. At each time step and each vertex of the grid, TDC records the 3-dimensional velocity of the fluid. We consider the vertices in a local cubic region as a group, and the goal is to find groups of vertices whose velocity distributions (i.e., motion patterns) are unusual and potentially interesting. The following steps were used to extract the groups: (1) We chose the grid points $\\{(8i, 8j, 8k)\\}_{i,j,k}$ as the centers of our groups; around these centers, the points in cubes of size $7^3$ formed our groups. (2) The feature of a point in a cube was its velocity relative to the velocity at the cube\u2019s center point. 
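The two extraction steps can be sketched in NumPy as follows. This is a hypothetical reconstruction, not the authors' pipeline. The function name, the toy random field, and the periodic wrap-around at the grid boundary (used so that cubes centered at index 0 stay complete) are our assumptions. We also drop the center vertex, whose relative velocity is always zero, leaving 7^3 - 1 = 342 points per group.

```python
import numpy as np


def extract_groups(vel, stride=8, half=3):
    # vel: (G, G, G, 3) velocity field on a cubic grid.
    # Group centers sit at (stride*i, stride*j, stride*k); each group is the
    # (2*half+1)^3 cube around a center, minus the center vertex itself.
    # Assumption (ours): indices wrap around periodically at the grid edge.
    g = vel.shape[0]
    offs = np.arange(-half, half + 1)
    di, dj, dk = np.meshgrid(offs, offs, offs, indexing='ij')
    di, dj, dk = di.ravel(), dj.ravel(), dk.ravel()
    keep = ~((di == 0) & (dj == 0) & (dk == 0))  # drop the center vertex
    di, dj, dk = di[keep], dj[keep], dk[keep]
    groups = []
    for ci in range(0, g, stride):
        for cj in range(0, g, stride):
            for ck in range(0, g, stride):
                pts = vel[(ci + di) % g, (cj + dj) % g, (ck + dk) % g]
                groups.append(pts - vel[ci, cj, ck])  # velocity relative to the center
    return np.stack(groups)


rng = np.random.default_rng(0)
vel = rng.standard_normal((32, 32, 32, 3))  # toy stand-in for one velocity snapshot
groups = extract_groups(vel)
```

On the toy 32^3 grid this gives 4^3 = 64 groups of shape (342, 3). On a 128^3 grid the same code gives 16^3 = 4096 groups, matching the counts stated next in the text.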
After these preprocessing steps, we had $M = 4096$ groups, each of which had 342 3-dimensional feature vectors.\n\nWe applied MGMM, ThM, and FGM to find anomalies in this group data, using $T = 4$ genres and $K = 6$ topics for all methods. We do not have ground truth for anomalies in this data set. However, we can compute for each vertex the \u201cvorticity score\u201d [23], which indicates the tendency of the fluid to \u201cspin\u201d. Vortices, and especially their interactions, are uncommon and of great interest in the field of fluid dynamics. This vorticity can be considered a hand-crafted anomaly score based on expert knowledge of this fluid data. We do not want an anomaly detector to match this score perfectly, because there are other \u201cnon-vortex\u201d anomalous events it should find as well. However, we do think that a higher correlation with this score indicates better anomaly detection performance.\n\nFigure 5 visualizes the anomaly scores of FGM and the vorticity. These pictures are highly correlated, which implies that FGM was able to find interesting turbulence activity based on velocity only, without using the definition of vorticity or any other expert knowledge. Correlations between the vorticity and the MGMM, ThM, and FGM scores from 20 random runs are displayed in Fig. 5(c), showing that FGM is better at finding regions with high vorticity.\n\nFigure 5: Detection results for the turbulence data. (a) & (b) FGM-DB anomaly score and vorticity visualized on one slice of the cube. 
(c) Correlations of the anomaly scores with the vorticity.\n\n7 Conclusion\n\nWe presented the generative Flexible Genre Model (FGM) for the group anomaly detection problem. Compared to traditional topic models, FGM is able to characterize groups\u2019 behaviors at multiple levels. This detailed characterization makes FGM an ideal tool for detecting different types of group anomalies. Empirical results show that FGM achieves better performance than existing approaches.\n\nIn future work, we will explore several extensions. For model selection, we can extend FGM using nonparametric Bayesian techniques such as hierarchical Dirichlet processes [24]. It would also be interesting to study structured groups in which the exchangeability assumption is not valid.\n\nReferences\n[1] Varun Chandola, Arindam Banerjee, and Vipin Kumar. Anomaly detection: A survey. ACM Computing Surveys, 41(3), 2009.\n[2] Geoffrey G. Hazel. Multivariate Gaussian MRF for multispectral scene segmentation and anomaly detection. IEEE Trans. Geoscience and Remote Sensing, 38(3):1199\u20131211, 2000.\n[3] Kaustav Das, Jeff Schneider, and Daniel Neill. Anomaly pattern detection in categorical datasets. In Knowledge Discovery and Data Mining (KDD), 2008.\n[4] Kaustav Das, Jeff Schneider, and Daniel Neill. Detecting anomalous groups in categorical datasets. Technical Report 09-104, CMU-ML, 2009.\n[5] Philip K. Chan and Matthew V. Mahoney. Modeling multiple time series for anomaly detection. In IEEE International Conference on Data Mining, 2005.\n[6] Eamonn Keogh, Jessica Lin, and Ada Fu. HOT SAX: Efficiently finding the most unusual time series subsequence. In IEEE International Conference on Data Mining, 2005.\n[7] G. Mark Voit. Tracing cosmic evolution with clusters of galaxies. Reviews of Modern Physics, 77(1):207\u2013258, 2005.\n[8] B. de Finetti. Funzione caratteristica di un fenomeno aleatorio. Atti della R. 
Accademia Nazionale dei Lincei, Serie 6, Memorie, Classe di Scienze Fisiche, Matematiche e Naturali, 4, 1931.\n\n[9] Thomas Hofmann. Unsupervised learning with probabilistic latent semantic analysis. Machine Learning Journal, 2001.\n\n[10] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. Latent Dirichlet allocation. JMLR, 3:993\u20131022, 2003.\n\n[11] Stuart Geman and Donald Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. PAMI, 6:721\u2013741, 1984.\n\n[12] Gilles Celeux, Didier Chauveau, and Jean Diebolt. Stochastic version of the EM algorithm: An experimental study in the mixture case. J. of Statistical Computation and Simulation, 55, 1996.\n\n[13] Liang Xiong, Barnab\u00e1s P\u00f3czos, and Jeff Schneider. Hierarchical probabilistic models for group anomaly detection. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2011.\n\n[14] Mikaela Keller and Samy Bengio. Theme-topic mixture model for document representation. In Learning Methods for Text Understanding and Mining, 2004.\n\n[15] Li Fei-Fei and P. Perona. A Bayesian hierarchical model for learning natural scene categories. In IEEE Conf. CVPR, pages 524\u2013531, 2005.\n\n[16] Gabriel Doyle and Charles Elkan. Accounting for burstiness in topic models. In International Conference on Machine Learning, 2009.\n\n[17] Andrew Gelman, John B. Carlin, Hal S. Stern, and Donald B. Rubin. Bayesian Data Analysis. Chapman and Hall/CRC, 2003.\n\n[18] Thomas P. Minka. Estimating a Dirichlet distribution. http://research.microsoft.com/en-us/um/people/minka/papers/dirichlet, 2009.\n\n[19] Gideon E. Schwarz. Estimating the dimension of a model. Annals of Statistics, 6(2):461\u2013464, 1978.\n\n[20] David G. Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 60(2):91\u2013110, 2004.\n\n[21] Manqi Zhao. Anomaly detection with score functions based on nearest neighbor graphs. 
In NIPS, 2009.\n\n[22] E. Perlman, R. Burns, Y. Li, and C. Meneveau. Data exploration of turbulence simulations using a database cluster. In Supercomputing SC, 2007.\n\n[23] Charles Meneveau. Lagrangian dynamics and models of the velocity gradient tensor in turbulent flows. Annual Review of Fluid Mechanics, 43:219\u2013245, 2011.\n\n[24] Yee Whye Teh, Michael I. Jordan, Matthew J. Beal, and David M. Blei. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101:1566\u20131581, 2006.\n", "award": [], "sourceid": 653, "authors": [{"given_name": "Liang", "family_name": "Xiong", "institution": null}, {"given_name": "Barnab\u00e1s", "family_name": "P\u00f3czos", "institution": null}, {"given_name": "Jeff", "family_name": "Schneider", "institution": null}]}