{"title": "Learning Time-Intensity Profiles of Human Activity using Non-Parametric Bayesian Models", "book": "Advances in Neural Information Processing Systems", "page_first": 625, "page_last": 632, "abstract": null, "full_text": "Learning Time-Intensity Profiles of Human Activity using Non-Parametric Bayesian Models\n\nAlexander T. Ihler, Padhraic Smyth\nDonald Bren School of Information and Computer Science, U.C. Irvine\nihler@ics.uci.edu, smyth@ics.uci.edu\n\nAbstract\nData sets that characterize human activity over time through collections of timestamped events or counts are of increasing interest in application areas such as human-computer interaction, video surveillance, and Web data analysis. We propose a non-parametric Bayesian framework for modeling collections of such data. In particular, we use a Dirichlet process framework for learning a set of intensity functions corresponding to different categories, which form a basis set for representing individual time-periods (e.g., several days) depending on which categories the time-periods are assigned to. This allows the model to learn in a data-driven fashion what \"factors\" are generating the observations on a particular day, including (for example) weekday versus weekend effects or day-specific effects corresponding to unique (single-day) occurrences of unusual behavior, sharing information where appropriate to obtain improved estimates of the behavior associated with each category. Applications to real-world data sets of count data involving both vehicles and people are used to illustrate the technique.\n\n1 Introduction\nAs sensor and storage technologies continue to improve in terms of both cost and performance, increasingly rich data sets are becoming available that characterize the rhythms of human activity over time. Examples include logs of radio frequency identification (RFID) tags, freeway traffic over time (loop-sensor data), crime statistics, email and Web access logs, and many more. 
Such data can be used to support a variety of different applications, such as classification of human or animal activities, detection of unusual events, or broad understanding of behavior in a particular context such as the temporal patterns of Web usage. To ground the discussion, consider data consisting of a collection of individual or aggregated events from a single sensor, e.g., a time-stamped log recording every entry and exit from a building, or the timing and number of highway traffic accidents. For example, Figure 1 shows several days' worth of data from a building log, smoothed so that the similarities in patterns are more readily visible. Of interest is the modeling of the underlying intensity of the process generating the data, where intensity here refers to the rate at which events occur. These processes are typically inhomogeneous in time (as in Figure 1), as they arise from the aggregated behavior of individuals, and thus exhibit a temporal dependence linked to the rhythms of the underlying human activity. The complexity of this temporal dependence is application-dependent and generally unknown before observing the data, suggesting that non- or semi-parametric methods (methods whose complexity is capable of growing as the number of observations increases) may be particularly appropriate. Formulating the underlying event generation as an inhomogeneous Poisson process is a common first step (see, e.g., [1, 4]), as it allows the application of various classic density estimation techniques to estimate the time-dependent intensity function (a normalized version of the rate function; see Section 2). Techniques used in this context include kernel density estimation [2], wavelet analysis [3], discretization [1], and nonparametric Bayesian models [4, 5]. Among these, nonparametric Bayesian approaches have a number of appealing advantages. 
First, they allow us to represent and reason about uncertainty in the intensity function, providing not just a single estimate but a distribution over functions. Second, the Bayesian framework provides natural methods for model selection, allowing the data to be naturally explained by a parsimonious set of intensity functions, rather than using the most complex explanation (though similar effects may be achieved using penalized likelihood functions [3]). Finally, Bayesian methods generalize to multiple or hierarchical models, which allow information to be shared among several related but differing sets of observations (e.g., multiple days of data). This last point is crucial for many problems, as we rarely obtain many observations of exactly the same process under exactly the same conditions; instead, we observe multiple instances which are thought to be similar, but may in fact represent any number of slightly differing circumstances. For example, behavior may be dependent on not only time of day but also day of week, type of day (weekend or weekday), unobserved factors such as the weather, or other unusual circumstances. Sharing information allows us to improve our model, but we should only do so where appropriate (itself best indicated by similarity in the data). By being Bayesian, we can remain agnostic about what data should be shared and reason over our uncertainty in this structure.\n\nFigure 1: Count data from a building entry log observed on ten Mondays, each smoothed using a kernel function [2, 6] to enable visual comparison.\n\nIn what follows we propose a non-parametric Bayesian framework for modeling intensity functions for event data over time. In particular, we describe a Dirichlet process framework for learning the unknown rate functions, and learn a set of such functions corresponding to different categories. 
Individual time-periods (e.g., individual days) are then represented as additive combinations of intensity functions, depending on which categories are assigned to each time-period. This allows the model to learn in a data-driven fashion what \"factors\" are generating the observations on a particular day, including (for example) weekday versus weekend effects as well as day-specific effects corresponding to unusual behavior present only on a single day. Applications to two real-world data sets, a building access log and accident statistics, are used to illustrate the technique. We will discuss in more detail in the sections that follow how our proposed approach is related to prior work on similar topics. Broadly speaking, from the viewpoint of modeling of inhomogeneous time-series of counts, our work extends the work of [4] to allow sharing of information among multiple, related processes (e.g., different days). Our approach can also be viewed as an alternative to the hierarchical Dirichlet process (HDP, [7]) for problems where the patterns across different groups are much more constrained than would be expected under an HDP model.\n\n2 Poisson processes\nA common model for continuous-time event (counting) data is the Poisson process [8]. As the discrete Poisson distribution is characterized by a rate parameter λ, the Poisson process¹ is characterized by a rate function λ(t); it has the property that over any given time interval T, the number of events occurring within that time is Poisson with rate given by λ = ∫_T λ(t) dt. We shall use a Bayesian semi-parametric model for λ(t), described next. Let us suppose that we have a single collection of event times {τ_i} arising from a Poisson process with rate function λ(t), i.e.,\n\n{τ_i} ∼ P[τ; λ(t)]   (1)\n\nwhere λ(t) is defined on t ∈ [-∞, ∞].\n\n¹ Here, we shall use the term Poisson process interchangeably with inhomogeneous Poisson process, meaning that the rate is a non-constant function of time t.\n\n
We may write λ(t) = γ f(t), where γ = ∫_{-∞}^{∞} λ(t) dt and f(t) is the intensity function, a normalized version of the rate function. A Bayesian model places prior distributions on these quantities; by selecting a parametric prior for γ and a nonparametric prior for f(t), we obtain a semi-parametric prior for λ(t). Specifically, we choose\n\nγ ∼ Γ(a, b),   f(t) = ∫ K(t; θ) dG(θ),   G ∼ DP[α G0]\n\nwhere Γ is the gamma distribution, K is a kernel function (for example a Gaussian distribution) and DP is a Dirichlet process [9] with parameter α and base distribution G0. The Dirichlet process provides a nonparametric prior for f(t), such that (with probability one) f has the form of a mixture model with infinitely many components: f(t) = Σ_j w_j K(t; θ_j). If desired we may also place prior distributions on some or all of these quantities (e.g., α, {a, b}, or the parameters of G0) as well. Dirichlet processes and their variations [7, 9-11] have gained recent attention for their ability to provide representations consisting of arbitrarily large mixture models. In particular, they have been the subject of recent work in modeling intensity functions for Poisson processes defined over time [4] and space-time [5].\n\n2.1 Monte Carlo Inference\n\nFor the Poisson process model just described, the likelihood of the data {τ_i}, i = 1 . . . N at some time T is given by\n\np({τ_i}; γ, f(·)) = exp(-γ ∫_{-∞}^{T} f(t) dt) γ^N ∏_i f(τ_i)\n\nwhich, as T → ∞ (i.e., as we observe a complete data set) becomes\n\np({τ_i}; γ, f(·)) = exp(-γ) γ^N ∏_i f(τ_i)   (2)\n\nThe rightmost term (term involving f) has the same form as the likelihood of the τ_i as i.i.d. samples from the mixture model distribution defined by f. As in many mixture model applications, it will be helpful to create auxiliary assignment variables z_i for each τ_i, indicating with which of the mixture components the sample τ_i is associated. The complete data likelihood is then\n\np({τ_i, z_i}; γ, f(·)) = exp(-γ) γ^N ∏_i w_{z_i} K(τ_i; θ_{z_i}).\n\nInference is typically accomplished using Markov chain Monte Carlo (MCMC) sampling [9]. Specifically, although the posterior for γ has a simple closed form, p(γ | {τ_i}) ∝ Γ(N + a, 1 + b), sampling from f is more complicated. Samples from f can be drawn in a variety of ways. One of the most common methods is the so-called \"Chinese Restaurant Process\" (CRP, [7, 9]), in which the relative weights w_j are marginalized out while drawing the assignment variables z_i. Such exact sampling approaches work by exploiting the fact that only a finite number of the mixture components are occupied by the data; by treating the unoccupied clusters as a single group, the infinite number of potential associations can be treated as a finite number. The operations involved (such as sampling values for θ_j given a collection of associated event times τ_i) are easier for certain choices of K and G than others; for example using a Gaussian kernel and normal-Wishart distribution, the necessary quantities have convenient closed forms [9]. Another, more brute-force way around the issue of having infinitely many mixture components is to perform approximate sampling using a \"truncated\" Dirichlet process representation [12, 13]. As described in [12], for a given α, data set size N, and tolerance ε, one can compute a maximum number of components M necessary to approximate the Dirichlet process with a Dirichlet distribution using the relation ε ≈ 4N exp[-(M - 1)/α] and in this manner, can work with finite numbers of mixture components. This representation will prove useful in Section 3.\n\nThe truncated DP approximation is helpful primarily because it allows us to sample the (complete) function f(t) (as compared to only the \"occupied\" part in the CRP formulation). Given a set of assignments {z_i} occupying (arbitrarily numbered) clusters 1 . . . J, we can sample the weights w_j in two steps. 
First, we sample the occupied mixture weights, w_j (j ≤ J), and the total unoccupied weight w = Σ_{j=J+1}^{∞} w_j, by drawing independent, Gamma-distributed random variables according to Γ(N_j, 1) and Γ(α, 1), respectively, and normalizing them to sum to one. The values of weights w_j in the unoccupied clusters (j > J) can then be sampled given w using the stick-breaking representation of Sethuraman [14]. Note that the truncated DP approximation highlights the importance of also sampling α if we hope for our representation to act non-parametric in the sense that it may grow more complex as the data increase, since for a fixed α and ε the number of components M is quite insensitive to N. For more details on sampling such hyper-parameters see e.g. [10].\n\n2.2 Finite Time Domains\n\nOur description of non-parametric Bayesian techniques for Poisson processes has so far made implicit use of the fact that the domain of f(t) is infinite. When the domain of f is finite, for example [0, 1], a few minor complications arise. For example, the kernel functions K(·) should properly be defined as positive only on this interval. One possible solution to this issue is to use an alternate kernel function, such as the Beta distribution [4]. However, this means that posterior sampling of the parameters θ is no longer possible in closed form. Although methods such as Metropolis-Hastings may be used [4], they can be highly dependent on the choice of proposal density. Here, we take a slightly different approach, drawing truncated Gaussian kernels with parameters sampled from a truncated Normal-Wishart distribution. Specifically, we define\n\nK(t; θ = [μ, σ²]) = N(t; μ, σ²) 1(t) / ∫_0^1 N(x; μ, σ²) dx,   [μ, σ²] ∼ 1(μ) 1(σ) NW(μ, σ²)\n\nwhere 1(t) is one on [0, 1] and zero elsewhere and NW is the normal-Wishart distribution. Sampling in this model turns out to be relatively simple and efficient using rejection methods. 
Given the restrictions imposed on μ and σ, one can show that the normalizing quantity Z = ∫_0^1 N(x; μ, σ²) dx is always greater than one-third. Thus, to sample from the posterior we simply draw from the original, closed form posterior distribution, discarding (and re-sampling) if μ ∉ [0, 1], σ ∉ [0, 1], or with probability 1 - (3Z)^{-1}.\n\n3 Categorical Models\nAs mentioned in the introduction, we often have several collections d = 1 . . . D of observations, {τ_di} with i = 1 . . . N_d, arising from D instances of the same or similar processes. If these processes are known to be identical and independent, sharing information among them is relatively easy: we obtain D observations N_d with which to estimate γ, and the τ_di are collectively used to estimate f(t). However, if these processes are not necessarily identical, sharing information becomes more difficult. Yet it is just this situation which is most typical. Again consider Figure 1, which shows event data from ten different Mondays. Clearly, there is a great deal of consistency in both size and shape, although not every day is exactly the same, and one or two stand out as different. Were we to also look at, for example, Sundays or Tuesdays (as we do in Section 4), we would see that although Sunday and Monday appear quite different and, one suspects, have little shared information, Monday and Tuesday appear relatively similar and this similarity can probably be used to improve our rate estimates for both days. In this example, we might reasonably assume that the category memberships are known (for example, whether a given day is a weekday or weekend, or a Monday or Tuesday), though we shall relax this assumption in later sections. Then, given a structure of potential relationships, what is a reasonable model for sharing information among categories? 
There are, of course, many possible choices; we use a simple additive model, described in the next section.\n\n3.1 Additive Models\n\nThe intuition behind an additive model is that the data arises from the superposition of several underlying causes present during the period of interest. Again, we initially assume that the category memberships are known; thus, if a category is associated with a particular day, the activity profile associated with that category will be observed, along with additional activity arising from each of the other categories present. Let us associate a rate function λ_c(t) = γ_c f_c(t) with each category in our model. We define the rate function of a given day d to be the sum of the rate functions of each category to which d belongs. Denoting by s_dc the (binary-valued) membership indicator, i.e., that category c is present during day d, we have that λ_d(t) = Σ_{c: s_dc = 1} λ_c(t). At first, this model might seem quite restrictive. However, it matches our intuition of how the data is generated, stemming from the presence or absence of a particular behavioral pattern associated with some underlying cause (such as it being a work day). In fact, we do not want a model which is too flexible, such as a linear combination of patterns, since it is not physically meaningful to say, for example, that a day is only \"part\" Monday. To learn the profiles associated with a given cause (e.g., things that happen every day versus only on weekdays or only on Mondays), it makes sense to take an \"all or nothing\" model where the pattern is either present, or not. This also suggests that other methods of coupling Dirichlet processes, such as the hierarchical Dirichlet process [7], may be too flexible. The HDP couples the parameters of components across levels, but only loosely relates the actual shape of the profile, since it allows components to be larger or smaller (or even disappear completely). 
In [7], this is a desirable quality, but in our application it is not. Using an additive model allows both a consistent size and shape to emerge for each category, while associating deviations from that profile to categories further down in the hierarchy. Inference in this system is not significantly more difficult than in the single rate function case (Section 2). We define the association as [y_di, z_di], where y_di indicates which of the categories generated event τ_di. It is easy to sample y_di according to p(y_di = c | {λ_c(t)}) ∝ [λ_c(τ_di)] / [Σ_{c'} λ_{c'}(τ_di)].\n\n3.2 Sampling Membership\n\nOf course, it is frequently the case that the membership(s) of each collection of data are not known precisely. In an extreme case, we may have no idea which collections are similar and should be grouped together and wish to find profiles in an unsupervised manner. More commonly, however, we have some prior knowledge and interpretation of the profiles but do not wish to strictly enforce a known membership. For example, if we create categories with assigned meanings (weekdays, weekends, Sundays, Mondays, and so on), a day which is nominally a Monday but also happens to be a holiday, closure, or other unusual circumstance may be completely different from other Monday profiles. Similarly, a day with unusual extra activity (receptions, talks, etc.) may see behavior unique to its particular circumstances and warrant an additional category to represent it. We can accommodate both these possibilities by also sampling the values of the membership indicator variables s_dc, i.e., the binary indicator that day d sees behavior from category c. To this end, let us assume we have some prior knowledge of these membership probabilities, p_dc(s_dc); we may then re-sample from their posterior distributions at each iteration of MCMC. This sampling step is difficult to do outside the truncated representation. 
Although up until this point we could easily have elected to use, for example, the CRP formulation for sampling, the association variables {y_di, z_di} are tightly coupled with the memberships s_dc since if any y_di = c we must have that s_dc = 1. Instead, to sample the s_dc we condition on the truncated rate functions λ_c(t), with truncation depth M chosen to provide arbitrarily high precision. The likelihood of the data under these rate functions for any values of {s_dc} can then be computed directly via (2) where γ = Σ_c s_dc γ_c and f(t) = γ^{-1} Σ_c s_dc λ_c(t). In practice, we propose changing the value of each membership variable s_dc individually given the others, though more complex moves could also be applied. This gives the following sequence of MCMC sampling: (1) given a truncated representation of the {λ_c(t)}, sample membership variables {s_dc}; (2) given {λ_c(t)} and {s_dc}, sample associations {z_di}; (3) given associations {z_di}, sample category magnitudes {γ_c} and a truncated representation of each f_c(t) consisting of weights {w_j} and parameters {θ_j}.\n\nFigure 2: Posterior mean estimates of rate functions for building entry log data, estimated individually for each day (dotted) and learned by sharing information among multiple days (solid) for (a) Sundays, (b) Mondays, and (c) Tuesdays. Sharing information among similar days gives greatly improved estimates of the rate functions, resolving otherwise obscured features such as the decrease during and increase subsequent to lunchtime.\n\n4 Experiments\nIn this section we consider the application of our model to two data sets, one (mentioned previously) from the entry log of people entering a large campus building (produced by optical sensors at the front door), and the other from a log of vehicular traffic accidents. 
By design, both data sets contain about ten weeks' worth of observations. In both cases, we have a plausible prior structure for and interpretation of the categories, i.e., that similar days will have similar profiles. To this end, we create categories for \"all days\", \"weekends\", \"weekdays\", and \"Sundays\" through \"Saturdays\". Each of these categories has a high probability (p_dc = 0.99) of membership for each eligible day. To account for the possibility of unusual increases in activity, we also add categories unique to each day, with lower prior probability (p_dc = 0.20) of membership. This allows each day to add a new category if there is evidence of unusual activity, but discourages it otherwise.\n\n4.1 Building Entry Data\n\nTo see the improvement in the estimated rate functions when information is shared among similar days, Figure 2 shows results from three different days of the week (Sunday, Monday, Tuesday). Each panel shows the estimated profiles of each of the ten days estimated individually (using only that day's observations) under a Dirichlet process mixture model (dotted lines). Superimposed in each panel is a single, black curve corresponding to the total profile for that day of week estimated using our categorical model; so, (a) shows the sum of the rate functions for \"all days\", \"weekends\", and \"Sundays\", while (b) shows the sum of \"all days\", \"weekdays\", and \"Mondays\". We use the same prior distributions for both the individual estimates and the shared estimate. Several features are worth noting. First, by sharing several days' worth of observations, the model can produce a much more accurate estimate of the profiles. In this case, no single day contains enough observations to be confident about the details of the rate function, so each individually estimated rate function appears relatively smooth. However, when information from other days is included, the rate function begins to resolve into a clearly bi-modal shape for weekdays. 
This \"bi-modal\" rate behavior is quite real, and corresponds to the arrival of occupants in the morning (first mode), a lull during lunchtime, and a larger, narrower second peak as most occupants return from lunch. Second, although Monday and Tuesday profiles appear similar, they also have distinct behavior, such as increased activity late Tuesday morning. This behavior too has some basis in reality, corresponding to a regular weekly meeting held around lunchtime over most (though not quite all) of the weeks in question. The breakdown of a particular day (the first Tuesday) into its component categories is shown in Figure 3. As we might expect, there is little consistency between weekdays and weekends, quite a bit of similarity among weekdays and among just Tuesdays, and (for this particular day) very little to set it apart from other Tuesdays.\n\nFigure 3: Posterior mean estimates of the rate functions for each category to which the first Tuesday data might belong. For comparison, the total rate (sum of all categories) is shown as the dotted line. (a) The \"all days\" category is small, indicating little consistency in the data between weekdays and weekends; (b) the \"weekdays\" category is larger, and contains a component which appears to correspond to the occupants' return from lunch; (c) the \"Tuesday\" category has modes in the morning and afternoon, perhaps capturing regular meetings or classes; (d) the \"unique\" category (a category unique to this particular day) shows little or no activity.\n\nWe can also check to see that the category memberships s_dc are being used effectively. One of the Mondays in our data set fell on a holiday (the individual profile very near zero). If we average the probabilities computed during MCMC to estimate the posterior probability of the s_dc for that particular day, we find that it has near-zero probability of belonging to either the weekday or Monday categories, and uses only the all-day and unique categories. We can also examine days which have high probability of requiring their own category (indicating unusual activity). For this data set, we also have partial ground truth, consisting of a number of dates and times when activities were scheduled to take place in the building. Figure 4 shows three such days, and the corresponding rate profiles associated with their single-day categories. Again, all three days are estimated to have additional activity, and the period of time for that activity corresponds well with the actual start and end time shown in the schedule (dashed vertical lines).\n\nFigure 4: Profiles associated with individual-day categories in the entry log data for several days with known events (periods between dashed vertical lines). The model successfully learns which days have significant unusual activity and associates reasonable profiles with that activity (note that increases in entrance count rate typically occur shortly before or at the beginning of the event time).\n\n4.2 Vehicular Accident Data\n\nOur second data set consists of a database of vehicular accident times recorded by North Carolina police departments. 
As we might expect of driving patterns, there is still less activity on weekends, but far more than was observed in the campus building log.\n\nAs before, sharing information allows us to decrease our posterior uncertainty on the rate for any particular day. Figure 5 quantifies this idea by showing the posterior means and (pointwise) two-sigma confidence intervals for the rate function estimated for the same day (the first Monday in the data set) using that day's data only (red curves) and using the category-based additive model (black). The additive model leverages the additional data to produce much tighter estimates of the rate profile.\n\nFigure 5: Posterior mean and uncertainty for a single day of accident data, estimated individually (red) and with data sharing (black). Sharing data considerably reduces the posterior uncertainty in the profile shape.\n\nFigure 6: Posterior mean estimates of rate functions for vehicular accidents, estimated individually for each day (dotted) and with sharing among multiple days (solid) for (a) Sundays, (b) Mondays, and (c) Fridays. As in Figure 2, sharing information helps resolve features which the individual days do not have enough data to reliably estimate.\n\nAs with the previous example, the additional data also helps resolve detailed features of each day's profile, as seen in Figure 6. For example, the weekday profiles show a tri-modal shape, with one mode corresponding to the morning commute, a small mode around noon, and another large mode around the evening commute. 
This also helps make the pattern of deviation on Friday clear, showing (as we would expect) increased activity at night.\n\n5 Conclusions\nThe increasing availability of logs of \"human activity\" data provides interesting opportunities for the application of statistical learning techniques. In this paper we proposed a non-parametric Bayesian approach to learning time-intensity profiles for such activity data, based on an inhomogeneous Poisson process framework. The proposed approach allows collections of observations (e.g., days) to be grouped together by category (day of week, weekday/weekend, etc.) which in turn leverages data across different collections to yield higher quality profile estimates. When the categorization of days is not a priori certain (e.g., days that fall on a holiday or days with unusual non-recurring additional activity) the model can infer the appropriate categorization, allowing (for example) automated detection of unusual events. On two large real-world data sets the model was able to infer interpretable activity profiles that correspond to real-world phenomena. Directions for further work in this area include richer models that allow for incorporation of observed covariates such as weather and other exogenous phenomena, as well as modeling of multiple spatially-correlated sensors (e.g., loop sensor data for freeway traffic).\n\nReferences\n[1] S. Scott and P. Smyth. The Markov modulated Poisson process and Markov Poisson cascade with applications to web traffic data. Bayesian Statistics, 7:671-680, 2003.\n[2] R. Helmers, I.W. Mangku, and R. Zitikis. Consistent estimation of the intensity function of a cyclic Poisson process. J. Multivar. Anal., 84(1):19-39, January 2003.\n[3] R. Willett and R. Nowak. Multiscale Poisson intensity and density estimation. Submitted to IEEE Trans. IT, January 2005.\n[4] A. Kottas. Bayesian nonparametric mixture modeling for the intensity function of non-homogeneous Poisson processes. 
Technical Report ams2005-02, Department of Applied Math and Statistics, U.C. Santa Cruz, Santa Cruz, CA, 2005.\n[5] A. Kottas and B. Sanso. Bayesian mixture modeling for spatial Poisson process intensities, with applications to extreme value analysis. Technical Report ams2005-19, Dept. of Applied Math and Statistics, U.C. Santa Cruz, Santa Cruz, CA, 2005.\n[6] B.W. Silverman. Density Estimation for Statistics and Data Analysis. Chapman & Hall, NY, 1986.\n[7] Y.W. Teh, M.I. Jordan, M.J. Beal, and D.M. Blei. Hierarchical Dirichlet processes. In NIPS 17, 2004.\n[8] D.R. Cox. Some statistical methods connected with series of events. J. R. Stat. Soc. B, 17:129-164, 1955.\n[9] R.M. Neal. Markov chain sampling methods for Dirichlet process mixture models. J. of Comp. Graph. Stat., 9:283-297, 2000.\n[10] M.D. Escobar and M. West. Bayesian density estimation and inference using mixtures. J. Amer. Stat. Assoc., 90:577-588, 1995.\n[11] L.F. James. Functionals of Dirichlet processes, the Cifarelli-Regazzini identity and Beta-Gamma processes. Ann. Stat., 33(2):647-660, 2005.\n[12] H. Ishwaran and L.F. James. Gibbs sampling methods for stick-breaking priors. J. Amer. Stat. Assoc., 96:161-173, 2001.\n[13] H. Ishwaran and L.F. James. Approximate Dirichlet process computing in finite normal mixtures: smoothing and prior information. J. Comp. Graph. Statist., 11:508-532, 2002.\n[14] J. Sethuraman. A constructive definition of Dirichlet priors. Statistica Sinica, 4:639-650, 1994.\n", "award": [], "sourceid": 3032, "authors": [{"given_name": "Alexander", "family_name": "Ihler", "institution": null}, {"given_name": "Padhraic", "family_name": "Smyth", "institution": null}]}