Distributed Inference for Latent Dirichlet Allocation

Part of Advances in Neural Information Processing Systems 20 (NIPS 2007)


Authors

David Newman, Padhraic Smyth, Max Welling, Arthur Asuncion

Abstract



We investigate the problem of learning a widely-used latent-variable model, the Latent Dirichlet Allocation (LDA) or "topic" model, using distributed computation, where each of P processors only sees 1/P of the total data set. We propose two distributed inference schemes that are motivated from different perspectives. The first scheme uses local Gibbs sampling on each processor with periodic updates; it is simple to implement and can be viewed as an approximation to a single-processor implementation of Gibbs sampling. The second scheme relies on a hierarchical Bayesian extension of the standard LDA model to directly account for the fact that data are distributed across P processors; it has a theoretical guarantee of convergence but is more complex to implement than the approximate method. Using five real-world text corpora we show that distributed learning works very well for LDA models, i.e., perplexity and precision-recall scores for distributed learning are indistinguishable from those obtained with single-processor learning. Our extensive experimental results include large-scale distributed computation on 1000 virtual processors, and speedup experiments of learning topics in a 100-million-word corpus using 16 processors.
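Below is a minimal sketch, in Python, of the first (approximate) scheme described in the abstract: each simulated processor runs collapsed Gibbs sampling over its own shard of documents against a stale local copy of the topic-word counts, and the copies are reconciled after every sweep. The toy corpus, hyperparameter values, and the specific merge rule used here are illustrative assumptions, not the paper's exact algorithmic or experimental details.

```python
# Sketch of approximate distributed collapsed Gibbs sampling for LDA:
# partition documents across P "processors", sweep each shard locally,
# then fold each processor's net change back into the global counts.
import numpy as np

rng = np.random.default_rng(0)

K, V, P = 4, 50, 2              # topics, vocabulary size, processors (assumed toy values)
alpha, beta = 0.1, 0.01         # symmetric Dirichlet hyperparameters (assumed)

# Toy corpus: each document is a list of word ids (assumed synthetic data).
docs = [rng.integers(0, V, size=rng.integers(20, 40)).tolist() for _ in range(40)]

# Random initial topic assignment for every token.
z = [[int(rng.integers(0, K)) for _ in d] for d in docs]

# Global counts built from the initialization.
Nwk = np.zeros((V, K))            # word-topic counts
Ndk = np.zeros((len(docs), K))    # document-topic counts
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        Nwk[w, z[d][i]] += 1
        Ndk[d, z[d][i]] += 1

shards = np.array_split(np.arange(len(docs)), P)   # static document partition


def local_sweep(doc_ids, Nwk_local):
    """One collapsed Gibbs sweep over a shard, using a local copy of Nwk."""
    Nk_local = Nwk_local.sum(axis=0)
    for d in doc_ids:
        for i, w in enumerate(docs[d]):
            k_old = z[d][i]
            # Remove the token's current assignment from the counts.
            Nwk_local[w, k_old] -= 1; Ndk[d, k_old] -= 1; Nk_local[k_old] -= 1
            # Standard collapsed Gibbs conditional for LDA.
            p = (Nwk_local[w] + beta) / (Nk_local + V * beta) * (Ndk[d] + alpha)
            k_new = rng.choice(K, p=p / p.sum())
            z[d][i] = k_new
            Nwk_local[w, k_new] += 1; Ndk[d, k_new] += 1; Nk_local[k_new] += 1
    return Nwk_local


for sweep in range(50):
    old = Nwk.copy()
    # Each processor sweeps its shard "in parallel" against a copy of the global counts.
    local_counts = [local_sweep(shard, Nwk.copy()) for shard in shards]
    # Periodic update: add each processor's net change back into the global counts.
    Nwk = old + sum(loc - old for loc in local_counts)
```

Because documents are partitioned, each document-topic row is only ever touched by one processor, so only the word-topic counts need reconciling; the global update simply accumulates every shard's net change since the last synchronization.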