Hierarchically Supervised Latent Dirichlet Allocation

Part of Advances in Neural Information Processing Systems 24 (NIPS 2011)

Bibtex Metadata Paper

Authors

Adler Perotte, Frank Wood, Noemie Elhadad, Nicholas Bartlett

Abstract

We introduce hierarchically supervised latent Dirichlet allocation (HSLDA), a model for hierarchically and multiply labeled bag-of-word data. Examples of such data include web pages and their placement in directories, product descriptions and associated categories from product hierarchies, and free-text clinical records and their assigned diagnosis codes. Out-of-sample label prediction is the primary goal of this work, but improved lower-dimensional representations of the bag-of-word data are also of interest. We demonstrate HSLDA on large-scale data from clinical document labeling and retail product categorization tasks. We show that leveraging the structure from hierarchical labels improves out-of-sample label prediction substantially when compared to models that do not.