Integrating Topics and Syntax

Part of Advances in Neural Information Processing Systems 17 (NIPS 2004)

Thomas Griffiths, Mark Steyvers, David Blei, Joshua Tenenbaum


Statistical approaches to language learning typically focus on either short-range syntactic dependencies or long-range semantic dependencies between words. We present a generative model that uses both kinds of dependencies, and can be used to simultaneously find syntactic classes and semantic topics despite having no representation of syntax or semantics beyond statistical dependency. On tasks such as part-of-speech tagging and document classification, this model is competitive with models that exclusively use short-range and long-range dependencies, respectively.
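
The abstract does not spell out the generative process, so the following is only a minimal sketch of one way to combine the two kinds of dependency. It assumes that word classes follow a first-order Markov chain (the short-range part) and that one designated class defers to a per-document topic mixture (the long-range part). All names and probabilities here (`SEMANTIC`, `generate_document`, the toy vocabularies and transition table) are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import random

SEMANTIC = "content"  # hypothetical label for the class that defers to topics


def sample(dist, rng):
    """Draw one outcome from a dict mapping outcomes to probabilities."""
    outcomes = list(dist)
    weights = [dist[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


def generate_document(n_words, topic_mix, topic_words, class_words, transitions, rng):
    """Generate words by mixing a class chain with a document-level topic mixture."""
    words, classes = [], []
    prev = "start"
    for _ in range(n_words):
        # Short-range dependency: the class depends only on the previous class.
        cls = sample(transitions[prev], rng)
        if cls == SEMANTIC:
            # Long-range dependency: the word depends on the document's topics.
            topic = sample(topic_mix, rng)
            word = sample(topic_words[topic], rng)
        else:
            # Syntactic classes emit from their own class-specific distribution.
            word = sample(class_words[cls], rng)
        words.append(word)
        classes.append(cls)
        prev = cls
    return words, classes


# Toy parameters, assumed for illustration only.
topic_words = {
    "sports": {"ball": 0.5, "team": 0.5},
    "finance": {"bank": 0.5, "loan": 0.5},
}
class_words = {
    "det": {"the": 0.7, "a": 0.3},
    "prep": {"of": 0.5, "in": 0.5},
}
transitions = {
    "start": {"det": 0.6, SEMANTIC: 0.4},
    "det": {SEMANTIC: 1.0},
    SEMANTIC: {"prep": 0.5, "det": 0.2, SEMANTIC: 0.3},
    "prep": {"det": 0.7, SEMANTIC: 0.3},
}
topic_mix = {"sports": 0.9, "finance": 0.1}  # one document's topic weights

words, classes = generate_document(
    12, topic_mix, topic_words, class_words, transitions, random.Random(0)
)
print(list(zip(words, classes)))
```

In this sketch, changing `topic_mix` alters which content words appear without affecting the sequence of classes, while changing `transitions` alters word order without affecting topical content. Learning would run in the opposite direction, inferring classes and topics from observed words, which this sketch does not attempt.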