Model Based Population Tracking and Automatic Detection of Distribution Changes

Part of Advances in Neural Information Processing Systems 14 (NIPS 2001)


Authors

Igor Cadez, P. S. Bradley

Abstract

Probabilistic mixture models are used for a broad range of data analysis tasks such as clustering, classification, predictive modeling, etc. Due to their inherent probabilistic nature, mixture models can easily be combined with other probabilistic or non-probabilistic techniques thus forming more complex data analysis systems. In the case of online data (where there is a stream of data available) models can be constantly updated to reflect the most current distribution of the incoming data. However, in many business applications the models themselves represent a parsimonious summary of the data and therefore it is not desirable to change models frequently, much less with every new data point. In such a framework it becomes crucial to track the applicability of the mixture model and detect the point in time when the model fails to adequately represent the data. In this paper we formulate the problem of change detection and propose a principled solution. Empirical results over both synthetic and real-life data sets are presented.

1 Introduction and Notation

Consider a data set $D = \{x_1, x_2, \ldots, x_n\}$ consisting of $n$ independent, identically distributed (iid) data points. In the context of this paper the data points could be vectors, sequences, etc. Further, consider a probabilistic mixture model that maps each data set to a real number, the probability of observing the data set: