Part of Advances in Neural Information Processing Systems 15 (NIPS 2002)
Koji Tsuda, Motoaki Kawanabe, Klaus-Robert Müller
Recently, the Fisher score (or the Fisher kernel) has been increasingly used as a feature extractor for classification problems. The Fisher score is a vector of parameter derivatives of the log-likelihood of a probabilistic model. This paper gives a theoretical analysis of how class information is preserved in the space of the Fisher score, which turns out to consist of a few important dimensions carrying class information and many nuisance dimensions. When we perform clustering with the Fisher score, K-Means type methods are clearly inappropriate because they make use of all dimensions. We therefore develop a novel but simple clustering algorithm specialized for the Fisher score, which can exploit the important dimensions. This algorithm is successfully tested in experiments with artificial data and real data (amino acid sequences).
The Fisher score is derived as follows: Let us assume that a probabilistic model p(x | θ) is available, together with a parameter estimate θ̂. The Fisher score of a sample x is then the vector of parameter derivatives of the log-likelihood, evaluated at the parameter estimate θ̂.
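To make the definition concrete, the following is a minimal sketch (ours, not code from the paper): it maps samples to Fisher scores under a simple univariate Gaussian model and then clusters them with plain K-means, i.e. the baseline the abstract argues is ill-suited because it treats all dimensions alike. The Gaussian model, the variable names, and the use of NumPy/scikit-learn are illustrative assumptions, not part of the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

def fisher_score(x, mu_hat, log_sigma_hat):
    """Gradient of log N(x; mu, sigma^2) w.r.t. (mu, log_sigma), at the estimate."""
    sigma = np.exp(log_sigma_hat)
    z = (x - mu_hat) / sigma
    return np.array([z / sigma,      # d/d mu        of log p(x | theta)
                     z ** 2 - 1.0])  # d/d log_sigma of log p(x | theta)

# Toy data from two classes; the model is fit ignoring labels, as in clustering.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2.0, 1.0, 100), rng.normal(2.0, 1.0, 100)])

# Maximum-likelihood estimate of the Gaussian parameters from all samples.
mu_hat, log_sigma_hat = data.mean(), np.log(data.std())

# Map every sample to its Fisher score vector (here 2-dimensional).
scores = np.stack([fisher_score(x, mu_hat, log_sigma_hat) for x in data])

# Baseline: K-means directly in the Fisher-score space, using all dimensions.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scores)
print(scores.shape, np.bincount(labels))
```

In richer models most coordinates of the score vector behave as nuisance dimensions, which is why the paper proposes a clustering method specialized to the informative ones rather than the all-dimension baseline shown above.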