Distribution of Mutual Information

Part of Advances in Neural Information Processing Systems 14 (NIPS 2001)

Bibtex Metadata Paper


Marcus Hutter


The mutual information of two random variables z and J with joint probabilities {7rij} is commonly used in learning Bayesian nets as well as in many other fields. The chances 7rij are usually estimated by the empirical sampling frequency nij In leading to a point es(cid:173) timate J(nij In) for the mutual information. To answer questions like "is J (nij In) consistent with zero?" or "what is the probability that the true mutual information is much larger than the point es(cid:173) timate?" one has to go beyond the point estimate. In the Bayesian framework one can answer these questions by utilizing a (second order) prior distribution p( 7r) comprising prior information about 7r. From the prior p(7r) one can compute the posterior p(7rln), from which the distribution p(Iln) of the mutual information can be cal(cid:173) culated. We derive reliable and quickly computable approximations for p(Iln). We concentrate on the mean, variance, skewness, and kurtosis, and non-informative priors. For the mean we also give an exact expression. Numerical issues and the range of validity are discussed.