On Reversing Jensen's Inequality

Part of Advances in Neural Information Processing Systems 13 (NIPS 2000)



Tony Jebara, Alex Pentland


Jensen's inequality is a powerful mathematical tool and one of the workhorses in statistical learning. Its applications therein include the EM algorithm, Bayesian estimation and Bayesian inference. Jensen computes simple lower bounds on otherwise intractable quantities such as products of sums and latent log-likelihoods. This simplification then permits operations like integration and maximization. Quite often (i.e. in discriminative learning) upper bounds are needed as well. We derive and prove an efficient analytic inequality that provides such variational upper bounds. This inequality holds for latent variable mixtures of exponential family distributions and thus spans a wide range of contemporary statistical models. We also discuss applications of the upper bounds including maximum conditional likelihood, large margin discriminative models and conditional Bayesian inference. Convergence, efficiency and prediction results are shown.
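As a rough illustration of the lower bound that the abstract attributes to Jensen, the sketch below checks numerically that, for a two-component latent mixture, sum_m q_m log(w_m p_m / q_m) never exceeds log sum_m w_m p_m, and that the bound is tight when q is the posterior over components (the choice made in the E-step of EM). The function names, weights and likelihood values are hypothetical and chosen only for illustration. This is the standard Jensen lower bound, not the reverse (upper) bound derived in the paper.

```python
import math

def log_mixture(weights, likelihoods):
    """Exact latent log-likelihood: log sum_m w_m * p_m."""
    return math.log(sum(w * p for w, p in zip(weights, likelihoods)))

def jensen_lower_bound(weights, likelihoods, q):
    """Jensen bound: sum_m q_m * log(w_m * p_m / q_m) <= log sum_m w_m * p_m.

    q is any distribution over the mixture components; terms with
    q_m == 0 contribute nothing and are skipped.
    """
    return sum(qm * math.log(w * p / qm)
               for w, p, qm in zip(weights, likelihoods, q) if qm > 0)

def posterior(weights, likelihoods):
    """Posterior over components, which makes the Jensen bound tight."""
    total = sum(w * p for w, p in zip(weights, likelihoods))
    return [w * p / total for w, p in zip(weights, likelihoods)]

# Hypothetical numbers: mixing weights and per-component likelihoods of one datum.
weights = [0.3, 0.7]
likelihoods = [0.05, 0.20]

exact = log_mixture(weights, likelihoods)
loose = jensen_lower_bound(weights, likelihoods, [0.5, 0.5])
tight = jensen_lower_bound(weights, likelihoods, posterior(weights, likelihoods))

print(f"exact log-likelihood  : {exact:.6f}")
print(f"bound with uniform q  : {loose:.6f}")
print(f"bound with posterior q: {tight:.6f}")
```

The bound replaces the logarithm of a sum with a sum of logarithms, which is what makes the subsequent maximization or integration tractable. A bound in the opposite direction for the same kind of quantity is what the paper sets out to provide.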