Robust Neural Network Regression for Offline and Online Learning

Part of Advances in Neural Information Processing Systems 12 (NIPS 1999)



Thomas Briegel, Volker Tresp


We replace the commonly used Gaussian noise model in nonlinear regression by a more flexible noise model based on the Student-t-distribution. The degrees of freedom of the t-distribution can be chosen such that, as special cases, either the Gaussian distribution or the Cauchy distribution is realized. The latter is commonly used in robust regression. Since the t-distribution can be interpreted as an infinite mixture of Gaussians, parameters and hyperparameters such as the degrees of freedom of the t-distribution can be learned from the data with an EM learning algorithm. We show that modeling with the t-distribution leads to improved predictors on real-world data sets. In particular, if outliers are present, the t-distribution is superior to the Gaussian noise model. In effect, by adapting the degrees of freedom, the system can "learn" to distinguish between outliers and non-outliers. Especially for online learning tasks, one is interested in avoiding inappropriate weight changes due to measurement outliers, so as to maintain stable online learning. We show experimentally that using the t-distribution as a noise model leads to stable online learning algorithms and outperforms state-of-the-art online learning methods such as the extended Kalman filter algorithm.
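To illustrate the scale-mixture idea behind the abstract, the sketch below shows EM for a *linear* regression model with Student-t noise; it is a simplified stand-in, not the paper's algorithm (which uses neural network regressors and also adapts the degrees of freedom). The function name `t_noise_regression` and the choice of a fixed `nu` are assumptions for this example. Each observation carries a latent Gamma-distributed precision scale; the E-step computes its expectation, which automatically down-weights outliers, and the M-step is a weighted least-squares fit:

```python
import numpy as np

def t_noise_regression(X, y, nu=4.0, n_iter=50):
    """EM for linear regression with Student-t noise (illustrative sketch).

    The t-distribution is treated as an infinite scale mixture of
    Gaussians: latently, u_i ~ Gamma(nu/2, nu/2) and
    y_i | u_i ~ N(x_i @ w, sigma2 / u_i).  Observations with large
    residuals receive a small expected u_i and are down-weighted.
    """
    n, d = X.shape
    w = np.linalg.lstsq(X, y, rcond=None)[0]   # ordinary LS initialization
    sigma2 = np.var(y - X @ w)
    for _ in range(n_iter):
        r = y - X @ w
        # E-step: expected precision scale per observation
        u = (nu + 1.0) / (nu + r**2 / sigma2)
        # M-step: weighted least squares, (X^T U X) w = X^T U y
        Xu = X * u[:, None]
        w = np.linalg.solve(X.T @ Xu, Xu.T @ y)
        sigma2 = np.sum(u * (y - X @ w)**2) / n
    return w, sigma2
```

On data contaminated with a fraction of large measurement outliers, this fit stays close to the true parameters where ordinary least squares is pulled away; with large `nu` the weights approach 1 and the Gaussian case is recovered, while small `nu` approaches the Cauchy case mentioned above.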