Barak Pearlmutter, Ronald Rosenfeld
We present a unified framework for a number of different ways of failing to generalize properly. During learning, sources of random information contaminate the network, effectively augmenting the training data with random information. The complexity of the function computed is therefore increased, and generalization is degraded. We analyze replicated networks, in which a number of identical networks are independently trained on the same data and their results averaged. We conclude that replication almost always results in a decrease in the expected complexity of the network, and that replication therefore increases expected generalization. Simulations confirming the effect are also presented.
1 BROKEN SYMMETRY CONSIDERED HARMFUL
Consider a one-unit backpropagation network trained on exclusive or. Without hidden units, the problem is insoluble. One point where learning would stop is when all weights are zero and the output is always ~, resulting in an mean squared error of ~. But this is a saddle point; by placing the discrimination boundary properly, one point can be gotten correctly, two with errors of ~, and one with error of i, giving an MSE of i, as shown in figure 1. Networks are initialized with small random weights, or noise is injected during train(cid:173) ing to break symmetries of this sort. But in breaking this symmetry, something has been lost. Consider a kNN classifier, constructed from a kNN program and the training data. Anyone who has a copy of the kNN program can construct an iden(cid:173) tical classifier if they receive the training data. Thus, considering the classification