[Figure 1 plot: performance vs. test fault probability, with curves for training fault probabilities p = 0.00, 0.05, 0.10, and 0.30.]

Figure 1. Performance for various training conditions. Four 8-30-8 encoders were trained with different probabilities for hidden unit misfiring. Each data point is an average over 1000 random stimuli with random hidden unit faults. Outputs are scored correct if the most active output node corresponds to the active input node.

Judd and Munro

3.2. DISTANCE

3.2.1. Distances increase with fault probability

Distances were measured between all pairs of hidden unit representations. Several networks trained with different fault probabilities and various numbers of hidden units were examined. As expected, both the minimum distances and average distances increase with the training fault probability until it approaches 0.5 per node (see Figure 2). For probabilities above 0.25, the minimum distances fall within the theoretical bounds for a 30-bit code of a 16-symbol alphabet given by Gilbert and Elias (see Blahut, 1987).

[Figure 2 plot: average and minimum distance vs. training fault probability, with the Elias bound indicated.]

Figure 2. Distance increases with fault probability. Average and minimum L1 distances are plotted for 16-30-16 networks trained with fault probabilities ranging from 0.0 to 0.4. Each data point represents an average over 100 networks trained using different weight initializations.

3.2.2. Input probabilities affect distance

The probability distribution over the inputs influences the relative distances of the representations at the hidden unit level.
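The pairwise distance statistics of Section 3.2.1 can be sketched as follows. This is a minimal illustration, not the paper's code: the representation vectors here are random stand-ins for the hidden-unit activations of trained encoders, and the function name is hypothetical.

```python
import itertools

import numpy as np

def l1_distance_stats(reps):
    """Average and minimum L1 distance over all pairs of representation vectors."""
    dists = [float(np.abs(a - b).sum())
             for a, b in itertools.combinations(reps, 2)]
    return sum(dists) / len(dists), min(dists)

# Stand-in hidden representations: 16 symbols, 30 hidden units (cf. the 16-30-16 nets).
rng = np.random.default_rng(0)
reps = rng.random((16, 30))
avg_d, min_d = l1_distance_stats(reps)
```

For a 16-symbol alphabet this examines all 120 unordered pairs; the minimum over those pairs is the quantity compared against the Gilbert and Elias bounds in Figure 2.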
To illustrate this, a 4-10-4 encoder was trained using various probabilities for one of the four inputs (denoted P*), distributing the remaining probability uniformly among the other three. The average distance between the representation of P* and the others increases with its probability, while the average distance among the other three decreases, as shown in the upper part of Figure 3. The more frequent patterns are generally expected to "claim" a larger region of representation space.

Nets with Unreliable Hidden Nodes Learn Error-Correcting Codes
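The experimental setup just described can be sketched as below, assuming a simple fault model. The input sampler follows the paper's description (one symbol with probability P*, the rest uniform); the fault model, which flips a hidden unit's activation with the given probability, is an assumption standing in for "misfiring," and all names are illustrative.

```python
import numpy as np

def sample_one_hot(p_star, n_symbols=4, rng=None):
    """One-hot input: symbol 0 is drawn with probability p_star, with the
    remaining probability spread uniformly over the other symbols."""
    rng = rng or np.random.default_rng()
    probs = np.full(n_symbols, (1.0 - p_star) / (n_symbols - 1))
    probs[0] = p_star
    x = np.zeros(n_symbols)
    x[rng.choice(n_symbols, p=probs)] = 1.0
    return x

def inject_faults(hidden, p_fault, rng=None):
    """Each hidden unit independently misfires with probability p_fault;
    a misfire is modeled here as flipping the unit's activation."""
    rng = rng or np.random.default_rng()
    mask = rng.random(hidden.shape) < p_fault
    return np.where(mask, 1.0 - hidden, hidden)

rng = np.random.default_rng(1)
x = sample_one_hot(0.7, rng=rng)        # input biased toward symbol 0 (P* = 0.7)
h = rng.random(10)                      # stand-in 10-unit hidden layer activation
h_faulty = inject_faults(h, 0.3, rng=rng)
```

During training, faults injected this way force representations apart, since patterns whose hidden codes are close are easily confused once units misfire.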