{"title": "Human and Machine 'Quick Modeling'", "book": "Advances in Neural Information Processing Systems", "page_first": 1151, "page_last": 1158, "abstract": null, "full_text": "Human and Machine 'Quick Modeling' \n\nJakob Bernasconi \nAsea Brown Boveri Ltd \nCorporate Research \nCH-5405 Baden, \nSWITZERLAND \n\nKarl Gustafson \nUniversity of Colorado \nDepartment of Mathematics and \nOptoelectronic Computing Center \nBoulder, CO 80309 \n\nABSTRACT \n\nWe present here an interesting experiment in 'quick modeling' by humans, \nperformed independently on small samples, in several languages and two \ncontinents, over the last three years. Comparisons to decision tree proce(cid:173)\ndures and neural net processing are given. From these, we conjecture that \nhuman reasoning is better represented by the latter, but substantially dif(cid:173)\nferent from both. Implications for the 'strong convergence hypothesis' be(cid:173)\ntween neural networks and machine learning are discussed, now expanded \nto include human reasoning comparisons. \n\n1 \n\nINTRODUCTION \n\nUntil recently the fields of symbolic and connectionist learning evolved separately. \nSuddenly in the last two years a significant number of papers comparing the two \nmethodologies have appeared. A beginning synthesis of these two fields was forged \nat the NIPS '90 Workshop #5 last year (Pratt and Norton, 1990), where one may \nfind a good bibliography of the recent work of Atlas, Dietterich, Omohundro, Sanger, \nShavlik, Tsoi, Utgoff and others. \n\nIt was at that NIPS '90 Workshop that we learned of these studies, most of which \nconcentrate on performance comparisons of decision tree algorithms (such as ID3, \nCART) and neural net algorithms (such as Perceptrons, Backpropagation). 
Independently, three years ago, we had looked at Quinlan's ID3 scheme (Quinlan, 1984). Not agreeing, intuitively and rather instantly, with the generalization ID3 obtains when its rule for a sample of 8 items is extended to 12 items, we subjected this example to a variety of human experiments. We report our findings, as compared to the performance of ID3 and also to various neural net computations. \n\n1151 \n\nBecause our focus on humans was substantially different from most of the other mentioned studies, we also briefly discuss some important related issues for further investigation. More details are given elsewhere (Bernasconi and Gustafson, to appear). \n\n2 THE EXPERIMENT \n\nTo illustrate his ID3 induction algorithm, Quinlan (1984) considers a set C consisting of 8 objects, with attributes height, hair, and eyes. The objects are described in terms of their attribute values and classified into two classes, \"+\" and \"-\", respectively (see Table 1). The problem is to find a rule which correctly classifies all objects in C, and which is in some sense minimal. \n\nTable 1: The set C of objects in Quinlan's classification example. \n\nObject  Height     Hair       Eyes        Class \n1       (s) short  (b) blond  (bl) blue   + \n2       (t) tall   (b) blond  (br) brown  - \n3       (t) tall   (r) red    (bl) blue   + \n4       (s) short  (d) dark   (bl) blue   - \n5       (t) tall   (d) dark   (bl) blue   - \n6       (t) tall   (b) blond  (bl) blue   + \n7       (t) tall   (d) dark   (br) brown  - \n8       (s) short  (b) blond  (br) brown  - \n\nThe ID3 algorithm uses an information-theoretic approach to construct a \"minimal\" classification rule, in the form of a decision tree, which correctly classifies all objects in the learning set C. In Figure 1, we show two possible decision trees which correctly classify all 8 objects of the set C. Decision tree 1 is the one selected by the ID3 algorithm. 
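ID3's selection of \"Hair\" as the root attribute can be reproduced with a short information-gain computation over the 8 objects of Table 1. The following is our own illustrative sketch, not the authors' implementation; it assumes the standard entropy-based gain criterion.

```python
# Sketch (illustration only): ID3's information-gain choice of root attribute
# for the 8 objects of Table 1, using the standard entropy formulation.
from math import log2

# Sample C: object id -> ((height, hair, eyes), class)
C = {
    1: (("short", "blond", "blue"),  "+"),
    2: (("tall",  "blond", "brown"), "-"),
    3: (("tall",  "red",   "blue"),  "+"),
    4: (("short", "dark",  "blue"),  "-"),
    5: (("tall",  "dark",  "blue"),  "-"),
    6: (("tall",  "blond", "blue"),  "+"),
    7: (("tall",  "dark",  "brown"), "-"),
    8: (("short", "blond", "brown"), "-"),
}
ATTRS = ["height", "hair", "eyes"]

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n)
                for c in (labels.count(l) for l in set(labels)))

def gain(objects, attr_idx):
    """Information gain of splitting the sample on one attribute."""
    labels = [cls for _, cls in objects.values()]
    rem = 0.0
    for v in {vals[attr_idx] for vals, _ in objects.values()}:
        subset = [cls for vals, cls in objects.values() if vals[attr_idx] == v]
        rem += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - rem

gains = {a: gain(C, i) for i, a in enumerate(ATTRS)}
best = max(gains, key=gains.get)  # "hair": the root of decision tree 1
```

Splitting on hair classifies the red and dark objects immediately, which is why its gain (about 0.45 bits) dominates eyes (about 0.35) and height (nearly 0).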
As can be seen, \"Hair\" as root of the tree classifies four of the \neight objects immediately. Decision tree 2 requires the same number of tests and \nhas the same number of branches, but \"Eyes\" as root classifies only three objects \nat the first level of the tree. \n\nConsider now how the decision trees of Figure 1 classify the remaining four possible \nobjects in the set complement C'. Table 2 shows that the two decision trees lead to \na different classification of the four objects of sample C'. We observe that the ID3-\npreferred decision tree 1 places a large importance on the \"red\" attribute (which \noccurs only in one object of sample C), while decision tree 2 puts much less emphasis \non this particular attribute. \n\n\fHuman and Machine 'Quick Modeling' \n\n1153 \n\nDecision tree 1 \n\nDecision tree 2 \n\nFigure 1: Two possible decision trees for the classification of sample C (Table 1) \n\nTable 2: The set C' of the remaining four objects, and their classification by the \ndecision trees of Figure 1. \n\nObject Attribute Classification \nTree 1 Tree 2 \n\nValues \n\n9 \n10 \n11 \n12 \n\ns \ns \ns \nt \n\nd \nr \nr \nr \n\nbr \nbl \nbr \nbr \n\n+ \n+ \n+ \n\n+ \n\n3 GENERALIZATIONS BY HUMANS AND NEURAL \n\nNETS \n\nCurious about these differences in the generalization behavior, we have asked some \nhumans (colleagues, graduate students, undergraduate students, some nonscientists \nalso) to \"look\" at the original sample C of 8 items, presented to them without \nwarning, and to \"use\" this information to classify the remaining 4 objects. Over \nsome time, we have accumulated a \"human sample\" of total size 73 from 3 continents \nrepresenting 14 languages. The results of this human generalization experiment are \nsummarized in Table 3. We observe that about 2/3 of the test persons generalized \nin the same manner as decision tree 2, and that less than 10 percent arrived at the \ngeneralization corresponding to the ID3-preferred decision tree 1. 
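The two decision trees of Figure 1 are small enough to write out directly. The following sketch is our own illustration (not code from the paper); it reproduces the Table 2 classifications of the complement set C'.

```python
# Sketch (illustration only): the two decision trees of Figure 1 as plain
# functions, applied to objects 9-12 of the complement set C' (Table 2).

def tree1(height, hair, eyes):
    """Decision tree 1 (ID3-preferred): root = Hair. Height is unused."""
    if hair == "dark":
        return "-"
    if hair == "red":
        return "+"
    return "+" if eyes == "blue" else "-"   # blond hair: test eyes

def tree2(height, hair, eyes):
    """Decision tree 2: root = Eyes. Height is unused."""
    if eyes == "brown":
        return "-"
    return "-" if hair == "dark" else "+"   # blue eyes: test hair

# Objects 9-12 of Table 2
C_prime = {
    9:  ("short", "dark", "brown"),
    10: ("short", "red",  "blue"),
    11: ("short", "red",  "brown"),
    12: ("tall",  "red",  "brown"),
}
pattern1 = "".join(tree1(*o) for o in C_prime.values())  # "-+++"
pattern2 = "".join(tree2(*o) for o in C_prime.values())  # "-+--"
```

Tree 1 yields the generalization (- + ++) and tree 2 yields (- + --), the two patterns compared throughout the rest of the paper.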
\n\nTable 3: Classification of objects 9 through 12 by Humans and by a Neural Net. Based on a total sample of 73 humans. Each of the 4 contributing subsamples from different languages and locations gave consistent percentages. \n\nObject       Attribute Values   A       B       C       D       E       Other \n9            s d br             - \n10           s r bl             + \n11           s r br             - \n12           t r br             - \nHumans:                         65.8%   8.2%    4.1%    9.6%            12.3% \nNeural Net:                     71.4%   12.1%   9.4%    4.2%    2.9% \n\nWe also subjected this generalization problem to a variety of neural net computations. In particular, we analyzed a simple perceptron architecture with seven input units representing a unary coding of the attribute values (i.e., a separate input unit for each attribute value). The eight objects of sample C (Table 1) were used as training examples, and we employed the perceptron learning procedure (Rumelhart and McClelland, 1986) for a threshold output unit. In our initial experiment, the starting weights were chosen randomly in (-1,1) and the learning parameter h (the magnitude of the weight changes) was varied between 0.1 and 1. After training, the net was asked to classify the unseen objects 9 to 12 of Table 2. Out of the 16 possible classifications of this four-object test set, only 5 were realized by the neural net (labelled A through E in Table 3). The percentage values given in Table 3 refer to a total of 9000 runs (3000 each for h = 0.1, 0.5, and 1.0, respectively). As can be seen, there is a remarkable correspondence between the solution profile of the neural net computations and that of the human experiment. 
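The perceptron experiment can be sketched as follows. This is our own illustration under stated assumptions: a 7-unit unary (one-hot per attribute) coding, a threshold output unit, weights initialized uniformly in (-1, 1), training until a full error-free pass (a detail the paper does not spell out), and h = 0.5, one of the three learning rates used.

```python
# Sketch (illustration only, our assumptions noted above): a perceptron with
# 7 unary inputs trained on sample C, then asked to classify objects 9-12.
import random

VALUES = ["short", "tall", "blond", "red", "dark", "blue", "brown"]

def encode(obj):
    """Unary coding: one input unit per attribute value."""
    return [1.0 if v in obj else 0.0 for v in VALUES]

# Training sample C (Table 1) with classes coded as +1 / -1
C = [(("short", "blond", "blue"),  +1), (("tall", "blond", "brown"), -1),
     (("tall",  "red",   "blue"),  +1), (("short", "dark", "blue"),  -1),
     (("tall",  "dark",  "blue"),  -1), (("tall", "blond", "blue"),  +1),
     (("tall",  "dark",  "brown"), -1), (("short", "blond", "brown"), -1)]

def train(seed, h=0.5):
    """Perceptron learning rule; converges since C is linearly separable."""
    rng = random.Random(seed)
    w = [rng.uniform(-1, 1) for _ in range(7)]
    b = rng.uniform(-1, 1)
    while True:
        errors = 0
        for obj, t in C:
            x = encode(obj)
            y = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
            if y != t:
                errors += 1
                w = [wi + h * t * xi for wi, xi in zip(w, x)]
                b += h * t
        if errors == 0:
            return w, b

def classify(w, b, obj):
    return "+" if sum(wi * xi for wi, xi in zip(w, encode(obj))) + b > 0 else "-"

w, b = train(seed=0)
unseen = [("short", "dark", "brown"), ("short", "red", "blue"),
          ("short", "red", "brown"),  ("tall",  "red", "brown")]
generalization = "".join(classify(w, b, o) for o in unseen)
```

Repeating the run over many random seeds and tallying the resulting 4-character patterns gives a solution profile of the kind reported in Table 3; which pattern a single run produces depends on the random initialization.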
\n\n4 BACKWARD PREDICTION \n\nThere exist many different rules which all correctly classify the given set C of 8 objects (Table 1), but which lead to a different generalization behavior, i.e., to a different classification of the remaining objects 9 to 12 (see Tables 2 and 3). From a formal point of view, all of the 16 possible classifications of objects 9 to 12 are equally probable, so that no a priori criterion seems to exist to prefer one generalization over the other. We have nevertheless attempted to quantify the obviously ill-defined notion of \"meaningful generalization\". To estimate the relative \"quality\" of different classification rules, we propose to analyze the \"backward prediction ability\" of the respective generalizations. This is evaluated as follows. An appropriate learning method (e.g., neural nets) is used to construct rules which explain a given classification of objects 9 to 12, and these rules are applied to classify the initial set C of 8 objects. The 16 possible generalizations can then be rated according to their \"backward prediction accuracy\" with respect to the original classification of the sample C. We have performed a number of such calculations and consistently found that the 5 generalizations chosen by the neural nets in the forward prediction mode (cf. Table 3) have by far the highest backward prediction accuracy (on the average between 5 and 6 correct classifications). Their negations (\"+\" exchanged with \"-\"), on the other hand, predict only about 2 to 3 of the 8 original classifications correctly, while the remaining 6 possible generalizations all have a backward prediction accuracy close to 50% (4 out of 8 correct). These results, representing averages over 1000 runs, are given in Table 4. \n\nTable 4: Neural Net backward prediction accuracy for the different classifications of objects 9 to 12. 
\n\nBackward prediction accuracy (%), over the 16 possible classifications of objects 9 to 12, in decreasing order: \n76.0, 71.2, 71.1, 67.9, 61.9, 52.6, 52.5, 52.5, \n47.4, 47.3, 47.0, 37.2, 31.7, 30.1, 28.3, 23.6 \n\nIn addition to Neural Nets, we have also used the ID3 method to evaluate the backward predictive power of different generalizations. This method generates fewer rules than the Neural Nets (often only a single one), but the resulting tables of backward prediction accuracies all exhibit the same qualitative features. As examples, we show in Figure 2 the ID3 backward prediction trees for two different generalizations, the ID3-preferred generalization which classifies the objects 9 to 12 as (- + ++), and the Human and Neural Net generalization (- + --). Both trees have a backward prediction accuracy of 75% (provided that \"blond hair\" in tree (a) is classified randomly). \n\n(a)    (b) \n\nFigure 2: ID3 backward prediction trees, (a) for the ID3-preferred generalization (- + ++), and (b) for the generalization preferred by Humans and Neural Nets, (- + --) \n\nThe overall backward prediction accuracy is not the only quantity of interest in these calculations. We can, for example, examine how well the original classification of an individual object in the set C is reproduced by predicting backwards from a given generalization. \n\nSome examples of such backward prediction profiles are shown in Figure 3. From both the ID3 and the Neural Net calculations, it is evident that the backward prediction behavior of the Human and Neural Net generalization is much more informative than that of the ID3-solution, even though the two solutions have almost the same average backward prediction accuracy. \n\nID3 Backward Prediction: [graphs (a), (b); prediction probability vs. object number 1-8] 
\n\nNeural Net Backward Prediction: [graphs (a), (b); prediction probability vs. object number 1-8] \n\nFigure 3: Individual backward prediction probabilities for the ID3-preferred generalization [graphs (a)], and for the Human and Neural Net generalization [graphs (b)]. \n\nFinally, we have recently performed a Human backward prediction experiment. These results are given in Table 5. Details will be given elsewhere (Bernasconi and Gustafson, to appear). Note that the Backward Prediction results are commensurate with the Forward Prediction in both cases. \n\nTable 5: Human backward predictions and accuracy from the two principal forward generalizations A (Neural Nets, Humans) and B (ID3). \n\nObject     Class   Backward from A   Backward from B \n1          + \n2          - \n3          + \n4          - \n5          - \n6          + \n7          - \n8          - \nHumans:            59% / 12%         33% / 17% \nAccuracy:          75% / 100%        75% / 75% \n\n5 DISCUSSION AND CONCLUSIONS \n\nOur basic conclusion from this experiment is that the \"Strong Convergence Hypothesis\" that Machine Learning and Neural Network algorithms are \"close\" can be sharpened, with the two fields then better distinguished, by comparison to Human Modelling. From the experiment described here, we conjecture a \"Stronger Convergence Hypothesis\" that Humans and Neural Nets are \"closer.\" \n\nFurther conclusions related to minimal network size (re Pavel, Gluck, Henkle, 1989), crossvalidation (see Weiss and Kulikowski, 1991), sharing over nodes (as in Dietterich, Hild, Bakiri, to appear, and Atlas et al., 1990), and rule extracting (Shavlik et al., to appear), will appear elsewhere (Bernasconi and Gustafson, to appear). 
Although we have other experiments on other test sets underway, it should be stressed that our investigations, especially toward Human comparisons, are only preliminary and should be viewed as a stimulus to further investigations. \n\nACKNOWLEDGEMENT \n\nThis work was partially supported by the NFP 23 program of the Swiss National Science Foundation and by the US-NSF grant CDR8622236. \n\nREFERENCES \n\nL. Y. Pratt and S. W. Norton, \"Neural Networks and Decision Tree Induction: Exploring the Relationship Between Two Research Areas,\" NIPS '90 Workshop #5 Summary (1990), 7 pp. \nJ. Ross Quinlan, \"Learning Efficient Classification Procedures and Their Application to Chess End Games,\" in Machine Learning: An Artificial Intelligence Approach, edited by R. S. Michalski, J. G. Carbonell, and T. M. Mitchell, Springer-Verlag, Berlin (1984), 463-482. \nD. E. Rumelhart and J. L. McClelland (Eds.), Parallel Distributed Processing, Vol. 1, MIT Press, Cambridge, MA (1986). \nJ. Bernasconi and K. Gustafson, \"Inductive Inference and Neural Nets,\" to appear. \nJ. Bernasconi and K. Gustafson, \"Generalization by Humans, Neural Nets, and ID3,\" IJCNN-91-Seattle. \nY. H. Pao, Adaptive Pattern Recognition and Neural Networks, Addison-Wesley (1989), Chapter 4. \nM. Pavel, M. A. Gluck and V. Henkle, \"Constraints on Adaptive Networks for Modelling Human Generalization,\" in Advances in Neural Information Processing Systems 1, edited by D. Touretzky, Morgan Kaufmann, San Mateo, CA (1989), 2-10. \nS. Weiss and C. Kulikowski, Computer Systems that Learn, Morgan Kaufmann (1991). \nT. G. Dietterich, H. Hild, and G. Bakiri, \"A Comparison of ID3 and Backpropagation for English Text-to-Speech Mapping,\" Machine Learning, to appear. \nL. Atlas, R. Cole, J. Connor, M. El-Sharkawi, R. Marks, Y. Muthusamy, E. 
Barnard, \"Performance Comparisons Between Backpropagation Networks and Classification Trees on Three Real-World Applications,\" in Advances in Neural Information Processing Systems 2, edited by D. Touretzky, Morgan Kaufmann (1990), 622-629. \nJ. Shavlik, R. Mooney, G. Towell, \"Symbolic and Neural Learning Algorithms: An Experimental Comparison (revised),\" Machine Learning, (1991, to appear). \n", "award": [], "sourceid": 445, "authors": [{"given_name": "Jakob", "family_name": "Bernasconi", "institution": null}, {"given_name": "Karl", "family_name": "Gustafson", "institution": null}]}