{"title": "MURPHY: A Robot that Learns by Doing", "book": "Neural Information Processing Systems", "page_first": 544, "page_last": 553, "abstract": "", "full_text": "544 \n\nMURPHY: A Robot that Learns by Doing \n\nBartlett W. Mel \n\nCenter for Complex Systems Research \n\nUniversity of Illinois \n508 South Sixth Street \nChampaign, IL 61820 \n\nJanuary 2, 1988 \n\nAbstract \n\nMURPHY consists of a camera looking at a robot arm, with a connectionist network \narchitecture situated in between. By moving its arm through a small, representative \nsample of the 1 billion possible joint configurations, MURPHY learns the relationships, \nbackwards and forwards, between the positions of its joints and the state of its visual field. \nMURPHY can use its internal model in the forward direction to \"envision\" sequences \nof actions for planning purposes, such as in grabbing a visually presented object, or in \nthe reverse direction to \"imitate\", with its arm, autonomous activity in its visual field. \nFurthermore, by taking explicit advantage of continuity in the mappings between visual \nspace and joint space, MURPHY is able to learn non-linear mappings with only a single \nlayer of modifiable weights. \n\nBackground \n\nCurrent Focus Of Learning Research \n\nMost connectionist learning algorithms may be grouped into three general catagories, \ncommonly referred to as supenJised, unsupenJised, and reinforcement learning. Supervised \nlearning requires the explicit participation of an intelligent teacher, usually to provide the \nlearning system with task-relevant input-output pairs (for two recent examples, see [1,2]). \nUnsupervised learning, exemplified by \"clustering\" algorithms, are generally concerned \nwith detecting structure in a stream of input patterns [3,4,5,6,7]. 
In its final state, an unsupervised learning system will typically represent the discovered structure as a set of categories representing regions of the input space, or, more generally, as a mapping from the input space into a space of lower dimension that is somehow better suited to the task at hand. In reinforcement learning, a "critic" rewards or penalizes the learning system, until the system ultimately produces the correct output in response to a given input pattern [8].

It has seemed an inevitable tradeoff that systems needing to rapidly learn specific, behaviorally useful input-output mappings must necessarily do so under the auspices of an intelligent teacher with a ready supply of task-relevant training examples. This state of affairs has seemed somewhat paradoxical, since the processes of perceptual and cognitive development in human infants, for example, do not depend on the moment by moment intervention of a teacher of any sort.

© American Institute of Physics 1988

Learning by Doing

The current work has been focused on a fourth type of learning algorithm, i.e. learning-by-doing, an approach that has been very little studied from either a connectionist perspective or in the context of more traditional work in machine learning. In its basic form, the learning agent

• begins with a repertoire of actions and some form of perceptual input,

• exercises its repertoire of actions, learning to predict i) the detailed sensory consequences of its actions, and, in the other direction, ii) its actions that are associated with incoming sensory patterns, and

• runs its internal model (in one or both directions) in a variety of behaviorally-relevant tasks, e.g. to "envision" sequences of actions for planning purposes, or to internally "imitate", via its internal action representation, an autonomously generated pattern of perceptual activity.
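The three-step loop above can be sketched as a toy program. This is a minimal illustration only; the environment function, action repertoire, and model class below are hypothetical stand-ins, not MURPHY's actual implementation.

```python
# Toy sketch of the learning-by-doing loop. The environment function,
# action repertoire, and model class are hypothetical illustrations.

def environment(action):
    """Stand-in for physics: deterministically map an action to a percept."""
    return ("percept", action * 2)

class ForwardInverseModel:
    """Record action -> percept and percept -> action associations."""
    def __init__(self):
        self.forward = {}   # "envision": action to predicted percept
        self.inverse = {}   # "imitate": percept to associated action

    def observe(self, action, percept):
        self.forward[action] = percept
        self.inverse[percept] = action

model = ForwardInverseModel()
for action in range(10):             # exercise the whole action repertoire
    percept = environment(action)    # physics supplies the "training pair"
    model.observe(action, percept)

# Run the model forward ("envision") or backward ("imitate").
assert model.forward[3] == ("percept", 6)
assert model.inverse[("percept", 6)] == 3
```

No teacher appears anywhere in the loop: the environment itself pairs each action with its percept, and the agent merely records the relationship in both directions.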
In comparison to standard supervised learning algorithms, the crucial property of learning-by-doing is that no intelligent teacher is needed to provide input-output pairs for learning. Laws of physics simply translate actions into their resulting percepts, both of which are represented internally. The learning agent need only notice and record this relationship for later use. In contrast to traditional unsupervised learning approaches, learning-by-doing allows the acquisition of specific, task-relevant mappings, such as the relationship between a simultaneously represented visual and joint state. Learning-by-doing differs as well from reinforcement paradigms in that it can operate in the absence of a critic, i.e. in situations where reward or penalty for a particular training instance may be inappropriate.

Learning by doing may therefore be described as an unsupervised associative algorithm, capable of acquiring rich, task-relevant associations, but without an intelligent teacher or critic.

Abridged History of the Approach

The general concept of learning by doing may be attributed at least to Piaget from the 1940's (see [9] for review). Piaget, the founder of the "constructivist" school of cognitive development, argued that knowledge is not given to a child as a passive observer, but is rather discovered and constructed by the child, through active manipulation of the environment. A handful of workers in artificial intelligence have addressed the issue of learning-by-doing, though only in highly schematized, simulated domains, where actions and sensory states are represented as logical predicates [10,11,12,13].

Barto & Sutton [14] discuss learning-by-doing in the context of system identification and motor control.
They demonstrated how a simple simulated automaton with two actions and three sensory states can build a model of its environment through exploration, and subsequently use it to choose among behavioral alternatives. In a similar vein, Rumelhart [15] has suggested this same approach could be used to learn the behavior of a robot arm or a set of speech articulators. Furthermore, the forward-going "mental model", once learned, could be used internally to train an inverse model using back-propagation.

In previous work, this author [16] described a connectionist model (VIPS) that learned to perform 3-D visual transformations on simulated wire-frame objects. Since in complex sensory-motor environments it is not possible, in general, to learn a direct relationship between an outgoing command state and an incoming sensory state, VIPS was designed to predict changes in the state of its visual field as a function of its outgoing motor command. VIPS could then use its generic knowledge of motor-driven visual transformations to "mentally rotate" objects through a series of steps.

[Figure 1 diagram labels: Retinotopic Visual Representation; Value-Coded Joint Representation]

Figure 1: MURPHY's Connectionist Architecture. 4096 coarsely-tuned visual units are organized in a square, retinotopic grid. These units are bi-directionally interconnected with a population of 273 joint units. The joint population is subdivided into 3 subpopulations, each one a value-coded representation of joint angle for one of the three joints. During training, activity in the joint unit population determines the physical arm configuration.

Inside MURPHY

The current work has sought to further explore the process of learning-by-doing in a complex sensory-motor domain, extending previous work in three ways. First, the learning of mappings between sensory and command (e.g.
motor) representations should be allowed to proceed in both directions simultaneously during exploratory behavior, where each mapping may ultimately subserve a very different behavioral goal. Secondly, MURPHY has been implemented with a real camera and robot arm in order to insure representational realism to the greatest degree possible. Third, while the specifics of MURPHY's internal structures are not intended as a model of a specific neural system, a serious attempt has been made to adhere to architectural components and operations that have either been directly suggested by nervous system structures, or are at least compatible with what is currently known. Detailed biological justification on this point awaits further work.

MURPHY's Body

MURPHY consists of a 512 x 512 JVC color video camera pointed at a Rhino XR-3 robotic arm. Only the shoulder, elbow, and wrist joints are used, such that the arm can move only in the image plane of the camera. (A fourth, waist joint is not used.) White spots are stuck to the arm in convenient places; when the image is thresholded, only the white spots appear in the image. This arrangement allows continuous control over the complexity of the visual image of the arm, which in turn affects time spent both in computing visual features and processing weights during learning. A Datacube image processing system is used for the thresholding operation and to "blur" the image in real time with a gaussian mask. The degree of blur is variable and can be used to control the degree of coarse-coding (i.e. receptive field overlap) in the camera-topic array of visual units. The arm is software controllable, with a stepper motor for each joint. Arm dynamics are not considered in this work.

MURPHY's Mind

MURPHY is currently based on two interconnected populations of neuron-like units.
The first is organized as a rectangular, visuotopically-mapped 64 x 64 grid of coarsely-tuned visual units, each of which responds when a visual feature (such as a white spot on the arm) falls into its receptive field (fig. 1). Coarse coding insures that a single visual feature will activate a small population of units whose receptive fields overlap the center of stimulation. The upper trace in figure 2 shows the unprocessed camera view, and the center trace depicts the resulting pattern of activation over the grid of visual units.

The second population of 273 units consists of three subpopulations, representing the angles of each of the three joints. The angle of each joint is value-coded in a line of units dedicated to that joint (fig. 1). Each unit in the population is "centered" at some joint angle, and is maximally activated when the joint is to be sent to that angle. Neighboring joint units within a joint subpopulation have overlapping "projective fields" and progressively increasing joint-angle centers.

It may be noticed that both populations of units are coarsely tuned, that is, the units have overlapping receptive fields whose centers vary in an orderly fashion from unit to neighboring unit. This style of representation is extremely common in biological sensory systems [17,18,19], and has been attributed a number of representational advantages (e.g. fewer units needed to encode a range of stimuli, increased immunity to noise and unit malfunction, and finer stimulus discriminations). A number of additional advantages of this type of encoding scheme are discussed below, in relation to ease of learning, speed of learning, and efficacy of generalization.
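As a concrete illustration of the value-coded joint representation just described, the sketch below gives each joint unit a triangular tuning curve over joint angle. The unit count, spacing, and tuning width are illustrative assumptions, not MURPHY's actual parameters.

```python
import numpy as np

def joint_population(angle, centers, width):
    """Activation of a value-coded line of units: each unit peaks when the
    joint angle hits its center and falls off linearly over `width` degrees,
    so neighboring units have overlapping projective fields."""
    return np.maximum(0.0, 1.0 - np.abs(angle - centers) / width)

# Hypothetical shoulder subpopulation: one unit per degree over 0..90.
centers = np.linspace(0.0, 90.0, 91)
acts = joint_population(37.4, centers, width=5.0)

# A small cluster of neighboring units is active, strongest near 37.4 deg.
assert centers[int(np.argmax(acts))] == 37.0
assert int(np.count_nonzero(acts)) == 10
```

Because the tuning curves overlap, a single joint angle is encoded by the graded activity of several neighboring units, which is what later allows smooth interpolation between training configurations.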
MURPHY's Education

By moving its arm through a small, representative sample (approximately 4000) of the 1 billion possible joint configurations, MURPHY learns the relationships, backwards and forwards, between the positions of its joints and the state of its visual field. During training, the physical environment to which MURPHY's visual and joint representations are wired enforces a particular mapping between the states of these two representations. The mapping comprises both the kinematics of the arm as well as the optical parameters and global geometry of the camera/imaging system. It is incrementally learned as each unit in population B comes to "recognize", through a process of weight modification, the states of population A in which it has been strongly activated. After sufficient training experience, therefore, the state of population A is sufficient to generate a "mental image" on population B, that is, to predictively activate the units in B via the weighted interconnections developed during training.

In its current configuration, MURPHY steps through its entire joint space in around 1 hour, developing a total of approximately 500,000 weights between the two populations.

The Learning Rule

Tradeoffs in Learning and Representation

It is well known in the folklore of connectionist network design that a tradeoff exists between the choice of representation (i.e. the "semantics") at the single unit level and the consequent ease or difficulty of learning within that representational scheme.

At one extreme, the single-unit representation might be completely decoded, calling for a separate unit for each possible input pattern. While this scheme requires a combinatorially explosive number of units, and the system must "see" every possible input pattern during training, the actual weight modification rule is rendered very simple.
At another extreme, the single unit representation might be chosen in a highly encoded fashion with complex interactions among input units. In this case, the activation of an output unit may be a highly non-linear or discontinuous function of the input pattern, and must be learned and represented in multiple layers of weights.

Research in connectionism has often focused on Boolean functions [20,21,1,22,23], typified by the encoder problem [20], the shifter problem [21] and n-bit parity [22]. Since Boolean functions are in general discontinuous, such that two input patterns that are close in the sense of Hamming distance do not in general result in similar outputs, much effort has been directed toward the development of sophisticated, multilayer weight-modification rules (e.g. back-propagation) capable of learning arbitrary discontinuous functions. The complexity of such learning procedures has raised troubling questions of scaling behavior and biological plausibility.

The assumption of continuity in the mappings to be learned, however, can act to significantly simplify the learning problem while still allowing for full generalization to novel input patterns. Thus, by relying on the continuity assumption, MURPHY is able to learn continuous non-linear functions using a weight modification procedure that is simple, locally computable, and confined to a single layer of modifiable weights.

How MURPHY learns

For the sake of concrete illustration, MURPHY's representation and learning scheme will be described in terms of the mapping learned from joint units to visual units during training. The output activity of a given visual unit may be described as a function over the 3-dimensional joint space, whose shape is determined by the kinematics of the arm, the location of visual features (i.e.
white spots) on the arm, the global properties of the camera/imaging system, and the location of the visual unit's receptive field. In order for the function to be learned, a visual unit must learn to "recognize" the regions of joint space in which it has been visually activated during training. In effect, each visual unit learns to recognize the global arm configurations that happen to put a white spot in its receptive field.

It may be recalled that MURPHY's joint unit population is value-coded by dimension, such that each unit is centered on a range of joint angles (overlapping with neighboring units) for one of the 3 joints. In this representation, a global arm configuration can be represented as the conjunctive activity of the k (where k = 3) most active joint units. MURPHY's visual units can therefore learn to recognize the regions of joint space in which they are strongly activated by simply "memorizing" the relevant global joint configurations as conjunctive clusters of input connections from the value-coded joint unit population.

To realize this conjunctive learning scheme, MURPHY uses sigma-pi units (see [24]), as described below. At training step S, the set of k most active joint units are first identified. Some subset of visual units is also strongly activated in state S, each one signalling the presence of a visual feature (such as a white spot) in its receptive field. At the input to each active visual unit, connections from the k most highly active joint units are formed as a multiplicative k-tuple of synaptic weights. The weights w_i on these connections are initially chosen to be of unit strength.
The output c_j of a given synaptic conjunction is computed by multiplying the k joint unit activation values x_i together with their weights:

c_j = \prod_{i=1}^{k} w_i x_i.

The output y of the entire unit is computed as a weighted sum of the outputs of each conjunction, passed through a sigmoidal nonlinearity:

y = \sigma(\sum_j W_j c_j).

Sigma-pi units of this kind may be thought of as a generalization of a logical disjunction of conjunctions (an OR of ANDs). The multiplicative nature of the conjunctive clusters insures that every input to the conjunct must be active in order for the conjunct to have an effect on the unit as a whole. If only a single input to a conjunct is inactive, the effect of the conjunction is nullified.

Specific-Instance Learning in Continuous Domains

MURPHY's learning scheme is directly reminiscent of specific-instance learning as discussed by Hampson & Volper [23] in their excellent review of Boolean learning and representational schemes. Specific-instance learning requires that each unit simply "memorize" all relevant input states, i.e. those states in which the unit is intended to fire. Unfortunately, simple specific-instance learning allows for no generalization to novel inputs, implying that each desired system response will have to have been explicitly seen during training. Such a state of affairs is clearly impractical in natural learning contexts. Hampson & Volper [23] have further shown that random Boolean functions will require an exponential number of weights in this scheme.

For continuous functions, on the other hand, two kinds of generalization are possible within this type of specific-instance learning scheme. We consider each in turn, once again from the perspective of MURPHY's visual units learning to recognize the regions in joint space in which they are activated.
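The sigma-pi computation described above can be sketched directly: each conjunct multiplies its k weighted inputs, and the unit sums the conjunct outputs through a sigmoid. The activation values and conjunct contents below are illustrative, not MURPHY's.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigma_pi_output(x, conjuncts, W):
    """x: activations of all joint units; conjuncts: list of (indices, w)
    pairs, each a multiplicative k-tuple; W: one weight per conjunction.
    Computes c_j = prod_i w_i * x_i, then y = sigmoid(sum_j W_j * c_j)."""
    c = np.array([np.prod(w * x[idx]) for idx, w in conjuncts])
    return sigmoid(np.dot(W, c))

x = np.array([0.0, 0.9, 0.8, 0.0, 0.7])      # joint units 1, 2, 4 active
conjuncts = [
    (np.array([1, 2, 4]), np.ones(3)),        # a memorized configuration
    (np.array([0, 3, 4]), np.ones(3)),        # includes inactive units
]
W = np.ones(2)

y = sigma_pi_output(x, conjuncts, W)
# The second conjunct contains inactive inputs (units 0 and 3), so its
# product is zero and it is nullified; only the first drives the unit.
assert abs(y - sigmoid(0.9 * 0.8 * 0.7)) < 1e-12
```

The multiplicative form is what makes each conjunct behave like an AND: a single inactive input zeroes the whole product, while the outer sum behaves like an OR over memorized configurations.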
Generalization by Coarse-Coding

When a visual unit is activated in a given joint configuration, and acquires an appropriate conjunct of weights from the set of highly active units in the joint population, by continuity the unit may assume that it should be at least partially activated in nearby joint configurations as well. Since MURPHY's joint units are coarse-coded in joint angle, this will happen automatically: as the joints are moved a small distance away from the specific training configuration, the output of the conjunct encoding that training configuration will decay smoothly from its maximum. Thus, a visual unit can "fire" predictively in joint configurations that it has never specifically seen during training, by interpolating among conjuncts that encode nearby joint configurations.

This scheme suggests that training must be sufficiently dense in joint space to have seen configurations "nearby" to all points in the space by some criterion. In practice, the training step size is related to the degree of coarse-coding in the joint population, which is chosen in turn such that a joint perturbation equal to the radius of a joint unit's projective field (i.e. the range of joint angles over which the unit is active) should on average push a feature in the visual field a distance of about one visual receptive field radius. As a rule of thumb, the visual receptive field radius is chosen small enough so as to contain only a single feature on average.

Generalization by Extrapolation

The second type of generalization is based on a heuristic principle, again illustrated in terms of learning in the visual population. If a visual unit has during training been very often activated over a large, easy-to-specify region of joint space, such as a hyper-rectangular region, then it may be assumed that the unit is activated over the entire region of joint space, i.e. even at points not yet seen.
At the synaptic level, "large regions" can be represented as conjuncts with fewer terms. In its simplest form, this kind of generalization amounts to simply throwing out one or more joints as irrelevant to the activation of a given visual unit. What synaptic mechanism can achieve this effect? Competition among joint unit afferents can be used to drive irrelevant variables from the sigma-pi conjuncts. Thus, if a visual unit is activated repeatedly during training, and the elbow and shoulder angle units are constantly active while the most active wrist unit varies from step to step, then the weighted connections from the repeatedly activated elbow and shoulder units will become progressively and mutually reinforced at the expense of the set of wrist unit connections, each of which was only activated a single time.

Figure 2: Three Visual Traces. The top trace shows the unprocessed camera view of MURPHY's arm. White spots have been stuck to the arm at various places, such that a thresholded image contains only the white spots. This allows continuous control over the visual complexity of the image. The center trace represents the resulting pattern of activation over the 64 x 64 grid of coarsely-tuned visual units. The bottom trace depicts an internally-produced "mental" image of the arm in the same configuration, as driven by weighted connections from the joint population. Note that the "mental" trace is a sloppy, but highly recognizable, approximation to the camera-driven trace.
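The pruning of irrelevant joints described above can be sketched as a collapse over a visual unit's memorized conjuncts. The data layout (one mapping of joint name to most-active unit index per conjunct) and the joint names are illustrative assumptions, not MURPHY's synaptic mechanism.

```python
def collapse(conjuncts):
    """Collapse a visual unit's memorized conjuncts: joints whose most-active
    unit repeats across every conjunct are kept; a joint whose unit varies
    from step to step is dropped as irrelevant, leaving one smaller conjunct."""
    joints = conjuncts[0].keys()
    constant = {j: conjuncts[0][j] for j in joints
                if all(c[j] == conjuncts[0][j] for c in conjuncts)}
    if len(constant) < len(joints):
        return [constant]       # varying joints driven out as irrelevant
    return conjuncts            # nothing varied; keep original conjuncts

# Shoulder and elbow units repeat while the wrist unit varies, so the wrist
# is dropped and a single two-term conjunct covers the whole region.
memorized = [
    {"shoulder": 12, "elbow": 40, "wrist": 3},
    {"shoulder": 12, "elbow": 40, "wrist": 7},
    {"shoulder": 12, "elbow": 40, "wrist": 11},
]
assert collapse(memorized) == [{"shoulder": 12, "elbow": 40}]
```

Replacing three three-term conjuncts with one two-term conjunct is the compaction effect the text describes, in the spirit of the 30% to 70% reduction reported for typical runs.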
This form of generalization is similar in function to a number of "covering" algorithms designed to discover optimal hyper-rectangular decompositions (with possible overlap) of a set of points in a multi-dimensional space (e.g. [25,26]). The competitive feature has not yet been implemented explicitly at the synaptic level; rather, the full set of conjuncts acquired during training is currently collapsed en masse into a more compact set of conjuncts, according to the above heuristics. In a typical run, MURPHY is able to eliminate between 30% and 70% of its conjuncts in this way.

What MURPHY Does

Grabbing A Visually Presented Target

Once MURPHY has learned to image its arm in an arbitrary joint configuration, it can use heuristic search to guide its arm "mentally" to a visually specified goal. Figure 3(a) depicts a hand trajectory from an initial position to the location of a visually presented target. Each step in the trajectory represents the position of the hand (large blob) at an intermediate joint configuration. MURPHY can visually evaluate the remaining distance to the goal at each position and use best-first search to reduce the distance. Once a complete trajectory has been found, MURPHY can move its arm to the goal in a single physical step, dispensing with all backtracking, dead-ends, and other wasted operations (fig. 3(b)). It would also be possible to use the inverse model, i.e. the map from a desired visual image into an internal joint image, to send the arm directly to its final position. Unfortunately, MURPHY has no means in its current early state of development to generate a full-blown

[Figure 3 panel label: MURPHY's Mental Trajectory]

Figure 3: Grabbing an Object. (a) Irregular trajectory represents the sequence of "mental" steps taken by MURPHY in attempting to "grab" a visually-presented target (shown in (b) as a white cross).
Mental image depicts MURPHY's arm in its final goal configuration, i.e. with hand on top of the object. Coarse-coded joint activity is shown at right. (b) Having mentally searched and found the target through a series of steps, MURPHY moves its arm physically in a single step to the target, discarding the intermediate states of the trajectory that are not relevant in this simple problem.

visual image of its arm in one of the final goal positions, of which there are many possible.

Sending the tip of a robot arm to a given point in space is a classic task in robotics. The traditional approach involves first writing explicit kinematic equations for the arm based on the specific geometric details of the given arm. These equations take joint angles as inputs and produce manipulator coordinates as outputs. In general, however, it is most often useful to specify the coordinates of the manipulator tip (i.e. its desired final position), and compute the joint angles necessary to achieve this goal. This involves the solution of the kinematic equations to generate an inverse kinematic model. Deriving such expressions has been called "the most difficult problem we will encounter" in vision-based robotics [27]. For this reason, it is highly desirable for a mobile agent to learn a model of its sensory-motor environment from scratch, in a way that depends little or not at all on the specific parameters of the motor apparatus, the sensory apparatus, or their mutual interactions. It is interesting to note that in this reaching task, MURPHY appears from the outside to be driven by an inverse kinematic model of its arm, since its first official act after training is to reach directly for a visually-presented object.
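The mental reach can be sketched as a best-first loop over envisioned joint perturbations. The forward_model below is a hypothetical linear stand-in for MURPHY's learned joint-to-visual mapping, and the step size and city-block distance measure are illustrative choices.

```python
import heapq

def forward_model(joints):
    """Hypothetical stand-in: predict the hand's (x, y) from joint angles."""
    s, e, w = joints
    return (s + e + w, s - e)

def best_first_reach(start, target, step=1, max_expansions=10000):
    """Search 'mentally' for joints whose envisioned hand lands on target."""
    def dist(j):
        x, y = forward_model(j)
        return abs(x - target[0]) + abs(y - target[1])

    frontier = [(dist(start), start)]
    seen = {start}
    while frontier and max_expansions > 0:
        max_expansions -= 1
        d, joints = heapq.heappop(frontier)   # expand the best state so far
        if d == 0:
            return joints                     # mental trajectory complete
        for i in range(3):                    # perturb one joint at a time
            for delta in (-step, step):
                nxt = list(joints)
                nxt[i] += delta
                nxt = tuple(nxt)
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(frontier, (dist(nxt), nxt))
    return None

goal = best_first_reach(start=(0, 0, 0), target=(10, 2))
# The arm can then be moved physically to `goal` in a single step,
# discarding the intermediate mental states.
assert goal is not None and forward_model(goal) == (10, 2)
```

Only the final configuration is executed physically; all intermediate states, backtracking, and dead-ends remain internal, which is exactly the advantage of searching over the learned model rather than the real arm.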
While it is clear that best-first search is a weak method whose utility is limited in complex problem solving domains, it may be speculated that given the ability to rapidly image arm configurations, combined with a set of simple visual heuristics and various mechanisms for escaping local minima (e.g. send the arm home), a number of more interesting visual planning problems may be within MURPHY's grasp, such as grabbing an object in the presence of obstacles. Indeed, for problems that are either difficult to invert, or for which the goal state is not fully known a priori, the technique of iterating a forward-going model has a long history (e.g. Newton's Method).

Imitating Autonomous Arm Activity

A particularly interesting feature of "learning-by-doing" is that for every pair of unit populations present in the learning system, a mapping can be learned between them both backwards and forwards. Each such mapping may enable a unique and interesting kind of behavior. In MURPHY's case, we have seen that the mapping from a joint state to a visual image is useful for planning arm trajectories. The reverse mapping from a visual state to a joint image has an altogether different use, i.e. that of "imitation". Thus, if MURPHY's arm is moved passively, the model can be used to "follow" the motion with an internal command (i.e. joint) trace. Or, if a substitute arm is positioned in MURPHY's visual field, MURPHY can "assume the position", i.e. imitate the model arm configuration by mapping the afferent visual state into a joint image, and using the joint image to move the arm. As of this writing, the implementation of this behavior is still somewhat unreliable.

Discussion and Future Work

In short, this work has been concerned with learning-by-doing in the domain of vision-based robotics.
A number of features differentiate MURPHY from most other learning systems, and from other approaches to vision-based robotics:

• No intelligent teacher is needed to provide input-output pairs. MURPHY learns by exercising its repertoire of actions and learning the relationship between these actions and the sensory images that result.

• Mappings between populations of units, regardless of modality, can be learned in both directions simultaneously during exploratory behavior. Each mapping learned can support a distinct class of behaviorally useful functions.

• MURPHY uses its internal models to first solve problems "mentally". Plans can therefore be developed and refined before they are actually executed.

• By taking explicit advantage of continuity in the mappings between visual and joint spaces, and by using a variant of specific-instance learning in such a way as to allow generalization to novel inputs, MURPHY can learn "difficult" non-linear mappings with only a single layer of modifiable weights.

Two future steps in this endeavor are as follows:

• Provide MURPHY with direction-selective visual and joint units both, so that it may learn to predict relationships between rates of change in the visual and joint domains. In this way, MURPHY can learn how to perturb its joints in order to send its hand in a particular direction, greatly reducing the current need to search for hand trajectories.

• Challenge MURPHY to grab actual objects, possibly in the presence of obstacles, where the path of approach is crucial. The ability to readily envision intermediate arm configurations will become critical for such tasks.

Acknowledgements

Particular thanks are due to Stephen Omohundro for his unrelenting scientific and moral support, and for suggesting vision and robotic kinematics as ideal domains for experimentation.

References

[1] T.J.
Sejnowski & C.R. Rosenberg, Complex Systems, 1, 145, (1987).

[2] G.J. Tesauro & T.J. Sejnowski. A parallel network that learns to play backgammon. Submitted for publication.

[3] S. Grossberg, Biol. Cybern., 23, 187, (1976).

[4] T. Kohonen, Self-organization and associative memory, (Springer-Verlag, Berlin, 1984).

[5] D.E. Rumelhart & D. Zipser. In Parallel distributed processing: explorations in the microstructure of cognition, vol. 1, D.E. Rumelhart, J.L. McClelland, Eds., (Bradford, Cambridge, MA, 1986), p. 151.

[6] R. Linsker, Proc. Natl. Acad. Sci., 83, 8779, (1986).

[7] G.E. Hinton & J.L. McClelland. Learning representations by recirculation. Oral presentation, IEEE Conference on Neural Information Processing Systems, Denver, 1987.

[8] A.G. Barto, R.S. Sutton, & C.W. Anderson, IEEE Trans. on Sys., Man, Cybern., SMC-13, 834, (1983).

[9] H. Ginsburg & S. Opper, Piaget's theory of intellectual development, (Prentice-Hall, New Jersey, 1969).

[10] J.D. Becker. In Computer models for thought and language, R. Schank & K.M. Colby, Eds., (Freeman, San Francisco, 1973).

[11] R.L. Rivest & R.E. Schapire. In Proc. of the 4th Int. Workshop on Machine Learning, 364-375, (1987).

[12] J.G. Carbonell & Y. Gil. In Proc. of the 4th Int. Workshop on Machine Learning, 256-266, (1987).

[13] K. Chen, Tech. Report, Dept. of Computer Science, University of Illinois, 1987.

[14] A.G. Barto & R.S. Sutton, AFWAL-TR-81-1070, Avionics Laboratory, Air Force Wright Aeronautical Laboratories, Wright-Patterson AFB, Ohio 45433, 1981.

[15] D.E. Rumelhart, "On learning to do what you want". Talk given at the CMU Connectionist Summer School, 1986.

[16] B.W. Mel, in Proc. of the 8th Ann. Conf. of the Cognitive Science Soc., 562-571, (1986).

[17] D.H. Ballard, G.E. Hinton, & T.J. Sejnowski, Nature, 306, 21, (1983).

[18] R.P. Erikson, American Scientist, May-June 1984, p. 233.
[19] G.E. Hinton, J.L. McClelland, & D.E. Rumelhart. In Parallel distributed processing: explorations in the microstructure of cognition, vol. 1, D.E. Rumelhart, J.L. McClelland, Eds., (Bradford, Cambridge, 1986), p. 77.

[20] D.H. Ackley, G.E. Hinton, & T.J. Sejnowski, Cognitive Science, 9, 147, (1985).

[21] G.E. Hinton & T.J. Sejnowski. In Parallel distributed processing: explorations in the microstructure of cognition, vol. 1, D.E. Rumelhart, J.L. McClelland, Eds., (Bradford, Cambridge, 1986), p. 282.

[22] G.J. Tesauro, Complex Systems, 1, 367, (1987).

[23] S.E. Hampson & D.J. Volper, Biological Cybernetics, 56, 121, (1987).

[24] D.E. Rumelhart, G.E. Hinton, & J.L. McClelland. In Parallel distributed processing: explorations in the microstructure of cognition, vol. 1, D.E. Rumelhart, J.L. McClelland, Eds., (Bradford, Cambridge, 1986), p. 3.

[25] R.S. Michalski, J.G. Carbonell, & T.M. Mitchell, Eds., Machine learning: an artificial intelligence approach, Vols. I and II, (Morgan Kaufmann, Los Altos, 1986).

[26] S. Omohundro, Complex Systems, 1, 273, (1987).

[27] R. Paul, Robot manipulators: mathematics, programming, and control, (MIT Press, Cambridge, 1981).
", "award": [], "sourceid": 92, "authors": [{"given_name": "Bartlett", "family_name": "Mel", "institution": null}]}