{"title": "A Connectionist Model of the Owl's Sound Localization System", "book": "Advances in Neural Information Processing Systems", "page_first": 606, "page_last": 613, "abstract": null, "full_text": "A Connectionist Model of the Owl's \nSound Localization System \n\nDaniel J. Rosen\u00b7 \n\nDepartment of Psychology \n\nStanford University \nStanford, CA 94305 \n\nDavid E. Rumelhart \n\nDepartment of Psychology \n\nStanford University \nStanford, CA 94305 \n\nEric I. Knudsen \n\nDepartment of Neurobiology \n\nStanford University \nStanford, CA 94305 \n\n\u00b7Current address: Keck Center for Integrative Neuroscience, UCSF, 513 Parnassus \nAve., San Francisco, CA 94143-0444. \n\nAbstract \n\nWe do not have a good understanding of how theoretical principles \nof learning are realized in neural systems. To address this problem \nwe built a computational model of development in the owl's sound \nlocalization system. The structure of the model is drawn from \nknown experimental data while the learning principles come from \nrecent work in the field of brain-style computation. The model \naccounts for numerous properties of the owl's sound localization \nsystem, makes specific and testable predictions for future \nexperiments, and provides a theory of the developmental process. \n\n1 INTRODUCTION \n\nThe barn owl, Tyto alba, has a remarkable ability to localize sounds in space. In \ncomplete darkness it catches mice with nearly flawless precision. The owl depends \nupon this skill for survival, for it is a nocturnal hunter who uses audition to guide \nits search for prey (Payne, 1970; Knudsen, Blasdel and Konishi, 1979). Central to \nthe owl's localization system are the precise auditory maps of space found in the \nowl's optic tectum and in the external nucleus of the inferior colliculus (ICx). 
\n\nThe development of these sensory maps poses a difficult problem for the nervous \nsystem, for their accuracy depends upon changing relationships between the animal \nand its environment. The owl encodes information about the location of a sound \nsource by the phase and amplitude differences with which the sound reaches the \nowl's two ears. Yet these differences change dramatically as the animal matures \nand its head grows. The genome cannot \"know\" in advance precisely how the \nanimal's head will develop - many environmental factors affect this process - so it \ncannot encode the precise development of the auditory system. Rather, the genome \nmust design the auditory system to adapt to its environment, letting it learn the \nprecise interpretation of auditory cues appropriate for its head and ears. \n\nIn order to understand the nature of this developmental process, we built a \nconnectionist model of the owl's sound localization system, using both theoretical principles \nof learning and knowledge of owl neurophysiology and neuroanatomy. \n\n2 THE ESSENTIAL SYSTEM TO BE MODELED \n\nThe owl calculates the horizontal component of a sound source location by \nmeasuring the interaural time difference (ITD) of a sound as it reaches the two ears \n(Knudsen and Konishi, 1979). It computes the vertical component of the signal by \ndetermining the interaural level difference (ILD) of that same sound (Knudsen and \nKonishi, 1979). The animal processes these signals through numerous sub-cortical \nnuclei to form ordered auditory maps of space in both the ICx and the optic tectum. \nFigure 1 shows a diagram of this neural circuit. \n\nNeurons in both the ICx and the optic tectum are spatially tuned to auditory \nstimuli. Cells in these nuclei respond to sound signals originating from a restricted \nregion of space in relation to the owl (Knudsen, 1984). Neurons in the ICx respond \nexclusively to auditory signals. 
Cells in the optic tectum, on the other hand, encode \nboth auditory and visual sensory maps, and drive the motor system to orient to the \nlocation of an auditory or visual signal. \n\nResearchers study the owl's development by systematically altering the animal's \nsensory experience, usually in one of two ways. They may fit the animal with a \nsound attenuating earplug, altering its auditory experience, or they may fit the owl \nwith displacing prisms, altering its visual experience. \n\nDisturbance of either auditory or visual cues, during a period when the owl is \ndeveloping to maturity, causes neural and behavioral changes that bring the auditory \nmap of space back into alignment with the visual map, and/or tune the auditory \nsystem to be sensitive to the appropriate range of binaural sound signals. The \nearplug-induced changes take place at the level of the VLVp, where ILD is first computed \n(Mogdans and Knudsen, 1992). The visually induced adjustment of the auditory \nmaps of space seems to take place at the level of the ICx (Brainard and Knudsen, \n1993b). The ability of the owl to adjust to altered sensory signals diminishes over \ntime, and is greatly restricted in adulthood (Knudsen and Knudsen, 1990). \n\n[Diagram: Overview of the Barn Owl's Sound Localization System] \n\nFigure 1: A chart describing the flow of auditory information in the owl's sound \nlocalization system. For simplicity, only the connections leading to one of the \nbilateral optic tecta are shown. Nuclei labeled with an asterisk (*) are included in \nthe model. Nuclei that process ILD and/or ITD information are so labeled. 
\n\n3 THE NETWORK MODEL \n\nThe model has two major components: a network architecture based on the \nneurobiology of the owl's localization system, as shown in Figure 1, and a learning rule \nderived from computational learning theory. The elements of the model are \nstandard connectionist units whose output activations are sigmoidal functions of their \nweighted inputs. The learning rule we use to train the model is not standard. In \nthe following section we describe how and why we derived this rule. \n\n3.1 DEFINING THE GOAL OF THE NETWORK \n\nThe goal of the network, and presumably the owl, is to accurately map sound signals \nto sound source locations. The network must discover a model of the world which \nbest captures the relationship between sound signals and sound source locations. \nRecent work in connectionist learning theory has shown us ways to design networks \nthat search for the model that best fits the data at hand (Buntine and Weigend, \n1991; MacKay, 1992; Rumelhart, Durbin, Golden and Chauvin, in press). In this \nsection we apply such an analysis to the localization network. \n\nTable 1: A table showing the mathematical terms used in the analysis. \n\n| TERM | MEANING | \n| $M$ | The Model | \n| $D$ | The Data | \n| $P(M|D)$ | Probability of the Model given the Data | \n| $\\langle \\vec{x}, \\vec{y} \\rangle_i$ | The set of i input/target training pairs | \n| $\\vec{x}_i$ | The input vector for training trial i | \n| $\\vec{y}_i$ | The target vector for training trial i | \n| $\\hat{y}_i$ | The output vector for training trial i | \n| $\\hat{y}_{ij}$ | The value of output unit j on training trial i | \n| $w_{ij}$ | The weight from unit j to unit i | \n| $\\eta_j$ | The netinput to unit j | \n| $F(\\eta_j)$ | The activation function of unit j evaluated at its netinput | \n| $C$ | The term to be maximized by the network | \n\n3.2 DERIVING THE FUNCTION TO BE MAXIMIZED \n\nThe network should maximize the probability of the model given the data. 
Using \nBayes' rule we write this probability as: \n\n$$P(M|D) = \\frac{P(D|M) P(M)}{P(D)}.$$ \n\nHere $M$ represents the model (the units, weights and associated biases) and $D$ \nrepresents the data. We define the data as a set of ordered pairs, \n[<sound-signal, location-signal>_i], which represent the cues and targets normally used to \ntrain a connectionist network. In the owl's case the cues are the auditory signals, \nand the target information is provided by the visual system. (Table 1 lists the \nmathematical terms we use in this section.) \n\nWe simplify this equation by taking the natural logarithm of each side giving: \n\n$$\\ln P(M|D) = \\ln P(D|M) + \\ln P(M) - \\ln P(D).$$ \n\nSince the natural logarithm is a monotonic transformation, if the network maximizes \nthe second equation it will also maximize the first. \n\nThe final term in the equation, $\\ln P(D)$, represents the probability of the ordered \npairs the network observes. Regardless of which model the network settles upon, \nthis term remains the same - the data are a constant during training. Therefore we \ncan ignore it when choosing a model. \n\nThe second term in the equation, $\\ln P(M)$, represents the probability of the model. \nThis is the prior term in Bayesian analysis and is our estimation of how likely it \nis that a particular model is true, regardless of the data. We will discuss it below. \nFor now we will concentrate on maximizing $\\ln P(D|M)$. \n\n3.3 ASSUMPTIONS ABOUT THE NETWORK'S ENVIRONMENT \n\nWe assume that the training data - pairs of stylized auditory and visual signals - \nare independent of one another and re-write the previous term as: \n\n$$\\ln P(D|M) = \\sum_i \\ln P(\\langle \\vec{x}, \\vec{y} \\rangle_i | M).$$ \n\nThe $i$ subscript denotes the particular data, or training, pair. We further expand \nthis term to: \n\n$$\\ln P(D|M) = \\sum_i \\ln P(\\vec{y}_i | \\vec{x}_i \\wedge M) + \\sum_i \\ln P(\\vec{x}_i).$$ \n\nWe ignore the last term, since the sound signals are not dependent on the model. 
\nWe are left, then, with the task of maximizing $\\sum_i \\ln P(\\vec{y}_i | \\vec{x}_i \\wedge M)$. It is important \nto note that $\\vec{y}_i$ represents a visual signal, not a localization decision. The network \nattempts to predict its visual experience given its auditory experience. It does not \npredict the probability of making an accurate localization decision. If we assume \nthat visual signals provide the target values for the network, then this analysis shows \nthat the auditory map will always follow the visual map, regardless of whether this \nleads to accurate localization behavior or not. Our assumption is supported by \nexperiments showing that, in the owl, vision does guide the formation of auditory \nspatial maps (Knudsen and Knudsen, 1985; Knudsen, 1988). \n\nNext, we must clarify the relationship between the inputs, $\\vec{x}_i$, and the targets, $\\vec{y}_i$. \nWe know that the real world is probabilistic - that for a given input there exists \nsome distribution of possible target values. We need to estimate the shape of this \ndistribution. In this case we assume that the target values are binomially distributed \n- that given a particular sound signal, the visual system did or did not detect a \nsound source at each point in owl-centered space. \n\nHaving made this assumption, we can clarify our interpretation of the network \noutput array, $\\hat{y}_i$. Each element, $\\hat{y}_{ij}$, of this vector represents the activity of output \nunit $j$ on training trial $i$. We assume that the output activation of each of these \nunits represents the expected value of its corresponding target, $y_{ij}$. In this case \nthe expected value is the mean of a binomial distribution. So the value of each \noutput unit $\\hat{y}_{ij}$ represents the probability that a sound signal originated from that \nparticular location. We now write the probability of the data given the model as: \n\n$$P(\\vec{y}_i | \\vec{x}_i \\wedge M) = \\prod_j \\hat{y}_{ij}^{y_{ij}} (1 - \\hat{y}_{ij})^{1 - y_{ij}}.$$ 
\n\nTaking the natural log of the probability and summing over all data pairs we get: \n\n$$C = \\sum_i \\sum_j \\left[ y_{ij} \\ln \\hat{y}_{ij} + (1 - y_{ij}) \\ln(1 - \\hat{y}_{ij}) \\right]$$ \n\nwhere $C$ is the term we want to maximize. This is the standard cross-entropy term. \n\n3.4 DERIVING THE LEARNING RULE \n\nHaving defined our goal we derive a learning rule appropriate to achieving that goal. \nTo determine this rule we compute $\\partial C / \\partial \\eta_j$, where $\\eta_j$ is the net input to a unit. (In these \nequations we have dropped the $i$ subscript, which denotes the particular training \ntrial, since this analysis is identical for all trials.) We write this as: \n\n$$\\frac{\\partial C}{\\partial \\eta_j} = \\frac{\\partial C}{\\partial \\hat{y}_j} \\frac{\\partial \\hat{y}_j}{\\partial \\eta_j} = \\frac{y_j - \\hat{y}_j}{\\hat{y}_j (1 - \\hat{y}_j)} \\partial F(\\eta_j),$$ \n\nwhere $\\partial F(\\eta_j)$ is the derivative of a unit's activation function evaluated at its net \ninput. \n\nNext we choose an appropriate activation function for the output units. The logistic \nfunction, $F(\\eta_j) = \\frac{1}{1 + e^{-\\eta_j}}$, is a good choice for two reasons. First, it is bounded by \nzero and one. This makes sense since we assume that the probability that a sound \nsignal originated at any one point in space is bounded by zero and one. Second, \nwhen we compute the derivative of the logistic function we get the following result: \n\n$$\\partial F(\\eta_j) = F(\\eta_j)(1 - F(\\eta_j)) = \\hat{y}_j (1 - \\hat{y}_j).$$ \n\nThis term is the variance of a binomial distribution and when we return to the \nderivative of our cost function, we see that this variance term is canceled by the \ndenominator. The final derivative we use to compute the weight changes at the \noutput units is therefore: \n\nac \n~ ) \n~