{"title": "Computation of Heading Direction from Optic Flow in Visual Cortex", "book": "Advances in Neural Information Processing Systems", "page_first": 433, "page_last": 440, "abstract": null, "full_text": "Computation of Heading Direction From \n\nOptic Flow in Visual Cortex \n\nMarkus Lappe\u00b7 \n\nJosefP. Rauschecker \n\nLaboratory of Neurophysiology, NIMH, Poolesville, MD, U.S.A. and \nMax-Planck-Institut fur Biologische Kybernetik, Tiibingen, Germany \n\nAbstract \n\nWe have designed a neural network which detects the direction of ego(cid:173)\nmotion from optic flow in the presence of eye movements (Lappe and \nRauschecker, 1993). The performance of the network is consistent with \nhuman psychophysical data, and its output neurons show great similarity \nto \"triple component\" cells in area MSTd of monkey visual cortex. We \nnow show that by using assumptions about the kind of eye movements \nthat the obsenrer is likely to perform, our model can generate various \nother cell types found in MSTd as well. \n\n1 \n\nINTRODUCTION \n\nFollowing the ideas of Gibson in the 1950's a number of studies in human psychophysics \nhave demonstrated that optic flow can be used effectively for navigation in space (Rieger and \nToet, 1985; Stone and Perrone, 1991; Warren et aI., 1988). In search for the neural basis of \noptic flow processing, an area in the cat's extrastriate visual cortex (PMLS) was described \nas having a centrifugal organization of neuronal direction preferences, which suggested \nan involvement of area PMLS in the processing of expanding flow fields (Rauschecker \net a/., 1987; Brenner and Rauschecker, 1990). Recently, neurons in the dorsal part of \nthe medial superior temporal area (MSTd) in monkeys have been described that respond \nto various combinations of large expanding/contracting, rotating, or shifting dot patterns \n(Duffy and Wurtz, 1991; Tanaka and Saito, 1989). 
Cells in MSTd show a continuum of response properties ranging from selectivity for only one movement pattern (\"single component cells\") to selectivity for one mode of each of the three movement types (\"triple component cells\"). An interesting property of many MSTd cells is their position invariance (Andersen et al., 1990). A sizable proportion of cells, however, do change their selectivity when the stimulus is displaced by several tens of degrees of visual angle, and their position dependence seems to be correlated with the type of movement selectivity (Duffy and Wurtz, 1991; Orban et al., 1992): it is most common for triple component cells and occurs least often in single component cells. Taken together, the wide range of directional tuning and the apparent lack of specificity for the spatial position of a stimulus seem to suggest that MSTd cells do not possess the selectivity needed to explain the high accuracy of human observers in psychophysical experiments. Our simulation results, however, demonstrate that a population encoding can be used in which individual neurons are rather broadly tuned while the whole network gives very accurate results.\n\n*Present address: Neurobiologie, ND7, Ruhr-Universität Bochum, 4630 Bochum, Germany.\n\n2 THE NETWORK MODEL\n\nThe major projections to area MST originate from the middle temporal area (MT). Area MT is a well known area of monkey cortex specialized for the processing of visual motion. It contains a retinotopic representation of local movement directions (Allman and Kaas, 1971; Maunsell and Van Essen, 1983). In our model we assume that area MT comprises a population encoding of the optic flow and that area MST uses this input from MT to extract the heading direction. Therefore, the network consists of two layers. 
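The MT-stage assumption, that each local flow vector is carried by a small population of direction-tuned neurons, can be illustrated with a minimal sketch. It assumes the four-neuron half-wave-rectified cosine code that the model itself adopts below (eq. (1)); the function names are ours, not part of the model.

```python
import math

def encode(vx, vy):
    """Encode a 2-D flow vector by four neurons with preferred
    directions at 0, 90, 180, 270 degrees (half-wave rectified
    cosine tuning, as in eq. (1))."""
    speed = math.hypot(vx, vy)
    phi = math.atan2(vy, vx)
    rates = []
    for k in range(4):
        c = math.cos(phi - math.pi * k / 2)
        rates.append(speed * c if c > 0 else 0.0)
    return rates

def decode(rates):
    """Population readout: sum of firing rates times preferred directions."""
    vx = sum(r * math.cos(math.pi * k / 2) for k, r in enumerate(rates))
    vy = sum(r * math.sin(math.pi * k / 2) for k, r in enumerate(rates))
    return vx, vy
```

Because the rectified lobes of opposite-direction neurons partition direction space, the readout recovers the encoded vector exactly; only a pair of adjacent neurons is active for any single flow vector.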
In the first layer, 300 optic flow vectors at random locations within 50 degrees of eccentricity are represented. Each flow vector is encoded by a population of directionally selective neurons. It has been shown previously that a biologically plausible population encoding like this can also be modelled by a neural network (Wang et al., 1989). For simplicity we use only four neurons to represent an optic flow vector Θ_i as\n\nΘ_i = Σ_{k=1}^{4} s_ik e_ik,   (1)\n\nwith equally spaced preferred directions e_ik = (cos(πk/2), sin(πk/2))^t. A neuron's response to a flow vector of direction φ_i and speed θ_i is given by the tuning curve\n\ns_ik = θ_i cos(φ_i − πk/2) if cos(φ_i − πk/2) > 0, and s_ik = 0 otherwise.\n\nThe second layer contains a retinotopic grid of possible translational heading directions T_j. Each direction is represented by a population of neurons whose summed activities give the likelihood that T_j is the correct heading. The perceived direction is finally chosen to be the one that has the highest population activity.\n\nThe calculation of this likelihood is based on the subspace algorithm by Heeger and Jepson (1992). It employs the minimization of a residual function over all possible heading directions. The neuronal populations in the second layer evaluate a related function that is maximal for the correct heading. The subspace algorithm works as follows: When an observer moves through a static environment, all points in space share the same six motion parameters, the translation T = (T_x, T_y, T_z)^t and the rotation Ω = (Ω_x, Ω_y, Ω_z)^t. The optic flow Θ(x, y) is the projection of the movement of a 3D point (X, Y, Z)^t onto the retina, which, for simplicity, is modelled as an image plane. 
In a viewer-centered coordinate system the optic flow can be written as:\n\nΘ(x, y) = (1/Z(x, y)) A(x, y) T + B(x, y) Ω   (2)\n\nwith the matrices\n\nA(x, y) = ( −f 0 x \n 0 −f y )\n\nand\n\nB(x, y) = ( xy/f −(f + x²/f) y \n f + y²/f −xy/f −x )\n\ndepending only on the coordinates (x, y) in the image plane and on the \"focal length\" f (Heeger and Jepson, 1992). In trying to estimate T, given the optic flow Θ, we first have to note that the unknowns Z(x, y) and T are multiplied together. They can thus not be determined independently, so that the translation is considered a unit vector pointing in the direction of heading. Eq. (2) now contains six unknowns, Z(x, y), T and Ω, but only two measurements, Θ_x and Θ_y. Therefore, flow vectors from m distinct image points are combined into the matrix equation\n\nS = C(T) q,   (3)\n\nwhere S = (Θ_1, ..., Θ_m)^t is a 2m-dimensional vector consisting of the components of the m image velocities, q = (1/Z(x_1, y_1), ..., 1/Z(x_m, y_m), Ω_x, Ω_y, Ω_z)^t an (m+3)-dimensional vector, and\n\nC(T) = ( A(x_1, y_1)T 0 ... 0 B(x_1, y_1) \n 0 A(x_2, y_2)T ... 0 B(x_2, y_2) \n ... \n 0 ... 0 A(x_m, y_m)T B(x_m, y_m) )   (4)\n\na 2m × (m+3) matrix. Heeger and Jepson (1992) show that the heading direction can be recovered by minimizing the residual function\n\nR(T) = ‖S^t C⊥(T)‖².\n\nIn this equation C⊥(T) is defined as follows: Provided that the columns of C(T) are linearly independent, they form a basis of an (m+3)-dimensional subspace of R^{2m}, which is called the range of C(T). The matrix C⊥(T) spans the remaining (2m − (m+3))-dimensional subspace, which is called the orthogonal complement of C(T). Every vector in the orthogonal complement of C(T) is orthogonal to every vector in the range of C(T).\n\nIn the network, the population of neurons representing a certain T_j shall be maximally excited when R(T_j) = 0. Two steps are necessary to accomplish this. 
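The residual computation can be checked numerically. The sketch below is a toy illustration in pure Python, not the network implementation: it builds C(T) from eq. (4), generates flow via eq. (2), and evaluates R(T) as the squared norm of the part of S lying outside the range of C(T), which equals ‖S^t C⊥(T)‖² when C⊥(T) has orthonormal columns. Point coordinates, f = 1, and all function names are our own choices.

```python
import math

def flow_matrices(x, y, f=1.0):
    # A(x,y) and B(x,y) from eq. (2)
    A = [[-f, 0.0, x],
         [0.0, -f, y]]
    B = [[x * y / f, -(f + x * x / f), y],
         [f + y * y / f, -x * y / f, -x]]
    return A, B

def matvec(M, v):
    return [sum(M[r][c] * v[c] for c in range(len(v))) for r in range(len(M))]

def make_flow(points, depths, T, Omega, f=1.0):
    # stack the optic flow of eq. (2) into the 2m-vector S of eq. (3)
    S = []
    for (x, y), Z in zip(points, depths):
        A, B = flow_matrices(x, y, f)
        a, b = matvec(A, T), matvec(B, Omega)
        S.extend([a[0] / Z + b[0], a[1] / Z + b[1]])
    return S

def residual(points, S, T, f=1.0):
    """R(T): squared norm of the component of S orthogonal to range(C(T))."""
    m = len(points)
    cols = []
    for i, (x, y) in enumerate(points):      # m depth columns of C(T)
        A, _ = flow_matrices(x, y, f)
        col = [0.0] * (2 * m)
        at = matvec(A, T)
        col[2 * i], col[2 * i + 1] = at[0], at[1]
        cols.append(col)
    for c in range(3):                       # 3 rotation columns of C(T)
        col = [0.0] * (2 * m)
        for i, (x, y) in enumerate(points):
            _, B = flow_matrices(x, y, f)
            col[2 * i], col[2 * i + 1] = B[0][c], B[1][c]
        cols.append(col)
    basis = []                               # Gram-Schmidt basis of range(C(T))
    for col in cols:
        v = col[:]
        for q in basis:
            d = sum(a * b for a, b in zip(q, v))
            v = [a - d * b for a, b in zip(v, q)]
        n = math.sqrt(sum(a * a for a in v))
        if n > 1e-9:
            basis.append([a / n for a in v])
    r = S[:]                                 # project S out of the range
    for q in basis:
        d = sum(a * b for a, b in zip(q, r))
        r = [a - d * b for a, b in zip(r, q)]
    return sum(a * a for a in r)
```

With flow generated from a known translation and rotation, the residual vanishes at the true heading and stays positive at other candidate headings, which is exactly the property the second-layer populations exploit.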
First, an individual neuron evaluates part of the argument of R(T_j) by picking out one of the column vectors of C⊥(T_j), denoted by C⊥_l(T_j), and computing S^t C⊥_l(T_j). This is done in the following way: m first layer populations are chosen to form the neuron's input receptive field. The neuron's output is given by the sigmoid function\n\nu_jl = g( Σ_{i=1}^{m} Σ_{k=1}^{4} J_ijkl s_ik − μ ),   (5)\n\nin which J_ijkl denotes the strength of the synaptic connection between the l-th output neuron in the second layer population representing heading direction T_j and the k-th input neuron in the first layer population representing the optic flow vector Θ_i, and μ denotes the threshold. For the synaptic strengths we require that:\n\nΣ_{i=1}^{m} Σ_{k=1}^{4} J_ijkl s_ik = S^t C⊥_l(T_j).   (6)\n\nAt a single image location i this is:\n\nΣ_{k=1}^{4} J_ijkl s_ik = Θ_i^t ( C⊥_{l,2i−1}(T_j), C⊥_{l,2i}(T_j) )^t.\n\nSubstituting eq. (1) we find:\n\nΣ_{k=1}^{4} J_ijkl s_ik = Σ_{k=1}^{4} s_ik e_ik^t ( C⊥_{l,2i−1}(T_j), C⊥_{l,2i}(T_j) )^t.\n\nTherefore we set the synaptic strengths to:\n\nJ_ijkl = e_ik^t ( C⊥_{l,2i−1}(T_j), C⊥_{l,2i}(T_j) )^t.\n\nThen, whenever T_j is compatible with the measured optic flow, i.e. when S is in the range of C(T_j), the neuron receives a net input of zero. In the second step, another neuron u_jl' is constructed so that the sum of the activities of the two neurons is maximal in this situation. Both neurons are connected to the same set of image locations, but their connection strengths satisfy J_ijkl' = −J_ijkl. In addition, the threshold μ is given a slightly negative value. Then both their sigmoid transfer functions overlap at zero input, and the sum has a single peak. Finally, the neurons in every second layer population are organized in such matched pairs so that each population j generates its maximal activity when R(T_j) = 0. 
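The matched-pair construction can be verified directly. The toy sketch below (parameter values are our own) shows that two sigmoid units with sign-inverted weights and a shared, slightly negative threshold μ sum to a function of the net input with a single peak at zero, as required.

```python
import math

def g(x):
    # sigmoid transfer function
    return 1.0 / (1.0 + math.exp(-x))

def pair_activity(net_input, mu=-1.0):
    """Summed activity of a matched pair: both neurons receive the
    same net input, the partner's synapses are sign-inverted
    (J' = -J), and both share the slightly negative threshold mu."""
    u = g(net_input - mu)            # u_jl
    u_partner = g(-net_input - mu)   # u_jl' with opposite weights
    return u + u_partner
```

At zero net input both sigmoids sit on their rising flanks, so the sum exceeds the saturated value of 1 reached for large positive or negative inputs; the pair therefore signals "T_j compatible with the flow" with a peak rather than a sign.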
In simulations, our network is able to compute the direction of heading with a mean error of less than one degree, in agreement with human psychophysical data (see Lappe and Rauschecker, 1993). Like heading detection in human observers it functions over a wide range of speeds, it works with sparse flow fields, and it needs depth in the visual environment when eye movements are performed.\n\n3 DIFFERENT RESPONSE SELECTIVITIES\n\nFor the remainder of this paper we will focus on the second layer neurons' response properties by carrying out simulations analogous to neurophysiological experiments (Andersen et al., 1990; Duffy and Wurtz, 1991; Orban et al., 1992). A single neuron is constructed that receives input from 30 random image locations forming a 60 × 60 degree receptive field. The receptive field occupies the lower left quadrant of the visual field and also includes the fovea (Fig. 1A). The neuron is then presented with shifting, expanding/contracting and rotating optic flow patterns. The center (x_c, y_c) of the expanding/contracting and rotating patterns is varied over the 100 × 100 degree visual field in order to test the position dependence of the neuron's responses. Directional tuning is assessed via the direction φ of the shifting patterns. All patterns are obtained by choosing suitable translations and rotations in eq. (2). For instance, rotating patterns centered at (x_c, y_c) are generated by\n\nT = 0   and   Ω = (±Ω / √(x_c² + y_c² + f²)) (x_c, y_c, f)^t.   (7)\n\nIn keeping with the most common experimental condition, all depth values Z(x_i, y_i) are taken to be equal. 
\n\n\" \n\n\u2022\u2022 \n\n\u2022 _.\\ \n\n\u2022\u2022 \n\n\u2022 \n\n\u2022 \n\n_Ja \n\n\u2022 \n\n\u2022 \n\nFigure 1: Single Neuron Responding To All Three Types Of Optic Flow Stimuli \n(\" Triple Component Cell\") \n\nIn the following we consider different assumptions about the observer's eye movements. \nThese assumptions change the equations of the subspace algorithm. The rotational matrix \nB(.x, y) takes on different forms. We will show that these changes result in differwt \ncell types. First let us restrict the model to the biologically most important case: During \nlocomotion in a static environment the eye movements of humans or higher animals are \nusually the product of intentional behavior. A very common situation is the fixation of a \nvisible object during locomotion. A specific eye rotation is necessary to compensate for the \ntranslational body-movement and to keep the object fixed in the center (0, 0) of the visual \nfield, so that its image velocity eq. (2) vanishes: \n\n8(0,0) = ZF \n\n1 (-ITx) \n\n(0) \n-ITy + +jOx = 0 \n\n(-lOY) \n\n. \n\n(8) \n\nZF denotes the distance of the fixation point. We can easily calculate Ox and Oy from \neq. (8) and chose Oz = O. The optic flow eq. (2) in the case of the fixation of a stationary \nobject then is \n\nwith \n\n1 \n\n1 -\n\n-\n8(.x,y) = Z(.x,y)A(.x,y)T + ZFB(.x,y)T, \n(.xy)/ f 0) \n\n( 1+.x2 / f \n(.xy)/ j \n\n-\nB(.x, y) = \n\n1+ y2 / f 0 \n\n. \n\nWe would like to emphasize that another common situation, namely no eye movements at \nall, can be approximated by Z F -+ (X). We can now construct a new matrix \n~(\"\" YllT ) \nB(.xm , Ym)T \n\no \n\nand form synaptic connections in the same way as described above. The resulting network \nis able to deal with the most common types of eye movements. The response properties of a \n\n\f438 \n\nLappe and Rauschecker \n\ny \n\nA \n\n.. \n... \n. \" \n-10 \u00b7 . . \nt . -.. \n\u2022\u2022 \u2022 \n.. . 
single neuron from such a network are shown in Fig. 1. The neuron is selective for all three types of flow patterns. It exhibits broad directional tuning (Fig. 1B) for upward shifting patterns (φ = 90 deg.). The responses to expanding (Fig. 1C), contracting (Fig. 1D) and rotating (Fig. 1E-F) patterns show large areas of position invariant selectivity. Inside the receptive field, which covers the second quadrant (see distribution of input locations in Fig. 1A), the neuron favors upward shifts, contractions and counterclockwise rotations. It is thus compatible with a triple component cell in MSTd. Also, lines are visible along which the selectivities reverse. This happens because the neuron's input is a linear function of the stimulus position (x_c, y_c). For example, for rotational patterns we can calculate the input using eqs. (2), (6), and (7):\n\nΣ_{i=1}^{m} Σ_{k=1}^{4} J_ijkl s_ik = (±Ω / √(x_c² + y_c² + f²)) Σ_{i=1}^{m} (x_c, y_c, f) B^t(x_i, y_i) ( C̄⊥_{l,2i−1}(T_j), C̄⊥_{l,2i}(T_j) )^t.\n\nAs long as the threshold μ is small, the neuron's output is halfway between its maximal and minimal values whenever its input is zero, i.e. when\n\na x_c + b y_c + c f = 0,   with   (a, b, c)^t = Σ_{i=1}^{m} B^t(x_i, y_i) ( C̄⊥_{l,2i−1}(T_j), C̄⊥_{l,2i}(T_j) )^t.\n\nThis is the equation of a line in the (x_c, y_c) plane. The neuron's selectivity for rotations reverses along this line. A similar equation holds for expansion/contraction selectivity.\n\nNow, what would the neuron's selectivity look like if we had not restricted the eye movements to the case of the fixation of an object?\n\nFigure 2: Neuron Selective For Two Components (\"Double Component Cell\") 
The responses of a neuron that is constructed following the unconstrained version of the algorithm, as described in section 2, are shown in Fig. 2. There is no selectivity for clockwise versus counterclockwise rotations at all, since both patterns elicit the same response everywhere in the visual field. Inside the receptive field the neuron favors contractions and shifts towards the upper left (φ = 150 deg.). It can thus be regarded as a double component cell. To understand the absence of rotational selectivity we have to calculate the whole rotational optic flow pattern Θ_rot by inserting T and Ω from eq. (7) into eq. (3). C(T) becomes\n\nC(0) = ( 0 ... 0 B(x_1, y_1) \n ... \n 0 ... 0 B(x_m, y_m) ).\n\nDenoting the three rightmost column vectors of C(T) by B_1, B_2, and B_3, we find\n\nΘ_rot = (±Ω / √(x_c² + y_c² + f²)) (x_c B_1 + y_c B_2 + f B_3).\n\nComparison to C(T), eq. (4), shows that Θ_rot can be written as a linear combination of column vectors of C(T). Thus Θ_rot lies in the range of C(T) and is orthogonal to C⊥(T), so that Θ_rot^t C⊥_l(T_j) = 0 for all j and l. From eqs. (5) and (6) it follows that the neuron's response to any rotational pattern is always u_jl = g(−μ).\n\nFigure 3: Predominantly Rotation Selective Neuron (\"Single Component Cell\")\n\nThe last type of eye movements we want to consider is that of a general frontoparallel rotation, which is defined by Ω_z = 0. In addition to the fixation of a stationary object, frontoparallel rotations also include the smooth pursuit eye movements necessary for the fixation of a moving object. Inserting Ω_z = 0 into eq. 
(2) gives\n\nΘ(x, y) = (1/Z(x, y)) A(x, y) T + B̂(x, y) (Ω_x, Ω_y)^t\n\nwith\n\nB̂(x, y) = ( xy/f −(f + x²/f) \n f + y²/f −xy/f )\n\nnow being a 2 × 2 matrix, so that C(T), eq. (4), becomes a 2m × (m+2) matrix Ĉ(T). A neuron that is constructed using Ĉ(T) can be seen in Fig. 3. It best responds to counterclockwise rotational patterns, showing complete position invariance over the visual field. The neuron is much less selective to expansions and unidirectional shifts, since the responses never reach saturation. It therefore resembles a single component rotation selective cell. The position invariant behavior can again be explained by looking at the rotational optic flow pattern. Using the same argument as above, one can show that the neuron's input is zero whenever Ω_z vanishes, i.e. when the rotational axis lies in the (X, Y)-plane. Then the flow pattern becomes\n\nΘ_rot = (±Ω / √(x_c² + y_c² + f²)) (x_c B̂_1 + y_c B̂_2),\n\nand is an element of the range of Ĉ(T_j). The (X, Y)-plane thus splits the space of all rotational axes into two half spaces, one in which the neuron's input is always positive and one in which it is always negative. Clockwise rotations are characterized by Ω_z > 0 and hence all lie in the same half space, while counterclockwise rotations lie in the other. As a result the neuron is exclusively excited by one mode of rotation in all of the visual field.\n\n4 CONCLUSION\n\nOur neural network model for the detection of ego-motion proposes a computational map of heading directions. A similar map could be contained in area MSTd of monkey visual cortex. Cells in MSTd exhibit a varying degree of selectivity for basic optic flow patterns, but often show a substantial indifference towards the spatial position of a stimulus. 
By using a population encoding of the heading directions, individual neurons in the model exhibit similar position invariant responses within large parts of the visual field. Different neuronal selectivities found in MSTd can be modelled by assuming specializations pertaining to different types of eye movements. Consistent with experimental findings, the position invariance of the model neurons is largest in the single component cells and less developed in the double and triple component cells.\n\nReferences\n\nAllman, J. M. and Kaas, J. H. 1971. Brain Res. 31, 85-105.\nAndersen, R., Graziano, M., and Snowden, R. 1990. Soc. Neurosci. Abstr. 16, 7.\nBrenner, E. and Rauschecker, J. P. 1990. J. Physiol. 423, 641-660.\nDuffy, C. J. and Wurtz, R. H. 1991. J. Neurophysiol. 65(6), 1329-1359.\nGibson, J. J. 1950. The Perception of the Visual World. Houghton Mifflin, Boston.\nHeeger, D. J. and Jepson, A. 1992. Int. J. Comp. Vis. 7(2), 95-117.\nLappe, M. and Rauschecker, J. P. 1993. Neural Computation (in press).\nMaunsell, J. H. R. and Van Essen, D. C. 1983. J. Neurophysiol. 49(5), 1127-1147.\nOrban, G. A., Lagae, L., Verri, A., Raiguel, S., Xiao, D., Maes, H., and Torre, V. 1992. Proc. Nat. Acad. Sci. 89, 2595-2599.\nRauschecker, J. P., von Grünau, M. W., and Poulin, C. 1987. J. Neurosci. 7(4), 943-958.\nRieger, J. H. and Toet, L. 1985. Biol. Cyb. 52, 377-381.\nStone, L. S. and Perrone, J. A. 1991. Soc. Neurosci. Abstr. 17, 857.\nTanaka, K. and Saito, H.-A. 1989. J. Neurophysiol. 62(3), 626-641.\nWang, H. T., Mathur, B. P., and Koch, C. 1989. Neural Computation 1, 92-103.\nWarren, W. H., Jr. and Hannon, D. J. 1988. Nature 336, 162-163.\n", "award": [], "sourceid": 652, "authors": [{"given_name": "Markus", "family_name": "Lappe", "institution": null}, {"given_name": "Josef", "family_name": "Rauschecker", "institution": null}]}