{"title": "VLSI Implementation of Cortical Visual Motion Detection Using an Analog Neural Computer", "book": "Advances in Neural Information Processing Systems", "page_first": 685, "page_last": 691, "abstract": "", "full_text": "VLSI Implementation of Cortical Visual Motion \nDetection Using an Analog Neural Computer \n\nRalph Etienne-Cummings \n\nElectrical Engineering, \n\nSouthern Illinois University, \n\nCarbondale, IL 62901 \n\nJan Van der Spiegel \n\nThe Moore School, \n\nUniversity of Pennsylvania, \n\nPhiladelphia, PA 19104 \n\nNaomi Takahashi \nThe Moore School, \n\nUniversity of Pennsylvania, \n\nAlyssa Apsel \n\nElectrical Engineering, \n\nCalifornia Inst. Technology, \n\nPaul Mueller \nCorticon Inc., \n\n3624 Market Str, \n\nPhiladelphia, PA 19104 \n\nPasadena, CA 91125 \n\nPhiladelphia, PA 19104 \n\nAbstract \n\nTwo dimensional image motion detection neural networks have been \nimplemented using a general purpose analog neural computer. \nThe \nneural circuits perform spatiotemporal feature extraction based on the \ncortical motion detection model of Adelson and Bergen. The neural \ncomputer provides the neurons, synapses and synaptic time-constants \nrequired to realize the model in VLSI hardware. Results show that \nvisual motion estimation can be implemented with simple sum-and(cid:173)\nthreshold neural hardware with temporal computational capabilities. \nThe neural circuits compute general 20 visual motion in real-time. \n\n1 INTRODUCTION \nVisual motion estimation is an area where spatiotemporal computation is of fundamental \nimportance. Each distinct motion vector traces a unique locus in the space-time domain. \nHence, the problem of visual motion estimation reduces to a feature extraction task, with \neach feature extractor tuned to a particular motion vector. Since neural networks are \nparticularly efficient feature extractors, they can be used to implement these visual motion \nestimators. 
Such neural circuits have been recorded in area MT of macaque monkeys, where cells are sensitive and selective to 2D velocity (Maunsell and Van Essen, 1983).\n\nIn this paper, a hardware implementation of 2D visual motion estimation with spatiotemporal feature extractors is presented. A silicon retina with parallel, continuous-time edge detection capabilities is the front-end of the system. Motion detection neural networks are implemented on a general purpose analog neural computer which is composed of programmable analog neurons, synapses, axon/dendrites and synaptic time-constants (Van der Spiegel et al., 1994). The additional computational freedom introduced by the synaptic time-constants, which are unique to this neural computer, is required to realize the spatiotemporal motion estimators. The motion detection neural circuits are based on the early 1D model of Adelson and Bergen and recent 2D models of David Heeger (Adelson and Bergen, 1985; Heeger et al., 1996). However, since the neurons only compute delayed weighted sum-and-threshold functions, the models must be modified. The original models require division for intensity normalization and a quadratic non-linearity to extract spatiotemporal energy. In our model, normalization is performed by the silicon retina with a large contrast sensitivity (all edges are normalized to the same output), and rectification replaces the quadratic non-linearity. Despite these modifications, we show that the model works correctly. The visual motion vector is implicitly coded as a distribution of neural activity.\n\nDue to its computational complexity, this method of image motion estimation has not previously been attempted in discrete or VLSI hardware. The general purpose analog neural computer offers a unique avenue for implementing and investigating this method of visual motion estimation. 
The analysis, implementation and performance of the spatiotemporal visual motion estimators are discussed.\n\n2 SPATIOTEMPORAL FEATURE EXTRACTION\n\nThe technique of estimating motion with spatiotemporal feature extraction was proposed by Adelson and Bergen in 1985 (Adelson and Bergen, 1985). It emerged from the observation that a point moving with constant velocity traces a line in the space-time domain, shown in figure 1a. The slope of the line is proportional to the velocity of the point. Hence, the velocity is represented as the orientation of the line. Spatiotemporal orientation detection units, similar to those proposed by Hubel and Wiesel for spatial orientation detection, can be used for detecting motion (Hubel and Wiesel, 1962). In the frequency domain, the motion of the point is also a line whose slope is the velocity of the point. Hence orientation detection filters, shown as circles in figure 1b, can be used to measure the motion of the point relative to their tuned velocity. A population of these tuned filters, figure 1c, can be used to measure general image motion.\n\nFigure 1: (a) 1D Motion as Orientation in the Space-Time Domain. (b) and (c) Motion Detection with Oriented Spatiotemporal Filters.\n\nIf the point exhibits 2D motion, the problem is substantially more complicated, as observed by David Heeger (1987). A point executing 2D motion spans a plane in the frequency domain. The spatiotemporal orientation filter tuned to this motion must also span a plane (Heeger et al., 1987, 1996). Figure 2a shows a filter tuned to 2D motion. Unfortunately, this torus-shaped filter is difficult to realize without special mathematical tools. 
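The space-time picture that motivates this approach (figure 1a) can be illustrated with a minimal numerical sketch; the grid sizes and test velocity below are arbitrary assumptions, not parameters from the system. A point moving at constant velocity marks an oriented line in the x-t plane, and the slope of that line recovers the velocity.

```python
import numpy as np

# Build a toy space-time (x-t) image of a point moving at constant
# velocity v (pixels per frame), then recover v as the slope of the line.
def space_time_trace(v, n_x=32, n_t=16, x0=0.0):
    st = np.zeros((n_t, n_x))
    for t in range(n_t):
        x = int(round(x0 + v * t))      # position of the point at time t
        if 0 <= x < n_x:
            st[t, x] = 1.0
    return st

st = space_time_trace(v=2.0)
t_idx, x_idx = np.nonzero(st)
slope = np.polyfit(t_idx, x_idx, 1)[0]  # dx/dt estimated from the trace
```

The velocity appears only as the orientation of the trace, which is why a bank of orientation-tuned spatiotemporal filters can read it out.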
Furthermore, to create a general set of filters for measuring general 2D motion, the filters must cover all the spatiotemporal frequencies and all the possible velocities of the stimuli. The latter requirement is particularly difficult to satisfy since there are two degrees of freedom (vx, vy) to cover.\n\nFigure 2: (a) 2D Motion Detection with 2D Oriented Spatiotemporal Filters. (b) General 2D Motion Detection with 2 Sets of 1D Filters.\n\nTo circumvent these problems, our model decomposes the image into two orthogonal images, where the perpendicular spatial variation within the receptive field of the filters is eliminated using spatial smoothing. Subsequently, 1D spatiotemporal motion detection is used on each image to measure the velocity of the stimuli. This technique places the motion detection filters, shown as the circles in figure 2b, only in the ωx-ωt and ωy-ωt planes to extract 2D motion, thereby drastically reducing the complexity of the 2D motion detection model from O(n^2) to O(2n).\n\n2.1 CONSTRUCTING THE SPATIOTEMPORAL MOTION FILTERS\n\nThe filter tuned to a velocity v0x (v0y) is centered at ω0x (ω0y) and ω0t, where v0x = ω0t/ω0x (v0y = ω0t/ω0y). To create the filters, quadrature pairs (i.e. odd and even pairs) of spatial and temporal band-pass filters centered at the appropriate spatiotemporal frequencies are summed and differenced (Adelson and Bergen, 1985). The π/2 phase relationship between the filters allows them to be combined such that they cancel in opposite quadrants, leaving the desired oriented filter, as shown in figure 3a. Equation 1 shows examples of the quadrature pairs of spatial and temporal filters implemented. The coefficients of the filters balance the area under their positive and negative lobes. 
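The sum-and-difference construction just described can be sketched numerically; the cosine/sine kernels below are toy quadrature pairs, not the coefficients of equation 1. One combination responds preferentially to rightward drift and the other to leftward drift, because the cross terms cancel in opposite quadrants.

```python
import numpy as np

# Quadrature (even/odd) spatial and temporal kernels; toy shapes only.
x = np.linspace(-np.pi, np.pi, 33)
t = np.linspace(0.0, 2.0 * np.pi, 33)
Se, So = np.cos(x), np.sin(x)   # even/odd spatial pair (pi/2 apart)
Te, To = np.cos(t), np.sin(t)   # even/odd temporal pair (pi/2 apart)

def opponent_energy(stim):
    # separable filter responses, then the summed/differenced oriented units
    r = lambda S, T: np.einsum('x,t,tx->', S, T, stim)
    right = (r(Se, Te) + r(So, To)) ** 2   # S(e)T(e) + S(o)T(o)
    left = (r(Se, Te) - r(So, To)) ** 2    # S(e)T(e) - S(o)T(o)
    return right, left

T2, X2 = np.meshgrid(t, x, indexing='ij')
r_pos, l_pos = opponent_energy(np.cos(X2 - T2))  # rightward drift
r_neg, l_neg = opponent_energy(np.cos(X2 + T2))  # leftward drift
```

For the rightward stimulus the summed unit dominates; reversing the drift direction flips which unit wins.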
The spatial filters in equation 1 have a 5 x 5 receptive field, where the sampling interval is determined by the silicon retina. Figure 3b shows a contour plot of an oriented filter (a = 11 rad/s, δ2 = 2δ1 = 40a).\n\nS(even) = 0.5 - 0.32cos(ωx) - 0.18cos(2ωx)\nS(odd) = -0.66j·sin(ωx) - 0.32j·sin(2ωx)\nT(even) = -ωt^2·δ2 / [(jωt + a)(jωt + δ1)(jωt + δ2)], a << δ2 ≈ δ1\nT(odd) = jωt·δ1·δ2 / [(jωt + a)(jωt + δ1)(jωt + δ2)], a << δ2 ≈ δ1\nLeft Motion = S(e)T(e) - S(o)T(o) or S(e)T(o) - S(o)T(e)\nRight Motion = S(e)T(e) + S(o)T(o) or S(e)T(o) + S(o)T(e)   (1)\n\nTo cover a wide range of velocities and stimuli, multiple filters are constructed with various velocity, spatial and temporal frequency selectivities. Nine filters are chosen per dimension to mosaic the ωx-ωt and ωy-ωt planes as in figure 2b. The velocity of a stimulus is given by the weighted average of the tuned velocities of the filters, where the weights are the magnitudes of each filter's response. All computations for 2D motion detection based on cortical models have been realized in hardware using a large scale general purpose analog neural computer.\n\nFigure 3: (a) Constructing Oriented Spatiotemporal Filters (S(e)T(e)+S(o)T(o) selects right motion, +vx; S(e)T(e)-S(o)T(o) selects left motion, -vx). (b) Contour Plot of One of the Filters Implemented (tuned velocity Vx = 6.3 mm/s).\n\n3 HARDWARE IMPLEMENTATION\n\n3.1 GENERAL PURPOSE ANALOG NEURAL COMPUTER\n\nThe computer is intended for fast prototyping of neural network based applications. 
It offers the flexibility of programming combined with the real-time performance of a hardware system (Mueller, 1995). It is modeled after the biological nervous system, i.e. the cerebral cortex, and consists of electronic analogs of neurons, synapses, synaptic time constants and axon/dendrites. The hardware modules capture the functional and computational aspects of their biological counterparts. The main features of the system are: configurable interconnection architecture, programmable neural elements, modular and expandable architecture, and spatiotemporal processing. These features make the network ideal for implementing a wide range of network architectures and applications.\n\nThe system, shown in part in figure 4, is constructed from three types of modules (chips): (1) neurons, (2) synapses and (3) synaptic time constants and axon/dendrites. The neurons have a piece-wise linear transfer function with programmable (8-bit) threshold and minimum output at threshold. The synapses are implemented as a programmable resistance whose value is variable (8-bit) over a logarithmic range between 5 kOhm and 10 MOhm. The time constant, realized with a load-compensated transconductance amplifier, is selectable between 0.5 ms and 1 s with 5-bit resolution. The axon/dendrites are implemented with an analog cross-point switch matrix. The neural computer has a total of 1024 neurons, distributed over 64 neuron modules, with 96 synaptic inputs per neuron, a total of 98,304 synapses, 6,656 time constants and 196,608 cross-point switches. Up to 3,072 parallel buffered analog inputs/outputs and a neuron output analog multiplexer are available. A graphical interface software, which runs on the host computer, allows the user to symbolically and physically configure the network and display its behavior (Donham, 1995). 
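The compute element just described, a synaptic low-pass (the programmable time constant) feeding a weighted sum through a piecewise-linear threshold, can be sketched as follows. The time step, weight, time constant and threshold values are illustrative assumptions, not the chip's programmed values.

```python
import numpy as np

# First-order low-pass stands in for the load-compensated
# transconductance time constant on each synaptic input.
def lowpass(x, tau, dt=1e-4):
    y, out = 0.0, []
    for v in x:
        y += (dt / tau) * (v - y)   # RC-style update
        out.append(y)
    return np.array(out)

# Delayed weighted sum-and-threshold neuron (rectifying piecewise-linear).
def neuron(inputs, weights, taus, theta=0.1, dt=1e-4):
    summed = sum(w * lowpass(x, tau, dt)
                 for x, w, tau in zip(inputs, weights, taus))
    return np.maximum(summed - theta, 0.0)

step = np.ones(200)                    # 20 ms step input at dt = 0.1 ms
out = neuron([step], [1.0], [5e-3])    # one synapse, tau = 5 ms
```

The delayed rise of the output is what gives the network its temporal computational capability: the same weighted-sum neuron becomes a temporal filter once its synapses carry different time constants.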
Once a particular network has been loaded, the neural network runs independently of the digital host and operates in a fully analog, parallel and continuous-time fashion.\n\n3.2 NEURAL IMPLEMENTATION OF SPATIOTEMPORAL FILTERS\n\nThe output of the silicon retina, which transforms a gray scale image into a binary image of edges, is presented to the neural computer to implement the oriented spatiotemporal filters. The first and second derivatives of Gaussian functions are chosen to implement the odd and even spatial filters, respectively. They are realized by feeding the outputs of the retina, with appropriate weights, into a layer of neurons.\n\nFigure 4: Block Diagram of the Overall Neural Network Architecture (neurons, synapses wij, time constants, switches, analog inputs and outputs).\n\nThree parallel channels with varying spatial scales are implemented for each dimension. The output of the even (odd) spatial filter is subsequently fed to three parallel even (odd) temporal filters, which also have varying temporal tuning. Hence, three non-oriented pairs of spatiotemporal filters are realized for each channel. Six oriented filters are realized by summing and differencing the non-oriented pairs. The oriented filters are rectified, and lateral inhibition is used to accentuate the higher response. Figure 5 shows a schematic of the neural circuitry used to implement the orientation selective filters.\n\nThe image layer of the network in figure 5 is the direct, parallel output of the silicon retina. A 7 x 7 pixel array from the retina is decomposed into 2, 1 x 7 orthogonal linear images, and the nine motion detection filters are implemented per image. 
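The Gaussian-derivative weights can be sketched directly; the 7-tap sampling grid below is an assumption (the actual interval is set by the retina), and the mean subtraction mirrors the lobe-balancing of the filter coefficients.

```python
import numpy as np

# 7-tap synaptic weight vectors for the odd and even spatial filters,
# taken from the first and second derivatives of a Gaussian.
x = np.linspace(-2.0, 2.0, 7)                # assumed sampling grid
odd = 2.0 * x * np.exp(-x**2)                # dG/dx  -> odd filter
even = (4.0 * x**2 - 2.0) * np.exp(-x**2)    # d2G/dx2 -> even filter
odd -= odd.mean()                            # balance +/- lobes so a
even -= even.mean()                          # uniform input gives zero

dc_response = float(even @ np.ones(7))       # response to a flat image
```

Each weight becomes one programmable synapse from a retina pixel onto the filter neuron, so a single sum-and-threshold neuron realizes the whole spatial filter.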
The total number of neurons used to implement this network is 152, the number of synapses is 548 and the number of time-constants is 108. The time-constant values range from 0.75 ms to 375 ms. After the networks have been programmed into the VLSI chips of the neural computer, the system operates in fully parallel and continuous-time analog mode. Consequently, this system realizes a silicon model for biological visual image motion measurement, starting from the retina to the visual cortex.\n\nFigure 5: Neural Network Implementation of the Oriented Spatiotemporal Filters (odd spatial filter S(o) ∝ dG(x)/dx ∝ 2x·exp(-x^2); even spatial filter S(e) ∝ d^2G(x)/dx^2 ∝ (4x^2 - 2)·exp(-x^2); velocity selective spatiotemporal filters).\n\n4 RESULTS\n\nThe responses of the spatiotemporal filters implemented with the neural computer are shown in figure 6. The figure is obtained by sampling the output of the neurons at 1 MHz using the on-chip analog multiplexers. In figure 6a, the impulse responses of the spatial filters are shown as a point moves across their receptive field. Figure 6b shows the outputs of the even and odd temporal filters for the moving point. At the output of the filters, the even and odd signals from the spatial filters are no longer out of phase. This transformation leads to constructive or destructive interference when they are summed and differenced. When the point moves in the opposite direction, the odd filters change such that the outputs of the temporal filters become 180° out of phase. Subsequent summing and differencing will have the opposite result. Figure 6c shows the outputs of all nine x-velocity selective filters as a point moves with positive velocity.\n\n
Figure 6: Output of the Neural Circuits for a Moving Point: (a) Spatial Filters, (b) Temporal Filters and (c) Motion Filters.\n\nFigure 7: Tuning Curves for the Nine X-Motion Filters.\n\nFigure 7 shows the tuning curves for the filters tuned to x-motion. The variations in the responses are due to variations in the analog components of the neural computer. Some aliasing is noticeable in the tuning curves when there is a minor peak in the opposite direction. This results from the discrete properties of the spatial filters, as seen in 
figure 3b. Due to the lateral inhibition employed, the aliasing effects are minimal. Similar curves are obtained for the y-motion tuned filters.\n\nFor a point moving with vx = 8.66 mm/s and vy = 5 mm/s, the outputs of the motion filters are shown in Table 1. Computing a weighted average using equation 2 yields vxm = 8.4 mm/s and vym = 5.14 mm/s. This result agrees with the actual motion of the point.\n\nvm = Σi vi,tuned·Oi / Σi Oi   (2)\n\nTable 1: Filter Responses for a Point Moving at 10 mm/s at 30° (tuned speeds in mm/s and the corresponding normalized responses for the nine X-motion and nine Y-motion filters).\n\n5 CONCLUSION\n\n2D image motion estimation based on spatiotemporal feature extraction has been implemented in VLSI hardware using a general purpose analog neural computer. The neural circuits capitalize on the temporal processing capabilities of the neural computer. The spatiotemporal feature extraction approach is based on the 1D cortical motion detection model proposed by Adelson and Bergen, which was extended to 2D by Heeger et al. 
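The population read-out of equation 2 is a response-weighted average of the filters' tuned velocities; a minimal sketch with made-up tuned speeds and responses (not the values of Table 1):

```python
# Weighted-average velocity read-out (equation 2): each filter votes for
# its tuned velocity with a weight equal to its rectified response Oi.
def velocity_readout(tuned_speeds, responses):
    total = sum(responses)
    return sum(v * r for v, r in zip(tuned_speeds, responses)) / total

tuned = [-9.0, -4.0, -2.0, 2.0, 3.5, 7.0, 9.0]    # mm/s, illustrative
resp = [0.0, 0.05, 0.05, 0.2, 0.9, 0.6, 0.15]     # rectified filter outputs
v_est = velocity_readout(tuned, resp)             # lands near the peak filters
```

Because the estimate interpolates between the tuned speeds, the nine filters per dimension can report velocities none of them is individually tuned to.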
To reduce the complexity of the model and to allow realization with simple sum-and-threshold neurons, we further modify the 2D model by placing filters only in the ωx-ωt and ωy-ωt planes, and by replacing the quadratic non-linearities with rectifiers. The modifications do not affect the performance of the model. While this technique of image motion detection requires too much hardware for focal plane implementation, our results show that it is realizable when a silicon \"brain,\" with large numbers of neurons and synaptic time constants, is available. This is very reminiscent of its biological counterpart.\n\nReferences\n\nE. Adelson and J. Bergen, \"Spatiotemporal Energy Models for the Perception of Motion,\" J. Optical Society of America, Vol. A2, pp. 284-99, 1985.\n\nC. Donham, \"Real Time Speech Recognition using a General Purpose Analog Neurocomputer,\" Ph.D. Thesis, Univ. of Pennsylvania, Dept. of Electrical Engineering, Philadelphia, PA, 1995.\n\nD. Heeger, E. Simoncelli and J. Movshon, \"Computational Models of Cortical Visual Processing,\" Proc. National Academy of Science, Vol. 92, no. 2, pp. 623, 1996.\n\nD. Heeger, \"Model for the Extraction of Image Flow,\" J. Optical Society of America, Vol. 4, no. 8, pp. 1455-71, 1987.\n\nD. Hubel and T. Wiesel, \"Receptive Fields, Binocular Interaction and Functional Architecture in the Cat's Visual Cortex,\" J. Physiology, Vol. 160, pp. 106-154, 1962.\n\nJ. Maunsell and D. Van Essen, \"Functional Properties of Neurons in Middle Temporal Visual Area of the Macaque Monkey. I. Selectivity for Stimulus Direction, Speed and Orientation,\" J. Neurophysiology, Vol. 49, no. 5, pp. 1127-47, 1983.\n\nP. Mueller, J. Van der Spiegel, D. Blackman, C. Donham and R. Etienne-Cummings, \"A Programmable Analog Neural Computer with Applications to Speech Recognition,\" Proc. Conf. on Information Sciences and Systems (CISS), Johns Hopkins, May 1995.
\n\n\f", "award": [], "sourceid": 1287, "authors": [{"given_name": "Ralph", "family_name": "Etienne-Cummings", "institution": null}, {"given_name": "Jan", "family_name": "Van der Spiegel", "institution": null}, {"given_name": "Naomi", "family_name": "Takahashi", "institution": null}, {"given_name": "Alyssa", "family_name": "Apsel", "institution": null}, {"given_name": "Paul", "family_name": "Mueller", "institution": null}]}