{"title": "An Auditory Localization and Coordinate Transform Chip", "book": "Advances in Neural Information Processing Systems", "page_first": 787, "page_last": 794, "abstract": null, "full_text": "An Auditory Localization and Coordinate Transform Chip \n\nTimothy K. Horiuchi \ntimmer@cns.caltech.edu \nComputation and Neural Systems Program \nCalifornia Institute of Technology \nPasadena, CA 91125 \n\nAbstract \n\nThe localization of and orientation to various novel or interesting events in the environment is a critical sensorimotor ability in all animals, predator or prey. In mammals, the superior colliculus (SC) plays a major role in this behavior, its deeper layers exhibiting topographically mapped responses to visual, auditory, and somatosensory stimuli. Sensory information arriving from different modalities should therefore be represented in the same coordinate frame. Auditory cues, in particular, are thought to be computed in head-based coordinates, which must then be transformed to retinal coordinates. In this paper, an analog VLSI implementation for auditory localization in the azimuthal plane is described which extends the architecture proposed for the barn owl to a primate eye movement system where a further transformation is required. This transformation is intended to model the projection in primates from auditory cortical areas to the deeper layers of the superior colliculus. The system is interfaced with an analog VLSI-based saccadic eye movement system also being constructed in our laboratory. \n\nIntroduction \n\nAuditory localization has been studied in many animals, particularly the barn owl.
\nMost birds have a resolution of only 10 to 20 degrees, but owls are able to orient to sound with an accuracy of 1 to 2 degrees, which is comparable to humans. One important cue for localizing sounds is the relative time of arrival of a sound at two spatially separated ears. A neural architecture for measuring this time difference, first described by Jeffress (1948), has been shown to exist in the barn owl auditory localization system (Konishi 1986). An analog VLSI implementation of the barn owl system constructed by Lazzaro (1990) is extended here to include a transformation from head coordinates to retinal coordinates. \n\nIn comparison to the barn owl, the neurophysiology of auditory localization in cats and primates is not as well understood, and a clear map of auditory space does not appear to be present in the inferior colliculus as it is in the owl. It has been suggested that cortical auditory regions may provide the head-based map of auditory space (Groh and Sparks 1992). \n\nIn primates, where much of the oculomotor system is based in retinotopic coordinates, head-based information must ultimately be transformed in order to be used. While other models of coordinate transformation have been proposed for visual information (e.g., Zipser and Andersen 1988, Krommenhoek et al. 1993) and for auditory information (Groh and Sparks 1992), the model of coordinate transformation used in this system is a switching network which shifts the entire projection of the head-based map of auditory space onto a retinotopic \"colliculus\" circuit. 
This particular model is similar to a basis function approach in which intermediate units have compact receptive fields in a joint eye-position / head-based-azimuth space and the output units sum the outputs of a subset of these units. \n\nThe auditory localization system described here provides acoustic target information to an analog VLSI-based saccadic eye movement system (Horiuchi, Bishofberger, and Koch 1994) being developed in our laboratory for multimodal operation. \n\nFigure 1: Block diagram of the auditory localization system. Two electret condenser microphones each feed a bandpass filtering stage at 3.2 kHz followed by a thresholded zero-crossing stage; the resulting pulses, together with an eye position signal, drive the localization and coordinate transform chip, which outputs the sound source location in retinal coordinates. The analog front end consists of external discrete analog electronics. \n\nFigure 2: Filtered signals of the left and right microphones from three different angles (finger snaps at 45 degrees right of center, at the center position, and at 45 degrees left of center). \n\nThe Localization System \n\nThe analog front end of the system (see Figure 1) consists of three basic components: the microphones, the filter stage, and the thresholded zero-crossing stage. Two microphones are placed with their centers about 2 inches apart. 
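The bandpass stage of the front end can be sketched in software as a simple two-pole resonator centered at 3.2 kHz. This is only an illustrative digital analogy to the external discrete analog electronics; the sample rate, pole radius, and filter topology below are assumed values, not taken from the paper.

```python
import math

def resonator(x, f0=3200.0, fs=32000.0, r=0.95):
    """Two-pole resonator approximating a bandpass filter at f0 Hz.
    fs (sample rate) and r (pole radius) are assumed illustrative values."""
    w0 = 2.0 * math.pi * f0 / fs
    a1, a2 = 2.0 * r * math.cos(w0), -r * r
    y, y1, y2 = [], 0.0, 0.0
    for xn in x:
        yn = xn + a1 * y1 + a2 * y2   # poles at r * exp(+/- j*w0)
        y.append(yn)
        y1, y2 = yn, y1
    return y

# A 3.2 kHz tone passes with much higher gain than a 300 Hz tone.
in_band = [math.sin(2 * math.pi * 3200 * n / 32000) for n in range(2000)]
off_band = [math.sin(2 * math.pi * 300 * n / 32000) for n in range(2000)]
gain_in = max(abs(v) for v in resonator(in_band)[1000:])
gain_off = max(abs(v) for v in resonator(off_band)[1000:])
```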
For any given time difference in arrival of an acoustic stimulus, there are many possible locations from which the sound could have originated. These points describe a hyperbola with the two microphones as the two foci. If the sound source is distant enough, we can estimate the angle, since the hyperbola approaches an asymptote. The current system operates on a single frequency, and the inter-microphone distance has been chosen to be just under one wavelength at the filter frequency. The filter frequency chosen was 3.2 kHz because the author's finger snap, used extensively during development, contained a large component at that frequency. The next step in the computation consists of triggering a digital pulse at the moment of zero-crossing if the acoustic signal is large enough. \n\nFigure 3: Example of output pulses from the external circuitry (thresholded zero-crossing detection pulses). Zero phase is chosen to be the positive-slope zero-crossing. Top: Digital pulses are generated at the time of zero phase for signals whose derivative is larger than a preset threshold. Bottom: 3.2 kHz bandpass-filtered signal for a finger snap. 
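The two front-end decisions just described, triggering a pulse only at a sufficiently steep positive-slope zero-crossing and reading an azimuth off the hyperbola's far-field asymptote, can be sketched as follows. The threshold convention, microphone spacing, and speed of sound are assumed values for illustration.

```python
import math

def zero_crossing_pulses(signal, threshold):
    """Return sample indices of positive-slope zero-crossings whose local
    slope exceeds the threshold (a stand-in for 'the signal is large enough')."""
    pulses = []
    for i in range(1, len(signal)):
        if signal[i - 1] < 0.0 <= signal[i] and signal[i] - signal[i - 1] > threshold:
            pulses.append(i)
    return pulses

def azimuth_from_itd(delta_t, d=0.0508, c=343.0):
    """Far-field estimate: for a distant source the hyperbola approaches its
    asymptote, giving delta_t ~ d*sin(theta)/c (d ~ 2 inches, c in m/s).
    Returns the azimuth in degrees; input is clamped against noise."""
    return math.degrees(math.asin(max(-1.0, min(1.0, delta_t * c / d))))
```

For example, `zero_crossing_pulses([-1.0, 1.0, -1.0, 1.0], 0.5)` fires at indices 1 and 3, and a zero time difference maps to 0 degrees azimuth.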
\n\nPhase Detection and Coordinate Transform in Analog VLSI \n\nThe analog VLSI component of the system consists of two axon delay lines (Mead 1988) which propagate the left and right microphone pulse signals in opposing directions in order to compute the cross-correlation (see Figure 4). The location of the peak in this correlation represents the relative phase of the two signals. This technique is described in more detail, and with more biological justification, by Lazzaro (1990). The current implementation contains 15 axon circuits in each delay line, as shown in Figure 4. At each position along the correlation delay line is a logical AND circuit which outputs a logic one when there are two active axon units at that location. Since these units only turn on for specific time delays, they define auditory \"receptive fields\". The output of this subsystem is 15 digital lines which are passed on to the coordinate transform. \n\nFigure 4: Diagram of the double axon delay line, which accepts digital spikes on the left- and right-channel inputs and propagates them across the array. Whenever two spikes meet, a pulse is generated on the output AND units. The position of the AND circuit which gets activated indicates the relative time of arrival of the left and right inputs. NOTE: the actual circuit contains 15 axon units. \n\nFigure 5: Schematic of the coordinate transform: head-based auditory units (driven by the left and right ears) project through a two-dimensional array of intermediate units, indexed by eye position, onto retinotopic auditory units. \n\nFor the one-dimensional case described in this project, the appropriate transform from head to retinal coordinates is a rotation which subtracts the eye position. 
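A discrete-time sketch of the double axon delay line and its AND coincidence units is given below. The 15-unit geometry comes from the text; the per-axon delay `tick` is an assumed parameter. Spikes enter opposite ends and travel toward each other one unit per tick, so they meet at a position offset from the center by half the arrival-time difference.

```python
def coincidence_unit(t_left, t_right, n_units=15, tick=1.0):
    """Index of the AND unit where a left spike (entering at position 0 at
    time t_left) and a right spike (entering at position n_units-1 at time
    t_right) meet.  tick is the assumed per-axon propagation delay."""
    pos = round((n_units - 1) / 2 + (t_right - t_left) / (2.0 * tick))
    if 0 <= pos < n_units:
        return pos
    return None  # time difference too large for the array
```

Simultaneous arrival lands on the center unit (index 7 of 15); a later right-channel spike shifts the coincidence toward the right end of the array, and a later left-channel spike shifts it toward the left.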
The eye position information on the chip is represented as a voltage which activates one of the eye position units. The spatial pattern of activation from the auditory units is then \"steered\" to the output stage with the appropriate shift (see Figure 5). This is similar to a shift scheme proposed by Pitts and McCulloch (1947) for obtaining pitch invariance in chord recognition. The eye position units are constructed from an array of \"bump\" circuits (Delbrück 1993) which compare the eye position voltage with a local voltage reference. The two-dimensional array of intermediate units takes the digital signals from the auditory units and switches the \"bump\" currents onto the output lines. The output current lines drive the inputs of a centroid circuit. \n\nThe current implementation of the shift can be viewed as a basis function approach in which a population of intermediate units responds to limited \"ball-like\" regions in the two-dimensional space of horizontal eye position and sound source azimuth (head coordinates). The output units then sum the outputs of only those intermediate units which represent the same retinal location. It should be noted that this coordinate transformation is closely related to the \"dendrite model\" proposed for the projection of cortical auditory information to the deep SC by Groh and Sparks (1992). \n\nThe final output stage converts this spatial array of current-carrying lines into a single output voltage which represents the centroid of the stimulus in retinal coordinates. This centroid circuit (DeWeerth 1991) is intended to represent the primate SC, where a similar computation is believed to occur. \n\nResults and Conclusions \n\nFigure 6 shows three plots of the chip's output voltage as a function of the inter-pulse time interval. 
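Functionally, the switching network and centroid stage described above reduce to shifting the head-based activity pattern by a discretized eye position and taking the centroid of the shifted map. The map size and integer indexing below are illustrative assumptions, not the chip's analog representation.

```python
def head_to_retinal_centroid(head_map, eye_shift):
    """Shift the head-based auditory activity map by the discrete eye
    position (subtracting eye position = the 1-D 'rotation' in the text),
    then report the centroid of the resulting retinotopic map.
    head_map: list of non-negative activities; eye_shift: integer units."""
    n = len(head_map)
    retinal = [0.0] * n
    for i, a in enumerate(head_map):
        j = i - eye_shift          # steer activity to its retinal location
        if 0 <= j < n:
            retinal[j] += a
    total = sum(retinal)
    if total == 0.0:
        return None                # no activity landed on the retinal map
    return sum(j * a for j, a in enumerate(retinal)) / total
```

A single active head-based unit at index 2 maps to retinal index 2 when the eye is centered (shift 0) and to retinal index 1 when the eye has shifted by one unit, mirroring how the output units pool only the intermediate units representing one retinal location.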
Figure 7 shows three plots of the full system's output voltage for different eye position voltages. The output is roughly linear with azimuth and linear with eye position voltage. In operation, the system input consists of a sound entering the two microphones, and the output consists of an analog voltage representing the position of the sound source and a digital signal indicating that the analog data are valid. \n\nThe auditory localization system described here is currently in use with an analog VLSI-based model of the primate saccadic system to expand its operation into the auditory domain (Horiuchi, Bishofberger, & Koch 1994). In addition to our laboratory's effort to model and understand biological computing structures in real-time systems, we are exploring the use of these low-power integrated sensors in portable applications such as mobile robotics. Analog VLSI provides a compact and efficient implementation for many neuromorphic computing architectures and can potentially provide small, fast, low-power sensors for a wide variety of applications. \n\nAcknowledgements \n\nThe author would like to acknowledge Prof. Christof Koch for his academic support and use of laboratory facilities for this project, Brooks Bishofberger for his assistance in constructing some of the discrete electronics, and Prof. Carver Mead for running the CNS184 course under which this chip was fabricated. The author is supported by an AASERT grant from the Office of Naval Research. \n\nFigure 6: Chip output vs. 
input  pulse  timing:  The chip  was  driven  with  a  signal \ngenerator  and  the  output voltage was  plotted for  three  different  eye position  volt(cid:173)\nages.  Due to the discretized nature of the axon, there are only  15  axon locations at \nwhich pulses can  meet.  This creates  the staircase response. \n\nReferences \n\nT. Delbriick (1993) Investigations of Analog VLSI  Visual Transduction and Motion \nProcessing,  Ph.D. Thesis,  California Institute of Technology \n\nJ.  Groh  and  D.  Sparks  (1992)  2  Models  for  Transforming  Auditory  Signals from \nHead-Centered  to Eye-Centered Coordinates  Bioi.  Cybern.  67(4)  291-302. \n\nT.  Horiuchi,  B.  Bishofberger,  &  C.  Koch,  (1994)  An  Analog  VLSI-based  Saccadic \nSystem, In (ed.),  Advances in Neural Information  Processing Systems  6 San Mateo, \nCA:  Morgan  Kaufman \nL.  A.  Jeffress  (1948)  A  Place  Theory  of  Sound  Localization  J.  Comp_  Physiol. \nPsychol.  41:  35-39. \nM.  Konishi  (1986)  Centrally Synthesized  Maps of Sensory  Space.  TINS April,  pp. \n163-168. \nK. P. Krommenhoek, A. J. Van Opstal, C. C. A.  M.  Gielen, J. A.  ,M. Van Gisbergen. \n(1993)  Remapping of Neural  Activity in  the  Motor  Colliculus:  A  Neural  Network \nStudy.  Vision  Research 33(9):1287-1298. \n\n\f794 \n\nTimothy  Horiuchi \n\nJ.  Lazzaro.  (1990)  Silicon  Models  of Early  Audition,  Ph.D.  Thesis,  California In(cid:173)\nstitute of Technology \n\nC.  Mead,  (1988)  Analog  VLSI and  Neural Systems Menlo  Park:  Addison-Wesley \n\nW.  Pitts and W.  S.  McCulloch, (1947)  How we  know universals:  the perception  of \nauditory and visual forms.  Bulletin  of Mathematical Biophysics 9:127-147. \n\nD.  Zipser  and  R.  A.  Andersen  (1988)  A  back-propagation  programmed  network \nthat simulates response  properties of a subset of posterior parietal neurons.  Nature \n331:679-684. \n\nLocalization Output vs. Sound Source Azimuth \n\nII -e ,. 
\nFigure 7: Localization output vs. sound source azimuth: performance of the full system on continuous (sinusoidal) input delivered by a speaker from different angles. Note that 90 degrees denotes the center position. The three plots are the outputs for three different settings of the eye position input voltage. \n", "award": [], "sourceid": 963, "authors": [{"given_name": "Timothy", "family_name": "Horiuchi", "institution": null}]}