{"title": "Optimizing Cortical Mappings", "book": "Advances in Neural Information Processing Systems", "page_first": 330, "page_last": 336, "abstract": null, "full_text": "Optimizing Cortical Mappings \n\nGeoffrey J. Goodhill \n\nThe Salk Institute \n\n10010 North Torrey Pines Road \n\nLa Jolla, CA 92037, USA \n\nSteven Finch \n\nHuman Communication Research Centre \nUniversity of Edinburgh, 2 Buccleuch Place \n\nEdinburgh EH8 9LW, GREAT BRITAIN \n\nTerrence J. Sejnowski \n\nThe Howard Hughes Medical Institute \nThe Salk Institute for Biological Studies \n\n10010 North Torrey Pines Road, La Jolla, CA 92037, USA \n\nDepartment of Biology, University of California San Diego \n\nLa Jolla, CA 92037, USA \n\n& \n\nAbstract \n\n\"Topographic\"  mappings occur frequently  in the brain.  A  pop(cid:173)\nular approach to understanding the structure of such mappings \nis  to map points representing input features  in a  space of a  few \ndimensions  to points in a  2 dimensional space using some self(cid:173)\norganizing  algorithm.  We  argue  that a  more general  approach \nmay be useful,  where similarities between features  are  not con(cid:173)\nstrained to be geometric distances, and the objective function for \ntopographic matching is chosen explicitly rather than being spec(cid:173)\nified implicitly by the self-organizing algorithm.  We  investigate \nanalytically an example of this more general approach applied to \nthe  structure of interdigitated mappings,  such as the pattern of \nocular dominance columns in primary visual cortex. \n\n1 \n\nINTRODUCTION \n\nA prevalent feature of mappings in the brain is that they are often \"topographic\". \nIn the most straightforward case  this simply means that neighbouring points on \na two-dimensional sheet (e.g.  the retina) are mapped to neighbouring points in a \nmore central two-dimensional structure (e.g.  the optic tectum).  
However, a more complex case, still often referred to as topographic, is the mapping from an abstract space of features (e.g. position in the visual field, orientation, eye of origin, etc.) to the cortex (e.g. layer 4 of V1). In many cortical sensory areas, the preferred sensory stimuli of neighbouring neurons change slowly, except at discontinuous jumps, suggestive of an optimization principle that attempts to match \"similar\" features to nearby points in the cortex. In this paper, we (1) discuss what might constitute an appropriate measure of similarity between features, (2) outline an optimization principle for matching the similarity structure of two abstract spaces (i.e. a measure of the degree of topography of a mapping), and (3) use these ideas to analyse the case where two equivalent input variables are mapped onto one target structure, such as the \"ocular dominance\" mapping from the right and left eyes to V1 in the cat and monkey. \n\n2 SIMILARITY MEASURES \n\nA much-investigated computational approach to the study of mappings in V1 is to consider the input features as points in a multidimensional euclidean space [1, 5, 9]. The input dimensions then consist of e.g. spatial position, orientation, ocular dominance, and so on. Some distribution of points in this space is assumed which attempts, in some sense, to capture the statistics of these features in the visual world. For instance, in [5], distances between points in the space are interpreted as a decreasing function of the degree to which the corresponding features are correlated over an ensemble of images. 
Some self-organizing algorithm is then applied which produces a mapping from the high-dimensional feature space to a two-dimensional sheet representing the cortex, such that nearby points in the feature space map to nearby points in the two-dimensional sheet.^1 \nHowever, such approaches assume that the dissimilarity structure of the input features is well captured by euclidean distances in a geometric space. There is no particular reason why this should be true. For instance, such a representation implies that the dissimilarity between features can become arbitrarily large, an unlikely scenario. In addition, it is difficult to capture higher-order relationships in such a representation, such as the fact that two oriented line-segment detectors will be more correlated if the line segments are collinear than if they are not. We propose instead that, for a set of features, one could construct directly from the statistics of natural stimuli a feature matrix representing similarities or dissimilarities, without regard to whether the resulting relationships can be conveniently captured by distances in a euclidean feature space. There are many ways this could be done; one example is given below. Such a similarity matrix for features can then be optimally matched (in some sense) to a similarity matrix for positions in the output space. \nA disadvantage of this generalized approach, from a computational point of view, is that the self-organizing algorithms of e.g. [6, 2] can no longer be applied, and possibly less efficient optimization techniques are required. However, an advantage is that one may now explore the consequences of optimizing a whole range of objective functions for quantifying the quality of the mapping, rather than having to accept those given explicitly or implicitly by a particular self-organizing algorithm. 
\n\n^1 We mean this in a rather loose sense, and wish to include here the principles of mapping nearby points in the sheet to nearby points in the feature space, mapping distant points in the feature space to distant points in the sheet, and so on. \n\nFigure 1: The mapping framework. \n\n3 OPTIMIZATION PRINCIPLES \n\nWe now outline a general framework for measuring to what degree a mapping matches the structure of one similarity matrix to that of another. It is assumed that input and output matrices are of the same (finite) dimension, and that the mapping is bijective. Consider an input space Vin and an output space Vout, each of which contains N points. Let M be the mapping from points in Vin to points in Vout (see figure 1). We use the word \"space\" in a general sense: either or both of Vin and Vout may not have a geometric interpretation. Assume that for each space there is a symmetric \"similarity\" function which, for any given pair of points in the space, specifies how similar (or dissimilar) they are. Call these functions F for Vin and G for Vout. Then we define a cost functional C as follows: \n\nC = \sum_{i=1}^{N} \sum_{j<i} F(i,j) G(M(i), M(j)),   (1) \n\nwhere i and j label points in Vin, and M(i) and M(j) are their respective images in Vout. The sum is over all possible pairs of points in Vin. Since M is a bijection it is invertible, and C can equivalently be written \n\nC = \sum_{i=1}^{N} \sum_{j<i} F(M^{-1}(i), M^{-1}(j)) G(i,j),   (2) \n\nwhere now i and j label points in Vout, and M^{-1} is the inverse map. A good (i.e. highly topographic) mapping is one with a high value of C. However, if one of F or G were given as a dissimilarity function (i.e. increasing with decreasing similarity) then a good mapping would be one with a low value of C. How F and G are defined is problem-specific. 
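\n\nAs a concrete sketch of this cost functional (our illustration, not code from the paper; the toy matrices and the example maps below are invented), C can be evaluated directly for any bijection represented as a permutation of point indices:

```python
import numpy as np

def mapping_cost(F, G, M):
    """Equation 1: C = sum over pairs j < i of F(i, j) * G(M(i), M(j)).

    F: N x N similarity matrix on the input space Vin.
    G: N x N similarity matrix on the output space Vout.
    M: permutation array, M[i] = image of input point i in Vout.
    """
    N = F.shape[0]
    C = 0.0
    for i in range(N):
        for j in range(i):
            C += F[i, j] * G[M[i], M[j]]
    return C

# Toy example: four points on a chain; F and G are both nearest-neighbour
# adjacency, so a topographic (identity) map should score highest.
N = 4
adj = np.zeros((N, N))
for i in range(N - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0

identity = np.arange(N)             # perfectly topographic map
scrambled = np.array([0, 2, 1, 3])  # swaps two neighbouring images

print(mapping_cost(adj, adj, identity))   # 3.0: all three input adjacencies preserved
print(mapping_cost(adj, adj, scrambled))  # 1.0: only one adjacency preserved
```

On this toy chain the identity map preserves all three input adjacencies while the scrambled map preserves only one, so the topographic map scores higher, as the definition intends.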
\n\nC has a number of important properties that help to justify its adoption as a measure of the degree of topography of a mapping (for more details see [3]). For instance, it can be shown that if a mapping that preserves ordering relationships between the two similarity matrices exists, then maximizing C will find it. Such maps are homeomorphisms. However, not all homeomorphisms have this property, so we refer to such \"perfect\" maps as \"topographic homeomorphisms\". Several previously defined optimization principles, such as minimum path and minimum wiring [1], are special cases of C. It is also closely related (under the assumptions above) to Luttrell's minimum distortion measure [7], if F is euclidean distance in a geometric input space and G gives the noise process in the output space. \n\n4 INTERDIGITATED MAPPINGS \n\nAs a particular application of the principles discussed so far, we consider the case where the similarity structure of Vin can be expressed in block matrix form as \n\nF = ( Q_s  Q_c ; Q_c  Q_s ), \n\nwhere the blocks Q_s and Q_c are of dimension N/2. This means that Vin consists of two halves, each with the same internal similarity structure, and an, in general, different similarity structure between the two halves. The question is how best to match this dual similarity structure to a single similarity structure in Vout. This is of mathematical interest since it is one of the simplest cases of a mismatch between the similarity structures of Vin and Vout, and of biological interest since it abstractly represents the case of input from two equivalent sets of receptors coming together in a single cortical sheet, e.g. ocular dominance columns in primary visual cortex (see e.g. [8, 5]). For simplicity we consider only the case of two one-dimensional retinae mapping to a one-dimensional cortex. 
\nThe feature space approach to the problem presented in [5] says that the dissimilarities in Vin are given by squared euclidean distances between points arranged in two parallel rows in a two-dimensional space. That is, \n\nF(i,j) = |i - j|^2 for i, j in the same half of Vin; F(i,j) = |i - j - N/2|^2 + k^2 for i, j in different halves of Vin,   (3) \n\nassuming that indices 1 ... N/2 give points in one half and indices N/2 + 1 ... N give points in the other half. G(i,j) is given by \n\nG(i,j) = 1 for i, j neighbouring; 0 otherwise.   (4) \n\nIt can be shown that the globally optimal mapping (i.e. minimum of C) when k > 1 is to keep the two halves of Vin entirely separate in Vout [5]. However, there is also a local minimum for an interdigitated (or \"striped\") map, where the interdigitations have width n = 2k. By varying the value of k it is thus possible to smoothly vary the periodicity of the locally optimal striped map. Such behavior predicted the outcome of a recent biological experiment [4]. For k < 1 the globally optimal map is stripes of width n = 1. \nHowever, in principle many alternative ways of measuring the similarity in Vin are possible. One obvious idea is to assume that similarity is given directly by the degree of correlation between points within and between the two eyes. A simple assumption about the form of these correlations is that they are a gaussian function of physical distance between the receptors (as in [8]). That is, \n\nF(i,j) = e^{-\alpha |i - j|^2} for i, j in the same half of Vin; F(i,j) = c e^{-\beta |i - j - N/2|^2} for i, j in different halves of Vin,   (5) \n\nwith c < 1. We assume for ease of analysis that G is still as given in equation 4. 
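\n\nThe separated-versus-striped comparison under the dissimilarities of equation 3 can be checked numerically. The sketch below is our own illustration, not code from the paper; to avoid edge effects it places the retinae and cortex on a ring rather than a line (a simplification relative to the paper's linear case), and compares the \"up and down\" map with width-2 stripes as k varies:

```python
def dissim(i, j, half, k):
    """Equation 3 on a ring: squared retinal distance, plus k^2 between eyes.
    Input indices 0..half-1 are one eye, half..2*half-1 the other."""
    d = abs(i % half - j % half)
    d = min(d, half - d)          # ring distance between retinal positions
    if (i // half) == (j // half):
        return d * d              # same eye
    return d * d + k * k          # different eyes

def ring_cost(order, half, k):
    """C (to be minimized here) with G as in equation 4: summed dissimilarity
    of the input points placed at neighbouring cortical sites, around the ring."""
    n = len(order)
    return sum(dissim(order[t], order[(t + 1) % n], half, k) for t in range(n))

half = 24
# Separated ("up and down") map: one eye forwards, the other backwards.
up_down = list(range(half)) + list(range(2 * half - 1, half - 1, -1))
# Width-2 stripes, alternate-eye blocks offset by one retinal position.
stripes = []
for r in range(half):
    eye = r % 2
    stripes += [eye * half + r % half, eye * half + (r + 1) % half]

for k in (0.5, 1.5):
    print(k, ring_cost(stripes, half, k) < ring_cost(up_down, half, k))
```

On the ring the width-2 striped map costs 24 + 24k^2 against 46 + 2k^2 for the separated map, so the stripes have lower cost exactly when k < 1, in line with the result quoted from [5].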
\nThis directly implements an intuitive notion put forward to account for the interdigitation of the ocular dominance mapping [4]: that the cortex tries to represent similar inputs close together, that similarity is given by the degree of correlation between the activities of points (cells), and additionally that natural visual scenes impose a correlational structure of the same qualitative form as equation 5. We now calculate C analytically for various mappings (cf. [5]), and compare the cost of a map that keeps the two halves of Vin entirely separate in Vout to those which interdigitate the two halves of Vin with some regular periodicity. The map of the first type we consider will be referred to as the \"up and down\" map: moving from one end of Vout to the other implies moving entirely through one half of Vin, then back in the opposite direction through the other half. For this map, the cost C_ud is given by \n\nC_ud = 2(N - 1) e^{-\alpha} + c.   (6) \n\nFor an interdigitated (striped) map where the stripes are of width n >= 2: \n\nC_s(n) = N [ 2(1 - 1/n) e^{-\alpha} + (c/n)(e^{-\beta f(n)} + e^{-\beta g(n)}) ],   (7) \n\nwhere for n even f(n) = g(n) = ((n-2)/2)^2, and for n odd f(n) = ((n-1)/2)^2, g(n) = ((n-3)/2)^2. To characterize this system we now analyze how the n for which C_s(n) has a local maximum varies with c, \alpha, \beta, and when this local maximum is also a global maximum. Setting dC_s(n)/dn = 0 does not yield analytically tractable expressions (unlike [5]). However, more direct methods can be used: there is a local maximum at n if C_s(n-1) < C_s(n) > C_s(n+1). Using equation 7 we derive conditions on c for this to be true. For n odd, we obtain a condition c_2 < c < c_1 in which c_1 = c_2; that is, there are no local maxima at odd values of n. 
For n even, we also obtain c_2 < c < c_1, where now \n\nc_1 = 2 e^{-\alpha} / [ n e^{-\beta ((n-4)/2)^2} - (n-2) e^{-\beta ((n-2)/2)^2} ] \n\nand c_2(n) = c_1(n+2). c_1(n) and c_2(n) are plotted in figure 2, from which one can see the ranges of c for which particular n are local maxima. As \beta decreases, maxima for larger values of n become apparent, but the range of c for which they exist becomes rather small. It can be shown that C_ud is always the global maximum, except when c > e^{-\alpha}, when n = 2 is globally optimal. As c decreases the optimal stripe width gets wider, analogously to k increasing in the dissimilarities given by equation 3. When \beta is such that there is no local maximum, the only optimum is stripes as wide as possible. This fits with the intuitive idea that if corresponding points in the two halves of Vin (i.e. |i - j| = N/2) are sufficiently similar then it is favorable to interdigitate the two halves in Vout; otherwise the two halves are kept completely separate. \nThe qualitative behavior here is similar to that for equation 3. n = 2 is a global optimum for large c (small k); then as c decreases (k increases) n = 2 first becomes a local optimum, and then the position of the local optimum shifts to larger n. However, an important difference is that in equation 3 the dissimilarities increase without limit with distance, whereas in equation 5 the similarities tend to zero with distance. Thus for equation 5 the extra cost of stripes one unit wider rapidly becomes negligible, whereas for equation 3 this extra cost keeps on increasing by ever larger amounts. As n -> infinity, C_ud ~ C_s(n) for the similarities defined by equation 5 (i.e. there is the same cost for traversing the two blocks in the same direction as in the opposite direction), whereas for the dissimilarities defined by equation 3 there is a quite different cost in these two cases. 
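\n\nThese transitions can be explored numerically as well. The sketch below is our own illustration (the parameter values are not taken from the paper): it builds similarities from equation 5, evaluates C directly for striped orderings whose alternate-eye blocks are offset by n/2 retinal positions (the arrangement that balances the two boundary jumps f(n) and g(n)), and tracks the best stripe width as c falls:

```python
import numpy as np

def sim(i, j, half, alpha, beta, c):
    """Equation 5: gaussian within-eye similarity, scaled gaussian between eyes.
    Indices 0..half-1 are one eye, half..2*half-1 the other."""
    d = (i % half) - (j % half)
    if (i // half) == (j // half):
        return np.exp(-alpha * d * d)
    return c * np.exp(-beta * d * d)

def cost(order, half, alpha, beta, c):
    """Equation 2 with G as cortical adjacency (equation 4): summed similarity
    of input points placed at neighbouring cortical sites along the line."""
    return sum(sim(order[t], order[t + 1], half, alpha, beta, c)
               for t in range(len(order) - 1))

def striped(half, n):
    """Width-n stripes; alternate-eye blocks offset by n/2 positions.
    Assumes n is even and half is divisible by n so the stripes tile evenly."""
    order = []
    for r in range(2 * half // n):
        eye, start = r % 2, r * (n // 2)
        order += [eye * half + (start + i) % half for i in range(n)]
    return order

half, alpha, beta = 24, 0.25, 0.1
up_down = list(range(half)) + list(range(2 * half - 1, half - 1, -1))
for c in (0.9, 0.5, 0.2):
    costs = {n: cost(striped(half, n), half, alpha, beta, c)
             for n in (2, 4, 6, 8, 12)}
    best = max(costs, key=costs.get)
    # Narrow stripes win only when the between-eye similarity c is high.
    print(c, best, costs[best] > cost(up_down, half, alpha, beta, c))
```

With these parameters, narrow stripes (n = 2) beat the separated map only at high c; as c decreases the best striped width grows and the separated map eventually dominates, consistent with the analysis above.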
That F and G should tend to a bounded value as i and j become ever more distant seems biologically more plausible than that they should be potentially unbounded. \n\nFigure 2: The ranges of c for which particular n are local maxima. (a) \alpha = \beta = 0.25. (b) \alpha = 0.25, \beta = 0.1. When the c_1 (dashed) line is below the c_2 (solid) line no local maxima exist. For each (even) value of n to the left of the crossing point, the vertical range between the two lines gives the values of c for which that n is a local maximum. Below the solid line and to the right of the crossing point the only maximum is stripes as wide as possible. \n\nIssues such as those we have addressed regarding the transition from \"striped\" to \"blocked\" solutions, for combining two sets of inputs distinguished by their intra- and inter-population similarity structure, may be relevant to understanding the spatial representation of functional attributes across cortex. The results suggest the hypothesis that two variables are interdigitated in the same area, rather than being represented separately in two distinct areas, if the inter-population similarity is sufficiently high. An interesting point is that the striped solutions are often only local optima. It is possible that in reality developmental constraints (e.g. 
a chemically defined bias towards overlaying the two projections) favor a striped rather than a blocked solution, even though the latter may be the global optimum. \n\n5 DISCUSSION \n\nWe have argued that, in order to understand the structure of mappings in the brain, it could be useful to examine more general measures of similarity and of topographic matching than those implied by standard feature space models. The consequences of one particular alternative set of choices have been examined for the case of an interdigitated map of two variables. Many alternative objective functions for topographic matching are of course possible; this topic is reviewed in [3]. Two issues we have not discussed are the most appropriate way to define the features of interest, and the most appropriate measures of similarity between features (see [10] for an interesting discussion). \nA next step is to apply these methods to more complex structures in V1 than just the ocular dominance map. By examining more of the space of possibilities than that occupied by the current feature space models, we hope to understand more about the optimization strategies that might be being pursued by the cortex. Feature space models may still turn out to be more or less the right answer; however, even if this is true, our approach will at least give a deeper level of understanding of why. \n\nAcknowledgements \n\nWe thank Gary Blasdel, Peter Dayan and Paul Viola for stimulating discussions. \n\nReferences \n\n[1] Durbin, R. & Mitchison, G. (1990). A dimension reduction framework for understanding cortical maps. Nature, 343, 644-647. \n[2] Durbin, R. & Willshaw, D.J. (1987). An analogue approach to the travelling salesman problem using an elastic net method. Nature, 326, 689-691. \n[3] Goodhill, G.J., Finch, S. & Sejnowski, T.J. (1995). 
Quantifying neighbourhood preservation in topographic mappings. Institute for Neural Computation Technical Report Series, No. INC-9505, November 1995. Available from ftp://salk.edu/pub/geoff/goodhill_finch_sejnowski_tech95.ps.Z or http://cnl.salk.edu/~geoff. \n[4] Goodhill, G.J. & Lowel, S. (1995). Theory meets experiment: correlated neural activity helps determine ocular dominance column periodicity. Trends in Neurosciences, 18, 437-439. \n[5] Goodhill, G.J. & Willshaw, D.J. (1990). Application of the elastic net algorithm to the formation of ocular dominance stripes. Network, 1, 41-59. \n[6] Kohonen, T. (1982). Self-organized formation of topologically correct feature maps. Biol. Cybern., 43, 59-69. \n[7] Luttrell, S.P. (1990). Derivation of a class of training algorithms. IEEE Trans. Neural Networks, 1, 229-232. \n[8] Miller, K.D., Keller, J.B. & Stryker, M.P. (1989). Ocular dominance column development: Analysis and simulation. Science, 245, 605-615. \n[9] Obermayer, K., Blasdel, G.G. & Schulten, K. (1992). Statistical-mechanical analysis of self-organization and pattern formation during the development of visual maps. Phys. Rev. A, 45, 7568-7589. \n[10] Weiss, Y. & Edelman, S. (1995). Representation of similarity as a goal of early sensory coding. Network, 6, 19-41. \n", "award": [], "sourceid": 1103, "authors": [{"given_name": "Geoffrey", "family_name": "Goodhill", "institution": null}, {"given_name": "Steven", "family_name": "Finch", "institution": null}, {"given_name": "Terrence", "family_name": "Sejnowski", "institution": null}]}