{"title": "Optimizing Cortical Mappings", "book": "Advances in Neural Information Processing Systems", "page_first": 330, "page_last": 336, "abstract": null, "full_text": "Optimizing Cortical Mappings \n\nGeoffrey J. Goodhill \n\nThe Salk Institute \n\n10010 North Torrey Pines Road \n\nLa Jolla, CA 92037, USA \n\nSteven Finch \n\nHuman Communication Research Centre \nUniversity of Edinburgh, 2 Buccleuch Place \n\nEdinburgh EH8 9LW, GREAT BRITAIN \n\nTerrence J. Sejnowski \n\nThe Howard Hughes Medical Institute \nThe Salk Institute for Biological Studies \n\n10010 North Torrey Pines Road, La Jolla, CA 92037, USA \n\nDepartment of Biology, University of California San Diego \n\nLa Jolla, CA 92037, USA \n\n& \n\nAbstract \n\n\"Topographic\" mappings occur frequently in the brain. A pop(cid:173)\nular approach to understanding the structure of such mappings \nis to map points representing input features in a space of a few \ndimensions to points in a 2 dimensional space using some self(cid:173)\norganizing algorithm. We argue that a more general approach \nmay be useful, where similarities between features are not con(cid:173)\nstrained to be geometric distances, and the objective function for \ntopographic matching is chosen explicitly rather than being spec(cid:173)\nified implicitly by the self-organizing algorithm. We investigate \nanalytically an example of this more general approach applied to \nthe structure of interdigitated mappings, such as the pattern of \nocular dominance columns in primary visual cortex. \n\n1 \n\nINTRODUCTION \n\nA prevalent feature of mappings in the brain is that they are often \"topographic\". \nIn the most straightforward case this simply means that neighbouring points on \na two-dimensional sheet (e.g. the retina) are mapped to neighbouring points in a \nmore central two-dimensional structure (e.g. the optic tectum). 
However a more complex case, still often referred to as topographic, is the mapping from an abstract space of features (e.g. position in the visual field, orientation, eye of origin etc) to the cortex (e.g. layer 4 of V1). In many cortical sensory areas, the preferred sensory stimuli of neighbouring neurons change slowly, except at discontinuous jumps, suggestive of an optimization principle that attempts to match \"similar\" features to nearby points in the cortex. In this paper, we (1) discuss what might constitute an appropriate measure of similarity between features, (2) outline an optimization principle for matching the similarity structure of two abstract spaces (i.e. a measure of the degree of topography of a mapping), and (3) use these ideas to analyse the case where two equivalent input variables are mapped onto one target structure, such as the \"ocular dominance\" mapping from the right and left eyes to V1 in the cat and monkey. \n\n2 SIMILARITY MEASURES \n\nA much-investigated computational approach to the study of mappings in V1 is to consider the input features as points in a multidimensional euclidean space [1,5,9]. The input dimensions then consist of e.g. spatial position, orientation, ocular dominance, and so on. Some distribution of points in this space is assumed which attempts, in some sense, to capture the statistics of these features in the visual world. For instance, in [5], distances between points in the space are interpreted as a decreasing function of the degree to which the corresponding features are correlated over an ensemble of images. 
Some self-organizing algorithm is then applied which produces a mapping from the high-dimensional feature space to a two-dimensional sheet representing the cortex, such that nearby points in the feature space map to nearby points in the two-dimensional sheet.¹ \n\nHowever, such approaches assume that the dissimilarity structure of the input features is well captured by euclidean distances in a geometric space. There is no particular reason why this should be true. For instance, such a representation implies that the dissimilarity between features can become arbitrarily large, an unlikely scenario. In addition, it is difficult to capture higher-order relationships in such a representation, such as that two oriented line-segment detectors will be more correlated if the line segments are co-linear than if they are not. We propose instead that, for a set of features, one could construct directly from the statistics of natural stimuli a feature matrix representing similarities or dissimilarities, without regard to whether the resulting relationships can be conveniently captured by distances in a euclidean feature space. There are many ways this could be done; one example is given below. Such a similarity matrix for features can then be optimally matched (in some sense) to a similarity matrix for positions in the output space. \n\nA disadvantage from a computational point of view of this generalized approach is that the self-organizing algorithms of e.g. [6,2] can no longer be applied, and possibly less efficient optimization techniques are required. However, an advantage is that one may now explore the consequences of optimizing a whole range of objective functions for quantifying the quality of the mapping, rather than having to accept those given explicitly or implicitly by the particular self-organizing algorithm. 
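One concrete way to build such a feature matrix directly from stimulus statistics is to take the similarity between two features to be the correlation of their detectors' responses over an ensemble of stimuli. A minimal sketch (our illustration, not a construction from the paper; the synthetic response matrix and the choice of correlation as the similarity are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ensemble: each row is one natural stimulus, each column the
# response of one feature detector to that stimulus.
responses = rng.normal(size=(1000, 6))
responses[:, 1] += responses[:, 0]    # make detectors 0 and 1 respond similarly

# Feature similarity matrix taken directly from response correlations; nothing
# requires these values to be realizable as distances in a euclidean space.
F = np.corrcoef(responses, rowvar=False)

print(F.shape)    # (6, 6)
```

Here the strongly correlated pair of detectors ends up with a large off-diagonal entry, while unrelated detectors get entries near zero; the matrix is symmetric and bounded, unlike distances in an unbounded feature space.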
\n\n¹We mean this in a rather loose sense, and wish to include here the principles of mapping nearby points in the sheet to nearby points in the feature space, mapping distant points in the feature space to distant points in the sheet, and so on. \n\nFigure 1: The mapping framework. \n\n3 OPTIMIZATION PRINCIPLES \n\nWe now outline a general framework for measuring to what degree a mapping matches the structure of one similarity matrix to that of another. It is assumed that input and output matrices are of the same (finite) dimension, and that the mapping is bijective. Consider an input space Vin and an output space Vout, each of which contains N points. Let M be the mapping from points in Vin to points in Vout (see figure 1). We use the word \"space\" in a general sense: either or both of Vin and Vout may not have a geometric interpretation. Assume that for each space there is a symmetric \"similarity\" function which, for any given pair of points in the space, specifies how similar (or dissimilar) they are. Call these functions F for Vin and G for Vout. Then we define a cost functional C as follows \n\nC = Σ_{i=1}^{N} Σ_{j≠i} F(i,j) G(M(i), M(j))   (1) \n\n[...] is to keep the two halves of Vin entirely separate in Vout [5]. However, there is also a local minimum for an interdigitated (or \"striped\") map, where the interdigitations have width n = lk. By varying the value of k it is thus possible to smoothly vary the periodicity of the locally optimal striped map. Such behavior predicted the outcome of a recent biological experiment [4]. For k < 1 the globally optimal map is stripes of width n = 1. \n\nHowever, in principle many alternative ways of measuring the similarity in Vin are possible. One obvious idea is to assume that similarity is given directly by the degree of correlation between points within and between the two eyes. 
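The cost functional of equation 1 can be evaluated directly for any candidate bijection M; a minimal sketch in Python (function and variable names are ours, not from the paper):

```python
import numpy as np

def mapping_cost(F, G, M):
    """C = sum over ordered pairs i != j of F(i, j) * G(M(i), M(j)).

    F, G : (N, N) symmetric similarity (or dissimilarity) matrices
           for V_in and V_out respectively.
    M    : length-N integer permutation; M[i] is the image of point i.
    """
    GM = G[np.ix_(M, M)]                  # GM[i, j] = G(M(i), M(j))
    P = F * GM
    return float(P.sum() - np.trace(P))   # drop the i == j terms
```

When F and G are both similarities, C is to be maximized; when they are dissimilarities, minimized. Note that matching a space to itself (F = G) is never improved by a non-identity permutation, since C is then a Frobenius inner product of F with a permuted copy of itself.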
A simple assumption about the form of these correlations is that they are a gaussian function of physical distance between the receptors (as in [8]). That is, \n\nF(i,j) = e^{-α|i-j|²} if i, j are in the same half of Vin; F(i,j) = c e^{-β(|i-j|-N/2)²} if i, j are in different halves of Vin   (5) \n\nwith c < 1. We assume for ease of analysis that G is still as given in equation 4. This directly implements an intuitive notion put forward to account for the interdigitation of the ocular dominance mapping [4]: that the cortex tries to represent similar inputs close together, that similarity is given by the degree of correlation between the activities of points (cells), and additionally that natural visual scenes impose a correlational structure of the same qualitative form as equation 5. We now calculate C analytically for various mappings (cf. [5]), and compare the cost of a map that keeps the two halves of Vin entirely separate in Vout to those which interdigitate the two halves of Vin with some regular periodicity. The map of the first type we consider will be referred to as the \"up and down\" map: moving from one end of Vout to the other implies moving entirely through one half of Vin, then back in the opposite direction through the other half. For this map, the cost Cud is given by \n\nCud = 2(N-1)e^{-α} + c.   (6) \n\nFor an interdigitated (striped) map where the stripes are of width n ≥ 2: \n\nCs(n) = N[2(1 - 1/n)e^{-α} + (c/n)(e^{-βf(n)} + e^{-βg(n)})]   (7) \n\nwhere for n even f(n) = g(n) = ((n-2)/2)², and for n odd f(n) = ((n-1)/2)², g(n) = ((n-3)/2)². To characterize this system we now analyze how the n for which Cs(n) has a local maximum varies with c, α and β, and when this local maximum is also a global maximum. Setting dCs(n)/dn = 0 does not yield analytically tractable expressions (unlike [5]). 
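Equations 6 and 7 are easy to explore numerically. The sketch below is our code, not the paper's; it assumes the factor c multiplies the between-eye terms of equation 7, and picks α = β = 0.25 for illustration. It reproduces the behaviour described in the text: width-2 stripes beat the up-and-down map only when c is large enough, and at smaller c a wider stripe width becomes a local maximum of Cs(n).

```python
import math

def cost_up_down(N, alpha, c):
    # Equation 6: the two eyes kept entirely separate in V_out.
    return 2 * (N - 1) * math.exp(-alpha) + c

def cost_stripes(N, alpha, beta, c, n):
    # Equation 7: interdigitated map with stripes of width n.
    if n % 2 == 0:
        f = g = ((n - 2) / 2) ** 2
    else:
        f, g = ((n - 1) / 2) ** 2, ((n - 3) / 2) ** 2
    return N * (2 * (1 - 1 / n) * math.exp(-alpha)
                + (c / n) * (math.exp(-beta * f) + math.exp(-beta * g)))

N, alpha, beta = 100, 0.25, 0.25

# n = 2 is globally optimal for large c, otherwise the up-and-down map wins.
assert cost_stripes(N, alpha, beta, 0.9, 2) > cost_up_down(N, alpha, 0.9)
assert cost_stripes(N, alpha, beta, 0.5, 2) < cost_up_down(N, alpha, 0.5)

# At intermediate c a wider stripe (n = 4) is a local maximum of Cs(n).
Cs = lambda n: cost_stripes(N, alpha, beta, 0.55, n)
assert Cs(3) < Cs(4) > Cs(5)
```

Scanning c on a grid in this way gives the same picture as the analysis that follows: the range of c favouring each even stripe width slides downward as the width grows.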
However, more direct methods can be used: there is a local maximum at n if Cs(n-1) < Cs(n) > Cs(n+1). Using equation 7 we derive conditions on c for this to be true. For n odd, we obtain the condition c1 < c < c2 where c1 = c2; that is, there are no local maxima at odd values of n. For n even, we also obtain c1 < c < c2 where now \n\nc1(n) = 2e^{-α} / [(n+2)e^{-β((n-2)/2)²} - n e^{-β(n/2)²}] \n\nand c2(n) = c1(n-2). c1(n) and c2(n) are plotted in figure 2, from which one can see the ranges of c for which particular n are local maxima. As β decreases, maxima for larger values of n become apparent, but the range of c for which they exist becomes rather small. It can be shown that Cud is always the global maximum, except when c > e^{-α}, when n = 2 is globally optimal. As c decreases the optimal stripe width gets wider, analogously to k increasing in the dissimilarities given by equation 3. When β is such that there is no local maximum the only optimum is stripes as wide as possible. This fits with the intuitive idea that if corresponding points in the two halves of Vin (i.e. |i - j| = N/2) are sufficiently similar then it is favorable to interdigitate the two halves in Vout; otherwise the two halves are kept completely separate. \n\nThe qualitative behavior here is similar to that for equation 3. n = 2 is a global optimum for large c (small k), then as c decreases (k increases) n = 2 first becomes a local optimum, then the position of the local optimum shifts to larger n. However, an important difference is that in equation 3 the dissimilarities increase without limit with distance, whereas in equation 5 the similarities tend to zero with distance. Thus for equation 5 the extra cost of stripes one unit wider rapidly becomes negligible, whereas for equation 3 this extra cost keeps on increasing by ever larger amounts. As n → ∞, Cud ~ Cs(n) for the similarities defined by equation 5 (i.e. 
\nthere is the same cost for traversing the two blocks in the same direction as in the opposite direction), whereas for the dissimilarities defined by equation 3 there is a quite different cost in these two cases. That F and G should tend to a bounded value as i and j become ever more distant neighbors seems biologically more plausible than that they should be potentially unbounded. \n\nFigure 2: The ranges of c for which particular n are local maxima. (a) α = β = 0.25. (b) α = 0.25, β = 0.1. When the c2 (dashed) line is below the c1 (solid) line no local maxima exist. For each (even) value of n to the left of the crossing point, the vertical range between the two lines gives the values of c for which that n is a local maximum. Below the solid line and to the right of the crossing point the only maximum is stripes as wide as possible. \n\nIssues such as those we have addressed regarding the transition from \"striped\" to \"blocked\" solutions for combining two sets of inputs distinguished by their intra- and inter-population similarity structure may be relevant to understanding the spatial representation of functional attributes across cortex. The results suggest the hypothesis that two variables are interdigitated in the same area, rather than being represented separately in two distinct areas, if the inter-population similarity is sufficiently high. An interesting point is that the striped solutions are often only local optima. 
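That last point can be made concrete with the closed forms of equations 6 and 7 (our code and parameter choices, with the factor c on the between-eye terms assumed as before): for c just below e^{-α}, width-2 stripes are a local maximum of the cost even though the blocked, up-and-down map scores higher.

```python
import math

def cost_up_down(N, alpha, c):
    # Equation 6: blocked ("up and down") map.
    return 2 * (N - 1) * math.exp(-alpha) + c

def cost_stripes(N, alpha, beta, c, n):
    # Equation 7: striped map of width n.
    if n % 2 == 0:
        f = g = ((n - 2) / 2) ** 2
    else:
        f, g = ((n - 1) / 2) ** 2, ((n - 3) / 2) ** 2
    return N * (2 * (1 - 1 / n) * math.exp(-alpha)
                + (c / n) * (math.exp(-beta * f) + math.exp(-beta * g)))

N, alpha, beta, c = 100, 0.25, 0.25, 0.65     # c < e^{-alpha} ~ 0.779

Cs = lambda n: cost_stripes(N, alpha, beta, c, n)
assert Cs(1) < Cs(2) > Cs(3)                  # width-2 stripes: a local optimum
assert cost_up_down(N, alpha, c) > Cs(2)      # ...yet the blocked map is better
```

A hill-climbing developmental process that starts near an interdigitated arrangement could therefore settle on stripes even where the blocked solution is globally optimal.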
It is possible that in reality developmental constraints (e.g. a chemically defined bias towards overlaying the two projections) impose a bias towards finding a striped rather than a blocked solution, even though the latter may be the global optimum. \n\n5 DISCUSSION \n\nWe have argued that, in order to understand the structure of mappings in the brain, it could be useful to examine more general measures of similarity and of topographic matching than those implied by standard feature space models. The consequences of one particular alternative set of choices have been examined for the case of an interdigitated map of two variables. Many alternative objective functions for topographic matching are of course possible; this topic is reviewed in [3]. Two issues we have not discussed are the most appropriate way to define the features of interest, and the most appropriate measures of similarity between features (see [10] for an interesting discussion). \n\nA next step is to apply these methods to more complex structures in V1 than just the ocular dominance map. By examining more of the space of possibilities than that occupied by the current feature space models, we hope to understand more about the optimization strategies that might be being pursued by the cortex. Feature space models may still turn out to be more or less the right answer; however, even if this is true, our approach will at least give a deeper level of understanding why. \n\nAcknowledgements \n\nWe thank Gary Blasdel, Peter Dayan and Paul Viola for stimulating discussions. \n\nReferences \n\n[1] Durbin, R. & Mitchison, G. (1990). A dimension reduction framework for understanding cortical maps. Nature, 343, 644-647. \n\n[2] Durbin, R. & Willshaw, D.J. (1987). An analogue approach to the travelling salesman problem using an elastic net method. Nature, 326, 689-691. \n\n[3] Goodhill, G. J., Finch, S. & Sejnowski, T. J. 
(1995). Quantifying neighbourhood preservation in topographic mappings. Institute for Neural Computation Technical Report Series, No. INC-9505, November 1995. Available from ftp://salk.edu/pub/geoff/goodhill_finch_sejnowski_tech95.ps.Z or http://cnl.salk.edu/~geoff. \n\n[4] Goodhill, G.J. & Löwel, S. (1995). Theory meets experiment: correlated neural activity helps determine ocular dominance column periodicity. Trends in Neurosciences, 18, 437-439. \n\n[5] Goodhill, G.J. & Willshaw, D.J. (1990). Application of the elastic net algorithm to the formation of ocular dominance stripes. Network, 1, 41-59. \n\n[6] Kohonen, T. (1982). Self-organized formation of topologically correct feature maps. Biol. Cybern., 43, 59-69. \n\n[7] Luttrell, S.P. (1990). Derivation of a class of training algorithms. IEEE Trans. Neural Networks, 1, 229-232. \n\n[8] Miller, K.D., Keller, J.B. & Stryker, M.P. (1989). Ocular dominance column development: Analysis and simulation. Science, 245, 605-615. \n\n[9] Obermayer, K., Blasdel, G.G. & Schulten, K. (1992). Statistical-mechanical analysis of self-organization and pattern formation during the development of visual maps. Phys. Rev. A, 45, 7568-7589. \n\n[10] Weiss, Y. & Edelman, S. (1995). Representation of similarity as a goal of early sensory coding. Network, 6, 19-41.", "award": [], "sourceid": 1103, "authors": [{"given_name": "Geoffrey", "family_name": "Goodhill", "institution": null}, {"given_name": "Steven", "family_name": "Finch", "institution": null}, {"given_name": "Terrence", "family_name": "Sejnowski", "institution": null}]}