{"title": "Spatial Decorrelation in Orientation Tuned Cortical Cells", "book": "Advances in Neural Information Processing Systems", "page_first": 852, "page_last": 858, "abstract": "", "full_text": "Spatial Decorrelation in Orientation \n\nTuned Cortical Cells \n\nAlexander Dimitrov \n\nDepartment of Mathematics \n\nUniversity of Chicago \n\nChicago, IL 60637 \n\na-dimitrov@uchicago.edu \n\nJack D. Cowan \n\nDepartment of Mathematics \n\nUniversity of Chicago \n\nChicago, IL 60637 \n\ncowan@math.uchicago.edu \n\nAbstract \n\nIn this paper we propose a model for the lateral connectivity of \norientation-selective cells in the visual cortex based on information(cid:173)\ntheoretic considerations. We study the properties of the input sig(cid:173)\nnal to the visual cortex and find new statistical structures which \nhave not been processed in the retino-geniculate pathway. Applying \nthe idea that the system optimizes the representation of incoming \nsignals, we derive the lateral connectivity that will achieve this for \na set of local orientation-selective patches, as well as the complete \nspatial structure of a layer of such patches. We compare the results \nwith various physiological measurements. \n\n1 \n\nIntroduction \n\nIn recent years much work has been done on how the structure of the visual sys(cid:173)\ntem reflects properties of the visual environment (Atick and Redlich 1992; Attneave \n1954; Barlow 1989). Based on the statistics of natural scenes compiled and studied \nby Field (1987) and Ruderman and Bialek (1993), work was done by Atick and \nRedlich (1992) on the assumption that one of the tasks of early vision is to re(cid:173)\nduce the redundancy of input signals, the results of which agree qualitatively with \nnumerous physiological and psychophysical experiments. 
Their ideas were further strengthened by research suggesting the possibility that such structures develop via simple correlation-based learning mechanisms (Atick and Redlich 1993; Dong 1994). As suggested by Atick and Li (1994), further higher-order redundancy reduction of the luminosity field in the visual processing system is unlikely, since it gives little benefit in information compression. In this paper we apply similar ideas to a different input signal which is readily available to the system and whose statistical properties are lost in the analysis of the luminosity signal. We note that after the application of the retinal \"mexican hat\" filter, the most obvious salient features left in images are sharp changes in luminosity, i.e. local edges, for which the filter is not optimal. Such edges have correlations which are very different from the luminosity autocorrelation of natural images (Field 1987), and have zero probability measure in visual scenes, so they are lost in the ensemble averages used to compute the autocorrelation function of natural images. We know that this signal is projected to a set of direction-sensitive units in V1 for each distinct retinal position, thereby introducing new redundancy in the signal. Thus the necessity for compression and use of factorial codes arises once again. \n\nSince local edges are defined by sharp changes in the luminosity field, we can use a derivative operation to pick up the pertinent structure. Indeed, if we look at the gradient of the luminosity as a vector field, its magnitude at a point is proportional to the change of luminosity, so that a large magnitude signals the possible presence of a discontinuity in the luminosity profile. 
Moreover, in two dimensions, the direction of the gradient vector is perpendicular to the direction of the possible local edge, whose presence is given by the magnitude. These properties define a one-to-one correspondence between large gradients and local edges. \n\nThe structure of the network we use reflects what is known about the structure of V1. We select as our system a layer of direction sensitive cells which are laterally connected to one another, each receiving input from the previous layer. We assume that each unit receives as input the directional derivative of the luminosity signal along the preferred visuotopic axis of the cell. This implies that locally the input to a cell is proportional to the cosine of the angle between the unit's preferred direction and the local gradient (edge). Thus each unit receives a broadly tuned signal, with a half-width at half-height of approximately 60 degrees. With this feed-forward structure, the idea that the system is trying to decorrelate its inputs suggests a way to calculate the lateral connections that will perform this task. This calculation, and a further study of the statistical properties of the input, is the topic of the paper. \n\n2 Mathematical Model \n\nLet G(x) = (G1(x), G2(x)) be the gradient of luminosity at x. Assume that there is a set of N detectors with activity Oi at x, each with a preferred direction ni. Let the input from the previous layer to each detector be the directional derivative along its preferred direction: \n\nVi(x) = |Grad(L(x)) . ni| = |d/dni L(x)|   (1) \n\nThere are long range correlations in the inputs to the network, due both to the statistical structure of natural images and to the structure of the input. The simplest of them are captured in the two-point correlation matrix Rij(x1, x2) = < Vi(x1) Vj(x2) >, where the averaging is done across images. 
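As an illustration of how such correlations might be estimated from data, here is a minimal sketch of our own (not the procedure used in the paper): it computes the N x N zero-separation block of the detector correlations, pooled over positions, for a toy image ensemble. The ensemble, the value of N, and all function names are assumptions.

```python
import numpy as np

def detector_inputs(image, N):
    """Inputs V_k(x) = |grad L(x) . n_k| for N preferred directions."""
    gy, gx = np.gradient(image.astype(float))   # derivatives along rows, columns
    angles = np.arange(N) * np.pi / N           # preferred directions n_k
    # rectified projection of the gradient on each preferred direction
    return np.abs(gx[None] * np.cos(angles)[:, None, None]
                  + gy[None] * np.sin(angles)[:, None, None])

def local_correlation(images, N):
    """Estimate R_ij = <V_i V_j>, averaged over positions and images."""
    R = np.zeros((N, N))
    for im in images:
        V = detector_inputs(im, N).reshape(N, -1)
        R += V @ V.T / V.shape[1]
    return R / len(images)

# toy ensemble of random "images" standing in for natural scenes
rng = np.random.default_rng(0)
R = local_correlation([rng.standard_normal((32, 32)) for _ in range(20)], N=8)
```

With real natural-scene patches in place of the random arrays, the same loop yields the empirical averages referred to in the text.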
Then R is a block matrix, with an N x N matrix at each spatial position (x1, x2). \n\nWe formulate the problem in terms of a recurrent kernel W, so that \n\nO = V + W * O   (2) \n\nThe biological interpretation of this is that V is the effective input to V1 from the LGN and W specifies the lateral connectivity in V1. This equation describes the steady state of the linear dynamical system dO/dt = -O + W * O + V. The above recurrent system has a solution for O not an eigenfunction of W in the form O = (δ - W)^(-1) * V = K * V. This suggests that there is an equivalent feed-forward system with a transfer function K = (δ - W)^(-1), and we can consider only such systems. \n\nThe corresponding feed-forward system is a linear system that acts on the input V(x) to produce an output O(x) = (K . V)(x) = ∫ K(x, y) . V(y) dy. If we use Barlow's redundancy reduction hypothesis (Barlow 1989), this filter should decorrelate the output signal. This is achieved by requiring that \n\nδ(x1 - x2) ~ < O(x1) ⊗ O(x2) > = < (K . V)(x1) ⊗ (K . V)(x2) > <=> δ(x1 - x2) ~ K . R . K^T   (3) \n\nThe aim then is to solve (3) for K. Obviously, this is equivalent to K^T . K ~ R^(-1) (assuming K and R are non-singular), which has a solution K ~ R^(-1/2), unique up to a unitary transformation. The corresponding recurrent filter is then \n\nW = δ - ρ R^(1/2)   (4) \n\nThis expression suggests an immediate benefit in the use of lateral kernels by the system. As (4) shows, the filter does not now require inverting the correlation matrix and thus is more stable than a feed-forward filter. This also helps preserve the local structure of the autocorrelator in the optimal filter, while, because of the inversion process, a feed-forward system will in general produce non-local, non-topographic solutions. \n\nTo obtain a realistic connectivity structure, we need to explicitly include the effects of noise on the system. 
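The noiseless chain K ~ R^(-1/2), K = (δ - W)^(-1) can be sketched numerically. A minimal illustration of our own (the symmetric square root is one representative of the unitary family, and the toy correlation matrix is an assumption):

```python
import numpy as np

def sym_inv_sqrt(R):
    """Symmetric inverse square root R^(-1/2) via eigendecomposition."""
    w, U = np.linalg.eigh(R)
    return (U * (1.0 / np.sqrt(w))) @ U.T

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 6))
R = A @ A.T + 6 * np.eye(6)          # toy positive-definite correlation matrix

K = sym_inv_sqrt(R)                  # feed-forward decorrelating filter, K ~ R^(-1/2)
W = np.eye(6) - np.linalg.inv(K)     # recurrent kernel, from K = (I - W)^(-1)

# the filtered outputs are decorrelated: K R K^T = I
assert np.allclose(K @ R @ K.T, np.eye(6))
```

Note that W is obtained without inverting R itself, which is the stability benefit claimed for the recurrent form.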
The system is then described by O1 = V + N1 + M * W * (O1 + N2), where N1 is the input noise and N2 is the noise generated by individual units in the recurrently connected layer. Similarly to a feed-forward system (Atick and Redlich 1992), we can modify the decorrelation kernel W derived from (2) to M * W. The form of the correction M, which minimizes the effects of noise on the system, is obtained by minimizing the distance between the states of the two systems. If we define χ^2(M) = < |O - O1|^2 > as the distance function, the solution to ∂χ^2(M)/∂M = 0 will give us the appropriate kernel. A solution to this problem is \n\nM * W = δ - (R + N1^2 + N2^2) * (ρ R^(1/2) + N2^2)^(-1)   (5) \n\nWe see that it has the correct asymptotics as N1 and N2 approach zero. The filter behaves well for large N2, turning mostly into a low-pass filter with large attenuation. It cannot handle large N1 well, and diverges proportionally to N1^2. \n\n3 Results \n\n3.1 Local Optimal Linear Filter \n\nAs a first calculation with this model, consider its implications for the connectivity between units in a single hypercolumn. This allows for a very simple application of the theory and, under very general assumptions, does not require any knowledge of the input signal. \n\nWe assume that direction selective cells receive as input from the previous layer the projection of the gradient onto their preferred direction. Thus they act as directional derivatives, so that their response to a signal with the luminosity profile L(x) and no input from other lateral units is Vi(x) = |Grad(L(x)) . ni| = |d/dni L(x)|. With this assumption the outputs of the edge detectors are correlated. Define a (local) correlation matrix Rij = < Vi Vj >. 
By assumption (1), Vk = |a Cos(α - αk)|, where a and α are random, independent variables denoting the magnitude and direction of the local gradient, and αk is the preferred angle of the detector. Assuming spatially isotropic local structure for natural scenes, we can calculate the average of R by integrating over a uniform probability measure in α. Then \n\nRij = A < |Cos(α - αi)| |Cos(α - αj)| >α   (6) \n\nwhere A = < a^2 > can be factored out because of the assumption of statistical independence. By the homogeneity assumption, Rij is a function of the relative angle |αi - αj| only. This allows us to easily calculate the integral in (6) from its Fourier series. Indeed, in Fourier space R is just the square of the power spectrum of the underlying signal, i.e., cos(α) on [0, π]. Thus we obtain the form of R analytically. Knowing the local correlations, we can find a recurrent linear filter which decorrelates the outputs after it is applied. This filter is W = δ - ρ R^(1/2) (Sec. 2), unique up to a unitary transformation. \n\nFigure 1: Local recurrent filter in the presence of noise. The connection strength W depends only on the relative angle θ between units. \n\nIf we include noise in the calculation according to (5), we obtain a filter which depends on the signal-to-noise ratio of the input level. We model the noise process here as a set of independent noise processes for each unit, with (N1)i being the input noise and (N2)i the output noise for unit i. All noise processes are assumed statistically independent. The result for S/N2 ~ 3 is shown in Fig. 1. We observe the broadening of the central connections, caused by the need to average local results in order to overcome the noise. It was calculated at a very low N1 level, since, as mentioned in Section 2, the filter is unstable with respect to input noise. 
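The local construction can be reproduced numerically. A minimal sketch of our own (not the authors' code; the value of N, the angular grid, and setting A = ρ = 1 are assumptions):

```python
import numpy as np

N = 16
alphas = np.arange(N) * np.pi / N                 # preferred angles of the units
a = np.linspace(0, np.pi, 2048, endpoint=False)   # uniform edge orientations

# R_ij = A <|cos(a - a_i)| |cos(a - a_j)|>_a, with A = <a^2> set to 1
V = np.abs(np.cos(a[None, :] - alphas[:, None]))  # |cos| tuning of each unit
R = (V @ V.T) / a.size

# recurrent decorrelating kernel W = delta - rho R^(1/2), with rho = 1
w, U = np.linalg.eigh(R)
W = np.eye(N) - (U * np.sqrt(np.clip(w, 0, None))) @ U.T

# homogeneity: R depends only on the relative angle, so it is circulant
assert np.allclose(R[0, :-1], R[1, 1:])
```

A row of W then gives the connection strength as a function of relative angle, the quantity plotted in Fig. 1 (there with the noise correction of (5) included).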
\n\nWith this filter we can directly compare calculations obtained from applying it to a specific input signal with physiological measurements of the orientation selectivity of cells in the cortex. The results of such comparisons are presented in Fig. 2, in which we plot the activity of orientation selective cells in arbitrary units vs stimulus angle in degrees. We see very good matches with the experimental results of Celebrini, Thorpe, Trotter, and Imbert (1993), Schiller, Finlay, and Volman (1976) and Orban (1984). We expect some discrepancies, such as in Figures 2.D and 2.F, which can be attributed to the threshold nature of real neural units. We see that we can use the model to classify physiologically distinct cells by the value of the N2 parameter that describes them. Indeed, since this parameter models the intrinsic noise of a neural unit, we expect it to differ across populations. \n\nFigure 2: Comparison with experimental data. The activity of orientation selective cells in arbitrary units is plotted against stimulus angle in degrees. Experimental points are denoted with circles, the calculated result with a solid line. The variation in the forms of the tuning curves could be accounted for by selecting different noise levels in our noise model. A - data from cell CAJ4 in Celebrini et al. and fit for N1 = 0.1, N2 = 0.2. B - data from cell CAK2 in Celebrini et al. and fit for N1 = 0.35, N2 = 0.1. C - data from a complex cell from Orban and fit for N1 = 0.3, N2 = 0.45. D - data from a simple cell from Orban and fit for N1 = 1.0, N2 = 0.45. E - data from a simple cell in Schiller et al. and fit for N1 = 0.06, N2 = 0.001. F - data from a simple cell in Schiller et al. and fit for N1 = 15.0, N2 = 0.01. 
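The noiseless version of such tuning-curve calculations can be sketched as follows: apply the equivalent feed-forward filter K = (δ - W)^(-1) ~ R^(-1/2) of the local model to the broad |cos| input tuning and read off the resulting population responses. This is our own illustration (all names and parameter values are assumptions; the fits in Fig. 2 additionally use the noise correction of (5)):

```python
import numpy as np

def tuning_curves(N=16, samples=2048):
    """Unit responses, before and after recurrent decorrelation, to edges
    of every orientation (noiseless local model)."""
    alphas = np.arange(N) * np.pi / N
    a = np.linspace(0, np.pi, samples, endpoint=False)
    V = np.abs(np.cos(a[None, :] - alphas[:, None]))   # feed-forward input tuning
    R = (V @ V.T) / samples                            # local correlations
    w, U = np.linalg.eigh(R)
    K = (U * (1.0 / np.sqrt(w))) @ U.T                 # K = (I - W)^(-1) ~ R^(-1/2)
    return a, V, K @ V                                 # stimulus angles, input, output

a, V, O = tuning_curves()
# column j of O is the population response to an edge at angle a[j]
```

Since the outputs are whitened, their sample correlation over the uniform stimulus ensemble is the identity, which is what the decorrelation condition (3) requires.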
\n\n3.2 Non-Local Optimal Filter \n\nWe can perform a similar analysis of the non-local structure of natural images to design a non-local optimal filter. This time we have a set of detectors Vk(x) = |a(x) Cos(α(x) - kπ/N)| and a correlation function Rij(x, y) = < Vi(x) Vj(y) >, averaged over natural scenes. We assume that the function is spatially translation invariant and can be represented as Rij(x, y) = Rij(x - y). The averaging was done over a set of about 100 different pictures, with 10-20 samples of size 256 x 256 taken from each picture. \n\nThe structure of the correlation matrix depends both on the autocorrelator of the gradient field and on the structure of the detectors, which are correlated. Obviously the fact that the output units compute |a(x) Cos(α(x) - kπ/N)| creates many local correlations between neighboring units. Any non-local structure in the detector set is due to a similar structure present in the gradient field autocorrelator. \n\nThe structure of the translation-invariant correlation matrix R(x) is shown in Fig. 3A. This can be interpreted as the correlation between the input to the center hypercolumn and the input to the rest of the hypercolumns. The result of the complete model (5) can be seen in Fig. 3B. Since the filter is also assumed to be translation invariant, the pictures can be interpreted as the connectivity of the center hypercolumn with the rest of the network. This is seen to be concentrated near the diagonal, \n\nFigure 3: A. The autocorrelation function of a set with 8 detectors. Dark represents high correlation, light low correlation. The sets are indexed by the preferred angles θi, θj in units of π/8, and each Rij has spatial structure, which is represented as a 32 x 32 square. B. 
The lateral connectivity for the central horizontal selective unit with neighboring horizontal (1) and π/4 (2) selective units. Note the anisotropic connectivity and the rotation of the connectivity axis in the second picture. \n\nand weak in the two adjacent bands, which represent connections to edge detectors with a perpendicular preferred direction. The noise-minimizing filter is a low-pass filter, as expected, and thus decreases the high-frequency component of the power spectrum of the respective decorrelating filter. \n\n4 Conclusions and Discussion \n\nWe have shown that properties of orientation selective cells in the visual cortex can be partially described by some very simple linear systems analysis. Using this we obtain results which are in very good agreement with physiological and anatomical data from single-cell recordings and imaging. We can use the parameters of the model to classify functionally and structurally differing cells in the visual cortex. \n\nWe achieved this by using a recurrent network as the underlying model. This was chosen for several reasons. One is that we tried to give the model biological plausibility, and recurrency is well established at the cortical level. Another, related, heuristic argument is that although there exists a feed-forward network with equivalent properties, as shown in Section 2, such a network would require an additional layer of cells, while the recurrent model allows both for feed-forward processing (the input to our model) and for manipulation of its output (the decorrelation procedure in our model). Finally, while a feed-forward network needs large weights to amplify the signal, a recurrent network is able to achieve very high gains on the input signal with relatively small weights by utilizing a special architecture. 
As can be seen from our equivalence model, K = (δ - W)^(-1), so if W is constructed so as to have an eigenvalue close to 1, it will produce enormous amplification. \n\nOur work is based on previous suggestions relating the structure of the visual environment to the structure of the visual pathway. It was thought before (Atick and Li 1994) that this particular relation can describe only early visual pathways, and is insufficient to account for the structure of the striate cortex. We show here that redundancy reduction is still sufficient to describe many of the complexities of the visual cortex, thus strengthening the possibility that this is a basic building principle for the visual system, and one should anticipate its appearance in later regions of the visual pathway. \n\nWhat is even more intriguing is the possibility that this method can account for the structure of other sensory pathways and cortices. We know, e.g., that the somatosensory pathway and cortex are similar to the visual ones, because of the similar environments that they encounter (luminosity, edges and textures have analogies in somesthesia). Similar analogies may be expected for the auditory pathway. \n\nWe expect even better results if we consider a more realistic non-linear model for the neural units. In fact this improves tremendously the information-processing abilities of a bounded system, since it captures higher order correlations in the signal and allows for true minimization of the mutual information in the system, rather than just decorrelation. Very promising results in this direction have been recently described by Bell and Sejnowski (1996) and Lin and Cowan (1997), and we intend to consider the implications for our model. \n\nAcknowledgements \n\nSupported in part by Grant #96-24 from the James S. McDonnell Foundation. \n\nReferences \n\nAtick, J. J. and Z. Li (1994). 
Towards a theory of the striate cortex. Neural Computation 6, 127-146. \n\nAtick, J. J. and N. N. Redlich (1992). What does the retina know about natural scenes? Neural Computation 4, 196-210. \n\nAtick, J. J. and N. N. Redlich (1993). Convergent algorithm for sensory receptive field development. Neural Computation 5, 45-60. \n\nAttneave, F. (1954). Some informational aspects of visual perception. Psychological Review 61, 183-193. \n\nBarlow, H. B. (1989). Unsupervised learning. Neural Computation 1, 295-311. \n\nBell, A. J. and T. J. Sejnowski (1996). The \"independent components\" of natural scenes are edge filters. Vision Research (submitted). \n\nCelebrini, S., S. Thorpe, Y. Trotter, and M. Imbert (1993). Dynamics of orientation coding in area V1 of the awake primate. Visual Neuroscience 10, 811-825. \n\nDong, D. (1994). Associative decorrelation dynamics: a theory of self-organization and optimization in feedback networks. In Advances in Neural Information Processing Systems, Volume 7, pp. 925-932. The MIT Press. \n\nField, D. J. (1987). Relations between the statistics of natural images and the response properties of cortical cells. J. Opt. Soc. Am. A 4, 2379-2394. \n\nLin, J. K. and J. D. Cowan (1997). Faithful representation of separable input distributions. Neural Computation, (to appear). \n\nOrban, G. A. (1984). Neuronal Operations in the Visual Cortex. Springer-Verlag, Berlin. \n\nRuderman, D. L. and W. Bialek (1993). Statistics of natural images: Scaling in the woods. In J. D. Cowan, G. Tesauro, and J. Alspector (Eds.), Advances in Neural Information Processing Systems, Volume 6. Morgan Kaufmann, San Mateo, CA. \n\nSchiller, P., B. Finlay, and S. Volman (1976). Quantitative studies of single-cell properties in monkey striate cortex. II. Orientation specificity and ocular dominance. J. Neurophysiol. 39(6), 1320-1333. 
", "award": [], "sourceid": 1304, "authors": [{"given_name": "Alexander", "family_name": "Dimitrov", "institution": null}, {"given_name": "Jack", "family_name": "Cowan", "institution": null}]}