{"title": "Estimating disparity with confidence from energy neurons", "book": "Advances in Neural Information Processing Systems", "page_first": 1537, "page_last": 1544, "abstract": "Binocular fusion takes place over a limited region smaller than one degree of visual angle (Panum's fusional area), which is on the order of the range of preferred disparities measured in populations of disparity-tuned neurons in the visual cortex. However, the actual range of binocular disparities encountered in natural scenes spans tens of degrees. This discrepancy suggests that there must be a mechanism for detecting whether the stimulus disparity is inside or outside the range of the preferred disparities in the population. Here, we present a statistical framework to derive a feature from a population of V1 disparity neurons that determines whether the stimulus disparity lies within the preferred disparity range of the neural population. When optimized for natural images, it yields a feature that can be explained by normalization, a common model of V1 neurons. We further make use of this feature to estimate the disparity in natural images. Our proposed model generates more accurate estimates than coarse-to-fine multi-scale approaches, and it can also identify regions with occlusion. The approach suggests another critical role for normalization in robust disparity estimation.", "full_text": "Estimating disparity with confidence from energy neurons\n\nEric K. C. Tsang\nDept. of Electronic and Computer Engr.\nHong Kong Univ. of Sci. and Tech.\nKowloon, HONG KONG SAR\neeeric@ee.ust.hk\n\nBertram E. Shi\nDept. of Electronic and Computer Engr.\nHong Kong Univ. of Sci. and Tech.\nKowloon, HONG KONG SAR\neebert@ee.ust.hk\n\nAbstract\n\nThe peak location in a population of phase-tuned neurons has been shown to be a more reliable estimator for disparity than the peak location in a population of position-tuned neurons. 
Unfortunately, the disparity range covered by a phase-tuned population is limited by phase wraparound. Thus, a single population cannot cover the large range of disparities encountered in natural scenes unless the scale of the receptive fields is chosen to be very large, which results in very low resolution depth estimates. Here we describe a biologically plausible measure of the confidence that the stimulus disparity is inside the range covered by a population of phase-tuned neurons. Based upon this confidence measure, we propose an algorithm for disparity estimation that uses many populations of high-resolution phase-tuned neurons that are biased to different disparity ranges via position shifts between the left and right eye receptive fields. The population with the highest confidence is used to estimate the stimulus disparity. We show that this algorithm outperforms a previously proposed coarse-to-fine algorithm for disparity estimation, which uses disparity estimates from coarse scales to select the populations used at finer scales. Our algorithm can also effectively detect occlusions.\n\n1 Introduction\nBinocular disparity, the displacement between the image locations of an object between two eyes or cameras, is an important depth cue. Mammalian brains appear to represent the stimulus disparity using populations of disparity-tuned neurons in the visual cortex [1][2]. The binocular energy model is a first order model that explains the responses of individual disparity-tuned neurons [3]. In this model, the preferred disparity tuning of the neurons is determined by the phase and position shifts between the left and right monocular receptive fields (RFs).\nPeak picking is a common disparity estimation strategy for these neurons ([4]-[6]). In this strategy, the disparity estimates are computed as the preferred disparity of the neuron with the largest response among the neural population. 
Chen and Qian [4] have suggested that the peak location in a population of phase-tuned disparity energy neurons is a more reliable estimate than the peak location in a population of position-tuned neurons.\nIt is difficult to estimate disparity from a single phase-tuned neuron population because its range of preferred disparities is limited. Figure 1 shows the population response of phase-tuned neurons (vertical cross section) for different stimulus disparities. If the stimulus disparity is confined to the range of preferred disparities of this population, the peak location changes linearly with the stimulus disparity. Thus, we can estimate the disparity from the peak. However, in natural viewing conditions, the stimulus disparity range is more than ten times larger than the range of preferred disparities of the population [7]. The peak location no longer indicates the stimulus disparity, since peaks still occur even when the stimulus disparity is outside the range of the neurons' preferred disparities. The false peaks arise from two sources: the phase wrap-around due to the sinusoidal modulation in the Gabor function modelling the neuron's receptive field (RF) profile, and unmatched edges entering the neuron's RF [5].\n\nFig. 1: Sample population responses of the phase-tuned disparity neurons for different disparities (preferred disparity Dpref from -5 to 5 pixels on the vertical axis; stimulus disparity from -40 to 40 pixels on the horizontal axis). This was generated by presenting the left image of the \u201cCones\u201d stereogram shown in Figure 5a to both eyes but varying the disparity by keeping the left image fixed and shifting the right image. At each point, the image intensity represents the response of a disparity neuron tuned to a fixed preferred disparity (vertical axis) in response to a fixed stimulus disparity (horizontal axis). The dashed vertical lines indicate the stimulus disparities that fall within the range of preferred disparities of the population (\u00b18 pixels).\n\nAlthough a single population can cover a large disparity range, the large size of the required receptive fields results in very low resolution depth estimates. To address this problem, Chen and Qian [4] proposed a coarse-to-fine algorithm which refines the estimates computed at coarse scales using populations tuned to finer scales.\nHere we present an alternative way to estimate the stimulus disparity using a biologically plausible confidence measure that indicates whether the stimulus disparity lies inside or outside the range of preferred disparities in a population of phase-tuned neurons. We motivate this measure by examining the empirical statistics of the model neuron responses on natural images. Finally, we demonstrate the efficacy of using this measure to estimate the stimulus disparity. Our model generates better estimates than the coarse-to-fine approach [4], and can detect occlusions.\n\n2 Features of the phase-tuned disparity population\nIn this section, we define different features of a population of phase-tuned neurons. These features will be used to define the confidence measure. Figure 2a illustrates the binocular disparity energy model of a phase-tuned neuron [3]. For simplicity, we assume 1D processing, which is equivalent to considering one orientation in the 2D case. The response of a binocular simple cell is modelled by summing the outputs of linear monocular Gabor filters applied to the left and right images, followed by a positive or negative half-squaring nonlinearity. The response of a binocular complex cell is the sum of the four simple cell responses. 
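The simple-cell/complex-cell arithmetic described above can be checked with a short numerical sketch (NumPy; the phase values and monocular responses below are illustrative assumptions for the demo, not values from the paper): the four half-squared binocular simple cells sum exactly to the squared modulus of the summed monocular responses.

```python
import numpy as np

# Illustrative complex monocular responses Vl, Vr (real/imag parts are the
# outputs of the quadrature pair of monocular Gabor filters).
Vl = 0.8 + 0.3j
Vr = -0.2 + 0.9j
psi_l, psi_r = 0.0, np.pi / 2   # example left/right RF phases

def half_sq(x):
    # positive half-squaring nonlinearity
    return max(x, 0.0) ** 2

# Summed monocular drive; its real and imaginary parts feed the even and
# odd quadrature pairs of binocular simple cells.
drive = Vl * np.exp(1j * psi_l) + Vr * np.exp(1j * psi_r)

# Four binocular simple cells: +/- half-squared even and odd drives.
simple = [half_sq(s * d) for d in (drive.real, drive.imag) for s in (1, -1)]
complex_cell = sum(simple)

# Their sum equals the disparity energy |Vl e^{j psi_l} + Vr e^{j psi_r}|^2,
# because half_sq(x) + half_sq(-x) = x^2.
assert np.isclose(complex_cell, abs(drive) ** 2)
```

The identity holds for any monocular responses, which is why the complex cell can be analyzed directly as a squared modulus in the next section.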
Formally, we define the left and right retinal images by Ul(x) and Ur(x), where x denotes the distance from the RF center. The disparity d is the difference between the locations of corresponding points in the left and right images, i.e., an object that appears at point x in the left image appears at point x + d in the right image. Pairs of monocular responses are generated by integrating image intensities weighted by pairs of phase quadrature RF profiles, which are the real and imaginary parts of a complex-valued Gabor function (j = \u221a-1):\n\nh(x, \u03c8) = g(x) e^{j(\u03a9x + \u03c8)} = g(x) cos(\u03a9x + \u03c8) + j g(x) sin(\u03a9x + \u03c8)   (1)\n\nwhere \u03a9 and \u03c8 are the spatial frequency and the phase of the left and right monocular RFs, and g(x) is a zero mean Gaussian with standard deviation \u03c3, which is inversely proportional to the spatial frequency bandwidth. The spatial frequency and the standard deviation of the left and right RFs are identical, but the phases may differ (\u03c8l \u2260 \u03c8r). We can compactly express the pairs of left and right monocular responses as the real and imaginary parts of Vl e^{j\u03c8l} and Vr e^{j\u03c8r}, where with a slight abuse of notation, we define\n\nVl = \u222b g(x) e^{j\u03a9x} Ul(x) dx  and  Vr = \u222b g(x) e^{j\u03a9x} Ur(x) dx   (2)\n\nFig. 2: (a) Binocular disparity energy model of a disparity neuron in the phase-shift mechanism (shown with \u03c8l = 0 and \u03c8r = \u03c0/2): the images Ul(x) and Ur(x) are filtered by the quadrature pairs h(x, \u03c8l) and h(x, \u03c8r) (real and imaginary parts), summed, half-squared by the binocular simple cells, and pooled by the binocular complex cell into Ed(\u0394\u03c8). The phase-shift \u0394\u03c8 = \u03c8r - \u03c8l between the left and right monocular RFs determines the preferred disparity of the neuron. The neuron shown is tuned to a negative disparity of -\u03c0/(2\u03a9). (b) The population response Ed(\u0394\u03c8) of the phase-tuned neurons centered at a retinal location, with phase-shifts \u0394\u03c8 \u2208 [-\u03c0, \u03c0], can be characterized by three features S, P and \u0394\u03a6.\n\nThe response of the binocular complex cell (the disparity energy) is the squared modulus of the sum of the monocular responses:\n\nEd(\u0394\u03c8) = |Vl e^{j\u03c8l} + Vr e^{j\u03c8r}|^2 = |Vl|^2 + |Vr|^2 + Vl Vr* e^{-j\u0394\u03c8} + Vl* Vr e^{j\u0394\u03c8}   (3)\n\nwhere the * superscript indicates complex conjugation. The phase-shift between the right and left neurons \u0394\u03c8 = \u03c8r - \u03c8l controls the preferred disparity Dpref(\u0394\u03c8) \u2248 -\u0394\u03c8/\u03a9 of the binocular complex cell [6].\nIf we fix the stimulus and allow \u0394\u03c8 to vary between \u00b1\u03c0, the function Ed(\u0394\u03c8) in (3) describes the population response of phase-tuned neurons whose preferred disparities range between -\u03c0/\u03a9 and \u03c0/\u03a9. The population response can be completely specified by three features S, P and \u0394\u03a6 [4][5]:\n\nEd(\u0394\u03c8) = S + P cos(\u0394\u03a6 - \u0394\u03c8)   (4)\n\nwhere\n\nS = |Vl|^2 + |Vr|^2,  P = 2|Vl||Vr| = 2|Vl Vr*|,  \u0394\u03a6 = \u03a6l - \u03a6r = arg(Vl Vr*)   (5)\n\nFigure 2b shows the graphical interpretation of these features. 
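Equations (3)-(5) can be verified numerically: for any pair of complex monocular responses, the population response over the phase shifts is an offset sinusoid with exactly the three features S, P and the peak phase. A small sketch (NumPy; the sampled response values are illustrative assumptions):

```python
import numpy as np

Vl = 1.2 - 0.4j          # illustrative monocular responses
Vr = 0.5 + 0.8j

dpsi = np.linspace(-np.pi, np.pi, 33)   # phase shifts of the population

# Disparity energy, eq. (3), taking psi_l = 0 and psi_r = dpsi.
Ed = np.abs(Vl + Vr * np.exp(1j * dpsi)) ** 2

# Features of eq. (5).
S = abs(Vl) ** 2 + abs(Vr) ** 2        # average response
P = 2 * abs(Vl * np.conj(Vr))          # peak minus average
dPhi = np.angle(Vl * np.conj(Vr))      # peak location

# Eq. (4): the population response is S + P cos(dPhi - dpsi).
assert np.allclose(Ed, S + P * np.cos(dPhi - dpsi))
assert S >= P    # since S - P = (|Vl| - |Vr|)^2 >= 0
```

Because only three numbers describe the whole curve, the population can be summarized by S, P and the peak phase without storing every neuron's response.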
The feature S is the average response across the population. The feature P is the difference between the peak and average responses. Note that S \u2265 P, since S - P = (|Vl| - |Vr|)^2 \u2265 0. The feature \u0394\u03a6/\u03a9 is the peak location of the population response. Peak picking algorithms compute the estimates from the peak location, i.e. dest = -\u0394\u03a6/\u03a9 [6].\n\n3 Feature Analysis\nIn this section, we suggest a simple confidence measure that can be used to differentiate between two classes of stimulus disparities: DIN and DOUT, corresponding to stimulus disparities inside (|d| \u2264 \u03c0/\u03a9) and outside (|d| > \u03c0/\u03a9) the range of preferred disparities in the population.\nWe find this confidence measure by analyzing the empirical joint densities of \u0394\u03a6, S and the ratio R = P/S conditioned on the two disparity classes. Considering R is equivalent to considering P. We ignore \u0394\u03a6: intuitively, the peak location \u0394\u03a6 will be less effective in distinguishing between DIN and DOUT, since Figure 1 shows that the phase ranges between -\u03c0 and \u03c0 for both disparity classes. The ratio R is bounded between 0 and 1, since S \u2265 P.\n\nFig. 3: The empirical joint density of S and R given (a) DIN and (b) DOUT. Red indicates large values. Blue indicates small values. (c) The optimal decision boundaries derived from the Bayes factor. (d) The change in total probability of error \u0394Pe between using a flat boundary (thresholding R) versus the optimal boundary, as a function of the prior P[DIN].\n\nBecause of the uncertainties in the natural scenes, the features S and R are random variables. In making a decision based on random features, Bayesian classifiers minimize the classification error. Bayesian classifiers compare the conditional probabilities of the two disparity classes (DIN and DOUT) given the observed feature values. The decision can be specified by thresholding the Bayes factor:\n\nBS,R = fS,R|C(s, r | DIN) / fS,R|C(s, r | DOUT), deciding DIN if BS,R > TS,R and DOUT otherwise   (6)\n\nwhere the threshold TS,R controls the location of the decision boundary in the feature space {S, R} and depends upon the prior class probabilities P[DIN] and P[DOUT]. The function fS,R|C(s, r | c) is the conditional density of the features given the class c \u2208 {DIN, DOUT}.\nTo find the optimal decision boundary for the features S and R, we estimated the joint class likelihood fS,R|C(s, r | c) from data obtained using the \u201cCones\u201d and the \u201cTeddy\u201d stereograms from Middlebury College [8][9], shown in Figure 5a. The stereograms are rectified, so that the correspondences are located in the same horizontal scan-lines. Each image has 1500 x 1800 pixels. We constructed a population of phase-tuned neurons at each pixel. The disparity neurons had the same spatial frequency and standard deviation, and were selective to vertical orientations. The spatial frequency was \u03a9 = 2\u03c0/16 radians per pixel and the standard deviation in the horizontal direction was \u03c3 = 6.78 pixels, corresponding to a spatial bandwidth of 1.8 octaves. The standard deviation in the vertical direction was 2\u03c3. 
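Since the optimal boundary turns out to be nearly flat in R, the Bayesian decision reduces in practice to thresholding the normalized feature R = P/S. A minimal sketch of the confidence computation (the threshold and the example responses are illustrative assumptions, not the fitted values):

```python
import numpy as np

def confidence(Vl, Vr):
    # R = P / S from eq. (5); 0 <= R <= 1 because S - P = (|Vl| - |Vr|)^2 >= 0.
    S = abs(Vl) ** 2 + abs(Vr) ** 2
    P = 2 * abs(Vl * np.conj(Vr))
    return P / S

# Balanced left/right response magnitudes (disparity in range) give R near 1 ...
r_in = confidence(1.0 + 1.0j, 0.9 + 1.1j)
# ... while a strong monocular magnitude mismatch (out of range, or an
# occluded, unmatched edge) drives R toward 0.
r_out = confidence(1.0 + 1.0j, 0.05 - 0.02j)

T_R = 0.5                     # illustrative threshold
assert r_in > T_R > r_out     # classify DIN vs DOUT by thresholding R
```

Note that R depends only on the relative magnitudes of the monocular responses, which is why it is insensitive to overall image contrast.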
The range of the preferred disparities (DIN) of the population is between \u00b18 pixels. To reduce the variability in the classification, we also applied Gaussian spatial pooling with standard deviation 0.5\u03c3 to the population [4][5]. The features computed from the population were separated into two classes (DIN and DOUT) according to the ground truth in Figure 5b.\nFigure 3a-b show the empirically estimated joint conditional densities for the two disparity classes. They were computed by binning the features S and R with bin sizes of 0.25 for S and 0.01 for R. Given a disparity within the range of preferred disparities (DIN), the joint density concentrates at small S and large R. For the out-of-range disparities (DOUT), the joint density shifts to both large S and small R. Intuitively, a horizontal hyperplane, illustrated by the red dotted line in Figure 3a-b, is an appropriate decision boundary to separate the DIN and DOUT data. This indicates that the feature R can be an indicator to distinguish between the in-range and out-of-range disparities. Mathematically, we can compute the optimal decision boundaries by applying different thresholds to the Bayes factor in (6). Figure 3c shows the boundaries. They are basically flat except at small S.\nWe also demonstrate the efficacy of thresholding R instead of using the optimal decision boundaries to distinguish between in-range and out-of-range disparities.\n\nFig. 4: Proposed disparity estimator with validation of the disparity estimates. Phase-tuned populations Ed(\u0394\u03c8) are computed at position-shifts \u0394c from -128 to 128; a winner-take-all network selects the population with the largest R\u0394c, whose features R\u0394c* and \u0394\u03a6\u0394c* yield the estimate dest, validated by comparing R against a threshold TR (DIN/DOUT).\n\nGiven the prior class probability P[DIN], we compute a threshold c \u2208 [0, 1] that minimizes the total probability of classification error:\n\nPe = P[DIN] \u222b_{R<c} fS,R|C(s, r | DIN) + (1 - P[DIN]) \u222b_{R>c} fS,R|C(s, r | DOUT)   (7)\n\nWe then compare this total probability of error with the one computed using the optimal decision boundaries derived in (6). Figure 3d shows the deviation in the total probability of error between the two approaches versus P[DIN]. The deviation is small (on the order of 10^-2), suggesting that thresholding R results in performance similar to using the optimal decision boundaries. Thus, R can be used as a confidence measure for distinguishing DIN and DOUT. Moreover, this measure can be computed by normalization, which is a common component in models for V1 neurons [11].\n\n4 Hybrid position-phase model for disparity estimation with validation\nOur analysis above shows that R is a simple indicator to distinguish between in-range and out-of-range disparities. In this section, we describe a model that uses this feature to estimate the stimulus disparity with validation.\nFigure 4 shows the proposed model, which consists of populations of hybrid tuned disparity neurons tuned to different phase-shifts \u0394\u03c8 and position-shifts \u0394c. For each population tuned to the same position-shift but different phase-shifts (a phase-tuned population), we compute the ratio R\u0394c = P\u0394c/S\u0394c. The average activation S\u0394c can be computed by pooling the responses of the entire phase-tuned population. The feature P\u0394c can be computed by subtracting the average activation S\u0394c from the peak response of the phase-tuned population. The ratios R\u0394c at different position-shifts are compared through a winner-take-all network to select the position-shift \u0394c* with the maximum R\u0394c. 
The disparity estimate is further refined by the peak location \u0394\u03a6\u0394c* of the winning population:\n\ndest = \u0394c* - \u0394\u03a6\u0394c*/\u03a9   (8)\n\nIn addition to estimating the stimulus disparity, we also validate the estimates by comparing R\u0394c* with a threshold TR. Instead of choosing a fixed threshold, we vary the threshold to show that the feature R\u0394c can be an occlusion detector.\n\n4.1 Disparity estimation with confidence\nWe applied the proposed model to estimate the disparity of the \u201cCones\u201d and the \u201cTeddy\u201d stereograms, shown in Figure 5a.\n\nFig. 5: (a) The two natural stereograms used to evaluate the model performance. (b) The ground truth disparity maps with respect to the left images, obtained by the structured light method. (c) The ground truth occlusion maps. (d) The disparity maps and the error maps computed by the coarse-to-fine approach. (e) The disparity maps and the error maps computed by the proposed model. The detected invalid estimates are labelled in black in the disparity maps.\n\nThe spatial frequency and the spatial standard deviation of the neurons were kept the same as in the previous analysis. We also performed spatial pooling and orientation pooling to improve the estimation. For spatial pooling, we applied a circularly symmetric Gaussian function with standard deviation \u03c3. For orientation pooling, we pooled the responses over five orientations ranging from 30 to 150 degrees. The range of the position-shifts for the populations was set to the largest disparity range, \u00b1128 pixels, according to the ground truth.\nWe also implemented the coarse-to-fine model as described in [4] for comparison. In this model, an initial disparity estimate computed from a population of phase-tuned neurons at the coarsest scale is successively refined by the populations of phase-tuned neurons at the finer scales. By choosing the coarsest scale large enough, the disparity range covered by this method can be arbitrarily large. The coarsest and the finest scales had Gabor periods of 512 and 16 pixels. The Gabor periods of successive scales differed by a factor of 2. Neurons at the finest scale had the same RF parameters as our model. The same spatial pooling and orientation pooling were applied at each scale.\nFigure 5d-e show the estimated disparity maps and the error maps of the two approaches. The error maps show the regions where the disparity estimates exceed 1 pixel of error in the disparity. Both models correctly recover the stimulus disparity at most locations with gradual disparity changes, but tend to make errors at the depth boundaries. However, the proposed model generates more accurate estimates. In the coarse-to-fine model, the percentage of incorrectly estimated pixels is 36.3%, while in our proposed model it is only 27.8%.\nThe coarse-to-fine model tends to make errors around the depth boundaries. This arises because the assumption that the stimulus disparity is constant over the RF of the neuron is unlikely to hold at very large scales. At boundaries, the coarse-to-fine model generates poor initial estimates, which cannot be corrected at the finer scales, because the actual stimulus disparities are outside the range considered at the finer scales.\nOn the other hand, the proposed model can not only estimate the stimulus disparity, but also validate the estimates. 
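Under the narrowband approximation that a population biased by position shift sees the right response as a phase-rotated copy of the left one, the winner-take-all selection and the refinement of eq. (8) can be sketched as follows (the shift values, attenuation model for out-of-range populations, and threshold are illustrative assumptions):

```python
import numpy as np

OMEGA = 2 * np.pi / 16     # Gabor spatial frequency (radians per pixel)

def estimate_disparity(Vl, Vr, shifts, T_R=0.3):
    # Features per position-shift population, eq. (5).
    S = np.abs(Vl) ** 2 + np.abs(Vr) ** 2
    P = 2 * np.abs(Vl * np.conj(Vr))
    R = P / S
    dPhi = np.angle(Vl * np.conj(Vr))
    best = int(np.argmax(R))                    # winner-take-all on confidence
    d_est = shifts[best] - dPhi[best] / OMEGA   # eq. (8)
    return d_est, bool(R[best] > T_R)           # estimate + validity flag

# Toy stimulus with true disparity +9 px seen by populations at three shifts.
shifts = np.array([-8.0, 0.0, 8.0])
d_true = 9.0
resid = d_true - shifts                 # residual disparity per population
Vl = np.full(3, 1.0 + 1.0j)
# Right responses: phase-rotated copies of Vl, attenuated when the residual
# disparity falls outside the covered range |resid| <= pi / OMEGA = 8 px.
Vr = Vl * np.exp(1j * OMEGA * resid) * np.where(np.abs(resid) <= 8, 1.0, 0.2)

d_est, valid = estimate_disparity(Vl, Vr, shifts)
assert valid and abs(d_est - d_true) < 1e-6
```

Only the population whose residual disparity is in range keeps balanced monocular magnitudes, so its R wins and its peak phase refines the position shift to the true disparity.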
In general, the responses of neurons selective to different position disparities are not comparable, since they depend upon image contrast, which varies at different spatial locations. However, the feature R, which is computed by normalizing the response peak by the average response, eliminates such dependency. Moreover, the invalid regions detected (the black regions on the disparity maps) are in excellent agreement with the error labels.\n\n4.2 Occlusion detection\nIn addition to validating the disparity estimates, the feature R can also be used to detect occlusion. Occlusion is one of the challenging problems in stereo vision. Occlusion occurs near the depth discontinuities where there is no correspondence between the left and right images. The disparity in the occlusion regions is undefined. The occlusion regions for these stereograms are shown in Figure 5c.\nThere are three possibilities for image pixels that are labelled as out of range (DOUT). They are occluded pixels, pixels with valid disparities that are incorrectly estimated, and pixels with valid disparities that are correctly estimated. Figure 6a shows the percentages of DOUT pixels that fall into each possibility as the threshold TR applied to R varies, e.g.,\n\nP1(occluded) = (# of occluded pixels in DOUT / total # of pixels in DOUT) \u00d7 100%   (9)\n\nThese percentages sum to 100% for any threshold TR. For small thresholds, the detector mainly identifies the occlusion regions. As the threshold increases, the detector also begins to detect incorrect disparity estimates. 
Figure 6b shows the percentages of pixels in each possibility that are classified as DOUT as a function of TR, e.g.,\n\nP2(occluded) = (# of occluded pixels in DOUT / # of occluded pixels in image) \u00d7 100%   (10)\n\nFor a large threshold (TR close to unity), all estimates are labelled as DOUT, so the three percentages approach 100%. The proposed detector is effective in identifying occlusion. At the threshold TR = 0.3, it identifies ~70% of the occluded pixels and ~20% of the pixels with incorrect estimates, with only ~10% misclassification.\n\nFig. 6: The percentages of occluded pixels (thick), pixels with incorrect disparity estimates (thin) and pixels with correct estimates (dotted) identified as DOUT, as functions of the threshold TR. (a) Percentages as a fraction of the total number of DOUT pixels. (b) Percentages as a fraction of the number of pixels of each type.\n\n5 Discussion\nIn this paper, we have proposed an algorithm to estimate stimulus disparities based on a confidence measure computed from populations of hybrid tuned disparity neurons. Although there have been previously proposed models that estimate the stimulus disparity from populations of hybrid tuned neurons [4][10], our model is the first that also provides a confidence measure for these estimates. Our analysis suggests that pixels with low confidence are likely to be in occluded regions. The detection of occlusion, an important problem in stereo vision, was not addressed in these previous approaches.\nThe confidence measure used in the proposed algorithm can be computed using normalization, which has been used to model the responses of V1 neurons [11]. 
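The percentage measures in (9) and (10) follow directly from boolean masks over the image; a minimal sketch with synthetic labels (all mask values are illustrative):

```python
import numpy as np

def occlusion_stats(dout, occluded):
    # P1, eq. (9): share of DOUT pixels that are truly occluded.
    # P2, eq. (10): share of all occluded pixels that are flagged DOUT.
    both = np.sum(dout & occluded)
    return 100.0 * both / np.sum(dout), 100.0 * both / np.sum(occluded)

# Synthetic 10-pixel example: low confidence R marks a pixel as DOUT.
R = np.array([0.10, 0.20, 0.90, 0.80, 0.15, 0.95, 0.25, 0.90, 0.05, 0.85])
occluded = np.array([1, 1, 0, 0, 1, 0, 0, 0, 1, 0], dtype=bool)
dout = R < 0.3               # illustrative threshold T_R

p1, p2 = occlusion_stats(dout, occluded)
# Here 4 of the 5 DOUT pixels are occluded (P1 = 80%), and all 4 occluded
# pixels were flagged (P2 = 100%).
assert (p1, p2) == (80.0, 100.0)
```

Sweeping the threshold in `dout = R < T_R` and re-evaluating these two statistics reproduces the kind of trade-off curves shown in Figure 6.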
Previous work has emphasized the role of normalization in reducing the effect of image contrast or in ensuring that the neural responses tuned to different stimulus dimensions are comparable [12]. Our results show that, in addition to these roles, normalization also serves to make the magnitude of the neural responses more representative of the confidence in validating the hypothesis that the input disparity is close to the neuron's preferred disparity. The classification performance using this normalized feature is close to that using the statistically optimal boundaries.\nAggregating the neural responses over locations, orientations and scales is a common technique to improve the estimation performance. For consistency with the coarse-to-fine approach, our algorithm also applies spatial and orientation pooling before computing the confidence. An interesting question, which we are now investigating, is whether individual confidence measures computed from different locations or orientations can be combined systematically.\n\nAcknowledgements\nThis work was supported in part by the Hong Kong Research Grants Council under Grant 619205.\n\nReferences\n[1] H. B. Barlow, C. Blakemore, and J. D. Pettigrew. The neural mechanism of binocular depth discrimination. Journal of Physiology, vol. 193(2), 327-342, 1967.\n[2] G. F. Poggio, B. C. Motter, S. Squatrito, and Y. Trotter. Responses of neurons in visual cortex (V1 and V2) of the alert macaque to dynamic random-dot stereograms. Vision Research, vol. 25, 397-406, 1985.\n[3] I. Ohzawa, G. C. Deangelis, and R. D. Freeman. Stereoscopic depth discrimination in the visual cortex: neurons ideally suited as disparity detectors. Science, vol. 249, 1037-1041, 1990.\n[4] Y. Chen and N. Qian. A coarse-to-fine disparity energy model with both phase-shift and position-shift receptive field mechanisms. Neural Computation, vol. 16, 1545-1577, 2004.\n[5] D. J. Fleet, H. Wagner, and D. J. Heeger. Neural encoding of binocular disparity: energy models, position shifts and phase shifts. Vision Research, vol. 36, 1839-1857, 1996.\n[6] N. Qian and Y. Zhu. Physiological computation of binocular disparity. Vision Research, vol. 37, 1811-1827, 1997.\n[7] S. J. D. Prince, B. G. Cumming, and A. J. Parker. Range and mechanism of encoding of horizontal disparity in macaque V1. Journal of Neurophysiology, vol. 87, 209-221, 2002.\n[8] D. Scharstein and R. Szeliski. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International Journal of Computer Vision, vol. 47(1/2/3), 7-42, 2002.\n[9] D. Scharstein and R. Szeliski. High-accuracy stereo depth maps using structured light. IEEE Conference on Computer Vision and Pattern Recognition, vol. 1, 195-202, 2003.\n[10] J. C. A. Read and B. G. Cumming. Sensors for impossible stimuli may solve the stereo correspondence problem. Nature Neuroscience, vol. 10, 1322-1328, 2007.\n[11] D. J. Heeger. Normalization of cell responses in cat striate cortex. Visual Neuroscience, vol. 9, 181-198, 1992.\n[12] S. R. Lehky and T. J. Sejnowski. Neural model of stereoacuity and depth interpolation based on a distributed representation of stereo disparity. Journal of Neuroscience, vol. 10, 2281-2299, 1990.\n", "award": [], "sourceid": 1076, "authors": [{"given_name": "Eric", "family_name": "Tsang", "institution": null}, {"given_name": "Bertram", "family_name": "Shi", "institution": null}]}