{"title": "Can Peripheral Representations Improve Clutter Metrics on Complex Scenes?", "book": "Advances in Neural Information Processing Systems", "page_first": 2847, "page_last": 2855, "abstract": "Previous studies have proposed image-based clutter measures that correlate with human search times and/or eye movements. However, most models do not take into account the fact that the effects of clutter interact with the foveated nature of the human visual system: visual clutter further from the fovea has an increasing detrimental influence on perception. Here, we introduce a new foveated clutter model to predict the detrimental effects in target search utilizing a forced fixation search task. We use Feature Congestion (Rosenholtz et al.) as our non foveated clutter model, and we stack a peripheral architecture on top of Feature Congestion for our foveated model. We introduce the Peripheral Integration Feature Congestion (PIFC) coefficient, as a fundamental ingredient of our model that modulates clutter as a non-linear gain contingent on eccentricity. We finally show that Foveated Feature Congestion (FFC) clutter scores (r(44) = \u22120.82 \u00b1 0.04, p < 0.0001) correlate better with target detection (hit rate) than regular Feature Congestion (r(44) = \u22120.19 \u00b1 0.13, p = 0.0774) in forced fixation search; and we extend foveation to other clutter models showing stronger correlations in all cases. Thus, our model allows us to enrich clutter perception research by computing fixation specific clutter maps. Code for building peripheral representations is  available.", "full_text": "Can Peripheral Representations Improve Clutter\n\nMetrics on Complex Scenes?\n\nArturo Deza\n\nDynamical Neuroscience\n\nInstitute for Collaborative Biotechnologies\n\nUC Santa Barbara, CA, USA\n\ndeza@dyns.ucsb.edu\n\nMiguel P. Eckstein\n\nPsychological and Brain Sciences\n\nInstitute for Collaborative Biotechnologies\n\nUC Santa Barbara, CA, USA\neckstein@psych.ucsb.edu\n\nAbstract\n\nPrevious studies have proposed image-based clutter measures that correlate with\nhuman search times and/or eye movements. However, most models do not take\ninto account the fact that the effects of clutter interact with the foveated nature of\nthe human visual system: visual clutter further from the fovea has an increasing\ndetrimental in\ufb02uence on perception. Here, we introduce a new foveated clutter\nmodel to predict the detrimental effects in target search utilizing a forced \ufb01xation\nsearch task. We use Feature Congestion (Rosenholtz et al.) as our non foveated\nclutter model, and we stack a peripheral architecture on top of Feature Congestion\nfor our foveated model. We introduce the Peripheral Integration Feature Congestion\n(PIFC) coef\ufb01cient, as a fundamental ingredient of our model that modulates clutter\nas a non-linear gain contingent on eccentricity. We show that Foveated Feature\nCongestion (FFC) clutter scores (r(44) = \u22120.82 \u00b1 0.04, p < 0.0001) correlate\nbetter with target detection (hit rate) than regular Feature Congestion (r(44) =\n\u22120.19 \u00b1 0.13, p = 0.0774) in forced \ufb01xation search; and we extend foveation to\nother clutter models showing stronger correlations in all cases. Thus, our model\nallows us to enrich clutter perception research by computing \ufb01xation speci\ufb01c clutter\nmaps. Code for building peripheral representations is available1.\n\nIntroduction\n\n1\nWhat is clutter? 
While it seems easy to make sense of a cluttered desk vs. an uncluttered desk at a glance, it is hard to quantify clutter with a number. Is a cluttered desk one stacked with papers? Or is an uncluttered desk one that is organized, irrespective of the number of items? An important goal in clutter research has been to develop an image-based computational model that outputs a quantitative measure that correlates with human perceptual behavior [19, 12, 24, 21]. Previous studies have created models that output global or regional metrics to measure clutter perception. Such measures aim to predict the influence of clutter on perception. However, one important aspect of human visual perception is that it is not space invariant: the fovea processes visual information with high spatial detail, while regions away from the central fovea have access to lower spatial detail. Thus, the influence of clutter on perception can depend on the retinal location of the stimulus, and such influences will likely interact with the information content in the stimulus.

The goal of the current paper is to develop a foveated clutter model that can successfully predict the interaction between retinal eccentricity and image content in modulating the influence of clutter on perceptual behavior. We introduce a foveated mechanism based on the peripheral architecture proposed by Freeman and Simoncelli [9] and stack it on top of a current clutter model (Feature Congestion [23, 24]) to generate a clutter map that arises from a calculation of information loss with retinal eccentricity but is multiplicatively modulated by the original unfoveated clutter score. The new measure is evaluated in a gaze-contingent psychophysical experiment measuring target detection in complex scenes as a function of target retinal eccentricity. We show that foveated clutter models that account for the loss of information in the periphery correlate better with human target detection (hit rate) across retinal eccentricities than non-foveated models. Although the model is presented in the context of Feature Congestion, the framework can be extended to any previous or future clutter metric that produces clutter scores computed from a global pixel-wise clutter map.

Figure 1: The Feature Congestion pipeline as explained in Rosenholtz et al. [24]. A color, contrast and orientation feature map is extracted for each level of a spatial pyramid, and the max value across levels is computed as the final feature map. The Feature Congestion map is then computed by a weighted sum over each feature map. The Feature Congestion score is the mean value of the map.

2 Previous Work

Previous studies have developed general measures of clutter computed for an entire image that do not consider the space-variant properties of the human visual system. Because our work seeks to model and assess the interaction between clutter and retinal location, experiments manipulating the eccentricity of a target while observers hold fixation (gaze-contingent forced fixation) are most appropriate to evaluate the model. To our knowledge there has been no systematic evaluation of fixation-dependent clutter models with forced fixation target detection in scenes. In this section, we give an overview of state-of-the-art clutter models, metrics and evaluations.

2.1 Clutter Models

Feature Congestion: Feature Congestion, initially proposed by [23, 24], produces both a pixel-wise clutter map as well as a global clutter score for any input image or Region of Interest (ROI). Each clutter map is computed by combining a color map in CIELab space, an orientation map [14], and a local contrast map at multiple scales through Gaussian pyramids [5]. One of the main advantages of Feature Congestion is that the pixel-wise clutter map (Fig. 1) and the global score can be computed in less than a second. Furthermore, this is one of the few models that can output a specific clutter score for any pixel or ROI in an image. This will be crucial for developing a foveated model, as explained in Section 4.
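For concreteness, the following is a minimal Python sketch of a Feature Congestion style computation. It is ours, not the reference implementation of [24]: local standard deviation stands in for the feature covariance volume, a plain blur stands in for the Gaussian pyramid, and the weights are illustrative.

    import numpy as np
    from scipy.ndimage import gaussian_filter
    from skimage.color import rgb2lab

    def local_std(x, sigma):
        # Local variability: sqrt(E[x^2] - E[x]^2) under Gaussian smoothing.
        m = gaussian_filter(x, sigma)
        m2 = gaussian_filter(x * x, sigma)
        return np.sqrt(np.maximum(m2 - m * m, 0.0))

    def feature_congestion_sketch(rgb, n_scales=3, weights=(0.7, 0.2, 0.1)):
        # Color, contrast and orientation variability at several scales,
        # max across scales, then a weighted sum (weights are illustrative).
        lab = rgb2lab(rgb)
        L, a, b = lab[..., 0], lab[..., 1], lab[..., 2]
        color, contrast, orient = [], [], []
        for s in range(n_scales):
            f = 2.0 ** s                      # blur stands in for a pyramid level
            Ls = gaussian_filter(L, f)
            gy, gx = np.gradient(Ls)
            color.append(local_std(a, 3 * f) + local_std(b, 3 * f))
            contrast.append(local_std(Ls, 3 * f))
            orient.append(local_std(np.arctan2(gy, gx), 3 * f))  # crude, non-circular
        cmap = (weights[0] * np.max(color, axis=0)
                + weights[1] * np.max(contrast, axis=0)
                + weights[2] * np.max(orient, axis=0))
        return cmap, float(cmap.mean())       # pixel-wise map and global score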
Edge Density: Edge Density computes a ratio after applying an edge detector to the input image [19]. The final clutter score is the ratio of edge pixels to the total number of pixels in the image. The intuition for this metric is straightforward: "the more edges, the more clutter" (due to objects, for example).

Subband Entropy: The Subband Entropy model begins by computing N steerable pyramids [25] at K orientations across each channel of the input image in CIELab color space. Once each N × K subband is collected for each channel, the entropy of each oriented pyramid is computed pixel-wise, and the entropies are averaged. Thus, Subband Entropy aims to measure the entropy of each spatial frequency and oriented filter response of an image.

Scale Invariance: The Scale Invariant Clutter Model proposed by Bravo and Farid [4] uses graph-based segmentation [8] at multiple scales k. A scale invariant clutter representation is given by the power law coefficient that matches the decay of the number of regions with the adjusted scale parameter.

ProtoObject Segmentation: ProtoObject Segmentation proposes an unsupervised metric for clutter scoring [27, 28]. The model begins by converting the image into HSV color space, and then proceeds to segment the image through superpixel segmentation [17, 16, 1]. After segmentation, mean-shift [11] is applied to all cluster (superpixel) medians to calculate the final number of representative colors present in the image. Next, superpixels are merged with one another contingent on being adjacent and being assigned to the same mean-shift HSV cluster. The final score is the ratio between the initial number of superpixels and the final number of superpixels.

Crowding Model: The Crowding Model developed by van den Berg et al. [26] is the only model to have used losses in the periphery due to crowding as a clutter metric. It decomposes the image into 3 different scales in CIELab color space. It then produces 6 different orientation maps for each scale given the luminance channel; a contrast map is also obtained by a difference of Gaussians on the same channel. All feature maps are then pooled with Gaussian kernels that grow linearly with eccentricity; a KL-divergence is computed between the pre- and post-pooling feature maps to obtain information loss coefficients, and all coefficients are averaged together to produce a final clutter score. We discuss the differences between this model and ours in the Discussion (Section 5).

Texture Tiling Model: The Texture Tiling Model (TTM) is a recent perceptual model that accounts for losses in the periphery [22, 13] through psychophysical experiments modelling visual search [7]: feature search, conjunction search, configuration search and asymmetric search. In essence, the Mongrels proposed by Rosenholtz et al. that simulate peripheral losses are very similar to the Metamers proposed by Freeman & Simoncelli [9]. We do not include comparisons to the TTM model since it requires additional psychophysics on the Mongrel versions of the images.
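To make the pooling-based loss idea of the Crowding Model concrete, below is a minimal sketch (our own simplification, not the implementation of [26]) of isotropic Gaussian pooling whose width grows linearly with eccentricity; an information-loss score can then be computed between the maps before and after pooling. The band approximation and the sigma_per_deg value are our assumptions.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def eccentricity_pool(feature_map, fixation, deg_per_px=0.044,
                          sigma_per_deg=0.5, n_bands=8):
        # Eccentricity of every pixel (in degrees) from the fixation point.
        h, w = feature_map.shape
        ys, xs = np.mgrid[0:h, 0:w]
        ecc = np.hypot(ys - fixation[0], xs - fixation[1]) * deg_per_px
        # One blurred copy per eccentricity band; sigma grows linearly with ecc.
        edges = np.linspace(0.0, ecc.max(), n_bands)
        sigmas_px = edges * sigma_per_deg / deg_per_px
        stack = np.stack([feature_map if s == 0 else gaussian_filter(feature_map, s)
                          for s in sigmas_px])
        band = np.minimum((ecc / (ecc.max() + 1e-9) * (n_bands - 1)).astype(int),
                          n_bands - 1)
        return np.take_along_axis(stack, band[None], axis=0)[0]

    # Information loss: e.g. mean |pre - post|, or a KL-divergence between the
    # normalized pre- and post-pooling maps, averaged into a clutter score.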
2.2 Clutter Metrics

Global Clutter Score: The most basic clutter metric used in clutter research is the original clutter score that every model computes over the entire image. Edge Density & ProtoObject Segmentation output a ratio, while Subband Entropy and Feature Congestion output a score. However, Feature Congestion is the only model that outputs a dense pixel-wise clutter map before computing a global score (Fig. 1). Thus, we use Feature Congestion clutter maps for our foveated clutter model.

Clutter ROI: The second most used clutter metric is ROI (Region of Interest)-based, as shown in the work of Asher et al. [3]. This metric is of interest when an observer is engaging in target search, rather than making a global human judgement (e.g., "rate the clutter of the following scenes").
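The two metrics reduce a dense clutter map in different ways; a minimal sketch (function names are ours):

    import numpy as np

    def global_score(clutter_map):
        # Whole-image metric: the mean of the dense clutter map.
        return float(np.mean(clutter_map))

    def roi_score(clutter_map, center_rc, half_px):
        # ROI metric: the mean inside a square window around the target.
        r, c = center_rc
        return float(np.mean(clutter_map[max(r - half_px, 0):r + half_px,
                                         max(c - half_px, 0):c + half_px]))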
2.3 Clutter Evaluations

Human Clutter Judgements: Multiple studies of clutter correlate their metrics with rankings/ratings of clutter provided by human participants [28, 19, 26]. Ideally, if clutter model A is better than clutter model B, then the correlation of model scores with human rankings/ratings should be higher for model A than for model B.

Response Time: Highly cluttered images will require more time for target search, and hence more time to arrive at a target present/absent decision. Under this assumption, a high correlation between response time and clutter score is a good sign for a clutter model [24, 4, 26, 3, 12].

Target Detection (Hit Rate, False Alarms, Performance): In general, when engaging in target search for a fixed amount of time across all trial conditions, an observer will have a lower hit rate and a higher false alarm rate for a highly cluttered image than for an uncluttered image [24, 3, 12].

3 Methods & Experiments

3.1 Experiment 1: Forced Fixation Search

A total of 13 subjects participated in a Forced Fixation Search experiment where the goal was to detect a target (person) in the subject's periphery and report whether it was present or absent. Participants had variable amounts of time (100, 200, 400, 900, 1600 ms) to view each clip, presented in random order at a variable eccentricity that the subjects were not aware of (1, 4, 9 or 15 deg). They were then prompted with a Target Detection rating scale where they reported, on a scale from 1-10 (by clicking on a number), how confident they were of having detected the target. Participants had unlimited time to make their judgements, and did not take more than 10 seconds per judgment. There was no response feedback after each trial. Trials were aborted when subjects broke fixation outside of a 1 deg radius around the fixation cross.

Figure 2: Experiment 1: Forced Fixation Search flow diagram. A naive observer begins by fixating a cross (500-1000 ms), then the image appears with fixation at a location that is either 1, 4, 9 or 15 deg away from the target (the observer is not aware of the possible eccentricities). After remaining fixated on the image for a variable amount of time (100, 200, 400, 900 or 1600 ms), the observer must make a target detection decision (unlimited time, no feedback).

Each subject completed 12 sessions of 360 unique images each. Every session also presented the images with aerial viewpoints from different vantage points (example: session 1 had the target at 12 o'clock - North, while session 2 had the target at 3 o'clock - East). To control for any fixational biases, all subjects had a unique fixation point on every trial, even for the same eccentricity values. All images were rendered with variable levels of clutter. Each session took about an hour to complete. The target was of size 0.5 deg × 0.5 deg, 1 deg × 1 deg, or 1.5 deg × 1.5 deg, depending on zoom level. For our analysis, we only used the low zoom and 100 ms time condition, since it showed fewer ceiling effects across all eccentricities.

Stimuli Creation: A total of 273 videos were created, each with a total duration of 120 seconds, where a 'bird's eye' point-of-view camera rotated slowly around the center. While the video was in rotating motion, there was no relative motion between any parts of the video. From the original videos, a total of 360 × 4 different clips were created. Half of the clips were target present, while the other half were target absent. These short and slowly rotating clips were used instead of still images to simulate slow real movement from a pilot's point of view. All clips were shown to participants in random order.

Apparatus: An EyeLink 1000 system (SR Research) was used to collect eye tracking data at a frequency of 1000 Hz. Each participant sat at a distance of 76 cm from a gamma-calibrated LCD screen, so that each pixel subtended a visual angle of 0.022 deg/px. All video clips were rendered at 1024 × 760 pixels (22.5 deg × 16.7 deg) at a frame rate of 24 fps. Eye movements with velocity over 22 deg/s and acceleration over 4000 deg/s^2 were classified as saccades. Every trial began with a fixation cross, which each subject had to fixate with a tolerance of 1 deg.
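The viewing geometry above pins down the pixel-to-degree conversion. As a quick check (the pixel pitch is our inference; it is not stated in the text):

    import math

    def deg_per_pixel(pixel_pitch_cm, viewing_distance_cm):
        # Visual angle subtended by one pixel.
        return math.degrees(2 * math.atan(pixel_pitch_cm / (2 * viewing_distance_cm)))

    dpp = deg_per_pixel(0.0292, 76.0)  # ~0.29 mm pitch reproduces the reported value
    print(round(dpp, 3))               # 0.022 deg/px
    print(round(1024 * dpp, 1), round(760 * dpp, 1))  # 22.5 x 16.7 deg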
4 Foveated Feature Congestion

A regular Feature Congestion clutter score is computed by taking the mean of the Feature Congestion map of the image or of a target ROI [12]. We propose a Foveated Feature Congestion (FFC) model that outputs a score with two main terms: 1) a regular Feature Congestion (FC) score, and 2) a Peripheral Integration Feature Congestion (PIFC) coefficient that accounts for the lower spatial resolution of the visual periphery, which is detrimental for target detection. The first term is independent of fixation, while the second term acts as a non-linear gain that will either reduce or amplify the clutter score depending on fixation distance from the target.

In this section we explain how to compute a PIFC, which requires creating a human-like peripheral architecture (Section 4.1). We then present our Foveated Feature Congestion (FFC) clutter model in Section 4.2. Finally, we conclude with a quantitative evaluation of the FFC (Section 4.3) in its ability to predict variations of target detectability across images and retinal eccentricities of the target.

Figure 3: Construction of a peripheral architecture a la Freeman & Simoncelli [9] using the functions described in Section 4.1. (a) Top: the g_n(e) functions; bottom: the h_n(θ) functions. (b) The resulting peripheral architecture. The blue region in the center of (b) represents the fovea, where all information is preserved. Outer regions (in red) represent different parts of the periphery at multiple eccentricities.

4.1 Creating a Peripheral Architecture

We used the Piranhas Toolkit to create a Freeman and Simoncelli [9] peripheral architecture. This biologically inspired model has been tested and used to model V1 and V2 responses in human and non-human primates with high precision for a variety of tasks [20, 10, 18, 2]. It is described by a set of pooling (linear) regions that increase in size with retinal eccentricity. Each pooling region is separable with respect to polar angle, h_n(θ), and log eccentricity, g_n(e), as described in Eq. 2 and Eq. 3 respectively. These functions are multiplied for every angle and eccentricity (θ, e) and are plotted in log polar coordinates to create the peripheral architecture shown in Fig. 3.

f(x) = \begin{cases} \cos^2\left(\frac{\pi}{2}\,\frac{x - (t_0 - 1)/2}{t_0}\right); & -(1 + t_0)/2 < x \le (t_0 - 1)/2 \\ 1; & (t_0 - 1)/2 < x \le (1 - t_0)/2 \\ -\cos^2\left(\frac{\pi}{2}\,\frac{x - (1 + t_0)/2}{t_0}\right) + 1; & (1 - t_0)/2 < x \le (1 + t_0)/2 \end{cases}   (1)

h_n(\theta) = f\left(\frac{\theta - (w_\theta n + w_\theta (1 - t_0)/2)}{w_\theta}\right); \quad w_\theta = \frac{2\pi}{N_\theta}; \quad n = 0, \ldots, N_\theta - 1   (2)

g_n(e) = f\left(\frac{\log(e) - [\log(e_0) + w_e(n + 1)]}{w_e}\right); \quad w_e = \frac{\log(e_r) - \log(e_0)}{N_e}; \quad n = 0, \ldots, N_e - 1   (3)

The parameters we used match a V1 architecture with a scale of s = 0.25, a visual radius of e_r = 24 deg, a fovea of 2 deg, with e_0 = 0.25 deg (we remove pooling regions with a radius smaller than the foveal radius, since there is no pooling in the fovea), and t_0 = 1/2. The scale defines the number of eccentricities N_e, as well as the number of polar pooling regions N_θ over [0, 2π).
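The window functions of Eqs. 1-3 transcribe directly to code. A minimal sketch (parameter names follow the text; the default values are the V1 settings above):

    import numpy as np

    def f(x, t0=0.5):
        # Eq. 1: smooth cosine window (rises 0 -> 1, flat top, falls 1 -> 0).
        x = np.asarray(x, dtype=float)
        y = np.zeros_like(x)
        rise = (-(1 + t0) / 2 < x) & (x <= (t0 - 1) / 2)
        flat = ((t0 - 1) / 2 < x) & (x <= (1 - t0) / 2)
        fall = ((1 - t0) / 2 < x) & (x <= (1 + t0) / 2)
        y[rise] = np.cos((np.pi / 2) * (x[rise] - (t0 - 1) / 2) / t0) ** 2
        y[flat] = 1.0
        y[fall] = 1.0 - np.cos((np.pi / 2) * (x[fall] - (1 + t0) / 2) / t0) ** 2
        return y

    def h_n(theta, n, N_theta, t0=0.5):
        # Eq. 2: polar-angle window n of N_theta.
        w_theta = 2 * np.pi / N_theta
        return f((theta - (w_theta * n + w_theta * (1 - t0) / 2)) / w_theta, t0)

    def g_n(e, n, N_e, e0=0.25, er=24.0, t0=0.5):
        # Eq. 3: log-eccentricity window n of N_e (e in degrees, e > 0).
        w_e = (np.log(er) - np.log(e0)) / N_e
        return f((np.log(e) - (np.log(e0) + w_e * (n + 1))) / w_e, t0)

    # Pooling region (n_theta, n_e) weights a location (theta, e) by
    # h_n(theta, n_theta, N_theta) * g_n(e, n_e, N_e).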
Although observers saw the original stimuli at 0.022 deg/pixel with an image size of 1024 × 760, for modelling purposes we rescaled all images to half their size so that the peripheral architecture could fit all images under any fixation point. To preserve stimulus size in degrees after rescaling, our foveated model used an input value of 0.044 deg/pixel (twice the experimental value). Resizing the image to half its size also allows the peripheral architecture to consume less CPU time and memory.

4.2 Creating a Foveated Feature Congestion Model

Intuitively, a foveated clutter model that takes target search into account should score very low (near zero) when the target is in the fovea, and very high when the target is in the periphery. Thus, an observer should find a foveated target without difficulty, achieving a near perfect hit rate, yet have a lower hit rate in the periphery given crowding effects. Note that in the periphery, not only should it be harder to detect a target, but the target is also likely to be confused with another object or region similar in shape, size, texture and/or pixel value (false alarms). Under this assumption, we wish to modulate a clutter score (Feature Congestion) by a multiplicative factor, given the target and fixation locations. We call this multiplicative term the PIFC coefficient; it is defined over a 6 deg × 6 deg ROI around the location of the target t. The target itself was removed when processing the clutter maps, since it indirectly contributes to the ROI clutter score [3]. The PIFC aims at quantifying the information loss around the target region due to peripheral processing.

Figure 4: Foveated Feature Congestion flow diagram. In this example, the point of fixation is 15 deg away from the target (bottom right corner of the input image). A Feature Congestion map of the image (top flow) and a Foveated Feature Congestion map (bottom flow) are created. The PIFC coefficient is computed around an ROI centered at the target (bottom flow; zoomed box). The Feature Congestion score is then multiplied by the PIFC coefficient, and the Foveated Feature Congestion score is returned. Sample PIFCs across eccentricities can be seen in the Supplementary Material.

To compute the PIFC, we use the aforementioned ROI and calculate a mean difference of the foveated clutter map with respect to the original non-foveated clutter map. If the target is foveated, there should be little to no difference between the foveated map and the original map, setting the PIFC coefficient to near zero. However, as the target moves farther from the fovea, the PIFC coefficient should grow given pooling effects in the periphery. To create a foveated map, we use Feature Congestion and apply max pooling on each pooling region after the peripheral architecture has been stacked on top of the Feature Congestion map. Note that the FFC map values will depend on the fixation location, as shown in Fig. 4.

The PIFC map is the result of subtracting the foveated map from the unfoveated map in the ROI, and the score is a mean distance value between these two maps (we use the L1-norm, L2-norm or KL-divergence). Computational details are given in Algorithm 1.

Algorithm 1 Computation of the Peripheral Integration Feature Congestion (PIFC) coefficient
1: procedure Compute PIFC of ROI of image I on fixation f
2:   Create a peripheral architecture A : (N_θ, N_e)
3:   Offset image I in the peripheral architecture by fixation f : (f_x, f_y)
4:   Compute the regular Feature Congestion map R_FC of image I
5:   Set the peripheral Feature Congestion map P^f_FC ⊂ R^2_+ to zero
6:   Copy the Feature Congestion values in the fovea r_0: P^f_FC(r_0) = R_FC(r_0)
7:   for each pooling region r_i overlapping I, s.t. 1 ≤ i ≤ N_θ × N_e do
8:     Get the regular Feature Congestion (FC) values in r_i
9:     Set the peripheral FC value to the max regular FC value: P^f_FC(r_i) = max(R_FC(r_i))
10:  end for
11:  Crop the peripheral map to the ROI: p^f_FC = P^f_FC(ROI)
12:  Crop the FC map to the ROI: r_FC = R_FC(ROI)
13:  Choose a distance metric D between r_FC and p^f_FC
14:  Compute Coefficient = mean(D(r_FC, p^f_FC))
15:  return Coefficient
16: end procedure
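A compact Python rendering of Algorithm 1 follows. It is a sketch under our own data-structure assumptions: pooling regions, the fovea and the ROI are boolean masks built for the given fixation, and the KL normalization is our choice, since the text does not specify one.

    import numpy as np

    def pifc(fc_map, regions, fovea_mask, roi_mask, distance="l1"):
        # Steps 2-6: start from zeros, copy the fovea verbatim.
        peripheral = np.zeros_like(fc_map)
        peripheral[fovea_mask] = fc_map[fovea_mask]
        # Steps 7-10: max pooling of the FC map within each pooling region.
        for region_mask in regions:
            if region_mask.any():
                peripheral[region_mask] = fc_map[region_mask].max()
        # Steps 11-14: crop both maps to the ROI and take a mean distance.
        r, p = fc_map[roi_mask], peripheral[roi_mask]
        if distance == "l1":
            d = np.abs(r - p)
        elif distance == "l2":
            d = (r - p) ** 2
        else:  # "kl": normalization is our assumption
            eps = 1e-12
            rn, pn = r / (r.sum() + eps), p / (p.sum() + eps)
            d = rn * np.log((rn + eps) / (pn + eps))
        return float(np.mean(d))

    # FFC score (Eq. 4 below): fc_map.mean() * pifc(fc_map, regions, fovea, roi)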
Thus, we can summarize our model in Eq. 4:

\mathrm{FFC}^{f,t}_I = \mathrm{FC}_I \times \mathrm{PIFC}^f_{\mathrm{ROI}(t)}   (4)

where FC_I is the Feature Congestion score [24] of image I, computed as the mean of the Feature Congestion map R_FC, and FFC^{f,t}_I is the Foveated Feature Congestion score of image I, which depends on the point of fixation f and the location of the target t.

4.3 Foveated Feature Congestion Evaluation

A visualization of each image and its respective hit rate vs. clutter score, for both the foveated and unfoveated models, is shown in Fig. 5. Qualitatively, it shows the importance of the PIFC weighting term to the total image clutter score in our forced fixation search experiment.

Figure 5: (a) Feature Congestion and (b) Foveated Feature Congestion with image IDs, plotted against hit rate. Fig. 5(a) shows the current limitation of global clutter metrics in Forced Fixation Search: the same image under different eccentricities has the same clutter score yet a different hit rate. Our proposed foveated model (Fig. 5(b)) compensates for this difference through the PIFC coefficient, modulating each clutter score by the fixation distance from the target.

Furthermore, a quantitative bootstrap correlation analysis comparing classic metrics (Image, Target, ROI) against foveated metrics (FFC1, FFC2 and FFC3) shows that correlations between hit rate and clutter score are stronger for the foveated models with a PIFC: Image: (r(44) = −0.19 ± 0.13, p = 0.0774), Target: (r(44) = −0.03 ± 0.14, p = 0.4204), ROI: (r(44) = −0.25 ± 0.14, p = 0.0392), FFC1 (L1-norm): (r(44) = −0.82 ± 0.04, p < 0.0001), FFC2 (L2-norm): (r(44) = −0.79 ± 0.06, p < 0.0001), FFC3 (KL-divergence): (r(44) = −0.82 ± 0.04, p < 0.0001).
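For reference, a minimal sketch of such a bootstrap correlation analysis (the resampling scheme is our assumption; the text reports r plus a bootstrap spread):

    import numpy as np
    from scipy.stats import pearsonr

    def bootstrap_corr(clutter_scores, hit_rates, n_boot=10000, seed=0):
        # Pearson r with a bootstrap spread over image resamples.
        x, y = np.asarray(clutter_scores), np.asarray(hit_rates)
        rng = np.random.default_rng(seed)
        n = len(x)
        rs = [pearsonr(x[idx], y[idx])[0]
              for idx in rng.integers(0, n, size=(n_boot, n))]
        r, p = pearsonr(x, y)
        return r, p, float(np.std(rs))   # e.g. r = -0.82 +/- 0.04 for FFC1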
Notice that there is virtually no difference between using the L1-norm, L2-norm or KL-divergence as the distance for each model in terms of the correlation with hit rate. Table ?? (Supp. Mat.) also shows the highest correlation with a 6 × 6 deg ROI window across all metrics. Note that the same analysis cannot be applied to false alarms, since a false alarm at 1 deg cannot be distinguished from one at 15 deg (the target is not present, so there is no real eccentricity away from fixation). However, as mentioned in the Methods section, fixation locations for target-absent trials were placed assuming the target location of the matching target-present image; it is important that target-present and target-absent fixations have the same distribution for each eccentricity.

5 Discussion

In general, images with low Feature Congestion show less gain in their PIFC coefficients as eccentricity increases, while images with high clutter show higher gain. Consequently, the difference in FFC between images increases nonlinearly with eccentricity, as observed in Fig. 6. This is our main contribution: these differences in clutter score as a function of eccentricity do not exist for regular Feature Congestion, and they are what should correlate with human performance in target detection.

Figure 6: Feature Congestion (FC) vs. Foveated Feature Congestion (FFC). (a) FC vs. eccentricity; (b) PIFC (L1-norm) vs. eccentricity; (c) FFC vs. eccentricity. In Fig. 6(a) we see that clutter stays constant across different eccentricities in a forced fixation task. Our FFC model (Fig. 6(c)) enriches the FC model by showing how clutter increases as a function of eccentricity through the PIFC (Fig. 6(b)).

Figure 7: Dense and foveated representations of multiple models (Feature Congestion, Edge Density, Subband Entropy, ProtoObject Segmentation), assuming a center point of fixation.

Our model also differs from the van den Berg et al. [26] model in that our peripheral architecture is biologically inspired, with log polar regions that provide anisotropic pooling [15] rather than isotropic Gaussian pooling as a linear function of eccentricity [26, 6]; and we use region-based max pooling on each final feature map instead of pixel-based mean (Gaussian) pooling per scale, which allows for stronger differences. This final difference also makes our model computationally more efficient, running at 700 ms per image vs. 180 s per image for the Crowding model (a ×250 speed-up).
A home-brewed Crowding Model applied to our forced fixation experiment resulted in a correlation of (r(44) = −0.23 ± 0.13, p = 0.0469), equivalent to using a non-foveated metric such as regular Feature Congestion (r(44) = −0.19 ± 0.13, p = 0.0774).

We finally extended our model to create foveated (FoV) versions of Edge Density (ED) [19], Subband Entropy (SE) [25, 24] and ProtoObject Segmentation (PS) [28], showing that the correlations of all foveated versions are stronger than those of the non-foveated versions on the same task: r_ED = −0.21, r_ED+FoV = −0.76; r_SE = −0.19, r_SE+FoV = −0.77; r_PS = −0.30, r_PS+FoV = −0.74. Note that the strongest foveated correlation is for FC, r_FC+FoV = −0.82 (despite r_FC = −0.19), under an L1-norm loss of the PIFC. Feature Congestion has a dense representation, is more biologically inspired than the other models, and outperforms them in the periphery (see Figure 7). An overview of creating dense and foveated versions of the previously mentioned models can be found in the Supp. Material.
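The extension above is a thin wrapper; a hedged sketch (names are ours, and pifc_fn stands for something like the pifc sketch of Section 4.2 with pooling regions built for the given fixation):

    import numpy as np

    def foveated_score(dense_map_fn, pifc_fn, image, fixation, target_roi):
        # Generic foveation wrapper: unfoveated global score x PIFC gain.
        # dense_map_fn(image) -> 2-D clutter map (ED, SE, PS or FC);
        # pifc_fn(cmap, fixation, target_roi) -> float.
        cmap = dense_map_fn(image)
        return float(np.mean(cmap)) * pifc_fn(cmap, fixation, target_roi)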
6 Conclusion

In this paper we introduced a peripheral architecture that captures the detrimental effects of eccentricity on target detection and helps us model clutter in forced fixation experiments. We introduced a forced fixation experimental design for clutter research; we defined a biologically inspired peripheral architecture that pools features as in V1; and we stacked this peripheral architecture on top of a Feature Congestion map to create a Foveated Feature Congestion (FFC) model, extending the pipeline to other clutter models. We showed that the FFC model better explains the loss in target detection performance as a function of eccentricity through the introduction of the Peripheral Integration Feature Congestion coefficient, which varies non-linearly with eccentricity.

Acknowledgements

We would like to thank Miguel Lago and Aditya Jonnalagadda for useful proof-reads and revisions, as well as Mordechai Juni, N.C. Puneeth, and Emre Akbas for useful suggestions. This work was sponsored by the U.S. Army Research Office and the Regents of the University of California, through Contract Number W911NF-09-0001 for the Institute for Collaborative Biotechnologies. The content of the information does not necessarily reflect the position or the policy of the Government or the Regents of the University of California, and no official endorsement should be inferred.

References

[1] R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Süsstrunk. SLIC superpixels. Technical report, 2010.
[2] E. Akbas and M. P. Eckstein. Object detection through exploration with a foveated visual field. arXiv preprint arXiv:1408.0814, 2014.
[3] M. F. Asher, D. J. Tolhurst, T. Troscianko, and I. D. Gilchrist. Regional effects of clutter on human target detection performance. Journal of Vision, 13(5):25, 2013.
[4] M. J. Bravo and H. Farid. A scale invariant measure of clutter. Journal of Vision, 8(1):23, 2008.
[5] P. J. Burt and E. H. Adelson. The Laplacian pyramid as a compact image code. IEEE Transactions on Communications, 31(4):532-540, 1983.
[6] R. Dubey, C. S. Soon, and P.-J. B. Hsieh. A blurring based model of peripheral vision predicts visual search performances. Journal of Vision, 14(10):935, 2014.
[7] M. P. Eckstein. Visual search: A retrospective. Journal of Vision, 11(5):14, 2011.
[8] P. F. Felzenszwalb and D. P. Huttenlocher. Efficient graph-based image segmentation. International Journal of Computer Vision, 59(2):167-181, 2004.
[9] J. Freeman and E. P. Simoncelli. Metamers of the ventral stream. Nature Neuroscience, 14(9):1195-1201, 2011.
[10] J. Freeman, C. M. Ziemba, D. J. Heeger, E. P. Simoncelli, and J. A. Movshon. A functional and perceptual signature of the second visual area in primates. Nature Neuroscience, 16(7):974-981, 2013.
[11] K. Fukunaga and L. D. Hostetler. The estimation of the gradient of a density function, with applications in pattern recognition. IEEE Transactions on Information Theory, 21(1):32-40, 1975.
[12] J. M. Henderson, M. Chanceaux, and T. J. Smith. The influence of clutter on real-world scene search: Evidence from search efficiency and eye movements. Journal of Vision, 9(1):32, 2009.
[13] S. Keshvari and R. Rosenholtz. Pooling of continuous features provides a unifying account of crowding. Journal of Vision, 16(3):39, 2016.
[14] M. S. Landy and J. R. Bergen. Texture segregation and orientation gradient. Vision Research, 31(4):679-691, 1991.
[15] D. M. Levi. Visual crowding. Current Biology, 21(18):R678-R679, 2011.
[16] A. Levinshtein, A. Stere, K. N. Kutulakos, D. J. Fleet, S. J. Dickinson, and K. Siddiqi. TurboPixels: Fast superpixels using geometric flows. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(12):2290-2297, 2009.
[17] M.-Y. Liu, O. Tuzel, S. Ramalingam, and R. Chellappa. Entropy rate superpixel segmentation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2097-2104. IEEE, 2011.
[18] J. A. Movshon and E. P. Simoncelli. Representation of naturalistic image structure in the primate visual cortex. In Cold Spring Harbor Symposia on Quantitative Biology, volume 79, pages 115-122. Cold Spring Harbor Laboratory Press, 2014.
[19] A. Oliva, M. L. Mack, M. Shrestha, and A. Peeper. Identifying the perceptual dimensions of visual complexity of scenes. In Proceedings of the 26th Annual Meeting of the Cognitive Science Society, 2004.
[20] J. Portilla and E. P. Simoncelli. A parametric texture model based on joint statistics of complex wavelet coefficients. International Journal of Computer Vision, 40(1):49-70, 2000.
[21] R. Pramod and S. Arun. Do computational models differ systematically from human object perception? In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1601-1609, 2016.
[22] R. Rosenholtz, J. Huang, A. Raj, B. J. Balas, and L. Ilie. A summary statistic representation in peripheral vision explains visual search. Journal of Vision, 12(4):14, 2012.
[23] R. Rosenholtz, Y. Li, J. Mansfield, and Z. Jin. Feature congestion: A measure of display clutter. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 761-770. ACM, 2005.
[24] R. Rosenholtz, Y. Li, and L. Nakano. Measuring visual clutter. Journal of Vision, 7(2):17, 2007.
[25] E. P. Simoncelli and W. T. Freeman. The steerable pyramid: A flexible architecture for multi-scale derivative computation. In IEEE International Conference on Image Processing (ICIP), volume 3, pages 444-447, 1995.
[26] R. van den Berg, F. W. Cornelissen, and J. B. Roerdink. A crowding model of visual clutter. Journal of Vision, 9(4):24, 2009.
[27] C.-P. Yu, W.-Y. Hua, D. Samaras, and G. Zelinsky. Modeling clutter perception using parametric proto-object partitioning. In Advances in Neural Information Processing Systems, pages 118-126, 2013.
[28] C.-P. Yu, D. Samaras, and G. J. Zelinsky. Modeling visual clutter perception using proto-object segmentation. Journal of Vision, 14(7):4, 2014.
", "award": [], "sourceid": 1440, "authors": [{"given_name": "Arturo", "family_name": "Deza", "institution": "UCSB"}, {"given_name": "Miguel", "family_name": "Eckstein", "institution": "UCSB"}]}