{"title": "Feature Transitions with Saccadic Search: Size, Color, and Orientation Are Not Alike", "book": "Advances in Neural Information Processing Systems", "page_first": 2523, "page_last": 2531, "abstract": "Size, color, and orientation have long been considered elementary features whose attributes are extracted in parallel and available to guide the deployment of attention. If each is processed in the same fashion with simply a different set of local detectors, one would expect similar search behaviours when localizing an equivalent flickering change among identically laid out disks. We analyze feature transitions associated with saccadic search and find that size, color, and orientation are not alike in dynamic attribute processing over time. The Markovian feature transition is attractive for size, repulsive for color, and largely reversible for orientation.", "full_text": "Feature Transitions with Saccadic Search:\nSize, Color, and Orientation Are Not Alike\n\nStella X. Yu\nComputer Science Department\nBoston College\nChestnut Hill, MA 02467\nstella.yu@bc.edu\n\nAbstract\n\nSize, color, and orientation have long been considered elementary features whose attributes are extracted in parallel and available to guide the deployment of attention. If each is processed in the same fashion with simply a different set of local detectors, one would expect similar search behaviours when localizing an equivalent flickering change among identically laid out disks. We analyze feature transitions associated with saccadic search and find that size, color, and orientation are not alike in dynamic attribute processing over time. The Markovian feature transition is attractive for size, repulsive for color, and largely reversible for orientation.\n\n1 Introduction\n\nSize, color, and orientation have long been considered elementary features [14] that are available to guide attention and visual search [17]. 
Their special status in early visual processing is supported by a large volume of psychophysical evidence on how they can mediate effortless texture segregation, recombine in illusory conjunctions, and pop out in feature search [13, 16]. There is also physiological evidence on how these features could be extracted with separate sets of dedicated detectors working in parallel across the entire space [6]. Consequently, in schematic diagrams as well as computational models on visual saliency [13, 3, 12], image segmentation [5], object recognition [10, 4, 20], and scene classification [12, 11], it is routinely assumed that features at all scales, colors, and orientations are processed and available simultaneously.\n\nWhile size, color, and orientation are alike at parallel local detections across space, they may not be alike at serial deployment of attention across time. We investigate this issue in a gaze-tracked visual search experiment which often requires multiple saccades for the subject to locate the target (Fig. 1).\n\n[Figure 1: a regular array of disk1/disk2 items]\n\nFigure 1: Two kinds of disks are uniformly randomly distributed in a fixed regular layout. Only one disk changes its kind during a repeated flickering presentation. For the same size of change, does it matter to visual search whether the two kinds of disks are rendered in size, color, or orientation?\n\n[Figure 2 stages: fixation 1000ms; flicker stimulus 120ms per frame; detection mouse click; localization mouse click]\n\nFigure 2: Each trial goes through fixation, stimulus, detection, and localization stages. A fixation dot is displayed for 1 second before the onset of the flicker stimulus, with disk image 1, blank, disk image 2, blank repeatedly presented for 120ms each. 
The subject issues a mouse click as soon as he detects the change, and the last seen disk image remains on till he clicks the disk of change. A blank screen is then displayed for 2 seconds before the start of the next trial.\n\nWe present two kinds of disks in a fixed regular layout in a flicker paradigm, and the subject\u2019s task is to locate the only disk that changes its kind (Fig. 2). The paradigm induces change blindness, where a large difference between two images becomes strikingly difficult to detect with a blank in-between, even with repeated presentations [9, 2, 8, 18]. Without the blank, the change elicits a singular motion signal which automatically draws the viewer\u2019s attention to the location of change; with the blank, the motion signal is disrupted and overwhelmed by those motion transients between either image and the blank, effectively masking the location of change.\n\nIf the magnitude of change is comparable across feature dimensions, does it matter whether the disks are rendered in size, color, or orientation? That is, does visual search vary according to whether the same array contains: 1) small and large, 2) black and white, or 3) horizontal and vertical disks? If size, color, and orientation are processed in the same fashion with dedicated local detectors operating in parallel across space, then the detector responses are identical spatially at any time instant in the 3 scenarios. The question is whether the deployment of attention, i.e. deciding what disks to look at next and how to look, depends on which filters produce these responses.\n\nNote that our stimuli decouple the target of feature search from visual saliency in space. Our target is defined not by one of the attributes as done in static search displays [14, 17], but by the temporal change of the attribute. At any time instant, the attributes are uniformly random everywhere, so the target cannot draw attention to itself, but has to be discovered with search. The effect of the attribute itself on attention can thus be studied without the confounding factor of saliency.\n\nThe focus of this paper is on how the feature space is navigated with saccadic search. We formulate a feature descriptor for each fixation, based on which we develop a Markovian feature transition model for saccadic eye movements. Our model reveals that feature transition is attractive for size, repulsive for color, and largely reversible for orientation, suggesting that size, color, and orientation are not alike in dynamic attribute processing over time.\n\n2 Gaze-Tracked Change Blindness Experiment\n\nWe investigate whether visual search for attribute change differs when the stimulus is rendered in size, color, or orientation with the same layout. We establish in a separate experiment that the change is equivalent among dimensions: Detection is equally fast and accurate for a change between two attributes across dimensions and for a no-change within each dimension across two attributes.\n\n[Figure 3: flicker stimuli rendered in size, color, and orientation, each with attributes 1 and 2, over an identical spatial layout]\n\n1st image\n1 2 1 1 2 1\n2 1 2 2 2 1\n1 1 1 2 2 2\n2 2 2 1 1 1\n\n2nd image\n1 2 1 1 2 1\n2 1 2 2 2 1\n1 1 1 2 2 2\n2 2 2 1 1 2\n\nFigure 3: Flicker stimuli are rendered in the same layout but separately in size, color, and orientation. The 1st image contains 12 attribute-1 disks and 12 attribute-2 disks in a uniformly random spatial distribution. The 2nd image is identical to the 1st image except that 1 disk changes its attribute. It could be any of the 24 disks. The disk of change here is circled in both layout matrices.\n\nStimuli. There are 2 kinds of disks for each dimension. Size has 2 radii, 0.45\u25e6 for small and 1.35\u25e6 for large. 
Color has 2 values, 0.3 for black and 0.7 for white on a 0\u20131 value scale. Orientation has 2 angles, 0\u25e6 for horizontal and 90\u25e6 for vertical, with disk radii 0.45\u25e6\u00d71.35\u25e6 along the two directions. Both size and orientation stimuli are of black value 0.3. Color stimuli are of medium disk radius 0.9\u25e6. The background is of neutral gray value 0.5. Here we restrict color to luminance only, as color hue processing is uniquely foveal, which would greatly confound explanations for search behaviours.\n\nThe flicker stimuli for the 3 dimensions are rendered in an identical spatial layout. Each stimulus involves a pair of 24-disk images which are identical except for one disk. These 24 disks are located centrally on a regular 4 \u00d7 6 grid, with an inter-disk distance of 5.4\u25e6, which is 4 times the maximal radius a disk could assume. The 1st image of the stimulus consists of 12 attribute-1 disks and 12 attribute-2 disks, uniformly randomly distributed. The 2nd image changes one of the 24 disks (Fig. 3).\n\nApparatus. The display extends 25.6\u25e6 \u00d7 34.1\u25e6 at a viewing distance of 5 meters. Gaze data are recorded with a Tobii x50 eye tracker at a 50Hz sampling rate and 0.5\u25e6-0.7\u25e6 accuracy. Two clock-synced 3.2GHz Dell Precision computers control the eye tracker and the stimulus presentation respectively. The eye tracker is calibrated at the beginning of each data recording session.\n\nProcedure. Each trial begins with a fixation dot of radius 0.5\u25e6 shown at the center of the display for 1 second. The flicker stimulus, in the sequence of disk image 1, blank, disk image 2, and blank, is then repeatedly presented for 120 ms each. Once the subject issues a mouse click to indicate his detection of the change, the last disk image remains on till the location of change is clicked (Fig. 2).\n\nThere are 3 sets of random stimuli run in 3 sessions. 
Each session has 3 blocks of 24 trials each, one trial for one change location and one block for one dimension. The trials are completely randomized in a block, and the blocks are also randomized and balanced among the subjects.\n\nThe subject is told that two images differing in only one disk are presented repeatedly. His task is to detect and localize the changing disk. He should issue a click as soon as he detects the change. The flickering then stops at the last seen disk image, and he should click the disk of change.\n\nParticipants. A total of 24 naive subjects with normal or corrected-to-normal vision participated after providing informed consent and were compensated with cash. 11, 8, and 5 subjects took part in one, two and all three sessions respectively.\n\n3 Performance Analysis\n\nWe evaluate the task performance on both the accuracy measured by the percentage of correct change localizations and the reaction time measured from the flicker stimulus onset to the subject\u2019s first mouse click for indicating a detection. Fig. 4 shows that localizing an equivalent change among identically laid out items yields significantly different performances in the 3 dimensions. It is fastest and most accurate in size, less so in orientation, and least in color.\n\n[Figure 4: accuracy (%) vs. reaction time (seconds), plotted as ellipses for size, orientation, and color]\n\nFigure 4: Change localization given an identical layout is best (fastest and most accurate) in size, worse in orientation, and worst in color. The sample means and their standard errors of reaction times (x-axis) and accuracies (y-axis) are indicated by the centers and radii of ellipses respectively. 
The differences are significant, with one-way ANOVA results of F(2, 3021) = 10.43, p = 3.1 \u00d7 10^\u22125 for accuracy and F(2, 3021) = 20.43, p = 1.5 \u00d7 10^\u22129 for reaction time. We treat the data from all the subjects as samples from a single subject population, since we are interested not in individual subjects\u2019 performance, but in the distinction between feature dimensions.\n\nThe human visual system must accomplish change localization by examining more than one disk per flicker cycle, since the mean reaction time is only about 5, 6, and 7 cycles (0.12 \u00d7 4 = 0.48 seconds per cycle) for size, orientation, and color respectively. If only one item were looked at and ruled out per cycle, on average it would require fixating 50% of the 24 disks till hitting the target disk, i.e. 12 flicker cycles. Our average of 6 cycles suggests that about 2 disks are examined per cycle.\n\nWhen a disk is being fixated, all its 8 neighbouring disks are mostly outside the fovea, since they are either 5.4\u25e6 or 7.7\u25e6 apart. Some coarse information about neighbouring disks must be utilized in each fixation. The neighbourhood effect on change localization is studied in Fig. 5 and Fig. 6.\n\n[Figure 5 panels, commonly best spatial layout (100%, 1.4 s): a (100%, 1.36 s), b (100%, 1.52 s), c (100%, 1.20 s); commonly worst spatial layout (80%, 7.3 s): d (83%, 11.98 s), e (78%, 4.18 s), f (78%, 5.88 s)]\n\nFigure 5: The common spatial layout that yields the best (a,b,c) or the worst (d,e,f) change localization performance in all 3 feature dimensions. Each pair of numbers (a%, b s) indicates mean accuracy a and reaction time b. Shown here is the average image of a flicker stimulus, with the disk of change taking two attributes, except in the case of color: Since the average has the same intensity as the background, the change is outlined in white instead. 
The commonly best layout has the change among uniform attributes, whereas the commonly worst layout has a mixture of attributes.\n\n[Figure 6 panels, with (accuracy, reaction time) listed for the size, color, and orientation renderings of each layout. Dimension-specific best layouts: a (100%, 2.34 s; 83%, 5.44 s; 83%, 6.47 s), b (100%, 3.08 s; 100%, 2.09 s; 80%, 2.35 s), c (94%, 3.37 s; 78%, 4.08 s; 94%, 2.69 s). Dimension-specific worst layouts: d (90%, 2.10 s; 100%, 2.65 s; 100%, 2.59 s), e (94%, 3.37 s; 78%, 4.08 s; 94%, 2.69 s), f (100%, 3.08 s; 100%, 2.09 s; 80%, 2.35 s)]\n\nFigure 6: The dimension-specific spatial layout that yields the best or worst change localization performance in one dimension only, with the largest performance gap over the other 2 dimensions. Same convention as Fig. 5. The 3 rows of numbers below each image indicate the mean accuracy and reaction time for a stimulus rendered in the same layout but in size, color, and orientation respectively. The localization of a flickering change is easier in a primarily large neighbourhood for size, in any homogeneous neighbourhood for color, and in a collinear neighbourhood for orientation.\n\nFig. 5 shows that a uniform neighbourhood tends to facilitate change localization, whereas a mixed neighbourhood tends to hinder change localization, no matter which dimension the disks are rendered in. Fig. 6 shows distinctions in the neighbourhood uniformity between the 3 dimensions.\n\nFor size, change localization is easier in a neighbourhood populated with large disks. If the dominant size is large (Fig. 6a), missing a large would be easier to detect, whereas if the dominant size is small (Fig. 6d), missing a small would be difficult to detect. 
That is, unlike color or orientation, the attributes of size are asymmetrical: small produces a smaller response than large, with size 0 for a response of 0 in the limiting case. When neither small nor large is dominant in the neighbourhood (Fig. 5d), change localization becomes most difficult. For color, change localization is easier if one color, either black or white, dominates the neighbourhood. For orientation, it is easier only if the oriented disk is part of a collinear layout.\n\n4 Feature Analysis with Eye Movements\n\nHaving seen that neighbourhood uniformity has an impact on the change localization performance, we investigate how it influences the decision on which item to look at in the next fixation.\n\nWe first associate a fixation with a set of f-numbers at that location, each measuring the overall attribute density in a neighbourhood defined by a Gaussian spatial weighting function. Let loc(i) denote the location of pixel i, dist(i, j) the distance between pixels i and j, and G(x; \u03c3) the 1D Gaussian function of x with mean 0 and standard deviation \u03c3. We have:\n\nf0(i) = 0 if there is no disk at loc(i), \u22121 if a type-1 disk is at loc(i), and 1 if a type-2 disk is at loc(i) (1)\n\nf\u03c3(i) = [\u03a3_j f0(j) G(dist(i, j); \u03c3)] / [\u03a3_j G(dist(i, j); \u03c3)] (2)\n\n[Figure 7: the f0, f1, f2, f4 images of a flicker stimulus]\n\nFigure 7: The f-number images of a flicker stimulus. A negative f number (in blue shades) indicates the dominance of attribute 1, whereas a positive f number (in red shades) indicates the dominance of attribute 2. The closer the f number is to 0 (in gray shades), the less either attribute dominates the neighbourhood. f\u03c3 measures the average attribute in a Gaussian neighbourhood with standard deviation \u03c3. The 3 circles on the target of change mark the \u03c3, 2\u03c3, 3\u03c3 radii. 
While \u03c3 = 1 covers only one disk in isolation, \u03c3 = 2 also covers the 8 adjacent disks, and \u03c3 = 4 covers 16 adjacent disks.\n\nAn f value of \u22121, 1, or 0 indicates the dominance of attribute 1, attribute 2, or neither, respectively. With an increasing \u03c3, f\u03c3 estimates the majority of attributes in a larger neighbourhood.\n\nEach location is now associated with a set of f numbers, (f1, f2, . . .), and they as a whole capture the attribute homogeneity surrounding that location. Fig. 7 shows f for the best spatial layout in Fig. 5. At \u03c3 = 1, the neighbourhood can only contain one disk, thus f1(i) = f0(i) for most locations i. At \u03c3 = 2, it also contains the 8 adjacent neighbours: f2(i) = f1(i) for i in a uniform neighbourhood, and f2(i) \u2248 0 for i in a mixed neighbourhood. At \u03c3 = 4, the neighbourhood is about half the size of the display, with f4(i) = 0 for i bordering two different large uniform neighbourhoods.\n\nFig. 8 shows the distributions of f associated with all the fixations. The two peaks of f1 in all the 3 feature dimensions demonstrate that visual search tends to fixate disks rather than empty spaces between disks. There is also an attribute bias in each dimension, and the bias is weakest in orientation and strongest in size. This bias is not diminished in f2, demonstrating that visual search tends to navigate in groups of large disks. The single peak of f4 at value 0 not only confirms the uniform randomness of our stimuli, but also reveals that the empty spaces being fixated tend to be those borders between different attribute neighbourhoods at a coarser scale (Fig. 
7 Column 4).\n\n[Figure 8: probability distributions of f1, f2, f4 over [\u22121, 1] for size, color, and orientation]\n\nFigure 8: The probability distribution of f1, f2, f4 (in increasing line widths) associated with all the fixations shows a strong preference in size for large disks as well as areas of large disks (+1), a small preference in color for black disks (\u22121), and a slight preference in orientation for vertical disks (+1). The single peak of f4 at 0 reveals most fixations occurring near those disks separating large groups of uniform attributes. These statistics are robust with respect to subject sub-sampling validation, e.g. over 8 samplings of 10 subjects only, the maximal standard error is 0.006.\n\n[Figure 9: transition images P1, P2, P4 for size, color, and orientation, at saccade distances d \u2264 2, d \u2208 [3, 9], and d \u2265 10]\n\nFigure 9: The probability distribution of f1, f2, f4 associated with all the saccades shows a preference of jumping to a disk of the same attribute regardless of saccade distance d and neighbourhood size \u03c3. Each transition P(a, b; d, \u03c3) given d and \u03c3 is visualized as a 2D image, e.g. for size, the lower right corner of P1 is the frequency of saccading from large to large. A darker gray indicates a larger transitional probability. As d increases, it is more likely to jump to a different attribute, and the chance is more uniformly random in orientation.\n\nFig. 9 shows the joint distributions of two f numbers associated with the initiating fixations and the landing fixations of all the saccades, organized according to the saccade distance. 
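As an implementation note, the f-number descriptor of Eqs. (1)-(2) amounts to a Gaussian-weighted mean of the attribute map. A minimal NumPy sketch on a toy layout; the unit grid spacing, single-pixel "disks", and the particular sigma values are our simplifying assumptions, not the paper's pixel-level setup:

```python
import numpy as np

def f_numbers(layout, sigma):
    """layout: 2D array with 0 (no disk), -1 (attribute 1), +1 (attribute 2).
    Returns f_sigma: the Gaussian-weighted mean attribute at every location (Eq. 2)."""
    rows, cols = layout.shape
    ys, xs = np.mgrid[0:rows, 0:cols]
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
    # pairwise distances dist(i, j) between all locations
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    w = np.exp(-d**2 / (2 * sigma**2))   # unnormalized Gaussian G(dist; sigma)
    f0 = layout.ravel().astype(float)    # Eq. (1): the raw attribute map
    f_sigma = (w @ f0) / w.sum(axis=1)   # Eq. (2): normalized weighted average
    return f_sigma.reshape(rows, cols)

# a toy 4 x 6 layout with 12 attribute-1 and 12 attribute-2 disks
rng = np.random.default_rng(0)
layout = rng.permutation([-1] * 12 + [1] * 12).reshape(4, 6)

f1 = f_numbers(layout, sigma=0.5)  # tight neighbourhood: tracks the fixated disk
f2 = f_numbers(layout, sigma=1.0)  # wider neighbourhood: tracks the local majority
```

With a small sigma, each f value keeps the sign of the disk underneath it; with a larger sigma, f drifts toward 0 wherever the two attributes are mixed, which is exactly the homogeneity signal the descriptor is meant to capture.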
For a saccade from pixel i to pixel j, it contributes one count of transition from a to b in the f-space:\n\nP\u03c3(a, b|d) = Prob(f\u03c3(i) = a, f\u03c3(j) = b, a saccade from loc(i) to loc(j) | dist(i, j) = d) (3)\n\nFig. 9 shows that all the transitions within 2\u25e6 tend to cluster tightly along the diagonals, i.e. between the same attributes. At such a short distance, a saccade could not reach a different disk. These transitions are thus between the same disks or between the same inter-disk empty spaces, by e.g. micro-saccades. The bias towards a particular attribute is also clear in each dimension: There are more transitions between larges than between smalls, more between blacks than between whites, and about the same between horizontals and between verticals. As the saccade distance increases, disks of various attributes become viable candidates to saccade to. It becomes more likely to saccade to another disk of the same or a different attribute than to saccade to an empty space (i.e. low probabilities in the middle rows or columns of the P\u03c3 images).\n\nWe further examine P\u03c3(a, b|d) at \u03c3 = 1 and d in the middle range of [2\u25e6, 10\u25e6], associated with saccades towards adjacent disks. We quantize f into 3 values based on a threshold \u03b8: \u22121 if f < \u2212\u03b8, 0 if |f| < \u03b8, and 1 if f > \u03b8. The joint probability can be decomposed into the marginal probability \u03c0(a) of the initiating attribute a and the conditional probability P(b|a) of the landing attribute b:\n\nP(a, b) = \u03c0(a) \u00d7 P(b|a) = (\u03a3_c P(a, c)) \u00d7 (P(a, b) / \u03a3_c P(a, c)) (4)\n\nWhile \u03c0(a) measures the proportion of fixations at attribute a among all the fixations, P(b|a) measures the proportion of saccades towards b given the current fixation at a. Consistent with Fig. 8, \u03c0(a) in Fig. 
10 shows more visits to large, black, and vertical than to small, white, and horizontal.\n\nThe most interesting finding comes from P(b|a): While the attributes are uniformly random in the neighbourhood, our eyes do not act like a blind space wanderer. 1) For size, it is much more likely to visit large no matter what is being looked at in the current fixation, i.e., attribute large is an attractor in the f space. 2) For color, it is more likely to visit black from either black or white, but not from an empty space, i.e., if no disk is in fixation, it is more likely to visit white in the next fixation. Unlike size, white is not an attractor, but a repeller: Once at white, the eyes are more inclined to leave for black than to stay in the group of whites. 3) For orientation, it is only slightly more likely to visit vertical than horizontal. When the eyes are on an empty space, it is in fact equally likely to visit horizontal or vertical in the next fixation, i.e., there is no strong attractor or repeller in orientation, and the two attributes are largely reversible. Such biases also persist over larger saccades.\n\n[Figure 10 table, rows and columns ordered as attribute 1, empty, attribute 2:\nsize: \u03c0(a) = (.43, .04, .53), P(b|a) = [.40 .10 .50; .36 .15 .49; .37 .12 .51]\ncolor: \u03c0(a) = (.52, .04, .44), P(b|a) = [.53 .09 .38; .33 .16 .51; .48 .09 .43]\norientation: \u03c0(a) = (.46, .04, .50), P(b|a) = [.44 .08 .48; .44 .12 .44; .43 .10 .47]]\n\nFigure 10: The probability distribution of f1 for all the saccades within [2\u25e6, 10\u25e6]. f1 is quantized into \u22121, 0, 1, corresponding to attribute 1, empty space, and attribute 2 respectively, based on threshold \u03b8 = 0.15. These statistics are validated over 13 leave-50%-subjects-out samplings, with the standard error for each number less than 0.01 except for the second row of P(b|a) (valued at 0.02, 0.01, 0.02 instead). Let a and b denote attributes, or row and column indices into the transition table. 
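The attractor/repeller/reversible reading of these transition tables can also be checked numerically with a detailed-balance computation: for a reversible chain, the net probability flux pi(a)P(b|a) - pi(b)P(a|b) between the two attributes is near zero. This flux criterion is our own illustration, not a computation from the paper; the numbers are those reported in Figure 10:

```python
import numpy as np

# Figure 10 statistics, rows/columns ordered as attribute 1, empty, attribute 2.
tables = {
    "size":        (np.array([.43, .04, .53]),
                    np.array([[.40, .10, .50], [.36, .15, .49], [.37, .12, .51]])),
    "color":       (np.array([.52, .04, .44]),
                    np.array([[.53, .09, .38], [.33, .16, .51], [.48, .09, .43]])),
    "orientation": (np.array([.46, .04, .50]),
                    np.array([[.44, .08, .48], [.44, .12, .44], [.43, .10, .47]])),
}

def net_flux(pi, P, a=0, b=2):
    # probability mass flowing from attribute 1 to attribute 2 per saccade,
    # minus the reverse flow; near zero means detailed balance (reversibility)
    return pi[a] * P[a, b] - pi[b] * P[b, a]

flux = {dim: net_flux(pi, P) for dim, (pi, P) in tables.items()}
```

The signs line up with the paper's characterization: a positive flux for size (net drift toward large, an attractor), a negative flux for color (net drift out of white back to black, a repeller), and a near-zero flux for orientation (largely reversible).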
\u03c0(a) is the overall probability of looking at a. P(b|a) is the probability of saccading to b given a fixation at a. For example, for size, \u03c0 shows that 43% of all the fixations look at small, 4% at empty, and 53% at large, whereas the 3rd row of P shows that upon fixating at large, there is a 51% chance of saccading to another large, a 37% chance to a small disk, and a 12% chance to an empty space. The most likely action is highlighted in red. Search in size tends to be attracted to large, search in color tends to be repelled by white, whereas search in orientation is largely reversible between horizontal and vertical.\n\nThese results cannot be explained by visual crowding, where the perception of peripherally viewed shapes is impaired by nearby similar shapes [7]. While critical spacing is always roughly half the viewing eccentricity and independent of stimulus size, crowding magnitude differs across features: Size crowding is almost as strong as orientation crowding, whereas the effect is much weaker for color [15]. Therefore, feature crowding cannot explain the different natures of feature transitions for size, color, and orientation, or why such biases persist over larger saccades.\n\n5 Summary\n\nSize, color, and orientation are considered elementary features extracted with separate sets of detectors responding in parallel across space. They are modeled by the same computational mechanism, differing only in the filters that implement their local attribute detectors.\n\nWe conducted a gaze-tracked change blindness experiment, where the subject needs to locate a flickering change among items rendered identically in space and separately in size, color, and orientation. 
If the deployment of attention during search depends only on the master spatial map of responses [14, 13, 3, 17, 12], regardless of which type of filters produces them, we should observe little difference in the search performance and behaviours among the 3 dimensions.\n\nOur search performance analysis shows that change localization is fastest and most accurate in size, less so in orientation, and worst in color. Change in a uniform neighbourhood is easier to localize, but only if the attribute is large for size, or if the items form a collinear extension for orientation.\n\nOur feature analysis with eye movements shows that search in each dimension has an attribute bias: large for size, black for color, and vertical for orientation, and a common spatial bias on border items separating large uniform groups. However, feature transitions with saccades have a strong attractor bias for large, a repeller bias for white, and very little bias for orientation.\n\nThese biases create interesting dynamics in serial processing over time which could explain why localization is most effective in size and worst in color. It is not due to their alike local detectors in space, but due to their own selectivity in grouping [8, 19, 1] over time: Focusing on the large group essentially cuts down the search space by half, whereas excursion into the white group from the primary black group only hurts the spatial efficiency of search.\n\nOur results and analysis methods on these elementary features thus provide new insights into the computation of visual saliency and task-specific visual features across dimensions and over time.\n\nAcknowledgements\n\nThis research is funded by NSF CAREER IIS-0644204 and a Clare Boothe Luce Professorship. I would like to thank Dimitri Lisin, Marcus Woods, Sebastian Skardal, Peter Sempolinski, David Tolioupov, and Kyle Tierney for earlier discussions and excellent assistance with the experiments. 
I am grateful for many insightful comments I have received from Jeremy Wolfe, Ronald Rensink, and anonymous reviewers; their valuable suggestions have greatly improved the paper.\n\nReferences\n\n[1] G. Fuggetta, S. Lanfranchi, and G. Campana. Attention has memory: priming for the size of the attentional focus. Spatial Vision, 22(2):147\u201359, 2009.\n\n[2] J. Grimes. On the failure to detect changes in scenes across saccades. 2, 1996.\n\n[3] L. Itti and C. Koch. Computational modelling of visual attention. Nature Reviews Neuroscience, 2:194\u2013203, 2001.\n\n[4] D. G. Lowe. Distinctive image features from scale-invariant keypoints. 2003.\n\n[5] J. Malik, S. Belongie, T. Leung, and J. Shi. Contour and texture analysis for image segmentation. International Journal of Computer Vision, 2001.\n\n[6] J. H. R. Maunsell and W. T. Newsome. Visual processing in monkey extrastriate cortex. Annual Review of Neuroscience, 10:363\u2013401, 1987.\n\n[7] D. G. Pelli, M. Palomares, and N. J. Majaj. Crowding is unlike ordinary masking: distinguishing feature integration from detection. Journal of Vision, 4(12):1136\u201369, 2004.\n\n[8] R. Rensink. Visual search for change: A probe into the nature of attentional processing. Visual Cognition, 7:345\u201376, 2000.\n\n[9] R. A. Rensink, J. K. O\u2019Regan, and J. J. Clark. Image flicker is as good as saccades in making large scene changes invisible. 24, pages 26\u20138, 1995.\n\n[10] M. Riesenhuber and T. Poggio. Hierarchical models of object recognition in cortex. Nature Neuroscience, 2(11):1019\u201325, 1999.\n\n[11] T. Serre, A. Oliva, and T. Poggio. A feedforward architecture accounts for rapid categorization. Proceedings of the National Academy of Sciences, 104(15):6424\u20139, 2007.\n\n[12] A. Torralba. Contextual influences on saliency. In L. Itti, G. Rees, and J. Tsotsos, editors, Neurobiology of Attention, pages 586\u201393. Academic Press, 2004.\n\n[13] A. Treisman. 
The perception of features and objects. In R. D. Wright, editor, Visual Attention. Oxford University Press, 1998.\n\n[14] A. Treisman and G. Gelade. A feature-integration theory of attention. Cognitive Psychology, 12(1):97\u2013136, 1980.\n\n[15] R. van den Berg, J. B. T. M. Roerdink, and F. W. Cornelissen. On the generality of crowding: visual crowding in size, saturation, and hue compared to orientation. Journal of Vision, 7(2):1\u201311, 2007.\n\n[16] J. M. Wolfe. Asymmetries in visual search: an introduction. Perception and Psychophysics, 63:381\u20139, 2001.\n\n[17] J. M. Wolfe and T. S. Horowitz. What attributes guide the deployment of visual attention and how do they do it? Nature Reviews Neuroscience, 5, 2004.\n\n[18] J. M. Wolfe, A. Reinecke, and P. Brawn. Why don\u2019t we see changes? The role of attentional bottlenecks and limited visual memory. Visual Cognition, 19(4-8):749\u201380, 2006.\n\n[19] Y. Yeshurun and M. Carrasco. The effects of transient attention on spatial resolution and the size of the attentional cue. Perception and Psychophysics, 70(1):104\u201313, 2008.\n\n[20] H. Zhang, A. C. Berg, M. Maire, and J. Malik. SVM-KNN: Discriminative nearest neighbor classification for visual category recognition. In IEEE Conference on Computer Vision and Pattern Recognition, pages 2126\u201336, 2006.\n", "award": [], "sourceid": 277, "authors": [{"given_name": "Stella", "family_name": "Yu", "institution": null}]}