{"title": "A unified model of short-range and long-range motion perception", "book": "Advances in Neural Information Processing Systems", "page_first": 2478, "page_last": 2486, "abstract": "The human vision system is able to effortlessly perceive both short-range and long-range motion patterns in complex dynamic scenes. Previous work has assumed that two different mechanisms are involved in processing these two types of motion. In this paper, we propose a hierarchical model as a unified framework for modeling both short-range and long-range motion perception. Our model consists of two key components: a data likelihood that proposes multiple motion hypotheses using nonlinear matching, and a hierarchical prior that imposes slowness and spatial smoothness constraints on the motion field at multiple scales. We tested our model on two types of stimuli, random dot kinematograms and multiple-aperture stimuli, both commonly used in human vision research. We demonstrate that the hierarchical model adequately accounts for human performance in psychophysical experiments.", "full_text": "A uni\ufb01ed model of short-range and long-range\n\nmotion perception\n\nShuang Wu\n\nUCLA\n\nXuming He\n\nUCLA\n\nHongjing Lu\n\nUCLA\n\nDepartment of Statistics\n\nDepartment of Statistics\n\nDepartment of Psychology\n\nLos Angeles , CA 90095\n\nshuangw@stat.ucla.edu\n\nLos Angeles , CA 90095\nhexm@stat.ucla.edu\n\nLos Angeles , CA 90095\nhongjing@ucla.edu\n\nDepartment of Statistics, Psychology, and Computer Science\n\nAlan Yuille\n\nUCLA\n\nLos Angeles , CA 90095\n\nyuille@stat.ucla.edu\n\nAbstract\n\nThe human vision system is able to effortlessly perceive both short-range and\nlong-range motion patterns in complex dynamic scenes. Previous work has as-\nsumed that two different mechanisms are involved in processing these two types of\nmotion. In this paper, we propose a hierarchical model as a uni\ufb01ed framework for\nmodeling both short-range and long-range motion perception. 
Our model consists of two key components: a data likelihood that proposes multiple motion hypotheses using nonlinear matching, and a hierarchical prior that imposes slowness and spatial smoothness constraints on the motion field at multiple scales. We tested our model on two types of stimuli, random dot kinematograms and multiple-aperture stimuli, both commonly used in human vision research. We demonstrate that the hierarchical model adequately accounts for human performance in psychophysical experiments.\n\n1 Introduction\n\nWe encounter complex dynamic scenes in everyday life. As illustrated by the motion sequence depicted in Figure 1, humans readily perceive the baseball player\u2019s body movements and the faster-moving baseball simultaneously. However, from the computational perspective, this is not a trivial problem to solve. The difficulty is due to the large speed difference between the two objects, i.e., the displacement of the player\u2019s body is much smaller than the displacement of the baseball between the two frames. Separate motion systems have been proposed to explain human perception in scenarios like this example. In particular, Braddick [1] proposed that there is a short-range motion system which is responsible for perceiving movements with relatively small displacements (e.g., the player\u2019s movement), and a long-range motion system which perceives motion with large displacements (e.g., the flying baseball), sometimes called apparent motion. Lu and Sperling [2] have further argued for the existence of three motion systems in human vision. The first- and second-order systems conduct motion analysis on luminance and texture information respectively, while the third-order system uses a feature-tracking strategy. 
In the baseball example, the first-order motion system would be used to perceive the player\u2019s movements, but the third-order system would be required for perceiving the faster motion of the baseball. Short-range motion and first-order motion appear to apply to the same class of phenomena, and can be modeled using computational theories that are based on motion energy or related techniques. However, long-range motion and third-order motion employ qualitatively different computational strategies involving tracking features over time, which may require attention-driven processes.\n\nFigure 1: Left panel: Short-range and long-range motion: two frames from a baseball sequence where the ball moves at a much faster speed than the other objects. Right panel: A graphical illustration of our hierarchical model in one dimension. Each node represents motion at a different location and scale. A child node can have multiple parents, and the prior constraints on motion are expressed by parent-child interactions.\n\nIn contrast to these previous multi-system theories [2, 3], we develop a unified single-system framework to account for these phenomena of human motion perception. We model motion estimation as an inference problem which uses flexible prior assumptions about motion flows and statistical models for quantifying the uncertainty in motion measurement. Our model differs from the traditional approaches in two respects. First, the prior model is defined over a hierarchical graph (see Figure 1), where the nodes of the graph represent the motion at different scales. This hierarchical structure is motivated by the human visual system, which is organized hierarchically [8, 9, 4]. 
Such a representation makes it possible to define motion priors and contextual effects at a range of different scales, and so differs from other models of motion perception based on motion priors [5, 6]. This model connects lower-level nodes to multiple coarser-level nodes, resulting in a loopy graph structure, which imposes a more flexible prior than tree-structured models (e.g., [7]). We define a probability distribution on this graph using potentials defined over the graph cliques to capture spatial smoothness constraints [10] at different scales and slowness constraints [5, 11, 12, 13]. Second, our data likelihood terms allow a large space of possible motions, which includes both short-range and long-range motion. Locally, the motion is often highly ambiguous (e.g., the likelihood term allows many possible motions); this ambiguity is resolved in our model by imposing the hierarchical motion prior. Note that we do not coarsen the image and do not rely on coarse-to-fine processing [14]. Instead we use a bottom-up compositional/hierarchical approach where local hypotheses about the motion are combined to form hypotheses for larger regions of the image. This enables us to deal simultaneously with both long-range and short-range motion.\n\nWe tested our model using two types of stimuli commonly used in human vision research. The first stimulus type is random dot kinematograms (RDKs), in which some of the dots (the signal) move coherently with large displacements, whereas other dots (the noise) move randomly. RDKs are one of the most important stimuli used in both physiological and psychophysical studies of motion perception. For example, electrophysiological studies have used RDKs to analyze the neuronal basis of motion perception, identifying a functional link between the activity of motion-selective neurons and behavioral judgments of motion perception [15]. 
Psychophysical studies have used RDKs to measure the sensitivity of the human visual system for perceiving coherent motion, and also to infer how motion information is integrated to perceive global motion under different viewing conditions [16]. We used two-frame RDKs as an example of a long-range motion stimulus. The second stimulus type is moving gratings or plaids. These stimuli have been used to study many perceptual phenomena. For example, when randomly oriented lines or grating elements drift behind apertures, the perceived direction of motion is heavily biased by the orientation of the lines/gratings, as well as by the shape and contrast of the apertures [17, 18, 19]. Multiple-aperture stimuli have also recently been used to study coherent motion perception with short-range motion stimuli [20, 21]. For both types of stimuli we compared the model predictions with human performance across various experimental conditions.\n\n2 Hierarchical Model for Motion Estimation\n\nOur hierarchical model represents a motion field using a graph G = (V, E), which has L + 1 hierarchical levels, i.e., V = \u03bd^0 \u222a ... \u222a \u03bd^l \u222a ... \u222a \u03bd^L. Level l has a set of nodes \u03bd^l = {\u03bd^l(i, j), i = 1, ..., M_l, j = 1, ..., N_l}, forming a 2D lattice indexed by (i, j). More specifically, we start from the pixel lattice and construct the hierarchy as follows.\n\nThe nodes {\u03bd^0(i, j)} at the 0th level correspond to the pixel positions {x | x = (i, j)} of the image lattice. We recursively add higher levels with nodes \u03bd^l (l = 1, ..., L). The level-l lattice decreases by a factor of 2 along each coordinate direction from level l \u2212 1. The edges E of the graph connect nodes at each level of the hierarchy to nodes in the neighboring levels. 
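The lattice construction just described, together with the child-node rule made precise in the next sentence (an overlap parameter d controls how widely neighboring parents share children), can be sketched in Python. This is our own illustrative code, not the authors' implementation; the function name and boundary handling are our assumptions:

```python
def build_hierarchy(M0, N0, L, d=2):
    """Construct the multiscale lattice: level 0 is the M0 x N0 pixel
    lattice, and each higher level halves the lattice along each axis.
    Node (i, j) at level l has children (i', j') at level l-1 with
    2i - d <= i' <= 2i + d (and likewise for j), clipped to the lattice,
    so neighboring parents share children and the graph contains loops.
    Returns the per-level sizes and a dict (l, i, j) -> child list."""
    sizes = [(M0, N0)]
    for _ in range(L):
        M, N = sizes[-1]
        sizes.append((max(1, M // 2), max(1, N // 2)))
    children = {}
    for l in range(1, L + 1):
        M, N = sizes[l]
        Mc, Nc = sizes[l - 1]
        for i in range(M):
            for j in range(N):
                children[(l, i, j)] = [
                    (ic, jc)
                    for ic in range(max(0, 2 * i - d), min(Mc, 2 * i + d + 1))
                    for jc in range(max(0, 2 * j - d), min(Nc, 2 * j + d + 1))
                ]
    return sizes, children
```

With d = 2 an interior parent has a 5 x 5 block of children, and adjacent parents share columns of that block, which is the source of the closed loops noted below.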
Specifically, edges connect node \u03bd^l(i, j) at level l to a set of child nodes Ch_l(i, j) = {\u03bd^{l-1}(i', j')} at level l \u2212 1 satisfying 2i \u2212 d \u2264 i' \u2264 2i + d, 2j \u2212 d \u2264 j' \u2264 2j + d. Here d is a parameter controlling how many neighboring nodes in a level share child nodes. Figure 1 illustrates the graph structure of this hierarchical model in the 1-D case with d = 2. Note that our graph G contains closed loops due to the sharing of child nodes.\n\nTo apply the model to motion estimation, we define a state variable u^l(i, j) at each node to represent the motion, and connect the 0th-level nodes to two consecutive image frames, D = (I_t(x), I_{t+1}(x)). The problem of motion estimation is to estimate the 2D motion field u(x) at time t for every pixel site x from the input D. For simplicity, we use u^l_i to denote the motion instead of u^l(i, j) in the following sections.\n\n2.1 Model formulation\n\nWe define a probability distribution over the motion field U = {u^l}_{l=0}^{L}, with u^l = {u^l_i}, on the graph G conditioned on the input image pair D:\n\nP(U | D) = (1/Z) exp(\u2212[E_d(D, u^0) + \u2211_{l=0}^{L\u22121} E^l_u(u^l, u^{l+1})])   (1)\n\nwhere E_d is the data term for the motion based on local image cues and the E^l_u are hierarchical priors on the motion which impose slowness and smoothness constraints at different levels. The energy terms E_d, {E^l_u} are defined using L1 norms to encourage robustness [22]. This robust norm helps deal with the measurement noise that often occurs at motion boundaries and prevents over-smoothing at the higher levels. The details of the two energy function terms are described as follows:\n1) The Data Term E_d\nThe data energy term is defined only at the bottom level of the hierarchy. It is specified in terms of the L1 norm between local image intensity values from adjacent frames. 
More precisely:\n\nE_d(D, u^0) = \u2211_i (||I_t(x_i) \u2212 I_{t+1}(x_i + u^0_i)||_{L1} + \u03b1 ||u^0_i||_{L1})   (2)\n\nwhere the first term defines a difference measure between two measurements centered at x_i in I_t and at x_i + u^0_i in I_{t+1}, respectively. We choose to use pixel values only here. The second term imposes a slowness prior on the motion, weighted by the coefficient \u03b1. Note that the first term is a matching term that computes the similarity between I_t(x) and I_{t+1}(x + u) given any displacement u. These similarity scores at x give confidence values for different local motion hypotheses: higher similarity means the motion is more likely, while lower similarity means it is less likely.\n2) The Hierarchical Prior {E^l_u}\nWe define a hierarchical prior on the slowness and spatial smoothness of motion fields. The first term of this prior is expressed by energy terms between nodes at different levels of the hierarchy and enforces a smoothness preference for their states u \u2013 that the motion of a child node is similar to the motion of its parent. We use the robust L1 norm in the energy terms so that violations of this consistency constraint are penalized moderately. This imposes weak smoothness on the motion field and allows abrupt changes at motion boundaries. The second term is an L1 norm of motion velocities that encourages slowness.\n\nFigure 2: An illustration of our inference procedure. Left top panel: the original hierarchical graph with loops. Left bottom panel: the bottom-up process proceeds on a tree graph with multiple copies of nodes (connected by solid lines), which relaxes the problem. The top-down process enforces the consistency constraints between copies of each node (denoted by dashed line connections). Right panel: An example of the inference procedure on two street scene frames. We show the estimates from minimizing \u02dcE(U) (bottom-up) and E(U) (top-down). 
The motions are color-coded and also displayed as arrows.\n\nTo be specific, the energy function E^l_u(u^l, u^{l+1}) is defined to be:\n\nE^l_u(u^l, u^{l+1}) = \u03b2(l) \u2211_{i \u2208 \u03bd^{l+1}} (\u2211_{j \u2208 Ch_{l+1}(i)} ||u^{l+1}_i \u2212 u^l_j||_{L1} + \u03b3 ||u^{l+1}_i||_{L1}),   (3)\n\nwhere \u03b2(l) is the weight parameter for the energy terms at the lth level and \u03b3 controls the relative weight of the slowness prior. Note that our hierarchical smoothness prior differs from conventional smoothness constraints, e.g., [10], because they impose smoothness \u2019sideways\u2019 between neighboring pixels at the same resolution level, which requires only that the motion is similar between neighboring sites at the pixel level. Imposing longer-range interactions sideways becomes problematic as it leads to Markov Random Field (MRF) models with a large number of edges. This structure makes it difficult to do inference using standard techniques like belief propagation and max-flow/min-cut. By contrast, we impose smoothness by requiring that child nodes have similar motions to their parent nodes. This \u2019hierarchical\u2019 formulation enables us to impose smoothness interactions at different hierarchy levels while inference can be done efficiently by exploiting the hierarchy.\n\n2.2 Motion Estimation\n\nWe estimate the motion field by computing the most probable motion \u02c6U = arg max_U P(U | D), where P(U | D) was defined as a Gibbs distribution in equation (1). Performing inference on this model is challenging since the energy is defined over a hierarchical graph structure with many closed loops, the state variables U are continuous-valued, and the energy function is non-convex.\n\nOur strategy is to convert this into a discrete optimization problem by quantizing the motion state space. 
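As a concrete illustration, a minimal Python sketch of an integer-quantized candidate set and the per-pixel data energy of equation (2) might look like this (our own sketch; the function names, the candidate range, and the weight alpha = 0.1 are assumptions, not values from the paper):

```python
import numpy as np

def candidate_motions(max_disp):
    """Integer-valued motion hypotheses u = (di, dj) with components in
    [-max_disp, max_disp]; max_disp is an arbitrary choice of ours."""
    r = range(-max_disp, max_disp + 1)
    return [(di, dj) for di in r for dj in r]

def data_energy(It, It1, x, u, alpha=0.1):
    """E_d for one pixel x and one hypothesis u (cf. Eq. (2)): an L1
    matching cost |I_t(x) - I_{t+1}(x + u)| plus the L1 slowness
    penalty alpha * ||u||_1. alpha = 0.1 is a made-up weight."""
    (i, j), (di, dj) = x, u
    ii, jj = i + di, j + dj
    H, W = It1.shape
    if not (0 <= ii < H and 0 <= jj < W):
        return np.inf                      # hypothesis leaves the image
    match = abs(float(It[i, j]) - float(It1[ii, jj]))
    return match + alpha * (abs(di) + abs(dj))
```

Taking arg min of data_energy over candidate_motions at each pixel gives the purely local (and typically ambiguous) estimate that the hierarchical prior then regularizes.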
For example, we estimate the motion at an integer-valued resolution if that accuracy is sufficient for certain experimental settings. Given a discrete state space, our algorithm involves bottom-up and top-down processing and is sketched in Figure 2. The algorithm is designed to be parallelizable and to only require computations between neighboring nodes. This is desirable for biological plausibility but also has the practical advantage that we can implement the algorithm using GPU-type architectures, which enables fast convergence. We describe our inference algorithm in detail as follows.\ni) Bottom-up Pass. We first approximate the hierarchical graph with a tree-structured model by making multiple copies of child nodes such that each child node has a single parent (see [23]). This enables us to perform exact inference on the relaxed model using dynamic programming. More specifically, we compute an approximate energy function \u02dcE(U) recursively by exploiting the tree structure:\n\n\u02dcE(u^{l+1}_i) = \u2211_{j \u2208 Ch_{l+1}(i)} min_{u^l_j} [E^l_u(u^{l+1}_i, u^l_j) + \u02dcE(u^l_j)]\n\nwhere \u02dcE(u^0_j) at the bottom level is the data energy E_d(u^0_j; D). At the top level L we compute the states (\u02c6u^L_i) which minimize \u02dcE(u^L_i).\n\nii) Top-down Pass. Given the top-level motion (\u02c6u^L_i), we then compute the optimal motion configuration for the other levels using the following top-down procedure. The top-down pass enforces the consistency constraints, relaxed earlier in the recursively computed energy function \u02dcE, so that all copies of each node have the same optimal state. We minimize the following energy function recursively for each node:\n\n\u02c6u^l_j = arg min_{u^l_j} [\u2211_{i \u2208 Pa_l(j)} E^l_u(\u02c6u^{l+1}_i, u^l_j) + \u02dcE(u^l_j)]\n\nwhere Pa_l(j) is the set of parents of level-l node j. 
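The two recursions above can be illustrated with a toy 1-D implementation (our own sketch, not the authors' code). For simplicity each node here has a single parent, as in the tree-relaxed bottom-up graph, so the top-down sum over parents reduces to one term; the pairwise energy folds the slowness penalty into each parent-child pair, and the state set and weights are made-up values:

```python
import numpy as np

# Quantized 1-D motion states (toy example: integer displacements).
STATES = np.arange(-4, 5)

def pair_E(up, uc, beta=1.0, gamma=0.1):
    """Pairwise energy between parent state `up` and child state `uc`:
    a weak-smoothness L1 term plus an L1 slowness term (cf. Eq. (3);
    beta and gamma are arbitrary weights, and folding slowness into
    each pair is a simplification of ours)."""
    return beta * abs(up - uc) + gamma * abs(uc)

def two_pass(children, data_E):
    """Toy bottom-up/top-down inference on a tree-structured hierarchy.
    children[l][i] lists the child indices (at level l) of node i at
    level l+1; data_E[j, k] is E_d of bottom node j in state STATES[k].
    Returns the estimated state of every node at every level."""
    K = len(STATES)
    Et = [np.asarray(data_E, dtype=float)]   # ~E tables, level by level
    # Bottom-up: ~E(u_i^{l+1}) = sum_j min_{u_j^l} [E_u + ~E(u_j^l)]
    for ch in children:
        tab = np.zeros((len(ch), K))
        for i, cs in enumerate(ch):
            for k, up in enumerate(STATES):
                tab[i, k] = sum(min(pair_E(up, STATES[m]) + Et[-1][j, m]
                                    for m in range(K)) for j in cs)
        Et.append(tab)
    # Top-down: fix the top level, then each node picks
    # argmin_{u_j} [E_u(parent estimate, u_j) + ~E(u_j)].
    est = [None] * len(Et)
    est[-1] = [STATES[int(np.argmin(Et[-1][i]))] for i in range(len(Et[-1]))]
    for l in range(len(children) - 1, -1, -1):
        est[l] = [0] * len(Et[l])
        for i, cs in enumerate(children[l]):
            up = est[l + 1][i]
            for j in cs:
                est[l][j] = STATES[int(np.argmin(
                    [pair_E(up, STATES[m]) + Et[l][j, m] for m in range(K)]))]
    return est
```

On data terms that all favor one displacement, the top level recovers it and the top-down pass propagates it to every node, which is the "context" effect described next.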
In the top-down pass, spatial smoothness is imposed on the motion estimates at higher levels, which provide contextual information to disambiguate the motion estimated at lower levels.\n\nThe intuition for this two-pass inference algorithm is that the motion estimates of the lower-level nodes are typically more ambiguous than those of the higher-level nodes, because the higher levels are able to integrate information from a larger number of nodes at lower levels (although some information is lost due to the coarse representation of the motion field). Hence the estimates from the higher-level nodes are usually less noisy and can be used to give \u201ccontext\u201d to resolve the ambiguities of the lower-level nodes. From another perspective, this can be thought of as a message-passing type algorithm which uses a specific scheduling scheme [24].\n\n3 Experiments with random dot kinematograms\n\n3.1 The stimuli and simulation procedures\n\nRandom dot kinematogram (RDK) stimuli consist of two image frames with N dots in each frame [1, 16, 6]. As shown in figure (3), the dots in the first frame are located at random positions. A proportion CN of the dots (the signal dots) are moved coherently to the second frame with a translational motion. The remaining (1 \u2212 C)N dots (the noise dots) are moved to random positions in the second frame. The displacements of the signal dots between the two frames are large. As a result, two-frame RDK stimuli are typically considered an example of long-range motion. The difficulty of perceiving coherent motion in RDK stimuli is due to the large correspondence uncertainty introduced by the noise dots, as shown in the rightmost panel of figure (3).\n\nFigure 3: The left three panels show coherent stimuli with N = 20 and C = 0.1, 0.5, and 1.0, respectively. The closed and open circles denote dots in the first and second frame, respectively. 
The arrows show the motion of those dots which are moving coherently. Correspondence noise is illustrated by the rightmost panel, showing that a dot in the first frame has many candidate matches in the second frame.\n\nBarlow and Tripathy [16] used RDK stimuli to investigate how dot density affects human performance in a global motion discrimination task. They found that human performance (measured by the coherence threshold) varies little with dot density. We tested our model on the same task of judging the global motion direction, using the RDK motion stimulus as the input image. We applied our model to estimate motion fields and used the average velocity to indicate the global motion direction (to the left or to the right). We ran 500 trials for each coherence ratio condition. The dot number varied over N = 40, 80, 100, 200, 400, 800, corresponding to a wide range of dot densities. The model performance was computed for each coherence ratio to fit psychometric functions and to find the coherence threshold at which model performance reaches 75% accuracy.\n\nFigure 4: Estimated motion fields for random dot kinematograms. First row: 50 dots in the RDK stimulus; second row: 100 dots in the RDK stimulus. Column-wise, coherence ratio C = 0.0, 0.3, 0.6, 0.9, respectively. The arrows indicate the motion estimated for each dot.\n\n3.2 The Results\n\nFigure (4) shows examples of the estimated motion field for various values of dot number N and coherence ratio C. The model outputs provide visually coherent motion estimates when the coherence ratio was greater than 0.3, which is consistent with human perception. As the coherence ratio increases, the estimated motion flow appears more coherent.\n\nTo further compare with human performance [16], we examined whether model performance is affected by dot density in the RDK display. 
The right plot in figure (5) shows the model performance as a function of the coherence ratio. The coherence threshold, using the criterion of 75% accuracy, showed that model performance varied little with the increase of dot density, which is consistent with human performance reported in psychophysical experiments [16, 6].\n\n4 Experiments with multi-aperture stimuli\n\n4.1 The two types of stimulus\n\nThe multiple-aperture stimulus consisted of a dense set of spatially isolated elements. Two types of elements were used in our simulations: (i) drifting sine-wave gratings with random orientations, and (ii) plaids, which combine two gratings with orthogonal orientations. Each element was displayed through a stationary Gaussian window. Figure (6) shows examples of these two types of stimuli. The grating elements are of the form P_i(~x, t) = G(~x \u2212 ~x_i, \u03a3) F(~x \u2212 ~x_i \u2212 ~v_i t), where ~x_i denotes the center of the element and F(.) represents a grating, F(x, y) = sin(f x sin(\u03b8_i) + f y cos(\u03b8_i)), where f is the fixed spatial frequency and \u03b8_i is the orientation of the grating.\n\nThe grating stimulus is I(~x, t) = \u2211_{i=1}^{N} P_i(~x, t), where N is the number of elements (which is kept constant). For the CN signal gratings, the motion ~v_i was set to a fixed value ~v. For the (1 \u2212 C)N noise gratings, we set |~v_i| = |~v| and the direction of ~v_i was sampled from a uniform distribution. The grating orientation angles \u03b8_i were also sampled from a uniform distribution.\n\nFigure 5: Left panel: Figure 2 in [16], showing that the coherence ratio threshold varies very little with dot density. Right panel: Simulations of our model show a similar trend. 
N = 40, 80, 100, 200, 400, and 800.\n\nFigure 6: Multi-aperture gratings and plaids. Left column: sample stimuli. Right column: stimuli with the local drifting velocity of each element indicated by arrows. The stimulus details are shown in the magnified windows at the upper right corner of each image.\n\nThe plaid elements combine two gratings with orthogonal orientations (each grating has the same speed but can have a different motion direction). This leads to the plaid element Q_i(~x, t) = G(~x \u2212 ~x_i, \u03a3){F_1(~x \u2212 ~x_i \u2212 ~v_{i,1} t) + F_2(~x \u2212 ~x_i \u2212 ~v_{i,2} t)}, where F_1(x, y) = sin(f x sin \u03b8_i + f y cos \u03b8_i) and F_2(x, y) = sin(\u2212f x cos \u03b8_i + f y sin \u03b8_i).\n\nThe plaid stimulus is I(~x, t) = \u2211_{i=1}^{N} Q_i(~x, t). For the CN signal plaids, the motions ~v_{i,1}, ~v_{i,2} were set to a fixed ~v. For the (1 \u2212 C)N noise plaids, the directions of ~v_{i,1}, ~v_{i,2} were randomly assigned, but their magnitude |~v| was fixed.\n\nFigure 7: Left two panels: Estimated motion fields of grating and plaid stimuli. Rightmost panel: Psychometric functions for grating and plaid stimuli.\n\n4.2 Simulation procedures and results\n\nThe left two panels in Figure (7) show the estimated motion fields for the two types of stimulus we studied, with the same coherence ratio of 0.7. Plaid stimuli produce a more coherent estimated motion field than grating stimuli, which is understandable because they have less ambiguous local motion cues.\n\nWe tested our model in an 8-direction discrimination task for estimating global motion direction [20]. The model used raw image frames as the input. We ran 300 trials for each stimulus type, and used the direction of the average motion to predict the global motion direction. The prediction accuracy \u2013 i.e. 
the fraction of trials in which our model predicted the correct motion direction out of the 8 alternatives \u2013 was calculated at different coherence ratio levels. The difference in performance between gratings and plaids is shown in the rightmost panel of Figure (7), where the psychometric function for plaid stimuli is always above that for grating stimuli, indicating better performance. These simulation results of our model are consistent with the psychophysics experiments in [20].\n\n5 Discussion\n\nIn this paper, we proposed a unified single-system framework that is capable of dealing with both short-range and long-range motion. It differs from traditional motion energy models because it does not use spatiotemporal filtering. Note that it was shown in [6] that motion energy models are not well suited to the long-range motion stimuli studied in this paper. The local ambiguities of motion are resolved by a novel hierarchical prior which combines slowness and smoothness at a range of different scales. Our model accounts well for human perception of both short-range and long-range motion using the two standard stimulus types (RDKs and gratings).\n\nThe hierarchical structure of our model is partly motivated by known properties of cortical organization. It also has the computational motivation of being able to represent prior knowledge about motion at different scales and to allow efficient computation.\n\nAcknowledgments\n\nThis research was supported by NSF grants IIS-0917141 and 613563 to AY and BCS-0843880 to HL. We thank Alan Lee and George Papandreou for helpful discussions.\n\nReferences\n\n[1] O. Braddick. A short-range process in apparent motion. Vision Research, 14, 519-529. 1974.\n\n[2] Z. Lu and G. Sperling. Three-systems theory of human visual motion perception: review and update. Journal of the Optical Society of America A, 18, 2331-2369. 2001.\n\n[3] L. M. Vaina and S. Soloviev. 
First-order and second-order motion: neurological evidence for neuroanatomically distinct systems. Progress in Brain Research, 144, 197-212. 2004.\n\n[4] T. S. Lee and D. B. Mumford. Hierarchical Bayesian inference in the visual cortex. JOSA A, 20(7), 1434-1448. 2003.\n\n[5] A. L. Yuille and N. M. Grzywacz. A computational theory for the perception of coherent visual motion. Nature, 333, 71-74. 1988.\n\n[6] H. Lu and A. L. Yuille. Ideal observers for detecting motion: Correspondence noise. NIPS, 2006.\n\n[7] M. R. Luettgen, W. C. Karl and A. S. Willsky. Efficient multiscale regularization with applications to the computation of optical flow. IEEE Transactions on Image Processing, 3, 41-64. 1993.\n\n[8] P. Cavanagh. Short-range vs long-range motion: not a valid distinction. 5(4), 303-309. 1991.\n\n[9] S. Grossberg and M. E. Rudd. Cortical dynamics of visual motion perception: short-range and long-range apparent motion. Psychological Review, 99(1), 78-121. 1992.\n\n[10] B. K. P. Horn and B. G. Schunck. Determining optical flow. Artificial Intelligence, 17(1-3), 185-203. 1981.\n\n[11] Y. Weiss, E. P. Simoncelli, and E. H. Adelson. Motion illusions as optimal percepts. Nature Neuroscience, 5(6), 598-604. 2002.\n\n[12] A. A. Stocker and E. P. Simoncelli. Noise characteristics and prior expectations in human visual speed perception. Nature Neuroscience, 9(4), 578-585. 2006.\n\n[13] S. Roth and M. J. Black. On the spatial statistics of optical flow. International Journal of Computer Vision, 74(1), 33-50. 2007.\n\n[14] P. Anandan. A computational framework and an algorithm for the measurement of visual motion. International Journal of Computer Vision, 2, 283-310. 1989.\n\n[15] K. H. Britten, M. N. Shadlen, W. T. Newsome and J. A. Movshon. The analysis of visual motion: a comparison of neuronal and psychophysical performance. Journal of Neuroscience, 12(12), 4745-4765. 1992.\n\n[16] H. 
Barlow and S. P. Tripathy. Correspondence noise and signal pooling in the detection of coherent visual motion. Journal of Neuroscience, 17(20), 7954-7966. 1997.\n\n[17] E. Mingolla, J. T. Todd, and J. F. Norman. The perception of globally coherent motion. Vision Research, 32(6), 1015-1031. 1992.\n\n[18] J. Lorenceau and M. Shiffrar. The influence of terminators on motion integration across space. Vision Research, 32(2), 263-273. 1992.\n\n[19] T. Takeuchi. Effect of contrast on the perception of moving multiple Gabor patterns. Vision Research, 38(20), 3069-3082. 1998.\n\n[20] K. Amano, M. Edwards, D. R. Badcock and S. Nishida. Adaptive pooling of visual motion signals by the human visual system revealed with a novel multi-element stimulus. Journal of Vision, 9(3(4)), 1-25. 2009.\n\n[21] A. Lee and H. Lu. A comparison of global motion perception using a multiple-aperture stimulus. Journal of Vision, 10(4), 9. 2010.\n\n[22] M. Black and P. Anandan. The robust estimation of multiple motions: Parametric and piecewise-smooth flow fields. CVIU, 63(1), 1996.\n\n[23] A. Choi, M. Chavira and A. Darwiche. A scheme for generating upper bounds in Bayesian networks. UAI, 2007.\n\n[24] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. 1988.", "award": [], "sourceid": 1344, "authors": [{"given_name": "Shuang", "family_name": "Wu", "institution": null}, {"given_name": "Xuming", "family_name": "He", "institution": null}, {"given_name": "Hongjing", "family_name": "Lu", "institution": null}, {"given_name": "Alan", "family_name": "Yuille", "institution": null}]}