Paper ID: 1349
Title: Context-sensitive active sensing in humans
Reviews

Submitted by Assigned_Reviewer_1

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
This paper proposes a myopic version of C-DAC, a visual search algorithm that includes costs and benefits. Some cost-benefit algorithm of this sort will surely beat a naive, cost-benefit-free version...so the issue is just what the details look like.
Fig. 2 shows that C-DAC does well on many dimensions of fit.

Comments:
C-DAC is not psychologically plausible? Why not?
Fig. 2: why are patch 9 and patches 1&3 so diagnostic?
Q2: Please summarize your review in 1-2 sentences
This paper finds evidence for a search algorithm that myopically adjusts for the cost of future visual saccades.

Submitted by Assigned_Reviewer_4

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
The paper explores theoretical issues in active sensing -- situations where people must actively gather information to achieve some goal. An extension to a previous model (C-DAC) is proposed which myopically gathers information in order to minimize a task-specific cost function. The model is compared against an alternative that favors information without respect to cost (Infomax). The overall goal of the paper is interesting and important. The novelty of the paper is somewhat reduced simply because it reports an extension to Ahmad & Yu (2013) to deal myopically with action selection. This is fine, but I wondered how novel or important that aspect of the modeling actually was (e.g., it didn't seem to figure centrally into explanations of the experiment itself).

There is some prior work on this issue. In particular, the authors might check out Markant & Gureckis, 2012 "Does the utility of information influence sampling behavior?" which explores a similar issue: whether people optimize information or some task-specific cost function. They found that information optimizing provided a better fit to people's behavior.

The paper was poorly written overall. The abstract was much too long, the paper exceeded the 8-page limit at NIPS, and the paper cites Ahmad & Yu, 2013 in almost every sentence, including twice in the abstract alone! I had a really hard time understanding the experiment design. Terms like "patch" are introduced without definition (is a patch a set of trials with a particular distribution of target probabilities, or a possible target location? I couldn't really tell). Also, while the overall theoretical issue seemed well motivated, the paper itself focused on this "confirmatory search" phenomenon, which the authors seemed to identify ad hoc in their data. Why was this phenomenon selected, and why is it critical for differentiating the models?

Also, the model results for Section 3 were not that compelling. The full C-DAC hits all the qualitative patterns, as does the myopic one. The authors don't discuss what this means. Is it the case that the measures the authors selected here just don't tell the models apart very well? No fit statistics or quantification of the advantage for the more limited model is offered, other than to describe it as "more psychologically plausible." Section 4 was also very opaque. I didn't understand the policy-space graphs in Figure 3, and I suspect that most readers will have significant difficulty with this as well. It is also unclear what "Greedy MAP" is, as this model is never mentioned in the paper.

The analysis presented in Figure 4 is odd because, if this is meant to be a model of humans, no mention is made of human performance. It is fine to say that the myopic C-DAC model performs well on a number of metrics, but as a model it should match the human observer as well (unless this is meant as an engineering analysis of a computer-vision-based visual search algorithm?).

*Note* My initial evaluation was a little more negative, but after discussion with the other reviewers I have given a 7. I think some of the critical comments above still apply, but there are also strengths, including the emphasis on multiple-model comparison, the empirical data, and the novelty of deriving theorems about the optimal stopping rule. I thought the authors' response was helpful, and it would be nice if the paper were edited somewhat in accordance with the reviewer questions to be clearer if it is accepted.
Q2: Please summarize your review in 1-2 sentences
A myopic modification is added to a recent model of cost-sensitive active sensing. The model is fit to data from a visual search task, and the myopic model appears to fit the data of the experiment fairly well. However, qualitatively the full (non-myopic) model performs about the same. Other aspects of the paper were somewhat difficult to understand given what was written, including the graphs of policy-space functions. Overall I think there are a lot of interesting ideas here, but the paper is not quite developed yet. The contribution beyond the original C-DAC model is unclear (outside of a vague notion of being more psychologically plausible because it doesn't optimize multiple steps ahead into the future).

Submitted by Assigned_Reviewer_6

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
Paper 1349 – Context-sensitivity in human active sensing

This paper explores how people guide their search for information in a visual display. The authors describe a recently proposed model, C-DAC, present an optimal stopping result, and derive a computation-bounded approximation to the optimal policy. Afterwards, they present an experiment with three patches of stimuli (dots moving either to the left or right with some amount of coherence), where participants are supposed to report the patch with dots moving to the left as quickly as possible. Generally, the models using C-DAC-generated policies resulted in behavior a bit more like the human results than the behavior resulting from a model using an Infomax-generated policy. Finally, they conclude with simulations that could distinguish between the three policy-generation methods.

The paper focuses on a critical computational problem faced by both people and machines: how should an agent allocate its sensors to the different places in the environment to best solve the task at hand? The approach taken is very interesting, elegant, and a reasonable candidate for what the visual system is aiming to solve at a computational level. The simulations at the end of the paper distinguishing between the different methods were very interesting, but I was left disappointed that there weren't human results as well! Furthermore, I found the evaluation of their method too poor to merit a NIPS publication, and since C-DAC is a previously published model, it should not be considered a contribution of the paper.

However, there are some issues with the paper that leave me on the fence regarding whether or not I believe it should be published at NIPS. The framing of the paper is about how C-DAC provides context-sensitive policies, but I don't understand in what sense it is context-sensitive. It would help to be more upfront about what exactly the context is here. I don't see how the case study highlighted why being context-sensitive matters. In fact, besides Infomax getting the search times wrong when patch 9 was the target, I did not find the experiment very convincing that people were guided by computational principles similar to C-DAC's. Furthermore, there are few dependent variables being predicted relative to the complexity and number of parameters used by C-DAC. Finally, the simulations at the end of the paper seemed much more interesting than the case study and could potentially provide stronger evidence for C-DAC. However, there were no human experiments based on these simulations. The authors may be better off waiting to publish the project with the human results for these later simulations, which would result in a much stronger paper, than publishing the current paper.

Minor comments:
Section 2.1: When a vector has been indexed to a scalar, I personally like it to no longer be bold.
Section 3.1, line 232: "the probability of a target containing a target" -- should that be "a patch containing a target"?
Section 3.2 title: “Model vs. behavior” the use of vs. sounded a bit strange to me. Also, should model be plural?
Section 3.2, line 261-263: “… the general tendency for human subjects typically rely more on recent observations…” -> “… the general tendency for human subjects to rely more on recent observations…”
Figure 2: label the units for the y-axes.
Q2: Please summarize your review in 1-2 sentences
An excellent project in the making; however, it seems too premature for publication.
Author Feedback

Q1:Author rebuttal: Please respond to any concerns raised in the reviews. There are no constraints on how you want to argue your case, except for the fact that your text should be limited to a maximum of 6000 characters. Note however that reviewers and area chairs are very busy and may not read long vague rebuttals. It is in your own interest to be concise and to the point.
We thank the reviewers for their valuable feedback. We feel that the main concerns expressed by the reviewers may have arisen from a lack of clarity in our writing in certain places, and the reviews were very helpful in highlighting these lapses. In the following, we first address the concerns shared by the reviewers, and then proceed to individual points as needed. We will of course also incorporate these additional clarifications in any revision of the paper.

General comments
1. The reviewers asked why C-DAC is "arguably not very psychologically plausible" (line 332). This is because C-DAC requires dynamic programming (recursing Bellman's optimality equation) offline to compute a *globally* optimal policy over the continuous state space (the belief state), and the discretized state space scales exponentially in the number of hypotheses. Similar observations are made in the discussion section of Markant & Gureckis, 2012, the reference suggested by reviewer #2. Myopic C-DAC is "online, fast, and easy to scale to more complex problems" (line 412), as it incurs a constant cost to compute the policy online, for only the current belief state. It yields huge computational savings and is thus "psychologically more plausible" (line 335).
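
For concreteness, here is a minimal sketch of the one-step lookahead we have in mind (hypothetical Python with made-up parameter values and a simplified binary observation model; our actual likelihoods and costs follow the C-DAC setup in Sec. 2):

    import numpy as np

    def myopic_action(belief, fixation, c=0.1, c_s=0.2, p_s=0.7, p_n=0.5):
        # Compare stopping now against one more observation at each location,
        # using only the current belief (no value function over the simplex).
        k = len(belief)
        best = ("stop", int(belief.argmax()), 1.0 - belief.max())  # error cost if we declare now
        for j in range(k):  # candidate next fixation
            exp_cost = c + (c_s if j != fixation else 0.0)  # time cost + switch cost
            for x in (0, 1):  # possible binary observation at location j
                # likelihood of x under each "target at location i" hypothesis
                lik = np.where(np.arange(k) == j,
                               p_s if x else 1.0 - p_s,
                               p_n if x else 1.0 - p_n)
                post = belief * lik
                p_x = post.sum()
                exp_cost += p_x * (1.0 - (post / p_x).max())  # expected error cost after seeing x
            if exp_cost < best[2]:
                best = ("fixate", j, exp_cost)
        return best

The per-step work is linear in the number of locations, e.g. myopic_action(np.array([0.25, 0.25, 0.5]), fixation=2), whereas the full C-DAC policy must be tabulated offline over a discretized belief simplex.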

2. The reviewers asked why we focused on the particular aspect of the behavioral data that we did. That is in part because the more obvious aspects (e.g., where each policy would choose to fixate first) are also the more trivial aspects that all reasonable policies would predict (e.g., fixate the most probable location). We focused on the confirmation-bias-related fixation duration differences because previous models like Infomax could not account for them. Furthermore, confirmation bias is also psychologically important: it is a well-studied phenomenon exhibited by humans in a variety of choice and decision behaviors (see Nickerson, 1998, for a review), and we feel any convincing model should be able to account for it.

3. The reviewers asked for a summary of how the performance of C-DAC differed from Myopic C-DAC and Infomax in capturing the data. As shown in Fig. 2, out of the four confirmatory trends that follow, C-DAC captures all four, myopic C-DAC three (1, 2 and 4), and Infomax two (2 and 4): (1) time taken to confirm when the more likely patch (9) is the target is shorter than when the less likely patches (1&3) are the target (two left bars); (2) similarly, time taken to disconfirm when patch 9 is not the target is longer (two right bars); (3) time taken to confirm patch 9 when it is the target is shorter than time taken to disconfirm when it is not (blue bars); (4) time taken to confirm when patches 1&3 are targets is longer (red bars). Furthermore, C-DAC provides the best quantitative fit for Hits and False Alarms.

4. The reviewers asked why we split the performance metrics into two sets (1&3 and 9). Since a major goal of the experiment is to examine top-down cognitive influences on saccadic vision, we manipulated top-down expectations by having the target appear in the three patches with unequal probabilities, and looked at the fixation behavior in each patch separately. Note that the coherence of the random dots, and hence the bottom-up stimulus information, is identical at each patch -- the stimulus being fixated is always either the target or a distractor. So any differences in human or model behavior arise entirely from top-down expectations about target location. This is something that bottom-up or saliency-based models (e.g., Itti & Koch, 2000) cannot account for.

5. The reviewers asked why we included model predictions that we do not yet have data for. We firmly believe that any good model should not only be able to explain existing data but also make specific and testable predictions for future experiments. Thus, Sec. 4 contains predictions that set the stage for future investigations, highlighting scenarios where we predict the different models would agree or disagree the most. We believe that timely publication of these predictions would facilitate scientific progress by encouraging other groups to investigate them either collaboratively or independently.


Individual Comments
Reviewer #2:
The reviewer inquired about our novel contributions. These are: (1) a novel visual search experiment with model comparisons to human behavior; (2) an efficient approximation, Myopic C-DAC, that compares favorably with human behavior; (3) a novel way to derive a decision threshold bound that can be used to augment policies like Infomax, which are agnostic about when to stop in a time-pressured scenario.
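
To make (3) concrete, here is a hedged sketch of the intended control flow (hypothetical Python; theta stands in for the derived bound, which in the paper depends on the cost parameter c, and the binary observation model is a simplification):

    import numpy as np

    def entropy(b):
        b = b[b > 0]
        return -(b * np.log(b)).sum()

    def likelihood(k, j, x, p_s=0.7, p_n=0.5):
        # P(observation x | target at location i, fixating j), for each i
        return np.where(np.arange(k) == j,
                        p_s if x else 1.0 - p_s,
                        p_n if x else 1.0 - p_n)

    def bayes_update(belief, j, x):
        post = belief * likelihood(len(belief), j, x)
        return post / post.sum()

    def infomax_with_threshold(belief, theta, true_target, rng):
        # Infomax fixation choice, stopping once confidence crosses theta.
        belief = belief.copy()
        while belief.max() < theta:
            scores = []
            for j in range(len(belief)):  # expected posterior entropy per fixation
                h = 0.0
                for x in (0, 1):
                    p_x = (belief * likelihood(len(belief), j, x)).sum()
                    h += p_x * entropy(bayes_update(belief, j, x))
                scores.append(h)
            j = int(np.argmin(scores))  # most informative fixation
            # simulate a binary observation from the fixated location
            x = int(rng.random() < (0.7 if j == true_target else 0.5))
            belief = bayes_update(belief, j, x)
        return int(belief.argmax())

For example, infomax_with_threshold(np.ones(3) / 3, theta=0.9, true_target=0, rng=np.random.default_rng(0)) runs Infomax to a confidence of 0.9; without such a threshold, Infomax by itself never specifies when to stop.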

The reviewer expressed concerns about exceeding the page limit, over-citing the C-DAC paper, and some typographical errors. The paper is in compliance with the new NIPS guideline of (8+1) pages. We cite Infomax and C-DAC several times, the latter more so owing to our myopic approximation; we will be more parsimonious in a revision. There is a typographical error in Fig. 3 that caused confusion: it should read "Myopic C-DAC" instead of "Greedy MAP". Lastly, a patch is a stimulus location possibly containing the target, a definition we omitted and will add.

Reviewer #3:
The reviewer asked for clarification of what "context-sensitivity" means. Given the visual search task and the reward structure explained to the subjects (line 236), context-sensitivity refers to the ability to change the control policy based on behavioral costs, such as those related to time, inaccuracy, and effort.
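
In symbols (a schematic form of the loss, following Ahmad & Yu, 2013; ASCII notation): the policy minimizes an expected cost of the form c*t + c_s*n_s + 1{report != target}, where t is the number of time steps taken, n_s the number of fixation switches, and the indicator term penalizes an incorrect target report. The behavioral "context" thus enters through the cost parameters (c, c_s): changing them changes the optimal policy, whereas Infomax chooses fixations without reference to them.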

The reviewer asked about the number of free and dependent variables in the policies. C-DAC actually explains all trends in fixation duration (general comment 3), as well as Hits/FAs, and has two new parameters (c & c_s). Since we use a decision threshold for Infomax and Myopic C-DAC that also depends on c, effectively there is just one extra parameter, c_s. The reviewer also made some minor comments regarding formatting, which we intend to incorporate in the revision.