NIPS 2016
Mon Dec 5th through Sun the 11th, 2016 at Centre Convencions Internacional Barcelona
Paper ID:1854
Title:Probing the Compositionality of Intuitive Functions

Reviewer 1

Summary

The authors formalize human judgments about functions (compositional vs. non-compositional) in a framework that combines Bayesian regression with Gaussian process kernels.

Qualitative Assessment

The authors provide an extensive set of experiments that compare subjects' decisions in different tasks with a grammar over kernels. The paper is very well written. However, one detail is missing regarding the experiments: which control mechanism was used? Two minor points: the x-axis in Figure 3 is difficult to read, and the text in Figure 7 is too small to read.

Confidence in this Review

2-Confident (read it all; understood it all reasonably well)


Reviewer 2

Summary

This paper provides a detailed analysis of people's inferences about the functional relationship between continuous variables. The authors argue for a "compositional" representation of functions, where periodicity and linearity can combine with local similarity to produce relationships, as in some recent machine learning models. Four experiments provide support for this approach.

Qualitative Assessment

This is a very nice paper, with a clear central idea and extensive evaluation through behavioral experiments. However, there are a few weaknesses:

1. Most of the human function learning literature has used tasks in which people never visualize data or functions. This is also the case in naturalistic settings where function learning takes place, where we have to form a continuous mapping between variables from experience. All of the tasks used in this paper involved presenting people with data in the form of a scatterplot or functional relationship and asking them to evaluate lines applied to those axes. This task is more akin to data analysis than the traditional function learning task, and much less naturalistic. The distinction matters because performance in the two tasks is likely to be quite different. In the standard function learning task it is quite hard to get people to learn periodic functions without other cues to periodicity, and many of the effects in this paper seem to be driven by periodic functions, suggesting that they may not hold if traditional tasks were used. I don't think this is a major problem if it is clearly acknowledged that the goal is to evaluate whether data-analysis systems using compositional functions match human intuitions about data analysis. But it is important if the paper is intended to be primarily about function learning in relation to the psychological literature, which has focused on a very different task.

2. I'm curious to what extent the results are due to being able to capture periodicity, rather than compositionality more generally. The comparison model is one that cannot capture periodic relationships, and in all of the experiments except Experiment 1b the relationships that people were learning involved periodicity. Would adding periodicity to the spectral kernel be enough to allow it to capture all of these results at a level similar to the explicitly compositional model?

3. Some details of the models are missing. In particular, the grammar over kernels is not explained in any detail, making it hard to understand how this approach is applied in practice. Presumably there are also probabilities associated with the grammar that define a hypothesis space of kernels? How is inference performed? (A sketch of what such a grammar might look like is given below.)
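
For concreteness, here is a minimal sketch of what such a probabilistic grammar over kernels might look like, in the spirit of compositional kernel search in the Gaussian process literature. The base kernels, production probabilities, and depth limit below are illustrative assumptions, not details taken from the paper:

```python
# Illustrative only: a toy grammar over kernel expressions,
# K -> B | (K + B) | (K x B), with base kernels B in {LIN, PER, RBF}.
BASE = ["LIN", "PER", "RBF"]

def expand(expressions):
    """One round of grammar expansion: combine each existing expression
    with each base kernel via + or x."""
    new = []
    for expr in expressions:
        for b in BASE:
            new.append(f"({expr} + {b})")
            new.append(f"({expr} x {b})")
    return new

# Enumerate the hypothesis space up to a hypothetical maximum of two productions.
hypotheses, frontier = list(BASE), list(BASE)
for _ in range(2):
    frontier = expand(frontier)
    hypotheses += frontier

# A simple prior that favours shorter (simpler) compositions; the 0.5 decay
# per production is an assumption made purely for illustration.
def prior(expr):
    n_productions = expr.count(" + ") + expr.count(" x ")
    return 0.5 ** n_productions

Z = sum(prior(h) for h in hypotheses)
prior_over_kernels = {h: prior(h) / Z for h in hypotheses}
# In practice, each candidate kernel would additionally be scored by its GP
# marginal likelihood on the observed data, and inference would search or
# average over this discrete space.
print(len(hypotheses), "candidate kernel expressions, e.g.", hypotheses[-1])
```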

Confidence in this Review

3-Expert (read the paper in detail, know the area, quite certain of my opinion)


Reviewer 3

Summary

The authors explore human inductive biases in function extrapolation and interpolation and compare them to the inductive biases induced by different types of kernels. Compositions of kernels, i.e. sums and products of linear, Gaussian, and periodic kernels, seem to capture human extra-/interpolations more readily and naturally than spectral kernels. A number of psychophysics experiments on humans were conducted using Amazon Mechanical Turk, and the results are presented in the paper.
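
To make the kernel compositions concrete, here is a minimal sketch of how sums and products of linear, Gaussian (RBF), and periodic kernels can be formed and sampled from; the hyperparameters and the particular compositions shown are arbitrary illustrative choices, not values taken from the paper:

```python
import numpy as np

# Base covariance functions on scalar inputs (hyperparameters are illustrative).
def rbf(x, y, lengthscale=1.0):
    return np.exp(-0.5 * (x - y) ** 2 / lengthscale ** 2)

def linear(x, y):
    return x * y

def periodic(x, y, period=2.0, lengthscale=1.0):
    return np.exp(-2.0 * np.sin(np.pi * np.abs(x - y) / period) ** 2 / lengthscale ** 2)

# Sums and products of valid kernels are again valid kernels.
def lin_plus_per(x, y):
    return linear(x, y) + periodic(x, y)

def lin_times_per(x, y):
    return linear(x, y) * periodic(x, y)

# Draw a few sample functions from a zero-mean GP with a composed kernel.
xs = np.linspace(-5, 5, 200)
K = np.array([[lin_plus_per(a, b) for b in xs] for a in xs])
K += 1e-6 * np.eye(len(xs))  # jitter for numerical stability
samples = np.random.default_rng(1).multivariate_normal(np.zeros(len(xs)), K, size=3)
```

Roughly speaking, a sum superimposes the behaviours of its components (e.g. a linear trend plus an oscillation), while a product modulates one component by the other (e.g. an oscillation whose amplitude grows linearly).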

Qualitative Assessment

The compositional kernel seems to be more closely adapted to the human Occam's razor than the other proposed variants. This seems to be the case whether or not the functions to be inter-/extrapolated are generated from a compositional kernel, which is interesting. The question is whether the comparison to the reference kernels is fair: an extra-/interpolation drawn using a spectral kernel, which is much more global, will probably look "wrong" most of the time, so comparing to a harder null might have been appropriate. Please show some examples of extrapolations generated on the real-world data. The Human Kernel paper has some functions that may not be that easy to extrapolate using the compositional kernel either (although obviously better than the alternative proposed). All in all, the work seems solid and the results are there, but one can remain worried about hidden biases in the choice of real-world data and the types of kernel compositions (why three? why Gaussian + linear + periodic?).

Confidence in this Review

1-Less confident (might not have understood significant parts)


Reviewer 4

Summary

The authors study the inductive bias of human function learning and find evidence that this bias exhibits compositional structure.

Qualitative Assessment

I think this should be accepted. Function learning has a long tradition of study in cognitive science and this paper brings modern tools -- Gaussian Processes and MCMC with people -- to bear on the problem. The formal setup and experiments are sensible and decently explained, and the results are interesting.

I do think that the overall support for compositionality is a bit less than the authors assert, e.g.:
- Figure 3: The extrapolation distributions don't look *that* good in Lin + Per 3, Lin + Per 4, or Lin x Per.
- Figure 4: It's odd that l+p extrapolations weren't accepted more.
- Figure 6: The RBF kernel is competitive for interpolation.
So the strength of the claims should be scaled back a bit.

In addition, I wasn't totally clear on some details of MCMCP. Some questions:
- I would like to understand why the stationary distribution is the posterior predictive -- can you give intuition for this, or show a sketch in a supplement? (A toy simulation is sketched below.)
- Why not plot the distributions implied by the Markov chains and compare them to the actual posterior predictives?
- What exactly is the inverse marginal likelihood?

----------------------------
Update: I read the rebuttal and the other reviews, and my assessment is largely unchanged: I'm in favor of this paper.
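
On the first question, the usual intuition is that if a participant chooses between the current state and a symmetric proposal with probability proportional to the subjective probability of each (a Luce choice rule), that choice rule is exactly the Barker acceptance function, which satisfies detailed balance and therefore leaves the target distribution stationary. The toy simulation below illustrates this; the one-dimensional target density (standing in for a posterior predictive) and the simulated chooser are assumptions introduced purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy target: an unnormalized 1-D density standing in for the participant's
# subjective posterior predictive.
def target(x):
    return 0.6 * np.exp(-0.5 * (x - 1.0) ** 2) + 0.4 * np.exp(-0.5 * ((x + 2.0) / 0.7) ** 2)

def simulated_choice(x_current, x_proposal):
    """Barker / Luce-choice rule: accept the proposal with probability
    p(proposal) / (p(proposal) + p(current))."""
    p_new, p_old = target(x_proposal), target(x_current)
    return rng.random() < p_new / (p_new + p_old)

x, samples = 0.0, []
for _ in range(50_000):
    proposal = x + rng.normal(scale=1.0)  # symmetric proposal, as in MCMCP
    if simulated_choice(x, proposal):
        x = proposal
    samples.append(x)

# The empirical distribution of `samples` matches the normalized target,
# which is the sense in which the chain's stationary distribution is the
# distribution the chooser's judgments reflect.
hist, edges = np.histogram(samples, bins=60, density=True)
```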

Confidence in this Review

2-Confident (read it all; understood it all reasonably well)


Reviewer 5

Summary

The paper focuses on understanding human intuitions about functions. The authors first define a hypothetical grammar for intuitive functions and then investigate how well this grammar predicts human function learning performance. The authors describe three different kernel parametrizations and provide data from experiments run on Amazon Mechanical Turk. They find that people seem to prefer extrapolations based on compositional kernels over the alternatives.

Qualitative Assessment

Please provide some more background on the importance of function learning for intuition. Specify in more detail the concepts briefly introduced in paragraphs 1 to 3. What are the reasons behind choosing the specified kernels for the proposed compositional kernel? How are the error bars calculated for each plot? Typo 1: on page two, last paragraph, "second" should be replaced with "third". Typo 2: the number of participants is reported inconsistently (sometimes as a numeral, sometimes spelled out).

Confidence in this Review

1-Less confident (might not have understood significant parts)


Reviewer 6

Summary

This paper tests the hypothesis that people's hypothesis space for learning functions is compositional. Function learning is formalized as Bayesian regression with functional forms given by compositional Gaussian process kernels (summing and multiplying radial basis, linear, and periodic kernels). A spectral mixture kernel provides an alternative hypothesis. A series of experiments demonstrates that people prefer to extrapolate functions (given points on the function) using compositional structure and find compositional functions to be more predictable.
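
For reference, here is a minimal sketch of a one-dimensional spectral mixture kernel of the kind used as the alternative hypothesis (it models the spectral density as a mixture of Gaussians); the number of components and their parameters are arbitrary illustrative choices, not those fitted in the paper:

```python
import numpy as np

def spectral_mixture(x, y, weights, means, variances):
    """1-D spectral mixture covariance:
    k(tau) = sum_q w_q * exp(-2 * pi^2 * tau^2 * v_q) * cos(2 * pi * tau * mu_q),
    where mu_q and v_q are the mean and variance of the q-th spectral component."""
    tau = x - y
    k = 0.0
    for w, mu, v in zip(weights, means, variances):
        k += w * np.exp(-2.0 * np.pi ** 2 * tau ** 2 * v) * np.cos(2.0 * np.pi * tau * mu)
    return k

# Two illustrative components: a near-zero frequency (smooth trend) and a
# frequency of 0.5 (oscillation with period about 2).
weights, means, variances = [1.0, 0.5], [0.01, 0.5], [0.05, 0.01]
xs = np.linspace(-5, 5, 200)
K = np.array([[spectral_mixture(a, b, weights, means, variances) for b in xs] for a in xs])
```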

Qualitative Assessment

Technical quality: The number of experiments reported in the paper is impressive. Moreover, the use of the MCMCP method to probe people's posterior belief distribution is interesting. However, because Experiments 1a, 2a, and 3 use compositional ground truth, the conclusions that can be drawn from these experiments seem limited. It is not clear whether people prefer compositional structures in these experiments because the data are generated from them or because they always prefer compositional structures. Furthermore, why are the compositional kernels not compared to the spectral mixture kernel in Experiments 2a and 2b? Perhaps the most convincing demonstration that people prefer compositional structure is Experiment 1b, in which compositional extrapolations are chosen more often even when the ground truth is the spectral mixture. I wonder if it would be possible to incorporate a prior over the compositional structures that expresses people's preference for linear and periodic functions and (presumably) simpler combinations, and to look at the full posterior belief distribution rather than just the likelihood. A small comment on Experiment 3: I don't think many participants would find drawing a function using a Bezier curve very intuitive. What did the MTurk participants think of this? Maybe this experiment could be conducted on touchscreen devices instead.

Novelty/originality: This work introduces the compositional kernel and includes several novel and interesting behavioral experiments.

Potential impact: This paper will probably be of interest to many cognitive scientists and some computer scientists as well. However, it seems somewhat unsurprising from the outset that people prefer to think of functions as having compositional structure rather than as a spectral density (unless there is strong evidence to believe the latter). The paper would be improved if some reasons to think otherwise were presented at the beginning.

Clarity and presentation: The paper is well written and clear. However, it would be helpful to include in the Introduction some examples of why function learning is important for an organism. I did not fully understand the math in Sections 2, 3.1, and 3.2, but I appreciated that the authors tried to provide intuitive explanations for the mathematical formalisms at every step. One small comment: in lines 43-45 there are two symbols, x_asterisk and x_star, and it is not clear what each denotes (or whether they are actually the same symbol).

Confidence in this Review

1-Less confident (might not have understood significant parts)