__ Summary and Contributions__: The paper proposes a novel framework for instrumental variable (IV) regression, called DualIV, that directly estimates the structural causal function. The derivation is rooted in dual optimization, and it overcomes a longstanding difficulty of two-stage regression approaches to causal inference with instrumental variables.
=====
I have read the authors' feedback carefully, and I appreciate their reply on the scalability concern. They acknowledge that it is not adequately addressed in this paper and say they will include more details in the camera-ready version; I look forward to the updated version. This issue does not overshadow the merit of the paper or its elegant insight, but it would be better if the authors could discuss, or at least partially solve, this problem.

__ Strengths__: I like the idea of this paper. The paper is of high quality and provides theoretical proofs and extensive experimental results. Duality is a very nice, clean, and useful approach to combining causality and stochastic programming. The paper has the potential to convey the message of causality to the wider machine learning community and thereby trigger other ideas in this area.

__ Weaknesses__: The general perception is that kernel methods are not scalable. I am a little worried about the performance of DualIV on large-scale data, and I would like to see more experimental results (e.g., on the sigmoid function f used in the Kernel IV paper).

__ Correctness__: The claims and method are correct and the experimental results are promising.

__ Clarity__: The paper is very clearly written and well organized, a pleasure to read. The assumptions are clearly stated.

__ Relation to Prior Work__: The authors clearly explain the main contribution of the work relative to previous work.

__ Reproducibility__: Yes

__ Additional Feedback__:

__ Summary and Contributions__: This paper studies the regression problem of fitting a function f such that Y ≈ E[f(X)|Z], via its dual formulation. This applies to the least-squares problem with an unsupervised approach. The paper proposes an algorithm for this setting, together with a parameter selection procedure, theoretical analysis, and experiments that justify the algorithm.
=======================
I have read the feedback, and the reply resolves my doubts. I now think the paper is correct, but I'd like to see those points explained more clearly in the final version. I think it is acceptable as a main-conference paper, but since the technical difficulty is fairly low and the method to some extent builds on what is already known and used, I will keep my score of 6.
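To make the regression problem in this summary concrete, here is a minimal sketch of the classical linear two-stage least-squares (2SLS) baseline, not the DualIV estimator. It assumes a linear structural function and a hypothetical synthetic data-generating process with a hidden confounder; the function names `ols` and `two_stage_least_squares` and all constants are illustrative only. Ordinary least squares is biased by the confounder, while regressing Y on the first-stage estimate of E[X|Z] recovers the causal slope.

```python
import numpy as np

# Hypothetical illustration: classical linear 2SLS, NOT the DualIV method.


def ols(x, y):
    """Ordinary least squares of y on [1, x]; returns [intercept, slope]."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def two_stage_least_squares(x, y, z):
    """Linear 2SLS with a single instrument z; returns [intercept, slope].

    Stage 1: regress x on [1, z] to estimate E[X | Z].
    Stage 2: regress y on [1, fitted E[X | Z]].
    """
    Z = np.column_stack([np.ones_like(z), z])
    gamma, *_ = np.linalg.lstsq(Z, x, rcond=None)
    x_hat = Z @ gamma  # first-stage fitted values
    X_hat = np.column_stack([np.ones_like(x_hat), x_hat])
    coef, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
    return coef


# Assumed synthetic setup: the true causal slope is 2.0, and the hidden
# confounder e enters both x and y, so OLS is biased.
rng = np.random.default_rng(0)
n = 20_000
z = rng.normal(size=n)                    # instrument, independent of e
e = rng.normal(size=n)                    # hidden confounder
x = z + e + 0.1 * rng.normal(size=n)      # treatment
y = 2.0 * x + 3.0 * e + 0.1 * rng.normal(size=n)

beta_ols = ols(x, y)[1]
beta_iv = two_stage_least_squares(x, y, z)[1]
print(f"OLS slope:  {beta_ols:.3f}")
print(f"2SLS slope: {beta_iv:.3f}")
```

Under this setup, the OLS slope concentrates near 3.5 (the population value cov(X, Y)/var(X)), while the 2SLS slope concentrates near the true value 2. In the nonlinear case, the first stage requires estimating a whole conditional expectation E[f(X)|Z], which is the step that makes two-stage approaches cumbersome.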

__ Strengths__: Concrete analysis of the dual formulation. Clear explanation of the roadmap leading to the working algorithm.

__ Weaknesses__: I do not see major weaknesses; some minor points are discussed below.

__ Correctness__: I believe the theory is correct and the experiments are reasonable.

__ Clarity__: Yes, the ideas are conveyed clearly.

__ Relation to Prior Work__: Sorry, I'm not very familiar with this area. I appreciate the list of related literature and the discussion in the first two sections.

__ Reproducibility__: Yes

__ Additional Feedback__: 1. Line 54: could the authors explain more about do(X=x)? I'd also like to see it emphasized later, in Sections 3 and 4, what do(X=x) and the correlated noise mean, and what obstacles they pose for the formulation of the problem.
2. I don't quite follow the paragraph at Line 130 ("cumbersome to solve (4)..."). If you can compute E_{XYZ}, does that mean you know p(X,Y,Z), and if so, why is E_{X|Z} hard? Is it hard to formulate the RKHS?

__ Summary and Contributions__: This paper addresses the problem of instrumental variable regression. It introduces a dual formulation for the corresponding learning problem, which boils down to a convex-concave saddle-point problem. Restricting the hypothesis set to an RKHS, the authors present a closed-form solution to the minimax problem, leading to an easy implementation of the desired estimator.

__ Strengths__: The contribution is algorithmic, since the paper introduces a dual formulation and a closed-form solution to a known problem (instrumental variable regression). I find it very interesting and definitely relevant for the NeurIPS community.
The paper is well written and the proposed approach is clearly explained. The work is complete, since the method is supported by theoretical and numerical evidence.

__ Weaknesses__: I only have minor comments:
- Page 2, Line 54: more explanation is needed about conditioning on do(X=x).
- Page 6, Line 233: clarify the dimensions of the objects.
- Page 7, Line 243: does \top denote the inner product in the RKHS F?

__ Correctness__: Everything seems correct.

__ Clarity__: The paper is clear.

__ Relation to Prior Work__: Relation to prior work is adequately addressed.

__ Reproducibility__: Yes

__ Additional Feedback__: I thank the authors for addressing my comments in their rebuttal.