NeurIPS 2019

**Sun Dec 8th through Sat the 14th, 2019 at Vancouver Convention Center**

### Reviewer 1

The motivation of this paper is clear: reducing the evaluation cost of hyperparameter tuning tasks. Several prior approaches address this issue, such as warm-starting the optimization process and multi-fidelity optimization. This paper proposes to build a meta-model over problems drawn from a problem distribution. By applying a Gaussian process to capture the relations among problems (similar to the surrogate function in Bayesian optimization), evaluations on new problems become cheaper. However, this paper is poorly written and poorly organized, and it is full of typos and grammatical issues. Some detailed comments are listed below:

1. In Figure 1, how many hyperparameters were selected in the experiments? Common wisdom holds that random search is hard pressed to beat GP-BO when there are more than two hyperparameters.
2. What is the algorithmic framework of the proposed method? Can you show pseudo-code for it?
3. What conclusion can we draw from Figure 3? Can you explain it?
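For context on the surrogate mechanism the review refers to: in GP-based Bayesian optimization (GP-BO), a Gaussian process fitted to past (configuration, score) pairs provides cheap posterior predictions that guide where to evaluate next. A minimal sketch in NumPy follows; the RBF kernel, UCB acquisition, and toy objective are illustrative assumptions, not the paper's actual setup:

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.3):
    """Squared-exponential kernel between two sets of 1-D points."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    """Posterior mean and std of a zero-mean GP at the query points."""
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_tq = rbf_kernel(x_train, x_query)
    solve = np.linalg.solve(k_tt, k_tq)          # K_tt^{-1} K_tq
    mean = solve.T @ y_train
    var = 1.0 - np.sum(k_tq * solve, axis=0)      # diag of posterior covariance
    return mean, np.sqrt(np.maximum(var, 0.0))

def next_point(x_train, y_train, candidates, kappa=2.0):
    """Pick the candidate maximizing an upper-confidence-bound acquisition."""
    mean, std = gp_posterior(x_train, y_train, candidates)
    return candidates[np.argmax(mean + kappa * std)]

# Toy objective: pretend each call is an expensive training run.
f = lambda x: -(x - 0.6) ** 2
x_obs = np.array([0.1, 0.9])
y_obs = f(x_obs)
grid = np.linspace(0.0, 1.0, 101)
x_next = next_point(x_obs, y_obs, grid)  # suggested next configuration
```

The surrogate's predictive mean interpolates the observed scores, while its standard deviation quantifies uncertainty away from observed configurations; the acquisition function trades the two off when proposing the next evaluation.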

### Reviewer 2

The idea of generating surrogate tasks or datasets for evaluating HPO methods is quite interesting, and is especially useful when evaluating a hyperparameter configuration on the target task or model is computationally expensive, e.g., ResNet. Unfortunately, the empirical evaluations fail to effectively support the superiority and feasibility of the proposed benchmarking algorithm in practical applications.

---

I have read the authors' response and tend to maintain my score. The authors have addressed some of my concerns, but I am now even more worried about the feasibility and practicality of the proposed benchmarking algorithm (cf. lines 30-32).

### Reviewer 3

**Originality:** Conditioning on data from existing tasks to generate an unbounded number of tasks appears to be a novel idea.

**Quality:** The paper is very well written. The idea is clear and well motivated, and it could also be very useful for advancing reproducibility in black-box optimization. The experiments are thorough and well executed.

**Clarity:** As mentioned already, the paper is very well written.

**Significance:** This work, and in particular the tasks if released, is bound to become a useful benchmark for people working on black-box optimization.