__ Summary and Contributions__: This work considers the problem of learning optimal contextual prices from a set of allocation/bid samples that would arise from a users bids within an auction. Prior work has addressed this problem significantly, and shown that when generalizable, optimal prices could be computed based on the empirical distribution provided, but have not shown how to do this efficiently without assumptions on data.
The work presents an approach (Myersonian Regression) that represents more direct learning of estimated losses relating to revenue in auctions, at least relative to the benchmark of Lipcshitz smooth pricing policies. The work accomplishes this through Johnson-Lindenstrauss dimensionality reduction and sampling, then solving optimally on smaller sample. The benchmark used is the optimal 1-Lipshitz pricing policy, which due to the resulting smoothness of the policy, becomes an easier target for dimensionality reduction through Johnson-Lindenstrauss.

__ Strengths__: The work covers formulation, hardness and approximation, with fewer distributional assumptions than prior work.
The approach of moving the assumptions of smoothness from the distribution to the benchmark is interesting where either for practical concerns, there would be complexity restrictions on the actual policy or in a setting it can be shown that there is usually little loss from simplicity or smoothness assumptions on the mechanism.

__ Weaknesses__: The terminology "Myersonian Regression" suggests a more direct tie in to Myersonian Virtual Values / amortization of expected revenue (x(v - (1-F)/f)). I do not think this approach is leveraging that amortization - if it is leveraging and I missed the connect, more discussion of its contribution should be shown.
The results are direct consequences from the assumption of using as a benchmark 1-Lipschitz pricing policies (for which the leveraged Johnson-Lindenstrauss sampling approach works). However, this is a very strong assumption. The optimal auction in simple settings is often a reservation price, which is not a 1-Lipschitz pricing policy (and is never once discussed in the work). The work would be made stronger with empirical justification that optimal 1-Lipschitz pricing policies are close to optimal pricing policies in the given setting. This work claims to have fewer distributional assumptions than other work, but that is really because the distributional assumptions have been moved from the algorithm into the benchmark, without a full discussion of the benchmark performance vs the full optimal benchmark.
AFTER RESPONSE: The authors addressed my (misplaced) concern over applicability of 1-Lipschitz pricing policies.

__ Correctness__: The methods seem to make sense, I was not able to fully verify proofs.

__ Clarity__: The paper was reasonably well written, however more connection to the themes driving results (revenue characterization and sampling) would be helpful.

__ Relation to Prior Work__: The work clearly discusses how it differs from previous contributions.

__ Reproducibility__: Yes

__ Additional Feedback__:

__ Summary and Contributions__: This paper presents the Myersonian loss function (based on Myerson price) and show that finding the optimal hypothesis exactly is NP-hard, but present a PTAS yields an additive approximation of optimal revenue. Depending on a normalization constraint added to the ERM problem (MR vs UMR), the authors show stability of (normalized) MR solutions and possible instability of UMR solutions.
Note: Throughout, I use the citation [MM14] to reference A. M. Medina and M. Mohri. Learning theory and algorithms for revenue optimization in second price auctions with reserve. In Proceedings of the 31st International Conference on Machine Learning (ICML-14), pages 262–270, 2014a

__ Strengths__: The presentation of a PTAS (and lack of FPTAS) to learn the approximate Myerson price (MP) is a strong result, and contributes to a seemingly limited literature on learning MP in a learning theory setting.
Moreover, the work, in showing NP-hardness yet giving a PTAS by a JL projection for learning Myerson price gives a pretty tight characterization on the difficulty of the problem for a thorough study of the problem.
- Having the clear explanation of ``the quadrants'' of online vs offline and contextual vs noncontextual settings was very helpful for me to situate this work.

__ Weaknesses__: SUGGESTIONS FOR IMPROVING SCORE
--------------------------------
- Develop and motivate a bit further the construction of the revenue function at the bottom of page 1 and if surrogates could be used.
- Flesh out the literature review and spend more time contextualizing the results.
- Share how the NP-hardness result is surprising if it is.
- The broader impacts mentions pricing for online marketplaces; mentioning this in the motivation for *why* one might want to learn the Myerson price would strength the motivation of the paper.
=================
AFTER RESPONSE
=================
- Thank you for the elaboration re: the contextualization of the hardness result.

__ Correctness__: In reviewing the proofs, I did not find any inaccuracies.

__ Clarity__: Overall, the paper was well-written, though some of the emphasis could be reframed to focus on the perhaps more surprising results of the existence of PTAS for predicting Myersonian price.
CLARIFICATIONS
--------------------------------
- The origin of some constants in proofs is unintuitive, and it is unclear if they are arbitrary. (ex: exp(5), 10\epsilon)
- Aren't discontinuous losses almost always NP-hard to optimize? In that sense, the result seems unsurprising.
- There is a small difference between your revenue function (above line 25) and the revenue function in [MM14] Equation 1, namely that their v is in \reals^2. Can you speak to this?
- Is there a consistent surrogate that makes sense in this context? ([MM14] uses a convex surrogate, but it is seemingly not consistent, see [MM14] Fig 1b.)
=================
AFTER RESPONSE
=================
-The use of constants for simplifying analysis makes sense; I just wasn't sure if there was something in the analysis I was missing in my reading.
- Emphasizing the beauty of the hardness result might be helpful to readers, though I understand the difficulty given space constraints.

__ Relation to Prior Work__: It took an external review of [MM14] for me to grasp how this paper differs. In the contextual offline revenue maximization setting, [MM14] is the only paper discussed. It is unclear if this is because this is the only other paper that has studied this setting, but given it was the only reference for the paper's setting, a more thorough review and comparison is needed.

__ Reproducibility__: Yes

__ Additional Feedback__: - Figure 1 would be very helpful to have in the body of the paper.
IMPRESSIONS
--------------------------------
The primary contribution of this work, it seems is the PTAS, since [MM14] shows the target loss is discontinuous, and therefore it would seem hard to optimize, making the first result seemingly unsurprising. That being said, the PTAS (and lack of FPTAS) are both interesting results worth publication on their own.

__ Summary and Contributions__: This paper studies Myersonian regression, which a variant of linear regression. The input is a set of samples representing the bids of a buyer for different products. The goal is to compute the (Myerson) pricing policy which maximizes the expected revenue, for any possible distribution from which the samples could be drawn from. The authors study the complexity of this problem: they show that is hard, there is no FPTAS, and present a PTAS that obtains an additive approximation of the optimal possible revenue, under normalization assumptions.

__ Strengths__: The paper is definitely relevant to the NeurIPS community, and is almost complete in terms of sets of results: the authors have proved both the hardness of the problem (which was missing from previous work) and also presented a PTAS for it.

__ Weaknesses__: A concrete example/applications is missing to motivate the study of the problem. Also, the assumption that the samples come from truthful auctions seem quite restrictive (since auctions used in practice are typically not truthful). The authors do mention this as a future direction though.

__ Correctness__: The methods seem to be correct as far I could check.

__ Clarity__: The paper is adequately written.

__ Relation to Prior Work__: The authors thoroughly discuss the related literature and where this paper stands in terms of previous work.

__ Reproducibility__: No

__ Additional Feedback__: You do not need to write ||x||_2 for the 2-norm, ||x|| is enough; 2-norm is the default one for the length notation.
Post rebuttal: Thank you for your response. My opinion remains the same.