NIPS 2016
Mon Dec 5th through Sun the 11th, 2016 at Centre Convencions Internacional Barcelona
Paper ID: 1920 Online Pricing with Strategic and Patient Buyers

Reviewer 1

Summary

The paper considers the problem of setting reserve prices to buyers that participate only for tau rounds and who decide to buy only if the minimum price over the rounds they stay playing is below their reserve. ** After reading the author's rebuttal I have changed my opinion of this problem and I acknowledge that the main contributions of the authors come not from the upper bound but from the lower bound proved in the paper.

Qualitative Assessment

The abstract of the paper needs a lot of improvement. The authors should also put front and center the fact that this can be viewed as a policy regret problem.

Confidence in this Review

3-Expert (read the paper in detail, know the area, quite certain of my opinion)

Reviewer 2

Summary

Problem studied - The paper studies the following model of onilne pricing with patient buyers. In this model there is a seller with unlimited inventory and buyers arrive at each time step. The buyers are patient and willing to wait and each buyer i is willing to wait \tai_i\leq \tau steps. The seller posts a price for the next \tau steps and each buyer will pick the time that is cheapest in its waiting time. The goal for the seller is to maximize revenue. The paper gives an algorithm with regret \tau^{1/3}*n^{1/3}T^{2/3} (where n is the number of possible prices that can be posted) and gives a lower bound of \tau^{1/3}*T^{2/3}. Contributions of the paper. - The model studied in this paper is very interesting and there has been a long line of work on posted pricing for selling items. Unfortunately they do not address the fact that buyers can choose to wait to get a lower price. This paper addresses this by making a reasonable assumption of upper bound on how long each buyer is willing to wait. - The algorithm is fairly natural. It doesn't immediately follow from previous literature and requires the seller to post same price for time steps a bit longer than \tau to get correct estimates for probability of sale at a given price. - The lower bound is non-trivial and is from a reduction to bandits with switching costs.

Qualitative Assessment

Nice paper. The only way this paper can be strengthened is by getting matching upper and lower bounds with respect to number of arms.

Confidence in this Review

3-Expert (read the paper in detail, know the area, quite certain of my opinion)

Reviewer 3

Summary

The authors analyze a type of repeated posted-price auction with patient buyers. They assume a seller producing an unlimited supply of some homogenous good. The seller encounters a sequence of T buyers, each with private valuation v_t for the good. The novelty of this work is to assume that the buyers are patient, that they will wait tau_t time steps, accepting the best price in the window t,t+1,...,t+tau_t. In order to make this behavior optimal for the buyer, it is assumed that the seller announces the price at time t+tau on time step t, where tau is an upper bound on tau_t. Thus, a buyer's decision to accept/reject a price at time t has no influence on any of the prices in t,t+1,...,t+tau. A simple application of EXP3 over Epochs of length T^{1/3}, during which the buyer announces the same price, gives a regret rate of O(tau^{1/3} T^{2/3}) with respect to the best fixed price in hindsight. More interesting is the lower bound demonstrating that the T^{2/3} term cannot be improved.

Qualitative Assessment

The model introduced by the authors is attractive and worth analyzing, primarily for its simplicity; it is a reasonable model of strategic behavior that nevertheless admits a simple characterization of the buyer's optimal behavior. The upper bound is straight-forward and unsurprising. However, the lower bound is non-trivial and involves a reduction from general MAB with switching costs => impatient sellers with switching costs => patient sellers (without switching costs). I found the connection to MAB with switching costs to be surprising and interesting. One downside to this paper is that there are a number of clarity issues with the document, as well as typos and grammar errors. An incomplete list typos, etc., is included below. Several typos in the abstract! 145-146: " (there is a slight gain, by the increase probability of sell, but it does suffices to offset the loss)" 179: Capitalization: this we do again The arguments to the beta function are switched in Lemma 4 compared to elsewhere in the paper. Appendix A, line 335: chi is not defined where it is first used 220: "The idea of construction is as follows" 225: "Algorithm A' receive price" 239: "therefore" One suggestion for improvement: the body of section 4.1 describes the main ideas behind Lemma 4, not Theorem 3. The statement of Theorem 3 should be moved closer to the construction of A' in my opinion.

Confidence in this Review

2-Confident (read it all; understood it all reasonably well)

Reviewer 4

Summary

A seller of an infinite set of identical items can post prices over time as T bidders arrive, wait, possibly obtain an item, and leave. The seller would like low regret compared to the best posted price in hindsight despite adversarially chosen bidders who are strategic about when to purchase in the window that they are present. Restricts to seller strategies that post prices \tau days in advance, where \tau is the longest bidder window; otherwise, complex game-theoretic knots arise. Proves T^(2/3) upper and lower bounds on regret, using a no-regret algorithm that posts fixed prices for T^(1/3) steps at a time.

Qualitative Assessment

I like the model and ideas presented and think the paper does a good job addressing them. There are some presentation problems such as typos and especially the abstract, but these should not be hard to fix. The results also do mainly rely on reducing to previous results. On the positive side, the model seems very interesting. The most intriguing result, which I think you should emphasize more, is that as soon as bidder patience goes from 0 to 1, optimal regret goes from sqrt(T) to T^(2/3). And indeed your proof reduces down to this case for whatever the bidder patience started at. (By the way, the main text should mention that the proof for general \tau is in the appendix, since only \tau=1 is shown and I think that is not mentioned, or else I missed it.) The ideas for the upper and lower bound are also nice in my opinion. I also thought the discussion section was very helpful. ------------- After discussion and author response: My review and opinion stay about the same.

Confidence in this Review

2-Confident (read it all; understood it all reasonably well)

Reviewer 5

Summary

This paper considers an online pricing problem when the seller faces a steam of incoming strategic and patient buyers. The patience of buyers is modeled by a time window, beyond which buyer will leave. The arrival of buyers types are assumed to be adversarial. A EXP3 based learning algorithm is proposed. The basic idea of the algorithm is to bundle time steps are into epochs, and a fixed price (given by EXP3) will be announced for each epoch. Then the algorithm uses the feedback from the entire epoch to update EXP3. Upper bound on learner’s regret can thus be easily established via using the results from EXP3. Then the problem is shown to be reducible from a MAB problem with switching cost; a matching lower bound is developed nicely following the reduction. This paper considered a different worker patience model for online pricing. The techniques for proving the lower bound (relate the strategic buyer model to a MAB with switching cost) are interesting, and may find applications in other problem settings. The results are presented clearly. I read the authors' rebuttal.

Qualitative Assessment

While the problem is well formulated, and both upper and matching lower bound are nicely presented, the reviewer has the following concerns. On modeling: The algorithm requires knowing workers’ patience. It is mentioned in the paper that such information can be gathered through buyers’ reports; but when such information is elicited from workers, would strategic manipulation and reporting complicate the algorithm design? On the lower bound: Based on the reviewer’s reading, the lower bound is established over a learning space that the learning algorithm always posts price forehand for tau+1 time window sizes. So if this is the case, the proved lower bound is not quite the true lower bound, as the learner can be strategic about price posting, and so can the workers respond correspondingly. Also a relevant note: before presenting Algorithm 1, it is mentioned that the intuition is to minimize the number of times it changes prices, as the buyer would select the lower of the two prices for example when the window size is 2. But this argument is again based on the fact prices are pre-posted? If the learner can adjust the prices accordingly, forgoing the current price the buyer may risk missing a lower price in hindsight. Other comments: -The authors mention that modeling patient workers using time window is an important aspect of this work, while considering a discounted buyer will make the problem less trackable. But it looks like the discount model has already been considered in existing results, e.g., in [1], -Just want to clarify the window size = tau (patience) + 1? so the claim that [11] is a special case of the current paper is not precise: [11] has tau = 0, while the set of results developed in this paper are for tau >= 1; as otherwise a sort{T} lower bound can immediately be established based on the results in [11].

Confidence in this Review

2-Confident (read it all; understood it all reasonably well)

Reviewer 6

Summary

This paper consider a trading model where a seller faces a sequence of buyers, with uncertain "patience". Given that the seller does not know when the buyers will buy the product, the seller will update the optimal price after every period. This papers provide algorithm for adjusting price dynamically and achieves a sharp lower bound on the seller's regret level.

Qualitative Assessment

Two comments: 1. While considering the seller's feedback at any day, the paper assumes that the seller has no knowledge about the buyers' patience. It seems reasonable to assume that the seller has a prior of the buyers' patience and this prior will be updated after seeing the buyers' "not buy" choice. 2. Maybe I miss some parts, but it is not clear for me why buyers' behavior is an optimal response for the seller's algorithm. I think the paper should justify this point since it is considering the strategic interactions between buyers and the seller. So the paper should give a description of "equilibrium" under the algorithm.

Confidence in this Review

1-Less confident (might not have understood significant parts)