__ Summary and Contributions__: This paper studies a two-firm problem, where each firm tries to set its price on a single product. Consumers are assumed to have a reference price, which evolves linearly over time based on the prices chosen by the firms at each time step. The demand for the product from firm i at times t is assumed to be linear in the current price vector, and the reference price. The paper studies a stable Nash equilibrium, which is one where each firm is setting a best-response price to the reference price and the price of the other firm. It is shown that such an SNE always exists. Then, the authors show that if each firm runs online mirror descent as a pricing scheme then it is possible to converge to SNE under certain conditions.

__ Strengths__: The theoretical model is clear, and the results are clear.
The paper is well written.

__ Weaknesses__: I'm unsure to what extent this work is of interest to the NeurIPS community.
The model seems very simple. For example, it does not seem super convincing that everything would be linearly-evolving in prices. It's also unclear why firms would observe the demand gradient.
Unclear if this model has any connection to real applications.

__ Correctness__: The claims seem to be correct,

__ Clarity__: Mostly, yes. There are some vagueness issues around the assumption m >= 2 and also around Assumption 1.
It is hard to figure out from the body of the paper whether Assumption 1 is invoked in Section 5. There are occasional references to "the unique interior equilibrium," but it's not clear why this unique interior equilibrium exists, unless Assumption 1 is used.
Edit: After rereading the text right before Assumption 1, it is stated that Assumption 1 is assumed throughout the rest of the paper. I think this should be explicitly highlighted in all theorem statements, or none (right now it's highlighted in Lemma 3.2, but not in later sections where it's arguably more necessary).

__ Relation to Prior Work__: Yes

__ Reproducibility__: Yes

__ Additional Feedback__: This paper is well-written and I found it interesting to read. However, in the end I gave the paper "marginally below," mostly because I am unsure to what extent the paper would appeal to a NeurIPS audience, and because I am unsure how novel the paper is (I am not very familiar with the related literature). From a game-theoretic perspective I don't think the SNE existence and OMD are results are quite interesting enough to clear the bar. Secondly, from a practical perspective I'm somewhat skeptical about the assumption that demand gradients are observed, and similarly that demands evolve linearly in prices seems somewhat unlikely.
On a more specific note, I don't find assumption 1 and footnote 7 very convincing. If the SNE occurs on the boundary then you say that the firm might expand the set of prices. But what if the SNE then loses the best-response condition? In general, why is it guaranteed that they can enlarge the set of prices until a solution occurs in the interior? And what if the set of prices needs to be expanded a lot, then the bounds would be seriously violated?
Minor comments:
"an OMD algorithm has been shown" the OMD algorithm? or OMD algorithms?
"has been studied under various equilibria frameworks" equilibrium
"In this work, unless stated otherwise, we consider m ≥ 2. In particular, we only use this condition for showing our convergence results in Section 5." These sentences seem contradictory. Are you assuming m >= 2 only in Section 5? Unclear as currently written.
In the best response conditions, after stating the inequality, it should say "for all p in the feasible set"

__ Summary and Contributions__: The paper studies the problem of price competition between two firms in a repeated setting. The authors consider a setting in which the value consumers have for the firm is endogeneous, and depends on the history of prices used by the firms. The paper study the concept of stable Nash equilibria, which encodes the traditional condition that the firm best respond to each other, but also ensures that prices are stable (the current reference price coincides with the one induced by the current prices).
The main contributions are as follows:
1) The authors show existence and uniqueness of a stable Nash equilibrium of the problem
2) The authors show how no-regret dynamics (in this case, online mirror descent) lead to convergence to a stable Nash, and characterize the convergence rate of the dynamics.

__ Strengths__: First, I think the model used by the authors is interesting. In particular, to the best of my knowledge, the idea of reference prices is novel and adds a layer of complexity to the problem. Unlike the standard model where customer values are exogeneous and the demand is only a function of the prices set by the firms, the current model considers the case when past prices define consumer expectations and have an effect on their value and on the demand. This seems to model effects that commonly arise in real-life but are not often taken into account in the CS-econ literature.
Second, the paper seems to get the best of both worlds. On the one hand, it assumes agents act in a no-regret manner, which is a much more flexible and permissible model of agent behavior than assuming they are computing (which can be computationally hard) and playing an equilibrium. A plus of this framework is that, on top of being more natural and realistic, it also allows the authors to deal with settings of incomplete information/the players’ update rely on minimum levels of information. On the other hand, even as the agents play simpler no regret strategies, the dynamics converge to a strong concept of equilibrium where agents best respond to each other and prices are stable.
From a technical point of view, I think the idea of explaining the behavior of nature as running its own online mirror descent to compute the reference prices, and reducing the 2-player pricing problem with reference prices to a 3-player pricing problem, is interesting.
Finally, the authors show a reasonable (1/t) rate of convergence of their dynamics under the right regularization and choice of (decreasing) step size. They provide additional conditions under which taking a constant step size is sufficient for convergence; in this case, the algorithm gains both in simplicity and in faster convergence rates (e^t), which constitutes a nice addition to the results of the paper.

__ Weaknesses__: One of the main weaknesses of the paper is that references prices and demand functions have a simple linear form, which seems fairly unrealistic. However, I think this is fine as these models have been studied in previous work, and as the focus of the paper is in the novelty of having reference prices and in the way they affect the demand.
Further, the experiments and Figure 1 do not seem to add much to the theory, and basically show the behavior already stated in the theorems (except for 1c that exhibits an oscillating behavior for a bad choice of constant step size). I think a useful plot that would complement the theory rather than repeating it would compare the theoretical convergence rates to the ones obtained in practice.

__ Correctness__: I have not been able to check the proofs in the paper very carefully, but the proof ideas seem reasonable at a high level.

__ Clarity__: I think the paper is well written and easy to follow. I believe the authors do a good job of motivating the problem they are looking at and explaining the novelty and the technical challenges of their contributions. I also like that, despite the short page limit, the authors spend a good amount of time discussing their algorithms and results, explaining their choices of parameters, and providing examples of when their conditions hold/do not hold.

__ Relation to Prior Work__: The authors position themselves clearly with respect to previous work. The introduction makes the position they take on the information available to the firms and their model of strategic behavior clear compared to previous work. They also provide a discussion of how their proof techniques differ from previous work in Section 5.3

__ Reproducibility__: Yes

__ Additional Feedback__: Post-rebuttal: thanks to the authors for their detailed response! I think Reviewers 1 and 4 bring up valid concerns, and it seems that some of reviewer 4's concerns are still not fully addressed after the rebuttal. I however did not lower my score and still recommend this paper for acceptance, as I think that the study of reference effects in competition is novel to the best of my knowledge, even if the model proposed by the authors is still simple. I also think that the fact that the authors show that no-regret dynamics converge to SNE is interesting, even if these results may not be too surprising. I believe this work to be of interest to the CS-Econ community in general and to the game theory track at NeurIPS.

__ Summary and Contributions__: This paper studies a multi-period model: in each period, each of two firms posts a price for each product. The consumers demand for each of the products is linear in both prices and in addition linear in the reference price which captures past prices. Specifically, it is a weighted average of the previous-period reference price and the two previous-period posted prices. The firms do not know the specific demand function and have access only to the derivative of their revenue (a function that maps a price to revenue which is equal to demand times price) which is denoted by g_i(p_i). Note that g_i() depends on the other parameters but the firm views them as constants and is able to feed g_i with a possible choice of p_i and then get the revenue derivative for that price choice.
The paper analyzes a natural learning process through which firms can reach market stability which is defined as a steady state (p^*, r^*) satisfying: (i) p_i^* maximizes the period t revenue of firm i given that the other firms posts a price p_{-i}^* in period t and that the reference price in period t is r^*, (ii) if the state in period t is (p^*, r^*) then the reference price period t+1 is r^*. I believe that this is a natural stability condition, and the paper gives a constructive proof that, for every initial reference price r, there exists a learning process that converges monotonically to market stability. However, this learning process relies on knowing which prices maximize revenue. Naively, this would require knowledge of all parameters – the opponent’s posted price, the reference price, and the coefficients of the demand function.
The paper then suggests another learning algorithm that relies on carefully predetermined fixed strongly convex function R and “step sizes” epsilon_{i,t} for every firm i and period t. The algorithm sets the p_{i,t+1} so that the difference between R’(p_{i,t+1}) and R’(p_{i,t}) divided by the step size epsilon_{i,t} will be equal to g_i(p_{i,t}) or to the upper or lower bound of the price interval if no interior point satisfies the equality. The paper shows that with the appropriate choice of the R’s and the step sizes, this algorithm converges to market stability where for any t the Euclidean norm of the difference between p_t and p^* is O(1/t).
There is the issue of why the proposed algorithm is different in a meaningful way from standard gradient descent algorithms and other results on learning dynamics in games, which the paper tries to resolve in section 5.3. There are also the standard issues of justifying the demand model and explaining whether the proposed learning algorithm captures any realistic aspects (otherwise, why is it interesting?). I am still in favor of acceptance as I believe that the analysis is novel, that the learning community should better understand such learning dynamics, and that the paper can serve as a beneficial step in that direction.

__ Strengths__: See above.

__ Weaknesses__: See above.

__ Correctness__: See above.

__ Clarity__: YES

__ Relation to Prior Work__: See above.

__ Reproducibility__: Yes

__ Additional Feedback__:

__ Summary and Contributions__: This work proposes a model for two firms setting prices in a large consumer market where consumer demand is dependent on a memory of historical prices. Deterministic, fixed step-size dynamics are assumed known for the reference price allowing analysis of the market as a 3-player game. Assuming conditions on the parameters of the consumer demand model, reference price dynamics, and access to gradient information, firms employing no-regret algorithms are proven to converge to a stable Nash equilibrium under decaying, and in special cases, fixed step sizes. The reference price is also shown to stabilize. A few experiments are conducted to support the claims.

__ Strengths__: The theoretical analysis is sound and allows a general family of step sizes and strongly convex regularizers for the firms employing online mirror descent. The general setting firms interacting in an environment that changes due to their interactions is interesting.

__ Weaknesses__: I am unconvinced of the significance and relevance of this work to the NeurIPS community. This work analyzes a 3-player game, each player with a 1-d "action" in a deterministic environment, i.e., the parameters of the game are fixed. If the authors assume the standard Euclidean regularizer, it seems as though the proofs in this paper can be mostly reasoned about by considering a linear dynamical system and basic results from variational inequalities (VI). For example given the known reference price dynamics, the Jacobian of the map F for an appropriate VI(F,X) is:
J = [2\beta_1, -\delta_1, -\gamma_1]
[-\delta_2, 2\beta_2, -\gamma_2]
[-\theta_1, -\theta_2, 1].
Given assumptions on the parameters in the paper (line 90 and 113), this matrix is diagonally dominant, therefore, has eigenvalues with positive real part (note updating the system looks like x <-- x - step * (J * x + b) so positive means continuous time convergent). The authors assume the SNE exists in the interior of the set. With this assumption, the SNE is obviously unique because this linear system has a single fixed point. The upper left 2 x 2 can be shown to be positive definite (J+J^T > alpha * I > 0) which means that the appropriate VI that models the two firms is strongly monotone [1]. The solution to a strongly monotone VI is unique and a Nash equilibrium [2]. The point of this is simply to reason that the results are not very surprising.
Given the complexity of the proof techniques used in the paper, I would have liked to have seen results that are general to n firms, consider a stochastic setting or unknown reference price dynamics, and/or examine a non-linear model.
[1] "Projected Dynamical Systems and Variational Inequalities with Applications"; Nagurney & Zhang; '12
[2] "Convex Optimization, Game Theory, and Variational Inequality Theory"; Scutari, Palomar, Facchinei, Pang; '10
After rebuttal: 1. I see your point that the SNE w.r.t. a moving ref price is more nuanced than an NE w.r.t. a fixed ref price, but the moving ref price moves in an amount proportional to the firms; if the firms converge, then the ref price converges and is effectively fixed. My argument above was not only about uniqueness, the diagonal dominance also suggests convergence. 2. I understand decaying step sizes complicate analysis. Note fixed heterogenous step sizes are handled by VIs: x <-- x - step * D * (J * x + b) = x - step * (D * J * x + D * b) where D is a pos def diagonal matrix that accounts for differences in step sizes between players. Note diagonal dominance still holds for D * J and monotonicity of the upper left 2 x 2 still holds for a range of relative step sizes for the two firms. IMO, viewing the reference price as an OMD player is not a primary contribution. Any linear update rule can be written as gradient descent on some objective. Also, referring to it as OMD is a bit misleading; the ref price updates with vanilla online gradient descent. Maybe I missed a more general analysis. 3. The player loss functions in [4] are not fixed; they may change with time. Also, the view that opponents in a game can be viewed instead as a non-stationary environment is well known; you simply reverse this argument. All in all, this paper presents an analysis of a deterministic, 3-d, linear dynamical system. Complexities arise from the fact that a) the firms may use more general OMD updates (not necessarily OGD) which could make the system nonlinear and b) decaying step sizes while the ref price uses a fixed step size. I'm raising my score to marginally above as it appears the other reviewers are interested in this analysis. I've relayed my comments to the AC and I'm okay with acceptance if the other reviewers are.

__ Correctness__: I did not check all proofs, particularly Theorem 5.3 and on, but I do not doubt the claims. The empirical methodology also appears sound.
The claim is repeatedly made that "In the multi-agent online learning literature, the games are stateless and all agents are required to take decreasing step sizes". This is not true or at least misleading. Online learning specifically looks at environments that change over time, possibly adversarially and agents use no-regret algorithms with fixed step sizes (assuming the time horizon is known) [3,4]. Moreover, the most basic algorithm from variational inequality theory, the "projection method", converges for fixed step sizes assuming certain properties of the vector field (e.g., strongly monotone) [1,2].
[3] "No-regret learning in convex games"; Gordon, Greenwald, and Marks; '08
[4] "Online Monotone Games"; Gemp and Mahadevan; '17 *note the player loss functions here are not fixed; they may change every time step.

__ Clarity__: The paper is well written although possibly overly dense given the simplicity of the model. Some of my concern likely comes from not fully digesting that the system being analyzed was linear (regularizers aside) and 3-d. The paper may benefit from providing the reader with more information up front. A paragraph giving a simple proof sketch or explanation near the beginning of the paper outlining why the results are likely true followed by an argument for why an in-depth analysis is both important and challenging is missing.
The modeling decisions were generally well motivated however the authors argue parameters are chosen in a way because it is sensible that "demand is most affected by a firm’s own prices relative to competitor’s prices and surcharge/discounts." It is a common technique when designing a linear system to choose parameters such that the system's Jacobian is diagonally dominant so that the system will be convergent. It seems this was likely an additional motivation and would be worth mentioning to the reader if so.

__ Relation to Prior Work__: The related work focuses primarily on the economics literature, which is partly telling why this paper feels less relevant to the NeurIPS community. I am not familiar with many papers in the ML literature similar to this one (e.g., [5] maybe?) and so I'm not sure what previous contributions might be missing.
[5] "Smooth markets: A basic mechanism for organizing gradient-based learners; Balduzzi et al; '20

__ Reproducibility__: Yes

__ Additional Feedback__: I will happily raise my score if the other reviewers find this paper relevant and the results important to the "Game Theory and Computational Economics" track as this is my main concern. I realize the proofs in this paper go beyond vanilla "gradient descent" for the firms and the analysis around step sizes is a bit involved -- I don't mean to trivialize your work with my back-of-the-envelope calculations above. It is just to explain that the claims you have proven do not seem like they would be unexpected.
I made suggestions above around proving claims for a more complex model.
line 40: "has..." --> "have been studied"