{"title": "Optimal Pricing in Repeated Posted-Price Auctions with Different Patience of the Seller and the Buyer", "book": "Advances in Neural Information Processing Systems", "page_first": 941, "page_last": 953, "abstract": "We study revenue optimization pricing algorithms for repeated posted-price auctions where a seller interacts with a single strategic buyer that holds a fixed private valuation.
When the participants non-equally discount their cumulative utilities, we show that the optimal constant pricing (which offers the Myerson price) is no longer optimal.
In the case of a more patient seller, we propose a novel multidimensional optimization functional --- a generalization of the one used to determine Myerson's price. This functional allows one to find the optimal algorithm and to boost the revenue of the optimal static pricing by an efficient low-dimensional approximation.
Numerical experiments are provided to support our results.", "full_text": "Optimal Pricing in Repeated Posted-Price Auctions
with Different Patience of the Seller and the Buyer

Arsenii Vanunts
Yandex
Moscow, Russia
avanunts@yandex.ru

Alexey Drutsa
Yandex; MSU
Moscow, Russia
adrutsa@yandex.ru

Abstract

We study revenue optimization pricing algorithms for repeated posted-price auctions where a seller interacts with a single strategic buyer that holds a fixed private valuation. When the participants non-equally discount their cumulative utilities, we show that the optimal constant pricing (which offers the Myerson price) is no longer optimal. In the case of a more patient seller, we propose a novel multidimensional optimization functional — a generalization of the one used to determine Myerson's price. 
This functional allows one to find the optimal algorithm and to boost the revenue of the optimal static pricing by an efficient low-dimensional approximation. Numerical experiments are provided to support our results.

1 Introduction

Auctions have been studied for decades [82] and have remained the main instrument for extracting revenue in Internet advertising for many years [36]. The revenue optimization problem in static (i.e., one-period) auctions is well studied and has proved its great worth to the Internet industry [64, 2], while the same problem in dynamic auctions is still understudied [57], although the major part of web advertisement sales is repeated in nature [7, 32]. Consider the following example: an RTB platform (a seller) tracks a user and repeatedly sells impressions on the user's screen to advertisers (buyers) until the user is out of the RTB's sight. This example is naturally modeled by a sequence of repeated auctions, in which the buyers have a fixed valuation for the good all the way through.
For more than eleven years, generalized second-price (GSP) auctions have remained the leading instrument for selling ads [79] and, as argued by [7, 8, 61, 30, 32, 33], a significant part of auctions in AdExchanges involve only a single buyer. Single-buyer GSP auctions are known in the literature as posted-price auctions [51]. Their repeated setting is referred to as repeated posted-price auctions in studies on worst-case regret minimization [8, 32] and as a fishmonger's problem in studies on expected revenue maximization [29]. The setting of the fishmonger's problem relies on the assumption that the seller knows the distribution of the buyer's valuation of a good. 
This assumption is realistic for advertising auctions, since most Internet companies possess rich historical bidding data [58, 67].
We study the fishmonger's problem in which the seller repeatedly sells goods through a posted-price mechanism to the same buyer, who holds a fixed private valuation for a good. The buyer seeks to maximize his cumulative surplus, which is a discounted sum of his instant utilities over all rounds. The seller knows the valuation distribution and the buyer's discount sequence; so, she applies a pricing algorithm that sets prices in each round in order to learn the valuation and extract more revenue. The algorithm is announced to the buyer in advance [61]; thus, the buyer picks an optimal strategy w.r.t. the announced algorithm, the valuation, and his discount sequence. The seller optimizes her expected cumulative revenue — a discounted sum of her instant expected utilities over all rounds — w.r.t. her discount sequence, the valuation distribution, and the buyer's discount sequence.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

When both the seller and the buyer equally discount their utilities, an optimal pricing is known from a "folklore wisdom" [29]: it is the constant pricing algorithm that proposes the Myerson optimal price [64] each round. Thus, the seller cannot advantageously apply any dynamic learning of prices (based on previous decisions of the buyer) to improve her revenue with respect to a much simpler approach that offers the optimal static constant price over all rounds. However, in many real applications, the equal-discount assumption may not hold due to an imbalance between the sides in their patience to wait for utility [7, 8, 61] or in their ability to estimate the probability that the game does not terminate in a given round [75, 61]. 
The case of our setting where the time discounts of the seller and the buyer are different has never been studied before1. In this work, we attempt to fill this gap.
In the case of a less patient seller (i.e., the seller's discount is less than the buyer's one), we show that the "folklore wisdom" technique can be easily adapted to prove that the "Big deal" algorithm2 is optimal. The expected revenue of this pricing is shown to be strictly greater than the one of the optimal constant algorithm, if the seller is strictly less patient. In the inverse case (the buyer is less patient), we show that the problem is much more challenging and cannot be resolved by the "folklore wisdom" or Myerson techniques. The problem in its initial form has a structure similar to a saddle-point problem: the revenue depends on an algorithm via an argmax over buyer strategies, and the derivatives of this dependence have an exponential number of jump discontinuities (Sec. 4). Hence, the initial revenue optimization problem can be numerically solved only via a brute-force search.
In our work, first, for the game with a finite horizon T, we reduce the problem to the optimization of a novel multivariate functional (Th. 3) that constitutes a generalization of the one used to determine Myerson's price. This functional has a simple bilinear-like structure and is continuously differentiable as many times as the CDF of the valuation distribution. This allows one to find the optimal pricing algorithm by means of a variety of efficient gradient-based methods. Second, for any game, we make a low-dimensional approximation of the optimal revenue problem by an optimal τ-step algorithm3, which can be found using our reduction approach as well (Sec. 5). In this way, our multivariate functional constitutes a powerful and simple technique that allows the seller to significantly increase her revenue (w.r.t. 
the optimal static pricing) even in games with large T. So, we provide the following rule of thumb: choose τ to fit your computational capabilities (e.g., τ = 2, 3), find the optimal τ-step pricing by the functional, and apply the prices learned in this way to get a boost in revenue.
Finally, we support our findings by extensive numerical experimentation for a variety of discount rates. We demonstrate that optimal algorithms are non-trivial, may be non-consistent [30, 32], have prices noticeably dependent on the discounts, and generate revenue larger than the constant algorithm with Myerson's price (Sec. 5). Overall, our main contribution is our reduction approach that allows both to find the optimal algorithm (even with possible structural constraints) and to boost revenue by the efficient low-dimensional approximation in the case of a less patient buyer.

2 Problem statement and preliminaries

Setup. A single seller and a single buyer interact repeatedly over a sequence of T rounds, where the horizon T is either finite or infinite. The seller possesses a fresh copy of a good each round, and the buyer values each copy of this good at a fixed private valuation v ∈ R+. At each round t ∈ [T] := {1, ..., T}, the seller sets a price p_t for a new copy of the good, and the buyer answers with a decision a_t ∈ {0, 1}: an accept 1 or a reject 0. Sequences of the buyer's answers are denoted by bold Latin letters, e.g., a = {a_t}_{t=1}^T, and are referred to as buyer strategies. The price p_t depends on the previous answers a_1, ..., a_{t−1} of the buyer, i.e., the seller uses a deterministic pricing algorithm A to set prices [30].
Given an algorithm A and a strategy a, the price sequence {p_t}_{t=1}^T is uniquely determined. The instant utilities of the buyer and the seller in round t are a_t(v − p_t) and a_t p_t, respectively. 
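These per-round quantities and the discounted totals built from them can be computed directly from a realized price/answer sequence. A minimal sketch (hypothetical numbers; geometric discounts, as adopted below, are assumed):

```python
def cumulative_utilities(prices, answers, v, gamma_b, gamma_s):
    """Discounted totals over T rounds: the buyer side sums
    gamma_b^(t-1) * a_t * (v - p_t), the seller side sums
    gamma_s^(t-1) * a_t * p_t (t is 1-based; enumerate gives t-1)."""
    surplus = sum(gamma_b**t * a * (v - p)
                  for t, (a, p) in enumerate(zip(answers, prices)))
    revenue = sum(gamma_s**t * a * p
                  for t, (a, p) in enumerate(zip(answers, prices)))
    return surplus, revenue

# Hypothetical run: constant price 0.5, buyer accepts every round.
sur, rev = cumulative_utilities([0.5, 0.5, 0.5], [1, 1, 1],
                                v=0.8, gamma_b=0.9, gamma_s=1.0)
```

For these numbers, sur = 0.3·(1 + 0.9 + 0.81) ≈ 0.813 and rev = 1.5, matching the surplus and revenue definitions given next.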
The instant utilities contribute to the buyer's (the seller's) total utility w.r.t. a discount γ_t^B (γ_t^S, resp.). The total utilities of the buyer and the seller are referred to as the buyer surplus and the seller revenue: Sur_{γB}(A, a, v) := Σ_{t=1}^T γ_t^B a_t (v − p_t) and Rev_{γS}(A, a) := Σ_{t=1}^T γ_t^S a_t p_t, resp. Both the buyer and the seller are rational and risk-neutral agents [52]. The discount sequences γB = {γ_t^B}_{t=1}^T and γS = {γ_t^S}_{t=1}^T are positive, γ_t^B, γ_t^S > 0 ∀t, and have finite sums: ΓB := Σ_{t=1}^T γ_t^B, ΓS := Σ_{t=1}^T γ_t^S < ∞. For simplicity of presentation, from here on in our paper we assume that the discounts decrease geometrically: γ_t^B = γB^{t−1} and γ_t^S = γS^{t−1} for some γB, γS > 0. However, our results hold for a larger variety of discounts (see Remarks 1 and 2).

1 There were works (e.g., [7, 8, 30, 32, 33]) where only the buyer utilities were discounted, while the seller's ones were not. But those studies considered worst-case regret optimization, which is different from our setting.
2 This pricing offers an up-front payment for all copies of a good for Myerson's price in the first round.
3 A τ-step algorithm plays all the T rounds, but its prices do not change after the round τ < T.

Our setting is based on two standard assumptions: (1) the buyer knows the pricing algorithm A in advance (i.e., the seller commits to it at the beginning of the first round); and (2) the seller knows the distribution D from which the private buyer valuation v (unknown to her) is drawn. Assumption (1) matches the practice in Internet advertising [61], since RTB platforms run hundreds of millions of auctions a day [7, 30] (see App. F for more details). Assumption (2) is realistic since most Internet companies have access to rich historical data [67]. We also assume that the seller knows the exact buyer discount sequence.4 The CDF and the density of D are denoted by F_D and f_D, resp. The random valuation is denoted by V, V ∼ D.
Our rational buyer with the private valuation v who knows the algorithm A in advance is referred to as a strategic buyer [7] and exploits an optimal strategy aOpt(A, v, γB) := argmax_{a∈ST} Sur_{γB}(A, a, v), where ST := {0, 1}^T is the set of all possible strategies. This leads us to the definition of the strategic revenue of the pricing A, which faces the strategic buyer with a valuation v:

SRev_{γS,γB}(A, v) := Rev_{γS}(A, aOpt(A, v, γB)).   (1)

We consider the problem of pricing optimization from the seller's point of view. This problem is stated as follows: find an algorithm A* such that its expected strategic revenue (ESR) E_{V∼D}[SRev_{γS,γB}(A*, V)] is not less than the ESR of any other algorithm (i.e., the ESR of the algorithm A* is the maximum).
Notations and auxiliary definitions. Following [51, 61, 30, 32], we associate a deterministic pricing algorithm with a perfect (T−1)-depth binary tree with labeled nodes. Let NT be the set of nodes of the tree and A(n) be the label of a node n ∈ NT. In the first round, the current node is the root e ∈ NT. Let n be the current node in a round t (the depth |n| of n is t − 1); then the algorithm offers the price A(n). If this price is rejected, the current node moves to n's left child, denoted by n0; otherwise, the current node moves to n's right child, denoted by n1. We denote nodes by finite strings over the alphabet {0, 1}: the root is the empty string e, its left child is 0, the right one is 1, the right child of 0 is 01, etc. (e.g., 0^k is the string of k zeros). 
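To make this tree representation concrete, here is a minimal sketch (hypothetical prices, not from the paper) that stores an algorithm as a map from answer-history strings to prices, brute-forces the strategic buyer's optimal strategy aOpt over all 2^T strategies, and evaluates the resulting strategic revenue:

```python
from itertools import product

def prices_along(algo, answers):
    """Prices offered on the path induced by an answer sequence.
    algo maps a history string ('' = root, '0', '01', ...) to a price."""
    hist, prices = "", []
    for a_t in answers:
        prices.append(algo[hist])
        hist += str(a_t)
    return prices

def surplus(algo, answers, v, gamma_b):
    # gamma_b**t is gamma_b^(t-1) in 1-based rounds (enumerate is 0-based)
    return sum(gamma_b**t * a * (v - p)
               for t, (a, p) in enumerate(zip(answers, prices_along(algo, answers))))

def a_opt(algo, v, gamma_b, T):
    """Strategic buyer: argmax of discounted surplus over {0,1}^T."""
    return max(product((0, 1), repeat=T),
               key=lambda a: surplus(algo, a, v, gamma_b))

def strategic_revenue(algo, v, gamma_s, gamma_b, T):
    a = a_opt(algo, v, gamma_b, T)
    return sum(gamma_s**t * a_t * p
               for t, (a_t, p) in enumerate(zip(a, prices_along(algo, a))))

# Hypothetical T = 2 tree: root offers 0.6; 0.4 after a reject, 0.8 after an accept.
algo = {"": 0.6, "0": 0.4, "1": 0.8}
```

For v = 0.7 and γB = 0.5, the buyer rejects the affordable first price to unlock the cheaper branch (aOpt = (0, 1)), illustrating why the seller must anticipate strategic, non-myopic behavior.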
Thus, NT := {n ∈ {0, 1}* : |n| < T}, where |n| is the length of the string n, and the set of algorithms AT is the set of maps from NT to R+: AT = R+^NT.
Research questions. One of the standard interpretations5 of a discount factor γB^{t−1} (or γS^{t−1}) is the participant's estimate of the probability that the repeated auctions will last at least t rounds [29, 32]. While the constant Myerson algorithm (see Sec. 3) is a well-known folklore solution for the case of equal discounts [29], the case of different discounts, in our setup, has never been considered earlier, although it is more realistic. In Internet advertising, the seller and the buyer are usually companies of different sizes, with different opportunities and capabilities (e.g., an RTB platform vs. a web site with an advertisement; see App. F for an example as well). In this way, they may have different data (or access to them) that are used to estimate the game-continuation probability (i.e., the discount factor). For instance, most likely, the RTB platform has more data and may know which data are not available to the advertiser. As a result, the auction participants have different discounts.
In the case when the buyer overestimates the discount factor, γB > γS, we show that the seller can obtain a (1 − γS)/(1 − γB) times larger expected revenue than the one of the constant Myerson algorithm: she should apply the "Big deal" pricing algorithm (Sec. 3). The inverse case appears to be non-trivial, and, in our study, we primarily address the following research questions in the case of γS > γB (Sec. 4 and 5): (1) What is the optimal algorithm and its expected strategic revenue? (2) How much larger is the maximal ESR than the constant Myerson one? (3) Can the seller extract more expected revenue than in the static Myerson pricing, having limits on computational resources?
Related work. 
There are two series of works that are most relevant to ours. The first one studied repeated posted-price auctions in the worst-case scenario [51, 7, 8, 61, 30, 31, 32], where our setting of the strategic buyer with a fixed private valuation is considered. Amin et al. [7] proposed to seek algorithms that have the lowest possible asymptotic upper bound on the strategic regret for the worst-case valuation of the buyer. Recently, Drutsa [30, 32, 33] has found pricings with an optimal regret bound. In contrast to these studies, first, we search for a pricing algorithm that maximizes the strategic revenue expected over buyer valuations, which matches the practice of ad exchanges and optimization goals in classical auction theory [52]. Second, our revenue optimization problem is solved exactly (not via optimization of lower/upper bounds). Third, our study considers a more general setup in which not only the buyer's surplus is discounted over rounds, but also the seller's revenue is.

4 Our results can still be applied in the case when the seller possesses only incomplete information about the buyer's discount sequence. See Sec. 6 for more details.
5 Alternatively, a discount factor can model the patience level of a participant to wait for instant revenue [7, 30].

The second series studied repeated posted-price auctions with incomplete information and in the absence of the ability to commit [43, 75, 29, 47]. The authors of [43, 75] showed that, in the case of non-commitment, the seller is forced to sell a good for a minimal possible price until the last few rounds. Devanur et al. [29] showed that the seller can obtain non-trivial revenue if she is able to partially commit, e.g., to commit not to raise prices. Enhancing the competition was shown to allow the seller to extract non-trivial revenue as well [47]. 
However, all these works treated the "commitment" revenue as an unachievable benchmark. Hence, in our case of repeated auctions, which is motivated by Internet advertisement sales, where the seller is able to commit6, it is unreasonable to consider the non-commitment case. Our setting (but in the case of equal discounts) can be considered as a more general dynamic mechanism design problem studied, e.g., in [49, 71, 70, 13]. To the best of our knowledge, this line of work never considered scenarios with different discounts. It would be a great future study to generalize our results for different discounts to more general dynamic mechanisms.

3 Less patient seller: the case of γS ≤ γB

Our study begins with the analysis of the case γS ≤ γB in two steps. First, the subcase of equal discounts γS = γB can be resolved by means of the classical auction theory [64]. Second, we reduce the whole case γS ≤ γB to the subcase γS = γB by showing that, for γS ≤ γB, the seller can obtain the same strategic revenue as if her discount were γB instead. Two simple optimal algorithms are provided.
Equal discounts: a constant algorithm. Let γS = γB = γ; then one can apply the almost folklore technique of reducing this subcase to a single-round feasible mechanism [29]. The key steps of this technique are provided in App. A.1.1 for completeness of our study on different discounts. The expected revenue of the obtained feasible mechanism is known [64] to be no greater than p*_D (1 − F_D(p*_D)), where F_D is the CDF of our valuation variable V ∼ D and p*_D is the Myerson price, i.e., the price that maximizes the functional H_D(p) := p P[V ≥ p] = p(1 − F_D(p))7. Thus, the following upper bound holds: E[SRev_{γ,γ}(A, V)] ≤ Γ p*_D (1 − F_D(p*_D)) ∀A ∈ AT. This bound is achieved, in particular, by the algorithm A*_1, which constantly offers the price p*_D, i.e., ∀n A*_1(n) = p*_D, and is referred to as the optimal constant algorithm. Overall, the following theorem holds:
Theorem 1 ([29]). Let the discount rates be equal: γS = γB = γ. Then the optimal constant algorithm A*_1 is optimal among all pricing algorithms AT, and the optimal revenue is Γ p*_D (1 − F_D(p*_D)).
"Big deal" for a less patient seller. Let us consider the whole case of a less patient seller: γS ≤ γB. It is easy to see that Rev_{γ1}(A, a) ≤ Rev_{γ2}(A, a) for any A and a, when γ1 ≤ γ2. Hence, for any A, v, and γ1 ≤ γ2, the inequality SRev_{γ1,γB}(A, v) ≤ SRev_{γ2,γB}(A, v) holds as well, since the optimal strategy aOpt does not depend on the seller's discount γS. So, taking γ1 = γS and γ2 = γB, one gets:

max_{A∈AT} E[SRev_{γS,γB}(A, V)] ≤ max_{A∈AT} E[SRev_{γB,γB}(A, V)] = ΓB p*_D (1 − F_D(p*_D)).   (2)

The latter identity in Eq. (2) is from Th. 1. The bound in Eq. (2) is achievable as well. Namely, let us consider the following algorithm A*_bd (referred to as the "big deal") given γB and V ∼ D: the first price is A*_bd(e) = ΓB p*_D; if the buyer accepts it, the prices in the further rounds will be A*_bd(1 ◦ n) = 0 ∀n; otherwise, A*_bd(0 ◦ n) = p*_D ∀n. An attentive reader may note that the strategic buyer accepts the first price A*_bd(e) ⇔ v > p*_D. Hence, similarly to the algorithm A*_1, it is easy to show that the ESR of A*_bd is ΓB p*_D (1 − F_D(p*_D)). The key idea behind the algorithm A*_bd is quite simple. 
Roughly speaking, the seller "accumulates" all her revenue at the first round by proposing to the buyer a "big deal" that incentivises him to pay a large price at the first round and get all goods in the subsequent rounds for free, or, otherwise, get nothing8. Overall, the following theorem holds:
Theorem 2. Let the discount rates be s.t. γS ≤ γB. Then the "big-deal" algorithm A*_bd is optimal among all pricing algorithms AT, and the optimal revenue is ΓB p*(1 − F_D(p*)).

6 RTB platforms run 10^8 auctions a day: a commitment violation will be easily seen by advertisers; see App. F.
7 This price can be found from the equation p = (1 − F_D(p))/f_D(p), when D has a continuous probability density f_D.
8 A similar pricing was used in [49] for mechanism environments with multiplicative separability.

Th. 2 implies that, first, the optimal constant algorithm A*_1 is not the unique optimal one in the subcase of equal discounts γS = γB. Second, in the other subcase of γS < γB, the constant algorithm A*_1 is no longer optimal: the relative ESR of the optimal algorithm A*_bd w.r.t. the optimal constant one A*_1 is ΓB/ΓS, which is > 1 when γS < γB9; i.e., the optimal revenue is larger than the one obtained by offering the Myerson price constantly. This result is quite inspiring for the seller, since the dominance of the buyer's discount γB over the seller's one γS suggests a hypothesis that the seller should earn less than with γB (e.g., see the revenue of A*_1). But the ability of the seller to apply the trick of "accumulation" of all her revenue at the first round allows her to get the payments for all goods, discounted by the buyer's γB, at the first round and thus to boost her revenue over the constant pricing.
Remark 1.
All results of the section hold even for non-geometric discounts s.t. γS ≤ γB (see App. A.1).

4 Less patient buyer (γS ≥ γB): reduction to an optimization functional

This section provides the central fundamental results of our study. They are obtained for finite games, but, further, we show how to use them to get approximately optimal algorithms even for infinite games. In contrast to the case γS ≤ γB, finding an optimal pricing for γS ≥ γB is a much more difficult problem, since the technique used in Sec. 3 to upper bound the expected strategic revenue is no longer applicable (because it relies on the condition γS ≤ γB) and a generalization of the functional H_D(·) to a multivariate analogue is required. Note that the optimization problem of the ESR has a structure similar to a saddle-point problem: the ESR depends on A via aOpt, which is an argmax over the set of strategies ST. Moreover, the derivatives of this dependence are piecewise continuous with jump discontinuities on the boundaries of the pieces (there are 2^{2T}−3 pieces with derivatives of different forms). Hence, the problem in its initial form can be numerically solved only via brute-force search.
In order to make the numerical solution of the problem more feasible, we reduce it to the form of a multidimensional maximization of a simple bilinear-like function (namely, L(v) in Eq. (4)) that is continuously differentiable as many times as the CDF F_D; its derivatives have a simple form and can be easily computed. The key steps are: (1) find a class of algorithms whose prices (2) can be linearly parametrized by points in the support of D s.t. (3) the strategic revenue is constant between these points. For the sake of presentation, we consider regular discounts.
Definition 1.
A discount sequence γ is regular if γ·a1 ≠ γ·a2 for any strategies a1, a2 ∈ ST, i.e., any buyer strategy a ∈ ST results in a unique discounted quantity of purchased goods (a·b := Σ_t a_t b_t).
Definition 2. Let γ be a discount; then an algorithm A ∈ AT is said to be completely active (CA) for γ if for any strategy a ∈ ST there exists a valuation v ∈ R+ s.t. S_a(v) = S(v), where S_a(v) := Sur_γ(A, a, v) and S(v) := S_{aOpt(A,v,γ)}(v), i.e., the surplus function S_a (as a line) is tangent to the optimal surplus function S. We denote the set of all CA algorithms for γ by ÃT(γ).
A CA algorithm is such that any node in its labeled tree can be reached by the strategic buyer for at least one valuation v, i.e., be active. Surprisingly, any algorithm can be transformed into a completely active one for γB with no loss in the expected strategic revenue. Indeed, let A be a non-CA algorithm for γB; then there exists an inactive strategy a ∈ ST (i.e., ∀v ≥ 0 S_a(v) < S(v)). We tune A in such a way that S_a becomes tangent to S without affecting the other surplus functions S_b for b ≠ a (it is visualized in Fig. A.1 in App. A.2.1). Namely, let τ be the index of the last 1 in a and n := a_{1:τ−1} be the (τ−1)-round substrategy of a. We decrease p := A(n) until S_a becomes tangent to S. This operation will also move all S_b s.t. b_{1:τ} = a_{1:τ} to the left. In order to make them unaffected, we simultaneously increase p_s := A(n ◦ 10^s) for 0 ≤ s ≤ T − τ − 1 in such a way that p + γB^{s+1} p_s = const. Hence, aOpt(A, v, γB) is unaffected for all v except the point of tangency. Since γS > γB, the revenues Rev_{γS}(A, v, b) only increase after our tuning when b_{1:τ} = a_{1:τ}; otherwise, they are not changed for b ≠ a, which implies that SRev_{γS,γB}(A, ·) increases at all points except one. Tuning the algorithm by "activating" all inactive strategies one by one in descending order of τ (this ensures that decreasing p will not result in negative prices) gives us a CA (for γB) algorithm without loss in the ESR. Formally, the following proposition holds (the proof is in App. A.2.1).
Proposition 1. Let T ∈ N and γS, γB be discount rates s.t. γS ≥ γB and the sequence γB = {γB^{t−1}}_{t=1}^T is regular. Then, for any pricing algorithm A ∈ AT, there exists a CA algorithm Ã ∈ ÃT(γB) s.t.

E[SRev_{γS,γB}(A, V)] ≤ E[SRev_{γS,γB}(Ã, V)].   (3)

9 Moreover, for T = ∞, this revenue improvement is ΓB/ΓS = (1−γS)/(1−γB) and goes to +∞ as γB → 1− for a fixed γS.

The fundamental property of a CA algorithm: it bijectively corresponds to the break (discontinuity) points of the derivative of its surplus function S(·), which is piecewise linear10. Namely, the class ÃT can be linearly mapped onto Δk := {v = {v_j}_{j=1}^k ∈ R^k | 0 ≤ v_1 ≤ ... ≤ v_k}, where k := k(T) := 2^T − 1. The key intuition is as follows. Number the buyer strategies ST = {a^0, ..., a^k} in ascending order of the slope γB·a^i of the corresponding γB-discounted surplus function S_{a^i} (the γB-dependent natural order). Let (v_i, s_i) be the coordinates of the intersection of the straight lines S_{a^i}(·) and S_{a^{i−1}}(·). 
An algorithm is CA iff these intersections are on the envelope S(·) and v_{i−1} ≤ v_i ∀i ≤ k. The linear parametrization holds since the break point v_i is linearly expressed in terms of the slopes and intercepts of the lines S_{a^i}(·) and S_{a^{i−1}}(·), while the intercepts are linear in the algorithm prices. Formally, this dependence is the product Z_T(γB) J_T K_T(γB, γB) of k × k matrices, where J_T is a two-diagonal matrix with 1 on the diagonal and −1 under the diagonal; Z_T(γ) = diag(z_1, ..., z_k), z_j = (γ·a^j − γ·a^{j−1})^{−1} for j = 1, ..., k; and K_T(γB, γ′) = ((κ_ij))_{i,j=1,..,k}, where κ_ij = γ′_t a^i_t if the path a^i ∈ ST passes through the node n_j ∈ NT whose round is t = |n_j| + 1, and κ_ij = 0 otherwise, for some fixed numbering of the nodes NT = {n_j}_{j=1}^k 11. All technical details are in App. A.2.2.
Finally, the parametrization via the break points {v_i}_{i=1}^k allows one to easily calculate the ESR of the algorithm. Indeed, the revenue SRev_{γS,γB}(A, v) is constant on the intervals (v_i, v_{i+1}), because γB is regular and the strategic buyer chooses only the strategy a^i when his valuation v is in (v_i, v_{i+1}). Hence, the ESR is the sum of the constant revenues on the intervals weighted by their probabilities: E[SRev_{γS,γB}(A, V)] = Σ_{i=1}^k (F_D(v_{i+1}) − F_D(v_i)) Rev_{γS}(A, a^i), where Rev_{γS}(A, a^i) can be linearly expressed in terms of the algorithm prices and, thus, in terms of the break points {v_i}_{i=1}^k (by means of our matrices introduced above). Integration by parts makes the ESR a bilinear form of {1 − F_D(v_i)}_{i=1}^k and {v_i}_{i=1}^k. We formalize it in the following proposition (the proof is in App. A.2.3), which implies Th. 3 since the class of CA algorithms ÃT contains an optimal pricing (by Prop. 1).
Proposition 2.
Let T ∈ N, γS be a discount, γB be a regular discount, the strategies ST be naturally ordered by γB, and the matrix notations be introduced as above. Then there exists an invertible linear transformation w_{γB} : ÃT(γB) → Δk, k = k(T), s.t., for any completely active pricing algorithm A ∈ ÃT(γB), its ESR has the form E_{V∼D}[SRev_{γS,γB}(A, V)] = L_{D,γS,γB}(w_{γB}(A)), where

L_{D,γS,γB}(v) := (1 − F_D(v))^⊤ Ξ_T(γS, γB) v,  v ∈ Δk;   (4)

Ξ_T(γS, γB) := J_T · K_T(γB, γS) K_T(γB, γB)^{−1} J_T^{−1} Z_T(γB)^{−1} is the invertible k × k matrix that depends only on the discounts; and the vector (1 − F_D(v)) := {1 − F_D(v_i)}_{i=1}^k, {v_i}_{i=1}^k ∈ R^k.
Theorem 3. Let T ∈ N and γS, γB be discount rates s.t. γS ≥ γB and the sequence γB = {γB^{t−1}}_{t=1}^T is regular. The optimization problem of finding an optimal algorithm is equivalent to the maximization of the multivariate functional L_{D,γS,γB}(·) over the set Δk = {v ∈ R^k | 0 ≤ v_1 ≤ ... ≤ v_k}, k = 2^T − 1, i.e.,

max_{A∈AT} E_{V∼D}[SRev_{γS,γB}(A, V)] = max_{v∈Δk} L_{D,γS,γB}(v),   (5)

where L_{D,γS,γB} is defined in Eq. (4) and depends only on the discounts and the distribution D.

It is quite important to emphasize that the k-dimensional functional L_{D,γS,γB} is a bilinear form applied to the vectors v and 1 − F_D(v). This bilinear form is independent of the distribution D and is defined by the matrix Ξ_T(γS, γB). In this view, there is a strong relationship between our optimization functional L_{D,γS,γB} and the function H_D (see Sec. 
3): the functional L_{D,γ^S,γ^B} constitutes the key basis of optimal algorithms in the dynamic setting and is as fundamental for them as the function H_D(p) = p·P_{V∼D}[V ≥ p] is for optimal pricing in static auctions. Moreover, in the case of equal discounts γ^S = γ^B, the optimization of L_{D,γ^B,γ^B} reduces to the maximization of H_D (simple algebra is in App. A.2.4). Since, already in this particular case γ^S = γ^B, the optimization of L_{D,γ^B,γ^B} has no closed-form solution (it reduces to the optimization of H_D), we expect that, in the other cases, our optimization problem generally does not admit a closed-form solution either.

In contrast to the initial form of our problem, numerical optimization of the functional L_{D,γ^S,γ^B} is much easier (though it still has the same number of variables as the initial problem). First, the functional is continuously differentiable as many times as the CDF F_D. Second, its derivatives have a simple form, for i, j = 1, .., k:

∂_{v_i} L(v) = −f_D(v_i) Σ_l ξ_{il} v_l + Σ_l (1 − F_D(v_l)) ξ_{li},
∂_{v_i}∂_{v_j} L(v) = −f_D(v_i) ξ_{ij} − f_D(v_j) ξ_{ji} for i ≠ j,
∂²_{v_i} L(v) = −2 f_D(v_i) ξ_{ii} − f′_D(v_i) Σ_l ξ_{il} v_l,

where ξ_{ij} is the ij-th element of Ξ_T(γ^S, γ^B). The derivatives can be easily computed: see App. I for the pseudo-code that calculates ξ_{ij}.

¹⁰In a piece (an interval (v_i, v_{i+1})) the function S(·) equals the function S_{a^i}(·) for some strategy a^i, which is a linear function of v: S_{a^i}(v) = (Σ_t γ^B_t a^i_t) v − (Σ_t γ^B_t a^i_t p_t), see Def. 2.
¹¹Note: by the definition, the i-th component of the vector K_T(γ^B, γ′)A is equal to Σ_{t=1}^T γ′_t a^i_t A(a^i_1 ... a^i_{t−1}).
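As a quick numerical sanity check of the derivative formulas above, the bilinear form and its gradient can be evaluated directly; this is our own sketch, where a small hypothetical 3 × 3 matrix Ξ stands in for Ξ_T(γ^S, γ^B) and D is uniform on [0, 1] (so F_D(v) = v, f_D ≡ 1 on the support):

```python
import numpy as np

# Hypothetical 3x3 stand-in for Xi_T(gamma_S, gamma_B); the real matrix
# is built from J_T, K_T, Z_T as described in the text.
Xi = np.array([[1.0, 0.2, 0.0],
               [0.1, 0.8, 0.3],
               [0.0, 0.1, 0.6]])

F = lambda v: v                  # CDF of U[0, 1] on its support
f = lambda v: np.ones_like(v)    # corresponding density

def L(v):
    # L(v) = (1 - F_D(v))^T Xi v, the bilinear form of Eq. (4)
    return (1.0 - F(v)) @ Xi @ v

def grad_L(v):
    # dL/dv_i = -f(v_i) * sum_l xi_{il} v_l + sum_l (1 - F(v_l)) xi_{li}
    return -f(v) * (Xi @ v) + Xi.T @ (1.0 - F(v))

v = np.array([0.3, 0.5, 0.7])
eps = 1e-6
fd = np.array([(L(v + eps * e) - L(v - eps * e)) / (2 * eps) for e in np.eye(3)])
print(np.allclose(grad_L(v), fd, atol=1e-6))  # gradient matches finite differences
```

Because F_D is the identity here, L is quadratic in v, so central finite differences agree with the analytic gradient up to round-off; any gradient-based solver can then consume `grad_L` directly.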
Third, the domain Δ_k is convex (moreover, it is closed when the support of F_D is bounded) and has the simple form of a simplex. Finally, the matrix Ξ_T(γ^S, γ^B) is positive definite on Δ_k. Hence, a variety of gradient methods can be used to find the solution (see our experiments in Sec. 5).

The step-by-step instruction to find the optimal pricing. Recall that, for static pricing, the optimal (Myerson) price can be found by maximizing the functional H_D(p) = p(1 − F_D(p)). In our dynamic case, the optimal pricing algorithm can be found similarly as follows: (I) construct the matrix Ξ (the pseudo-code to calculate its elements is in Appendix I); (II) construct the functional L_{D,γ^S,γ^B}(·) from Eq. (4); (III) find a vector v^Opt that maximizes L_{D,γ^S,γ^B}(v), e.g., by applying a numerical method that uses the derivatives of L_{D,γ^S,γ^B}(·) provided in the previous paragraph; (IV) convert the vector v^Opt to the prices of the optimal algorithm by means of the linear transformation w^{−1}_{γ^B}(·), which is mentioned in Prop. 2 and whose matrix is K_T(γ^B, γ^B)^{−1} J_T^{−1} Z_T(γ^B)^{−1} (see App. A.2.2).

Remark 2. In Appendix A.2, we show that all results of this section hold also for non-geometric discounts γ^S = {γ^S_t}_{t=1}^T and γ^B = {γ^B_t}_{t=1}^T such that γ^B_{t+1}/γ^B_t ≤ γ^S_{t+1}/γ^S_t.

Remark 3. The regularity of the discount γ^B is used to get: the uniqueness of the γ-dependent natural order of the strategies S_T (for Prop. 2); zero probability of valuations for which the optimal buyer strategy is not unique (in Prop. 1). Ways to relax this restriction are discussed in App. D. In any case, non-regular discounts are rare and do not affect our qualitative results in Sec.
5.

5 Efficient approximation, constrained optimization, numerical experiments

Approximation by optimal τ-step pricing (γ^S ≥ γ^B). In the case of infinite games, we have no similarly powerful instrument to find an optimal pricing (unlike the case of finite games in Sec. 4). Moreover, when the horizon T is finite but sufficiently large, the optimization problem even in the simplified form of Eq. (5) suffers from dimensional complexity, since the number of variables is 2^T − 1. In both cases, however, we can approximate the optimal algorithm by an algorithm that is optimal in some finite-dimensional subclass of A_T, T ∈ ℕ∪{∞}. Namely, for τ ∈ ℕ, let us say that A is a τ-step pricing algorithm if ∀a, t > τ: A(a_{1:t−1}) = A(a_{1:τ−1}), i.e., it is constant from the τ-th round on. The set of all τ-step algorithms is denoted by A^τ_T. An attentive reader may note that the problem of finding an optimal τ-step algorithm A ∈ A^τ_T for the finite or infinite game is equivalent to finding an optimal algorithm for the τ-round finite game with "shortened" discount sequences γ^{S,τ} := (γ^S_1, .., γ^S_{τ−1}, Σ_{t=τ}^T γ^S_t) and γ^{B,τ} := (γ^B_1, .., γ^B_{τ−1}, Σ_{t=τ}^T γ^B_t). Hence, one can apply the optimization technique from Th. 3 (which holds for γ^{B,τ} and γ^{S,τ} due to Remark 2). The following proposition (the proof is in App. A.3.1) formally states that the expected revenue of the optimal τ-step algorithm A*_τ ∈ A^τ_T converges to that of the optimal pricing A* ∈ A_T when τ → T.

Proposition 3. Let T ∈ ℕ ∪ {∞} and γ^S, γ^B be discount sequences s.t.
\u03b3B\n\u03c4 :=\n\n\u03c4\u22121,(cid:80)T\n\n\u03c4\u22121,(cid:80)T\n\nt ) and \u03b3B,\u03c4 := (\u03b3B\n\n\u03c4 \u2208 A\u03c4\n\nt \u2264 \u03b3S\n\n1, .., \u03b3B\n\n1, .., \u03b3S\n\nt+1/\u03b3S\n\nt+1/\u03b3B\n\nt=\u03c4 \u03b3B\n\nt=\u03c4 \u03b3S\n\nt , \u0393S\n\n(cid:80)T\n\nt=\u03c4 +1 \u03b3S\nE[SRev\u03b3 S,\u03b3 B (A,V )]\u2264 maxA\u2208AT\n\nt for \u03c4 \u2208 N, \u03c4 < T . Then the following bounds hold:\nE [SRev\u03b3 S,\u03b3 B(A,V )] \u2264 maxA\u2208A\u03c4\n\nT\n\nmaxA\u2208A\u03c4\n\nT\n\nE [SRev\u03b3 S,\u03b3 B(A, V )]+\u0393S\n\n\u03c4\n\nE [V ] .\n\n(6)\n\nS\n\n1\n\nE [V ] /\u0393S\n\nE [V ] = \u03b3\u03c4\u22121\n\nFirst, Prop. 3 provides the seller with a tool to make a trade-off between the achievable fraction of\nthe maximal revenue and the computational complexity of the optimization problem to be solved.\nIn particular, she is able to choose the parameter \u03c4 s.t. her computational capabilities on the dimen-\nsion 2\u03c4 \u2212 1 of the optimization functional L are \ufb01tted and the boost in the relative regret bound\nis minimal. Note that the seller can improve her revenue obtained from an\n\u0393S\n\u03c4\noptimal constant algorithm just by applying an optimal \u03c4-step algorithm for small \u03c4. For instance, for\n\u03c4 = 4, this algorithm can be easily found in 2\u03c4 \u2212 1 = 15-dimensional space and provides noticeable\nboost in revenue (revenue improvement is illustrated in Fig. 1). Second, from Eq. (6), we have that\n\u03c4 = \u03b3S. On the one\nthe convergence bound is \u0393S\nhand, it means that the smaller \u03b3S is, the faster the revenue of the suboptimal algorithm A\u2217\n\u03c4 converges\nto the optimal revenue, and, thus, the functional L in Eq. (4) with the smaller dimension should be\noptimized to reach revenue close to the optimal one within \u0001 error, \u0001 > 012. On the other hand, the\n(\u0001(1\u2212\u03b3S)E[V ]) to be \u0001-close to the optimal revenue. 
Note that \u03c4\u03b3S,D,\u0001\u2192\u03b3S\u21920 0.\n\nS /(1 \u2212 \u03b3S) and the convergence rate is \u0393S\n\n12Take \u03c4 > \u03c4\u03b3S,D,\u0001 := log\u03b3S\n\n\u03c4 +1/\u0393S\n\n\u03c4 = \u03b3\u03c4\n\n7\n\n\f4 and the relative expected strategic revenue (w.r.t. A\u2217\n\n4(n), for nodes n\u2208 N s.t. |n|\u2264 3, of the\nFigure 1: In\ufb01nite game T = \u221e, uniform D. The prices A\u2217\noptimal 4-step algorithm A\u2217\n1) of the optimal\n\u03c4-step algorithm A\u2217\n\u03c4 , \u03c4 = 2, ..,6, for discounts: (a) \u03b3S=0.8 and various \u03b3B; (b) \u03b3B=0.2 and various \u03b3S.\nslower convergence rate is, the more revenue can be extracted from non-static pricing. Namely, the\ncloser \u03b3S to 1 is, the larger the improvement of the revenue of the optimal \u03c4-step pricing is w.r.t. the\nconstant pricing (for \ufb01xed \u03b3B and \u03c4). This is both supported by our experiments (see growing relative\nrevenue in Fig. 1(b, bottom) & Fig.C7 as \u03b3S grows) and in line with the intuition: the larger \u03b3S is, the\nmore revenue could be earned in future rounds (and hence the more pro\ufb01table dynamic pricing is).\nOptimal algorithms with constraints. One more structural insight of our reduction in Sec. 4:\noptimization over the set of break points {vi} of the surplus envelope S allows to \ufb01nd optimal\nalgorithms with constraints that can be expressed in terms of these break points. In particular, the\nseller is able to control the probability of buyer usage of each strategy ai\u2208 ST through a constraint\non F (vi+1)\u2212F (vi) (e.g., setting it to zero). E.g., the seller is looking for an algorithm s.t. strategies\nactive with positive probability are monotone, i.e. of the form 0n1T\u2212n for some n\u2264 T . Hence, if ai\nis not monotone, then vi = vi+1, i.e. the line Sai is tangent to the envelope S in only one point. 
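(As an aside, the trade-off of Prop. 3 can be made concrete with a small numeric sketch; this is our own illustration for a geometric seller discount, T = ∞, and E[V] = 1/2, which corresponds to V ∼ U[0, 1].)

```python
# Our own illustration of the tau-step trade-off (Prop. 3) for a geometric
# seller discount and T = infinity: restricting to tau-step pricing costs at
# most Gamma^S_tau * E[V] = gamma_s^tau / (1 - gamma_s) * E[V] in revenue,
# while the optimization dimension is 2^tau - 1.
gamma_s = 0.8
E_V = 0.5  # E[V] for V ~ U[0, 1]

def gap_bound(tau):
    # Additive revenue gap from Eq. (6): Gamma^S_tau * E[V]
    return gamma_s ** tau / (1 - gamma_s) * E_V

for tau in (2, 3, 4, 5, 6):
    print(f"tau={tau}: dimension={2 ** tau - 1}, gap bound={gap_bound(tau):.4f}")
```

The gap bound shrinks geometrically at rate γ_S while the dimension grows as 2^τ − 1, so a small τ already trades little revenue for a drastically cheaper optimization.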
To find an optimal algorithm among those for which v_i = v_{i+1}, one needs to slightly update the functional L: replace the i-th and (i+1)-th rows of the matrix Ξ by their sum, do the same with the i-th and (i+1)-th columns, and remove the i-th components from the vectors 1 − F(v) and v. The modified optimization functional for the problem with constraints will have T + 1 variables, since this is the number of strategies that are active with positive probability. So, the dimensionality of the optimization problem can be reduced by means of constraints on the form of the algorithm, which can thus be found efficiently.

Lower bound on the maximal revenue for γ_S = 1. In this case, the algorithm PRRFES [30, Th. 5] with an optimal upper regret bound can be used to get a lower bound on the optimal ESR. Using PRRFES, the seller is able to increase her revenue w.r.t. the optimal constant pricing by a factor of up to E[V]/H_D(p*_D) > 1 (e.g., it is +100% when D is uniform on [0, 1]) as T → +∞. See details in App. G.

Numerical experiments.¹³ To show the practical profit and properties of optimal algorithms obtained via our functional L from Eq. (4) for the case γ^S ≥ γ^B, we conducted numerical experiments in several representative games. We seek optimal τ-step algorithms A*_τ, τ = 2, .., 6, in infinite games with the valuation V uniformly distributed on [0, 1]¹⁴, i.e., F_D(v) = v. The functional L_{D,γ^S,γ^B} thus becomes quadratic and is optimized numerically using Sequential Least Squares Programming. The ESR of the algorithms is compared with the expected revenue H_D(p*(D))·Γ^S of the optimal constant pricing A*_1 (see Sec. 3), which is treated as the baseline from here on. Fig. 1 contains: the prices A*_4(n) obtained in this way for all nodes n (at the top) and the relative expected strategic revenue of A*_τ (w.r.t.
A\u2217\n1) for \u03c4=2, .., 6 (at the bottom). The results in Fig. 1(a) are for \u03b3S=0.8\nand \u03b3B\u2208{0.01+i\u00b70.005}148\nFirst, at the bottom of Fig. 1, we see that the optimal \u03c4-step algorithms A\u2217\n\u03c4 outperform the baseline\noptimal constant pricing A\u2217\n1 for any observed pair of discounts. Moreover, Fig. 1 demonstrates that\nthe signi\ufb01cant increase in revenue can be obtained even when the minimal possible step aside from the\nconstant pricing is made (\u03c4 = 2). E.g., the seller can extract up to +20% revenue by just maximizing\nthe functional Eq. (4) in the 3-dimensional space (since 2\u03c4 \u2212 1 = 3 for \u03c4 = 2): e.g., the revenue\nimprovement is larger than 20% for \u03b3S = 0.9, \u03b3B = 0.2, larger than 16% for \u03b3S = 0.8, \u03b3B = 0.5,\nand larger than 10% for \u03b3S = 0.8, \u03b3B = 0.55. Second, we see that the expected strategic revenue of\nA\u2217\n\u03c4 converges quite quickly to the optimal one (which thus larger than the revenue of the baseline\nA\u2217\n1 as well). This observation constitutes the empirical evidence of Prop. 3, which suggests that\n\ni=0, while the ones in Fig. 1(b) for \u03b3B=0.2 and \u03b3S\u2208{0.2+i\u00b70.005}159\ni=0.\n\n13The code of all our experiments is avail. at https://github.com/theonlybars/neurips-2019-rppa.\n14Experiments for other distributions and horizons are presented in App. C. The results for them are similar.\n\n8\n\n0.10.20.30.40.50.60.70.8(b) gamma_b = 0.2A(000)A(00)A(001)A(0)A(010)A(01)A(011)A()A(100)A(10)A(101)A(1)A(110)A(11)A(111)0.10.20.30.40.50.60.70.8Optimal algorithm prices(a) gamma_s = 0.80.20.30.40.50.60.70.80.9gamma_s1.01.11.21.31.41.51.6BaselineOptimal tau = 2Optimal tau = 3Optimal tau = 4Optimal tau = 5Optimal tau = 60.00.10.20.30.40.50.60.70.8gamma_b1.01.11.21.31.4Relative expected strategic revenue\fthe convergence rate is equal to \u03b3S. Third, the top part of Fig. 
1 demonstrates that an optimal algorithm may be non-consistent: e.g., note the reverse order of the prices A*_4(e) < A*_4(001) for γ_B > ≈ 0.57 in Fig. 1(a). Fourth, if the distance between the discount rates γ_S and γ_B converges to 0, then the optimal algorithm A* converges to the optimal constant one A*_1 (which experimentally supports that H_D is a special case of L_{D,γ^S,γ^B}). More details and observations are in App. C.2.3. Overall, we conclude that learning prices even in several starting rounds allows the seller to extract revenue significantly larger than that of the optimal static pricing.

6 Incomplete information about buyer discount sequence

Our results can also be applied in the case of a weak assumption on the seller's information about the buyer's discount sequence. The weak assumption: the seller does not know the exact discount sequence of the buyer, but rather knows a set of intervals {[γ^0_t, γ^1_t]}_{t=1}^T s.t. the discount coefficient γ^B_t is located in [γ^0_t; γ^1_t]. We provide an interpretation of the model which explains the foundation of the weak assumption. We also show the performance of our results adapted to the weak-assumption setting. For the sake of exposition, all discount sequences are geometric from here on in this section.

The discount in our model can be interpreted as the continuation probability, i.e., γ is the probability that the game will continue for one more round. E.g., in the example from Sec. 1 (see App. F for an extended version as well), γ is the probability that the user does not click on the ad and follows a link that is in the sight of the RTB platform.
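This continuation-probability reading can be checked with a tiny simulation (our own sketch): with per-round continuation probability γ, the expected game length equals the total discount mass Σ_{t≥1} γ^{t−1} = 1/(1 − γ).

```python
import random

def expected_rounds_mc(gamma, trials=200_000, seed=0):
    """Monte-Carlo estimate of the expected game length when, after each
    round, the game continues with probability gamma (the interpretation
    of the discount as a continuation probability)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        rounds = 1
        while rng.random() < gamma:
            rounds += 1
        total += rounds
    return total / trials

gamma = 0.8
analytic = 1.0 / (1.0 - gamma)  # sum_{t>=1} gamma^{t-1}
print(expected_rounds_mc(gamma), analytic)  # both close to 5.0
```

In other words, discounting at rate γ and playing a game that survives each round with probability γ yield the same expected cumulative utilities.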
In this interpretation, the discount γ is common. The difference in discounts appears because the seller and the buyer do not know γ exactly, but rather estimate it based on available information about the user. Let γ = γ(ξ_1, ξ_2), where ξ_1, ξ_2 are user features. Assume that the seller observes both ξ_1 and ξ_2, while the buyer observes only ξ_1. Then the seller is able to estimate γ accurately as well as to recover the buyer's estimate γ_B(ξ_1). To sum up: it is likely that the seller in our model can at least recover the buyer discount γ_B with high accuracy.

Let us consider two cases. Case (1): if the seller knows only a lower bound γ̂_B for γ_B s.t. γ_S < γ̂_B, then she can apply "Big deal", whose prices are calculated using γ̂_B: A_bd(e) = Σ_t γ̂_B^{t−1} p*_D; A_bd(1 ∘ n) = 0 ∀n; A_bd(0 ∘ n) = T p*_D ∀n. A buyer (whose discount γ_B ≥ γ̂_B) with valuation v > p*_D still accepts the first proposed price; hence, the seller gets at least Σ_t γ̂_B^{t−1} p*_D (1 − F(p*_D)). This is less than the optimal revenue (when γ_B is known exactly), but strictly larger than that of static pricing. Similarly, modifications of "Big deal" can be applied when the seller knows only the distribution of γ_B, γ_B ≥ γ_S. Case (2): the seller uses the functional L to find an optimal algorithm assuming the buyer's discount is γ′_B = γ_B + ε, but faces a buyer with true discount γ_B. We evaluate the loss in revenue by the following numerical experiment: T = 5, V ∼ U[0; 1] (uniform on [0; 1]) and γ_S = 0.5 (different sets of parameters give qualitatively the same results). In the figure, the expected strategic revenue (ESR) of this seller is divided by the ESR of a well-informed seller (i.e. s.t. ε = 0).
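As a numeric illustration of Case (1) (our own sketch, not one of the paper's experiments: uniform D on [0, 1], geometric discounts, horizon T = 10), the "Big deal" guarantee Σ_t γ̂_B^{t−1} H_D(p*_D) indeed exceeds the static-pricing revenue Σ_t γ_S^{t−1} H_D(p*_D) whenever γ̂_B > γ_S:

```python
# Illustration of Case (1) ("Big deal") for uniform D on [0, 1] and geometric
# discounts with gamma_s < hat_gamma_b <= gamma_b; the numbers T, gamma_s,
# hat_gamma_b below are hypothetical choices.
T = 10
gamma_s = 0.5        # seller discount rate
hat_gamma_b = 0.8    # known lower bound on the buyer discount rate

p_star = 0.5                 # Myerson price for U[0, 1]
H = p_star * (1 - p_star)    # H_D(p*) = p* (1 - F_D(p*)) = 0.25

disc_sum = lambda g, T: sum(g ** (t - 1) for t in range(1, T + 1))

static_rev = H * disc_sum(gamma_s, T)        # optimal constant pricing
big_deal_rev = H * disc_sum(hat_gamma_b, T)  # lower bound from "Big deal"

print(static_rev, big_deal_rev, big_deal_rev > static_rev)
```

The gap grows with the total buyer discount mass: the more patient the buyer (relative to the seller), the more the up-front "Big deal" payment is worth to the seller.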
Figure: ESR relative to optimal in case of estimation error; γ_S = 0.5, V ∼ U[0; 1], T = 5; curves for ε ∈ {0, ±0.01, ±0.02, ±0.05, ±0.1} and for constant Myerson pricing, plotted against the true buyer discount γ_B.

We see: (a) if ε is small enough (for ε = 0.02, or ≥ 4% of γ_B), then S is still able to extract over 99% of the optimal ESR; (b) even if ε is very large (for ε = 0.1, or ≥ 20% of γ_B), S is still able to extract over 97% of the optimal ESR in most cases (γ_B ≤ 0.4); and (c) if S is able to just separate γ_B from γ_S with a decent margin, then she is able to gain extra revenue.

7 Conclusions

We studied online learning algorithms that maximize the expected cumulative revenue of repeated posted-price auctions with a strategic buyer that holds a fixed private valuation. First, when the participants non-equally discount their cumulative utilities, we showed that constant pricing, surprisingly, is no longer optimal. Second, for the case of a more patient seller, we introduced a novel multidimensional optimization functional which is a multivariate analogue of the one used to determine Myerson's price. This functional can be used (1) to find an optimal dynamic pricing, e.g., by efficient gradient-based methods; and (2) to construct an optimal τ-step algorithm (a low-dimensional approximation) that allows the seller to improve her revenue even in a game with a large horizon T. Finally, we conducted extensive numerical analysis to show that optimal algorithms are non-trivial, may be non-consistent, and generate larger expected revenue than the constant pricing with Myerson's price.

References

[1] D. Agarwal, S. Ghosh, K. Wei, and S. You.
Budget pacing for targeted online advertisements at linkedin. In KDD’2014, pages 1613–1619, 2014.

[2] G. Aggarwal, A. Goel, and R. Motwani. Truthful auctions for pricing search keywords. In Proceedings of the 7th ACM conference on Electronic commerce, pages 1–7. ACM, 2006.

[3] G. Aggarwal, G. Goel, and A. Mehta. Efficiency of (revenue-) optimal mechanisms. In EC’2009, pages 235–242, 2009.

[4] G. Aggarwal, S. Muthukrishnan, D. Pál, and M. Pál. General auction mechanism for search advertising. In WWW’2009, pages 241–250, 2009.

[5] K. Amin, M. Kearns, P. Key, and A. Schwaighofer. Budget optimization for sponsored search: Censored learning in mdps. In UAI’2012, pages 54–63, 2012.

[6] K. Amin, M. Kearns, and U. Syed. Bandits, query learning, and the haystack dimension. In COLT, 2011.

[7] K. Amin, A. Rostamizadeh, and U. Syed. Learning prices for repeated auctions with strategic buyers. In NIPS’2013, pages 1169–1177, 2013.

[8] K. Amin, A. Rostamizadeh, and U. Syed. Repeated contextual auctions with strategic buyers. In NIPS’2014, 2014.

[9] I. Ashlagi, C. Daskalakis, and N. Haghpanah. Sequential mechanisms with ex-post participation guarantees. In EC’2016, 2016.

[10] I. Ashlagi, B. G. Edelman, and H. S. Lee. Competing ad auctions. Harvard Business School NOM Unit Working Paper, (10-055), 2013.

[11] M. Babaioff, S. Dughmi, R. Kleinberg, and A. Slivkins. Dynamic pricing with limited supply. ACM Transactions on Economics and Computation, 3(1):4, 2015.

[12] Y. Bachrach, S. Ceppi, I. A. Kash, P. Key, and D. Kurokawa. Optimising trade-offs among stakeholders in ad auctions. In EC’2014, pages 75–92, 2014.

[13] S. Balseiro, O. Besbes, and G. Y. Weintraub. Dynamic mechanism design with budget constrained buyers under limited commitment. In EC’2016, 2016.

[14] S. R. Balseiro, O. Besbes, and G. Y. Weintraub.
Repeated auctions with budgets in ad exchanges: Approximations and design. Management Science, 61(4):864–884, 2015.

[15] D. Bergemann and M. Said. Dynamic auctions. Wiley Encyclopedia of Operations Research and Management Science, 2010.

[16] D. Besanko. Multi-period contracts between principal and agent with adverse selection. Economics Letters, 17(1-2):33–37, 1985.

[17] H. Bester and R. Strausz. Contracting with imperfect commitment and the revelation principle: the single agent case. Econometrica, 69(4):1077–1098, 2001.

[18] C. Borgs, J. Chayes, N. Immorlica, K. Jain, O. Etesami, and M. Mahdian. Dynamics of bid optimization in online advertisement auctions. In WWW’2007, pages 531–540, 2007.

[19] C. Borgs, J. Chayes, N. Immorlica, M. Mahdian, and A. Saberi. Multi-unit auctions with budget-constrained bidders. In Proceedings of the 6th ACM conference on Electronic commerce, pages 44–51. ACM, 2005.

[20] L. E. Celis, G. Lewis, M. M. Mobius, and H. Nazerzadeh. Buy-it-now or take-a-chance: a simple sequential screening mechanism. In WWW’2011, pages 147–156, 2011.

[21] N. Cesa-Bianchi, C. Gentile, and Y. Mansour. Regret minimization for reserve prices in second-price auctions. In SODA’2013, pages 1190–1204, 2013.

[22] D. Charles, N. R. Devanur, and B. Sivan. Multi-score position auctions. In WSDM’2016, pages 417–425, 2016.

[23] S. Chawla, N. R. Devanur, A. R. Karlin, and B. Sivan. Simple pricing schemes for consumers with evolving values. In SODA’2016, pages 1476–1490, 2016.

[24] X. Chen and Z. Wang. Bayesian dynamic learning and pricing with strategic customers. SSRN, 2016.

[25] Y. Chen and V. F. Farias. Robust dynamic pricing with strategic customers. In EC’2015, pages 777–777, 2015.

[26] M. Chhabra and S. Das.
Learning the demand curve in posted-price digital goods auctions. In ICAAMS’2011, pages 63–70, 2011.

[27] M. C. Cohen, I. Lobel, and R. Paes Leme. Feature-based dynamic pricing. In EC’2016, 2016.

[28] A. V. den Boer. Dynamic pricing and learning: historical origins, current research, and new directions. Surveys in operations research and management science, 20(1):1–18, 2015.

[29] N. R. Devanur, Y. Peres, and B. Sivan. Perfect bayesian equilibria in repeated sales. In SODA’2015, 2015.

[30] A. Drutsa. Horizon-independent optimal pricing in repeated auctions with truthful and strategic buyers. In WWW’2017, pages 33–42, 2017.

[31] A. Drutsa. On consistency of optimal pricing algorithms in repeated posted-price auctions with strategic buyer. CoRR, abs/1707.05101, 2017.

[32] A. Drutsa. Weakly consistent optimal pricing algorithms in repeated posted-price auctions with strategic buyer. In ICML’2018, pages 1318–1327, 2018.

[33] A. Drutsa. Reserve pricing in repeated second-price auctions with strategic bidders. CoRR, abs/1906.09331, 2019.

[34] P. Dütting, M. Henzinger, and I. Weber. An expressive mechanism for auctions on the web. In WWW’2011, pages 127–136, 2011.

[35] B. Edelman and M. Ostrovsky. Strategic bidder behavior in sponsored search auctions. Decision support systems, 43(1):192–198, 2007.

[36] B. Edelman, M. Ostrovsky, and M. Schwarz. Internet advertising and the generalized second-price auction: Selling billions of dollars worth of keywords. American economic review, 97(1):242–259, 2007.

[37] M. Feldman, T. Koren, R. Livni, Y. Mansour, and A. Zohar. Online pricing with strategic and patient buyers. In NIPS’2016, pages 3864–3872, 2016.

[38] G. Goel and M. R. Khani. Revenue monotone mechanisms for online advertising. In WWW’2014, 2014.

[39] G. Goel, M. R. Khani, and R. P. Leme. Core-competitive auctions.
In EC\u20192015, pages 149\u2013166, 2015.\n\n[40] R. Gomes and V. Mirrokni. Optimal revenue-sharing double auctions with applications to ad exchanges.\n\nIn WWW\u20192014, pages 19\u201328, 2014.\n\n[41] R. Gonen and E. Pavlov. An incentive-compatible multi-armed bandit mechanism. In Proceedings of the\ntwenty-sixth annual ACM symposium on Principles of distributed computing, pages 362\u2013363. ACM, 2007.\n\n[42] A. Greenwald, J. Li, and E. Sodomka. Approximating equilibria in sequential auctions with incomplete\n\ninformation and multi-unit demand. In NIPS\u20192012, pages 2321\u20132329, 2012.\n\n[43] O. D. Hart and J. Tirole. Contract renegotiation and coasian dynamics. The Review of Economic Studies,\n\n55(4):509\u2013540, 1988.\n\n[44] D. He, W. Chen, L. Wang, and T.-Y. Liu. A game-theoretic machine learning approach for revenue\n\nmaximization in sponsored search. In IJCAI\u20192013, pages 206\u2013212, 2013.\n\n[45] H. Heidari, M. Mahdian, U. Syed, S. Vassilvitskii, and S. Yazdanbod. Pricing a low-regret seller. In\n\nICML\u20192016, pages 2559\u20132567, 2016.\n\n[46] P. Hummel and P. McAfee. Machine learning in an auction environment. In WWW\u20192014, pages 7\u201318,\n\n2014.\n\n[47] N. Immorlica, B. Lucier, E. Pountourakis, and S. Taggart. Repeated sales with multiple strategic buyers. In\n\nEC\u20192017, pages 167\u2013168, 2017.\n\n[48] K. Iyer, R. Johari, and M. Sundararajan. Mean \ufb01eld equilibria of dynamic auctions with learning. ACM\n\nSIGecom Exchanges, 10(3):10\u201314, 2011.\n\n11\n\n\f[49] S. M. Kakade, I. Lobel, and H. Nazerzadeh. Optimal dynamic mechanism design and the virtual-pivot\n\nmechanism. Operations Research, 61(4):837\u2013854, 2013.\n\n[50] Y. Kanoria and H. Nazerzadeh. Dynamic reserve prices for repeated auctions: Learning from bids. SSRN,\n\n2017.\n\n[51] R. Kleinberg and T. Leighton. The value of knowing a demand curve: Bounds on regret for online\n\nposted-price auctions. 
In Foundations of Computer Science, pages 594–605, 2003.

[52] V. Krishna. Auction theory. Academic press, 2009.

[53] T. Lin, J. Li, and W. Chen. Stochastic online greedy learning with semi-bandit feedbacks. In NIPS’2015, 2015.

[54] B. Lucier, R. Paes Leme, and E. Tardos. On revenue in the generalized second price auction. In WWW’2012, pages 361–370, 2012.

[55] A. M. Medina and S. Vassilvitskii. Revenue optimization with approximate bid predictions. In NIPS’2017, 2017.

[56] A. Mehta, A. Saberi, U. Vazirani, and V. Vazirani. Adwords and generalized on-line matching. In 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS’05), pages 264–273. IEEE, 2005.

[57] V. Mirrokni, R. Paes Leme, P. Tang, and S. Zuo. Non-clairvoyant dynamic mechanism design. 2017.

[58] M. Mohri and A. M. Medina. Learning theory and algorithms for revenue optimization in second price auctions with reserve. In ICML’2014, pages 262–270, 2014.

[59] M. Mohri and A. M. Medina. Non-parametric revenue optimization for generalized second price auctions. In UAI’2015, 2015.

[60] M. Mohri and A. M. Medina. Learning algorithms for second-price auctions with reserve. JMLR, 17(74):1–25, 2016.

[61] M. Mohri and A. Munoz. Optimal regret minimization in posted-price auctions with strategic buyers. In NIPS’2014, pages 1871–1879, 2014.

[62] M. Mohri and A. Munoz. Revenue optimization against strategic buyers. In NIPS’2015, pages 2530–2538, 2015.

[63] J. H. Morgenstern and T. Roughgarden. On the pseudo-dimension of nearly optimal auctions. In NIPS’2015, 2015.

[64] R. B. Myerson. Optimal auction design. Mathematics of operations research, 6(1):58–73, 1981.

[65] N. Nisan, T. Roughgarden, E. Tardos, and V. V. Vazirani. Algorithmic game theory. v.1 CUPC, 2007.

[66] G. Noti, N. Nisan, and I. Yaniv.
An experimental evaluation of bidders’ behavior in ad auctions. In WWW’2014, pages 619–630, 2014.

[67] M. Ostrovsky and M. Schwarz. Reserve prices in internet advertising auctions: A field experiment. In EC’2011, pages 59–60, 2011.

[68] R. Paes Leme, M. Pál, and S. Vassilvitskii. A field guide to personalized reserve prices. In WWW’2016, 2016.

[69] S. Pandey and C. Olston. Handling advertisements of unknown quality in search advertising. In Advances in neural information processing systems, pages 1065–1072, 2007.

[70] A. Pavan, I. Segal, and J. Toikka. Dynamic mechanism design: A myersonian approach. Econometrica, 82(2):601–653, 2014.

[71] A. Pavan, I. R. Segal, and J. Toikka. Dynamic mechanism design: Incentive compatibility, profit maximization and information disclosure. 2009.

[72] A. Radovanovic and W. D. Heavlin. Risk-aware revenue maximization in display advertising. In WWW’2012, pages 91–100, 2012.

[73] T. Roughgarden and J. R. Wang. Minimizing regret with multiple reserves. In EC’2016, pages 601–616, 2016.

[74] M. R. Rudolph, J. G. Ellis, and D. M. Blei. Objective variables for probabilistic revenue maximization in second-price auctions with reserve. In WWW’2016, pages 1113–1122, 2016.

[75] K. M. Schmidt. Commitment through incomplete information in a simple repeated bargaining game. Journal of Economic Theory, 60(1):114–139, 1993.

[76] Y. Sun, Y. Zhou, and X. Deng. Optimal reserve prices in weighted gsp auctions. Electronic Commerce Research and Applications, 13(3):178–187, 2014.

[77] D. R. Thompson and K. Leyton-Brown. Revenue optimization in the generalized second-price auction. In EC’2013, pages 837–852, 2013.

[78] D. Vainsencher, O. Dekel, and S. Mannor. Bundle selling by online estimation of valuation functions.
In ICML’2011, pages 1137–1144, 2011.

[79] H. R. Varian. Position auctions. international Journal of industrial Organization, 25(6):1163–1178, 2007.

[80] H. R. Varian. Online ad auctions. The American Economic Review, 99(2):430–434, 2009.

[81] H. R. Varian and C. Harris. The vcg auction in theory and practice. The A.E.R., 104(5):442–445, 2014.

[82] W. Vickrey. Counterspeculation, auctions, and competitive sealed tenders. The Journal of finance, 16(1):8–37, 1961.

[83] J. Weed, V. Perchet, and P. Rigollet. Online learning in repeated auctions. JMLR, 49:1–31, 2016.

[84] S. Yuan, J. Wang, B. Chen, P. Mason, and S. Seljan. An empirical study of reserve price optimisation in real-time bidding. In KDD’2014, pages 1897–1906, 2014.

[85] Y. Zhu, G. Wang, J. Yang, D. Wang, J. Yan, J. Hu, and Z. Chen. Optimizing search engine revenue in sponsored search. In SIGIR’2009, pages 588–595, 2009.

[86] M. Zoghi, Z. S. Karnin, S. Whiteson, and M. De Rijke. Copeland dueling bandits. In NIPS’2015, 2015.