Part of Advances in Neural Information Processing Systems 33 (NeurIPS 2020)
Matteo Castiglioni, Andrea Celli, Alberto Marchesi, Nicola Gatti
In Bayesian persuasion, an informed sender has to design a signaling scheme that discloses the right amount of information so as to influence the behavior of a self-interested receiver. This kind of strategic interaction is ubiquitous in real economic scenarios. However, the original model by Kamenica and Gentzkow makes some stringent assumptions which limit its applicability in practice. One of the most limiting assumptions is arguably that, in order to compute an optimal signaling scheme, the sender is usually required to know the receiver's utility function. In this paper, we relax this assumption through an online learning framework in which the sender faces a receiver with unknown type. At each round, the receiver's type is chosen adversarially from a finite set of possible types. We are interested in no-regret algorithms prescribing a signaling scheme at each round of the repeated interaction with performances close to that of the best-in-hindsight signaling scheme. First, we prove a hardness result on the per-iteration running time required to achieve the no-regret property. Then, we provide algorithms for the full and partial information model which exhibit regret sublinear in the number of rounds and polynomial in the parameters of the game.