NIPS 2017
Mon Dec 4th through Sat the 9th, 2017 at Long Beach Convention Center
Paper ID: 2586 Position-based Multiple-play Bandit Problem with Unknown Position Bias

### Reviewer 1

The paper investigates a multiple-play multi-armed bandit model with position bias, i.e., one that involves $K$ Bernoulli arms with parameters $\theta_i \in (0,1)$ and $L$ slots with position-bias parameters $\kappa_j \in (0,1]$. At each time $t$ the system selects $L$ arms $(I_1(t), \ldots, I_L(t))$ and receives a Bernoulli reward $\mathrm{Ber}(\theta_{I_j(t)} \kappa_j)$ from the arm in each slot $j = 1, \ldots, L$. The paper derives asymptotically optimal policies in the sense of Lai & Robbins (1985). I like the model and the analysis, but I did not have time to check all the mathematics at a satisfactory level.

Finally, I would like to bring the papers below to the attention of the author(s). [1] extends the Lai & Robbins (1985) framework (which involves single-parameter bandits) to bandits with multiple unknown parameters. [2] provides asymptotically optimal policies for unknown MDPs.

[1] A. N. Burnetas and M. N. Katehakis. Optimal adaptive policies for sequential allocation problems. Advances in Applied Mathematics, 1996.

[2] A. N. Burnetas and M. N. Katehakis. Optimal adaptive policies for Markov decision processes. Mathematics of Operations Research, 1997.
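The reward model described above can be illustrated with a minimal simulation of one round. All concrete values ($K$, $L$, the $\theta_i$ and $\kappa_j$ arrays, and the chosen allocation) are hypothetical placeholders for illustration, not taken from the paper:

```python
import numpy as np

# Hypothetical parameters for illustration only.
rng = np.random.default_rng(0)
K, L = 5, 3
theta = np.array([0.9, 0.7, 0.5, 0.3, 0.1])  # arm parameters theta_i in (0, 1)
kappa = np.array([1.0, 0.6, 0.3])            # position biases kappa_j in (0, 1]

def play_round(chosen):
    """Simulate one round: arm chosen[j] is placed in slot j and yields
    an independent Bernoulli reward Ber(theta[chosen[j]] * kappa[j])."""
    p = theta[chosen] * kappa       # elementwise success probability per slot
    return rng.binomial(1, p)       # one Bernoulli draw per slot

rewards = play_round(np.array([0, 1, 2]))  # allocate arms 0, 1, 2 to the L slots
```

The key point the snippet makes explicit is that the observed click probability confounds the arm quality $\theta_i$ with the unknown slot bias $\kappa_j$, which is exactly what makes the estimation problem non-trivial.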