NIPS 2017
Mon Dec 4th through Sat the 9th, 2017 at Long Beach Convention Center
Paper ID: 2035 A multi-agent reinforcement learning model of common-pool resource appropriation

### Reviewer 1

An interesting take on modelling the CPR dilemma. From a social science perspective, and purely as future work, it would be interesting to see the following:

1. How the agents behave if they are aware of the type of opponents in their local view (zapper/non-zapper). Does it lead them to form a (temporary?) alliance, or do they tend to avoid each other?
2. The effects of varying the rate at which resources are harvested for a part of the agent population.

### Reviewer 2

* Summary

  The authors introduce a problem that is novel from machine learning's perspective, multi-agent common-pool resource appropriation, and study how independently learning agents, each trying to optimize certain incentives, can learn to avoid the "tragedy of the commons". The authors find that in certain environments, the learned strategies are consistent with predictions from economics.

* Strengths

  The paper is well written and well argued. It introduces a novel problem and applies techniques of multi-agent reinforcement learning to model possible mechanisms by which intelligent agents can learn to avoid the "tragedy of the commons". It shows how modern machine learning methods, when applied appropriately, can add more fidelity to the model under study while still maintaining some tractability. The authors clearly mention the limitations of the model and avoid making any comments about the socio-economic implications.

* Weaknesses

  - The paper is not self-contained

    This is understandable given the NIPS format, but the supplementary material is necessary to understand large parts of the main paper and to allow reproducibility. I also hereby request that the authors release the source code of their experiments to allow reproduction of their results.

  - Use of deep reinforcement learning is not well motivated

    The problem domain seems simple enough that a linear approximation would likely have sufficed. The network is fairly small and isn't "deep" either.

  - > We argue that such a mechanism is more realistic because it has an effect within the game itself, not just on the scores

    This is probably the most unclear part. It's not clear to me why the paper considers one mechanism to be more realistic than the other, rather than the two simply modeling different incentives. There is probably not enough space in the paper, but an actual comparison of learning dynamics when the opportunity costs are modeled as penalties instead would be valuable. As economists say: incentives matter. However, if the intention was to explicitly avoid such explicit incentives, as they _would_ affect the model-free reinforcement learning algorithm, then those reasons should be clearly stated.

  - Unclear whether bringing in connections to human cognition makes sense

    The authors themselves state that the problem is fairly reductionist and does not allow for mechanisms like bargaining and negotiation that humans use, so it's unclear what they mean by "Perhaps the interaction between cognitively basic adaptation mechanisms and the structure of the CPR itself has more of an effect on whether self-organization will fail or succeed than previously appreciated." It would be fairly surprising if any behavioral economist studying this problem ignored either of these things, and the claim needs citations to support the comparison against "previously appreciated".

* Minor comments

  - Line 16: > [18] found them... Consider using \citeauthor{}?
  - Line 167: > be the N -th agent’s

    Should this be the i-th agent?
  - Figure 3: Clarify what the fill color indicates and how many runs the results were averaged over.
  - Figure 4: The figure is not self-contained and refers to Fig. 6, which is in the supplementary material. Fig. 6 is understandably large and hard to fit in the main paper, but at least consider clarifying that it is in the supplementary material (as you have done for other supplementary figures mentioned in the main paper).
  - Figure 5:
    - Consider increasing the axes margins? Markers at 0 and 12 are cut off.
    - Increase the space between the main caption and the sub-captions.
  - Line 299: From Fig 5b, it's not clear that |R|=7 is the maximum. To my eyes, 6 seems higher.