NeurIPS 2019
Sun Dec 8th through Sat Dec 14th, 2019, at Vancouver Convention Center
Paper ID: 8735
Title: Text-Based Interactive Recommendation via Constraint-Augmented Reinforcement Learning

Reviewer 1

The paper is well described, and the proposed method appears interesting and useful. The authors show performance improvements over existing methods on text-based recommendation and text generation tasks. I'd suggest providing more details on "Model Training" (lines 125-151), including how \Gamma and \lambda_{max} are selected and how the projection operator stabilizes the parameters.
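For context on the projection step asked about above: in Lagrangian-based constrained RL, the multipliers \lambda are typically updated by gradient ascent on the constraint violations and then projected back onto a bounded box [0, \lambda_{max}], so they stay non-negative and cannot diverge. A minimal sketch of this standard dual update (the names `dual_update`, `lr`, and the two-constraint example are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def project(lmbda, lmbda_max):
    """Project the dual variables onto the box [0, lmbda_max]."""
    return np.clip(lmbda, 0.0, lmbda_max)

def dual_update(lmbda, constraint_violation, lr, lmbda_max):
    """One gradient-ascent step on the Lagrange multipliers,
    followed by projection to keep them bounded (this is what
    stabilizes the parameters)."""
    return project(lmbda + lr * constraint_violation, lmbda_max)

# Example: two constraints, one violated (+0.5) and one satisfied (-0.3).
lmbda = np.array([0.0, 1.0])
lmbda = dual_update(lmbda, np.array([0.5, -0.3]), lr=0.1, lmbda_max=1.0)
print(lmbda)  # -> [0.05 0.97]; both multipliers remain inside [0, 1]
```

Without the projection, a persistently violated constraint would let its multiplier grow without bound, destabilizing the policy update that \lambda weights.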

Reviewer 2

This paper presents a novel reinforcement learning algorithm that incorporates constraints from previous feedback into the recommendation process. The key contribution is that the user-preference constraints can be learned by a discriminator neural network. The paper is well written and easy to understand, and the experimental results clearly demonstrate that the proposed method outperforms existing baseline algorithms.

Reviewer 3

The paper proposes a new constrained reinforcement learning method in which the constraints are sequentially added. The authors apply this method to text-based recommendation and text generation tasks. Generally speaking, this paper is well written and well organized. The motivating example in the Introduction (i.e., feedback such as clicking or rating contains little information about a user's complex attitude towards various aspects of an item) naturally motivates the proposal of sequentially added constraints. The proposed model (i.e., Eq. (3), Eq. (5), and the model details) is consistent with the target task. The reward and constraints are reasonably designed. The experimental setting is remarkable (especially the online evaluation by simulator and the four proposed evaluation metrics) and the results are positive. However, this paper still has the following minor issues.

1. In the experiment section, the sentences are generated by a GRU model. How is it ensured that the generated sentences are short and meaningful, such as 'ankle boots' and 'shoes with suede texture' in Fig. 2?

2. It seems that the recommendation depends entirely on the current visual picture and natural language feedback, without considering historical behaviors as in traditional recommendation methods. I wonder whether it can outperform traditional recommendation methods, or how they could be incorporated into the framework of this paper.

3. More related work should be cited. 'Q&R: A Two-Stage Approach toward Interactive Recommendation, KDD 2018' and 'Query-based Interactive Recommendation by Meta-Path and Adapted Attention-GRU, arXiv 2019' also focus on interactive recommendation. Since constraints are added in a sequential fashion and the recommendation is based on the sequential constraints, sequential recommendation methods such as 'What to Do Next: Modeling User Behaviors by Time-LSTM, IJCAI 2017' and 'A Brand-level Ranking System with the Customized Attention-GRU Model, IJCAI 2018' are also related to this paper.

4. Typos:
a. Page 2: 'In text-based recommendation ,' should be 'In text-based recommendation,'; ', in practice' should be '. In practice'.
b. Page 3: 'proposed to using' should be 'proposed to use'.
c. Page 5: 'p(X)' should be 'p(\hat{X})'.
d. Page 6: ' Besides' should be '. Besides'; 'manual' should be 'manually'; 'improving' should be 'improve'; 'ahierarchical' should be 'a hierarchical'; 'softmaxand' should be 'softmax and'.
e. Page 7: 'after after' should be 'after'.

After the authors' response: Most of my questions are well answered except for the following two. 1. Although the authors argue that the GRU model is trained on short sentences, and would therefore generate short, simple sentences with a prefix, I still doubt whether it can reliably generate meaningful short sentences. 2. I do not agree that 'traditional recommendation models are usually trained in an offline manner', because some recommendation methods are trained in an online-learning fashion while also considering historical behaviors, e.g., FTRL. However, these two minor issues do not prevent this paper from being a good submission. I would not change my score.