This paper is well written and easy to follow. The experimental settings look convincing to me, and the analysis and results are interesting. To me, this is a good piece of work on understanding stochastic models for sequence modeling. I do not fully understand Section 3.4. From Eq. (2), the auxiliary posterior is q_\phi(z_t | z_{
Reviewer 2
The authors discuss the role of latent variables in sequence models where multiple observations of the time series are modeled at once using a factorized form that assumes conditional independence. This assumption is almost surely violated in practice, which limits the performance of such models. When the sequence model is equipped with latent variables, it can account for the correlation structure among the observations within a time window, resulting in better performance than models without latent variables. Results on multiple datasets support this intuition. Though the analysis presented by the authors is clear, well motivated, and justified, the paper seems to downplay two things: the importance of and motivation for sequence models that consider multiple observations at once in a windowed manner, and how sequence models with stochastic (latent) variables, through their ability to capture correlation structure, alleviate some of the issues associated with windowing, i.e., the conditional independence assumption. With the above in mind, the results in Table 4 are not surprising. However, for full context they should be presented alongside the runtimes, which are currently relegated to the supplementary material.

Post rebuttal: The authors have addressed my concerns about the context of the results, the runtimes, and the trade-off between computational cost and performance.
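
For concreteness, the conditional independence point raised in this review can be written out as a rough sketch. The notation is an assumption made here for illustration and is not taken from the paper: x_t denotes a window of L observations x_{t,1}, ..., x_{t,L}, x_{<t} denotes all earlier windows, and z_t is a per-step latent variable whose prior is simplified to depend on x_{<t} only.

```latex
% Factorized output: observations within a window are conditionally independent.
p(x_t \mid x_{<t}) = \prod_{i=1}^{L} p(x_{t,i} \mid x_{<t})

% With a latent variable: the output is factorized only given z_t, so the
% marginal over z_t can still capture correlation within the window.
p(x_t \mid x_{<t}) = \int p(z_t \mid x_{<t}) \prod_{i=1}^{L} p(x_{t,i} \mid z_t, x_{<t}) \, dz_t

% Auto-regressive output: intra-window dependence is modeled explicitly.
p(x_t \mid x_{<t}) = \prod_{i=1}^{L} p(x_{t,i} \mid x_{t,<i}, x_{<t})
```

Under these assumptions, the second and third forms can both represent dependence among the observations in a window, while the first cannot.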
Reviewer 3
- The authors verified the effectiveness of latent variables in SRNN for speech density estimation. They point out that the performance advantage of F-SRNN could be entirely attributed to implicitly utilizing the intra-step correlation. Their experimental results show that under the non-factorized output setting, no benefit of latent variables can be observed, and the straightforward auto-regressive approach demonstrates superior performance. The authors performed a thorough analysis and discussion of the problem setting and give reasonable assumptions. They carefully conducted the empirical experiments and the results are convincing.
- The paper offers some meaningful recommendations, such as: besides capturing the long-term temporal structure across steps, properly modeling the intra-step dependency is also crucial to practical performance.
- I was surprised at the massive increase in the auto-regressive results in column 2 of Table 4. As a result, it is hard to judge the difficulty and importance of these improvements.
- Overall, this is a well-written paper. The language is easy to understand, and its results are highly reproducible. I therefore recommend acceptance for publication following minor revision.