Paper ID: 2058

Title: Learning Representations for Time Series Clustering

The paper is in general very well written, and it presents its contribution properly in relation to other results in the literature. While the proposed method falls in the area of "deep learning", which means that there are no guarantees on its performance, an extensive simulation study shows that the contribution is promising.

The submission proposes a model for time-series clustering. The model is a novel combination of several existing components: a) a deep recurrent auto-encoder using dilated RNNs, b) a spectral relaxation of the K-means objective, and c) a self-supervision loss to discriminate time series corrupted by random shuffling from the original ones. The model is evaluated on a common benchmark for time-series clustering and achieves superior performance to existing methods.

Overall I feel positive about the proposed method, as the quantitative results look promising and using the spectral relaxation of K-means for deep clustering is novel and original. Nevertheless, I do have some concerns about the submission in its current form:

1. As far as I understand, the time series in the UCR benchmark are of fixed length for each category. In that sense there is no reason to explicitly model them as time series; one could technically consider them as static vectors. Therefore one should compare the proposed method to an appropriate subset of the large body of recent work on deep clustering methods for static inputs (e.g. see [1] for a recent overview).
2. The submission has no discussion or analysis of how to treat the case where the number of clusters K is not known a priori, which is often the case for real-world problems.
3. The analysis and results in sections 4.3.2 and 4.3.3 are purely qualitative, each evaluated on a single time-series clustering problem. It is not clear what general properties of the algorithm can be inferred.

Minor comments:
- The paper mentions sequence-to-sequence modeling multiple times; it should reference the original paper using deep learning for seq2seq [2].
- Typo in author name in line 119: Zhang et al. should be Zha et al.

References:
[1] Aljalbout, E., Golkov, V., Siddiqui, Y., Strobel, M., & Cremers, D. (2018). Clustering with deep learning: Taxonomy and new methods. arXiv preprint arXiv:1801.07648.
[2] Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. NIPS 2014.

———

Given the author feedback, in particular the comparison with DEC and IDEC also on a separate, larger dataset, I am happy to increase my score to 7.
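For readers unfamiliar with the component the review highlights as novel: the spectral relaxation of K-means drops the discreteness constraint on the cluster-indicator matrix, leaving a trace maximization over orthonormal matrices that is solved exactly by the top-k eigenvectors of the Gram matrix of the latent codes. A minimal numpy sketch of that relaxation (the shapes and function name are my own, not the submission's exact formulation):

```python
import numpy as np

def spectral_kmeans_loss(H, k):
    """Relaxed K-means loss on latent codes H (n_samples x d).

    Relaxing the discrete cluster-indicator matrix to any orthonormal
    F (F^T F = I_k) turns min ||H^T - H^T F F^T||_F^2 into
    Tr(G) - max_F Tr(F^T G F) with G = H H^T, and the maximizing F is
    given by the top-k eigenvectors of the Gram matrix G.
    """
    G = H @ H.T
    _, V = np.linalg.eigh(G)      # eigenvalues in ascending order
    F = V[:, -k:]                 # eigenvectors of the k largest eigenvalues
    return np.trace(G) - np.trace(F.T @ G @ F)

# Toy usage: latent codes that already form two tight clusters
H = np.vstack([np.zeros((5, 2)), 10.0 * np.ones((5, 2))])
loss = spectral_kmeans_loss(H, 2)   # near zero: codes are 2-cluster-friendly
```

In a deep clustering pipeline such as the one under review, a term of this form would be added to the reconstruction loss, so that the encoder is pushed toward representations on which K-means separates well.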

This paper proposes a method which integrates two new losses into an RNN autoencoder learning process for time-series data. Specifically, it adds a k-means loss to encourage k-means-friendly clusters and a classification loss based on discriminating between shuffled input data and the true input data. It is evaluated on a number of standard benchmark datasets and shows an average improvement of 1% over the second-best approach.

There are a number of benefits to this work:
a) I think the idea of using an RNN autoencoder, jointly trained with more cluster-friendly losses, is good. Further, the 'fake data' discrimination process strengthens this work.
b) The evaluation process, while limited to benchmark datasets and (mostly) time-series algorithms, shows improved performance and the robustness of the approach.
c) The ablation study helps clarify the individual contribution of each element of the proposed method.

However, I think there are also a number of significant improvements that could be made to bring this work up to the level required for significance:
a) The integration of k-means into an autoencoder loss function has been used successfully before in methods that are not specific to time series. I would be interested in understanding the motivation behind the use of the k-means loss in this setting. How does it hold up compared to dynamic time warping approaches?
b) Some parameters are hardcoded, such as T and lambda at lines 137 and 154. How robust is the method to these? I think this is important, as in the unsupervised setting the typical supervised methods of choosing these parameters are unavailable. Can these parameters be set based on some other knowledge?
c) On line 163 it is mentioned that the datasets contain a train/test split. What is the significance of saying this? In the unsupervised setting, is the entire dataset not used for testing?
d) I think each experiment should be run at least a few times (ideally 5 or 10) and the means and standard deviations reported. With approaches such as the proposed one, it is difficult to understand how much the random initialization contributed to the final performance without this.
e) The proposed method would also be much more convincing if larger, more complex datasets were evaluated.
f) I would find it useful to have a comparison with some other state-of-the-art deep clustering methods (such as DEC [1] and IDEC [2]). While not designed for time-series datasets, a comparison with them would improve both the clarity and potentially the significance of the proposed method.
g) The choice of metric is rather limiting. I would like to see accuracy and NMI also included as evaluation metrics, as they are both commonly used in the literature and provide additional insights into the performance.

Some minor suggestions to help improve the clarity of the paper:
- Lines 56 and 61: 'a' should be 'an'.
- Line 192: 'for /' bracket placement.
- Line 185: 'most best' is not grammatically correct; 'best' is fine.

References:
[1] Xie, Junyuan, Ross Girshick, and Ali Farhadi. "Unsupervised deep embedding for clustering analysis." International Conference on Machine Learning, 2016.
[2] Guo, Xifeng, et al. "Improved deep embedded clustering with local structure preservation." IJCAI, 2017.
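On point g), the two metrics the reviewer asks for are standard in the deep clustering literature: NMI is available directly in scikit-learn, and unsupervised clustering accuracy is usually computed by finding the best one-to-one mapping between predicted and true labels via the Hungarian algorithm. A small sketch (the helper name `clustering_accuracy` is mine):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    """Accuracy under the best one-to-one label matching (Hungarian algorithm)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    D = max(y_true.max(), y_pred.max()) + 1
    w = np.zeros((D, D), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        w[p, t] += 1                          # co-occurrence counts
    row, col = linear_sum_assignment(-w)      # maximize matched counts
    return w[row, col].sum() / y_true.size

# Clusters recovered perfectly but with permuted labels:
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [1, 1, 0, 0, 2, 2]
print(clustering_accuracy(y_true, y_pred))            # 1.0
print(normalized_mutual_info_score(y_true, y_pred))   # 1.0
```

Both metrics are invariant to label permutation, which is exactly what is needed when comparing an unsupervised clustering against ground-truth classes.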