Review for NeurIPS paper: Timeseries Anomaly Detection using Temporal Hierarchical One-Class Network

NeurIPS 2020

Timeseries Anomaly Detection using Temporal Hierarchical One-Class Network

Review 1

Summary and Contributions: This paper presents an unsupervised, attention-based feed-forward neural network that takes multi-resolution features extracted by dilated RNNs as inputs. An SVDD loss serves as the anomaly objective. To encourage the network to learn informative features, an orthogonal loss and a self-supervised (auto-regression) loss are also used. The empirical studies on 6 popular time series datasets support the effectiveness of the proposed technique. The major contribution is enabling the model to consider multi-resolution features in the time series anomaly detection problem.
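To make the role of an orthogonal loss concrete, here is a minimal sketch in plain Python. It assumes the loss penalizes overlap between a set of learned vectors by measuring how far their Gram matrix is from the identity; which vectors the paper constrains, and the exact form it uses, are not stated in this review, so the function name and the toy vectors below are hypothetical.

```python
def gram_minus_identity_penalty(vectors):
    """Squared Frobenius norm of (V V^T - I) for a list of row vectors V.

    Illustrative assumption: a common way to write an orthogonality
    penalty, not necessarily the formulation used in the paper.
    """
    k = len(vectors)
    total = 0.0
    for i in range(k):
        for j in range(k):
            dot = sum(a * b for a, b in zip(vectors[i], vectors[j]))
            target = 1.0 if i == j else 0.0
            total += (dot - target) ** 2
    return total


# Two orthonormal vectors incur no penalty; two identical ones do.
orthonormal = [(1.0, 0.0), (0.0, 1.0)]
redundant = [(1.0, 0.0), (1.0, 0.0)]
print(gram_minus_identity_penalty(orthonormal))  # 0.0
print(gram_minus_identity_penalty(redundant))    # 2.0
```

Under this reading, the penalty discourages learned vectors from collapsing onto the same direction, which is one way such a term could push the network toward more informative features.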

Strengths: The major strength of this paper is the model's performance on 6 datasets. It outperforms all 13 baselines by a large margin.

Weaknesses: One potential weakness is a lack of novelty. The neural network framework can be separated into 3 parts: a multi-resolution feature extraction module using dilated RNNs; a feature fusion module using an attention mechanism; and an anomaly loss using SVDD. All three modules are known techniques with few technical improvements. The presentation of this paper also needs to be improved. For example, Section 3.1.2 should cite the attention mechanism rather than reinvent technical terms. Also, the authors do not mention how the multi-resolution features extracted from the dilated RNNs are fed into the fusion module.
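For readers less familiar with the first module, the idea of a dilated recurrence can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's implementation: the scalar tanh cell, the weights `w_in` and `w_rec`, and the skip lengths `(1, 2, 4)` are all hypothetical. The sketch only covers feature extraction; how the per-layer outputs reach the fusion module is exactly what the paper leaves unstated.

```python
import math


def dilated_rnn_layer(inputs, skip, w_in=0.5, w_rec=0.5):
    """Scalar tanh recurrence whose state at step t depends on the state at t - skip.

    Hypothetical toy cell: real dilated RNNs use vector states and learned weights.
    """
    hidden = []
    for t, x in enumerate(inputs):
        # Before `skip` steps have elapsed there is no earlier state to read.
        prev = hidden[t - skip] if t >= skip else 0.0
        hidden.append(math.tanh(w_in * x + w_rec * prev))
    return hidden


def multi_resolution_features(series, skips=(1, 2, 4)):
    """Stack layers with growing skip lengths and keep every layer's outputs."""
    features = []
    layer_input = series
    for skip in skips:
        layer_input = dilated_rnn_layer(layer_input, skip)
        features.append(layer_input)
    return features


series = [0.1, 0.4, 0.3, 0.9, 0.2, 0.5, 0.7, 0.6]
features = multi_resolution_features(series)
print(len(features), [len(layer) for layer in features])
```

Each layer produces one output per time step, so a fusion module would receive one sequence per resolution; layers with larger skip lengths summarize longer-range dependencies.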

Correctness: Both claims and method are plausible. Empirical experiments show outstanding performance.

Clarity: Generally the paper is well written and easy to follow. However, as stated in the previous section, several key parts are not very clear. Also, the authors should consider summarizing their contributions in the introduction.

Relation to Prior Work: Yes.

Reproducibility: Yes

Additional Feedback: (1) The justification of the model design is not fully convincing. For instance, this work employs a dilated RNN to extract multiscale temporal features. However, similar works (e.g., WaveNet, MSCRED) have been developed to address this issue. It is not clear why a dilated RNN is a valid choice. (2) Instead of using an attention mechanism together with an orthogonal loss and an auto-regression loss, the authors may consider using a self-attention mechanism to simplify their framework. ============================== My concerns have been partially addressed. Therefore, I am happy to raise my score.

Review 2

Summary and Contributions: The paper addresses the task of unsupervised anomaly detection in time series, using a multi-scale RNN to extract temporal features and a hierarchical network to output anomaly scores. The authors build on Deep SVDD by extending the single-hypersphere case to cover multiple modalities and multi-level features. They also include a self-supervised loss (reconstruction loss). The proposed model outperforms all other competitors on 4 anomaly-detection datasets.

Strengths: The performance and the ablation study section demonstrate the importance of using both a hierarchical structure and additional losses.

Weaknesses: In the "Broader Impact" section the authors state: "This work extends the one-class classification method to time series". As the authors themselves previously mentioned: "One-class classification [...] The idea is that by assuming that most of the training data are normal, their characteristics are captured and learned by a model. An outlier is then detected when the current observation from the time series cannot be well-fitted by the model". This is a very generic description which can potentially fit any unsupervised anomaly detection method (e.g. an autoencoder trained for reconstruction on normal data only would fit it), rendering the first statement not true. I think the authors should remove it.

Correctness: The claims and empirical methodology seem correct.

Clarity: The use of multiple indices in section 3 makes it really hard to follow the method workflow. The use of "s" for both the sample and the skip dilation doesn't help in that sense. Figure 1 (right) is not clear, as it seems to indicate a tree structure in the hierarchical network. I encourage the authors to replace it to enhance the overall clarity.

Relation to Prior Work: The authors clearly state how their work differs from Deep SVDD, which can be considered here as a baseline.

Reproducibility: Yes

Additional Feedback: After the rebuttal I am still positive toward the paper so my rating is unchanged.

Review 3

Summary and Contributions: This paper proposes a temporal hierarchical one-class neural network for time series anomaly detection, built on a proposed multiscale support vector data description (MVDD) objective function. Extending support vector data description (SVDD) to anomaly detection on dynamic data is an important and interesting problem. To show the advantages of the proposed approach, the paper conducts thorough experiments on various datasets and compares the approach with many basic and advanced baselines.

Strengths: This paper targets an important problem, time series anomaly detection, and extends the Deep SVDD objective function from static data to time series data by proposing MVDD. The proposed approach beats a number of baselines for time series anomaly detection in terms of precision, recall, and F1.

Weaknesses: I am confused about the MVDD part of the proposed approach. For Deep SVDD, only one center \mathbf{c} is used to represent the normal class, which is easy to understand. However, in the proposed approach there are K^l clusters in each layer of the RNN. I cannot fully understand the motivation for this setting. It also means there are many hyper-parameters (K^l) in the proposed neural network. At the least, the parameter sensitivity with respect to K^l should be evaluated in the experiments. Meanwhile, the ablation study in Table 5 shows that when only the MVDD-based loss is used, the performance is poor, worse than most of the baselines. It seems that the self-supervised and orthogonal losses play an important role in the anomaly detection performance.
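The contrast between one center and several centers can be illustrated with a toy example. This is a sketch under stated assumptions: the single-center score follows the usual Deep SVDD idea of distance to \mathbf{c}, while the multi-center score uses a hard nearest-center rule chosen only for illustration, which is not necessarily how MVDD assigns features to its K^l clusters. The 2-D points and center locations are hypothetical.

```python
def sq_dist(x, c):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def one_center_score(x, center):
    """Deep SVDD-style anomaly score: squared distance to the single center."""
    return sq_dist(x, center)


def multi_center_score(x, centers):
    """Hard nearest-center score over K centers (illustrative assumption)."""
    return min(sq_dist(x, c) for c in centers)


# Toy setting: normal data has two modes, at (0, 0) and (4, 0).
centers = [(0.0, 0.0), (4.0, 0.0)]
single_center = (2.0, 0.0)  # midpoint of the two modes

at_a_mode = (4.0, 0.0)      # typical normal point
between_modes = (2.0, 0.0)  # lies where no normal data is expected

print(one_center_score(at_a_mode, single_center))      # 4.0
print(one_center_score(between_modes, single_center))  # 0.0
print(multi_center_score(at_a_mode, centers))          # 0.0
print(multi_center_score(between_modes, centers))      # 4.0
```

In this toy case the single center ranks the point between the modes as more normal than a point sitting on a mode, while several centers reverse that ranking. That is one plausible motivation for multiple clusters, but it also shows why the number of centers matters and why a sensitivity study on K^l would be informative.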

Correctness: Since the proposed MVDD differs from the commonly used form of SVDD, it would be better to provide a theoretical analysis of how MVDD maps anomalies outside the hypersphere.

Clarity: Overall, the paper is well written. The notation is somewhat complicated, which makes the equations hard to follow. It would be better to put Tables 2 and 4 in the same place for clarity.

Relation to Prior Work: Yes

Reproducibility: No

Additional Feedback: As pointed out by R1, this paper conducts a good empirical study, so I am willing to increase the overall score.