NeurIPS 2020

HM-ANN: Efficient Billion-Point Nearest Neighbor Search on Heterogeneous Memory


Review 1

Summary and Contributions: This paper presents a novel graph-based similarity search algorithm called HM-ANN, which takes both memory and data heterogeneity into consideration and enables billion-scale similarity search on a single node without using compression.

Strengths: This paper is easy to follow and achieves state-of-the-art results. The idea makes sense.

Weaknesses: The top-1 recall on the datasets is lower than that of other methods.

Correctness: yes

Clarity: yes

Relation to Prior Work: yes

Reproducibility: Yes

Additional Feedback:


Review 2

Summary and Contributions: This paper studies the problem of approximate nearest neighbor search (ANNS) and aims at redesigning the state-of-the-art solution for an emerging hardware architecture, heterogeneous memory (HM). HM has two levels: the first level is the traditional faster and smaller memory, and the second level is a new slower but larger memory. Compared with traditional memory, HM offers more memory space through the new second level. As a tradeoff, HM introduces a data-fetching cost, as data must frequently be moved from the second level to the first. The paper claims four contributions. First, it presents the first billion-scale ANNS using HM. Second, it optimizes data fetching to improve ANNS query speed. Third, it proposes a performance model to automatically set hyperparameters for ANNS according to practical time and accuracy constraints. Fourth, it demonstrates that the proposed algorithm outperforms the state-of-the-art HNSW and NSG algorithms.

Strengths: S1. The paper introduces a new algorithm, HM-ANN, that redesigns the state-of-the-art ANNS algorithm for heterogeneous memory. S2. The paper presents the performance of billion-scale ANNS without compressing data points. S3. The proposed algorithm HM-ANN achieves better query efficiency than the state-of-the-art algorithms, including HNSW and NSG. Previous billion-scale ANNS algorithms mostly compress data points due to insufficient memory. This work anticipates the emergence of heterogeneous memory (HM) and explores ANNS on uncompressed data. The work further proposes a new algorithm, HM-ANN, that optimizes data fetching in HM to improve query efficiency. The authors conducted extensive experiments to report the performance of billion-scale ANNS. The experiments demonstrate that HM-ANN outperforms the state-of-the-art HNSW and NSG algorithms and is 2x to 5.8x faster. [Author Response] The authors have addressed some of the comments in the response. I am happy to accept this paper.

Weaknesses: W1. The paper makes an inaccurate claim about the existence of billion-scale ANNS solutions. W2. The performance gain of the proposed HM-ANN algorithm seems marginal when considering its learning curve in practice. W3. The experiments do not evaluate the performance of data fetching, so it is hard to conclude that the proposed HM-ANN achieves better utilization of HM.

The paper claims that the proposed HM-ANN is the first billion-scale ANNS solution on a single machine without using compression (see the last paragraph of the Introduction). The claim is questionable since the paper presents four existing solutions, i.e., HNSW, NSG, IMI+OPQ, and L&C. These existing solutions seem applicable to billion-scale ANNS on a single machine without compression, according to Figure 1(a)-(d). It seems each solution can process a billion uncompressed data points when using heterogeneous memory. The authors may consider changing the claim or providing more explanation.

Besides, the query-efficiency improvements of HM-ANN seem marginal. In Figure 1(a), when achieving a recall of 0.9, HM-ANN is about 1x faster than HNSW. In Figure 1(b), when achieving a recall of 0.9, HM-ANN is not clearly faster than HNSW. HM-ANN achieves the largest improvement in Figure 1(c): at recall 0.9, it is about 2x faster than HNSW. Overall, HM-ANN does not clearly outperform HNSW in many experiments. Given that HM-ANN entails considerable human effort in its learning curve and deployment, it is unclear how likely practitioners are to replace HNSW with HM-ANN.

Although the paper claims HM-ANN is superior due to its data-fetching optimizations, there are no experiments verifying the claim. The experiments report index construction time, index size, and query time, but none of these metrics is directly related to data-fetching efficiency. It is too early to conclude that the better efficiency of HM-ANN comes from data-fetching optimizations. Although the paper presents many details of the optimizations, there are no theoretical justifications or experiments to evaluate them.

Correctness: Please refer to Weaknesses.

Clarity: The paper is well written except for potentially questionable claims. Please refer to Weaknesses.

Relation to Prior Work: I do not find discussions in this work that mention prior contributions.

Reproducibility: Yes

Additional Feedback:


Review 3

Summary and Contributions: This paper introduces an adapted version of the HNSW graph-based approximate nearest neighbor search algorithm. It is adapted to PMM, a storage medium that sits between RAM and solid-state disk (SSD) in terms of the speed-capacity tradeoff. The changes consist of:
- storing the bottom layer L0 of the HNSW on PMM while the rest remains in RAM, hence the heterogeneous memory (HM) name;
- a better selection of the L0 points to promote to L1, based on node degrees (sketched below);
- performing a beam search in L1 as opposed to a greedy search.
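To make the degree-based promotion concrete, here is a minimal sketch in Python (the adjacency-list representation and all names are my own illustrative assumptions, not the paper's actual data structures):

    import heapq

    def promote_high_degree(l0_graph, num_to_promote):
        # l0_graph: dict mapping node id -> list of neighbor ids
        # (an assumed adjacency-list view of the bottom layer).
        degrees = {node: len(neighbors) for node, neighbors in l0_graph.items()}
        # Promote the num_to_promote nodes with the largest degrees to L1.
        return heapq.nlargest(num_to_promote, degrees, key=degrees.get)

    # e.g. promote_high_degree({0: [1, 2], 1: [0], 2: [0, 1, 3], 3: [2]}, 2)
    # -> [2, 0]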

Strengths: The results are truly impressive. It is impossible to obtain such high recalls with in-RAM compression. The comparisons are fair, especially because the authors did a good job of exploring more straightforward ways to exploit HM (and showing that they are inferior).

Weaknesses: The hardware setup (heterogeneous memory) is uncommon, which makes this a niche application. The algorithmic adaptations are somewhat straightforward.

Correctness: The method is simple, so there is not much room for incorrectness. The discussion of the "performance model" in 3.3 is a bit disappointing because after the derivation it just boils down to a grid search on the few parameters of the method, which is what most other papers do as well.
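For reference, the grid search that the performance model reduces to could be as simple as the following sketch (the parameter names and the evaluate function are illustrative assumptions, not taken from the paper):

    from itertools import product

    def grid_search(evaluate, param_grid, recall_target=0.9):
        # evaluate(params) is assumed to return a (recall, latency) pair.
        best = None
        for values in product(*param_grid.values()):
            params = dict(zip(param_grid, values))
            recall, latency = evaluate(params)
            # Keep the fastest configuration that meets the recall target.
            if recall >= recall_target and (best is None or latency < best[0]):
                best = (latency, params)
        return best

    # e.g. grid_search(run_queries, {"efSearchL1": [100, 200], "num_threads": [16, 32]})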

Clarity: The paper is very well written overall. Three points:
- In the beginning of Section 3, it is not clear to me what the difference is between the degree of a node and the number of established connections in the HNSW data structure. My understanding is that the degree is the number of incoming edges in a k-NN graph.
- Please avoid long math identifiers like "efSearch_L0": they look more like variable identifiers in computer code.
- It would be useful to recap which properties of the slow memory make this possible: is it fast memory access? Throughput? Threaded access? Could it be applied to SSD storage as well?

Relation to Prior Work: The prior work for billion-scale indexing is relevant. The baselines are probably slightly outdated, since inverted-index-based methods now often use an HNSW structure as the top-level quantizer (see, e.g., https://github.com/dbaranchuk/ivf-hnsw or the Faiss recommended indexes). This should not invalidate the main results of the paper, though.

Reproducibility: Yes

Additional Feedback: *** After rebuttal: The authors have provided a serious rebuttal. I particularly appreciate the second item, which decomposes the contribution of each improvement. I would suggest the authors incorporate it into the final version. ***


Review 4

Summary and Contributions: This paper studies approximate nearest neighbour (NN) search and adapts the popular HNSW [28] algorithm with heterogeneous memory support to query billion-scale datasets. The key adaptations are: (1) promoting the elements with the highest degrees to upper layers in the hierarchical NN graph structure, (2) promoting more elements to the upper layers (to take advantage of the fast search efficiency of the hardware used to store such elements), and (3) parallel bottom-layer (L0) search with multiple entry points (a sketch is given below). A cost model is designed to help with algorithm parameter selection. Experimental results on both billion-scale and million-scale datasets confirm the effectiveness and efficiency of the proposed algorithm. ***Update after author rebuttal: I appreciate the authors' efforts to address my comments. I am happy with the rebuttal and am updating my score from "Marginally below the acceptance threshold" to "Marginally above the acceptance threshold".
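To illustrate adaptation (3), a minimal sketch of a parallel bottom-layer search might look as follows (the search_fn signature and the merging step are illustrative assumptions on my part, not the paper's Algorithm 3):

    from concurrent.futures import ThreadPoolExecutor

    def parallel_l0_search(query, entry_points, search_fn, k):
        # Run one bottom-layer search per entry point obtained from L1.
        with ThreadPoolExecutor(max_workers=len(entry_points)) as pool:
            partials = pool.map(lambda ep: search_fn(query, ep), entry_points)
        # Merge the per-thread (distance, node_id) candidates and keep
        # the k closest overall.
        merged = sorted(c for part in partials for c in part)
        return merged[:k]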

Strengths: 1. An important problem is studied -- approximate nearest neighbour search has wide applications in data mining and database problems. 2. A cost model is designed to help algorithm parameter selection. 3. Experimental results on large-scale real data confirm the effectiveness and efficiency of the proposed algorithm.

Weaknesses: The proposal is a simple and effective adaptation of the HNSW algorithm that yields fast and accurate ANN search results on billion-scale data. The main concern is that the technical depth of the paper seems somewhat limited. There is no theoretical analysis of the approximation ratio of the proposed algorithm, or of how the approximation ratio compares with HNSW's given the proposed adaptations. The results may suit a less theoretical venue. An additional baseline is needed: adapting HNSW by adding a parallel search over L0 with multiple entry points from L1 (similar to what Lines 8 and 9 in Algorithm 3 do). This would help demonstrate whether it is sufficient to modify only the search procedure (without modifying the hierarchical NN graph of HNSW) to achieve performance gains similar to what HM-ANN yields.

Correctness: Line 316: "Figure 2 shows the result. Overall, HM-ANN achieves competitive and sometimes even better performance as HNSW." This seems to contradict the figure, where HM-ANN outperforms HNSW in most cases. Line 297: "DiskANN [38] (not open-sourced), provides 95% top-1 recall in 3.5ms. In contrast, HM-ANN provides the same recall in less than 1ms". Are these numbers obtained under the same hardware and software settings?

Clarity: The paper is in general very well written and easy to follow. Adding a figure to illustrate the modified bottom-up promotion procedure may help further improve the readability of the paper.

Relation to Prior Work: The introduction and related work sections contain sufficient details on related studies. Section 3 contains sufficient details on how the proposed technique HM-ANN differs from HNSW.

Reproducibility: Yes

Additional Feedback: See above. Line 3 of Algorithm 1: v -> $v$. PS: This is an emergency review called upon two days ago, hence it was a bit late in the submission.