Paper ID: 784
Title: Learning to Prune in Metric and Non-Metric Spaces
Reviews

Submitted by Assigned_Reviewer_5

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
This paper studies how to improve the computational efficiency of approximate nearest neighbor retrieval methods. The paper itself is well summarized in the abstract. It is well-written and interesting. Moreover, a key contribution is Property 1, where the authors characterize the kinds of spaces to which their proposed method is applicable.

However, the contribution is perhaps not closely related to learning and neural computing. I do not see learning playing an important role in the proposed method: the learning in this paper amounts to simple sampling and approximating a piecewise linear function in very low-dimensional spaces. As a consequence, to me this paper may not match the research interests of NIPS very well.

** Comment after authors' feedback **

The authors have clarified my concerns. I now see why learning cannot be omitted from this method, and why the learning here should be simple (in fact, learning for the underlying problem should be simple).
Q2: Please summarize your review in 1-2 sentences
This paper studies how to improve the computational efficiency of approximate nearest neighbor retrieval methods. It is well-written and interesting.

Submitted by Assigned_Reviewer_6

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
Summary: The paper presents a method that learns a pruning algorithm for a VP-tree in non-metric spaces. The idea is to estimate the decision function of the approximate nearest neighbor search in the VP-tree by sampling, and to approximate it with a piecewise linear function. The learning-to-prune method is validated for search efficiency against relevant pruning baselines, and it outperforms them substantially when the intrinsic dimensionality of the data is small.
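To make the idea concrete, here is a minimal sketch (my own illustration for this review, not the authors' implementation; the binning scheme, the 0.95 quantile, and all names are assumptions) of how a piecewise linear pruning threshold for a VP-tree could be learned from sampled queries:

```python
import numpy as np

def fit_pruning_threshold(r_samples, nn_dist_samples, n_pieces=7):
    """Learn a piecewise linear threshold tau(r) from sampled queries.

    r_samples:       distances from sampled queries to a pivot
    nn_dist_samples: distances from those queries to their true nearest
                     neighbors (found by brute force during training)
    """
    edges = np.linspace(r_samples.min(), r_samples.max(), n_pieces + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    idx = np.digitize(r_samples, edges[1:-1])  # bin index in [0, n_pieces)
    # A high quantile per bin gives a conservative pruning threshold.
    tau = np.array([np.quantile(nn_dist_samples[idx == i], 0.95)
                    if np.any(idx == i) else 0.0
                    for i in range(n_pieces)])
    return centers, tau

def visit_both_subtrees(r, R, centers, tau):
    """Descend into both children of a VP-tree node (pivot radius R) only
    when the query-to-pivot distance r is within tau(r) of R; otherwise
    one subtree is pruned."""
    return abs(r - R) <= np.interp(r, centers, tau)
```

Linear interpolation between bin centers keeps the per-node cost of the decision to a single interpolation, which matters when the distance function itself is cheap.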

Clarity: The paper is mostly clearly written, but it sometimes does not explain the implementation details or the choice of some parameters (for example, why choose K=100, m=7, rho=8, and a bucket size of 10^5? Lines 185, 227, 315).

Originality: To the best of my knowledge, learning to approximate the approximate nearest neighbor decision function on a VP-tree is the first work that 'learns to prune'.

Significance: The nearest neighbor method is a fundamental tool in search and classification; thus this learning-to-prune method, which approximates the nearest neighbor search with a non-linear function, should be of interest to a wide audience.

However, the datasets chosen for the experimental validation seem rather simple and low-dimensional, which is far from realistic settings. (What are the results on RCV-256, and on SIFT for L2?) Also, whether the proposed method can achieve the desired speed-up in metric spaces is not well justified, which limits its applicability. For fast search in metric spaces there are existing methods that use LSH and embeddings; one relevant paper is: [31] P. Jain, B. Kulis, K. Grauman, Fast Image Search for Learned Metrics, CVPR 2008.
Q2: Please summarize your review in 1-2 sentences
The paper presents a novel learning-to-prune method that approximates the approximate nearest neighbor decision function in a VP-tree with a piecewise linear function learned by sampling, which results in large speed-ups compared to existing methods that do not use learning. The approach of learning the decision function seems novel, and the method seems to work well at least on the selected datasets, but the motivation for targeting non-metric spaces specifically should be better justified, since it leaves out relevant baselines for metric spaces.

Submitted by Assigned_Reviewer_7

Q1: Comments to author(s). First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. (For detailed reviewing guidelines, see http://nips.cc/PaperInformation/ReviewerInstructions)
This paper proposes to estimate the decision function that speeds up the nearest neighbor retrieval process in a VP-tree. More specifically, the authors propose to do this via sampling plus regression with a piecewise linear function. This strategy works in both metric (e.g., Euclidean) and non-metric (e.g., some Bregman divergences) spaces. The proposed method is shown to be empirically faster than the recently proposed state-of-the-art in most cases. The paper also discusses the applicability of the VP-tree.

This method seems to be fairly reasonable, and can be viewed as an application of basic machine learning algorithms to search where brute-force evaluation is expensive. The paper is well organized and clearly written, and the experiments are convincing.

This paper is clearly written, and seems to be reasonably new and technically sound.
Q2: Please summarize your review in 1-2 sentences
This paper proposes to estimate the decision function that speeds up the nearest neighbor retrieval process in a VP-tree, specifically via sampling plus regression with a piecewise linear function. The paper is clearly written, and it seems reasonably new and technically sound.
Author Feedback

Q1:Author rebuttal: Please respond to any concerns raised in the reviews. There are no constraints on how you want to argue your case, except for the fact that your text should be limited to a maximum of 6000 characters. Note however that reviewers and area chairs are very busy and may not read long vague rebuttals. It is in your own interest to be concise and to the point.
Dear reviewers,

Thank you for your comments!

First, learning is essential to our method because there is no analytical solution for a pruner in generic non-metric spaces. The algorithm will not work if we remove the learning part.
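For illustration, the classic metric-space pruner has a closed form derived from the triangle inequality; a minimal sketch (the function name and notation are ours for this rebuttal, not from the paper):

```python
def metric_vp_prune(r, R, tau):
    """Exact VP-tree pruning in a metric space via the triangle inequality.

    r:   distance from the query to the pivot
    R:   pivot radius separating the inner ball from the outer shell
    tau: distance to the best nearest-neighbor candidate found so far
    """
    visit_inner = (r - tau) <= R  # otherwise no inner point can beat tau
    visit_outer = (r + tau) >= R  # otherwise no outer point can beat tau
    return visit_inner, visit_outer
```

In a generic non-metric space no such bound holds, which is exactly where the learned decision function takes over.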

Second, the learning method should be simple; otherwise, the indexing cost would be prohibitive and retrieval could be slow. In fact, we demonstrate that a previously proposed (non-learning) search method for Bregman divergences can be slow because it uses a computationally expensive pruning function. Simple learning methods work well, and this is promising: further performance gains may be achievable with better learning algorithms and pivot-selection techniques.

Third, NN classifiers require an efficient NN search method. NN search methods have appeared in the NIPS proceedings several times, even when they did not involve learning at all (in particular, the bb-tree method due to Cayton). Our method can be useful for NN classification, and it does involve learning.

Also note that the dimensionality is not exactly low: the SIFT vector dimensionality is 1111. The uniform dataset has a very high INTRINSIC dimensionality (in fact, much higher than that of the SIFT and RCV-* datasets). We do improve over the bb-tree (KL-divergence) in many cases, even when the dimensionality is high.

For the camera-ready version, we have decided to expand the experimental section (and to shorten the description of the sampling approach, which was not very competitive).