NeurIPS 2019
Sunday, December 8 through Saturday, December 14, 2019, at the Vancouver Convention Center
Paper ID 2257: Thinning for Accelerating the Learning of Point Processes

### Reviewer 1

The paper deals with parametric point processes on the real line. The authors show that thinning a point process, i.e., keeping each point independently at random with a given probability $p$, compresses the intensity but preserves its structure. Hence, it provides a downsampling method. The method seems to be new, even if it is not a major breakthrough. It is more elegant than the various techniques that piece together sub-interval downsampling, and the proofs given in the paper are quite general. However, the paper lacks an evaluation of the uncertainty of the estimate. The paper is clearly written, even if it is sometimes unnecessarily abstract (see, e.g., Definition 2.2 of the stochastic intensity). By way of example, the theoretical results are applied to two particular parametric cases: non-homogeneous Poisson point processes and Hawkes processes. This is a good idea because it helps the reader understand the general theoretical results and see their possible use.
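As a minimal sketch of the "compress the intensity, preserve the structure" property, assuming the simplest case the review mentions, a non-homogeneous Poisson process: independent $p$-thinning of a Poisson process with intensity $\lambda(t)$ is known to give a Poisson process with intensity $p\lambda(t)$. The intensity $\lambda(t) = a + b\sin t$, the parameter values, and the helper names below are hypothetical choices for illustration; this is not the paper's code and does not cover the Hawkes case.

```python
import math
import random


def simulate_inhomogeneous_poisson(intensity, lam_max, T, rng):
    """Simulate event times on [0, T) by Lewis-Shedler thinning.

    Candidates come from a homogeneous process with rate lam_max; each is
    accepted with probability intensity(t) / lam_max (needs intensity <= lam_max).
    """
    events = []
    t = 0.0
    while True:
        t += rng.expovariate(lam_max)
        if t >= T:
            return events
        if rng.random() < intensity(t) / lam_max:
            events.append(t)


def p_thin(events, p, rng):
    """Independent p-thinning: keep each event with probability p."""
    return [t for t in events if rng.random() < p]


def fraction_where_sine_positive(events):
    """Share of events where sin(t) > 0, i.e. in the high-intensity phase."""
    return sum(1 for t in events if math.sin(t) > 0) / len(events)


# Hypothetical example: lambda(t) = a + b sin(t), downsampled with ratio p.
a, b, T, p = 5.0, 3.0, 2000.0, 0.1
rng = random.Random(0)


def intensity(t):
    return a + b * math.sin(t)


events = simulate_inhomogeneous_poisson(intensity, a + b, T, rng)
thinned = p_thin(events, p, rng)

# The thinned process has intensity p * lambda(t), so dividing its count by
# p * T gives a simple moment estimate of a (the sine term integrates to O(1)).
a_hat = len(thinned) / (p * T)

print(len(events), len(thinned), round(a_hat, 2))
print(round(fraction_where_sine_positive(events), 3),
      round(fraction_where_sine_positive(thinned), 3))
```

The thinned sequence is roughly ten times shorter, yet the share of events in the high-intensity phase is essentially unchanged, which is the sense in which the shape of $\lambda(t)$ survives the compression.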

### Reviewer 2

The paper rigorously tackles the question of how best to learn point process parameters when the available samples from the point process are very long sequences. The models can be trained either on sub-intervals or, as recommended in this paper, on thinned, and therefore shorter, versions of the original sequences. The authors first rigorously establish the classes of models for which thinning can work, then show that the variance of the gradient of the residue computed on the thinned sequence is smaller than that computed on the sub-interval sequence, and finally show experimentally that thinning manages to learn state-of-the-art models. The paper is very well written; each lemma/theorem/definition is followed by a clear example and explanation. The presentation is also smooth, and the paper is approachable while remaining rigorous. However, a few parts of the paper could still do with more explanation. First, the authors hint in lines 186-190 that the gradient for stochastic intensities is potentially unbiased. An example here and a discussion of the limitations would help contextualize the contribution better. Also, there seems to be a rather sudden jump from learning/modelling a single intensity function to multivariate intensity functions between Definition 4.2 and the Experiments section. Overall, I believe that the paper makes a significant contribution and should be accepted for publication.

Minor:

- Line 66: Missing period.
- Eq. (2): R(.) is undefined.
- Line 196: "of applying"
- Missing Theorem (reference) in the Supplementary, above Eq. 9.
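The two training regimes contrasted in this review can be illustrated with a small, hypothetical sketch: a Poisson process whose rate jumps halfway through the window is downsampled with the same ratio $p$, either by keeping the prefix sub-interval $[0, pT)$ or by $p$-thinning. The piecewise-constant rates, the helper names, and the per-segment rate estimate are assumptions made for illustration. The sketch only shows that a thinned sequence still covers the whole observation window; it does not reproduce the paper's gradient-variance analysis.

```python
import random


def simulate_piecewise_poisson(rates, T, rng):
    """Poisson process on [0, T) whose rate is constant on each of
    len(rates) equal segments (memorylessness lets us restart per segment)."""
    seg = T / len(rates)
    events = []
    for k, rate in enumerate(rates):
        t = k * seg
        while True:
            t += rng.expovariate(rate)
            if t >= (k + 1) * seg:
                break
            events.append(t)
    return events


def sub_interval(events, T, p):
    """Sub-interval downsampling: keep only the prefix window [0, p*T)."""
    return [t for t in events if t < p * T]


def thin(events, p, rng):
    """Thinning: keep each event independently with probability p."""
    return [t for t in events if rng.random() < p]


def segment_rates(events, T, n_segments, scale):
    """Empirical event rate on each segment, divided by `scale`."""
    seg = T / n_segments
    counts = [0] * n_segments
    for t in events:
        counts[min(int(t // seg), n_segments - 1)] += 1
    return [c / (seg * scale) for c in counts]


# Hypothetical regime change: rate 2 on [0, 500), rate 10 on [500, 1000).
T, rates, p = 1000.0, [2.0, 10.0], 0.25
rng = random.Random(1)
events = simulate_piecewise_poisson(rates, T, rng)

prefix = sub_interval(events, T, p)   # only sees t < 250, i.e. the rate-2 regime
thinned = thin(events, p, rng)        # spans the whole window [0, T)

print(len(prefix), len(thinned))
print(segment_rates(thinned, T, len(rates), p))  # rescaled by 1/p, roughly [2, 10]
```

Because the prefix never reaches $t \ge 250$, any estimate built from it is blind to the second regime, whereas the rescaled thinned counts recover both rates.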

### Reviewer 3

The thinning idea for learning point processes is interesting. The paper is well written. My only concern is the applicability of the proposed model. In the real-world experiments, only the task of learning a Hawkes process is discussed. However, the Hawkes process is a weak baseline, and many other point process models have been shown to perform better than Hawkes processes on the IPTV and taxi data. Comparing against and discussing these models would improve the paper.

--------------

Thanks to the authors for their response, which addressed my concerns. I have changed my score accordingly.