A Fast Multi-Resolution Method for Detection of Significant Spatial Disease Clusters

Part of Advances in Neural Information Processing Systems 16 (NIPS 2003)

Bibtex Metadata Paper


Daniel Neill, Andrew Moore


Given an N(cid:2)N grid of squares, where each square has a count and an un- derlying population, our goal is to find the square region with the highest density, and to calculate its significance by randomization. Any density measure D, dependent on the total count and total population of a re- gion, can be used. For example, if each count represents the number of disease cases occurring in that square, we can use Kulldorff’s spatial scan statistic DK to find the most significant spatial disease cluster. A naive approach to finding the maximum density region requires O(N 3) time, and is generally computationally infeasible. We present a novel algorithm which partitions the grid into overlapping regions, bounds the maximum score of subregions contained in each region, and prunes re- gions which cannot contain the maximum density region. For sufficiently dense regions, this method finds the maximum density region in optimal O(N2) time, in practice resulting in significant (10-200x) speedups.