Evaluating the statistical significance of biclusters

Part of Advances in Neural Information Processing Systems 28 (NIPS 2015)


Jason D. Lee, Yuekai Sun, Jonathan E. Taylor


Biclustering (also known as submatrix localization) is a problem of high practical relevance in the exploratory analysis of high-dimensional data. We develop a framework for performing statistical inference on biclusters found by score-based algorithms. Because the bicluster is selected in a data-dependent manner by a biclustering or localization algorithm, this is a form of selective inference. Our framework gives exact (non-asymptotic) confidence intervals and p-values for the significance of the selected biclusters. Further, we generalize our approach to obtain exact inference for Gaussian statistics.
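
To illustrate why data-dependent selection calls for selective inference, the following is a minimal sketch, not the authors' implementation. It assumes a small matrix with independent Gaussian noise of known standard deviation `sigma`, a known bicluster size `k`, and an exhaustive search that selects the k-by-k submatrix with the largest sum. Under those assumptions, the event that a given submatrix wins is a set of linear inequalities in the data, so the selected block average follows a truncated normal distribution under the null of zero mean. The names `selective_pvalue`, `block_sum`, and `normal_sf` are hypothetical and introduced only for this example.

```python
import itertools
import math


def normal_sf(x):
    """Upper tail probability P(Z > x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def block_sum(X, rows, cols):
    """Sum of the entries of X in the submatrix rows x cols."""
    return sum(X[i][j] for i in rows for j in cols)


def selective_pvalue(X, k, sigma=1.0):
    """Pick the k-by-k submatrix with the largest sum by exhaustive search.

    Returns the selected (rows, cols), the naive one-sided p-value that
    ignores selection, and a p-value conditioned on the selection event.
    Assumes i.i.d. N(mean, sigma^2) entries with known sigma, and tests
    whether the selected block average has mean zero.
    Not numerically robust for extremely large z-scores.
    """
    n_rows, n_cols = len(X), len(X[0])
    blocks = [(r, c)
              for r in itertools.combinations(range(n_rows), k)
              for c in itertools.combinations(range(n_cols), k)]
    sums = [block_sum(X, r, c) for r, c in blocks]
    best = max(range(len(blocks)), key=sums.__getitem__)
    rows, cols = blocks[best]
    s_best = sums[best]
    t_obs = s_best / (k * k)  # average of the selected block

    # Each competing block B' imposes sum(B') <= sum(B). Holding fixed the
    # part of the data orthogonal to the block indicator, this becomes a
    # lower bound on the block average; keep the tightest one.
    lower = -math.inf
    for idx, (r, c) in enumerate(blocks):
        if idx == best:
            continue
        overlap = len(set(r) & set(rows)) * len(set(c) & set(cols))
        bound = t_obs + (sums[idx] - s_best) / (k * k - overlap)
        lower = max(lower, bound)

    scale = sigma / k  # null standard deviation of a k-by-k block average
    p_naive = normal_sf(t_obs / scale)
    # Truncated normal on [lower, infinity): rescale the upper tail.
    p_selective = p_naive / normal_sf(lower / scale)
    return (rows, cols), p_naive, p_selective
```

On a pure-noise matrix the naive p-value is typically very small, because the search has already picked the most extreme block, whereas the conditional p-value accounts for that search. The exhaustive enumeration is only practical for very small matrices and is used here to keep the selection event easy to write down.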