Coresets for Decision Trees of Signals

Part of Advances in Neural Information Processing Systems 34 (NeurIPS 2021)

Authors

Ibrahim Jubran, Ernesto Evgeniy Sanches Shayda, Ilan I Newman, Dan Feldman

Abstract

A $k$-decision tree $t$ (or $k$-tree) is a recursive partition of a matrix (2D-signal) into $k\geq 1$ block matrices (axis-parallel rectangles, leaves), where each rectangle is assigned a real label. Its regression or classification loss with respect to a given matrix $D$ of $N$ entries (labels) is the sum of squared differences between every label in $D$ and its assigned label by $t$. Given an error parameter $\varepsilon\in(0,1)$, a $(k,\varepsilon)$-coreset $C$ of $D$ is a small summarization that provably approximates this loss for \emph{every} such tree, up to a multiplicative factor of $1\pm\varepsilon$. In particular, the optimal $k$-tree of $C$ is a $(1+\varepsilon)$-approximation to the optimal $k$-tree of $D$. We provide the first algorithm that outputs such a $(k,\varepsilon)$-coreset for \emph{every} such matrix $D$. The size $|C|$ of the coreset is polynomial in $k\log(N)/\varepsilon$, and its construction takes $O(Nk)$ time. This is achieved by forging a link between decision trees from machine learning and partition trees from computational geometry. Experimental results on \texttt{sklearn} and \texttt{lightGBM} show that applying our coresets to real-world datasets speeds up the training of random forests and their parameter tuning by up to a factor of 10, while keeping similar accuracy. Full open source code is provided.
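In the abstract's notation, with $t(i,j)$ denoting the label that tree $t$ assigns to entry $(i,j)$ of $D$, the loss being approximated and the coreset guarantee read

$$\mathrm{loss}(D,t)=\sum_{i,j}\bigl(D_{i,j}-t(i,j)\bigr)^2,\qquad (1-\varepsilon)\,\mathrm{loss}(D,t)\le \mathrm{loss}(C,t)\le (1+\varepsilon)\,\mathrm{loss}(D,t)\ \text{ for every }k\text{-tree }t.$$

The Python sketch below only illustrates these two quantities; it is not the paper's construction, and the rectangle encoding and the helper names tree_loss and coreset_loss are hypothetical conveniences. A $k$-tree's leaves are modeled as a flat list of axis-parallel rectangles, and a coreset as weighted entries of $D$, as is standard for coresets.

import numpy as np

# A leaf is (r0, r1, c0, c1, label): the half-open axis-parallel rectangle
# D[r0:r1, c0:c1] together with its assigned real label.

def tree_loss(D, leaves):
    # Sum of squared differences between each entry of D and the label of
    # the leaf (rectangle) that contains it.
    return sum(np.sum((D[r0:r1, c0:c1] - label) ** 2)
               for (r0, r1, c0, c1, label) in leaves)

def coreset_loss(points, weights, leaves):
    # Weighted loss on a coreset: each point is (row, col, value) with a
    # multiplicative weight. A valid (k, eps)-coreset keeps this within a
    # 1 +/- eps factor of tree_loss(D, .) simultaneously for every k-tree.
    total = 0.0
    for (r, c, v), w in zip(points, weights):
        for (r0, r1, c0, c1, label) in leaves:
            if r0 <= r < r1 and c0 <= c < c1:
                total += w * (v - label) ** 2
                break
    return total

# Toy 2-tree on a 4x4 signal: split into left and right halves, each
# labeled with the mean of its block.
D = np.arange(16, dtype=float).reshape(4, 4)
leaves = [(0, 4, 0, 2, 6.5), (0, 4, 2, 4, 8.5)]
print(tree_loss(D, leaves))

# Sanity check: all N entries with unit weights form a trivial exact coreset.
points = [(r, c, D[r, c]) for r in range(4) for c in range(4)]
assert np.isclose(coreset_loss(points, [1.0] * 16, leaves), tree_loss(D, leaves))

The point of the coreset result is that $|C|$ can be polynomial in $k\log(N)/\varepsilon$ rather than $N$, so evaluating (and optimizing) candidate trees against $C$ instead of $D$ is what yields the reported speedups.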