Online robust non-stationary estimation

Part of Advances in Neural Information Processing Systems 36 (NeurIPS 2023) Main Conference Track

Bibtex Paper

Authors

Abishek Sankararaman, Balakrishnan Narayanaswamy

Abstract

The real-time estimation of time-varying parameters from high-dimensional, heavy-tailed and corrupted data-streams is a common sub-routine in systems ranging from those for network monitoring and anomaly detection to those for traffic scheduling in data-centers. For estimation tasks that can be cast as minimizing a strongly convex loss function, we prove that an appropriately tuned version of the {\ttfamily clipped Stochastic Gradient Descent} (SGD) is simultaneously {\em(i)} adaptive to drift, {\em (ii)} robust to heavy-tailed inliers and arbitrary corruptions, {\em(iii)} requires no distributional knowledge and {\em (iv)} can be implemented in an online streaming fashion. All prior estimation algorithms have only been proven to posses a subset of these practical desiderata. A observation we make is that, neither the $\mathcal{O}\left(\frac{1}{t}\right)$ learning rate for {\ttfamily clipped SGD} known to be optimal for strongly convex loss functions of a \emph{stationary} data-stream, nor the $\mathcal{O}(1)$ learning rate known to be optimal for being adaptive to drift in a \emph{noiseless} environment can be used. Instead, a learning rate of $T^{-\alpha}$ for $ \alpha < 1$ where $T$ is the stream-length is needed to balance adaptivity to potential drift and to combat noise. We develop a new inductive argument and combine it with a martingale concentration result to derive high-probability under \emph{any learning rate} on data-streams exhibiting \emph{arbitrary distribution shift} - a proof strategy that may be of independent interest. Further, using the classical doubling-trick, we relax the knowledge of the stream length $T$. Ours is the first online estimation algorithm that is provably robust to heavy-tails, corruptions and distribution shift simultaneously. We complement our theoretical results empirically on synthetic and real data.