Keywords: Inference, Curvature Analysis, ML-guided Annotation
Abstract: Inspired by Active Learning, Active Statistical Inference (ASI) is an inference framework that uses machine learning predictions to guide data-label acquisition, so that a limited labeling budget is spent efficiently. However, relying solely on model-output uncertainty can lead to labeling redundant instances that yield diminishing informational returns. To address this, we propose Curvature-Aware Active Statistical Inference (CA-ASI), which prioritizes points with high model-output uncertainty while penalizing redundant points based on their structural similarity to the inference target. This structural similarity is evaluated by incorporating second-order information into the sampling rule, so that the points selected for labeling are both diverse and informative. Further, we show that CA-ASI constructs provably valid confidence intervals and hypothesis tests for any black-box model. Under the same labeling budget, CA-ASI yields narrower confidence intervals and more powerful statistical tests than ASI. We demonstrate CA-ASI's effectiveness on real-world datasets against standard baselines.
Submission Number: 4