Active Learning for Parameter Estimation in the Presence of Noise for Linear Models

TMLR Paper6475 Authors

11 Nov 2025 (modified: 01 Feb 2026) | Under review for TMLR | CC BY 4.0
Abstract: Parameter estimation is central to scientific inference, yet standard data collection practices, such as random sampling, often yield inefficient or suboptimal results when data are noisy, imbalanced, or expensive to obtain. In such settings, not all samples contribute equally to inference, which motivates principled methods for identifying and prioritizing the most informative data. We propose an active learning method based on Fisher information that quantifies each sample's contribution to the precision of parameter estimates. Unlike active learning driven by prediction performance, our method explicitly targets inference precision rather than predictive generalization. By incorporating an adjusted Fisher information metric, the framework naturally accounts for measurement noise and heteroscedasticity, assigning higher value to samples that most effectively reduce estimator variance. We provide theoretical guarantees for both linear and logistic regression, demonstrating faster convergence than CoreSet and BAIT approaches, with gains that scale logarithmically with the unlabeled pool size. Extensions to multivariate and non-Gaussian settings further show that parameter-focused active learning offers a principled, efficient strategy for subset selection, prioritizing the most informative observations under realistic, high-noise scientific conditions.
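The idea of scoring samples by their Fisher information under heteroscedastic noise can be illustrated with a minimal sketch. This is not the authors' algorithm or their adjusted metric, which the abstract does not specify. It assumes a linear model y = x^T theta + eps with known per-sample Gaussian noise variance, for which each sample contributes x x^T / sigma^2 to the Fisher information, and it uses a standard greedy D-optimal rule as a stand-in. The function name `greedy_fisher_selection` and the `ridge` parameter are hypothetical.

```python
import numpy as np

def greedy_fisher_selection(X, noise_var, budget, ridge=1e-6):
    """Greedily pick `budget` rows of X that most increase log det of the
    accumulated Fisher information (illustrative D-optimal rule, assumed here).

    X         : (n, d) array of candidate inputs
    noise_var : (n,) array of assumed-known noise variances sigma_i^2
    ridge     : small regularizer so the information matrix is invertible
    """
    n, d = X.shape
    info = ridge * np.eye(d)            # accumulated Fisher information
    available = np.ones(n, dtype=bool)  # samples not yet selected
    chosen = []
    for _ in range(budget):
        inv = np.linalg.inv(info)
        # Adding sample i raises log det(info) by log(1 + x_i^T inv x_i / sigma_i^2),
        # so ranking by the noise-scaled quadratic form is enough.
        gain = np.einsum("ij,jk,ik->i", X, inv, X) / noise_var
        gain = np.where(available, gain, -np.inf)
        best = int(np.argmax(gain))
        chosen.append(best)
        available[best] = False
        info += np.outer(X[best], X[best]) / noise_var[best]
    return chosen, info
```

In this sketch a noisy sample is down-weighted by its variance, so among otherwise identical inputs the low-noise one is selected first, and directions of parameter space that are still poorly determined receive the largest scores.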
Submission Type: Long submission (more than 12 pages of main content)
Changes Since Last Submission: In our revision, we have included the suggested changes in red. This includes:
- Clarifying contributions of our work
- Clarification of Assumption 2
- Adding ablation studies to test our assumptions
- Adding connection to optimal experimental design
- Adding suggested literature review
- Typo and math corrections
- Updated logistic regression experiment description
Assigned Action Editor: ~Tom_Rainforth1
Submission Number: 6475