Unified Analysis of Continuous Weak Features Learning with Applications to Learning from Missing Data

Published: 01 May 2025, Last Modified: 18 Jun 2025 | ICML 2025 poster | Everyone | CC BY 4.0
TL;DR: This paper proposes a unified framework for learning with low-quality continuous features and provides a theoretical analysis of the interplay between feature quality improvement and downstream predictive performance.
Abstract: This paper addresses weak features learning (WFL), focusing on learning scenarios characterized by low-quality input features (weak features; WFs) that arise due to missingness, measurement errors, or ambiguous observations. We present a theoretical formalization and error analysis of WFL for continuous WFs (continuous WFL), which has been insufficiently explored in existing literature. A previous study established formalization and error analysis for WFL with discrete WFs (discrete WFL); however, this analysis does not extend to continuous WFs due to the inherent constraints of discreteness. To address this, we propose a theoretical framework specifically designed for continuous WFL, systematically capturing the interactions between feature estimation models for WFs and label prediction models for downstream tasks. Furthermore, we derive the theoretical conditions necessary for both sequential and iterative learning methods to achieve consistency. By integrating the findings of this study on continuous WFL with the existing theory of discrete WFL, we demonstrate that the WFL framework is universally applicable, providing a robust theoretical foundation for learning with low-quality features across diverse application domains.
Lay Summary: We are interested in understanding how the quality of input information affects the performance of predictive models trained using machine learning. While this question has been explored to some extent for categorical inputs, the impact of input quality has not been sufficiently discussed in the context of continuous-valued inputs. We focus on this underexplored aspect, and analyze the relationship between the quality of continuous input features and the predictive accuracy of trained models. Our study reveals how variations in input quality influence model performance. Our findings highlight the importance of improving input data quality and lay the groundwork for theoretical analysis of its effects on predictive modeling.
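As a rough illustration of the sequential (impute-then-regress) setting named in the abstract and keywords, the sketch below first fits a feature estimation model for one continuous weak feature, fills in its missing values, and then fits a label prediction model on the completed features. It is not taken from the authors' repository: the function name `impute_then_regress`, the use of linear least squares for both models, and the synthetic data are assumptions made only for illustration.

```python
import numpy as np

def impute_then_regress(X_obs, x_weak, mask, y):
    """Hypothetical sequential sketch (assumes linear models for both stages).

    X_obs  : (n, d) fully observed features
    x_weak : (n,) continuous weak feature, NaN where missing
    mask   : (n,) boolean, True where x_weak is observed
    y      : (n,) labels
    """
    ones = np.ones(len(X_obs))
    # Stage 1: feature estimation model, fit only on rows where x_weak is observed.
    A = np.column_stack([X_obs, ones])
    w_feat, *_ = np.linalg.lstsq(A[mask], x_weak[mask], rcond=None)
    # Fill in missing entries with the model's estimate; keep observed entries.
    x_filled = np.where(mask, x_weak, A @ w_feat)
    # Stage 2: label prediction model on the completed feature matrix.
    B = np.column_stack([X_obs, x_filled, ones])
    w_label, *_ = np.linalg.lstsq(B, y, rcond=None)
    return w_feat, w_label, x_filled

# Synthetic usage example (all values are made up for illustration).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))
x_true = 2.0 * X_demo[:, 0] - X_demo[:, 1] + 0.5 * rng.normal(size=200)
y_demo = X_demo[:, 0] + 3.0 * x_true + 0.1 * rng.normal(size=200)
mask_demo = rng.random(200) > 0.3            # roughly 30% of entries missing
x_demo = np.where(mask_demo, x_true, np.nan)
w_feat, w_label, x_filled = impute_then_regress(X_demo, x_demo, mask_demo, y_demo)
```

In this sketch the quality of the stage 1 estimate directly limits what stage 2 can learn, which is the kind of interplay the paper analyzes. An iterative variant would alternate between the two stages instead of running each once.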
Link To Code: https://github.com/KOHsEMP/continuous_WFL
Primary Area: Theory->Learning Theory
Keywords: weak features learning, impute-then-regress, missing value, weakly supervised learning
Submission Number: 5281