Learning From Complementary Features

Kosuke Sugiyama, Masato Uchida

Published: 2026 · Last Modified: 01 Apr 2026 · IEEE Access 2026 · License: CC BY-SA 4.0
Abstract: High-quality observations of input features are essential for building highly accurate predictive models. However, in real-world scenarios, precise measurements of some features are often difficult to obtain due to limitations in observation precision, high data acquisition costs, or privacy constraints. In this study, we formulate prediction problems with features for which only complementary information, specifically “what the value is not,” is available under such constraints, and we conduct a theoretical and empirical investigation of these problems. In this setting, we define features whose ground-truth values are precisely observable (i.e., directly measurable without uncertainty) as ordinary features (OFs), and features for which only the aforementioned complementary information is available as complementary features (CFs). We then formulate the learning task that uses both OFs and CFs as a new problem, which we term complementary features learning (CFL). We treat CFL as a two-stage learning problem: 1) estimating the ground-truth values corresponding to the CFs, and 2) predicting the downstream label using the estimated CFs’ ground-truth values together with the OFs. We first provide a theoretical analysis of the prediction error in CFL. Our analysis reveals that, to accurately learn the true input-output relationship, the following two terms must be minimized: (i) the estimation error of the CFs’ ground-truth values, and (ii) the prediction error of the label based on the estimated CFs’ ground-truth values and the OFs’ observed values. Term (i) can be characterized by the mutual information between the estimated CFs’ ground-truth values and the target labels, highlighting that accurate estimation of CFs strongly associated with the downstream task is crucial in CFL. Term (ii) is measured by the Kullback-Leibler (KL) divergence between the true label distribution and the label prediction model. We further show that terms (i) and (ii) can each be evaluated with an arbitrary $f$-divergence. Based on these theoretical insights, we propose a principled approach for constructing objective functions that enable effective and adaptable learning of the true input-output relationship. Finally, we validate the effectiveness of the proposed approach through numerical experiments.
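The abstract does not state the bound itself; as a rough schematic only, the two-term decomposition it describes could be read in the following form, where $Z$ denotes a CF’s ground-truth value, $\hat{Z}$ its estimate, $X$ the OFs, $Y$ the label, and $q$ the label prediction model (all notation here is assumed for illustration, not taken from the paper):

$$
\mathrm{Err}(\hat{Z}, q) \;\lesssim\; \underbrace{I(Z; Y) - I(\hat{Z}; Y)}_{\text{(i) CF estimation error}} \;+\; \underbrace{\mathbb{E}\!\left[\mathrm{KL}\!\left(p(Y \mid X, Z)\,\big\|\,q(Y \mid X, \hat{Z})\right)\right]}_{\text{(ii) label prediction error}}
$$

Term (i) vanishes when the estimated CF retains as much information about $Y$ as the true CF, and term (ii) vanishes when the prediction model matches the true label distribution; per the abstract, each term can also be evaluated with an arbitrary $f$-divergence in place of the KL divergence.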
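The abstract likewise does not specify the proposed objective functions. The sketch below is a minimal PyTorch illustration of the two-stage setup for a single categorical CF where one forbidden value (“what the value is not”) is observed per example; all module names, dimensions, and the complementary-label surrogate loss are hypothetical choices, not the authors’ construction.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical dimensions (not from the paper).
NUM_OF = 8         # number of ordinary features (OFs)
NUM_CF_VALUES = 5  # cardinality of the categorical complementary feature (CF)
NUM_CLASSES = 3    # downstream label classes

class CFEstimator(nn.Module):
    """Stage 1: estimate the CF's ground-truth value from the OFs."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(NUM_OF, 32), nn.ReLU(),
                                 nn.Linear(32, NUM_CF_VALUES))
    def forward(self, x_of):
        return self.net(x_of)  # logits over candidate CF values

class LabelPredictor(nn.Module):
    """Stage 2: predict the label from the OFs plus the estimated CF."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(NUM_OF + NUM_CF_VALUES, 32), nn.ReLU(),
                                 nn.Linear(32, NUM_CLASSES))
    def forward(self, x_of, cf_probs):
        return self.net(torch.cat([x_of, cf_probs], dim=-1))

def complementary_nll(cf_logits, cf_not):
    """Penalize probability mass on the value the CF is known NOT to take,
    i.e. maximize log(1 - p(cf_not)); one simple surrogate for term (i)."""
    p = F.softmax(cf_logits, dim=-1)
    p_not = p.gather(1, cf_not.unsqueeze(1)).squeeze(1)
    return -torch.log1p(-p_not.clamp(max=1 - 1e-6)).mean()

estimator, predictor = CFEstimator(), LabelPredictor()
opt = torch.optim.Adam(list(estimator.parameters()) +
                       list(predictor.parameters()), lr=1e-3)

# Toy batch: x_of = OFs, cf_not = "what the CF is not", y = downstream label.
x_of = torch.randn(64, NUM_OF)
cf_not = torch.randint(NUM_CF_VALUES, (64,))
y = torch.randint(NUM_CLASSES, (64,))

for _ in range(100):
    cf_logits = estimator(x_of)
    y_logits = predictor(x_of, F.softmax(cf_logits, dim=-1))
    # Term (i): CF estimation loss; term (ii): label prediction loss (cross-entropy,
    # i.e. KL up to a constant).
    loss = complementary_nll(cf_logits, cf_not) + F.cross_entropy(y_logits, y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Feeding the softmax probabilities (rather than a hard argmax) into the predictor keeps the pipeline differentiable, so the two terms can be minimized jointly; training the stages separately would be equally consistent with the two-stage description in the abstract.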