Target-Risk Identification and Honest Inference from Weak Labels: The Observed Fiber, Not the Row Space

TMLR Paper9109 Authors

21 May 2026 (modified: 22 May 2026). Under review for TMLR. Readers: Everyone. License: CC BY 4.0.

Abstract: We study target-population evaluation of a fixed predictor when clean target labels are unavailable and source labels are observed only through weak supervision. The standard loss-correction view holds that the clean target risk is estimable when the clean loss vector lies in the row space of the weak-label channel, so that an unbiased corrected weak-label loss exists. We show that this row-space condition answers a stronger, uniform question rather than the observed-law question an evaluator actually faces. The correct population object is the observed weak-label fiber: the set of clean posteriors that reproduce the observed weak-label conditional distribution. Under exact covariate shift and overlap, the target risk is point identified exactly when the clean-loss functional is constant on this fiber almost surely; otherwise, two pointwise linear programs give the sharp identified interval. The main technical addition is a finite-sample inference layer for the realistic case in which the weak-label law, the target covariate weights, and the weak-label channel are estimated or sensitivity-modeled. We introduce confidence fibers, prove honest coverage of the clean target risk from joint confidence sets for these nuisance objects, give an exact linear-program formulation under polyhedral multinomial confidence sets, and show convergence to the structural fiber interval without a separation condition at the boundary between point and partial identification. The resulting audit output is deliberately conservative: it certifies point identification when the weak labels justify a single number, and otherwise reports an honest interval rather than a pseudo-corrected point estimate. Audits on public WRENCH data illustrate that coarse weak labels can severely understate clean risk, while confidence-fiber intervals expose the missing information.
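To make the fiber-versus-row-space distinction concrete, here is a minimal Python sketch under simplifying assumptions that are ours, not the paper's: a finite covariate grid, a known channel stored as an M×K matrix `T` with `T[m, k]` = P(weak label m | clean class k), one clean-loss vector per covariate cell, and, for the confidence-fiber case, a coordinate-wise box around the observed weak-label law standing in for the polyhedral multinomial confidence sets. The function names `in_row_space`, `fiber_bounds`, and `target_interval` are hypothetical, the pointwise linear programs are solved by brute-force vertex enumeration rather than an LP solver, and uncertainty in the channel or target weights is not modeled.

```python
import itertools

import numpy as np


def in_row_space(T, loss):
    """Row-space test: loss == T.T @ c for some corrected weak-label loss c."""
    T = np.asarray(T, dtype=float)
    stacked = np.vstack([T, np.asarray(loss, dtype=float)])
    return np.linalg.matrix_rank(stacked) == np.linalg.matrix_rank(T)


def fiber_bounds(T, q_lo, q_hi, loss, tol=1e-9):
    """Min and max of loss @ p over {p in simplex : q_lo <= T @ p <= q_hi}.

    q_lo == q_hi gives the structural fiber at one covariate cell; a box
    around an estimated weak-label law gives a simple confidence fiber.
    """
    T = np.asarray(T, dtype=float)
    loss = np.asarray(loss, dtype=float)
    _, K = T.shape
    # Stack every inequality as A @ p <= b: T p <= q_hi, -T p <= -q_lo, -p <= 0.
    A = np.vstack([T, -T, -np.eye(K)])
    b = np.concatenate([np.asarray(q_hi, dtype=float),
                        -np.asarray(q_lo, dtype=float),
                        np.zeros(K)])
    values = []
    # A vertex has K-1 tight inequalities that, with sum(p) == 1, fix p uniquely.
    for rows in itertools.combinations(range(A.shape[0]), K - 1):
        system = np.vstack([A[list(rows)], np.ones((1, K))])
        if np.linalg.matrix_rank(system) < K:
            continue
        p = np.linalg.solve(system, np.append(b[list(rows)], 1.0))
        if np.all(A @ p <= b + tol):  # keep only feasible vertices
            values.append(float(loss @ p))
    if not values:
        raise ValueError("empty fiber: no clean posterior reproduces this law")
    return min(values), max(values)


def target_interval(T, q_lo, q_hi, losses, weights):
    """Target-weighted average of pointwise bounds over a finite covariate grid.

    Summing pointwise optima is sharp here because the cells decouple.
    """
    bounds = np.array([fiber_bounds(T, lo, hi, loss)
                       for lo, hi, loss in zip(q_lo, q_hi, losses)])
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return float(w @ bounds[:, 0]), float(w @ bounds[:, 1])


# Hypothetical toy channel: weak label 0 = "clean class 0", 1 = "class 1 or 2".
T = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0]])
loss = np.array([1.0, 0.0, 1.0])  # 0-1 loss when the predictor outputs class 1

print(in_row_space(T, loss))                              # False
print(fiber_bounds(T, [0.3, 0.7], [0.3, 0.7], loss))      # about (0.3, 1.0)
print(fiber_bounds(T, [1.0, 0.0], [1.0, 0.0], loss))      # about (1.0, 1.0)
print(fiber_bounds(T, [0.25, 0.65], [0.35, 0.75], loss))  # about (0.25, 1.0)
# Two cells with hypothetical target weights 0.6 and 0.4.
print(target_interval(T, [[0.3, 0.7], [1.0, 0.0]], [[0.3, 0.7], [1.0, 0.0]],
                      [loss, loss], [0.6, 0.4]))          # about (0.58, 1.0)
```

In this toy channel the loss of predicting class 1 fails the row-space test, yet at the boundary law q = (1, 0) the fiber is a single posterior and the risk is point identified; at q = (0.3, 0.7) only the interval [0.3, 1.0] is justified, and the box confidence fiber widens it to [0.25, 1.0]. Vertex enumeration keeps the sketch dependency-free but grows combinatorially in K and M, so `scipy.optimize.linprog` would be the natural replacement at realistic sizes.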

Submission Type: Long submission (more than 12 pages of main content)

Assigned Action Editor: ~Masashi_Sugiyama1

Submission Number: 9109
