Revealing Distribution Discrepancy by Sampling Transfer in Unlabeled Data

Published: 25 Sept 2024 · Last Modified: 06 Nov 2024 · NeurIPS 2024 poster · CC BY 4.0
Keywords: Non-IID Data; Distribution Discrepancy; Density Ratio; Likelihood Ratio; Generalization
TL;DR: This paper introduces a method to evaluate distribution discrepancies between training and test distributions without needing class labels in the test samples.
Abstract: There are increasingly many cases where the class labels of test samples are unavailable, making it both necessary and challenging to measure the discrepancy between the training and test distributions. This distribution discrepancy complicates the assessment of whether the hypothesis selected by an algorithm on the training samples remains applicable to the test samples. We present a novel approach called Importance Divergence (I-Div) to address the challenge of test label unavailability, enabling distribution discrepancy evaluation using only training samples. I-Div transfers the sampling patterns from the test distribution to the training distribution by estimating density and likelihood ratios. Specifically, the density ratio, informed by the selected hypothesis, is obtained by minimizing the Kullback-Leibler divergence between the actual and estimated input distributions. Simultaneously, the likelihood ratio is adjusted according to the density ratio by reducing the generalization error of the distribution discrepancy as transformed through the two ratios. Experimentally, I-Div accurately quantifies the distribution discrepancy, as demonstrated across a wide range of complex data scenarios and tasks.
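The density-ratio step mentioned in the abstract can be illustrated with a standard KL-based estimator in the spirit of KLIEP. The sketch below is an illustrative stand-in, not the authors' I-Div implementation: the Gaussian kernel basis, kernel width `sigma`, and projected gradient ascent are assumptions. It only shows how a ratio w(x) ≈ p_test(x)/p_train(x) can be fitted from training inputs and unlabeled test inputs by minimizing the KL divergence between the actual and ratio-reweighted input distributions.

```python
# A minimal KLIEP-style sketch of density-ratio estimation via KL minimization.
# NOTE: illustrative stand-in only, not the authors' I-Div implementation;
# the kernel basis, `sigma`, and the optimizer are assumptions.
import numpy as np

def estimate_density_ratio(x_train, x_test, sigma=1.0, lr=1e-2, n_iter=500):
    """Fit w(x) ~= p_test(x) / p_train(x) using Gaussian kernels centered at
    the (unlabeled) test inputs, by maximizing the test-sample log of the
    ratio model (equivalently, minimizing KL(p_test || w * p_train)) subject
    to the normalization E_train[w(x)] = 1."""
    def kernel(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))

    K_test = kernel(x_test, x_test)     # basis evaluated at test inputs
    K_train = kernel(x_train, x_test)   # basis evaluated at training inputs
    alpha = np.ones(x_test.shape[0]) / x_test.shape[0]

    for _ in range(n_iter):
        w_test = K_test @ alpha
        # gradient of (1/n_test) * sum_i log w(x_i^test) w.r.t. alpha
        grad = K_test.T @ (1.0 / np.clip(w_test, 1e-12, None)) / x_test.shape[0]
        alpha += lr * grad
        alpha = np.clip(alpha, 0.0, None)              # keep weights non-negative
        alpha /= max(np.mean(K_train @ alpha), 1e-12)  # enforce E_train[w] = 1

    return lambda x: kernel(x, x_test) @ alpha         # w(x) at new inputs

# Usage sketch: w = estimate_density_ratio(X_train, X_test_unlabeled)
# importance_weights = w(X_train)  # reweight training samples toward the test distribution
```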
Primary Area: Evaluation (methodology, meta studies, replicability and validity)
Submission Number: 2083