Laplace Sample Information: Data Informativeness Through a Bayesian Lens

ICLR 2025 Conference Submission1607 Authors

18 Sept 2024 (modified: 26 Nov 2024)ICLR 2025 Conference SubmissionEveryoneRevisionsBibTeXCC BY 4.0
Keywords: Sample informativeness, Sample Information, Sample Difficulty, Long-tailed distribution, Leave-one-out retraining, KL Divergence
TL;DR: This paper establishes a novel measure of sample informativeness based on the KL divergence applied on a quasi Bayesian approximaiton of the model parameters.
Abstract: Accurately estimating the informativeness of individual samples in a dataset is an important objective in deep learning, as it can guide sample selection, which can improve model efficiency and accuracy by removing redundant or potentially harmful samples. We propose $\text{\textit{Laplace Sample Information}}$ ($\mathsf{LSI}$) measure of sample informativeness grounded in information theory widely applicable across model architectures and learning settings. $\mathsf{LSI}$ leverages a Bayesian approximation to the weight posterior and the KL divergence to measure the change in the parameter distribution induced by a sample of interest from the dataset. We experimentally show that $\mathsf{LSI}$ is effective in ordering the data with respect to typicality, detecting mislabeled samples, measuring class-wise informativeness, and assessing dataset difficulty. We demonstrate these capabilities of $\mathsf{LSI}$ on image and text data in supervised and unsupervised settings. Moreover, we show that $\mathsf{LSI}$ can be computed efficiently through probes and transfers well to the training of large models.
Primary Area: probabilistic methods (Bayesian methods, variational inference, sampling, UQ, etc.)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1607
Loading