LossVal: Data Valuation using Weighted Loss Functions

TMLR Paper9370 Authors

01 Jun 2026 (modified: 03 Jun 2026) · Under review for TMLR · CC BY 4.0
Abstract: Machine learning models are often limited not by how much data we have, but by how much trustworthy data we have. We introduce LossVal, a data valuation method that computes per-sample importance scores during neural network training by integrating a self-weighting mechanism into standard loss functions (e.g., cross-entropy and mean squared error). LossVal produces meaningful importance scores without repeated retraining and achieves competitive performance on common data valuation tasks such as noisy sample detection and bad point removal. Across multiple classification and regression datasets, LossVal reliably distinguishes helpful from harmful samples. Experiments with ResNet-50 and BERT indicate that LossVal can also be applied to larger architectures in our experimental setup.
Submission Type: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=OI6w7bSY30&noteId=OI6w7bSY30
Changes Since Last Submission: The previous submission was desk rejected due to changes in the template. We removed those changes.
Assigned Action Editor: ~Ju_Sun1
Submission Number: 9370