Efficient Forward-Only Data Valuation for LLMs and VLMs

18 Sept 2025 (modified: 02 Jan 2026) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Data Valuation, LLM, VLM
Abstract: Quantifying the influence of individual training samples is essential for enhancing the transparency and accountability of large language models (LLMs) and vision-language models (VLMs). Existing data valuation methods rely on Hessian information or model retraining, making them computationally prohibitive for billion-parameter models. In this work, we introduce For-Value, a forward-only data valuation framework that enables scalable and efficient influence estimation for both LLMs and VLMs. By leveraging the rich representations of modern foundation models, For-Value computes influence scores using a simple closed-form expression based on a single forward pass, thereby eliminating the need for costly gradient computations. Our theoretical analysis demonstrates that For-Value accurately estimates per-sample influence by capturing alignment in hidden representations and prediction errors between training and valuation samples. Extensive experiments show that For-Value matches or outperforms gradient-based baselines in identifying impactful fine-tuning examples and effectively detecting mislabeled data.
Supplementary Material: pdf
Primary Area: interpretability and explainable AI
Submission Number: 10422
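
The abstract describes influence scores computed in closed form from a single forward pass, by combining the alignment of hidden representations with the alignment of prediction errors between training and valuation samples. The abstract does not give the exact expression, so the following is a minimal illustrative sketch under assumed details: a hypothetical `model(input_ids)` interface returning final-layer hidden states alongside logits, mean pooling over sequence positions, and a simple product of the two alignment terms (motivated by the fact that last-layer gradients factor into hidden features times prediction errors). It is not the paper's formula.

```python
import torch
import torch.nn.functional as F


@torch.no_grad()
def forward_only_influence(model, train_batch, val_batch):
    """Sketch of a forward-only influence score (illustrative, not For-Value's exact form).

    Assumes `model(input_ids)` returns (logits, hidden) with shapes
    (batch, seq, vocab) and (batch, seq, d), and that each batch dict
    carries `input_ids` and integer `labels` of shape (batch, seq).
    """

    def features_and_errors(batch):
        logits, hidden = model(batch["input_ids"])            # single forward pass
        probs = F.softmax(logits, dim=-1)                      # predictive distribution
        onehot = F.one_hot(batch["labels"], probs.size(-1)).float()
        error = probs - onehot                                 # prediction error (residual)
        # Pool over sequence positions to get one vector per sample.
        return hidden.mean(dim=1), error.mean(dim=1)

    h_tr, e_tr = features_and_errors(train_batch)
    h_va, e_va = features_and_errors(val_batch)

    # Influence of each training sample on each valuation sample:
    # representation alignment times prediction-error alignment.
    rep_align = h_tr @ h_va.T    # (n_train, n_val)
    err_align = e_tr @ e_va.T    # (n_train, n_val)
    return rep_align * err_align
```

A score of this kind needs only forward activations, which is what makes the approach tractable for billion-parameter LLMs and VLMs: no per-sample gradients, Hessian products, or retraining are required.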