Steering Back-Propagation with Prior Information in Natural Language

19 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Prior-Guided Tuning, Prior-based Gradient Editing, Large Language Models
Abstract: Large language models (LLMs) often struggle when task-relevant prior knowledge is missing or incorrect, leading to overfitting and hallucinations, especially on tasks with ambiguous or sparse data. While simple prompt concatenation can supply priors, it fails to fundamentally reshape the model's internal representations and yields only marginal gains. We propose Prior-Guided Tuning (PGT), a paradigm that explicitly integrates natural language priors into the optimization landscape and steers the back-propagation training process. Under this paradigm, we introduce Prior-based Gradient Editing (PGE), which computes losses for positive (correct) and negative (misleading) prior prompts appended to the original inputs and adds their difference as an extra term in the gradient update. These auxiliary losses steer the model to internalize the desired priors and improve task performance. Empirically, PGE-trained models outperform baselines on both a synthetic mathematical benchmark and real-world datasets (Jigsaw and BEAD), producing substantial gains in learning performance and efficiency. Ablations confirm that priors must be presented together with the original training data to be effective, and attention visualizations show that PGE-trained models tend to attend more to prior-relevant tokens. Our code and data will be made publicly available.
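The gradient-editing rule described in the abstract can be sketched on a toy model. This is a minimal illustration under stated assumptions, not the authors' implementation: the scalar logistic model, the names `pge_step`, `lam`, and the prior-augmented inputs `x_pos`/`x_neg` are all hypothetical stand-ins for the prior-prompt-augmented sequences used in the paper.

```python
import math

def loss_and_grad(w, x, y):
    """Binary cross-entropy loss and gradient for a 1-D logistic model p = sigmoid(w*x)."""
    p = 1.0 / (1.0 + math.exp(-w * x))
    loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
    grad = (p - y) * x
    return loss, grad

def pge_step(w, x, y, x_pos, x_neg, lr=0.1, lam=0.5):
    """One update: task gradient plus lam * (positive-prior grad - negative-prior grad)."""
    _, g_task = loss_and_grad(w, x, y)
    _, g_pos = loss_and_grad(w, x_pos, y)   # input with correct prior appended
    _, g_neg = loss_and_grad(w, x_neg, y)   # input with misleading prior appended
    return w - lr * (g_task + lam * (g_pos - g_neg))

w = 0.0
for _ in range(100):
    w = pge_step(w, x=1.0, y=1, x_pos=1.5, x_neg=0.5)
```

The key design choice mirrored here is that the positive- and negative-prior losses enter the update only through their *difference*, so gradients shared by both prior variants cancel and only the prior-specific signal edits the task gradient.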
Primary Area: foundation or frontier models, including LLMs
Submission Number: 15023