Keywords: Large Language Models, Verbatim Memorization, Training Objectives, Privacy in NLP, Token-Aware Loss, TF-IDF
Abstract: Large language models are typically trained under uniform token weighting, which allows frequent, low-information tokens to dominate learning and can increase the tendency to memorize surface-level text spans. To address this memorization issue, we present an information-weighted cross-entropy loss that rescales token-level contributions using TF-IDF statistics, emphasizing semantically informative tokens while downweighting ubiquitous ones. Experiments on five decoder-only LLMs ranging from 1.1B to 13B parameters show consistent reductions in memorized substring length across all models, while preserving perplexity and downstream task performance. The strongest effect is observed for GPT-J 6B, with a decrease of up to 20% in average memorized substring length. Our approach is architecture-agnostic and can be incorporated into existing training pipelines with minimal overhead, suggesting that token-aware weighting provides a lightweight and principled approach to mitigating memorization without disrupting standard training dynamics.
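The abstract's core idea can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's implementation: the helper names (`tfidf_weights`, `weighted_cross_entropy`) and the exact weighting scheme (max TF-IDF per token over the corpus, normalized per sequence) are assumptions for the sake of example.

```python
import math
from collections import Counter

def tfidf_weights(corpus):
    """Compute a per-token weight table from TF-IDF statistics over a
    tokenized corpus (a list of token lists). Hypothetical helper: the
    paper's exact weighting scheme is not specified here."""
    n_docs = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(doc))  # document frequency: docs containing the token
    weights = {}
    for doc in corpus:
        tf = Counter(doc)
        for tok, count in tf.items():
            idf = math.log((1 + n_docs) / (1 + df[tok])) + 1.0
            # keep the max TF-IDF seen for a token as its global weight
            weights[tok] = max(weights.get(tok, 0.0), (count / len(doc)) * idf)
    return weights

def weighted_cross_entropy(token_probs, tokens, weights):
    """Information-weighted CE: each token's -log p is scaled by its
    TF-IDF weight, so ubiquitous tokens contribute less to the loss."""
    ws = [weights.get(t, 1.0) for t in tokens]
    z = sum(ws)  # normalize so the loss stays on the usual scale
    return sum(w * -math.log(p) for w, p in zip(ws, token_probs)) / z
```

In this sketch, a token like "the" that appears in every document receives a lower weight than a rare content word, so the gradient signal concentrates on informative tokens, which is the mechanism the abstract credits for reduced verbatim memorization.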
Paper Type: Long
Research Area: Safety and Alignment in LLMs
Research Area Keywords: Language Modeling, Machine Learning for NLP, Efficient/Low-Resource Methods for NLP
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Publicly available software and/or pre-trained models
Languages Studied: English
Submission Number: 7822