Reasoning with Neologisms: Can Soft Tokens Learn Composable Reasoning Skills Without Forgetting?
Keywords: Continual learning, skill neologisms, reasoning, LLMs, soft prompts, prompt tuning
TL;DR: RL-trained soft tokens can match weight-based fine-tuning on compositional reasoning while preserving all prior LLM capabilities by architectural construction.
Abstract: Large language models (LLMs) acquire reasoning capabilities in post-training via methods such as supervised fine-tuning (SFT) on demonstrations and reinforcement learning (RL) with verifiable rewards, but continuously extending these capabilities without forgetting previously acquired ones remains an open problem. Skill neologisms (SNs) have recently been proposed as a way to add procedural knowledge to a model without weight updates, by learning semantically integrated soft tokens. Whether SNs can acquire reasoning skills—and whether such skills compose with the model's existing capabilities—remains unknown. We systematically compare new reasoning skill acquisition via SFT/RL fine-tuning and via SNs in a controlled string-manipulation setting that prevents pretraining contamination and allows precise control over the base model's reasoning skills. We find that (i) naive fine-tuning catastrophically forgets previously internalized skills, while SNs preserve them by construction; (ii) SN-learned skills compose with existing skills both in- and out-of-distribution, with the composition gap to fine-tuning shrinking with model scale; and (iii) RL-trained SNs match fully-internalized skills on compositions involving existing capabilities, with pass@$k$ analysis confirming genuine skill acquisition rather than distribution sharpening. Together, these findings establish skill neologisms as a promising path for continually extending LLM reasoning capabilities.
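Illustrative note on the pass@$k$ analysis mentioned in the abstract: the abstract does not say which estimator is used, so the sketch below assumes the standard unbiased estimator, $1 - \binom{n-c}{k} / \binom{n}{k}$, computed from $n$ sampled completions of which $c$ are correct. The function name `pass_at_k` and the example numbers are hypothetical and are not taken from the submission. The usual reading is that if training only sharpens the output distribution, the base model tends to catch up at large $k$, whereas a gap that persists at large $k$ suggests a newly acquired skill.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n samples of which c are correct.

    Assumed estimator (not confirmed by the source):
        pass@k = 1 - C(n - c, k) / C(n, k)
    i.e. one minus the probability that k samples drawn without
    replacement from the n completions are all incorrect.
    """
    if not (0 <= c <= n):
        raise ValueError("c must satisfy 0 <= c <= n")
    if not (1 <= k <= n):
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k incorrect samples: every draw of k contains a correct one.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    for k in (1, 2, 4):
        print(k, pass_at_k(n=4, c=1, k=k))
```

Comparing such curves for a base model and a trained model over a range of $k$ is one way to separate sharpening from acquisition; the experimental setup actually used is described in the paper itself, not here.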
Submission Number: 109