TL;DR: TRACE achieves efficient and adaptable control of language models by pairing a once-distilled HMM with a lightweight log-linear classifier to perform exact probabilistic reasoning over future text.
Abstract: As large language models (LMs) advance, there is an increasing need to control their outputs to align with human values (e.g., detoxification) or desired attributes (e.g., personalization, topic). However, autoregressive models focus on next-token prediction and struggle with global properties that require looking ahead. Existing solutions either post-train LMs for each new attribute—expensive and inflexible—or approximate the Expected Attribute Probability (EAP) of future sequences by sampling or training, which is slow and unreliable for rare attributes. We introduce **TRACE** (Tractable Probabilistic Reasoning for Adaptable Controllable gEneration), a novel framework that efficiently computes EAP and adapts to new attributes through tractable *probabilistic* reasoning and lightweight *control*. TRACE distills a Hidden Markov Model (HMM) from an LM and pairs it with a small classifier to estimate attribute probabilities, enabling exact EAP computation over the HMM’s predicted futures. This EAP is then used to reweight the LM’s next-token probabilities toward globally compliant continuations. Empirically, TRACE achieves state-of-the-art detoxification results with only 20% decoding overhead, yields 76 low-resource personalized LMs within seconds, and seamlessly extends to composite attributes.
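Concretely, the reweighting step the abstract describes can be pictured as multiplying each candidate token's LM probability by its EAP and renormalizing. The sketch below illustrates that step only; all names (`reweighted_next_token`, `lm_logprobs`, `eap`) are illustrative assumptions rather than TRACE's actual API, and the exact EAP computation via the distilled HMM, which is the paper's contribution, is abstracted into the `eap` input.

```python
# Minimal sketch of EAP-based next-token reweighting (assumed names/shapes,
# not TRACE's actual implementation).
import numpy as np

def reweighted_next_token(lm_logprobs: np.ndarray, eap: np.ndarray) -> np.ndarray:
    """Combine LM next-token log-probs with per-token Expected Attribute
    Probabilities (EAP).

    lm_logprobs : (V,) log p_LM(x_t | x_{<t}) over the vocabulary
    eap         : (V,) p(attribute holds for the future | x_{<t}, x_t),
                  e.g. computed exactly by marginalizing a distilled HMM
                  over all continuations (abstracted away here)
    """
    # Bayes-style reweighting: p(x_t | x_{<t}, attr) ∝ p_LM(x_t | x_{<t}) * EAP(x_t)
    scores = lm_logprobs + np.log(eap + 1e-12)
    scores -= scores.max()            # numerical stability before exponentiation
    probs = np.exp(scores)
    return probs / probs.sum()        # renormalize over the vocabulary
```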
Lay Summary: AI language models are powerful, but getting them to follow rules can be tricky. How do you make sure an AI stays polite, or create a chatbot that sounds like Taylor Swift? Current methods for controlling AI are often like having to rewire an entire skyscraper just to change the lightbulb in one office—they're slow, expensive, and impractical for each new task.
We developed TRACE, a new technique that acts like a fast, simple "crystal ball" for the AI. At every word it writes, TRACE uses a simplified map of language to peek into thousands of potential future sentences. It checks the odds that a sentence will break a rule (like "be non-toxic") and uses that foresight to guide the AI's word choices in the present.
This approach works. TRACE sets a new standard for preventing toxic language with very little slowdown. And because it's so adaptable, you could teach it a new personality in seconds—letting you finally create that Taylor Swift bot that actually sounds like her. It can even combine complex rules, like asking for a political speech that is also strictly non-toxic.
Link To Code: https://github.com/yidouweng/trace
Primary Area: Probabilistic Methods
Keywords: Controlled Generation, Probabilistic Reasoning, Tractable Inference, Hidden Markov Models (HMM), Detoxification
Submission Number: 15596