Archiving Submission: Yes (archival)
Keywords: tokenization, constrained generation, automata theory
TL;DR: We discuss implementation pitfalls and details of tokenizer-aware, subword-level constrained generation built on finite-state transducers.
Abstract: Constrained generation, where language models are forced to output text that adheres to a specified format, is a powerful tool for many tasks. Several libraries implement variants of it as the foundation for a larger feature set. In implementing our own version, we uncovered many subtle problems (some of which are present in existing libraries) that can affect the downstream performance of models that use constrained decoding.
Here, we describe the process of implementing robust constrained generation and its common pitfalls, using \textsc{Llama2} as an example; the approach extends to all major tokenizers. Furthermore, we highlight favorable properties of our character-to-canonical pipeline (ease of use, efficiency, modularity, etc.). We hope this work guides you and your tokens to reliably correct constrained outputs.
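To make the setting concrete, below is a minimal, self-contained sketch of subword-level constrained decoding. It is not the paper's pipeline: the toy vocabulary, the scores standing in for model logits, the digits-only constraint, and every function name are illustrative assumptions. It shows the core idea the abstract names: advance a character-level automaton through each candidate subword and mask any token that would leave the automaton without a live state.

```python
# Sketch of subword-level constrained decoding (illustrative assumptions
# throughout). Toy character DFA for the language [0-9]+ :
# state 0 = start, state 1 = accepting (>=1 digit seen), None = dead.

EOS = "<eos>"

def step(state, ch):
    if state is None:
        return None
    return 1 if ch.isdigit() else None

def accepting(state):
    return state == 1

def advance(state, token):
    """Run every character of a subword token through the DFA."""
    for ch in token:
        state = step(state, ch)
        if state is None:
            return None
    return state

def constrained_greedy(vocab_scores, max_len=8):
    """Greedy decode; `vocab_scores` stands in for per-step logits.
    Tokens whose characters would kill the DFA are masked out, and
    EOS is only allowed in an accepting state, so any finished
    output is guaranteed to match the format."""
    state, out = 0, []
    for _ in range(max_len):
        allowed = {
            tok: score for tok, score in vocab_scores.items()
            if (tok == EOS and accepting(state))
            or (tok != EOS and advance(state, tok) is not None)
        }
        if not allowed:  # dead end: nothing keeps the constraint alive
            break
        tok = max(allowed, key=allowed.get)
        if tok == EOS:
            break
        state = advance(state, tok)
        out.append(tok)
    return "".join(out)

# "ab" scores highest, but the mask removes it and "x9";
# only digit-composed subwords survive.
scores = {"ab": 2.5, "12": 2.0, "3": 1.0, "x9": 1.5, EOS: 0.5}
print(constrained_greedy(scores, max_len=3))  # -> "121212"
```

A real implementation would operate on token IDs and logit tensors and would precompute per-state token masks rather than re-walking each subword per step, which is where the transducer-based formulation pays off.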
Submission Number: 51