Keywords: large language models, controllability, generation conditioning, data taxonomy
TL;DR: An optimization approach that bridges the gap between training and inference techniques via a highly detailed taxonomy of data characteristics to explicitly control generation attributes and implicitly condition generations during inference.
Abstract: One of the most profound challenges of modern machine learning is performing
well on the long tail of rare and underrepresented features. Large general-purpose
models are trained for many tasks, but work best on high-frequency use cases.
After training, it is hard to adapt a model to perform well on specific use cases
underrepresented in the training corpus. Relying on prompt engineering or few-shot
examples to maximize the output quality on a particular test case can be frustrating,
as models can be highly sensitive to small changes, react in unpredictable ways,
or rely on a fixed system prompt to maintain performance. In this work, we
ask: Can we optimize our training protocols to improve both controllability and
performance on underrepresented use cases at inference time? We revisit the divide
between training and inference techniques to improve long-tail performance while
providing users with a set of control levers the model is trained to be responsive
to. We create a detailed taxonomy of data characteristics and task provenance to
explicitly control generation attributes and implicitly condition generations at
inference time. We fine-tune a base model to infer these markers automatically,
which makes them optional at inference time. This principled and flexible approach
yields pronounced improvements in performance on examples from the long tail
of the training distribution. Overall, we observe lifts of 5.7% across all tasks.
However, these markers are particularly effective at unlocking difficult-to-obtain
gains in the long tail. We observe relative lifts of up to 14.1% on underrepresented
tasks like CodeRepair and absolute improvements of 35.3% on length instruction
following evaluations.
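The abstract describes annotating training data with taxonomy markers that can either be supplied explicitly at inference time or inferred by the fine-tuned model when omitted. A minimal sketch of what marker-conditioned prompting could look like is below; the marker names (`task`, `length`) and the `<key=value>` header format are purely illustrative assumptions, not the paper's actual taxonomy or prompt format.

```python
def build_prompt(user_request, markers=None):
    """Prepend optional control markers to a request.

    If no markers are given, the prompt is returned unchanged; a
    marker-aware model would then infer the markers itself, which is
    what makes them optional at inference time.
    """
    if not markers:
        return user_request
    # Sort keys so the header is deterministic regardless of dict order.
    header = " ".join(f"<{k}={v}>" for k, v in sorted(markers.items()))
    return f"{header}\n{user_request}"

# Explicit control: the user pins attributes the model was trained on.
controlled = build_prompt(
    "Fix the off-by-one error in this loop.",
    markers={"task": "CodeRepair", "length": "short"},
)
# Implicit conditioning: no markers, the model infers them on its own.
free = build_prompt("Fix the off-by-one error in this loop.")
```

In this sketch, explicit markers act as the "control levers" the abstract mentions, while the no-marker path corresponds to implicit conditioning by the fine-tuned model.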
Supplementary Material: zip
Primary Area: Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)
Submission Number: 25770