Keywords: domain generalization, Large Language Model, LLM, activation steering, activation engineering, generative, classification, out of distribution, OOD, style transfer
TL;DR: We introduce CONTXT, a simple and intuitive way to augment contextual information in feature representations that can improve classifier performance and steer LLM outputs without retraining.
Abstract: Artificial Neural Networks (ANNs) are increasingly deployed across diverse domains, often requiring them to generalize beyond their training conditions. This shift in context frequently leads to performance degradation, a central challenge in Domain Generalization (DG). While numerous techniques exist to mitigate this issue (e.g., fine-tuning, activation steering, meta-learning, adversarial training, normalization-based approaches, and parameter-efficient methods such as prompt tuning), they are often complex, resource-intensive, and difficult to scale, particularly for large models such as Large Language Models (LLMs). In contrast, we introduce CONTXT (\emph{\textbf{C}ontextual augmentati\textbf{O}n for \textbf{N}eural fea\textbf{T}ure \textbf{X} \textbf{T}ransforms}): a simple, intuitive, and elegant method for contextual adaptation. CONTXT works by augmenting the model’s internal representations with lightweight, contextually relevant feature indices through straightforward multiplicative and additive vector operations. Despite its simplicity, CONTXT significantly improves performance across both discriminative (e.g., classification with ANNs/CNNs) and generative (e.g., LLM) tasks. With minimal computational overhead and straightforward integration, CONTXT layers offer a practical and effective solution to DG and a variety of problems facing ANNs, demonstrating that strong results need not come at the cost of complexity. More generally, CONTXT provides a compact mechanism to manipulate information flow and steer ANN processing in a desired direction without retraining the network.
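The abstract only states that CONTXT augments internal representations "through straightforward multiplicative and additive vector operations"; the sketch below is one plausible reading of that description, not the authors' implementation. It assumes a hypothetical `contxt_layer` that rescales and shifts a batch of feature vectors by per-dimension context vectors:

```python
import numpy as np

def contxt_layer(features, context_scale, context_shift):
    """Hypothetical CONTXT-style transform: augment internal features
    with contextual information via an elementwise multiplicative and
    additive operation. The exact form is an assumption drawn from the
    abstract's description, not the paper's actual layer.

    features:      (batch, dim) array of internal representations
    context_scale: (dim,) multiplicative context vector
    context_shift: (dim,) additive context vector
    """
    return features * context_scale + context_shift

# Toy usage: steer 4-d representations toward a context direction
# without touching the surrounding network's weights.
h = np.ones((2, 4))
scale = np.array([1.0, 0.5, 2.0, 1.0])
shift = np.array([0.0, 0.1, -0.1, 0.0])
h_ctx = contxt_layer(h, scale, shift)
```

Because the transform is a pure function of fixed context vectors, it can be inserted between existing layers at inference time, consistent with the claim that CONTXT steers processing without retraining.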
Supplementary Material: zip
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 21899