Track: long paper (up to 4 pages)
Keywords: parameter-efficient fine-tuning, mixture-of-experts, adaptive inference, low-rank adaptation, large language models, fine-tuning, hyper-networks, batch-aware clustering, efficient adaptation, scalable optimization, contextual adaptation, foundation models, inference-time adaptation, dynamic model updates
TL;DR: ChamaleonLLM is a novel framework for inference-time adaptation of large language models that uses batch-aware clustering and low-rank updates generated dynamically by a hyper-network, enabling efficient, context-sensitive adaptation.
Abstract: Recent advances in large language models (LLMs) have yielded remarkable performance across diverse tasks. However, these models are typically deployed with fixed weights, which limits their ability to adapt dynamically to the variability inherent in real-world data during inference. This paper introduces ChamaleonLLM, a novel framework that enables inference-time adaptation of LLMs through batch-aware clustering and on-the-fly generation of low-rank updates. Unlike traditional fine-tuning approaches such as Low-Rank Adaptation (LoRA), or methods that rely on a fixed set of pre-learned, swappable masks, our method dynamically generates adaptive modifications to the decoder weights based on the aggregated statistics of each clustered batch. By grouping similar inputs and computing context-aware low-rank updates via a hyper-network, ChamaleonLLM achieves significant performance gains, outperforming conventional LoRA methods while eliminating the overhead of maintaining multiple expert models. Our experiments highlight the potential of our approach as a versatile and highly adaptive solution for language model inference. ChamaleonLLM is open-sourced to ensure the reproducibility of our experiments: https://anonymous.4open.science/r/ChamaleonLLM/
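To make the described mechanism concrete, the sketch below shows one possible reading of the abstract: a hyper-network maps the aggregated (mean-pooled) hidden statistics of each batch cluster to a low-rank update that is added to a decoder projection at inference time. This is a minimal illustrative sketch in PyTorch, not the authors' implementation; the class names, shapes, mean-pooling aggregation, and per-cluster loop are all assumptions, and the cluster assignments are taken as given (they could come, for example, from k-means over sequence embeddings).

```python
# Minimal sketch (assumed, not the authors' code): a hyper-network that turns the
# aggregated hidden statistics of a clustered batch into a low-rank weight update
# Delta W = B @ A, applied to one decoder projection at inference time.
import torch
import torch.nn as nn


class LowRankHyperNetwork(nn.Module):
    """Generates a per-cluster low-rank update Delta W = B @ A from a summary vector."""

    def __init__(self, hidden_dim: int, out_dim: int, in_dim: int, rank: int = 8):
        super().__init__()
        self.rank, self.out_dim, self.in_dim = rank, out_dim, in_dim
        # One small MLP emits both low-rank factors from the cluster summary vector.
        self.generator = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, rank * (out_dim + in_dim)),
        )

    def forward(self, cluster_summary: torch.Tensor) -> torch.Tensor:
        # cluster_summary: (hidden_dim,) aggregated (e.g. mean-pooled) hidden states.
        params = self.generator(cluster_summary)
        a = params[: self.rank * self.in_dim].view(self.rank, self.in_dim)
        b = params[self.rank * self.in_dim :].view(self.out_dim, self.rank)
        return b @ a  # (out_dim, in_dim) low-rank weight update


def adapted_projection(base_weight, hyper_net, hidden_states, cluster_ids):
    """Apply cluster-conditioned low-rank updates to one decoder projection.

    base_weight: (out_dim, in_dim) frozen projection weight.
    hidden_states: (batch, in_dim) inputs to the projection; here we assume the
    hyper-network's hidden_dim equals in_dim so the pooled summary fits directly.
    cluster_ids: (batch,) integer cluster assignment for each example.
    """
    outputs = hidden_states.new_empty(hidden_states.size(0), base_weight.size(0))
    for cid in cluster_ids.unique():
        mask = cluster_ids == cid
        summary = hidden_states[mask].mean(dim=0)      # aggregate cluster statistics
        delta_w = hyper_net(summary)                   # on-the-fly low-rank update
        outputs[mask] = hidden_states[mask] @ (base_weight + delta_w).T
    return outputs
```

Usage under these assumptions would look like `hyper_net = LowRankHyperNetwork(hidden_dim=768, out_dim=768, in_dim=768, rank=8)` followed by a call to `adapted_projection` inside the decoder's forward pass; because the update is rank-limited and generated per cluster rather than stored per expert, no bank of expert weights needs to be kept in memory.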
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 32