Track: Main paper track (up to 5 pages excluding references and appendix)
Keywords: adaptive inference, low-rank adaptation, large language models, fine-tuning, hyper-networks, batch-aware clustering, efficient adaptation, scalable optimization, contextual adaptation, foundation models, inference-time adaptation, mixture of experts, dynamic model updates, efficient fine-tuning, model optimization
TL;DR: ChameleonLLM is a framework for inference-time adaptation of large language models that uses batch-aware clustering and a hyper-network to generate low-rank weight updates on the fly, enabling efficient, context-sensitive adaptation.
Abstract: Recent advances in large language models (LLMs) have shown remarkable performance across diverse tasks. However, these models are typically deployed with fixed weights, which limits their ability to adapt dynamically to the variability inherent in real-world data during inference. This paper introduces ChameleonLLM, a novel framework that enables inference-time adaptation of LLMs by leveraging batch-aware clustering and on-the-fly generation of low-rank updates. Unlike traditional fine-tuning approaches such as Low-Rank Adaptation (LoRA), or methods that select from a fixed set of pre-learned, swappable adaptation masks, our method dynamically generates adaptive modifications to the decoder weights based on the aggregated statistics of clustered batches. By grouping similar inputs and computing context-aware low-rank updates via a hyper-network, ChameleonLLM achieves significant performance gains, outperforming conventional LoRA methods while eliminating the overhead of maintaining multiple expert models. Our experiments highlight the potential of our approach to serve as a versatile and highly adaptive solution for language model inference. ChameleonLLM is open-sourced to ensure the reproducibility of our experiments: https://anonymous.4open.science/r/ChamaleonLLM/
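For concreteness, the sketch below illustrates the mechanism the abstract describes: pooled batch representations are clustered, each cluster's aggregated statistics are fed to a hyper-network, and the resulting low-rank factors modify a frozen decoder projection. This is a minimal illustration under stated assumptions, not the released implementation; the module names (LowRankHyperNet, adapted_decoder_output), the k-means clustering choice, the mean-pooling of token states, and all hyper-parameters are hypothetical placeholders.

```python
# Minimal sketch (not the authors' code): batch-aware clustering plus a
# hyper-network that emits low-rank updates for a decoder projection.
import torch
import torch.nn as nn


def kmeans(x, k, iters=10):
    """Plain Lloyd's k-means over row vectors of x; returns assignments and centroids."""
    centroids = x[torch.randperm(x.size(0))[:k]].clone()
    for _ in range(iters):
        assign = torch.cdist(x, centroids).argmin(dim=1)   # nearest centroid per example
        for c in range(k):
            members = x[assign == c]
            if members.numel() > 0:
                centroids[c] = members.mean(dim=0)          # recompute cluster centroid
    return assign, centroids


class LowRankHyperNet(nn.Module):
    """Maps a cluster's aggregated statistics to low-rank factors (A, B) of a d x d update."""
    def __init__(self, d_model, rank):
        super().__init__()
        self.d_model, self.rank = d_model, rank
        self.net = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(),
            nn.Linear(4 * d_model, 2 * d_model * rank),
        )

    def forward(self, cluster_stats):                        # cluster_stats: (d_model,)
        flat = self.net(cluster_stats)
        A, B = flat.split(self.d_model * self.rank)
        return A.view(self.d_model, self.rank), B.view(self.rank, self.d_model)


def adapted_decoder_output(hidden, base_proj, hyper_net, num_clusters=4, scale=1.0):
    """Apply context-aware low-rank updates: W x + (A_c B_c) x, with (A_c, B_c) per cluster c."""
    pooled = hidden.mean(dim=1)                              # (batch, d_model) pooled over sequence
    assign, centroids = kmeans(pooled.detach(), num_clusters)
    out = base_proj(hidden)                                  # frozen base projection W x
    delta = torch.zeros_like(out)
    for c in range(num_clusters):
        mask = assign == c
        if mask.any():
            A, B = hyper_net(centroids[c])                   # cluster-conditioned low-rank factors
            delta[mask] = scale * (hidden[mask] @ B.t() @ A.t())
    return out + delta


if __name__ == "__main__":
    d_model, rank = 64, 4
    base_proj = nn.Linear(d_model, d_model, bias=False)
    hyper_net = LowRankHyperNet(d_model, rank)
    hidden = torch.randn(16, 32, d_model)                    # (batch, seq_len, d_model)
    print(adapted_decoder_output(hidden, base_proj, hyper_net).shape)
```

In this sketch, only the hyper-network is trainable, so a single small module replaces a bank of per-task adapters; the per-cluster update is generated at inference time from the batch itself rather than retrieved from a fixed set.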
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 49