Keywords: post-training, LLM, deep learning, representation learning
Abstract: The success of powerful open-source Large Language Models (LLMs) has enabled the community to create a vast collection of post-trained models adapted to specific tasks and domains. However, navigating and understanding these models remains challenging due to inconsistent metadata and unstructured repositories. We introduce Delta Activations, a method to represent finetuned models as vector embeddings by measuring shifts in their internal activations relative to a base model. Clustering analysis shows that Delta Activations achieve strong separation of finetuned domains, significantly outperforming baselines such as flattened weights, salient parameter masks, and output embeddings, while being more lightweight and computationally efficient. Delta Activations also exhibit desirable properties: they are robust across finetuning settings and additive when finetuning datasets are mixed. We further explore extensions of Delta Activations: they can represent tasks via few-shot finetuning for reliable model retrieval and can guide model selection for merging by quantifying similarity between models. Furthermore, activations can be substituted with other representation extraction methods, demonstrating the flexibility of the broader Delta-X framework.
We hope Delta Activations can facilitate the practice of reusing publicly available models.
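The sketch below illustrates the core idea stated in the abstract: embed a finetuned model by the shift in its activations relative to the base model on a fixed probe set. The model names, probe prompts, and the choice of averaging the final-layer hidden states are illustrative assumptions, not the paper's exact recipe.

```python
# Minimal sketch of a Delta Activations-style embedding, assuming Hugging Face
# transformers models that share an architecture. Names below are hypothetical.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "meta-llama/Llama-2-7b-hf"        # hypothetical base model
FINETUNED = "my-org/llama-2-7b-medqa"    # hypothetical finetuned model
PROBES = ["Explain the water cycle.", "Summarize the plot of Hamlet."]

def mean_activation(model_name: str, prompts: list[str]) -> torch.Tensor:
    """Average the final-layer hidden states over tokens and prompts."""
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
    model.eval()
    feats = []
    with torch.no_grad():
        for p in prompts:
            inputs = tok(p, return_tensors="pt")
            out = model(**inputs)
            # last hidden state: (1, seq_len, hidden_dim) -> (hidden_dim,)
            feats.append(out.hidden_states[-1].mean(dim=(0, 1)))
    return torch.stack(feats).mean(dim=0)

# The embedding of the finetuned model is its activation shift from the base.
delta = mean_activation(FINETUNED, PROBES) - mean_activation(BASE, PROBES)
```

Such vectors can then be compared (e.g., by cosine similarity) or clustered to group models by finetuning domain, as described in the abstract.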
Submission Number: 239