ABLE: Representing and Mapping LLMs via Attribution-Based Large-model Embedding

ACL ARR 2026 January Submission 3799 Authors

04 Jan 2026 (modified: 20 Mar 2026), ACL ARR 2026 January Submission, CC BY 4.0

Keywords: Representation Learning, Model Representation, Feature Attribution

Abstract: The explosive growth of large language models (LLMs) has led to an opaque ecosystem where undocumented model relationships hinder copyright protection, security auditing, and model routing. Existing representation methods struggle to address this challenge efficiently. Approaches analyzing internal parameters face scalability barriers due to structural heterogeneity across diverse architectures, while methods relying on external outputs are susceptible to behavioral mimicry, where distinct models converge to similar predictions despite differing underlying mechanisms. To bridge this gap, we propose ABLE (Attribution-Based Large-model Embedding), a novel framework that leverages the interpretability space to construct model representations. By aggregating gradient-based feature attributions via a tokenizer-agnostic word-level alignment, ABLE captures the intrinsic cognitive patterns of models rather than surface-level outputs. Beyond empirical utility, we prove that ABLE is a Lipschitz-continuous mapping with finite-sample convergence guarantees, ensuring stability and reliability. Extensive experiments on 239 LLMs demonstrate that our training-free approach achieves competitive or superior performance in relation prediction, model routing, and benchmark score prediction.
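
The tokenizer-agnostic word-level alignment mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the aggregation rule, so summing token scores per word is an assumption, as are the helper names (`word_spans`, `align_to_words`, `cosine`), the character-offset inputs, and the hand-written attribution scores, which stand in for real gradient-based attributions.

```python
import math
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def word_spans(text: str) -> List[Span]:
    """Character spans of whitespace-delimited words in `text`."""
    spans, pos = [], 0
    for word in text.split():
        start = text.index(word, pos)
        end = start + len(word)
        spans.append((start, end))
        pos = end
    return spans


def align_to_words(text: str, token_spans: List[Span],
                   token_scores: List[float]) -> List[float]:
    """Pool token-level attribution scores into word-level scores.

    Assumption: a token belongs to the word whose character span contains
    the token's start offset, and scores are pooled by summation.
    """
    words = word_spans(text)
    pooled = [0.0] * len(words)
    for (t_start, _), score in zip(token_spans, token_scores):
        for i, (w_start, w_end) in enumerate(words):
            if w_start <= t_start < w_end:
                pooled[i] += score
                break
    return pooled


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


text = "models learn patterns"

# Hypothetical model A: its tokenizer splits "patterns" into two pieces.
spans_a = [(0, 6), (7, 12), (13, 17), (17, 21)]
scores_a = [0.5, 0.2, 0.2, 0.1]  # placeholder attribution scores

# Hypothetical model B: its tokenizer splits "models" into two pieces.
spans_b = [(0, 3), (3, 6), (7, 12), (13, 21)]
scores_b = [0.3, 0.2, 0.2, 0.3]  # placeholder attribution scores

emb_a = align_to_words(text, spans_a, scores_a)
emb_b = align_to_words(text, spans_b, scores_b)
similarity = cosine(emb_a, emb_b)
```

Although the two hypothetical tokenizers produce four tokens each with different boundaries, both pooled vectors have one entry per word, so they live in the same space and can be compared directly. Under these assumptions, that shared word-level space is what would let models with different vocabularies be embedded side by side.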

Paper Type: Long

Research Area: Interpretability and Analysis of Models for NLP

Research Area Keywords: representation learning, feature attribution

Contribution Types: Model analysis & interpretability

Languages Studied: English

Submission Number: 3799
