ABLE: Representing and Mapping LLMs via Attribution-Based Large-model Embedding

ACL ARR 2026 January Submission3799 Authors

04 Jan 2026 (modified: 20 Mar 2026) | ACL ARR 2026 January Submission | Readers: Everyone | License: CC BY 4.0
Keywords: Representation Learning, Model Representation, Feature Attribution
Abstract: The explosive growth of large language models (LLMs) has led to an opaque ecosystem where undocumented model relationships hinder copyright protection, security auditing, and model routing. Existing representation methods struggle to address this challenge efficiently. Approaches that analyze internal parameters face scalability barriers due to structural heterogeneity across architectures, while methods that rely on external outputs are susceptible to behavioral mimicry, where distinct models converge to similar predictions despite differing underlying mechanisms. To bridge this gap, we propose ABLE (Attribution-Based Large-model Embedding), a novel framework that leverages the interpretability space to construct model representations. By aggregating gradient-based feature attributions via a tokenizer-agnostic word-level alignment, ABLE captures the intrinsic cognitive patterns of models rather than surface-level outputs. Beyond empirical utility, we prove that ABLE is a Lipschitz continuous mapping with finite-sample convergence guarantees, ensuring stability and reliability. Extensive experiments on 239 LLMs demonstrate that our training-free approach achieves competitive or superior performance in relation prediction, model routing, and benchmark score prediction.
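As a rough illustration of the tokenizer-agnostic word-level alignment mentioned in the abstract, the sketch below sums token-level attribution scores into whitespace-delimited words using character offsets, so that models with different tokenizers produce vectors of the same length. The function names, the sum aggregation, the L2 normalisation, and the toy scores are all assumptions made for illustration; they are not taken from the paper and may differ from ABLE's actual procedure.

```python
import math
import re


def word_spans(text):
    """Character (start, end) spans of whitespace-delimited words."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def align_to_words(text, token_spans, token_scores):
    """Sum token-level attribution scores into word-level scores.

    token_spans holds one (start, end) character offset pair per token.
    Each token is assigned to the word it overlaps most (assumed rule).
    """
    spans = word_spans(text)
    word_scores = [0.0] * len(spans)
    for (t_start, t_end), score in zip(token_spans, token_scores):
        best_word, best_overlap = None, 0
        for i, (w_start, w_end) in enumerate(spans):
            overlap = min(t_end, w_end) - max(t_start, w_start)
            if overlap > best_overlap:
                best_word, best_overlap = i, overlap
        if best_word is not None:
            word_scores[best_word] += score
    return word_scores


def embed(per_prompt_word_scores):
    """Concatenate word-level scores over prompts and L2-normalise."""
    flat = [s for scores in per_prompt_word_scores for s in scores]
    norm = math.sqrt(sum(s * s for s in flat))
    return [s / norm for s in flat] if norm > 0 else flat


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


# Toy example: two hypothetical models tokenize the same prompt differently.
text = "unbelievable results"
# Model A: "un" | "believable" | " results" (token includes the leading space)
words_a = align_to_words(text, [(0, 2), (2, 12), (12, 20)], [0.1, 0.5, 0.4])
# Model B: "unbelievable" | "res" | "ults"
words_b = align_to_words(text, [(0, 12), (13, 16), (16, 20)], [0.7, 0.2, 0.1])

emb_a = embed([words_a])
emb_b = embed([words_b])
similarity = cosine(emb_a, emb_b)
```

Both models yield a two-dimensional word-level vector despite using three differently placed tokens, which is what makes the resulting embeddings directly comparable across tokenizers in this toy setting.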
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: representation learning, feature attribution
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 3799