Analysis and Explainability of LLMs Via Evolutionary Methods

15 Sept 2025 (modified: 11 Feb 2026) | Submitted to ICLR 2026 | CC BY 4.0
Keywords: LLMs, Evolutionary Methods, Trees
TL;DR: This paper extends evolutionary methods to analyze LLMs and infer the relationships among them, with the aim of improving explainability and safety.
Abstract: Evolutionary methods have proven useful for analysis and explainability in genetics, biology, ecology, and other fields. In this work, we extend these methods to neural networks, specifically Large Language Models (LLMs), to better analyze and explain the relationships among them. We show that treating weights as genotypes (genetic makeup) and output text as phenotypes (observable traits) improves understanding of model lineage, of which datasets matter most, and of the roles of different model layers, and also yields improved visualizations. In a controlled experiment, our estimated evolutionary trees reliably recreate the topology of the ground-truth evolutionary tree. We further identify the most important weight layers according to weight differences, and show through phenotypic experiments that one training dataset appears to add more important information than the others. Finally, we generate an unsupervised evolutionary tree of black-box foundation models. Throughout, we provide visualizations that clarify these evolutionary relationships.
Primary Area: interpretability and explainable AI
Submission Number: 6243