Keywords: Large Language Models, Multilingualism, Sparse Autoencoders
TL;DR: We identify several distinctive patterns that separate high- and low-resource languages in LLMs through the lens of Sparse Autoencoders
Abstract: Despite the impressive multilingual capabilities of recent large language models (LLMs), the mechanisms underlying their language-specific processing remain largely unclear. In this paper, we investigate how LLMs handle multilingualism through the lens of sparse autoencoders (SAEs), uncovering distinctive patterns that offer new insights into their internal workings. Specifically, we introduce two novel concepts—task instruction–focused (TF) and heading-focused (HF) SAE features—and use them to reveal intrinsic discrepancies between high- and low-performing languages. Our analysis yields several key findings: (1) SAEs provide concrete evidence that LLMs have a precise understanding of prompt structure; (2) heading keywords (e.g., “Question,” “Choices,” and “Answer”) play a distinct role in LLM processing; and (3) low-performing languages exhibit a relative deficiency in TF features compared to high-performing languages.
Building on these insights, we propose two practical strategies to improve zero-shot multilingual performance: (1) incorporating English heading keywords and (2) amplifying TF features through steering. Our approach improves zero-shot performance in low-performing languages by up to 3.7% on average on ARC-Challenge and MMLU, while also shedding new light on fundamental differences between high- and low-performing languages in LLMs. Our code is available at https://github.com/ihcho2/SAE-ML.
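To make concrete what the abstract means by "heading keywords," here is a minimal, hypothetical multiple-choice prompt template in the style the paper describes; the exact template and formatting used in the paper may differ (see the linked repo). Strategy (1) amounts to keeping the headings "Question," "Choices," and "Answer" in English even when the question content is in a low-performing language.

```python
# Illustrative prompt builder; the function name and formatting are
# assumptions, not the paper's actual implementation.
def build_prompt(question: str, choices: list[str]) -> str:
    lettered = "\n".join(f"({chr(65 + i)}) {c}" for i, c in enumerate(choices))
    # English heading keywords are kept even for non-English content.
    return f"Question: {question}\nChoices:\n{lettered}\nAnswer:"

# Example with Spanish content but English headings:
print(build_prompt("¿Cuál es la capital de Francia?", ["Madrid", "París", "Roma"]))
```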
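Strategy (2), amplifying TF features through steering, follows the general recipe of SAE-based activation steering: add a scaled copy of the feature's SAE decoder direction to the model's residual stream. The sketch below shows this generic mechanism in PyTorch; the hook name, scaling factor `alpha`, layer index, and the random placeholder direction are all illustrative assumptions, not the paper's implementation (which is in the linked repository).

```python
import torch

# Hypothetical dimensions and strength; real values depend on the LLM and SAE.
d_model = 4096   # residual-stream width of the LLM
alpha = 4.0      # steering strength (illustrative)

# Unit-norm decoder direction of one TF feature. Random here for the sketch;
# in practice it is a column of the trained SAE's decoder weight matrix.
tf_feature_dir = torch.randn(d_model)
tf_feature_dir = tf_feature_dir / tf_feature_dir.norm()

def steering_hook(module, inputs, output):
    """Forward hook that amplifies a TF feature by adding its decoder
    direction, scaled by alpha, to every token's residual stream."""
    hidden = output[0] if isinstance(output, tuple) else output
    steered = hidden + alpha * tf_feature_dir.to(hidden.device, hidden.dtype)
    if isinstance(output, tuple):
        return (steered,) + output[1:]
    return steered

# Usage sketch (assuming a Hugging Face transformer `model`; layer 16 is
# an arbitrary example, not the layer used in the paper):
# handle = model.model.layers[16].register_forward_hook(steering_hook)
# ... run zero-shot evaluation ...
# handle.remove()
```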
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the COLM Code of Ethics on https://colmweb.org/CoE.html
Author Guide: I certify that this submission complies with the submission instructions as described on https://colmweb.org/AuthorGuide.html
Submission Number: 889