Abstract: This study explores how bilingual language models develop complex internal representations.
We employ sparse autoencoders to analyze internal representations of bilingual language models with a focus on the effects of training steps, layers, and model sizes.
Our analysis shows that language models first learn each language separately and then gradually form bilingual alignments, particularly in the middle layers.
We also find that this bilingual tendency is stronger in larger models.
Building on these findings, we demonstrate the critical role of bilingual representations in model performance by employing a novel method that integrates decomposed representations from a fully trained model into a mid-training model.
Our results provide insights into how language models acquire bilingual capabilities.
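The decomposition step described above can be sketched with a minimal sparse autoencoder applied to a model activation. This is an illustrative, untrained sketch, not the paper's implementation; all dimensions, weights, and names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: d_model = activation width, d_dict = SAE dictionary size.
d_model, d_dict = 16, 64
W_enc = rng.normal(0, 0.1, (d_model, d_dict))
b_enc = np.zeros(d_dict)
W_dec = rng.normal(0, 0.1, (d_dict, d_model))
b_dec = np.zeros(d_model)

def sae_decompose(x):
    """Encode an activation into sparse features, then reconstruct it."""
    f = np.maximum(x @ W_enc + b_enc, 0.0)  # ReLU yields sparse, nonnegative features
    x_hat = f @ W_dec + b_dec               # reconstruction from the active features
    return f, x_hat

x = rng.normal(size=d_model)     # stand-in for a residual-stream activation
f, x_hat = sae_decompose(x)      # f: feature activations, x_hat: reconstruction
```

In practice such an SAE is trained with a reconstruction loss plus an L1 sparsity penalty on `f`, so that each feature ideally corresponds to an interpretable direction, e.g. one shared by English and Japanese inputs.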
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: knowledge tracing, probing, multilingualism, multilingual representations
Contribution Types: Model analysis & interpretability
Languages Studied: English, Japanese
Submission Number: 7295