Demystifying Representation Spaces of Multilingual and Multimodal Aspects in Large Audio Language Models

ACL ARR 2025 July Submission1033 Authors

29 Jul 2025 (modified: 03 Sept 2025), ACL ARR 2025 July Submission, CC BY 4.0
Abstract: Mechanistic interpretability of large language models (LLMs) has driven the development of various language model capabilities, such as controllable generation, knowledge editing, and model stitching. However, the interpretability of LLMs in multimodal and multilingual contexts remains underexplored, even as language models continue to grow in complexity. This paper investigates how large audio language models (LALMs) process and represent language, modality, and speaker demography. Through a series of experiments, the latent processing states of two state-of-the-art open-weight LALMs, Ultravox 0.5 and Qwen2 Audio, are extracted and analyzed across various types of input. The study explores how representational patterns change with input features, covering eight languages and two modalities (text and spoken audio). Paralinguistic features of spoken audio, such as gender, age, and accent, are also examined, as are acoustic features that result from variations in the recording environment. The experimental results reveal clustering patterns that emerge throughout the processing stages, with the presence of such clusters depending on the input features. Through these experiments, this study lays the groundwork for further research on the representational space of language models.
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: Mechanistic, Representation, LALM, Multimodal, Multilingual
Contribution Types: Model analysis & interpretability
Languages Studied: English, French, German, Chinese, Japanese, Indonesian, Vietnamese, Spanish
Previous URL: https://openreview.net/forum?id=G4bRs6NChM
Explanation Of Revisions PDF: pdf
Reassignment Request Area Chair: No, I want the same area chair from our previous submission (subject to their availability).
Reassignment Request Reviewers: No, I want the same set of reviewers from our previous submission (subject to their availability)
A1 Limitations Section: This paper has a limitations section.
A2 Potential Risks: N/A
B Use Or Create Scientific Artifacts: Yes
B1 Cite Creators Of Artifacts: Yes
B1 Elaboration: Section 4
B2 Discuss The License For Artifacts: Yes
B2 Elaboration: Ethical Consideration
B3 Artifact Use Consistent With Intended Use: Yes
B3 Elaboration: Ethical Consideration
B4 Data Contains Personally Identifying Info Or Offensive Content: N/A
B5 Documentation Of Artifacts: N/A
B6 Statistics For Data: N/A
C Computational Experiments: Yes
C1 Model Size And Budget: No
C1 Elaboration: Section 4
C2 Experimental Setup And Hyperparameters: N/A
C3 Descriptive Statistics: Yes
C3 Elaboration: Section 5
C4 Parameters For Packages: N/A
D Human Subjects Including Annotators: No
D1 Instructions Given To Participants: N/A
D2 Recruitment And Payment: N/A
D3 Data Consent: N/A
D4 Ethics Review Board Approval: N/A
D5 Characteristics Of Annotators: N/A
E Ai Assistants In Research Or Writing: Yes
E1 Information About Use Of Ai Assistants: Yes
E1 Elaboration: Ethical Consideration
Author Submission Checklist: yes
Submission Number: 1033