Demystifying Representation Spaces of Multilingual and Multimodal Aspects in Large Audio Language Models

ACL ARR 2025 July Submission1033 Authors

29 Jul 2025 (modified: 03 Sept 2025), ACL ARR 2025 July Submission, Readers: Everyone, CC BY 4.0

Abstract: Mechanistic interpretability of large language models (LLMs) has driven the development of various language model capabilities, such as controllable generation, knowledge editing, and model stitching. However, the interpretability of LLMs in multimodal and multilingual contexts remains underexplored, even as the complexity of language models continues to grow. This paper investigates how large audio language models (LALMs) process and represent language, modality, and speaker demography. Through a series of experiments, the latent processing states of two state-of-the-art open-weight LALMs, Ultravox 0.5 and Qwen2 Audio, are extracted and analyzed using various types of input. The study explores representational patterns under input feature variations, covering eight languages and two modalities (text and spoken audio). Paralinguistic features in spoken audio, such as gender, age, and accent, as well as acoustic features resulting from variations in the recording environment, are also examined. The experimental results reveal clustering patterns that emerge throughout the processing stages, with the presence of such clusters depending on the input features. Through these experiments, this study lays the groundwork for further research on the representational space of language models.

Paper Type: Long

Research Area: Interpretability and Analysis of Models for NLP

Research Area Keywords: Mechanistic, Representation, LALM, Multimodal, Multilingual

Contribution Types: Model analysis & interpretability

Languages Studied: English, French, German, Chinese, Japanese, Indonesian, Vietnamese, Spanish

Previous URL: https://openreview.net/forum?id=G4bRs6NChM

Explanation Of Revisions PDF: pdf

Reassignment Request Area Chair: No, I want the same area chair from our previous submission (subject to their availability).

Reassignment Request Reviewers: No, I want the same set of reviewers from our previous submission (subject to their availability).

A1 Limitations Section: This paper has a limitations section.

A2 Potential Risks: N/A

B Use Or Create Scientific Artifacts: Yes

B1 Cite Creators Of Artifacts: Yes

B1 Elaboration: Section 4

B2 Discuss The License For Artifacts: Yes

B2 Elaboration: Ethical Consideration

B3 Artifact Use Consistent With Intended Use: Yes

B3 Elaboration: Ethical Consideration

B4 Data Contains Personally Identifying Info Or Offensive Content: N/A

B5 Documentation Of Artifacts: N/A

B6 Statistics For Data: N/A

C Computational Experiments: Yes

C1 Model Size And Budget: No

C1 Elaboration: Section 4

C2 Experimental Setup And Hyperparameters: N/A

C3 Descriptive Statistics: Yes

C3 Elaboration: Section 5

C4 Parameters For Packages: N/A

D Human Subjects Including Annotators: No

D1 Instructions Given To Participants: N/A

D2 Recruitment And Payment: N/A

D3 Data Consent: N/A

D4 Ethics Review Board Approval: N/A

D5 Characteristics Of Annotators: N/A

E AI Assistants In Research Or Writing: Yes

E1 Information About Use Of AI Assistants: Yes

E1 Elaboration: Ethical Consideration

Author Submission Checklist: yes

Submission Number: 1033
