ASAudio: A Survey of Advanced Spatial Audio Research

ACL ARR 2025 July Submission 553 Authors

28 Jul 2025 (modified: 08 Sept 2025), ACL ARR 2025 July Submission, CC BY 4.0

Abstract: With the rapid development of spatial audio technologies, applications in AR, VR, and other scenarios have garnered extensive attention. Unlike traditional mono sound, spatial audio offers a more realistic and immersive auditory experience. Despite notable progress in the field, there remains a lack of comprehensive surveys that systematically organize and analyze these methods and their underlying technologies. To address this gap, we provide a comprehensive overview of spatial audio and systematically review recent literature in the area. We chronologically outline existing work and categorize these studies by input-output representation and by generation and understanding tasks, thereby summarizing the main research directions in spatial audio. In addition, we review related datasets, evaluation metrics, and benchmarks, offering insights from both training and evaluation perspectives. Related materials are available at https://github.com/ASAudio/ASAudio.

Paper Type: Long

Research Area: Speech Recognition, Text-to-Speech and Spoken Language Understanding

Research Area Keywords: Spatial Audio, speech technologies, speech and vision

Contribution Types: Surveys

Languages Studied: None

Reassignment Request Area Chair: This is not a resubmission

Reassignment Request Reviewers: This is not a resubmission

A1 Limitations Section: This paper has a limitations section.

A2 Potential Risks: Yes

A2 Elaboration: Yes, see the Ethical Considerations section.

B Use Or Create Scientific Artifacts: Yes

B1 Cite Creators Of Artifacts: Yes

B1 Elaboration: Sections 1 (Introduction), 2 (Representation), 3 (Understanding Models), 4 (Generation Models), and 5 (Datasets and Evaluation Metrics)

B2 Discuss The License For Artifacts: Yes

B2 Elaboration: We cite and respect the licenses of all artifacts. See Appendix E. We report the licenses or stated terms of datasets, models, and tools where available, and we do not redistribute any third‑party artifacts.

B3 Artifact Use Consistent With Intended Use: Yes

B3 Elaboration: This work is a survey only. We do not collect or redistribute data, nor deploy models. Any referenced artifacts are cited and, when consulted, used strictly within their intended research‑only or non‑commercial terms.

B4 Data Contains Personally Identifying Info Or Offensive Content: No

B4 Elaboration: This is a survey and does not contain data.

B5 Documentation Of Artifacts: Yes

B5 Elaboration: We provide tables of the mentioned models and datasets in Appendices B and C.

B6 Statistics For Data: N/A

C Computational Experiments: No

C1 Model Size And Budget: N/A

C2 Experimental Setup And Hyperparameters: N/A

C3 Descriptive Statistics: N/A

C4 Parameters For Packages: N/A

D Human Subjects Including Annotators: No

D1 Instructions Given To Participants: N/A

D2 Recruitment And Payment: N/A

D3 Data Consent: N/A

D4 Ethics Review Board Approval: N/A

D5 Characteristics Of Annotators: N/A

E Ai Assistants In Research Or Writing: Yes

E1 Information About Use Of Ai Assistants: No

E1 Elaboration: AI tools are used for grammar and spelling checks.

Author Submission Checklist: yes

Submission Number: 553
