Abstract: With the rapid development of spatial audio technologies, applications in AR, VR, and other scenarios have attracted extensive attention.
Unlike traditional mono sound, spatial audio offers a more realistic and immersive auditory experience.
Despite notable progress in the field, there remains a lack of comprehensive surveys that systematically organize and analyze these methods and their underlying technologies.
To address this gap, we provide a comprehensive overview of spatial audio and systematically review recent literature in the area.
We chronologically outline existing work on spatial audio and categorize these studies by input-output representation and by generation and understanding tasks,
thereby summarizing the main research aspects of spatial audio.
In addition, we review related datasets, evaluation metrics, and benchmarks, offering insights from both training and evaluation perspectives.
Related materials are available at https://github.com/ASAudio/ASAudio.
Paper Type: Long
Research Area: Speech Recognition, Text-to-Speech and Spoken Language Understanding
Research Area Keywords: Spatial Audio, speech technologies, speech and vision
Contribution Types: Surveys
Languages Studied: None
Reassignment Request Area Chair: This is not a resubmission
Reassignment Request Reviewers: This is not a resubmission
A1 Limitations Section: This paper has a limitations section.
A2 Potential Risks: Yes
A2 Elaboration: Yes; see the Ethical Considerations section.
B Use Or Create Scientific Artifacts: Yes
B1 Cite Creators Of Artifacts: Yes
B1 Elaboration: Sections 1 (Introduction), 2 (Representation), 3 (Understanding Models), 4 (Generation Models), and 5 (Datasets and Evaluation Metrics)
B2 Discuss The License For Artifacts: Yes
B2 Elaboration: We cite and respect the licenses of all artifacts. See Appendix E. We report the licenses or stated terms of datasets, models, and tools where available, and we do not redistribute any third‑party artifacts.
B3 Artifact Use Consistent With Intended Use: Yes
B3 Elaboration: This work is a survey only. We do not collect or redistribute data, nor deploy models. Any referenced artifacts are cited and, when consulted, used strictly within their intended research‑only or non‑commercial terms.
B4 Data Contains Personally Identifying Info Or Offensive Content: No
B4 Elaboration: This is a survey and does not contain data.
B5 Documentation Of Artifacts: Yes
B5 Elaboration: We provide tables of the mentioned models and datasets in Appendices B and C.
B6 Statistics For Data: N/A
C Computational Experiments: No
C1 Model Size And Budget: N/A
C2 Experimental Setup And Hyperparameters: N/A
C3 Descriptive Statistics: N/A
C4 Parameters For Packages: N/A
D Human Subjects Including Annotators: No
D1 Instructions Given To Participants: N/A
D2 Recruitment And Payment: N/A
D3 Data Consent: N/A
D4 Ethics Review Board Approval: N/A
D5 Characteristics Of Annotators: N/A
E Ai Assistants In Research Or Writing: Yes
E1 Information About Use Of Ai Assistants: No
E1 Elaboration: AI tools are used for grammar and spelling checks.
Author Submission Checklist: yes
Submission Number: 553