Egocentric Spatial Reference Resolution under ASR Noise in Traditional Chinese Conversational Navigation

ACL ARR 2026 January Submission9400 Authors

06 Jan 2026 (modified: 20 Mar 2026), ACL ARR 2026 January Submission, Readers: Everyone, License: CC BY 4.0
Keywords: Egocentric Spatial Language, Conversational Navigation, ASR Noise, Spatial Reasoning, Multimodal Chain-of-Thought, Traditional Chinese
Abstract: Resolving egocentric spatial expressions in conversational navigation requires mapping speaker-centric references such as “on my right” to allocentric orientations (e.g., north, east, south, west). In practice, this process is often mediated by automatic speech recognition (ASR), where transcription errors, linguistic variation, and referential ambiguity obscure spatial relations. Despite progress in navigation and multimodal reasoning, egocentric spatial reference resolution under ASR-transcribed, non-English conversational input remains underexplored. We introduce Conversational Orientation Reasoning (COR), a diagnostic benchmark for egocentric spatial reference resolution in Traditional Chinese conversational navigation. COR pairs ASR-transcribed utterances with structured landmark coordinates derived from real-world environments, while controlling spatial geometry to isolate the effects of spoken language noise. This enables evaluation of transcription errors, linguistic variation, and referential ambiguity beyond end-to-end navigation success. Using COR, we study an interpretable decomposition of orientation inference into three steps: extracting spatial relations, mapping landmark coordinates to absolute directions, and inferring the speaker’s orientation. Models following this decomposition achieve high performance on clean text and retain high accuracy under ASR-transcribed input, while unstructured baselines degrade substantially. COR supports analysis of failure modes in spoken spatial reference resolution, providing insights into how language grounding degrades under noisy conversational input.
Paper Type: Long
Research Area: Speech Processing and Spoken Language Understanding
Research Area Keywords: automatic speech recognition, speech technologies, spoken dialog, spoken language understanding, spoken language grounding, spoken language translation, QA via spoken queries
Contribution Types: Model analysis & interpretability, Data resources, Data analysis
Languages Studied: Traditional Chinese
Submission Number: 9400
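
The three-step decomposition described in the abstract (extract the spatial relation, map landmark coordinates to an absolute direction, infer the speaker's orientation) can be illustrated with a minimal sketch. This is not the authors' implementation or the COR data format. The keyword table, the function names, and the coordinate convention (x grows eastward, y grows northward) are all assumptions made for illustration, and the toy extractor does nothing to handle ASR errors.

```python
from typing import Tuple

# Compass directions in clockwise order, so that index arithmetic models rotation.
COMPASS = ["north", "east", "south", "west"]

# Hypothetical keyword table for Traditional Chinese egocentric terms (illustrative only).
RELATION_KEYWORDS = {"右": "right", "左": "left", "前": "front", "後": "behind"}

# Clockwise quarter-turns from the speaker's facing direction to each egocentric relation.
CLOCKWISE_STEPS = {"front": 0, "right": 1, "behind": 2, "left": 3}


def extract_relation(utterance: str) -> str:
    """Step 1: extract the egocentric relation by naive keyword matching."""
    for keyword, relation in RELATION_KEYWORDS.items():
        if keyword in utterance:
            return relation
    raise ValueError("no egocentric spatial term found in utterance")


def absolute_direction(speaker: Tuple[float, float],
                       landmark: Tuple[float, float]) -> str:
    """Step 2: map coordinates to the dominant absolute direction of the landmark.

    Assumes x increases to the east and y increases to the north.
    """
    dx = landmark[0] - speaker[0]
    dy = landmark[1] - speaker[1]
    if dx == 0 and dy == 0:
        raise ValueError("speaker and landmark share the same position")
    if abs(dx) >= abs(dy):
        return "east" if dx > 0 else "west"
    return "north" if dy > 0 else "south"


def infer_orientation(utterance: str,
                      speaker: Tuple[float, float],
                      landmark: Tuple[float, float]) -> str:
    """Step 3: infer which absolute direction the speaker is facing."""
    relation = extract_relation(utterance)
    bearing = absolute_direction(speaker, landmark)
    # Undo the egocentric rotation: facing = bearing rotated counterclockwise
    # by the number of quarter-turns the relation implies.
    index = (COMPASS.index(bearing) - CLOCKWISE_STEPS[relation]) % 4
    return COMPASS[index]
```

For example, if the landmark lies due east of the speaker and the utterance is "圖書館在我的右邊" ("the library is on my right"), the sketch infers that the speaker faces north. Keeping the three steps as separate functions mirrors the interpretability argument: an error can be traced to relation extraction, coordinate mapping, or the final rotation.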