S2E: Towards an End-to-End Entity Resolution Solution from Acoustic Signal

Published: 01 Jan 2024, Last Modified: 20 May 2025, Venue: ICASSP 2024, License: CC BY-SA 4.0
Abstract: Traditional cascading Entity Resolution (ER) pipelines suffer from errors propagated from upstream tasks. We address this issue by formulating a new end-to-end (E2E) ER problem, Signal-to-Entity (S2E), which resolves query entity mentions to actionable entities in textual catalogs directly from audio queries rather than from raw or parsed audio transcriptions. Additionally, we extend the E2E Spoken Language Understanding framework and introduce a novel dimension to ER research. We adapt three public datasets for the S2E task and propose a novel solution that aligns the multimodal signals via an effective retrieval co-attention mechanism and refined multimodal objectives. Despite being 42% smaller in total model size, the proposed design outperforms the cascading baseline by 2.6%, 47.0%, and 73.3% on the three datasets, respectively, which cover different acoustic conditions.