RPG-MoGe: Relation Prompt-Guided Multi-Order Generative Ensemble Framework for Speech Relation Extraction

ACL ARR 2025 February Submission2239 Authors

14 Feb 2025 (modified: 09 May 2025), ACL ARR 2025 February Submission, CC BY 4.0
Abstract: Speech Relation Extraction (SpeechRE) aims to extract relation triplets directly from speech data. However, existing datasets suffer from limited quantity and diversity of real-human speech in their training sets, while current models are constrained by fixed single-order generation templates and a lack of high-level semantic alignment, significantly hindering their performance. To address these challenges, we introduce **CommonVoice-SpeechRE**, a large-scale dataset comprising nearly 20,000 real-human speech samples from diverse speakers, establishing a new benchmark for SpeechRE research. Furthermore, we propose the **R**elation **P**rompt-**G**uided **M**ulti-**O**rder **G**enerative **E**nsemble (**RPG-MoGe**), a novel framework that features: (1) a multi-order triplet generation ensemble strategy that exploits data diversity by varying the order of triplet elements during both training and inference, and (2) CNN-based latent relation prediction heads that generate explicit relation prompts to guide cross-modal alignment and accurate triplet generation. Extensive experiments show that our framework outperforms state-of-the-art baselines. Our work not only provides a valuable dataset resource for the community but also offers an effective methodology to advance SpeechRE in real-world applications.
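To make the multi-order idea in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes a triplet has three elements (head, relation, tail), that each element order yields one generation template, and that predictions from the different orders are combined by simple vote counting. The abstract does not specify the template format or the ensemble rule, so the names `linearize`, `parse`, `ensemble`, and `min_votes` are all hypothetical.

```python
from collections import Counter
from itertools import permutations

ELEMENTS = ("head", "relation", "tail")

# All 3! = 6 possible element orders, each one a candidate generation template.
ORDERS = list(permutations(ELEMENTS))


def linearize(triplet, order):
    """Render one triplet as a target string in the given element order."""
    return " ; ".join(f"{name}: {triplet[name]}" for name in order)


def parse(text):
    """Recover a canonical (head, relation, tail) tuple from any ordering."""
    fields = dict(part.split(": ", 1) for part in text.split(" ; "))
    return tuple(fields[name] for name in ELEMENTS)


def ensemble(outputs, min_votes=2):
    """Keep triplets predicted by at least `min_votes` of the order-specific
    outputs (an assumed voting rule, chosen only for illustration)."""
    votes = Counter(parse(output) for output in outputs)
    return [triplet for triplet, count in votes.items() if count >= min_votes]


if __name__ == "__main__":
    gold = {"head": "Paris", "relation": "capital of", "tail": "France"}
    # Training side: one target string per element order.
    for order in ORDERS:
        print(linearize(gold, order))
    # Inference side: pretend each order produced one output; one disagrees.
    outputs = [linearize(gold, order) for order in ORDERS[:3]]
    outputs.append("head: Paris ; relation: located in ; tail: France")
    print(ensemble(outputs))
```

In this sketch, varying the order multiplies the number of training targets per utterance, and the vote at inference time filters out triplets that only a single order produces. A real system would decode each order with the speech model rather than from strings built by hand.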
Paper Type: Long
Research Area: Information Extraction
Research Area Keywords: Information Extraction; Speech Recognition, Text-to-Speech and Spoken Language Understanding; Resources and Evaluation
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Data resources
Languages Studied: English
Submission Number: 2239