Abstract: Soccer is a globally popular sport with a vast audience;
in this paper, we construct an automatic soccer game commentary model to improve the audience's viewing experience.
In general, we make the following contributions:
*First*, observing the prevalent video-text misalignment in existing datasets, we manually annotate timestamps for 49 matches, establishing a more robust benchmark for soccer game commentary generation, termed ***SN-Caption-test-align***;
*Second*, we propose a multimodal temporal alignment pipeline to automatically correct and filter the existing dataset at scale,
creating a higher-quality soccer game commentary dataset for training, denoted as ***MatchTime***;
*Third*, based on our curated dataset, we train an automatic commentary generation model, named ***MatchVoice***. Extensive experiments and ablation studies demonstrate the effectiveness of our alignment pipeline; training the model on the curated dataset achieves state-of-the-art performance for commentary generation, showing that better alignment leads to significant performance improvements in downstream tasks.
Paper Type: Long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: image text matching, cross-modal content generation, cross-modal application, multimodality
Contribution Types: Model analysis & interpretability, Data analysis
Languages Studied: English
Submission Number: 4458