Multimodal Topic Segmentation of Podcast Shows with Pre-trained Neural Encoders

Published: 01 Jan 2023, Last Modified: 04 Aug 2023, ICMR 2023, Readers: Everyone
Abstract: We present two multimodal models for topic segmentation of podcasts built on pre-trained neural text and audio embeddings. We show that results improve not only when different modalities are combined, but also when different encoders from the same modality are combined, especially general-purpose sentence embeddings with specifically fine-tuned ones. We also show that audio embeddings can be replaced by two simple features, related to sentence duration and inter-sentential pauses, with comparable results. Finally, we publicly release our two datasets, which are, to our knowledge, the first publicly and freely available multimodal datasets for topic segmentation.
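
As an illustration of the two timing features mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes each transcribed sentence carries start and end timestamps in seconds; the `Sentence` class, the `timing_features` function, and the exact feature definitions (duration as end minus start, pause as the gap since the previous sentence ended) are hypothetical choices made for this example, since the abstract does not specify them.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Sentence:
    """A transcribed sentence with timestamps (hypothetical structure)."""
    text: str
    start: float  # start time in seconds
    end: float    # end time in seconds


def timing_features(sentences: List[Sentence]) -> List[Tuple[float, float]]:
    """Return one (duration, preceding_pause) pair per sentence.

    Assumed definitions, for illustration only:
    - duration: end time minus start time of the sentence
    - preceding_pause: silence between the previous sentence's end and
      this sentence's start, clipped at 0 for overlapping timestamps;
      the first sentence gets a pause of 0.0
    """
    features = []
    prev_end = None
    for sentence in sentences:
        duration = sentence.end - sentence.start
        if prev_end is None:
            pause = 0.0
        else:
            pause = max(0.0, sentence.start - prev_end)
        features.append((duration, pause))
        prev_end = sentence.end
    return features


if __name__ == "__main__":
    episode = [
        Sentence("Welcome to the show.", 0.0, 2.5),
        Sentence("Today we talk about music.", 3.0, 4.0),
        Sentence("Let's get started.", 4.0, 6.0),
    ]
    for sentence, (duration, pause) in zip(episode, timing_features(episode)):
        print(f"{duration:.2f}s long, {pause:.2f}s pause before: {sentence.text}")
```

Under these assumptions, each sentence yields a two-dimensional vector that could be concatenated with its text embedding in place of a much larger audio embedding.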