It's What You Say and How You Say It: Exploring Textual and Audio Features for Podcast Data

Anonymous

16 Jan 2022 (modified: 05 May 2023) · ACL ARR 2022 January Blind Submission
Abstract: Podcasts are a relatively new medium, consisting of spoken documents or conversations that cover a wide range of topics, genres, and styles. With the massive growth in the number of podcasts and in their listener base, it is valuable to understand podcasts better and to gain insight into questions such as what makes certain podcasts more popular than others and which tags help characterize a podcast. In this work, we provide a comprehensive analysis of hand-crafted features from two modalities: text and audio. We explore multiple feature combinations, using podcast popularity prediction and multi-label tag assignment as proxy downstream tasks. In our experiments, the textual features are document embeddings, affective features, named entities, tags, and topics, while the audio features consist of multi-band modulation and traditional speech processing features. We find that the audio feature prosody and the textual affective features (sentiment and emotions) are significant for both downstream tasks. We also observe that combining textual and audio features improves performance on the popularity prediction task.
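The abstract does not state how the two modalities are combined. As a minimal sketch, assuming early fusion (standardize each feature, then concatenate the text and audio vectors per podcast) and a fixed tag vocabulary for the multi-label task, the setup could look like the following. The function names, toy values, and the fusion strategy itself are hypothetical illustrations, not the authors' pipeline.

```python
from statistics import mean, pstdev

# Hypothetical helpers: early fusion is an assumed combination strategy,
# not one confirmed by the abstract.

def zscore_columns(rows):
    """Standardize each column of a list of equal-length feature vectors."""
    stats = [(mean(col), pstdev(col)) for col in zip(*rows)]
    # Constant columns (std of 0) are only centered, to avoid dividing by zero.
    return [
        [(x - m) / s if s > 0 else x - m for x, (m, s) in zip(row, stats)]
        for row in rows
    ]

def early_fusion(text_feats, audio_feats):
    """Concatenate standardized text and audio features, one row per podcast."""
    if len(text_feats) != len(audio_feats):
        raise ValueError("need one text and one audio vector per podcast")
    t = zscore_columns(text_feats)
    a = zscore_columns(audio_feats)
    return [t_row + a_row for t_row, a_row in zip(t, a)]

def binarize_tags(tag_lists, vocabulary):
    """Multi-label targets: one 0/1 indicator per tag in a fixed vocabulary."""
    return [[int(tag in tags) for tag in vocabulary] for tags in map(set, tag_lists)]

# Toy usage with made-up values: two podcasts, two text features, one audio feature.
fused = early_fusion([[1.0, 2.0], [3.0, 6.0]], [[100.0], [140.0]])
targets = binarize_tags([["comedy", "news"], ["music"]], ["comedy", "music", "news"])
```

Standardizing before concatenation keeps features on very different scales (for example, an affect score and a pitch value in Hz) from dominating a downstream model; a late-fusion alternative would instead train one model per modality and merge their predictions.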
Paper Type: long