Hierarchical Residual-pyramidal Model for Large Context Based Media Presence Detection

24 Sept 2019 (modified: 14 Mar 2022) | OpenReview Archive Direct Upload | Readers: Everyone
Abstract: We study media presence detection: learning to recognize whether a sound segment (typically a few seconds long) of a long recorded stream contains media (TV) sound. The problem is difficult because non-media sound sources can be quite diverse (e.g., human voicing, non-vocal sounds, and non-human sounds), and the recorded sound can be a mixture of media and non-media sound. Unlike speech recognition, where the recognizer must detect local phonetic variation, the key features that distinguish media from non-media sound are nonlocal. Motivated by this, we propose a hierarchical model that jointly learns the representation of each pre-chunked segment within a long recorded stream and encourages every local representation to be insensitive to variations within its segment. We also explore the effects of stream-based normalization and of iteratively imputing missing labels in the training dataset. Experimental results indicate that the proposed context-based methods are effective for media presence detection.
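The abstract does not define stream-based normalization, so the following is only a minimal sketch of one plausible reading: each feature dimension is standardized with a mean and standard deviation pooled over the whole recorded stream, rather than computed per segment. The function name `stream_normalize`, the nested-list data layout, and the `eps` parameter are all assumptions made for illustration and are not taken from the paper.

```python
from statistics import fmean, pstdev


def stream_normalize(segments, eps=1e-8):
    """Standardize frames with statistics pooled over the entire stream.

    segments: list of segments; each segment is a list of frames;
              each frame is a list of float features (hypothetical layout).
    eps:      small constant guarding against division by zero.
    """
    # Pool every frame of every segment so the statistics describe the stream.
    frames = [frame for seg in segments for frame in seg]
    n_features = len(frames[0])

    # Per-feature mean and (population) standard deviation over the stream.
    means = [fmean([f[d] for f in frames]) for d in range(n_features)]
    stds = [pstdev([f[d] for f in frames]) for d in range(n_features)]

    # Apply the same stream-level statistics to each segment, keeping its shape.
    return [
        [
            [(frame[d] - means[d]) / (stds[d] + eps) for d in range(n_features)]
            for frame in seg
        ]
        for seg in segments
    ]


# Toy stream: two segments, two frames each, two features per frame.
stream = [
    [[1.0, 10.0], [3.0, 10.0]],
    [[5.0, 10.0], [7.0, 10.0]],
]
normalized = stream_normalize(stream)
```

Under this reading, a segment keeps its offset relative to the rest of the stream (the first toy segment ends up below zero, the second above), which per-segment normalization would erase.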