Generating visual-adaptive audio representation for audio recognition

Jongsu Youn, Dae Ung Jo, Seungmo Seo, Sukhyun Kim, Jongwon Choi

Published: 2025, Last Modified: 14 May 2025, Pattern Recognit. Lett. 2025, CC BY-SA 4.0

Abstract: Highlights
• We introduce an audio feature that can be readily used in existing spectrogram-based methods.
• We propose Visual Adaptive Spectrogram Generation (VASG).
• VASG exploits the audiovisual correspondence in unlabeled video data.
• Our model can be applied to spectrogram-based audio tasks without visual inputs.