City2Scene: Improving Acoustic Scene Classification with City Features

Yiqiang Cai, Yizhou Tan, Peihong Zhang, Yuxuan Liu, Shengchen Li, Xi Shao, Mark D. Plumbley

Published: 2025, Last Modified: 12 Jun 2025, CoRR 2025, License: CC BY-SA 4.0

Abstract: Acoustic scene recordings are often collected from a diverse range of cities. Most existing acoustic scene classification (ASC) approaches focus on identifying common acoustic scene patterns across cities to enhance generalization. In contrast, we hypothesize that city-specific environmental and cultural differences in acoustic features are beneficial for the ASC task. In this paper, we introduce City2Scene, a novel framework that leverages city features to improve ASC. City2Scene transfers the city-specific knowledge from city classification models to a scene classification model using knowledge distillation. We evaluated City2Scene on the DCASE Challenge Task 1 datasets, where each audio clip is annotated with both scene and city labels. Experimental results demonstrate that city features provide valuable information for classifying scenes. By distilling the city-specific knowledge, City2Scene effectively improves accuracy for various state-of-the-art ASC backbone models, including both CNNs and Transformers.
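The abstract does not spell out the exact distillation objective. As a minimal sketch, assuming a standard temperature-scaled (Hinton-style) knowledge distillation setup, the scene model could carry a hypothetical auxiliary city head whose temperature-softened output is pulled toward a pretrained city classifier's softened output, while the scene head is trained with ordinary cross-entropy on scene labels. The names `city2scene_style_loss`, `temperature`, and `alpha` are illustrative assumptions, not the authors' method or hyperparameters.

```python
import math
from typing import Sequence


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Temperature-scaled log-softmax using the log-sum-exp trick for stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_sum for z in scaled]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Hard-label cross-entropy for the scene label."""
    return -log_softmax(logits)[target]


def kl_divergence(log_p: Sequence[float], log_q: Sequence[float]) -> float:
    """KL(p || q) computed from log-probabilities."""
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def city2scene_style_loss(
    scene_logits: Sequence[float],
    scene_label: int,
    student_city_logits: Sequence[float],   # hypothetical auxiliary city head
    teacher_city_logits: Sequence[float],   # frozen city classifier (teacher)
    temperature: float = 2.0,               # assumed value
    alpha: float = 0.5,                     # assumed weight on the KD term
) -> float:
    """Scene cross-entropy plus a distillation term that transfers city knowledge."""
    ce = cross_entropy(scene_logits, scene_label)
    log_p_teacher = log_softmax(teacher_city_logits, temperature)
    log_p_student = log_softmax(student_city_logits, temperature)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    kd = kl_divergence(log_p_teacher, log_p_student) * temperature ** 2
    return (1.0 - alpha) * ce + alpha * kd


if __name__ == "__main__":
    loss = city2scene_style_loss(
        scene_logits=[1.2, 0.3, -0.5],
        scene_label=0,
        student_city_logits=[0.1, 0.8, -0.2, 0.0],
        teacher_city_logits=[0.4, 1.5, -1.0, 0.2],
    )
    print(f"combined loss: {loss:.4f}")
```

When the student's city head matches the teacher exactly, the KD term vanishes and only the weighted scene cross-entropy remains. With several city teachers, one possible (assumed) variant is to average their softened distributions before computing the KL term.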