Abstract: In this paper, we propose a novel kernel learning scheme for acoustic scene classification using multiple short-term observations. The method takes inspiration from a recent finding in psychological research, that humans use summary statistics to perceive auditory sequences, and we endeavor to devise a computational framework imitating this important auditory mechanism for acoustic scene parsing. Conventional schemes usually encode spectro-temporal patterns into a compact feature vector by time-averaging, e.g., in Gaussian mixture models (GMMs). However, such integration may not be ideal, since the arithmetic mean is vulnerable to extreme outliers, which can be generated by sounds irrelevant to the scene category. In this work, we develop an effective scheme to exploit the rich discriminant information in multiple short-term observations of an acoustic scene. Concretely, we first segment the audio recording into short slices (e.g., 2 seconds) and extract one vector of descriptive features from each slice. The resulting feature matrix is then used to represent the acoustic scene. Since the discriminant information of an acoustic scene can be characterized by either global structure or local patterns, we perform heterogeneous kernel analysis in hybrid feature spaces. Moreover, we conditionally fuse the two-way discriminant information to achieve better classification. The proposed method is validated on the DCASE2016 challenge dataset. Experimental results demonstrate the effectiveness of our approach.
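As a rough illustration of the slicing step the abstract describes, the following Python sketch builds one feature vector per 2-second slice and stacks them into a feature matrix representing the scene. It is not the authors' code: the feature choice (librosa MFCCs), function names, and parameters are all assumptions.

```python
# Minimal sketch (assumed, not from the paper): segment a recording into
# 2-second slices and form a per-slice feature matrix for the scene.
import numpy as np
import librosa


def slice_feature_matrix(path, slice_sec=2.0, n_mfcc=20):
    """Return an (n_slices, n_mfcc) matrix: one descriptive vector per slice."""
    y, sr = librosa.load(path, sr=None)       # keep the native sample rate
    hop = int(slice_sec * sr)                 # samples per 2-second slice
    rows = []
    for start in range(0, len(y) - hop + 1, hop):
        s = y[start:start + hop]
        # MFCCs as a stand-in for the paper's (unspecified) descriptive features
        mfcc = librosa.feature.mfcc(y=s, sr=sr, n_mfcc=n_mfcc)
        rows.append(mfcc.mean(axis=1))        # summarize frames within a slice
    return np.vstack(rows)                    # feature matrix for the scene


# Usage: X = slice_feature_matrix("scene.wav")  # shape (num_slices, 20)
```

Each row of the matrix is one short-term observation; a kernel built over the whole matrix, rather than its time-average, avoids collapsing the slices into a single mean vector.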