Abstract: Audio-based activity recognition is an essential task in a wide range of human-centric applications. However, most existing work focuses on event detection, machine sound classification, road surveillance, scene classification, and similar tasks; the recognition of low-intensity human activities in outdoor scenarios has received negligible attention. This paper proposes a deep learning-based framework for recognizing low-intensity human activities in a sparsely populated outdoor environment using audio. The proposed framework classifies 2.0 s audio recordings into one of nine activity classes. The variety of sounds in an outdoor environment makes it challenging to distinguish human activities from other background sounds. The framework is an end-to-end architecture that combines mel-frequency cepstral coefficients (MFCCs) with a 2D convolutional neural network to obtain a deep representation of activities and classify them. Extensive experimental analysis demonstrates that the proposed framework outperforms existing frameworks by 16.43% in F1-score. Additionally, we collected an audio dataset and make it available to the research community for evaluation and benchmarking.
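To make the described pipeline concrete, below is a minimal illustrative sketch (not the authors' implementation) of classifying a 2.0 s clip into nine activity classes with MFCC features and a small 2D CNN. The sample rate (16 kHz), number of MFCC coefficients (40), and network depth are hypothetical choices for illustration; the abstract does not specify them.

```python
# Illustrative sketch only: MFCC + 2D CNN activity classifier.
# Assumed (not from the paper): 16 kHz audio, 40 MFCCs, two conv blocks.
import numpy as np
import librosa
import torch
import torch.nn as nn

SAMPLE_RATE = 16000   # assumed sample rate
CLIP_SECONDS = 2.0    # clip length stated in the abstract
N_MFCC = 40           # assumed number of MFCC coefficients
NUM_CLASSES = 9       # nine activity classes

def clip_to_mfcc(waveform: np.ndarray) -> torch.Tensor:
    """Convert a mono 2.0 s waveform into a (1, n_mfcc, frames) tensor."""
    mfcc = librosa.feature.mfcc(y=waveform, sr=SAMPLE_RATE, n_mfcc=N_MFCC)
    return torch.from_numpy(mfcc).float().unsqueeze(0)  # add channel dim

class ActivityCNN(nn.Module):
    """A small 2D CNN over the single-channel MFCC 'image'."""
    def __init__(self, num_classes: int = NUM_CLASSES):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.features(x).flatten(1)
        return self.classifier(h)

if __name__ == "__main__":
    # Dummy 2.0 s clip standing in for a real outdoor recording.
    clip = np.random.randn(int(SAMPLE_RATE * CLIP_SECONDS)).astype(np.float32)
    x = clip_to_mfcc(clip).unsqueeze(0)   # shape: (batch=1, 1, n_mfcc, frames)
    logits = ActivityCNN()(x)
    print("predicted class:", logits.argmax(dim=1).item())
```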