Abstract: Overparameterized pretrained transformers are often inefficient for diverse Environmental Sound Classification (ESC) tasks, where excessive computation limits deployment in resource-constrained scenarios. To address this issue, we propose Adaptive Depth-wise Pruning (ADP), a task-adaptive and architecture-agnostic model compression framework for efficient ESC. ADP decomposes a general classification model into hierarchical depth-wise modular blocks and adaptively prunes less important blocks based on depth-wise classification performance. By incorporating self-distillation (SD) into a global optimization framework, ADP performs depth-wise classification efficiently while maintaining a shared feature-extraction backbone, and selects the most compact yet effective subnetwork within an acceptable performance degradation range. We evaluate ADP on the Audio Spectrogram Transformer (AST) with the ESC-50 dataset, achieving a 50.68% reduction in parameters with less than a 2% accuracy drop, demonstrating ADP's ability to balance compression against performance adaptively. The source code is available at https://github.com/youngwhite/ADP for reproducibility.
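The subnetwork-selection rule described in the abstract — keeping the shallowest stack of depth-wise blocks whose classification performance stays within an acceptable degradation margin of the full model — can be sketched as follows. This is a minimal illustration under assumptions: the function name `select_depth`, the per-depth accuracy list, and the 2%-style `max_drop` margin are hypothetical, not the authors' exact procedure.

```python
def select_depth(depth_accuracies, max_drop=0.02):
    """Hypothetical sketch of ADP-style subnetwork selection.

    `depth_accuracies[d]` is the validation accuracy of the classifier
    attached after block d+1 (one head per depth-wise block, as enabled
    by self-distillation over a shared backbone). Returns the smallest
    depth (1-based) whose accuracy is within `max_drop` of the
    full-depth accuracy; falls back to the full depth.
    """
    full_acc = depth_accuracies[-1]
    for depth, acc in enumerate(depth_accuracies, start=1):
        if full_acc - acc <= max_drop:
            # All blocks deeper than `depth` can be pruned while staying
            # inside the acceptable degradation range.
            return depth
    return len(depth_accuracies)
```

For example, with per-depth accuracies `[0.70, 0.85, 0.94, 0.95]` and a 2% margin, the rule keeps only the first three blocks, since depth 3 is within 0.02 of the full-depth accuracy.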