Abstract: In the domain of multimedia and multimodal processing, the efficient handling of diverse data streams—such as images, video, and sensor data—is paramount. Model compression and multitask learning (MTL) are crucial in this field, offering the potential to address the resource-intensive demands of processing and interpreting multiple forms of media simultaneously. However, effectively compressing a multitask model presents significant challenges due to the complexity of balancing sparsity allocation against accuracy across multiple tasks. To tackle these challenges, we propose AdapMTL, an adaptive pruning framework for MTL models. AdapMTL leverages multiple learnable soft thresholds, independently assigned to the shared backbone and the task-specific heads, to capture the differences in each component's sensitivity to pruning. During training, it co-optimizes the soft thresholds and the MTL model weights to automatically determine a suitable sparsity level for each component, achieving both high task accuracy and high overall sparsity. It further incorporates an adaptive weighting mechanism that dynamically adjusts the importance of task-specific losses based on each task's robustness to pruning. We demonstrate the effectiveness of AdapMTL through comprehensive experiments on popular multitask datasets, namely NYU-v2 and Tiny-Taskonomy, with different architectures, showcasing superior performance compared to state-of-the-art pruning methods.
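To make the two mechanisms in the abstract concrete, the following is a minimal PyTorch-style sketch of (a) a layer with a learnable soft threshold that can be assigned independently to the shared backbone or a task-specific head, and (b) an adaptive weighting of task losses by their degradation relative to a dense reference. This is an illustration under assumed design choices (an STR-like soft-threshold reparameterization, loss-ratio-based task weights); the class and function names (`SoftThresholdLinear`, `adaptive_task_weights`) are hypothetical, not the authors' implementation.

```python
# Sketch only: assumes an STR-like soft-threshold reparameterization; not AdapMTL's exact code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SoftThresholdLinear(nn.Module):
    """Linear layer whose effective weights pass through a learnable soft threshold.
    One instance (and thus one threshold) per component, e.g., backbone vs. each head."""

    def __init__(self, in_features, out_features, init_threshold=-5.0):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        nn.init.kaiming_uniform_(self.weight, a=5 ** 0.5)
        self.bias = nn.Parameter(torch.zeros(out_features))
        # Learnable threshold parameter; sigmoid keeps the effective threshold in (0, 1).
        self.s = nn.Parameter(torch.tensor(init_threshold))

    def forward(self, x):
        t = torch.sigmoid(self.s)
        # Soft thresholding: magnitudes below t become exactly zero (pruned),
        # the remaining weights are shrunk by t. Gradients flow into both weight and s.
        w = torch.sign(self.weight) * F.relu(self.weight.abs() - t)
        return F.linear(x, w, self.bias)


def adaptive_task_weights(pruned_task_losses, dense_task_losses):
    """Hypothetical adaptive weighting: tasks whose loss degrades more under pruning
    (relative to a dense reference) receive larger weights in the total MTL loss."""
    ratios = torch.stack([p / d for p, d in zip(pruned_task_losses, dense_task_losses)])
    return torch.softmax(ratios, dim=0)
```

In this sketch, the total training loss would be the weighted sum of task losses plus any sparsity-inducing regularization on the thresholds, so that threshold parameters and model weights are updated jointly by the same optimizer, mirroring the co-optimization described in the abstract.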
Primary Subject Area: [Content] Multimodal Fusion
Secondary Subject Area: [Content] Media Interpretation
Relevance To Conference: Multitask learning models inherently involve multiple tasks that handle diverse data modalities—including images, video, audio, and sensor data. AdapMTL advances multimedia and multimodal processing by optimizing multitask learning models through adaptive pruning, thereby improving their efficiency and effectiveness.
Supplementary Material: zip
Submission Number: 3921