Abstract: Traditional multi-task learning often relies on explicit task interaction mechanisms to enhance multi-task performance. However, these approaches suffer from problems such as negative transfer when multiple weakly correlated tasks are learned jointly. In addition, to maintain performance on dense prediction tasks, these methods process encoded features at a large scale, which increases computational complexity. In this study, we introduce a Task-Interaction-Free Network (TIF) for multi-task learning, which departs from explicitly designed task interaction mechanisms. First, we present a Scale Attentive-Feature Fusion Module (SAFF) that enriches each scale of the shared encoder with task-agnostic encoded features. Subsequently, our proposed task- and scale-specific decoders efficiently decode the enhanced features shared across tasks without requiring task-interaction modules. Concretely, we use a Self-Feature Distillation Module (SFD) to explore task-specific features at lower scales and a Low-To-High Scale Feature Diffusion Module (LTHD) to diffuse global pixel relationships from low-level to high-level scales. Experiments on publicly available multi-task learning datasets validate that TIF attains state-of-the-art performance.
Primary Subject Area: [Experience] Interactions and Quality of Experience
Secondary Subject Area: [Content] Multimodal Fusion
Relevance To Conference: Multi-task learning (MTL) has been a popular research domain in recent years. In this paper, we rethink the task interaction mechanisms used in current MTL methods. Such interaction typically leads to negative transfer during multi-modal fusion, which we demonstrate through extensive experiments. Furthermore, pixel-level dense prediction tasks generally incur high model complexity when they model global dependencies. Therefore, we propose the Task-Interaction-Free Network (TIF) to handle MTL in a novel manner that avoids explicit task interaction, and its effectiveness is supported by extensive experiments. We hope this multi-modal fusion method offers researchers a new way to fuse multi-modal features while avoiding negative transfer. We also provide a novel method for efficiently processing global relationships in high-scale pixel-level tasks. We believe that the proposed Scale Feature Diffusion method can achieve high performance and high efficiency in MTL dense prediction.
Submission Number: 2921