Dropped Scheduled Task: Mitigating Negative Transfer in Multi-task Learning using Dynamic Task Dropping
Abstract: In Multi-Task Learning (MTL), K distinct tasks are jointly optimized. Because tasks vary in nature and complexity, a few tasks may dominate learning, and the performance of the remaining tasks may then be compromised by negative transfer from the dominant ones. We propose a Dropped-Scheduled Task (DST) algorithm, which probabilistically “drops” specific tasks during joint optimization while scheduling others to reduce negative transfer. For each task, a scheduling probability is decided based on four metrics: (i) task depth, (ii) number of ground-truth samples per task, (iii) amount of training completed, and (iv) task stagnancy. Based on these probabilities, the scheduled tasks receive joint computation cycles while the others are “dropped”. To demonstrate the effectiveness of the proposed DST algorithm, we perform multi-task learning on three applications and two architectures. Using unilateral (single-input) and bilateral (multiple-input) multi-task networks, the chosen applications are (a) face (AFLW), (b) fingerprint (IIITD MOLF, MUST, and NIST SD27), and (c) character recognition (Omniglot). Experimental results show that the proposed DST algorithm yields the least negative transfer and the lowest overall error among the compared state-of-the-art algorithms, across tasks.
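The scheduling-and-dropping step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the formula that turns the four metrics into a scheduling probability, so everything below is an assumption for illustration only: the `TaskState` fields are hypothetical metrics already scaled to [0, 1] so that a larger value means the task should be scheduled more often, they are combined by an equally weighted mean, each task is kept with an independent Bernoulli draw, and a fallback keeps at least one task per step. This is not the authors' DST implementation.

```python
import random
from dataclasses import dataclass


@dataclass
class TaskState:
    """Hypothetical per-task metrics, each assumed pre-scaled to [0, 1]
    so that a larger value means "schedule this task more often"."""
    name: str
    depth: float       # (i) task depth
    gt_samples: float  # (ii) number of ground-truth samples
    progress: float    # (iii) amount of training completed
    stagnancy: float   # (iv) task stagnancy


def scheduling_probability(t: TaskState) -> float:
    # Assumption: equally weighted mean of the four metrics, clipped to [0, 1].
    # The paper's actual combination of the metrics may differ.
    p = (t.depth + t.gt_samples + t.progress + t.stagnancy) / 4.0
    return min(max(p, 0.0), 1.0)


def schedule_tasks(tasks: list[TaskState], rng: random.Random) -> list[str]:
    """Return the names of tasks scheduled this iteration; the rest are dropped."""
    scheduled = [t.name for t in tasks if rng.random() < scheduling_probability(t)]
    if not scheduled:
        # Assumption: keep the most probable task so every step updates something.
        scheduled = [max(tasks, key=scheduling_probability).name]
    return scheduled


def joint_loss(losses: dict[str, float], scheduled: list[str]) -> float:
    """Sum only the scheduled tasks' losses; dropped tasks contribute nothing."""
    return sum(losses[name] for name in scheduled)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical task names and metric values, for illustration only.
    tasks = [
        TaskState("landmarks", 0.9, 0.8, 0.5, 0.7),
        TaskState("pose", 0.1, 0.2, 0.5, 0.0),
    ]
    chosen = schedule_tasks(tasks, rng)
    print(chosen, joint_loss({"landmarks": 1.2, "pose": 0.4}, chosen))
```

Sampling each task independently (rather than picking a fixed top-k) keeps every task's chance of being scheduled proportional to its probability, so a dropped task is only skipped for that iteration and can return in later ones.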
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: The authors thank the reviewers and the Associate Editor for their constructive feedback and comments, which have strengthened our work and improved the overall manuscript. Compared with the original submission, the significant changes in the updated manuscript are:
1. Addressed the comments raised by all three reviewers, mainly:
1a. Added comparison results with three recent research studies from the literature.
1b. Added a new Section 1.1, titled “Background and Problem Formulation”, on page 2.
1c. Formally defined negative transfer and showed its statistical impact on MTL.
2. Prepared a deanonymized camera-ready version of the paper and added an acknowledgment section.
3. Added a link to the code of the proposed DST algorithm on page 5 to encourage wider use of the algorithm and support the reproducibility of the results.
4. Fixed minor grammatical errors to improve the manuscript's readability.
Assigned Action Editor: ~Tie-Yan_Liu1
Submission Number: 376