Abstract: Multi-task learning (MTL) has found wide application in computer vision tasks. We train a backbone network to learn a shared representation for different tasks such as semantic segmentation, depth estimation, and surface normal estimation. In many cases negative transfer, i.e. impaired performance in the target domain, causes the MTL accuracy to be lower than that of the corresponding single-task networks. To mitigate this issue, we propose an online knowledge distillation method, where single-task networks are trained simultaneously with the MTL network to guide the optimization process. We propose selectively training layers for each task using an adaptive feature distillation (AFD) loss with an online task weighting (OTW) scheme. This task-wise feature distillation enables the MTL network to be trained in a similar way to the single-task networks. On the NYUv2 and Cityscapes datasets we show improvements of 6.22% and 9.19%, respectively, over a baseline MTL model, outperforming recent MTL methods. We validate our design choices in ablative experiments, including the use of online task weighting and the adaptive feature distillation loss.
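To illustrate the general idea described in the abstract, the following is a minimal sketch (not the authors' released code) of a task-wise feature distillation term in PyTorch. It assumes per-layer intermediate features from a single-task "teacher" and the shared MTL "student" backbone are available, and that per-layer and per-task weights are supplied externally; the function names `afd_loss` and `combined_loss` and all arguments are illustrative assumptions.

```python
# Sketch only: task-wise feature distillation with external per-layer and
# per-task weights. Not the paper's implementation.
import torch
import torch.nn.functional as F


def afd_loss(student_feats, teacher_feats, layer_weights):
    """Weighted L2 distance between per-layer features of the MTL network
    (student) and a single-task network (teacher) for one task.

    student_feats / teacher_feats: lists of tensors, one per selected layer.
    layer_weights: iterable of scalars controlling how strongly each layer
                   is distilled (assumed to be computed elsewhere).
    """
    loss = 0.0
    for w, s, t in zip(layer_weights, student_feats, teacher_feats):
        # Detach the teacher so only the MTL network is updated by this term.
        loss = loss + w * F.mse_loss(s, t.detach())
    return loss


def combined_loss(task_losses, distill_losses, task_weights):
    """Combine supervised task losses with per-task distillation terms using
    task weights (assumed here to be precomputed scalars, e.g. from an
    online task weighting scheme)."""
    total = 0.0
    for name, sup_loss in task_losses.items():
        total = total + task_weights[name] * (sup_loss + distill_losses[name])
    return total
```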