Highlights

• A novel progressively sparse multi-task architecture learning method, named dual-mask, is proposed. It selects the salient channels and layers from a dense network for each task in a differentiable manner, enabling the discovery of flexible and efficient feature-sharing architectures.
• A simple yet effective importance-guided relaxation method is proposed to learn sparse, high-quality masks for each task. It enables the simultaneous learning of hybrid binary and real-valued masks together with the model parameters.
• A progressive training strategy with continuation is designed to induce mask sparsity gradually, rather than forcing the masks to be sparse from the beginning, which benefits the MTL model by increasing the inductive bias shared across tasks.
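To make the idea concrete, the sketch below shows one way a block could be gated by a real-valued channel mask and a layer mask, with a temperature that grows over training standing in for the continuation schedule: the sigmoid-relaxed masks sharpen toward binary, so sparsity is induced progressively. This is a minimal illustration only; the class and parameter names, the mask parameterization, and the temperature schedule are assumptions for exposition, not the paper's exact formulation.

```python
# Illustrative sketch (assumed parameterization, not the paper's formulation):
# a conv block whose output channels are gated by a relaxed channel mask and
# whose whole layer is gated by a relaxed layer mask. Raising the temperature
# ("continuation") pushes both sigmoid-relaxed masks toward binary values.
import torch
import torch.nn as nn


class DualMaskedConv(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # Residual skip path below assumes equal input/output channels.
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        # Hypothetical real-valued mask logits, learned jointly with weights.
        self.channel_logits = nn.Parameter(torch.zeros(channels))
        self.layer_logit = nn.Parameter(torch.zeros(1))

    def forward(self, x, temperature=1.0):
        # Soft (relaxed) masks; a larger temperature drives them toward 0/1.
        channel_mask = torch.sigmoid(temperature * self.channel_logits)
        layer_mask = torch.sigmoid(temperature * self.layer_logit)
        out = self.conv(x) * channel_mask.view(1, -1, 1, 1)
        # When the layer mask is near zero, the block reduces to the skip path,
        # i.e. the layer is effectively removed for this task.
        return layer_mask * out + (1.0 - layer_mask) * x


if __name__ == "__main__":
    block = DualMaskedConv(16)
    x = torch.randn(2, 16, 8, 8)
    # Toy continuation schedule: temperature increases step by step, so the
    # masks are not forced to be sparse at the start of training.
    for step in range(3):
        y = block(x, temperature=1.0 + step)
        print(step, y.shape)
```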