Abstract: We aim to train a multi-task model such that users can adjust the desired compute budget and the relative importance of tasks after deployment, without retraining. This enables optimizing performance for dynamically varying user needs, without the heavy computational overhead of training and storing models for every scenario. To this end,
we propose a multi-task model consisting of a shared encoder and task-specific decoders, where both the encoder and decoder channel widths are slimmable. Our key idea is to control task importance by varying the capacities of the task-specific decoders, while controlling the total computational cost by jointly adjusting the encoder capacity. This improves overall accuracy by allowing a stronger encoder for a given budget, increases control over computational cost, and yields high-quality slimmed sub-architectures that match the user's constraints.
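To make the idea concrete, the sketch below shows a minimal slimmable multi-task model in PyTorch in which the encoder width and each task decoder's width are chosen at run time. This is an illustrative assumption, not the authors' code: the names (`SlimmableConv2d`, `SlimmableMTLNet`), the single-layer encoder and decoders, and the task names are hypothetical, and the switchable normalization layers used by real slimmable networks are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SlimmableConv2d(nn.Conv2d):
    """Conv layer that can run with only a leading fraction of its channels."""
    def forward(self, x, width_mult=1.0):
        out_ch = max(1, int(self.out_channels * width_mult))
        in_ch = x.shape[1]                       # adapt to however many channels arrive
        weight = self.weight[:out_ch, :in_ch]
        bias = self.bias[:out_ch] if self.bias is not None else None
        return F.conv2d(x, weight, bias, self.stride, self.padding)

class SlimmableMTLNet(nn.Module):
    """Shared slimmable encoder with one slimmable decoder per task."""
    def __init__(self, tasks, in_ch=3, enc_ch=64, dec_ch=64):
        super().__init__()
        self.encoder = SlimmableConv2d(in_ch, enc_ch, 3, padding=1)
        self.decoders = nn.ModuleDict(
            {t: SlimmableConv2d(enc_ch, dec_ch, 3, padding=1) for t in tasks})

    def forward(self, x, enc_width, dec_widths):
        feat = F.relu(self.encoder(x, enc_width))
        # each decoder runs at its own width, reflecting task importance
        return {t: dec(feat, dec_widths[t]) for t, dec in self.decoders.items()}

# Example: half-width encoder, full-width segmentation head, quarter-width depth head.
model = SlimmableMTLNet(tasks=["semseg", "depth"])
outputs = model(torch.randn(1, 3, 64, 64), enc_width=0.5,
                dec_widths={"semseg": 1.0, "depth": 0.25})
```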
Our training strategy involves a novel ‘Configuration-Invariant Knowledge Distillation’ loss that enforces backbone representations to remain invariant across different runtime width configurations, which enhances accuracy.
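One way such a loss could look is sketched below, reusing the hypothetical model above. It assumes the full-width encoder features act as the teacher and uses an MSE penalty; the paper's exact loss form is not given here, so treat this purely as an illustration.

```python
def config_invariant_kd_loss(model, x, enc_width):
    """Penalize deviation of slimmed encoder features from full-width features."""
    with torch.no_grad():                        # full-width features act as the teacher
        teacher = F.relu(model.encoder(x, width_mult=1.0))
    student = F.relu(model.encoder(x, enc_width))
    c = student.shape[1]
    # compare against the leading teacher channels so the shapes match
    return F.mse_loss(student, teacher[:, :c])
```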
Further, we present a simple but effective search algorithm that translates user constraints into runtime width configurations of both the shared encoder and the task decoders, from which the sub-architectures are sampled. The key rule of the search algorithm is to allocate a larger computational budget to the decoders of more highly preferred tasks, while searching for a shared encoder configuration that enhances overall MTL performance, as sketched below.
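The sketch below is one hypothetical instantiation of that rule: decoder widths are set roughly in proportion to task preference, and the largest encoder width that keeps the whole configuration within budget is then chosen greedily. The candidate width set and the `cost` function (e.g., a user-supplied FLOPs estimator) are assumptions, not the paper's exact procedure.

```python
def search_widths(prefs, budget, cost, candidates=(0.25, 0.5, 0.75, 1.0)):
    """Map task preferences and a compute budget to encoder/decoder widths."""
    total = sum(prefs.values())
    # more preferred tasks receive wider (higher-capacity) decoders
    dec_widths = {
        t: min(candidates, key=lambda w: abs(w - p / total * len(prefs)))
        for t, p in prefs.items()
    }
    # pick the largest shared-encoder width whose total cost fits the budget
    for enc_width in sorted(candidates, reverse=True):
        if cost(enc_width, dec_widths) <= budget:
            return enc_width, dec_widths
    return min(candidates), dec_widths
```

For instance, `search_widths({"semseg": 0.7, "depth": 0.3}, budget, flops_estimate)` would widen the segmentation decoder relative to the depth decoder before fitting the encoder to the remaining budget, with `flops_estimate` standing in for whatever cost model the user provides.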
Experiments on three multi-task benchmarks (PASCAL-Context, NYUD-v2, and CIFAR100-MTL) with diverse backbone architectures demonstrate the advantages of our approach. For example, our method achieves ∼33.5% higher controllability than prior methods on NYUD-v2, while incurring much less compute cost.