Abstract: Deploying deep networks on mobile devices requires efficient use of scarce computational resources, measured as either available memory or computing cost. When addressing multiple tasks simultaneously, it is especially important to share resources across tasks, in particular when they all consume the same input data, e.g., audio samples captured by the on-board microphones. In this paper we propose a multi-task model architecture that consists of a shared encoder and multiple task-specific adapters. During training, we learn both the model parameters and the allocation of the additional task-specific resources across tasks and layers. A global tuning parameter can be used to obtain different multi-task network configurations, each realizing a different trade-off between cost and per-task accuracy. Our results show that this solution significantly outperforms a multi-head baseline. Interestingly, we observe that the optimal resource allocation depends both on the intrinsic characteristics of each task and on the targeted cost measure (e.g., memory or computing cost).
Keywords: Audio, multi-task learning
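To make the described architecture concrete, the following is a minimal PyTorch sketch of a shared encoder with per-task residual adapters and task-specific output heads. All module names, layer shapes, and the fixed bottleneck width are illustrative assumptions, not the authors' implementation; in the paper, the adapter capacity (and hence the extra per-task cost) is learned per task and per layer under a global cost/accuracy tuning parameter, whereas here it is hard-coded.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Task-specific bottleneck adapter placed after a shared layer.

    The bottleneck width sets the additional per-task cost; in the paper
    this allocation is learned per task and per layer (fixed here).
    """
    def __init__(self, channels: int, bottleneck: int):
        super().__init__()
        self.down = nn.Conv1d(channels, bottleneck, kernel_size=1)
        self.up = nn.Conv1d(bottleneck, channels, kernel_size=1)
        self.act = nn.ReLU()

    def forward(self, x):
        # Residual adapter: shared features pass through unchanged,
        # plus a small task-specific correction.
        return x + self.up(self.act(self.down(x)))

class MultiTaskAdapterNet(nn.Module):
    def __init__(self, num_tasks: int, channels: int = 64, num_layers: int = 4,
                 bottleneck: int = 8, num_classes: int = 10):
        super().__init__()
        self.stem = nn.Conv1d(1, channels, kernel_size=9, padding=4)
        # Shared encoder layers, reused by every task.
        self.shared = nn.ModuleList(
            nn.Conv1d(channels, channels, kernel_size=3, padding=1)
            for _ in range(num_layers)
        )
        # One adapter per (task, layer): the only task-specific
        # parameters besides the output heads.
        self.adapters = nn.ModuleList(
            nn.ModuleList(Adapter(channels, bottleneck)
                          for _ in range(num_layers))
            for _ in range(num_tasks)
        )
        self.heads = nn.ModuleList(
            nn.Linear(channels, num_classes) for _ in range(num_tasks)
        )

    def forward(self, audio: torch.Tensor, task: int) -> torch.Tensor:
        # audio: (batch, 1, samples) raw waveform
        h = torch.relu(self.stem(audio))
        for layer, adapter in zip(self.shared, self.adapters[task]):
            h = adapter(torch.relu(layer(h)))
        # Global average pooling over time, then the task head.
        return self.heads[task](h.mean(dim=-1))

model = MultiTaskAdapterNet(num_tasks=2)
logits = model(torch.randn(4, 1, 16000), task=0)  # one second at 16 kHz
```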