K for the Price of 1: Parameter-efficient Multi-task and Transfer Learning

Pramod Kaushik Mudrakarta; Mark Sandler; Andrey Zhmoginov; Andrew Howard


Published: 21 Dec 2018, Last Modified: 05 May 2023, ICLR 2019 Conference Blind Submission, Readers: Everyone

Abstract: We introduce a novel method that enables parameter-efficient transfer and multi-task learning with deep neural networks. The basic approach is to learn a model patch, a small set of parameters that specializes to each task, instead of fine-tuning the last layer or the entire network. For instance, we show that learning a set of scales and biases is sufficient to convert a pretrained network to perform well on qualitatively different problems (e.g. converting a Single Shot MultiBox Detector (SSD) model into a 1000-class image classification model while reusing 98% of the parameters of the SSD feature extractor). Similarly, we show that re-learning existing low-parameter layers (such as depthwise convolutions) while keeping the rest of the network frozen also improves transfer-learning accuracy significantly. Our approach supports both simultaneous (multi-task) and sequential transfer learning. In several multi-task learning problems, despite using far fewer parameters than traditional logits-only fine-tuning, we match single-task performance.
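To make the "model patch" idea concrete, here is a minimal sketch in plain Python, assuming a toy network with one frozen shared linear layer followed by a per-channel scale and bias (the role batch-norm gamma/beta play in the abstract). The names `ScaleBiasPatch`, `frozen_linear`, `apply_patch`, and `forward` are hypothetical illustrations, not the authors' code, and training is omitted: it only shows that serving several tasks means swapping a tiny patch while the shared weights stay untouched.

```python
from dataclasses import dataclass
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


@dataclass
class ScaleBiasPatch:
    """Hypothetical per-task model patch: one scale and one bias per channel."""
    scale: Vector
    bias: Vector


def frozen_linear(weights: Matrix, x: Vector) -> Vector:
    # Shared "pretrained" weights: read-only and identical for every task.
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def apply_patch(h: Vector, patch: ScaleBiasPatch) -> Vector:
    # Channel-wise affine transform, playing the role of batch-norm gamma/beta.
    return [s * v + b for v, s, b in zip(h, patch.scale, patch.bias)]


def forward(weights: Matrix, patches: Dict[str, ScaleBiasPatch], task: str, x: Vector) -> Vector:
    # Multi-task inference: same frozen weights, task-specific patch.
    return apply_patch(frozen_linear(weights, x), patches[task])


def patch_size(patch: ScaleBiasPatch) -> int:
    # Number of parameters that would be trained and stored per task.
    return len(patch.scale) + len(patch.bias)


shared = [[1.0, 0.0], [0.0, 1.0]]
patches = {
    "task_a": ScaleBiasPatch(scale=[1.0, 1.0], bias=[0.0, 0.0]),
    "task_b": ScaleBiasPatch(scale=[2.0, 3.0], bias=[1.0, -1.0]),
}
print(forward(shared, patches, "task_a", [1.0, 2.0]))  # [1.0, 2.0]
print(forward(shared, patches, "task_b", [1.0, 2.0]))  # [3.0, 5.0]
```

In a real setting only the patch entries would receive gradient updates; the same structure covers sequential transfer (add a new patch later) and simultaneous multi-task learning (train several patches against one shared backbone).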

Keywords: deep learning, mobile, transfer learning, multi-task learning, computer vision, small models, imagenet, inception, batch normalization

TL;DR: A novel and practically effective method to adapt pretrained neural networks to new tasks by retraining only a small fraction (e.g., less than 2%) of their parameters.
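As a rough intuition for why such patches are small, the arithmetic below counts parameters in one hypothetical depthwise-separable block (depthwise conv, batch norm, 1x1 pointwise conv, batch norm) and reports what share a scale/bias patch, or a scale/bias plus depthwise patch, would retrain. The block layout, the channel count of 256, and the helper names are assumptions chosen for illustration; they are not the architectures or figures reported in the paper.

```python
from typing import Dict, Iterable


def block_param_counts(channels: int, kernel: int = 3) -> Dict[str, int]:
    """Parameter counts for a hypothetical depthwise-separable block:
    depthwise conv -> BN -> pointwise conv -> BN (conv biases folded into BN)."""
    return {
        "depthwise": kernel * kernel * channels,  # one k x k filter per channel
        "pointwise": channels * channels,         # 1x1 conv with C_in == C_out
        "bn_scale_bias": 2 * 2 * channels,        # gamma and beta for each of two BN layers
    }


def patch_fraction(counts: Dict[str, int], patch_keys: Iterable[str]) -> float:
    # Share of the block's parameters that a given model patch retrains.
    total = sum(counts.values())
    return sum(counts[k] for k in patch_keys) / total


counts = block_param_counts(channels=256)
print(counts)
print(f"scales/biases only: {patch_fraction(counts, ['bn_scale_bias']):.2%}")
print(f"+ depthwise:        {patch_fraction(counts, ['bn_scale_bias', 'depthwise']):.2%}")
```

For this toy block the scale/bias patch is about 1.5% of the parameters and adding the depthwise filters brings it to about 4.8%, because the dense 1x1 convolution dominates the count; a full network also has stem and classifier layers that change the exact ratio.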

Data: [COCO](https://paperswithcode.com/dataset/coco), [FGVC-Aircraft](https://paperswithcode.com/dataset/fgvc-aircraft-1), [ImageNet](https://paperswithcode.com/dataset/imagenet), [Oxford 102 Flower](https://paperswithcode.com/dataset/oxford-102-flower), [Places](https://paperswithcode.com/dataset/places), [ssd](https://paperswithcode.com/dataset/openlane-v1)
