Beyond Traditional Transfer Learning: Co-finetuning for Action Localisation

TMLR Paper749 Authors

03 Jan 2023 (modified: 17 Sept 2024). Rejected by TMLR. License: CC BY 4.0.
Abstract: Transfer learning is the predominant paradigm for training deep networks on small target datasets. Models are typically pretrained on large “upstream” datasets for classification, as such labels are easy to collect, and then finetuned on “downstream” tasks such as action localisation, which are smaller due to their finer-grained annotations. In this paper, we question this approach and propose co-finetuning: simultaneously training a single model on multiple “upstream” and “downstream” tasks. We demonstrate that co-finetuning outperforms traditional transfer learning when using the same total amount of data, and show how our approach easily extends to multiple “upstream” datasets to further improve performance. In particular, co-finetuning significantly improves performance on rare classes in our downstream task, as it has a regularising effect and enables the network to learn feature representations that transfer between different datasets. Finally, we observe that by co-finetuning with public video classification datasets, we achieve significant improvements for spatio-temporal action localisation on the challenging AVA and AVA-Kinetics datasets, outperforming recent works that develop intricate models.
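The abstract contrasts pretrain-then-finetune with co-finetuning, where one model is trained on upstream and downstream tasks at the same time. The toy sketch below illustrates that idea under stated assumptions; it is not the authors' implementation. A shared scalar weight stands in for the video backbone and a per-task bias stands in for each task-specific head. At every step one dataset is sampled with probability proportional to its size (an assumed sampling rule), and the loss on that batch updates the shared weight and only the sampled task's head. All names (`ToyCoFinetuneModel`, `co_finetune`, `upstream_cls`, `downstream_loc`) and the synthetic data are hypothetical.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ToyCoFinetuneModel:
    """Hypothetical toy model: one shared 'backbone' weight plus one bias 'head' per task."""
    shared_w: float = 0.0
    heads: dict = field(default_factory=dict)

    def predict(self, task, x):
        # Shared parameter is used by every task; each task adds its own head (created on first use).
        return self.shared_w * x + self.heads.setdefault(task, 0.0)


def sample_task(sizes, rng):
    """Pick a dataset with probability proportional to its size (an assumption, not the paper's rule)."""
    tasks = list(sizes)
    return rng.choices(tasks, weights=[sizes[t] for t in tasks], k=1)[0]


def co_finetune(datasets, steps, lr=0.01, batch_size=8, seed=0):
    """Train one model on all datasets at once, drawing each batch from a single sampled dataset."""
    rng = random.Random(seed)
    model = ToyCoFinetuneModel()
    sizes = {name: len(data) for name, data in datasets.items()}
    counts = {name: 0 for name in datasets}
    for _ in range(steps):
        task = sample_task(sizes, rng)
        counts[task] += 1
        batch = rng.sample(datasets[task], min(batch_size, sizes[task]))
        # Mean squared-error gradients over the batch for the shared weight and this task's head.
        grad_w = grad_b = 0.0
        for x, y in batch:
            err = model.predict(task, x) - y
            grad_w += 2 * err * x / len(batch)
            grad_b += 2 * err / len(batch)
        # The shared weight is updated by every task; only the sampled task's head is updated.
        model.shared_w -= lr * grad_w
        model.heads[task] -= lr * grad_b
    return model, counts


# Synthetic example: both tasks share the slope 2.0 but need different task-specific offsets.
rng_data = random.Random(1)


def make_data(n, offset):
    return [(x, 2.0 * x + offset) for x in (rng_data.uniform(-1, 1) for _ in range(n))]


datasets = {
    "upstream_cls": make_data(1000, 1.0),   # large, hypothetical "upstream" dataset
    "downstream_loc": make_data(100, -1.0),  # small, hypothetical "downstream" dataset
}
model, counts = co_finetune(datasets, steps=5000)
print(round(model.shared_w, 2), counts)
```

Because the small downstream dataset is sampled less often under size-proportional sampling, it still benefits from the shared weight learned mostly from upstream batches; a real system might instead tune per-dataset sampling weights.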
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission:
- Updated Section 2 (Related work) with a discussion of papers on multi-source domain adaptation.
- Updated Section 3.1 (Model) with a clarification of the bounding box parameterisation. Also changed the notation so that ground truth and predictions are distinct.
- Updated Tables 1 and 2 with the total number of training iterations.
- Updated Section 4.2 (Ablation study) with a discussion of how the total training time is not changed.
- Updated the appendix with further analysis of regularisation.
- Updated the appendix with additional analysis of training jointly on upstream datasets.
- Miscellaneous writing updates throughout the paper, as suggested by Reviewers N47V and u1GF.
Assigned Action Editor: ~Yale_Song1
Submission Number: 749