TACO: Vision Models Can Be Efficiently Specialized via Few-Shot Task-Aware Compression

Published: 26 Feb 2025, Last Modified: 26 Feb 2025. Accepted by TMLR. License: CC BY 4.0
Abstract: Recent vision architectures and self-supervised training methods have enabled training computer vision models that are extremely accurate, but come with massive computational costs. In settings such as identifying species in camera traps in the field, users have limited resources, and may fine-tune a pretrained model on (often limited) data from a small set of specific categories of interest. Such users may still wish to make use of highly-accurate large models, but are often constrained by the computational cost. To address this, we ask: can we quickly compress generalist models into accurate and efficient specialists given a small amount of data? Towards this goal, we propose a simple and versatile technique, which we call Few-Shot Task-Aware COmpression (TACO). Given a general-purpose model pretrained on a broad task, such as classification on the ImageNet or iNaturalist datasets with thousands of categories, TACO produces a much smaller model that is accurate on specialized tasks, such as classifying across vehicle types or animal species, based only on a few examples from each target class. The method is based on two key insights: (1) a powerful specialization effect for data-aware compression, which we showcase for the first time; (2) a dedicated finetuning procedure with knowledge distillation, which prevents overfitting even in scenarios where data is very scarce. Specifically, TACO is applied in few-shot fashion, i.e., only a few task-specific samples are used for compression, and the procedure has low computational overhead. We validate this approach experimentally using highly-accurate ResNet, ViT/DeiT, and ConvNeXt models, originally trained on the ImageNet and iNaturalist datasets, which we specialize and compress to a diverse set of "downstream" subtasks, with notable computational speedups on both CPU and GPU.
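The knowledge-distillation finetuning mentioned in the abstract can be illustrated with a standard soft-target distillation objective: the compressed student is trained against a blend of the hard few-shot labels and the uncompressed teacher's temperature-softened outputs. This is a minimal sketch of that generic loss, not the paper's exact procedure; the `alpha` and `temperature` values are illustrative defaults, not values from the paper.

```python
import math

def softmax(logits, temperature=1.0):
    # Numerically stable softmax with optional temperature scaling.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, label,
                      alpha=0.5, temperature=2.0):
    """Hedged sketch of a soft-target KD objective.

    Mixes hard-label cross-entropy on the few task-specific samples with a
    KL term pulling the student toward the teacher's soft predictions.
    `alpha` and `temperature` are illustrative hyperparameters.
    """
    # Hard-label cross-entropy on the student's own predictions.
    p_student = softmax(student_logits)
    ce = -math.log(p_student[label])
    # Temperature-scaled KL divergence from teacher to student.
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s))
    # The temperature**2 factor keeps gradient magnitudes comparable
    # across temperatures, as in standard distillation practice.
    return (1 - alpha) * ce + alpha * temperature ** 2 * kl
```

When the student matches the teacher exactly, the KL term vanishes and the loss reduces to the weighted cross-entropy alone; in the data-scarce regime the abstract describes, the teacher's soft targets act as a regularizer against overfitting the few labeled examples.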
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission:
* Added quantitative assessment (FID score) of image generation quality (**Reviewer rVaw**) for dense and compressed diffusion generative models.
* Added ablation (**Appendix H**) with respect to randomness to estimate the statistical significance of the paper claims and variance in the experiments (**Reviewer rnwv**). Overall, the obtained results suggest that the conclusions made are robust to random seeds and that the advantage of TACO over PTC is statistically significant.
* Restructured some parts of the paper and figure placement according to the suggestions of **Reviewers rVaw and rnwv**. References have been fixed.
Supplementary Material: zip
Assigned Action Editor: ~Li_Dong1
Submission Number: 3256
