Keywords: Transfer Learning, Domain Generalization, Multimodal, Attention Mechanism, Computational Efficiency, Memory Efficiency
TL;DR: Layer-wise Feature Aggregation (LFA) boosts large-scale pre-trained models by leveraging their hierarchical features, improving performance under domain shift and in few-shot learning. It is efficient, requiring optimization only on top of the model, without backpropagation through the full model.
Abstract: Large-scale pre-trained models that extract generalized features from massive data have become a key component of various downstream tasks. However, achieving substantial performance improvements while remaining efficient is challenging. Many methods, such as prompt tuning and adapters, have been proposed, but both their efficiency and their performance gains are limited. We propose a novel approach called Layer-wise Feature Aggregation (LFA), which utilizes features from all layers of a pre-trained model with an attention mechanism. Our focus is on exploiting existing low-level features rather than generating new ones. First, LFA captures hierarchical features from low-level to high-level, enabling the extraction of richer and more general representations; as a result, it significantly improves performance under domain shift and in few-shot learning. Second, LFA requires optimization only on top of the large pre-trained model, so it does not require back-propagation through the model, making training efficient. LFA is thus a new transfer learning approach that improves both performance and efficiency. Our method is implemented at: https://github.com/MLAI-Yonsei/LFA
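The sketch below illustrates the general idea described in the abstract: a small attention module aggregates features taken from every layer of a frozen pre-trained backbone, and only this head is trained, so no gradients flow through the backbone. All names here (`LFAHead`, `num_layers`, `feat_dim`, the single-query attention form) are hypothetical illustrations and are not taken from the paper or the linked repository.

```python
# Minimal sketch of attention-based layer-wise feature aggregation, assuming a frozen
# backbone that exposes per-layer features (e.g. via output_hidden_states in common
# transformer libraries). This is an assumption-laden illustration, not the authors' code.
import torch
import torch.nn as nn


class LFAHead(nn.Module):
    """Aggregates features from all backbone layers with learned attention weights."""

    def __init__(self, num_layers: int, feat_dim: int, num_classes: int):
        super().__init__()
        # A single learnable query attends over the stack of layer-wise features.
        self.query = nn.Parameter(torch.randn(feat_dim))
        self.proj = nn.Linear(feat_dim, feat_dim)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, layer_feats: torch.Tensor) -> torch.Tensor:
        # layer_feats: (batch, num_layers, feat_dim), extracted under torch.no_grad()
        keys = self.proj(layer_feats)                           # (B, L, D)
        scores = keys @ self.query / keys.shape[-1] ** 0.5      # (B, L)
        weights = scores.softmax(dim=-1).unsqueeze(-1)          # (B, L, 1)
        aggregated = (weights * layer_feats).sum(dim=1)         # (B, D)
        return self.classifier(aggregated)


# Usage: features are precomputed (or extracted with torch.no_grad()) from the frozen
# backbone, so only the head above is optimized; no backpropagation through the model.
head = LFAHead(num_layers=12, feat_dim=768, num_classes=10)
layer_feats = torch.randn(4, 12, 768)   # placeholder for frozen backbone outputs
logits = head(layer_feats)
```

Because the backbone is frozen and its layer-wise features can be cached once, training reduces to optimizing this lightweight head, which is what makes the approach efficient in both computation and memory.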
Submission Number: 84