Keywords: Visual Adaptation, Medical Representation Learning
TL;DR: In this work, we introduce the DynaMer Adapter, a novel architecture designed to dynamically merge tokens from general and medical pre-trained models, enhancing the adaptability of ViTs for medical imaging tasks.
Abstract: In the realm of medical image analysis, the transferability of pre-trained Vision Transformers (ViTs) to specialized medical tasks remains a significant challenge. Previous approaches focus on adapting a single model by introducing specialized learnable layers into the pre-trained model. However, a single model optimized for general tasks underperforms in domain-specific applications, while a purely medical model, limited by its fundamentally weaker capabilities, is not robust enough for real-world adaptation. To address this, we introduce the DynaMer Adapter, a novel architecture designed to dynamically merge tokens from general and medical pre-trained models, enhancing the adaptability of ViTs for medical imaging tasks. DynaMer incorporates a Gated Mixture-of-Experts (MoE) Adapter, ensuring that the model prioritizes the features most relevant to a given medical task. Additionally, we incorporate a layer-wise skipping router within the architecture, designed to adjust the number of input tokens efficiently, thereby optimizing inference time without compromising model accuracy. Extensive evaluations on the Medical Visual Task Adaptation Benchmark (Med-VTAB) demonstrate that DynaMer achieves state-of-the-art performance, particularly excelling in patient out-of-distribution settings and tasks with only a few samples.
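To make the core idea concrete, the sketch below shows one way a gated mixture could merge per-token features from a general-purpose and a medical pre-trained backbone. This is a minimal illustration of the gating concept only; the class name, shapes, and the choice to gate from the general stream are assumptions, not the paper's actual DynaMer implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class GatedTokenMerger:
    """Illustrative per-token gate over two token streams (general vs. medical).

    Hypothetical sketch: gating weights are learned in practice; here they are
    randomly initialized to show the forward computation.
    """

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        # projects each token to 2 logits, one per expert stream (assumed design)
        self.w_gate = rng.standard_normal((dim, 2)) * 0.02

    def __call__(self, general_tokens, medical_tokens):
        # gates: (n_tokens, 2), rows sum to 1
        gates = softmax(general_tokens @ self.w_gate, axis=-1)
        # convex combination of the two streams, computed per token
        return gates[:, :1] * general_tokens + gates[:, 1:] * medical_tokens
```

Because the gate weights form a convex combination, each merged token lies elementwise between the two source tokens, so neither backbone's features can be amplified beyond their original scale.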
Primary Area: transfer learning, meta learning, and lifelong learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3144