MoE Meets Reparameterization: Reparameterizable Mixture-of-Experts Model Enhances Dermatology Diagnosis via Dense-to-Experts Distillation

16 Sept 2025 (modified: 30 Dec 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Medical image analysis, Mixture-of-experts, Re-parameterization, Distillation
Abstract: Reliable automated dermatological assessment hinges on algorithms capable of operating consistently amid the substantial heterogeneity inherent in clinical imaging data. A robust foundation model with the representational flexibility to capture such complex variability is therefore crucial. Here, we introduce SkinMoE, the first Mixture-of-Experts (MoE)–based foundation model specifically designed for dermatology image analysis. At the core of SkinMoE is Dense-to-Experts Distillation (DED), a novel knowledge distillation strategy that transfers rich representations from a pretrained dense vision transformer, PanDerm, to a set of expert networks. A key innovation is the Mergeable-MoE Block, which enables joint training of the experts and their reparameterization into a single 1×1 convolution at inference time, preserving the computational efficiency of standard feed-forward networks. Unlike prior MoE approaches that use sparse top-k routing, SkinMoE employs a soft weighting mechanism, allowing all experts to contribute to predictions. This design enhances model expressiveness while introducing only a negligible increase in inference computation, regardless of the number of experts. On the DermNet dataset, SkinMoE achieves up to a 2.5% improvement in Weighted F1 over PanDerm. Ablation studies confirm the contribution of each component. Code and pretrained models will be released.
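To illustrate how a Mergeable-MoE Block of the kind described above could collapse into a single 1×1 convolution, the sketch below gives a minimal PyTorch implementation. The class name `MergeableMoEBlock` and the assumption that the soft expert weights are input-independent learnable parameters (rather than produced by a per-sample router) are ours, not taken from the paper; under that assumption the softly weighted sum of linear 1×1-conv experts is itself a 1×1 convolution, which is what makes the merge exact.

```python
import torch
import torch.nn as nn


class MergeableMoEBlock(nn.Module):
    """Sketch: several 1x1-conv experts combined by soft (dense) weights.

    Because every expert is linear and the weights here are input-independent,
    the whole block can be re-parameterized into one 1x1 convolution.
    """

    def __init__(self, channels: int, num_experts: int = 4):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=1) for _ in range(num_experts)
        )
        # Learnable logits -> soft weights over all experts (no top-k sparsity).
        self.gate_logits = nn.Parameter(torch.zeros(num_experts))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Training-time path: every expert contributes, scaled by its soft weight.
        w = torch.softmax(self.gate_logits, dim=0)
        return sum(w[i] * expert(x) for i, expert in enumerate(self.experts))

    @torch.no_grad()
    def merge(self) -> nn.Conv2d:
        """Fold all experts into a single 1x1 convolution for inference."""
        w = torch.softmax(self.gate_logits, dim=0)
        merged = nn.Conv2d(
            self.experts[0].in_channels, self.experts[0].out_channels, kernel_size=1
        )
        merged.weight.copy_(sum(w[i] * e.weight for i, e in enumerate(self.experts)))
        merged.bias.copy_(sum(w[i] * e.bias for i, e in enumerate(self.experts)))
        return merged


# Usage: the multi-expert block and the merged conv give the same output,
# so inference cost is independent of the number of experts.
block = MergeableMoEBlock(channels=64, num_experts=4).eval()
x = torch.randn(2, 64, 14, 14)
assert torch.allclose(block(x), block.merge()(x), atol=1e-5)
```

If the gating weights were instead input-dependent, the merge would no longer be a single fixed convolution; the sketch therefore only captures the static-weight case consistent with the abstract's claim of collapsing to one 1×1 convolution.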
Supplementary Material: pdf
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 7499