Abstract: Highlights
• We find that vanilla MWA performs well for class-imbalanced tasks and that early-epoch averaging yields greater gains, inspiring the design of IMWA.
• IMWA iteratively conducts parallel training and weight averaging, and its integration with EMA shows their complementary benefits.
• Extensive experiments demonstrate that IMWA outperforms vanilla MWA and effectively boosts performance for class-imbalanced learning.
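The loop of parallel training followed by weight averaging can be illustrated with a minimal sketch. It is not the authors' implementation. The names `train_steps`, `average_weights`, and `imwa_sketch` are hypothetical, as are the toy quadratic loss, the episode and replica counts, and the choice to update the EMA once per episode. The highlights do not specify any of these details.

```python
import random


def average_weights(models):
    """Element-wise mean of several weight vectors (plain lists of floats)."""
    n = len(models)
    return [sum(ws) / n for ws in zip(*models)]


def train_steps(weights, steps, lr, rng):
    """Hypothetical stand-in for real training: noisy gradient descent
    on the toy loss sum((w - 1)^2), whose minimum is at w = 1."""
    w = list(weights)
    for _ in range(steps):
        w = [wi - lr * (2.0 * (wi - 1.0) + rng.gauss(0.0, 0.1)) for wi in w]
    return w


def imwa_sketch(init, episodes=4, num_models=3, steps_per_episode=25,
                lr=0.05, ema_decay=0.9, seed=0):
    """Assumed structure: in each episode, several replicas start from the
    same weights, train independently, and are averaged into one model
    that seeds the next episode."""
    shared = list(init)
    ema = list(init)  # assumption: EMA tracks the averaged model per episode
    for episode in range(episodes):
        replicas = []
        for k in range(num_models):
            # A different seed per replica stands in for different data order.
            rng = random.Random(seed + episode * num_models + k)
            replicas.append(train_steps(shared, steps_per_episode, lr, rng))
        shared = average_weights(replicas)
        ema = [ema_decay * e + (1.0 - ema_decay) * s
               for e, s in zip(ema, shared)]
    return shared, ema


if __name__ == "__main__":
    final_weights, ema_weights = imwa_sketch([0.0, 0.0])
    print(final_weights, ema_weights)
```

With `episodes=1`, the sketch reduces to a single averaging of independently trained replicas, which is one way to picture vanilla MWA under the same assumptions.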