Highlights

• MoCE: a dual-level mixture-of-experts (MoE) with deconfounding abilities for model generalization.
• Micro level: FFN-MoE causal expert enhancement; macro level: scalable deconfounding integration.
• Evaluated on VQA2.0, e-SNLI-VE, NLVR2, and ECtHR, MoCE surpasses widely used baselines.
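The highlights do not spell out the architecture, so the following is only a minimal PyTorch sketch of a generic FFN-based mixture-of-experts layer, the kind of structure the micro-level highlight ("FFN-MoE causal expert enhancement") refers to: the transformer FFN is replaced by several FFN experts combined by a learned router. All class names, dimensions, and the dense soft-routing choice are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FFNExpert(nn.Module):
    """One standard transformer feed-forward block, used as a single expert."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden),
            nn.GELU(),
            nn.Linear(d_hidden, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class FFNMoELayer(nn.Module):
    """Replaces the FFN with several experts; a router mixes their outputs per token.

    Hypothetical illustration: uses dense soft routing for simplicity rather than
    top-k sparse routing.
    """

    def __init__(self, d_model: int = 768, d_hidden: int = 3072, num_experts: int = 4):
        super().__init__()
        self.experts = nn.ModuleList(
            FFNExpert(d_model, d_hidden) for _ in range(num_experts)
        )
        self.router = nn.Linear(d_model, num_experts)  # per-token expert logits

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        gate = F.softmax(self.router(x), dim=-1)  # (batch, seq_len, num_experts)
        # Run every expert and stack: (batch, seq_len, d_model, num_experts)
        expert_out = torch.stack([expert(x) for expert in self.experts], dim=-1)
        # Weight each expert's output by its gate value and sum over experts.
        return torch.einsum("bse,bsde->bsd", gate, expert_out)


if __name__ == "__main__":
    layer = FFNMoELayer()
    tokens = torch.randn(2, 16, 768)
    print(layer(tokens).shape)  # torch.Size([2, 16, 768])
```

In this sketch each token receives a convex combination of all expert outputs; a sparse (top-k) router would instead zero out all but the highest-scoring experts, which is the usual choice when scaling the number of experts.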