Keywords: Federated learning, Causal inference, CLIP
TL;DR: We propose CauFed-CLIP, a novel Causal-based Federated Contrastive Language-Image Pre-training model.
Abstract: Although vision-language models (VLMs) have achieved remarkable success, applying them directly in federated learning (FL) faces two key challenges: high communication and computation costs, and poor generalization caused by client data heterogeneity. To tackle these, we propose CauFed-CLIP, a novel Causal-based Federated Contrastive Language-Image Pre-training model. Our model reduces overhead by freezing the VLM backbone and training only a lightweight causal module on each client. To enhance generalization, it employs a progressive causal mechanism. It first disentangles the observed features (x) into domain-invariant (s) and domain-variant (z) representations, using global and local guidance to suppress spurious correlations between them. From this disentangled foundation, it then infers the underlying causal "concept" (c), a quasi-invariant latent variable that captures the essence of a category and holds only a weak causal link with the domain variable (z). Ultimately, relying solely on this pure concept c for prediction allows the model to move beyond superficial statistics and capture the core causal structure of the task.
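The abstract's pipeline (frozen backbone → disentangle x into s and z → infer concept c → predict from c alone) can be sketched as a minimal forward pass. This is an illustrative NumPy sketch, not the authors' implementation: all dimensions, weight names (`W_s`, `W_z`, `W_c`, `W_y`), and the use of simple linear-plus-tanh projections are assumptions; the paper's actual module, losses, and global/local guidance terms are not shown here.

```python
import numpy as np

rng = np.random.default_rng(0)
d, d_s, d_z, d_c, n_classes = 16, 8, 8, 4, 3

# Stand-in for frozen VLM backbone features x (in the paper, CLIP features;
# here just random vectors, since the backbone itself is frozen and not trained).
x = rng.normal(size=(5, d))

# Lightweight causal module: the only client-side trainable parameters.
W_s = rng.normal(size=(d, d_s)) * 0.1    # projects x -> domain-invariant s
W_z = rng.normal(size=(d, d_z)) * 0.1    # projects x -> domain-variant z
W_c = rng.normal(size=(d_s, d_c)) * 0.1  # infers concept c from s only
W_y = rng.normal(size=(d_c, n_classes)) * 0.1  # classifies from c only

def forward(x):
    s = np.tanh(x @ W_s)  # domain-invariant representation
    z = np.tanh(x @ W_z)  # domain-variant representation (kept out of prediction)
    c = np.tanh(s @ W_c)  # quasi-invariant causal concept
    logits = c @ W_y      # prediction relies solely on the pure concept c
    return s, z, c, logits

s, z, c, logits = forward(x)
```

Note the key structural choice the abstract describes: `z` is computed (so spurious correlations with `s` can be penalized during training) but never feeds the classifier, which sees only `c`.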
Supplementary Material: zip
Primary Area: causal reasoning
Submission Number: 6785