Keywords: decentralized federated learning, multimodal, partial alignment, partial information decomposition, feature fission
TL;DR: PARSE enables server-free multimodal DFL via PID-guided feature fission and partial alignment of shareable knowledge, improving accuracy and stability across heterogeneous client mixes.
Abstract: Multimodal decentralized federated learning (DFL) is challenging because agents differ in their available modalities and model architectures, yet must collaborate over peer-to-peer (P2P) networks without a central coordinator. Standard multimodal pipelines learn a single shared embedding across all modalities. In DFL, such a monolithic representation induces gradient misalignment between uni- and multimodal agents, which suppresses heterogeneous sharing and cross-modal interaction. We present PARSE, a multimodal DFL framework that *operationalizes* partial information decomposition (PID) in a server-free setting. Each agent performs *feature fission* to factorize its latent representation into *redundant*, *unique*, and *synergistic* slices. Slice-level *partial alignment* then enables P2P knowledge sharing among heterogeneous agents: only semantically shareable branches are exchanged, and only among agents that possess the corresponding modality. Without requiring central coordination or gradient surgery, PARSE resolves uni-/multimodal gradient conflicts, thereby overcoming this multimodal DFL dilemma while remaining compatible with standard DFL constraints. Across benchmarks and agent mixes, PARSE yields consistent gains over task-, modality-, and hybrid-sharing DFL baselines. Ablations on fusion operators and split ratios, together with qualitative visualizations, further demonstrate the efficiency and robustness of the proposed design.
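To make feature fission and slice-level partial alignment concrete, here is a minimal NumPy sketch under stated assumptions; it is not the authors' implementation. Assumed for illustration: fission is modeled as slicing the latent vector by split ratios; each shareable branch is identified by a hypothetical key combining slice type and modality set (e.g. `unique/image`); and partial alignment is modeled as one synchronous gossip round of plain parameter averaging restricted to neighbors that hold the same key. The names `feature_fission`, `partial_align`, and the branch keys are hypothetical.

```python
from typing import Dict, List, Tuple

import numpy as np

# Hypothetical representation: each agent holds named branch weights, keyed by
# slice type and the modalities the branch depends on, e.g. "unique/image".
Branches = Dict[str, np.ndarray]


def feature_fission(z: np.ndarray, ratios: Tuple[float, float, float]) -> Dict[str, np.ndarray]:
    """Split the last axis of latent z into redundant / unique / synergistic slices.

    ratios = (redundant, unique, synergistic) fractions of the feature width;
    the synergistic slice takes whatever width remains after rounding.
    (Assumption: fission as a fixed-ratio split, not a learned PID objective.)
    """
    if not np.isclose(sum(ratios), 1.0):
        raise ValueError("split ratios must sum to 1")
    d = z.shape[-1]
    i = int(round(ratios[0] * d))
    j = i + int(round(ratios[1] * d))
    return {
        "redundant": z[..., :i],
        "unique": z[..., i:j],
        "synergistic": z[..., j:],
    }


def partial_align(params: Dict[str, Branches], neighbors: Dict[str, List[str]]) -> Dict[str, Branches]:
    """One synchronous gossip round of branch-wise averaging.

    Each branch is averaged only with neighbors holding the same branch key, so
    an image-only agent never mixes in text-specific weights. All averages are
    computed from the pre-round parameters.
    """
    updated: Dict[str, Branches] = {}
    for agent, branches in params.items():
        updated[agent] = {}
        for key, w in branches.items():
            # Only peers that own this exact branch participate in its average.
            peers = [p for p in neighbors.get(agent, []) if key in params[p]]
            stacked = [w] + [params[p][key] for p in peers]
            updated[agent][key] = np.mean(stacked, axis=0)
    return updated


# Usage: an image-only agent A, a text-only agent B, and an image+text agent C.
params = {
    "A": {"unique/image": np.array([0.0, 0.0])},
    "B": {"unique/text": np.array([2.0, 2.0])},
    "C": {
        "unique/image": np.array([2.0, 2.0]),
        "unique/text": np.array([4.0, 4.0]),
        "synergistic/image+text": np.array([1.0, 1.0]),
    },
}
neighbors = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
out = partial_align(params, neighbors)
# A and C both end with unique/image == [1, 1]; B and C both end with
# unique/text == [3, 3]; C's synergistic branch has no peer and stays [1, 1].
```

Restricting each average to peers that hold the same key is what keeps a unimodal agent from absorbing weights tied to a modality it lacks. The PID-guided training objectives and fusion operators used in PARSE are not modeled in this sketch.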
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 1716