SecFFT: Safeguarding Federated Fine-Tuning for Large Vision Language Models Against Covert Backdoor Attacks in IoRT Networks
Abstract: As large vision language models (LVLMs) and embodied intelligent robotic networks continue to advance at a remarkable pace, particularly in applications spanning smart cities, power grids, factories, and transportation, visual perception and understanding have become foundational to overcoming the performance limitations of such intelligent systems. However, because general pretrained models are not well-suited to specific tasks, federated fine-tuning (FFT) has gained attention as a promising technique for enhancing the performance of vision-based perception models by leveraging data and computational power distributed across nodes. The rise of advanced persistent threats has revealed significant vulnerabilities in existing defense mechanisms, which struggle to mitigate sophisticated backdoor attacks against FFT for LVLMs. To address these challenges, this article proposes SecFFT, a method that tackles both the stealthiness and the complexity of backdoor strategies. The approach incorporates instantaneous attack behavior detection based on frequency-domain distribution consistency and introduces a long-term secure aggregation mechanism aimed at identifying hidden attack intentions. Together, these strategies limit adversaries' ability to bypass defense measures by concealing their behaviors. Experiments on public datasets demonstrate that SecFFT significantly improves defense success rates, model performance, and detection accuracy, particularly against highly covert, multiround backdoor attacks.
External IDs: dblp:journals/iotj/ZhouXWLHYY25