Abstract: In practical object detection scenarios, distributed data and stringent privacy protections significantly limit the feasibility of traditional centralized training. Federated learning (FL) is a promising solution to this dilemma. However, data heterogeneity poses distinct challenges for federated object detection, degrading the global model's object perception, classification, and localization abilities. In response, we introduce a task-driven federated learning method, Adaptive Hierarchical Aggregation (FedAHA), designed to overcome these obstacles. Our algorithm proceeds in two phases, from shallow to deep layers: (1) Structure-aware Aggregation (SAA) aligns feature extractors during the aggregation phase, strengthening the global model's object perception; (2) Convex Semantic Calibration (CSC) leverages convex function theory to average semantic features instead of model parameters, improving the global model's classification and localization precision. We demonstrate the effectiveness of the two proposed modules experimentally and theoretically. Our method consistently outperforms state-of-the-art methods across multiple practical application scenarios. Moreover, we build a real FL system using Raspberry Pis to demonstrate that our approach achieves a good trade-off between performance and efficiency.
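The abstract does not spell out how CSC works, so the following is only a minimal sketch of the general convexity property it appeals to, not the authors' implementation. It assumes the "semantic features" are per-client logit vectors for the same sample and that the loss is softmax cross-entropy, which is convex in the logits. Under those assumptions, Jensen's inequality guarantees that the loss of the averaged features is no larger than the average of the per-client losses. The function names and the toy data are hypothetical.

```python
import numpy as np


def softmax_cross_entropy(logits, label):
    """Cross-entropy of one logit vector against an integer class label."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return float(-log_probs[label])


def jensen_comparison(client_logits, label):
    """Return (loss of averaged logits, average of per-client losses).

    client_logits: array of shape (num_clients, num_classes); these are
    hypothetical stand-ins for per-client semantic features.
    """
    loss_of_average = softmax_cross_entropy(client_logits.mean(axis=0), label)
    average_of_losses = float(
        np.mean([softmax_cross_entropy(z, label) for z in client_logits])
    )
    return loss_of_average, average_of_losses


# Toy data: 5 clients, 4 classes, heterogeneous (random) logits.
rng = np.random.default_rng(0)
toy_logits = rng.normal(size=(5, 4))
loss_avg_feat, avg_loss = jensen_comparison(toy_logits, label=2)
print(f"loss of averaged features: {loss_avg_feat:.4f}")
print(f"average of client losses:  {avg_loss:.4f}")
```

The same guarantee does not hold for averaging model parameters, because a detection loss is generally non-convex in the network weights; this is one plausible reading of why the abstract contrasts averaging features with averaging parameters.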
Primary Subject Area: [Systems] Systems and Middleware
Secondary Subject Area: [Content] Media Interpretation
Relevance To Conference: Our work advances federated learning algorithms for object detection, with a particular focus on data heterogeneity. This research contributes to multimedia processing in three ways: 1) Enhancing the intelligent processing of multimedia content. Federated learning allows collaborative model training across multiple clients using distributed data sources. For multimedia applications such as video surveillance and medical imaging, our algorithm can therefore fully leverage various data sources even when their distributions are uneven. 2) Protecting privacy and data security. Federated learning does not require sharing raw data, which is particularly important for sensitive multimedia content. This enables multimedia processing in areas such as healthcare, finance, and personal media to proceed while safeguarding privacy. 3) Improving the generalization capability of algorithms. Mitigating data heterogeneity allows the learned models to generalize better to different data distributions, which is crucial for handling multimedia content from around the world and across various devices.
Submission Number: 2812