AxiomVision: Accuracy-Guaranteed Adaptive Visual Model Selection for Perspective-Aware Video Analytics

Published: 20 Jul 2024, Last Modified: 31 Jul 2024, MM2024 Poster, CC BY 4.0
Abstract: The rapid evolution of multimedia and computer vision technologies calls for adaptive strategies for deploying visual models that can handle diverse tasks and changing environments. This work introduces AxiomVision, a novel framework that leverages edge computing to dynamically select the most efficient visual model for video analytics in each scenario while guaranteeing accuracy. Using a tiered edge-cloud architecture, AxiomVision can deploy a broad spectrum of visual models, from lightweight to complex DNNs, tailored to specific scenarios and to the influence of each camera source. AxiomVision makes three core contributions: (1) a dynamic visual model selection mechanism based on continual online learning, (2) an efficient online method that accounts for the influence of the camera's perspective, and (3) a topology-driven grouping approach that accelerates model selection. Backed by rigorous theoretical guarantees, these advances offer a scalable and effective solution for visual tasks common in multimedia systems, such as object detection, classification, and counting. Empirically, AxiomVision achieves a 25.7% improvement in accuracy.
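The abstract names continual online learning for model selection but does not give the selection rule. As a rough illustration only, online model selection can be framed as a multi-armed bandit: each candidate visual model is an arm and an observed per-frame accuracy signal is the reward. The sketch below uses the standard UCB1 rule, a generic swapped-in technique rather than AxiomVision's own algorithm; the class name, model names, and simulated accuracies are all hypothetical.

```python
import math
import random
from typing import Dict, List


class UCB1Selector:
    """Generic UCB1 bandit over candidate visual models (hypothetical sketch).

    Each arm is a model name; rewards are assumed to lie in [0, 1],
    e.g. whether a detection on the current frame was judged correct.
    """

    def __init__(self, models: List[str]) -> None:
        self.models = list(models)
        self.counts: Dict[str, int] = {m: 0 for m in self.models}
        self.means: Dict[str, float] = {m: 0.0 for m in self.models}
        self.total = 0

    def select(self) -> str:
        # Try every model once before relying on confidence bounds.
        for m in self.models:
            if self.counts[m] == 0:
                return m
        # Choose the model with the highest upper confidence bound.
        return max(
            self.models,
            key=lambda m: self.means[m]
            + math.sqrt(2.0 * math.log(self.total) / self.counts[m]),
        )

    def update(self, model: str, reward: float) -> None:
        # Incremental update of the chosen model's empirical mean reward.
        self.counts[model] += 1
        self.total += 1
        self.means[model] += (reward - self.means[model]) / self.counts[model]


def simulate(rounds: int, seed: int = 0) -> UCB1Selector:
    # Hypothetical true accuracies of three models under the current scene.
    true_acc = {"tiny": 0.50, "medium": 0.65, "large": 0.85}
    rng = random.Random(seed)
    sel = UCB1Selector(list(true_acc))
    for _ in range(rounds):
        m = sel.select()
        # Bernoulli reward: 1.0 if the simulated inference was correct.
        sel.update(m, 1.0 if rng.random() < true_acc[m] else 0.0)
    return sel


if __name__ == "__main__":
    selector = simulate(3000)
    print(selector.counts)
```

Over many rounds the selector concentrates its choices on the most accurate model while still occasionally re-checking the others, which is the exploration-exploitation trade-off any continual model selector has to manage. A real system would also weigh inference cost, not accuracy alone.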
Primary Subject Area: [Systems] Transport and Delivery
Secondary Subject Area: [Experience] Multimedia Applications
Relevance To Conference: This paper introduces AxiomVision, a multimedia and multimodal processing framework aimed at applications such as smart traffic monitoring under varying environmental conditions. It integrates and deploys visual models across an edge-cloud infrastructure to enable precise and efficient analysis of multimedia content. Unlike static, unimodal systems, AxiomVision adapts to changing environments and visual requirements by dynamically selecting models, using edge computing for rapid response and cloud computing for managing complex models, which reflects the principles of multimedia and multimodal processing. Its contributions include a dynamic model selection mechanism that continually adjusts to shifting requirements, a layered edge-cloud architecture that strategically deploys visual models, a perspective-aware learning algorithm that accounts for camera viewpoints to improve analysis accuracy, and a topology-informed grouping strategy that speeds up model selection across camera networks. AxiomVision is supported by theoretical guarantees and by empirical evidence in dynamic settings.
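The topology-informed grouping strategy is only named here, not specified. One plausible reading, offered as an assumption rather than the paper's method, is to treat cameras as nodes of a graph whose edges link cameras that are adjacent in the road network, then group connected cameras so they can share model-selection statistics. The sketch below computes such groups as connected components with union-find; the function name, camera IDs, and links are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def group_cameras(
    cameras: Iterable[str], links: Iterable[Tuple[str, str]]
) -> List[List[str]]:
    """Group cameras into connected components of a hypothetical topology graph.

    Every camera named in `links` must also appear in `cameras`.
    """
    parent: Dict[str, str] = {c: c for c in cameras}

    def find(x: str) -> str:
        # Walk to the root, halving the path as we go to keep trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the components of every linked camera pair.
    for a, b in links:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Collect cameras by their component root.
    groups: Dict[str, List[str]] = {}
    for c in parent:
        groups.setdefault(find(c), []).append(c)
    # Sort members and groups so the output is deterministic.
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    cams = ["cam1", "cam2", "cam3", "cam4", "cam5"]
    links = [("cam1", "cam2"), ("cam2", "cam3"), ("cam4", "cam5")]
    print(group_cameras(cams, links))
```

With the example links, cam1 to cam3 form one group and cam4 and cam5 form another. Pooling observations within a group lets a newly added or rarely observed camera benefit from its neighbours' history, which is one way grouping could shorten the exploration phase of model selection.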
Supplementary Material: zip
Submission Number: 3186