Abstract: Recent advances in Visual Language Models (VLMs) have significantly enhanced video analytics. VLMs capture complex visual and textual connections. While Convolutional Neural Networks (CNNs) excel in spatial pattern recognition, VLMs provide a global context, making them ideal for tasks like complex incidents and anomaly detection. However, VLMs are much more computationally intensive, posing challenges for large-scale and real-time applications. This paper introduces EdgeCloudAI, a scalable system integrating VLMs and CNNs through edge-cloud computing. Edge-CloudAI performs initial video processing (e.g., CNN) on edge devices and offloads deeper analysis (e.g., VLM) to the cloud, optimizing resource use and reducing latency. We have deployed EdgeCloudAI on the NSF COSMOS testbed in NYC. In this demo, we will demonstrate EdgeCloudAI's performance in detecting user-defined incidents in real-time.
Loading