UrbanWasteNet: A Hierarchical Multimodal Framework for Automated Street-Level Waste Detection Using Vision and Language Models

Published: 27 Jan 2026, Last Modified: 27 Jan 2026, AAAI 2026 AI4ES Poster, CC BY 4.0
Keywords: Waste dumpsite detection, Computer vision, Large language models, Street-view imagery, Multimodal learning, Spatial analysis, Smart city, Municipal management
TL;DR: A multimodal LLM-based dual-phase system that fuses visual and spatial features for automated urban waste detection, outperforming traditional single-modal approaches.
Abstract: Improperly managed waste dumpsites pose severe environmental threats, yet their random distribution makes systematic detection challenging. We present UrbanWasteNet, a hierarchical multimodal framework integrating computer vision and large language models for automated street-level waste detection. Our three-phase architecture combines EfficientNet-based visual processing with spatial metadata analysis, followed by GPT-4o semantic verification. Experimental results on our UrbanDumpSight dataset demonstrate 96.6% detection accuracy and 86.2% classification accuracy across waste types. The dual-phase design significantly reduces false positives while enabling scalable processing of urban imagery, providing municipal authorities with an effective automated solution for waste management and environmental monitoring.
Submission Number: 27