M3LUC: Multi-modal Model for Urban Land-Use Classification

Published: 01 Jan 2024, Last Modified: 14 May 2025, Venue: SIGSPATIAL/GIS 2024, License: CC BY-SA 4.0
Abstract: Identifying urban land-use types is crucial for effective resource management, urban planning, and sustainable development. However, land-use classification is difficult because cities are functionally complex and data are scarce in undeveloped areas. In this work, we present the Multi-modal Model for Land-use Classification (M3LUC). Our model is the first to leverage an advanced Vision-Language Model (VLM) to better capture urban functionality from remote sensing data and Points of Interest (POI). We also design dedicated mechanisms that robustly handle missing and conflicting modalities, which enhances transferability. Experiments conducted in four major cities in China demonstrate our model's superior performance in both transfer and non-transfer tasks, indicating its potential for broader applications.