UrbanCLIP: Learning Text-enhanced Urban Region Profiling with Contrastive Language-Image Pretraining from the Web

Published: 23 Jan 2024, Last Modified: 23 May 2024, TheWebConf24 Oral
Keywords: urban computing, multimodal learning, satellite imagery, web-sourced data, contrastive learning
Abstract: Accurately profiling urban regions in terms of social, economic, and environmental indicators is crucial for urban planning and sustainable development. Ubiquitous urban imagery from the Web, particularly satellite images, has become a key source for inferring urban indicators. Recent studies in urban imagery representation learning have two main limitations: 1) most methods are supervised by a specific urban downstream task and struggle to generalize across diverse urban prediction tasks, which limits their robustness and effectiveness; 2) prevalent unsupervised learning approaches for satellite imagery use only spatial attributes (e.g., Point-of-Interest and mobility) as supplemental information, while the textual modality, which is semantically rich and explainable, is rarely leveraged for comprehensive urban region profiling. To address these limitations, this paper introduces a simple yet effective learning framework, Text-enhanced Urban Region Profiling with Contrastive Language-Image Pretraining (UrbanCLIP). It is trained on image-text pairs and brings natural language supervision into visual representation learning by jointly optimizing a contrastive loss and a language modeling loss. Applying the learned satellite visual representations to predict urban indicators in four major Chinese metropolises yields superior performance, with an average improvement of 6.1% in R^2 over the best baseline. This paper sheds light on web mining from satellite imagery and location descriptions for urban region profiling.
Track: Web Mining and Content Analysis
Submission Guidelines Scope: Yes
Submission Guidelines Blind: Yes
Submission Guidelines Format: Yes
Submission Guidelines Limit: Yes
Submission Guidelines Authorship: Yes
Student Author: No
Submission Number: 379
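
The abstract describes training on image-text pairs with a contrastive loss and a language modeling loss optimized jointly. The sketch below illustrates one common way such a joint objective can be written, in plain Python with no dependencies. It is not the authors' implementation: the function names, the temperature of 0.07, the symmetric InfoNCE form of the contrastive term, and the loss weights `w_con` and `w_lm` are all assumptions made for illustration.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax over a list of floats."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE loss: pair k of images and texts is the positive.

    The temperature value is an assumed default, not taken from the paper.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # Cosine similarity between every image and every text, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
        for i in imgs
    ]
    # Image-to-text direction: each row should pick out its diagonal entry.
    i2t = -sum(log_softmax(logits[k])[k] for k in range(n)) / n
    # Text-to-image direction: same computation over the columns.
    cols = [[logits[r][c] for r in range(n)] for c in range(n)]
    t2i = -sum(log_softmax(cols[k])[k] for k in range(n)) / n
    return 0.5 * (i2t + t2i)


def language_modeling_loss(token_logits, target_ids):
    """Mean next-token cross-entropy for one text description."""
    total = sum(log_softmax(l)[t] for l, t in zip(token_logits, target_ids))
    return -total / len(target_ids)


def joint_loss(image_embs, text_embs, token_logits, target_ids,
               w_con=1.0, w_lm=1.0):
    """Weighted sum of the two terms; the weights are hypothetical."""
    return (w_con * contrastive_loss(image_embs, text_embs)
            + w_lm * language_modeling_loss(token_logits, target_ids))
```

In this sketch the contrastive term pulls each satellite image embedding toward its paired text embedding and away from the other texts in the batch, while the language modeling term scores how well a decoder predicts the tokens of the description. How the two terms are actually weighted and computed in UrbanCLIP is specified in the paper itself, not here.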