UrbanPlanBench: A Comprehensive Assessment of Urban Planning Abilities in Large Language Models

Yu Zheng; Longyi Liu; Yuming Lin; Jie Feng; Guozhen Zhang; Depeng Jin; Yong Li

UrbanPlanBench: A Comprehensive Assessment of Urban Planning Abilities in Large Language Models

Yu Zheng, Longyi Liu, Yuming Lin, Jie Feng, Guozhen Zhang, Depeng Jin, Yong Li

28 Sept 2024 (modified: 05 Feb 2025)Submitted to ICLR 2025EveryoneRevisionsBibTeXCC BY 4.0

Keywords: LLM Benchmark, urban planning

TL;DR: This paper introduces the first benchmark for LLM on urban planning capabilities and the largest-ever SFT dataset for LLMs in urban planning. Abstract:

Abstract: Urban planning is a professional discipline that shapes our daily surroundings, which demands multifaceted domain knowledge and relies heavily on human expertise. The advent of Large Language Models (LLMs) holds promise for revolutionizing such a field by the pre-trained world knowledge. However, the extent to which these models can assist human practitioners remains largely unexplored. In this paper, we introduce a comprehensive benchmark, PlanBench, tailored to evaluate the efficacy of LLMs in urban planning, which encompasses fundamental principles, professional knowledge, and management and regulations, aligning closely with the qualifications expected of human planners. Through extensive evaluation, we reveal a significant imbalance in the acquisition of planning knowledge among LLMs, with even the most proficient models falling short of meeting professional standards. For instance, we observe that 70% of LLMs achieve subpar performance in understanding planning regulations compared to other aspects. Besides the benchmark, we present the largest-ever supervised fine-tuning (SFT) dataset, PlanText, for LLMs in urban planning, comprising over 30,000 instruction pairs sourced from urban planning exams and textbooks. Our findings demonstrate that fine-tuned models exhibit enhanced performance in memorization tests and comprehension of urban planning knowledge, while there exists significant room for improvement, particularly in tasks requiring domain-specific terminology and reasoning. Our benchmark, dataset, and associated evaluation and fine-tuning toolsets aim to catalyze the integration of LLMs into practical urban computing, fostering a symbiotic relationship between human expertise and machine intelligence.

Primary Area: datasets and benchmarks

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 13434

Loading