Urban planning in the age of large language models: Assessing OpenAI o1's performance and capabilities across 556 tasks

Published: 2025, Last Modified: 08 Oct 2025, Comput. Environ. Urban Syst. 2025, CC BY-SA 4.0
Abstract: Highlights
• First comprehensive LLM evaluation in urban planning using a 556-task benchmark.
• OpenAI o1 excels, scoring 84.08 on average, outperforming both GPT-3.5 and GPT-4o.
• Identifies OpenAI o1's key strengths and limitations for professional practice.
• Informs and guides future LLM advancements for urban planning applications.