CFDLLMBench: A Benchmark Suite for Evaluating Large Language Models in Computational Fluid Dynamics

11 May 2025 (modified: 30 Oct 2025) · Submitted to NeurIPS 2025 Datasets and Benchmarks Track · CC BY 4.0
Keywords: computational fluid dynamics, numerical analysis, large language models, benchmark, agents
TL;DR: A holistic benchmark for evaluating how well large language models can understand CFD concepts, select and implement numerical algorithms, and use CFD tools to simulate fluid flows.
Abstract: Large Language Models (LLMs) have demonstrated strong performance across general NLP tasks, but their utility in automating numerical experiments for complex physical systems---a critical and labor-intensive task---remains underexplored. As the major workhorse of computational science over the past decades, Computational Fluid Dynamics (CFD) offers a uniquely challenging testbed for evaluating the scientific capabilities of LLMs. We introduce \textit{CFDLLMBench}, a benchmark suite comprising three complementary components---\textit{CFDQuery}, \textit{CFDCodeBench}, and \textit{FoamBench}---designed to holistically evaluate LLM performance across three key competencies: graduate-level CFD knowledge, numerical and physical reasoning in CFD, and context-dependent implementation of CFD workflows. Grounded in real-world CFD practices, our benchmark combines a detailed task taxonomy with a rigorous evaluation framework to deliver reproducible results and quantify LLM performance across code executability, solution accuracy, and numerical convergence behavior. \textit{CFDLLMBench} establishes a solid foundation for the development and evaluation of LLM-driven automation of numerical experiments for complex physical systems.
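The abstract names three evaluation axes: code executability, solution accuracy, and numerical convergence behavior. As a minimal sketch of how the latter two might be quantified (not the benchmark's actual implementation; function names, the relative L2 metric, and the two-grid order formula are illustrative assumptions), one can compare a generated solution field against a reference and estimate the observed order of accuracy from errors at two grid spacings:

```python
import numpy as np

def relative_l2_error(u_pred: np.ndarray, u_ref: np.ndarray) -> float:
    """Relative L2 error between a generated solution field and a reference."""
    return float(np.linalg.norm(u_pred - u_ref) / np.linalg.norm(u_ref))

def observed_convergence_order(err_coarse: float, err_fine: float,
                               h_coarse: float, h_fine: float) -> float:
    """Observed order of accuracy from errors at two grid spacings:
    p = log(e_coarse / e_fine) / log(h_coarse / h_fine)."""
    return float(np.log(err_coarse / err_fine) / np.log(h_coarse / h_fine))

if __name__ == "__main__":
    # Hypothetical check: a second-order scheme should give p close to 2.
    x_c = np.linspace(0.0, 1.0, 33)   # coarse grid, h = 1/32
    x_f = np.linspace(0.0, 1.0, 65)   # fine grid,   h = 1/64
    ref_c, ref_f = np.sin(np.pi * x_c), np.sin(np.pi * x_f)
    # Stand-in "generated" solutions with an O(h^2) error component.
    pred_c = ref_c + (1.0 / 32) ** 2 * np.cos(np.pi * x_c)
    pred_f = ref_f + (1.0 / 64) ** 2 * np.cos(np.pi * x_f)
    e_c = relative_l2_error(pred_c, ref_c)
    e_f = relative_l2_error(pred_f, ref_f)
    p = observed_convergence_order(e_c, e_f, 1 / 32, 1 / 64)
    print(f"errors: {e_c:.3e}, {e_f:.3e}; observed order p = {p:.2f}")
```

In such a scheme, executability would be a separate pass/fail check (the generated script runs to completion), while accuracy and convergence order are scored from the resulting solution fields as above.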
Croissant File: json
Dataset URL: https://www.kaggle.com/datasets/f7e918624a9d2e5321ea2ec1e4ef818919c89f120cb065ba04c9596b80f1297c
Supplementary Material: zip
Primary Area: AI/ML Datasets & Benchmarks for physics (e.g. climate, health, life sciences, physics, social sciences)
Flagged For Ethics Review: true
Submission Number: 1993