CFDLLMBench: A Benchmark Suite for Evaluating Large Language Models in Computational Fluid Dynamics

11 May 2025 (modified: 30 Oct 2025) · Submitted to NeurIPS 2025 Datasets and Benchmarks Track · CC BY 4.0
Keywords: computational fluid dynamics, numerical analysis, large language models, benchmark, agents
TL;DR: A holistic benchmark for evaluating how well large language models can understand CFD concepts, select and implement numerical algorithms, and use CFD tools to simulate fluid flows.
Abstract: Large Language Models (LLMs) have demonstrated strong performance across general NLP tasks, but their utility in automating numerical experiments for complex physical systems---a critical and labor-intensive task---remains underexplored. As the major workhorse of computational science over the past decades, Computational Fluid Dynamics (CFD) offers a uniquely challenging testbed for evaluating the scientific capabilities of LLMs. We introduce \textit{CFDLLMBench}, a benchmark suite comprising three complementary components---\textit{CFDQuery}, \textit{CFDCodeBench}, and \textit{FoamBench}---designed to holistically evaluate LLM performance across three key competencies: graduate-level CFD knowledge, numerical and physical reasoning in CFD, and context-dependent implementation of CFD workflows. Grounded in real-world CFD practices, our benchmark combines a detailed task taxonomy with a rigorous evaluation framework to deliver reproducible results and quantify LLM performance across code executability, solution accuracy, and numerical convergence behavior. \textit{CFDLLMBench} establishes a solid foundation for the development and evaluation of LLM-driven automation of numerical experiments for complex physical systems.
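The abstract names three evaluation axes: code executability, solution accuracy, and numerical convergence behavior. As a minimal sketch of how the latter two might be quantified (not the benchmark's actual implementation; function names, the relative L2 metric, and the two-grid order formula are illustrative assumptions), one can compare a generated solution field against a reference and estimate the observed order of accuracy from errors at two grid spacings:

```python
import numpy as np

def relative_l2_error(u_pred: np.ndarray, u_ref: np.ndarray) -> float:
    """Relative L2 error between a generated solution field and a reference."""
    return float(np.linalg.norm(u_pred - u_ref) / np.linalg.norm(u_ref))

def observed_convergence_order(err_coarse: float, err_fine: float,
                               h_coarse: float, h_fine: float) -> float:
    """Observed order of accuracy from errors at two grid spacings:
    p = log(e_coarse / e_fine) / log(h_coarse / h_fine)."""
    return float(np.log(err_coarse / err_fine) / np.log(h_coarse / h_fine))

if __name__ == "__main__":
    # Hypothetical check: a second-order scheme should give p close to 2.
    x_c = np.linspace(0.0, 1.0, 33)   # coarse grid, h = 1/32
    x_f = np.linspace(0.0, 1.0, 65)   # fine grid,   h = 1/64
    ref_c, ref_f = np.sin(np.pi * x_c), np.sin(np.pi * x_f)
    # Stand-in "generated" solutions with an O(h^2) error component.
    pred_c = ref_c + (1.0 / 32) ** 2 * np.cos(np.pi * x_c)
    pred_f = ref_f + (1.0 / 64) ** 2 * np.cos(np.pi * x_f)
    e_c = relative_l2_error(pred_c, ref_c)
    e_f = relative_l2_error(pred_f, ref_f)
    p = observed_convergence_order(e_c, e_f, 1 / 32, 1 / 64)
    print(f"errors: {e_c:.3e}, {e_f:.3e}; observed order p = {p:.2f}")
```

In such a scheme, executability would be a separate pass/fail check (the generated script runs to completion), while accuracy and convergence order are scored from the resulting solution fields as above.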
Croissant File: json
Dataset URL: https://www.kaggle.com/datasets/f7e918624a9d2e5321ea2ec1e4ef818919c89f120cb065ba04c9596b80f1297c
Supplementary Material: zip
Primary Area: AI/ML Datasets & Benchmarks for physics (e.g. climate, health, life sciences, physics, social sciences)
Flagged For Ethics Review: true
Submission Number: 1993