From RawTokens to PhysSummary: Probing Text Interfaces for Inverse 1D PDE Parameter Estimation

Published: 01 Mar 2026, Last Modified: 05 Mar 2026 · AI&PDE Poster · CC BY 4.0
Keywords: Partial differential equations (PDEs), Large language models (LLMs), Inverse problems, AI4Science
TL;DR: Benchmark how text interfaces (RawTokens vs PhysSummary) change LLM reliability for inverse 1D PDE parameter estimation, revealing regime-dependent design rules and hidden failure modes
Abstract: Text is the de facto glue layer between numerical data and large language models (LLMs), yet interface design can silently determine whether an LLM-based scientific pipeline succeeds or fails. We present a controlled benchmark isolating text-interface effects on LLM-based inverse parameter estimation for 1D heat and advection equations, comparing RawTokens (quantized grid serialization) and PhysSummary (physics-informed descriptors) across zero-shot learning (ZSL), in-context learning (ICL), and supervised fine-tuning (SFT) regimes with four LLMs. Rather than replacing conventional solvers, we derive practical design rules: closed-API models benefit from PhysSummary under prompting, while open-weight models and SFT are more robust with RawTokens. Evaluating in-range rate alongside root mean square error (RMSE) exposes three failure modes---prompt-visible constant copying, SFT collapse on compressed features, and non-monotonic shot scaling---invisible to accuracy metrics alone. SFT on RawTokens narrows the gap to float-grid baselines despite operating on quantized text. These findings provide interface-aware design guidance for any scientific workflow routing numerical data through an LLM.
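The two interfaces contrasted in the abstract can be illustrated with a minimal sketch. The function names, quantization scheme, and descriptor choices below are illustrative assumptions, not the paper's actual implementation: a RawTokens-style interface quantizes a sampled 1D field into integer tokens, while a PhysSummary-style interface reports a few physics-informed descriptors of the same field.

```python
import numpy as np

def raw_tokens(u, n_levels=100):
    """Serialize a 1D field as quantized integer tokens (RawTokens-style sketch)."""
    lo, hi = u.min(), u.max()
    scaled = (u - lo) / (hi - lo + 1e-12)            # normalize to [0, 1]
    q = np.round(scaled * (n_levels - 1)).astype(int)
    return " ".join(str(v) for v in q)

def phys_summary(u, x):
    """Describe the field with a few physics-informed descriptors (PhysSummary-style sketch)."""
    peak_idx = int(np.argmax(u))
    centroid = np.average(x, weights=u)              # mass-weighted center
    spread = np.sqrt(np.average((x - centroid) ** 2, weights=u))
    return (f"peak value {u[peak_idx]:.3f} at x={x[peak_idx]:.2f}; "
            f"mean {u.mean():.3f}; spatial spread {spread:.3f}")

# Gaussian pulse: heat-kernel profile at time t with diffusivity kappa
x = np.linspace(-1, 1, 64)
kappa, t = 0.1, 0.5
u = np.exp(-x**2 / (4 * kappa * t)) / np.sqrt(4 * np.pi * kappa * t)

print("RawTokens:  ", raw_tokens(u)[:60], "...")
print("PhysSummary:", phys_summary(u, x))
```

Under such a sketch, the interface choice trades lossy but complete grid information (RawTokens) against compact, physically meaningful but compressed features (PhysSummary), which is the tension the benchmark probes.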
Journal Opt In: Yes, I want to participate in the IOP focus collection submission
Journal Corresponding Email: yiderigun.yiderigun@hereon.de
Journal Notes: We opt in to the AI&PDE Focus Collection and plan a strengthened journal extension. The workshop paper provides a controlled benchmark of how numeric-to-text interfaces (RawTokens vs PhysSummary) affect LLM-based inverse 1D PDE parameter estimation across learning regimes, and highlights reliability/failure modes that are not captured by RMSE alone. For the journal version, we will address the remaining weaknesses and deepen the contribution along three concrete axes:
(1) Uncertainty-aware inverse estimation: inverse problems require uncertainty quantification. We will add stochastic decoding / sampling-based inference to obtain distributions over estimated parameters, report calibration and risk–coverage behavior, and analyze when deterministic decoding masks unreliability.
(2) Robust reliability metrics beyond hard thresholds: the current in-range rate uses a hard interval criterion that can be brittle. We will add tolerance-aware variants (soft/near-in-range scoring or margin-based acceptance) and report them side-by-side with RMSE/MAE to better reflect practical acceptability.
(3) Broader model-family coverage and generalization: to test whether the observed interface effects and failure patterns are model-family specific, we will evaluate additional open-weight model families (and selected closed models where feasible) under the same benchmark protocol, and expand the OOD/noise/generalization analyses to assess the stability of the derived interface rules of thumb.
Timeline: because of the workshop camera-ready deadline, this version focuses on clarity and consolidation without additional experiments. We plan to run the extensions above after mid-March, complete the main experimental additions in April–May, and consolidate the journal manuscript in June–July for submission ahead of the Focus Collection deadline.
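To make the metric discussion in the notes concrete, a hard in-range rate and one possible tolerance-aware softening can be sketched as follows. The margin-based credit function is a hypothetical example of "soft/near-in-range scoring", not necessarily the metric the journal version will adopt:

```python
import numpy as np

def in_range_rate(estimates, lo, hi):
    """Hard acceptance: fraction of estimates inside [lo, hi]."""
    est = np.asarray(estimates, dtype=float)
    return float(np.mean((est >= lo) & (est <= hi)))

def soft_in_range_rate(estimates, lo, hi, margin=0.2):
    """Tolerance-aware variant (illustrative): full credit inside [lo, hi],
    linearly decaying credit within margin * (hi - lo) outside the interval,
    zero credit beyond that band."""
    est = np.asarray(estimates, dtype=float)
    band = margin * (hi - lo)
    dist = np.maximum(lo - est, est - hi)        # distance outside interval (<= 0 inside)
    credit = np.clip(1.0 - np.maximum(dist, 0.0) / band, 0.0, 1.0)
    return float(np.mean(credit))

preds = [0.12, 0.48, 0.55, 1.2]                  # two inside [0.1, 0.5], one near miss, one far off
print(in_range_rate(preds, 0.1, 0.5))            # hard rate: 0.5
print(soft_in_range_rate(preds, 0.1, 0.5))       # higher: partial credit for the 0.55 near miss
```

The softened variant distinguishes a near miss (0.55) from a gross failure (1.2), which the hard criterion scores identically.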
Submission Number: 146