Physics-Guided Transformer (PGT): Physics-Aware Attention Mechanism for PINNs

Published: 30 Mar 2026, Last Modified: 16 Apr 2026 (arXiv.org perpetual, non-exclusive license)
Abstract: Reconstructing continuous physical fields from sparse, irregular observations is a fundamental challenge in scientific machine learning, particularly for nonlinear systems governed by partial differential equations (PDEs). Dominant physics-informed approaches enforce governing equations as soft penalty terms during optimization, a strategy that often leads to gradient imbalance, instability, and degraded physical consistency when measurements are scarce. Here we introduce the Physics-Guided Transformer (PGT), a neural architecture that moves beyond residual regularization by embedding physical structure directly into the self-attention mechanism. Specifically, PGT incorporates a heat-kernel-derived additive bias into the attention logits, endowing the encoder with an inductive bias consistent with diffusion physics and temporal causality. Query coordinates attend to these physics-conditioned context tokens, and the resulting features drive a FiLM-modulated sinusoidal implicit decoder that adaptively controls its spectral response based on the inferred global context. We evaluate PGT on two canonical benchmark systems spanning diffusion-dominated and convection-dominated regimes: the one-dimensional heat equation and the two-dimensional incompressible Navier–Stokes equations. In 1D sparse reconstruction with as few as 100 observations, PGT attains a relative L2 error of 5.9×10−3, a 38-fold reduction over physics-informed neural networks and a more than 90-fold reduction over sinusoidal implicit representations. In the 2D cylinder-wake problem reconstructed from 1500 scattered spatiotemporal samples, PGT uniquely achieves strong performance on both axes of evaluation: a governing-equation residual of 8.3×10−4, on par with the best residual-based methods, alongside a competitive overall relative L2 error of 0.034, substantially below all methods that achieve comparable physical consistency.
No individual baseline simultaneously satisfies these dual criteria. Convergence analysis further reveals sustained, monotonic error reduction in PGT, in contrast to the early optimization plateaus observed in residual-based approaches. These findings demonstrate that structural incorporation of physical priors at the representational level, rather than solely as an external loss penalty, substantially improves both optimization stability and physical coherence under data-scarce conditions. Physics-guided attention provides a principled and extensible mechanism for reliable reconstruction of nonlinear dynamical systems governed by partial differential equations.
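The abstract does not give the exact functional form of the heat-kernel attention bias, but the core idea — an additive term in the attention logits that favors diffusion-consistent, causally earlier context tokens — can be sketched as follows. The 1D setting, the diffusivity `nu`, and all function names here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def heat_kernel_attention_bias(coords, times, nu=0.1, neg_inf=-1e9):
    """Additive attention bias from the 1D heat kernel (illustrative sketch).

    coords: (N,) spatial positions of context tokens
    times:  (N,) timestamps of context tokens
    Token i may attend to token j only if t_i > t_j (temporal causality);
    the log of the heat kernel, -|x_i - x_j|^2 / (4*nu*(t_i - t_j)),
    then downweights pairs that are far apart relative to how much
    diffusion could have occurred between them.
    """
    dx2 = (coords[:, None] - coords[None, :]) ** 2
    dt = times[:, None] - times[None, :]
    bias = np.where(dt > 0, -dx2 / (4.0 * nu * np.maximum(dt, 1e-12)), neg_inf)
    np.fill_diagonal(bias, 0.0)  # always allow self-attention
    return bias

def attention(Q, K, V, bias):
    """Scaled dot-product attention with an additive physics bias."""
    logits = Q @ K.T / np.sqrt(Q.shape[-1]) + bias
    logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V
```

Because the bias is additive, it composes with the learned query-key similarity rather than replacing it: the model can still learn content-based attention, but its prior before training already respects diffusion scales and causality.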
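The FiLM-modulated sinusoidal decoder can likewise be sketched: a SIREN-style sine layer whose pre-activation is rescaled and shifted by feature-wise (gamma, beta) parameters predicted from the attended context, so the global context steers the decoder's spectral response. The frequency `omega0`, the near-identity initialization, and all names below are assumptions for illustration.

```python
import numpy as np

def film_siren_layer(x, W, b, gamma, beta, omega0=30.0):
    """One FiLM-modulated sinusoidal layer (illustrative sketch).

    x: (batch, d_in) query coordinates or hidden features
    W, b: layer weight and bias
    gamma, beta: context-derived FiLM scale and shift; gamma effectively
    tunes the layer's frequency content, beta its phase.
    """
    return np.sin(omega0 * (gamma * (x @ W + b) + beta))

def context_to_film(z, Wg, bg, Wb, bb):
    """Hypothetical FiLM head mapping a pooled context vector z to (gamma, beta)."""
    gamma = 1.0 + z @ Wg + bg  # initialized near identity modulation
    beta = z @ Wb + bb
    return gamma, beta
```

The design choice here is that context conditions the decoder multiplicatively (through gamma) as well as additively (through beta), which is what lets a single implicit network adapt its effective bandwidth to smooth diffusive fields or sharper convective ones.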