VICON: Vision In-Context Operator Networks for Multi-Physics Fluid Dynamics Prediction

TMLR Paper6015 Authors

27 Sept 2025 (modified: 13 Oct 2025)Under review for TMLREveryoneRevisionsBibTeXCC BY 4.0
Abstract: In-Context Operator Networks (ICONs) have demonstrated the ability to learn operators across diverse partial differential equations using few-shot, in-context learning. However, existing ICONs process each spatial point as an individual token, severely limiting computational efficiency when handling dense data in higher spatial dimensions. We propose \textit{Vision In-Context Operator Networks} (VICON), which integrate vision transformer architectures to efficiently process 2D data through patch-wise operations while preserving ICON's adaptability to multi-physics systems and varying timesteps. Evaluated across three fluid dynamics benchmarks, VICON significantly outperforms state-of-the-art baselines DPOT and MPP, reducing the average last-step rollout error by 37.9\% compared to DPOT and 44.7\% compared to MPP, while requiring only 72.5\% and 34.8\% of their respective inference times. VICON naturally supports flexible rollout strategies with varying timestep strides, enabling immediate deployment in \textit{imperfect measurement systems} where sampling frequencies may differ or frames might be dropped—common challenges in real-world settings—without requiring retraining or interpolation. In these realistic scenarios, VICON exhibits remarkable robustness, experiencing only 24.41\% relative performance degradation compared to 71.37\%-74.49\% degradation in baseline methods, demonstrating its versatility for deployment in realistic applications.
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Manuel_Haussmann1
Submission Number: 6015
Loading