Track: New scientific result
Keywords: Vision transformers, convolution kernel modulator, convolution stride modulator, spatio-temporal data
Abstract: Modeling dynamical systems governed by partial differential equations presents significant challenges for machine-learning-based surrogate models. While transformers have shown potential in capturing complex spatial dynamics, their reliance on fixed-size patches limits flexibility and scalability. In this work, we introduce two convolutional encoder and decoder architectural blocks, the Convolutional Kernel Modulator (CKM) and the Convolutional Stride Modulator (CSM), designed for patch embedding and reconstruction in autoregressive prediction tasks. These blocks enable dynamic patching and striding strategies that balance accuracy and computational efficiency during inference. Furthermore, we propose a rollout strategy that adaptively adjusts patching and striding configurations throughout temporally sequential predictions, mitigating patch artifacts and long-term error accumulation while improving the capture of fine-scale structures. We show that our approaches enable dynamic control over patch sizes at inference time without loss of accuracy relative to fixed-patch baselines.
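To make the accuracy/compute trade-off of varying the stride concrete, here is a minimal NumPy sketch of a generic strided-convolution patch embedding. It assumes a single input field of shape (C, H, W) and a kernel of shape (D, C, k, k). It is not the CKM or CSM block described above, and the function name `conv_patch_embed` and all shapes are hypothetical. It shows only that, for a fixed kernel size k, a stride below k produces overlapping patches and more tokens. The token count per spatial axis is floor((H - k) / stride) + 1, and because self-attention cost grows quadratically with the number of tokens, inference cost grows with it.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv_patch_embed(x, weight, stride):
    """Hypothetical generic patch embedding (not CKM/CSM).

    Applies a k x k convolution with the given stride to a (C, H, W) field
    and flattens the output grid into tokens. `weight` has shape
    (D, C, k, k); the result has shape (num_tokens, D).
    """
    d, c, k, k2 = weight.shape
    if k != k2 or x.shape[0] != c:
        raise ValueError("expected square kernel and matching channel count")
    # Every k x k window: shape (C, H - k + 1, W - k + 1, k, k)
    windows = sliding_window_view(x, (k, k), axis=(1, 2))
    # Keep every `stride`-th window along each spatial axis
    windows = windows[:, ::stride, ::stride]
    # Contract over channels and kernel offsets -> (H_out, W_out, D)
    out = np.einsum("chwij,dcij->hwd", windows, weight)
    # Row-major flattening of the output grid into a token sequence
    return out.reshape(-1, d)
```

With a 64 x 64 field and k = 8, a stride of 8 yields 8 x 8 = 64 non-overlapping tokens, while a stride of 4 yields 15 x 15 = 225 overlapping tokens from the same kernel.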
Supplementary: https://github.com/xyzzxyzz-cloud/ICLR_multiscale_2025
Presenter: ~Alberto_Bietti1
Submission Number: 6