Keywords: cross-modal adaptation, partial differential equations, architecture
Abstract: Various methods for fine-tuning Large Language Models to new modalities have been introduced in recent years, particularly for Scientific ML tasks such as time-dependent simulations based on Partial Differential Equations (PDEs). Most of these approaches build on encoder-only models, even though decoder-only models have gained popularity in NLP and ML more broadly thanks to their scaling capabilities. However, the impact of model architecture on these approaches has not been investigated before. In this ongoing work, we perform a series of ablation studies comparing encoder-only and decoder-only models. We find that encoder-only models outperform decoder-only models, with large variation across tasks. We attribute this to how the data is presented to decoder-only models, which are heavily penalized for being autoregressive. We also find that, in contrast to other tasks, scaling decoder-only models does not change performance. Pending further experimentation, these results suggest that new approaches are needed to harness the potential of decoder-only models in the context of cross-modal adaptation.
Submission Number: 28