Aquilon: Towards Building Multimodal Weather LLMs

Published: 10 Jun 2025 (Last Modified: 14 Jul 2025)
Venue: ICML 2025 World Models Workshop
License: CC BY 4.0
Keywords: Multimodal Weather, Scientific Reasoning, Scientific Question Answering, Weather Foundation Models
TL;DR: We present a scalable framework for enabling multimodal LLMs to reason over complex weather data by generating diverse weather-related QA tasks and embedding numerical forecasts into LLM-compatible representations.
Abstract: Recent advancements in weather foundation models—pre-trained on vast amounts of structured numerical data—have set new standards in weather forecasting accuracy. However, their lack of language-based reasoning capabilities leaves a critical opportunity untapped for human-in-the-loop analysis systems. In contrast, large language models (LLMs) excel at understanding and generating text, but they struggle with high-dimensional weather inputs like meteorological datasets. In this work, we take a significant step towards bridging this gap by enabling multimodal LLMs to reason over complex weather data. We address two fundamental challenges: (1) the absence of large-scale, multitask, multimodal datasets for weather reasoning, and (2) the lack of methods for embedding multi-channel weather data into LLM-compatible representations. To tackle these, we introduce a scalable data generation pipeline that constructs diverse question-answer pairs across a wide spectrum of weather-related tasks, from basic lookups to advanced forecasting and extreme event detection. We also leverage pretrained weather foundation models to extract low-dimensional embeddings of weather fields, enabling their integration with LLMs. Our experiments reveal that multimodal weather reasoning is a challenging problem that current models only partially address—highlighting the need for more effective weather representations and richer training data to fully unlock the potential of LLMs in meteorological applications.
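The abstract's second contribution (extracting low-dimensional embeddings of weather fields and integrating them with an LLM) can be sketched as a pool-and-project adapter. The shapes, the mean-pooling, and the linear projection below are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pretrained weather foundation model's encoder output:
# C feature channels over a lat/lon grid (here C=256 on a 32x64 grid).
feats = rng.standard_normal((256, 32, 64))

# 1) Pool the spatial grid down to a low-dimensional vector.
pooled = feats.mean(axis=(1, 2))            # shape (256,)

# 2) Project into the LLM's token-embedding space with a learned linear
#    map (randomly initialized here), producing k "weather tokens" the
#    LLM can attend to alongside ordinary text tokens.
llm_dim, n_tokens = 1024, 4
W = rng.standard_normal((n_tokens * llm_dim, 256)) * 0.02
weather_tokens = (W @ pooled).reshape(n_tokens, llm_dim)

print(weather_tokens.shape)                 # (4, 1024)
```

In practice the projection would be trained jointly with (or adapted to) the LLM, and richer pooling (e.g. attention over grid cells) could replace the global mean; this sketch only shows the interface between the two models.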
Submission Number: 43