IRRISIGHT: A Large-Scale Multimodal Dataset and Scalable Pipeline to Address Irrigation and Water Management in Agriculture

Published: 18 Sept 2025, Last Modified: 30 Oct 2025
Venue: NeurIPS 2025 Datasets and Benchmarks Track poster
License: CC BY 4.0
Keywords: Remote Sensing, Multimodal Dataset, Vision, LLM
TL;DR: IRRISIGHT is a large-scale multimodal dataset that combines satellite, soil, crop, and hydrological data across 20 U.S. states to advance irrigation mapping and agricultural water management research.
Abstract: The lack of fine-grained, large-scale datasets on water availability is a critical barrier to applying machine learning (ML) to agricultural water management. Because multiple natural and anthropogenic factors influence water availability, incorporating diverse multimodal features can significantly improve modeling performance. However, integrating such heterogeneous data is challenging due to spatial misalignments, inconsistent formats, semantic label ambiguities, and class imbalances. To address these challenges, we introduce IRRISIGHT, a large-scale multimodal dataset spanning 20 U.S. states. It consists of 1.4 million pixel-aligned 224×224 patches that fuse satellite imagery with rich environmental attributes. We develop a robust geospatial fusion pipeline that aligns raster, vector, and point-based data on a unified 10 m grid, and employ domain-informed structured prompts to convert tabular attributes into natural language. Using irrigation type classification as a representative problem, we make the dataset AI-ready by providing a spatially disjoint train/test split and extensive benchmarks with both vision and vision–language models. Our results demonstrate that multimodal representations substantially improve model performance, establishing a foundation for future research on water availability. Code: https://github.com/Nibir088/IRRISIGHT Dataset: https://huggingface.co/datasets/OBH30/IRRISIGHT
Croissant File: json
Dataset URL: https://huggingface.co/datasets/OBH30/IRRISIGHT
Code URL: https://github.com/Nibir088/IRRISIGHT
Supplementary Material: pdf
Primary Area: Datasets & Benchmarks for applications in computer vision
Submission Number: 2002
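
Illustrative sketch (not taken from the IRRISIGHT code): the abstract mentions 224×224 patches on a 10 m grid and structured prompts that turn tabular attributes into natural language. The snippet below shows, under assumptions, what those two ideas could look like in Python. The attribute names (`state`, `crop`, `soil_texture`, `annual_precip_mm`) and the sentence templates are hypothetical placeholders; the actual prompt design is described in the paper and repository.

```python
# Patch geometry values come from the abstract; everything else is hypothetical.
PATCH_SIZE_PX = 224   # patch width and height in pixels
GRID_RES_M = 10.0     # grid resolution in metres per pixel

# Hypothetical (attribute key, sentence template) pairs, in output order.
TEMPLATES = (
    ("state", "The patch is located in {}."),
    ("crop", "The dominant crop is {}."),
    ("soil_texture", "The soil texture is {}."),
    ("annual_precip_mm", "Annual precipitation is about {} mm."),
)


def patch_footprint_km2(size_px: int = PATCH_SIZE_PX,
                        res_m: float = GRID_RES_M) -> float:
    """Ground area covered by one square patch, in square kilometres."""
    side_m = size_px * res_m          # 224 px * 10 m = 2240 m per side
    return (side_m ** 2) / 1e6        # m^2 -> km^2


def attributes_to_prompt(attrs: dict) -> str:
    """Render tabular attributes as text, skipping missing or None values."""
    sentences = [
        template.format(attrs[key])
        for key, template in TEMPLATES
        if attrs.get(key) is not None
    ]
    return " ".join(sentences)


if __name__ == "__main__":
    print(round(patch_footprint_km2(), 4))
    print(attributes_to_prompt({"state": "Nebraska", "crop": "corn"}))
```

Under these assumptions each patch covers roughly 5 km² (2.24 km per side), and missing attributes are simply omitted from the generated text rather than filled with placeholders.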