Large Multimodal Models Enable Scalable Monitoring of Aquaculture Ponds

Published: 01 Mar 2026, Last Modified: 05 Apr 2026, Venue: ML4RS @ ICLR 2026 (Main), Readers: Everyone, License: CC BY 4.0
Abstract: Aquaculture plays an increasingly important role in global food systems, and its rapid expansion, particularly in biodiverse regions such as the Amazon River basin, requires reliable and scalable spatial monitoring to ensure environmentally sustainable management. However, satellite-based monitoring of inland aquaculture systems has traditionally been constrained by the difficulty of detecting small, irregularly shaped aquaculture ponds, whose spectral similarity to surrounding land uses and variable optical conditions complicate their identification in heterogeneous landscapes. Consequently, conventional classification approaches (e.g., Random Forest and convolutional neural networks (CNNs)) often fail to capture this spatial and contextual complexity. Here, we present a novel agentic framework, AISciVision-Aqua, which integrates Retrieval-Augmented Generation (RAG), large multimodal models (LMMs), and agentic interactive tools (e.g., zooming, panning, and predictive tools) to explicitly emulate human expert workflows and detect aquaculture ponds from satellite imagery. Our results demonstrate that AISciVision-Aqua consistently outperforms baseline classification methods, such as CNNs, achieving higher precision and recall while providing transparent reasoning transcripts that detail the model’s decision-making process. AISciVision-Aqua’s interactive capabilities enable ecologists to validate, interrogate (in different languages), and iteratively refine predictions in real time, fostering trust and adaptability in Artificial Intelligence (AI)-assisted environmental monitoring. Collectively, these findings demonstrate how LMM-driven, agentic reasoning can advance fine-scale environmental feature extraction from satellite imagery, offering a scalable and interpretable approach for mapping inland aquaculture. AISciVision-Aqua thus exemplifies next-generation remote sensing workflows that tightly integrate human expertise and agentic multimodal AI for scalable environmental monitoring and informed decision-making.
Submission Number: 54