MarineMaid: Dataset and Benchmark on Detecting and Understanding Marine Creatures

Liang Haixin; Zeyu Ma; Wong Yuk Kwan; Yiwei Chen; Zheng Ziqiang; Rinaldi Gotama; Pascal Sebastian; Lauren D. Sparks; Serena Stean; Sai-Kit Yeung

27 Sept 2024 (modified: 24 Nov 2024), ICLR 2025 Conference Withdrawn Submission, CC BY 4.0

Keywords: open-vocabulary object detection, marine instance detection and captioning, biodiversity monitoring, vision-language understanding

TL;DR: MarineMaid: Dataset and Benchmark for Instance Detection and Understanding

Abstract: Oceans cover more than 70% of the surface of our blue planet, yet they remain underexplored by the computer vision community. The scarcity of labeled data is considered the main obstacle. In this work, we propose a novel and comprehensive dataset called MarineMaid, specifically designed for marine monitoring and understanding and covering a wide spectrum of marine creatures. Based on the essential requirements of the marine research community, we adopt object detection and vision-language understanding as our two fundamental tasks. Object detection yields precise localization and category predictions for species identification and monitoring. Beyond category and bounding box predictions, vision-language understanding generates rich and comprehensive captions describing the biological traits that domain experts require. MarineMaid contains 12,873 fine-grained instance-caption pairs and 42,217 bounding boxes annotated by domain experts. We benchmark 14 state-of-the-art algorithms on MarineMaid to reveal the strengths and limitations of existing general-purpose and domain-specific algorithms. The hierarchical and comprehensive experimental results provide valuable insights into how to develop practical and efficient marine visual perception algorithms that satisfy domain requirements. To foster further development in this direction, we will release the MarineMaid dataset upon acceptance of this paper.

Primary Area: datasets and benchmarks

Submission Number: 10671
