Abstract: Continuous inspection and mapping of the seabed allows for monitoring the impact of anthropogenic activities on benthic ecosystems. Compared to traditional manual assessment methods, which are impractical at scale, computer vision holds great potential for widespread and long-term monitoring.
We deploy an underwater remotely operated vehicle (ROV) in Jammer Bay, a heavily fished area in the Greater North Sea, and capture videos of the seabed for habitat classification. The collected JAMBO dataset is inherently ambiguous: water in the bay is typically turbid, which degrades visibility and makes habitats more difficult to identify. To capture the uncertainties involved in manual visual inspection, we employ multiple annotators to classify the same set of images and analyze, among other factors, the time spent per annotation and the extent to which annotators agree.
We then evaluate the potential of vision foundation models (DINO, OpenCLIP, BioCLIP) for automating image-based benthic habitat classification. We find that despite ambiguity in the dataset, a well-chosen pre-trained feature extractor with linear probing can match the performance of manual annotators when evaluated in known locations. However, generalization across time and place remains an important challenge.
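The linear-probing approach mentioned above can be sketched as follows: a frozen pre-trained backbone converts each image into a feature vector, and a single linear classifier is trained on those features. This is a minimal illustration, not the paper's pipeline; the feature dimension, habitat count, and random stand-in embeddings are all assumptions for the sake of a self-contained example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical setup: in practice, features would come from a frozen
# pre-trained backbone (e.g. DINO or OpenCLIP) applied to seabed images.
rng = np.random.default_rng(0)
n_images, feat_dim, n_habitats = 600, 384, 4  # illustrative sizes, not from the paper

features = rng.normal(size=(n_images, feat_dim))     # stand-in for extracted embeddings
labels = rng.integers(0, n_habitats, size=n_images)  # stand-in habitat labels

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=0
)

# Linear probe: a single logistic-regression layer trained on frozen features.
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"probe accuracy: {probe.score(X_test, y_test):.2f}")
```

Because the backbone stays frozen, only the small linear layer is trained, which keeps the method cheap and makes the quality of the pre-trained features the deciding factor.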