Recurrence over Video Frames (RoVF) for Animal Re-identification

Published: 01 Mar 2026, Last Modified: 08 May 2026 · International Journal of Computer Vision · License: CC BY-SA 4.0
Abstract: Recent advances in deep learning have greatly improved the accuracy and scalability of animal re-identification by automating the extraction of subtle distinguishing features from images and videos, enabling large-scale, non-invasive monitoring of animal populations. This article proposes a segmentation pipeline and a re-identification model for identifying animals without ground-truth IDs. The segmentation pipeline isolates animals from the background using bounding boxes together with the DINOv2 and Segment Anything Model 2 (SAM2) foundation models. For re-identification, the article introduces Recurrence over Video Frames (RoVF), a novel approach that places a recurrent component based on the Perceiver transformer atop a DINOv2 image model to iteratively refine embeddings across video frames. The proposed methods are evaluated on video datasets of meerkats and polar bears (PolarBearVidID). The segmentation pipeline achieved accuracies of 94.36% and 97.26% and IoU scores of 73.14% and 92.77% for meerkats and polar bears, respectively. RoVF outperformed frame- and video-based re-identification baselines, achieving top-1 accuracies of 46.5% and 55% on the masked test sets for meerkats and polar bears, respectively, along with higher top-3 accuracy. These results highlight the potential of the proposed approach to reduce the annotation burden in future individual-based ecological studies. The code is available at https://github.com/Strong-AI-Lab/RoVF-Meerkat-Reidentification.
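To make the idea of "a recurrent component based on the Perceiver transformer atop a DINOv2 image model" more concrete, here is a minimal NumPy sketch of the general pattern, not the authors' implementation (see the linked repository for that). It assumes that each frame has already been encoded into a set of token features (random arrays stand in for DINOv2 patch tokens here), that a small array of latent vectors is updated by single-head cross-attention over one frame at a time, and that the final video embedding is the L2-normalised mean of the latents. All function names, shapes, and the residual-plus-LayerNorm update are illustrative assumptions, and random matrices stand in for learned weights.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def layer_norm(x, eps=1e-5):
    # LayerNorm over the feature axis, without learned scale/shift (assumption).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def cross_attend(latents, tokens, w_q, w_k, w_v):
    """Single-head cross-attention: latents (L, d) query frame tokens (T, d)."""
    q = latents @ w_q                           # (L, d)
    k = tokens @ w_k                            # (T, d)
    v = tokens @ w_v                            # (T, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])     # (L, T)
    return softmax(scores, axis=-1) @ v         # (L, d)


def recurrent_video_embedding(frames, latents0, params):
    """Hypothetical RoVF-style recurrence.

    frames:   (F, T, d) token features per frame (stand-in for DINOv2 tokens)
    latents0: (L, d) initial latent array (learned in a real model)
    params:   (w_q, w_k, w_v), each (d, d)
    """
    latents = latents0
    for tokens in frames:  # one recurrence step per video frame, in order
        latents = layer_norm(latents + cross_attend(latents, tokens, *params))
    emb = latents.mean(axis=0)                  # pool latents into one vector
    return emb / np.linalg.norm(emb)            # unit norm for cosine retrieval


# Usage with random stand-in data.
rng = np.random.default_rng(0)
d = 32
frames = rng.normal(size=(10, 49, d))           # 10 frames, 49 tokens each
latents0 = rng.normal(size=(8, d))              # 8 latent vectors
params = tuple(rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
emb = recurrent_video_embedding(frames, latents0, params)
print(emb.shape)  # (32,)
```

Because the latents carry state from one frame to the next, the resulting embedding depends on frame order, unlike simple averaging of per-frame embeddings; re-identification could then rank gallery videos by cosine similarity to a query embedding and report top-1 and top-3 accuracy.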