Future Feature-Based Supervised Contrastive Learning for Streaming Perception

Tongbo Wang, Hua Huang

Published: 2024, Last Modified: 12 Nov 2025. IEEE Trans. Circuits Syst. Video Technol. 2024. License: CC BY-SA 4.0

Abstract: Streaming perception, a critical task in computer vision, requires predicting object locations in video sequences in real time based on prior frames. Current methods such as StreamYOLO rely mainly on coordinate information and often fall short of precise predictions because of feature misalignment between the input data and the supervisory labels. This paper introduces Future Feature-based Supervised Contrastive Learning (FFSCL), which addresses this challenge by incorporating appearance features from future frames and applying supervised contrastive learning. FFSCL establishes a robust correspondence between the appearance of an object in the current and past frames and its location in the subsequent frame, which significantly improves the accuracy of object position prediction in streaming perception tasks. FFSCL also includes a sample pair construction (SPC) module, which efficiently creates positive and negative samples from future-frame labels, and a feature consistency loss (FCL), which makes supervised contrastive learning more effective by linking appearance features from future frames with those from past frames. Extensive experiments on two large-scale benchmark datasets show that FFSCL consistently outperforms state-of-the-art methods in streaming perception tasks. The study brings supervised contrastive learning and future-frame information into streaming perception, with the aim of more accurate and efficient object prediction in video streams.
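
For readers unfamiliar with the supervised contrastive learning that the abstract builds on, the following is a minimal sketch of the generic supervised contrastive (SupCon) loss in plain Python. It is not the authors' FFSCL implementation: the abstract does not specify how SPC selects pairs or how FCL is defined, so the function names `l2_normalize` and `supcon_loss`, the `temperature` value, and the use of integer labels to mark positive pairs are all illustrative assumptions.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def supcon_loss(features, labels, temperature=0.1):
    """Generic supervised contrastive loss over a batch of feature vectors.

    Samples sharing a label are treated as positives for each other and
    all remaining samples as negatives. Hypothetical stand-in: in a
    streaming-perception setting the labels might be derived from
    future-frame annotations, but that mapping is an assumption here.
    """
    z = [l2_normalize(f) for f in features]
    n = len(z)
    total, num_anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Temperature-scaled cosine similarity to every other sample.
        logits = {
            a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
            for a in range(n)
            if a != i
        }
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        num_anchors += 1
    return total / num_anchors if num_anchors else 0.0


# Toy batch: two feature clusters, pointing along x and along y.
toy_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_matched = supcon_loss(toy_features, [0, 0, 1, 1])     # labels follow clusters
loss_mismatched = supcon_loss(toy_features, [0, 1, 0, 1])  # labels cut across clusters
```

On this toy batch the loss is lower when the labels agree with the feature clusters than when they cut across them, which is the property a contrastive objective exploits to pull features of matching samples together.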

External IDs: dblp:journals/tcsv/WangH24