Abstract: Quickly understanding livestreaming scenarios is crucial for online content regulation. Logos in livestreaming videos are key cues to the content of a scene. However, because livestreaming content is complex and time-varying, logos that are too small or have variable shapes degrade the performance of downstream tasks. We therefore focus on reconstructing two modules closely tied to the feature-map sampling process so as to supplement key feature information, and propose dual reconstructed YOLO (DR-YOLO) for logo detection in livestreaming. First, we design a reconstructed spatial pyramid pooling fast (RSPPF) module that achieves feature-level fusion of local and global features by sharing global information. Then, we develop a reconstructed content-aware feature (RCAF) module that augments the current frame's information with that of adjacent frames, so that the reconstructed feature maps attend more to relevant points in a local region and thus carry richer semantic information. Finally, we employ channel-wise knowledge distillation to transfer knowledge from a large model to a lightweight one suited to downstream tasks such as livestreaming. Extensive experiments are conducted on the publicly available QMUL-OpenLogo and LogoDet-3K datasets, as well as on BJUT-VLD, a self-built livestreaming video dataset. DR-YOLO achieves competitive mAP of 48.3%, 69.5%, and 54.9%, respectively, demonstrating its effectiveness and superiority.
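The channel-wise knowledge distillation named in the abstract corresponds to a known formulation (Shu et al., ICCV 2021): each channel of the teacher and student feature maps is normalized into a spatial probability distribution with a softmax, and the student's distribution is matched to the teacher's via KL divergence. Below is a minimal PyTorch sketch of that loss, assuming matched (N, C, H, W) feature maps; the function name and temperature value are illustrative and not drawn from the paper's training configuration.

```python
import torch
import torch.nn.functional as F

def channel_wise_distillation(student_feat: torch.Tensor,
                              teacher_feat: torch.Tensor,
                              tau: float = 4.0) -> torch.Tensor:
    """Channel-wise KD loss: per channel, activations over all spatial
    positions are softmax-normalized, then the student distribution is
    pulled toward the teacher's with KL divergence.

    student_feat, teacher_feat: (N, C, H, W) tensors of the same shape
    (if channel counts differed, a 1x1 conv adapter would be needed first).
    """
    n, c, _, _ = student_feat.shape
    # Flatten spatial dimensions: (N, C, H*W)
    s = student_feat.view(n, c, -1)
    t = teacher_feat.view(n, c, -1)
    # Softmax over spatial positions, independently per channel
    log_p_s = F.log_softmax(s / tau, dim=2)
    p_t = F.softmax(t / tau, dim=2)
    # KL(teacher || student) per channel, then average over channels/batch
    kl = F.kl_div(log_p_s, p_t, reduction="none").sum(dim=2)  # (N, C)
    return (tau ** 2) * kl.mean()
```

In practice such a loss is added, with a weighting coefficient, to the detector's task loss while the teacher runs in evaluation mode with gradients disabled.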