Abstract: Video understanding is an important research topic in computer vision because online video content is growing exponentially. Feature extraction and representation play a crucial role in video understanding tasks such as classification, segmentation, and recognition. However, because adjacent video frames typically have similar RGB features, the model's learning is ambiguous. To address this issue, we present a graph-based embedding that improves the distribution of video features. We construct a graph structure for videos by connecting similar features, and we generate node embeddings with a graph model. Experiments demonstrate that our approach effectively improves the feature distribution. With a graph attention network (GAT), accuracy and editing score improve by 4% over the visual model.
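The graph construction and node embedding steps might be sketched roughly as follows. This is an illustration under stated assumptions, not the method evaluated in the paper: cosine similarity, the neighbor count `k`, and the function names `build_knn_graph` and `aggregate` are all hypothetical choices, and plain mean aggregation stands in for the GAT, which would instead learn attention weights over neighbors.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_knn_graph(features, k=2):
    """Connect each frame to its k most similar frames (k is an assumed setting)."""
    n = len(features)
    neighbors = {i: set() for i in range(n)}
    for i in range(n):
        ranked = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: cosine(features[i], features[j]),
            reverse=True,
        )
        for j in ranked[:k]:
            neighbors[i].add(j)
            neighbors[j].add(i)  # keep the graph undirected
    return neighbors

def aggregate(features, neighbors):
    """One round of mean aggregation over each node and its neighbors.

    A simplified stand-in for a graph model; a GAT would weight
    neighbors by learned attention instead of averaging uniformly.
    """
    out = []
    for i, feat in enumerate(features):
        group = [feat] + [features[j] for j in sorted(neighbors[i])]
        out.append([sum(col) / len(group) for col in zip(*group)])
    return out

# Toy per-frame features: frames 0-1 are similar, frames 2-3 are similar.
frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
graph = build_knn_graph(frames, k=1)
embeddings = aggregate(frames, graph)
```

In this toy input, each frame is linked only to its most similar frame, so aggregation pulls the features within each pair toward a shared value while leaving the two pairs apart.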