Abstract: In recent years, many automobiles have been equipped with cameras, which have accumulated an enormous amount of video footage of driving scenes. Autonomous driving demands the highest level of safety, for which even extremely rare driving scenes must be included in the training data to improve recognition accuracy for such scenes. However, it is prohibitively costly to find these few specific scenes in an enormous amount of video. In
this article, we show that proper video-to-video distances
can be defined by focusing on ego-vehicle actions. Existing methods based on supervised learning work well in defining video-to-video distances in the embedding space between labeled videos, but they cannot handle videos that do not fall into predefined classes.
To tackle this problem, we propose a method based on
semi-supervised contrastive learning. We consider two related but distinct contrastive learning approaches: standard graph contrastive learning and our proposed SOIA-based contrastive learning. We observe that the latter provides
more sensible video-to-video distances between unlabeled
videos. We then quantify the effectiveness of our method by evaluating classification performance on ego-vehicle action recognition with the HDD dataset; the results show that our method, which includes unlabeled data in training, significantly outperforms existing methods that use only labeled data.