Learning event representations for temporal segmentation of image sequences by dynamic graph embedding
Abstract: Recently, self-supervised learning has proven effective for learning representations of events suitable for temporal
segmentation in image sequences, where events are understood
as sets of temporally adjacent images that are semantically
perceived as a whole. However, although this approach does not
require expensive manual annotations, it is data-hungry and
suffers from domain adaptation problems. As an alternative,
in this work, we propose a novel approach for learning event
representations named Dynamic Graph Embedding (DGE). The
assumption underlying our model is that a sequence of images
can be represented by a graph that encodes both semantic and
temporal similarity. The key novelty of DGE is that it jointly learns the graph and its embedding. At its core, DGE works by
iterating over two steps: 1) updating the graph representing the
semantic and temporal similarity of the data based on the current
data representation, and 2) updating the data representation
to take into account the current data graph structure. The
main advantage of DGE over state-of-the-art self-supervised
approaches is that it does not require any training set, but
instead iteratively learns, from the data itself, a low-dimensional embedding that reflects its temporal and semantic similarity.
Experimental results on two benchmark datasets of real image
sequences captured at regular time intervals demonstrate that
the proposed DGE leads to event representations effective for
temporal segmentation. In particular, it achieves robust temporal
segmentation on the EDUBSeg and EDUBSeg-Desc benchmark
datasets, outperforming the state of the art. Additional experiments on two Human Motion Segmentation benchmark datasets demonstrate the generalization capabilities of the proposed DGE.
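As a purely illustrative reading of the two-step iteration described in the abstract, the Python sketch below alternates between rebuilding a semantic-and-temporal similarity graph from the current embedding and smoothing the embedding over that graph, then places event boundaries where consecutive embeddings jump. The graph construction (k-nearest-neighbour semantic edges plus fixed-weight temporal edges), the smoothing update, and all parameters (knn, temporal_weight, alpha) are assumptions made for illustration and do not correspond to the paper's actual DGE formulation.

```python
import numpy as np


def dge_sketch(X, n_iters=10, knn=5, temporal_weight=0.5, alpha=0.9):
    """Alternate between (1) rebuilding a semantic+temporal similarity graph
    from the current embedding and (2) updating the embedding by smoothing it
    over that graph. Illustrative only; not the paper's DGE objective."""
    n = len(X)
    Z = X.astype(float).copy()
    for _ in range(n_iters):
        # Step 1: graph update from the current representation Z.
        d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
        W = np.zeros((n, n))
        neighbours = np.argsort(d2, axis=1)[:, 1:knn + 1]      # semantic kNN edges
        for i in range(n):
            W[i, neighbours[i]] = np.exp(-d2[i, neighbours[i]]
                                         / (d2[i, neighbours[i]].mean() + 1e-8))
        for i in range(n - 1):                                 # temporal edges
            W[i, i + 1] = max(W[i, i + 1], temporal_weight)
        W = np.maximum(W, W.T)                                 # symmetrise

        # Step 2: representation update given the current graph
        # (row-normalised graph smoothing, anchored to the input features).
        deg_inv = 1.0 / (W.sum(axis=1) + 1e-8)
        Z = alpha * (deg_inv[:, None] * (W @ Z)) + (1 - alpha) * X
    return Z


def segment(Z):
    """Place event boundaries where consecutive embeddings jump."""
    gaps = np.linalg.norm(np.diff(Z, axis=0), axis=1)
    return np.where(gaps > gaps.mean() + gaps.std())[0] + 1


# Toy usage: three blocks of 20 frames with distinct feature means,
# so boundaries should be recovered near frames 20 and 40.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 0.3, size=(20, 16)) for m in (0.0, 2.0, 4.0)])
print(segment(dge_sketch(X)))
```

The toy usage is only meant to show the alternation: each pass of step 1 reshapes the graph around the current embedding, and each pass of step 2 pulls temporally and semantically linked images closer together, which is what makes simple boundary detection on the final embedding possible.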