Abstract: The problem of modeling the dynamic structure of human activities is considered. Video is mapped
to a semantic feature space, which encodes activity attribute probabilities over time. The binary dynamic
system (BDS) model is proposed to jointly learn the
distribution and dynamics of activities in this space.
This is a non-linear dynamic system that combines binary observation variables and a hidden Gauss-Markov
state process, extending both binary principal component analysis (PCA) and the classical linear dynamic
system (LDS). A BDS learning algorithm, inspired by
the popular dynamic texture, and a dissimilarity measure between BDSs, which generalizes the Binet-Cauchy
kernel, are introduced. To enable the recognition of
highly non-stationary activities, the BDS is embedded
in a bag of words. An algorithm is introduced for learning a BDS codebook, enabling the use of the BDS as a
visual word for attribute dynamics (WAD). Short-term
video segments are then quantized with a WAD codebook, allowing the representation of video as a bag-of-words for attribute dynamics (BoWAD). Video sequences are finally encoded as vectors of locally aggregated descriptors (VLAD), which summarize the first moments of video snippets on the BDS manifold. Experiments show that this representation achieves state-of-the-art performance on the tasks of complex activity
recognition and event identification.
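
As an illustration of the model structure described above, the following is a minimal sketch of sampling from a system with a hidden Gauss-Markov state and binary observations. The abstract does not give the equations, so several things here are assumptions: the logistic link between state and attribute probability (the usual choice in binary PCA), the parameter names A (state transition), C (observation matrix) and u (bias), the isotropic state noise, and the function name sample_bds, which is hypothetical. It is not the learning algorithm of the paper, only a generative sketch.

```python
import math
import random


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sample_bds(A, C, u, x0, steps, noise_std=0.1, seed=0):
    """Sample a sequence from an assumed binary dynamic system.

    Hidden state (Gauss-Markov):  x_{t+1} = A x_t + v_t,  v_t ~ N(0, noise_std^2 I)
    Observation (binary):         y_t[i] ~ Bernoulli(sigmoid(C[i] . x_t + u[i]))

    A is n x n, C is k x n, u has length k, x0 has length n (lists of floats).
    Returns (states, probs, obs), each a list with one entry per time step.
    """
    rng = random.Random(seed)
    n, k = len(x0), len(C)
    x = list(x0)
    states, probs, obs = [], [], []
    for _ in range(steps):
        # Attribute probabilities given the current hidden state.
        p = [sigmoid(sum(C[i][j] * x[j] for j in range(n)) + u[i])
             for i in range(k)]
        # Binary attribute observations drawn from those probabilities.
        y = [1 if rng.random() < p_i else 0 for p_i in p]
        states.append(x)
        probs.append(p)
        obs.append(y)
        # Linear state transition with additive Gaussian noise.
        x = [sum(A[i][j] * x[j] for j in range(n)) + rng.gauss(0.0, noise_std)
             for i in range(n)]
    return states, probs, obs


# Example: a 2-D slowly rotating state driving 3 binary attributes.
A = [[0.9, -0.2], [0.2, 0.9]]
C = [[2.0, 0.0], [0.0, 2.0], [-1.5, 1.5]]
u = [0.0, 0.0, -0.5]
states, probs, obs = sample_bds(A, C, u, x0=[1.0, 0.0], steps=50)
```

Each entry of obs is a binary attribute vector for one time step, and probs holds the corresponding attribute probabilities, which play the role of the semantic feature space mentioned in the abstract.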