Abstract: Extracting multiscale contextual information and
higher-order correlations among skeleton sequences
using Graph Convolutional Networks (GCNs) alone is
inadequate for effective action classification. Hypergraph
convolution addresses these issues but cannot capture
long-range dependencies. The transformer proves to
be effective in capturing these dependencies and making
complex contextual features accessible. We propose
an Autoregressive Adaptive HyperGraph Transformer
(AutoregAd-HGformer) model for in-phase (autoregressive and discrete) and out-phase (adaptive) hypergraph
generation. The vector quantized in-phase hypergraph
equipped with powerful autoregressively learned priors
produces a more robust and informative representation
suitable for hyperedge formation. The out-phase hypergraph generator provides a model-agnostic hyperedge
learning technique to align the attributes with input
skeleton embedding. The hybrid (supervised and unsupervised) learning in AutoregAd-HGformer explores
action-dependent features along spatial, temporal, and
channel dimensions. Extensive experimental results
and ablation study indicate the superiority of our model
over state-of-the-art hypergraph architectures on the NTU
RGB+D, NTU RGB+D 120, and NW-UCLA datasets.