Abstract: Graph Convolutional Networks (GCNs) have been widely used in skeleton-based action recognition. In GCN-based approaches, graph topology dominates feature aggregation, so extracting the complex relationships between joints is key to generating the spatial-temporal topology of the skeletal graph. We note that current methods tend to construct the topology matrix from the spatial dimension and rarely incorporate features from the temporal dimension. This paper proposes a Temporal Topology Aggregation Graph Convolutional Network (TTA-GCN) that learns temporal topology dynamically and efficiently aggregates topology structure along the channel dimension for skeleton-based action recognition. In addition, the multi-stream ensemble framework significantly improves action recognition accuracy, and fusing multiple streams requires more than the single natural skeleton modality. We therefore present a multi-modal representation based on the semantics of the human skeleton to capture relationships between joints that are not naturally connected. Extensive experiments show that our model achieves state-of-the-art performance on three large, widely accepted datasets: NTU-RGB+D 60, NTU-RGB+D 120, and Northwestern-UCLA. Finally, we evaluate the effectiveness of our model through various comparison experiments.