Abstract: Highlights
• The proposed architecture captures local and global features between joints more effectively and fully fuses the different modalities.
• By adding physical topology constraints of the human skeleton and integrating multi-scale temporal convolution with a transformer, the model learns richer information in both the spatial and temporal dimensions.
• A new joint-bone cross fusion transformer (JBC-Former) is proposed, which dynamically generates complementary features based on the original input.
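The highlights do not specify how the joint and bone modalities are fused, so the following is only a minimal sketch of the general idea of cross-modal attention, not the JBC-Former itself. It assumes a hypothetical 5-joint toy skeleton (`PARENTS`), derives bone vectors as child joint minus parent joint (a common convention in skeleton-based models, assumed here), and lets each stream attend to the other with plain scaled dot-product attention. The function names, the absence of learned projections, and the residual addition are all illustrative assumptions.

```python
import math

# Hypothetical 5-joint toy skeleton: PARENTS[v] is the parent of joint v
# (the root, joint 0, is its own parent). This is NOT the paper's skeleton graph.
PARENTS = [0, 0, 1, 2, 3]


def bones_from_joints(frame):
    """Map V joint coordinates to V bone vectors (child minus parent)."""
    return [[c - p for c, p in zip(frame[v], frame[PARENTS[v]])]
            for v in range(len(frame))]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, context):
    """Scaled dot-product attention with queries from one modality and
    keys/values from the other. No learned projections (a simplification)."""
    scale = math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale
                           for k in context])
        out.append([sum(w * k[i] for w, k in zip(weights, context))
                    for i in range(len(q))])
    return out


def cross_fuse(joints, bones):
    """Each stream attends to the other and adds the result residually."""
    j_from_b = cross_attention(joints, bones)
    b_from_j = cross_attention(bones, joints)
    fused_j = [[x + y for x, y in zip(r, a)] for r, a in zip(joints, j_from_b)]
    fused_b = [[x + y for x, y in zip(r, a)] for r, a in zip(bones, b_from_j)]
    return fused_j, fused_b


# One frame of toy 3D joint coordinates (made-up values for illustration).
frame = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 2.0, 0.0],
         [1.0, 2.0, 0.0], [2.0, 2.0, 0.0]]
bones = bones_from_joints(frame)
fused_joints, fused_bones = cross_fuse(frame, bones)
```

In this sketch the complementary features for each stream are computed from the other stream's input, which loosely mirrors the "dynamically generates complementary features based on the original input" wording. A real model would add learned query, key, and value projections, multiple heads, and the temporal modules mentioned in the highlights.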
External IDs:dblp:journals/ijon/LiuCGLMW25