Abstract: Highlights
• PO-GUISE is a token selection method for video transformers, guided by human motion and activities of daily living (ADL).
• The resulting model improves the accuracy-GFLOPs trade-off during inference.
• Our model integrates heatmap tokens for temporal and multi-actor prediction.
• It sets new state-of-the-art results on ADL benchmarks at a reduced computational cost.
External IDs: dblp:journals/ivc/PizarroVBBB25