Abstract: Highlights
• A hybrid Vision Transformer boosts human pose estimation at medium and small scales.
• Two HRPM-derived insertion strategies improve performance from two perspectives.
• HRPVT outperforms HRNet-W48 while reducing complexity by 60%.