Abstract: Highlights
• We cast video grounding (VG) as a direct regression problem and present a simple yet effective framework (PRVG) for dense VG with accurate and efficient inference.
• We design a robust, scale-invariant proposal-level attention loss function to guide the training of PRVG and improve its performance.
• Extensive experiments demonstrate the superiority of PRVG and the effectiveness of the parallel decoding paradigm on the dense video grounding task.