Motion-guided token prioritization and semantic degradation fusion for exo-to-ego cross-view video generation
Highlights
• A novel video-based method, TPDF (motion-guided Token Prioritization and semantic Degradation Fusion), is proposed for the cue-free exo-to-ego video generation (E2VG) task.
• MSPT and MTPT incorporate motion cues and orthogonal constraints to adaptively identify informative tokens, ensuring spatially and temporally consistent generation.
• The SDF progressively learns egocentric semantics through a degradation learning mechanism.
• Built on a cascaded cross-/self-attention framework, the proposed CPD compensates for the degradation of egocentric semantic information and incorporates informative tokens at different granularities.
• TPDF achieves state-of-the-art performance on the cue-free E2VG task.
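The highlights describe motion-guided token prioritization only at a high level. The NumPy sketch below illustrates one plausible reading under stated assumptions; it is not the authors' MSPT/MTPT implementation. Assumed here: motion is approximated by the L2 norm of the feature difference between consecutive frames; "spatial" prioritization keeps the top-k tokens inside each frame; "temporal" prioritization keeps the top-k frames at each token position; and the orthogonal constraint is modeled as the common soft penalty ||WᵀW − I||²_F on a projection matrix W. All function names are hypothetical.

```python
import numpy as np


def motion_scores(tokens: np.ndarray) -> np.ndarray:
    """Per-token motion magnitude for a clip of shape (T, N, D).

    Assumption: motion is the L2 norm of the feature difference between
    consecutive frames; frame 0 reuses the score of the first difference.
    """
    if tokens.shape[0] < 2:
        raise ValueError("need at least two frames to estimate motion")
    diff = np.linalg.norm(tokens[1:] - tokens[:-1], axis=-1)  # (T-1, N)
    return np.concatenate([diff[:1], diff], axis=0)            # (T, N)


def spatial_topk(tokens: np.ndarray, k: int):
    """Hypothetical spatial prioritization: keep the k highest-motion tokens per frame.

    Returns the selected tokens (T, k, D) and their indices (T, k).
    """
    scores = motion_scores(tokens)
    idx = np.argsort(-scores, axis=1, kind="stable")[:, :k]  # highest motion first
    idx = np.sort(idx, axis=1)                               # restore original token order
    return np.take_along_axis(tokens, idx[..., None], axis=1), idx


def temporal_topk(tokens: np.ndarray, k: int):
    """Hypothetical temporal prioritization: keep the k highest-motion frames per token position.

    Returns the selected tokens (k, N, D) and their frame indices (k, N).
    """
    scores = motion_scores(tokens)
    idx = np.argsort(-scores, axis=0, kind="stable")[:k]  # highest motion first
    idx = np.sort(idx, axis=0)                            # restore temporal order
    return np.take_along_axis(tokens, idx[..., None], axis=0), idx


def orthogonality_penalty(w: np.ndarray) -> float:
    """Soft orthogonality constraint ||W^T W - I||_F^2 (assumed form)."""
    gram = w.T @ w
    return float(np.sum((gram - np.eye(gram.shape[0])) ** 2))
```

Sorting the selected indices keeps tokens in their original spatial or temporal order, which a downstream generator would likely need; whether TPDF does this, and how its orthogonal constraint is actually formulated, is not stated in the highlights.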