Dynamic scale position embedding for cross-modal representation learning

Published: 01 Jan 2026, Last Modified: 12 Nov 2025Neural Networks 2026EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: Highlights•We propose a position-based method for dynamic temporal scaling.•We design a single-encoder multi-scale temporal model.•We achieve consistent gains on 4 datasets in video retrieval and video captioning.
Loading