Abstract: Highlights•We propose a position-based method for dynamic temporal scaling.•We design a single-encoder multi-scale temporal model.•We achieve consistent gains on 4 datasets in video retrieval and video captioning.
External IDs:dblp:journals/nn/ShinKCK26
Loading