Abstract: Highlights
• Multi-stage regularization effectively enhances video captioning performance.
• The number of blocks determines the trade-off between regularization and performance.
• Inference can be fast as only the first captioning block is needed.
External IDs: dblp:journals/ivc/PutraJ24