Abstract: We present a new method for video retrieval that combines OpenAI's CLIP with Temporally Ordered Multi-query Scoring (TOMS). Our approach extends CLIP with a scoring function that matches multiple ordered queries, enabling fast and accurate video search while retaining CLIP's zero-shot capability. As a result, the method can be applied to any dataset without data annotation or model fine-tuning, both of which can be expensive, if not unaffordable, in Vietnam. In an extensive benchmark, our method outperforms CLIP alone on video search. We also present our solution for the Ho Chi Minh City AI Challenge 2023, which is built on this method and achieved competitive results.
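The abstract does not define the TOMS scoring function, so the following is only an illustrative sketch of one way a temporally ordered multi-query score could be computed, not the authors' formulation. It assumes a precomputed similarity matrix `sim`, where `sim[i][t]` is the similarity (for example, a CLIP cosine similarity) between the i-th text query and the t-th frame of a video. The function name `ordered_multi_query_score` and the choice to sum similarities are hypothetical.

```python
from typing import Sequence


def ordered_multi_query_score(sim: Sequence[Sequence[float]]) -> float:
    """Best total similarity when query i must match a frame strictly
    later than the frame matched by query i-1.

    sim[i][t] is the similarity between query i and frame t (assumed to
    be precomputed, e.g. from CLIP embeddings). Returns -inf when the
    video has fewer frames than there are queries.
    """
    if not sim:
        return 0.0
    neg_inf = float("-inf")
    n_frames = len(sim[0])

    # best[t]: best score for queries 0..i with query i placed at a frame <= t.
    best = [neg_inf] * n_frames
    running = neg_inf
    for t in range(n_frames):
        running = max(running, sim[0][t])
        best[t] = running

    for i in range(1, len(sim)):
        new_best = [neg_inf] * n_frames
        running = neg_inf
        for t in range(i, n_frames):
            # Place query i at frame t; earlier queries use frames before t.
            candidate = best[t - 1] + sim[i][t]
            running = max(running, candidate)
            new_best[t] = running
        best = new_best

    return best[-1] if n_frames else neg_inf
```

Under these assumptions, videos could be ranked by this score for a given query sequence. The dynamic program runs in time proportional to the number of queries times the number of frames, and it adds no trainable parameters, so the zero-shot property of the underlying similarity model is unaffected.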