CDC: Color-Based Diffusion Model with Caption Embedding in VBS 2022

Published: 01 Jan 2022, Last Modified: 05 Nov 2023, Venue: MMM (2) 2022, Readers: Everyone
Abstract: With the rapid development of the internet and technology, the amount of information that people need to store is growing explosively. This places a burden on search engines, which are expected to find the requested items within seconds or less. As a result, information retrieval tasks are attracting increasing attention in the research community. Video Browser Showdown (VBS) is an annual competition in which researchers can evaluate their systems and compare them with others on the provided benchmarks. Given a query, which can be in the form of text or a short video, the system is supposed to return the video most relevant to the information in the query. In this work, we introduce CDC: a video browser system that uses our proposed Color-based Diffusion model together with a Caption embedding method inherited from the current state-of-the-art vision-language model, Oscar. To the best of our knowledge, Oscar currently holds the best performance in cross-modal vision-language modeling, while the color-based feature with diffusion helps enhance the search process.