Abstract: The increasing volume of multimedia content has intensified the demand for video retrieval systems that can efficiently and accurately extract relevant information from large-scale archives. However, existing methods frequently encounter challenges when dealing with ambiguous queries, particularly those involving complex temporal relationships, often leading to incomplete or suboptimal retrieval results. To address these limitations, we propose a novel multimodal video retrieval system designed to handle a wide range of query types by integrating outputs from multiple search models. A central feature of the system is its advanced temporal search mechanism, which improves ambiguity resolution by conducting additional searches within adjacent video shots, rather than relying solely on chronological order. The effectiveness of the proposed system is demonstrated through its performance in the 2024 Ho Chi Minh AI Challenge.
External IDs: doi:10.1007/978-981-96-4291-5_14