Abstract: In today's multimedia-rich environment, the rapid growth of data poses significant challenges for building efficient multi-modal retrieval systems that handle text, images, audio, and video. As data volumes expand, scalable, high-performance retrieval systems become increasingly necessary. Embedding-based deep neural networks (DNNs) have become key solutions: they transform high-dimensional data into lower-dimensional embeddings that are easy to compare and retrieve. However, updating a DNN changes its internal feature representations, so new feature vectors must be extracted for all gallery data, which is costly, especially when the gallery comprises billions of items. Learning backward-compatible representations addresses this by allowing new representations to be matched against old gallery data without recomputing the gallery features. This tutorial aims to equip participants with the knowledge and tools to apply backward-compatible representations and thereby improve the efficiency and scalability of multimedia retrieval systems. Participants will learn why compatible representations matter, study basic methods and techniques, and explore challenging open questions that are becoming increasingly relevant to multimedia and cross-modal retrieval.
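To make the idea of matching new queries against an old gallery concrete, the following is a minimal sketch rather than a method from the tutorial. It assumes a toy setting in which the "new model" is simulated by adding a small perturbation to the stored old features; the names `new_model_embed`, `cosine_search`, and `drift` are hypothetical and chosen only for illustration. It shows the evaluation pattern that backward compatibility targets: queries are embedded with the new model, while the gallery keeps the features extracted by the old one.

```python
import math
import random


def l2_normalize(v):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_search(query, gallery):
    """Return the index of the gallery vector most similar to the query."""
    q = l2_normalize(query)
    scores = [sum(a * b for a, b in zip(q, l2_normalize(g))) for g in gallery]
    return max(range(len(gallery)), key=scores.__getitem__)


random.seed(0)
dim, n_items = 16, 50

# Stand-in for features extracted once by the OLD model and kept in the index.
old_gallery = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_items)]


def new_model_embed(item_id, drift=0.05):
    # ASSUMPTION: a compatible new model is simulated as "old feature plus
    # small noise". A real system would obtain this property through training.
    return [x + random.gauss(0, drift) for x in old_gallery[item_id]]


# Cross-model retrieval: new-model queries, old-model gallery, no re-extraction.
hits = sum(
    cosine_search(new_model_embed(i), old_gallery) == i for i in range(n_items)
)
cross_model_top1 = hits / n_items
```

In this toy setup, `cross_model_top1` measures how often a new-model query retrieves its own item from the unchanged old gallery. Raising `drift` mimics a new model whose features have moved further from the old embedding space, which is the situation where gallery features would otherwise need to be recomputed.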