A multi-modal lecture video indexing and retrieval framework with multi-scale residual attention network and multi-similarity computation
Abstract: Technological development has led to the mass production of videos and their storage on the Internet, making a huge number of videos available on websites from various sources. Consequently, retrieving relevant lecture videos from this multimedia collection is difficult. This paper therefore proposes an effective deep-learning-based method for indexing and retrieving videos by considering various similarities among video features. Lecture videos are obtained from a standardized dataset for training. Optimal keyframes are selected from these videos using the Adaptive Anti-Coronavirus Optimization Algorithm, and the video contents are then segmented and arranged on the basis of the optimized keyframes. Optical characters, such as semantic words and keywords, are recognized by means of Optical Character Recognition (OCR), and image features are extracted from the segmented frames with a Multi-scale Residual Attention Network (MRAN). The generated pool of features is organized and stored in the database according to content. Text and video queries are given as input for testing the trained model: in the testing phase, the features of the text query and the features of the optimized keyframes of the video query are obtained with the MRAN. These query features are compared with the features stored in the database using the Cosine, Jaccard, and Euclidean similarity indices, and the resulting multi-similarity scores are used to retrieve the videos relevant to the provided query. Experimental results show that the proposed system for video indexing and retrieval outperforms existing video-retrieval methods.
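The multi-similarity matching described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the equal-weight fusion rule, the binarization of features for the Jaccard index, and the distance-to-similarity mapping for the Euclidean index are all assumptions, since the abstract does not specify how the three indices are combined.

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between feature vectors a and b
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def jaccard_similarity(a, b):
    # Jaccard index over binarized features (shared nonzero dims / union);
    # binarization is an assumption, since Jaccard needs set-like inputs
    a_set, b_set = set(np.flatnonzero(a)), set(np.flatnonzero(b))
    if not a_set and not b_set:
        return 1.0
    return len(a_set & b_set) / len(a_set | b_set)

def euclidean_similarity(a, b):
    # Map Euclidean distance into (0, 1]; smaller distance -> higher score
    return 1.0 / (1.0 + float(np.linalg.norm(a - b)))

def multi_similarity(query, stored, weights=(1/3, 1/3, 1/3)):
    # Hypothetical equal-weight fusion of the three similarity indices
    scores = (cosine_similarity(query, stored),
              jaccard_similarity(query, stored),
              euclidean_similarity(query, stored))
    return sum(w * s for w, s in zip(weights, scores))

def retrieve(query_features, database, top_k=3):
    # Rank stored feature vectors (video_id -> features) by fused similarity
    ranked = sorted(database.items(),
                    key=lambda kv: multi_similarity(query_features, kv[1]),
                    reverse=True)
    return [video_id for video_id, _ in ranked[:top_k]]
```

In a full pipeline, `query_features` would come from the MRAN encoding of the text or video query, and `database` would hold the pooled features indexed in the training phase.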