Multimedia video analytics using deep hybrid fusion algorithm

Published: 2025 · Last Modified: 04 Jan 2026 · Multim. Tools Appl. 2025 · CC BY-SA 4.0
Abstract: In recent times, multimedia video analytics has become one of the trending areas of research, focusing on understanding the different modalities of user-generated data. A sophisticated approach to this problem is to develop an efficient fusion technique that can handle multimedia data such as video, which combines text, image and speech. However, most prior work has failed to bridge the gap between modalities, since video signals are heterogeneous and invariant, and this poses a major challenge even today. In contrast to previous research, the proposed study presents a novel framework for multimodal video analysis, "Multimedia Video Analytics using Deep Hybrid Fusion Algorithm", a smart video-analytics framework with three major modules: video feature extraction, modality representation and fusion. First, the three modalities of video (text, image and speech) are extracted and represented in a subspace as six hidden vectors using a deep learning approach called Modality Unchanged Precise Representation (MUR), which uses an encoder-decoder representation of BiLSTM. Then a novel video fusion technique, the Deep Hybrid Fusion (DHF) algorithm, built on an attention-based transformation with softmax suppression, fuses the six hidden vectors in the subspace for downstream task prediction. The proposed DHF approach is compared against LSTM fusion variants such as MFN, TFN, MRN, MRMF and MV-LSTM, and is applied to the humor detection task on classic video datasets such as IEMOCAP, CMU-MOSI and CMU-MOSEI. Measured by precision, recall, F-measure and accuracy, the proposed DHF algorithm outperformed the baselines, achieving a best 7-class accuracy of 95.84%.
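The fusion step the abstract describes — attention-based weighting with softmax over the six subspace hidden vectors — can be sketched roughly as follows. This is a minimal illustration, not the paper's actual DHF parameterization: the learned query vector and the scaled dot-product scoring are assumptions made here for the sketch.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over the last axis
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention_fuse(hidden_vectors, query):
    """Fuse modality hidden vectors via softmax attention.

    hidden_vectors: (n, d) array, e.g. the six subspace vectors
                    (text/image/speech from a BiLSTM encoder-decoder)
    query:          (d,) query vector (hypothetical learned parameter)
    Returns the (d,) fused representation.
    """
    H = np.asarray(hidden_vectors)               # (n, d)
    scores = H @ query / np.sqrt(H.shape[1])     # scaled dot-product scores
    weights = softmax(scores)                    # attention weights, sum to 1
    return weights @ H                           # weighted sum over modalities

# Toy example: six 8-dimensional hidden vectors, random query.
rng = np.random.default_rng(0)
h = rng.standard_normal((6, 8))
q = rng.standard_normal(8)
fused = attention_fuse(h, q)  # (8,) fused vector for task prediction
```

In this sketch the softmax acts as the "suppression" mechanism: modalities whose hidden vectors score poorly against the query receive near-zero weight and contribute little to the fused representation.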