TSFC: A Hierarchical Classification Framework for Multimodal Commercial Contents in Bengali

Published: 01 Jan 2026, Last Modified: 27 May 2026 | IEEE Access | CC BY-SA 4.0
Abstract: Social media has become a crucial platform for businesses to promote their products, significantly affecting consumer behavior and advertising success. Automatically identifying commercial content across the diverse social media landscape is therefore vital for targeted advertising and brand monitoring. Despite significant progress in multimodal content classification for high-resource languages (HRLs) in recent years, comparable advances in domain-specific tasks for low-resource languages (LRLs) remain in their infancy. This paper introduces MHDC3 (Multimodal Hierarchical Dataset for Commercial Content Classification), a novel dataset of 5,007 Bengali-language social media posts, categorized at the top level into commercial (Com) and non-commercial (NCom) classes. Beyond this binary classification, the work extends the task to fine-grained categorization, dividing commercial posts into four categories: Fashion (Fa), Food (Fo), Lifestyle (LS), and Trends and Tech (T&T). The paper also proposes TSFC (TextSelf-FusionCross), a cross-attention-based multimodal model designed for both the coarse-grained and the fine-grained classification tasks. TSFC applies self-attention over textual features and then fuses them with visual representations via a cross-attention mechanism, integrating cues from both modalities. Extensive experiments on MHDC3 show that TSFC outperforms several state-of-the-art multimodal baselines, achieving the highest F1-scores: 94.38% for coarse-grained and 94.80% for fine-grained classification. These results highlight the efficacy of attention-based multimodal fusion for detecting commercial content in low-resource languages such as Bengali.
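The abstract describes TSFC's fusion order (self-attention over text, then cross-attention against visual features) and a coarse-to-fine label scheme. The plain-Python sketch below illustrates that order under several assumptions that the abstract does not confirm: text and image features are already encoded into vectors of one shared dimension; text tokens act as cross-attention queries while image patches supply keys and values; the residual additions, mean pooling, and bias-free linear heads are hypothetical; and learned Q/K/V projections, multiple heads, and normalization are omitted. Routing the fine-grained prediction through the coarse Com/NCom decision is only one possible reading of the "hierarchical" framing, not a description of the paper's actual training or inference setup. The names `fuse`, `classify`, `linear`, and `FINE_LABELS` are illustrative.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]

# Hypothetical ordering of the four fine-grained commercial classes.
FINE_LABELS = ["Fa", "Fo", "LS", "T&T"]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(row: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    scale = math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys])
        # Weighted sum of the value rows for this query.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum, used here as an assumed residual connection."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def fuse(text: Matrix, image: Matrix) -> List[float]:
    """Text self-attention, then text-to-image cross-attention, then mean pooling.

    Assumes text tokens and image patches already share dimension d.
    """
    text = add(text, attention(text, text, text))     # self-attention over text
    fused = add(text, attention(text, image, image))  # text queries, visual keys/values
    return [sum(col) / len(fused) for col in zip(*fused)]


def linear(weights: Matrix, x: Sequence[float]) -> List[float]:
    """Hypothetical bias-free linear head: one weight row per output class."""
    return [dot(row, x) for row in weights]


def classify(fused: Sequence[float], coarse_w: Matrix, fine_w: Matrix) -> str:
    """Two-stage routing: Com vs. NCom first; fine head only for Com posts."""
    p_ncom, p_com = softmax(linear(coarse_w, fused))
    if p_com < p_ncom:
        return "NCom"
    fine = softmax(linear(fine_w, fused))
    return FINE_LABELS[fine.index(max(fine))]


if __name__ == "__main__":
    text = [[1.0, 0.0], [0.0, 1.0]]   # two toy token embeddings
    image = [[2.0, 0.0], [0.0, 2.0]]  # two toy patch embeddings
    vec = fuse(text, image)           # approximately [2.0, 2.0] for these symmetric inputs
    print(classify(vec,
                   coarse_w=[[0.0, 0.0], [1.0, 0.0]],
                   fine_w=[[0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [0.0, 0.0]]))  # Fo
```

Sending text through self-attention first lets each token gather sentence-level context before it queries the image, which matches the order the abstract gives. Swapping the query and key/value roles in the cross-attention step would yield the image-as-query variant, which the abstract neither confirms nor rules out.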