Inceptive Transformers: Enhancing Contextual Representations through Multi-Scale Feature Learning Across Domains and Languages

ACL ARR 2025 May Submission6878 Authors

20 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · License: CC BY 4.0
Abstract: Conventional transformer models typically compress the information from all tokens in a sequence into a single $\texttt{[CLS]}$ token to represent the global context, an approach that can lead to information loss in tasks requiring localized or hierarchical cues. In this work, we introduce the $\textit{Inceptive Transformer}$, a modular and lightweight architecture that enriches transformer-based token representations by integrating a multi-scale feature extraction module inspired by inception networks. Our model is designed to balance local and global dependencies by dynamically weighting tokens based on their relevance to a particular task. Evaluation across a diverse range of tasks, including emotion recognition (in both English and Bangla), irony detection, disease identification, and anti-COVID-vaccine tweet classification, shows that our models consistently outperform the baselines by 1\% to 14\% while maintaining efficiency. These findings highlight the versatility and cross-lingual applicability of our method for enriching transformer-based representations across diverse domains.
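The abstract describes an inception-style, multi-scale feature extraction module with token-level relevance weighting placed on top of a transformer encoder; the paper's actual layer configuration is not given here. The sketch below is a hypothetical PyTorch illustration of that general idea, assuming parallel 1-D convolutions over token embeddings and a learned gating layer; the class name `InceptiveBlock`, the kernel sizes, and the pooling scheme are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class InceptiveBlock(nn.Module):
    """Hypothetical multi-scale (inception-style) enrichment of token states.

    Parallel 1-D convolutions with different kernel sizes capture local
    context at several scales; their outputs are concatenated, projected
    back to the model dimension, and then re-weighted per token by a
    learned relevance gate before pooling into a sequence representation.
    """

    def __init__(self, hidden_size=768, branch_size=192, kernel_sizes=(1, 3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv1d(hidden_size, branch_size, k, padding=k // 2) for k in kernel_sizes]
        )
        self.proj = nn.Linear(branch_size * len(kernel_sizes), hidden_size)
        self.gate = nn.Linear(hidden_size, 1)  # token-relevance weighting

    def forward(self, token_states):           # token_states: (batch, seq_len, hidden)
        x = token_states.transpose(1, 2)        # Conv1d expects (batch, hidden, seq_len)
        multi_scale = torch.cat([torch.relu(b(x)) for b in self.branches], dim=1)
        enriched = self.proj(multi_scale.transpose(1, 2))     # (batch, seq_len, hidden)
        weights = torch.softmax(self.gate(enriched), dim=1)   # per-token weights over seq
        pooled = (weights * enriched).sum(dim=1)              # weighted pooling
        return pooled                                          # (batch, hidden)


# Example usage on top of encoder outputs of shape (batch, seq_len, hidden):
block = InceptiveBlock()
sequence_repr = block(torch.randn(4, 64, 768))
```

In this reading, the weighted pooling replaces reliance on a single $\texttt{[CLS]}$ vector, which is how the abstract frames the motivation for the module.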
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: Model architectures, Efficient models, Applications, Multilingualism and Cross-Lingual NLP, Generalization of NLP Models
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Approaches to low-resource settings
Languages Studied: English, Bangla
Submission Number: 6878