Inceptive Transformer: Augmenting Transformer Models with Multi-Scale Feature Learning for Generalized Cross-Domain Text Classification

ACL ARR 2025 February Submission 7129 Authors

16 Feb 2025 (modified: 09 May 2025) · CC BY 4.0
Abstract: In this work, we introduce $\textit{Inceptive Transformers}$, an architecture designed to enhance transformer-based models by incorporating a multi-scale feature extraction module inspired by inception networks. Unlike conventional transformers, which compress information from all tokens into a single $\texttt{[CLS]}$ token to capture the global context of the sequence, our model balances local and global dependencies by dynamically weighting token interactions, enriching their representations for downstream tasks. We propose a generalizable framework that can be integrated into both domain-specific pre-trained models (e.g., BERTweet, BioBERT, CT-BERT) and general-purpose models such as RoBERTa. We evaluate our models on a diverse range of text classification tasks, including emotion recognition, irony detection, disease identification, and anti-COVID-vaccine tweet classification, covering both multi-class and multi-label settings. Results show that our models consistently outperform baseline transformers by 1% to 9% while maintaining efficiency, highlighting the versatility and generalization capabilities of Inceptive Transformers across diverse domains and applications.
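
The abstract does not specify the module's internals, so the sketch below is only a minimal illustration of the general idea it describes: inception-style parallel convolutions with different receptive fields over transformer token embeddings, followed by learned per-token weights for pooling instead of relying solely on the $\texttt{[CLS]}$ token. The class name `InceptiveBlock`, the kernel sizes, and all dimensions are hypothetical placeholders, not the authors' actual design.

```python
# Illustrative sketch only; names, kernel sizes, and pooling scheme are assumptions.
import torch
import torch.nn as nn


class InceptiveBlock(nn.Module):
    """Multi-scale feature extractor over transformer token embeddings."""

    def __init__(self, hidden_size: int, branch_size: int = 128,
                 kernel_sizes=(1, 3, 5)):
        super().__init__()
        # Parallel 1D convolutions with different receptive fields (local context).
        self.branches = nn.ModuleList(
            nn.Conv1d(hidden_size, branch_size, k, padding=k // 2)
            for k in kernel_sizes
        )
        fused = branch_size * len(kernel_sizes)
        # Scores each token so the sequence can be pooled with learned weights.
        self.token_scorer = nn.Linear(fused, 1)

    def forward(self, token_states, attention_mask):
        # token_states: (batch, seq_len, hidden_size); attention_mask: (batch, seq_len)
        x = token_states.transpose(1, 2)                # (batch, hidden, seq)
        feats = torch.cat([torch.relu(b(x)) for b in self.branches], dim=1)
        feats = feats.transpose(1, 2)                   # (batch, seq, fused)
        scores = self.token_scorer(feats).squeeze(-1)   # (batch, seq)
        scores = scores.masked_fill(attention_mask == 0, float("-inf"))
        weights = torch.softmax(scores, dim=-1).unsqueeze(-1)
        return (weights * feats).sum(dim=1)             # weighted token pooling
```

In such a setup, the block would sit on top of the last hidden states of a pre-trained encoder (e.g., RoBERTa or BioBERT), and the pooled vector would feed a standard classification head for the multi-class or multi-label tasks mentioned above.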
Paper Type: Long
Research Area: Special Theme (conference specific)
Research Area Keywords: model architectures, efficient models, applications, domain adaptation, generalization of NLP Models
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Submission Number: 7129