Keywords: attention, pretraining, finetuning, classification
Abstract: State-of-the-art Extreme Multi-Label Text Classification models rely on multi-label attention to focus on key tokens in input text, but learning good attention weights is challenging.
We introduce PLANT — Pretrained and Leveraged Attention — a plug-and-play strategy for initializing attention.
PLANT works by planting label-specific attention using a pretrained Learning-to-Rank model guided by mutual information gain.
This architecture-agnostic approach integrates seamlessly with large language model backbones (e.g., Mistral, LLaMA, DeepSeek, and Phi-3).
PLANT outperforms state-of-the-art methods across tasks such as ICD coding, legal topic classification, and content recommendation.
Gains are especially pronounced in few-shot settings, with substantial improvements on rare labels. Ablation studies confirm that attention initialization is a key driver of these gains.
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 15590
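The abstract describes planting label-specific attention from a pretrained ranking signal ordered by mutual information gain. Below is a minimal, hedged sketch of that general idea, not the authors' implementation: the function name `init_label_attention`, the matrix shapes, and the use of scikit-learn's `mutual_info_classif` as a stand-in for the paper's pretrained Learning-to-Rank model are all assumptions made for illustration.

```python
# Illustrative sketch only: approximates the pretrained Learning-to-Rank
# ranking with a mutual-information ranking over token counts.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def init_label_attention(token_counts, labels, token_embeddings, top_k=20):
    """Build one attention-seed vector per label.

    token_counts:     (num_docs, vocab_size) bag-of-tokens matrix
    labels:           (num_docs, num_labels) binary label matrix
    token_embeddings: (vocab_size, hidden_dim) embeddings from a pretrained backbone
    Returns:          (num_labels, hidden_dim) matrix used to initialize
                      label-specific attention parameters.
    """
    num_labels = labels.shape[1]
    hidden_dim = token_embeddings.shape[1]
    attention_init = np.zeros((num_labels, hidden_dim))
    for l in range(num_labels):
        # Rank vocabulary tokens by mutual information gain with label l.
        mi = mutual_info_classif(token_counts, labels[:, l], discrete_features=True)
        top_tokens = np.argsort(mi)[-top_k:]
        # "Plant" the label's attention as a weighted average of the
        # embeddings of its most informative tokens.
        weights = mi[top_tokens] / (mi[top_tokens].sum() + 1e-12)
        attention_init[l] = weights @ token_embeddings[top_tokens]
    return attention_init

# Toy usage with random data; in practice the embeddings would come from an
# LLM backbone (e.g., Mistral or LLaMA) and the ranking from the pretrained
# Learning-to-Rank model described in the abstract.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.integers(0, 3, size=(100, 500))   # token counts
    Y = rng.integers(0, 2, size=(100, 10))    # binary labels
    E = rng.normal(size=(500, 64))            # token embeddings
    W = init_label_attention(X, Y, E, top_k=10)
    print(W.shape)  # (10, 64)
```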