Don’t Just Pay Attention, PLANT It: Transfer L2R Models to Fine-tune Attention in Extreme Multi-Label Text Classification For ICD Coding
Abstract: State-of-the-art Extreme Multi-Label Text Classification (XMTC) models rely heavily on multi-label attention layers to focus on key tokens in input text, but obtaining optimal attention weights is challenging and resource-intensive. To address this, we introduce \plant — \textbf{P}retrained and \textbf{L}everaged \textbf{A}tte\textbf{NT}ion — a novel transfer learning strategy for fine-tuning XMTC decoders.
\plant surpasses existing state-of-the-art methods across all metrics on the MIMIC-III, MIMIC-III-top50, and MIMIC-IV datasets. It particularly excels in few-shot ICD coding, outperforming previous models specifically designed for few-shot scenarios by over 50 percentage points in F1 scores on MIMIC-III-rare50 and by over 36 percentage points on MIMIC-III-few, demonstrating its superior capability in handling rare codes. \plant also shows remarkable data efficiency in few-shot settings, achieving precision comparable to traditional models with significantly less data.
These results are achieved through several key technical innovations: leveraging a pretrained Learning-to-Rank (L2R) model as the planted attention layer, integrating mutual-information gain to enhance attention, introducing an inattention mechanism, and implementing a stateful decoder to maintain context. Comprehensive ablation studies validate the importance of these contributions to the observed performance gains.
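To make the core idea of "planting" attention concrete, the sketch below shows one plausible way a pretrained L2R model's label representations could initialize the per-label attention queries of an XMTC decoder, which are then fine-tuned with the classifier. This is a minimal illustration only, not the authors' implementation: all identifiers (e.g., PlantedLabelAttention, l2r_label_embeddings) are hypothetical, and the sketch omits the paper's mutual-information gain, inattention mechanism, and stateful decoder.

```python
# Hypothetical sketch of "planted" label-wise attention for XMTC.
import torch
import torch.nn as nn


class PlantedLabelAttention(nn.Module):
    """Label-wise attention whose query matrix is initialized ("planted")
    from a pretrained L2R model instead of being learned from scratch."""

    def __init__(self, hidden_dim: int, num_labels: int,
                 l2r_label_embeddings: torch.Tensor = None):
        super().__init__()
        # One attention query per label: (num_labels, hidden_dim).
        self.label_queries = nn.Parameter(torch.empty(num_labels, hidden_dim))
        if l2r_label_embeddings is not None:
            # Transfer step: copy pretrained L2R label vectors as attention weights.
            self.label_queries.data.copy_(l2r_label_embeddings)
        else:
            nn.init.xavier_uniform_(self.label_queries)
        # Per-label binary classifier over the attended representations.
        self.classifier = nn.Linear(hidden_dim, 1)

    def forward(self, token_states: torch.Tensor) -> torch.Tensor:
        # token_states: (batch, seq_len, hidden_dim) from the text encoder.
        scores = torch.einsum("bsh,lh->bls", token_states, self.label_queries)
        attn = torch.softmax(scores, dim=-1)            # (batch, labels, seq_len)
        label_repr = torch.bmm(attn, token_states)      # (batch, labels, hidden_dim)
        return self.classifier(label_repr).squeeze(-1)  # (batch, labels) logits


if __name__ == "__main__":
    hidden, labels = 256, 50
    pretrained = torch.randn(labels, hidden)  # stand-in for L2R label vectors
    layer = PlantedLabelAttention(hidden, labels, l2r_label_embeddings=pretrained)
    logits = layer(torch.randn(2, 128, hidden))
    print(logits.shape)  # torch.Size([2, 50])
```

The design choice illustrated here is simply that the attention parameters start from a representation already trained to rank labels against text, rather than from random initialization, which is the intuition behind fine-tuning rather than learning attention from scratch.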
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: NLP Applications, Efficient/Low-Resource Methods for NLP, Information Extraction, Information Retrieval and Text Mining, Language Modeling, Interpretability and Analysis of Models for NLP
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Reproduction study, Approaches to low-resource settings, Approaches to low-compute settings (efficiency)
Languages Studied: English
Submission Number: 437