Keywords: few-shot learning, few-shot text classification, classification head design, frozen encoder, representation learning, LSTM, attention mechanisms, data augmentation, controlled experiments, ablation studies, parameter-efficient learning, low-resource NLP, model efficiency, inductive bias, component analysis
TL;DR: Frozen BERT, varied head: a compact LSTM head with attention plus synonym augmentation boosts few-shot accuracy on SST-2/5, RAFT, and AGNews. Ablations show both components add measurable gains with only ~3.1M trainable params.
Abstract: Few-shot text classification is often studied through model scaling or full fine-tuning, but less is known about how classification head design influences performance when representations are held fixed. This work examines that question under a controlled frozen-encoder setting, where a compact LSTM-based head is trained on top of contextual embeddings while all encoder parameters remain unchanged. We evaluate the effects of three design choices (recurrence, attention, and targeted synonym-based augmentation) across multiple few-shot benchmarks using a consistent protocol. Our experiments show that each component contributes measurable gains under tight data constraints, and that a small recurrent head can recover strong accuracy with only a few million trainable parameters. We report consistent improvements over simpler head configurations and competitive performance relative to compact transformer-based alternatives under identical conditions, while maintaining a low optimization footprint. These results provide evidence that head architecture and training choices remain consequential even with fixed contextual encoders, and they highlight a simple controlled framework for studying inductive biases in low-shot classification systems.
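The frozen-encoder setup the abstract describes can be pictured with a minimal PyTorch sketch. This is not the authors' code: the hidden sizes, the bidirectional LSTM, and the single-vector attention pooling are illustrative assumptions consistent with a head of a few million trainable parameters.

```python
# Minimal sketch of a frozen BERT encoder with a trainable LSTM+attention head.
# Dimensions and pooling choice are assumptions, not the paper's exact config.
import torch
import torch.nn as nn
from transformers import AutoModel

class AttentiveLSTMHead(nn.Module):
    def __init__(self, enc_dim=768, lstm_dim=256, num_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(enc_dim, lstm_dim, batch_first=True,
                            bidirectional=True)
        self.attn = nn.Linear(2 * lstm_dim, 1)          # per-token score
        self.classifier = nn.Linear(2 * lstm_dim, num_classes)

    def forward(self, hidden_states, attention_mask):
        out, _ = self.lstm(hidden_states)                # (B, T, 2*lstm_dim)
        scores = self.attn(out).squeeze(-1)              # (B, T)
        scores = scores.masked_fill(attention_mask == 0, float("-inf"))
        weights = torch.softmax(scores, dim=-1)          # attention over tokens
        pooled = (weights.unsqueeze(-1) * out).sum(1)    # weighted sum pooling
        return self.classifier(pooled)

encoder = AutoModel.from_pretrained("bert-base-uncased")
for p in encoder.parameters():                           # encoder stays frozen
    p.requires_grad = False

head = AttentiveLSTMHead()
# Training loop: pass encoder(input_ids, attention_mask).last_hidden_state
# to the head and optimize only head.parameters().
```

Under these assumptions only the head is updated, which matches the paper's controlled protocol of holding representations fixed while varying head design.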
Anonymization: This submission has been anonymized for double-blind review by removing identifying information such as names, affiliations, and URLs.
Style Files: I have used the style files.
Submission Number: 111