Abstract: Early-exit predictions in a deep Transformer network evolve from layer to layer in a fairly smooth process. This has been exploited in language modeling to improve factuality, based on the observation that factual associations emerge in later layers. We find that a similar process occurs in standard multiway text classification, which motivates Linear Layer Extrapolation, a method that achieves stable improvements by recasting contrastive inference as linear extrapolation. Experiments across multiple models and emotion classification datasets show that Linear Layer Extrapolation outperforms standard classification on fine-grained sentiment analysis tasks.
Paper Type: long
Research Area: Sentiment Analysis, Stylistic Analysis, and Argument Mining
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Data analysis
Languages Studied: English