Keywords: language models, zero-shot learning, commonsense reasoning, calibration
TL;DR: We investigate the effects of prompt engineering and calibration on small language models for multiple-choice commonsense reasoning.
Abstract: Prompt engineering and calibration make large language models excel at reasoning tasks, including multiple-choice commonsense reasoning. From a practical perspective, we investigate and evaluate these strategies on smaller language models. Through experiments on five commonsense reasoning benchmarks, we find that calibration favors GPT-2 and T5, prompt engineering favors Flan-T5, but their joint effects are mostly negative.
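To make the calibration strategy concrete, the following is a minimal sketch in the style of content-free (contextual) calibration for multiple choice: each answer choice's model probability is divided by the probability the model assigns that choice under a content-free input (e.g. "N/A"), then renormalized. The probabilities below are hypothetical, not results from the paper, and this is an illustration of the general technique rather than the paper's exact procedure.

```python
import numpy as np

def calibrate(choice_probs, content_free_probs):
    """Rescale answer-choice probabilities by the model's content-free
    prior over the same choices, then renormalize to sum to 1."""
    scores = np.asarray(choice_probs, dtype=float) / np.asarray(content_free_probs, dtype=float)
    return scores / scores.sum()

# Hypothetical model probabilities for three answer choices:
p_choices = [0.5, 0.3, 0.2]  # P(choice | question) from the model
p_free = [0.6, 0.2, 0.2]     # P(choice | content-free input), i.e. surface-form bias

calibrated = calibrate(p_choices, p_free)
print(calibrated)  # choice 1 now wins: the raw winner (choice 0) was inflated by prior bias
```

Here the raw argmax picks choice 0, but after dividing out the content-free prior, choice 1 is selected, which is exactly the kind of prediction flip calibration is meant to produce.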