Evaluate Confidence Instead of Perplexity for Unsupervised Commonsense Reasoning

Anonymous

17 Apr 2023 · ACL ARR 2023 April Blind Submission
Abstract: In this paper, we present a novel approach to unsupervised commonsense reasoning that outperforms conventional perplexity evaluation. Specifically, we propose non-replacement confidence (NRC), a score produced by a pre-trained token-corruption discriminator. We show that NRC is a more consistent metric for commonsense reasoning, as it treats synonyms as equally positive and supports learning from negative samples. Our experiments using the ELECTRA discriminator demonstrate that NRC significantly outperforms perplexity on both tuple-level and sentence-level commonsense knowledge databases. Moreover, NRC sets a new unsupervised state of the art (SOTA) on seven commonsense question answering tasks, outperforming even complex reasoning systems. In supervised learning, we find that NRC is the most effective metric for applying pre-trained knowledge to annotated data at inference time. In fact, without negative samples, NRC achieves between $82.8\%$ and $90.0\%$ of the performance of supervised methods, significantly outperforming other metrics under weaker supervision. To further improve NRC, we propose a new training scenario in which the discriminator is first pre-trained on positive samples, after which the NRC evaluation of negative samples is incorporated to tune the confidence. This approach outperforms conventional fine-tuning by an average of $2.0$ accuracy points. In summary, our research indicates that NRC is a superior metric to perplexity for learning commonsense knowledge under various supervision settings.
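As a rough illustration of the scoring idea described in the abstract: an ELECTRA-style discriminator assigns each token a probability of having been replaced (corrupted), and NRC can be read as the aggregated confidence that no token was replaced. The exact aggregation used by the paper is not given here, so the length-normalized log-probability below is an assumption; `replace_probs` stands in for the discriminator's per-token outputs, and the candidate sentences are invented toy values.

```python
import math

def nrc_score(replace_probs):
    """Sketch of non-replacement confidence (NRC).

    replace_probs: per-token probabilities, from a token-corruption
    discriminator (e.g. ELECTRA), that each token was replaced.
    Returns the mean log-probability that tokens were NOT replaced,
    so longer candidates are not penalized by length (an assumed
    normalization, not necessarily the paper's).
    """
    return sum(math.log(1.0 - p) for p in replace_probs) / len(replace_probs)

# Toy per-token replacement probabilities for two candidate answers:
# a commonsense-consistent one and an implausible one. A good
# discriminator should flag tokens of the implausible candidate.
plausible = [0.05, 0.10, 0.08]
implausible = [0.05, 0.60, 0.40]
assert nrc_score(plausible) > nrc_score(implausible)
```

In an unsupervised QA setting, each answer candidate would be scored this way and the highest-NRC candidate selected, analogous to choosing the lowest-perplexity candidate with a language model.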
Paper Type: long
Research Area: Interpretability and Analysis of Models for NLP