Extracting Salient Facts from Company Reviews with Scarce Labels

Jinfeng Li, Nikita Bhutani, Alexander Whedon, Chieh-Yang Huang, Estevam Hruschka, Yoshihiko Suhara

20 Jan 2022 | OpenReview Archive Direct Upload | Readers: Everyone

Abstract: In this paper, we propose the task of extracting salient facts from online company reviews. Salient facts present unique and distinctive information about a company, which helps users decide whether to apply to it. We formulate salient fact extraction as a text classification problem and leverage pre-trained language models to tackle it. However, the scarcity of salient facts in company reviews causes a serious label imbalance issue, which makes it difficult to take full advantage of pre-trained language models. To address this issue, we develop two data enrichment methods: representation enrichment, which highlights uncommon tokens by appending special tokens, and label propagation, which creates pseudo-positive examples from unlabeled data in an interactive manner. Experimental results on an online company review corpus show that our approach improves the F1 score of pre-trained language models by up to 0.24. We also confirm that our approach performs competitively against the state-of-the-art data augmentation method on the SemEval 2019 benchmark, even when trained with only 20% of the training data.
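To make the idea of representation enrichment concrete, the following is a minimal sketch and not the authors' implementation. The abstract only says that uncommon tokens are highlighted by appending special tokens; everything else here is an assumption for illustration: "uncommon" is taken to mean a document frequency at or below a threshold (`max_df`), the marker string `[UNCOMMON]` is hypothetical, it is appended once at the end of a sentence, and the toy corpus is invented.

```python
import re
from collections import Counter

# Hypothetical special token; the abstract does not name the actual one.
MARKER = "[UNCOMMON]"


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def document_frequency(corpus):
    """Count, for each token, how many sentences contain it."""
    df = Counter()
    for sentence in corpus:
        df.update(set(tokenize(sentence)))
    return df


def enrich(sentence, df, max_df=1):
    """Append MARKER when the sentence has a token seen in at most max_df sentences.

    Assumption: rarity is judged by document frequency; the paper may use
    a different criterion and a different placement for the special token.
    """
    rare = [t for t in tokenize(sentence) if df.get(t, 0) <= max_df]
    return f"{sentence} {MARKER}" if rare else sentence


# Invented toy reviews: only one sentence mentions something distinctive.
corpus = [
    "Great place to work",
    "Great people and benefits",
    "Good place to work with great people",
    "Free helicopter rides to work",
    "Good benefits and good people to work with",
]

df = document_frequency(corpus)
enriched = [enrich(s, df) for s in corpus]
```

In this toy corpus only the sentence about helicopter rides contains tokens that appear in a single review, so it is the only one that receives the marker. The enriched strings would then be passed to a text classifier in place of the raw sentences.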
