Polarity-Aware Semantic Retrieval with Fine-Tuned Sentence Embeddings

21 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: sentence embeddings, transformers, fine-tuning, classification, semantic textual similarity
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: This paper explores fine-tuning sentence embeddings to retrieve semantically similar sentences of the same class, evaluating the trade-off between semantic similarity and polarity to find an optimal configuration.
Abstract: This paper investigates the effectiveness of fine-tuning sentence embeddings for simultaneously retrieving sentences of equal polarity and high semantic similarity. We define two opposing metrics to support evaluation: Polarity Score and Semantic Similarity Score, used in a test suite covering various lightweight sentence-transformer models, hyperparameters, and loss functions. We perform evaluations on two binary classification problems from different domains: sentiment analysis on the SST-2 dataset and detection of sarcastic news headlines. Our findings reveal a trade-off between retaining semantic similarity and fine-tuning the model to differentiate between the polarity classes of the training data. By accepting a minor decrease in semantic similarity, however, we achieve polarity scores far higher than the baselines. The results and modeling scheme allow a single, efficient model to serve text analytics systems suitable for in-domain retrieval.
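To make the setup concrete, the following is a minimal sketch of polarity-aware fine-tuning with the sentence-transformers library. The backbone model, the same-class/cross-class pair construction, the choice of ContrastiveLoss, and all hyperparameters are illustrative assumptions, not the paper's exact configuration.

```python
# Hypothetical sketch: fine-tune a lightweight sentence-transformer so that
# same-polarity sentences embed close together while opposite-polarity
# sentences are pushed apart, then retrieve by cosine similarity.
import random
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses, util

# Toy labeled data standing in for SST-2 / sarcastic-headline examples: (text, class).
data = [
    ("a gripping, beautifully shot film", 1),
    ("one of the year's most tedious scripts", 0),
    ("an uplifting story told with real warmth", 1),
    ("a lifeless, by-the-numbers thriller", 0),
]

# Build pairs: label 1 for same-class pairs, 0 for cross-class pairs, so that
# ContrastiveLoss attracts equal-polarity sentences and repels the rest.
pairs = []
for t1, c1 in data:
    for t2, c2 in data:
        if t1 != t2:
            pairs.append(InputExample(texts=[t1, t2], label=int(c1 == c2)))
random.shuffle(pairs)

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed lightweight backbone
loader = DataLoader(pairs, shuffle=True, batch_size=8)
loss = losses.ContrastiveLoss(model=model)

model.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=10)

# Retrieval: nearest neighbours should now favour sentences that are both
# semantically close to the query and of equal polarity.
corpus_emb = model.encode([t for t, _ in data], convert_to_tensor=True)
query_emb = model.encode("a heartfelt and engaging movie", convert_to_tensor=True)
for hit in util.semantic_search(query_emb, corpus_emb, top_k=2)[0]:
    print(data[hit["corpus_id"]][0], hit["score"])
```

Other pair-based or triplet losses from the same library (e.g., batch-hard triplet loss over class labels) would slot into the identical training loop; the abstract's test suite varies exactly these loss and hyperparameter choices.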
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3876