Semantic similarity prediction is better than other semantic similarity measures

Published: 23 Jan 2024, Last Modified: 23 Jan 2024, Accepted by TMLR
Abstract: Semantic similarity between natural language texts is typically measured either by looking at the overlap between subsequences (e.g., BLEU) or by using embeddings (e.g., BERTScore, S-BERT). In this paper, we argue that when we are only interested in measuring semantic similarity, it is better to directly predict the similarity using a model fine-tuned for that task. Using a model fine-tuned on the Semantic Textual Similarity Benchmark (STS-B) task from the GLUE benchmark, we define the STSScore approach and show that the resulting similarity is better aligned with our expectations of a robust semantic similarity measure than other approaches.
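The core idea of STSScore is to predict similarity directly with a model fine-tuned on STS-B instead of comparing subsequences or embeddings. It can be sketched as a thin wrapper around such a model. This is a minimal sketch under stated assumptions, not the paper's implementation. The `STSPredictor` type, the `sts_score` and `toy_predictor` names, and the choice to clip the raw prediction to the STS-B label range [0, 5] and then divide by 5 are all hypothetical and were chosen for illustration.

```python
from typing import Callable

# Hypothetical signature: any regression model fine-tuned on STS-B that maps a
# sentence pair to a raw similarity score on the STS-B label scale.
STSPredictor = Callable[[str, str], float]

# STS-B gold labels range from 0 (unrelated) to 5 (semantically equivalent).
STSB_MAX = 5.0


def sts_score(predict: STSPredictor, a: str, b: str) -> float:
    """Map a direct similarity prediction to [0, 1] (assumed normalization)."""
    raw = predict(a, b)
    # Regression heads can overshoot the label range, so clip before scaling.
    clipped = min(max(raw, 0.0), STSB_MAX)
    return clipped / STSB_MAX


def toy_predictor(a: str, b: str) -> float:
    """Stand-in for a fine-tuned model; returns fixed raw scores only."""
    # 5.3 deliberately exceeds the label range to exercise the clipping.
    return 5.3 if a == b else 2.0


if __name__ == "__main__":
    print(sts_score(toy_predictor, "A cat sleeps.", "A cat sleeps."))  # 1.0
    print(sts_score(toy_predictor, "A cat sleeps.", "Stocks fell."))   # 0.4
```

In practice, `predict` would call a regression model that reads both texts jointly, such as a cross-encoder fine-tuned on STS-B. The toy predictor exists only to make the example runnable and says nothing about semantic similarity.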
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url:
Changes Since Last Submission: In the last revision, I forgot to add the broader impact statement. The section still contained the template text, and I missed this when creating the PDF. This is now fixed. I apologize for the additional effort this caused the editors.
Supplementary Material: zip
Assigned Action Editor: ~Vlad_Niculae2
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Number: 1748