Semantic Similarities Using Classical Embeddings in Quantum NLP

Published: 01 Jan 2024, Last Modified: 16 May 2025, QCE 2024, CC BY-SA 4.0
Abstract: We demonstrate how word and text embeddings, represented as n-dimensional real vectors in classical Natural Language Processing (NLP)/AI applications, can be mapped to quantum states (quantum embedding representations) for quantum NLP and AI applications. Models such as fastText [1], [2], GloVe [3], Numberbatch [4], and BERT [5] are common NLP embedding models that encode semantic properties of words or text fragments. We use these models to evaluate mapping properties, compression rates, and information preservation in quantum embeddings. To map embedding vectors to quantum states, we use encoding strategies such as Amplitude Encoding. These strategies allow us to map large dense vectors from the embedding models to compact quantum states at different compression ratios. Amplitude Encoding compresses exponentially, mapping an $N = 2^{n}$-dimensional vector to $n$ qubits; a 1,024-dimensional vector of reals in the classical environment, for example, maps to a 10-qubit state. The goal of this work is to evaluate these strategies with respect to their compression ratio and to measure the preservation of semantic information using similarity scores. We show that the quantum embeddings obtained from classical computing embeddings exhibit the same relational properties and that there is no significant loss of semantic information in the conversion from classical n-dimensional real vector embeddings to qubit states. We conclude that these experimental results allow semantic similarities of words or texts to be computed on quantum hardware, reusing existing and freely available embedding models from classical NLP/AI computing.
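
The abstract's central claim, that an N-dimensional embedding can be amplitude-encoded into log2(N) qubits and that the overlap of the resulting states recovers semantic similarity, can be illustrated with a small classical simulation. The following is a minimal sketch and not the authors' implementation: the random vectors stand in for real fastText/GloVe/BERT embeddings, and on quantum hardware the state would be prepared with a state-preparation (amplitude-encoding) routine rather than held as a NumPy array.

```python
import numpy as np

def amplitude_encode(vec: np.ndarray) -> np.ndarray:
    """Map a real embedding vector to the amplitudes of an n-qubit state.

    The vector is zero-padded to the next power of two (2**n amplitudes for
    n qubits) and L2-normalized, since quantum states must have unit norm.
    """
    dim = 1 << int(np.ceil(np.log2(len(vec))))   # next power of two
    padded = np.zeros(dim)
    padded[: len(vec)] = vec
    return padded / np.linalg.norm(padded)

def quantum_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Fidelity |<psi_a|psi_b>|^2 between the two amplitude-encoded states."""
    return float(np.abs(np.dot(amplitude_encode(a), amplitude_encode(b))) ** 2)

# Example: two hypothetical 1,024-dimensional embeddings -> 10-qubit states.
# (Assumption: random vectors; real experiments would load model embeddings.)
rng = np.random.default_rng(0)
king, queen = rng.normal(size=1024), rng.normal(size=1024)
print("qubits needed:", int(np.log2(1024)))      # 10
print("state fidelity:", quantum_similarity(king, queen))
```

Because the fidelity is the squared cosine of the angle between the normalized vectors, comparing it against the classical cosine similarity of the same embedding pair gives a direct check on how much semantic information the encoding preserves.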