On the feasibility of semantic query metrics

Published: 2025, Last Modified: 14 Dec 2025, CoRR 2025, CC BY-SA 4.0
Abstract: We consider the problem of defining semantic metrics for relational database queries. Informally, a semantic query metric for a query language $L$ is a metric function $δ:L\times L\to \mathbb{N}$ where $δ(Q_1, Q_2)$ represents the length of a shortest path between queries $Q_1$ and $Q_2$ in a graph. In this graph, nodes are queries from $L$, and edges connect semantically distinct queries where one query is maximally semantically contained in the other. Since query containment is undecidable for first-order queries, we focus on the simpler language of conjunctive queries. We establish that defining a semantic query metric is impossible even for conjunctive queries. Given this impossibility result, we identify a significant subclass of conjunctive queries where such a metric is feasible, and we establish the computational complexity of calculating distances within this language.
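To make the shortest-path reading of $δ(Q_1, Q_2)$ concrete, here is a minimal sketch that computes such a distance by breadth-first search. It makes several assumptions that are not in the abstract: the graph is finite and given explicitly as an edge list, edges are treated as undirected (so the distance is symmetric), and the names `query_distance`, `TOY_EDGES`, and the labels `Q1` to `Q5` are purely illustrative. The toy edges are hand-written, not derived from any actual containment test, and the sketch says nothing about how the paper constructs its graph or computes distances.

```python
from collections import deque


def query_distance(edges, source, target):
    """Return the number of edges on a shortest path between source and
    target in an undirected graph, or None if no path exists."""
    # Build an undirected adjacency map from the edge list.
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    if source == target:
        return 0

    # Standard breadth-first search: the first time we reach the target,
    # the recorded depth is the shortest-path length.
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, dist = frontier.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, dist + 1))
    return None


# Hypothetical toy graph: each pair stands for two queries assumed to be
# adjacent. These edges are invented for illustration only.
TOY_EDGES = [("Q1", "Q2"), ("Q2", "Q3"), ("Q3", "Q4"), ("Q1", "Q5")]

if __name__ == "__main__":
    print(query_distance(TOY_EDGES, "Q1", "Q4"))
    print(query_distance(TOY_EDGES, "Q5", "Q3"))
```

On a connected graph, shortest-path length is zero exactly for identical nodes, symmetric, and satisfies the triangle inequality, which is why it is a natural candidate for a metric with values in $\mathbb{N}$. The sketch returns `None` for unreachable pairs, a case in which the shortest-path length is not defined.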