Abstract: We present a multi-level knowledge graph for formal mathematics built on the
SciLibRU ontology. This ontology is modular and is formalized in OWL and description logic.
It strictly distinguishes three levels of description: the interpretation level, which captures what
an object means; the representation level, which describes how this object is expressed; and
the resource level, which specifies where the object is stored and in what form it exists. This
separation yields an addressee-invariant knowledge space suitable for both human researchers
and AI agents. The knowledge graph is materialized from the filtered Mathlib library, a
formalized mathematical library implemented in the Lean 4 proof assistant. It consists
of approximately 190,000 statements with typed dependency edges and a domain taxonomy
of 660 subclasses with provenance metadata. A multimodal embedding model trained on five
types of mathematical objects achieves 74% top-1 accuracy when matching elements across
modalities. We demonstrate the practical utility of the approach through combined
vector-graph structured lemma retrieval for automated theorem proving with the LLM
DeepSeek-Prover-V2 7B. On the MiniF2F benchmark, which includes 488 tasks and 50,752 runs, our
system significantly outperforms three state-of-the-art search engines for Lean: LeanSearch,
LeanFinder, and LeanExplore. All observed improvements are statistically significant. Our
vector search and LeanSearch produce statistically indistinguishable results, with a p-value
close to 1. This confirms that the observed improvement is driven by the use of graph structure,
specifically the symbolic layer, as well as ontology-based hint retrieval and categorization, rather
than by embedding quality alone.