Submission Track: Full Paper
Submission Category: AI-Guided Design + Automated Chemical Synthesis
Keywords: Knowledge graph, total synthesis, data extraction, benchmark
TL;DR: We propose a benchmark for extracting organic synthesis data from papers, along with LLM-based data extraction algorithms
Abstract: Knowledge graphs (KGs) have emerged as a powerful tool for organizing and integrating complex information, making them a suitable format for scientific knowledge. However, translating scientific knowledge into KGs is challenging because a wide variety of styles and elements is used to present data and ideas. Although efforts at KG extraction (KGE) from scientific documents exist, evaluation remains challenging and field-dependent, and existing benchmarks do not focus on scientific information. Furthermore, establishing a general benchmark for this task is difficult because not all scientific knowledge has a ground-truth KG representation, which makes any benchmark prone to ambiguity. Here we propose the Graph of Organic Synthesis Benchmark (GOSyBench), a benchmark for KG extraction from scientific documents in chemistry that leverages the native KG-like structure of synthetic routes in organic chemistry. We develop KG-extraction algorithms based on LLMs (GPT-4, Claude, Mistral) and VLMs (GPT-4o), the best of which reaches 73% recovery accuracy and 59% precision, leaving substantial room for improvement. We expect that GOSyBench can serve as a valuable resource for evaluating and advancing KGE methods in the scientific domain, ultimately facilitating better organization, integration, and discovery of scientific knowledge.
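To make the two reported metrics concrete, the following is a minimal sketch of how recovery and precision could be computed for an extracted synthesis graph. The abstract does not specify the matching procedure, so this sketch assumes that a KG is a set of directed (reactant, product) edges and that an extracted edge counts as correct only on an exact match; the `Edge` type, the function name `recovery_and_precision`, and the toy compound labels are all hypothetical.

```python
from typing import Set, Tuple

# Hypothetical representation: one edge per reaction step, (reactant, product).
Edge = Tuple[str, str]


def recovery_and_precision(
    predicted: Set[Edge], reference: Set[Edge]
) -> Tuple[float, float]:
    """Score extracted edges against reference edges by exact match.

    recovery  = matched edges / reference edges (fraction of the truth found)
    precision = matched edges / predicted edges (fraction of output that is right)
    Exact matching is an assumption of this sketch, not the benchmark's rule.
    """
    matched = predicted & reference
    recovery = len(matched) / len(reference) if reference else 0.0
    precision = len(matched) / len(predicted) if predicted else 0.0
    return recovery, precision


# Toy four-step route with placeholder compound labels.
reference = {("A", "B"), ("B", "C"), ("C", "D"), ("D", "target")}
# Extraction that finds three true steps and adds two spurious ones.
predicted = {("A", "B"), ("B", "C"), ("C", "X"), ("D", "target"), ("Y", "Z")}

recovery, precision = recovery_and_precision(predicted, reference)
print(f"recovery={recovery:.2f} precision={precision:.2f}")
# prints: recovery=0.75 precision=0.60
```

In this toy case three of the four reference steps are recovered, while two of the five extracted edges are spurious. A real evaluation would likely also need to normalize compound identifiers before comparing edges, which this sketch omits.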
Submission Number: 19