SpectraQuery: A Hybrid Retrieval-Augmented Conversational Assistant for Battery Science

ACL ARR 2026 January Submission 9134 Authors

06 Jan 2026 (modified: 20 Mar 2026) | ACL ARR 2026 January Submission | Readers: Everyone | License: CC BY 4.0
Keywords: hybrid question answering, retrieval-augmented generation, text-to-SQL, semantic parsing, SUQL, scientific question answering, literature-grounded generation, Raman spectroscopy, battery materials, human-in-the-loop evaluation, LLM-as-a-judge
Abstract: Scientific reasoning increasingly requires linking structured experimental data with the unstructured literature that explains it, yet most large language model (LLM) assistants cannot reason jointly across these modalities. We introduce SpectraQuery, a hybrid natural-language query framework that integrates a relational Raman spectroscopy database with a vector-indexed scientific literature corpus using a Structured and Unstructured Query Language (SUQL)–inspired design. By combining semantic parsing with retrieval-augmented generation, SpectraQuery translates open-ended questions into coordinated SQL and literature retrieval operations, producing cited answers that unify numerical evidence with mechanistic explanation. Across SQL correctness, answer groundedness, retrieval effectiveness, and expert evaluation, SpectraQuery demonstrates strong performance: approximately 80% of generated SQL queries are fully correct, synthesized answers reach 93–97% groundedness with 10–15 retrieved passages, and battery scientists rate responses highly across accuracy, relevance, grounding, and clarity (4.1–4.6/5). These results show that hybrid retrieval architectures can meaningfully support scientific workflows by bridging data and discourse for high-volume experimental datasets.
Paper Type: Long
Research Area: Retrieval-Augmented Language Models
Research Area Keywords: Question Answering, Information Retrieval and Text Mining, Generation, Dialogue and Interactive Systems, Resources and Evaluation
Contribution Types: NLP engineering experiment, Data analysis
Languages Studied: English
Submission Number: 9134
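
The hybrid pattern described in the abstract, a structured SQL lookup coordinated with literature retrieval to produce a cited answer, can be illustrated with a minimal sketch. This is not the authors' implementation: the table schema, sample rows, passages, and function names below are all hypothetical, the SQL is written by hand rather than generated by an LLM, and a bag-of-words cosine score stands in for a real vector index.

```python
import math
import sqlite3
from collections import Counter

# Hypothetical toy literature corpus; the passages are illustrative only.
PASSAGES = {
    "doc1": "The A1g Raman band near 595 cm-1 tracks the state of charge of layered oxide cathodes.",
    "doc2": "Graphite anodes show a G band shift during lithium intercalation.",
}


def build_db():
    """Create an in-memory table of Raman peak positions (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE spectra (sample_id TEXT, cycle INTEGER, peak_cm1 REAL)")
    conn.executemany(
        "INSERT INTO spectra VALUES (?, ?, ?)",
        [("cellA", 1, 595.0), ("cellA", 50, 589.0), ("cellB", 1, 596.0)],
    )
    return conn


def tokenize(text):
    """Lowercase whitespace tokens with trailing punctuation stripped."""
    return [t.strip(".,?").lower() for t in text.split()]


def cosine(a, b):
    """Cosine similarity between two token lists using raw term counts."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, k=1):
    """Return the ids of the k passages most similar to the question."""
    q = tokenize(question)
    ranked = sorted(
        PASSAGES, key=lambda d: cosine(q, tokenize(PASSAGES[d])), reverse=True
    )
    return ranked[:k]


def hybrid_answer(conn, question, sql, params=()):
    """Run the structured query and the literature retrieval side by side.

    In a full system the SQL would come from a semantic parser and the two
    results would be passed to a generator; here they are simply returned.
    """
    rows = conn.execute(sql, params).fetchall()
    return {"rows": rows, "citations": retrieve(question)}


conn = build_db()
result = hybrid_answer(
    conn,
    "How does the A1g Raman band shift with cycle for cellA cathodes?",
    "SELECT cycle, peak_cm1 FROM spectra WHERE sample_id = ? ORDER BY cycle",
    ("cellA",),
)
```

Here `result["rows"]` holds the numerical evidence and `result["citations"]` holds the ids of the supporting passages, which a generation step could then combine into a single cited answer.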