Talmud-IR: A Talmud-Inspired Interface for Discussing RAG Response Quality

Wojciech Kusa, Niklas Deckers, Maik Fröbe, Laura Dietz, Birte Platow, Mark Sanderson

Published: 2026, Last Modified: 15 Apr 2026, ECIR (4) 2026, CC BY-SA 4.0
Abstract: Retrieval-augmented generation (RAG) systems promise factually grounded answers, yet evaluating their quality remains difficult. Automated metrics and LLM-as-judge approaches offer scalability but risk circularity, benchmark leakage, and loss of diversity. Human assessors, meanwhile, often struggle to notice subtle omissions or hallucinations when responses appear linguistically fluent and confident. We present Talmud-IR, a novel user interface inspired by the dialogic structure of the Talmud. It visualizes RAG outputs as a central text surrounded by layers of evidence, commentary, and meta-assessment, enabling sustained human–LLM discussion about system quality and failure priorities. The prototype supports comparative RAG evaluation, collaborative exploration of “unknown unknowns,” and pedagogical use for teaching critical reading of AI-generated content.

Code and Prototype: https://github.com/WojciechKusa/talmud-ir