VoMBaT: A Tool for Visualising Evaluation Measure Behaviour in High-Recall Search Tasks

Wojciech Kusa; Aldo Lipani; Petr Knoth; Allan Hanbury

18 Oct 2023 | ACM SIGIR Badging Submission | Readers: Everyone

Abstract: The objective of high-recall information retrieval (HRIR) is to retrieve as many relevant documents as possible for a given search topic. One approach to HRIR is Technology-Assisted Review (TAR), which uses information retrieval and machine learning techniques to aid the review of large document collections. TAR systems are commonly used in legal eDiscovery and systematic literature reviews. Successful TAR systems find the majority of relevant documents with the fewest assessments. Commonly used retrospective evaluation assumes that the system first achieves a specific, fixed recall level and then measures the precision or work saved (e.g., precision at r% recall). This approach makes it difficult to understand how evaluation measures behave in a fixed recall setting. It is also problematic when estimating time and money savings during technology-assisted reviews. This paper presents a new visual analytics tool to explore the dynamics of evaluation measures depending on recall level. We implemented 18 evaluation measures based on the confusion matrix terms, both from general IR tasks and specific to TAR. The tool allows for a comparison of the behaviour of these measures in a fixed recall evaluation setting. It can also simulate savings in time and money, and the count of manual vs automatic assessments, for different datasets depending on the model quality. The tool is open-source, and the demo is available at https://vombat.streamlit.app
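To illustrate what a fixed recall evaluation setting means in terms of the confusion matrix, the following is a minimal Python sketch and not the tool's actual code. It assumes that model quality is expressed as the true negative rate (TNR); all function and parameter names are hypothetical. Fixing recall determines TP and FN, the TNR determines TN and FP, and measures such as precision and Work Saved over Sampling (WSS@r, defined as (TN + FN)/N - (1 - r)) follow from those four terms.

```python
def confusion_at_recall(n_docs, n_relevant, recall, tnr):
    """Derive confusion matrix terms for a fixed recall level.

    Assumption (not from the paper): model quality is given as the
    true negative rate `tnr`, the share of irrelevant documents the
    system correctly excludes from manual review.
    """
    n_irrelevant = n_docs - n_relevant
    tp = recall * n_relevant      # relevant documents found
    fn = n_relevant - tp          # relevant documents missed
    tn = tnr * n_irrelevant       # irrelevant documents correctly excluded
    fp = n_irrelevant - tn        # irrelevant documents sent to review
    return tp, fp, tn, fn


def precision(tp, fp):
    """Precision at the fixed recall level; 0.0 if nothing is reviewed."""
    reviewed = tp + fp
    return tp / reviewed if reviewed > 0 else 0.0


def wss(tn, fn, n_docs, recall):
    """Work Saved over Sampling at recall r: (TN + FN) / N - (1 - r)."""
    return (tn + fn) / n_docs - (1.0 - recall)


# Hypothetical dataset: 1,000 documents, 100 of them relevant.
tp, fp, tn, fn = confusion_at_recall(n_docs=1000, n_relevant=100,
                                     recall=0.95, tnr=0.8)
manual = tp + fp      # documents predicted relevant, screened manually
automatic = tn + fn   # documents excluded automatically
print(f"precision@95% = {precision(tp, fp):.3f}")
print(f"WSS@95%       = {wss(tn, fn, 1000, 0.95):.3f}")
print(f"manual = {manual:.0f}, automatic = {automatic:.0f}")
```

Sweeping `tnr` from 0 to 1 while holding `recall` constant shows how each measure responds to model quality at that recall level, which is the kind of behaviour the abstract describes the tool as visualising.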

Artifact Type Made Available By Authors: Code

Requested Badges: Artifacts Evaluated – Functional, Artifacts Evaluated – Reusable and Available

Venue Accepted: ACM SIGIR
