TOGBench: A Developer-Written Multi-Variant Dataset and Benchmark Suite for Test Oracle Generation

Published: 14 May 2026, Last Modified: 14 May 2026
Venue: AIWare 2026 Benchmark and Dataset
Readers: Everyone
License: CC BY 4.0
Keywords: dataset, benchmark, test oracle generation, software engineering
Abstract: Test oracles determine whether a program execution is correct for a given input. Two common forms are assertion oracles, which compare observed outputs with expected results, and exception oracles, which verify that a program raises an expected exception. Automated test oracle generation (TOG) aims to reduce the manual effort involved in constructing such oracles. Although recent TOG methods, especially LLM-based approaches, have made rapid progress, their evaluation remains constrained by benchmarks that rely on automatically generated tests, narrow single-assert formulations, simplified developer-written tests, or limited oracle diversity. To address these limitations, we introduce OE25_dev, a multi-variant dataset curated from developer-written unit tests across 25 open-source Java projects spanning 56 modules, and TOGBench, an end-to-end benchmark suite for TOG. OE25_dev captures six oracle categories and preserves realistic settings, including single- and multi-oracle configurations, mixed assertion-and-exception oracles, and developer-authored custom oracles. TOGBench supports end-to-end experimentation by reintegrating generated oracles into runnable test suites and evaluating them via compilation, execution, false-positive analysis, and mutation testing. Our evaluation further shows that OE25_dev preserves substantially greater structural complexity than prior benchmarks and exposes marked performance degradation of representative TOG models on developer-written tests, particularly for assertion oracles.
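Illustration: the two oracle forms named in the abstract can be sketched in plain Java. This is only a minimal sketch and is not taken from the dataset; the class OracleSketch, the method divide, and both oracle helpers are hypothetical names chosen for illustration, and a real developer-written test would typically express the same checks with a test framework's assertion API.

```java
public class OracleSketch {
    // Hypothetical unit under test: integer division that rejects a zero divisor.
    static int divide(int a, int b) {
        if (b == 0) {
            throw new IllegalArgumentException("divisor must be non-zero");
        }
        return a / b;
    }

    // Assertion oracle: compares the observed output with the expected result.
    static boolean assertionOracle() {
        int observed = divide(10, 2);
        return observed == 5;
    }

    // Exception oracle: holds only if the expected exception is raised.
    static boolean exceptionOracle() {
        try {
            divide(1, 0);
            return false; // no exception was raised, so the oracle fails
        } catch (IllegalArgumentException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("assertion oracle holds: " + assertionOracle());
        System.out.println("exception oracle holds: " + exceptionOracle());
    }
}
```

A mixed assertion-and-exception configuration, in the sense used above, would place both kinds of check in the same test.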
Submission Number: 18