Automated Benchmarking of LLMs: Applying Regression to Estimate LLM Accuracy

Suryaansh Jain; Umair Z. Ahmed; Shubham Sahai; Ben Leong

Published: 21 Apr 2025, Last Modified: 05 Jul 2025, Venue: AI4X 2025 Oral, License: CC BY 4.0

Keywords: LLM Benchmarking, Precision Estimate, Regression

TL;DR: A novel framework for estimating an LLM's performance on subjective tasks by leveraging LLM-based evaluators and limited human annotations

Confirmation Of Submission Requirements: I submit a previously published paper. It was published in an archival peer-reviewed venue on or after September 8th 2024. I specify the DOI in the field below, and I submit the camera-ready version of the paper.

Submission Number: 242
