"It Doesn’t Know Anything About my Work": Participatory Benchmarking and AI Evaluation in Applied Settings

Published: 24 Sept 2025, Last Modified: 24 Sept 2025
NeurIPS 2025 LLM Evaluation Workshop Poster
License: CC BY 4.0
Keywords: evaluation, benchmarking, measurement, participatory, sociotechnical, applied, manufacturing
TL;DR: We report on a participatory benchmarking study of an AI assistant in manufacturing, showing how incorporating end-users’ situated expertise enables more nuanced, context-aware evaluations of model performance.
Abstract: This empirical paper investigates the benefits of socially embedded approaches to model evaluation. We present findings from a participatory benchmarking evaluation of an AI assistant deployed in a manufacturing setting, demonstrating how evaluation practices that incorporate end-users’ situated expertise enable more nuanced assessments of model performance. By foregrounding context-specific knowledge, these practices more accurately capture real-world functionality and inform iterative system improvement. We conclude by outlining implications for the design of context-aware AI evaluation frameworks.
Submission Number: 115