Debiasing LLM-as-a-Judge in Cybersecurity: A Study of Judge Reliability and Human Alignment

12 Sept 2025 (modified: 25 Sept 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Large Language Models, LLM-as-a-judge, cybersecurity, script analysis, human-AI alignment
Abstract: As Large Language Models (LLMs) increasingly serve as automated judges for evaluating natural language outputs, understanding their alignment with human judgment becomes critical, particularly in specialized domains like cybersecurity. This paper presents a comprehensive evaluation of LLM-as-a-judge performance in cybersecurity script analysis, based on extensive human annotation and covering multiple judge models under two distinct prompting strategies. First, we construct a novel dataset of over 1,000 clean and malicious scripts with expert-curated natural language annotations that serve as reference summaries for judge evaluation. Second, using pairwise comparison, we evaluate human-judge alignment across multiple LLMs through manual assessment of judge decisions. We identify which LLM judges align most closely with human assessors in cybersecurity contexts and analyze key factors affecting judge quality, including self-preference bias and the effect of prompting strategy on judge performance. Our findings contribute to the growing body of work on LLM-as-a-judge reliability while addressing the critical need for automated evaluation in cybersecurity applications, where natural language comparison is essential for scalable analysis. This work provides empirical findings on LLM judge performance in a specialized domain, a dataset of manual annotations for script explanation, and evidence-based guidance for selecting judge models in cybersecurity evaluation tasks.
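
As a rough illustration of the pairwise-comparison protocol described above, the sketch below computes raw agreement and Cohen's kappa between human verdicts and LLM-judge verdicts on which of two candidate script explanations better matches a reference summary. This is a minimal sketch, not the authors' code: the label set, example data, and function names are hypothetical and introduced here only for illustration.

```python
# Illustrative sketch (not the paper's implementation): measuring human-judge
# alignment on pairwise comparisons. Labels and data below are hypothetical.
from collections import Counter

def agreement_and_kappa(human, judge):
    """Raw agreement rate and Cohen's kappa between two label sequences."""
    assert len(human) == len(judge) and len(human) > 0
    n = len(human)
    observed = sum(h == j for h, j in zip(human, judge)) / n
    # Expected chance agreement from each rater's marginal label frequencies.
    h_counts, j_counts = Counter(human), Counter(judge)
    expected = sum(h_counts[c] * j_counts[c] for c in set(human) | set(judge)) / (n * n)
    kappa = (observed - expected) / (1 - expected) if expected < 1 else 1.0
    return observed, kappa

# Hypothetical pairwise verdicts: which of two candidate explanations ("A" or
# "B") better matches the expert-curated reference summary, or a "tie".
human_verdicts = ["A", "B", "A", "tie", "B", "A"]
judge_verdicts = ["A", "B", "B", "tie", "B", "A"]

acc, kappa = agreement_and_kappa(human_verdicts, judge_verdicts)
print(f"agreement={acc:.2f}, kappa={kappa:.2f}")
```

Kappa is reported alongside raw agreement because it discounts the agreement expected by chance from each rater's label frequencies, which matters when verdicts are skewed toward one candidate.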
Primary Area: datasets and benchmarks
Code Of Ethics: true
Submission Guidelines: true
Anonymous Url: true
No Acknowledgement Section: true
Submission Number: 4384