Improving Your Model Ranking on Chatbot Arena by Vote Rigging

Published: 06 Mar 2025, Last Modified: 18 Mar 2025ICLR 2025 FM-Wild WorkshopEveryoneRevisionsBibTeXCC BY 4.0
Keywords: Chatbot Arena, Vote Rigging, Large Language Models, Leaderboard Manipulation
TL;DR: We demonstrate that we could improve model rankings on Chatbot Arena by vote rigging.
Abstract: Chatbot Arena is a popular platform for evaluating LLMs by pairwise battles, where users vote for their preferred response from two randomly sampled anonymous models. While Chatbot Arena is widely regarded as a reliable LLM ranking leaderboard, we show that crowdsourced voting can be *rigged* to improve (or decrease) the ranking of a target model $m_{t}$. We first introduce a straightforward **target-only rigging** strategy that focuses on new battles involving $m_{t}$, identifying it via watermarking or a binary classifier, and exclusively voting for $m_{t}$ wins. However, this strategy is practically inefficient because there are over $190$ models on Chatbot Arena and on average only about $1\%$ of new battles will involve $m_{t}$. To overcome this, we propose **omnipresent rigging** strategies, exploiting the Elo rating mechanism of Chatbot Arena that any new vote on a battle can influence the ranking of the target model $m_{t}$, even if $m_{t}$ is not directly involved in the battle. We conduct experiments on around $1.7$ *million* historical votes from the Chatbot Arena Notebook, showing that omnipresent rigging strategies can improve model rankings by rigging only *hundreds of* new votes. While we have evaluated several defense mechanisms, our findings highlight the importance of continued efforts to prevent vote rigging.
Submission Number: 77
Loading