CATTLE TRADE: A MULTI-AGENT BENCHMARK FOR LLM BLUFFING, BIDDING, AND NEGOTIATION

Published: 02 Mar 2026, Last Modified: 16 Apr 2026, MALGAI, CC BY 4.0
Keywords: Multi Agent, Eval, LLM, Auction, Bluffing
TL;DR: We propose an LLM benchmark for strategic resource management and bluffing, inspired by the card game Kuhhandel / Bluff it.
Abstract: Standard benchmarks evaluate LLM knowledge and single-agent reasoning, but miss the capabilities required for real-world strategic interaction: bluffing, negotiation, and long-term resource management. Existing game benchmarks isolate individual skills, such as deception in Werewolf or bidding in simple auctions, rather than requiring their integrated deployment. We introduce CATTLE TRADE, a benchmark based on the card game Kuhhandel that integrates competitive auctions, hidden-information trades, and deceptive offers within games of 50–60 turns. We evaluate 6 frontier LLMs across 33 games and find that strategic commitment, measured through offer values in trades and buy-right exercise rates, strongly predicts success, while pure bluffing strategies underperform.
Submission Number: 75