ClawBot-Matching: Bidirectional, Explainable, and Learnable Collaboration Matching in Mixed Human-Agent Networks

Published: 23 May 2026, Last Modified: 25 May 2026 | Venue: ACM CAIS 2026: RLEval Workshop Poster | Readers: Everyone | License: CC BY 4.0
Keywords: reinforcement learning evaluation, RL evaluation environments, LLM agents, multi-agent systems, collaborative agents, agent benchmarking, human-agent collaboration, mixed human-agent networks, collaboration matching, task allocation, team formation, expert matching, multi-stakeholder recommendation, bidirectional matching, explainable matching, online feedback learning, bandit learning, uncertainty-aware exploration, reward design, agent coordination, agent evaluation metrics, human-centered AI evaluation, simulation-based evaluation, match-card explanations, capability modeling, constraint-aware recommendation, safety gates, consent-aware systems, cold-start evaluation, agent trust, auditable AI systems, Scientist ClawBot, ClawBot-Matching
TL;DR: ClawBot-Matching evaluates how mixed human-agent systems can form better collaborations by matching tasks with the right humans, agents, or skills through explainable, constraint-aware, and feedback-driven recommendations.
Abstract: Agent evaluation usually measures what an agent does after it receives a task. Mixed human-agent systems face an earlier evaluation question: which human, proxy agent, service agent, or reusable skill should join the task at all? We present $\textbf{ClawBot-Matching}$, a matching environment for Scientist ClawBot Hub. The system parses natural-language interaction into structured capabilities, needs, and constraints. It then uses $MapScore$ to estimate bidirectional value, asking whether a candidate covers the requester's residual capability gap and whether the task satisfies the candidate's participation needs. Hard constraints such as consent, data access, safety, and availability act as gates. A coarse-to-fine planner combines greedy ranking, uncertainty-aware exploration, and LLM-based dream simulation for the top candidates. Online feedback updates capability priors and scoring weights. We frame this pipeline as an RL-style evaluation environment, where the state is a mixed human-agent network, the action is a proposed match, and the reward combines explicit user feedback, behavior signals, and downstream outcomes. The released prototype supports one-to-one ranking, one-to-many team construction, explanation cards, fixture scenarios, and regression tests. Code is available at https://github.com/kexinchu/clawbot-matching.
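As an illustration of the gated, bidirectional scoring idea described in the abstract, here is a minimal sketch. It is not the released implementation or the actual $MapScore$ definition: the class names, the set-overlap coverage measure, the geometric-mean combination, and the example data are all assumptions chosen for clarity.

```python
from dataclasses import dataclass, field
from math import sqrt


@dataclass
class Candidate:
    name: str
    capabilities: set[str]
    needs: set[str]  # what the candidate wants from a task (hypothetical field)
    gates: dict[str, bool] = field(default_factory=dict)  # e.g. consent, availability


@dataclass
class Task:
    residual_gap: set[str]  # capabilities the requester still lacks
    offers: set[str]        # what the task gives a participant


def coverage(wanted: set[str], provided: set[str]) -> float:
    """Fraction of `wanted` items found in `provided` (1.0 if nothing is wanted)."""
    return len(wanted & provided) / len(wanted) if wanted else 1.0


def bidirectional_score(task: Task, cand: Candidate) -> float:
    # Hard constraints act as gates: any failed gate rejects the candidate.
    if not all(cand.gates.values()):
        return 0.0
    requester_value = coverage(task.residual_gap, cand.capabilities)
    candidate_value = coverage(cand.needs, task.offers)
    # Geometric mean (an assumption): the score is high only if both sides benefit.
    return sqrt(requester_value * candidate_value)


def rank(task: Task, candidates: list[Candidate]) -> list[tuple[float, str]]:
    """Return (score, name) pairs for candidates with a positive score, best first."""
    scored = [(bidirectional_score(task, c), c.name) for c in candidates]
    return sorted((s for s in scored if s[0] > 0), reverse=True)


# Hypothetical example data.
task = Task(residual_gap={"statistics", "wet_lab"}, offers={"coauthorship", "compute"})
candidates = [
    Candidate("alice", {"statistics", "wet_lab"}, {"coauthorship"},
              {"consent": True, "availability": True}),
    Candidate("bot", {"statistics"}, {"compute"},
              {"consent": True, "availability": True}),
    Candidate("carol", {"statistics", "wet_lab"}, {"coauthorship"},
              {"consent": False, "availability": True}),
]
ranked = rank(task, candidates)
```

In this toy setup, `carol` covers the full capability gap but is excluded because the consent gate fails, while `bot` ranks below `alice` because it covers only half of the requester's residual gap. Uncertainty-aware exploration, dream simulation, and online weight updates are deliberately left out of the sketch.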
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 11