Can a Single Model Master Both Multi-turn Conversations and Tool Use? CALM: A Unified Conversational Agentic Language Model
Abstract: Large Language Models (LLMs) with API-calling capabilities have enabled the construction of effective Language Agents (LA), while also revolutionizing the conventional task-oriented dialogue (TOD) paradigm. However, current approaches face a critical dilemma: TOD systems are often trained on a limited set of target APIs and require new data to maintain their quality when interfacing with new services, while LAs are not trained to maintain user intent over multi-turn conversations. Because both robust multi-turn management and advanced function calling are crucial for effective conversational agents, we evaluate these skills on three popular benchmarks: MultiWOZ 2.4 (TOD), BFCL V3 (LA), and API-Bank (LA). Our analyses reveal that specialized approaches excel in one domain but underperform in the other.
To bridge this chasm, we introduce **CALM** (**C**onversational **A**gentic **L**anguage **M**odel), a unified approach that integrates both conversational and agentic capabilities.
We create **CALM-IT**, a carefully constructed multi-task dataset that interleaves multi-turn ReAct reasoning with complex API usage.
Using CALM-IT, we train three models, **CALM 8B**, **CALM 70B**, and **CALM 405B**, which outperform top domain-specific models, including GPT-4o, across all three benchmarks.
This demonstrates the feasibility of a single-model approach for both TOD and LA, setting a new standard for conversational agents. We release code, model weights, datasets, and training artifacts to support future research.
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: Conversational Agents, Tool Usage, Task Oriented Dialogue, Generalization
Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models, Data resources, Data analysis
Languages Studied: English
Submission Number: 7041