Deep Think with Rehearsal for Low-Latency Team-AI Collaboration

ACL ARR 2026 January Submission 2587 Authors

03 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: Human-AI Collaboration, Large Language Models, LLM reasoning, Low-Latency Interaction, Biomedical NLP, Tool-Augmented Reasoning
Abstract: The integration of Large Language Models (LLMs) into scientific team meetings presents exciting opportunities to accelerate biomedical discovery, especially through their strong deep-thinking capabilities enabled by multi-step reasoning and web search. However, such methods are computationally expensive and introduce substantial latency, limiting their effectiveness in real-time team-AI communication. In this study, we propose Deep Think with Rehearsal (DTR), a novel framework that decouples deep reasoning from synchronous interaction in the AI4Science context. DTR shifts computationally intensive reasoning into an offline rehearsal phase, allowing the LLM to assimilate complex scientific contexts in advance and deliver high-quality, "deep" insights with minimal latency during live interactions. To facilitate this research, we introduce the Scientific Team Meeting Dataset (STMD), a hybrid benchmark comprising authentic transcripts from three real-world biomedical research labs alongside extensive simulated multi-party deliberations synthesized from PubMed literature. Experiments in both simulated and real-world settings demonstrate that DTR consistently improves response quality while reducing inference latency compared to state-of-the-art methods, highlighting the effectiveness of rehearsal in enabling low-latency, high-quality scientific collaboration.
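The rehearsal idea described in the abstract — paying the reasoning cost offline so that live responses reduce to fast lookups — can be illustrated with a minimal sketch. All names here (`RehearsalAgent`, `rehearse`, `respond`) are hypothetical stand-ins, not the paper's actual API; the expensive multi-step reasoning and web search are mocked by a simple string computation.

```python
import time

class RehearsalAgent:
    """Illustrative two-phase agent: expensive offline rehearsal,
    cheap online lookup. Names and logic are hypothetical stand-ins
    for the DTR framework described in the abstract."""

    def __init__(self):
        # topic -> precomputed insight, filled during rehearsal
        self._insights = {}

    def rehearse(self, meeting_context):
        """Offline phase: stand-in for multi-step reasoning / web search
        over the anticipated meeting agenda."""
        for topic in meeting_context:
            # In a real system this would be a slow deep-reasoning call.
            self._insights[topic] = f"deep insight on {topic}"

    def respond(self, query):
        """Online phase: low-latency lookup against rehearsed insights."""
        start = time.perf_counter()
        answer = self._insights.get(query, "no rehearsed insight; deferring")
        latency = time.perf_counter() - start
        return answer, latency

agent = RehearsalAgent()
agent.rehearse(["CRISPR off-target effects", "protein folding"])
answer, latency = agent.respond("protein folding")
print(answer)  # deep insight on protein folding
```

The key design point the sketch captures is the asymmetry of the two phases: rehearsal may be arbitrarily slow because it runs before the meeting, while the live path does no reasoning at all, only retrieval.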
Paper Type: Long
Research Area: Human-AI Interaction/Cooperation and Human-Centric NLP
Research Area Keywords: human-AI interaction/cooperation, participatory/community-based NLP, human-in-the-loop, human-centered evaluation, value-centered design, user-centered design
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Reproduction study, Approaches for low-compute settings (efficiency), Publicly available software and/or pre-trained models, Data resources, Data analysis
Languages Studied: English
Submission Number: 2587