InferAct: Inferring Safe Actions for LLM-Based Agents Through Preemptive Evaluation and Human Feedback

ACL ARR 2024 June Submission630 Authors

12 Jun 2024 (modified: 02 Jul 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: A crucial requirement for deploying LLM-based agents in real-life applications is robustness against risky or even irreversible mistakes. However, existing research lacks preemptive evaluation of the reasoning trajectories performed by LLM agents, leaving a gap in ensuring safe and reliable operation. To address this gap, this paper introduces $\texttt{InferAct}$, a novel critic that leverages the Theory-of-Mind capability of LLMs to proactively detect potential errors before critical actions are executed (e.g., $\textit{`buy-now'}$ in automated online trading or web shopping). $\texttt{InferAct}$ can also integrate human feedback to prevent irreversible risks and to improve the actor agent's decision-making. Experiments on three widely used tasks demonstrate the effectiveness of $\texttt{InferAct}$. The proposed approach makes concrete contributions toward developing LLM agents that can be safely deployed in environments involving critical decision-making.
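The sketch below is not the authors' implementation; it is a minimal illustration of the preemptive-evaluation loop the abstract describes: before a critical action (e.g., `buy-now`) is executed, a critic LLM judges the actor's reasoning trajectory, and flagged trajectories are routed to a human for feedback. All names here (`evaluate_trajectory`, `ask_human`, `CRITICAL_ACTIONS`, the critic prompt) are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Assumed set of irreversible actions that must be gated before execution.
CRITICAL_ACTIONS = {"buy-now", "confirm-order", "delete-account"}


@dataclass
class Trajectory:
    task: str
    steps: List[str] = field(default_factory=list)  # interleaved thoughts/actions so far


def evaluate_trajectory(critic_llm: Callable[[str], str], traj: Trajectory, action: str) -> bool:
    """Ask a critic LLM whether the pending critical action follows from the trajectory.

    Returns True if the action looks safe to execute, False if a potential error is flagged.
    """
    prompt = (
        f"Task: {traj.task}\n"
        "Trajectory so far:\n" + "\n".join(traj.steps) + "\n"
        f"Pending critical action: {action}\n"
        "Does this action correctly accomplish the user's task? Answer 'yes' or 'no'."
    )
    verdict = critic_llm(prompt).strip().lower()
    return verdict.startswith("yes")


def gated_execute(actor_action: str, traj: Trajectory,
                  critic_llm: Callable[[str], str],
                  execute: Callable[[str], None],
                  ask_human: Callable[[Trajectory, str], bool]) -> None:
    """Execute non-critical actions directly; gate critical ones behind the critic and a human."""
    if actor_action in CRITICAL_ACTIONS and not evaluate_trajectory(critic_llm, traj, actor_action):
        # Potential error detected before the irreversible step: defer to human feedback.
        if not ask_human(traj, actor_action):
            traj.steps.append(f"[blocked] {actor_action} rejected; actor asked to replan")
            return
    execute(actor_action)
    traj.steps.append(f"[executed] {actor_action}")
```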
Paper Type: Long
Research Area: Human-Centered NLP
Research Area Keywords: human-AI interaction, human-centered evaluation
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Submission Number: 630