Keywords: Jailbreaking, Hijacking, Agentic AI, Function Calling Models, LLMs, Security
TL;DR: This paper introduces the first function hijacking attack that manipulates the tool selection process of agentic models to force the invocation of a specific, attacker-chosen function.
Abstract: The growth of agentic AI has drawn significant attention to function calling Large Language Models (LLMs), which are designed to extend the capabilities of AI-powered systems by invoking external functions. Injection and jailbreaking attacks have been extensively explored to demonstrate the vulnerability of LLMs to user prompt manipulation. The expanded capabilities of agentic models introduce further vulnerabilities via their function calling interface. Recent work in LLM security showed that function calling can be abused to tamper with or steal data, to cause disruptive behavior such as endless loops, or to make LLMs produce harmful content in the style of jailbreaking attacks. This paper introduces the first function hijacking attack that manipulates the tool selection process of agentic models to force the invocation of a specific, attacker-chosen function. We conducted experiments on three different models, reaching attack success rates (ASR) of 80% to 98% on the established BFCL dataset. We also introduce FunSecBench, an extension of the BFCL dataset designed to assess the vulnerability of function calling models to the triggering of attacker-selected functions. Our findings further demonstrate the need for strong guardrails and security modules for agentic systems.
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 17235