Capability Hallucination in LLM Agents
Capability hallucination is the failure mode in which a large language model agent claims to have performed an action or used a tool to which it has no access. It is distinct from refusing a capability the model really has: here the model imitates assistant-like language about sending emails, setting reminders, or searching the web even though no such tool is wired up. The root cause is training data saturated with those phrasings; mitigations include strict tool schemas, ReAct-style traces, and explicit refusal training.
Capability hallucination is a failure mode of LLM agents in which the model asserts that it has taken an action, called an external system, or used a tool that it does not actually have wired into its runtime. Typical surface forms are statements like "I have sent you the email," "I will set a reminder for tomorrow morning," or "Let me search the web for that" produced by a chat session that exposes no email client, scheduler, or browsing tool. The action is fabricated in text only; nothing happens in the outside world. This is sometimes called fabricated tool use or, in recent agent literature, tool hallucination.
The pattern is distinct from refusing a capability the model genuinely has. A correctly configured agent with a function-calling interface attached to a mail API can actually send a message; capability hallucination is the inverse, where the model talks as if such a tool exists when none is registered. It is also distinct from tool selection errors, where the agent calls the wrong real tool, and from parameter hallucinations, where arguments are fabricated for a tool that does exist.
The root cause is largely a training data artefact. Pretraining and instruction-tuning corpora are saturated with helpful-assistant phrasings in which a human or fictional assistant claims to perform exactly these actions. The base model learns the linguistic shape of "I sent the email" long before any deployment wires up an email tool, so during generation it produces the most probable next assistant-like sentence without inspecting the actual tool list. Reinforcement learning from human feedback can amplify this when raters reward confident, action-shaped completions. Recent work on agentic systems also notes that scaling reasoning capability does not automatically suppress fabricated tool calls and can even increase them.
Mitigations operate at several layers. Strict, machine-checked tool schemas let the runtime reject any "call" that is not a well-formed invocation of a registered function, so a fabricated send_email cannot silently succeed. ReAct prompting, introduced by Yao and colleagues in 2022, interleaves explicit reasoning steps with observable actions, which makes phantom actions easier to spot and to ground against tool outputs. Toolformer (Schick et al., 2023) showed that models can be trained to insert real API calls in context, narrowing the gap between described and executed tool use. A clean separation of planner and executor roles, together with explicit "I cannot do that here" training examples and post-hoc verification of claimed effects, further reduces the rate at which agents invent capabilities they were never given.
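The schema-checking and post-hoc verification layers described above can be illustrated with a minimal Python sketch. It is not taken from any particular agent framework: ToolSpec, ToolRegistry, UnknownToolError, InvalidArgumentsError, unbacked_claims, and the get_weather stub are all hypothetical names chosen for illustration, and a production runtime would more likely validate arguments against JSON Schema than against bare Python types.

```python
from dataclasses import dataclass
from typing import Any, Callable


class UnknownToolError(Exception):
    """The model named a tool that was never registered (hypothetical error type)."""


class InvalidArgumentsError(Exception):
    """The call names a real tool but its arguments do not match the schema."""


@dataclass(frozen=True)
class ToolSpec:
    """Hypothetical schema for one registered tool."""
    name: str
    params: dict[str, type]       # argument name -> expected Python type
    handler: Callable[..., Any]   # the code that actually performs the action


class ToolRegistry:
    """Dispatches only well-formed calls to registered tools and logs them."""

    def __init__(self) -> None:
        self._tools: dict[str, ToolSpec] = {}
        self.executed: list[str] = []  # names of tools that really ran

    def register(self, spec: ToolSpec) -> None:
        self._tools[spec.name] = spec

    def dispatch(self, name: str, arguments: dict[str, Any]) -> Any:
        spec = self._tools.get(name)
        if spec is None:
            # A fabricated tool such as send_email fails loudly here.
            raise UnknownToolError(f"no tool named {name!r} is registered")
        if set(arguments) != set(spec.params):
            raise InvalidArgumentsError(
                f"{name} expects arguments {sorted(spec.params)}, "
                f"got {sorted(arguments)}"
            )
        for key, expected in spec.params.items():
            if not isinstance(arguments[key], expected):
                raise InvalidArgumentsError(
                    f"{name}.{key} must be of type {expected.__name__}"
                )
        result = spec.handler(**arguments)
        self.executed.append(name)  # record only calls that actually ran
        return result


def unbacked_claims(claimed_tools: list[str], registry: ToolRegistry) -> list[str]:
    """Post-hoc check: return claimed tool names with no executed call behind them."""
    return [name for name in claimed_tools if name not in registry.executed]


def get_weather(city: str) -> str:
    """Stub handler standing in for a real API call."""
    return f"(stub) weather for {city}"


registry = ToolRegistry()
registry.register(ToolSpec("get_weather", {"city": str}, get_weather))

registry.dispatch("get_weather", {"city": "Oslo"})       # runs the stub
try:
    registry.dispatch("send_email", {"to": "a@example.com"})
except UnknownToolError as err:
    print(f"rejected: {err}")                             # fabricated tool is refused
```

In this sketch the registry is the only path to side effects, so a sentence such as "I have sent you the email" can be compared against registry.executed through unbacked_claims. How the claimed tool names are extracted from the model's free text is left out here and would need its own, likely imperfect, detection step.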