Format-Following Failures in LLMs

When asked for strictly formatted output (JSON, CSV, exactly N bullets, no preamble), LLMs frequently drift: they add 'Here is the JSON you requested:' wrappers, wrap output in markdown fences, emit four bullets instead of three, or produce trailing commas that break parsers. The root cause is structural: training data is conversational, and {{RLHF}} rewards helpful-looking, explanatory answers. Prompting alone still leaves failure rates of roughly 5-20%; the reliable fix is {{constrained decoding}} or tool-call APIs that enforce the schema at the token level.

Ask a large language model for strict output (valid JSON, a CSV row, exactly three bullet points, no preamble) and it will frequently drift in predictable ways. Common failure modes include polite preambles ("Here is the JSON you requested:"), wrapping the payload in ```json markdown fences, trailing commas, single-quoted keys, hallucinated extra fields, renamed keys for "clarity", four bullets where three were demanded, and type drift such as returning a string "42" where an integer was specified. Studies of prompt-only JSON extraction report failure rates between roughly 5% and 20% depending on schema complexity.

The cause is structural, not a bug per se. Pretraining data is overwhelmingly conversational prose, so the next-token distribution naturally favors human-readable wrappers over bare machine payloads. RLHF compounds the problem: reward models are known to exhibit verbosity bias, scoring longer, more explanatory answers higher than terse ones. The model has been trained to look helpful, and a bare JSON object without context reads as curt. "Here is the JSON you requested" is a learned politeness ritual, not a parsing bug. See also Sycophancy in LLM Responses for a related RLHF artifact.

Prompt-based workarounds ("reply with ONLY valid JSON, no other text") reduce but do not eliminate drift, because they still rely on the model's sampled distribution. The reliable fixes intervene at the decoding or API level:

- Constrained decoding: at each generation step, mask the logits of any token that would violate the target grammar. Libraries like Outlines and guidance compile a JSON Schema or context-free grammar into a finite-state machine that gates valid tokens. Output is syntactically guaranteed; the model never emits a stray markdown fence.
- Structured Outputs (OpenAI's response_format with json_schema and strict: true, introduced August 2024) and Google Gemini's response_schema apply the same idea behind the API. OpenAI reported 100% schema adherence on their internal eval versus under 40% for prompt-only GPT-4.
- Function Calling (LLM) and tool-use APIs (including Anthropic tool use) can reuse the same constrained-decoding machinery for tool arguments.
- Fine-tuning on format-strict examples improves baseline behavior but does not give hard guarantees.
- Post-hoc validation with retry (parse the output; on failure, feed the parser error back as a correction prompt) remains useful as a fallback when constrained decoding is unavailable (e.g. behind an endpoint that is not OpenAI-compatible).

The practical takeaway: if a downstream system parses the model's output, do not rely on prompting for format compliance. Use a provider's structured-output mode, run a constrained decoding library locally, or validate and retry with a strict parser.
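
The logit-masking step behind constrained decoding can be illustrated with a toy example. This is a minimal sketch, not how Outlines or guidance are implemented: the nine-token vocabulary, the hand-written state machine `TRANSITIONS`, and the `fake_logits` function standing in for a model are all hypothetical. The grammar accepts only an object of the form {"n": <digits>}, and the stand-in model is deliberately biased toward a chatty preamble.

```python
import json
import math

FENCE = "`" * 3 + "json"  # a markdown fence opener, as one vocabulary item

# Toy vocabulary (hypothetical); a real tokenizer has tens of thousands of entries.
VOCAB = ["Here is the JSON:", FENCE, "{", '"n"', ":", "4", "2", "}", "<eos>"]

# Hand-written finite-state machine for the grammar  {"n": <digits>} <eos>.
# Maps state -> {allowed token -> next state}.
TRANSITIONS = {
    "start": {"{": "key"},
    "key": {'"n"': "colon"},
    "colon": {":": "first_digit"},
    "first_digit": {"4": "more_digits", "2": "more_digits"},
    "more_digits": {"4": "more_digits", "2": "more_digits", "}": "end"},
    "end": {"<eos>": "done"},
}


def fake_logits(prefix):
    """Stand-in for a model: always scores the chatty wrappers highest."""
    logits = {tok: 0.0 for tok in VOCAB}
    logits["Here is the JSON:"] = 5.0
    logits[FENCE] = 4.0
    last = prefix[-1] if prefix else None
    if last == ":":
        logits["4"] = 2.0
    elif last == "4":
        logits["2"] = 2.0
    elif last == "2":
        logits["}"] = 2.0
    return logits


def unconstrained_first_token():
    """Greedy pick with no mask: the preamble wins."""
    logits = fake_logits([])
    return max(logits, key=logits.get)


def constrained_decode(max_steps=20):
    """Greedy decoding where tokens the grammar forbids are masked to -inf."""
    state, out = "start", []
    for _ in range(max_steps):
        allowed = TRANSITIONS[state]
        logits = fake_logits(out)
        masked = {
            tok: (score if tok in allowed else -math.inf)
            for tok, score in logits.items()
        }
        tok = max(masked, key=masked.get)
        state = allowed[tok]
        if state == "done":  # the end-of-sequence token is not part of the output
            break
        out.append(tok)
    return "".join(out)


result = json.loads(constrained_decode())  # parses without any cleanup
```

Even though the stand-in model scores the preamble and the fence highest at every step, neither can be selected, because the mask removes them before the argmax. Production libraries do the same thing over subword tokens and derive the state machine from a schema or grammar rather than writing it by hand.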
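
The validate-and-retry fallback can be sketched as follows. This is an illustration under stated assumptions, not a specific library's API: `call_model` is a hypothetical callable that takes a list of message strings and returns the model's reply as a string, and `REQUIRED` is a made-up two-field schema. The parser is deliberately strict: it does not strip preambles or fences, so any wrapper counts as a failure and triggers a correction prompt.

```python
import json

# Hypothetical target schema: exact key set and the Python type of each value.
REQUIRED = {"name": str, "age": int}


def validate(text):
    """Strictly parse text; return (obj, None) on success or (None, error)."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError as exc:
        return None, f"invalid JSON: {exc}"
    if not isinstance(obj, dict):
        return None, "top-level value must be an object"
    if set(obj) != set(REQUIRED):
        return None, f"keys must be exactly {sorted(REQUIRED)}, got {sorted(obj)}"
    for key, typ in REQUIRED.items():
        value = obj[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, typ) or isinstance(value, bool):
            return None, f"field {key!r} must be {typ.__name__}"
    return obj, None


def generate_with_retry(call_model, prompt, max_attempts=3):
    """Call the model, feeding each validation error back as a correction."""
    messages = [prompt]
    for _ in range(max_attempts):
        reply = call_model(messages)
        obj, error = validate(reply)
        if error is None:
            return obj
        messages += [
            reply,
            f"Your reply was rejected ({error}). "
            "Reply with only the corrected JSON object.",
        ]
    raise ValueError(f"no valid output after {max_attempts} attempts")


# Usage with a scripted stand-in for a model: a preamble, then type drift,
# then a compliant reply.
_scripted = iter([
    'Here is the JSON you requested:\n{"name": "Ada", "age": 36}',
    '{"name": "Ada", "age": "36"}',
    '{"name": "Ada", "age": 36}',
])
person = generate_with_retry(lambda messages: next(_scripted), "Describe Ada as JSON.")
```

Capping the number of attempts matters: unlike constrained decoding, this loop offers no guarantee of success, so the caller has to handle the final `ValueError`.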

Related Knowledge

Constrained Decoding

Structured Outputs (OpenAI)
