AI Behavior
Lost in the Middle: Position Bias in Long-Context LLMs
Liu et al.'s "Lost in the Middle" paper (2023 preprint, published in TACL 2024) showed that language models given long contexts use information best when it sits at the start or end of the input: accuracy traces a U-shaped curve that bottoms out when the relevant passage falls in the middle. The effect appears across GPT-3.5, Claude, LongChat, and MPT, persists in extended-context variants, and is commonly attributed to positional encoding (such as rotary position embeddings) and causal attention. The finding drove practical changes in RAG pipelines: re-ranking to place top hits at the edges, repeating key instructions, and using benchmarks like Needle in a Haystack to measure how well models actually use their advertised context windows.
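The edge-placement idea can be illustrated with a minimal sketch. It assumes the retriever already returns passages sorted from most to least relevant; the function name `reorder_for_edges` is hypothetical and not taken from the paper or from any particular RAG library.

```python
def reorder_for_edges(docs_by_relevance):
    """Place the most relevant passages at the start and end of the context.

    docs_by_relevance: list ordered from most to least relevant
    (an assumption about the upstream retriever).
    Returns a new list in which relevance falls off toward the middle.
    """
    front, back = [], []
    for rank, doc in enumerate(docs_by_relevance):
        # Alternate: even ranks fill the front, odd ranks fill the back.
        if rank % 2 == 0:
            front.append(doc)
        else:
            back.append(doc)
    # Reverse the back half so the second-best passage ends up last.
    return front + back[::-1]


ranked = ["best", "second", "third", "fourth", "worst"]
print(reorder_for_edges(ranked))
# → ['best', 'third', 'worst', 'fourth', 'second']
```

The weakest passage lands in the middle, where the U-shaped curve suggests it is least likely to be used, while the two strongest sit at the edges. Whether this helps a given model is an empirical question worth measuring rather than assuming.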
Why Asking an LLM to Check Its Own Answer Often Fails
Asking a large language model to double-check its own answer rarely catches real errors and can degrade accuracy. The critique pass runs on the same weights with the same gaps, and a soft challenge like "are you sure?" often flips a correct answer rather than fixing a wrong one. Self-critique pays off mainly when the model already had the knowledge but executed sloppily, when external information enters the loop, or when a different verifier checks the work.
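The "different verifier" case can be sketched roughly as follows. Both `generate` and `verify` are hypothetical callables supplied by the caller: `generate` stands in for a model call, and `verify` stands in for any check that does not depend on the model's own judgment (running tests, recomputing a value, looking up a source). No real LLM API is assumed.

```python
from typing import Callable, Optional


def answer_with_external_check(
    question: str,
    generate: Callable[[str], str],
    verify: Callable[[str, str], bool],
    max_attempts: int = 3,
) -> Optional[str]:
    """Return the first candidate answer that passes an independent check.

    generate: hypothetical stand-in for a model call (question -> answer).
    verify: hypothetical external check (question, answer -> bool) that
            does not ask the model to grade itself.
    """
    for _ in range(max_attempts):
        candidate = generate(question)
        if verify(question, candidate):
            return candidate
    # Nothing passed: report failure instead of trusting an unchecked guess.
    return None


# Toy usage: the "model" first answers wrongly, then correctly.
answers = iter(["5", "4"])
result = answer_with_external_check(
    "What is 2 + 2?",
    generate=lambda q: next(answers),
    verify=lambda q, a: a == str(2 + 2),  # recomputed outside the model
)
print(result)  # → 4
```

The design choice is that acceptance depends on the verifier alone, so a correct answer cannot be talked out of its position by a soft challenge, and a wrong one is rejected by something other than the weights that produced it.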