Long Context (LLM)
Long context refers to the maximum input length a language model can process in a single forward pass, also called its context window. Sizes grew from 2K tokens in 2020 to 1M+ by 2024, enabled by efficient attention variants and position-encoding methods such as rotary position interpolation, ALiBi, and YaRN. Advertised window size, however, does not equal effective window size: models still show the lost-in-the-middle U-shape, in which information placed mid-input is recalled less reliably than information near the start or end, and they struggle with multi-hop reasoning across long inputs.
"Long context" in large language models refers to the maximum number of input tokens a model can attend to in a single forward pass, commonly called its context window. Window sizes grew from 2K tokens in GPT-3 (2020) to 4K-8K in early GPT-4 and Llama 2 (2023), and then to 100K+ in Claude 2 and 1M+ in Gemini 1.5 (2024). By the mid-2020s context windows of several hundred thousand tokens were routine, and research models had demonstrated 10M-token windows. Enabling long contexts requires changes beyond raw memory. Standard self-attention scales as O(n^2) in sequence length, so practical systems use a mix of FlashAttention, sliding window attention, ring attention, and mixture of experts routing to keep compute tractable. Positional encodings also need adaptation: techniques such as Rotary Position Embedding interpolation, ALiBi, and YaRN let models trained on short sequences generalise to much longer ones. Long contexts unlock workloads that would otherwise require retrieval or chunking: whole-book summarisation, multi-file code analysis, long video understanding (with token compression), and in-context learning from large example sets. However, advertised window size does not equal effective window size. Models routinely score near-perfectly on simple fact-retrieval benchmarks like Needle in a Haystack while still showing the lost-in-the-middle U-shape, struggling with multi-hop reasoning across long inputs, and incurring significant latency and cost (often quadratic or near-quadratic) as input length grows. Long context complements rather than replaces retrieval and explicit memory.