Web Standards

RFC 3986

RFC 3986 (January 2005) is the IETF specification that defines the generic syntax of Uniform Resource Identifiers (URIs). It obsoletes RFC 2396 (along with RFC 1808 and RFC 2732), updates RFC 1738, and is designated Internet Standard 66 (STD 66).
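
RFC 3986 also supplies, in its Appendix B, a regular expression that splits any URI reference into its five generic components: scheme, authority, path, query, and fragment. The sketch below applies that pattern in Python; `split_uri` is a hypothetical helper name, and the pattern only decomposes a string, it does not check that the result is a valid URI.

```python
import re

# Regular expression from RFC 3986, Appendix B. Every part is optional,
# so it matches any string; it splits, it does not validate.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_uri(uri: str) -> dict:
    """Return the generic components of a URI reference (None if absent)."""
    m = URI_RE.match(uri)  # never None, because the pattern matches any string
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


print(split_uri("http://www.ics.uci.edu/pub/ietf/uri/#Related"))
# {'scheme': 'http', 'authority': 'www.ics.uci.edu', 'path': '/pub/ietf/uri/', 'query': None, 'fragment': 'Related'}
```

Capturing groups 2, 4, 5, 7, and 9 carry the components, as the RFC describes; the outer groups exist only to make each delimiter optional.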

Percent-Encoding

Percent-encoding is the URI mechanism that replaces a byte with `%` followed by the byte's value as two hexadecimal digits (a space becomes `%20`), so that characters with special syntactic meaning, or bytes outside ASCII, can be carried inside a URL without ambiguity.
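
A minimal sketch of the mechanism, assuming UTF-8 as the character encoding and escaping every byte outside RFC 3986's unreserved set. That is appropriate for a single component value such as one query parameter, not for a whole URL, whose delimiters must stay literal. `pct_encode` and `pct_decode` are hypothetical helper names; the comparison uses Python's standard `urllib.parse.quote`.

```python
import string
from urllib.parse import quote

# RFC 3986 unreserved characters: ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = frozenset(string.ascii_letters + string.digits + "-._~")


def pct_encode(text: str) -> str:
    """Percent-encode every byte of text's UTF-8 form that is not unreserved."""
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        # Bytes >= 0x80 become non-ASCII chr() values, which are never unreserved.
        out.append(ch if ch in UNRESERVED else f"%{byte:02X}")
    return "".join(out)


def pct_decode(encoded: str) -> str:
    """Turn each %XX triplet back into a byte, then decode the bytes as UTF-8."""
    buf = bytearray()
    i = 0
    while i < len(encoded):
        pair = encoded[i + 1 : i + 3]
        if encoded[i] == "%" and len(pair) == 2 and all(c in string.hexdigits for c in pair):
            buf.append(int(pair, 16))
            i += 3
        else:
            # Ordinary characters and malformed "%" sequences pass through unchanged.
            buf.extend(encoded[i].encode("utf-8"))
            i += 1
    return buf.decode("utf-8")


sample = pct_encode("café & co")
print(sample)                                   # caf%C3%A9%20%26%20co
print(pct_decode(sample))                       # café & co
# The standard library agrees when no characters are marked safe.
print(quote("café & co", safe="") == sample)    # True
```

The encoder emits uppercase hexadecimal digits, which RFC 3986 recommends for consistency; a decoder has to accept either case, and this one does.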

Why URL Encoding Is So Confusing

URL encoding tangles three different rule sets: RFC 3986 percent-encoding for URIs, application/x-www-form-urlencoded for HTML forms (where + means space), and the WHATWG URL Standard's context-specific encode sets. The mismatches explain bugs such as the + in a Gmail plus-address being decoded as a space, an already-encoded space being encoded again as %2520, and URLs that work in a browser but break in curl.
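
A minimal sketch of the first two failure modes using Python's standard `urllib.parse` module, whose `parse_qs` and `urlencode` follow the form-encoding convention, while `quote` and `unquote` do not treat + as a space. The address and file name are illustrative; the browser-versus-curl case depends on client behavior and is not reproduced here.

```python
from urllib.parse import parse_qs, quote, quote_plus, unquote, unquote_plus, urlencode

# 1. Form decoding turns an unescaped "+" into a space.
raw_query = "email=user+tag@gmail.com"           # "+" was never escaped
print(parse_qs(raw_query))                       # {'email': ['user tag@gmail.com']}

# Escaping the value first keeps the "+" intact through a round trip.
safe_query = urlencode({"email": "user+tag@gmail.com"})
print(safe_query)                                # email=user%2Btag%40gmail.com
print(parse_qs(safe_query))                      # {'email': ['user+tag@gmail.com']}

# 2. RFC 3986-style functions leave "+" alone; form-style functions do not.
print(unquote("a+b"), unquote_plus("a+b"))       # a+b a b
print(quote("a b"), quote_plus("a b"))           # a%20b a+b

# 3. Encoding an already-encoded string escapes the "%" itself as %25.
once = quote("my file.txt")
twice = quote(once)
print(once, twice)                               # my%20file.txt my%2520file.txt
```

The usual fix for both bugs is to encode each value exactly once, at the point where it is placed into the URL, with the function that matches the context (form body or query versus path).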

WHATWG URL Standard

The WHATWG URL Standard is a living specification that defines URL parsing, serialisation and encoding for the modern web, intended to describe what browsers actually do rather than the abstract grammar of RFC 3986.

Why Unicode Has Four Normalization Forms

Unicode permits more than one sequence of code points to represent the same visible string, so the standard defines four normalization forms (NFC, NFD, NFKC, and NFKD) that collapse those alternatives to a single canonical shape. The pair NFC/NFD handles canonical equivalence (precomposed vs. decomposed accents), while NFKC/NFKD additionally fold compatibility variants such as ligatures and full-width letters. Each form has a niche: storage and display, sorting and linguistic processing, or loose search and matching.
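
The difference between the canonical and compatibility forms can be seen with Python's standard `unicodedata.normalize`. This sketch uses three sample characters chosen for illustration: a precomposed é, the ﬁ ligature, and a full-width capital A. It prints code point counts so that decomposition is visible even when the glyphs look identical.

```python
import unicodedata

precomposed = "\u00e9"   # é as one code point (LATIN SMALL LETTER E WITH ACUTE)
decomposed = "e\u0301"   # e followed by COMBINING ACUTE ACCENT
ligature = "\ufb01"      # ﬁ (LATIN SMALL LIGATURE FI)
fullwidth = "\uff21"     # Ａ (FULLWIDTH LATIN CAPITAL LETTER A)

# Canonically equivalent strings still compare unequal until normalized.
print(precomposed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True

# Code point counts for (é, ﬁ, Ａ) under each form:
# NFC [1, 1, 1], NFD [2, 1, 1], NFKC [1, 2, 1], NFKD [2, 2, 1]
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    results = [unicodedata.normalize(form, s) for s in (precomposed, ligature, fullwidth)]
    print(form, [len(r) for r in results])
```

Only the K forms rewrite the ligature to "fi" and the full-width letter to "A", which is why they suit loose matching but lose distinctions that NFC keeps for storage and display.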
