Why URL Encoding Is So Confusing
URL encoding tangles three different rule sets: RFC 3986 percent-encoding for URIs, application/x-www-form-urlencoded for HTML forms (where + means space), and the WHATWG URL Standard's context-specific encode sets. The mismatches explain bugs like + in Gmail addresses, double-encoded %2520, and URLs that work in a browser but break in curl.
URL encoding looks like a single rule but is actually several overlapping conventions, which is why bugs around it are so common. There are three contexts to keep straight.

The first is percent-encoding as defined by RFC 3986, the generic URI syntax. It splits characters into a reserved set (general delimiters `:/?#[]@` and sub-delimiters `!$&'()*+,;=`) and an unreserved set (`A-Z a-z 0-9 - . _ ~`). Unreserved characters never need encoding. Any other character that would collide with the URI's structure is converted to UTF-8, and each resulting byte is written as `%` followed by two hex digits. A literal space becomes `%20`.

The second context is application/x-www-form-urlencoded, the media type browsers use for HTML form submissions in POST bodies and (historically) GET query strings. It is a modified percent-encoding derived from the older RFC 1738 rules: spaces become `+` instead of `%20`, so a literal `+` must be sent as `%2B`. Decoders that treat a query string as form data will silently turn `+` back into a space, while strict RFC 3986 parsers treat `+` as a sub-delimiter that means itself.

The third context is the WHATWG URL Standard, the living spec implemented by browsers. It defines several percent-encode sets (including path, query, fragment, userinfo, and component), so the same character may be encoded in the query but not in the path. JavaScript's `encodeURI` preserves reserved delimiters so a whole URL stays parseable, while `encodeURIComponent` encodes them, which is why it is usually the right choice for a single value inside a path or query.

Common bugs follow from mixing these contexts. A `+` inside a Gmail address such as `user+tag@gmail.com` is legal under RFC 5322, but if a naive encoder leaves it as `+` instead of sending `%2B`, a form decoder on the other end reads the address as `user tag@gmail.com`. Double-encoding (`%2520` instead of `%20`) appears when middleware encodes an already-encoded value, often when a redirect or template re-runs `encodeURIComponent`.
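
The gap between percent-encoding and form encoding, and the double-encoding bug, can be reproduced in a few lines. The sketch below is only an illustration: it uses Python's standard `urllib.parse` as a stand-in for whatever encoder an application actually uses (the text itself names only the JavaScript functions), and the `email` parameter name is made up for the example.

```python
from urllib.parse import parse_qs, quote, quote_plus, unquote, unquote_plus

address = "user+tag@gmail.com"  # sample value taken from the text above

# Percent-encoding of a single component: with safe="" only unreserved
# characters are left alone, so "+" becomes %2B and "@" becomes %40.
component = quote(address, safe="")
print(component)  # user%2Btag%40gmail.com

# The two space conventions: %20 (percent-encoding) vs "+" (form encoding).
print(quote("a b", safe=""))  # a%20b
print(quote_plus("a b"))      # a+b

# The two decoders disagree about a bare "+".
print(unquote("a+b"))       # a+b  ("+" means itself)
print(unquote_plus("a+b"))  # a b  ("+" means space)

# A form-style query parser applied to an unencoded "+" corrupts the address.
# "email" is a hypothetical parameter name.
broken = parse_qs("email=user+tag@gmail.com")["email"][0]
intact = parse_qs("email=" + component)["email"][0]
print(broken)  # user tag@gmail.com
print(intact)  # user+tag@gmail.com

# Double-encoding: encoding an already-encoded value turns "%" into "%25".
once = quote("a b", safe="")   # a%20b
twice = quote(once, safe="")   # a%2520b
print(twice)
print(unquote(twice))  # a%20b, so one decode is no longer enough
```

A practical habit that follows from this: encode each value exactly once, at the point where it is placed into the URL, and decode it exactly once with the decoder that matches how it was encoded.
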
URLs that paste cleanly into a browser address bar can still fail in `curl` because an unquoted URL is interpreted by the shell before `curl` ever sees it: `&` ends the command and runs it in the background, and `?` can be treated as a glob character, so the URL should be quoted. And because internationalized hostnames are sent as Punycode `xn--` labels, what users see in the address bar may differ byte-for-byte from what is sent on the wire.
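
Both of these last effects can be checked from Python's standard library. This is a rough sketch under stated assumptions: the hostname `bücher.example` and the query URL are invented for illustration, `shlex.quote` stands in for quoting the URL by hand at a POSIX shell, and the built-in `idna` codec implements the older IDNA 2003 rules, which may differ from what a given browser does for some names.

```python
import shlex

# Hypothetical URL with shell-sensitive characters in the query string.
url = "https://example.com/search?q=a&lang=en"

# shlex.quote wraps the URL in single quotes so a POSIX shell passes it to
# curl as one argument instead of acting on "&" and "?".
quoted = shlex.quote(url)
print("curl " + quoted)  # curl 'https://example.com/search?q=a&lang=en'

# Hypothetical internationalized hostname as a user would see it.
display_host = "bücher.example"

# The ASCII form that goes on the wire uses an "xn--" Punycode label.
wire_host = display_host.encode("idna").decode("ascii")
print(wire_host)  # xn--bcher-kva.example
```
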