Why URLs Need Encoding
URLs reserve certain characters as structural delimiters: / separates paths, ? starts the query string, & separates query parameters, # marks a fragment. If your data contains any of these characters, browsers and servers would interpret them as structure rather than content — breaking your URL.
Example problem — a search query for "black & white":
/* Wrong: ampersand breaks the parameter parsing */
https://example.com/search?q=black & white
/* Correct: ampersand encoded */
https://example.com/search?q=black%20%26%20white
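One way to see the breakage is to hand both URLs to a parser. The sketch below assumes a runtime that provides the WHATWG URL class (modern browsers, Node.js); the variable names are illustrative only.

```javascript
// Unencoded: the raw & is treated as a parameter separator.
const brokenUrl = new URL("https://example.com/search?q=black & white");
const brokenQ = brokenUrl.searchParams.get("q");
// brokenQ is "black " and the text after & became a second, empty parameter
const brokenKeys = [...brokenUrl.searchParams.keys()];

// Encoded: %26 is decoded back to & inside the single q value.
const fixedUrl = new URL("https://example.com/search?q=black%20%26%20white");
const fixedQ = fixedUrl.searchParams.get("q");
// fixedQ is "black & white"

console.log(JSON.stringify(brokenQ), brokenKeys.length, JSON.stringify(fixedQ));
```

The server receiving the first URL would search for "black " and see an extra parameter it never asked for.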
How Percent-Encoding Works
Each byte of a character's UTF-8 encoding is replaced with % followed by two hexadecimal digits giving that byte's value. ASCII characters are a single byte, so they produce a single %HH escape:
| Character | Encoded | Why encoded |
|---|---|---|
| space | %20 or + (in queries) | Not allowed in URLs |
| & | %26 | Parameter separator |
| ? | %3F | Starts query string |
| # | %23 | Starts fragment |
| / | %2F | Path separator |
| = | %3D | Key-value delimiter |
| + | %2B | Means space in query context |
| % | %25 | Escape character itself |
| : | %3A | Scheme separator |
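The mechanism behind the table can be sketched by hand. This is a minimal illustration, not a replacement for the built-in functions: percentEncode is a hypothetical helper, it assumes a runtime with TextEncoder, and it encodes everything outside the RFC 3986 unreserved set (so it is slightly stricter than encodeURIComponent).

```javascript
// Hypothetical helper: percent-encode every character that is not unreserved.
function percentEncode(text) {
  const unreserved = /^[A-Za-z0-9\-_.~]$/;
  const encoder = new TextEncoder(); // produces UTF-8 bytes
  let out = "";
  for (const ch of text) { // for...of iterates by code point, not UTF-16 unit
    if (unreserved.test(ch)) {
      out += ch;
      continue;
    }
    // One %HH escape per UTF-8 byte, uppercase hex, zero-padded to two digits.
    for (const byte of encoder.encode(ch)) {
      out += "%" + byte.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

console.log(percentEncode("a&b"));  // "a%26b"
console.log(percentEncode("a b"));  // "a%20b"
console.log(percentEncode("100%")); // "100%25"
```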
Reserved vs Unreserved Characters
Per RFC 3986:
- Unreserved (never need encoding): A-Z a-z 0-9 - _ . ~
- Reserved (encode when they'd be confused with structure): : / ? # [ ] @ ! $ & ' ( ) * + , ; =
- Other (always encode): spaces, non-ASCII, control characters
Unreserved characters are guaranteed safe to pass through any URL context unchanged. Reserved characters are safe only when they're not being interpreted as structure — but since you usually don't control interpretation, encoding them is the safe default.
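JavaScript's encodeURIComponent leaves five reserved characters untouched: ! ' ( ) *. If you want every reserved character encoded, one common approach is to post-process its output. The helper name encodeRfc3986 below is an assumption for illustration, not a built-in.

```javascript
// Hypothetical helper: encodeURIComponent plus the reserved characters it skips.
function encodeRfc3986(text) {
  return encodeURIComponent(text).replace(
    /[!'()*]/g,
    (ch) => "%" + ch.charCodeAt(0).toString(16).toUpperCase()
  );
}

console.log(encodeURIComponent("it's (fine)!*")); // "it's%20(fine)!*"
console.log(encodeRfc3986("it's (fine)!*"));      // "it%27s%20%28fine%29%21%2A"
```

Whether the extra strictness is needed depends on the consumer; most servers accept both forms.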
encodeURI vs encodeURIComponent
JavaScript provides two built-in functions that behave differently:
// encodeURI: encodes spaces and non-ASCII but leaves URL structure intact
encodeURI("https://example.com/path with spaces?a=b&c=d");
// Result: "https://example.com/path%20with%20spaces?a=b&c=d"
// encodeURIComponent: also encodes URL structure such as : / ? & =
// (only A-Z a-z 0-9 - _ . ! ~ * ' ( ) pass through unchanged)
encodeURIComponent("https://example.com/path with spaces?a=b&c=d");
// Result: "https%3A%2F%2Fexample.com%2Fpath%20with%20spaces%3Fa%3Db%26c%3Dd"
Rule of thumb: use encodeURIComponent for individual query values. Use encodeURI only when encoding a complete URL that might contain spaces, and you want to preserve the URL structure.
// Correct way to build a query URL:
const query = "black & white";
const url = `https://example.com/search?q=${encodeURIComponent(query)}`;
// Result: "https://example.com/search?q=black%20%26%20white"
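The same pattern extends to several parameters. The following is a minimal sketch; buildUrl and its arguments are hypothetical names, and it assumes the base URL has no query string of its own.

```javascript
// Hypothetical helper: encode each key and value separately, then join with &.
function buildUrl(base, params) {
  const query = Object.entries(params)
    .map(([key, value]) =>
      `${encodeURIComponent(key)}=${encodeURIComponent(String(value))}`)
    .join("&");
  return query ? `${base}?${query}` : base;
}

const searchPage = buildUrl("https://example.com/search", {
  q: "black & white",
  page: 2,
});
console.log(searchPage);
// "https://example.com/search?q=black%20%26%20white&page=2"
```

Only the & that the helper itself inserts is left as structure; any & inside a value arrives as %26.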
UTF-8 and Non-ASCII Characters
Non-English characters like ñ, 中, 한, 日 are multi-byte in UTF-8. Each byte gets its own %HH escape:
encodeURIComponent("ñ"); // "%C3%B1" (2 bytes)
encodeURIComponent("中"); // "%E4%B8%AD" (3 bytes)
encodeURIComponent("한"); // "%ED%95%9C" (3 bytes)
encodeURIComponent("😀"); // "%F0%9F%98%80" (4 bytes — emoji)
Modern browsers display decoded non-ASCII characters in the address bar for readability, but the underlying HTTP request still sends the percent-encoded version. (Domain names are a separate case: Internationalized Domain Names are converted to Punycode, not percent-encoded.) This is why a URL looks clean in your browser but shows %E4%B8%AD when you copy-paste it.
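The difference between the displayed and transmitted forms can be reproduced with the WHATWG URL class, assuming a runtime that provides it along with TextEncoder. The example path /wiki/中 is made up for illustration.

```javascript
// The URL parser percent-encodes non-ASCII path characters automatically.
const pageUrl = new URL("https://example.com/wiki/中");
const wirePath = pageUrl.pathname;       // "/wiki/%E4%B8%AD" (what gets sent)
const displayPath = decodeURI(wirePath); // "/wiki/中" (what the address bar shows)

// The number of %HH escapes equals the UTF-8 byte length of the character.
const byteCount = new TextEncoder().encode("中").length; // 3

console.log(wirePath, displayPath, byteCount);
```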
Query String: + vs %20 for Spaces
In query strings (after ?), the HTML form encoding application/x-www-form-urlencoded treats + as a space. In paths, + is a literal plus sign. This inconsistency causes real bugs:
// encodeURIComponent always produces %20, never +
encodeURIComponent("hello world");
// "hello%20world"
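// Form-style query parsers (URLSearchParams, most server frameworks) decode BOTH + and %20 as space
// search?q=hello+world and search?q=hello%20world yield the same q value
// (decodeURIComponent on its own leaves + untouched)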
// But in path segments, + stays as literal +
// /tags/c%2B%2B → "c++" (encoded)
// /tags/c++ → "c++" (no special meaning)
It is safer to always use %20 when generating URLs. The + shortcut is a legacy HTML form convention that causes ambiguity outside of pure query-string contexts.
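The mismatch is easy to reproduce. The sketch below assumes a runtime with URLSearchParams; decodeFormValue is a hypothetical helper for code that must decode form-encoded values by hand.

```javascript
// URLSearchParams follows the form convention: + and %20 both decode to a space.
const viaPlus = new URLSearchParams("q=hello+world").get("q");       // "hello world"
const viaPercent = new URLSearchParams("q=hello%20world").get("q"); // "hello world"

// decodeURIComponent does not know about the form convention.
const rawDecode = decodeURIComponent("hello+world"); // "hello+world"

// Hypothetical helper: turn + into %20 before decoding a form-encoded value.
function decodeFormValue(value) {
  return decodeURIComponent(value.replace(/\+/g, "%20"));
}

// URLSearchParams also writes spaces as + when serializing.
const serialized = new URLSearchParams({ q: "hello world" }).toString(); // "q=hello+world"

console.log(viaPlus, viaPercent, rawDecode, decodeFormValue("c%2B%2B+code"), serialized);
```

A literal plus sign survives decodeFormValue only when it was sent as %2B, which is why values should always be encoded before they reach the query string.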
Decoding URLs
// JavaScript decoding
decodeURIComponent("hello%20world"); // "hello world"
decodeURIComponent("black%20%26%20white"); // "black & white"
decodeURIComponent("%E4%B8%AD"); // "中"
// Reversibility: encode then decode gives you the original
const original = "Hello, 世界! & more";
const encoded = encodeURIComponent(original);
const decoded = decodeURIComponent(encoded);
decoded === original; // true
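One caveat when decoding untrusted input: decodeURIComponent throws a URIError on malformed escapes such as a lone % or a truncated multi-byte sequence. A minimal defensive wrapper might look like this; safeDecode is a hypothetical name, and returning null on failure is just one possible policy.

```javascript
// Hypothetical helper: return null instead of throwing on malformed input.
function safeDecode(text) {
  try {
    return decodeURIComponent(text);
  } catch (err) {
    if (err instanceof URIError) {
      return null; // malformed percent-escape
    }
    throw err; // anything else is unexpected
  }
}

console.log(safeDecode("hello%20world")); // "hello world"
console.log(safeDecode("100%"));          // null (lone %)
console.log(safeDecode("%E4%B8"));        // null (truncated 3-byte sequence)
```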
Common Pitfalls
- Double-encoding: accidentally calling encodeURIComponent twice. "hello world" becomes "hello%2520world" (the % got re-encoded). A receiver that decodes once sees "hello%20world" instead of the original, so the number of decodes must match the number of encodes.
- Using encodeURI for parameter values: it doesn't encode &, so an ampersand in user input breaks the URL structure. Always use encodeURIComponent for values.
- Trusting URLs in forms: HTML forms encode with application/x-www-form-urlencoded, which uses + for spaces. Server-side parsers must handle both + and %20.
- Encoding already-encoded URLs: if you're passing a full URL as a query parameter (e.g., a redirect URL), encode it once with encodeURIComponent. Don't pre-encode.
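Several of these pitfalls can be reproduced in a few lines. The URLs and variable names below are illustrative, and the round-trip check assumes a runtime with the WHATWG URL class.

```javascript
// Double-encoding: the % from the first pass is encoded again as %25.
const once = encodeURIComponent("hello world"); // "hello%20world"
const twice = encodeURIComponent(once);         // "hello%2520world"
const decodedOnce = decodeURIComponent(twice);  // "hello%20world", not the original

// encodeURI on a value: the & survives and would split the parameter.
const leaky = encodeURI("black & white"); // "black%20&%20white"

// A full URL as a parameter value: encode it exactly once.
const target = "https://example.com/a?x=1&y=2";
const loginUrl = `https://example.com/login?next=${encodeURIComponent(target)}`;
// "https://example.com/login?next=https%3A%2F%2Fexample.com%2Fa%3Fx%3D1%26y%3D2"
const roundTrip = new URL(loginUrl).searchParams.get("next"); // same as target

console.log(twice, decodedOnce, leaky, roundTrip === target);
```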
Frequently Asked Questions
- What's the maximum URL length?
- No hard spec limit, but practical limits: browsers support 2,000-8,000 characters (IE was 2,083 — still quoted as "the limit"). Servers may reject longer. For anything beyond a few hundred characters of parameter data, use a POST request with a body instead of a GET query string.
- Should I encode the full URL or just the parameters?
- Encode individual parameter values with encodeURIComponent. Never pass the full URL through encodeURIComponent: it would encode the :// after the scheme, breaking it. Build the URL structure manually with ${encodeURIComponent(value)} interpolation.
- Why does my URL look fine in the browser but fail when shared?
- Browsers display UTF-8 characters unencoded for readability. When copied, some systems get the displayed version and don't re-encode properly. Always generate URLs with encodeURIComponent; don't rely on the browser's display form being portable.
- Is URL encoding case-sensitive?
- The hex digits themselves are not: %3F and %3f both mean ?. RFC 3986 recommends uppercase for consistency, and some strict parsers only accept uppercase. Use uppercase when generating.
- What about URL encoding vs HTML entity encoding?
- They're different and not interchangeable. HTML entities (&amp;, &lt;) escape characters in HTML documents. URL encoding (%26, %3C) escapes characters in URLs. If you embed a URL inside HTML, you may need both: URL-encode the URL first, then HTML-entity-encode the ampersands in the resulting URL string.
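The two layers of escaping can be sketched as follows. escapeHtmlAttribute is a hypothetical helper written for this example; in a real project a templating library would normally handle this step.

```javascript
// Hypothetical helper: escape characters that are special in HTML attributes.
function escapeHtmlAttribute(text) {
  return text
    .replace(/&/g, "&amp;") // must run first so later entities are not re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Layer 1: URL-encode the value so the & in the data becomes %26.
const searchUrl =
  `https://example.com/search?q=${encodeURIComponent("black & white")}&page=2`;

// Layer 2: HTML-escape the finished URL so the structural & becomes &amp;.
const anchorHtml = `<a href="${escapeHtmlAttribute(searchUrl)}">Results</a>`;

console.log(anchorHtml);
// <a href="https://example.com/search?q=black%20%26%20white&amp;page=2">Results</a>
```

The browser reverses the layers in the opposite order: it first resolves &amp; to & when parsing the HTML, then the server decodes %26 when parsing the query string.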