URL Encoding Explained: Percent-Encoding, Reserved Chars, UTF-8

URL encoding (percent-encoding) converts characters that have special meaning in URLs — spaces, ampersands, non-ASCII — into %HH escape sequences. This guide explains the mechanics, when encoding is mandatory, the difference between encodeURI and encodeURIComponent, and how UTF-8 handles non-English characters.

April 2026 · 7 min read

Why URLs Need Encoding

URLs reserve certain characters as structural delimiters: / separates path segments, ? starts the query string, & separates query parameters, # marks a fragment. If your data contains any of these characters unencoded, browsers and servers interpret them as structure rather than content, which breaks the URL.

Example problem — a search query for "black & white":

/* Wrong: ampersand breaks the parameter parsing */
https://example.com/search?q=black & white

/* Correct: ampersand encoded */
https://example.com/search?q=black%20%26%20white
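To see the failure concretely, the following sketch feeds both query strings from the example to the standard URLSearchParams parser. The variable names are illustrative only, and a given server framework may differ in small details.

```javascript
// Parse both query strings the way a typical form-style parser would.
const broken = new URLSearchParams("q=black & white");
const fixed = new URLSearchParams("q=black%20%26%20white");

// The raw ampersand splits the input into two parameters.
const brokenKeys = [...broken.keys()]; // ["q", " white"]
const brokenValue = broken.get("q");   // "black "

// The encoded ampersand stays inside a single value.
const fixedValue = fixed.get("q");     // "black & white"

console.log(brokenKeys, JSON.stringify(brokenValue), JSON.stringify(fixedValue));
```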

How Percent-Encoding Works

Each encoded character is replaced by its UTF-8 bytes, with every byte written as % followed by two hexadecimal digits. ASCII characters occupy one byte, so they become a single %HH escape:

Character   Encoded                  Why encoded
space       %20 or + (in queries)    Not allowed in URLs
&           %26                      Parameter separator
?           %3F                      Starts query string
#           %23                      Starts fragment
/           %2F                      Path separator
=           %3D                      Key-value delimiter
+           %2B                      Means space in query context
%           %25                      Escape character itself
:           %3A                      Scheme separator
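The mechanics can be sketched by hand. The helper below, percentEncodeAll, is a hypothetical name used only for illustration. It escapes every byte, including letters that a real encoder would leave alone, so treat it as a demonstration of the byte-to-%HH step rather than a replacement for encodeURIComponent.

```javascript
// Hypothetical helper: percent-encode every UTF-8 byte of a string.
function percentEncodeAll(text) {
  const bytes = new TextEncoder().encode(text); // UTF-8 bytes as a Uint8Array
  return Array.from(
    bytes,
    (b) => "%" + b.toString(16).toUpperCase().padStart(2, "0")
  ).join("");
}

const amp = percentEncodeAll("&");   // "%26"
const space = percentEncodeAll(" "); // "%20"
const enye = percentEncodeAll("ñ");  // "%C3%B1" (two bytes, two escapes)

console.log(amp, space, enye);
```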

Reserved vs Unreserved Characters

Per RFC 3986:

  • Unreserved (never need encoding): A-Z a-z 0-9 - _ . ~
  • Reserved (encode when they'd be confused with structure): : / ? # [ ] @ ! $ & ' ( ) * + , ; =
  • Other (always encode): spaces, non-ASCII, control characters

Unreserved characters are guaranteed safe to pass through any URL context unchanged. Reserved characters are safe only when they're not being interpreted as structure — but since you usually don't control interpretation, encoding them is the safe default.
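One practical wrinkle: encodeURIComponent leaves ! ' ( ) * unescaped even though RFC 3986 lists them as reserved. If you want the stricter behavior, a small wrapper along these lines is a common approach. The name encodeRFC3986Component is an assumption for this sketch, not a built-in.

```javascript
// Hypothetical wrapper: also escape the reserved characters ! ' ( ) *
// that encodeURIComponent passes through unchanged.
function encodeRFC3986Component(value) {
  return encodeURIComponent(value).replace(
    /[!'()*]/g,
    (c) => "%" + c.charCodeAt(0).toString(16).toUpperCase()
  );
}

const sample = "it's (ok)!*";
const loose = encodeURIComponent(sample);      // "it's%20(ok)!*"
const strict = encodeRFC3986Component(sample); // "it%27s%20%28ok%29%21%2A"

// Unreserved characters pass through either function untouched.
const unreserved = encodeRFC3986Component("A-Z_a.z~09"); // "A-Z_a.z~09"

console.log(loose, strict, unreserved);
```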

encodeURI vs encodeURIComponent

JavaScript provides two built-in functions that behave differently:

// encodeURI: encodes spaces and non-ASCII but leaves URL structure intact
encodeURI("https://example.com/path with spaces?a=b&c=d");
// Result: "https://example.com/path%20with%20spaces?a=b&c=d"

// encodeURIComponent: also encodes the URL delimiters (: / ? & = #), so structure becomes data
encodeURIComponent("https://example.com/path with spaces?a=b&c=d");
// Result: "https%3A%2F%2Fexample.com%2Fpath%20with%20spaces%3Fa%3Db%26c%3Dd"

Rule of thumb: use encodeURIComponent for individual query values. Use encodeURI only when encoding a complete URL that might contain spaces, and you want to preserve the URL structure.
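The difference comes down to which delimiters each function leaves alone. As a quick check (variable names are illustrative), run both over a string made up only of delimiters:

```javascript
// Delimiters that encodeURI preserves and encodeURIComponent escapes.
const delimiters = ";/?:@&=+$,#";

const keptByEncodeURI = encodeURI(delimiters);
// ";/?:@&=+$,#" (unchanged)

const escapedByComponent = encodeURIComponent(delimiters);
// "%3B%2F%3F%3A%40%26%3D%2B%24%2C%23"

// Consequence: encodeURI does not protect a parameter value.
const unsafeValue = encodeURI("black & white");        // "black%20&%20white"
const safeValue = encodeURIComponent("black & white"); // "black%20%26%20white"

console.log(keptByEncodeURI, escapedByComponent, unsafeValue, safeValue);
```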

// Correct way to build a query URL:
const query = "black & white";
const url = `https://example.com/search?q=${encodeURIComponent(query)}`;
// Result: "https://example.com/search?q=black%20%26%20white"
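As an alternative to string interpolation, the standard URL and URLSearchParams APIs handle the escaping for you. This is a sketch with an assumed second parameter (page) added for illustration. Note that URLSearchParams serializes spaces as + rather than %20, because it follows the form-encoding rules.

```javascript
// Build the same search URL with the URL API instead of interpolation.
const searchUrl = new URL("https://example.com/search");
searchUrl.searchParams.set("q", "black & white");
searchUrl.searchParams.set("page", "2"); // illustrative extra parameter

const built = searchUrl.href;
// "https://example.com/search?q=black+%26+white&page=2"

// Reading the value back returns the original text.
const readBack = searchUrl.searchParams.get("q"); // "black & white"

console.log(built, readBack);
```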

UTF-8 and Non-ASCII Characters

Non-English characters like ñ, 中, and 한 are multi-byte in UTF-8. Each byte gets its own %HH escape:

encodeURIComponent("ñ");      // "%C3%B1"      (2 bytes)
encodeURIComponent("中");      // "%E4%B8%AD"   (3 bytes)
encodeURIComponent("한");      // "%ED%95%9C"   (3 bytes)
encodeURIComponent("😀");      // "%F0%9F%98%80" (4 bytes — emoji)

Modern browsers show percent-encoded UTF-8 in paths and query strings as readable characters in the address bar, but the underlying HTTP request still carries the percent-encoded bytes. This is why a URL looks clean in your browser but shows %E4%B8%AD when you copy-paste it. Internationalized Domain Names (IDN) are a separate mechanism: non-ASCII hostnames are converted to Punycode (xn--...) rather than percent-encoded.
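The two forms of the same path can be compared with the standard URL parser. The /wiki/ path here is made up for the example.

```javascript
// The URL parser stores the path in its percent-encoded (wire) form.
const page = new URL("https://example.com/wiki/中");

const wirePath = page.pathname;               // "/wiki/%E4%B8%AD"
const displayPath = decodeURI(page.pathname); // "/wiki/中"

console.log(wirePath, displayPath);
```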

Query String: + vs %20 for Spaces

In form-encoded query strings (application/x-www-form-urlencoded, the format HTML forms submit), + stands for a space. In paths, + is a literal plus sign. This inconsistency causes real bugs:

// encodeURIComponent always produces %20, never +
encodeURIComponent("hello world");
// "hello%20world"

// Form-style query parsers (URLSearchParams, most server frameworks)
// decode BOTH + and %20 as space, so these two are equivalent:
// search?q=hello+world
// search?q=hello%20world

// But in path segments, + stays as literal +
// /tags/c%2B%2B → "c++"  (encoded)
// /tags/c++     → "c++"  (no special meaning)

It is safer to emit %20 everywhere. The + shortcut is a legacy HTML form convention that causes ambiguity outside of pure query-string contexts.
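The ambiguity also shows up on the decoding side: decodeURIComponent does not treat + as a space, while URLSearchParams does. The helper decodeFormValue below is a hypothetical name, sketched for the case where you must decode a single form-encoded value by hand.

```javascript
const raw = "hello+world";

// decodeURIComponent leaves + alone.
const viaDecode = decodeURIComponent(raw); // "hello+world"

// URLSearchParams applies form-decoding rules, so + becomes a space.
const viaParams = new URLSearchParams("q=" + raw).get("q"); // "hello world"

// Hypothetical helper: form-decode one value manually.
// Replace + first, so that an encoded plus (%2B) survives as "+".
function decodeFormValue(value) {
  return decodeURIComponent(value.replace(/\+/g, " "));
}

const manual = decodeFormValue("c%2B%2B+tips"); // "c++ tips"

console.log(viaDecode, viaParams, manual);
```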

Decoding URLs

// JavaScript decoding
decodeURIComponent("hello%20world");        // "hello world"
decodeURIComponent("black%20%26%20white");  // "black & white"
decodeURIComponent("%E4%B8%AD");            // "中"

// Reversibility: encode then decode gives you the original
const original = "Hello, 世界! & more";
const encoded = encodeURIComponent(original);
const decoded = decodeURIComponent(encoded);
decoded === original;  // true
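Decoding can fail: decodeURIComponent throws a URIError when the input contains a stray % or a truncated UTF-8 sequence. For untrusted input, a guard like the following sketch may be useful. The name safeDecode and the choice to return null are assumptions made for illustration.

```javascript
// Hypothetical helper: return null instead of throwing on malformed input.
function safeDecode(value) {
  try {
    return decodeURIComponent(value);
  } catch (err) {
    if (err instanceof URIError) return null;
    throw err; // anything else is unexpected
  }
}

const ok = safeDecode("%E4%B8%AD");     // "中"
const strayPercent = safeDecode("100%"); // null
const truncated = safeDecode("%E4%B8");  // null (incomplete 3-byte sequence)

console.log(ok, strayPercent, truncated);
```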

Common Pitfalls

  • Double-encoding: accidentally calling encodeURIComponent twice. "hello world" becomes "hello%2520world" (the % got re-encoded). A single decode then yields "hello%20world" instead of the original. The reverse mistake, decoding a correctly encoded value twice, corrupts any data that contains a literal %.
  • Using encodeURI for parameter values: it doesn't encode &, so an ampersand in user input breaks the URL structure. Always use encodeURIComponent for values.
  • Ignoring form encoding: HTML forms submit as application/x-www-form-urlencoded, which uses + for spaces. Server-side parsers must handle both + and %20.
  • Encoding already-encoded URLs: if you're passing a full URL as a query parameter (e.g., a redirect URL), encode it exactly once with encodeURIComponent. Don't pre-encode it before that call.
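The first and last pitfalls are easy to reproduce. In this sketch the login URL and the next parameter name are made up for the example.

```javascript
// Double-encoding: the % from the first pass is itself escaped as %25.
const once = encodeURIComponent("hello world"); // "hello%20world"
const twice = encodeURIComponent(once);         // "hello%2520world"
const oneDecode = decodeURIComponent(twice);    // "hello%20world", not the original

// Passing a full URL as a parameter: encode it exactly once.
const target = "https://example.com/a?x=1&y=2";
const login = `https://example.com/login?next=${encodeURIComponent(target)}`;

// The receiving side gets the target back intact.
const recovered = new URL(login).searchParams.get("next");

console.log(twice, oneDecode, recovered === target);
```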

Frequently Asked Questions

What's the maximum URL length?
There is no hard limit in the spec, but practical limits apply. Internet Explorer capped URLs at 2,083 characters, a figure still quoted as "the limit". Current browsers accept far longer URLs, but servers and proxies may reject long request lines (around 8 KB is a common default). For anything beyond a few hundred characters of parameter data, use a POST request with a body instead of a GET query string.
Should I encode the full URL or just the parameters?
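Encode individual parameter values with encodeURIComponent. Never run encodeURIComponent over a full URL you intend to use directly: it would escape the :// after the scheme and every other delimiter, breaking the URL. The one exception is when the whole URL is itself a parameter value, such as a redirect target. Build the URL structure manually with ${encodeURIComponent(value)} interpolation.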
Why does my URL look fine in the browser but fail when shared?
Browsers display UTF-8 characters unencoded for readability. When copied, some systems get the displayed version and don't re-encode properly. Always generate URLs with encodeURIComponent — don't rely on the browser's display form being portable.
Is URL encoding case-sensitive?
The hex digits themselves are not: %3F and %3f both mean ?. RFC 3986 recommends uppercase hex digits for consistency, and code that compares URLs as plain strings will treat %3f and %3F as different. Use uppercase when generating.
What about URL encoding vs HTML entity encoding?
They're different and not interchangeable. HTML entities (&amp;, &lt;) escape characters in HTML documents. URL encoding (%26, %3C) escapes characters in URLs. If you embed a URL inside HTML, you may need both: URL-encode the URL first, then HTML-entity encode the ampersands in the resulting URL string.
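The two layers can be sketched as follows. The helper escapeHtmlAttr is a hypothetical, minimal escaper written for this example. In a real application, your template engine's built-in escaping is usually the better choice.

```javascript
// Hypothetical minimal escaper for text placed inside a double-quoted attribute.
function escapeHtmlAttr(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Layer 1: URL-encode the parameter value.
const term = 'black & "white"';
const link = `https://example.com/search?q=${encodeURIComponent(term)}&lang=en`;
// "https://example.com/search?q=black%20%26%20%22white%22&lang=en"

// Layer 2: HTML-escape the finished URL for the attribute.
const anchor = `<a href="${escapeHtmlAttr(link)}">Search</a>`;
// The & between parameters becomes &amp; while %26 stays as it is.

console.log(anchor);
```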