Base64 and URL Encoding: Purpose, Pitfalls, Correct Usage

Published 2026-04-13 · 7 min read

Summary (TL;DR)

Run echo -n "Hi" | base64 and you get exactly four characters back: SGk=. Two input bytes (16 bits) are sliced into 6-bit chunks and padded out to a 4-character output, and that is the entire arithmetic. Adopt that simple rule as a default of “wrap everything to be safe” and the cost compounds in a hurry: three input bytes always become four output characters, which is about 33% inflation, and a 1.2 MB JPEG I once watched a service Base64-wrap into a JSON field landed at roughly 1.6 MB on the wire.

Encoding is not encryption, and it is not compression. It is a set of rules for expressing one alphabet inside another so the data survives a channel that would otherwise mangle it. Base64 (RFC 4648 §4) exists to ship arbitrary bytes through systems that only understand text: email bodies, JSON string fields, inline image data. Base64URL (the same RFC, §5) swaps two characters (+ → -, / → _) so the result is safe to drop into a URL path, a filename, or a JWT. Percent-encoding (RFC 3986) is a separate tool: it converts characters that mean something to URLs (?, &, #) or that cannot appear in one verbatim (spaces, non-ASCII text) into %XX sequences. Pick the encoding that matches the channel: raw binary if the path supports it, Base64 for text channels, Base64URL for URL-embedded data, and percent-encoding for anything already in a URL context. A Base64 string remains readable to anyone with a decoder, so never treat encoding as a secrecy mechanism.

Background

Text-only channels existed long before anyone asked them to carry images. Email historically assumed 7-bit ASCII and stripped or corrupted bytes with the high bit set; HTTP headers still restrict certain characters; JSON strings must be valid Unicode text, so arbitrary bytes cannot be embedded raw and control characters (U+0000–U+001F) have to be escaped. Base64, standardized in RFC 4648, solves the problem by mapping every 3 input bytes (24 bits) onto 4 output characters of 6 bits each. The alphabet is 64 characters: A-Z (indices 0–25), a-z (26–51), 0-9 (52–61), + (62), / (63), with = as padding to keep the output length a multiple of four. One byte of input produces XX== (8 bits → 6+2, zero-padded), two bytes produce XXX= (16 bits → 6+6+4, zero-padded), and three bytes produce XXXX without padding.
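The 3-bytes-to-4-characters rule can be sketched by hand. The function below is a minimal illustration, not a replacement for a library encoder; the name b64encode_manual is made up for this example, and the result is cross-checked against Python's standard base64 module.

```python
import base64

# Standard Base64 alphabet from RFC 4648 section 4: indices 0..63.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def b64encode_manual(data: bytes) -> str:
    """Encode bytes as standard Base64 by slicing 24-bit groups into 6-bit indices.

    Hypothetical helper for illustration only.
    """
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # Left-align the chunk in a 24-bit integer; missing bytes become zero bits.
        n = int.from_bytes(chunk, "big") << (8 * (3 - len(chunk)))
        # Four 6-bit indices, most significant first.
        indices = [(n >> shift) & 0x3F for shift in (18, 12, 6, 0)]
        # 1 byte -> 2 data characters, 2 bytes -> 3, 3 bytes -> 4; the rest is '='.
        keep = len(chunk) + 1
        out.append("".join(ALPHABET[j] for j in indices[:keep]) + "=" * (4 - keep))
    return "".join(out)


for sample in (b"H", b"Hi", b"Hi!"):
    encoded = b64encode_manual(sample)
    # Cross-check against the standard library encoder.
    assert encoded == base64.b64encode(sample).decode("ascii")
    print(sample, "->", encoded)  # SA==, SGk=, SGkh
```

The three samples cover the three padding cases: one leftover byte, two leftover bytes, and a full group.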

The URL-safe variant of Base64 only changes alphabet indices 62 and 63 — + becomes -, / becomes _ — because + decodes as a space in legacy form submissions and / is a path separator. In many URL-safe contexts the = padding is dropped as well; this is the form used in JSON Web Tokens (RFC 7519), where a compact header.payload.signature triple has to fit in an Authorization: Bearer ... header without re-escaping.
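A minimal sketch of the URL-safe variant with padding stripped, using Python's standard base64 module. The helper names b64url_encode and b64url_decode are assumptions for this example; the input bytes were picked only because they land on alphabet indices 62 and 63.

```python
import base64


def b64url_encode(data: bytes) -> str:
    """Base64URL without '=' padding, the compact form used in JWTs."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Restore the stripped padding, then decode with the URL-safe alphabet."""
    padding = "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(text + padding)


raw = b"\xfb\xff"  # encodes to indices 62, 63, 60
print(base64.b64encode(raw).decode("ascii"))  # +/8=
print(b64url_encode(raw))                     # -_8
assert b64url_decode(b64url_encode(raw)) == raw
```

Re-adding padding before decoding matters because Python's decoder, like many others, rejects input whose length is not a multiple of four.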

Percent-encoding (sometimes called URL-encoding) is separate. RFC 3986 defines two categories of ASCII: unreserved characters (A-Z, a-z, 0-9, -, ., _, ~) that may appear verbatim, and reserved characters (:, /, ?, #, [, ], @, !, $, &, ', (, ), *, +, ,, ;, =) that mean something structurally and must be encoded when they appear as data rather than syntax. Any byte outside those sets, including each byte of a UTF-8–encoded non-ASCII character, is written as %XX in hexadecimal. In JavaScript, encodeURIComponent escapes everything except letters, digits, and - _ . ! ~ * ' ( ), which makes it the right choice for a single component; encodeURI is more permissive and also leaves reserved characters such as : / ? # & = + intact, because it assumes you are passing a whole URL.
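As an illustration in Python rather than JavaScript, urllib.parse.quote shows the same split between "encode a single component" and "leave URL structure alone". The sample strings are invented for the example; safe="" asks quote to escape every reserved character, while its default (safe="/") keeps path separators.

```python
from urllib.parse import quote, unquote

value = "a/b?c=d&e#f g"

# Component-level encoding: every reserved character and the space are escaped.
component = quote(value, safe="")
print(component)  # a%2Fb%3Fc%3Dd%26e%23f%20g

# Default behaviour keeps '/' so an existing path structure survives.
print(quote("docs/my file.txt"))  # docs/my%20file.txt

# Unreserved characters pass through untouched.
print(quote("A-Z_a.z~09", safe=""))  # A-Z_a.z~09

# Non-ASCII text is encoded as UTF-8 first, then each byte becomes %XX.
print(quote("é", safe=""))  # %C3%A9

# Decoding reverses the process.
assert unquote(component) == value
```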

A subtle consequence is that these three encodings are not interchangeable, even when their outputs happen to look alike. A percent-encoded query parameter is ASCII text in which %XX triplets stand for individual bytes; feeding it through a Base64 decoder produces garbage or an error. A standard Base64 string dropped into a URL path breaks silently because any / inside it is read as a path separator; I have watched a colleague lose half a day to a single such character. Mix-ups like these are a recurring source of encoding bugs in production.

Data / Comparison

| Property | Base64 (standard) | Base64URL | Percent-encoding |
|---|---|---|---|
| When to use | Text channel that rejects raw binary (email MIME, JSON string of bytes) | Same, but embedded in a URL, filename, or JWT | A single URL component containing reserved or non-ASCII characters |
| Alphabet | A–Z a–z 0–9 + / = | A–Z a–z 0–9 - _ (padding optional) | Unreserved set; everything else becomes %XX |
| Overhead | About 33% (4 chars per 3 bytes), plus 76-character line wrapping in MIME | About 33%, no padding in typical use | Varies; 1 byte of UTF-8 becomes 3 ASCII bytes (%XX) |
| Breaks on | Raw + and / in URLs, un-padded length in strict decoders | Standard Base64 decoders that require +/= | Double-encoding (already percent-encoded data gets encoded again) |
| Common uses | MIME attachments, data URIs, inline binaries in JSON | JWT, URL-safe IDs, short links | Query parameters, path segments, form bodies |

The numbers matter for budgeting. A 1 MB image becomes roughly 1.33 MB once Base64-encoded; MIME’s 76-character line wrapping adds a CRLF per line, about another 2.6%, for roughly 1.37 MB in total. In an HTTP response that inflation costs the server, the network, and the client’s parser. Percent-encoding is usually a smaller issue at the string level but can multiply bytes for CJK text: a Korean character takes 3 bytes in UTF-8, which become 9 ASCII characters after encoding. The two-character word “안녕” turns into the 18-character sequence %EC%95%88%EB%85%95 inside a URL.
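The size arithmetic can be checked with a few lines of Python. This is a rough sketch: base64_len and mime_base64_len are hypothetical helpers, "1 MB" is taken as 1,000,000 bytes, and the MIME figure assumes a 2-byte CRLF after every line of up to 76 characters.

```python
import base64
from urllib.parse import quote


def base64_len(n_bytes: int) -> int:
    """Length of padded Base64 output: 4 characters per started 3-byte group."""
    return 4 * ((n_bytes + 2) // 3)


def mime_base64_len(n_bytes: int, line_len: int = 76) -> int:
    """Base64 length plus a 2-byte CRLF after each line of at most line_len chars."""
    chars = base64_len(n_bytes)
    lines = (chars + line_len - 1) // line_len
    return chars + 2 * lines


n = 1_000_000
# The formula agrees with what the standard library actually produces.
assert base64_len(n) == len(base64.b64encode(bytes(n)))

print(round(base64_len(n) / n, 3))       # 1.333
print(round(mime_base64_len(n) / n, 3))  # 1.368

# Percent-encoding: each UTF-8 byte of a Korean syllable becomes three characters.
encoded = quote("안녕", safe="")
print(encoded, len(encoded))  # %EC%95%88%EB%85%95 18
```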

Real-world Scenarios

Scenario 1 — Email attachments. A PDF travels inside a MIME part with Content-Transfer-Encoding: base64. The mail client Base64-encodes the file, wraps lines at 76 characters (the classic MIME limit), and sends it. The receiver reverses the process. The text-only assumption of SMTP made Base64 the default here long before modern extensions like 8BITMIME or BINARYMIME existed, and most mail servers still emit Base64 for safety. The 33% overhead is accepted as the cost of reliable delivery.

Scenario 2 — JSON Web Tokens. A JWT (RFC 7519) is header.payload.signature, where each segment is Base64URL-encoded JSON or bytes, without padding. The URL-safe alphabet means tokens can appear in Authorization headers, access_token query parameters, and log lines without re-escaping. The absence of padding keeps them short. One warning: anyone decoding a JWT can read the claims — the encoding is not security; the HMAC or RSA signature is. Do not put secrets in the payload.
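The point that the payload is readable without any key, while only the signature provides integrity, can be shown with the standard library alone. This is an illustrative sketch, not a complete JWT implementation: it handles only HS256, skips claim validation such as expiry, and the function names, claims, and keys are made up for the example. A maintained JWT library is the safer choice in production.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64URL without padding."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Re-add the stripped padding, then decode."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def make_hs256_token(claims: dict, key: bytes) -> str:
    """Build header.payload.signature with an HMAC-SHA256 signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    sig = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(sig)}"


def read_claims_unverified(token: str) -> dict:
    """Anyone holding the token can do this: no key is involved."""
    return json.loads(b64url_decode(token.split(".")[1]))


def signature_is_valid(token: str, key: bytes) -> bool:
    """Recompute the HMAC over header.payload and compare in constant time."""
    signing_input, _, sig = token.rpartition(".")
    expected = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return hmac.compare_digest(expected, b64url_decode(sig))


token = make_hs256_token({"sub": "user-123", "admin": False}, b"demo-key")
print(read_claims_unverified(token))            # claims are visible without the key
print(signature_is_valid(token, b"demo-key"))   # True
print(signature_is_valid(token, b"wrong-key"))  # False
```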

Scenario 3 — Data URIs for small images. A background-image: url("data:image/png;base64,iVBORw0K...") inlines a PNG directly into CSS. This avoids a round-trip for tiny assets like icons. But Base64’s 33% overhead and the loss of browser caching and parallel requests mean it is only a win for assets small enough that the extra HTTP round-trip would have cost more. In my experience the break-even is roughly 1–2 KB; above that, a separate cached file or an SVG is usually faster.
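Building a data URI is a one-liner once the bytes are Base64-encoded. The sketch below uses a hypothetical helper, to_data_uri, and feeds it only the 8-byte PNG file signature rather than a real image, which is enough to show why PNG data URIs start with iVBORw0K.

```python
import base64


def to_data_uri(data: bytes, mime_type: str) -> str:
    """Wrap raw bytes in a data: URI using standard (padded) Base64."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# The fixed 8-byte signature that opens every PNG file (not a displayable image).
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

uri = to_data_uri(PNG_SIGNATURE, "image/png")
print(uri)  # data:image/png;base64,iVBORw0KGgo=
```

For a real asset, comparing len(uri) with the original file size gives a quick read on whether inlining is still worth it.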

Scenario 4 — Uploading a user avatar. A common anti-pattern is to read a file with FileReader.readAsDataURL, which yields a data:image/png;base64,... string in the reader's result, and then POST that string as a JSON field. It works, but it is almost never the right shape: the payload is now 33% larger, the server has to decode it before writing to disk, and both sides burn memory on the string form. In one case I observed, a 5 MB image ballooned to roughly 6.7 MB of JSON and caused timeouts on mobile networks; switching to multipart/form-data shipped the same file at 5 MB. Reach for Base64 in this flow only when the transport layer genuinely requires a JSON string, such as when a third-party API does not accept multipart.
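The size difference in this scenario is easy to reproduce. The sketch below stands in for the browser side with Python; json_upload_size and the avatar field name are assumptions, and the 5,000,000 zero bytes are a placeholder for a real image. A multipart body adds only a small fixed overhead for boundaries and part headers on top of the raw size.

```python
import base64
import json


def json_upload_size(data: bytes, field: str = "avatar") -> int:
    """Size in bytes of a JSON body that carries the file as a Base64 string."""
    body = json.dumps({field: base64.b64encode(data).decode("ascii")})
    return len(body.encode("utf-8"))


raw = bytes(5_000_000)  # placeholder for a 5 MB image
wrapped = json_upload_size(raw)

print(len(raw))                      # 5000000
print(wrapped)                       # about 6.67 million
print(round(wrapped / len(raw), 3))  # about 1.333
```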

Common Misconceptions

“Base64 is encryption.” It is not. The mapping is published in RFC 4648, the alphabet is fixed, and any decoder returns the original bytes. Encoding protects the transport, not the contents. If the payload is sensitive, encrypt first and Base64 the ciphertext for transport.

“URL encoding is only needed for non-ASCII characters.” Many reserved ASCII characters also have to be encoded when they appear as data. A query value containing & must become %26, or the server will parse it as a new parameter. # must become %23, or everything after it gets treated as the fragment. The rule is structural, not character-set–based.
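The structural nature of the rule shows up as soon as a query value contains & or #. In this sketch the example.com URL and the parameter names are placeholders; urlencode, parse_qs, and urlsplit come from Python's standard urllib.parse.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

params = {"q": "salt & pepper", "tag": "#1"}

# Naive concatenation: '&' starts a new parameter and '#' starts the fragment.
naive = "https://example.com/search?q=" + params["q"] + "&tag=" + params["tag"]
print(parse_qs(urlsplit(naive).query))  # the original values are not recovered

# urlencode escapes each key and value before joining them with '&'.
query = urlencode(params)
print(query)  # q=salt+%26+pepper&tag=%231

safe_url = "https://example.com/search?" + query
assert {k: v[0] for k, v in parse_qs(urlsplit(safe_url).query).items()} == params

# quote_via=quote yields %20 for spaces instead of the form-style '+'.
print(urlencode(params, quote_via=quote))  # q=salt%20%26%20pepper&tag=%231
```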

“Everything going through HTTP should be Base64-encoded for safety.” HTTP happily carries binary bodies — Content-Type: application/octet-stream, chunked transfer, any byte value. Base64 is a workaround for channels that do not, and paying a 33% overhead when the channel already handles bytes is pure tax. Use multipart/form-data or a raw body for file uploads, and reserve Base64 for the cases where the channel genuinely needs text (a JSON field, a URL parameter, a MIME body).

“Base64 and Base64URL are interchangeable.” They are not. Feeding a URL-safe string into a strict Base64 decoder that expects +/ will fail or produce garbage. Libraries usually provide both; match the encoder to the decoder end to end. Node’s Buffer.from(s, 'base64') accepts either alphabet, but not every standard library is that lenient.
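Python's standard library is one place where the mismatch is visible. In the sketch below, decode_standard_strict is a hypothetical wrapper around base64.b64decode with validate=True, which rejects any character outside the standard alphabet instead of silently skipping it.

```python
import base64
import binascii


def decode_standard_strict(text: str) -> bytes:
    """Decode standard Base64, rejecting characters outside A-Z a-z 0-9 + / =."""
    return base64.b64decode(text, validate=True)


url_safe = base64.urlsafe_b64encode(b"\xfb\xff").decode("ascii")  # "-_8="

try:
    decode_standard_strict(url_safe)
except binascii.Error as exc:
    print("strict standard decoder rejected it:", exc)

# The matching decoder handles the same string without trouble.
print(base64.urlsafe_b64decode(url_safe))  # b'\xfb\xff'

# If only a standard decoder is available, translate the alphabet first.
translated = url_safe.translate(str.maketrans("-_", "+/"))
assert decode_standard_strict(translated) == b"\xfb\xff"
```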

Checklist

  1. Is the channel 7-bit-only or structured text? Email body, JSON string of bytes, CSS data: URI → Base64.
  2. Is the encoded value going into a URL, filename, or JWT? Use Base64URL and decide whether padding is allowed.
  3. Is the value a URL component already? Percent-encode, and only the parts that need it.
  4. Is the data large? Consider whether the channel really requires encoding — a raw binary body avoids 33% overhead.
  5. Is decoding strict or permissive? Match standard/URL-safe alphabets end to end; do not mix.
  6. Is confidentiality a goal? Encrypt before encoding. Encoding alone never makes data secret.

The Patrache Studio Base64 encoder/decoder handles both standard and URL-safe variants locally, so the input bytes do not leave your browser — useful when the data is a token or a small key. If the Base64 payload happens to be a JWT, pair it with JSON Formatting, Validation, and Schema in Practice to inspect the decoded claims cleanly. And if the encoded blob is a compact ID derived from a UUID, UUID v1 vs v4 vs v7: Picking a DB Primary Key explains why the original bytes you Base64URL-encoded still matter for sort order and indexing.

References