HTML Entity Encoder / Decoder
Encode and decode HTML entities with named and numeric entity support.
About HTML Entity Encoder / Decoder
HTML entities are special sequences of characters that represent reserved or invisible characters within HTML documents. Because HTML uses angle brackets, ampersands, and quotation marks as part of its own syntax, embedding these characters directly in content creates ambiguity for the browser's parser. HTML entities resolve this by providing escape sequences -- either named references like &amp; for an ampersand, or numeric references like &#60; for a less-than sign -- that the browser decodes and renders as the intended character without confusing it for markup.
The HTML specification defines over 2,000 named character references covering everything from basic punctuation (&quot;, &lt;, &gt;) to mathematical symbols (&sum;, &infin;), currency signs (&euro;, &yen;), arrows (&larr;, &rarr;), and the full range of typographic characters (&mdash;, &lsquo;, &hellip;). Numeric entities can be expressed in decimal (&#38;) or hexadecimal (&#x26;) form, giving developers a universal fallback for any Unicode code point even when no named reference exists. This makes numeric entities particularly useful for emoji, CJK characters, and other symbols outside the basic Latin range.
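As a minimal sketch of the decimal and hexadecimal forms described above, the following Python snippet builds a numeric reference from a character's Unicode code point. The helper name `numeric_entity` is hypothetical and chosen for illustration; it is not part of this tool.

```python
import html


def numeric_entity(ch: str, use_hex: bool = False) -> str:
    """Return the numeric character reference for a single character."""
    code_point = ord(ch)
    if use_hex:
        return f"&#x{code_point:X};"
    return f"&#{code_point};"


print(numeric_entity("&"))                # &#38;
print(numeric_entity("&", use_hex=True))  # &#x26;
print(numeric_entity("\U0001F600"))       # &#128512; (an emoji with no named reference)

# Both forms decode back to the same character.
assert html.unescape(numeric_entity("\U0001F600", use_hex=True)) == "\U0001F600"
```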
Beyond correctness, HTML entity encoding plays a critical role in web security. Cross-site scripting (XSS) attacks exploit situations where user-supplied content is inserted into a page without proper encoding, allowing an attacker to inject script tags or event handlers. By encoding output so that angle brackets become &lt; and &gt;, and quotation marks become &quot; or &#39;, the browser treats the content as inert text rather than executable markup. This output encoding is one of the primary defenses recommended by the OWASP Foundation, and understanding how and when to apply it is essential knowledge for every web developer working with server-rendered HTML, templating engines, or content management systems.
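The following sketch shows output encoding with Python's standard `html.escape`. The `render_comment` function and the sample payload are assumptions made for illustration. Note that Python writes the apostrophe as the hexadecimal reference &#x27;, which is equivalent to &#39;.

```python
import html


def render_comment(user_text: str) -> str:
    """Wrap untrusted text in a paragraph, encoding it at output time."""
    # quote=True also encodes " and ', which matters inside attribute values.
    return f"<p>{html.escape(user_text, quote=True)}</p>"


payload = "<script>alert('x')</script>"
print(render_comment(payload))
# <p>&lt;script&gt;alert(&#x27;x&#x27;)&lt;/script&gt;</p>
```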
How to Use the HTML Entity Encoder / Decoder
- Paste or type your text into the input panel on the left side of the tool.
- Select the operation mode: Encode to convert characters into HTML entities, or Decode to convert entities back to readable text.
- Choose the entity format: Named entities (e.g., &amp;) for readability, or Numeric entities (e.g., &#38;) for universal compatibility.
- Toggle whether to encode all characters or only special HTML characters (<, >, &, ", ') that could conflict with markup.
- View the converted output in real-time in the right panel as you type or modify settings.
- Use the Swap button to move the output back into the input field for round-trip testing or iterative encoding.
- Copy the result to your clipboard with the copy button for use in your HTML templates, emails, or source code.
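The options in the steps above can be approximated in a few lines of Python. This is a sketch under stated assumptions, not the tool's actual implementation: the function name `encode_entities` and its parameters are hypothetical, and named lookups use Python's `html.entities.codepoint2name` table, which covers only the HTML 4 names (so the apostrophe falls back to a numeric reference).

```python
from html.entities import codepoint2name

# The five characters that can conflict with markup.
SPECIAL = set("<>&\"'")


def encode_entities(text: str, fmt: str = "named", encode_all: bool = False) -> str:
    """Encode text as HTML entities.

    fmt: "named" prefers a named reference when one is known, "numeric" always
         emits a decimal reference.
    encode_all: when False, only the five special characters are encoded.
    """
    out = []
    for ch in text:
        if not (encode_all or ch in SPECIAL):
            out.append(ch)
            continue
        name = codepoint2name.get(ord(ch)) if fmt == "named" else None
        out.append(f"&{name};" if name else f"&#{ord(ch)};")
    return "".join(out)


print(encode_entities('<a href="x">'))            # &lt;a href=&quot;x&quot;&gt;
print(encode_entities("a & b", fmt="numeric"))    # a &#38; b
print(encode_entities("ab", encode_all=True))     # &#97;&#98;
```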
Common Use Cases
Preventing XSS in User-Generated Content
When displaying user-submitted text such as comments, forum posts, or profile descriptions in an HTML page, encoding special characters before insertion into the DOM prevents attackers from injecting script tags or malicious event handlers. This output encoding converts <script> into &lt;script&gt;, rendering it as visible text rather than executable code. It serves as a critical last line of defense alongside input validation and Content Security Policy headers.
Embedding Code Snippets in HTML Documents
Technical documentation, tutorials, and blog posts frequently display code examples that contain angle brackets, ampersands, and other characters with special meaning in HTML. Encoding these characters ensures the code renders literally in the browser rather than being interpreted as markup. This is especially important when showing HTML or XML code within <pre> and <code> blocks where the raw syntax must be preserved verbatim.
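A small sketch of this use case in Python: the hypothetical helper `code_block` encodes a source snippet and wraps it in <pre> and <code> tags. It assumes the snippet goes into element content only, so quotes are left untouched with `quote=False`.

```python
import html


def code_block(source: str) -> str:
    """Encode a code snippet so it renders literally inside <pre><code>."""
    # Only &, < and > need encoding in element content; quotes stay readable.
    return f"<pre><code>{html.escape(source, quote=False)}</code></pre>"


print(code_block('<div class="a">&</div>'))
# <pre><code>&lt;div class="a"&gt;&amp;&lt;/div&gt;</code></pre>
```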
Composing HTML Emails with Special Characters
Email clients have inconsistent support for character encoding, and many strip or misinterpret non-ASCII characters. Using HTML entities for typographic quotes, em dashes, copyright symbols, and currency signs ensures the intended characters display correctly across Outlook, Gmail, Apple Mail, and other clients regardless of the declared charset or the recipient's locale settings.
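One way to get this effect in Python is the built-in `xmlcharrefreplace` error handler, which replaces every non-ASCII character with a decimal numeric reference. The helper name `ascii_safe` and the sample markup are illustrative assumptions. The helper leaves existing tags alone, so it is meant for markup that is already valid HTML.

```python
import html


def ascii_safe(html_text: str) -> str:
    """Replace non-ASCII characters with numeric references; ASCII is unchanged."""
    return html_text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


# Copyright sign, e-acute, em dash and euro sign written as escapes for clarity.
body = "<p>\u00a9 2024 Caf\u00e9 \u2014 only \u20ac5</p>"
print(ascii_safe(body))
# <p>&#169; 2024 Caf&#233; &#8212; only &#8364;5</p>
```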
Sanitizing Data for Template Engines
Template engines like Handlebars, Jinja2, and EJS can auto-escape output, either by default or when configured to, but developers sometimes bypass this with raw output modes for trusted content. When inserting data that originates from external sources -- API responses, database records, CSV imports -- through a raw output mode or with auto-escaping disabled, encoding HTML entities first prevents accidental markup injection. Values that still pass through an auto-escaping path should be left unencoded, because otherwise they are encoded twice.
Frequently Asked Questions
What is the difference between named and numeric HTML entities?
Named entities use a human-readable alias like &amp; for an ampersand or &lt; for a less-than sign, making source code easier to read and maintain. Numeric entities use the Unicode code point in either decimal (&#38;) or hexadecimal (&#x26;) form. Every named entity has an equivalent numeric form, but not every Unicode character has a named entity. Numeric entities are the universal fallback and work in any HTML or XML context, while named entities require the browser or parser to recognize the specific alias.
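The equivalence can be checked with Python's standard library, shown here as a brief sketch. `html.unescape` decodes all three forms, and `html.entities.name2codepoint` maps an HTML 4 name to its code point. The sample values are chosen for illustration.

```python
import html
from html.entities import html5, name2codepoint

# Named, decimal and hexadecimal forms of the same character.
forms = ["&amp;", "&#38;", "&#x26;"]
decoded = [html.unescape(form) for form in forms]
print(decoded)  # ['&', '&', '&']

# A named reference maps to a code point, which gives its numeric form.
print(name2codepoint["euro"])  # 8364, so &euro; equals &#8364;

# Not every character has a name: this emoji appears in no HTML5 named reference.
has_name = "\U0001F600" in html5.values()
print(has_name)  # False
```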
Which characters must be encoded in HTML content?
At minimum, you must encode the five characters that have special meaning in HTML: the ampersand (&), less-than sign (<), greater-than sign (>), double quote ("), and single quote/apostrophe ('). The ampersand starts an entity reference, angle brackets delimit tags, and quotes delimit attribute values. Failing to encode these characters can break your markup or create security vulnerabilities. Other characters like non-breaking spaces, em dashes, and emoji are optional to encode but can improve cross-platform compatibility.
Does HTML entity encoding prevent all XSS attacks?
HTML entity encoding is highly effective at preventing XSS in HTML body content by neutralizing angle brackets and script injection. However, it is not sufficient in all contexts. Encoding requirements differ depending on where user data appears: within HTML attributes you also need to encode quotes, inside JavaScript blocks you need JavaScript-specific escaping, within URLs you need URL encoding, and inside CSS you need CSS escaping. The OWASP recommendation is to apply context-specific output encoding for each insertion point rather than relying on a single encoding strategy.
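To illustrate how the encoding differs by context, the sketch below encodes one untrusted string for three insertion points using only the Python standard library. It is deliberately simplified: the JavaScript variant relies on `json.dumps` plus an extra replacement of `<`, which is an assumption of this example and not a complete JavaScript escaper.

```python
import html
import json
from urllib.parse import quote

user_input = '"><script>alert(1)</script>'

# HTML body or attribute value: entity encoding, including quotes.
html_attr = html.escape(user_input, quote=True)

# URL query parameter: percent-encoding, with no characters treated as safe.
url_param = quote(user_input, safe="")

# JavaScript string literal: JSON string syntax, with "<" written as \u003c
# so the literal cannot close a surrounding <script> element.
js_string = json.dumps(user_input).replace("<", "\\u003c")

print(html_attr)  # &quot;&gt;&lt;script&gt;alert(1)&lt;/script&gt;
print(url_param)  # %22%3E%3Cscript%3Ealert%281%29%3C%2Fscript%3E
print(js_string)
```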
Are HTML entities the same as URL encoding or Base64?
No, these are three distinct encoding schemes for different purposes. HTML entities encode characters for safe display within HTML documents. URL encoding (percent-encoding) encodes characters for safe transmission in URLs and query strings, using the %XX format. Base64 encoding converts binary data into an ASCII string representation for transport in text-based protocols. Each encoding scheme has its own reserved characters, syntax, and use cases, and applying the wrong encoding to a given context will produce incorrect results.
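A short comparison in Python makes the difference concrete by applying each scheme to the same sample string. The input value is chosen for illustration.

```python
import base64
import html
from urllib.parse import quote

text = "a<b&c"

as_entities = html.escape(text)                             # for HTML documents
as_percent = quote(text, safe="")                           # for URLs
as_base64 = base64.b64encode(text.encode("utf-8")).decode("ascii")  # for binary-safe transport

print(as_entities)  # a&lt;b&amp;c
print(as_percent)   # a%3Cb%26c
print(as_base64)    # YTxiJmM=
```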
Why do some web pages show raw entities like &amp; instead of the intended character?
This usually indicates double encoding, where content that was already entity-encoded gets encoded a second time. The ampersand in &amp; itself gets encoded to &amp;amp;, so the browser displays the literal text &amp; instead of the intended character. Double encoding commonly occurs when data passes through multiple processing layers that each apply encoding independently. The fix is to ensure encoding happens exactly once, typically at the final output stage, and that earlier stages pass raw text rather than pre-encoded strings.
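Double encoding is easy to reproduce with Python's `html.escape`, as in this minimal sketch with a sample string chosen for illustration. Decoding the twice-encoded value once, as a browser does, yields the once-encoded text instead of the original.

```python
import html

raw = "Tom & Jerry"
once = html.escape(raw)    # correct: encoded exactly once
twice = html.escape(once)  # bug: a second layer encodes the ampersand again

print(once)   # Tom &amp; Jerry
print(twice)  # Tom &amp;amp; Jerry

# The browser decodes one level, so the user sees the entity text.
print(html.unescape(twice))  # Tom &amp; Jerry
```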
Should I encode all characters or just the special HTML characters?
For most use cases, encoding only the five special HTML characters (<, >, &, ", ') is sufficient and produces the most readable output. Encoding all characters converts every letter, digit, and symbol into its numeric entity form, which dramatically increases the size of the output and makes it nearly impossible to read in source view. Full encoding can be useful in niche scenarios such as obfuscating email addresses from basic scrapers or ensuring absolute compatibility with systems that have unreliable character set handling, but it is not recommended for general use.
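The size difference can be measured with a quick sketch in Python. The sample string is an assumption for illustration, and full encoding is done here with decimal numeric references.

```python
import html

text = 'Contact <admin@example.com> & "support"'

# Encode only the special HTML characters.
special_only = html.escape(text)

# Encode every character as a decimal numeric reference.
everything = "".join(f"&#{ord(ch)};" for ch in text)

print(len(text), len(special_only), len(everything))

# Both outputs decode to the same text; only size and readability differ.
assert html.unescape(special_only) == text
assert html.unescape(everything) == text
```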