HTML entities are special codes that represent characters in HTML. They begin with an ampersand (&) and end with a semicolon (;). HTML entities are necessary because certain characters have special meanings in HTML and must be escaped to display correctly or to prevent security vulnerabilities. For example, `&lt;` represents the less-than sign (<), and `&gt;` represents the greater-than sign (>).
Cross-Site Scripting (XSS) is one of the most common web security vulnerabilities. When user input containing HTML or JavaScript is displayed without proper encoding, attackers can inject malicious scripts. HTML entity encoding converts potentially dangerous characters into safe representations.
Example: If a user enters <script>alert('XSS')</script>, encoding converts it to `&lt;script&gt;alert('XSS')&lt;/script&gt;`, which displays as text instead of executing.
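The conversion described above can be sketched with Python's standard `html.escape`. The helper name `encode_for_html` is an assumption made for illustration. Note that `html.escape` also encodes the apostrophes (as `&#x27;`), which is harmless in text content.

```python
import html


def encode_for_html(text: str) -> str:
    """Escape &, <, > and both quote characters for HTML output."""
    return html.escape(text, quote=True)


payload = "<script>alert('XSS')</script>"
encoded = encode_for_html(payload)

# The browser shows this as literal text; no <script> element is created.
print(encoded)  # &lt;script&gt;alert(&#x27;XSS&#x27;)&lt;/script&gt;
```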
HTML uses characters like <, >, and & for its syntax. To display these characters as literal text, they must be encoded. Without encoding, browsers interpret them as HTML markup.
Some characters may not display correctly across different browsers, devices, or character encodings. HTML entities provide a universal way to represent these characters.
| Character | Entity Name | Entity Number | Description |
|---|---|---|---|
| < | `&lt;` | `&#60;` | Less than sign |
| > | `&gt;` | `&#62;` | Greater than sign |
| & | `&amp;` | `&#38;` | Ampersand |
| " | `&quot;` | `&#34;` | Double quotation mark |
| ' | `&apos;` | `&#39;` | Apostrophe (single quote) |
| (space) | `&nbsp;` | `&#160;` | Non-breaking space |
| © | `&copy;` | `&#169;` | Copyright symbol |
| ® | `&reg;` | `&#174;` | Registered trademark |
| ™ | `&trade;` | `&#8482;` | Trademark symbol |
| € | `&euro;` | `&#8364;` | Euro sign |
| £ | `&pound;` | `&#163;` | Pound sign |
| ¥ | `&yen;` | `&#165;` | Yen sign |
Named entities use descriptive names: `&lt;`, `&gt;`, `&copy;`. They're easier to read and remember, but not all characters have named entities.
Numeric entities use decimal (`&#60;`) or hexadecimal (`&#x3C;`) character codes. Every Unicode character has a numeric entity, making them more comprehensive than named entities.
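As a rough illustration of how the numeric forms relate to a character's code point, the sketch below builds both the decimal and the hexadecimal reference with `ord`. The function name `numeric_references` is hypothetical.

```python
import html
from typing import Tuple


def numeric_references(ch: str) -> Tuple[str, str]:
    """Return the decimal and hexadecimal numeric references for one character."""
    code_point = ord(ch)
    return f"&#{code_point};", f"&#x{code_point:X};"


print(numeric_references("<"))  # ('&#60;', '&#x3C;')
print(numeric_references("€"))  # ('&#8364;', '&#x20AC;')

# Named, decimal, and hexadecimal forms all decode to the same character.
decoded = [html.unescape(ref) for ref in ("&lt;", "&#60;", "&#x3C;")]
print(decoded)  # ['<', '<', '<']
```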
Always encode user input before displaying it in HTML. This includes comments, forum posts, profile information, and any other user-submitted content. Never trust user input, as it may contain malicious code.
When inserting dynamic content into HTML attributes, encoding prevents attribute injection attacks:
<input value="user_input_here">
If user_input contains "><script>alert('XSS')</script>, proper encoding prevents the script from executing.
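A minimal sketch of the attribute case, assuming the value is always wrapped in double quotes and using a hypothetical `render_input` helper. With `quote=True`, `html.escape` encodes the quote that would otherwise close the attribute.

```python
import html


def render_input(user_input: str) -> str:
    """Place untrusted text inside a double-quoted attribute value."""
    return f'<input value="{html.escape(user_input, quote=True)}">'


attack = "\"><script>alert('XSS')</script>"
rendered = render_input(attack)

# The injected quote becomes &quot;, so the value attribute never closes early.
print(rendered)
```

The only literal double quotes left in the output are the two that delimit the attribute.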
When embedding data in JavaScript strings within HTML, encode it to prevent script injection:
<script>var name = "encoded_user_name";</script>
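One common way to produce such a string safely, shown here as a sketch rather than a complete defense, is to serialize the value as a JSON string literal and then replace `<`, `>`, and `&` with `\uXXXX` escapes so the data cannot close the surrounding script element. The helper name `js_string_literal` is an assumption.

```python
import json


def js_string_literal(value: str) -> str:
    """Return a quoted JavaScript string literal that is safe inside <script>."""
    literal = json.dumps(value)  # adds quotes, backslash-escapes " and \
    # Escape characters that could end the script element or start markup.
    return (
        literal.replace("<", "\\u003c")
        .replace(">", "\\u003e")
        .replace("&", "\\u0026")
    )


name = '</script><script>alert("XSS")</script>'
snippet = f"<script>var name = {js_string_literal(name)};</script>"
print(snippet)
```

JavaScript decodes the `\u003c` escapes back to the original characters at run time, so the variable holds the user's text unchanged.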
Special characters in page titles and meta descriptions should be encoded to ensure proper display across all platforms and prevent HTML injection.
In HTML content (between tags), encode <, >, &, and quotes.
In HTML attributes, encode quotes, ampersands, and less-than/greater-than signs. The specific quotes to encode depend on which quote style wraps the attribute.
In JavaScript, use JavaScript escaping (backslash escaping) rather than HTML entity encoding. HTML entities won't work inside <script> tags.
In URLs, use URL encoding (percent encoding) instead of HTML entities. They serve different purposes and aren't interchangeable.
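The difference between the URL and HTML contexts can be sketched as follows: the value is percent-encoded for the query string first, and the finished URL is then HTML-encoded because it sits inside an attribute. The `/search?q=` path and the `search_link` helper are hypothetical.

```python
import html
from urllib.parse import quote


def search_link(term: str) -> str:
    """Build a link whose query value is percent-encoded and whose text is HTML-encoded."""
    url = "/search?q=" + quote(term, safe="")  # URL context: percent encoding
    href = html.escape(url, quote=True)        # attribute context: HTML encoding
    label = html.escape(term)                  # text context: HTML encoding
    return f'<a href="{href}">{label}</a>'


link = search_link("a&b <c>")
print(link)  # <a href="/search?q=a%26b%20%3Cc%3E">a&amp;b &lt;c&gt;</a>
```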
Entity encoding is only one layer of security; combine it with other defenses rather than relying on it alone.
Most programming languages provide built-in functions or security libraries for HTML encoding. Use these instead of rolling your own solutions, as they're tested and handle edge cases properly.
Encode data just before outputting it to HTML, not when storing it in the database. This preserves the original data and ensures encoding is always applied when needed.
Use the appropriate encoding method for each context. HTML entity encoding doesn't protect against all attacks if used in JavaScript or CSS contexts.
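The advice to encode at output time rather than at storage time might look like the sketch below. The in-memory list stands in for a database, and both function names are assumptions.

```python
import html
from typing import List

stored_comments: List[str] = []  # stand-in for a database table


def save_comment(text: str) -> None:
    """Store the comment exactly as the user typed it."""
    stored_comments.append(text)


def render_comments() -> str:
    """Encode each comment only at the moment it is written into HTML."""
    return "\n".join(f"<p>{html.escape(c)}</p>" for c in stored_comments)


save_comment("Tom & Jerry <3")
print(stored_comments[0])  # Tom & Jerry <3
print(render_comments())   # <p>Tom &amp; Jerry &lt;3</p>
```

Because the stored value stays raw, the same data can later be emitted into a different context (JSON, plain-text email) with that context's own encoding.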
Encoding data multiple times results in the encoding characters being displayed instead of the intended characters. For example, `&amp;lt;` displays as `&lt;` instead of <.
Only encoding some special characters leaves your application vulnerable. Always encode all necessary characters for the context.
Using HTML entity encoding in JavaScript or CSS contexts doesn't provide protection. Each context requires its own encoding method.
Many developers remember to encode content between tags but forget about attribute values, creating security vulnerabilities.
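Two of the mistakes above, double encoding and partial encoding, are easy to reproduce. The sketch below uses `html.escape` with illustrative payloads. Passing `quote=False` stands in for an encoder that skips quote characters.

```python
import html

# Double encoding: the second pass encodes the ampersand of the first pass.
once = html.escape("<")
twice = html.escape(once)
print(once)   # &lt;
print(twice)  # &amp;lt;

# Partial encoding: leaving quotes alone lets input break out of an attribute.
payload = '" onmouseover="alert(1)'
partial = html.escape(payload, quote=False)
full = html.escape(payload, quote=True)
print(f'<input value="{partial}">')  # attribute closes early, handler injected
print(f'<input value="{full}">')     # quotes encoded, value stays intact
```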
A blog allowing user comments must encode all comment text before displaying it. Otherwise, a malicious user could post a comment containing <script> tags that execute when other users view the page.
When displaying search terms like "Results for: [user's search]", encode the search term. Without encoding, searches for <img src=x onerror=alert('XSS')> would execute malicious code.
Error messages that include user input (like "User 'username' not found") must encode the username. This prevents attackers from using error messages to inject code.
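The search-results and error-message cases could be handled along these lines. The markup and the function names `results_heading` and `not_found_message` are assumptions for illustration only.

```python
import html


def results_heading(search_term: str) -> str:
    """Echo the search term back as text, never as markup."""
    return f"<h1>Results for: {html.escape(search_term)}</h1>"


def not_found_message(username: str) -> str:
    """Include the submitted username in an error message safely."""
    return f"<p>User '{html.escape(username)}' not found</p>"


heading = results_heading("<img src=x onerror=alert('XSS')>")
message = not_found_message("<b>admin</b>")
print(heading)  # <h1>Results for: &lt;img src=x onerror=alert(&#x27;XSS&#x27;)&gt;</h1>
print(message)  # <p>User '&lt;b&gt;admin&lt;/b&gt;' not found</p>
```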