What are HTML Entities?

Also known as: character entities, HTML character references, HTML escape codes

HTML entities are special text codes used in HTML markup to represent characters that either have a reserved meaning in the language or cannot be typed directly in a standard text editor. Rather than inserting the character itself, a developer writes a short code sequence that the browser interprets and renders as the intended symbol.

The most common reason to use an HTML entity is to avoid ambiguity in the markup. In HTML, the less-than sign (<) and the ampersand (&) are reserved characters - they signal the start of a tag and the start of an entity reference, respectively. If you want either symbol to appear as visible text on a page rather than be parsed as code, you must write them as &lt; and &amp;, respectively. The browser then displays the correct character without misreading the surrounding markup.
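As a rough illustration of escaping reserved characters, the sketch below uses Python's standard-library html module. The sample string and variable names are made up for the example, and a real project would normally rely on its templating system to do this automatically.

```python
import html

# Hypothetical text that should appear literally on a page.
raw_text = "if a < b && b > c"

# html.escape replaces &, <, and > with their entities
# (and, by default, double and single quotes as well).
escaped = html.escape(raw_text)
print(escaped)  # if a &lt; b &amp;&amp; b &gt; c

# html.unescape reverses the process, recovering the original text.
restored = html.unescape(escaped)
print(restored == raw_text)  # True
```

The ampersand is replaced first, so the entities produced for < and > are not escaped a second time.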

Every HTML entity follows one of two formats. A named entity uses a descriptive keyword between an ampersand and a semicolon, such as &copy; for the copyright symbol © or &nbsp; for a non-breaking space. A numeric entity references the character's Unicode code point, written either in decimal form (for example, &#169;) or in hexadecimal form with an x prefix (&#xA9;), both of which also produce ©. Named entities are generally preferred for readability, while numeric entities are useful when no named form exists.
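To show that the named, decimal, and hexadecimal forms resolve to the same character, here is a minimal sketch using Python's html.unescape. The helper numeric_reference is a hypothetical function written for this example, not part of any library.

```python
import html

# All three spellings decode to the same character, U+00A9 (the copyright sign).
named = html.unescape("&copy;")
decimal = html.unescape("&#169;")
hexadecimal = html.unescape("&#xA9;")
print(named == decimal == hexadecimal)  # True

# &nbsp; decodes to U+00A0, the non-breaking space.
nbsp = html.unescape("&nbsp;")


def numeric_reference(char: str, use_hex: bool = False) -> str:
    """Build a numeric entity for a single character (illustrative helper)."""
    code_point = ord(char)
    if use_hex:
        return f"&#x{code_point:X};"
    return f"&#{code_point};"


print(numeric_reference(named))                # &#169;
print(numeric_reference(named, use_hex=True))  # &#xA9;
```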

HTML entities were historically important because older character encodings could not reliably transmit every symbol across different systems. Today, the widespread adoption of UTF-8 as the standard encoding for web pages means that most characters - including accented letters, currency symbols, and emoji - can be placed directly in an HTML document without encoding. However, entities remain necessary for the reserved characters mentioned above, and they are still encountered frequently in legacy codebases, content management systems, and automatically generated markup.
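The difference between a UTF-8 document and a more limited encoding can be sketched in Python. The sample text is invented for the example, and ASCII stands in here for any legacy encoding that lacks the characters in question.

```python
import html

# Hypothetical page text with an accented letter and a currency symbol.
text = "café €5"

# UTF-8 can represent every character directly, so no entities are needed.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes.decode("utf-8") == text)  # True

# ASCII cannot represent é or €. The xmlcharrefreplace error handler
# substitutes decimal numeric references for the characters that do not fit.
ascii_bytes = text.encode("ascii", errors="xmlcharrefreplace")
print(ascii_bytes)  # b'caf&#233; &#8364;5'

# Decoding the references, as a browser would, recovers the original text.
round_trip = html.unescape(ascii_bytes.decode("ascii"))
print(round_trip == text)  # True
```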

Understanding entities is also relevant to semantic HTML and accessibility. A non-breaking space (&nbsp;), for instance, prevents a line break between two words, which has layout implications. Overusing it for spacing purposes - rather than relying on CSS - is considered poor practice, as it introduces invisible formatting characters that can confuse screen readers and make the source harder to maintain.

From an SEO perspective, search engine crawlers parse HTML source code directly, so improperly encoded reserved characters can break markup and prevent content from being indexed correctly. Using the appropriate entities ensures that both browsers and crawlers interpret the page as intended, contributing to a well-formed, reliable document structure.
