HTML Entities Lookup: 60-Entry Curated Reference for HTML Authors

HTML Entities Lookup: a compact, keyboard-first reference for HTML entities

HTML Entities Lookup is a compact reference for 60 commonly used HTML entities with search, three copy formats per entry, keyboard shortcuts and a GitHub repo.

Why a small, curated HTML entities reference matters

When you write about HTML or Markdown, you constantly need to display characters literally. Some would otherwise be treated as markup by the browser (the less-than sign, the ampersand), and others are simply awkward to type (copyright and trademark symbols, arrows). HTML entities solve that problem, but the WHATWG list contains more than two thousand entries, many of them obscure or rarely used. HTML Entities Lookup grew out of a simple frustration: repeatedly mistyping entity forms from memory and losing time verifying code points. The project bundles a hand-picked set of sixty entities into a single, browsable reference that aims to surface the forms people actually type in blog posts, documentation, and small projects.

The tool is available as a live demo and as a GitHub repository, and it intentionally ships with no runtime dependencies and no build step. That makes it lightweight to inspect and reuse: open the demo, type, copy, done.

Curation strategy and trade-offs

Rather than mirroring the full WHATWG entity catalogue, the author deliberately shrank the surface to a hand-picked set across six categories (basic, punctuation, currency, math, arrows, Greek). The stated reasons are practical:

- a 2,200-row grid is hard to browse;
- many listed entities are single-use curiosities;
- a smaller dataset is much easier to test for data integrity.

The underlying philosophy, curation over completeness, treats the reference as a fast, human-facing tool rather than an exhaustive machine-oriented index. It trades catalogue completeness for discoverability and speed of use.

Data model and search design

At the heart of the project is a compact, five-field data model for each entry: name, character, codepoint (decimal), category, and a localized description containing English and Japanese text. This data shape enables multi-axis search: queries are matched against the entity name, the literal character, the decimal code, both localized descriptions, and the category.
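The record shape can be sketched in JavaScript as follows. Everything here is an assumption for illustration, not the project's actual dataset:

- the field names (name, char, code, category, desc.en, desc.ja);
- the lowercase category strings;
- the three sample entries;
- the helper searchableFields.

```javascript
// Illustrative five-field entity record. Field names, sample entries, and
// the searchableFields helper are assumptions, not the project's code.

/**
 * @typedef {Object} Entity
 * @property {string} name      entity name without "&" and ";" (e.g. "lt")
 * @property {string} char      the literal character
 * @property {number} code      decimal code point of char
 * @property {string} category  one of the six categories
 * @property {{en: string, ja: string}} desc  English and Japanese descriptions
 */

/** @type {Entity[]} */
const ENTITIES = [
  { name: "lt", char: "<", code: 60, category: "basic",
    desc: { en: "less-than sign", ja: "小なり記号" } },
  { name: "euro", char: "€", code: 8364, category: "currency",
    desc: { en: "euro sign", ja: "ユーロ記号" } },
  { name: "alpha", char: "α", code: 945, category: "greek",
    desc: { en: "Greek small letter alpha", ja: "ギリシャ小文字アルファ" } },
];

// Every value that search inspects lives on the record itself.
function searchableFields(entity) {
  return [entity.name, entity.char, String(entity.code),
          entity.desc.en, entity.desc.ja, entity.category];
}
```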

Search is designed to be forgiving and predictable. An empty query returns the full set; a non-empty query is trimmed and lowercased before comparison. Single-character queries get special treatment. A strict equality check matches the literal character, so searching for © returns that entity alone. Longer queries use substring matches across name, category, and description. This keeps short queries from producing noisy results. Under naive substring matching, a one-character query such as a would match every entry whose name or description happens to contain that letter, rather than only an entity whose glyph is that character.

Because each entry includes both the character and its numeric code, the search can handle inputs such as the glyph itself (paste ©), a name (copy), a decimal (169), or a description term (copyright). That flexibility lets users type whatever first comes to mind and immediately find the entity they need.
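The search rules described above can be sketched as a single filter function. This minimal sketch assumes records with hypothetical field names (name, char, code, category, desc.en, desc.ja) and made-up sample entries. Matching the decimal code by exact string equality is also an assumption; the article only says that decimal queries such as 169 work.

```javascript
// Minimal sketch of the search rules; field names, sample data, and the
// exact-equality decimal match are assumptions.
const ENTITIES = [
  { name: "amp", char: "&", code: 38, category: "basic",
    desc: { en: "ampersand", ja: "アンパサンド" } },
  { name: "copy", char: "©", code: 169, category: "basic",
    desc: { en: "copyright sign", ja: "著作権記号" } },
  { name: "alpha", char: "α", code: 945, category: "greek",
    desc: { en: "Greek small letter alpha", ja: "ギリシャ小文字アルファ" } },
];

function search(entities, rawQuery) {
  // Normalize once: trim and lowercase.
  const q = rawQuery.trim().toLowerCase();
  if (q === "") return entities; // empty query returns the full set

  // One character (counted by code point): strict equality on the glyph,
  // so "a" does not match every description containing the letter a.
  if ([...q].length === 1) {
    return entities.filter((e) => e.char.toLowerCase() === q);
  }

  // Longer queries: substring match on text fields, exact match on the code.
  return entities.filter((e) =>
    e.name.toLowerCase().includes(q) ||
    e.category.toLowerCase().includes(q) ||
    e.desc.en.toLowerCase().includes(q) ||
    e.desc.ja.toLowerCase().includes(q) ||
    String(e.code) === q
  );
}

// Usage:
// search(ENTITIES, "©")          -> [copy entry]
// search(ENTITIES, "169")        -> [copy entry]
// search(ENTITIES, "copyright")  -> [copy entry]
// search(ENTITIES, "a")          -> [] (no entity's glyph is "a")
```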

Three copy formats and clipboard handling

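HTML offers three standard ways to write such a character in source:

- a named entity such as &copy; (readable and familiar in HTML);
- a decimal numeric reference such as &#169;;
- a hexadecimal numeric reference such as &#xA9; (handy when cross-referencing Unicode tables, which list code points in hex).

Both numeric forms also work in XML, where only five named entities (lt, gt, amp, apos, quot) are predefined. Each entity card in the grid exposes all three forms through individual copy controls.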
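Each of the three strings can be derived from a record in one expression. This sketch assumes a hypothetical record with name and code (decimal) fields. Uppercase hex digits are a stylistic choice, since HTML accepts hex digits in either case.

```javascript
// Build the three copy strings for one entity record.
// Assumes hypothetical fields: name (e.g. "copy") and code (decimal, e.g. 169).
function copyForms(entity) {
  return {
    named: `&${entity.name};`,
    decimal: `&#${entity.code};`,
    // toString(16) yields the hex digits; uppercasing is purely stylistic.
    hex: `&#x${entity.code.toString(16).toUpperCase()};`,
  };
}

const forms = copyForms({ name: "copy", char: "©", code: 169 });
// forms.named === "&copy;", forms.decimal === "&#169;", forms.hex === "&#xA9;"
```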

Rather than wiring a separate handler to each button, the implementation stores the exact string to copy in a data-copy attribute on the button. A single click handler on the grid container uses event delegation to detect clicks on any element carrying data-copy. When a copy action occurs, the handler writes that string to the clipboard through the browser Clipboard API and shows a short toast-style confirmation. This keeps the runtime footprint small. Sixty cards with three buttons each add up to 180 copy controls, all served by one delegated listener instead of 180 separate ones.
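A minimal sketch of such a delegated listener follows.

- attachCopyHandler, showToast, and the #grid selector are hypothetical names.
- The clipboard argument is expected to behave like navigator.clipboard, whose writeText() returns a Promise.
- Passing the grid, clipboard, and toast in as parameters keeps the sketch testable outside a browser.

```javascript
// Sketch of one delegated click listener serving every copy button.
// attachCopyHandler and showToast are hypothetical; `clipboard` is expected
// to behave like navigator.clipboard (writeText returns a Promise).
function attachCopyHandler(grid, clipboard, showToast) {
  grid.addEventListener("click", async (event) => {
    // closest() walks up from the clicked element, so a click on an icon
    // nested inside a button still finds the button carrying data-copy.
    const button = event.target.closest("[data-copy]");
    if (!button || !grid.contains(button)) return;

    const text = button.dataset.copy; // value of the data-copy attribute
    try {
      await clipboard.writeText(text);
      showToast(`Copied ${text}`);
    } catch {
      // writeText can reject, e.g. when clipboard permission is denied.
      showToast("Copy failed");
    }
  });
}

// In a page (selector and toast helper are hypothetical):
// attachCopyHandler(document.querySelector("#grid"), navigator.clipboard, showToast);
```

Because the listener sits on the container, cards re-rendered after each search keystroke need no rebinding.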

Keyboard-first user experience

Speed is a core objective for a reference tool. The ideal interaction is: open the page, type, click a copy button, and move on. To that end, the search input carries the autofocus attribute, so it receives focus on page load. The project also borrows a familiar GitHub-style shortcut: pressing the forward slash key (/) focuses the search field unless the user is already typing into an input. The keydown handler prevents the default behavior and moves focus to the search control, reducing pointer travel for people who prefer keyboard-driven workflows.
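The shortcut can be sketched as one document-level keydown listener. The following are assumptions beyond what the article describes:

- the function names and the #search id;
- treating textarea, select, and contenteditable elements as typing targets (in addition to inputs);
- ignoring modifier combinations.

```javascript
// Sketch of a GitHub-style "/" shortcut. Names and the #search id are
// hypothetical. Markup side: <input id="search" type="search" autofocus>

// Elements where "/" should be typed normally rather than intercepted.
// Including textarea, select and contenteditable is an assumption.
function isTypingTarget(el) {
  if (!el) return false;
  const tag = el.tagName;
  return tag === "INPUT" || tag === "TEXTAREA" || tag === "SELECT" ||
         el.isContentEditable === true;
}

function installSlashShortcut(doc, searchInput) {
  doc.addEventListener("keydown", (event) => {
    // Only a bare "/" triggers the shortcut (modifier check is assumed).
    if (event.key !== "/" || event.ctrlKey || event.metaKey || event.altKey) return;
    if (isTypingTarget(doc.activeElement)) return;
    // Without this, the "/" can end up typed into the newly focused field.
    event.preventDefault();
    searchInput.focus();
  });
}

// In a page:
// installSlashShortcut(document, document.querySelector("#search"));
```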

These two small UX details — autofocus and a single-key shortcut — shorten the edit cycle to a few seconds when you already know the entity name or code.

Testing strategy and data integrity

The project includes automated tests run with node --test. The suite's thirteen cases cover data integrity and search behavior. Core assertions verify three things:

- every entity record contains the required fields (name, character, numeric code point, category, and localized descriptions);
- names are unique;
- searches return the expected results for direct-character and description-based queries.

One particularly valuable check cross-validates each character against its declared code point by asserting that char.codePointAt(0) equals the numeric code field. This catches typos where someone assigns the wrong decimal value to a glyph. The test compares two fields of the dataset rather than consulting an external Unicode table, so it runs instantly and needs no extra data. The trade-off is that it cannot flag an entry whose character and code agree with each other but are both wrong for the entity name.

Implementation patterns that scale down complexity

Several small patterns in the codebase are worth noting for their economy and clarity:

The five-field entity record consolidates all the information required for searching and rendering into a predictable shape, making filters and UI rendering straightforward.
The search normalization step trims and lowercases input once, keeping match conditions simple and avoiding duplicated logic.
Single-character inputs are handled with strict equality, while longer queries use substring matching (includes); this avoids noisy matches and keeps results relevant.
Copy controls embed the exact copy string in a data attribute so the click handler is the same for all buttons, simplifying event handling.
Delegating clicks from the individual buttons to their shared container keeps the listener count minimal and makes teardown trivial.

These patterns favor a small, auditable codebase over a heavier UI framework or a large dataset that would require more complex search infrastructure.

Who this tool is for and how it fits workflows

HTML Entities Lookup targets writers, technical authors, and developers who frequently need to include literal glyphs in prose or documentation. It reduces friction when authoring HTML or XML by making it trivial to retrieve the named, decimal, or hex form of a character and copy it to the clipboard. Because the dataset emphasizes commonly used entities, it aims to satisfy the typical needs of content creators rather than every edge case in the Unicode repertoire.

The repository and demo are intentionally simple — no dependencies, no build — which makes it easy to inspect or fork for a personalized reference. That design choice also lowers the barrier to reuse in different contexts, from local documentation tooling to a personal bookmarks collection.

Broader implications for reference tooling and developer productivity

The project showcases a compact pattern for reference tools that emphasizes curation, speed, and low friction. Rather than attempting to surface every possible item in a large canonical dataset, a focused collection of well-chosen entries can be more usable in real-world authoring workflows. The combination of a small, well-structured dataset, predictable search semantics, and keyboard-first UX reduces cognitive load and cycle time for common tasks.

For teams and maintainers of documentation tooling, the approach highlights two practical ideas: validate linked fields with simple arithmetic or consistency checks to catch data-entry errors, and prefer a small, well-tested dataset that can be audited and extended deliberately. The use of event delegation and data attributes also reinforces common front-end best practices for keeping interactions fast and memory-efficient.

How to try it and contribute

The project is available as a live demo, and the source code is hosted on GitHub. The author notes that it is entry number fifteen in a public portfolio series of more than 100 items, and explicitly invites issues if a commonly used entity is missing. To run the tests, use node --test, which exercises the thirteen cases that validate fields, uniqueness, code point correctness, and search behavior.

The repository’s minimal dependency surface and lack of build step make it straightforward to clone, run, and experiment with the dataset and UI.

The author’s curatorial stance (selecting entries people actually use and keeping the tool fast to browse) makes HTML Entities Lookup a practical microtool for content authors and documentation writers. If you frequently need literal characters in HTML or XML, two features help: the tight search loop (type, copy, paste) and the three copy buttons on every entry. Together they remove small but recurring friction that would otherwise cost time across many writing sessions.