Files
awesome-awesomeness/html/codepoints.html
2025-07-18 22:22:32 +02:00

404 lines
17 KiB
HTML
Raw Permalink Blame History

This file contains invisible Unicode characters
This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
<h1 id="awesome-code-points-awesome">Awesome Code Points <a
href="https://github.com/sindresorhus/awesome"><img
src="https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg"
alt="Awesome" /></a></h1>
<p>This is a curated list of characters in Unicode, that have
interesting (and maybe not widely known) features or are awesome in some
other way.</p>
<h2 id="table-of-contents">Table of Contents</h2>
<ol type="1">
<li><a href="#standalone-code-points">Standalone code points</a></li>
<li><a href="#code-points-that-affect-others">Code points that affect
others</a>
<ol type="1">
<li><a href="#breaking-and-gluing-other-characters">Breaking and Gluing
other characters</a></li>
</ol></li>
<li><a href="#record-holders-and-extremes">Record holders and
extremes</a></li>
<li><a href="#for-funsies">For funsies</a>
<ol type="1">
<li><a href="#games">Games</a></li>
</ol></li>
<li><a href="#other-lists-of-code-points">Other Lists of Code
Points</a></li>
<li><a href="#contributing-your-code-points">Contributing</a></li>
<li><a href="#license">License</a></li>
</ol>
<h2 id="standalone-code-points">Standalone Code Points</h2>
<ul>
<li><p>The code points of the Unicode blocks <a
href="https://codepoints.net/box_drawing">Box Drawing</a> (U+2500 to
U+257F) and <a href="https://codepoints.net/block_elements">Block
Elements</a> (U+2580 to U+259F) cover most of your monospace
command-line visualization needs.</p>
<pre><code> ╭───────╮
│Unicode│
│rules! │
╰┬─────┬╯</code></pre></li>
<li><p><a href="https://codepoints.net/U+2E2E">U+2E2E</a> REVERSED
QUESTION MARK - the “irony mark” to express irony/sarcasm. A useful
character⸮</p></li>
<li><p><a href="https://codepoints.net/U+D800">U+D800</a> to <a
href="https://codepoints.net/U+DFFF">U+DFFF</a> - surrogate code points.
They are only reserved to ease <a
href="https://en.wikipedia.org/wiki/UTF-16">UTF-16
encoding</a>.</p></li>
<li><p><a href="https://codepoints.net/U+FEFF">U+FEFF</a> ZERO WIDTH
NO-BREAK SPACE - its name suggests, that it can be used like U+2060
WORD JOINER. And in fact the latter was introduced to inherit its
semantics. This is because U+FEFF had become a special beacon called the
<a href="https://en.wikipedia.org/wiki/Byte_order_mark">byte order
mark</a>, that was placed on the beginning of some UTF-8 files. In
complying software (including many text editors) this character is
stripped from the start of a file and handled as metadata. In
non-complying software (like the PHP interpreter) this leads to all
sorts of fun behaviour.</p></li>
<li><p><a href="https://codepoints.net/U+FFFD">U+FFFD</a> REPLACEMENT
CHARACTER - when a character cannot be displayed (e.g., decoding an
erroneous UTF-8 sequency), this code point steps into the
breach.</p></li>
<li><p><a href="https://codepoints.net/U+1D455">U+1D455</a> is missing.
It would be an italic small “h”. It was not encoded, because it would be
identical to the Planck constant (<a
href="https://codepoints.net/U+210E">U+210E</a>).</p></li>
<li><p><a href="https://codepoints.net/U+FF03">U+FF03</a> FULLWIDTH
NUMBER SIGN - it is the “Japanese Hashtag” <code></code>. Sites like
Twitter accept it as equivalent to the regular <code>#</code> (<a
href="https://codepoints.net/U+0023">U+0023</a>).</p></li>
</ul>
<h2 id="code-points-that-affect-others">Code Points that Affect
Others</h2>
<ul>
<li><p><a href="https://codepoints.net/U+202D">U+202D</a> and <a
href="https://codepoints.net/U+202E">U+202E</a> - change the text
direction. Relevant XKCD:</p>
<p><a href="https://xkcd.com/1137/"><img
src="http://imgs.xkcd.com/comics/rtl.png" /></a></p></li>
<li><p><a href="https://codepoints.net/U+FE0E">U+FE0E</a> VARIATION
SELECTOR-15 - force black-<em>&amp;</em>-white emoji. If this code point
follows an emoji, an explicit monochrome rendering of the emoji is
requested (if the client supports it).</p></li>
<li><p><a href="https://codepoints.net/U+FE0F">U+FE0F</a> VARIATION
SELECTOR-16 - force colorful emoji. If this code point follows an emoji,
an explicit colorful rendering of the emoji is requested (if the client
supports it).</p></li>
<li><p>Diacritics and combining marks: There is a <a
href="https://codepoints.net/search?gc=Mn">host of characters</a>, that
add to the characters before. Those are called Combining Marks. Unicode
provides a <a href="http://unicode.org/faq/char_combmark.html">handy
FAQ</a> on the details, but in a nutshell: If you add one after a
character, it is placed on top of that previous one. So,
<code>a + ̊ = å</code>. This <em>may</em> lead to all kinds of funny
problems, because for some combinations there are pre-composed
characters. Our little <code>å</code> here can also be encoded as
U+00E5. You might note, that while this has a length of one character,
the combination of <code>a</code> and combining ring has a length of two
characters.</p>
<p>Of course, one can also do fun things with those characters like <a
href="http://stackoverflow.com/a/1732454/113195">this answer</a> on
StackOverflow.</p></li>
<li><p>The <a href="https://codepoints.net/U+1F1E6..U+1F1FF">Regional
Indicator Symbols</a> U+1F1E6 to U+1F1FF resemble the 26 latin
characters. They are used to create flag emoji. Since the Unicode
consortium didnt feel like getting on board with international
politics, the solution to flags is to combine these 26 characters to the
respective ISO code for a country. Examples:</p>
<table>
<thead>
<tr class="header">
<th>Country</th>
<th>ISO Code</th>
<th>Code Points</th>
<th>Emoji (if supported)</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>USA</td>
<td>US</td>
<td>U+1F1FA + U+1F1F8</td>
<td>🇺🇸</td>
</tr>
<tr class="even">
<td>Germany</td>
<td>DE</td>
<td>U+1F1E9 + U+1F1EA</td>
<td>🇩🇪</td>
</tr>
<tr class="odd">
<td>China</td>
<td>CN</td>
<td>U+1F1E8 + U+1F1F3</td>
<td>🇨🇳</td>
</tr>
</tbody>
</table></li>
<li><p>Skin color of emoji: There are five code points, that control the
skin color of emoji, <a
href="https://codepoints.net/U+1F3FB..U+1F3FF">U+1F3FB to U+1F3FF</a>.
They are called “Emoji Modifier Fitzpatrick Type” 1 to 6, with 1 the
palest and 6 the darkest. If one of these characters follows an emoji,
that emoji is meant to be rendered in the appropriate skin color of <a
href="https://en.wikipedia.org/wiki/Fitzpatrick_scale">the Fitzpatrick
scale</a>. If no such modifier is added, the skin tone should be
unnatural, e.g., bright yellow. Fun fact: Since the Fitzpatrick
modifiers are normal code points, emoji with such skin colors have the
length 2, which Twitter users noticed first. Here is a comparison chart
<a
href="http://www.unicode.org/reports/tr51/tr51-2.html#Diversity">directly
from the specification</a>:</p>
<table>
<colgroup>
<col style="width: 14%" />
<col style="width: 68%" />
<col style="width: 16%" />
</colgroup>
<thead>
<tr class="header">
<th>Code</th>
<th>Name</th>
<th>Samples</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>U+1F3FB</td>
<td>EMOJI MODIFIER FITZPATRICK TYPE-1-2</td>
<td><img src="http://www.unicode.org/reports/tr51/images/other/swatch-type-1-2.png" alt="" height="20" width="auto">
<img src="http://www.unicode.org/reports/tr51/images/other/swatch-type-1-2-bw.png" alt="" height="20" width="auto"></td>
</tr>
<tr class="even">
<td>U+1F3FC</td>
<td>EMOJI MODIFIER FITZPATRICK TYPE-3</td>
<td><img src="http://www.unicode.org/reports/tr51/images/other/swatch-type-3.png" alt="" height="20" width="auto">
<img src="http://www.unicode.org/reports/tr51/images/other/swatch-type-3-bw.png" alt="" height="20" width="auto"></td>
</tr>
<tr class="odd">
<td>U+1F3FD</td>
<td>EMOJI MODIFIER FITZPATRICK TYPE-4</td>
<td><img src="http://www.unicode.org/reports/tr51/images/other/swatch-type-4.png" alt="" height="20" width="auto">
<img src="http://www.unicode.org/reports/tr51/images/other/swatch-type-4-bw.png" alt="" height="20" width="auto"></td>
</tr>
<tr class="even">
<td>U+1F3FE</td>
<td>EMOJI MODIFIER FITZPATRICK TYPE-5</td>
<td><img src="http://www.unicode.org/reports/tr51/images/other/swatch-type-5.png" alt="" height="20" width="auto">
<img src="http://www.unicode.org/reports/tr51/images/other/swatch-type-5-bw.png" alt="" height="20" width="auto"></td>
</tr>
<tr class="odd">
<td>U+1F3FF</td>
<td>EMOJI MODIFIER FITZPATRICK TYPE-6</td>
<td><img src="http://www.unicode.org/reports/tr51/images/other/swatch-type-6.png" alt="" height="20" width="auto">
<img src="http://www.unicode.org/reports/tr51/images/other/swatch-type-6-bw.png" alt="" height="20" width="auto"></td>
</tr>
</tbody>
</table></li>
</ul>
<h3 id="breaking-and-gluing-other-characters">Breaking and Gluing other
characters</h3>
<ul>
<li><a href="https://codepoints.net/U+00A0">U+00A0</a> NO-BREAK SPACE -
force adjacent characters to stick together. Well known as
<code>&amp;nbsp;</code> in HTML.</li>
<li><a href="https://codepoints.net/U+00AD">U+00AD</a> SOFT HYPHEN - (in
HTML: <code>&amp;shy;</code>) like ZERO WIDTH SPACE, but show a hyphen
if (and only if) a break occurs.</li>
<li><a href="https://codepoints.net/U+200B">U+200B</a> ZERO WIDTH SPACE
- the inverse to U+00A0: create no space, but allow word breaking.</li>
<li><a href="https://codepoints.net/U+200D">U+200D</a> ZERO WIDTH JOINER
- force adjacent characters to be joined together (e.g., arabic
characters or supported emoji). Apple uses this to compose some emoji
like different families.</li>
<li><a href="https://codepoints.net/U+2060">U+2060</a> WORD JOINER - the
same as U+00A0, but completely invisible. Good for writing
<code>@font-face</code> on Twitter.</li>
</ul>
<p>For better comparison of which code point has which effect, consult
this table:</p>
<table>
<thead>
<tr class="header">
<th> </th>
<th>U+00A0</th>
<th>U+00AD</th>
<th>U+200B</th>
<th>U+200D</th>
<th>U+2060</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>create space</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr class="even">
<td>allow breaking</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr class="odd">
<td>possible change</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
<p>Smashing Magazine featured <a
href="http://www.smashingmagazine.com/2015/10/space-yourself/">a
comprehensive article</a> on the different types of whitespace.</p>
<h2 id="record-holders-and-extremes">Record Holders and Extremes</h2>
<ul>
<li><a href="https://codepoints.net/U+0000">U+0000</a> &lt;control&gt; -
first code point.</li>
<li><a href="https://codepoints.net/U+10FFFF">U+10FFFF</a>
(<em>non-character</em>) - last code point. The whole rest of its plane
apart from U+10FFFE, the code points in the 0x10000-0x10FFFD range, are
private use characters, guaranteed to be never filled by a future
Unicode standard.</li>
<li><a href="https://codepoints.net/U+1F402">U+1F402</a> OX - shortest
name.</li>
<li>U+1FBA8 BOX DRAWINGS LIGHT DIAGONAL UPPER CENTRE TO MIDDLE LEFT AND
MIDDLE RIGHT TO LOWER CENTRE and U+1FBA9 BOX DRAWINGS LIGHT DIAGONAL
UPPER CENTRE TO MIDDLE RIGHT AND MIDDLE LEFT TO LOWER CENTRE - longest
name: 88 characters each.</li>
<li><a href="https://codepoints.net/U+FDFA">U+FDFA</a> ARABIC LIGATURE
SALLALLAHOU ALAYHE WASALLAM - longest decomposition form: 18
characters.</li>
<li><a href="https://codepoints.net/U+5146">U+5146</a> and <a
href="https://codepoints.net/U+16B61">U+16B61</a> - code points that
represent the highest “single-digit” number. In both cases thats
1,000,000,000,000, a trillion.</li>
<li><a href="https://codepoints.net/U+0F33">U+0F33</a> TIBETAN DIGIT
HALF ZERO - code point that represents the <em>lowest</em>
“single-digit” number and at the same time the only negative one,
-½.</li>
<li>The trophy for most useless code points goes to <a
href="https://codepoints.net/U+0080">U+0080</a>, <a
href="https://codepoints.net/U+0081">U+0081</a> and <a
href="https://codepoints.net/U+0099">U+0099</a>. These so-called C1
control characters are more or less unspecified. They got into Unicode,
because they were present in the very first version of what should later
become ISO 10646, the ISO-standardized version of Unicode. They <a
href="http://unicode.org/mail-arch/unicode-ml/y2015-m10/0050.html">were
meant to be part of an upgrade to ISO 2022</a>, that never came to
be.</li>
<li>A close second place in this regard goes to the CJK unified
ideographs <a href="https://codepoints.net/U+599B"></a>, <a
href="https://codepoints.net/U+6327"></a>, <a
href="https://codepoints.net/U+6683"></a>, <a
href="https://codepoints.net/U+6926"></a>, <a
href="https://codepoints.net/U+69DE"></a>, <a
href="https://codepoints.net/U+87D0"></a>, <a
href="https://codepoints.net/U+88AE"></a>, <a
href="https://codepoints.net/U+95A0"></a>, <a
href="https://codepoints.net/U+99F2"></a>, <a
href="https://codepoints.net/U+58B8"></a>, <a
href="https://codepoints.net/U+58E5"></a>, and <a
href="https://codepoints.net/U+5F41"></a>. These so-called <a
href="https://www.dampfkraft.com/ghost-characters.html">“ghost
characters”</a> came to Unicode via the Japanese JIS standard, where
they were added, because they were mis-read or misinterpreted from other
signs, when JIS was compiled from original printed text sources.</li>
<li><a href="https://codepoints.net/U+006F">U+006F</a> LATIN SMALL
LETTER O - leads the list of characters with confusable shapes. Of all
the possible mappings in the <a
href="http://www.unicode.org/reports/tr39/#Data_Files">list of
confusable characters</a>, the small “o” leads with a whopping 73
entries of similar looking glyphs, followed by <a
href="https://codepoints.net/U+006C">U+006C</a> LATIN SMALL LETTER L
with 70 entries.</li>
<li><a href="https://codepoints.net/U+1F4C0">U+1F4C0</a> DVD - only code
point name without any vowel (<a
href="https://twitter.com/ken_lunde/status/960188623390846976">source</a>)</li>
<li><a href="https://codepoints.net/U+3106C">U+3106C</a> CJK UNIFIED
IDEOGRAPH 3106C - the character with the most <a
href="https://en.wikipedia.org/wiki/Chinese_character_strokes">strokes</a>:
84. Take your time to write this one!</li>
</ul>
<h2 id="for-funsies">For Funsies</h2>
<ul>
<li><a href="https://codepoints.net/U+1680">U+1680</a> OGHAM SPACE MARK
- a space that looks like a dash. Great to bring programmers close to
madness: <code>1 + 2 === 3</code>.</li>
<li><a href="https://codepoints.net/U+037E">U+037E</a> GREEK QUESTION
MARK - a look-alike to the semicolon. Also a fun way to annoy
developers.</li>
<li><a href="https://codepoints.net/U+1DD2">U+1DD2</a> COMBINING US
ABOVE - this is the most romantic code point.</li>
<li><a href="https://codepoints.net/U+F8FF">U+F8FF</a> PRIVATE USE
CODEPOINT - this private use code point is rendered as Apple logo on
many Apple devices.</li>
<li><a href="https://codepoints.net/U+1F574">U+1F574</a> MAN IN BUSINESS
SUIT LEVITATING - A rather curious character, that only made it into
Unicode for its appearance in the Webdings font (for reasons of
backwards compatibility).</li>
<li><a href="https://codepoints.net/U+1F596">U+1F596</a> RAISED HAND
WITH PART BETWEEN MIDDLE AND RING FINGERS - the Vulcan salute. Live long
and prosper! 🖖</li>
<li><a href="https://codepoints.net/U+1F918">U+1F918</a> SIGN OF THE
HORNS - Rock on! 🤘</li>
<li><a href="https://codepoints.net/U+2800">U+2800</a> BRAILLE PATTERN
BLANK - A Braille pattern that has zero of its six or eight dots filled
in. According to the standard: “* while this character is imaged as a
fixed-width blank in many fonts, it does not act as a space” Essentially
it is rendered as white-space, but since it is designated as
<em>not</em> white-space it isnt matched by white-space-validating
regular expressions. This can be used to bypass all kinds of validation
that disallows or trims white-space.</li>
</ul>
<h3 id="games">Games</h3>
<p>For plain-text gaming, Unicode is well equipped with several complete
sets:</p>
<ul>
<li><a href="https://codepoints.net/U+2654..U+265F">Chess
figures</a>.</li>
<li><a href="https://codepoints.net/U+2660..U+2667">Card suits</a> and
even a whole <a href="https://codepoints.net/playing_cards">deck of
cards</a> complete with joker and back of card.</li>
<li><a href="https://codepoints.net/U+2680..U+2685">Die faces</a> and a
nice <a href="https://codepoints.net/U+1F3B2">die emoji</a>.</li>
<li><a href="https://codepoints.net/U+2686..U+2689">Go pieces</a>.</li>
<li><a href="https://codepoints.net/U+26C0..U+26C3">Draughts (or
checkers) pieces</a>.</li>
<li><a href="https://codepoints.net/U+2616,U+2617,U+26C9,U+26CA">Shogi
pieces</a>, a <a href="https://en.wikipedia.org/wiki/Shogi">Japanese
variant of chess</a>.</li>
<li><a href="https://codepoints.net/domino_tiles">Domino tiles</a></li>
<li><a href="https://codepoints.net/mahjong_tiles">Mahjong
tiles</a></li>
</ul>
<h2 id="other-lists-of-code-points">Other Lists of Code Points</h2>
<ul>
<li><a
href="https://github.com/ehmicky/cross-platform-terminal-characters">Cross-platform
terminal characters</a> - a list of characters that work on most
terminals.</li>
</ul>
<h2 id="contributing-your-code-points">Contributing Your Code
Points</h2>
<p>See <a href="CONTRIBUTING.md">the contribution guide</a> for
details.</p>
<h2 id="license">License</h2>
<p><a href="https://creativecommons.org/publicdomain/zero/1.0/"><img
src="https://i.creativecommons.org/p/zero/1.0/88x31.png"
alt="CC0" /></a></p>
<p>To the extent possible under law, <a
href="https://github.com/Codepoints/awesome-codepoints/graphs/contributors">the
contributors</a> have waived all copyright and related or neighboring
rights to this work. See <a href="LICENSE">the license file</a> for
details.</p>
<p><a
href="https://github.com/Codepoints/awesome-codepoints">codepoints.md
Github</a></p>