Joshua's Docs - Text, Character Encoding, and Misc. Reference


Where to Get Data


One of the slightly confusing things about emoji and its overlap with Unicode is how to describe or label an emoji.

An emoji can have several human-readable properties. Using U+1F4A1 (💡) as an example:

  • Name
    • Example: ELECTRIC LIGHT BULB
    • What: This is an immutable / permanent unique ID for the character, specified by Unicode
      • Always in English
      • Not intended to be translated
      • "limited to uppercase ASCII letters, digits and hyphen"
    • Note: Many websites mix this up with the CLDR Short Name or the Alias. Even when they do have the correct name, they often display it in proper case instead of the true upper case.
    • AKA Unicode Standard Name, Unicode Character Name, Official Name
  • Alias
    • Example: idea
    • What: An alternative ID for the character. According to this, they are also supposed to be unique.
      • In English
      • Limited to standard ASCII, like Name
    • Warning: There can be more than one alias for a single character, or even none!
    • Warning: Not to be confused with CLDR Keywords (see below)!
  • CLDR Short Name
    • Example: light bulb (English)
    • What: This is a shorter, more readable name for the character, maintained under CLDR.
      • Unlike the Standard Name or Alias, this value does get translated into other languages
      • Also unlike Name or Alias, this is not guaranteed to stay the same - it is not an ID
      • This is the property usually meant for screen readers, which is why CLDR annotates it as "TTS", for "Text-to-Speech"
      • Can be found under CLDR annotations
    • AKA Short Name, Short Character Name, TTS value
  • CLDR Keywords
    • Example: bulb | comic | electric | idea | light (English)
    • What:
      • A set of single words that correspond with the character, which could be used to search for it.
      • Like the CLDR Short Name, these should be internationalized and are meant to be flexible
      • Can be found under CLDR annotations
    • AKA: CLDR Character Keywords
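Of the properties above, the Name is the easiest to check programmatically, since Python's standard `unicodedata` module exposes it directly. The following is a minimal sketch; the `describe` helper is a hypothetical name used only for illustration, and the name data comes from whatever Unicode version your Python build bundles.

```python
import unicodedata


def describe(char: str) -> dict:
    """Return the code point and the formal Unicode Name for one character."""
    return {
        # Format the code point the way Unicode writes it, e.g. U+1F4A1
        "code_point": f"U+{ord(char):04X}",
        # The formal Name: a permanent, uppercase, English-only identifier
        "name": unicodedata.name(char),
    }


info = describe("\U0001F4A1")  # 💡
print(info["code_point"], info["name"])

# Because the Name is a stable ID, it can be used to look the character up again
assert unicodedata.lookup(info["name"]) == "\U0001F4A1"
```

Note that `unicodedata` does not provide the CLDR Short Name or CLDR Keywords; those come from the CLDR annotation data.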

Many system tools will combine these properties in interesting ways. For example, the Windows Emoji Picker will let you search by either CLDR Short Name or CLDR Keywords.
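To illustrate how a tool might combine the two CLDR properties, here is a rough sketch that reads them from annotation XML and searches across both. The `SAMPLE` string is hand-written in the shape of a CLDR annotations file, using only the 💡 values from the examples above. The `load_annotations` and `search` helpers, and the matching rule itself, are assumptions made for illustration and not a description of how the Windows Emoji Picker actually works.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the shape of a CLDR annotations file (assumption:
# keywords are "|"-separated, and the short name is marked type="tts").
SAMPLE = """<ldml>
  <annotations>
    <annotation cp="💡">bulb | comic | electric | idea | light</annotation>
    <annotation cp="💡" type="tts">light bulb</annotation>
  </annotations>
</ldml>"""


def load_annotations(xml_text: str) -> dict:
    """Map each character to its CLDR Short Name and CLDR Keywords."""
    result = {}
    for node in ET.fromstring(xml_text).iter("annotation"):
        entry = result.setdefault(
            node.get("cp"), {"short_name": None, "keywords": []}
        )
        if node.get("type") == "tts":
            entry["short_name"] = node.text.strip()
        else:
            entry["keywords"] = [k.strip() for k in node.text.split("|")]
    return result


def search(annotations: dict, query: str) -> list:
    """Hypothetical rule: substring match on short name, exact match on keyword."""
    q = query.casefold()
    return [
        cp
        for cp, entry in annotations.items()
        if (entry["short_name"] and q in entry["short_name"].casefold())
        or any(q == k.casefold() for k in entry["keywords"])
    ]


annotations = load_annotations(SAMPLE)
print(search(annotations, "idea"))        # matched through a keyword
print(search(annotations, "light bulb"))  # matched through the short name
```

Since both properties are localized and may change between CLDR releases, a real tool would load the annotation file for the user's locale rather than hard-coding values.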

Markdown Source Last Updated:
Tue Sep 08 2020 12:12:57 GMT+0000 (Coordinated Universal Time)
Markdown Source Created:
Tue Sep 08 2020 05:57:58 GMT+0000 (Coordinated Universal Time)
© 2024 Joshua Tzucker, Built with Gatsby