It's Time to Allow Emoji Domain Names
To understand why Emoji Domain Names aren't allowed we really have to go back to the beginning – to a certain extent the history of the Internet itself has been a history of cultural exclusion that has changed over time, but only very very slowly.
In the early days of the Internet, large parts of it were an American affair, which led to decisions being made with a very English-centric flavour. When the Domain Name System, or DNS, was first established, the Latin characters used to write English were the only ones supported. Even the relatively small range of Western European accented characters was excluded. According to UNESCO, in 2008 only 12 languages accounted for 98% of Internet web pages – with English accounting for 72% of all web pages.
To a certain extent this problem started at the very bottom, at the birth of computing. ASCII (American Standard Code for Information Interchange), which encodes English in the form of Latin characters, was first published in 1963. It specified how all the Latin characters would be represented using the first 128 of the 256 values available in an 8-bit byte.
Later this became known as US-ASCII, with the second block of 128 characters being used for national variations. However, given the wide range of national characters and languages in use, the remaining 128 characters were not nearly enough to represent everything. This also set in stone, from the beginning, the priority English would have in computing.
And so Unicode Was Born
To widen the support for national languages, in 1987 Unicode was born. The aim was to produce a character set that was capable of representing all national languages. The first standard (v1.0.0) was published in 1991 and supported 24 different scripts in 7,161 characters - or “code points” as they are known in Unicode. In June 2017, v10.0.0 was released with 139 different scripts and 136,755 code points.
Originally, Unicode was 16-bit, but it has since been extended, and a fixed-width encoding now needs 32 bits per character. However, this can be wasteful for languages, like Western European ones, where only a small number of non-ASCII characters may be used in any one document. For example, 98% of a document might be representable in the original 7-bit ASCII, yet stored as fully 32-bit Unicode it would expand to four times its size.
This is where UTF-8 comes in. UTF-8 is a way of encoding Unicode such that each character is given only the space it needs: one byte for ASCII characters and two, three or four bytes for everything else. So documents that are largely Latin characters remain much the same size, but the full range of Unicode can be included where necessary.
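The size difference is easy to check for yourself. The following is a minimal sketch using Python's built-in codecs; the helper name encoded_sizes and the sample characters are only illustrative choices, not something taken from any standard.

```python
# Compare the space a character needs in UTF-8 with fixed-width UTF-32.
# The sample characters are arbitrary illustrations.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]  # A, e-acute, euro sign, grinning face


def encoded_sizes(text):
    """Return (UTF-8 byte count, UTF-32 byte count) for a string."""
    # "utf-32-le" is used so that no byte-order mark is added to the count.
    return len(text.encode("utf-8")), len(text.encode("utf-32-le"))


for ch in samples:
    utf8_len, utf32_len = encoded_sizes(ch)
    print(f"U+{ord(ch):04X}: UTF-8 uses {utf8_len} byte(s), UTF-32 uses {utf32_len}")

# A mostly-ASCII sentence stays close to one byte per character in UTF-8.
print(encoded_sizes("Mostly plain ASCII with one accent: caf\u00e9"))
```

Each sample takes one more byte than the last in UTF-8, while UTF-32 always spends four bytes, which is where the "four times the size" figure for plain ASCII text comes from.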
Ideally, all documents on the Internet (web pages, emails etc) should be UTF-8. A fully UTF-8 compliant system would be able to store and recall any characters in any language – e.g. names and addresses in China could be stored, recalled and displayed in Chinese.
Adding International Characters to DNS
So when the decision was taken to add international character support to DNS, the logical choice was UTF-8. However, DNS was already widely implemented with support for only a very narrow range of Latin characters. So an extra encoding was established to convert Unicode text into this narrow range of valid DNS characters. This conversion algorithm is called Punycode. It was published as a proposed standard in March 2003 in RFC 3492, alongside RFC 3490, which defines what became the adopted standard IDNA2003 (Internationalized Domain Names in Applications 2003). These standards were then adopted by ICANN. In 2009, the ICANN Board approved a fast track process for IDN ccTLDs, describing the programme as a “top priority”. By April 2011, 17 IDN ccTLDs had been launched.
The original algorithm allows for the encoding of any Unicode code point. Many of these are not international characters at all, but represent all sorts of symbols, including, for example, a full pack of playing cards (U+1F0A0 to U+1F0DF).
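To make the conversion concrete, here is a minimal sketch using the "punycode" codec that ships with Python. It performs only the raw Punycode step and adds the "xn--" prefix used in DNS; it does not apply any of the IDNA rules about which characters are permitted. The helper names to_dns_label and from_dns_label are hypothetical and chosen purely for illustration.

```python
def to_dns_label(label):
    """Encode one Unicode label as an ASCII DNS label using raw Punycode."""
    if label.isascii():
        return label  # plain ASCII labels need no conversion
    return "xn--" + label.encode("punycode").decode("ascii")


def from_dns_label(label):
    """Reverse to_dns_label for labels carrying the "xn--" prefix."""
    if label.startswith("xn--"):
        return label[4:].encode("ascii").decode("punycode")
    return label


# An accented label: U+00F6 followed by "bb".
print(to_dns_label("\u00f6bb"))  # xn--bb-eka

# Punycode itself is just as happy with a symbol, here U+1F0A1 (ace of spades).
card = to_dns_label("\U0001F0A1")
print(card, from_dns_label(card) == "\U0001F0A1")
```

The point of the second call is that nothing in the encoding step distinguishes a letter from a symbol; any restriction has to come from the rules layered on top.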
The first standard for Internationalized Domain Names in Applications was published by the IETF in 2003, so it is called IDNA2003. It was later revised as IDNA2008, extending support from Unicode 3.2 to Unicode 5.2.
Changes from IDNA2003 to IDNA2008
One of the major changes from IDNA2003 to IDNA2008 was to restrict the range of Unicode code points that are allowed to be encoded into DNS. There were two main reasons for this. Firstly, upper- and lower-case letters are separate code points, so unless case is dealt with before encoding, the same word can produce two different encoded names. This is not in line with the general principles of DNS, which treats names as case-insensitive, and could cause confusion.
For example, http://öbb.at/ and http://ÖBB.at/ encode differently if Punycode is applied to them exactly as written, so in theory they could be owned by two different entities and point to two different websites. IDNA2003 dealt with this by mapping upper case to lower case before encoding; IDNA2008 goes further and disallows the upper-case code points in a registered name altogether.
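The case problem can be seen at the encoding level with a short sketch. This uses Python's built-in "punycode" codec on the bare label, with no IDNA processing at all, so it shows only what the raw encoding does; the helper name raw_punycode is an illustrative assumption.

```python
lower = "\u00f6bb"  # the label in lower case (U+00F6 then "bb")
upper = "\u00d6BB"  # the same label in upper case (U+00D6 then "BB")


def raw_punycode(label):
    """Apply Punycode to a label exactly as given, with no case mapping."""
    return label.encode("punycode").decode("ascii")


# Encoded as written, the two spellings produce different ASCII strings.
print(raw_punycode(lower))
print(raw_punycode(upper))
print(raw_punycode(lower) == raw_punycode(upper))  # False

# Lower-casing first collapses them into a single name.
print(raw_punycode(upper.lower()) == raw_punycode(lower))  # True
```

Whether the lower-casing is done by the standard, the registry or the application, it has to happen somewhere before the name reaches DNS.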
The other major change in IDNA2008 was that symbols were disallowed. One of the principles of IDNA2008 was that characters that are meant for presentation purposes only should not be allowed in IDN.
For example, the Unicode character U+2010 is a HYPHEN – however, so is ASCII character 45, also known as U+002D – so if every Unicode character were allowed, you could register the domain names “my-bank.com” and “my‐bank.com” - the first uses the ASCII hyphen, the second uses the Unicode hyphen. This clearly presents a security risk.
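A short sketch shows how two names can look the same on screen yet be different strings. It uses Python's standard unicodedata module; the helper name differing_chars is hypothetical, and the domain names are the illustrative ones from the paragraph above.

```python
import unicodedata

ascii_name = "my-bank.com"       # contains U+002D
lookalike = "my\u2010bank.com"   # contains U+2010 in the same position


def differing_chars(a, b):
    """List (position, name in a, name in b) where two equal-length strings differ."""
    return [
        (i, unicodedata.name(x), unicodedata.name(y))
        for i, (x, y) in enumerate(zip(a, b))
        if x != y
    ]


print(ascii_name == lookalike)  # False, despite looking alike
print(differing_chars(ascii_name, lookalike))
```

A check of this kind is one way a registry could spot a lookalike before accepting a registration.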
So what about Emoji Characters, what are the arguments in favour?
1 … There is now a standard
Since Unicode 6.0 there has been a specific definition of Emoji Characters. Before then, a few had been in the Unicode block “Miscellaneous Symbols” (U+2600 to U+26FF). This means that, at the time the IDNA2008 standard was set, Emojis were not officially recognised or defined as an entity in themselves in Unicode. There is therefore a strong case to say this needs to be looked at again.
We're not trying to say any mistake had been made when IDNA2008 was established – simply that Emojis as an entity hadn't existed and their widespread use and popularity, especially on social media, was yet to be established when IDNA2008 was agreed.
2 … Some Emoji Domain Names Already Exist
Between IDNA2003 and IDNA2008 some Emoji, and other Miscellaneous Symbol, domain names had been registered in dot-COM. There are 65 single code point domain names in the “Miscellaneous Symbols” block, including ☮.com, ♀.com and ☯.com.
Of those 65 symbol domain names, 33 are categorised as Emojis. In this respect, there could be a case to suggest a mistake was made when setting IDNA2008: it has recently come to light that the decision to block symbol domain names did not take into account the fact that some already existed.
This means that, due to historical ambiguities, dot-COM has ended up being allowed to carry 33 Emoji Domain Names that the other new-GTLDs are not allowed.
This could be problematic for ICANN, as it could leave them open to accusations of not operating the new-GTLD program on a level playing field – allowing registrations of Emoji Domain Names to remain in dot-COM while banning them from the new Generic Top Level Domains.
It is also clear that the idea they're going to break the internet simply doesn't hold water, as they've been live and kicking since 2003.
3 … Unicode Says There is no Risk
In all issues of security, it's impossible to say with 100% certainty that there is no risk – however, in the opinion of Unicode there is no security benefit to the removal of Emojis. Google estimated the improved security achieved by the removal of all symbols to be 0.000016%.
Generally, the removal of confusable or inappropriate characters is best done by the TLD registry, which not only has a list of all the already registered domains, but also best knows what characters would be appropriate in its address space.
Although, it could equally be argued that the small number of symbol domain names registered means there simply aren't enough for this to be a problem, or to accurately gauge if there could be a problem.
4 … There is now a Written Standard
Since Unicode 6.0 there is a standard for Emoji characters. This means there wouldn't be a requirement for any additional work to be done to define which characters would or wouldn't be included. This was not the case when IDNA2008 was set, which was set at Unicode 5.2 (although it allows for updates).
5 … Cultural Exclusion
Since its inception, the internet has had a history of cultural exclusion and English bias. Clearly, it is very high on ICANN's priority list to ensure this is tackled. However, continuing to block Emoji Domain Names appears to entrench this bias further, both in terms of Geo/National Cultural Exclusion and in terms of age-biased cultural exclusion.
The Argument Against
There appear to be three main arguments put forward against Emoji Domain Names.
The decision on what type of domain names to allow and what not to allow is the responsibility of the ICANN Security and Stability Advisory Committee. In an interview, the chair of the committee, Patrik Fältström, explains why Emoji Domains are not allowed.
His reasoning boils down to three issues.
1 … Confusability
Because there is no standard for the way Emojis are presented to the end user, someone trying to type in an Emoji domain name has a good chance of picking the wrong one.
For example, when you look at the different ways that U+1F600 (GRINNING FACE) and U+1F601 (GRINNING FACE WITH SMILING EYES) are presented, it is easy to see how the small and subtle difference between the different images on the different platforms could lead to confusion - especially as these are only a tiny fraction of the Emojis that exist.
A simple answer to this would be – isn't this the responsibility of the domain owner? If a text domain could be easily confused with another one, the owner is expected to take responsibility – they must either register both, or choose a different name.
As Emojis become more widely popular, universal implementations like the EmojiOne artwork are starting to emerge, and these are now available on a wide range of different platforms.
2 … Accessibility
As Emojis cannot be spoken, they lack accessibility.
Although this argument has some merit, every Emoji code point has an associated standard text name in Unicode. For example, U+1F601 is “GRINNING FACE WITH SMILING EYES”, so it doesn't seem to be a big jump to use this as the spoken version of an Emoji.
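As a rough illustration of how little work this would take, the standard Unicode name can be looked up with Python's built-in unicodedata module. The helper name spoken_form, and the idea of simply lower-casing the name, are assumptions made for this sketch rather than anything defined by Unicode or IDNA.

```python
import unicodedata


def spoken_form(emoji):
    """Return the Unicode character name of a single code point, in lower case."""
    return unicodedata.name(emoji).lower()


# The two faces mentioned earlier, looked up by code point.
for code_point in (0x1F600, 0x1F601):
    print(f"U+{code_point:X}: {spoken_form(chr(code_point))}")
```

A screen reader could read such a name aloud in place of the image, in much the same way it already handles other symbols.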
3 … It's not Allowed in the Standard
It is true that IDNA2008 does not allow Emoji Characters. However, there are three reasons for this. Firstly, Emojis (as a standard) did not exist at the time IDNA2008 was set. Secondly, people “forgot” that some Emoji Domain Names had already been registered, so this was not taken into account. Finally, in 2008 Emojis hadn't risen to the cultural significance they occupy today.
It also does feel quite a lot like the old "because I said so" argument you get from your parents when you're five years old.
Whatever the arguments about percentages, communication is far more than just the words we say, and Emojis can be used to convey some of the hidden depths lost in a text only world.
Emojis add a human element back into the slightly sterile world of computer communication.
It is clear that Emoji Domain Names will continue to be disallowed while IDNA2008 is the current agreed standard. However, there doesn't seem to be any good reason to continue to exclude them the next time this standard is revised.