Internationalised Domain Names

A little while ago, I gave you v6only to play with.  Now I give you http://πέτρος.chown.org.uk.  Let’s see if your browser can display Greek characters in URLs.

The domain names in URLs were traditionally limited to ASCII: unaccented Latin letters, digits and hyphens.  That was a problem for people who spoke languages which used a different character set: Greek, Cyrillic, Chinese, and so on.

Allowing other character sets to appear in URLs sounds easy, but it isn’t.  The problem is that it creates all sorts of new possibilities for fake websites, which can then be used in phishing emails.  For example, suppose you ended up on barclayς.co.uk.  Would you realise that the last letter was a Greek final sigma and not an s?  There are some cases that are even worse, where there is no visual distinction between the characters at all.
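
A name like barclayς can at least be caught mechanically, because it mixes two scripts.  Here is a minimal Python sketch of that check.  It is only a heuristic: the standard library has no script property, so the helpers `scripts_in` and `mixes_scripts` (names made up for illustration) guess the script from the first word of each letter's Unicode name.  It is not how any browser actually decides what to display.

```python
import unicodedata


def scripts_in(label: str) -> set[str]:
    """Guess the scripts used in a label from each letter's Unicode name."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in label
        if ch.isalpha()
    }


def mixes_scripts(label: str) -> bool:
    """True if the letters in the label appear to come from more than one script."""
    return len(scripts_in(label)) > 1


print(mixes_scripts("barclayς"))  # Latin letters plus a Greek final sigma
print(mixes_scripts("barclays"))  # Latin only
print(mixes_scripts("πέτρος"))    # Greek only
```

A check like this does nothing for the worse cases, where a name written entirely in one script is visually identical to a name in another.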

To prevent this, Firefox will only show non-ASCII characters if the domain is registered under a top-level domain whose registry has specific rules about which characters are permitted.  Otherwise it shows the encoded ASCII form of the name.  Because the UK doesn’t have such rules, if you visit the link above in Firefox, you won’t normally see Greek characters in the URL.  Chrome took a different approach.  Chrome will only show you characters from languages that you claim to be able to speak.  If your language preferences include Greek, you will therefore see Greek characters in the URL.

If you want to see a Greek URL, therefore, it’s easiest to use Chrome.  Add Greek to the language preferences, and it should just work.

There are lots of other funny traps with this.  Greek has two lower case sigmas: σ and ς.  There is only one upper case sigma: Σ.  Because domain names are case-insensitive, this creates problems.  Is πέτρος the same name as πέτροσ?  The answer is yes, and in fact πέτρος will magically turn into πέτροσ, even though the latter is bad grammar.
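
The collision is easy to reproduce with Python's built-in string methods, which follow the Unicode case mappings.  This is just an illustration of the case rules, not of what any browser or DNS server does internally.

```python
final = "πέτρος"   # ends in ς, the word-final form
medial = "πέτροσ"  # ends in σ

# Two different lower-case strings...
print(final == medial)                  # False

# ...that share a single upper-case form.
print(final.upper() == medial.upper())  # True

# Unicode case folding, used for caseless comparison, maps ς to σ.
print(final.casefold() == medial)       # True
```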

Oh yes.  πέτρος is the apostle who is normally called Peter.  So πέτρος is almost my name in Biblical Greek.

8 thoughts on “Internationalised Domain Names”

  1. Neil

    That looks like Matt 16:18. I only got the encoded form in the URL bar though (using Firefox). I guess the Chrome approach means that the hack only works for bilingual people. If you say that you know Greek, you will see barclayς.co.uk.

    1. Pete Post author

      It’s worse than that, isn’t it? Saying you don’t know English won’t stop you seeing the Western characters in barclayς, because US-ASCII characters don’t need to be encoded.

      Ideally Chrome would block names that mix scripts, but I don’t know if it does that or not.

      Incidentally, Firefox doesn’t show Unicode characters in .uk domains because Nominet don’t do IDNs, and as a result, .uk isn’t on the whitelist. However, it seems to me that this is wrong. Effectively Nominet’s IDN policy is the most restrictive possible, in that they don’t allow any at all. Do you agree? I’ve been thinking about nagging Mozilla to change their policy on this, because it unfairly deprives me of a shiny toy.

      1. Neil

        But US-ASCII is only 7 bits so it doesn’t have “ς”. That’s right isn’t it? The DNS spec seems to suggest that implementations should be 8-bit clean without specifying what characters 128-255 do. It says that you should restrict yourself to alphanumerics plus hyphen to avoid problems with protocols like SMTP.

        Actually, I’m not sure Firefox needs to do any blocking in this case. You can’t even register barclayς.co.uk. If someone gets fooled by barclayς.chown.org.uk then they would be fooled by barclays.chown.org.uk.

        1. Pete Post author

          Imagine you are using Chrome, and you say you speak Greek and nothing else. What will Chrome show you? It’s okay with ‘barclay’ and ‘.co.uk’ because that doesn’t require encoding. It’s okay with ‘ς’ because you claim to speak Greek. As a result, you get shown the spoof domain name rather than the punycode version. (I haven’t actually tried this, so perhaps Chrome blocks names that mix scripts, or requires that the user speak a Western European language if ASCII characters are mixed with something else.)

          At one point you could register barclayς.com and, actually, anything else. You could even use various Unicode characters that look like dots and slashes. I think .com has now disallowed all IDNs. It would be interesting to know if there are any problematic TLDs left. If the world is split between TLDs with sensible restrictions on IDNs, and TLDs that disallow all IDNs, the whitelist might as well be dropped. As you say, the ability to create barclayς.chown.org.uk is not a security threat.

          (You do still have to block the characters that look like slashes because, otherwise, you could register for example barclays.co.uk╱foo.chown.org.uk.)

  2. John Papadopoulos

    Hello Πέτρο,

    I have an interesting question regarding Firefox version 50.1.0 on Windows 10.

    When I click on http://πέτρος.chown.org.uk Firefox doesn’t take me there and displays ‘Server not found’. However, when I type http://πέτροσ.chown.org.uk (notice the sigma in πέτροσ) everything works fine.

    Internet Explorer and Microsoft Edge don’t have this problem. Any ideas?

    Thanking you in advance for your time, efforts and support, I am wishing you the very best for 2017 and I am looking forward to hearing from you.

    1. Pete Post author

      Hi John,

      It looks as though some things have changed since I wrote this article. Firefox displays Greek characters now, so that’s a step forward.

      I’ve now registered xn--ixa0bbfgj.chown.org.uk (πέτρος) as well as xn--ixa0bbfld.chown.org.uk (πέτροσ). It’s just an experiment so the new name doesn’t have a website; if you go there, you’ll just see a test page. Firefox seems to treat the two sites as entirely different. You get your final sigma but the names are no longer case-insensitive. If you go to http://ΠΈΤΡΟΣ.chown.org.uk/ you arbitrarily get redirected to πέτροσ.
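
      The two encoded names can be checked with Python's standard codecs. This is a small sketch, not anything a browser runs. The `punycode` codec handles the part after the `xn--` prefix, and the built-in `idna` codec follows the older IDNA 2003 rules, which fold ς to σ before encoding. That matches the behaviour seen in Chrome. The later IDNA 2008 rules allow ς as a character in its own right, which would be consistent with what Firefox seems to be doing, though that is an inference rather than something confirmed here.

```python
# ASCII labels quoted above, without the "xn--" prefix.
with_final_sigma = b"ixa0bbfgj".decode("punycode")
without_final_sigma = b"ixa0bbfld".decode("punycode")
print(with_final_sigma, without_final_sigma)

# Raw punycode keeps the two spellings distinct.
print(with_final_sigma.encode("punycode"))
print(without_final_sigma.encode("punycode"))

# The "idna" codec (IDNA 2003) case-folds ς to σ first, so the name
# with a final sigma comes out as the name without one.
print((with_final_sigma + ".chown.org.uk").encode("idna"))
```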

      Chrome has the behaviour I originally noted. Whichever URL you start with (final sigma, no final sigma, or capitals) you end up at πέτροσ.

      From your name I’m guessing that you’re actually Greek! So are you trying to make this work for a modern Greek website? If so, presumably you need to register two names, one with a final sigma and one without. You then display the same site for both URLs. For bonus points I suppose you could forward Firefox users to the final sigma version of the site, so they see the language the way it’s meant to be written. You can’t help the fact that it won’t work for users of other browsers.

      Good luck,
      Pete

      1. John Papadopoulos

        Hello Pete,

        Thank you very much for your prompt reply and please do accept my apologies for not being as prompt in getting back.

        You guessed right! I am actually Greek and a few years ago I registered through a Greek Registrar, the ethnic domain ‘ΙωάννηςΠαπαδόπουλος.net’ [ http://www.ΙωάννηςΠαπαδόπουλος.net/ ].

        I will explain the problem further, hoping that I will put my two cents in the discussion of Internationalised Domain Names. In fact, I am posting here the email I sent to my registrar, asking for advice.

        Microsoft Edge – Windows 10: When I type
        http://www.ΙωάννηςΠαπαδόπουλος.net
        the address is automatically converted to
        http://www.ιωάννησπαπαδόπουλοσ.net/
        and I am taken to the website.

        Microsoft Internet Explorer – Windows 10: When I type
        http://www.ΙωάννηςΠαπαδόπουλος.net
        the address is automatically converted to punycode
        http://www.xn--hxakammktlaybgbc3bj5a3dwd.net/
        and I am taken to the website.

        Mozilla Firefox 50.1.0 – Windows 10: When I type
        http://www.ΙωάννηςΠαπαδόπουλος.net
        the address is automatically converted to
        http://www.ιωάννηςπαπαδόπουλος.net/
        and I get the message ‘Problem loading page’ and ‘Server not found’. However, when I type the sigma myself,
        http://www.ΙωάννησΠαπαδόπουλοσ.net/
        then Firefox takes me to the website, as the other two browsers do.

        The reply I got from my registrar is that the way Firefox treats the Greek final sigma is a bug that has appeared in the past and that will be fixed in a future hotfix from the Firefox developers. They believe that it is only a matter of time before an update fixes this behaviour.

        And last, but not least, using Firefox 50.1.0 – Windows 10,
        http://πέτρος.chown.org.uk/
        and
        xn--ixa0bbfgj.chown.org.uk
        take me to the Apache default page,
        while
        http://πέτροσ.chown.org.uk/
        xn--ixa0bbfld.chown.org.uk
        and
        http://ΠΈΤΡΟΣ.chown.org.uk/
        take me to the page with the sentence in Greek Polytonic.

        It goes without saying that I can’t thank you enough for your support.

        All the best,
        John

        1. Pete Post author

          Hi John,

          Is the final sigma ‘compulsory’ in the sense that Greeks would feel a word was spelt wrong if it wasn’t used? If so, there seems to be a good argument that Firefox is the only browser which gets this right! It’s just awkward because domain names are supposed to be case-insensitive and that doesn’t work if you have two lower case letters for one upper case one.

          If there is an absolute rule that a sigma at the end of a word is always a final sigma, that might work, I suppose. You could then choose the appropriate lower case letter based on the context. This could be done by browsers without changing the underlying IDN system at all.
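
          As a rough sketch of that idea, Python's `str.lower()` already applies a context rule to capital sigma: Σ becomes ς at the end of a word and σ elsewhere. Upper-casing a label and lower-casing it again therefore puts the final sigma back for display. The function names `display_label` and `display_name` are hypothetical, and this is only an illustration of the principle, not a description of what any browser does.

```python
def display_label(label: str) -> str:
    """Re-case one label so that a sigma ending the label is shown as ς."""
    return label.upper().lower()


def display_name(name: str) -> str:
    # Work label by label, so the last letter of each label is treated
    # as word-final regardless of what follows the dot.
    return ".".join(display_label(part) for part in name.split("."))


print(display_name("πέτροσ.chown.org.uk"))
print(display_label("ιωάννησπαπαδόπουλοσ"))
```

          One limitation shows up in the second example: a label made of two words run together has no word boundary in the middle, so only the last sigma is restored and the one ending the first word stays as σ.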

          If you want to take part in a wider discussion of IDNs, this isn’t the best place. 🙂 I was just interested because it was a new browser feature, back when I first wrote the article. I didn’t help develop the standard or anything like that. You’re probably better going to the browser bug trackers and shouting about the problem. Unfortunately the IDN working group https://datatracker.ietf.org/wg/idn/charter/ has closed down (having produced the documents it was intended to create) so while that would be the ideal place to complain about bugs in the standard, it may not work.

          Pete

