libidn: On Label Separators

Appendix B On Label Separators
******************************

Some strings contain characters whose NFKC-normalized form contains the
ASCII dot (0x2E, “.”). Examples of such characters are U+2024 (ONE DOT
LEADER) and U+248C (DIGIT FIVE FULL STOP). Such strings have the
interesting property that their IDNA ToASCII output contains embedded
dots. For example:

     ToASCII (hi U+248C com) = hi5.com
     ToASCII (räksmörgås U+2024 com) = xn--rksmrgs.com-l8as9u

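The decompositions behind these examples can be inspected with a short
script. This is only an illustrative sketch using Python's standard
‘unicodedata’ module, not Libidn's own API; the helper name
‘nfkc_has_dot’ is made up for the example.

```python
import unicodedata


def nfkc_has_dot(ch: str) -> bool:
    """Return True if the NFKC form of ch contains the ASCII dot (0x2E)."""
    return "." in unicodedata.normalize("NFKC", ch)


# The two code points named in the text above.
for cp in (0x2024, 0x248C):
    ch = chr(cp)
    nfkc = unicodedata.normalize("NFKC", ch)
    print(f"U+{cp:04X} ({unicodedata.name(ch)}) -> {nfkc!r}, "
          f"contains dot: {nfkc_has_dot(ch)}")
```
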
This demonstrates the two general cases. In the first, the ASCII dot is
part of an output that does not begin with the IDN prefix ‘xn--’. In
the second, the dot is part of an output that does begin with the
prefix ‘xn--’.

From the DNS point of view, each input string is a single label. The
IDNA algorithm translates one label at a time, so the output is
expected to be a single label as well. What matters here is that the
DNS resolver receives the correct query. The DNS protocol does not use
the dot to delimit labels on the wire; it uses length-value pairs.
Thus the correct queries would be for ‘{7}hi5.com’ and
‘{22}xn--rksmrgs.com-l8as9u’ respectively.

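To make the length-value notation concrete, the following sketch
encodes a list of labels the way they appear on the wire. It is a
simplified illustration in Python, not code from Libidn or from any
resolver; the function name ‘encode_name’ is hypothetical, and the
trailing zero octet is the root label, which the ‘{7}hi5.com’ notation
above leaves out.

```python
from typing import Iterable


def encode_name(labels: Iterable[bytes]) -> bytes:
    """Encode labels as DNS length-value pairs, ending with the root label."""
    out = bytearray()
    for label in labels:
        if not 0 < len(label) <= 63:
            raise ValueError("a DNS label must be 1 to 63 octets long")
        out.append(len(label))  # one length octet ...
        out += label            # ... followed by the label octets
    out.append(0)               # zero-length root label terminates the name
    return bytes(out)


# One label that happens to contain a dot, versus two separate labels.
print(encode_name([b"hi5.com"]))
print(encode_name([b"hi5", b"com"]))
```

The dot inside the single label is just another octet; nothing in the
encoding treats it as a separator.
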
Some implementations (1) have decided that these input strings are
potentially confusing for the user. The string ‘hi U+248C com’ looks
like ‘hi5.com’ on systems that support Unicode properly. These
implementations do not follow RFC 3490. They yield:

     ToASCII (hi U+248C com) = hi5.com
     ToASCII (räksmörgås U+2024 com) = xn--rksmrgs-5wao1o.com

The DNS queries they perform are for ‘{3}hi5{3}com’ and
‘{18}xn--rksmrgs-5wao1o{3}com’ respectively. Arguably, this leads to a
better user experience, and suggests that the IDNA specification is
sub-optimal in this area.

B.1 Recommended Workaround
==========================

It has been suggested to normalize the entire input string using NFKC
before passing it to IDNA ToASCII. You may use
‘stringprep_utf8_nfkc_normalize’ or ‘stringprep_ucs4_nfkc_normalize’
for this. This appears to lead to behaviour similar to that of
IE/Firefox, which would avoid the problem, but this needs to be
confirmed. Feel free to discuss the issue with us.

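The idea of normalizing first can be sketched as follows. This is a
hedged illustration that uses Python's standard library (the
‘unicodedata’ module and the built-in ‘idna’ codec, which implements
RFC 3490) in place of the Libidn functions named above; the function
name ‘to_ascii_nfkc_first’ is invented for the example, and the sketch
makes no claim about what IE or Firefox actually do internally.

```python
import unicodedata


def to_ascii_nfkc_first(name: str) -> bytes:
    """NFKC-normalize the whole name, then apply ToASCII label by label."""
    # Normalizing first turns U+2024 and U+248C into text with a real ASCII dot.
    normalized = unicodedata.normalize("NFKC", name)
    # The "idna" codec splits the name on dots and runs ToASCII on each label.
    return normalized.encode("idna")


print(to_ascii_nfkc_first("hi\u248ccom"))
print(to_ascii_nfkc_first("r\u00e4ksm\u00f6rg\u00e5s\u2024com"))
```

Because the dot exists before the name is split into labels, each input
ends up as two labels rather than one label with an embedded dot.
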
Alternative workarounds are being considered. Eventually Libidn may
add a new flag to the ‘idna_*’ functions that implements a recommended
way to work around this problem.

---------- Footnotes ----------

(1) Notably Microsoft’s Internet Explorer and Mozilla’s Firefox, but
not Apple’s Safari.
