libidn: On Label Separators

Appendix B On Label Separators
******************************

Some strings contain characters whose NFKC-normalized form contains the
ASCII dot (0x2E, “.”).  Examples of such characters are U+2024 (ONE DOT
LEADER) and U+248C (DIGIT FIVE FULL STOP).  These strings have the
interesting property that their IDNA ToASCII output contains embedded
dots.  For example:

     ToASCII (hi U+248C com) = hi5.com
     ToASCII (räksmörgås U+2024 com) = xn--rksmrgs.com-l8as9u
 
   This demonstrates the two general cases.  In the first, the ASCII dot
is part of an output that does not begin with the IDN prefix ‘xn--’.  In
the second, the dot is part of an output that does begin with ‘xn--’.
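 
   As a rough illustration, an application could tell these two cases
apart with a check like the following.  This is only a sketch: the names
‘classify_label_output’ and ‘enum dot_case’ are hypothetical and not
part of Libidn, and the prefix comparison assumes lower-case ToASCII
output.

```c
#include <string.h>

/* Hypothetical classification of a single-label ToASCII output. */
enum dot_case {
    NO_EMBEDDED_DOT,     /* ordinary label, nothing special */
    DOT_IN_PLAIN_LABEL,  /* first case: dot, no "xn--" prefix */
    DOT_IN_ACE_LABEL     /* second case: dot inside an "xn--" label */
};

/* Assumes `label` is the NUL-terminated, lower-case output of ToASCII
   for what was a single input label. */
enum dot_case classify_label_output(const char *label)
{
    if (strchr(label, '.') == NULL)
        return NO_EMBEDDED_DOT;
    if (strncmp(label, "xn--", 4) == 0)
        return DOT_IN_ACE_LABEL;
    return DOT_IN_PLAIN_LABEL;
}
```

   With the outputs above, ‘hi5.com’ falls in the first case and
‘xn--rksmrgs.com-l8as9u’ in the second.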
 
   Each input string is, from the DNS point of view, a single label.
The IDNA algorithm translates one label at a time, so the output is
expected to be a single label as well.  What matters here is that the
DNS resolver receives the correct query.  The DNS protocol does not use
the dot to delimit labels on the wire; instead it uses length-value
pairs.  Thus the correct queries would be for ‘{7}hi5.com’ and
‘{22}xn--rksmrgs.com-l8as9u’ respectively.
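 
   The length-value form of a single label can be sketched as follows.
The helper name ‘encode_single_label’ is hypothetical and not part of
Libidn; it only illustrates the wire form described above, not how any
particular resolver is implemented.  The 63-octet limit is the DNS limit
on label length.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: write one DNS label in wire form, that is, a
   length octet followed by the octets of the label.  Any dot inside
   `label` is copied as an ordinary octet.  Returns the number of octets
   written, or 0 if the label is empty, longer than 63 octets, or does
   not fit in `out`. */
size_t encode_single_label(const char *label, unsigned char *out,
                           size_t cap)
{
    size_t len = strlen(label);

    if (len == 0 || len > 63 || cap < len + 1)
        return 0;
    out[0] = (unsigned char)len;
    memcpy(out + 1, label, len);
    return len + 1;
}
```

   For ‘hi5.com’ this writes the octet 7 followed by the seven octets of
the label, dot included.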
 
   Some implementations (1) have decided that these input strings are
potentially confusing for the user.  The string ‘hi U+248C com’ looks
like ‘hi5.com’ on systems that support Unicode properly.  These
implementations do not follow RFC 3490.  They yield:

     ToASCII (hi U+248C com) = hi5.com
     ToASCII (räksmörgås U+2024 com) = xn--rksmrgs-5wao1o.com
 
   The DNS queries they perform are ‘{3}hi5{3}com’ and
‘{18}xn--rksmrgs-5wao1o{3}com’ respectively.  Arguably, this leads to a
better user experience, and suggests that the IDNA specification is
sub-optimal in this area.
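 
   The difference on the wire comes from splitting the name at each dot
before encoding it.  A minimal sketch, assuming a hypothetical helper
‘encode_dotted_name’ that is not a Libidn function; like the notation
used here, it omits the zero-length root label that terminates a real
query name:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split an ASCII name at each dot and write every
   piece as a length octet followed by its octets.  Returns the number
   of octets written, or 0 if a label is empty or longer than 63 octets,
   or if `out` is too small. */
size_t encode_dotted_name(const char *name, unsigned char *out,
                          size_t cap)
{
    size_t used = 0;

    for (;;) {
        const char *dot = strchr(name, '.');
        size_t len = dot ? (size_t)(dot - name) : strlen(name);

        if (len == 0 || len > 63 || cap - used < len + 1)
            return 0;
        out[used++] = (unsigned char)len;
        memcpy(out + used, name, len);
        used += len;

        if (dot == NULL)
            return used;
        name = dot + 1;   /* continue after the separator */
    }
}
```

   Here ‘hi5.com’ becomes two labels of three octets each, whereas
treating the same string as a single label gives one label of seven
octets.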
 
B.1 Recommended Workaround
==========================

It has been suggested to normalize the entire input string using NFKC
before passing it to IDNA ToASCII.  You may use
‘stringprep_utf8_nfkc_normalize’ or ‘stringprep_ucs4_nfkc_normalize’ for
this.  This appears to lead to behaviour similar to that of IE/Firefox,
which would avoid the problem, but this needs to be confirmed.  Feel
free to discuss the issue with us.
 
   Alternative workarounds are being considered.  Eventually, Libidn may
add a flag to the ‘idna_*’ functions that implements a recommended way
to work around this problem.
 
   ---------- Footnotes ----------

   (1) Notably Microsoft’s Internet Explorer and Mozilla’s Firefox, but
not Apple’s Safari.