libidn: PR29 discussion

Appendix A PR29 discussion
**************************

If you wish to experiment with a modified Unicode NFKC implementation
according to the PR29 proposal, you may find the following bug report
useful. However, I have not verified that the suggested modifications
are correct. For reference, I’m including my response to the report as
well.

From: Rick McGowan <rick@unicode.org>
Subject: Possible bug and status of PR 29 change(s)
To: bug-libidn@gnu.org
Date: Wed, 27 Oct 2004 14:49:17 -0700

Hello. On behalf of the Unicode Consortium editorial committee, I would
like to find out more information about the PR 29 fixes, if any, and
functions in Libidn. Your implementation was listed in the text of PR29 as
needing investigation, so I am following up on several implementations.

The UTC has accepted the proposed fix to D2 as outlined in PR29, and a new
draft of UAX #15 has been issued.

I have looked at Libidn 0.5.8 (today), and there may still be a possible
bug in NFKC.java and nfkc.c.

------------------------------------------------------

1. In NFKC.java, this line in canonicalOrdering():

      if (i > 0 && (last_cc == 0 || last_cc != cc)) {

should perhaps be changed to:

      if (i > 0 && (last_cc == 0 || last_cc < cc)) {

but I'm not sure of the sense of this comparison.

------------------------------------------------------

2. In nfkc.c, function _g_utf8_normalize_wc() has this code:

      if (i > 0 &&
          (last_cc == 0 || last_cc != cc) &&
          combine (wc_buffer[last_start], wc_buffer[i],
                   &wc_buffer[last_start]))
        {

This appears to have the same bug as the current Python implementation (in
Python 2.3.4). The code should be checking, as per new rule D2 UAX #15
update, that the next combining character is the same or HIGHER than the
current one. It now checks to see if it's non-zero and not equal.

The above line(s) should perhaps be changed to:

      if (i > 0 &&
          (last_cc == 0 || last_cc < cc) &&
          combine (wc_buffer[last_start], wc_buffer[i],
                   &wc_buffer[last_start]))
        {

but I'm not sure of the sense of the comparison (< or > or <=?) here.

In the text of PR29, I will be marking Libidn as "needs change" and adding
the version number that I checked. If any further change is made, please
let me know the release version, and I'll update again.

Regards,
Rick McGowan

From: Simon Josefsson <jas@extundo.com>
Subject: Re: Possible bug and status of PR 29 change(s)
To: Rick McGowan <rick@unicode.org>
Cc: bug-libidn@gnu.org
Date: Thu, 28 Oct 2004 09:47:47 +0200

Rick McGowan <rick@unicode.org> writes:

> Hello. On behalf of the Unicode Consortium editorial committee, I would
> like to find out more information about the PR 29 fixes, if any, and
> functions in Libidn. Your implementation was listed in the text of PR29 as
> needing investigation, so I am following up on several implementations.
>
> The UTC has accepted the proposed fix to D2 as outlined in PR29, and a new
> draft of UAX #15 has been issued.
>
> I have looked at Libidn 0.5.8 (today), and there may still be a possible
> bug in NFKC.java and nfkc.c.

Hello Rick.

I believe the current behavior is intentional. Libidn does not aim to
implement the latest-and-greatest NFKC; it aims to implement the NFKC
functionality required for StringPrep and IDN. As you may know,
StringPrep/IDN reference Unicode 3.2.0, and explicitly say that any later
changes (which I consider PR29 to be) do not apply.

In fact, I believe that were I to incorporate the changes suggested in
PR29, I would be violating the IDN specifications.

Thanks for looking into the code and finding the place where the
change could be made. I'll see if I can mention this in the manual
somewhere, for technically interested readers.

Regards,
Simon

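The practical difference between the two comparisons shows up on input
such as a starter, a combining mark of class 230, and a second class-0
character that composes with the starter (for example U+0B47 U+0300
U+0B3E, where U+0B47 and U+0B3E compose to U+0B4B). The following is a
minimal sketch for experimentation, not Libidn code: toy_combining_class,
toy_combine, toy_compose and toy_self_check are hypothetical names, the
lookup tables cover only these code points, and the loop is a simplified
rendering of the composition step quoted in the report. It assumes the
input is already decomposed and canonically ordered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy combining-class lookup: only U+0300 (class 230) is non-zero here.
   A real implementation would consult the Unicode 3.2 data tables. */
static int
toy_combining_class (uint32_t c)
{
  return c == 0x0300 ? 230 : 0;
}

/* Toy composition table: knows only U+0B47 + U+0B3E -> U+0B4B. */
static bool
toy_combine (uint32_t a, uint32_t b, uint32_t *result)
{
  if (a == 0x0B47 && b == 0x0B3E)
    {
      *result = 0x0B4B;
      return true;
    }
  return false;
}

/* Compose buf[0..n) in place and return the new length.  With pr29 set
   to false the check is "last_cc != cc" (the code quoted in the report);
   with pr29 set to true it is "last_cc < cc" (the suggested change). */
size_t
toy_compose (uint32_t *buf, size_t n, bool pr29)
{
  size_t last_start = 0;
  int last_cc = 0;
  size_t i = 0;

  while (i < n)
    {
      int cc = toy_combining_class (buf[i]);
      bool unblocked =
        last_cc == 0 || (pr29 ? last_cc < cc : last_cc != cc);

      if (i > 0 && unblocked
          && toy_combine (buf[last_start], buf[i], &buf[last_start]))
        {
          /* Drop buf[i]; the next character moves into position i. */
          for (size_t j = i + 1; j < n; j++)
            buf[j - 1] = buf[j];
          n--;
          last_cc = (i - 1 == last_start)
            ? 0 : toy_combining_class (buf[i - 1]);
          continue;
        }

      if (cc == 0)
        last_start = i;
      last_cc = cc;
      i++;
    }
  return n;
}

/* Runs both variants on U+0B47 U+0300 U+0B3E. */
void
toy_self_check (void)
{
  uint32_t old_rule[] = { 0x0B47, 0x0300, 0x0B3E };
  uint32_t new_rule[] = { 0x0B47, 0x0300, 0x0B3E };

  /* "!=" composes across the intervening mark. */
  assert (toy_compose (old_rule, 3, false) == 2);
  assert (old_rule[0] == 0x0B4B && old_rule[1] == 0x0300);

  /* "<" treats the class-0 character as blocked. */
  assert (toy_compose (new_rule, 3, true) == 3);
  assert (new_rule[0] == 0x0B47 && new_rule[2] == 0x0B3E);
}
```

With pr29 set to false the sketch yields U+0B4B U+0300; with pr29 set to
true the sequence is left unchanged. On input where the preceding mark has
a lower class than the current one, both comparisons give the same answer,
so the two variants differ only in cases like the one above.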