libidn: PR29 Functions

8 PR29 Functions
****************

A deficiency in the specification of Unicode Normalization Forms has
been found. The consequence is that some strings can be normalized into
different strings by different implementations. In other words, two
different implementations may return different output for the same input
(because the interpretation of the specification is ambiguous).
Further, an implementation invoked again on one of the output strings
may return a different string (because one of the interpretations of
the ambiguous specification makes normalization non-idempotent).
Fortunately, only a select few character sequences exhibit this problem,
and none of them are expected to occur in natural languages (due to
different linguistic uses of the involved characters).

A full discussion of the problem may be found at:

     <http://www.unicode.org/review/pr-29.html>

The PR29 functions below allow you to detect the problem sequences.
So when would you want to use these functions? For most applications,
such as those using Nameprep for IDN, this is likely only to be an
interoperability problem. Thus, you may not need to care about it, as
the character sequences will rarely occur naturally. However, if you
are using a profile, such as SASLPrep, to process authentication
tokens, authorization tokens, or passwords, there is a real danger that
attackers may try to use the peculiarities in these strings to attack
parts of your system. As only a small number of strings, and no
naturally occurring strings, exhibit this problem, the conservative
approach of rejecting the strings is recommended. If this approach is
not used, you should instead verify that all parts of your system that
process the tokens and passwords use an NFKC implementation that
produces the same output for the same input.

Technically inclined readers may be interested in knowing more about
the implementation aspects of the PR29 flaw. ⇒PR29 discussion.

8.1 Header file ‘pr29.h’
========================

To use the functions explained in this chapter, you need to include the
file ‘pr29.h’ using:

     #include <pr29.h>

8.2 Core Functions
==================

pr29_4
------

 -- Function: int pr29_4 (const uint32_t * IN, size_t LEN)
     IN: input array with Unicode code points.

     LEN: length of input array with Unicode code points.

     Check the input to see if it may be normalized into different
     strings by different NFKC implementations, due to an anomaly in the
     NFKC specifications.

     Return value: Returns the ‘Pr29_rc’ value ‘PR29_SUCCESS’ on
     success, and ‘PR29_PROBLEM’ if the input sequence is a "problem
     sequence" (i.e., may be normalized into different strings by
     different implementations).

8.3 Utility Functions
=====================

pr29_4z
-------

 -- Function: int pr29_4z (const uint32_t * IN)
     IN: zero terminated array of Unicode code points.

     Check the input to see if it may be normalized into different
     strings by different NFKC implementations, due to an anomaly in the
     NFKC specifications.

     Return value: Returns the ‘Pr29_rc’ value ‘PR29_SUCCESS’ on
     success, and ‘PR29_PROBLEM’ if the input sequence is a "problem
     sequence" (i.e., may be normalized into different strings by
     different implementations).

pr29_8z
-------

 -- Function: int pr29_8z (const char * IN)
     IN: zero terminated input UTF-8 string.

     Check the input to see if it may be normalized into different
     strings by different NFKC implementations, due to an anomaly in the
     NFKC specifications.

     Return value: Returns the ‘Pr29_rc’ value ‘PR29_SUCCESS’ on
     success, and ‘PR29_PROBLEM’ if the input sequence is a "problem
     sequence" (i.e., may be normalized into different strings by
     different implementations), or ‘PR29_STRINGPREP_ERROR’ if there was
     a problem converting the string from UTF-8 to UCS-4.

8.4 Error Handling
==================

pr29_strerror
-------------

 -- Function: const char * pr29_strerror (Pr29_rc RC)
     RC: a ‘Pr29_rc’ return code.

     Convert a return code integer to a text string. This string can be
     used to output a diagnostic message to the user.

     *PR29_SUCCESS:* Successful operation. This value is guaranteed to
     always be zero; the remaining ones are only guaranteed to hold
     non-zero values, for logical comparison purposes.

     *PR29_PROBLEM:* A problem sequence was encountered.

     *PR29_STRINGPREP_ERROR:* The character set conversion failed (only
     for ‘pr29_8z()’).

     Return value: Returns a pointer to a statically allocated string
     containing a description of the error with the return code ‘RC’.