libidn: Punycode Functions

1 5 Punycode Functions
1 ********************
1 
1 Punycode is a simple and efficient transfer encoding syntax designed for
1 use with Internationalized Domain Names in Applications.  It uniquely
1 and reversibly transforms a Unicode string into an ASCII string.  ASCII
1 characters in the Unicode string are represented literally, and
1 non-ASCII characters are represented by ASCII characters that are
1 allowed in host name labels (letters, digits, and hyphens).  A general
1 algorithm called Bootstring allows a string of basic code points to
1 uniquely represent any string of code points drawn from a larger set.
1 Punycode is an instance of Bootstring that uses particular parameter
1 values, appropriate for IDNA.
1 
1 5.1 Header file ‘punycode.h’
1 ============================
1 
1 To use the functions explained in this chapter, you need to include the
1 file ‘punycode.h’ using:
1 
1      #include <punycode.h>
1 
1 5.2 Unicode Code Point Data Type
1 ================================
1 
The Punycode functions use a dedicated type to denote Unicode code
points.  It is guaranteed to always be a 32-bit unsigned integer.
1 
1  -- Punycode Unicode code point: uint32_t punycode_uint
     An unsigned integer that holds a Unicode code point.
1 
1 5.3 Core Functions
1 ==================
1 
Note that the current implementation will fail if ‘input_length’
exceeds 4294967295 (the maximum value of ‘punycode_uint’).  This
restriction may be removed in the future.  Meanwhile, applications are
encouraged not to depend on this limit, and to use ‘sizeof’ to
initialize ‘input_length’ and ‘output_length’.
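
The ‘sizeof’ advice above can be sketched as follows.  This is only an
illustration: ‘ARRAY_LEN’, ‘example_uint’ and both functions are
hypothetical names, not part of libidn, and ‘example_uint’ stands in for
‘punycode_uint’ so that the fragment compiles without the library.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for punycode_uint (documented as a 32-bit unsigned integer),
   used here so the fragment compiles without libidn installed. */
typedef uint32_t example_uint;

/* Hypothetical helper: element count of a true array (not a pointer). */
#define ARRAY_LEN(a) (sizeof (a) / sizeof ((a)[0]))

/* Returns the value to pass as input_length for a fixed input array. */
size_t
example_input_length (void)
{
  /* Code points for "bücher": b, ü (U+00FC), c, h, e, r. */
  static const example_uint input[] = { 0x62, 0xFC, 0x63, 0x68, 0x65, 0x72 };

  return ARRAY_LEN (input);
}

/* Returns the initial value of *output_length for a fixed char buffer. */
size_t
example_output_capacity (void)
{
  char output[64];

  /* For a char array the byte count equals the element count. */
  return sizeof output;
}
```

Dividing by the element size matters for the code point array, because
‘input_length’ counts code points, not bytes.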
1 
1    The functions provided are the following two entry points:
1 
1 punycode_encode
1 ---------------
1 
1  -- Function: int punycode_encode (size_t INPUT_LENGTH, const
1           punycode_uint [] INPUT, const unsigned char [] CASE_FLAGS,
1           size_t * OUTPUT_LENGTH, char [] OUTPUT)
1      INPUT_LENGTH: The number of code points in the ‘input’ array and
1      the number of flags in the ‘case_flags’ array.
1 
1      INPUT: An array of code points.  They are presumed to be Unicode
1      code points, but that is not strictly REQUIRED. The array contains
1      code points, not code units.  UTF-16 uses code units D800 through
1      DFFF to refer to code points 10000..10FFFF. The code points
1      D800..DFFF do not occur in any valid Unicode string.  The code
1      points that can occur in Unicode strings (0..D7FF and E000..10FFFF)
1      are also called Unicode scalar values.
1 
1      CASE_FLAGS: A ‘NULL’ pointer or an array of boolean values parallel
1      to the ‘input’ array.  Nonzero (true, flagged) suggests that the
1      corresponding Unicode character be forced to uppercase after being
1      decoded (if possible), and zero (false, unflagged) suggests that it
1      be forced to lowercase (if possible).  ASCII code points (0..7F)
1      are encoded literally, except that ASCII letters are forced to
1      uppercase or lowercase according to the corresponding case flags.
1      If ‘case_flags’ is a ‘NULL’ pointer then ASCII letters are left as
1      they are, and other code points are treated as unflagged.
1 
1      OUTPUT_LENGTH: The caller passes in the maximum number of ASCII
1      code points that it can receive.  On successful return it will
1      contain the number of ASCII code points actually output.
1 
1      OUTPUT: An array of ASCII code points.  It is *not*
1      null-terminated; it will contain zeros if and only if the ‘input’
1      contains zeros.  (Of course the caller can leave room for a
1      terminator and add one if needed.)
1 
1      Converts a sequence of code points (presumed to be Unicode code
1      points) to Punycode.
1 
     Return value: The return value can be any of the ‘Punycode_status’
     values (see Error Handling) except ‘PUNYCODE_BAD_INPUT’.  If not
     ‘PUNYCODE_SUCCESS’, then ‘output_length’ and ‘output’ might contain
     garbage.
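
Because INPUT must hold code points rather than UTF-16 code units, text
stored as UTF-16 has to be converted first.  The following is a minimal
sketch of that conversion; ‘utf16_to_code_points’ is a hypothetical
helper, not a libidn function, and it uses ‘uint32_t’ directly on the
assumption that it matches ‘punycode_uint’ as documented above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of libidn: converts UTF-16 code units
   to code points.  Returns the number of code points written, or
   (size_t) -1 on an unpaired surrogate or if `out' is too small. */
size_t
utf16_to_code_points (const uint16_t *in, size_t in_len,
                      uint32_t *out, size_t out_cap)
{
  size_t i = 0, n = 0;

  while (i < in_len)
    {
      uint32_t cp = in[i++];

      if (cp >= 0xD800 && cp <= 0xDBFF)
        {
          /* High surrogate: a low surrogate must follow. */
          if (i >= in_len || in[i] < 0xDC00 || in[i] > 0xDFFF)
            return (size_t) -1;
          cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i++] - 0xDC00);
        }
      else if (cp >= 0xDC00 && cp <= 0xDFFF)
        return (size_t) -1;     /* Low surrogate without a high one. */

      if (n >= out_cap)
        return (size_t) -1;
      out[n++] = cp;
    }

  return n;
}
```

The resulting array contains only Unicode scalar values, so it can be
passed as INPUT with the returned count as INPUT_LENGTH.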
1 
1 punycode_decode
1 ---------------
1 
1  -- Function: int punycode_decode (size_t INPUT_LENGTH, const char []
1           INPUT, size_t * OUTPUT_LENGTH, punycode_uint [] OUTPUT,
1           unsigned char [] CASE_FLAGS)
1      INPUT_LENGTH: The number of ASCII code points in the ‘input’ array.
1 
1      INPUT: An array of ASCII code points (0..7F).
1 
1      OUTPUT_LENGTH: The caller passes in the maximum number of code
1      points that it can receive into the ‘output’ array (which is also
1      the maximum number of flags that it can receive into the
1      ‘case_flags’ array, if ‘case_flags’ is not a ‘NULL’ pointer).  On
1      successful return it will contain the number of code points
     actually output (which is also the number of flags actually output,
     if ‘case_flags’ is not a ‘NULL’ pointer).  The decoder will never need
1      to output more code points than the number of ASCII code points in
1      the input, because of the way the encoding is defined.  The number
     of code points output cannot exceed the maximum possible value of a
     ‘punycode_uint’, even if the supplied ‘output_length’ is greater than
     that.
1 
1      OUTPUT: An array of code points like the input argument of
1      ‘punycode_encode()’ (see above).
1 
1      CASE_FLAGS: A ‘NULL’ pointer (if the flags are not needed by the
1      caller) or an array of boolean values parallel to the ‘output’
1      array.  Nonzero (true, flagged) suggests that the corresponding
1      Unicode character be forced to uppercase by the caller (if
1      possible), and zero (false, unflagged) suggests that it be forced
1      to lowercase (if possible).  ASCII code points (0..7F) are output
1      already in the proper case, but their flags will be set
1      appropriately so that applying the flags would be harmless.
1 
1      Converts Punycode to a sequence of code points (presumed to be
1      Unicode code points).
1 
     Return value: The return value can be any of the ‘Punycode_status’
     values (see Error Handling).  If not ‘PUNYCODE_SUCCESS’, then
     ‘output_length’, ‘output’, and ‘case_flags’ might contain
     garbage.
1 
1 5.4 Error Handling
1 ==================
1 
1 punycode_strerror
1 -----------------
1 
1  -- Function: const char * punycode_strerror (Punycode_status RC)
     RC: a ‘Punycode_status’ return code.
1 
1      Convert a return code integer to a text string.  This string can be
1      used to output a diagnostic message to the user.
1 
     *PUNYCODE_SUCCESS:* Successful operation.  This value is guaranteed
     to always be zero; the remaining ones are only guaranteed to hold
     non-zero values, for logical comparison purposes.
1 
1      *PUNYCODE_BAD_INPUT:* Input is invalid.
1 
1      *PUNYCODE_BIG_OUTPUT:* Output would exceed the space provided.
1 
1      *PUNYCODE_OVERFLOW:* Input needs wider integers to process.
1 
     Return value: Returns a pointer to a statically allocated string
     containing a description of the error with the return code ‘rc’.
1