libidn: Utility Functions

1 
1 3 Utility Functions
1 *******************
1 
1 The rest of this library makes extensive use of Unicode characters.  In
1 order to interface this library with the outside world, your application
1 may need to make various Unicode transformations.
1 
1 3.1 Header file ‘stringprep.h’
1 ==============================
1 
1 To use the functions explained in this chapter, you need to include the
1 file ‘stringprep.h’ using:
1 
1      #include <stringprep.h>
1 
1 3.2 Unicode Encoding Transformation
1 ===================================
1 
1 stringprep_unichar_to_utf8
1 --------------------------
1 
1  -- Function: int stringprep_unichar_to_utf8 (uint32_t C, char * OUTBUF)
1      C: an ISO 10646 character code
1 
1      OUTBUF: output buffer, must have at least 6 bytes of space.  If
1      ‘NULL’, the length will be computed and returned and nothing will
1      be written to ‘outbuf’.
1 
1      Converts a single character to UTF-8.
1 
1      Return value: number of bytes written.
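
     The 6-byte requirement on OUTBUF is consistent with the original
     UTF-8 scheme, which allowed sequences of up to six bytes for 31-bit
     values.  The following is a minimal sketch of that encoding and of
     the ‘NULL’ length query.  It is not libidn's implementation; the
     name sketch_unichar_to_utf8 is hypothetical and only mirrors the
     documented interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for illustration only.  With libidn you would
   call stringprep_unichar_to_utf8 (c, outbuf) instead. */
int sketch_unichar_to_utf8(uint32_t c, char *outbuf)
{
    unsigned first;  /* marker bits of the lead byte */
    int len;         /* total number of bytes in the sequence */

    assert(c < 0x80000000u);  /* the scheme covers 31-bit values only */

    if (c < 0x80)           { first = 0x00; len = 1; }
    else if (c < 0x800)     { first = 0xc0; len = 2; }
    else if (c < 0x10000)   { first = 0xe0; len = 3; }
    else if (c < 0x200000)  { first = 0xf0; len = 4; }
    else if (c < 0x4000000) { first = 0xf8; len = 5; }
    else                    { first = 0xfc; len = 6; }

    if (outbuf != NULL) {
        /* Fill continuation bytes from the end, 6 payload bits each. */
        for (int i = len - 1; i > 0; --i) {
            outbuf[i] = (char)((c & 0x3f) | 0x80);
            c >>= 6;
        }
        outbuf[0] = (char)(c | first);
    }
    return len;  /* also returned when outbuf is NULL */
}
```

     A caller can pass ‘NULL’ first to learn the length, although a
     fixed 6-byte buffer is usually simpler.  Note that no terminating
     0 byte is written by this sketch.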
1 
1 stringprep_utf8_to_unichar
1 --------------------------
1 
1  -- Function: uint32_t stringprep_utf8_to_unichar (const char * P)
1      P: a pointer to a Unicode character encoded as UTF-8
1 
1      Converts a sequence of bytes encoded as UTF-8 to a Unicode
1      character.  If ‘p’ does not point to a valid UTF-8 encoded
1      character, results are undefined.
1 
1      Return value: the resulting character.
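
     Because the result is undefined for invalid input, bytes from an
     untrusted source should be checked before they reach this
     function.  The sketch below decodes one character and reports
     structurally invalid input instead.  The names sketch_utf8_decode
     and SKETCH_INVALID are hypothetical and not part of libidn, and the
     sketch assumes that only sequences of up to four bytes need to be
     accepted.  It does not reject overlong forms or surrogates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_INVALID 0xFFFFFFFFu  /* hypothetical error sentinel */

/* Decode the UTF-8 sequence starting at p, or return SKETCH_INVALID
   if the lead byte or a continuation byte is malformed. */
uint32_t sketch_utf8_decode(const char *p)
{
    assert(p != NULL);
    const unsigned char *s = (const unsigned char *)p;
    uint32_t c;
    int len;

    if (s[0] < 0x80)                { c = s[0];        len = 1; }
    else if ((s[0] & 0xe0) == 0xc0) { c = s[0] & 0x1f; len = 2; }
    else if ((s[0] & 0xf0) == 0xe0) { c = s[0] & 0x0f; len = 3; }
    else if ((s[0] & 0xf8) == 0xf0) { c = s[0] & 0x07; len = 4; }
    else
        return SKETCH_INVALID;      /* stray continuation or bad lead */

    for (int i = 1; i < len; i++) {
        /* A terminating 0 byte fails this test too, so the loop never
           reads past the end of a zero-terminated string. */
        if ((s[i] & 0xc0) != 0x80)
            return SKETCH_INVALID;
        c = (c << 6) | (s[i] & 0x3f);
    }
    return c;
}
```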
1 
1 stringprep_ucs4_to_utf8
1 -----------------------
1 
1  -- Function: char * stringprep_ucs4_to_utf8 (const uint32_t * STR,
1           ssize_t LEN, size_t * ITEMS_READ, size_t * ITEMS_WRITTEN)
1      STR: a UCS-4 encoded string
1 
1      LEN: the maximum length of ‘str’ to use.  If ‘len’ < 0, then the
1      string is terminated with a 0 character.
1 
1      ITEMS_READ: location to store number of characters read, or
1      ‘NULL’.
1 
1      ITEMS_WRITTEN: location to store number of bytes written, or
1      ‘NULL’.  The value stored here does not include the trailing 0
1      byte.
1 
1      Convert a string from a 32-bit fixed width representation as UCS-4
1      to UTF-8.  The result will be terminated with a 0 byte.
1 
1      Return value: a pointer to a newly allocated UTF-8 string.  This
1      value must be deallocated by the caller.  If an error occurs,
1      ‘NULL’ will be returned.
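
     The interplay of LEN, ITEMS_READ and ITEMS_WRITTEN can be
     illustrated with a two-pass sketch: count the bytes needed, then
     allocate and encode.  This is not libidn's implementation.  The
     name sketch_ucs4_to_utf8 is hypothetical, the sketch accepts only
     code points up to U+10FFFF, and leaving the output parameters
     untouched on error is an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>   /* ssize_t */

/* Bytes needed for one code point, or 0 if it is out of range. */
static size_t utf8_len(uint32_t c)
{
    if (c < 0x80)     return 1;
    if (c < 0x800)    return 2;
    if (c < 0x10000)  return 3;
    if (c < 0x110000) return 4;
    return 0;
}

/* Hypothetical stand-in mirroring the documented interface. */
char *sketch_ucs4_to_utf8(const uint32_t *str, ssize_t len,
                          size_t *items_read, size_t *items_written)
{
    static const unsigned char lead[] = { 0, 0x00, 0xc0, 0xe0, 0xf0 };
    size_t n = 0, bytes = 0;

    assert(str != NULL);

    /* Pass 1: stop at a 0 character or after len characters. */
    while ((len < 0 || n < (size_t)len) && str[n] != 0) {
        size_t l = utf8_len(str[n]);
        if (l == 0)
            return NULL;           /* outputs untouched (assumption) */
        bytes += l;
        n++;
    }

    char *out = malloc(bytes + 1); /* +1 for the trailing 0 byte */
    if (out == NULL)
        return NULL;

    /* Pass 2: encode each character. */
    char *p = out;
    for (size_t i = 0; i < n; i++) {
        uint32_t c = str[i];
        size_t l = utf8_len(c);
        for (size_t j = l - 1; j > 0; j--) {
            p[j] = (char)(0x80 | (c & 0x3f));
            c >>= 6;
        }
        p[0] = (char)(lead[l] | c);
        p += l;
    }
    *p = '\0';

    if (items_read)    *items_read = n;
    if (items_written) *items_written = bytes;  /* excludes the 0 byte */
    return out;                                 /* caller must free() */
}
```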
1 
1 stringprep_utf8_to_ucs4
1 -----------------------
1 
1  -- Function: uint32_t * stringprep_utf8_to_ucs4 (const char * STR,
1           ssize_t LEN, size_t * ITEMS_WRITTEN)
1      STR: a UTF-8 encoded string
1 
1      LEN: the maximum length of ‘str’ to use.  If ‘len’ < 0, then the
1      string is nul-terminated.
1 
1      ITEMS_WRITTEN: location to store the number of characters in the
1      result, or ‘NULL’.
1 
1      Convert a string from UTF-8 to a 32-bit fixed width representation
1      as UCS-4.  The function now performs error checking to verify that
1      the input is valid UTF-8 (before it was documented to not do error
1      checking).
1 
1      Return value: a pointer to a newly allocated UCS-4 string.  This
1      value must be deallocated by the caller.
1 
1 3.3 Unicode Normalization
1 =========================
1 
1 stringprep_ucs4_nfkc_normalize
1 ------------------------------
1 
1  -- Function: uint32_t * stringprep_ucs4_nfkc_normalize (const uint32_t
1           * STR, ssize_t LEN)
1      STR: a Unicode string.
1 
1      LEN: length of ‘str’ array, or -1 if ‘str’ is nul-terminated.
1 
1      Converts a UCS-4 string into canonical form; see
1      ‘stringprep_utf8_nfkc_normalize()’ for more information.
1 
1      Return value: a newly allocated Unicode string that is the NFKC
1      normalized form of ‘str’.
1 
1 stringprep_utf8_nfkc_normalize
1 ------------------------------
1 
1  -- Function: char * stringprep_utf8_nfkc_normalize (const char * STR,
1           ssize_t LEN)
1      STR: a UTF-8 encoded string.
1 
1      LEN: length of ‘str’, in bytes, or -1 if ‘str’ is nul-terminated.
1 
1      Converts a string into canonical form, standardizing such issues as
1      whether a character with an accent is represented as a base
1      character and combining accent or as a single precomposed
1      character.
1 
1      The normalization mode is NFKC (ALL COMPOSE). It standardizes
1      differences that do not affect the text content, such as the
1      above-mentioned accent representation.  It standardizes the
1      "compatibility" characters in Unicode, such as SUPERSCRIPT THREE to
1      the standard forms (in this case DIGIT THREE). Formatting
1      information may be lost but for most text operations such
1      characters should be considered the same.  It returns a result with
1      composed forms rather than a maximally decomposed form.
1 
1      Return value: a newly allocated string that is the NFKC normalized
1      form of ‘str’.
1 
1 3.4 Character Set Conversion
1 ============================
1 
1 stringprep_locale_charset
1 -------------------------
1 
1  -- Function: const char * stringprep_locale_charset (void)
1 
1      Find out the current locale charset.  The function respects the
1      CHARSET environment variable, but typically uses
1      nl_langinfo(CODESET) when it is supported.  It falls back on
1      "ASCII" if CHARSET isn’t set and nl_langinfo isn’t supported or
1      doesn’t return anything.
1 
1      Note that this function returns the application’s locale’s
1      preferred charset (or the thread’s locale’s preferred charset, if
1      your system supports thread-specific locales).  It does not return
1      what the system may be using.  Thus, if you receive data from
1      external sources you cannot in general use this function to guess
1      what charset it is encoded in.  Use ‘stringprep_convert()’ to
1      convert from the external representation into the charset returned
1      by this function, in order to have the data in the locale encoding.
1 
1      Return value: the character set used by the current locale.  It
1      never returns ‘NULL’; "ASCII" is used as a fallback.
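
     The lookup order described above can be sketched with plain POSIX
     calls.  This is an illustration of the documented behaviour, not
     libidn's source; the name sketch_locale_charset is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <langinfo.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in: CHARSET first, then nl_langinfo(CODESET),
   then the "ASCII" fallback. */
const char *sketch_locale_charset(void)
{
    const char *charset = getenv("CHARSET");

    if (charset == NULL || *charset == '\0')
        charset = nl_langinfo(CODESET);  /* depends on current locale */

    if (charset == NULL || *charset == '\0')
        charset = "ASCII";

    assert(charset != NULL);             /* never NULL, as documented */
    return charset;
}
```

     nl_langinfo() reflects the locale selected with setlocale(), so an
     application that never calls setlocale (LC_ALL, "") will usually
     see the charset of the default "C" locale here.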
1 
1 stringprep_convert
1 ------------------
1 
1  -- Function: char * stringprep_convert (const char * STR, const char *
1           TO_CODESET, const char * FROM_CODESET)
1      STR: input zero-terminated string.
1 
1      TO_CODESET: name of destination character set.
1 
1      FROM_CODESET: name of origin character set, as used by ‘str’ .
1 
1      Convert the string from one character set to another using the
1      system’s ‘iconv()’ function.
1 
1      Return value: Returns newly allocated zero-terminated string which
1      is ‘str’ transcoded into ‘to_codeset’.
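
     As a rough picture of what an ‘iconv()’ based conversion involves,
     here is a minimal sketch.  It is not libidn's implementation; the
     name sketch_convert is hypothetical, the output buffer size is a
     guess rather than a computed bound, and the sketch assumes a
     byte-oriented destination charset in which a single 0 byte
     terminates the string.

```c
#include <assert.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in mirroring the documented interface.
   Returns a malloc'ed string, or NULL on any failure. */
char *sketch_convert(const char *str, const char *to_codeset,
                     const char *from_codeset)
{
    assert(str != NULL && to_codeset != NULL && from_codeset != NULL);

    iconv_t cd = iconv_open(to_codeset, from_codeset);
    if (cd == (iconv_t)-1)
        return NULL;                 /* conversion not supported */

    size_t inleft = strlen(str);
    size_t cap = 4 * inleft + 16;    /* generous guess, not a bound */
    char *out = malloc(cap);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *src = (char *)str;         /* iconv() takes char **; the
                                        input is not modified */
    char *dst = out;
    size_t outleft = cap - 1;        /* keep room for the 0 byte */

    size_t rc = iconv(cd, &src, &inleft, &dst, &outleft);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &dst, &outleft);  /* flush state */
    iconv_close(cd);

    if (rc == (size_t)-1) {          /* invalid input or buffer full */
        free(out);
        return NULL;
    }
    *dst = '\0';
    return out;
}
```

     A more robust version would grow the buffer and retry when
     ‘iconv()’ fails with E2BIG instead of giving up.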
1 
1 stringprep_locale_to_utf8
1 -------------------------
1 
1  -- Function: char * stringprep_locale_to_utf8 (const char * STR)
1      STR: input zero-terminated string.
1 
1      Convert string encoded in the locale’s character set into UTF-8 by
1      using ‘stringprep_convert()’.
1 
1      Return value: Returns newly allocated zero-terminated string which
1      is ‘str’ transcoded into UTF-8.
1 
1 stringprep_utf8_to_locale
1 -------------------------
1 
1  -- Function: char * stringprep_utf8_to_locale (const char * STR)
1      STR: input zero-terminated string.
1 
1      Convert string encoded in UTF-8 into the locale’s character set by
1      using ‘stringprep_convert()’.
1 
1      Return value: Returns newly allocated zero-terminated string which
1      is ‘str’ transcoded into the locale’s character set.
1