libidn: Utility Functions

3 Utility Functions
*******************

The rest of this library makes extensive use of Unicode characters. To
interface this library with the outside world, your application may
need to perform various Unicode transformations, such as converting
between encodings, normalizing strings, and converting text to and from
the locale’s character set.

3.1 Header file ‘stringprep.h’
==============================

To use the functions explained in this chapter, you need to include the
file ‘stringprep.h’ using:

     #include <stringprep.h>

3.2 Unicode Encoding Transformation
===================================

stringprep_unichar_to_utf8
--------------------------

 -- Function: int stringprep_unichar_to_utf8 (uint32_t C, char * OUTBUF)
     C: an ISO 10646 character code

     OUTBUF: output buffer, which must have room for at least 6 bytes.
     If ‘NULL’, the length is computed and returned, and nothing is
     written to ‘outbuf’.

     Converts a single character to UTF-8.

     Return value: number of bytes written (or, if ‘outbuf’ is ‘NULL’,
     the number of bytes that would be written).

stringprep_utf8_to_unichar
--------------------------

 -- Function: uint32_t stringprep_utf8_to_unichar (const char * P)
     P: a pointer to a Unicode character encoded as UTF-8

     Converts a sequence of bytes encoded as UTF-8 to a Unicode
     character. If ‘p’ does not point to a valid UTF-8 encoded
     character, results are undefined.

     Return value: the resulting character.

stringprep_ucs4_to_utf8
-----------------------

 -- Function: char * stringprep_ucs4_to_utf8 (const uint32_t * STR,
          ssize_t LEN, size_t * ITEMS_READ, size_t * ITEMS_WRITTEN)
     STR: a UCS-4 encoded string

     LEN: the maximum length of ‘str’ to use, in characters. If
     ‘len’ < 0, the string is assumed to be terminated by a 0
     character.

     ITEMS_READ: location to store the number of characters read, or
     ‘NULL’.

     ITEMS_WRITTEN: location to store the number of bytes written, or
     ‘NULL’. The stored value does not include the trailing 0 byte.

     Converts a string from a 32-bit fixed-width representation (UCS-4)
     to UTF-8. The result is terminated with a 0 byte.

     Return value: a pointer to a newly allocated UTF-8 string, which
     the caller must deallocate. If an error occurs, ‘NULL’ is
     returned.

stringprep_utf8_to_ucs4
-----------------------

 -- Function: uint32_t * stringprep_utf8_to_ucs4 (const char * STR,
          ssize_t LEN, size_t * ITEMS_WRITTEN)
     STR: a UTF-8 encoded string

     LEN: the maximum length of ‘str’ to use, in bytes. If ‘len’ < 0,
     the string is assumed to be nul-terminated.

     ITEMS_WRITTEN: location to store the number of characters in the
     result, or ‘NULL’.

     Converts a string from UTF-8 to a 32-bit fixed-width
     representation (UCS-4). The function verifies that the input is
     valid UTF-8 (earlier versions were documented as performing no
     error checking).

     Return value: a pointer to a newly allocated UCS-4 string, which
     the caller must deallocate.

3.3 Unicode Normalization
=========================

stringprep_ucs4_nfkc_normalize
------------------------------

 -- Function: uint32_t * stringprep_ucs4_nfkc_normalize
          (const uint32_t * STR, ssize_t LEN)
     STR: a Unicode string encoded as UCS-4.

     LEN: length of the ‘str’ array in characters, or -1 if ‘str’ is
     nul-terminated.

     Converts a UCS-4 string into canonical form; see
     ‘stringprep_utf8_nfkc_normalize()’ for more information.

     Return value: a newly allocated Unicode string containing the NFKC
     normalized form of ‘str’.

stringprep_utf8_nfkc_normalize
------------------------------

 -- Function: char * stringprep_utf8_nfkc_normalize (const char * STR,
          ssize_t LEN)
     STR: a UTF-8 encoded string.

     LEN: length of ‘str’ in bytes, or -1 if ‘str’ is nul-terminated.

     Converts a string into canonical form, standardizing such issues as
     whether a character with an accent is represented as a base
     character and combining accent or as a single precomposed
     character.

     The normalization mode is NFKC (ALL COMPOSE). It standardizes
     differences that do not affect the text content, such as the
     above-mentioned accent representation. It also maps Unicode
     "compatibility" characters to their standard forms, for example
     SUPERSCRIPT THREE to DIGIT THREE. Formatting information may be
     lost, but for most text operations such characters should be
     considered the same. The result uses composed forms rather than a
     maximally decomposed form.

     Return value: a newly allocated string containing the NFKC
     normalized form of ‘str’.

3.4 Character Set Conversion
============================

stringprep_locale_charset
-------------------------

 -- Function: const char * stringprep_locale_charset (void)

     Finds out the current locale’s charset. The function respects the
     CHARSET environment variable, but typically uses
     nl_langinfo(CODESET) when it is supported. It falls back on
     "ASCII" if CHARSET isn’t set and nl_langinfo isn’t supported or
     doesn’t return anything.

     Note that this function returns the preferred charset of the
     application’s locale (or of the thread’s locale, if your system
     supports thread-specific locales). It does not return the charset
     the rest of the system may be using. Thus, if you receive data
     from external sources, you cannot in general use this function to
     guess how that data is encoded. Instead, use ‘stringprep_convert()’
     to convert the data from its external representation into the
     charset returned by this function, so that it is in the locale
     encoding.

     Return value: the character set used by the current locale. It
     never returns ‘NULL’; "ASCII" is used as a fallback.

stringprep_convert
------------------

 -- Function: char * stringprep_convert (const char * STR,
          const char * TO_CODESET, const char * FROM_CODESET)
     STR: input zero-terminated string.

     TO_CODESET: name of destination character set.

     FROM_CODESET: name of origin character set, as used by ‘str’.

     Converts the string from one character set to another using the
     system’s ‘iconv()’ function.

     Return value: a newly allocated zero-terminated string containing
     ‘str’ transcoded into ‘to_codeset’.

stringprep_locale_to_utf8
-------------------------

 -- Function: char * stringprep_locale_to_utf8 (const char * STR)
     STR: input zero-terminated string.

     Converts a string encoded in the locale’s character set into UTF-8
     using ‘stringprep_convert()’.

     Return value: a newly allocated zero-terminated string containing
     ‘str’ transcoded into UTF-8.

stringprep_utf8_to_locale
-------------------------

 -- Function: char * stringprep_utf8_to_locale (const char * STR)
     STR: input zero-terminated string.

     Converts a string encoded in UTF-8 into the locale’s character set
     using ‘stringprep_convert()’.

     Return value: a newly allocated zero-terminated string containing
     ‘str’ transcoded into the locale’s character set.