libidn: TLD Functions

7 TLD Functions
***************

Organizations that manage some Top Level Domains (TLDs) have published
tables with characters they accept within the domain. The reason may be
to reduce the complexity that comes from using the full Unicode range,
and to protect themselves from future (backwards incompatible) changes
in the IDN or Unicode specifications. Libidn implements an
infrastructure for defining and checking strings against such tables.
Libidn also ships some tables from TLDs that have given permission to
use them. Because these tables are even less static than Unicode or
StringPrep tables, it is likely that they will be updated from time to
time (even in backwards incompatible ways). The Libidn interface
provides a “version” field for each TLD table, which can be compared for
equality to guarantee the same operation over time.
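
The equality comparison on the “version” field could look roughly like
the sketch below. It is an illustration only, not Libidn code: ‘struct
sketch_table’ and ‘sketch_same_version’ are hypothetical names, and the
version is assumed to be stored as a zero terminated string.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a TLD table: only the two
   fields this sketch needs. */
struct sketch_table
{
  const char *name;     /* TLD name, e.g. "no" */
  const char *version;  /* opaque version string of the table */
};

/* Return 1 if TABLE still carries the version string that the
   application recorded earlier, 0 otherwise.  Versions are compared
   for equality only; no ordering between versions is implied. */
int
sketch_same_version (const struct sketch_table *table, const char *recorded)
{
  if (table == NULL || table->version == NULL || recorded == NULL)
    return 0;
  return strcmp (table->version, recorded) == 0;
}
```

An application that stores the recorded version next to its validated
data can use such a check to detect that a table has changed and that
earlier results may need to be re-validated.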

From a design point of view, you can regard the TLD tables for IDN as
the “localization” step that comes after the “internationalization” step
provided by the IETF standards.

The TLD functionality relies on up-to-date tables. The latest version
of Libidn aims to provide these, but tables with unclear copying
conditions, or generally experimental tables, are not included. Some
such tables can be found at <https://github.com/gnuthor/tldchk>.

7.1 Header file ‘tld.h’
=======================

To use the functions explained in this chapter, you need to include the
file ‘tld.h’ using:

     #include <tld.h>

7.2 Core Functions
==================

tld_check_4t
------------

 -- Function: int tld_check_4t (const uint32_t * IN, size_t INLEN,
          size_t * ERRPOS, const Tld_table * TLD)
     IN: Array of Unicode code points to process. Does not need to be
     zero terminated.

     INLEN: Number of Unicode code points.

     ERRPOS: Position of offending character is returned here.

     TLD: A ‘Tld_table’ data structure representing the restrictions
     against which the input should be tested.

     Test each of the code points in ‘in’ for whether or not they are
     allowed by the data structure in ‘tld’, and return the position of
     the first character for which this is not the case in ‘errpos’.

     Return value: Returns the ‘Tld_rc’ value ‘TLD_SUCCESS’ if all code
     points are valid or when ‘tld’ is null, ‘TLD_INVALID’ if a
     character is not allowed, or additional error codes on general
     failure conditions.
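
The kind of check described above can be sketched as a scan over the
input that tests each code point against a list of allowed ranges. This
is a minimal illustration under stated assumptions, not Libidn's
implementation: the types ‘sketch_range’ and ‘sketch_table’, the
constants ‘SKETCH_SUCCESS’ and ‘SKETCH_INVALID’, and the idea that a
table is a plain list of inclusive ranges are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for this sketch; they are not libidn's own. */
struct sketch_range
{
  uint32_t start;  /* first allowed code point */
  uint32_t end;    /* last allowed code point (inclusive) */
};

struct sketch_table
{
  size_t nvalid;                     /* number of ranges */
  const struct sketch_range *valid;  /* allowed ranges */
};

enum { SKETCH_SUCCESS = 0, SKETCH_INVALID = 1 };

/* Return 1 if code point CP falls inside one of the table's ranges. */
int
sketch_allowed (uint32_t cp, const struct sketch_table *table)
{
  size_t i;

  for (i = 0; i < table->nvalid; i++)
    if (cp >= table->valid[i].start && cp <= table->valid[i].end)
      return 1;
  return 0;
}

/* Check INLEN code points.  On failure, store the index of the first
   offending code point in *ERRPOS (when ERRPOS is not NULL). */
int
sketch_check (const uint32_t *in, size_t inlen, size_t *errpos,
              const struct sketch_table *table)
{
  size_t i;

  if (table == NULL)  /* no restrictions known: accept everything */
    return SKETCH_SUCCESS;

  for (i = 0; i < inlen; i++)
    if (!sketch_allowed (in[i], table))
      {
        if (errpos != NULL)
          *errpos = i;
        return SKETCH_INVALID;
      }
  return SKETCH_SUCCESS;
}
```

Note that the sketch leaves ‘*errpos’ untouched on success, so a caller
should read it only after a failure has been reported.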

tld_check_4tz
-------------

 -- Function: int tld_check_4tz (const uint32_t * IN, size_t * ERRPOS,
          const Tld_table * TLD)
     IN: Zero terminated array of Unicode code points to process.

     ERRPOS: Position of offending character is returned here.

     TLD: A ‘Tld_table’ data structure representing the restrictions
     against which the input should be tested.

     Test each of the code points in ‘in’ for whether or not they are
     allowed by the data structure in ‘tld’, and return the position of
     the first character for which this is not the case in ‘errpos’.

     Return value: Returns the ‘Tld_rc’ value ‘TLD_SUCCESS’ if all code
     points are valid or when ‘tld’ is null, ‘TLD_INVALID’ if a
     character is not allowed, or additional error codes on general
     failure conditions.
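
A zero terminated variant differs from a length-based one only in how
the end of the input is found. One way to relate the two, shown here as
a sketch with the hypothetical helper ‘sketch_len_4z’, is to count the
code points before the terminating zero and pass that count on as the
length.

```c
#include <stddef.h>
#include <stdint.h>

/* Count the code points that precede the terminating zero in IN.
   IN must not be NULL and must be zero terminated. */
size_t
sketch_len_4z (const uint32_t *in)
{
  size_t n = 0;

  while (in[n] != 0)
    n++;
  return n;
}
```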

7.3 Utility Functions
=====================

tld_get_4
---------

 -- Function: int tld_get_4 (const uint32_t * IN, size_t INLEN, char **
          OUT)
     IN: Array of Unicode code points to process. Does not need to be
     zero terminated.

     INLEN: Number of Unicode code points.

     OUT: Pointer to the zero terminated ASCII result string.

     Isolate the top-level domain of ‘in’ and return it as an ASCII
     string in ‘out’.

     Return value: Returns ‘TLD_SUCCESS’ on success, or the
     corresponding ‘Tld_rc’ error code otherwise.
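
Isolating a top-level domain can be pictured as walking backwards from
the end of the input to the last label separator. The sketch below is
one possible approach, not necessarily the one Libidn takes. All
function names are hypothetical, and it assumes that a TLD consists of
ASCII letters only, that the result is lower-cased, and that the three
additional full stops listed by IDNA (U+3002, U+FF0E, U+FF61) count as
dots.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Return 1 for U+002E and the other full stops that IDNA treats as
   label separators. */
int
sketch_is_dot (uint32_t cp)
{
  return cp == 0x002E || cp == 0x3002 || cp == 0xFF0E || cp == 0xFF61;
}

/* Return 1 for A-Z and a-z. */
int
sketch_is_ascii_letter (uint32_t cp)
{
  return (cp >= 0x41 && cp <= 0x5A) || (cp >= 0x61 && cp <= 0x7A);
}

/* Return the lower-cased TLD of IN as a newly allocated string that
   the caller must free, or NULL if no TLD is found or memory runs
   out. */
char *
sketch_get_tld (const uint32_t *in, size_t inlen)
{
  size_t start = inlen;  /* index of the first TLD character */
  size_t len, i;
  char *out;

  /* Walk backwards over the trailing run of ASCII letters. */
  while (start > 0 && sketch_is_ascii_letter (in[start - 1]))
    start--;

  /* Require a non-empty label that is preceded by a dot. */
  len = inlen - start;
  if (len == 0 || start == 0 || !sketch_is_dot (in[start - 1]))
    return NULL;

  out = malloc (len + 1);
  if (out == NULL)
    return NULL;

  for (i = 0; i < len; i++)
    {
      uint32_t cp = in[start + i];
      out[i] = (char) (cp <= 0x5A ? cp + 0x20 : cp);  /* A-Z to a-z */
    }
  out[len] = '\0';
  return out;
}
```

Under these assumptions an input without any dot, such as a bare label,
yields no TLD at all, which corresponds to the ‘TLD_NO_TLD’ condition
listed in the error handling section.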

tld_get_4z
----------

 -- Function: int tld_get_4z (const uint32_t * IN, char ** OUT)
     IN: Zero terminated array of Unicode code points to process.

     OUT: Pointer to the zero terminated ASCII result string.

     Isolate the top-level domain of ‘in’ and return it as an ASCII
     string in ‘out’.

     Return value: Returns ‘TLD_SUCCESS’ on success, or the
     corresponding ‘Tld_rc’ error code otherwise.

tld_get_z
---------

 -- Function: int tld_get_z (const char * IN, char ** OUT)
     IN: Zero terminated character array to process.

     OUT: Pointer to the zero terminated ASCII result string.

     Isolate the top-level domain of ‘in’ and return it as an ASCII
     string in ‘out’. The input string ‘in’ may be UTF-8, ISO-8859-1
     or any ASCII compatible character encoding.

     Return value: Returns ‘TLD_SUCCESS’ on success, or the
     corresponding ‘Tld_rc’ error code otherwise.

tld_get_table
-------------

 -- Function: const Tld_table * tld_get_table (const char * TLD, const
          Tld_table ** TABLES)
     TLD: TLD name (e.g. "com") as zero terminated ASCII byte string.

     TABLES: Zero terminated array of ‘Tld_table’ info-structures for
     TLDs.

     Get the TLD table for a named TLD by searching through the given
     TLD table array.

     Return value: Returns the structure corresponding to TLD ‘tld’
     found by going through ‘tables’, or ‘NULL’ if no such structure is
     found.
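
Searching a zero terminated array of table pointers by name can be
sketched as a linear scan that stops at the first match. The struct
‘sketch_table’ and the function ‘sketch_get_table’ below are
hypothetical stand-ins used for illustration only, and the sketch
assumes that the array ends with a ‘NULL’ pointer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified table: just a name and a version. */
struct sketch_table
{
  const char *name;
  const char *version;
};

/* Return the first entry of the NULL-terminated array TABLES whose
   name equals TLD, or NULL if there is none. */
const struct sketch_table *
sketch_get_table (const char *tld, const struct sketch_table **tables)
{
  size_t i;

  if (tld == NULL || tables == NULL)
    return NULL;

  for (i = 0; tables[i] != NULL; i++)
    if (strcmp (tables[i]->name, tld) == 0)
      return tables[i];
  return NULL;
}
```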

tld_default_table
-----------------

 -- Function: const Tld_table * tld_default_table (const char * TLD,
          const Tld_table ** OVERRIDES)
     TLD: TLD name (e.g. "com") as zero terminated ASCII byte string.

     OVERRIDES: Additional zero terminated array of ‘Tld_table’
     info-structures for TLDs, or ‘NULL’ to only use library default
     tables.

     Get the TLD table for a named TLD, using the internal defaults,
     possibly overridden by the (optional) supplied tables.

     Return value: Returns the structure corresponding to TLD ‘tld’,
     first looking through ‘overrides’ and then through the built-in
     list, or ‘NULL’ if no such structure is found.
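
The lookup order described above (caller-supplied tables first, the
built-in list second) can be sketched as two searches in sequence. All
names here are hypothetical, and the single built-in entry is invented
data that stands in for the library's real tables.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified table: just a name and a version. */
struct sketch_table
{
  const char *name;
  const char *version;
};

/* Invented stand-in for a built-in table list. */
static const struct sketch_table sketch_builtin_no = { "no", "builtin" };
static const struct sketch_table *sketch_builtin[] =
  { &sketch_builtin_no, NULL };

/* Linear search in a NULL-terminated array; first match wins. */
const struct sketch_table *
sketch_find (const char *tld, const struct sketch_table **tables)
{
  size_t i;

  for (i = 0; tables[i] != NULL; i++)
    if (strcmp (tables[i]->name, tld) == 0)
      return tables[i];
  return NULL;
}

/* Look in OVERRIDES first (when given), then in the built-in list. */
const struct sketch_table *
sketch_default_table (const char *tld,
                      const struct sketch_table **overrides)
{
  const struct sketch_table *t = NULL;

  if (overrides != NULL)
    t = sketch_find (tld, overrides);
  if (t == NULL)
    t = sketch_find (tld, sketch_builtin);
  return t;
}
```

Because the override list is consulted first, an application can
replace the rules for a single TLD without affecting the others.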

7.4 High-Level Wrapper Functions
================================

tld_check_4
-----------

 -- Function: int tld_check_4 (const uint32_t * IN, size_t INLEN, size_t
          * ERRPOS, const Tld_table ** OVERRIDES)
     IN: Array of Unicode code points to process. Does not need to be
     zero terminated.

     INLEN: Number of Unicode code points.

     ERRPOS: Position of offending character is returned here.

     OVERRIDES: A ‘Tld_table’ array of additional domain restriction
     structures that complement and supersede the built-in information.

     Test each of the code points in ‘in’ for whether or not they are
     allowed by the information in ‘overrides’ or by the built-in TLD
     restriction data. When data for the same TLD is available both
     internally and in ‘overrides’, the information in ‘overrides’
     takes precedence. If several entries for a specific TLD are found,
     the first one is used. If ‘overrides’ is ‘NULL’, only the
     built-in information is used. The position of the first offending
     character is returned in ‘errpos’.

     Return value: Returns the ‘Tld_rc’ value ‘TLD_SUCCESS’ if all code
     points are valid or when no table is found for the TLD,
     ‘TLD_INVALID’ if a character is not allowed, or additional error
     codes on general failure conditions.

tld_check_4z
------------

 -- Function: int tld_check_4z (const uint32_t * IN, size_t * ERRPOS,
          const Tld_table ** OVERRIDES)
     IN: Zero terminated array of Unicode code points to process.

     ERRPOS: Position of offending character is returned here.

     OVERRIDES: A ‘Tld_table’ array of additional domain restriction
     structures that complement and supersede the built-in information.

     Test each of the code points in ‘in’ for whether or not they are
     allowed by the information in ‘overrides’ or by the built-in TLD
     restriction data. When data for the same TLD is available both
     internally and in ‘overrides’, the information in ‘overrides’
     takes precedence. If several entries for a specific TLD are found,
     the first one is used. If ‘overrides’ is ‘NULL’, only the
     built-in information is used. The position of the first offending
     character is returned in ‘errpos’.

     Return value: Returns the ‘Tld_rc’ value ‘TLD_SUCCESS’ if all code
     points are valid or when no table is found for the TLD,
     ‘TLD_INVALID’ if a character is not allowed, or additional error
     codes on general failure conditions.

tld_check_8z
------------

 -- Function: int tld_check_8z (const char * IN, size_t * ERRPOS, const
          Tld_table ** OVERRIDES)
     IN: Zero terminated UTF-8 string to process.

     ERRPOS: Position of offending character is returned here.

     OVERRIDES: A ‘Tld_table’ array of additional domain restriction
     structures that complement and supersede the built-in information.

     Test each of the characters in ‘in’ for whether or not they are
     allowed by the information in ‘overrides’ or by the built-in TLD
     restriction data. When data for the same TLD is available both
     internally and in ‘overrides’, the information in ‘overrides’
     takes precedence. If several entries for a specific TLD are found,
     the first one is used. If ‘overrides’ is ‘NULL’, only the
     built-in information is used. The position of the first offending
     character is returned in ‘errpos’. Note that the error position
     refers to the decoded character offset rather than the byte
     position in the string.

     Return value: Returns the ‘Tld_rc’ value ‘TLD_SUCCESS’ if all
     characters are valid or when no table is found for the TLD,
     ‘TLD_INVALID’ if a character is not allowed, or additional error
     codes on general failure conditions.
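
Because the reported position counts decoded characters, an application
that wants to point at the offending bytes in a UTF-8 string has to
translate the offset itself. The helper below is a hypothetical sketch
of such a translation. It is not part of Libidn and assumes that the
input is valid UTF-8.

```c
#include <stddef.h>

/* Return the byte offset of the character with index CHARPOS in the
   zero terminated UTF-8 string S.  Continuation bytes (10xxxxxx) do
   not start a character, so they are skipped when counting.  If S has
   CHARPOS characters or fewer, the offset of the terminating zero is
   returned. */
size_t
sketch_utf8_byte_offset (const char *s, size_t charpos)
{
  size_t i = 0;
  size_t seen = 0;  /* characters passed so far */

  while (s[i] != '\0')
    {
      if (((unsigned char) s[i] & 0xC0) != 0x80)  /* lead byte */
        {
          if (seen == charpos)
            return i;
          seen++;
        }
      i++;
    }
  return i;
}
```

For a string that starts with two ASCII letters followed by a two-byte
character, character offset 3 maps to byte offset 4, since the third
character occupies bytes 2 and 3.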

tld_check_lz
------------

 -- Function: int tld_check_lz (const char * IN, size_t * ERRPOS, const
          Tld_table ** OVERRIDES)
     IN: Zero terminated string in the current locale's encoding to
     process.

     ERRPOS: Position of offending character is returned here.

     OVERRIDES: A ‘Tld_table’ array of additional domain restriction
     structures that complement and supersede the built-in information.

     Test each of the characters in ‘in’ for whether or not they are
     allowed by the information in ‘overrides’ or by the built-in TLD
     restriction data. When data for the same TLD is available both
     internally and in ‘overrides’, the information in ‘overrides’
     takes precedence. If several entries for a specific TLD are found,
     the first one is used. If ‘overrides’ is ‘NULL’, only the
     built-in information is used. The position of the first offending
     character is returned in ‘errpos’. Note that the error position
     refers to the decoded character offset rather than the byte
     position in the string.

     Return value: Returns the ‘Tld_rc’ value ‘TLD_SUCCESS’ if all
     characters are valid or when no table is found for the TLD,
     ‘TLD_INVALID’ if a character is not allowed, or additional error
     codes on general failure conditions.

7.5 Error Handling
==================

tld_strerror
------------

 -- Function: const char * tld_strerror (Tld_rc RC)
     RC: TLD return code.

     Convert a return code integer to a text string. This string can be
     used to output a diagnostic message to the user.

     *TLD_SUCCESS:* Successful operation. This value is guaranteed to
     always be zero; the remaining ones are only guaranteed to hold
     non-zero values, for logical comparison purposes.

     *TLD_INVALID:* Invalid character found.

     *TLD_NODATA:* No input data was provided.

     *TLD_MALLOC_ERROR:* Error during memory allocation.

     *TLD_ICONV_ERROR:* Error during iconv string conversion.

     *TLD_NO_TLD:* No top-level domain found in domain string.

     Return value: Returns a pointer to a statically allocated string
     containing a description of the error with the return code ‘rc’.
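
The mapping from return codes to messages can be pictured as a simple
switch over an enumeration. The sketch below uses a hypothetical
‘enum sketch_rc’ and hypothetical message texts. Only the zero value of
the success code mirrors a guarantee stated above; the numeric values of
the other codes are arbitrary.

```c
/* Hypothetical return codes for this sketch.  Success is zero; every
   other code is non-zero, so a caller can test a result with a plain
   "if (rc)". */
enum sketch_rc
{
  SKETCH_SUCCESS = 0,
  SKETCH_INVALID,
  SKETCH_NODATA,
  SKETCH_MALLOC_ERROR,
  SKETCH_ICONV_ERROR,
  SKETCH_NO_TLD
};

/* Return a statically allocated description of RC.  The caller must
   not modify or free the returned string. */
const char *
sketch_strerror (enum sketch_rc rc)
{
  switch (rc)
    {
    case SKETCH_SUCCESS:
      return "Success";
    case SKETCH_INVALID:
      return "Invalid character found";
    case SKETCH_NODATA:
      return "No input data was provided";
    case SKETCH_MALLOC_ERROR:
      return "Error during memory allocation";
    case SKETCH_ICONV_ERROR:
      return "Error during iconv string conversion";
    case SKETCH_NO_TLD:
      return "No top-level domain found in domain string";
    default:
      return "Unknown error";
    }
}
```

Since the strings are string literals with static storage duration, the
returned pointer stays valid for the lifetime of the program.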