libidn: TLD Functions

7 TLD Functions
***************

Organizations that manage some Top Level Domains (TLDs) have published
tables of the characters they accept within the domain.  The reason may
be to reduce the complexity that comes from using the full Unicode
range, and to protect themselves from future (backwards incompatible)
changes in the IDN or Unicode specifications.  Libidn implements an
infrastructure for defining and checking strings against such tables.
Libidn also ships some tables from TLDs that have given us permission to
use them.  Because these tables are even less static than Unicode or
StringPrep tables, it is likely that they will be updated from time to
time (even in backwards incompatible ways).  The Libidn interface
provides a “version” field for each TLD table, which can be compared for
equality to guarantee the same operation over time.

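   The equality test on the “version” field mentioned above could look
roughly like the following.  This is only a minimal sketch:
‘table_info’ is a hypothetical stand-in holding just a name and a
version string, not the library's ‘Tld_table’ definition, and
‘table_version_matches’ is an illustrative helper, not a Libidn
function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a TLD table; only the fields used here. */
typedef struct { const char *name; const char *version; } table_info;

/* Return 1 if the table carries exactly the expected version string,
   0 otherwise (including when no table or no version is present). */
static int table_version_matches(const table_info *t, const char *expected)
{
    return t != NULL && t->version != NULL
           && strcmp(t->version, expected) == 0;
}
```

   An application that recorded the version string when a name was
first validated can use such a test to detect that a table has changed
before relying on an earlier result.
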
   From a design point of view, you can regard the TLD tables for IDN as
the “localization” step that comes after the “internationalization” step
provided by the IETF standards.

   The TLD functionality relies on up-to-date tables.  The latest
version of Libidn aims to provide these, but tables with unclear copying
conditions, or generally experimental tables, are not included.  Some
such tables can be found at <https://github.com/gnuthor/tldchk>.

7.1 Header file ‘tld.h’
=======================

To use the functions explained in this chapter, you need to include the
file ‘tld.h’ using:

     #include <tld.h>

7.2 Core Functions
==================

tld_check_4t
------------

 -- Function: int tld_check_4t (const uint32_t * IN, size_t INLEN,
          size_t * ERRPOS, const Tld_table * TLD)
     IN: Array of Unicode code points to process.  Does not need to be
     zero terminated.

     INLEN: Number of Unicode code points.

     ERRPOS: Position of offending character is returned here.

     TLD: A ‘Tld_table’ data structure representing the restrictions for
     which the input should be tested.

     Test whether each of the code points in ‘in’ is allowed by the data
     structure in ‘tld’, and return the position of the first character
     for which this is not the case in ‘errpos’.

     Return value: Returns the ‘Tld_rc’ value ‘TLD_SUCCESS’ if all code
     points are valid or when ‘tld’ is null, ‘TLD_INVALID’ if a
     character is not allowed, or additional error codes on general
     failure conditions.

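   As a rough illustration of the kind of check described above, the
following sketch tests code points against a sorted list of permitted
ranges.  The ‘range’ and ‘range_table’ types and the ‘check_ranges’
function are hypothetical stand-ins, not the library's ‘Tld_table’
layout or implementation; skipping ASCII letters, digits, hyphen and dot
is likewise an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical inclusive range of permitted code points. */
typedef struct { uint32_t start, end; } range;

/* Hypothetical table: `n` non-overlapping ranges sorted by `start`. */
typedef struct { size_t n; const range *valid; } range_table;

/* Return 1 if `cp` lies inside one of the ranges (binary search). */
static int in_table(uint32_t cp, const range_table *t)
{
    size_t lo = 0, hi = t->n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (cp < t->valid[mid].start)
            hi = mid;
        else if (cp > t->valid[mid].end)
            lo = mid + 1;
        else
            return 1;
    }
    return 0;
}

/* Return 0 if every code point passes, 1 otherwise.  On failure the
   index of the first offending code point is stored in *errpos. */
static int check_ranges(const uint32_t *in, size_t inlen, size_t *errpos,
                        const range_table *t)
{
    if (t == NULL)
        return 0;               /* no table: nothing to enforce */
    for (size_t i = 0; i < inlen; i++) {
        uint32_t c = in[i];
        /* Assumption: plain letters, digits, hyphen and dot always pass. */
        if ((c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')
            || c == '-' || c == '.')
            continue;
        if (!in_table(c, t)) {
            if (errpos != NULL)
                *errpos = i;
            return 1;
        }
    }
    return 0;
}
```

   The reported position is an index into the code point array, which
matches the way ‘errpos’ is described for the functions in this section.
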
tld_check_4tz
-------------

 -- Function: int tld_check_4tz (const uint32_t * IN, size_t * ERRPOS,
          const Tld_table * TLD)
     IN: Zero terminated array of Unicode code points to process.

     ERRPOS: Position of offending character is returned here.

     TLD: A ‘Tld_table’ data structure representing the restrictions for
     which the input should be tested.

     Test whether each of the code points in ‘in’ is allowed by the data
     structure in ‘tld’, and return the position of the first character
     for which this is not the case in ‘errpos’.

     Return value: Returns the ‘Tld_rc’ value ‘TLD_SUCCESS’ if all code
     points are valid or when ‘tld’ is null, ‘TLD_INVALID’ if a
     character is not allowed, or additional error codes on general
     failure conditions.

7.3 Utility Functions
=====================

tld_get_4
---------

 -- Function: int tld_get_4 (const uint32_t * IN, size_t INLEN, char **
          OUT)
     IN: Array of Unicode code points to process.  Does not need to be
     zero terminated.

     INLEN: Number of Unicode code points.

     OUT: Zero terminated ASCII result string pointer.

     Isolate the top-level domain of ‘in’ and return it as an ASCII
     string in ‘out’.

     Return value: Return ‘TLD_SUCCESS’ on success, or the corresponding
     ‘Tld_rc’ error code otherwise.

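   To make “isolating the top-level domain” concrete, here is a minimal
sketch that walks backwards over trailing ASCII letters, requires a dot
in front of them, and copies the label in lower case.  ‘isolate_tld’ and
its return values (0, 1, 2) are hypothetical and are not the library's
function or its ‘Tld_rc’ codes.  Treating only U+002E as a dot and
accepting only ASCII letters in the label are assumptions of this
sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: store a newly allocated, lower-cased copy of the
   last label of `in` in *out.  Returns 0 on success, 1 if no TLD was
   found, 2 on allocation failure.  The caller frees *out. */
static int isolate_tld(const uint32_t *in, size_t inlen, char **out)
{
    size_t end = inlen, start = inlen;

    *out = NULL;
    if (in == NULL || inlen == 0)
        return 1;

    /* Walk backwards over trailing ASCII letters. */
    while (start > 0 &&
           ((in[start - 1] >= 'a' && in[start - 1] <= 'z') ||
            (in[start - 1] >= 'A' && in[start - 1] <= 'Z')))
        start--;

    /* Need at least one letter, and a dot right before the label. */
    if (start == end || start == 0 || in[start - 1] != '.')
        return 1;

    char *tld = malloc(end - start + 1);
    if (tld == NULL)
        return 2;
    for (size_t i = start; i < end; i++) {
        uint32_t c = in[i];
        tld[i - start] = (char) (c >= 'A' && c <= 'Z' ? c + ('a' - 'A') : c);
    }
    tld[end - start] = '\0';
    *out = tld;
    return 0;
}
```

   As with the ‘out’ parameter documented above, the result is handed
back through a ‘char **’; in this sketch the caller owns the string and
releases it with ‘free’.
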
tld_get_4z
----------

 -- Function: int tld_get_4z (const uint32_t * IN, char ** OUT)
     IN: Zero terminated array of Unicode code points to process.

     OUT: Zero terminated ASCII result string pointer.

     Isolate the top-level domain of ‘in’ and return it as an ASCII
     string in ‘out’.

     Return value: Return ‘TLD_SUCCESS’ on success, or the corresponding
     ‘Tld_rc’ error code otherwise.

tld_get_z
---------

 -- Function: int tld_get_z (const char * IN, char ** OUT)
     IN: Zero terminated character array to process.

     OUT: Zero terminated ASCII result string pointer.

     Isolate the top-level domain of ‘in’ and return it as an ASCII
     string in ‘out’.  The input string ‘in’ may be UTF-8, ISO-8859-1
     or any ASCII compatible character encoding.

     Return value: Return ‘TLD_SUCCESS’ on success, or the corresponding
     ‘Tld_rc’ error code otherwise.

tld_get_table
-------------

 -- Function: const Tld_table * tld_get_table (const char * TLD, const
          Tld_table ** TABLES)
     TLD: TLD name (e.g.  "com") as zero terminated ASCII byte string.

     TABLES: Zero terminated array of ‘Tld_table’ info-structures for
     TLDs.

     Get the TLD table for a named TLD by searching through the given
     TLD table array.

     Return value: Return the structure corresponding to TLD ‘tld’ by
     going through ‘tables’, or return ‘NULL’ if no such structure is
     found.

tld_default_table
-----------------

 -- Function: const Tld_table * tld_default_table (const char * TLD,
          const Tld_table ** OVERRIDES)
     TLD: TLD name (e.g.  "com") as zero terminated ASCII byte string.

     OVERRIDES: Additional zero terminated array of ‘Tld_table’
     info-structures for TLDs, or ‘NULL’ to only use library default
     tables.

     Get the TLD table for a named TLD, using the internal defaults,
     possibly overridden by the (optional) supplied tables.

     Return value: Return the structure corresponding to TLD ‘tld’,
     first looking through ‘overrides’ and then through the built-in
     list, or ‘NULL’ if no such structure is found.

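   The lookup order described above (overrides first, then the built-in
list, with the first match winning) can be sketched as follows.  The
‘table’ type and both functions are hypothetical; in particular, the
built-in list is passed in explicitly here, whereas the library keeps
its own.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a TLD table; only the fields used here. */
typedef struct { const char *name; const char *version; } table;

/* Search a NULL-terminated array of table pointers; first match wins. */
static const table *find_table(const char *tld, const table **tables)
{
    if (tables == NULL)
        return NULL;
    for (size_t i = 0; tables[i] != NULL; i++)
        if (strcmp(tables[i]->name, tld) == 0)
            return tables[i];
    return NULL;
}

/* Consult `overrides` first, then fall back to `builtin`. */
static const table *find_with_overrides(const char *tld,
                                        const table **overrides,
                                        const table **builtin)
{
    const table *t = find_table(tld, overrides);
    return t != NULL ? t : find_table(tld, builtin);
}
```

   Because the override array is searched first, an application can
replace a single TLD's table while leaving all other TLDs on the
built-in data.
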
7.4 High-Level Wrapper Functions
================================

tld_check_4
-----------

 -- Function: int tld_check_4 (const uint32_t * IN, size_t INLEN, size_t
          * ERRPOS, const Tld_table ** OVERRIDES)
     IN: Array of Unicode code points to process.  Does not need to be
     zero terminated.

     INLEN: Number of Unicode code points.

     ERRPOS: Position of offending character is returned here.

     OVERRIDES: A ‘Tld_table’ array of additional domain restriction
     structures that complement and supersede the built-in information.

     Test whether each of the code points in ‘in’ is allowed by the
     information in ‘overrides’ or by the built-in TLD restriction data.
     When data for the same TLD is available both internally and in
     ‘overrides’, the information in ‘overrides’ takes precedence.  If
     several entries for a specific TLD are found, the first one is
     used.  If ‘overrides’ is ‘NULL’, only the built-in information is
     used.  The position of the first offending character is returned in
     ‘errpos’.

     Return value: Returns the ‘Tld_rc’ value ‘TLD_SUCCESS’ if all code
     points are valid or when ‘tld’ is null, ‘TLD_INVALID’ if a
     character is not allowed, or additional error codes on general
     failure conditions.

tld_check_4z
------------

 -- Function: int tld_check_4z (const uint32_t * IN, size_t * ERRPOS,
          const Tld_table ** OVERRIDES)
     IN: Zero-terminated array of Unicode code points to process.

     ERRPOS: Position of offending character is returned here.

     OVERRIDES: A ‘Tld_table’ array of additional domain restriction
     structures that complement and supersede the built-in information.

     Test whether each of the code points in ‘in’ is allowed by the
     information in ‘overrides’ or by the built-in TLD restriction data.
     When data for the same TLD is available both internally and in
     ‘overrides’, the information in ‘overrides’ takes precedence.  If
     several entries for a specific TLD are found, the first one is
     used.  If ‘overrides’ is ‘NULL’, only the built-in information is
     used.  The position of the first offending character is returned in
     ‘errpos’.

     Return value: Returns the ‘Tld_rc’ value ‘TLD_SUCCESS’ if all code
     points are valid or when ‘tld’ is null, ‘TLD_INVALID’ if a
     character is not allowed, or additional error codes on general
     failure conditions.

tld_check_8z
------------

 -- Function: int tld_check_8z (const char * IN, size_t * ERRPOS, const
          Tld_table ** OVERRIDES)
     IN: Zero-terminated UTF-8 string to process.

     ERRPOS: Position of offending character is returned here.

     OVERRIDES: A ‘Tld_table’ array of additional domain restriction
     structures that complement and supersede the built-in information.

     Test whether each of the characters in ‘in’ is allowed by the
     information in ‘overrides’ or by the built-in TLD restriction data.
     When data for the same TLD is available both internally and in
     ‘overrides’, the information in ‘overrides’ takes precedence.  If
     several entries for a specific TLD are found, the first one is
     used.  If ‘overrides’ is ‘NULL’, only the built-in information is
     used.  The position of the first offending character is returned in
     ‘errpos’.  Note that the error position refers to the decoded
     character offset rather than the byte position in the string.

     Return value: Returns the ‘Tld_rc’ value ‘TLD_SUCCESS’ if all
     characters are valid or when ‘tld’ is null, ‘TLD_INVALID’ if a
     character is not allowed, or additional error codes on general
     failure conditions.

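   Since ‘errpos’ counts decoded characters rather than bytes, a caller
that wants to point at the offending place in the original UTF-8 buffer
has to translate the offset.  The helper below is one way to do that; it
is a hypothetical illustration, not part of Libidn, and it assumes the
input is valid UTF-8.

```c
#include <stddef.h>

/* Hypothetical helper: return the byte index at which the character
   with index `charpos` starts in the zero-terminated UTF-8 string `s`.
   Assumes `s` is valid UTF-8.  Returns the string length in bytes if
   `charpos` is past the end. */
static size_t utf8_char_to_byte_offset(const char *s, size_t charpos)
{
    size_t i = 0;
    while (s[i] != '\0') {
        /* Bytes of the form 10xxxxxx continue the previous character;
           any other byte starts a new one. */
        if (((unsigned char) s[i] & 0xC0) != 0x80) {
            if (charpos == 0)
                return i;
            charpos--;
        }
        i++;
    }
    return i;
}
```

   For a pure ASCII string the two offsets coincide; they diverge as
soon as a multi-byte character precedes the reported position.
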
tld_check_lz
------------

 -- Function: int tld_check_lz (const char * IN, size_t * ERRPOS, const
          Tld_table ** OVERRIDES)
     IN: Zero-terminated string in the current locale's encoding to
     process.

     ERRPOS: Position of offending character is returned here.

     OVERRIDES: A ‘Tld_table’ array of additional domain restriction
     structures that complement and supersede the built-in information.

     Test whether each of the characters in ‘in’ is allowed by the
     information in ‘overrides’ or by the built-in TLD restriction data.
     When data for the same TLD is available both internally and in
     ‘overrides’, the information in ‘overrides’ takes precedence.  If
     several entries for a specific TLD are found, the first one is
     used.  If ‘overrides’ is ‘NULL’, only the built-in information is
     used.  The position of the first offending character is returned in
     ‘errpos’.  Note that the error position refers to the decoded
     character offset rather than the byte position in the string.

     Return value: Returns the ‘Tld_rc’ value ‘TLD_SUCCESS’ if all
     characters are valid or when ‘tld’ is null, ‘TLD_INVALID’ if a
     character is not allowed, or additional error codes on general
     failure conditions.

7.5 Error Handling
==================

tld_strerror
------------

 -- Function: const char * tld_strerror (Tld_rc RC)
     RC: TLD return code.

     Convert a return code integer to a text string.  This string can be
     used to output a diagnostic message to the user.

     *TLD_SUCCESS:* Successful operation.  This value is guaranteed to
     always be zero; the remaining ones are only guaranteed to hold
     non-zero values, for logical comparison purposes.

     *TLD_INVALID:* Invalid character found.

     *TLD_NODATA:* No input data was provided.

     *TLD_MALLOC_ERROR:* Error during memory allocation.

     *TLD_ICONV_ERROR:* Error during iconv string conversion.

     *TLD_NO_TLD:* No top-level domain found in domain string.

     Return value: Returns a pointer to a statically allocated string
     containing a description of the error with the return code ‘rc’.