libidn: Invoking idn

10 Invoking idn
***************

10.1 Name
=========

GNU Libidn (idn) – Internationalized Domain Names command line tool

10.2 Description
================

‘idn’ allows internationalized string preparation (‘stringprep’),
encoding and decoding of Punycode data, and IDNA ToASCII/ToUnicode
operations to be performed on the command line.

   If strings are specified on the command line, they are used as input
and the computed output is printed to standard output (‘stdout’).  If no
strings are specified on the command line, the program reads data, line
by line, from standard input (‘stdin’) and prints the computed output
to standard output.  The processing performed (e.g., ToASCII or
Punycode encode) is selected by options.  If any error is
encountered, execution of the application is aborted.

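   As a rough illustration of the input handling described above, the
following Python sketch converts each string given as an argument, or
each line of standard input when no strings are given, and lets the
first error end the run.  It uses Python's built-in ‘idna’ codec as a
stand-in for ToASCII.  The function name ‘convert_all’ is hypothetical,
and nothing here is taken from the ‘idn’ source code.

```python
import io


def convert_all(args, stdin):
    """Yield the ACE form of each argument, or of each stdin line if args is empty."""
    if args:
        sources = args
    else:
        # One input string per line; drop only the line terminator.
        sources = (line.rstrip("\n") for line in stdin)
    for text in sources:
        # A conversion error raises UnicodeError and ends the whole run,
        # mirroring the abort-on-error behavior described above.
        yield text.encode("idna").decode("ascii")


# Strings given as arguments take precedence over standard input.
from_args = list(convert_all(["r\u00e4ksm\u00f6rg\u00e5s.se"], io.StringIO("ignored\n")))
# With no arguments, each line of standard input is converted in turn.
from_stdin = list(convert_all([], io.StringIO("bl\u00e5b\u00e6rgr\u00f8d.no\n")))
print(from_args, from_stdin)
```
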
   All strings are expected to be encoded in the preferred charset of
your locale.  Use ‘--debug’ to find out what this charset is.  You
can override the charset by setting the environment variable ‘CHARSET’.

   To process a string that starts with ‘-’, for example ‘-foo’, use
‘--’ to signal the end of parameters, as in ‘idn --quiet -a -- -foo’.

10.3 Options
============

‘idn’ recognizes these options:

  -h, --help               Print help and exit

  -V, --version            Print version and exit

  -s, --stringprep         Prepare string according to nameprep profile

  -d, --punycode-decode    Decode Punycode

  -e, --punycode-encode    Encode Punycode

  -a, --idna-to-ascii      Convert to ACE according to IDNA (default mode)

  -u, --idna-to-unicode    Convert from ACE according to IDNA

      --allow-unassigned   Toggle IDNA AllowUnassigned flag (default off)

      --usestd3asciirules  Toggle IDNA UseSTD3ASCIIRules flag (default off)

      --no-tld             Don't check string for TLD specific rules
                             Only for --idna-to-ascii and --idna-to-unicode

  -n, --nfkc               Normalize string according to Unicode v3.2 NFKC

  -p, --profile=STRING     Use specified stringprep profile instead
                             Valid stringprep profiles: `Nameprep',
                             `iSCSI', `Nodeprep', `Resourceprep',
                             `trace', `SASLprep'

      --debug              Print debugging information

      --quiet              Silent operation

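   The difference between the Punycode and IDNA modes listed above can
be sketched with Python's standard ‘punycode’ and ‘idna’ codecs.  These
codecs are an independent implementation used here only for
illustration, on the assumption that they agree with ‘idn’ for this
simple input; they do not cover flags such as ‘--allow-unassigned’ or
‘--no-tld’.

```python
name = "r\u00e4ksm\u00f6rg\u00e5s"  # the label used in this chapter, written with escapes

# Raw Punycode works on one string and adds no prefix (compare --punycode-encode).
puny = name.encode("punycode")

# IDNA ToASCII works label by label and adds the "xn--" ACE prefix
# to labels that needed encoding (compare --idna-to-ascii).
ace = (name + ".se").encode("idna")

# Both conversions can be reversed for this input
# (compare --punycode-decode and --idna-to-unicode).
back_from_puny = puny.decode("punycode")
back_from_ace = ace.decode("idna")

print(puny.decode("ascii"), ace.decode("ascii"))
```
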
10.4 Environment Variables
==========================

The ‘CHARSET’ environment variable can be used to override the character
set used for decoding incoming data (i.e., on the command line or
on the standard input stream) and for encoding data written to standard
output.  If your system is set up correctly, however, the application
will determine the character set automatically.  Example usage:

     $ CHARSET=ISO-8859-1 idn --punycode-encode
     ...

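   Why the charset matters can be seen by encoding one string in two
ways.  The Python sketch below is only an illustration of the byte-level
difference between ‘ISO-8859-1’ and ‘UTF-8’; it does not call ‘idn’.

```python
text = "r\u00e4ksm\u00f6rg\u00e5s"

latin1_bytes = text.encode("iso-8859-1")  # one byte per character
utf8_bytes = text.encode("utf-8")         # two bytes for each of the three non-ASCII letters

# The same text yields different byte sequences in the two charsets.
same_bytes = latin1_bytes == utf8_bytes

# Decoding with the charset that produced the bytes recovers the text.
round_trip_ok = (latin1_bytes.decode("iso-8859-1") == text
                 and utf8_bytes.decode("utf-8") == text)

print(len(latin1_bytes), len(utf8_bytes))  # prints "10 13"
```
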
10.5 Examples
=============

Standard usage, reading input from standard input:

     jas@latte:~$ idn
     libidn 0.3.5
     Copyright 2002, 2003 Simon Josefsson.
     GNU Libidn comes with NO WARRANTY, to the extent permitted by law.
     You may redistribute copies of GNU Libidn under the terms of
     the GNU Lesser General Public License.  For more information
     about these matters, see the file named COPYING.LIB.
     Type each input string on a line by itself, terminated by a newline character.
     räksmörgås.se
     xn--rksmrgs-5wao1o.se
     jas@latte:~$

   Reading input from the command line, with the copyright and
license information suppressed:

     jas@latte:~$ idn --quiet räksmörgås.se blåbærgrød.no
     xn--rksmrgs-5wao1o.se
     xn--blbrgrd-fxak7p.no
     jas@latte:~$

   Accessing a specific StringPrep profile directly:

     jas@latte:~$ idn --quiet --profile=SASLprep --stringprep teßtª
     teßta
     jas@latte:~$

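   The visible change in the SASLprep example above (‘ª’ becoming ‘a’)
is consistent with the NFKC normalization step.  The Python sketch below
reproduces only that step, using the Unicode 3.2 database that Python
ships as ‘unicodedata.ucd_3_2_0’.  A full stringprep profile also applies
mapping tables and prohibited-character checks, which are not shown
here.

```python
import unicodedata

text = "te\u00dft\u00aa"  # the input from the SASLprep example above

# ucd_3_2_0 exposes the Unicode 3.2 database, the version named by --nfkc.
normalized = unicodedata.ucd_3_2_0.normalize("NFKC", text)

# U+00AA (feminine ordinal indicator) folds to "a"; U+00DF (sharp s) is unchanged.
print(normalized == "te\u00dfta")  # prints "True"
```
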
10.6 Troubleshooting
====================

Getting character data encoded correctly, and making sure Libidn uses the
same encoding, can be difficult.  The reason is that most
systems encode character data in more than one character encoding, e.g.,
using ‘UTF-8’ together with ‘ISO-8859-1’ or ‘ISO-2022-JP’.  This problem
is likely to persist until a single character encoding comes
out as the evolutionary winner, or (more likely, at least to some
extent) forever.

   The first step in troubleshooting character encoding problems with
Libidn is to use the ‘--debug’ parameter to find out which character set
encoding ‘idn’ believes your locale uses.

     jas@latte:~$ idn --debug --quiet ""
     system locale uses charset `UTF-8'.

     jas@latte:~$

   If it prints ‘ANSI_X3.4-1968’ (i.e., ‘US-ASCII’), this indicates that you
have not configured your locale properly.  To configure the locale, you
can, for example, use ‘LANG=sv_SE.UTF-8; export LANG’ at a ‘/bin/sh’
prompt to set up your locale for a Swedish environment using ‘UTF-8’ as
the encoding.

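   A similar check can be made from a script.  The Python sketch below
asks the C library for the charset of the current locale and reports
whether it is an alias of ‘US-ASCII’.  It assumes that ‘idn’ consults
the same locale information, which this manual does not state; the
helper names ‘locale_charset’ and ‘is_plain_ascii’ are hypothetical.

```python
import codecs
import locale


def locale_charset():
    """Return the charset name of the locale selected by the environment."""
    try:
        locale.setlocale(locale.LC_CTYPE, "")  # honor LANG / LC_ALL / LC_CTYPE
    except locale.Error:
        pass  # unknown locale name: keep the locale that is already active
    if hasattr(locale, "nl_langinfo"):
        return locale.nl_langinfo(locale.CODESET)
    return locale.getpreferredencoding(False)  # platforms without nl_langinfo


def is_plain_ascii(charset):
    """True if the name is an alias of US-ASCII, such as ANSI_X3.4-1968."""
    try:
        return codecs.lookup(charset).name == "ascii"
    except LookupError:
        return False


charset = locale_charset()
if is_plain_ascii(charset):
    print("locale charset is US-ASCII; the locale is probably not configured")
else:
    print("locale charset:", charset)
```
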
   Sometimes ‘idn’ appears to be unable to translate from your system
locale into ‘UTF-8’ (which is used internally), and you get an error
like the following:

     jas@latte:~$ idn --quiet foo
     idn: could not convert from ISO-8859-1 to UTF-8.
     jas@latte:~$

   The simplest explanation is that you haven’t installed the ‘iconv’
conversion tools.  They are available as a standalone library, GNU
Libiconv (<http://www.gnu.org/software/libiconv/>).  On many GNU/Linux
systems, this library is part of the system, but you may have to install
additional packages (e.g., ‘glibc-locale’ for Debian) to be able to use
it.

   Another explanation is that the error is correct and you are feeding
‘idn’ invalid data.  This can happen inadvertently if you are not
careful with the character set encoding you use.  For example, if your
shell runs in an ‘ISO-8859-1’ environment, and you invoke ‘idn’ with the
‘CHARSET’ environment variable as follows, you will feed it ‘ISO-8859-1’
characters but force it to believe they are ‘UTF-8’.  Naturally this
will lead to an error, unless the byte sequences happen to be valid
‘UTF-8’.  Note that even if you don’t get an error, the output may be
incorrect in this situation, because ‘ISO-8859-1’ and ‘UTF-8’ do not
in general encode the same characters as the same byte sequences.

     jas@latte:~$ idn --quiet --debug ""
     system locale uses charset `ISO-8859-1'.

     jas@latte:~$ CHARSET=UTF-8 idn --quiet --debug räksmörgås
     system locale uses charset `UTF-8'.
     input[0] = U+0072
     input[1] = U+4af3
     input[2] = U+006d
     input[3] = U+1b29e5
     input[4] = U+0073
     output[0] = U+0078
     output[1] = U+006e
     output[2] = U+002d
     output[3] = U+002d
     output[4] = U+0072
     output[5] = U+006d
     output[6] = U+0073
     output[7] = U+002d
     output[8] = U+0068
     output[9] = U+0069
     output[10] = U+0036
     output[11] = U+0064
     output[12] = U+0035
     output[13] = U+0039
     output[14] = U+0037
     output[15] = U+0035
     output[16] = U+0035
     output[17] = U+0032
     output[18] = U+0061
     xn--rms-hi6d597552a
     jas@latte:~$

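   The five ‘input[...]’ code points in the transcript above can be
reproduced by hand.  The Python sketch below assumes a lenient decoder
that trusts each UTF-8 lead byte and does not validate the bytes that
follow; that assumption is inferred from the transcript, not from the
‘idn’ source, and the function ‘lax_utf8’ is hypothetical.  A strict
decoder rejects the same bytes, and the last lines show the other
failure mode, where mislabeled bytes happen to be valid ‘UTF-8’ and
silently become a different character.

```python
def lax_utf8(data):
    """Decode by lead byte only, without validating continuation bytes."""
    out, i = [], 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            n, cp = 0, b           # single-byte (ASCII)
        elif b >> 5 == 0b110:
            n, cp = 1, b & 0x1F    # lead byte of a 2-byte sequence
        elif b >> 4 == 0b1110:
            n, cp = 2, b & 0x0F    # lead byte of a 3-byte sequence
        else:
            n, cp = 3, b & 0x07    # anything else is treated as a 4-byte lead
        for t in data[i + 1:i + 1 + n]:
            cp = (cp << 6) | (t & 0x3F)  # take the low six bits, unchecked
        out.append(cp)
        i += 1 + n
    return out


# The bytes an ISO-8859-1 shell passes for the string in the transcript.
raw = "r\u00e4ksm\u00f6rg\u00e5s".encode("iso-8859-1")
garbled = lax_utf8(raw)
print([hex(cp) for cp in garbled])

# A strict UTF-8 decoder refuses the same bytes.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# Bytes that are valid UTF-8 by accident decode without error, but wrongly:
# the two ISO-8859-1 characters U+00C3 U+00A4 become the single letter U+00E4.
accidental = "\u00c3\u00a4".encode("iso-8859-1").decode("utf-8")
```
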
   The moral here is to forget about ‘CHARSET’ (configure your
locales properly instead) unless you know what you are doing; if you
do use it, do so carefully, after verifying with ‘--debug’ that you
get the desired results.