libidn: Invoking idn

10 Invoking idn
***************

10.1 Name
=========

GNU Libidn (idn) – Internationalized Domain Names command line tool

10.2 Description
================

‘idn’ allows internationalized string preparation (‘stringprep’),
encoding and decoding of Punycode data, and IDNA ToASCII/ToUnicode
operations to be performed on the command line.

If strings are specified on the command line, they are used as input
and the computed output is printed to standard output (‘stdout’). If no
strings are specified on the command line, the program reads data, line
by line, from standard input (‘stdin’) and prints the computed output
to standard output. The processing performed (e.g., ToASCII or
Punycode encode) is selected by options. If any error is
encountered, execution of the application is aborted.
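
The standard-input mode described above can be sketched as a small
shell helper. This is only an illustration: the function name
‘idn_batch’ and the file names in the usage comment are hypothetical,
and the sketch assumes that ‘idn’ exits with a non-zero status when it
aborts on an error.

```shell
# Hypothetical helper: convert every line on stdin with a single idn process.
# Arguments, if any, are meant to be extra options (e.g. --idna-to-unicode),
# not input strings, so that idn keeps reading from standard input.
# Assumes idn returns a non-zero exit status when it aborts on an error.
idn_batch() {
    if ! idn --quiet "$@"; then
        echo "idn_batch: conversion aborted" >&2
        return 1
    fi
}

# Usage (names.txt holds one domain name per line):
#   idn_batch < names.txt > ace-names.txt
```

Because processing stops at the first error, the output file may be
incomplete when the helper returns a non-zero status.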

All strings are expected to be encoded in the preferred charset used
by your locale. Use ‘--debug’ to find out what this charset is. You
can override the charset by setting the ‘CHARSET’ environment variable.

To process a string that starts with ‘-’, for example ‘-foo’, use
‘--’ to signal the end of parameters, as in ‘idn --quiet -a -- -foo’.
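
When input strings come from a script variable rather than from typed
text, it may be simplest to always pass ‘--’. A minimal sketch follows;
the wrapper name ‘idn_to_ascii’ is hypothetical, and only the options
shown in this manual are used.

```shell
# Hypothetical wrapper: always pass "--" before the input strings so that
# a string such as "-foo" is never parsed as an option.
idn_to_ascii() {
    idn --quiet --idna-to-ascii -- "$@"
}

# Usage:
#   idn_to_ascii -foo example.org
```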

10.3 Options
============

‘idn’ recognizes these options:

     -h, --help               Print help and exit

     -V, --version            Print version and exit

     -s, --stringprep         Prepare string according to nameprep profile

     -d, --punycode-decode    Decode Punycode

     -e, --punycode-encode    Encode Punycode

     -a, --idna-to-ascii      Convert to ACE according to IDNA (default mode)

     -u, --idna-to-unicode    Convert from ACE according to IDNA

         --allow-unassigned   Toggle IDNA AllowUnassigned flag (default off)

         --usestd3asciirules  Toggle IDNA UseSTD3ASCIIRules flag (default off)

         --no-tld             Don't check string for TLD specific rules
                              Only for --idna-to-ascii and --idna-to-unicode

     -n, --nfkc               Normalize string according to Unicode v3.2 NFKC

     -p, --profile=STRING     Use specified stringprep profile instead
                              Valid stringprep profiles: `Nameprep',
                              `iSCSI', `Nodeprep', `Resourceprep',
                              `trace', `SASLprep'

         --debug              Print debugging information

         --quiet              Silent operation
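
As a rough illustration of how the encode and decode options pair up,
the sketch below Punycode-encodes a string and decodes the result
again. The function name ‘punycode_roundtrip’ is hypothetical. The
sketch assumes a correctly configured locale, in which case the decoded
text would be expected to match the input.

```shell
# Hypothetical round trip: Punycode-encode a string, then decode the result.
# Note: POSIX sh has no local variables, so "encoded" is a global here.
punycode_roundtrip() {
    encoded=$(idn --quiet --punycode-encode -- "$1") || return 1
    idn --quiet --punycode-decode -- "$encoded"
}

# Usage:
#   punycode_roundtrip räksmörgås
```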

10.4 Environment Variables
==========================

The ‘CHARSET’ environment variable can be used to override the character
set used to decode incoming data (i.e., on the command line or on the
standard input stream) and to encode data written to standard output.
If your system is set up correctly, however, the application detects
the character set automatically. Example usage:

     $ CHARSET=ISO-8859-1 idn --punycode-encode
     ...

10.5 Examples
=============

Standard usage, reading input from standard input:

     jas@latte:~$ idn
     libidn 0.3.5
     Copyright 2002, 2003 Simon Josefsson.
     GNU Libidn comes with NO WARRANTY, to the extent permitted by law.
     You may redistribute copies of GNU Libidn under the terms of
     the GNU Lesser General Public License.  For more information
     about these matters, see the file named COPYING.LIB.
     Type each input string on a line by itself, terminated by a newline character.
     räksmörgås.se
     xn--rksmrgs-5wao1o.se
     jas@latte:~$

Reading input from the command line, with printing of copyright and
license information disabled:

     jas@latte:~$ idn --quiet räksmörgås.se blåbærgrød.no
     xn--rksmrgs-5wao1o.se
     xn--blbrgrd-fxak7p.no
     jas@latte:~$

Accessing a specific StringPrep profile directly:

     jas@latte:~$ idn --quiet --profile=SASLprep --stringprep teßtª
     teßta
     jas@latte:~$

10.6 Troubleshooting
====================

Getting character data encoded right, and making sure Libidn uses the
same encoding, can be difficult. The reason for this is that most
systems encode character data in more than one character encoding, for
example ‘UTF-8’ together with ‘ISO-8859-1’ or ‘ISO-2022-JP’. This problem
is likely to continue to exist until only one character encoding comes
out as the evolutionary winner, or (more likely, at least to some
extent) forever.

The first step in troubleshooting character encoding problems with
Libidn is to use the ‘--debug’ option to find out which character set
encoding ‘idn’ believes your locale uses.

     jas@latte:~$ idn --debug --quiet ""
     system locale uses charset `UTF-8'.

     jas@latte:~$

If it prints ‘ANSI_X3.4-1968’ (i.e., ‘US-ASCII’), this indicates that you
have not configured your locale properly. To configure the locale, you
can, for example, use ‘LANG=sv_SE.UTF-8; export LANG’ at a ‘/bin/sh’
prompt, to set up your locale for a Swedish environment using ‘UTF-8’ as
the encoding.
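
The same check can be scripted. The sketch below is only an
illustration: the function name ‘charset_is_ascii’ is hypothetical, and
the usage comment assumes that the POSIX command ‘locale charmap’
reports the same charset name that ‘idn --debug’ would print.

```shell
# Hypothetical check: succeed if a charmap name means plain ASCII.
# "ANSI_X3.4-1968" is the name this manual gives for US-ASCII.
charset_is_ascii() {
    case "$1" in
        ANSI_X3.4-1968|US-ASCII) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage: warn before calling idn if the current locale is ASCII-only.
#   if charset_is_ascii "$(locale charmap)"; then
#       echo "locale not configured; try LANG=sv_SE.UTF-8" >&2
#   fi
```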

Sometimes ‘idn’ appears to be unable to translate from your system
locale into ‘UTF-8’ (which is used internally), and you get an error
like the following:

     jas@latte:~$ idn --quiet foo
     idn: could not convert from ISO-8859-1 to UTF-8.
     jas@latte:~$

The simplest explanation is that you haven’t installed the ‘iconv’
conversion tools. ‘iconv’ is available as a standalone library in GNU
Libiconv (<http://www.gnu.org/software/libiconv/>). On many GNU/Linux
systems, this library is part of the system, but you may have to install
additional packages (e.g., ‘glibc-locale’ for Debian) to be able to use
it.

Another explanation is that the error is correct and you are feeding
‘idn’ invalid data. This can happen inadvertently if you are not
careful with the character set encoding you use. For example, if your
shell runs in an ‘ISO-8859-1’ environment, and you invoke ‘idn’ with the
‘CHARSET’ environment variable as follows, you will feed it ‘ISO-8859-1’
characters but force it to believe they are ‘UTF-8’. Naturally this
will lead to an error, unless the byte sequences happen to be valid
‘UTF-8’. Note that even if you don’t get an error, the output may be
incorrect in this situation, because ‘ISO-8859-1’ and ‘UTF-8’ do not
in general encode the same characters as the same byte sequences.

     jas@latte:~$ idn --quiet --debug ""
     system locale uses charset `ISO-8859-1'.

     jas@latte:~$ CHARSET=UTF-8 idn --quiet --debug räksmörgås
     system locale uses charset `UTF-8'.
     input[0] = U+0072
     input[1] = U+4af3
     input[2] = U+006d
     input[3] = U+1b29e5
     input[4] = U+0073
     output[0] = U+0078
     output[1] = U+006e
     output[2] = U+002d
     output[3] = U+002d
     output[4] = U+0072
     output[5] = U+006d
     output[6] = U+0073
     output[7] = U+002d
     output[8] = U+0068
     output[9] = U+0069
     output[10] = U+0036
     output[11] = U+0064
     output[12] = U+0035
     output[13] = U+0039
     output[14] = U+0037
     output[15] = U+0035
     output[16] = U+0035
     output[17] = U+0032
     output[18] = U+0061
     xn--rms-hi6d597552a
     jas@latte:~$
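
One way to read the odd code points in the debug output above, offered
as an assumption rather than a statement about ‘idn’ internals, is that
the ‘ISO-8859-1’ bytes were combined by a ‘UTF-8’ decoder that did not
validate continuation bytes. Under that assumption the values can be
reproduced with shell arithmetic. The function names below are
hypothetical.

```shell
# Combine a 3-byte sequence the way a non-validating UTF-8 decoder would:
# 4 payload bits from the lead byte, 6 bits from each following byte.
lenient_utf8_3() {
    printf 'U+%04x\n' $(( (($1 & 0x0F) << 12) | (($2 & 0x3F) << 6) | ($3 & 0x3F) ))
}

# Same idea for a 4-byte sequence: 3 payload bits from the lead byte.
lenient_utf8_4() {
    printf 'U+%04x\n' $(( (($1 & 0x07) << 18) | (($2 & 0x3F) << 12) | (($3 & 0x3F) << 6) | ($4 & 0x3F) ))
}

lenient_utf8_3 0xE4 0x6B 0x73        # ISO-8859-1 bytes for "äks";  prints U+4af3
lenient_utf8_4 0xF6 0x72 0x67 0xE5   # ISO-8859-1 bytes for "örgå"; prints U+1b29e5
```

The two results match ‘input[1]’ and ‘input[3]’, which would explain
why ten input characters were seen as only five code points.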

The moral here is to forget about ‘CHARSET’ (configure your
locales properly instead) unless you know what you are doing, and if you
want to use it, do it carefully, after verifying with ‘--debug’ that you
get the desired results.
