gettext: Locale Names

2.3.1 Locale Names
------------------

A locale name usually has the form ‘LL_CC’. Here ‘LL’ is an ISO 639
two-letter language code, and ‘CC’ is an ISO 3166 two-letter country
code. For example, for German in Germany, LL is ‘de’, and CC is ‘DE’.
You can find a list of the language codes in appendix ⇒Language Codes
and a list of the country codes in appendix ⇒Country Codes.

You might think that the country code specification is redundant.
But in fact, some languages have dialects in different countries. For
example, ‘de_AT’ is used for Austria, and ‘pt_BR’ for Brazil. The
country code serves to distinguish the dialects.

Many locale names have an extended syntax ‘LL_CC.ENCODING’ that also
specifies the character encoding. These are in use because between 2000
and 2005, most users switched to locales in UTF-8 encoding. For
example, the German locale on glibc systems is nowadays ‘de_DE.UTF-8’.
The older name ‘de_DE’ still refers to the German locale as of 2000 that
stores characters in ISO-8859-1 encoding – a text encoding that cannot
even accommodate the Euro currency sign.

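The encoding limitation mentioned above can be checked directly. The
following is a minimal Python sketch (Python is chosen here only for
illustration, and the helper name `can_encode` is hypothetical); it
tests whether the Euro sign is representable in a given encoding.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


EURO_SIGN = "\u20ac"  # the Euro currency sign

print(can_encode(EURO_SIGN, "iso-8859-1"))  # False: no Euro sign in ISO-8859-1
print(can_encode(EURO_SIGN, "utf-8"))       # True
print(EURO_SIGN.encode("utf-8"))            # b'\xe2\x82\xac'
```
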
Some locale names use ‘LL_CC@VARIANT’ instead of ‘LL_CC’. The
‘@VARIANT’ can denote any kind of characteristic that is not already
implied by the language LL and the country CC. It can denote a
particular monetary unit. For example, on glibc systems, ‘de_DE@euro’
denotes the locale that uses the Euro currency, in contrast to the older
locale ‘de_DE’ which implies the use of the currency before 2002. It
can also denote a dialect of the language, or the script used to write
text (for example, ‘sr_RS@latin’ uses the Latin script, whereas ‘sr_RS’
uses the Cyrillic script to write Serbian), or the orthography rules, or
similar.

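The naming scheme described so far can be summarized as a small parser.
This is only a sketch under stated assumptions: the names `LocaleName`
and `parse_locale_name` are hypothetical, the combined order
‘LL_CC.ENCODING@VARIANT’ is assumed for names that carry both an
encoding and a variant, and three-letter language codes are accepted in
addition to the two-letter codes described above.

```python
import re
from typing import NamedTuple, Optional


class LocaleName(NamedTuple):
    language: str
    country: Optional[str]
    encoding: Optional[str]
    variant: Optional[str]


# LL, then optional _CC, optional .ENCODING, optional @VARIANT.
_LOCALE_RE = re.compile(
    r"(?P<language>[a-z]{2,3})"        # assumption: also allow 3 letters
    r"(?:_(?P<country>[A-Z]{2}))?"
    r"(?:\.(?P<encoding>[^@]+))?"
    r"(?:@(?P<variant>.+))?"
)


def parse_locale_name(name: str) -> LocaleName:
    """Split a locale name into its parts; missing parts become None."""
    match = _LOCALE_RE.fullmatch(name)
    if match is None:
        # Covers special names such as 'C', which do not follow the scheme.
        raise ValueError(f"not an LL_CC-style locale name: {name!r}")
    return LocaleName(
        match["language"], match["country"],
        match["encoding"], match["variant"],
    )


print(parse_locale_name("de_DE.UTF-8"))
print(parse_locale_name("sr_RS@latin"))
```
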
On other systems, some variations of this scheme are used, such as
‘LL’. You can get the list of locales supported by your system for your
language by running the command ‘locale -a | grep '^LL'’.

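The same lookup can be done from a program. The sketch below mirrors
the ‘locale -a | grep '^LL'’ pipeline; the function names are
hypothetical, and `system_locales` assumes that a ‘locale’ command is
available on the system.

```python
import subprocess
from typing import Iterable, List


def filter_by_language(names: Iterable[str], language: str) -> List[str]:
    """Keep the names that start with `language`, like grep '^LL'."""
    return [name for name in names if name.startswith(language)]


def system_locales() -> List[str]:
    """Return the lines printed by `locale -a` (the command must exist)."""
    result = subprocess.run(
        ["locale", "-a"],
        capture_output=True,
        encoding="utf-8",
        errors="replace",  # some systems list names that are not valid UTF-8
        check=True,
    )
    return result.stdout.splitlines()


# Example with a fixed list, so the result does not depend on the system.
sample = ["C", "de_AT", "de_DE.UTF-8", "en_US.UTF-8", "pt_BR"]
print(filter_by_language(sample, "de"))  # ['de_AT', 'de_DE.UTF-8']
```

On a system that provides the command, `filter_by_language(system_locales(), "de")`
would list the installed German locales.
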
There is also a special locale, called ‘C’. When it is used, it
disables all localization: in this locale, all programs standardized by
POSIX use English messages and an unspecified character encoding (often
US-ASCII, but sometimes also ISO-8859-1 or UTF-8, depending on the
operating system).

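As a small illustration of the ‘C’ locale, the following Python sketch
selects it for the current process and inspects the numeric formatting
conventions. It goes slightly beyond the text above, which speaks only
of messages and encoding: it relies on the fact that the ‘C’ locale
uses ‘.’ as the decimal point and no thousands separator.

```python
import locale

# Select the 'C' locale for all categories of this process.
locale.setlocale(locale.LC_ALL, "C")

# Query the numeric formatting conventions now in effect.
conventions = locale.localeconv()
decimal_point = conventions["decimal_point"]
thousands_sep = conventions["thousands_sep"]

print(repr(decimal_point))  # '.'
print(repr(thousands_sep))  # ''
```
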