gawk: Explaining gettext

13.2 GNU 'gettext'
==================

'gawk' uses GNU 'gettext' to provide its internationalization features.
The facilities in GNU 'gettext' focus on messages: strings printed by a
program, either directly or via formatting with 'printf' or
'sprintf()'.(1)

When using GNU 'gettext', each application has its own "text domain".
This is a unique name, such as 'kpilot' or 'gawk', that identifies the
application. A complete application may have multiple
components--programs written in C or C++, as well as scripts written in
'sh' or 'awk'. All of the components use the same text domain.

To make the discussion concrete, assume we're writing an application
named 'guide'. Internationalization consists of the following steps, in
this order:

  1. The programmer reviews the source for all of 'guide''s components
     and marks each string that is a candidate for translation. For
     example, '"`-F': option required"' is a good candidate for
     translation. A table with strings of option names is not (e.g.,
     'gawk''s '--profile' option should remain the same, no matter what
     the local language).

  2. The programmer indicates the application's text domain ('"guide"')
     to the 'gettext' library, by calling the 'textdomain()' function.

  3. Messages from the application are extracted from the source code
     and collected into a portable object template file ('guide.pot'),
     which lists the strings and their translations. The translations
     are initially empty. The original (usually English) messages serve
     as the key for lookup of the translations.

  4. For each language with a translator, 'guide.pot' is copied to a
     portable object file ('.po') and translations are created and
     shipped with the application. For example, there might be a
     'fr.po' for a French translation.

  5. Each language's '.po' file is converted into a binary message
     object ('.gmo') file. A message object file contains the original
     messages and their translations in a binary format that allows fast
     lookup of translations at runtime.

  6. When 'guide' is built and installed, the binary translation files
     are installed in a standard place.

  7. For testing and development, it is possible to tell 'gettext' to
     use '.gmo' files in a different directory than the standard one by
     using the 'bindtextdomain()' function.

  8. At runtime, 'guide' looks up each string via a call to 'gettext()'.
     The returned string is the translated string if available, or the
     original string if not.

  9. If necessary, it is possible to access messages from a different
     text domain than the one belonging to the application, without
     having to switch the application's default text domain back and
     forth.

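The runtime steps in the list (2, 7, 8, and 9) can be sketched in C
roughly as follows. This is a minimal sketch, not code taken from
'guide' or 'gawk': the install directory '/usr/local/share/locale' and
the helper names 'guide_i18n_init()', 'guide_msg()', and 'other_msg()'
are assumptions made for illustration. The library calls themselves
are the standard ones declared in '<libintl.h>' and '<locale.h>'.

```c
#include <libintl.h>
#include <locale.h>

/* Hypothetical values for the example application. */
#define GUIDE_DOMAIN    "guide"
#define GUIDE_LOCALEDIR "/usr/local/share/locale"

/* Steps 2 and 7: pick up the user's locale, then tell gettext which
 * text domain to use and where its compiled catalogs are installed. */
void guide_i18n_init(void)
{
    /* Without this call the program stays in the "C" locale and
     * no translation takes place. */
    setlocale(LC_ALL, "");

    /* Catalogs are searched for below this directory, e.g.
     * GUIDE_LOCALEDIR/fr/LC_MESSAGES/guide.mo for French. */
    bindtextdomain(GUIDE_DOMAIN, GUIDE_LOCALEDIR);

    /* Make "guide" the default domain for plain gettext() calls. */
    textdomain(GUIDE_DOMAIN);
}

/* Step 8: look up a message in the application's own domain.
 * Returns the translation if one exists, else the original string. */
const char *guide_msg(const char *msgid)
{
    return gettext(msgid);
}

/* Step 9: look up a message in another domain without changing
 * the default domain set by textdomain(). */
const char *other_msg(const char *domain, const char *msgid)
{
    return dgettext(domain, msgid);
}
```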
In C (or C++), the string marking and dynamic translation lookup are
accomplished by wrapping each string in a call to 'gettext()':

     printf("%s", gettext("Don't Panic!\n"));

The tools that extract messages from source code pull out all strings
enclosed in calls to 'gettext()'.

The GNU 'gettext' developers, recognizing that typing 'gettext(...)'
over and over again is both painful and ugly to look at, use the macro
'_' (an underscore) to make things easier:

     /* In the standard header file: */
     #define _(str) gettext(str)

     /* In the program text: */
     printf("%s", _("Don't Panic!\n"));

This reduces the typing overhead to just three extra characters per
string and is considerably easier to read as well.

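Because the argument of '_()' is an ordinary string, it can also be a
'printf' format, which lets a translator move the words around the
conversion. The following sketch shows the idea; the function name
'format_missing_option()' and the '%s' variant of the earlier
"option required" message are hypothetical, introduced only for this
example:

```c
#include <libintl.h>
#include <stdio.h>

#define _(str) gettext(str)

/* Format a (possibly translated) diagnostic for option OPT into BUF.
 * Returns what snprintf() returns: the length the full message needs. */
int format_missing_option(char *buf, size_t size, const char *opt)
{
    /* The format string is the lookup key; if no translation is
     * available, gettext() hands back the original format. */
    return snprintf(buf, size, _("`%s': option required"), opt);
}
```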
There are locale "categories" for different types of locale-related
information. The defined locale categories that 'gettext' knows about
are:

'LC_MESSAGES'
     Text messages. This is the default category for 'gettext'
     operations, but it is possible to supply a different one
     explicitly, if necessary. (It is almost never necessary to supply
     a different category.)

'LC_COLLATE'
     Text-collation information (i.e., how different characters and/or
     groups of characters sort in a given language).

'LC_CTYPE'
     Character-type information (alphabetic, digit, upper- or lowercase,
     and so on) as well as character encoding. This information is
     accessed via the POSIX character classes in regular expressions,
     such as '/[[:alnum:]]/' (⇒Bracket Expressions).

'LC_MONETARY'
     Monetary information, such as the currency symbol, and whether the
     symbol goes before or after a number.

'LC_NUMERIC'
     Numeric information, such as which characters to use for the
     decimal point and the thousands separator.(2)

'LC_TIME'
     Time- and date-related information, such as 12- or 24-hour clock,
     month printed before or after the day in a date, local month
     abbreviations, and so on.

'LC_ALL'
     All of the above. (Not too useful in the context of 'gettext'.)

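As a rough illustration of supplying a category explicitly, the C
library function 'dcgettext()' takes the category as its third
argument, and 'localeconv()' reports the 'LC_NUMERIC' conventions
currently in effect. The wrapper names below are hypothetical, and the
sketch assumes a system that provides '<libintl.h>':

```c
#include <libintl.h>
#include <locale.h>

/* Look up MSGID in DOMAIN using an explicit locale category
 * (for example LC_TIME) instead of the default LC_MESSAGES. */
const char *lookup_in_category(const char *domain, const char *msgid,
                               int category)
{
    return dcgettext(domain, msgid, category);
}

/* Return the decimal-point string of the LC_NUMERIC locale that is
 * currently in effect ("." in the default "C" locale). */
const char *current_decimal_point(void)
{
    return localeconv()->decimal_point;
}
```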
     NOTE: As described in ⇒Locales, environment variables with
     the same name as the locale categories ('LC_CTYPE', 'LC_ALL', etc.)
     influence 'gawk''s behavior (and that of other utilities).

     Normally, these variables also affect how the 'gettext' library
     finds translations. However, the 'LANGUAGE' environment variable
     overrides the 'LC_XXX' variables. Many GNU/Linux systems may
     define this variable without your knowledge, causing 'gawk' to not
     find the correct translations. If this happens to you, look to see
     if 'LANGUAGE' is defined, and if so, use the shell's 'unset'
     command to remove it.

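To see which variable is likely to decide the message language, a
program can check them in the order GNU 'gettext' documents for
message lookups: 'LANGUAGE', then 'LC_ALL', then 'LC_MESSAGES', then
'LANG'. The sketch below is only a diagnostic aid with a hypothetical
function name. It does not reproduce every rule of the library; for
instance, 'LANGUAGE' is ignored when the locale is 'C'.

```c
#include <stdlib.h>

/* Return the name of the first environment variable, in gettext's
 * priority order for messages, that is set to a non-empty value.
 * Returns NULL if none of them is set. */
const char *message_locale_source(void)
{
    static const char *const order[] = {
        "LANGUAGE", "LC_ALL", "LC_MESSAGES", "LANG"
    };
    size_t i;

    for (i = 0; i < sizeof order / sizeof order[0]; i++) {
        const char *value = getenv(order[i]);

        if (value != NULL && value[0] != '\0')
            return order[i];
    }
    return NULL;
}
```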
For testing translations of 'gawk' itself, you can set the
'GAWK_LOCALE_DIR' environment variable. See the documentation for the C
'bindtextdomain()' function and also see ⇒Other Environment Variables.

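A program can offer a similar override for its own catalogs. The
following is a sketch for the hypothetical 'guide' application, not
'gawk''s actual code: the variable name 'GUIDE_LOCALE_DIR', the default
directory, and both function names are assumptions.

```c
#include <libintl.h>
#include <stdlib.h>

/* Both values are hypothetical, chosen for the example only. */
#define GUIDE_DEFAULT_LOCALEDIR "/usr/local/share/locale"
#define GUIDE_LOCALE_DIR_VAR    "GUIDE_LOCALE_DIR"

/* Choose the catalog directory: the environment override if it is
 * set to a non-empty value, otherwise the installed default. */
const char *guide_locale_dir(void)
{
    const char *dir = getenv(GUIDE_LOCALE_DIR_VAR);

    return (dir != NULL && dir[0] != '\0') ? dir
                                           : GUIDE_DEFAULT_LOCALEDIR;
}

/* Bind the "guide" text domain to the chosen directory. Returns the
 * directory now bound to the domain, or NULL if binding failed. */
const char *guide_bind_catalogs(void)
{
    return bindtextdomain("guide", guide_locale_dir());
}
```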
---------- Footnotes ----------

(1) For some operating systems, the 'gawk' port doesn't support GNU
'gettext'. Therefore, these features are not available if you are using
one of those operating systems. Sorry.

(2) Americans use a comma to separate groups of three digits and a
period for the decimal point, while many Europeans do exactly the
opposite: 1,234.56 versus 1.234,56.
