gawk: Explaining gettext

1 
1 13.2 GNU 'gettext'
1 ==================
1 
1 'gawk' uses GNU 'gettext' to provide its internationalization features.
1 The facilities in GNU 'gettext' focus on messages: strings printed by a
1 program, either directly or via formatting with 'printf' or
1 'sprintf()'.(1)
1 
1    When using GNU 'gettext', each application has its own "text domain".
1 This is a unique name, such as 'kpilot' or 'gawk', that identifies the
1 application.  A complete application may have multiple
1 components--programs written in C or C++, as well as scripts written in
1 'sh' or 'awk'.  All of the components use the same text domain.
1 
1    To make the discussion concrete, assume we're writing an application
1 named 'guide'.  Internationalization consists of the following steps, in
1 this order:
1 
1   1. The programmer reviews the source for all of 'guide''s components
1      and marks each string that is a candidate for translation.  For
1      example, '"`-F': option required"' is a good candidate for
1      translation.  A table with strings of option names is not (e.g.,
1      'gawk''s '--profile' option should remain the same, no matter what
1      the local language).
1 
1   2. The programmer indicates the application's text domain ('"guide"')
1      to the 'gettext' library, by calling the 'textdomain()' function.
1 
1   3. Messages from the application are extracted from the source code
1      and collected into a portable object template file ('guide.pot'),
1      which lists the strings and their translations.  The translations
1      are initially empty.  The original (usually English) messages serve
1      as the key for lookup of the translations.
1 
1   4. For each language with a translator, 'guide.pot' is copied to a
1      portable object file ('.po') and translations are created and
1      shipped with the application.  For example, there might be a
1      'fr.po' for a French translation.
1 
1   5. Each language's '.po' file is converted into a binary message
1      object ('.gmo') file.  A message object file contains the original
1      messages and their translations in a binary format that allows fast
1      lookup of translations at runtime.
1 
1   6. When 'guide' is built and installed, the binary translation files
1      are installed in a standard place.
1 
1   7. For testing and development, it is possible to tell 'gettext' to
1      use '.gmo' files in a different directory than the standard one by
1      using the 'bindtextdomain()' function.
1 
1   8. At runtime, 'guide' looks up each string via a call to 'gettext()'.
1      The returned string is the translated string if available, or the
1      original string if not.
1 
1   9. If necessary, it is possible to access messages from a different
1      text domain than the one belonging to the application, without
1      having to switch the application's default text domain back and
1      forth.
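
   The runtime side of these steps (2, 7, 8, and 9) can be sketched in C
as follows.  This is a minimal illustration, not code taken from 'gawk'
or any real application: the function names, the directory passed in,
and the second text domain 'libguide' are all hypothetical.  It assumes a
C library that provides '<libintl.h>', as the GNU C library does.

```c
#include <assert.h>
#include <libintl.h>
#include <locale.h>
#include <string.h>

/* Hypothetical setup routine for the 'guide' application.
   Returns the name of the default text domain now in effect. */
const char *guide_init_i18n(const char *localedir)
{
    assert(localedir != NULL);

    /* Adopt the user's locale from the environment (LANG, LC_*). */
    setlocale(LC_ALL, "");

    /* Step 7: look for guide's catalogs under localedir rather than
       in the standard place. */
    bindtextdomain("guide", localedir);

    /* Step 2: make "guide" the default text domain. */
    return textdomain("guide");
}

/* Step 8: gettext() returns the translation if one is available,
   or the original string if not. */
const char *guide_message(void)
{
    return gettext("Don't Panic!\n");
}

/* Step 9: consult a different (hypothetical) text domain without
   changing the default one. */
const char *libguide_message(void)
{
    return dgettext("libguide", "Out of memory\n");
}

/* Nonzero if the lookup produced text that differs from the original.
   A translation identical to the original also yields zero. */
int guide_has_translation(const char *msgid)
{
    return strcmp(gettext(msgid), msgid) != 0;
}
```

   A program would call 'guide_init_i18n()' once at startup, before
printing any message.  When no catalog for the current language is found,
every lookup simply returns the original string.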
1 
1    In C (or C++), the string marking and dynamic translation lookup are
1 accomplished by wrapping each string in a call to 'gettext()':
1 
1      printf("%s", gettext("Don't Panic!\n"));
1 
1    The tools that extract messages from source code pull out all strings
1 enclosed in calls to 'gettext()'.
1 
1    The GNU 'gettext' developers, recognizing that typing 'gettext(...)'
1 over and over again is both painful and ugly to look at, use the macro
1 '_' (an underscore) to make things easier:
1 
1      /* In the standard header file: */
1      #define _(str) gettext(str)
1 
1      /* In the program text: */
1      printf("%s", _("Don't Panic!\n"));
1 
1 This reduces the typing overhead to just three extra characters per
1 string and is considerably easier to read as well.
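
   Messages are often format strings rather than fixed text.  In that
case the format string itself is what gets marked, so that a translator
can reword it around the conversion.  The following is only a sketch:
the function 'report_count()' and its message text are invented for
illustration.

```c
#include <assert.h>
#include <libintl.h>
#include <stdio.h>

/* The shorthand macro described in the text. */
#define _(str) gettext(str)

/* Format a count into buf using a translatable format string.
   Returns the value of snprintf(): the length the full message needs. */
int report_count(char *buf, size_t size, int count)
{
    assert(buf != NULL && size > 0);

    /* The whole format string is the message, so a translation may
       reword it freely as long as it keeps the %d conversion. */
    return snprintf(buf, size, _("%d records processed\n"), count);
}
```

   Marking the complete format, instead of gluing together separately
translated fragments, keeps each message a unit that can be translated
as a whole.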
1 
1    There are locale "categories" for different types of locale-related
1 information.  The defined locale categories that 'gettext' knows about
1 are:
1 
1 'LC_MESSAGES'
1      Text messages.  This is the default category for 'gettext'
1      operations, but it is possible to supply a different one
1      explicitly, if necessary.  (It is almost never necessary to supply
1      a different category.)
1 
1 'LC_COLLATE'
1      Text-collation information (i.e., how different characters and/or
1      groups of characters sort in a given language).
1 
1 'LC_CTYPE'
1      Character-type information (alphabetic, digit, upper- or lowercase,
1      and so on) as well as character encoding.  This information is
1      accessed via the POSIX character classes in regular expressions,
1      such as '/[[:alnum:]]/' (⇒Bracket Expressions).
1 
1 'LC_MONETARY'
1      Monetary information, such as the currency symbol, and whether the
1      symbol goes before or after a number.
1 
1 'LC_NUMERIC'
1      Numeric information, such as which characters to use for the
1      decimal point and the thousands separator.(2)
1 
1 'LC_TIME'
1      Time- and date-related information, such as 12- or 24-hour clock,
1      month printed before or after the day in a date, local month
1      abbreviations, and so on.
1 
1 'LC_ALL'
1      All of the above.  (Not too useful in the context of 'gettext'.)
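
   As a rough illustration of how categories appear in C code, the
sketch below passes an explicit category to 'dcgettext()' and reads the
decimal point of the current 'LC_NUMERIC' category with 'localeconv()'.
The function names and the '"guide"' text domain are hypothetical; the
library calls are the standard ones from '<libintl.h>' and '<locale.h>'.

```c
#include <assert.h>
#include <libintl.h>
#include <locale.h>

/* Look up msgid in the hypothetical "guide" text domain, using an
   explicitly supplied locale category (normally LC_MESSAGES). */
const char *guide_lookup_in(int category, const char *msgid)
{
    assert(msgid != NULL);
    return dcgettext("guide", msgid, category);
}

/* Return the decimal-point string of the LC_NUMERIC category
   currently in effect; in the "C" locale this is ".". */
const char *current_decimal_point(void)
{
    return localeconv()->decimal_point;
}
```

   Calling 'guide_lookup_in(LC_MESSAGES, msg)' is equivalent to calling
'dgettext("guide", msg)', which is why supplying a category by hand is
almost never needed.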
1 
1      NOTE: As described in ⇒Locales, environment variables with
1      the same name as the locale categories ('LC_CTYPE', 'LC_ALL', etc.)
1      influence 'gawk''s behavior (and that of other utilities).
1 
1      Normally, these variables also affect how the 'gettext' library
1      finds translations.  However, the 'LANGUAGE' environment variable
1      overrides the 'LC_XXX' variables.  Many GNU/Linux systems may
1      define this variable without your knowledge, causing 'gawk' to not
1      find the correct translations.  If this happens to you, look to see
1      if 'LANGUAGE' is defined, and if so, use the shell's 'unset'
1      command to remove it.
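
   A program can make the same check on its user's behalf.  The sketch
below is hypothetical and is not something 'gawk' itself does; it also
assumes that an empty 'LANGUAGE' value counts as unset, which depends on
the 'gettext' implementation in use.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Write a one-line diagnostic about a LANGUAGE value into buf.
   Returns nonzero if the value is set and not empty, that is, if it
   would override the LC_* variables (assumption: empty means unset). */
int describe_language(const char *value, char *buf, size_t size)
{
    assert(buf != NULL && size > 0);

    if (value == NULL || value[0] == '\0') {
        snprintf(buf, size, "LANGUAGE is not set");
        return 0;
    }
    snprintf(buf, size, "LANGUAGE=%s overrides the LC_* variables", value);
    return 1;
}

/* Apply the check to the real environment. */
int check_language(char *buf, size_t size)
{
    return describe_language(getenv("LANGUAGE"), buf, size);
}
```

   Keeping the test on the value separate from the call to 'getenv()'
makes the logic easy to exercise without changing the environment.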
1 
1    For testing translations of 'gawk' itself, you can set the
1 'GAWK_LOCALE_DIR' environment variable.  See the documentation for the C
1 'bindtextdomain()' function and also see ⇒Other Environment
1 Variables.
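
   An application can offer a similar override of its own.  The sketch
below shows one way to do it and makes no claim about how 'gawk'
implements 'GAWK_LOCALE_DIR': the variable 'GUIDE_LOCALE_DIR', the
'"guide"' text domain, and both function names are hypothetical.

```c
#include <assert.h>
#include <libintl.h>
#include <stdlib.h>

/* Pick the catalog directory: the override if it is set and not
   empty, otherwise the fallback. */
const char *choose_locale_dir(const char *override, const char *fallback)
{
    assert(fallback != NULL);
    return (override != NULL && override[0] != '\0') ? override : fallback;
}

/* Bind the hypothetical "guide" domain, honoring the hypothetical
   GUIDE_LOCALE_DIR variable.  Returns the directory now bound, or
   NULL if bindtextdomain() fails. */
const char *guide_bind_catalogs(const char *fallback)
{
    const char *dir = choose_locale_dir(getenv("GUIDE_LOCALE_DIR"), fallback);

    return bindtextdomain("guide", dir);
}
```

   With this in place, a translator could test freshly built '.gmo'
files by pointing the variable at a scratch directory, without
installing anything.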
1 
1    ---------- Footnotes ----------
1 
1    (1) For some operating systems, the 'gawk' port doesn't support GNU
1 'gettext'.  Therefore, these features are not available if you are using
1 one of those operating systems.  Sorry.
1 
1    (2) Americans use a comma to separate groups of three digits and a
1 period for the decimal point, while many Europeans do exactly the
1 opposite: 1,234.56 versus 1.234,56.
1