gettext: Aspects

1.3 Aspects in Native Language Support
======================================

   For a totally multi-lingual distribution, there are many things to
translate beyond output messages.

   • As of today, GNU ‘gettext’ offers a complete toolset for
     translating messages output by C programs.  Perl scripts and shell
     scripts will also need to be translated.  Even if there are today
     some hooks by which this can be done, these hooks are not
     integrated as well as they should be.

   • Some programs, like ‘autoconf’ or ‘bison’, are able to produce
     other programs (or scripts).  Even if the generating programs
     themselves are internationalized, the generated programs they
     produce may need internationalization on their own, and this
     indirect internationalization could be automated right from the
     generating program.  In fact, quite usually, generating and
     generated programs could be internationalized independently, as the
     effort needed is fairly orthogonal.

   • A few programs include textual tables which might need translation
     themselves, independently of the strings contained in the program
     itself.  For example, RFC 1345 gives an English description for
     each character which the ‘recode’ program is able to reconstruct at
     execution.  Since these descriptions are extracted from the RFC by
     mechanical means, translating them properly would require a prior
     translation of the RFC itself.

   • Almost all programs accept options, which are often spelled out so
     as to be descriptive for English readers; one might want to
     consider offering translated versions of program options as well.

   • Many programs read, interpret, compile, or are in some way driven
     by input files which are texts containing keywords, identifiers, or
     replies which are inherently translatable.  For example, one may
     want ‘gcc’ to allow diacriticized characters in identifiers or use
     translated keywords; ‘rm -i’ might accept something other than ‘y’
     or ‘n’ for replies, etc.  Even if the program will eventually
     produce most of its output in foreign languages, one has to decide
     whether the input syntax, option values, etc., are to be localized
     or not.

   • The manual accompanying a package, as well as all documentation
     files in the distribution, could surely be translated, too.
     Translating a manual with the intent of later keeping up with
     updates is generally a major undertaking in itself.

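   As an illustration of the ‘rm -i’ item above, here is a minimal
sketch of how a C program on a POSIX system might accept localized
yes/no replies.  It relies on ‘nl_langinfo’ with ‘YESEXPR’ and
‘NOEXPR’, which return regular expressions describing affirmative and
negative answers in the current locale.  The helper names ‘matches’ and
‘classify_reply’ are hypothetical, and the program is assumed to have
called ‘setlocale (LC_ALL, "")’ at startup.

```c
#include <langinfo.h>
#include <regex.h>
#include <stddef.h>

/* Return 1 if REPLY matches the extended regular expression PATTERN,
   0 otherwise (including when PATTERN fails to compile).  */
static int
matches (const char *pattern, const char *reply)
{
  regex_t re;
  int found;

  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return 0;
  found = (regexec (&re, reply, 0, NULL, 0) == 0);
  regfree (&re);
  return found;
}

/* Hypothetical helper: classify REPLY according to the current locale.
   Returns 1 for an affirmative answer, 0 for a negative one, and -1
   when the reply is neither.  */
int
classify_reply (const char *reply)
{
  if (matches (nl_langinfo (YESEXPR), reply))
    return 1;
  if (matches (nl_langinfo (NOEXPR), reply))
    return 0;
  return -1;
}
```

   In the default "C" locale, replies beginning with ‘y’ or ‘Y’ count
as affirmative and those beginning with ‘n’ or ‘N’ as negative; another
locale can supply its own patterns.  Whether a given program should
accept such localized replies remains the design decision described
above.
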
   As we already stressed, translation is only one aspect of locales.
Other internationalization aspects are system services and are handled
in GNU ‘libc’.  Many attributes are needed to define a country’s
cultural conventions.  Besides the country’s native language, these
attributes include the formatting of the date and time, the
representation of numbers, the symbols for currency, etc.  These local
"rules" are termed the country’s locale.  The locale represents the
knowledge needed to support the country’s native attributes.

   There are a few major areas which may vary between countries and
hence define what a locale must describe.  The following list helps
put multi-lingual messages into the proper context of other tasks
related to locales.  See the GNU ‘libc’ manual for details.

_Characters and Codesets_

     The codeset most commonly used throughout the USA and most
     English-speaking parts of the world is the ASCII codeset.  However,
     there are many characters needed by various locales that are not
     found within this codeset.  The 8-bit ISO 8859-1 codeset has most
     of the special characters needed to handle the major European
     languages.  However, in many cases, choosing ISO 8859-1 is
     nevertheless not adequate: it doesn’t even handle the symbol of
     the major European currency.  Hence each locale will need to
     specify which codeset it uses and will need to have the
     appropriate character handling routines to cope with that codeset.

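     As a minimal sketch of how a program can find out which codeset
     a locale specifies, the following C fragment queries
     ‘nl_langinfo (CODESET)’ on a POSIX system.  The helper name
     ‘codeset_of’ is hypothetical.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: select LOCALE_NAME for character handling and
   return the name of its codeset, or NULL if the locale is not
   available.  Passing "" selects the locale from the environment.  */
const char *
codeset_of (const char *locale_name)
{
  if (setlocale (LC_CTYPE, locale_name) == NULL)
    return NULL;
  return nl_langinfo (CODESET);
}
```

     The exact string returned for a given codeset is system-dependent,
     so a program should not compare it against a single hard-coded
     spelling.
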
_Currency_

     The symbols used vary from country to country, as does the
     position of the symbol relative to the amount.  Software needs to
     be able to transparently display currency figures in the native
     mode for each locale.

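     The symbol and its position can be read from the structure
     returned by the ISO C function ‘localeconv’.  The sketch below is
     deliberately simplified: the helper name ‘format_currency’ is
     hypothetical, the amount is passed as already formatted digits,
     and negative amounts and the finer spacing rules are ignored.

```c
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: write AMOUNT into BUF together with the currency
   symbol of the current locale, placed before or after the amount as
   the locale prescribes.  Returns the value of snprintf.  */
int
format_currency (char *buf, size_t size, const char *amount)
{
  const struct lconv *lc = localeconv ();
  const char *symbol = lc->currency_symbol;
  const char *space = (lc->p_sep_by_space == 1) ? " " : "";

  /* No symbol is defined, as in the "C" locale: print the amount only.  */
  if (symbol[0] == '\0')
    return snprintf (buf, size, "%s", amount);

  /* The symbol precedes the amount.  */
  if (lc->p_cs_precedes == 1)
    return snprintf (buf, size, "%s%s%s", symbol, space, amount);

  /* Otherwise the symbol follows the amount.  */
  return snprintf (buf, size, "%s%s%s", amount, space, symbol);
}
```

     On POSIX systems, ‘strfmon’ offers a more complete treatment of
     the same locale data.
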
_Dates_

     The format of dates varies between locales.  For example, Christmas
     day in 1994 is written as 12/25/94 in the USA and as 25/12/94 in
     Australia.  Other countries might use ISO 8601 dates, etc.

     The time of day may be noted as HH:MM, HH.MM, or otherwise.  Some
     locales require time to be specified in 24-hour mode rather than as
     AM or PM.  Further, the nature and yearly extent of the Daylight
     Saving correction vary widely between countries.

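     In C, the ‘%x’ conversion of ‘strftime’ produces the date
     representation of the current locale, so a program need not
     hard-code the order of day and month.  The following is a minimal
     sketch; the helper name ‘format_date’ is hypothetical.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: format the given calendar date in the style of
   the current LC_TIME locale.  Returns the number of bytes written,
   or 0 if BUF is too small.  */
size_t
format_date (char *buf, size_t size, int year, int month, int day)
{
  struct tm tm = { 0 };

  tm.tm_year = year - 1900;   /* years since 1900 */
  tm.tm_mon = month - 1;      /* months since January */
  tm.tm_mday = day;
  tm.tm_hour = 12;            /* noon, away from any DST transition */
  tm.tm_isdst = -1;           /* let mktime determine DST */
  mktime (&tm);               /* fills in tm_wday and tm_yday */
  return strftime (buf, size, "%x", &tm);
}
```

     In the default "C" locale, ‘format_date (buf, sizeof buf, 1994,
     12, 25)’ yields 12/25/94, since ISO C defines ‘%x’ as
     ‘%m/%d/%y’ there.  After ‘setlocale (LC_TIME, "")’, the result
     follows the user's locale instead.
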
_Numbers_

     Numbers can be represented differently in different locales.  For
     example, the following numbers are all written correctly for their
     respective locales:

          12,345.67       English
          12.345,67       German
           12345,67       French
          1,2345.67       Asia

     Some programs could go further and use different unit systems, like
     English units or Metric units, or even take into account variants
     in how numbers are spelled out in full.

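     A C program sees these conventions through the ‘LC_NUMERIC’
     locale category: the ‘printf’ family takes its decimal point from
     it, and ‘localeconv’ exposes the separator used between groups of
     digits.  The helper names in this minimal sketch are hypothetical.

```c
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: format VALUE with two decimals.  The decimal
   point character comes from the current LC_NUMERIC locale.  */
int
format_number (char *buf, size_t size, double value)
{
  return snprintf (buf, size, "%.2f", value);
}

/* Hypothetical helper: return the current locale's separator between
   groups of digits; an empty string means that digits are not
   grouped.  */
const char *
thousands_separator (void)
{
  return localeconv ()->thousands_sep;
}
```

     In the default "C" locale the decimal point is a period and no
     grouping separator is defined.  A program opts into the user's
     conventions by calling ‘setlocale (LC_NUMERIC, "")’ or
     ‘setlocale (LC_ALL, "")’ first.
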
_Messages_

     The most obvious area is the language support within a locale.
     This is where GNU ‘gettext’ provides the means for developers and
     users to easily change the language that the software uses to
     communicate with the user.

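     For orientation, here is a minimal sketch of what this looks like
     in a C program.  The domain name ‘example-package’ and the catalog
     directory are placeholders chosen for illustration, and the helper
     names are hypothetical; only ‘setlocale’, ‘bindtextdomain’,
     ‘textdomain’, and ‘gettext’ are actual library functions.

```c
#include <libintl.h>
#include <locale.h>

/* Conventional shorthand for marking and translating a string.  */
#define _(msgid) gettext (msgid)

/* Hypothetical setup: adopt the user's locale and select the message
   catalogs of an imaginary package.  Both the domain name and the
   directory are placeholders.  */
void
init_messages (void)
{
  setlocale (LC_ALL, "");
  bindtextdomain ("example-package", "/usr/share/locale");
  textdomain ("example-package");
}

/* Return the greeting in the user's language, or the original English
   string when no translation is available.  */
const char *
greeting (void)
{
  return _("Hello, world!");
}
```

     When no catalog is found for the selected language, ‘gettext’
     returns its argument unchanged, so the program keeps working in
     English.
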
   These areas of cultural conventions are called _locale categories_.
It is an unfortunate term; _locale aspects_ or _locale feature
categories_ would be better terms, because each “locale category”
describes an area or task that requires localization.  The concrete data
that describes the cultural conventions for such an area and for a
particular culture is also called a _locale category_.  In this sense, a
locale is composed of several locale categories: the locale category
describing the codeset, the locale category describing the formatting of
numbers, the locale category containing the translated messages, and so
on.

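   In C, each locale category is selected by an ‘LC_*’ constant passed
to ‘setlocale’: ‘LC_CTYPE’ for characters and codesets, ‘LC_MONETARY’
for currency, ‘LC_TIME’ for dates, ‘LC_NUMERIC’ for numbers, and
‘LC_MESSAGES’ (a POSIX addition) for messages.  The sketch below shows
that categories can be queried and set independently; the helper names
are hypothetical.

```c
#include <locale.h>

/* Hypothetical helper: return the name of the locale currently in
   effect for CATEGORY (LC_CTYPE, LC_NUMERIC, LC_TIME, LC_MONETARY,
   LC_MESSAGES, ...) without changing it.  */
const char *
category_locale (int category)
{
  return setlocale (category, NULL);
}

/* Hypothetical helper: take every category from the environment, but
   keep number formatting in the "C" locale, for instance so that data
   files written by the program stay machine-readable.  */
void
use_environment_except_numbers (void)
{
  setlocale (LC_ALL, "");
  setlocale (LC_NUMERIC, "C");
}
```

   At program startup every category is set to the "C" locale, so a
program that never calls ‘setlocale’ sees none of the user's
conventions.
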
   Components of locale outside of message handling are standardized in
the ISO C standard and the POSIX:2001 standard (also known as the SUSv3
specification).  GNU ‘libc’ fully implements this, and most other modern
systems provide more or less reasonable support for at least some of
the missing components.