gettext: Aspects
1
1 1.3 Aspects in Native Language Support
1 ======================================
1
1 For a totally multi-lingual distribution, there are many things to
1 translate beyond output messages.
1
1 • As of today, GNU ‘gettext’ offers a complete toolset for
1 translating messages output by C programs. Perl scripts and shell
1 scripts will also need to be translated. Although some hooks exist
1 today by which this can be done, they are not integrated as well as
1 they should be.
1
1 • Some programs, like ‘autoconf’ or ‘bison’, are able to produce
1 other programs (or scripts). Even if the generating programs
1 themselves are internationalized, the generated programs they
1 produce may need internationalization on their own, and this
1 indirect internationalization could be automated right from the
1 generating program. In fact, quite usually, generating and
1 generated programs could be internationalized independently, as the
1 effort needed is fairly orthogonal.
1
1 • A few programs include textual tables which might need translation
1 themselves, independently of the strings contained in the program
1 itself. For example, RFC 1345 gives an English description for
1 each character which the ‘recode’ program is able to reconstruct at
1 execution. Since these descriptions are extracted from the RFC by
1 mechanical means, translating them properly would require a prior
1 translation of the RFC itself.
1
1 • Almost all programs accept options, which are often spelled out so
1 as to be descriptive for English readers; one might want to
1 consider offering translated versions of program options as well.
1
1 • Many programs read, interpret, compile, or are somewhat driven by
1 input files which are texts containing keywords, identifiers, or
1 replies which are inherently translatable. For example, one may
1 want ‘gcc’ to allow diacriticized characters in identifiers or to
1 use translated keywords; ‘rm -i’ might accept something other than
1 ‘y’ or ‘n’ for replies, etc. Even if the program will eventually
1 produce most of its output in foreign languages, one has to decide
1 whether the input syntax, option values, etc., are to be localized
1 or not.
1
1 • The manual accompanying a package, as well as all documentation
1 files in the distribution, could surely be translated, too.
1 Translating a manual, with the intent of later keeping up with
1 updates, is generally a major undertaking in itself.
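
The ‘rm -i’ item in the list above can be sketched in a few lines.  This
is only an illustration, written in Python rather than C: it assumes a
POSIX system where the ‘locale’ module exposes ‘nl_langinfo’ and
‘YESEXPR’, and it assumes that the pattern returned by the C library is
simple enough for Python's ‘re’ module, which is not guaranteed.  The
helper name ‘is_affirmative’ is hypothetical.

```python
import locale
import re


def is_affirmative(reply: str) -> bool:
    """Return True if `reply` matches the current locale's "yes" pattern."""
    # YESEXPR is a regular expression supplied by the C library for the
    # active locale; in the "C" locale it only accepts replies starting
    # with "y" or "Y".
    pattern = locale.nl_langinfo(locale.YESEXPR)
    return re.match(pattern, reply) is not None


# The portable "C" locale keeps this sketch reproducible.  A real program
# would call locale.setlocale(locale.LC_ALL, "") to adopt the user's
# environment instead.
locale.setlocale(locale.LC_ALL, "C")
```

With another locale active, the same function would accept that
language's affirmative replies without any change to the program logic.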
1
1 As we already stressed, translation is only one aspect of locales.
1 Other internationalization aspects are system services and are handled
1 in GNU ‘libc’. Many attributes are needed to define a country’s
1 cultural conventions. Besides the country’s native language, these
1 attributes include the formatting of dates and times, the
1 representation of numbers, the symbols for currency, etc. These local
1 "rules" are termed the country’s locale. The locale represents the
1 knowledge needed to support the country’s native attributes.
1
1 There are a few major areas which may vary between countries and
1 hence define what a locale must describe. The following list helps
1 to put multi-lingual messages into the proper context of other tasks
1 related to locales. See the GNU ‘libc’ manual for details.
1
1 _Characters and Codesets_
1
1 The codeset most commonly used throughout the USA and most
1 English-speaking parts of the world is the ASCII codeset. However,
1 many characters needed by various locales are not found within
1 this codeset. The 8-bit ISO 8859-1 codeset has most of the
1 special characters needed to handle the major European languages.
1 However, in many cases, choosing ISO 8859-1 is nevertheless not
1 adequate: it doesn’t even handle the euro, the major European
1 currency. Hence each locale needs to specify which codeset it uses
1 and needs to have the appropriate character handling routines to
1 cope with that codeset.
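
The limits of these codesets can be checked directly.  The following
Python sketch is only an illustration; the helper name ‘can_encode’ is
hypothetical, and ISO 8859-15 is mentioned only as one example of a
codeset that does contain the euro sign.

```python
def can_encode(text: str, codeset: str) -> bool:
    """Return True if every character of `text` exists in `codeset`."""
    try:
        text.encode(codeset)
    except UnicodeEncodeError:
        return False
    return True


# An accented letter is outside ASCII but inside ISO 8859-1.
accent_in_ascii = can_encode("é", "ascii")
accent_in_latin1 = can_encode("é", "iso-8859-1")

# The euro sign is missing from ISO 8859-1 but present in ISO 8859-15.
euro_in_latin1 = can_encode("€", "iso-8859-1")
euro_in_latin9 = can_encode("€", "iso-8859-15")
```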
1
1 _Currency_
1
1 The symbols used vary from country to country, as does the position
1 of the symbol relative to the amount. Software needs to be able to
1 transparently display currency figures in the native mode for each
1 locale.
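
As a rough sketch of what "position of the symbol" means in practice,
the following Python fragment places a symbol before or after an
already formatted amount.  The key names are those returned by Python's
‘locale.localeconv()’, but the two dictionaries are hand-written
illustrative values, not data read from an installed locale, and the
helper ‘place_symbol’ is hypothetical.  The handling of
‘p_sep_by_space’ is simplified to a yes/no flag.

```python
def place_symbol(amount: str, conv: dict) -> str:
    """Attach a currency symbol to an already formatted amount.

    `conv` uses three key names from locale.localeconv():
    currency_symbol, p_cs_precedes and p_sep_by_space.
    """
    symbol = conv["currency_symbol"]
    # Simplification: any non-zero p_sep_by_space means "one space".
    separator = " " if conv["p_sep_by_space"] else ""
    if conv["p_cs_precedes"]:
        return symbol + separator + amount
    return amount + separator + symbol


# Hand-written conventions for illustration only.
SYMBOL_FIRST = {"currency_symbol": "$", "p_cs_precedes": 1, "p_sep_by_space": 0}
SYMBOL_LAST = {"currency_symbol": "€", "p_cs_precedes": 0, "p_sep_by_space": 1}

before = place_symbol("12.50", SYMBOL_FIRST)
after = place_symbol("12,50", SYMBOL_LAST)
```

A program relying on the system locale would pass the result of
‘locale.localeconv()’ instead of a hand-written dictionary.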
1
1 _Dates_
1
1 The format of dates varies between locales. For example, Christmas
1 day in 1994 is written as 12/25/94 in the USA and as 25/12/94 in
1 Australia. Other countries might use ISO 8601 dates, etc.
1
1 The time of day may be noted as HH:MM, HH.MM, or otherwise. Some
1 locales require time to be specified in 24-hour mode rather than as
1 AM or PM. Further, the nature and yearly extent of the Daylight
1 Saving correction vary widely between countries.
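
The three date notations mentioned above can be reproduced with explicit
format strings, as in this small Python sketch.  The format strings are
spelled out here only for illustration; a localized program would
normally let the locale choose the layout, for example through the ‘%x’
conversion of ‘strftime’.

```python
from datetime import date

christmas = date(1994, 12, 25)

# Month first, as commonly written in the USA.
us_style = christmas.strftime("%m/%d/%y")

# Day first, as commonly written in Australia.
au_style = christmas.strftime("%d/%m/%y")

# ISO 8601 calendar date.
iso_style = christmas.isoformat()
```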
1
1 _Numbers_
1
1 Numbers can be represented differently in different locales. For
1 example, the following numbers are all written correctly for their
1 respective locales:
1
1 12,345.67       English
1 12.345,67       German
1  12345,67       French
1 1,2345.67       Asia
1
1 Some programs could go further and use different unit systems, like
1 English units or Metric units, or even take into account variants
1 about how numbers are spelled in full.
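
The four notations in the table above differ in only three parameters:
the grouping separator, the decimal point, and the size of a digit
group.  The following Python sketch makes that explicit.  The function
‘format_number’ is hypothetical and fixes the number of decimals at two;
a real program would take these parameters from the locale rather than
pass them by hand.

```python
def format_number(value: float, thousands_sep: str, decimal_point: str,
                  group_size: int = 3) -> str:
    """Format `value` with two decimals using the given conventions."""
    sign = "-" if value < 0 else ""
    integer, fraction = f"{abs(value):.2f}".split(".")
    groups = []
    # Cut digit groups off the right end; skip grouping entirely when no
    # separator is given.
    while thousands_sep and len(integer) > group_size:
        groups.insert(0, integer[-group_size:])
        integer = integer[:-group_size]
    groups.insert(0, integer)
    return sign + thousands_sep.join(groups) + decimal_point + fraction


english = format_number(12345.67, ",", ".")
german = format_number(12345.67, ".", ",")
french = format_number(12345.67, "", ",")
four_digit_groups = format_number(12345.67, ",", ".", group_size=4)
```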
1
1 _Messages_
1
1 The most obvious area is the language support within a locale.
1 This is where GNU ‘gettext’ provides the means for developers and
1 users to easily change the language that the software uses to
1 communicate with the user.
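
To give an idea of what this looks like from the developer's side, here
is a minimal sketch using Python's standard ‘gettext’ module, which
reads the same kind of message catalogs.  The domain name ‘hello’, the
directory ‘locale’ and the helper ‘load_catalog’ are hypothetical.  With
‘fallback=True’, a missing catalog is not an error: the original English
string is returned unchanged.

```python
import gettext


def load_catalog(domain: str, localedir: str, languages: list):
    """Return a translations object, falling back to the original strings."""
    return gettext.translation(domain, localedir=localedir,
                               languages=languages, fallback=True)


# Hypothetical domain and directory; no catalog needs to exist for this
# sketch to run.
catalog = load_catalog("hello", "locale", ["de"])
_ = catalog.gettext

# Strings wrapped in _() are looked up in the catalog at run time.
greeting = _("Hello, world!")
```

The program text stays in English; translators supply catalogs
separately, and users select among them through their locale settings.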
1
1 These areas of cultural conventions are called _locale categories_.
1 It is an unfortunate term; _locale aspects_ or _locale feature
1 categories_ would be better terms, because each “locale category”
1 describes an area or task that requires localization. The concrete data
1 that describes the cultural conventions for such an area and for a
1 particular culture is also called a _locale category_. In this sense, a
1 locale is composed of several locale categories: the locale category
1 describing the codeset, the locale category describing the formatting of
1 numbers, the locale category containing the translated messages, and so
1 on.
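
In programs, each locale category is addressed by its own constant and
can be queried or set independently.  The following Python sketch lists
the categories that correspond to the areas discussed above.  It assumes
a POSIX system, since ‘LC_MESSAGES’ is not available everywhere, and the
helper ‘current_settings’ is hypothetical.

```python
import locale

# Category constants matching the areas described above.
CATEGORIES = {
    "LC_CTYPE": locale.LC_CTYPE,        # characters and codesets
    "LC_MONETARY": locale.LC_MONETARY,  # currency
    "LC_TIME": locale.LC_TIME,          # dates and times
    "LC_NUMERIC": locale.LC_NUMERIC,    # numbers
    "LC_MESSAGES": locale.LC_MESSAGES,  # translated messages
}


def current_settings() -> dict:
    """Return the name of the locale currently active for each category."""
    # Calling setlocale() with only a category queries it without
    # changing anything.
    return {name: locale.setlocale(cat) for name, cat in CATEGORIES.items()}


# Select the portable "C" locale for all categories at once, then query
# each category separately.
locale.setlocale(locale.LC_ALL, "C")
settings = current_settings()
```

A single category could equally be switched on its own, for example
‘locale.setlocale(locale.LC_TIME, "")’ to follow the user's environment
for dates only.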
1
1 Components of a locale outside of message handling are standardized in
1 the ISO C standard and the POSIX:2001 standard (also known as the SUSv3
1 specification). GNU ‘libc’ fully implements them, and most other modern
1 systems provide more or less reasonable support for at least some of
1 the missing components.
1