gettext: Plural forms

1 
1 11.2.6 Additional functions for plural forms
1 --------------------------------------------
1 
1    The functions of the ‘gettext’ family described so far (and all the
1 ‘catgets’ functions as well) have one problem in the real world which
1 has been neglected completely in all existing approaches: the handling
1 of plural forms.
1 
1    Looking through Unix source code before the time anybody thought
1 about internationalization (and, sadly, even afterwards) one can often
1 find code similar to the following:
1 
1         printf ("%d file%s deleted", n, n == 1 ? "" : "s");
1 
1 After the first complaints from people internationalizing the code,
1 people either completely avoided formulations like this or used strings
1 like ‘"file(s)"’.  Both look unnatural and should be avoided.  Early
1 attempts to solve the problem correctly looked like this:
1 
1         if (n == 1)
1           printf ("%d file deleted", n);
1         else
1           printf ("%d files deleted", n);
1 
1    But this does not solve the problem.  It helps languages where the
1 plural form of a noun is not simply constructed by adding an ‘s’, but
1 that is all.  Once again people fell into the trap of believing that the
1 rules of their own language are universal.  But the handling of plural
1 forms differs widely between the language families.  For example, Rafal
1 Maszkowski ‘<rzm@mat.uni.torun.pl>’ reports:
1 
1      In Polish we use e.g. plik (file) this way:
1           1 plik
1           2,3,4 pliki
1           5-21 pliko'w
1           22-24 pliki
1           25-31 pliko'w
1      and so on (o’ means 8859-2 oacute which should be rather okreska,
1      similar to aogonek).
1 
1    There are two things which can differ between languages (and even
1 inside language families):
1 
1    • The way plural forms are built differs.  This is a problem
1      with languages which have many irregularities.  German, for
1      instance, is a drastic case.  Though English and German are part of
1      the same language family (Germanic), the almost regular forming of
1      plural noun forms (appending an ‘s’) is hardly found in German.
1 
1    • The number of plural forms differs.  This is somewhat surprising for
1      those who only have experience with Romanic and Germanic languages
1      since here the number is the same (there are two).
1 
1      But other language families have only one form or many forms.  More
1      information on this is given further below.
1 
1    The consequence of this is that application writers should not try to
1 solve the problem in their code.  This would be localization since it is
1 only usable for certain, hardcoded language environments.  Instead the
1 extended ‘gettext’ interface should be used.
1 
1    Instead of the single key string, these extra functions take two
1 strings and a numerical argument.  The idea behind this is that using
1 the numerical argument and the first string as a key, the implementation
1 can select the right plural form using rules specified by the
1 translator.  The two string arguments are also used to provide a return
1 value in case no message catalog is found (similar to the normal
1 ‘gettext’ behavior).  In this case the rules for Germanic languages are
1 used and it is assumed that the first string argument is the singular
1 form, the second the plural form.
1 
1    This has the consequence that programs without language catalogs can
1 display the correct strings only if the program itself is written using
1 a Germanic language.  This is a limitation, but since the GNU C library
1 (as well as the GNU ‘gettext’ package) is written as part of the GNU
1 project and the coding standards for the GNU project require programs
1 to be written in English, this solution nevertheless fulfills its
1 purpose.
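
The fallback rule just described can be sketched in a few lines of C.  This only illustrates the documented behavior when no message catalog is found; it is not the library's implementation, and the name `fallback_plural` is hypothetical.

```c
/* Sketch of the no-catalog fallback: the Germanic rule selects the
   singular string only for n == 1.  'fallback_plural' is a
   hypothetical name, not part of the gettext API.  */
const char *
fallback_plural (const char *msgid1, const char *msgid2,
                 unsigned long int n)
{
  return n == 1 ? msgid1 : msgid2;
}
```

Note that zero selects the second string under this rule, which matches English usage ("0 files") but not every language.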
1 
1  -- Function: char * ngettext (const char *MSGID1, const char *MSGID2,
1           unsigned long int N)
1      The ‘ngettext’ function is similar to the ‘gettext’ function as it
1      finds the message catalogs in the same way.  But it takes two extra
1      arguments.  The MSGID1 parameter must contain the singular form of
1      the string to be converted.  It is also used as the key for the
1      search in the catalog.  The MSGID2 parameter is the plural form.
1      The parameter N is used to determine the plural form.  If no
1      message catalog is found, MSGID1 is returned if ‘n == 1’, otherwise
1      MSGID2.
1 
1      An example for the use of this function is:
1 
1           printf (ngettext ("%d file removed", "%d files removed", n), n);
1 
1      Please note that the numeric value N has to be passed to the
1      ‘printf’ function as well.  It is not sufficient to pass it only to
1      ‘ngettext’.
1 
1      In the English singular case, the number – always 1 – can be
1      replaced with "one":
1 
1           printf (ngettext ("One file removed", "%d files removed", n), n);
1 
1      This works because the ‘printf’ function discards excess arguments
1      that are not consumed by the format string.
1 
1      If this function is meant to yield a format string that takes two
1      or more arguments, you cannot use it like this:
1 
1           printf (ngettext ("%d file removed from directory %s",
1                             "%d files removed from directory %s",
1                             n),
1                   n, dir);
1 
1      because in many languages the translators want to replace the ‘%d’
1      with an explicit word in the singular case, just like “one” in
1      English, and C format strings cannot consume the second argument
1      but skip the first argument.  Instead, you have to reorder the
1      arguments so that ‘n’ comes last:
1 
1           printf (ngettext ("%2$d file removed from directory %1$s",
1                             "%2$d files removed from directory %1$s",
1                             n),
1                   dir, n);
1 
1      See ⇒c-format for details about this argument reordering
1      syntax.
1 
1      When you know that the value of ‘n’ is within a given range, you
1      can specify it as a comment directed to the ‘xgettext’ tool.  This
1      information may help translators to choose more suitable
1      translations.  For example:
1 
1           if (days > 7 && days < 14)
1             /* xgettext: range: 1..6 */
1             printf (ngettext ("one week and one day", "one week and %d days",
1                               days - 7),
1                     days - 7);
1 
1      It is also possible to use this function when the strings don’t
1      contain a cardinal number:
1 
1           puts (ngettext ("Delete the selected file?",
1                           "Delete the selected files?",
1                           n));
1 
1      In this case the number N is only used to choose the plural form.
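
The ‘ngettext’ calls shown above can be collected into a small sketch that compiles on its own.  The helper names `format_removed` and `format_removed_from` are hypothetical, `unsigned long` with `%lu` is used instead of `int` with `%d`, and `snprintf` replaces `printf` only so that the result can be inspected.  The sketch assumes a C library that provides ‘ngettext’ in `<libintl.h>` and supports the `%N$` argument reordering syntax.

```c
#include <libintl.h>
#include <stdio.h>

/* Hypothetical helper: N selects the plural form and is passed to
   snprintf as well, so that the %lu directive can print it.  */
int
format_removed (char *buf, size_t size, unsigned long int n)
{
  return snprintf (buf, size,
                   ngettext ("%lu file removed", "%lu files removed", n),
                   n);
}

/* Hypothetical helper with two arguments: N comes last, so that a
   translation may omit its directive (%2$lu) and still use %1$s.  */
int
format_removed_from (char *buf, size_t size, const char *dir,
                     unsigned long int n)
{
  return snprintf (buf, size,
                   ngettext ("%2$lu file removed from directory %1$s",
                             "%2$lu files removed from directory %1$s",
                             n),
                   dir, n);
}
```

A real program would normally call ‘setlocale’, ‘bindtextdomain’ and ‘textdomain’ before these helpers are used; without a catalog the English strings are returned unchanged.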
1 
1  -- Function: char * dngettext (const char *DOMAIN, const char *MSGID1,
1           const char *MSGID2, unsigned long int N)
1      The ‘dngettext’ function is similar to the ‘dgettext’ function in
1      the way the message catalog is selected.  The difference is that it
1      takes two extra parameters to provide the correct plural form.
1      These two parameters are handled in the same way ‘ngettext’ handles
1      them.
1 
1  -- Function: char * dcngettext (const char *DOMAIN, const char *MSGID1,
1           const char *MSGID2, unsigned long int N, int CATEGORY)
1      The ‘dcngettext’ function is similar to the ‘dcgettext’ function in
1      the way the message catalog is selected.  The difference is that it
1      takes two extra parameters to provide the correct plural form.
1      These two parameters are handled in the same way ‘ngettext’ handles
1      them.
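
As an illustration of the two domain-specific variants, the following sketch looks up a plural message in a library's own text domain.  The domain name `"mylib"`, the message strings and the function names are assumptions made up for this example.

```c
#include <libintl.h>
#include <locale.h>

/* "mylib" is a hypothetical text domain used only for illustration.  */
#define MYLIB_DOMAIN "mylib"

/* Look up a plural message in the library's own domain, leaving the
   program's default domain untouched.  */
const char *
mylib_items_message (unsigned long int n)
{
  return dngettext (MYLIB_DOMAIN, "%lu item", "%lu items", n);
}

/* The same lookup with the locale category given explicitly.  */
const char *
mylib_items_message_cat (unsigned long int n)
{
  return dcngettext (MYLIB_DOMAIN, "%lu item", "%lu items", n,
                     LC_MESSAGES);
}
```

Both functions return a format string, so the caller still has to pass the same number to ‘printf’.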
1 
1    Now, how do these functions solve the problem of the plural forms?
1 Without the input of linguists (which was not available) it was not
1 possible to determine whether there are only a few different ways in
1 which plural forms are formed or whether the number can increase with
1 every new supported language.
1 
1    Therefore the solution implemented is to allow the translator to
1 specify the rules of how to select the plural form.  Since the formula
1 varies with every language this is the only viable solution except for
1 hardcoding the information in the code (which still would require the
1 possibility of extensions to not prevent the use of new languages).
1 
1    The information about the plural form selection has to be stored in
1 the header entry of the PO file (the one with the empty ‘msgid’ string).
1 The plural form information looks like this:
1 
1      Plural-Forms: nplurals=2; plural=n == 1 ? 0 : 1;
1 
1    The ‘nplurals’ value must be a decimal number which specifies how
1 many different plural forms exist for this language.  The string
1 following ‘plural’ is an expression which is using the C language
1 syntax.  Exceptions are that no negative numbers are allowed, numbers
1 must be decimal, and the only variable allowed is ‘n’.  Spaces are
1 allowed in the expression, but backslash-newlines are not; in the
1 examples below the backslash-newlines are present for formatting
1 purposes only.  This expression will be evaluated whenever one of the
1 functions ‘ngettext’, ‘dngettext’, or ‘dcngettext’ is called.  The
1 numeric value passed to these functions is then substituted for all uses
1 of the variable ‘n’ in the expression.  The resulting value then must be
1 greater than or equal to zero and smaller than the value given as the
1 value of ‘nplurals’.
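
To make the evaluation step concrete, the header entry shown above can be written as ordinary C.  This is a sketch under stated assumptions rather than the way the library evaluates the expression: the names `plural_index` and `select_form` are hypothetical, and returning `NULL` for an out-of-range index is a choice made for this example only.

```c
#include <stddef.h>

/* The header "nplurals=2; plural=n == 1 ? 0 : 1;" written as C.  */
enum { NPLURALS = 2 };

unsigned long int
plural_index (unsigned long int n)
{
  return n == 1 ? 0 : 1;
}

/* Hypothetical selection step: FORMS holds the NPLURALS translated
   strings of one message.  The index must be smaller than NPLURALS;
   this sketch returns NULL otherwise.  */
const char *
select_form (const char *const forms[], unsigned long int n)
{
  unsigned long int idx = plural_index (n);
  return idx < NPLURALS ? forms[idx] : NULL;
}
```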
1 
1 The following rules are known at this point.  The languages are listed
1 with their families.  But this does not necessarily mean the information
1 can be generalized for the whole family (as can be easily seen in the
1 table below).(1)
1 
1 Only one form:
1      Some languages only require one single form.  There is no
1      distinction between the singular and plural form.  An appropriate
1      header entry would look like this:
1 
1           Plural-Forms: nplurals=1; plural=0;
1 
1      Languages with this property include:
1 
1      Asian family
1           Japanese, Vietnamese, Korean
1      Tai-Kadai family
1           Thai
1 
1 Two forms, singular used for one only
1      This is the form used in most existing programs since it is what
1      English is using.  A header entry would look like this:
1 
1           Plural-Forms: nplurals=2; plural=n != 1;
1 
1      (Note: this uses the feature of C expressions that boolean
1      expressions have the value zero or one.)
1 
1      Languages with this property include:
1 
1      Germanic family
1           English, German, Dutch, Swedish, Danish, Norwegian, Faroese
1      Romanic family
1           Spanish, Portuguese, Italian
1      Latin/Greek family
1           Greek
1      Slavic family
1           Bulgarian
1      Finno-Ugric family
1           Finnish, Estonian
1      Semitic family
1           Hebrew
1      Austronesian family
1           Bahasa Indonesian
1      Artificial
1           Esperanto
1 
1      Other languages using the same header entry are:
1 
1      Finno-Ugric family
1           Hungarian
1      Turkic/Altaic family
1           Turkish
1 
1      Hungarian does not appear to have a plural if you look at sentences
1      involving cardinal numbers.  For example, “1 apple” is “1 alma”,
1      and “123 apples” is “123 alma”.  But when the number is not
1      explicit, the distinction between singular and plural exists: “the
1      apple” is “az alma”, and “the apples” is “az almák”.  Since
1      ‘ngettext’ has to support both types of sentences, it is classified
1      here, under “two forms”.
1 
1      The same holds for Turkish: “1 apple” is “1 elma”, and “123 apples”
1      is “123 elma”.  But when the number is omitted, the distinction
1      between singular and plural exists: “the apple” is “elma”, and “the
1      apples” is “elmalar”.
1 
1 Two forms, singular used for zero and one
1      This is an exceptional case within the language family.  The header
1      entry would be:
1 
1           Plural-Forms: nplurals=2; plural=n>1;
1 
1      Languages with this property include:
1 
1      Romanic family
1           Brazilian Portuguese, French
1 
1 Three forms, special case for zero
1      The header entry would be:
1 
1           Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11 ? 0 : n != 0 ? 1 : 2;
1 
1      Languages with this property include:
1 
1      Baltic family
1           Latvian
1 
1 Three forms, special cases for one and two
1      The header entry would be:
1 
1           Plural-Forms: nplurals=3; plural=n==1 ? 0 : n==2 ? 1 : 2;
1 
1      Languages with this property include:
1 
1      Celtic
1           Gaeilge (Irish)
1 
1 Three forms, special case for numbers ending in 00 or [2-9][0-9]
1      The header entry would be:
1 
1           Plural-Forms: nplurals=3; \
1               plural=n==1 ? 0 : (n==0 || (n%100 > 0 && n%100 < 20)) ? 1 : 2;
1 
1      Languages with this property include:
1 
1      Romanic family
1           Romanian
1 
1 Three forms, special case for numbers ending in 1[2-9]
1      The header entry would look like this:
1 
1           Plural-Forms: nplurals=3; \
1               plural=n%10==1 && n%100!=11 ? 0 : \
1                      n%10>=2 && (n%100<10 || n%100>=20) ? 1 : 2;
1 
1      Languages with this property include:
1 
1      Baltic family
1           Lithuanian
1 
1 Three forms, special cases for numbers ending in 1 and 2, 3, 4, except those ending in 1[1-4]
1      The header entry would look like this:
1 
1           Plural-Forms: nplurals=3; \
1               plural=n%10==1 && n%100!=11 ? 0 : \
1                      n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;
1 
1      Languages with this property include:
1 
1      Slavic family
1           Russian, Ukrainian, Belarusian, Serbian, Croatian
1 
1 Three forms, special cases for 1 and 2, 3, 4
1      The header entry would look like this:
1 
1           Plural-Forms: nplurals=3; \
1               plural=(n==1) ? 0 : (n>=2 && n<=4) ? 1 : 2;
1 
1      Languages with this property include:
1 
1      Slavic family
1           Czech, Slovak
1 
1 Three forms, special case for one and some numbers ending in 2, 3, or 4
1      The header entry would look like this:
1 
1           Plural-Forms: nplurals=3; \
1               plural=n==1 ? 0 : \
1                      n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;
1 
1      Languages with this property include:
1 
1      Slavic family
1           Polish
1 
1 Four forms, special case for one and all numbers ending in 02, 03, or 04
1      The header entry would look like this:
1 
1           Plural-Forms: nplurals=4; \
1               plural=n%100==1 ? 0 : n%100==2 ? 1 : n%100==3 || n%100==4 ? 2 : 3;
1 
1      Languages with this property include:
1 
1      Slavic family
1           Slovenian
1 
1 Six forms, special cases for zero, one, two, all numbers ending in 03, 04, … 10, all numbers ending in 11 … 99, and others
1      The header entry would look like this:
1 
1           Plural-Forms: nplurals=6; \
1               plural=n==0 ? 0 : n==1 ? 1 : n==2 ? 2 : n%100>=3 && n%100<=10 ? 3 \
1               : n%100>=11 ? 4 : 5;
1 
1      Languages with this property include:
1 
1      Afroasiatic family
1           Arabic
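
Because the ‘plural’ expressions use C syntax, the rules in this table can be tried out directly as C functions.  The sketch below transcribes four of the header entries above (French, Russian, Polish and Arabic); the function names are hypothetical and the functions are not part of any library.

```c
/* Each function is a direct transcription of the 'plural='
   expression from the corresponding header entry above.  The
   function names are hypothetical.  */

/* Two forms, singular used for zero and one (French).  */
unsigned long int
plural_french (unsigned long int n)
{
  return n > 1;
}

/* Three forms (Russian and the other Slavic languages listed with it).  */
unsigned long int
plural_russian (unsigned long int n)
{
  return n%10==1 && n%100!=11 ? 0
         : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;
}

/* Three forms (Polish).  */
unsigned long int
plural_polish (unsigned long int n)
{
  return n==1 ? 0
         : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;
}

/* Six forms (Arabic).  */
unsigned long int
plural_arabic (unsigned long int n)
{
  return n==0 ? 0 : n==1 ? 1 : n==2 ? 2
         : n%100>=3 && n%100<=10 ? 3
         : n%100>=11 ? 4 : 5;
}
```

The Polish function reproduces the table quoted earlier in this section: index 0 for 1, index 1 for 2 to 4 and 22 to 24, and index 2 for 5 to 21 and 25 to 31.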
1 
1    You might now ask: ‘ngettext’ handles only numbers N of type
1 ‘unsigned long’.  What about larger integer types?  What about negative
1 numbers?  What about floating-point numbers?
1 
1    About larger integer types, such as ‘uintmax_t’ or ‘unsigned long
1 long’: they can be handled by reducing the value to a range that fits in
1 an ‘unsigned long’.  Simply casting the value to ‘unsigned long’ would
1 not do the right thing, since it would treat ‘ULONG_MAX + 1’ like zero,
1 ‘ULONG_MAX + 2’ like singular, and the like.  Here you can exploit the
1 fact that all mentioned plural form formulas eventually become periodic,
1 with a period that is a divisor of 100 (or 1000 or 1000000).  So, when
1 you reduce a large value to another one in the range [1000000, 1999999]
1 that ends in the same 6 decimal digits, you can assume that it will lead
1 to the same plural form selection.  This code does this:
1 
1      #include <inttypes.h>
1      #include <limits.h>     /* for ULONG_MAX */
1      uintmax_t nbytes = ...;
1      /* Reduce values that do not fit in an unsigned long.  */
1      printf (ngettext ("The file has %"PRIuMAX" byte.",
1                        "The file has %"PRIuMAX" bytes.",
1                        (nbytes > ULONG_MAX
1                         ? (nbytes % 1000000) + 1000000
1                         : nbytes)),
1              nbytes);
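
The reduction step can be isolated in a helper so that it can be checked on its own.  In the sketch below the name `reduce_for_plural` and its `limit` parameter are hypothetical; in real code the limit would be ‘ULONG_MAX’, as in the example above.

```c
#include <stdint.h>

/* Hypothetical helper: map N to a value that ends in the same 6
   decimal digits and lies in the range [1000000, 1999999] whenever N
   exceeds LIMIT.  LIMIT is assumed to be at least 1999999.  Values
   that already fit are returned unchanged.  */
uintmax_t
reduce_for_plural (uintmax_t n, uintmax_t limit)
{
  return n > limit ? (n % 1000000) + 1000000 : n;
}
```

For instance, with a limit of 4294967295 the value 1234567890123 is reduced to 1890123, which ends in the same six digits.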
1 
1    Negative and floating-point values usually represent physical
1 entities for which singular and plural don’t clearly apply.  In such
1 cases, there is no need to use ‘ngettext’; a simple ‘gettext’ call with
1 a form suitable for all values will do.  For example:
1 
1      printf (gettext ("Time elapsed: %.3f seconds"),
1              num_milliseconds * 0.001);
1 
1 Even if NUM_MILLISECONDS happens to be a multiple of 1000, the output
1      Time elapsed: 1.000 seconds
1 is acceptable in English, and similarly for other languages.
1 
1    The translators’ perspective regarding plural forms is explained in
1 ⇒Translating plural forms.
1 
1    ---------- Footnotes ----------
1 
1    (1) Additions are welcome.  Send appropriate information to
1 <bug-gnu-gettext@gnu.org> and <bug-glibc-manual@gnu.org>.  The Unicode
1 CLDR Project (<http://cldr.unicode.org>) provides a comprehensive set of
1 plural forms in a different format.  The ‘msginit’ program has
1 preliminary support for the format so you can use it as a baseline
1 (⇒msginit Invocation).
1