gettext: Preparing Strings

4.3 Preparing Translatable Strings
==================================

   Before strings can be marked for translation, they sometimes need to
be adjusted.  Usually preparing a string for translation is done right
before marking it, during the marking phase which is described in the
next sections.  What you have to keep in mind while doing that is the
following.

   • Decent English style.

   • Entire sentences.

   • Split at paragraphs.

   • Use format strings instead of string concatenation.

   • Avoid unusual markup and unusual control characters.

Let’s look at some examples of these guidelines.

   Translatable strings should be in good English style.  If slang
with abbreviations and shortcuts is used, translators will often not
understand the message and will produce very inappropriate
translations.

     "%s: is parameter\n"

This is nearly untranslatable: Is the displayed item _a_ parameter or
_the_ parameter?

     "No match"

The ambiguity in this message makes it unintelligible: Is the program
attempting to set something on fire?  Does it mean "The given object
does not match the template"?  Does it mean "The template does not fit
any of the objects"?

   In both cases, adding more words to the message will help both the
translator and the English-speaking user.

   Translatable strings should be entire sentences.  It is often not
possible to translate single verbs or adjectives in a substitutable way.

     printf ("File %s is %s protected", filename, rw ? "write" : "read");

Most translators will not look at the source and will thus only see the
string ‘"File %s is %s protected"’, which is unintelligible.  Change
this to

     printf (rw ? "File %s is write protected" : "File %s is read protected",
             filename);

This way the translator will not only understand the message, she will
also be able to find the appropriate grammatical construction.  A French
translator, for example, translates "write protected" as "protected
against writing".

   Entire sentences are also important because in many languages, the
declension of a word in a sentence depends on the gender or the
number (singular/plural) of another part of the sentence.  There are
usually more interdependencies between words than in English.  The
consequence is that asking a translator to translate two half-sentences
and then combining them through dumb string concatenation will not
work for many languages, even though it would work for English.  That’s
why translators need to handle entire sentences.
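
   As a minimal sketch of this advice, the following C fragment keeps
one complete sentence per case instead of gluing a noun into a shared
sentence frame.  The enum ‘item_kind’, the functions ‘deleted_format’
and ‘report_deleted’, and the message texts are hypothetical and only
serve as an illustration.

```c
#include <stdio.h>

/* Hypothetical kinds of items a program might report on.  */
enum item_kind { ITEM_FILE, ITEM_DIRECTORY };

/* Return a format string that is a complete sentence for KIND.
   Each sentence is one unit, so a translator can adjust articles,
   gender and word order independently for each case.  */
const char *
deleted_format (enum item_kind kind)
{
  switch (kind)
    {
    case ITEM_FILE:
      return "The file %s was deleted.\n";
    case ITEM_DIRECTORY:
      return "The directory %s was deleted.\n";
    }
  return "%s was deleted.\n";   /* fallback for unknown kinds */
}

/* Print the message for NAME without concatenating sentence parts.  */
void
report_deleted (enum item_kind kind, const char *name)
{
  printf (deleted_format (kind), name);
}
```

Each of the sentences would later be marked for translation on its own,
so the translator never sees an isolated noun such as "file" or
"directory".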

   Often sentences don’t fit into a single line.  If a sentence is
output using two subsequent ‘printf’ statements, like this

     printf ("Locale charset \"%s\" is different from\n", lcharset);
     printf ("input file charset \"%s\".\n", fcharset);

the translator would have to translate two half sentences, but nothing
in the POT file would tell her that the two half sentences belong
together.  It is necessary to merge the two ‘printf’ statements so that
the translator can handle the entire sentence at once and decide at
which place to insert a line break in the translation (if at all):

     printf ("Locale charset \"%s\" is different from\n\
     input file charset \"%s\".\n", lcharset, fcharset);
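
   The merged message can also be written with adjacent string
literals, which the C compiler joins into one string and which
‘xgettext’ extracts as a single message.  This avoids the
backslash-newline inside the literal.  The following is only a sketch;
the helper ‘format_charset_mismatch’ and its buffer parameters are
hypothetical.

```c
#include <stdio.h>

/* Write the whole sentence into BUF using a single format string.
   The two adjacent literals form one string at compile time, so the
   translator still sees the entire sentence as one message.  */
int
format_charset_mismatch (char *buf, size_t size,
                         const char *lcharset, const char *fcharset)
{
  return snprintf (buf, size,
                   "Locale charset \"%s\" is different from\n"
                   "input file charset \"%s\".\n",
                   lcharset, fcharset);
}
```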

   You may now ask: how about two or more adjacent sentences?  Like in
this case:

     puts ("Apollo 13 scenario: Stack overflow handling failed.");
     puts ("On the next stack overflow we will crash!!!");

Should these two statements be merged into a single one?  I would
recommend merging them if the two sentences are related to each other,
because that makes it easier for the translator to understand and
translate both.  On the other hand, if one of the two messages is a
stereotypical one, occurring in other places as well, you will do the
translator a favour by not merging the two.  (Identical messages
occurring in several places are combined by xgettext, so the translator
has to handle them only once.)

   Translatable strings should be limited to one paragraph; don’t let a
single message be longer than ten lines.  The reason is that when the
translatable string changes, the translator is faced with the task of
updating the entire translated string.  Maybe only a single word will
have changed in the English string, but the translator doesn’t see that
(with the current translation tools), so she has to proofread the
entire message.

   Many GNU programs have a ‘--help’ output that extends over several
screen pages.  It is a courtesy towards the translators to split such a
message into several messages of five to ten lines each.  While doing
that, you can also attempt to split the documented options into groups,
such as the input options, the output options, and the informative
output options.  This will help every user to find the option he is
looking for.
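
   A possible shape for such a split ‘--help’ output is sketched below.
The program name ‘frob’ and all of its options are hypothetical.  Each
‘fputs’ call carries one short message, so that once the strings are
marked, a change to one option group only invalidates the translation
of that group.

```c
#include <stdio.h>

/* Print the help text as several short messages, one per option
   group, instead of one message spanning several screen pages.  */
void
print_help (FILE *out)
{
  fputs ("Usage: frob [OPTION]... FILE...\n"
         "Frobnicate each FILE.\n", out);
  fputs ("\n"
         "Input options:\n"
         "  -f, --from=CHARSET   assume input is encoded in CHARSET\n"
         "  -r, --recursive      descend into directories\n", out);
  fputs ("\n"
         "Output options:\n"
         "  -o, --output=FILE    write output to FILE\n"
         "  -q, --quiet          suppress warnings\n", out);
  fputs ("\n"
         "Informative output:\n"
         "  -h, --help           display this help and exit\n"
         "  -V, --version        output version information and exit\n", out);
}
```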

   Hardcoded string concatenation is sometimes used to construct English
strings:

     strcpy (s, "Replace ");
     strcat (s, object1);
     strcat (s, " with ");
     strcat (s, object2);
     strcat (s, "?");

In order to present to the translator only entire sentences, and also
because in some languages the translator might want to swap the order of
‘object1’ and ‘object2’, it is necessary to change this to use a format
string:

     sprintf (s, "Replace %s with %s?", object1, object2);
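
   To illustrate why a format string lets the translator swap the two
objects, here is a small sketch.  It assumes a ‘printf’ implementation
that supports the POSIX numbered directives ‘%1$s’ and ‘%2$s’, which
are not part of ISO C.  The helper ‘format_replace_prompt’ is
hypothetical, and its ‘fmt’ parameter stands in for the translated
format string.

```c
#include <stdio.h>

/* Format the prompt with FMT, which stands in for the (possibly
   translated) format string.  The argument order in the call is
   fixed; only the format string decides where each object appears.  */
int
format_replace_prompt (char *buf, size_t size, const char *fmt,
                       const char *object1, const char *object2)
{
  return snprintf (buf, size, fmt, object1, object2);
}
```

Called with "Replace %s with %s?" the objects appear in call order.
Called with a reordered format such as "Use %2$s in place of %1$s?"
the same call prints ‘object2’ first, without any change to the code.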

   A similar case is compile-time concatenation of strings.  The ISO
C99 include file ‘<inttypes.h>’ contains a macro ‘PRId64’ that can be
used as a formatting directive for outputting an ‘int64_t’ integer
through ‘printf’.  It expands to a constant string, usually "d" or "ld"
or "lld" or something like this, depending on the platform.  Assume you
have code like

     printf ("The amount is %0" PRId64 "\n", number);

The ‘gettext’ tools and library have special support for these
‘<inttypes.h>’ macros.  You can therefore simply write

     printf (gettext ("The amount is %0" PRId64 "\n"), number);

The PO file will contain the string "The amount is %0<PRId64>\n".  The
translators will provide a translation containing "%0<PRId64>" as well,
and at runtime the ‘gettext’ function’s result will contain the
appropriate constant string, "d" or "ld" or "lld".

   This works only for the predefined ‘<inttypes.h>’ macros.  If you
have defined your own similar macros, let’s say ‘MYPRId64’, that are not
known to ‘xgettext’, the solution for this problem is to change the code
like this:

     char buf1[100];
     sprintf (buf1, "%0" MYPRId64, number);
     printf (gettext ("The amount is %s\n"), buf1);

   This means that you put the platform-dependent code in one statement,
and the internationalization code in a different statement.  Note that a
buffer length of 100 is safe, because all available hardware integer
types are limited to 128 bits, and to print a 128-bit integer one needs
at most 54 characters, regardless of whether it is in decimal, octal or
hexadecimal.
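
   The same separation can be written as a self-contained function.
This is only a sketch: ‘MYPRId64’ is defined here as an alias of
‘PRId64’ purely for illustration, the helper ‘format_amount’ is
hypothetical, the directive is written as plain "%" rather than "%0",
and ‘snprintf’ is used so that the buffer size is enforced rather than
assumed.

```c
#include <inttypes.h>
#include <libintl.h>
#include <stdio.h>

/* Hypothetical user-defined macro that xgettext does not know.  */
#define MYPRId64 PRId64

/* Format NUMBER into OUT in two steps, keeping the platform-dependent
   directive out of the translatable string.  */
int
format_amount (char *out, size_t size, int64_t number)
{
  char buf1[100];

  /* Platform-dependent part: no translatable text involved.  */
  snprintf (buf1, sizeof buf1, "%" MYPRId64, number);

  /* Internationalized part: the message only contains %s.  */
  return snprintf (out, size, gettext ("The amount is %s\n"), buf1);
}
```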

   All this applies to other programming languages as well.  For
example, in Java and C#, string concatenation is very frequently used,
because it is a compiler built-in operator.  As in C, in Java you would
change

     System.out.println("Replace "+object1+" with "+object2+"?");

into a statement involving a format string:

     System.out.println(
         MessageFormat.format("Replace {0} with {1}?",
                              new Object[] { object1, object2 }));

Similarly, in C#, you would change

     Console.WriteLine("Replace "+object1+" with "+object2+"?");

into a statement involving a format string:

     Console.WriteLine(
         String.Format("Replace {0} with {1}?", object1, object2));

   Unusual markup or control characters should not be used in
translatable strings.  Translators will likely not understand the
particular meaning of the markup or control characters.

   For example, if you have a convention that ‘|’ delimits the left-hand
and right-hand parts of some GUI elements, translators will often not
understand it without specific comments.  It might be better to have the
translator translate the left-hand and right-hand parts separately.

   Another example is the ‘argp’ convention to use a single ‘\v’
(vertical tab) control character to delimit two sections inside a
string.  This is flawed.  Some translators may convert it to a simple
newline, some to blank lines.  With some PO file editors it may not be
easy to even enter a vertical tab control character.  So, you cannot be
sure that the translation will contain a ‘\v’ character at the
corresponding position.  The solution is, again, to let the translator
translate two separate strings and combine the two translated strings
at run time with the ‘\v’ required by the convention.
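
   A minimal sketch of this run-time combination follows.  The helper
‘build_doc_string’ and both message texts are hypothetical.  The ‘\v’
lives only in the untranslated format string, so no translator ever
has to type or preserve it.

```c
#include <libintl.h>
#include <stdio.h>

/* Build "<first section>\v<second section>" from two separately
   translated strings.  The message texts are made up for this
   example.  */
int
build_doc_string (char *buf, size_t size)
{
  return snprintf (buf, size, "%s\v%s",
                   gettext ("Frobnicate the input files."),
                   gettext ("Report bugs to the maintainer."));
}
```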

   HTML markup, however, is common enough that it’s probably OK to use
in translatable strings.  But please bear in mind that the GNU gettext
tools don’t verify that the translations are well-formed HTML.