gettext: Preparing Strings
1
1 4.3 Preparing Translatable Strings
1 ==================================
1
1 Before strings can be marked for translation, they sometimes need to
1 be adjusted. Usually preparing a string for translation is done right
1 before marking it, during the marking phase which is described in the
1 next sections. What you have to keep in mind while doing that is the
1 following.
1
1 • Decent English style.
1
1 • Entire sentences.
1
1 • Split at paragraphs.
1
1 • Use format strings instead of string concatenation.
1
1 • Avoid unusual markup and unusual control characters.
1
1 Let’s look at some examples of these guidelines.
1
1 Translatable strings should be in good English style. If slang
1 language with abbreviations and shortcuts is used, often translators
1 will not understand the message and will produce very inappropriate
1 translations.
1
1 "%s: is parameter\n"
1
1 This is nearly untranslatable: Is the displayed item _a_ parameter or
1 _the_ parameter?
1
1 "No match"
1
1 The ambiguity in this message makes it unintelligible: Is the program
1 attempting to set something on fire? Does it mean "The given object
1 does not match the template"? Does it mean "The template does not fit
1 for any of the objects"?
1
1 In both cases, adding more words to the message will help both the
1 translator and the English speaking user.
1
1 Translatable strings should be entire sentences. It is often not
1 possible to translate single verbs or adjectives in a substitutable way.
1
1 printf ("File %s is %s protected", filename, rw ? "write" : "read");
1
1 Most translators will not look at the source and will thus only see the
1 string ‘"File %s is %s protected"’, which is unintelligible. Change
1 this to
1
1 printf (rw ? "File %s is write protected" : "File %s is read protected",
1 filename);
1
1 This way the translator will not only understand the message, she will
1 also be able to find the appropriate grammatical construction. A French
1 translator, for example, translates "write protected" as "protected
1 against writing".
1
1 Entire sentences are also important because in many languages, the
1 declension of some word in a sentence depends on the gender or the
1 number (singular/plural) of another part of the sentence. There are
1 usually more interdependencies between words than in English. The
1 consequence is that asking a translator to translate two half-sentences
1 and then combining these two half-sentences through dumb string
1 concatenation will not work, for many languages, even though it would
1 work for English. That’s why translators need to handle entire
1 sentences.
1
1 Often sentences don’t fit into a single line. If a sentence is
1 output using two consecutive ‘printf’ statements, like this
1
1 printf ("Locale charset \"%s\" is different from\n", lcharset);
1 printf ("input file charset \"%s\".\n", fcharset);
1
1 the translator would have to translate two half sentences, but nothing
1 in the POT file would tell her that the two half sentences belong
1 together. It is necessary to merge the two ‘printf’ statements so that
1 the translator can handle the entire sentence at once and decide at
1 which place to insert a line break in the translation (if at all):
1
1 printf ("Locale charset \"%s\" is different from\n\
1 input file charset \"%s\".\n", lcharset, fcharset);
1
1 You may now ask: how about two or more adjacent sentences? Like in
1 this case:
1
1 puts ("Apollo 13 scenario: Stack overflow handling failed.");
1 puts ("On the next stack overflow we will crash!!!");
1
1 Should these two statements be merged into a single one? I would recommend
1 merging them if the two sentences are related to each other, because
1 then it is easier for the translator to understand and translate
1 both. On the other hand, if one of the two messages is a stereotypic
1 one, occurring in other places as well, you will do a favour to the
1 translator by not merging the two. (Identical messages occurring in
1 several places are combined by xgettext, so the translator has to handle
1 them once only.)
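
As a rough illustration of the merged variant, the two related sentences above could be kept in one message, so that they end up in a single PO entry. This is only a sketch: the `_` macro is a stand-in defined here so the fragment compiles on its own (a real program would typically map it to `gettext` from `<libintl.h>`), and the function names are hypothetical.

```c
#include <stdio.h>

/* Stand-in for the usual gettext shorthand (an assumption, so that the
   sketch compiles on its own).  A real program would typically use
   "#define _(s) gettext (s)" after including <libintl.h>.  */
#define _(s) (s)

/* Returns both related sentences as one translatable message.  The
   adjacent string literals are joined by the compiler, so the
   translator sees the two sentences together.  */
static const char *
stack_overflow_warning (void)
{
  return _("Apollo 13 scenario: Stack overflow handling failed.\n"
           "On the next stack overflow we will crash!!!");
}

/* Prints the merged message; puts appends the final newline.
   Returns a negative value (EOF) if the output failed.  */
static int
report_stack_overflow_failure (void)
{
  return puts (stack_overflow_warning ());
}
```

If the second sentence were a stock phrase used elsewhere in the program, keeping it as its own message would be the better choice, as described above.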
1
1 Translatable strings should be limited to one paragraph; don’t let a
1 single message be longer than ten lines. The reason is that when the
1 translatable string changes, the translator is faced with the task of
1 updating the entire translated string. Maybe only a single word will
1 have changed in the English string, but the translator doesn’t see that
1 (with the current translation tools), therefore she has to proofread the
1 entire message.
1
1 Many GNU programs have a ‘--help’ output that extends over several
1 screen pages. It is a courtesy towards the translators to split such a
1 message into several ones of five to ten lines each. While doing that,
1 you can also attempt to split the documented options into groups, such
1 as the input options, the output options, and the informative output
1 options. This will help every user to find the option he is looking
1 for.
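
A minimal sketch of such a split could look like the following. The option names and descriptions are invented for illustration, `print_help` is a hypothetical function name, and `_` is a stand-in macro defined here so the fragment is self-contained (normally it would expand to a `gettext` call).

```c
#include <stdio.h>

/* Stand-in for the usual gettext shorthand (an assumption, so that the
   sketch compiles on its own).  */
#define _(s) (s)

/* Prints the help text as three separate translatable messages, one
   per option group, instead of one very long message.  The options
   shown are made up for this example.
   Returns 0 on success, -1 if writing to OUT failed.  */
static int
print_help (FILE *out)
{
  if (fputs (_("Input options:\n"
               "  -i, --input=FILE     read the input from FILE\n"
               "  -e, --encoding=NAME  assume input encoding NAME\n"),
             out) == EOF)
    return -1;

  if (fputs (_("Output options:\n"
               "  -o, --output=FILE    write the output to FILE\n"
               "  -f, --force          overwrite an existing FILE\n"),
             out) == EOF)
    return -1;

  if (fputs (_("Informative output:\n"
               "  -h, --help           display this help and exit\n"
               "  -V, --version        output version information and exit\n"),
             out) == EOF)
    return -1;

  return 0;
}
```

With this layout, a change to one option description invalidates only the message of its group, so the translator has to proofread a few lines rather than the whole help text.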
1
1 Hardcoded string concatenation is sometimes used to construct English
1 strings:
1
1 strcpy (s, "Replace ");
1 strcat (s, object1);
1 strcat (s, " with ");
1 strcat (s, object2);
1 strcat (s, "?");
1
1 In order to present to the translator only entire sentences, and also
1 because in some languages the translator might want to swap the order of
1 ‘object1’ and ‘object2’, it is necessary to change this to use a format
1 string:
1
1 sprintf (s, "Replace %s with %s?", object1, object2);
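
The following sketch shows the format-string variant with a bounded buffer, plus what a reordered translation could look like. It rests on several assumptions: `_` is a stand-in macro defined here so the fragment compiles on its own, the function names are hypothetical, the "translated" string is an invented placeholder rather than a real translation, and the numbered `%1$s` / `%2$s` directives are a POSIX extension to `printf` that may not be available everywhere.

```c
#include <stdio.h>

/* Stand-in for the usual gettext shorthand (an assumption, so that the
   sketch compiles on its own).  */
#define _(s) (s)

/* Builds the question from one complete, translatable sentence.
   snprintf never writes more than SIZE bytes into BUF.  Returns the
   length the full string needs, as snprintf does.  */
static int
format_replace_question (char *buf, size_t size,
                         const char *object1, const char *object2)
{
  return snprintf (buf, size, _("Replace %s with %s?"), object1, object2);
}

/* Illustrates a format string in which the two arguments appear in the
   opposite order, as a translator might write it.  The wording is a
   made-up placeholder; the numbered directives are POSIX, not ISO C.  */
static int
format_with_swapped_order (char *buf, size_t size,
                           const char *object1, const char *object2)
{
  return snprintf (buf, size, "Using %2$s, replace %1$s?",
                   object1, object2);
}
```

Note that the calling code passes `object1` and `object2` in the same order in both functions; only the format string decides where each one appears.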
1
1 A similar case is compile time concatenation of strings. The ISO C99
1 include file ‘<inttypes.h>’ contains a macro ‘PRId64’ that can be
1 used as a formatting directive for outputting an ‘int64_t’ integer
1 through ‘printf’. It expands to a constant string, usually "d" or "ld"
1 or "lld" or something like this, depending on the platform. Assume you
1 have code like
1
1 printf ("The amount is %0" PRId64 "\n", number);
1
1 The ‘gettext’ tools and library have special support for these
1 ‘<inttypes.h>’ macros. You can therefore simply write
1
1 printf (gettext ("The amount is %0" PRId64 "\n"), number);
1
1 The PO file will contain the string "The amount is %0<PRId64>\n". The
1 translators will provide a translation containing "%0<PRId64>" as well,
1 and at runtime the ‘gettext’ function’s result will contain the
1 appropriate constant string, "d" or "ld" or "lld".
1
1 This works only for the predefined ‘<inttypes.h>’ macros. If you
1 have defined your own similar macros, let’s say ‘MYPRId64’, that are not
1 known to ‘xgettext’, the solution for this problem is to change the code
1 like this:
1
1 char buf1[100];
1 sprintf (buf1, "%0" MYPRId64, number);
1 printf (gettext ("The amount is %s\n"), buf1);
1
1 This means that you put the platform-dependent code in one statement, and
1 the internationalization code in a different statement. Note that a
1 buffer length of 100 is safe, because all available hardware integer
1 types are limited to 128 bits, and to print a 128 bit integer one needs
1 at most 54 characters, regardless of whether in decimal, octal or
1 hexadecimal.
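
Put together as a compilable fragment, the two-statement approach might look like this. It is a sketch under stated assumptions: `MYPRId64` is defined in terms of `PRId64` only so that the example builds, `print_amount` is a hypothetical helper, `_` is a stand-in for the usual `gettext` shorthand, and `snprintf` is used in place of `sprintf` as an extra safeguard.

```c
#include <inttypes.h>
#include <stdio.h>

/* Stand-in for the usual gettext shorthand (an assumption, so that the
   sketch compiles on its own).  */
#define _(s) (s)

/* Hypothetical project-specific macro that xgettext does not know.
   It is defined via PRId64 here only to make the example build.  */
#define MYPRId64 PRId64

/* Prints NUMBER inside a translatable sentence.  Returns the number of
   characters written to OUT, or a negative value on error.  */
static int
print_amount (FILE *out, int64_t number)
{
  char buf1[100];

  /* Platform-dependent statement: not marked for translation.  */
  snprintf (buf1, sizeof buf1, "%0" MYPRId64, number);

  /* Internationalized statement: the translator only sees "%s".  */
  return fprintf (out, _("The amount is %s\n"), buf1);
}
```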
1
1 All this applies to other programming languages as well. For
1 example, in Java and C#, string concatenation is very frequently used,
1 because it is a compiler built-in operator. As in C, in Java you
1 would change
1
1 System.out.println("Replace "+object1+" with "+object2+"?");
1
1 into a statement involving a format string:
1
1 System.out.println(
1 MessageFormat.format("Replace {0} with {1}?",
1 new Object[] { object1, object2 }));
1
1 Similarly, in C#, you would change
1
1 Console.WriteLine("Replace "+object1+" with "+object2+"?");
1
1 into a statement involving a format string:
1
1 Console.WriteLine(
1 String.Format("Replace {0} with {1}?", object1, object2));
1
1 Unusual markup or control characters should not be used in
1 translatable strings. Translators will likely not understand the
1 particular meaning of the markup or control characters.
1
1 For example, if you have a convention that ‘|’ delimits the left-hand
1 and right-hand part of some GUI elements, translators will often not
1 understand it without specific comments. It might be better to have the
1 translator translate the left-hand and right-hand part separately.
1
1 Another example is the ‘argp’ convention to use a single ‘\v’
1 (vertical tab) control character to delimit two sections inside a
1 string. This is flawed. Some translators may convert it to a simple
1 newline, some to blank lines. With some PO file editors it may not be
1 easy to even enter a vertical tab control character. So, you cannot be
1 sure that the translation will contain a ‘\v’ character, at the
1 corresponding position. The solution is, again, to let the translator
1 translate two separate strings and combine at run-time the two
1 translated strings with the ‘\v’ required by the convention.
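
One way to combine the two translated strings at run time is sketched below. The message texts and the name `build_argp_doc` are invented for illustration, and `_` is a stand-in macro defined here so the fragment is self-contained; how the resulting string is handed to ‘argp’ is left out.

```c
#include <stdio.h>

/* Stand-in for the usual gettext shorthand (an assumption, so that the
   sketch compiles on its own).  */
#define _(s) (s)

/* Joins two separately translated sections with the '\v' separator.
   The "%s\v%s" skeleton is deliberately not marked for translation, so
   no translator ever has to type a vertical tab.  Returns the length
   the full string needs; a value >= SIZE means BUF was too small.  */
static int
build_argp_doc (char *buf, size_t size)
{
  return snprintf (buf, size, "%s\v%s",
                   _("Frobnicate the input files."),
                   _("Report bugs to the maintainers."));
}
```

Because the separator is inserted by the program, it ends up at the right position no matter which PO file editor the translator used.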
1
1 HTML markup, however, is common enough that it’s probably OK to use
1 in translatable strings. But please bear in mind that the GNU gettext
1 tools don’t verify that the translations are well-formed HTML.
1