gettext: PO Files
1
3 The Format of PO Files
************************
1
The GNU ‘gettext’ toolset helps programmers and translators produce,
update and use translation files, mainly PO files, which are textual,
editable files. This chapter explains the format of PO files.

A PO file is made up of many entries, each entry holding the relation
between an original untranslated string and its corresponding
translation. All entries in a given PO file usually pertain to a single
project, and all translations are expressed in a single target language.
One PO file "entry" has the following schematic structure:

     WHITE-SPACE
     # TRANSLATOR-COMMENTS
     #. EXTRACTED-COMMENTS
     #: REFERENCE…
     #, FLAG…
     #| msgid PREVIOUS-UNTRANSLATED-STRING
     msgid UNTRANSLATED-STRING
     msgstr TRANSLATED-STRING

The general structure of a PO file should be well understood by the
translator. When using PO mode, very little has to be known about the
format details, as PO mode takes care of them for her.

A simple entry can look like this:

     #: lib/error.c:116
     msgid "Unknown system error"
     msgstr "Error desconegut del sistema"
1
Entries begin with some optional white space. Usually, when
generated through GNU ‘gettext’ tools, there is exactly one blank line
between entries. Then comments follow, on lines all starting with the
character ‘#’. There are two kinds of comments. Those which have some
white space immediately following the ‘#’ are the TRANSLATOR COMMENTS,
which are created and maintained exclusively by the translator. Those
which have some non-white character just after the ‘#’ are the
AUTOMATIC COMMENTS, which are created and maintained automatically by
GNU ‘gettext’ tools. Comment lines starting with ‘#.’ contain comments
given by the programmer, directed at the translator; these comments are
called EXTRACTED COMMENTS because the ‘xgettext’ program extracts them
from the program’s source code. Comment lines starting with ‘#:’
contain references to the program’s source code. Comment lines starting
with ‘#,’ contain flags; more about these below. Comment lines starting
with ‘#|’ contain the previous untranslated string for which the
translator gave a translation.

All comments, of either kind, are optional.
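The comment kinds described above can be told apart by the character
that follows the ‘#’. The following is a minimal Python sketch of that
rule, not part of GNU ‘gettext’; the function name ‘classify_comment’
and the returned labels are hypothetical, and treating a bare ‘#’ line
as a translator comment is an assumption.

```python
def classify_comment(line: str) -> str:
    """Return the kind of a PO comment line, judged by the character after '#'."""
    if not line.startswith("#"):
        raise ValueError("not a comment line: %r" % line)
    marker = line[1:2]  # empty string when the line is just "#"
    if marker == "" or marker.isspace():
        # Assumption: a bare "#" is counted as a translator comment.
        return "translator"
    kinds = {
        ".": "extracted",  # comments given by the programmer
        ":": "reference",  # references to the program's source code
        ",": "flag",       # flags such as fuzzy or c-format
        "|": "previous",   # previous untranslated string
    }
    # Any other non-white marker is still an automatic comment.
    return kinds.get(marker, "other-automatic")


for sample in ("# checked by hand", "#: lib/error.c:116", "#, fuzzy"):
    print(sample, "->", classify_comment(sample))
```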
1
After white space and comments, entries show two strings: first the
untranslated string as it appears in the original program sources, and
then the translation of this string. The original string is introduced
by the keyword ‘msgid’, and the translation by ‘msgstr’. The two
strings, untranslated and translated, are quoted in various ways in the
PO file, using ‘"’ delimiters and ‘\’ escapes, but the translator does
not really have to pay attention to the precise quoting format, as PO
mode fully takes care of quoting for her.

The ‘msgid’ strings, as well as automatic comments, are produced and
managed by other GNU ‘gettext’ tools, and PO mode does not provide means
for the translator to alter these. The most she can do is delete them,
and only by deleting the whole entry. On the other hand, the ‘msgstr’
string, as well as translator comments, are really meant for the
translator, and PO mode gives her the full control she needs.

The comment lines beginning with ‘#,’ are special because they are
not completely ignored by the programs as comments generally are. The
comma-separated list of FLAGs is used by the ‘msgfmt’ program to give
the user some better diagnostic messages. Currently there are two forms
of flags defined:

‘fuzzy’
     This flag can be generated by the ‘msgmerge’ program or it can be
     inserted by the translator herself. It shows that the ‘msgstr’
     string might not be a correct translation (anymore). Only the
     translator can judge if the translation requires further
     modification, or is acceptable as is. Once satisfied with the
     translation, she then removes this ‘fuzzy’ attribute. The
     ‘msgmerge’ program inserts this flag when it has paired the
     ‘msgid’ and ‘msgstr’ entries through fuzzy search only. See
     ⇒Fuzzy Entries.

‘c-format’
‘no-c-format’
     These flags should not be added by a human. Instead only the
     ‘xgettext’ program adds them. In an automated PO file processing
     system as proposed here, the user’s changes would be thrown away
     again as soon as the ‘xgettext’ program generates a new template
     file.

     The ‘c-format’ flag indicates that the untranslated string and the
     translation are supposed to be C format strings. The ‘no-c-format’
     flag indicates that they are not C format strings, even though the
     untranslated string happens to look like a C format string (with
     ‘%’ directives).

     When the ‘c-format’ flag is given for a string, the ‘msgfmt’
     program does some more tests to check the validity of the
     translation. See ⇒msgfmt Invocation, ⇒c-format Flag and
     ⇒c-format.
1
‘objc-format’
‘no-objc-format’
     Likewise for Objective C, see ⇒objc-format.

‘sh-format’
‘no-sh-format’
     Likewise for Shell, see ⇒sh-format.

‘python-format’
‘no-python-format’
     Likewise for Python, see ⇒python-format.

‘python-brace-format’
‘no-python-brace-format’
     Likewise for Python brace, see ⇒python-format.

‘lisp-format’
‘no-lisp-format’
     Likewise for Lisp, see ⇒lisp-format.

‘elisp-format’
‘no-elisp-format’
     Likewise for Emacs Lisp, see ⇒elisp-format.

‘librep-format’
‘no-librep-format’
     Likewise for librep, see ⇒librep-format.

‘scheme-format’
‘no-scheme-format’
     Likewise for Scheme, see ⇒scheme-format.

‘smalltalk-format’
‘no-smalltalk-format’
     Likewise for Smalltalk, see ⇒smalltalk-format.

‘java-format’
‘no-java-format’
     Likewise for Java, see ⇒java-format.

‘csharp-format’
‘no-csharp-format’
     Likewise for C#, see ⇒csharp-format.

‘awk-format’
‘no-awk-format’
     Likewise for awk, see ⇒awk-format.

‘object-pascal-format’
‘no-object-pascal-format’
     Likewise for Object Pascal, see ⇒object-pascal-format.

‘ycp-format’
‘no-ycp-format’
     Likewise for YCP, see ⇒ycp-format.

‘tcl-format’
‘no-tcl-format’
     Likewise for Tcl, see ⇒tcl-format.

‘perl-format’
‘no-perl-format’
     Likewise for Perl, see ⇒perl-format.

‘perl-brace-format’
‘no-perl-brace-format’
     Likewise for Perl brace, see ⇒perl-format.

‘php-format’
‘no-php-format’
     Likewise for PHP, see ⇒php-format.

‘gcc-internal-format’
‘no-gcc-internal-format’
     Likewise for the GCC sources, see ⇒gcc-internal-format.

‘gfc-internal-format’
‘no-gfc-internal-format’
     Likewise for the GNU Fortran Compiler sources, see
     ⇒gfc-internal-format.

‘qt-format’
‘no-qt-format’
     Likewise for Qt, see ⇒qt-format.

‘qt-plural-format’
‘no-qt-plural-format’
     Likewise for Qt plural forms, see ⇒qt-plural-format.

‘kde-format’
‘no-kde-format’
     Likewise for KDE, see ⇒kde-format.

‘boost-format’
‘no-boost-format’
     Likewise for Boost, see ⇒boost-format.

‘lua-format’
‘no-lua-format’
     Likewise for Lua, see ⇒lua-format.

‘javascript-format’
‘no-javascript-format’
     Likewise for JavaScript, see ⇒javascript-format.
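
Since a ‘#,’ line is a comma-separated list, the flags listed above
can be read mechanically. The following Python sketch shows one way to
do so; it is an illustration only, not the parser used by ‘msgfmt’, and
the function names ‘parse_flags’ and ‘format_flags’ are hypothetical.

```python
def parse_flags(line: str) -> set[str]:
    """Split a '#,' comment line into its individual flags."""
    if not line.startswith("#,"):
        raise ValueError("not a flag comment: %r" % line)
    return {flag.strip() for flag in line[2:].split(",") if flag.strip()}


def format_flags(flags: set[str]) -> dict[str, bool]:
    """Map each format family to True (LANG-format) or False (no-LANG-format)."""
    result = {}
    for flag in flags:
        if not flag.endswith("-format"):
            continue  # e.g. 'fuzzy' is not a format flag
        negated = flag.startswith("no-")
        name = flag[3:] if negated else flag
        result[name[: -len("-format")]] = not negated
    return result


flags = parse_flags("#, fuzzy, c-format")
print(sorted(flags))           # the two flags, in alphabetical order
print("fuzzy" in flags)        # whether the entry still needs review
print(format_flags(flags))     # which format checks apply
```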
1
It is also possible to have entries with a context specifier. They
look like this:

     WHITE-SPACE
     # TRANSLATOR-COMMENTS
     #. EXTRACTED-COMMENTS
     #: REFERENCE…
     #, FLAG…
     #| msgctxt PREVIOUS-CONTEXT
     #| msgid PREVIOUS-UNTRANSLATED-STRING
     msgctxt CONTEXT
     msgid UNTRANSLATED-STRING
     msgstr TRANSLATED-STRING

The context serves to disambiguate messages with the same
UNTRANSLATED-STRING. It is possible to have several entries with the
same UNTRANSLATED-STRING in a PO file, provided that they each have a
different CONTEXT. Note that an empty CONTEXT string and an absent
‘msgctxt’ line do not mean the same thing.
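
One way to picture the rule above is a lookup table keyed by the pair
(CONTEXT, UNTRANSLATED-STRING). The Python sketch below is illustrative
only and says nothing about how the ‘gettext’ tools store entries; the
name ‘add_entry’, the sample contexts and the placeholder translations
are hypothetical. An absent ‘msgctxt’ is modelled as ‘None’, which is
distinct from the empty string.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[Optional[str], str]  # (msgctxt or None, msgid)


def add_entry(catalog: Dict[Key, str], msgid: str, msgstr: str,
              msgctxt: Optional[str] = None) -> None:
    """Store a translation, rejecting a repeated (msgctxt, msgid) pair."""
    key = (msgctxt, msgid)
    if key in catalog:
        raise ValueError("duplicate entry: %r" % (key,))
    catalog[key] = msgstr


catalog: Dict[Key, str] = {}
add_entry(catalog, "Open", "TRANSLATION-1")                       # no msgctxt line
add_entry(catalog, "Open", "TRANSLATION-2", msgctxt="")           # empty context
add_entry(catalog, "Open", "TRANSLATION-3", msgctxt="File menu")  # named context
print(len(catalog))  # three distinct entries share the same msgid
```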
1
A different kind of entry is used for translations which involve
plural forms:

     WHITE-SPACE
     # TRANSLATOR-COMMENTS
     #. EXTRACTED-COMMENTS
     #: REFERENCE…
     #, FLAG…
     #| msgid PREVIOUS-UNTRANSLATED-STRING-SINGULAR
     #| msgid_plural PREVIOUS-UNTRANSLATED-STRING-PLURAL
     msgid UNTRANSLATED-STRING-SINGULAR
     msgid_plural UNTRANSLATED-STRING-PLURAL
     msgstr[0] TRANSLATED-STRING-CASE-0
     ...
     msgstr[N] TRANSLATED-STRING-CASE-N

Such an entry can look like this:

     #: src/msgcmp.c:338 src/po-lex.c:699
     #, c-format
     msgid "found %d fatal error"
     msgid_plural "found %d fatal errors"
     msgstr[0] "s'ha trobat %d error fatal"
     msgstr[1] "s'han trobat %d errors fatals"

Here also, a ‘msgctxt’ context can be specified before ‘msgid’, like
above.
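
To illustrate how the indexed ‘msgstr[N]’ strings of the example entry
might be used at run time, here is a small Python sketch. The rule that
maps a number to an index is not part of the entry itself; the rule
‘n != 1’ used below is an assumption made for this example, and the
function name ‘select_plural’ is hypothetical.

```python
from typing import Callable, List


def select_plural(msgstrs: List[str], n: int,
                  plural_index: Callable[[int], int] = lambda n: int(n != 1)) -> str:
    """Pick msgstr[plural_index(n)] and substitute n for the %d directive."""
    # Assumption: index 0 is used for n == 1, index 1 for every other value.
    return msgstrs[plural_index(n)] % n


msgstrs = [
    "s'ha trobat %d error fatal",     # msgstr[0]
    "s'han trobat %d errors fatals",  # msgstr[1]
]
print(select_plural(msgstrs, 1))
print(select_plural(msgstrs, 3))
```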
1
Here, additional kinds of flags can be used:

‘range:’
     This flag is followed by a range of non-negative numbers, using the
     syntax ‘range: MINIMUM-VALUE..MAXIMUM-VALUE’. It designates the
     possible values that the numeric parameter of the message can take.
     In some languages, translators may produce slightly better
     translations if they know that the value can only take on values
     between 0 and 10, for example.
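
The ‘range:’ syntax described above is simple enough to read with a
regular expression. The following Python sketch is illustrative only;
the name ‘parse_range_flag’ is hypothetical, and rejecting a range whose
minimum exceeds its maximum is an assumption rather than documented
behavior.

```python
import re
from typing import Optional, Tuple

# 'range:' followed by MINIMUM-VALUE..MAXIMUM-VALUE, both non-negative integers.
_RANGE_RE = re.compile(r"^range:\s*(\d+)\.\.(\d+)$")


def parse_range_flag(flag: str) -> Optional[Tuple[int, int]]:
    """Return (minimum, maximum) for a 'range:' flag, or None for other flags."""
    match = _RANGE_RE.match(flag.strip())
    if match is None:
        return None
    minimum, maximum = int(match.group(1)), int(match.group(2))
    if minimum > maximum:
        # Assumption: an inverted range is treated as an error.
        raise ValueError("empty range: %r" % flag)
    return minimum, maximum


print(parse_range_flag("range: 0..10"))
print(parse_range_flag("c-format"))
```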
1
The PREVIOUS-UNTRANSLATED-STRING is optionally inserted by the
‘msgmerge’ program, at the same time as it marks a message fuzzy. It
helps the translator to see which changes the developers made to the
UNTRANSLATED-STRING.

It happens that some lines, usually whitespace or comments, follow
the very last entry of a PO file. Such lines are not part of any entry.
They will be dropped when the PO file is processed by the tools, or may
disturb some PO file editors.

The remainder of this section may be safely skipped by those using a
PO file editor, yet it may be interesting for everybody to have a better
idea of the precise format of a PO file. On the other hand, those
wishing to modify PO files by hand should read on carefully.

An empty UNTRANSLATED-STRING is reserved to contain the header entry
with the meta information (⇒Header Entry). This header entry
should be the first entry of the file. The empty UNTRANSLATED-STRING is
reserved for this purpose and must not be used anywhere else.

Each of UNTRANSLATED-STRING and TRANSLATED-STRING respects the C
syntax for a character string, including the surrounding quotes and
embedded backslashed escape sequences. When the time comes to write
multi-line strings, one should not use escaped newlines. Instead, a
closing quote should follow the last character on the line to be
continued, and an opening quote should resume the string at the
beginning of the following PO file line. For example:

     msgid ""
     "Here is an example of how one might continue a very long string\n"
     "for the common case the string represents multi-line output.\n"
1
In this example, the empty string is used on the first line, to allow
better alignment of the ‘H’ from the word ‘Here’ over the ‘f’ from the
word ‘for’. The ‘msgid’ keyword is followed by three strings, which are
meant to be concatenated. Concatenating the empty string does not
change the resulting overall string, but it is a way for us to comply
with the requirement that ‘msgid’ be followed by a string on the same
line, while keeping the multi-line presentation left-justified, as we
find this to be a cleaner disposition. The empty string could have been
omitted, but only if the string starting with ‘Here’ was moved up to
the first line, right after ‘msgid’.(1) Nor was it necessary to switch
between the last two quoted strings immediately after the newline ‘\n’:
the switch could have occurred after _any_ other character; we just did
it this way because it is neater.

One should carefully distinguish between end of lines marked as ‘\n’
_inside_ quotes, which are part of the represented string, and end of
lines in the PO file itself, outside string quotes, which have no
effect on the represented string.
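
The concatenation rule and the difference between the two kinds of line
ends can be made concrete with a short Python sketch. It is a
simplified illustration, not the reader used by the ‘gettext’ tools:
the names ‘unquote’ and ‘join_po_strings’ are hypothetical, and only a
few common escape sequences are handled.

```python
# Assumption: only these escapes are supported; C syntax allows more.
_ESCAPES = {"n": "\n", "t": "\t", "r": "\r", '"': '"', "\\": "\\"}


def unquote(po_line: str) -> str:
    """Decode one double-quoted PO string into the text it represents."""
    text = po_line.strip()
    if len(text) < 2 or text[0] != '"' or text[-1] != '"':
        raise ValueError("not a quoted string: %r" % po_line)
    body, out, i = text[1:-1], [], 0
    while i < len(body):
        if body[i] == "\\":
            if i + 1 >= len(body) or body[i + 1] not in _ESCAPES:
                raise ValueError("unsupported escape in %r" % po_line)
            out.append(_ESCAPES[body[i + 1]])
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)


def join_po_strings(quoted_lines) -> str:
    """Concatenate adjacent quoted strings; PO file line breaks add nothing."""
    return "".join(unquote(line) for line in quoted_lines)


po_lines = [
    r'""',
    r'"Here is an example of how one might continue a very long string\n"',
    r'"for the common case the string represents multi-line output.\n"',
]
message = join_po_strings(po_lines)
print(message.count("\n"))  # only the '\n' escapes inside quotes are counted
```

Splitting the same text at a different point, for instance in the
middle of a word, gives the same result from ‘join_po_strings’.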
1
Outside strings, white lines and comments may be used freely.
Comments start at the beginning of a line with ‘#’ and extend until the
end of the PO file line. Comments written by translators should have
the initial ‘#’ immediately followed by some white space. If the ‘#’ is
not immediately followed by white space, the comment is most likely
generated and managed by specialized GNU tools, and might disappear or
be replaced unexpectedly when the PO file is given to ‘msgmerge’.

---------- Footnotes ----------

(1) This limitation is not imposed by GNU ‘gettext’, but is for
compatibility with the ‘msgfmt’ implementation on Solaris.