gettext: PO Files

1 
3 The Format of PO Files
************************
1 
   The GNU ‘gettext’ toolset helps programmers and translators produce,
update, and use translation files, mainly PO files, which are textual,
editable files.  This chapter explains the format of PO files.
1 
1    A PO file is made up of many entries, each entry holding the relation
1 between an original untranslated string and its corresponding
1 translation.  All entries in a given PO file usually pertain to a single
1 project, and all translations are expressed in a single target language.
1 One PO file "entry" has the following schematic structure:
1 
1      WHITE-SPACE
1      #  TRANSLATOR-COMMENTS
1      #. EXTRACTED-COMMENTS
1      #: REFERENCE…
1      #, FLAG…
1      #| msgid PREVIOUS-UNTRANSLATED-STRING
1      msgid UNTRANSLATED-STRING
1      msgstr TRANSLATED-STRING
1 
1    The general structure of a PO file should be well understood by the
1 translator.  When using PO mode, very little has to be known about the
1 format details, as PO mode takes care of them for her.
1 
1    A simple entry can look like this:
1 
1      #: lib/error.c:116
1      msgid "Unknown system error"
1      msgstr "Error desconegut del sistema"
1 
   Entries begin with some optional white space.  Usually, when
generated through GNU ‘gettext’ tools, there is exactly one blank line
between entries.  Then comments follow, on lines all starting with the
character ‘#’.  There are two kinds of comments.  Those which have some
white space immediately following the ‘#’ are the TRANSLATOR COMMENTS;
they are created and maintained exclusively by the translator.  Those
which have some non-white character just after the ‘#’ are the
AUTOMATIC COMMENTS; they are created and maintained automatically by
GNU ‘gettext’ tools.  Comment lines starting with ‘#.’ contain comments
given by the programmer, directed at the translator; these comments are
called EXTRACTED COMMENTS because the ‘xgettext’ program extracts them
from the program’s source code.  Comment lines starting with ‘#:’
contain references to the program’s source code.  Comment lines
starting with ‘#,’ contain flags; more about these below.  Comment
lines starting with ‘#|’ contain the previous untranslated string for
which the translator gave a translation.
1 
1    All comments, of either kind, are optional.
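
   The comment kinds described above can be told apart by the character
that follows the ‘#’.  The following Python sketch illustrates this; the
function name ‘classify_comment’ and the returned labels are hypothetical
and are not part of GNU ‘gettext’.  Treating a bare ‘#’ as a translator
comment is an assumption made here.

```python
def classify_comment(line):
    """Return a label for a PO comment line, based on the character after '#'."""
    if not line.startswith("#"):
        raise ValueError("not a comment line: %r" % line)
    kinds = {
        ".": "extracted",   # '#.' comments given by the programmer
        ":": "reference",   # '#:' references to the source code
        ",": "flag",        # '#,' flags
        "|": "previous",    # '#|' previous untranslated string
    }
    marker = line[1:2]
    # White space after '#' marks a translator comment.  A bare '#'
    # is treated the same way here (an assumption of this sketch).
    if marker == "" or marker.isspace():
        return "translator"
    # Any other non-white character: some automatic comment that the
    # text above does not describe further.
    return kinds.get(marker, "other-automatic")


print(classify_comment("#: lib/error.c:116"))
```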
1 
1    After white space and comments, entries show two strings, namely
1 first the untranslated string as it appears in the original program
1 sources, and then, the translation of this string.  The original string
1 is introduced by the keyword ‘msgid’, and the translation, by ‘msgstr’.
1 The two strings, untranslated and translated, are quoted in various ways
1 in the PO file, using ‘"’ delimiters and ‘\’ escapes, but the translator
1 does not really have to pay attention to the precise quoting format, as
1 PO mode fully takes care of quoting for her.
1 
   The ‘msgid’ strings, as well as automatic comments, are produced and
managed by other GNU ‘gettext’ tools, and PO mode does not provide means
for the translator to alter these.  The most she can do is delete them,
and only by deleting the whole entry.  On the other hand, the ‘msgstr’
string, as well as translator comments, are really meant for the
translator, and PO mode gives her the full control she needs.
1 
   The comment lines beginning with ‘#,’ are special because they are
not completely ignored by the programs as comments generally are.  The
comma-separated list of FLAGs is used by the ‘msgfmt’ program to give
the user better diagnostic messages.  Currently there are two forms of
flags defined:
1 
1 ‘fuzzy’
1      This flag can be generated by the ‘msgmerge’ program or it can be
1      inserted by the translator herself.  It shows that the ‘msgstr’
1      string might not be a correct translation (anymore).  Only the
1      translator can judge if the translation requires further
     modification, or is acceptable as is.  Once satisfied with the
     translation, she then removes this ‘fuzzy’ attribute.  The
     ‘msgmerge’ program inserts this flag when it has paired the
     ‘msgid’ and ‘msgstr’ only through a fuzzy search.  See ⇒Fuzzy
     Entries.
1 
1 ‘c-format’
1 ‘no-c-format’
1      These flags should not be added by a human.  Instead only the
1      ‘xgettext’ program adds them.  In an automated PO file processing
1      system as proposed here, the user’s changes would be thrown away
1      again as soon as the ‘xgettext’ program generates a new template
1      file.
1 
1      The ‘c-format’ flag indicates that the untranslated string and the
1      translation are supposed to be C format strings.  The ‘no-c-format’
1      flag indicates that they are not C format strings, even though the
1      untranslated string happens to look like a C format string (with
1      ‘%’ directives).
1 
1      When the ‘c-format’ flag is given for a string the ‘msgfmt’ program
1      does some more tests to check the validity of the translation.
     See ⇒msgfmt Invocation, ⇒c-format Flag and ⇒c-format.
1 
1 ‘objc-format’
1 ‘no-objc-format’
1      Likewise for Objective C, see ⇒objc-format.
1 
1 ‘sh-format’
1 ‘no-sh-format’
1      Likewise for Shell, see ⇒sh-format.
1 
1 ‘python-format’
1 ‘no-python-format’
1      Likewise for Python, see ⇒python-format.
1 
1 ‘python-brace-format’
1 ‘no-python-brace-format’
1      Likewise for Python brace, see ⇒python-format.
1 
1 ‘lisp-format’
1 ‘no-lisp-format’
1      Likewise for Lisp, see ⇒lisp-format.
1 
1 ‘elisp-format’
1 ‘no-elisp-format’
1      Likewise for Emacs Lisp, see ⇒elisp-format.
1 
1 ‘librep-format’
1 ‘no-librep-format’
1      Likewise for librep, see ⇒librep-format.
1 
1 ‘scheme-format’
1 ‘no-scheme-format’
1      Likewise for Scheme, see ⇒scheme-format.
1 
1 ‘smalltalk-format’
1 ‘no-smalltalk-format’
1      Likewise for Smalltalk, see ⇒smalltalk-format.
1 
1 ‘java-format’
1 ‘no-java-format’
1      Likewise for Java, see ⇒java-format.
1 
1 ‘csharp-format’
1 ‘no-csharp-format’
1      Likewise for C#, see ⇒csharp-format.
1 
1 ‘awk-format’
1 ‘no-awk-format’
1      Likewise for awk, see ⇒awk-format.
1 
1 ‘object-pascal-format’
1 ‘no-object-pascal-format’
1      Likewise for Object Pascal, see ⇒object-pascal-format.
1 
1 ‘ycp-format’
1 ‘no-ycp-format’
1      Likewise for YCP, see ⇒ycp-format.
1 
1 ‘tcl-format’
1 ‘no-tcl-format’
1      Likewise for Tcl, see ⇒tcl-format.
1 
1 ‘perl-format’
1 ‘no-perl-format’
1      Likewise for Perl, see ⇒perl-format.
1 
1 ‘perl-brace-format’
1 ‘no-perl-brace-format’
1      Likewise for Perl brace, see ⇒perl-format.
1 
1 ‘php-format’
1 ‘no-php-format’
1      Likewise for PHP, see ⇒php-format.
1 
1 ‘gcc-internal-format’
1 ‘no-gcc-internal-format’
1      Likewise for the GCC sources, see ⇒gcc-internal-format.
1 
1 ‘gfc-internal-format’
1 ‘no-gfc-internal-format’
     Likewise for the GNU Fortran Compiler sources, see
     ⇒gfc-internal-format.
1 
1 ‘qt-format’
1 ‘no-qt-format’
1      Likewise for Qt, see ⇒qt-format.
1 
1 ‘qt-plural-format’
1 ‘no-qt-plural-format’
1      Likewise for Qt plural forms, see ⇒qt-plural-format.
1 
1 ‘kde-format’
1 ‘no-kde-format’
1      Likewise for KDE, see ⇒kde-format.
1 
1 ‘boost-format’
1 ‘no-boost-format’
1      Likewise for Boost, see ⇒boost-format.
1 
1 ‘lua-format’
1 ‘no-lua-format’
1      Likewise for Lua, see ⇒lua-format.
1 
1 ‘javascript-format’
1 ‘no-javascript-format’
1      Likewise for JavaScript, see ⇒javascript-format.
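
   Since the flags of an entry form a comma-separated list on a ‘#,’
line, a program reading a PO file can split that line to find out, for
example, whether an entry is fuzzy.  The following Python sketch shows
one way to do so; ‘parse_flags’ is a hypothetical helper, not a GNU
‘gettext’ interface.

```python
def parse_flags(line):
    """Split a '#,' comment line into its list of flags."""
    if not line.startswith("#,"):
        raise ValueError("not a flag line: %r" % line)
    # Drop the '#,' marker, split on commas, and trim surrounding spaces.
    return [flag.strip() for flag in line[2:].split(",") if flag.strip()]


flags = parse_flags("#, fuzzy, c-format")
print(flags)
print("fuzzy" in flags)
```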
1 
1    It is also possible to have entries with a context specifier.  They
1 look like this:
1 
1      WHITE-SPACE
1      #  TRANSLATOR-COMMENTS
1      #. EXTRACTED-COMMENTS
1      #: REFERENCE…
1      #, FLAG…
1      #| msgctxt PREVIOUS-CONTEXT
1      #| msgid PREVIOUS-UNTRANSLATED-STRING
1      msgctxt CONTEXT
1      msgid UNTRANSLATED-STRING
1      msgstr TRANSLATED-STRING
1 
1    The context serves to disambiguate messages with the same
1 UNTRANSLATED-STRING.  It is possible to have several entries with the
1 same UNTRANSLATED-STRING in a PO file, provided that they each have a
1 different CONTEXT.  Note that an empty CONTEXT string and an absent
1 ‘msgctxt’ line do not mean the same thing.
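
   One way to picture this rule is a table keyed by the pair of CONTEXT
and UNTRANSLATED-STRING.  In the following Python sketch, ‘None’ stands
for an absent ‘msgctxt’ line, so it stays distinct from the empty
context.  The names ‘catalog’ and ‘add_entry’, the sample strings, and
the placeholder translations are all hypothetical.

```python
catalog = {}


def add_entry(msgctxt, msgid, msgstr):
    """Store one entry; msgctxt is None when the entry has no msgctxt line."""
    key = (msgctxt, msgid)
    if key in catalog:
        raise ValueError("duplicate entry for %r" % (key,))
    catalog[key] = msgstr


# Three entries share the same msgid but differ in context.
add_entry(None, "Open", "TRANSLATION-WITHOUT-CONTEXT")
add_entry("", "Open", "TRANSLATION-WITH-EMPTY-CONTEXT")
add_entry("menu", "Open", "TRANSLATION-FOR-MENU")

print(len(catalog))
```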
1 
   A different kind of entry is used for translations which involve
plural forms:
1 
1      WHITE-SPACE
1      #  TRANSLATOR-COMMENTS
1      #. EXTRACTED-COMMENTS
1      #: REFERENCE…
1      #, FLAG…
1      #| msgid PREVIOUS-UNTRANSLATED-STRING-SINGULAR
1      #| msgid_plural PREVIOUS-UNTRANSLATED-STRING-PLURAL
1      msgid UNTRANSLATED-STRING-SINGULAR
1      msgid_plural UNTRANSLATED-STRING-PLURAL
1      msgstr[0] TRANSLATED-STRING-CASE-0
1      ...
1      msgstr[N] TRANSLATED-STRING-CASE-N
1 
1    Such an entry can look like this:
1 
1      #: src/msgcmp.c:338 src/po-lex.c:699
1      #, c-format
1      msgid "found %d fatal error"
1      msgid_plural "found %d fatal errors"
1      msgstr[0] "s'ha trobat %d error fatal"
1      msgstr[1] "s'han trobat %d errors fatals"
1 
1    Here also, a ‘msgctxt’ context can be specified before ‘msgid’, like
1 above.
1 
1    Here, additional kinds of flags can be used:
1 
1 ‘range:’
1      This flag is followed by a range of non-negative numbers, using the
1      syntax ‘range: MINIMUM-VALUE..MAXIMUM-VALUE’.  It designates the
1      possible values that the numeric parameter of the message can take.
1      In some languages, translators may produce slightly better
1      translations if they know that the value can only take on values
1      between 0 and 10, for example.
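
   As an illustration of the ‘range:’ syntax, the following Python
sketch extracts the two bounds from such a flag.  The helper name
‘parse_range’ is hypothetical, and the sketch assumes the flag has
already been separated from any other flags on the ‘#,’ line.

```python
def parse_range(flag):
    """Return (minimum, maximum) from a flag such as 'range: 0..10'."""
    prefix = "range:"
    if not flag.startswith(prefix):
        raise ValueError("not a range flag: %r" % flag)
    # The two bounds are separated by '..'.
    low, high = flag[len(prefix):].strip().split("..")
    return int(low), int(high)


print(parse_range("range: 0..10"))
```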
1 
   The PREVIOUS-UNTRANSLATED-STRING is optionally inserted by the
‘msgmerge’ program, at the same time as it marks a message fuzzy.  It
helps the translator to see which changes the developers made to the
UNTRANSLATED-STRING.
1 
   Sometimes a few lines, usually white space or comments, follow the
very last entry of a PO file.  Such lines are not part of any entry.
They will be dropped when the PO file is processed by the tools, and
they may disturb some PO file editors.
1 
   The remainder of this section may be safely skipped by those using a
PO file editor, yet it may be interesting for everybody to have a better
idea of the precise format of a PO file.  On the other hand, those
wishing to modify PO files by hand should read on carefully.
1 
1    An empty UNTRANSLATED-STRING is reserved to contain the header entry
1 with the meta information (⇒Header Entry).  This header entry
1 should be the first entry of the file.  The empty UNTRANSLATED-STRING is
1 reserved for this purpose and must not be used anywhere else.
1 
1    Each of UNTRANSLATED-STRING and TRANSLATED-STRING respects the C
1 syntax for a character string, including the surrounding quotes and
1 embedded backslashed escape sequences.  When the time comes to write
1 multi-line strings, one should not use escaped newlines.  Instead, a
1 closing quote should follow the last character on the line to be
1 continued, and an opening quote should resume the string at the
1 beginning of the following PO file line.  For example:
1 
1      msgid ""
1      "Here is an example of how one might continue a very long string\n"
1      "for the common case the string represents multi-line output.\n"
1 
In this example, the empty string is used on the first line, to allow
better alignment of the ‘H’ from the word ‘Here’ over the ‘f’ from the
word ‘for’.  The ‘msgid’ keyword is followed by three strings, which
are meant to be concatenated.  Concatenating the empty string does not
change the resulting overall string, but it lets us satisfy the
requirement that ‘msgid’ be followed by a string on the same line,
while keeping the multi-line presentation left-justified, which we
find cleaner.  The empty string could have been omitted, but only if
the string starting with ‘Here’ were moved up to the first line, right
after ‘msgid’.(1)  Nor was it really necessary to switch between the
last two quoted strings immediately after the newline ‘\n’; the switch
could have occurred after _any_ other character.  We just did it this
way because it is neater.
1 
   One should carefully distinguish between ends of lines marked as
‘\n’ _inside_ quotes, which are part of the represented string, and
ends of lines in the PO file itself, outside string quotes, which have
no effect on the represented string.
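
   The concatenation of quoted strings, and the difference between a
‘\n’ inside quotes and a line break in the PO file, can be sketched in
Python as follows.  The helpers ‘unquote’ and ‘join_segments’ are
hypothetical, and only a small subset of the C escape sequences is
handled, which is an assumption made to keep the sketch short.

```python
# Only a few C escapes are recognized in this sketch.
ESCAPES = {"n": "\n", "t": "\t", "\\": "\\", '"': '"'}


def unquote(segment):
    """Decode one double-quoted string as it appears on a PO file line."""
    segment = segment.strip()
    if len(segment) < 2 or segment[0] != '"' or segment[-1] != '"':
        raise ValueError("not a quoted string: %r" % segment)
    out = []
    chars = iter(segment[1:-1])
    for ch in chars:
        if ch == "\\":
            nxt = next(chars, None)
            if nxt not in ESCAPES:
                raise ValueError("unsupported escape in %r" % segment)
            out.append(ESCAPES[nxt])
        else:
            out.append(ch)
    return "".join(out)


def join_segments(segments):
    """Concatenate consecutive quoted strings; PO line breaks add nothing."""
    return "".join(unquote(s) for s in segments)


msgid = join_segments([
    '""',
    '"Here is an example of how one might continue a very long string\\n"',
    '"for the common case the string represents multi-line output.\\n"',
])
print(msgid.count("\n"))
```

   The represented string contains exactly the two newlines written as
‘\n’; the three PO file lines themselves contribute none.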
1 
1    Outside strings, white lines and comments may be used freely.
1 Comments start at the beginning of a line with ‘#’ and extend until the
1 end of the PO file line.  Comments written by translators should have
1 the initial ‘#’ immediately followed by some white space.  If the ‘#’ is
1 not immediately followed by white space, this comment is most likely
1 generated and managed by specialized GNU tools, and might disappear or
1 be replaced unexpectedly when the PO file is given to ‘msgmerge’.
1 
1    ---------- Footnotes ----------
1 
1    (1) This limitation is not imposed by GNU ‘gettext’, but is for
1 compatibility with the ‘msgfmt’ implementation on Solaris.
1