gettext: Normalizing

8.3.4 Normalizing Strings in Entries
------------------------------------

   There are many different ways to encode a particular string in a PO
file entry, because there are so many ways to split and quote multi-line
strings, and even to represent special characters by backslash escape
sequences.  Some features of PO mode rely on the ability of PO mode to
scan an existing PO file for a particular string encoded in the ‘msgid’
field of some entry.  Even though PO mode has all the built-in machinery
for implementing this recognition easily, doing it fast is technically
difficult.  To make an efficient solution possible, we decided on a
canonical representation for strings.

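   As an illustration of why a canonical form helps, here is a minimal
Python sketch showing that two differently split encodings of a field
decode to the same string.  The helper ‘decode_po_string’ is
hypothetical, is not part of PO mode or GNU ‘gettext’, and assumes that
only the escapes ‘\n’, ‘\t’, ‘\"’ and ‘\\’ occur:

```python
import re

# Hypothetical helper for illustration; not part of PO mode or GNU gettext.
# Assumption: only these four backslash escapes appear in the file.
_ESCAPES = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}

# One double-quoted segment; a backslash always protects the next character.
_SEGMENT = re.compile(r'"((?:[^"\\]|\\.)*)"')


def decode_po_string(field_text):
    """Join all quoted segments of a field and undo the known escapes."""
    decoded = []
    for segment in _SEGMENT.findall(field_text):
        decoded.append(
            re.sub(
                r"\\(.)",
                # Unknown escapes are left untouched rather than guessed at.
                lambda m: _ESCAPES.get(m.group(1), m.group(0)),
                segment,
            )
        )
    return "".join(decoded)


# The same string, written on one line and split over several lines.
one_line = r'"Hello,\nworld!\n"'
split_up = r'''""
"Hello,\n"
"world!\n"'''

print(decode_po_string(one_line) == decode_po_string(split_up))
```

   Comparing entries by decoding each of them in this way costs time on
every lookup.  With a single canonical spelling, a plain textual search
for the encoded ‘msgid’ could be enough.
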
   A conventional representation of strings in a PO file is currently
under discussion, and PO mode experiments with a canonical
representation.  Having both ‘xgettext’ and PO mode converge towards a
uniform way of representing equivalent strings would be useful, as the
internal normalization needed by PO mode would then be satisfied
automatically when using ‘xgettext’ from GNU ‘gettext’.  Explicit PO
mode normalization should then be necessary only for PO files imported
from elsewhere, or when the convention itself evolves.

   To normalize at least those strings of a given PO file that need a
canonical representation, the following PO mode command is available:

‘M-x po-normalize’
     Tidy the whole PO file by making entries more uniform.

   The special command ‘M-x po-normalize’, which has no key binding,
revises all entries, ensuring that strings of both original and
translated entries use uniform internal quoting in the PO file.  It also
removes any stray text after the last entry.  This command may be useful
for PO files freshly imported from elsewhere, or if we ever improve on
the canonical quoting format we use.  This canonical format is meant not
only for getting cleaner PO files, but also for greatly speeding up
‘msgid’ string lookup for some other PO mode commands.

   ‘M-x po-normalize’ presently makes three passes over the entries.
The first implements heuristics for converting PO files for GNU
‘gettext’ 0.6 and earlier, in which ‘msgid’ and ‘msgstr’ fields used
K&R style C string syntax for multi-line strings.  These heuristics may
fail for comments that are not related to obsolete entries and that end
with a backslash; they also depend on subsequent passes for finalizing
the proper commenting of continued lines for obsolete entries.  This
first pass might disappear once all older PO files have been adjusted.
The second and third passes normalize all ‘msgid’ and ‘msgstr’ strings
respectively.  They also clean out the trailing backslashes used by
XView’s ‘msgfmt’ for continued lines.

   Having such an explicit normalizing command allows for importing PO
files from other sources, and also eases the evolution of the current
convention, which is so far driven mostly by aesthetic concerns.  It is
easy to make suggested adjustments at a later time, as the normalizing
command and, eventually, other GNU ‘gettext’ tools should greatly
automate conformance.  A description of the canonical string format is
given below, for the particular benefit of those who do not have Emacs
handy but would nevertheless like to handcraft their PO files in nice
ways.

   Right now, in PO mode, strings are single line or multi-line.  A
string goes multi-line if and only if it has _embedded_ newlines, that
is, if it matches ‘[^\n]\n+[^\n]’.  So, we would have:

     msgstr "\n\nHello, world!\n\n\n"

   but, replacing the space after the comma by a newline, this becomes:

     msgstr ""
     "\n"
     "\n"
     "Hello,\n"
     "world!\n"
     "\n"
     "\n"

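   The rule above can be sketched in Python as follows.  This is only
an illustration under stated assumptions, not the PO mode
implementation: the names ‘po_escape’ and ‘format_field’ are
hypothetical, and only backslash, double quote, tab and newline are
escaped.

```python
import re

# The "embedded newline" test quoted in the text above.
EMBEDDED_NEWLINE = re.compile(r"[^\n]\n+[^\n]")


def po_escape(text):
    """Escape a decoded string for use between double quotes (assumed subset)."""
    return (
        text.replace("\\", "\\\\")  # backslash first, so later escapes survive
        .replace('"', '\\"')
        .replace("\t", "\\t")
        .replace("\n", "\\n")
    )


def format_field(keyword, text):
    """Format one field (hypothetical helper) following the current rule."""
    if not EMBEDDED_NEWLINE.search(text):
        # No embedded newline: the string stays on the keyword line.
        return '%s "%s"' % (keyword, po_escape(text))
    # Multi-line: an empty string on the keyword line, then one quoted
    # piece per newline-terminated chunk (plus any unterminated tail).
    lines = ['%s ""' % keyword]
    for piece in re.findall(r"[^\n]*\n|[^\n]+", text):
        lines.append('"%s"' % po_escape(piece))
    return "\n".join(lines)


print(format_field("msgstr", "\n\nHello, world!\n\n\n"))
print(format_field("msgstr", "\n\nHello,\nworld!\n\n\n"))
```

   The two calls are meant to reproduce the single line and multi-line
forms of the ‘Hello, world!’ entry.
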
   The ‘Hello, world!’ example is deliberately exaggerated, to make the
point clearer.  Usually, multi-line strings do not look that bad.  We
will probably implement the following suggestion.  We might lump
together all initial newlines into the leading empty string, and also
all newlines introducing empty lines (that is, in a run of N > 1
newlines, the last N-1 would go together in a separate string), so that
the previous example would appear as:

     msgstr "\n\n"
     "Hello,\n"
     "world!\n"
     "\n\n"

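   Since this suggestion is not settled, the following Python sketch is
only one possible reading of it.  The names ‘po_escape’ and
‘format_field_lumped’ are hypothetical, and the escaping covers only
backslash, double quote, tab and newline.

```python
import re

# The "embedded newline" test used to decide whether a string goes multi-line.
EMBEDDED_NEWLINE = re.compile(r"[^\n]\n+[^\n]")


def po_escape(text):
    """Escape a decoded string for use between double quotes (assumed subset)."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\t", "\\t")
        .replace("\n", "\\n")
    )


def format_field_lumped(keyword, text):
    """Format one field (hypothetical helper), lumping runs of newlines."""
    if not EMBEDDED_NEWLINE.search(text):
        return '%s "%s"' % (keyword, po_escape(text))
    # Initial newlines go into the string on the keyword line.
    body = text.lstrip("\n")
    leading = text[: len(text) - len(body)]
    lines = ['%s "%s"' % (keyword, po_escape(leading))]
    # Each text line keeps its own newline; the remaining newlines of a
    # run (the last N-1 of N) share one separate string.
    for piece in re.findall(r"[^\n]+\n?|\n+", body):
        lines.append('"%s"' % po_escape(piece))
    return "\n".join(lines)


print(format_field_lumped("msgstr", "\n\nHello,\nworld!\n\n\n"))
```

   Under this reading, a string without initial newlines still starts
with an empty string on the keyword line.
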
   A few minor points about string normalization are still undecided;
they will be documented in this manual once these questions are settled.