gettext: Normalizing
1
1 8.3.4 Normalizing Strings in Entries
1 ------------------------------------
1
1 There are many different ways to encode a particular string into a
1 PO file entry, because there are many ways to split and quote
1 multi-line strings, and even to represent special characters by
1 backslash escape sequences. Some features of PO mode rely on its
1 ability to scan an existing PO file for a particular string encoded
1 in the ‘msgid’ field of some entry. Even though PO mode has all the
1 built-in machinery for implementing this recognition easily, doing it
1 fast is technically difficult. To facilitate a solution to this
1 efficiency problem, we decided on a canonical representation for
1 strings.
1
1 A conventional representation of strings in a PO file is currently
1 under discussion, and PO mode experiments with a canonical
1 representation. Having both ‘xgettext’ and PO mode converging towards a
1 uniform way of representing equivalent strings would be useful, as the
1 internal normalization needed by PO mode could be automatically
1 satisfied when using ‘xgettext’ from GNU ‘gettext’. An explicit PO mode
1 normalization should then be necessary only for PO files imported from
1 elsewhere, or when the convention itself evolves.
1
1 To normalize at least those strings of a given PO file that need a
1 canonical representation, the following PO mode command is
1 available:
1
1 ‘M-x po-normalize’
1 Tidy the whole PO file by making entries more uniform.
1
1 The special command ‘M-x po-normalize’, which has no associated keys,
1 revises all entries, ensuring that strings of both original and
1 translated entries use uniform internal quoting in the PO file. It also
1 removes any crumb after the last entry. This command may be useful for
1 PO files freshly imported from elsewhere, or if we ever improve on the
1 canonical quoting format we use. This canonical format is not only
1 meant for getting cleaner PO files, but also for greatly speeding up
1 ‘msgid’ string lookup for some other PO mode commands.
1
1 ‘M-x po-normalize’ presently makes three passes over the entries.
1 The first implements heuristics for converting PO files for GNU
1 ‘gettext’ 0.6 and earlier, in which ‘msgid’ and ‘msgstr’ fields were
1 using K&R style C string syntax for multi-line strings. These
1 heuristics may fail for comments not related to obsolete entries and
1 ending with a backslash; they also depend on subsequent passes for
1 finalizing the proper commenting of continued lines for obsolete
1 entries. This first pass might disappear once all old PO files have
1 been adjusted. The second and third passes normalize all ‘msgid’ and
1 ‘msgstr’ strings, respectively. They also clean out those trailing
1 backslashes used by XView’s ‘msgfmt’ for continued lines.
1
1 Having such an explicit normalizing command allows for importing PO
1 files from other sources, and also eases the evolution of the current
1 convention, which for now is driven mostly by aesthetic concerns.
1 Suggested adjustments are easy to make at a later time, as the
1 normalizing command and, eventually, other GNU ‘gettext’ tools should
1 largely automate conformance. A description of the canonical string
1 format is given below, for the particular benefit of those who do not
1 have Emacs handy but nevertheless want to handcraft their PO files
1 in nice ways.
1
1 Right now, in PO mode, strings are either single-line or multi-line.
1 A string goes multi-line if and only if it has _embedded_ newlines,
1 that is, if it matches ‘[^\n]\n+[^\n]’; leading and trailing newlines
1 alone do not count. So, we would have:
1
1 msgstr "\n\nHello, world!\n\n\n"
1
1 but, replacing the space by a newline, this becomes:
1
1 msgstr ""
1 "\n"
1 "\n"
1 "Hello,\n"
1 "world!\n"
1 "\n"
1 "\n"
1
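As a rough illustration of the current rule, the following Python sketch formats a string the way the two examples above show. It is not PO mode's actual Emacs Lisp code: the names `format_field` and `escape` are invented for this example, and the escaping is simplified to backslashes, double quotes and newlines.

```python
import re

# The pattern quoted in the text: a newline run with other characters on both sides.
EMBEDDED_NEWLINE = re.compile(r"[^\n]\n+[^\n]")


def escape(text):
    """Escape backslashes, double quotes and newlines (a simplification)."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def format_field(keyword, text):
    """Return the PO lines for one field (hypothetical helper)."""
    if not EMBEDDED_NEWLINE.search(text):
        # No embedded newline: the string stays on a single line.
        return ['%s "%s"' % (keyword, escape(text))]
    # Multi-line: an empty string after the keyword, then one quoted
    # piece per newline (plus a final piece if text remains after it).
    lines = ['%s ""' % keyword]
    for piece in re.findall(r"[^\n]*\n|[^\n]+", text):
        lines.append('"%s"' % escape(piece))
    return lines


for line in format_field("msgstr", "\n\nHello,\nworld!\n\n\n"):
    print(line)
```

Calling `format_field("msgstr", "\n\nHello, world!\n\n\n")` instead yields a single line, since that string has only leading and trailing newlines.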
1 The PO example above is deliberately exaggerated, to make the point
1 clearer. Usually, multi-line strings do not look that bad. We will
1 probably implement the following suggestion: lump together all initial
1 newlines into the initial empty string, and also all newlines
1 introducing empty lines (that is, in a run of N > 1 consecutive
1 newlines, the last N-1 would go together in a separate string), so
1 that the previous example would appear as:
1
1 msgstr "\n\n"
1 "Hello,\n"
1 "world!\n"
1 "\n\n"
1
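The suggested lumping can be sketched in Python as well. This is only one reading of the proposal, which the text itself presents as undecided: the function name `format_field_lumped` is invented here, and the sketch assumes that strings without embedded newlines keep their single-line form.

```python
import re

# Same multi-line test as the current rule.
EMBEDDED_NEWLINE = re.compile(r"[^\n]\n+[^\n]")


def escape(text):
    """Escape backslashes, double quotes and newlines (a simplification)."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def format_field_lumped(keyword, text):
    """Return PO lines with newline runs lumped together (hypothetical)."""
    if not EMBEDDED_NEWLINE.search(text):
        # Assumption: single-line strings are left as they are today.
        return ['%s "%s"' % (keyword, escape(text))]
    # All initial newlines go into the string that follows the keyword.
    leading = re.match(r"\n*", text).group()
    rest = text[len(leading):]
    # Each text line keeps its first newline; any further newlines in
    # the same run are grouped into one separate string.
    pieces = re.findall(r"[^\n]+\n?|\n+", rest)
    lines = ['%s "%s"' % (keyword, escape(leading))]
    lines.extend('"%s"' % escape(piece) for piece in pieces)
    return lines


for line in format_field_lumped("msgstr", "\n\nHello,\nworld!\n\n\n"):
    print(line)
```

For the sample string this gives the four-line form shown above. Other inputs may need different handling once the convention is settled.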
1 A few small points about string normalization are still undecided;
1 they will be documented in this manual once these questions are
1 settled.
1