gettext: Preparing ITS Rules
1
1 15.6.6 Preparing Rules for XML Internationalization
1 ---------------------------------------------------
1
1 Marking translatable strings in an XML file is done through a
1 separate "rule" file, making use of the Internationalization Tag Set
1 standard (ITS, <http://www.w3.org/TR/its20/>). The currently supported
1 ITS data categories are: ‘Translate’, ‘Localization Note’, ‘Elements
1 Within Text’, and ‘Preserve Space’. In addition to them, ‘xgettext’
1 also recognizes the following extended data categories:
1
1 ‘Context’
1
1 This data category associates a ‘msgctxt’ with the extracted text. In
1 the global rule, the ‘contextRule’ element contains the following:
1
1 • A required ‘selector’ attribute. It contains an absolute
1 selector that selects the nodes to which this rule applies.
1
1 • A required ‘contextPointer’ attribute that contains a relative
1 selector pointing to a node that holds the ‘msgctxt’ value.
1
1 • An optional ‘textPointer’ attribute that contains a relative
1 selector pointing to a node that holds the ‘msgid’ value.
1
1 ‘Escape Special Characters’
1
1 This data category indicates whether the special XML characters
1 (‘<’, ‘>’, ‘&’, ‘"’) are escaped with entity references. In the
1 global rule, the ‘escapeRule’ element contains the following:
1
1 • A required ‘selector’ attribute. It contains an absolute
1 selector that selects the nodes to which this rule applies.
1
1 • A required ‘escape’ attribute with the value ‘yes’ or ‘no’.
1
1 ‘Extended Preserve Space’
1
1 This data category extends the standard ‘Preserve Space’ data
1 category with the additional value ‘trim’. This value removes
1 leading and trailing whitespace from the content, but does not
1 normalize whitespace in the middle. In the global rule, the
1 ‘preserveSpaceRule’ element contains the following:
1
1 • A required ‘selector’ attribute. It contains an absolute
1 selector that selects the nodes to which this rule applies.
1
1 • A required ‘space’ attribute with the value ‘default’,
1 ‘preserve’, or ‘trim’.
1
1 All of these extended data categories can only be expressed with global
1 rules, and the rule elements must be in the
1 ‘https://www.gnu.org/s/gettext/ns/its/extensions/1.0’ namespace.
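
As a minimal sketch of how such extension rules can be declared, the following rule file binds the extension namespace to a prefix and applies a ‘contextRule’ and a ‘preserveSpaceRule’. The ‘gt’ prefix, the ‘//message/p’ selector, and the ‘context’ attribute are assumptions made for illustration only; none of them is mandated by ‘xgettext’.

```xml
<?xml version="1.0"?>
<its:rules xmlns:its="http://www.w3.org/2005/11/its"
           xmlns:gt="https://www.gnu.org/s/gettext/ns/its/extensions/1.0"
           version="1.0">
  <its:translateRule selector="//message/p" translate="yes"/>

  <!-- Assumption: each 'p' carries a hypothetical 'context' attribute
       whose value should be used as the msgctxt of the extracted text. -->
  <gt:contextRule selector="//message/p" contextPointer="@context"/>

  <!-- Remove leading and trailing whitespace of each 'p', leaving
       whitespace in the middle untouched. -->
  <gt:preserveSpaceRule selector="//message/p" space="trim"/>
</its:rules>
```

The ‘contextPointer’ value is a relative selector, so ‘@context’ is evaluated against each node matched by ‘selector’.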
1
1 Given the following XML document in a file ‘messages.xml’:
1
1 <?xml version="1.0"?>
1 <messages>
1   <message>
1     <p>A translatable string</p>
1   </message>
1   <message>
1     <p translatable="no">A non-translatable string</p>
1   </message>
1 </messages>
1
1 To extract the first text content ("A translatable string"), but not
1 the second ("A non-translatable string"), the following ITS rules can be
1 used:
1
1 <?xml version="1.0"?>
1 <its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
1   <its:translateRule selector="/messages" translate="no"/>
1   <its:translateRule selector="//message/p" translate="yes"/>
1
1   <!-- If 'p' has an attribute 'translatable' with the value 'no', then
1        the content is not translatable. -->
1   <its:translateRule selector="//message/p[@translatable = 'no']"
1                      translate="no"/>
1 </its:rules>
1
1 ‘xgettext’ needs another file, called a "locating rule" file, to
1 associate an ITS rule file with an XML file. If the above ITS file is
1 saved as ‘messages.its’, the locating rule file would look like:
1
1 <?xml version="1.0"?>
1 <locatingRules>
1   <locatingRule name="Messages" pattern="*.xml">
1     <documentRule localName="messages" target="messages.its"/>
1   </locatingRule>
1   <locatingRule name="Messages" pattern="*.msg" target="messages.its"/>
1 </locatingRules>
1
1 The ‘locatingRule’ element must have a ‘pattern’ attribute, which
1 denotes either a literal file name or a wildcard pattern of the XML
1 file(1). The ‘locatingRule’ element can have a child ‘documentRule’
1 element, which adds checks on the content of the XML file.
1
1 The first rule matches any file with the ‘.xml’ file extension, but
1 it only applies to XML files whose root element is ‘<messages>’.
1
1 The second rule indicates that the same ITS rule file is also
1 applicable to any file with the ‘.msg’ file extension. The optional
1 ‘name’ attribute of ‘locatingRule’ allows choosing rules by name,
1 typically with ‘xgettext’’s ‘-L’ option.
1
1 The associated ITS rule file is indicated by the ‘target’ attribute
1 of ‘locatingRule’ or ‘documentRule’. If it is specified in a
1 ‘documentRule’ element, the parent ‘locatingRule’ shouldn’t have the
1 ‘target’ attribute.
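
As a sketch of the ‘documentRule’ arrangement, a ‘locatingRule’ can omit ‘target’ and let its children select the ITS rule file from the root element of the document. The root element ‘catalog’ and the file ‘catalog.its’ below are hypothetical, and the use of more than one ‘documentRule’ child is an assumption, not something stated above.

```xml
<?xml version="1.0"?>
<locatingRules>
  <!-- No 'target' on the parent: each child decides by root element. -->
  <locatingRule name="Messages" pattern="*.xml">
    <documentRule localName="messages" target="messages.its"/>
    <!-- Hypothetical second document type with its own rule file. -->
    <documentRule localName="catalog" target="catalog.its"/>
  </locatingRule>
</locatingRules>
```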
1
1 Locating rule files must have the ‘.loc’ file extension. Both ITS
1 rule files and locating rule files must be installed in the
1 ‘$prefix/share/gettext/its’ directory. Once those files are properly
1 installed, ‘xgettext’ can extract translatable strings from the matching
1 XML files.
1
1 15.6.6.1 Two Use-cases of Translated Strings in XML
1 ...................................................
1
1 For XML, there are two use-cases of translated strings. One is the
1 case where the translated strings are directly consumed by programs, and
1 the other is the case where the translated strings are merged back to
1 the original XML document. In the former case, special characters in
1 the extracted strings shouldn’t be escaped, while they should in the
1 latter case. To control whether to escape special characters, the
1 ‘Escape Special Characters’ data category can be used.
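
A minimal sketch of the two choices follows. It assumes the extension namespace is bound to a hypothetical ‘gt’ prefix and that ‘//message/p’ selects the translatable nodes; only one of the two rules would normally be active for a given selector.

```xml
<?xml version="1.0"?>
<its:rules xmlns:its="http://www.w3.org/2005/11/its"
           xmlns:gt="https://www.gnu.org/s/gettext/ns/its/extensions/1.0"
           version="1.0">
  <its:translateRule selector="//message/p" translate="yes"/>

  <!-- Strings consumed directly by a program: leave special
       characters unescaped in the extracted text. -->
  <gt:escapeRule selector="//message/p" escape="no"/>

  <!-- Strings merged back into the XML document: use this instead.
  <gt:escapeRule selector="//message/p" escape="yes"/>
  -->
</its:rules>
```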
1
1 To merge the translations, the ‘msgfmt’ program can be used with the
1 option ‘--xml’. See msgfmt Invocation for more details about how to
1 invoke the ‘msgfmt’ program. ‘msgfmt’’s ‘--xml’ option doesn’t
1 perform character escaping, so translated strings can have arbitrary XML
1 constructs, such as elements for markup.
1
1 ---------- Footnotes ----------
1
1 (1) Note that the file name matching is done after removing any ‘.in’
1 suffix from the input file name. Thus the ‘pattern’ attribute must not
1 include a pattern matching ‘.in’. For example, if the input file name
1 is ‘foo.msg.in’, the pattern should be either ‘*.msg’ or just ‘*’,
1 rather than ‘*.in’.
1