gettext: Preparing ITS Rules


15.6.6 Preparing Rules for XML Internationalization
---------------------------------------------------

   Marking translatable strings in an XML file is done through a
separate "rule" file, making use of the Internationalization Tag Set
standard (ITS, <http://www.w3.org/TR/its20/>).  The currently supported
ITS data categories are: ‘Translate’, ‘Localization Note’, ‘Elements
Within Text’, and ‘Preserve Space’.  In addition to these, ‘xgettext’
also recognizes the following extended data categories:

‘Context’

     This data category associates a ‘msgctxt’ with the extracted text.
     In the global rule, the ‘contextRule’ element has the following
     attributes:

        • A required ‘selector’ attribute.  It contains an absolute
          selector that selects the nodes to which this rule applies.

        • A required ‘contextPointer’ attribute that contains a relative
          selector pointing to a node that holds the ‘msgctxt’ value.

        • An optional ‘textPointer’ attribute that contains a relative
          selector pointing to a node that holds the ‘msgid’ value.

‘Escape Special Characters’

     This data category indicates whether the special XML characters
     (‘<’, ‘>’, ‘&’, ‘"’) are escaped with entity references.  In the
     global rule, the ‘escapeRule’ element has the following attributes:

        • A required ‘selector’ attribute.  It contains an absolute
          selector that selects the nodes to which this rule applies.

        • A required ‘escape’ attribute with the value ‘yes’ or ‘no’.

‘Extended Preserve Space’

     This data category extends the standard ‘Preserve Space’ data
     category with the additional value ‘trim’.  This value removes the
     leading and trailing whitespace of the content, but does not
     normalize whitespace in the middle.  In the global rule, the
     ‘preserveSpaceRule’ element has the following attributes:

        • A required ‘selector’ attribute.  It contains an absolute
          selector that selects the nodes to which this rule applies.

        • A required ‘space’ attribute with the value ‘default’,
          ‘preserve’, or ‘trim’.

   All of these extended data categories can only be expressed as global
rules, and their rule elements must be in the
‘https://www.gnu.org/s/gettext/ns/its/extensions/1.0’ namespace.
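
   A rule file that mixes a standard ITS rule with extended rules might
look like the following sketch.  The ‘context’ attribute on ‘p’ is a
hypothetical addition (the ‘messages.xml’ document used in this section
has none), and the ‘gt’ prefix is only a conventional choice: as with
any XML namespace, what matters is the URI it is bound to.

```xml
<?xml version="1.0"?>
<its:rules xmlns:its="http://www.w3.org/2005/11/its"
           xmlns:gt="https://www.gnu.org/s/gettext/ns/its/extensions/1.0"
           version="2.0">
  <!-- Standard ITS rule: it lives in the ITS namespace. -->
  <its:translateRule selector="//message/p" translate="yes"/>

  <!-- Extended rules: they must live in the gettext extension namespace.
       Here a hypothetical 'context' attribute on 'p' supplies the msgctxt. -->
  <gt:contextRule selector="//message/p[@context]" contextPointer="@context"/>

  <!-- Strip leading and trailing whitespace from 'p' content, but keep
       the whitespace in between as it is. -->
  <gt:preserveSpaceRule selector="//message/p" space="trim"/>
</its:rules>
```

   Under these assumptions, an element such as ‘<p context="menu"> Open
</p>’ inside a ‘<message>’ would be expected to yield the ‘msgid’
"Open", with its surrounding spaces trimmed, and the ‘msgctxt’ "menu".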

   Given the following XML document in a file ‘messages.xml’:

     <?xml version="1.0"?>
     <messages>
       <message>
         <p>A translatable string</p>
       </message>
       <message>
         <p translatable="no">A non-translatable string</p>
       </message>
     </messages>

   To extract the first text content ("A translatable string"), but not
the second ("A non-translatable string"), the following ITS rules can be
used:

     <?xml version="1.0"?>
     <its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
       <its:translateRule selector="/messages" translate="no"/>
       <its:translateRule selector="//message/p" translate="yes"/>

       <!-- If 'p' has an attribute 'translatable' with the value 'no', then
            the content is not translatable.  -->
       <its:translateRule selector="//message/p[@translatable = 'no']"
         translate="no"/>
     </its:rules>

   ‘xgettext’ needs another file, called a "locating rule" file, to
associate an ITS rule file with an XML file.  If the above ITS file is
saved as ‘messages.its’, the locating rule file would look like:

     <?xml version="1.0"?>
     <locatingRules>
       <locatingRule name="Messages" pattern="*.xml">
         <documentRule localName="messages" target="messages.its"/>
       </locatingRule>
       <locatingRule name="Messages" pattern="*.msg" target="messages.its"/>
     </locatingRules>

   The ‘locatingRule’ element must have a ‘pattern’ attribute, which
denotes either a literal file name or a wildcard pattern for the XML
file(1).  The ‘locatingRule’ element can have child ‘documentRule’
elements, which add checks on the content of the XML file.

   The first rule matches any file with the ‘.xml’ file extension, but
it only applies to XML files whose root element is ‘<messages>’.

   The second rule indicates that the same ITS rule file is also
applicable to any file with the ‘.msg’ file extension.  The optional
‘name’ attribute of ‘locatingRule’ allows choosing rules by name,
typically with ‘xgettext’’s ‘-L’ option.

   The associated ITS rule file is indicated by the ‘target’ attribute
of ‘locatingRule’ or ‘documentRule’.  If it is specified in a
‘documentRule’ element, the parent ‘locatingRule’ shouldn’t have the
‘target’ attribute.
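
   Putting ‘target’ on ‘documentRule’ elements lets a single file name
pattern lead to different ITS rule files depending on the document.  In
the sketch below, the ‘catalog’ root element and the ‘catalog.its’ file
are hypothetical: a ‘.xml’ file whose root element is ‘<messages>’ would
use ‘messages.its’, one whose root element is ‘<catalog>’ would use
‘catalog.its’, and the enclosing ‘locatingRule’ carries no ‘target’ of
its own.

```xml
<?xml version="1.0"?>
<locatingRules>
  <!-- No 'target' here: each documentRule names its own ITS file. -->
  <locatingRule name="Messages" pattern="*.xml">
    <!-- Root element <messages>: use messages.its. -->
    <documentRule localName="messages" target="messages.its"/>
    <!-- Root element <catalog> (hypothetical): use catalog.its. -->
    <documentRule localName="catalog" target="catalog.its"/>
  </locatingRule>
</locatingRules>
```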

   Locating rule files must have the ‘.loc’ file extension.  Both ITS
rule files and locating rule files must be installed in the
‘$prefix/share/gettext/its’ directory.  Once those files are properly
installed, ‘xgettext’ can extract translatable strings from the matching
XML files.

15.6.6.1 Two Use-cases of Translated Strings in XML
...................................................

   For XML, there are two use-cases of translated strings.  One is the
case where the translated strings are directly consumed by programs, and
the other is the case where the translated strings are merged back into
the original XML document.  In the former case, special characters in
the extracted strings shouldn’t be escaped, while in the latter case
they should be.  To control whether to escape special characters, the
‘Escape Special Characters’ data category can be used.
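
   For instance, a rule file could mark one kind of element for each
use-case.  In this sketch the ‘summary’ and ‘description’ elements are
hypothetical, as is the assumption that the former is read directly by
a program while the latter is merged back into the XML document:

```xml
<?xml version="1.0"?>
<its:rules xmlns:its="http://www.w3.org/2005/11/its"
           xmlns:gt="https://www.gnu.org/s/gettext/ns/its/extensions/1.0"
           version="2.0">
  <!-- Hypothetical 'summary' strings are consumed directly by a program:
       leave special characters unescaped. -->
  <gt:escapeRule selector="//summary" escape="no"/>

  <!-- Hypothetical 'description' strings are merged back into the XML
       document: escape special characters with entity references. -->
  <gt:escapeRule selector="//description" escape="yes"/>
</its:rules>
```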

   To merge the translations, the ‘msgfmt’ program can be used with the
option ‘--xml’.  See msgfmt Invocation, for more details about how one
calls the ‘msgfmt’ program.  ‘msgfmt’’s ‘--xml’ option doesn’t perform
character escaping, so translated strings can contain arbitrary XML
constructs, such as elements for markup.

   ---------- Footnotes ----------

   (1) Note that the file name matching is done after removing any ‘.in’
suffix from the input file name.  Thus the ‘pattern’ attribute must not
include a pattern matching ‘.in’.  For example, if the input file name
is ‘foo.msg.in’, the pattern should be either ‘*.msg’ or just ‘*’,
rather than ‘*.in’.