gettext: Preparing ITS Rules

 
 16.1.6 Preparing Rules for XML Internationalization
 ---------------------------------------------------
 
    Marking translatable strings in an XML file is done through a
 separate "rule" file, making use of the Internationalization Tag Set
 standard (ITS, <https://www.w3.org/TR/its20/>).  The currently supported
 ITS data categories are: ‘Translate’, ‘Localization Note’, ‘Elements
 Within Text’, and ‘Preserve Space’.  In addition to them, ‘xgettext’
 also recognizes the following extended data categories:
 
 ‘Context’
 
      This data category associates ‘msgctxt’ to the extracted text.  In
      the global rule, the ‘contextRule’ element contains the following:
 
         • A required ‘selector’ attribute.  It contains an absolute
           selector that selects the nodes to which this rule applies.
 
         • A required ‘contextPointer’ attribute that contains a relative
           selector pointing to a node that holds the ‘msgctxt’ value.
 
         • An optional ‘textPointer’ attribute that contains a relative
           selector pointing to a node that holds the ‘msgid’ value.
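
      As a rough illustration, assume a hypothetical document in which
      each ‘message’ element carries a ‘context’ attribute.  A rule file
      along the following lines would then be expected to attach that
      attribute's value as ‘msgctxt’ to the text of the ‘p’ child.  The
      ‘gt’ prefix and the ‘context’ attribute are assumptions made for
      this sketch; the namespace URI is the one required for the extended
      rule elements:

           <?xml version="1.0"?>
           <its:rules xmlns:its="http://www.w3.org/2005/11/its"
                      xmlns:gt="https://www.gnu.org/s/gettext/ns/its/extensions/1.0"
                      version="1.0">
             <its:translateRule selector="//message/p" translate="yes"/>

             <!-- Assumption: the input looks like
                  <message context="menu"><p>Open</p></message>,
                  so the context is read from the parent's attribute.  -->
             <gt:contextRule selector="//message/p"
                             contextPointer="../@context"/>
           </its:rules>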
 
 ‘Escape Special Characters’
 
      This data category indicates whether the special XML characters
      (‘<’, ‘>’, ‘&’, ‘"’) are escaped with entity references.  In the
      global rule, the ‘escapeRule’ element contains the following:
 
         • A required ‘selector’ attribute.  It contains an absolute
           selector that selects the nodes to which this rule applies.
 
         • A required ‘escape’ attribute with the value ‘yes’ or ‘no’.
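
      A minimal sketch of such a rule, for a case where the extracted
      strings should keep special characters unescaped, might look as
      follows.  The ‘gt’ prefix and the ‘//message/p’ selector are
      illustrative assumptions, not fixed names:

           <?xml version="1.0"?>
           <its:rules xmlns:its="http://www.w3.org/2005/11/its"
                      xmlns:gt="https://www.gnu.org/s/gettext/ns/its/extensions/1.0"
                      version="1.0">
             <its:translateRule selector="//message/p" translate="yes"/>

             <!-- Assumption: the text of 'p' should be extracted with
                  '<', '>', '&' and '"' left unescaped.  -->
             <gt:escapeRule selector="//message/p" escape="no"/>
           </its:rules>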
 
 ‘Extended Preserve Space’
 
      This data category extends the standard ‘Preserve Space’ data
      category with the additional values ‘trim’ and ‘paragraph’.  ‘trim’
      means to remove leading and trailing whitespace from the content,
      but not to normalize whitespace in the middle.
      ‘paragraph’ means to normalize the content but keep the paragraph
      boundaries.  In the global rule, the ‘preserveSpaceRule’ element
      contains the following:
 
         • A required ‘selector’ attribute.  It contains an absolute
           selector that selects the nodes to which this rule applies.
 
         • A required ‘space’ attribute with the value ‘default’,
           ‘preserve’, ‘trim’, or ‘paragraph’.
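
      For instance, a rule file that normalizes the content of a
      hypothetical multi-paragraph ‘description’ element while keeping
      its paragraph boundaries could be sketched as below.  The ‘gt’
      prefix and the ‘description’ element name are assumptions made for
      illustration only:

           <?xml version="1.0"?>
           <its:rules xmlns:its="http://www.w3.org/2005/11/its"
                      xmlns:gt="https://www.gnu.org/s/gettext/ns/its/extensions/1.0"
                      version="1.0">
             <its:translateRule selector="//description" translate="yes"/>

             <!-- Assumption: 'description' holds several paragraphs of
                  running text whose boundaries should survive.  -->
             <gt:preserveSpaceRule selector="//description"
                                   space="paragraph"/>
           </its:rules>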
 
    All of these extended data categories can only be expressed with
 global rules, and the rule elements must be in the
 ‘https://www.gnu.org/s/gettext/ns/its/extensions/1.0’ namespace.
 
    Given the following XML document in a file ‘messages.xml’:
 
      <?xml version="1.0"?>
      <messages>
        <message>
          <p>A translatable string</p>
        </message>
        <message>
          <p translatable="no">A non-translatable string</p>
        </message>
      </messages>
 
    To extract the first text content ("A translatable string"), but not
 the second ("A non-translatable string"), the following ITS rules can be
 used:
 
      <?xml version="1.0"?>
      <its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
        <its:translateRule selector="/messages" translate="no"/>
        <its:translateRule selector="//message/p" translate="yes"/>
 
        <!-- If 'p' has an attribute 'translatable' with the value 'no', then
             the content is not translatable.  -->
        <its:translateRule selector="//message/p[@translatable = 'no']"
          translate="no"/>
      </its:rules>
 
    ‘xgettext’ needs another file, called a "locating rule" file, to
 associate an ITS rule file with an XML file.  If the above ITS file is
 saved as ‘messages.its’, the locating rule would look like:
 
      <?xml version="1.0"?>
      <locatingRules>
        <locatingRule name="Messages" pattern="*.xml">
          <documentRule localName="messages" target="messages.its"/>
        </locatingRule>
        <locatingRule name="Messages" pattern="*.msg" target="messages.its"/>
      </locatingRules>
 
    The ‘locatingRule’ element must have a ‘pattern’ attribute, which
 denotes either a literal file name or a wildcard pattern of the XML
 file(1).  The ‘locatingRule’ element can have a child ‘documentRule’
 element, which adds checks on the content of the XML file.
 
    The first rule matches any file with the ‘.xml’ file extension, but
 it only applies to XML files whose root element is ‘<messages>’.
 
    The second rule indicates that the same ITS rule file is also
 applicable to any file with the ‘.msg’ file extension.  The optional
 ‘name’ attribute of ‘locatingRule’ allows choosing rules by name,
 typically with ‘xgettext’’s ‘-L’ option.
 
    The associated ITS rule file is indicated by the ‘target’ attribute
 of ‘locatingRule’ or ‘documentRule’.  If it is specified in a
 ‘documentRule’ element, the parent ‘locatingRule’ shouldn’t have the
 ‘target’ attribute.
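
    As a sketch of why ‘target’ may belong on ‘documentRule’ rather than
 on ‘locatingRule’, the following locating rule file dispatches on the
 root element of the matched file.  It assumes that a ‘locatingRule’ may
 hold more than one ‘documentRule’ child; the ‘catalog’ root element and
 the ‘catalog.its’ file are hypothetical names used only for this
 illustration:

      <?xml version="1.0"?>
      <locatingRules>
        <!-- No 'target' on the parent: each child names its own ITS file.  -->
        <locatingRule name="Messages" pattern="*.xml">
          <documentRule localName="messages" target="messages.its"/>
          <!-- Hypothetical second document type sharing the extension.  -->
          <documentRule localName="catalog" target="catalog.its"/>
        </locatingRule>
      </locatingRules>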
 
    Locating rule files must have the ‘.loc’ file extension.  Both ITS
 rule files and locating rule files must be installed in the
 ‘$prefix/share/gettext/its’ directory.  Once those files are properly
 installed, ‘xgettext’ can extract translatable strings from the matching
 XML files.
 
 16.1.6.1 Two Use-cases of Translated Strings in XML
 ...................................................
 
    For XML, there are two use-cases of translated strings.  One is the
 case where the translated strings are directly consumed by programs, and
 the other is the case where the translated strings are merged back to
 the original XML document.  In the former case, special characters in
 the extracted strings shouldn’t be escaped, while they should in the
 latter case.  To control whether to escape special characters, the
 ‘Escape Special Characters’ data category can be used.
 
    To merge the translations, the ‘msgfmt’ program can be used with the
 option ‘--xml’.  See ‘msgfmt Invocation’ for more details about how to
 call the ‘msgfmt’ program.  ‘msgfmt’’s ‘--xml’ option doesn’t perform
 character escaping, so translated strings can have arbitrary XML
 constructs, such as elements for markup.
 
    ---------- Footnotes ----------
 
    (1) Note that the file name matching is done after removing any ‘.in’
 suffix from the input file name.  Thus the ‘pattern’ attribute must not
 include a pattern matching ‘.in’.  For example, if the input file name
 is ‘foo.msg.in’, the pattern should be either ‘*.msg’ or just ‘*’,
 rather than ‘*.in’.