gettext: Language Implementors

 
 15.1 The Language Implementor’s View
 ====================================
 
    All programming and scripting languages that have the notion of
 strings are eligible to support ‘gettext’.  Supporting ‘gettext’
 means the following:
 
   1. You should add to the language a syntax for translatable strings.
      In principle, a function call of ‘gettext’ would do, but a
      shorthand syntax helps keep internationalized programs legible.
      For example, in C we use the syntax ‘_("string")’, and in GNU awk
      we use the shorthand ‘_"string"’.
 
   2. You should arrange that evaluation of such a translatable string at
      runtime calls the ‘gettext’ function, or performs equivalent
      processing.
 
   3. Similarly, you should make the functions ‘ngettext’, ‘dcgettext’,
      ‘dcngettext’ available from within the language.  These functions
      are less often used, but are nevertheless necessary for particular
      purposes: ‘ngettext’ for correct plural handling, and ‘dcgettext’
      and ‘dcngettext’ for obeying locale-related environment variables
      other than ‘LC_MESSAGES’, such as ‘LC_TIME’ or ‘LC_MONETARY’.
      For these latter functions, you need to make the ‘LC_*’ constants,
      available in the C header ‘<locale.h>’, referenceable from within
      the language, usually either as enumeration values or as strings.
 
   4. You should allow the programmer to designate a message domain,
      either by making the ‘textdomain’ function available from within
      the language, or by introducing a magic variable called
      ‘TEXTDOMAIN’.  Similarly, you should allow the programmer to
      designate where to search for message catalogs, by providing access
      to the ‘bindtextdomain’ function or — on native Windows platforms —
      to the ‘wbindtextdomain’ function.
 
   5. You should either perform a ‘setlocale (LC_ALL, "")’ call during
      the startup of your language runtime, or allow the programmer to do
      so.  Remember that ‘gettext’ acts as a no-op unless both the
      ‘LC_MESSAGES’ and ‘LC_CTYPE’ locale categories are set.
 
   6. A programmer should have a way to extract translatable strings from
      a program into a PO file.  The GNU ‘xgettext’ program is being
      extended to support a wide range of programming languages.  Please
      contact the GNU ‘gettext’ maintainers to help them do this.  They
      will need from you a formal description of the lexical structure
      of source files.  It should answer the following questions:
         • What does a token look like?
         • What does a string literal look like?  What escape characters
           exist inside a string?
         • What escape characters exist outside of strings?  If Unicode
           escapes are supported, are they applied before or after
           tokenization?
         • What is the syntax for function calls?  How are consecutive
           arguments in the same function call separated?
         • What is the syntax for comments?
      Based on this description, the GNU ‘gettext’ maintainers can add
      support to ‘xgettext’.
 
      If the string extractor is best integrated into your language’s
      parser, GNU ‘xgettext’ can function as a front end to your string
      extractor.
 
   7. The language’s library should have a string formatting facility.
      Additionally:
        1. There must be a way, in the format string, to denote the
           arguments by a positional number or a name.  This is needed
           because for some languages and some messages with more than
           one substitutable argument, the translation will need to
           output the substituted arguments in different order.  ⇒
           c-format Flag.
        2. The syntax of format strings must be documented in a way that
           translators can understand.  The GNU ‘gettext’ manual will be
           extended to include a pointer to this documentation.
      Based on this, the GNU ‘gettext’ maintainers can add a format
      string equivalence checker to ‘msgfmt’, so that translators get
      told immediately when they have made a mistake during the
      translation of a format string.
 
   8. If the language has more than one implementation, and not all of
      the implementations use ‘gettext’, but the programs should be
      portable across implementations, you should provide a no-i18n
      emulation that makes the other implementations accept programs
      written for yours, without actually translating the strings.
 
   9. To help the programmer in the task of marking translatable strings,
      which is sometimes performed using the Emacs PO mode (⇒
      Marking), you are welcome to contact the GNU ‘gettext’
      maintainers, so they can add support for your language to
      ‘po-mode.el’.
 
    On the implementation side, two approaches are possible, with
 different effects on portability and copyright:
 
    • You may link against GNU ‘gettext’ functions if they are found in
      the C library.  For example, an autoconf test for ‘gettext()’ and
      ‘ngettext()’ will detect this situation.  At present, this test
      succeeds on GNU systems and on Solaris 11 platforms.  No severe
      copyright restrictions apply, unless you want to distribute
      statically linked binaries.
 
    • You may emulate or reimplement the GNU ‘gettext’ functionality.
      This has the advantage of full portability and no copyright
      restrictions, but also the drawback that you have to reimplement
      the GNU ‘gettext’ features (such as the ‘LANGUAGE’ environment
      variable, the locale aliases database, the automatic charset
      conversion, and plural handling).
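 
    To give an idea of the work involved in the second approach, here
 is a simplified sketch of one such feature, the ‘LANGUAGE’
 environment variable.  It assumes the priority order ‘LANGUAGE’,
 ‘LC_ALL’, ‘LC_MESSAGES’, ‘LANG’, ignores ‘LANGUAGE’ in the "C"
 locale, and deliberately leaves out the locale aliases database and
 any special treatment of codesets and modifiers.  The function name
 ‘catalog_candidates’ is hypothetical.

```python
def catalog_candidates(environ):
    """Return locale names to try for catalogs, most preferred first.

    Simplified sketch: codesets and modifiers are dropped, and no
    locale aliases are consulted.
    """
    # The effective message locale is the first non-empty variable.
    locale_name = ""
    for var in ("LC_ALL", "LC_MESSAGES", "LANG"):
        if environ.get(var):
            locale_name = environ[var]
            break

    # In the "C" locale (or with no locale set at all) nothing is
    # translated, and LANGUAGE is ignored.
    if locale_name.split(".", 1)[0] in ("", "C", "POSIX"):
        return []

    # LANGUAGE, when set, is a colon-separated priority list that
    # overrides the locale name for selecting message catalogs.
    names = [n for n in environ.get("LANGUAGE", "").split(":") if n]
    if not names:
        names = [locale_name]

    candidates = []
    for name in names:
        # Drop an optional "@modifier" and ".codeset" (simplification).
        base = name.split("@", 1)[0].split(".", 1)[0]
        # Try "ll_CC" first, then fall back to the bare language "ll".
        for cand in (base, base.split("_", 1)[0]):
            if cand not in candidates:
                candidates.append(cand)
    return candidates


print(catalog_candidates({"LANG": "de_AT.UTF-8"}))
# ['de_AT', 'de']
print(catalog_candidates({"LANG": "en_US.UTF-8", "LANGUAGE": "sv:de_DE"}))
# ['sv', 'de_DE', 'de']
print(catalog_candidates({"LANG": "C", "LANGUAGE": "de"}))
# []
```

    A complete reimplementation would also have to locate and parse the
 message catalogs, convert character sets, and evaluate plural forms,
 which is why linking against an existing ‘gettext’ is usually less
 work when it is available.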