gettext: Language Implementors
15.1 The Language Implementor’s View
====================================
All programming and scripting languages that have the notion of
strings are eligible to support ‘gettext’. Supporting ‘gettext’
means the following:
1. You should add to the language a syntax for translatable strings.
In principle, a function call of ‘gettext’ would do, but a
shorthand syntax helps keep internationalized programs legible.
For example, in C we use the syntax ‘_("string")’, and
in GNU awk we use the shorthand ‘_"string"’.
2. You should arrange that evaluation of such a translatable string at
runtime calls the ‘gettext’ function, or performs equivalent
processing.
3. Similarly, you should make the functions ‘ngettext’, ‘dcgettext’,
‘dcngettext’ available from within the language. These functions
are less often used, but are nevertheless necessary for particular
purposes: ‘ngettext’ for correct plural handling, and ‘dcgettext’
and ‘dcngettext’ for obeying locale-related environment variables
other than ‘LC_MESSAGES’, such as ‘LC_TIME’ or ‘LC_MONETARY’.
For these latter functions, you need to make the ‘LC_*’ constants,
available in the C header ‘<locale.h>’, referenceable from within
the language, usually either as enumeration values or as strings.
4. You should allow the programmer to designate a message domain,
either by making the ‘textdomain’ function available from within
the language, or by introducing a magic variable called
‘TEXTDOMAIN’. Similarly, you should allow the programmer to
designate where to search for message catalogs, by providing access
to the ‘bindtextdomain’ function or — on native Windows platforms —
to the ‘wbindtextdomain’ function.
5. You should either perform a ‘setlocale (LC_ALL, "")’ call during
the startup of your language runtime, or allow the programmer to do
so. Remember that ‘gettext’ will act as a no-op if the ‘LC_MESSAGES’
and ‘LC_CTYPE’ locale categories are not both set.
6. A programmer should have a way to extract translatable strings from
a program into a PO file. The GNU ‘xgettext’ program is being
extended to support very different programming languages. Please
contact the GNU ‘gettext’ maintainers to help them do this. The
GNU ‘gettext’ maintainers will need from you a formal description
of the lexical structure of source files. It should answer the
questions:
• What does a token look like?
• What does a string literal look like? What escape characters
exist inside a string?
• What escape characters exist outside of strings? If Unicode
escapes are supported, are they applied before or after
tokenization?
• What is the syntax for function calls? How are consecutive
arguments in the same function call separated?
• What is the syntax for comments?
Based on this description, the GNU ‘gettext’ maintainers can add
support to ‘xgettext’.
If the string extractor is best integrated into your language’s
parser, GNU ‘xgettext’ can function as a front end to your string
extractor.
7. The language’s library should have a string formatting facility.
Additionally:
1. There must be a way, in the format string, to denote the
arguments by a positional number or a name. This is needed
because for some languages and some messages with more than
one substitutable argument, the translation will need to
output the substituted arguments in a different order. See
c-format Flag.
2. The syntax of format strings must be documented in a way that
translators can understand. The GNU ‘gettext’ manual will be
extended to include a pointer to this documentation.
Based on this, the GNU ‘gettext’ maintainers can add a format
string equivalence checker to ‘msgfmt’, so that translators get
told immediately when they have made a mistake during the
translation of a format string.
8. If the language has more than one implementation, and not all of
the implementations use ‘gettext’, but the programs should be
portable across implementations, you should provide a no-i18n
emulation that makes the other implementations accept programs
written for yours, without actually translating the strings.
9. To help the programmer in the task of marking translatable strings,
which is sometimes performed using the Emacs PO mode (see
Marking), you are welcome to contact the GNU ‘gettext’
maintainers, so they can add support for your language to
‘po-mode.el’.
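As an illustration of how items 1, 3, 4, 5 and 7 can look from the
programmer's side once a language supports ‘gettext’, here is a
minimal sketch using Python's standard ‘gettext’ and ‘locale’
modules. The domain name ‘gettext-manual-demo’, the catalog directory
‘locale’ and the function ‘describe’ are assumptions made up for this
example; no catalog is expected to exist, so every call falls back to
the untranslated string.

```python
import gettext
import locale

# Item 5: initialize the locale from the environment. An unsupported
# locale is tolerated here; messages then simply stay untranslated.
try:
    locale.setlocale(locale.LC_ALL, "")
except locale.Error:
    pass

# Item 4: designate the message domain and where its catalogs live.
# Both names are hypothetical and chosen only for this sketch.
gettext.bindtextdomain("gettext-manual-demo", "locale")
gettext.textdomain("gettext-manual-demo")

# Item 1: a shorthand for marking translatable strings.
_ = gettext.gettext


def describe(count):
    """Return a message about `count` removed files."""
    # Item 3: ngettext selects the singular or plural form.
    # Item 7: the named placeholder %(n)d lets a translator move the
    # argument anywhere in the translated sentence.
    template = gettext.ngettext(
        "%(n)d file removed", "%(n)d files removed", count
    )
    return template % {"n": count}


greeting = _("Hello, world!")
print(greeting)
print(describe(1))
print(describe(3))
```

With a compiled catalog installed under the chosen directory, the same
calls would return translated text without any change to the program.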
On the implementation side, two approaches are possible, with
different effects on portability and copyright:
• You may link against GNU ‘gettext’ functions if they are found in
the C library. For example, an autoconf test for ‘gettext()’ and
‘ngettext()’ will detect this situation. For the moment, this test
will succeed on GNU systems and on Solaris 11 platforms. No severe
copyright restrictions apply, except if you want to distribute
statically linked binaries.
• You may emulate or reimplement the GNU ‘gettext’ functionality.
This has the advantage of full portability and no copyright
restrictions, but also the drawback that you have to reimplement
the GNU ‘gettext’ features (such as the ‘LANGUAGE’ environment
variable, the locale aliases database, the automatic charset
conversion, and plural handling).