web2c: TCX files

 
 5.4.2 TCX files: Character translations
 ---------------------------------------
 
 TCX (TeX character translation) files help TeX support direct input of
 8-bit international characters if fonts containing those characters are
 being used.  Specifically, they map an input (keyboard) character code
 to the internal TeX character code (a superset of ASCII).
 
    Of the various proposals for handling more than one input encoding,
 TCX files were chosen because they follow Knuth's original ideas for the
 use of the 'xchr' and 'xord' tables.  He ventured that these would be
 changed in the WEB source in order to adjust the actual version to a
 given environment.  It turns out, however, that recompiling the WEB
 sources is not as simple a task as Knuth may have imagined; therefore,
 TCX files, which make it possible to change the conversion tables on
 the fly, have been implemented instead.
 
    This approach limits the portability of TeX documents, as some
 implementations do not support it (or use a different method for
 input-internal reencoding).  It may also be problematic to determine the
 encoding to use for a TeX document of unknown provenance; in the worst
 case, failure to do so correctly may result in subtle errors in the
 typeset output.  But we feel the benefits outweigh these disadvantages.
 
    This is entirely independent of the MLTeX extension (⇒MLTeX):
 whereas a TCX file defines how an input keyboard character is mapped to
 TeX's internal code, MLTeX defines substitutions for a non-existing
 character glyph in a font with a '\accent' construction made out of two
 separate character glyphs.  TCX files involve no new primitives; it is
 not possible to specify that an input (keyboard) character maps to more
 than one character.
 
    Information on specifying TCX files:
 
    * The best way to specify a TCX file is to list it explicitly in the
      first line of the main document:
           %& -translate-file=TCXFILE
 
    * You can also specify a TCX file to be used on a particular TeX run
      with the command-line option '-translate-file=TCXFILE'.
 
    * TCX files are searched for along the 'WEB2C' path.
 
    * Initial TeX (⇒Initial TeX) ignores TCX files.
 
    The Web2c distribution comes with a number of TCX files.  Two
 important ones are 'il1-t1.tcx' and 'il2-t1.tcx', which support ISO
 Latin 1 and ISO Latin 2, respectively, with Cork-encoded fonts
 (a.k.a. the LaTeX T1 encoding).  TCX files for Czech, Polish, and Slovak
 are also provided.
 
    One other notable TCX file is 'empty.tcx', which is, well, empty.
 Its purpose is to reset Web2c's behavior to the default (only visible
 ASCII being printable, as described below) when a format was dumped with
 another TCX being active--which is in fact the case for everything but
 plain TeX in the TeX Live and other distributions.  Thus:
 
      latex somefile8.tex
      => terminal etc. output with 8-bit chars
      latex --translate-file=empty.tcx somefile8.tex
      => terminal etc. output with ^^ notation
 
    Syntax of TCX files:
   1. Line-oriented.  Blank lines are ignored.
 
   2. Whitespace is ignored except as a separator.
 
   3. Comments start with '%' and continue to the end of the line.
 
   4. Otherwise, a line consists of one or two character codes,
      optionally followed by 0 or 1.  The last number indicates whether
      DEST is considered printable.
           SRC [DEST [PRNT]]
 
   5. Each character code may be specified in octal with a leading '0',
      hexadecimal with a leading '0x', or decimal otherwise.  Values must
      be between 0 and 255, inclusive (decimal).
 
   6. If the DEST code is not specified, it is taken to be the same as
      SRC.
 
   7. If the same SRC code is specified more than once, it is the last
      definition that counts.
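
    The syntax rules above can be sketched as a small parser.  This is
 only an illustration in Python, not the code Web2c itself uses; the
 names 'parse_code', 'parse_tcx', and 'SAMPLE' are hypothetical, and
 treating a missing PRNT as 1 is an assumption.

```python
def parse_code(token):
    """Parse one character code: '0x..' is hex, a leading '0' is octal, else decimal."""
    if token.startswith("0x"):
        value = int(token[2:], 16)
    elif token.startswith("0") and len(token) > 1:
        value = int(token, 8)
    else:
        value = int(token, 10)
    if not 0 <= value <= 255:
        raise ValueError(f"character code out of range: {token}")
    return value


def parse_tcx(text):
    """Return {SRC: (DEST, printable)} for the TCX text given as a string."""
    table = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.split("%", 1)[0]   # comments run from '%' to end of line
        fields = line.split()          # whitespace only separates fields
        if not fields:                 # blank (or comment-only) lines are ignored
            continue
        if len(fields) > 3:
            raise ValueError(f"line {lineno}: expected SRC [DEST [PRNT]]")
        src = parse_code(fields[0])
        # A missing DEST is taken to be the same as SRC.
        dest = parse_code(fields[1]) if len(fields) > 1 else src
        if len(fields) > 2:
            if fields[2] not in ("0", "1"):
                raise ValueError(f"line {lineno}: PRNT must be 0 or 1")
            printable = fields[2] == "1"
        else:
            printable = True           # assumption: no PRNT means printable
        table[src] = (dest, printable) # the last definition of a SRC wins
    return table


# Hypothetical sample input, not a file shipped with Web2c.
SAMPLE = """\
% comment-only line
0xe9 0xe9      % hexadecimal SRC and DEST
0351 0xe9 1    % the same SRC in octal; this later line wins
200            % decimal; DEST defaults to SRC
0x80 0x80 0    % explicitly marked as not printable
"""

table = parse_tcx(SAMPLE)
# table == {233: (233, True), 200: (200, True), 128: (128, False)}
```

    A dictionary keyed by SRC makes the "last definition counts" rule
 fall out of ordinary assignment.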
 
    Finally, here's what happens: when TeX sees an input character with
 code SRC, it 1) changes SRC to DEST; and 2) unless PRNT is given as 0,
 makes the DEST code "printable", i.e., printed as-is in diagnostics and
 the log file rather than in '^^' notation.
 
    By default, no characters are translated, and character codes between
 32 and 126 inclusive (decimal) are printable.
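
    As a rough model of the two steps and the default state, the
 following Python sketch keeps one translation table and one table of
 printable flags.  The names 'caret_notation' and 'read_and_show' are
 hypothetical, and rendering a printable code with 'chr' is a
 simplification for illustration.

```python
def caret_notation(code):
    """TeX-style ^^ form of a character code in the range 0..255."""
    if code < 64:
        return "^^" + chr(code + 64)   # e.g. 9 (TAB) gives ^^I
    if code < 128:
        return "^^" + chr(code - 64)   # e.g. 127 (delete) gives ^^?
    return "^^%02x" % code             # two lowercase hex digits


def read_and_show(src, xord, printable):
    """Translate SRC to DEST, then render DEST as it would appear in a log."""
    dest = xord[src]
    shown = chr(dest) if printable[dest] else caret_notation(dest)
    return dest, shown


# Default state: nothing is translated, and only codes 32..126 are printable.
DEFAULT_XORD = list(range(256))
DEFAULT_PRINTABLE = [32 <= code <= 126 for code in range(256)]

print(read_and_show(0x41, DEFAULT_XORD, DEFAULT_PRINTABLE))  # (65, 'A')
print(read_and_show(0x09, DEFAULT_XORD, DEFAULT_PRINTABLE))  # (9, '^^I')
print(read_and_show(0xE9, DEFAULT_XORD, DEFAULT_PRINTABLE))  # (233, '^^e9')

# Effect of a hypothetical TCX line "0xb1 0xa1": translate, then mark DEST printable.
xord = list(DEFAULT_XORD)
printable = list(DEFAULT_PRINTABLE)
xord[0xB1] = 0xA1
printable[0xA1] = True
dest, shown = read_and_show(0xB1, xord, printable)
```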
 
    Specifying translations for the printable ASCII characters (codes
 32-126) will yield unpredictable results.  Additionally you shouldn't
 make the following characters printable: '^^I' (TAB), '^^J' (line feed),
 '^^M' (carriage return), and '^^?' (delete), since TeX uses them in
 various ways.
 
    Thus, the idea is to specify the input (keyboard) character code for
 SRC, and the output (font) character code for DEST.
 
    By default, only the printable ASCII characters are considered
 printable by TeX.  If you specify the '-8bit' option, all characters are
 considered printable by default.  If you specify both the '-8bit' option
 and a TCX file, then the TCX can set specific characters to be
 non-printable.
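
    The interaction between '-8bit' and a TCX file can be pictured as
 follows.  This Python sketch only restates the paragraph above; the
 function names and the '(SRC, DEST, PRNT)' tuple form are hypothetical,
 and it does not model any extra safeguards a real implementation may
 apply.

```python
def printable_defaults(eight_bit):
    """Initial printable flags: everything with -8bit, else printable ASCII only."""
    if eight_bit:
        return [True] * 256
    return [32 <= code <= 126 for code in range(256)]


def apply_tcx(printable, entries):
    """Return a copy of the flags after applying (SRC, DEST, PRNT) entries in order."""
    result = list(printable)
    for _src, dest, prnt in entries:
        result[dest] = bool(prnt)      # PRNT decides whether DEST is printable
    return result


# With -8bit, a TCX entry can switch a single code back to non-printable.
with_8bit = apply_tcx(printable_defaults(True), [(0x80, 0x80, 0)])
print(with_8bit[0x80], with_8bit[0xE9])        # False True

# Without -8bit, only what the TCX file marks becomes printable beyond ASCII.
without_8bit = apply_tcx(printable_defaults(False), [(0xE9, 0xE9, 1)])
print(without_8bit[0x80], without_8bit[0xE9])  # False True
```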
 
    Both the specified TCX encoding and whether characters are printable
 are saved in the dump files (like 'tex.fmt').  So by giving these
 options in combination with '-ini', you control the defaults seen by
 anyone who uses the resulting dump file.
 
    When loading a dump, if the '-8bit' option was given, then all
 characters become printable by default.
 
    When loading a dump, if a TCX file was specified, then the TCX data
 from the dump is ignored and the data from the file used instead.