Directory nonpareil/compiler/lib
================================

This directory contains a miscellaneous set of files read and written
by the Nonpareil compiler, "npc".


To update the Unicode character database
----------------------------------------

To update the Unicode character database, replace the existing versions
of files UnicodeData.txt and SpecialCasing.txt in this directory with
new ones from the Unicode web site.  Then delete whatever subset of files
char_pool_be, char_trie_be, char_pool_le, and char_trie_le is currently
present ("make clean" in directory nonpareil does this).  These files
will be regenerated automatically on the next run of npc.  For further
information about all these files, see below.

It would be prudent to save the old UnicodeData.txt and SpecialCasing.txt
nearby while doing this.  The Unicode Consortium has promised that these
files will not change radically from now on, so npc should be able to
read the new versions, but keeping the old copies lets you back out if
something goes wrong.
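
The preparation steps above (save the old text files, then delete
whichever binary files are present) can be sketched in Python.  This is
only an illustration, not part of npc: the function name and the ".old"
backup suffix are assumptions, and "make clean" remains the documented
way to remove the binary files.

```python
import shutil
from pathlib import Path

# File names are taken from this README; the ".old" suffix is an assumption.
SOURCE_FILES = ["UnicodeData.txt", "SpecialCasing.txt"]
CACHE_FILES = ["char_pool_be", "char_trie_be", "char_pool_le", "char_trie_le"]


def prepare_unicode_update(lib_dir):
    """Back up the Unicode text files and delete the binary files.

    Returns (backed_up, removed): the names of the files actually touched.
    """
    lib = Path(lib_dir)
    backed_up, removed = [], []
    for name in SOURCE_FILES:
        src = lib / name
        if src.exists():
            # Keep the old version next to the original so it can be restored.
            shutil.copy2(src, lib / (name + ".old"))
            backed_up.append(name)
    for name in CACHE_FILES:
        cache = lib / name
        if cache.exists():  # only a subset of these may be present
            cache.unlink()
            removed.append(name)
    return backed_up, removed
```

After running this, copy the new UnicodeData.txt and SpecialCasing.txt
into place and run npc once to regenerate the binary files.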


Files beginning with npsys
--------------------------

    npsys.c
    npsys.h
    npsys_init.c
    npsys_init.h
    npsys_load.c
    npsys_load.h
    npsys_typedefs.h

These files are used when "npc" runs and compiles a system.  They are
hand-written, unchanging files that are copied into the system's "ccode"
subdirectory and compiled along with the generated C files.  They contain
code that is the same for all systems:  memory allocation code, array and
string functions, main(), etc.


Files related to character handling
-----------------------------------

    UnicodeData.txt
    SpecialCasing.txt

These files come from the Unicode web site without modification.  They
are part of the Unicode Character Database, which defines the properties
of all characters.  They are used by the Nonpareil compiler in two
ways:  to define character categories for the lexical analysis of
Nonpareil source files, and as the source of the predefined objects
in the automatically generated "system/lang/char" class file.
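
For orientation, each line of UnicodeData.txt is a list of
semicolon-separated fields: field 0 is the code point in hexadecimal,
field 1 the character name, field 2 the general category, and field 13
the simple lowercase mapping.  The Python sketch below reads a few of
these fields from one line.  It illustrates the file format only; the
function name is hypothetical, and how npc itself parses the file is not
described here.

```python
def parse_unicode_data_line(line):
    """Extract a few fields from one line of UnicodeData.txt."""
    fields = line.rstrip("\n").split(";")
    return {
        "code_point": int(fields[0], 16),  # field 0: hexadecimal code point
        "name": fields[1],                 # field 1: character name
        "category": fields[2],             # field 2: general category, e.g. "Lu"
        # field 13: simple lowercase mapping, empty when there is none
        "lowercase": int(fields[13], 16) if fields[13] else None,
    }


# The UnicodeData.txt entry for U+0041.
sample = "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;"
record = parse_unicode_data_line(sample)
```

A lexer would typically consult the general category (here "Lu",
uppercase letter) to decide, for example, whether a character may appear
in an identifier.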

    char_skeleton

The skeleton of the "system/lang/char" class file just mentioned.
That file is "char_skeleton" with several thousand predefined objects
added to it at the point marked $INSERT_CHAR_LIST_HERE$.
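
The substitution just described amounts to replacing the marker with the
generated text.  A minimal Python sketch follows, assuming the marker
occurs exactly once in the skeleton.  The function name and the entry
strings are hypothetical placeholders; the real syntax of the predefined
objects is not shown in this README.

```python
MARKER = "$INSERT_CHAR_LIST_HERE$"


def fill_skeleton(skeleton_text, char_entries):
    """Return the skeleton with the marker replaced by the given entries."""
    if skeleton_text.count(MARKER) != 1:
        raise ValueError("expected exactly one " + MARKER + " marker")
    return skeleton_text.replace(MARKER, "\n".join(char_entries))


# Placeholder text only; not real Nonpareil source.
skeleton = "header\n$INSERT_CHAR_LIST_HERE$\nfooter\n"
filled = fill_skeleton(skeleton, ["entry one", "entry two"])
```

Checking the marker count first makes a damaged or outdated skeleton
fail loudly instead of silently producing a class file with no
predefined objects.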

    thompson

A historical note concerning the design of the UTF-8 character
encoding scheme, archived here for reference but not used.

    char_pool_be
    char_trie_be
    char_pool_le
    char_trie_le

Files "char_pool_be" and "char_trie_be" are created the first time that
"npc" runs on a big-endian machine.  They hold binary data encoding
character properties, and are used by "npc" to determine the lexical
classes of the characters in Nonpareil source files.  They are derived
automatically from "UnicodeData.txt" and "SpecialCasing.txt".  It saves
several seconds on each compilation to load character property data
in this binary form, rather than slog through "UnicodeData.txt" and
"SpecialCasing.txt" and assemble it from scratch.

Files "char_pool_le" and "char_trie_le" are the same, for little-endian
machines.  In a networked environment it is safe to have both sets
of files present here.  A big-endian "npc" binary will create and read
the big-endian versions, and a little-endian "npc" binary will create
and read the little-endian ones, without conflict.
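
The reason the two sets never conflict is that the file names depend
only on the byte order of the machine running npc.  The Python sketch
below shows that selection.  The function name is hypothetical, and npc
itself may determine the byte order differently.

```python
import sys


def cache_file_names(byteorder=sys.byteorder):
    """Return the (pool, trie) file names for the given byte order."""
    # sys.byteorder is "big" or "little" on the machine running this code.
    suffix = {"big": "be", "little": "le"}[byteorder]
    return ("char_pool_" + suffix, "char_trie_" + suffix)


pool_name, trie_name = cache_file_names()
```

On a little-endian machine this selects "char_pool_le" and
"char_trie_le", so a big-endian binary sharing the same directory over
the network never reads or overwrites them.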
