@Appendix
    @Title { Nonpareil document format }
    @Tag { document }
@Begin
@LP
@I { This appendix is a draft which belongs in a separate document,
one describing how Nonpareil is used in document formatting.  It
appears here since that other document does not exist yet.  It
has not yet been indexed. }
@PP
Although there is only one Nonpareil language, it comes in two
@I { formats }.  The format described elsewhere in this document
is @I { Nonpareil code format }, or @I { NPC }, the format
of Nonpareil program texts.  This Appendix describes
@I { Nonpareil document format }, or @I { NPD }, the format of the
large literal object expressions that are actual documents.
@PP
NPD is needed mainly because NPC is not an efficient way to enter text:
@ID @Nonpareil { ["The", "cat", "sat", "on", "the", "mat."] }
NPD also defines precisely which expressions from NPC are permitted
in document objects -- an important issue, given that literal documents
must be interpreted rather than compiled.
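@PP
For comparison, and assuming the charseq rules outlined later in this
Appendix, the same six words would be entered in NPD roughly as
@ID @F @Verbatim { The cat sat on the mat. }
with no quotation marks, commas, or brackets.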
@PP
NPD must meet the challenges of expressing international documents, which
differ substantially from European ones.  It naturally relies heavily on
the "ISO/IEC 10646" character set, with semantics given by Unicode
@Cite { $unicode2000 }.  The Unicode Consortium has done wonders
collecting, conceptualizing, and presenting conventions for displaying
text from around the world.  Nevertheless, there is an unmistakable
mismatch between the Unicode model of text as a sequence of characters
from a fixed repertoire, and the model of text as a hierarchy of objects
adopted (for good reason) by systems such as @TeX @Cite { $knuth1984tex },
Lout @Cite { $kingston2004lout.program }, and Nonpareil.  Character
sequences that correspond with visible glyphs work well:  letters,
ideographs, accents, and so forth.  Unfortunately, Unicode also attempts
to express concepts that are fundamentally beyond the reach of its model,
either because they are hierarchical (such as languages and bi-directional
formatting) or infinitely graduated (spacing).  NPD does not attempt to
prop up these inadequate aspects of Unicode.
@PP
The Nonpareil system stores documents in NPD format, even documents
created entirely interactively.  Users may also create NPD files using
a text editor or other software for subsequent reading by Nonpareil in
batch formatting mode.  (Most of the complexity of this Appendix arises
from a desire to make NPD a pleasant format for batch users.)  But if a
document is written in NPD with a text editor, edited using Nonpareil,
then saved, there is no guarantee that the saved document will closely
resemble the original.  Nonpareil does try to save documents in a readable
form, by taking care over indenting, inserting braces only when needed,
and so on.
@PP
NPD documents will be presented in a fixed-width font.  This is
partly to distinguish NPD from NPC, and partly because white space
is significant in some places in NPD, so its exact amount has to
be clear.
@PP
Before going into the details, here is a brief overview.  NPD shares
the lexical structure of NPC:  it uses the same character set
("ISO/IEC 10646") and the same encoding (UTF-8) with the same
classification of characters (Chapter {@NumberOf lexical.charset}).
@PP
The definitions of white space, end of line, and comments given for NPC
(Appendix {@NumberOf lexical.whitespace}) apply to NPD, except that
NPD comments are introduced by @F { "\\#" } rather than @Nonpareil { # },
and the meaning given to white space is more elaborate
(Appendix {@NumberOf document.mapping}).  NPD makes some changes to
the eight token classes of NPC, and adds a ninth class:
@ID @Tbl
    aformat { @Cell A | @Cell B }
    mv { 0.5vx }
{
@Rowa
    font { Italic }
    A { Token class }
    B { Example }
    rb { yes }
@Rowa
    A { Identifiers }
    B { @F { "\\list" } }
@Rowa
    A { Reserved words }
    B { @F { "\\else" } }
@Rowa
    A { User-defined punctuation sequences }
    B { @F { "\\+" } }
@Rowa
    A { Reserved punctuation sequences }
    B { @F { "{" } }
@Rowa
    A { Boolean literals }
    B { @F { "\\false" } }
@Rowa
    A { Number literals }
    B { @F { 3.1416 } }
@Rowa
    A { Character literals }
    B { @F { 'H' } }
@Rowa
    A { String literals }
    B { @F { "\"Hello\"" } }
@Rowa
    A { Charseq literals }
    B { @F { Hello } }
    rb { yes }
}
Roughly speaking, NPD adds a backslash character at the front of
every identifier, reserved word, and user-defined punctuation
sequence, to make room for the
new @I charseq token class, whose elements become literal words.
Grouping is accomplished with {@F "{"} and {@F "}"} rather than
{@F "("} and {@F ")"}.  There is also a significant change in the
syntax of parameter passing.  Here is an example of an NPD file:
@QD downifneeded @Scale @F @Verbatim {
\reference
    \type { \TechReport }
    \author { \author \surname { Kingston } \othernames { Jeffrey H. } }
    \title { The Nonpareil Language Specification, Version 1.1 }
    \date { 2004 }
}
Ordinary text is intermixed with other material.  The only characters
that stand for something other than themselves are {@F "\\"}, {@F "{"},
and {@F "}"}; to get the corresponding literal characters one must
write {@F "\\\\"}, {@F "\\{"}, and {@F "\\}"}.  No other characters
need escaping in this way.
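@PP
As a small illustration, constructed here and not taken from any real
document, the following line of NPD text uses each of the three escapes
once:
@ID @F @Verbatim { Write \{ and \} for braces, and \\ for a backslash. }
Assuming only the escaping rule just stated, the text it denotes is
@F { "Write { and } for braces, and \\ for a backslash." }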
@BeginSubAppendices

@Include { document.displayable }
@Include { document.textmodel }
@Include { document.mapping }
@Include { document.filter }

@EndSubAppendices
@End @Appendix
