Milestones in the Nonpareil Project
===================================

This document records when certain stages were completed, but this
should not be taken to mean that they were worked on only since the
previous deadline, or that they were never revised afterwards.

Jeffrey H. Kingston


Phase 1 - Planning (1996 - 2002)
================================

November 1996.  I completed my paper, "The future of document
    formatting".  I also began thinking seriously around this
    time about the design of the system now called Nonpareil.

February - November 2002.  My student Bradley Baetz carried out
    the first implementation of Nonpareil, for his Honours thesis:
    "Nonpareil - a strongly typed object oriented functional
    programming language".  By the time he started, a fairly detailed
    language design, with syntax diagrams, had evolved, although it had
    not yet been written up.

27 August 2002.  On this day I left my job to devote myself full
    time to the Nonpareil project, although it then took about
    six weeks of finishing odd jobs and setting up an office and
    computer at home before I was able to actually start work.

24 December 2002.  My paper, "Prospectus for Nonpareil", was
    completed and placed on my web site, and comments were
    solicited from members of the Lout mailing list.  This
    paper presented a precise syntax, attempted a type system,
    and discussed some document formatting issues, including
    the format of literal documents, requirements for the user
    interface including automatic construction using introspection,
    and locating mouse clicks.  These ideas had been worked out
    prior to 2002 but not written down.

January 2003.  On holidays, during which time I had a good idea for
    modelling selections and locating mouse clicks, which involves
    inserting references in the display object to the originating
    content.  As the location search goes down the display tree, it
    accumulates the references it passes through into a list, and
    this list becomes the selection.  Although not fully automatic
    as originally intended, it's simple, viable, and comprehensible
    to the programmer.
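    The selection mechanism can be illustrated with a minimal C sketch.
    It assumes a display tree in which each node has a bounding rectangle,
    its children, and an optional reference back to the content that
    produced it.  DisplayNode, content_ref, and locate_click are
    hypothetical names, not taken from the Nonpareil sources, and the
    search simply descends into the first child that contains the point.

```c
#include <stddef.h>

/* Hypothetical display node: a bounding rectangle, its children, and an
   optional reference back to the content object that produced it. */
typedef struct DisplayNode {
    int x, y, width, height;
    const char *content_ref;          /* NULL if the node has no reference */
    struct DisplayNode **children;
    size_t child_count;
} DisplayNode;

static int contains(const DisplayNode *n, int px, int py) {
    return px >= n->x && px < n->x + n->width &&
           py >= n->y && py < n->y + n->height;
}

/* Walk down from root towards the point (px, py), appending every content
   reference passed through to refs[] (outermost first, at most max_refs).
   The collected list is the selection; the return value is its length. */
size_t locate_click(const DisplayNode *root, int px, int py,
                    const char **refs, size_t max_refs) {
    size_t count = 0;
    const DisplayNode *node = root;
    while (node != NULL && contains(node, px, py)) {
        if (node->content_ref != NULL && count < max_refs)
            refs[count++] = node->content_ref;
        const DisplayNode *next = NULL;   /* first child containing the point */
        for (size_t i = 0; i < node->child_count; i++) {
            if (contains(node->children[i], px, py)) {
                next = node->children[i];
                break;
            }
        }
        node = next;
    }
    return count;
}
```

    Clicking inside a word that lies in a line of a document yields the
    list (document, word); the line is passed through but contributes
    nothing because it carries no reference.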


Phase 2 - Compiler front end (January-August 2003)
==================================================

24 January 2003.  Returned from holidays and commenced coding the
    Nonpareil compiler.

24 February 2003.  Completed coding of the lexer, parser, system,
    module, and renaming parts of the Nonpareil compiler, and
    released the first version of my paper "Nonpareil Language
    Specification", which laid out the rules of these parts of
    the language.

23 March 2003.  Completed coding of the manifesting part of the
    compiler, which propagates inherited features into child
    classes and identifies what each name refers to.  This
    included introducing the NAMED type, a union of everything that
    can be named (except modules, which have their own name space),
    plus tightening up a lot of the previous code, including
    revising renaming to ensure that each module has at most one
    view of each class.  Also sorted out many details concerning
    class extensions, builtin classes, and norename classes, and
    defined and handled the builtin classes object, int, real,
    list, tuple2 .. tuple12.

3 April 2003.  Completed revised specification of functions,
    variables, named parameters, default parameters, and currying,
    influenced by careful reading of Cardelli and Wegner for typing,
    and the OCaml reference manual for named and default parameters.
    Worked out how to implement these features efficiently,
    including passing default markers for default value parameters,
    the funn, funn_rep, and funn_m classes for currying, and int_ref
    etc. with coercion functions for automatic boxing/unboxing (read
    Breazu-Tannen et al.).  Revised the language specification document
    to include all this.

19 April 2003.  Completed a revision of view handling, which unifies
    views and operator tables into contexts and ensures that each
    class and feature knows its own name in each module, and that
    each class knows its own feature view in each module.  Also
    modified name handling so that when names are printed in error
    messages a full trace of their renaming history is given.

28 June 2003.  After two months' work since the preceding milestone (!),
    produced a type checker which successfully checked all the classes
    I have so far, including the bodies of the map and fold functions
    from the nlist class.  Defined the set of feature signatures of
    a meet type formally.  Local type inference of actual generic
    parameters is done by introducing unification variables constrained
    to be above and below given types, as in Pierce and Turner (1999)
    but differing in some details.

7 August 2003.  Over a month after the preceding milestone (!), completed
    a rewrite of the type system which treats user-defined coercion
    functions as introducing elements into the subtype relation, not
    just introducing features into classes.  The main problem was that
    this allowed cycles of equivalent classes in the subtype relation,
    which required a large rethink since I was using the ancestor set
    as the type representation, but with cycles there is no longer a
    unique map back from ancestor sets to types.  Have also refined the
    local type inference algorithm further, introducing `range types',
    to overcome the problem of how unification variables and overloading
    interact (one needs to be able to undo the effects of a type check
    if it is just one hypothesis from a set of overloaded functions,
    but unification variables don't naturally lend themselves to undo).
    Wrote an Appendix to the User's Guide entitled `The Nonpareil Type
    System' explaining all this in detail, and posted the Version 1.1
    Language Specification document including this appendix on the
    Nonpareil home page.

===========================================================================
August 2003 - October 2004.  During this period I worked on my other
    research project, timetable construction, producing a paper for
    submission to PATAT04, and subsequently the KTS web site (see
    http://www.it.usyd.edu.au/~jeff).
===========================================================================

Phase 3 - Completion of Compiler (November 2004 - August 2005)
==============================================================

2 November 2004.  For the last week or so I have been working on the
    design of NPD, the Nonpareil format for literal documents, as now
    described in the relevant appendix of the Nonpareil Language
    Specification.  Today I start coding again with the following
    short-term goals:

    * Conversion of lexer output from 32-bit Unicode scalar values to
      16-bit Unicode characters, basically because I've realized the
      cost of finding properties of 32-bit quantities.

    * An ADT for reading Unicode property files, converting to
      two-level optimized property tables, and reading, writing,
      and querying those tables.

    * Lexing and parsing of NPD files.

13 November 2004.  Well, today I really started coding again.  I've gone
    back to 32-bit Unicode scalar values, basically because they are going
    to happen eventually (ISO/IEC 10646 is a 31-bit code), accessed via
    a four-level table, and done a lot of work sorting out how the various
    Unicode properties and algorithms work.  Also lost some time to
    dentistry in this period.
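    A four-level table of this kind can be sketched in C as below.  The
    sketch assumes that each level is indexed by one byte of the 32-bit
    code value, that there is one property byte per character, and that
    all unassigned regions share a single all-default block at each level,
    so only the paths to assigned characters consume memory.  The names
    (Level1 to Level4, table_init, table_set, table_get) are hypothetical,
    the real Nonpareil layout may differ, and malloc failure is ignored.

```c
#include <stdint.h>
#include <stdlib.h>

/* One 256-way level per byte of the 32-bit code value (hypothetical layout). */
typedef struct { uint8_t props[256]; } Level4;
typedef struct { Level4 *next[256]; } Level3;
typedef struct { Level3 *next[256]; } Level2;
typedef struct { Level2 *next[256]; } Level1;

/* Shared all-default blocks: every unassigned region points at these. */
static Level4 empty4;
static Level3 empty3;
static Level2 empty2;

void table_init(Level1 *t) {
    for (int i = 0; i < 256; i++) {
        empty3.next[i] = &empty4;
        empty2.next[i] = &empty3;
        t->next[i] = &empty2;
    }
}

/* Set one character's property, giving a private copy only to the shared
   blocks along the path that actually changes (no error handling). */
void table_set(Level1 *t, uint32_t code, uint8_t prop) {
    Level2 **p2 = &t->next[(code >> 24) & 0xFF];
    if (*p2 == &empty2) { *p2 = malloc(sizeof **p2); **p2 = empty2; }
    Level3 **p3 = &(*p2)->next[(code >> 16) & 0xFF];
    if (*p3 == &empty3) { *p3 = malloc(sizeof **p3); **p3 = empty3; }
    Level4 **p4 = &(*p3)->next[(code >> 8) & 0xFF];
    if (*p4 == &empty4) { *p4 = malloc(sizeof **p4); **p4 = empty4; }
    (*p4)->props[code & 0xFF] = prop;
}

/* Lookup is four array indexings, with no tests or branches. */
uint8_t table_get(const Level1 *t, uint32_t code) {
    return t->next[(code >> 24) & 0xFF]
            ->next[(code >> 16) & 0xFF]
            ->next[(code >> 8) & 0xFF]
            ->props[code & 0xFF];
}
```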

17 November 2004.  Am now able to write out a binary character properties
    table and associated string pool.  The two binary (.data) files are
    loadable in virtually no time, vs. 2 seconds to load and convert the
    text files.  Here are the file sizes:

	 60332 Nov 17 15:10 char_pool.data
	355280 Nov 17 15:10 char_trie.data
	 15386 Nov 13 16:25 SpecialCasing.txt
	897402 Nov 13 16:25 UnicodeData.txt

    The UChar type, after this initialization, can return the Unicode
    General Category of a character, its combining class, etc. - all the
    properties in UnicodeData.txt and SpecialCasing.txt except character
    name and a few conditional casings that this code doesn't handle;
    plus a Nonpareil lexical class.

19 November 2004.  Lexical analyser now fully Unicode compliant, with
    Unicode definitions of identifiers, digits, and punctuation.  Added
    infixr alongside infix - implementation was wonderfully easy, just
    add 1 to infixr precedences before scanning the right shell.  The
    interpreter seems to be working as well as before.  Next is code
    generation.
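    The reason infixr is such a small change can be seen in a conventional
    precedence-climbing parser, sketched here in C.  This is a standard
    technique, not necessarily how the Nonpareil parser is organized; the
    operator table, the single-letter operands, and the fully parenthesized
    output string are assumptions made to keep the example small, and there
    is no error handling.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical operator table: '^' is declared infixr, the rest infix. */
static int precedence(char op) {
    switch (op) {
    case '+': case '-': return 1;
    case '*': return 2;
    case '^': return 3;
    default:  return 0;                  /* not an operator */
    }
}
static int is_infixr(char op) { return op == '^'; }

typedef struct { const char *src; size_t pos; } Parser;

/* Parse operators of precedence >= min_prec, writing a fully
   parenthesized version of the expression into out. */
static void parse_expr(Parser *p, int min_prec, char *out, size_t cap) {
    char lhs[256];
    snprintf(lhs, sizeof lhs, "%c", p->src[p->pos++]);  /* one-letter operand */
    for (;;) {
        char op = p->src[p->pos];
        int prec = precedence(op);
        if (prec == 0 || prec < min_prec)
            break;
        p->pos++;
        /* The only difference between infix and infixr: the right operand
           of a left-associative operator may contain only tighter operators
           (prec + 1), while that of an infixr operator may contain itself. */
        int right_min = is_infixr(op) ? prec : prec + 1;
        char rhs[256], combined[256];
        parse_expr(p, right_min, rhs, sizeof rhs);
        snprintf(combined, sizeof combined, "(%s %c %s)", lhs, op, rhs);
        strcpy(lhs, combined);
    }
    snprintf(out, cap, "%s", lhs);
}

void parenthesize(const char *src, char *out, size_t cap) {
    Parser p = { src, 0 };
    parse_expr(&p, 1, out, cap);
}
```

    With '^' declared infixr, "a^b^c" comes out as (a ^ (b ^ c)), while
    "a-b-c" comes out as ((a - b) - c).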

22 November 2004.  Am now generating typedefs (including special ones
    for builtin types) and structs (including anonymous bit fields to
    span gaps caused by multiple inheritance).

26 November 2004.  Got to generating function headers but realized that
    it was getting horrible.  Have realized that the "system and view"
    model of the documentation is the right one but is not implemented
    by the current compiler, which has problems around MFEATURE and
    MFEATSIG, these being basically the same thing.  The problem arose
    because I was feeling my way to the implementation before.  So I am
    embarking on a major rewrite.  I may be gone some time.

10 December 2004.  Rewrite appears to be working today.  It took two
    weeks, and not everything I thought could be simplified could be,
    but everything is a lot clearer now and it was worth it.  Now
    have to get back to code generation.

19 December 2004.  Have code generation framework going well,
    the main omissions at the moment are call expressions, which
    have to link to the correct parameters etc.  Found a clever
    method of introducing hidden parameters into let and anonymous
    functions, and spent what feels like about a week (!)
    fiddling around with case expressions.  Now need to do call
    expressions and then start auditing just what else is missing.
    Clone is missing and needs a special implementation invoking
    memcpy, since although it is a normal feature, to implement
    it that way would produce a zillion parameters.  Caching is
    also missing.  Default values for parameters are partly done.

22 December 2004.  A small rewrite removed the FUNCTION type, which
    on reflection was an abstract supertype and not a real type,
    separated PARAMETER into PARAMETER_VIEW and PARAMETER, and
    merged EXPR_FUN into EXPR_LET (since anonymous functions are
    simple cases of let expressions).

24 December 2004.  Last day of work before one month's summer holiday.
    Have introduced CFUNCT and CPARAM types to represent C functions
    and their parameters, and successfully ported a lot of code over
    to this design.  Currently have just implemented CFunctCallCodeGen
    which is supposed to generate one call to a C function.  Have linked
    it in to expr_call.c - to the targeted call case only so far, but
    it should handle them all.  The code is compiling but I ran out of
    time to test and it is failing an assertion.

29 January 2005.  Started work again today after the summer holiday.

20 February 2005.  Since starting again I did some more code generation,
    then things got messy so I did a major rewrite, introducing a
    FUNCTION class from which everything that's a function inherits.
    This rewrite took two weeks up to today; I'm now back working on
    code generation, basically where I left off before.

23 March 2005.  Still working on code generation.  Have completed all
    the basic stuff: creation functions, dynamically dispatched features,
    etc.  Am now working on the refinements (arrays, predefined objects,
    that sort of thing).  Had a major setback three days ago, when I
    realized that I could not easily determine which object fields were
    pointers and which were immediate values, in those cases where the
    field type is a type variable.  In particular, when an object creation
    occurs within a generic class, no static scheme will work.  This
    information is needed when swizzling heap data so that it can be
    stored in a file and read in for a fast start-up.  Am still
    investigating this issue but the options at present seem to be:

    * Use mmap(), which requires no swizzling and would be very fast.
      However, as far as I know it only exists on Unix, and the manual
      entry does not guarantee that it will reload into the same virtual
      address that it stored from, although you can ask for it and
      there seems to be no reason why it wouldn't.  [Tim Cooper is
      using equivalent functions in Windows, but he is not attempting
      to reload into the same virtual address.]

    * Run the initialization code twice; those fields that differ in
      the two runs must be pointers to objects.  Put swizzle bits into
      objects where there is doubt, to record the results of this, and
      consult those bits when unswizzling but not otherwise.  If floating
      point operations are non-deterministic (which I seem to remember
      they can be, although I can't find any information about it), the two
      runs may diverge, causing this method to fail.  Speed is not such
      a big issue because it's all in an off-line run.  Anyway even the
      mmap() solution might require two runs, the first to find out how
      much memory is going to be needed, the second to use that memory.

    * Include full type information in the type tag of every object,
      rather than just the class as at present.  Essentially this is
      full-blown introspection, which I am not ready to do yet, and
      anyway I don't really know how to do it in the case of object
      creations lying within generic classes and functions, and so I
      am hoping that this "full introspection" will never be needed.
      Also I am not sure how the sequencing would go:  what about the
      introspection objects themselves?  Perhaps they are not generic.

24 March 2005.  Last day before taking Easter break.  Have begun working
    on swizzling.  The plan is to run initialization twice, compare
    object-by-object, set swizzle bits, and write the file.  Then read
    back and unswizzle.  It will be an assert error if the comparison
    shows that the two initializations took different paths.
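    A minimal C sketch of the compare-and-swizzle plan, under simplifying
    assumptions: the heap from each run is modelled as a flat array of
    machine words with identical layout, the two runs place it at different
    addresses, and one swizzle bit is kept per word rather than inside the
    objects.  The function names are hypothetical.  A word that differs
    between the runs is taken to be a pointer and must lie at the same
    offset in both heaps; anything else trips the assertion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compare the heaps from two initialization runs word by word.  A word that
   differs must be a pointer into the heap at the same offset in both runs;
   mark it in swizzle_bits.  Otherwise the runs took different paths. */
void find_pointers(const uintptr_t *heap1, const uintptr_t *heap2,
                   size_t n, unsigned char *swizzle_bits) {
    uintptr_t base1 = (uintptr_t)heap1, base2 = (uintptr_t)heap2;
    for (size_t i = 0; i < n; i++) {
        int is_pointer = heap1[i] != heap2[i];
        if (is_pointer)
            assert(heap1[i] - base1 == heap2[i] - base2);
        swizzle_bits[i] = (unsigned char)is_pointer;
    }
}

/* Before writing the heap to a file: turn marked pointers into offsets. */
void swizzle(uintptr_t *heap, size_t n, const unsigned char *swizzle_bits) {
    uintptr_t base = (uintptr_t)heap;
    for (size_t i = 0; i < n; i++)
        if (swizzle_bits[i])
            heap[i] -= base;
}

/* After reading the heap back, wherever it lands: offsets become pointers. */
void unswizzle(uintptr_t *heap, size_t n, const unsigned char *swizzle_bits) {
    uintptr_t base = (uintptr_t)heap;
    for (size_t i = 0; i < n; i++)
        if (swizzle_bits[i])
            heap[i] += base;
}
```

    A null pointer is the same in both runs, so it is treated as an
    immediate value and correctly left alone.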

31 March 2005.  Swizzling code done, but initialization as a whole still
    has quite a lot of other work to do.  Got sick of wondering
    where everything is kept and how it fits together, so have assembled
    everything into a single directory, the compiler in one subdirectory
    and the system in another.  A single "make" will now build and
    install a working Nonpareil system from scratch (or it will, when
    the compiler is finished), with everything in the right place, and
    everybody knowing what those places are.

6 April 2005.  Have "np -init" working today - writing out a file of
    swizzled predefined objects.  But all_predefined arrays not done yet.

7 April 2005.  Added all_predefined arrays.  Not bad for one day's work!
    However the current design of all_predefined requires initialization
    to do a prerequisite search through the whole system, not module by
    module as the current implementation does, because an all_predefined
    array of a class in one module can receive contributions from class
    extensions in later modules.  So I need to redo the initialization code.

20 April 2005.  Got side-tracked into making an abstract backend, which
    turned out to be a two-week job!  It's finished now: the compiler
    is compiling, and the C code it generates is compiling cleanly, but I
    have not yet attempted to run that C code.

25 April 2005.  Sorted out a quite long list of smallish issues, then
    struck clone, which turned out to be another rock in the path.  It
    needs to be partitioned, but the partitioning depends on having
    already validated every type in the entire system, since classes
    connected by meet types or inheritance must lie in the same partition.

28 April 2005.  Have sorted out how to do clone, but during the
    implementation I struck problems with incorrect types, and then
    when I tried to debug the problem I got into a terrible mess
    between ordinary parameters and forwarded parameters.  This has
    prompted yet another refactoring, which I began today, in which
    there are separate Nonpareil functions with Nonpareil parameters,
    and C functions with C parameters.  The Nonpareil functions refer
    to the corresponding C functions, and the Nonpareil parameters
    refer to the corresponding C parameters.  This gives a concrete
    model of the forwarded parameters:  they are exactly the parameters
    that will appear in the C function.  This was always the case, but
    it will be much clearer now.

1 May 2005.  Have more or less finished the latest refactoring, and it
    is indeed much clearer.  Have introduced new CREATION_FN_VIEW,
    CREATION_FN, INVARIANT_VIEW, and INVARIANT classes, and certainly
    cleared up a lot of things that were confused before (e.g. inner
    letdefs, and invariants), so the refactor has been a big success
    and will probably stick - at last.

8 May 2005.  Heaven knows where the last week went, but I am still
    finishing off the same refactoring.  I've been debugging it today,
    and now have a complete compile to C again, although the
    generated C code is giving a few error messages, not many, when
    compiled.  I did lose a day or so to feeling off colour, and
    there has been some detailed work in strengthening the C_FUNCT
    abstraction, which has turned out well.

14 May 2005.  C compile now working properly.  Spent some time sorting
    out old problems with let definitions.  These are now working,
    although at the cost of refusing to compile certain complex kinds.
    Also sorted out a bug in the initialization of string literals.
    Above all, clone is now working.  On my list of things to do
    begun on 31 March there are 92 items, with 23 remaining to be done.
    The principal tasks remaining in this pure compiler phase of the
    project are function inlining, fast reload, enumerated types, and a
    problem that dawned on me the other day:  I am inserting coercions
    before type range variable unification is complete, so they can't
    always be the right coercions.  Have reorganized the "Implementation
    Notes" report into a structure that seems likely to stick.

17 May 2005.  Successfully generating the "char" class today, and
    reading it.  Now have to implement enumerated types.

22 May 2005.  Have carried out quite a lot of the implementation of
    enumerated types, and also completed an excursion into integral
    types, setting up byte, short, int, ubyte, ushort, and uint,
    with coercions and other conversions between them.  But now have
    realized that my approach to enumerated types, involving two
    classes for each enumerated type, was wrong.  I should rather
    keep the implementation much closer to the back end, leaving
    the front end untouched except that enumerated classes inherit
    ultimately from enum_object rather than object.

9 June 2005.  Have just returned from a week's bushwalking.  Now well
    into the revised implementation of enumerated classes, currently
    implementing legal_code and with_code.  After that I will need to
    write the code to build the trie, and enumerated classes should
    be done.  Also added mathematical functions to the real class today.

12 June 2005.  Have completed code generation of all_predefined and
    all_enumerated arrays, including inheritance.  Have changed to an
    initialization order more compatible with building tries.  Still
    have to check this order, and still have to initialize tries.

23 June 2005.  Have just returned from a week's holiday.  I completed
    trie building before I left (a very tricky algorithm indeed), and
    today I have enhanced the back end module so that it checks whether
    C files have not changed since the previous compile, and skips the
    compile in that case.  This involves saving backup copies of all
    generated files, and comparing them character by character with
    the current versions.  A C file's generated header files must
    also compare equal, and there must be an object file, if skipping
    is to be done.  The next step is to sort out the ordering of
    class-level predefined object initializations.
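    The skip test can be sketched in C using standard I/O alone.  The
    function names, and the idea of passing the backup paths explicitly,
    are assumptions for illustration; the real back end module may
    organize this differently.

```c
#include <stdbool.h>
#include <stdio.h>

/* True if both files exist and have identical contents, byte for byte. */
bool files_equal(const char *path_a, const char *path_b) {
    FILE *a = fopen(path_a, "rb");
    FILE *b = fopen(path_b, "rb");
    bool equal = (a != NULL && b != NULL);
    while (equal) {
        int ca = getc(a), cb = getc(b);
        if (ca != cb) equal = false;       /* differing byte or length */
        else if (ca == EOF) break;         /* both ended together */
    }
    if (a) fclose(a);
    if (b) fclose(b);
    return equal;
}

static bool file_exists(const char *path) {
    FILE *f = fopen(path, "rb");
    if (f == NULL) return false;
    fclose(f);
    return true;
}

/* Hypothetical check made before recompiling one generated C file: the C
   file and each of its generated headers must match the backup copies
   saved by the previous run, and an object file must already exist. */
bool can_skip_compile(const char *c_file, const char *c_backup,
                      const char *const headers[],
                      const char *const header_backups[],
                      size_t header_count, const char *object_file) {
    if (!file_exists(object_file) || !files_equal(c_file, c_backup))
        return false;
    for (size_t i = 0; i < header_count; i++)
        if (!files_equal(headers[i], header_backups[i]))
            return false;
    return true;
}
```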

25 June 2005.  Have finally completed a definitive description of
    predefined object initialization, and designed an algorithm to
    do it.  Next step: implementation.

1 July 2005.  Have completed the implementation of predefined object
    initialization.  Along the way I realized that the initialization
    algorithm basically traverses back-end functions, to see what they
    call.  This inspired me to make a major renaming to identify all
    the back-end functions clearly.  I then carried out an audit to
    match up code generation and predefined object initialization, so that
    initialization checks exactly what code generation generates.  Very clean.
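    One plausible way to turn "what each function calls" into an
    initialization order, offered as an assumption rather than a
    description of the actual algorithm, is a depth-first traversal of the
    call graph that emits each function only after everything it calls,
    and reports failure if it finds a cycle.  The adjacency-matrix
    representation, MAX_FUNCS, and all names below are hypothetical.

```c
#include <stddef.h>

#define MAX_FUNCS 64   /* hypothetical limit for the sketch */

/* Hypothetical call graph: calls[f][g] != 0 if back-end function f calls g. */
typedef struct {
    size_t count;
    unsigned char calls[MAX_FUNCS][MAX_FUNCS];
} CallGraph;

enum { UNVISITED, IN_PROGRESS, DONE };

/* Depth-first visit: append f to order[] only after everything it calls.
   Returns 0 if a cycle is found (no valid order exists), 1 otherwise. */
static int visit(const CallGraph *g, size_t f, int *state,
                 size_t *order, size_t *len) {
    if (state[f] == DONE) return 1;
    if (state[f] == IN_PROGRESS) return 0;      /* back edge: a cycle */
    state[f] = IN_PROGRESS;
    for (size_t callee = 0; callee < g->count; callee++)
        if (g->calls[f][callee] && !visit(g, callee, state, order, len))
            return 0;
    state[f] = DONE;
    order[(*len)++] = f;
    return 1;
}

/* Fill order[] so that each function appears after every function it calls. */
int init_order(const CallGraph *g, size_t *order) {
    int state[MAX_FUNCS] = {0};                 /* all UNVISITED */
    size_t len = 0;
    for (size_t f = 0; f < g->count; f++)
        if (!visit(g, f, state, order, &len))
            return 0;
    return 1;
}
```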

2 July 2005.  Finished predefined object initialization and set to
    work on generating two binaries: "np_init", which initializes
    predefined objects from scratch and saves them in a checkpoint
    file; and "np", which loads predefined objects quickly from the
    checkpoint file.  Compiler is running right through without
    crashing, but still have to audit the generated C files.  There
    are #include files to add, and I must read through it all
    carefully.

3 July 2005.  Now have a clean compile of the generated C code,
    except for a couple of small, older problems.  Have to read
    through the code carefully (but I bet it's right), then go
    on and work out system_load().

7 July 2005.  Now generating system_load() and compiling it
    cleanly.  That is, I now have complete working "np_init" and
    "np" binaries.  My list of things to do currently has 17
    items, quite substantial ones but all either reorganizations,
    bug fixes, or optimizations, nothing basically new to add now.

13 July 2005.  Working through the list of things remaining to do,
    currently 12 of them left.  Implemented function inlining
    today (not in general, just two common special cases).  The
    most substantial remaining problem is that of coercions being
    inserted before the types they coerce to have been finalized.
    Otherwise it's routine stuff from here on.

22 July 2005.  Still working through the list, currently 15 items to do.  Have
    completed a full audit of privacy, including introducing private
    parameters of features and rewriting that section of the language
    specification.  Currently thinking about default values of creation
    features, which seem quite difficult.  Have uncovered and am fixing
    several problems with clone.

28 July 2005.  Completed handling of default values of creation features.
    It was a long job, which took several days to plan and design.

3 August 2005.  Have lost several days to other things.  Today I completed
    a rewrite of preconditions code that avoids code duplication by placing
    the preconditions into functions.  Along the way I audited class and
    feature manifesting and greatly reduced the number of contexts created,
    by re-using levels after clearing them out.  This should be faster and
    less memory-hungry, and the code is cleaner too.

7 August 2005.  Lost a couple of days to timetabling and dentistry.  Have
    just added function call caching.  21 items on to-do list, which is
    more than there were a month ago, but most are small.  The only big
    items are (1) the coercion problem noted on 13 July, and (2) eliminating
    all code duplication.

12 August 2005.  Completed a review of Unicode and the char class, worked
    out the right way to handle invalid and unassigned code values, and
    modified the specification and implementation.  Also fixed a horrible
    old bug in the trie code, which got the node reference counts wrong
    because it overlooked the fact that the reference counts of a node's
    children increase by 1 when that node is copied.  17 items on to-do list.
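    The reference-count rule can be shown with a minimal copy-on-write
    trie node in C.  The node layout, the fan-out of 4, and the function
    names are hypothetical, and allocation failures are ignored; the point
    is only that a copy shares the original's children, so each of those
    children gains one reference.

```c
#include <stdlib.h>
#include <string.h>

#define FANOUT 4   /* hypothetical; a real trie node would be wider */

typedef struct Node {
    int refcount;
    struct Node *child[FANOUT];
} Node;

Node *node_new(void) {
    Node *n = calloc(1, sizeof *n);     /* children start out NULL */
    n->refcount = 1;
    return n;
}

/* Copy a shared node so that it can be modified.  The copy points at the
   same children as the original, so every non-null child gains a reference. */
Node *node_copy(const Node *orig) {
    Node *copy = malloc(sizeof *copy);
    memcpy(copy->child, orig->child, sizeof copy->child);
    copy->refcount = 1;
    for (int i = 0; i < FANOUT; i++)
        if (copy->child[i] != NULL)
            copy->child[i]->refcount++;
    return copy;
}

/* Drop one reference; when none remain, release the children and free. */
void node_release(Node *n) {
    if (n == NULL || --n->refcount > 0)
        return;
    for (int i = 0; i < FANOUT; i++)
        node_release(n->child[i]);
    free(n);
}
```

    Without the increment in node_copy, releasing the original would free
    children that the copy still points to.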

18 August 2005.  Completed the compiler today, and packaged it up for
    placing on the Nonpareil home page.

===========================================================================
September 2005 - December 2005.  During this period I plan to work on
    my other research project, timetable construction, producing a paper
    for submission to PATAT06.  I hope to do some background reading
    for Nonpareil during this time, and some planning of where the
    project will go when I return to it next year.
===========================================================================

