@Section
    @Title { Memory }
    @Tag { memory }
@Begin
@LP
In its quest for speed, Nonpareil caches objects and function calls,
and it calculates predefined objects just once, during an initializing
run, storing them in a binary file that can be loaded quickly at the
start of each ordinary run.  This section explains all this in detail.
@BeginSubSections

@SubSection
    @Title { Memory systems }
@Begin
@LP
A @I { memory system } is a large piece of memory within which objects
may be stored.  A memory system obtains large @I chunks of memory from
the operating system and doles them out in small pieces (@I { records })
as required.  Each new large chunk is twice as large as the previous
chunk, starting from about 1MB, so even on a very memory-hungry run
there should be fewer than ten chunks in a memory system.  If a request
arrives for a record larger than the current chunk size, that record
goes into a fresh chunk of its own.
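@PP
The chunk-doubling policy can be sketched in a few lines.  This is a
simplified illustration, not actual Nonpareil source; the names are
invented here:

```c
#include <assert.h>
#include <stdlib.h>

/* Sketch of the chunk-doubling policy: each ordinary chunk is twice
   as large as the previous one, starting from about 1MB, while an
   oversized request gets a fresh chunk of its own size. */
#define FIRST_CHUNK_SIZE ((size_t) 1 << 20)   /* about 1MB */

static size_t next_chunk_size = FIRST_CHUNK_SIZE;

static void *chunk_new(size_t record_size)
{
    size_t size = next_chunk_size;
    if (record_size > size)
        size = record_size;        /* oversized record: own chunk, no doubling */
    else
        next_chunk_size *= 2;      /* ordinary chunk: double for next time */
    return malloc(size);
}
```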
@PP
Each record contains a type tag from which the length of the record
can be deduced, and no memory is ever freed, so there is no need for
hidden memory management fields of the kind used by @I { malloc }.
Thus there is zero memory allocation overhead per record.
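@PP
Since records are never freed and their lengths follow from their type
tags, allocation within a chunk is just a pointer bump.  A minimal
sketch (the statically sized chunk and the names are invented here):

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of zero-overhead allocation: no hidden header precedes a
   record, so allocating is just advancing the top-of-chunk offset. */
static char chunk[1024];
static size_t top = 0;             /* offset of the next free byte */

static void *record_new(size_t len)
{
    void *rec = &chunk[top];
    top += len;                    /* no header, no free list */
    return rec;
}
```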
@PP
Requiring objects to obtain their memory from a memory system rather
than directly from @I { malloc } speeds up memory allocation, but it
is done mainly so that some central authority knows what memory has
been allocated, enabling the memory system operations described here.
@PP
One chunk in every memory system is devoted to that memory system's
@I { cache } -- a large hash table giving access to every record
in that memory system.  A cache is always the sole occupant of its
chunk.  When a cache is 80% full it is rehashed into a new, larger
chunk.  The old chunk is held in reserve by the memory system and
re-used next time a new chunk is required, if large enough.
@PP
Nonpareil caches both objects and function calls, and at each
moment there is a @I { current object memory system } and a
@I { current function call memory system } into which new objects
and function call records are stored.  These two memory systems
are accessed via global variables.
@PP
An object memory system cache is a simple linear probing hash table,
with null values denoting empty slots, and non-null values pointing to
objects.  A function call cache also uses linear probing.  Its
non-null values point to records containing a length field (the
number of parameters), then the parameters (one per word, the first
being a pointer to the function that this is a call on), and finally
the result value.
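@PP
A function call cache record of this shape might be declared as follows.
This is a hypothetical sketch; the names and the allocation scheme are
invented, and the signature field is described below:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical layout of a function call cache record: a signature, a
   length field giving the number of parameter words, the parameters one
   per word (the first being a pointer to the function called), and
   finally the result value. */
struct call_rec {
    unsigned int signature;   /* hash value before reduction to table range */
    unsigned int length;      /* number of parameter words */
    void *words[1];           /* words[0..length-1]: params; words[length]: result */
};

static struct call_rec *call_rec_new(unsigned int sig, unsigned int nparams)
{
    /* sizeof already includes words[1]; add nparams more words so there
       is room for the parameters plus the trailing result value */
    struct call_rec *r = malloc(sizeof(struct call_rec)
                                + nparams * sizeof(void *));
    r->signature = sig;
    r->length = nparams;
    return r;
}

static void *call_result(struct call_rec *r)
{
    return r->words[r->length];   /* the result follows the parameters */
}
```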
@PP
Each record, whether object or function call, contains a 32-bit
@I { signature }, which is just the hash value before reduction
to the table range via the remainder operation.  Storing this
signature has a significant memory cost, but it costs next to
nothing in time and speeds up retrieval (the key operation),
since unequal signatures rule out an object immediately, while
equal ones almost always indicate that the full match will succeed.
@PP
Actually the signature is stored primarily for a different reason:
it greatly simplifies the rehashing operation that occurs when the hash
table is enlarged.  With a signature available, the new insert position is
just the remainder when the signature is divided by the new table size.
Without a signature, one must rehash the record, which is slow and
surprisingly difficult to get right.  Rehashing using full information
about the layout of the record requires a huge dynamic dispatch over all
concrete record types.  Alternatively, rehashing based on (say) just
the length of the record requires all unused fields within the record
to be carefully cleared when it is created.  When creating an object,
one then cannot simply hash the parameters of the creation function;
instead, the object must be built and hashed, then relinquished if it
is found to already exist.  This is slow when the object exists already.
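@PP
With signatures stored, the whole rehashing operation reduces to a
simple loop.  A sketch, with an invented record type:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sketch of rehashing a linear-probing cache into a larger table using
   stored signatures: each record's new slot is just its signature
   modulo the new table size, so no record is ever re-hashed from its
   contents. */
typedef struct { unsigned int signature; } rec;

static void cache_rehash(rec **old_tab, unsigned int old_len,
                         rec **new_tab, unsigned int new_len)
{
    unsigned int i, j;
    memset(new_tab, 0, new_len * sizeof(rec *));
    for (i = 0; i < old_len; i++)
        if (old_tab[i] != NULL)
        {
            j = old_tab[i]->signature % new_len;
            while (new_tab[j] != NULL)          /* linear probing */
                j = (j + 1) % new_len;
            new_tab[j] = old_tab[i];
        }
}
```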
@End @SubSection

@SubSection
    @Title { Object creation and caching }
@Begin
@LP
Whenever an object is to be created, the cache of the current object
memory system is consulted.  If an equal object is already present in
the cache, that object is returned instead of creating a fresh object.
In this way the system ensures that two objects are equal if and only
if they are the same (lie at the same memory address).
@PP
For example, consider the problem of creating an object of the
@Nonpareil { nlist } class:
@ID @Box @Nonpareil {
class nlist{A} inherit list{A}

    head: A
    tail: list{A}

noncreation

    ...

end
}
The C record generated for representing @Nonpareil { nlist } objects is
@ID @OneRow @CP {
struct nlist_rec {
  unsigned int signature;
  unsigned short type_tag;
  unsigned char swizzle_bits;
  unsigned : 8;
  void *head;
  list tail;
};
}
Every object contains at least its own signature and a type tag.
The C function generated for creating an @Nonpareil { nlist } object is
@ID @OneRow 13px @Break @CP {
nlist nlist_create(void *head, list tail)
{
  nlist self, other;
  unsigned int sig, i;

  /* calculate hash signature */
  sig = (nlist_tag << 16);
  sig += ((int) head);
  sig += (((int) tail) << 1);

  /* return previously created object if in hash table */
  i = (sig % npsys_obj_cache_len);
  while( npsys_obj_cache[i] )
  {
    if( (npsys_obj_cache[i]->signature == sig) )
    {
      other = ((nlist) npsys_obj_cache[i]);
      if( ((other->type_tag == nlist_tag)
          && (other->head == head)
          && (other->tail == tail)) )
      return other;
    }
    i = ((i + 1) % npsys_obj_cache_len);
  }

  /* create and initialize a new object */
  self = ((nlist) npsys_obj_new(16));
  self->signature = sig;
  self->type_tag = nlist_tag;
  self->head = head;
  self->tail = tail;

  /* insert new creation into hash table and return it */
  npsys_obj_cache_insert(((CACHE_OBJ) self), i);
  return self;
}
}
This function has one parameter for each creation variable.  All
code excerpts are actual, unmodified Nonpareil compiler output.
@PP
The first step is to hash the parameters, producing the signature,
held in variable @CP { sig }.  If either creation variable had a
default value, the corresponding actual value passed would first have
been tested for equality with a special value indicating that the
default value was to be used, in which case the result of evaluating
the default value expression would have been assigned to the
parameter.  How to handle that expression is a separate topic.
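@PP
A hedged sketch of how such a default might be handled (the value of
@CP { DFT_PARAM_VAL } and the other names here are invented for the
illustration):

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of default parameter handling: the caller passes DFT_PARAM_VAL
   to request the default, and the creation function substitutes the
   result of evaluating the default value expression. */
#define DFT_PARAM_VAL ((void *) -1)

static void *nlist_head_default(void)  /* stands in for the default expr */
{
    return NULL;
}

static void *resolve_head(void *head)
{
    if (head == DFT_PARAM_VAL)
        head = nlist_head_default();
    return head;
}
```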
@PP
Then, accessed via global variables @CP { npsys_obj_cache } and
@CP { npsys_obj_cache_len }, the cache of the current object system
is searched for an object equal to this one.  If found, that other
object is returned.  Otherwise, a new record of the appropriate length
(16 bytes in this example) is obtained from the current object memory
system, initialized, inserted, and returned.  The @CP { swizzle_bits }
field is not initialized by object creation.  It is described in a
later subsection.
@PP
Array and string creation is somewhat different.  The @Nonpareil { array }
type is built-in, with C struct
@ID @OneRow @CP {
struct array_rec {
  unsigned int signature;
  unsigned short type_tag;
  unsigned char swizzle_bits;
  unsigned : 8;
  unsigned int length;
  void *elems[1];
};
}
The C struct for the @Nonpareil { string } type is the same except
for having a specific element type:
@ID @OneRow @CP {
struct string_rec {
  unsigned int signature;
  unsigned short type_tag;
  unsigned char swizzle_bits;
  unsigned : 8;
  unsigned int length;
  uchar elems[1];
};
}
These definitions are printed by the compiler, but are fixed, not
calculated like the one for @Nonpareil { nlist }.
@PP
There are several functions that create arrays, all builtin.  These
are required to follow a certain protocol.  First they call
@CP { array_create1 }, which creates a record of the right length
for the particular array.  Next they assign the elements using any
method they choose.  Finally they call @CP { array_create2 }, which
checks whether an array equal to the new one is already in the cache;
if not, it returns the new array; if so, it relinquishes the new array
and returns the existing one.  For example, the code to initialize a
string might look like this:
@ID @OneRow @CP {
s = string_create1(5);
s->elems[0] = 'H';
s->elems[1] = 'e';
s->elems[2] = 'l';
s->elems[3] = 'l';
s->elems[4] = 'o';
s = string_create2(s);
}
Functions @CP { string_create1 } and @CP { string_create2 } are clones
of @CP { array_create1 } and @CP { array_create2 }.  Like the structs,
these functions are fixed, not generated by the compiler.  They are kept
in a library file which is included with every compile.
@PP
A problem here is that the algorithm for filling the array (or string)
might involve function calls and object creations, and so a decision
to relinquish the array might be taken after many other objects have
been allocated, when the array is no longer at the top of the chunk.
@PP
The Nonpareil memory system handles this situation very simply.  If,
at the moment of relinquishment, the array is still at the top of
the chunk, then the memory is reclaimed, otherwise it leaks.  Such
leaks will be very rare, since most array construction operations
do not involve the creation of other objects anyway (e.g. literal
string construction and array concatenation), and given that the
array in question is to be relinquished, any objects that it points
to must have already existed at the time the array construction began.
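@PP
The relinquish operation itself can be sketched in a few lines (again a
simplified illustration with invented names and a statically sized chunk):

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of relinquishment: the memory is reclaimed only if the record
   is still at the top of the chunk; otherwise it simply leaks. */
static char chunk[1024];
static size_t top = 0;

static void *rec_new(size_t len)
{
    void *rec = &chunk[top];
    top += len;
    return rec;
}

static void rec_relinquish(void *rec, size_t len)
{
    if ((char *) rec + len == &chunk[top])
        top -= len;               /* still on top: reclaim */
    /* otherwise: do nothing, the record leaks */
}
```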
@PP
If an array leak occurs during the initialization of predefined object
features, the leaked array will be swizzled and saved with the other
predefined objects.  Then, when they are reloaded, it will be inserted
into the cache with them.  It is of course equal to at least one other
object (the original array that caused it to leak), but the cache
reinsertion operation will not notice this, because it is optimized for
reloading under the assumption that no two objects are equal.  It might
seem that this does not matter, since the original array must precede
the leaked one in memory, so will be reinserted first and will prevent
the leaked array from ever being retrieved.  However, a rehash of the
cache (which traverses the hash table, not memory) would reorder them
when there is a wraparound in the hash table between the original and
leaked arrays,
and then, if a new object creation of an array with this value occurred,
@CP { array_create2 } would return a reference to the leaked array, and
a subsequent equality test would return false when it should return true.
@PP
The solution adopted to this problem is as follows.  Just before
relinquishing an array or string @CP { self } within @CP { array_create2 }
or @CP { string_create2 }, assign a value to its first element (if any)
that cannot appear in any ordinary array:
@ID @OneRow @CP {
if( self->length > 0 )
  self->elems[0] = DFT_PARAM_VAL;
}
The value assigned is the one used as the default parameter value.  This
ensures that a leaked non-empty array will never be confused with any
other array, except possibly another leaked array, but that does not matter.
This does not solve the problem for empty arrays and strings, but they
can never leak, because the protocol permits only one way to create them:
@ID @OneRow @CP {
a = array_create1(0);
a = array_create2(a);
}
and no code is ever generated in which other code appears between these
two calls.
@PP
Function call caching is similar to object creation, in that an object
is created only if it does not already lie in the cache.  However,
the object returned is not the created object, but rather one of
its fields, which holds the function call result.  One cannot
unify object caching and function caching, because of this need
for returning a field rather than the cached object itself.
@End @SubSection

@SubSection
    @Title { Checkpointing and swizzling }
    @Tag { memory.swizzling }
@Begin
@LP
After initializing all the predefined objects, a Nonpareil initializing
run has to save them in a binary file that can be quickly loaded by a
subsequent non-initializing (called @I { loading }) run.  That is, it
has to save the current object memory system.  The current function
call memory system is not saved, nor is the object memory system's cache.
This is quite safe, because no object contains a pointer into either
type of memory.
@PP
When objects are saved to a file, any pointers within them lose their
meaning.  They must be converted to suitable integers before saving,
then converted back again to pointers when the file is reloaded on a later
run.  These operations will be called @I { swizzling } and @I { unswizzling }.
@PP
Nonpareil keeps track of the start and end address of each chunk.
Corresponding to each byte address in each chunk is a unique integer,
obtained by adding the offset of that address within the chunk to
the total length of all preceding chunks.  To checkpoint object memory,
this integer is calculated for each pointer in object memory, by
testing which chunk the pointer points into, and overwritten onto
the pointer before the chunk is written to disk.  When the disk file
is read back in it goes directly into a single chunk, and each pointer
field is incremented by the base address of that chunk.  This reload
is quite fast, fortunately, since it is done at the start of each loading
run, whereas writing is done only by initializing runs.
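@PP
Swizzling a single pointer can be sketched as follows (the chunk table
and all names are invented for the illustration):

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of swizzling one pointer: find the chunk it points into, then
   add its offset within that chunk to the total length of all
   preceding chunks. */
struct chunk { char *start; size_t len; };

static size_t swizzle_ptr(void *p, struct chunk *chunks, int nchunks)
{
    size_t preceding = 0;
    int i;
    for (i = 0; i < nchunks; i++)
    {
        if ((char *) p >= chunks[i].start
            && (char *) p < chunks[i].start + chunks[i].len)
            return preceding + (size_t) ((char *) p - chunks[i].start);
        preceding += chunks[i].len;
    }
    return (size_t) -1;   /* not in any chunk */
}
```

Unswizzling the reloaded file is even simpler, since everything lands in
a single chunk: each stored integer is just incremented by that chunk's
base address.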
@PP
The binary file that holds a checkpointed memory system has a very
simple format.  It begins with a long integer (traditionally called a
@I { magic number }), holding the time when the executable file that
wrote this file was compiled.  This is checked to ensure that the file
is read only by an executable compiled from the same system as the
executable that wrote the file.  After that comes a single swizzled
pointer, pointing into the following chunk to an array of pointers
to the @Nonpareil { all_enumerated }, @Nonpareil { all_predefined },
and trie arrays of the memory system, from which all the other objects
are reached.  After that comes the memory system itself as a single
swizzled chunk.
@PP
The name of the file ends in @CP { _be } or @CP { _le } depending on
whether it was written by an executable running on a big-endian or
little-endian system.  When reading, a big-endian executable will
only open a binary file whose name ends in @CP { _be }, and a
little-endian one will only open a file whose name ends in @CP { _le }.
In this way all confusion over endianness is avoided; indeed, the same
system can be compiled on both big-endian and little-endian machines in
a networked environment.  All that is required is that an initializing
run be performed on one machine of each type.
@PP
To swizzle an entire memory system, swizzle each chunk of it.  To
swizzle a chunk, proceed through the chunk swizzling each record.
This requires both knowing how to swizzle one record, and knowing
its length so that one can proceed to the next record.  The swizzling
algorithm depends on a function with header
@ID @CP { int system_swizzle(object self, object other) }
which handles both:  it swizzles @CP { self } and returns the size of
@CP { self }'s record, so that the chunk swizzling knows how far to
skip along the chunk to the next record.  Parameter @CP { other } is
concerned with working out which fields to swizzle, as explained below.
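@PP
The chunk-level loop is then straightforward.  A much-simplified sketch,
with invented two-byte "records" and with the @CP { other } parameter
omitted:

```c
#include <assert.h>

/* Sketch of the chunk-swizzling loop: proceed through the chunk record
   by record, with the per-record function returning the record's size
   so the loop knows how far to skip. */
typedef struct { unsigned char tag; unsigned char len; } rec;

static int records_seen = 0;

static int system_swizzle_sketch(rec *self)
{
    records_seen++;               /* stand-in for the real swizzling work */
    return self->len;             /* size of self's record */
}

static void chunk_swizzle(char *start, char *top)
{
    char *p = start;
    while (p < top)
        p += system_swizzle_sketch((rec *) p);
}
```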
@PP
To implement @CP { system_swizzle }, one needs a giant dynamic
dispatch on the tag field of @CP { self }.  To ensure that the
C compiler is not overwhelmed, the body of @CP { system_swizzle }
contains nested @CP { if } statements that dispatch to a module
(this is easily done since the tag fields of types defined in
any one module have contiguous values).  Within each module is a
function which uses a @CP { case } statement to dispatch to
swizzle functions for each of that module's types.
@PP
Unswizzling is organized in the same way, by proceeding through the
chunk calling a @CP { system_unswizzle } function that dispatches to
the modules, which then dispatch to the object types.  Unswizzling
means converting integers back to pointers, but these functions also
recalculate object signatures, if required owing to the presence of
fields containing pointers, and insert the objects into a new cache.
@End @SubSection

@SubSection
    @Title { Knowing what to swizzle }
@Begin
@LP
Swizzling requires that the system know which object fields are
pointers.  Nonpareil's strong typing usually makes this clear,
but there is a problem with fields whose type is a variable:
@ID @Nonpareil {
class nlist{x} inherit list{x}

    head: x
    tail: list{x}

end
}
In this example, @Nonpareil { tail } is always a pointer, but
@Nonpareil { head } could be a pointer or an immediate value,
depending on the instantiation of @Nonpareil { x }.  The record type
tag holds only the uninstantiated type tag value @CP { nlist_tag };
it contains no information about the type of @Nonpareil { x }.
@PP
One solution to this problem would be to avoid swizzling altogether by
ensuring that a saved object file is reloaded at the same process
address it was saved from.  However, the operating system used by
the author (Linux) does not offer such a feature, despite possessing
a full virtual memory addressing system.  The manual entry for the
Linux @I { mmap } function, which comes closest, allows the user to
ask for it, but deprecates it and states that the request will not
necessarily be granted.
@PP
Another solution might be to ensure that the type tag stored in the
object record identified an @I { instantiated type }, that is, a type
with particular values for its type variables.  Unfortunately, it is
not clear how to come up with instantiated type tags.  Consider
the @Nonpareil { map } feature of @Nonpareil { nlist }:
@ID @Nonpareil {
map{B}(f: fun1{A, B}): list{B} := nlist(f.apply1(head), tail.map(f))
}
It creates an object of type @Nonpareil { nlist{B} }, and it is
not clear where an instantiated type tag with this value will
come from (it varies from call to call, to begin with).
@PP
Nonpareil's solution involves storing a set of @I { swizzle bits } in
each object.  There is one bit for each generic type parameter of the
class which is also the type of a creation feature.  For example, in
@Nonpareil { nlist } above, there would be one swizzle bit, for
@Nonpareil { x }; or if @Nonpareil { head } had some other type,
there would be no swizzle bits.  Each swizzle bit is 1 if, in this
particular object, the type variable is a pointer type, and 0 if it is not.
@PP
Swizzle bits are not inherited.  A parent class may have several swizzle
bits, while its child has none, and vice versa, as is easily shown
by example.  But for exactly this reason, the location of the swizzle
bits in the object record does not have to be the same from one object
to another.  So these bits (if needed at all) are allocated @I { last },
after all other record fields have received their offsets.  An upper
limit of 32 is placed on the number of swizzle bits in any object --
a generous limit -- so that the swizzle bits will always fit into a
single field.  The actual width will be 0, 8, 16, or 32 bits depending
on how many swizzle bits there are.
@PP
In deciding which swizzle bits to set, Nonpareil relies on the fact
that only one checkpoint is needed, at the end of initialization.  It
runs the initialization twice, in different object memory systems, then
compares the two object systems object by object and variable-type field
by variable-type field.  Where a variable-type field has a different
value in each object, those values must be pointers into different
memory systems, indicating that that field must be swizzled.
@FootNote {
If floating-point operations produce non-deterministic results,
that could confuse this method; indeed if these results are used
in tests which determine what objects are created, that could even
lead to cases where the two object memory systems cannot be compared
object by object.  I have not been able to find out whether such
non-determinism exists in floating-point operations.  If it does,
it would be necessary to avoid predefined objects containing generic
fields whose values depend on the outcome of floating-point operations.
}
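@PP
The comparison that sets the swizzle bits can be sketched as follows
(the field arrays and names are invented for the illustration):

```c
#include <assert.h>

/* Sketch of setting swizzle bits by the two-run comparison: for each
   variable-type field, if the two copies of the object (built in two
   different memory systems) disagree, the field must hold a pointer
   and its swizzle bit is set; equal values indicate an immediate. */
static unsigned int compute_swizzle_bits(void **self_fields,
                                         void **other_fields, int n)
{
    unsigned int bits = 0;
    int i;
    for (i = 0; i < n; i++)
        if (self_fields[i] != other_fields[i])
            bits |= (1u << i);    /* differing values: must be pointers */
    return bits;
}
```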
@PP
This comparison is part of the swizzling operation; the swizzle bits
are set at the same time the fields themselves are swizzled, and only
objects destined for saving in a file have their swizzle bits set.  The
two objects to be compared are passed as parameters @CP { self } and
@CP { other } to @CP { system_swizzle } above.  Then when the object is
loaded again later, known pointers are unswizzled, known non-pointers are
left alone, and the appropriate swizzle bits decide the doubtful cases.
@End @SubSection

@SubSection
    @Title { Function call caching }
@Begin
@LP
As already mentioned, Nonpareil caches function calls.  How the hash
table is organized was described earlier.  This section describes how
the cache is accessed, and which kinds of functions are cached.
@PP
Each back-end function object contains a Boolean field saying whether
or not calls on that function are to be cached.  Irrespective of what
kind of back-end function it is, if that field is set then during
code generation of the body of the function, code is generated at
the start of the function (before the preconditions are tested)
for hashing the parameters, searching the function call cache, and
returning early with an existing result if found.  Code is also
generated at the end of the function for building a cache object
holding the signature, result, and parameters, and inserting it
into the cache.
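@PP
In outline, a function cached this way behaves like the following
hand-written sketch (the table, the hash, and all names are invented;
the real generated code uses the current function call memory system):

```c
#include <assert.h>

/* Sketch of a cached function: hash the parameters, probe a
   linear-probing cache, return an existing result early, otherwise
   run the body, insert the result, and return it. */
#define CACHE_LEN 64

struct call { int used; unsigned int sig; int param; int result; };
static struct call call_cache[CACHE_LEN];
static int body_runs = 0;          /* counts how often the body really runs */

static int cached_square(int param)
{
    unsigned int sig = (unsigned int) param * 2654435761u;   /* hash params */
    unsigned int i = sig % CACHE_LEN;
    while (call_cache[i].used)
    {
        if (call_cache[i].sig == sig && call_cache[i].param == param)
            return call_cache[i].result;         /* early return on a hit */
        i = (i + 1) % CACHE_LEN;
    }
    body_runs++;
    {
        int result = param * param;              /* the function body */
        call_cache[i].used = 1;                  /* insert into the cache */
        call_cache[i].sig = sig;
        call_cache[i].param = param;
        call_cache[i].result = result;
        return result;
    }
}
```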
@PP
The remaining question is which calls to cache.  Provided every
@I object is cached, there is no problem with correctness in not
caching some function calls.
@PP
At the time of writing there were twelve different kinds of back-end
function objects.  Some of them do not correspond to functions in the C
code, and thus caching is irrelevant to them:  let definitions without
parameters, downcast variables, parameters, creation features, and
predefined object features.  Creation functions are not cached because
object caching does the equivalent for them.  Back-end precondition
and invariant functions have a @CP { void } result type, not Boolean,
so it would be awkward to cache those.  However, since they have the
same parameters as their enclosing function, and a cache hit on one of
them would usually imply a cache hit on the enclosing function, which
would cause the precondition or invariant to be bypassed anyway, it
would be rare for time to be saved by caching them, and so they aren't.
The various initialization functions are not cached, because each of
them is called only once anyway.  Builtin functions are not cached because,
at least so far, their bodies are all too small to benefit from caching,
and indeed are usually inlined.
@PP
This leaves noncreation features and let definitions with parameters, both
of which are cached, and creation feature default value functions, which
are currently not cached, on the principle that they are likely to
be tiny functions (constants, typically) which would not benefit.  It
would make sense to heuristically evaluate all these functions to
decide whether each is likely to benefit from caching or not, but this
is not currently being done.
@End @SubSection

@EndSubSections
@End @Section
