\section{Appendix: Cost calculations in detail}
\label{sect:formal}
This section presents detailed cost formulas.  It is the basis of the
implementation of history in the author's KHE platform \cite{kingston2014khe}.

\subsection{General formulas}
Let $I$ be a global instance with projections $I_1 , \dots , I_n$,
where $n \ge 1$, and let $C$ be a constraint with projections
$C_1 , \dots , C_n$.  Let the instance currently being solved
be $I_i$, so that solutions $S_1 , \dots , S_i$ are available,
with $S_1 , \dots , S_{i-1}$ fixed.

Let $c(C_i)$, the {\em cumulative cost} of $C_i$, be the minimum, over
all solutions $S$ of $I$ containing $S_1 , \dots , S_i$, of $c(C)$,
the cost of $C$ in $S$.  In other words, $c(C_i)$ is the largest cost
assignable to $C_i$ without risk of exceeding $c(C)$.

As $i$ grows, the set of solutions $S$ containing $S_1 , \dots , S_i$
shrinks, so $c(C_i)$ cannot decrease:
$$c(C_1) \le c(C_2) \le \cdots \le c(C_n) = c(C)$$
When cost adjustment is used to avoid double counting, the cost actually
contributed to the solution cost, called the {\em reported cost}, is
$c(C_1)$ when $i = 1$, and $c(C_i) - c(C_{i-1})$ when $i > 1$.  The
total reported cost is then
$$c(C_1) + (c(C_2) - c(C_1)) + \cdots + (c(C_n) - c(C_{n-1})) = c(C_n) = c(C)$$
as required.  All reported costs are non-negative, since
$c(C_i) \ge c(C_{i-1})$.
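The telescoping argument can be checked numerically.  The following Python
sketch takes a hypothetical list of cumulative costs $c(C_1), \dots, c(C_n)$
and returns the reported cost of each projection.  The function name
{\tt reported\_costs} and the sample values are illustrative only; they are
not taken from KHE.

```python
# Reported costs from cumulative costs, checking that they telescope to c(C).

def reported_costs(cumulative):
    """Return [c(C_1), c(C_2) - c(C_1), ..., c(C_n) - c(C_{n-1})]."""
    reported = []
    previous = 0  # there is no C_0, so nothing is subtracted for i = 1
    for cost in cumulative:
        # Cumulative costs never decrease, so each difference is >= 0.
        assert cost >= previous, "cumulative costs must be non-decreasing"
        reported.append(cost - previous)
        previous = cost
    return reported


cumulative = [0, 20, 20, 50]          # c(C_1) <= ... <= c(C_4) = c(C)
parts = reported_costs(cumulative)
print(parts)                          # [0, 20, 0, 30]
assert sum(parts) == cumulative[-1]   # total reported cost equals c(C)
```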

It is usual to calculate cost in two stages.  First, information
specific to each constraint is used to calculate the {\em deviation},
also called the {\em degree of violation}.  This is usually the
amount by which some quantity exceeds a maximum limit or falls
short of a minimum limit.  Then a non-negative, non-decreasing
{\em cost function} $f(x)$ is applied to the deviation to produce
the cost.  Examples are $f(x) = wx$ and $f(x) = wx^2$, where $w$
is a non-negative constant {\em weight}.

When cost is calculated in this way, one can define $d(C_i)$, the
{\em cumulative deviation} of $C_i$, as the minimum, over all
solutions $S$ of $I$ containing $S_1 , \dots , S_i$, of $d(C)$,
the deviation of $C$ in $S$.  Since $f$ is non-decreasing, a solution
that minimizes the deviation also minimizes the cost, so
$c(C_i) = f(d(C_i))$ and $c(C) = f(d(C))$.
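As a toy illustration of the identity $c(C_i) = f(d(C_i))$ for
non-decreasing $f$, the sketch below assumes (realistically only for tiny
examples) that the deviation of every solution containing
$S_1 , \dots , S_i$ can be listed.  The helpers {\tt linear} and
{\tt quadratic} build the example cost functions $wx$ and $wx^2$; all names
and data are hypothetical.

```python
# Toy check of c(C_i) = f(d(C_i)), assuming the deviations d(C) of all
# solutions containing S_1, ..., S_i can be enumerated (hypothetical data).

def linear(w):
    """Cost function f(x) = w * x, non-decreasing when w >= 0."""
    return lambda x: w * x

def quadratic(w):
    """Cost function f(x) = w * x**2, non-decreasing for x >= 0 when w >= 0."""
    return lambda x: w * x * x

def cumulative_cost(deviations, f):
    """c(C_i): minimum over the completions of the cost f(d(C))."""
    return min(f(d) for d in deviations)

def cumulative_deviation(deviations):
    """d(C_i): minimum over the completions of the deviation d(C)."""
    return min(deviations)

deviations = [3, 1, 4, 2]  # deviations of four hypothetical completions
for f in (linear(10), quadratic(10)):
    # Because f is non-decreasing, a cheapest completion is also one
    # with the smallest deviation.
    assert cumulative_cost(deviations, f) == f(cumulative_deviation(deviations))
```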

\subsection{Cluster busy times constraints}
Suppose now that $C$ is a cluster busy times constraint with minimum
limit $L$ and maximum limit $U$, where $0 \le L \le U$.

Suppose that the following quantities are available to $C_i$ for use
in calculating costs:
\begin{quote}
\begin{tabular}{l@{\extracolsep{0.5cm}}l}
$a_i$ & The number of $C$'s time groups in $C_1 , \dots , C_{i-1}$ \\
$b_i$ & The number of $C$'s time groups in $C_i$ \\
$c_i$ & The number of $C$'s time groups in $C_{i+1} , \dots , C_n$ \\
$x_i$ & The number of $C$'s time groups which are active in $S_1 , \dots , S_{i-1}$ \\
$y_i$ & The number of $C$'s time groups which are active in $S_i$
\end{tabular}
\end{quote}
with $0 \le x_i \le a_i$, $0 \le y_i \le b_i$, and $0 \le c_i$.
There is no $z_i$ because solutions are available only for the past
and present, not for the future.

% Let $d(C_i)$, the {\em cumulative deviation} of $C_i$, be the minimum
% over all solutions $S$ of $I$ containing $S_i , \dots , S_i$ of $d(C)$,
% the deviation (number of violations) of $C$ in $S$.  Cumulative cost
% is a non-negative non-decreasing function of cumulative deviation,
% for example $c(C_i) = w \cdot d(C_i)$ or $c(C_i) = w \cdot d(C_i)^2$,
% where $w$ is a non-negative constant {\em weight}.

The cluster busy times constraint has an {\tt AllowZero} option, which
when true causes zero active time groups to produce cost 0, whatever
the limits.  As an aid to expressing this in algebra, introduce the
low-precedence operator
$$a :: b$$
Its value is 0 when {\tt AllowZero} is true and $a = 0$, and $b$ otherwise.

The number of active time groups of $C$ in any solution $S$ of $I$
containing $S_1 , \dots , S_i$ is at least $x_i + y_i$ and at most
$x_i + y_i + c_i$, so
$$d(C_i) = x_i + y_i :: \max(0, L - x_i - y_i - c_i, x_i + y_i - U)$$
$L \le U$ implies that $L - x_i - y_i - c_i$
and $x_i + y_i - U$ cannot both be positive.
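The formula for $d(C_i)$ can be transcribed directly.  In the Python sketch
below, the helper {\tt allow\_zero\_op} plays the role of the $::$ operator,
and {\tt cbt\_deviation\_brute\_force} checks the closed form by trying every
possible number of future active time groups, from 0 to $c_i$.  These names,
and the parameter names, are our own and are not part of KHE.

```python
# Direct transcription of d(C_i) for a cluster busy times constraint.
# The names below are illustrative; they are not KHE's API.

def allow_zero_op(a, b, allow_zero):
    """The a :: b operator: 0 when AllowZero is true and a == 0, else b."""
    return 0 if allow_zero and a == 0 else b

def cbt_cumulative_deviation(x, y, c, L, U, allow_zero):
    """d(C_i) = x_i + y_i :: max(0, L - x_i - y_i - c_i, x_i + y_i - U)."""
    known = x + y  # active time groups known so far (past and present)
    return allow_zero_op(known, max(0, L - known - c, known - U), allow_zero)

def cbt_deviation_brute_force(x, y, c, L, U, allow_zero):
    """Minimum deviation over all numbers of future active time groups."""
    def deviation(active):
        return allow_zero_op(active, max(0, L - active, active - U), allow_zero)
    return min(deviation(x + y + future) for future in range(c + 1))

print(cbt_cumulative_deviation(0, 2, 1, 5, 7, False))  # 2
```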

If cost adjustment is desired, $C_i$ also needs to know whether
there is a $C_{i-1}$ and, if so, what its cumulative cost is, so
that it can subtract that cost away.  There is a $C_{i-1}$ when $a_i > 0$,
and, by the previous formula, its cumulative deviation is
$$d(C_{i-1}) = x_{i-1} + y_{i-1} :: \max(0, L - x_{i-1} - y_{i-1} - c_{i-1}, x_{i-1} + y_{i-1} - U)$$
But $x_{i-1} + y_{i-1} = x_i$ and $c_{i-1} = b_i + c_i$, so
$$d(C_{i-1}) = x_i :: \max(0, L - x_i - b_i - c_i, x_i - U)$$
and in this form $d(C_{i-1})$, and hence $c(C_{i-1})$, is easy
for $C_i$ to calculate.  It should do this just once, since the
value is constant.  When $a_i = 0$ there is no $C_{i-1}$; in that
case we define $d(C_{i-1})$ to be 0, since it is faster to subtract
a value unconditionally than to test whether a subtraction is required
and then perform it if so.
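Putting the two formulas together gives the reported cost
$f(d(C_i)) - f(d(C_{i-1}))$ of a cluster busy times constraint.  The
following Python sketch is one possible rendering under our own assumptions:
the function names are hypothetical, $f$ is passed in as any non-negative
non-decreasing function, and {\tt simulate} derives $a_i$, $b_i$, $c_i$,
$x_i$, and $y_i$ from made-up per-projection counts, so that the reported
costs can be compared with the cost of the complete solution.

```python
def allow_zero_op(a, b, allow_zero):
    """The a :: b operator: 0 when AllowZero is true and a == 0, else b."""
    return 0 if allow_zero and a == 0 else b

def cbt_reported_cost(a, b, c, x, y, L, U, allow_zero, f):
    """Reported cost f(d(C_i)) - f(d(C_{i-1})) of one projection."""
    d_now = allow_zero_op(x + y, max(0, L - x - y - c, x + y - U), allow_zero)
    # d(C_{i-1}) depends only on x_i, b_i, and c_i, so it is constant while
    # C_i is being solved; it is defined to be 0 when there is no C_{i-1}.
    d_prev = allow_zero_op(x, max(0, L - x - b - c, x - U), allow_zero) if a > 0 else 0
    return f(d_now) - f(d_prev)

def simulate(sizes, active, L, U, allow_zero, f):
    """Reported cost of each projection, given (hypothetical) numbers of
    time groups and of active time groups in each projection."""
    costs = []
    for i in range(len(sizes)):
        a, b, c = sum(sizes[:i]), sizes[i], sum(sizes[i + 1:])
        x, y = sum(active[:i]), active[i]
        costs.append(cbt_reported_cost(a, b, c, x, y, L, U, allow_zero, f))
    return costs

quadratic = lambda d: 10 * d * d
print(simulate([2, 3, 2], [1, 0, 2], 4, 5, False, quadratic))  # [0, 10, 0]
```

Here 3 time groups are active in total against limits 4 and 5, so the
complete solution has deviation 1 and cost 10, which matches the sum of the
reported costs.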

It would be convenient if $a_i = 0$ implied $d(C_{i-1}) = 0$,
since then $a_i = 0$ would not be a special case and $a_i$ itself
would not be needed.  But although $a_i = 0$ implies $x_i = 0$,
which eliminates the $x_i - U$ term, the $L - x_i - b_i - c_i$
term can be positive when $a_i = 0$.
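For instance, with {\tt AllowZero} false, $L = 4$, $U = 6$, $b_i = 2$,
$c_i = 1$, and $a_i = x_i = 0$, the formula would give
$$d(C_{i-1}) = x_i :: \max(0, L - x_i - b_i - c_i, x_i - U) = \max(0, 4 - 0 - 2 - 1, 0 - 6) = 1$$
rather than 0, so the test on $a_i$ cannot simply be dropped.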

% All that is actually used,
% however, is the condition $a_i > 0$.  XESTT takes the presence
% of a history section in the constraint to mean $a_i > 0$, and
% its absence to mean $a_i = 0$.

Two points are of interest to implementers.  First, when there
is no minimum limit (when $L = 0$), $a_i$ and $c_i$ do not influence
the values of these formulas.  Second,
when $a_i$, $x_i$, and $c_i$ are all 0, the formulas reduce to
what they would be without history:  the formula for $d(C_i)$ becomes
$$d(C_i) = y_i :: \max(0, L - y_i, y_i - U)$$
and $d(C_{i-1}) = 0$ since $a_i = 0$.  So 0 is a suitable default
value for $a_i$, $x_i$ and $c_i$.

It is possible in principle to extend this work to incorporate information
about a resource's future timetable.  This would involve redefining
$x_i$ to be the number of $C$'s time groups outside $w_i$ which are
known to be active, and redefining $c_i$ to be the number of $C$'s time
groups outside $w_i$ whose activity is undetermined.  The difficulty is that
the identities $x_{i-1} + y_{i-1} = x_i$ and $c_{i-1} = b_i + c_i$ would need
detailed adjustment under the new definitions, which adds complexity.

% If a future time group is known to be active, this
% increases $x_i$ by 1 and decreases $c_i$ by 1; if it is known to
% be inactive, this decreases $c_i$ by 1.

\subsection{Limit active intervals constraints}
Suppose now that $C$ is a limit active intervals constraint with
minimum limit $L$ and maximum limit $U$, where $0 \le L \le U$.

An {\em active interval} is a sequence of consecutive active time
groups; its length is what the constraint constrains.  Let the length
of active interval $\Delta$ be $l(\Delta)$, and let the deviation
contributed by $\Delta$ be $d(\Delta)$.  Then
$$d(\Delta) = \max(0, L - l(\Delta), l(\Delta) - U)$$
$L \le U$ implies that the second and third terms cannot both
be positive.

Let $C$'s active intervals in $S$ be $\Delta_1 , \dots , \Delta_m$.
One way to define $c(C)$ is
$$c(C) = f ( \sum_{j=1}^m d(\Delta_j) )$$
Here the deviations of all active intervals are summed, and the cost
function is applied once, to the total.

Conventional though it may be, this definition interacts badly
with history:  when $f(x)$ is non-linear, $C_i$ needs to know the
total deviation of all past active intervals.  This is not
surprising; after all, the value $x_i$ given to cluster busy
times constraints concerns all past time groups.  But when
constraining total workload it is natural to use total past
workload, whereas when constraining the lengths of active
intervals it is not natural to use the deviations of active
intervals from the distant past.  The competition doesn't,
for example.

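So a different definition of $c(C)$ is adopted, one which
interacts better with history:
$$c(C) = \sum_{j=1}^m f(d(\Delta_j))$$
This applies the cost function once per active interval rather than
once overall, but it is still reasonable.  When $f$ is linear, as in
$f(x) = wx$, the two definitions agree.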
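The two definitions of $c(C)$ differ only when $f$ is non-linear.  The
Python sketch below (function names and interval lengths are hypothetical)
computes both for the same active intervals: with a linear $f$ they agree,
while with $f(x) = wx^2$ the per-interval definition charges less, since a
sum of squares of non-negative numbers never exceeds the square of their sum.

```python
def interval_deviation(length, L, U):
    """d(Delta) = max(0, L - l(Delta), l(Delta) - U)."""
    return max(0, L - length, length - U)

def cost_of_total(deviations, f):
    """c(C) = f(sum of d(Delta_j)): the cost function applied once."""
    return f(sum(deviations))

def cost_per_interval(deviations, f):
    """c(C) = sum of f(d(Delta_j)): the cost function applied per interval."""
    return sum(f(d) for d in deviations)

lengths = [1, 6, 3]  # hypothetical active interval lengths
devs = [interval_deviation(n, 2, 4) for n in lengths]  # [1, 2, 0]
linear = lambda d: 5 * d
quadratic = lambda d: 5 * d * d
print(cost_of_total(devs, linear), cost_per_interval(devs, linear))        # 15 15
print(cost_of_total(devs, quadratic), cost_per_interval(devs, quadratic))  # 45 25
```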

Suppose that the following quantities are available to $C_i$ for use
in calculating costs:
\begin{quote}
\begin{tabular}{l@{\extracolsep{0.5cm}}l}
$a_i$ & The number of $C$'s time groups in $C_1 , \dots , C_{i-1}$ \\
$b_i$ & The number of $C$'s time groups in $C_i$ \\
$c_i$ & The number of $C$'s time groups in $C_{i+1} , \dots , C_n$ \\
$x_i$ & The number of consecutive active time groups immediately \\
      & \quad preceding $w_i$ \\
$\delta_p,\dots,\delta_q$ & The active intervals of $C_i$'s time groups
taken in isolation
\end{tabular}
\end{quote}
Cumulative cost cannot be calculated from these values.
The reported cost is
$$c(C_i) - c(C_{i-1}) = \sum_{j=p}^q f(d(\delta_j))$$
although this must be adjusted to take account of active intervals at
the ends of $w_i$ which could extend into adjacent weeks.  The
entire week may be active, giving a single active interval
potentially extending in both directions.

First suppose that $C_i$ includes at least one inactive time group, so
that its two ends are independent.
Suppose there is an active interval $\delta_q$ which includes the
last time group of $C_i$.  This interval could extend into $w_{i+1}$
and beyond.  Its full length is at least $l(\delta_q)$ and at most
$l(\delta_q) + c_i$, so it contributes
$$f(\max(0, L - l(\delta_q) - c_i, l(\delta_q) - U))$$
to $c(C_i)$, and the last term of the sum above, $f(d(\delta_q))$,
must be replaced by this.  If there is no such active interval, then
no adjustment is required.

Now suppose that $x_i > 0$.  This is the length of an active interval
which includes the last time group of $C_{i-1}$.  As just explained,
it will have contributed
$$f(\max(0, L - x_i - c_{i-1}, x_i - U))$$
to $c(C_{i-1})$, with $c_{i-1} = b_i + c_i$ as usual.  This contribution
is obsolete and must be subtracted away.  Then if the first time group
of $C_i$ is not active, the regular cost of an active interval with
length $x_i$ must be added:
$$f(\max(0, L - x_i, x_i - U))$$
If the first time group of $C_i$ is active, then the interval of length $x_i$
abuts $\delta_p$, the first active interval of $C_i$, and their joint
contribution to $c(C_i)$, which replaces the term $f(d(\delta_p))$ in the sum, is
$$f(\max(0, L - l(\delta_p) - x_i, l(\delta_p) + x_i - U))$$
That ends the $x_i > 0$ case.  None of this is needed when $x_i = 0$.

Finally, suppose that $C_i$ has no inactive time groups, so that
there is a single active interval $\delta_q$ (with $p = q$) which includes both
the first and last time groups of $C_i$.  The analyses for
both ends of the week apply to $\delta_q$, and its term $f(d(\delta_q))$
in the sum is replaced as follows.  If $x_i > 0$, then the obsolete
contribution of the interval of length $x_i$ must be subtracted away, and cost
$$f(\max(0, L - l(\delta_q) - x_i - c_i, l(\delta_q) + x_i - U))$$
added.  If $x_i = 0$, there is nothing to subtract away, but
$$f(\max(0, L - l(\delta_q) - c_i, l(\delta_q) - U))$$
must be added.
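The case analysis can be transcribed into code.  The Python sketch below is a
direct, unoptimized rendering under our own assumptions: {\tt active} is a
non-empty list of booleans giving the activity of $C_i$'s time groups in
order (so $b_i$ is its length), {\tt x} and {\tt c} stand for $x_i$ and
$c_i$, and the function names are hypothetical.  It starts from
$\sum_{j=p}^q f(d(\delta_j))$ and then applies each replacement and
subtraction described above.

```python
def runs(active):
    """Active intervals of C_i taken in isolation, as (start, length) pairs."""
    result, start = [], None
    for k, is_active in enumerate(active):
        if is_active and start is None:
            start = k
        elif not is_active and start is not None:
            result.append((start, k - start))
            start = None
    if start is not None:
        result.append((start, len(active) - start))
    return result

def lai_reported_cost_by_cases(active, x, c, L, U, f):
    """Reported cost c(C_i) - c(C_{i-1}), following the case analysis."""
    b = len(active)

    def d(length):
        return max(0, L - length, length - U)

    intervals = runs(active)
    cost = sum(f(d(length)) for _, length in intervals)
    # Contribution made to c(C_{i-1}) by the interval of length x,
    # using c_{i-1} = b_i + c_i; it is obsolete now.
    obsolete = f(max(0, L - x - b - c, x - U)) if x > 0 else 0
    if not all(active):
        # At least one inactive time group: the two ends are independent.
        if active[-1]:
            _, lq = intervals[-1]
            cost += f(max(0, L - lq - c, lq - U)) - f(d(lq))
        if x > 0:
            cost -= obsolete
            if not active[0]:
                cost += f(d(x))
            else:
                _, lp = intervals[0]
                cost += f(max(0, L - lp - x, lp + x - U)) - f(d(lp))
    else:
        # Every time group is active: one interval touching both ends.
        lq = b
        cost -= f(d(lq))
        if x > 0:
            cost -= obsolete
            cost += f(max(0, L - lq - x - c, lq + x - U))
        else:
            cost += f(max(0, L - lq - c, lq - U))
    return cost
```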

% In the implementation, the obsolete contribution is calculated just
% once and subtracted from every reported cost, while the rest is
% incorporated into the cost calculations for the individual intervals
% $\delta_p$ and $\delta_q$.

An implementation of the limit active intervals constraint
may be extended to handle history in three steps.
First, subtract
$$f(\max(0, L - x_i - b_i - c_i, x_i - U))$$
from the reported cost when $x_i > 0$, and nothing otherwise.  Second,
extend the data structure that holds active intervals, and that merges
and splits them as time groups become active and inactive, to include
an active interval of length $x_i$ (when $x_i > 0$) lying just before
the first time group.  This interval participates in merges and splits
as real intervals do, except that its own (virtual) time groups never
become inactive.
Third, when comparing an interval's length with the minimum limit, add
$c_i$ to the length if the interval includes the last time group.
These three extensions cover everything in the formulas above.
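The three steps can be sketched in Python as follows.  This is an
illustration under stated assumptions, not KHE's code: {\tt active} is a
non-empty list of booleans for $C_i$'s time groups, intervals are recomputed
from scratch rather than merged and split incrementally, and the names
{\tt lai\_reported\_cost\_virtual}, {\tt whole\_cycle\_cost}, and
{\tt simulate\_weeks} are hypothetical.  The last two derive $x_i$ and $c_i$
from a whole (made-up) cycle, so that the sum of the reported costs can be
checked against $c(C) = \sum_{j=1}^m f(d(\Delta_j))$.

```python
def lai_reported_cost_virtual(active, x, c, L, U, f):
    """Reported cost of C_i, given x = x_i and c = c_i."""
    b = len(active)
    # Step 1: the obsolete contribution to c(C_{i-1}), with c_{i-1} = b_i + c_i.
    obsolete = f(max(0, L - x - b - c, x - U)) if x > 0 else 0
    # Step 2: x virtual active time groups lie just before the first time
    # group; they never become inactive and merge with any leading run.
    lengths, run = [], x
    for is_active in active:
        if is_active:
            run += 1
        elif run > 0:
            lengths.append(run)
            run = 0
    touches_end = run > 0  # the final run includes the last time group
    if touches_end:
        lengths.append(run)
    # Step 3: add c_i to the length compared with the minimum limit when
    # the interval includes the last time group.
    cost = 0
    for k, length in enumerate(lengths):
        extra = c if touches_end and k == len(lengths) - 1 else 0
        cost += f(max(0, L - length - extra, length - U))
    return cost - obsolete

def whole_cycle_cost(flat, L, U, f):
    """c(C): f(d(Delta)) summed over the active intervals of the whole cycle."""
    cost, run = 0, 0
    for is_active in flat + [False]:  # the sentinel closes a final interval
        if is_active:
            run += 1
        elif run > 0:
            cost += f(max(0, L - run, run - U))
            run = 0
    return cost

def simulate_weeks(weeks, L, U, f):
    """Reported cost of each week, deriving x_i and c_i from the other weeks."""
    reported, x = [], 0
    for i, week in enumerate(weeks):
        c = sum(len(w) for w in weeks[i + 1:])
        reported.append(lai_reported_cost_virtual(week, x, c, L, U, f))
        trailing = 0  # active time groups at the end of this week
        for is_active in reversed(week):
            if not is_active:
                break
            trailing += 1
        # x_{i+1}: the run continues from earlier weeks if the whole week is active
        x = x + trailing if trailing == len(week) else trailing
    return reported

weeks = [[True, True], [True, True], [False, True]]
identity = lambda d: d
print(simulate_weeks(weeks, 5, 5, identity))                            # [0, 0, 5]
print(whole_cycle_cost([t for w in weeks for t in w], 5, 5, identity))  # 5
```

Recomputing the intervals keeps the sketch short; an incremental structure,
as the text describes, would give the same lengths while updating them in
place as time groups change.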

Once again, when $L = 0$, $a_i$ and $c_i$ do not influence the values
of these formulas.  In fact, $a_i$ has no influence even when $L > 0$,
although it is worth recording, since it is an upper limit for $x_i$.
And when $a_i$, $x_i$, and $c_i$ are all 0, the formulas reduce to
what they would be without history.

Existing models vary in their treatment of sequences of free and
busy days at the ends of the cycle.  When mimicking such models,
artificial values for history may be useful.  For example, in
\cite{curtois2016}, minimum limits do not apply to sequences of
busy or free days that include the first or last day.  This can
be handled by assigning the value $L$ to $a_i$, $x_i$, and $c_i$ in
XESTT constraints that impose minimum (but not maximum) limits
on the lengths of these sequences.
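The effect of these artificial values can be illustrated with a
minimum-only version of the three-step extension.  In this Python sketch
(our own, with hypothetical names and data), a single instance covers the
whole cycle, there is no maximum limit, and $x_i$ and $c_i$ are both set to
$L$; $a_i$ is omitted because, as noted above, it does not affect the
formulas.  Busy sequences that include the first or last day then incur no
minimum-limit cost, while interior ones are still penalized.

```python
def min_only_cost(active, L, x, c, f):
    """Reported cost of a limit active intervals constraint that has a
    minimum limit L but no maximum limit, with history values x_i and c_i."""
    b = len(active)
    obsolete = f(max(0, L - x - b - c)) if x > 0 else 0
    lengths, run = [], x  # x virtual active days precede the first day
    for is_active in active:
        if is_active:
            run += 1
        elif run > 0:
            lengths.append(run)
            run = 0
    touches_end = run > 0
    if touches_end:
        lengths.append(run)
    cost = 0
    for k, length in enumerate(lengths):
        # c_i extends only the interval that includes the last day
        extra = c if touches_end and k == len(lengths) - 1 else 0
        cost += f(max(0, L - length - extra))
    return cost - obsolete

days = [True, False, True, False, False, True, True]  # hypothetical busy days
L = 3
identity = lambda d: d
print(min_only_cost(days, L, 0, 0, identity))  # 5: no history
print(min_only_cost(days, L, L, L, identity))  # 2: only the interior run is penalized
```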

Again, it is possible in principle to incorporate information about a
resource's future timetable.  If a future time group is known to
be inactive, the time groups beyond it cannot extend the interval at the
end of $w_i$, so $c_i$ may be reduced.  Active time groups known to
immediately follow $w_i$ may allow $l(\delta_q)$ to increase.  Once
again, the details are complex.
