man/engines.doc

\chapter{Coroutining using Prolog engines}
\label{sec:engines}

Where the term \jargon{coroutine} in Prolog typically refer to hooks
triggered by \jargon{attributed variables} (\secref{attvar}), SWI-Prolog
provides two other forms of coroutines. Delimited continuations (see
\secref{delcont}) allow creating coroutines that run in the same Prolog
engine by capturing and restarting the \jargon{continuation}. This
section discusses \jargon{engines}, also known as \jargon{interactors}.
The idea was pinned by Paul Tarau \cite{DBLP:conf/coordination/Tarau11}.
The API described in this chapter has been established together with
Paul Tarau and Paulo Moura.

Engines are closely related to \jargon{threads} (\secref{threads}). An
engine is a Prolog virtual machine that has its own stacks and (virtual)
machine state. Unlike normal Prolog threads though, they are not
associated with an operating system thread. Instead, you \jargon{ask} an
engine for a next answer (engine_next/2). Asking an engine for the next
answer attaches the engine to the calling operating system thread and
cause it to run until the engine calls engine_yield/1 or its associated
goal completes with an answer, failure or an exception. After the engine
yields or completes, it is detached from the operating system thread and
the answer term is made available to the calling thread. Communicating
with an engine is similar to communicating with a Prolog system though
the terminal. In this sense engines are related to \jargon{Pengines} as
provided by library \pllib{pengines}, but where Pengines aim primarily
at accessing Prolog engines through the network, engines are in-process
entities.

\section{Examples using engines}
\label{sec:engine-examples}

We introduce engines by describing application areas and providing
simple example programs. The predicates are defined in
\secref{engine-predicates}. We identify the following application areas
for engines.

\begin{enumerate}
    \item Aggregating solutions from one or more goals.  See
          \secref{engine-aggregation}.
    \item Access the terms produced in \jargon{forward execution}
          through backtracking without collecting all of them
	  first. \Secref{engine-aggregation} illustrates this as
	  well.
    \item State accumulation and sharing.  See \secref{engine-state}.
    \item Scalable many-agent applications.  See \secref{engine-agents}.
\end{enumerate}


\subsection{Aggregation using engines}
\label{sec:engine-aggregation}

Engines can be used to reason about solutions produced by a goal through
backtracking. In this scenario we create an engine with the goal we wish
to backtrack over and we enumerate all its solution using engine_next/1.
This usage scenario competes with the all solution predicates
(findall/3, bagof/3, etc.) and the predicates from library
\pllib{aggregate}. Below we implement findall/3 using engines.

\begin{code}
findall(Templ, Goal, List) :-
	setup_call_cleanup(
	    engine_create(Templ, Goal, E),
	    get_answers(E, List),
	    engine_destroy(E)).

get_answers(E, [H|T]) :-
	engine_next(E, H), !,
	get_answers(E, T).
get_answers(_, []).
\end{code}

The above is not a particularly attractive alternative for the built-in
findall/3. It is mostly slower due to time required to create and
destroy the engine as well as the (currently\footnote{The current
implementation of engines is built on top of primitives that are not
optimal for the engine use case. There is considerable opportunity to
reduce the overhead.}) higher overhead of copying terms between engines
than the overhead required by the dedicated representation used by
findall/3.

It gets more interesting if we wish to combine answers from multiple
backtracking predicates. Assume we have two predicates that, on
backtracking, return ordered solutions and we wish to merge the two
answer streams into a single ordered stream of answers. The solution in
classical Prolog is below. It collects both answer sets, merges them
using ordered set merging and extract the answers. The code is clean and
short, but it doesn't produce any answers before both generaters are
fully enumerated and it uses memory that is proportional to the combined
set of answers.

\begin{code}
:- meta_predicate merge(?,0, ?,0, -).

merge_answers(T1,G1, T2,G2, A) :-
	findall(T1, G1, L1),
	findall(T2, G2, L2),
	ord_union(L1, L2, Ordered),
	member(A, Ordered).
\end{code}

We can achieve the same using engines. We create two engines to generate
the solutions to both our generators. Now, we cas ask both for an
answer, put the smallest in the answer set and ask the engine that
created the smallest for its next answer, etc. This way we can create an
ordered list of answers as above, but now without creating intermediate
lists. We can avoid creating the intermediate list by introducing a
third engine that controls the two generators and \jargon{yields} the
answers rather than putting them in a list. This is a general example of
turning a predicate that builds a set of terms into a non-deterministic
predicate that produces the results on backtracking. The full code is
below. Merging the answers of two generators, each generating a set of
10,000 integers is currently about 20\% slower than the code above, but
the engine-based solution runs in constant space and generates the first
solution immediately after both our generators have produced their first
solution.


\begin{code}
:- meta_predicate merge(?,0, ?,0, -).

merge(T1,G1, T2,G2, A) :-
	engine_create(A, merge(T1,G1, T2,G2), E),
	repeat,
	    (	engine_next(E, A)
	    ->	true
	    ;	!, fail
	    ).

merge(T1,G1, T2,G2) :-
	engine_create(T1, G1, E1),
	engine_create(T2, G2, E2),
	(   engine_next(E1, S1)
	->  (   engine_next(E2, S2)
	    ->	order_solutions(S1, S2, E1, E2)
	    ;	yield_remaining(S1, E1)
	    )
	;   engine_next(E2, S2),
	    yield_remaining(S2, E2)
	).

order_solutions(S1, S2, E1, E2) :- !,
	(   S1 @=< S2
	->  engine_yield(S1),
	    (	engine_next(E1, S11)
	    ->	order_solutions(S11, S2, E1, E2)
	    ;	yield_remaining(S2, E2)
	    )
	;   engine_yield(S2),
	    (	engine_next(E2, S21)
	    ->	order_solutions(S1, S21, E1, E2)
	    ;	yield_remaining(S1, E1)
	    )
	).

yield_remaining(S, E) :-
	engine_yield(S),
	engine_next(E, S1),
	yield_remaining(S1, E).
\end{code}


\subsection{State accumulation using engines}
\label{sec:engine-state}

Applications that need to manage a state can do so by passing the state
around in an additional argument, storing it in a global variable or
update it in the dynamic database using assertz/1 and retract/1. Both
using an additional argument and a global variable (see b_setval/2),
make the state subject to backtracking. This may or may not be
desirable. If having a state is that subject to backtracking is
required, using an additional argument or backtrackable global variable
is the right approach. Otherwise, non-backtrackable global variables
(nb_setval/2) and dynamic database come into the picture, where global
variables are always local to a thread and the dynamic database may or
may not be shared between threads (see thread_local/1).

Engines bring an alternative that packages a state inside the engine
where it is typically represented in a (threaded) Prolog variable. The
state may be updated, while controlled backtracking to a previous state
belongs to the possibilities. It can be accessed and updated by anyone
with access to the engines' handle. Using an engine to represent state
has the following advantages:

\begin{itemize}
   \item The programming style needed inside the engine is much more
   `Prolog friendly', using engine_fetch/1 to read a request and
   engine_yield/1 to reply to it.
   \item The state is packaged and subject to (atom) garbage
   collection.
   \item The state may be accessed from multiple threads.  Access
   to the state is synchronized without the need for explicit locks.
\end{itemize}

The example below implements a shared priority heap based on library
\pllib{heaps}. The predicate \nopredref{update_heap}{1} shows the
typical update loop for maintaining state inside an engine: fetch a
command, update the state, yield with the reply and call the updater
recursively. The update step is guarded against failure. For robustness
one may also guard it against exceptions using catch/3. Note that
\nopredref{heap_get}{2} passes the \arg{Priority} and \arg{Key} it wishes
to delete from the heap such that if the unification fails, the heap
remains unchanged.

The resulting heap is a global object with either a named or anonymous
handle that evolves independently from the Prolog thread(s) that access
it. If the heap is anonymous, it is subject to (atom) garbage
collection.

\begin{code}
:- use_module(library(heaps)).

create_heap(E) :-
	empty_heap(H),
	engine_create(_, update_heap(H), E).

update_heap(H) :-
	engine_fetch(Command),
	(   update_heap(Command, Reply, H, H1)
	->  true
	;   H1 = H,
	    Reply = false
	),
	engine_yield(Reply),
	update_heap(H1).

update_heap(add(Priority, Key), true, H0, H) :-
	add_to_heap(H0, Priority, Key, H).
update_heap(get(Priority, Key), Priority-Key, H0, H) :-
	get_from_heap(H0, Priority, Key, H).

heap_add(Priority, Key, E) :-
	engine_post(E, add(Priority, Key), true).

heap_get(Priority, Key, E) :-
	engine_post(E, get(Priority, Key), Priority-Key).
\end{code}


\subsection{Scalable many-agent applications}
\label{sec:engine-agents}

The final application area we touch are agent systems were we wish to
capture an agent in a Prolog goal. Such systems can be implemented using
threads (see \secref{threads}) that use thread_send_message/2 and
thread_get_message/1 to communicate. The main problem is that each
thread is associated by an operating system thread. OS threads are,
depending on the OS, relatively expensive. Scalability of this design
typically ends, depending on OS and hardware, somewhere between 1,000
and 100,000 agents.

Engines provide an alternative. A detached Prolog engine currently
requires approximately 20~Kbytes memory on 64~bit hardware, growing
with the size of the Prolog stacks. The Prolog stacks may be minimised
by calling garbage_collect/0 followed by trim_stacks/0, providing a
\jargon{deep sleep} mode. The set of agents, each represented by
an engine can be controlled by a static or dynamic pool of threads.
Scheduling the execution of agents and their communication is completely
open and can be optimised to satisfy the requirements of the
application.

\begin{quote}
This section needs an example. Preferably something that fits on one
page and would not scale using threads. Engines might work nice to
implement \textit{Antrank: An ant colony algorithm for ranking web
pages}.\footnote{\url{http://www.ijettcs.org/Volume3Issue2/IJETTCS-2014-04-23-113.pdf}}
\end{quote}


\section{Engine resource usage}
\label{sec:engine-resources}

A Prolog engine consists of a virtual machine state that includes the
Prolog stacks. An `empty' engine requires aout 20~KBytes of memory. This
grows when the engine requires additional stack space. Anonymous engines
are subject to atom garbage collection (see garbage_collect_atoms/0).
Engines may be reclaimed immediately using engine_destroy/1. Calling
engine_destroy/1 destroys the virtual machine state, while the handle
itself is left to atom garbage collection. The virtual machine is
reclaimed as soon as an engine produced its last result, failed or
raised an exception. This implies that it is only advantageous to call
engine_destroy/1 explicitly if you are not interested in further
answers.

Engines that are expected to be left in inactive state for a prelonged
time can be minimized by calling garbage_collect/0 and trim_stacks/0
(in that order) before calling engine_yield/1 or succeeding.


\section{Engine predicate reference}
\label{sec:engine-predicates}

This section documents the built-in predicates that deal with engines.
In addition to these, most predicates dealing with threads and message
queue can be used to access engines.

\begin{description}
    \predicate[det]{engine_create}{3}{+Template, :Goal, ?Engine}
\nodescription
    \predicate[det]{engine_create}{4}{+Template, :Goal, -Engine, +Options}
Create a new engine and unify \arg{Engine} with a handle to it.
\arg{Template} and \arg{Goal} form a pair similar to findall/3: the
instantiation of \arg{Template} becomes available though engine_next/2
after \arg{Goal} succeeds.  \arg{Options} is a list of the following
options.  See thread_create/3 for details.
    \begin{description}
	\termitem{alias}{+Name}
    Give the engine a name.  \arg{Name} must be an atom.  If this
    option is provided, \arg{Engine} is unified with \arg{Name}.
    The name space for engines is shared with threads and mutexes.
        \termitem{stack}{+Bytes}
Set the stack limit for the engine.  The default is inherited from
the calling thread.
    \end{description}
The \arg{Engine} argument of engine_create/3 may be instantiated to an
atom, creating an engine with the given alias.

    \predicate[det]{engine_destroy}{1}{+Engine}
Destroy \arg{Engine}.

    \predicate[semidet]{engine_next}{2}{+Engine, -Term}
Ask the engine \arg{Engine} to produce a next answer.  On this first
call on a specific engine, the \arg{Goal} of the engine is started.
If a previous call returned an answer through completion, this causes
the engine to backtrack and finally, if the engine produces a previous
result using engine_yield/1, execution proceeds after the engine_yield/1
call.

    \predicate[det]{engine_next_reified}{2}{+Engine, -Term}
Similar to engine_next/2, but instead of success, failure or or raising
an exception, \arg{Term} is unified with one of terms below. This
predicate is provided primarily for compatibility with Lean~Prolog.

    \begin{description}
        \termitem{the}{Answer}
    Goal succeeded with \arg{Template} bound to \arg{Answer} or
    Goal yielded with a term \arg{Answer}.
	\termitem{no}{}
    Goal failed.
	\termitem{exception}{Exception}
    Goal raises the error \arg{Exception}.
    \end{description}

    \predicate[det]{engine_post}{2}{+Engine, +Term}
Make \arg{Term} available to engine_fetch/1 inside the \arg{Engine}.
This call must be followed by a call to engine_next/2 and the engine
must call engine_fetch/1.

    \predicate[det]{engine_post}{3}{+Engine, +Term, -Reply}
Combines engine_post/2 and engine_next/2.

    \predicate[det]{engine_yield}{1}{+Term}
Called from within the engine, causing engine_next/2 in the caller to
return with \arg{Term}. A subsequent call to engine_next/2 causes
engine_yield/1 to `return'. This predicate can only be called if the
engine is not involved in a callback from C, i.e., when the engine calls
a predicate defined in C that calls back Prolog it is not possible to
use this predicate. Trying to do so results in a
\const{permission_error} exception.

    \predicate[det]{engine_fetch}{1}{-Term}
Called from within the engine to fetch the term made available through
engine_post/2 or engine_post/3.  If no term is available an
existence_error exception is raised.

    \predicate[det]{engine_self}{1}{-Engine}
Called from within the engine to get access to the handle to the engine
itself.

    \predicate[semidet]{is_engine}{1}{@Term}
True if \arg{Term} is a reference to or the alias name of an existing
engine.

    \predicate[nondet]{current_engine}{1}{-Engine}
True when \arg{Engine} is an existing engine.
\end{description}