C/C++ parser for the CSP XML format version 2.1

(c) 2008 olivier.roussel at cril.univ-artois.fr

##########################################################################
#WARNING: 
# Many things have changed in this parser for format XCSP 2.1 in
# contrast with the previous parser for format XCSP 2.0. See the file
# CHANGES for details.
#
# Please read the CHANGES file which contain important informations
# which are no detailed in this file.
#
##########################################################################


The files in this archive containt a C++ parser for the XML format of
CSP/WCSP instances named XCSP 2.1. A C interface to the C++ parser is
also provided.

This parser uses the libxml2 library which must be installed on your
system

See http://xmlsoft.org/

This library is probably installed on most Linux platforms. Note that
you'll need both the libxml2 and the libxml2-devel packages to compile
your program.

The proposed parser reads the XML file in SAX mode. That means that
each time an XML tag is read, the libxml2 library calls a function of
the parser. In turn, the parser will decode the XML information and
call a function of the solver to transmit the informations that were
read.

The advantage of the SAX mode is that it doesn't use much memory. Its
drawback is that it forces to read the information in the order of
the XML file. In contrast, the DOM model would allow reading the
informations in any order but would require to store the whole XML
file in memory before the solver can grab any information.


Here's how the parser interfaces with the solver. The author of a
solver writes an interface (a set of functions) which are called by
the parser when it decodes a piece of information from the XML file
(this is a callback). These interface functions are in charge of
populating the data structures used by the solver. They don't have to
cope with the XML details (only the parser does). This is in fact an
instance of the Builder Design Pattern
(http://www.developer.com/design/article.php/1502691)

The programmer simply calls the parser with the file to read and with
a reference to the callback functions to use. The parser reads the
file, decodes the information and calls the callback functions that
create the solver data structures. Once the parser is done, the solver
can start its quest of a solution.

Let's take an example to give a clear idea of how it can work:

1) the solver calls the parser

2) the parser reads the XML file, chunk by chunk

3) when the parser reads the <instance> tag, it calls the solver
function beginInstance() with the name of the instance

4) when the parser reads the <domains> tag, it calls
beginDomainsSection() and announces the number of domains definition
to come

5) when the parser reads a <domain> tag, it calls
beginDomain(Name,nbValue) to indicate that the next calls will concern
domain Name and announces that it contains nbValue. The solver must
create a new domain named Name.

6) for each value v in this domain, it calls addDomainValue(v). The
solver is in charge of recording this value in the data structure
created for the last domain.

7) when all values in the domain have been transmistted to the solver,
the parser calls endDomain()

8) when all domains are read, the parser calls endDomainsSection()

and so on...

One important point is the way the parser communicates an expression
(in the definition of a predicate) to the solver.  As different
solvers may need different representations of the expressions, the
parser is in charge of reading the expression and translating it to a
number of different formats.

This means that the solver may ask to receive a string that represents
a C expression, or a string that represents an expression in prefix
notation, or postfix notation. The solver may even ask to receive the
expression as an abstract syntax tree (AST). This AST is a tree of
nodes, each node representing either a variable, or a constant, or an
operator, or a list of nodes, or a dictionary of nodes,... See the
file CHANGES for a brief description of these types.

In short, the parser decodes the XML file, create an internal
representation of the expression as an asbtract syntax tree and
provide the solver with the most convenient representation.

Whenever an alphanumeric identifier is read (name of a domain,
variable, relation,...), a corresponding numerical identifier is
assigned by the parser. These numerical IDs start from 0 and are
assigned separately for domains, variables, relations.... Whenever the
parser communicates an alphanumeric identifier to the solver, it also
provides the solver with the numerical identifier.

In the example callbacks, this is displayed as VARNAME/NUMERICALID or
PARAMETERNAME/$POSITION


#
# Example in C++
#

The directory examples/C++ contains a small example demonstrating the
use of the parser in C++.

#
# Example in C
#

The directory examples/C contains a small example demonstrating the
one possible use of the parser in C.

Actually, there are two ways to use the parser in C.

1) The first method is to use the C interface as demonstrated in
   examples/C. This approach is not the best one because the C++
   parser has to create a copy of the data in structures which can be
   handled in C. Clearly this approach is not optimal, unless your
   solver adopts the C data structures generated by the parser.


2) The second method is to use the C++ interface of the parser dans
   call your C function from the C++ callback. This is the best
   approach when your solver has its own data structures.

All C++ data structure created by the parser are read-only and are
owned by the parser (i.e. the parser deallocates them as soon as they
are useless).

All C data structure created by the parser are allocated by malloc and
are own by your solver. The C++ parser will never deallocate them.


#
# Tools
#

The tools directory contains two programs

CSP-verifier.cc 

  This is the source of the verifier program that will be used during
  the competition.

  The basic use of the verifier program is to pipe the solution given
  by the solver to the verifier program with a single command line
  parameter which is the name of the instance file. Example:

  cat solverOutput | CSP-verifier instance.xml

  The verifier will output OK if the solution satisfies all
  constraints. It will print ERROR with the first unsatisfied
  constraint otherwise.

CSPInstanceInformation.cc 

  This is a program which collects statistics about an instance.

 
#
# Inner working of the parser
#

In a few words, the class XMLParser_libxml2<> is the main class, which
contains the parse() method. It has a template which is the type of
the callback class to use. By default, this template argument is
CSPParserCallback which defines the virtual methods which the parser
will call to provide the solver with the decoded information. If you
think that virtual functions are evil, you can instanciate
XMLParser_libxml2<Callback> with a callback class that doesn't use
virtual functions (but the parser still uses virtual methods in a
number of other classes).

XMLParser_libxml2 is mainly an interface between the libxml2 library
and the XMLParser class which contains the actual parser.