Xenofarm

Contents

  1. This is Xenofarm
  2. The Grand Design of Everything
  3. The Server Script
    3.1. Obtaining the Source
    3.2. Creating a Build Package
    3.3. Build Package Format
    3.4. Invoking the Generic Server Script
    3.5. Example Setup for the Generic Server Script
  4. The Web Export
  5. The Compilation Client
  6. The Web Import
  7. The Result Parser
    7.1. Format of a Result Package
    7.2. Invoking the Generic Result Parser
  8. The Result Interface
  9. Garbage Collection
 10. Creating your own Project


1. This is Xenofarm

The Xenofarm project is a rather small project, if measured in code. This is because the wonders of modern computer engineering let us reuse components to build advanced applications with ease, letting us programmers focus on important tasks such as design and quality. Did you buy that? Well, normally the world isn't as perfect as we would like it to be, but in this case it is, since Xenofarm is just some glue holding together other components.

The goal of Xenofarm is to aid programmers who want to build extremely platform-independent applications. Most programmers would say, "If I want to do something platform independent I use Pike/Python/Perl/Java/LOGO", but there are still some guys struggling to get these programming languages to work on every strange OS on every strange platform. It is for them (us) that this tool is built.

From a historical point of view, Xenofarm is a complete rewrite of the automated build and regression test tool that was used by the Pike programming language development team between 1999 and 2002. It was called AutoBuild and was created by Martin Nilsson and Johan Schön at Roxen Internet Software. AutoBuild was in turn derived from (or rather inspired by) Tinderbox, a similar system used by the Mozilla development team. Xenofarm still sticks to the basic AutoBuild concept, but was redesigned to get rid of the few annoying flaws and shortcomings of its predecessor. Most notably, the system components have been more loosely coupled, and the system should scale to X different clients performing Y different projects for Z different servers.

SH-client by Peter Bortas
HTTP PUT  by Per Hedbor
The rest  by Martin Nilsson

Substantial contributors:

Per Cederqvist
Johan Sundström
Anders Qvist

Xenofarm is released under GNU GPL.

2. The Grand Design of Everything

The basic idea with Xenofarm is to help developers solve bugs, as compared with the Mozilla project's Tinderbox, where the goal is to find the bugs as quickly as possible. We believe that it is better to get "the full picture" of what is broken, and as much information as possible, before starting to fix the code. There are essentially three ways in which Xenofarm differs from Tinderbox in this respect:

  1. Xenofarm synchronizes its builds on all platforms. It creates a build package, containing the source code and additional data needed to build it, and distributes it to all clients, and the build results are presented as a group. This makes it easy to see whether e.g. all Sparc computers failed or all Solaris computers failed, which simplifies the bug hunt.

  2. Xenofarm applies CVS checkout latencies, to try to minimize mid-checkin breaks. The latency is currently based on a simple threshold value, but a more advanced solution is in the works.

  3. "All" relevant information is collected from the build machines to the front end. This enables people with no login access to the computers to help in the bug hunt.

Another design goal with Xenofarm was to scale better than AutoBuild did: in terms of number of clients, number of projects, number of source locations and distance. Conceptually the system has a number of build package producers and a number of compilation clients. Each compilation client then decides which packages it wants to "subscribe" to. The workflow is as follows:

  1. Some trigger mechanism signals that it is time to make a new build. The trigger is typically a CVS checkin.
  2. The server script creates a new build package and exports it through a web server.
  3. The client polls the web server and finds a new package, which it downloads.
  4. The client builds/tests/whatever the package and uploads the result to a web server. Typically the same web server as it downloaded the build package from.
  5. A result processor takes the uploaded result package and derives interesting facts, which it stores in a database. The result files are moved into an exported web directory.
  6. The result summary and all the exported files can be viewed from a web interface.

3. The Server Script

The server script is responsible for creating new build packages when triggered, and for placing them in a directory where the web export picks them up. Since the composition of a build package varies a lot from project to project, we have put all the base functionality into one Pike program file, which is then inherited by "project programs" that contain the logic specific to each project. The rationale, of course, is that programming languages are better suited for this type of configuration task than any artificial configuration file language one might come up with.

The base program is, however, useful on its own to create a "generic" build package from a CVS archive. The rest of this section talks about the generic server script; it may not apply to derived scripts such as projects/pike/server.pike.

Before you run the base server for the first time, you must create a work directory for it. This directory should be located on the same filesystem as the web directory (see --web-dir). The base server will store state files within a subdirectory called "state", and temporary files within "tmp". It also requires a copy of the source; see below. It will not use other file names in the work directory, so you are free to use other files yourself.

3.1. Obtaining the Source

The generic server polls the CVS repository by running "cvs -q update" in a working copy of the source. If anything has been changed, it will create a new build package.

You must perform the initial checkout manually, before running server.pike. The checkout must be performed in the work directory of server.pike (see --work-dir). If you want to build from a branch, just use the "-r" option of "cvs co" to create a sticky tag in the working copy.

The name of the checked out module must match the --cvs-module argument you give to server.pike.

3.2. Creating a Build Package

The working copy of the source must be transformed to a build package that the clients can download. The input to this transformation is the working copy of the source. The result of the transformation should be a gzipped tar file with certain properties (see "Build package format").

The generic server has built-in support for a very simple transformation: it simply runs

   echo $buildid > $working/buildid.txt
   tar cf $name.tar $working
   rm $working/buildid.txt
   gzip $name.tar

This only works if the working copy fulfills the requirements defined in "Build package format". For projects that are not adapted to Xenofarm, this is most often not the case.

You can specify an external program via the --transformer argument. That program will receive three arguments: the name of the directory that holds the working copy of the source, the name of the resulting build package that it should create, and the build id. The ".tar.gz" suffix is not included in the name argument. This small shell script is equivalent to the default built-in transformation:

    #!/bin/sh
    working=$1
    name=$2
    buildid=$3
    echo $buildid > $working/buildid.txt
    tar cf $name.tar $working || exit 1
    rm $working/buildid.txt
    gzip $name.tar || exit 1
    exit 0

The transform script can be written in any language; you don't need to know Pike to write it. If you want to use Pike, you might consider inheriting from server.pike and writing your own transform_source() function. Doing so will be slightly more efficient and enables you to directly access all variables in the server program. See projects/pike/server.pike for an example of what a program that inherits the server might look like.

The transform script should take care not to modify anything within $working. If you need to create files in the distribution, do so in a copy. Assuming you have GNU cp, you could write a script along these lines:

    #!/bin/sh
    working=$1
    name=$2
    buildid=$3
    rm -rf copy
    cp -a $working copy || exit 1
    echo $buildid > copy/buildid.txt
    (cd copy && autoconf) || exit 1
    cat << EOF > copy/makefile
    xenobuild:
    	./configure
    	make -f Makefile
    EOF
    tar cf $name.tar copy || exit 1
    gzip $name.tar || exit 1
    exit 0

See projects/lyskom-server/source-transform.sh for a more complex example of a transformation script.
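Since the transformer may be written in any language, here is a minimal sketch of the default transformation written in Python instead of sh. It is an illustration only and is not part of Xenofarm. The file name transformer.py is hypothetical. Unlike the shell versions above, it adds buildid.txt to the archive from memory, so the working copy is never touched. It also stores the files under the base name of the working directory rather than under the path as given.

```python
#!/usr/bin/env python3
"""Hypothetical --transformer written in Python (illustrative sketch).

Invoked as: transformer.py <working> <name> <buildid>
Creates <name>.tar.gz with the working copy under one top-level
directory plus a generated buildid.txt, leaving <working> unmodified.
"""
import io
import os
import sys
import tarfile
import time


def transform(working: str, name: str, buildid: str) -> str:
    # The top-level directory in the archive is the working copy's base name.
    top = os.path.basename(os.path.normpath(working))
    out = name + ".tar.gz"
    data = (buildid + "\n").encode()  # same content as `echo $buildid`
    with tarfile.open(out, "w:gz") as tar:
        tar.add(working, arcname=top)  # recursive by default
        # Add buildid.txt from memory instead of writing it into $working.
        info = tarfile.TarInfo(name=f"{top}/buildid.txt")
        info.size = len(data)
        info.mtime = int(time.time())
        tar.addfile(info, io.BytesIO(data))
    return out


if __name__ == "__main__" and len(sys.argv) > 1:
    if len(sys.argv) != 4:
        sys.exit("usage: transformer.py <working> <name> <buildid>")
    try:
        transform(*sys.argv[1:])
    except (OSError, tarfile.TarError) as err:
        # A non-zero exit status signals failure, like `exit 1` in sh.
        sys.exit(f"transform failed: {err}")
```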

3.3. Build Package Format

The build package must:

  • unpack into a single top-level directory.
  • contain a buildid.txt file in that top-level directory that identifies the build. If an unmodified result parser is to be used, the buildid.txt file must conform to the description in section 7.1.
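A quick way to check that a transformer produces a conforming package is to inspect the archive before publishing it. The following Python sketch verifies the two rules above and returns the build id from the first line of buildid.txt. The function name check_build_package is our own and is not part of Xenofarm. The sketch assumes that member names are plain relative paths that do not start with "./".

```python
import tarfile


def check_build_package(path: str) -> str:
    """Return the build id if `path` looks like a valid build package.

    Hypothetical helper; raises ValueError when a rule is violated.
    """
    with tarfile.open(path, "r:gz") as tar:
        names = [m.name.rstrip("/") for m in tar.getmembers()]
        # Rule 1: everything must live under one top-level directory.
        tops = {n.split("/", 1)[0] for n in names}
        if len(tops) != 1:
            raise ValueError(f"expected one top-level directory, got {sorted(tops)}")
        top = tops.pop()
        # Rule 2: buildid.txt must exist directly in that directory.
        member = f"{top}/buildid.txt"
        if member not in names:
            raise ValueError(f"{member} is missing")
        text = tar.extractfile(member).read().decode()
    lines = text.splitlines()
    if not lines or not lines[0].strip():
        raise ValueError("buildid.txt is empty")
    return lines[0].strip()
```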

3.4. Invoking the Generic Server Script

The generic server program is started with

  ./server.pike <arguments> <project>

where arguments are from the list below and project is the name of the xenofarm project. You need one server per project. Many of the settings have hard coded defaults, but in some cases that was not possible, so some arguments are mandatory. These are cvs-module, db, web-dir and work-dir.

Arguments:

--cvs-module=<name>

The CVS module the server should use. This argument is mandatory.

--db=<database url>

The database URL, eg. mysql://localhost/xenofarm. This argument is mandatory.

--force

Make a new build and exit, regardless of whether there has been a new checkin or whether we are within the min-distance quarantine or checkin latency.

--help

Displays a summary of the available arguments and their use.

--latency=<seconds>

The enforced latency between the latest checkin and when the next package is made. Defaults to 300 seconds (5 minutes).

--min-distance=<seconds>

The enforced minimum distance in time between two builds. Defaults to 7200 seconds (two hours).

--poll=<seconds>

How often the CVS repository is queried for new checkins, not counting the periods when the server is enforcing the minimum build distance or the build latency. Defaults to 60 seconds.

--repository=<path>

The CVS repository the server should use. This string is given as the -d argument to the CVS binary. (This value is not currently used by the generic server.pike script. Derived versions may find it useful.)

--transformer=<file path>

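Program that transforms the working copy of the source into a build package (see section 3.2). If omitted, the built-in transformation is used.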

--update-opts=<options>

CVS options to append to "cvs -q update". Defaults to "-Pd" but using "--update-opts=-d" could also make sense. Giving multiple --update-opts adds arguments to the command line, in that order.

--verbose

Send messages about everything that happens to stdout. It might be a good idea to start the server with ./server.pike <args> --verbose <project> > server.log &

--web-dir=<path>

Where the outgoing build packages should be put. This argument is mandatory.

--work-dir=<path>

Where state files and temporary files for the server script should be put. This argument is mandatory.
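The interplay of --latency, --min-distance and --force can be summarized as a single decision taken on every poll. This Python sketch reflects our reading of the descriptions above and is not the actual logic in server.pike. In particular, how server.pike tracks the time of the latest checkin is an assumption.

```python
from typing import Optional


def should_build(now: float, last_checkin: Optional[float],
                 last_build: Optional[float], *, latency: float = 300,
                 min_distance: float = 7200, force: bool = False) -> bool:
    """Decide whether to make a new build package at time `now`.

    All times are POSIX seconds. `last_checkin` is the time of the newest
    unpackaged change (None if nothing changed) and `last_build` is when
    the previous package was made (None if there is none). Sketch only.
    """
    if force:
        return True   # --force ignores all checks
    if last_checkin is None:
        return False  # nothing new in CVS
    if now - last_checkin < latency:
        return False  # a multi-file checkin may still be in progress
    if last_build is not None and now - last_build < min_distance:
        return False  # still inside the min-distance quarantine
    return True
```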

3.5. Example Setup for the Generic Server Script

Let's say that you want to set up a Xenofarm for Python. The first step is to create a work directory for the generic server script, and perform an initial checkout of the Python sources. Like this:

  $ mkdir -p /some/path/pythonworkdir
  $ cd /some/path/pythonworkdir
  $ cvs -z3 -d:pserver:anonymous@cvs.python.sourceforge.net:/cvsroot/python \
	 co python
  $

You also need a web export directory (more about this below).

  $ mkdir -p /some/path/pythonexport

Finally, you need to create a database. Assume that the user "linus" will do the database maintenance and run server.pike and result_parser.pike, that the Roxen modules will run as user "www", and that there is a MySQL user "root" that is privileged to grant privileges to others. Under these conditions, these commands will set up the privileges properly for a database called "python_xenofarm":

  $ mysql -u root -p
  mysql> create database python_xenofarm;
  mysql> grant select, insert, update, delete, create, drop, index, alter
      -> on python_xenofarm.*
      -> to linus@localhost;
  mysql> grant select, insert, update, delete
      -> on python_xenofarm.*
      -> to www@localhost;
  mysql> exit

And then populate it with a few tables:

  $ mysql -D python_xenofarm -u linus -p < tables.sql

Once all this is in place, you can run the generic script like this:

  $ ./server.pike --db=mysql://dbuser:dbpassword@/python_xenofarm \
	--cvs-module=python \
	--web-dir=/some/path/pythonexport/ \
	--work-dir=/some/path/pythonworkdir/ \
	python

The server will run forever and create new build packages within /some/path/pythonexport/ whenever anything has changed.

4. The Web Export

The Xenofarm client fetches new build packages from a specific web server URL. From that URL it expects to get a file named snapshot.tar.gz. It also expects the Last-Modified header to change to a later date and time when a new build package is made available.

The Roxen WebServer module "Xenofarm I/O module", found in roxen_modules/xenofarm_fs.pike, is mounted as a file system module in the web server's virtual file system. By default it is mounted at the path /xenofarm. The module exports, at its mount point, the build packages placed in the directory entered in its "Dist search path" setting.

The module expects all build packages to be named XXXX-YYYYMMDD-hhmmss.tar.gz, where XXXX is the name of the project, e.g. Pike7.3-20020509-203344.tar.gz. These files are directly accessible from the mount point, e.g. /xenofarm/Pike7.3-20020509-203344.tar.gz in this case. The latest version is exported through the stable URL <mountpoint>/latest, e.g. /xenofarm/latest in this case. This in turn returns a redirect to a snapshot.tar.gz in a subdirectory named after the latest source distribution. In this case /xenofarm/latest would redirect to /xenofarm/Pike7.3-20020509-203344/snapshot.tar.gz. The Last-Modified header is derived from the file name, not from the actual mtime of the file, so the system is robust and handles dirty hacks that move the files between systems.
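Because the Last-Modified header is derived from the file name, the mapping from a package name to its header value and to the "latest" redirect target is a pure function of the name. The Python sketch below illustrates this. It is not the Roxen module's code. It assumes that the time stamp in the name is in UTC, which this document does not specify.

```python
import re
from datetime import datetime, timezone
from email.utils import format_datetime

PACKAGE_RE = re.compile(r"^(?P<project>.+)-(?P<stamp>\d{8}-\d{6})\.tar\.gz$")


def package_info(filename: str) -> dict:
    """Split a name like 'Pike7.3-20020509-203344.tar.gz' into its parts."""
    m = PACKAGE_RE.match(filename)
    if m is None:
        raise ValueError(f"not a build package name: {filename}")
    # Assumption: the time stamp in the file name is UTC.
    when = datetime.strptime(m["stamp"], "%Y%m%d-%H%M%S").replace(tzinfo=timezone.utc)
    base = filename[: -len(".tar.gz")]
    return {
        "project": m["project"],
        "last_modified": format_datetime(when, usegmt=True),  # HTTP-date
        "latest_target": f"{base}/snapshot.tar.gz",  # relative to the mount point
    }


def latest(filenames):
    """Newest package by the time stamp in its name, or None if there is none."""
    names = [n for n in filenames if PACKAGE_RE.match(n)]
    if not names:
        return None
    return max(names, key=lambda n: PACKAGE_RE.match(n)["stamp"])
```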

The reason to put more than one file in the outgoing directory is to enable more redirects than just the "latest" one. It might be interesting to export the most successful one, e.g. the one with the highest number of successful builds, as a service for people who are not interested in the build project but in the source code itself. In this way Xenofarm might act as an automatic QA for people who want an almost CVS-fresh version while still minimizing the risks of upgrading.

It could also be interesting to provide a "latest interesting" redirect, where interesting means interesting from a debugging point of view. Some of the machines that participate in a Xenofarm "cluster" could be so slow that it might take several normal build cycles for them to complete one. In these cases it would be a big loss if these machines took the latest CVS snapshot and spent several hours compiling, only to find a trivial typo that a faster machine detected in minutes. A CVS snapshot containing only failing builds is likely to fail on these machines as well, while on a snapshot where at least one machine succeeded and several failed, every extra result will aid programmers in solving the problem.
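Both redirect policies above boil down to choosing one package from the per-package results. The following is a minimal Python sketch. It assumes the results are available as a hypothetical mapping from package name to a list of PASS/WARN/FAIL statuses. It also assumes that the package names of a single project sort chronologically, which holds for the XXXX-YYYYMMDD-hhmmss.tar.gz scheme.

```python
from typing import Dict, List, Optional


def most_successful(results: Dict[str, List[str]]) -> str:
    """Package with the most PASS results; ties go to the newest name."""
    return max(results, key=lambda name: (results[name].count("PASS"), name))


def latest_interesting(results: Dict[str, List[str]]) -> Optional[str]:
    """Newest package on which at least one client succeeded.

    Packages where every client failed are skipped, since a slow
    machine would most likely just fail on them as well.
    """
    candidates = [name for name, statuses in results.items() if "PASS" in statuses]
    return max(candidates) if candidates else None
```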

It is of course possible to use a less complex solution for the distribution. Just put the files in a normal directory, add a symlink from snapshot.tar.gz to the latest build package, and ensure that the web server handles the Last-Modified header correctly when you update your files. An even simpler solution is to just replace snapshot.tar.gz every time a new build is made.

5. The Compilation Client

The compilation client is the collective name of the software running on the client side, which takes care of downloading, compilation, data gathering and result submission. The compilation client is written as sh scripts that mostly use standard Unix components. The non-standard components required are wget, for downloading build packages, and a little C utility that performs HTTP PUT, used for uploading the result. There are two reasons why we have taken the trouble to write the client this way:

  1. Although Pike is very portable, sh and wget are almost always already present. This choice of language will enable even more platforms to become compilation clients.

  2. Since a lot of files in the Pike source code are generated by Pike scripts, it is good to ensure that the clients do not have any easily accessible Pike installed. This reveals whether any make rules in the build package are broken so that some files are not generated.

We have tried to keep the interface between the build package and the client as simple as possible. The package should be a standard tar.gz file, which when unpacked should result in a single directory with all the files and subdirectories of the project in it. Once the package is unpacked, the client will enter the directory and execute the command defined in the client's configuration file. It expects this to result in a result.tar.gz file. This file must be a standard tar.gz file, since the client repackages the result and inserts the file machineid.txt.
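The repackaging step can be sketched in Python with the standard tarfile module. This illustrates the idea and is not the client's sh implementation. The function name repackage is hypothetical. Dropping a machineid.txt that is already present in the package is our assumption about how a clash should be handled.

```python
import io
import tarfile


def repackage(result_path: str, out_path: str, machineid_text: str) -> None:
    """Copy every member of result_path into out_path and add machineid.txt."""
    data = machineid_text.encode()
    with tarfile.open(result_path, "r:gz") as src, \
         tarfile.open(out_path, "w:gz") as dst:
        for member in src.getmembers():
            if member.name == "machineid.txt":
                continue  # assumption: the client's generated copy wins
            # Only regular files carry data; other members are copied as headers.
            fileobj = src.extractfile(member) if member.isfile() else None
            dst.addfile(member, fileobj)
        info = tarfile.TarInfo("machineid.txt")
        info.size = len(data)
        dst.addfile(info, io.BytesIO(data))
```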

If the command fails the client will create a result.tar.gz file of its own, containing at least three files:

xenofarmclient.txt

This file contains a log of all output from running the build command in the package. This file should only be produced by the client script when the execution command fails.

machineid.txt

This file contains a number of values of the form "identifier: value" that identify the build machine and its environment. Currently it contains the following pairs:

   sysname: <output from "uname -s">
   release: <output from "uname -r">
   version: <output from "uname -v">
   machine: <output from "uname -m">
   nodename: <output from "uname -n">
   testname: <the name of the test as specified in the config file>
   command: <the command line for the test as specified in the
             config file>
   clientversion: <the output from "client.sh --version">
   putversion: <the output from "put --version">
   contact: <an email address to the person maintaining the client
             instance as specified in config/contact.txt>

New identifiers might be added in new versions of the client. Unknown identifiers should not cause a fatal error in the result parser. They are available in the result mapping in the result_parser.pike program as any other identifier-value pair.
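The uname-derived part of machineid.txt maps directly onto Python's platform.uname(). The sketch below assembles the file in the documented order. It is not the client's code. The client and put version strings are passed in by the caller rather than obtained from client.sh --version and put --version.

```python
import platform


def machineid(testname: str, command: str, contact: str,
              clientversion: str = "", putversion: str = "") -> str:
    """Build machineid.txt contents as 'identifier: value' lines (sketch)."""
    u = platform.uname()  # system/node/release/version/machine, like uname -s/-n/-r/-v/-m
    pairs = [
        ("sysname", u.system),
        ("release", u.release),
        ("version", u.version),
        ("machine", u.machine),
        ("nodename", u.node),
        ("testname", testname),
        ("command", command),
        ("clientversion", clientversion),
        ("putversion", putversion),
        ("contact", contact),
    ]
    return "".join(f"{key}: {value}\n" for key, value in pairs)
```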

buildid.txt

This file is a copy of the buildid.txt file in the build package.

For more information about the compilation client, see the README in the client directory.

6. The Web Import

When the client has built its result file, it is uploaded through HTTP PUT to a URL specified in the client configuration file. The "Xenofarm I/O module" can act as the receiving end and store the results in the "Result search path" directory. The module accepts all files that are uploaded to "result" in its mount point. Given the default mount point /xenofarm, the default result URL path is /xenofarm/result. The uploaded files are named "res" followed by the POSIX time, an underscore, a counter incremented for every upload since the module started, and finally ".tar.gz". A file name might look like res1021228036_31.tar.gz.
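The naming scheme and the upload itself are easy to reproduce. The Python sketch below is neither the Roxen module nor the put utility. ResultNamer and put_result are hypothetical names. Starting the counter at 1 is an assumption, since the document only says that it is incremented for every upload.

```python
import itertools
import time
import urllib.request
from typing import Optional


class ResultNamer:
    """Generate names like res1021228036_31.tar.gz (hypothetical helper)."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)  # assumption: first upload gets 1

    def next_name(self, now: Optional[float] = None) -> str:
        t = int(time.time() if now is None else now)
        return f"res{t}_{next(self._counter)}.tar.gz"


def put_result(url: str, path: str) -> int:
    """Upload a result package with HTTP PUT and return the status code."""
    with open(path, "rb") as f:
        data = f.read()
    req = urllib.request.Request(url, data=data, method="PUT")
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

With the default mount point, a client would call something like put_result("http://example.org/xenofarm/result", "result.tar.gz"), where the host name is a placeholder.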

Again, it is possible to use any web server and set up a path that accepts HTTP PUT. Possible ways to minimize the risk of name clashes are to let the clients name the result packages differently, or to give each client its own upload URL.

7. The Result Parser

Once the result packages are uploaded to the result server, which need not be the same machine as the export server, they are processed by the result parser. The result parser is a fairly simple script that checks the incoming directory for new files at given intervals and, when a new file is found, does the following:

  1. Unzips and untars the package into a working directory.
  2. Retrieves the build id.
  3. Retrieves the machine id, eg. the host name and uname of the client machine.
  4. Finds out how many of the possible steps were completed, e.g. compilation and verification but not packaging, and calculates an overall result for the build.
  5. Counts the number of compilation warnings.
  6. Stores the results of the parsing in a database.
  7. Moves the log files to a directory reachable from the front end web server.
  8. Deletes the uploaded package.

The heuristics for calculating the overall result in step 4 are as follows: if the task "build" doesn't pass, the result is "FAIL". If everything passed, the result is "PASS". Otherwise the build is in the state "WARN".
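Expressed as code, the heuristic above might look like the following Python sketch. It assumes that the statuses of the top-level tasks in mainlog.txt have already been collected into a dictionary. It treats only PASS as passing, so a build task reported as WARN makes the overall result FAIL. The document does not say how that case is handled.

```python
from typing import Dict


def overall_result(tasks: Dict[str, str]) -> str:
    """tasks maps task name to PASS, WARN or FAIL (sketch of step 4)."""
    if tasks.get("build") != "PASS":
        return "FAIL"  # a missing or unsuccessful build task means failure
    if all(status == "PASS" for status in tasks.values()):
        return "PASS"
    return "WARN"
```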

7.1. Format of a Result Package

This section documents what a result package should look like when the generic result_parser.pike script is used. It should be a gzipped tar file that contains a number of files. It must not contain any directories, since such tar files will not be opened for security reasons. The following files have special meaning in the generic result parser:

buildid.txt

The first line of this file should contain the build id, which is the id column of the build table. This value is used for the build column in the result table.

machineid.txt

This file is always produced and included by the Xenofarm client. The format of this file is described in section 5. If duplicates of a key are found, the last value will be used. It is thus possible to override the generated values by adding an additional key-value pair to the config/contact.txt file.
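Reading buildid.txt and machineid.txt takes only a few lines. This Python sketch is not the code in result_parser.pike. It splits each machineid.txt line at the first colon, so values that themselves contain colons survive, and it lets later duplicates overwrite earlier ones as described above. Skipping lines without a colon is our assumption.

```python
from typing import Dict


def parse_buildid(text: str) -> str:
    """The build id is the first line of buildid.txt."""
    return text.splitlines()[0].strip()


def parse_machineid(text: str) -> Dict[str, str]:
    """Parse 'identifier: value' lines; the last duplicate wins."""
    result: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep or not key.strip():
            continue  # not a key-value pair (assumption: ignore it)
        result[key.strip()] = value.strip()  # unknown keys are kept too
    return result
```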

mainlog.txt

A log of the build process, in a special format.

  mainlog   := id logpair+ ("END\n" [0x00-0xff]*)?
  id        := "FORMAT " version "\n"
  logpair   := start logpair* end
  start     := "BEGIN " task "\n" timestamp "\n"
  end       := result "\n" timestamp "\n"
  result    := "PASS" | "FAIL" | "WARN" (" " warnings)?
  version   := [0-9]+
  task      := [_a-zA-Z0-9]+
  warnings  := [0-9]+
  timestamp := [-.:/0-9a-zA-Z ]+

Here "version" is the version of the log file format, which is currently "2". The token "task" is the name of the task or group of tasks performed. Note that all tasks at the same level and within the same supertask must have unique names. "warnings" is the number of warnings encountered. "timestamp" is the output from the date command in the "C" locale.
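The grammar is simple enough for a small recursive descent parser. The Python sketch below is one possible implementation, not the one in result_parser.pike. It returns the format version and a tree of tasks. It reads the result rule as allowing a warning count after any of PASS, FAIL and WARN. It does not validate time stamps and does not check that task names are unique.

```python
import re

BEGIN_RE = re.compile(r"BEGIN ([_a-zA-Z0-9]+)")
RESULT_RE = re.compile(r"(PASS|FAIL|WARN)(?: ([0-9]+))?")
FORMAT_RE = re.compile(r"FORMAT ([0-9]+)")


def parse_mainlog(text):
    """Return (version, tasks); each task is a dict with a 'subtasks' list."""
    lines = text.split("\n")
    pos = 0

    def take(what):
        nonlocal pos
        if pos >= len(lines):
            raise ValueError(f"unexpected end of log, expected {what}")
        pos += 1
        return lines[pos - 1]

    def at_begin():
        return pos < len(lines) and lines[pos].startswith("BEGIN ")

    def logpair():
        # logpair := start logpair* end
        m = BEGIN_RE.fullmatch(take("BEGIN"))
        if m is None:
            raise ValueError(f"line {pos}: malformed BEGIN line")
        node = {"task": m.group(1), "start": take("timestamp"), "subtasks": []}
        while at_begin():
            node["subtasks"].append(logpair())
        r = RESULT_RE.fullmatch(take("result"))
        if r is None:
            raise ValueError(f"line {pos}: expected PASS, FAIL or WARN")
        node["result"] = r.group(1)
        node["warnings"] = int(r.group(2) or 0)
        node["end"] = take("timestamp")
        return node

    m = FORMAT_RE.fullmatch(take("FORMAT line"))
    if m is None:
        raise ValueError("line 1: missing FORMAT line")
    tasks = [logpair()]  # logpair+ requires at least one task
    while at_begin():
        tasks.append(logpair())
    rest = lines[pos:]
    # Only an optional "END" line (followed by anything) may remain.
    if rest not in ([], [""]) and rest[0] != "END":
        raise ValueError(f"line {pos + 1}: unexpected {rest[0]!r}")
    return int(m.group(1)), tasks
```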

compilelog.txt

A log of the compilation. The warning count is the number of lines in this file that match the string "warning", case-insensitively.
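The warning count rule amounts to a case-insensitive substring test per line, which is the number that grep -ci warning would report. The following is a Python sketch. Replacing undecodable bytes instead of failing is our choice, since compiler output is not guaranteed to be valid UTF-8.

```python
def count_warnings(path: str) -> int:
    """Count lines in compilelog.txt containing 'warning' in any case."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return sum(1 for line in f if "warning" in line.lower())
```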

In addition to the above files, you may include any number of extra files. They will all be accessible via the web result interface, but the generic Xenofarm result parser will ignore them.

7.2. Invoking the Generic Result Parser

The generic result parser can be used in two different modes, either to parse specific result files or as a daemon that checks for new result files in a directory for incoming files. To start the result parser, type:

  ./result_parser.pike <arguments> [<result files>]

If a list of files is given, only these files will be parsed. Otherwise the result parser will start to act as a daemon, reading new files from the result directory at given intervals. Every successfully parsed file will be removed from the result directory.

Arguments:

--db=<database url>

The database URL, e.g. mysql://localhost/xenofarm. This argument is mandatory, unless --dry-run has been issued.

--dry-run

If used, the result parser will not store any results in the database, move any files to the web server directory or remove any files from the result directory.

--help

Displays a summary of the available arguments and their use.

--poll=<seconds>

How often the result directory is checked for new result files. Defaults to 60 seconds.

--result-dir=<path>

Where incoming result files are read from. This argument is mandatory, unless any result files are given directly at the command line.

--verbose

Send messages about everything that happens to stdout.

--web-dir=<path>

Where the contents of the result files should be copied. This argument is mandatory, unless --dry-run has been issued.

--work-dir=<path>

Where temporary files should be put. This directory must be empty when the script is started. This argument is mandatory.
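In daemon mode, each poll boils down to one pass over the result directory. The following Python sketch illustrates that pass. process_incoming and run_daemon are hypothetical names, and parse stands in for the actual parsing and database work. Only files ending in .tar.gz are considered, which is an assumption based on the upload naming in section 6.

```python
import os
import time
from typing import Callable, List


def process_incoming(result_dir: str, parse: Callable[[str], None],
                     dry_run: bool = False) -> List[str]:
    """Parse every *.tar.gz in result_dir; remove files that parse OK."""
    done = []
    for name in sorted(os.listdir(result_dir)):
        if not name.endswith(".tar.gz"):
            continue
        path = os.path.join(result_dir, name)
        try:
            parse(path)
        except Exception as err:
            # Keep the file so that it can be inspected or retried later.
            print(f"failed to parse {name}: {err}")
            continue
        if not dry_run:
            os.remove(path)  # --dry-run leaves the result directory untouched
        done.append(name)
    return done


def run_daemon(result_dir: str, parse: Callable[[str], None], poll: int = 60) -> None:
    """Poll forever, like running the parser without explicit result files."""
    while True:
        process_incoming(result_dir, parse)
        time.sleep(poll)
```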

8. The Result Interface

TBD

9. Garbage Collection

When running a Xenofarm server, some files are never removed automatically. All the build packages produced by the server program, as well as all the files obtained from the result packages, will remain until deleted. To automate the deletion of superfluous files you can use the garbage collection script, gc.pike. Although it currently isn't very advanced, we expect that it might grow more complex in the future.

Currently only the last build package, or last few packages, are spared, but if the possible extensions listed in section 4 are implemented several packages must be kept online. These build packages could potentially differ very much in age.

In a similar fashion it is possible that we would like to keep more build results than the last few builds. The latest successful build is one possibility, especially if the resulting binary is included in the result package. The latest build result of every status, eg. one that failed to build, one that failed to verify etc. is another.

Arguments:

--dists-left=<number>

The maximum number of build packages that should remain after the garbage collector has made a sweep. Defaults to 1.

--help

Displays a summary of the available arguments and their use.

--out-dir=<path>

The directory where the outgoing build packages reside.

--poll=<seconds>

The number of seconds to sleep between sweeps. Defaults to two hours.

--result-dir=<path>

The directory where the decompressed results are stored, ie. the path in --web-dir argument to the result parser.

--results-left=<number>

The maximum number of results that should remain after the garbage collector has made a sweep. Defaults to 11.
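The --dists-left sweep can be sketched in Python as follows. This is not gc.pike. It assumes that the outgoing directory holds packages named XXXX-YYYYMMDD-hhmmss.tar.gz (see section 4). It orders them by the time stamp in the name and leaves every other file alone. Sweeping result directories is omitted, since their layout is not documented here.

```python
import os
import re
from typing import List

PACKAGE_RE = re.compile(r"^.+-(\d{8}-\d{6})\.tar\.gz$")


def sweep_dists(out_dir: str, dists_left: int = 1) -> List[str]:
    """Delete all but the `dists_left` newest build packages; return the removed names."""
    packages = sorted(
        (n for n in os.listdir(out_dir) if PACKAGE_RE.match(n)),
        key=lambda n: PACKAGE_RE.match(n).group(1),  # oldest first
    )
    # packages[:-0] would be empty, so handle dists_left == 0 explicitly.
    doomed = packages[:-dists_left] if dists_left > 0 else packages
    for name in doomed:
        os.remove(os.path.join(out_dir, name))
    return doomed
```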

10. Creating your own Project

TBD