2.4. Experiment execution — `launcher`

The launcher program, provided by the launcher module, controls the execution of experiments (called jobs). It also tracks the execution state of each job (not run, running, failed or successfully executed), so that actions can be restricted to a subset of the jobs (e.g., re-running all failed jobs).

launcher supports several predefined execution systems, ranging from simple local shell scripts to jobs in a cluster, and can easily be extended to support other systems with a few lines of code.

2.4.1. Inspecting templates and execution systems

The following commands show the available templates and their definitions:

list-templates: Shows a list of the available templates.

show-template TEMPLATE_NAME: Shows the contents of the specified template.

2.4.2. Acting on jobs

You can perform one of the following actions on jobs:

summary: Prints a summary of the state of the selected jobs.

state: Prints the state of the selected jobs.

monitor: Monitors the state of selected jobs.

submit: Submits the selected jobs to execution.

kill: Kills the execution of selected jobs.

files: Lists files matching a given expression.

After the action command, you must provide a job descriptor file produced by generate_jobs. Alternatively, the job descriptor file itself can be used as an executable, as long as launcher is in your PATH. In this case, you must not pass the file again as an argument after the action command.

2.4.3. Selecting jobs by their state

It is usually advisable to apply an action only to jobs with a specific state. Jobs can be selected by providing one or more of the following flags:

-n, --notrun: Jobs that have still not been run.

-o, --outdated: Jobs that are out of date (have been successfully executed, but one of their dependencies has been updated).

-r, --running: Jobs that are currently running.

-d, --done: Jobs that have been successfully executed.

-f, --failed: Jobs that have failed exeucting.

If no state flag is given, launcher selects all jobs. Thus, all jobs that have not been successfully executed (including those that are currently running) can be (re)executed with:

./jobs.jd submit -rfn

This is a shorthand for:

launcher submit -r -f -n ./jobs.jd
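The bundled form -rfn follows the usual POSIX convention of grouping short options that take no argument. The following minimal Python sketch uses the standard argparse module (an assumption for illustration; launcher's own parser may be implemented differently) to show that -rfn and -r -f -n produce the same flags, and how state flags could select jobs, falling back to all jobs when no flag is given. The job records, their single state field and the build_parser/select_by_state helpers are hypothetical.

```python
import argparse

# Hypothetical state names mirroring launcher's state flags.
STATES = ("notrun", "outdated", "running", "done", "failed")


def build_parser():
    """Sketch of a parser with launcher-like state flags (not launcher's own)."""
    parser = argparse.ArgumentParser(prog="launcher-sketch")
    parser.add_argument("action")
    parser.add_argument("-n", "--notrun", action="store_true")
    parser.add_argument("-o", "--outdated", action="store_true")
    parser.add_argument("-r", "--running", action="store_true")
    parser.add_argument("-d", "--done", action="store_true")
    parser.add_argument("-f", "--failed", action="store_true")
    parser.add_argument("descriptor")
    return parser


def select_by_state(jobs, args):
    """Keep jobs whose state was requested; no state flag means all jobs."""
    wanted = {state for state in STATES if getattr(args, state)}
    if not wanted:
        return list(jobs)
    return [job for job in jobs if job["state"] in wanted]


# Hypothetical jobs, each simplified to a single state.
jobs = [
    {"name": "foo-1", "state": "done"},
    {"name": "foo-2", "state": "failed"},
    {"name": "foo-3", "state": "running"},
    {"name": "foo-4", "state": "notrun"},
]

if __name__ == "__main__":
    bundled = build_parser().parse_args(["submit", "-rfn", "./jobs.jd"])
    separate = build_parser().parse_args(["submit", "-r", "-f", "-n", "./jobs.jd"])
    assert vars(bundled) == vars(separate)  # -rfn parses like -r -f -n
    print([job["name"] for job in select_by_state(jobs, bundled)])
    # prints ['foo-2', 'foo-3', 'foo-4']
```

In this model the flags act as a union of states, which is why -rfn leaves out only the jobs that are already done.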

2.4.4. Selecting jobs with user-defined filters

launcher can restrict the set of jobs to operate on by using filters. You can choose which variables are available to filters when invoking the generate_jobs method, and inspect them with launcher:

variables: Show the variables (and, optionally, values) available on a job descriptor file.

Following the previous example on experiment creation, a filter can be used to execute all the jobs for benchmark foo, but only if they have an l2 size between 4 and 16 KB:

./experiments/jobs.jd submit "benchmark == 'foo'" "4 <= l2" "l2 <= 16"

Providing multiple filters is equivalent to joining them with the and operator:

./experiments/jobs.jd submit "benchmark == 'foo' and 4 <= l2 and l2 <= 16"
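To make the equivalence concrete, here is a minimal sketch assuming that each filter is a Python boolean expression evaluated against a job's variables. The matches helper and the job dictionaries are hypothetical, and launcher may evaluate filters differently. Note that eval with empty builtins is not a security sandbox.

```python
def matches(job_vars, *filters):
    """Hypothetical check: a job is selected only if every filter holds."""
    # The job's variables are the names visible to each expression.
    # Empty builtins restrict name lookup but are NOT a security sandbox.
    return all(eval(expr, {"__builtins__": {}}, dict(job_vars)) for expr in filters)


# Hypothetical jobs using the variables from the example above.
jobs = [
    {"benchmark": "foo", "l2": 2},
    {"benchmark": "foo", "l2": 8},
    {"benchmark": "foo", "l2": 16},
    {"benchmark": "bar", "l2": 8},
]

# Three separate filters versus one filter joined with "and".
separate = [j for j in jobs if matches(j, "benchmark == 'foo'", "4 <= l2", "l2 <= 16")]
joined = [j for j in jobs if matches(j, "benchmark == 'foo' and 4 <= l2 and l2 <= 16")]

if __name__ == "__main__":
    print(separate == joined, separate)
```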

As a shorthand for filters on variables that hold paths, immediate values can also be given. For every job, such a non-filter value matches if any of the following conditions holds:

The argument is the value of some variable.
The argument is a relative path from the current directory to a value of some variable interpreted as a relative path from the base directory generated by Experiments.
The argument is an absolute path to a value of some variable interpreted as a relative path from the base directory generated by Experiments.
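The three conditions can be sketched as follows, assuming POSIX-style paths and that both the base directory and the current directory are known as absolute paths. The immediate_matches function, its parameters and the directory layout are hypothetical names chosen for illustration.

```python
import posixpath


def immediate_matches(arg, job_vars, base_dir, cwd):
    """Hypothetical check of one immediate value against one job's variables.

    base_dir and cwd are assumed to be absolute POSIX paths.
    """
    # Interpret the argument as a path: kept if absolute, else taken from cwd.
    if posixpath.isabs(arg):
        as_path = posixpath.normpath(arg)
    else:
        as_path = posixpath.normpath(posixpath.join(cwd, arg))
    for value in map(str, job_vars.values()):
        # 1. The argument is literally the value of some variable.
        if arg == value:
            return True
        # 2./3. The argument points (relatively or absolutely) to the value
        # interpreted as a path relative to the base directory.
        if as_path == posixpath.normpath(posixpath.join(base_dir, value)):
            return True
    return False


# Hypothetical layout: experiments generated under /home/user/experiments,
# commands run from /home/user.
job = {"benchmark": "foo", "LAUNCHER": "jobs/foo-0-1-2-1-2-1.sh"}
BASE, CWD = "/home/user/experiments", "/home/user"

if __name__ == "__main__":
    for arg in ("jobs/foo-0-1-2-1-2-1.sh",
                "experiments/jobs/foo-0-1-2-1-2-1.sh",
                "/home/user/experiments/jobs/foo-0-1-2-1-2-1.sh",
                "jobs/foo-0-1-2-1-2-2.sh"):
        print(arg, immediate_matches(arg, job, BASE, CWD))
```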

Some of these paths are printed when querying the state of jobs; for example, a job that has not been run shows the path to its execution script, identified by the LAUNCHER variable (the last three commands below are equivalent):

$ ./experiments/jobs.jd state
( ) experiments/jobs/foo-0-1-2-1-2-1.sh
( ) experiments/jobs/foo-0-1-2-1-2-2.sh
( ) experiments/jobs/foo-0-1-2-1-2-4.sh
...
# with regular filter
$ ./experiments/jobs.jd submit "LAUNCHER=='jobs/foo-0-1-2-1-2-1.sh' || LAUNCHER=='jobs/foo-0-1-2-1-2-2.sh'"
# immediate value
$ ./experiments/jobs.jd submit jobs/foo-0-1-2-1-2-1.sh jobs/foo-0-1-2-1-2-2.sh
# relative path
$ ./experiments/jobs.jd submit experiments/jobs/foo-0-1-2-1-2-1.sh experiments/jobs/foo-0-1-2-1-2-2.sh

Note that if you provide several of these arguments, they are all or’ed together, and the result is then and’ed with any filter arguments and state flags on the command line.
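A sketch of how the three kinds of selection might combine, under simplifying assumptions: each hypothetical job carries a single state and a dictionary of variables, filters are evaluated as Python expressions, and immediate values are compared only against literal variable values (without the path interpretations listed above). The select function and the job data are illustrative, not launcher's implementation.

```python
def select(jobs, states=(), filters=(), values=()):
    """Hypothetical combination of state flags, filters and immediate values."""
    selected = []
    for job in jobs:
        # State flags: any requested state (no flags selects every state).
        if states and job["state"] not in states:
            continue
        # Filter expressions: all must hold (and'ed together).
        if not all(eval(f, {"__builtins__": {}}, dict(job["vars"])) for f in filters):
            continue
        # Immediate values: at least one must match (or'ed together).
        job_values = {str(v) for v in job["vars"].values()}
        if values and not any(v in job_values for v in values):
            continue
        selected.append(job)
    return selected


# Hypothetical jobs.
jobs = [
    {"state": "failed", "vars": {"benchmark": "foo", "LAUNCHER": "jobs/foo-1.sh"}},
    {"state": "failed", "vars": {"benchmark": "foo", "LAUNCHER": "jobs/foo-2.sh"}},
    {"state": "done", "vars": {"benchmark": "foo", "LAUNCHER": "jobs/foo-3.sh"}},
    {"state": "failed", "vars": {"benchmark": "bar", "LAUNCHER": "jobs/bar-1.sh"}},
]

if __name__ == "__main__":
    picked = select(jobs, states={"failed"}, filters=["benchmark == 'foo'"],
                    values=["jobs/foo-1.sh", "jobs/foo-3.sh", "jobs/bar-1.sh"])
    print([job["vars"]["LAUNCHER"] for job in picked])  # ['jobs/foo-1.sh']
```

Only the first job survives: foo-3 is excluded by the state flag, bar-1 by the filter, and foo-2 by the immediate values.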

2.4.5. File house-keeping

You can use the files command to operate on files based on job properties. For example, to remove the DONE files of all failed jobs:

$ ./experiments/jobs.jd files --failed {{DONE}} | xargs rm -f

Note

Absolute paths are always preserved. Relative paths are looked up in the output directory of the job descriptor file (attribute sciexp2.expdef.experiments.Experiments.out).
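The expansion rule in the note can be sketched as follows, assuming that each {{NAME}} placeholder is replaced by the job's variable of that name. The expand helper, the variable values and the output directory are hypothetical, and the sketch only builds paths without checking whether the files exist.

```python
import posixpath
import re

# Matches placeholders such as {{DONE}} or {{ID}}.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def expand(template, job_vars, out_dir):
    """Hypothetical expansion of a files expression for one job."""
    path = PLACEHOLDER.sub(lambda m: str(job_vars[m.group(1)]), template)
    if posixpath.isabs(path):
        return path  # absolute paths are kept as they are
    # Relative paths are taken from the output directory.
    return posixpath.normpath(posixpath.join(out_dir, path))


# Hypothetical variables of two failed jobs and a hypothetical output directory.
failed_jobs = [
    {"ID": "foo-1", "DONE": "jobs/foo-1.done"},
    {"ID": "foo-2", "DONE": "jobs/foo-2.done"},
]
out_dir = "/home/user/experiments"

if __name__ == "__main__":
    for job_vars in failed_jobs:
        print(expand("{{DONE}}", job_vars, out_dir))
```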

This is very handy for synchronizing the results of your successful jobs into a separate directory (e.g., to store them in your project’s repository), while ensuring that results from jobs that no longer exist are removed (e.g., jobs that were generated at some point but are no longer part of the job descriptor file):

# synchronize successful results
$ rsync -avz `./experiments/jobs.jd files res/{{ID}}.done -d` /path/to/successful/results/

# remove results of non-existing jobs
$ ./experiments/jobs.jd files --not-expanded /path/to/successful/results/{{ID}}.done | xargs rm -f

Note how the --not-expanded argument selects all files that match the provided expression but are not expanded from any of the jobs in the job descriptor file. If you instead want to delete the results of both non-existing and non-successful jobs:

$ ./experiments/jobs.jd files --not-expanded --done /path/to/successful/results/{{ID}}.done | xargs rm -f
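One way to model --not-expanded is as a set difference: files that match the expression when every placeholder is treated as a wildcard, minus the files actually expanded from the selected jobs. Restricting the jobs (as --done does) shrinks the expanded set, so results of non-successful jobs also end up in the difference. The sketch takes the list of existing files as input instead of scanning a directory; all helper names, file names and job data are hypothetical.

```python
import re

PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def expand(template, job_vars):
    """Replace each {{NAME}} with the job's NAME variable."""
    return PLACEHOLDER.sub(lambda m: str(job_vars[m.group(1)]), template)


def template_regex(template):
    """Regex where each placeholder matches any single path component."""
    literal_parts = re.split(r"\{\{\w+\}\}", template)
    return re.compile("[^/]+".join(re.escape(p) for p in literal_parts))


def not_expanded(existing, template, jobs):
    """Existing files that match the template but no selected job expands to."""
    pattern = template_regex(template)
    expanded = {expand(template, job) for job in jobs}
    return sorted(f for f in existing if pattern.fullmatch(f) and f not in expanded)


TEMPLATE = "/path/to/successful/results/{{ID}}.done"
# Hypothetical contents of the results directory and current jobs.
existing = [
    "/path/to/successful/results/foo-1.done",
    "/path/to/successful/results/foo-2.done",
    "/path/to/successful/results/old-7.done",  # its job no longer exists
    "/path/to/successful/results/README",      # does not match the template
]
jobs = [{"ID": "foo-1", "state": "done"}, {"ID": "foo-2", "state": "failed"}]
done_jobs = [job for job in jobs if job["state"] == "done"]

if __name__ == "__main__":
    print(not_expanded(existing, TEMPLATE, jobs))       # like --not-expanded
    print(not_expanded(existing, TEMPLATE, done_jobs))  # like --not-expanded --done
```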

2.4.6. Integrating job execution

The job execution facilities can be integrated into a larger project by using the launcher module directly, instead of invoking launcher as an external command.

2.4.7. Writing new execution systems

You can override existing execution systems and add new ones by creating a Python module with the same name as the system and adding it to Python’s module search path.
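Since execution systems are located by module name, the general Python mechanism involved can be sketched with importlib. The module body below is a placeholder: the functions or classes that launcher actually expects from an execution system are not shown, and MODULE_SOURCE, describe, register_system_dir, load_system and demo are hypothetical names. Inserting the directory at the front of sys.path means a module there shadows a same-named module found later on the path, which is one plausible way an override could take effect.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Placeholder module body; the interface launcher really expects is NOT shown.
MODULE_SOURCE = '''\
"""Hypothetical execution system module."""


def describe():
    return "{name}"
'''


def register_system_dir(directory):
    """Put `directory` first on the module search path."""
    if directory not in sys.path:
        sys.path.insert(0, directory)
    importlib.invalidate_caches()  # make newly written files visible to imports


def load_system(name):
    """Import an execution-system module by its name."""
    return importlib.import_module(name)


def demo(name):
    """Write a module called `name`, register its directory and import it."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, name + ".py").write_text(MODULE_SOURCE.format(name=name))
        register_system_dir(tmp)
        return load_system(name).describe()


if __name__ == "__main__":
    print(demo("my_cluster"))  # prints my_cluster
```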
