2.4. Experiment execution — launcher
The launcher program is provided by the launcher module, and controls the execution of experiments (called jobs). It also keeps track of the execution status of each job (not run, running, failed, or successfully executed), so that it can act on just a subset of the jobs (e.g., re-run all failed jobs).
launcher supports several predefined execution systems, from simple local shell scripts to jobs in a cluster, and can be extended to support other systems with a few lines of code.
2.4.1. Inspecting templates and execution systems
The following commands show the available templates and their definition:
- list-templates
Shows a list of the available templates.
- show-template TEMPLATE_NAME
Shows the contents of the specified template.
2.4.2. Acting on jobs
You can perform one of the following actions on jobs:
- summary
Prints a summary of the state of the selected jobs.
- state
Prints the state of the selected jobs.
- monitor
Monitors the state of selected jobs.
- submit
Submits the selected jobs to execution.
- kill
Kills the execution of selected jobs.
- files
Lists files matching a given expression.
After the action command, you must provide a job descriptor file produced by generate_jobs. Alternatively, the job descriptor file itself can be used as an executable, as long as launcher is in your PATH. In this case, you must not explicitly pass this file as an argument after the action command.
2.4.3. Selecting jobs by their state
It is usually advisable to apply an action only to jobs with a specific state. Jobs can be selected by providing one or more of the following flags:
- -n, --notrun
Jobs that have still not been run.
- -o, --outdated
Jobs that are out of date (have been successfully executed, but one of their dependencies has been updated).
- -r, --running
Jobs that are currently running.
- -d, --done
Jobs that have been successfully executed.
- -f, --failed
Jobs that have failed executing.
If no state flag is provided, launcher defaults to all jobs. Thus, all jobs that have not been successfully executed (including those that are currently running) can be (re)executed with:
./jobs.jd submit -rfn
This is a shorthand for:
launcher submit -r -f -n ./jobs.jd
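The selection rule above can be modelled in a few lines of Python. This is only an illustrative sketch, not launcher's implementation: the function `select_by_state`, the state names, and the sample job list are all hypothetical.

```python
# Illustrative model of state selection; not the launcher implementation.
STATES = ("notrun", "outdated", "running", "done", "failed")


def select_by_state(jobs, selected=()):
    """Return the names of jobs whose state was selected.

    When no state is given, every job is selected (the documented default).
    """
    wanted = set(selected) or set(STATES)
    return [name for name, state in jobs if state in wanted]


# Hypothetical jobs as (name, state) pairs.
jobs = [
    ("foo-1", "done"),
    ("foo-2", "failed"),
    ("foo-3", "notrun"),
    ("foo-4", "running"),
    ("foo-5", "outdated"),
]

# Counterpart of passing -r -f -n on the command line.
print(select_by_state(jobs, ["running", "failed", "notrun"]))
# No state flags at all: all five jobs are selected.
print(select_by_state(jobs))
```

Under this model, the flags act as a union of states, which is why `-rfn` picks up every job that is running, failed, or not yet run.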
2.4.4. Selecting jobs with user-defined filters
launcher can restrict the set of jobs to operate on by using filters. You can set which variables are available to filters when invoking the generate_jobs method. You can also inspect them with launcher:
- variables
Show the variables (and, optionally, values) available on a job descriptor file.
Following the previous example on experiment creation, a filter can be used to execute all the jobs for benchmark foo, but only if they have an l2 size between 4 and 16 KB:
./experiments/jobs.jd submit "benchmark == 'foo'" "4 <= l2" "l2 <= 16"
Providing multiple filters is a shorthand for joining them with and:
./experiments/jobs.jd submit "benchmark == 'foo' and 4 <= l2 and l2 <= 16"
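The equivalence between separate filters and a single filter joined with `and` can be checked with a small Python model. It assumes that the filter expressions in the two commands above can be read as Python expressions over the job variables, which holds for these examples but is not a statement about how launcher parses filters. The helper `matches` and the sample jobs are hypothetical.

```python
def matches(job, filters):
    """True when every filter expression holds for the job's variables."""
    # eval() is used only to keep the illustration short; the job's
    # variables are the only names visible to the expression.
    return all(eval(expr, {"__builtins__": {}}, dict(job)) for expr in filters)


# Hypothetical jobs, each described by its variables.
jobs = [
    {"benchmark": "foo", "l2": 2},
    {"benchmark": "foo", "l2": 8},
    {"benchmark": "bar", "l2": 8},
    {"benchmark": "foo", "l2": 32},
]

separate = ["benchmark == 'foo'", "4 <= l2", "l2 <= 16"]
joined = ["benchmark == 'foo' and 4 <= l2 and l2 <= 16"]

selected_separate = [job for job in jobs if matches(job, separate)]
selected_joined = [job for job in jobs if matches(job, joined)]

print(selected_separate)
print(selected_joined)
```

Both selections contain the same single job (benchmark foo with an l2 of 8), because requiring every filter to hold is the same as evaluating their conjunction.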
As a shorthand for filters selecting variables that actually are paths, immediate values can also be specified. For every job, these non-filter values will match on any of the following conditions:
- The argument is the value of some variable.
- The argument is a relative path from the current directory to a value of some variable interpreted as a relative path from the base directory generated by Experiments.
- The argument is an absolute path to a value of some variable interpreted as a relative path from the base directory generated by Experiments.
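The matching conditions listed above can be sketched as follows. This is a hedged illustration rather than launcher's code: the function `immediate_matches`, the directory layout, and the sample variables are hypothetical, and `posixpath` is used so the example behaves the same on any platform.

```python
import posixpath


def immediate_matches(arg, job_vars, base_dir, cwd):
    """Check one immediate value against the conditions listed above."""
    for value in map(str, job_vars.values()):
        # First condition: the argument is the variable's value itself.
        if arg == value:
            return True
        # Remaining conditions: the argument names the same file as the
        # value once the value is taken relative to the base directory.
        target = posixpath.normpath(posixpath.join(base_dir, value))
        candidate = arg if posixpath.isabs(arg) else posixpath.join(cwd, arg)
        if posixpath.normpath(candidate) == target:
            return True
    return False


# Hypothetical layout: launcher is invoked from the project directory and
# the experiments were generated under its "experiments" subdirectory.
cwd = "/home/user/project"
base_dir = "/home/user/project/experiments"
job_vars = {"benchmark": "foo", "LAUNCHER": "jobs/foo-0-1-2-1-2-1.sh"}

for arg in (
    "jobs/foo-0-1-2-1-2-1.sh",
    "experiments/jobs/foo-0-1-2-1-2-1.sh",
    "/home/user/project/experiments/jobs/foo-0-1-2-1-2-1.sh",
    "jobs/other.sh",
):
    print(arg, immediate_matches(arg, job_vars, base_dir, cwd))
```

The first three arguments match the job and the last one does not. Note that a bare `foo` would also match, since it is the value of the benchmark variable.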
Some of these paths are printed on the screen when querying the state of jobs; for example, a non-run job shows the path to the execution script, identified by the LAUNCHER variable (the last three commands are equivalent):
$ ./experiments/jobs.jd state
( ) experiments/jobs/foo-0-1-2-1-2-1.sh
( ) experiments/jobs/foo-0-1-2-1-2-2.sh
( ) experiments/jobs/foo-0-1-2-1-2-4.sh
...
# with regular filter
$ ./experiments/jobs.jd submit "LAUNCHER=='jobs/foo-0-1-2-1-2-1.sh' || LAUNCHER=='jobs/foo-0-1-2-1-2-2.sh'"
# immediate value
$ ./experiments/jobs.jd submit jobs/foo-0-1-2-1-2-1.sh jobs/foo-0-1-2-1-2-2.sh
# relative path
$ ./experiments/jobs.jd submit experiments/jobs/foo-0-1-2-1-2-1.sh experiments/jobs/foo-0-1-2-1-2-2.sh
Note that if you provide several of these immediate values, they are all or’ed together, and the result is then and’ed with any filter arguments and state flags given on the command line.
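The combination rule can be summarised with a short Python model. It is a simplified sketch under stated assumptions: immediate values are matched only by equality with a variable value, filters are read as Python expressions, and the function `select` plus the sample jobs are hypothetical.

```python
def select(jobs, immediates=(), filters=(), states=()):
    """Return the LAUNCHER value of every job that passes all three checks."""
    chosen = []
    for job in jobs:
        variables = job["vars"]
        # Immediate values are or'ed: matching any one of them is enough.
        by_immediate = not immediates or any(
            value in variables.values() for value in immediates
        )
        # Filters are and'ed: every expression must hold.
        by_filter = all(
            eval(expr, {"__builtins__": {}}, dict(variables)) for expr in filters
        )
        # State flags: no flag means any state is accepted.
        by_state = not states or job["state"] in states
        # The three groups are and'ed with each other.
        if by_immediate and by_filter and by_state:
            chosen.append(variables["LAUNCHER"])
    return chosen


# Hypothetical jobs with their variables and current state.
jobs = [
    {"vars": {"LAUNCHER": "jobs/a.sh", "benchmark": "foo"}, "state": "failed"},
    {"vars": {"LAUNCHER": "jobs/b.sh", "benchmark": "foo"}, "state": "done"},
    {"vars": {"LAUNCHER": "jobs/c.sh", "benchmark": "bar"}, "state": "failed"},
]

print(select(jobs, ["jobs/a.sh", "jobs/b.sh"], ["benchmark == 'foo'"], {"failed"}))
print(select(jobs, ["jobs/a.sh", "jobs/b.sh"], ["benchmark == 'foo'"]))
```

With the state restricted to failed jobs only `jobs/a.sh` remains; without a state flag both `jobs/a.sh` and `jobs/b.sh` are selected.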
2.4.5. File house-keeping
You can use the command files to operate on files based on job properties. For example, to remove the DONE files of all failed jobs:
$ ./experiments/jobs.jd files --failed {{DONE}} | xargs rm -f
Note
Absolute paths are always preserved. Relative paths are searched for in the output directory of the job descriptor file (attribute sciexp2.expdef.experiments.Experiments.out).
This is handy for synchronizing the results of your successful jobs onto a separate directory (e.g., to store them in your project’s repository), while ensuring that results from jobs that were once generated but no longer exist are removed:
# synchronize successful results
$ rsync -avz `./experiments/jobs.jd files res/{{ID}}.done -d` /path/to/successful/results/
# remove results of non-existing jobs
$ ./experiments/jobs.jd files --not-expanded /path/to/successful/results/{{ID}}.done | xargs rm -f
Note how the --not-expanded argument selects all files that match the provided expression but are not expanded from any of the jobs in the job descriptor file. If you instead wanted to delete all results from both non-existing and non-successful jobs:
$ ./experiments/jobs.jd files --not-expanded --done /path/to/successful/results/{{ID}}.done | xargs rm -f
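The effect of --not-expanded, with and without a state flag, can be modelled in Python. This sketch only illustrates the selection logic described above and is not launcher's implementation: the helpers `expand`, `as_glob`, and `not_expanded`, the `/results` directory, and the sample jobs are hypothetical.

```python
import fnmatch
import re


def expand(template, job):
    """Replace every {{NAME}} in the template with the job's value for NAME."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: str(job[m.group(1)]), template)


def as_glob(template):
    """Turn every {{NAME}} into a wildcard to find candidate files."""
    return re.sub(r"\{\{\w+\}\}", "*", template)


def not_expanded(template, jobs, existing_files):
    """Files that match the template but come from none of the given jobs."""
    expanded = {expand(template, job) for job in jobs}
    pattern = as_glob(template)
    return sorted(
        path
        for path in existing_files
        if fnmatch.fnmatchcase(path, pattern) and path not in expanded
    )


# Hypothetical state: foo-3 was generated in the past but no longer exists.
jobs = [{"ID": "foo-1", "state": "done"}, {"ID": "foo-2", "state": "failed"}]
existing = [
    "/results/foo-1.done",
    "/results/foo-2.done",
    "/results/foo-3.done",
    "/results/notes.txt",
]
template = "/results/{{ID}}.done"

# Only --not-expanded: results of jobs that no longer exist.
stale = not_expanded(template, jobs, existing)
# --not-expanded --done: anything not produced by a currently successful job.
done_jobs = [job for job in jobs if job["state"] == "done"]
stale_or_unsuccessful = not_expanded(template, done_jobs, existing)

print(stale)
print(stale_or_unsuccessful)
```

In this model the state flag narrows the set of jobs before expansion, so files belonging to failed jobs are treated like files of jobs that no longer exist. Files that do not match the expression at all, such as `notes.txt`, are never selected.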
2.4.6. Integrating job execution
The job execution facilities can be integrated into a bigger project by using the launcher module directly, instead of executing launcher as an external command.
2.4.7. Writing new execution systems
You can override existing execution systems and add new ones by creating a Python module with the same name as the system and adding it to Python’s module search path.
See also