author Matthijs van der Wild <matthijs.van-der-wild@durham.ac.uk> 2024-09-30 16:19:51 +0100
committer Matthijs van der Wild <matthijs.van-der-wild@durham.ac.uk> 2024-09-30 16:19:51 +0100
commit 9246d90121fb9beb87796ca5dc9b8758daaaeb45
tree d8ac9bdcf3fc527150bd0b008e453be4da6b2a84 /README.md
Initialise repositories
Diffstat (limited to 'README.md')
-rw-r--r-- README.md 44
1 files changed, 44 insertions, 0 deletions
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..ba7adb8
--- /dev/null
+++ b/README.md
@@ -0,0 +1,44 @@
+# LOFAR PILOT
+
+This is a small pipeline runner script that wraps Common Workflow Language ([CWL](https://www.commonwl.org/)) pipelines with [toil](https://toil.readthedocs.io).
+It is compatible with [LINC](https://git.astron.nl/RD/LINC) and the [VLBI](https://git.astron.nl/RD/VLBI-cwl/) pipelines.
+*This is a work in progress.
+Issues should be reported to [Matthijs van der Wild](mailto:matthijs.van-der-wild@durham.ac.uk).*
+
+## Assumptions
+
+This script assumes the following:
+* All relevant input data is available in either the `$HOME` directory or a directory henceforth called `$BINDDIR`.
+ Targets of any links in these directories should be accessible to the compute directories, as these will be mounted during relevant jobs.
+* The output will be written to a results directory in `$BINDDIR`.
+* This script will be used with the SLURM queuing system on COSMA5 with the following options: `-p cosma5 -A durham -t 72:00:00`.
+ If these options are not appropriate, or if this script is to be run on another SLURM cluster, set `$TOIL_SLURM_ARGS` before running the script.
+* `$CWL_SINGULARITY_CACHE` is set and the corresponding path contains (a link to) a singularity container `vlbi-cwl.sif`.
+ If it isn't set, a suitable container can be specified with the `-c` option described below.
+
+## Execution
+
+The script can be run as follows:
+```
+sh pilot.sh [options] <workflow name> $BINDDIR
+```
+Options can be the following:
+* `-h` prints the script usage with all available options (optional).
+* `-r` restarts a pipeline that failed during a previous run of this script.
+* `-c` allows the pipeline to use the specified container (optional, VLBI pipeline only).
+* `-i` points to your input JSON file. This can be any appropriate JSON file, as long as it is located in either `$HOME` or `$BINDDIR`.
+* `-p` is a path to the pipeline repository (LINC and VLBI pipeline only).
+* `--scratch` is a path to local scratch storage where temporary data can be written to (optional).
+ **`--scratch` must be local to the compute node.
+ Nonlocal scratch storage will likely cause the pipeline to fail.**
+* `<workflow name>` is the workflow file name without extension, e.g. `delay-calibration` or `concatenate-flag` for the VLBI pipeline or `HBA_calibrator` or `HBA_target` for LINC.
+
+## Notes
+
+* Upon successful pipeline completion the results directory contains the following:
+ * the pipeline data products,
+ * the statistics gathered by toil.
+* Jobstore files and intermediate pipeline data products are stored in a `toil` directory in `$BINDDIR`.
+* Jobstore files can be removed by running `toil clean $BINDDIR/toil/<workflow>_job`.
+* Toil may not clear temporary files after the pipeline has finished.
+ These have to be removed by hand.
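
As a usage sketch for the README added in this patch, the snippet below sets the two environment variables the script relies on, then assembles a `pilot.sh` command line and the matching `toil clean` command. The partition, account, time limit and all paths are placeholders chosen for illustration, not values from the repository. It is also assumed that `-i`, `-p` and `--scratch` each take a single argument, and that `<workflow>` in the jobstore path is the workflow name passed to the script. The commands are only printed, so the snippet can be tried without a cluster.

```shell
#!/bin/sh
# Hypothetical bind directory; replace with a real path on your cluster.
BINDDIR="${BINDDIR:-/path/to/binddir}"
# Workflow name taken from the README's VLBI examples.
WORKFLOW="delay-calibration"

# Override the COSMA5 defaults.
# Partition, account and time limit are placeholders.
export TOIL_SLURM_ARGS="-p mypartition -A myaccount -t 24:00:00"

# Directory assumed to contain (a link to) vlbi-cwl.sif.
export CWL_SINGULARITY_CACHE="$HOME/singularity"

# Assemble the invocation described under "Execution".
# Input file, pipeline checkout and scratch path are illustrative only;
# the scratch path must be local to the compute node.
PILOT_CMD="sh pilot.sh -i $BINDDIR/input.json -p $HOME/VLBI-cwl --scratch /path/to/local/scratch $WORKFLOW $BINDDIR"

# Jobstore cleanup command from the "Notes" section,
# assuming <workflow> equals the workflow name above.
CLEAN_CMD="toil clean $BINDDIR/toil/${WORKFLOW}_job"

# Print the commands instead of running them.
printf '%s\n' "$PILOT_CMD"
printf '%s\n' "$CLEAN_CMD"
```

Once the printed commands look right for your setup, they can be run directly from the directory containing `pilot.sh`. Printing first keeps the sketch safe to run on a login node.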