# hailrunner
WDL workflow for running Hail jobs on ephemeral Google Cloud Dataproc clusters.
## What it does
The `hailrunner run` workflow creates an ephemeral Dataproc cluster, submits a PySpark/Hail script to it, collects the outputs from GCS, and deletes the cluster, all within a single WDL task. The cluster lifecycle is fully managed: even if the job fails, the cluster is still torn down.
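The lifecycle above can be sketched as a single WDL task along the following lines. This is an illustrative sketch, not the actual hailrunner source: the task name, inputs, and `gcloud` flags are assumptions.

```wdl
version 1.0

# Sketch of an ephemeral-cluster task (names, inputs, and flags are illustrative).
task run_hail_job {
  input {
    String cluster_name
    String region
    File script            # PySpark/Hail script to submit
    String output_gcs_dir  # GCS prefix the script writes its results to
  }

  command <<<
    set -euo pipefail

    # Tear the cluster down on exit, whether the job succeeded or failed.
    cleanup() {
      gcloud dataproc clusters delete ~{cluster_name} \
        --region=~{region} --quiet || true
    }
    trap cleanup EXIT

    # 1. Create the ephemeral cluster.
    gcloud dataproc clusters create ~{cluster_name} --region=~{region}

    # 2. Submit the PySpark/Hail script to it.
    gcloud dataproc jobs submit pyspark ~{script} \
      --cluster=~{cluster_name} --region=~{region}

    # 3. Collect outputs from GCS into the task working directory.
    mkdir -p outputs
    gsutil -m cp -r "~{output_gcs_dir}/*" outputs/
  >>>

  output {
    Array[File] results = glob("outputs/*")
  }
}
```

The `trap cleanup EXIT` is what makes the lifecycle fully managed: the delete runs on any exit path, so a failed Hail job cannot leave a cluster running.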
## Docker image
The image at `hailrunner/docker/Dockerfile` bundles:
- Python 3.11
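A minimal Dockerfile along these lines might look like the following. The base image, Java version, and package choices are assumptions for illustration, not the contents of the repository's Dockerfile.

```dockerfile
# Sketch only: base image and package versions are illustrative assumptions.
FROM python:3.11-slim

# Hail runs on the JVM, so a Java runtime is required (Java 11 assumed here).
RUN apt-get update && apt-get install -y --no-install-recommends \
        openjdk-11-jre-headless \
    && rm -rf /var/lib/apt/lists/*

# Install Hail plus a GCS client for collecting outputs.
RUN pip install --no-cache-dir hail google-cloud-storage
```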