
hailrunner

WDL workflow for running Hail jobs on ephemeral Google Cloud Dataproc clusters.

What it does

The hailrunner run workflow creates an ephemeral Dataproc cluster, submits a PySpark/Hail script, collects outputs from Google Cloud Storage (GCS), and destroys the cluster, all in a single WDL task. The cluster lifecycle is fully managed: the cluster is torn down even if the job fails.
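The lifecycle above can be sketched as a shell script, assuming the stock gcloud/gsutil CLIs; the cluster name, region, script path, and output bucket are illustrative placeholders, not the workflow's actual defaults. An EXIT trap is one common way to guarantee teardown even when the job fails. `RUN` defaults to `echo` so the sketch prints the commands it would run; set `RUN=""` to execute them for real.

```shell
#!/usr/bin/env bash
# Sketch of the create/submit/collect/teardown lifecycle.
set -euo pipefail
RUN=${RUN:-echo}                 # default: print commands instead of running them

CLUSTER="hail-ephemeral-$$"      # hypothetical cluster name
REGION="us-central1"             # assumed region
OUTPUTS="gs://my-bucket/outputs" # hypothetical GCS output prefix

cleanup() {
  # Registered on EXIT, so the cluster is deleted even if the job fails.
  $RUN gcloud dataproc clusters delete "$CLUSTER" --region "$REGION" --quiet || true
}
trap cleanup EXIT

# 1. Create the ephemeral cluster.
$RUN gcloud dataproc clusters create "$CLUSTER" --region "$REGION"

# 2. Submit the PySpark/Hail script.
$RUN gcloud dataproc jobs submit pyspark script.py \
  --cluster "$CLUSTER" --region "$REGION"

# 3. Collect outputs from GCS.
$RUN gsutil -m cp -r "$OUTPUTS" ./outputs/
```

Because the delete runs from the trap rather than at the end of the happy path, a failed submit still triggers cleanup, which mirrors the guarantee the workflow makes.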

Docker image

The image at hailrunner/docker/Dockerfile bundles:

  • Python 3.11

This is a companion discussion topic for the original entry at github.com/broadinstitute/hailrunner/hailrunner-run