Setup instructions and scripts for configuring Dataproc and using Hail to generate a sparse matrix callset for the UKBB Pharma5 effort.


  • A sample map TSV file with two columns: the sample name and a gs:// path to the sample
    • This TSV should include any control samples you want in your sparse matrix output
  • A Google Cloud Platform (GCP) project with permission to access the input and output buckets for your batch
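
Before launching anything, it can help to sanity-check the sample map locally. The following is a minimal sketch, not part of the scripts in this repository: it assumes the file is tab-separated with no header row, and the function name `read_sample_map`, the bucket name, and the sample names are all hypothetical.

```python
import csv
import io


def read_sample_map(handle):
    """Parse a two-column sample map (sample name, gs:// path) into a dict.

    Assumes tab-separated values and no header row; adjust if your file differs.
    """
    samples = {}
    for line_no, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not row:
            continue  # skip completely blank lines
        if len(row) != 2:
            raise ValueError(f"line {line_no}: expected 2 columns, got {len(row)}")
        name, path = (field.strip() for field in row)
        if not path.startswith("gs://"):
            raise ValueError(f"line {line_no}: path must start with gs://: {path!r}")
        if name in samples:
            raise ValueError(f"line {line_no}: duplicate sample name {name!r}")
        samples[name] = path
    return samples


# Hypothetical sample map contents, including one control sample.
example = (
    "sample_1\tgs://example-bucket/sample_1.g.vcf.gz\n"
    "control_1\tgs://example-bucket/control_1.g.vcf.gz\n"
)
sample_map = read_sample_map(io.StringIO(example))
```

To check a real file, pass an open file handle instead, for example `read_sample_map(open("sample_map.tsv", newline=""))`. Raising on the first malformed line is a deliberate choice here, since a bad row is cheaper to catch locally than after a cluster has started.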

High-level steps

  1. Setup y

