Batch Effect Analysis Pipeline
A general-purpose pipeline for measuring group-level batch effects in whole-genome sequencing (WGS) variant callsets. It compares the groups defined by a user-supplied variable (e.g., sequencing center, instrument, lab) across genomic interval classes, stratified by genetic ancestry, using Hail on Spark.
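To make the stratified comparison concrete, here is a minimal plain-Python sketch. It is not the pipeline's Hail code: the record layout, the `group_offsets` helper, the group and interval-class labels, and the metric values are all hypothetical and invented for illustration. It assumes one summary metric per sample (for example, a variant count) and reports how far each group's mean sits from the baseline within each ancestry and interval-class stratum.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-sample records: (group, ancestry, interval_class, metric).
# The metric stands in for any per-sample summary; all values are made up.
records = [
    ("center_A", "EUR", "high_confidence", 100.0),
    ("center_A", "EUR", "high_confidence", 104.0),
    ("center_B", "EUR", "high_confidence", 110.0),
    ("center_B", "EUR", "high_confidence", 114.0),
    ("center_A", "EUR", "low_complexity", 20.0),
    ("center_B", "EUR", "low_complexity", 30.0),
    ("center_A", "AFR", "high_confidence", 130.0),
    ("center_B", "AFR", "high_confidence", 134.0),
]


def group_offsets(rows):
    """Return {(ancestry, interval_class): {group: group mean - baseline}}."""
    # Bucket metric values by stratum first, then by group within the stratum,
    # so groups are only ever compared against samples of the same ancestry.
    strata = defaultdict(lambda: defaultdict(list))
    for group, ancestry, interval_class, value in rows:
        strata[(ancestry, interval_class)][group].append(value)

    offsets = {}
    for stratum, by_group in strata.items():
        group_means = {g: mean(values) for g, values in by_group.items()}
        # Baseline is the unweighted mean of group means, so a large group
        # does not dominate the reference point.
        baseline = mean(list(group_means.values()))
        offsets[stratum] = {g: m - baseline for g, m in group_means.items()}
    return offsets


offsets = group_offsets(records)
for stratum, per_group in sorted(offsets.items()):
    print(stratum, per_group)
```

Stratifying before comparing matters because ancestry shifts the metric on its own (the hypothetical AFR samples above carry higher values than the EUR ones), so pooling ancestries would confound population differences with differences between groups.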
Overview
Large-scale WGS consortia aggregate data from multiple sources, each with potentially different instruments, library prep protocols, and bioinformatics pipelines.