Github.com/broadinstitute/batch-e/batch-e

Batch Effect Analysis Pipeline

A general-purpose pipeline for measuring group-level batch effects in whole-genome sequencing (WGS) variant callsets. It compares a user-supplied grouping variable (e.g., sequencing center, instrument, or lab) across genomic interval classes, stratified by genetic ancestry, and runs on Hail on Spark.
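
To make the comparison concrete, here is a minimal pure-Python sketch of the stratified idea described above. It is not the pipeline's Hail code: the record fields (`ancestry`, `interval_class`, `group`, `value`), the function names, and the choice of "largest gap between group means" as the summary are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def group_means_by_stratum(records):
    """Average a per-sample metric per group within each (ancestry, interval class) stratum.

    `records` is an iterable of dicts with the hypothetical keys
    'ancestry', 'interval_class', 'group', and 'value'.
    """
    buckets = defaultdict(list)
    for r in records:
        buckets[(r["ancestry"], r["interval_class"], r["group"])].append(r["value"])

    means = defaultdict(dict)
    for (ancestry, interval_class, group), values in buckets.items():
        means[(ancestry, interval_class)][group] = mean(values)
    return dict(means)


def max_group_gap(stratum_means):
    """Largest difference between group means in each stratum.

    Strata with fewer than two groups are skipped, since there is
    nothing to compare within them.
    """
    return {
        stratum: max(by_group.values()) - min(by_group.values())
        for stratum, by_group in stratum_means.items()
        if len(by_group) >= 2
    }


if __name__ == "__main__":
    # Toy values, invented purely to exercise the functions.
    toy = [
        {"ancestry": "EUR", "interval_class": "exome", "group": "center_A", "value": 10.0},
        {"ancestry": "EUR", "interval_class": "exome", "group": "center_B", "value": 15.0},
    ]
    print(max_group_gap(group_means_by_stratum(toy)))
```

Comparing groups only within an ancestry stratum keeps real population differences from being mistaken for a batch effect. In the actual pipeline this kind of aggregation would presumably be expressed with Hail on a distributed callset rather than with in-memory dictionaries.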

Overview

Large-scale WGS consortia aggregate data from multiple sources, each with potentially different instruments, library prep protocols, and bioinformatics pipelines.

