github.com/broadinstitute/popout/vcf-to-pgen

popout

GPU-accelerated local ancestry inference at biobank scale — no reference panel required.

Feed it phased whole-genome sequencing (WGS) data from a large cohort, and ancestry structure falls out of the joint distribution.

How it works

With 500K+ samples, the data is the reference panel. The pipeline:

  1. SEED — A randomized SVD on a subset of SNPs projects all haplotypes into PCA space. A Gaussian mixture model (GMM) then assigns soft ancestry labels. The number of ancestries is auto-detected from the eigenvalue gap.

  2. INIT — Allele frequenc
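To make the SEED step (step 1 above) concrete, here is a minimal NumPy sketch on simulated haplotypes. It is not popout's code: every function name is hypothetical, it runs on the CPU, the mixture model is simplified to spherical covariances, and the eigengap rule (largest ratio between consecutive eigenvalues) is only one plausible reading of "auto-detected from the eigenvalue gap".

```python
import numpy as np

def simulate_haplotypes(n_per_pop=100, n_snps=500, n_pops=3, seed=0):
    """Toy data (assumption): binary haplotypes drawn from per-population allele frequencies."""
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.1, 0.9, size=(n_pops, n_snps))
    labels = np.repeat(np.arange(n_pops), n_per_pop)
    haps = (rng.random((labels.size, n_snps)) < freqs[labels]).astype(np.float64)
    return haps, labels

def randomized_svd(X, rank, n_oversample=10, n_iter=4, seed=0):
    """Approximate top-`rank` SVD via a random range finder with power iterations."""
    rng = np.random.default_rng(seed)
    k = min(rank + n_oversample, min(X.shape))
    Q = np.linalg.qr(X @ rng.standard_normal((X.shape[1], k)))[0]
    for _ in range(n_iter):  # power iterations sharpen the spectrum
        Q = np.linalg.qr(X.T @ Q)[0]
        Q = np.linalg.qr(X @ Q)[0]
    U_small, s, Vt = np.linalg.svd(Q.T @ X, full_matrices=False)
    return (Q @ U_small)[:, :rank], s[:rank], Vt[:rank]

def n_ancestries_from_eigengap(s):
    """Place the gap at the largest ratio of consecutive eigenvalues.

    On centered data, K clusters yield K-1 strong components, hence the +2.
    """
    eig = s ** 2
    ratios = eig[:-1] / eig[1:]
    return int(np.argmax(ratios)) + 2

def gmm_soft_labels(Z, k, n_iter=50, seed=0):
    """EM for a spherical-covariance GMM; returns (n, k) responsibilities."""
    rng = np.random.default_rng(seed)
    n, d = Z.shape
    # Farthest-point initialisation of the component means.
    means = [Z[rng.integers(n)]]
    for _ in range(k - 1):
        d2 = np.min([((Z - m) ** 2).sum(axis=1) for m in means], axis=0)
        means.append(Z[np.argmax(d2)])
    means = np.array(means)
    var = np.full(k, Z.var())
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: normalised posterior probability of each component.
        d2 = ((Z[:, None, :] - means[None]) ** 2).sum(axis=-1)
        log_p = np.log(weights) - 0.5 * d * np.log(2 * np.pi * var) - 0.5 * d2 / var
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update weights, means, and per-component variance.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ Z) / nk[:, None]
        d2 = ((Z[:, None, :] - means[None]) ** 2).sum(axis=-1)
        var = (resp * d2).sum(axis=0) / (nk * d) + 1e-6
    return resp

def seed_stage(haps, rank=10):
    """Center, project with randomized SVD, pick K, and soft-assign haplotypes."""
    X = haps - haps.mean(axis=0)
    U, s, _ = randomized_svd(X, rank=rank)
    k = n_ancestries_from_eigengap(s)
    Z = U[:, :k - 1] * s[:k - 1]  # haplotype coordinates in PCA space
    return k, gmm_soft_labels(Z, k)

haps, truth = simulate_haplotypes()
k, resp = seed_stage(haps)
print("detected ancestries:", k)
print("cluster sizes:", np.bincount(resp.argmax(axis=1), minlength=k))
```

Each row of `resp` is a soft label: the posterior probability that a haplotype belongs to each detected ancestry. A real pipeline at biobank scale would need a GPU-backed, out-of-core version of the same idea.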


This is a companion discussion topic for the original entry at github.com/broadinstitute/popout/vcf-to-pgen