popout
GPU-accelerated local ancestry inference at biobank scale — no reference panel required.
Feed it phased WGS from a large cohort and ancestry structure falls out of the joint distribution.
How it works
With 500K+ samples, the data is the reference panel. The pipeline:
- SEED: Randomized SVD on a SNP subset projects all haplotypes into PCA space. A Gaussian mixture model (GMM) assigns soft ancestry labels. The number of ancestries is auto-detected from the eigenvalue gap.
- INIT: Allele frequenc
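The SEED step can be illustrated with a small CPU-only sketch. This is not popout's code: the function names (`randomized_svd`, `n_ancestries_from_gap`, `soft_labels`), the simulated haplotypes, the diagonal-covariance GMM, and the rule that K ancestries produce K-1 large eigenvalues are all assumptions made for illustration. The real pipeline runs on GPU and may differ in every detail.

```python
import numpy as np

def randomized_svd(X, rank, n_oversample=10, n_power=4, seed=0):
    """Approximate top-`rank` SVD of X using a random range finder."""
    rng = np.random.default_rng(seed)
    Y = X @ rng.standard_normal((X.shape[1], rank + n_oversample))
    for _ in range(n_power):          # power iterations sharpen the spectrum
        Q, _ = np.linalg.qr(Y)
        Q, _ = np.linalg.qr(X.T @ Q)
        Y = X @ Q
    Q, _ = np.linalg.qr(Y)            # orthonormal basis for the sampled range
    U_small, s, Vt = np.linalg.svd(Q.T @ X, full_matrices=False)
    return (Q @ U_small)[:, :rank], s[:rank], Vt[:rank]

def n_ancestries_from_gap(s, n_rows):
    """Assumed heuristic: the largest eigenvalue ratio separates signal from noise."""
    eig = s ** 2 / (n_rows - 1)
    gap = int(np.argmax(eig[:-1] / eig[1:]))  # largest drop sits after PC `gap`
    return gap + 2                            # (gap + 1) signal PCs -> gap + 2 groups

def soft_labels(Z, k, n_iter=50):
    """Diagonal-covariance GMM fitted by EM; returns an (n, k) responsibility matrix."""
    n, _ = Z.shape
    centers = [Z[0]]                          # farthest-point initialisation
    for _ in range(k - 1):
        d2 = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d2)])
    mu = np.array(centers)
    var = np.tile(Z.var(axis=0), (k, 1))
    pi = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: per-component log density, then normalise across components
        log_p = (np.log(pi)
                 - 0.5 * np.log(2 * np.pi * var).sum(axis=1)
                 - 0.5 * (((Z[:, None, :] - mu) ** 2) / var).sum(axis=2))
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update weights, means, and per-dimension variances
        nk = resp.sum(axis=0) + 1e-9
        pi = nk / n
        mu = (resp.T @ Z) / nk[:, None]
        var = (resp.T @ Z ** 2) / nk[:, None] - mu ** 2 + 1e-6
    return resp

# Simulated data (assumption): 3 populations, 200 haplotypes each, 800 SNPs.
rng = np.random.default_rng(0)
n_per, n_snps, k_true = 200, 800, 3
base = rng.uniform(0.2, 0.8, size=n_snps)
freqs = np.clip(base + rng.normal(0, 0.15, size=(k_true, n_snps)), 0.05, 0.95)
truth = np.repeat(np.arange(k_true), n_per)
H = (rng.random((k_true * n_per, n_snps)) < freqs[truth]).astype(np.float64)

snp_idx = rng.choice(n_snps, size=400, replace=False)  # the "SNP subset"
X = H[:, snp_idx] - H[:, snp_idx].mean(axis=0)         # centre each SNP column
U, s, _ = randomized_svd(X, rank=10)
k = n_ancestries_from_gap(s, X.shape[0])
Z = U[:, :k - 1] * s[:k - 1]                           # haplotypes in PCA space
resp = soft_labels(Z, k)                               # soft ancestry labels
print("detected ancestries:", k, "label matrix shape:", resp.shape)
```

Each row of `resp` sums to 1 and can be read as a per-haplotype soft ancestry assignment. The eigenvalue-ratio rule is only one way to read an "eigenvalue gap"; it has no sensible answer for a single homogeneous population, so treat it as a placeholder for whatever criterion the project actually uses.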
This is a companion discussion topic for the original entry at github.com/broadinstitute/popout/vcf-to-pgen