Github.com/broadinstitute/popout/filter-pgen

popout

GPU-accelerated local ancestry inference at biobank scale — no reference panel required.

Feed it phased WGS from a large cohort and ancestry structure falls out of the joint distribution.

How it works

With 500K+ samples, the data is the reference panel. See docs/THEORY.md
for the full mathematical treatment. The pipeline:

  1. SEED — Randomized SVD on a SNP subset projects all haplotypes into PCA space. A Gaussian mixture model (GMM) then assigns soft ancestry labels. Number of ance
