Kevin Bird's fatal flaw and why “Fst” can fool you when looking for genes that make populations different
Imagine two neighbouring villages, A and B.
In village A most people are tall; in village B most people are short.
You want to know whether natural selection made the villages diverge in height, and you hope to spot the genes involved.
A common first step is to scan the whole genome and ask: “Are the DNA-letter differences that influence height more different between villages than the rest of the genome?”
The yard-stick most labs still reach for is Fst, a number that runs from 0 (no difference) to 1 (complete difference).
If the height-related SNPs have higher Fst than random SNPs, you shout “eureka” and publish.
That sounds sensible—until you realise Fst can stay tiny even when selection has been strong.
Below is a plain-language tour of why that happens, and what to do instead.
1. Polygenic traits: a crowd, not a hero
Height, blood pressure, years of schooling and many other traits are polygenic: hundreds or thousands of genetic variants each nudge the trait up or down by a millimetre, a point, or half a year.
Under divergent selection (say, village A favours tall people, village B favours short ones) all the “tall” versions move up in frequency in A, while all the “short” versions move up in B.
Crucially, the shift at each single locus is tiny—maybe from 46 % to 54 %.
Fst, which scales with the squared allele-frequency difference between populations, therefore stays small for every individual SNP.
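As a quick arithmetic check on the 46 % versus 54 % example, here is a minimal Python sketch. It assumes Wright's two-population definition, Fst = Var(p) / (p̄(1 − p̄)), with the two villages weighted equally; the function name `fst_two_pops` is illustrative and not taken from any library.

```python
def fst_two_pops(p_a, p_b):
    """Wright's Fst for one biallelic SNP in two equally weighted populations."""
    p_bar = (p_a + p_b) / 2                                  # mean allele frequency
    var_p = ((p_a - p_bar) ** 2 + (p_b - p_bar) ** 2) / 2    # variance of frequencies between populations
    return var_p / (p_bar * (1 - p_bar))                     # scale by p_bar * (1 - p_bar)

fst = fst_two_pops(0.54, 0.46)
print(round(fst, 4))  # 0.0064
```

An eight-point frequency gap therefore gives an Fst well under 0.01, which is easy to lose among neutral SNPs.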
2. The hidden orchestra: allelic covariance
The magic happens across loci.
Even though each violin plays only a little louder, the entire orchestra is now perfectly in tune.
Geneticists call this harmony allelic covariance (or between-population linkage disequilibrium): the tall-increasing alleles at hundreds of scattered sites rise together in village A, and fall together in village B.
The result is a big difference in the average trait value (the Qst, or quantitative divergence) even though Fst at each single site is practically zero.
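To make the orchestra concrete, the sketch below works through a deliberately idealised case: unlinked loci with equal additive effects, Hardy-Weinberg proportions, no linkage disequilibrium within a village, and every trait-increasing allele at 54 % in A and 46 % in B. These numbers and the function name `qst_equal_loci` are assumptions for illustration, not estimates from data. It uses the diploid formula Qst = VB / (VB + 2 VW), with VB taken as the population variance of the two village means.

```python
def qst_equal_loci(n_loci, p_a, p_b, effect=1.0):
    """Qst for a trait built from n_loci identical, independent additive loci."""
    # Expected score: a diploid individual carries 2 * p copies per locus on average.
    mean_a = 2 * n_loci * effect * p_a
    mean_b = 2 * n_loci * effect * p_b
    grand = (mean_a + mean_b) / 2
    v_between = ((mean_a - grand) ** 2 + (mean_b - grand) ** 2) / 2
    # Within-population additive variance: 2 * p * (1 - p) * effect^2 per locus.
    var_a = n_loci * effect ** 2 * 2 * p_a * (1 - p_a)
    var_b = n_loci * effect ** 2 * 2 * p_b * (1 - p_b)
    v_within = (var_a + var_b) / 2
    return v_between / (v_between + 2 * v_within)

for n in (1, 10, 100, 500):
    print(n, round(qst_equal_loci(n, 0.54, 0.46), 4))
```

Under these assumptions a single locus gives a Qst of about 0.0064, the same as its Fst, while 500 coordinated loci give roughly 0.76. The per-SNP Fst is unchanged throughout; only the number of loci moving together differs.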
3. Why the Fst-enrichment test misses the music
The classic Fst enrichment test asks:
“Is the average Fst of my trait-associated SNPs higher than the average Fst of random SNPs?”
But it only listens to one speaker at a time.
If every speaker is only slightly louder, the test will shrug and say “no difference”, giving a false negative.
Meanwhile the whole choir is singing fortissimo.
This concept was formalized by Le Corre and Kremer (2012), yet most geneticists have ignored it. To be sure, Berg and Coop (2014) introduced a selection test based on this theory, but Coop later rejected its validity because of environmental confounding in GWAS.
Real-world examples:
Forest trees (2007-2010 studies): Candidate genes for drought tolerance showed Fst barely above neutral markers, yet the Qst for drought tolerance was large.
Human height and education: Factor analysis of thousands of SNPs shows strong allelic covariance across world populations, but the per-SNP Fst is still tiny (Piffer, 2023).
4. No garden? No problem. How to estimate Qst with polygenic scores
Traditional Qst requires raising individuals from different populations in a common-garden experiment to strip away environmental noise.
With large-scale genomics we can now skip the greenhouse and compute a “polygenic Qst” in three simple steps.
Compute polygenic scores (PGS) for every individual in each population.
Compute variance components:
PGS variance between populations: VB = Var(mean PGS of each population)
PGS variance within populations: VW = mean(Var(PGS within each population))
Calculate polygenic Qst:
Qst_PGS = VB / (VB + 2 VW)
The factor 2 appears because we are dealing with diploid genotypes.
VB captures the genetic differences in the trait that selection could have created.
VW captures the remaining genetic variation that is still segregating inside each population.
Environmental noise is largely sidestepped because the PGS is built only from DNA.
The estimate is only as good as the GWAS effect sizes (mostly European samples today) and assumes the same genetic architecture across populations. Still, it is a garden-free, scalable way to ask: Has divergent selection shifted this polygenic trait between groups?
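The three steps can be sketched in Python as follows. This is a minimal illustration, not the pipeline used in Piffer (2023): the function `qst_pgs`, the dictionary layout, and the toy scores are all hypothetical, and it assumes VB is the population variance of the group means and VW is the unweighted mean of the within-group variances.

```python
from statistics import fmean, pvariance

def qst_pgs(scores_by_pop):
    """Polygenic Qst = VB / (VB + 2 * VW) from per-individual polygenic scores.

    scores_by_pop maps a population label to a list of PGS values.
    """
    groups = list(scores_by_pop.values())
    pop_means = [fmean(scores) for scores in groups]
    v_between = pvariance(pop_means)                           # VB: variance of population means
    v_within = fmean([pvariance(scores) for scores in groups]) # VW: mean within-population variance
    return v_between / (v_between + 2 * v_within)

# Made-up scores, purely to show the call.
toy = {
    "A": [1.0, 2.0, 3.0],
    "B": [3.0, 4.0, 5.0],
}
print(qst_pgs(toy))
```

With real data one would likely weight populations by sample size and attach a confidence interval; both are left out here to keep the formula visible.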
When I did this, I found out that Qst was much larger than Fst, and much larger than Qst calculated using randomly shuffled GWAS effect sizes (Piffer, 2023):
Whereas global Fst is typically around 0.1, Qst for EA3 was 0.58 and for EA4 it was 0.91.
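The comparison against shuffled effect sizes can be sketched as a permutation test. The code below is only one plausible reading of that idea, not the exact procedure in Piffer (2023): it computes Qst from allele frequencies under Hardy-Weinberg proportions and no within-population LD, and every name and number in it (`expected_qst`, `shuffle_null`, the 20 toy SNPs) is hypothetical.

```python
import random

def expected_qst(freqs_by_pop, betas):
    """Qst of a polygenic score computed from allele frequencies alone.

    Assumes HWE and no LD within populations, so the expected score is
    sum(2 * beta * p) and its variance is sum(beta^2 * 2 * p * (1 - p)).
    """
    means, variances = [], []
    for freqs in freqs_by_pop:
        means.append(sum(2 * b * p for b, p in zip(betas, freqs)))
        variances.append(sum(b * b * 2 * p * (1 - p) for b, p in zip(betas, freqs)))
    grand = sum(means) / len(means)
    v_between = sum((m - grand) ** 2 for m in means) / len(means)
    v_within = sum(variances) / len(variances)
    return v_between / (v_between + 2 * v_within)

def shuffle_null(freqs_by_pop, betas, n_perm=200, seed=0):
    """Qst values obtained after randomly reassigning effect sizes to SNPs."""
    rng = random.Random(seed)
    null = []
    for _ in range(n_perm):
        shuffled = betas[:]
        rng.shuffle(shuffled)
        null.append(expected_qst(freqs_by_pop, shuffled))
    return null

# Made-up example: 20 SNPs whose effect signs line up with the frequency shift.
freqs_a = [0.54] * 10 + [0.46] * 10
freqs_b = [0.46] * 10 + [0.54] * 10
betas = [1.0] * 10 + [-1.0] * 10

observed = expected_qst([freqs_a, freqs_b], betas)
null = shuffle_null([freqs_a, freqs_b], betas)
p_value = (1 + sum(q >= observed for q in null)) / (1 + len(null))
print(observed, max(null), p_value)
```

Shuffling keeps the allele frequencies and the set of effect sizes intact but breaks the alignment between effect direction and frequency shift, which is the part a signal of divergent selection depends on.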
Why Bird’s IQ conclusion flies off course
Picture a 100-metre sprint.
If every runner’s stride length is only 1 cm longer in population A than in population B, the per-locus Fst—the difference in the “stride-length gene” frequency—is tiny. But if 500 such genes all shift in the same direction, population A’s average sprint time (the trait) can be markedly faster. The genetic differentiation of the trait (Qst) is now much larger than the per-gene differentiation (Fst). This is what happens for polygenic traits, as Berg and Coop (2014) demonstrated.
Bird (2021) missed this concept. Here is what went wrong:
He renamed Qst “phenotypic Fst” and then demanded it equal Fst.
Under simple neutrality we do expect Qst = Fst. But divergent selection routinely yields Qst > Fst, exactly the pattern we later show for educational attainment and height. Calling the trait-level divergence “Fst” and then declaring the gene-level Fst “too small” is like insisting the orchestra’s volume must equal the volume of a single violin.
He forgot the choir effect of cross-population LD.
When many small-effect alleles move in lock-step between populations, their covariance inflates the trait variance between groups while each allele’s individual Fst remains modest.
His yard-stick was half the length.
Bird compared his “phenotypic Fst” (≈ 0.6) to EUR-AFR Fst of 0.11 and concluded genetics can, at best, explain 4.7–8.5 IQ points. Once we replace his mis-named statistic with the correct Qst = 0.61, the genetic contribution aligns almost perfectly with the observed 30-point gap, with no environmental deus ex machina required.
Empirical check.
Using the EA3 polygenic score and accounting for allelic covariance, we compute Qst ≈ 0.61, the same magnitude Bird obtained for IQ. Far from refuting a genetic explanation, his own numbers, once interpreted correctly, support it.
Bottom line: Qst ≠ Fst for polygenic traits under selection. Bird’s paper is therefore not evidence against a genetic contribution to group differences; it is a textbook illustration of why confusing the violin’s volume with the orchestra’s leads to the wrong crescendo.
🧠 Bird’s Conceptual Mistake: Treating Qst as a Genetic Estimate of Fst
Bird did not merely mislabel Qst as “Fst.” He conceptually misused the formula by assuming that multiplying a phenotypic mean difference by heritability yields an estimate of genetic differentiation (Fst). This is fundamentally wrong because:
Fst ≠ Qst × h²
Fst measures allele frequency divergence at individual loci.
Qst measures trait-level divergence under an additive infinitesimal model.
There is no general identity relating Fst to Qst × h² unless extremely restrictive assumptions hold (e.g., neutrality, equal effect sizes, no LD, no dominance/epistasis, etc.).
Heritability Adjusts Variance, Not Differentiation
Bird’s calculation, "Genetic difference = phenotypic difference × h²", only converts phenotypic between-group variance into additive genetic between-group variance.
But genetic variance ≠ genetic differentiation (Fst).
Example: Two populations could have Fst ≈ 0 at every locus (only minute allele-frequency shifts), yet a very large Qst if covariance among loci (cross-population LD) shifts trait means.
Conversely, high Fst at many loci might cancel out (directional effects oppose each other), yielding Qst ≈ 0.
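Both halves of this example can be checked numerically. The sketch below is a toy calculation under stated assumptions (500 loci with equal additive effects, Hardy-Weinberg proportions, no LD within populations, two equally weighted populations); the function names and allele frequencies are made up for illustration.

```python
def fst(p_a, p_b):
    """Single-SNP Fst for two equally weighted populations."""
    p_bar = (p_a + p_b) / 2
    return ((p_a - p_b) / 2) ** 2 / (p_bar * (1 - p_bar))

def qst(freqs_a, freqs_b, betas):
    """Trait-level Qst = VB / (VB + 2 VW) under HWE and no within-population LD."""
    def mean_and_var(freqs):
        mean = sum(2 * b * p for b, p in zip(betas, freqs))
        var = sum(b * b * 2 * p * (1 - p) for b, p in zip(betas, freqs))
        return mean, var
    mean_a, var_a = mean_and_var(freqs_a)
    mean_b, var_b = mean_and_var(freqs_b)
    v_between = ((mean_a - mean_b) / 2) ** 2   # population variance of two means
    v_within = (var_a + var_b) / 2
    return v_between / (v_between + 2 * v_within)

n = 500
betas = [1.0] * n

# Scenario 1: tiny frequency shifts, all in the same direction.
aligned_a, aligned_b = [0.54] * n, [0.46] * n
# Scenario 2: large shifts, but half of the loci favour A and half favour B.
opposed_a = [0.8] * (n // 2) + [0.2] * (n // 2)
opposed_b = [0.2] * (n // 2) + [0.8] * (n // 2)

print("aligned:", round(fst(0.54, 0.46), 4), round(qst(aligned_a, aligned_b, betas), 2))
print("opposed:", round(fst(0.8, 0.2), 4), round(qst(opposed_a, opposed_b, betas), 2))
```

In the aligned scenario each SNP has an Fst of roughly 0.006 while Qst comes out near 0.76; in the opposed scenario each SNP has an Fst of 0.36 while Qst is essentially zero, because the shifts cancel in the trait mean.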
Ignoring Cross-Population LD
Bird’s formula assumes independence of loci, but polygenic adaptation often works via coordinated frequency shifts (covariance).
This inflates Qst relative to Fst, making Qst >> Fst even when single-locus Fst is tiny.
Misinterpreting the Null Expectation
Under neutrality, Qst = Fst only if:
All loci have equal, additive effects.
No LD or epistasis.
No selection.
Since none of these hold for complex traits, Qst > Fst is expected under divergent selection.
Bird treated Qst > Fst as evidence against genetic effects, when in reality it supports them.
✅ Correct Interpretation
Bird’s calculation estimates Qst (trait divergence), not Fst (allele divergence).
Qst >> Fst is predicted for polygenic traits under selection, so his result does not refute genetic contributions.
The real question is whether Qst matches the value expected under selection, not whether Qst = Fst.
Thus, the flaw is conceptual: he equated additive genetic variance between groups with allele frequency differentiation, which only holds under simplistic, unrealistic assumptions.
References
Berg JJ, Coop G (2014). "A Population Genetic Signal of Polygenic Adaptation." PLoS Genet. 10(8):e1004412. doi:10.1371/journal.pgen.1004412.
Bird KA (2021). "No support for the hereditarian hypothesis of the Black–White achievement gap using polygenic scores and tests for divergent selection." Am J Phys Anthropol. 175(2):465–476. doi:10.1002/ajpa.24216.
Le Corre V, Kremer A (2012). "The genetic differentiation at quantitative trait loci under local adaptation." Mol Ecol. 21(7):1548–1566. doi:10.1111/j.1365-294X.2012.05479.x.
Piffer D (2023). "Signals of Human Polygenic Adaptation: Moving Beyond Single-Gene Methods and Controlling for Population-Specific Linkage Disequilibrium." Qeios. doi:10.32388/HDJK5P.2.
Can we raise IQ eventually, then, or what are the current hurdles? Apologies if it sounds stupid, I'm not a geneticist.
Great article, David. The one minor thing that got under my skin a little was the orchestra analogy. Sound is logarithmic, not linear. So eg 10 people playing slightly louder would only increase the apparent sound level by something like 2x rather than 10x that individual increase.
I found the stride length analogy to be perfect, even if it doesn't have the same artistic panache as the orchestra.
I tried to think of other additive analogies, but nothing really better than your runner off the top of my head.
I asked the AI and this analogy stood out as both additive and artistically beautiful:
"Each thread in a tapestry adds only a sliver of color—but when thousands of threads are aligned just right, a complex, coherent image emerges. If just one thread changes color, the difference is imperceptible. But if thousands of them shift slightly, the whole image transforms."
Why it works: Additive, intuitive, beautiful. It gets across both accumulation and pattern, just like polygenic traits.
Especially strong for traits like IQ or personality, where what emerges is qualitative from many quantitative increments.