How Simpson’s Paradox Created the “Aryan” Myth
Selection, geography, and the illusion of an Aryan package
In an earlier post, I discussed what I called the Aryan paradox. The paradox arises from the close association, in modern Europe, between Steppe ancestry and a set of traits traditionally labeled “Aryan”.
By “the Steppe” I don’t mean some vague grassland anywhere on Earth. I mean the Pontic–Caspian steppe: the belt of open grasslands north of the Black Sea and the Caspian Sea, stretching across what is now southern Ukraine and southern Russia and into the broader Eurasian steppe zone. This is the region where the best-known early “Steppe ancestry” source populations—often associated with Yamnaya-related groups—lived during the Late Neolithic and Early Bronze Age.
These Steppe groups were primarily pastoralists, with economies centered on herding (cattle, sheep/goats) and high mobility across open terrain. Over the Bronze Age, Steppe-derived societies are also linked to key technological and cultural developments in Eurasia, including wheeled vehicles and, later, chariot warfare (though the classic light chariot horizon is later than the earliest Yamnaya expansion and is not the main driver of the first Steppe ancestry pulse into Europe).
Linguistically, the Pontic–Caspian steppe is widely regarded as the most plausible homeland of Proto-Indo-European, the ancestral language that later diversified into branches spread across Europe and parts of Asia, and these Steppe populations are regarded as its most likely early speakers. The Indo-Aryans, historically attested in South Asia, are one descendant branch of this wider Indo-European expansion. That is why, in popular imagination and older ethnolinguistic usage, Steppe ancestry sometimes gets treated as the biological substrate of “Aryanness,” even though “Aryan” is, in its original scholarly sense, strictly a linguistic and cultural label.
If we look at Europe today, populations with more Steppe ancestry tend to be taller, blonder, lighter-skinned, and more likely to have blue eyes. It is therefore tempting to conclude that the Steppe migrations brought a coherent Aryan package of phenotypes into Europe, and that modern northern Europeans simply preserved these traits from their Steppe ancestors.
Ancient DNA complicates this story. The Steppe populations themselves were not particularly blond or light-skinned by modern northern European standards. Yet today, the regions with the highest Steppe ancestry are also the regions where these traits reach their highest frequencies.
The resolution of the paradox, I argued, is recent selection. Steppe ancestry correlates with these traits today not because the Steppe introduced them in finished form, but because Steppe-rich populations ended up living in northern, low-UV environments, where selection continued to favor lighter pigmentation for thousands of years after the Bronze Age.
In this post, I show that lactase persistence follows the same pattern, and I explain why this constitutes a genuine instance of Simpson’s paradox in population genetics.


