Does Ancient DNA Track Human Progress, or Just Time?
This post is a follow-up to my earlier essay, “Do Genetically Smarter Populations Climb the Civilization Ladder Earlier?”, which first explored whether educational-attainment polygenic scores in ancient DNA track not just time, but broad differences in civilizational development. The question remains the same, but the analysis is now much stronger. I revisit it using the far larger AADR v66 2M release, cleaner archaeological coding, stricter data-quality filters, and more rigorous statistical models designed to separate chronology from civilization more cleanly than before.
Introduction
Ancient DNA contains a surprise. When researchers extract polygenic scores linked to educational attainment from prehistoric skeletons, the scores rise through time (Piffer & Kirkegaard, 2024; Piffer, 2025). Hunter-gatherers score lower than early farmers, early farmers lower than Bronze Age populations, and so on. The standard interpretation is that this reflects something about the passage of time itself, such as selection, migration, demographic turnover, or some mixture of all three.
But time is not the only thing changing across that span. Social and technological complexity are changing too. A Paleolithic forager band and an Iron Age kingdom are not just separated by millennia. They are separated by farming, writing, cities, specialization, storage, hierarchy, and the entire accumulated weight of what we loosely call civilization. So here is the question worth asking: when those polygenic scores rise through ancient history, are we really just watching a clock tick, or are we watching something track the emergence of more complex ways of organizing human life?
In an earlier post I used ancient individuals from the AADR dataset to assign each archaeological period a civilization-stage score and ask whether that score predicts educational-attainment polygenic scores even after controlling for absolute date. In practice, that meant coding Paleo-Mesolithic groups as 1, Neolithic groups as 2, Copper Age groups as 3, Bronze Age groups as 4, and Iron Age groups as 5.
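The coding scheme just described can be sketched as a simple lookup. This is a minimal illustration, not the actual pipeline: the numeric codes come from the text, but the exact label strings and the `civilization_stage` helper are assumptions made for this example.

```python
# Stage codes as described in the text. The label spellings are assumed;
# real metadata labels would need to be normalized to these strings first.
STAGE_CODES = {
    "Paleo-Mesolithic": 1,
    "Neolithic": 2,
    "Copper Age": 3,
    "Bronze Age": 4,
    "Iron Age": 5,
}


def civilization_stage(period_label):
    """Return the stage code for a period label, or None if it is not coded."""
    return STAGE_CODES.get(period_label.strip())


# Hypothetical usage: unknown labels map to None rather than a guessed stage.
bronze = civilization_stage("Bronze Age")
unknown = civilization_stage("Medieval")
```

Returning `None` for labels outside the five coded periods keeps uncoded samples out of the analysis instead of silently assigning them a stage.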
This follow-up revisits that question using corrected archaeological labels, the much larger AADR v66 2M dataset, and a stronger emphasis on the subset of samples whose period labels can be assigned directly from metadata rather than reconstructed from looser fallback rules. The point is not to abandon the original question. It is to ask whether the same broad pattern survives when the labels are cleaner and the sample is larger.
The leverage comes from a basic fact of world prehistory: civilization and chronology do not move in lockstep. Greece had farming millennia before Britain. Some populations at the same date occupied very different social worlds. That mismatch is what makes it possible to pull the two apart and ask which one is actually doing the work.
A time trend by itself could reflect selection, migration, demography, drift, or some mixture of all of them. The harder question is whether the relevant axis is not just time, but civilization.
If you take two ancient populations from different social worlds, but do not let chronology do all the explanatory work, does their place on the civilizational ladder still matter?
The modern world has made one thing hard to miss. Cognitive performance, schooling, institutional complexity, and economic development cluster together. But once we move back into deep history, the picture becomes much hazier. We still know remarkably little about whether the genetic variants associated with educational attainment track not just the passage of time, but the emergence of more complex forms of social organization.
Gregory Clark argued that preindustrial societies may have undergone differential reproduction in ways that slowly shifted the distribution of traits favorable to economic success. Galor and Moav built a related idea into a formal evolutionary growth framework. In their model, the Malthusian world did not simply hold humanity down. It also created selection pressures that gradually favored traits complementary to human capital, technology, and later economic takeoff.
I am not testing their mechanism directly, but the question points in the same direction. If traits linked to educational attainment were historically relevant to the capacity of populations to sustain more complex forms of life, then archaeological stage should retain some predictive power even after absolute time is held constant. In that sense, the civilization-stage score is crude, but it is also useful. It is a rough proxy for where a population stood in the long transition from simple subsistence regimes to more knowledge-intensive and organizationally demanding societies.
At any given date, some populations were still living in relatively simple social worlds while others had already crossed into much more complex ones. And populations grouped into the same broad archaeological stage could be separated by very large spans of time. That mismatch is what makes the whole exercise possible. Once we exploit the fact that date and stage are misaligned, we can ask the real question: when educational-attainment polygenic scores rise through ancient time, are we just looking at chronology, or are we also looking at civilization?
The key identification fact is simple: archaeological stage and absolute date do not move in lockstep, as shown in Fig. 1.
Figure 1. Chronology and Civilization Stage Do Not Move in Lockstep

Brief Methods Note
This follow-up uses the AADR v66 2M panel. After excluding modern individuals, archaic and non-human samples, and groups lacking the variables needed for the model, the full ancient analysis set contains 12,221 individuals. The stricter metadata-only subset, in which archaeological periods come directly from metadata rather than heuristic assignment, contains 8,069 individuals. The main specification is a weighted mixed model with ancestry PCs, coverage, latitude, absolute date, and a random intercept for merged archaeological group (`Group_base`). The headline robustness checks focus on the metadata-only subset and on the window restricted to samples no older than 12,000 years BP (<=12k BP).
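One way to write this specification down, as a sketch rather than the exact fitted model: the covariates and the random intercept come from the description above, while treating the civilization-stage score as the focal predictor (following the design of the earlier post), the number of ancestry PCs $K$, and the precise form of the weights $w_{ij}$ are assumptions made here for illustration. Here $i$ indexes individuals and $j$ indexes `Group_base`.

```latex
\begin{aligned}
\mathrm{PGS}_{ij} &= \beta_0 + \beta_S\,\mathrm{Stage}_{ij} + \beta_D\,\mathrm{Date}_{ij}
  + \sum_{k=1}^{K}\gamma_k\,\mathrm{PC}_{k,ij}
  + \beta_C\,\mathrm{Coverage}_{ij} + \beta_L\,\mathrm{Latitude}_{ij}
  + u_j + \varepsilon_{ij},\\
u_j &\sim \mathcal{N}\!\left(0,\ \sigma_u^2\right), \qquad
\varepsilon_{ij} \sim \mathcal{N}\!\left(0,\ \sigma^2 / w_{ij}\right)
\end{aligned}
```

Under this reading, the quantity of interest is $\beta_S$: the association between stage and polygenic score that remains once date, ancestry, coverage, and latitude are held fixed.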
The focal trait is educational attainment because it is the best-powered available polygenic index for the broad cluster of abilities and dispositions that modern societies reward as human capital. Delay discounting was included for a different reason: as an exploratory proxy for the non-cognitive side of human capital, especially future orientation and willingness to defer immediate reward. Height was included as a comparison trait rather than a substantive target. If Height were flat while EA remained strongly structured, that would make the EA result easier to interpret as trait-specific rather than as a reflection of background structure.
Results
The first result is the one that motivated this entire line of inquiry: educational-attainment polygenic scores still rise through time in the new dataset. More recent ancient samples tend to have higher EA scores than older ones. That part of the story survives the larger AADR release, corrected labels, and score-specific weighting.
But the more important result is that chronology does not seem to exhaust the signal. The cleanest way to show that is to treat archaeological stage as an ordered outcome and ask whether higher EA scores predict assignment to later stages once absolute date, ancestry PCs, coverage, and latitude are controlled. In those ordered-probit models, the answer is yes. The association survives not just in the metadata-only subset, but also in the stricter metadata-plus-direct-date and metadata-plus-low-uncertainty subsets.
The main result is not simply that later samples score higher on EA. It is that, net of date and ancestry controls, higher EA scores are associated with later archaeological stage, and that relationship is more robust than the corresponding Height association in the stricter subsets.
The headline specification treats archaeological stage as an ordered outcome rather than a continuous score.
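To make the ordered-outcome idea concrete, here is a minimal sketch of how an ordered-probit model turns a latent index into probabilities for the five stages. The cutpoints and index values are hypothetical placeholders chosen for illustration, not estimates from this analysis, and the function names are invented for this example. In the actual models the index would be a weighted sum of the EA score, absolute date, ancestry PCs, coverage, and latitude.

```python
from statistics import NormalDist


def stage_probabilities(linear_index, cutpoints):
    """Return P(stage = k) for each ordered stage under an ordered probit.

    P(stage <= k) = Phi(cutpoint_k - linear_index), where Phi is the
    standard normal CDF and cutpoints are sorted in increasing order.
    """
    phi = NormalDist().cdf
    # Cumulative probabilities at each cutpoint, padded with 0 and 1.
    cumulative = [0.0] + [phi(c - linear_index) for c in cutpoints] + [1.0]
    return [cumulative[k + 1] - cumulative[k] for k in range(len(cutpoints) + 1)]


def expected_stage(probabilities):
    """Probability-weighted mean stage, with stages numbered from 1."""
    return sum((k + 1) * p for k, p in enumerate(probabilities))


# Hypothetical cutpoints separating stages 1..5 (illustrative values only).
CUTPOINTS = [-1.5, -0.5, 0.5, 1.5]

# Two hypothetical individuals who differ only in their latent index.
low = stage_probabilities(-0.5, CUTPOINTS)
high = stage_probabilities(0.5, CUTPOINTS)
```

A positive coefficient on EA raises the latent index, which shifts probability mass toward later stages. That shift, net of the date and ancestry terms in the index, is what the ordered-probit association refers to.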
Table 1. Trait effects and Years BP controls across dating-quality subsets.

The real question is whether these results hold up once you start tightening the screws: stricter dating, metadata-only classifications, comparison traits, and alternative model specifications. That is where this either becomes interesting or falls apart.

