PifferPilfer

Who are the Roma Gypsies? What Their DNA Tells Us

Davide Piffer
Jun 28, 2026

The Roma have historically been a difficult population to map genetically due to a lack of dense genomic data and a complex, highly disrupted migratory history. They are neither a standard European population nor a South Asian group simply transplanted westward. Instead, their genetic profile is defined by a severe historical founder event out of South Asia, followed by centuries of admixture and isolation inside Europe.

The genomic data used in this analysis come from Font-Porterias et al. (2019), which provides high-resolution genome-wide data for Iberian and other European Roma groups. This dataset allows us to move past broad global comparisons and compare the Roma directly with the populations they historically lived alongside or originated from: southern and southeastern Europeans, South Asians, and West and Central Asians.

Based on anthropological observations and cultural history, we can form clear baseline expectations for these comparisons. Genetically, we would expect to see signals matching their distinct physical profile, specifically shorter stature and darker skin pigmentation relative to host European populations. Behaviorally, their long-standing cultural customs of insularity and historically low rates of formal schooling predict lower polygenic scores for educational attainment.

By testing these expectations with and without adjustment for ancestry, we can see where trait differences track genome-wide ancestry, as expected under neutral genetic drift, and where they diverge from it.

The Roma Effect

To evaluate these differences, the analysis fits an individual-level regression model separately for each polygenic score. Running each model both with and without ancestry controls helps separate differences that reflect broad geographic ancestry from those that may reflect positive selection.

The baseline, unadjusted model measures the raw difference in polygenic score between the Roma and the non-Roma comparison panel, in standard deviation units. This captures the total observed difference in the score.

The second model adds the first five genetic principal components (PCs) to control for population structure. By absorbing variation that tracks deep geographic origin, the ancestry-adjusted model shows whether a score difference vanishes, indicating that it is a byproduct of ancestry, or persists, which could signal directional selection.
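A minimal sketch of the two models in Python, assuming the polygenic score is standardized over the pooled sample, group membership is coded as a 0/1 indicator, and the principal components are supplied as a precomputed array. The helper `group_effect` and its parameters are hypothetical names for illustration, not the code used in this analysis. With no PCs, the coefficient on the indicator equals the raw mean difference in SD units; with PCs, it is the difference conditional on ancestry.

```python
import numpy as np


def group_effect(pgs, in_group, pcs=None):
    """Return the OLS coefficient on a 0/1 group indicator, in pooled-SD units.

    Hypothetical helper for illustration. With pcs=None this is the raw
    (unadjusted) mean difference; with pcs it is the ancestry-adjusted one.
    """
    pgs = np.asarray(pgs, dtype=float)
    # Assumption: the score is standardized over the pooled sample.
    z = (pgs - pgs.mean()) / pgs.std(ddof=1)

    # Design matrix: intercept, group indicator, then any principal components.
    columns = [np.ones_like(z), np.asarray(in_group, dtype=float)]
    if pcs is not None:
        pcs = np.asarray(pcs, dtype=float)
        if pcs.ndim == 1:
            pcs = pcs[:, np.newaxis]
        columns.extend(pcs.T)  # one column per PC

    X = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    return beta[1]  # coefficient on the group indicator


# Synthetic data only: the score depends on PC1 alone, and membership is PC1 > 0.
rng = np.random.default_rng(0)
n = 500
pcs = rng.normal(size=(n, 5))  # stand-in for the first five PCs
in_group = pcs[:, 0] > 0
pgs = 2.0 * pcs[:, 0]          # no group effect beyond ancestry

print(group_effect(pgs, in_group))       # sizeable unadjusted difference
print(group_effect(pgs, in_group, pcs))  # essentially zero after adjustment
```

In this synthetic example the adjusted coefficient collapses to about zero because the indicator adds nothing once PC1 is in the model, which is the "byproduct of ancestry" case. `numpy.linalg.lstsq` keeps the sketch dependency-light; a package such as statsmodels would also report standard errors.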

The table below shows whether the Roma mean is higher or lower than the pooled non-Roma mean, and how that contrast changes once the ancestry principal components are added.

© 2026 Davide Piffer