J2a-M158 of Afghanistan, Pakistan, Punjab, Kazakhstan and Eastern Anatolia

One of my clients, who traces his male line to an ancestor with the surname Khan from Ghazni, Afghanistan, commissioned me to analyze the STRs of several scientific-study samples identified as belonging to Y-DNA haplogroup J2a-M158.

The samples

A Pashtun from Kunar, Afghanistan, designated J2a5, appears in this paper:
https://www.nature.com/articles/ejhg201259#MOESM27 – Table 4

Six Hazaras from various cities in Afghanistan are designated as J-M158/J2a5:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3314501 – Table S2

Two men from Turkey (one from NE Turkey, one from SE Turkey) were also found positive for this line, designated J2D-M158 under the old naming convention:
http://evolutsioon.ut.ee/publications/Cinnioglu2004.pdf – Page 140

These samples were tested only for STRs and are not on the YFull YTree. Before discussing the results of the STR analysis, let’s look at what can be established more definitively from the samples that took next-generation sequencing (SNP) tests, three of which are on the YFull YTree.

Bronze Age Diversity Centered Around the Indus Valley Civilization

A tester with the surname Khan, who traces his male line to Ghazni, Afghanistan, took a Big Y test that resulted in a new branch, J-FT323588, being added to the YFull tree under J-M158. His line shares only 2 additional SNPs with a Hazaragi-speaking sample from Pakistan.

If we assume that the third sample on the tree, a Punjabi-speaking sample from India, is from Punjab, then the geography of these three samples, whose lines diverged more than 4,000 years ago, is roughly consistent with the territory of the Indus Valley Civilization (and its periphery) that existed at that time.

There is also an ancient sample listed on FTDNA’s tree as “Butkara 12451, 1000-800 BCE, Swat Valley, Pakistan”. It corresponds to the Gandhara Grave Culture:

“The grave culture has been regarded as a token of the Indo-Aryan migrations but has also been explained by local cultural continuity. Estimates, based on ancient DNA analyses, suggest ancestors of middle Swat valley people mixed with a population coming from the Inner Asian Mountain Corridor, which carried Steppe ancestry, sometime between 1900 and 1500 BCE.”

(Wikipedia’s Gandhara Grave Culture article, citing Narasimhan, Vagheesh M., et al. (2019))

If the ancient sample had predated the arrival of Indo-Aryan speakers, it would prove that at least some J2a-M158 was already living in South Asia before their arrival in the Late Bronze Age. However, at 1000-800 BCE it postdates the estimated 1900-1500 BCE admixture with the Steppe-related population, so this hypothesis is not yet proven through ancient samples.

So I think we shouldn’t get too distracted by the ethnic identities of the STR-tested modern samples, given that this line has most likely been living in South Asia since the Bronze Age. Of course, until ancient DNA is found, a later co-migration of all three samples’ ancestors to this area from somewhere else cannot be ruled out.

J2a-M158 in Afghanistan: More Affiliated with Central Afghanistan than with Any Modern Ethnic Group

At first glance, it may seem straightforward to conclude that J2a-M158 is associated with Hazaras, given that the only positive samples in study PMC3314501, which sampled all of Afghanistan’s ethnic groups, were six Hazaras.

However, the affiliation is actually regional: the J2a-M158 rate among Hazaras is 67% (4/6) in Oruzgan and 9% (1/11) in Ghor, but only 2.3% (1/43) outside these two regions.
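These rates are simple proportions, and with only 6 and 11 Hazaras sampled in Oruzgan and Ghor they carry wide uncertainty. A minimal Python sketch, using the counts quoted above, computes each rate together with a 95% Wilson score interval. The interval is my own illustrative addition (the study does not report one), and the region labels in the code are just names for the three groups.

```python
import math

# Counts of J2a-M158-positive Hazaras per sampling region, as quoted
# in the text from study PMC3314501: (positives, total sampled).
COUNTS = {
    "Oruzgan": (4, 6),
    "Ghor": (1, 11),
    "Elsewhere": (1, 43),
}


def wilson_interval(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a binomial proportion k/n."""
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return max(0.0, centre - half), min(1.0, centre + half)


def regional_rates(counts):
    """Return {region: (rate, (low, high))} for each region."""
    return {
        region: (k / n, wilson_interval(k, n))
        for region, (k, n) in counts.items()
    }


if __name__ == "__main__":
    for region, (rate, (lo, hi)) in regional_rates(COUNTS).items():
        print(f"{region:10s} {rate:6.1%}  95% CI {lo:.1%} to {hi:.1%}")
```

For Oruzgan the interval runs from roughly 30% to 90%, while for Hazaras outside the two regions it tops out around 12%, so even with such small samples the regional contrast does not look like a sampling accident.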

Additionally, we don’t yet have a clear picture of the regional diversity of J2a-M158 in Central Afghanistan, because each of the four provinces surrounding Oruzgan was sampled at most once in this study (Kandahar: one Baloch, haplogroup O; Daykundi: one Hazara, C3; Zabul: not sampled; Helmand: one Pashtun, L1c).

Another interesting takeaway from this study is that among the samples from Oruzgan and neighboring provinces there is not a single R1a1a-M17, one of the major markers associated with the Indo-Aryan migrations. This may indicate that the mountains of Central Afghanistan served, as mountains generally do, as a Y-DNA refugium for indigenous male lines against the Indo-Aryan newcomers, who mostly carried R1a1a-M17. Simply put, the newcomers may have been unwilling or unable to enter these mountains.

Interesting Anatolians

The two J2a-M158 samples from Turkey in the Cinnioglu (2004) study were unfortunately tested at a very low STR resolution, which is not surprising given that the study was published 20 years ago, when STR testing was in its infancy.

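YFull has assessed that J2a-M158 acquired the mutation DYS393 = 12 → 14. Since the two Anatolian samples carry the ancestral value DYS393 = 12, it is possible that they are not fully formed J2a-M158, i.e. that they carry M158 but branched off before all the mutations that define YFull’s J2a-M158 node had accumulated. This would be confirmed if a sample from Turkey is ever found to split J2a-M158 through SNP testing.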
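The DYS393 reasoning can be sketched as a tiny Python check, taking the values from the text (14 as the J2a-M158 value per YFull, 12 as the earlier ancestral value). The helper name and its wording are hypothetical, and an STR value alone is only a hint, since STRs can mutate back and forth; only SNP testing can actually place a sample.

```python
# Values taken from the text: YFull assesses that J2a-M158 acquired
# DYS393 = 12 -> 14, so 12 is the earlier (ancestral) state.
DYS393_ANCESTRAL = 12
DYS393_J2A_M158 = 14


def dys393_hint(value: int) -> str:
    """Hypothetical helper: what a DYS393 value alone suggests.

    STRs can mutate back and forth, so this is only a hint; SNP testing
    is needed to actually place a sample on the tree.
    """
    if value == DYS393_J2A_M158:
        return "derived: consistent with J2a-M158"
    if value == DYS393_ANCESTRAL:
        return "ancestral: may predate the J2a-M158 node"
    return "ambiguous: intermediate value or a later mutation"


# The two Anatolian samples both report DYS393 = 12.
print(dys393_hint(12))
```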


The two samples from Turkey are also quite divergent from one another at the other STRs, so they may be a few thousand years distantly related (relatedness cannot be reliably computed from so few STRs).

There is no way to know at this point whether they form a common deep subclade in Anatolia or represent two separate migrations to Anatolia, which could date anywhere from the time of the Indo-Aryan Mitanni to the arrival of Turkic groups (in either case, this subclade of J2a would have been incorporated into these groups and remained a minority among them).

Central Afghanistan Substructure

The dendrogram feature of STR Match Finder didn’t work on the very small subset of STRs shared by the entire set of samples. I will note, though, that the Pashtun from Kunar has exactly the same STR haplotype as the six Hazaras from Central Afghanistan.

The dendrogram indicates that the six Hazaras from Central Afghanistan are much more closely related to one another than to Khan from Ghazni. More testing of Afghans, especially from Central Afghanistan, is required to identify further substructure.
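For readers unfamiliar with STR dendrograms, here is a minimal Python sketch of one common way to turn pairwise genetic distances into a tree: UPGMA (average-linkage clustering). This is a generic, swapped-in technique, not necessarily what STR Match Finder does internally, and the sample labels and distances below are hypothetical.

```python
from itertools import combinations


def upgma(names, gd):
    """UPGMA clustering.

    names: list of sample labels.
    gd: dict mapping frozenset({a, b}) -> genetic distance between a and b.
    Returns a nested tuple (left, right, height) describing the dendrogram.
    """
    # Active clusters: id -> (subtree, number_of_leaves)
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # Pairwise distances keyed by (smaller_id, larger_id)
    dist = {(i, j): float(gd[frozenset((names[i], names[j]))])
            for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        # Merge the closest pair of active clusters
        (i, j), d = min(dist.items(), key=lambda kv: kv[1])
        (ti, ni), (tj, nj) = clusters.pop(i), clusters.pop(j)
        del dist[(i, j)]
        for k in clusters:
            # Size-weighted average distance from k to the merged cluster
            dik = dist.pop((min(i, k), max(i, k)))
            djk = dist.pop((min(j, k), max(j, k)))
            dist[(k, next_id)] = (dik * ni + djk * nj) / (ni + nj)
        clusters[next_id] = ((ti, tj, d / 2), ni + nj)
        next_id += 1
    return next(iter(clusters.values()))[0]


# Hypothetical distances for four made-up samples, for illustration only.
names = ["S1", "S2", "S3", "S4"]
gd = {
    frozenset(("S1", "S2")): 1, frozenset(("S1", "S3")): 1,
    frozenset(("S2", "S3")): 2, frozenset(("S1", "S4")): 6,
    frozenset(("S2", "S4")): 5, frozenset(("S3", "S4")): 7,
}
print(upgma(names, gd))
```

With these made-up distances S1 and S2 join first, S3 attaches to them next, and S4 sits on its own deep branch, which is the kind of pattern the dendrogram shows for the Hazaras versus Khan. A tree built this way reflects STR similarity only and can be distorted by convergent mutations.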

I want to emphasize that, despite the large number of samples from Central Afghanistan, they all descend from a relatively recent common ancestor, so there is no compelling evidence yet that J2a-M158 itself originated there. Central Afghanistan could be a fringe area to which a single ancestor migrated from a more fertile part of South Asia, i.e. the core IVC territory centered on the rivers.

Uzbek-7 Sample from Kazakhstan May Split J2a-M158

I asked Göran Runström of FTDNA about their J2a-M158 sample with a Kazakhstan flag.

This is an ethnic Uzbek sample, “Uzbek-7”, from Kazakhstan (Zhabagin et al. 2022).

The raw data for this sample has not yet been obtained by YFull or FTDNA, but there are indications that it would split J2a-M158: most samples have 10x coverage, yet this sample was reported positive for less than half of the J2a-M158-level SNPs (I don’t have experience reading this file type; this is per a conversation with Göran Runström). However, the publicly released data did not report any negative (ancestral) calls.
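The distinction between missing positives and confirmed negatives is what keeps the split unproven. A minimal Python sketch of that logic follows, assuming a simplified three-state call model per SNP (real pipelines also weigh read depth, quality and mappability); the SNP names and the function are hypothetical, not YFull’s or FTDNA’s actual method.

```python
from collections import Counter

# Simplified call states for one SNP in one sample (an assumption).
DERIVED, ANCESTRAL, NO_CALL = "derived", "ancestral", "no_call"


def assess_node(calls: dict[str, str]) -> str:
    """Summarise a sample's calls for the SNPs that define one tree node."""
    tally = Counter(calls.values())
    if tally[DERIVED] == 0:
        return "no positive evidence for this node"
    if tally[ANCESTRAL] > 0:
        # Derived for some equivalent SNPs, ancestral for others
        return "splits the node"
    if tally[NO_CALL] > 0:
        # Missing calls could be coverage gaps rather than ancestral alleles
        return (f"inconclusive: positive for {tally[DERIVED]} of "
                f"{len(calls)} SNPs, no negatives")
    return "fully within the node"


# Hypothetical example mirroring the Uzbek-7 situation: a minority of
# positives and no reported negatives.
example = {"SNP_A": DERIVED, "SNP_B": DERIVED, "SNP_C": NO_CALL,
           "SNP_D": NO_CALL, "SNP_E": NO_CALL}
print(assess_node(example))
```

Only once some of the missing SNPs come back as confirmed ancestral calls would the sketch report a split, which is why obtaining the raw data matters.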

The next interesting question would be whether this sample also has the ancestral DYS393 = 12 (if YFull obtains this sample, they should extract the STR alleles) and, if so, whether it forms a more recent subclade with the two STR-study samples from Turkey, which also share the ancestral DYS393 = 12.

Because the two study samples from Turkey have a genetic distance (GD) of 4 over 10 STRs, it seems improbable that both descend from a single male-line ancestor of the Turkic migration era. However, one of them may be much more closely related to the Uzbek-7 sample; if so, this could indicate a relatively recent migration to Turkey with Turkic or Turkicized peoples (i.e. within the last 1,500 years).
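To make the “GD 4/10” notation concrete: it reads as a genetic distance of 4 repeat steps over 10 compared markers. A minimal Python sketch of a simple step-wise GD follows. The marker values are made up, and some tools apply special rules to certain markers (for example multi-copy markers), which this sketch ignores.

```python
def genetic_distance(a: dict[str, int], b: dict[str, int]) -> tuple[int, int]:
    """Simple step-wise genetic distance between two Y-STR haplotypes.

    Returns (gd, markers_compared): the sum of absolute repeat differences
    over the markers typed in both samples, i.e. the "4/10" style pair.
    """
    shared = a.keys() & b.keys()
    gd = sum(abs(a[m] - b[m]) for m in shared)
    return gd, len(shared)


# Hypothetical allele values, for illustration only (not the study data).
sample_1 = {"DYS19": 15, "DYS390": 23, "DYS391": 10, "DYS392": 11,
            "DYS393": 12, "DYS439": 12}
sample_2 = {"DYS19": 14, "DYS390": 23, "DYS391": 11, "DYS392": 11,
            "DYS393": 12, "DYS385a": 13}

gd, n = genetic_distance(sample_1, sample_2)
print(f"GD {gd}/{n}")  # prints "GD 2/5"
```

Only markers typed in both samples count toward the denominator, so with ten or fewer markers a couple of extra mutations shift the GD considerably.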

These posts are the opinion of Hunter Provyn, a haplogroup researcher in J-M241 and J-M102.
