83% of Socotrans descend from one J2b-M205 man who lived 3100 years ago

For several months now I've been meaning to write about an interesting phenomenon: the overwhelming majority of Socotrans are J2b-M205.

All the computations I make here are based solely on samples from the YFull tree who have indicated paternal line descent to Socotra and elsewhere. Others who have access to STR results of men from Socotra may come up with a different calculation.

The exact figure computed from the YFull YTree is that 15 of 18 Socotra marked samples (83%) belong to a particular branch of J2b-M205 known as J2b-Y45076, whose most recent common ancestor lived about 3100 years ago.

I hesitated to write this article because I haven't been able to learn about the history of Socotra.

But since I've recently adapted my new mt Heatmap code to work for the geolocated samples on the YFull YTree, I'm motivated to share some of these freshly computed relative frequency heatmaps.

I also make use of the scientific view of the YFull tree and the Y Diversity Heatmap I developed last year.

Y Heatmap Alpha relative frequency map of J2b-Y45076 computed from geolocated samples from the YFull YTree v10.04 on 7/05/2021

Some confusion regarding J* nomenclature from an older study

Wikipedia says that the majority of male residents of Socotra are in the J* subclade.

I'm not quoting this to bash Wikipedia. After all, I must admit that this is where I get most of my supplemental "knowledge": it is convenient and you can follow the links to sources. I'm just pointing out how exciting it is that by following open source genetic data from the YFull YTree we can know more than has yet been documented.

In fact, kudos to the volunteer content creators / curators of Wikipedia for having introduced a "Genetics" subsection to most articles.

Presumably, when the first Socotrans' Y-DNA was tested, J2b had not yet been defined by SNPs. Or maybe they meant that Socotrans are either J1 or J2. However, in genetics the asterisk in J* is supposed to mean a sample that is positive for J but negative for all known downstream branches.

Such a man does not exist on the YFull YTree.

See for yourself - https://www.yfull.com/tree/J/


I read the part of the 2009 paper that reports the J* haplogroups.


The paper claims the following Y-DNA haplogroup counts among men mostly sampled from harder to reach parts of Socotra:

  • J*(xM172,M267) - 45
  • J1 - 9
  • E - 6
  • F*(xJ,K) - 1
  • K*(xO,P) - 1
  • R*(R1b) - 1

I don't know at this time where in Socotra the YFull Socotrans hail from, but I wonder if the J* samples from the study are actually the same line as our J2b-M205 men, given that 15/18 (83%) is not far off from 45/63 (71%), both being supermajorities. That would imply there was a problem with the M172 SNP test conducted by the study, with it always coming back negative.
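
As a quick check of the comparison above, the two shares can be tallied directly from the counts quoted in this article. This is only a sketch: the variable names are mine, and the counts are the ones listed above from the 2009 paper and from the YFull YTree.

```python
# Haplogroup counts as listed above from the 2009 paper.
paper_counts = {
    "J*(xM172,M267)": 45,
    "J1": 9,
    "E": 6,
    "F*(xJ,K)": 1,
    "K*(xO,P)": 1,
    "R*(R1b)": 1,
}
paper_total = sum(paper_counts.values())
paper_share = paper_counts["J*(xM172,M267)"] / paper_total

# Socotra marked samples on the YFull YTree that are J2b-Y45076.
yfull_share = 15 / 18

print(f"2009 paper J*: 45/{paper_total} = {paper_share:.1%}")
print(f"YFull J2b-Y45076: 15/18 = {yfull_share:.1%}")
```

With only 18 YFull samples, a gap of this size between the two shares is not surprising.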

Additional targeted sampling of the same regions tested in the study could get to the truth of it.

The Peopling / Re-peopling of Socotra

According to Wikipedia:

"There was initially an Oldowan lithic culture in Socotra. Oldowan stone tools were found in the area around Hadibo by V.A. Zhukov, a member of the Russian Complex Expedition in 2008."

I'm not sure if there is any other evidence that humans continued to live on Socotra from then until now. That's where the DNA evidence can come in.


Based on their Y-DNA relationships and where the men trace their male line descent, a strong case can be made that a J2b-Y45076 man or several of his descendants later migrated to Socotra from the approximate region of what is now called Dhofar Governorate of Oman. YFull estimates the most recent common ancestor of this line to have lived around 1100 BCE.

These Socotrans likely came from Dhofar

Dhofar Governorate has the highest relative frequency of J2b-Y45076 outside of Socotra. While the map calculates a peak of 15-18%, the raw data behind it are that 19 of 83 Dhofar marked samples are in this line, roughly 23%.

(The reason for the difference is that no man in this lineage marked his sample only as Oman without a regional code. The Oman samples that carry only a country code, none of which are J2b-Y45076, therefore add some weight to the computation of the sample rate in Dhofar. This is how it should work.)
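
To get a feel for how much the country-level samples pull the Dhofar figure down, one can back out the effective denominator implied by the map. This is a rough sketch and not the actual heatmap algorithm: it assumes the numerator stays at 19 and that all of the difference comes from extra weight in the denominator. `implied_extra_weight` is a hypothetical helper of my own naming.

```python
DHOFAR_HITS = 19      # Dhofar marked samples in J2b-Y45076
DHOFAR_SAMPLES = 83   # all Dhofar marked samples


def implied_extra_weight(map_frequency: float) -> float:
    """Extra effective samples in the denominator needed to lower the
    raw 19/83 frequency to the frequency shown on the map (assumption:
    the numerator is unchanged)."""
    return DHOFAR_HITS / map_frequency - DHOFAR_SAMPLES


raw = DHOFAR_HITS / DHOFAR_SAMPLES
print(f"raw frequency: {raw:.1%}")
for map_frequency in (0.18, 0.15):
    extra = implied_extra_weight(map_frequency)
    print(f"map at {map_frequency:.0%}: about {extra:.0f} extra effective samples")
```

Under these assumptions, a map peak of 15-18% corresponds to roughly 23 to 44 additional effective samples being counted toward Dhofar.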

Another piece of strong circumstantial evidence pointing to an origin in Dhofar is that the next closest relative to J2b-Y45076 traces his male line descent to Dhofar. The most recent common ancestor of these men, J2b-Y45546, lived about 2600 BCE according to the YFull estimate.

Timing of Migrations to Socotra

The oldest exclusively Socotran line on the YFull YTree is J2b-Y130510, descending from a MRCA who lived 1850 years ago. The safest conclusion is that this ancestor was living on Socotra at that time. Half of all Socotra marked samples on YFull descend from this man.

However, on the basis of modern diversity in Socotra, it can be argued that the 'grandparent' of this line (not a biological grandparent, but the ancestor two branching points back in time), J2b-Y130506, may himself have already been living in Socotra around 1000 BCE, the time when YFull estimates he lived.

YFull scientific view of J2b-Y130506. A child lineage that is exclusively from Socotra has a much older TMRCA than the one from Hadramawt.

One child line of J2b-Y130506 is found exclusively in Socotra. The other child line has one Socotran and one Hadramawt child line of its own.

We need to take into account the relative sample rates on YFull. Socotra is represented on YFull with 18 samples per 44,000 people, and Hadramawt with 197 samples per 1.25 million people.

This means there is one YFull sample for every 6345 people from Hadramawt vs one for every 2444 from Socotra. So Socotra is sampled at 2.6 times the per capita rate of Hadramawt.

(Dhofar, with 83 samples for 458,734 people, has one sample per 5527 people. Its sample rate is a little higher than Hadramawt's but less than half of Socotra's.)
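
The per capita sample rates above can be reproduced with a few lines of arithmetic. This is a minimal sketch using only the sample counts and population figures quoted in this article; the dictionary layout and names are my own.

```python
# (samples on YFull, population) as quoted in this article.
regions = {
    "Socotra": (18, 44_000),
    "Hadramawt": (197, 1_250_000),
    "Dhofar": (83, 458_734),
}

# People represented by each YFull sample; lower means better sampled.
people_per_sample = {
    name: population / samples
    for name, (samples, population) in regions.items()
}

for name, value in people_per_sample.items():
    # How many times better sampled Socotra is than this region.
    ratio = value / people_per_sample["Socotra"]
    print(f"{name}: one sample per {value:,.0f} people "
          f"(Socotra is sampled at {ratio:.1f}x this rate)")
```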

So the strength of the argument for a Socotra origin of Iron Age "grandfather" J2b-Y130506 lies not in the number of Hadramawt vs Socotran lines. There could be other Hadramawt lines that we don't yet see due to the relative undersampling.

Still, I believe that J2b-Y130506 may have been living in Socotra (or in Dhofar, even though there is no Dhofar sample in this line yet) rather than Hadramawt because:

  • The TMRCA of the oldest exclusively Socotran child branch is much older than the TMRCA of the oldest exclusively Hadramawt branch - 1850 vs 450 years old.
  • Upstream diversity is not in Hadramawt but more in Dhofar or Socotra. Other related branches containing men from Hadramawt are younger (J2b-Y131353)

My Y Diversity Map actually computes Socotra as having the highest diversity for the 4600 year old TMRCA lineage J2b-Y45546.

This seems counterintuitive because the single Dhofar sample who is J2b-Y45546* is weighted equally to the rest of the J2b-Y45076 samples added together.

Diversity of J2b-Y45546 computed by Diversity Map

I confirmed that the Diversity Heatmap algorithm did give the correct weight (0.5) to sample YF101845 by clicking it on the map.

The reason that diversity is computed as greater in Socotra is that the basal sample from Dhofar is divided by a regional sample rate (83 OM-ZU samples) that is more than 4.5 times the regional sample rate that Socotra samples are divided by (18 YE-SU samples). Bleed from neighboring samples on the Arabian Peninsula further increases the effective underlying sample rate of Dhofar.
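
The effect of the regional divisor can be seen with a toy calculation. This is a simplified sketch of the normalization described above, not the Diversity Heatmap code itself: it assumes a sample's contribution is simply its weight divided by the number of samples carrying its regional code, and it ignores the bleed from neighboring regions. The function name and the equal weights are illustrative assumptions.

```python
# Number of YFull samples per regional code, as quoted above.
REGIONAL_SAMPLES = {"OM-ZU": 83, "YE-SU": 18}  # Dhofar, Socotra


def normalized_contribution(weight: float, region: str) -> float:
    """Weight of a sample after dividing by its region's sample count."""
    return weight / REGIONAL_SAMPLES[region]


# Give both regions the same hypothetical weight to isolate the
# effect of the divisor.
same_weight = 0.5
dhofar = normalized_contribution(same_weight, "OM-ZU")
socotra = normalized_contribution(same_weight, "YE-SU")
print(f"Dhofar: {dhofar:.4f}  Socotra: {socotra:.4f}  "
      f"ratio: {socotra / dhofar:.2f}")
```

For equal weight, the Socotra side ends up about 4.6 times larger than the Dhofar side purely because of the divisor.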

I am not arguing that the higher computed diversity in Socotra should imply that the 4600 year old lineage J2b-Y45546 was actually living on Socotra.

As I mentioned above, the sample rate per capita of Dhofar is less than half of Socotra's - this strengthens the weight of the single basal Dhofar sample.

Also, the child lineage J2b-Y131354, with a TMRCA of 2500 years ago, has more diversity in Dhofar and elsewhere on the Arabian Peninsula than it does in Socotra. So I would interpret the men in this lineage who trace descent to Dhofar as "remnants", men whose ancestors didn't stray too far from their deeper geographic origins.

However given the circumstantial evidence presented above, I think that J2b-Y130506 may represent the line of a man who had migrated to Socotra by 1000 BCE. Ancient samples or other circumstantial evidence might be found to contradict this prediction.

I have heard anecdotal evidence that the Pharaohs and/or Phoenicians mentioned Socotra but have yet to find the relevant citations.


The demographics section of the Wikipedia article on Socotra mentions that a rare branch of N is found among Socotrans.

Also the Wikipedia article on Soqotri people mentions several mtDNA lines.

Unfortunately none of the 291 Yemen samples on the YFull MTree (as of 5/27/2022) have indicated the regional code for Socotra.

The mtDNA haplogroup with the most total samples from Yemen is R0a'b, with 69 samples or 23.7% of all Yemen samples on the YFull MTree. R0a'b also has 19 of the 71 total Oman marked samples, or 26.8%.

mt Heatmap showing the relative frequency of mtDNA haplogroup R0a'b, which peaks along the southern coasts of the Red Sea. No samples marked with Socotra regional code yet on the YFull MTree.

(I don't yet have an automated way to find the most frequent per capita haplogroup in a given region, but this is something I plan to develop in the future. So another mtDNA haplogroup might be more relatively frequent in Yemen or Oman.)

Socotra might have the same level of R0a'b as neighboring Yemen and Oman, but not necessarily, because there could have been founder effects similar to the one we see here with Y-DNA lineage J2b-Y45076.

I'm in contact with a researcher named Dr Saleh Bamasak who intends to motivate some of these men to sample their mtDNA.

Once any Socotrans get on the YFull MTree we can see what can be learned from their mtDNA regarding the peopling or re-peopling of Socotra.

Modern South Arabian Languages

The Soqotri language is one of six languages that form a group known by linguists as the Modern South Arabian languages.

By inspecting the map from Wikipedia, it appears that the region of greatest Modern South Arabian linguistic diversity is Dhofar.

So it would make sense that the lines of J2b-M205>Y45076 that migrated to Socotra came from Dhofar and were speaking a Modern South Arabian language that would eventually become Soqotri.

Before attempting to associate the deeper ancestors of this line with ancient speakers of proto-Modern South Arabian, let's keep in mind that J2b-M205 has been found in ancient contexts dating back to about 2400 BCE (I1730) in Ain Ghazal, Jordan.

However it is very interesting to see another line of J2b-M205, J2b-FT58383, that is exclusively found in South Arabia and Ethiopia that has a 5600 year old TMRCA. The geography of this line conforms more to the now extinct Old South Arabian languages.

J2b-FT58383 may have already been living in western Yemen by 3600 BCE.

It should be noted that there are only 11 Eritrea and 21 Ethiopia marked samples on the YFull YTree, meaning that these countries are sampled at per capita rates roughly 10 times and 120 times lower than Yemen's, respectively. So there are not yet enough samples on the YFull tree to expect to find an Eritrea marked sample in this lineage, even if it is actually present there at, say, 3%.
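
To put a number on this, here is a back-of-the-envelope sketch. It assumes the 11 Eritrea samples are independent random draws from the male population, which real customer-driven sampling is not, and it treats the 3% frequency as a purely hypothetical value. `prob_at_least_one` is a helper name I made up for this example.

```python
def prob_at_least_one(frequency: float, n_samples: int) -> float:
    """Chance that at least one of n independent samples carries a
    lineage present at the given frequency (simple binomial model)."""
    return 1 - (1 - frequency) ** n_samples


freq = 0.03       # hypothetical frequency of the lineage in Eritrea
n_eritrea = 11    # Eritrea marked samples on the YFull YTree

expected = freq * n_eritrea
p_hit = prob_at_least_one(freq, n_eritrea)
print(f"expected carriers: {expected:.2f}")
print(f"chance of at least one carrier: {p_hit:.0%}")
```

Under these assumptions the expected number of carriers is well below one, so missing the lineage entirely is the more likely outcome.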

YFull World Sampling Rate interactive map

Thanks for reading!

If you found this interesting you can consider commissioning me to write a research article to advance your genetic genealogy research goals.

You can contact me: hunter provyn at gmail dot com

I also accept donations via paypal and you can sign up to support my Phylogeographer project on Patreon.

These posts are the opinion of Hunter Provyn, a haplogroup researcher in J-M241 and J-M102.
