PhyloGeographer Updated to YFull v7.10.00

6 New Geographic Codes Added

  • UAE (though AE existed before)
  • ER-DU Eritrea, Debub
  • AF-LOW Afghanistan, Logar
  • SD-13 Sudan, Janub Kurdufan
  • AL-LB Albania, Librazhd
  • AL-PR Albania, Përmet

Corrected Codes

Moved UAE closer to geographic center and Abu Dhabi

Many incorrect codes in Sudan were corrected. The mistakes came from my original source, which had usually confused the Sudanese places with similarly named places in Egypt or Saudi Arabia.

  • SD-01 Ash Shumaliyah
  • SD-18 Al Buhayrat
  • SD-07 Al Jazirah
  • SD-26 Al Baḩr al Aḩmar
  • SD-17 Baḩr al Jabal
  • SD-25 Sinnar

Improvements to Migration Calculating Algorithm

The first improvement affects clades with three children (basal samples and/or subclades) where one child is close to the clade's parent and grandparent, while the other two children are close to each other but farther away. Such clades should now, in most cases, be calculated to have originated near the parent location rather than near the (possibly oversampled) child locations.

I tested this behavior with haplogroup I-Y8943, which had previously been computed to have formed in Ireland, due to the majority of subclades (2/3) being found there.

BEFORE: It is hard to see because it is obscured by its subclades' positions, but I-Y8943 had been computed to have formed in Ireland because 2 of its 3 subclades are found there.

However, based on the deep diversity in Scandinavia, haplogroup researchers informed me that a Scandinavian origin was more likely, and I agree based on the YFull tree.

AFTER: Now that parent and grandparent clade locations are taken into account for determining outliers, the two Irish subclades are considered outliers for I-Y8943 and no longer obscure its more likely Scandinavian origin.
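The three-child case can be illustrated with a minimal Python sketch. This is not PhyloGeographer's actual code: the function names, the rule for choosing outliers (the two closest children are dropped when the remaining child is nearer to both ancestors), and the coordinates are all assumptions made for illustration.

```python
from itertools import combinations
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def estimate_origin(children, parent, grandparent):
    """Hypothetical outlier rule for a clade with exactly three children."""
    if len(children) != 3:
        raise ValueError("this sketch only handles three children")

    # Find the two children that are closest to each other.
    i, j = min(
        combinations(range(3), 2),
        key=lambda p: haversine_km(children[p[0]], children[p[1]]),
    )
    pair = (children[i], children[j])
    lone = children[3 - i - j]  # the remaining child

    def ancestor_distance(point):
        # Combined distance to the clade's parent and grandparent.
        return haversine_km(point, parent) + haversine_km(point, grandparent)

    # Assumed rule: if the lone child is nearer to the ancestors than both
    # members of the pair, treat the pair as outliers and keep the lone child.
    if all(ancestor_distance(lone) < ancestor_distance(c) for c in pair):
        return lone

    # Otherwise fall back to a naive mean of all three children
    # (ignores the antimeridian; good enough for a sketch).
    return (
        sum(c[0] for c in children) / 3,
        sum(c[1] for c in children) / 3,
    )


# Illustrative coordinates only, not taken from YFull.
scandinavian_child = (59.3, 18.1)
irish_children = [(53.3, -6.3), (51.9, -8.5)]
parent, grandparent = (60.0, 15.0), (58.0, 14.0)

origin = estimate_origin([scandinavian_child, *irish_children], parent, grandparent)
```

With these made-up inputs the two Irish points form the closest pair, the Scandinavian point is nearer to both ancestors, and `origin` is the Scandinavian point. If the ancestors were placed in Ireland instead, the sketch would fall back to the plain average of all three children.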

A second improvement affects clades with exactly two children (samples and/or subclades). Two total distances are now compared: (1) the distance from the clade's parent to the interpolated point between the two children, plus the distance from that point to each child, and (2) the sum of the distances from the clade's parent directly to each child. When the children are so far apart that (1) exceeds (2), the direct migration from the parent to each child is used instead of the interpolated point.

This avoids clades being computed to have formed in a 'no man's land' far from either child location and from the clade's parent.

BEFORE: This interpolation was always applied. Sometimes it resulted in a clade being computed far from both locations where it was actually found, yet in the geographic center between these locations.

AFTER: Now, if a clade's two children are far enough apart that routing through the interpolated point is the longer path, the clade's parent location is used to represent the clade instead of an interpolated point that may be far from the actual samples.
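The two-child comparison can be sketched in Python as follows. The function names are hypothetical, and the interpolated point is assumed here to be a simple latitude/longitude midpoint; how PhyloGeographer actually interpolates is not stated in this post.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def place_two_child_clade(parent, child_a, child_b):
    """Return the location used for a clade with exactly two children."""
    # Assumption: the interpolated point is the plain lat/lon midpoint.
    mid = ((child_a[0] + child_b[0]) / 2, (child_a[1] + child_b[1]) / 2)

    # (1) parent -> interpolated point -> each child
    via_mid = (
        haversine_km(parent, mid)
        + haversine_km(mid, child_a)
        + haversine_km(mid, child_b)
    )
    # (2) parent -> each child directly
    direct = haversine_km(parent, child_a) + haversine_km(parent, child_b)

    # If the detour through the interpolated point is longer, keep the
    # clade at its parent's location; otherwise use the interpolated point.
    return parent if via_mid > direct else mid


# Illustrative coordinates only, not taken from YFull.
# Children far apart with the parent lying between them: parent is kept.
spread_out = place_two_child_clade((50.0, 10.0), (50.0, 0.0), (50.0, 30.0))

# Children close together and far from the parent: midpoint is used.
clustered = place_two_child_clade((60.0, 15.0), (53.3, -6.3), (51.9, -8.5))
```

In the first call the children sit on opposite sides of the parent, so going through the midpoint only adds distance and the parent location is returned. In the second call the children are neighbors far from the parent, so the midpoint between them is returned.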

These changes are algorithmic and will affect all previously computed paths, to a greater or lesser extent.

Improving PhyloGeographer is an iterative process. You can help me by pointing out subclades that, based on YFull sample distribution, you believe ought to be calculated to different locations.

Please keep in mind that the algorithm does not make use of coastline information. Coastlines change over time, and fully solving the problem of clades being computed in the water would require a set of coastline geometries for the whole world covering different time periods. That is beyond the scope of what I can do as one unfunded person.

If you want me to work toward this goal, consider becoming a sponsor of PhyloGeographer on Patreon so you can vote for this feature in my regular polls for future improvements.

PhyloGeographer Project on Patreon

 

These posts are the opinion of Hunter Provyn, a haplogroup researcher in J-M241 and J-M102.
