STR Match Finder now uses Allele-specific STR Mutation Rates

In this example, at the top you can see that DYS442 = 10 (in orange) is more stable than DYS537 = 12 (in red). In the second example, however, DYS442 = 11 is less stable than DYS537 = 11. So the true stability depends on the allele of the STR, not just on the STR itself.

STR Match Finder now takes the stability of the particular allele of a given STR into account when determining the display order of the STRs, going left to right from most to least stable.
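As a rough illustration of this ordering (not the actual STR Match Finder code), here is a minimal Python sketch. The table `MEAN_YEARS`, the function `order_by_stability`, and all of the numbers are hypothetical; the values were chosen only to reproduce the DYS442/DYS537 comparison above.

```python
# Hypothetical lookup: mean years until a mutation, keyed by (STR, allele).
# The numbers are invented for illustration; a larger value means more stable.
MEAN_YEARS = {
    ("DYS442", 10): 9000.0,
    ("DYS442", 11): 2500.0,
    ("DYS537", 11): 6000.0,
    ("DYS537", 12): 4000.0,
}


def order_by_stability(haplotype, mean_years=MEAN_YEARS):
    """Return the STR names of a haplotype sorted from most to least stable.

    haplotype maps an STR name to the allele observed for it.
    """
    return sorted(
        haplotype,
        key=lambda name: mean_years[(name, haplotype[name])],
        reverse=True,  # most stable (largest mean years) first, i.e. leftmost
    )


first = order_by_stability({"DYS442": 10, "DYS537": 12})
second = order_by_stability({"DYS442": 11, "DYS537": 11})
```

With these made-up values, DYS442 comes first in the first haplotype and DYS537 comes first in the second, because the sort key depends on the allele and not only on the STR.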

Previously I had been using a single mutation rate estimate for all alleles of a particular STR.

The practical application: if most of the rare alleles you share with someone are toward the right (less stable), those matches are weaker evidence of a close relationship than matches with someone whose shared alleles are toward the left (more stable).

This ordering by stability should now be much more accurate than before.

Here I’m making use of the allele-dependent STR mutation rates I had computed from publicly available data from the YFull YTree in Sept 2022, with some interpolation and extrapolation to handle alleles for which I had little or no data.

Interpolation Technique

For an allele falling inside the domain of computed alleles for an STR, I interpolated by computing the geometric mean of the mutation rates of the closest computed alleles below and above the target allele.
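The interpolation step can be sketched in Python as follows. This is an illustration under assumptions, not the tool's own code: the function name `interpolate_mean_years` and the `rates` dictionary (allele to mean years until a mutation) are hypothetical.

```python
import math


def interpolate_mean_years(rates, allele):
    """Estimate the value for an allele inside the computed range.

    rates maps an integer allele to its computed mean years until a mutation.
    """
    if allele in rates:
        return rates[allele]  # already computed, nothing to interpolate
    lower = [a for a in rates if a < allele]
    upper = [a for a in rates if a > allele]
    if not lower or not upper:
        raise ValueError("allele is outside the computed range; extrapolate instead")
    nearest_below = max(lower)
    nearest_above = min(upper)
    # Geometric mean of the two neighbouring values.
    return math.sqrt(rates[nearest_below] * rates[nearest_above])


example = interpolate_mean_years({10: 4000.0, 12: 1000.0}, 11)
```

For example, with invented values of 4000 years for allele 10 and 1000 years for allele 12, allele 11 is estimated at 2000 years. The geometric mean is below the arithmetic mean of 2500, which suits quantities that vary multiplicatively.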

Extrapolation Technique

Of the 91 STRs for which I computed allele-specific mutation rates, for 13 of the STRs I was only able to compute a reliable mutation rate for a single allele (I rejected results below a certain threshold of incidences).

The 13 STRs for which I only had a single allele’s mutation rate computed were: [‘DYS455’, ‘DYS578’, ‘DYS590’, ‘DYS425’, ‘DYS436’, ‘DYS490’, ‘DYS450’, ‘DYS492’, ‘DYS494’, ‘DYS575’, ‘DYS726’, ‘DYS434’, ‘DYS435’]

I used the remaining 78 STRs (those with computed rates for at least two alleles) to compute two baseline ratios for extrapolating the mutation rate of an allele outside the observed range.

For 65 of the 78 STRs, the lowest allele’s mutation rate was less than that of the second lowest. The 13 STRs that did not follow this trend were: [‘DYS390’, ‘DYS389ii-i’, ‘DYS438’, ‘DYS531’, ‘DYS594’, ‘DYS487’, ‘DYS710’, ‘DYS549’, ‘DYS533’, ‘DYS445’, ‘DYS715’, ‘DYS643’, ‘DYS497’]

For 59 of the 78 STRs, the highest allele’s mutation rate was greater than that of the second highest. The 19 STRs that did not follow this trend were: [‘DYS388’, ‘DYS447’, ‘DYS437’, ‘DYS449’, ‘DYS460’, ‘DYS576’, ‘DYS531’, ‘DYF406S1’, ‘DYS511’, ‘DYS594’, ‘DYS710’, ‘DYS714’, ‘DYS505’, ‘DYS525’, ‘DYS712’, ‘DYS532’, ‘DYS715’, ‘DYS643’, ‘DYS497’]

I then computed the geometric mean of the lowest-to-second-lowest allele ratio of mutation rates and highest-to-second-highest-allele ratio of mutation rates, in each case excluding the STRs that bucked the main trend of a positive correlation between allele magnitude and mutation rate.

In the code (and in the study from which it derives its data), the mutation rate is expressed as the mean number of years until a mutation, so a larger value means a more stable allele. The computed geometric means of these ratios are:

leastOverSecondLeast = 1.696

greatestOverSecondGreatest = 0.518

This means that, for STRs that follow the main trend, the lowest allele is about 1.7 times as stable as the second-lowest allele, and the highest allele is about half as stable as the second-highest allele.
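To make the ratio computation concrete, here is a minimal Python sketch run on toy data. The names `geometric_mean` and `baseline_ratios`, the input layout, and the numbers are assumptions for illustration; values are mean years until a mutation, so an STR follows the trend when the low-end ratio is above 1 and the high-end ratio is below 1.

```python
import math


def geometric_mean(values):
    """Geometric mean computed through logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def baseline_ratios(rates_by_str):
    """Return (least_over_second_least, greatest_over_second_greatest).

    rates_by_str maps an STR name to {allele: mean years until a mutation}.
    STRs with a single computed allele are skipped, and an STR is left out
    of a ratio when it bucks the trend at that end of its allele range.
    """
    low_ratios, high_ratios = [], []
    for rates in rates_by_str.values():
        alleles = sorted(rates)
        if len(alleles) < 2:
            continue
        low = rates[alleles[0]] / rates[alleles[1]]
        high = rates[alleles[-1]] / rates[alleles[-2]]
        if low > 1:  # lowest allele is more stable than the second lowest
            low_ratios.append(low)
        if high < 1:  # highest allele is less stable than the second highest
            high_ratios.append(high)
    return geometric_mean(low_ratios), geometric_mean(high_ratios)


# Invented toy data, not values from the study.
toy = {
    "STR_A": {10: 4000.0, 11: 2000.0, 12: 500.0},
    "STR_B": {9: 8000.0, 10: 1000.0, 11: 4000.0},  # bucks the trend at the high end
    "STR_C": {5: 3000.0},  # single allele, skipped
}
low_ratio, high_ratio = baseline_ratios(toy)
```

In this toy data, STR_B contributes to the low-end ratio but is excluded from the high-end ratio, mirroring how the two trend lists above differ.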

For any STR that followed the trend, I used these computed ratios to extrapolate for alleles outside the computed allele domain. To be conservative, I did not raise these ratios to the power of how many steps away they were from the closest computed allele.

For the STRs which deviated from the trend, I used the mutation rate of the closest computed allele.

To be conservative, for the thirteen STRs for which I had only a single allele’s mutation rate computed, I extrapolated with the square root of the ratio instead, because I didn’t have data on whether or not that STR behaved according to the main trend.
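The three extrapolation cases above can be sketched together in Python. This is one reading of the rules, not the actual implementation: the function name and its parameters are hypothetical, tracking the trend separately for the low and high ends is an assumption, and so is applying the ratio by multiplying the nearest computed value (in mean years until a mutation).

```python
import math

LEAST_OVER_SECOND_LEAST = 1.696
GREATEST_OVER_SECOND_GREATEST = 0.518


def extrapolate_mean_years(rates, allele, follows_low_trend=True, follows_high_trend=True):
    """Estimate the value for an allele outside the computed range.

    rates maps an integer allele to its computed mean years until a mutation.
    The trend flags are assumed to be known per STR and per end of the range.
    """
    alleles = sorted(rates)
    lowest, highest = alleles[0], alleles[-1]
    if lowest <= allele <= highest:
        raise ValueError("allele is inside the computed range; interpolate instead")
    below = allele < lowest
    nearest = rates[lowest] if below else rates[highest]
    ratio = LEAST_OVER_SECOND_LEAST if below else GREATEST_OVER_SECOND_GREATEST
    if len(alleles) == 1:
        # Trend unknown for this STR: use the square root of the ratio.
        return nearest * math.sqrt(ratio)
    follows = follows_low_trend if below else follows_high_trend
    # The ratio is applied once, however many steps away the allele is.
    return nearest * ratio if follows else nearest


two_alleles = {10: 2000.0, 11: 1000.0}
below_range = extrapolate_mean_years(two_alleles, 8)
above_range = extrapolate_mean_years(two_alleles, 13)
off_trend = extrapolate_mean_years(two_alleles, 13, follows_high_trend=False)
single = extrapolate_mean_years({12: 1000.0}, 13)
```

With these invented inputs, an allele two steps below the range gets 2000 × 1.696 = 3392 years, the same as an allele one step below would, since the ratio is not raised to the number of steps.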

The STRs for which I had at least one allele’s mutation rate calculated from my previous study and which STR Match Finder now interpolates/extrapolates for other alleles are: [‘DYS393’, ‘DYS390’, ‘DYS19’, ‘DYS391’, ‘DYS388’, ‘DYS439’, ‘DYS389i’, ‘DYS392’, ‘DYS389ii-i’, ‘DYS458’, ‘DYS455’, ‘DYS454’, ‘DYS447’, ‘DYS437’, ‘DYS448’, ‘DYS449’, ‘DYS460’, ‘Y-GATA-H4’, ‘DYS456’, ‘DYS607’, ‘DYS576’, ‘DYS570’, ‘DYS442’, ‘DYS438’, ‘DYS531’, ‘DYS578’, ‘DYS590’, ‘DYS537’, ‘DYS641’, ‘DYF406S1’, ‘DYS511’, ‘DYS425’, ‘DYS557’, ‘DYS594’, ‘DYS436’, ‘DYS490’, ‘DYS534’, ‘DYS450’, ‘DYS444’, ‘DYS481’, ‘DYS520’, ‘DYS446’, ‘DYS617’, ‘DYS568’, ‘DYS487’, ‘DYS572’, ‘DYS640’, ‘DYS492’, ‘DYS565’, ‘DYS710’, ‘DYS485’, ‘DYS495’, ‘DYS540’, ‘DYS714’, ‘DYS716’, ‘DYS717’, ‘DYS505’, ‘DYS556’, ‘DYS549’, ‘DYS589’, ‘DYS522’, ‘DYS494’, ‘DYS533’, ‘DYS636’, ‘DYS575’, ‘DYS638’, ‘DYS462’, ‘DYS452’, ‘DYS445’, ‘Y-GATA-A10’, ‘DYS463’, ‘DYS441’, ‘DYS525’, ‘DYS712’, ‘DYS593’, ‘DYS650’, ‘DYS532’, ‘DYS715’, ‘DYS504’, ‘DYS513’, ‘DYS561’, ‘DYS552’, ‘DYS726’, ‘DYS635’, ‘DYS587’, ‘DYS643’, ‘DYS497’, ‘DYS510’, ‘DYS434’, ‘DYS461’, ‘DYS435’]

STRs that do not follow the trend of increasing instability with increasing magnitude of allele

One factor that I know of that can affect mutation rates is an incomplete repeat. An allele with an incomplete repeat is written with a decimal, so 13.2 means that there were 13 full repeats and that the final repeat was missing one or more basepairs of the repeating sequence, known as a motif. I believe that in some cases, an incomplete repeat can greatly increase the stability of the allele.

However, for simplicity, I had converted all computed haplotypes from YFull into integers before computing the mutation rates. STR Match Finder also treats alleles strictly as integers, or in the case of palindromes, a sequence of integers.
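As an illustration of the integer treatment, here is a small Python sketch. The function `parse_allele`, the "-" separator for multi-copy values, and the choice to truncate the fractional part are all assumptions; the conversion actually used may differ.

```python
def parse_allele(text, separator="-"):
    """Parse an allele string into a list of integers.

    A partial repeat such as "13.2" is truncated to 13 (an assumed rule).
    A multi-copy value such as "11-14" becomes one integer per copy.
    """
    return [int(float(part)) for part in text.split(separator)]


single_copy = parse_allele("13.2")
multi_copy = parse_allele("11-14")
```

Under these assumptions, "13.2" parses to a one-element list containing 13, and "11-14" parses to the sequence 11, 14.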

These posts are the opinion of Hunter Provyn, a haplogroup researcher in J-M241 and J-M102.
