Studies have shown that STR length, i.e. the number of repeats, affects the mutation rate. Generally, the longer the STR, the higher the mutation rate and the greater the chance of a deletion of several repeats.
Motivation
Out of curiosity, and with the aim of eventually improving STR Match Finder so that it computes genetic distance from STR-allele differences more accurately, I decided to use the large data set of haplotypes computed by YFull YTree to calculate allele-based mutation rates for the STRs in FTDNA's 111-STR set for which YFull has computed haplotypes.
Data
YFull has developed its own algorithm for computing haplotypes from the STRs of each sample. The STR values come either from the customer's directly imported STR test results or from extraction from the sample's BAM file.
To compute mutation rates from these haplotypes, I also use the TMRCA estimates from YFull YTree.
I used the haplotypes that were publicly available on YFull between August 1-6, 2022 and the estimated TMRCAs from YFull YTree version 10.04.
There are three potential sources of error in the input data that I have used to compute allele-dependent STR mutation rates:
- Errors in extracting STRs from the BAM file
- Errors in the algorithm YFull uses to compute haplotypes
- Errors in TMRCA estimates
Methodology
For each haplogroup root, I traversed the YFull YTree, keeping track of how many years elapsed between the TMRCA estimates of each parent clade and its direct child, and whether or not the allele of a given STR changed between them.
I refer to the number of years elapsed between the TMRCA of a parent and that of its child as the bottleneck length.
If the downstream haplogroup's haplotype is undefined for a particular STR, that data point is ignored.
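The traversal described above can be sketched as follows. This is a minimal sketch, assuming a hypothetical in-memory tree in which each node stores its TMRCA in years before present, a dict mapping STR names to computed alleles (None when undefined), and its children. The `Node` class and the `collect_transitions` function are illustrative names, not YFull's data format or the author's actual code.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical YTree node: TMRCA (years before present), haplotype, children."""
    name: str
    tmrca: float
    haplotype: Dict[str, Optional[int]]
    children: List["Node"] = field(default_factory=list)


def collect_transitions(root: Node) -> Dict[Tuple[str, int], List[Tuple[float, bool]]]:
    """Map (STR, parent allele) -> list of (bottleneck length in years, changed?)."""
    data: Dict[Tuple[str, int], List[Tuple[float, bool]]] = defaultdict(list)
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in parent.children:
            # Bottleneck length: years between the parent's TMRCA and the child's TMRCA.
            bottleneck = parent.tmrca - child.tmrca
            for str_name, parent_allele in parent.haplotype.items():
                child_allele = child.haplotype.get(str_name)
                # Ignore the data point if either haplotype is undefined for this STR.
                if parent_allele is None or child_allele is None:
                    continue
                data[(str_name, parent_allele)].append(
                    (bottleneck, child_allele != parent_allele)
                )
            stack.append(child)
    return dict(data)


# Usage with a tiny made-up tree (values are illustrative only).
root = Node("A", 3000, {"DYS19": 14, "DYS388": 12}, [
    Node("B", 2000, {"DYS19": 15, "DYS388": None}),
    Node("C", 2500, {"DYS19": 14, "DYS388": 12}),
])
print(collect_transitions(root))
```

Each entry of the result is one [bottleneck length, mutation or no mutation] observation for a given STR-allele, which is the input the binning step needs.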
Next, I use binning to convert the [bottleneck length (years), mutation or no mutation] observations into a mutation rate per time-interval bin for each STR-allele.
The bin cutoffs I use to compute the average mutation rate per bin for each STR-allele combination are, in years: [100, 200, 400, 600, 800, 1000, 1200, 1500, 2000, 2500, 3500, 4500]
If YFull estimated a bottleneck to be zero years, I adjusted it to 50 years.
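The binning step can be sketched like this. It assumes, since the text does not say, that the first bin starts at 0 years and that each bin is half-open, [lower, upper); bottlenecks at or beyond the last cutoff are dropped. The function name `bin_observations` and its parameters are hypothetical, and `min_points` applies the 30-point rule described next.

```python
import bisect
from typing import List, Tuple

# Bin cutoffs in years, as listed in the text.
CUTOFFS = [100, 200, 400, 600, 800, 1000, 1200, 1500, 2000, 2500, 3500, 4500]


def bin_observations(
    observations: List[Tuple[float, bool]],
    cutoffs: List[float] = CUTOFFS,
    zero_replacement: float = 50.0,
    min_points: int = 30,
) -> List[Tuple[float, float]]:
    """Return (x, y) pairs: mean bottleneck length and observed P(change) per bin."""
    bins: List[List[Tuple[float, bool]]] = [[] for _ in cutoffs]
    for years, changed in observations:
        if years == 0:
            years = zero_replacement  # a zero-year bottleneck is treated as 50 years
        idx = bisect.bisect_right(cutoffs, years)  # assumed half-open bins [lower, upper)
        if idx < len(cutoffs):  # drop bottlenecks at or beyond the last cutoff
            bins[idx].append((years, changed))

    pairs: List[Tuple[float, float]] = []
    for members in bins:
        if len(members) < min_points:
            continue  # too few data points for a reliable rate
        mean_years = sum(years for years, _ in members) / len(members)
        p_change = sum(1 for _, changed in members if changed) / len(members)
        pairs.append((mean_years, p_change))
    return pairs


# Usage: 30 zero-year points (10 with a change), plus bins that get filtered out.
obs = [(0, False)] * 20 + [(0, True)] * 10 + [(150, True)] * 5 + [(5000, True)] * 40
print(bin_observations(obs))
```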
If a bin contains at least 30 data points, I include it in the set of [x, y] pairs used to find the best-fit Poisson distribution, where:
x = average bottleneck length of the points in the bin
y = observed mutation rate (probability), i.e. the fraction of points in the bin in which the allele changed
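For reference, the Poisson model behind the fit can be written out. Assuming mutations of a given STR-allele occur as a Poisson process with a constant rate λ per year (an assumption of this sketch), the probability of at least one mutation in a bottleneck of x years, its reparameterization by μ, and the least-squares fit over the binned [x, y] pairs are:

```latex
P(\text{at least one mutation in } x \text{ years}) = 1 - e^{-\lambda x},
\qquad \lambda = \frac{\ln 2}{\mu}
\;\Longrightarrow\;
P = 1 - 0.5^{x/\mu},
\qquad
\hat{\mu} = \arg\min_{\mu} \sum_{i} \left(1 - 0.5^{x_i/\mu} - y_i\right)^2
```

With this parameterization, μ is the bottleneck length at which the probability of at least one mutation reaches 50%, and the expression matches the one inside `bucketize_loss`. If the tabulated "Mutation Rate (years)" values are this μ, a larger value means a slower-mutating allele and the per-year rate is λ = ln 2 / μ.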
The reasons I do not use mutation rates from longer bottlenecks are:
- Fewer data points in these bins make them less reliable.
- The longer the elapsed time between observations, the greater the chance that "no observed change" masks a hidden mutation away from the original value followed by a mutation back to it. This can badly distort the Poisson best fit, because in reality the probability of a mutation should always increase with longer bottlenecks.
- For DYS456 = 15, I found an odd apparent stability at bottleneck lengths of 10,000 years and older that did not match its observed behavior at shorter bottlenecks, where it is less stable than DYS456 = 14. This could be due to the small number of samples combined with the effect above, or perhaps, in some haplogroups (most notably E), the ancestral DYS456 = 15 allele has some other base pairs mixed in that make it more stable than other DYS456 alleles (just a possible explanation I read about in another paper).
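To illustrate the hidden back-mutation effect, the sketch below compares the Poisson probability of at least one mutation with the probability that the child allele actually differs from the parent. It assumes a simplified symmetric single-step model (each mutation adds or removes one repeat with equal probability), which is not the model used in this analysis; the rate ln 2 / 10,000 per year and the function names are arbitrary illustrative choices.

```python
import math


def p_any_mutation(rate: float, years: float) -> float:
    """Poisson probability of at least one mutation within `years`."""
    return 1.0 - math.exp(-rate * years)


def p_observed_change(rate: float, years: float, max_steps: int = 200) -> float:
    """P(child allele != parent allele) under a symmetric +/-1 stepwise model.

    Even numbers of mutations can cancel out, so some mutations stay hidden.
    """
    mean = rate * years
    p_k = math.exp(-mean)  # Poisson probability of exactly 0 mutations
    p_unchanged = 0.0
    for k in range(max_steps + 1):
        if k > 0:
            p_k *= mean / k  # Poisson recurrence: P(k) = P(k - 1) * mean / k
        if k % 2 == 0:
            # Only an even number of +/-1 steps can return to the starting allele.
            p_unchanged += p_k * math.comb(k, k // 2) / 2 ** k
    return 1.0 - p_unchanged


rate = math.log(2) / 10_000  # illustrative: 50% chance of a mutation in 10,000 years
for years in (500, 4500, 20_000):
    print(years, round(p_any_mutation(rate, years), 4), round(p_observed_change(rate, years), 4))
```

The gap between the two probabilities grows with the bottleneck length, which is why long bottlenecks would pull a fitted rate downward.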
Results
There is a very strong positive correlation between mutation rate and allele length for a given STR, as I think we would expect from previous studies.
The increase is sometimes as large as a factor of ten, for example when comparing DYS388 = 12 with DYS388 = 16 or 17 in the table below, where:
- 'n' is the number of data points, i.e. the number of times that this STR's allele was computed as part of the ancestral haplotype of a clade that has a direct child branch with this STR also computed, and for which the bottleneck is less than 4500 years
- 'deltas' is the number of times the child had a different allele from the parent
- Results are only computed for n > 500 and deltas > 25
STR / Allele | Mutation Rate (years) | n | deltas |
DYF406 | |||
9 | 29216 | 2197 | 44 |
10 | 17745 | 7029 | 219 |
11 | 15826 | 7499 | 287 |
12 | 14848 | 2759 | 127 |
DYS19 | |||
13 | 18001 | 1709 | 51 |
14 | 17783 | 10647 | 316 |
15 | 18020 | 4364 | 208 |
16 | 7351 | 2252 | 192 |
DYS388 | |||
12 | 101045 | 10878 | 79 |
13 | 48722 | 1804 | 32 |
14 | 22354 | 1807 | 33 |
15 | 18966 | 1596 | 61 |
16 | 9239 | 854 | 44 |
17 | 10148 | 1595 | 55 |
DYS389I | |||
12 | 24270 | 3875 | 99 |
13 | 15984 | 12549 | 503 |
14 | 15364 | 2943 | 155 |
DYS389II-I | |||
15 | 6278 | 759 | 111 |
16 | 10788 | 10262 | 638 |
17 | 7947 | 6479 | 575 |
18 | 5622 | 1718 | 221 |
DYS390 | |||
22 | 20139 | 2417 | 76 |
23 | 20283 | 7389 | 238 |
24 | 11635 | 6430 | 345 |
25 | 10224 | 2623 | 171 |
DYS391 | |||
10 | 28875 | 11332 | 271 |
11 | 9058 | 7668 | 468 |
DYS392 | |||
11 | 83239 | 11381 | 85 |
13 | 28544 | 4390 | 97 |
14 | 21737 | 1935 | 51 |
DYS393 | |||
12 | 50845 | 5521 | 64 |
13 | 24632 | 10889 | 237 |
14 | 15755 | 2516 | 122 |
DYS425 | |||
12 | 95952 | 16209 | 101 |
DYS434 | |||
9 | 121403 | 17917 | 85 |
DYS435 | |||
11 | 185942 | 19151 | 61 |
DYS436 | |||
12 | 223948 | 19203 | 50 |
DYS437 | |||
14 | 105415 | 10091 | 60 |
15 | 22848 | 6271 | 185 |
16 | 41644 | 2950 | 38 |
DYS438 | |||
10 | 70793 | 10169 | 81 |
11 | 73080 | 3144 | 33 |
12 | 34357 | 3942 | 60 |
DYS439 | |||
10 | 14526 | 2852 | 101 |
11 | 12153 | 8268 | 451 |
12 | 9377 | 7549 | 702 |
13 | 4438 | 840 | 59 |
DYS441 | |||
13 | 23570 | 6356 | 148 |
14 | 20390 | 5245 | 185 |
15 | 13819 | 3242 | 128 |
16 | 10952 | 2782 | 181 |
18 | 2727 | 595 | 65 |
DYS442 | |||
11 | 17999 | 5524 | 203 |
12 | 11573 | 10689 | 583 |
13 | 7347 | 1643 | 160 |
14 | 5565 | 1268 | 139 |
DYS444 | |||
11 | 22024 | 1787 | 72 |
12 | 12206 | 9452 | 490 |
13 | 9931 | 5442 | 387 |
14 | 7369 | 2366 | 192 |
DYS445 | |||
10 | 36389 | 1763 | 26 |
11 | 39420 | 7401 | 93 |
12 | 31151 | 9633 | 206 |
DYS446 | |||
11 | 11052 | 620 | 26 |
12 | 9737 | 3359 | 199 |
13 | 9241 | 7942 | 554 |
14 | 12440 | 4163 | 241 |
15 | 7340 | 1642 | 145 |
16 | 5052 | 765 | 91 |
DYS447 | |||
23 | 16196 | 2422 | 125 |
24 | 12007 | 2274 | 138 |
25 | 9001 | 5412 | 397 |
26 | 9822 | 3637 | 291 |
27 | 12323 | 704 | 57 |
DYS448 | |||
19 | 23432 | 6659 | 190 |
20 | 18277 | 9344 | 279 |
21 | 15864 | 2076 | 115 |
DYS449 | |||
25 | 3956 | 1739 | 141 |
26 | 3493 | 641 | 69 |
27 | 6139 | 933 | 99 |
28 | 7874 | 2126 | 291 |
29 | 5732 | 4337 | 719 |
30 | 6703 | 2614 | 356 |
31 | 4869 | 1894 | 298 |
32 | 4733 | 2500 | 481 |
33 | 5511 | 679 | 98 |
DYS450 | |||
8 | 133620 | 14098 | 66 |
DYS452 | |||
29 | 20376 | 3052 | 62 |
30 | 18158 | 6733 | 291 |
31 | 17987 | 5080 | 198 |
32 | 4687 | 507 | 44 |
DYS454 | |||
11 | 110109 | 17102 | 86 |
12 | 32383 | 1956 | 34 |
DYS455 | |||
11 | 158727 | 17169 | 96 |
DYS456 | |||
14 | 15434 | 5004 | 164 |
15 | 12116 | 8133 | 592 |
16 | 6599 | 4343 | 491 |
17 | 4891 | 1477 | 153 |
DYS458 | |||
15 | 8300 | 4098 | 333 |
16 | 7581 | 4093 | 446 |
17 | 6129 | 5989 | 776 |
18 | 7556 | 2835 | 304 |
19 | 3013 | 785 | 77 |
DYS460 | |||
9 | 19255 | 980 | 34 |
10 | 15741 | 7786 | 317 |
11 | 15052 | 10650 | 484 |
DYS461 | |||
11 | 30086 | 6986 | 120 |
12 | 15951 | 10085 | 396 |
13 | 11227 | 2026 | 123 |
DYS462 | |||
11 | 51412 | 10393 | 107 |
12 | 41681 | 7767 | 113 |
13 | 12228 | 1241 | 64 |
DYS463 | |||
21 | 28588 | 3385 | 81 |
22 | 23623 | 4911 | 131 |
23 | 19246 | 910 | 30 |
24 | 18818 | 4094 | 160 |
DYS481 | |||
21 | 12716 | 1048 | 49 |
22 | 8996 | 5030 | 404 |
23 | 9418 | 3210 | 245 |
24 | 7212 | 1827 | 188 |
25 | 6410 | 4340 | 426 |
26 | 7091 | 1770 | 167 |
27 | 3955 | 792 | 114 |
DYS485 | |||
14 | 36493 | 1900 | 32 |
15 | 23223 | 12503 | 335 |
16 | 18570 | 959 | 33 |
17 | 12509 | 1007 | 64 |
DYS487 | |||
12 | 35236 | 1843 | 26 |
13 | 38597 | 9808 | 212 |
14 | 19626 | 4143 | 153 |
15 | 17968 | 821 | 46 |
DYS490 | |||
12 | 90375 | 16251 | 134 |
DYS492 | |||
12 | 128950 | 15093 | 63 |
DYS494 | |||
9 | 160359 | 16604 | 54 |
DYS495 | |||
14 | 83398 | 1892 | 27 |
15 | 28902 | 10312 | 206 |
16 | 29674 | 4844 | 118 |
17 | 10641 | 1923 | 83 |
DYS497 | |||
13 | 21826 | 1357 | 26 |
14 | 28223 | 10425 | 265 |
15 | 32281 | 6210 | 142 |
DYS504 | |||
13 | 22707 | 1899 | 47 |
14 | 13081 | 2736 | 134 |
15 | 10912 | 5190 | 294 |
16 | 8906 | 3500 | 318 |
17 | 6708 | 5238 | 532 |
18 | 5712 | 505 | 45 |
DYS505 | |||
11 | 24336 | 5154 | 148 |
12 | 15519 | 7670 | 337 |
13 | 15750 | 5503 | 234 |
DYS510 | |||
16 | 27091 | 1448 | 39 |
17 | 13673 | 11752 | 574 |
18 | 8966 | 4925 | 342 |
19 | 5064 | 884 | 104 |
DYS511 | |||
9 | 24832 | 7395 | 127 |
10 | 19559 | 9880 | 298 |
11 | 18768 | 2064 | 96 |
DYS513 | |||
11 | 21970 | 5602 | 146 |
12 | 14334 | 8968 | 409 |
13 | 13434 | 4461 | 246 |
DYS520 | |||
18 | 39058 | 1368 | 28 |
20 | 17822 | 7962 | 282 |
21 | 15522 | 7068 | 275 |
22 | 10325 | 1361 | 80 |
DYS522 | |||
10 | 32436 | 5237 | 96 |
11 | 21700 | 6913 | 199 |
12 | 15094 | 5102 | 296 |
13 | 4808 | 1780 | 191 |
14 | 2955 | 538 | 54 |
DYS525 | |||
9 | 37482 | 2341 | 43 |
10 | 19365 | 13029 | 314 |
11 | 16326 | 3250 | 177 |
12 | 7193 | 966 | 72 |
DYS531 | |||
10 | 66577 | 2174 | 28 |
11 | 70472 | 15955 | 157 |
DYS532 | |||
9 | 13502 | 900 | 47 |
10 | 14218 | 2957 | 102 |
11 | 11748 | 5253 | 279 |
12 | 9328 | 2909 | 197 |
13 | 7046 | 4369 | 408 |
14 | 6779 | 1674 | 197 |
15 | 9773 | 552 | 54 |
DYS533 | |||
10 | 16240 | 1061 | 36 |
11 | 17847 | 8162 | 240 |
12 | 14003 | 8855 | 448 |
13 | 6673 | 1100 | 83 |
DYS534 | |||
13 | 9833 | 2043 | 125 |
14 | 8788 | 1997 | 176 |
15 | 6773 | 7213 | 769 |
16 | 6922 | 4546 | 602 |
17 | 6729 | 2605 | 341 |
18 | 2084 | 576 | 83 |
DYS537 | |||
10 | 37390 | 5478 | 98 |
11 | 31193 | 10374 | 237 |
12 | 7569 | 2946 | 166 |
DYS540 | |||
11 | 30002 | 5451 | 90 |
12 | 20402 | 12700 | 420 |
DYS549 | |||
11 | 12448 | 2318 | 99 |
12 | 13447 | 10647 | 584 |
13 | 8105 | 6254 | 572 |
DYS552 | |||
23 | 14691 | 993 | 31 |
24 | 9935 | 9486 | 474 |
25 | 11416 | 5058 | 304 |
26 | 9621 | 2298 | 199 |
27 | 7964 | 1219 | 117 |
DYS556 | |||
11 | 42804 | 7950 | 118 |
12 | 17801 | 10202 | 292 |
DYS557 | |||
14 | 14937 | 3150 | 147 |
15 | 12508 | 4680 | 248 |
16 | 9909 | 5319 | 370 |
17 | 11047 | 1372 | 122 |
18 | 7167 | 3296 | 278 |
19 | 5447 | 1048 | 154 |
DYS561 | |||
14 | 37432 | 3546 | 60 |
15 | 27535 | 13166 | 298 |
16 | 13675 | 2609 | 141 |
DYS565 | |||
11 | 99154 | 10717 | 76 |
12 | 24135 | 5490 | 175 |
13 | 8957 | 1774 | 105 |
DYS568 | |||
11 | 76262 | 15173 | 138 |
12 | 13177 | 2687 | 145 |
DYS570 | |||
16 | 7891 | 1420 | 149 |
17 | 6015 | 5764 | 661 |
18 | 6342 | 5454 | 670 |
19 | 5691 | 3964 | 588 |
20 | 3543 | 1517 | 282 |
DYS572 | |||
10 | 50002 | 3320 | 59 |
11 | 23010 | 11221 | 357 |
12 | 9068 | 2298 | 143 |
DYS575 | |||
10 | 266733 | 19200 | 48 |
DYS576 | |||
15 | 7523 | 890 | 94 |
16 | 6037 | 2783 | 308 |
17 | 6294 | 6081 | 812 |
18 | 5415 | 7786 | 1145 |
19 | 6519 | 1369 | 171 |
DYS578 | |||
8 | 181190 | 13555 | 47 |
DYS587 | |||
18 | 27689 | 12123 | 250 |
19 | 25114 | 4867 | 144 |
20 | 11307 | 737 | 39 |
21 | 9030 | 733 | 55 |
DYS589 | |||
11 | 38656 | 6032 | 113 |
12 | 38405 | 7794 | 141 |
13 | 23742 | 3668 | 137 |
DYS590 | |||
8 | 208217 | 18205 | 44 |
DYS593 | |||
15 | 145757 | 16877 | 70 |
16 | 56697 | 2249 | 29 |
DYS594 | |||
10 | 48663 | 14013 | 124 |
11 | 60558 | 4862 | 63 |
DYS607 | |||
12 | 22845 | 1281 | 29 |
13 | 17345 | 2486 | 90 |
14 | 17463 | 7249 | 261 |
15 | 14681 | 5754 | 293 |
16 | 6886 | 2480 | 186 |
DYS617 | |||
12 | 53219 | 9820 | 134 |
13 | 31346 | 3308 | 81 |
DYS635 | |||
20 | 13267 | 1115 | 79 |
21 | 10481 | 7702 | 584 |
22 | 6256 | 2859 | 378 |
23 | 12195 | 6159 | 286 |
24 | 9542 | 977 | 77 |
DYS636 | |||
11 | 137379 | 14125 | 66 |
12 | 48176 | 4710 | 57 |
DYS638 | |||
11 | 58648 | 14726 | 163 |
12 | 16319 | 2180 | 53 |
DYS640 | |||
11 | 82075 | 12946 | 76 |
12 | 39882 | 6153 | 106 |
DYS641 | |||
10 | 111017 | 18081 | 128 |
11 | 6726 | 561 | 28 |
DYS643 | |||
9 | 41657 | 3518 | 51 |
10 | 20703 | 7977 | 216 |
11 | 20914 | 2853 | 98 |
12 | 15005 | 4116 | 178 |
13 | 8354 | 782 | 49 |
DYS650 | |||
17 | 13612 | 1533 | 118 |
18 | 7692 | 3511 | 412 |
19 | 5709 | 6135 | 791 |
20 | 7133 | 4218 | 452 |
21 | 5589 | 1717 | 221 |
DYS710 | |||
30 | 4208 | 901 | 137 |
31 | 4928 | 1723 | 300 |
32 | 4849 | 2408 | 457 |
33 | 4957 | 3487 | 613 |
34 | 5208 | 2666 | 505 |
35 | 3917 | 2756 | 564 |
36 | 5167 | 1013 | 163 |
DYS712 | |||
19 | 7751 | 3234 | 273 |
20 | 5723 | 6291 | 791 |
21 | 5427 | 1788 | 283 |
22 | 4448 | 3024 | 545 |
23 | 3610 | 1074 | 231 |
24 | 3122 | 876 | 193 |
26 | 2913 | 813 | 202 |
27 | 4132 | 560 | 134 |
28 | 4159 | 619 | 154 |
DYS714 | |||
21 | 12178 | 533 | 38 |
22 | 7795 | 1001 | 92 |
23 | 7524 | 1956 | 166 |
24 | 7359 | 3424 | 343 |
25 | 6144 | 4645 | 584 |
26 | 5540 | 4290 | 624 |
27 | 5729 | 1591 | 208 |
DYS715 | |||
21 | 12658 | 768 | 34 |
22 | 15239 | 4230 | 176 |
23 | 11588 | 6001 | 337 |
24 | 8187 | 7172 | 571 |
25 | 8343 | 1029 | 78 |
DYS716 | |||
26 | 57439 | 2537 | 42 |
27 | 32288 | 4457 | 87 |
DYS717 | |||
19 | 50117 | 13654 | 184 |
20 | 41485 | 3845 | 63 |
21 | 25128 | 812 | 32 |
DYS726 | |||
12 | 144620 | 16550 | 70 |
Y-GATA-A10 | |||
12 | 17476 | 6129 | 247 |
13 | 9134 | 10197 | 733 |
14 | 5391 | 2601 | 246 |
Y-GATA-H4 | |||
10 | 22304 | 8945 | 244 |
11 | 12389 | 8470 | 461 |
12 | 5693 | 1307 | 111 |
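The factor-of-ten comparison for DYS388 can be checked directly from the table. Because a larger tabulated value corresponds to a slower-mutating allele (DYS388 = 12 has the largest value and is described as the slowest), the ratio of mutation rates is the inverse ratio of the tabulated values. The `speedup` helper is a hypothetical name.

```python
# "Mutation Rate (years)" values for DYS388, copied from the table above.
dys388 = {12: 101045, 13: 48722, 14: 22354, 15: 18966, 16: 9239, 17: 10148}


def speedup(table, slow_allele, fast_allele):
    """How many times faster `fast_allele` mutates than `slow_allele`.

    A larger tabulated value means a slower allele, so the ratio of rates is
    the inverse ratio of the tabulated values.
    """
    return table[slow_allele] / table[fast_allele]


for allele in (16, 17):
    print(f"DYS388 = {allele} vs 12: {speedup(dys388, 12, allele):.1f}x")
# DYS388 = 16 vs 12: 10.9x
# DYS388 = 17 vs 12: 10.0x
```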
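def bucketize_loss(params, x_array, y_array):
    # Sum of squared errors between the modelled and observed mutation probability per bin.
    # params[0] is mu: the bottleneck length (years) at which the modelled probability is 50%.
    mu = params[0]
    loss = 0.0
    for x, y in zip(x_array, y_array):
        # x: average bottleneck length of the bin (years); y: observed mutation probability
        p_mutation = 1 - 0.5 ** (x / mu)  # modelled probability of at least one mutation
        error = p_mutation - y
        loss += error * error
    return loss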
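One way to fit μ by minimizing `bucketize_loss`, sketched with only the Python standard library: a coarse log-scale grid search followed by golden-section refinement. The text does not say which optimizer was actually used, so `fit_mu`, its search bounds (100 to 1,000,000 years) and the synthetic data below are assumptions; `bucketize_loss` is restated compactly so the block runs on its own.

```python
import math
from typing import Sequence


def bucketize_loss(params, x_array, y_array):
    """Sum of squared errors between 1 - 0.5 ** (x / mu) and the observed y."""
    mu = params[0]
    return sum((1 - 0.5 ** (x / mu) - y) ** 2 for x, y in zip(x_array, y_array))


def fit_mu(x_array: Sequence[float], y_array: Sequence[float],
           lo: float = 100.0, hi: float = 1_000_000.0, steps: int = 2000) -> float:
    """Hypothetical fitter: log-scale grid search, then golden-section refinement."""
    log_lo, log_hi = math.log(lo), math.log(hi)
    grid = [math.exp(log_lo + (log_hi - log_lo) * i / steps) for i in range(steps + 1)]
    best = min(range(len(grid)), key=lambda i: bucketize_loss([grid[i]], x_array, y_array))
    # Refine between the grid neighbours of the best grid point.
    a, b = grid[max(best - 1, 0)], grid[min(best + 1, steps)]
    inv_phi = (math.sqrt(5) - 1) / 2
    for _ in range(100):
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if bucketize_loss([c], x_array, y_array) < bucketize_loss([d], x_array, y_array):
            b = d
        else:
            a = c
    return (a + b) / 2


# Usage with synthetic, noise-free bins generated from mu = 12,000 years.
xs = [50, 150, 300, 500, 700, 900, 1100, 1350, 1750, 2250, 3000, 4000]
ys = [1 - 0.5 ** (x / 12_000) for x in xs]
print(round(fit_mu(xs, ys)))
```

The grid search comes first because the squared-error loss is not guaranteed to be unimodal over the whole search range; golden-section search then only has to work within a narrow bracket.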