Friday, September 4, 2020

AncestryDNA's Inconsistent cM Totals

Edit: See bottom of article for update.

For several years now, because both of my parents took the DNA test, I have noticed certain DNA matches who share more DNA with me than with one of my parents (usually my mom) and none with the other. In most cases, the difference is less than about 5 cM, which is small enough that I figure it's nominal and doesn't matter. But I also have many matches where the difference is 10 cM or greater, which is harder to ignore. The greatest difference I've come across so far has been 20 cM. And I know I'm not the only one: I've talked to a lot of other people who have noticed the same.

Recently, AncestryDNA added the ability to see the longest shared segment with a match to the very little DNA matching data they provide. This has been enlightening, because as many people have already noticed, there are some cases where the longest shared segment is greater than the total amount of shared DNA. Naturally, this isn't genetically possible, and it's left many people confused. AncestryDNA tried to provide an explanation for it:

"In some cases, the length of the longest shared segment is greater than the total length of shared DNA. This is because we adjust the length of shared DNA to reflect DNA that is most likely shared from a recent ancestor. Sometimes, DNA can be shared for reasons other than recent ancestry, such as when two people share the same ethnicity or are from the same regions."

They are trying to keep it simple, but unfortunately I think it serves only to confuse most people even more. Here's what this means.

AncestryDNA have a program called Timber that removes shared segments it believes are not identical by descent (i.e., the shared DNA is not coming from an ancestor within a genealogical time frame, but rather from a shared ethnic background). What AncestryDNA's explanation is saying is that they are applying Timber to the total shared DNA, but not to the longest segment. This explains the inconsistency between the totals and the longest segment, but not the reasoning behind the bizarre choice to apply it to one and not the other. If you find this frustrating, you're not the only one.

What does this mean for the inconsistent shared totals with a match between parent and child? Well, I've noticed that often, when the totals are inconsistent, so are the total and the longest segment. This tells me the same Timber action that's removing segments from the totals but not the longest segment is probably what is causing the inconsistent totals between parents and children.

Take for example, this DNA match "RB":

RB shares 39 cM across 2 segments with me, longest segment 47 cM
RB shares 19 cM across 2 segments with my mom, longest segment 47 cM


So, my mom and I both actually share one 47 cM segment with RB, but Timber has removed a chunk in the middle of it (making 2 smaller segments). That's not necessarily a bad thing if that chunk isn't identical by descent, but for some inexplicable reason, Timber took a larger chunk from the shared DNA with my mom than with me. That shouldn't be happening: because it's the same segment, it should be removing the same amount from each. Instead, it's taking the same shared 47 cM segment and removing 28 cM (47 - 19) from one person but only 8 cM (47 - 39) from the other. That doesn't make sense, and it doesn't exactly instill much confidence in Timber and its reliability.
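To make the arithmetic explicit, here is a minimal Python sketch using only the RB numbers above. It assumes, as I do in this post, that the longest segment is reported before Timber, the total is reported after Timber, and the whole match is really one underlying segment. The function and variable names are my own invention for illustration; none of this is an AncestryDNA tool or API.

```python
# Illustrative only: the names below are hypothetical, not an AncestryDNA API.
# Assumption: the longest segment is the pre-Timber length of a single shared
# segment, and the total is what remains after Timber.

def implied_timber_removal(total_cm, longest_segment_cm):
    """Return the cM apparently removed by Timber (0 if the total is not smaller)."""
    return max(0, longest_segment_cm - total_cm)

# Figures reported for match "RB" in this post.
rb = {
    "me": {"total_cm": 39, "longest_segment_cm": 47},
    "mom": {"total_cm": 19, "longest_segment_cm": 47},
}

removed = {who: implied_timber_removal(**vals) for who, vals in rb.items()}
for who, cm in removed.items():
    print(f"{who}: {cm} cM apparently removed")

# The two warning signs discussed above.
longest_exceeds_total = all(v["longest_segment_cm"] > v["total_cm"] for v in rb.values())
child_exceeds_parent = rb["me"]["total_cm"] > rb["mom"]["total_cm"]
gap = removed["mom"] - removed["me"]
print(f"longest > total for both: {longest_exceeds_total}")
print(f"child shares more than parent: {child_exceeds_parent} (gap of {gap} cM)")
```

The gap between what was removed for my mom and what was removed for me is the whole parent/child discrepancy for this match, which is what I'd expect if Timber is the cause.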

My theory on why this is happening is that it may have to do with endogamy. Most of the matches I've noticed with this problem are on my mom's side, particularly from endogamous branches. Granted, my dad has some endogamous branches too, but my mom has a fairly recent Mennonite branch, who are highly endogamous, and many of these matches are from that branch. I don't know whether endogamy is maybe messing with Timber, or Timber is trying to remove endogamous segments, but whatever it's doing, it shouldn't be doing it so inconsistently, and frankly, I can't believe this issue has gone on for so long unresolved (except it's Ancestry, so I can believe it).

Edit (24 Sep 2020): Recently, AncestryDNA added to the DNA data they provide the "unweighted shared DNA" total - which is the amount of DNA you share with a match before Timber is applied. You can find it by clicking on either the longest segment data or the shared total for more information. This means the inconsistencies between the total and the longest segment make more sense, and so do the matches where I share more than my parent does, but I fear it's only going to cause more questions about what an unweighted total is, why there are two totals, why they are sometimes so drastically different, and which total do we rely on? Theoretically, we should be able to rely more on the weighted (Timber) total, but since I don't trust Timber, there is no easy answer to the last question.

But at least I can now see the original total with matches, which unsurprisingly is now much more consistent with the original total they share with my parent. There are a couple that still have a discrepancy of 6 cM or less, but that's somewhat nominal, I suppose.
