Friday, September 29, 2017

MyHeritageDNA Matching Issues

I'm amazed, just not in a good way
I'm going to illustrate why your DNA match list at MyHeritageDNA shouldn't be trusted. This has been touched on in a few other blogs (see here and here), but I want to highlight and update some details which are really very concerning.

The main problem is that the system is clearly including a high percentage of either false positive matches or false negatives - and more than that, they are not necessarily weak matches with only small segments shared. Granted, false positives are a part of DNA matching no matter what company you test with. The nature of DNA means there are always going to be matches known as "Identical by State/Type" (IBS or IBT) versus "Identical by Descent" (IBD). Not to be confused with the medical bowel conditions using the same abbreviations, IBS matches are ones which share small amounts of DNA with you by chance, making them false positives, whereas IBD matches share DNA with you because it comes from a common ancestor. Normal IBS matches only share small segments of DNA with you. This is why most companies have a cutoff point where any match sharing no segments above about 7, 6, or 5 cM (centimorgans) is automatically excluded, in an attempt to reduce the number of IBS/false positive matches included. The less DNA you share, the more likely it is an IBS match. According to ISOGG, "False positive matching rates of between 12% and 23% have been reported for Family Finder data, and up to 34% at Ancestry using their current algorithm." So this is a normal part of DNA matching.
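To make the cutoff idea concrete, here is a minimal Python sketch of that kind of filter. The function name, the 7 cM default, and the sample segment lengths are all made up for illustration; it is not any company's actual algorithm.

```python
def passes_cutoff(segments_cm, min_cm=7.0):
    """Return True if at least one shared segment reaches the cutoff.

    segments_cm: lengths (in cM) of the segments shared with one match.
    min_cm: hypothetical cutoff; companies reportedly use about 5-7 cM.
    """
    return any(length >= min_cm for length in segments_cm)


# Invented sample data: match name -> shared segment lengths in cM.
sample_matches = {
    "match_a": [3.2, 4.1],   # only tiny segments, likely IBS
    "match_b": [12.5, 6.0],  # one segment well above the cutoff
    "match_c": [7.4],        # just above the cutoff
}

# Keep only the matches that have a segment at or above the cutoff.
kept = [name for name, segs in sample_matches.items() if passes_cutoff(segs)]
print(kept)
```

A higher cutoff throws out more chance matches, but it also throws out more genuine distant cousins, which is why companies settle on different values.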

My closest match on MyHeritage is not a match to either my mother or father
The trouble is, not only is MyHeritageDNA's rate of false positives much higher (around 60% according to my results and others), but more alarmingly, they are not all matches who share only small segments at the bottom of your list like normal IBS matches do. Once you get into the more distant cousins in your match list, you know that some of them are going to be IBS. However, when your top, closest match (after immediate family), who shares 89.5 cM with you (a significant amount), doesn't match either your mom or dad, you know something is very wrong. These are not just expected IBS matches; there is clearly a problem with the DNA matching system.

In fact, MyHeritage's cutoff point for minimum segment size to qualify as a match appears to be 12 cM (none of my matches had a longest segment below this), which should all but ensure that all the matches are IBD, not IBS (the normal cutoff points are typically 5-7 cM)... and yet about 63% of my matches are clearly false positives. One of the blogs I linked to above seems to suggest these false positives are the result of imputed data (explained on their blog). While I don't fully understand imputed data, the blog is written by a professional scientist and is therefore a reliable source.
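For anyone who wants to run the same check on their own kits, a rough sketch of the arithmetic follows. It assumes you have a list of match identifiers for yourself and for each parent; the function name and the IDs below are invented for illustration and are not my real results.

```python
def unexplained_share(child_matches, mother_matches, father_matches):
    """Fraction of the child's matches that appear in neither parent's list.

    Each such match is either a false positive for the child or a
    false negative for a parent; this check alone cannot tell which.
    """
    child = set(child_matches)
    if not child:
        return 0.0
    parents = set(mother_matches) | set(father_matches)
    return len(child - parents) / len(child)


# Invented example: 8 matches for the child, 3 of them seen in a parent's list.
child_ids = ["m1", "m2", "m3", "m4", "m5", "m6", "m7", "m8"]
mother_ids = ["m1", "m2"]
father_ids = ["m3"]

share = unexplained_share(child_ids, mother_ids, father_ids)
print(f"{share:.1%} of matches appear in neither parent's list")
```

This only works when both parents have tested. With one parent missing, a match that fails to show up for the tested parent may simply belong to the other side.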

Of course, it's also possible that instead of being false positives for me, some of them are legit matches to me and false negatives for my parents. But here again, for such a high match to me not to turn up for either of my parents, something is clearly still wrong. If that's the case, it makes you wonder how many strong, legit matches are missing from my own match list too. Supporting the false negatives theory is the fact that none of my matches are shared matches with my Dad (don't worry, he and I match as father/child at all venues so there's no question he is my father). One could argue that's just because anyone on my dad's side who has tested is too distant to also match me, but that would have to mean I also wouldn't have any shared matches with my paternal grandfather, and yet I do. DNA doesn't skip a generation - how can I have shared matches with my paternal grandfather, but not my father, unless they are false negatives for my dad?
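The "DNA doesn't skip a generation" check can be written as a simple set comparison. This is only a sketch with a hypothetical function name and invented match IDs: anyone I share with my grandfather should also be on the list of the parent in between.

```python
def skipped_generation(child_matches, parent_matches, grandparent_matches):
    """Matches shared by child and grandparent but missing for the parent.

    If both shared matches are genuine, the match should also appear for
    the parent who sits between them, so anything returned here points
    to a false negative for the parent (or a false positive for both the
    child and the grandparent).
    """
    shared_with_grandparent = set(child_matches) & set(grandparent_matches)
    return shared_with_grandparent - set(parent_matches)


# Invented example data.
my_ids = {"x1", "x2", "x3"}
dad_ids = {"x3"}
grandfather_ids = {"x1", "x2", "x9"}

suspects = skipped_generation(my_ids, dad_ids, grandfather_ids)
print(sorted(suspects))
```

An empty result is what you would expect from a matching system that is working properly.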

Of lesser concern are MyHeritage's relationship estimate ranges, which are often as specific as "1st cousin twice removed - 4th cousin", for example. No other company attempts to be as specific as this because there is so much overlap in how much DNA may be shared for different relationship types and degrees. Looking at the chart to the right, you'll see that a 1st cousin twice removed is lumped into the same group, and therefore the same range of possible shared DNA, as a 2nd cousin. So why isn't the estimated range 2nd-4th cousins instead? Granted, relationship estimates take other things into consideration - not only the total amount of DNA shared, but over how many segments, and how long those segments are. Even so, if no other company is able to be as specific as 1st cousins twice removed instead of 2nd cousins, how have MyHeritage managed it? It also makes it difficult for people to understand their possible relationship with a match, since a lot of people don't even understand what "removed" means to begin with, and even those who do may have trouble knowing what relationships would fall within a range that includes a "removed". However, since the relationship range is only an estimate anyway, it's not a huge concern, just more of an annoyance.

I should note that all my tests (mine, my mom's, dad's, and paternal grandfather's) were transfers, meaning I had tested with another company and then uploaded my raw DNA data to MyHeritageDNA. I did not buy a test with MyHeritageDNA, and I've seen other blogs saying this makes a difference. However, since the matching database and algorithms are all the same regardless of whether you uploaded or bought a test, I don't see how this could be the case. If there's a false match in my list, then I am a false match in their list, regardless of whether one, both, or neither of us tested directly with MyHeritageDNA.

When you combine this very serious matching problem with the fact that their ethnicity report is also seemingly inferior for most people compared to other companies, it really makes their DNA test pretty worthless. Your mileage may vary regarding the ethnicity report, and of course DNA ethnicity reports are only estimates anyway, but in my experience, and that of many others, it is the least accurate out of all four of the big DNA genealogy companies. I highly recommend you don't get sucked in by their long-running and continually reduced sales, but instead test with a more reliable company (even if it costs more) and then upload your raw DNA data to MyHeritage for free. Because that's about what their results are worth.

UPDATE (01/11/2018): Today I checked my kits' DNA matches at MyHeritage and so far, it looks like many of these issues have finally been resolved. The match in question who shared a significant amount of DNA with me but wasn't a match to either of my parents is now showing as a match to my dad and my paternal grandfather. In fact, my dad's kit now has a lot more matches than he had before (he previously only had 19 matches), many of them matches to me too, resolving the problem that we had no matches in common at first (except each other). Several of his original matches are now missing, suggesting they may have been false positives. And several of those still present have seen updates to how much DNA they share. So there have been a lot of changes, and it looks like they are good ones - matches now seem to make a lot more sense. Now I'm just looking forward to seeing updates in their ethnicity report.
