Tuesday, September 7, 2021

ThruLines is not the enemy

I see a lot of skepticism out there about ThruLines, and some of it is warranted, because it is based on family trees, which can have errors that get copied multiple times. But that doesn't mean you should dismiss ThruLines entirely. There are ways to get reliable use out of it, and not just by finding records that confirm its suggestions. There are ways to use DNA to find biological relatives or break down brick walls in your tree even when there are no written records of the lineage, and ThruLines is just one tool that can help you do this.


It's basically a matter of probabilities. The more people you match who are descended from multiple siblings of your ancestor, the less likely it becomes that it's an error, especially when those descendants all or mostly match each other to form a cluster. When the matches form a cluster like that, you know they are all related and descended from the same branch/ancestor - you just need to identify which branch/ancestor, which is where trees and ThruLines come in. For the trees/ThruLines to be wrong, the line from each sibling down to those matches would have to be an error, so the more siblings you match descendants of, the more likely the trees are accurate. If you match 20 people (who mostly all match each other too) descended from 5 siblings of your ancestor, what are the chances there's been an error in the trees for each of those 5 siblings, plus your own ancestor? Extremely unlikely. In the example above, there are 41 matches descended from 8 siblings of Elizabeth Mertz, so for this all to be wrong, there would have to be 9 different errors: one for each of the 8 siblings, plus one for Elizabeth herself. This amount of evidence is really very conclusive, and I can probably confirm this family now.
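To put rough numbers on that, here is a minimal sketch in Python. It rests on a big simplifying assumption that is mine, not Ancestry's: that each line has the same chance of being wrong and that the errors are independent of each other. Copied tree errors are not truly independent, so treat this as an illustration of the reasoning, not a real calculation. The function name and the error rates are made up for the example.

```python
def chance_all_wrong(per_line_error: float, lines: int) -> float:
    """Probability that every one of `lines` tree lines is wrong.

    Assumes (hypothetically) that each line has the same error rate
    and that the errors are independent of one another.
    """
    if not 0.0 <= per_line_error <= 1.0:
        raise ValueError("per_line_error must be between 0 and 1")
    return per_line_error ** lines


# 8 sibling lines plus Elizabeth Mertz's own line = 9 lines in total.
LINES = 9

# Made-up per-line error rates, from very pessimistic to fairly optimistic.
for rate in (0.5, 0.2, 0.05):
    print(f"per-line error {rate:.0%}: all {LINES} wrong = "
          f"{chance_all_wrong(rate, LINES):.2e}")
```

Even if you assume a coin-flip 50% chance of error on every single line, the chance of all 9 being wrong at once is 1 in 512, and it shrinks very quickly as the assumed error rate drops.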

Even assuming there's only one error and those siblings are indeed siblings to each other, but your ancestor is the lone error, and not actually their sibling, what are the chances you would match that many people from a certain family, if you weren't related to that family somehow? Using the example above again, what are the chances I match 41 people descended from those 8 siblings, if Elizabeth Mertz is not one of their siblings? Again, it's very unlikely - and the only way this would be possible is if there was a lot of endogamy involved, but even so, it would still be pointing you towards a specific population you're likely descended from (and matching surnames from the same endogamous population means you're probably related to that specific family somehow), so you don't want to dismiss it.

Granted, it doesn't confirm who exactly the parents of those siblings are, only that they are indeed siblings. For that, you'd have to go up another generation and do the same thing - look for people descended from siblings of the alleged father and mother. In the example above, it doesn't really confirm that Phillip Mertz is the father of Elizabeth and all her siblings, only that they are siblings from the same parent(s), whoever that may be. But for now, it's probably safe to add Phillip Mertz at least as a placeholder until more research can be done (it really is okay to add speculative data to your tree as long as you know it's speculative!).

In the example below, you can see how this ThruLines doesn't confirm descent from Benjamin Butler - the 6 DNA matches are all descendants of children of David Butler, so this really doesn't confirm this potential ancestor at all.

And there are other limitations, mainly the fact that the Shared Matches tool (which is the only way to confirm whether matches match each other and form a cluster) only includes estimated 4th cousins or closer (20+ cM). AncestryDNA really needs to provide something more comprehensive. They say it's limited to 20+ cM because it would tax the server too much if they expanded it to include all matches. But at the very least, they could expand it to 15+ cM segments, which have a 100% chance of being identical by descent (IBD). That would still exclude most matches (8-15 cM) and therefore not be as taxing on the server, but it would include all matches that have a 100% chance of being IBD, which would make ThruLines so much more useful and reliable. At the moment, they are excluding hundreds, even thousands of IBD matches from the Shared Matches tool, which is extremely debilitating. Alternatively, they could offer another tool that would be less taxing on the server - a simple one-to-one comparison. Pop in two match usernames, and it would tell us whether those two matches match each other or not. Very simple, not very taxing, but it would get the job done.
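Just to show how little that one-to-one comparison would need to do, here is a rough sketch in Python. Everything in it is hypothetical: the usernames, the cM values, and the function names are made up, and it has nothing to do with how Ancestry actually stores or queries its data. It also treats the cutoff as a simple shared cM figure per pair, which is a simplification.

```python
from itertools import combinations

# Hypothetical shared cM between pairs of my matches (made-up names and values).
SHARED_CM = {
    frozenset({"match_a", "match_b"}): 34.0,
    frozenset({"match_a", "match_c"}): 17.5,
    frozenset({"match_b", "match_c"}): 9.0,
}


def matches_each_other(user1: str, user2: str, min_cm: float = 20.0) -> bool:
    """One-to-one check: do these two matches share at least min_cm?"""
    shared = SHARED_CM.get(frozenset({user1, user2}), 0.0)
    return shared >= min_cm


def visible_pairs(users: list[str], min_cm: float) -> list[tuple[str, str]]:
    """All pairs of matches that would show as shared at a given cutoff."""
    return [
        (a, b)
        for a, b in combinations(sorted(users), 2)
        if matches_each_other(a, b, min_cm)
    ]


users = ["match_a", "match_b", "match_c"]
print(visible_pairs(users, 20.0))  # pairs visible at the current 20 cM cutoff
print(visible_pairs(users, 15.0))  # pairs visible if the cutoff dropped to 15 cM
```

In this made-up data, match_a and match_c share 17.5 cM, so they only show up as matching each other once the cutoff drops to 15 cM. That is exactly the kind of connection the current 20 cM limit hides.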

Even so, it's still possible to get reliable usage out of ThruLines. Remember, ThruLines is only automating a process that people used to do manually (and still do when the relationship exceeds ThruLines' 5th great grandparent limit). If it weren't possible to use DNA to confirm relationships when there is no written record available, what use would DNA be, and how do you think all these NPEs are being discovered? While it's true that you do have to watch out for tree errors being replicated in ThruLines, if you understand how DNA and ThruLines work, there is useful data you can get out of it. Too often, I see people who seem to completely dismiss ThruLines, as though it's not reliable at all, but you're only hindering your own research by thinking that.

1 comment:

  1. I totally agree. Verify the connections before you trust them. But about 98% of mine seem correct as far as I can tell.
