Tuesday, August 10, 2021

Finding Unknown Biological Ancestors with DNA

This is a topic that comes up regularly in genealogy circles, because DNA testing often reveals unknown adoptions or what we call "non-paternity events" (NPEs), where someone's presumed father is not their biological father. Once there's enough evidence that something like this has happened, the question becomes: how do I identify this unknown biological ancestor? It can be done, although the further back in your tree it occurred, the more difficult it will be (far enough back, and it might not be possible). Whenever possible, it's best to have someone from the oldest generation descended from this event take the test. For example, if you're looking for Grandma's unknown biological father, have Grandma take the test, or if she is unable or unwilling, have your relevant parent take it. At the same time, if the person you're looking for is still living (like if you're adopted and looking for your biological parents), they will be harder to research, since many records on living people are private (that's a whole different ballgame, and you often have to rely more on information and communication from your DNA matches). Additionally, if you're working with an endogamous population, you may be out of luck. With all that in mind, here's how it works.

Step 1: Look for your closest DNA match that you can't identify as being from another known branch of your tree. If they don't have a family tree attached, that's okay, because first you want to look at their Shared Matches and open any of those matches that do have family trees (the bigger, the better).

Step 2: Compare the family trees of those Shared Matches, looking for ancestors that two or more of them (the more, the better) have in common with each other but who are not found in your tree, especially if those matches also match each other. Yes, this may take some time, because you have to compare the trees manually - I find it best to start with the surname list on the match review page, find surnames they have in common with each other, then see whether those surnames actually lead to a common ancestor among them. If the ancestor is found in your tree, then you know this group isn't from the branch you're looking for, and you can label them and move on.

Step 3: Build a descendant tree for the ancestor you found. Make a note of any descendants who were in the right place at the right time at the right age, but we're not done yet.

Step 4: Repeat this process with the next closest match you can't identify (one who isn't part of the first group).

Step 5: Look for a descendant who appears in both of the trees you've built - someone who descends from both of the ancestors you've identified. This is probably either the person you're looking for or a close ancestor of theirs, like a parent or grandparent. If you don't find one, keep repeating this process until you do.
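The matching logic in Steps 2 and 5 can be sketched in a few lines of Python. This is only a minimal illustration, assuming each person in a tree has already been reduced to a unique identifier (here, a made-up name); in practice the hard part is confirming by hand that two trees' "John Smith" really are the same man. The function names `candidate_ancestors` and `shared_descendants`, and all of the names in the data, are hypothetical.

```python
from collections import Counter

def candidate_ancestors(match_trees, my_tree, min_matches=2):
    """Step 2 (sketch): ancestors found in at least `min_matches` of the
    shared matches' trees that do not appear anywhere in your own tree."""
    counts = Counter()
    for ancestors in match_trees.values():
        counts.update(set(ancestors))  # count each ancestor once per match
    return {name for name, n in counts.items()
            if n >= min_matches and name not in my_tree}

def shared_descendants(descendants_a, descendants_b):
    """Step 5 (sketch): people who descend from both candidate ancestors."""
    return set(descendants_a) & set(descendants_b)

# Hypothetical data, loosely modeled on the (already made-up) example below.
shared_match_trees = {
    "Match A": {"John Smith", "Mary Brown"},
    "Match B": {"John Smith", "Thomas White"},
    "Match C": {"Anna Green"},  # Anna Green is already in my tree
}
my_tree = {"Anna Green"}
print(candidate_ancestors(shared_match_trees, my_tree))  # {'John Smith'}

smith_descendants = {"Isaac Smith", "Son 1", "Son 2", "Son 3", "Son 4"}
jones_descendants = {"Carrie Jones", "Son 1", "Son 2", "Son 3", "Son 4"}
print(sorted(shared_descendants(smith_descendants, jones_descendants)))
# ['Son 1', 'Son 2', 'Son 3', 'Son 4']
```

Step 2 keeps only ancestors found in at least two matches' trees but not in yours; Step 5 is a plain set intersection of the two descendant trees. The real work of building those trees and checking dates, places, and ages still has to be done by hand.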

Chart showing the two different DNA match groups and their shared ancestors.

For example (shown above - these names are made up, but the situation is real and came from my tree): I was looking for my grandfather's unknown biological father, so I had my grandfather take the test before he died. I first found a group of his matches (most of whom matched each other) who all descended from a colonial ancestor named John Smith (I told you I changed the names, lol), so I built a descendant tree for John Smith. I then found another group of matches who all descended from another colonial ancestor named Christopher Jones, and built a descendant tree for him. By building those trees, I found that a descendant of John Smith, named Isaac Smith, had married a descendant of Christopher Jones named Carrie Jones. This suggested that the man I was looking for was probably a descendant of Isaac Smith and Carrie Jones, and based on the dates, it could really only be one of their sons, specifically one of their four oldest sons. Eventually, a close descendant of one of the four sons tested and confirmed which of the four sons was my grandfather's biological father (below).

Chart showing the closer matches that eventually showed up and allowed me to figure out which of the 4 brothers was my grandfather's bio father.

Granted, there could have been another descendant of John Smith who married a different descendant of Christopher Jones, and that could have led me to the wrong family - this is why too much endogamy can throw you off. But as long as there's not too much of it, you can document each case, and by using your DNA matches and how much DNA you share with each of them, you should be able to figure out which descendants are the ones you're looking for. A highly endogamous population, however, might be too complex. If I were looking for an unknown biological ancestor on my mom's Mennonite branch, I'm not sure it would be possible: I sometimes share up to about 5 ancestral couples with matches on that branch. And the unknown father of my Italian ancestor, who came from a tiny, highly endogamous town in Italy where everyone is related to everyone else somehow? Forget it.

Still, this is the same type of method that professional genetic genealogists like CeCe Moore use to identify individuals from DNA left at crime scenes (either suspects or unidentified victims). It can be done (for the most part); it just takes work, and sometimes some patience for the right matches to come in.

Thursday, August 5, 2021

Understanding Admixture and Genetic Overlap at MyHeritageDNA

MyHeritage is generally known for not having the most reliable ethnicity results. This is probably because they were latecomers to the DNA field and haven't yet updated their percentage reports. But of course, all ethnicity percentages are merely an interpretation of our DNA and not necessarily very reliable anyway. What MyHeritage does do a great job of (unlike AncestryDNA) is showing us lots of data on the genetic overlap among neighboring regions, so we can understand how it works. Not only do they show us all available regions and the areas they cover (below), but they also have a section called "Ethnicity Maps" that shows us "the most common ethnicities in each country and the top countries for each ethnicity, according to MyHeritage DNA users' data." Although this is all based on MyHeritage data, it's still a valuable learning tool for understanding admixture in general.


Whichever way you browse the Ethnicity Maps, the percentages show the portion of testers in a given country who get results in a given ethnicity: either all the ethnicities for one country, or all the countries for one ethnicity.


For example, looking at the data by country, if you click on Germany, you'll see that 55.7% of people living in Germany get results in the "North and West European" ethnicity (above). We don't know what average percentage they get for "North and West European" because the data doesn't include that, but it's probably pretty high. The list then goes on to include another 19 ethnicities, down to 1.2%, from all over Europe, Asia, Africa, and even Native America. Not all of that is due to genetic overlap; some of it simply reflects, for example, a few Asian people living in Germany who took the test. For genetic overlap, it's probably best to look at the top 5 ethnicities, which is likely why the default view shows the top 5. That view shows that lots of people in Germany also get results in East European (48.9% of testers), Scandinavian (43.6%), Balkan (38.1%), and English (23.3%), illustrating the strong genetic overlap Germany has with those nearby areas. That means if you have known German ancestry, it would not be uncommon to get results in any of those neighboring regions, especially (though not exclusively) in MyHeritage's results.
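To see that overlap in the numbers themselves, it helps to add up the top five German figures quoted above. This is a small Python sketch, assuming all five percentages are measured over the same group of testers in Germany. Because one tester can get results in several ethnicities, the shares can sum to well over 100%, and that sum divided by 100 is the average number of these five categories each tester gets results in.

```python
# Top-5 figures for Germany as quoted above: share of testers in Germany
# who get any result in each ethnicity (assumed to be the same tester pool).
germany_top5 = {
    "North and West European": 55.7,
    "East European": 48.9,
    "Scandinavian": 43.6,
    "Balkan": 38.1,
    "English": 23.3,
}

total = sum(germany_top5.values())

# Each tester can be counted under several ethnicities, so the shares can add
# up to more than 100%. Summing per-category shares and dividing by 100 gives
# the average number of these five categories per tester.
avg_categories = total / 100

print(f"{total:.1f}% total -> {avg_categories:.2f} categories per tester on average")
# 209.6% total -> 2.10 categories per tester on average
```

With these figures the total is 209.6%, or roughly two of these five categories per German tester on average, which is the genetic overlap described above expressed as a single number.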


On the flip side, when you look at the Ethnicity Maps by ethnicity, they show us the countries where each ethnicity is most common. This gives us a good understanding of two things: the top 5-10 countries show us the area covered by that category (our own Ethnicity Estimate already gives us that, but this shows just how broad that area could really be), and the full list of countries shows us how much emigration there's been from that area to the rest of the world. For example, 36.7% of people in the USA get results in North and West European, which is not surprising considering how many German immigrants the USA has received over its history. This doesn't really show us genetic overlap, but it is very useful for understanding modern migration patterns.

Hopefully, as MyHeritage updates their ethnicity reports, they will also update this very useful data rather than retire it the way AncestryDNA keeps doing.

Sunday, August 1, 2021

Understanding Admixture and Genetic Overlap at AncestryDNA

I talk a lot about the genetic overlap that exists among neighboring regions and how that influences ethnicity percentages, or admixture. Unfortunately, AncestryDNA keeps taking away valuable learning tools for understanding these relationships between various populations, making it harder to illustrate them. First, they removed the Average Admixture Chart, then they removed the Genetic Details page, and now they've even removed our ability to click on "see other regions tested" and explore the maps and details of any region to understand the overlap they have with neighboring regions. The only thing left is the PCA chart in the Ethnicity White Paper, but even that has always been limited to Europe. 

The Average Admixture Chart (below) used to show us what results a typical native of each region could expect to get, and so how much or how little each population was admixed. For example, if you were of 100% British descent, you could expect to get only about 60-70% in Great Britain (this was before they decided to attempt to split up Britain), and around 8-10% in Europe West, Ireland, and/or Scandinavia. This illustrated the common overlapping DNA among the British, Germanic peoples, and Scandinavians, and also the close relationship between the British and Irish (sorry, Ireland). Europe West was even more admixed, with natives averaging less than 50% in Europe West and the rest coming from pretty much everywhere else in Europe except Finland/NW Russia. Scandinavia was less admixed, averaging between 80% and 90% in Scandinavia, with only small amounts from Europe West, Great Britain, Finland/NW Russia, Ireland, and a smidge from Europe East. The chart made it clear just how admixed Europeans themselves are, or can be, and to AncestryDNA that is apparently a bad thing they are now trying to hide, because it means ethnicity percentages by nature aren't always very reliable and can't always be broken down into more specific regions. Customers are frustrated by that, so one by one, AncestryDNA keeps taking away the learning tools that would help them understand it.

The loss of the Average Admixture chart wasn't too unfortunate, because the same or similar data could essentially be found on the Genetic Details page. Previously, when you clicked on a region and then clicked "More info", there would be a page with two tabs: one, which still remains, with the detailed history of the population and its migrations, and another with genetic details that helped us understand the genetic overlap that region had with nearby regions. That second tab is now gone. It showed us two very important charts that basically replaced the data in the Average Admixture chart. The first one (below) showed the average percentage a native of that area would likely get for that region (the same as you would find on the Average Admixture chart).

The second chart (above) showed us "Other regions commonly seen in people native to [this region]". This wasn't exactly the same data as the Average Admixture chart; instead, it showed what share of people native to that region got any amount of results in each neighboring region. So it didn't tell us how much a native could expect to get in those other areas, but how common it was for a native to get results in them at all. Not exactly the same data, but still valuable for understanding common overlap.

With these two vitally important learning tools gone, I've often turned to the simple map and details of each region to illustrate how each region often covers neighboring areas as well. If you click on "Read full history" for a region, you can find not only the areas "primarily found" in that region, but also the areas it's "also found in" (above). Unfortunately, AncestryDNA has neglected to add the "Read full history" link to some of the regions they added recently (like Scotland)! An oversight? Or a sign they may be retiring this page altogether? And on top of that, a new revamp of the appearance of our ethnicity results (which may not be available to everyone yet) seems purely aesthetic at first, until you notice that the link to "See other regions tested" is now gone too (below).

It's as though they don't want people to understand how much genetic overlap there is between certain regions, even though it would greatly help people understand their results. And now, anytime people ask, "If I get results in X, is it coming from my Y ancestry?" and X is not a region I have results in, I can't answer them, because I can't look up the map and details of regions I didn't personally get results in. This kind of question gets asked so frequently on social media that people like me basically wind up fielding these questions for Ancestry's customer support, and they keep making it more and more difficult. I guess if they really want a huge increase in the load on their customer support, that's fine, but if that's the case, they really shouldn't have gotten rid of their support email (you can now only contact them by phone or through social media like Facebook). So they're making it harder for customers to understand their results, and harder for customers to contact them about it. Epic fail on customer service, AncestryDNA.

Edit: AncestryDNA did later re-add the "see other regions tested" link. Apparently it was just an oversight during their updates at the time.

The only remaining tool is the PCA chart (top), which is limited to Europe and therefore not much help in understanding results outside of Europe, or any relationships that might exist at the crossroads between Europe and other continents. And frankly, I'm somewhat concerned that voicing this will lead them to remove the PCA chart too.

The percentage range included in our results is also useful for understanding that the percentages are very much an estimate, but not very useful for understanding the genetic overlap between regions. Still, hopefully they don't retire this feature either, but the ongoing trend doesn't bode well for it.