Friday, April 28, 2017

When it all comes together

Mary Cath. Brady, Martha Washington House, William Henry
McBride, and Daniel H McBride are all children of the John
McBride in my DNA match's tree, mentioned here as the heirs
of Catherine McBride, the mother of "my" John McBride.
This is a great example of how a combination of DNA and paper research can break down a brick wall. It's also why you should read every page of a multipage document, even if it doesn't seem to contain any useful information at first. Probates often get overlooked, but this is a great example of how important they can be. When you're stuck, always look for a probate record for every relative involved, even the women (women sometimes had wills and probates too!).

I had a suspicion that one of my DNA match's ancestors was my 5th great uncle, John McBride. The DNA match had the same name in her tree, but she knew nothing about him apart from his name (which she found from orphan court records), his wife, and his children. I had his birth and death data, and obviously his parents' names (and records to back it all up), but no records confirming his wife's name or any children. The only things the two men had in common were location and time period, and there was no proof they were definitely the same man. It was like I had half the story, and she had the other half, but we had no way to link them together.

I finally read ALL the pages of John's mother's probate records. Initially I'd only read her will, thinking that if she named her grandchildren by her son John, and they matched the names of John's children in my DNA match's tree, that would obviously prove they were the same man. The will doesn't name them (it only says "my grandchildren by my son John")... but the follow-up documents, such as the distribution of her estate, do list several people whose names match perfectly with the children in my DNA match's tree! Although they don't specify that these people are her grandchildren, given the context (i.e., her estate is being distributed to her heirs, identified in her will as her grandchildren), it would be too much of a coincidence for so many of them to be listed in this woman's probate records if they weren't her grandchildren.

So when you're struggling to find a connection to a DNA match, it pays to do some digging around on a hunch, even if it seems like a long shot or there's not enough info to say for sure. The only connection I had was a name, and an extremely common first name at that, with a surname that isn't unheard of either. Even my DNA match in question was skeptical when I first proposed the idea to her, but a little digging proved my hunch was right!

Thursday, April 6, 2017

Finally! A Gedmatch Admixture Guide!

For those unaware, Gedmatch.com is a website where you can upload your raw DNA data for further analysis and matching with people from other companies who have also uploaded their data.

Part 1 - Admixture Proportions

Introduction
Despite all the help articles available on Gedmatch.com, none of them really offers a comprehensive guide to understanding the admixture calculators for newbies. Most of them are guides on understanding DNA in general, or how to upload your data, or using the one-to-many or one-to-one tools. In fact, there is a very good beginner's guide to the matching side of things found here. But the most common questions I see about Gedmatch are “which admixture calculator do I use?” and “what do the results mean?” There is a Gedmatch wiki page on admixture, and there is Kitty Cooper's slide presentation, but I don’t think they really answer all the questions most people have, especially regarding Oracle. Even Googling the topic only turns up spotty results from forums and blogs, nothing that really lays it all out. Since no one else has done it, here is my attempt. Please keep in mind I am no expert and have no formal education in genetics; this is just the knowledge I’ve gathered over the years from various sources as a result of trying to understand my own DNA results.

Admixture is a scientific term for the ethnicity percentages you received from a DNA company like Ancestry.com, FamilyTreeDNA, 23andMe, or MyHeritage. It’s important to understand that each admixture project on Gedmatch is created by a different person, mostly academics. Note that most of the admixture results will include some basic info on the calculator, either on the results page, or through a link from the creator. However, the info provided may still be technical and difficult to understand for the average person, because they were primarily created for academic purposes. This is an attempt to translate some of that info into something more understandable to the average user. I apologize that this guide favors info on European backgrounds, but that is simply what I’m most familiar with, being a European descendant myself.

Be aware that it’s common practice in DNA admixtures to refer to populations from prehistoric times as “ancient”, even though this is a bit of a misnomer. In historical terms, ancient history marks the beginning of recorded history, but here, “ancient” generally refers to the time before written history, prehistory. Some time periods might be specified as “neolithic”, or “paleo/paleolithic”.

Select a project from the drop down menu, leaving the other
options as they are, then click "continue"
Step 1: Pick a project.
There are 7 projects to choose from in the Admixture (Heritage) tool (found under "Analyze your data" and "DNA raw data"), but what are they? What do they mean? Which one should you pick? Here’s a basic breakdown:

(Note: below the projects drop down menu there are options like "Admixture Proportions (with link to Oracle)" and "Chromosome Painting", etc. Don't mess with those for now, just stick with the top default option, Admixture Proportions (with link to Oracle), as that is what this guide will cover.)

  1. MDLP
This is a global calculator and attempts to break your results down into different parts of the world. It’s good as an overview, but if, for example, you already know you’re European, it’s probably unnecessary. It’s also heavy on ancient groups. The blog for this project is found here: http://magnusducatus.blogspot.com/

  2. Eurogenes
As the name suggests, this is primarily for people with European backgrounds. While it does have populations outside Europe, there are usually more sub-continental regions for Europe than any other continent. I highly recommend this as the go-to project for people with solely European ancestry. The blog for this project is found here: http://bga101.blogspot.com.au/

  3. Dodecad
This project says it focuses primarily on Eurasians, but most of the calculators are geared more towards Asian and African ancestry than European. It’s not ideal for Europeans, but may be useful for people with mixed ancestry. The blog for this project can be found here: http://dodecad.blogspot.com/

  4. HarappaWorld
This calculator is primarily for people with Asian ancestry. The blog for this project can be found here: http://www.harappadna.org/

  5. Ethiohelix
This is an African based project, though it does have options for people with mixed backgrounds (but always including African). The blog for this project is found here: http://ethiohelix.blogspot.com/

  6. puntDNAL
This is primarily a project on ancient DNA. There is no website, but questions and comments about it should be directed to Abdullahi Warsame at puntdnalking@gmail.com

  7. GedrosiaDNA
This project focuses primarily on Eurasian (especially Indian and Asian) and ancient DNA. There is no website, but for further questions, please contact the creator at Dilawerkh4@gmail.com


Once you've selected a project, you need to enter your kit
number and then select a specific calculator.
Step 2: Pick a calculator.
You’ll find that for each project, there are often several calculators to choose from. How to choose? What do they mean? What are the differences? Well, for starters, the number following a ‘K’ indicates how many populations (or regions/categories) that calculator includes. So for example, Eurogenes EUtest V2 K15 has 15 populations. Choose one depending on how many regions you want to break your results down into. Keep in mind that the more populations there are, and therefore the more specific the regions are, the more speculative the results will be.
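As a rough illustration of the ‘K’ naming convention, here is a minimal Python sketch that pulls the population count out of a calculator name. The function name `population_count` is made up for this example, and it only handles names that actually contain a ‘K’ number (a name like MDLP World22 doesn't, so it returns `None`).

```python
import re

def population_count(calculator_name):
    """Return the number that follows 'K' in a calculator name, or None if there isn't one."""
    match = re.search(r"\bK(\d+)", calculator_name)
    return int(match.group(1)) if match else None

# Calculator names taken from the lists in this guide.
for name in ["Eurogenes EUtest V2 K15", "MDLP K23b", "Eurogenes K13", "MDLP World22"]:
    print(name, "->", population_count(name))
```

A trailing letter such as the ‘b’ in K23b is ignored, since it marks a variant of the calculator rather than a different number of populations.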

Don't forget to put in your kit number - if you've forgotten it, go back to the home page and copy it.

Certain other tests may be specific to deeper, more ancient (prehistoric) ancestry, like Hunter-Gatherer vs Farmer. Any abbreviation that starts with ‘A’ probably stands for ‘ancient’, but I will post a comprehensive terminology list at the end of this guide. These calculators for ancient DNA aren’t very useful if you’re just looking for an opinion on your more recent ethnicity results.

Other calculators might be specific to certain types of ancestry. For example, Eurogenes’ Jtest is specific to Ashkenazi Jewish ancestry. There’s no need to run this test if you don’t have any Jewish ancestry. In fact, you might get false results in Ashkenazi if you run this calculator and have no Jewish ancestry.

(Note: ignore the option below the calculator drop down menu, this is for data collection purposes. If all 4 of your grandparents are from the same ethnic group and you want your DNA to be a part of the sample groups they use to create these calculators and determine populations, then go ahead and fill it out. Otherwise, you can ignore it.)

Here’s a more detailed breakdown of each calculator.

MDLP
  • MDLP K11 Modern - 11 global populations including ancient
  • MDLP K16 Modern - 16 global populations including ancient and modern, results page includes full population descriptions
  • MDLP K23b - 23 global populations including ancient
  • MDLP World22 - 22 global populations including ancient, full details including maps of what areas each category covers are found here
  • MDLP World - 12 global populations, probably the original MDLP calculator

Eurogenes
  • Eurogenes K13 - 13 global populations, mostly European. Creator made this the default as it “seems to hit the spot for most people” with European background. Details here
  • Eurogenes EUtest V2 K15 - 15 global populations, mostly European, also a popular option. Details including regional maps for each category found here
  • Eurogenes ANE K7 - 7 populations, Ancient North Eurasian, meaning this looks at ancient DNA mostly in Europe, Western Asia, and Africa. Details found here
  • Eurogenes K9b - 9 global populations, approximates Geno 2.0 analysis
  • Eurogenes K9 - 9 global populations, map available here (population descriptions no longer available)
  • Eurogenes K10 - 10 global populations, map available here (population descriptions no longer available)
  • Eurogenes K11 - 11 global populations, map available here (population descriptions no longer available)
  • Eurogenes K12 - 12 global populations. North European ancestry is said to do well with this calculator. Map available here (population descriptions no longer available)
  • Eurogenes K12b - 12 global populations, excluding Native American (Amerindian), map available here (population descriptions no longer available)
  • Eurogenes K36 - 36 global populations, mostly European. This is the most detailed breakdown for Europeans, but that also makes it highly speculative. Details found here
  • Eurogenes Hunter-Gatherer vs Farmer - 12 ancient Hunter-Gatherer vs Farmer populations. Map available here
  • Jtest - Jewish Ashkenazi, 14 global populations but mostly European, this is essentially the EUtest with an Ashkenazi category. Details including maps are here
  • EUtest - 13 global populations, mostly European minus Jewish Ashkenazi. Details including maps are here

Dodecad
  • Dodecad V3 - 12 populations, mostly Asian and African, 2 European. More info
  • Africa9 - 9 populations, all African except one European. More info
  • World9 - 9 global populations, not specific to any continent so good as an overview regardless of your ancestry. More info
  • Dodecad K7b - 7 global populations, 3 are Asian. More info
  • Dodecad K12b - 12 global populations, but weighted more towards Asian and African. More info

HarappaWorld
  • HarappaWorld only has one calculator and as explained above, it’s primarily for Asian ancestry. It does include some European, African, and Native American populations, but it has a more detailed breakdown for Asia and the Middle East.

Ethiohelix
  • EthioHelix K10 + French - 10 populations, 9 African, one “French” which acts as a European population. This is really only useful/accurate for people with mixed African and European ancestry. Maps available here
  • EthioHelix K10 + Japanese - 10 populations, 9 African, one “Japanese” which acts as an Asian population. Only useful for people with a mix of African and Asian ancestry. Maps
  • EthioHelix K10 + Palestinian - 10 populations, 9 African, one “Palestinian” which acts as a Middle Eastern population. Only useful for people with a mix of African and Middle Eastern ancestry. Maps
  • EthioHelix K10 Africa Only - 10 strictly African populations, nothing else. Do not use if you have no African ancestry as results won’t be accurate. Maps

puntDNAL
  • puntDNAL K10 Ancient - 10 ancient populations, incorporates Caucasus HG as well as Early Neolithic Farmers and Western European HG.
  • puntDNAL K12 Ancient - 12 populations, utilizing ancient oracle, more info provided on results page
  • puntDNAL K12 Modern - 12 populations utilizing modern oracle, more info provided on results page
  • puntDNAL K15 - 15 populations, focuses primarily on Africa (particularly East Africa), but also includes some West Asia, and Europe. More info
  • puntDNAL K8 African only - 8 populations, as the name suggests, it’s strictly an African calculator

GedrosiaDNA
  • Eurasia K9 ASI - 9 populations, modeled around the ancient ancestral South Indian component. More info on population descriptions
  • Eurasia K10 CHG - 10 ancient populations, modeled on Caucasus Hunter Gatherers, more info on population descriptions
  • Eurasia K11 CHG-NAF - 11 ancient populations, modeled on Caucasus Hunter Gatherers and Neolithic Anatolian Farmers, more info on population descriptions
  • Gedrosia K3 - 3 populations, Eastern Eurasian, Western Eurasian, and Sub-Saharan African. More details
  • Gedrosia K15 - 15 populations with a focus on the Indian subcontinent. Population descriptions
  • Eurasia K14 - 14 populations, using the same Neolithic and Bronze Age source data as the K14 Neolithic calculator, plus some modern populations
  • Eurasia K14 Neolithic - 14 populations, focus is on ancient Neolithic and Bronze Age genomes from across Eurasia. Population descriptions
  • Gedrosia K12 - 12 populations, designed for individuals of predominantly South Asian and West Asian ancestry for inferring Gedrosian Balochi admixture. More info
  • Gedrosia K11 - 11 populations with a focus on Kalash Indo European peoples of Pakistan. Population descriptions
  • Ancient Eurasia K6 - 6 ancient populations, descriptions for which are available on results page.
  • Near East Neolithic K13 - 13 ancient populations, with a focus on the Near East. Details provided on results page.


Step 3: Understanding the results: A Terminology Guide
A list of populations you might see and a brief description. I did not include some of the most self-explanatory ones. Some that I have listed might still be obvious to some people, but I’ve seen others ask about them on occasion. If there isn’t one listed here, you might learn a lot by just googling it. There is also a good abbreviation guide here: https://isogg.org/wiki/Abbreviations
Keep in mind different calculators may use different terms to refer to the same region or population.

  • Amerindian or Amerind - Native American (i.e., American Indian meshed into one word)
  • Anatolian - mostly Turkey
  • Ancestral Altaic - Asia (excluding South), and Eastern Europe
  • ANE - Ancient North Eurasian
  • Archaic African - broad category for prehistoric Africans
  • Archaic Human - broad category for prehistoric humans around 500,000 years ago
  • ASE - Ancient/Ancestral South Eurasian
  • Ashkenazi - Ashkenazi Jewish of central/eastern Europe (not the same as Sephardic Jewish)
  • ASI - Ancient/Ancestral South Indian
  • Australian - aboriginals of Australia
  • Australoid - “people indigenous to Southeast Asia, South Asia, Australia, Melanesia, Polynesia, Micronesia, and historically parts of East Asia.” (Wikipedia)
  • Austronesian - “relating to or denoting a family of languages spoken in an area extending from Madagascar in the west to the Pacific islands in the east.” (Google)
  • Baloch - people of Iranian Plateau and Arabian Peninsula (primarily the Middle East)
  • Baltic - regions surrounding the Baltic Sea
  • Bantu - Central and south Africa
  • Basal - Basal Eurasian?
  • Beringian - areas surrounding the Bering Strait (Eastern Russia and Alaska)
  • Biaka - aka Aka, “nomadic Mbenga pygmy people who live in southwestern Central African Republic and the Brazzaville region of the Republic of the Congo” (Wikipedia)
  • Caucasian/Caucasus - people of the Caucasus region, the border between Europe and Asia in between the Black Sea and the Caspian Sea
  • CHG - Caucasus Hunter Gatherers
  • EHG - Eastern Hunter-Gatherer
  • ENF - Early Neolithic Farmer
  • Fennoscandian - Scandinavia and Finland
  • Gedrosia - Modern day Makran (semi-desert coastal strip in Balochistan, in Pakistan and Iran, along the coast of the Persian Gulf and the Gulf of Oman)
  • Khoisan - Southern Africa
  • Mbuti - “one of several indigenous pygmy groups in the Congo region of Africa” (Wikipedia)
  • Melanesian - “a subregion of Oceania (and occasionally Australasia) extending from the western end of the Pacific Ocean to the Arafura Sea, and eastward to Fiji.” (Wikipedia)
  • Mesoamerican - Native American in Mexico, Central and South America
  • NAF - Neolithic Anatolian Farmer
  • Oceanian - Aboriginals of the Pacific Ocean islands (may include Australia depending on calculator)
  • Omotic - Southwest Ethiopia
  • Papuan - New Guinea and surrounding islands
  • Pastoralist - Sheep or cattle farmer
  • Pygmy - “certain peoples of very short stature in equatorial Africa and parts of Southeast Asia.” (Google)
  • San - Bushmen of southern Africa
  • SEA - South East Asian
  • SSA - Sub-Saharan African
  • Steppe - “ancient North Eurasian hunter-gatherers' heritage, which was subsequently shown to have an influence in later eastern hunter-gatherers and to have spread into Europe via an incursion of Steppe herders” (MDLP K16)
  • Tungus-Altaic - Northeast China and Siberia
  • WHG - Western Hunter-Gatherer
  • WHG-UHG - Western Hunter-Gatherer/Unknown Hunter-Gatherer
  • Volga-Ural - Part of Russia (central)


Conclusion
Which project and calculator you go with greatly depends on your known ancestry. I know all this info is probably still a little overwhelming even with (or perhaps because of!) this guide. If you’re of European descent, and a newcomer to Gedmatch, and you just want a second opinion on your ethnicity results from any of the Big 3 companies (Big 4 now maybe, with MyHeritage joining the bandwagon), I’d recommend Eurogenes K13 or K15. Personally, I tend to prefer K15, because there are maps available showing specifically what regions are covered by which populations. Certainly, you can play around with any of the other Eurogenes calculators too (except Jtest if you’re not Jewish). Most of the other projects and calculators are geared more towards ancient DNA, other continents, or mixed ancestry. You may find an unbiased global calculator in some of the other projects, but it’s probably not going to provide the breakdown of Europe you’re looking for.

If you’re looking for an ancient calculator, I again tend to stick to one of Eurogenes’ (HG vs F, or ANE), but MDLP has some good options too. There are also a couple in puntDNAL which I don’t think have a bias towards any one type of ancestry.

If you’re African, Asian, or of mixed heritage, there are a number of options to choose from, but I unfortunately can’t recommend any over any others. Most global calculators will include Amerindian (I have tried to note when a global one doesn’t).

It is frustrating that maps, or at least population descriptions, aren’t available for every calculator, but this is a free service, after all. It’s actually pretty amazing all the work the project creators do to provide this for free.



Part 2 - Oracle
Say what now?

Introduction
The second most common questions I see about Gedmatch are about Oracle. What is it? What do the results mean? Oracle is an attempt to pinpoint your origins to a more specific population or region. There are two options: Oracle and Oracle 4. You will find buttons for them listed under your admixture results. Note that not all admixture calculators have Oracle available. There is a third button which just says "Spreadsheet" but there is a good explanation for this from Roots & Recombinant DNA so there's no need for me to go over it.

Oracle
Oracle will list your admixture results, then something called Single Population Sharing, and finally Mixed Mode Population Sharing.

  • Single Population Sharing attempts to pinpoint a specific, single population that your DNA most closely matches, with a list of the top 20. The distance will tell you how closely you match each group, so the smaller the distance number is, the more closely you match.
  • Mixed Mode Population Sharing will show you your top 20 of two specific, combined populations in order of how closely you match those populations. Again, the distance will tell you how closely you match this combo of populations, while the percentage will tell you how much of your DNA matched which population.
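To make the idea of "distance" a bit more concrete, here is a minimal Python sketch of how the two lists could be produced. It is only an illustration under assumptions: it treats the distance as the ordinary straight-line (Euclidean) distance between your admixture percentages and a reference population's percentages, which this guide can't confirm is exactly what Gedmatch does. The populations ("Pop A", "Pop B", "Pop C"), their numbers, and all the function names are made up for the example.

```python
import math
from itertools import combinations

def distance(a, b):
    """Straight-line distance between two admixture breakdowns (same category order)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def single_population(user, references, top=20):
    """Rank reference populations by how close they are to the user (smallest first)."""
    ranked = sorted((distance(user, ref), name) for name, ref in references.items())
    return ranked[:top]

def best_share(user, ref_a, ref_b):
    """Share of ref_a (between 0 and 1) that brings a two-way blend closest to the user."""
    diff = [a - b for a, b in zip(ref_a, ref_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:  # the two references are identical, so any split is equally good
        return 0.5
    share = sum((u - b) * d for u, b, d in zip(user, ref_b, diff)) / denom
    return min(1.0, max(0.0, share))

def mixed_mode(user, references, top=20):
    """Rank every pair of populations by the distance of their best-fitting blend."""
    results = []
    for (name_a, ref_a), (name_b, ref_b) in combinations(references.items(), 2):
        share = best_share(user, ref_a, ref_b)
        blend = [share * a + (1 - share) * b for a, b in zip(ref_a, ref_b)]
        results.append((distance(user, blend), share, name_a, name_b))
    results.sort()
    return results[:top]

# Made-up percentages for three categories; not real Gedmatch data.
references = {
    "Pop A": [80, 15, 5],
    "Pop B": [40, 45, 15],
    "Pop C": [10, 20, 70],
}
user = [70, 22.5, 7.5]

for dist, name in single_population(user, references):
    print(f"{name} @ {dist:.2f}")
for dist, share, name_a, name_b in mixed_mode(user, references):
    print(f"{share:.0%} {name_a} + {1 - share:.0%} {name_b} @ {dist:.2f}")
```

In this made-up case the user is built as three parts Pop A to one part Pop B, so Pop A tops the single list, and the Pop A + Pop B pair tops the mixed list with a distance of essentially zero.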

Oracle 4
Oracle 4 is essentially the same as Oracle, except it expands on it by providing combinations of 3 and 4 specific populations. The single and double combinations can be different from original Oracle though, so don’t bypass Oracle thinking you’ll get that and more with Oracle 4; it’s best to examine both.

  • Using 1 population approximation works the same as Single Population Sharing in Oracle, but I’ve noticed the results are sometimes different, so they’re obviously using a slightly different calculation. Reading the results works the same though: they are showing you a list of specific populations you most closely match, with the distance showing you just how closely you match.
  • Using 2 population approximation also works the same as Mixed Mode Population Sharing but again, results may vary, and for some reason only lists your top 1 result instead of the top 20.
  • Using 3 population approximation works the same as 2, but with a combination of 3 populations instead. One result.
  • Using 4 population approximation obviously uses a combination of 4 specific populations you most closely match and lists your top 20 combos. This was designed especially for people who have 4 grandparents from 4 different places. It can also work well if most of your ancestry is mainly from 4 different places.
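The "4 grandparents from 4 different places" idea can be sketched the same way. The Python below is a hedged illustration, not Gedmatch's actual method: it assumes each chosen population contributes an equal share (one quarter each for 4 populations), assumes the same population may be picked more than once, and again uses plain Euclidean distance. The populations, numbers, and function names are invented for the example.

```python
import math
from itertools import combinations_with_replacement

def distance(a, b):
    """Straight-line distance between two admixture breakdowns (same category order)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def n_population_approximation(user, references, n, top=20):
    """Rank equal-share blends of n reference populations by distance to the user."""
    results = []
    # Assumption: a population may appear more than once in a combination.
    for combo in combinations_with_replacement(sorted(references), n):
        blend = [
            sum(references[name][i] for name in combo) / n
            for i in range(len(user))
        ]
        results.append((distance(user, blend), combo))
    results.sort()
    return results[:top]

# Made-up percentages for three categories; not real Gedmatch data.
references = {
    "Pop A": [80, 15, 5],
    "Pop B": [40, 45, 15],
    "Pop C": [10, 20, 70],
}
user = [70, 22.5, 7.5]

for dist, combo in n_population_approximation(user, references, 4, top=5):
    print(" + ".join(combo), f"@ {dist:.2f}")
```

Think of each slot in a combination as one grandparent: a result like "Pop A + Pop A + Pop A + Pop B" would describe someone with three grandparents resembling Pop A and one resembling Pop B.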

Conclusion
Be aware that the results from Oracle and Oracle 4 will vary depending on what admixture calculator you used, which is why they are found on the admixture results page, and not as a separate calculator. Also keep in mind the results are speculative, but I have found they do often make some sense, and in some cases, can be remarkably accurate.