
Ernest Brooks and Mary (Wilson) Brooks, photograph ca. 1905, Wimberley, TX

Sources for Deep Ancestry Research

The sources for deep ancestry research are quite different from those primarily used by genealogists. Deep ancestry research depends on recognizing and analyzing patterns in Y-STR marker values in order to discover Y-SNPs. Discovering the best candidates for partial Y-DNA chromosome scans (known as "Walk the Y" tests) requires recognizing major patterns in large databases. This pattern recognition is a past-to-recent analysis, which genealogists do not generally perform (but should). Once a Y-SNP has been discovered, the pattern of associated Y-STRs is analyzed from recent to past. However, the patterns of some Y-SNPs can be very broad, while those of others can be very limited. This analysis is unlike genealogical analysis, which is much more limited in scope because surnames originated only 400 to 800 years before the present.



First, there is the officially blessed version of the descendancy chart of mankind, which is known as the Y-DNA haplotree. The ISOGG version is always more current than other versions of the haplotree and should be checked regularly for its frequent updates:


There is also an alphabetical list of Y-SNPs that shows which haplogroup each Y-SNP is assigned to and which Y-SNPs are currently labeled as private. Note that several Y-SNPs are often assigned to the same haplogroup, which indicates that many Y-SNPs are redundant with other Y-SNPs (no difference, or not enough difference in scope, to warrant their own branch on the haplotree):
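
The redundancy described above can be spotted by inverting such a list so that it is keyed by haplogroup instead of by Y-SNP. The following is only a minimal sketch: the function names and the sample entries (`SNP-A`, `HG-1`, and so on) are hypothetical placeholders, not actual ISOGG data, and the real list would first have to be transcribed into this form.

```python
from collections import defaultdict


def group_snps_by_haplogroup(snp_index):
    """Invert a {snp: haplogroup} mapping into {haplogroup: sorted list of snps}."""
    groups = defaultdict(list)
    for snp, haplogroup in snp_index.items():
        groups[haplogroup].append(snp)
    return {haplogroup: sorted(snps) for haplogroup, snps in groups.items()}


def redundant_snps(snp_index):
    """Keep only the haplogroups that have more than one Y-SNP assigned to them."""
    grouped = group_snps_by_haplogroup(snp_index)
    return {hg: snps for hg, snps in grouped.items() if len(snps) > 1}


# Hypothetical placeholder entries, NOT real ISOGG assignments.
example_index = {
    "SNP-A": "HG-1",
    "SNP-B": "HG-1",
    "SNP-C": "HG-2",
    "SNP-D": "Private",
}

if __name__ == "__main__":
    for haplogroup, snps in redundant_snps(example_index).items():
        print(haplogroup, "is defined by", ", ".join(snps))
```

In this made-up sample, only `HG-1` is reported, since it is the only haplogroup with more than one Y-SNP assigned to it.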


Another key source is the set of rules for qualifying for a branch on the ISOGG haplotree, which greatly helps in understanding the process for adding branches to the haplotree. Note that true "private" Y-SNPs will never qualify for the haplotree, since the haplotree is a deep ancestral documentation project. Since there will eventually be many more private Y-SNPs than haplotree Y-SNPs, this represents a major issue for genealogists, because private Y-SNPs are extremely important to genealogical analysis:


Unfortunately, the ISOGG haplotree does not document "private" SNPs as well as the draft FTDNA haplotree does. The ISOGG haplotree labels most "private" Y-SNPs at a very high level (for R-L21, all are shown only as R1b "private" Y-SNPs) and does not reveal their connection to lower-level haplogroups. The FTDNA haplotree does a much better job of showing how "private" Y-SNPs are related to more recent haplogroups, and it attempts to show possible connections where the relationship is not yet certain:




Many deep ancestry researchers use Y-Search DNA fingerprint searches for data mining Y-STR submissions of interest. Unfortunately, it is difficult to sort out duplicate entries from FTDNA projects, which show only FTDNA IDs. Dennis Wright (aka L226 admin) provided me with a very nice Y-Search macro in which you enter the FTDNA ID and it returns the associated Y-Search ID:


Once you select this link, you must change 91116 to the FTDNA ID of interest. Once you replace the number, just hit the Enter key and the web site will return the Y-Search ID associated with the FTDNA ID (a pretty nice little macro). Another, more complex macro that I have not used extensively provides automated and repeatable DNA fingerprint Y-Search queries. This macro allows you to prefill most of the Y-Search parameters (such as a specific DNA fingerprint search) and then just copy the very long macro into the URL field and hit Enter. Due to a limitation of the software applications supported by web sites, many line endings get added to the URL, and these must be eliminated. An easy way to eliminate the line endings is to copy and paste the very long URL into Wordpad and turn off Word Wrap, which removes all line endings. This macro allows you to save your searches and repeat them in the future, or to check what search you made in the past:

Y-Search Macro for DNA fingerprint searches
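
For those who prefer a script to the Wordpad approach, the line endings can also be stripped programmatically. The sketch below assumes that the macro URL contains no meaningful spaces at the line breaks. The function name `unwrap_url` and the `example.com` address are hypothetical stand-ins, not the actual Y-Search macro.

```python
def unwrap_url(wrapped):
    """Join a URL that was broken across several lines into one line.

    Assumption: any whitespace at the start or end of each line is an
    artifact of the wrapping and is not part of the URL itself.
    """
    return "".join(line.strip() for line in wrapped.splitlines())


# Hypothetical wrapped URL, NOT the real Y-Search macro.
wrapped_example = """http://example.com/search?
marker1=13&
marker2=24"""

if __name__ == "__main__":
    print(unwrap_url(wrapped_example))
```

The single-line result can then be pasted into the browser's URL field or saved in a text file for later reuse.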

Conducting effective DNA fingerprint searches can be very challenging for those who have only conducted Y-Search queries by genetic distance. If you enter only your DNA fingerprint, you will get a lot of false positives (matches that belong to many different haplogroups). In order to reduce these false hits, I also enter the L21 marker values that are shared by at least 75% of all L21 submissions found in Mike Wash's extensive R-L21 spreadsheet. This greatly reduces false hits from other haplogroups. For those new to DNA fingerprint searches, these searches are, at best, 80% accurate. They will include 10 to 20% false hits (especially those that match the DNA fingerprint the least) and will also miss around 10 to 20% of the valid matches (especially those submissions that mutated more than average). However, DNA fingerprint searches are far superior to "genetic distance" searches for finding deep ancestry matches.
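
The 75% rule described above can be sketched in a few lines of code. This is only an illustration of the idea and not how Y-Search itself works. The function names, the `max_mismatches` parameter, and the marker values in the sample data are all hypothetical; real input would come from a spreadsheet of haplogroup submissions.

```python
from collections import Counter


def signature_markers(submissions, threshold=0.75):
    """Return {marker: value} for every marker on which one value is shared
    by at least `threshold` of the submissions (each a {marker: value} dict)."""
    signature = {}
    markers = set().union(*(s.keys() for s in submissions))
    for marker in markers:
        values = [s[marker] for s in submissions if marker in s]
        value, count = Counter(values).most_common(1)[0]
        # Divide by all submissions, so markers missing from many
        # submissions are less likely to qualify.
        if count / len(submissions) >= threshold:
            signature[marker] = value
    return signature


def fingerprint_match(candidate, fingerprint, max_mismatches=0):
    """True if the candidate differs from the fingerprint on at most
    `max_mismatches` markers (a missing marker counts as a mismatch)."""
    mismatches = sum(1 for m, v in fingerprint.items() if candidate.get(m) != v)
    return mismatches <= max_mismatches


# Hypothetical marker values for illustration only.
sample_submissions = [
    {"DYS393": 13, "DYS390": 24, "DYS19": 14},
    {"DYS393": 13, "DYS390": 24, "DYS19": 15},
    {"DYS393": 13, "DYS390": 25, "DYS19": 14},
    {"DYS393": 13, "DYS390": 24, "DYS19": 15},
]

if __name__ == "__main__":
    signature = signature_markers(sample_submissions)
    print(sorted(signature.items()))
    print(fingerprint_match({"DYS393": 13, "DYS390": 24, "DYS19": 16}, signature))
```

Raising `max_mismatches` trades fewer missed matches for more false hits, which mirrors the accuracy limits noted above.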



Many genealogists who are new to deep ancestry research may not be aware of one of the primary sources for discovering new Y-SNP mutations. Many new Y-SNPs are found via a special partial Y-DNA chromosome test from FTDNA called the "Walk the Y" (WTY) test. This is not the only source, as many new Y-SNPs are data-mined from other scientific sources as well. For those not familiar with this test, see the FTDNA description of it:

FTDNA FAQ for "Walk the Y" test

For those who want to see a high-level summary of the WTY tests, there are summary charts of the results for all R-L21 WTY tests to date. Some are "private" tests that are available only via scientific databases, but most are summarized in the chart below:

R-L21_WTY_GATC_Summary Chart



The above list of source files will be expanded as time allows.