Robert and Diane Casey

Robert and Diane Casey, ca. 1985, Dallas, Texas
 



How To - Y-SNPs (Genealogical Viewpoint)

WHY DEEP ANCESTRY (Y-SNPs) ARE IMPORTANT

Like most genealogists, I originally thought that Y-SNPs played a very minor role in genealogy. Several years ago, this may have been the case. The first Y-DNA haplogroups to be discovered, which are also the most common, were several thousand years old and divided all of mankind into only around 100 deep ancestral branches. The first and most widely known use of deep ancestry was as a quick method of separating genealogical clusters from each other. After all, if you do not share an ancestor at 3,000 years, you obviously will not share a common genealogical ancestor at 300 to 600 years. But most sponsors of Y-STR submissions probably believed that surname admins were just too lazy to manually separate submissions into valid genealogically related surname clusters, and most sponsors were not very supportive of ordering deep clade tests or individual SNP tests for more recent branches. This remains an important use of haplogroups, as some surname projects have dozens of genealogical clusters that are difficult to separate properly.

However, many surname admins (and some sponsors) have learned over time that many genealogical clusters are easy to isolate, while other groupings of submissions just seem too genetically diverse to be true genealogical clusters. Some groupings appear to be multiple genealogical clusters that overlap in some fashion. Not all surname admins are aware of the root cause of this overlap, but many have learned that it results from common Y-STR marker values that show little net change over the last 2,000 years. These marker values did mutate during that time, but mostly through parallel mutations and backwards mutations, so many submissions have arrived back at the same set of marker values they started out with 2,000 years ago. Submissions with closely matching Y-STR values are therefore sometimes unrelated; they are false hits for methodologies that measure relatedness only by genetic difference. Only around five to ten percent of submissions fall into this category - but if it is your genealogical cluster, then it is 100 % of your cluster. So the second major use of Y-SNPs is to rule out false hits caused by common Y-STR marker values.

With the growing understanding of common Y-STR marker values and the number of useful Y-SNP haplogroups passing 500, surname admins are now encouraging sponsors to order Nat Geo 2.0 tests (formerly deep clade tests) as well as special-order Y-SNP tests to help separate groupings with common DNA marker values. These tests are also used to separate those pesky overlapping clusters that share common Y-STR marker values. Because of these two important uses (sorting submissions into clusters and separating overlapping clusters), most FTDNA admins routinely request Y-SNP testing.

Having common marker values can result in false hits for relatedness, but having very rare marker values can be used to distinguish your genealogical cluster from others. The discovery of rarer Y-STR marker values led to another very useful genealogical application of Y-SNP testing - systematically determining how rare a haplotype is compared to others. For genealogical clusters under the very deep ancestry R1b haplogroup, you can compare the MRCA haplotype of R1b to the MRCA haplotype of your genealogical cluster. If you find only a few mutations, your genealogical cluster could have very common Y-STR marker values that overlap with many haplogroups over 1,000 years old, which makes Y-SNP testing much more important. Some combinations of marker values are so uncommon that all submissions matching on Y-STRs will be genealogically related regardless of surname. For other submissions with extremely common combinations of DNA marker values, even close Y-STR matches with the same surname may not share haplogroups and therefore could not share a common genealogical ancestor 300 to 600 years ago. Evaluating the rarity of your Y-STR haplotype is critical to ruling out false genealogical matches and is much more important than most researchers realize.
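The comparison of a haplogroup MRCA haplotype with a cluster MRCA haplotype can be sketched in a few lines of Python. This is only an illustration: the function name `mutation_count` is made up for this example, the marker values are hypothetical rather than real modal values, and counting each differing marker once is a simplifying assumption (other counting methods exist).

```python
def mutation_count(haplotype_a, haplotype_b):
    """Count the markers, among those tested in both haplotypes, whose values differ."""
    shared_markers = haplotype_a.keys() & haplotype_b.keys()
    return sum(1 for m in shared_markers if haplotype_a[m] != haplotype_b[m])

# Hypothetical marker values, for illustration only (not real modal haplotypes).
haplogroup_mrca = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11, "DYS439": 12}
cluster_mrca = {"DYS393": 13, "DYS390": 25, "DYS19": 14, "DYS391": 10, "DYS439": 12}

# A low count suggests the cluster has common marker values, which makes
# Y-SNP testing more important for ruling out false matches.
print(mutation_count(haplogroup_mrca, cluster_mrca))  # prints 2
```

In this made-up case the cluster MRCA is only two mutations away from the haplogroup MRCA, so its marker values would count as fairly common.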

 

Y-SNPs ARE NOW APPROACHING GENEALOGICAL TIMES

With many Y-SNP haplogroups now having origins that are only 1,000 to 1,500 years old, many recently discovered Y-SNPs are getting interestingly close to the genealogical time frame. When haplogroups are this close to the genealogical time frame, other major uses of Y-SNPs become powerful analytic tools for genealogists. If you compare the MRCA haplotype of your more recent haplogroup to the MRCA haplotype of your genealogical cluster, then you have a Y-STR fingerprint for your genealogical cluster. These are the mutations that occurred between your ancestor who lived when the recent haplogroup originated and your ancestor who lived when your genealogical cluster originated. When attempting to find out whether more remotely related submissions are truly related, matching the Y-STR fingerprint (or coming close to it) is a very strong factor in determining the possibility of being related.

Sharing common mutations from the haplogroup MRCA is a much more important criterion for determining a possible connection than genetic difference (the number of mutations between submissions). If you find other distantly related genealogical clusters that share significant parts of this Y-STR fingerprint, then that is also evidence that these remotely related genealogical clusters could share a common ancestor. If you have possible NPE candidates that have strong geographical ties and are genetically close, discovering that they share common mutations from the MRCA of the haplogroup is additional genetic evidence supporting the possibility of an NPE connection.

Very few surname admins or sponsors of genetic tests are aware that Y-STR fingerprints of genealogical clusters can greatly enhance the genetic source documentation that supports genetic analysis. Y-STR fingerprints also provide far superior searches for possible relatives. Searching by the number of mutations can miss genetically related submissions that mutated a little more than normal. Also, if your genealogical cluster has common marker values, so that mutational difference is less reliable because many unrelated submissions overlap with it, having common "off modal" mutations from the MRCA of the haplogroup can be a far more accurate test of relatedness. Searching Y-Search with your DNA fingerprint provides much better genetic matches than searching by mutational difference. Having shared mutations and close genetic matches is a powerful combination. Having shared mutations, close genetic distance and a common surname is an even more powerful combination. If you only analyze the mutations below the MRCA of your genealogical cluster, you are leaving out useful mutations that occurred between the creation of your haplogroup and the creation of your genealogical cluster.
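As a rough sketch of the idea, a Y-STR fingerprint can be treated as the set of markers where the cluster MRCA differs from the haplogroup MRCA, and a submission can then be scored both by plain mutational difference and by how much of the fingerprint it shares. The function names and all marker values below are hypothetical and chosen only for illustration; this is not how Y-Search or any other tool actually works.

```python
def fingerprint(haplogroup_mrca, cluster_mrca):
    """Off-modal markers: values where the cluster MRCA differs from the haplogroup MRCA."""
    return {m: v for m, v in cluster_mrca.items()
            if m in haplogroup_mrca and haplogroup_mrca[m] != v}

def shared_fingerprint(fp, submission):
    """Fingerprint markers that the submission carries with the same value."""
    return {m: v for m, v in fp.items() if submission.get(m) == v}

def mutational_difference(a, b):
    """Number of commonly tested markers whose values differ (simple count)."""
    return sum(1 for m in a.keys() & b.keys() if a[m] != b[m])

# Hypothetical values, for illustration only.
haplogroup_mrca = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11, "DYS439": 12}
cluster_mrca = {"DYS393": 13, "DYS390": 25, "DYS19": 14, "DYS391": 10, "DYS439": 13}
candidate_a = {"DYS393": 12, "DYS390": 25, "DYS19": 15, "DYS391": 10, "DYS439": 13}
candidate_b = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11, "DYS439": 13}

fp = fingerprint(haplogroup_mrca, cluster_mrca)
for name, cand in (("A", candidate_a), ("B", candidate_b)):
    diff = mutational_difference(cluster_mrca, cand)
    shared = len(shared_fingerprint(fp, cand))
    print(f"candidate {name}: difference {diff}, shares {shared} of {len(fp)} fingerprint markers")
```

Both made-up candidates are two mutations from the cluster MRCA, yet candidate A carries all three fingerprint mutations while candidate B carries only one. A search by mutational difference alone would rank them equally.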

When the age of the haplogroup gets very close to the genealogical time frame (500 to 1,000 years ago), the Y-SNP mutations that define the haplogroup can reveal even more information. These Y-SNP mutations are called "near private" Y-SNPs by deep ancestry researchers. They are always dominated by one or two surnames. If 80 % of the submissions are one surname, 10 % are a second surname and the last 10 % are spread across 20 surnames, then you have probably discovered a very distant NPE connection between the two most common surnames. The surnames in the last 10 % also become excellent NPE candidates, since the number of NPEs over a 500 to 1,000 year time frame should range between ten and twenty percent.
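The surname breakdown described above is simple to tabulate. The sketch below assumes you already have a list of surnames for the submissions that tested positive for a "near private" Y-SNP; the surnames and counts are invented to reproduce the 80 % / 10 % / 10 % example, and `surname_shares` is a hypothetical helper name.

```python
from collections import Counter

def surname_shares(surnames):
    """Return (surname, share of all submissions) pairs, most common surname first."""
    counts = Counter(surnames)
    total = sum(counts.values())
    return [(name, n / total) for name, n in counts.most_common()]

# Invented data: 200 positive submissions matching the 80/10/10 example.
positives = ["Alpha"] * 160 + ["Beta"] * 20 + [f"Other{i}" for i in range(20)]

shares = surname_shares(positives)
dominant, second = shares[0], shares[1]
remainder = sum(share for _, share in shares[2:])

print(f"{dominant[0]}: {dominant[1]:.0%}")                            # Alpha: 80%
print(f"{second[0]}: {second[1]:.0%}")                                # Beta: 10%
print(f"{len(shares) - 2} other surnames share {remainder:.0%}")      # 20 other surnames share 10%
```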

 

FINDING Y-SNPs WITHIN GENEALOGICAL TIME FRAMES

Just like Y-STRs, Y-SNPs can mutate at any time, anywhere from 20,000 years ago to only 200 years ago. Any Y-SNP that mutated within the genealogical time frame is called a "private" Y-SNP and is extremely important to genealogical research. Y-STRs really only form clusters of related submissions. The vast majority of Y-STR mutations prove only that the submissions sharing those mutations must be more closely related. However, the connections between these clusters and the ages of these clusters are difficult to determine from Y-STR information alone. It is similar to a tree trunk and several big branches lying on the ground: you can have many branches, but you have little information about where to put the branches on the tree in proper chronological order. In addition, you have a lot of submissions that do not have any cluster-defining mutations. These submissions with no common mutations cannot even be assigned to a branch, and there is no information about how they are connected together or where they belong on the tree. Here is a typical DNA descendancy chart of a well established surname cluster (the most common scenario):

Scenario 1 - Many Y-STR branches - but no early branch that splits the cluster

If you are very lucky, you may discover a Y-STR mutation that happened just after the formation of your genealogical cluster. These older Y-STR mutations can form an early branch that divides the genealogical cluster into two large branches, where all other branches are attached to one branch or the other. This kind of early branch is much more likely if your MRCA haplotype is much more recent (around 300 years). This kind of branch has major genealogical implications, as you can set aside around half of the submissions as being less closely related and focus your genealogical research on the half of the submissions that belong to your branch. This very special scenario allows Y-STR mutations not only to show a genealogical connection between two very old branches but also to indicate that the mutation happened immediately after the creation of the genealogical cluster. These kinds of cluster-dividing branches are rare (five to ten percent of genealogical clusters at most):

Scenario 2 - Several Y-STR branches - with early branch that splits the cluster

Only in the last two years have "private" Y-SNPs become available for genealogical analysis. Very few genealogists are even aware of these extremely powerful "private" Y-SNP mutations. Unlike Y-STR branches, "private" Y-SNPs reveal new branches with more clarity. "Private" Y-SNPs provide connection information between all branches and provide the relative time frame of each branch. According to the November 2011 ISOGG Y-SNP summary, there are currently over 500 haplotree branches defined and over 150 "private" Y-SNPs that have been discovered. Probably only a handful of these "private" Y-SNPs are being analyzed for genealogical purposes. There are currently around 20 to 30 new Y-SNPs being discovered every month, and around half of these are "private" Y-SNPs. Many scientists believe that there could be thousands (or perhaps tens of thousands) of "private" Y-SNPs that could be discovered over the next few years. These kinds of branches will become common and will help create a DNA descendancy chart that starts to resemble a traditional genealogical descendancy chart:

Scenario 3 - Many Y-STR branches - with one private SNP

So how do genealogists locate existing "private" Y-SNPs associated with their genealogical cluster, and how do you test for these Y-SNP mutations? Finding existing "private" Y-SNPs that match your surname project is pretty tedious work and requires some research in the haplogroup projects. If you are lucky, deep ancestry researchers may alert the surname admin to these recently discovered "private" Y-SNPs. "Private" Y-SNPs are a very recent development in genetic genealogical research and are not well documented. Most newly discovered Y-SNPs are data mined from academic and scientific studies and are then made available for general testing by genealogists. FTDNA also offers a test that discovers new Y-SNPs (and finds one to three new Y-SNPs around 50 % of the time). Currently, new Y-SNP mutations are being discovered every few days, and the rate of discovery is increasing as more researchers become involved and testing costs continue to decline (or the scope of testing continues to increase by scanning even more base pairs). This web site hopes to document many of these "private" Y-SNPs to make it easier to locate these important Y-SNPs and to know the status of testing for them.

THE FUTURE OF Y-SNP TESTING IS VERY EXCITING

The future is very bright for Y-SNP testing. Testing the entire Y chromosome for unique Y-SNP mutations is not currently economically feasible (it requires a full genome scan, which costs around $5,000 and only gives you raw data that requires complex tools to analyze). The current "Walk the Y" test from FTDNA only tests around 350,000 to 450,000 base pairs of the Y chromosome. This is only a small fraction of the Y chromosome, which is around 58,000,000 base pairs, so less than one percent is covered by the $900 "Walk the Y" test. The Nat Geo 2.0 test includes around 10,000 new Y-SNPs from several academic sources and from FTDNA's "Walk the Y" testing. These new Y-SNPs have not been well researched to date. Many of them are private (genealogical) Y-SNPs that are ignored by the academic community, since its interests are primarily oriented towards deep ancestry research.
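The "less than one percent" figure follows directly from the numbers quoted above. A quick check in Python, using only those quoted figures (the variable and function names are just for this example):

```python
# Figures quoted in the text above.
Y_CHROMOSOME_BP = 58_000_000   # approximate size of the Y chromosome in base pairs
WTY_MIN_BP = 350_000           # lower end of "Walk the Y" coverage
WTY_MAX_BP = 450_000           # upper end of "Walk the Y" coverage

def coverage_percent(tested_bp, total_bp=Y_CHROMOSOME_BP):
    """Percentage of the Y chromosome covered by a test scanning tested_bp base pairs."""
    return 100.0 * tested_bp / total_bp

low = coverage_percent(WTY_MIN_BP)
high = coverage_percent(WTY_MAX_BP)
print(f"{low:.2f}% to {high:.2f}%")  # prints 0.60% to 0.78%
```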

Estimates of the number of private Y-SNPs that could be discovered range from tens of thousands to hundreds of thousands. Eventually, every genealogical cluster could have several private Y-SNPs, or even dozens of Y-SNPs, assigned to it. This means that Y-SNPs could be assigned to individual well proven ancestors in the future, and most Y-SNPs would have parents and brothers connected in a manner similar to a traditional genealogical descendancy chart. Y-SNP results combined with full Y-STR testing (400 to 500 markers) are estimated to yield several unique mutations per individual on average. Since only Y-SNPs show how these mutations are connected and indicate a relative time frame, Y-SNPs should become the primary genetic test of the near future. Future Y-SNP tests will not only reveal which genealogical cluster a donor belongs to but could also reveal which known, well proven ancestor the donor descends from. Although the tests of the future will cost much less, many more sponsors will be required to test in order to reveal our recent ancestry.