Bryan and Jordan Casey

Bryan Casey (age two days - left) and Jordan Casey (age two years), reaction to "that's your baby brother," all descendants of Robert Casey, photograph 1989, Dallas, TX

How To - Y-SNPs (Future Trends)



The future of DNA analysis for genealogists is extremely promising but is evolving painfully slowly. Most submissions are random, and they trickle in at a very slow pace due to the high cost of widespread testing. Many submissions have little or no accompanying traditional documentation and everyone is still learning how to analyze DNA submissions. Eventually there will be dozens of submissions for every genealogical cluster with one hundred or more markers to analyze. There is a biological limit to the number of Y-STRs that mutate in a manner compatible with genealogical needs: only 200 to 400 Y-STRs are believed to meet those needs. Over the next few years, Y-SNPs will grow from 500 haplotree branches and 150 "private" Y-SNPs to 5,000 haplotree branches and 50,000 "private" Y-SNPs. This massive growth will provide a wealth of new genetic information to analyze and will introduce some significant growing pains. Over the same period, autosomal databases will grow significantly and these tests will need to be integrated with Y-STR and Y-SNP genetic information.

In the next two to three years, the scope of genetic genealogical testing will include the entire genome instead of multiple tests for each individual being tested. The cost of full genome tests will eventually decline to $1,000, then $500 and eventually $250. Once tested, you will have the results sent to you, which will include: 1) all 400 Y-STRs; 2) millions of Y-SNPs that have the potential to mutate; 3) autosomal areas (probably 100 times the current tests which will provide some improvements); 4) health and gene information (if you select the option); 5) all mtDNA data; 6) X-DNA which may have some genealogical value; and 7) many common specialty tests (such as the special 464 test that reveals non-standard GATC values). This massive infusion of new data will create an unbelievable increase in genetic information to analyze and will certainly involve some very significant learning curves, as the data will become available faster than the genetic community can analyze it. Today's primarily manual analysis will be supplemented with new advanced analytic tools.

The first complete scan of a person's entire genome (every human DNA marker that exists) cost far more than $10,000,000 in 2004. Just four years later in 2008, many genome scans were conducted for less than $1,000,000 per individual. Just one year later in 2009, dozens of genomes were scanned for less than $100,000 per genome. Already in 2011, hundreds of genome scans have been conducted and the cost has been reduced again to under $10,000 per scan. In August 2011, a new DNA testing company announced $5,000 full genome tests. There is great excitement in the scientific and medical community that "under $1,000" full genome tests will be available in the next year or two. Most believe the "under $1,000" test will be the cost threshold where massive medical testing will become feasible. The costs of future testing for genealogists will be driven primarily by the overhead of delivering this information to genealogists. Analyzing millions of markers for each submission will also require complex software, which will become a major expense in its own right.
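The cost figures above imply a fairly steady rate of decline, and a few lines of Python make that arithmetic explicit. This is only a rough sketch: it treats the quoted figures as if they were exact prices (they are really "more than" or "less than" bounds), assumes the decline is a constant percentage each year, and leaves out the $5,000 announcement. The names `COSTS`, `annual_factor` and `years_until` are made up for this illustration.

```python
import math

# Approximate cost per genome quoted in the paragraph above.
# These are rough bounds from the text, not measured prices.
COSTS = {2004: 10_000_000, 2008: 1_000_000, 2009: 100_000, 2011: 10_000}

def annual_factor(costs):
    """Average year-over-year cost multiplier between the first and last year."""
    first, last = min(costs), max(costs)
    return (costs[last] / costs[first]) ** (1 / (last - first))

def years_until(target, costs):
    """Years after the last data point until the cost reaches target,
    assuming the same average rate of decline continues."""
    last = max(costs)
    return math.log(target / costs[last]) / math.log(annual_factor(costs))

factor = annual_factor(COSTS)
print(f"Cost multiplier per year: {factor:.2f}")
print(f"Years from 2011 to reach $1,000: {years_until(1_000, COSTS):.1f}")
```

Under these assumptions the cost falls to roughly 37% of the previous year's figure each year, which would put a $1,000 test a little over two years after 2011. That is broadly in line with the expectation described above, though the real decline has come in uneven steps rather than at a constant rate.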

DNA testing is in the early stages of the technology maturing cycle. Currently, the hardware costs of DNA scanners (and associated "consumable" supplies) are the dominant economic factor and the associated labor costs are not far behind. Software and software development expenses are in a distant third place. Software development is currently limited to simple MRCA calculators, web access to place orders and provide information, relatively small databases for repositories of DNA submissions and simple search engines to compare submissions. This mixture of expenses behind testing your DNA will shift radically over the next few years. DNA testing technology is currently in the same state that corporate data processing was in around 15 to 20 years ago.

Early in the technology cycle, the costs of running corporate data centers shifted from hardware related costs to the labor costs of supporting these systems, because hardware costs were decreasing at staggering rates. Labor costs soared due to massive increases in software development to take advantage of massive increases in computing power. Spending on generic software productivity tools greatly increased in order to reduce labor costs by increasing labor productivity. Today, the hardware operational costs of the corporate data center are only a small fraction of software development costs. This same technology maturing cycle will be repeated in the DNA testing industry. Hopefully genealogists will be able to get a free ride on many of the complex software analysis tools required by the medical industry. Future DNA testing companies will become very dependent on software analysis tools and database extraction tools to analyze the massive amount of data that will become available. This time of transition will require major investments in software development as well as increased skills on the part of genetic researchers.

The number of Y-STR markers will increase with the availability of full genome scans. I cannot imagine that genealogists will not take advantage of another 100 to 300 Y-STRs when they become available at no additional charge. However, the emphasis will shift from Y-STR analysis to Y-SNP analysis. Y-STR markers are relatively fast mutating markers and only produce clusters of related submissions. Y-STRs rarely show how all the clusters are connected or the age of each cluster. Fortunately, Y-SNPs form branches that have much less ambiguity, reveal how branches are connected and imply the relative age of each mutation. Just a few years ago, the deep ancestry researchers were discovering branches that occurred 2,000 to 4,000 years ago and most genealogists paid minimal attention to this research. However, the origins of newly discovered Y-SNP branches of mankind are now approaching 1,000 years ago for many new haplogroups, and many Y-SNPs are already under 500 years old, where they will have a profound impact on genealogists.
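To illustrate why Y-SNP branches show how submissions are connected, here is a minimal Python sketch of a haplotree stored as a simple child-to-parent table. The branch labels ("SNP-A" and so on) are invented placeholders, not real Y-SNP names, and the helper functions `lineage` and `shared_branch` are hypothetical names chosen for this example only.

```python
# Hypothetical haplotree: each Y-SNP branch maps to its parent branch.
# The labels are placeholders, not real Y-SNP names.
PARENT = {
    "SNP-A": None,      # oldest branch (root of this small tree)
    "SNP-B": "SNP-A",
    "SNP-C": "SNP-A",
    "SNP-D": "SNP-B",
    "SNP-E": "SNP-B",
}

def lineage(branch):
    """Return the chain of branches from the given branch back to the root."""
    chain = []
    while branch is not None:
        chain.append(branch)
        branch = PARENT[branch]
    return chain

def shared_branch(first, second):
    """Return the most recent branch that both lineages pass through."""
    ancestors = set(lineage(first))
    for branch in lineage(second):
        if branch in ancestors:
            return branch
    return None

print(lineage("SNP-D"))
print(shared_branch("SNP-D", "SNP-E"))
print(shared_branch("SNP-D", "SNP-C"))
```

In this toy tree, two submissions on branches SNP-D and SNP-E connect at SNP-B, while SNP-D and SNP-C only connect further back at SNP-A. Because a branch is always younger than its parent, the position of the shared branch also gives the relative age of the connection, which is information that Y-STR clusters rarely provide.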

There is another major limit ahead where DNA testing will eventually hit a brick wall of its own. DNA testing only works where there is a reasonable amount of traditional documentation available to assign names, places and specific dates to genetic connections. DNA testing is a great complementary source of genealogical information in the 200 to 400 year time frame, where it can enhance our knowledge of how our oldest proven ancestors are connected, within several generations of those ancestors. Eventually, our genealogical research will approach a time frame where 90% of the evidence will be genetic and only 10% of the evidence will be traditional documentation. We could discover how our distant ancestors are connected - but may not be able to assign names, specific dates and places due to the lack of supporting traditional documentation that provides this information. Of course, there will always be a set of lucky individuals who tie into wealthier ancestors that left a better paper trail behind. As genetic genealogical research travels further back in time, new brick walls will become limiting factors again, because there will not be enough traditional documentation to add genealogical meaning to our genetic family histories.

Just as technology greatly enhanced our ability to access, research and document our family histories, DNA testing is providing a new infusion of source documentation to complement our traditional documentation sources. Just as it took many years for many genealogists to embrace new computer technologies, it will probably take many years for genealogists to develop and embrace new DNA technologies. Previous generations saw other technology improvements that we take for granted today. Cars allowed us to be more mobile and visit remote courthouses, telephones allowed us to call our distant cousins and copiers enhanced our ability to duplicate source records to share. Personal computers and the internet databases supporting genealogists will also continue to improve over time, but the sheer volume of unreliable information continues to explode as well. It is naive to believe that DNA testing for genealogists will be the last major quantum leap in enhancing our genealogical research. If anyone has any ideas about other near term quantum leaps on the horizon (other than DNA related), drop me a note so that I can start preparing for these new opportunities to learn yet more new analysis skills.