Tuesday, 13 November 2018

FTDNA Thanksgiving Sale

There are some incredible discounts in the current FTDNA Sale which lasts from now until Nov 22nd. And there will probably be a Christmas Sale after that. So now is the time to start thinking about getting that upgrade or that extra kit.

Below are the sale prices and they are the lowest I have ever seen.
Y37 for just $99 ...
Family Finder for just $49 ...
and $100-140 off Big Y upgrades.

This feels more like Crazy Eddie's Second Hand Car Deals!

If you have any questions about your own particular situation, just drop me an email.

Maurice Gleeson
Nov 2018

Thursday, 30 November 2017

DNA reveals a major Welsh connection?

One of the main reasons for using DNA as a genealogical tool is to help you break through Brick Walls in your family tree research: those dead ends or roadblocks where you are currently stuck and can't go back any further. Only this year, DNA helped me achieve a major breakthrough on one of my ancestral lines, which has taken me on a wild adventure that continues to shock and surprise. This particular breakthrough throws new light on the Early Limerick Spierins and the social circles in which they moved. It also reveals that some of us within the project may have ties to the English monarchy.

The most useful of the three main types of DNA test (Y-DNA, mitochondrial DNA and autosomal DNA) for genealogical purposes is the autosomal DNA test. We each have 46 chromosomes in every cell in our body, and the autosomal DNA test assesses all 46 of these in women and 45 of the 46 in men. (The remaining 46th chromosome in men is the Y chromosome, and the Y-DNA test is the primary focus of the Spearin Surname Project.)

I did my first autosomal DNA testing with FamilyTreeDNA back in 2010. I tested myself, my father and my maternal aunt. We each had several hundred matches, most of them distant relatives with no obvious common ancestor despite comparing respective family trees. This is what it was like in the early days of autosomal DNA testing: the various company databases were small and close matches were not common. That has all changed in recent years, and there are currently about 9 million customers in the top three company databases (Ancestry 6 million, 23andMe 2 million, FamilyTreeDNA 1 million). This number is expected to reach about 25 million by 2020, so more and more people will find close matches in the databases. By a close match I mean someone with whom you share a common ancestor in the relatively recent past, say up to a third cousin (and therefore common great great grandparents).
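To make the cousin arithmetic concrete, here is a minimal Python sketch (the function names are purely illustrative, not taken from any testing company's tools). It assumes straight nth cousins with no "removed" steps: first cousins share grandparents, and each further degree of cousinship adds one more "great", so nth cousins share ancestors n + 1 generations back.

```python
def shared_ancestors(cousin_degree: int) -> str:
    """Name the ancestral couple shared by nth cousins (no 'removed' steps)."""
    if cousin_degree < 1:
        raise ValueError("cousin degree must be 1 or more")
    # 1st cousins -> grandparents; each extra degree adds one 'great-'.
    return "great-" * (cousin_degree - 1) + "grandparents"


def generations_back(cousin_degree: int) -> int:
    """Generations from each cousin up to the shared ancestral couple."""
    return cousin_degree + 1


for degree in (1, 2, 3):
    print(degree, shared_ancestors(degree), generations_back(degree))
# 1 grandparents 2
# 2 great-grandparents 3
# 3 great-great-grandparents 4
```

So a third cousin shares great-great-grandparents, four generations back from each cousin.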

As time moved on, I tested several other family members, including more distant cousins. Several other distant cousins also became interested in DNA testing and paid for their own tests (all with FamilyTreeDNA). Then, about 2 years ago, I noticed that four of my family members who had tested all descended from the same ancestral couple: Patrick Spierin (PS) and Mary Morgan (MM), my great great great grandparents, and one of my Brick Walls. This is illustrated in the diagram below. The four family members were my Dad (MHG), his paternal first cousin (COC), his 2nd cousin once removed (KS), and his 2nd cousin twice removed (EW). The line of ascent from my Dad to PS & MM is indicated by the green rimmed boxes. It occurred to me that any matches they shared with each other were likely to be related via PS & MM. And if I contacted these shared matches, one of them might hold the clue that would let me break through the Brick Wall at that level and push my family tree back an extra generation.

Triangulating on Patrick Spierin & Mary Morgan

That was the theory at least. But could I prove it?

So as an experiment, I compared their respective lists of matches (each had about 1000 matches at this stage) and extracted those matches that any two of the four family members shared with each other. This was relatively easy to do, as FamilyTreeDNA allows you to identify such Shared Matches and download them into a spreadsheet as an Excel or csv file. You can see the actual numbers of Shared Matches among the four family members in the diagram above. I ended up with a spreadsheet of 135 Shared Matches, and because some people matched more than one pair of family members, removing the duplicates left me with 100 people.
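The filtering step can be sketched in a few lines of Python. This is an illustration only: the tester codes are those of the four family members above, but the match names are made up, and in practice each set would be loaded from that tester's downloaded match file. The sketch counts Shared Matches pair by pair and then de-duplicates, which shows why a pairwise total (135 Shared Matches above) shrinks once people who match several family members are counted only once (100 people above).

```python
from collections import Counter
from itertools import combinations

# Hypothetical match lists: tester -> set of match names.
match_lists = {
    "MHG": {"Ann", "Bob", "Cara", "Dan"},
    "COC": {"Bob", "Cara", "Eve"},
    "KS": {"Cara", "Finn"},
    "EW": {"Bob", "Finn", "Gus"},
}


def shared_matches(lists):
    """Return per-pair shared-match counts and the de-duplicated set of
    people who match at least two of the testers."""
    pair_counts = {}
    for a, b in combinations(sorted(lists), 2):
        pair_counts[(a, b)] = len(lists[a] & lists[b])
    # How many testers does each person appear with?
    appearances = Counter(m for matches in lists.values() for m in matches)
    unique = {m for m, n in appearances.items() if n >= 2}
    return pair_counts, unique


pairs, unique = shared_matches(match_lists)
print(sum(pairs.values()), len(unique))
# 7 3  (7 pairwise Shared Matches, but only 3 distinct people)
```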

I next wrote 100 individual emails to all the people on the list, explaining that they matched two or more of my four family members, all of whom were descended from PS & MM, and asking them if they had any Spierin or Morgan ancestors in their family tree.

The response rate to this exercise was pretty good and over the next several weeks I received 49 replies, trickling in slowly and tantalisingly. Each email reply was eagerly opened. Would this be the one that held the treasure? And each reply was a polite no. No Spierin ancestors, no Morgan ancestors. I quickly learnt to expect disappointment.

And then came the 50th reply. From Tony in Arizona.

"YES!! We have Morgan ancestors!"

I was thrilled. Had the experiment worked? I quickly sent Tony a link to my online family tree so that he could see what information I had about Patrick Spierin & Mary Morgan. I knew that they were married in Tipperary in 1828 but I had no date or place of birth for either of them. Presumably they would have been born sometime between 1800 and 1810. They had several children and the baptism records revealed that Patrick Spierin was variously a Police Constable and a "Sargent at Arms". I searched for him in the records of the Royal Irish Constabulary but he was not there. It is probable that he served in the Peace Preservation Force (PPF, an armed militia) but sadly there are no surviving records related to the PPF.

However, I found mention of him on several occasions in local newspapers of the period. Apparently on one occasion he was involved in a riot and shot someone. And on another occasion (in 1838) he was the arresting officer in the notorious murder case of Wayland and Cooper, two local landlords who were set upon by the Whiteboys. The Whiteboys were a local vigilante gang that intimidated landlords who evicted tenants, for this was the time of many "agrarian outrages". Four men were charged with the murder and the case made constant headlines in the papers. Two of the men escaped, one died in prison, and one was hanged outside Clonmel gaol. The case sparked the Devon Commission enquiry into land ownership in Ireland.

But by 1845, Constable Patrick Spierin had had enough. He next appears in Dublin as a porter for the Great Southern and Western Railway. It may be that Tipperary was getting too hot for him. Maybe people singled him out as "the man who got Con Hickey hung", and so he left with his family and moved them to the relative safety of Ireland's capital. And that is where they lived out their days. He died in 1872 and Mary in 1878.

And that was all the information I had on them. And this is what I shared with Tony from Arizona. I also asked if I could see where his Morgan line appeared in his tree. And that is where we hit a problem. Tony knew there was a Morgan connection but nobody in the family knew where it fitted in. But the connection was plainly written for all to see on the gravestone of the wife of John Morgan in St Laurence's Cemetery in Limerick.

The gravestone erected by John Morgan

The gravestone states that John Morgan's wife had died in 1879 (aged 78) and that he had erected the stone in her memory. But also buried in the grave is John Morgan's great nephew, John O'Dwyer, who tragically died in World War One on 23rd Feb 1917. Now Tony's family knew this great nephew well, and knew where he fitted into the more recent family tree, but nobody had any idea who John Morgan was, nor how he was related to John O'Dwyer. Assuming that John Morgan was roughly the same age as his wife, he would have been born about 1800, which is around the same time I estimated that my great great great grandmother Mary Morgan was born. Could it be possible that John and Mary were brother and sister?

Another member of Tony's family (Tony's first cousin) agreed to do a DNA test and she too came back as a match to some of my four family members. This certainly supported a family connection via the Morgans, but it was a distant match and we could never be sure that there wasn't a second connection on some other ancestral line in either of our respective family trees. And despite an exhaustive search of the usual genealogical records, we could not precisely place John Morgan in Tony's family tree.

Sadly, after several months of intense activity, we were going nowhere and we put the research aside.

Two years later, I received the 51st response.

In fact, this was a new match, someone who had recently done an autosomal DNA test. And Andrew informed me that he too had a Morgan ancestor - Patrick Morgan, born about 1812. And he had served for many years in the Royal Irish Constabulary. Bells started going off. My Patrick Spierin had been in the Peace Preservation Force. Were the two men associated with some sort of traditional family occupation?

Andrew sent me Patrick's picture, taken in the 1860s, in all his fine regalia, including a wonderful ceremonial sword. Could Andrew's Patrick Morgan have been a brother to my Mary Morgan and Tony's John Morgan?

Photo of Patrick Morgan (c. 1867)

I took out all the old research and started emailing back and forth with Andrew. I searched online for new clues. I searched on Ancestry for records and other family trees containing Andrew's Patrick Morgan. No luck. I googled Patrick Morgan RIC and generated pages of search results. And then one of them caught my eye. It was a family tree on GENI (www.geni.com). And sure enough, there was a family tree that contained Andrew's Patrick Morgan. But wait ... it also had his parents! Oh wow, I thought, this could be a breakthrough for Andrew! And it also had siblings for Patrick Morgan, and sure enough one of them was a John Morgan, who was married to a Mary! Hey, that could be Tony's ancestor - that could be a real breakthrough for Tony! Two for the price of one! And then as I looked further along the line of siblings, I came to someone called Patrick Spierin, married to a Mary Morgan! Hey! Those are mine! What are you doing with my ancestors in your tree?!

The Morgan family tree on Geni ... with all the key players therein!

After the initial shock, I searched desperately for sources. Where did this information come from? What wonderful records had I missed in my own research? But there were no sources. It was just a tree with names, and I could not verify any of the information against independent primary sources. The only way to find the sources would be to contact the owner of the tree, and it had been created in June 2009, eight years previously. I feared that any email I sent to its author might take another 8 years to be answered. Nevertheless, I wrote a brief email making gentle enquiries about the source of the information and sent it off into the ether with few hopes of an early response.

And how wrong I was! A few days later I received a wonderful reply from George. "You ask what my sources were," he said. "Quite simply, the notebooks of Professor Wardell." "Who was he?" I asked. The reply: "Professor of Military History in Trinity College Dublin at the turn of the century. He undertook a study of the Morgan surname in Ireland and had access to all the records that went up in smoke in the Public Records Office fire of 1922. And he recorded everything in his notebook. And I have his notebook."

George's source for the family tree was Prof Wardell's notebooks from the early 1900s

It is not very often that I stare at my computer in complete astonishment, but this was one of those times. Could we actually have solved the mystery, united the three Morgans as siblings, and pushed back the family tree one extra generation?

In fact we did much more than that.

George, Andrew and I started exchanging emails left, right and centre. Suddenly huge amounts of information started flooding in, not just from the notebooks of Prof Wardell that George had in his possession but from a whole array of associated records. In fact, Prof Wardell's notebooks not only established that our three Morgan ancestors were siblings, but also pushed the Morgan family tree back five generations to the Morgans of Dunmoylan and Old Abbey. You see, our Morgans were landed gentry and went all the way back to Limerick in the 1600s. All the way back to Lieutenant Edward Morgan, who in 1716 (or thereabouts) married Alice Spierin, daughter of Luke Spierin!

The Morgans of Old Abbey - note the Luke Spierin connection

Not only that, but they claimed descent from the Morgans of Tredegar in Wales, who have a pedigree that goes back to 1089.

Not only that, but on a recent visit to the LDS Family History Library in Salt Lake City, I discovered an old pedigree that states that the Morgans of Tredegar are descended from "six Kings, five Lords, and a Ducke". And during that visit I met Jim, who informed me that he too is descended from the Tredegar Morgans (making us something like 13th cousins) and that some of our Morgan relatives (fellow Morgan descendants) include somebody called JP Morgan, someone else called Princess Diana, and someone else who was a Captain in the Caribbean and gave his name to a bottle of rum.

And because the first Morgan settler in Ireland married the daughter of Luke Spierin, any descendants of that marriage will automatically tie in to the Morgans of Tredegar and their connections to the English monarchy.

So I am still reeling. And suddenly we have an awful lot of fact-checking to do. And lots more blog posts to write!

But what incredible adventures await us!

Maurice Gleeson
Nov 2017

Friday, 17 November 2017

FTDNA Holiday Sale until Dec 31 2017

FamilyTreeDNA have launched their Annual Holiday Sale. This runs from the last day of the Annual FTDNA Conference (Nov 12th 2017) until the end of the year. So now is the time to buy FTDNA tests and take advantage of some of their lowest prices ever. They also make perfect Birthday, Thanksgiving & Christmas gifts for friends and family.

2017 Holiday Sale Discounts

There are discounts on many of their products including upgrades on mtDNA and Y-DNA. The discounts represent approximately a 10-30% reduction from the usual price.

There is a special offer regarding the Big Y test. The usual price is $575 but there is a $100 discount in the sale. Further discounts are possible with the vouchers described below. But everyone who buys a Big Y test will automatically get a FREE upgrade to the Y-DNA-111 test. So if you have only tested your Y-DNA to the 37 marker level, buying the Big Y will get you a free upgrade to 111 markers (which would normally cost you $188).

Even if you haven't done a Y-DNA-37 test yet, you can order it at the Sale Price, and use a voucher for a further discount, and then once it has registered on the system, you can order the Big Y test and get the $100 Sale Price discount, and any additional voucher discount, and a free upgrade to 111 markers. This is a very good deal indeed!
So if you were very lucky, you could get the Y-DNA-37 for $109 (using a $20 voucher) plus the Big Y for $375 (using a $100 voucher) and the free upgrade to 111 markers. This would normally cost $169 + $575 + $188 = $932, but you would be getting it for $484. This is only about 52% of the price you would normally pay.
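The arithmetic can be checked with a short Python sketch using the prices quoted in this post. One assumption: the $129 Y-DNA-37 sale price is inferred from the "$109 after a $20 voucher" figure rather than stated directly.

```python
# Normal prices quoted in this post (US$).
normal_price = {"Y-DNA-37": 169, "Big Y": 575, "Y-111 upgrade": 188}

# Best-case prices during the sale (US$).
deal_price = {
    "Y-DNA-37": 129 - 20,        # inferred sale price less a $20 voucher
    "Big Y": 575 - 100 - 100,    # $100 sale discount plus a $100 voucher
    "Y-111 upgrade": 0,          # free with the Big Y order
}

normal_total = sum(normal_price.values())
deal_total = sum(deal_price.values())
share = deal_total / normal_total

print(normal_total, deal_total, f"{share:.0%}")
# 932 484 52%
```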

As mentioned above, you can use Holiday Reward vouchers to lower the sale prices even further. These will be issued every Monday until the end of the Sale but each voucher only lasts for 7 days so you have to use them quickly. In effect, this may reduce the cost of the Family Finder atDNA test to $49 and Y-DNA-37 to $109.

A $20 voucher for the Y-DNA-67 test

To access your voucher, simply log on to your FTDNA account and click on the Holiday Reward icon on your home page. If you make a purchase during the Sale, you frequently get a Bonus Reward as well. This gives further discounts on other tests.

And if you want to use the voucher for yourself, simply click on the Enjoy Rewards button and the product will be added to your Cart and the discount applied. Alternatively you can give the voucher to friends or family by clicking on the Share Rewards button. Each voucher can only be used once, and must be used before the weekly deadline.

A lot of people donate any vouchers they are not using so check the ISOGG Facebook group and Genetic Genealogy Ireland Facebook group for any unused vouchers that you might be able to take advantage of. Be warned, they go fast so you might have to try several before you find one that works.

Enjoy the Sale!

Maurice Gleeson
Nov 2017

Thursday, 11 August 2016

Vive la France! Another Project member, another clue?

A new project member has joined Genetic Family 1 (GF1, the Limerick Spearins). Does he offer further clues as to the origins of the Spearin family?

His surname is Laveaud and he lives in France. He did the Y-DNA-37 test back in January 2015, later upgraded it to 67 markers in May, and then to 111 markers in April 2016. He also did the I-M223 SNP Pack in March this year.

How close does he match?

He matches the members of the Spearin DNA Project very closely. There are 13 members in Genetic Family 1 (GF1) and he matches 2 of them at 111 markers, 5 of them at 67 markers, 2 of them at 37 markers, and all 13 of them at 25 markers. So this is a strong association.

The Genetic Distance between him and the two Spearin project members that he matches at the 111 marker level (200083 & 209996) is 8/111 and 9/111, and the estimated time back to the common ancestor between them all is about 9 generations (midpoint estimate) but with a 90% range of 4 to 16 generations. This roughly translates into an estimated year of birth for their common ancestor of 1680 (range 1470-1830; this assumes 30 years per generation and an average year of birth of project members of 1950).
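The conversion from generations to a birth year is simple arithmetic, sketched here in Python with the same assumptions stated above (30 years per generation, testers born on average in 1950). The generation figures come from the TiP Report; the function itself is just an illustration. Note that the upper end of the generation range gives the earliest year.

```python
GENERATION_YEARS = 30        # years per generation, as assumed above
REFERENCE_BIRTH_YEAR = 1950  # assumed average birth year of testers


def ancestor_birth_year(generations: int) -> int:
    """Estimated birth year of a common ancestor `generations` back."""
    return REFERENCE_BIRTH_YEAR - generations * GENERATION_YEARS


def estimate(mid: int, low: int, high: int) -> tuple[int, int, int]:
    """Convert a TiP midpoint and 90% range (in generations) to years.

    Returns (midpoint year, earliest year, latest year): more
    generations means an earlier year, so `high` gives the earliest.
    """
    return (ancestor_birth_year(mid),
            ancestor_birth_year(high),
            ancestor_birth_year(low))


print(estimate(9, 4, 16))  # Laveaud vs the two GF1 Spearins at 111 markers
# (1680, 1470, 1830)
```

The same function reproduces the other TiP-based dates in this blog: estimate(11, 6, 19) gives (1620, 1380, 1770) for the Laveaud/Razee match, and estimate(12, 6, 21) gives (1590, 1320, 1770) for Mr Graham's 7/67 matches.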

So this means that his connection to the Spearins could be sometime in the 1600s. This could be around the time they arrived in Ireland, or (more likely) before that time. In the early 1600s, we believe they lived in London, and before that, in the 1400s and 1500s, we believe they lived in Flanders (today’s northern Belgium). We are still looking for more people to do the DNA test in order to confirm this theory, and the more matches we get like our new member here, the more likely we are to figure out where we all came from.

So how can a Laveaud be related to the Limerick Spearins?

There are several possibilities:
  • he is the result of an NPE* (his distant genetic ancestor was a Spearin)
  • we are the result of an NPE (our distant genetic ancestor was a Laveaud)
  • we spring from a common ancestor pre-surnames (e.g. some time before 1000 AD)
  • we are all victims of Convergence (i.e. by chance the match looks closer than it actually is)
But are there any clues from the evidence we have to date as to which of the above options it could be? Read on ...

* an NPE is a Non-Paternity Event and there are many possible causes (e.g. legal name change, surname switch, adoption, infidelity, illegitimacy, and many more). 

What do his I-M223 SNP Pack results show?

In a previous post, I described how the SNP Progression of the Limerick Spearins has been clarified to be:
I- ... M438 > L460 > P214 > M223 > CTS10057 > Z161 > CTS6433 > Z78 > CTS8584 > Z185 > Z180 > L1198 = Z166 > Y17535 > Y18109 
The results of Mr Laveaud's I-M223 SNP Pack show that he is positive for Z166 and L1198 (upstream Spearin SNPs) but negative for all the SNPs below this ... or rather for all the SNPs below Z166 that were included in the Pack. And not all sub-Z166 SNPs were included. One very important SNP that is missing from the pack is the unique Spearin SNP marker Y18109, and the one immediately above it, Y17535 (in blue below). This is the area of the tree we are most interested in, because anyone who sits on this branch (Y17535 or below) will help shed further light on where the Limerick Spearins originated. I have asked FTDNA to please add Y17535 to the I-M223 SNP Pack. I will update this blog post with their reply in due course.

The current position of the new member on the Haplotree
green = tested positive; red = tested negative; blue = test available;
black = test not done and not available

So, our new member could sit on one of the blue branches above, or some other branch currently not identified.
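The logic of reading an SNP Pack result against the Spearin SNP Progression can be sketched in Python. This is a simplification: it only models the single Spearin path listed above (with L1198 and Z166 treated as equivalent), it treats any SNP missing from the results as untested, and it ignores negative results on side branches, which is where Mr Laveaud's negatives lie. The function name and result format are my own invention.

```python
# Spearin SNP Progression from this post, root first. Equivalent SNPs
# (L1198 = Z166) share a single position on the path.
SPEARIN_PATH = [
    {"M438"}, {"L460"}, {"P214"}, {"M223"}, {"CTS10057"}, {"Z161"},
    {"CTS6433"}, {"Z78"}, {"CTS8584"}, {"Z185"}, {"Z180"},
    {"L1198", "Z166"}, {"Y17535"}, {"Y18109"},
]


def place_on_path(path, results):
    """Return (deepest positive SNPs, untested SNPs below them).

    `results` maps SNP name -> True (positive) or False (negative);
    SNPs absent from `results` count as untested.
    """
    deepest = []
    untested = []
    for node in path:
        calls = [results[snp] for snp in node if snp in results]
        if any(calls):
            deepest = sorted(node)
            untested = []        # only SNPs below the deepest positive matter
        elif not calls:
            untested.extend(sorted(node))
        else:
            break                # a negative: the tester leaves this path here
    return deepest, untested


# Mr Laveaud: positive for Z166/L1198; Y17535 and Y18109 not in the Pack.
laveaud = {"Z166": True, "L1198": True}
print(place_on_path(SPEARIN_PATH, laveaud))
# (['L1198', 'Z166'], ['Y17535', 'Y18109'])
```

If Y17535 came back negative, the function would stop there and report nothing left to test on this path, which would point to an adjacent, as yet unidentified branch.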

In fact, this is only half the story, because checking the same SNPs on YFULL reveals that Y17535 and Y18109 are the first of several SNPs in two SNP Blocks:
  • Y17535 block (5 SNPs) - Y17535, Y17536, Y17537, Y17941, Z21761
  • Y18109 block (10 SNPs) - Y18109 to Y18118

Furthermore, look at the formation dates that YFULL gives for these SNPs relevant to the Limerick Spearins ...
  • Z166 (L1198) was formed some time between 3000 and 2700 years ago
  • Y17535 was formed some time between 2700 and 2200 years ago
  • Y18109 was formed some time between 2200 and 150 years ago

Given the close connection between the new member and the Limerick Spearins (GD = 9/111), he should test positive for SNP Y17535 (which is over 2000 years old). However, there is a huge gap (some 2000 years) in the time estimate for the formation of Y18109, and this means that there is a strong chance that the new member will not test positive for this SNP marker. He may sit on an adjacent branch to the Spearins that has not yet been identified. But if he does test positive for Y18109, that would suggest a very close association and make an NPE more likely.

A lot of these questions would be answered if our new member did the Big Y test, but it is expensive ($575). Alternatively, it may be worthwhile for him to test for the single SNP Y17535 ($39) and, if that is positive, to test for Y18109 ($39), to see if he sits on the same branch of the Human Evolutionary Tree (Haplotree) as us Spearins. It's a bit lonely out here!
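The step-by-step approach can be written as a tiny Python sketch using the prices quoted above. The function is purely illustrative; it simply encodes the rule "test Y17535 first, and only go on to Y18109 if Y17535 is positive".

```python
BIG_Y_PRICE = 575       # US$, as quoted in this post
SINGLE_SNP_PRICE = 39   # US$ per single SNP, as quoted in this post


def staged_snp_plan(y17535_positive: bool) -> tuple[list[str], int]:
    """Return the single-SNP tests ordered and their total cost."""
    tests = ["Y17535"]
    if y17535_positive:
        # Only worth checking the unique Spearin SNP if he is on our branch.
        tests.append("Y18109")
    return tests, SINGLE_SNP_PRICE * len(tests)


for outcome in (True, False):
    tests, cost = staged_snp_plan(outcome)
    print(f"Y17535 positive={outcome}: {tests}, ${cost} (Big Y: ${BIG_Y_PRICE})")
```

Either way, the outlay is at most $78 rather than $575, although single-SNP tests can only check known SNPs, whereas the Big Y can discover new ones.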

We could ask other close neighbours to test on the Big Y or Y17535 - but the problem is trying to identify them. Of the GF1 project members who have tested out to 111 markers, our new Laveaud member is their only non-Spearin match. And at the 67 marker level, there are loads of matches (20+), with some of them obviously sitting on adjacent branches to the Limerick Spearins that connect over 2000 years ago (e.g. branches Y6060 and PF5268). This indicates that there is a degree of "Downstream Convergence" among our 67-marker matches and it will be difficult to identify which of them are truly our closest neighbours. So testing people on these branches may not be of much help ... but it may be the only option we have.

However, there is one ray of light. Our new member Laveaud matches two GF1 Spearin members (GD = 8/111 & 9/111) ... but also a third individual by the name of Razee, with ancestry from Rhode Island. He is kit 70816 in the diagram at the end. This match has a GD of 10/111 to Mr Laveaud and is positive for L1198 (roughly equivalent to Z166).  This could be the result of "downstream convergence" but there is some evidence to suggest that this is not the case. We will discuss this below.

But first, is there anybody among the array of people in the I-M223 Haplogroup Project that we could target for SNP testing? These people were grouped based on STR values initially and these groupings are continually being refined by additional SNP testing. The ones of interest to our GF1 are the ones that could possibly sit on our Y17535 branch or one of its sub-branches (only BY3098 has been identified so far). We are not that interested in those that sit on branches adjacent to Y17535 (i.e. Y6060, Z190 / S20905, P185_2, and PF5268) because they are likely to be too far back in time to be genealogically relevant (i.e. over 2000 years ago).

So ... this limits our field of interest to the groups known as:
  • Cont1 Group 1
  • Cont1 Group 1a
  • Cont1 Group 2
  • Cont1 Group 3 (our new Laveaud member has been placed here)

The Limerick Spearins belong to subgroup Cont1h1. Several questions spring to mind. Would it be possible to identify the most closely related GF1 matches among these subgroups and encourage them to test? Would a cladogram help? Has the cladogram changed over time as SNPs have become available, or has SNP testing verified the cladistic structure that was generated via STRs? In other words, how accurate were the STR-based estimates? I will discuss these points on the I-M223 Project's Activity Feed and with the I-M223 Project Administrators and report back in due course.

The Limerick Spearins belong to Cont1h1

Some Traditional Genealogy

Mr Laveaud's ancestry goes back to La Tremblade, on the west coast of France, south of La Rochelle and north of Bordeaux and the Gironde estuary (formed by the Garonne and Dordogne rivers). It is well known for its oyster farms.

From 1200 onward, the area became an international trading centre with merchants from England, Spain & Flanders, importing wool and exporting both wine and salt. Potentially, because of this trade, there could have been some contact between the merchant Spearins from Flanders and the trading Laveauds from France.

He can only trace his line back as far as his grandfather, but the Laveaud surname has been in France since before the 1600s, and possibly since the start of surname formation in France, which was about 1100-1200. It is quite a common name because it means "the valley" and thus may have multiple different origins.

La Rochelle was a Catholic fiefdom but the city of La Tremblade became Protestant around 1540, so it is probable that the Laveauds were Protestants around 1600. Around 1650 La Rochelle came under the protection of Louis XIV because of its Catholic population, which apparently included an Irish community. The Laveauds probably became Catholics around 1680, with subsequent persecution and oppression.

The Spearins were most likely Protestant: definitely while in London, possibly in Flanders, and certainly on arrival in Limerick. Most became Catholic after several generations in Ireland. Religion matters here because it may help us to ascertain whether the Spearins and Laveauds were likely to have mixed socially.

La Tremblade was a big French port with much traffic going to Quebec (New France) from the 1600s onwards. Oyster farming became a major industry around this time. In 1876, La Tremblade was the fifth-ranked port in France, just after La Rochelle.

The Razee Connection

The connection with Razee mentioned above is very interesting and potentially very important. He is kit 70816 in the diagram at the end. There is a family of Razé from France who own oyster farms in La Tremblade! Are these the ancestors of the Razee from Rhode Island? Has he found his ancestral origin? From the TiP Report for our new member, it appears that the common ancestor of Laveaud and Razee lived a little earlier than the common ancestor he shares with the Spearin group, perhaps about 1620 (range 1380-1770; 11 generations, range 6-19). However, these STR-based TMRCA estimates must be taken with a pinch of salt. The actual connection may be further back in time.

It would be useful if the Razee family from America did some additional DNA testing, specifically the Y17535 SNP ($39) and, if that is negative, the I-M223 SNP Pack ($119). Also, it would be great if one of the Razé family from La Rochelle did the Y-DNA-37 test, as this could confirm the origins of the American Razee family. I have written to the Administrator of the Razee Project and offered her these suggestions.

I wonder where the name Razé originally came from. Below are some surname distribution maps for several variants of the name. These are from http://worldnames.publicprofiler.org/Default.aspx?country_code=BE ... The names appear to have come from Belgium or France, and a few of them may have become Rasey in England. In contrast, Laveaud is a French surname, with high concentrations near La Rochelle. The surnames Razé and Laveaud both show a marked concentration in the area of La Rochelle, suggesting that people with these surnames lived in close proximity to each other.

Modern Surname Distribution Maps of
Razee, Rasey & Razé

Surname Distribution of Laveau & Laveaud

Other neighbours ... from Portugal & Cuba?

A recent response from the I-M223 Yahoo group was from a neighbour who sits on an adjacent branch (BY3098, Cont1h2) to the Limerick Spearins of GF1; our connection is some time in the last 2200 years. His YFULL ID is YF10785 (FTDNA kit 260237, surname Braz, MDKA Domingos Pires Preto, b. ~1600 - Penela da Beira, Portugal) and he reports that all his ancestors are from Portugal (up to 1600). So this suggests that maybe the Spearins came from Portugal, or maybe his family came from Flanders, sometime between 200 BC and 1500 AD. I know that Spain had control of the Flanders area for a while (1581-1714), so maybe this is where there is a connection? Or could it be via trade and merchants pre-1500? Or maybe it is further back?

Also, there is another new neighbour on this adjacent BY3098 branch (Cont1h2), and he also appears to have origins in the Iberian Peninsula. His surname is Lopez-Carnicer (kit N16676), his MDKA is Carlos San Justo Lopez Castillo (born 26 Oct 1797), and he is from Cuba. Presumably this ancestor was a Spanish or Portuguese emigrant.

So somewhere back in time these two branches (ours and the Iberians') meet up - the question is where?

The answer probably lies in finding other close neighbours who can help fill in the pieces of this fascinating jigsaw puzzle. And that means encouraging more people to do SNP testing, either with the Big Y or the I-M223 SNP Pack.

Possible Next Steps
  1. date the Cont1h1 (BY3098) sub-branch (via I-M223 Project Admins & Activity Feed)?
  2. get the updated cladogram for the Cont1 group?
  3. identify who is nearest to our Cont1h1 branch among the Cont1 members without subgroups?
  4. target them for testing: 1) Big Y; 2) single SNP Y17535; 3) I-M223 SNP Pack?
  5. write to Razee directly if no response from the Project Administrator?

The subgroups below Z166 of possible relevance to GF1
(from the I-M223 Project)

Wednesday, 27 April 2016

New Project Member ... with clues to the Spearin origin?

Please welcome a new project member to Genetic Family 1 (GF1; the Limerick Spearins). Member 458314 is not a Spearin, he's a Graham. But he is a fairly close match to the members of GF1. And he may hold clues to the origins of the Spearins in GF1.

Evidence from STRs
Below is the summary of Mr Graham's Genetic Distance to other members of the Spearin Surname Project at the 67 marker level. He has a Genetic Distance (GD) of 5/67 with his closest match, 6/67 with 4 other members of GF1, and 7/67 with two other members. He also matches two people in the Ungrouped category but much more distantly (9/67 and 20/67).

TiP Report (at 67 markers) for new member 458314 showing his GD to his closest matches in the project

Clicking on the TiP Report for his 7/67 matches reveals that the TMRCA (Time to Most recent Common Ancestor) is estimated to be about 12 generations ago (50% probability level) with a 90% range of 6-21 generations ago. That roughly equates with 360 years before present (90% range 180-630 ybp), which in turn gives a rough estimate of the birth year of the common ancestor of 1590 (90% range 1320-1770, assuming the average birth year of those tested is 1950).

So why does Mr Graham match the GF1 Spearins?

There are several possible Scenarios based on the DNA alone:
  1. There may have been an NPE (non-paternity event, e.g. adoption, illegitimacy, etc.) somewhere along the line and Mr Graham is actually a Spearin, and we all share a common Spearin ancestor born about 1590 (I say "we" because I too am a GF1 Spearin).
  2. We Spearins in GF1 are actually all Grahams, and the NPE was on our line, not Mr Graham's.
  3. Mr Graham and the GF1 Spearins are actually related prior to the common usage of surnames, which in the UK occurred around 1200-1300. However, his TMRCA estimate indicates a 95% probability that he is related to the GF1 Spearins sometime after 1320, so this scenario seems unlikely.
  4. What we are looking at is in fact an example of Convergence. This is when the genetic profile of one person appears to be fairly close to that of another person, but hidden back mutations or parallel mutations within their profiles mask the true number of mutations between them, so that they are actually related much further back than they seem.
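To make Convergence (Scenario 4) a little more concrete, here is a toy Python example with made-up marker values. It tracks each marker from a hypothetical common ancestor down two lines of descent, and compares the number of mutations that actually happened with the difference visible in the present-day profiles. It uses a simple step-wise distance (the sum of the absolute differences in repeat counts), which is a simplification and not necessarily how FTDNA calculates Genetic Distance.

```python
# Hypothetical three-marker example: each inner list is one marker's repeat values,
# from the common ancestor (first value) down to the present-day tester (last value).
line_a = [
    [14, 15, 14],  # marker 1: one step up, then a back mutation to the original value
    [22],          # marker 2: unchanged
    [11, 12],      # marker 3: one step up
]
line_b = [
    [14],          # marker 1: unchanged
    [22, 23],      # marker 2: one step up
    [11, 12],      # marker 3: the same step up, arising independently (parallel mutation)
]


def mutation_events(lineage):
    """Total single-step changes that actually happened along one line of descent."""
    return sum(abs(b - a) for values in lineage for a, b in zip(values, values[1:]))


def present_haplotype(lineage):
    """The STR values a present-day tester would report."""
    return [values[-1] for values in lineage]


def step_distance(hap_a, hap_b):
    """Simple step-wise genetic distance between two STR profiles."""
    return sum(abs(a - b) for a, b in zip(hap_a, hap_b))


observed = step_distance(present_haplotype(line_a), present_haplotype(line_b))
actual = mutation_events(line_a) + mutation_events(line_b)
print(f"observed GD = {observed}, actual mutations = {actual}")
```

Five mutations have occurred, but the two present-day profiles differ by only one step, so the two men look much more closely related than they really are.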

So which of these scenarios is the most likely in Mr Graham's case?

Evidence from SNPs
Well, we might get some clues from the terminal SNP markers of his closest matches. Mr Graham's own terminal SNP is the upstream SNP M223, placing him firmly in Haplogroup I (along with the GF1 Spearins).
  • At 67 markers, he has 19 matches whose terminal SNPs include L1198 (x1), PF5268 (x1), Y18109 (x2; both GF1), Y6060 (x1), and Z166 (x2; 1 from GF1).
  • At 37 markers, he has 11 matches including L1198 (x1) and Y18109 (x1; GF1)
  • At 25 markers, he has 44 matches including CTS6433 (x1), L1198 (x1), PF5268 (x1), Y18109 (x2; both GF1), and Z166 (x2; 1 from GF1).

All these terminal SNPs (except one) are on the same or adjacent branches of the human evolutionary tree to that on which the GF1 Spearins sit. I've marked these branches with a red dot in the diagram below. The exception is SNP CTS6433 which is on the following branch:
  • I-M223 > CTS616 > CTS10057 > Z161 > CTS2392 > Z173 > CTS6433

So although the exception is still within the I-M223 haplogroup sub-clade (like the GF1 Spearins), it is on a completely different branch.

However, taking all the evidence into consideration, there seems to be little doubt that Mr Graham will test positive for L1198 - the only question is which of the sub-branches he sits on. Currently there appear to be 3 possibilities: Y6060, PF5268, and Y18109 (the GF1 Spearins). To answer this question, there are several courses of action open to Mr Graham:
  • do the I-M223 SNP Pack ($119) - this will test for most of the relevant downstream SNPs (in pink below) but not all of them (in blue). Additional single SNP testing (e.g. for Y18109) might be indicated thereafter
  • do the Big Y test ($575, or wait for the sale when it is usually $475 or lower) and a YFULL reanalysis ($49) - this will assess most/all of the relevant SNPs and detect some new ones too

Placement on the Haplotree of the Terminal SNPs of Mr Graham's closest matches

If further SNP testing reveals that Mr Graham sits on one of the adjacent branches in the haplotree (e.g. Y6060), then the connection will be very far back in time (L1198, for example, was formed approximately 3000 years ago - see diagram below), and if that is the case, what we are probably looking at is Scenario 4, Convergence.

However, if he sits on the same branch as the GF1 Spearins (Y18109), then we can conclude that we are related within the past 2200 years or so, as this is approximately when the SNP Y18109 is estimated to have formed (see previous post). This is consistent with any of the first 3 scenarios above, but does not help us distinguish which of them is the most likely.
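Deciding whether Mr Graham's connection runs through L1198 or through Y18109 amounts to finding the most downstream SNP that two branch paths share. Below is a minimal Python sketch, assuming paths are written from L1198 downwards (the full path from the root of the tree is not reproduced here) and using the approximate ages quoted in these posts; no age is given here for Y17535, so it is left as None. The names are illustrative.

```python
# GF1 path below L1198, as placed on the YFULL tree
GF1_PATH = ["L1198", "Y17535", "Y18109"]

# Approximate ages (years ago) quoted in these posts; None where no estimate is given
APPROX_AGE = {"L1198": 3000, "Y17535": None, "Y18109": 2200}


def branching_point(path_a, path_b):
    """Return the most downstream SNP that two paths (starting at the same node) share."""
    shared = None
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        shared = a
    return shared


# Adjacent branch (Y6060) versus the GF1 branch itself
for candidate in (["L1198", "Y6060"], ["L1198", "Y17535", "Y18109"]):
    snp = branching_point(GF1_PATH, candidate)
    print(candidate[-1], "->", snp, APPROX_AGE.get(snp))
```

Note that sharing Y18109 only tells us the common ancestor lived after Y18109 formed (within roughly 2200 years); it says nothing about how recently.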

Only by doing the Big Y test (and a YFULL reanalysis) would we get a better idea of which scenario is the most likely. If we compared his Big Y results to those of the GF1 Spearins who have already done the Big Y test, we might find any of the following:
  • He sits on an adjacent branch, below L1198 or below Y17535 => the most likely scenario is Scenario 4: Convergence
  • He sits on branch Y18109, matches some of the 10 SNPs in the terminal SNP block, but does not match others. This splits up the Y18109 10-SNP block (as discussed in a previous post) and places him on a new adjacent branch with a branching point estimated to be either before the common usage of surnames (=> Scenario 3 is the most likely scenario, i.e. he is related to the GF1 Spearins before 1200-1300 AD) or after the common usage of surnames (=> Scenario 1 or 2 is most likely). Either way, the new branching point could be dated and would move everybody concerned further down the human evolutionary tree.
  • He sits on the Y18109 branch, matches all 10 SNPs in the terminal SNP block, but does not match any of the unique SNPs of those GF1 Spearin members already tested => Scenario 1 or 2 is most likely
  • As above, he sits on the Y18109 branch, matches all 10 SNPs in the terminal SNP block, and in addition matches one of the existing Big Y-tested GF1 members on some of their unique SNPs => possibly Scenario 1 is the most likely and an NPE has occurred somewhere along Mr Graham's ancestral line. This would also create a new branching point which could be dated and would move (some of) us further downstream on the human evolutionary tree.
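The four possible outcomes listed above amount to a small decision rule. The following Python sketch simply encodes that reasoning; the function and parameter names are hypothetical, and this is not a feature of the FTDNA or YFULL reports.

```python
def likely_scenario(on_y18109, block_snps_matched, block_size=10,
                    shares_private_snps=False, split_predates_surnames=None):
    """Map a hypothetical Big Y outcome for Mr Graham to the scenario(s) favoured above."""
    if not on_y18109:
        # Adjacent branch below L1198 or below Y17535
        return "Scenario 4 (convergence)"
    if block_snps_matched < block_size:
        # He splits the Y18109 block, so the answer depends on when the new branching point formed
        if split_predates_surnames is None:
            raise ValueError("date the new branching point first")
        return "Scenario 3" if split_predates_surnames else "Scenario 1 or 2"
    if shares_private_snps:
        # Matches the whole block plus some unique SNPs of an already-tested GF1 member
        return "Scenario 1 (possibly)"
    # Matches the whole block but none of the tested members' unique SNPs
    return "Scenario 1 or 2"


print(likely_scenario(on_y18109=True, block_snps_matched=10))
```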

Branching points & Terminal SNP blocks below L1198

Genealogical evidence
So far, the discussion has focussed merely on the genetic evidence. But this is where we bring in the evidence from Mr Graham's known genealogy. It is his grandson who manages his DNA results, and here is what he says:
I submitted my Grandfather's YDNA to get tested as his father Edward Graham (+ Sister) took his Mother's maiden name "Graham" and he has no recorded Father on his birth record.

So once the results came in we had 9 good matches for Spearing, Speiran, Spearin, Speerin

Now the fun begins finding the link to a Spearin in New Zealand.

So clearly there is an NPE in the Graham line and it is at the level of Mr Graham's father (born in 1901 in New Zealand). The question is: does it go back to a Spearin or to some other surname?

One obvious course of action (as Mr Graham's grandson suggests) would be to search for a Spearin in New Zealand in 1901 who could have been Mr Graham's father's father. There are several potential candidates* in New Zealand around this time with the surnames Spearing and Sperring (more usually English variants) but no one by the name of Spearin, Speiran, Speirin, or Spierin (more usually the Irish variants associated with the GF1 group). So there is currently no clear signal that a GF1 Spearin was the father of Mr Graham's father.

Next Steps
One could try to track down some of the present-day New Zealand Spearings and encourage them to do a Y-DNA-37 test to see if any of them is a close match to Mr Graham. Or Mr Graham could do the Big Y test (and YFULL reanalysis) to see where he sits on the human evolutionary tree relative to the GF1 Spearins.

The latter seems like the best course of action as it will give us the most information, particularly about our relative positions on the haplotree. But it won't answer all our questions - it may not identify any additional surname candidates for Mr Graham's father's father, and it may not give us any further clues to the ancestral origins of the GF1 Spearins.

And as has been the case for many years, it will still be a waiting game to see if any closer matches to Mr Graham or the GF1 Spearins emerge over time.

But one day, we will get there (in all likelihood). It is only a matter of time.

Maurice Gleeson
April 2016

* from the New Zealand Electoral Rolls 1853-1981 on Ancestry

Friday, 22 April 2016

Big Y results - comparing TMRCA estimates

In the previous post, we looked in depth at the SNP markers identified by FTDNA and YFULL, and compared the reports from each company for similarities and differences. However, this post explores the topic of TMRCA (Time to Most Recent Common Ancestor) and thankfully this is a lot more straightforward.

TMRCA Estimates based on SNPs
Another useful piece of information from the YFULL analysis is the estimate of when the SNPs in this terminal block emerged, and the TMRCA estimate for the two volunteers who have tested. We discussed in a previous post that the SNPs in this block emerged about 2200 years ago (or about 200 BC), but today we are looking at the TMRCA between the two volunteers.

SNP emergence estimate & TMRCA estimate for GF1

Their TMRCA is estimated to be a mere 150 years before present (ybp), by which they mean 150 years prior to the approximate date of birth of these individuals, which (let's say) is about 1950. This gives a common ancestor born about the year 1800. However, the 95% Confidence Interval around this estimate indicates that it could be anywhere from 75 to 500 years ago. In other words, we can be 95% confident that the common ancestor was born some time between 1450 and 1875. This estimate could be refined if more people from Genetic Family 1 (GF1) were to do the Big Y test and upload their results to YFULL, but for now there is no pressing need to do so.

Calculation of the TMRCA estimate

TMRCA Estimates based on STRs
But how does this compare with TMRCA estimates based on STR markers? The TiP Report for the comparison of these two volunteers is detailed below. It is based on a comparison of their STR markers at the 67-marker level. You can access your own TiP Report by clicking on the orange TiP icon beside each of your matches. This tells you how closely or how distantly you are related (based on your STR values). You can select comparisons based on 12, 25, 37, 67 or 111 markers (depending on how many markers you have personally tested).

This analysis assesses the probability that the two individuals share a common ancestor on their direct male lines within the past "X" number of generations. This is a cumulative probability and so the probability increases over time and eventually reaches 100%.
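The cumulative nature of this probability can be illustrated with a toy model. To be clear, this is not FTDNA's TiP algorithm, which uses its own marker-by-marker mutation rates. The Python sketch below assumes a uniform prior over 1 to 60 generations, treats the Genetic Distance as a Poisson count of mutations along the two lines of descent, and uses a Genetic Distance of 4 and a combined mutation rate of 0.2 per generation purely as illustrative inputs. The function names are hypothetical.

```python
import math


def toy_tmrca_cdf(genetic_distance, rate_per_generation, max_generations=60):
    """Cumulative probability that the common ancestor lived within g generations (g = 1..max).

    Toy model: uniform prior on g, and the observed genetic distance treated as a
    Poisson count of mutations along both lines of descent (2 * g transmissions).
    """
    weights = []
    for g in range(1, max_generations + 1):
        expected = 2 * g * rate_per_generation
        weights.append(math.exp(-expected) * expected ** genetic_distance
                       / math.factorial(genetic_distance))
    total = sum(weights)
    cdf, running = [], 0.0
    for w in weights:
        running += w
        cdf.append(running / total)
    return cdf


def generation_at(cdf, probability):
    """First generation at which the cumulative probability reaches the given level."""
    for g, p in enumerate(cdf, start=1):
        if p >= probability:
            return g
    return len(cdf)


cdf = toy_tmrca_cdf(genetic_distance=4, rate_per_generation=0.2)
print([generation_at(cdf, p) for p in (0.05, 0.5, 0.95)])
```

Whatever inputs are used, the cumulative probability only ever rises as more generations are included, reaching 100% by the last generation considered, and the 5%, 50% and 95% levels can be read off in the same way as from a TiP Report.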

TiP Report comparing Volunteer A (H1223)  with Volunteer B (164729)
The 50% (midpoint) value is about 10 generations - in other words, there is a roughly 50% chance that the common ancestor was born within the last 10 generations, and a roughly 50% chance that it was sometime before that. The 5% and 95% probability levels are about 4 and 19 generations respectively. Allowing 30 years per generation, this gives us a midpoint TMRCA estimate of 300 years before present (ybp), with a 90% Confidence Interval of 120 to 570 years ago. And translating this into actual years gives us a midpoint estimate of 1650 (assuming an average year of birth for the two volunteers of about 1950), with a range of somewhere between 1380 and 1830 AD.

This TMRCA estimate based on STR values (1380-1650-1830) is not that close to the TMRCA estimate based on SNP values (1450-1800-1875). In fact, the midpoint estimate is out by 150 years.  Also, the range around the "best estimate" is very large, and could be quite far back in time (1380-1450). This is why we really have to be careful when interpreting TMRCA estimates - they may be out by several hundred years ... and in either direction!
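The comparison above can be expressed as a short calculation. Note that the two ranges are not strictly like-for-like: the STR range is the 5% to 95% span of the TiP Report (a 90% range), whereas the SNP range is YFULL's 95% interval. The function name is illustrative.

```python
# (earliest, midpoint, latest) birth years of the common ancestor, as given in this post
str_estimate = (1380, 1650, 1830)  # TiP Report, 67 markers, 90% range
snp_estimate = (1450, 1800, 1875)  # YFULL, 95% interval


def compare(a, b):
    """Midpoint gap, overlapping span, and widths of two (low, mid, high) estimates."""
    midpoint_gap = abs(a[1] - b[1])
    overlap = (max(a[0], b[0]), min(a[2], b[2]))  # low > high would mean no overlap
    widths = (a[2] - a[0], b[2] - b[0])
    return midpoint_gap, overlap, widths


print(compare(str_estimate, snp_estimate))  # → (150, (1450, 1830), (450, 425))
```

The two estimates overlap between 1450 and 1830, so they are not contradictory, but their midpoints are 150 years apart and each spans more than four centuries.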

However, there is an additional technique we can use to try to obtain more accurate TMRCA estimates for the entire group, and that is something we will explore in a subsequent blog post.

Maurice Gleeson
April 2016

Friday, 15 April 2016

Big Y Results - Terminal SNPs, Shared SNPs & Unique SNPs

In the previous post we looked at some of the initial results of the YFULL analysis of the Big Y test from two of the volunteers from Genetic Family 1 (GF1). In this post we will take a closer look at the SNPs revealed by the additional YFULL analysis and then compare and contrast them with the original results from FTDNA.

This is quite a long post but stick with it!

Terminal SNPs
The two volunteers from GF1 were given new ID numbers at YFULL and you can see them in the haplotree diagram below - they are the last two numbers.
  • The first volunteer (FTDNA kit number 164729) is YF04104 (results available in Sep 2015)
  • The second volunteer (H1223) is YF04316 (results Oct 2015)

The terminal SNP Block for GF1 on the YFULL Haplotree

Our two brave volunteers have been placed on the YFULL Haplotree as a sub-branch below SNP Y17535. Both our volunteers have the terminal SNP Y18109 or rather they have a whole "block" of terminal SNPs, namely:
  • Y18109
  • Y18110 
  • Y18111 
  • Y18112 
  • Y18113  
  • Y18114 
  • Y18115
  • Y18116
  • Y18117
  • Y18118 

Shared SNPs
When two or more people share several terminal SNPs in common, this terminal SNP "block" is usually named after the first SNP in the block, which in our case is Y18109. These terminal SNPs are shared between our two volunteers and no other people in the world (currently).

If we move up the tree to the next nearest branching point, we find this is marked by the SNP Y17535 (which represents a SNP block of 5 SNPs). There are 2 sub-branches below Y17535 - our own sub-branch (Y18109) with our own 2 volunteers, and another sub-branch (Y17535*) with a single individual. Thus 3 individuals (currently) share the SNP Y17535.

Shared SNPs on the L1198 branch of the YFULL haplotree

And if we move further upstream to the next nearest branching point, this is marked by the SNP L1198 (representing another SNP block of about 7 SNPs, although some are "equivalent SNPs", i.e. the same SNP with several different names because it was discovered by several different people at around the same time; these are the SNPs separated by a forward slash). There are 4 sub-branches below L1198:
  • the Y17535 branch just discussed above (3 people)
  • an L1198* sub-branch (3 people)
  • a Y6060 sub-branch (with 3 subsequent sub-branches, the last of which also has 3 sub-branches; 9 people in total)
  • an S20905 sub-branch (aka Z190, but not shown in the diagram for some strange reason; with 4 levels of sub-branching) containing 11 people altogether (currently)

So, in total, 26 people share the SNP block L1198, 3 people share the SNP block Y17535, and 2 people (our volunteers) share the SNP block Y18109.


But as more people test, either from GF1 or our close genetic neighbours (if they exist), each SNP block should be gradually split up. In other words, we can expect our particular sub-branch of the tree to be joined by adjacent sub-branches sprouting nearby, some of which will "steal" SNPs from our current terminal SNP block. We can also expect further sub-branches to sprout below our current terminal SNP / SNP block. For example, if (say) 10 people from GF1 were to test, the current 10-SNP block might dwindle to (say) a 2-SNP block, with (say) 4 sub-branches below it - one sub-branch containing a single terminal SNP, another containing a 3-SNP block, and two containing a 2-SNP block.

The Take Home Message is: our current terminal SNP block will dwindle and will be split up as more people do the Big Y test (or similar tests).
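The splitting of a terminal SNP block can be modelled with simple set operations. In this Python sketch the new tester and the SNPs he carries are entirely hypothetical, and his own private SNPs (which would form his own new sub-branch) are not modelled.

```python
# The current Y18109 terminal SNP block shared by the two volunteers
Y18109_BLOCK = {f"Y{n}" for n in range(18109, 18119)}  # Y18109 ... Y18118


def split_block(block, new_tester_snps):
    """Split a terminal SNP block when a new tester carries only part of it.

    Returns (SNPs now above the new branching point, SNPs left in the old terminal block).
    """
    shared = block & new_tester_snps
    return shared, block - shared


# Hypothetical new tester who carries only four of the ten block SNPs
upstream, downstream = split_block(Y18109_BLOCK, {"Y18109", "Y18110", "Y18111", "Y18112"})
print(sorted(upstream), sorted(downstream))
```

If the new tester carries none of the block he is not on this branch at all, and if he carries all of it the block is not split; it only dwindles when he carries part of it.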

Unique (Personal) SNPs
In addition to the 10 SNPs that the two volunteers share in common with each other (i.e. the Y18109 SNP Block), they each possess SNPs that the other does not have. In other words, they have their own unique, personal or "private" SNPs that no one else in the world has (currently). No doubt if we were to test other members of GF1 for these "private" SNPs we would find that some of these unique SNPs would no longer be unique anymore - they would be shared with other members of the Spearin group - thus dwindling the number of unique SNPs possessed by any given individual.

YFULL reports that Member YF04316 (H1223) has 16 "Novel SNPs" with 3 SNPs characterised as Best Quality, 1 as Acceptable Quality, and 11 as Ambiguous Quality. These are illustrated in the diagram below. In contrast, member YF04104 (164729) has 51 Novel SNPs but none are of best quality, 1 is of acceptable quality, 49 are of ambiguous quality, and 1 is of low quality (these are not shown here because they take up too much space). But what do they mean by quality?

The quality of a SNP is a reflection of how confident the company is about declaring it to be a true positive SNP and not a false positive finding. There are various reasons why the test might throw up a false positive result and we don't need to go into the details here, but it is simply important to remember that some results may be false positives and it is best to focus on the SNPs that the company is most confident about (i.e. the best quality SNPs).

Unique SNPs (currently) possessed by member YF04316

If more people from GF1 tested, we would probably find that some of the 16 Novel SNPs of Member YF04316 (H1223) would turn up in the results of some of the new people, and would no longer be "private" or unique - they would be shared by other members in the group. And this might even result in one or several more branches being formed.

So, in a similar way to how the shared SNPs in the current GF1 terminal SNP block will split up as our genetic neighbours get tested, the SNPs unique to H1223 will also gradually disappear as more people test. For example, if everyone from GF1 were to do the Big Y test, a lot of H1223's unique SNPs would turn out to be shared by other members of GF1 (and thus would no longer be unique). This could be useful when building a Mutation History Tree (discussed in a subsequent blog post), but we could probably achieve this with the existing STR data instead, so there is no burning need for more people in GF1 to do the Big Y test.

Comparison between FTDNA Analysis & YFULL Analysis
We have looked at the YFULL reanalysis of the Big Y data. Now we are going to compare it to the Big Y data analysis originally performed by FTDNA to see if (and where) there are similarities and differences.

The FTDNA results report that our two volunteers match on 24,165 known SNPs and differ on 2 known SNPs, namely YSC0000155 and PF3643. In fact it is member 164729 who appears to be lacking these SNPs - H1223 appears to have them both. This is very surprising: we expect our two volunteers to share a common ancestor some time in the 1600s, so they should be very closely related, with no major differences in the SNPs they share.

Furthermore, these two SNPs in question are nowhere to be found on either the FTDNA Haplotree or the ISOGG Haplotree. I found YSC0000155 on YBrowse and it was discovered in a Haplogroup J-L147 person but there is no further information available on this SNP. Similarly, PF3643 was discovered in 2011 and possibly belongs in Haplogroup I. The I-M223 Yahoo Discussion Group notes that PF3643 turns up in some but not all I-M223 people and that "some people's Big Y test did not record a result for PF3643. However, there is enough data to show that Z79+ people must have had a back mutation from derived C back to ancestral A." So it is difficult to judge whether these SNPs are relevant to our own particular Spearin sub-branch of the human evolutionary tree. I suspect that these particular SNPs may be quite far upstream from where we currently sit and are of no particular relevance to the conversation that follows. But I could be wrong.

Alternatively, given the nature of NGS (Next Generation Sequencing) tests like the Big Y, it is possible that the test simply failed to detect these two SNPs in member 164729 this time around and that they are in fact present after all. If we were to repeat the same test in the same individual, they might show up in the second test.

A big thank you to John Cleary who pointed out that you can check SNP information on YFULL if you know the SNP name. Just go to Check SNPs, enter the name of the SNP in question and click on the magnifying glass icon when it comes up.

I was able to check the YFULL website for FTDNA's mystery missing Known SNPs (YSC0000155 and PF3643) but obtained no additional useful information. I still do not know where these are placed in the haplotree. Perhaps they have not been allocated a position as yet.

Enter a SNP name to get SNP details

But the above discussion relates to "known" SNPs. Let's take a look at the "unknown" SNPs - the "Novel" SNPs.

Shared SNPs
According to the FTDNA analysis, our two volunteers have 201 "Shared Novel Variants" but when you click on the number 201, the pop-up box not only has Shared Novel Variants but also the SNPs unique to each of the two individuals. So this should not really be under the heading "Shared Novel Variants" as it also includes "unique" variants that are not shared with anyone. A relatively minor criticism, but potentially confusing.

FTDNA's Big Y results page for H1223 - 201 "Shared Novel Variants" with 164729

There are 3 tabs in the Shared Novel Variants pop-up box, showing 156 "Shared" SNPs, 45 SNPs "unique" to H1223, and 13 "unique" to 164729 ... and that adds up to a total of 214 ... so where does the 201 come from?? 156 + 45 is 201 ... so did they forget the other 13 SNPs? The numbers for nearby neighbours (155 & 190) also do not add up correctly. This is not potentially confusing - it is confusing.

Pop-up box with 3 tabs showing Shared SNPs & unique SNPs

Apart from the confusion over the term "Shared" and the actual number of SNPs detected, there are several further sources of confusion.

Firstly, the definition of the term "Novel" in the phrase "Shared Novel Variant". Novel is supposed to refer to SNPs that have never been discovered before. But ... before when? The definition of Novel varies between companies so what is novel to FTDNA may not be considered novel to YFULL. And vice versa. Furthermore, presumably anything "novel" has a time-limit, after which it becomes classified as "known" ... but no one knows when this time-limit expires. And this may also differ among companies ... one man's "cutting edge" may be another's "yesterday's news". There is no standardisation. So caution is necessary when interpreting these results and comparing them between companies. There will be differences in how companies report the same data.

Here's another source of confusion. FTDNA reports 156 Shared SNPs whereas YFULL does not give this actual number - it places the two individuals together on the YFULL tree sharing 10 SNPs in their shared Terminal SNP Block (Y18109), 5 SNPs shared at the branching point above that (Y17535 branch), and possibly 7 SNPs on the branching point above that (L1198 branch). So, where on the tree are these 156 shared SNPs that FTDNA says the two volunteers share? Do they go right back up the tree, back to "genetic Adam"?

And this is also where we encounter our next problem - FTDNA do not report SNP names, only SNP positions. This makes it difficult to identify SNPs and compare results between companies - some people use SNP names for identification, other people use SNP positions. In order to find out the SNP names (and thereafter ascertain where on the tree they sit), we have to enter every SNP position into YBrowse to see if there is a corresponding name (or several corresponding names). That's 156 SNP positions!! What a palaver!

Below is a screenshot of ISOGG's YBrowse utility. By entering the position in the search box, you can find out whether there are any SNPs at that particular position on the Y chromosome. You have to enter the position in the format shown. The example below is for position 7,321,330 and there are (apparently) 4 different SNP names at this particular position. This initially suggests that they are all equivalent SNPs (i.e. same SNP, different names) but further examination of the Details for each of the 4 SNPs reveals that there is a contradictory direction of mutation - was it from C to A (SNPs 1,3,4), or from A to C (SNP 2)? Which came first? The chicken (C) or the Egg (A)? [Note: allele-anc refers to the ancestral value (i.e. the original or reference value) and allele-der refers to the derived or mutated value.]

YBrowse reveals there are 4 SNPs at position 7,321,330

Details of the 4 SNPs with contradictory directions of mutation

A further point of confusion is that this particular SNP is found in several Haplogroups, namely R, O & Q, whereas we know the Spearins are in Haplogroup I. So ... what does this mean? This does not look like a SNP that is uniquely shared by just our two volunteers. It appears to be shared by a host of other people, including people in other haplogroups. In which case, there is really not much point in me trying to identify all 156 "shared SNPs" that FTDNA says our two volunteers have in common.

I stopped after five!
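Rather than typing 156 positions into YBrowse one at a time, one option is to look them up in bulk against a downloaded table of SNPs. The Python sketch below assumes a CSV file with columns named `name`, `position`, `allele_anc` and `allele_der` (hypothetical column names; check the headers of whatever file you actually obtain), and the SNP names and second position in the sample data are placeholders. Besides mapping positions to names, it flags positions where the listed SNPs do not agree on the ancestral and derived alleles, as with the four SNPs at 7,321,330.

```python
import csv
import io
from collections import defaultdict


def index_by_position(csv_file):
    """Group SNP records by position: {position: [(name, ancestral, derived), ...]}."""
    index = defaultdict(list)
    for row in csv.DictReader(csv_file):
        index[int(row["position"])].append(
            (row["name"], row["allele_anc"], row["allele_der"]))
    return index


def contradictory(records):
    """True if the SNPs at one position do not all list the same ancestral/derived pair."""
    return len({(anc, der) for _, anc, der in records}) > 1


# Placeholder data mirroring the 7,321,330 example: three C->A entries and one A->C entry
sample = io.StringIO(
    "name,position,allele_anc,allele_der\n"
    "SNP_1,7321330,C,A\n"
    "SNP_2,7321330,A,C\n"
    "SNP_3,7321330,C,A\n"
    "SNP_4,7321330,C,A\n"
)
index = index_by_position(sample)

for pos in [7321330, 1234567]:  # positions from an FTDNA list; the second is made up
    records = index.get(pos, [])
    names = [name for name, _, _ in records] or ["(no named SNP)"]
    print(pos, names, "CONTRADICTORY" if contradictory(records) else "")
```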

What about the 10 SNPs shared between our two volunteers (the so-called Y18109 block) on the YFULL tree? Are these included in FTDNA's list of 156 shared SNPs? And what about the shared SNPs further upstream at branching points (Y17535, L1198, etc)? Are these also in the FTDNA list of shared SNPs?

Well, it was possible to use YBrowse to identify the positions for each of the SNPs on the YFULL tree. And then compare these positions to the FTDNA list to see if they appeared there. Here's what was found:
  • all 7 SNPs in the L1198 block are missing from FTDNA's Shared Novel Variants list ... but this could be because they are relatively well-established "upstream" SNPs and therefore do not meet the criteria for "Novel"
  • 3 of the 5 SNPs in the Y17535 block are present in FTDNA's list but 2 are missing (see diagram below) ... however, one of them (Y17491) turns up in FTDNA's list of unique SNPs for H1223 (YF04316)! It seems this particular SNP was recognised as a unique SNP by FTDNA but as a shared SNP by YFULL. So who is "right"?
  • 6 of the 10 SNPs in the Y18109 block are present in FTDNA's list but 4 are missing (Y18109, -10, -16, & -18) ... and again, 2 of them turn up in FTDNA's list of unique SNPs for H1223. These SNPs are identified as unique by FTDNA but shared by YFULL.  So who do we believe?
The fact that the Y18109 SNP is missing from FTDNA's Shared SNP list is highly confusing because FTDNA have assigned the terminal SNP for both our volunteers as Y18109. How can they do this if it does not turn up as a shared SNP between the two volunteers??? However, it does appear in the list of SNPs tested for each of our volunteers on their Haplotree & SNPs page on the FTDNA website. And when I download the SNPs from each volunteer into a csv file, there it is, Y18109, in both files, and derived from the Big Y test! Why then does it not turn up in the Shared Novel Variants list? Perhaps it is classified as a "known" SNP, and that's why it turns up in the downloaded csv file but the others do not? But that still does not explain the absence of the other 3 missing SNPs from our terminal Y18109 SNP block.

It's a conundrum. A quandary. A mystery.

A portion of my spreadsheet with the 156 Shared Novel Variants reported by FTDNA

Only 3 of the 5 SNPs in the Y17535 SNP Block appear on FTDNA's Shared Novel Variants list

So FTDNA do not identify all the shared SNPs identified by YFULL. Possibly because the two companies have different thresholds / criteria for declaring a SNP to be present.
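The check described above for the Y18109 block (matching YFULL's SNP names against FTDNA's list once the positions have been translated into names via YBrowse) is a set comparison. In this Python sketch only the six block SNPs reported as present are included in the FTDNA set; in practice that set would hold every name recovered from the 156 positions. The variable and function names are illustrative.

```python
# SNP names in the Y18109 block on the YFULL tree (Y18109 ... Y18118)
yfull_y18109_block = {f"Y{n}" for n in range(18109, 18119)}

# Block SNPs found among the names recovered from FTDNA's Shared Novel Variants list
ftdna_shared_names = {"Y18111", "Y18112", "Y18113", "Y18114", "Y18115", "Y18117"}


def compare_block(block, reported):
    """Return (found, missing, coverage fraction) for one SNP block."""
    found = block & reported
    missing = block - reported
    return found, missing, len(found) / len(block)


found, missing, coverage = compare_block(yfull_y18109_block, ftdna_shared_names)
print(sorted(missing), f"{coverage:.0%}")
```

The same comparison can be repeated for the Y17535 and L1198 blocks once their SNP names are in hand.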

But it points to a major lack of consistency between the YFULL analysis and the FTDNA analysis. And this naturally will raise concerns in people's minds about the accuracy of the data. Who got it right? Maybe both companies did. Maybe the differences are all down to the different criteria employed by each company for declaring a SNP. Or maybe not. Which analysis do you believe? Which is more reliable?

And what about the rest of the 156 Shared SNPs? Only 9 SNPs relate to the 3 branches of the YFULL tree discussed above - where do the other 147 fit in? Are they further upstream? It would be much more helpful if FTDNA simply reported the SNPs shared uniquely by Person A and Person B and no one else.

So, thus far, the analysis of FTDNA's 156 Shared SNPs has not been very helpful at all. Maybe we'll have better luck with the unique SNPs?

Unique SNPs
FTDNA reports that H1223 (YF04316) has 45 unique SNPs (i.e. not shared with 164729 / YF04104) and similarly 164729 (YF04104) has 13 unique SNPs (i.e. not shared with H1223 / YF04316). This differs considerably from the 16 and 51 unique SNPs reported by YFULL above. 

Unique SNPs reported by each company
But once again, the different companies have different criteria for declaring a SNP and this affects the results. If we take a closer look at the reporting criteria, FTDNA describe their "confidence" in the SNP as high, medium or unknown. In contrast, YFULL describes the "quality" of the SNP as best, acceptable, ambiguous, or low. Neither set of criteria is right or wrong - they are merely different approaches.

And when we compare the two sets of unique SNPs, there is only agreement between FTDNA and YFULL with regard to 2 unique SNPs for member H1223 (YF04316) and 1 unique SNP for member 164729 (YF04104). These are illustrated in the diagrams at the end of this post. 

  • Note that for H1223, none of YFULL's "Ambiguous quality" SNPs are reported by FTDNA. And similarly, only 2 of FTDNA's "high confidence" SNPs are reported by YFULL. There are 3 "Best Quality" SNPs from YFULL (green and yellow highlight) but only 2 of these (yellow highlight) are declared by FTDNA.
  • For 164729 (YF04104), only 1 unique SNP is declared by both companies (yellow highlight). This is deemed to be of "high confidence" by FTDNA and "acceptable quality" by YFULL.

Therefore, in terms of consistency or agreement between the two companies, the vast majority of unique SNPs declared by one company are not declared by the other. In terms of percentages this works out as: 2/45 (4.4%) and 1/13 (7.7%) agreement for FTDNA; and 2/16 (12.5%) and 1/51 (2%) for YFULL. This gives an average consistency score of a mere 6.7%. Or to put it another way, the companies will disagree 93.3% of the time.
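The agreement figures above can be recomputed as follows. Averaging the four unrounded ratios gives about 6.65%, close to the 6.7% quoted above (which comes from averaging the already-rounded percentages). The variable names are illustrative.

```python
# (unique SNPs declared by both companies, unique SNPs declared by this company)
agreement = {
    ("FTDNA", "H1223"): (2, 45),
    ("FTDNA", "164729"): (1, 13),
    ("YFULL", "H1223"): (2, 16),
    ("YFULL", "164729"): (1, 51),
}

rates = {key: agreed / total for key, (agreed, total) in agreement.items()}
for key, rate in rates.items():
    print(key, f"{rate:.1%}")

# Simple mean of the four rates, as in the post
mean_rate = sum(rates.values()) / len(rates)
# Alternative: pool all counts, which weights each list by its size
pooled_rate = (sum(a for a, _ in agreement.values())
               / sum(t for _, t in agreement.values()))
print(f"mean of the four rates: {mean_rate:.2%}")
print(f"pooled rate: {pooled_rate:.1%}")
```

Pooling all four counts instead (6 jointly declared SNPs out of 125) gives 4.8%. Either way, the level of agreement is low.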

So, even though we have a huge amount of information from both analyses, there are major differences between the two companies and what they put in their reports. The amount of inconsistency is quite astounding and highlights the need for caution in interpreting these reports.

To resolve these inconsistencies in reporting, we have to delve deeper into the data itself. And that means exploring the vcf files, bed files, and BAM files that contain the fine details of our DNA results (not accessible to Project Administrators without the express permission of the project members concerned). This is not a job for the faint-hearted and involves many hours of review and analysis. It is not a task that most Surname Project Administrators would embrace, and personally, I leave this type of analysis to the experts - the Haplogroup Project Administrators. This highlights the need for a close collaboration with people like Wayne Roberts and Aaron Salles Torres who are administrators of the I-M223 project. They have an overview of much more data than any Surname Project Administrator, and can potentially see patterns that would be easily missed by someone looking at a mere subset of the data.

Despite all the above caveats, we have actually learnt quite a lot from SNP testing. Both interpretations of the SNP data (by FTDNA and YFULL) place us in more or less the same position on their respective haplotrees. They both assign the same terminal SNP (Y18109). And there is some (minor) agreement on what are likely to be unique SNPs for each individual.

This entire exercise has been very useful in highlighting the fact that there is no standardisation currently in the way that the data from the Big Y test is analysed and interpreted. The same applies to other NGS tests, such as those offered by FGC (Full Genomes Corporation). And this is no surprise. We have to bear in mind that we are on the crest of the wave of scientific discovery here. We are the first explorers in a brave new world. As a community, it will take time for us to take in what we are seeing, analyse it, make sense of it, and arrive at a consensus regarding the best way to interpret and present the data. As Humphrey Bogart said to Claude Rains, this is the beginning of a beautiful friendship.

In the next post we will be looking at a topic that is (perhaps) a little bit more straightforward: TMRCA estimates - the Time to the Most Recent Common Ancestor.

Maurice Gleeson
April 2016

Unique SNPs for member H1223 - only 2 SNPs were jointly declared by both companies

Unique SNPs for member 164729 - only 1 SNP was jointly declared by both companies

Update 4 August 2016
I received this helpful comment from the I-M223 Yahoo Discussion Group:
Regarding the numbers reported in the Shared Novel Variants pop-up boxes I can offer the following. The first tab in a Shared Novel Variants pop-up box is shown in the attached figure. It states that there are 157 shared entries. Notice the position 14263127 is ancestral, i.e. G-G, and should not be in the list. There is one other like that so in reality there are 155 shared entries. In this case the mystery number is 200 leaving 45 that I cannot account for.
I reconciled the above against novel variants listed in the data exported as CSV files. Those same two bogus entries are present. The other kit in the comparison has 18 such bogus entries. After elimination of the bogus entries one kit has 28 novel variants not shared and the other has 26. These agree with the numbers reported as not shared in the other tabs of the pop-up box.

So from this it appears that there is a bug in the FTDNA system, but this does not account for the discrepancies previously noted.
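The clean-up described in the update, dropping entries where the call is simply the ancestral value (such as the G-G entry at position 14263127) before counting, can be sketched in Python like this. The record layout is an assumption rather than FTDNA's export format, and apart from position 14263127 the entries are placeholders, not real results.

```python
# Hypothetical exported entries: (position, ancestral allele, called allele)
entries = [
    (14263127, "G", "G"),  # the ancestral G-G entry mentioned in the update
    (1000001, "C", "T"),   # placeholder derived call
    (1000002, "A", "G"),   # placeholder derived call
    (1000003, "T", "T"),   # placeholder ancestral call
]


def genuine_variants(rows):
    """Keep only entries where the called allele differs from the ancestral allele."""
    return [row for row in rows if row[1] != row[2]]


clean = genuine_variants(entries)
print(f"{len(entries)} listed, {len(entries) - len(clean)} bogus, {len(clean)} genuine")
```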