Modifying characters in phylogenetic studies: Simoes et al. 2016

This blog post will hold a special interest
for those who do not like the character list of the large reptile tree (LRT). Simoes et al. 2016 attempt to show that large studies, even those created by universally respected and dedicated PhDs (Gauthier et al. 2012 and Conrad 2008), may not be “of the highest quality.” They report, “Our results urge caution against certain types of character choices and constructions.”

Nice to know someone else out there
is also testing cladograms with critical insights. But, as you’ll see, the Simoes corrections, no matter how praiseworthy, well-intentioned and insightful, do not solve several problems.

At least one of the two tested analyses HAS to be of poor quality,
because the prior two analyses (Gauthier et al. 2012 and Conrad 2008) do not agree with one another (see below) in major and minor ways. When the LRT is introduced as a third candidate, now at least two are of poor quality, because the LRT provides yet a third topology. Which one best reflects actual evolutionary events? Or are all three ‘poor’?

Simoes et al. 2016
modified two competing scleroglossan studies (Gauthier et al. 2012, Conrad 2008) by culling ‘poor’ characters while keeping the original ordering of remaining character states and then by making all character states unordered. They report, “the concern for size is usually not followed by an equivalent, if any, concern for character construction/selection criteria. Problematic character constructions inhibit the capacity of phylogenetic analyses to recover meaningful homology hypotheses and thus accurate clade structures.” 
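To make the ordered/unordered distinction concrete, here is a minimal Python sketch. It is an illustration only, not the procedure used in any of the studies discussed. It scores a single character on a fixed tree with the Sankoff algorithm, using a cost of one step for any change (unordered, Fitch-style) or a cost equal to the distance between states (ordered, Wagner-style). The tree, taxon names and scores are hypothetical.

```python
"""Tree length of one character under ordered vs. unordered step costs.

Minimal Sankoff-style sketch. The tree, taxa and scores are hypothetical
and are not taken from Conrad (2008), Gauthier et al. (2012) or the LRT.
"""
from math import inf


def step_cost(a, b, ordered):
    """Cost of a change from state a to state b."""
    if a == b:
        return 0
    # Ordered (Wagner/additive): 0 -> 2 must pass through 1, so cost is |a - b|.
    # Unordered (Fitch): any change between two states costs one step.
    return abs(a - b) if ordered else 1


def node_costs(tree, scores, n_states, ordered):
    """Minimum cost of the subtree for each possible state at its root."""
    if isinstance(tree, str):
        # Leaf: only the scored state is allowed (no missing data handled here).
        return [0 if s == scores[tree] else inf for s in range(n_states)]
    child_costs = [node_costs(child, scores, n_states, ordered) for child in tree]
    return [
        sum(min(step_cost(s, t, ordered) + cc[t] for t in range(n_states))
            for cc in child_costs)
        for s in range(n_states)
    ]


def tree_length(tree, scores, n_states, ordered):
    """Fewest steps this character needs on this tree."""
    return min(node_costs(tree, scores, n_states, ordered))


# Hypothetical four-taxon tree ((A,B),(C,D)) and a three-state character.
tree = (("A", "B"), ("C", "D"))
scores = {"A": 0, "B": 0, "C": 2, "D": 2}
print(tree_length(tree, scores, 3, ordered=False))  # 1
print(tree_length(tree, scores, 3, ordered=True))   # 2
```

Unordered, the change from state 0 to state 2 costs one step; ordered, it has to pass through state 1 and costs two. Summed over hundreds of characters, differences like this in step counts are how ordering decisions can shift which tree is shortest.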

This has been a frequent criticism
of the large reptile tree, despite the fact that it continues to grow organically (with no cuts and grafts over the past several years) with additional taxa that all continue to resemble one another, and despite the fact that it is developed by someone who is learning as he goes, with no a priori expertise or even knowledge of every new clade added to the LRT.

Simoes et al. 2016 found in the Gauthier et al. and the Conrad studies
“more than one-third of the almost 1000 characters analysed were classified within at least one of our categories of ‘types’ of characters that should be avoided in cladistic investigations. These characters were removed or recoded, and the data matrices re-analysed, resulting in substantial changes in the sister group relationships for squamates, as compared to the original studies.”

Note that the Simoes team did not,
apparently, attempt to reexamine problematic taxa and rescore any errors they might have found. In the construction of the LRT, scoring errors are corrected constantly.

Simoes et al. 2016 conclude:
“The modified versions of Conrad’s (2008) and Gauthier et al.’s (2012) matrices do not provide revised phylogenetic hypotheses that we claim to be “fixed” or “superior” versions of the same—that would also require a re-analysis of the scorings performed for all terminal taxa that are well beyond the goals of this study. In addition, these results still reflect the original authors’ notions of primary homologies for many characters. Our main goal was to identify general problems with character conceptualizations and constructions for morphological characters for all morphological data sets, and then to identify these problematic characters within our area of expertise, specifically studies of squamate phylogeny. The results of this study provide a different perspective of squamate relationships and indicate how specific issues with character construction may deeply affect our current notion of the squamate tree of life.”

No word yet on what Gauthier et al. and Conrad have to say
about the criticism and changes to their matrices and tree topologies.

Four basic rules from Simoes et al. 
“We have identified four basic operational rules for the construction of characters, and accurate coding and scoring, but note there may well be more:

  1. utilization of as many similarity sub-criteria as possible in order to create characters that are more likely to reflect similarity due to recency of common ancestry;
  2. avoidance of logically inconsistent character construction, such as logically dependent characters, exemplified by our character type series I A;
  3. take into consideration previous studies suggesting possible biological dependency/independency among distinct morphological attributes used as characters; 
  4. acknowledge that continuous variation is widespread in nature and that such data must be treated as such. In the case of phylogenetic analyses, measurement characters must not be treated as discrete when there is a continuous range of variation.

When there is evidence for a disjoint distribution of data, and authors wish to treat them as discrete, a clear statement must be made supporting the disjoint nature of that data.”

These are good ideals to strive for.
The problem of related traits, such as a longer vertebral column and short, underdeveloped limbs, will always be with us. On the other hand, continuous variation sometimes forces a personal choice when judging taxa that sit on the margin between one state and another. Character construction is not perfect and never will be. Neither will scoring. But we can still strive for those ideals, to a point. At some stage, all thinking has to stop and the SEND button must be pressed to upload the data and results to an editor or to the public.

The tree figures provided by Simoes et al. 2016 were color-coded for simplicity. Unfortunately, neither study includes taxa published after 2012. For their time, both the 2008 and 2012 studies were laudable efforts, but with the LRT, things have changed. Neither study recognized the Tritosauria or the Protosquamata, although both correctly nest tritosaurs outside crown-group Squamata. Some protosquamates, like Dalinghosaurus, nested within derived clades by default.

Results: Gauthier et al. 2012
Both revisions retain snakes and amphisbaenids as sister taxa, and retain highly derived burrowing snakes that open the jaws laterally as basal taxa. The modified and unordered trees correctly nest pro-snakes closer to snakes, but both fail to separate them from mosasaurs, which should arise from varanids. The unordered tree correctly moves geckos closer to snakes, but not close enough. Eichstaettisaurus incorrectly moves further from geckos. Legless pygopodid geckos move to the base of legless amphisbaenids + snakes and legged pro-snakes + mosasaurs. This is where reconstructions would help workers see the red flags.

Results: Conrad 2008
Geckos did not shift when this dataset was modified and unordered. All versions of the Conrad study retain the amphisbaenid–snake relationship, which is not recovered in the LRT. The clades Scincomorpha and Anguimorpha disappeared. The clade Diploglossa appeared in the modified version. Anguimorpha reappeared in the unordered version.

Conrad 2008 vs. Gauthier et al. 2012
These two studies did not agree with one another, despite their authors having firsthand access to most of the taxa, extensive character and taxon lists, and PhDs.

  1. Conrad nested Eichstaettisaurus at the base of the Squamata. Gauthier did not.
  2. Conrad nested geckos as basal squamates. Gauthier did not.
  3. Conrad nested skinks and snakes next. Gauthier did not.
  4. Conrad nested mosasaurs as highly derived. Gauthier did not.
  5. And there are a dozen+ other differences.
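The list above tallies disagreements by eye. One standard way to put a number on them is the Robinson-Foulds distance: the count of clades present in one tree but absent from the other. Here is a minimal sketch for rooted trees written as nested Python tuples; the two example trees are hypothetical placeholders, not the Conrad or Gauthier topologies.

```python
"""Robinson-Foulds (symmetric difference) distance between two rooted trees.

Minimal sketch; trees are nested tuples of taxon names. The example trees
are hypothetical and do not reproduce any published topology.
"""


def clusters(tree):
    """Return (leaf set, list of leaf sets for every internal node)."""
    if isinstance(tree, str):
        return frozenset([tree]), []  # single taxa are trivial, not counted
    leaves = frozenset()
    found = []
    for child in tree:
        child_leaves, child_clusters = clusters(child)
        leaves |= child_leaves
        found.extend(child_clusters)
    found.append(leaves)
    return leaves, found


def robinson_foulds(tree_a, tree_b):
    """Number of clades found in exactly one of the two trees."""
    leaves_a, ca = clusters(tree_a)
    leaves_b, cb = clusters(tree_b)
    if leaves_a != leaves_b:
        raise ValueError("trees must contain the same taxa")
    # Drop the root clade (all taxa), which every tree shares trivially.
    sa = set(ca) - {leaves_a}
    sb = set(cb) - {leaves_b}
    return len(sa ^ sb)


tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))
print(robinson_foulds(tree_1, tree_2))  # 4
print(robinson_foulds(tree_1, tree_1))  # 0
```

A distance of zero means identical topologies. The number only measures how much two trees disagree; it says nothing about which one better reflects evolutionary events.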

So, which one of these is valid?
If one is valid, the other is not (it does not echo evolutionary events). The LRT indicates that both have problems, because it presents a third topology based on traits that apply not only to lizards but to reptiles in general. Similarities appear within all major clades. Differences appear between all major clades. Since all three studies are based on genera, one wonders how such differences arise.

And what happens when ALL the changes are made by Simoes et al. 2016?

  1. The Conrad and Gauthier studies do not look more like each other after the changes.
  2. Gauthier nests geckos as more derived, with Sineoamphisbaena, apart from other amphisbaenids but closer to the pro-snakes (still not allied with Eichstaettisaurus or snakes) and mosasaurs (still not allied with varanids).
  3. Conrad’s major squamate clades don’t change much, but genera change sisters quite a bit. At all stages Conrad allies varanids with mosasaurs, but it is not clear whether that includes Aigialosaurus, Pontosaurus and Adriosaurus, which all nest with mosasaurs in the Gauthier studies; in the LRT the last two nest apart from mosasaurs, with snakes.

Concluding remarks

Even with the best minds, the best characters and firsthand access to data, Conrad 2008 and Gauthier et al. 2012 could not reach accord, even with the help of Simoes et al. 2016. And the LRT provides yet a third tree topology for squamates that takes into account the nesting of protosquamates and tritosaurs, something prior workers were unaware of given their limited gamuts and paradigms. Simoes et al. were correct in unordering character states, but that did not improve their trees. The LRT is unordered because ordering makes a priori assumptions that may not be valid.

It is apparent that Conrad, the Gauthier team and the Simoes team trusted their numbers because they followed a ‘plug and go’ philosophy, lacking the critical reinspection of every relationship to make sure all sister taxa looked alike, did not quickly redevelop lost bones, or reverse the order of evolution (going from exotic and highly derived to simple and plesiomorphic). All taxa were reconstructed in the LRT and that makes for great ease in re-inspecting scores and traits. In the last four years several squamates unavailable to prior workers, like Tetrapodophis, have clarified relationships in the LRT.

Large studies that load lots of taxa and characters together and then push the start button don’t have the benefit of making sure every additional taxon fits and continues to make sense. Neither the Conrad nor the Gauthier originals, nor their Simoes modifications, became fully resolved like the LRT. In large studies such as these, partial taxa should be included only if parsimony-informative traits are preserved. Otherwise you blur the big picture.
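Under unordered, equally weighted parsimony, a character is parsimony-informative when at least two of its states each occur in at least two taxa; other characters add the same number of steps to every tree. The sketch below is a hypothetical illustration, with made-up taxa and '?' or '-' assumed as missing-data symbols. It flags the informative characters in a small matrix and counts how many of them a fragmentary taxon actually has scored.

```python
"""Flag parsimony-informative characters and check what a partial taxon preserves.

Hypothetical sketch: the matrix, taxon names and missing-data symbols
are illustrative assumptions, not data from the LRT or the studies discussed.
"""
from collections import Counter

MISSING = {"?", "-"}


def is_informative(column):
    """True if at least two states each occur in at least two taxa."""
    counts = Counter(state for state in column if state not in MISSING)
    return sum(1 for n in counts.values() if n >= 2) >= 2


def informative_characters(matrix):
    """Indices of informative characters in a {taxon: score string} matrix."""
    taxa = list(matrix)
    n_chars = len(matrix[taxa[0]])
    return [i for i in range(n_chars)
            if is_informative([matrix[t][i] for t in taxa])]


def informative_scored(matrix, taxon):
    """How many informative characters this taxon actually has scored."""
    return sum(1 for i in informative_characters(matrix)
               if matrix[taxon][i] not in MISSING)


matrix = {
    "A": "0010",
    "B": "0011",
    "C": "1100",
    "D": "1?00",
    "E": "??1?",  # hypothetical fragmentary taxon
}
print(informative_characters(matrix))  # [0, 2]
print(informative_scored(matrix, "E"))  # 1
```

Where to set the cutoff for including a partial taxon remains a judgment call; the sketch only makes the count explicit.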

One of the strengths of the LRT is that it grew slowly from a few taxa to many. Just like an imperfect child, it had and continues to have imperfections, yet it also continues to deliver new insights into reptile interrelationships that can be read, appreciated, confirmed and/or refuted by others. At present it is the only voice raised in heresy to all the traditional paradigms that cannot be validated, are poorly resolved and can be readily modified by others.

I don’t expect ANYONE to use my character list. No PhD in his/her right mind will ever use it. And we all know that. It would be like adopting an older child. It’s not yours, you didn’t raise it and you have to adapt your thinking to understand it. Better to grow your own analysis, like I did.

On the other hand, I DO hope and encourage others to use various subsets of the taxon list that the LRT recovers. It’s just a list of genera and specimens. No controversy there. Add my sisters to your trees and see where they take you. So far, several PhDs have done so with success and that’s great. Hopefully others will follow.

The taxa are flawless. The characters and scoring will always be flawed to some degree. That’s the world we all live in and paleontology will always have to deal with sometimes crumby (literally crumby) data.

Conrad JL 2008. Phylogeny and systematics of Squamata (Reptilia) based on morphology. Bulletin of the American Museum of Natural History 310: 1–182.
Gauthier JA, Kearney M, Maisano JA, Rieppel O and Behlke ADB 2012. Assembling the squamate tree of life: Perspectives from the phenotype and the fossil record. Bulletin of the Peabody Museum of Natural History 53: 3–308.
Simoes TR, Caldwell MW, Palci A and Nydam RL 2016. Giant taxon-character matrices: quality of character constructions remains critical regardless of size. Cladistics 1–22. doi: 10.1111/cla.12163.

Thanks to Dr. Neil Brocklehurst
for bringing this paper to my attention. I’m sure his intention in doing so was not satisfied.


3 thoughts on “Modifying characters in phylogenetic studies: Simoes et al. 2016”

  1. You are right, my intention in bringing this paper to your notice was not satisfied. I seriously can’t believe I have to post this but apparently I do.

    The Simoes paper was not about solving squamate phylogeny. It was not about bringing agreement between conflicting topologies. It was not about updating taxon lists and rescoring errors.

    It was about one thing and one thing only: demonstrating that when you re-format badly-constructed characters, the results are affected!!! The results themselves are not the point; the fact that there were changes at all is proof of what I’ve been telling you.

    The fact that the results change at all is the point of interest. This is the point I was trying to make: formatting your characters properly, so they obey the basic principles of parsimony, will affect the results. That is the relevant result of this paper that you persist in ignoring, for no other reason than you are too lazy to make changes to your character list.

    Ah well, one good thing has come out of this exercise: I now know that, even when I give you a study explicitly stating that you have to format your characters correctly, explaining clearly why, and demonstrating it affects the results, you still won’t do it. Shows how interested in doing decent science you are.

  2. Tsk, tsk, Neil. You have given me a large assignment and you want it done by the time you clap your hands.

    To today’s point: Simoes et al. demonstrated that results are affected, yes. That is so obvious, and was reported by me, that it does not bear repeating here. And yet you think I missed it.

    Dig deeper, which you have not yet done, and realize they solved no phylogenetic problems by their so-called improvements. They did not bring the two studies into harmony with one another. They did not show which was the superior or inferior study. They didn’t do anything but further stir the pot, creating clades where they still should not exist, bringing sisters together that do not nest together in the third study, the LRT. So, were those indeed improvements made by Simoes et al.? Maybe not. Maybe they are just muddling about. A phylogenetic analysis is a blueprint for actual evolutionary events. My blueprint provides those who use it a guide that solves problems, in part and in toto, as is. The LRT works. If the taxon order had a problem, I’m sure someone would let me know. Improvements, aka window dressing, will come to the character list, but give me time.

    Take the emotion out. Put your thinking cap on. Examine each node to see where there may be problems. Let me know where you see problems in the taxon order. I’ll get to the character list in time.
