The Question of Correlated Characters

Earlier, and on the Dinosaur Mailing List, David Marjanovic and Mickey Mortimer dismissed the phylogenetic analysis that produced the large reptile tree and the other trees (pterosaur, therapsid) because they include purportedly correlated characters. I think this is short-sighted.

I’m here to tell you, it’s damn hard to avoid including correlated characters. For instance:

1. A long dorsal series of vertebrae is correlated to short to absent limbs.

2. A long canine tooth is correlated to a tall maxilla (that’s where the root is found) and often correlated to a deep dentary to protect it.

3. A large orbit is often correlated to a short rostrum, especially in taxa of relatively small overall size.

4. Unsharp, unconical teeth (they come in many shapes and sizes) are correlated to a broad or deep ribcage.

5. Wings of two sorts are correlated to a strap-like scapula and an elongated, locked-down coracoid.

6. A kinked tail is correlated to the development of flippers.

7. A carapace is correlated to short fingers and toes.

8. Bipedal hind limbs and simple hinge ankle joints are correlated to reduced forelimbs, except in flying tetrapods.

9. An elongated neck is typically correlated to a small skull (with exceptions, of course).

10. A thick-boned skull is typically correlated with a thick-boned pelvis and hind limbs.

I’m sure you can think of others.

Caveat: There are exceptions to everything listed above. So don’t raise a finger immediately. I’m only asking, “how can you create a list of characters that does not include a certain amount of correlation?” Or even a lot of correlation? Evolution follows certain patterns. A dorsal fin will often appear in marine taxa. That happens. Correlation is everywhere.

Convergence in the Large Reptile Tree
The Consistency Index (CI) is a number that can be recovered in PAUP and it represents the amount of convergence in the matrix. In the large reptile tree the CI hovers near 0.1. Nearly every character finds at least two expressions somewhere on the tree. Mortimer and Marjanovic see this as a fault of the study. I see this as a fact recovered by the study. And it shows the strength of the study that the tree could separate the various convergent traits by maximizing parsimony.

Emphasizing Certain Traits
In the past, paleontologists have emphasized skull fenestrae and ankle traits in determining phylogenetic relationships. According to Mortimer and Marjanovic correlation is to be avoided because it over-emphasizes certain traits. The problem is, certain taxa are known from only a skull, so by default, skull traits are emphasized in the scoring of these taxa. Others are known from only other body parts and these are emphasized, by default. There’s nothing else you can do about it!

Like democracy, it’s not perfect, but it’s the best thing we have at present. Notably, neither Mortimer nor Marjanovic has been able to identify misnested genera within the large reptile tree without resorting to nebulous suprageneric taxa.

The large reptile tree is completely resolved and continues to be so as more taxa are added. All evolutionary pathways provide a gradual accumulation of traits, which is what we’re looking for as we attempt to model the original family tree of life.

Methods can always be faulted. There’s always a sniper in the bell tower. The results here speak for themselves. Even so, if errors are found, please bring them to my attention.

7 thoughts on “The Question of Correlated Characters”

  1. First let me say I’m glad you’re taking our critiques into account.

    One issue here is that you’re using the wrong kind of correlation. The kind phylogeneticists should avoid is strict logical correlation. So sure big eyes and short snouts are often found together, but because sometimes they’re not, we can leave in both characters. But you can’t have both “palatal teeth absent” and “vomer teeth absent”, because a taxon without palatal teeth by definition lacks vomeral teeth. The only way to make such strictly correlated characters work is by coding some taxa as inapplicable. So if you had both “less than thirty caudal vertebrae” and “less than twenty caudal vertebrae”, you could code the first normally, BUT code the second as inapplicable for any taxon with more than thirty caudals. In this way, the second character is only sampling those taxa with short tails already. It’s easier to just order one character in these cases (>30 (0); 20-30 (1); <20 (2)), but the math will work out the same using either method.
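The two coding schemes for tail length described above can be put side by side in a minimal sketch. The function names and the example caudal counts are hypothetical, and scoring a count of exactly 30 as state 1 is an assumption made so that the two schemes line up with the ordered definition (>30 (0); 20-30 (1); <20 (2)).

```python
from typing import Tuple

INAPPLICABLE = "-"  # PAUP-style symbol for an inapplicable score


def ordered_state(caudals: int) -> int:
    """Single ordered character: >30 (0); 20-30 (1); <20 (2)."""
    if caudals > 30:
        return 0
    if caudals >= 20:
        return 1
    return 2


def binary_pair(caudals: int) -> Tuple[str, str]:
    """Two binary characters; the second is scored only for short tails."""
    # First character: tail not longer than thirty caudals (assumed boundary).
    first = "0" if caudals > 30 else "1"
    if first == "0":
        # Long-tailed taxa are not sampled by the second character at all.
        return first, INAPPLICABLE
    # Second character: fewer than twenty caudals.
    second = "1" if caudals < 20 else "0"
    return first, second


# Hypothetical caudal counts, for illustration only.
for count in (45, 25, 12):
    print(count, ordered_state(count), binary_pair(count))
```

Either way, each taxon ends up in one of three tail-length classes, and no taxon is scored twice for simply having a short tail.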

    CI actually stands for Consistency Index, and your tree actually has a very low CI of 0.1040. A CI of 1.0 would mean every character evolves only once and never reverses. A CI of 0.5 means on average there is one reversal or convergence per character. So your CI means on average every character converges or reverses 10 times in your tree. And this is good, because it means you didn't design your tree into your matrix. Analyses by Sereno classically have very high CIs of ~0.95, which means he mostly included characters that support his tree and didn't really test alternatives. Which leads one to wonder why he ran the analyses in the first place. So this is a part of your analysis, that while you misunderstood, is actually very good.
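As a rough check on those numbers, here is a minimal sketch of the usual ensemble definition of the Consistency Index: the minimum number of steps the characters could require divided by the number of steps they actually take on the tree. The toy step counts are made up; only the values 0.5 and 0.1040 come from the comment above, and the helper names are hypothetical.

```python
from typing import Sequence


def consistency_index(min_steps: Sequence[int], observed_steps: Sequence[int]) -> float:
    """Ensemble CI: total minimum steps divided by total observed steps."""
    if len(min_steps) != len(observed_steps):
        raise ValueError("one minimum and one observed count per character")
    return sum(min_steps) / sum(observed_steps)


def mean_extra_changes(ci: float) -> float:
    """Average extra (homoplastic) changes per minimum required step."""
    return 1.0 / ci - 1.0


# Three hypothetical binary characters (minimum one step each)
# that take 1, 2 and 3 steps on some tree.
toy_ci = consistency_index([1, 1, 1], [1, 2, 3])
print(toy_ci)                      # 3 / 6
print(mean_extra_changes(toy_ci))  # one extra change per character on average
print(mean_extra_changes(0.1040))  # the large reptile tree's reported CI
```

Under this definition a CI of 0.1040 works out to between eight and nine extra changes for every required one, or close to ten steps per character in total, which is in line with the round figure given above.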

    By emphasizing certain traits, I didn't mean using more characters from one part of the body than other. Like you said, some taxa are fragmentary so we can only use what we have. What I meant is that you have many characters coding for the same things. Take appendicular reduction. Character 158 includes the state "interclavicle absent", but 159 includes the state "interclavicle poorly ossified or absent". Character 161 includes "scapulacoracoid poorly ossified", 163 includes "both scapula and coracoid absent", and 164 includes "scapulacoracoid absent". Character 168 includes "both humerus and femur absent", 169 includes "no humerus", 170 includes "forelimbs and hindlimbs vestigial", 173 includes "no manus, no pes", 182 includes "manus absent", 197 includes "femur less than half glenoacetabular length", 202 includes "no fibula", 203 includes "tarsus poorly ossified", 207 includes "metatarsals absent", 209 includes "no metatarsus", and 210 includes "metatarsus absent".

    Even at the most basic level, taxa that lack a metatarsus will be coded for that three times. So if you have three taxa (A, B and C), A and B which are similar in lacking an olecranon process, and B and C which are similar in lacking a metatarsus, B and C are going to be sister taxa because their single similarity was counted three times. Since so many of your characters are like this, I bet your tree has many nodes supported by repeats of the same morphology. Luckily, fixing this is pretty easy. The first rule is never have a "structure X absent/reduced" state in the same character that describes some variation in that structure. Make "structure X absent/reduced" its own character. Then when you're trying to code Boa's humerus morphology or whatever, code it inapplicable (- in PAUP). Similarly, since I don't think it's developmentally possible to have distal limb elements without proximal ones, if a taxon lacks a humerus, code it inapplicable for "manus absent".
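The three-taxon example above can be checked with a small Fitch parsimony count. This is only a sketch under stated assumptions: the outgroup "Out", scored 0 (structure present) for everything, is hypothetical and is there only to root the comparison, and the function and variable names are made up. Taxa A, B and C and their two similarities are taken from the comment.

```python
from typing import Dict, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Return (possible states at this node, steps below it) for one character."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_steps = fitch(tree[0], states)
    right_set, right_steps = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_steps + right_steps
    # No shared state: one change is needed on this pair of branches.
    return left_set | right_set, left_steps + right_steps + 1


def tree_length(tree: Tree, matrix: Dict[str, str]) -> int:
    """Total steps over all characters (unordered, equal weights)."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch(tree, {taxon: row[i] for taxon, row in matrix.items()})[1]
        for i in range(n_chars)
    )


TREES: Dict[str, Tree] = {
    "A+B": ("Out", (("A", "B"), "C")),
    "B+C": ("Out", (("B", "C"), "A")),
    "A+C": ("Out", (("A", "C"), "B")),
}

# Column 1: olecranon process absent. Remaining columns: metatarsus absent.
triplicated = {"Out": "0000", "A": "1000", "B": "1111", "C": "0111"}
deduplicated = {"Out": "00", "A": "10", "B": "11", "C": "01"}

for label, matrix in (("triplicated", triplicated), ("deduplicated", deduplicated)):
    lengths = {name: tree_length(tree, matrix) for name, tree in TREES.items()}
    print(label, lengths)
```

With the metatarsus scored three times, B+C is the shortest arrangement. Scored once, A+B and B+C are equally short, so the matrix no longer favors either pairing.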

    As for claiming I haven't been able to "identify misnested genera within the large reptile tree without resorting to nebulous suprageneric taxa", that's just a lie. I gave a whole list of dinosaurian examples in this comment. The first one was "Scelidosaurus should move up to be sister to Scutellosaurus" which doesn't even involve any suprageneric taxa. You never acknowledged my followup comment on why suprageneric taxa can be sister taxa in real life and in our cladograms. Your tree has just as much of this as anyone's. What's your sister taxon to Cephalerpeton? Reptilia, which is quite a bit bigger than a genus.

    Honestly David, if you want your reptile tree to be of any use, you have to fix how your characters are formatted. And then add characters that have been used by others to support clades not in your tree. You only included ONE saurischian character and THREE dinosaurian characters (as found by Nesbitt, 2011). Is it any wonder you have Marasuchus, silesaurids and poposaurs within Dinosauria and find Phytodinosauria instead of Saurischia? There's no shame in adding characters and redefining states, many published analyses have the same problems. But please don't make the time I've spent looking into your matrix a waste and claim your matrix is fine as is, and keep plugging in taxa as if there are no problems. I'm trying to help. And don't lie about me again either. That's just low.

  2. Thanks, Mickey. The mental typo of Convergence/Consistency has been fixed. It’s been a long time since I’ve had to use that abbreviation and I was rushing to get out the door for a date.

    My sister to Cephalerpeton? I see Thuringothyris, Concordia and Westlothiana “on the upside.” “Reptilia” is a mental construct here, not a listed taxon.

    Duplicating characters is also a necessary evil, as in describing the loss of the postfrontal when you actually want to talk about the postorbital or frontal (does it diminish or fuse? well that depends), or the absence of the supratemporal as it gradually diminishes or becomes fused to the parietal or supratemporal. The palatal tooth character duplication is also regrettable but happened as some taxa have no vomer teeth yet retain other palatal teeth. As I recall, no list of included taxa was ever the same with regard to overlapping traits. And I don’t mind a slight emphasis here and there, even when it’s not important.

    In the list of other suprageneric taxa you earlier offered I overlooked your Scelidosaurus/Scutellosaurus referral. It was stuck in the middle. Apologies. In the large reptile tree these two are separated by Heterodontosaurus and Agilisaurus and by 13 steps. This suggests the armor may be present by convergence here. What other traits link Scelidosaurus and Scutellosaurus not found in the other two?

    I’ll take another look at the sauropod/phytodinosaur issue, but as mentioned earlier, the taxa in the large reptile tree all resemble their present sisters more than more distant taxa, both overall and in detail in the present tree. In the past, poorly represented taxa have shifted one up or down in the tree as new taxa are introduced. However, with the present data, force-shifting the sauropodomorphs between theropods and the Pampadromaeus branch adds 11 steps. Adding them to the theropod branch adds 20 steps total. So they seem to be pretty well nested presently.

  3. I know Marjanovic has pointed this out to you numerous times: your concept of “sister taxon” is wrong. In your tree, Thuringothyris is not a sister taxon of Cephalerpeton. It’s a captorhinid, which is a subgroup of Lepidosauromorpha, which is a subgroup of Reptilia, which is Cephalerpeton’s actual sister taxon. This is a basic concept in phylogenetics. You don’t get to just redefine “sister taxon” to be something different from what everyone else uses. You might think your concept is still useful as “the most basal member of the sister taxon”, but in this case “basal” is an illusion caused by e.g. Thuringothyris having fewer taxa on its branch than the other branch of captorhinids does. But imagine we find 100 new “thuringothyrinines”. Now all of a sudden the other branch of captorhinids looks more basal to you, and you’d say Romeria is Cephalerpeton’s sister, while Thuringothyris looks more derived, being nested deeply in a huge clade of its relatives. Yet the phylogeny hasn’t changed.

    Duplicating characters is not necessary. Give me any set of characters you think requires duplication and I’ll show you how to make each only code for a single, different variable. For palatal teeth for example, just have separate characters coding for presence of vomer teeth, palatine teeth, pterygoid teeth, etc. You should mind even slight emphases, as even small changes to a matrix can have large effects on the topology.

    As for the thyreophorans, according to Butler (2007)-
    35. Anterior ramus of jugal, proportions: 1, wider than deep. Not in your matrix.
    46. Jugal posterior ramus, forked: 1, present. Not in your matrix.
    89. Cortical remodeling of surface of skull dermal bone: 1, present. Not in your matrix.
    106. Ridge or process on lateral surface of surangular, anterior to jaw suture: 1, present, strong anteroposteriorly extended ridge. You include it but have Scutellosaurus coded uncertain, and Agili and Heterodonto coded as having it. I’d say you’re right about Scutellosaurus (as the surangular is unknown), but wrong about Agili (Peng, 1992) and Heterodonto (Norman et al., 2011).
    112. Premaxillary teeth, number: 0, six. Besides being unordered, your character only has “more than four”, so doesn’t distinguish thyreophorans from Agilisaurus which has five.
    219. Parasagittal row of dermal osteoderms on the dorsum of the body: 1, present. You only have one character for armor, but Butler divides it, as there are taxa with a parasagittal row but no lateral row (e.g. stegosaurids). I note too you left Agilisaurus unknown for armor presence, when it lacks armor.
    220. Lateral row of keeled dermal osteoderms on the dorsum of the body: 1, present.

    Constraining Thyreophora actually only takes 10 more steps in your matrix, and also has the effect of making Ornithischia sister to your “Paraornithischia” or whatever, and making Daemonosaurus sister to that clade plus sauropodomorphs. So subtracting five steps for the characters not included by you (35, 46, 89, 112, 220) makes Thyreophora only five steps longer. Making the three changes noted above (no ridge for Agili or Heterodonto, no armor for Agili) gives the same results.

    Constraining Saurischia only takes 16 more steps, and also kicks silesaurids and poposaurs out of Dinosauria. Since you missed 13 of Nesbitt’s (2011) saurischian characters, that would bring Saurischia down to being only three steps longer in your tree. Then when you consider the 11 dinosaur characters you missed, I bet Saurischia would be more parsimonious in your tree, since 1.) these would tend to kick out “basal theropods” Marasuchus and Trialestes and thus change basal theropod polarity and 2.) since enforcing Saurischia makes silesaurids and poposaurs non-dinosaurian, these 11 characters would have fewer reversals in the Saurischia tree.

    In any case, when I have trees only 3 or 5 steps longer than the MPTs, I don’t consider them to be significantly worse. I’ve seen so many trees change to support clades previously 10-15 steps less parsimonious once more characters or taxa are added. So I’d say fix your character formation (I’ll even help if you want), then add Nesbitt’s (2011) characters, and see how your tree changes. It can only make it better.

  4. I appreciate your thoughts and efforts and hope the next guy to try a large analysis takes on your suggestions, Mickey.
    As long as I can recover a single tree from all these reptiles, I’m not going to add another character. Sorry. I only take things this far. No further.
    I’m not a dinosaur expert. I’m not a professional paleontologist. I’m just here to raise possibilities and provide evidence for those possibilities.

    However, anyone can take this matrix and add to it to their heart’s content. That’s the offer. So much work already done, so much work yet to be done.

    And sorry, I’m not going to give in to suprageneric taxa as sister taxa. I understand what you’re saying. It’s clear that what you say is happening in my tree. Even so genera are all I deal with. That way there’s no cherry-picking and no wandering around in the cloud of suprageneric taxa and their changing definitions.

    If we found more basal taxa than Thuringothyris, it would be out as a sister to Cephalerpeton. And, really, we already know it’s out. It’s just a matter of time. It’s just standing in for now. Everything changes as new data comes in.


  5. “As long as I can recover a single tree from all these reptiles, I’m not going to add another character”

    Utter failure. Contrary to what you apparently think, finding a low number of most parsimonious trees has nothing to do with how good an analysis is. A randomly coded matrix COULD make a fully resolved tree, but it would still be worthless. Or a perfectly coded matrix could find a lot of polytomies, because there’s conflicting data or not enough data known. Your analysis isn’t fully resolved, btw- Marasuchus and SMNS 12352 are in an unresolved trichotomy with Tawa+Avepoda. On the other hand, sampling as many characters as possible has everything to do with how good an analysis is. Why do you think we even bother running these things through PAUP? Because humans can’t test alternative topologies fast enough. Our job is all about testing alternatives, but if you say you’re not going to bother testing alternatives by including the data they’ve used, you’ve just admitted you’re not interested in finding the real tree. You can’t say “I don’t want to bother testing my tree any more, I’m not a professional or an expert” while at the same time saying “my trees are superior to everyone else’s and they should all listen to my answers to those persistent mysteries”. Why should anyone pay attention? You refuse to even include their data, and concede you don’t have the expertise to do things right.

    Ironically your statement about refusing to give in to suprageneric sister groups is quite appropriate- you refuse to give in to reality, preferring your incorrect version of the world, in which “sister taxon” means “early splitting member of the sister taxon”, correlated characters are unimportant, and somehow you’ve chosen just the right 228 characters to correctly organize amniotes.

  6. Mickey, if my results were published on paper there would be an end date to adding data and characters, the date the data went to press. The same is roughly applicable here. But I choose to keep adding taxa and I can because of the nature of the medium. People will either choose to follow my studies or not. That’s up to them. It’s an offering. Adding “as many characters as possible” is a never-ending task, and you know that. To challenge someone with a never-ending task is unfair and shows you can never be satisfied. The tree becomes fully resolved again with the deletion of SMNS 12352. The 228 characters I have chosen do split up the Reptilia and their predecessors, as you have confirmed. Further name calling and sarcasm will not be approved for publication. I wish you well.

  7. But if your results were published, you would still have a duty to test hypotheses by adding more characters as opposed to accepting that your analysis was correct in the face of alternatives that include more data. Holtz got a fully resolved theropod tree in 1994 using 126 characters, but he didn’t just stop and declare his characters to be all those needed to answer the questions of theropod phylogeny. He went back and added taxa and characters in 2000 so that his new analysis had 386 of the latter. And he got different results. Then in 2004 he added yet more, for a total of 638, and got different results again. Adding taxa is important, but so is adding characters. The problem is you didn’t just perform an analysis, present the results and move on, in which case it would be fine that you don’t want to take it further. Instead, you keep blogging about how your analysis demonstrates things about reptilian evolution, but in none of your posts do you include the caveat “though note I didn’t include most of the characters that have been proposed to support the traditional phylogeny”.

    In a way, I am never satisfied, since until we’ve included all possible data, we can always try harder to find the true cladogram. But what I would be satisfied with is if you had included all of the characters and taxa others used to support a certain traditional relationship you disagree with (say Saurischia, or Dinosauria), because only then would your statements have power. Otherwise, your statements need to be much more humble, of the form “here’s several characters I found that would support an alternative topology; someone should add these to a large published analysis and see if it changes the results.” Rather like what I did for my ceratosaurian ornithomimosaur post. I presented some real, intriguing data, but since I didn’t test it against the traditional data, I didn’t treat my hypothesis as likely to be correct or better than the traditional consensus.

    Your warning about name-calling and sarcasm is fair enough. It is your blog after all. Future comments will be purely factual.
