Molecules vs morphology in mammals: Beck and Baillie 2018

Some published thoughts on traits vs. molecules, just out in the last week.

Beck and Baillie 2018 titled their paper: 
“Improvements in the fossil record may largely resolve the conflict between morphological and molecular estimates of mammal phylogeny.” No. Just the opposite. But you can see exactly where they put their faith… not in what they can see and measure.

From the abstract (annotated):
“Morphological phylogenies of mammals continue to show major conflicts with the robust molecular consensus view of their relationships.” True.

“This raises doubts as to whether current morphological character sets are able to accurately resolve mammal relationships, particularly for fossil taxa for which, in most cases, molecular data is unlikely to ever become available.” Just the opposite. Doubts should have been raised about molecular data, which can be influenced by local viruses. Only physical traits, i.e. the expression of activated molecules, resolve relationships, as the large reptile tree (LRT, 1255 taxa) attests.

“We tested this under a hypothetical ‘best case scenario’ by using ancestral state reconstruction (under both maximum parsimony and maximum likelihood) to infer the morphologies of fossil ancestors for all clades present in a recent comprehensive molecular phylogeny of mammals, and then seeing what effect inclusion of these predicted ancestors had on unconstrained analyses of morphological data. We found that this resulted in topologies that are highly congruent with the molecular consensus, even when simulating the effect of incomplete fossilisation. Most strikingly, several analyses recovered monophyly of clades that have never been found in previous morphology-only studies, such as Afrotheria and Laurasiatheria.” In other words, we used our imaginations to make molecule phylogenies work, rather than considering the possibility that molecular phylogenies did not work. 

“Our results suggest that, at least in principle, improvements in the fossil record may be sufficient to largely reconcile morphological and molecular phylogenies of mammals, even with current morphological character sets.” They used far too few taxa. And they used suprageneric taxa. They avoided fossil taxa. This is omitting available data. 

This is not the way science is supposed to work.
So why was this published?

Beck RMD and Baillie C 2018. Improvements in the fossil record may largely resolve the conflict between morphological and molecular estimates of mammal phylogeny. bioRxiv doi:10.1101/373191. First posted online July 20, 2018.


25 thoughts on “Molecules vs morphology in mammals: Beck and Baillie 2018”

  1. “And they used suprageneric taxa. They avoided fossil taxa.”

    They didn’t. Their analyses were based on O’Leary et al.’s (2013) dataset, which used species level OTUs and ~40 fossil taxa. Also 3656 parsimony informative characters, or over sixteen times your character total.

    “So why was this published?”

    It wasn’t. It’s a preprint without peer review. Did you even read the paper?

    • Good points, from a certain point of view. However, character number after the first 150-200 or so, adds little to results. O’Leary’s 2013 dataset included too few fossil taxa. A preprint without peer review published on the Internet is still published… without peer review.

      • “However, character number after the first 150-200 or so, adds little to results. ”

        Ummm…This is not true. Where on earth did you get this idea???

      • Neil, you can’t just say, “This is not true,” without showing experience or citation.
        I can tell you from experience that this is true. I’ve said this time and again over the years.
        Graybeal 1998 likes taxa over characters.
        Here’s another reference I found helpful where the curve of increasing accuracy just levels off in the very high 90 percentiles at about 150-200 traits:
        Wiens JJ 2003. Missing data, incomplete taxa, and phylogenetic accuracy. Systematic Biology 52: 528–538.
        Multistate characters help, too.

        Okay, you’re up. Show me from experience or citation where more than 200 trait characters (not DNA) were tested against just 200 characters to ‘help’ a phylogenetic analysis.

      • OK, first things first, the Wiens paper: the simulated phylogenies in that paper all contain 16 taxa. So the point at which adding characters stops making a difference is not relevant to your matrix.

        Let’s go back to basics. What is phylogenetics by parsimony actually doing? Answer: counting the number of character state changes. Tree with minimum number of state changes = most parsimonious = best (assuming you’re using parsimony as your criterion; that’s a whole other debate). This is not measuring similarity; taxa sharing plesiomorphies will not be found close to each other; you can’t define a clade by a plesiomorphy. To define a clade you need a state change.
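The counting described above can be sketched with Fitch's small-parsimony procedure for unordered characters on a fixed, rooted, fully resolved tree. This is only an illustration: the four taxa, the three-character matrix, and the function names are hypothetical, and nothing here is taken from the LRT, from Beck and Baillie, or from PAUP*.

```python
def fitch_changes(tree, states):
    """Minimum number of state changes for one unordered character.

    tree   -- nested 2-tuples of leaf names, e.g. (("A", "B"), ("C", "D"))
    states -- dict mapping each leaf name to its observed state
    """
    changes = 0

    def state_set(node):
        nonlocal changes
        if not isinstance(node, tuple):       # leaf: its observed state
            return {states[node]}
        left, right = (state_set(child) for child in node)
        shared = left & right
        if shared:                            # children agree: no change needed
            return shared
        changes += 1                          # children disagree: one change
        return left | right

    state_set(tree)
    return changes


def tree_length(tree, matrix):
    """Sum the Fitch counts over every character in the matrix."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch_changes(tree, {taxon: row[i] for taxon, row in matrix.items()})
        for i in range(n_chars)
    )


# Hypothetical matrix: rows are taxa, columns are characters.
matrix = {
    "A": [0, 0, 1],
    "B": [0, 1, 1],
    "C": [1, 1, 0],
    "D": [1, 0, 0],
}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))

print(tree_length(tree_1, matrix), tree_length(tree_2, matrix))
```

Under these made-up scores, tree_1 needs fewer changes than tree_2, so parsimony prefers it. The clade (A, B) is supported by state changes in the first and third characters, not by overall similarity.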

        What this means is that, for each node in the phylogeny, there must be a character state to change. The number of nodes in a fully resolved phylogeny is number of taxa minus 1. Therefore that is the barest minimum number of non-plesiomorphic character states you need in your matrix (of course some characters can have more than one non-plesiomorphic character state). But ultimately that’s the barest minimum you need, if you have no conflict. A single conflicting character means you need to add another character to resolve the conflict if you still want a fully resolved tree.
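The node count in the comment above can be checked directly. This is a minimal sketch, assuming a rooted, strictly bifurcating tree (an unrooted tree would have one fewer internal node). The helper names and taxon labels are hypothetical, and the random joining order is only a way to generate test trees, not a phylogenetic method.

```python
import random


def random_resolved_tree(taxa, rng):
    """Join randomly chosen pairs until one rooted binary tree remains."""
    nodes = list(taxa)
    while len(nodes) > 1:
        a = nodes.pop(rng.randrange(len(nodes)))
        b = nodes.pop(rng.randrange(len(nodes)))
        nodes.append((a, b))
    return nodes[0]


def count_internal_nodes(tree):
    """Count internal nodes with an explicit stack (no recursion limit)."""
    count = 0
    stack = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, tuple):
            count += 1
            stack.extend(node)
    return count


rng = random.Random(0)
for n_taxa in (2, 16, 58, 1258):
    taxa = [f"t{i}" for i in range(n_taxa)]
    tree = random_resolved_tree(taxa, rng)
    # Every fully resolved rooted tree on n taxa has n - 1 internal nodes.
    assert count_internal_nodes(tree) == n_taxa - 1
```

For the 1258-taxon case mentioned elsewhere in this thread, that count is 1257 internal nodes, each of which needs at least one supporting state change to be resolved.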

        Anyhoo, this is a huge oversimplification. The precise equations are in these two papers: Semple and Steel 2001, and Bordewich and Semple 2015.

        Note that this is just if you want to resolve the tree; if you want the tree to be reliable it has to have more than the minimum. But the long and short of it is, there is a minimum number of characters needed to resolve a tree, and it is a function of the number of character states and the number of taxa. If you add more taxa, by necessity you need to add more character states, i.e. either modify existing characters or add some more.

        Also, yes, while the Graybeal paper said adding taxa improves a phylogeny faster than adding characters, adding characters still shows improvement. Also, they never tested what happens when you have fewer characters than taxa. So stop citing this and the Wiens paper. Neither tested what happens when you try to test 2000 taxa with (I seem to remember) fewer than 500 characters. Presumably because they assumed no one would be so clueless as to try doing that.

      • I’m not happy with the academic literature either. I appreciate your thoughtful reply.

        So, in my experience (which can be duplicated by anyone), Neil, there was a false statement in your appraisal. You wrote, “What this means is that, for each node in the phylogeny, there must be a character state to change.” In my experience for each node there must be a suite of characters to change. One is not enough. Three changes between taxa produce a 50+ bootstrap score.

        So we’re not listening to one note (character), but a chord (a suite of characters). Of course, most of them, like 220 out of 238 typically do not change. On another taxon a variation of the 238 traits (with 1-10 possible scores) arises. This permits a nearly unlimited number of convergent traits, so similar, but unrelated taxa can arise on occasion (tritosaurs and protorosaurs, turtles and shelled placodonts). Thus many unrelated taxa can share a trait or two dozen traits, but not a suite of 220+ traits.

        To your last point, I heard about character/taxon ratios when the ratio was about 1:1 seven years ago. Now that the ratio has fallen to 1:5 or so, your theory still cannot be verified by the LRT. Whatever you learned in school, or in the literature, does not work in the real world.

        You wrote, “If you add more taxa by necessity you need to add more character states ie either modify existing characters or add some more.” The present character list in the LRT disproves this hypothesis. That’s not wishful thinking. That’s a verified/verifiable fact. I am still able to slip new taxa between old taxa at will. The end is not near.

        I hope this helps.

      • So I take it you didn’t read the papers I linked to giving the precise maths behind what I said?

  2. Using Wiens 2003 to justify a small number of characters is like using the USDA nutritional guidelines to justify giving an adult African elephant the same amount of food as a human.

      • What’s the line in advertising- ‘individual results may vary?’ The results you get from your matrix can’t really be generalized. In the case of the clade I work in, all kinds of things impact the results – inclusion/exclusion of taxa, inclusion/exclusion of characters, coding of specific characters for specific ingroup taxa, outgroup selection/rooting, and so on. Your analyses do not, in any way, demonstrate the primacy of taxon sampling over everything else.

      • You’re dodging my point. You say it’s all about taxon sampling; I’m informing you that it may or may not be all about taxon sampling, depending on the circumstances.

      • I’m trying not to dodge your point. I’m trying to be specific, from my experience with the LRT. To your point, when I make mistakes scoring a taxon, correcting those errors sometimes moves taxa around. Otherwise, the LRT has grown ‘organically’ from 235 taxa to 1256 simply by insertion after insertion, filling out the tree.

  3. Question. Have you tried testing the argument of your detractors by adding more characters to your matrix? Say, just 50 more characters (no mean feat, but still doable). If your argument is that more characters do little to affect the resolution, or at least offer diminishing returns, then a hefty boost of new characters followed by a run in PAUP*, should serve as a strong validation of your position.

    You can probably speed the process up a bit by reaching out to your community for help with adding characters. I’m sure Mickey Mortimer and Dr. Marjanovic would be willing to toss you a few character states for the taxa they are working on that overlap with the LRT. Many more can probably be pulled from previous analyses (under the assumption that the coding was correct).

    In the end, the ball is still in your court. The burden of proof to show the validity of the LRT is still solely up to you. If you want those in the field to take it more seriously, a validation test such as this would go a long way.

    • The underpinning of your request suggests that the addition of 50+ characters would somehow change the tree topology because a higher character/taxon ratio would then be present. Think about the history of the LRT. When there were one fifth the number of taxa, there were just as many characters. So a 5x higher character/taxon ratio was present seven years ago, and yet the tree topology was the same as it is today. An easier test is to delete as many taxa as you wish, let’s say 1200 of the current 1258. That produces a character/taxon ratio of 238/58 (4:1 ratio), something I do often, and never experience a change in tree topology.

      The underpinning of your request also suggests that the addition of 50+ characters would somehow increase the resolution of the tree topology. The LRT has been nearly fully resolved for the last seven years.

      The ball is in everyone’s court. Thomas Huxley, and a century later John Ostrom, wanted ‘to be taken seriously’ on the bird/dino issue, but they were ignored by the field and supported by a few rebels. These things just take time. All I want is for other workers to employ the pertinent taxa recovered by the LRT to confirm or refute these initial findings.

      • The problem with the approach you have done in the past is that when your LRT did produce a different topology, you chalked the difference up to taxon exclusion. By adding more characters to your complete data set, you can test whether it makes much of a difference at all, without having to worry that taxon exclusion is mucking things up.

        Also, let’s keep in mind that Ostrom built his case up over decades. His work wasn’t ignored, it’s just that the data were not compelling for a long time. If you really hope to upend an entire perspective on animal systematics, you should expect extreme push-back until the weight of your evidence finally pushes things over the edge. It worked for Ostrom, Wegener, and Alvarez. Heck, Michell’s initial proposal of black holes took almost 200 years before there was enough evidence to sway scientific minds. That’s not a failure of science. That’s one of its strengths. If your hypothesis is correct then it will withstand all the criticisms thrown at it.

      • Thank you. The fun is in the discovery. Getting hoisted on shoulders I’ll save for another day.

        re: adding characters… that’s a never-ending gambit. The number of characters is endless. The number of taxa covering a very wide gamut has already been passed. Now I’m just adding inbetweeners.

        re: failure of science. IMHO, Archaeopteryx in the 1860s should have been recognized as a bird/theropod, and it was by a few, but too few. In astronomy, other than looking in vain for Vulcan, most discoveries have been universally recognized and accepted.

  4. Neil, the math is beyond my ken. But reading the abstracts and conclusions I note that Semple and Steel 2001 report, “We consider the question of how many homoplasy-free characters are required so that T can be correctly reconstructed.” If I’m reading this correctly, homoplasy-free cannot be applied to the Tetrapoda, where homoplasy runs rampant, given the LRT set of characters and states.

    In Bordewich and Semple 2015, they ask, “how many characters are required to recover the correct phylogenetic tree? More precisely, given an arbitrary phylogenetic tree T , how small can a collection C of characters be so that T is the only phylogenetic tree consistent with C?” That’s precisely your question, as I understand it. They provide up to four states for each character, but MacClade provides ten. Sometimes I use all ten. They also report, “Reverse and convergent transitions do happen in biology, but such events are considered relatively rare.” In the LRT, as mentioned above, such transitions are not rare.

    In conclusion, the LRT is a living breathing example that demonstrates that 231 characters can split and lump 1258+ taxa in full resolution, and most often with high bootstrap scores. You may not like that fact, but there it is sitting there in the middle of the road looking at you.

    • If you don’t understand the maths, that might be an indication that you should actually listen to people who do. Obviously you aren’t expected to be an expert at everything, but “I don’t understand” is not an argument for ignoring it.

      Bordewich and Semple, in their examples, do only go up to four states, but they provide a precise equation that allows any number of states. That’s the interesting bit.

      Also not impressed with the circularity of using the LRT to argue against a paper showing why the LRT is not tenable. Frankly, if your tree is finding so much homoplasy, that is an indication that your character list is not adequate. If your characters have such a high rate of change that they are repeatedly re-evolving, that is an indication that they are not good for inferring phylogeny. It’s why there has been such pushback in recent years against certain types of character, e.g. see Sansom’s recent papers about mammal dental characters. They are so heavily tied to ecology and so developmentally non-independent that their phylogenetic utility is reduced. When we’re inferring a phylogeny, we prefer as many homoplasy-free characters as we can get. Hence the Semple and Steel paper.

      • Neil,
        The paper is theoretical. The LRT is real.

        Your statement that lots of homoplasy indicates inadequacy is false. The characters include longer femur than tibia or not, skull longer/shorter than cervicals, etc. How often do those change? Lots.

        Your statement about repeatedly re-evolving is false. It is an a priori bias that you need to ignore. Let the data do what the data does.

        We prefer as many homoplasy-free characters as possible, but what we prefer is not the way nature works. Homoplasy is rampant. Get used to it. Work with it. Let it wash all over you. Shake off your old biases.

        The character list in the LRT is generalized to apply to as many taxa as possible. That’s why it works. Listen to me, Neil, it works. It lumps and splits and recovers sister taxa that look like one another. Other trees, as I show over and over, too often do not. Put your magnifying glass on other trees and you’ll be shocked at what you see. Or just read a few blogposts here, as I expose the taxon exclusion problems out there.

      • PS. If you don’t think the LRT works, show me where it does not work. According to your theory, there should be dozens to hundreds of problems here. List the misplaced taxa and tell me where they should go, and why. You may be correct. If so, I will make the change. If not, I will defend the work.

      • I’ll provide the ever-so-rare semi-defense of Peters here and say that in part because so many of his characters are multistate and extremely homoplasic, his ~230 characters can sort his ~1200 taxa into a tree. They’re also terribly formed and composite (both of which contribute to the number of states), correlated and misscored based on DGS and his unfamiliarity with anatomy, but they do generate a generally resolved tree. So arguing Peters has too few characters mathematically to make any tree isn’t the way to go.

  5. “The LRT is real.”

    No, the LRT is not real. There is only one real tree that shows how life evolved, and we have no idea what it is. Every tree—published or not—is a hypothesis of how certain groups evolved. If you are treating your tree as gospel, then that alone is why it won’t get published.

    • No, the LRT is real. It exists as a model/hypothesis of real events, which we will never know in full, as everyone knows and you pointed out. So, no kudos for you. Everyone knows more taxa are added on occasion. Everyone knows I correct data when I can, so scores change. When scores change, taxa might shift or be cemented in place with higher scores. By ‘real’ I meant that, no matter its faults, the LRT is able to lump and split all included 1258 taxa with only 231 multi-state traits. Thus it is a real example of something that, according to NB’s mathematical construct, cannot be completely resolved.
