The paint-by-numbers analogy to phylogenetic analysis

Figure 1. Owl paint-by-numbers. If you accurately add color to each little shape, pretty soon a picture will emerge. You don’t have to compose it. That’s already been done for you. Follow this method and your result will echo the original composition, lighting and subject matter to a great degree.

Phylogenetic analysis is like a paint-by-numbers kit.
You fill in each little color by following the instructions. Or you fill in each little matrix box (taxon/character) with the correct score. Only afterwards do you see the big picture. Or only afterwards does the software produce the resulting cladogram, the big picture of hypothetical relationships.
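To make the "fill in the boxes, then the picture appears" step concrete, here is a minimal sketch in Python. It is not the method used by any particular phylogenetics program. It assumes an invented four-taxon, five-character matrix (the names A to D and all scores are hypothetical), counts character changes with the standard Fitch parsimony procedure for unordered characters, and simply compares the three possible groupings of four taxa rather than searching heuristically the way real software must for large matrices.

```python
# Hypothetical taxon/character matrix. Taxon names and scores are invented.
# Each string holds one score per character; "?" marks an unknown score.
matrix = {
    "A": "00000",
    "B": "00011",
    "C": "11010",
    "D": "111?0",
}


def fitch(tree, scores, all_states):
    """Return (possible_states, steps) for one unordered character."""
    if isinstance(tree, str):  # a leaf: look up that taxon's score
        state = scores[tree]
        # An unknown score is treated as compatible with every observed state.
        return (set(all_states) if state == "?" else {state}), 0
    left_states, left_steps = fitch(tree[0], scores, all_states)
    right_states, right_steps = fitch(tree[1], scores, all_states)
    shared = left_states & right_states
    if shared:  # the two branches agree, so no extra change is needed
        return shared, left_steps + right_steps
    # The branches disagree, so one change is counted at this node.
    return left_states | right_states, left_steps + right_steps + 1


def tree_length(tree, matrix):
    """Sum the Fitch step counts over every character (column)."""
    n_chars = len(next(iter(matrix.values())))
    total = 0
    for i in range(n_chars):
        scores = {taxon: row[i] for taxon, row in matrix.items()}
        all_states = set(scores.values()) - {"?"}
        total += fitch(tree, scores, all_states)[1]
    return total


# The three possible groupings of four taxa, written as nested tuples.
candidate_trees = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "D"), ("B", "C")),
]
lengths = {tree: tree_length(tree, matrix) for tree in candidate_trees}
best_tree = min(lengths, key=lengths.get)
print(best_tree, lengths[best_tree])
```

With these invented scores the grouping of A with B and C with D needs 5 steps, while each of the other two groupings needs 7, so the shortest tree only becomes visible after every box has been scored.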

By contrast, in traditional painting
the master artist starts with a loose sketch, then arranges elements in a composition to fit a triangle, a golden rectangle, or some other substructure. The colors, tints and shadows are added in large blocks to a canvas of the right size to fit a certain wall. Finally the details (lace, highlights, eyelashes, etc.) are added.

Like a paint-by-numbers canvas,
the big picture in evolution has already happened. The “instructions” or “clues” come to us in the form of preserved and exposed traits in fossils and living taxa. We don’t have all the clues, and never will, but with what we do have we fill them in until a complete picture begins to emerge, blank spaces and all.

Likewise, the large reptile tree
(LRT) and large pterosaur tree (LPT) are large gamut cladograms that will never be completed. However, as new taxa are added the details and transitions between established taxa become finer and finer blends. The big picture, or tree topology, has been pretty steady for several years and hundreds of additions.

Make sure your taxa 
are all species or specimens. Those provide good data. Avoid suprageneric taxa. By combining traits from several genera you’ll muddy the canvas. The tiny features will be lacking. You’ll cherry-pick favorites and overlook obscure details that might be important.
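One way to see the muddying is to build a composite row by hand. The sketch below is only an illustration under stated assumptions: the three score strings stand for hypothetical specimens lumped into one suprageneric taxon, and the function `composite_score` is an invented helper that codes any character on which the members disagree as "?". A worker could instead pick one state over another, which is the cherry-picking warned about above.

```python
def composite_score(member_scores):
    """Combine several specimens' score strings into one composite row.

    Assumption for illustration: a character on which the members
    disagree (or which nobody preserves) is coded "?" in the composite.
    """
    combined = []
    for column in zip(*member_scores):
        known = set(column) - {"?"}
        combined.append(known.pop() if len(known) == 1 else "?")
    return "".join(combined)


# Hypothetical scores for three specimens placed in one suprageneric taxon.
members = ["0101", "0111", "0?00"]
print(composite_score(members))
```

Here the last two characters, which separate the individual specimens, collapse to unknowns in the composite row, so the fine detail that specimen-level scoring would have kept is lost.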

Science is for everyone
Not just for PhDs. If they can create a cladogram, so can you. They test published work for validity. So do I and so can you. Along the way, you will make mistakes. I do too. Others will point out mistakes. Defend your decisions where appropriate. Fix problems at every opportunity. Follow this method and your result will echo the original tree topology. Then keep adding taxa as they become available to fill in any blank spaces.

The first time an idea is proposed
it is rarely accepted. As time goes by, some hypotheses disappear. And some should disappear. Others, whether valid or not, get headlines because the PR machinery is tilted in their favor. Still others slowly grow in acceptance and are ultimately embraced because they reflect the original tree topology we’re all trying to see more clearly.

Good luck on all your endeavors.


5 thoughts on “The paint-by-numbers analogy to phylogenetic analysis”

  1. Ah, but, as a phylogeneticist, you don’t only fill in the numbers. You also make the matrix. You choose and define the characters and their states, and you choose (and to some extent define) the taxa. On top of that, you choose how to order the characters (never mind weighting them).

    A morphological data matrix is a matrix of hypotheses, each of which ought to be tested. That’s one of the most important points of my huge preprint (of which I recently submitted the next version to PeerJ for the 3rd round of peer review).

  2. The characters, as you know, were created, for the most part (228/231), years before the last 800+ taxa were added. So there’s no present tense here regarding choosing characters. That was all in the past. I’m as surprised as you are that this character list is still working. There’s no cherry-picking of taxa to fit the tree. No ordering of characters. I’m adding taxa that appear to be basal and transitional forms. No hummingbirds and penguins here. Sometimes the new taxa provide new insight into relationships. Often this is due to traditional taxon exclusion. Other times new taxa support traditional trees. And that’s important. Lizards still nest with lizards. Birds with birds. The tree can be subdivided many different ways to test the matrix many different ways.

    • So there’s no present tense here regarding choosing characters. That was all in the past.

      You’ve stopped adding characters???

      Why would you ever stop adding characters? That comes as a profound surprise.

      I’m as surprised as you are that this character list is still working.

      What makes you think it’s working?

      Some parts of the LRT, pretty large parts, look like the taxa are being sorted by noise and a bit of convergence. Lack of turtle monophyly, lack of marsupial monophyly, lack of anything approaching temnospondyl monophyly and many other phenomena (I sent you a more or less complete list yesterday) are reliable signs that your taxon sample is long past exhausting the ability of the character sample to sort the taxa.

      I know this from my own tree. Take a look at the stereospondyls or the salamanders in the different analyses in my preprint, and you’ll know what I mean.

      There’s no cherry-picking of taxa to fit the tree. No ordering of characters.

      As if those were remotely similar things… I’ll send you papers on ordering.

      Lizards still nest with lizards.

      And so do Prolacerta and a whole bunch of other animals that manifestly aren’t lizards…

  3. re: Why would you ever stop adding characters? – you know the answer: add one character, you have to go back to 1000 taxa to check on it. On the other hand, add one taxon and score it. Also: Because you come to a point of diminishing returns. More work. Less reward. The curve at the top starts to flatten. If you’re having trouble with any cladogram cull it down to subsets and see what you get. It might prove insightful. More is better, but only up to the point of diminishing returns.

    re: What makes you think it’s working? – same answer as always: the ability to lump and separate has not appreciably diminished (except in some partial taxa).

    re: Lack of turtle monophyly, etc. – Maybe that’s real. Let’s test it with an independent researcher using the same taxon list.

    re: And so do Prolacerta and a whole bunch of other animals that manifestly aren’t lizards… – not in the LRT. You must be referring to some antiquated matrices beset with taxon exclusion issues.

    : )

