Statistical consistency of maximum parsimony: A 3-state, 3-taxa model
Phylogenetics, the study of evolutionary relationships among species, bridges numerous disciplines, notably mathematics and biology. While biologists and computer scientists may be more concerned with the end result of phylogenetic methods, i.e. the tree depicting the evolution of species, mathematicians tend to focus on the theory underlying these methods. Accordingly, techniques have been developed that make varying assumptions about the process of evolution. The maximum parsimony method assumes that the correct phylogenetic tree is the one that requires the fewest changes in genetic sequences as species evolve over time. This assumption resembles Ockham’s Razor, the principle that the simplest explanation is usually the correct one (Semple, 84). In this study, we examine maximum parsimony and analyze a particular model to illustrate some properties of the method. Different phylogenetic methods possess different statistical properties, often because they make different assumptions about the way evolution occurs. Most notably, methods can vary with respect to statistical consistency, the property that as the size of the sample used to produce an estimate increases, the estimate approaches the true value. For phylogenetic methods, the sample size is the length of the gene sequences that are compared. So for a phylogenetic method to be consistent, it must be that as the length of the compared DNA sequences grows, the method converges on the actual tree (i.e. tells us how the evolution actually occurred). Statistical consistency can thus distinguish between methods and help determine which might be the most accurate for predicting a tree of life. In this study we analyze a 3-state, 3-taxa model (three DNA base pairs, three species) under the maximum parsimony method to determine whether maximum parsimony is a consistent phylogenetic method. The model considers the following evolutionary tree (Felsenstein 403).
Here evolution occurs along edges I-V, resulting in species A, B, and C. The values P, Q, and R give the probability of changing from one base pair to another along the corresponding edge. Intuitively, such a change represents a mutation in the DNA sequence that leads to the creation of a new species. By analyzing maximum parsimony under this model, we find that for certain choices of the change probabilities along the edges, the maximum parsimony method can become inconsistent and predict the incorrect tree.
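The parsimony criterion described above (prefer the tree requiring the fewest changes) can be illustrated with a minimal sketch. It is not the 3-state, 3-taxa model analyzed in this study. It assumes Fitch's counting procedure as the way to score a tree, and it uses four hypothetical taxa (`t1` to `t4`) with made-up three-state sequences, chosen only because four taxa are the smallest case in which candidate topologies receive different Fitch scores.

```python
from typing import Dict, List, Set, Tuple, Union

# A tree is either a leaf name or a pair of subtrees (illustrative encoding).
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_site(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Return (candidate ancestral states, minimum changes) for one site."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:
        # The children can share a state, so no extra change is needed here.
        return common, left_cost + right_cost
    # The children disagree, so at least one change occurs below this node.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree: Tree, seqs: Dict[str, str]) -> int:
    """Sum the per-site Fitch counts over all aligned positions."""
    length = len(next(iter(seqs.values())))
    return sum(
        fitch_site(tree, {taxon: seq[i] for taxon, seq in seqs.items()})[1]
        for i in range(length)
    )


def most_parsimonious(trees: Dict[str, Tree], seqs: Dict[str, str]) -> List[str]:
    """Return the names of all trees that attain the minimum score."""
    scores = {name: parsimony_score(tree, seqs) for name, tree in trees.items()}
    best = min(scores.values())
    return [name for name, score in scores.items() if score == best]


# Hypothetical data: four taxa, three states (A, C, G), four aligned sites.
SEQS = {"t1": "AAGC", "t2": "AAGA", "t3": "CCGA", "t4": "CCAA"}
TREES = {
    "12|34": (("t1", "t2"), ("t3", "t4")),
    "13|24": (("t1", "t3"), ("t2", "t4")),
    "14|23": (("t1", "t4"), ("t2", "t3")),
}

if __name__ == "__main__":
    for name, tree in TREES.items():
        print(name, parsimony_score(tree, SEQS))  # 12|34 4, 13|24 6, 14|23 6
    print(most_parsimonious(TREES, SEQS))  # ['12|34']
```

In this toy data the first two sites support grouping `t1` with `t2`, while the last two sites cost one change on every tree, so parsimony selects `12|34`. Consistency asks whether this selection settles on the true tree as more sites are added.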