Genetic algorithm approaches for the phylogenetic analysis of large biological sequence datasets under the maximum likelihood criterion
Abstract
Phylogenetic trees have a multitude of applications in biology, epidemiology,
conservation and even forensics. However, the inference of phylogenetic trees can be
extremely computationally intensive. The computational burden of such analyses
becomes even greater when model-based methods are used. Model-based methods have
been repeatedly shown to be the most accurate choice for the reconstruction of
phylogenetic trees, and thus are an attractive choice despite their high computational
demands. Using the Maximum Likelihood (ML) criterion to choose among phylogenetic
trees is one commonly used model-based technique. Until recently, ML analyses of
biological sequence data were largely intractable for datasets of more
than about one hundred sequences. Because advances in sequencing technology now
make the assembly of datasets consisting of thousands of sequences common, ML search
algorithms that are able to quickly and accurately analyze such data must be developed if
ML techniques are to remain a viable option in the future.
I have developed a fast and accurate algorithm that allows ML phylogenetic
searches to be performed on datasets consisting of thousands of sequences. My software
uses a genetic algorithm approach, and is named GARLI (Genetic Algorithm for Rapid
Likelihood Inference). The speed of this new algorithm results primarily from its novel
technique for partial optimization of branch-length parameters following topological
rearrangements. Experiments performed with GARLI show that it is able to analyze large
datasets in a small fraction of the time required by the previous generation of search
algorithms. The program also performs well relative to two other recently introduced fast
ML search programs.
Large parallel computer clusters have become common at academic institutions in
recent years, presenting a new resource to be used for phylogenetic analyses. The P-GARLI
algorithm extends the approach of GARLI to allow simultaneous use of many
computer processors. The processors may be instructed to work together on a
phylogenetic search in either a highly coordinated or largely independent fashion.
Preliminary experiments suggest that analyses using the P-GARLI software can result in
better solutions than can be obtained with the serial GARLI algorithm.