Fast and accurate estimation of large-scale phylogenetic alignments and trees

dc.contributor.advisor: Linder, C. Randal
dc.contributor.advisor: Warnow, Tandy, 1955-
dc.contributor.committeeMember: Maddison, Wayne
dc.contributor.committeeMember: Plaxton, C. Gregory
dc.contributor.committeeMember: Press, William H.
dc.creator: Liu, Kevin Jensen
dc.date.accessioned: 2011-07-06T17:34:51Z
dc.date.available: 2011-07-06T17:34:51Z
dc.date.issued: 2011-05
dc.date.submitted: May 2011
dc.date.updated: 2011-07-06T17:34:57Z
dc.description: text
dc.description.abstract: Phylogenetics is the study of evolutionary relationships. Phylogenetic trees and alignments play important roles in a wide range of biological research, including reconstruction of the Tree of Life - the evolutionary history of all organisms on Earth - and the development of vaccines and antibiotics. Today's phylogenetic studies seek to reconstruct trees and alignments on a greater number and variety of organisms than ever before, primarily due to exponential growth in affordable sequencing and computing power. The importance of phylogenetic trees and alignments motivates the need for methods to reconstruct them accurately and efficiently on large-scale datasets. Traditionally, phylogenetic studies proceed in two phases: first, an alignment is produced from biomolecular sequences with differing lengths, and, second, a tree is produced using the alignment. My dissertation presents the first empirical performance study of leading two-phase methods on datasets with up to hundreds of thousands of sequences. Relatively accurate alignments and trees were obtained using methods with high computational requirements on datasets with a few hundred sequences, but as datasets grew past 1000 sequences and up to tens of thousands of sequences, the set of methods capable of analyzing a dataset diminished, and only the methods with the lowest computational requirements and lowest accuracy remained. Alternatively, methods have been developed to simultaneously estimate phylogenetic alignments and trees. Methods that solve the treelength optimization problem - the most widely used approach for simultaneous estimation - have not been shown to return more accurate trees and alignments than two-phase approaches. I demonstrate that treelength optimization under a particular class of optimization criteria represents a promising means for inferring accurate trees and alignments.
The other methods for simultaneous estimation are not known to support analyses of datasets with more than a few hundred sequences due to their high computational requirements. The main contribution of my dissertation is SATe, the first fast and accurate method for simultaneous estimation of alignments and trees on datasets with up to several thousand nucleotide sequences. SATe improves upon the alignment and topological accuracy of all existing methods, especially on the most difficult-to-align datasets, while retaining reasonable computational requirements.
dc.description.department: Computer Science
dc.format.mimetype: application/pdf
dc.identifier.slug: 2152/ETD-UT-2011-05-3489
dc.identifier.uri: http://hdl.handle.net/2152/ETD-UT-2011-05-3489
dc.language.iso: eng
dc.subject: Computational phylogenetics
dc.subject: Multiple sequence alignment
dc.subject: Phylogeny
dc.subject: Treelength optimization problem
dc.subject: Simultaneous estimation
dc.subject: Biological datasets
dc.title: Fast and accurate estimation of large-scale phylogenetic alignments and trees
dc.type.genre: thesis
thesis.degree.department: Computer Sciences
thesis.degree.discipline: Computer Science
thesis.degree.grantor: University of Texas at Austin
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy

Access full-text files

Original bundle

Name: LIU-DISSERTATION.pdf
Size: 1.07 MB
Format: Adobe Portable Document Format
