Algorithms for the analysis of whole genomes

Date

2004

Authors

Wyman, Stacia Kathleen

Abstract

With the advent of whole genome sequencing, complete genomes are now available in abundance, and we have entered an era in which algorithms can address problems at the whole genome level. In the past, sequencing efforts often focused on a single gene, so existing algorithms were designed to work at the scale of a single gene. Whole genome sequencing gives us access to sequence data for the entire genome of an organism or organelle, and new algorithms are needed to analyze it. In this research, we address new computational problems that have arisen from the availability and abundance of whole genome data. In genome annotation, all of the genes in a genome are located and identified in preparation for publication of the complete genome sequence. We address the problem of genome annotation with a software package that allows researchers to locate and identify all the genes in a genome and to prepare the genome for direct submission to GenBank. A difficult problem that arises in the annotation of organellar genomes is the identification of animal mitochondrial transfer RNA (tRNA) genes. We present an experimental evaluation of a set of methods (including our own) for identifying tRNAs. Reconstructing phylogenies from gene order data means recreating the evolutionary history of a set of organisms from the order and direction (strand) of the genes in their genomes; this can give insight into the mechanisms of large-scale evolutionary events. We present a new method for gene order phylogeny reconstruction, as well as improvements to an existing method, and evaluate the results on both real and simulated datasets. Finally, we address the problem of identifying regulatory elements. Understanding gene expression is one of the most pressing unsolved problems in molecular biology today, because gene expression controls all of the metabolic and developmental processes in an organism.
We present a new method that uses a comparative genomics approach, made possible now that the complete DNA sequences of many sets of related organisms are available.
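To make "the order and direction of the genes" concrete, here is a minimal sketch of gene order data. Each genome is written as a signed permutation: a number is a gene, and a minus sign marks a gene on the opposite strand. The sketch then computes the breakpoint distance, a widely used measure of how far apart two gene orders are. It is an illustration only, not the new reconstruction method or the improvements described in this dissertation. It assumes both genomes contain the same genes, each exactly once, and it treats genomes as circular by default, as organellar genomes usually are. The function names are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

Adjacency = Tuple[int, int]


def _pairs(order: Sequence[int], circular: bool) -> List[Adjacency]:
    """Consecutive gene pairs; a circular genome also joins last to first."""
    pairs = [(order[i], order[i + 1]) for i in range(len(order) - 1)]
    if circular and len(order) > 1:
        pairs.append((order[-1], order[0]))
    return pairs


def adjacencies(order: Sequence[int], circular: bool = True) -> Set[Adjacency]:
    """All signed adjacencies of a genome, read from both strands.

    Adjacency (a, b) read from the opposite strand is (-b, -a), so both
    forms are stored to make the comparison strand-independent.
    """
    result: Set[Adjacency] = set()
    for a, b in _pairs(order, circular):
        result.add((a, b))
        result.add((-b, -a))
    return result


def breakpoint_distance(g1: Sequence[int], g2: Sequence[int],
                        circular: bool = True) -> int:
    """Count adjacencies of g1 that are absent from g2 (breakpoints)."""
    genes1 = [abs(x) for x in g1]
    genes2 = [abs(x) for x in g2]
    # Assumption: same gene content, no duplicated genes.
    if len(set(genes1)) != len(genes1) or sorted(genes1) != sorted(genes2):
        raise ValueError("genomes must contain the same genes, each once")
    adj2 = adjacencies(g2, circular)
    return sum(1 for pair in _pairs(g1, circular) if pair not in adj2)
```

For example, `breakpoint_distance([1, 2, 3, 4, 5], [1, -3, -2, 4, 5])` returns 2. Inverting the segment containing genes 2 and 3 breaks the adjacencies between genes 1 and 2 and between genes 3 and 4. Reading the whole genome from the other strand, as in `[-3, -2, -1]` versus `[1, 2, 3]`, gives 0, because the sketch compares adjacencies independently of strand.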