Browsing by Subject "computational biology"

Item Approaches to discover new human disease models through Boolean relationships of orthologous phenotypes (2012-04-26) Tien, Matthew; Marcotte, Edward
In the development of genome-wide databases of model organisms, non-traditional approaches to study genomic networks have emerged to model molecular interactions. Past approaches to model such systems have used the homology of genes in model organisms to study human diseases and conditions. Combining the homology of genes between human model organisms with information in genomic databases, the Marcotte laboratory has discovered a systematic approach to predict new candidate genes for human diseases. In characterizing phenotypes of model organisms with homologous genes, it is possible to reveal similar genetic interactions of homologous genes in human diseases. These phenotypes are characterized by the presence and absence of all orthologous genes between species; these binary data structures are called phenologs. The project below examined the potential of Boolean relationships of phenologs within one or multiple species to optimize the identified set of genes for a human disease and to predict more candidate genes involved in human disease.

Item Efficient parallel and out of core algorithms for constructing large bi-directed de Bruijn graphs (BMC Bioinformatics, 2010-11-15) Kundeti, Vamsi K.; Rajasekaran, Sanguthevar; Dinh, Hieu; Vaughn, Matthew; Thapar, Vishal
Background: Assembling genomic sequences from a set of overlapping reads is one of the most fundamental problems in computational biology. Algorithms addressing the assembly problem fall into two broad categories based on the data structures they employ. The first class uses an overlap/string graph and the second uses a de Bruijn graph. However, with the recent advances in short read sequencing technology, de Bruijn graph based algorithms seem to play a vital role in practice.
Efficient algorithms for building these massive de Bruijn graphs are essential in large sequencing projects based on short reads. In an earlier work, an O(n/p) time parallel algorithm has been given for this problem. Here n is the size of the input and p is the number of processors. This algorithm enumerates all possible bi-directed edges which can overlap with a node and ends up generating Θ(nΣ) messages (Σ being the size of the alphabet).
Results: In this paper we present a Θ(n/p) time parallel algorithm with a communication complexity that is equal to that of parallel sorting and is not sensitive to Σ. The generality of our algorithm makes it very easy to extend it even to the out-of-core model, and in this case it has an optimal I/O complexity of Θ((n log(n/B)) / (B log(M/B))) (M being the main memory size and B being the size of the disk block). We demonstrate the scalability of our parallel algorithm on an SGI/Altix computer. A comparison of our algorithm with the previous approaches reveals that our algorithm is faster, both asymptotically and practically. We demonstrate the scalability of our sequential out-of-core algorithm by comparing it with the algorithm used by VELVET to build the bi-directed de Bruijn graph. Our experiments reveal that our algorithm can build the graph with a constant amount of memory, which clearly outperforms VELVET. We also provide efficient algorithms for the bi-directed chain compaction problem.
Conclusions: The bi-directed de Bruijn graph is a fundamental data structure for any sequence assembly program based on the Eulerian approach. Our algorithms for constructing bi-directed de Bruijn graphs are efficient in parallel and out-of-core settings. These algorithms can be used in building large scale bi-directed de Bruijn graphs. Furthermore, our algorithms do not employ any all-to-all communications in a parallel setting and perform better than the prior algorithms.
Finally, our out-of-core algorithm is extremely memory efficient and can replace the existing graph construction algorithm in VELVET.

Item Predicting Subcellular Locations of Conserved Eukaryotic Protein Families with Co-Fractionation Mass Spectrometry Data (2023) Yang, David; Marcotte, Edward
Classifying proteins by their subcellular locations is important for gaining insight into their functions and understanding the dynamics of a cell. While some proteins of eukaryotic organisms such as humans or mice are well characterized, many conserved proteins across the entire Eukaryota domain remain uncharacterized. One recent development for detecting the physical association of proteins is co-fractionation mass spectrometry (CFMS), a method in which proteins are separated multiple times based on their physical and biochemical properties and then identified by mass spectrometry. We utilized data from previous CFMS experiments of 31 eukaryotic organisms to build a machine learning model to predict the subcellular locations of conserved protein families. This model uses the elution profiles generated from CFMS as features and subcellular location annotations from QuickGO as truth labels. We used our trained model to predict subcellular locations of protein families that have been identified by CFMS but do not have annotated subcellular locations. Our results demonstrate that CFMS data is acceptable at distinguishing subcellular locations of deeply conserved protein families and is exceptional at distinguishing between ribosomal and non-ribosomal proteins.

Item Spatial models for pandemic influenza: Extending human movement models across international borders (2014) Maples, Thomas; Eggo, Rosalind; Meyers, Lauren Ancel
Modeling the spread of pandemic influenza can help public health officials decide when and where to concentrate prevention, detection, and intervention efforts.
As influenza is transmitted from person to person, models for the spread of influenza and influenza-like viruses require an understanding of human movement and travel patterns. A recent movement model, known as the radiation model, has been shown to accurately predict the movement of people within the contiguous United States. Using commuter data from the 2000 US Census, we identify geographic regions of poor model fit, and demonstrate that this radiation model formulation does not accurately predict cross-border movement to Canada and Mexico. We propose a modified radiation model that takes the borders into account, adjusting the probability that a worker commutes to a foreign country. Our modifications to the radiation model significantly improve its ability to predict international movement from the United States to Canada and Mexico. The modifications particularly improve the fit of the model to commuter data for US counties near international borders. The modified radiation model could be applied to simulate the spread of pandemic influenza in North America.
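
For readers unfamiliar with the radiation model mentioned in the abstract above, the following is a minimal Python sketch of its standard, unmodified form, in which the expected flux from location i to location j is T_i · m_i n_j / ((m_i + s_ij)(m_i + n_j + s_ij)). Here m_i and n_j are the origin and destination populations, s_ij is the population within distance r_ij of the origin (excluding i and j), and T_i is the number of commuters leaving i. The function name, argument layout, and the use of planar Euclidean distance are assumptions made for illustration. The border adjustment proposed by the authors is not specified in the abstract and is not reproduced here.

```python
import math


def radiation_flux(populations, coords, outflow, i, j):
    """Expected commuters from location i to j under the standard radiation model.

    populations: list of population counts, one per location
    coords:      list of (x, y) planar coordinates (an assumption; real data
                 would typically use geographic distances)
    outflow:     list with the total number of commuters leaving each location
    """
    m_i = populations[i]
    n_j = populations[j]
    r_ij = math.dist(coords[i], coords[j])

    # s_ij: total population within radius r_ij of the origin,
    # excluding the origin and the destination themselves.
    s_ij = sum(
        pop
        for k, pop in enumerate(populations)
        if k not in (i, j) and math.dist(coords[i], coords[k]) <= r_ij
    )

    return outflow[i] * (m_i * n_j) / ((m_i + s_ij) * (m_i + n_j + s_ij))


# Hypothetical toy data: three locations on a line.
populations = [1000, 500, 2000]
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
outflow = [100, 50, 200]

flux_01 = radiation_flux(populations, coords, outflow, 0, 1)
flux_02 = radiation_flux(populations, coords, outflow, 0, 2)
```

In this toy example the nearer, smaller location absorbs part of the flow that would otherwise reach the larger, more distant one, which is the intervening-opportunities behavior the model is built around.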