# Browsing by Subject "Combinatorial optimization"

Now showing 1 - 6 of 6


## Algorithms and heuristics for combinatorial optimization in phylogeny (2006)

Ganapathysaravanabavan, Ganeshkumar; Ramachandran, Vijaya; Warnow, Tandy

The goal of phylogeny is to infer the evolutionary history of the numerous and diverse species on earth. The evolutionary history is usually represented in the form of a phylogenetic tree. Reconstruction of the phylogenetic history of all life on earth is one of the central goals of biology and is also a problem with many practical applications, ranging from drug design to conservation of biodiversity. My dissertation addresses some of the combinatorial optimization problems that arise in phylogenetic reconstruction and the related field of historical biogeography.

Two of the most important methods for reconstructing phylogenies are the optimization problems Maximum Parsimony (MP) and Maximum Likelihood (ML). Maximum parsimony is a purely combinatorial optimization problem, whereas ML requires an explicit probabilistic model of evolution. Both MP and ML are NP-hard problems, and the best current methods for obtaining good MP and ML solutions are based on some form of local search. Hence, developing better local-search heuristics for ML and MP is a problem of primary importance in phylogeny. Most current local searches are based on the Tree Bisection and Reconnection (TBR) operation for moving from tree to tree. In my dissertation, I consider an alternative local-search move called Edge-Contract-and-Refine (ECR) and explore its theoretical properties in detail. We prove that the neighborhood structure induced by the ECR move on the search space has many attractive properties, which suggest that it can complement the TBR move in local searches. We provide fast algorithms for computing the 2-ECR neighbor of a tree with the best MP score during a local search, and for computing a random ECR neighbor.
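The parsimony scoring that such a local search must invoke on every candidate tree can be illustrated with Fitch's classic small-parsimony algorithm for a single character site. This is a minimal sketch: the nested-tuple tree encoding and the leaf states are illustrative, not the dissertation's data structures.

```python
# Fitch's small-parsimony algorithm: scores one character site on a fixed
# rooted binary tree. TBR/ECR local searches re-run such scoring on every
# candidate tree they visit.

def fitch_score(tree, leaf_states):
    """tree: nested tuples, e.g. (('A','B'),('C','D'));
    leaf_states: dict mapping leaf name -> observed character state.
    Returns (candidate state set at the root, mutations required)."""
    if isinstance(tree, str):                      # leaf node
        return {leaf_states[tree]}, 0
    left, right = tree
    ls, lc = fitch_score(left, leaf_states)
    rs, rc = fitch_score(right, leaf_states)
    if ls & rs:                                    # intersection: no new mutation
        return ls & rs, lc + rc
    return ls | rs, lc + rc + 1                    # union: charge one mutation

tree = (('human', 'chimp'), ('mouse', 'rat'))
states = {'human': 'A', 'chimp': 'A', 'mouse': 'G', 'rat': 'G'}
_, score = fitch_score(tree, states)
print(score)  # 1 mutation suffices on this tree
```

Summing this score over all character sites gives the MP score of the tree; the local search then compares that score across the TBR or ECR neighborhood.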
Further, we conduct an extensive comparative study of popular ML heuristics involving both large biological and simulated datasets. We also investigate the impact of the tree used to start the search on the quality of the results obtained. Our results suggest that of the currently available ML software, PHYML, RAxML and GARLI are the three best ML methods. Our results also suggest that the starting tree affects the performance of the search. However, the correlation between the quality of the starting tree and the quality of the final solution is not straightforward. Phylogenetic reconstruction methods like MP and ML often return not a single tree but hundreds, even thousands, of trees with the same optimality score. Consensus phylogenetic trees are often used to summarize the common features observed in all or most of the resultant trees. Maximum Compatible Subtree (MCT) is one such consensus tree. In my dissertation, I provide a fixed-parameter tractable algorithm for the MCT problem when the input trees have a bounded degree. Further, I provide fast polynomial time approximation algorithms for the complement of the MCT problem. The goal of historical biogeography is to infer the history of the geographic distribution of organisms in the light of their evolutionary history. One of the most important tools in historical biogeographical inference is the comparison of phylogenies of different groups of species that share their geographic distribution. Common patterns observed among these different phylogenies suggest a shared geographic history. So that they can be compared, the phylogenies are converted to area cladograms, where the taxa are the geographic locations of organisms. Until very recently, area cladograms have been compared only visually. 
In my dissertation, we formalize the problem of comparing area cladograms in the form of the Maximum Agreement Area Cladogram (MAAC) problem, and provide a fast polynomial-time algorithm for computing the MAAC of two area cladograms. Further, we develop distance metrics for the comparison of area cladograms.

## Characterizing neighborhoods favorable to local search techniques (2004)

Dimova, Boryana Slavcheva; Barnes, J. Wesley; Popova, Elmira

NP-complete problems are among the most difficult problems to solve, and no polynomial-time algorithms for them are known. One of the more powerful approaches to such problems is heuristic direct search. For a given problem, a landscape is composed of (1) a solution space, (2) an objective function value defined at all elements of the solution space, and (3) a direct search neighborhood defined for each element of the solution space. The goal of the research documented in this dissertation was to extend previous characterizations of landscapes conducive to the success of direct search methodologies. The primary contributions of this dissertation are as follows:

1. The extension of the characterization of AR(1) elementary landscapes to include arbitrary neighborhood definitions.
2. The creation of an entirely new class of landscapes favorable to direct search methods, a subset of the AR(p) neighborhoods where p > 1.
3. The development of a lower (upper) bound for a local minimum (maximum) in AR(1) elementary landscapes using information stability.
4. The characterization of multistep composite AR(1) elementary landscapes.

## A column generation approach for stochastic optimization problems (2006)

Wang, Yong Min; Bard, Jonathan F.; Morton, David P.

Understanding how uncertainty affects the dynamics and behavior of an organization is a critical aspect of system design. Models and methods that take uncertainty into account can lead to significant reductions in cost.
This dissertation investigates the use of stochastic optimization models for the following applications: (1) a generalized assignment problem (GAP) with uncertain resource capacity and unknown processing times and (2) a shift planning and scheduling problem (SPSP) with unknown demand that arises in the construction of the workforce at US Postal Service mail processing and distribution centers. In the models, the first-stage decisions are made before the uncertainty is revealed. For the GAP, the first-stage decisions correspond to an assignment of jobs to agents. Penalties are incurred when the assignments do not permit all demand to be satisfied. For the SPSP, the number of full-time and part-time employees, as well as the number of full-time shifts by type, must be specified before the demand is known. In the second stage, feasibility is addressed by allocating overtime and calling in temporary workers to handle spikes in the mail volume.

This dissertation consists of three parts: (i) the development and analysis of stochastic integer programming models for the GAP and the SPSP, (ii) the estimation of the demand distributions from historical data for the Dallas processing and distribution center, and (iii) the development of column generation algorithms to solve the associated stochastic integer programs. In the first part, the stochastic integer programming models are formulated for the two applications and the value of the stochastic solutions is presented. The second part begins with an analysis of the weekly demand and a hypothesis test concerning the existence of an end-of-month effect, i.e., that the week at the end of the month might have larger volume. The hypothesis is rejected. After removing outliers, it is shown that the demand is approximated well by a normal distribution. In the last part of the dissertation, a branch-and-price algorithm is developed and used to solve the two applications.
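The two-stage structure described above can be sketched on a toy GAP-like instance by enumerating scenarios and first-stage assignments directly. The instance data, penalty rate, and brute-force solve below are illustrative only; the dissertation's models are far larger and are solved by branch-and-price, not enumeration.

```python
# Toy deterministic equivalent of a two-stage stochastic assignment model:
# jobs are assigned to agents before capacities are revealed; each scenario
# then charges a linear penalty for any capacity overflow.
from itertools import product

jobs, agents = [0, 1, 2], [0, 1]
assign_cost = [[2, 3], [4, 1], [3, 3]]        # assign_cost[job][agent]
load = [5, 4, 6]                              # resource use of each job
scenarios = [([8, 7], 0.5), ([12, 10], 0.5)]  # (agent capacities, probability)
PENALTY = 10                                  # recourse cost per unit overflow

def expected_cost(assignment):                # assignment[j] = agent for job j
    first = sum(assign_cost[j][assignment[j]] for j in jobs)
    used = [0] * len(agents)
    for j in jobs:
        used[assignment[j]] += load[j]
    recourse = sum(p * PENALTY * sum(max(0, used[a] - caps[a]) for a in agents)
                   for caps, p in scenarios)
    return first + recourse

# brute-force over all first-stage assignments (feasible only at toy scale)
best = min(product(agents, repeat=len(jobs)), key=expected_cost)
print(best, expected_cost(best))  # (0, 0, 1) with expected cost 14.0
```

The point of the two-stage model is visible even here: the cheapest deterministic assignment is not the one that minimizes expected cost once scenario penalties are priced in.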
Experimental results demonstrate the algorithm’s effectiveness.

## Combinatorial optimization for graphical structures in machine learning (2019-12)

Lindgren, Erik Michael; Dimakis, Alexandros G.; Caramanis, Constantine; Sanghavi, Sujay; Klivans, Adam; Liu, Qiang

Graphs are an essential topic in machine learning. In this proposal, we explore problems in graphical models (where a probability distribution has conditional independencies specified by a graph), causality (where a directed graph specifies causal directions), and clustering (where a weighted graph is used to denote related items).

For our first contribution, we consider the facility location problem. In this problem our goal is to select a set of k "facilities" such that the average benefit from a "town" to its most beneficial facility is maximized. As input, we receive a bipartite graph where every edge has a weight denoting the benefit the town would receive from the facility if selected. The input graph is often dense, with O(n²) edges. We analyze sparsifying the graph with nearest-neighbor methods. We give a tight characterization of how sparse each method can make the graph while approximately maintaining the value of the optimal solution. We then demonstrate these approaches experimentally and see that they lead to large speedups and high-quality solutions.

Next, we consider the MAP inference problem in discrete Markov random fields: the problem of finding the most likely configuration of a Markov random field. One common approach to finding the MAP solution is integer programming. We are interested in analyzing the complexity of integer programming under the assumption that there is a polynomial number of fractional vertices in the linear programming relaxation of the integer program. We show that under this assumption the optimal MAP assignment can be found in polynomial time.
Our result can be generalized to arbitrary integer programs whose solution set is contained in the unit hypercube: we show that any integer program in the unit hypercube with a polynomial number of fractional vertices in its linear programming relaxation can be solved in polynomial time.

We then consider the minimum-cost intervention design problem: given the essential graph of a causal graph and a cost to intervene on a variable, identify the set of interventions with minimum total cost that can learn any causal graph with the given essential graph. We first show that this problem is NP-hard. We then prove that we can achieve a constant-factor approximation to this problem with a greedy algorithm. Our approach to proving this guarantee uses tools from submodular optimization and knapsack quantization.

Next, we consider the problem of learning Ising models when an adversary can corrupt the samples we receive. For this problem we give nearly tight lower and upper bounds on the sample complexity. Finally, we consider the problem of conditional sampling from invertible generative models. We first establish hardness results for generating conditional samples. We then develop a scheme using variational inference that allows us to approximately solve the problem. Our approach is able to utilize the given invertible model to improve sample quality.

## A group theoretic approach to metaheuristic local search for partitioning problems (2005)

Kinney, Gary W.; Barnes, J. Wesley

Recent work has demonstrated the power of combining group theory with metaheuristic search methodologies to solve discrete optimization problems. Group theory provides tools to characterize the underlying structures in move neighborhoods, solution representations, and solution landscapes. Exploiting these structures with group theoretic techniques produces highly effective and efficient search algorithms.
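For context on the metaheuristic framework involved, a generic single-flip tabu search on 0/1 vectors looks like the sketch below. The objective, tabu tenure, and aspiration rule are illustrative; the group-theoretic machinery (orbit partitioning, reduced-cost clustering) described in the abstract is precisely what GTTS adds beyond this plain skeleton.

```python
# Plain tabu search on 0/1 vectors: best admissible single-flip move each
# iteration, a recency-based tabu list, and an aspiration criterion that
# admits a tabu move if it would beat the incumbent.
import random

def tabu_search(n, objective, iters=200, tenure=5, seed=0):
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n)]
    best, best_val = x[:], objective(x)
    tabu = {}                                   # variable index -> expiry iteration
    for it in range(iters):
        moves = []
        for i in range(n):
            x[i] ^= 1                           # tentatively flip bit i
            val = objective(x)
            x[i] ^= 1
            if tabu.get(i, -1) < it or val < best_val:   # admissible or aspirated
                moves.append((val, i))
        val, i = min(moves)                     # best admissible neighbor
        x[i] ^= 1
        tabu[i] = it + tenure                   # forbid re-flipping i for a while
        if val < best_val:
            best, best_val = x[:], val
    return best, best_val

# illustrative objective: Hamming distance to a hidden target vector
target = [1, 0, 1, 1, 0, 0, 1, 0]
obj = lambda x: sum(a != b for a, b in zip(x, target))
sol, val = tabu_search(len(target), obj)
print(val)  # reaches 0 on this easy landscape
```

Note that the search always takes the best admissible move even when it worsens the objective; the tabu list is what prevents it from immediately undoing that move and cycling.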
Using group theory, we develop a methodology for partitioning the solution space into orbits. The partitioning is performed by clustering the variables based on the reduced costs of the LP relaxation, creating “good” and “bad” orbits. We are able to calculate the size of each orbit and place upper and lower bounds on the solutions contained within. The search effort can then be directed toward the “good” orbits. Based on these ideas, we develop a Group Theoretic Tabu Search (GTTS) algorithm for solving the unicost Set Covering Problem (SCP), GTTS-USCP. We tested our algorithm on 65 benchmark problems and compared the results against the previous best known solutions and those obtained by CPLEX 9.0. GTTS-USCP discovered 46 new best known solutions and converged significantly faster than CPLEX for all problem sets.

We explore the general integer linear program (ILP) by way of the group minimization problem (GMP). By examining the local search in terms of the GMP, we gain insights that help us solve the ILP. We describe the local search for the corner polyhedron in the space of the non-basic variables. Integer points in the corner polyhedron that produce an all-integer basis form a sub-lattice. We develop identity-move neighborhoods that allow the local search to traverse this sub-lattice. We also develop bound strengthening of the non-basic variables based on reduced costs. These bounds effectively shrink the corner polyhedra, reducing the size of the solution space we must search. Based on this research, we develop a GTTS algorithm for solving the GMP, GTTS-GMP. Since the GMP can be formed from any ILP, this algorithm solves the general ILP. The algorithm performs well on a diverse set of benchmark problems.

## On the shortest path and minimum spanning tree problems (2003)

Pettie, Seth; Ramachandran, Vijaya

The shortest path and minimum spanning tree problems are two of the classic textbook problems in combinatorial optimization.
They are simple to describe and admit simple polynomial-time algorithms. However, despite years of concerted research effort, the asymptotic complexity of these problems remains unresolved. The main contributions of this dissertation are a number of asymptotically faster algorithms for the minimum spanning tree and shortest path problems. Of equal interest, we provide some clues as to why these problems are so difficult. In particular, we show why certain modern approaches to the problems are doomed to have super-linear complexity. A sampling of our results is listed below. We emphasize that all of our algorithms work with general graphs and make no restrictive assumptions on the numerical representation of edge lengths.

- A provably optimal deterministic minimum spanning tree algorithm. (We give a constructive proof that the algorithmic complexity of the minimum spanning tree problem is equivalent to its decision-tree complexity.)
- An all-pairs shortest path algorithm for general graphs running in time O(mn + n² log log n), where m and n are the number of edges and vertices. This provides the first improvement over approaches based on Dijkstra's algorithm.
- An all-pairs shortest path algorithm for undirected graphs running in O(mn log α) time, where α = α(m, n) is the inverse-Ackermann function.
- A single-source shortest path algorithm running in O(mα + min{n log log r, n log n}) time, where r bounds the ratio of any two edge lengths. For r polynomial in n this is O(m + n log log n), an improvement over Dijkstra's algorithm.
- An inverse-Ackermann-style lower bound for the online minimum spanning tree verification problem. This is the first inverse-Ackermann-type lower bound for a comparison-based problem.
- An Ω(m + n log n) lower bound on any hierarchy-type single-source shortest path algorithm, implying that this type of algorithm cannot improve upon Dijkstra's algorithm. (All of our shortest path algorithms are of the hierarchy type.)
- The first parallel minimum spanning tree algorithm that is optimal with respect to both time and work. Our algorithm is for the EREW PRAM model.
- A parallel, expected linear-work minimum spanning tree algorithm using only a polylogarithmic number of random bits.
- An O(mn log α) bound on the comparison-addition complexity of all-pairs shortest paths. This is within a tiny log α factor of optimal when m = O(n).
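For reference, the baseline these shortest-path results are measured against is textbook Dijkstra with a binary heap, an O((m + n) log n) comparison-addition algorithm. This is a minimal sketch with an illustrative graph; lazy deletion (skipping stale heap entries) stands in for a decrease-key operation.

```python
# Dijkstra's single-source shortest path algorithm with a binary heap and
# lazy deletion of stale heap entries.
import heapq

def dijkstra(graph, source):
    """graph: dict vertex -> list of (neighbor, edge_length).
    Returns dict of shortest distances from source to reachable vertices."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float('inf')):
            continue                      # stale entry, already settled cheaper
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd              # relax edge (u, v)
                heapq.heappush(heap, (nd, v))
    return dist

g = {
    's': [('a', 2), ('b', 5)],
    'a': [('b', 1), ('t', 7)],
    'b': [('t', 3)],
    't': [],
}
print(dijkstra(g, 's'))  # {'s': 0, 'a': 2, 'b': 3, 't': 6}
```

The hierarchy-type lower bound quoted above says that any algorithm in this family, which settles vertices in an order compatible with their distances, cannot beat the Ω(m + n log n) behavior this sketch exhibits.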