# Browsing by Subject "Algorithms"

Now showing 1 - 20 of 53


## Accelerating graph computation with system optimizations and algorithmic design (2021-08-06)
Hoang, Loc Dac; Pingali, Keshav; Huang, Qixing; Rossbach, Christopher; Wu, Bo

Most data in today's world can be represented in graph form, and these graphs can be used as input to graph applications to derive useful information, such as shortest paths in a road network, similarity between drugs in a drug-protein network, persons of interest in a social network, or recommended products for customers in a purchase-history graph. Graphs are growing larger as time passes, so there is an ever-growing need for efficient graph applications. Developers typically have two methods for accelerating the runtime of a graph application: (1) they optimize the systems on which the graph application runs, and/or (2) they optimize the algorithm itself and gain speedup through algorithmic novelties. In this dissertation, I propose work that accelerates graph applications from both perspectives; broadly speaking, the work is split into systems work and algorithmic work. On the systems end, I present the CuSP system and the DeepGalois system:

1. CuSP, or Customizable Streaming Partitioner, is a fast and general distributed graph partitioner that generates partitions for distributed graph analytics systems to use. CuSP addresses the problem that existing partitioners are slow and can handle only a few built-in policies: it provides users with a general interface for specifying streaming partitioning policies, which CuSP then uses to partition graphs efficiently. Our evaluation shows that it can partition up to 22× faster than the state-of-the-art offline graph partitioner XtraPulp while producing partitions that allow graph applications to run 2× faster on average than XtraPulp's partitions.
CuSP can be extended to let users express partitioning policies specific to their algorithms, as we show in a case study with distributed triangle counting.

2. DeepGalois is a distributed graph neural network system built on the observation that graph neural network computation can be expressed as a graph problem, which allows it to be implemented in graph analytics systems. DeepGalois is built from existing distributed graph analytics components: CuSP and the Gluon communication substrate are used to partition GNN graphs and to efficiently communicate partial aggregations and gradients. It also supports sampling and minibatching of the graph. Experimental results on up to 128 CPU machines demonstrate that DeepGalois scales, and that its epoch time for its best host configuration on the evaluated graphs is on average 2× faster than DistDGL's epoch time for its best host configuration.

From an algorithmic perspective, I present a novel round-efficient distributed betweenness centrality algorithm and a novel formulation of the graph transformer network as a graph algorithm that allows more efficient computation:

1. Min Rounds Betweenness Centrality (MRBC) is a provably round-efficient BC algorithm that uses a novel message update rule: a vertex sends out an update only once it knows that the data it is going to send is finalized. We prove the correctness of this rule and establish bounds on the maximum number of rounds the algorithm takes. We implement MRBC in the D-Galois distributed graph analytics system, where we further reduce communication overhead by relaxing proxy update frequency based on the message send rule. Evaluation shows that, compared to a classic Brandes BC algorithm implementation, it reduces the number of rounds by 14× and communication time by 2.8× on average.

2.
Graph Transformer Networks (GTNs) are a variant of graph convolutional networks that learn and use important typed paths, called metapaths, in a heterogeneous graph in order to improve task accuracy. The original formulation of the problem uses a series of dense matrix multiplies that are space-inefficient, and the matrix formulation makes it difficult to use fine-grained graph techniques such as sampling. We formulate the GTN problem as a graph problem that is more space-efficient because it does not need dense matrices. In addition, because it is formulated as a graph algorithm, we can apply metapath sampling on top of it to significantly decrease the computational load. Evaluation shows that the sampling-based graph formulation of the GTN can be up to 155× faster than the original formulation without any compromise in task accuracy.

## Adaptive multiscale modeling of polymeric materials using goal-oriented error estimation, Arlequin coupling, and goals algorithms (2008-05)
Bauman, Paul Thomas, 1980-; Oden, J. Tinsley (John Tinsley), 1936-

Scientific theories that explain how physical systems behave are described by mathematical models, which provide the basis for computer simulations of events that occur in the physical universe. These models, being only mathematical characterizations of actual phenomena, are subject to error because of the inherent limitations of all mathematical abstractions. In this work, new theory and methodologies are developed to quantify such modeling error in a way that resolves a fundamental and standing issue: multiscale modeling, the development of models of events that transcend many spatial and temporal scales. Specifically, we devise the machinery for a posteriori estimates of the relative modeling error between a model at a fine scale and another at a coarser scale, and we use this methodology as a general approach to multiscale problems.
The target application is one of critical importance to nanomanufacturing: imprint lithography of semiconductor devices. The development of numerical methods for multiscale modeling has become one of the most important areas of computational science. Technological developments in the manufacturing of semiconductors hinge upon the ability to understand physical phenomena from the nanoscale to the microscale and beyond. Predictive simulation tools are critical to the advancement of nanomanufacturing of semiconductor devices: in principle, they can displace expensive experiments and testing and optimize the design of the manufacturing process. The development of such tools rests at the edge of contemporary methods and high-performance computing capabilities and is a major open problem in computational science.

In this dissertation, a molecular model is used to simulate the deformation of polymeric materials used in the fabrication of semiconductor devices. Algorithms are described which lead to a complex molecular model of polymer materials designed to produce an etch barrier, a critical component in imprint lithography approaches to semiconductor manufacturing. Each application of this so-called polymerization process leads to one realization of a lattice-type model of the polymer: a molecular statics model of enormous size and complexity. This is referred to as the base model for analyzing the deformation of the etch barrier, a critical feature of the manufacturing process. To reduce the size and complexity of this model, a sequence of coarser surrogate models is generated. These surrogates are the multiscale models critical to the successful computer simulation of the entire manufacturing process. The surrogate involves a combination of particle models, the molecular model of the polymer, and a coarse-scale model of the polymer as a nonlinear hyperelastic material.
Coefficients for the nonlinear elastic continuum model are determined using numerical experiments on representative volume elements of the polymer model. Furthermore, a simple model of initial strain is incorporated in the continuum equations to model the inherent shrinking of the polymer.

A coupled particle and continuum model is constructed using a special algorithm designed to provide constraints on a region of overlap between the continuum and particle models. This coupled model is based on the so-called Arlequin method, which was introduced in the context of coupling two continuum models with differing levels of discretization. It is shown that the Arlequin problem for the particle-to-continuum model is well posed in a one-dimensional setting involving linear harmonic springs coupled with a linearly elastic continuum. Several numerical examples are presented. Numerical experiments in three dimensions are also discussed in which the polymer model is coupled to a nonlinear elastic continuum.

Error estimates in local quantities of interest are constructed in order to estimate the modeling error due to the approximation of the particle model by the coupled multiscale surrogate model. The estimates of the error are computed by solving an auxiliary adjoint, or dual, problem that incorporates as data the quantity of interest or its derivatives. The solution of the adjoint problem indicates how the error in the approximation of the polymer model influences the error in the quantity of interest. The error in the quantity of interest represents the relative error between the value of the quantity evaluated for the base model, a quantity typically unavailable or intractable, and the value of the quantity of interest provided by the multiscale surrogate model. To estimate the error in the quantity of interest, a theorem is employed that establishes that the error coincides with the value of the residual functional acting on the adjoint solution plus a higher-order remainder.
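For a linear model, the residual-adjoint identity stated above holds exactly, which a toy two-unknown example can verify. The sketch below is illustrative only: the matrices are invented, and hand-rolled 2×2 solves stand in for the dissertation's actual solvers.

```python
# Hypothetical illustration of adjoint-based (goal-oriented) error
# estimation for a linear 2x2 model; all matrix values are invented.

def solve2(A, b):
    """Solve a 2x2 linear system A x = b by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    x0 = (b[0] * A[1][1] - A[0][1] * b[1]) / det
    x1 = (A[0][0] * b[1] - b[0] * A[1][0]) / det
    return [x0, x1]

def transpose(A):
    return [[A[0][0], A[1][0]], [A[0][1], A[1][1]]]

A_fine = [[4.0, 1.0], [1.0, 3.0]]   # "base" (fine-scale) model
A_surr = [[4.0, 0.0], [0.0, 3.0]]   # cheaper surrogate (coupling dropped)
f = [1.0, 2.0]                       # load
q = [1.0, 0.0]                       # quantity of interest: Q(u) = q . u

u  = solve2(A_fine, f)               # intractable in practice; only for checking
u0 = solve2(A_surr, f)               # surrogate solution
z  = solve2(transpose(A_fine), q)    # adjoint problem: A^T z = q

# Residual of the surrogate solution in the fine model: r = f - A_fine u0
r = [f[i] - sum(A_fine[i][j] * u0[j] for j in range(2)) for i in range(2)]

true_err = sum(q[i] * (u[i] - u0[i]) for i in range(2))
est_err  = sum(r[i] * z[i] for i in range(2))   # residual acting on adjoint
print(abs(true_err - est_err) < 1e-12)          # exact for linear models
```

For nonlinear models the same estimate picks up the higher-order remainder mentioned in the text.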
For each surrogate in the sequence of surrogates generated, the residual functional acting on various approximations of the adjoint is computed. These error estimates are used to construct an adaptive algorithm whereby the model is adapted by supplying additional fine-scale data in certain subdomains in order to reduce the error in the quantity of interest. The adaptation algorithm involves partitioning the domain and selecting which subdomains use the particle model, which use the continuum model, and where the two overlap. When the algorithm identifies that a region contributes a relatively large amount to the error in the quantity of interest, that region is scheduled for refinement by switching its model to the particle model. Numerical experiments on several configurations representative of nano-features in semiconductor device fabrication demonstrate the effectiveness of the error estimate in controlling the modeling error, as well as the ability of the adaptive algorithm to reduce the error in the quantity of interest.

There are two major conclusions of this study: (1) an effective and well-posed multiscale model that couples particle and continuum models can be constructed as a surrogate to molecular statics models of polymer networks, and (2) the modeling error for such systems can be estimated with sufficient accuracy to provide the basis for very effective multiscale modeling procedures. The methodology developed in this study provides a general approach to multiscale modeling, and the computational procedures, computer codes, and results could provide a powerful tool in understanding, designing, and optimizing an important class of semiconductor manufacturing processes. The study in this dissertation involves all three components of the CAM graduate program requirements: Area A, Applicable Mathematics; Area B, Numerical Analysis and Scientific Computation; and Area C, Mathematical Modeling and Applications.
The multiscale modeling approach developed here is based on the construction of continuum surrogates and their coupling to molecular statics models of the polymer, together with a posteriori estimates of error and their adaptive control. A detailed mathematical analysis is provided for the Arlequin method in the context of coupling particle and continuum models for a class of one-dimensional model problems. Algorithms are described and implemented that solve the adaptive, nonlinear problem posed by the multiscale surrogate. Large-scale, parallel computations for the base model are also shown. Finally, detailed studies of models relevant to semiconductor manufacturing applications are presented.

## Advanced navigation algorithms for precision landing (2007-12)
Zanetti, Renato, 1978-; Bishop, Robert H., 1957-

A detailed analysis of autonomous navigation algorithms to achieve autonomous precision landing is presented. The problem of integrated attitude determination and inertial navigation is solved, and the theoretical results are applied and tested in three different applications. Optimality conditions for constrained quaternion estimation using the Kalman filter are derived. It is common in spacecraft applications to separate attitude determination from the inertial navigation system. While this approach has worked in the past, it inevitably degrades navigation performance when the correlations between the two systems are not correctly accounted for. It is shown how to optimally include an attitude determination algorithm in the Kalman filter; when the conditions for optimality are not met, it is shown how to achieve sub-optimality by properly accounting for the correlation. The traditional approach to inertial navigation is to employ the inertial measurement unit (IMU) outputs to propagate the estimated states forward in time, rather than to use them to update the state.
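A toy one-dimensional sketch (all numbers invented) shows why propagate-only dead-reckoning is limiting: when accelerometer output is only integrated, never used to correct the state, its noise accumulates and the velocity uncertainty grows with the square root of the number of steps.

```python
# Toy 1-D dead-reckoning illustration with invented noise parameters:
# velocity is obtained by integrating a noisy accelerometer, so its
# propagated variance grows monotonically, step after step.
import random

random.seed(0)
dt, sigma_a = 0.1, 0.05          # step size [s], accel noise std [m/s^2]
true_v, est_v, var_v = 0.0, 0.0, 0.0

for _ in range(1000):
    a_true = 0.2                              # constant true acceleration
    a_meas = a_true + random.gauss(0.0, sigma_a)
    true_v += a_true * dt
    est_v  += a_meas * dt                     # dead-reckoning: integrate only
    var_v  += (sigma_a * dt) ** 2             # propagated velocity variance

print(round(var_v ** 0.5, 4))   # 1-sigma uncertainty after 1000 steps
```

A filter with an external measurement would shrink `var_v` at each update; pure propagation can only grow it.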
A detailed covariance analysis of dead-reckoning Mars entry navigation is performed. The contributions of various sources of IMU error are explicitly accounted for, and the filter performance is validated through Monte Carlo analysis. The drawback of dead-reckoning is that it prevents the inertial measurements from reducing the uncertainty of the estimated states. While this shortcoming can be compensated by the availability of other measurements, it becomes crucial when the IMU is the only sensor providing measurements, as happens, for example, during Mars atmospheric entry. In the second application of this work, IMU measurements from a NASA mission are processed in an extended Kalman filter, and the results are compared to dead-reckoning. It is shown that it is possible to reduce the uncertainty of the inertial states by filtering the IMU. The final application is lunar descent-to-landing navigation. In this example the IMU is filtered, and the algorithms that include an attitude estimate in the Kalman filter are tested. The design performance is confirmed by Monte Carlo analysis.

## Algorithmic techniques for improving the speed and accuracy of phylogenetic methods (2004)
Roshan, Usman Waheed; Warnow, Tandy

## Algorithms for analyzing parallel computations (2017-08)
Chauhan, Himanshu; Garg, Vijay K. (Vijay Kumar), 1963-; Julien, Christine; Mittal, Neeraj; Nikolova, Evdokia; Pingali, Keshav; Reddi, Vijay Janapa

Predicate detection is a powerful technique for verifying parallel programs. Verifying correctness of programs with this technique involves two steps: first we create a partial-order-based model, called a computation, of an execution of a parallel program, and then we check all possible global states of this model against a predicate that encodes a faulty behavior.
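As a hypothetical miniature of such a model, the consistent global states of a computation are exactly the down-sets (order ideals) of its partial order. The brute-force breadth-first enumeration below is not one of the dissertation's algorithms (in particular, it stores every visited state, so its space is not polynomial); it only illustrates what is being enumerated.

```python
# Naive BFS over consistent global states: each state is a set of
# executed events that is closed under the happened-before relation.
from collections import deque

# Invented example computation: events a1 -> a2 on process P1,
# b1 -> b2 on P2, plus a message edge a1 -> b2.
preds = {"a1": [], "a2": ["a1"], "b1": [], "b2": ["b1", "a1"]}

def bfs_consistent_states(preds):
    start = frozenset()
    seen, order = {start}, []
    queue = deque([start])
    while queue:
        ideal = queue.popleft()
        order.append(sorted(ideal))           # visited in order of #events
        for e, ps in preds.items():
            if e not in ideal and all(p in ideal for p in ps):
                nxt = ideal | {e}             # add one enabled event
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return order

states = bfs_consistent_states(preds)
print(len(states))   # number of consistent global states
```

Even this four-event example has 8 consistent global states; the exponential growth in the number of processes is why space-efficient enumeration matters.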
A partial order encodes many total orders, so even with one execution of the program we can reason over multiple possible alternate execution scenarios. This dissertation makes algorithmic contributions to predicate detection in three directions.

Enumerating all consistent global states of a computation is a fundamental requirement in predicate detection, and multiple algorithms have been proposed to perform this enumeration. Among these, the breadth-first search (BFS) enumeration algorithm is especially useful because it finds an erroneous consistent global state with the least number of events possible. The traditional algorithm for BFS enumeration of consistent global states was given more than two decades ago and is still widely used. This algorithm, however, requires space that in the worst case may be exponential in the number of processes in the computation. We give the first algorithm that performs BFS enumeration of the consistent global states of a computation in space that is polynomial in the number of processes.

Detecting a predicate on a computation is a hard problem in general, so devising efficient detection and analysis algorithms requires exploiting knowledge about the properties of the predicate. We present algorithms that exploit the properties of two classes of predicates, called stable and counting predicates, and provide a significant reduction, exponential in many cases, in the time and space required to detect them.

The technique of computation slicing creates a compact representation, called a slice, of all global states that satisfy a class of predicates called regular predicates. We present the first distributed and online algorithm to create a slice of a computation with respect to a regular predicate.
In addition, we give efficient algorithms to create slices for two important temporal logic formulas even when their underlying predicate is not regular, provided either the predicate or its negation is efficiently detectable.

## Algorithms for numerical modeling and inversion of multi-phase fluid flow and electromagnetic measurements (2005)
Alpak, Faruk Omer; Torres-Verdín, Carlos; Sepehrnoori, Kamy, 1951-

The focus of this dissertation is the estimation of petrophysical properties of rock formations based on the combined use of electromagnetic and fluid-flow measurements. Traditionally, borehole electromagnetic measurements are interpreted independently in terms of spatial variations of electrical resistivity, and the estimated variations are subsequently interpreted in terms of variations of fluid saturation and porosity. Such a strategy can lead to erroneous conclusions in the petrophysical evaluation of rocks, because the spatial distribution of electrical resistivity is often governed by the interplay between salt concentration, absolute permeability, relative permeability, and capillary pressure. To date, no consistent effort has used the physics of multi-phase fluid flow as the leading phenomenon in the interpretation of borehole electromagnetic measurements. This dissertation develops several efficient nonlinear inversion algorithms that quantitatively combine borehole electromagnetic and fluid-flow phenomena. These inversion algorithms also provide a measure of uncertainty and non-uniqueness in the presence of noisy and imperfect measurements. The combined use of electromagnetic and fluid-flow measurements drastically reduces the non-uniqueness and uncertainty of the estimated petrophysical parameters and therefore increases the accuracy of the estimates.
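The flavor of such nonlinear inversion can be sketched on a deliberately tiny example. The scalar exponential-decay forward model below is invented for illustration and stands in for the dissertation's coupled multi-phase-flow and electromagnetic simulators; a Gauss-Newton iteration recovers the model parameter from synthetic data.

```python
# Minimal Gauss-Newton inversion sketch on an invented scalar forward
# model g(m, t) = exp(-m t); not the dissertation's actual physics.
import math

times = [0.5, 1.0, 1.5, 2.0]
m_true = 0.8
data = [math.exp(-m_true * t) for t in times]     # noise-free synthetic data

def forward(m):
    return [math.exp(-m * t) for t in times]

def jacobian(m):                                   # dg/dm = -t * exp(-m t)
    return [-t * math.exp(-m * t) for t in times]

m = 0.1                                            # poor initial guess
for _ in range(20):
    r = [d - g for d, g in zip(data, forward(m))]  # data residual
    J = jacobian(m)
    # Gauss-Newton update for one parameter: dm = (J^T r) / (J^T J)
    m += sum(j * ri for j, ri in zip(J, r)) / sum(j * j for j in J)

print(round(m, 6))
```

With noisy data and many parameters, the normal equations above become ill-posed, which is where the uncertainty and non-uniqueness analysis discussed in the abstract comes in.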
Specific problems considered in this dissertation are the estimation of spatial distributions of porosity, permeability, and fluid saturation, as well as the estimation of relative permeability and capillary pressure. Joint and independent nonlinear inversions are performed for large-scale petrophysical properties from in-situ permanent sensor data, and for near-borehole-scale petrophysical variables of rock formations from wireline formation tester and electromagnetic induction logging measurements. For cases where fluid-flow-related measurements are absent, the coupled dual-physics inversion strategy allows quantitative interpretation of electromagnetic measurements consistent with the physics of fluid flow. It is conclusively shown that the simultaneous use of fluid-flow and electromagnetic data sets reduces non-uniqueness in the inverted petrophysical model.

## Algorithms for the analysis of whole genomes (2004)
Wyman, Stacia Kathleen; Kuipers, Benjamin; Jansen, Robert K.

With the advent of whole-genome sequencing, we now have an abundance of sequenced whole genomes, and we have entered an era in which algorithms can address problems at the whole-genome level. In the past, sequencing efforts often focused on a single gene, and existing algorithms therefore operate at the scale of a single gene. With whole-genome sequencing, we have access to sequence data for the entire genome of an organism or an organelle, and algorithms are needed for whole-genome analysis. In this research, we address new computational problems that have arisen out of the availability and abundance of whole-genome data. In genome annotation, all of the genes of a genome are located and identified in preparation for publication of the complete genome sequence. We address the problem of genome annotation with a software package that allows researchers to locate and identify all the genes in a genome and prepare the genome for direct submission to GenBank.
A difficult problem that arises in the annotation of organellar genomes is the identification of animal mitochondrial transfer RNA genes. We present an experimental evaluation of a set of methods (including our own) for identifying tRNAs. The problem of reconstructing phylogenies from gene-order data involves recreating the evolutionary history of a set of organisms based on the order and direction of the genes in their genomes, which can give insight into the mechanisms of large-scale evolutionary events. We present a new method for gene-order phylogeny reconstruction, as well as improvements to an existing method, and evaluate the results on both real and simulated datasets. Finally, we address the problem of identifying regulatory elements. Understanding gene expression is one of the most pressing unsolved problems in molecular biology today because gene expression controls all of the metabolic and developmental processes in an organism. We present a new method that uses a comparative genomics approach, made possible now that we have access to the complete DNA sequences of many sets of related organisms.

## Algorithms for VLSI design planning (2003)
Chen, Hung-ming; Mok, Aloysius Ka-Lau; Wong, D. F.

With shrinking feature sizes, many more transistors can be integrated on a single chip. Moore's Law has been followed closely in the past decades, resulting in larger and faster chips every year. In order to design larger and faster chips in deep submicron (DSM) technology, it is necessary to perform early design planning. In this dissertation, we present several algorithms for a number of VLSI design planning problems. First, we propose a method to integrate interconnect planning with floorplanning. Our approach is based on the Wong-Liu floorplanning algorithm: we perform pin assignment and fast global routing during every iteration of floorplanning.
We use a multi-stage simulated annealing approach in which different interconnect planning methods are used in different temperature ranges to reduce running time, with a temperature adjustment scheme designed to give smooth transitions between the stages of annealing. Second, floorplanning problems typically have a relatively small number of blocks (e.g., 50-100) but a large number of nets (e.g., 20K). Since existing floorplanning algorithms use simulated annealing, which must examine a large number of floorplans, interconnect-centric floorplanning is computationally very expensive. We present approaches that dramatically improve the run time on problems with large numbers of nets while at the same time improving solution quality. Third, we propose a method for simultaneous power supply planning and noise avoidance in floorplan design. Without careful power supply planning in layout design, chips suffer from signal integrity problems including IR-drop, ΔI noise, and IC reliability issues. Post-route methodologies for solving signal integrity problems have been applied, but they cause a long turn-around time, which adds costly delays to time-to-market. We show that noise avoidance in power supply planning can be formulated as a constrained maximum flow problem. Fourth, I/O placement has become a concern in modern IC design. Thanks to flip-chip and multi-chip module technologies, I/Os can be placed throughout the whole chip without long wires from the chip periphery. However, because of I/O placement constraints and I/O buffer site building cost, deciding the positions of I/O buffers has become critical. Our objective is to reduce the number of I/O buffer sites and to decide their positions in an existing standard cell placement.
We formulate it as a minimum-cost flow problem.

## Antenna and algorithm design in MIMO communication systems: exploiting the spatial selectivity of wireless channels (2006)
Forenza, Antonio; Heath, Robert W., Jr., 1973-

Cellular telephony and wireless Internet access are creating a growing demand for high-quality wireless communications. Unfortunately, the current wireless communication infrastructure is not fully equipped to offer these unprecedented data rates and quality of service. The major obstacles include limited bandwidth availability, limited transmit power, and fluctuations in signal strength that are intrinsic to the wireless channel. Future wireless standards rely on innovative core technologies such as multiple-input multiple-output (MIMO) communications to overcome these problems. The spatial dimension provided by antenna arrays at the transmitter and receiver of MIMO communication systems can be exploited by sophisticated signal processing techniques to offer high link capacity, enhanced resistance to interference, and robustness to channel fading. The benefits of MIMO technology are obtained through a combination of antenna arrays that provide spatial diversity and algorithms that adapt to the propagation channel. Antenna arrays have to be designed to be robust in different propagation scenarios and to provide the degrees of spatial diversity expected by the algorithms; the algorithms can adaptively reconfigure the transmission methods by tracking the changing channel conditions. The premise of the work presented in this dissertation is that antenna arrays and algorithms at the physical layer can be designed, based on performance metrics from different layers, to exploit the channel's spatial selectivity, resulting in improved system performance. This dissertation presents performance analysis and a design methodology for MIMO arrays employing the pattern diversity technique in spatially correlated channels.
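A back-of-the-envelope illustration of the link-capacity benefit of the spatial dimension: for a narrowband channel matrix H with equal power allocation, capacity is log₂ det(I + (SNR/Nt)·H·Hᵀ). The sketch below uses an invented real-valued 2×2 channel (real instead of complex purely for brevity) and compares it against a single-antenna link at the same SNR.

```python
# Toy MIMO capacity comparison; channel values and SNR are invented.
import math

H = [[1.0, 0.3], [0.2, 0.9]]      # hypothetical 2x2 channel gains
snr = 10.0                         # linear SNR

# M = I + (snr/2) * H * H^T for the 2x2 real case
hht = [[sum(H[i][k] * H[j][k] for k in range(2)) for j in range(2)]
       for i in range(2)]
M = [[(1.0 if i == j else 0.0) + (snr / 2) * hht[i][j] for j in range(2)]
     for i in range(2)]
det = M[0][0] * M[1][1] - M[0][1] * M[1][0]

c_mimo = math.log2(det)            # bits/s/Hz over two spatial streams
c_siso = math.log2(1 + snr)        # unit-gain single-antenna link
print(c_mimo > c_siso)             # the spatial dimension pays off
```

How much of this gain survives in practice depends on the spatial correlation of the channel, which is exactly what the array designs in this dissertation target.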
The proposed array designs consist of collocated circular patch antennas, or circular patch arrays (CPAs). The benefit of pattern diversity obtained through CPAs over the conventional space diversity technique is first demonstrated through analysis. Then a novel design methodology is proposed for compact MIMO arrays optimized, for given size constraints, with respect to microwave-theoretic and communication-theoretic metrics. This dissertation also presents adaptive algorithms at the physical layer that switch between different MIMO transmission schemes based on statistical channel information. These adaptive algorithms exploit the spatial selectivity inherent in the channel and are designed to enhance the spectral efficiency of next-generation wireless systems for a predefined target error rate.

## Approximation algorithms for NP-hard clustering problems (2002)
Mettu, Ramgopal Reddy; Plaxton, C. Greg

Given a set of n points and their pairwise distances, the goal of clustering is to partition the points into a "small" number of "related" sets. Clustering algorithms are used widely to manage, classify, and summarize many kinds of data. In this dissertation, we study the classic facility location and k-median problems in the context of clustering, and we formulate and study a new optimization problem that we call the online median problem. For each of these problems, it is known to be NP-hard to compute a solution with cost less than a certain constant factor times the optimal cost. We give simple constant-factor approximation algorithms for the facility location, k-median, and online median problems with optimal or near-optimal time bounds. We also study distance functions that are "approximately" metric, and show that such distance functions allow us to obtain a faster online median algorithm and to generalize our analysis to other objective functions, such as that of the well-known k-means heuristic.
Given n points, the associated interpoint distances, nonnegative point weights, and a nonnegative penalty for each point, the facility location problem asks us to identify a set of cluster centers so that the weighted average cluster radius and the sum of the cluster center penalties are together minimized. The k-median problem asks us to identify exactly k cluster centers while minimizing just the weighted average cluster radius. We give a simple greedy algorithm for the facility location problem that runs in O(n²) time and produces a solution with cost at most 3 times optimal. For the k-median problem, we develop and make use of a sampling technique that we call successive sampling, and we give a randomized constant-factor approximation algorithm that runs in O(n(k + log n + log² n)) time. We also give an Ω(nk) lower bound on the running time of any randomized constant-factor approximation algorithm for the k-median problem that succeeds with even a negligible constant probability. In many settings it is desirable to browse a given data set at differing levels of granularity (i.e., numbers of clusters). To address this concern, we formulate a generalization of the k-median problem that we call the online median problem: compute an ordering of the points so that, over all i, when a prefix of length i is taken as a set of cluster centers, the weighted average radius of the induced clusters is minimized. We show that a natural generalization of the greedy strategy, which we call hierarchically greedy, yields an algorithm that produces an ordering in which every prefix is within a constant factor of the associated optimal cost; furthermore, our algorithm has a running time of Θ(n²). Finally, we study the performance of our algorithms in practice.
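The prefix idea can be illustrated with the classic farthest-point greedy, a much simpler relative of the hierarchically greedy strategy. This sketch is not the dissertation's algorithm and carries no approximation guarantee for the online median objective; it only shows what an incremental ordering of centers looks like, on an invented one-dimensional instance.

```python
# Farthest-point greedy: emit points one at a time so that every prefix
# can serve as a set of cluster centers.  Illustrative substitute only.
def greedy_center_ordering(points):
    order = [points[0]]                       # arbitrary first center
    rest = points[1:]
    while rest:
        # pick the point farthest from its nearest already-chosen center
        far = max(rest, key=lambda p: min(abs(p - c) for c in order))
        order.append(far)
        rest.remove(far)
    return order

pts = [0.0, 1.0, 9.0, 10.0, 5.0]              # toy 1-D instance
print(greedy_center_ordering(pts))
```

Each prefix of the output spreads its centers over the data; the hierarchically greedy algorithm refines this intuition to achieve a constant-factor guarantee on every prefix simultaneously.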
We present implementations of our k-median and online median algorithms; our experimental results indicate that our approximation algorithms may be useful in practice.

## Automating transformations from floating-point to fixed-point for implementing digital signal processing algorithms (2006)
Han, Kyungtae; Evans, Brian L. (Brian Lawrence), 1965-

Many digital signal processing and communication algorithms are first simulated using floating-point arithmetic and later transformed into fixed-point arithmetic to reduce implementation complexity. This transformation process may take more than 50% of the design time for complex designs. In addition, wordlengths in fixed-point designs may be altered at later stages in the design cycle, and different choices of wordlength lead to different tradeoffs between signal quality and implementation complexity. In this dissertation, I propose two methods for characterizing these tradeoffs during the transformation of digital system designs to fixed-point arithmetic and variables. The first method, a gradient-based search for single-objective optimization with sensitivity information, scales linearly with the number of variables but can become trapped in local optima. In wordlength design case studies for a wireless communication demodulator, adding sensitivity information reduces the search time by a factor of four and yields a design with 30% lower implementation cost. The second method, a genetic algorithm for multi-objective optimization, provides a Pareto-optimal front that evolves towards the optimal tradeoff curve of signal quality vs. implementation complexity; this second method can be used to fully characterize the design space. I also propose wordlength reduction methods of signed right shift and truncation to reduce power consumption in a given hardware architecture.
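The truncation form of wordlength reduction can be sketched in a few lines. The 16-bit framing below is illustrative, not the dissertation's exact setup: keeping only the top bits of a sample zeroes the discarded LSBs, trading bounded quantization error for narrower, cheaper arithmetic.

```python
# Toy wordlength truncation: keep the top `keep_bits` of a 16-bit
# sample by zeroing the low-order bits.  Invented example values.
def truncate(sample_16bit, keep_bits):
    drop = 16 - keep_bits
    return (sample_16bit >> drop) << drop     # zero the discarded LSBs

x = 12345                                     # a 16-bit sample
y = truncate(x, 8)
print(y, x - y)                               # error is bounded by 2**drop
```

Because fewer input bits toggle, fewer gates switch in a downstream multiplier, which is the power-saving mechanism quantified in the abstract.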
For each method, I derive the expected number of gates that switch during multiplication of the inputs. I apply the signed right shift and truncation methods to a 16-bit radix-4 modified Booth multiplier and a 16-bit Wallace multiplier. The truncation method with 8-bit operands reduces power consumption by 56% in the Wallace multiplier and 31% in the Booth multiplier. The signed right shift method shows a 25% power reduction in the Booth multiplier, but no power reduction in the Wallace multiplier. Finally, this dissertation describes a method to automate design assistance for the transformation from floating-point to fixed-point data types. Floating-point programs are converted to fixed-point programs by a code generator; the proposed wordlength search algorithms then give designers the freedom to choose data wordlengths that optimize the tradeoff between signal quality and implementation complexity.

Item CAD algorithms for VLSI design and manufacturing(2003) Huang, Li-da; Wong, D. F.; Mok, Aloysius Ka-Lau.

Item CAD for nanolithography and nanophotonics(2011-08) Ding, Duo; Pan, David Z.; Chen, Ray T.; Ghosh, Joydeep; Orshansky, Michael E.; Torres, J. Andres; Touba, Nur

As the semiconductor technology roadmap further extends, the development of next-generation silicon systems becomes critically challenged. On the one hand, design and manufacturing closure becomes much more difficult because of the widening gap between increasing integration density and limited manufacturing capability; as a result, manufacturability issues become increasingly critical in the design of reliable silicon systems. On the other hand, the continued scaling of feature size imposes critical power, delay, and bandwidth concerns on traditional interconnect materials (Cu/low-k dielectrics); as a result, multiple classes of new materials are under research and development for future technology generations.
In this dissertation, we investigate several critical Computer-Aided Design (CAD) challenges under advanced nanolithography and nanophotonics technologies, and propose systematic CAD methodologies and optimization techniques to assist the design of high-yield, high-performance integrated circuits (ICs) with low power consumption. In Very Large Scale Integration (VLSI) CAD for nanolithography, we study manufacturing variability under resolution enhancement techniques (RETs) and explore two important topics: (1) fast, high-fidelity lithography hotspot detection; and (2) generic, efficient manufacturability-aware physical design. For the first topic, we propose a number of CAD optimization and integration techniques to achieve: (a) high hotspot detection accuracy; (b) a low false-positive rate (hotspot false alarms); (c) a good ability to trade off detection accuracy against false alarms; (d) fast CPU runtime; and (e) excellent layout coverage and computational scalability as designs grow more complex. For the second topic, we explore the routing stage by incorporating post-RET manufacturability models into the mathematical formulation of a detailed router, achieving: (a) significantly fewer lithography-unfriendly patterns; (b) small CPU runtime overhead; and (c) a formulation general enough to be compatible with all types of RETs and evolving manufacturing conditions. In VLSI CAD for nanophotonics, we focus on three topics: (1) characterization and evaluation of standard on-chip nanophotonic devices; (2) low-power planar routing for on-chip opto-electrically interconnected systems; and (3) power-efficient, thermally reliable design of nanophotonic Wavelength Division Multiplexing for ultra-high-bandwidth on-chip communication.
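At its most naive, hotspot detection reduces to matching known bad patterns against layout windows. The sketch below (hypothetical names, exact matching on a toy 0/1 layout grid) illustrates the baseline that makes the accuracy, false-alarm, and scalability goals above challenging; it is not the detection technique proposed in the dissertation:

```python
def find_hotspots(layout, patterns):
    """Naive hotspot detection sketch: slide each known hotspot pattern
    (a small 0/1 grid) over the layout and report every window position
    that matches it exactly.  Real detectors must generalize beyond
    exact matches without exploding the false-alarm rate."""
    H, W = len(layout), len(layout[0])
    hits = []
    for p in patterns:
        ph, pw = len(p), len(p[0])
        for i in range(H - ph + 1):
            for j in range(W - pw + 1):
                if all(layout[i + a][j + b] == p[a][b]
                       for a in range(ph) for b in range(pw)):
                    hits.append((i, j))
    return hits
```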
With simulations and experiments, we demonstrate the critical role and effectiveness of Computer-Aided Design techniques as the semiconductor industry marches into the deep sub-micron (45nm and below) domain.

Item Closed-loop subspace identification and fault diagnosis with optimal structured residuals(2005) Lin, Weilu; Qin, S. Joe

The development of system identification and fault diagnosis theory is of great practical significance. The systems of concern span a broad spectrum of human-made machinery, including industrial production facilities (power plants, chemical plants, oil refineries, semiconductor fabrication plants, steel mills, paper mills, etc.), transportation vehicles (ships, airplanes, automobiles), and household appliances (heating/air-conditioning equipment, refrigerators, washing machines, etc.). This dissertation focuses on subspace identification algorithms and the optimal structured residuals approach for process modeling and diagnosis. The main contributions of this work are: 1. Novel subspace identification methods (SIMs) with enforced causal models are implemented. The proposed algorithms are shown to have lower estimation variance than traditional SIMs, and a rigorous analysis shows that they are consistent under certain assumptions. 2. The feasibility of closed-loop subspace identification is investigated, and novel closed-loop subspace identification methods based on innovation estimation are proposed. The new algorithms are shown to be consistent under closed-loop conditions, where traditional SIMs fail to provide consistent estimates. 3. A new optimal structured residuals (OSR) approach for unidirectional fault diagnosis is proposed, together with necessary and sufficient conditions for unidirectional fault isolability under the OSR approach. 4. The OSR approach is extended from unidirectional to multidimensional fault diagnosis.
The sufficient condition for deterministic multidimensional fault isolability is investigated.

Item Co-clustering algorithms : extensions and applications(2008-08) Cho, Hyuk; Dhillon, Inderjit S.

Co-clustering is a relatively recent paradigm for unsupervised data analysis, but it has become increasingly popular because of its potential to discover latent local patterns that usual unsupervised algorithms such as k-means leave unapparent. Wide deployment of co-clustering, however, requires addressing a number of practical challenges such as data transformation, cluster initialization, and scalability. This thesis therefore focuses on developing co-clustering methodologies to maturity, with the ultimate goal of establishing co-clustering as an invaluable and indispensable unsupervised analysis tool for varied practical applications. To achieve this goal, we pursue three specific tasks: (1) development of co-clustering algorithms that are functional, adaptable, and scalable (co-clustering algorithms); (2) extension of co-clustering algorithms to incorporate application-specific requirements (extensions); and (3) application of co-clustering algorithms to existing and emerging problems in practical application domains (applications). As for co-clustering algorithms, we develop two fast Minimum Sum-Squared Residue Co-clustering (MSSRCC) algorithms [CDGS04], which simultaneously cluster data points and features via an alternating minimization scheme and generate co-clusters in a “checkerboard” structure. The first captures co-clusters with constant values, while the other discovers co-clusters with coherent “trends” as well as constant values. We note that the proposed algorithms are two special cases (bases 2 and 6 with Euclidean distance, respectively) of the general co-clustering framework, Bregman Co-clustering (BCC) [BDG+07], which contains six Euclidean BCC and six I-divergence BCC algorithms.
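The alternating-minimization idea behind the constant-value case can be sketched briefly. The following is a minimal illustration with a naive contiguous initialization and hypothetical names, not the dissertation's MSSRCC with its data transformations, spectral initialization, and local search:

```python
def mssrcc(A, k, l, iters=20):
    """Minimal checkerboard co-clustering (constant-value co-clusters):
    alternately reassign row and column labels so as to shrink the sum
    of squared residues against each co-cluster's mean."""
    m, n = len(A), len(A[0])
    r = [i * k // m for i in range(m)]   # naive contiguous row init
    c = [j * l // n for j in range(n)]   # naive contiguous column init

    def means():
        s = [[0.0] * l for _ in range(k)]
        cnt = [[0] * l for _ in range(k)]
        for i in range(m):
            for j in range(n):
                s[r[i]][c[j]] += A[i][j]
                cnt[r[i]][c[j]] += 1
        return [[s[p][q] / cnt[p][q] if cnt[p][q] else 0.0
                 for q in range(l)] for p in range(k)]

    for _ in range(iters):
        mu = means()
        for i in range(m):   # reassign each row to its best row cluster
            r[i] = min(range(k), key=lambda p: sum(
                (A[i][j] - mu[p][c[j]]) ** 2 for j in range(n)))
        mu = means()
        for j in range(n):   # then each column to its best column cluster
            c[j] = min(range(l), key=lambda q: sum(
                (A[i][j] - mu[r[i]][q]) ** 2 for i in range(m)))
    return r, c
```

On a matrix whose rows and columns fall into two clean blocks, the sketch recovers the checkerboard exactly; the naive initialization is exactly the kind of weakness the dissertation's spectral initialization and local search are designed to overcome.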
Then, we substantially enhance the performance of the two MSSRCC algorithms, escaping poor local minima and resolving the degeneracy problem of empty clusters in partitional clustering algorithms, through three strategies: (1) data transformation; (2) deterministic spectral initialization; and (3) a local search strategy. Concerning co-clustering extensions, we investigate general algorithmic strategies for the BCC framework, since it is applicable to a large class of distance measures and data types. We first formalize various data transformations for datasets with varied scaling and shifting factors, mathematically justify their effects on the six Euclidean BCC algorithms, and empirically validate the analysis. We also adapt the local search strategy, initially developed for the two MSSRCC algorithms, to all twelve BCC algorithms. Moreover, we consider variations of cluster assignment and cluster update, including greedy vs. non-greedy cluster assignment and online vs. batch cluster update. Furthermore, to provide better scalability and usability, we parallelize all twelve BCC algorithms so that they can co-cluster large-scale datasets over multiple processors. Regarding co-clustering applications, we extend the functionality of BCC to incorporate application-specific requirements: (1) discovery of inverted patterns, whose goal is to find anti-correlation; (2) discovery of coherent co-clusters from noisy data, whose purpose is dimensionality reduction and feature selection; and (3) discovery of patterns from time-series data, whose aim is to preserve critical time-locality. Furthermore, we apply co-clustering to pervasive computing for mobile devices, where the task is to extract latent patterns from usage logs and to recognize specific situations of mobile-device users.
Finally, we demonstrate the applicability of the proposed algorithms to the aforementioned applications through empirical results on various synthetic and real-world datasets. In summary, we present co-clustering algorithms to discover latent local patterns, propose algorithmic extensions to incorporate specific requirements, and provide applications to a wide range of practical domains.

Item Composition-guided image acquisition(2004) Banerjee, Serene; Evans, Brian L.

To make a picture more appealing, professional photographers apply a wealth of photographic composition rules, of which amateur photographers are often unaware. This dissertation aims at providing in-camera feedback to the amateur photographer while taking pictures. The proposed algorithms do not depend on prior knowledge of the indoor/outdoor setting or scene, and are amenable to software implementation on the fixed-point programmable digital signal processors available in digital still cameras. The key enabling step in automating photographic composition rules is locating the main subject. Digital still image acquisition maps the 3-D world onto a 2-D picture. From the 2-D picture alone, segmenting the main subject without prior knowledge of the scene is ill-posed; even with prior knowledge, segmentation is often computationally intensive and error prone. This dissertation defends the idea that reliable main subject segmentation without prior knowledge of scene and setting may be achieved by acquiring a single picture in which the optical system blurs objects not in the plane of focus. After segmentation, photographic composition rules may be automated; in this context, segmentation only needs to approximately, not precisely, locate the main subject. In this dissertation, I combine optical and digital image processing to segment the main subject without prior knowledge of the scene.
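The defocus cue can be illustrated with a toy one-pass filter: since blur suppresses sharp local intensity differences, thresholding a simple sharpness measure already separates an in-focus region from a defocused background. The sketch below, with hypothetical names and threshold, is only a stand-in for the dissertation's actual one-pass segmentation algorithm:

```python
def segment_in_focus(img, threshold=4.0):
    """One-pass sharpness sketch: flag a pixel as part of the main
    subject when the largest absolute intensity difference to a
    4-neighbour exceeds a threshold; optical defocus blur suppresses
    such sharp differences in the background."""
    h, w = len(img), len(img[0])
    mask = [[False] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            diffs = [abs(img[i][j] - img[i + di][j + dj])
                     for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))
                     if 0 <= i + di < h and 0 <= j + dj < w]
            mask[i][j] = max(diffs) > threshold
    return mask
```

On a smooth (blurred) background with one sharp feature, only the sharp feature and its immediate border are flagged, which is all composition rules need: an approximate, not precise, subject location.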
In particular, I propose to acquire a picture in which the main subject is in focus and the shutter aperture is fully open, so that the lens optics blur any object not in the plane of focus. For the acquired picture, I develop a computationally simple one-pass algorithm to segment the main subject. The post-segmentation objective is to automate selected photographic composition rules. The algorithms can be applied either to the picture in which out-of-focus objects are blurred, or to a user-intended picture with the same focal length settings. In this way, in-camera feedback can be provided to the amateur photographer in the form of alternate compositions of the same scene. I automate three photographic composition rules: (1) placement of the main subject obeying the rule of thirds; (2) background blurring to simulate main-subject motion or to decrease the depth of field of the picture; and (3) detection and mitigation of mergers, where equally focused main-subject and background objects merge into one object. The primary contributions of this dissertation are in digital still image processing: first, the automation of main subject segmentation in a single still picture assisted by optical pre-processing; second, the automation of main subject placement, artistic background blur, and merger detection and mitigation to improve photographic composition.

Item Designing algorithms that assist people to ask visual questions(2019-08-16) Wang, Yanan, M.S. in Information Studies; Gurari, Danna

Visual question answering services can help people with visual impairments answer their visual questions by supporting them to submit an image and a question. However, people with visual impairments may not know exactly what is captured in their images, and their questions may not be explicit and specific enough to be answered accurately.
In this project, we designed several algorithms to assist people by identifying when to instruct them to take a higher-quality image, ask a valid question, or clarify the object of interest. This can help people make their visual questions answerable with a single, unambiguous answer. The algorithms are evaluated on accuracy and demonstrated to be effective. Finally, we designed a system that lets users interact with our algorithms, and we will implement it for a user study in future work.

Item Dynamic, distributed resource allocation on regular SW-banyans(1986) Feo, John T., 1955-; Browne, James C.

In order to achieve the orders-of-magnitude increases in speed and fault tolerance desired by many of today's users, it is necessary to execute the parallel subtasks of processes concurrently. One class of computer systems able to achieve such high-performance parallel computing is configurable multiprocessors. Such systems can be both flexible and fast; how flexible and fast, however, is determined by the properties of their interconnection networks and by the efficiency and robustness of the algorithms used to configure those networks into disjoint subsystems, each with the resources and architecture desired by the intended user. This dissertation establishes a basis for configuring regular SW-banyans (a large class of configurable networks) that is both theoretically sound and practical in the context of an existing system such as the Texas Reconfigurable Array Computer. In addition, specific algorithms to construct a wide variety of architectures on regular SW-banyans are presented. These algorithms are dynamic, distributed, of complexity N log N or better, and work correctly in the presence of faults.

Item Fundamental algorithms for physical design planning of VLSI(2002) Tang, Xiaoping; Wong, D. F.

Rapid advances in semiconductor technologies have led to a dramatic increase in the complexity of VLSI circuits.
With fabrication technology entering the deep-submicron era, devices are scaled down, more functionality is integrated into one chip, and chips run at higher clock frequencies. Handling extremely large designs with ever-increasing clock rates, while reducing design turnaround time and ensuring timing convergence, is exhausting the capabilities of traditional design tools. Careful up-front design planning, which analyzes physical implementation effects before the actual place-and-route, is therefore essential in designing today's multi-million-gate and tomorrow's billion-gate ICs. We study several fundamental problems of VLSI physical design planning in this dissertation. In floorplanning, we develop a floorplanner, FAST-SP, based on the sequence-pair floorplan representation. A new approach is proposed to evaluate a sequence pair by computing the longest common subsequence of a pair of weighted sequences; our evaluation algorithm is significantly faster than the previous graph-based algorithm. For all MCNC benchmark problems, we obtain the best results reported in the literature with significantly less runtime. Placement constraints are important in floorplanning. We consider fixed-frame floorplanning and extend FAST-SP to handle placement constraints such as pre-placed, range, boundary, abutment, alignment, rectilinear-shape, and performance constraints, and we propose a uniform method to deal with all of them. Buffer planning is a key component of design planning. We present a new approach to buffer planning based on network flow computation; our algorithm optimally solves, in polynomial time, the problem of inserting the maximum number of buffers into the free space between circuit blocks at minimum total cost. Buffered routing tree construction is essential in wire planning.
We consider the problem of constructing routing trees with simultaneous buffer insertion and wire sizing in the presence of routing and buffer obstacles. A new graph-based approach is proposed, and both theoretical and experimental results show that it outperforms the previous dynamic-programming-based algorithm by a large margin. Routing resource allocation is another key issue in wire planning: we need to minimize the use of routing resources while satisfying delay constraints. We study this problem and extend the graph-based algorithm to solve it. We also develop a hierarchical approach to buffered routing tree construction that trades off runtime against routing quality.

Item GIS algorithms for large watersheds with non-contributing areas(Center for Research in Water Resources, University of Texas at Austin, 2001-12) Figurski, Melissa Jane; Maidment, David R.
