Browsing by Subject "Compilers (Computer programs)"
Now showing 1 - 6 of 6
Item: Adaptive predication via compiler-microarchitecture cooperation (2007)
Kim, Hyesoon, 1974-; Patt, Yale N.
Even after decades of research in branch prediction, branch predictors still remain imperfect, which results in significant performance loss in aggressive processors that support large instruction windows and deep pipelines. Predicated execution can reduce the number of branch mispredictions by eliminating hard-to-predict branches. However, the additional instruction overhead and data dependencies due to predicated execution sometimes offset the performance benefits of having fewer mispredictions. This dissertation presents two cooperative compiler-microarchitecture mechanisms to reduce the branch misprediction penalty by combining predicated execution and branch prediction. The first mechanism is a set of new control flow instructions, called wish branches. With wish branches, the compiler generates code that can be executed either as normal branch code or as predicated code. At run-time, the hardware chooses between normal branch code and predicated code based on the run-time branch behavior and the estimated run-time effectiveness of each solution. The results show that wish branches can significantly improve both performance and energy efficiency compared to predication or branch prediction. To provide the benefit of predicated code to non-predicated Instruction Set Architectures (ISAs) and to increase the benefit of predicated execution beyond the benefit of wish branches, this dissertation also presents and evaluates the Diverge-Merge Processor (DMP) architecture. In the diverge-merge processor, the compiler analyzes the control-flow graphs of the program and marks branches suitable for dynamic predication (called diverge branches) and their corresponding control-flow merge points. The hardware not only chooses whether to use branch prediction or predication, but also decides "which" instructions after a branch should be predicated based on run-time branch behavior. This solution significantly reduces the overhead of predicated code and allows a very large set of control-flow graphs to be predicated, neither of which was possible previously because predication was performed statically without any run-time information. This dissertation compares DMP with all other major previously-proposed branch processing paradigms available in the literature in terms of performance, power, energy consumption, and complexity. The results show that DMP is the most energy-efficient and high-performance paradigm for branch handling. Code generation algorithms for the DMP architecture and cost-benefit analysis models of dynamic predication are also evaluated.
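The trade-off between branch prediction and predicated execution described in this abstract can be seen in a small example. The sketch below (plain C++, with illustrative function names of our own choosing, not code from the dissertation) contrasts a branch that must be predicted with an if-converted version that removes the branch at the cost of always executing both paths; wish branches let the compiler emit both forms so the hardware can pick between them at run time.

    // Branch version: the processor must predict (x < 0); a misprediction
    // flushes the pipeline.
    int abs_branch(int x) {
        if (x < 0)
            x = -x;
        return x;
    }

    // If-converted (predicated-style) version: no branch to predict, but the
    // negation is always computed, which is the instruction overhead that can
    // offset the benefit when the branch is actually easy to predict.
    int abs_predicated(int x) {
        int neg = -x;
        return (x < 0) ? neg : x;   // typically lowered to a conditional select
    }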
Item: Braids: out-of-order performance with almost in-order complexity (2007-12)
Tseng, Francis, 1976-; Patt, Yale N.
There is still much performance to be gained by out-of-order processors with wider issue widths. However, traditional methods of increasing issue width do not scale; that is, they drastically increase design complexity and power requirements. This dissertation introduces the braid, a compile-time generated entity that enables the execution core to scale to wider widths by exploiting the small fanout and short lifetime of values produced by the program. A braid captures dataflow and register usage information of the program which are known to the compiler but are not traditionally conveyed to the microarchitecture through the instruction set architecture. Braid processing requires identification by the compiler, minor augmentations to the instruction set architecture, and support by the microarchitecture. The execution core of the braid microarchitecture consists of a number of braid execution units (BEUs). The BEU is tailored to efficiently carry out the execution of a braid in an in-order fashion. Each BEU consists of a FIFO scheduler, a busy-bit vector, two functional units, and a small internal register file. The braid microarchitecture provides a number of opportunities for the reduction of design complexity. It reduces the port requirements of the renaming mechanism, it simplifies the steering process, it reduces the area, size, and port requirements of the register file, and it reduces the paths and port requirements of the bypass network. The complexity savings result in a design characterized by a lower power requirement, a shorter pipeline, and a higher clock frequency. On an 8-wide design, executing braids achieves performance within 9% of a very aggressive conventional out-of-order microarchitecture while retaining the complexity of an in-order implementation. Three bottlenecks are identified in the braid microarchitecture, and a solution is presented to address each. The limitation on braid size is addressed by dynamic merging. The underutilization of braid execution resources caused by long-latency instructions is addressed by context sharing. The poor utilization of braid execution resources caused by single-instruction braids is addressed by heterogeneous execution resources.
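As a concrete, hypothetical illustration of what a braid captures, the fragment below contains two temporaries that are each defined once, used once, and dead immediately afterwards; instructions like these could be grouped by the compiler and executed in order inside a single BEU, keeping the short-lived values in its small internal register file rather than in the global register file. The code is our own example, not taken from the dissertation.

    // t1 and t2 have a fanout of one and a lifetime of a single instruction,
    // the property the braid microarchitecture exploits.
    int blend(int a, int b, int c, int e) {
        int t1 = a + b;   // consumed only by the next statement
        int t2 = t1 * c;  // consumed only by the return expression
        return t2 - e;    // end of the chain; no temporary escapes
    }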
Item: Compiler directed speculation for embedded clustered EPIC machines (2004)
Pillai, Satish; Jacome, Margarida F.
Very Large Instruction Word (VLIW)/Explicitly Parallel Instruction Computing (EPIC) processors are a very attractive platform for many of today's multimedia and communications applications. In particular, clustered VLIW/EPIC machines can take aggressive advantage of the available instruction level parallelism (ILP) while maintaining high energy-delay efficiency. However, multicluster machines are more challenging to compile to than centralized machines. In this thesis, we propose a novel compiler-directed, resource-aware ILP extraction technique, called predicated switching, that is targeted towards such multicluster VLIW/EPIC machines. The proposed technique integrates three powerful ILP extraction techniques, namely predication, speculation, and software pipelining, in a combined framework. The three novel contributions in this dissertation are: (1) a compiler transformation, denoted Static Single Assignment - Predicated Switching (SSA-PS), that leverages required data transfers between clusters for performance gains; (2) a static speculation algorithm to decide which specific kernel operations should actually be speculated in a region of code (hyperblock), possibly being simultaneously software pipelined, so as to maximize execution performance on the target processor; and (3) an ILP extraction flow incorporating several code generation phases critical to profitable ILP extraction by the compiler. Experimental results on a representative set of time-critical kernels compiled for a number of target machines show that, when compared to two baseline "resource-unaware" speculation techniques (one that speculates aggressively and one that speculates conservatively), predicated switching improves performance with respect to at least one of the baselines in 65% of the cases, by up to 50%. Moreover, we show that code size and register pressure are not adversely affected by our technique. Finally, we show that our ILP extraction framework combining speculation and software pipelining can effectively exploit the relative merits of both techniques.

Item: Correct implementation of network protocols (2004)
McGuire, Tommy Marcus; Gouda, Mohamed G., 1947-
A number of issues combine to make network protocol development significantly more difficult than other areas of computer programming: problems with time, concurrency, and failures; interactions between the network protocol and its environment; and obstacles in developing the protocol over time. In order to address these issues, we introduce the Timed Abstract Protocol notation and the Austin Protocol Compiler. The Timed Abstract Protocol, or TAP, notation is a domain-specific formal language for describing asynchronous message-passing network protocols, with two execution models: an abstract execution model and a concrete execution model. The abstract execution model is suited for protocol design, comprehension, and correctness verification. The concrete execution model is suited for protocol implementation. We show that the two models are equivalent: a protocol interpreted under the concrete model preserves the intended behavior of the protocol interpreted under the abstract model. The Austin Protocol Compiler, or APC, is a system that transforms a protocol given in the Timed Abstract Protocol notation into executable C code and provides a runtime environment for the protocol. In order to demonstrate the effectiveness of the TAP notation and APC, we present implementations of a secure encryption key exchange protocol, a failure discovery protocol, and a Domain Name System server. While discussing the latter, we examine the performance of the APC implementation and show that it is comparable to two other DNS servers. The combination of the Timed Abstract Protocol notation and the Austin Protocol Compiler addresses the issues of network protocol development by allowing precise and verifiable descriptions of protocols which can easily be made executable, in order both to gain experimental experience and to provide reference implementations.

Item: Incorporating domain-specific information into the compilation process (2003-05)
Guyer, Samuel Zev; Lin, Yun Calvin
Despite many advances in compiler research, traditional compilers continue to suffer from one significant limitation: they only recognize the low-level primitive constructs of their languages. In contrast, programmers increasingly benefit from higher level software components, which implement a variety of specialized domains, everything from basic file access to 3D graphics and parallel programming. The result is a marked difference between the level of abstraction in software development and the level of abstraction in compilation. In this thesis we present the Broadway compiler, which closes this gap. Broadway represents a new kind of compiler, called a library-level compiler, that supports domain-specific compilation by extending the benefits of compiler support to software libraries. The key to our approach is a separate annotation language that conveys domain-specific information about libraries to our compiler, allowing it to treat library routines more like built-in language operations. Using this information, the compiler can perform library-level program analysis and apply library-level optimizations. We explore both the opportunities and challenges presented by library-level compilation. We show that library-level optimizations can increase the performance of several parallel programs written using a highly-tuned parallel linear algebra library. These high-level optimizations are beyond the capabilities of a traditional compiler and even rival the performance of programs hand-coded by an expert. We also show that our compiler is an effective tool for detecting a range of library-level errors, including several significant security vulnerabilities. Finally, we present a new client-driven pointer analysis algorithm, which provides precise and scalable program analysis to meet the demanding requirements of library-level compilation.
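To make the idea of library-level analysis concrete, the sketch below shows the kind of usage rule a library-level compiler could check once it is told, through annotations, how a library's routines relate to one another; a conventional compiler sees only opaque calls. The routine names and the rule are hypothetical stand-ins, not Broadway's actual annotation language or any real library API.

    #include <cstdio>

    // Hypothetical library routines used only for illustration.
    void* lib_open(const char* path) {             // may return nullptr on failure
        return std::fopen(path, "rb");
    }
    int lib_read(void* handle, char* buf, int n) { // requires a successfully opened handle
        return static_cast<int>(std::fread(buf, 1, n, static_cast<std::FILE*>(handle)));
    }

    char header[64];

    int main(int argc, char** argv) {
        void* h = lib_open(argc > 1 ? argv[1] : "input.dat");
        // Missing "if (h == nullptr)" check: invisible to a compiler that only
        // understands language primitives, but reportable by one that knows the
        // library-level rule "lib_read requires a handle returned successfully
        // by lib_open".
        lib_read(h, header, sizeof(header));
        return 0;
    }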
Item: Language and compiler support for mixin programming (2002-05)
Cardone, Richard Joseph; Lin, Yun Calvin
The need to reduce the cost of software development and maintenance has been a constant and overriding concern since the advent of electronic computing. The difficulty, and therefore the expense, of programming large software applications is due to the complex interactions and interdependencies in application code. These interdependencies increase costs by making code hard to understand, hard to change, and hard to reuse. For over half a century, the need to reduce code complexity has been the driving force behind the trend to program at higher levels of abstraction with increased code modularity. This dissertation takes a step towards increasing code modularity by showing that mixin generic types can be used effectively to build applications from reusable software components. First, we address issues of language definition and integration. We show how mixins can be integrated into a modern programming language to support a methodology of incremental software construction. We identify novel language and compiler features that make programming with mixins convenient and efficient. Second, we address issues of implementation and evaluation. We implement a critical subset of mixin language support in a compiler. We then use our compiler to show that mixins increase code reuse compared to current technologies; to show that application development and maintenance can be simplified using mixins; and to show that our novel language features simplify mixin programming. In addition, we discuss language implementation issues and define a new design pattern useful in mixin programming.
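The following sketch illustrates mixin-style composition using ordinary C++ class templates (a parameterized class whose superclass is left open); it is only a generic illustration of the programming model the dissertation supports with dedicated language and compiler features, not the dissertation's own notation.

    #include <iostream>
    #include <string>

    // Each mixin adds one feature to an arbitrary base class.
    template <class Base>
    struct WithLogging : Base {
        void log(const std::string& msg) {
            std::cout << Base::name() << ": " << msg << "\n";
        }
    };

    template <class Base>
    struct WithCounter : Base {
        int count = 0;
        void tick() { ++count; }
    };

    // A small core component, reused below in different combinations.
    struct Widget {
        static std::string name() { return "Widget"; }
    };

    using LoggedWidget = WithLogging<Widget>;
    using FullWidget   = WithCounter<WithLogging<Widget>>;

    int main() {
        FullWidget w;
        w.tick();
        w.log("ticked once");   // prints "Widget: ticked once"
        return 0;
    }

Because each feature lives in its own mixin, the same Widget core can be reused with whichever combination of features an application needs, which is the kind of incremental, component-based construction the dissertation targets.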