
dc.contributor.advisor: Vin, Harrick M.
dc.creator: Mudigonda, Jayaram
dc.date.accessioned: 2008-08-28T22:42:23Z
dc.date.available: 2008-08-28T22:42:23Z
dc.date.issued: 2005
dc.identifier: b61126135
dc.identifier.uri: http://hdl.handle.net/2152/2307
dc.description: text
dc.description.abstract: The design of packet processing systems is guided by two requirements: (1) high packet processing throughput, and (2) ease of programming. Meeting these two requirements simultaneously has proved challenging, primarily because packet processing aggravates the well-known problem of the memory bottleneck. To overcome the memory bottleneck, today's packet processing systems support a wide range of mechanisms, such as exposed memory hierarchies, multithreading, and non-blocking multi-word accesses. However, supporting many such mechanisms without clear guidelines for their usage complicates programmability and wastes system resources. In this dissertation, we ask two fundamental questions: (1) what minimal set of mechanisms should a packet processing system support to simultaneously achieve the goals of high throughput and ease of programming? and (2) how should one allocate common system resources, such as chip area and off-chip memory bandwidth, to these competing mechanisms? We make three contributions. First, we demonstrate that the minimal set must include data caches and multithreading, for two reasons: (1) contrary to the widely held belief, packet processing exhibits considerable data locality, and (2) relying exclusively on either data locality or packet-level parallelism leads to low packet throughput. Second, we demonstrate that no fixed configuration of caches and multithreading works well across a spectrum of deployments (combinations of application, workload, and system characteristics). Achieving high throughput in all cases requires a malleable architecture that allows deployments to trade off resources between data caching and multithreading. Third, we develop such a malleable architecture based on a novel predictive register prefetcher that efficiently overlaps the execution of a thread with the thread switch. Our predictor is accurate and imposes no significant overhead.
We demonstrate that, for the same chip area and memory bandwidth, our architecture achieves four times the throughput of current commercial systems. We argue that our architecture simplifies programmability. Data caches, unlike the mechanisms used today, are transparent to programmers and compilers. Further, our architecture eliminates the need for explicit thread scheduling, as threads can be switched on cache misses. Thus, our approach simultaneously achieves both high throughput and ease of programming. In addition, our findings lay the foundation for managing data accesses in the broader class of high-throughput, request-processing systems.
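The switch-on-miss multithreading that the abstract describes can be illustrated with a toy discrete-event simulation. All parameters below (accesses per packet, miss latency, hit rate) and the one-cycle-issue cost model are illustrative assumptions for this sketch, not figures from the dissertation:

```python
import heapq
import random

# Illustrative parameters (assumptions, not values from the dissertation)
ACCESSES_PER_PACKET = 10   # memory accesses needed to process one packet
MISS_LATENCY = 100         # extra cycles a thread stalls on a cache miss

def simulate(num_threads, num_packets, hit_rate, seed=0):
    """Cycles for one core to process num_packets, issuing one 1-cycle
    access per step and switching to another ready thread on a miss."""
    rng = random.Random(seed)
    # heap of (time this hardware thread is next ready, accesses left in its packet)
    threads = [(0, ACCESSES_PER_PACKET) for _ in range(min(num_threads, num_packets))]
    heapq.heapify(threads)
    unstarted = num_packets - len(threads)
    done, now = 0, 0
    while done < num_packets:
        ready_at, remaining = heapq.heappop(threads)
        now = max(now, ready_at) + 1          # issue one memory access
        # a hit leaves the thread ready; a miss blocks only this thread
        next_ready = now if rng.random() < hit_rate else now + MISS_LATENCY
        remaining -= 1
        if remaining == 0:
            done += 1
            if unstarted > 0:                 # thread picks up a fresh packet
                unstarted -= 1
                heapq.heappush(threads, (next_ready, ACCESSES_PER_PACKET))
        else:
            heapq.heappush(threads, (next_ready, remaining))
    return now

if __name__ == "__main__":
    for n in (1, 4):
        cycles = simulate(num_threads=n, num_packets=40, hit_rate=0.9)
        print(f"{n} thread(s): {cycles} cycles for 40 packets")
```

With a single thread, every miss leaves the core idle for the full latency; with several hardware threads, the same misses are largely hidden behind the other threads' work, so total cycles drop sharply even though per-packet work is unchanged. This is the throughput argument for combining data caches with switch-on-miss multithreading.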
dc.format.medium: electronic
dc.language.iso: eng
dc.rights: Copyright is held by the author. Presentation of this material on the Libraries' web site by University Libraries, The University of Texas at Austin was made possible under a limited license grant from the author who has retained all copyrights in the works.
dc.subject.lcsh: Packet switching (Data transmission)
dc.subject.lcsh: Computer network protocols
dc.title: Addressing the memory bottleneck in packet processing systems
dc.description.department: Computer Sciences
dc.identifier.oclc: 71004382
dc.type.genre: Thesis
thesis.degree.department: Computer Sciences
thesis.degree.discipline: Computer Sciences
thesis.degree.grantor: The University of Texas at Austin
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy

