Core-characteristic-aware off-chip memory management in a multicore system-on-chip

dc.contributor.advisor: Erez, Mattan (en)
dc.contributor.committeeMember: John, Lizy K. (en)
dc.contributor.committeeMember: Chiou, Derek (en)
dc.contributor.committeeMember: Lin, Calvin (en)
dc.contributor.committeeMember: Schulte, Michael J. (en)
dc.creator: Jeong, Min Kyu (en)
dc.date.accessioned: 2013-01-30T15:27:22Z (en)
dc.date.available: 2013-01-30T15:27:22Z (en)
dc.date.issued: 2012-12 (en)
dc.date.submitted: December 2012 (en)
dc.date.updated: 2013-01-30T15:27:44Z (en)
dc.description: text (en)
dc.description.abstract: Future processors will integrate an increasing number of cores because the scaling of single-thread performance is limited and because smaller cores are more power efficient. Off-chip memory bandwidth, which is shared among those cores, scales more slowly than the transistor (and core) count does. As a result, in many future systems, heavy demand from multiple cores will make off-chip bandwidth the bottleneck. Therefore, optimally managing the limited off-chip bandwidth is critical to achieving high performance and efficiency in future systems. In this dissertation, I develop techniques to optimize the shared use of limited off-chip memory bandwidth in chip multiprocessors. I focus on issues that arise from the sharing and exploit the differences in memory access characteristics, such as locality, bandwidth requirement, and latency sensitivity, among the applications that run in parallel and compete for the bandwidth. First, I investigate how the shared use of memory by many cores can reduce spatial locality in memory accesses. I propose a technique that partitions the internal memory banks between cores in order to isolate their access streams and eliminate locality interference. The technique compensates for the reduced bank-level parallelism of each thread by employing memory sub-ranking to effectively increase the number of independent banks. For three workload groups that consist of benchmarks with high spatial locality, low spatial locality, and mixes of the two, the average system efficiency improves by 10%, 7%, and 9% for 2-rank systems, and by 18%, 25%, and 20% for 1-rank systems, respectively, over the baseline shared-bank system. Next, I improve the performance of a heterogeneous system-on-chip (SoC) in which cores have distinct memory access characteristics. I develop a deadline-aware shared memory bandwidth management scheme for SoCs that have both CPU and GPU cores. I show that statically prioritizing the CPU can severely constrict GPU performance, and propose to dynamically adapt the priority of CPU and GPU memory requests based on the progress of the GPU workload. The proposed dynamic bandwidth management scheme provides the target GPU performance while prioritizing CPU performance as much as possible, for any CPU-GPU workload combination of differing complexity. (en)
dc.description.department: Electrical and Computer Engineering
dc.format.mimetype: application/pdf (en)
dc.identifier.slug: 2152/ETD-UT-2012-12-6765 (en)
dc.identifier.uri: http://hdl.handle.net/2152/ETD-UT-2012-12-6765 (en)
dc.language.iso: eng (en)
dc.subject: Memory (en)
dc.subject: CMP (en)
dc.subject: Locality (en)
dc.subject: Parallelism (en)
dc.subject: SoC (en)
dc.subject: GPU (en)
dc.subject: QoS (en)
dc.title: Core-characteristic-aware off-chip memory management in a multicore system-on-chip (en)
dc.type.genre: thesis (en)
thesis.degree.department: Electrical and Computer Engineering (en)
thesis.degree.discipline: Electrical and Computer Engineering (en)
thesis.degree.grantor: University of Texas at Austin (en)
thesis.degree.level: Doctoral (en)
thesis.degree.name: Doctor of Philosophy (en)
Files
Original bundle
Name: JEONG-DISSERTATION.pdf
Size: 5.38 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 2.12 KB
Format: Plain Text