Improving proxy accuracy : experiments with branch, memory, and over time behavior

Flolid, Steven Gregory


Date

2021-07-24


Abstract

As computers and the workloads they run have grown in size and complexity, it has become difficult to test the performance and power of future products under design. These products (future processor and memory systems) are often designed on simulators that are orders of magnitude slower than the final product, and full-fledged workloads are simply too long and complex to run on cycle-accurate simulators. For this reason, industry and academia have developed simulation region methodologies, such as SimPoint, that identify the most representative regions of a workload. Most architecture researchers use dominant SimPoints in their architectural design explorations. However, to study runtime adaptive techniques for performance and power/energy management, it is important to capture the over-time phase behavior of workloads. SimPoint has been shown to capture average behavior accurately, but it is not known whether a sequence of SimPoints can capture over-time program phase behavior. To explore this, we propose SimTrace, an ordered sequence of SimPoints. We then use existing time-series similarity techniques to evaluate how accurately SimTrace captures the over-time phase behavior of the original workload. Using the SPEC CPU 2017 benchmarks as a case study, we observe good accuracy for SimTrace: less than 5% error in performance (instructions per cycle, IPC) across four time-series metrics. However, for techniques like SimTrace to work in real-world environments, methods are needed to create simple executables for these SimPoints. One promising approach is program approximation, which extracts key characteristics of a workload into a synthetic proxy. However, existing techniques for creating synthetic proxies are insufficient to handle current microarchitectural structures and the more complex workloads being run. Additionally, generating proxies has until now required expert tuning to perform well.
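The abstract does not spell out how a SimTrace is assembled or which four time-series metrics were used. As a minimal sketch, assuming each fixed-length interval of the full run is assigned to a SimPoint cluster and then replaced, in program order, by the IPC of that cluster's representative SimPoint, the reconstructed trace can be compared with the full-run IPC trace using common similarity measures. The helper names (`build_simtrace`, `mean_abs_pct_error`, `pearson`, `dtw_distance`), the toy data, and the choice of MAPE, Pearson correlation, and dynamic time warping are illustrative assumptions, not necessarily what the report uses.

```python
from statistics import mean

# Hypothetical helpers: a sketch under stated assumptions, not the report's implementation.

def build_simtrace(labels, rep_ipc):
    """Expand per-interval cluster labels into an over-time IPC trace.

    labels  -- cluster id of each interval, in program order (assumed SimPoint output)
    rep_ipc -- IPC measured on each cluster's representative SimPoint
    """
    return [rep_ipc[c] for c in labels]

def mean_abs_pct_error(reference, approx):
    """Mean absolute percentage error of approx against reference, in percent."""
    return 100.0 * mean(abs(a - r) / r for r, a in zip(reference, approx))

def pearson(x, y):
    """Pearson correlation: does the trace rise and fall with the original?"""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def dtw_distance(x, y):
    """Dynamic time warping distance with an absolute-difference cost."""
    n, m = len(x), len(y)
    d = [[float("inf")] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # Extend the cheapest of: skip in x, skip in y, or match both.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

if __name__ == "__main__":
    # Toy data, purely illustrative (not results from the report).
    full_ipc = [1.0, 1.1, 2.0, 1.9, 1.0, 0.5]  # IPC of every interval in the full run
    labels = [0, 0, 1, 1, 0, 2]                 # cluster of each interval
    rep_ipc = {0: 1.0, 1: 2.0, 2: 0.5}          # IPC of each representative SimPoint
    trace = build_simtrace(labels, rep_ipc)
    print(f"MAPE: {mean_abs_pct_error(full_ipc, trace):.2f}%")
    print(f"Pearson r: {pearson(full_ipc, trace):.3f}")
    print(f"DTW distance: {dtw_distance(full_ipc, trace):.2f}")
```

MAPE checks the magnitude of IPC over time, Pearson correlation checks whether phases rise and fall together, and DTW tolerates small shifts in phase boundaries. Note that `pearson` is undefined for a constant trace, since its denominator is zero.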
This report describes work on these proxies that increases automation and extends their branch and memory models. Experimental results show up to 10x improvement in accuracy for memory and branch behavior.
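The abstract reports accuracy gains for branch behavior without naming the metrics. As a hedged illustration, assuming branch behavior is summarized per static branch by its taken rate and its transition rate (how often consecutive outcomes flip, a statistic used in some synthetic-benchmark work), the sketch below compares two proxies with an original workload and expresses an accuracy improvement as the ratio of an old proxy's error to a new proxy's error. All function names and outcome sequences are hypothetical.

```python
# Hypothetical branch-behavior metrics: a sketch, not the report's branch model.

def taken_rate(outcomes):
    """Fraction of executions in which the branch was taken (1 = taken)."""
    return sum(outcomes) / len(outcomes)

def transition_rate(outcomes):
    """Fraction of consecutive executions whose outcome differs from the previous one."""
    flips = sum(1 for prev, cur in zip(outcomes, outcomes[1:]) if prev != cur)
    return flips / (len(outcomes) - 1)

def relative_error(reference, proxy):
    """Relative error of a proxy statistic; assumes reference != 0."""
    return abs(proxy - reference) / reference

def accuracy_improvement(old_error, new_error):
    """How many times smaller the new proxy's error is than the old proxy's."""
    return old_error / new_error

if __name__ == "__main__":
    # Toy outcome sequences for one static branch (illustrative only).
    original = [1, 0, 1, 0, 1, 0, 1, 1]
    old_proxy = [1, 1, 1, 1, 0, 0, 0, 0]  # rarely alternates
    new_proxy = [1, 0, 1, 0, 1, 0, 1, 0]  # alternates like the original
    ref = transition_rate(original)
    old_err = relative_error(ref, transition_rate(old_proxy))
    new_err = relative_error(ref, transition_rate(new_proxy))
    print(f"old error {old_err:.3f}, new error {new_err:.3f}, "
          f"improvement {accuracy_improvement(old_err, new_err):.1f}x")
```

In the toy data both proxies have the same taken rate (0.5), so only the transition rate separates them; a branch model judged on taken rate alone would miss that difference.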