Discovering heap anomalies in the wild
MetadataShow full item record
Programmers increasingly rely on managed languages (e.g. Java and C#) to develop applications faster and with fewer bugs. Managed languages encourage allocating objects in the heap and rely on automatic memory management (garbage collection) to reclaim objects the program can no longer access. With more objects in the heap, the heap encodes more program state than ever before and offers new opportunities for optimization and analysis. This dissertation shows how to efficiently leverage the managed runtime to perform dynamic heap analysis. Previous heap analysis approaches significantly slow down programs, require special hardware, and/or increase memory consumption by 75% or more. We presents two synergistic techniques—dynamic object sampling (DOS) and heap summarization (HSG)—that mine program state embedded in the heap efficiently enough to use in production and effectively enough to improve performance, find bugs, and increase program understanding. We use these techniques to address three problems: (1) Performance of managed language. Because some objects live for a long time, they incur disproportionate collection costs. We optimize these costs with dynamic pretenuring. Dynamic pretenuring uses DOS to accurately predict allocation site survival rates and uses these predictions to improve performance. (2) Finding bugs. Memory leaks in managed languages occur when a program inadvertently maintains references to objects that it no longer needs. Along with degrading performance and resulting in program crashes, memory leaks cause systematic heap growth. We introduce Cork which uses the simplest type of HSG, a class points-from summary graph (CPFG), to detect systematic heap growth. Cork quickly identifies growing data structures observed in three popular benchmarks (fop, jess, and jbb2000) while adding an average of only 2.3% to total time. Additionally, we use Cork to debug a reported memory leak in Eclipse. (3) Program understanding. For a long time, static analysis has sought to statically summarize the shape of dynamic data structures to aid in program verification and understanding. Unfortunately, it only works on small programs. We introduce ShapeUp which instead characterizes recursive data structures dynamically by discovering data structure shape and degree invariants at runtime. ShapeUp uses DOS and a class field-wise summary graph (CFSG) to track in- and out-degree invariants of data structure nodes. We show how ShapeUp automatically identifies recursive data structures and likely shape invariants. Finally, we monitor discovered shape invariants to detect when a data structure becomes malformed. In summary, this dissertation is the first to leverage the managed runtime to perform dynamic heap analysis both accurately and efficiently. Our results show that the heap contains an enormous amount of program state and that there is much potential for dynamically mining heap characteristics for optimization, debugging, and program understanding.