PebblesDB : building key-value stores using fragmented log-structured merge trees
Key-value stores such as LevelDB and RocksDB offer excellent write throughput, but suffer high write amplification. The write amplification problem is due to the Log-Structured Merge Trees data structure that underlies these key-value stores. To remedy this problem, this thesis presents a novel data structure that is inspired by Skip Lists, termed Fragmented Log- Structured Merge Trees (FLSM). FLSM introduces the notion of guards to organize logs (sstables or files containing the data on storage), and avoids rewriting data in the same level. Theoretically, we show how FLSM can address the problem of write amplification. We build PebblesDB, a high-performance key-value store, by modifying HyperLevelDB to use the FLSM data structure. We evaluate PebblesDB using micro-benchmarks and show that for write-intensive workloads, PebblesDB reduces write amplification by 2.4-3x compared to RocksDB, while increasing write throughput by 6.7x. We evaluate PebblesDB extensively under a variety of benchmarks, workload patterns, and environmental factors and analyze how it performs in different scenarios. We modify two widely-used NoSQL stores, MongoDB and HyperDex, to use PebblesDB as their underlying storage engine. Evaluating these applications using the YCSB benchmark shows that throughput is increased by 18-105% when using PebblesDB (compared to their default storage engines) while write IO is decreased by 35-55%.