On indexing large databases for advanced data models

Samoladas, Vasilis
Journal Title
Journal ISSN
Volume Title

In the last decade, the relational data model has been extended in numerous ways, including geographic information systems, abstract data types and object models, constraint and temporal databases, and on-line analytical processing. We study the indexing requirements of these data models. In many cases, these requirements are fulfilled by efficient techniques for multidimensional range search. Previous techniques for multidimensional range search, such as the R-tree and its variants, are based on ad hoc assumptions on the nature of the workloads they index, and have been known to suffer from reduced scalability and robustness. We adopt an alternative approach; our study focuses on techniques that provide worst-case performance guarantees, and thus overcome these deficiencies. iii Indexability, proposed by Hellerstein, Koutsoupias and Papadimitriou, is a novel memory model for external memory. In indexability, the complexity of indexing is quantified by two parameters: storage redundancy and access overhead. Indexability focuses on the inherent trade-off between these two parameters. We study multidimensional range search under indexability. Our results are of two kinds; indexing schemes for various problems, and corresponding lower bounds. We develop indexing schemes for interval management, multidimensional arrays, and various types of planar range search. We derive a lower-bounds theorem for arbitrary indexing schemes, and apply it to multidimensional range search, proving most of our indexing schemes to be optimal. We then leverage our theoretical work to the design of access methods. We solve the long-standing open problem of an optimal external-memory priority search tree. Our structure, the EPS-tree, is based on indexability results. We also explore dynamization, and develop techniques with optimal amortized and worst-case cost. We implement and evaluate experimentally our access method. Our experiments demonstrate that EPS-trees achieve excellent search and update performance, comparable to that of B+-trees on one-dimensional datasets. Our experiments with large datasets demonstrate the scalability and robustness of our techniques. We also affirm the relevance of space-I/O trade-off in achieving high indexing performance. We conclude that the EPS-tree is an efficient, robust access method for a wide range of problems. Its success affirms the merits of systematic use of redundancy, and nominates indexability as a prominent methodology