Kubernetes provenance

Date

2020-09-14

Authors

Lin, William, M.S. in Computer Sciences

Abstract

The field of machine learning (ML) has experienced a renaissance since the 2000s. First, exponential growth in computational power and improvements in hardware finally allow machine learning algorithms to process the same amount of data in minutes or hours rather than hundreds of years. Second, the cloud computing model made large-scale clusters inexpensive and available to anyone at the click of a button, letting researchers scale their algorithms without personally maintaining hundreds or even thousands of machines. However, despite the enormous rise in the popularity of machine learning in both research and industry, the ML community faces a reproducibility crisis. Although existing machine learning frameworks can all re-execute the same piece of code saved by a researcher, a typical workflow may span multiple frameworks and access data on remote machines. These cross-framework workflows cannot be replicated by any single framework's provenance system, and they often contain customized scripts and processes that further hinder future replication and repeatability.

I argue in this thesis that, because machine learning demands scale and models are frequently trained on large clusters, Kubernetes is a natural common layer at which the systems community can interpose provenance collection, helping the ML community reproduce results that span multiple machines, frameworks, and hardware platforms. In addition, I propose two new mechanisms for collecting fine-grained provenance information from Kubernetes without modifying the application or the host operating system.
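
To make the interposition point concrete, the sketch below watches pod lifecycle events through the Kubernetes API (using the standard client-go library) and emits a minimal provenance record for each one. This is only an illustration under assumptions of my own; the two mechanisms proposed in the thesis are not defined by, or limited to, this watch-based approach.

    package main

    import (
    	"context"
    	"fmt"

    	corev1 "k8s.io/api/core/v1"
    	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    	"k8s.io/client-go/kubernetes"
    	"k8s.io/client-go/tools/clientcmd"
    )

    func main() {
    	// Build a client from the local kubeconfig (cluster details are assumed).
    	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
    	if err != nil {
    		panic(err)
    	}
    	client, err := kubernetes.NewForConfig(config)
    	if err != nil {
    		panic(err)
    	}

    	// Watch pod events cluster-wide; each event becomes a coarse provenance record.
    	watcher, err := client.CoreV1().Pods(metav1.NamespaceAll).Watch(
    		context.Background(), metav1.ListOptions{})
    	if err != nil {
    		panic(err)
    	}
    	for event := range watcher.ResultChan() {
    		pod, ok := event.Object.(*corev1.Pod)
    		if !ok {
    			continue
    		}
    		// A real provenance system would persist richer records
    		// (container images, mounted volumes, timestamps) to durable storage.
    		fmt.Printf("provenance: %s pod=%s/%s node=%s\n",
    			event.Type, pod.Namespace, pod.Name, pod.Spec.NodeName)
    	}
    }

The design point this illustrates is that the orchestration layer already observes which containers ran, with which images and data volumes, on which nodes, so provenance can be captured there without instrumenting the ML frameworks or the applications themselves.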
