Enabling accurate and private machine learning systems
Machine learning applications in fields where data is sensitive, such as healthcare and banking, face challenges due to the need to protect the privacy of participating users. Tools developed in the past decades to address this challenge include differential privacy and federated learning. Yet maintaining performance while protecting sensitive data poses a trilemma among accuracy, privacy, and efficiency. In this thesis, we aim to address these fundamental challenges and take a step towards enabling machine learning under privacy and resource constraints.

On the differential privacy front, we develop in Chapter 2 an algorithm that addresses the efficiency and accuracy of differentially private empirical risk minimization. We provide a dimension-independent excess risk bound and show that the algorithm converges to this bound at the same rate as AdaGrad. In Chapter 3 we introduce an algorithm for differentially private Top-k selection, a problem that often arises as a building block of large-scale data analysis tasks such as those in NLP and recommender systems. The algorithm samples from a distribution with exponentially large support in only polynomial time and space, and improves on existing pure differential privacy methods.

On the federated learning front, the locality of data imposes various system design challenges due to resource constraints. In Chapter 4, we propose federated and differentially private algorithms for the matrix factorization tasks that arise when training recommender systems in settings where data is distributed across different silos (e.g., hospitals or banks). Chapter 5 introduces a client selection strategy that reduces communication in federated learning while maintaining model accuracy. Finally, in Chapter 6 we conclude by presenting F3AST, a novel algorithm that addresses user intermittency in federated learning under an unknown and time-varying system configuration.
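To make the differentially private empirical risk minimization setting concrete, the sketch below shows the standard recipe on which such algorithms are typically built: gradient descent in which each per-example gradient is clipped to bound its sensitivity and Gaussian noise is added to the aggregate. This is a generic illustration under assumed placeholder parameters (`clip`, `noise_std`, learning rate), not the algorithm of Chapter 2 or its AdaGrad-rate analysis.

```python
import random

def dp_gradient_descent(data, steps=200, lr=0.1, clip=1.0,
                        noise_std=0.05, seed=0):
    """Noisy clipped gradient descent on f(w) = mean over x of (w - x)^2.

    Illustrative sketch of the generic DP-ERM recipe: clip each
    per-example gradient to bound sensitivity, then perturb the
    average with Gaussian noise. All parameters are placeholders.
    """
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        # Per-example gradient of (w - x)^2 is 2*(w - x); clip each one
        # so that a single example's influence on the update is bounded.
        grads = [max(-clip, min(clip, 2.0 * (w - x))) for x in data]
        # Average the clipped gradients and add calibrated Gaussian noise.
        noisy = sum(grads) / len(data) + rng.gauss(0.0, noise_std)
        w -= lr * noisy
    return w

data = [0.8, 1.0, 1.2, 0.9, 1.1]
w_hat = dp_gradient_descent(data)
print(w_hat)  # close to the non-private minimizer 1.0, up to noise
```

The clipping threshold controls the sensitivity of each update, which in turn determines how much noise is needed for a given privacy budget; this accuracy-versus-noise trade-off is exactly the tension the thesis addresses.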