Non-asymptotic study of quasi-Newton methods with application in large-scale machine learning
Abstract
Quasi-Newton methods are among the most popular optimization algorithms for solving convex programs, since they serve as a middle ground between first-order methods and second-order Newton-type algorithms. They improve on the linear convergence rate of first-order methods and achieve a local superlinear rate, while their computational cost per iteration is O(d²) rather than the O(d³) cost of Newton's method, where d is the problem dimension. The main idea of quasi-Newton methods is to approximate the Newton step without computing the objective function Hessian or its inverse at every iteration. More specifically, quasi-Newton methods approximate the curvature of the objective function using only first-order information about the function. There are several approaches for approximating the objective function Hessian and its inverse from first-order information, each leading to a different quasi-Newton update; perhaps the most popular quasi-Newton algorithms are the Davidon-Fletcher-Powell (DFP) method and the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method. As mentioned earlier, a major advantage of quasi-Newton methods is their asymptotic superlinear convergence rate. Although this convergence result is promising and lies between the linear rate of first-order methods and the quadratic rate of Newton's method, it holds only asymptotically and does not provide an explicit upper bound on the error of quasi-Newton methods after a finite number of iterations. As a result, the overall complexity of quasi-Newton methods for achieving a target accuracy cannot be explicitly characterized. This shortcoming of the analysis limits the use of quasi-Newton methods in many applications. In this thesis, we investigate both the local and global non-asymptotic superlinear convergence rates of quasi-Newton methods and leverage these results to study the application of this class of optimization algorithms to large-scale machine learning problems.
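For concreteness, the following is a minimal sketch of the classical update rules behind this idea, given in their standard textbook form rather than the specific variants analyzed in the thesis. Writing $s_k = x_{k+1} - x_k$ and $y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$, both DFP and BFGS produce a Hessian approximation $B_{k+1}$ satisfying the secant condition $B_{k+1} s_k = y_k$:

\[
B_{k+1}^{\mathrm{BFGS}} = B_k - \frac{B_k s_k s_k^\top B_k}{s_k^\top B_k s_k} + \frac{y_k y_k^\top}{y_k^\top s_k},
\qquad
B_{k+1}^{\mathrm{DFP}} = \Big(I - \frac{y_k s_k^\top}{y_k^\top s_k}\Big) B_k \Big(I - \frac{s_k y_k^\top}{y_k^\top s_k}\Big) + \frac{y_k y_k^\top}{y_k^\top s_k}.
\]

The iterate is then updated by the quasi-Newton step $x_{k+1} = x_k - \eta_k B_k^{-1} \nabla f(x_k)$, where maintaining an approximation of the Hessian inverse directly (rather than inverting $B_k$) is what keeps the per-iteration cost at $O(d^2)$.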