# Accelerating inverse solutions with machine learning and randomization

## Abstract

Inverse problems form a field of applied mathematics, with wide application in both the scientific community and industry, in which the objective is to estimate some parameter of interest (PoI) from observations. These two quantities are related by a mapping known as the parameter-to-observable (PtO) map, which may be nonlinear. While the forward problem may be well-posed, the inverse problem is often ill-posed, making parameter estimation difficult. Ill-posedness in the Hadamard sense means that at least one of the following is true: 1) a solution does not exist, 2) the solution is not unique, or 3) the solution does not depend continuously on the data. In cases of interest where the PtO map is an observation operator acting on the solution of a system of PDEs discretized on a domain, the ill-posedness can be severe due to the compact nature of the PtO map. To address the ill-posedness, practitioners often write the solution of the inverse problem as the solution of a regularized least squares optimization problem, where the regularization is designed to combat the ill-posedness so that a unique solution exists and depends continuously on the data. There are many classical regularization methods, including Tikhonov regularization, total variation (TV) regularization, and nonconvex strategies such as penalizing an ℓp norm with 0 < p < 1. Beyond estimating the PoI alone, it is also of interest to quantify its uncertainty. To do this, a Bayesian approach is typically employed in which the solution to the inverse problem is a posterior probability density rather than a deterministic quantity. By Bayes' theorem, the posterior is proportional to the product of the likelihood and the prior probability density.
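The stabilizing effect of regularization can be seen in a minimal numerical sketch (not drawn from the dissertation): below, a Hilbert matrix stands in for a severely ill-conditioned PtO map, and a Tikhonov penalty tames the noise amplification of the unregularized solve.

```python
import numpy as np

# Toy ill-posed linear inverse problem d = A m + noise, where A is the
# notoriously ill-conditioned Hilbert matrix (a stand-in for a compact PtO map).
n = 8
A = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])
m_true = np.ones(n)
rng = np.random.default_rng(0)
d = A @ m_true + 1e-6 * rng.standard_normal(n)

# Unregularized solve: tiny observational noise is amplified enormously.
m_naive = np.linalg.solve(A, d)

# Tikhonov regularization, min_m ||A m - d||^2 + alpha * ||m||^2, restores
# a unique solution that depends continuously on the data.
alpha = 1e-8
m_tik = np.linalg.solve(A.T @ A + alpha * np.eye(n), A.T @ d)

print(np.linalg.norm(m_naive - m_true))  # error dominated by amplified noise
print(np.linalg.norm(m_tik - m_true))    # stable reconstruction
```

The choice of `alpha` trades bias against noise amplification; here it was simply picked by hand for the toy problem.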
In the case of Gaussian observational noise and a Gaussian prior, finding the maximum a posteriori (MAP) point is equivalent to solving the regularized least squares optimization problem in weighted norms: the likelihood gives rise to a data misfit term weighted by the inverse of the noise covariance matrix, and the prior leads to a regularization term weighted by the inverse of the prior covariance. That is, computing the MAP estimate of the PoI in the Bayesian framework requires solving a deterministic inverse problem, so the apparatus for solving Bayesian inverse problems builds on the algorithms and tools used for deterministic inverse problems. This understanding is what enables us to gain insight into the inverse solutions produced by various methods and to develop new techniques that begin with deterministic inverse problems but can then be analyzed from a statistical perspective and used to quantify uncertainty. Since the likelihood depends on the PtO map, significant emphasis has been placed over the past decades on developing robust and scalable computational models, along with excellent problem-specific priors. On the other hand, there has been a recent trend to abandon models and embrace the era of big data. We aim to show that neither extreme is best: the surplus of data can be used in concert with classical models both to improve the quality of inverse solutions and to accelerate the solution process using modern machine learning techniques. In this research, we use global full waveform seismic inversion and Poisson's equation as the prototypical inverse problems. For the seismic inverse problem, sparsely located seismogram observations are used to reconstruct the acoustic wave speed. This inverse problem is constrained by a three-dimensional acoustic wave equation, a system of time-dependent PDEs discretized over the entire globe.
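The equivalence between the MAP point and the weighted regularized least squares solution can be made concrete in the linear-Gaussian case. The following sketch (toy dimensions and illustrative covariances, not the dissertation's problems) computes the MAP point in closed form and checks that the gradient of the weighted objective vanishes there.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 6, 4                                  # toy parameter/observation sizes
A = rng.standard_normal((k, n))              # linear PtO map
G_noise = 0.1 * np.eye(k)                    # noise covariance (illustrative)
G_prior = np.eye(n)                          # prior covariance (illustrative)
m_prior = np.zeros(n)
d = A @ rng.standard_normal(n) + rng.multivariate_normal(np.zeros(k), G_noise)

# MAP point: minimizer of
#   0.5 ||A m - d||^2_{G_noise^{-1}} + 0.5 ||m - m_prior||^2_{G_prior^{-1}},
# i.e. a data misfit weighted by the inverse noise covariance plus a
# regularization term weighted by the inverse prior covariance.
Gn_inv, Gp_inv = np.linalg.inv(G_noise), np.linalg.inv(G_prior)
H = A.T @ Gn_inv @ A + Gp_inv                # Hessian of the objective
m_map = np.linalg.solve(H, A.T @ Gn_inv @ d + Gp_inv @ m_prior)

# In the linear-Gaussian case the posterior is Gaussian with mean m_map and
# covariance equal to the inverse Hessian.
post_cov = np.linalg.inv(H)
```

This closed form is exactly why deterministic solvers carry over: the same Hessian system appears whether one asks for a regularized least squares solution or a Bayesian MAP estimate.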
Full waveform inversion is an important problem in science and industry, with applications to reservoir characterization and various biomedical imaging problems. We use the adjoint method as the starting point from which we develop several new inversion methods. Seismic inversion is a good prototypical problem because it is a nonlinear inverse problem with high computational cost for which scalable libraries exist, enabling us to study the effectiveness of our methods on practical large-scale inverse problems. For the Poisson problem, sparsely sampled temperature observations are used to reconstruct the underlying heat conductivity on a two-dimensional mesh. Poisson's equation is an excellent test problem because of the severe ill-posedness of inverting for the conductivity. We propose four new methods for solving PDE-constrained inverse problems:

1. The data-informed active subspace (DIAS) regularization approach, developed as an alternative to Tikhonov regularization, in which regularization is applied only in directions where the data are least informative.
2. The UQ-VAE framework, a hybrid data/model-driven machine learning approach for rapid MAP estimation and uncertainty quantification.
3. An autoencoder-based compression strategy that addresses the high cost of solving large-scale time-dependent inverse problems by eliminating the need for checkpointing.
4. A combination of the DIAS approach and autoencoder compression, which aims to provide a comprehensive method for computing a data-informed inverse solution while mitigating the additional computational cost, enabling the DIAS method to scale to large problems.

Additionally, we develop a unifying theory for the convergence of randomized methods for solving inverse problems and demonstrate its effectiveness on the Poisson inverse problem.
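The DIAS regularization idea can be illustrated on a toy linear problem. The sketch below is a simplified caricature (it identifies informed directions via a truncated SVD of the forward map and picks the rank by hand), not the DIAS construction developed in this work:

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 20, 5                                  # parameter dim; informed rank (hand-picked)
# Synthetic forward map with rapidly decaying singular values
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = 10.0 ** -np.arange(n)
A = U @ np.diag(s) @ V.T

m_true = V[:, :3] @ np.array([1.0, -0.5, 0.25])  # truth in well-informed directions
d = A @ m_true + 1e-8 * rng.standard_normal(n)

# Data-informed directions: leading right singular vectors of A
_, _, Vt = np.linalg.svd(A)
P_uninf = np.eye(n) - Vt[:r].T @ Vt[:r]       # projector onto uninformed directions

alpha = 1e-4
# Regularize only where the data are least informative:
#   min_m ||A m - d||^2 + alpha * ||P_uninf m||^2
m_di = np.linalg.solve(A.T @ A + alpha * P_uninf, A.T @ d)

# Plain Tikhonov penalizes the informed directions too, biasing the solution.
m_tik = np.linalg.solve(A.T @ A + alpha * np.eye(n), A.T @ d)
```

On this toy problem the data-informed solution avoids the bias that the isotropic Tikhonov penalty introduces in the well-informed directions.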
## Contributions to CSEM areas of interest

### Area A: Applied mathematics

The data-informed (DI) framework was rigorously derived from the truncated singular value decomposition. Its deterministic properties were analyzed from a spectral perspective, and we show that the DIAS prior is a valid regularization strategy. Additionally, we analyze the DIAS prior from a statistical perspective and show that, for linear inverse problems with Gaussian noise and prior covariances, the posterior covariance of the DIAS solution is bounded below by the Tikhonov posterior covariance, so that Tikhonov regularization yields over-confident uncertainty estimates. The UQ-VAE framework was rigorously derived by minimizing the Jensen-Shannon divergence (JSD) between the true posterior and a model posterior parameterized by the weights of a deep neural network. We derive an explicit finite-sample-size loss function when the likelihood, prior, and posterior are all assumed to be Gaussian. We also prove asymptotic convergence and derive a non-asymptotic error bound for a broad family of randomized solutions to linear and nonlinear inverse problems; from this family of randomized methods, we show the equivalence of several existing methods and derive new ones.

### Area B: Numerical analysis and scientific computation

While the DIAS prior has firm mathematical foundations, computing the DIAS regularization cost and the corresponding gradient term is expensive, both computationally and in terms of storage. We therefore develop and investigate an approximate form of the DIAS prior that allows the action of the inverse of the DIAS prior covariance matrix to be applied to a vector approximately and in a scalable fashion. We also derive and implement a form of the DIAS prior that involves low-rank projections and requires substantially less storage than a naive implementation would suggest.
This approximate algorithm with low-rank projection is implemented for a large-scale seismic inverse problem solved on at least 64 nodes of the Frontera supercomputer, demonstrating that DIAS regularization is viable even on large problems. Our non-asymptotic error analysis for randomized inverse problems employs techniques from numerical analysis to show that the error of the solution to the perturbed (randomized) optimization problem is bounded with high probability. We explore the convergence of various randomized methods numerically to validate the theoretical convergence properties and to provide practical insight into the numerical performance of each method on a variety of problems. An autoencoder-based compression strategy for time-dependent inverse problems was developed as a scalable substitute for checkpointing. We study two different compression schemes: spatial compression and temporal compression. Each method is implemented and scaled on the Frontera supercomputer. Since the goal of this work is to reduce the wasted computational effort of checkpointing, we require that any proposed approach be faster than the original checkpointing implementation. This demands special care in the implementation, since the underlying PDE solver (and thus restoration from checkpoints) is highly tuned and fast. We develop a novel sparse-dense autoencoder architecture to minimize the FLOPs required to perform compression and decompression, showing that excellent compression results can be obtained with high levels of sparsity in the autoencoder. Lastly, we present a data generation, normalization, and training scheme, showing that even the "offline" cost of training is small relative to the cost of solving the inverse problem. This work was scaled to 256 nodes of Frontera.
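To give a flavor of the randomized family analyzed here, the sketch below applies a simple Gaussian sketch-and-solve scheme to a regularized linear least squares problem. The scheme and all sizes are illustrative assumptions, not the specific methods or bounds from this work:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, s = 50, 2000, 500        # parameters, observations, sketch size (illustrative)
A = rng.standard_normal((k, n)) / np.sqrt(k)
m_true = rng.standard_normal(n)
d = A @ m_true + 1e-3 * rng.standard_normal(k)
alpha = 1e-2

# Full regularized normal-equations solve
m_full = np.linalg.solve(A.T @ A + alpha * np.eye(n), A.T @ d)

# Randomized approximation: replace the misfit ||A m - d||^2 with
# ||S (A m - d)||^2 for a random S satisfying E[S^T S] = I, shrinking the
# k-dimensional misfit to an s-dimensional one before solving.
S = rng.standard_normal((s, k)) / np.sqrt(s)
SA, Sd = S @ A, S @ d
m_rand = np.linalg.solve(SA.T @ SA + alpha * np.eye(n), SA.T @ Sd)
```

The randomized solution is close to the full solution with high probability, at the cost of solving a much smaller perturbed problem; quantifying that closeness is precisely what the non-asymptotic analysis addresses.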
### Area C: Mathematical modeling and applications

We apply our proposed methods to two model applications that are well suited to exploring each method's relative advantages and disadvantages. First, we consider a 2D Poisson equation with sparse measurements. While applicable in a wide variety of fields, we consider Poisson's equation in the context of steady-state heat conduction. Though the forward problem is linear, the inverse problem of inferring the underlying conductivity is nonlinear. The natural ill-posedness of this problem makes it an excellent test problem for new regularization and machine learning methods. Observing the temperature at only a select few locations makes the inverse problem even more ill-posed and of practical interest for testing the capabilities of inverse solvers. Second, we consider a large-scale seismic inverse problem, or full waveform inversion (FWI). Seismic waves can be modeled as acoustic waves propagating through the earth; the inverse problem we consider is to invert for the underlying acoustic wave speed given sparse measurements of the velocity field. We use this application to exhibit the scalability of our proposed methods on large-scale problems. Additionally, the time dependence of FWI allows us to develop new methods for accelerating the solution of large-scale inverse problems.
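The nonlinearity of the conductivity-to-temperature map can be seen even in one dimension. The sketch below is a 1D finite-difference analogue (an assumption for illustration, not the 2D setup used in this work); it shows that averaging two conductivities does not average the resulting temperature fields:

```python
import numpy as np

def solve_poisson_1d(kappa, f, h):
    """Solve -(kappa u')' = f on (0, 1) with u(0) = u(1) = 0, using a standard
    finite-difference scheme; kappa lives on the n+1 cell interfaces."""
    n = len(f)
    main = (kappa[:-1] + kappa[1:]) / h**2
    off = -kappa[1:-1] / h**2
    K = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    return np.linalg.solve(K, f)

n = 99
h = 1.0 / (n + 1)
f = np.ones(n)                                        # uniform heat source
u_lo = solve_poisson_1d(1.0 * np.ones(n + 1), f, h)   # conductivity 1.0
u_hi = solve_poisson_1d(2.0 * np.ones(n + 1), f, h)   # conductivity 2.0
u_mid = solve_poisson_1d(1.5 * np.ones(n + 1), f, h)  # conductivity 1.5

# The forward solve is linear in f but nonlinear in kappa: the temperature
# field for kappa = 1.5 is not the average of the other two.
print(np.linalg.norm(u_mid - 0.5 * (u_lo + u_hi)))
```

Even with constant conductivity the map is nonlinear (the solution scales like 1/kappa), and spatially varying conductivity fields only compound this, which is what makes the inference problem challenging.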