GPU-accelerated high-performance computing for architecture-aware wave simulation based on discontinuous Galerkin algorithms

Hanindhito, Bagus

Date

2020-05-09

Abstract

Full-waveform inversion has become an essential method in the oil and gas industry for estimating the properties of the Earth's subsurface without observing them directly by digging, drilling, or tunneling, which lowers exploration costs. The method relies on generating seismic waves and recording the reflected wave data. Because different types of rock, sediment, and other materials have different properties, the recorded data can be used, for example, to estimate the location of mineral and oil deposits. Full-waveform inversion involves two problems. The first, which is the focus of this research, is the forward problem, which generates synthetic seismograms from a given Earth model. The second is the inverse problem, which seeks the model that best explains the observed data. Seismic survey areas are generally large and can easily produce vast amounts of data, all of which are used to find the best Earth model, so solving these problems requires considerable computing power. Industrial-scale wave simulators typically use multiple CPUs to accelerate the computation, yet the simulation time still grows as the problem size increases.

In this thesis, we investigated the implementation of a CPU-based wave simulator to identify the parallelism that can be extracted from it. We mapped this massive parallelism to GPUs, whose thousands of cores make them well suited to the task, and applied further optimizations to the initial implementation to improve its performance. We also developed a method to verify the functionality of our implementation against the original code. We then compared the GPU-accelerated code with the original CPU code, running simulations at different levels of discretization on both consumer-class and datacenter-class GPUs.
For the double-precision runs, our benchmark results show speed-ups of over 120x, 210x, and 330x on the GeForce GTX 1080 Ti, Tesla P100, and Tesla V100 GPUs, respectively, compared with dual Intel Xeon Platinum 8160 CPUs with a total of 48 cores.
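The abstract does not state the acceptance criterion used to verify the GPU code against the original CPU code. Because a GPU port usually evaluates floating-point operations in a different order (for example, a different reduction order or the use of fused multiply-add), bitwise-identical seismograms are generally not expected, so a tolerance-based comparison is the usual approach. The Python sketch below assumes a relative L2-norm criterion applied to one receiver trace at a time; the function names `relative_l2_error` and `traces_match` and the `1e-10` threshold are hypothetical choices, not taken from the thesis.

```python
import math
from typing import Sequence


def relative_l2_error(reference: Sequence[float], candidate: Sequence[float]) -> float:
    """Return ||candidate - reference||_2 / ||reference||_2 for two traces."""
    if len(reference) != len(candidate):
        raise ValueError("traces must have the same number of samples")
    diff_sq = sum((c - r) ** 2 for r, c in zip(reference, candidate))
    ref_sq = sum(r * r for r in reference)
    if ref_sq == 0.0:
        # An all-zero reference trace has no scale, so fall back to the absolute norm.
        return math.sqrt(diff_sq)
    return math.sqrt(diff_sq / ref_sq)


def traces_match(
    reference: Sequence[float],
    candidate: Sequence[float],
    rel_tol: float = 1e-10,  # hypothetical threshold for double-precision runs
) -> bool:
    """Hypothetical acceptance test: the GPU trace matches the CPU trace within rel_tol."""
    return relative_l2_error(reference, candidate) <= rel_tol


if __name__ == "__main__":
    cpu_trace = [0.0, 0.5, 1.0, 0.5, 0.0]
    # Mimic small round-off differences such as those caused by a different summation order.
    gpu_trace = [x + 1e-15 for x in cpu_trace]
    print(traces_match(cpu_trace, gpu_trace))
```

A relative norm keeps the tolerance meaningful regardless of the amplitude of a given trace; in practice, one would apply such a check to every receiver and report the worst case.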