Resurrecting legacy code to revitalize software for groundwater research : reproducibility and robustness for the Barton Springs case, Texas
Advanced computing is becoming an indispensable part of geosciences, the interdisciplinary nature of which often requires large-scale and data-intensive numerical modeling. Groundwater in Texas is one such area that can greatly benefit from advanced decision support for understanding aquifer systems, uncertainty analysis, and policy making. However, software developed for research is often used for a relatively short period of time before it is abandoned or lost. The unintentional abandonment of software within the fast-changing technological landscape makes model simulation results difficult to replicate, hinders widespread reusability, and forces researchers pursuing similar or adapted studies to spend significant effort redeveloping software. These legacy codes are potentially important assets and may be resurrected and moved to an archive for long-term reuse. This research develops and tests methodologies to inform the design of best practices for documenting and preserving reproducible workflows and scientific software. Methodologies were tested with an existing codebase and assets from the Groundwater Decision Support System (GWDSS), originally developed in 2006 for participatory decision making and groundwater management. The original GWDSS provided a hybrid architecture for integrated assessment models by combining a numerical simulation code for groundwater (MODFLOW) with other systems dynamics and optimization components. Prior attempts to resurrect GWDSS were unsuccessful due to problems commonly experienced with scientific software, such as insufficient documentation and backward compatibility issues. This research experimented with two resurrection strategies: 1) initially, a virtual machine (VM) approach to handle compatibility issues, which encountered similar obstacles, along with a lack of provenance that would yield questionable results and possibly inherent problems with the codebase due to uncurated changes made in the past; and 2) subsequently, writing a new application that replicates and improves on many of the old functionalities of GWDSS, leveraging high-performance computing for batch processing of data while seeking to integrate new web-based technologies for data visualization. Ultimately, research efforts informed the design and preparation of an ideal architecture that uses an open-source framework and technology stack and enables users to easily access and use distributed data systems.