Machine learning for executable code in software testing and verification




Nie, Pengyu



Software testing and verification are essential for keeping software systems reliable and safe to use. However, writing and maintaining the code artifacts needed for testing and verification, i.e., tests and proofs, requires significant manual effort. Under pressure to deliver software in limited time, developers usually write tests and proofs much later than the code under test or verification, which leaves room for bugs in unchecked code. Recent advances in machine learning (ML) models, especially large language models (LLMs), can help reduce the manual effort of testing and verification: developers can use ML models' predictions to write tests and proofs faster. However, existing models understand and generate software code as natural-language text, ignoring the unique property of software being executable. Software execution is the process of a computer reading and acting on software code. Our insight is that ML models can benefit greatly from software execution, e.g., by inspecting and simulating the execution process, to generate more accurate predictions. Integrating execution with ML models is especially important for generating tests and proofs, because models that use only syntax-level information perform poorly on these tasks.

This dissertation presents the design and implementation of two execution-guided ML models that improve developers' productivity in writing testing and verification code: TeCo for test completion and Roosterize for lemma naming.

First, this dissertation introduces TeCo, which aids developers in completing the next statement when writing tests, a task we formalize as test completion. TeCo exploits code semantics extracted from test execution results (e.g., local variable types) and execution context (e.g., the last called method). TeCo also reranks the ML model's predictions by executing the predicted statements, prioritizing functionally correct predictions. Compared to existing code completion models that use only syntax-level information (including LLMs trained on massive code datasets), TeCo improves the accuracy of test completion by 29%.

Second, this dissertation introduces Roosterize, which suggests lemma names when developers write proofs using proof assistants such as Coq. Consistent coding conventions become increasingly important as verification projects based on proof assistants grow larger, but manually enforcing those conventions can be costly. Existing ML models for method naming, a similar task in other programming languages, extract and summarize information from code tokens; this is unsuitable for Coq, where lemma names should express semantic meaning that is not explicit in the code tokens. Roosterize instead leverages representations of the lemma from multiple phases of the proof assistant's execution, including syntax trees from the parser and elaborated terms from the kernel. Roosterize improves the accuracy of lemma naming by 39% over baselines.

Our findings support the thesis that integrating execution with ML models effectively improves their accuracy for testing and verification, enabling the development of trustworthy software with high-quality tests and proofs at less manual effort.
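To illustrate the execution-guided reranking idea described above for TeCo, here is a minimal, hypothetical sketch: model-generated candidate statements are executed in the test's context, and candidates that run without error are ranked ahead of those that fail. All names here (`executes_ok`, `rerank_by_execution`, the example candidates) are illustrative and do not reflect TeCo's actual implementation, which operates on Java tests.

```python
def executes_ok(stmt: str, context: dict) -> bool:
    """Return True if the candidate statement executes without error.

    The statement runs against a copy of the test's local context,
    so execution attempts do not pollute the original state.
    """
    try:
        exec(stmt, dict(context))
        return True
    except Exception:
        return False


def rerank_by_execution(candidates: list[str], context: dict) -> list[str]:
    """Stable-sort candidates so executable ones come first.

    Within the executable and non-executable groups, the model's
    original ranking is preserved (sorted() is stable).
    """
    return sorted(candidates, key=lambda s: not executes_ok(s, context))


# Example: a test context with one local variable, `items`.
ctx = {"items": [1, 2, 3]}
preds = [
    "assert items.size() == 3",  # Java-style call; raises in Python
    "assert len(items) == 3",    # executes without error
]
print(rerank_by_execution(preds, ctx)[0])  # → "assert len(items) == 3"
```

The stable sort is a deliberate choice: execution feedback acts only as a coarse filter for functional correctness, while the model's confidence still breaks ties among candidates that execute successfully.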

