Browsing by Subject "Program synthesis"
Now showing 1 - 14 of 14
Item An efficient programming-by-example framework (2019-08)
Wang, Xinyu; Dillig, Isil; Durrett, Gregory; Pingali, Keshav; Jhala, Ranjit; Naik, Mayur
Due to the ubiquity of computing, programming has started to become an essential skill for an increasing number of people, including data scientists, financial analysts, and spreadsheet users. While it is well known that building any complex and reliable software is difficult, writing even simple scripts is challenging for novices with no formal programming background. Therefore, there is an increasing need for technology that can provide basic programming support to non-expert computer end-users. Program synthesis, as a technique for generating programs from high-level specifications such as input-output examples, has been used to automate many real-world programming tasks in a number of application domains such as spreadsheet programming and data science. However, developing specialized synthesizers for these application domains is notoriously hard. This dissertation aims to make the development of program synthesizers easier so that we can expand the applicability of program synthesis to more application domains. In particular, this dissertation describes a programming-by-example framework that is both generic and efficient. This framework can be applied broadly to automating tasks across different application domains. It is also efficient and achieves orders-of-magnitude improvements in synthesis speed compared to existing state-of-the-art techniques.

Item An experimental evaluation and possible extensions of SyPet (2017-05-05)
Yang, Zijiang, active 21st century; Khurshid, Sarfraz; Dillig, Isil
Program synthesis is an automated programming technique that automatically constructs a program which satisfies given specifications.
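Programming by example of this kind can be illustrated with a deliberately tiny enumerative synthesizer: it searches a toy DSL of string operations, shortest programs first, for a composition consistent with every input-output example. The DSL and its primitives are invented for illustration and are not drawn from the framework described above.

```python
# A minimal sketch of programming by example (PBE): enumerate programs in a
# tiny string-transformation DSL until one agrees with all examples.
from itertools import product

# Primitive operations of the toy DSL (illustrative only).
PRIMITIVES = {
    "lower":  str.lower,
    "upper":  str.upper,
    "strip":  str.strip,
    "first3": lambda s: s[:3],
}

def run(program, s):
    """Apply a sequence of primitive names to the input string."""
    for op in program:
        s = PRIMITIVES[op](s)
    return s

def synthesize(examples, max_len=3):
    """Return the first primitive sequence consistent with every example."""
    for length in range(1, max_len + 1):
        for program in product(PRIMITIVES, repeat=length):
            if all(run(program, i) == o for i, o in examples):
                return program
    return None

prog = synthesize([("  Austin ", "AUS"), ("texas", "TEX")])
```

Real systems replace the brute-force loop with aggressive pruning and ranking, but the contract is the same: examples in, consistent program out.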
SyPet is a recently published novel component-based synthesis tool that assembles a straight-line Java method body that invokes a sequence of methods from a given set of libraries to implement desired functionality that is defined by a given test suite. In this report, we experimentally evaluate the correctness and performance of the publicly available SyPet implementation, at the black-box level, focusing on the size of test suites. We then demonstrate how SyPet can be extended to support some other applications, such as synthesizing non-straight-line methods and program repair. Finally, we conjecture an alternative technique that is conceptually simpler for synthesizing straight-line methods and present a few initial experimental results.

Item Automated synthesis of data extraction and transformation programs (2017-12-11)
Yaghmazadeh, Navid; Dillig, Isil; Pingali, Keshav; Mooney, Raymond; Solar-Lezama, Armando
Due to the abundance of data in today’s data-rich world, end-users increasingly need to perform various data extraction and transformation tasks. While many of these tedious tasks can be performed in a programmatic way, most end-users lack the required programming expertise to automate them and end up spending their valuable time manually performing various data-related tasks. The field of program synthesis aims to overcome this problem by automatically generating programs from informal specifications, such as input-output examples or natural language. This dissertation focuses on the design and implementation of new systems for automating important classes of data transformation and extraction tasks. It introduces solutions for automating data manipulation tasks on fully-structured data formats like relational tables, or on semi-structured formats such as XML and JSON documents. First, we describe a novel algorithm for synthesizing hierarchical data transformations from input-output examples.
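The component-based style of synthesis that SyPet embodies can be sketched, under heavy simplification, as type-directed enumeration of straight-line call sequences validated against a test suite. The component pool and types below are hypothetical, not SyPet's actual API model.

```python
# A hedged sketch of component-based synthesis: chain library components
# whose types line up, and accept the first straight-line sequence that
# passes the given tests. All components here are invented for illustration.
COMPONENTS = [
    # (name, input_type, output_type, implementation)
    ("parse_int", str, int, int),
    ("to_str",    int, str, str),
    ("double",    int, int, lambda x: 2 * x),
    ("length",    str, int, len),
]

def chains(in_type, out_type, depth):
    """Yield type-correct component sequences from in_type to out_type."""
    if depth == 0:
        return
    for name, t_in, t_out, fn in COMPONENTS:
        if t_in is not in_type:
            continue
        if t_out is out_type:
            yield [(name, fn)]
        for rest in chains(t_out, out_type, depth - 1):
            yield [(name, fn)] + rest

def _apply(fns, x):
    for fn in fns:
        x = fn(x)
    return x

def synthesize(in_type, out_type, tests, depth=3):
    """Return the names of the first type-correct chain passing all tests."""
    for chain in chains(in_type, out_type, depth):
        fns = [fn for _, fn in chain]
        if all(_apply(fns, x) == y for x, y in tests):
            return [name for name, _ in chain]
    return None

found = synthesize(str, int, [("21", 42), ("5", 10)])
```

Type-directed pruning is what keeps such a search tractable: sequences whose types do not connect are never enumerated, let alone executed.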
A key novelty of our approach is that it reduces the synthesis of tree transformations to the simpler problem of synthesizing transformations over the paths of the tree. We also describe a new and effective algorithm for learning path transformations that combines logical SMT-based reasoning with machine learning techniques based on decision trees. Next, we present a new methodology for learning programs that migrate tree-structured documents to relational table representations from input-output examples. Our approach achieves its goal by decomposing the synthesis task into the two subproblems of (A) learning the column extraction logic and (B) learning the row extraction logic. We propose a technique for learning column extraction programs using deterministic finite automata, and a new algorithm for predicate learning which combines integer linear programming and logic minimization. Finally, we address the problem of automating data extraction tasks from natural language. Specifically, we focus on data retrieval from relational databases and describe a novel approach for learning SQL queries from English descriptions. The method we describe is fully automatic and database-agnostic (i.e., it does not require customization for each database). Our method combines semantic parsing techniques from the NLP community with novel programming languages ideas involving probabilistic type inhabitation and automated sketch repair.

Item Automated testing and sketching of Alloy models (2017-05)
Sullivan, Allison; Khurshid, Sarfraz; Gligoric, Milos; Julien, Christine; Leonard, Elizabeth; Perry, Dewayne E.
Models of software systems, e.g., designs, play an important role in the development of reliable and dependable systems. However, writing correct designs is hard. What makes formulating desired design properties particularly hard is the common lack of intuitive and effective techniques for validating their correctness.
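The reduction from tree transformations to path transformations described above can be illustrated on nested dictionaries: flatten the tree to (path, value) pairs, rewrite each path, and rebuild. This toy is not the dissertation's algorithm; the document and rewrite are invented examples.

```python
# Illustrative sketch: lift a path-level rewrite to a whole-tree transform.
def paths(tree, prefix=()):
    """Flatten a nested dict into (path, leaf_value) pairs."""
    for key, value in tree.items():
        if isinstance(value, dict):
            yield from paths(value, prefix + (key,))
        else:
            yield prefix + (key,), value

def rebuild(pairs):
    """Inverse of paths(): reassemble a nested dict from (path, value) pairs."""
    tree = {}
    for path, value in pairs:
        node = tree
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = value
    return tree

def transform(tree, path_fn):
    """Apply a path-level rewrite to every root-to-leaf path of the tree."""
    return rebuild((path_fn(p), v) for p, v in paths(tree))

doc = {"person": {"name": "Ada", "addr": {"city": "London"}}}
# Example path rewrite: drop the intermediate "addr" level.
flat = transform(doc, lambda p: tuple(k for k in p if k != "addr"))
```

The benefit of the reduction is that the synthesizer only has to learn functions over flat path sequences rather than over arbitrary tree shapes.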
Despite significant advances in developing notations and languages for writing designs, techniques for validating them are often not as advanced and pose an undue burden on the users. Our thesis is that some foundational and widely used techniques in software testing -- the most common methodology for validating quality of code -- can provide a basis to develop a familiar and effective new approach for checking the correctness of designs. We lay a new foundation for testing designs in the traditional spirit of testing code. Our specific focus is the Alloy language, which is particularly useful for building models of software systems. Alloy is a first-order language based on relations and is supported by a SAT-based analysis tool, which provides a natural backend for building new analyses for Alloy. In recent work, we defined a foundation for testing Alloy models in the spirit of the widely practiced unit testing methodology popularized by the xUnit family of frameworks, such as JUnit for Java. Specifically, AUnit, our testing framework for Alloy, defines test cases, test outcomes (pass/fail), and model coverage, and forms a foundation that enables development of new testing techniques for Alloy. To provide a more robust validation environment for Alloy, we build on the AUnit foundation in four primary ways. One, we introduce test generation algorithms, which automate creation of test inputs, which is traditionally one of the most costly steps in testing. Two, we introduce synthesis of parts of Alloy models using sketching by enumeration and constraint checking, where the user writes a partial Alloy model and outlines the expected behavior using tests, and our sketching framework completes the Alloy model for the user. Three, we investigate optimizations to improve the efficacy of our core model sketching techniques targeting improved scalability. 
Four, we introduce a second approach for sketching Alloy models that incorporates attributes from our test generation efforts as well as our initial sketching framework and uses equivalent formulas to outline expected behavior rather than tests. To evaluate our techniques, we use a suite of well-studied Alloy models, including some from Alloy's standard distribution, as well as models written by graduate students as part of their homework.

Item The diagrammatic specification and automatic generation of geometry subroutines (2010-05)
Li, Yulin, Ph. D.; Novak, Gordon S.; Porter, Bruce; Boyer, Robert; Gouda, Mohamed; Kant, Elaine
Programming has advanced a great deal since the appearance of the stored-program architecture. Through the successive generations of machine codes, assembly languages, high-level languages, and object-oriented languages, the drive has been toward program descriptions that express more meaning in a shorter space. This trend continues today with domain-specific languages. However, conventional languages rely on a textual formalism (commands, statements, lines of code) to capture the programmer's intent, which, regardless of its level of abstraction, imposes inevitable overheads. Before successful programming activities can take place, the syntax has to be mastered, names and keywords memorized, the library routines learned, and so on. Existing visual programming languages avoid some of these overheads, but do not release the programmer from the task of specifying the program logic, which consumes the main portion of programming time and also is the major source of difficult bugs. Our work aims to minimize the demands a formalism imposes on the programmer of geometric subroutines beyond what is inherent in the problem itself. Our approach frees the programmer from syntactic constraints and generates logically correct programs automatically from program descriptions in the form of diagrams.
To write a program, the programmer simply draws a few diagrams to depict the problem context and specifies all the necessary parameters through menu operations. Diagrams are succinct, easy to learn, and intuitive to use. They are much easier to modify than code, and they help the user visualize and analyze the problem, in addition to providing information to the computer. Furthermore, diagrams describe a situation rather than a task and thus are reusable for different tasks—in general, a single diagram can generate many programs. For these reasons, we have chosen diagrams as the main specification mechanism. In addition, we leverage the power of automatic inference to reason about diagrams and generic components—the building blocks of our programs—and discover the logic for assembling these components into correct programs. To facilitate inference, symbolic facts encode entities present in the diagrams, their spatial relationships, and the preconditions and effects of reusable components. We have developed a reference implementation and tested it on a number of real-world examples to demonstrate the feasibility and efficacy of our approach.

Item Exploring synergies between program synthesis and machine learning (2022-07-01)
Singh, Shikhar; Khurshid, Sarfraz; Garg, Vijay K; Vikalo, Haris; Julien, Christine; Zhang, Lingming
Program synthesis describes a family of techniques that enable the automatic generation of code according to user-provided specifications. Search-based program synthesis offers numerous benefits in developing reliable and performant software and has successfully been deployed in several domains. However, searching through intractable program spaces remains a daunting challenge, precluding the widespread adoption of this technique. Machine learning has had remarkable success in accomplishing incredibly challenging tasks, especially in computer vision and language translation.
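The precondition-and-effect reasoning behind the diagram-based assembly described above can be sketched as forward chaining: apply any component whose preconditions hold until the goal fact is established. Components and fact names below are invented for illustration.

```python
# A toy sketch of assembling reusable components into a correct program by
# reasoning over preconditions and effects. Not the dissertation's system.
COMPONENTS = [
    # (name, preconditions, effects)
    ("draw_line",          {"have_points"},                 {"have_line"}),
    ("find_midpoint",      {"have_line"},                   {"have_midpoint"}),
    ("drop_perpendicular", {"have_line", "have_midpoint"},  {"have_bisector"}),
]

def plan(initial, goal):
    """Forward-chain: apply any component whose preconditions hold and whose
    effects add something new, until the goal fact is established."""
    facts, steps = set(initial), []
    while goal not in facts:
        for name, pre, eff in COMPONENTS:
            if pre <= facts and not eff <= facts:
                facts |= eff
                steps.append(name)
                break
        else:
            return None  # no applicable component: goal unreachable
    return steps

program = plan({"have_points"}, "have_bisector")
```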
This thesis introduces two learning-guided techniques that make the program synthesis search more efficient in navigating the program space. In the domain of superoptimization, a solver-aided, search-based code optimization technique, we develop machine learning models that direct the enumerative search toward program spaces where an optimal solution is likely to exist. By integrating our models with a modern superoptimizer, we can make the search more efficient by reducing enumeration overheads. Deep learning compilers also incorporate a search-based approach to map computations onto the target hardware. These mappings, also called schedules, specify how the underlying loop nests corresponding to each layer are mapped onto the hardware. The search must navigate an intractable space of schedules to determine an efficient mapping for a given deep learning model. We develop a graph convolutional network-based analytical cost model which enables the schedule search to determine the performance of a candidate schedule without depending on slow and expensive executions on hardware. Incorporating a graph-based representation enables us to surpass the accuracy of existing cost models used in state-of-the-art compilers. In addition to learning-guided techniques, we also explore parallel execution to address the problem of state-space explosion. We study this in the context of symbolic execution, a search-based software testing technique. In this setting, multiple executors can work on non-overlapping regions of a program, thus covering more program space and finding bugs faster than a single instance. While machine learning can help alleviate challenges encountered in program synthesis, we can also utilize program synthesis to construct machine learning models which provide certain correctness guarantees. We propose a novel approach to synthesizing binary classifiers, built using neural networks, from user-provided specifications.
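The role a learned cost model plays in the schedule search described above can be sketched in miniature: instead of measuring every candidate on hardware, a predictor scores candidates and the search explores only the most promising ones. The "model" here is a hand-written stand-in for a learned predictor, and the schedule space is invented.

```python
# A minimal sketch of cost-model-guided search over loop-nest schedules.
def predicted_cost(schedule):
    """Stand-in for a learned cost model: penalize tiles far from 32 and
    penalize not unrolling (purely illustrative heuristics)."""
    tile, unroll = schedule
    return abs(tile - 32) + (0 if unroll else 10)

def best_first_search(candidates, budget):
    """Measure only the `budget` candidates the model ranks best."""
    ranked = sorted(candidates, key=predicted_cost)
    return ranked[:budget]

# Candidate (tile_size, unroll?) schedules for a hypothetical loop nest.
space = [(t, u) for t in (4, 8, 16, 32, 64) for u in (False, True)]
shortlist = best_first_search(space, budget=3)
```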
By adopting a synthesis-based approach instead of traditional learning using datasets, we can create networks that are correct by construction. The results show that our technique can create networks that are difficult to train using data.

Item Formal methods for database schema refactoring (2020-09-09)
Wang, Yuepeng; Dillig, Isil; Cook, William; Pillai, Vijaychidambaram Velayudhan; Solar-Lezama, Armando; Jermaine, Christopher
Database applications typically undergo several schema refactorings during their life cycle due to performance or maintainability reasons. Such refactorings not only require migrating the underlying data to a new schema but also re-implementing large chunks of the code that query and update the database. The code and data migration tasks implied by schema refactoring are notoriously challenging to developers, as they are time-consuming and error-prone. Motivated by these challenges, this dissertation presents formal methods techniques to help developers correctly and easily evolve database applications during schema refactoring. Specifically, it first describes how to verify equivalence between database applications that operate over different schemas, such as those that arise before and after schema refactoring. Next, it presents a novel technique that can automatically synthesize an equivalent version of the database program that operates over the new schema. Finally, it describes a synthesis technique that helps developers migrate data between schemas using input-output examples. We also implement research prototypes based on the proposed techniques and perform experiments to evaluate their effectiveness and efficiency.
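Example-driven data migration of the kind described above can be sketched with a deliberately simple hypothesis space: guess which old-schema column feeds each new-schema column from a few example rows, then apply the learned mapping to the rest of the table. The schemas and column names are invented for illustration.

```python
# A hedged sketch of synthesizing a schema migration from input-output rows.
def learn_migration(old_rows, new_rows):
    """Map each new-schema column to an old-schema column that agrees on
    every example row (deliberately simple: no transformations, no joins)."""
    mapping = {}
    for new_col in new_rows[0]:
        for old_col in old_rows[0]:
            if all(n[new_col] == o[old_col] for o, n in zip(old_rows, new_rows)):
                mapping[new_col] = old_col
                break
        else:
            raise ValueError(f"no source column found for {new_col!r}")
    return mapping

def migrate(mapping, row):
    """Apply a learned column mapping to one old-schema row."""
    return {new: row[old] for new, old in mapping.items()}

old = [{"fname": "Ada", "ln": "Lovelace"}, {"fname": "Alan", "ln": "Turing"}]
new = [{"first": "Ada", "last": "Lovelace"}, {"first": "Alan", "last": "Turing"}]
m = learn_migration(old, new)
```

A realistic synthesizer searches a far richer space (value transformations, splits, joins), but the example-driven contract is the same.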
The experimental results demonstrate that our techniques are effective for code and data migration during schema refactoring, and they are more efficient than several baselines and state-of-the-art approaches.

Item Leveraging program synthesis for robust long-term robot autonomy via interactive learning and adaptation (2022-03-15)
Holtz, Jarrett; Biswas, Joydeep (Assistant professor); Guha, Arjun; Niekum, Scott; Chaudhuri, Swarat
For autonomous robots to become as pervasive in uncontrolled human environments and our everyday lives as they are on campuses and in warehouses, they need to be deployable by end-users for various tasks. End-user deployment of autonomous robots over the long term requires robust behaviors that can leverage fundamental robot capabilities to achieve diverse goals while subject to various domains and user preferences. Achieving this goal requires a system for designing and adapting behaviors that is intuitive, data-efficient, easy to integrate, and able to handle changes in user-imposed requirements over long deployments. State-of-the-art approaches to designing robot behaviors broadly fall into three categories: reinforcement learning, inverse reinforcement learning, and learning from demonstration. State-of-the-art approaches for these techniques widely leverage deep neural networks (DNNs) as function approximators to represent either the complete behavior, an optimal reward function, or, in actor-critic approaches, both a value function and the behavior. DNNs are a powerful tool for function approximation and have been the catalyst for significant successes across a wide range of learning applications.
While DNN-based approaches are broadly applicable, they suffer from three key weaknesses when used for end-user robot behavior design and adaptation: 1) DNNs are black-box behavior representations and thus are opaque to the user and difficult to understand or verify, 2) learning with DNNs is extremely data-intensive, often requiring that data be collected in simulation, and 3) DNN behaviors are difficult to adapt and sensitive to changing domains or user preferences, such as when transferring from simulation to the real world. In this thesis, we present approaches that leverage program synthesis as an alternative function approximator in learning from demonstration, approximating both behaviors and reward functions. Program synthesis as a function approximator addresses some limitations of DNN-based approaches by yielding human-readable behavior representations that are amenable to program repair and parameter optimization for adaptation, and that can leverage the well-structured space of programs to learn behaviors in a data-efficient manner. However, due to two primary factors, existing state-of-the-art synthesis approaches are insufficient to learn general robot programs. First, these approaches are not designed to handle non-linear real arithmetic, vector operations, or dimensioned quantities, all commonly found in robot programs. Second, synthesis techniques are primarily limited by their ability to scale with the search space of potential programs, such that synthesis of many reasonably complex behaviors is intractable for existing approaches. To address the goal of end-user-guided robot behavior learning and adaptation, we present Physics-Informed Program Synthesis (PIPS) as part of a learning from demonstration and adaptation approach to lifelong robot learning.
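The dimensioned-quantity reasoning mentioned above can be illustrated in miniature: track each expression's physical dimensions as an exponent vector and prune dimensionally inconsistent candidates during enumeration. This toy, with invented variable names, is not PIPS itself.

```python
# Illustrative dimensional pruning for synthesis over physical quantities.
# A dimension is an (length_exponent, time_exponent) pair.
def div(d1, d2):
    """Dimension of a quotient: subtract exponents."""
    return (d1[0] - d2[0], d1[1] - d2[1])

def comparable(d1, d2):
    """Comparison/addition is only dimensionally valid between like quantities."""
    return d1 == d2

# Variables a robot program might use (hypothetical).
DIMS = {"distance": (1, 0), "elapsed": (0, 1), "max_speed": (1, -1)}

# Candidate guard: distance / elapsed > max_speed  -- dimensionally valid,
# since both sides are length/time.
lhs = div(DIMS["distance"], DIMS["elapsed"])
valid = comparable(lhs, DIMS["max_speed"])

# Candidate guard: distance > elapsed  -- rejected, mixes length with time,
# so the synthesizer never needs to evaluate it on demonstrations.
invalid = comparable(DIMS["distance"], DIMS["elapsed"])
```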
Towards this goal, this thesis presents the following contributions: 1) an algorithm for PIPS that addresses limitations of program synthesis for robotics by reasoning about physical quantities, 2) algorithms for learning from demonstration that leverage PIPS to learn robot behaviors as human-readable programs, 3) an approach to guiding lifelong robot learning by leveraging the structure of programmatic policies and demonstrations, 4) program repair and synthesis techniques for adapting these learned policies from iterative user guidance, and finally, 5) extensive evaluation results in the social robot navigation domain, across simulated and real-world deployments, that compare PIPS-based learning to DNN-based and traditional approaches.

Item Practical formal methods for software analysis and development (2021-03-14)
Ferles, Konstantinos; Dillig, Isil; Chaudhuri, Swarat; Bornholt, James; Aiken, Alex; McMillan, Kenneth
Formal methods techniques for improving software correctness and reliability fall into two categories, namely, program analysis and program synthesis. Program analysis techniques automatically find defects (or prove the absence thereof) in existing software. In a dual way, program synthesis techniques generate correct-by-construction code from high-level specifications. In this thesis, we propose an array of formal methods techniques that further improve the state of the art of program analysis, while also applying program synthesis in previously unexplored domains. Broadly speaking, the long history of program analysis can be summarized as the battle between precision and scalability. As a first step in this thesis, we propose a technique called program trimming that helps arbitrary safety-checking tools achieve a better balance between precision and scalability. In a nutshell, program trimming semantically simplifies the program by eliminating provably correct execution paths.
Because the number of execution paths is a typical scalability bottleneck for some techniques (e.g., symbolic execution) and a source of imprecision for others (e.g., abstract interpretation), program trimming can be used to improve both the precision and the scalability of program analysis tools. We have implemented our technique in a tool called Trimmer and showed that Trimmer significantly improves the behavior of two different program analyzers over a set of challenging verification benchmarks. Program synthesis, on the other hand, has only recently started to appear in more practical aspects of software development. Formal methods techniques in this area aim to ease programming for several domains while targeting a broad range of programmers, from novices to experts. In this thesis, we propose a novel program synthesis technique, implemented in a tool called Expresso, that aids experts in writing correct and efficient concurrent programs. Specifically, Expresso allows programmers to implement concurrent programs using the implicit-signaling paradigm, where the programmer specifies the condition under which a thread should block but does not need to explicitly signal other threads when this condition becomes true. Given such an implicit-signaling program, Expresso generates an efficient and correct-by-construction program that does not contain deadlocks caused by improper signaling. Our evaluation shows that Expresso is able to synthesize efficient implementations of real-world monitors with performance comparable to implementations written by programming experts. Finally, we observe that certain monitors require their clients to use their API in a manner that conforms to a context-free specification. Statically verifying that a client conforms to a context-free API protocol cannot be handled by any prior technique.
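A context-free API protocol of the kind just described can be illustrated with a balanced acquire/release discipline: proper nesting to unbounded depth is not expressible by a finite-state (regular) checker, but a pushdown-style monitor over a call trace checks it easily. The call names are illustrative.

```python
# A toy illustration of a context-free API protocol: every acquire() must be
# matched by a release(), properly nested. A depth counter plays the role of
# a pushdown stack for this single-symbol protocol.
def conforms(trace):
    """Check that a trace of 'acquire'/'release' calls is well nested."""
    depth = 0
    for call in trace:
        if call == "acquire":
            depth += 1
        elif call == "release":
            depth -= 1
            if depth < 0:       # release without a matching acquire
                return False
    return depth == 0           # everything acquired was released

ok = conforms(["acquire", "acquire", "release", "release"])
bad = conforms(["acquire", "release", "release"])
```

Static verification is harder than this dynamic check: the tool must prove that every possible trace of the client conforms, not just one observed run.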
To rectify this and ensure that client applications properly use such protocols, we propose CFPChecker, a tool that verifies the correct usage of context-free API protocols. Given a program, P, and an API protocol expressed as a parameterized context-free grammar, G_S, CFPChecker either proves that P conforms to G_S or provides a genuine program trace that demonstrates an API misuse. We have evaluated our proposed technique on a wide variety of popular context-free API protocols and several clients drawn from popular open-source applications. Our experiments show that CFPChecker is effective both in verifying correct usage and in finding counterexamples of context-free API protocols.

Item Program synthesis using statistical models and logical reasoning (2018-08-14)
Feng, Yu, Ph. D.; Dillig, Isil; Mooney, Raymond; Krähenbühl, Philipp; Aiken, Alex
Complex APIs in new frameworks (e.g., Spark, R, TensorFlow) have imposed steep learning curves on everyone, especially people with limited programming backgrounds. For instance, due to the messy nature of data in different application domains, data scientists spend close to 80% of their time on data wrangling tasks, which are considered the "janitor work" of data science. Similarly, software engineers spend hours or even days learning how to use APIs through official documentation or examples from online forums. Program synthesis has the potential to automate complex tasks that involve API usage by providing powerful search algorithms to look for executable programs that satisfy a given specification (input-output examples, partial programs, formal specs, etc.). However, the biggest barrier to a practical synthesizer is the size of the search space, which grows strikingly fast with the complexity of the programs and the size of the targeted APIs. To address the above issue, this dissertation focuses on developing algorithms that push the frontiers of program synthesis.
First, we propose a type-directed graph reachability algorithm in SyPet, a synthesizer for assembling programs from complex APIs. Second, we show how to combine enumerative search with lightweight constraint-based deduction in Morpheus, a synthesizer for automating real-world data wrangling tasks from input-output examples. Finally, we generalize the previous approaches to develop a novel conflict-driven synthesis algorithm that can learn from past mistakes.

Item Programmatic reinforcement learning (2021-08)
Verma, Abhinav; Chaudhuri, Swarat; Alur, Rajeev; Biswas, Joydeep; Stone, Peter
Programmatic reinforcement learning is the study of learning algorithms that can leverage partial symbolic knowledge provided in expressive high-level domain-specific languages. The aim of such algorithms is to learn agents that are reliable, secure, and transparent. This means that such agents can be expected to learn desirable behaviors with limited data, while provably maintaining some essential correctness invariant and providing insights into their decision mechanisms that can be understood by humans. In contrast to the popular deep reinforcement learning paradigm, where the learned policy is represented by a neural network, programmatic representations are more easily interpreted and more amenable to verification by scalable symbolic methods. The interpretability and verifiability of these policies provide the opportunity to deploy reinforcement learning based solutions in safety-critical environments. In this dissertation, we formalize the concept of programmatic reinforcement learning and introduce algorithms that integrate policy learning with principled mechanisms that incorporate domain knowledge.
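What a programmatic policy of the kind described above looks like can be shown concretely: a short, human-readable controller rather than an opaque network. The toy cart domain and its thresholds are invented; they stand in for parameters a programmatic RL algorithm could tune or verify.

```python
# An illustrative programmatic policy: a readable controller for a toy cart.
# Unlike a neural policy, every branch and constant can be inspected,
# verified, or repaired individually.
def policy(position, velocity):
    """Push toward the origin, but brake when moving fast."""
    if abs(velocity) > 1.0:
        return -0.5 * velocity      # damp excessive speed
    return -0.8 * position          # steer toward the target

action = policy(position=2.0, velocity=0.3)
```

Verifying a property such as "the action always opposes a large velocity" reduces to reasoning about two small arithmetic branches, which symbolic tools handle easily.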
An analysis of the presented algorithms demonstrates that they possess robust theoretical guarantees and are capable of impressive performance in challenging reinforcement learning environments.

Item Theory and techniques for synthesizing efficient breadth-first search algorithms (2012-08)
Nedunuri, Srinivas; Cook, William Randall; Batory, Don; Baxter, Ira; Pingali, Keshav; Smith, Douglas R.
The development of efficient algorithms to solve a wide variety of combinatorial and planning problems is a significant achievement in computer science. Traditionally, each algorithm is developed individually, based on flashes of insight or experience, and then (optionally) verified for correctness. While computer science has formalized the analysis and verification of algorithms, the process of algorithm development remains largely ad hoc. The ad hoc nature of algorithm development is especially limiting when developing algorithms for a family of related problems. Guided program synthesis is an existing methodology for the systematic development of algorithms. Specific algorithms are viewed as instances of very general algorithm schemas. For example, the Global Search schema generalizes traditional branch-and-bound search and includes both depth-first and breadth-first strategies. Algorithm development involves systematic specialization of the algorithm schema based on problem-specific constraints to create efficient algorithms that are correct by construction, obviating the need for a separate verification step. Guided program synthesis has been applied to a wide range of algorithms, but there is still no systematic process for the synthesis of large search programs such as AI planners. Our first contribution is the specialization of Global Search to a class we call Efficient Breadth-First Search (EBFS), by incorporating dominance relations to constrain the size of the frontier of the search to be polynomially bounded.
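A concrete instance of dominance-pruned breadth-first search (illustrative, not the dissertation's formalism): selecting jobs under a time budget, where a partial solution dominates another at the same depth if it has used no more time and earned at least as much profit, so the dominated one can be discarded and the frontier stays small.

```python
# Breadth-first search with dominance pruning on a toy job-selection problem.
JOBS = [(3, 5), (2, 3), (2, 4), (1, 1)]   # (duration, profit), invented data
BUDGET = 5

def dominates(a, b):
    """(time, profit) a dominates b: no more time used, at least as much profit."""
    return a[0] <= b[0] and a[1] >= b[1]

def best_profit():
    frontier = [(0, 0)]                    # (time_used, profit) before any job
    for duration, profit in JOBS:
        # Breadth-first step: either skip or take the current job.
        extended = frontier + [
            (t + duration, p + profit)
            for t, p in frontier if t + duration <= BUDGET
        ]
        # Dominance pruning: drop any state strictly covered by another.
        frontier = [s for s in extended
                    if not any(o != s and dominates(o, s) for o in extended)]
    return max(p for _, p in frontier)

result = best_profit()
```

Without pruning, the frontier doubles at every level; dominance keeps only Pareto-optimal (time, profit) states, which is what bounds its growth.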
Dominance relations allow two search spaces to be compared to determine whether one dominates the other, thus allowing the dominated space to be eliminated from the search. We further show that EBFS is an effective characterization of greedy algorithms when the breadth bound is set to one. Surprisingly, the resulting characterization is more general than the well-known characterization of greedy algorithms, namely the greedy algorithm parametrized over algebraic structures called greedoids. Our second contribution is a methodology for systematically deriving dominance relations, not just for individual problems but for families of related problems. The techniques are illustrated on numerous well-known problems. Combining this with the program schema for EBFS results in efficient greedy algorithms. Our third contribution is the application of the theory and methodology to the practical problem of synthesizing fast planners. Nearly all the state-of-the-art planners in the planning literature are heuristic domain-independent planners. They generally do not scale well, and their space requirements also become quite prohibitive. Planners such as TLPlan that incorporate domain-specific information in the form of control rules are orders of magnitude faster. However, devising the control rules is a labor-intensive task that requires domain expertise and insight, and the correctness of the rules is not guaranteed. We introduce a method by which domain-specific dominance relations can be systematically derived and turned into control rules, and we demonstrate the method on a planning problem (Logistics).

Item Unifying program repair and program synthesis (2018-07-25)
Hua, Jinru; Khurshid, Sarfraz; Garg, Vijay; Julien, Christine; Perry, Dewayne; Prasad, Mukul
The last few years have seen much progress in two related but traditionally disjoint areas of research: program repair and program synthesis.
Program repair is the problem of locating and removing faults in a given faulty program. Program synthesis is the problem of generating programs automatically from high-level specifications. While innovation in each of these two research areas has been impressive, the techniques developed within one area have largely been confined to that area. Our insight is that the unification of program repair and program synthesis holds a key to developing well-founded, systematic, and scalable tools for repairing complex defects. The contribution of this dissertation is threefold: a synthesis-based program repair approach, SketchRep, based on propositional satisfiability solving; an execution-driven synthesis engine, EdSketch, for Java; and a program repair approach, SketchFix, that repairs defects at the AST node level with execution-driven sketching. SketchRep is a debugging approach that reduces the problem of program repair to a subproblem of program synthesis, namely program sketching, in which the user writes a sketch, i.e., an incomplete program that has holes, and automated tools complete the sketch with respect to a given specification or reference implementation. Our program repair approach translates the given faulty program to a sketch and leverages an off-the-shelf inductive synthesizer to fill in the holes of the incomplete program with respect to the given test suite. EdSketch is an execution-driven synthesis engine for Java. Traditional solutions to the sketching problem perform a translation to SAT formulas. While effective for a range of programs, when applied to real applications such translation-based approaches can make it impractical to translate all relevant libraries to SAT. Instead of transforming the program into logic formulas for SAT solvers, EdSketch explores the actual program behaviors in the presence of libraries and provides a practical solution to sketching small parts of real-world applications.
SketchFix is a repair technique that generates candidate fixes on demand during test execution. It translates faulty programs to sketches, compiles each sketch once even though it may represent a large number of concrete candidates, and lazily initializes the candidates of the sketches while validating them against the test execution. The dissertation describes each technique and presents experimental results that demonstrate its efficacy.

Item Zero-One Integer Linear Programming for program synthesis (2016-05)
Ashok Kaushik, Arati; Dillig, Thomas; Dillig, Isil
Program synthesis techniques generate code automatically for a given specification, while code reuse techniques adapt existing code to suit the user's requirements. These methods can be used to help developers implement hard-to-write functions which they find difficult to code by themselves. At the same time, they can also be used to automatically synthesize uninteresting glue code, thereby enabling programmers to concentrate on their own key goals. In this thesis, we describe how 0-1 Integer Linear Programming (ILP) can be utilized for program synthesis and code reuse, by discussing its employment in two recent applications.
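The flavor of 0-1 ILP in a synthesis setting can be illustrated with a toy component-selection problem: each binary variable decides whether a library component is used, the objective minimizes total cost, and the constraints require every needed capability to be covered. The components are invented, and the tiny instance is solved by enumerating 0/1 assignments rather than calling a real ILP solver.

```python
# A toy 0-1 integer linear program for component selection, solved by
# exhaustive enumeration of 0/1 assignments (fine at this scale; real
# systems hand the same formulation to an ILP solver).
from itertools import product

COMPONENTS = {          # name: (cost, capabilities provided) -- hypothetical
    "csv_reader":  (2, {"read"}),
    "json_reader": (3, {"read", "parse"}),
    "formatter":   (1, {"write"}),
    "suite":       (5, {"read", "parse", "write"}),
}
REQUIRED = {"read", "parse", "write"}

def solve():
    """Find the cheapest 0/1 selection covering every required capability."""
    names = list(COMPONENTS)
    best, best_cost = None, float("inf")
    for bits in product((0, 1), repeat=len(names)):
        chosen = [n for n, b in zip(names, bits) if b]
        covered = set().union(*(COMPONENTS[n][1] for n in chosen)) if chosen else set()
        cost = sum(COMPONENTS[n][0] for n in chosen)
        if covered >= REQUIRED and cost < best_cost:
            best, best_cost = chosen, cost
    return best, best_cost

selection, cost = solve()
```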