Wagner, Luke A.
14 January 2010
To work effectively, programmers need tools to support their typical development activities, such as the creation, analysis, and transformation of source code. Analysis and transformation tools can be difficult to write for modern programming languages and, without a reusable framework, each tool must separately implement nontrivial algorithms like name lookup and type checking. This thesis describes an extension to one such framework, named Pivot, that focuses on programs written in C++. This extension, named Filter, assists the tool builder in traversal, case analysis, and lowering of the data structure representing C++ programs. Comparisons described in the thesis show a 2-4x code reduction when solving basic problems (e.g., searching for uses of a given declaration) using the extension and a performance overhead that drops below 2x for larger problems (e.g., checking C++ layout compatibility).
Web applications are one of the fastest growing types of software systems today. Structurally, they are composed out of two parts: the server-side, used for data-access and business logic, and the client-side used as a user-interface. In recent years, thanks to fast, modern web browsers and advanced scripting techniques, developers are building complex interfaces, and the client-side is playing an increasingly important role. From the user's perspective, the client-side offers a number of features. A feature is an abstract notion representing a distinguishable part of the system behavior. Similar features are often used in a large number of web applications, and facilitating their reuse would offer considerable benefits. However, the client-side technology stack does not offer any widely used structured reuse method, and code responsible for a feature is usually copy-pasted to the new application. Copy-paste reuse can be complex and error prone - usually it is hard to identify exactly the code responsible for a certain feature and introduce it into the new application without errors. The primary focus of the research described in this PhD thesis is to provide methods and tools for automatizing reuse in client-side web application development. This overarching problem leads to a number of sub-problems: i) how to identify code responsible for a particular feature; ii) how to include the code that implements a feature into an already existing application without breaking neither the code of the feature nor of the application; and iii) how to automatically generate sequences of user actions that accurately capture the behavior of a feature? In order to tackle these problems we have made the following contributions: i) a client-side dependency graph that is capable of capturing dependencies that exist in client-side web applications, ii) a method capable of identifying the exact code and resources that implement a particular feature, iii) a method that can introduce code from one application into another without introducing errors, and iv) a method for generating usage scenarios that cause the manifestation of a feature. Each contribution was evaluated a suite of web applications, and the evaluations have shown that each method is capable of performing its intended purpose.
Williamson, Eric Robert
09 July 2018
To learn about computer operating systems, students at Virginia Tech implement a command-line shell in their Computer Systems course. Successfully implementing the shell requires a deep understanding of operating system abstractions and interactions. Students often struggle with the project because subtle errors can take hours to debug. In this work, we developed two hybrid domain-specific analysis tools to pinpoint the root causes of student errors: EshMD and ShellTrace. The EshMD tool models common errors in the shell and checks the student code against those models. To accomplish this, it monitors the specific calls the program is making and correlates those with expected changes in its environment. Students' errors are shown directly in the source code. The concept of EshMD can be applied to other programming projects by observing and modeling common bugs during implementation. The ShellTrace tool dynamically creates a specification from a reference solution based on how the reference solution makes use of operating system resources and then uses this specification to check that a student solution is functionally identical. The ShellTrace concept can be applied to other programs that exhibit similar resource dependencies. We deployed these tools in an undergraduate computer systems class and evaluated our tools based on the number of bugs detected and the students' perceptions of usefulness. We found that the tools detected a significant number of bugs and that the majority of students that made use of the tools found them valuable in debugging their submissions. / Master of Science
24 July 2013
With the advent of multicore processors and an increased emphasis on parallel computing, parallel programming has become a fundamental requirement for achieving available performance. Parallel programming is inherently hard because, to reason about the correctness of a parallel program, programmers have to consider large numbers of interleavings of statements in different threads in the program. Though structured parallelism imposes some restrictions on the programmer, it is an attractive approach because it provides useful guarantees such as deadlock-freedom. However, data races remain a challenging source of bugs in parallel programs. Data races may occur only in few of the possible schedules of a parallel program, thereby making them extremely hard to detect, reproduce, and correct. In the past, dynamic data race detection algorithms have suffered from at least one of the following limitations: some algorithms have a worst-case linear space and time overhead, some algorithms are dependent on a specific scheduling technique, some algorithms generate false positives and false negatives, some have no empirical evaluation as yet, and some require sequential execution of the parallel program. In this thesis, we introduce dynamic data race detection algorithms for structured parallel programs that overcome past limitations. We present a race detection algorithm called ESP-bags that requires the input program to be executed sequentially and another algorithm called SPD3 that can execute the program in parallel. While the ESP-bags algorithm addresses all the above mentioned limitations except sequential execution, the SPD3 algorithm addresses the issue of sequential execution by scaling well across highly parallel shared memory multiprocessors. Our algorithms incur constant space overhead per memory location and time overhead that is independent of the number of processors on which the programs execute. Our race detection algorithms support a rich set of parallel constructs (including async, finish, isolated, and future) that are found in languages such as HJ, X10, and Cilk. Our algorithms for async, finish, and future are precise and sound for a given input. In the presence of isolated, our algorithms are precise but not sound. Our experiments show that our algorithms (for async, finish, and isolated) perform well in practice, incurring an average slowdown of under 3x over the original execution time on a suite of 15 benchmarks. SPD3 is the first practical dynamic race detection algorithm for async-finish parallel programs that can execute the input program in parallel and use constant space per memory location. This takes us closer to our goal of building dynamic data race detectors that can be "always-on" when developing parallel applications.
Qiu, Rui, active 21st century
18 September 2014
Symbolic execution is a powerful, systematic analysis that has received much visibility in the last decade. Scalability however remains a major challenge for symbolic execution. Compositional analysis is a well-known general purpose methodology for increasing scalability. This thesis introduces a new approach for compositional symbolic execution. Our key insight is that we can summarize each analyzed method as a memoization tree that captures the crucial elements of symbolic execution, and leverage these memoization trees to efficiently replay the symbolic execution of the corresponding methods with respect to their calling contexts. Memoization trees offer a natural way to compose in the presence of heap operations, which cannot be dealt with by previous work that uses logical formulas as summaries for composi- tional symbolic execution. Our approach also enables an efficient treatment of error traces by short-circuiting the execution of paths that lead to them. Our preliminary experimental evaluation based on a prototype implementation in Symbolic PathFinder shows promising results. / text
Hanks, Jeanne Marie
No description available.
Developing automated techniques for identifying a fault candidate set (i.e., subset of executed statements that contains the faulty code responsible for the failure during a program run), can greatly reduce the effort of debugging. Over 15 years ago precise dynamic slicing was proposed to identify a fault candidate set as consisting of all executed statements that influence the computation of an incorrect value through a chain of data and/or control dependences. However, the challenge of making precise dynamic slicing practical has not been addressed. This dissertation addresses this challenge and makes precise dynamic slicing useful for debugging realistic applications. First, the cost of computing precise dynamic slices is greatly reduced. Second, innovative ways of using precise dynamic slicing are identified to produce small failure candidate sets. The key cause of high space and time cost of precise dynamic slicing is the very large size of dynamic dependence graphs that are constructed and traversed for computing dynamic slices. By developing a novel series of optimizations the size of the dynamic dependence graph is greatly reduced leading to a compact representation that can be rapidly traversed. Average space needed is reduced from 2 Gigabytes to 94 Megabytes for dynamic dependence graphs corresponding to executions with average lengths of 130 Million instructions. The precise dynamic slicing time is reduced from up to 20 minutes for a demand-driven algorithm to 16 seconds. A compression algorithm is developed to further reduce dependence graph sizes. The resulting representation achieves the space efficiency such that the dynamic execution history of executing a couple of billion instructions can be held in a Gigabyte of memory. To further scale precise dynamic slicing to longer program runs, a novel approach is proposed that uses checkpointing/logging to enable collection of dynamic history of only the relevant window of execution. Classical backward dynamic slicing can often produce fault candidate sets that contain thousands of statements making the task of identifying faulty code very time consuming for the programmer. Novel techniques are proposed to improve effectiveness of dynamic slicing for fault location. The merit of these techniques lies in identifying multiple forms of dynamic slices in a failed run and then intersecting them to produce smaller fault candidate sets. Using these techniques, the fault candidate set size corresponding to the backward dynamic slice is reduced by nearly a factor of 3. A fine-grained statistical pruning technique based on value profiles is also developed and this technique reduces the sizes of backward dynamic slices by a factor of 2.5. In conclusion, this dissertation greatly reduces the cost of precise dynamic slicing and presents techniques to improve its effectiveness for fault location.
Standard techniques for analysing sequential programs are severely constrained when applied to a concurrent program because they cannot take full advantage of the concurrent structure of the program. In this work, we overcome this limitation using a novel approach which ``lifts'' a sequential dataflow analysis to a concurrent analysis. First, we introduce concurrency primitives which abstract away from the details of how concurrency features are implemented in real programming languages. Using these primitives, we describe how sequential analyses can be made applicable to concurrent programs. Under some circumstances, there is no penalty for concurrency: our method produces results which are as precise as the sequential analysis. Our lifting is straightforward, and we illustrate it on some standard analyses -- available expressions, live variables and generalized constant propagation. Finally, we describe how concurrency features of real languages can be expressed using our abstract concurrency primitives, and present analyses for finding our concurrency primitives in real programs.
Top: An Infrastructure for detecting Application-Specific Program Errors by Binary Runtime InstrumentationGopal, Prasad 12 January 2007 (has links)
Finding errors in applications has been achieved using a wide variety of techniques. Some tools instrument the application to check for program properties dynamically whereas others analyze the program statically. We use a technique that analyzes a program's execution by binary runtime instrumentation. Unlike tools that work on a particular language or an intermediate representation of a language, our approach works directly on binaries and hence it is not bound to any language. In order to instrument binaries, we use a binary instrumentation system called Pin, which provides APIs to instrument the application at runtime. We have built an infrastructure using Pin called Top that allows program entities like variables and events to be traced. Using finite automata we can check if certain events take place during the execution of the program. Top consists of a Tracing System that can trace movement of pointers to memory locations or 32-bit data values and keeps track of all their copies. It also provides an Event Framework that reports the occurrence of events such as function calls or returns. Top provides a programming interface which allows querying for particular events. The query is compiled with Top to produce a customized analysis tool, also called client. Running the analysis tool with the application, under Pin, results in events of interest being detected and reported. Using Top, we built a Memory Checker that checks for incorrect usage of dynamic memory allocation APIs and semantically incorrect accesses to dynamically allocated memory. Since we perform fine-grained checking by tracing references, our memory checker found some errors that a popular memory checker called valgrind did not. We have also built an MPI Checker which is used to check if programs use MPI's asynchronous communication primitives properly. This checker can detect errors related to illegal data buffer accesses and errors where the programmer inadvertently overwrote a handle needed to finish the processing of a request. / Master of Science
16 December 2013
Production compilers implement optimizing transformation rules for built-in types. What justifies applying these optimizing rules is the axioms that hold for built-in types and the built-in operations supported by these types. Similar axioms also hold for user-defined types and the operations defined on them, and therefore justify a set of optimization rules that may apply to user-defined types. Production compilers, however, do not attempt to construct and apply these optimization rules to user-defined types. Built-in types together the axioms that apply to them are instances of more general algebraic structures. So are user-defined types and their associated axioms. We use the technique of generic programming, a programming paradigm to design efficient, reusable software libraries, to identify the commonality of classes of types, whether built-in or user-defined, convey the semantics of the classes of types to compilers, design scalable and effective program analysis for them, and eventually apply optimizing rules to the operations on them. In generic programming, algorithms and data structures are defined in terms of such algebraic structures. The same definitions are reused for many types, both built-in and user-defined. This dissertation applies generic programming to compiler analyses and transformations. Analyses and transformations are specified for general algebraic structures, and they apply to all types, both built-in and primitive types.
Page generated in 0.0431 seconds