591 |
Techniques for Automated Software Evolution. Khatchadourian, Raffi Takvor. 20 July 2011 (has links)
No description available.
|
592 |
EFFICIENT AND PRODUCTIVE GPU PROGRAMMING. Mengchi Zhang (13109886). 28 July 2022 (has links)
Productive programmable accelerators, like GPUs, have been developed over generations to support programming features. Ever-increasing performance improves the usability of programming features on GPUs, and these features in turn ease the porting of code and data structures from CPU to GPU. However, GPU programming features, such as function calls or runtime polymorphism, have not been well explored or optimized.
I identify efficient and productive GPU programming as a promising area to exploit. Although many programming paradigms are well studied and efficiently supported on CPU architectures, their performance on novel accelerators, like GPUs, has not been thoroughly studied, evaluated, or optimized. For instance, programming with functions is a commonplace paradigm that gives software programs modularity and simplifies code through reuse. A large body of work has sought to alleviate function call overhead on CPUs; however, few papers have examined its deficiencies on GPUs. Polymorphism, in turn, lets an object exhibit different behaviors at runtime. A body of work targets efficient polymorphism on CPUs, but no prior work has examined this feature in a GPU context.
In this dissertation, I examine these two programming features on GPU architectures. First, I performed the first study to identify the deficiencies of GPU polymorphism. I created micro-benchmarks to evaluate virtual function overhead in controlled settings and the first GPU polymorphic benchmark suite, ParaPoly, to investigate real-world scenarios. The micro-benchmarks indicated that virtual function overhead is usually negligible but can cause up to a 7x slowdown. Virtual functions in ParaPoly show a geometric mean of 77% overhead on GPUs compared to inlined versions of the same functions. Second, I proposed two novel techniques that determine an object's type from its address pointer alone to improve GPU polymorphism. The first technique, Coordinated Object Allocation and function Lookup (COAL), is a software-only technique that uses the object's address to determine its type. The second technique, TypePointer, requires a hardware modification to embed the object's type information in its address pointer. COAL achieves 80% and 6% improvements, and TypePointer achieves 90% and 12%, over contemporary CUDA and our type-based SharedOA, respectively.
Considering the growth of GPU programs, function calls have become a pervasive paradigm on GPUs. I also identified the overhead of excessive register spilling around function calls on GPUs. To diminish this cost, I proposed a novel Massively Multithreaded Register Windowing technique with Variable Size Register Windows and Register-Conscious Warp Scheduling. Our techniques improve representative workloads by a geometric mean of 1.18x with only 1.8% hardware storage overhead.
|
593 |
Scalable and Energy-Efficient SIMT Systems for Deep Learning and Data Center Microservices. Mahmoud Khairy A. Abdallah (12894191). 04 July 2022 (has links)
Moore's law is dead. The physical and economic principles that enabled an exponential rise in transistors per chip have reached their breaking point. As a result, the High-Performance Computing (HPC) domain and cloud data centers are encountering significant energy, cost, and environmental hurdles that have led them to embrace custom hardware/software solutions. Single Instruction Multiple Thread (SIMT) accelerators, like Graphics Processing Units (GPUs), are compelling solutions that achieve considerable energy efficiency while still preserving programmability in the twilight of Moore's Law.
In the HPC and Deep Learning (DL) domains, the death of single-chip GPU performance scaling will usher in a renaissance of multi-chip Non-Uniform Memory Access (NUMA) scaling. Advances in silicon interposers and other inter-chip signaling technologies will enable single-package systems composed of multiple chiplets that continue to scale even as per-chip transistor counts do not. Given this evolving, massively parallel NUMA landscape, the placement of data on each chiplet, or discrete GPU card, and the scheduling of the threads that use that data are critical factors in system performance and power consumption.
Aside from the supercomputer space, general-purpose compute units are still the main driver of a data center's total cost of ownership (TCO). CPUs consume 60% of the total data center power budget, half of which comes from the CPU pipeline's frontend. Coupled with this hardware efficiency crisis is an increased desire for programmer productivity, flexible scalability, and nimble software updates, which has led to the rise of software microservices. Consequently, single servers are now packed with many threads executing the same, relatively small task on different data.
In this dissertation, I discuss these new paradigm shifts, addressing the following concerns: (1) how do we overcome the non-uniform memory access overhead of next-generation multi-chiplet GPUs in the era of DL-driven workloads?; (2) how can we improve the energy efficiency of data center CPUs in light of the evolution of microservices and request similarity?; and (3) how can we study such rapidly evolving systems with accurate and extensible SIMT performance models?
|
594 |
Deterministic Reactive Programming for Cyber-physical Systems. Menard, Christian. 03 June 2024 (has links)
Today, cyber-physical systems (CPSs) are ubiquitous. Whether it is robotics, electric vehicles, the smart home, autonomous driving, or smart prosthetics, CPSs shape our day-to-day lives. Yet, designing and programming CPSs becomes ever more challenging as the overall complexity of systems increases. CPSs need to interface (potentially distributed) computation with concurrent processes in the physical world while fulfilling strict safety requirements. Modern and popular frameworks for designing CPS applications, such as ROS and AUTOSAR, address the complexity challenge by emphasizing scalability and reactivity. This, however, comes at the cost of compromising determinism and the time predictability of applications, which ultimately compromises safety. This thesis argues that this compromise is not a necessity and demonstrates that scalability can be achieved while ensuring a predictable execution.
At the core of this thesis is the novel reactor model of computation (MoC) that promises to provide timed semantics, reactivity, scalability, and determinism. A comprehensive study of related models indicates that there is indeed no other MoC that provides similar properties. The main contribution of this thesis is the introduction of a complete set of tools that make the reactor model accessible for CPS design and a demonstration of their ability to facilitate the development of scalable deterministic software.
After introducing the reactor model, we discuss its key principles and utility through an adaptation of reactors in the DEAR framework. This framework integrates reactors with a popular runtime for adaptive automotive applications developed by AUTOSAR. An existing AUTOSAR demonstrator application serves as a case study that exposes the problem of nondeterminism in modern CPS frameworks. We show that the reactor model and its implementation in the DEAR framework are applicable for achieving determinism in industrial use cases.
Building on the reactor model, we introduce the polyglot coordination language Lingua Franca (LF), which enables the definition of reactor programs independent of a concrete target programming language. Based on the DEAR framework, we develop a full-fledged C++ reactor runtime and a code generation backend for LF. Various use cases studied throughout the thesis illustrate the general applicability of reactors and LF to CPS design, and a comprehensive performance evaluation using an optimized version of the C++ reactor runtime demonstrates the scalability of LF programs. We also discuss some limitations of the current scheduling mechanisms and show how they can be overcome by partitioning programs.
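The determinism that reactors guarantee can be conveyed with a small sketch. The following illustrative C++ program, which is not the thesis's reactor runtime, shows the core discipline of the model: events carry logical tags and reactions execute in strict tag order, so the observable outcome does not depend on the physical order in which events were scheduled.

    // Illustrative sketch of tag-ordered, deterministic event processing.
    #include <cstdio>
    #include <functional>
    #include <queue>
    #include <vector>

    struct Event {
        long tag;                          // logical timestamp
        std::function<void(long)> reaction;
    };

    struct CompareTag {
        // Min-heap on the tag: the earliest logical time is processed first.
        bool operator()(const Event& a, const Event& b) const { return a.tag > b.tag; }
    };

    int main() {
        std::priority_queue<Event, std::vector<Event>, CompareTag> queue;
        // Events may be scheduled in any physical order...
        queue.push({20, [](long t) { std::printf("actuate at tag %ld\n", t); }});
        queue.push({10, [](long t) { std::printf("sense at tag %ld\n", t); }});
        // ...but reactions always run in tag order: sense before actuate.
        while (!queue.empty()) {
            Event e = queue.top();
            queue.pop();
            e.reaction(e.tag);
        }
        return 0;
    }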
Finally, we consider design space exploration (DSE) techniques to further improve the scalability of LF programs and manage hardware complexity by automating the allocation of hardware resources to specific components in the program. This thesis contributes the Mocasin framework, a modular platform for prototyping and researching DSE flows. While a concrete integration with LF remains future work, Mocasin provides a foundation for exploring DSE in Lingua Franca.
|
595 |
A visual language for Ada program unit specifications. Gordon, Christopher Todd. 23 June 2009 (has links)
This thesis describes a visual programming language designed to describe and generate Ada program unit specifications. The author first describes the foundations for the work and gives a brief introduction to some of the features of the language. Most of the thesis is dedicated to describing the visual representation of each portion of an Ada package specification. The BNF grammar of an Ada package specification is used as the basis for organization. By organizing the thesis around the package specification, all program unit specifications (i.e., package, task, subprogram, and generic specifications) are described and given a representation in the language. Toward the end of the thesis, the design and referencing of a package specification are demonstrated in a hypothetical implementation. / Master of Science
|
596 |
IMPROVING THE UTILIZATION AND PERFORMANCE OF SPECIALIZED GPU CORES. Aaron M Barnes (20767127). 26 February 2025 (has links)
Specialized hardware accelerators are becoming increasingly common as a way to provide application performance gains despite the slowing of transistor scaling. Accelerators can adapt to the compute and data dependency patterns of an application to fully exploit its parallelism and reduce data movement. However, specialized hardware is often limited by the application it was tailored to, which can leave silicon idle or inactive in computations that do not match the exact patterns it was designed for. In this work I study two cases of GPU specialization and techniques that can be used to improve performance in a broader domain of applications.
First, I examine the effects of GPU core partitioning, a trend in contemporary GPUs to sub-divide core components to reduce area and energy overheads. Core partitioning is essentially a specialization of the hardware towards balanced applications, wherein the intra-core connectivity provides minimal benefit but takes up valuable on-chip area. I identify four orthogonal performance effects of GPU core sub-division, two of which have significant impact in practice: a bottleneck in the operand read stage caused by the reduced number of collector units and register banks allocated to each sub-core, and an instruction issue imbalance across sub-core schedulers caused by a simple round-robin assignment of threads to sub-cores. To alleviate these issues I propose a Register Bank Aware (RBA) warp scheduler, which uses feedback from current register bank contention to inform thread scheduling decisions, and a hashed sub-core work scheduler to prevent the pathological issue imbalances caused by round-robin scheduling. I rigorously evaluate these designs in simulation and show that they capture 81% of the performance lost to core subdivision. Further, I evaluate my techniques using synthesis tools and find that RBA achieves performance equivalent to doubling the number of operand Collector Units (CUs) per sub-core with only a 1% increase in area and power.
Second, I study the inclusion of specialized ray-tracing accelerator cores on GPUs. Specialized ray-tracing acceleration units have become a common feature in GPU hardware, enabling real-time ray tracing of complex scenes for the first time. The ray-tracing unit accelerates the traversal of a hierarchical tree data structure called a bounding volume hierarchy to determine whether rays have intersected triangle primitives. Hierarchical search algorithms are a fundamental software pattern common in many important domains, such as recommendation systems and point cloud registration, but they are difficult for GPUs to accelerate because they are characterized by extensive branching and recursion. The ray-tracing unit overcomes these limitations with specialized hardware to traverse hierarchical data structures efficiently, but it is hampered by a highly specialized graphics API that is not readily adaptable to general-purpose computation. In this work I present the Hierarchical Search Unit (HSU), a flexible datapath that accelerates a more general class of hierarchical search algorithms, of which ray tracing is one. I synthesize a baseline ray-intersection datapath and maximize functional unit reuse while extending the ray-tracing unit to support additional computations and a more general set of instructions. I demonstrate that the unit can improve the performance of three hierarchical search data structures in approximate nearest neighbor search algorithms and a B-tree key-value store index. For a minimal extension to the existing unit, HSU improves the state-of-the-art GPU approximate nearest neighbor implementation by an average of 24.8% using the GPU's general computing interface.
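To make the pattern concrete, the following is an illustrative CUDA sketch, not the HSU datapath itself, of the stack-based hierarchical search that ray-tracing units accelerate: a per-thread traversal of a bounding tree with data-dependent branching and pruning, here reduced to a one-dimensional nearest-point query. The Node layout and names are invented for illustration.

    // Stack-based tree search with pruning: the branchy pattern that is hard
    // for the SIMT pipeline and attractive to offload to specialized units.
    #include <cfloat>
    #include <cmath>
    #include <cstdio>

    struct Node {
        float lo, hi;     // bounding interval of this subtree
        int left, right;  // child indices, or -1 for a leaf
        float point;      // leaf payload (valid when left == -1)
    };

    __global__ void nearest(const Node* nodes, const float* queries,
                            float* best, int nq) {
        int q = blockIdx.x * blockDim.x + threadIdx.x;
        if (q >= nq) return;
        float query = queries[q], bestDist = FLT_MAX;
        int stack[64], top = 0;
        stack[top++] = 0;                       // start at the root
        while (top > 0) {
            const Node& n = nodes[stack[--top]];
            // Prune subtrees that cannot beat the best distance so far.
            float gap = fmaxf(fmaxf(n.lo - query, query - n.hi), 0.0f);
            if (gap >= bestDist) continue;
            if (n.left < 0) {                   // leaf: test the point
                bestDist = fminf(bestDist, fabsf(n.point - query));
            } else {                            // inner node: descend
                stack[top++] = n.left;
                stack[top++] = n.right;
            }
        }
        best[q] = bestDist;
    }

    int main() {
        Node h_nodes[3] = {
            {0.f, 10.f, 1, 2, 0.f},   // root covering [0,10]
            {0.f, 5.f, -1, -1, 2.f},  // leaf holding point 2.0
            {5.f, 10.f, -1, -1, 8.f}, // leaf holding point 8.0
        };
        float h_query = 3.f, result = 0.f;
        Node* d_nodes; float* d_q; float* d_best;
        cudaMalloc(&d_nodes, sizeof(h_nodes));
        cudaMalloc(&d_q, sizeof(float));
        cudaMalloc(&d_best, sizeof(float));
        cudaMemcpy(d_nodes, h_nodes, sizeof(h_nodes), cudaMemcpyHostToDevice);
        cudaMemcpy(d_q, &h_query, sizeof(float), cudaMemcpyHostToDevice);
        nearest<<<1, 1>>>(d_nodes, d_q, d_best, 1);
        cudaMemcpy(&result, d_best, sizeof(float), cudaMemcpyDeviceToHost);
        std::printf("nearest distance: %f\n", result);  // expect 1.0
        return 0;
    }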
|
597 |
The development of a method to assist in the transformation from procedural languages to object oriented languages with specific reference to COBOL and JAVA. Wing, Jeanette Wendy. January 2002 (has links)
Thesis (M.Tech.: Computer Studies)-Dept. of Computer Science, Durban Institute of Technology, 2002. / Computer programming has been a science for approximately 50 years. In this time there have been two major paradigm shifts. The first was from "spaghetti code" to structured programs. The second is from procedural programs to object oriented programs. A change in paradigm involves a change in the way a problem is approached and solved, as well as a difference in the language that is used.
The languages chosen for study are COBOL and Java. These were identified as key languages on which software development is most reliant: COBOL, the procedural language of existing business systems, and Java, the object oriented language most likely to be used for future development.
To complete this study, both languages were studied in detail. The similarities and differences between the programming languages are discussed. Some key issues that a COBOL programmer has to keep in mind when moving to Java were identified.
|
598 |
Alternative Approaches to Correction of Malapropisms in AIML Based Conversational Agents. Brock, Walter A. 26 November 2014 (has links)
The use of Conversational Agents (CAs) utilizing Artificial Intelligence Markup Language (AIML) has been studied in a number of disciplines. Previous research has shown a great deal of promise. It has also documented significant limitations in the abilities of these CAs. Many of these limitations relate specifically to the method employed by AIML to resolve ambiguities in the meaning and context of words. While methods exist to detect and correct common errors in the spelling and grammar of sentences and queries submitted by a user, one class of input error that is particularly difficult to detect and correct is the malapropism. In this research a malapropism is defined as a "verbal blunder in which one word is replaced by another similar in sound but different in meaning" ("malapropism," 2013).
This research explored alternative methods of correcting malapropisms in sentences input to AIML CAs, using measures of semantic distance and tri-gram probabilities. Results of these alternative methods were compared against AIML CAs using only the Symbolic Reductions built into AIML.
This research found that the two methodologies studied here did indeed lead to a small but measurable improvement in the performance of the CA, in terms of the appropriateness of its responses as classified by human judges. However, it was also noted that in a large number of cases the CA simply ignored the existence of a malapropism altogether when formulating its responses. In most of these cases, the interpretation of and response to the user's input was of such a general nature that one might question the overall efficacy of the AIML engine. The answer to this question is a matter for further study.
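As an illustration of the tri-gram idea, the sketch below, written in C++ with an invented toy corpus rather than the study's data or code, flags words whose tri-gram context was never observed as candidate malapropisms; a full system would then rank sound-alike replacements by the same score.

    // Toy tri-gram malapropism detector: unseen tri-grams signal a blunder.
    #include <iostream>
    #include <map>
    #include <sstream>
    #include <string>
    #include <tuple>
    #include <vector>

    using Trigram = std::tuple<std::string, std::string, std::string>;

    int main() {
        // Invented corpus counts standing in for a real language model.
        std::map<Trigram, int> counts = {
            {{"for", "all", "intents"}, 42},
            {{"all", "intents", "and"}, 40},
            {{"intents", "and", "purposes"}, 39},
        };
        std::string input = "for all intensive purposes";  // classic malapropism
        std::vector<std::string> words;
        std::istringstream ss(input);
        for (std::string w; ss >> w;) words.push_back(w);

        // Flag any word whose tri-gram context was never observed.
        for (size_t i = 2; i < words.size(); ++i) {
            Trigram t{words[i - 2], words[i - 1], words[i]};
            if (counts.find(t) == counts.end())
                std::cout << "possible malapropism: \"" << words[i]
                          << "\" after \"" << words[i - 2] << " "
                          << words[i - 1] << "\"\n";
        }
        return 0;
    }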
|
599 |
A model checker for the LF system. Gerber, Erick D. B. 03 1900 (has links)
Thesis (MSc)--University of Stellenbosch, 2007. / ENGLISH ABSTRACT: Computer aided verification techniques, such as model checking, can be used to improve the reliability of software. Model checking is an algorithmic approach to proving the correctness of temporal logic specifications in the formal description of hardware and software systems. In contrast to traditional testing tools, model checking relies on an exhaustive search of all the possible configurations that these systems may exhibit. Traditionally, model checking is applied to abstract or high level designs of software. However, interpreting or translating these abstract designs to implementations often introduces subtle errors. In recent years one trend in model checking has been to apply the model checking algorithm directly to the implementations instead. This thesis is concerned with building an efficient model checker for a small concurrent language developed at the University of Stellenbosch. This special purpose language, LF, is aimed at the development of small embedded systems. The design of the language was carefully considered to promote safe programming practices. Furthermore, the language and its runtime support system were designed to allow LF programs to be model checked directly. To achieve this, the model checker extends the existing runtime support infrastructure to generate the state space of an executing LF program.
/ AFRIKAANSE OPSOMMING (translated): Computer-based program testing, such as model checking, can be used to improve the reliability of software. Model checking is an algorithmic approach to proving the correctness of temporal logic specifications in the description of hardware or software. Unlike traditional program testing, model checking requires an exhaustive exploration of all the possible states in which such a description may find itself. Model checking is mostly applied to abstract models of the software or its design. If the design or model satisfies all the specifications, the abstract model is usually translated into an implementation. The translation process is usually done by hand and leaves room to introduce new errors, even errors that had been eliminated in the model or design. These days, a popular approach to model checking is to apply the techniques directly to the implementation, thereby eliminating the extra effort of model construction and translation. This thesis deals with the design, implementation, and testing of an effective model checker for a small concurrent language, LF, developed at the University of Stellenbosch. The special-purpose language LF is aimed at the safe development of embedded software. The language was designed to encourage safe programming practices. Furthermore, the language and the underlying runtime system were designed to accommodate a model checker. To check LF programs directly, the model checker is an integral part of the runtime system, so that it can drive the program to visit all possible states.
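The exhaustive search at the heart of this approach can be sketched generically. The following illustrative C++ program, not the LF model checker itself, enumerates every reachable configuration of a toy two-process system by breadth-first search over the state space, checking a property in each visited state; the state layout and invariant are invented for the example.

    // Generic explicit-state exploration: BFS over all reachable states.
    #include <cstdio>
    #include <queue>
    #include <set>
    #include <vector>

    // Toy state: two concurrent counters that can each step independently.
    struct State {
        int a, b;
        bool operator<(const State& o) const {
            return a != o.a ? a < o.a : b < o.b;
        }
    };

    int main() {
        std::queue<State> frontier;
        std::set<State> visited;
        frontier.push({0, 0});
        visited.insert({0, 0});
        while (!frontier.empty()) {
            State s = frontier.front();
            frontier.pop();
            // Property check: flag states violating an invented invariant.
            if (s.a == 3 && s.b == 3) std::printf("violation at (%d,%d)\n", s.a, s.b);
            // Successor generation: each process may take one step, up to a bound.
            std::vector<State> next;
            if (s.a < 3) next.push_back({s.a + 1, s.b});
            if (s.b < 3) next.push_back({s.a, s.b + 1});
            for (State n : next)
                if (visited.insert(n).second) frontier.push(n);
        }
        std::printf("explored %zu states\n", visited.size());
        return 0;
    }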
|
600 |
Higher-order semantics for quantum programming languages with classical control. Atzemoglou, George Philip. January 2012 (has links)
This thesis studies the categorical formalisation of quantum computing, through the prism of type theory, in a three-tier process. The first stage of our investigation involves the creation of the dagger lambda calculus, a lambda calculus for dagger compact categories. Our second contribution lifts the expressive power of the dagger lambda calculus, to that of a quantum programming language, by adding classical control in the form of complementary classical structures and dualisers. Finally, our third contribution demonstrates how our lambda calculus can be applied to various well known problems in quantum computation: Quantum Key Distribution, the quantum Fourier transform, and the teleportation protocol.
|