1

Modular Objective-C run-time library

Váša, Kryštof January 2013 (has links)
This thesis contains an analysis of the currently available Objective-C run-time libraries (the GCC, Apple, and Étoilé run-times), their prerequisites, and their dependencies on the particular platform and operating system. The result of the analysis is the design of a modular run-time library that allows dynamic configuration of each component for a particular need (e.g. disabling run-time locks in a single-threaded environment). The resulting design can also be easily ported to atypical platforms (e.g. a kernel, or an experimental OS) and extended feature-wise (e.g. adding support for Objective-C categories, or associated objects). A prototype implementation of such a modular run-time for Objective-C is also included.
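The central idea of configuring individual run-time components for a particular environment can be illustrated with a small sketch. The sketch below is not taken from the thesis's prototype; the class and component names are hypothetical, and it only shows how a locking component might be swapped for a no-op in a single-threaded configuration.

```python
# Illustrative sketch only -- not code from the thesis's prototype run-time.
import threading

class NoOpLock:
    """Lock component for single-threaded configurations: does nothing."""
    def __enter__(self): return self
    def __exit__(self, *exc): return False

class RuntimeConfig:
    """Hypothetical configuration object selecting run-time components."""
    def __init__(self, threaded=True):
        # Pick the lock implementation once, at configuration time.
        self.lock_factory = threading.Lock if threaded else NoOpLock

class ClassTable:
    """A registry guarded by whichever lock component was configured."""
    def __init__(self, config):
        self._lock = config.lock_factory()
        self._classes = {}

    def register(self, name, cls):
        with self._lock:          # a no-op in the single-threaded configuration
            self._classes[name] = cls

    def lookup(self, name):
        with self._lock:
            return self._classes.get(name)

# Usage: a single-threaded environment avoids all locking overhead.
table = ClassTable(RuntimeConfig(threaded=False))
table.register("NSObject", object)
print(table.lookup("NSObject"))
```

A real modular run-time would apply the same pattern to the other components the abstract mentions, for example category or associated-object support.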
2

Speculative parallelization of partially parallel loops

Dang, Francis Hoai Dinh 15 May 2009 (has links)
Current parallelizing compilers cannot identify a significant fraction of parallelizable loops because they have complex or statically insufficiently defined access patterns. In our previous work, we have speculatively executed a loop as a doall, and applied a fully parallel data dependence test to determine if it had any cross-processor dependences. If the test failed, then the loop was re-executed serially. While this method exploits doall parallelism well, it can cause slowdowns for loops with even one cross-processor flow dependence because we have to re-execute sequentially. Moreover, the existing, partial parallelism of loops is not exploited. We demonstrate a generalization of the speculative doall parallelization technique, called the Recursive LRPD test, that can extract and exploit the maximum available parallelism of any loop and that limits potential slowdowns to the overhead of the run-time dependence test itself. In this thesis, we have presented the base algorithm and an analysis of the different heuristics for its practical application. To reduce the run-time overhead of the Recursive LRPD test, we have implemented on-demand checkpointing and commit, more efficient data dependence analysis and shadow structures, and feedback-guided load balancing. We obtained scalable speedups for loops from Track, Spice, and FMA3D that were not parallelizable by previous speculative parallelization methods.
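To make the speculative doall idea concrete, here is a minimal sketch of the basic scheme the abstract builds on: run the loop speculatively while recording reads and writes in shadow structures, test for cross-iteration dependences afterwards, and fall back to serial re-execution on failure. This is an illustrative simplification, not the Recursive LRPD test itself; the function names and the conservative conflict check are assumptions of the sketch.

```python
# Hedged sketch of speculative doall execution with a post-hoc dependence test.
def run_speculatively(loop_body, n_iters, data):
    backup = list(data)                      # checkpoint so speculation can be rolled back
    writes, reads = {}, {}                   # shadow structures: element index -> iterations

    def shadow_read(i, idx):
        # A read preceded by this iteration's own write is satisfied locally
        # and cannot be a cross-iteration flow dependence, so skip recording it.
        if i not in writes.get(idx, set()):
            reads.setdefault(idx, set()).add(i)
        return data[idx]

    def shadow_write(i, idx, value):
        writes.setdefault(idx, set()).add(i)
        data[idx] = value

    # Speculative phase: executed sequentially here for clarity; a real system
    # would distribute the iterations across processors.
    for i in range(n_iters):
        loop_body(i,
                  lambda idx: shadow_read(i, idx),
                  lambda idx, v: shadow_write(i, idx, v))

    # Conservative dependence test: flag a conflict if any element is written
    # by more than one iteration, or read and written by different iterations.
    conflict = any(
        len(writers) > 1 or any(r not in writers for r in reads.get(idx, ()))
        for idx, writers in writes.items()
    )

    if conflict:
        data[:] = backup                     # roll back and re-execute serially
        for i in range(n_iters):
            loop_body(i,
                      lambda idx: data[idx],
                      lambda idx, v: data.__setitem__(idx, v))
    return data

# Example: a[i] = a[index[i]] + 1, where index is only known at run time.
index = [0, 1, 2, 3]                         # this instance has no cross-iteration dependences
a = [10, 20, 30, 40]
print(run_speculatively(lambda i, rd, wr: wr(i, rd(index[i]) + 1), 4, a))
# -> [11, 21, 31, 41]; with index = [3, 0, 1, 2] the test fails and the loop re-runs serially.
```

The Recursive LRPD test generalizes this all-or-nothing fallback, roughly by committing the correctly executed portion of the iteration space and re-applying the test to the remainder, so slowdowns are limited to the test's own overhead rather than a full serial re-execution.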
3

Design and implementation of reconfigurable DSP circuit architectures on FPGA

Heron, Jean-Paul Stephen January 1998 (has links)
No description available.
4

Predictable Run Time Scheduling

Torenvliet, Nick 19 December 2005 (has links)
Hybrid task-lists are sets of periodic and asynchronous processes. To verifiably schedule hybrid task-lists with hard and soft real-time requirements, Xu and Lam proposed Integrated Pre-Run-Time Scheduling (IPRTS) [13], a two-phase method that first makes use of pre-run-time scheduling techniques, converting some asynchronous tasks with hard deadlines to periodic tasks and reserving processor capacity for the remaining hard-deadline asynchronous tasks. These remaining asynchronous tasks are scheduled by a novel run-time scheduler that enforces arbitrary exclusion relations between any combination of periodic and asynchronous processes. The technique has two significant drawbacks: (i) a custom run-time scheduler is required that is not available on existing Real-Time Operating Systems (RTOS) and (ii) in many circumstances the reservation of processor capacity is overly pessimistic, causing the method to fail for many simple task-lists. To overcome these drawbacks, this thesis narrows the set of task-lists considered to those where the asynchronous tasks exclude periodic tasks and periodic processes do not exclude asynchronous tasks. A high-priority polling server is then used to handle all hard asynchronous tasks. In cases where the method succeeds, it is easily implementable on any RTOS that has priority-based scheduling with phased release times, and it inherits the error handling and soft real-time process scheduling capabilities of the RTOS. A set of software tools which partially automates the technique, including an open source implementation of the Xu-Parnas pre-run-time scheduling algorithm [14], has been developed and applied to the examples in the thesis. / Thesis / Master of Applied Science (MASc)
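As a rough illustration of the polling-server approach mentioned above, the following discrete-time simulation shows a high-priority server that replenishes a fixed budget each period and uses it to serve queued asynchronous work, with the pre-run-time periodic schedule running otherwise. The period, budget, and arrival pattern are made-up values for the sketch, not parameters from the thesis.

```python
# Hedged sketch of a polling server for hard asynchronous requests.
from collections import deque

SERVER_PERIOD, SERVER_BUDGET = 5, 2          # poll every 5 ticks, serve up to 2 units per period

def simulate(arrivals, horizon):
    """arrivals: {tick: units of hard asynchronous work arriving at that tick}."""
    pending = deque()
    schedule = []
    budget = 0
    for t in range(horizon):
        if t % SERVER_PERIOD == 0:
            budget = SERVER_BUDGET           # replenish the server's reserved capacity
        for _ in range(arrivals.get(t, 0)):
            pending.append(t)                # enqueue asynchronous requests
        if pending and budget > 0:
            pending.popleft()                # server runs at the highest priority
            budget -= 1
            schedule.append((t, "async"))
        else:
            schedule.append((t, "periodic")) # otherwise the pre-run-time schedule runs
    return schedule

print(simulate({1: 3, 7: 1}, 12))
```

Because the server itself is just a high-priority periodic task, the arrangement runs on any RTOS with priority-based scheduling, which is the portability argument the abstract makes.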
5

VLSI Implementation of a Run-time Configurable Computing Integrated Circuit - The Stallion Chip

He, Yingchun 22 July 1998 (has links)
Reconfigurable computing architectures are gaining popularity as a replacement for general-purpose architectures for many high-performance embedded applications. These machines support parallel computation and direct the data from the producers of an intermediate result to the consumers over custom pathways. The Wormhole Run-time Reconfigurable (RTR) computing architecture is a concept developed at Virginia Tech to address the weaknesses of contemporary FPGAs for configurable computing. The Stallion chip is a full-custom, coarse-grained, FPGA-like configurable computing integrated circuit. Building on the results of the first-generation device, the Colt chip, the Stallion chip is its follow-up configurable computing chip. This thesis focuses on the VLSI layout implementation of the Stallion chip. It explains many features and advantages of the Wormhole Configurable Computing Machine (CCM), and illustrates design techniques, strategies, circuit characterization, performance estimation, and ways to solve problems when using CAD layout design tools. / Master of Science
6

Run-time optimization of adaptive irregular applications

Yu, Hao 15 November 2004 (has links)
Compared to traditional compile-time optimization, run-time optimization can offer significant performance improvements when parallelizing and optimizing adaptive irregular applications, because it performs program analysis and adaptive optimizations during program execution. Run-time techniques can succeed where static techniques fail because they exploit the characteristics of the input data, the program's dynamic behavior, and the underlying execution environment. When optimizing adaptive irregular applications for parallel execution, a common observation is that the effectiveness of the optimizing transformations depends on the program's input data and its dynamic phases. This dissertation presents a set of run-time optimization techniques that match the characteristics of a program's dynamic memory access patterns with the appropriate optimization (parallelization) transformations. First, we present a general adaptive algorithm selection framework to automatically and adaptively select at run-time the best performing, functionally equivalent algorithm for each of its execution instances. The selection process is based on off-line, automatically generated prediction models and on characteristics of the algorithm's input data that are collected and analyzed dynamically. In this dissertation, we specialize this framework for the automatic selection of reduction algorithms. We have identified a small set of machine-independent, high-level characterization parameters and deployed an off-line, systematic experiment process to generate prediction models. These models, in turn, match the parameters to the best optimization transformations for a given machine. The technique has been evaluated thoroughly in terms of applications, platforms, and programs' dynamic behaviors. Specifically, for reduction algorithm selection, the selected performance is within 2% of the optimal and on average 60% better than "Replicated Buffer," the default parallel reduction algorithm specified by the OpenMP standard. To reduce the overhead of speculative run-time parallelization, we have developed an adaptive run-time parallelization technique that dynamically chooses efficient shadow structures to record a program's dynamic memory access patterns for parallelization. This technique complements the original speculative run-time parallelization technique, the LRPD test, in parallelizing loops with sparse memory accesses. The techniques presented in this dissertation have been implemented in an optimizing research compiler and can be viewed as effective building blocks for comprehensive run-time optimization systems, e.g., feedback-directed optimization systems and dynamic compilation systems.
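A toy version of the adaptive algorithm selection idea described above might look like the sketch below: cheaply characterize the dynamic access pattern, then dispatch to whichever reduction strategy a simple model predicts will perform best. The characterization parameter, the threshold, and both candidate algorithms are invented for illustration; the dissertation's framework uses off-line trained prediction models and parallel implementations.

```python
# Hedged sketch of run-time selection between two reduction algorithms.
from collections import defaultdict

def characterize(indices, n_elements):
    """Cheaply sample the dynamic access pattern before choosing an algorithm."""
    touched = len(set(indices))
    return {"density": touched / n_elements}    # fraction of elements updated

def replicated_buffer_reduce(indices, values, n_elements, n_threads=4):
    # Each "thread" accumulates into a private copy; copies are merged at the end.
    buffers = [[0.0] * n_elements for _ in range(n_threads)]
    for k, (i, v) in enumerate(zip(indices, values)):
        buffers[k % n_threads][i] += v          # stands in for per-thread work
    return [sum(col) for col in zip(*buffers)]

def sparse_hash_reduce(indices, values, n_elements):
    # Accumulate only the touched elements; better when updates are sparse.
    acc = defaultdict(float)
    for i, v in zip(indices, values):
        acc[i] += v
    return [acc.get(i, 0.0) for i in range(n_elements)]

def adaptive_reduce(indices, values, n_elements, density_threshold=0.25):
    params = characterize(indices, n_elements)
    if params["density"] >= density_threshold:  # dense updates: replicate buffers
        return replicated_buffer_reduce(indices, values, n_elements)
    return sparse_hash_reduce(indices, values, n_elements)

print(adaptive_reduce([2, 2, 5], [1.0, 2.0, 3.0], n_elements=8))
# -> [0.0, 0.0, 3.0, 0.0, 0.0, 3.0, 0.0, 0.0]
```

The point of the framework is precisely that such a threshold is not hand-tuned: the prediction model is generated off-line from systematic experiments on the target machine.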
7

A Run-Time Loop Parallelization Technique on Shared-Memory Multiprocessor Systems

Wu, Chi-Fan 06 July 2000 (has links)
High-performance computing power is important for today's advanced scientific computations. A multiprocessor system obtains its high performance from the fact that some computations can proceed in parallel. A parallelizing compiler can take a sequential program as input and automatically translate it into parallel form for the target multiprocessor system. But for loops whose arrays have irregular, nonlinear, or dynamic access patterns, no current parallelizing compiler can determine at compile time whether data dependences exist. A run-time parallelization algorithm must therefore be used to determine the dependences and extract the potential parallelism of such loops. In this thesis, we propose an efficient run-time parallelization technique to compute a proper parallel execution schedule for these loops. The new method first detects the immediate predecessor iterations of each loop iteration and constructs an immediate predecessor table, then efficiently schedules the whole set of loop iterations into wavefronts for parallel execution. Both theoretical analysis and experimental results show that the new run-time parallelization technique achieves high speedup with low processing overhead. Furthermore, its high scalability makes it well suited to implementation on multiprocessor systems.
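The two-step structure the abstract describes, building an immediate-predecessor table and then grouping iterations into wavefronts, can be sketched as follows. This is a deliberately conservative illustration (it serializes every pair of iterations that touch a common element, without distinguishing reads from writes), so it should not be read as the thesis's actual algorithm.

```python
# Hedged sketch of wavefront scheduling from an immediate-predecessor table.
def build_wavefronts(accesses):
    """accesses[i] = iterable of array elements touched by loop iteration i."""
    last_accessor = {}                 # element -> most recent iteration touching it
    predecessors = []                  # immediate-predecessor table
    for i, elems in enumerate(accesses):
        preds = set()
        for x in elems:
            if x in last_accessor:
                preds.add(last_accessor[x])
            last_accessor[x] = i
        predecessors.append(preds)

    # Wavefront number = length of the longest predecessor chain ending at i.
    wave = []
    for i, preds in enumerate(predecessors):
        wave.append(1 + max((wave[p] for p in preds), default=-1))

    wavefronts = {}
    for i, w in enumerate(wave):
        wavefronts.setdefault(w, []).append(i)
    return [wavefronts[w] for w in sorted(wavefronts)]

# Iterations 0 and 1 are independent; 2 touches what 0 touched; 3 touches what 2 did.
print(build_wavefronts([{10}, {20}, {10, 30}, {30}]))
# -> [[0, 1], [2], [3]]
```

Iterations in the same wavefront have no unresolved predecessors and can be dispatched to processors concurrently, with successive wavefronts separated by synchronization.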
8

Run-time assurance via real time trajectory generation and transverse dynamics regulation law

Alhani, Fatema H. 03 1900 (has links)
In safety-critical environments, it is crucial to have a backup strategy that the system can turn to when facing a potentially unsafe situation. Run-time assurance provides a reliable methodology for such a backup strategy. This work introduces a new run-time assurance framework that generates trajectories in real time using an optimal trajectory generation algorithm and then tracks each generated trajectory with a feedback control law designed from its transverse dynamics. The generated trajectories are treated as safety backup trajectories that are executed and followed by the plant only if deemed necessary by the run-time assurance logic. By using the run-time assurance mechanism, the system's safety is ensured regardless of the behavior of the primary controller, subject to some constraints on the system. The framework assumes full knowledge of the environment and the system dynamics, while treating the trajectory generation component as a black box.
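The switching logic at the heart of run-time assurance can be sketched on a toy one-dimensional double integrator: pass the primary command through only while a short-horizon prediction stays inside the safe set, and otherwise hand control to a backup law. The dynamics, the safety check, and the backup gains below are placeholders chosen for the illustration, not the trajectory-generation and transverse-dynamics machinery described in the abstract.

```python
# Hedged sketch of run-time assurance switching on a toy 1-D system.
X_MAX = 10.0          # boundary of the safe set |x| <= X_MAX
DT = 0.1

def safe_after(x, v, u, horizon=5):
    """Simple look-ahead check: simulate a few steps under the proposed command u."""
    for _ in range(horizon):
        v += u * DT
        x += v * DT
        if abs(x) > X_MAX:
            return False
    return True

def backup_control(x, v):
    """Placeholder backup law steering the state back toward the safe interior."""
    return -2.0 * x - 3.0 * v

def rta_step(x, v, primary_u):
    # Run-time assurance logic: pass the primary command through only if the
    # short-horizon prediction stays safe; otherwise execute the backup law.
    u = primary_u if safe_after(x, v, primary_u) else backup_control(x, v)
    v += u * DT
    x += v * DT
    return x, v

x, v = 9.0, 0.0
for _ in range(50):
    x, v = rta_step(x, v, primary_u=4.0)   # the primary controller keeps pushing outward
print(round(x, 2), abs(x) <= X_MAX)        # final position; stays in the safe set in this run
```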
9

Optimizing Distributed Transactions: Speculative Client Execution, Certified Serializability, and High Performance Run-Time

Pandey, Utkarsh 01 September 2016 (has links)
On-line services already form an important part of modern life, with immense potential for growth. Most of these services are supported by transactional systems, which in many cases are backed by database management systems (DBMS). Many on-line services use replication to ensure high availability, fault tolerance, and scalability. Replicated systems typically consist of different nodes running the service, coordinated by a distributed algorithm that aims to drive all the nodes along the same sequence of states by providing a total order over their operations. Thus, optimizing both local DBMS operations (through concurrency control) and the distributed algorithm driving replicated services can enhance the performance of on-line services. Deferred Update Replication (DUR) is a well-known approach to designing scalable replicated systems. In this method, the database is fully replicated on each distributed node. User threads perform transactions locally and optimistically before a total order is reached. DUR-based systems find their best usage when remote transactions rarely conflict. Even in such scenarios, transactions may abort due to local contention on nodes. A generally adopted method to alleviate the local contention is to invoke a local certification phase to check whether a transaction conflicts with other local transactions already completed. If so, the given transaction is aborted locally without burdening the ordering layer. However, this approach still results in many local aborts, which significantly degrades performance. The first main contribution of this thesis is PXDUR, a DUR-based transactional system which enhances the performance of DUR-based systems by alleviating local contention and increasing the transaction commit rate. PXDUR alleviates local contention by allowing speculative forwarding of shared objects from locally committed transactions awaiting total order to running transactions. PXDUR allows transactions running in parallel to use speculative forwarding, thereby enabling the system to utilize highly parallel multi-core platforms. PXDUR also enhances performance by optimizing the transaction commit process: it allows committing transactions to skip read-set validation when it is safe to do so. PXDUR achieves performance gains of an order of magnitude over its closest competitors under favorable conditions. Transactions also form an important part of centralized DBMSs, which tend to support multi-threaded access to utilize highly parallel hardware platforms. Applications can be wrapped in transactions, which can then access the DBMS under the rules of concurrency control. This allows users to develop applications that run on DBMSs without worrying about synchronization. Serializability is the de facto standard form of isolation required by transactions for many applications. The existing methods employed by DBMSs to enforce serializability rely on explicit fine-grained locking. This eager-locking approach is pessimistic and can be too conservative for many applications, and it can severely limit the performance of DBMSs, especially in scenarios with moderate to high contention. This leads to the second major contribution of this thesis: TSAsR, an adaptive transaction processing framework which can be applied to DBMSs to improve performance. TSAsR allows the DBMS's internal synchronization to be more relaxed and enforces serializability through the processing of external meta-data in an optimistic manner.
It does not require any changes in the application code and achieves orders-of-magnitude performance improvements for high- and moderate-contention cases. Replicated transaction processing systems require a distributed algorithm to keep the system consistent by ensuring that each node executes the same sequence of deterministic commands. These algorithms generally employ State Machine Replication (SMR). Enhancing the performance of such algorithms is a potential way to increase the performance of distributed systems. However, developing new SMR algorithms is limited in production settings because of the huge verification cost involved in proving their correctness. There are frameworks that allow easy specification of SMR algorithms and subsequent verification; however, algorithms implemented in such frameworks give poor performance. This leads to the third major contribution of this thesis: Verified JPaxos, a JPaxos-based runtime system which can be integrated with an easy-to-verify I/O automaton based on the Multipaxos protocol. Multipaxos is specified in Higher Order Logic (HOL) for ease of verification, and the specification is used to generate executable code representing the Multipaxos state changes (an I/O automaton). The runtime drives the HOL-generated code and interacts with the service and the network to create a fully functional replicated Multipaxos system. The runtime inherits its design from JPaxos, along with some optimizations. It achieves significant improvement over a state-of-the-art SMR verification framework while remaining comparable in performance to non-verified systems. / Master of Science
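The local certification step the abstract describes, aborting a transaction cheaply on the local node before it ever reaches the ordering layer, can be illustrated with a minimal, single-node sketch. The version-based check, the class names, and the commit path below are assumptions of the sketch, not PXDUR's actual data structures.

```python
# Hedged sketch of a local certification check before submission to the ordering layer.
class Store:
    def __init__(self):
        self.data = {}                 # key -> committed value
        self.version = {}              # key -> version of the last local commit

    def begin(self):
        return {"reads": {}, "writes": {}}

    def read(self, tx, key):
        tx["reads"][key] = self.version.get(key, 0)   # remember the observed version
        return tx["writes"].get(key, self.data.get(key))

    def write(self, tx, key, value):
        tx["writes"][key] = value

    def certify_locally(self, tx):
        # Abort locally (cheaply) if any key read is now at a newer version.
        return all(self.version.get(k, 0) == v for k, v in tx["reads"].items())

    def commit(self, tx):
        if not self.certify_locally(tx):
            return False               # local abort: never burdens the ordering layer
        for k, v in tx["writes"].items():
            self.data[k] = v
            self.version[k] = self.version.get(k, 0) + 1
        return True                    # would now be handed to the ordering layer

store = Store()
t1, t2 = store.begin(), store.begin()
balance = store.read(t1, "x") or 0          # t1 observes version 0 of "x"
store.write(t2, "x", 42)                    # t2 overwrites "x" without reading it
store.write(t1, "x", balance + 1)
print(store.commit(t2), store.commit(t1))   # -> True False: t1 read a stale "x" and aborts locally
```

PXDUR's speculative forwarding goes a step further: instead of aborting, a running transaction may read from locally committed transactions that are still awaiting total order.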
