Global ETD Search

151	A performance-efficient and practical processor error recovery framework Soman, Jyothish January 2019 Continued reduction in the size of a transistor has affected the reliability of processors built using them. This is primarily due to factors such as inaccuracies while manufacturing, as well as non-ideal operating conditions, causing transistors to slow down consistently, eventually leading to permanent breakdown and erroneous operation of the processor. Permanent transistor breakdown, or faults, can occur at any point in time in the processor's lifetime. Errors are the discrepancies in the output of faulty circuits. This dissertation shows that the components containing faults can continue operating if the errors caused by them are within certain bounds. Further, the lifetime of a processor can be increased by adding supportive structures that start working once the processor develops these hard errors. This dissertation has three major contributions, namely REPAIR, FaultSim and PreFix. REPAIR is a fault-tolerant system with minimal changes to the processor design. It uses an external Instruction Re-execution Unit (IRU) to perform operations that the faulty processor might have executed erroneously. Instructions that are found to use faulty hardware are then re-executed on the IRU. REPAIR shows that the performance overhead of such targeted re-execution is low for a limited number of faults. FaultSim is a fast fault simulator capable of simulating large circuits at the transistor level. It is developed in this dissertation to understand the effect of faults on different circuits. It performs digital-logic-based simulations, trading off analogue accuracy for speed, while still being able to support most fault models. A 32-bit addition takes under 15 microseconds, while simulating more than 1500 transistors. It can also be integrated into an architectural simulator, which added a performance overhead of 10 to 26 percent to a simulation. 
The results obtained show that single faults cause an error in an adder for less than 10 percent of the inputs. PreFix brings together the fault models created using FaultSim and the design directions found using REPAIR. PreFix performs re-execution of instructions on a remote core, which picks up instructions to execute using a global instruction buffer. Error prediction and detection are used to reduce the number of re-executed instructions. PreFix has an area overhead of 3.5 percent in the setup used, and the performance overhead is within 5 percent of a fault-free case. This dissertation shows that faults in processors can be tolerated without explicitly switching off any component, and that minimal redundancy is sufficient to achieve this.
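The idea of measuring how often a single fault produces a wrong adder result can be illustrated with a small sketch. This is not FaultSim: it is a gate-level (not transistor-level) model of a ripple-carry adder with one hypothetical stuck-at fault, and the names `ripple_add` and `error_rate` and the `(bit, signal, value)` fault encoding are assumptions made for illustration. The rates it reports are not comparable to the figures in the dissertation.

```python
from itertools import product


def ripple_add(a, b, width=4, fault=None):
    """Add two unsigned integers with a gate-level ripple-carry adder.

    fault is a hypothetical (bit_index, signal, stuck_value) triple, where
    signal is "sum" or "carry"; None models a fault-free adder.
    """
    carry = 0
    result = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        s = x ^ y ^ carry                    # full-adder sum output
        c = (x & y) | (carry & (x ^ y))      # full-adder carry output
        if fault is not None and fault[0] == i:
            if fault[1] == "sum":
                s = fault[2]                 # sum output stuck at 0 or 1
            elif fault[1] == "carry":
                c = fault[2]                 # carry output stuck at 0 or 1
        result |= s << i
        carry = c
    return result | (carry << width)         # final carry is the top bit


def error_rate(width, fault):
    """Fraction of all input pairs for which the faulty adder is wrong."""
    total = 0
    wrong = 0
    for a, b in product(range(1 << width), repeat=2):
        total += 1
        if ripple_add(a, b, width, fault) != a + b:
            wrong += 1
    return wrong / total


if __name__ == "__main__":
    for fault in [(0, "sum", 0), (3, "carry", 0), (2, "carry", 1)]:
        print(fault, error_rate(4, fault))
```

Exhaustive enumeration is only practical for small widths; a 32-bit adder would need sampled inputs instead.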
152	Evolvable virtual machines Nowostawski, Mariusz January 2008 The Evolvable Virtual Machine abstract architecture (EVMA) is a computational architecture for dynamic hierarchically organised virtual machines. The concrete EVM instantiation (EVMI) builds on traditional stack-based models of computation and extends them by notions of hierarchy and reflection on the virtual machine level. The EVM Universe is composed of a number of autonomous and asynchronously communicating EVM machines. The main contribution of this work lies in the new model of computation and in the architecture itself: a novel, compact, flexible and expressive representation of distributed concurrent computation. The EVMA provides a way of expressing and modelling auto-catalytic networks composed of a hierarchical hypercycle of autopoietic subsystems characterised by self-adaptable structural tendencies and self-organised criticality. EVMA provides capabilities for: a) self-learning of dynamical patterns through continuous observation of computable environments, b) self-compacting and generalisation of existing program structures, c) emergence of efficient and robust communication code through appropriate machine assembly on both ends of the communication channel. EVMA is in one sense a multi-dimensional generalisation of the stack machine with the purpose of modelling concurrent asynchronous processing. The EVMA approach can also be seen as a meta-evolutionary theory of evolution. The EVMA is designed to model systems that mimic living autonomous and adaptable computational processes. The EVMI prototype has been designed and developed to conduct experimental studies on complex evolving systems. The generality of our approach not only provides the means to experiment with complex hierarchical, computational and evolutionary systems, but also provides a useful model to evaluate, share and discuss complex hierarchical systems in general. 
The EVMA provides a novel methodology and language to pursue research, to understand and to talk about evolution of complexity in living systems. In this thesis, we present the simple single-cell EVMI framework, discuss the multi-cell EVM Universe architecture, present experimental results, and propose further extensions, experimental studies, and possible hardware implementations of the EVMI. virtual computer systems computer architecture system design
153	Efficient mapping of fast Fourier transform on the Cyclops-64 multithreaded architecture Xue, Liping. January 2007 Thesis (M.S.)--University of Delaware, 2007. / Principal faculty advisor: Guang R. Gao, Dept. of Electrical and Computer Engineering. Includes bibliographical references.
154	Fine-grain parallelism on sequential processors Kotikalapoodi, Sridhar V. 07 September 1994 There seems to be a consensus that future Massively Parallel Architectures will consist of a number of nodes, or processors, interconnected by a high-speed network. A von Neumann style of processing within the node of a multiprocessor system has its performance limited by the constraints imposed by the control-flow execution model. Although the conventional control-flow model offers high performance on sequential execution which exhibits good locality, switching between threads and synchronization among threads cause substantial overhead. On the other hand, dataflow architectures support rapid context switching and efficient synchronization but require extensive hardware and do not use high-speed registers. There have been a number of architectures proposed to combine the instruction-level context switching capability with sequential scheduling. One such architecture is the Threaded Abstract Machine (TAM), which supports fine-grain interleaving of multiple threads by an appropriate compilation strategy rather than through elaborate hardware. Experiments on TAM have already shown that it is possible to implement the dataflow execution model on conventional architectures and obtain reasonable performance. These studies also show a basic mismatch between the requirements for fine-grain parallelism and the underlying architecture, and that considerable improvement is possible through hardware support. This thesis presents two design modifications to efficiently support fine-grain parallelism. First, a modification to the instruction set architecture is proposed to reduce the cost involved in scheduling and synchronization. The hardware modifications are kept to a minimum so as to not disturb the functionality of a conventional RISC processor. Second, a separate coprocessor is utilized to handle messages. 
Atomicity and message handling are handled efficiently, without compromising per-processor performance and system integrity. Clock cycles per TAM instruction is used as a measure to study the effectiveness of these changes. / Graduation date: 1995 Computer architecture
155	Scheme86: A System for Interpreting Scheme Berlin, Andrew A., Wu, Henry M. 01 April 1988 Scheme86 is a computer system designed to interpret programs written in the Scheme dialect of Lisp. A specialized architecture, coupled with new techniques for optimizing register management in the interpreter, allows Scheme86 to execute interpreted Scheme at a speed comparable to that of compiled Lisp on conventional workstations. Scheme Lisp computer architecture interpretive techniques
156	Framework for accessing CORBA objects with Internet as the backbone Sethuraman, Meenakshi Sundar. January 2001 Thesis (M.S.)--University of Florida, 2001. / Title from first page of PDF file. Document formatted into pages; contains viii, 30 p.; also contains graphics. Vita. Includes bibliographical references (p. 29).
157	Exploring Virtualization Techniques for Branch Outcome Prediction Sadooghi-Alvandi, Maryam 20 December 2011 Modern processors use branch prediction to predict branch outcomes, in order to fetch ahead in the instruction stream, increasing concurrency and performance. Larger predictor tables can improve prediction accuracy, but come at the cost of larger area and longer access delay. This work introduces a new branch predictor design that increases the perceived predictor capacity without increasing its delay, by using a large virtual second-level table allocated in the second-level caches. Virtualization is applied to a state-of-the-art multi-table branch predictor. We evaluate the design using instruction count as a proxy for timing on a set of commercial workloads. For a predictor whose size is determined by access delay constraints rather than area, accuracy can be improved by 8.7%. Alternatively, the design can be used to achieve the same accuracy as a non-virtualized design while using 25% less dedicated storage. branch prediction virtualization computer architecture 0544 0984
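A rough sketch of the virtualization idea follows: a small dedicated table acts as a cache over a much larger table of counters that would live in the second-level cache. This is a deliberately simplified bimodal predictor, not the multi-table design evaluated in the thesis. The class name, the LRU replacement policy, the table sizes, and the instant fill from the backing table are all assumptions made for illustration; a real design would have to hide the fill latency.

```python
from collections import OrderedDict


class VirtualizedBimodalPredictor:
    """Small dedicated counter table backed by a large 'virtual' table."""

    def __init__(self, dedicated_entries=4, virtual_entries=1024):
        self.dedicated_entries = dedicated_entries
        self.virtual_entries = virtual_entries
        # Dedicated storage: index -> 2-bit counter, kept in LRU order.
        self.dedicated = OrderedDict()
        # Backing table (stands in for storage allocated in the L2 cache).
        # Counters start at 1, i.e. weakly not-taken.
        self.virtual = [1] * virtual_entries

    def _index(self, pc):
        return pc % self.virtual_entries

    def _fetch(self, idx):
        """Bring the counter for idx into the dedicated table."""
        if idx in self.dedicated:
            self.dedicated.move_to_end(idx)          # mark most recently used
        else:
            if len(self.dedicated) >= self.dedicated_entries:
                old_idx, old_ctr = self.dedicated.popitem(last=False)
                self.virtual[old_idx] = old_ctr      # write back evicted counter
            self.dedicated[idx] = self.virtual[idx]  # fill from backing table
        return self.dedicated[idx]

    def predict(self, pc):
        """Return True if the branch at pc is predicted taken."""
        return self._fetch(self._index(pc)) >= 2

    def update(self, pc, taken):
        """Train the 2-bit saturating counter with the actual outcome."""
        idx = self._index(pc)
        ctr = self._fetch(idx)
        self.dedicated[idx] = min(3, ctr + 1) if taken else max(0, ctr - 1)


if __name__ == "__main__":
    p = VirtualizedBimodalPredictor(dedicated_entries=2, virtual_entries=16)
    p.update(3, True)
    p.update(3, True)
    for pc in (4, 5):            # touching other branches evicts entry 3
        p.predict(pc)
    print(p.predict(3))          # trained state is restored from the backing table
```

The point of the sketch is that trained state survives eviction from the small table, so the predictor behaves as if it had the capacity of the larger one.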
159	Architectural Support for Efficient Communication in Future Microprocessors Jin, Yu Ho 16 January 2010 Traditionally, microprocessor design has focused on the computational aspects of the problem at hand. However, as the number of components on a single chip continues to increase, the design of the communication architecture has become a crucial and dominating factor in defining performance models of the overall system. On-chip networks, also known as Networks-on-Chip (NoC), emerged recently as a promising architecture to coordinate chip-wide communication. Although there are numerous interconnection network studies in an inter-chip environment, an intra-chip network design poses a number of substantial challenges to this well-established interconnection network field. This research investigates designs and applications of on-chip interconnection networks in next-generation microprocessors for optimizing performance, power consumption, and area cost. First, we present domain-specific NoC designs targeted to large-scale and wire-delay dominated L2 cache systems. The domain-specifically designed interconnect shows 38% performance improvement and uses only 12% of the mesh-based interconnect. Then, we present a methodology of communication characterization in parallel programs and application of characterization results to long-channel reconfiguration. Reconfigured long channels suited to communication patterns improve the latency of the mesh network by 16% and 14% in 16-core and 64-core systems, respectively. Finally, we discuss an adaptive data compression technique that builds a network-wide frequent value pattern map and reduces the packet size. In two examined multi-core systems, cache traffic has 69% compressibility and shows high value sharing among flows. Compression-enabled NoC improves the latency by up to 63% and saves energy consumption by up to 12%.
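The frequent-value compression mentioned above can be sketched in a few lines. This is a minimal static version, assuming a table built once from observed traffic and a one-bit tag per word; the adaptive, network-wide pattern map of the dissertation is not reproduced. The function names, the tag format, and the 32-bit word size are assumptions made for illustration.

```python
from collections import Counter


def build_table(words, table_size=4):
    """Pick the most frequent data words seen in sample traffic."""
    return [w for w, _ in Counter(words).most_common(table_size)]


def compress(packet, table):
    """Replace words found in the table by their (short) table index."""
    index = {w: i for i, w in enumerate(table)}
    return [("idx", index[w]) if w in index else ("raw", w) for w in packet]


def decompress(encoded, table):
    """Invert compress() given the same table at the receiver."""
    return [table[v] if tag == "idx" else v for tag, v in encoded]


def compressed_bits(encoded, table_size, word_bits=32):
    """Size in bits: one tag bit per word plus an index or a raw word."""
    idx_bits = max(1, (table_size - 1).bit_length())
    return sum(1 + (idx_bits if tag == "idx" else word_bits)
               for tag, _ in encoded)


if __name__ == "__main__":
    traffic = [0, 0, 0, 0xFFFFFFFF, 0xFFFFFFFF, 7, 0, 1]
    table = build_table(traffic, table_size=2)
    packet = [0, 0xFFFFFFFF, 123, 0]
    enc = compress(packet, table)
    assert decompress(enc, table) == packet
    print(compressed_bits(enc, 2), "bits vs", 32 * len(packet), "uncompressed")
```

Sender and receiver must agree on the table contents, which is the part an adaptive scheme has to keep consistent across the network.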
