151
A performance-efficient and practical processor error recovery framework
Soman, Jyothish (January 2019)
Continued reduction in the size of the transistor has affected the reliability of processors built from them. This is primarily due to factors such as manufacturing inaccuracies and non-ideal operating conditions, which cause transistors to slow down steadily, eventually leading to permanent breakdown and erroneous operation of the processor. Permanent transistor breakdown, or a fault, can occur at any point in a processor's lifetime. Errors are the discrepancies in the output of faulty circuits. This dissertation shows that components containing faults can continue operating if the errors they cause are within certain bounds, and that the lifetime of a processor can be extended by adding supportive structures that start working once the processor develops these hard errors.

This dissertation makes three major contributions: REPAIR, FaultSim and PreFix. REPAIR is a fault-tolerant system requiring minimal changes to the processor design. It uses an external Instruction Re-execution Unit (IRU) to perform operations that the faulty processor might have executed erroneously: instructions found to use faulty hardware are re-executed on the IRU. REPAIR shows that the performance overhead of such targeted re-execution is low for a limited number of faults.

FaultSim is a fast fault simulator capable of simulating large circuits at the transistor level, developed in this dissertation to understand the effect of faults on different circuits. It performs digital-logic-based simulations, trading analogue accuracy for speed while still supporting most fault models. A 32-bit addition takes under 15 microseconds while simulating more than 1500 transistors. FaultSim can also be integrated into an architectural simulator, adding a performance overhead of 10 to 26 percent to a simulation. The results show that a single fault causes an error in an adder for less than 10 percent of the inputs.

PreFix brings together the fault models created using FaultSim and the design directions found using REPAIR. PreFix re-executes instructions on a remote core, which picks up instructions to execute from a global instruction buffer. Error prediction and detection are used to reduce the number of re-executed instructions. PreFix has an area overhead of 3.5 percent in the setup used, and its performance overhead is within 5 percent of the fault-free case. This dissertation shows that faults in processors can be tolerated without explicitly switching off any component, and that minimal redundancy is sufficient to achieve this.
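To make the fault model concrete, here is a minimal, hypothetical sketch (in Python) of single stuck-at fault injection in a gate-level ripple-carry adder, measuring the fraction of inputs for which the fault corrupts the sum. The circuit, net names, and fault site are illustrative assumptions, not FaultSim's internals.

    # Hypothetical sketch of single stuck-at fault simulation at the gate level,
    # in the spirit of FaultSim's digital-logic (non-analogue) fault model.
    # The circuit, net names, and fault site are illustrative, not FaultSim's.
    from itertools import product

    def full_adder(a, b, cin, fault=None):
        # One-bit full adder built from named nets, so any single net can be
        # forced to a stuck-at value. `fault` is a (net_name, value) pair.
        def drive(name, value):
            if fault and fault[0] == name:
                return fault[1]          # stuck-at-0/1 overrides the logic
            return value
        s1 = drive("s1", a ^ b)
        total = drive("sum", s1 ^ cin)
        c1 = drive("c1", a & b)
        c2 = drive("c2", s1 & cin)
        return total, drive("cout", c1 | c2)

    def add(a, b, width=4, fault=None, fault_bit=None):
        # Ripple-carry adder; the fault lives in one bit slice only.
        carry, result = 0, 0
        for i in range(width):
            f = fault if i == fault_bit else None
            s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry, fault=f)
            result |= s << i
        return result

    # Fraction of inputs for which this one fault corrupts the sum: the
    # per-fault error-rate statistic the abstract reports for adders.
    width, errors = 4, 0
    inputs = list(product(range(1 << width), repeat=2))
    for a, b in inputs:
        errors += add(a, b, width) != add(a, b, width, ("c1", 1), fault_bit=1)
    print(f"error rate for this fault: {errors / len(inputs):.1%}")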
152
Evolvable virtual machines
Nowostawski, Mariusz (January 2008)
The Evolvable Virtual Machine abstract architecture (EVMA) is a computational architecture for dynamic, hierarchically organised virtual machines. The concrete EVM instantiation (EVMI) builds on traditional stack-based models of computation and extends them with notions of hierarchy and reflection at the virtual-machine level. The EVM Universe is composed of a number of autonomous, asynchronously communicating EVM machines. The main contribution of this work lies in the new model of computation and in the architecture itself: a novel, compact, flexible and expressive representation of distributed concurrent computation. The EVMA provides a way of expressing and modelling auto-catalytic networks composed of a hierarchical hypercycle of autopoietic subsystems characterised by self-adaptable structural tendencies and self-organised criticality. The EVMA provides capabilities for: a) self-learning of dynamical patterns through continuous observation of computable environments; b) self-compacting and generalisation of existing program structures; c) emergence of efficient and robust communication code through appropriate machine assembly on both ends of a communication channel. The EVMA is, in one sense, a multi-dimensional generalisation of the stack machine for the purpose of modelling concurrent asynchronous processing; the approach can also be seen as a meta-evolutionary theory of evolution. The EVMA is designed to model systems that mimic living autonomous and adaptable computational processes.

The EVMI prototype has been designed and developed to conduct experimental studies on complex evolving systems. The generality of our approach not only provides the means to experiment with complex hierarchical, computational and evolutionary systems, but also offers a useful model to evaluate, share and discuss complex hierarchical systems in general. The EVMA provides a novel methodology and language with which to pursue research on, understand and talk about the evolution of complexity in living systems. In this thesis, we present the simple single-cell EVMI framework, discuss the multi-cell EVM Universe architecture, present experimental results, and propose further extensions, experimental studies, and possible hardware implementations of the EVMI.
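To make the stack-based core concrete, here is a minimal sketch (in Python) of a stack machine with one reflective operation, so that a program can build another program on the stack and execute it. The instruction set, including the reflective "exec" operation, is an illustrative assumption, not the EVMI's actual instruction set.

    # A minimal stack machine with one reflective operation. Purely
    # illustrative; the EVMI's real instruction set is not shown here.
    def run(program, stack=None):
        stack = [] if stack is None else stack
        for op in program:
            if isinstance(op, (int, list)):
                stack.append(op)               # literals and code are both data
            elif op == "add":
                b, a = stack.pop(), stack.pop()
                stack.append(a + b)
            elif op == "dup":
                stack.append(stack[-1])
            elif op == "exec":
                run(stack.pop(), stack)        # reflection: run code taken from the stack
        return stack

    # Build the program ["add"] on the stack at run time, then execute it.
    print(run([2, 3, ["add"], "exec"]))        # -> [5]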
153
Efficient mapping of fast Fourier transform on the Cyclops-64 multithreaded architecture
Xue, Liping (January 2007)
Thesis (M.S.)--University of Delaware, 2007. Principal faculty advisor: Guang R. Gao, Dept. of Electrical and Computer Engineering. Includes bibliographical references.
154
Fine-grain parallelism on sequential processors
Kotikalapoodi, Sridhar V. (07 September 1994)
There seems to be a consensus that future massively parallel architectures will consist of a number of nodes, or processors, interconnected by a high-speed network. Using a von Neumann style of processing within the nodes of such a multiprocessor limits performance through the constraints imposed by the control-flow execution model. Although the conventional control-flow model offers high performance on sequential execution that exhibits good locality, switching between threads and synchronization among threads cause substantial overhead. Dataflow architectures, on the other hand, support rapid context switching and efficient synchronization, but require extensive hardware and do not make use of high-speed registers.

A number of architectures have been proposed to combine instruction-level context switching with sequential scheduling. One such architecture is the Threaded Abstract Machine (TAM), which supports fine-grain interleaving of multiple threads through an appropriate compilation strategy rather than through elaborate hardware. Experiments on TAM have already shown that it is possible to implement the dataflow execution model on conventional architectures and obtain reasonable performance. These studies also show a basic mismatch between the requirements of fine-grain parallelism and the underlying architecture, and that considerable improvement is possible through hardware support.

This thesis presents two design modifications to support fine-grain parallelism efficiently. First, a modification to the instruction set architecture is proposed to reduce the cost of scheduling and synchronization; the hardware changes are kept to a minimum so as not to disturb the functionality of a conventional RISC processor. Second, a separate coprocessor is used to handle messages, so that atomicity and message handling are managed efficiently without compromising per-processor performance or system integrity. Clock cycles per TAM instruction are used as the measure of the effectiveness of these changes.

Graduation date: 1995
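Here is a toy model (in Python) of the fine-grain scheduling TAM relies on, under illustrative assumptions: each thread carries a synchronization counter, each arriving input decrements it, and the thread is enqueued for execution only when the counter reaches zero. These counter and queue operations are the kind of scheduling and synchronization work the proposed ISA changes aim to make cheap.

    # A toy model of TAM-style fine-grain threads: each thread carries a
    # synchronization counter and is enqueued for execution only when all
    # of its inputs have arrived. Names and structures are illustrative.
    from collections import deque

    class Thread:
        def __init__(self, name, sync_count, body):
            self.name = name
            self.sync_count = sync_count   # inputs still outstanding
            self.body = body               # runs to completion, no preemption

    ready = deque()                        # enabled threads, run in turn

    def post(thread):
        # One input has arrived; the last arrival enables the thread.
        thread.sync_count -= 1
        if thread.sync_count == 0:
            ready.append(thread)

    values = {}
    consumer = Thread("sum", sync_count=2,
                      body=lambda: print(values["x"] + values["y"]))

    def producer(key, val):
        def body():
            values[key] = val
            post(consumer)                 # deliver data, then synchronize
        return Thread(key, sync_count=0, body=body)

    ready.extend([producer("x", 3), producer("y", 4)])
    while ready:                           # the scheduler loop
        ready.popleft().body()             # prints 7 once both inputs arrive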
155
Scheme86: A System for Interpreting Scheme
Berlin, Andrew A.; Wu, Henry M. (01 April 1988)
Scheme86 is a computer system designed to interpret programs written in the Scheme dialect of Lisp. A specialized architecture, coupled with new techniques for optimizing register management in the interpreter, allows Scheme86 to execute interpreted Scheme at a speed comparable to that of compiled Lisp on conventional workstations.
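For context, here is a generic eval loop (in Python) for a tiny Scheme subset: it is the kind of interpreter kernel whose dispatch and register traffic a machine like Scheme86 accelerates, not Scheme86's own interpreter or register discipline.

    # A generic eval loop for a tiny Scheme subset. Illustrative background
    # only; it does not reflect Scheme86's interpreter or its optimizations.
    def evaluate(expr, env):
        if isinstance(expr, str):              # variable reference
            return env[expr]
        if not isinstance(expr, list):         # self-evaluating literal
            return expr
        op, *args = expr
        if op == "quote":
            return args[0]
        if op == "if":
            test, conseq, alt = args
            return evaluate(conseq if evaluate(test, env) else alt, env)
        if op == "lambda":
            params, body = args
            return lambda *vals: evaluate(body, {**env, **dict(zip(params, vals))})
        fn = evaluate(op, env)                 # application
        return fn(*[evaluate(a, env) for a in args])

    env = {"+": lambda a, b: a + b}
    print(evaluate([["lambda", ["x"], ["+", "x", 1]], 41], env))   # -> 42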
156
Framework for accessing CORBA objects with Internet as the backbone
Sethuraman, Meenakshi Sundar (January 2001)
Thesis (M.S.)--University of Florida, 2001. Title from first page of PDF file. Document formatted into pages; contains viii, 30 p.; also contains graphics. Vita. Includes bibliographical references (p. 29).
157
Exploring Virtualization Techniques for Branch Outcome Prediction
Sadooghi-Alvandi, Maryam (20 December 2011)
Modern processors use branch prediction to predict branch outcomes so that they can fetch ahead in the instruction stream, increasing concurrency and performance. Larger predictor tables improve prediction accuracy, but come at the cost of larger area and longer access delay.

This work introduces a new branch predictor design that increases the perceived predictor capacity without increasing its delay, by using a large virtual second-level table allocated in the second-level caches. Virtualization is applied to a state-of-the-art multi-table branch predictor. We evaluate the design, using instruction count as a proxy for timing, on a set of commercial workloads. For a predictor whose size is determined by access-delay constraints rather than area, accuracy can be improved by 8.7%. Alternatively, the design can achieve the same accuracy as a non-virtualized design while using 25% less dedicated storage.
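A minimal sketch (in Python) of the virtualization idea, under simplifying assumptions: a small dedicated table acts as a cache over a much larger virtual table that would reside in the second-level cache, filled on demand. The table sizes, tag scheme, and bimodal 2-bit counters are illustrative; the thesis virtualizes a state-of-the-art multi-table predictor.

    # A toy model of predictor virtualization: a small dedicated table acts
    # as a cache over a large virtual table that would live in the L2 cache.
    # Sizes, tags, and the bimodal 2-bit counters are illustrative only.
    L1_BITS, L2_BITS = 6, 12               # 64 dedicated vs 4096 virtual entries

    l1 = {}                                # dedicated storage: set -> (tag, counter)
    l2 = [1] * (1 << L2_BITS)              # virtual table, weakly not-taken

    def predict(pc):
        idx = pc & ((1 << L2_BITS) - 1)
        s, tag = idx & ((1 << L1_BITS) - 1), idx >> L1_BITS
        if l1.get(s, (None, 0))[0] != tag: # miss in the dedicated table:
            l1[s] = (tag, l2[idx])         # fill from L2 (a real design pays latency here)
        return l1[s][1] >= 2               # 2-bit counter: predict taken if >= 2

    def update(pc, taken):
        idx = pc & ((1 << L2_BITS) - 1)
        s = idx & ((1 << L1_BITS) - 1)
        ctr = min(3, l1[s][1] + 1) if taken else max(0, l1[s][1] - 1)
        l1[s] = (idx >> L1_BITS, ctr)
        l2[idx] = ctr                      # write through so evicted state survives

    for _ in range(4):                     # a loop branch trains toward taken
        print(predict(0x400), end=" ")     # -> False True True True
        update(0x400, True)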
159
Architectural Support for Efficient Communication in Future Microprocessors
Jin, Yu Ho (16 January 2010)
Traditionally, microprocessor design has focused on the computational aspects of the problem at hand. As the number of components on a single chip continues to increase, however, the design of the communication architecture has become a crucial and dominating factor in the performance of the overall system. On-chip networks, also known as Networks-on-Chip (NoC), have recently emerged as a promising architecture for coordinating chip-wide communication.

Although there are numerous interconnection network studies in the inter-chip environment, intra-chip network design poses a number of substantial challenges to this well-established field. This research investigates designs and applications of on-chip interconnection networks in next-generation microprocessors, optimizing for performance, power consumption, and area cost. First, we present domain-specific NoC designs targeted at large-scale, wire-delay-dominated L2 cache systems. The domain-specifically designed interconnect shows a 38% performance improvement and uses only 12% of the mesh-based interconnect. Then, we present a methodology for characterizing communication in parallel programs and apply the characterization results to long-channel reconfiguration. Reconfigured long channels suited to the observed communication patterns improve the latency of the mesh network by 16% and 14% in 16-core and 64-core systems, respectively. Finally, we discuss an adaptive data compression technique that builds a network-wide frequent-value pattern map and reduces packet size. In the two multi-core systems examined, cache traffic is 69% compressible and shows high value sharing among flows. The compression-enabled NoC improves latency by up to 63% and reduces energy consumption by up to 12%.
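A toy sketch (in Python) of the frequent-value idea, under assumptions: data words that hit in a frequent-value table travel as a flag bit plus a short index, while other words travel verbatim. The static table here is purely illustrative; the thesis builds its value pattern map dynamically and network-wide.

    # Frequent-value compression of a packet's data words. The table is
    # statically chosen here for illustration; the real design learns it.
    FREQUENT = [0x00000000, 0xFFFFFFFF, 0x00000001, 0x80000000]

    def compress(words):
        return [("idx", FREQUENT.index(w)) if w in FREQUENT else ("raw", w)
                for w in words]

    def decompress(flits):
        return [FREQUENT[v] if kind == "idx" else v for kind, v in flits]

    packet = [0x00000000, 0xDEADBEEF, 0xFFFFFFFF, 0x00000000]
    encoded = compress(packet)
    assert decompress(encoded) == packet
    # One flag bit per word, a 2-bit index for hits, 32 bits for misses.
    sent = sum(1 + (2 if kind == "idx" else 32) for kind, _ in encoded)
    print(f"{32 * len(packet)} bits -> {sent} bits")   # 128 bits -> 42 bits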