  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

Efficient fault tolerance for pipelined structures and its application to superscalar and dataflow machines

Mizan, Elias, 1976- 10 October 2012
Silicon reliability has reemerged as a very important problem in digital system design. As voltage and device dimensions shrink, combinational logic is becoming sensitive to temporary errors caused by single event upsets, transistor and interconnect aging, and circuit variability. In particular, computational functional units are very challenging to protect because current redundant execution techniques have high power and area overheads, cannot guarantee detection of some errors, and cause substantial performance degradation. As traditional worst-case design rules that guarantee error avoidance become too conservative to be practical, new microarchitectures need to be investigated to address this problem. To this end, this dissertation introduces Self-Imposed Temporal Redundancy (SITR), a speculative microarchitectural temporal redundancy technique suitable for pipelined computational functional units. SITR detects most temporary errors, is area- and energy-efficient, and can be easily incorporated into an out-of-order microprocessor. SITR can also be used as a throttling mechanism against thermal viruses and, in some cases, lets designers build very aggressive bypass networks that achieve high instruction throughput by tolerating timing violations. To address the performance degradation caused by redundant execution, this dissertation proposes a tiled-dataflow model of computation, because it enables the design of scalable, resource-rich computational substrates. Starting with the WaveScalar tiled-dataflow architecture, we enhance the reliability of its datapath, including the computational logic, interconnection network, and storage structures. Computations are performed speculatively using SITR, while traditional information redundancy techniques protect data transmission and storage. Once a value has been verified, confirmation messages are transmitted to consumer instructions.
Upon error detection, nullification messages are sent to the instructions affected by the error. Our experimental results demonstrate that the slowdown due to redundant computation and error recovery on the tiled-dataflow machine is consistently smaller than on a superscalar von Neumann architecture. However, SITR execution requires a substantial number of additional messages, which increases power consumption. To reduce this overhead without significantly affecting performance, we introduce wave-based speculation, a mechanism targeted at dataflow architectures that enables speculation only when it is likely to benefit performance.
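A minimal sketch of the speculate-then-verify idea described above follows. It assumes that a functional unit is modeled as a Python function and that faults come from a hypothetical fault injector (`faulty_cycles`). The primary result is forwarded to consumers at once, and a second execution in the next cycle checks it. The outcome is either a confirmation or a nullification followed by a retry. This is not the dissertation's SITR hardware. The function name, the one-cycle spacing between executions, and the bit-flip fault model are all illustrative assumptions.

```python
import operator
from typing import Callable, List, Set, Tuple


def execute_with_temporal_redundancy(
    op: Callable[[int, int], int],
    a: int,
    b: int,
    faulty_cycles: Set[int],
    max_attempts: int = 4,
) -> Tuple[int, int, List[str]]:
    """Toy model of speculative temporal redundancy on one functional unit.

    Each attempt runs `op` twice in consecutive cycles. The primary result is
    forwarded immediately; the shadow run verifies it. `faulty_cycles` is a
    hypothetical fault injector: a transient error flips bit 0 of the result
    computed in any listed cycle.
    Returns (value, number of failed attempts, message trace).
    """
    messages: List[str] = []
    cycle = 0
    for attempt in range(max_attempts):
        primary = op(a, b) ^ (1 if cycle in faulty_cycles else 0)
        messages.append(f"forward {primary}")      # speculative: consumers start now
        shadow = op(a, b) ^ (1 if cycle + 1 in faulty_cycles else 0)
        cycle += 2
        if primary == shadow:
            messages.append(f"confirm {primary}")  # verified: consumers may commit
            return primary, attempt, messages
        messages.append(f"nullify {primary}")      # mismatch: squash dependents, retry
    raise RuntimeError("results never agreed; possibly a permanent fault")


if __name__ == "__main__":
    # Transient error in cycle 0: the bad value 4 is forwarded, nullified, then redone.
    print(execute_with_temporal_redundancy(operator.add, 2, 3, {0}))
    # (5, 1, ['forward 4', 'nullify 4', 'forward 5', 'confirm 5'])
```

In this toy model, a fault that corrupts both executions identically (for example `faulty_cycles={0, 1}`) is confirmed with the wrong value.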
22

A spectral method for mapping dataflow graphs

Elling, Volker Wilhelm January 1998
No description available.
23

Capsules: expressing composable computations in a parallel programming model

Mandviwala, Hasnain A. 01 October 2008
A well-known problem in designing high-level parallel programming models and languages is the "granularity problem." Parallel tasks that are too fine-grained incur large overheads in the parallel runtime and limit the speed-up that parallel execution can achieve. Tasks that are too coarse-grained create load imbalance and do not adequately utilize the parallel machine. In this work we attempt to address the granularity issue with the concept of expressing "composable computations" within a parallel programming model called "Capsules". Capsules provides a unifying framework that allows the granularity of both data and computation to be composed and adjusted over the iteration space and the computation space. The Capsules model lets the user decide not only the granularity of execution but also the granularity of garbage collection (and therefore the aggressiveness of the GC optimization), along with other features the programming model may support. We argue that adapting execution granularity leads to efficient parallel execution by matching the available application concurrency to the available hardware concurrency, thereby reducing parallelization overhead. By matching, we mean creating coarse-grain Computation Capsules that each encompass multiple fine-grain computation instances. Creating coarse-grain computations reduces overhead simply by reducing the number of parallel computations. Fewer parallel computation instances in turn lead to:
(1) reduced synchronization costs, such as those required to access and search shared data structures;
(2) reduced distribution and scheduling costs for parallel computation instances; and
(3) reduced book-keeping costs, such as maintaining data structures like blocked lists for unfulfilled data requests.
Capsules builds on our prior work, TStreams, a dataflow-oriented parallel programming framework. Our results on a CMP/SMP machine show performance benefits for real vision applications, such as the Cascade Face Detector and Stereo Vision Depth, as well as for synthetic applications. We use profiling to help determine the optimal coarse-grain serial execution granularity. We also provide empirical evidence that adjusting execution granularity reduces parallelization overhead and yields maximum application performance.
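A minimal sketch of granularity coarsening in the spirit of Computation Capsules follows. It assumes a simple map over independent work items. `coarsen` and `run_coarsened` are hypothetical helpers, not the Capsules or TStreams API. The sketch shows only that grouping `grain` fine-grain instances into one task divides the number of tasks the runtime must schedule and synchronize.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def coarsen(items: Sequence[T], grain: int) -> List[Sequence[T]]:
    """Group fine-grain work items into capsules of at most `grain` items."""
    if grain < 1:
        raise ValueError("grain must be at least 1")
    return [items[i:i + grain] for i in range(0, len(items), grain)]


def run_coarsened(
    fn: Callable[[T], R], items: Sequence[T], grain: int, workers: int = 4
) -> Tuple[List[R], int]:
    """Run `fn` over `items`, submitting one parallel task per capsule.

    Inside a capsule the fine-grain instances run serially, so the runtime
    handles about len(items) / grain tasks instead of len(items).
    Returns (results in input order, number of tasks submitted).
    """
    capsules = coarsen(items, grain)

    def run_capsule(capsule: Sequence[T]) -> List[R]:
        return [fn(x) for x in capsule]  # serial execution within one capsule

    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = list(pool.map(run_capsule, capsules))  # map preserves order
    return [y for chunk in chunks for y in chunk], len(capsules)


if __name__ == "__main__":
    for grain in (1, 3, 100):
        _, tasks = run_coarsened(lambda x: x * x, list(range(10)), grain)
        print(f"grain={grain}: {tasks} tasks")
```

A grain of 1 submits one task per item, while a large grain submits only a handful. Picking a value in between is the kind of tuning the abstract attributes to profiling. Python threads are used here only to make the task count visible; under CPython's global interpreter lock they will not speed up CPU-bound work.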
24

A micro data flow (MDF) : a data flow approach to self-timed VLSI system design for DSP

Merani, Lalit T. 24 August 1993
Synchronization is one of the most important issues in digital system design. Although other approaches have attracted interest, a globally clocked timing discipline has so far been the dominant design philosophy. However, advances in technology have brought us to the point where other options deserve serious consideration. VLSI promises great processing power at low cost, and this increase in computational power has come from scaling the digital IC process. But as scaling continues, it is doubtful that the advantages of faster devices can be fully exploited. Clock periods are becoming much shorter relative to interconnect propagation delays, even within a single chip and certainly at the board and backplane level. This thesis describes and develops some alternative approaches to synchronization in digital system design. These techniques draw on a long history of work in both digital computational system design and digital communication system design. The latter field is relevant because large propagation delays have always been a dominant consideration in its design methods. Asynchronous design gives better performance than comparable synchronous design in situations where global synchronization with a high-speed clock limits system throughput. Asynchronous circuits with unbounded gate delays, or self-timed digital circuits, can be designed using either of two request-acknowledge protocols: 4-cycle or 2-cycle. We also present an alternative approach to mapping computation algorithms directly into asynchronous circuits, using a data flow graph or language to describe the algorithms. The data flow primitives were designed using both the 2-cycle and 4-cycle signaling schemes, which are compared in terms of performance and transistor count. The 2-cycle implementations prove better than their 4-cycle counterparts.
A promising application of self-timed design is high-performance DSP systems. Because there is no global clock-distribution constraint, localized forward-only connections allow computation to be extended and sped up using pipelining. A decimation filter was designed and simulated to check the system-level performance of the two protocols, using VHDL for the high-level definition of the design. The simulation results demonstrate not only the efficacy of our synthesis procedure but also the improved efficiency of the 2-cycle scheme over the 4-cycle scheme. / Graduation date: 1994
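The 4-cycle and 2-cycle protocols are often called four-phase (return-to-zero) and two-phase (transition) signaling. The sketch below is not taken from the thesis. It lists the control-wire transitions each protocol needs per data transfer, and it ignores data wires, gate delays, and transistor count. The function names are hypothetical. It shows that the waveform a 4-cycle link spends on one transfer carries two transfers under 2-cycle signaling. That reduction is one commonly cited reason two-phase signaling can be faster, but this sketch does not reproduce the thesis's measurements.

```python
from typing import List, Tuple

Event = Tuple[str, int]  # (control wire, new logic level)


def four_phase(n_items: int) -> List[Event]:
    """4-cycle (return-to-zero) handshake: req+, ack+, req-, ack- per transfer."""
    events: List[Event] = []
    for _ in range(n_items):
        events += [("req", 1), ("ack", 1), ("req", 0), ("ack", 0)]
    return events


def two_phase(n_items: int) -> List[Event]:
    """2-cycle (transition) handshake: every edge on req is a request and every
    edge on ack is an acknowledge, so the wires never return to zero."""
    req = ack = 0
    events: List[Event] = []
    for _ in range(n_items):
        req ^= 1
        events.append(("req", req))
        ack ^= 1
        events.append(("ack", ack))
    return events


if __name__ == "__main__":
    for name, proto in (("4-cycle", four_phase), ("2-cycle", two_phase)):
        print(name, len(proto(3)), "control transitions for 3 transfers")
```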
25

Flow grammars: a methodology for automatically constructing static analyzers

Uhl, James S. 12 June 2018
A new control flow model called flow grammars is introduced that unifies the treatment of intraprocedural and interprocedural control flow. This model provides excellent support for rapid prototyping of flow analyzers. Flow grammars are an easily understood, easily constructed, and flexible representation of control flow. They form an effective bridge between the usual control flow graph model of traditional compilers and the continuation-passing style of denotational semantics. A flow grammar semantics is given and shown to conservatively summarize the effects of all possible executions generated by a flow grammar. Various interpretations of flow grammars for data flow analysis are explored, including a novel bidirectional interprocedural variant. Several algorithms for solving the equations arising from these interpretations are given, based on a similar technique called grammar flow analysis. Flow grammars were developed as the basis for FACT (Flow Analysis Compiler Tool), a compiler construction tool for the automatic construction of flow analyzers. Several important analyses from the literature are cast in the flow grammar framework, and their implementation in a FACT prototype is discussed.
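For contrast, here is a minimal sketch of the conventional approach that the abstract says flow grammars bridge from: an iterative worklist solver for a data flow problem on an ordinary control flow graph. It is intraprocedural and uses liveness as an example analysis. It is not an implementation of flow grammars, grammar flow analysis, or FACT. The graph encoding and the function name are assumptions.

```python
from typing import Dict, List, Set, Tuple

Sets = Dict[str, Set[str]]


def liveness(succ: Dict[str, List[str]], use: Sets, defs: Sets) -> Tuple[Sets, Sets]:
    """Backward may-analysis over a control flow graph, solved with a worklist.

    live_out[n] = union of live_in[s] over successors s of n
    live_in[n]  = use[n] | (live_out[n] - defs[n])
    Iterates until no live_in set changes (a fixed point).
    """
    pred: Dict[str, List[str]] = {n: [] for n in succ}
    for n, ss in succ.items():
        for s in ss:
            pred[s].append(n)
    live_in: Sets = {n: set() for n in succ}
    live_out: Sets = {n: set() for n in succ}
    worklist = list(succ)
    while worklist:
        n = worklist.pop()
        live_out[n] = set().union(*(live_in[s] for s in succ[n]))
        new_in = use[n] | (live_out[n] - defs[n])
        if new_in != live_in[n]:
            live_in[n] = new_in
            # Each predecessor's live_out depends on live_in[n]; revisit it.
            for p in pred[n]:
                if p not in worklist:
                    worklist.append(p)
    return live_in, live_out


if __name__ == "__main__":
    # entry: a = 0
    # loop:  b = a + 1; a = b * 2; branch back to loop or fall through to exit
    # exit:  return a
    succ = {"entry": ["loop"], "loop": ["loop", "exit"], "exit": []}
    use = {"entry": set(), "loop": {"a"}, "exit": {"a"}}
    defs = {"entry": {"a"}, "loop": {"a", "b"}, "exit": set()}
    print(liveness(succ, use, defs))
```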
26

Using Dataflow Optimization Techniques with a Monadic Intermediate Language

Bailey, Justin George 01 January 2012
Our work applies the dataflow algorithm to an area outside its traditional scope: functional languages. Our approach relies on a monadic intermediate language that provides low-level, imperative features such as computed jumps and explicit allocation, while also supporting high-level, functional-language features such as case discrimination and partial application. We prototyped our work in Haskell using the HOOPL library, and this dissertation presents numerous examples demonstrating its use. We demonstrate the efficacy of our approach by giving a novel description of the uncurrying optimization in terms of the dataflow algorithm, along with a complete implementation of the optimization using HOOPL.
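A minimal sketch of the uncurrying idea expressed as a forward dataflow pass follows. It is written in Python rather than Haskell and does not use HOOPL. It assumes a hypothetical three-address form in which each variable is assigned once. It handles a single basic block, so it needs no lattice join at control-flow merges. The statement encoding, `Partial`, and `uncurry` are illustrative, not the dissertation's monadic intermediate language.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical three-address statements (each variable assigned exactly once):
#   ("fun", dst, fname, arity)   dst := closure for known function fname
#   ("app", dst, fn, arg)        dst := fn applied to one argument
#   ("call", dst, fname, args)   dst := fname(*args), a saturated direct call
Stmt = Tuple


@dataclass(frozen=True)
class Partial:
    """Dataflow fact: a variable holds `fname` applied to `args` so far."""
    fname: str
    arity: int
    args: Tuple[str, ...]


def uncurry(block: List[Stmt]) -> List[Stmt]:
    """Forward pass over one basic block: once a chain of single-argument
    applications supplies all `arity` arguments, emit a direct call instead."""
    facts: Dict[str, Partial] = {}
    out: List[Stmt] = []
    for stmt in block:
        kind, dst = stmt[0], stmt[1]
        if kind == "fun":
            _, _, fname, arity = stmt
            facts[dst] = Partial(fname, arity, ())  # gen: known closure, no args yet
            out.append(stmt)
        elif kind == "app" and stmt[2] in facts:
            fact = facts[stmt[2]]
            args = fact.args + (stmt[3],)
            if len(args) == fact.arity:
                out.append(("call", dst, fact.fname, args))  # saturated: direct call
            else:
                facts[dst] = Partial(fact.fname, fact.arity, args)
                out.append(stmt)  # still partial; dead-code elimination may drop it later
        else:
            out.append(stmt)  # unknown function: leave the application unchanged
    return out


if __name__ == "__main__":
    block = [("fun", "f", "add", 2), ("app", "g", "f", "x"), ("app", "h", "g", "y")]
    for s in uncurry(block):
        print(s)
```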
27

Dataflow Processing in Memory Achieves Significant Energy Efficiency

Shelor, Charles F. 08 1900
The large gap between CPU cycle time and memory access time, often called the memory wall, severely limits the performance of streaming applications. Some data centers have reported servers sitting idle three out of every four clock cycles. High-performance instruction-sequenced systems are not energy efficient: the execute stage of even simple pipelined processors uses only 9% of the pipeline's total energy. A hybrid dataflow system within a memory module is shown to deliver 7.2 times the performance, with 368 times better energy efficiency, than an Intel Xeon server processor on the analyzed benchmarks. The dataflow implementation exploits the inherent parallelism and pipelining of the application to improve performance. It avoids the overhead of caching, instruction fetch, instruction decode, instruction scheduling, reorder buffers, and speculative execution used by high-performance out-of-order processors. Coarse-grain reconfigurable logic in an energy-efficient silicon process provides the flexibility to implement multiple algorithms in a low-energy solution. Integrating the logic within a 3D-stacked memory module provides lower-latency, higher-bandwidth access to memory while operating independently of the host system processor.
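The sketch below is an illustrative software interpreter of the dataflow firing rule: an operation executes as soon as all of its operands are available. That rule is what lets a dataflow design skip instruction fetch, decode, and scheduling. It is not Shelor's hardware, which uses coarse-grain reconfigurable logic in a 3D-stacked memory module, and it models neither energy nor memory latency. `run_dataflow` and the graph encoding are hypothetical.

```python
import operator
from collections import deque
from typing import Any, Callable, Dict, List, Tuple

# node name -> (operation, names of the nodes or external inputs that feed it)
Graph = Dict[str, Tuple[Callable[..., Any], List[str]]]


def run_dataflow(graph: Graph, inputs: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Execute a dataflow graph with the firing rule: a node runs as soon as
    all of its operands are available. There is no program counter, so no
    instruction fetch, decode, or reorder step decides what runs next."""
    consumers: Dict[str, List[str]] = {}
    for node, (_, srcs) in graph.items():
        for s in srcs:
            consumers.setdefault(s, []).append(node)

    values: Dict[str, Any] = dict(inputs)

    def ready(node: str) -> bool:
        return node not in values and all(s in values for s in graph[node][1])

    queue = deque(n for n in graph if ready(n))
    fired: List[str] = []
    while queue:
        node = queue.popleft()
        if node in values:  # already fired (guards against duplicate enqueues)
            continue
        op, srcs = graph[node]
        values[node] = op(*(values[s] for s in srcs))
        fired.append(node)
        for c in consumers.get(node, []):  # result token arrives at each consumer
            if ready(c):
                queue.append(c)
    return values, fired


if __name__ == "__main__":
    # (a * b) + (c * d): the two multiplies are independent and could fire in parallel.
    g: Graph = {
        "m1": (operator.mul, ["a", "b"]),
        "m2": (operator.mul, ["c", "d"]),
        "sum": (operator.add, ["m1", "m2"]),
    }
    vals, order = run_dataflow(g, {"a": 2, "b": 3, "c": 4, "d": 5})
    print(vals["sum"], order)  # 26 ['m1', 'm2', 'sum']
```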
