21.
Efficient fault tolerance for pipelined structures and its application to superscalar and dataflow machines. Mizan, Elias, 1976-. 10 October 2012.
Silicon reliability has reemerged as a very important problem in digital system design. As voltage and device dimensions shrink, combinational logic is becoming sensitive to temporary errors caused by single event upsets, transistor and interconnect aging, and circuit variability. In particular, computational functional units are very challenging to protect because current redundant execution techniques have a high power and area overhead, cannot guarantee detection of some errors, and cause a substantial performance degradation. As traditional worst-case design rules that guarantee error avoidance become too conservative to be practical, new microarchitectures need to be investigated to address this problem. To this end, this dissertation introduces Self-Imposed Temporal Redundancy (SITR), a speculative microarchitectural temporal redundancy technique suitable for pipelined computational functional units. SITR is able to detect most temporary errors, is area- and energy-efficient, and can be easily incorporated in an out-of-order microprocessor. SITR can also be used as a throttling mechanism against thermal viruses and, in some cases, allows designers to build very aggressive bypass networks capable of achieving high instruction throughput by tolerating timing violations. To address the performance degradation caused by redundant execution, this dissertation proposes using a tiled-dataflow model of computation because it enables the design of scalable, resource-rich computational substrates. Starting with the WaveScalar tiled-dataflow architecture, we enhance the reliability of its datapath, including computational logic, interconnection network, and storage structures. Computations are performed speculatively using SITR while traditional information redundancy techniques are used to protect data transmission and storage. Once a value has been verified, confirmation messages are transmitted to consumer instructions. Upon error detection, nullification messages are sent to the instructions affected by the error. Our experimental results demonstrate that the slowdown due to redundant computation and error recovery on the tiled-dataflow machine is consistently smaller than on a superscalar von Neumann architecture. However, the number of additional messages required to support SITR execution is substantial, increasing power consumption. To reduce this overhead without significantly affecting performance, we introduce wave-based speculation, a mechanism targeted for dataflow architectures that enables speculation only when it is likely to benefit performance.
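To make the duplicate-and-compare idea concrete, here is a minimal sketch of temporal redundancy for a pipelined functional unit. It is not from the dissertation; the fault model, function names, and confirm/nullify interface are all hypothetical, Python stand-ins for the microarchitectural mechanism described above.

```python
# Illustrative model of temporal-redundancy checking for a functional unit.
# Not from the dissertation; names, fault model, and structure are hypothetical.

import random

def alu(op, a, b, fault_rate=0.0):
    """Evaluate an ALU op; occasionally flip one bit to model a transient fault."""
    result = {"add": a + b, "sub": a - b, "xor": a ^ b}[op]
    if random.random() < fault_rate:
        result ^= 1 << random.randrange(32)   # single-event upset on one bit
    return result & 0xFFFFFFFF

def execute_with_sitr(op, a, b, fault_rate=0.001):
    """Issue the operation twice through the same unit and compare.

    The first result is forwarded speculatively; the comparison either confirms
    it for consumer instructions or signals that they must be nullified and the
    operation replayed. An identical fault in both passes would escape detection,
    which is why such schemes detect most, not all, temporary errors.
    """
    speculative = alu(op, a, b, fault_rate)   # forwarded to consumers immediately
    redundant   = alu(op, a, b, fault_rate)   # re-executed shortly afterwards
    if speculative == redundant:
        return speculative, "confirm"
    return None, "nullify-and-replay"

if __name__ == "__main__":
    value, status = execute_with_sitr("add", 7, 35)
    print(value, status)
```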
22.
A spectral method for mapping dataflow graphs. Elling, Volker Wilhelm. January 1998.
No description available.
23.
Capsules: expressing composable computations in a parallel programming model. Mandviwala, Hasnain A. 01 October 2008.
A well-known problem in designing high-level parallel programming models and languages is the "granularity problem", where the execution of parallel tasks that are too fine-grain incurs large overheads in the parallel runtime and adversely affects the speed-up that can be achieved by parallel execution. On the other hand, tasks that are too coarse-grain create load imbalance and do not adequately utilize the parallel machine. In this work we attempt to address the issue of granularity with the concept of expressing "composable computations" within a parallel programming model called "Capsules".
In Capsules, we provide a unifying framework that allows composition and adjustment of granularity for both data and computation over iteration space and computation space.
The Capsules model not only allows the user to express the decision on granularity of execution, but also the decision on the granularity of garbage collection (and therefore, the aggressiveness of the GC optimization), and other features that may be supported by the programming model. We argue that this adaptability of execution granularity leads to efficient parallel execution by matching the available application concurrency to the available hardware concurrency,
thereby reducing parallelization overhead. By matching, we refer to creating coarse-grain Computation Capsules that encompass multiple fine-grain computation instances. In effect, creating coarse-grain computations reduces overhead by simply reducing the number of parallel computations. Reducing parallel computation instances in turn leads to: (1) reduced synchronization cost, such as that required to access and search shared data structures; (2) reduced distribution and scheduling cost for parallel computation instances; and (3) reduced book-keeping cost, consisting of maintaining data structures such as blocked lists for unfulfilled data requests.
Capsules builds on our prior work, TStreams, a data-flow oriented parallel programming framework. Our results on a CMP/SMP machine using real vision applications, such as the Cascade Face Detector and Stereo Vision Depth applications, as well as synthetic applications, show benefits in application performance. We use profiling to help determine optimal coarse-grain serial execution granularity, and provide empirical proof that adjusting execution granularity reduces parallelization overhead to yield maximum application performance.
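To illustrate the granularity idea in the abstract above: composing many fine-grain computation instances into one coarse-grain task amortizes per-task scheduling and synchronization overhead. The sketch below is illustrative only; it is not the Capsules API, and the chunking scheme and names are hypothetical.

```python
# Illustrative only: grouping fine-grain work into coarse-grain tasks so that
# per-task overhead (scheduling, synchronization, book-keeping) is amortized.
# This is not the Capsules API; names and structure are hypothetical.

from concurrent.futures import ThreadPoolExecutor

def fine_grain_step(x):
    """A single fine-grain computation instance."""
    return x * x

def coarse_grain_capsule(chunk):
    """A 'capsule' that serially executes many fine-grain instances."""
    return [fine_grain_step(x) for x in chunk]

def run(data, granularity, workers=4):
    """Adjusting `granularity` trades scheduling overhead against load balance."""
    chunks = [data[i:i + granularity] for i in range(0, len(data), granularity)]
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(coarse_grain_capsule, chunks):
            results.extend(partial)
    return results

if __name__ == "__main__":
    print(run(list(range(16)), granularity=4)[:5])
```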
24.
A micro data flow (MDF): a data flow approach to self-timed VLSI system design for DSP. Merani, Lalit T. 24 August 1993.
Synchronization is one of the important issues in digital system design. While other approaches have been intriguing, up until now a globally clocked timing discipline has been the dominant design philosophy. However, we have reached the point, with advances in technology, where other options should be given serious consideration. VLSI promises great processing power at low cost. This increase in computation power has been obtained by scaling the digital IC process. But as this scaling continues, it is doubtful that the advantages of faster devices can be fully exploited. This is because the clock periods are getting much smaller in relation to the interconnect propagation delays, even within a single chip and certainly at the board and backplane level.
In this thesis, some alternative approaches to synchronization in digital system design are described and developed. We owe these techniques to a long history of effort in both digital computational system design and digital communication system design. The latter field is relevant because large propagation delays have always been a dominant consideration in its design methods.
Asynchronous design gives better performance than comparable synchronous design in situations where global synchronization with a high-speed clock becomes a constraint on system throughput. Asynchronous circuits with unbounded gate delays, or self-timed digital circuits, can be designed by employing either of two request-acknowledge protocols: 4-cycle and 2-cycle.
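To make the two protocols concrete: a 4-cycle (return-to-zero) handshake uses four control-wire transitions per data transfer, while a 2-cycle (transition-signaling) handshake uses two. The Python sketch below is illustrative only and not from the thesis; it simply enumerates the handshake events.

```python
# Illustrative sketch of the two request-acknowledge signaling schemes,
# counting control-wire transitions per data transfer; not from the thesis.

def four_cycle(transfers):
    """4-cycle (return-to-zero): req+, ack+, req-, ack- for every transfer."""
    events = []
    for _ in range(transfers):
        events += ["req rises", "ack rises", "req falls", "ack falls"]
    return events

def two_cycle(transfers):
    """2-cycle (transition signaling): each req/ack transition is an event."""
    events = []
    for _ in range(transfers):
        events += ["req toggles", "ack toggles"]
    return events

if __name__ == "__main__":
    n = 4
    # Half the control transitions per transfer is one reason 2-cycle control can
    # be faster and lower energy, at the cost of more complex completion logic.
    print(len(four_cycle(n)), "transitions with 4-cycle for", n, "transfers")
    print(len(two_cycle(n)), "transitions with 2-cycle for", n, "transfers")
```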
We will also present an alternative approach to the problem of mapping computation algorithms directly into asynchronous circuits. A data flow graph or language is used to describe the computation algorithms. The data flow primitives have been designed using both the 2-cycle and 4-cycle signaling schemes, which are compared in terms of performance and transistor count. The 2-cycle implementations prove to be better than their 4-cycle counterparts.
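The firing discipline behind such data flow primitives can be sketched abstractly: a node fires as soon as all of its input tokens have arrived, with no global clock. The Python sketch below is illustrative only; it is not the thesis's transistor-level primitives, and the node and port names are hypothetical.

```python
# Illustrative token-driven firing rule for a dataflow node: a primitive fires
# when all of its input tokens are present, then forwards its result tokens.
# Not the circuit-level primitives designed in the thesis.

class DataflowNode:
    def __init__(self, name, op, num_inputs, consumers=None):
        self.name = name
        self.op = op
        self.tokens = {}                  # input port -> value
        self.num_inputs = num_inputs
        self.consumers = consumers or []  # (node, port) pairs fed by this node

    def receive(self, port, value):
        self.tokens[port] = value
        if len(self.tokens) == self.num_inputs:   # all operands present: fire
            result = self.op(*(self.tokens[p] for p in sorted(self.tokens)))
            self.tokens.clear()
            print(f"{self.name} fired -> {result}")
            for node, p in self.consumers:
                node.receive(p, result)

if __name__ == "__main__":
    mul = DataflowNode("mul", lambda a, b: a * b, 2)
    add = DataflowNode("add", lambda a, b: a + b, 2, consumers=[(mul, 0)])
    add.receive(0, 3)
    add.receive(1, 4)      # add fires: 7, token flows to mul
    mul.receive(1, 10)     # mul fires: 70
```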
A promising application of self-timed design is in high-performance DSP systems. Since there is no global constraint of clock distribution, localized forward-only connections allow computation to be extended and sped up using pipelining. A decimation filter was designed and simulated to check the system-level performance of the two protocols. Simulations were carried out using VHDL for high-level definition of the design. The simulation results demonstrate not only the efficacy of our synthesis procedure but also the improved efficiency of the 2-cycle scheme over the 4-cycle scheme.
25.
Flow grammars: a methodology for automatically constructing static analyzers. Uhl, James S. 12 June 2018.
A new control flow model called flow grammars is introduced which unifies the treatment of intraprocedural and interprocedural control flow. This model provides excellent support for the rapid prototyping of flow analyzers. Flow grammars are an easily understood, easily constructed and flexible representation of control flow, forming an effective bridge between the usual control flow graph model of traditional compilers and the continuation-passing style of denotational semantics. A flow grammar semantics is given which is shown to conservatively summarize the effects of all possible executions generated by a flow grammar. Various interpretations of flow grammars for data flow analysis are explored, including a novel bidirectional interprocedural variant. Several algorithms for solving the equations arising from these interpretations, based on a similar technique called grammar flow analysis, are given. Flow grammars were developed as a basis for FACT (Flow Analysis Compiler Tool), a compiler construction tool for the automatic construction of flow analyzers. Several important analyses from the literature are cast in the flow grammar framework and their implementation in a FACT prototype is discussed.
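One way to read the equational view in this abstract: each nonterminal of a flow grammar contributes a dataflow equation over its productions, and an analysis is the least fixed point of those equations. The sketch below is illustrative only; it is not FACT or the thesis's algorithms, and the grammar, fact domain, and names are hypothetical.

```python
# Illustrative only: a tiny "flow grammar" where each nonterminal's right-hand
# sides describe possible control-flow continuations, and a simple analysis
# (which variables may be defined) is solved by fixed-point iteration.
# A sketch of the general idea, not the FACT tool or the thesis's algorithms.

# nonterminal -> list of alternatives; each alternative mixes terminals
# (primitive statements) and nonterminals (called flow).
grammar = {
    "Main": [["x=1", "Loop", "print_x"]],
    "Loop": [["x=x+1", "Loop"], []],     # iterate or fall through
}

def gen_facts(symbol):
    """Facts a terminal generates (here: the variable it defines, if any)."""
    return {symbol.split("=")[0]} if "=" in symbol else set()

def analyze(grammar):
    """Per nonterminal, compute the set of variables it may define."""
    defs = {nt: set() for nt in grammar}
    changed = True
    while changed:                        # iterate to a fixed point
        changed = False
        for nt, alternatives in grammar.items():
            new = set(defs[nt])
            for alt in alternatives:
                for sym in alt:
                    new |= defs.get(sym, gen_facts(sym))
            if new != defs[nt]:
                defs[nt], changed = new, True
    return defs

if __name__ == "__main__":
    print(analyze(grammar))   # e.g. {'Main': {'x'}, 'Loop': {'x'}}
```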
26.
Using Dataflow Optimization Techniques with a Monadic Intermediate Language. Bailey, Justin George. 01 January 2012.
Our work applies the dataflow algorithm to an area outside its traditional scope: functional languages. Our approach relies on a monadic intermediate language that provides low-level, imperative features like computed jumps and explicit allocations, while at the same time supporting high-level, functional-language features like case discrimination and partial application. We prototyped our work in Haskell using the HOOPL library and this dissertation shows numerous examples demonstrating its use. We prove the efficacy of our approach by giving a novel description of the uncurrying optimization in terms of the dataflow algorithm, as well as a complete implementation of the optimization using HOOPL.
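To illustrate how uncurrying can be phrased as a dataflow problem: a forward pass tracks which variables are known partial applications and rewrites saturated applications into direct calls. The Python sketch below over a toy IR is illustrative only; the dissertation's implementation is in Haskell using HOOPL, and the IR, arities, and names here are hypothetical.

```python
# Illustrative sketch of uncurrying as a forward dataflow pass over a toy IR,
# not the dissertation's Haskell/HOOPL implementation. Facts map a variable to
# the known function and arguments it is partially applied to; a saturated
# "apply" of a known closure is rewritten into a direct call.

ARITY = {"add": 2}          # known top-level functions and their arities

def uncurry_block(block):
    facts, out = {}, []
    for instr in block:
        kind = instr[0]
        if kind == "closure":                     # dest = closure f [args]
            _, dest, fname, args = instr
            facts[dest] = (fname, list(args))
            out.append(instr)
        elif kind == "apply":                     # dest = apply fvar arg
            _, dest, fvar, arg = instr
            if fvar in facts:
                fname, args = facts[fvar]
                args = args + [arg]
                if len(args) == ARITY[fname]:     # saturated: direct call
                    out.append(("call", dest, fname, args))
                else:                             # still partial: smaller closure
                    facts[dest] = (fname, args)
                    out.append(("closure", dest, fname, args))
            else:
                out.append(instr)                 # unknown callee: leave alone
        else:
            out.append(instr)
    return out

if __name__ == "__main__":
    block = [("closure", "f", "add", ["x"]),      # f = add x
             ("apply", "r", "f", "y")]            # r = f y
    print(uncurry_block(block))
    # -> [('closure', 'f', 'add', ['x']), ('call', 'r', 'add', ['x', 'y'])]
```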
27.
Dataflow Processing in Memory Achieves Significant Energy Efficiency. Shelor, Charles F. 08 1900.
The large difference between processor cycle time and memory access time, often referred to as the memory wall, severely limits the performance of streaming applications. Some data centers have reported servers idle for three out of every four clock cycles. High-performance instruction-sequenced systems are not energy efficient: the execute stage of even a simple pipelined processor uses only 9% of the pipeline's total energy. A hybrid dataflow system within a memory module is shown to have 7.2 times the performance with 368 times better energy efficiency than an Intel Xeon server processor on the analyzed benchmarks.
The dataflow implementation exploits the inherent parallelism and pipelining of the application to improve performance without the overhead functions of caching, instruction fetch, instruction decode, instruction scheduling, reorder buffers, and speculative execution used by high-performance out-of-order processors. Coarse-grain reconfigurable logic in an energy-efficient silicon process provides the flexibility to implement multiple algorithms in a low-energy solution. Integrating the logic within a 3D-stacked memory module provides lower latency and higher bandwidth access to memory while operating independently from the host system processor.
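As a back-of-the-envelope reading of the 9% figure above: if the execute stage accounts for roughly 9% of pipeline energy, then eliminating fetch, decode, scheduling, and speculation overhead bounds the per-operation improvement at roughly 11x; the rest of the reported 368x comes from the other factors the abstract names (process choice, memory proximity, exploited parallelism). The Python model below is illustrative only, and the normalization is an assumption.

```python
# Back-of-the-envelope model, illustrative only. The 9% execute-stage share is
# quoted in the abstract; the normalized energy value below is an assumption.

pipeline_energy_per_op = 1.0   # normalized energy of one op on a conventional pipeline
execute_share = 0.09           # execute stage ~9% of pipeline energy (abstract)

# A dataflow datapath that removes fetch, decode, scheduling, reorder and
# speculation overhead ideally spends only the execute-stage energy per op.
dataflow_energy_per_op = pipeline_energy_per_op * execute_share

improvement = pipeline_energy_per_op / dataflow_energy_per_op
print(f"Upper-bound energy improvement from overhead removal alone: {improvement:.1f}x")
# ~11x; the 368x reported in the thesis also reflects the silicon process,
# 3D-stacked memory proximity, and the parallelism the dataflow design exploits.
```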