Global ETD Search

11	Dynamic execution prediction and pipeline balancing of streaming applications Aleen, Farhana Afroz 30 August 2010 (has links) The number and scope of data driven streaming applications is growing. Such streaming applications are promising targets for effectively utilizing multi-cores because of their inherent amenability to pipelined parallelism. While existing methods of orchestrating streaming programs on multi-cores have mostly been static, real-world applications show ample variations in execution time that may cause the achieved speedup and throughput to be sub-optimal. One of the principle challenges for moving towards dynamic pipeline balancing has been the lack of approaches that can predict upcoming dynamic variations in execution efficiently, well before they occur. In this thesis, we propose an automated dynamic execution behavior prediction approach based on compiler analysis that can be used to efficiently estimate the time to be spent in different pipeline stages for upcoming inputs. Our approach first uses dynamic taint analysis to automatically generate an input-based execution characterization of the streaming program, which identifies the key control points where variation in execution might occur with respect to the associated input elements. We then automatically generate a light-weight emulator from the program using this characterization that can predict the execution paths taken for new streaming inputs and provide execution time estimates and possible dynamic variations. The main challenge in devising such an approach is the essential trade-off between accuracy and overhead of dynamic analysis. We present experimental evidence that our technique can accurately and efficiently estimate dynamic execution behaviors for several benchmarks with a small error rate. We also showed that the error rate could be lowered with the trade-off of execution overhead by implementing a selective symbolic expression generation for each of the complex conditions of control-flow operations. Our experiments show that dynamic pipeline balancing using our predicted execution behavior can achieve considerably higher speedup and throughput along with more effective utilization of multi-cores than static balancing approaches. Dynamic scheduling Software pipelining Multimedia systems Computers, Pipeline Pipelining (Electronics)
12	Efficient fault tolerance for pipelined structures and its application to superscalar and dataflow machines Mizan, Elias, 1976- 10 October 2012 (has links) Silicon reliability has reemerged as a very important problem in digital system design. As voltage and device dimensions shrink, combinational logic is becoming sensitive to temporary errors caused by single event upsets, transistor and interconnect aging and circuit variability. In particular, computational functional units are very challenging to protect because current redundant execution techniques have a high power and area overhead, cannot guarantee detection of some errors and cause a substantial performance degradation. As traditional worst-case design rules that guarantee error avoidance become too conservative to be practical, new microarchitectures need to be investigated to address this problem. To this end, this dissertation introduces Self-Imposed Temporal Redundancy (SITR), a speculative microarchitectural temporal redundancy technique suitable for pipelined computational functional units. SITR is able to detect most temporary errors, is area and energy-efficient and can be easily incorporated in an out-of-order microprocessor. SITR can also be used as a throttling mechanism against thermal viruses and, in some cases, allows designers to design very aggressive bypass networks capable of achieving high instruction throughput, by tolerating timing violations. To address the performance degradation caused by redundant execution, this dissertation proposes using a tiled-data ow model of computation because it enables the design of scalable, resource-rich computational substrates. Starting with the WaveScalar tiled-data flow architecture, we enhance the reliability of its datapath, including computational logic, interconnection network and storage structures. Computations are performed speculatively using SITR while traditional information redundancy techniques are used to protect data transmission and storage. Once a value has been verified, confirmation messages are transmitted to consumer instructions. Upon error detection, nullification messages are sent to the instructions affected by the error. Our experimental results demonstrate that the slowdown due to redundant computation and error recovery on the tiled-data flow machine is consistently smaller than on a superscalar von Neumann architecture. However, the number of additional messages required to support SITR execution is substantial, increasing power consumption. To reduce this overhead without significantly affecting performance, we introduce wave-based speculation, a mechanism targeted for data flow architectures that enables speculation only when it is likely to benefit performance. / text Fault-tolerant computing Computers, Pipeline--Reliability Data flow computing Computer architecture
13	Reconfigurable pipelined datapaths / Cronquist, Darren C. January 1999 (has links) Thesis (Ph. D.)--University of Washington, 1999. / Vita. Includes bibliographical references (p. 189-193).
14	Switch-based Fast Fourier Transform processor Mohd, Bassam Jamil, January 1900 (has links) Thesis (Ph. D.)--University of Texas at Austin, 2008. / Vita. Includes bibliographical references.
15	Switch-based Fast Fourier Transform processor Mohd, Bassam Jamil, 1968- 05 October 2012 (has links) The demand for high-performance and power scalable DSP processors for telecommunication and portable devices has increased significantly in recent years. The Fast Fourier Transform (FFT) computation is essential to such designs. This work presents a switch-based architecture to design radix-2 FFT processors. The processor employs M processing elements, 2M memory arrays and M Read Only Memories (ROMs). One processing element performs one radix-2 butterfly operation. The memory arrays are designed as single-port memory, where each has a size of N/(2M); N is the number of FFT points. Compared with a single processing element, this approach provides a speedup of M. If not addressed, memory collisions degrade the processor performance. A novel algorithm to detect and resolve the collisions is presented. When a collision is detected, a memory management operation is executed. The performance of the switch architecture can be further enhanced by pipelining the design, where each pipeline stage employs a switch component. The result is a speedup of Mlog2N compared with a single processing element performance. The utilization of single-port memory reduces the design complexities and area. Furthermore, memory arrays significantly reduce power compared with the delay elements used in some FFT processors. The switch-based architecture facilitates deactivating processing elements for power scalability. It also facilitates implementing different FFT sizes. The VLSI implementation of a non-pipeline switch-based processor is presented. Matlab simulations are conducted to analyze the performance. The timing, power and area results from RTL, synthesis and layout simulations are discussed and compared with other processors. / text Microprocessors--Design and construction Computer architecture Memory management (Computer science) Computer storage devices

Page generated in 0.0657 seconds