Global ETD Search

1	Efficient data sharing Burrows, Michael January 1988 (has links) No description available. 005 Distributed computing systems
2	Implementation and comparison of numerical algorithms for the solution of linear systems using transputer networks Dias dos Santos, Jose January 1990 (has links) No description available. 519.5 Parallel computing systems
3	Reconfigurable cellular automata computing for complex systems on the SPACE machine / George, David Frederick James. January 2005 (has links) Thesis (M.Sc.)--University of Western Australia, 2006. Adaptive computing systems.
4	The theory and applications of ringtree networks Xie, Hong January 1994 (has links) No description available. 621.39 Networks; Parallel computing systems
5	System-level design and configuration management for run-time reconfigurable devices / Qu, Yang. January 1900 (has links) (PDF) Thesis (doctoral)--Tampere University of Technology, 2007. / Includes bibliographical references (p. 115-133). Also available on the World Wide Web.
6	H-tree based configuration schemes for a reconfigurable DSP architecture Widjaja, Andy, January 2005 (has links) (PDF) Thesis (M.S. in computer science)--Washington State University. / Includes bibliographical references.
7	Remote sensing and imaging in a reconfigurable computing environment Aggarwal, Vikas. January 2005 (has links) Thesis (M.S.)--University of Florida, 2005. / Title from title page of source document. Document formatted into pages; contains 70 pages. Includes vita. Includes bibliographical references.
8	A medium-grain reconfigurable architecture for digital signal processing Myjak, Mitchell John. January 2006 (has links) (PDF) Thesis (Ph. D.)--Washington State University, May 2006. / Includes bibliographical references (p. 91-94).
9	CAD tool emulation for a two-level reconfigurable DSP architecture Skarpas, Daniel. January 2007 (has links) (PDF) Thesis (M.S. in computer science)--Washington State University, May 2007. / Includes bibliographical references (p. 36).
10	Mixed speculative multithreaded execution models Xekalakis, Polychronis January 2010 (has links) The current trend toward chip multiprocessor architectures has placed great pressure on programmers and compilers to generate thread-parallel programs. Improved execution performance can no longer be obtained via traditional single-thread instruction level parallelism (ILP), but, instead, via multithreaded execution. One notable technique that facilitates the extraction of parallel threads from sequential applications is thread-level speculation (TLS). This technique allows programmers/compilers to generate threads without checking for inter-thread data and control dependences, which are then transparently enforced by the hardware. Most prior work on TLS has concentrated on thread selection and mechanisms to efficiently support the main TLS operations, such as squashes, data versioning, and commits. This thesis seeks to enhance TLS functionality by combining it with other speculative multithreaded execution models. The main idea is that TLS already requires extensive hardware support, which when slightly augmented can accommodate other speculative multithreaded techniques. Recognizing that for different applications, or even program phases, the application bottlenecks may be different, it is reasonable to assume that the more versatile a system is, the more efficiently it will be able to execute the given program. As mentioned above, generating thread-parallel programs is hard and TLS has been suggested as an execution model that can speculatively exploit thread-level parallelism (TLP) even when thread independence cannot be guaranteed by the programmer/compiler. Alternatively, the helper threads (HT) execution model has been proposed where subordinate threads are executed in parallel with a main thread in order to improve the execution efficiency (i.e., ILP) of the latter.
Yet another execution model, runahead execution (RA), has also been proposed where subordinate versions of the main thread are dynamically created especially to cope with long-latency operations, again with the aim of improving the execution efficiency of the main thread (ILP). Each one of these multithreaded execution models works best for different applications and application phases. We combine these three models into a single execution model and single hardware infrastructure such that the system can dynamically adapt to find the most appropriate multithreaded execution model. More specifically, TLS is favored whenever successful parallel execution of instructions in multiple threads (i.e., TLP) is possible and the system can seamlessly transition at run-time to the other models otherwise. In order to understand the tradeoffs involved, we also develop a performance model that allows one to quantitatively attribute overall performance gains to either TLP or ILP in such a combined multithreaded execution model. Experimental results show that our combined execution model achieves speedups of up to 41.2%, with an average of 10.2%, over an existing state-of-the-art TLS system and speedups of up to 35.2%, with an average of 18.3%, over a flavor of runahead execution for a subset of the SPEC2000 Integer benchmark suite. We then investigate how a common ILP-enhancing microarchitectural feature, namely branch prediction, interacts with TLS. We show that branch prediction for TLS is even more important than it is for single-core machines. Unfortunately, branch prediction for TLS systems is also inherently harder. Code partitioning and re-executions of squashed threads pollute the branch history, making it harder for predictors to be accurate. We thus propose to augment the hardware, so as to accommodate Multi-Path (MP) execution within the existing TLS protocol. Under the MP execution model, all paths following a number of hard-to-predict conditional branches are followed.
MP execution thus removes branches that would otherwise have been mispredicted, helping the processor to exploit more ILP. We show that with only minimal hardware support, one can combine these two execution models into a unified one, which can achieve far better performance than both TLS and MP execution. Experimental results show that our combined execution model achieves speedups of up to 20.1%, with an average of 8.8%, over an existing state-of-the-art TLS system and speedups of up to 125%, with an average of 29.0%, when compared with multi-path execution for a subset of the SPEC2000 Integer benchmark suite. Finally, since systems that support speculative multithreading usually treat all threads equally, they are energy-inefficient. This inefficiency stems from the fact that speculation occasionally fails and, thus, power is spent on threads that will have to be discarded. We propose a profitability-based power allocation scheme, where we “steal” power from non-profitable threads and use it to speed up more useful ones. We evaluate our techniques for a state-of-the-art TLS system and show that, with minimal hardware support, we achieve improvements in ED of up to 25.5% with an average of 18.9%, for a subset of the SPEC2000 Integer benchmark suite. 004
