101

Multiplexed pipelining : a cost effective loop transformation technique

Pai, Satish 01 January 1992 (has links)
Parallel processing has gained increasing importance over the last few years. A key aim of parallel processing is to improve the execution times of scientific programs by mapping them to many processors. Loops form an important part of most computational programs and must be processed efficiently to achieve good execution times. Important examples of such programs include graphics algorithms, matrix operations (used in signal and image processing applications), particle simulation, and other scientific applications. Pipelining uses overlapped parallelism to reduce execution time efficiently.
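For readers unfamiliar with this family of transformations, the sketch below shows plain software pipelining of a three-stage loop (load, compute, store) with the usual prologue/kernel/epilogue structure. It illustrates the overlapped parallelism the abstract refers to; it is not the thesis's multiplexed pipelining technique itself.

```python
# Minimal sketch of software pipelining (illustrative only). Each
# iteration has three stages -- load, compute, store -- and the
# pipelined version overlaps the store of iteration i-1 with the
# load/compute of iteration i.

def load(a, i):
    return a[i]

def compute(x):
    return 2 * x + 1

def sequential(a):
    out = []
    for i in range(len(a)):
        x = load(a, i)    # stage 1
        y = compute(x)    # stage 2
        out.append(y)     # stage 3 (store)
    return out

def pipelined(a):
    out = []
    if not a:
        return out
    # Prologue: fill the pipeline with iteration 0.
    y = compute(load(a, 0))
    # Kernel: on real hardware the load/compute of iteration i runs
    # in parallel with the store of iteration i-1.
    for i in range(1, len(a)):
        x = load(a, i)
        out.append(y)     # store for iteration i-1
        y = compute(x)
    # Epilogue: drain the last result.
    out.append(y)
    return out

assert pipelined([1, 2, 3, 4]) == sequential([1, 2, 3, 4])
```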
102

Optimization and enhancement strategies for data flow systems

Dunkelman, Laurence William. January 1984 (has links)
No description available.
103

Multiport memory as a medium for interprocessor communication in multiprocessors / by Nasser Asgari.

Asgari, Nasser January 2003 (has links)
"February 2003" / Includes bibliography (leaves 192-203) / xix, 203 leaves : ill. ; 30 cm. / Title page, contents and abstract only. The complete thesis in print form is available from the University Library. / Thesis (Ph.D.)--University of Adelaide, Dept. of Electrical and Electronic Engineering, 2003
104

A study of hardware/software multithreading

Carlson, Ryan L. 04 June 1998 (has links)
As computer design advances, two important trends have surfaced: the exploitation of parallelism and the tolerance of memory latency. The Multithreaded Virtual Processor (MVP) addresses both. Based on a standard superscalar core, the MVP exploits Instruction Level Parallelism (ILP) and, using multithreading, further exploits Thread Level Parallelism (TLP) in program code. By combining hardware and software multithreading techniques into a new hybrid model, the MVP pairs fast hardware context switching with both hardware and software scheduling. The hybrid yields a processor that exploits long-latency memory operations to increase parallelism while introducing minimal software overhead and hardware design changes. This thesis explores the MVP model and simulator and provides results that illustrate MVP's effectiveness and support its inclusion in future processor designs. Additionally, the thesis shows that MVP's effectiveness is governed by four main considerations: (1) the data set size relative to the cache size, (2) the number of hardware contexts/threads supported, (3) the amount of locality within the data sets, and (4) the amount of exploitable parallelism within the algorithms. / Graduation date: 1999
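As a rough intuition for the switch-on-miss idea behind such hybrid multithreading, the toy model below switches to another ready hardware context whenever the running context issues a long-latency memory operation, instead of stalling. The thread traces, miss latency, and round-robin policy are assumptions for illustration; this is not the MVP simulator described in the thesis.

```python
# Toy switch-on-miss multithreading model (illustrative assumptions,
# not the thesis's MVP simulator).
import collections

MISS_LATENCY = 100   # assumed cycles to service a cache miss

def run(threads):
    """threads: iterators yielding 'hit' or 'miss' per instruction."""
    ready = collections.deque(enumerate(threads))
    blocked = []                       # (wake_cycle, tid, thread)
    cycle = 0
    while ready or blocked:
        # Wake contexts whose miss has been serviced by now.
        still = []
        for wake, tid, th in blocked:
            if wake <= cycle:
                ready.append((tid, th))
            else:
                still.append((wake, tid, th))
        blocked = still
        if not ready:
            cycle += 1                 # all contexts stalled on memory
            continue
        tid, th = ready.popleft()
        for outcome in th:             # run until a miss or completion
            cycle += 1
            if outcome == 'miss':
                blocked.append((cycle + MISS_LATENCY, tid, th))
                break                  # fast context switch
        else:
            print(f"thread {tid} done at cycle {cycle}")
    return cycle

run([iter(['hit', 'miss', 'hit']), iter(['hit', 'hit', 'hit'])])
```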
105

Automatic program restructuring for distributed memory multicomputers

Ikei, Mitsuru 04 1900 (has links) (PDF)
M.S. / Computer Science and Engineering / To compile a Single Program Multiple Data (SPMD) program for a Distributed Memory Multicomputer (DMMC), we must find data that can be processed in parallel and distribute that data among processors so that interprocessor communication remains reasonably small. Loop restructuring is needed to find parallelism in imperative programs, and array alignment is one effective step toward reducing the interprocessor communication caused by array references. Automatic conversion of imperative programs using these two restructuring steps has been implemented in the Tiny loop restructuring tool. The restructuring strategy is derived by translating the approach the compiler uses for the functional language Crystal to the imperative language Tiny. Although an imperative language admits more varied loop structures than a functional language, making the optimal one harder to select, we can obtain a loop structure comparable to Crystal's. We can also find array alignment preference (temporal + spatial) relations in a Tiny source program, and we add a new construct, the align statement, to Tiny to express these preferences. In this thesis, we discuss the program restructuring strategies used for Tiny in comparison with Crystal.
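To illustrate the array-alignment preference idea: if a loop contains A[i] = B[i+1], placing element B[i+1] with A[i] on the same processor removes the communication for that reference pair. The sketch below infers such offsets from reference pairs; the tuple representation and greedy propagation are assumptions of this sketch, not Tiny's actual align statement or its analysis.

```python
# Sketch of inferring array-alignment offsets from reference pairs
# (illustrative; not Tiny's 'align' construct or analysis).

def alignment_offsets(references):
    """references: (lhs, lhs_offset, rhs, rhs_offset) tuples collected
    from loop bodies, e.g. A[i] = B[i+1] gives ('A', 0, 'B', 1).
    Maps each array to an offset so that co-referenced elements land
    at the same virtual position (array element j -> j + offset)."""
    offsets = {}
    for lhs, lo, rhs, ro in references:
        if not offsets:
            offsets[lhs] = 0                      # anchor the first array
        if lhs in offsets and rhs not in offsets:
            # Align rhs[i+ro] with lhs[i+lo]:
            offsets[rhs] = offsets[lhs] + lo - ro
        elif rhs in offsets and lhs not in offsets:
            offsets[lhs] = offsets[rhs] + ro - lo
        # (A full tool would merge disjoint components and resolve
        # conflicting preferences; this sketch skips both.)
    return offsets

# A[i] = B[i+1]; C[i] = A[i-1]
print(alignment_offsets([('A', 0, 'B', 1), ('C', 0, 'A', -1)]))
# -> {'A': 0, 'B': -1, 'C': -1}
```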
106

Program allocation for hypercube based dataflow systems

Freytag, Vincent R. 18 March 1993 (has links)
The dataflow model of computation differs from the traditional control-flow model in that it does not use a program counter to sequence instructions. Instead, the execution of instructions is based solely on the availability of their operands: an instruction executes in a dataflow computer when all of its operands are available. This asynchronous nature of the dataflow model allows the exploitation of fine-grain parallelism inherent in programs. Although the dataflow model exploits parallelism, optimally allocating a program to processors belongs to the class of NP-complete problems. Therefore, one of the major issues facing designers of dataflow multiprocessors is the proper allocation of programs to processors: maximizing parallelism while minimizing interprocessor communication costs. This research culminates in a proposed method, the Balanced Layered Allocation Scheme, which uses heuristic rules to strike a balance between computation time and communication costs in dataflow multiprocessors. Specifically, the proposed scheme uses Critical Path and Longest Directed Path heuristics when allocating instructions to processors. Simulation studies indicate that the scheme is effective in reducing the overall execution time of a program by considering the effects of communication costs on computation times. / Graduation date: 1993
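The sketch below shows the textbook heuristic family this scheme builds on: critical-path list scheduling, where nodes of a dataflow graph are ranked by their longest path to a sink and greedily placed on the processor that can start them earliest, charging a communication cost when a producer sits on a different processor. The graph format, uniform communication cost, and greedy placement are assumptions of this sketch, not the Balanced Layered Allocation Scheme itself.

```python
# Critical-path list scheduling sketch (illustrative assumptions).
import functools

def schedule(graph, cost, n_procs, comm=2):
    """graph: {node: [successors]}, cost: {node: exec_cycles}."""
    preds = {n: [] for n in graph}
    for n in graph:
        for s in graph[n]:
            preds[s].append(n)

    @functools.lru_cache(maxsize=None)
    def cp(n):
        # Longest (critical) path from n to any sink, including n.
        return cost[n] + max((cp(s) for s in graph[n]), default=0)

    placed, finish = {}, {}
    free = [0] * n_procs               # next free cycle per processor
    # Descending critical-path order is also a topological order:
    # every predecessor has a strictly longer path than its successors.
    for n in sorted(graph, key=cp, reverse=True):
        best = None
        for p in range(n_procs):
            # Data from a producer on another processor pays 'comm'.
            ready = max((finish[q] + (0 if placed[q] == p else comm)
                         for q in preds[n]), default=0)
            start = max(ready, free[p])
            if best is None or start < best[0]:
                best = (start, p)
        start, p = best
        placed[n], finish[n] = p, start + cost[n]
        free[p] = finish[n]
    return placed, finish

g = {'a': ['c'], 'b': ['c'], 'c': ['d'], 'd': []}
print(schedule(g, cost={'a': 2, 'b': 3, 'c': 1, 'd': 2}, n_procs=2))
```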
107

MPSoC simulation and implementation of KPN applications

Cheung, Chun Shing. January 2009 (has links)
Thesis (Ph. D.)--University of California, Riverside, 2009. / Includes abstract. Title from first page of PDF file (viewed March 8, 2010). Available via ProQuest Digital Dissertations. Includes bibliographical references (p. 123-137). Also issued in print.
108

Fair and high performance shared memory resource management

Ebrahimi, Eiman 31 January 2012 (has links)
Chip multiprocessors (CMPs) commonly share a large portion of memory system resources among different cores. Since memory requests from different threads executing on different cores significantly interfere with one another in these shared resources, the design of the shared memory subsystem is crucial for achieving high performance and fairness. Inter-thread memory system interference has different implications based on the type of workload running on a CMP. In multi-programmed workloads, different applications can experience significantly different slowdowns. If left uncontrolled, large disparities in slowdowns result in low system performance and make system software's priority-based thread scheduling policies ineffective. In a single multi-threaded application, memory system interference between threads of the same application can slow each thread down significantly. Most importantly, the critical path of execution can also be significantly slowed down, resulting in increased application execution time. This dissertation proposes three mechanisms that address different shortcomings of current shared resource management techniques targeted at multi-programmed workloads, and one mechanism which speeds up a single multi-threaded application by managing main-memory related interference between its different threads. With multi-programmed workloads, the key idea is that both demand- and prefetch-caused inter-application interference should be taken into account in shared resource management techniques across the entire shared memory system. Our evaluations demonstrate that doing so significantly improves both system performance and fairness compared to the state-of-the-art. When executing a single multi-threaded application on a CMP, the key idea is to take into account the inter-dependence of threads in memory scheduling decisions. Our evaluation shows that doing so significantly reduces the execution time of the multi-threaded application compared to using state-of-the-art memory schedulers designed for multi-programmed workloads. This dissertation concludes that the performance and fairness of CMPs can be significantly improved by better management of inter-thread interference in the shared memory resources, both for multi-programmed workloads and multi-threaded applications. / text
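As a rough illustration of the slowdown-based reasoning such fairness mechanisms rest on, the sketch below estimates each application's slowdown (shared execution time versus running alone) and throttles the most aggressive application when the disparity exceeds a threshold. The names, numbers, and throttling policy are assumptions for illustration, not the dissertation's specific mechanisms.

```python
# Slowdown-based fairness sketch (illustrative assumptions only).

def slowdowns(shared_times, alone_times):
    """Per-application slowdown: shared time / alone time."""
    return {app: shared_times[app] / alone_times[app] for app in shared_times}

def enforce_fairness(shared_times, alone_times, threshold=1.4):
    s = slowdowns(shared_times, alone_times)
    unfairness = max(s.values()) / min(s.values())
    if unfairness <= threshold:
        return None          # slowdowns are balanced enough; no action
    # Throttle the least-slowed (most aggressive) application, e.g. by
    # reducing its memory request rate or prefetcher aggressiveness.
    return min(s, key=s.get)

shared = {'A': 180.0, 'B': 120.0}
alone  = {'A':  90.0, 'B': 110.0}
print(enforce_fairness(shared, alone))   # -> 'B' (A slowed 2.0x, B 1.09x)
```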
109

A technology-scalable composable architecture

Kim, Changkyu 28 August 2008 (has links)
Not available / text
110

Fault-tolerant real-time multiprocessor scheduling

Srinivasan, Anand 26 August 2015 (has links)
Graduate
