Global ETD Search

1	A Run-Time Loop Parallelization Technique on Shared-Memory Multiprocessor Systems Wu, Chi-Fan 06 July 2000 (has links) High performance computing power is important for the current advanced calculations of scientific applications. A multiprocessor system obtains its high performance from the fact that some computations can proceed in parallel. A parallelizing compiler can take a sequential program as input and automatically translate it into parallel form for the target multiprocessor system. But when loops with arrays of irregular, nonlinear or dynamic access patterns, no any current parallelizing compiler can determine whether data dependences exist at compile-time. Thus a run-time parallel algorithm will be utilized to determine dependences and extract the potential parallelism of loops. In this thesis, we propose an efficient run-time parallelization technique to compute a proper parallel execution schedule in those loops. This new method first detects immediate predecessor iterations of each loop iteration and constructs an immediate predecessor table, then efficiently schedules the whole loop iterations into wavefronts for parallel execution. According to either theoretical analysis or experimental results, our new run-time parallelization technique reveals high speedup and low processing overhead. Furthermore, this new technique is appropriate to implement on multiprocessor systems due to the characteristics of high scalability. Run-time parallelization Parallelizing compiler Multiprocessor system Wavefront scheduling
2	Split Latency Allocator: Process Variation-Aware Register Access Latency Boost in a Near-Threshold Graphics Processing Unit Pal, Asmita 01 August 2018 (has links) Over the last decade, Graphics Processing Units (GPUs) have been used extensively in gaming consoles, mobile phones, workstations and data centers, as they have exhibited immense performance improvement over CPUs, in graphics intensive applications. Due to their highly parallel architecture, general purpose GPUs (GPGPUs) have gained the foreground in applications where large data blocks can be processed in parallel. However, the performance improvement is constrained by a large power consumption. Likewise, Near Threshold Computing (NTC) has emerged as an energy-efficient design paradigm. Hence, operating GPUs at NTC seems like a plausible solution to counteract the high energy consumption. This work investigates the challenges associated with NTC operation of GPUs and proposes a low-power GPU design, Split Latency Allocator, to sustain the performance of GPGPU applications. GPU Near Threshold Computing Process Variation Register Access Latency Wavefront Scheduling Computer Engineering

Search results

A Run-Time Loop Parallelization Technique on Shared-Memory Multiprocessor Systems

Split Latency Allocator: Process Variation-Aware Register Access Latency Boost in a Near-Threshold Graphics Processing Unit