101

Algorithms for compiler-assisted design space exploration of clustered VLIW ASIP datapaths

Lapinskii, Viktor, January 2001
Thesis (Ph.D.), University of Texas at Austin, 2001. Vita. Includes bibliographical references (leaves 72-77). Also available in a digital version from Dissertation Abstracts.
102

The Lagniappe programming environment

Riché, Taylor Louis, 1978-, 31 August 2012
Multicore, multithreaded processors are rapidly becoming the platform of choice for designing high-throughput request-processing applications. We refer to this class of modern parallel architectures as multi-* systems. In this dissertation, we describe the design and implementation of Lagniappe, a programming environment that simplifies the development of portable, high-throughput request-processing applications on multi-* systems. Lagniappe makes the following four key contributions. First, Lagniappe defines and uses a unique hybrid programming model for this domain that separates the concerns of writing applications for uni-processor, single-threaded execution platforms (single-* systems) from the concerns of making applications execute efficiently on a multi-* system; we provide separate tools to the programmer to address each set of concerns. Second, we present meta-models of applications and multi-* systems that identify the entities needed to reason about the application domain and multi-* platforms. Third, we design and implement a platform-independent mechanism called the load-distributing channel that factors out the key functionality required to move an application from a single-* architecture to a multi-* one. Finally, we implement a platform-independent adaptation framework that derives custom adaptation policies from application and system characteristics to change resource allocations as the workload changes. Because the programming model separates the concerns of application programming from system programming, applications written in the Lagniappe programming environment are portable. We implement Lagniappe on a cluster of servers, each with multiple multicore processors, and demonstrate its effectiveness by implementing several stateful request-processing applications and showing their performance on our multi-* system.
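The abstract does not detail how a load-distributing channel works; the sketch below is a hypothetical illustration of the general idea, assuming each request carries a flow key and that per-flow ordering is preserved by hashing that key to a fixed worker queue. The names (LoadDistributingChannel, num_workers, flow_key) are illustrative and are not Lagniappe's actual API.

```python
import queue
import threading
from typing import Any, Callable


class LoadDistributingChannel:
    """Hypothetical sketch of a load-distributing channel: requests are
    spread across worker queues by hashing a per-request flow key, so
    requests from the same flow stay ordered on a single worker."""

    def __init__(self, num_workers: int, handler: Callable[[Any], None]):
        self.queues = [queue.Queue() for _ in range(num_workers)]
        self.workers = [
            threading.Thread(target=self._run, args=(q, handler), daemon=True)
            for q in self.queues
        ]
        for w in self.workers:
            w.start()

    def send(self, flow_key: Any, request: Any) -> None:
        # Hash-based dispatch: the same flow always maps to the same queue.
        self.queues[hash(flow_key) % len(self.queues)].put(request)

    @staticmethod
    def _run(q: "queue.Queue", handler: Callable[[Any], None]) -> None:
        # Each worker drains its own queue, processing requests in arrival order.
        while True:
            handler(q.get())
```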
103

Design of wide-issue high-frequency processors in wire delay dominated technologies

Murukkathampoondi, Hrishikesh Sathyavasu, 28 August 2008
Not available
104

A hybrid-scheduling approach for energy-efficient superscalar processors

Valluri, Madhavi Gopal, 28 August 2008
Not available
105

Distributed selective re-execution for EDGE architectures

Desikan, Rajagopalan, 28 August 2008
Not available
106

Braids: out-of-order performance with almost in-order complexity

Tseng, Francis, 1976-, 29 August 2008
Not available
107

Efficient simulation techniques for large-scale applications

Huang, Jen-Cheng, 21 September 2015
Architecture simulation is an important performance-modeling approach: modeling hardware components in sufficient detail helps architects identify both hardware and software bottlenecks. The major drawback of architectural simulation, however, is its enormous slowdown relative to native execution, and the slowdown grows even larger for emerging workloads that feature high throughput and massive parallelism, such as GPGPU kernels. This dissertation proposes three simulation techniques to simulate emerging GPGPU kernels and data-analytic workloads efficiently. First, TBPoint reduces the number of simulated instructions for GPGPU kernels using inter-launch and intra-launch sampling. Second, GPUmech improves the simulation speed of GPGPU kernels by abstracting the simulation model using functional simulation and analytical modeling. Finally, SimProf applies stratified random sampling guided by performance counters to select representative simulation points for data-analytic workloads, coping with their data-dependent performance. Together, these techniques make it possible to simulate emerging large-scale workloads accurately and efficiently.
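As a rough illustration of the stratified-sampling idea behind SimProf, the sketch below groups execution intervals by a summary of their performance-counter signatures and samples a fixed number of intervals from each stratum. The 1-D bucketing of counter vectors, the stratum count, and the function names are assumptions made for the example, not details taken from the dissertation.

```python
import random
from collections import defaultdict


def stratified_sample(intervals, num_strata=4, per_stratum=2, seed=0):
    """Pick representative simulation points by stratified random sampling.

    `intervals` is a list of (interval_id, counter_vector) pairs, where the
    counter vector summarizes hardware performance counters for that interval.
    Strata are formed here by crude 1-D bucketing of the vector's mean value;
    a real tool would cluster the full vectors.
    """
    rng = random.Random(seed)
    scores = [(iid, sum(vec) / len(vec)) for iid, vec in intervals]
    lo = min(s for _, s in scores)
    hi = max(s for _, s in scores)
    width = (hi - lo) / num_strata or 1.0

    strata = defaultdict(list)
    for iid, s in scores:
        bucket = min(int((s - lo) / width), num_strata - 1)
        strata[bucket].append(iid)

    # Sample within each stratum; the expansion weight (stratum size divided
    # by sample size) lets later estimates re-weight each sampled interval.
    picks = []
    for bucket, members in strata.items():
        chosen = rng.sample(members, min(per_stratum, len(members)))
        picks.extend((iid, len(members) / len(chosen)) for iid in chosen)
    return picks
```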
108

Atomic block formation for explicit data graph execution architectures

Maher, Bertrand Allen, 13 December 2010
Limits on power consumption, complexity, and on-chip latency have focused computer architects on power-efficient designs that exploit parallelism. One approach divides programs into atomic blocks of operations that execute semi-independently, which efficiently creates a large window of potentially concurrent operations. This dissertation studies the intertwined roles of the compiler, architecture, and microarchitecture in achieving efficiency and high performance with a block-atomic architecture. For such an architecture to achieve high performance, the compiler must form blocks effectively: it must create large blocks of instructions to amortize the per-block overhead, but control-flow and content restrictions limit its options. Block formation should consider factors such as execution frequency and block size, for example by selecting control-flow paths that are frequently executed, and should exploit locality of computation to reduce communication overheads. This dissertation determines which characteristics of programs influence block formation and proposes techniques to generate effective blocks. The first contribution is a method for solving the phase-ordering problems inherent to block formation, mitigating the tension between block-enlarging optimizations---if-conversion, tail duplication, loop unrolling, and loop peeling---and scalar optimizations. Given these optimizations, analysis shows that the remaining obstacles to creating larger blocks are inherent in the control-flow structure of applications, and furthermore that any fixed block size entails a sizable amount of wasted space. To eliminate this overhead, the dissertation proposes an architectural implementation of variable-size blocks that allows the compiler to dramatically improve block efficiency. We use these mechanisms to develop policies for block formation that achieve high performance on a range of applications and processor configurations, and we find that the best policies differ significantly depending on the number of participating cores. Using machine learning, we discover generalized policies for particular hardware configurations and find that the best policy varies significantly between applications and with the number of parallel resources available in the microarchitecture. These results show that effective and efficient block-atomic execution is possible when the compiler and microarchitecture are designed cooperatively.
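The abstract does not give the block-formation algorithm itself; as a hypothetical illustration of frequency-guided block enlargement under a size budget, the sketch below greedily extends a block along the most frequently taken successor edge until the instruction budget, a loop back-edge, or a merge point stops it. The data structure and thresholds are assumptions for the example, not the dissertation's actual policy.

```python
def form_block(cfg, entry, max_insts=128):
    """Greedily grow an atomic block starting from basic block `entry`.

    `cfg` maps a basic-block id to a record with:
      insts     - number of instructions in the block
      succs     - list of (successor_id, edge_frequency) pairs
      num_preds - number of CFG predecessors
    The block follows the hottest outgoing edge as long as the instruction
    budget holds and the successor has a single predecessor (so no tail
    duplication is needed in this simplified model).
    """
    block, size, current = [entry], cfg[entry]["insts"], entry
    while True:
        succs = cfg[current]["succs"]
        if not succs:
            break
        nxt, _freq = max(succs, key=lambda s: s[1])      # hottest edge
        if nxt in block or cfg[nxt]["num_preds"] > 1:    # loop or merge point
            break
        if size + cfg[nxt]["insts"] > max_insts:         # budget exhausted
            break
        block.append(nxt)
        size += cfg[nxt]["insts"]
        current = nxt
    return block
```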
109

Delay-sensitive branch predictors for future technologies

Jiménez, Daniel Angel, 1969-, 04 May 2011
Not available
110

Workload balancing in parallel video encoding

朱啓祥, Chu, Kai-cheung. January 2000
Published or final version. Electrical and Electronic Engineering. Master of Philosophy.
