101

Algorithms for compiler-assisted design space exploration of clustered VLIW ASIP datapaths

Lapinskii, Viktor, January 2001
Thesis (Ph.D.), University of Texas at Austin, 2001. Vita. Includes bibliographical references (leaves 72-77). Also available in a digital version from Dissertation Abstracts.
102

The Lagniappe programming environment

Riché, Taylor Louis, 1978-, 31 August 2012
Multicore, multithreaded processors are rapidly becoming the platform of choice for designing high-throughput request-processing applications. We refer to this class of modern parallel architectures as multi-* systems. In this dissertation, we describe the design and implementation of Lagniappe, a programming environment that simplifies the development of portable, high-throughput request-processing applications on multi-* systems. Lagniappe makes the following four key contributions. First, Lagniappe defines and uses a unique hybrid programming model for this domain that separates the concerns of writing applications for uni-processor, single-threaded execution platforms (single-* systems) from the concerns of making applications execute efficiently on a multi-* system; we provide separate tools to the programmer to address each set of concerns. Second, we present meta-models of applications and multi-* systems that identify the entities needed to reason about the application domain and multi-* platforms. Third, we design and implement a platform-independent mechanism called the load-distributing channel that factors out the key functionality required to move an application from a single-* architecture to a multi-* one. Finally, we implement a platform-independent adaptation framework that derives custom adaptation policies from application and system characteristics to change resource allocations as the workload changes. Because the programming model separates the concerns of application programming from system programming, applications written in the Lagniappe programming environment are portable. We implement Lagniappe on a cluster of servers, each with multiple multicore processors, and demonstrate its effectiveness by implementing several stateful request-processing applications and showing their performance on our multi-* system.
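The abstract does not detail how a load-distributing channel works; the sketch below is a hypothetical illustration of the general idea, assuming each request carries a flow key and that per-flow ordering is preserved by hashing that key to a fixed worker queue. The names (LoadDistributingChannel, num_workers, flow_key) are illustrative and are not Lagniappe's actual API.

```python
import queue
import threading
from typing import Any, Callable


class LoadDistributingChannel:
    """Hypothetical sketch of a load-distributing channel: requests are
    spread across worker queues by hashing a per-request flow key, so
    requests from the same flow stay ordered on a single worker."""

    def __init__(self, num_workers: int, handler: Callable[[Any], None]):
        self.queues = [queue.Queue() for _ in range(num_workers)]
        self.workers = [
            threading.Thread(target=self._run, args=(q, handler), daemon=True)
            for q in self.queues
        ]
        for w in self.workers:
            w.start()

    def send(self, flow_key: Any, request: Any) -> None:
        # Hash-based dispatch: the same flow always maps to the same queue.
        self.queues[hash(flow_key) % len(self.queues)].put(request)

    @staticmethod
    def _run(q: "queue.Queue", handler: Callable[[Any], None]) -> None:
        # Each worker drains its own queue, processing requests in arrival order.
        while True:
            handler(q.get())
```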
103

Design of wide-issue high-frequency processors in wire delay dominated technologies

Murukkathampoondi, Hrishikesh Sathyavasu, 28 August 2008
Not available
104

A hybrid-scheduling approach for energy-efficient superscalar processors

Valluri, Madhavi Gopal, 28 August 2008
Not available
105

Distributed selective re-execution for EDGE architectures

Desikan, Rajagopalan, 28 August 2008
Not available
106

Braids: out-of-order performance with almost in-order complexity

Tseng, Francis, 1976-, 29 August 2008
Not available
107

Efficient simulation techniques for large-scale applications

Huang, Jen-Cheng, 21 September 2015
Architecture simulation is an important performance-modeling approach: modeling hardware components in sufficient detail helps architects identify both hardware and software bottlenecks. The major drawback of architectural simulation, however, is its enormous slowdown relative to native execution, and the slowdown grows even larger for emerging workloads that feature high throughput and massive parallelism, such as GPGPU kernels. This dissertation proposes three simulation techniques to simulate emerging GPGPU kernels and data-analytic workloads efficiently. First, TBPoint reduces the number of simulated instructions for GPGPU kernels using inter-launch and intra-launch sampling. Second, GPUmech improves the simulation speed of GPGPU kernels by abstracting the simulation model using functional simulation and analytical modeling. Finally, SimProf applies stratified random sampling guided by performance counters to select representative simulation points for data-analytic workloads, coping with their data-dependent performance. Together, these techniques make it possible to simulate emerging large-scale workloads accurately and efficiently.
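As a rough illustration of the stratified-sampling idea behind SimProf, the sketch below groups execution intervals by a summary of their performance-counter signatures and samples a fixed number of intervals from each stratum. The 1-D bucketing of counter vectors, the stratum count, and the function names are assumptions made for the example, not details taken from the dissertation.

```python
import random
from collections import defaultdict


def stratified_sample(intervals, num_strata=4, per_stratum=2, seed=0):
    """Pick representative simulation points by stratified random sampling.

    `intervals` is a list of (interval_id, counter_vector) pairs, where the
    counter vector summarizes hardware performance counters for that interval.
    Strata are formed here by crude 1-D bucketing of the vector's mean value;
    a real tool would cluster the full vectors.
    """
    rng = random.Random(seed)
    scores = [(iid, sum(vec) / len(vec)) for iid, vec in intervals]
    lo = min(s for _, s in scores)
    hi = max(s for _, s in scores)
    width = (hi - lo) / num_strata or 1.0

    strata = defaultdict(list)
    for iid, s in scores:
        bucket = min(int((s - lo) / width), num_strata - 1)
        strata[bucket].append(iid)

    # Sample within each stratum; the expansion weight (stratum size divided
    # by sample size) lets later estimates re-weight each sampled interval.
    picks = []
    for bucket, members in strata.items():
        chosen = rng.sample(members, min(per_stratum, len(members)))
        picks.extend((iid, len(members) / len(chosen)) for iid in chosen)
    return picks
```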
108

Atomic block formation for explicit data graph execution architectures

Maher, Bertrand Allen, 13 December 2010
Limits on power consumption, complexity, and on-chip latency have focused computer architects on power-efficient designs that exploit parallelism. One approach divides programs into atomic blocks of operations that execute semi-independently, which efficiently creates a large window of potentially concurrent operations. This dissertation studies the intertwined roles of the compiler, architecture, and microarchitecture in achieving efficiency and high performance with a block-atomic architecture. For such an architecture to achieve high performance, the compiler must form blocks effectively: it must create large blocks of instructions to amortize the per-block overhead, but control-flow and content restrictions limit its options. Block formation should consider factors such as execution frequency and block size, for example by selecting control-flow paths that are frequently executed, and should exploit locality of computation to reduce communication overheads. This dissertation determines which characteristics of programs influence block formation and proposes techniques to generate effective blocks. The first contribution is a method for solving the phase-ordering problems inherent to block formation, mitigating the tension between block-enlarging optimizations---if-conversion, tail duplication, loop unrolling, and loop peeling---and scalar optimizations. Given these optimizations, analysis shows that the remaining obstacles to creating larger blocks are inherent in the control-flow structure of applications, and furthermore that any fixed block size entails a sizable amount of wasted space. To eliminate this overhead, the dissertation proposes an architectural implementation of variable-size blocks that allows the compiler to dramatically improve block efficiency. We use these mechanisms to develop policies for block formation that achieve high performance on a range of applications and processor configurations, and we find that the best policies differ significantly depending on the number of participating cores. Using machine learning, we discover generalized policies for particular hardware configurations and find that the best policy varies significantly between applications and with the number of parallel resources available in the microarchitecture. These results show that effective and efficient block-atomic execution is possible when the compiler and microarchitecture are designed cooperatively.
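The abstract does not give the block-formation algorithm itself; as a hypothetical illustration of frequency-guided block enlargement under a size budget, the sketch below greedily extends a block along the most frequently taken successor edge until the instruction budget, a loop back-edge, or a merge point stops it. The data structure and thresholds are assumptions for the example, not the dissertation's actual policy.

```python
def form_block(cfg, entry, max_insts=128):
    """Greedily grow an atomic block starting from basic block `entry`.

    `cfg` maps a basic-block id to a record with:
      insts     - number of instructions in the block
      succs     - list of (successor_id, edge_frequency) pairs
      num_preds - number of CFG predecessors
    The block follows the hottest outgoing edge as long as the instruction
    budget holds and the successor has a single predecessor (so no tail
    duplication is needed in this simplified model).
    """
    block, size, current = [entry], cfg[entry]["insts"], entry
    while True:
        succs = cfg[current]["succs"]
        if not succs:
            break
        nxt, _freq = max(succs, key=lambda s: s[1])      # hottest edge
        if nxt in block or cfg[nxt]["num_preds"] > 1:    # loop or merge point
            break
        if size + cfg[nxt]["insts"] > max_insts:         # budget exhausted
            break
        block.append(nxt)
        size += cfg[nxt]["insts"]
        current = nxt
    return block
```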
109

Delay-sensitive branch predictors for future technologies

Jiménez, Daniel Angel, 1969-, 04 May 2011
Not available
110

Workload balancing in parallel video encoding

朱啓祥, Chu, Kai-cheung. January 2000
Published or final version. Electrical and Electronic Engineering. Master of Philosophy.
