101 |
The morphing architecture: runtime evolution of distributed applications / Williams, Nicholas P. January 2000
Thesis (Ph. D.)--University of Queensland, 2001. / Includes bibliographical references.
|
102 |
Algorithms for compiler-assisted design space exploration of clustered VLIW ASIP datapaths / Lapinskii, Viktor, January 2001
Thesis (Ph. D.)--University of Texas at Austin, 2001. / Vita. Includes bibliographical references (leaves 72-77). Available also in a digital version from Dissertation Abstracts.
|
103 |
The Lagniappe programming environment / Riché, Taylor Louis, 1978- 31 August 2012
Multicore, multithreaded processors are rapidly becoming the platform of choice for designing high-throughput request-processing applications. We refer to this class of modern parallel architectures as multi-* systems. In this dissertation, we describe the design and implementation of Lagniappe, a programming environment that simplifies the development of portable, high-throughput request-processing applications on multi-* systems. Lagniappe makes the following four key contributions. First, Lagniappe defines and uses a unique hybrid programming model for this domain that separates the concerns of writing applications for uniprocessor, single-threaded execution platforms (single-* systems) from the concerns of making those applications execute efficiently on a multi-* system. We provide separate tools to the programmer to address each set of concerns. Second, we present meta-models of applications and multi-* systems that identify the entities necessary for reasoning about the application domain and multi-* platforms. Third, we design and implement a platform-independent mechanism called the load-distributing channel that factors out the key functionality required for moving an application from a single-* architecture to a multi-* one. Finally, we implement a platform-independent adaptation framework that defines custom adaptation policies based on application and system characteristics and uses them to change resource allocations as the workload changes. Furthermore, applications written in the Lagniappe programming environment are portable, because the programming model separates the concerns of application programming from those of system programming. We implement Lagniappe on a cluster of servers, each with multiple multicore processors. We demonstrate the effectiveness of Lagniappe by implementing several stateful request-processing applications and showing their performance on our multi-* system.
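The abstract does not say how the load-distributing channel assigns requests to processing elements. As a hedged illustration only, the Python sketch below assumes one common policy for stateful request processing: hash a per-request flow key so that every request of a flow reaches the same worker queue, which keeps that flow's state on a single worker. The class name `LoadDistributingChannel`, the `flow_key` function, and the CRC-32 hashing are hypothetical choices for this sketch, not Lagniappe's actual API.

```python
import zlib
from collections import deque
from typing import Any, Callable, Deque, Hashable, List, Optional


class LoadDistributingChannel:
    """Hypothetical sketch: spread requests over per-worker queues.

    Assumption (not from the source): requests sharing a flow key must reach
    the same worker so that per-flow state never moves between workers.
    """

    def __init__(self, num_workers: int, flow_key: Callable[[Any], Hashable]) -> None:
        if num_workers < 1:
            raise ValueError("num_workers must be at least 1")
        self.queues: List[Deque[Any]] = [deque() for _ in range(num_workers)]
        self.flow_key = flow_key

    def _worker_for(self, request: Any) -> int:
        # CRC-32 of the key's repr gives a run-to-run stable mapping for keys
        # with a stable repr (str, int, tuples of those), unlike Python's
        # per-process randomized hash() for strings.
        key_bytes = repr(self.flow_key(request)).encode("utf-8")
        return zlib.crc32(key_bytes) % len(self.queues)

    def send(self, request: Any) -> int:
        """Enqueue a request on its flow's worker and return that worker's index."""
        worker = self._worker_for(request)
        self.queues[worker].append(request)
        return worker

    def receive(self, worker: int) -> Optional[Any]:
        """Dequeue the next request for `worker`, or None if its queue is empty."""
        queue = self.queues[worker]
        return queue.popleft() if queue else None


# Usage: two packets from the same source land on the same worker, in order.
channel = LoadDistributingChannel(4, flow_key=lambda req: req["src"])
w = channel.send({"src": "10.0.0.1", "seq": 1})
assert channel.send({"src": "10.0.0.1", "seq": 2}) == w
assert channel.receive(w)["seq"] == 1
```

Hashing by flow key trades perfect load balance for state locality; a real system that adapts resource allocations would also need a way to remap flows when the number of workers changes, which this sketch leaves out.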
|
104 |
Design of wide-issue high-frequency processors in wire delay dominated technologies / Murukkathampoondi, Hrishikesh Sathyavasu 28 August 2008
Not available
|
105 |
A hybrid-scheduling approach for energy-efficient superscalar processors / Valluri, Madhavi Gopal 28 August 2008
Not available
|
106 |
Distributed selective re-execution for EDGE architectures / Desikan, Rajagopalan 28 August 2008
Not available
|
107 |
Braids: out-of-order performance with almost in-order complexity / Out-of-order performance with almost in-order complexity / Tseng, Francis, 1976- 29 August 2008
Not available
|
108 |
Efficient simulation techniques for large-scale applications / Huang, Jen-Cheng 21 September 2015
Architecture simulation is an important performance modeling approach. Modeling hardware components in sufficient detail helps architects identify both hardware and software bottlenecks. However, the major drawback of architectural simulation is its huge slowdown compared to native execution. The slowdown is even greater for emerging workloads that feature high throughput and massive parallelism, such as GPGPU kernels. In this dissertation, three simulation techniques are proposed to simulate emerging GPGPU kernels and data analytic workloads efficiently. First, TBPoint reduces the number of simulated instructions of GPGPU kernels using inter-launch and intra-launch sampling. Second, GPUmech improves the simulation speed of GPGPU kernels by abstracting the simulation model using functional simulation and analytical modeling. Finally, SimProf applies stratified random sampling with performance counters to select representative simulation points for data analytic workloads, addressing their data-dependent performance. This dissertation presents techniques that can be used to simulate emerging large-scale workloads accurately and efficiently.
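SimProf is described as applying stratified random sampling to select simulation points. The Python sketch below shows generic stratified sampling only, under stated assumptions: execution has already been cut into intervals, and each interval already carries a stratum label (for example, from grouping intervals with similar performance-counter profiles; how SimProf actually forms strata is not stated here). It draws a proportional random sample from each stratum, then weights each stratum's sample mean by the stratum's share of all intervals to estimate the whole-program metric. The function names and the proportional-allocation rule are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def stratified_sample(strata: Sequence[int], total_samples: int,
                      seed: int = 0) -> Dict[int, List[int]]:
    """Choose interval indices per stratum, allocated in proportion to stratum size."""
    rng = random.Random(seed)
    members: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(strata):
        members[label].append(idx)
    chosen: Dict[int, List[int]] = {}
    for label, idxs in members.items():
        # Proportional allocation, but keep at least one sample per stratum.
        k = max(1, round(total_samples * len(idxs) / len(strata)))
        chosen[label] = rng.sample(idxs, min(k, len(idxs)))
    return chosen


def stratified_estimate(metric: Dict[int, float], strata: Sequence[int],
                        chosen: Dict[int, List[int]]) -> float:
    """Weight each stratum's sample mean by its share of all intervals.

    `metric` only needs entries for the chosen (i.e. simulated) intervals.
    """
    sizes: Dict[int, int] = defaultdict(int)
    for label in strata:
        sizes[label] += 1
    estimate = 0.0
    for label, idxs in chosen.items():
        stratum_mean = sum(metric[i] for i in idxs) / len(idxs)
        estimate += sizes[label] / len(strata) * stratum_mean
    return estimate


# Hypothetical profile: 6 intervals in stratum 0, 4 in stratum 1.
strata = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
chosen = stratified_sample(strata, total_samples=5)
# Only the chosen intervals would be simulated in detail; their CPI is faked here.
simulated_cpi = {i: (1.0 if strata[i] == 0 else 3.0)
                 for ids in chosen.values() for i in ids}
print(stratified_estimate(simulated_cpi, strata, chosen))
```

Stratifying pays off when intervals inside a stratum behave similarly: each stratum then needs only a few detailed simulations, while the size weights keep the overall estimate unbiased.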
|
109 |
Atomic block formation for explicit data graph execution architectures / Maher, Bertrand Allen 13 December 2010
Limits on power consumption, complexity, and on-chip latency have
focused computer architects on power-efficient designs that exploit
parallelism. One approach divides programs into atomic blocks of
operations that execute semi-independently, which efficiently creates
a large window of potentially concurrent operations. This
dissertation studies the intertwined roles of the compiler,
architecture, and microarchitecture in achieving efficiency and high
performance with a block-atomic architecture.
For such an architecture to achieve high performance, the compiler must
form blocks effectively. The compiler must create large blocks of
instructions to amortize the per-block overhead, but control-flow and
content restrictions limit the compiler's options. Block formation
should consider factors such as frequency of execution and block size,
for example by selecting control-flow paths that are frequently executed,
and should exploit locality of computations to reduce communication overheads.
This dissertation determines what characteristics of programs
influence block formation and proposes techniques to generate
effective blocks. The first contribution is a method for solving
phase-ordering problems inherent to block formation, mitigating the
tension between block-enlarging optimizations (if-conversion, tail
duplication, loop unrolling, and loop peeling) and scalar
optimizations. Given these optimizations, analysis shows that the
remaining obstacles to creating larger blocks are inherent in the
control-flow structure of applications, and furthermore that any fixed
block size entails a sizable amount of wasted space. To eliminate
this overhead, this dissertation proposes an architectural
implementation of variable-size blocks that allow the compiler to
dramatically improve block efficiency.
We use these mechanisms to develop policies for block formation that
achieve high performance on a range of applications and processor
configurations. We find that the best policies differ significantly
depending on the number of participating cores. Using machine
learning, we discover generalized policies for particular hardware
configurations and find that the best policy varies significantly
across applications and with the number of parallel resources
available in the microarchitecture. These results show that effective
and efficient block-atomic execution is possible when the compiler and
microarchitecture are designed cooperatively.
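The observation that any fixed block size wastes space can be made concrete with a little arithmetic. The Python sketch below assumes a hypothetical fixed block capacity and, for variable-size blocks, a hypothetical allocation granularity (`chunk`); neither value comes from the dissertation. For a list of formed block sizes, it counts the unused instruction slots under each scheme and the resulting fraction of allocated slots that hold useful instructions.

```python
from typing import Sequence


def wasted_slots_fixed(block_sizes: Sequence[int], capacity: int) -> int:
    """Unused instruction slots when every block occupies `capacity` slots."""
    for size in block_sizes:
        if size > capacity:
            raise ValueError(f"block of {size} exceeds capacity {capacity}")
    return sum(capacity - size for size in block_sizes)


def wasted_slots_variable(block_sizes: Sequence[int], chunk: int) -> int:
    """Unused slots when each block is padded to a multiple of `chunk` (assumed granularity)."""
    # (-size) % chunk is the padding needed to reach the next multiple of chunk.
    return sum((-size) % chunk for size in block_sizes)


def efficiency(block_sizes: Sequence[int], wasted: int) -> float:
    """Fraction of allocated slots that hold useful instructions."""
    useful = sum(block_sizes)
    return useful / (useful + wasted)


# Hypothetical block sizes produced by a compiler.
sizes = [20, 75, 128, 40]
fixed = wasted_slots_fixed(sizes, capacity=128)
variable = wasted_slots_variable(sizes, chunk=32)
print(fixed, variable)  # prints: 249 57
print(f"{efficiency(sizes, fixed):.2f} {efficiency(sizes, variable):.2f}")  # prints: 0.51 0.82
```

With these made-up numbers, padding every block to 128 slots leaves about half the allocated space empty, while a finer granularity recovers most of it. The actual savings depend on the block-size distribution the compiler produces.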
|
110 |
Delay-sensitive branch predictors for future technologies / Jiménez, Daniel Angel, 1969- 04 May 2011
Not available
|