Global ETD Search

321	A VLSI parallel processor structure for scientific computing Tan, Wei-Siong 05 1900 (has links) No description available. Computer architecture Parallel computers
322	Retargetable compilation for variable-grain data-parallel execution in image processing Sander, Samuel Thomas 08 1900 (has links) No description available. Digital techniques Image processing Computer architecture
323	Effects of attenuation and blurring in cardiac SPECT and compensations using parallel computers DiBella, Edward V. R 08 1900 (has links) No description available. Tomography, Emission Photon beams Data processing Computer architecture
324	Putting checkpoints to work in thread level speculative execution Khan, Salman January 2010 (has links) With the advent of Chip Multi Processors (CMPs), improving performance relies on the programmers/compilers to expose thread level parallelism to the underlying hardware. Unfortunately, this is a difficult and error-prone process for the programmers, while state of the art compiler techniques are unable to provide significant benefits for many classes of applications. An interesting alternative is offered by systems that support Thread Level Speculation (TLS), which relieve the programmer and compiler from checking for thread dependencies and instead use the hardware to enforce them. Unfortunately, data misspeculation results in a high cost since all the intermediate results have to be discarded and threads have to roll back to the beginning of the speculative task. For this reason intermediate checkpointing of the state of the TLS threads has been proposed. When the violation does occur, we now have to roll back to a checkpoint before the violating instruction and not to the start of the task. However, previous work omits study of the microarchitectural details and implementation issues that are essential for effective checkpointing. Further, checkpoints have only been proposed and evaluated for a narrow class of benchmarks. This thesis studies checkpoints on a state of the art TLS system running a variety of benchmarks. The mechanisms required for checkpointing and the costs associated are described. Hardware modifications required for making checkpointed execution efficient in time and power are proposed and evaluated. Further, the need for accurately identifying suitable points for placing checkpoints is established. Various techniques for identifying these points are analysed in terms of both effectiveness and viability. This includes an extensive evaluation of data dependence prediction techniques. The results show that checkpointing thread level speculative execution results in consistent power savings, and for many benchmarks leads to speedups as well. 004
325	Improving processor efficiency by exploiting common-case behaviors of memory instructions Subramaniam, Samantika 02 January 2009 (has links) Processor efficiency can be described with the help of a number of desirable effects or metrics, for example, performance, power, area, design complexity and access latency. These metrics serve as valuable tools used in designing new processors and they also act as effective standards for comparing current processors. Various factors impact the efficiency of modern out-of-order processors and one important factor is the manner in which instructions are processed through the processor pipeline. In this dissertation research, we study the impact of load and store instructions (collectively known as memory instructions) on processor efficiency, and show how to improve efficiency by exploiting common-case or predictable patterns in the behavior of memory instructions. The memory behavior patterns that we focus on in our research are the predictability of memory dependences, the predictability in data forwarding patterns, predictability in instruction criticality and conservativeness in resource allocation and deallocation policies. We first design a scalable and high-performance memory dependence predictor and then apply accurate memory dependence prediction to improve the efficiency of the fetch engine of a simultaneous multi-threaded processor. We then use predictable data forwarding patterns to eliminate power-hungry hardware in the processor with no loss in performance. We then move to studying instruction criticality to improve processor efficiency. We study the behavior of critical load instructions and propose applications that can be optimized using predictable, load-criticality information. Finally, we explore conventional techniques for allocation and deallocation of critical structures that process memory instructions and propose new techniques to optimize the same. Our new designs have the potential to reduce the power and the area required by processors significantly without losing performance, which lead to efficient designs of processors. Power Performance Computer architecture Processor Multiprocessors Memory management (Computer science)
326	GCA: Global Congestion Awareness for Load Balance in Networks-on-Chip Ramakrishna, Mukund 2012 August 1900 (has links) As modern CMPs scale to ever increasing core counts, Networks-on-Chip (NoCs) are emerging as an interconnection fabric, enabling communication between components. While NoCs are easy to implement and provide high and scalable bandwidth, current routing algorithms, such as dimension-ordered routing, suffer from poor load balance, leading to reduced throughput and high latencies. Improving load balance, hence, is critical in future CMP designs where increased latency leads to wasted power and energy waiting for outstanding requests to resolve. Adaptive routing is a known technique to improve load balance; however, prior adaptive routing techniques either use local, myopic information or misinformed, regionally-aggregated information to form their routing decisions. This thesis proposes a new, light-weight, adaptive routing algorithm for on-chip routers based on global link state and congestion information, Global Congestion Awareness (GCA). GCA leverages unused bits in existing packet header flits to "piggyback" congestion state information around the network and uses a simple, low-complexity route calculation unit, to calculate optimal packet paths to their destination without the myopia of local decisions, nor the aggregation of unrelated status information, found in prior designs. In particular GCA outperforms local adaptive routing by up to 82%, Regional Congestion Awareness (RCA) by up to 51%, and a recent competing adaptive routing algorithm, DAR, by 8% on average on realistic workloads. computer architecture on-chip networks networks-on-chip multicores adaptive routing
327	A framework for managing the evolving web service protocols in service-oriented architectures. Ryu, Seung Hwan, Computer Science & Engineering, Faculty of Engineering, UNSW January 2007 (has links) In Service-Oriented Architectures, everything is a service and services can interact with each other when needed. Web services (or simply services) are loosely coupled software components that are published, discovered, and invoked across the Web. As the use of Web services grows, in order to correctly interact with the growing services, it is important to understand the business protocols that provide clients with the information on how to interact with services. In dynamic Web services environments, service providers need to constantly adapt their business protocols for reflecting the restrictions and requirements proposed by new applications, new business strategies, and new laws, or for fixing the problems found in the protocol definition. However, the effective management of such a protocol evolution raises critical problems: one of the most critical issues is to handle instances running under the old protocol when their protocol has been changed. Simple solutions, such as aborting them or allowing them to continue to run according to the old protocol, can be considered, but they are inapplicable for many reasons (e.g., the lose of work already done and the critical nature of work). We present a framework that supports service administrators in managing the business protocol evolution by providing several features, such as a set of change operators allowing modifications of protocols, a variety of protocol change impact analyses automatically determining which ongoing instances can be migrated to the new version of protocol, and data mining techniques inducing a model for classifying ongoing instances migrateable to the new protocol. To support the protocol evolution process, we have also developed database-backed GUI tools on top of our existing system. The proposed approach and tools can help service administrators in managing the evolution of ongoing instances when the business protocols of services with which they are interacting have changed. Business protocol. Web services. Evolution. Computer architecture. Computer network architectures.
328	Architectural support for security and reliability in embedded processors Ragel, Roshan Gabriel, Computer Science & Engineering, Faculty of Engineering, UNSW January 2006 (has links) Security and reliability in processor based systems are concerns requiring adroit solutions. Security is often compromised by code injection attacks, jeopardizing even ???trusted software???. Reliability is of concern, where unintended code is executed in modern processors with ever smaller feature sizes and low voltage swings causing bit flips. Countermeasures by software-only approaches increase code size and therefore significantly reduce performance. Hardware assisted approaches use additional hardware monitors and thus incur considerably high hardware cost and have scalability problems. Considering reliability and security issues during the design of an embedded system has its advantages as this overcomes the limitations of existing solutions. The research work presented in this thesis combines two elements: one, defining a hardware software design framework for reliability and security monitoring at the granularity of micro-instructions, and two, applying this framework for real world problems. At a given time, a processor executes only a few instructions and large part of the processor is idle. Utilizing these idling hardware components by sharing them with the monitoring hardware, to perform security and reliability monitoring reduces the impact of the monitors on hardware cost. Using micro-instruction routines within the machine instructions, allows us to share most of the monitoring hardware. Therefore, our technique requires little hardware overhead in comparison to having additional hardware blocks outside the processor. This reduction in overhead is due to maximal sharing of hardware resources of the processor. Our framework is superior to software-only techniques as the monitoring routines are formed with micro-instructions and therefore reduces code size and execution time overheads, since they occur in parallel with machine instructions. This dissertation makes four significant contributions to the field of security and reliability on embedded processor research and they are: (i) proposed a security and reliability framework for embedded processors that could be included into its design phase; (ii) shown that inline (machine instruction level) monitoring will detect common security attacks (four inline monitors against common attacks cost 9.21% area and 0.67% performance, as opposed to previous work where an external monitor with two monitoring modules costs 15% area overhead); (iii) illustrated that basic block check-summing for code integrity is much simpler and efficient than currently proposed integrity violation detectors which address code injection attacks (this costs 5.03% area increase and 3.67% performance penalty with a single level control flow checking, as opposed to previous work where the area overhead is 5.59%, which needed three control flow levels of integrity checking); and (iv) shown that hardware assisted control flow checking implemented during the design of a processor is much cheaper and effective than software only approaches (this approach costs 0.24-1.47% performance and 3.59% area overheads, as opposed to previous work that costs 53.5-99.5% performance). Security Reliability Embedded Systems Computer Architecture Embedded Processors
329	Optimization of instruction memory for embedded systems Janapsatya, Andhi, Computer Science & Engineering, Faculty of Engineering, UNSW January 2005 (has links) This thesis presents methodologies for improving system performance and energy consumption by optimizing the memory hierarchy performance. The processor-memory performance gap is a well-known problem that is predicted to get worse, as the performance gap between processor and memory is widening. The author describes a method to estimate the best L1 cache configuration for a given application. In addition, three methods are presented to improve the performance and reduce energy in embedded systems by optimizing the instruction memory. Performance estimation is an important procedure to assess the performance of the system and to assess the effectiveness of any applied optimizations. A cache memory performance estimation methodology is presented in this thesis. The methodology is designed to quickly and accurately estimate the performance of multiple cache memory configurations. Experimental results showed that the methodology is on average 45 times faster compared to a widely used tool (Dinero IV). The first optimization method is a software-only method, called code placement, was implemented to improve the performance of instruction cache memory. The method involves careful placement of code within memory to ensure high cache hit rate when code is brought into the cache memory. Code placement methodology aims to improve cache hit rates to improve cache memory performance. Experimental results show that by applying the code placement method, a reduction in cache miss rate by up to 71%, and energy consumption reduction of up to 63% are observed when compared to application without code placement. The second method involves a novel architecture for utilizing scratchpad memory. The scratchpad memory is designed as a replacement of the instruction cache memory. Hardware modification was designed to allow data to be written into the scratchpad memory during program execution, allowing dynamic control of the scratchpad memory content. Scratchpad memory has a faster memory access time and a lower energy consumption per access compared to cache memory; the usage of scratchpad memory aims to improve performance and lower energy consumption of systems compared to system with cache memory. Experimental results show an average energy reduction of 26.59% and an average performance improvement of 25.63% when compared to a system with cache memory. The third is an application profiling method using statistical information to identify application???s hot-spots. Application profiling is important for identifying section in the application where performance degradation might occur and/or where maximum performance gain can be obtained through optimization. The method was applied and tested on the scratchpad based system described in this thesis. Experimental results show the effectiveness of the analysis method in reducing energy and improving performance when compared to previous method for utilizing the scratchpad memory based system (average performance improvement of 23.6% and average energy reduction of 27.1% are observed). Memory management (Computer science) Cache memory Computer architecture
330	The morphing architecture : runtime evolution of distributed applications Williams, Nicholas P. Unknown Date (has links) No description available. 0805 Distributed Computing Computer architecture.

Search results