Global ETD Search

471	Software caching techniques and hardware optimizations for on-chip local memories Vujic, Nikola 05 June 2012 (has links) Despite the fact that the most viable L1 memories in processors are caches, on-chip local memories have been a great topic of consideration lately. Local memories are an interesting design option due to their many benefits: less area occupancy, reduced energy consumption and fast and constant access time. These benefits are especially interesting for the design of modern multicore processors since power and latency are important assets in computer architecture today. Also, local memories do not generate coherency traffic which is important for the scalability of the multicore systems. Unfortunately, local memories have not been well accepted in modern processors yet, mainly due to their poor programmability. Systems with on-chip local memories do not have hardware support for transparent data transfers between local and global memories, and thus ease of programming is one of the main impediments for the broad acceptance of those systems. This thesis addresses software and hardware optimizations regarding the programmability, and the usage of the on-chip local memories in the context of both single-core and multicore systems. Software optimizations are related to the software caching techniques. Software cache is a robust approach to provide the user with a transparent view of the memory architecture; but this software approach can suffer from poor performance. In this thesis, we start optimizing traditional software cache by proposing a hierarchical, hybrid software-cache architecture. Afterwards, we develop few optimizations in order to speedup our hybrid software cache as much as possible. As the result of the software optimizations we obtain that our hybrid software cache performs from 4 to 10 times faster than traditional software cache on a set of NAS parallel benchmarks. We do not stop with software caching. We cover some other aspects of the architectures with on-chip local memories, such as the quality of the generated code and its correspondence with the quality of the buffer management in local memories, in order to improve performance of these architectures. Therefore, we run our research till we reach the limit in software and start proposing optimizations on the hardware level. Two hardware proposals are presented in this thesis. One is about relaxing alignment constraints imposed in the architectures with on-chip local memories and the other proposal is about accelerating the management of local memories by providing hardware support for the majority of actions performed in our software cache. / Malgrat les memòries cau encara son el component basic pel disseny del subsistema de memòria, les memòries locals han esdevingut una alternativa degut a les seves característiques pel que fa a l’ocupació d’àrea, el seu consum energètic i el seu rendiment amb un temps d’accés ràpid i constant. Aquestes característiques son d’especial interès quan les properes arquitectures multi-nucli estan limitades pel consum de potencia i la latència del subsistema de memòria.Les memòries locals pateixen de limitacions respecte la complexitat en la seva programació, fet que dificulta la seva introducció en arquitectures multi-nucli, tot i els avantatges esmentats anteriorment. Aquesta tesi presenta un seguit de solucions basades en programari i maquinari específicament dissenyat per resoldre aquestes limitacions.Les optimitzacions del programari estan basades amb tècniques d'emmagatzematge de memòria cau suportades per llibreries especifiques. La memòria cau per programari és un sòlid mètode per proporcionar a l'usuari una visió transparent de l'arquitectura, però aquest enfocament pot patir d'un rendiment deficient. En aquesta tesi, es proposa una estructura jeràrquica i híbrida. Posteriorment, desenvolupem optimitzacions per tal d'accelerar l’execució del programari que suporta el disseny de la memòria cau. Com a resultat de les optimitzacions realitzades, obtenim que el nostre disseny híbrid es comporta de 4 a 10 vegades més ràpid que una implementació tradicional de memòria cau sobre un conjunt d’aplicacions de referencia, com son els “NAS parallel benchmarks”.El treball de tesi inclou altres aspectes de les arquitectures amb memòries locals, com ara la qualitat del codi generat i la seva correspondència amb la qualitat de la gestió de memòria intermèdia en les memòries locals, per tal de millorar el rendiment d'aquestes arquitectures. La tesi desenvolupa propostes basades estrictament en el disseny de nou maquinari per tal de millorar el rendiment de les memòries locals quan ja no es possible realitzar mes optimitzacions en el programari. En particular, la tesi presenta dues propostes de maquinari: una relaxa les restriccions imposades per les memòries locals respecte l’alineament de dades, l’altra introdueix maquinari específic per accelerar les operacions mes usuals sobre les memòries locals. computer architecture high performance computing on-chip local memories software cache DMA multicores 004
472	Structures, Circuits and Architectures for Molecular Scale Integrated Sensing and Computing Pistol, Constantin January 2009 (has links) <p>Nanoscale devices offer the technological advances to enable a new era in computing. Device sizes at the molecular-scale have the potential to expand the domain of conventional computer systems to reach into environments and application domains that are otherwise impractical, such as single-cell sensing or micro-environmental monitoring.</p><p>New potential application domains, like biological scale computing, require processing elements that can function inside nanoscale volumes (e.g. single biological cells) and are thus subject to extreme size and resource constraints. In this thesis we address these critical new domain challenges through a synergistic approach that matches manufacturing techniques, circuit technology, and architectural design with application requirements. We explore and vertically integrate these three fronts: a) assembly methods that can cost-effectively provide nanometer feature sizes, b) device technologies for molecular-scale computing and sensing, and c) architectural design techniques for nanoscale processors, with the goal of mapping a potential path toward achieving molecular-scale computing.</p><p>We make four primary contributions in this thesis. First, we develop and experimentally demonstrate a scalable, cost-effective DNA self-assembly-based fabrication technique for molecular circuits. Second, we propose and evaluate Resonance Energy Transfer (RET) logic, a novel nanoscale technology for computing based on single-molecule optical devices. Third, we design and experimentally demonstrate selective sensing of several biomolecules using RET-logic elements. Fourth, we explore the architectural implications of integrating computation and molecular sensors to form nanoscale sensor processors (nSP), nanoscale-sized systems that can sense, process, store and communicate molecular information. Through the use of self-assembly manufacturing, RET molecular logic, and novel architectural techniques, the smallest nSP design is about the size of the largest known virus.</p> / Dissertation Computer Science computer architecture DNA self assembly molecular digital logic nano photonic circuits nanoscale computing
473	Architectural Enhancements for Color Image and Video Processing on Embedded Systems Kim, Jongmyon 21 April 2005 (has links) As emerging portable multimedia applications demand more and more computational throughput with limited energy consumption, the need for high-efficiency, high-throughput embedded processing is becoming an important challenge in computer architecture. In this regard, this dissertation addresses application-, architecture-, and technology-level issues in existing processing systems to provide efficient processing of multimedia in many, or ideally all, of its form. In particular, this dissertation explores color imaging in multimedia while focusing on two architectural enhancements for memory- and performance-hungry embedded applications: (1) a pixel-truncation technique and (2) a color-aware instruction set (CAX) for embedded multimedia systems. The pixel-truncation technique differs from previous techniques (e.g., 4:2:2 and 4:2:0 subsampling) used in image and video compression applications (e.g., JPEG and MPEG) in that it reduces the information content in individual pixel word sizes rather than in each dimension. Thus, this technique drastically reduces the bandwidth and memory required to transport and store color images without perceivable distortion in color. At the same time, it maintains the pixel storage format of color image processing in which each pixel computation is performed simultaneously on 3-D YCbCr components, which are widely used in the image and video processing community. CAX supports parallel operations on two-packed 16-bit (6:5:5) YCbCr data in a 32-bit datapath processor, providing greater concurrency and efficiency for processing color image sequences. This dissertation presents the impact of CAX on processing performance and on both area and energy efficiency for color imaging applications in three major processor architectures: dynamically scheduled (superscalar), statically scheduled (very long instruction word, VLIW), and embedded single instruction multiple data (SIMD) array processors. Unlike typical multimedia extensions, CAX obtains substantial performance and code density improvements through direct support for color data processing rather than depending solely on generic subword parallelism. In addition, the ability to reduce data format size reduces system cost. The reduction in data bandwidth also simplifies system design. In summary, CAX, coupled with the pixel-truncation technique, provides an efficient mechanism that meets the computational requirements and cost goals for future embedded multimedia products. Data parallel architectures Superscalar processors Embedded systems Subword parallelism Computer architecture Color image and video processing
474	DLL-Conscious Instruction Fetch Optimization for SMT Processors Mohamood, Fayez 12 April 2006 (has links) Simultaneous multithreading (SMT) processors can issue multiple instructions from distinct processes or threads in the same cycle. This technique effectively increases the overall throughput by keeping the pipeline resources more occupied at the potential expense of reducing single thread performance due to resource sharing. In the software domain, an increasing number of Dynamically Linked Libraries (DLL) are used by applications and operating systems, providing better flexibility and modularity, and enabling code sharing. It is observed that a significant amount of execution time in software today is spent in executing standard DLL instructions, that are shared among multiple threads or processes. However, for an SMT processor with a virtually-indexed based cache implementation, existing instruction fetching mechanisms can induce unnecessary false cache misses caused by the DLL-based instructions, which were intended to be shared. This problem is more conspicuous when multiple independent threads are executing concurrently in an SMT processor. This work investigates an often-neglected form of contention between running threads in the I-TLB and I-cache caused by DLLs. To address these shortcomings, we propose a system level technique involving a light-weight modification in the microarchitecture and the OS. By exploiting the nature of the DLLs in our new architecture, we are able to reinstate physical sharing of the DLLs in an SMT machine. Using Microsoft Windows based applications, our simulation results show that the optimized instruction fetching mechanism can reduce the number of DLL misses up to 5.5 times and improve the instruction cache hit rates by up to 62%, resulting in upto 30% DLL IPC improvements and upto 15% overall IPC improvements. DLL SMT Shared libraries Threads (Computer programs) Memory management (Computer science)
475	Subverting Linux on-the-fly using hardware virtualization technology Athreya, Manoj B. 13 May 2010 (has links) In this thesis, we address the problem faced by modern operating systems due to the exploitation of Hardware-Assisted Full-Virtualization technology by attackers. Virtualization technology has been of growing importance these days. With the help of such a technology, multiple operating systems can be run on a single piece of hardware, with little or no modification to the operating system. Both Intel and AMD have contributed to x86 full-virtualization through their respective instruction set architectures. Hardware virtualization extensions can be found in almost all x86 processors these days. Hardware virtualization technologies have opened a whole new frontier for a new kind of attack. A system hacker can abuse hardware virualization technology to gain control over an operating system on-the-fly (i.e., without a system restart) by installing a thin Virtual Machine Monitor (VMM) below the native operating system. Such a VMM based malware is termed a Hardware-Assisted Virtual Machine (HVM) rootkit. We discuss the technique used by a rootkit named Blue Pill to subvert the Windows Vista operating system by exploiting the AMD-V (codenamed "Pacifica") virtualization extensions. HVM rootkits do not hook any operating system code or data regions; hence detecting the existence of such malware using conventional techniques becomes extremely difficult. This thesis discusses existing methods to detect such rootkits and their inefficiencies. In this work, we implement a proof-of-concept HVM rootkit using Intel-VT hardware virtualization technology and also discuss how such an attack can be defended against by using an autonomic architecture called SHARK, which was proposed by Vikas et al., in MICRO 2008. Secure architecture Virtual machine based rootkits Encryption Virtual computer systems Computer security Computer architecture
476	Micro-scheduling and its interaction with cache partitioning Choudhary, Dhruv 05 July 2011 (has links) The thesis explores the sources of energy inefficiency in asymmetric multi- core architectures where energy efficiency is measured by the energy-delay squared product. The insights gathered from this study drive the development of optimized thread scheduling and coordinated cache management strategies in an important class of asymmetric shared memory architectures. The proposed techniques are founded on well known mathematical optimization techniques yet are lightweight enough to be implemented in practical systems. Cache partitioning Computer architecture Thread scheduling Cache memory Multiprocessors High performance computing
477	Performance Evaluation of Embedded Microcomputers for Avionics Applications Bilen, Celal Can, Alcalde, John January 2010 (has links) <p>Embedded microcomputers are used in a wide range of applications nowadays. Avionics is one of these areas and requires extra attention regarding reliability and determinism. Thus, these issues should also be born in mind in addition to performance when evaluating embedded microcomputers.</p><p>This master thesis suggests a framework for performance evaluation of two members of the PowerPC microprocessor family, namely the MPC5554 from Freescale and PPC440EPx from AMCC, and analyzes the results within and between these processors. The framework can be generalized to be used in any microprocessor family, if required.</p><p>Apart from performance evaluation, this thesis also suggests also a new terminology by introducing the concept of determinism levels to be able to estimate determinism issues in avionics applications more clearly, which is crucial regarding the requirements and working conditions of this very application. Such estimation does not include any practical results as in performance evaluation, but rather remains theoretical. Similar to Automark™ used by AutoBench™ in the EEMBC Benchmark Suite, we introduce a new performance metric score that we call ”Aviomark” and we carry out a detailed comparison of Aviomark with the traditional Automark™ score to be able to see how Aviomark differs from Automark™ in behavior.</p><p>Finally, we have developed a graphical user interface (GUI) which works in parallel with the Green Hills MULTI Integrated Development Environment (IDE) in order to simplify and automate the evaluation process. By the help of the GUI, the users will be able to easily evaluate their specific PowerPC processors by starting the debugging from MULTI IDE.</p> Microprocessor Avionics PowerPC Performance Evaluation Determinism Embedded Systems Computer Architecture Computer engineering Datorteknik
478	FPGA-based hardware accelerator design for performance improvement of a system-on-a-chip application Vyas, Dhaval N. January 2005 (has links) Thesis (M.S.)--State University of New York at Binghamton, Department of Electrical and Computer Engineering. / Includes bibliographical references (p. 55-56).
479	Platforms for HPJava Runtime support for scalable programming in Java / Lim, Sang Boem. Erlebacher, Gordon. January 2003 (has links) Thesis (Ph. D.)--Florida State University, 2003. / Advisor: Dr. Gordon Erlebacher, Florida State University, College of Arts and Sciences, Dept. of Computer Science. Title and description from dissertation home page (viewed Apr. 8, 2004). Includes bibliographical references.
480	Simulation of distributed object oriented servers / Kwok, Chee Khan. January 2003 (has links) (PDF) Thesis (M.S. in Computer Science)--Naval Postgraduate School, December 2003. / Thesis advisor(s): Bill Ray, Shing Man Tak. Includes bibliographical references (p. 215). Also available online.

Search results