Global ETD Search

121	Design of heterogeneous coherence hierarchies using manager-client pairing. Beu, Jesse Garrett. 09 April 2013.
Over the past ten years, the architecture community has witnessed the end of single-threaded performance scaling and a subsequent shift in focus toward multicore and manycore processing. This is an exciting time for architects, with many new opportunities and design spaces to explore, but it also brings new challenges. The memory subsystem is especially affected: designing, verifying, and evaluating cache coherence protocols becomes very challenging as cores grow more numerous and more diverse. This dissertation examines these issues and presents Manager-Client Pairing as a solution to the challenges facing next-generation coherence protocol design. By defining a standardized coherence communication interface and permissions-checking algorithm, Manager-Client Pairing enables coherence hierarchies to be constructed and evaluated quickly, without the high design cost previously associated with hierarchical composition. Manager-Client Pairing also allows verification to be composed, even when the combined protocols are heterogeneous. As a result, diverse protocols can be developed rapidly while remaining guaranteed bug-free, letting architects focus on performance optimization rather than on debugging and correctness concerns when comparing coherence configurations for future heterogeneous systems.
Subjects: Formal verification; Protocol verification; Heterogeneous computing; Uncore; Microarchitecture; Computer architecture; Computer storage devices; Memory management (Computer science); Memory maps (Computer science); Data processing
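The abstract does not spell out the Manager-Client Pairing interface or its permissions-checking algorithm. As a hypothetical illustration of the idea it builds on, where each cache level is a client of the level above and a manager of the levels below, the Python sketch below models MSI-style permissions in a tree of caches. The names `Perm`, `Node`, `request`, `grant`, `downgrade`, and `check_inclusion` are invented for this example, and the rule shown is generic hierarchical permission inheritance, not the dissertation's algorithm.

```python
from enum import IntEnum


class Perm(IntEnum):
    NONE = 0
    SHARED = 1
    MODIFIED = 2


class Node:
    """One cache level: a client of its manager (parent) and a manager of its clients (children)."""

    def __init__(self, name, manager=None):
        self.name = name
        self.manager = manager
        self.clients = []
        # The root stands in for memory, which always holds full permission.
        self.perm = Perm.MODIFIED if manager is None else Perm.NONE
        if manager is not None:
            manager.clients.append(self)

    def request(self, want):
        """Client side: a hit if local permission suffices, otherwise ask the manager."""
        if self.perm >= want:
            return
        self.manager.grant(self, want)

    def grant(self, client, want):
        """Manager side: obtain the permission itself, then resolve conflicts among its clients."""
        self.request(want)  # climbs the hierarchy only as far as needed
        for other in self.clients:
            if other is not client:
                # A writer needs exclusivity; a reader only needs no other writer.
                other.downgrade(Perm.NONE if want == Perm.MODIFIED else Perm.SHARED)
        client.perm = want

    def downgrade(self, limit):
        """Cap this node's permission, and recursively its clients', at limit."""
        if self.perm > limit:
            self.perm = limit
            for c in self.clients:
                c.downgrade(limit)


def check_inclusion(node):
    """Invariant: no client holds more permission than its manager."""
    return all(c.perm <= node.perm and check_inclusion(c) for c in node.clients)
```

Each level only talks to its own manager and clients, and a client never ends up holding more permission than its manager. A local invariant of this kind is what makes it plausible to build and check a hierarchy level by level.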
122	The use of memory state knowledge to improve computer memory system organization. Isen, Ciji. 01 June 2011.
The trends toward virtualization and multicore, multiprocessor environments have translated into a massive increase in the amount of main memory each system must be fitted with to use its growing compute capacity effectively. This growing demand means that main memory devices and their issues are as important a part of system design as the central processors. The primary issues of modern memory are power, energy, and capacity scaling: nearly a third of system power and energy can come from the memory subsystem. At the same time, technology limits the ability of modern main memory devices to scale and keep pace with program demands, which motivates the exploration of alternative main memory storage technologies.
This dissertation exploits dynamic knowledge of memory state and memory data values to improve memory performance and reduce memory energy consumption. It proposes a cross-boundary approach that communicates dynamic memory management state (allocated and deallocated memory) between software and the hardware memory subsystem through a combination of ISA support and hardware structures. These mechanisms identify memory operations to regions that cannot affect correct program execution because the regions were either freshly allocated or deallocated. The reasoning is that data in deallocated regions is no longer useful to the program, and data in freshly allocated memory is not yet useful either, because the program has not yet defined its contents. By being aware of this, the memory system avoids such operations, saving energy and making better use of main memory. Furthermore, when stores write zeros to memory, the number of stores reaching memory is reduced by capturing the zeros as compressed information stored alongside the memory management state. Using these methods, this dissertation harnesses memory management state and data value information to achieve significant energy savings while extending the endurance limit of memory technologies.
Subjects: Computer architecture; Memory power; Memory management (Computer science); Memory energy; Memory allocation; Phase change memory; DRAM; Computer storage devices; Computer memory systems; Program semantics
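As a rough illustration of the mechanisms summarized above, the Python sketch below models a memory whose lines carry allocation state, so accesses to freed or never-written lines skip the memory array and all-zero stores are recorded as state instead of being written. `LineState`, `StateAwareMemory`, its `allocate`/`free` hooks (standing in for the ISA support), and the 64-byte line size are hypothetical; the abstract does not describe the actual ISA extensions or hardware structures.

```python
from enum import Enum, auto

LINE_BYTES = 64  # assumed memory line size for this toy model


class LineState(Enum):
    FREE = auto()   # deallocated: contents are dead
    FRESH = auto()  # allocated but never written: contents undefined
    ZERO = auto()   # last write was all zeros: kept only as state, not in the array
    DATA = auto()   # holds real data in the memory array


class StateAwareMemory:
    """Toy memory that uses allocation state to skip array accesses that cannot affect correctness."""

    def __init__(self, num_lines):
        self.state = [LineState.FREE] * num_lines
        self.array = [bytes(LINE_BYTES)] * num_lines
        self.array_reads = 0
        self.array_writes = 0

    # Hypothetical hooks standing in for ISA support that reports allocation to hardware.
    def allocate(self, line):
        self.state[line] = LineState.FRESH

    def free(self, line):
        self.state[line] = LineState.FREE

    def write(self, line, data):
        if len(data) != LINE_BYTES:
            raise ValueError("write must cover a full line")
        if self.state[line] is LineState.FREE:
            return  # e.g. a writeback to freed memory: dropped
        if not any(data):
            self.state[line] = LineState.ZERO  # zero store captured as state only
            return
        self.array[line] = bytes(data)
        self.array_writes += 1
        self.state[line] = LineState.DATA

    def read(self, line):
        if self.state[line] is LineState.DATA:
            self.array_reads += 1
            return self.array[line]
        # FREE and FRESH contents are undefined and ZERO is known, so no array access is needed.
        return bytes(LINE_BYTES)
```

Returning zeros for undefined lines is one valid choice in this model; since the program has not defined those contents, any value would be correct.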
123	High-performance memory system architectures using data compression. Baek, Seungcheol. 22 May 2014.
The Chip Multi-Processor (CMP) paradigm has cemented itself as the archetypal philosophy of future microprocessor design. Rapidly shrinking feature sizes have enabled the integration of ever-increasing numbers of processing cores on a single die. This abundance of processing power has widened the long-standing processor-memory performance gap known as the "memory wall", and bridging this gap requires a high-performing memory structure. One attractive solution is to use compression in the memory hierarchy. To use compression techniques more efficiently, this thesis studies compressed cacheline size information and proposes size-aware cache management techniques, as well as hot-cacheline prediction for a dynamic early decompression technique. The thesis also attempts to mitigate the limitations of phase change memory (PCM), such as low write performance and limited long-term endurance. One promising approach is a hybrid memory architecture that combines dynamic random access memory (DRAM) and PCM, using the DRAM as an off-chip cache to obtain the best attributes of each technology. For such DRAM/PCM hybrid environments, a dual-phase compression technique is proposed for high performance, and a multi-faceted wear-leveling technique is proposed for the long-term endurance of compressed PCM. Finally, the thesis includes a new compression-based management technique for hybrid multi-level cell (MLC)/single-level cell (SLC) PCM that aims to combine the performance edge of SLC with the higher capacity of MLC.
Subjects: Memory systems; Cache compression; Cache replacement; Hybrid DRAM/PCM; Data compression (Computer science); High performance computing; Computer storage devices; Cache memory
124	iSCSI-based storage area network for disaster. Murphy, Matthew R.; Harvey, Bruce A. January 2005.
Thesis (M.S.)--Florida State University, 2005. Advisor: Dr. Bruce A. Harvey, Florida State University, College of Engineering, Dept. of Electrical and Computer Engineering. Title and description from dissertation home page (viewed June 10, 2005). Document formatted into pages; contains vii, 73 pages. Includes bibliographical references.
125	Switch-based Fast Fourier Transform processor. Mohd, Bassam Jamil, 1968-. 05 October 2012.
The demand for high-performance, power-scalable DSP processors for telecommunication and portable devices has increased significantly in recent years, and Fast Fourier Transform (FFT) computation is essential to such designs. This work presents a switch-based architecture for designing radix-2 FFT processors. The processor employs M processing elements, 2M memory arrays, and M read-only memories (ROMs). Each processing element performs one radix-2 butterfly operation. The memory arrays are single-port memories, each of size N/(2M), where N is the number of FFT points. Compared with a single processing element, this approach provides a speedup of M. Memory collisions, if not addressed, degrade processor performance, so a novel algorithm to detect and resolve them is presented: when a collision is detected, a memory management operation is executed. Performance can be further enhanced by pipelining the design, with each pipeline stage employing a switch component; the result is a speedup of M log2 N over a single processing element. Using single-port memory reduces design complexity and area. Furthermore, memory arrays consume significantly less power than the delay elements used in some FFT processors. The switch-based architecture also facilitates deactivating processing elements for power scalability and implementing different FFT sizes. The VLSI implementation of a non-pipelined switch-based processor is presented, and Matlab simulations are used to analyze its performance. Timing, power, and area results from RTL, synthesis, and layout simulations are discussed and compared with those of other processors.
Subjects: Microprocessors--Design and construction; Computer architecture; Memory management (Computer science); Computer storage devices
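The radix-2 butterfly each processing element performs, and the arithmetic behind the speedup of M, can be sketched in Python. This is a plain software FFT, not the switch-based architecture or its collision-resolution algorithm. `butterfly`, `bit_reverse`, `fft_radix2`, and `ideal_cycles` are illustrative names, and `ideal_cycles` assumes each of the M processing elements completes one butterfly per cycle with no memory collisions.

```python
import cmath


def butterfly(a, b, w):
    """One radix-2 butterfly: the unit of work one processing element performs."""
    t = w * b
    return a + t, a - t


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i (input reordering for decimation in time)."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fft_radix2(x):
    """Iterative radix-2 decimation-in-time FFT; returns (spectrum, butterflies performed)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("FFT length must be a power of two")
    stages = n.bit_length() - 1  # log2(n)
    data = [complex(x[bit_reverse(i, stages)]) for i in range(n)]
    count = 0
    size = 2
    while size <= n:  # one pass per stage
        half = size // 2
        for start in range(0, n, size):
            for k in range(half):
                # Twiddle factor (in hardware, typically read from a ROM).
                w = cmath.exp(-2j * cmath.pi * k / size)
                i, j = start + k, start + k + half
                data[i], data[j] = butterfly(data[i], data[j], w)
                count += 1
        size *= 2
    return data, count


def ideal_cycles(n, m):
    """Cycles if m processing elements each finish one butterfly per cycle, with no collisions."""
    butterflies = (n // 2) * (n.bit_length() - 1)
    return -(-butterflies // m)  # ceiling division
```

An N-point radix-2 FFT performs (N/2) log2 N butterflies, so N = 8 needs 12. Spreading them over M = 4 ideal processing elements gives 3 cycles, which is the speedup of M stated in the abstract.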
126	Toward a brain-like memory with recurrent neural networks. Salihoglu, Utku. 12 November 2009.
For the last twenty years, several assumptions have been put forward in the fields of information processing, neurophysiology, and cognitive science. First, neural networks and their dynamical behavior, in terms of attractors, are the natural way the brain encodes information. Any information item to be stored in the neural network should be coded in one of the network's dynamical attractors and retrieved by stimulating the network so that its dynamics are trapped in the desired item's basin of attraction. The second view shared by neural network researchers is that learning of the synaptic matrix should rest on a local Hebbian mechanism. The third assumption is the presence of chaos and the benefit gained from it. Chaos, although very simply produced, inherently possesses an infinite number of cyclic regimes that can be exploited for coding information. Moreover, the network wanders randomly and spontaneously among these unstable regimes, rapidly proposing alternative responses to external stimuli and easily switching from one potential attractor to another in response to any incoming stimulus. Finally, since their introduction sixty years ago, cell assemblies have proved to be a powerful paradigm for brain information processing. After their introduction in artificial intelligence, cell assemblies became commonly used in computational neuroscience as a neural substrate for content-addressable memories.
Based on these assumptions, this thesis provides a computer simulation model of a brain-like memory built from neural networks. It first shows experimentally that the more information is stored in robust cyclic attractors, the more chaos appears as a background regime, erratically itinerating among brief appearances of these attractors. Chaos does not appear to be the cause but the consequence of the learning; however, it is a helpful consequence that widens the network's encoding capacity. To learn the information to be stored, two supervised iterative Hebbian learning algorithms are proposed. One leaves the semantics of the attractors to be associated with the input data unprescribed, while the other defines them a priori. Both algorithms show good results, although the first is more robust and has a greater storage capacity. Building on these results, a biologically plausible alternative to these algorithms is proposed that uses cell assemblies as the substrate for information. Although using cell assemblies this way is not new, the mechanisms underlying their formation are poorly understood, and so far no biologically plausible algorithm explains how external stimuli can be stored online in cell assemblies. This thesis provides such a solution, combining fast Hebbian/anti-Hebbian learning of the network's recurrent connections to create new cell assemblies with a slower feedback signal that stabilizes the cell assemblies by learning the feedforward input connections. This last mechanism is inspired by the retroaxonal hypothesis.
Doctorate in Sciences.
Subjects: General computer science; Exact and natural sciences; Neural networks (Computer science); Neural computers; Computer storage devices; Artificial intelligence; Cell assembly; Brain; Hebbian; Chaos; Recurrent neural networks; Memory
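The idea of storing information as a cyclic attractor learned by a Hebbian rule can be illustrated with the classic one-shot asymmetric Hebbian rule for sequences, W = (1/N) Σ_μ ξ^(μ+1) (ξ^μ)^T. This textbook rule is swapped in for illustration; it is not the thesis's supervised iterative algorithms and produces no chaos. The pattern choice (mutually orthogonal rows of a Sylvester-Hadamard matrix, which remove crosstalk between stored patterns) and all function names are assumptions of this sketch.

```python
def hadamard_row(i, n):
    """Row i of the n x n Sylvester-Hadamard matrix; distinct rows are orthogonal +/-1 vectors."""
    return [(-1) ** bin(i & j).count("1") for j in range(n)]


def hebbian_cycle_weights(patterns):
    """Asymmetric Hebbian rule: each pattern is wired to drive its successor, closing a cycle."""
    n, p = len(patterns[0]), len(patterns)
    w = [[0.0] * n for _ in range(n)]
    for mu in range(p):
        pre, post = patterns[mu], patterns[(mu + 1) % p]
        for i in range(n):
            for j in range(n):
                w[i][j] += post[i] * pre[j] / n  # local: depends only on neurons i and j
    return w


def step(w, s):
    """Synchronous update s(t+1) = sign(W s(t)), with ties sent to +1."""
    return [1 if sum(wij * sj for wij, sj in zip(row, s)) >= 0 else -1 for row in w]
```

Starting from a cue with one corrupted neuron, the network falls into the stored cycle after a single update and then visits the three patterns in order, which is the basin-of-attraction retrieval described in the abstract.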
127	Study of the effectiveness of memory sharing mechanisms in hypervisors (Estudo da efetividade dos mecanismos de compartilhamento de memória em hipervisores). Veiga, Fellipe Medeiros. 28 August 2015.
The growing demand for large-scale virtualization environments, such as those used in datacenters and cloud computing, makes efficient management of computing resources necessary. RAM is one of the most demanded resources in these environments and is usually the main factor limiting the number of virtual machines that can run on the same physical host. Recently, hypervisors have introduced mechanisms for transparently sharing RAM between virtual machines in order to reduce total system memory demand. These mechanisms "merge" identical pages found in multiple virtual machines into the same physical memory frame, using a copy-on-write approach that is transparent to the guest systems. The objective of this study is to present an overview of these mechanisms and to evaluate their performance and effectiveness. Results of experiments with two popular hypervisors (VMware and KVM), using different guest operating systems (Linux and Windows) and different workloads (synthetic and real), are presented. The results show significant performance differences between the hypervisors as a function of the guest systems, the workloads, and time.
Subjects: Memory management (Computer science); Computer storage devices; Cloud computing; Virtual computer systems; Operating systems (Computers); VMware (Software); Computer science
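Transparent page sharing as described in the abstract (identical pages merged into one physical frame, with copy-on-write when a guest modifies a shared page) can be modeled in a few lines of Python. This is a toy model with invented names (`SharedMemory`, `load`, `scan_and_merge`, `write`, `read`). Real implementations, such as KSM used by KVM and VMware's page sharing, scan memory incrementally and use hashing or content-ordered search structures to find candidates, rather than the single dictionary pass used here.

```python
class SharedMemory:
    """Toy host memory: identical guest pages share one frame; writes trigger copy-on-write."""

    def __init__(self):
        self.frames = {}    # frame id -> page contents
        self.refcount = {}  # frame id -> number of guest pages mapped to it
        self.mapping = {}   # (vm, guest page number) -> frame id
        self.next_frame = 0

    def _new_frame(self, data):
        fid = self.next_frame
        self.next_frame += 1
        self.frames[fid] = bytes(data)
        self.refcount[fid] = 0
        return fid

    def _map(self, key, fid):
        self.mapping[key] = fid
        self.refcount[fid] += 1

    def _unmap(self, key):
        fid = self.mapping.pop(key)
        self.refcount[fid] -= 1
        if self.refcount[fid] == 0:  # last user gone: reclaim the frame
            del self.frames[fid]
            del self.refcount[fid]

    def load(self, vm, gpn, data):
        """A guest writes a page before any scanning: it gets a private frame."""
        if (vm, gpn) in self.mapping:
            self._unmap((vm, gpn))
        self._map((vm, gpn), self._new_frame(data))

    def scan_and_merge(self):
        """One scanner pass: pages with byte-identical contents are remapped to one frame."""
        canonical = {}  # page contents -> first frame seen with those contents
        for key in sorted(self.mapping):
            fid = self.mapping[key]
            target = canonical.setdefault(self.frames[fid], fid)
            if target != fid:
                self._unmap(key)
                self._map(key, target)

    def write(self, vm, gpn, data):
        """Copy-on-write: a shared frame is never modified in place."""
        key = (vm, gpn)
        fid = self.mapping[key]
        if self.refcount[fid] > 1:
            self._unmap(key)
            self._map(key, self._new_frame(data))
        else:
            self.frames[fid] = bytes(data)

    def read(self, vm, gpn):
        return self.frames[self.mapping[(vm, gpn)]]
```

The number of live frames (`len(mem.frames)`) shows the saving: two VMs holding the same zero page use one frame after a scan, and a later write by either VM quietly gives it a private copy again.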
128	Matching platform for an automatic biometric identification system: a scalable implementation approach (Central de confrontos para um sistema automático de identificação biométrica: uma abordagem de implementação escalável). Nishibe, Caio Arce. 19 October 2017.
With the popularization of biometrics, determining an individual's identity is an increasingly common activity in several contexts: physical and logical access control, border control, criminal and forensic identification, and payments. Thus, there is a growing demand for fast and accurate Automatic Biometric Identification Systems (ABIS) that can handle a large volume of biometric data. This work presents an approach to implementing a scalable, cluster-based matching platform for a large-scale ABIS using an in-memory computing framework. Experiments were conducted on a real database of more than 50 million fingerprints, on a cluster of up to 16 nodes. The results showed the scalability of the proposed solution and its capability to handle large biometric databases.
Subjects: Biometry; Biometric identification; High performance computing; Big data; Anthropometry; Fingerprints; Computer storage devices; Electric engineering
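The abstract does not describe the matching algorithm or name the in-memory framework. One common way such a platform scales is to partition the gallery across nodes, rank candidates locally, and merge the per-node results. This is sketched below in plain Python, with each list standing in for one node's in-memory partition. The `similarity` score (a count of shared features) is a hypothetical placeholder for a real fingerprint matcher, and all names are illustrative.

```python
import heapq


def similarity(probe, template):
    """Hypothetical matcher score: number of shared features (real minutiae matchers are far more complex)."""
    return len(probe & template)


def shard(gallery, num_nodes):
    """Round-robin partition of (id, template) records across cluster nodes."""
    shards = [[] for _ in range(num_nodes)]
    for idx, record in enumerate(gallery):
        shards[idx % num_nodes].append(record)
    return shards


def local_top_k(shard_records, probe, k):
    """Work done independently on each node: score its partition, keep the k best (score, id) pairs."""
    scored = ((similarity(probe, tpl), rid) for rid, tpl in shard_records)
    return heapq.nlargest(k, scored)


def identify(shards, probe, k):
    """Gather step: merge per-node candidate lists into the global top k."""
    partials = [local_top_k(s, probe, k) for s in shards]
    return heapq.nlargest(k, (cand for part in partials for cand in part))
```

Because each node returns only its best k candidates, the merge step handles at most k times the number of nodes records regardless of gallery size, and the result is exactly the top k a single-node scan would return.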
