About
The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Virtual memory systems using magnetic bubble memory

Griffiths, R. B. January 1985 (has links)
No description available.
2

Estudo e implementação da otimização de Preload de dados usando o processador XScale / Study and implementation of data Preload optimization using XScale

Oliveira, Marcio Rodrigo de 08 October 2005 (has links)
Advisor: Guido Costa Souza Araujo / Master's dissertation - Universidade Estadual de Campinas, Instituto de Computação / Abstract: There is today a large market for embedded-systems applications, in consumer-electronics products such as cellular phones, palmtops, and electronic organizers. Consumer electronics are designed under stringent constraints: reduced cost, low power consumption, and often high performance. The code the compiler produces for programs running on these products must therefore execute quickly while saving battery energy; such improvements are achieved through source-program transformations known as code optimizations. Data preload consists of moving data from a higher level of the memory hierarchy to a lower level before the data is actually needed, thereby reducing the memory-latency penalty. This dissertation describes the implementation of data preload optimization in the Xingo compiler for the Pocket PC platform, whose architecture is based on an XScale processor. The XScale architecture provides a preload instruction whose purpose is to prefetch data into the cache. The optimization uses heuristics to insert preload instructions into the program's intermediate code, predicting which data will be used and would miss in the cache, and bringing it into the cache before use. This strategy aims to minimize the data-cache miss rate and thus the time spent on memory accesses. Several well-known benchmark suites, notably DSPstone and MiBench, were used to evaluate the results. The results show that this data preload optimization for the Pocket PC yields a considerable performance increase for most of the programs tested; several programs improved by more than 30%. / Master's degree / Code Optimization / Master in Computer Science
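The mechanics of this kind of optimization, issuing a preload far enough ahead of use to cover the miss latency, can be sketched as follows. This is a minimal, hypothetical illustration of the lookahead computation, not the Xingo pass itself; all cycle counts and names are invented parameters.

```python
import math

def prefetch_distance(miss_latency_cycles, loop_body_cycles):
    # Iterations of lookahead needed so a preload issued now
    # completes before the data is used (classic prefetch heuristic).
    return math.ceil(miss_latency_cycles / loop_body_cycles)

def preload_points(trip_count, stride, line_size=32,
                   miss_latency=80, body_cycles=12):
    """Yield (iteration, byte offset) pairs at which a preload would be
    emitted for a streaming access a[i] with the given byte stride,
    issuing one preload per cache line."""
    d = prefetch_distance(miss_latency, body_cycles)
    step = max(1, line_size // stride)      # iterations covered per cache line
    for i in range(0, max(0, trip_count - d), step):
        yield i, (i + d) * stride           # preload the line iteration i+d needs

# e.g. a 4-byte-stride loop of 100 iterations: preload every 8th iteration,
# 7 iterations ahead (ceil(80 / 12) == 7)
print(list(preload_points(100, 4))[:3])     # -> [(0, 28), (8, 60), (16, 92)]
```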
3

A Novel Gate Controlled Metal Oxide Resistive Memory Cell and its Applications

Herrmann, Eric January 2018 (has links)
No description available.
4

Um estudo comparativo em memórias associativas com ênfase em memórias associativas morfológicas / A comparative study on associative memories with emphasis on morphological associative memories

Mesquita, Marcos Eduardo Ribeiro do Valle, 1979- 24 August 2005 (has links)
Advisor: Peter Sussner / Master's dissertation - Universidade Estadual de Campinas, Instituto de Matemática, Estatística e Computação Científica / Abstract: Associative neural memories are models of biological phenomena that allow for the storage of pattern associations and the retrieval of the desired output pattern upon presentation of a possibly noisy or incomplete version of an input pattern. Several models of neural associative memories exist in the literature, yet few works compare them. In this thesis, we present a systematic comparison of the performance of some of the most widely known models of neural associative memories. The comparison is based on the following criteria: storage capacity, distribution of the information over the synaptic weights, radius of the basin of attraction, number of spurious memories, and computational effort. The thesis places special emphasis on morphological associative memories, whose mathematical foundations lie in mathematical morphology and image algebra. / Master's degree / Applied Mathematics / Master in Applied Mathematics
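For readers unfamiliar with the morphological models emphasized here, a minimal sketch of the autoassociative memory W_XX in the style of Ritter and Sussner: storage takes a componentwise minimum of outer differences, and recall is a max-plus product. This toy version assumes real-valued patterns and is not code from the thesis.

```python
import numpy as np

def store(patterns):
    # w_ij = min over stored patterns x^k of (x_i^k - x_j^k)
    X = np.asarray(patterns, dtype=float).T        # one pattern per column
    W = np.full((X.shape[0], X.shape[0]), np.inf)
    for k in range(X.shape[1]):
        W = np.minimum(W, X[:, k][:, None] - X[:, k][None, :])
    return W

def recall(W, x):
    # max-plus product: y_i = max over j of (w_ij + x_j)
    return np.max(W + np.asarray(x, dtype=float)[None, :], axis=1)

patterns = [[1, 0, 1, 0], [0, 1, 1, 0]]
W = store(patterns)
for x in patterns:                                 # stored patterns are
    assert np.allclose(recall(W, x), x)            # recalled perfectly
```

The autoassociative W_XX recalls every stored pattern perfectly regardless of how many are stored, which is one reason storage capacity is a key comparison criterion above; its reported tolerance to erosive noise is what the basin-of-attraction criterion probes.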
5

Power-Aware Compilation Techniques For Embedded Systems

Shyam, K 07 1900 (has links)
The demand for devices like Personal Digital Assistants (PDAs), laptops, and smart mobile phones is at an all-time high. As the demand for these devices increases, so does the push to provide sophisticated functionality in them. However, energy consumption has become a major constraint on providing increased functionality, and a majority of the applications meant for these devices are rich in multimedia content. In this thesis, we propose two approaches for compiler-directed energy reduction, one targeting the memory subsystem and the other the processor.

The first technique is a compiler-directed optimization that reduces the energy consumption of the memory subsystem for an off-chip partitioned memory architecture with multiple memory banks and various low-power operating modes for each bank. We propose an efficient layout of the data segment that reduces the number of simultaneously active memory banks, so that the inactive banks can be put into low-power modes to save energy. We model this problem as a graph partitioning problem and use well-known heuristics to solve it. We also propose a simple Integer Linear Programming (ILP) formulation of the problem. Performance results indicate that our approach achieves an energy reduction of 20% compared to the base scheme and a reduction of 8%-10% over a previously suggested method; our results are also well within range of the optimal results obtained with the ILP method.

The second approach reduces the dynamic energy consumed by the processor using dynamic voltage and frequency scaling. Earlier work on dynamic voltage scaling focused mainly on scaling the voltage when the CPU is waiting for the memory subsystem, or concentrated chiefly on loop nests and/or subroutine calls with a sufficient number of dynamic instructions. We concentrate on coarser program regions and, for the first time, use program phase behavior to drive dynamic voltage scaling. We relate the dynamic voltage scaling problem to the Multiple Choice Knapsack Problem and use well-known heuristics to solve it efficiently; we also develop a simple ILP formulation for this problem. Experimental evaluation on a set of media applications reveals that our heuristic method obtains a 35-40% reduction in energy consumption on average, with negligible performance degradation, and the energy consumed by our heuristic solution is within 1% of the optimal solution obtained by the ILP approach.
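The reduction of voltage scaling to the Multiple Choice Knapsack Problem is easy to make concrete: each program region must run at exactly one voltage/frequency level, each level carries a (time, energy) cost, and total time is bounded by a deadline. The small dynamic program below illustrates the mapping with invented numbers; it is not the thesis's heuristic or ILP formulation.

```python
def min_energy_schedule(regions, deadline):
    # regions: one list of (time, energy) options per program region,
    # each option corresponding to a voltage/frequency level.
    # Pick exactly one option per region, minimizing total energy
    # subject to total time <= deadline (a Multiple Choice Knapsack).
    INF = float("inf")
    best = [0.0] + [INF] * deadline      # best[t]: min energy at total time t
    for options in regions:
        nxt = [INF] * (deadline + 1)
        for t, e in enumerate(best):
            if e == INF:
                continue
            for dt, de in options:
                if t + dt <= deadline:
                    nxt[t + dt] = min(nxt[t + dt], e + de)
        best = nxt
    return min(best)

# two regions, three frequency levels each, as (time units, energy units)
regions = [[(4, 9), (6, 5), (9, 3)],
           [(3, 8), (5, 4), (8, 2)]]
print(min_energy_schedule(regions, deadline=12))   # -> 9.0 (mid level twice)
```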
6

The use of memory state knowledge to improve computer memory system organization

Isen, Ciji 01 June 2011 (has links)
The trends in virtualization as well as multi-core, multiprocessor environments have translated into a massive increase in the amount of main memory each individual system must be fitted with to effectively utilize this growing compute capacity. The increasing demand on main memory implies that main memory devices and their issues are as important a part of system design as the central processors. The primary issues of modern memory are power, energy, and scaling of capacity: nearly a third of system power and energy can come from the memory subsystem, and modern main memory devices are limited by technology in their ability to scale and keep pace with modern program demands, requiring exploration of alternatives to current main memory storage technology.

This dissertation exploits dynamic knowledge of memory state and memory data values to improve memory performance and reduce memory energy consumption. It proposes a cross-boundary approach that communicates dynamic memory-management state (allocated and deallocated memory) between software and the hardware memory subsystem through a combination of ISA support and hardware structures. These mechanisms identify memory operations on regions that have no impact on the correct execution of the program because the regions were either freshly allocated or deallocated: data in deallocated regions is no longer useful to the program, and data in freshly allocated memory is not yet useful because the program has not defined it. Being cognizant of this, such memory operations can be avoided, saving energy and improving the usefulness of main memory. Furthermore, when stores write zeros to memory, the number of stores to memory is reduced by capturing them as compressed information stored alongside the memory-management state. Using these methods, this dissertation harnesses memory-management state and data-value information to achieve significant savings in energy consumption while extending the endurance limit of memory technologies.
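The idea can be made concrete with a toy model: the memory system is told which lines are freshly allocated (contents undefined) or deallocated (contents dead), and all-zero stores are recorded as a flag rather than written out. The interface below is hypothetical; the dissertation realizes this with ISA support and hardware structures.

```python
FRESH, VALID, DEAD = "fresh", "valid", "dead"

class MemoryStateTracker:
    """Toy model of pushing allocation state down to the memory system
    so that useless traffic can be skipped. Illustrative only."""

    def __init__(self):
        self.state = {}          # cache-line address -> FRESH/VALID/DEAD
        self.zero_lines = set()  # lines compressed to an all-zero flag

    def on_alloc(self, lines):   # e.g. signalled by malloc via ISA hints
        for ln in lines:
            self.state[ln] = FRESH
            self.zero_lines.discard(ln)

    def on_free(self, lines):    # signalled by free
        for ln in lines:
            self.state[ln] = DEAD

    def load(self, line):
        # Fresh lines hold undefined data: no need to fetch from DRAM.
        if self.state.get(line) == FRESH:
            return "fetch elided (undefined contents)"
        if line in self.zero_lines:
            return "synthesized zeros (no DRAM access)"
        return "normal fetch"

    def store(self, line, all_zero):
        if self.state.get(line) == DEAD:
            return "write elided (dead data)"
        self.state[line] = VALID
        if all_zero:
            self.zero_lines.add(line)
            return "compressed as zero flag"
        self.zero_lines.discard(line)
        return "normal write"
```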
7

Caracterização de memórias analógicas implementadas com transistores MOS floating gate / Characterization of analog memories implemented with floating gate MOS transistors

Couto, Andre Luis do 28 November 2005 (has links)
Advisor: Carlos Alberto dos Reis Filho / Master's dissertation - Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação / Abstract: Monolithic integration of memories and analog circuits in the same die offers several advantages: smaller application boards, higher reliability, and lower cost. Such integration is practical only if it dispenses with memory-specific fabrication technology and uses conventional CMOS technology alone, and it becomes more efficient the greater the data-storage density. Analog memories are well suited here, since a single cell (one or two transistors) can store data that would require several digital memory cells and hence more area. In this work, memory cells based on floating gate MOS transistors were fabricated in a conventional CMOS technology and shown to be viable, and characterization results covering programming methods, data retention, and endurance were obtained. The work presents the main characteristics of FGMOS (Floating Gate MOS) devices, demonstrates that floating gate MOS analog memories can be fabricated and programmed, and serves as a starting point and reference for future work in the area. / Master's degree / Electronics, Microelectronics and Optoelectronics / Master in Electrical Engineering
8

MIST : MIgrate The STorage Too

Kamala, R 07 1900 (has links) (PDF)
We address the problem of migrating the local storage of desktop users to remote sites. Assuming a network connection is maintained between the source and destination after migration, it is possible to transfer only a fraction of the storage state while operating as close to disconnected mode as possible. We have designed an approach that determines the subset of storage state to transfer based on past accesses. We show that it is feasible to use information about accessed files to determine clusters and hot-spots in the file system. Using the tree structure of the file system and applying an appropriate similarity measure to user accesses, we can approximate the working sets of the data accessed by the applications running at the time. Our results indicate that our technique reduces the amount of data to be copied by two orders of magnitude, bringing it into the realm of the possible.
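As a rough illustration of selecting which storage state to transfer first, one can rank directory subtrees by recent access counts and pre-copy the hottest ones within a transfer budget. This greedy sketch is a stand-in for the thesis's similarity-based clustering over the file-system tree; all names and numbers are invented.

```python
from collections import Counter
import posixpath

def hot_subtrees(accessed_paths, subtree_size, budget_bytes):
    """Rank directories by how many recent accesses fall under them,
    then choose the hottest directories that fit in a byte budget."""
    heat = Counter(posixpath.dirname(p) for p in accessed_paths)
    chosen, used = [], 0
    for directory, _hits in heat.most_common():
        size = subtree_size.get(directory, 0)
        if used + size <= budget_bytes:
            chosen.append(directory)
            used += size
    return chosen

log = ["/home/u/src/a.c", "/home/u/src/b.c", "/home/u/mail/inbox"]
print(hot_subtrees(log, {"/home/u/src": 2_000, "/home/u/mail": 9_000}, 5_000))
# -> ['/home/u/src']  (src is hotter and fits; mail would blow the budget)
```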
9

Optimizations In Storage Area Networks And Direct Attached Storage

Dharmadeep, M C 02 1900 (has links)
The thesis consists of three parts. In the first part, we introduce the notion of device-cache-aware schedulers. Modern disk subsystems have many megabytes of memory for purposes such as prefetching and caching, yet current disk scheduling algorithms make decisions oblivious of the underlying device cache algorithms. We propose a scheduler architecture that is aware of the underlying device cache and describe how the device cache parameters can be automatically deduced and incorporated into the scheduling algorithm. We consider only adaptive caching algorithms, as modern high-end disk subsystems are configured to use such algorithms by default. We implemented a prototype based on the Linux anticipatory scheduler and observed, compared with the anticipatory scheduler, up to 3 times improvement in query execution times with the Benchw benchmark and up to 10 percent improvement with the Postmark benchmark.

The second part deals with implementing cooperative caching for the Redhat Global File System (GFS), a clustered shared disk file system. Coordination between multiple accesses goes through a lock manager: on a read, a lock on the inode is acquired in shared mode and the data is read from disk; for a write, an exclusive lock on the inode is acquired and data is written to disk, which requires all nodes holding the lock to write their dirty buffers/pages to disk and invalidate all related buffers/pages. A DLM (Distributed Lock Manager) is a module that implements the functions of a lock manager; GFS's DLM has some support for range locks, although GFS does not use it. While data sourced from a memory copy is likely to have lower latency, GFS currently reads from the shared disk after acquiring a lock (just as in other designs such as IBM's GPFS) rather than from remote memory that recently held the correct contents. The difficulties are mainly due to the circular relationships that can arise between GFS and the generic DLM architecture when integrating the DLM locking framework with cooperative caching: for example, the page/buffer cache should be accessible from the DLM, yet the DLM's generality has to be preserved, and the symmetric nature of the DLM (including its SMP concurrency model) makes it even harder to understand and integrate cooperative caching into it (note that GPFS has an asymmetrical design). In this thesis, we describe the design of a cooperative caching scheme in GFS and, to make it more effective, introduce changes to the locking protocol and the DLM to handle range locks more efficiently. Experiments with micro-benchmarks on our prototype implementation reveal that reading from a remote node over gigabit Ethernet can be up to 8 times faster than reading from an enterprise-class SCSI disk for random disk reads. Our contributions are an integrated design for cooperative caching and the lock manager in GFS, a novel method for interval searches, and a determination of when sequential reads from remote memory perform better than sequential reads from disk.

The third part deals with selecting a primary network partition in a clustered shared disk system when node or network failures occur. Clustered shared disk file systems like GFS and GPFS use methods that can fail in the case of multiple network partitions and also in the case of a two-node cluster.
In this thesis, we give an algorithm for fault-tolerant proactive leader election in asynchronous shared memory systems, together with its formal verification. Roughly speaking, a leader election algorithm is proactive if it can tolerate failure of nodes even after a leader is elected, and (stable) leader election happens periodically. This is needed in systems where a leader is required after every failure to ensure the availability of the system and where there may be no explicit events, such as messages, in the (shared memory) system; previous algorithms like DiskPaxos are not proactive. In our model, individual nodes can fail and reincarnate at any point in time. Each node has a counter that is incremented every period, the period being the same across all nodes (modulo a maximum drift), and different nodes can be in different epochs at the same time. Our algorithm ensures that there can be at most one leader per epoch, so if the counter values of some set of nodes match, there can be at most one leader among them. If the nodes satisfy certain timeliness constraints, then the leader for the epoch with the highest counter also becomes the leader for the next epoch (the stability property). Our algorithm uses shared memory proportional to the number of processes, which is the best possible. We also show how our protocol can be used in clustered shared disk systems to select a primary network partition. We have represented our protocol with the state machine approach in the Isabelle HOL logic system and proved the safety property of the protocol.
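The "at most one leader per epoch" invariant can be illustrated with a toy model. Note that the thesis's algorithm works with only read/write shared registers and is formally verified in Isabelle HOL; the atomic per-epoch claim below is a deliberate simplification standing in for that protocol, not a rendering of it.

```python
import threading

class EpochLeaderElection:
    """Simplified illustration of 'at most one leader per epoch'.
    A lock models an atomic shared-memory step; the real protocol
    achieves this with plain read/write registers."""

    def __init__(self):
        self._lock = threading.Lock()
        self._leader = {}            # epoch -> node id that claimed it

    def try_lead(self, epoch, node_id):
        with self._lock:
            if epoch not in self._leader:
                self._leader[epoch] = node_id
            return self._leader[epoch] == node_id

election = EpochLeaderElection()
assert election.try_lead(epoch=7, node_id="A") is True
assert election.try_lead(epoch=7, node_id="B") is False  # one leader per epoch
assert election.try_lead(epoch=8, node_id="B") is True   # new epoch, new election
```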
10

Automatic Data Allocation, Buffer Management And Data Movement For Multi-GPU Machines

Ramashekar, Thejas 10 1900 (has links) (PDF)
Multi-GPU machines are being increasingly used in high performance computing, both as standalone workstations running computations on medium to large data sizes (tens of gigabytes) and as nodes in CPU-multi-GPU clusters handling very large data sizes (hundreds of gigabytes to a few terabytes). Each GPU in such a machine has its own memory and does not share an address space with either the host CPU or the other GPUs; hence, applications utilizing multiple GPUs have to manually allocate and manage data on each GPU. A significant body of scientific applications that utilize multi-GPU machines contain computations inside affine loop nests, i.e., loop nests that have affine bounds and affine array access functions. These include stencils, linear-algebra kernels, dynamic programming codes, and data-mining applications. Data allocation, buffer management, and coherency handling are critical steps in running affine applications on multi-GPU machines. Existing works that propose to automate these steps have limitations and inefficiencies in terms of allocation sizes, exploiting reuse, transfer costs, and scalability; an automatic multi-GPU memory manager that overcomes these limitations and enables applications to achieve scalable performance is highly desired.

One technique that has been used in certain memory management contexts in the literature is that of bounding boxes. The bounding box of an array, for a given tile, is the smallest hyper-rectangle that encapsulates all the array elements accessed by that tile. In this thesis, we exploit the potential of bounding boxes for memory management far beyond their current usage in the literature: we propose a scalable and fully automatic data allocation and buffer management scheme for affine loop nests on multi-GPU machines, which we call the Bounding Box based Memory Manager (BBMM). BBMM is a compiler-assisted runtime memory manager. At compile time, it uses static analysis techniques to identify the set of bounding boxes accessed by a computation tile. At run time, it uses set operations on bounding boxes, such as union, intersection, difference, and subset/superset tests, to compute a set of disjoint bounding boxes from those identified at compile time. It also exploits the architectural capability of GPUs to perform fast transfers of rectangular (strided) regions of memory, and hence performs all data transfers in terms of bounding boxes. These techniques let BBMM automatically allocate and manage the data required by applications (suitably tiled and parallelized for GPUs). This allows it to (1) allocate only as much data as is required (or close to it) by the computations running on each GPU, (2) efficiently track buffer allocations and hence maximize data reuse across tiles and minimize data transfer overhead, and (3) as a result, enable applications to maximize the utilization of the combined memory on multi-GPU machines. BBMM can work with any choice of parallelizing transformations, computation placement, and scheduling schemes, whether static or dynamic. Experiments run on a system with four GPUs with various scientific programs showed that BBMM reduces data allocations on each GPU by up to 75% compared to current allocation schemes, yields at least 88% of the performance of hand-optimized OpenCL codes, and allows excellent weak scaling.
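A minimal sketch of the bounding-box machinery BBMM builds on: an axis-aligned hyper-rectangle with per-dimension bounds, an intersection operation, and a per-tile bounding box for an illustrative stencil access function. The stencil, tile size, and class interface are invented for illustration; they are not BBMM's implementation.

```python
class Box:
    # Axis-aligned hyper-rectangle: one inclusive (lo, hi) pair per
    # dimension; a toy version of the regions BBMM computes per tile.
    def __init__(self, bounds):
        self.bounds = list(bounds)

    def intersect(self, other):
        b = [(max(alo, blo), min(ahi, bhi))
             for (alo, ahi), (blo, bhi) in zip(self.bounds, other.bounds)]
        return Box(b) if all(lo <= hi for lo, hi in b) else None

    def elements(self):
        n = 1
        for lo, hi in self.bounds:
            n *= hi - lo + 1
        return n

def tile_bbox(ti, tj, T):
    # Bounding box read by a T x T tile of a 9-point stencil that
    # touches a[i-1..i+1][j-1..j+1] (an illustrative access function).
    return Box([(ti * T - 1, (ti + 1) * T),
                (tj * T - 1, (tj + 1) * T)])

a, b = tile_bbox(0, 0, 32), tile_bbox(0, 1, 32)
halo = a.intersect(b)                    # region shared by neighbouring tiles
print(halo.bounds, halo.elements())      # [(-1, 32), (31, 32)] 68
```

Tracking such overlaps is what lets a runtime manager reuse already-resident data between neighbouring tiles instead of re-transferring it, and the strided-transfer capability of GPUs means each box can be moved in one copy call.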
