1 |
A generic processing in memory cycle accurate simulator under hybrid memory cube architecture. Oliveira Junior, Geraldo Francisco de, January 2017 (has links)
PIM - a technique in which computational elements are added close to, or ideally inside, memory devices - was one of the approaches proposed during the 1990s to mitigate the notorious memory wall problem. Nowadays, with the maturation of 3D integration technologies, a new landscape of PIM architectures can be investigated. To explore this new scenario, researchers rely on software simulators to navigate the design exploration space. Today, most works targeting PIM implement in-house simulators to perform their experiments; this methodology hurts overall productivity and also hinders reproducibility. In this work, we present the development of a precise, modular, and parameterizable PIM simulation environment. Our simulator, named CLAPPS, targets the HMC architecture, a popular 3D-stacked memory widely employed in state-of-the-art PIM accelerators. We have designed our mechanism using the SystemC programming language, which natively supports parallel simulation. The primary contribution of our work lies in developing a user-friendly interface that allows easy exploration of PIM architectures. To evaluate our system, we have implemented a PIM module that can perform vector operations with different operand sizes using the proposed set of tools.
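To make the kind of component the abstract describes concrete, below is a minimal SystemC sketch of a PIM functional unit that adds two vectors, one element per clock edge. All module, port, and buffer names here are hypothetical illustrations of the general approach; they are not the actual CLAPPS interface.

```cpp
// Minimal SystemC sketch of a PIM vector unit attached to an HMC vault model.
// All names (PimVectorUnit, src_a, etc.) are hypothetical and do NOT reflect
// the actual CLAPPS interface described in the abstract.
#include <systemc.h>
#include <cstdint>
#include <vector>

SC_MODULE(PimVectorUnit) {
    sc_in<bool> clk;
    // Simplified: operands are staged in local buffers by the vault controller.
    std::vector<uint32_t> src_a, src_b, dst;

    void compute() {
        // One element per clock edge; a real cycle-accurate model would also
        // account for the DRAM timing inside the vault.
        if (idx < src_a.size()) {
            dst[idx] = src_a[idx] + src_b[idx];
            ++idx;
        }
    }

    SC_CTOR(PimVectorUnit) : idx(0) {
        SC_METHOD(compute);
        sensitive << clk.pos();
        dont_initialize();
    }

  private:
    std::size_t idx;
};

int sc_main(int, char*[]) {
    sc_clock clk("clk", 1, SC_NS);
    PimVectorUnit pim("pim");
    pim.clk(clk);
    pim.src_a = {1, 2, 3, 4};
    pim.src_b = {10, 20, 30, 40};
    pim.dst.resize(4);
    sc_start(10, SC_NS);   // advance the simulation a few cycles
    return 0;
}
```

In a full cycle-accurate model, the compute process would additionally be gated by the vault controller's timing, which is precisely the part a simulator like the one described above must model carefully.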
|
2 |
IMPROVING THE PERFORMANCE AND ENERGY EFFICIENCY OF EMERGING MEMORY SYSTEMS. Guo, Yuhua, 01 January 2018 (has links)
Modern main memory is primarily built from dynamic random access memory (DRAM) chips. As DRAM chips scale to higher densities, three main problems impede DRAM scalability and performance improvement. First, DRAM refresh overhead grows from negligible to severe, which limits DRAM scalability and causes performance degradation. Second, although memory capacity has increased dramatically in the past decade, memory bandwidth has not kept pace with CPU performance scaling, which has led to the memory wall problem. Third, DRAM dissipates considerable power and has been reported to account for as much as 40% of the total system energy, a problem that is exacerbated as DRAM scales up.
To address these problems, 1) we propose Rank-level Piggyback Caching (RPC) to alleviate DRAM refresh overhead by servicing memory requests and refresh operations in parallel; 2) we propose a high-performance, bandwidth-efficient approach, called SELF, that breaks the memory bandwidth wall by exploiting die-stacked DRAM as a part of memory; and 3) we propose a cost-effective and energy-efficient architecture for hybrid memory systems composed of high bandwidth memory (HBM) and phase change memory (PCM), called Dual Role HBM (DR-HBM). In DR-HBM, hot pages are tracked in a cost-effective way and migrated to the HBM to improve performance, while cold pages are stored in the PCM to save energy.
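As an illustration of the hot/cold placement idea behind DR-HBM, the following sketch shows a simple counter-based tracker that flags a PCM page for migration to HBM once its access count crosses a threshold. The class name, threshold policy, and epoch-based decay are assumptions made for illustration, not the dissertation's actual mechanism.

```cpp
// Illustrative sketch of counter-based hot-page tracking for an HBM/PCM
// hybrid memory. Policy, threshold, and names are assumptions, not DR-HBM's
// actual design.
#include <cstdint>
#include <unordered_map>
#include <unordered_set>

class HotPageTracker {
  public:
    explicit HotPageTracker(uint32_t hot_threshold) : threshold_(hot_threshold) {}

    // Called by the memory controller model on every access to a PCM page.
    // Returns true if the page just became "hot" and should migrate to HBM.
    bool on_access(uint64_t page_addr) {
        uint32_t count = ++access_count_[page_addr];
        if (count >= threshold_ && !in_hbm_.count(page_addr)) {
            in_hbm_.insert(page_addr);   // record the migration decision
            return true;
        }
        return false;
    }

    // Periodic decay keeps the tracker responsive to changing working sets.
    void decay_epoch() {
        for (auto& kv : access_count_) kv.second /= 2;
    }

  private:
    uint32_t threshold_;
    std::unordered_map<uint64_t, uint32_t> access_count_;
    std::unordered_set<uint64_t> in_hbm_;
};
```

The periodic decay is one common way to keep such a tracker responsive to shifting working sets without unbounded counter growth.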
|
3 |
SYSTEMS SUPPORT FOR DATA ANALYTICS BY EXPLOITING MODERN HARDWARE. Hongyu Miao (11751590), 03 December 2021 (has links)
A large volume of data is continuously being generated by data centers, humans, and the Internet of Things (IoT). To extract useful insights, such enormous data must be processed in time with high throughput, low latency, and high accuracy. To meet such performance demands, vendors are shipping a large body of new hardware, such as multi-core CPUs, 3D-stacked memory, embedded microcontrollers, and other accelerators.

However, traditional operating systems (OSes) and data analytics frameworks, the key layer that bridges high-level data processing applications and low-level hardware, fail to deliver these requirements because of quickly evolving hardware and the explosion of data. For instance, general OSes are not aware of the unique characteristics and demands of data processing applications. Data analytics engines for stream processing, e.g., Apache Spark and Beam, always add more machines to deal with more data but leave every single machine underutilized, without fully exploiting the underlying hardware features, which leads to poor efficiency. Data analytics frameworks for machine learning inference on IoT devices cannot run neural networks that exceed SRAM size, which disqualifies many important use cases.

To bridge the gap between the performance demands of data analytics and the features of emerging hardware, in this thesis we explore runtime system designs for high-level data processing applications that exploit low-level modern hardware features. We study two important data analytics applications, real-time stream processing and on-device machine learning inference, on three important hardware platforms across the Cloud and the Edge: multicore CPUs, a hybrid memory system combining 3D-stacked memory and general DRAM, and embedded microcontrollers with limited resources.

To speed up and enable these two data analytics applications on the three hardware platforms, this thesis contributes three related research projects. In StreamBox, we exploit the parallelism and memory hierarchy of modern multicore hardware on single machines for stream processing, achieving scalable and highly efficient performance. In StreamBox-HBM, we exploit hybrid memories to balance bandwidth and latency, achieving memory scalability and highly efficient performance. StreamBox and StreamBox-HBM both offer orders-of-magnitude performance improvements over the prior state of the art, opening up new applications with higher data processing needs. In SwapNN, we investigate a system solution that lets microcontrollers (MCUs) execute neural network (NN) inference out of core without losing accuracy, enabling new use cases and significantly expanding the scope of NN inference on tiny MCUs.

We report the system designs, system implementations, and experimental results. Based on our experience building the above systems, we provide general guidance on designing runtime systems across the hardware/software stack for a wider range of new applications on future hardware platforms.
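To illustrate the out-of-core idea behind SwapNN, the sketch below streams one layer's weights at a time from external storage into a small SRAM-sized buffer, so only the activations stay resident between layers. The storage interface, layer format, and int8 arithmetic are assumptions made for illustration, not SwapNN's actual implementation.

```cpp
// Sketch of out-of-core NN inference on a memory-constrained MCU: weights
// larger than SRAM are streamed layer by layer from external flash.
// The flash_read interface and layer format are hypothetical.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct LayerDesc {
    std::size_t weight_offset;  // byte offset of the layer's int8 weights in flash
    std::size_t in_dim, out_dim;
};

// Simulated external flash for the sketch; on a real MCU this would be a
// storage-driver call.
static std::vector<int8_t> g_flash;
void flash_read(std::size_t offset, int8_t* dst, std::size_t len) {
    std::copy(g_flash.begin() + offset, g_flash.begin() + offset + len, dst);
}

void infer(const std::vector<LayerDesc>& layers,
           std::vector<int32_t>& activations,      // small enough for SRAM
           int8_t* weight_buf, std::size_t buf_bytes) {
    for (const LayerDesc& l : layers) {
        std::size_t bytes = l.in_dim * l.out_dim;   // one int8 weight per edge
        if (bytes > buf_bytes) return;              // each layer must fit the buffer
        flash_read(l.weight_offset, weight_buf, bytes);

        std::vector<int32_t> out(l.out_dim, 0);
        for (std::size_t o = 0; o < l.out_dim; ++o)
            for (std::size_t i = 0; i < l.in_dim; ++i)
                out[o] += activations[i] * weight_buf[o * l.in_dim + i];
        activations = out;   // only activations persist between layers
    }
}
```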
|
4 |
Data Replication in Hybrid Memory Database Systems. Zarubin, Mikhail, 15 March 2022 (has links)
Recent advances in hardware technologies - i.e., highly scalable multi-core NUMA architectures and non-volatile random-access memory (NVRAM) - are leading to significant changes in the architecture of in-memory database systems. The novel memory type allows persistent writes while featuring DRAM-like characteristics: byte addressability, high bandwidth, and low access latencies. It is likely to complement or replace block-based secondary storage (e.g., HDDs or SSDs) for storing the primary data of the DBMS. Therefore, the next generation of highly performant, scalable database systems will rely on single-level hybrid memory NUMA architectures (e.g., composed exclusively of DRAM and NVRAM) and is expected to keep the primary data persistent solely in NVRAM, while query processing could be executed on both mediums. Unfortunately, NVRAM has certain drawbacks, such as lower write endurance, lower bandwidth, higher latencies, and - most importantly - an increased error-proneness compared to DRAM. Thus, efficient, minimal-overhead data protection mechanisms have to be deployed in the underlying architectures to avoid losses of primary data. This thesis provides an analytical overview of such envisioned hybrid memory database systems, surveys reliability techniques that are generally deployed in computing systems, and identifies their strengths and weaknesses when used in hybrid memory databases. As a result, this work proposes effective adoption and optimization primitives for software-managed data replication as the most applicable resilience approach. In particular, the research focuses on reducing the runtime and space overheads of replication (and, therefore, NVRAM wear-out), while preserving strong resilience guarantees and instant recovery opportunities. Subsequently, this thesis proposes a rich set of techniques that leverage data replication for query processing to achieve high performance, allocation flexibility, and effective hardware utilization on modern commodity scale-up systems.
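The following sketch illustrates software-managed replication in such a hybrid setting: the primary copy of a column lives in NVRAM and is explicitly flushed to persistence on every update, while a DRAM replica serves fast reads and can be rebuilt after a restart. The allocation and persistence details are simplified assumptions, not the thesis's actual primitives.

```cpp
// Minimal sketch of software-managed replication for a hybrid-memory column.
// nvram_primary is assumed to point into an NVRAM mapping; persistence is
// modeled with an explicit cache-line flush plus a store fence.
#include <immintrin.h>   // _mm_clflush, _mm_sfence
#include <cstdint>
#include <vector>

struct ReplicatedColumn {
    uint32_t* nvram_primary;            // persistent primary copy (NVRAM)
    std::vector<uint32_t> dram_replica; // volatile replica (DRAM)

    void update(std::size_t idx, uint32_t value) {
        nvram_primary[idx] = value;
        _mm_clflush(&nvram_primary[idx]);  // write the cache line back to NVRAM
        _mm_sfence();                      // order the flush before continuing
        dram_replica[idx] = value;         // refresh the replica after the persist
    }

    // Query processing may read from either medium; DRAM is usually faster.
    uint32_t read(std::size_t idx, bool prefer_dram = true) const {
        return prefer_dram ? dram_replica[idx] : nvram_primary[idx];
    }

    // Recovery path: rebuild the volatile replica from the persistent primary.
    void recover(std::size_t n) {
        dram_replica.assign(nvram_primary, nvram_primary + n);
    }
};
```

Serving reads from the DRAM replica while confining persistence traffic to NVRAM is one simple way to trade replica space for both performance and reduced NVRAM wear.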
|
5 |
SIMD-MIMD cocktail in a hybrid memory glass: shaken, not stirred. Zarubin, Mikhail; Damme, Patrick; Krause, Alexander; Habich, Dirk; Lehner, Wolfgang, 23 November 2021 (has links)
Hybrid memory systems consisting of DRAM and NVRAM offer a great opportunity for column-oriented data systems to persistently store and efficiently process columnar data entirely in main memory. While vectorization (SIMD) of query operators is the state of the art for increasing single-thread performance, it has to be combined with thread-level parallelism (MIMD) to satisfy the growing need for higher performance and scalability. However, it is not well investigated how such a SIMD-MIMD interplay could be leveraged efficiently in hybrid memory systems. On the one hand, we deliver an extensive experimental evaluation of typical workloads on columnar data in this paper. We reveal that the choice of the most performant SIMD version differs greatly between the two memory types. Moreover, we show that the throughput of concurrent queries can be boosted (up to 2x) by combining various SIMD flavors in a multi-threaded execution. On the other hand, to enable that optimization, we propose an adaptive SIMD-MIMD cocktail approach that incurs only negligible runtime overhead.
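The sketch below illustrates the SIMD-MIMD combination at a high level: concurrent scan threads each pick a SIMD flavor based on the memory type holding their partition. The dispatch rule used here (AVX2 on DRAM, scalar on NVRAM) is an illustrative assumption, not the measured optimum from the paper.

```cpp
// Sketch of a SIMD-MIMD column scan: one thread per partition (MIMD), with
// the SIMD flavor chosen per partition's memory medium. Assumes AVX2 support
// and partition sizes that are multiples of 8 for brevity.
#include <immintrin.h>
#include <cstdint>
#include <numeric>
#include <thread>
#include <utility>
#include <vector>

enum class Medium { DRAM, NVRAM };

int64_t sum_avx2(const int32_t* data, std::size_t n) {
    __m256i acc = _mm256_setzero_si256();
    for (std::size_t i = 0; i < n; i += 8)
        acc = _mm256_add_epi32(acc, _mm256_loadu_si256(
                  reinterpret_cast<const __m256i*>(data + i)));
    alignas(32) int32_t lanes[8];
    _mm256_store_si256(reinterpret_cast<__m256i*>(lanes), acc);
    return std::accumulate(lanes, lanes + 8, int64_t{0});
}

int64_t sum_scalar(const int32_t* data, std::size_t n) {
    return std::accumulate(data, data + n, int64_t{0});
}

// MIMD layer: each worker thread dispatches to a SIMD flavor by medium.
int64_t parallel_sum(const std::vector<std::pair<const int32_t*, std::size_t>>& parts,
                     const std::vector<Medium>& media) {
    std::vector<int64_t> partial(parts.size(), 0);
    std::vector<std::thread> workers;
    for (std::size_t p = 0; p < parts.size(); ++p)
        workers.emplace_back([&, p] {
            partial[p] = (media[p] == Medium::DRAM)
                             ? sum_avx2(parts[p].first, parts[p].second)
                             : sum_scalar(parts[p].first, parts[p].second);
        });
    for (auto& t : workers) t.join();
    return std::accumulate(partial.begin(), partial.end(), int64_t{0});
}
```

An adaptive variant, as the paper proposes, would select the per-partition flavor from runtime measurements rather than a fixed rule.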
|