21

Design and prototyping of Hardware-Accelerated Locality-aware Memory Compression

Srinivas, Raghavendra 09 September 2020 (has links)
Hardware acceleration is the most sought-after technique in chip design for achieving better performance and power efficiency in critical functions that are handled inefficiently by traditional OS/software. As technology advances, with 7 nm products already on the market offering better power and performance at low area, latency-critical functions traditionally handled by software have started moving into the chip as acceleration units. This thesis describes the accelerator architecture, implementation, and prototype for one such function, namely "Locality-Aware Memory Compression," which is part of the "OS-controlled memory compression" scheme actively deployed in today's OSes. In brief, OS-controlled memory compression is a memory management feature that transparently, dramatically, and adaptively increases effective main memory capacity on demand as software-level memory usage grows beyond physical memory capacity. It has been adopted across almost all OSes (e.g., Linux, Windows, macOS, AIX) and almost all classes of computing systems (e.g., smartphones, PCs, data centers, and cloud). The OS-controlled memory compression scheme is locality aware, but applications still experience long-latency page faults when accessing compressed memory. To remove this performance bottleneck, an acceleration technique has been proposed to manage locality-aware memory compression within hardware, enabling applications to access their OS-compressed memory directly. This accelerator is referred to as HALK throughout this work, which stands for "Hardware-accelerated Locality-aware Memory Compression." The literal meaning of the word HALK in English is 'a hidden place': the accelerator is exposed neither to the OS nor to running applications. It is hidden entirely in the memory controller hardware and incurs minimal hardware cost. This thesis develops an FPGA design prototype and gives a proof of concept for the functionality of HALK by running non-trivial micro-benchmarks. It also provides and analyzes power, performance, and area of HALK for ASIC designs (at the 7 nm technology node) and the selected FPGA prototype design. / Master of Science / Memory capacity has become a scarce resource across many digital computing systems, from smartphones to large-scale cloud systems. The slowing improvement of memory capacity per dollar further worsens this problem. To address it, almost all industry-standard OSes (Linux, Windows, macOS, etc.) implement memory compression to store more data in the same space. In today's systems this is handled in software, which is inefficient and suffers long latency, degrading user responsiveness. Hardware is faster than software at such computations, so a hardware solution with low area and low cost is preferred for its better performance and power efficiency. In the hardware world, modules that perform specifically targeted software functions are called accelerators. This thesis presents such a hardware accelerator for locality-aware memory compression, allowing applications to directly access compressed data without OS intervention and thereby improving overall system performance.
The proposed accelerator is locality aware, meaning the least recently allocated uncompressed page is picked for compression to free up space on demand, while the most recently allocated page is kept in uncompressed form.
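To make that selection policy concrete, here is a minimal software sketch of the locality-aware choice described above, assuming a simple ordered pool of pages; the real HALK logic lives in memory controller hardware, and names such as `LocalityAwarePool` are illustrative, not from the thesis.

```python
from collections import OrderedDict

class LocalityAwarePool:
    """Toy model of the locality-aware policy: the least recently
    allocated uncompressed page is compressed to free space, and
    recently touched pages stay uncompressed. Illustrative only --
    not the HALK hardware design."""

    def __init__(self, max_uncompressed):
        self.max_uncompressed = max_uncompressed
        self.uncompressed = OrderedDict()   # page_id -> data, oldest first
        self.compressed = {}                # page_id -> (would-be) compressed data

    def touch(self, page_id, data):
        # A touched page is (re)stored uncompressed, most recent last.
        if page_id in self.compressed:
            del self.compressed[page_id]
        self.uncompressed.pop(page_id, None)
        self.uncompressed[page_id] = data
        # On capacity pressure, move the least recently allocated page
        # to the compressed pool (compression itself is elided here).
        while len(self.uncompressed) > self.max_uncompressed:
            victim, victim_data = self.uncompressed.popitem(last=False)
            self.compressed[victim] = victim_data
```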
22

IMPROVING THE PERFORMANCE AND ENERGY EFFICIENCY OF EMERGING MEMORY SYSTEMS

Guo, Yuhua 01 January 2018 (has links)
Modern main memory is primarily built from dynamic random access memory (DRAM) chips. As DRAM chips scale to higher density, three main problems impede DRAM scalability and performance improvement. First, DRAM refresh overhead grows from negligible to severe, which limits DRAM scalability and causes performance degradation. Second, although memory capacity has increased dramatically in the past decade, memory bandwidth has not kept pace with CPU performance scaling, which has led to the memory wall problem. Third, DRAM dissipates considerable power and has been reported to account for as much as 40% of total system energy, a problem that is exacerbated as DRAM scales up. To address these problems, 1) we propose Rank-level Piggyback Caching (RPC) to alleviate DRAM refresh overhead by servicing memory requests and refresh operations in parallel; 2) we propose a high-performance, bandwidth-efficient approach called SELF to break through the memory bandwidth wall by exploiting die-stacked DRAM as part of memory; and 3) we propose a cost-effective and energy-efficient architecture for hybrid memory systems composed of high bandwidth memory (HBM) and phase change memory (PCM), called Dual Role HBM (DR-HBM). In DR-HBM, hot pages are tracked in a cost-effective way and migrated to the HBM to improve performance, while cold pages are kept in the PCM to save energy.
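A minimal sketch of the hot/cold placement idea follows, assuming epoch-based access counters; the threshold, epoch mechanism, and names are illustrative assumptions, not the dissertation's actual tracking scheme.

```python
# Toy model of hot/cold page placement in the spirit of DR-HBM:
# per-epoch access counters identify hot pages, which are promoted to
# HBM; pages that cool off are demoted back to PCM.

HOT_THRESHOLD = 64                        # accesses per epoch to count as hot
hbm_pages: set[int] = set()
pcm_pages: set[int] = set(range(1024))    # new pages start in PCM in this toy
access_counts: dict[int, int] = {}

def record_access(page: int) -> None:
    access_counts[page] = access_counts.get(page, 0) + 1

def end_of_epoch() -> None:
    for page, count in access_counts.items():
        if count >= HOT_THRESHOLD and page in pcm_pages:
            pcm_pages.discard(page)
            hbm_pages.add(page)    # promote hot page to HBM
        elif count < HOT_THRESHOLD and page in hbm_pages:
            hbm_pages.discard(page)
            pcm_pages.add(page)    # demote cooled page to PCM
    access_counts.clear()          # start the next tracking epoch fresh
```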
23

Étude des mécanismes de déclenchement des Bits Collés dans les SRAM et DRAM en Environnement Radiatif Spatial / Study of Mechanisms Leading to Stuck Bits on SRAM and DRAM Memories in the Space Radiation Environment

Rodriguez, Axel 02 March 2017 (has links)
Results from several CNES (Centre National d'Études Spatiales) experiments flown on satellites show that SRAM and SDRAM components suffer atypical errors, characterized by a fraction of memory locations exhibiting recurrent errors; these uncategorized errors account for nearly all errors detected on these memories. An internal CNES review determined that the errors were due to radiation in the space environment (protons, electrons, heavy ions). This Ph.D. thesis sets out to reproduce these atypical errors on the ground using irradiation facilities and particle accelerators, to characterize them, and to explain the physical mechanism leading to the appearance of these damaged cells. The proposed physical mechanism is consistent with data obtained under particle beams and is supported by our TCAD simulations.
24

Materials for DRAM Memory Cell Applications

Schroeder, Uwe, Cho, Kyuho, Slesazeck, Stefan 06 May 2022 (has links)
Semiconductor memory is one of the key technologies driving the success of Si-based information technology over the last five decades. The most prominent memory type, the dynamic random access memory (DRAM), was patented in 1967 and introduced to the market by Intel Corporation in 1972. Until 2001 and the realization of the 110 nm technology node, DRAM was the driving force on the lithography shrink roadmap, before NAND Flash took over that role. Hence, DRAM development was for a long time the forerunner of exponentially growing large-scale integration and promoted similar advances in logic chips. One reason for DRAM's success is its simple cell structure, which consists of only one transistor (1T) and one capacitor (1C), where the information is stored in the form of a charge.
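Since the 1T1C cell stores information as charge, the standard charge-sharing relations below (general DRAM textbook results, not taken from this chapter) give the stored charge and the small bit-line signal available on read-out, where C_s is the cell capacitance, C_BL the bit-line capacitance, and V_pre the bit-line precharge voltage:

```latex
% Stored charge on the 1T1C cell capacitor, and the bit-line signal
% produced on read-out by charge sharing (general DRAM relations):
Q = C_{s} V_{cell}, \qquad
\Delta V_{BL} = \left( V_{cell} - V_{pre} \right) \frac{C_{s}}{C_{s} + C_{BL}}
```

Because C_BL is typically much larger than C_s, the read signal is only a small fraction of the stored voltage, which is why maintaining sufficient cell capacitance as the cell shrinks is a central materials challenge.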
25

Optimizing Memory Systems for High Efficiency in Computing Clusters

Liu, Wenjie January 2022 (has links)
DRAM-based memory systems suffer from increasingly severe row buffer interference, which causes significant performance degradation and power consumption. With DRAM scaling, the overheads of row buffer interference become even worse due to higher row activation and precharge latency. Clusters have been a prevalent and successful computing framework for processing large amounts of data thanks to their distributed and parallelized working paradigm. A task submitted to a cluster is typically divided into a number of subtasks assigned to different work nodes running the same code but dealing with different, equal portions of the dataset. Because of heterogeneity, this can easily produce stragglers that unfairly slow down the entire computation, since work nodes finish their subtasks at different rates. With increasing problem complexity, more irregular applications are deployed on high-performance clusters due to the parallel working paradigm, and they yield irregular memory access behaviors across nodes. However, this irregularity of memory access behavior has not been comprehensively studied, which results in low utilization of integrated hybrid memory systems composed of stacked DRAM and off-chip DRAM. This dissertation presents our research on the three challenges above in order to optimize the memory system for high efficiency in computing clusters. Details are as follows. To address low row buffer utilization caused by row buffer interference, we propose the Row Buffer Cache (RBC) architecture to efficiently mitigate row buffer interference overheads. At the core of the RBC architecture, DRAM pages with good locality are cached and thus escape row buffer interference. This significantly reduces the overheads of row activation and precharge, improving overall system performance and energy efficiency. We evaluate RBC using SPEC CPU2006 on a DDR4 memory against the commodity baseline memory system and the state-of-the-art methods DICE and Bingo. Results show that RBC improves memory performance by up to 2.24X (16.1% on average) and reduces overall memory energy by up to 68.2% (23.6% on average) in single-core simulations. In multi-core simulations, RBC increases performance by up to 1.55X (16.7% on average) and reduces energy by up to 35.4% (21.3% on average). RBC outperforms DICE and Bingo by 8% and 5.1% on average in the single-core scenario, and by 10.1% and 4.7% in the multi-core scenario. To relax the straggler effect observed in clusters, we aim to speed up straggling work nodes by leveraging the exhibited performance variation, and propose StragglerHelper, which conveys the memory access characteristics experienced by the forerunner to the stragglers, so that stragglers are sped up by accurately informed memory prefetching. A Progress Monitor supervises the progress of the work nodes and informs straggling nodes of the forerunner's memory access patterns. Our evaluation with SPEC MPI 2007 and BigDataBench on a cluster of 64 work nodes shows that StragglerHelper improves the execution time of stragglers by up to 99.5% (61.4% on average), contributing to an overall improvement of the entire cluster cohort by up to 46.7% (9.9% on average) compared to the baseline cluster.
To address performance differences in irregular applications, we devise a novel method called the Similarity-Managed Hybrid Memory System (SM-HMS) to improve hybrid memory system performance by leveraging the memory access similarity among nodes in a cluster. Within SM-HMS, two techniques are proposed: Memory Access Similarity Measuring and Similarity-based Memory Access Behavior Sharing. To quantify the memory access similarity, the memory access behavior of each node is vectorized, and the distance between two vectors is used as the memory access similarity. The calculated similarity is then used to share memory access behaviors precisely across nodes. With the shared memory access behaviors, SM-HMS divides the stacked DRAM into two sections, a sliding window section and an outlier section; the shared behaviors guide replacement in the sliding window section, while the outlier section is managed in LRU fashion. Our evaluation with a set of irregular applications on clusters of up to 256 nodes shows that SM-HMS outperforms the state-of-the-art approaches Cameo, Chameleon, and Hybrid2 on job finish time reduction by up to 58.6%, 56.7%, and 31.3% (46.1%, 41.6%, and 19.3% on average), respectively. SM-HMS also achieves up to 98.6% (91.9% on average) of ideal hybrid memory system performance. / Computer and Information Science
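As a toy rendering of the similarity measure (the exact vectorization and distance metric are not specified in this abstract, so per-region access counts and Euclidean distance are assumptions):

```python
import math

# Vectorize each node's memory access behavior as access counts per
# address region; the distance between two vectors serves as the
# (dis)similarity between nodes. Illustrative sketch only.

def access_vector(trace, num_regions, region_size):
    v = [0] * num_regions
    for addr in trace:
        v[(addr // region_size) % num_regions] += 1
    return v

def similarity(v_a, v_b):
    # Smaller distance => more similar access behavior.
    return math.dist(v_a, v_b)

node_a = access_vector([0x1000, 0x1040, 0x9000], 4, 0x4000)
node_b = access_vector([0x1008, 0x1100, 0x9100], 4, 0x4000)
print(similarity(node_a, node_b))  # 0.0 here: identical region counts
```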
26

記憶體模組產業的轉型策略-以C公司為例 / Industrial transformation strategy of Memory Module - Case of Company C

林婉菁 Unknown Date (has links)
Taiwanese memory module makers hold roughly 10%–15% of the global market ranking and share, so Taiwan remains a major production center for memory modules worldwide. In the past two years, however, Taiwan's semiconductor manufacturing has declined year by year, and revenue has fluctuated with international conditions and the development of consumer electronics. As innovative products emerged and grew rapidly, demand for PCs and notebooks deteriorated; combined with the uncompetitive production costs of domestic memory fabs, major domestic DRAM makers have transformed or been forced out of the market, and the former prosperity of Taiwan's memory industry is gone. Today most Taiwanese memory module makers have moved beyond the single DRAM-module product and diversified into MP3 players, digital photo frames, external hard-drive enclosures, and similar products. But consumer products change quickly, and among memory module makers the large keep getting larger, while small and mid-sized firms transform, are acquired or merged, or even face closure; the outlook for small and mid-sized memory module makers is a difficult one. To understand the current state and future prospects of the memory module industry, this study examines in detail the current situation of a domestic memory module company, Company C, and offers recommendations for its future.
27

Increasing memory access efficiency through a two-level memory controller

Linck, Marcelo Melo 22 March 2018 (has links)
Simultaneous accesses generated by memory clients in a System-on-Chip (SoC) to a single memory device impose challenges that require extra attention due to the performance bottleneck they create. When these clients are processors, the issue becomes more evident, because processor speed grows faster than memory device speed, creating a performance gap. In this scenario, memory-controlling strategies are necessary to improve system performance. Studies have shown that memory communication is the main cause of delays in program execution on processors. The main contribution of this work is therefore a memory-controller architecture composed of two levels: priority and memory. The priority level is responsible for interfacing with clients and scheduling memory requests according to a fixed-priority algorithm. The memory level is responsible for reordering requests and guaranteeing memory access isolation for high-priority clients. The main objective of this work is to provide latency reductions to high-priority clients in a highly scalable system. Experiments were conducted through behavioral simulation of the proposed architecture in a software simulator.
The evaluation of the proposed work is divided into four parts: latency evaluation, row-hit evaluation, runtime evaluation and scalability evaluation.
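A behavioral sketch of the two-level idea, under stated assumptions: lower numbers denote higher-priority clients, and the memory level's reordering is reduced to a simple open-row preference. The dissertation's actual simulator is more detailed; all names here are illustrative.

```python
import heapq

class TwoLevelController:
    """Toy two-level memory controller: a priority level picks the
    pending request of the highest-priority client (fixed priority),
    and a memory level reorders among equally urgent requests to
    favor hits on the currently open row."""

    def __init__(self):
        self.pending = []     # heap of (client_priority, seq, request)
        self.seq = 0
        self.open_row = None

    def submit(self, client_priority, request):
        # Priority level: lower number = higher-priority client.
        heapq.heappush(self.pending, (client_priority, self.seq, request))
        self.seq += 1

    def issue(self):
        if not self.pending:
            return None
        # Memory level: among requests of the most urgent priority,
        # prefer one hitting the open row to skip precharge/activate;
        # otherwise fall back to the oldest such request.
        top_priority = self.pending[0][0]
        same_level = [r for r in self.pending if r[0] == top_priority]
        choice = next((r for r in same_level if r[2]["row"] == self.open_row),
                      same_level[0])
        self.pending.remove(choice)
        heapq.heapify(self.pending)
        self.open_row = choice[2]["row"]
        return choice[2]
```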
28

Sensitivity Analyses for Tumor Growth Models

Mendis, Ruchini Dilinika 01 April 2019 (has links)
This study performs sensitivity analyses for two previously developed tumor growth models: the Gompertz model and the quotient model. Both models are considered in continuous and discrete time. In continuous time, model parameters are estimated using the least-squares method, while in discrete time the partial-sum method is used. Moreover, frequentist and Bayesian methods are used to construct confidence intervals and credible intervals for the model parameters. We apply Markov Chain Monte Carlo (MCMC) techniques, namely the Random Walk Metropolis algorithm with a non-informative prior and the Delayed Rejection Adaptive Metropolis (DRAM) algorithm, to construct the parameters' posterior distributions and obtain credible intervals.
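As a small illustration of the continuous-time least-squares step, the sketch below fits the standard Gompertz form V(t) = K exp(ln(V0/K) e^{-at}) to synthetic data; the parameterization and the data are assumptions for illustration, not the thesis's model or dataset.

```python
import numpy as np
from scipy.optimize import least_squares

# Standard Gompertz growth curve with rate a, carrying capacity K,
# and initial volume V0.
def gompertz(t, a, K, V0):
    return K * np.exp(np.log(V0 / K) * np.exp(-a * t))

# Synthetic noisy observations (assumed values, for illustration).
t_obs = np.linspace(0, 20, 30)
v_obs = gompertz(t_obs, a=0.3, K=100.0, V0=2.0)
v_obs = v_obs * (1 + 0.05 * np.random.default_rng(0).standard_normal(30))

def residuals(theta):
    a, K, V0 = theta
    return gompertz(t_obs, a, K, V0) - v_obs

fit = least_squares(residuals, x0=[0.1, 80.0, 1.0],
                    bounds=([1e-3, 1.0, 1e-3], [5.0, 1e3, 50.0]))
print(fit.x)  # recovered (a, K, V0), close to (0.3, 100, 2)
```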
29

A New TFT with Trenched Body and Airgap-Insulated Structure for Capacitorless 1T-DRAM Application

Chang, Tzu-feng 29 July 2010 (has links)
In this thesis, we propose a new thin-film transistor with a trenched body and an airgap-insulated structure (AITFT) for one-transistor dynamic random access memory (1T-DRAM) applications, and we investigate the influence of different materials on the sensing current window and retention time. The basic operation mechanisms are impact ionization and the floating-body effect. Because the generated holes are stored in the pseudo-neutral region, the threshold voltage (Vth) is lowered, resulting in a high drain current for state "1"; the data can thus be recognized by sensing the difference in drain current. According to ISE TCAD 10.0 simulations, owing to the trench and airgap-isolation design, the AITFT enlarges the sensing current window by about 212% and extends retention time by about 42% compared with a conventional TFT at a channel length of 150 nm and a temperature of 300 K. Also, owing to the source/drain tie, generated heat dissipates quickly from the source/drain to the substrate, improving thermal stability. In other words, the AITFT improves thermal reliability without losing control of short-channel effects.
30

Study of High Speed Main Amplifier and Low Power Peripheral Circuits for Low Supply Voltage Dynamic Random Access Memory

Chang, Yao-Sheng 09 July 2001 (has links)
Three high-performance circuits for low-supply-voltage DRAMs are presented in this thesis. First, a modified multi-stage sense amplifier is proposed that utilizes an auxiliary transmission gate and a charge recycling technique. The auxiliary NMOS transistor of the multi-stage sense amplifier is replaced by a transmission gate to improve sensing speed, and the charge recycling technique reduces the amplifier's power dissipation. Compared to the conventional multi-stage sense amplifier, sensing time improves by 6.1 ns (24.4%) and power is reduced by 25.6%. Second, an improved Standby Power Reduction (SPR) circuit is reported. A capacitor boosting technique is utilized in the proposed Static Current Cut-off Standby Power Reduction (SCCSPR) circuit, which turns off the always-on MOS transistor of the SPR circuit; power consumption is reduced by 30.9% compared to the conventional SPR circuit. Third, an improved voltage doubler is developed. An indirect switch provides a larger gate-source bias to the PMOS pass transistor, raising current drivability and improving pumping speed; at a 2 V supply voltage, the pumping speed of the modified voltage doubler rises by about 18.6% compared to the conventional voltage doubler. These circuits are applied in a 1-Kbit DRAM. At a 2 V supply voltage, a data access time of 36 ns and a total power consumption of 52.58 mW are attained, reducing access time by 10.3 ns (22.2%) and power consumption by 6.44 mW (11%) compared to the conventional DRAM.
