21

Design and prototyping of Hardware-Accelerated Locality-aware Memory Compression

Srinivas, Raghavendra 09 September 2020 (has links)
Hardware acceleration is the most sought-after technique in chip design for achieving better performance and power efficiency in critical functions that are handled inefficiently by traditional OS/software. As technology advances, with 7 nm products already on the market offering better power and performance at low area, latency-critical functions traditionally handled by software have started moving into the chip as acceleration units. This thesis describes the accelerator architecture, implementation, and prototype for one such function, namely "Locality-Aware Memory Compression," which is part of the "OS-controlled memory compression" scheme actively deployed in today's OSes. In brief, OS-controlled memory compression is a memory management feature that transparently, dramatically, and adaptively increases effective main memory capacity on demand as software-level memory usage grows beyond physical memory capacity. It has been adopted across almost all OSes (e.g., Linux, Windows, macOS, AIX) and almost all classes of computing systems (e.g., smartphones, PCs, data centers, and cloud). The OS-controlled memory compression scheme is locality aware, but applications still experience long-latency page faults when accessing compressed memory. To remove this performance bottleneck, an acceleration technique has been proposed to manage locality-aware memory compression within hardware, enabling applications to access their OS-compressed memory directly. This accelerator is referred to as HALK throughout this work, which stands for "Hardware-accelerated Locality-aware Memory Compression." The literal meaning of the word HALK in English is 'a hidden place': the accelerator is exposed neither to the OS nor to running applications. It is hidden entirely in the memory controller hardware and incurs minimal hardware cost. This thesis develops an FPGA design prototype and gives a proof of concept for the functionality of HALK by running non-trivial micro-benchmarks. It also provides and analyzes power, performance, and area of HALK for ASIC designs (at the 7 nm technology node) and the selected FPGA prototype design. / Master of Science / Memory capacity has become a scarce resource across many digital computing systems, from smartphones to large-scale cloud systems. The slowing improvement of memory capacity per dollar further worsens this problem. To address it, almost all industry-standard OSes (Linux, Windows, macOS, etc.) implement memory compression to store more data in the same space. In today's systems this is handled in software, which is inefficient and suffers long latency, degrading user responsiveness. Hardware is faster than software at such computations, so a hardware solution with low area and low cost is preferred for its better performance and power efficiency. In the hardware world, modules that perform specifically targeted software functions are called accelerators. This thesis presents such a hardware accelerator for locality-aware memory compression, allowing applications to directly access compressed data without OS intervention and thereby improving overall system performance.
The proposed accelerator is locality aware, meaning the least recently allocated uncompressed page is picked for compression to free up space on demand, while the most recently allocated page is kept in uncompressed form.
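To make that selection policy concrete, here is a minimal software sketch of the locality-aware choice described above, assuming a simple ordered pool of pages; the real HALK logic lives in memory controller hardware, and names such as `LocalityAwarePool` are illustrative, not from the thesis.

```python
from collections import OrderedDict

class LocalityAwarePool:
    """Toy model of the locality-aware policy: the least recently
    allocated uncompressed page is compressed to free space, and
    recently touched pages stay uncompressed. Illustrative only --
    not the HALK hardware design."""

    def __init__(self, max_uncompressed):
        self.max_uncompressed = max_uncompressed
        self.uncompressed = OrderedDict()   # page_id -> data, oldest first
        self.compressed = {}                # page_id -> (would-be) compressed data

    def touch(self, page_id, data):
        # A touched page is (re)stored uncompressed, most recent last.
        if page_id in self.compressed:
            del self.compressed[page_id]
        self.uncompressed.pop(page_id, None)
        self.uncompressed[page_id] = data
        # On capacity pressure, move the least recently allocated page
        # to the compressed pool (compression itself is elided here).
        while len(self.uncompressed) > self.max_uncompressed:
            victim, victim_data = self.uncompressed.popitem(last=False)
            self.compressed[victim] = victim_data
```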
22

IMPROVING THE PERFORMANCE AND ENERGY EFFICIENCY OF EMERGING MEMORY SYSTEMS

Guo, Yuhua 01 January 2018 (has links)
Modern main memory is primarily built from dynamic random access memory (DRAM) chips. As DRAM chips scale to higher density, three main problems impede DRAM scalability and performance improvement. First, DRAM refresh overhead grows from negligible to severe, which limits DRAM scalability and causes performance degradation. Second, although memory capacity has increased dramatically in the past decade, memory bandwidth has not kept pace with CPU performance scaling, which has led to the memory wall problem. Third, DRAM dissipates considerable power and has been reported to account for as much as 40% of total system energy, a problem that is exacerbated as DRAM scales up. To address these problems, 1) we propose Rank-level Piggyback Caching (RPC) to alleviate DRAM refresh overhead by servicing memory requests and refresh operations in parallel; 2) we propose a high-performance, bandwidth-efficient approach called SELF to break through the memory bandwidth wall by exploiting die-stacked DRAM as part of memory; and 3) we propose a cost-effective and energy-efficient architecture for hybrid memory systems composed of high bandwidth memory (HBM) and phase change memory (PCM), called Dual Role HBM (DR-HBM). In DR-HBM, hot pages are tracked in a cost-effective way and migrated to the HBM to improve performance, while cold pages are kept in the PCM to save energy.
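A minimal sketch of the hot/cold placement idea follows, assuming epoch-based access counters; the threshold, epoch mechanism, and names are illustrative assumptions, not the dissertation's actual tracking scheme.

```python
# Toy model of hot/cold page placement in the spirit of DR-HBM:
# per-epoch access counters identify hot pages, which are promoted to
# HBM; pages that cool off are demoted back to PCM.

HOT_THRESHOLD = 64                        # accesses per epoch to count as hot
hbm_pages: set[int] = set()
pcm_pages: set[int] = set(range(1024))    # new pages start in PCM in this toy
access_counts: dict[int, int] = {}

def record_access(page: int) -> None:
    access_counts[page] = access_counts.get(page, 0) + 1

def end_of_epoch() -> None:
    for page, count in access_counts.items():
        if count >= HOT_THRESHOLD and page in pcm_pages:
            pcm_pages.discard(page)
            hbm_pages.add(page)    # promote hot page to HBM
        elif count < HOT_THRESHOLD and page in hbm_pages:
            hbm_pages.discard(page)
            pcm_pages.add(page)    # demote cooled page to PCM
    access_counts.clear()          # start the next tracking epoch fresh
```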
23

Étude des mécanismes de déclenchement des Bits Collés dans les SRAM et DRAM en Environnement Radiatif Spatial / Study of Mechanisms Leading to Stuck Bits on SRAM and DRAM Memories in the Space Radiation Environment

Rodriguez, Axel 02 March 2017 (has links)
Results from several CNES (Centre National d'Études Spatiales) experiments flown on satellites show that SRAM and SDRAM components suffer atypical errors, characterized by a fraction of memory locations exhibiting recurrent errors; these uncategorized errors account for nearly all errors detected on these memories. An internal CNES review determined that the errors were due to radiation in the space environment (protons, electrons, heavy ions). This Ph.D. thesis sets out to reproduce these atypical errors on the ground using irradiation facilities and particle accelerators, to characterize them, and to explain the physical mechanism leading to the appearance of these damaged cells. The proposed physical mechanism is consistent with data obtained under particle beams and is supported by our TCAD simulations.
24

Materials for DRAM Memory Cell Applications

Schroeder, Uwe, Cho, Kyuho, Slesazeck, Stefan 06 May 2022 (has links)
Semiconductor memory is one of the key technologies driving the success of Si-based information technology over the last five decades. The most prominent memory type, the dynamic random access memory (DRAM), was patented in 1967 and introduced to the market by Intel Corporation in 1972. Until 2001 and the realization of the 110 nm technology node, DRAM was the driving force on the lithography shrink roadmap, before NAND Flash took over that role. Hence, DRAM development was for a long time the forerunner of exponentially growing large-scale integration and promoted similar advances in logic chips. One reason for DRAM's success is its simple cell structure, which consists of only one transistor (1T) and one capacitor (1C), where the information is stored in the form of a charge.
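Since the 1T1C cell stores information as charge, the standard charge-sharing relations below (general DRAM textbook results, not taken from this chapter) give the stored charge and the small bit-line signal available on read-out, where C_s is the cell capacitance, C_BL the bit-line capacitance, and V_pre the bit-line precharge voltage:

```latex
% Stored charge on the 1T1C cell capacitor, and the bit-line signal
% produced on read-out by charge sharing (general DRAM relations):
Q = C_{s} V_{cell}, \qquad
\Delta V_{BL} = \left( V_{cell} - V_{pre} \right) \frac{C_{s}}{C_{s} + C_{BL}}
```

Because C_BL is typically much larger than C_s, the read signal is only a small fraction of the stored voltage, which is why maintaining sufficient cell capacitance as the cell shrinks is a central materials challenge.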
25

Optimizing Memory Systems for High Efficiency in Computing Clusters

Liu, Wenjie January 2022 (has links)
DRAM-based memory systems suffer from increasingly severe row buffer interference, which causes significant performance degradation and power consumption. With DRAM scaling, the overheads of row buffer interference become even worse due to higher row activation and precharge latency. Clusters have been a prevalent and successful computing framework for processing large amounts of data thanks to their distributed and parallelized working paradigm. A task submitted to a cluster is typically divided into a number of subtasks assigned to different work nodes running the same code but dealing with different, equal portions of the dataset. Because of heterogeneity, this can easily produce stragglers that unfairly slow down the entire computation, since work nodes finish their subtasks at different rates. With increasing problem complexity, more irregular applications are deployed on high-performance clusters due to the parallel working paradigm, and they yield irregular memory access behaviors across nodes. However, this irregularity of memory access behavior has not been comprehensively studied, which results in low utilization of integrated hybrid memory systems composed of stacked DRAM and off-chip DRAM. This dissertation presents our research on the three challenges above in order to optimize the memory system for high efficiency in computing clusters. Details are as follows. To address low row buffer utilization caused by row buffer interference, we propose the Row Buffer Cache (RBC) architecture to efficiently mitigate row buffer interference overheads. At the core of the RBC architecture, DRAM pages with good locality are cached and thus escape row buffer interference. This significantly reduces the overheads of row activation and precharge, improving overall system performance and energy efficiency. We evaluate RBC using SPEC CPU2006 on a DDR4 memory against the commodity baseline memory system and the state-of-the-art methods DICE and Bingo. Results show that RBC improves memory performance by up to 2.24X (16.1% on average) and reduces overall memory energy by up to 68.2% (23.6% on average) in single-core simulations. In multi-core simulations, RBC increases performance by up to 1.55X (16.7% on average) and reduces energy by up to 35.4% (21.3% on average). RBC outperforms DICE and Bingo by 8% and 5.1% on average in the single-core scenario, and by 10.1% and 4.7% in the multi-core scenario. To relax the straggler effect observed in clusters, we aim to speed up straggling work nodes by leveraging the exhibited performance variation, and propose StragglerHelper, which conveys the memory access characteristics experienced by the forerunner to the stragglers, so that stragglers are sped up by accurately informed memory prefetching. A Progress Monitor supervises the progress of the work nodes and informs straggling nodes of the forerunner's memory access patterns. Our evaluation with SPEC MPI 2007 and BigDataBench on a cluster of 64 work nodes shows that StragglerHelper improves the execution time of stragglers by up to 99.5% (61.4% on average), contributing to an overall improvement of the entire cluster cohort by up to 46.7% (9.9% on average) compared to the baseline cluster.
To address performance differences in irregular applications, we devise a novel method called the Similarity-Managed Hybrid Memory System (SM-HMS) to improve hybrid memory system performance by leveraging the memory access similarity among nodes in a cluster. Within SM-HMS, two techniques are proposed: Memory Access Similarity Measuring and Similarity-based Memory Access Behavior Sharing. To quantify the memory access similarity, the memory access behavior of each node is vectorized, and the distance between two vectors is used as the memory access similarity. The calculated similarity is then used to share memory access behaviors precisely across nodes. With the shared memory access behaviors, SM-HMS divides the stacked DRAM into two sections, a sliding window section and an outlier section; the shared behaviors guide replacement in the sliding window section, while the outlier section is managed in LRU fashion. Our evaluation with a set of irregular applications on clusters of up to 256 nodes shows that SM-HMS outperforms the state-of-the-art approaches Cameo, Chameleon, and Hybrid2 on job finish time reduction by up to 58.6%, 56.7%, and 31.3% (46.1%, 41.6%, and 19.3% on average), respectively. SM-HMS also achieves up to 98.6% (91.9% on average) of ideal hybrid memory system performance. / Computer and Information Science
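As a toy rendering of the similarity measure (the exact vectorization and distance metric are not specified in this abstract, so per-region access counts and Euclidean distance are assumptions):

```python
import math

# Vectorize each node's memory access behavior as access counts per
# address region; the distance between two vectors serves as the
# (dis)similarity between nodes. Illustrative sketch only.

def access_vector(trace, num_regions, region_size):
    v = [0] * num_regions
    for addr in trace:
        v[(addr // region_size) % num_regions] += 1
    return v

def similarity(v_a, v_b):
    # Smaller distance => more similar access behavior.
    return math.dist(v_a, v_b)

node_a = access_vector([0x1000, 0x1040, 0x9000], 4, 0x4000)
node_b = access_vector([0x1008, 0x1100, 0x9100], 4, 0x4000)
print(similarity(node_a, node_b))  # 0.0 here: identical region counts
```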
26

記憶體模組產業的轉型策略-以C公司為例 / Industrial transformation strategy of Memory Module - Case of Company C

林婉菁 Unknown Date (has links)
Taiwanese memory module makers hold roughly 10%–15% of the global market ranking and share, so Taiwan remains a major production center for memory modules worldwide. In the past two years, however, Taiwan's semiconductor manufacturing has declined year by year, and revenue has fluctuated with international conditions and the development of consumer electronics. As innovative products emerged and grew rapidly, demand for PCs and notebooks deteriorated; combined with the uncompetitive production costs of domestic memory fabs, major domestic DRAM makers have transformed or been forced out of the market, and the former prosperity of Taiwan's memory industry is gone. Today most Taiwanese memory module makers have moved beyond the single DRAM-module product and diversified into MP3 players, digital photo frames, external hard-drive enclosures, and similar products. But consumer products change quickly, and among memory module makers the large keep getting larger, while small and mid-sized firms transform, are acquired or merged, or even face closure; the outlook for small and mid-sized memory module makers is a difficult one. To understand the current state and future prospects of the memory module industry, this study examines in detail the current situation of a domestic memory module company, Company C, and offers recommendations for its future.
27

Increasing memory access efficiency through a two-level memory controller

Linck, Marcelo Melo 22 March 2018 (has links)
Simultaneous accesses generated by memory clients in a System-on-Chip (SoC) to a single memory device impose challenges that require extra attention due to the performance bottleneck they create. When these clients are processors, the issue becomes more evident, because processor speed grows faster than memory device speed, creating a performance gap. In this scenario, memory-controlling strategies are necessary to improve system performance. Studies have shown that memory communication is the main cause of delays in program execution on processors. The main contribution of this work is therefore a memory-controller architecture composed of two levels: priority and memory. The priority level is responsible for interfacing with clients and scheduling memory requests according to a fixed-priority algorithm. The memory level is responsible for reordering requests and guaranteeing memory access isolation for high-priority clients. The main objective of this work is to provide latency reductions to high-priority clients in a highly scalable system. Experiments were conducted through behavioral simulation of the proposed architecture in a software simulator.
The evaluation of the proposed work is divided into four parts: latency evaluation, row-hit evaluation, runtime evaluation and scalability evaluation.
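A behavioral sketch of the two-level idea, under stated assumptions: lower numbers denote higher-priority clients, and the memory level's reordering is reduced to a simple open-row preference. The dissertation's actual simulator is more detailed; all names here are illustrative.

```python
import heapq

class TwoLevelController:
    """Toy two-level memory controller: a priority level picks the
    pending request of the highest-priority client (fixed priority),
    and a memory level reorders among equally urgent requests to
    favor hits on the currently open row."""

    def __init__(self):
        self.pending = []     # heap of (client_priority, seq, request)
        self.seq = 0
        self.open_row = None

    def submit(self, client_priority, request):
        # Priority level: lower number = higher-priority client.
        heapq.heappush(self.pending, (client_priority, self.seq, request))
        self.seq += 1

    def issue(self):
        if not self.pending:
            return None
        # Memory level: among requests of the most urgent priority,
        # prefer one hitting the open row to skip precharge/activate;
        # otherwise fall back to the oldest such request.
        top_priority = self.pending[0][0]
        same_level = [r for r in self.pending if r[0] == top_priority]
        choice = next((r for r in same_level if r[2]["row"] == self.open_row),
                      same_level[0])
        self.pending.remove(choice)
        heapq.heapify(self.pending)
        self.open_row = choice[2]["row"]
        return choice[2]
```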
28

Sensitivity Analyses for Tumor Growth Models

Mendis, Ruchini Dilinika 01 April 2019 (has links)
This study performs sensitivity analyses for two previously developed tumor growth models: the Gompertz model and the quotient model. Both models are considered in continuous and discrete time. In continuous time, model parameters are estimated using the least-squares method, while in discrete time the partial-sum method is used. Moreover, frequentist and Bayesian methods are used to construct confidence intervals and credible intervals for the model parameters. We apply Markov Chain Monte Carlo (MCMC) techniques, namely the Random Walk Metropolis algorithm with a non-informative prior and the Delayed Rejection Adaptive Metropolis (DRAM) algorithm, to construct the parameters' posterior distributions and obtain credible intervals.
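As a small illustration of the continuous-time least-squares step, the sketch below fits the standard Gompertz form V(t) = K exp(ln(V0/K) e^{-at}) to synthetic data; the parameterization and the data are assumptions for illustration, not the thesis's model or dataset.

```python
import numpy as np
from scipy.optimize import least_squares

# Standard Gompertz growth curve with rate a, carrying capacity K,
# and initial volume V0.
def gompertz(t, a, K, V0):
    return K * np.exp(np.log(V0 / K) * np.exp(-a * t))

# Synthetic noisy observations (assumed values, for illustration).
t_obs = np.linspace(0, 20, 30)
v_obs = gompertz(t_obs, a=0.3, K=100.0, V0=2.0)
v_obs = v_obs * (1 + 0.05 * np.random.default_rng(0).standard_normal(30))

def residuals(theta):
    a, K, V0 = theta
    return gompertz(t_obs, a, K, V0) - v_obs

fit = least_squares(residuals, x0=[0.1, 80.0, 1.0],
                    bounds=([1e-3, 1.0, 1e-3], [5.0, 1e3, 50.0]))
print(fit.x)  # recovered (a, K, V0), close to (0.3, 100, 2)
```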
29

A New TFT with Trenched Body and Airgap-Insulated Structure for Capacitorless 1T-DRAM Application

Chang, Tzu-feng 29 July 2010 (has links)
In this thesis, we propose a new thin-film transistor with a trenched body and an airgap-insulated structure (AITFT) for one-transistor dynamic random access memory (1T-DRAM) applications, and we investigate the influence of different materials on the sensing current window and retention time. The basic operation mechanisms are impact ionization and the floating-body effect. Because the generated holes are stored in the pseudo-neutral region, the threshold voltage (Vth) is lowered, resulting in a high drain current for state "1"; the data can thus be recognized by sensing the difference in drain current. According to ISE TCAD 10.0 simulations, owing to the trench and airgap-isolation design, the AITFT enlarges the sensing current window by about 212% and extends retention time by about 42% compared with a conventional TFT at a channel length of 150 nm and a temperature of 300 K. Also, owing to the source/drain tie, generated heat dissipates quickly from the source/drain to the substrate, improving thermal stability. In other words, the AITFT improves thermal reliability without losing control of short-channel effects.
30

Study of High Speed Main Amplifier and Low Power Peripheral Circuits for Low Supply Voltage Dynamic Random Access Memory

Chang, Yao-Sheng 09 July 2001 (has links)
Three high-performance circuits for low-supply-voltage DRAMs are presented in this thesis. First, a modified multi-stage sense amplifier is proposed that utilizes an auxiliary transmission gate and a charge recycling technique. The auxiliary NMOS transistor of the multi-stage sense amplifier is replaced by a transmission gate to improve sensing speed, and the charge recycling technique reduces the amplifier's power dissipation. Compared to the conventional multi-stage sense amplifier, sensing time improves by 6.1 ns (24.4%) and power is reduced by 25.6%. Second, an improved Standby Power Reduction (SPR) circuit is reported. A capacitor boosting technique is utilized in the proposed Static Current Cut-off Standby Power Reduction (SCCSPR) circuit, which turns off the always-on MOS transistor of the SPR circuit; power consumption is reduced by 30.9% compared to the conventional SPR circuit. Third, an improved voltage doubler is developed. An indirect switch provides a larger gate-source bias to the PMOS pass transistor, raising current drivability and improving pumping speed; at a 2 V supply voltage, the pumping speed of the modified voltage doubler rises by about 18.6% compared to the conventional voltage doubler. These circuits are applied in a 1-Kbit DRAM. At a 2 V supply voltage, a data access time of 36 ns and a total power consumption of 52.58 mW are attained, reducing access time by 10.3 ns (22.2%) and power consumption by 6.44 mW (11%) compared to the conventional DRAM.
