11 |
Minnestekniker bortom halvledare for inbyggda system / Memory technology for embedded systems beyond semiconductors
Chowdhury, Taseen January 2022
Silicon manufacturers are experiencing semiconductor shortages, and demand for cost-effective, power-efficient embedded memory solutions is increasing. To address these issues, an emerging memory technology called embedded magnetoresistive random access memory (eMRAM), together with a write mechanism based on spin-transfer torque (STT-MRAM), has been proposed. eMRAM is non-volatile, has reduced total energy consumption and fast read/write operation, and has a small macro size compared with semiconductor-based memory types such as SRAM, Flash and EEPROM. The purpose of this study is to investigate eMRAM and how it can be used in a microcontroller to replace all three existing semiconductor-based memory types. The focus is on how a solution can be created with smaller memory chip area, improved energy efficiency and faster read/write operations. A literature review was conducted to determine whether eMRAM does indeed result in better memory characteristics and memory performance, and to determine the requirements for flash-type and SRAM-type applications. The study shows that eMRAM has the potential to enable many solutions for a microcontroller; for example, it could simplify the memory architecture by providing a unified memory solution for code and data storage as well as for working memory. / Bachelor's thesis project in electrical engineering 2022, KTH, Stockholm
|
12 |
IN-MEMORY COMPUTING WITH CMOS AND EMERGING MEMORY TECHNOLOGIES
Shubham Jain (7464389) 17 October 2019
Modern computing workloads such as machine learning and data analytics perform simple computations on large amounts of data. Traditional von Neumann computing systems, which consist of separate processor and memory subsystems, are inefficient in realizing modern computing workloads due to frequent data transfers between these subsystems that incur significant time and energy costs. In-memory computing embeds computational capabilities within the memory subsystem to alleviate the fundamental processor-memory bottleneck, thereby achieving substantial system-level performance and energy benefits. In this dissertation, we explore a new generation of in-memory computing architectures that are enabled by emerging memory technologies and new CMOS-based memory cells. The proposed designs realize Boolean and non-Boolean computations natively within memory arrays.

For Boolean computing, we leverage the unique characteristics of emerging memories that allow multiple word lines within an array to be simultaneously enabled, opening up the possibility of directly sensing functions of the values stored in multiple rows using a single access. We propose Spin-Transfer Torque Compute-in-Memory (STT-CiM), a design for in-memory computing with modifications to peripheral circuits that leverage this principle to perform logic, arithmetic, and complex vector operations. We address the challenge of reliable in-memory computing under process variations by using error detecting and correcting codes to control errors during CiM operations. We demonstrate how STT-CiM can be integrated within a general-purpose computing system and propose architectural enhancements to processor instruction sets and on-chip buses for in-memory computing.

For non-Boolean computing, we explore crossbar arrays of resistive memory elements, which are known to compactly and efficiently realize a key primitive operation involved in machine learning algorithms, i.e., vector-matrix multiplication. We highlight a key challenge involved in this approach: the actual function computed by a resistive crossbar can deviate substantially from the desired vector-matrix multiplication operation due to a range of device- and circuit-level non-idealities. It is essential to evaluate the impact of the errors introduced by these non-idealities at the application level. There has been no study of the impact of non-idealities on the accuracy of large-scale workloads (e.g., Deep Neural Networks [DNNs] with millions of neurons and billions of synaptic connections), in part because existing device and circuit models are too slow to use in application-level evaluation. We propose a Fast Crossbar Model (FCM) to accurately capture the errors arising due to crossbar non-idealities while being four to five orders of magnitude faster than circuit simulation. We also develop RxNN, a software framework to evaluate DNN inference on resistive crossbar systems. Using RxNN, we evaluate a suite of large-scale DNNs developed for the ImageNet Challenge (ILSVRC). Our evaluations reveal that the errors due to resistive crossbar non-idealities can degrade the overall accuracy of DNNs considerably, motivating the need for compensation techniques. Subsequently, we propose CxDNN, a hardware-software methodology that enables the realization of large-scale DNNs on crossbar systems with minimal degradation in accuracy by compensating for errors due to non-idealities. CxDNN comprises (i) an optimized mapping technique to convert floating-point weights and activations to crossbar conductances and input voltages, (ii) a fast re-training method to recover accuracy loss due to this conversion, and (iii) low-overhead compensation hardware to mitigate dynamic and hardware-instance-specific errors. Unlike previous efforts that are limited to small networks and require the training and deployment of hardware-instance-specific models, CxDNN presents a scalable compensation methodology that can address large DNNs (e.g., ResNet-50 on ImageNet) and enables a common model to be trained and deployed on many devices.

For non-Boolean computing, we also propose TiM-DNN, a programmable hardware accelerator that is specifically designed to execute ternary DNNs. TiM-DNN supports various ternary representations including unweighted (-1,0,1), symmetric weighted (-a,0,a), and asymmetric weighted (-a,0,b) ternary systems. TiM-DNN is an in-memory accelerator designed using TiM tiles, specialized memory arrays that perform massively parallel signed vector-matrix multiplications on ternary values per access. TiM tiles are in turn composed of Ternary Processing Cells (TPCs), new CMOS-based memory cells that function as both ternary storage units and signed scalar multiplication units. We evaluate an implementation of TiM-DNN in 32 nm technology using an architectural simulator calibrated with SPICE simulation and RTL synthesis. TiM-DNN achieves a peak performance of 114 TOPS, consumes 0.9 W of power, and occupies 1.96 mm² of chip area, representing a 300X improvement in TOPS/W compared to a state-of-the-art NVIDIA Tesla V100 GPU. In comparison to popular quantized DNN accelerators, TiM-DNN achieves 55.2X-240X and 160X-291X improvement in TOPS/W and TOPS/mm², respectively.

In summary, the dissertation proposes new in-memory computing architectures and addresses the need for scalable modeling frameworks and compensation techniques for resistive-crossbar-based in-memory computing fabrics. Our evaluations show that in-memory computing architectures are promising for realizing modern machine learning and data analytics workloads, and can attain orders-of-magnitude improvement in system-level energy and performance over traditional von Neumann computing systems.
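The ternary representations listed in this abstract (unweighted, symmetric weighted, asymmetric weighted) can be illustrated with a small software model of a signed ternary vector-matrix multiplication. This is only a sketch of the arithmetic, not the TiM tile or TPC circuit described in the dissertation; the function names `decode` and `ternary_vmm` and the `(negative, positive)` scale pairs are hypothetical choices made for illustration.

```python
def decode(code, scale):
    """Map a ternary code in {-1, 0, 1} to a real value.

    scale is a hypothetical (a, b) pair: -1 decodes to -a, +1 decodes to +b.
    (1, 1) gives the unweighted system, (a, a) the symmetric weighted one,
    and (a, b) with a != b the asymmetric weighted one.
    """
    if code not in (-1, 0, 1):
        raise ValueError(f"not a ternary code: {code!r}")
    if code < 0:
        return -scale[0]
    if code > 0:
        return scale[1]
    return 0.0


def ternary_vmm(inputs, weights, in_scale=(1.0, 1.0), w_scale=(1.0, 1.0)):
    """Signed vector-matrix product over ternary-coded inputs and weights.

    inputs  : list of ternary codes, length n
    weights : n rows of ternary codes, each row of length m
    Returns a list of m real-valued dot products.
    """
    n_cols = len(weights[0]) if weights else 0
    out = [0.0] * n_cols
    for x, row in zip(inputs, weights):
        xv = decode(x, in_scale)
        for j, w in enumerate(row):
            out[j] += xv * decode(w, w_scale)
    return out


if __name__ == "__main__":
    x = [1, -1, 0]
    w = [[1, 0], [-1, 1], [1, -1]]
    print(ternary_vmm(x, w))                        # unweighted
    print(ternary_vmm(x, w, w_scale=(0.5, 2.0)))    # asymmetric weights
```

Keeping the stored codes ternary and applying the scale factors at decode time mirrors the idea that the same array contents can serve different weighted representations; how the hardware actually applies scaling is not stated in the abstract.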
|
13 |
Workload Driven Designs for Cost-Effective Non-Volatile Memory Hierarchies
Timothy A Pritchett (9179468) 28 July 2020
Compared to traditional hard-disk drives (HDDs), non-volatile (NV) memory technologies offer significant performance advantages, but they also incur significant cost and asymmetric write performance. A common strategy to manage such cost and performance differentials is to use hierarchies in which a small but intensely accessed working set is staged in the NV storage (selective caching). However, when this working set includes write-heavy data, the low write-lifetime of NV storage necessitates significant over-provisioning to maintain required lifespans (e.g., storage lifespan must match or exceed a 3-year server lifespan). One might think that DRAM-based write-buffers can filter the writes that trickle through to the NV storage and thus alleviate the write pressure on it. Unfortunately, selective caches, when used with common recency-based or frequency-based replacement, have access patterns that require large write buffers (e.g., 100MB+ relative to a 12GB cache) to filter writes adequately. Further, these large DRAM write-buffers also require backup power to ensure the durability of disk writes. More sophisticated replacement policies that combine recency and frequency can reduce the size of the DRAM buffer (while preserving write-filtering), but are so computationally expensive that they can limit the I/O rate, especially for simple controllers (e.g., RAID controller).

My first contribution is the design and implementation of WriteGuard, a self-tuning sieving write-buffer algorithm that filters writes as well as the highly effective (but computationally expensive) algorithms while requiring lightweight computation comparable to a simple LRU-based write-buffer. While WriteGuard reduces the capacity needed for DRAM buffering (to approx. 64 MB), it does not eliminate the need for DRAM buffers (and corresponding power backup).

For my second thrust, I identify two specific application characteristics: (1) the vast majority of the write-buffer’s contents is composed of write-dominant blocks, and (2) the vast majority of blocks in the write-buffer are overwritten within a period of 28 hours. I show that these characteristics help enable a high-density, optimized STT-MRAM as a replacement for DRAM, which enables durable write-buffers (thus eliminating the cost of power backup for the write-buffer). My optimized STT-MRAM-based write buffer achieves higher density by (a) trading off superfluous durability by exploiting characteristic (2), and (b) deoptimizing the read performance of STT-MRAM by leveraging characteristic (1). Together, the techniques increase the density of STT-MRAM by 20% with low or no impact on write-buffer performance.
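To make the notion of write filtering concrete, the following sketch models the simple LRU-based write-buffer that the abstract uses as a baseline. It is not WriteGuard, whose sieving and self-tuning logic the abstract does not describe; the class `LRUWriteBuffer`, the function `filtered_fraction`, and the block-granular trace format are all assumptions made for illustration.

```python
from collections import OrderedDict


class LRUWriteBuffer:
    """Toy write-back buffer: overwrites of a buffered block are absorbed,
    and the least recently written block is flushed when space runs out."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1 block")
        self.capacity = capacity
        self.blocks = OrderedDict()   # block id -> latest data
        self.flushed = 0              # writes forwarded to NV storage

    def write(self, block, data=b""):
        if block in self.blocks:
            # Overwrite hits the buffer; no NV write is generated.
            self.blocks.move_to_end(block)
        elif len(self.blocks) >= self.capacity:
            # Evict the least recently written block to NV storage.
            self.blocks.popitem(last=False)
            self.flushed += 1
        self.blocks[block] = data


def filtered_fraction(trace, capacity):
    """Fraction of writes in `trace` that never reach NV storage.

    The final drain of the buffer is counted as NV writes.
    """
    if not trace:
        return 0.0
    buf = LRUWriteBuffer(capacity)
    for block in trace:
        buf.write(block)
    nv_writes = buf.flushed + len(buf.blocks)
    return 1 - nv_writes / len(trace)


if __name__ == "__main__":
    trace = [1, 2, 1, 2, 3, 1, 1, 4]
    print(filtered_fraction(trace, capacity=2))
```

Running the model over a trace at several capacities shows how the filtered fraction depends on buffer size, which is the trade-off the abstract describes for recency-based replacement.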
|
14 |
Nanoscale resistive switching memory devices: a review
Slesazeck, Stefan; Mikolajick, Thomas 10 November 2022
In this review, the different concepts of nanoscale resistive switching memory devices are described and classified according to their I–V behaviour and the underlying physical switching mechanisms. Using the most important representative devices, the current state of electrical performance characteristics is examined in depth. Moreover, the ability of resistive switching devices to be integrated into state-of-the-art CMOS circuits is assessed, with additional consideration of a suitable selector device for memory array operation. From this analysis, and by factoring in the maturity of the different concepts, a ranking methodology for the application of nanoscale resistive switching memory devices in the memory landscape is derived. Finally, the suitability of the different device concepts for applications beyond pure memory, such as brain-inspired and neuromorphic computing or logic-in-memory applications that strive to overcome the von Neumann bottleneck, is discussed.
|
15 |
Teoretická studie magnetické anizotropie v magnetických tunelových spojích na bázi MgO / Theoretical Study of Magnetic Anisotropy in MgO-based Magnetic Tunnel Junctions
Vojáček, Libor January 2021
The magnetic tunnel junction (MTJ) is a spintronic device used commercially in highly sensitive hard-disk read heads. Since 2007 it has helped sustain the exponential growth of magnetic recording density. In addition, it has become the building block of fast, durable, energy-efficient and non-volatile magnetic random-access memory (MRAM). This new type of solid-state memory, like disk read heads, uses tunnel junctions based on crystalline magnesium oxide (MgO) together with 3d metallic magnetic elements (Fe and Co). To scale down the MTJ while maintaining long-term memory stability against thermal fluctuations, a strong magnetic anisotropy in the direction perpendicular to the metal|MgO interface is required. In this thesis we therefore first analyse the magnetocrystalline anisotropy (MCA) of body-centred cubic Fe, Co and Ni on MgO using ab initio simulations. Next, a program is developed to calculate the shape anisotropy, which matters greatly alongside the MCA because the two sum to the effective anisotropy. Finally, we implement a program for calculating the MCA based on second-order perturbation theory. This allows us to relate the observed anisotropic properties directly to the electronic structure of the system (band structure and density of states).
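The relationship between MCA, shape anisotropy and thermal stability mentioned in this abstract can be written out as follows. This is a textbook thin-film sketch rather than the formulation used in the thesis: the symbols $K_i$ (interfacial anisotropy per unit area), $t$ (magnetic layer thickness), $M_s$ (saturation magnetization) and $V$ (magnetic volume) are assumptions introduced here, as is the infinite-thin-film approximation for the shape term. Positive values favour magnetization perpendicular to the interface.

```latex
% Effective anisotropy as the sum of magnetocrystalline and shape terms
K_{\mathrm{eff}} = K_{\mathrm{MCA}} + K_{\mathrm{shape}}

% Thin-film limit: interface-dominated MCA and demagnetizing (shape) energy
K_{\mathrm{MCA}} \approx \frac{K_i}{t},
\qquad
K_{\mathrm{shape}} \approx -\frac{1}{2}\,\mu_0 M_s^{2}

% Thermal stability factor against fluctuations at temperature T
\Delta = \frac{K_{\mathrm{eff}}\, V}{k_B T}
```

Because the shape term is negative in this limit, perpendicular magnetization requires the interfacial contribution $K_i/t$ to exceed $\mu_0 M_s^2/2$, which is why both quantities have to be evaluated together.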
|
16 |
Device-Circuit Co-Design Employing Phase Transition Materials for Low Power Electronics
Ahmedullah Aziz (7025126) 12 August 2019
Phase transition materials (PTMs) have garnered immense interest in contemporary post-CMOS electronics due to their unique properties, such as electrically driven abrupt resistance switching, hysteresis, and high selectivity. The phase transitions can be attributed to diverse material-specific phenomena, including correlated electrons, filamentary ion diffusion, and dimerization. In this research, we explore the application space for these materials through extensive device-circuit co-design and propose new ideas harnessing their unique electrical properties. The abrupt transitions and high selectivity of PTMs enable steep (< 60 mV/decade) switching characteristics in the Hyper-FET, a promising post-CMOS transistor. We explore a device-circuit co-design methodology for the Hyper-FET and identify the criteria for material down-selection. We evaluate the achievable voltage swing, energy-delay trade-off, and noise response of this novel device. In addition to their application in low-power logic devices, PTMs can actively facilitate non-volatile memory design. We propose a PTM-augmented Spin Transfer Torque (STT) MRAM that utilizes selective phase transitions to boost the sense margin and the stability of stored data simultaneously. We show that such selective transitions can also be used to improve other MRAM designs with separate read/write paths, avoiding the possibility of read-write conflicts. Further, we analyze the application of PTMs as selectors in cross-point memories. We establish a general simulation framework for a cross-point memory array with a PTM-based selector. We explore the biasing constraints, develop a detailed design methodology, and deduce figures of merit for PTM selectors. We also develop a computationally efficient compact model to estimate the leakage through the sneak paths in a cross-point array. Subsequently, we present a new sense amplifier design utilizing PTMs, which offers a built-in tunable reference with low power and area demand. Finally, we show that the hysteretic characteristics of unipolar PTMs can be utilized to achieve highly efficient rectification. We validate the idea by demonstrating significant design improvements in a Cockcroft-Walton multiplier implemented with TS-based rectifiers. We emphasize the need to explore other PTMs with high endurance, thermal stability, and faster switching to enable many more innovative applications in the future.
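The 60 mV/decade figure quoted in this abstract corresponds to the room-temperature thermionic limit on the subthreshold swing of a conventional transistor, $S = \ln(10)\,k_B T / q$. The short calculation below reproduces that number from the SI-defined constants; the function name and the 300 K default are choices made here for illustration and do not come from the dissertation.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant in J/K (exact SI value)
Q_E = 1.602176634e-19   # elementary charge in C (exact SI value)


def thermionic_limit_mv_per_decade(temperature_k=300.0):
    """Minimum subthreshold swing of a thermionic-emission device, in mV/decade."""
    if temperature_k <= 0:
        raise ValueError("temperature must be positive (kelvin)")
    return math.log(10) * K_B * temperature_k / Q_E * 1e3


if __name__ == "__main__":
    for t in (250.0, 300.0, 350.0):
        print(f"{t:.0f} K: {thermionic_limit_mv_per_decade(t):.2f} mV/decade")
```

A device described as steep-slope, such as the Hyper-FET in this abstract, is one whose swing falls below this temperature-dependent bound.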
|