51

Reducing DRAM Row Activations with Eager Writeback

Jeon, Myeongjae 06 September 2012 (has links)
This thesis describes and evaluates a new approach to optimizing DRAM performance and energy consumption that is based on eagerly writing dirty cache lines to DRAM. Under this approach, dirty cache lines that have not been recently accessed are eagerly written to DRAM when the corresponding row has been activated by an ordinary access, such as a read. This approach enables clustering of reads and writes that target the same row, resulting in a significant reduction in row activations. Specifically, for 29 applications, it reduces the number of DRAM row activations by an average of 38% and a maximum of 81%. The results from a full system simulator show that for the 29 applications, 11 have performance improvements between 10% and 20%, and 9 have improvements in excess of 20%. Furthermore, 10 consume between 10% and 20% less DRAM energy, and 10 have energy consumption reductions in excess of 20%.
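As a rough illustration of the eager-writeback idea summarized above, the sketch below shows a memory-controller helper that, whenever an ordinary access activates a row, also drains dirty cache lines that map to that row and have not been touched recently. It is a minimal sketch under assumed names and thresholds (`EagerWritebackController`, `recency_threshold`), not the thesis's actual design.

```python
# Minimal sketch of an eager-writeback policy: when a demand access opens a
# DRAM row, also write back "cold" dirty cache lines that map to that row,
# so reads and writes to the same row are clustered under one activation.
# All structures and thresholds here are illustrative, not from the thesis.

from collections import defaultdict

class EagerWritebackController:
    def __init__(self, recency_threshold=1000):
        self.recency_threshold = recency_threshold   # cycles since last touch
        self.dirty_by_row = defaultdict(list)        # row id -> dirty cache lines

    def note_dirty(self, row, line, last_access_cycle):
        self.dirty_by_row[row].append((line, last_access_cycle))

    def on_row_activate(self, row, now):
        """Called when an ordinary access (e.g. a read) activates `row`."""
        eager, keep = [], []
        for line, last in self.dirty_by_row[row]:
            # Only lines that have not been touched recently are written eagerly;
            # recently written lines are likely to be dirtied again soon.
            (eager if now - last >= self.recency_threshold else keep).append((line, last))
        self.dirty_by_row[row] = keep
        return [line for line, _ in eager]   # writes issued while the row is open

ctrl = EagerWritebackController()
ctrl.note_dirty(row=7, line=0x1a40, last_access_cycle=100)
print(ctrl.on_row_activate(row=7, now=5000))  # -> [6720], the cold dirty line
```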
52

Multilevel Gain Cell Arrays for Fault-Tolerant VLSI Systems

Khalid, Muhammad Umer January 2011 (has links)
Embedded memories dominate the area, power and cost of modern very large scale integration (VLSI) systems-on-chip (SoCs). Furthermore, due to process variations, it is becoming challenging to design reliable, energy-efficient systems; fault-tolerant designs can therefore be area-efficient, cost-effective and low-power. The idea of this project is to design embedded memories in which reliability is intentionally compromised to increase storage density. Gain cell memories are smaller than SRAM and, unlike DRAM, they are logic compatible. In a multilevel DRAM, storage density is increased by storing two bits per cell without reducing the feature size. This thesis targets multilevel read and write schemes that provide short access time and small area overhead and are highly reliable. First, timing analysis of the reference design is performed for the read and write operations. An analytical model of the write bit line (WBL) is developed to estimate the write delay. A replica technique is designed to generate the delay and track the variations of the storage array; it is implemented by designing a replica column along with read and write control circuits. A memory controller is designed to control the read and write operations in the multilevel DRAM. A multilevel DRAM with a storage capacity of eight kilobits is designed in UMC 90 nm technology. Simulations are performed for testing, and results are reported for energy and access time. Monte Carlo analysis is done to assess the variation tolerance of the replica technique. Finally, the multilevel DRAM with the replica technique is compared with the reference design to check the improvement in access times.
53

A Vertical Middle Partial Insulation Structure for Capacitorless 1T-DRAM Application

Chen, Cheng-Hsin 03 August 2011 (has links)
In this thesis, we propose a novel vertical MOSFET device with a middle partial insulator (MPI), termed VMPI, for capacitorless one-transistor dynamic random access memory (1T-DRAM) applications. In TCAD simulations, we compare the device performance of the planar MPI, the conventional silicon-on-insulator (SOI) device, and our proposed VMPI. Based on numerical simulation, we find that the VMPI device exhibits a pronounced kink effect, which enlarges the programming window. As far as data retention time is concerned, hole carriers leaking into the source region are reduced due to the presence of a large pseudo-neutral region and an effective blocking oxide layer. The retention time can thus be improved by about 5 times compared with the conventional SOI counterpart. Furthermore, it should be noted that the gate-all-around (GAA) VMPI device structure not only increases the body pseudo-neutral region, but also enhances 1T-DRAM performance, suggesting that the proposed VMPI is a promising candidate for 1T-DRAM applications.
54

Study of High Performance Circuits for 2.0V Embedded Dynamic Random Access Memory

Chen, Wei-Shiun 27 July 2000 (has links)
Four high-performance circuit design techniques for embedded DRAM are proposed. First, a high-efficiency negative voltage generator is proposed to provide the negative voltage for the modified word line driver. The negative voltage generator can be manufactured in an n-well CMOS process, and its operation achieves the optimal output voltage. With a 2.0-V supply, an output voltage of -1.6 V is obtained; even when the supply is scaled down to 1.5 V, the output can still reach -1.05 V. In contrast, the output voltage of the traditional generator under a 2.0-V supply is only -0.67 V. Second, a fast word line driver suitable for PMOS pass transistors is proposed. It shortens the turn-on time by 26.8 ns compared with the traditional driver and raises the operating speed by 79%. Third, a new reduced clock-swing driver is proposed. Under a 2.0-V supply and a 100-MHz operating frequency, the total power consumption of the new driver working with the reduced clock-swing flip-flop (RCSFF) is 10% lower than that of the traditional driver working with the RCSFF. Because of this low-power advantage, the new driver is well suited to embedded DRAM applications. Fourth, a modified hierarchical read bus amplifier is proposed. The read bus amplifier is based on the new sense amplifier; it can drive the output at full-swing voltage, improves the sensing speed by 2.1 ns, and, like the traditional N&PMOS cross-coupled amplifier, draws no DC idling current. Finally, the performance of these circuits is integrated and examined in a 1-Kbit embedded DRAM test circuit. A simulated RAS access time of 27.9 ns is achieved under a 2.0-V supply with the loading of a 16-Mbit embedded DRAM, indicating that the proposed circuits can be applied to low-voltage, high-speed embedded DRAM.
55

Mitigating DRAM complexities through coordinated scheduling policies

Stuecheli, Jeffrey Adam 04 June 2012 (has links)
Contemporary DRAM systems have maintained impressive scaling by managing a careful balance between performance, power, and storage density. Achieving these goals, however, has come at the cost of significantly increased operational complexity. To realize good performance, systems must properly manage the significant number of structural and timing restrictions of DRAM devices. DRAM's efficient use is further complicated in many-core systems, where the memory interface has to be shared among multiple cores/threads competing for memory bandwidth. In computer architecture, caches have primarily been viewed as a means to hide memory latency from the CPU. Cache policies have focused on anticipating the CPU's data needs, and are mostly oblivious to the main memory. This work demonstrates that the era of many-core architectures has created new main memory bottlenecks, and mandates a new approach: coordination of cache policy with main memory characteristics. Using the cache for memory optimization purposes dramatically expands the memory controller's visibility of processor behavior, at low implementation overhead. Through memory-centric modification of existing policies, such as scheduled writebacks, this work demonstrates that the performance-limiting effects of highly-threaded architectures combined with complex DRAM operation can be overcome. This work shows that, with awareness of the physical main memory layout and a focus on writes, average read and write latency can be shortened, memory power reduced, and overall system performance improved. The use of the "Page-Mode" feature of DRAM devices can mitigate many DRAM constraints. Current open-page policies attempt to garner the highest level of page hits. In an effort to achieve this, such greedy schemes map sequential address sequences to a single DRAM resource. This non-uniform resource usage pattern introduces high levels of conflict when multiple workloads in a many-core system map to the same set of resources. This work presents a scheme that provides a careful balance between the benefits (increased performance and decreased power) and the detractors (unfairness) of page-mode accesses. In the proposed Minimalist approach, the system targets "just enough" page-mode accesses to garner page-mode benefits while avoiding system unfairness. This is accomplished with the use of a fair memory hashing scheme to control the maximum number of page-mode hits. High-density memory is becoming ever more important as many execution streams are consolidated onto single-chip many-core processors. DRAM is ubiquitous as a main memory technology, but while DRAM's per-chip density and frequency continue to scale, the time required to refresh its dynamic cells has grown at an alarming rate. This work shows how currently-employed methods to schedule refresh operations are ineffective in mitigating the significant performance degradation caused by longer refresh times. Current approaches are deficient: they do not effectively exploit the flexibility of DRAMs to postpone refresh operations. This work proposes dynamically reconfigurable predictive mechanisms that exploit the full dynamic range allowed in the industry-standard DRAM specifications. The proposed mechanisms are shown to mitigate much of the penalty seen with dense DRAM devices. In summary, this work presents a significant improvement in the ability to exploit the capabilities of high-density, high-frequency DRAM devices in a many-core environment. This is accomplished through coordination of previously disparate system components, exploiting their integration into highly integrated system designs.
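The "Minimalist" notion of targeting "just enough" page-mode accesses can be pictured with a small address-mapping sketch: keep a short run of consecutive cache blocks in one row, then permute the bank index so long sequential streams spread across banks instead of monopolizing a single row. This is a loose interpretation under invented names and field widths, not the dissertation's actual hashing scheme.

```python
# Rough sketch of a "just enough" page-mode address mapping: a small run of
# consecutive cache blocks stays in one row (allowing a few page hits), after
# which the bank index is permuted so long sequential streams spread across
# banks. Field widths and the hash are illustrative only.

def minimalist_map(block_addr, banks=8, blocks_per_run=4):
    run_id = block_addr // blocks_per_run            # which short run this block is in
    offset = block_addr % blocks_per_run             # position within the run
    # XOR-hash the run id with higher address bits so successive runs land in
    # different banks, bounding page-mode hits to roughly `blocks_per_run`.
    bank = (run_id ^ (run_id // banks)) % banks
    row = block_addr // (banks * blocks_per_run)
    return bank, row, offset

for b in range(12):                                  # a short sequential stream
    print(b, minimalist_map(b))                      # bank changes every 4 blocks
```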
56

Architecting heterogeneous memory systems with 3D die-stacked memory

Sim, Jae Woong 21 September 2015 (has links)
The main objective of this research is to efficiently enable 3D die-stacked memory and heterogeneous memory systems. 3D die-stacking is an emerging technology that allows for large amounts of in-package high-bandwidth memory storage. Die-stacked memory has the potential to provide extraordinary performance and energy benefits for computing environments, from data-intensive to mobile computing. However, incorporating die-stacked memory into computing environments requires innovations across the system stack, from hardware to software. This dissertation presents several architectural innovations to practically deploy die-stacked memory into a variety of computing systems. First, this dissertation proposes using die-stacked DRAM as a hardware-managed cache in a practical and efficient way. The proposed DRAM cache architecture employs two novel techniques: hit-miss speculation and self-balancing dispatch. The proposed techniques virtually eliminate the hardware overhead of maintaining a multi-megabyte SRAM structure when scaling to gigabytes of stacked DRAM cache, and improve overall memory bandwidth utilization. Second, this dissertation proposes a DRAM cache organization that provides a high level of reliability for die-stacked DRAM caches in a cost-effective manner. The proposed DRAM cache uses error-correcting codes (ECCs), strong checksums (CRCs), and dirty data duplication to detect and correct a wide range of stacked DRAM failures, from traditional bit errors to large-scale row, column, bank, and channel failures, within the constraints of commodity, non-ECC DRAM stacks. With only a modest performance degradation compared to a DRAM cache with no ECC support, the proposed organization can correct all single-bit failures, and 99.9993% of all row, column, and bank failures. Third, this dissertation proposes architectural mechanisms to use large, fast, on-chip memory structures as part of memory (PoM) seamlessly through the hardware. The proposed design achieves the performance benefit of on-chip memory caches without sacrificing a large fraction of total memory capacity to serve as a cache. To achieve this, PoM implements the ability to dynamically remap regions of memory based on their access patterns and expected performance benefits. Lastly, this dissertation explores a new usage model for die-stacked DRAM involving a hybrid of caching and virtual memory support. In the common case, where the system's physical memory is not over-committed, die-stacked DRAM operates as a cache to provide performance and energy benefits to the system. However, when the workload's active memory demands exceed the capacity of the physical memory, the proposed scheme dynamically converts the stacked DRAM cache into a fast swap device to avoid the otherwise grievous performance penalty of swapping to disk.
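As a hedged illustration of hit-miss speculation for a stacked-DRAM cache, the sketch below uses a small table of saturating counters to guess whether a request will hit, so that a predicted miss can be sent to off-package memory without waiting on an in-DRAM tag check. The structure, sizes, and indexing are assumptions for illustration, not the dissertation's mechanism.

```python
# Loose sketch of hit/miss speculation for a die-stacked DRAM cache: instead
# of consulting a huge on-die SRAM tag store, a small table of saturating
# counters (indexed by a hash of the requesting page) guesses whether a
# request will hit; a predicted miss is forwarded to off-package memory
# immediately. Purely illustrative sizes and hashing.

class HitMissPredictor:
    def __init__(self, entries=1024, bits=3):
        self.max_val = (1 << bits) - 1
        self.counters = [self.max_val // 2] * entries   # start at mid-range

    def _index(self, addr, page_bits=12):
        return (addr >> page_bits) % len(self.counters)

    def predict_hit(self, addr):
        return self.counters[self._index(addr)] > self.max_val // 2

    def update(self, addr, was_hit):
        i = self._index(addr)
        if was_hit:
            self.counters[i] = min(self.counters[i] + 1, self.max_val)
        else:
            self.counters[i] = max(self.counters[i] - 1, 0)

pred = HitMissPredictor()
addr = 0x12345000
print(pred.predict_hit(addr))     # initial guess from the mid-range counter
pred.update(addr, was_hit=False)  # train toward "miss" for this page
```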
57

DRAM-aware prefetching and cache management

Lee, Chang Joo, 1975- 11 February 2011 (has links)
Main memory system performance is crucial for high performance microprocessors. Even though the peak bandwidth of main memory systems has increased through improvements in the microarchitecture of Dynamic Random Access Memory (DRAM) chips, conventional on-chip memory systems of microprocessors do not fully take advantage of it. This results in underutilization of the DRAM system; in other words, many cycles on the DRAM data bus sit idle. The main reason for this is that conventional on-chip memory system designs do not fully take into account important DRAM characteristics. Therefore, the high bandwidth of DRAM-based main memory systems cannot be realized and exploited by the processor. This dissertation identifies three major performance-related characteristics that can significantly affect DRAM performance and makes a case for DRAM characteristic-aware on-chip memory system design. We show that on-chip memory resource management policies (such as prefetching, buffer, and cache policies) that are aware of these DRAM characteristics can significantly enhance entire system performance. The key idea of the proposed mechanisms is to send the DRAM system useful memory requests that can be serviced with low latency or in parallel with other requests, rather than requests that must be serviced with high latency or serially. Our evaluations demonstrate that each of the proposed DRAM-aware mechanisms significantly improves performance by increasing DRAM utilization for useful data. We also show that when employed together, the performance benefit of each mechanism is achieved additively: they work synergistically and significantly improve the overall system performance of both single-core and Chip MultiProcessor (CMP) systems.
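One way to picture the "DRAM-aware" request ordering described above is a priority function that issues row-hit requests first, then requests to idle banks that can proceed in parallel, and defers row conflicts. The sketch below is illustrative only; the function name, request representation, and tie-breaking are assumptions, not the dissertation's scheduler.

```python
# Small sketch of DRAM-aware ordering: among pending requests, prefer those
# that hit a currently open row (low latency), then those to idle banks with
# no open row (can be activated in parallel), leaving row conflicts and busy
# banks for last. Names and structures are invented for illustration.

def dram_aware_order(requests, open_rows, busy_banks):
    """requests: list of (bank, row); open_rows: dict bank -> open row id."""
    def priority(req):
        bank, row = req
        opened = open_rows.get(bank)
        if opened == row:
            return 0      # row hit: cheapest, issue first
        if opened is None and bank not in busy_banks:
            return 1      # closed, idle bank: activation can proceed in parallel
        return 2          # row conflict or busy bank: defer
    return sorted(requests, key=priority)

pending = [(0, 7), (1, 3), (0, 9), (2, 5)]
print(dram_aware_order(pending, open_rows={0: 7, 1: 4}, busy_banks={1}))
# -> [(0, 7), (2, 5), (1, 3), (0, 9)]
```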
58

Equipment for measuring cosmic-ray effects on DRAM

Jonsson, Per-Axel January 2007 (has links)
Nuclear particles hitting the silicon in an electronic device can cause a change in the data stored in a memory bit cell or in a flip-flop. The device still works, but the data is corrupted; this is called a soft error. A soft error caused by a single nuclear particle is called a single event upset, and these are a growing problem. Research is ongoing at Saab into how susceptible random access memories are to protons and neutrons. This thesis describes the development of equipment for measuring cosmic-ray effects on DRAM in the laboratory. The system is built on existing hardware with an FPGA as the core unit. A short history of soft errors and their causes is also given, along with an explanation of how a DRAM works, its basic operation, and how it differs from an SRAM. The result is a working system ready to be used.
59

Preparation of BaxSr1-xTiO3 Thin Films by Chemical Solution Deposition and Their Electrical Characterization

Adem, Umut 01 January 2004 (has links) (PDF)
In this study, barium strontium titanate (BST) thin films with different compositions (Ba0.9Sr0.1TiO3, Ba0.8Sr0.2TiO3, Ba0.7Sr0.3TiO3, Ba0.5Sr0.5TiO3) were produced by the chemical solution deposition technique. BST solutions were prepared by dissolving barium acetate, strontium acetate and titanium isopropoxide in acetic acid and adding ethylene glycol as a chelating agent and stabilizer, at an acetic acid to ethylene glycol molar ratio of 3:1. The solution was then coated on Si and Pt/Ti/SiO2/Si substrates at 4000 rpm for 30 seconds. Crack-free films were obtained up to 600 nm thickness after 3 coating-pyrolysis cycles using 0.4 M solutions. The crystal structure of the films was determined by x-ray diffraction, while the morphological properties of the surface and the film-substrate interface were examined by scanning electron microscopy (SEM). The dielectric constant, dielectric loss and ferroelectric parameters of the films were measured. Sintering temperature, film composition and film thickness were varied in order to observe the effect of these parameters on the measured electrical properties. The dielectric constant of the films decreased slightly over the 1 kHz-1 MHz range. The dielectric constant and loss of the films were comparable to those of chemical-solution-deposition-derived films reported in the literature. The maximum dielectric constant was obtained for the Ba0.7Sr0.3TiO3 composition at a sintering temperature of 800 °C for a duration of 3 hours. The dielectric constant increased whereas the dielectric loss decreased with increasing film thickness. BST films have a composition-dependent Curie temperature; for Ba content greater than 70%, the material is in the ferroelectric state. However, the fine grain size of the films associated with chemical solution deposition, together with Sr doping, causes suppression of ferroelectric behaviour in BST films. Therefore, slim hysteresis loops with very low remanent polarization values were obtained only for the Ba0.9Sr0.1TiO3 composition.
60

Microarchitectural techniques to reduce energy consumption in the memory hierarchy

Ghosh, Mrinmoy 03 April 2009 (has links)
This thesis states that dynamic profiling of the memory reference stream can improve energy and performance in the memory hierarchy. The research presented in this thesis provides multiple instances of using lightweight hardware structures to profile the memory reference stream. The objective of this research is to develop microarchitectural techniques to reduce energy consumption at different levels of the memory hierarchy. Several simple and implementable techniques were developed as part of this research. One of the techniques identifies and eliminates redundant refresh operations in DRAM and thereby reduces DRAM refresh power. Another reduces leakage energy in L2 and higher-level caches for multiprocessor systems. The emphasis of this research has been on developing several techniques for obtaining energy savings in caches using a simple hardware structure called the counting Bloom filter (CBF). CBFs have been used to predict L2 cache misses and obtain energy savings by not accessing the L2 cache on a predicted miss. A simple extension of this technique allows CBFs to perform way estimation for set-associative caches to reduce energy in cache lookups. Another technique uses CBFs to track addresses in a virtual cache and reduce false synonym lookups. Finally, this thesis presents a technique to reduce dynamic power consumption in level-one caches using significance compression. The significant energy and performance improvements demonstrated by the techniques presented in this thesis suggest that this work will be of great value for designing the memory hierarchies of future computing platforms.
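For readers unfamiliar with the structure, the sketch below shows a generic counting Bloom filter used for miss prediction: if every probed counter is zero, the block is certainly absent and the L2 lookup can be skipped. The hash functions and sizes are arbitrary placeholders, not the thesis's hardware design.

```python
# Minimal counting Bloom filter of the kind such techniques build on: small
# counters track inserted cache-block addresses, so a zero in every probed
# counter guarantees the block is absent (a "predicted miss") and the L2
# lookup energy can be skipped. Hash choices and sizes here are arbitrary.

class CountingBloomFilter:
    def __init__(self, size=4096, hashes=3):
        self.counters = [0] * size
        self.hashes = hashes
        self.size = size

    def _probes(self, addr):
        for i in range(self.hashes):
            # simple mixed hashes; real designs use cheap hardware hash functions
            yield (addr * (2 * i + 1) + (addr >> (8 + i))) % self.size

    def insert(self, addr):              # on cache-line allocation
        for p in self._probes(addr):
            self.counters[p] += 1

    def remove(self, addr):              # on eviction (why counters, not bits)
        for p in self._probes(addr):
            self.counters[p] -= 1

    def maybe_present(self, addr):       # False => certainly not in the cache
        return all(self.counters[p] > 0 for p in self._probes(addr))

cbf = CountingBloomFilter()
cbf.insert(0xdeadbeef)
print(cbf.maybe_present(0xdeadbeef), cbf.maybe_present(0x12345678))  # True False
```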
