11

Designing Low Cost Error Correction Schemes for Improving Memory Reliability

January 2017 (has links)
abstract: Memory systems are becoming increasingly error-prone, and thus guaranteeing their reliability is a major challenge. In this dissertation, new techniques to improve the reliability of both 2D and 3D dynamic random access memory (DRAM) systems are presented. The proposed schemes achieve higher reliability than current systems with lower power, better performance and lower hardware cost. First, a low-overhead solution that improves the reliability of commodity DRAM systems with no change in the existing memory architecture is presented. Specifically, five erasure and error correction (E-ECC) schemes are proposed that provide at least Chipkill-Correct protection for x4 (Schemes 1, 2 and 3), x8 (Scheme 4) and x16 (Scheme 5) DRAM systems. All schemes have superior error correction performance due to the use of strong symbol-based codes. In addition, the use of erasure codes extends the lifetime of the 2D DRAM systems. Next, two error correction schemes are presented for 3D DRAM memory systems. The first scheme is a rate-adaptive, two-tiered error correction scheme (RATT-ECC) that provides strong reliability (a 10^10x reduction in raw FIT rate) for an HBM-like 3D DRAM system that services CPU applications. The rate-adaptive feature of RATT-ECC enables permanent bank failures to be handled through sparing. It can also be used to significantly reduce the refresh power consumption without decreasing the reliability and timing performance. The second scheme is a two-tiered error correction scheme (Config-ECC) that supports different-sized accesses in GPU applications with strong reliability. It addresses the mismatch between data access size and fixed-size ECC schemes with a flexible, product-code-based design. Config-ECC is built around a core unit designed for 32B accesses, with a simple extension to support 64B and 128B accesses. Compared to fixed 32B and 64B ECC schemes, Config-ECC reduces the failure in time (FIT) rate by 200x and 20x, respectively. It also reduces the memory energy by 17% (in the dynamic mode) and 21% (in the static mode) compared to a state-of-the-art fixed 64B ECC scheme. / Dissertation/Thesis / Doctoral Dissertation Electrical Engineering 2017
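The product-code idea behind Config-ECC can be made concrete with a toy sketch. The Python below is not the dissertation's scheme (which uses strong symbol-based codes); it only illustrates, with invented parameters, how per-row and per-column parities over a 32B unit let a single corrupted byte be located and repaired, and how larger accesses could stack more such units.

```python
# Toy product-code ECC sketch (illustrative only, not Config-ECC itself).
# Assumed layout: a 32B access is an 8x4 grid of bytes protected by
# per-row and per-column XOR parity; a 64B or 128B access would stack
# two or four such grids around the same core unit.
from functools import reduce
import operator

def xor_bytes(bs):
    return reduce(operator.xor, bs, 0)

def encode(grid):
    """Return (row_parity, col_parity) for a list-of-lists byte grid."""
    row_par = [xor_bytes(row) for row in grid]
    col_par = [xor_bytes(col) for col in zip(*grid)]
    return row_par, col_par

def correct_single_byte(grid, row_par, col_par):
    """Locate and repair one corrupted byte via parity intersection."""
    bad_rows = [i for i, row in enumerate(grid) if xor_bytes(row) != row_par[i]]
    bad_cols = [j for j, col in enumerate(zip(*grid)) if xor_bytes(col) != col_par[j]]
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        i, j = bad_rows[0], bad_cols[0]
        grid[i][j] ^= xor_bytes(grid[i]) ^ row_par[i]  # syndrome gives flipped bits
        return (i, j)
    return None

# Usage: corrupt one byte of a 32B unit, then recover it.
unit = [[(r * 4 + c) & 0xFF for c in range(4)] for r in range(8)]
rp, cp = encode(unit)
unit[3][2] ^= 0x5A
print("repaired at", correct_single_byte(unit, rp, cp), "->", unit[3][2])
```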
12

Gestion hétérogène des données dans les hiérarchies mémoires pour l’optimisation énergétique des architectures multi-coeurs / Read Only Data Specific Management for an Energy Efficient Memory System

Vaumourin, Gregory 04 October 2016 (has links)
The energy consumption of the memory hierarchy is a major concern in current architectures, whether for embedded systems limited by their batteries or for supercomputers limited by their thermal design power. Introducing classification information into the memory system enables heterogeneous management adapted to each particular kind of data. This thesis focuses on read-only data and studies the possibilities for managing it specifically in the memory hierarchy through a compilation/architecture codesign, which opens up new potential in terms of data locality, architectural scalability and cache design. Evaluated by simulation on a multi-core architecture, the proposed solution delivers significant energy savings at constant performance.
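As a hedged illustration of the codesign idea, the sketch below invents a tiny model: a "compiler-produced" table of read-only address ranges, and a memory side that routes those accesses to a simpler structure with no write-back or coherence cost. The names and ranges are hypothetical; the thesis's actual mechanism is more elaborate.

```python
# Illustrative-only model of heterogeneous management of read-only data.
# Assumption: the compiler tags address ranges as read-only; tagged data
# is served by a simpler buffer with no dirty bits or write-back energy.
READ_ONLY_RANGES = [(0x1000, 0x2000)]  # hypothetical compiler output

def is_read_only(addr):
    return any(lo <= addr < hi for lo, hi in READ_ONLY_RANGES)

class EnergyCounter:
    def __init__(self):
        self.ro_buffer_accesses = 0   # cheap path: read-only buffer
        self.cache_accesses = 0       # full-featured cache access

    def access(self, addr):
        if is_read_only(addr):
            self.ro_buffer_accesses += 1
        else:
            self.cache_accesses += 1

counter = EnergyCounter()
for addr in [0x1004, 0x1008, 0x3000, 0x100C, 0x3004]:
    counter.access(addr)
print(counter.ro_buffer_accesses, "read-only accesses,",
      counter.cache_accesses, "regular accesses")
```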
13

Applications of Agent-Based Modeling and Simulation in Organization Management / 組織管理におけるエージェント・ベース・モデル・シミュレーションの応用

WU, JIUN YAN 23 September 2020 (has links)
Kyoto University / 0048 / New-system doctoral program / Doctor of Economics / Degree No. Kō 22717 / Keihaku No. 620 / 新制||経||294 (University Library) / Kyoto University Graduate School of Economics, Department of Economics / (Chief examiner) Professor Tomoki Sekiguchi, Professor Naoki Wakabayashi, Professor Yasuo Sugiyama / Qualified under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Economics / Kyoto University / DGAM
14

Parallel Memory System Architectures for Packet Processing in Network Virtualization / ネットワーク仮想化におけるパケット処理のための並列メモリシステムアーキテクチャ

Korikawa, Tomohiro 23 March 2021 (has links)
Kyoto University / New-system doctoral program / Doctor of Informatics / Degree No. Kō 23326 / Jōhaku No. 762 / 新制||情||130 (University Library) / Kyoto University Graduate School of Informatics, Department of Communications and Computer Engineering / (Chief examiner) Professor Eiji Oki, Professor Masahiro Morikura, Professor Yasuo Okabe / Qualified under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Informatics / Kyoto University / DFAM
15

Exploiting Application Behaviors for Resilient Static Random Access Memory Arrays in the Near-Threshold Computing Regime

Mugisha, Dieudonne Manzi 01 May 2015 (has links)
Near-Threshold Computing embodies an intriguing choice for mobile processors due to the promise of superior energy efficiency, extending the battery life of these devices while reducing the peak power draw. However, process, voltage and temperature variations cause significantly high failure rates of Level One cache cells in the near-threshold regime, in stark contrast to designs in the super-threshold regime, where fault sites are rare. This thesis work shows that faulty cells in the near-threshold regime are highly clustered in certain regions of the cache. In addition, popular mobile benchmarks are studied to investigate the impact of run-time workloads on the manifestation of timing faults. A technique to mitigate these run-time faults is proposed. The scheme maps frequently used data to healthy cache regions by exploiting application cache behaviors. The results show up to a 78% gain in performance over two other state-of-the-art techniques.
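A minimal sketch of the mitigation idea, steering frequently used data away from faulty cache regions, might look as follows. The fault map, hotness counter and promotion threshold are all invented for illustration; the thesis's actual policy differs.

```python
# Toy sketch of fault-aware remapping: hot cache sets whose home set is
# faulty get redirected to known-healthy sets. All parameters assumed.
import collections

NUM_SETS = 16
faulty = {3, 4, 5, 11}                  # clustered faulty sets (assumed fault map)
healthy = [s for s in range(NUM_SETS) if s not in faulty]
hotness = collections.Counter()
remap = {}                              # faulty set -> healthy set
HOT_THRESHOLD = 4

def map_set(addr):
    s = addr % NUM_SETS
    hotness[s] += 1
    if s in faulty:
        if s not in remap and hotness[s] >= HOT_THRESHOLD:
            # promote: give this hot faulty set a healthy home
            remap[s] = healthy[len(remap) % len(healthy)]
        return remap.get(s, s)          # cold faulty sets stay put
    return s

for addr in [3, 19, 35, 51, 67, 7, 11]:
    print(addr, "->", map_set(addr))
```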
16

Generation and the Google Effect: Transactive Memory System Preference Across Age

Siler, Jessica 01 August 2013 (has links)
A transactive memory system (TMS) is a means by which people may store information externally; in such a system the task of remembering is offloaded by remembering where information is located, rather than remembering the information itself. As Sparrow et al. (2011) suggest in the article Google Effects on Memory: Cognitive Consequences of Having Information at Our Fingertips, people are beginning to use the internet and computers as a TMS, and this use is changing the way people encounter and treat information. The purpose of this thesis is to investigate whether preference for TMS type (either books or computers) varies across age groups. An interaction between TMS preference and age was hypothesized. Before the onset of the internet age, information was primarily found in books and other print materials, whereas now the internet is more frequently used; this shift in thinking and habit across generations was therefore expected to emerge in the data. The study yielded a total of 51 participants, 32 from the young age group (ages 18-24) and 19 from the older group (ages 61-81). A modified Stroop task and question blocks (for priming purposes) were employed to examine whether people are prone to think of book- or computer-related sources when in search of information. Also, a "Look up or Learn" tendencies survey was used to better understand how people decide whether certain information should be learned or left to be "looked up" later (Yacci & Rosanski, 2012). The mixed ANOVA did not reveal main effects for question difficulty or TMS type, nor was an interaction with age found. The results were not consistent with those of Sparrow et al. (2011) and did not show significance for TMS preference. Future studies should continue to examine the Google effect and TMS preference, as they bear important applications for a number of fields.
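For readers unfamiliar with the design, a mixed ANOVA of this kind (TMS type as the within-subject factor, age group between subjects) could be run as sketched below. The data are synthetic and the pingouin-based analysis is only an assumed illustration, not the thesis's actual analysis script.

```python
# Synthetic illustration of the study's analysis design (not the original
# data): a mixed ANOVA with TMS type (book vs. computer) within subjects
# and age group between subjects, using the pingouin library.
import numpy as np
import pandas as pd
import pingouin as pg

rng = np.random.default_rng(0)
rows = []
for subj in range(50):
    group = "young" if subj < 30 else "older"
    for tms in ("book", "computer"):
        rt = rng.normal(700, 80)  # reaction time in ms, purely synthetic
        rows.append({"subject": subj, "age_group": group,
                     "tms_type": tms, "rt": rt})
df = pd.DataFrame(rows)

aov = pg.mixed_anova(data=df, dv="rt", within="tms_type",
                     subject="subject", between="age_group")
print(aov[["Source", "F", "p-unc"]])  # between, within, and interaction rows
```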
17

Optimizing Memory Systems for High Efficiency in Computing Clusters

Liu, Wenjie January 2022 (has links)
DRAM-based memory systems suffer from increasingly severe row buffer interference, which causes significant performance degradation and power consumption. With DRAM scaling, the overheads of row buffer interference become even worse due to higher row activation and precharge latency. Clusters have been a prevalent and successful computing framework for processing large amounts of data due to their distributed and parallelized working paradigm. A task submitted to a cluster is typically divided into a number of subtasks that are assigned to different work nodes running the same code but dealing with different, equal-sized portions of the dataset. Because of heterogeneity, work nodes finish their subtasks at different rates, which can easily result in stragglers unfairly slowing down the entire processing. With increasing problem complexity, more irregular applications are deployed on high-performance clusters, and their parallel working paradigm yields irregular memory access behaviors across nodes. However, the irregularity of these memory access behaviors has not been comprehensively studied, which results in low utilization of integrated hybrid memory systems composed of stacked DRAM and off-chip DRAM. This dissertation presents our research results on the three challenges above, in order to optimize the memory system for high efficiency in computing clusters. Details are as follows: To address low row buffer utilization caused by row buffer interference, we propose the Row Buffer Cache (RBC) architecture to efficiently mitigate row buffer interference overheads. At the core of the RBC architecture, DRAM pages with good locality are cached and escape row buffer interference. Such an RBC architecture significantly reduces the overheads caused by row activation and precharge, thus improving overall system performance and energy efficiency. We evaluate RBC using SPEC CPU2006 on a DDR4 memory against a commodity baseline memory system and the state-of-the-art methods DICE and Bingo. Results show that RBC improves memory performance by up to 2.24X (16.1% on average) and reduces overall memory energy by up to 68.2% (23.6% on average) for single-core simulations. For multi-core simulations, RBC increases performance by up to 1.55X (16.7% on average) and reduces energy by up to 35.4% (21.3% on average). Compared with the state-of-the-art methods, RBC outperforms DICE and Bingo by 8% and 5.1% on average in the single-core scenario, and by 10.1% and 4.7% in the multi-core scenario. To mitigate the straggler effect observed in clusters, we aim to speed up straggling work nodes to quicken overall processing by leveraging the exhibited performance variation, and propose StragglerHelper, which conveys the memory access characteristics experienced by the forerunner to the stragglers so that stragglers can be sped up by accurately informed memory prefetching. A Progress Monitor is deployed to supervise the respective progress of the work nodes and forward the forerunner's memory access patterns to straggling nodes. Our evaluation results with SPEC MPI 2007 and BigDataBench on a cluster of 64 work nodes show that StragglerHelper improves the execution time of stragglers by up to 99.5% (61.4% on average), contributing to an overall improvement for the entire cluster of up to 46.7% (9.9% on average) compared to the baseline cluster.
To address the performance differences within irregular applications, we devise a novel method called the Similarity-Managed Hybrid Memory System (SM-HMS) to improve hybrid memory system performance by leveraging the memory access similarity among nodes in a cluster. Within SM-HMS, two techniques are proposed: Memory Access Similarity Measuring and Similarity-based Memory Access Behavior Sharing. To quantify memory access similarity, the memory access behavior of each node is vectorized, and the distance between two vectors is used as the memory access similarity. The calculated memory access similarity is used to share memory access behaviors precisely across nodes. With the shared memory access behaviors, SM-HMS divides the stacked DRAM into two sections, a sliding window section and an outlier section. The shared memory access behaviors guide the replacement of the sliding window section, while the outlier section is managed in an LRU manner. Our evaluation results with a set of irregular applications on various clusters consisting of up to 256 nodes show that SM-HMS outperforms the state-of-the-art approaches Cameo, Chameleon and Hybrid2 on job finish time reduction by up to 58.6%, 56.7% and 31.3%, with 46.1%, 41.6% and 19.3% on average, respectively. SM-HMS can also achieve up to 98.6% (91.9% on average) of the ideal hybrid memory system performance. / Computer and Information Science
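The similarity-measuring step of SM-HMS can be sketched compactly: fold each node's address trace into a per-region access-count vector and compare vectors by distance. The per-region feature and cosine similarity below are assumptions for illustration; the dissertation defines its own vectorization and distance metric.

```python
# Sketch of memory-access-similarity measurement across cluster nodes.
# Each node's behavior becomes a vector of access counts per memory
# region; similar nodes can then share access-behavior hints.
import math

def vectorize(trace, region_size=4096, num_regions=8):
    """Fold an address trace into per-region access counts."""
    v = [0] * num_regions
    for addr in trace:
        v[(addr // region_size) % num_regions] += 1
    return v

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

node_a = vectorize([0x0100, 0x0200, 0x1100, 0x1200, 0x1300])
node_b = vectorize([0x0180, 0x1110, 0x1250, 0x1390, 0x1400])
node_c = vectorize([0x7000, 0x7100, 0x7200, 0x7300, 0x7400])
print("A~B:", round(cosine_similarity(node_a, node_b), 3))  # high: share hints
print("A~C:", round(cosine_similarity(node_a, node_c), 3))  # low: do not share
```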
18

Multi-Core Memory System Design: Developing and using Analytical Models for Performance Evaluation and Enhancements

Dwarakanath, Nagendra Gulur January 2015 (has links) (PDF)
Memory system design is increasingly influencing modern multi-core architectures from both performance and power perspectives. Both main memory latency and bandwidth have improved at a rate that is slower than the increase in processor core count and speed. Off-chip memory, primarily built from DRAM, has received significant attention in terms of architecture and design for higher performance. These performance improvement techniques include sophisticated memory access scheduling, use of multiple memory controllers, mitigating the impact of DRAM refresh cycles, and so on. At the same time, new non-volatile memory technologies have become increasingly viable in terms of performance and energy. These alternative technologies offer different performance characteristics compared to traditional DRAM. With the advent of 3D stacking, on-chip memory in the form of 3D stacked DRAM has opened up avenues for addressing the bandwidth and latency limitations of off-chip memory. Stacked DRAM is expected to offer abundant capacity (100s of MBs to a few GBs) at higher bandwidth and lower latency. Researchers have proposed to use this capacity as an extension to main memory, or as a large last-level DRAM cache. When leveraged as a cache, stacked DRAM provides opportunities and challenges for improving cache hit rate, access latency and off-chip bandwidth. Thus, designing off-chip and on-chip memory systems for multi-core architectures is complex, compounded by the myriad architectural, design and technological choices, combined with the characteristics of application workloads. Applications have inherent spatial locality and access parallelism that influence the memory system response in terms of latency and bandwidth. In this thesis, we construct an analytical model of the off-chip main memory system to comprehend this diverse space and to study the impact of memory system parameters and workload characteristics from latency and bandwidth perspectives. Our model, called ANATOMY, uses a queuing network formulation of the memory system parameterized with workload characteristics to obtain a closed-form solution for the average miss penalty experienced by the last-level cache. We validate the model across a wide variety of memory configurations on quad-core, eight-core and sixteen-core architectures. ANATOMY is able to predict memory latency with average errors of 8.1%, 4.1% and 9.7% over quad-core, eight-core and sixteen-core configurations respectively. Further, ANATOMY identifies better-performing design points accurately, thereby allowing architects and designers to explore the more promising design points in greater detail. We demonstrate the extensibility and applicability of our model by exploring a variety of memory design choices such as the impact of clock speed, the benefit of multiple memory controllers, the role of banks and channel width, and so on. We also demonstrate ANATOMY's ability to capture architectural elements such as memory scheduling mechanisms and the impact of DRAM refresh cycles. In all of these studies, ANATOMY provides insight into sources of memory performance bottlenecks and is able to quantitatively predict the benefit of redressing them. An insight from the model suggests that provisioning multiple small row-buffers in each DRAM bank achieves better performance than the traditional design of one large row-buffer per bank.
Multiple row-buffers also enable newer performance improvement opportunities such as intra-bank parallelism between data transfers and row activations, and smart row-buffer allocation schemes based on workload demand. Our evaluation (both using the analytical model and detailed cycle-accurate simulation) shows that the proposed DRAM re-organization achieves significant speed-up as well as energy reduction. Next we examine the role of on-chip stacked DRAM caches in improving performance by reducing the load on off-chip main memory. We extend ANATOMY to cover DRAM caches. ANATOMY-Cache takes into account all the key parameters and design issues governing DRAM cache organization, namely where the cache metadata is stored and accessed, the role of cache block size and set associativity, and the impact of block size on row-buffer hit rate and off-chip bandwidth. Yet the model is kept simple and provides a closed-form solution for the average miss penalty experienced by the last-level SRAM cache. ANATOMY-Cache is validated against detailed architecture simulations and shown to have latency estimation errors of 10.7% and 8.8% on average in quad-core and eight-core configurations respectively. An interesting insight from the model suggests that under high load, it is better to bypass the congested DRAM cache and leverage the available idle main memory bandwidth. We use this insight to propose a refresh reduction mechanism that virtually eliminates refresh overhead in DRAM caches. We implement a low-overhead hardware mechanism to record accesses to recent DRAM cache pages and refresh only these pages. Older cache pages are considered invalid and serviced from the (idle) main memory. This technique achieves an average refresh reduction of 90% with resulting memory energy savings of 9% and an overall performance improvement of 3.7%. Finally, we propose a new DRAM cache organization that achieves higher cache hit rate, lower latency and lower off-chip bandwidth demand. Called the Bi-Modal Cache, our cache organization brings three independent improvements together: (i) it enables parallel tag and data accesses, (ii) it eliminates a large fraction of tag accesses entirely by use of a novel way locator, and (iii) it improves cache space utilization by organizing the cache sets as a combination of some big blocks (512B) and some small blocks (64B). The Bi-Modal Cache reduces hit latency by use of the way locator and parallel tag and data accesses. It improves hit rate by leveraging the cache capacity efficiently: blocks with low spatial reuse are allocated in the cache at 64B granularity, thereby reducing both wasted off-chip bandwidth and cache internal fragmentation. The increased cache hit rate leads to a reduction in off-chip bandwidth demand. Through detailed simulations, we demonstrate that the Bi-Modal Cache achieves overall performance improvements of 10.8%, 13.8% and 14.0% in quad-core, eight-core and sixteen-core workloads respectively over an aggressive baseline.
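To build intuition for what a queuing-network formulation like ANATOMY buys, the sketch below shrinks the idea to a single-server M/M/1 queue per memory channel, a deliberately crude stand-in for the published model: the average miss penalty is service time plus a queuing delay that explodes as load approaches saturation.

```python
# Miniature stand-in for a queuing-model latency estimate (not ANATOMY
# itself): treat one memory channel as an M/M/1 server with arrival
# rate lam and service rate mu, so mean time in system W = 1/(mu - lam).

def mm1_latency_ns(arrival_rate, service_time_ns):
    """Mean time in system for an M/M/1 queue, in ns."""
    mu = 1.0 / service_time_ns               # service rate, requests/ns
    assert arrival_rate < mu, "queue is unstable at this load"
    return 1.0 / (mu - arrival_rate)

SERVICE_NS = 50.0                             # assumed average DRAM service time
for rate in (0.005, 0.010, 0.015, 0.019):     # arrival rates, requests/ns
    util = rate * SERVICE_NS                  # channel utilization
    print(f"utilization {util:.0%}: miss penalty ~ "
          f"{mm1_latency_ns(rate, SERVICE_NS):.0f} ns")
```

At 25% utilization the miss penalty is close to the raw 50 ns service time, while at 95% it grows to roughly 1000 ns, which is the kind of effect a closed-form model lets an architect predict without simulation.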
19

Power Analysis and Low Power Scheduling Techniques for Intelligent Memory System

Cheng, Lien-Fu 27 July 2001 (has links)
Power consumption is gradually becoming an important issue in the design of computing systems. Most research on low power issues has focused on semiconductor techniques or hardware architecture design, making less use of software optimization techniques. This thesis presents a new source-code-level scheduling methodology for the Intelligent Memory System, which reduces energy consumption by means of code compilation techniques. The scheduling kernel provides two options for users, performance-oriented low power scheduling and energy-oriented low power scheduling, to balance the goals of high performance and low power. Experimental results are also presented and discussed.
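The two scheduling options can be illustrated with a toy selector over candidate schedules, where one objective minimizes estimated runtime and the other minimizes estimated energy under a runtime budget. All schedule names and cost numbers below are invented.

```python
# Toy illustration of the two scheduling objectives (invented costs):
# each candidate schedule has an estimated runtime and energy; the
# performance-oriented option minimizes time, the energy-oriented
# option minimizes energy subject to a runtime budget.

schedules = {
    "burst_loads_first":  {"time_us": 10.0, "energy_uj": 42.0},
    "interleave_compute": {"time_us": 11.5, "energy_uj": 33.0},
    "cluster_by_bank":    {"time_us": 13.0, "energy_uj": 28.0},
}

def performance_oriented(cands):
    return min(cands, key=lambda k: cands[k]["time_us"])

def energy_oriented(cands, time_budget_us):
    ok = {k: v for k, v in cands.items() if v["time_us"] <= time_budget_us}
    return min(ok, key=lambda k: ok[k]["energy_uj"])

print("performance-oriented:", performance_oriented(schedules))
print("energy-oriented (budget 12 us):", energy_oriented(schedules, 12.0))
```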
20

Application-driven Memory System Design on FPGAs

Dai, Zefu 08 January 2014 (has links)
Moore's Law has helped Field Programmable Gate Arrays (FPGAs) scale continuously in speed, capacity and energy efficiency, allowing the integration of ever-larger systems into a single FPGA chip. This challenges the productivity of developers in leveraging the sea of FPGA resources. Higher levels of design abstraction and programming models are needed to improve design productivity, which in turn require memory architectural support on FPGAs. While previous efforts focus on computation-centric applications, we take a bandwidth-centric approach to designing memory systems. In particular, we investigate the scheduling, buffered switching and searching problems, which are common to a wide range of FPGA applications. Although the bandwidth problem has been extensively studied for general-purpose computing and application-specific integrated circuit (ASIC) designs, the proposed techniques are often not applicable to FPGAs. To achieve optimized implementations, designers need to take into consideration both the underlying FPGA physical characteristics and the requirements of applications. We therefore extract design requirements from four driving applications for the selected problems, and address them by exploiting the physical architectures and available resources of FPGAs. Toward solving the selected problems, we advance the state of the art with a scheduling algorithm, a switch organization and a cache analytical model. These lead to performance improvements and resource savings, and make new approaches to well-known problems feasible.
