• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 139
  • 20
  • 9
  • 7
  • 6
  • 4
  • 3
  • 2
  • 2
  • 1
  • 1
  • 1
  • 1
  • 1
  • 1
  • Tagged with
  • 218
  • 218
  • 166
  • 150
  • 47
  • 47
  • 43
  • 34
  • 31
  • 29
  • 29
  • 17
  • 16
  • 16
  • 14
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
31

Enhancing memory controllers to improve DRAM power and performance

Hur, Ibrahim, January 1900 (has links) (PDF)
Thesis (Ph. D.)--University of Texas at Austin, 2006. / Vita. Includes bibliographical references.
32

Prefetch mechanisms by application memory access pattern

Agaram, Kartik Kandadai, January 1900 (has links)
Thesis (Ph. D.)--University of Texas at Austin, 2007. / Vita. Includes bibliographical references.
33

Memory Profiling Techniques

Faur, Andrei January 2012 (has links)
Memory profiling is an important technique which aids program optimization and can even help tracking down bugs. The main problem with the current memory profiling techniques and tools is that they slow down the target software considerably therefore making them inadequate for mainline integration. Ideally, the user would be able to monitor memory consumption without having to worry about the rest of the software being affected in any way. This thesis provides a comparison of existing techniques and tools along with the description of a memory profiler implementation which tries to provide a balance between the information it is able to retrieve and the influence it has on the target software.
34

Evaluate the Fragmentation Effect of Different Heap Allocation Algorithms in Linux

Rentas, Dimitris January 2015 (has links)
Modern application are becoming more complex and demanding in terms of resource utilization. LTE network is part of those applications. Efficient memory utilization poses a great challenge to developers. The dynamic memory allocations and de allocations over the program execution time leads to a problem called memory fragmentation, which can eventually lead the system out of memory. Currently there are many allocators that are specifically designed for dynamic memory management. This thesis contains the study and analysis of three different allocators, ptmalloc2, tcmalloc and tlsf. The goal of the thesis is the evaluation of their performance in terms of memory fragmentation and cpu execution time. The allocators are tested against a real program tracing file, which contains a sequence of allocations and deallocations captured from an executing process.
35

Netswap: Network-based Swapping for Server-Embedded Board Clusters

Errabelly, Sandeep 05 July 2023 (has links)
Capital equipment costs and energy costs are the major cost drivers in datacenters. Prior works have explored various techniques, like efficient scheduling algorithms and advanced power management techniques, to maximize resource utilization to reduce the capital and energy costs. The project HEXO has explored heterogeneous-Instruction Set Architecture (ISA) server-embedded clusters to minimize the cost. HEXO's key idea is to migrate stateful virtual machines from high-performance x86-based servers to low-power, low-cost ARM-based embedded boards, reducing server's resource congestion and thereby improving throughput and energy efficiency. However, embedded boards generally have significantly lower onboard memory, typically in the range of 100MB to 4GB. Due to this limitation, high memory-demand applications cannot be migrated to embedded devices. This limits the scope of applications that can be used with heterogeneous-ISA server-embedded clusters such as HEXO. This thesis proposes Netswap, a mechanism that utilizes the server's free memory as remote memory for the embedded board. Netswap comprises three main components: the swap-out and swap-in mechanism, a bitmap-based Free Memory Manager, and the Netswap Remote Daemon. Experimental studies using micro- and macro benchmarks reveal that Netswap improves the throughput and energy efficiency of server-embedded clusters by as much as 40% and 20%, respectively, over server-only baselines. / Master of Science / Datacenters have major expenditures like capital costs and energy expenditures. The project HEXO addresses in reducing these expenditures by including small embedded devices in datacenters. These embedded devices are cheaper and consume less energy than a typical server, but they have limited onboard RAM. The memory limitation restricts HEXO's ability to run applications with higher memory demand. This thesis introduces Netswap, which solves this issue by utilizing the free memory available on the servers as a secondary memory for the connected embedded devices. We discussed various design choices for efficiently implementing such a remote memory mechanism.
36

Intelligent Memory Management Heuristics

Panthulu, Pradeep 12 1900 (has links)
Automatic memory management is crucial in implementation of runtime systems even though it induces a significant computational overhead. In this thesis I explore the use of statistical properties of the directed graph describing the set of live data to decide between garbage collection and heap expansion in a memory management algorithm combining the dynamic array represented heaps with a mark and sweep garbage collector to enhance its performance. The sampling method predicting the density and the distribution of useful data is implemented as a partial marking algorithm. The algorithm randomly marks the nodes of the directed graph representing the live data at different depths with a variable probability factor p. Using the information gathered by the partial marking algorithm in the current step and the knowledge gathered in the previous iterations, the proposed empirical formula predicts with reasonable accuracy the density of live nodes on the heap, to decide between garbage collection and heap expansion. The resulting heuristics are tested empirically and shown to improve overall execution performance significantly in the context of the Jinni Prolog compiler's runtime system.
37

Intelligent Memory Manager: Towards improving the locality behavior of allocation-intensive applications.

Rezaei, Mehran 05 1900 (has links)
Dynamic memory management required by allocation-intensive (i.e., Object Oriented and linked data structured) applications has led to a large number of research trends. Memory performance due to the cache misses in these applications continues to lag in terms of execution cycles as ever increasing CPU-Memory speed gap continues to grow. Sophisticated prefetcing techniques, data relocations, and multithreaded architectures have tried to address memory latency. These techniques are not completely successful since they require either extra hardware/software in the system or special properties in the applications. Software needed for prefetching and data relocation strategies, aimed to improve cache performance, pollutes the cache so that the technique itself becomes counter-productive. On the other hand, extra hardware complexity needed in multithreaded architectures decelerates CPU's clock, since "Simpler is Faster." This dissertation, directed to seek the cause of poor locality behavior of allocation--intensive applications, studies allocators and their impact on the cache performance of these applications. Our study concludes that service functions, in general, and memory management functions, in particular, entangle with application's code and become the major cause of cache pollution. In this dissertation, we present a novel technique that transfers the allocation and de-allocation functions entirely to a separate processor residing in chip with DRAM (Intelligent Memory Manager). Our empirical results show that, on average, 60% of the cache misses caused by allocation and de-allocation service functions are eliminated using our technique.
38

High Performance Architecture using Speculative Threads and Dynamic Memory Management Hardware

Li, Wentong 12 1900 (has links)
With the advances in very large scale integration (VLSI) technology, hundreds of billions of transistors can be packed into a single chip. With the increased hardware budget, how to take advantage of available hardware resources becomes an important research area. Some researchers have shifted from control flow Von-Neumann architecture back to dataflow architecture again in order to explore scalable architectures leading to multi-core systems with several hundreds of processing elements. In this dissertation, I address how the performance of modern processing systems can be improved, while attempting to reduce hardware complexity and energy consumptions. My research described here tackles both central processing unit (CPU) performance and memory subsystem performance. More specifically I will describe my research related to the design of an innovative decoupled multithreaded architecture that can be used in multi-core processor implementations. I also address how memory management functions can be off-loaded from processing pipelines to further improve system performance and eliminate cache pollution caused by runtime management functions.
39

Software-assisted data prefetching algorithms.

January 1995 (has links)
by Chi-sum, Ho. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1995. / Includes bibliographical references (leaves 110-113). / Abstract --- p.i / Acknowledgement --- p.iii / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Overview --- p.1 / Chapter 1.2 --- Cache Memories --- p.1 / Chapter 1.3 --- Improving Cache Performance --- p.3 / Chapter 1.4 --- Improving System Performance --- p.4 / Chapter 1.5 --- Organization of the dissertation --- p.6 / Chapter 2 --- Related Work --- p.8 / Chapter 2.1 --- Cache Performance --- p.8 / Chapter 2.2 --- Non-Blocking Cache --- p.9 / Chapter 2.3 --- Cache Prefetching --- p.10 / Chapter 2.3.1 --- Hardware Prefetching --- p.10 / Chapter 2.3.2 --- Software-assisted Prefetching --- p.13 / Chapter 2.3.3 --- Improving Cache Effectiveness --- p.22 / Chapter 2.4 --- Other Techniques to Reduce and Hide Memory Latencies --- p.25 / Chapter 2.4.1 --- Register Preloading --- p.25 / Chapter 2.4.2 --- Write Policies --- p.26 / Chapter 2.4.3 --- Small Specialized Cache --- p.26 / Chapter 2.4.4 --- Program Transformation --- p.27 / Chapter 3 --- Stride CAM Prefetching --- p.30 / Chapter 3.1 --- Introduction --- p.30 / Chapter 3.2 --- Architectural Model --- p.32 / Chapter 3.2.1 --- Compiler Support --- p.33 / Chapter 3.2.2 --- Hardware Support --- p.35 / Chapter 3.2.3 --- Model Details --- p.39 / Chapter 3.3 --- Optimization Issues --- p.39 / Chapter 3.3.1 --- Eliminating Reductant Prefetching --- p.40 / Chapter 3.3.2 --- Code Motion --- p.40 / Chapter 3.3.3 --- Burst Mode --- p.44 / Chapter 3.3.4 --- Stride CAM Overflow --- p.45 / Chapter 3.3.5 --- Effects of Loop Optimizations --- p.46 / Chapter 3.4 --- Practicability --- p.50 / Chapter 3.4.1 --- Evaluation Methodology --- p.51 / Chapter 3.4.2 --- Prefetch Accuracy --- p.54 / Chapter 3.4.3 --- Stride CAM Size --- p.56 / Chapter 3.4.4 --- Software Overhead --- p.60 / Chapter 4 --- Stride Register Prefetching --- p.67 / Chapter 4.1 --- Motivation --- p.67 / Chapter 4.2 --- Architectural Model --- p.67 / Chapter 4.2.1 --- Stride Register --- p.69 / Chapter 4.2.2 --- Compiler Support --- p.70 / Chapter 4.2.3 --- Prefetch Bits --- p.72 / Chapter 4.2.4 --- Operation Details --- p.77 / Chapter 4.3 --- Practicability and Optimizations --- p.78 / Chapter 4.3.1 --- Practicability on NASA7 Benchmark Programs --- p.78 / Chapter 4.3.2 --- Optimization Issues --- p.81 / Chapter 4.4 --- Comparison Between Stride CAM and Stride Register Models --- p.84 / Chapter 5 --- Small Software-Driven Array Cache --- p.87 / Chapter 5.1 --- Introduction --- p.87 / Chapter 5.2 --- Cache Pollution in MXM --- p.88 / Chapter 5.3 --- Architectural Model --- p.89 / Chapter 5.3.1 --- Operation Details --- p.91 / Chapter 5.4 --- Effectiveness of Array Cache --- p.92 / Chapter 6 --- Conclusion --- p.96 / Chapter 6.1 --- Conclusion --- p.96 / Chapter 6.2 --- Future Research: An Extension of the Stride CAM Model --- p.97 / Chapter 6.2.1 --- Background --- p.97 / Chapter 6.2.2 --- Reference Address Series --- p.98 / Chapter 6.2.3 --- Extending the Stride CAM Model --- p.100 / Chapter 6.2.4 --- Prefetch Overhead --- p.109 / Bibliography --- p.110 / Appendix --- p.114 / Chapter A --- Simulation Results - Stride CAM Model --- p.114 / Chapter A.l --- Execution Time --- p.114 / Chapter A.1.1 --- BTRIX --- p.114 / Chapter A.1.2 --- CFFT2D --- p.115 / Chapter A.1.3 --- CHOLSKY --- p.116 / Chapter A.1.4 --- EMIT --- p.117 / Chapter A.1.5 --- GMTRY --- p.118 / Chapter A.1.6 --- MXM --- p.119 / Chapter A.1.7 --- VPENTA --- p.120 / Chapter A.2 --- Memory Delay --- p.122 / Chapter A.2.1 --- BTRIX --- p.122 / Chapter A.2.2 --- CFFT2D --- p.123 / Chapter A.2.3 --- CHOLSKY --- p.124 / Chapter A.2.4 --- EMIT --- p.125 / Chapter A.2.5 --- GMTRY --- p.126 / Chapter A.2.6 --- MXM --- p.127 / Chapter A.2.7 --- VPENTA --- p.128 / Chapter A.3 --- Overhead --- p.129 / Chapter A.3.1 --- BTRIX --- p.129 / Chapter A.3.2 --- CFFT2D --- p.130 / Chapter A.3.3 --- CHOLSKY --- p.131 / Chapter A.3.4 --- EMIT --- p.132 / Chapter A.3.5 --- GMTRY --- p.133 / Chapter A.3.6 --- MXM --- p.134 / Chapter A.3.7 --- VPENTA --- p.135 / Chapter A.4 --- Hit Ratio --- p.136 / Chapter A.4.1 --- BTRIX --- p.136 / Chapter A.4.2 --- CFFT2D --- p.137 / Chapter A.4.3 --- CHOLSKY --- p.137 / Chapter A.4.4 --- EMIT --- p.138 / Chapter A.4.5 --- GMTRY --- p.139 / Chapter A.4.6 --- MXM --- p.139 / Chapter A.4.7 --- VPENTA --- p.140 / Chapter B --- Simulation Results - Array Cache --- p.141 / Chapter C --- NASA7 Benchmark --- p.145 / Chapter C.1 --- BTRIX --- p.145 / Chapter C.2 --- CFFT2D --- p.161 / Chapter C.2.1 --- cfft2dl --- p.161 / Chapter C.2.2 --- cfft2d2 --- p.169 / Chapter C.3 --- CHOLSKY --- p.179 / Chapter C.4 --- EMIT --- p.192 / Chapter C.5 --- GMTRY --- p.205 / Chapter C.6 --- MXM --- p.217 / Chapter C.7 --- VPENTA --- p.220
40

Data prefetching using hardware register value predictable table.

January 1996 (has links)
by Chin-Ming, Cheung. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1996. / Includes bibliographical references (leaves 95-97). / Abstract --- p.i / Acknowledgement --- p.iii / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Overview --- p.1 / Chapter 1.2 --- Objective --- p.3 / Chapter 1.3 --- Organization of the dissertation --- p.4 / Chapter 2 --- Related Works --- p.6 / Chapter 2.1 --- Previous Cache Works --- p.6 / Chapter 2.2 --- Data Prefetching Techniques --- p.7 / Chapter 2.2.1 --- Hardware Vs Software Assisted --- p.7 / Chapter 2.2.2 --- Non-selective Vs Highly Selective --- p.8 / Chapter 2.2.3 --- Summary on Previous Data Prefetching Schemes --- p.12 / Chapter 3 --- Program Data Mapping --- p.13 / Chapter 3.1 --- Regular and Irregular Data Access --- p.13 / Chapter 3.2 --- Propagation of Data Access Regularity --- p.16 / Chapter 3.2.1 --- Data Access Regularity in High Level Program --- p.17 / Chapter 3.2.2 --- Data Access Regularity in Machine Code --- p.18 / Chapter 3.2.3 --- Data Access Regularity in Memory Address Sequence --- p.20 / Chapter 3.2.4 --- Implication --- p.21 / Chapter 4 --- Register Value Prediction Table (RVPT) --- p.22 / Chapter 4.1 --- Predictability of Register Values --- p.23 / Chapter 4.2 --- Register Value Prediction Table --- p.26 / Chapter 4.3 --- Control Scheme of RVPT --- p.29 / Chapter 4.3.1 --- Details of RVPT Mechanism --- p.29 / Chapter 4.3.2 --- Explanation of the Register Prediction Mechanism --- p.32 / Chapter 4.4 --- Examples of RVPT --- p.35 / Chapter 4.4.1 --- Linear Array Example --- p.35 / Chapter 4.4.2 --- Linked List Example --- p.36 / Chapter 5 --- Program Register Dependency --- p.39 / Chapter 5.1 --- Register Dependency --- p.40 / Chapter 5.2 --- Generalized Concept of Register --- p.44 / Chapter 5.2.1 --- Cyclic Dependent Register(CDR) --- p.44 / Chapter 5.2.2 --- Acyclic Dependent Register(ADR) --- p.46 / Chapter 5.3 --- Program Register Overview --- p.47 / Chapter 6 --- Generalized RVPT Model --- p.49 / Chapter 6.1 --- Level N RVPT Model --- p.49 / Chapter 6.1.1 --- Identification of Level N CDR --- p.51 / Chapter 6.1.2 --- Recording CDR instructions of Level N CDR --- p.53 / Chapter 6.1.3 --- Prediction of Level N CDR --- p.55 / Chapter 6.2 --- Level 2 Register Value Prediction Table --- p.55 / Chapter 6.2.1 --- Level 2 RVPT Structure --- p.56 / Chapter 6.2.2 --- Identification of Level 2 CDR --- p.58 / Chapter 6.2.3 --- Control Scheme of Level 2 RVPT --- p.59 / Chapter 6.2.4 --- Example of Index Array --- p.63 / Chapter 7 --- Performance Evaluation --- p.66 / Chapter 7.1 --- Evaluation Methodology --- p.66 / Chapter 7.1.1 --- Trace-Drive Simulation --- p.66 / Chapter 7.1.2 --- Architectural Method --- p.68 / Chapter 7.1.3 --- Benchmarks and Metrics --- p.70 / Chapter 7.2 --- General Result --- p.75 / Chapter 7.2.1 --- Constant Stride or Regular Data Access Applications --- p.77 / Chapter 7.2.2 --- Non-constant Stride or Irregular Data Access Applications --- p.79 / Chapter 7.3 --- Effect of Design Variations --- p.80 / Chapter 7.3.1 --- Effect of Cache Size --- p.81 / Chapter 7.3.2 --- Effect of Block Size --- p.83 / Chapter 7.3.3 --- Effect of Set Associativity --- p.86 / Chapter 7.4 --- Summary --- p.87 / Chapter 8 --- Conclusion and Future Research --- p.88 / Chapter 8.1 --- Conclusion --- p.88 / Chapter 8.2 --- Future Research --- p.90 / Bibliography --- p.95 / Appendix --- p.98 / Chapter A --- MCPI vs. cache size --- p.98 / Chapter B --- MCPI Reduction Percentage Vs cache size --- p.102 / Chapter C --- MCPI vs. block size --- p.106 / Chapter D --- MCPI Reduction Percentage Vs block size --- p.110 / Chapter E --- MCPI vs. set-associativity --- p.114 / Chapter F --- MCPI Reduction Percentage Vs set-associativity --- p.118

Page generated in 0.0748 seconds