Global ETD Search

81	Intelligent Memory Management Heuristics Panthulu, Pradeep 12 1900 (has links) Automatic memory management is crucial in implementation of runtime systems even though it induces a significant computational overhead. In this thesis I explore the use of statistical properties of the directed graph describing the set of live data to decide between garbage collection and heap expansion in a memory management algorithm combining the dynamic array represented heaps with a mark and sweep garbage collector to enhance its performance. The sampling method predicting the density and the distribution of useful data is implemented as a partial marking algorithm. The algorithm randomly marks the nodes of the directed graph representing the live data at different depths with a variable probability factor p. Using the information gathered by the partial marking algorithm in the current step and the knowledge gathered in the previous iterations, the proposed empirical formula predicts with reasonable accuracy the density of live nodes on the heap, to decide between garbage collection and heap expansion. The resulting heuristics are tested empirically and shown to improve overall execution performance significantly in the context of the Jinni Prolog compiler's runtime system. Memory management (Computer science) Memory management statistical methods partial marking random graphs probability factor
82	Intelligent Memory Manager: Towards improving the locality behavior of allocation-intensive applications. Rezaei, Mehran 05 1900 (has links) Dynamic memory management required by allocation-intensive (i.e., Object Oriented and linked data structured) applications has led to a large number of research trends. Memory performance due to the cache misses in these applications continues to lag in terms of execution cycles as ever increasing CPU-Memory speed gap continues to grow. Sophisticated prefetcing techniques, data relocations, and multithreaded architectures have tried to address memory latency. These techniques are not completely successful since they require either extra hardware/software in the system or special properties in the applications. Software needed for prefetching and data relocation strategies, aimed to improve cache performance, pollutes the cache so that the technique itself becomes counter-productive. On the other hand, extra hardware complexity needed in multithreaded architectures decelerates CPU's clock, since "Simpler is Faster." This dissertation, directed to seek the cause of poor locality behavior of allocation--intensive applications, studies allocators and their impact on the cache performance of these applications. Our study concludes that service functions, in general, and memory management functions, in particular, entangle with application's code and become the major cause of cache pollution. In this dissertation, we present a novel technique that transfers the allocation and de-allocation functions entirely to a separate processor residing in chip with DRAM (Intelligent Memory Manager). Our empirical results show that, on average, 60% of the cache misses caused by allocation and de-allocation service functions are eliminated using our technique. Memory management (Computer science) intelligent memory devices memory management algorithms storage utilization and fragmentation
83	An efficient and scalable core allocation strategy for multicore systems Unknown Date (has links) Multiple threads can run concurrently on multiple cores in a multicore system and improve performance/power ratio. However, effective core allocation in multicore and manycore systems is very challenging. In this thesis, we propose an effective and scalable core allocation strategy for multicore systems to achieve optimal core utilization by reducing both internal and external fragmentations. Our proposed strategy helps evenly spreading the servicing cores on the chip to facilitate better heat dissipation. We introduce a multi-stage power management scheme to reduce the total power consumption by managing the power states of the cores. We simulate three multicore systems, with 16, 32, and 64 cores, respectively, using synthetic workload. Experimental results show that our proposed strategy performs better than Square-shaped, Rectangle-shaped, L-Shaped, and Hybrid (contiguous and non-contiguous) schemes in multicore systems in terms of fragmentation and completion time. Among these strategies, our strategy provides a better heat dissipation mechanism. / by Manira S. Rani. / Thesis (M.S.C.S.)--Florida Atlantic University, 2011. / Includes bibliography. / Electronic reproduction. Boca Raton, Fla., 2011. Mode of access: World Wide Web. Modularity (Engineering) Multicasting (Computer networks) Convergence (Telecommunication) Computer architecture Memory management (Computer science) Cache memory
84	GPU-Acceleration of In-Memory Data Analytics Sitaridi, Evangelia January 2016 (has links) Hardware advances strongly influence the database system design. The flattening speed of CPU cores makes many-core accelerators, such as GPUs, a vital alternative to explore for processing the ever-increasing amounts of data. GPUs have a significantly higher degree of parallelism than multi-core CPUs but their cores are simpler. As a result, they do not face the power constraints limiting the parallelism of CPUs. Their trade-off, however, is the increased implementation complexity. This thesis adapts and redesigns data analytics operators to better exploit the GPU special memory and threading model. Due to the increasing memory capacity and also the user's need for fast interaction with the data, we focus on in-memory analytics. Our techniques span different steps of the data processing pipeline: (1) Data preprocessing, (2) Query compilation, and (3) Algorithmic optimization of the operators. Our data preprocessing techniques adapt the data layout for numeric and string columns to maximize the achieved GPU memory bandwidth. Our query compilation techniques compute the optimal execution plan for conjunctive filters. We formulate \textit{memory divergence} for string matching algorithms and suggest how to eliminate it. Finally, we parallelize decompression algorithms in our compression framework \textit{Gompresso} to fit more data into the limited GPU memory. Gompresso achieves high speed-ups on GPUs over multi-core CPU state-of-the-art libraries and is suitable for any massively parallel processor. Memory management (Computer science) Electronic data processing Graphics processing units Computer science
85	Extending branch prediction information to effective caching. January 1996 (has links) by Chung-Leung, Chiu. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1996. / Includes bibliographical references (leaves 110-113). / Abstract --- p.i / Acknowledgement --- p.iii / Chapter 1 --- Introduction --- p.1 / Chapter 1.1 --- Partial Basic Block Storing Mechanism --- p.1 / Chapter 1.2 --- Data-Tagged Mechanism in Branch Target Buffer --- p.4 / Chapter 1.3 --- Organization of the dissertation --- p.5 / Chapter 2 --- Related Research --- p.7 / Chapter 2.1 --- Branch Prediction --- p.7 / Chapter 2.2 --- Branch History Table --- p.8 / Chapter 2.2.1 --- Performance of Branch History Table in reducing the Branch Penalty --- p.10 / Chapter 2.3 --- Branch Target Cache --- p.10 / Chapter 2.4 --- Early Resolution of Branch --- p.11 / Chapter 2.5 --- Software Inter-block Reorganization --- p.12 / Chapter 2.6 --- Branch Target Buffer --- p.13 / Chapter 2.7 --- Data Prefetching --- p.16 / Chapter 2.7.1 --- Software-Directed Prefetching --- p.16 / Chapter 2.7.2 --- Hardware-based prefetching --- p.17 / Chapter 3 --- New Branch Target Buffer Design --- p.19 / Chapter 3.1 --- Alternate Line Storing --- p.22 / Chapter 3.2 --- Storing More Than One Line On Entering The Dynamic Basic Block --- p.27 / Chapter 4 --- Simulation Environment for New Branch Target Buffer Design --- p.30 / Chapter 4.1 --- Architectural Models and Assumptions --- p.30 / Chapter 4.2 --- Memory Models --- p.33 / Chapter 4.3 --- Evaluation Methodology and Measurement Criteria --- p.34 / Chapter 4.4 --- Description of the Traces --- p.35 / Chapter 4.5 --- Effect of the limitation of ATOM on the statistics of SPEC92 Bench- marks --- p.35 / Chapter 4.6 --- Environments for collecting relevant statistics of SPEC92 Benchmarks --- p.36 / Chapter 5 --- Results for New Branch Target Buffer Design --- p.38 / Chapter 5.1 --- Statistical Results and Analysis for SPEC92 Benchmarks --- p.38 / Chapter 5.2 --- Overall Performance --- p.39 / Chapter 5.3 --- Bus Latency Effect --- p.42 / Chapter 5.4 --- Effect of Cache Size --- p.45 / Chapter 5.5 --- Effect of Line Size --- p.47 / Chapter 5.6 --- Cache Set Associativity --- p.50 / Chapter 5.7 --- Partial Hits --- p.50 / Chapter 5.8 --- Prefetch Accuracy --- p.53 / Chapter 5.9 --- Effect of Prefetch Buffer Size --- p.54 / Chapter 5.10 --- Effect of Storing More Than One Line on Entry of New Dynamic Basic Block --- p.56 / Chapter 6 --- Data References Tagged into Branch Target Buffer --- p.60 / Chapter 6.1 --- Branch History Table Tagged Mechanism --- p.60 / Chapter 6.2 --- Lookahead Technique --- p.65 / Chapter 6.3 --- Default Prefetches Vs Data-tagged Prefetches --- p.71 / Chapter 6.4 --- New Priority Scheme --- p.73 / Chapter 7 --- Architectural Model for Data-Tagged References in Branch Target Buffer --- p.74 / Chapter 7.1 --- Architectural Models and Assumptions --- p.76 / Chapter 7.2 --- Memory Models --- p.79 / Chapter 7.3 --- Evaluation Methodology and Measurement Criteria --- p.79 / Chapter 7.4 --- Description of the Traces --- p.80 / Chapter 7.5 --- Environments for collecting relevant statistics of SPEC92 Benchmarks --- p.80 / Chapter 8 --- Results for Data References Tagged into Branch Target Buffer --- p.82 / Chapter 8.1 --- Statistical Results and Analysis --- p.82 / Chapter 8.2 --- Overall Performance --- p.83 / Chapter 8.3 --- Effect of Branch Prediction --- p.85 / Chapter 8.4 --- Effect of Number of Tagged Registers --- p.87 / Chapter 8.5 --- Effect of Different Tagged Positions in Basic Block --- p.90 / Chapter 8.6 --- Effect of Lookahead Size --- p.91 / Chapter 8.7 --- Prefetch Accuracy --- p.93 / Chapter 8.8 --- Cache Size --- p.95 / Chapter 8.9 --- Line Size --- p.96 / Chapter 8.10 --- Set Associativity --- p.97 / Chapter 8.11 --- Size of Branch History Table --- p.99 / Chapter 8.12 --- Set Associativity of Branch History Table --- p.99 / Chapter 8.13 --- New Priority Scheme Vs Default Priority Scheme --- p.102 / Chapter 8.14 --- Effect of Prefetch-On-Miss --- p.103 / Chapter 8.15 --- Memory Latency --- p.104 / Chapter 9 --- Conclusions and Future Research --- p.106 / Chapter 9.1 --- Conclusions --- p.106 / Chapter 9.2 --- Future Research --- p.108 / Bibliography --- p.110 / Appendix --- p.114 / Chapter A --- Statistical Results - SPEC92 Benchmarks --- p.114 / Chapter A.1 --- Definition of Abbreviations and Terms --- p.114 Computer architecture Memory management (Computer science) Cache memory Buffer storage (Computer science)
86	Transaction logging and recovery on phase-change memory Gao, Shen 01 January 2013 (has links) No description available. Computer storage devices Memory management (Computer science) Transaction systems (Computer systems)
87	Logging Subsystem Performance: Model and Evaluation Clark, Thomas K. 21 October 1994 (has links) Transaction logging is an integral part of ensuring proper transformation of data from one state to another in modern data management. Because of this, the throughput of the logging subsystem can be critical to the throughput of an application. The purpose of this research is to break the log bottleneck at minimum cost. We first present a model for evaluating a logging subsystem, where a logging subsystem is made up of a log device, a log backup device, and the interconnect algorithm between the two, which we term the log backup method. Included in the logging model is a set of criteria for evaluating a logging subsystem and a system for weighting the criteria in order to facilitate comparisons of two logging subsystem configurations to determine the better of the two. We then present an evaluation of each of the pieces of the logging subsystem in order to increase the bandwidth of both the log device and log backup device, while selecting the best log backup method, at minimum cost. We show that the use of striping and RAID is the best alternative for increasing log device bandwidth. Along with our discussion of RAID, we introduce a new RAID algorithm that is designed to overcome the performance problems of small writes in a RAID log. In order to increase the effective bandwidth of the log backup device, we suggest the use of inexpensive magnetic tape drives and striping in the log backup device, where the bandwidth of the log backup device is increased to the point that it matches the bandwidth of the log device. For the log backup interconnect algorithm, we present the novel approach of backing up the log synchronously, where the log backup device is essentially a mirror of the log device, as well as evaluating other log backup interconnect algorithms. Finally, we present a discussion of a prototype implementation of some of the ideas in the thesis. The prototype was implemented in a commercial database system, using a beta version of INFORMIX-OnLine Dynamic Server™ version 6.0. Memory management (Computer science) Data recovery (Computer science) Computer algorithms Computer Sciences
88	Scratch-pad memory management for static data aggregates Li, Lian, Computer Science & Engineering, Faculty of Engineering, UNSW January 2007 (has links) Scratch-pad memory (SPM), a fast on-chip SRAM managed by software, is widely used in embedded systems. Compared to hardware-managed cache, SPM can be more efficient in performance, power and area cost, and has the added advantage of better time predictability. In this thesis, SPMs should be seen in a general context. For example, in stream processors, a software-managed stream register file is usually used to stage data to and from off-chip memory. In IBM's Cell architecture, each co-processor has a software-managed local store for keeping data and instructions. SPM management is critical for SPM-based embedded systems. In this thesis, we propose two novel methodologies, the memory colouring methodology and the perfect colouring methodology, to place the static data aggregates such as arrays and structs of a program in SPM. Our methodologies are dynamic in the sense that some data aggregates can be swapped into and out of SPM during program execution. To this end, a live range splitting heuristic is introduced in order to create potential data transfer statements between SPM and off-chip memory. The memory colouring methodology is a general-purpose compiler approach. The novelty of this approach lies in partitioning an SPM into a pseudo register file then generalising existing graph colouring algorithms for register allocation to colour data aggregates. In this thesis, a scheme for partitioning an SPM into a pseudo register file is introduced. This methodology is inter-procedural and therefore operates on the interference graph for the data aggregates in the whole program. Different graph colouring algorithms may give rise to different results due to live range splitting and spilling heuristics used. As a result, two representative graph colouring algorithms, George and Appel's iterative-coalescing and Park and Moon's optimistic-coalescing, are generalised and evaluated for SPM allocation. Like memory colouring, perfect colouring is also inter-procedural. The novelty of this second methodology lies in formulating the SPM allocation problem as an interval colouring problem. The interval colouring problem is an NP problem and no widely-accepted approximation algorithms exist. The key observation is that the interference graphs for data aggregates in many embedded applications form a special class of superperfect graphs. This has led to the development of two additional SPM allocation algorithms. While differing in whether live range splits and spills are done sequentially or together, both algorithms place data aggregates in SPM based on the cliques in an interference graph. In both cases, we guarantee optimally that all data aggregates in an interference graph can be placed in SPM if the given SPM size is no smaller than the chromatic number of the graph. We have developed two memory colouring algorithms and two perfect colouring algorithms for SPM allocation. We have evaluated them using a set of embedded applications. Our results show that both methodologies are efficient and effective in handling large-scale embedded applications. While neither methodology outperforms the other consistently, perfect colouring has yielded better overall results in the set of benchmarks used in our experiments. All these algorithms are expected to be valuable. For example, they can be made available as part of the same compiler framework to assist the embedded designer with exploring a large number of optimisation opportunities for a particular embedded application. Scratch-pad memory. Compiler. Embedded computer systems. Memory management (Computer science)
89	Memory management strategies to improve the space-time performance of Java programs Yu, Ching-han. January 2006 (has links) Thesis (Ph. D.)--University of Hong Kong, 2006. / Title proper from title frame. Also available in printed format.
90	Memory management strategies to improve the space-time performance of Java programs / Yu, Ching-han. January 2006 (has links) Thesis (Ph. D.)--University of Hong Kong, 2006. / Also available online.

Search results