61

Intelligent Memory Management Heuristics

Panthulu, Pradeep 12 1900
Automatic memory management is crucial in the implementation of runtime systems, even though it induces significant computational overhead. In this thesis I explore the use of statistical properties of the directed graph describing the set of live data to decide between garbage collection and heap expansion in a memory management algorithm that combines dynamic-array-represented heaps with a mark-and-sweep garbage collector, with the aim of improving its performance. The sampling method that predicts the density and distribution of useful data is implemented as a partial marking algorithm. The algorithm randomly marks the nodes of the directed graph representing the live data at different depths with a variable probability factor p. Using the information gathered by the partial marking algorithm in the current step and the knowledge gathered in previous iterations, the proposed empirical formula predicts the density of live nodes on the heap with reasonable accuracy, and this prediction drives the choice between garbage collection and heap expansion. The resulting heuristics are tested empirically and shown to improve overall execution performance significantly in the context of the Jinni Prolog compiler's runtime system.
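The abstract does not give the empirical formula, so the following is only a minimal sketch of the general idea under stated assumptions: a randomized partial marking pass that follows each edge with a fixed probability `p`, plus a hypothetical calibration step that remembers, from earlier full collections, what fraction of the true live set the partial pass usually reaches. The names `partial_mark` and `HeapPolicy`, the smoothing rule and the 0.5 density threshold are all illustrative assumptions, not the thesis's method.

```python
import random


def partial_mark(graph, roots, p, rng: random.Random):
    """Depth-first marking from the roots that follows each outgoing edge
    only with probability p; returns the sampled set of marked nodes."""
    marked = set()
    stack = list(roots)
    while stack:
        node = stack.pop()
        if node in marked:
            continue
        marked.add(node)
        for child in graph.get(node, ()):
            if child not in marked and rng.random() < p:
                stack.append(child)
    return marked


class HeapPolicy:
    """Hypothetical policy: predict live-data density from a partial mark
    and choose between a collection and a heap expansion."""

    def __init__(self, p=0.5, density_threshold=0.5, smoothing=0.5):
        self.p = p
        self.threshold = density_threshold
        self.smoothing = smoothing
        self.coverage = p  # assumed initial ratio (sampled live) / (true live)

    def predict_density(self, graph, roots, heap_size, rng):
        sampled = len(partial_mark(graph, roots, self.p, rng))
        live_estimate = min(heap_size, sampled / self.coverage)
        return live_estimate / heap_size

    def decide(self, graph, roots, heap_size, rng):
        # A dense heap means a collection would reclaim little: grow instead.
        density = self.predict_density(graph, roots, heap_size, rng)
        return "expand" if density > self.threshold else "collect"

    def calibrate(self, sampled, true_live):
        # After a real (full) collection, blend the observed coverage ratio
        # into the running estimate used by later predictions.
        if true_live:
            ratio = sampled / true_live
            self.coverage = (1 - self.smoothing) * self.coverage + self.smoothing * ratio
```

Collecting when the predicted density is low and expanding when it is high reflects the trade-off the abstract describes: a mark-and-sweep pass over a mostly live heap costs a full traversal but frees little memory.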
62

Intelligent Memory Manager: Towards improving the locality behavior of allocation-intensive applications.

Rezaei, Mehran 05 1900
Dynamic memory management required by allocation-intensive (i.e., object-oriented and linked-data-structure) applications has given rise to many lines of research. As the CPU-memory speed gap keeps growing, the memory performance of these applications, dominated by cache misses, increasingly lags in terms of execution cycles. Sophisticated prefetching techniques, data relocation, and multithreaded architectures have tried to address memory latency. These techniques are not completely successful, since they require either extra hardware/software in the system or special properties in the applications. The software needed for prefetching and data relocation strategies, which aim to improve cache performance, pollutes the cache, so that the technique itself becomes counter-productive. On the other hand, the extra hardware complexity needed in multithreaded architectures slows the CPU clock, since "Simpler is Faster." This dissertation, which seeks the cause of the poor locality behavior of allocation-intensive applications, studies allocators and their impact on the cache performance of these applications. Our study concludes that service functions in general, and memory management functions in particular, become entangled with the application's code and are the major cause of cache pollution. In this dissertation, we present a novel technique that transfers the allocation and de-allocation functions entirely to a separate processor residing on chip with the DRAM (the Intelligent Memory Manager). Our empirical results show that, on average, 60% of the cache misses caused by allocation and de-allocation service functions are eliminated using our technique.
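As a rough software analogy, and not the dissertation's hardware design, the sketch below models the decoupling: a dedicated worker thread stands in for the processor on the DRAM chip and is the only code that touches allocator metadata, while the application thread merely enqueues requests. The class name `OffloadedAllocator` and its fixed-size free-list policy are hypothetical, and a Python model says nothing about real cache behavior.

```python
import queue
import threading


class OffloadedAllocator:
    """Toy model of allocation offloading: a dedicated worker thread owns all
    allocator metadata (a free list of fixed-size blocks) and serves requests,
    so the application thread never runs allocator code itself."""

    def __init__(self, heap_size, block_size):
        self._free = list(range(0, heap_size, block_size))  # free block addresses
        self._requests = queue.Queue()
        self._worker = threading.Thread(target=self._serve, daemon=True)
        self._worker.start()

    def _serve(self):
        while True:
            op, arg, reply = self._requests.get()
            if op == "malloc":
                reply.put(self._free.pop() if self._free else None)
            elif op == "free":
                self._free.append(arg)  # no reply: the caller does not wait
            elif op == "stop":
                reply.put(None)
                return

    def malloc(self):
        reply = queue.Queue(maxsize=1)
        self._requests.put(("malloc", None, reply))
        return reply.get()  # synchronous: the caller needs the address

    def free(self, addr):
        self._requests.put(("free", addr, None))  # fire-and-forget

    def stop(self):
        reply = queue.Queue(maxsize=1)
        self._requests.put(("stop", None, reply))
        reply.get()
        self._worker.join()
```

Making `free` asynchronous mirrors the intuition behind offloading: de-allocation bookkeeping can proceed off the application's critical path, and because requests are served in FIFO order, a later `malloc` still sees the freed block.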
63

Volatile Memory Message Carving: A "per process basis" Approach

Ali-Gombe, Aisha Ibrahim 01 December 2012
The pace at which data transfer and storage have shifted from PCs to mobile devices is of great concern to the digital forensics community. Android is fast becoming the operating system of choice for these hand-held devices, so the need to develop better forensic techniques for data recovery cannot be overemphasized. This thesis analyzes the volatile memory of Motorola Android devices, shifting from traditional physical memory extraction to carving residues of data on a “per process basis”. Each Android application runs in a separate process within its own Dalvik Virtual Machine (DVM) instance, which motivates the proposed “per process basis” approach. To extract messages, we first extract the runtime memory of the MotoBlur application, then carve and reconstruct both deleted and undeleted messages (emails and chat messages). An experimental study covering two Android phones is also presented.
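A minimal sketch of the carving step, assuming the memory of a single process has already been acquired as raw bytes (acquisition is device-specific and not shown). It scans for email-like ASCII strings and for printable UTF-16LE runs, since Java-level strings are commonly stored as UTF-16. The regular expressions and the six-character minimum are illustrative assumptions; they do not reproduce the thesis's MotoBlur-specific reconstruction.

```python
import re

# Illustrative patterns (assumptions): email-like ASCII strings, and runs of
# at least 6 printable characters encoded as UTF-16LE (char followed by 0x00).
EMAIL_RE = re.compile(rb"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
UTF16_RE = re.compile(rb"(?:[\x20-\x7e]\x00){6,}")


def carve_strings(dump: bytes):
    """Return (emails, utf16_texts) carved from one process's memory dump."""
    emails = [m.group().decode("ascii") for m in EMAIL_RE.finditer(dump)]
    utf16 = [m.group().decode("utf-16-le") for m in UTF16_RE.finditer(dump)]
    return emails, utf16
```

Working per process keeps the search space small and ties every recovered string to the application that produced it; a real tool would then parse the surrounding object layout to tell message bodies apart from incidental strings.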
64

An efficient and scalable core allocation strategy for multicore systems

Unknown Date
Multiple threads can run concurrently on the cores of a multicore system, improving the performance/power ratio. However, effective core allocation in multicore and manycore systems is very challenging. In this thesis, we propose an effective and scalable core allocation strategy for multicore systems that achieves optimal core utilization by reducing both internal and external fragmentation. Our proposed strategy spreads the servicing cores evenly across the chip to facilitate better heat dissipation. We also introduce a multi-stage power management scheme that reduces total power consumption by managing the power states of the cores. We simulate three multicore systems, with 16, 32, and 64 cores, respectively, using a synthetic workload. Experimental results show that our proposed strategy outperforms the Square-shaped, Rectangle-shaped, L-shaped, and Hybrid (contiguous and non-contiguous) schemes in terms of fragmentation and completion time, and that it provides better heat dissipation than these strategies. / by Manira S. Rani. / Thesis (M.S.C.S.)--Florida Atlantic University, 2011. / Includes bibliography. / Electronic reproduction. Boca Raton, Fla., 2011. Mode of access: World Wide Web.
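The abstract does not spell out the allocation algorithm, so the sketch below is a generic illustration on an assumed rows × cols mesh, not the thesis's strategy. It allocates exactly k cores (so no internal fragmentation), prefers an exact-area rectangle found by first fit, and falls back to any free cores so that a request is never rejected while k cores are free (avoiding external fragmentation). The class name `MeshAllocator` is hypothetical; heat-aware spreading and power-state management are not modeled.

```python
from itertools import product


class MeshAllocator:
    """Hypothetical core allocator for a rows x cols mesh of cores."""

    def __init__(self, rows, cols):
        self.rows, self.cols = rows, cols
        self.owner = {}  # (row, col) -> job id of the core's current owner

    def free_cores(self):
        return [rc for rc in product(range(self.rows), range(self.cols))
                if rc not in self.owner]

    def _find_rectangle(self, k):
        # Try every h x w shape with h * w == k, first fit over positions.
        for h in range(1, k + 1):
            if k % h:
                continue
            w = k // h
            if h > self.rows or w > self.cols:
                continue
            for r0 in range(self.rows - h + 1):
                for c0 in range(self.cols - w + 1):
                    cells = [(r0 + i, c0 + j) for i in range(h) for j in range(w)]
                    if all(cell not in self.owner for cell in cells):
                        return cells
        return None

    def allocate(self, job, k):
        cells = self._find_rectangle(k)
        if cells is None:
            free = self.free_cores()
            if len(free) < k:
                return None  # not enough free cores at all
            cells = free[:k]  # non-contiguous fallback
        for cell in cells:
            self.owner[cell] = job
        return cells

    def release(self, job):
        self.owner = {cell: j for cell, j in self.owner.items() if j != job}
```

Contiguous placement keeps a job's threads close on the on-chip network, while the fallback trades some communication distance for utilization, which is the tension that hybrid schemes like those compared in the abstract try to balance.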
65

Contributions au contrôle de l'affinité mémoire sur architectures multicoeurs et hiérarchiques / Contributions on Memory Affinity Management for Hierarchical Shared Memory Multi-core Platforms

Pousa Ribeiro, Christiane 29 June 2011
Multi-core platforms with a non-uniform memory access (NUMA) design are now a common resource in High Performance Computing. In such platforms, the shared memory is organized as a hierarchical memory subsystem in which the main memory is physically distributed across several memory banks. Additionally, the hierarchical memory subsystem of these platforms features several levels of cache memory. Because of this hierarchy, memory access costs may vary depending on the distance between tasks and data. Furthermore, since the number of cores in such machines is considerably high, many accesses to the same distributed shared memory are performed concurrently. These accesses put more stress on the memory banks, generating load-balancing issues, memory contention and remote accesses.
Therefore, the main challenge on a NUMA platform is to reduce memory access latency and memory contention. In this context, the main objective of this thesis is to attain scalable performance on multi-core NUMA machines by controlling memory affinity. The first goal of this thesis is to investigate which characteristics of the NUMA platform and of the application have an important impact on memory affinity control, and to propose mechanisms to deal with them on multi-core machines with a NUMA design. We focus on High Performance Scientific Numerical workloads with regular and irregular memory access characteristics. The study of memory affinity leads to the proposal of an environment to manage memory affinity on multi-core platforms with a NUMA design. This environment provides fine-grained mechanisms to manage data placement for an application by using compile-time and architecture information. The second goal is to provide solutions that show performance portability. By performance portability, we mean solutions that are capable of providing similar performance improvements on different NUMA platforms. To do so, we propose mechanisms that are independent of the machine architecture and the compiler. The portability of the proposed environment is evaluated through the performance analysis of several benchmarks and applications on different platforms. Last, the third goal of this thesis is to design memory affinity mechanisms that can be easily adapted and used in different parallel systems. Our approach takes into account the different data structures used in High Performance Scientific Numerical workloads, in order to propose solutions that can be used in different contexts. We evaluate the adaptability of these mechanisms in two parallel programming systems. All the ideas developed in this research work are implemented in a framework named Minas (Memory affInity maNAgement Software).
Several OpenMP benchmarks and two real-world applications from geophysics are used to evaluate its performance. Additionally, the integration of Minas into Charm++ (a parallel programming system) and OpenSkel (a skeleton pattern system for software transactional memory) is also evaluated.
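To make "controlling memory affinity" concrete, the sketch below assigns the pages of an array to NUMA nodes under three simple placement policies and measures how many pages end up remote when thread t, pinned to node t, processes the t-th contiguous chunk (as with a static OpenMP loop schedule). The policy names and both functions are hypothetical illustrations, not the Minas API. On Linux, a real implementation would apply such a mapping through mechanisms like `mbind(2)` or libnuma.

```python
def place_pages(num_pages, nodes, policy, first_node=0):
    """Map each page of an array to a NUMA node under a simple policy."""
    if policy == "interleave":  # round-robin pages across nodes
        return [(first_node + i) % nodes for i in range(num_pages)]
    if policy == "block":  # one contiguous chunk of pages per node
        per_node = -(-num_pages // nodes)  # ceiling division
        return [i // per_node for i in range(num_pages)]
    if policy == "bind":  # every page on a single node
        return [first_node] * num_pages
    raise ValueError(f"unknown policy: {policy}")


def remote_fraction(placement, nodes):
    """Fraction of pages accessed remotely when thread t (pinned to node t)
    processes the t-th contiguous chunk of pages."""
    n = len(placement)
    per_node = -(-n // nodes)
    remote = sum(1 for i, node in enumerate(placement) if node != i // per_node)
    return remote / n
```

For 8 pages on 4 nodes, block placement matches a static schedule exactly (no remote pages), while interleaving makes 6 of 8 pages remote but spreads bandwidth demand across all banks. That is why the right policy depends on the application's access pattern, regular or irregular.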
66

Extending branch prediction information to effective caching.

January 1996
by Chung-Leung, Chiu. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1996. / Includes bibliographical references (leaves 110-113).
Contents:
  • Abstract
  • Acknowledgement
  • Chapter 1: Introduction
    • 1.1 Partial Basic Block Storing Mechanism
    • 1.2 Data-Tagged Mechanism in Branch Target Buffer
    • 1.3 Organization of the dissertation
  • Chapter 2: Related Research
    • 2.1 Branch Prediction
    • 2.2 Branch History Table
      • 2.2.1 Performance of Branch History Table in reducing the Branch Penalty
    • 2.3 Branch Target Cache
    • 2.4 Early Resolution of Branch
    • 2.5 Software Inter-block Reorganization
    • 2.6 Branch Target Buffer
    • 2.7 Data Prefetching
      • 2.7.1 Software-Directed Prefetching
      • 2.7.2 Hardware-based prefetching
  • Chapter 3: New Branch Target Buffer Design
    • 3.1 Alternate Line Storing
    • 3.2 Storing More Than One Line On Entering The Dynamic Basic Block
  • Chapter 4: Simulation Environment for New Branch Target Buffer Design
    • 4.1 Architectural Models and Assumptions
    • 4.2 Memory Models
    • 4.3 Evaluation Methodology and Measurement Criteria
    • 4.4 Description of the Traces
    • 4.5 Effect of the limitation of ATOM on the statistics of SPEC92 Benchmarks
    • 4.6 Environments for collecting relevant statistics of SPEC92 Benchmarks
  • Chapter 5: Results for New Branch Target Buffer Design
    • 5.1 Statistical Results and Analysis for SPEC92 Benchmarks
    • 5.2 Overall Performance
    • 5.3 Bus Latency Effect
    • 5.4 Effect of Cache Size
    • 5.5 Effect of Line Size
    • 5.6 Cache Set Associativity
    • 5.7 Partial Hits
    • 5.8 Prefetch Accuracy
    • 5.9 Effect of Prefetch Buffer Size
    • 5.10 Effect of Storing More Than One Line on Entry of New Dynamic Basic Block
  • Chapter 6: Data References Tagged into Branch Target Buffer
    • 6.1 Branch History Table Tagged Mechanism
    • 6.2 Lookahead Technique
    • 6.3 Default Prefetches vs. Data-tagged Prefetches
    • 6.4 New Priority Scheme
  • Chapter 7: Architectural Model for Data-Tagged References in Branch Target Buffer
    • 7.1 Architectural Models and Assumptions
    • 7.2 Memory Models
    • 7.3 Evaluation Methodology and Measurement Criteria
    • 7.4 Description of the Traces
    • 7.5 Environments for collecting relevant statistics of SPEC92 Benchmarks
  • Chapter 8: Results for Data References Tagged into Branch Target Buffer
    • 8.1 Statistical Results and Analysis
    • 8.2 Overall Performance
    • 8.3 Effect of Branch Prediction
    • 8.4 Effect of Number of Tagged Registers
    • 8.5 Effect of Different Tagged Positions in Basic Block
    • 8.6 Effect of Lookahead Size
    • 8.7 Prefetch Accuracy
    • 8.8 Cache Size
    • 8.9 Line Size
    • 8.10 Set Associativity
    • 8.11 Size of Branch History Table
    • 8.12 Set Associativity of Branch History Table
    • 8.13 New Priority Scheme vs. Default Priority Scheme
    • 8.14 Effect of Prefetch-On-Miss
    • 8.15 Memory Latency
  • Chapter 9: Conclusions and Future Research
    • 9.1 Conclusions
    • 9.2 Future Research
  • Bibliography
  • Appendix A: Statistical Results - SPEC92 Benchmarks
    • A.1 Definition of Abbreviations and Terms
67

Transaction logging and recovery on phase-change memory

Gao, Shen 01 January 2013
No description available.
68

Stream processing optimizations for mobile sensing applications

Lai, Farley 01 August 2017
Mobile sensing applications (MSAs) are an emerging class of applications that process continuous sensor data streams to make time-sensitive inferences. Representative application domains range from environmental monitoring and context-aware services to the recognition of physical activities and social interactions. Example applications include city air quality assessment, indoor localization, pedometers and speaker identification. The common application workflow is to read data streams from the sensors (e.g., accelerometers, microphone, GPS), extract statistical features, and then present the inferred high-level events to the user. MSAs in the healthcare domain have drawn particular attention in recent years, because sensor-based data collection and assessment offer finer granularity, better timeliness, and higher accuracy, in greater quantity, than the traditional, labor-intensive data-gathering mechanisms in use today, such as survey methods. The higher fidelity and accuracy of the collected data expose new research opportunities, improve the reliability and accuracy of medical decisions, and empower users to manage their personal health more effectively. Nonetheless, a critical challenge to the practical deployment of MSAs in the real world is to effectively manage the limited resources of mobile platforms to meet stringent quality of service (QoS) requirements in terms of processing throughput and delay, while ensuring long-term robustness. To address this challenge, we model MSAs as dataflow graphs of processing elements connected by communication channels. The processing elements may execute in parallel as long as they have sufficient data to process. A key feature of the dataflow model is that it explicitly captures parallelism and data dependencies between processing elements. Based on this graph composition, we first proposed CSense, a stream-processing toolkit for robust and high-rate MSAs.
In this work, CSense provides a simple language for developers to describe their sensing flow without having to deal with system intricacies such as memory allocation, concurrency control and power management. The results show that performance differences of up to 19X may be achieved automatically compared with a baseline using the default runtime concurrency and memory management. Following this direction, we saw opportunities to significantly improve the memory performance and energy efficiency of MSAs by exploiting their iterative execution. Therefore, we next focus on optimizing runtime memory management through compile-time analysis. The contribution is a stream compiler that captures whole-program memory behavior to generate an efficient memory layout for runtime access. Experiments show that our memory optimizations reduce the memory footprint by as much as 96% while matching or improving the performance of the StreamIt compiler with cache optimizations enabled. On the other hand, while a significant body of work has focused on optimizing the throughput or latency of processing sensor streams, little to no attention has been given to energy efficiency. We proposed an accurate offline energy prediction model for MSAs that leverages the pipeline structure and iterative execution to search for the most energy-saving batching configuration with respect to a deadline constraint. This lets developers visualize the energy-delay trade-off in the parameter space without runtime profiling. The evaluation shows worst-case prediction errors of about 7% for energy and 15% for latency, despite variable application workloads.
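The batching search can be illustrated with a deliberately simple cost model, which is an assumption and not the thesis's prediction model. Samples arrive at `rate_hz`, and the processor is woken once per batch at a fixed cost `wake_energy`. Each pipeline stage then costs a fixed energy and time per sample, and the worst-case latency is the time the first sample waits for its batch to fill plus the batch's processing time. The names `StageCost`, `evaluate` and `best_batch` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StageCost:
    """Hypothetical per-sample cost of one pipeline stage (assumed values)."""
    energy_per_sample: float  # joules
    time_per_sample: float  # seconds


def evaluate(batch, rate_hz, wake_energy, stages):
    """Predict (energy per sample, worst-case latency) for one batch size."""
    energy = wake_energy / batch + sum(s.energy_per_sample for s in stages)
    processing = batch * sum(s.time_per_sample for s in stages)
    wait = (batch - 1) / rate_hz  # first sample waits for the batch to fill
    return energy, wait + processing


def best_batch(rate_hz, wake_energy, stages, deadline, candidates):
    """Return (batch, energy, latency) minimizing energy under the deadline,
    or None if no candidate batch size meets it."""
    best = None
    for b in candidates:
        energy, latency = evaluate(b, rate_hz, wake_energy, stages)
        if latency <= deadline and (best is None or energy < best[1]):
            best = (b, energy, latency)
    return best
```

Larger batches amortize the wake-up cost but delay the first sample, so the deadline caps the batch size. Because the model is evaluated offline, sweeping `candidates` traces out the energy-delay trade-off without profiling the application at runtime.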
69

Scratch-pad memory management for static data aggregates

Li, Lian, Computer Science & Engineering, Faculty of Engineering, UNSW January 2007
Scratch-pad memory (SPM), a fast on-chip SRAM managed by software, is widely used in embedded systems. Compared to a hardware-managed cache, SPM can be more efficient in performance, power and area cost, and has the added advantage of better time predictability. In this thesis, SPM is understood in a general sense. For example, in stream processors, a software-managed stream register file is usually used to stage data to and from off-chip memory. In IBM's Cell architecture, each co-processor has a software-managed local store for keeping data and instructions. SPM management is critical for SPM-based embedded systems. In this thesis, we propose two novel methodologies, the memory colouring methodology and the perfect colouring methodology, to place the static data aggregates of a program, such as arrays and structs, in SPM. Our methodologies are dynamic in the sense that some data aggregates can be swapped into and out of SPM during program execution. To this end, a live range splitting heuristic is introduced to create potential data transfer statements between SPM and off-chip memory. The memory colouring methodology is a general-purpose compiler approach. Its novelty lies in partitioning an SPM into a pseudo register file and then generalising existing graph colouring algorithms for register allocation to colour data aggregates. In this thesis, a scheme for partitioning an SPM into a pseudo register file is introduced. This methodology is inter-procedural and therefore operates on the interference graph for the data aggregates in the whole program. Different graph colouring algorithms may give rise to different results due to the live range splitting and spilling heuristics used. Therefore, two representative graph colouring algorithms, George and Appel's iterative coalescing and Park and Moon's optimistic coalescing, are generalised and evaluated for SPM allocation. Like memory colouring, perfect colouring is also inter-procedural.
The novelty of this second methodology lies in formulating the SPM allocation problem as an interval colouring problem. The interval colouring problem is NP-hard, and no widely accepted approximation algorithms exist for it. The key observation is that the interference graphs for data aggregates in many embedded applications form a special class of superperfect graphs. This has led to the development of two additional SPM allocation algorithms. While they differ in whether live range splits and spills are done sequentially or together, both algorithms place data aggregates in SPM based on the cliques in an interference graph. In both cases, the placement is optimal in the sense that all data aggregates in an interference graph can be placed in SPM whenever the given SPM size is no smaller than the chromatic number of the graph. We have developed two memory colouring algorithms and two perfect colouring algorithms for SPM allocation, and evaluated them using a set of embedded applications. Our results show that both methodologies are efficient and effective in handling large-scale embedded applications. While neither methodology consistently outperforms the other, perfect colouring has yielded better overall results on the set of benchmarks used in our experiments. All these algorithms are expected to be valuable. For example, they can be made available as part of the same compiler framework to help the embedded designer explore a large number of optimisation opportunities for a particular embedded application.
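Interval colouring assigns each data aggregate a contiguous address interval in SPM so that aggregates with overlapping live ranges never share addresses. The sketch below is a plain first-fit heuristic under the simplifying assumption that live ranges are single intervals, so the interference graph is an interval graph. It is not one of the thesis's clique-based algorithms, and first fit carries no optimality guarantee. `max_weighted_clique` computes the largest total size live at one program point, which is a lower bound on the SPM size any placement needs. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Aggregate:
    name: str
    size: int  # bytes
    start: int  # live range is [start, end) in program points
    end: int


def interferes(a, b):
    return a.start < b.end and b.start < a.end


def max_weighted_clique(aggs):
    """For interval live ranges, the heaviest clique is the largest total
    size live at a single program point: a lower bound on SPM needed."""
    points = {a.start for a in aggs}
    return max((sum(a.size for a in aggs if a.start <= p < a.end) for p in points),
               default=0)


def first_fit_offsets(aggs, spm_size):
    """Give each aggregate an SPM interval [off, off + size) that overlaps no
    interfering aggregate's interval; aggregates that do not fit are spilled."""
    placed, spilled = {}, []
    for a in sorted(aggs, key=lambda x: (x.start, -x.size)):
        busy = sorted((placed[b.name], placed[b.name] + b.size)
                      for b in aggs if b.name in placed and interferes(a, b))
        off = 0
        for lo, hi in busy:  # find the lowest gap large enough for a
            if off + a.size <= lo:
                break
            off = max(off, hi)
        if off + a.size <= spm_size:
            placed[a.name] = off
        else:
            spilled.append(a.name)
    return placed, spilled
```

In a three-aggregate example where A interferes with both B and C but B and C do not interfere, B and C can share the same offset, so an SPM of exactly the clique bound (8 bytes for three 4-byte arrays) suffices. A smaller SPM forces spills, which in the thesis's setting would become transfers to off-chip memory.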
70

Programmer friendly and efficient distributed shared memory integrated into a distributed operating system.

Silcock, Jackie, mikewood@deakin.edu.au January 1998
Distributed Shared Memory (DSM) provides programmers with a shared memory environment in systems where memory is not physically shared. Clusters of Workstations (COWs), an often untapped source of computing power, are characterised by a very low cost/performance ratio. Combining COWs with DSM provides an environment in which the programmer can use the well-known approaches and methods of programming for physically shared memory systems, and in which parallel processing can make full use of the computing power and cost advantages of the COW. The aim of this research is to synthesise and develop a distributed shared memory system as an integral part of an operating system, in order to provide application programmers with a convenient environment in which parallel applications can be developed and executed easily, efficiently and transparently. Furthermore, to satisfy our challenging design requirements, we want to demonstrate that the operating system into which the DSM system is integrated should be a distributed operating system. This thesis reports a study into the synthesis of a DSM system within a microkernel and client-server based distributed operating system which uses both strict and weak consistency models, with write-invalidate and write-update based approaches to consistency maintenance. It also reports a unique automatic initialisation system which allows the programmer to start the parallel execution of a group of processes with a single library call. The number and location of these processes are determined by the operating system based on system load information. The proposed DSM system is novel in that it provides programmers with a complete programming environment in which they can easily develop and run their code, or indeed run existing shared memory code.
A set of demanding DSM system design requirements is presented, and the incentives for placing the DSM system within a distributed operating system, and in particular in the memory management server, are reported. The new DSM system is built around an event-driven set of cooperating, distributed entities, and a detailed description of the events, and of the reactions to these events, that make up the operation of the DSM system is then presented. This is followed by a pseudocode form of the detailed design of the main modules and of the activities of the primitives used in the proposed DSM system. Quantitative results of performance tests and qualitative results showing the ease of programming and use of the RHODOS DSM system are reported. A study of five different applications is given, together with the results of tests carried out on these applications and a discussion of those results. A discussion of how RHODOS’ DSM allows programmers to write shared memory code in an easy-to-use and familiar environment, and a comparative evaluation of RHODOS DSM with other DSM systems, are presented. In particular, the ease of use and transparency of the DSM system are demonstrated by describing how easily a moderately inexperienced undergraduate programmer was able to convert, write and run applications for testing the DSM system. Furthermore, the tests performed using physically shared memory show that its behaviour is indistinguishable from that of the distributed shared memory; this is further evidence that the DSM system is fully transparent. This study clearly demonstrates that the aim of the research has been achieved: it is possible to develop a programmer-friendly and efficient DSM system fully integrated within a distributed operating system.
It is clear from this research that DSM integrated into a client-server and microkernel based distributed operating system makes shared memory operations transparent and almost completely removes the involvement of the programmer beyond the classical activities needed to deal with shared memory. The conclusion can be drawn that DSM, when implemented within a client-server and microkernel based distributed operating system, is one of the most encouraging approaches to parallel processing, since it guarantees performance improvements with minimal programmer involvement.
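The write-invalidate approach to consistency maintenance can be sketched as a toy page-based model. Any number of nodes may hold read-only copies of a page, but a write first invalidates every other copy, so the writer becomes the single owner. This is a generic illustration, not RHODOS code. The class name is hypothetical, each "page" is reduced to a single value, and the write-update alternative, which pushes the new value to every copy instead of discarding them, is not shown.

```python
class WriteInvalidateDSM:
    """Toy page-based DSM using a write-invalidate protocol."""

    def __init__(self, nodes, num_pages):
        self.data = {n: {} for n in range(nodes)}  # node -> {page: value}
        self.owner = {p: 0 for p in range(num_pages)}  # node 0 owns every page
        self.copyset = {p: {0} for p in range(num_pages)}  # nodes holding copies
        for p in range(num_pages):
            self.data[0][p] = 0
        self.invalidations = 0

    def read(self, node, page):
        if page not in self.data[node]:
            # Read fault: fetch a copy from the current owner.
            self.data[node][page] = self.data[self.owner[page]][page]
            self.copyset[page].add(node)
        return self.data[node][page]

    def write(self, node, page, value):
        # Write fault: invalidate every other copy before writing.
        for other in self.copyset[page] - {node}:
            self.data[other].pop(page, None)
            self.invalidations += 1
        self.owner[page] = node
        self.copyset[page] = {node}
        self.data[node][page] = value
```

After a write, stale copies simply disappear, so the next read on another node faults and fetches the fresh value from the new owner. This is how a DSM layer can keep shared memory operations transparent to the programmer.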
