71

Generating Miss Rate Curves with Low Overhead Using Existing Hardware

Walsh, Tom 17 February 2010 (has links)
Miss Rate Curves (MRCs) for main memory have been proposed as a representation of memory utilization for use in a range of optimizations in the area of memory management. Various techniques exist for their creation; however, all real-world methods of MRC generation must make trade-offs between overhead and accuracy. Proposals for new hardware techniques exist, but have yet to be implemented in actual hardware. We propose the use of the Intel PEBS (Precise Event-Based Sampling) performance monitoring capability for the task of MRC generation on existing commodity hardware. We use PEBS to generate MRCs and compare them against MRCs generated through instrumentation, finding the PEBS MRCs to be good, but imperfect approximations, while keeping average PEBS overheads below 5%. We were unable to show that PEBS is better or worse than existing techniques, but believe we have succeeded in showing the promise of the use of general purpose performance monitoring hardware for this task and in motivating future research and development in this area.
73

Storage Management for Embedded SIMD Processors

Ryu, Soojung 17 December 2003 (has links)
SIMD parallelism offers a high-performance and efficient execution approach for today's broad range of portable multimedia consumer products. However, new methods are needed to meet the complex demands of high-performance embedded systems. This research explores new storage management techniques for this focused but critical application. These techniques include memory design exploration based on the application retargeting technique, storage-based systolic instruction broadcast, and systolic virtual memory to improve both the performance and efficiency of embedded SIMD systems. To support efficient storage usage through memory design space exploration in embedded SIMD systems, an analysis method was developed for assessing the storage needs and costs of a given application automatically retargeted across a spectrum of storage configuration designs. Using this technique, a SIMD processing element achieves optimal area and energy efficiency with a register file containing between 8 and 12 words for a given workload. This configuration is between 15% and 25% more area and energy efficient than the other memory configurations considered. Systolic instruction broadcast is a high-performance and area-efficient instruction broadcasting scheme that uses short-wire interconnects to eliminate the wire latency bottleneck found in global instruction broadcast. Three implementation methods are defined and evaluated: a software method, a 2-write-port register file method, and a bypass method. In our evaluations, due to the system's short clock cycle time and scheduler, a speedup in system performance of up to 7.5 can be achieved by the year 2010. In addition, a speedup in area efficiency of up to 7.2 can be achieved for a given workload. The ability of the systolic virtual memory mechanism to minimize off-chip memory access latency while maximizing access frequency, using scheduling techniques along with data prefetch techniques, was evaluated using our SIMD-systolic architecture simulator. 
Results show that systolic virtual off-chip memory with a shared address space can achieve over 50% higher area efficiency than an on-chip-only system for a matrix multiplication application.
74

DLL-Conscious Instruction Fetch Optimization for SMT Processors

Mohamood, Fayez 12 April 2006 (has links)
Simultaneous multithreading (SMT) processors can issue multiple instructions from distinct processes or threads in the same cycle. This technique effectively increases the overall throughput by keeping the pipeline resources more occupied, at the potential expense of reducing single-thread performance due to resource sharing. In the software domain, an increasing number of Dynamically Linked Libraries (DLLs) are used by applications and operating systems, providing better flexibility and modularity, and enabling code sharing. It is observed that a significant amount of execution time in software today is spent executing standard DLL instructions that are shared among multiple threads or processes. However, for an SMT processor with a virtually-indexed cache implementation, existing instruction fetching mechanisms can induce unnecessary false cache misses caused by the DLL-based instructions, which were intended to be shared. This problem is more conspicuous when multiple independent threads are executing concurrently in an SMT processor. This work investigates an often-neglected form of contention between running threads in the I-TLB and I-cache caused by DLLs. To address these shortcomings, we propose a system-level technique involving a lightweight modification in the microarchitecture and the OS. By exploiting the nature of the DLLs in our new architecture, we are able to reinstate physical sharing of the DLLs in an SMT machine. Using Microsoft Windows based applications, our simulation results show that the optimized instruction fetching mechanism can reduce the number of DLL misses by up to 5.5 times and improve the instruction cache hit rates by up to 62%, resulting in up to 30% DLL IPC improvements and up to 15% overall IPC improvements.
75

The design and implementation of memory management of virtual machine in user-space

Chu, Ching-hao 21 June 2011 (has links)
With the growing popularity of smart handset devices, the design and development of embedded systems is receiving much more discussion, and embedded-system concerns such as device stability and efficiency, easy-to-operate interface design, and the design of a variety of applications are becoming more and more important. Application development for embedded systems is often limited by system resources such as memory. Compared with common computer systems, embedded systems have very limited memory. Therefore, program development for embedded systems often needs to consider the problem of insufficient memory, and program design must avoid excessive memory allocation, which causes the program to take up a large amount of system memory, affecting system operation and causing system hazards. Java is one of the common programming languages used in embedded system development. Thanks to its high portability, Java programs can easily be ported to another system environment by using the Java virtual machine. However, Java programming is also restricted: for example, Java programs are not allowed to access memory space directly, and memory allocation and release are controlled by the system rather than by users. The purpose of this research is to design a set of Java programming tools that can be applied to the Android Dalvik virtual machine, which is responsible for memory allocation and release, to allow users to control memory so as to ensure that memory can be reused and to avoid system hazards caused by memory leaks.
76

An adaptive software transactional memory support for multi-core programming

Chan, Kinson. January 2009 (has links)
Thesis (M. Phil.)--University of Hong Kong, 2010. / Includes bibliographical references (leaves 94-98). Also available in print.
77

Exploding Java objects for performance

Noth, Michael E., January 2003 (has links)
Thesis (Ph. D.)--University of Washington, 2003. / Vita. Includes bibliographical references (leaves 134-139).
78

Memory region: a system abstraction for managing the complex memory structures of multicore platforms

Lee, Min 13 January 2014 (has links)
The performance of modern many-core systems depends on the effective use of their complex cache and memory structures, and this will likely become more pronounced with the impending arrival of on-chip 3D stacked and non-volatile off-chip byte-addressable memory. Yet to date, operating systems have not treated memory as a first-class schedulable resource, embracing memory heterogeneity. This dissertation presents a new software abstraction, called ‘memory region’, which denotes the current set of physical memory pages actively used by workloads. Using this abstraction, memory resources can be scheduled for applications to fully exploit a platform's underlying cache and memory system, thereby gaining improved performance and predictability in execution, particularly for the consolidated workloads seen in virtualized and cloud computing infrastructures. The abstraction's implementation in the Xen hypervisor involves the run-time detection of memory regions, the scheduled mapping of these regions to caches to match performance goals, and maintaining region-to-cache mappings using per-cache page tables. This dissertation makes the following specific contributions. First, it proposes a new scheduling method, region scheduling, in which the location of memory blocks, rather than CPU utilization, is the principal determinant of where workloads are run. Second, treating memory blocks as first-class resources, new methods for efficient cache management are shown to improve application performance as well as the performance of certain operating system functions. Third, explicit memory scheduling makes it possible to disaggregate operating systems, without the need to change OS sources and with only small markups of target guest OS functionality. 
With this method, OS functions can be mapped to specific desired platform components, such as a file system confined to running on specific cores and using only certain memory resources designated for its use. This can improve performance for applications heavily dependent on certain OS functions, by dynamically providing those functions with the resources needed for their current use, and it can prevent performance-critical application functionality from being needlessly perturbed by OS functions used for other purposes or by other jobs. Fourth, extensions of region scheduling can also help applications deal with the heterogeneous memory resources present in future systems, including on-chip stacked DRAM and NUMA or even NVRAM memory modules. More generally, region scheduling is shown to apply to memory structures with well-defined differences in memory access latencies.
79

Sustainable system infrastructure and big bang evolution : can aspects keep pace?

Gibbs, Celina 08 January 2010 (has links)
Many rapidly evolving systems eventually require extensive restructuring in order to effectively support further evolution. Not surprisingly, these overhauls can reverberate throughout the system, forcing changes to hundreds of files. Though several studies have shown the benefits of aspect-oriented software development from the point of view of the modularization and evolution of crosscutting concerns, the question remains as to how well aspects fare when the code that is crosscut undergoes rapid, extensive restructuring. That is, can aspects keep pace when faced with a big bang type of evolution? This case study demonstrates the concrete ways in which aspects impact the rapid and extensive restructuring of a memory management subsystem of a Java virtual machine. Compared with best efforts in a hierarchical decomposition coupled with a preprocessor, results show an aspect-oriented implementation fared no worse than the original in two out of four aspects, and better in the remaining two.
80

The Functional Paradigm in Embedded Real-Time Systems : A study in the problems and opportunities the functional programming paradigm entails to embedded real-time systems

Bergström, Emil, Tong, Shiliang January 2014 (has links)
This thesis explores the possibility of using the functional programming paradigm in the domain of hard embedded real-time systems. The implementation consists of re-implementing an already developed system that is written with the imperative and object-oriented paradigms. The functional implementation of the system in question is compared with the original implementation, and a study of code complexity, timing properties, CPU utilization and memory usage is performed. The implementation in this thesis consists of re-developing three of the periodic tasks of the original system, and the whole development process is facilitated by the TDD development cycle. The programming language used in this thesis is C, but with a functional approach to the problem. The conclusion of this thesis is that the functional implementation gives a more stable, reliable and readable system, but with some overhead in code volume, memory usage and CPU utilization. The main benefit of using the functional paradigm in this type of system is the ability to use the TDD development cycle. The main drawback of this type of implementation is that it relies heavily on garbage collection due to the enforcement of data immutability. We find in conclusion that one can only use the functional paradigm if one has an over-dimensioned system in terms of hardware, mainly memory size and CPU power. When developing small systems with scarce resources, one should choose another paradigm.
