About
The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

"a+b" arithmetic - Theory and implementation

Manickavasagam, Senthilkumar January 1996 (has links)
No description available.
2

A Case for Protecting Huge Pages from the Kernel

Patel, Naman January 2016 (has links) (PDF)
Modern architectures support multiple page sizes to facilitate applications that use large chunks of contiguous memory, whether for buffer allocation, application-specific memory management, in-memory caching, or garbage collection. Most general-purpose processors support larger page sizes: for example, the x86 architecture supports 2MB and 1GB pages, while the PowerPC architecture supports 64KB, 16MB, and 16GB pages. Such larger pages are also known as superpages or huge pages. Huge pages increase TLB reach significantly, and the Linux kernel can use them transparently to bring down the cost of TLB translations. With Transparent Huge Pages (THP) support in the Linux kernel, end users and application developers need not make any changes to their applications. Memory fragmentation, a classical problem in computing systems for decades, is a key obstacle to the allocation of huge pages, and ubiquitous huge page support across architectures makes effective fragmentation management even more critical for modern systems. In the absence of huge pages, applications stress the system TLB for virtual-to-physical address translation, which adversely affects performance and energy characteristics in long-running systems. Since most kernel pages are unmovable, fragmentation created by their misplacement is more problematic and nearly impossible to recover from with memory compaction. In this work, we explore the physical memory manager of Linux and the interaction of kernel page placement with fragmentation avoidance and recovery mechanisms. Our analysis reveals that a random kernel page layout not only thwarts the progress of memory compaction; it can actually induce more fragmentation in the system. To address this problem, we propose a new allocator that takes special care with the placement of kernel pages. We introduce a new region type representing memory areas that hold both kernel and user pages, and use it to build a staged allocator that adapts and optimizes kernel page placement as the fragmentation level changes. We then introduce Illuminator, which with zero overhead outperforms the default kernel in huge page allocation success rate and in compaction overhead per huge page. We also show that huge page allocation is not a one-dimensional problem but a twofold concern: the fragmentation recovery mechanism may interfere with the allocator's page clustering policy and worsen fragmentation. Our results show that with effective kernel page placement the mixed page block count drops by up to 70%, which allows our system to allocate 3x-4x more huge pages than the default kernel. Using these additional huge pages we show up to 38% improvement in energy consumed and up to 39% reduction in execution time on standard benchmarks.
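To make the huge page mechanism the abstract refers to concrete, here is a minimal sketch (not part of the thesis) of how a Linux application can hint that an anonymous mapping should be backed by transparent huge pages; whether the kernel can actually supply 2MB pages depends on how fragmented physical memory is, which is exactly the problem the work addresses. The buffer size and access pattern below are illustrative assumptions.

```c
/* Minimal sketch (not from the thesis): hinting the Linux kernel to back an
 * anonymous mapping with transparent huge pages via madvise(MADV_HUGEPAGE).
 * Assumes a kernel with THP enabled
 * (see /sys/kernel/mm/transparent_hugepage/enabled). */
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 64UL << 20;   /* 64MB, a multiple of the 2MB huge page size */

    /* Reserve anonymous memory; initially backed by 4KB base pages. */
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* Ask THP to back this range with 2MB pages where contiguous
     * physical memory is available. */
    if (madvise(buf, len, MADV_HUGEPAGE) != 0)
        perror("madvise(MADV_HUGEPAGE)");

    /* Touch the memory so pages are actually faulted in. */
    for (size_t i = 0; i < len; i += 4096)
        ((char *)buf)[i] = 1;

    munmap(buf, len);
    return 0;
}
```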
3

Automatic Data Allocation, Buffer Management And Data Movement For Multi-GPU Machines

Ramashekar, Thejas 10 1900 (has links) (PDF)
Multi-GPU machines are increasingly used in high performance computing. They serve both as standalone workstations running computations on medium to large data sizes (tens of gigabytes) and as nodes in CPU-multi-GPU clusters handling very large data sizes (hundreds of gigabytes to a few terabytes). Each GPU in such a machine has its own memory and does not share an address space with either the host CPU or the other GPUs. Hence, applications utilizing multiple GPUs have to manually allocate and manage data on each GPU. A significant body of scientific applications that utilize multi-GPU machines contain computations inside affine loop nests, i.e., loop nests that have affine bounds and affine array access functions. These include stencils, linear-algebra kernels, dynamic programming codes, and data-mining applications. Data allocation, buffer management, and coherency handling are critical steps that must be performed to run affine applications on multi-GPU machines. Existing works that propose to automate these steps have limitations and inefficiencies in terms of allocation sizes, exploiting reuse, transfer costs, and scalability. An automatic multi-GPU memory manager that can overcome these limitations and enable applications to achieve scalable performance is highly desired. One technique that has been used in certain memory management contexts in the literature is that of bounding boxes. The bounding box of an array, for a given tile, is the smallest hyper-rectangle that encapsulates all the array elements accessed by that tile. In this thesis, we exploit the potential of bounding boxes for memory management far beyond their current usage in the literature, and propose a scalable and fully automatic data allocation and buffer management scheme for affine loop nests on multi-GPU machines, which we call the Bounding Box based Memory Manager (BBMM). BBMM is a compiler-assisted runtime memory manager. At compile time, it uses static analysis techniques to identify the set of bounding boxes accessed by a computation tile. At run time, it uses bounding box set operations such as union, intersection, difference, and subset/superset tests to compute a set of disjoint bounding boxes from the set identified at compile time. It also exploits the architectural capability of GPUs to perform fast transfers of rectangular (strided) regions of memory, and hence performs all data transfers in terms of bounding boxes. BBMM uses these techniques to automatically allocate and manage the data required by applications (suitably tiled and parallelized for GPUs). This allows it to (1) allocate only as much data as is required (or close to it) by the computations running on each GPU, (2) efficiently track buffer allocations and hence maximize data reuse across tiles and minimize data transfer overhead, and (3) as a result, enable applications to maximize the utilization of the combined memory on multi-GPU machines. BBMM can work with any choice of parallelizing transformations, computation placement, and scheduling scheme, whether static or dynamic. Experiments on a system with four GPUs across various scientific programs show that BBMM reduces data allocations on each GPU by up to 75% compared to current allocation schemes, yields at least 88% of the performance of hand-optimized OpenCL codes, and allows excellent weak scaling.
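The bounding-box set operations the abstract mentions reduce to simple per-dimension comparisons on hyper-rectangles. The sketch below is a hypothetical illustration of that data structure (not BBMM's actual code); the struct layout, dimension limit, and example tile bounds are assumptions made for exposition.

```c
/* Hypothetical sketch of the bounding-box abstraction described above
 * (illustrative only; not taken from BBMM). A bounding box is a
 * hyper-rectangle stored as inclusive per-dimension [lo, hi] bounds. */
#include <stdbool.h>
#include <stdio.h>

#define MAX_DIMS 3

typedef struct {
    int ndims;
    long lo[MAX_DIMS];   /* inclusive lower bound per dimension */
    long hi[MAX_DIMS];   /* inclusive upper bound per dimension */
} bbox;

/* Intersection: per-dimension max of lower bounds and min of upper
 * bounds. Returns false if the boxes are disjoint. */
static bool bbox_intersect(const bbox *a, const bbox *b, bbox *out)
{
    out->ndims = a->ndims;
    for (int d = 0; d < a->ndims; d++) {
        out->lo[d] = a->lo[d] > b->lo[d] ? a->lo[d] : b->lo[d];
        out->hi[d] = a->hi[d] < b->hi[d] ? a->hi[d] : b->hi[d];
        if (out->lo[d] > out->hi[d])
            return false;            /* empty in this dimension */
    }
    return true;
}

/* Subset test: a is contained in b if every dimension of a lies within b. */
static bool bbox_subset(const bbox *a, const bbox *b)
{
    for (int d = 0; d < a->ndims; d++)
        if (a->lo[d] < b->lo[d] || a->hi[d] > b->hi[d])
            return false;
    return true;
}

int main(void)
{
    /* Two 2D tiles accessing overlapping regions of an array. */
    bbox t1 = { 2, {0, 0},   {127, 127} };
    bbox t2 = { 2, {64, 64}, {191, 191} };
    bbox ov;

    if (bbox_intersect(&t1, &t2, &ov))
        printf("overlap: [%ld,%ld] x [%ld,%ld]\n",
               ov.lo[0], ov.hi[0], ov.lo[1], ov.hi[1]);
    printf("t2 subset of t1? %s\n", bbox_subset(&t2, &t1) ? "yes" : "no");
    return 0;
}
```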
4

A Memory Allocation Framework for Optimizing Power Consumption and Controlling Fragmentation

Panwar, Ashish January 2015 (has links) (PDF)
Large physical memory modules are necessary to meet the performance demands of today's applications, but they can be a major source of power consumption during idle periods or when systems run workloads that do not stress all of the plugged-in memory. The contribution of physical memory to overall system power consumption becomes even more significant when CPU cores run in low power modes during idle periods with hardware support such as Dynamic Voltage and Frequency Scaling. Our experiments show that even 10% of memory allocations can touch all the banks of physical memory on a long-running system, primarily due to the randomness in page allocation. We also show that memory hot-remove and memory migration for large blocks are often restricted in a long-running system due to the allocation policies of the current Linux VM, which mixes movable and unmovable pages. Hence it is crucial to improve page migration for large contiguous blocks for a practical realization of the power management support provided by the hardware. Operating systems can play a decisive role in effectively utilizing the power management support of modern DIMMs, such as PASR (Partial Array Self Refresh), in these situations, but have not been using it so far. We propose three different approaches for optimizing memory power consumption by introducing bank boundary awareness into the standard buddy allocator of the Linux kernel, while also distinguishing user and kernel memory allocations to improve the movability of memory sections (and hence memory hotplug) through page migration techniques. Through a set of minimal changes to the standard buddy system of the Linux VM, we have been able to reduce the number of active memory banks significantly (up to 80%) and to improve the performance of the memory-hotplug framework (up to 85%).
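As a hypothetical illustration of the bank-boundary awareness described above (not the thesis code): mapping a page frame number to its memory bank lets an allocator prefer pages from banks that are already active, leaving idle banks eligible for PASR. The bank size, lack of interleaving, and helper names below are assumptions made purely for exposition.

```c
/* Hypothetical illustration of bank-boundary awareness (not the thesis code).
 * Assumes a uniform 256MB bank size and no interleaving. */
#include <stdio.h>

#define PAGE_SHIFT     12                   /* 4KB pages */
#define BANK_SIZE      (256UL << 20)        /* assumed bank size */
#define PAGES_PER_BANK (BANK_SIZE >> PAGE_SHIFT)

/* Map a page frame number to the bank that contains it. */
static unsigned long pfn_to_bank(unsigned long pfn)
{
    return pfn / PAGES_PER_BANK;
}

/* A bank-aware allocator would prefer candidates whose bank is already
 * active, keeping idle banks in their low-power state. */
static int prefer_candidate(unsigned long pfn, const int *bank_active)
{
    return bank_active[pfn_to_bank(pfn)];
}

int main(void)
{
    int bank_active[8] = { 1, 1, 0, 0, 0, 0, 0, 0 };  /* banks 0-1 in use */
    unsigned long pfn_a = PAGES_PER_BANK + 42;        /* lies in bank 1 */
    unsigned long pfn_b = 5 * PAGES_PER_BANK + 7;     /* lies in bank 5 */

    printf("pfn %lu -> bank %lu, preferred: %d\n",
           pfn_a, pfn_to_bank(pfn_a), prefer_candidate(pfn_a, bank_active));
    printf("pfn %lu -> bank %lu, preferred: %d\n",
           pfn_b, pfn_to_bank(pfn_b), prefer_candidate(pfn_b, bank_active));
    return 0;
}
```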
