171 |
Fault-tolerant Programming Models and Computing Frameworks
Kurt, Mehmet Can 14 October 2015
No description available.
|
172 |
From Intuition to Evidence: A Data-Driven Approach to Transforming CS Education
Allevato, Anthony James 13 August 2012
Educators in many disciplines are too often forced to rely on intuition about how students learn and the effectiveness of teaching to guide changes and improvements to their curricula. In computer science, systems that perform automated collection and assessment of programming assignments are seeing increased adoption, and these systems generate a great deal of meaningful intermediate data and statistics during the grading process. Continuous collection of these data and long-term retention of collected data present educators with a new resource to assess both learning (how well students understand a topic or how they behave on assignments) and teaching (how effective a response, intervention, or assessment instrument was in evaluating knowledge or changing behavior), by basing their decisions on evidence rather than intuition. It is only possible to achieve these goals, however, if such data are easily accessible.
I present an infrastructure that has been added to one such automated grading system, Web-CAT, in order to facilitate routine data collection and access while requiring very little added effort by instructors. Using this infrastructure, I present three case studies that serve as representative examples of educational questions that can be explored thoroughly using pre-existing data from required student work. The first case study examines student time management habits and finds that students perform better when they start earlier but that offering extra credit for finishing earlier did not encourage them to do so. The second case study evaluates a tool used to improve student understanding of manual memory management and finds that students made fewer errors when using the tool. The third case study evaluates the reference tests used to grade student code on a selected assignment and confirms that the tests are a suitable instrument for assessing student ability. In each case study, I use a data-driven, evidence-based approach spanning multiple semesters and students, allowing me to answer each question in greater detail than was possible using previous methods and giving me significantly increased confidence in my conclusions. / Ph. D.
|
173 |
Traitement des signaux et images en temps réel : "implantation de H.264 sur MPSoC" / Real-time signal and image processing: "implementation of H.264 on MPSoC"
Messaoudi, Kamel 19 December 2012
This thesis was carried out under joint supervision between Badji Mokhtar University (LERICA Laboratory) and the University of Burgundy (LE2I Laboratory, UMR CNRS 5158). It is a contribution to the study and implementation of the H.264/AVC encoder. As video compression standards have evolved, one fact has become increasingly clear: good compression performance requires platforms that perform much better in terms of computing power, flexibility, and portability, in order to meet the requirements of the various processing stages and to satisfy real-time constraints. One possible way to ensure real-time performance for this kind of application is to use systems on chip (SoC) or multiprocessor systems on chip (MPSoC) built on reconfigurable FPGA-based platforms. The objective of this thesis is the study and implementation of signal and image processing algorithms, in particular the H.264/AVC standard, with the aim of ensuring real-time operation for the coding-decoding cycle. We use two Xilinx FPGA platforms (ML501 and XUPV5) to implement our architectures. In the literature, there are already several implementations of the decoder.
For the encoder, despite the enormous efforts made, work remains to optimize the algorithms and extract the available parallelism, especially given the variety of profiles and levels in H.264/AVC. First, we propose a hardware implementation of a memory controller designed specifically for the H.264/AVC encoder. This controller is obtained by adding, to the DDR2 memory controller of the two Xilinx platforms, an intelligent layer capable of calculating the addresses and retrieving the data needed by the encoder's various processing modules. Next, we propose hardware implementations (RTL level) of the processing modules of the H.264 encoder. In these implementations, we exploit the parallelism and pipelining that the encoder allows, given its strong inter-block dependency. We propose several improvements and new techniques in the modules of the Intra chain and the deblocking filter. Finally, we use the modules implemented in hardware to build a hardware/software implementation of the H.264/AVC encoder. Synthesis and simulation results obtained on the two Xilinx platforms are presented and compared with other existing implementations.
|
174 |
Estudo da efetividade dos mecanismos de compartilhamento de memória em hipervisores / Study of the effectiveness of memory sharing mechanisms in hypervisors
Veiga, Fellipe Medeiros 28 August 2015
The growing demand for large-scale virtualization environments, such as those used in data centers and computing clouds, requires efficient management of the computing resources involved. RAM is one of the most heavily demanded resources in these environments and is usually the main factor limiting the number of virtual machines that can run on the same physical host. Recently, hypervisors have introduced mechanisms for transparent sharing of RAM between virtual machines in order to reduce the total memory demand on the system.
These mechanisms “merge” identical pages found in multiple virtual machines into a single physical memory frame, using a copy-on-write approach that is transparent to the guest systems. The objective of this study is to present an overview of these mechanisms and to evaluate their performance and effectiveness. Results are presented from experiments conducted with two popular hypervisors (VMware and KVM), using different guest operating systems (Linux and Windows) and different workloads (synthetic and real). The results show significant performance differences between the hypervisors depending on the guest systems, the workloads, and time.
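The page-merging idea described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the mechanism used by VMware or KVM: the class `SharedMemory` and its methods are hypothetical names, pages are plain byte strings, and merging is done by one full scan rather than an incremental background scanner.

```python
import hashlib


class SharedMemory:
    """Toy model of transparent page sharing with copy-on-write (hypothetical)."""

    def __init__(self):
        self.frames = {}      # frame id -> page content (bytes)
        self.refcount = {}    # frame id -> number of guest pages mapped to it
        self.page_table = {}  # (vm name, guest page number) -> frame id
        self._next_frame = 0

    def _new_frame(self, content):
        fid = self._next_frame
        self._next_frame += 1
        self.frames[fid] = content
        self.refcount[fid] = 1
        return fid

    def map_page(self, vm, page, content):
        """Give a guest page its own private frame."""
        self.page_table[(vm, page)] = self._new_frame(content)

    def merge_identical(self):
        """Map all guest pages with identical content onto one shared frame."""
        canonical_by_hash = {}
        for key, fid in list(self.page_table.items()):
            digest = hashlib.sha256(self.frames[fid]).digest()
            canonical = canonical_by_hash.setdefault(digest, fid)
            # The full comparison guards against hash collisions.
            if canonical != fid and self.frames[canonical] == self.frames[fid]:
                self.page_table[key] = canonical
                self.refcount[canonical] += 1
                self.refcount[fid] -= 1
                if self.refcount[fid] == 0:
                    del self.frames[fid], self.refcount[fid]

    def write(self, vm, page, content):
        """Write to a guest page, breaking sharing first if the frame is shared."""
        fid = self.page_table[(vm, page)]
        if self.refcount[fid] > 1:
            # Copy-on-write: the writer gets a private frame, others keep the old one.
            self.refcount[fid] -= 1
            self.page_table[(vm, page)] = self._new_frame(content)
        else:
            self.frames[fid] = content


mem = SharedMemory()
mem.map_page("vm1", 0, b"\x00" * 4096)
mem.map_page("vm2", 0, b"\x00" * 4096)
mem.map_page("vm2", 1, b"unique data")
mem.merge_identical()   # the two zero pages now share one frame
mem.write("vm2", 0, b"modified")  # vm2 gets a private copy; vm1 is unaffected
```

The guests never see the merge: reads go through the page table as before, and only a write to a shared frame triggers a copy.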
|
175 |
Automatic Data Allocation, Buffer Management And Data Movement For Multi-GPU Machines
Ramashekar, Thejas 10 1900
Multi-GPU machines are being increasingly used in high performance computing. These machines are used both as standalone workstations to run computations on medium to large data sizes (tens of gigabytes) and as nodes in a CPU/multi-GPU cluster handling very large data sizes (hundreds of gigabytes to a few terabytes). Each GPU in such a machine has its own memory and does not share the address space either with the host CPU or with other GPUs. Hence, applications utilizing multiple GPUs have to manually allocate and manage data on each GPU.
A significant body of scientific applications that utilize multi-GPU machines contain computations inside affine loop nests, i.e., loop nests that have affine bounds and affine array access functions. These include stencils, linear-algebra kernels, dynamic programming codes and data-mining applications. Data allocation, buffer management, and coherency handling are critical steps that need to be performed to run affine applications on multi-GPU machines. Existing works that propose to automate these steps have limitations and inefficiencies in terms of allocation sizes, exploiting reuse, transfer costs and scalability. An automatic multi-GPU memory manager that can overcome these limitations and enable applications to achieve scalable performance is highly desired.
One technique that has been used in certain memory management contexts in the literature is that of bounding boxes. The bounding box of an array, for a given tile, is the smallest hyper-rectangle that encapsulates all the array elements accessed by that tile. In this thesis, we exploit the potential of bounding boxes for memory management far beyond their current usage in the literature.
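The bounding box definition can be made concrete with a short sketch. The helper names and the 5-point stencil are assumptions chosen for illustration; the thesis derives bounding boxes by static analysis at compile time, whereas this sketch simply enumerates the accessed indices.

```python
from itertools import product


def bounding_box(accesses):
    """Smallest hyper-rectangle containing every accessed index tuple.

    Returns (lower, upper), where both corners are inclusive.
    """
    points = list(accesses)
    lower = tuple(min(coords) for coords in zip(*points))
    upper = tuple(max(coords) for coords in zip(*points))
    return lower, upper


def stencil_tile_accesses(i_range, j_range):
    """Array indices read by a hypothetical 5-point stencil over one tile."""
    for i, j in product(i_range, j_range):
        yield (i, j)
        yield (i - 1, j)
        yield (i + 1, j)
        yield (i, j - 1)
        yield (i, j + 1)


# Tile covering rows 32..63 and columns 64..95 of a 2-D array.
box = bounding_box(stencil_tile_accesses(range(32, 64), range(64, 96)))
print(box)  # ((31, 63), (64, 96))
```

The box is a slight over-approximation: the corner element (31, 63) lies inside it but is never read by the stencil. That is the price of describing the accessed region as a single rectangle.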
We propose a scalable and fully automatic data allocation and buffer management scheme for affine loop nests on multi-GPU machines. We call it the Bounding Box based Memory Manager (BBMM). BBMM is a compiler-assisted runtime memory manager. At compile time, it uses static analysis techniques to identify the set of bounding boxes accessed by a computation tile. At run time, it uses bounding box set operations such as union, intersection, difference, and subset and superset tests to compute a set of disjoint bounding boxes from the set of bounding boxes identified at compile time. It also exploits the architectural capability of GPUs to perform fast transfers of rectangular (strided) regions of memory, and hence performs all data transfers in terms of bounding boxes. BBMM uses these techniques to automatically allocate and manage the data required by applications (suitably tiled and parallelized for GPUs). This allows it to (1) allocate only as much data as (or close to what) is required by the computations running on each GPU, (2) efficiently track buffer allocations and hence maximize data reuse across tiles and minimize data transfer overhead, and (3) as a result, enable applications to maximize the utilization of the combined memory on multi-GPU machines. BBMM can work with any choice of parallelizing transformations, computation placement, and scheduling schemes, whether static or dynamic. Experiments run with various scientific programs on a system with four GPUs showed that BBMM reduces data allocations on each GPU by up to 75% compared to current allocation schemes, yields at least 88% of the performance of hand-optimized OpenCL codes, and allows excellent weak scaling.
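The run-time step of turning overlapping bounding boxes into disjoint ones can be sketched with two set operations, intersection and difference. This is an illustrative sketch, not BBMM's actual algorithm: the function names are hypothetical, boxes are `(lower, upper)` tuples with inclusive corners, and the difference is computed by slicing one dimension at a time.

```python
def intersect(a, b):
    """Intersection of two boxes, or None if they do not overlap."""
    lo = tuple(max(x, y) for x, y in zip(a[0], b[0]))
    hi = tuple(min(x, y) for x, y in zip(a[1], b[1]))
    return (lo, hi) if all(l <= h for l, h in zip(lo, hi)) else None


def difference(a, b):
    """Split the part of box a not covered by box b into disjoint boxes."""
    inter = intersect(a, b)
    if inter is None:
        return [a]
    pieces = []
    lo, hi = list(a[0]), list(a[1])
    for d in range(len(lo)):
        if lo[d] < inter[0][d]:
            # Slab of a lying below the overlap along dimension d.
            slab_hi = hi.copy()
            slab_hi[d] = inter[0][d] - 1
            pieces.append((tuple(lo), tuple(slab_hi)))
            lo[d] = inter[0][d]
        if hi[d] > inter[1][d]:
            # Slab of a lying above the overlap along dimension d.
            slab_lo = lo.copy()
            slab_lo[d] = inter[1][d] + 1
            pieces.append((tuple(slab_lo), tuple(hi)))
            hi[d] = inter[1][d]
    return pieces


def disjoint_cover(boxes):
    """Disjoint boxes that together cover the union of the input boxes."""
    result = []
    for box in boxes:
        pending = [box]
        for existing in result:
            pending = [p for q in pending for p in difference(q, existing)]
        result.extend(pending)
    return result


def volume(box):
    """Number of array elements inside a box."""
    n = 1
    for l, h in zip(*box):
        n *= h - l + 1
    return n


tile_a = ((0, 0), (9, 9))
tile_b = ((5, 5), (14, 14))
cover = disjoint_cover([tile_a, tile_b])
```

Because the resulting boxes do not overlap, each array element belongs to exactly one buffer, so the total allocation equals the size of the union (175 elements here) rather than the sum of the two tiles (200).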
|
176 |
Efektivní správa paměti ve vícevláknových aplikacích / Effective Memory Management for Multi-Threaded Applications
Vašíček, Libor January 2008
This thesis describes the design and implementation of effective memory management for multi-threaded applications. First, it describes the virtual memory facilities available in recent operating systems such as Microsoft Windows and Linux. It then explains the most frequently used memory management algorithms, whose features are subsequently combined in the design of a new memory manager. The final design includes dedicated tools for application debugging and profiling. The thesis concludes with a series of tests and an evaluation of the results achieved.
|