Global ETD Search

51	Automatic task and data mapping in shared memory architectures / Mapeamento automático de processos e dados em arquiteturas de memória compartilhada Diener, Matthias January 2015 (has links) Arquiteturas paralelas modernas têm hierarquias de memória complexas, que consistem de vários níveis de memórias cache privadas e compartilhadas, bem como Non-Uniform Memory Access (NUMA) devido a múltiplos controladores de memória por sistema. Um dos grandes desafios dessas arquiteturas é melhorar a localidade e o balanceamento de acessos à memória de tal forma que a latência média de acesso à memória é reduzida. Dessa forma, o desempenho e a eficiência energética de aplicações paralelas podem ser melhorados. Os acessos podem ser melhorados de duas maneiras: (1) processos que acessam dados compartilhados (comunicação entre processos) podem ser alocados em unidades de execução próximas na hierarquia de memória, a fim de melhorar o uso das caches. Esta técnica é chamada de mapeamento de processos. (2) Mapear as páginas de memória que cada processo acessa ao nó NUMA que ele está sendo executado, assim, pode-se reduzir o número de acessos a memórias remotas em arquiteturas NUMA. Essa técnica é conhecida como mapeamento de dados. Para melhores resultados, os mapeamentos de processos e dados precisam ser realizados de forma integrada. Trabalhos anteriores nesta área executam os mapeamentos separadamente, o que limita os ganhos que podem ser alcançados. Além disso, a maioria dos mecanismos anteriores exigem operações caras, como traços de acessos à memória, para realizar o mapeamento, além de exigirem mudanças no hardware ou na aplicação paralela. Estes mecanismos não podem ser considerados soluções genéricas para o problema de mapeamento. Nesta tese, fazemos duas contribuições principais para o problema de mapeamento. Em primeiro lugar, nós introduzimos um conjunto de métricas e uma metodologia para analisar aplicações paralelas, a fim de determinar a sua adequação para um melhor mapeamento e avaliar os possíveis ganhos que podem ser alcançados através desse mapeamento otimizado. Em segundo lugar, propomos um mecanismo que executa o mapeamento de processos e dados online. Este mecanismo funciona no nível do sistema operacional e não requer alterações no hardware, os códigos fonte ou bibliotecas. Uma extensa avaliação com múltiplos conjuntos de carga de trabalho paralelos mostram consideráveis melhorias em desempenho e eficiência energética. / Reducing the cost of memory accesses, both in terms of performance and energy consumption, is a major challenge in shared-memory architectures. Modern systems have deep and complex memory hierarchies with multiple cache levels and memory controllers, leading to a Non-Uniform Memory Access (NUMA) behavior. In such systems, there are two ways to improve the memory affinity: First, by mapping tasks that share data (communicate) to cores with a shared cache, cache usage and communication performance are improved. Second, by mapping memory pages to memory controllers that perform the most accesses to them and are not overloaded, the average cost of accesses is reduced. We call these two techniques task mapping and data mapping, respectively. For optimal results, task and data mapping need to be performed in an integrated way. Previous work in this area performs the mapping only separately, which limits the gains that can be achieved. Furthermore, most previous mechanisms require expensive operations, such as communication or memory access traces, to perform the mapping, require changes to the hardware or to the parallel application, or use a simple static mapping. These mechanisms can not be considered generic solutions for the mapping problem. In this thesis, we make two contributions to the mapping problem. First, we introduce a set of metrics and a methodology to analyze parallel applications in order to determine their suitability for an improved mapping and to evaluate the possible gains that can be achieved using an optimized mapping. Second, we propose two automatic mechanisms that perform task mapping and combined task/data mapping, respectively, during the execution of a parallel application. These mechanisms work on the operating system level and require no changes to the hardware, the applications themselves or their runtime libraries. An extensive evaluation with parallel applications from multiple benchmark suites as well as real scientific applications shows substantial performance and energy efficiency improvements that are significantly higher than simple mechanisms and previous work, while maintaining a low overhead. Arquiteturas paralelas Processamento paralelo Banco : Dados Memória compartilhada Task mapping Data mapping Shared memory Multicore NUMA
52	Online thread and data mapping using the memory management unit / Mapeamento dinâmico de threads e dados usando a unidade de gerência de memória Cruz, Eduardo Henrique Molina da January 2016 (has links) Conforme o paralelismo a nível de threads aumenta nas arquiteturas modernas devido ao aumento do número de núcleos por processador e processadores por sistema, a complexidade da hierarquia de memória também aumenta. Tais hierarquias incluem diversos níveis de caches privadas ou compartilhadas e tempo de acesso não uniforme à memória. Um desafio importante em tais arquiteturas é a movimentação de dados entre os núcleos, caches e bancos de memória primária, que ocorre quando um núcleo realiza uma transação de memória. Neste contexto, a redução da movimentação de dados é um dos pilares para futuras arquiteturas para manter o aumento de desempenho e diminuir o consumo de energia. Uma das soluções adotadas para reduzir a movimentação de dados é aumentar a localidade dos acessos à memória através do mapeamento de threads e dados. Mecanismos de mapeamento do estado-da-arte aumentam a localidade de memória mapeando threads que compartilham um grande volume de dados em núcleos próximos na hierarquia de memória (mapeamento de threads), e mapeando os dados em bancos de memória próximos das threads que os acessam (mapeamento de dados). Muitas propostas focam em mapeamento de threads ou dados separadamente, perdendo oportunidades de ganhar desempenho. Outras propostas dependem de traços de execução para realizar um mapeamento estático, que podem impor uma sobrecarga alta e não podem ser usados em aplicações cujos comportamentos de acesso à memória mudam em diferentes execuções. Há ainda propostas que usam amostragem ou informações indiretas sobre o padrão de acesso à memória, resultando em informação imprecisa sobre o acesso à memória. Nesta tese de doutorado, são propostas soluções inovadoras para identificar um mapeamento que otimize o acesso à memória fazendo uso da unidade de gerência de memória para monitor os acessos à memória. As soluções funcionam dinamicamente em paralelo com a execução da aplicação, detectando informações para o mapeamento de threads e dados. Com tais informações, o sistema operacional pode realizar o mapeamento durante a execução das aplicações, não necessitando de conhecimento prévio sobre o comportamento da aplicação. Como as soluções funcionam diretamente na unidade de gerência de memória, elas podem monitorar a maioria dos acessos à memória com uma baixa sobrecarga. Em arquiteturas com TLB gerida por hardware, as soluções podem ser implementadas com pouco hardware adicional. Em arquiteturas com TLB gerida por software, algumas das soluções podem ser implementadas sem hardware adicional. As soluções aqui propostas possuem maior precisão que outros mecanismos porque possuem acesso a mais informações sobre o acesso à memória. Para demonstrar os benefícios das soluções propostas, elas são avaliadas com uma variedade de aplicações usando um simulador de sistema completo, uma máquina real com TLB gerida por software, e duas máquinas reais com TLB gerida por hardware. Na avaliação experimental, as soluções reduziram o tempo de execução em até 39%. O ganho de desempenho se deu por uma redução substancial da quantidade de faltas na cache, e redução do tráfego entre processadores. / As thread-level parallelism increases in modern architectures due to larger numbers of cores per chip and chips per system, the complexity of their memory hierarchies also increase. Such memory hierarchies include several private or shared cache levels, and Non-Uniform Memory Access nodes with different access times. One important challenge for these architectures is the data movement between cores, caches, and main memory banks, which occurs when a core performs a memory transaction. In this context, the reduction of data movement is an important goal for future architectures to keep performance scaling and to decrease energy consumption. One of the solutions to reduce data movement is to improve memory access locality through sharing-aware thread and data mapping. State-of-the-art mapping mechanisms try to increase locality by keeping threads that share a high volume of data close together in the memory hierarchy (sharing-aware thread mapping), and by mapping data close to where its accessing threads reside (sharing-aware data mapping). Many approaches focus on either thread mapping or data mapping, but perform them separately only, losing opportunities to improve performance. Some mechanisms rely on execution traces to perform a static mapping, which have a high overhead and can not be used if the behavior of the application changes between executions. Other approaches use sampling or indirect information about the memory access pattern, resulting in imprecise memory access information. In this thesis, we propose novel solutions to identify an optimized sharing-aware mapping that make use of the memory management unit of processors to monitor the memory accesses. Our solutions work online in parallel to the execution of the application and detect the memory access pattern for both thread and data mappings. With this information, the operating system can perform sharing-aware thread and data mapping during the execution of the application, without any prior knowledge of their behavior. Since they work directly in the memory management unit, our solutions are able to track most memory accesses performed by the parallel application, with a very low overhead. They can be implemented in architectures with hardwaremanaged TLBs with little additional hardware, and some can be implemented in architectures with software-managed TLBs without any hardware changes. Our solutions have a higher accuracy than previous mechanisms because they have access to more accurate information about the memory access behavior. To demonstrate the benefits of our proposed solutions, we evaluate them with a wide variety of applications using a full system simulator, a real machine with software-managed TLBs, and a trace-driven evaluation in two real machines with hardware-managed TLBs. In the experimental evaluation, our proposals were able to reduce execution time by up to 39%. The improvements happened to a substantial reduction in cache misses and interchip interconnection traffic. Processamento paralelo Memoria : Computadores Data movement Thread and data mapping Cache memory NUMA
53	Automatic task and data mapping in shared memory architectures / Mapeamento automático de processos e dados em arquiteturas de memória compartilhada Diener, Matthias January 2015 (has links) Arquiteturas paralelas modernas têm hierarquias de memória complexas, que consistem de vários níveis de memórias cache privadas e compartilhadas, bem como Non-Uniform Memory Access (NUMA) devido a múltiplos controladores de memória por sistema. Um dos grandes desafios dessas arquiteturas é melhorar a localidade e o balanceamento de acessos à memória de tal forma que a latência média de acesso à memória é reduzida. Dessa forma, o desempenho e a eficiência energética de aplicações paralelas podem ser melhorados. Os acessos podem ser melhorados de duas maneiras: (1) processos que acessam dados compartilhados (comunicação entre processos) podem ser alocados em unidades de execução próximas na hierarquia de memória, a fim de melhorar o uso das caches. Esta técnica é chamada de mapeamento de processos. (2) Mapear as páginas de memória que cada processo acessa ao nó NUMA que ele está sendo executado, assim, pode-se reduzir o número de acessos a memórias remotas em arquiteturas NUMA. Essa técnica é conhecida como mapeamento de dados. Para melhores resultados, os mapeamentos de processos e dados precisam ser realizados de forma integrada. Trabalhos anteriores nesta área executam os mapeamentos separadamente, o que limita os ganhos que podem ser alcançados. Além disso, a maioria dos mecanismos anteriores exigem operações caras, como traços de acessos à memória, para realizar o mapeamento, além de exigirem mudanças no hardware ou na aplicação paralela. Estes mecanismos não podem ser considerados soluções genéricas para o problema de mapeamento. Nesta tese, fazemos duas contribuições principais para o problema de mapeamento. Em primeiro lugar, nós introduzimos um conjunto de métricas e uma metodologia para analisar aplicações paralelas, a fim de determinar a sua adequação para um melhor mapeamento e avaliar os possíveis ganhos que podem ser alcançados através desse mapeamento otimizado. Em segundo lugar, propomos um mecanismo que executa o mapeamento de processos e dados online. Este mecanismo funciona no nível do sistema operacional e não requer alterações no hardware, os códigos fonte ou bibliotecas. Uma extensa avaliação com múltiplos conjuntos de carga de trabalho paralelos mostram consideráveis melhorias em desempenho e eficiência energética. / Reducing the cost of memory accesses, both in terms of performance and energy consumption, is a major challenge in shared-memory architectures. Modern systems have deep and complex memory hierarchies with multiple cache levels and memory controllers, leading to a Non-Uniform Memory Access (NUMA) behavior. In such systems, there are two ways to improve the memory affinity: First, by mapping tasks that share data (communicate) to cores with a shared cache, cache usage and communication performance are improved. Second, by mapping memory pages to memory controllers that perform the most accesses to them and are not overloaded, the average cost of accesses is reduced. We call these two techniques task mapping and data mapping, respectively. For optimal results, task and data mapping need to be performed in an integrated way. Previous work in this area performs the mapping only separately, which limits the gains that can be achieved. Furthermore, most previous mechanisms require expensive operations, such as communication or memory access traces, to perform the mapping, require changes to the hardware or to the parallel application, or use a simple static mapping. These mechanisms can not be considered generic solutions for the mapping problem. In this thesis, we make two contributions to the mapping problem. First, we introduce a set of metrics and a methodology to analyze parallel applications in order to determine their suitability for an improved mapping and to evaluate the possible gains that can be achieved using an optimized mapping. Second, we propose two automatic mechanisms that perform task mapping and combined task/data mapping, respectively, during the execution of a parallel application. These mechanisms work on the operating system level and require no changes to the hardware, the applications themselves or their runtime libraries. An extensive evaluation with parallel applications from multiple benchmark suites as well as real scientific applications shows substantial performance and energy efficiency improvements that are significantly higher than simple mechanisms and previous work, while maintaining a low overhead. Arquiteturas paralelas Processamento paralelo Banco : Dados Memória compartilhada Task mapping Data mapping Shared memory Multicore NUMA
54	A Performance Evaluation of MPI Shared Memory Programming / En utvärdering av MPI shared memory - programmering med inriktning på prestanda Karlbom, David January 2016 (has links) The thesis investigates the Message Passing Interface (MPI) support for shared memory programming on modern hardware architecture with multiple Non-Uniform Memory Access (NUMA) domains. We investigate its performance in two case studies: the matrix-matrix multiplication and Conway’s game of life. We compare MPI shared memory performance in terms of execution time and memory consumption with the performance of implementations using OpenMP and MPI point-to-point communication, also called "MPI two-sided". We perform strong scaling tests in both test cases. We observe that MPI two-sided implementation is 21% and 18% faster than the MPI shared and OpenMP implementations respectively in the matrix-matrix multiplication when using 32 processes. MPI shared uses less memory space: when compared to MPI two-sided, MPI shared uses 45% less memory. In the Conway’s game of life, we find that MPI two-sided implementation is 10% and 82% faster than the MPI shared and OpenMP implementations respectively when using 32 processes. We also observe that not mapping virtual memory to a specific NUMA domain can lead to an increment in execution time of 64% when using 32 processes. The use of MPI shared is viable for intranode communication on modern hardware architecture with multiple NUMA domains. / I detta examensarbete undersöker vi Message Passing Inferfaces (MPI) support för shared memory programmering på modern hårdvaruarkitektur med flera Non-Uniform Memory Access (NUMA) domäner. Vi undersöker prestanda med hjälp av två fallstudier: matris-matris multiplikation och Conway’s game of life. Vi jämför prestandan utav MPI shared med hjälp utav exekveringstid samt minneskonsumtion jämtemot OpenMP och MPI punkt-till-punkt kommunikation, även känd som MPI two-sided. Vi utför strong scaling tests för båda fallstudierna. Vi observerar att MPI-two sided är 21% snabbare än MPI shared och 18% snabbare än OpenMP för matris-matris multiplikation när 32 processorer användes. För samma testdata har MPI shared en 45% lägre minnesförburkning än MPI two-sided. För Conway’s game of life är MPI two-sided 10% snabbare än MPI shared samt 82% snabbare än OpenMP implementation vid användandet av 32 processorer. Vi kunde också utskilja att om ingen mappning av virtuella minnet till en specifik NUMA domän görs, leder det till en ökning av exekveringstiden med upp till 64% när 32 processorer används. Vi kom fram till att MPI shared är användbart för intranode kommunikation på modern hårdvaruarkitektur med flera NUMA domäner. MPI OpenMP Shared Memory HPC Parallel Programming NUMA Computer Sciences Datavetenskap (datalogi)
55	Graph Pattern Matching on Symmetric Multiprocessor Systems Krause, Alexander 09 October 2020 (has links) Graph-structured data can be found in nearly every aspect of today's world, be it road networks, social networks or the internet itself. From a processing perspective, finding comprehensive patterns in graph-structured data is a core processing primitive in a variety of applications, such as fraud detection, biological engineering or social graph analytics. On the hardware side, multiprocessor systems, that consist of multiple processors in a single scale-up server, are the next important wave on top of multi-core systems. In particular, symmetric multiprocessor systems (SMP) are characterized by the fact, that each processor has the same architecture, e.g. every processor is a multi-core and all multiprocessors share a common and huge main memory space. Moreover, large SMPs will feature a non-uniform memory access (NUMA), whose impact on the design of efficient data processing concepts should not be neglected. The efficient usage of SMP systems, that still increase in size, is an interesting and ongoing research topic. Current state-of-the-art architectural design principles provide different and in parts disjunct suggestions on which data should be partitioned and or how intra-process communication should be realized. In this thesis, we propose a new synthesis of four of the most well-known principles Shared Everything, Partition Serial Execution, Data Oriented Architecture and Delegation, to create the NORAD architecture, which stands for NUMA-aware DORA with Delegation. We built our research prototype called NeMeSys on top of the NORAD architecture to fully exploit the provided hardware capacities of SMPs for graph pattern matching. Being an in-memory engine, NeMeSys allows for online data ingestion as well as online query generation and processing through a terminal based user interface. Storing a graph on a NUMA system inherently requires data partitioning to cope with the mentioned NUMA effect. Hence, we need to dissect the graph into a disjunct set of partitions, which can then be stored on the individual memory domains. This thesis analyzes the capabilites of the NORAD architecture, to perform scalable graph pattern matching on SMP systems. To increase the systems performance, we further develop, integrate and evaluate suitable optimization techniques. That is, we investigate the influence of the inherent data partitioning, the interplay of messaging with and without sufficient locality information and the actual partition placement on any NUMA socket in the system. To underline the applicability of our approach, we evaluate NeMeSys against synthetic datasets and perform an end-to-end evaluation of the whole system stack on the real world knowledge graph of Wikidata. info:eu-repo/classification/ddc/004 ddc:004
56	Code Generation and Global Optimization Techniques for a Reconfigurable PRAM-NUMA Multicore Architecture Hansson, Erik January 2014 (has links) In this thesis we describe techniques for code generation and global optimization for a PRAM-NUMA multicore architecture. We specifically focus on the REPLICA architecture which is a family massively multithreaded very long instruction word (VLIW) chip multiprocessors with chained functional units that has a reconfigurable emulated shared on-chip memory. The on-ship memory system supports two execution modes, PRAM and NUMA, which can be switched between at run-time.PRAM mode is considered the standard execution mode and targets mainly applications with very high thread level parallelism (TLP). In contrast, NUMA mode is optimized for sequential legacy applications and applications with low amount of TLP. Different versions of the REPLICA architecture have different number of cores, hardware threads and functional units. In order to utilize the REPLICA architecture efficiently we have made several contributionsto the development of a compiler for REPLICA target code generation. It supports both code generation for PRAM mode and NUMA mode and can generate code for different versions of the processor pipeline (i.e. for different numbers of functional units). It includes optimization phases to increase the utilization of the available functional units. We have also contributed to quantitative the evaluation of PRAM and NUMA mode. The results show that PRAM mode often suits programs with irregular memory access patterns and control flow best while NUMA mode suites regular programs better. However, for a particular program it is not always obvious which mode, PRAM or NUMA, will show best performance. To tackle this we contributed a case study for generic stencil computations, using machine learning derived cost models in order to automatically select at runtime which mode to execute in. We extended this to also include a sequence of kernels. Computer and Information Sciences Data- och informationsvetenskap
57	The Numan tradition and its uses in the literature Rome's 'Golden Age' / Otis, Lise. January 2001 (has links) No description available. Latin literature. Rome -- In literature.
58	Avaliação de desempenho de plataformas de virtualização de redes. / Performance evaluation of network virtualization plataforms. Leopoldo Alexandre Freitas Mauricio 27 August 2013 (has links) O objetivo desta dissertação é avaliar o desempenho de ambientes virtuais de roteamento construídos sobre máquinas x86 e dispositivos de rede existentes na Internet atual. Entre as plataformas de virtualização mais utilizadas, deseja-se identificar quem melhor atende aos requisitos de um ambiente virtual de roteamento para permitir a programação do núcleo de redes de produção. As plataformas de virtualização Xen e KVM foram instaladas em servidores x86 modernos de grande capacidade, e comparadas quanto a eficiência, flexibilidade e capacidade de isolamento entre as redes, que são os requisitos para o bom desempenho de uma rede virtual. Os resultados obtidos nos testes mostram que, apesar de ser uma plataforma de virtualização completa, o KVM possui desempenho melhor que o do Xen no encaminhamento e roteamento de pacotes, quando o VIRTIO é utilizado. Além disso, apenas o Xen apresentou problemas de isolamento entre redes virtuais. Também avaliamos o efeito da arquitetura NUMA, muito comum em servidores x86 modernos, sobre o desempenho das VMs quando muita memória e núcleos de processamento são alocados nelas. A análise dos resultados mostra que o desempenho das operações de Entrada e Saída (E/S) de rede pode ser comprometido, caso as quantidades de memória e CPU virtuais alocadas para a VM não respeitem o tamanho dos nós NUMA existentes no hardware. Por último, estudamos o OpenFlow. Ele permite que redes sejam segmentadas em roteadores, comutadores e em máquinas x86 para que ambientes virtuais de roteamento com lógicas de encaminhamento diferentes possam ser criados. Verificamos que ao ser instalado com o Xen e com o KVM, ele possibilita a migração de redes virtuais entre diferentes nós físicos, sem que ocorram interrupções nos fluxos de dados, além de permitir que o desempenho do encaminhamento de pacotes nas redes virtuais criadas seja aumentado. Assim, foi possível programar o núcleo da rede para implementar alternativas ao protocolo IP. / The aim of this work is to evaluate the performance of routing virtual environments built on x86 machines and network devices existing on the Internet today. Among the most widely used virtualization platforms, we want to identify which best meets the requirements of a virtual routing to allow programming of the core production networks. Virtualization platforms Xen and KVM were installed on modern large capacity x86 machines, and they were compared for efficiency, flexibility and isolation between networks, which are the requirements for good performance of a virtual network. The tests results show that, despite being a full virtualization platform, KVM has better performance than Xen in forwarding and routing packets when the VIRTIO is used. Furthermore, only Xen had isolation problems between networks. We also evaluate the effect of the NUMA architecture, very common in modern x86 servers, on the performance of VMs when lots of memory and processor cores are allocated to them. The results show that Input and Output (I/O) network performance can be compromised whether the amounts of virtual memory and CPU allocated to VM do not respect the size of the existing hardware NUMA nodes. Finally, we study the OpenFlow. It allows slicing networks into routers, switches and x86 machines to create virtual environments with different routing forwarding rules. We found that, when installed with Xen and KVM, it enables the migration of virtual networks among different physical nodes, without interruptions in the data streams, and allows to increase the performance of packet forwarding in the virtual networks created. Thus, it was possible to program the core network to implement alternatives to IP protocol. Engenharia Eletrônica Redes virtuais OpenFlow KVM Xen Ambientes virtuais de roteamento Máquinas x86 Plataformas de virtualização Roteadores virtuais NUMA Electronic Engineering Virtual networks OpenFlow KVM Xen Virtual routing environments x86 machines Virtualizations platforms Virtual routes NUMA ENGENHARIAS
59	Avaliação de desempenho de plataformas de virtualização de redes. / Performance evaluation of network virtualization plataforms. Leopoldo Alexandre Freitas Mauricio 27 August 2013 (has links) O objetivo desta dissertação é avaliar o desempenho de ambientes virtuais de roteamento construídos sobre máquinas x86 e dispositivos de rede existentes na Internet atual. Entre as plataformas de virtualização mais utilizadas, deseja-se identificar quem melhor atende aos requisitos de um ambiente virtual de roteamento para permitir a programação do núcleo de redes de produção. As plataformas de virtualização Xen e KVM foram instaladas em servidores x86 modernos de grande capacidade, e comparadas quanto a eficiência, flexibilidade e capacidade de isolamento entre as redes, que são os requisitos para o bom desempenho de uma rede virtual. Os resultados obtidos nos testes mostram que, apesar de ser uma plataforma de virtualização completa, o KVM possui desempenho melhor que o do Xen no encaminhamento e roteamento de pacotes, quando o VIRTIO é utilizado. Além disso, apenas o Xen apresentou problemas de isolamento entre redes virtuais. Também avaliamos o efeito da arquitetura NUMA, muito comum em servidores x86 modernos, sobre o desempenho das VMs quando muita memória e núcleos de processamento são alocados nelas. A análise dos resultados mostra que o desempenho das operações de Entrada e Saída (E/S) de rede pode ser comprometido, caso as quantidades de memória e CPU virtuais alocadas para a VM não respeitem o tamanho dos nós NUMA existentes no hardware. Por último, estudamos o OpenFlow. Ele permite que redes sejam segmentadas em roteadores, comutadores e em máquinas x86 para que ambientes virtuais de roteamento com lógicas de encaminhamento diferentes possam ser criados. Verificamos que ao ser instalado com o Xen e com o KVM, ele possibilita a migração de redes virtuais entre diferentes nós físicos, sem que ocorram interrupções nos fluxos de dados, além de permitir que o desempenho do encaminhamento de pacotes nas redes virtuais criadas seja aumentado. Assim, foi possível programar o núcleo da rede para implementar alternativas ao protocolo IP. / The aim of this work is to evaluate the performance of routing virtual environments built on x86 machines and network devices existing on the Internet today. Among the most widely used virtualization platforms, we want to identify which best meets the requirements of a virtual routing to allow programming of the core production networks. Virtualization platforms Xen and KVM were installed on modern large capacity x86 machines, and they were compared for efficiency, flexibility and isolation between networks, which are the requirements for good performance of a virtual network. The tests results show that, despite being a full virtualization platform, KVM has better performance than Xen in forwarding and routing packets when the VIRTIO is used. Furthermore, only Xen had isolation problems between networks. We also evaluate the effect of the NUMA architecture, very common in modern x86 servers, on the performance of VMs when lots of memory and processor cores are allocated to them. The results show that Input and Output (I/O) network performance can be compromised whether the amounts of virtual memory and CPU allocated to VM do not respect the size of the existing hardware NUMA nodes. Finally, we study the OpenFlow. It allows slicing networks into routers, switches and x86 machines to create virtual environments with different routing forwarding rules. We found that, when installed with Xen and KVM, it enables the migration of virtual networks among different physical nodes, without interruptions in the data streams, and allows to increase the performance of packet forwarding in the virtual networks created. Thus, it was possible to program the core network to implement alternatives to IP protocol. Engenharia Eletrônica Redes virtuais OpenFlow KVM Xen Ambientes virtuais de roteamento Máquinas x86 Plataformas de virtualização Roteadores virtuais NUMA Electronic Engineering Virtual networks OpenFlow KVM Xen Virtual routing environments x86 machines Virtualizations platforms Virtual routes NUMA ENGENHARIAS
60	Contribution à la modélisation numérique de la propagation des ondes sismiques sur architectures multicoeurs et hiérarchiques Dupros, Fabrice 13 December 2010 (has links) (PDF) En termes de prévention du risque associé aux séismes, la prédiction quantitative des phénomènes de propagation et d'amplification des ondes sismiques dans des structures géologiques complexes devient essentielle. Dans ce domaine, la simulation numérique est prépondérante et l'exploitation efficace des techniques de calcul haute performance permet d'envisager les modélisations à grande échelle nécessaires dans le domaine du risque sismique. Plusieurs évolutions récentes au niveau de l'architecture des machines parallèles nécessitent l'adaptation des algorithmes classiques utilisées pour la modélisation sismique. En effet, l'augmentation de la puissance des processeurs se traduit maintenant principalement par un nombre croissant de coeurs de calcul et les puces multicoeurs sont maintenant à la base de la majorité des architectures multiprocesseurs. Ce changement correspond également à une plus grande complexité au niveau de l'organisation physique de la mémoire qui s'articule généralement autour d'une architecture NUMA (Non Uniform Memory Access pour accès mémoire non uniforme)~de profondeur importante. Les contributions de cette thèse se situent à la fois au niveau algorithmique et numérique mais abordent également l'articulation avec les supports d'exécution optimisés pour les architectures multicoeurs. Les solutions retenues sont validées à grande échelle en considérant deux exemples de modélisation sismique. Le premier cas se situe dans la préfecture de Niigata-Chuetsu au Japon (événement du 16 juillet 2007) et repose sur la méthode des différences finies. Le deuxième exemple met en oeuvre la méthode des éléments finis. Un séisme hypothétique dans la région de Nice est modélisé en tenant compte du comportement non linéaire du sol. calcul haute performance modélisation sismique architectures NUMA processeurs multicoeurs

Search results