Global ETD Search

21	Enabling Multi-threaded Applications on Hybrid Shared Memory Manycore Architectures January 2014 (has links) abstract: As the number of cores per chip increases, maintaining cache coherence becomes prohibitive for both power and performance. Non-Coherent Cache (NCC) architectures do away with hardware-based cache coherence, but they are difficult to program. Some existing architectures offer a middle ground by providing some shared memory in hardware. Specifically, the 48-core Intel Single-chip Cloud Computer (SCC) provides some off-chip (DRAM) shared memory and some on-chip (SRAM) shared memory. We call such architectures Hybrid Shared Memory, or HSM, manycore architectures. However, how to efficiently execute multi-threaded programs on HSM architectures is an open problem. To execute a multi-threaded program correctly on HSM architectures, the compiler must: i) identify all the shared data and map it to the shared memory, and ii) map the frequently accessed shared data to the on-chip shared memory. This work presents a source-to-source translator, written using CETUS, that identifies a conservative superset of all the shared data in a multi-threaded application and maps it to the shared memory so that the application can execute on HSM architectures. / Dissertation/Thesis / Masters Thesis Computer Science 2014 Computer science Computer engineering architecture HSM manycore multicore multithreading shared memory
22	Extending FreeRTOS to support dynamic and distributed task mapping in multiprocessor systems / Extensão do FreeRTOS para Suporte ao mapeamento dinâmico e distribuído de tarefas em sistemas multiprocessados Abich, Geancarlo January 2017 (has links) Sistemas de Multiprocessados Embarcados são uma realidade, tanto no setor da indústria e quanto no setor acadêmico. Esses dispositivos oferecem capacidades de processamento paralelo objetivando cobrir requisitos cada vez maiores de aplicações complexas. A carga de trabalho subjacente das aplicações é suscetível a variação em tempo de execução o que, se não for tratada adequadamente, pode levar a degradação de eficiência em desempenho e energia. O aumento contínuo da complexidade da carga de trabalho das aplicações, bem como do tamanho dos sistemas multiprocessados emergentes, requer soluções de mapeamento dinâmicas e distribuídas. A maioria das técnicas de mapeamento propostas são implementações personalizadas, considerando um sistema operacional interno desenvolvido para uma arquitetura de processador específica. Essa prática restringe sua aplicação em outras plataformas, levando a um design extra, revalidação e, consequentemente, um custo oculto que pode ser um tanto quanto alto. Neste cenário, esta dissertação propõe a extensão do FreeRTOS para suportar mapeamento dinâmico e distribuído de tarefas em sistemas multiprocessados. O FreeRTOS tem portabilidade para mais de 30 arquiteturas de processadores embarcados, aumentando a portabilidade de software e reduzindo o tempo de desenvolvimento. A extensão proposta utiliza técnicas de mapeamento que permitem ao FreeRTOS atender a altas demandas de mapeamento de aplicações em tempo de execução. Outra contribuição deste trabalho é o desenvolvimento de um framework que permite a exploração de grandes sistemas fornecendo, simultaneamente, resultados para depuração. 
O framework proposto possibilita a geração automática de plataformas multiprocessadas considerando seu tamanho, a arquitetura do processador e um conjunto de aplicações. A descrição da plataforma resultante é altamente escalável permitindo extração de dados em tempo de execução e alta depuração. Estas características permitiram validar a extensão do FreeRTOS proposta em mais de uma arquitetura de processador da família ARM Cortex-M. Os casos de teste foram executados em plataformas de grande escala e em diferentes níveis de abstração com casos de mais de 120 aplicações incorporando mais de 600 tarefas processadas. Os resultados mostram que a extensão proposta apresenta resultados melhores ou iguais à literatura. / Embedded multiprocessor systems are a reality in both industry and academia. Such devices offer parallel processing capabilities, aiming to cover the increasing requirements of complex applications. Underlying application workloads are susceptible to variation at runtime, which, if not properly handled, may degrade performance and power efficiency. The continuous increase in the complexity of application workloads and in the size of emerging multiprocessor systems calls for dynamic and distributed mapping solutions. The majority of the proposed mapping techniques are bespoke implementations, which consider an in-house operating system developed for a particular processor architecture. This practice restricts their adoption on other platforms, leading to extra design time, re-validation and, consequently, a hidden cost that may well be quite high. In this scenario, this dissertation proposes a FreeRTOS extension that adds support for dynamic and distributed task mapping in multiprocessor systems. FreeRTOS is portable to more than 30 embedded processor architectures, increasing software portability and reducing development time. 
The proposed extension employs mapping techniques that allow FreeRTOS to handle high application-mapping demand at runtime. Another contribution of this work is the development of a framework that enables the exploration of large systems while providing debugging facilities. The proposed framework automatically generates multiprocessor platforms from three parameters: size, processor architecture, and an application set. The resulting platform description is highly scalable and allows runtime data extraction and in-depth debugging. These features made it possible to validate the proposed FreeRTOS extension on more than one processor architecture from the ARM Cortex-M family. Test cases were executed on large-scale platforms and at different levels of abstraction, with cases of more than 120 applications comprising more than 600 tasks. The results show that the proposed extension achieves results equal to or better than those reported in the literature. Microeletrônica Multiprocessadores Dynamic mapping Distributed mapping Embedded kernel Multiprocessor systems Manycore Modelling Simulation
23	Porting a Real-Time Operating System to a Multicore Platform Sjöström Thames, Sixten January 2012 (has links) This thesis is part of the European MANY project. The goal of MANY is to provide developers with tools to develop software for multi and many-core hardware platforms. This is the first thesis that is part of MANY at Enea. The thesis aims to provide a knowledge base about software on many-core at the Enea student research group. More than just providing a knowledge base, a part of the thesis is also to port Enea's operating system OSE to Tilera's many-core processor TILEpro64. The thesis shall also investigate the memory hierarchy and interconnection network of the Tilera processor. The knowledge base about software on many-core was constrained to investigating the shared memory model and operating systems for many-core. This was achieved by investigating prominent academic research about operating systems for many-core processors. The conclusion was that a shared memory model does not scale and for the operating system case, operating systems shall be designed with scalability as one of the most important requirements. This thesis has implemented the hardware abstraction layer required to execute a single-core version of OSE on the TILEpro architecture. This was done in three steps. The Tilera hardware and the OSE software platform were investigated. After that, an OSE target port was chosen as reference architecture. Finally, the hardware dependent parts of the reference software were modified. A foundation has been made for future development. operating system operating systems many-core manycore multicore RTOS distributed operating system Computer Engineering Datorteknik
24	Utilizing Heterogeneity in Manycore Architectures for Streaming Applications Savas, Süleyman January 2017 (has links) In the last decade, we have seen a transition from single-core to manycore in computer architectures due to performance requirements and limitations in power consumption and heat dissipation. The first manycores had homogeneous architectures consisting of a few identical cores. However, the applications, which are executed on these architectures, usually consist of several tasks requiring different hardware resources to be executed efficiently. Therefore, we believe that utilizing heterogeneity in manycores will increase the efficiency of the architectures in terms of performance and power consumption. However, development of heterogeneous architectures is more challenging and the transition from homogeneous to heterogeneous architectures will increase the difficulty of efficient software development due to the increased complexity of the architecture. In order to increase the efficiency of hardware and software development, new hardware design methods and software development tools are required. Additionally, there is a lack of knowledge on the performance of applications when executed on manycore architectures. The transition began with a shift from single-core architectures to homogeneous multicore architectures consisting of a few identical cores. It now continues with a shift from homogeneous architectures with identical cores to heterogeneous architectures with different types of cores specialized for different purposes. However, this transition has increased the complexity of architectures and hence the complexity of software development and execution. In order to decrease the complexity of software development, new software tools are required. 
Additionally, there is a lack of knowledge on what kind of heterogeneous manycore design is most efficient for different applications and on how these applications perform when executed on current commercial manycores. This thesis studies manycore architectures in order to reveal possible uses of heterogeneity in manycores and facilitate choice of architecture for software and hardware developers. It defines a taxonomy for manycore architectures that is based on the levels of heterogeneity they contain and discusses benefits and drawbacks of these levels. Additionally, it evaluates several applications, a dataflow language (CAL), a source-to-source compilation framework (Cal2Many), and a commercial manycore architecture (Epiphany). The compilation framework takes implementations written in the dataflow language as input and generates code targeting different manycore platforms. Based on these evaluations, the thesis identifies the bottlenecks of the architecture. It finally presents a methodology for developing heterogeneous manycore architectures which target specific application domains. Our studies show that using different types of cores in manycore architectures has the potential to increase the performance of streaming applications. If we add specialized hardware blocks to a core, the performance easily increases by 15x for the target application while the core size increases by 40-50%, which can be optimized further. Other results prove that dataflow languages, together with software development tools, decrease software development efforts significantly (25-50%) while having a small impact (2-17%) on the performance. / HiPEC (High Performance Embedded Computing) / NGES (Towards Next Generation Embedded Systems: Utilizing Parallelism and Reconfigurability) Manycores parallel architectures parallelism streaming applications dataflow manycore design heterogeneous manycores Computer Systems Datorsystem
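The 15x speedup and the 40-50% core-size increase quoted in this abstract can be combined into a rough performance-per-area figure. The short sketch below only does that arithmetic; the function name `perf_per_area_gain` and the metric itself are illustrative assumptions and do not come from the thesis.

```python
def perf_per_area_gain(speedup: float, area_increase: float) -> float:
    """Speedup divided by the relative core size after adding hardware.

    area_increase is a fraction, e.g. 0.40 for a 40% larger core.
    """
    return speedup / (1.0 + area_increase)


# Figures quoted in the abstract: 15x speedup, core grows by 40-50%.
low = perf_per_area_gain(15.0, 0.50)
high = perf_per_area_gain(15.0, 0.40)
print(f"performance per unit area improves by roughly {low:.1f}x to {high:.1f}x")
```

Under these assumptions, the specialized core still delivers about ten times more performance per unit of area for the target application, even before the further size optimization the abstract mentions.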
25	Rigorous Design Flow for Programming Manycore Platforms / Flot de conception rigoureux pour la programmation de plates-formes manycore. Bourgos, Paraskevas 09 April 2013 (has links) L'objectif du travail présenté dans cette thèse est de répondre à un verrou fondamental, qui est «comment programmer d'une manière rigoureuse et efficace des applications embarquées sur des plateformes multi-coeurs?». Cette problématique pose plusieurs défis: 1) le développement d'une approche rigoureuse basée sur les modèles pour pouvoir garantir la correction; 2) le « mariage » entre modèle physique et modèle de calcul, c'est-à-dire, l'intégration du fonctionnel et non-fonctionnel; 3) l'adaptabilité. Pour s'attaquer à ces défis, nous avons développé un flot de conception rigoureux autour du langage BIP. Ce flot de conception permet l'exploration de l'espace de conception, le traitement à diffèrent niveaux d'abstraction à la fois pour la plate-forme et l'application, la génération du code et le déploiement sur des plates-formes multi-cœurs. La méthode utilisée s'appuie sur des transformations source-vers-source des modèles BIP. Ces transformations sont correctes-par-construction. Nous illustrons ce flot de conception avec la modélisation et le déploiement de plusieurs applications sur deux plates-formes différentes. La première plate-forme considérée est MPARM, une plate-forme virtuelle, basée sur des processeurs ARM et structurée avec des clusters, où chacun contient plusieurs cœurs. Pour cette plate-forme, nous avons considérée les applications suivantes: la factorisation de Cholesky, le décodage MPEG-2, le décodage MJPEG, la Transformée de Fourier Rapide et un algorithme de demosaicing. La seconde plate-forme est P2012/STHORM, une plate-forme multi-cœur, basée sur plusieurs clusters capable d'une gestion énergétique efficace. L'application considérée sur P2012/STHORM est l'algorithme HMAX. 
Les résultats expérimentaux montrent l'intérêt du flot de conception, notamment l'analyse rapide des performances ainsi que la modélisation au niveau du système, la génération de code et le déploiement. / The advent of many-core platforms is nowadays challenging our capabilities for efficient and predictable design. To meet this challenge, designers need methods and tools for guaranteeing essential properties and determining tradeoffs between performance and efficient resource management. In the process of designing a mixed software/hardware system, functional constraints and also extra-functional specifications should be taken into account as an essential part for the design of embedded systems. The impact of design choices on the overall behavior of the system should also be analyzed. This implies a deep understanding of the interaction between application software and the underlying execution platform. We present a rigorous model-based design flow for building parallel applications running on top of many-core platforms. The flow is based on the BIP - Behavior, Interaction, Priority - component framework and its associated toolbox. The method allows generation of a correct-by-construction mixed hardware/software system model for manycore platforms from an application software and a mapping. It is based on source-to-source correct-by-construction transformations of BIP models. It provides full support for modeling application software and validation of its functional correctness, modeling and performance analysis of system-level models, code generation and deployment on target many-core platforms. Our design flow is illustrated through the modeling and deployment of various software applications on two different hardware platforms; MPARM and platform P2012/STHORM. MPARM is a virtual ARM-based multi-cluster manycore platform, configured by the number of clusters, the number of ARM cores per cluster, and their interconnections. 
On MPARM, the software applications considered are the Cholesky factorization, the MPEG-2 decoding, the MJPEG decoding, the Fast Fourier Transform and the Demosaicing algorithm. Platform 2012 (P2012/STHORM) is a power efficient manycore computing fabric, which is highly modular and based on multiple clusters capable of aggressive fine-grained power management. As a case study on P2012/STHORM, we used the HMAX algorithm. Experimental results show the merits of the design flow, notably performance analysis as well as correct-by-construction system level modeling, code generation and efficient deployment. Systèmes mixtes materiel/logiciel Manycore Modèle formel Correct-by-construction Analyse de performance Génération de code Mixed hardware/software system Manycore Formal model Correct-by-construction Performance analysis Code generation 004
26	Early evaluation of multicore systems soft error reliability using virtual platforms / Avaliação de sistema de larga escala sob à influência de falhas temporárias durante a exploração de inicial projetos através do uso de plataformas virtuais Rosa, Felipe Rocha da January 2018 (has links) A crescente capacidade de computação dos componentes multiprocessados como processadores e unidades de processamento gráfico oferecem novas oportunidades para os campos de pesquisa relacionados computação embarcada e de alto desempenho (do inglês, high-performance computing). A crescente capacidade de computação progressivamente dos sistemas baseados em multicores permite executar eficientemente aplicações complexas com menor consumo de energia em comparação com soluções tradicionais de núcleo único. Essa eficiência e a crescente complexidade das cargas de trabalho das aplicações incentivam a indústria a integrar mais e mais componentes de processamento no mesmo sistema. O número de componentes de processamento empregados em sistemas grande escala já ultrapassa um milhão de núcleos, enquanto as plataformas embarcadas de 1000 núcleos estão disponíveis comercialmente. Além do enorme número de núcleos, a crescente capacidade de processamento, bem como o número de elementos de memória interna (por exemplo, registradores, memória RAM) inerentes às arquiteturas de processadores emergentes, está tornando os sistemas em grande escala mais vulneráveis a erros transientes e permanentes. Além disso, para atender aos novos requisitos de desempenho e energia, os processadores geralmente executam com frequências de relógio agressivos e múltiplos domínios de tensão, aumentando sua susceptibilidade à erros transientes, como os causados por efeitos de radiação. A ocorrência de erros transientes pode causar falhas críticas no comportamento do sistema, o que pode acarretar em perdas de vidas financeiras ou humanas. 
Embora tenha sido observada uma taxa de 280 erros transientes por dia durante o voo de uma nave espacial, os sistemas de processamento que trabalham à nível do solo devem experimentar pelo menos um erro transiente por dia em um futuro próximo. A susceptibilidade crescente de sistemas multicore à erros transientes necessariamente exige novas ferramentas para avaliar a resiliência à erro transientes de componentes multiprocessados em conjunto com pilhas complexas de software (sistema operacional, drivers) durante o início da fase de projeto. O objetivo principal abordado por esta Tese é desenvolver um conjunto de técnicas de injeção de falhas, que formam uma ferramenta de injeção de falha. O segundo objetivo desta Tese é estabelecer as bases para novas disciplinas de gerenciamento de confiabilidade considerando erro transientes em sistemas emergentes multi/manycore utilizando aprendizado de máquina. Este trabalho identifica multiplicas técnicas que podem ser usadas para fornecer diferentes níveis de confiabilidade na carga de trabalho e na criticidade do aplicativo. / The increasing computing capacity of multicore components such as processors and graphics processing units (GPUs) offers new opportunities for the embedded and high-performance computing (HPC) domains. The steadily growing computing capacity of multicore-based systems makes it possible to run complex application workloads efficiently at lower power consumption than traditional single-core solutions. Such efficiency and the ever-increasing complexity of application workloads encourage industry to integrate more and more computing components into the same system. The number of computing components employed in large-scale HPC systems already exceeds a million cores, while 1000-core on-chip platforms are available in the embedded community. 
Beyond the massive number of cores, the increasing computing capacity, as well as the number of internal memory cells (e.g., registers, internal memory) inherent to emerging processor architectures, is making large-scale systems more vulnerable to both hard and soft errors. Moreover, to meet emerging performance and power requirements, the underlying processors usually run at aggressive clock frequencies and with multiple voltage domains, increasing their susceptibility to soft errors, such as the ones caused by radiation effects. The occurrence of soft errors or Single Event Effects (SEEs) may cause critical failures in system behavior, which may lead to financial losses or loss of human life. While a rate of 280 soft errors per day has been observed during the flight of a spacecraft, electronic computing systems working at ground level are expected to experience at least one soft error per day in the near future. The increased susceptibility of multicore systems to SEEs necessarily calls for novel cost-effective tools to assess the soft error resilience of underlying multicore components with complex software stacks (operating system (OS), drivers) early in the design phase. The primary goal of this Thesis is to describe the proposal and development of a fault injection framework using state-of-the-art virtual platforms, to propose a set of novel fault injection techniques that direct the fault campaigns according to the software stack characteristics, and to present an extensive validation of the framework with over a million simulation hours. The second goal of this Thesis is to set the foundations for a new discipline in soft error reliability management for emerging multi/manycore systems using machine learning techniques. It will identify and propose techniques that can be used to provide different levels of reliability according to the application workload and criticality. 
Microeletrônica Tolerancia : Falhas Aprendizado : máquina Multi/Manycore Systems Machine Learning Soft Errors ARM Simulation Virtual Platforms Reliability Fault Tolerance
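As a rough illustration of what a single soft-error injection can look like, the sketch below flips one randomly chosen bit in a simulated register file. It is a minimal sketch under stated assumptions, not the framework described in the thesis: the register names, the 32-bit width, and the helpers `flip_bit` and `inject_fault` are all hypothetical.

```python
import random


def flip_bit(value: int, bit: int, width: int = 32) -> int:
    """Return value with one bit inverted, confined to a width-bit register."""
    if not 0 <= bit < width:
        raise ValueError("bit index outside register width")
    mask = (1 << width) - 1
    return (value ^ (1 << bit)) & mask


def inject_fault(registers: dict, rng: random.Random, width: int = 32):
    """Flip one random bit of one random register in place.

    Returns the (register name, bit index) pair so a campaign can log it.
    """
    name = rng.choice(sorted(registers))
    bit = rng.randrange(width)
    registers[name] = flip_bit(registers[name], bit, width)
    return name, bit


# Hypothetical register file of a simulated core (names are illustrative).
regs = {"r0": 0x0000_0010, "r1": 0xFFFF_0000, "sp": 0x2000_1000}
golden = dict(regs)

target, bit = inject_fault(regs, random.Random(2018))
print(f"flipped bit {bit} of {target}: "
      f"{golden[target]:#010x} -> {regs[target]:#010x}")
```

In a campaign, a step like this would typically be repeated many times with different seeds, and each run's output compared against a fault-free reference run to see whether the flipped bit was masked or changed the result.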
27	Conception, simulation parallèle et implémentation de réseaux sur puce hautes performances tolérants aux fautes / Design, Parallel Simulation and Implementation of High-Performance Fault-Tolerant Network-on-Chip Architectures Charif, Mohamed El Amir 17 November 2017 (has links) Grâce à une réduction considérable dans les dimensions des transistors, les systèmes informatiques sont aujourd'hui capables d'intégrer un très grand nombre de cœurs de calcul en une seule puce (System-on-Chip, SoC). Faire communiquer les composants au sein d'une puce est aujourd'hui assuré par un réseau de commutation de paquet intégré, communément appelé Network-on-Chip (NoC). Cependant, le passage à des technologies de plus en plus réduites rend les circuits plus vulnérables aux fautes et aux défauts de fabrication. Le réseau sur puce peut donc se retrouver avec des routeurs ou des liens non-opérationnels, qui ne peuvent plus être utilisés pour le routage de paquets. Par conséquent, le niveau de flexibilité offert par l'algorithme de routage n'a jamais été aussi important. La première partie de cette thèse consiste à proposer une méthodologie généralisée, permettant de concevoir des algorithmes de routage hautement flexibles, combinant tolérance aux fautes et hautes performances, et ce pour n'importe quelle topologie réseau. Cette méthodologie est basée sur une nouvelle condition suffisante pour l'absence d'interblocages (deadlocks) qui, contrairement aux méthodes existantes qui imposent des restrictions importantes sur l'utilisation des buffers, s'évalue de manière dynamique en fonction de chaque paquet et ne requiert pas un partitionnement stricte des canaux virtuels (virtual channels). Il est montré que ce degré élevé de liberté dans l'utilisation des buffers a un impact positif à la fois sur les performances et sur la robustesse du NoC, sans pour autant augmenter la complexité en termes d'implémentation matérielle. 
La seconde partie de la thèse s'intéresse à une problématique plus spécifique, qui est celle du routage dans des topologies tri-dimensionnelles partiellement connectées, qui vont vraisemblablement être en vigueur à cause du coût important des connexions verticales, réalisées en utilisant la technologie TSV (Through-Silicon Via). Cette thèse introduit un nouvel algorithme de routage pour ce type d'architectures nommé "First-Last". Grâce à un placement original des canaux virtuels, cet algorithme est le seul capable de garantir la connectivité totale du réseau en présence d'un seul pilier de TSVs de coordonnées arbitraires, tout en ne requérant de canaux virtuels que sur deux des ports du routeur. Contrairement à d'autres algorithmes qui utilisent le même nombre total de canaux virtuels, First-Last n'impose aucune règle sur la position des piliers, ni sur les piliers à sélectionner durant l'exécution. De plus, l'algorithme proposé ayant été construit en utilisant la méthode décrite dans la première partie de la thèse, il offre une utilisation optimisée des canaux virtuels ajoutés. L'implémentation d'un nouvel algorithme de routage implique souvent des changements considérables au niveau de la microarchitecture des routeurs. L'évaluation de ces nouvelles solutions requiert donc une plateforme capable de simuler précisément l'architecture matérielle du réseau au cycle près. De plus, il est essentiel de tester les nouvelles architectures sur des tailles de réseau significativement grandes, pour s'assurer de leur scalabilité et leur applicabilité aux technologies émergentes (e.g. intégration 3D). Malheureusement, les simulateurs de réseaux sur puce existants ne sont pas capables d'effectuer des simulations sur de grands réseaux (milliers de cœurs) assez vite, et souvent, la précision des simulations doit être sacrifiée afin d'obtenir des temps de simulation raisonnables. 
En réponse à ce problème, la troisième et dernière partie de cette thèse est consacrée à la conception et au développement d'un modèle de simulation générique, extensible et parallélisable, exploitant la puissance des processeurs graphiques modernes (GPU). L'outil développé modélise l'architecture d'un routeur de manière très précise et peut simuler de très grands réseaux en des temps record. / Networks-on-Chip (NoCs) have proven to be a fast and scalable replacement for buses in current and emerging many-core systems. They are today an actively researched topic and various solutions are being explored to meet the needs of emerging applications in terms of performance, quality of service, power consumption, and fault-tolerance. This thesis presents contributions in two important areas of Network-on-Chip research: (1) the design of ultra-flexible high-performance deadlock-free routing algorithms for any topology, and (2) the design and implementation of parallel cycle-accurate Network-on-Chip simulators for a fast evaluation of new NoC architectures. While aggressive technology scaling has its benefits in terms of delay, area and power, it is also known to increase the vulnerability of circuits, suggesting the need for fault-tolerant designs. Fault-tolerance in NoCs is directly tied to the degree of flexibility of the routing algorithm. High routing flexibility is also required in some irregular topologies, as is the case for TSV-based 3D Networks-on-Chip, wherein only a subset of the routers are connected using vertical connections. Unfortunately, routing freedom is often limited by the deadlock-avoidance method, which statically restricts the set of virtual channels that can be acquired by each packet. The first part of this thesis tackles this issue at the source and introduces a new topology-agnostic methodology for designing ultra-flexible routing algorithms for Networks-on-Chip. 
The theory relies on a novel, weakly restrictive sufficient condition of deadlock-freedom that is expressed using the local information available at each router during runtime, making it possible to verify the condition dynamically in a distributed manner. A significant gain in both performance and fault-tolerance when using our methodology compared to the existing static channel partitioning methods is reported. Moreover, hardware synthesis results show that the newly introduced mechanisms have a negligible impact on the overall router area. In the second part, a novel routing algorithm for vertically-partially-connected 3D Networks-on-Chip called First-Last is constructed using the previously presented methodology. Thanks to a unique distribution of virtual channels, our algorithm is the only one capable of guaranteeing full connectivity in the presence of one TSV pillar in an arbitrary position, while requiring a low number of extra buffers (1 extra VC in the East and North directions). This makes First-Last a highly appealing cost-effective alternative to the state-of-the-art Elevator-First algorithm. Finally, the third and last part of this work presents the first detailed and modular parallel NoC simulator design targeting Graphics Processing Units (GPUs). First, a flexible task decomposition approach, specifically geared towards high parallelization, is proposed. Our approach makes it easy to adapt the granularity of parallelism to match the capabilities of the host GPU. Second, all the GPU-specific implementation issues are addressed and several optimizations are proposed. Our design is evaluated through a reference implementation, which is tested on an NVIDIA GTX 980 Ti graphics card and shown to speed up 4K-node NoC simulations by almost 280x. Tolérance aux fautes Routage Réseaux sur puce Multi-Coeurs Simulation Gpu Fault tolerance Routing Network-On-Chip Manycore Simulation Gpu 620
28	Early evaluation of multicore systems soft error reliability using virtual platforms / Avaliação de sistema de larga escala sob à influência de falhas temporárias durante a exploração de inicial projetos através do uso de plataformas virtuais Rosa, Felipe Rocha da January 2018
The increasing computing capacity of multicore components such as processors and graphics processing units (GPUs) offers new opportunities for the embedded and high-performance computing (HPC) domains. The steadily growing computing capacity of multicore-based systems makes it possible to run complex application workloads efficiently at lower power consumption than traditional single-core solutions. Such efficiency, together with the ever-increasing complexity of application workloads, encourages industry to integrate more and more computing components into the same system. The number of computing components employed in large-scale HPC systems already exceeds a million cores, while 1000-core on-chip platforms are available in the embedded community.
Beyond the massive number of cores, the increasing computing capacity and the growing number of internal memory cells (e.g., registers, internal memory) inherent to emerging processor architectures are making large-scale systems more vulnerable to both hard and soft errors. Moreover, to meet emerging performance and power requirements, the underlying processors usually run at aggressive clock frequencies and with multiple voltage domains, increasing their susceptibility to soft errors such as those caused by radiation effects. The occurrence of soft errors, or Single Event Effects (SEEs), may cause critical failures in system behavior, which may lead to financial losses or loss of human life. While a rate of 280 soft errors per day has been observed during the flight of a spacecraft, electronic computing systems working at ground level are expected to experience at least one soft error per day in the near future. The increased susceptibility of multicore systems to SEEs calls for novel, cost-effective tools to assess the soft error resilience of multicore components running complex software stacks (operating system, drivers) early in the design phase.
The primary goal of this thesis is to describe the proposal and development of a fault injection framework based on state-of-the-art virtual platforms, to propose a set of novel fault injection techniques that direct fault campaigns according to the characteristics of the software stack, and to present an extensive validation of the framework with over a million simulation hours. The second goal is to set the foundations for a new discipline in soft error reliability management for emerging multi/manycore systems using machine learning techniques. The thesis identifies and proposes techniques that can be used to provide different levels of reliability depending on the application workload and criticality.
Microelectronics Fault Tolerance Machine Learning Multi/Manycore Systems Soft Errors ARM Simulation Virtual Platforms Reliability
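To make the notion of a soft error concrete, the following is a minimal sketch of the simplest fault model such an injection campaign might use: a single bit flip in one register. It is not the thesis's framework. The function names, the 32-bit register width, and the dictionary standing in for a register file are all assumptions made for illustration.

```python
import random

REGISTER_WIDTH = 32  # assumption: 32-bit registers


def flip_bit(value, bit, width=REGISTER_WIDTH):
    """Return `value` with one bit inverted, truncated to `width` bits."""
    if not 0 <= bit < width:
        raise ValueError("bit index out of range")
    mask = (1 << width) - 1
    return (value ^ (1 << bit)) & mask


def inject_fault(registers, rng):
    """Flip one randomly chosen bit in one randomly chosen register.

    `registers` is a hypothetical register file (name -> integer value).
    Returns a modified copy plus the chosen register name and bit index,
    so that a campaign could log where the fault was injected.
    """
    name = rng.choice(sorted(registers))
    bit = rng.randrange(REGISTER_WIDTH)
    faulty = dict(registers)
    faulty[name] = flip_bit(registers[name], bit)
    return faulty, name, bit
```

Passing a seeded `random.Random` instance as `rng` makes a campaign reproducible, which matters when a faulty run has to be replayed for debugging.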
30	Système de fichiers scalable pour architectures many-cores à faible empreinte énergétique / Scalable file system for energy-efficient manycore architectures Karaoui, Mohamed Lamine 28 June 2016
In this thesis we study the problems of implementing a scalable file system for a UNIX-like kernel on an energy-efficient NUMA manycore architecture with hardware cache coherence. As references for this study, we use the general-purpose TSAR manycore architecture and ALMOS, a UNIX-like kernel.
From the operating system's point of view, the TSAR architecture raises three problems, for which we offer solutions after describing the existing ones. One of these problems is specific to the TSAR architecture, while the other two are common to coherent NUMA manycore architectures.
The first problem is supporting a physical memory that is larger than the virtual memory. It arises from the extended physical address space of TSAR, which is 256 times larger than the virtual address space. To resolve this problem, we deeply modified the structure of the kernel to decompose it into multiple communicating instances, which communicate mainly by message passing.
The second problem is the placement strategy for the file system structures across the many memory banks. To solve it, we implemented a strategy that distributes the data uniformly over the different memory banks.
The third problem is the synchronization of concurrent accesses to the file system. Our solution combines several mechanisms. In particular, it uses an efficient lock-free mechanism that we designed, which synchronizes accesses between several readers and a single writer.
Experimental results show that: (1) structuring the kernel into multiple communicating instances does not degrade performance and may even improve it; (2) our set of solutions yields results that scale better than the NetBSD kernel; (3) the placement strategy that distributes the data uniformly is the best suited to file systems on manycore architectures.
Operating system Manycore architecture Scalability Parallelism File system Energy consumption Energy-efficient 004.2
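Two points in this abstract lend themselves to a small worked example: the factor of 256 between the physical and virtual address spaces, and the uniform placement of data over memory banks. The sketch below is illustrative only. The round-robin rule, the function names, and the bank and page counts are assumptions, not the placement policy actually implemented in ALMOS.

```python
from collections import Counter


def extra_address_bits(ratio):
    """Number of additional address bits implied by a power-of-two size ratio."""
    if ratio < 1 or ratio & (ratio - 1):
        raise ValueError("ratio must be a positive power of two")
    return ratio.bit_length() - 1


def bank_for_page(page_index, num_banks):
    """Hypothetical round-robin rule: consecutive pages go to consecutive banks."""
    return page_index % num_banks


# A physical space 256 times larger than the virtual one needs 8 more address bits.
print(extra_address_bits(256))

# Spreading 64 pages over 16 banks (both counts are assumed) loads every bank equally.
load = Counter(bank_for_page(page, 16) for page in range(64))
print(sorted(load.values()))
```

A round-robin rule is only one way to obtain a uniform distribution. Its appeal here is that no bank becomes a hotspot when many cores access the file system structures at the same time.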
