41 |
Neutron Beam Testing Methodology and Results for a Complex Programmable Multiprocessor SoCAnderson, Jordan Daniel 01 March 2019 (has links)
The Xilinx Multiprocessor System-on-Chip (MPSoC) is a complex device that uses 16nm FinFET technology to combine multiple processors, a large amount of FPGA resources, and many I/O interfaces on a single chip die. These features make the MPSoC a high-performance and architecturally flexible device. The potential computing power makes the MPSoC ideal for many embedded applications including terrestrial and space applications. The MPSoC, however, does not have extensive radiation history as many other devices have. The extent of the effect that ionized particles may have on the MPSoC is not well established. To solve this problem, neutron radiation testing can be used to determine the device's susceptibility to single-event upsets (SEUs). Though this thesis is not intended to qualify the MPSoC for space, this work does provide useful neutron radiation test data that helps to characterize the susceptible nature of the device. This thesis summarizes the SEU results obtained from neutron testing on the UltraScale+ MPSoC ZU9EG device. A series of three neutron beam tests were performed on the MPSoC ZU9EG at Los Alamos National Laboratories (LANL). Testing was performed using a novel testing methodology to collect SEU counts on the programmable logic and the processing system simultaneously. These results show a 10.1× improvement of the programmable logic CRAM over the previous Xilinx UltraScale device series.
|
42 |
Neutron Beam Testing Methodology and Results for a Complex Programmable Multiprocessor SoCAnderson, Jordan Daniel 01 March 2019 (has links)
The Xilinx Multiprocessor System-on-Chip (MPSoC) is a complex device that uses 16nm FinFET technology to combine multiple processors, a large amount of FPGA resources, and many I/O interfaces on a single chip die. These features make the MPSoC a high-performance and architecturally flexible device. The potential computing power makes the MPSoC ideal for many embedded applications including terrestrial and space applications.The MPSoC, however, does not have extensive radiation history as many other devices have. The extent of the effect that ionized particles may have on the MPSoC is not well established. To solve this problem, neutron radiation testing can be used to determine the device's susceptibility to single-event upsets (SEUs). . Though this thesis is not intended to qualify the MPSoC for space, this work does provide useful neutron radiation test data that helps to characterize the susceptible nature of the device. This thesis summarizes the SEU results obtained from neutron testing on the UltraScale+ MPSoC ZU9EG device. A series of three neutron beam tests were performed on the MPSoC ZU9EG at Los Alamos National Laboratories (LANL). Testing was performed using a novel testing methodology to collect SEU counts on the programmable logic and the processing system simultaneously. These results show a $10.1 timess improvement of the programmable logic CRAM over the previous Xilinx UltraScale device series.
|
43 |
Synthèse des communications dans un environnement de génération de logiciel embarqué pour des plateformes multi-tuiles hétérogènesChagoya-Garzon, A. 03 December 2010 (has links) (PDF)
Dans cette étude, nous nous intéressons aux outils de génération de logiciel embarqué ciblant des plateformes multi-tuiles hétérogènes. Dans ces plateformes, un système sur puce multiprocesseurs hétérogène (ou tuile) est répliqué et connecté par des réseaux externes à la tuile, extensibles et commutés par paquets. Ces outils se basent sur une représentation abstraite de l'architecture, de l'application et du déploiement des éléments applicatifs sur les éléments de l'architecture. Programmer de zéro ces architectures complexes n'est pas concevable, cependant nous ne pouvons nous contenter des environnements de programmation embarqués classiques en raison de l'hétérogénéité de la tuile de base, qui embarque des RISCs, des DSPs et une infrastructure interne à la tuile non uniforme et complexe. L'un des enjeux dans ce contexte est de masquer cette complexité au programmeur de l'application pour qu'il puisse se concentrer sur l'écriture de son programme sans se soucier dans un premier temps de son déploiement sur la plateforme cible. L'une des difficultés des systèmes multi-tuiles est le nombre de chemins de communication que ceux-ci proposent, c'est pourquoi nous nous concentrons dans ce manuscrit sur la gestion (transparente pour le programmeur) des communications dans notre flot. Nous définissons donc les informations minimales à inclure dans le modèle d'entrée du flot pour arriver à synthétiser les communications de l'application. Grâce à ces informations, nous arrivons à puiser les composants logiciels de communication adéquats, qui se présentent sous la forme de pilotes d'un système d'exploitation. Cette sélection n'est pas suffisante, il faut ensuite spécialiser ces composants pour chaque canal de communication de l'application afin d'arriver à un résultat correct. En raison du nombre d'unités de calcul de la plateforme ciblée et du nombre de canaux des applications considérées, une automatisation totale du flot est requise, nous abordons donc les difficultés que cela représente en raison du processus de compilation croisé mis en jeu par le flot, et la solution que nous avons retenue pour arriver à un flot fonctionnel. Trois applications (dont une appartenant au monde du calcul de haute performance), écrites par des programmeurs ne maîtrisant pas la plateforme multi-tuiles choisie, ont été soumises à notre flot, qui a généré de manière correcte plusieurs déploiements de ces applications.
|
44 |
Fault-Tolerant Average Execution Time Optimization for General-Purpose Multi-Processor System-On-ChipsVäyrynen, Mikael January 2009 (has links)
<p>Fault tolerance is due to the semiconductor technology development important, not only for safety-critical systems but also for general-purpose (non-safety critical) systems. However, instead of guaranteeing that deadlines always are met, it is for general-purpose systems important to minimize the average execution time (AET) while ensuring fault tolerance. For a given job and a soft (transient) no-error probability, we define mathematical formulas for AET using voting (active replication), rollback-recovery with checkpointing (RRC) and a combination of these (CRV) where bus communication overhead is included. And, for a given multi-processor system-on-chip (MPSoC), we define integer linear programming (ILP) models that minimize the AET including bus communication overhead when: (1) selecting the number of checkpoints when using RRC or a combination where RRC is included, (2) finding the number of processors and job-to-processor assignment when using voting or a combination where voting is used, and (3) defining fault tolerance scheme (voting, RRC or CRV) per job and defining its usage for each job. Experiments demonstrate significant savings in AET.</p>
|
45 |
Study and design of a manycore architecture with multithreaded processors for dynamic embedded applicationsBechara, Charly 08 December 2011 (has links) (PDF)
Embedded systems are getting more complex and require more intensive processing capabilities. They must be able to adapt to the rapid evolution of the high-end embedded applications that are characterized by their high computation-intensive workloads (order of TOPS: Tera Operations Per Second), and their high level of parallelism. Moreover, since the dynamism of the applications is becoming more significant, powerful computing solutions should be designed accordingly. By exploiting efficiently the dynamism, the load will be balanced between the computing resources, which will improve greatly the overall performance. To tackle the challenges of these future high-end massively-parallel dynamic embedded applications, we have designed the AHDAM architecture, which stands for "Asymmetric Homogeneous with Dynamic Allocator Manycore architecture". Its architecture permits to process applications with large data sets by efficiently hiding the processors' stall time using multithreaded processors. Besides, it exploits the parallelism of the applications at multiple levels so that they would be accelerated efficiently on dedicated resources, hence improving efficiently the overall performance. AHDAM architecture tackles the dynamism of these applications by dynamically balancing the load between its computing resources using a central controller to increase their utilization rate.The AHDAM architecture has been evaluated using a relevant embedded application from the telecommunication domain called "spectrum radio-sensing". With 136 cores running at 500 MHz, AHDAM architecture reaches a peak performance of 196 GOPS and meets the computation requirements of the application.
|
46 |
Estudo sobre o impacto da hierarquia de memória em MPSoCs baseados em NoCSilva, Gustavo Girão Barreto da January 2009 (has links)
Ao longo dos últimos anos, os sistemas embarcados vêm se tornando cada vez mais complexos tanto em termos de hardware quanto de software. Ultimamente têm-se adotado como solução o uso de MPSoCs (sistemas multiprocessados integrados em chip) para uma maior eficiência energética e computacional nestes sistemas. Com o uso de diversos elementos de processamento, redes-em-chip (NoC - networks-on-chip) aparecem como soluções de melhor desempenho do que barramentos. Nestes ambientes cujo desempenho depende da eficiência do modelo de comunicação, a hierarquia de memória se torna um elemento chave. Baseando-se neste cenário, este trabalho realiza uma investigação sobre o impacto da hierarquia de memória em MPSoCs baseados em NoC. Dentro deste escopo foi desenvolvida uma nova organização de memória fisicamente centralizada com diferentes espaços de endereçamentos denominada nDMA. Este trabalho também apresenta uma comparação entre a nova organização e outras três organizações bastante difundidas tais como memória distribuída, memória compartilhada e memória compartilhada distribuída. Estas duas ultimas adotam um modelo de coerência de cache baseado em diretório completamente desenvolvido em hardware. Os modelos de memória foram implementados na plataforma virtual SIMPLE (SIMPLE Multiprocessor Platform Environment). Resultados experimentais mostram uma forte dependência com relação à carga de comunicação gerada pelas aplicações. O modelo de memória distribuída apresenta melhores resultados conforme a carga de comunicação das aplicações é baixa. Por outro lado, o novo modelo de memória fisicamente compartilhado com diferentes espaços de endereçamento apresenta melhores resultados conforme a carga de comunicação das aplicações é alta. Também foram realizados experimentos objetivando analisar o desempenho dos modelos de memória em situações de alta latência de comunicação na rede. Resultados mostram melhores resultados do modelo de memória distribuída quando a carga de comunicação das aplicações é alta e, caso contrário, o modelo nDMA apresenta melhores resultados. Por fim, foram analisados os desempenhos dos modelos de memória durante o processo de migração de tarefas. Neste caso, os modelos de memória compartilhada e compartilhada distribuída apresentaram melhores resultados devido ao fato de que não se faz necessária o envio dos dados da aplicação nestes modelos e também devido ao menor tamanho de código se comparado com os outros modelos. / In the past few the years, embedded systems have become even more complex both on terms of hardware and software. Lately, the use of MPSoCs (Multi-Processor Systems-on-Chip) has been adopted on these systems for a better energetic and computational efficiency. Due to the use of several processing elements, Networks-on-Chip arise as better performance solutions than buses. Considering this scenario, this work performs an investigation on the impact of memory hierarchy in NoC-based MPSoCs. In this context, a new physically centralized and shared memory organization with different address spaces named nDMA was developed. This work also presents a comparison between the new memory organization and three different well-known memory hierarchy models such as distributed memory and shared and distributed shared memories that make use of a fully hardware cache coherence solution. The memory models were implemented in the SIMPLE (SIMPLE Multiprocessor Platform Environment) virtual platform. Experimental results shows a strong dependency on the application communication workload. The distributed memory model presents better results as the application communication workload is low. On the other hand, the new memory model (physically shared with different address spaces) presents better results as the application communication workload is high. There were also experiments aiming at observing the performance of the memory models in situations where the communication latency on the network is high. Results show better results of the distributed memory model when the application communication workload is high, and the nDMA model presents better results otherwise. Finally, the performance of the memory models during a task migration process were evaluated. In this case, the shared memory and distributed shared memory models presented better results due to the fact that in this case the data memory does not need to be transferred from one point to another and also due to the low size of the memory code in these cases if compared to other memory models.
|
47 |
Desenvolvimento e avaliação de redes-em-chip hierárquicas e reconfiguráveis para MPSoCs / Development and evaluation of hierarchical and reconfigurable networks-on-chip for MPSoCsReinbrecht, Cezar Rodolfo Wedig January 2012 (has links)
Com o advento dos processos submicrônicos, a capacidade de integração de transistores numa mesma pastilha de silício atingiu níveis que possibilitaram a construção dos sistemas com múltiplos processadores num chip (MPSoCs, do inglês MultiProcessor System-on-Chip). Essa possibilidade de integração permite inserir dezenas de Elementos de Processamento (EPs) nos circuitos integrados atuais, e já se projeta centenas de EPs para os sistemas da próxima década (ITRS, 2011). Nesse cenário, um dos principais desafios se refere ao serviço de interconexão dos EPs, que deve apresentar um desempenho de comunicação necessário para as aplicações em execução sem comprometer as limitações de consumo de área e energia do circuito. Nos primeiros sistemas multiprocessados, com poucos nodos, arquiteturas baseadas em barramento foram suficientes para cumprir esses requisitos. Porém, o número de elementos nos sistemas recentes aumentou rapidamente, tornando as redes-em-chip a solução mais apropriada, por aliar escalabilidade e reuso na mesma estrutura. Contudo, diante da previsão de que essa tendência de aumento se manterá retorna a discussão se as redes-em-chip atuais continuarão adequadas para os futuros sistemas. De fato, o custo das redes-em-chip convencionais pode se tornar proibitivo para as escalas dos circuitos em um futuro próximo. Novas propostas têm sido apresentadas na literatura científica onde se podem destacar duas principais estratégias de projeto às redes de interconexão: reconfiguração arquitetural e organização hierárquica da topologia. A reconfiguração arquitetural permite obter uma grande eficiência, independente do tipo de aplicação em execução, pois uma das alternativas é projetar o circuito para que ele se auto adapte conforme os requisitos de desempenho para cada aplicação. Por outro lado, arquiteturas organizadas em topologias hierárquicas são desenvolvidas para uma estrutura computacional definida em tempo de projeto, sendo mais eficazes para uma classe de aplicações. O presente trabalho explora a sinergia da combinação das potencialidades das duas soluções e propõe uma nova estrutura que oferece melhor desempenho para uma classe maior de aplicações apropriada para os futuros sistemas. Como resultado foi implementada uma arquitetura adaptativa chamada MINoC (Multiple Interconnections Networks-on-Chip), uma arquitetura organizada em hierarquia, chamada HiCIT (Hierarchical Crossbar-based Interconnection Topology) e uma simbiose de ambas culminando na arquitetura hierárquica adaptativa HASIN (Hierarchical Adaptive Switching Interconnection Network). São apresentados resultados que mostram a eficiência desses conceitos validando a proposta hierárquica adaptativa. / With the advent of submicron processes, the number of transistors integrated on a single chip has reached levels that allowed the design of Multiprocessor Systems-on-Chip (MPSoCs). This capability allows the integration of several processing elements (PEs) in integrated circuits designed nowadays. In the next decade it is expected that hundreds of PEs will be integrated on a single chip. In this scenario, a key challenge is the interconnection network between PEs, which must provide the communication service required to run applications without compromising the limitations of area and energy consumption. In the first multiprocessor systems, with few nodes, bus-based approaches have been sufficient to meet these requirements. However, current systems increased quickly the number of elements, making the Networks-on-Chip (NoCs) the most appropriate solution, because it handles scalability and reusability in the same structure. Nevertheless, ITRS roadmap predicts that this increase will continue (ITRS, 2011), which resumes the discussion if present NoC architectures will be the most adequate for future systems, since its costs could be prohibitive. Therefore, new proposals have been presented in the literature with two main design strategies: architectural reconfiguration and hierarchical organization of the topology. With the architectural reconfiguration it is possible to obtain an application independent high efficiency structure, because the circuit is designed to adapt itself to satisfy performance requirements. On the other hand, architectural organizations in hierarchical topologies are defined at design time to have the most appropriate features for a class of applications, being very effective. The current work identified the synergy of both approaches and proposes a new symbiotic structure suitable for a broader class of applications. As a result, it was implemented an adaptive architecture called MINoC (Multiple Interconexions Networks-on-chip), an architecture organized in hierarchy called HiCIT (Hierarchical Crossbar-based Interconnection Topology) and a mix of both ending up with the hierarchical adaptive architecture HASIN (Hierarchical Interconnection Network Adaptive Switching). Results show the efficiency of these concepts validating the proposed hierarchical adaptive architecture.
|
48 |
Software performance estimation in MPSoC design / Estimativa de desempenho de software embarcado em sistemas multiprocessadores em uma única pastilhaOyamada, Marcio Seiji January 2007 (has links)
Atualmente, novas metodologias de projeto são necessárias devido a crescente complexidade dos sistemas embarcados. Metodologias no nível de sistema são propostas para auxiliar o projetista a lidar com a crescente complexidade, iniciando o projeto em um nível de abstração mais alto que o nível de transferência de registradores. Ferramentas de estimativa de desempenho são uma importante parte das metodologias no nível de sistema, visto que as mesmas auxiliam a exploração do espaço de projeto desde os estágios iniciais. O objetivo desta tese é definir uma metodologia integrada para estimativa de desempenho do software. Atualmente, nota-se a crescente utilização de software embarcado, inclusive utilizando múltiplos processadores, visando atender os requisitos de flexibilidade, desempenho e potência consumida. O desenvolvimento de estimadores de desempenho de software não é trivial, devido à utilização de processadores embarcados com arquiteturas avançadas. Para auxiliar a seleção do processador no nível da especificação do sistema, um novo modelo de estimador do desempenho do software baseado em redes neurais é proposto. Redes neurais mostraram-se uma solução adequada para uma rápida estimativa de desempenho em um estágio inicial do projeto. Para realizar a análise do desempenho do software no nível funcional do barramento, onde o mapeamento do hardware e software já está definido, é utilizado um modelo global de simulação, chamado de protótipo virtual. A metodologia de análise de desempenho proposta neste trabalho é integrada a um ambiente para refinamento de interfaces de hardware e software chamada ROSES. A metodologia proposta é avaliada através de um estudo de caso de uma arquitetura multiprocessada de um codificador MPEG4. / Nowadays, embedded system complexity requires new design methodologies. System-level methodologies are proposed to cope with this complexity, starting the design above the register-transfer level. Performance estimation tools are an important piece of system-level design methodologies, since they are used to aid design space exploration at an early design stage. The goal of this thesis is to define an integrated methodology for software performance estimation. Currently, embedded software usage is increasing, becoming multiprocessor system-on-chip a common solution to cope with flexibility, performance, and power requirements. The development of accurate software performance estimators is not trivial, due to the increased complexity of embedded processors. To drive processor selection at specification level, a novel analytic software performance estimator based on neural networks is proposed. The neural network enables a fast estimation at an early design stage. To target the software performance analysis at bus functional level, where mapping of the hardware and software components is already established, we use a global simulation model supporting performance profiling. The proposed software performance estimation methodology is linked to a hardware and software interface refinement environment named ROSES. The proposed methodology is evaluated through a case study of a multiprocessor MPEG4 encoder.
|
49 |
Software performance estimation in MPSoC design / Estimativa de desempenho de software embarcado em sistemas multiprocessadores em uma única pastilhaOyamada, Marcio Seiji January 2007 (has links)
Atualmente, novas metodologias de projeto são necessárias devido a crescente complexidade dos sistemas embarcados. Metodologias no nível de sistema são propostas para auxiliar o projetista a lidar com a crescente complexidade, iniciando o projeto em um nível de abstração mais alto que o nível de transferência de registradores. Ferramentas de estimativa de desempenho são uma importante parte das metodologias no nível de sistema, visto que as mesmas auxiliam a exploração do espaço de projeto desde os estágios iniciais. O objetivo desta tese é definir uma metodologia integrada para estimativa de desempenho do software. Atualmente, nota-se a crescente utilização de software embarcado, inclusive utilizando múltiplos processadores, visando atender os requisitos de flexibilidade, desempenho e potência consumida. O desenvolvimento de estimadores de desempenho de software não é trivial, devido à utilização de processadores embarcados com arquiteturas avançadas. Para auxiliar a seleção do processador no nível da especificação do sistema, um novo modelo de estimador do desempenho do software baseado em redes neurais é proposto. Redes neurais mostraram-se uma solução adequada para uma rápida estimativa de desempenho em um estágio inicial do projeto. Para realizar a análise do desempenho do software no nível funcional do barramento, onde o mapeamento do hardware e software já está definido, é utilizado um modelo global de simulação, chamado de protótipo virtual. A metodologia de análise de desempenho proposta neste trabalho é integrada a um ambiente para refinamento de interfaces de hardware e software chamada ROSES. A metodologia proposta é avaliada através de um estudo de caso de uma arquitetura multiprocessada de um codificador MPEG4. / Nowadays, embedded system complexity requires new design methodologies. System-level methodologies are proposed to cope with this complexity, starting the design above the register-transfer level. Performance estimation tools are an important piece of system-level design methodologies, since they are used to aid design space exploration at an early design stage. The goal of this thesis is to define an integrated methodology for software performance estimation. Currently, embedded software usage is increasing, becoming multiprocessor system-on-chip a common solution to cope with flexibility, performance, and power requirements. The development of accurate software performance estimators is not trivial, due to the increased complexity of embedded processors. To drive processor selection at specification level, a novel analytic software performance estimator based on neural networks is proposed. The neural network enables a fast estimation at an early design stage. To target the software performance analysis at bus functional level, where mapping of the hardware and software components is already established, we use a global simulation model supporting performance profiling. The proposed software performance estimation methodology is linked to a hardware and software interface refinement environment named ROSES. The proposed methodology is evaluated through a case study of a multiprocessor MPEG4 encoder.
|
50 |
Architectures logicielles pour la radio flexible : intégration d'unités de calcul hétérogènes / Software design for flexible radio : integration of heterogeneous computing unitsHorrein, Pierre-Henri 10 January 2012 (has links)
L'utilisation de la radio flexible permet d'envisager des réseaux sans fil évolutifs, plus efficaces et plus intelligents. Le terme «~radio flexible~» est un terme très général, qui peut s'appliquer à une implanta- tion logicielle des opérations, à une implantation matérielle adaptable basée sur des accélérateurs matériels, ou encore à des implantations mixtes. Il regroupe en fait toutes les implantations de terminaux radio qui ne sont pas figées. Les travaux réalisés durant cette thèse tournent autour de deux points. Le premier est l'utilisation des processeurs graphiques pour la radio flexible, et plus particulièrement pour la radio logicielle. Ces cibles offrent des performances impressionnantes en termes de débit brut de calcul, en se basant sur architecture massivement parallèle. Le parallélisme de données utilisé dans ces processeurs diffère cependant du parallélisme de tâches souvent exploitées dans les applications de radio logicielle. Différentes approches pour résoudre ce problème sont étudiées. Les résultats obtenus sur ce point permettent une nette amélioration du débit de calcul atteignable avec une implantation logicielle, tout en libérant le processeur pour d'autres tâches. Le deuxième point abordé dans cette étude concerne la définition d'un environnement perme- ttant de supporter différentes possibilités d'implantation de la radio flexible. Cet environnement englobe le support de la plateforme hétérogène, et la gestion des applications sur ces plateformes. Bien qu'encore expérimental, les premiers résultats obtenus avec l'environnement montrent ses capacités d'adaptation, et le rendent utilisable pour des applications radio variées sur des plateformes hétérogènes. / The development of flexible radio leads to evolving wireless networks. This character- istic enables more efficient and smarter networks. Flexible radio is not a precise definition. It can be used to describe a software implementation of radio operations, as well as a hardware implementation based on configurable hardware coprocessors. It can also refer to heterogeneous implementations. It describes any implementation which can evolve. During this PhD, we focused on two different aspects of flexible radio. First, the use of graphi- cal processors for flexible radio (and especially software radio) is studied. These execution targets enable impressive performances, when studying raw attainable processing throughput, through the use of massively parallel architectures. The problem is that the data parallelism exhibited by these processors does not match the task parallelism of software radio applications. Different approaches to correct this mismatch are studied in this work. The displayed results show an improvement in the at- tainable software implementation, while letting the processor process other tasks. The other issue addressed in this work is the definition of an environment able to support dif- ferent implementation choices for flexible radio. Support for multiple implementations includes heterogeneous platforms support, as well as application management on these platforms. While this environment is still in early development stage, preliminary results demonstrate its adaptabil ity, and eases development of applications on different heterogeneous platforms.
|
Page generated in 0.03 seconds