Global ETD Search

51	Estudo sobre o impacto da hierarquia de memória em MPSoCs baseados em NoC Silva, Gustavo Girão Barreto da January 2009 (has links) Ao longo dos últimos anos, os sistemas embarcados vêm se tornando cada vez mais complexos tanto em termos de hardware quanto de software. Ultimamente têm-se adotado como solução o uso de MPSoCs (sistemas multiprocessados integrados em chip) para uma maior eficiência energética e computacional nestes sistemas. Com o uso de diversos elementos de processamento, redes-em-chip (NoC - networks-on-chip) aparecem como soluções de melhor desempenho do que barramentos. Nestes ambientes cujo desempenho depende da eficiência do modelo de comunicação, a hierarquia de memória se torna um elemento chave. Baseando-se neste cenário, este trabalho realiza uma investigação sobre o impacto da hierarquia de memória em MPSoCs baseados em NoC. Dentro deste escopo foi desenvolvida uma nova organização de memória fisicamente centralizada com diferentes espaços de endereçamentos denominada nDMA. Este trabalho também apresenta uma comparação entre a nova organização e outras três organizações bastante difundidas tais como memória distribuída, memória compartilhada e memória compartilhada distribuída. Estas duas ultimas adotam um modelo de coerência de cache baseado em diretório completamente desenvolvido em hardware. Os modelos de memória foram implementados na plataforma virtual SIMPLE (SIMPLE Multiprocessor Platform Environment). Resultados experimentais mostram uma forte dependência com relação à carga de comunicação gerada pelas aplicações. O modelo de memória distribuída apresenta melhores resultados conforme a carga de comunicação das aplicações é baixa. Por outro lado, o novo modelo de memória fisicamente compartilhado com diferentes espaços de endereçamento apresenta melhores resultados conforme a carga de comunicação das aplicações é alta. Também foram realizados experimentos objetivando analisar o desempenho dos modelos de memória em situações de alta latência de comunicação na rede. Resultados mostram melhores resultados do modelo de memória distribuída quando a carga de comunicação das aplicações é alta e, caso contrário, o modelo nDMA apresenta melhores resultados. Por fim, foram analisados os desempenhos dos modelos de memória durante o processo de migração de tarefas. Neste caso, os modelos de memória compartilhada e compartilhada distribuída apresentaram melhores resultados devido ao fato de que não se faz necessária o envio dos dados da aplicação nestes modelos e também devido ao menor tamanho de código se comparado com os outros modelos. / In the past few the years, embedded systems have become even more complex both on terms of hardware and software. Lately, the use of MPSoCs (Multi-Processor Systems-on-Chip) has been adopted on these systems for a better energetic and computational efficiency. Due to the use of several processing elements, Networks-on-Chip arise as better performance solutions than buses. Considering this scenario, this work performs an investigation on the impact of memory hierarchy in NoC-based MPSoCs. In this context, a new physically centralized and shared memory organization with different address spaces named nDMA was developed. This work also presents a comparison between the new memory organization and three different well-known memory hierarchy models such as distributed memory and shared and distributed shared memories that make use of a fully hardware cache coherence solution. The memory models were implemented in the SIMPLE (SIMPLE Multiprocessor Platform Environment) virtual platform. Experimental results shows a strong dependency on the application communication workload. The distributed memory model presents better results as the application communication workload is low. On the other hand, the new memory model (physically shared with different address spaces) presents better results as the application communication workload is high. There were also experiments aiming at observing the performance of the memory models in situations where the communication latency on the network is high. Results show better results of the distributed memory model when the application communication workload is high, and the nDMA model presents better results otherwise. Finally, the performance of the memory models during a task migration process were evaluated. In this case, the shared memory and distributed shared memory models presented better results due to the fact that in this case the data memory does not need to be transferred from one point to another and also due to the low size of the memory code in these cases if compared to other memory models. Microeletrônica MPSoC NoC Embedded systems Multiprocessor system-on-chip Network-on-chip Cache coherence Task migration
52	Desenvolvimento e avaliação de redes-em-chip hierárquicas e reconfiguráveis para MPSoCs / Development and evaluation of hierarchical and reconfigurable networks-on-chip for MPSoCs Reinbrecht, Cezar Rodolfo Wedig January 2012 (has links) Com o advento dos processos submicrônicos, a capacidade de integração de transistores numa mesma pastilha de silício atingiu níveis que possibilitaram a construção dos sistemas com múltiplos processadores num chip (MPSoCs, do inglês MultiProcessor System-on-Chip). Essa possibilidade de integração permite inserir dezenas de Elementos de Processamento (EPs) nos circuitos integrados atuais, e já se projeta centenas de EPs para os sistemas da próxima década (ITRS, 2011). Nesse cenário, um dos principais desafios se refere ao serviço de interconexão dos EPs, que deve apresentar um desempenho de comunicação necessário para as aplicações em execução sem comprometer as limitações de consumo de área e energia do circuito. Nos primeiros sistemas multiprocessados, com poucos nodos, arquiteturas baseadas em barramento foram suficientes para cumprir esses requisitos. Porém, o número de elementos nos sistemas recentes aumentou rapidamente, tornando as redes-em-chip a solução mais apropriada, por aliar escalabilidade e reuso na mesma estrutura. Contudo, diante da previsão de que essa tendência de aumento se manterá retorna a discussão se as redes-em-chip atuais continuarão adequadas para os futuros sistemas. De fato, o custo das redes-em-chip convencionais pode se tornar proibitivo para as escalas dos circuitos em um futuro próximo. Novas propostas têm sido apresentadas na literatura científica onde se podem destacar duas principais estratégias de projeto às redes de interconexão: reconfiguração arquitetural e organização hierárquica da topologia. A reconfiguração arquitetural permite obter uma grande eficiência, independente do tipo de aplicação em execução, pois uma das alternativas é projetar o circuito para que ele se auto adapte conforme os requisitos de desempenho para cada aplicação. Por outro lado, arquiteturas organizadas em topologias hierárquicas são desenvolvidas para uma estrutura computacional definida em tempo de projeto, sendo mais eficazes para uma classe de aplicações. O presente trabalho explora a sinergia da combinação das potencialidades das duas soluções e propõe uma nova estrutura que oferece melhor desempenho para uma classe maior de aplicações apropriada para os futuros sistemas. Como resultado foi implementada uma arquitetura adaptativa chamada MINoC (Multiple Interconnections Networks-on-Chip), uma arquitetura organizada em hierarquia, chamada HiCIT (Hierarchical Crossbar-based Interconnection Topology) e uma simbiose de ambas culminando na arquitetura hierárquica adaptativa HASIN (Hierarchical Adaptive Switching Interconnection Network). São apresentados resultados que mostram a eficiência desses conceitos validando a proposta hierárquica adaptativa. / With the advent of submicron processes, the number of transistors integrated on a single chip has reached levels that allowed the design of Multiprocessor Systems-on-Chip (MPSoCs). This capability allows the integration of several processing elements (PEs) in integrated circuits designed nowadays. In the next decade it is expected that hundreds of PEs will be integrated on a single chip. In this scenario, a key challenge is the interconnection network between PEs, which must provide the communication service required to run applications without compromising the limitations of area and energy consumption. In the first multiprocessor systems, with few nodes, bus-based approaches have been sufficient to meet these requirements. However, current systems increased quickly the number of elements, making the Networks-on-Chip (NoCs) the most appropriate solution, because it handles scalability and reusability in the same structure. Nevertheless, ITRS roadmap predicts that this increase will continue (ITRS, 2011), which resumes the discussion if present NoC architectures will be the most adequate for future systems, since its costs could be prohibitive. Therefore, new proposals have been presented in the literature with two main design strategies: architectural reconfiguration and hierarchical organization of the topology. With the architectural reconfiguration it is possible to obtain an application independent high efficiency structure, because the circuit is designed to adapt itself to satisfy performance requirements. On the other hand, architectural organizations in hierarchical topologies are defined at design time to have the most appropriate features for a class of applications, being very effective. The current work identified the synergy of both approaches and proposes a new symbiotic structure suitable for a broader class of applications. As a result, it was implemented an adaptive architecture called MINoC (Multiple Interconexions Networks-on-chip), an architecture organized in hierarchy called HiCIT (Hierarchical Crossbar-based Interconnection Topology) and a mix of both ending up with the hierarchical adaptive architecture HASIN (Hierarchical Interconnection Network Adaptive Switching). Results show the efficiency of these concepts validating the proposed hierarchical adaptive architecture. Microeletrônica Sistemas embarcados MPSoC MPSoCs Network-on-chip Interconnections Adaptive architecture System-on-chip
53	Fault-Tolerant Average Execution Time Optimization for General-Purpose Multi-Processor System-On-Chips Väyrynen, Mikael January 2009 (has links) Fault tolerance is due to the semiconductor technology development important, not only for safety-critical systems but also for general-purpose (non-safety critical) systems. However, instead of guaranteeing that deadlines always are met, it is for general-purpose systems important to minimize the average execution time (AET) while ensuring fault tolerance. For a given job and a soft (transient) no-error probability, we define mathematical formulas for AET using voting (active replication), rollback-recovery with checkpointing (RRC) and a combination of these (CRV) where bus communication overhead is included. And, for a given multi-processor system-on-chip (MPSoC), we define integer linear programming (ILP) models that minimize the AET including bus communication overhead when: (1) selecting the number of checkpoints when using RRC or a combination where RRC is included, (2) finding the number of processors and job-to-processor assignment when using voting or a combination where voting is used, and (3) defining fault tolerance scheme (voting, RRC or CRV) per job and defining its usage for each job. Experiments demonstrate significant savings in AET. Fault tolerance Execution time optimization Rollback recovery with checkpointing Active replication MPSoC Computer Sciences Datavetenskap (datalogi)
54	Estudo sobre o impacto da hierarquia de memória em MPSoCs baseados em NoC Silva, Gustavo Girão Barreto da January 2009 (has links) Ao longo dos últimos anos, os sistemas embarcados vêm se tornando cada vez mais complexos tanto em termos de hardware quanto de software. Ultimamente têm-se adotado como solução o uso de MPSoCs (sistemas multiprocessados integrados em chip) para uma maior eficiência energética e computacional nestes sistemas. Com o uso de diversos elementos de processamento, redes-em-chip (NoC - networks-on-chip) aparecem como soluções de melhor desempenho do que barramentos. Nestes ambientes cujo desempenho depende da eficiência do modelo de comunicação, a hierarquia de memória se torna um elemento chave. Baseando-se neste cenário, este trabalho realiza uma investigação sobre o impacto da hierarquia de memória em MPSoCs baseados em NoC. Dentro deste escopo foi desenvolvida uma nova organização de memória fisicamente centralizada com diferentes espaços de endereçamentos denominada nDMA. Este trabalho também apresenta uma comparação entre a nova organização e outras três organizações bastante difundidas tais como memória distribuída, memória compartilhada e memória compartilhada distribuída. Estas duas ultimas adotam um modelo de coerência de cache baseado em diretório completamente desenvolvido em hardware. Os modelos de memória foram implementados na plataforma virtual SIMPLE (SIMPLE Multiprocessor Platform Environment). Resultados experimentais mostram uma forte dependência com relação à carga de comunicação gerada pelas aplicações. O modelo de memória distribuída apresenta melhores resultados conforme a carga de comunicação das aplicações é baixa. Por outro lado, o novo modelo de memória fisicamente compartilhado com diferentes espaços de endereçamento apresenta melhores resultados conforme a carga de comunicação das aplicações é alta. Também foram realizados experimentos objetivando analisar o desempenho dos modelos de memória em situações de alta latência de comunicação na rede. Resultados mostram melhores resultados do modelo de memória distribuída quando a carga de comunicação das aplicações é alta e, caso contrário, o modelo nDMA apresenta melhores resultados. Por fim, foram analisados os desempenhos dos modelos de memória durante o processo de migração de tarefas. Neste caso, os modelos de memória compartilhada e compartilhada distribuída apresentaram melhores resultados devido ao fato de que não se faz necessária o envio dos dados da aplicação nestes modelos e também devido ao menor tamanho de código se comparado com os outros modelos. / In the past few the years, embedded systems have become even more complex both on terms of hardware and software. Lately, the use of MPSoCs (Multi-Processor Systems-on-Chip) has been adopted on these systems for a better energetic and computational efficiency. Due to the use of several processing elements, Networks-on-Chip arise as better performance solutions than buses. Considering this scenario, this work performs an investigation on the impact of memory hierarchy in NoC-based MPSoCs. In this context, a new physically centralized and shared memory organization with different address spaces named nDMA was developed. This work also presents a comparison between the new memory organization and three different well-known memory hierarchy models such as distributed memory and shared and distributed shared memories that make use of a fully hardware cache coherence solution. The memory models were implemented in the SIMPLE (SIMPLE Multiprocessor Platform Environment) virtual platform. Experimental results shows a strong dependency on the application communication workload. The distributed memory model presents better results as the application communication workload is low. On the other hand, the new memory model (physically shared with different address spaces) presents better results as the application communication workload is high. There were also experiments aiming at observing the performance of the memory models in situations where the communication latency on the network is high. Results show better results of the distributed memory model when the application communication workload is high, and the nDMA model presents better results otherwise. Finally, the performance of the memory models during a task migration process were evaluated. In this case, the shared memory and distributed shared memory models presented better results due to the fact that in this case the data memory does not need to be transferred from one point to another and also due to the low size of the memory code in these cases if compared to other memory models. Microeletrônica MPSoC NoC Embedded systems Multiprocessor system-on-chip Network-on-chip Cache coherence Task migration
55	Desenvolvimento e avaliação de redes-em-chip hierárquicas e reconfiguráveis para MPSoCs / Development and evaluation of hierarchical and reconfigurable networks-on-chip for MPSoCs Reinbrecht, Cezar Rodolfo Wedig January 2012 (has links) Com o advento dos processos submicrônicos, a capacidade de integração de transistores numa mesma pastilha de silício atingiu níveis que possibilitaram a construção dos sistemas com múltiplos processadores num chip (MPSoCs, do inglês MultiProcessor System-on-Chip). Essa possibilidade de integração permite inserir dezenas de Elementos de Processamento (EPs) nos circuitos integrados atuais, e já se projeta centenas de EPs para os sistemas da próxima década (ITRS, 2011). Nesse cenário, um dos principais desafios se refere ao serviço de interconexão dos EPs, que deve apresentar um desempenho de comunicação necessário para as aplicações em execução sem comprometer as limitações de consumo de área e energia do circuito. Nos primeiros sistemas multiprocessados, com poucos nodos, arquiteturas baseadas em barramento foram suficientes para cumprir esses requisitos. Porém, o número de elementos nos sistemas recentes aumentou rapidamente, tornando as redes-em-chip a solução mais apropriada, por aliar escalabilidade e reuso na mesma estrutura. Contudo, diante da previsão de que essa tendência de aumento se manterá retorna a discussão se as redes-em-chip atuais continuarão adequadas para os futuros sistemas. De fato, o custo das redes-em-chip convencionais pode se tornar proibitivo para as escalas dos circuitos em um futuro próximo. Novas propostas têm sido apresentadas na literatura científica onde se podem destacar duas principais estratégias de projeto às redes de interconexão: reconfiguração arquitetural e organização hierárquica da topologia. A reconfiguração arquitetural permite obter uma grande eficiência, independente do tipo de aplicação em execução, pois uma das alternativas é projetar o circuito para que ele se auto adapte conforme os requisitos de desempenho para cada aplicação. Por outro lado, arquiteturas organizadas em topologias hierárquicas são desenvolvidas para uma estrutura computacional definida em tempo de projeto, sendo mais eficazes para uma classe de aplicações. O presente trabalho explora a sinergia da combinação das potencialidades das duas soluções e propõe uma nova estrutura que oferece melhor desempenho para uma classe maior de aplicações apropriada para os futuros sistemas. Como resultado foi implementada uma arquitetura adaptativa chamada MINoC (Multiple Interconnections Networks-on-Chip), uma arquitetura organizada em hierarquia, chamada HiCIT (Hierarchical Crossbar-based Interconnection Topology) e uma simbiose de ambas culminando na arquitetura hierárquica adaptativa HASIN (Hierarchical Adaptive Switching Interconnection Network). São apresentados resultados que mostram a eficiência desses conceitos validando a proposta hierárquica adaptativa. / With the advent of submicron processes, the number of transistors integrated on a single chip has reached levels that allowed the design of Multiprocessor Systems-on-Chip (MPSoCs). This capability allows the integration of several processing elements (PEs) in integrated circuits designed nowadays. In the next decade it is expected that hundreds of PEs will be integrated on a single chip. In this scenario, a key challenge is the interconnection network between PEs, which must provide the communication service required to run applications without compromising the limitations of area and energy consumption. In the first multiprocessor systems, with few nodes, bus-based approaches have been sufficient to meet these requirements. However, current systems increased quickly the number of elements, making the Networks-on-Chip (NoCs) the most appropriate solution, because it handles scalability and reusability in the same structure. Nevertheless, ITRS roadmap predicts that this increase will continue (ITRS, 2011), which resumes the discussion if present NoC architectures will be the most adequate for future systems, since its costs could be prohibitive. Therefore, new proposals have been presented in the literature with two main design strategies: architectural reconfiguration and hierarchical organization of the topology. With the architectural reconfiguration it is possible to obtain an application independent high efficiency structure, because the circuit is designed to adapt itself to satisfy performance requirements. On the other hand, architectural organizations in hierarchical topologies are defined at design time to have the most appropriate features for a class of applications, being very effective. The current work identified the synergy of both approaches and proposes a new symbiotic structure suitable for a broader class of applications. As a result, it was implemented an adaptive architecture called MINoC (Multiple Interconexions Networks-on-chip), an architecture organized in hierarchy called HiCIT (Hierarchical Crossbar-based Interconnection Topology) and a mix of both ending up with the hierarchical adaptive architecture HASIN (Hierarchical Interconnection Network Adaptive Switching). Results show the efficiency of these concepts validating the proposed hierarchical adaptive architecture. Microeletrônica Sistemas embarcados MPSoC MPSoCs Network-on-chip Interconnections Adaptive architecture System-on-chip
56	Software performance estimation in MPSoC design / Estimativa de desempenho de software embarcado em sistemas multiprocessadores em uma única pastilha Oyamada, Marcio Seiji January 2007 (has links) Atualmente, novas metodologias de projeto são necessárias devido a crescente complexidade dos sistemas embarcados. Metodologias no nível de sistema são propostas para auxiliar o projetista a lidar com a crescente complexidade, iniciando o projeto em um nível de abstração mais alto que o nível de transferência de registradores. Ferramentas de estimativa de desempenho são uma importante parte das metodologias no nível de sistema, visto que as mesmas auxiliam a exploração do espaço de projeto desde os estágios iniciais. O objetivo desta tese é definir uma metodologia integrada para estimativa de desempenho do software. Atualmente, nota-se a crescente utilização de software embarcado, inclusive utilizando múltiplos processadores, visando atender os requisitos de flexibilidade, desempenho e potência consumida. O desenvolvimento de estimadores de desempenho de software não é trivial, devido à utilização de processadores embarcados com arquiteturas avançadas. Para auxiliar a seleção do processador no nível da especificação do sistema, um novo modelo de estimador do desempenho do software baseado em redes neurais é proposto. Redes neurais mostraram-se uma solução adequada para uma rápida estimativa de desempenho em um estágio inicial do projeto. Para realizar a análise do desempenho do software no nível funcional do barramento, onde o mapeamento do hardware e software já está definido, é utilizado um modelo global de simulação, chamado de protótipo virtual. A metodologia de análise de desempenho proposta neste trabalho é integrada a um ambiente para refinamento de interfaces de hardware e software chamada ROSES. A metodologia proposta é avaliada através de um estudo de caso de uma arquitetura multiprocessada de um codificador MPEG4. / Nowadays, embedded system complexity requires new design methodologies. System-level methodologies are proposed to cope with this complexity, starting the design above the register-transfer level. Performance estimation tools are an important piece of system-level design methodologies, since they are used to aid design space exploration at an early design stage. The goal of this thesis is to define an integrated methodology for software performance estimation. Currently, embedded software usage is increasing, becoming multiprocessor system-on-chip a common solution to cope with flexibility, performance, and power requirements. The development of accurate software performance estimators is not trivial, due to the increased complexity of embedded processors. To drive processor selection at specification level, a novel analytic software performance estimator based on neural networks is proposed. The neural network enables a fast estimation at an early design stage. To target the software performance analysis at bus functional level, where mapping of the hardware and software components is already established, we use a global simulation model supporting performance profiling. The proposed software performance estimation methodology is linked to a hardware and software interface refinement environment named ROSES. The proposed methodology is evaluated through a case study of a multiprocessor MPEG4 encoder. Sistemas embarcados SoC Microeletrônica Performance estimation MPSoC design Design space exploration
57	Nástroj pro grafické prototypování systémů na čipu / Graphical Tool for Rapid Prototyping of System on the Chip Netočný, Ondřej January 2013 (has links) This thesis deals with design and implementing of a tool for development of MPSoC (multiprocessor systems on chip). It is going to apprise the reader with this matter and introduces several ways how to solve these issues in Codasip Studio IDE (integrated development environment). The graphical editor for multicore system development and a set of support tools for fast and effective development are introduced in this thesis. These are mainly interactive wizards which help user to start new projects. To handle the subject matter it is necessary to understand CodAL language, Eclipse IDE, GMF (Graphical Modeling Framework) and EMF (Eclipse Modeling Framework) which are used for graphical editor implementation.
58	Beschleunigerarchitekturen zur energieeffizienten Datenbank-Anfrageverarbeitung in Mehrprozessorsystemen Haas, Sebastian 21 May 2019 (has links) Die Datenverarbeitung auf einer weltweit stetig wachsenden Informationsmenge und die hohen Anforderungen an die Energieeffizienz der Rechensysteme sind allgegenwärtige Herausforderungen der heutigen Zeit. Dabei werden zunehmend Datenbanken und deren Funktionalitäten eingesetzt, um diese großen Datenmengen effizient zu verwalten, abzuspeichern und zu verarbeiten. Auf Grund ihrer universellen Anwendbarkeit und der hohen Leistungsfähigkeit werden zumeist hoch-performante General-Purpose (GP) Prozessoren für administrative als auch für die Anfrageverarbeitung in Datenbanksystemen eingesetzt. Die Anfrageverarbeitung führt dabei eine Reihe von Operatoren wie z. B. das Suchen, Sortieren, oder Hashing aus, die signifikant die Gesamtleistung des Datenbanksystems beeinflussen. Um die weiter steigenden Anforderungen an Durchsatz, Latenz und Verlustleistung zu erfüllen, wurden bisher die Taktfrequenzen und damit die Leistungsfähigkeit von GP-Prozessoren kontinuierlich erhöht. In Zukunft werden jedoch die physikalischen Eigenschaften der verwendeten Halbleitermaterialien die Rechenleistung begrenzen. Diese Arbeit entwickelt und analysiert Beschleunigerarchitekturen für die Datenbank-Anfrageverarbeitung, um die Leistungsfähigkeit der zugrundeliegenden Datenbankoperatoren zu steigern. Die Datenbankbeschleuniger (DBA) werden als anwendungsspezifische Hardwareblöcke (ASIC) und als Prozessor mit erweiterten Befehlssatz (ASIP) implementiert, die eine Parallelisierung der Algorithmen auf Bit-, Daten- und Befehlsebene ermöglichen. Der erste Ansatz erlaubt eine hohe Beschleunigung bei gleichzeitig niedrigem Flächen- und Leistungsverbrauch der Hardware. Im Gegensatz dazu steht beim ASIP-Ansatz bereits ein konfigurierbarer Basisprozessor zur Verfügung, der die Befehlssteuerung übernimmt und damit eine einfache Anpassung des DBAs an zahlreiche Datenbankoperatoren ermöglicht. Die vorgestellten DBAs erreichen damit die Leistungsfähigkeit von optimierten GP-Prozessoren bei einer um bis zu drei Größenordnungen höheren Energie- und Flächeneffizienz. Für die Parallelisierung der Datenbankoperatoren auf Taskebene werden die DBAs in das Tomahawk Multiprozessorsystem auf einem Chip (MPSoC) integriert, das ein skalierbares Network-on-Chip und DMA-Controller für einen intelligenten Datentransfer bereitstellt. Eine zentrale Scheduling-Einheit arbeitet dabei den Anfrageausführungsplan ab und steuert die Zuweisung der Tasks auf die Verarbeitungseinheiten und den Transfer der Daten zu einem externen Speicher. Des Weiteren ist die Skalierung von Taktfrequenzen und Versorgungsspannungen möglich, um Durchsatz und Leistungsverbrauch an die Lastanforderungen anzupassen und damit den Energieverbrauch zu minimieren. Darüber hinaus wird das Tomahawk MPSoC mit Hilfe von Simulationen in einem virtuellen Prototyp und mit analytischen Modellen der Datenbankoperatoren hinsichtlich der Skalierbarkeit untersucht. Diese Auswertungen zeigen das Verhalten der Algorithmen bei steigender Prozessoranzahl und wachsenden Kardinalitäten sowie in Abhängigkeit der Speicherbandbreiten und relevanter algorithmusspezifischer Parameter. info:eu-repo/classification/ddc/621.3 ddc:621.3
59	A Novel Prototyping and Evaluation Framework for NoC-Based MPSoC Tatas, K., Siozios, K., Bartzas, A., Kyriacou, Costas, Soudris, D. January 2013 (has links) No / This paper presents a framework for high-level exploration, Register Transfer-Level (RTL) design and rapid prototyping of Network-on-Chip (NoC) architectures. From the high-level exploration, a selected NoC topology is derived, which is then implemented in RTL using an automated design flow. Furthermore, for verification purposes, appropriate self-checking testbenches for the verification of the RTL and architecture files for the semi-automatic implementation of the system in Xilinx EDK are also generated, significantly reducing design and verification time, and therefore Non-Recurring Engineering (NRE) cost. Simulation and FPGA implementation results are given for four case studies multimedia applications, proving the validity of the proposed approach.
60	Optimization and Scheduling on Heterogeneous CPU/FPGA Architecture with Communication Delays / Optimisation et ordonnancement sur une architecture hétérogène CPU/FPGA avec délais de communication Abdallah, Fadel 21 December 2017 (has links) Le domaine de l'embarqué connaît depuis quelques années un essor important avec le développement d'applications de plus en plus exigeantes en calcul auxquels les architectures traditionnelles à base de processeurs (mono/multi cœur) ne peuvent pas toujours répondre en termes de performances. Si les architectures multiprocesseurs ou multi cœurs sont aujourd'hui généralisées, il est souvent nécessaire de leur adjoindre des circuits de traitement dédiés, reposant notamment sur des circuits reconfigurables, permettant de répondre à des besoins spécifiques et à des contraintes fortes particulièrement lorsqu'un traitement temps-réel est requis. Ce travail présente l'étude des problèmes d'ordonnancement dans les architectures hétérogènes reconfigurables basées sur des processeurs généraux (CPUs) et des circuits programmables (FPGAs). L'objectif principal est d'exécuter une application présentée sous la forme d'un graphe de précédence sur une architecture hétérogène CPU/FPGA, afin de minimiser le critère de temps d'exécution total ou makespan (Cmax). Dans cette thèse, nous avons considéré deux cas d'étude : un cas d'ordonnancement qui tient compte des délais d'intercommunication entre les unités de calcul CPU et FPGA, pouvant exécuter une seule tâche à la fois, et un autre cas prenant en compte le parallélisme dans le FPGA, qui peut exécuter plusieurs tâches en parallèle tout en respectant la contrainte surfacique. Dans un premier temps, pour le premier cas d'étude, nous proposons deux nouvelles approches d'optimisation, GAA (Genetic Algorithm Approach) et MGAA (Modified Genetic Algorithm Approach), basées sur des algorithmes génétiques. Nous proposons également de tester un algorithme par séparation et évaluation (méthode Branch & Bound). Les approches GAA et MGAA proposées offrent un très bon compromis entre la qualité des solutions obtenues (critère d'optimisation de makespan) et le temps de calcul nécessaire à leur obtention pour résoudre des problèmes à grande échelle, en comparant à la méthode par séparation et évaluation (Branch & Bound) proposée et l'autre méthode exacte proposée dans la littérature. Dans un second temps, pour le second cas d'étude, nous avons proposé et implémenté une méthode basée sur les algorithmes génétiques pour résoudre le problème du partitionnement temporel dans un circuit FPGA en utilisant la reconfiguration dynamique. Cette méthode fournit de bonnes solutions avec des temps de calcul raisonnables. Nous avons ensuite amélioré notre précédente approche MGAA afin d'obtenir une nouvelle approche intitulée MGA (Multithreaded Genetic Algorithm), permettent d'apporter des solutions au problème de partitionnement. De plus, nous avons également proposé un algorithme basé sur le recuit simulé, appelé MSA (Multithreaded Simulated Annealing). Ces deux approches proposées, basées sur les méthodes métaheuristiques, permettent de fournir des solutions approchées dans un intervalle de temps très raisonnable aux problèmes d'ordonnancement et de partitionnement sur système de calcul hétérogène / The domain of the embedded systems becomes more and more attractive in recent years with the development of increasing computationally demanding applications to which the traditional processor-based architectures (either single or multi-core) cannot always respond in terms of performance. While multiprocessor or multicore architectures have now become generalized, it is often necessary to add to them dedicated processing circuits, based in particular on reconfigurable circuits, to meet specific needs and strong constraints, especially when real-time processing is required. This work presents the study of scheduling problems into the reconfigurable heterogeneous architectures based on general processors (CPUs) and programmable circuits (FPGAs). The main objective is to run an application presented in the form of a Data Flow Graph (DFG) on a heterogeneous CPU/FPGA architecture in order to minimize the total running time or makespan criterion (Cmax). In this thesis, we have considered two case studies: a scheduling case taking into account the intercommunication delays and where the FPGA device can perform a single task at a time, and another case taking into account parallelism in the FPGA, which can perform several tasks in parallel while respecting the constraint surface. First, in the first case, we propose two new optimization approaches GAA (Genetic Algorithm Approach) and MGAA (Modified Genetic Algorithm Approach) based on genetic algorithms. We also propose to compare these algorithms to a Branch & Bound method. The proposed approaches (GAA and MGAA) offer a very good compromise between the quality of the solutions obtained (optimization makespan criterion) and the computational time required to perform large-scale problems, unlike to the proposed Branch & Bound and the other exact methods found in the literature. Second, we first implemented an updated method based on genetic algorithms to solve the temporal partitioning problem in an FPGA circuit using dynamic reconfiguration. This method provides good solutions in a reasonable running time. Then, we improved our previous MGAA approach to obtain a new approach called MGA (Multithreaded Genetic Algorithm), which allows us to provide solutions to the partitioning problem. In addition, we have also proposed an algorithm based on simulated annealing, called MSA (Multithreaded Simulated Annealing). These two proposed approaches which are based on metaheuristic methods provide approximate solutions within a reasonable time period to the scheduling and partitioning problems on a heterogeneous computing system Algorithme génétique Problèmes d'ordonnancement Optimisation combinatoire Métaheuristiques Makespan (longueur d'ordonnancement) Programmation linéaire Système hétérogène MPSoC Parallélisme Systèmes embarqués reconfigurables Genetic algorithm Task scheduling problem Combinatorial optimization Metaheuristics Makespan (schedule length) Linear programming Heterogeneous system MPSoC Parallelism Embedded reconfigurable systems 006.22

Search results