121 |
Communication centric platforms for future high data intensive applications. Ahmad, Balal. January 2009.
The notion of platform-based design is considered a viable solution to boost design productivity by favouring a design-reuse methodology. With the scaling down of device feature size and the scaling up of design complexity, throughput limitations, signal integrity and signal latency are becoming a bottleneck in future communication-centric System-on-Chip (SoC) design. This has given birth to communication-centric platform-based designs. The development of heterogeneous multi-core architectures forces an on-chip communication medium tailored for a specific application domain to deal with multi-domain traffic patterns, which makes current application-specific communication-centric platforms unsuitable for future SoC architectures. The work presented in this thesis explores current communication media to establish the expectations placed on future on-chip interconnects. A novel communication-centric platform-based design flow is proposed, consisting of four communication-centric platforms based on a shared global bus, a hierarchical bus, crossbars and a novel hybrid communication medium. Developed with a smart platform controller, the platforms support the Open Core Protocol (OCP) socket standard, allowing cores to be integrated in a plug-and-play fashion without the need to reprogram the pre-verified platforms. This drastically reduces the design time of SoC architectures. Each communication-centric platform has different throughput, area and power characteristics; thus, depending on the design constraints, processing cores can be integrated into the most appropriate communication platform to realise the desired SoC architecture. A novel hybrid communication medium is also developed in this thesis, combining the advantages of two different types of communication media in a single SoC architecture; it consists of a crossbar matrix and a shared-bus medium. Simulation results, and the implementation of a WiMAX receiver as a real-life example, show a 65% increase in data throughput compared with a shared-bus-based communication medium, and a 13% decrease in area and an 11% decrease in power compared with a crossbar-based communication medium. In order to automate the generation of SoC architectures with optimised communication architectures, a tool called SOCCAD (SoC Communication Architecture Development) is developed. Components needed for the realisation of a given application can be selected from the tool's built-in library. Offering an optimised communication-centric placement, the tool generates the complete SystemC code for the system with different interconnect architectures, along with its power and area characteristics. The generated SystemC code can be used for quick simulation and, coupled with efficient test benches, for quick verification. Network-on-Chip (NoC) is considered a solution to the communication bottleneck in future SoC architectures with data throughput requirements of over 10 GB/s. It aims to provide low power, efficient link utilisation, reduced data contention and reduced area on silicon. Current on-chip networks, developed with fixed architectural parameters, do not utilise the available resources efficiently. To increase this efficiency, a novel dynamically reconfigurable NoC (drNoC) is developed in this thesis. The proposed drNoC reconfigures itself in terms of switching, routing and packet size as the communication requirements of the system change at run time, thus utilising the maximum available channel bandwidth.
In order to increase the applicability of drNoC, the network interface is designed to support the OCP socket standard. This makes drNoC a highly reusable communication framework, qualifying it as a communication-centric platform for highly data-intensive SoC architectures. Simulation results show a 32% increase in data throughput and a 22-35% decrease in network delay when compared with a traditional NoC with fixed parameters.
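The idea of matching cores to the most appropriate communication platform under design constraints can be illustrated with a short sketch. The platform figures, the throughput constraint and the selection rule below are placeholder assumptions for illustration only, not characteristics or results reported in the thesis (C++):

```cpp
#include <iostream>
#include <string>
#include <vector>

// Hypothetical characterisation of one communication-centric platform.
// The numbers used in main() are placeholders, not data from the thesis.
struct Platform {
    std::string name;
    double throughput_gbps;  // sustained data throughput
    double area_mm2;         // silicon area of the interconnect
    double power_mw;         // interconnect power
};

// Pick the platform with the smallest area (then power) that still meets
// the throughput requirement of the cores being integrated.
const Platform* selectPlatform(const std::vector<Platform>& candidates,
                               double required_gbps) {
    const Platform* best = nullptr;
    for (const auto& p : candidates) {
        if (p.throughput_gbps < required_gbps) continue;  // constraint not met
        if (!best || p.area_mm2 < best->area_mm2 ||
            (p.area_mm2 == best->area_mm2 && p.power_mw < best->power_mw))
            best = &p;
    }
    return best;  // nullptr if no platform satisfies the requirement
}

int main() {
    std::vector<Platform> lib = {
        {"shared global bus", 1.0, 0.8, 40.0},
        {"hierarchical bus",  2.5, 1.1, 55.0},
        {"crossbar",          6.0, 2.0, 90.0},
        {"hybrid (crossbar + shared bus)", 4.5, 1.7, 80.0},
    };
    if (const Platform* p = selectPlatform(lib, 4.0))
        std::cout << "Selected platform: " << p->name << "\n";
    else
        std::cout << "No platform meets the throughput constraint\n";
}
```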
122 |
Modélisation et optimisation de la couche optique de réseaux sur puce / Modeling and optimization of optical layer networks on chip. Channoufi, Malèk. 28 February 2014.
Dans le cadre du développement de SoC (Systems-on-Chip) complexes, l'interconnexion des différents IP matériels (Intellectual Property), très distants à l'échelle d'un circuit intégré (typiquement quelques centimètres) et devant s'échanger des volumes de données parfois importants, incite, pour des raisons de débit, de latence, de pertes et de consommation, à l'adoption d'une méthodologie de conception adéquate pour réaliser des systèmes de plus en plus flexibles. Afin de répondre à ces nouvelles difficultés de conception, de nombreuses recherches ont fait émerger le concept de réseau optique sur puce (Optical Network-on-Chip ou ONoC). Dans cette thèse, une étude détaillée d'une nouvelle architecture d'un réseau optique sur puce a été faite. La conception de ce réseau repose sur 2 paradigmes d'interconnexion : concevoir l'architecture dans le cadre d'une puce en 3D et l'empilement en plusieurs niveaux des guides d'onde optique dans la couche réseau optique sur puce. L'élément clef de cette architecture est un microrésonateur à plusieurs niveaux de guide d'onde (Si/SiO2). De ce fait, une étude détaillée sur le comportement optique de ce composant avec des modèles mathématiques et des simulations FEM a été faite dans le but d'optimiser la perte de puissance optique, le nombre des niveaux des guides d'onde empilés et la consommation d'énergie. Après avoir détaillé le fonctionnement du réseau multi-niveaux sur puce proposé "OMNoC", son protocole de routage a été étudié avec le simulateur NS-2, puis optimisé, rédigé et étudié avec C++ et l'outil Parsec Benchmark. Enfin, en tenant compte des études faites sur le comportement optique des guides d'onde et du protocole de routage, une étude des performances, comparative avec d'autres architectures, a été élaborée, montrant ainsi les avantages et les limites d'une telle méthodologie d'interconnexion. / The development of complex Systems-on-Chip (SoCs), interconnecting different IP cores that are distant at the chip scale and must exchange large volumes of data, requires high data bandwidth, low latency and the best compromise between optical power loss and crosstalk. A new design methodology is therefore needed to cope with these challenges. Communication-centric design has become the main approach to improving communication performance in systems on chip, and many recent research efforts focus on the Optical Network-on-Chip (ONoC). In this thesis, a novel optical network-on-chip architecture is proposed, based on two design paradigms: an ONoC integrated in a 3D chip and an ONoC built on multilevel waveguides. The key element of this architecture is the multilevel microresonator (Si/SiO2), which acts as the optical switch of the network. The optical wave behavior in different geometries has been studied using the FEM method in order to find a compromise between optical power loss and crosstalk. The operation of this ONoC, called "OMNoC", is explained; its routing protocol is studied using the NS-2 simulator, then optimized and developed using C++ and the Parsec benchmark suite. Using the FEM results and the adopted routing strategy, the performance of OMNoC is then studied and compared with other network architectures proposed in the ONoC literature. According to this performance analysis and these comparisons, OMNoC can be considered a promising network architecture that offers scalability and a good compromise between optical power loss and crosstalk.
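As a generic illustration of the loss/crosstalk compromise discussed in this abstract, the optical power budget of a path is often written as a sum of per-element contributions. The form below and its coefficients are a common textbook-style model given here as an assumption, not an equation taken from the thesis:

```latex
% Illustrative ONoC link-budget model (placeholder form, not from the thesis):
% propagation loss over length L, plus losses at each of the N_c waveguide
% crossings and each of the N_r microresonators passed, plus the drop loss.
\[
  IL_{\text{path}} \;=\; \alpha_{\text{prop}}\,L
    \;+\; N_c\,IL_{\text{crossing}}
    \;+\; N_r\,IL_{\text{pass}}
    \;+\; IL_{\text{drop}}
\]
% The laser power must cover this loss while keeping the received signal above
% the detector sensitivity S (all quantities in dB or dBm):
\[
  P_{\text{laser}} \;-\; IL_{\text{path}} \;\ge\; S
\]
```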
123 |
Caracterização analítica de carga de trabalho baseada em cenários de aplicações multimídia. / Analytical characterization of workload based on scenarios of multimedia applications. Patiño Alvarez, Gustavo Adolfo. 07 December 2012.
As metodologias clássicas de análise de desempenho de sistemas sobre silício (System on chip, SoC) geralmente são descritas em função do tempo de execução do pior-caso das tarefas a serem executadas. No entanto, nas aplicações do mundo real, o tempo de execução destas tarefas pode variar devido à presença dos diferentes eventos de entrada que ativam o sistema, colocando uma exigência diferente de execução sobre os recursos do sistema. Geralmente, um modelo da carga de trabalho é uma parte integrante de um modelo de desempenho utilizado para avaliar o desempenho de um sistema. O quão bom for um modelo de carga de trabalho determina em grande medida a qualidade das soluções do projeto e a precisão das estimativas de desempenho baseadas nele. Nesta tese, é abordado o problema de modelar a carga de trabalho para o projeto de sistemas de tempo-real cuja funcionalidade envolve processamento de fluxos de multimídia, isto é, fluxos de dados representando áudio, imagens ou vídeo. O problema de modelar a carga de trabalho é abordado sob a premissa de que uma caracterização acurada do comportamento temporal do software embarcado permite ao projetista identificar diversas exigências variáveis de execução, apresentadas para os diversos recursos de arquitetura do sistema, tanto na operação individual do conjunto de tarefas de software, assim como na execução global da aplicação, em fase de projeto. A caracterização do comportamento de cada tarefa foi definida a partir de uma análise temporal dos códigos de software associados às diferentes tarefas de uma aplicação, a fim de identificar os múltiplos modos de operação que o código pode apresentar dentro de um processador. Esta caracterização é feita através da realização de uma análise estática das rotas do código executável, de forma que para cada rota de execução encontrada, estimam-se os tempos extremos de execução (WCET e BCET), baseando-se na modelagem da microarquitetura de um processador on-chip. Desta forma, cada rota do código executável junto aos seus respectivos tempos de execução, constitui um modo de operação do código analisado. A fim de agrupar os diversos modos de operação que apresentam um grau de semelhança entre si de acordo a uma perspectiva da medida de processamento utilizado do processador modelado, foi utilizado o conceito de cenário, o qual diferencia o comportamento de cada tarefa em relação às entradas que a aplicação sob análise pode receber. Partindo desta caracterização temporal de cada tarefa de software, as exigências da execução global da aplicação são representadas através de um modelo analítico de eventos. O modelo considera as diferentes tarefas como atores temporais de um grafo de fluxo síncrono de dados, de modo que os diferentes cenários de operação da aplicação são definidos em função dos tempos variáveis de execução identificados previamente na caracterização de cada tarefa. Uma descrição matemática deste modelo, baseada na Álgebra de Max-Plus, permite caracterizar analiticamente os diferentes fluxos de eventos entre a entrada e a saída da aplicação, assim como os fluxos de eventos entre as diferentes tarefas, considerando as mudanças nas exigências de processamento associadas aos diversos cenários previamente identificados.
Esta caracterização analítica dos diversos fluxos de eventos de entrada e saída é a base para um modelo de curvas de carga de trabalho baseada em cenários de aplicação, e um modelo de curvas de serviços baseada também em cenários, que dão lugar a caracterizar o dinamismo comportamental da aplicação analisada, determinado pela diversidade de eventos de entrada que podem ativar diferentes comportamentos do sistema em fase de execução. / Classical methods for performance analysis of Multiprocessor Systems-on-Chip (MPSoCs) are usually described in terms of the Worst-Case Execution Times (WCET) of the executed tasks. Nevertheless, in real-world applications the running time of tasks varies due to the different input events that trigger the system, imposing a different workload on the system resources. Usually, a workload model is an integral part of a performance model used to evaluate the performance of a system. How good a workload model is largely determines the quality of design solutions and the accuracy of performance estimations based on it. This thesis addresses the problem of modeling the workload for the design of real-time systems whose functionality involves multimedia stream processing, i.e., data streams representing audio, images or video. The workload modeling problem is addressed under the assumption that an accurate characterization of the timing behavior of real-time embedded software enables the designer to identify, at design time, the several variable execution requirements that the individual operation of the software tasks and the overall execution of the application will impose on the various system resources of an architecture. The software task characterization was defined from a timing analysis of the source code in order to identify the multiple operating modes the code can exhibit within a processor. This characterization is done by performing a static path analysis on the code, so that for each given path the worst-case and best-case execution times (WCET and BCET) were estimated, based on a microarchitectural modeling of an on-chip processor. Thus, every execution path of the code, with its estimated execution times, defines an operation mode of the analyzed code. In order to cluster the several operation modes that exhibit a certain degree of similarity, according to the required amount of processing in the modeled processor, the concept of scenario was used, which differentiates every task behavior with respect to the several inputs the application under analysis may receive. From this timing characterization of every application task, the global execution requirements of the application are represented by an analytical event model. It describes the tasks as timed actors of a synchronous dataflow graph, so that the multiple application scenarios are defined in terms of the variable execution times previously identified in the task characterization. A mathematical description of this model, based on the Max-Plus Algebra, allows one to characterize the different event sequences incoming to, and exiting from, the application, as well as the event sequences between the different tasks, taking into account the changes in processing requirements associated with the various scenarios previously identified.
This analytical characterization of the input and output event sequences provides the basis for a model of scenario-based workload curves and a model of scenario-based service curves, which together characterize the behavioral dynamism of the application, determined by the several input events that can activate different system behaviors at execution time.
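To make the Max-Plus formulation mentioned above concrete, a single dataflow actor obeys a timing recurrence of the following generic textbook form; this is an illustrative equation under simplifying assumptions, not the exact system of equations developed in the thesis:

```latex
% Generic max-plus timing recurrence for one timed actor (illustrative only):
% the k-th firing completes once the k-th input event has arrived and the
% previous firing has finished, plus the scenario-dependent execution time d_s.
\[
  t(k) \;=\; \max\bigl(t_{\mathrm{in}}(k),\, t(k-1)\bigr) + d_s
        \;=\; \bigl(t_{\mathrm{in}}(k) \oplus t(k-1)\bigr) \otimes d_s,
  \qquad \mathrm{BCET}_s \,\le\, d_s \,\le\, \mathrm{WCET}_s
\]
% where, in max-plus algebra, a \oplus b = max(a, b) and a \otimes b = a + b,
% and the active scenario s selects which execution-time bounds apply.
```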
124 |
Conception architecturale pour la tolérance aux fautes d'un système auto-organisé multi-noeuds en réseau à base de NoC reconfigurables / Architectural design for fault tolerance networked multi-node self organized systems based on reconfigurable NoCs. Heil, Mikael. 04 December 2015.
Afin de répondre à des besoins croissants de performance et de fiabilité des systèmes sur puce embarqués pour satisfaire aux applications de plus en plus complexes, de nouveaux paradigmes architecturaux et structures de communication auto-adaptatives et auto-organisées sont à élaborer. Ces nouveaux systèmes de calcul intègrent au sein d'une même puce électronique plusieurs centaines d'éléments de calcul (systèmes sur puce multiprocesseur - MPSoC) et doivent permettre la mise à disposition d'une puissance de calcul parallèle suffisante tout en bénéficiant d'une grande flexibilité et d'une grande adaptabilité. Le but est de répondre aux évolutions des traitements distribués caractérisant le contexte évolutif du fonctionnement des systèmes. Actuellement, les performances de tels systèmes reposent sur une autonomie et une intelligence permettant de déployer et de redéployer les modules de calcul en temps réel en fonction de la demande de traitement et de la puissance de calcul. Elle dépend également des supports de communication entre les blocs de calcul afin de fournir une bande passante et une adaptabilité élevée pour une efficacité du parallélisme potentiel de la puissance de calcul disponible des MPSoC. De plus, l'apparition de la technologie FPGA reconfigurable dynamiquement a ouvert de nouvelles approches permettant aux MPSoC d'adapter leurs constituants en cours de fonctionnement, et de répondre aux besoins croissants d'adaptabilité et de flexibilité. C'est dans ce contexte du besoin primordial de flexibilité, de puissance de calcul et de bande passante qu'est apparue une nouvelle approche de conception des systèmes communicants, auto-organisés et auto-adaptatifs basés sur des nœuds de calcul reconfigurables. Ces derniers sont constitués de réseaux embarqués sur puce (NoC) permettant l'interconnexion optimisée d'un grand nombre d'éléments de calcul au sein d'une même puce, tout en assurant l'exigence d'une tolérance aux fautes et d'un compromis entre les performances de communication et les ressources d'interconnexion. Ces travaux de thèse ont pour objectif d'apporter des solutions architecturales innovantes pour la SdF des systèmes MPSoC en réseau basés sur la technologie FPGA, et configurés selon une structure distribuée et auto-organisée. L'objectif est d'obtenir des systèmes sur puce performants et fiables intégrant des techniques de détection, de localisation et de correction d'erreurs au sein de leurs structures NoC reconfigurables ou adaptatifs. La principale difficulté réside dans l'identification et la distinction entre des erreurs réelles et des fonctionnements variables ou adaptatifs des éléments constituants ces nœuds en réseau. Ces travaux ont permis de réaliser un réseau de nœuds reconfigurables à base de FPGA intégrant des structures NoC dynamiques, capables de s'auto-organiser et de s'auto-tester dans le but d'obtenir une maintenabilité maximale du fonctionnement du système dans un contexte en réseau. Dans ces travaux, un système communicant multi-nœuds MPSoC reconfigurable capable d'échanger et d'interagir a été développé, permettant ainsi une gestion avancée de tâches, la création et l'auto-gestion de mécanismes de tolérance aux fautes. Différentes techniques sont combinées et permettent d'identifier et localiser avec précision les éléments défaillants d'une telle structure dans le but de les corriger ou de les isoler pour prévenir toutes défaillances du système. 
Elles ont été validées au travers de nombreuses simulations matérielles afin d'estimer leur capacité de détection et de localisation des sources d'erreurs au sein d'un réseau. De même, des synthèses logiques du système intégrant les différentes solutions proposées sont analysées en termes de performances et de ressources logiques consommées dans le cas de la technologie FPGA / The performance and reliability requirements of embedded Systems-on-Chip (SoCs) are constantly increasing in order to satisfy ever more complex applications, and new architectural processing paradigms and communication structures, based in particular on self-adaptive and self-organizing structures, have emerged to meet them. These new computing systems integrate within a single chip hundreds of computing or processing elements (Multiprocessor Systems-on-Chip, MPSoC), providing a high level of parallel processing together with high flexibility and adaptability. The goal is to follow the changing configurations of the distributed processing that characterizes the evolving context in which these networked systems operate. Nowadays, the performance of such systems relies on the autonomy and intelligence needed to deploy and redeploy the computing modules in real time according to the requested processing and computing power, and on the communication medium and data exchange between the interconnected processing elements, which must provide bandwidth scalability and high efficiency so that the potential parallelism of the available computing power of the MPSoC can be exploited. Moreover, the emergence of partially reconfigurable FPGA technology allows an MPSoC to adapt its elements during operation in order to meet the system requirements. In this context, the requirements of flexibility, computing power and high bandwidth lead to a new approach to the design of self-organized and self-adaptive communication systems based on Networks-on-Chip (NoC). The aim is to interconnect a large number of elements in the same device while maintaining fault-tolerance requirements and a compromise between the parallel processing capacity of the MPSoC, communication performance, interconnection resources, and the trade-off between performance and logic resources. This thesis aims to provide innovative architectural solutions for networked fault-tolerant MPSoCs based on FPGA technology and configured as a distributed and self-organized structure. The objective is to obtain high-performance, reliable systems on chip that incorporate error detection, localization and correction within their reconfigurable or adaptive NoC structures, where the main difficulty lies in identifying and distinguishing between real errors and the adaptive behavior of these network nodes. More precisely, this work consists in building a networked node based on a reconfigurable FPGA that integrates a dynamic or adaptive NoC capable of self-organization and self-test, in order to achieve maximum maintainability of system operation in a networked environment (WSN). In this work, we developed a reconfigurable multi-node MPSoC-based system whose nodes can exchange data and interact, allowing efficient task management and self-management of the fault-tolerance mechanisms. Different techniques are combined and used to identify and precisely locate the faulty elements of such a structure in order to correct or isolate them and thus prevent system failures. Validation through extensive hardware simulations, estimating the capability of these techniques to detect and locate sources of error within a network, is presented.
Likewise, logic syntheses of the system incorporating the various proposed solutions are analyzed in terms of performance and consumed logic resources for the targeted FPGA technology.
126 |
Analysis and Development of Error-Job Mapping and Scheduling for Network-on-Chips with Homogeneous Processors. Karlsson, Erik. January 2010.
Due to the increased complexity of today's computer systems, which are manufactured in recent semiconductor technologies, and the fact that recent semiconductor technologies are more liable to soft errors (non-permanent errors), it is inherently difficult to ensure that the systems are and will remain error-free. Depending on the application, a soft error can have serious consequences for the system. It is therefore important to detect the presence of soft errors as early as possible, recover from the erroneous state and maintain correct operation. There is an entire research area, known as fault tolerance, devoted to proposing, implementing and analyzing techniques that can detect and recover from these errors. The drawback of using fault tolerance is that it usually introduces some overhead. This overhead may be, for instance, redundant hardware, which increases the cost of the system, or it may be a time overhead that negatively impacts system performance. Thus, a main concern when applying fault tolerance is to minimize the imposed overhead while the system is still able to deliver correct, error-free operation. In this thesis we have analyzed one well-known fault-tolerant technique, Rollback-Recovery with Checkpointing (RRC). This technique is able to detect and recover from errors by taking and storing checkpoints during the execution of a job. A job can therefore be thought of as divided into a number of execution segments, with a checkpoint taken after each execution segment. This technique requires the job to be executed concurrently on two processors. At each checkpoint, both processors exchange data containing enough information to capture the job's state. The exchanged data are then compared. If the data differ, an error has been detected in the previous execution segment, which is therefore re-executed. If the exchanged data are the same, no errors have been detected and the data are stored as a safe point from which the job can be restarted later. A time overhead due to exchanging data between processors is therefore introduced, and it increases the average execution time of a job, i.e. the average time required for a given job to complete. The overhead depends on the number of links that have to be traversed (due to data exchange) after each execution segment and on the number of execution segments needed for the given job. The number of links that have to be traversed after each execution segment is twice the distance between the processors that are executing the same job concurrently. However, this is only true if all the links are fully functional; a link failure can result in a longer route for communication between the processors. Even when all links are fully functional, the number of execution segments still depends on the error-free probabilities of the processors, and these error-free probabilities can vary between processors. This implies that the choice of processors affects the total number of links the communication has to traverse. Choosing two processors with higher error-free probability that are further away from each other increases the distance, but decreases the number of execution segments, which can result in a lower overhead. By carefully determining the mapping for a given job, one can decrease the overhead and hence the average execution time.
Since it is very common to have a larger number of jobs than available resources, it is important not only to find a good mapping that decreases the average execution time for the whole system, but also a good order of execution for a given set of jobs (scheduling of the jobs). We propose in this thesis several mapping and scheduling algorithms that aim to reduce the average execution time in a fault-tolerant multiprocessor System-on-Chip that uses a Network-on-Chip as its underlying interconnect architecture, so that the fault-tolerant technique (RRC) can perform efficiently.
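The trade-off between checkpoint-communication distance and error-free probability described above can be illustrated with a small sketch that evaluates candidate processor pairs using a simplified expected-execution-time model. The cost expression, the fixed number of execution segments and all numeric values are illustrative assumptions, not the formulation or results of the thesis (C++):

```cpp
#include <cstdio>
#include <cstdlib>
#include <vector>

// Illustrative-only model of the RRC overhead discussed above. For simplicity
// the number of execution segments n is fixed, whereas in the thesis it also
// depends on the error-free probabilities of the chosen processors.
struct Node { int x, y; double pErrorFree; };  // per-segment error-free probability

// Manhattan distance between the two processors running the job copies.
int distance(const Node& a, const Node& b) {
    return std::abs(a.x - b.x) + std::abs(a.y - b.y);
}

// Expected execution time of a job of n segments mapped onto (a, b):
// each segment costs tSeg plus a checkpoint exchange over 2*d links,
// and is re-executed until both copies complete it error-free.
double expectedTime(const Node& a, const Node& b, int n, double tSeg, double tLink) {
    double pBoth = a.pErrorFree * b.pErrorFree;           // both copies error-free
    double perSegment = (tSeg + 2.0 * distance(a, b) * tLink) / pBoth;
    return n * perSegment;
}

int main() {
    std::vector<Node> nodes = {{0, 0, 0.90}, {0, 1, 0.99}, {3, 2, 0.99}, {1, 1, 0.95}};
    double best = 1e300; int bi = -1, bj = -1;
    for (size_t i = 0; i < nodes.size(); ++i)
        for (size_t j = i + 1; j < nodes.size(); ++j) {
            double t = expectedTime(nodes[i], nodes[j], /*n=*/10, /*tSeg=*/100.0, /*tLink=*/2.0);
            if (t < best) { best = t; bi = (int)i; bj = (int)j; }
        }
    std::printf("Best pair: nodes %d and %d, expected time %.1f\n", bi, bj, best);
}
```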
127 |
A Reconfigurable Computing Platform For Real Time Embedded Applications. Say, Fatih. 01 September 2011.
Today's reconfigurable devices successfully combine the 'reconfigurable computing machine' paradigm and a 'high degree of parallelism', and hence reconfigurable computing has emerged as a promising alternative for computing-intensive applications. Despite its superior performance and lower power consumption compared to general-purpose computing using microprocessors, reconfigurable computing comes at the cost of design complexity. This thesis aims to reduce this complexity by providing a flexible and user-friendly development environment to application programmers in the form of a complete reconfigurable computing platform. The proposed computing platform is specially designed for real-time embedded applications and supports true multitasking by using available run-time partially reconfigurable architectures. For this computing platform, we propose a novel hardware task model aiming to minimize the logic resource requirement and the overhead due to the reconfiguration of the device. Based on this task model, an optimal 2D surface partitioning strategy for managing the hardware resource is presented. A mesh network-on-chip is designed to be used as the communication environment for the hardware tasks, and a runtime mapping technique is employed to lower the communication overhead.
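A minimal sketch of the kind of runtime mapping step just mentioned: place an incoming hardware task on the free mesh tile that minimizes the traffic-weighted hop count to the tasks it already communicates with. The cost function, the mesh size and the traffic volumes are assumptions made for illustration, not the actual technique or data of the thesis (C++):

```cpp
#include <climits>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Illustrative run-time mapping step on a mesh NoC: choose the free tile that
// minimises the Manhattan distance to already-placed communication partners,
// weighted by traffic volume.
struct Tile { int x, y; bool free; };
struct CommPeer { int tileIndex; int volume; };  // already-placed peer and its traffic

int bestTile(const std::vector<Tile>& mesh, const std::vector<CommPeer>& peers) {
    int best = -1, bestCost = INT_MAX;
    for (size_t i = 0; i < mesh.size(); ++i) {
        if (!mesh[i].free) continue;
        int cost = 0;
        for (const auto& p : peers) {
            const Tile& t = mesh[p.tileIndex];
            cost += p.volume * (std::abs(mesh[i].x - t.x) + std::abs(mesh[i].y - t.y));
        }
        if (cost < bestCost) { bestCost = cost; best = (int)i; }
    }
    return best;  // -1 if no free tile is available
}

int main() {
    // 3x3 mesh; tiles (0,0) and (1,1) are already occupied by communicating tasks.
    std::vector<Tile> mesh;
    for (int y = 0; y < 3; ++y)
        for (int x = 0; x < 3; ++x)
            mesh.push_back({x, y, !(x == 0 && y == 0) && !(x == 1 && y == 1)});
    std::vector<CommPeer> peers = {{0, 4}, {4, 1}};  // heavier traffic with tile 0
    std::printf("Place new task on tile %d\n", bestTile(mesh, peers));
}
```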
As the requirements of embedded systems are known prior to field operation, an offline design flow is proposed for generating the associated bit-stream for the hardware tasks. Finally, an online real-time operating system scheduler is given to complete the necessary building blocks of a reconfigurable computing platform suitable for real-time computing-intensive embedded applications. In addition to providing a flexible development environment, the proposed computing platform is shown to have better device utilization and lower reconfiguration time overhead compared to existing studies.
128 |
Implementação de algoritmos genéticos paralelos em uma arquitetura MPSoC. / Implementation of parallel genetic algorithms in an MPSoC architecture. Rubem Euzébio Ferreira. 07 August 2009.
Essa dissertação apresenta a implementação de um algoritmo genético paralelo utilizando o modelo de granularidade grossa, também conhecido como modelo das ilhas, para sistemas embutidos multiprocessados. Os sistemas embutidos multiprocessados estão tornando-se cada vez mais complexos, pressionados pela demanda por maior poder computacional requerido pelas aplicações, principalmente de multimídia, Internet e comunicações sem fio, que são executadas nesses sistemas. Algumas das referidas aplicações estão começando a utilizar algoritmos genéticos, que podem ser beneficiados pelas vantagens proporcionadas pelo processamento paralelo disponível em sistemas embutidos multiprocessados. No algoritmo genético paralelo do modelo das ilhas, cada processador do sistema embutido é responsável pela evolução de uma população de forma independente dos demais. A fim de acelerar o processo evolutivo, o operador de migração é executado em intervalos definidos para realizar a migração dos melhores indivíduos entre as ilhas. Diferentes topologias lógicas, tais como anel, vizinhança e broadcast, são analisadas na fase de migração de indivíduos. Resultados experimentais são gerados para a otimização de três funções encontradas na literatura. / This dissertation presents an implementation of a parallel genetic algorithm using the coarse-grained model, also known as the island model, targeted at MPSoC systems. MPSoC systems are becoming more and more complex, due to the greater computational power demanded by the applications, mainly those dealing with multimedia, Internet and wireless communications, which are executed on these systems. Some of these applications are starting to use genetic algorithms, which can benefit from the parallel processing offered by MPSoCs. In the island model for parallel genetic algorithms, each processor is responsible for evolving its population independently from the others. Aiming at accelerating the evolutionary process, the migration operator is executed periodically in order to migrate the best individuals among islands. Different logic topologies, such as ring, neighborhood and broadcast, are analyzed during the migration step. Experimental results are generated for the optimization of three functions found in the literature.
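A compact sketch of the island model with ring migration described above is given below. The optimized function, the genetic operators and all parameter values are placeholder assumptions chosen for illustration; they are not the three benchmark functions or the settings used in the dissertation (C++):

```cpp
#include <algorithm>
#include <cstdio>
#include <random>
#include <vector>

// Island-model GA sketch: each island evolves its own population and, every
// MIGRATION_INTERVAL generations, sends its best individual to the next island
// in a ring. The placeholder problem is minimising the sphere function.
using Individual = std::vector<double>;

double fitness(const Individual& x) {            // sphere function: sum of squares
    double s = 0; for (double v : x) s += v * v; return s;
}

int main() {
    const int ISLANDS = 4, POP = 20, DIM = 8, GENERATIONS = 200, MIGRATION_INTERVAL = 25;
    std::mt19937 rng(42);
    std::uniform_real_distribution<double> init(-5.0, 5.0);
    std::normal_distribution<double> mut(0.0, 0.1);

    std::vector<std::vector<Individual>> islands(ISLANDS, std::vector<Individual>(POP, Individual(DIM)));
    for (auto& pop : islands) for (auto& ind : pop) for (double& g : ind) g = init(rng);

    auto byFitness = [](const Individual& a, const Individual& b) { return fitness(a) < fitness(b); };

    for (int gen = 1; gen <= GENERATIONS; ++gen) {
        for (auto& pop : islands) {                        // independent evolution per island
            std::sort(pop.begin(), pop.end(), byFitness);  // elitist selection
            for (int i = POP / 2; i < POP; ++i) {          // replace worst half with mutated elites
                pop[i] = pop[i - POP / 2];
                for (double& g : pop[i]) g += mut(rng);
            }
        }
        if (gen % MIGRATION_INTERVAL == 0)                 // ring migration of the best individual
            for (int i = 0; i < ISLANDS; ++i) {
                auto best = *std::min_element(islands[i].begin(), islands[i].end(), byFitness);
                auto& next = islands[(i + 1) % ISLANDS];
                *std::max_element(next.begin(), next.end(), byFitness) = best;  // replace worst
            }
    }
    double best = 1e300;
    for (auto& pop : islands)
        best = std::min(best, fitness(*std::min_element(pop.begin(), pop.end(), byFitness)));
    std::printf("Best fitness after %d generations: %g\n", GENERATIONS, best);
}
```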
129 |
Reuse-based test planning for core-based systems-on-chip / Planejamento de teste para sistemas de hardware integrados baseados em componentes virtuais. Cota, Erika Fernandes. January 2003.
O projeto de sistemas eletrônicos atuais segue o paradigma do reuso de componentes de hardware. Este paradigma reduz a complexidade do projeto de um chip, mas cria novos desafios para o projetista do sistema em relação ao teste do produto final. O acesso aos núcleos profundamente embutidos no sistema, a integração dos diversos métodos de teste e a otimização dos diversos fatores de custo do sistema são alguns dos problemas que precisam ser resolvidos durante o planejamento do teste de produção do novo circuito. Neste contexto, esta tese propõe duas abordagens para o planejamento de teste de sistemas integrados. As abordagens propostas têm como principal objetivo a redução dos custos de teste através do reuso dos recursos de hardware disponíveis no sistema e da integração do planejamento de teste no fluxo de projeto do circuito. A primeira abordagem considera os sistemas cujos componentes se comunicam através de conexões dedicadas ou barramentos funcionais. O método proposto consiste na definição de um mecanismo de acesso aos componentes do circuito e de um algoritmo para exploração do espaço de projeto. O mecanismo de acesso prevê o reuso das conexões funcionais, o uso de barramentos de teste locais, núcleos transparentes e outros modos de passagem do sinal de teste. O algoritmo de escalonamento de teste é definido juntamente com o mecanismo de acesso, de forma que diferentes combinações de custos sejam exploradas. Além disso, restrições de consumo de potência do sistema podem ser consideradas durante o escalonamento dos testes. Os resultados experimentais apresentados para este método mostram claramente a variedade de soluções que podem ser exploradas e a eficiência desta abordagem na otimização do teste de um sistema complexo. A segunda abordagem de planejamento de teste propõe o reuso de redes em-chip como mecanismo de acesso aos componentes dos sistemas construídos sobre esta plataforma de comunicação. Um algoritmo de escalonamento de teste que considera as restrições de potência da aplicação é apresentado e a estratégia de teste é avaliada para diferentes configurações do sistema. Os resultados experimentais mostram que a capacidade de paralelização da rede em-chip pode ser explorada para reduzir o tempo de teste do sistema, enquanto os custos de área e pinos de teste são drasticamente minimizados. Neste manuscrito, os principais problemas relacionados ao teste dos sistemas integrados baseados em componentes virtuais são identificados e as soluções já apresentadas na literatura são discutidas. Em seguida, os problemas tratados por este trabalho são listados e as abordagens propostas são detalhadas. Ambas as técnicas são validadas através dos sistemas disponíveis no ITC’02 SoC Test Benchmarks. As técnicas propostas são ainda comparadas com outras abordagens de teste apresentadas recentemente. Esta comparação confirma a eficácia dos métodos desenvolvidos nesta tese. / Electronic applications are currently developed under the reuse-based paradigm. This design methodology presents several advantages for the reduction of the design complexity, but brings new challenges for the test of the final circuit. The access to embedded cores, the integration of several test methods, and the optimization of the several cost factors are just a few of the several problems that need to be tackled during test planning.
Within this context, this thesis proposes two test planning approaches that aim at reducing the test costs of a core-based system by means of hardware reuse and integration of the test planning into the design flow. The first approach considers systems whose cores are connected directly or through a functional bus. The test planning method consists of a comprehensive model that includes the definition of a multi-mode access mechanism inside the chip and a search algorithm for the exploration of the design space. The access mechanism model considers the reuse of functional connections as well as partial test buses, core transparency, and other bypass modes. The test schedule is defined in conjunction with the access mechanism so that good trade-offs among the costs of pins, area, and test time can be sought. Furthermore, system power constraints are also considered. This expansion of concerns makes possible an efficient, yet fine-grained, search in the huge design space of a reuse-based environment. Experimental results clearly show the variety of trade-offs that can be explored using the proposed model, and its effectiveness in optimizing the system test plan. Networks-on-chip are likely to become the main communication platform of systems-on-chip. Thus, the second approach presented in this work proposes the reuse of the on-chip network for the test of the cores embedded into the systems that use this communication platform. A power-aware test scheduling algorithm aiming at exploiting the network characteristics to minimize the system test time is presented. The reuse strategy is evaluated considering a number of system configurations, such as different positions of the cores in the network, power consumption constraints and number of interfaces with the tester. Experimental results show that the parallelization capability of the network can be exploited to reduce the system test time, whereas area and pin overhead are strongly minimized. In this manuscript, the main problems of the test of core-based systems are firstly identified and the current solutions are discussed. The problems being tackled by this thesis are then listed and the test planning approaches are detailed. Both test planning techniques are validated for the recently released ITC’02 SoC Test Benchmarks, and further compared to other test planning methods from the literature. This comparison confirms the efficiency of the proposed methods.
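The idea of power-aware test scheduling mentioned above can be illustrated with a small sketch that packs core tests into parallel test sessions without exceeding a power budget. The core figures, the budget and the greedy longest-test-first heuristic are assumptions for illustration only, not the scheduling algorithm or data of the thesis (C++):

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

// Illustrative power-constrained test scheduling: cores whose tests can run in
// parallel are grouped into sessions as long as the summed test power stays
// under the budget; the session length is the longest test it contains.
struct CoreTest { const char* name; int testTime; int testPower; };

int main() {
    std::vector<CoreTest> cores = {
        {"cpu", 120, 300}, {"dsp", 200, 250}, {"mem", 80, 150}, {"io", 60, 100}, {"acc", 150, 220}};
    const int POWER_BUDGET = 500;

    // Longest-test-first greedy packing into sessions.
    std::sort(cores.begin(), cores.end(),
              [](const CoreTest& a, const CoreTest& b) { return a.testTime > b.testTime; });
    std::vector<bool> scheduled(cores.size(), false);
    int totalTime = 0;
    for (size_t i = 0; i < cores.size(); ++i) {
        if (scheduled[i]) continue;
        int power = cores[i].testPower, sessionTime = cores[i].testTime;
        std::printf("Session: %s", cores[i].name);
        scheduled[i] = true;
        for (size_t j = i + 1; j < cores.size(); ++j)        // add compatible cores
            if (!scheduled[j] && power + cores[j].testPower <= POWER_BUDGET) {
                power += cores[j].testPower;
                scheduled[j] = true;
                std::printf(" + %s", cores[j].name);
            }
        std::printf("  (time %d, power %d)\n", sessionTime, power);
        totalTime += sessionTime;                            // longest test dominates the session
    }
    std::printf("Total test time: %d\n", totalTime);
}
```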
130 |
Conception, simulation parallèle et implémentation de réseaux sur puce hautes performances tolérants aux fautes / Design, Parallel Simulation and Implementation of High-Performance Fault-Tolerant Network-on-Chip Architectures. Charif, Mohamed El Amir. 17 November 2017.
Grâce à une réduction considérable dans les dimensions des transistors, les systèmes informatiques sont aujourd'hui capables d'intégrer un très grand nombre de cœurs de calcul en une seule puce (System-on-Chip, SoC). Faire communiquer les composants au sein d'une puce est aujourd'hui assuré par un réseau de commutation de paquet intégré, communément appelé Network-on-Chip (NoC). Cependant, le passage à des technologies de plus en plus réduites rend les circuits plus vulnérables aux fautes et aux défauts de fabrication. Le réseau sur puce peut donc se retrouver avec des routeurs ou des liens non-opérationnels, qui ne peuvent plus être utilisés pour le routage de paquets. Par conséquent, le niveau de flexibilité offert par l'algorithme de routage n'a jamais été aussi important. La première partie de cette thèse consiste à proposer une méthodologie généralisée, permettant de concevoir des algorithmes de routage hautement flexibles, combinant tolérance aux fautes et hautes performances, et ce pour n'importe quelle topologie réseau. Cette méthodologie est basée sur une nouvelle condition suffisante pour l'absence d'interblocages (deadlocks) qui, contrairement aux méthodes existantes qui imposent des restrictions importantes sur l'utilisation des buffers, s'évalue de manière dynamique en fonction de chaque paquet et ne requiert pas un partitionnement stricte des canaux virtuels (virtual channels). Il est montré que ce degré élevé de liberté dans l'utilisation des buffers a un impact positif à la fois sur les performances et sur la robustesse du NoC, sans pour autant augmenter la complexité en termes d'implémentation matérielle. La seconde partie de la thèse s'intéresse à une problématique plus spécifique, qui est celle du routage dans des topologies tri-dimensionnelles partiellement connectées, qui vont vraisemblablement être en vigueur à cause du coût important des connexions verticales, réalisées en utilisant la technologie TSV (Through-Silicon Via). Cette thèse introduit un nouvel algorithme de routage pour ce type d'architectures nommé "First-Last". Grâce à un placement original des canaux virtuels, cet algorithme est le seul capable de garantir la connectivité totale du réseau en présence d'un seul pilier de TSVs de coordonnées arbitraires, tout en ne requérant de canaux virtuels que sur deux des ports du routeur. Contrairement à d'autres algorithmes qui utilisent le même nombre total de canaux virtuels, First-Last n'impose aucune règle sur la position des piliers, ni sur les piliers à sélectionner durant l'exécution. De plus, l'algorithme proposé ayant été construit en utilisant la méthode décrite dans la première partie de la thèse, il offre une utilisation optimisée des canaux virtuels ajoutés. L'implémentation d'un nouvel algorithme de routage implique souvent des changements considérables au niveau de la microarchitecture des routeurs. L'évaluation de ces nouvelles solutions requiert donc une plateforme capable de simuler précisément l'architecture matérielle du réseau au cycle près. De plus, il est essentiel de tester les nouvelles architectures sur des tailles de réseau significativement grandes, pour s'assurer de leur scalabilité et leur applicabilité aux technologies émergentes (e.g. intégration 3D). Malheureusement, les simulateurs de réseaux sur puce existants ne sont pas capables d'effectuer des simulations sur de grands réseaux (milliers de cœurs) assez vite, et souvent, la précision des simulations doit être sacrifiée afin d'obtenir des temps de simulation raisonnables. 
En réponse à ce problème, la troisième et dernière partie de cette thèse est consacrée à la conception et au développement d'un modèle de simulation générique, extensible et parallélisable, exploitant la puissance des processeurs graphiques modernes (GPU). L'outil développé modélise l'architecture d'un routeur de manière très précise et peut simuler de très grands réseaux en des temps record. / Networks-on-Chip (NoCs) have proven to be a fast and scalable replacement for buses in current and emerging many-core systems. They are today an actively researched topic and various solutions are being explored to meet the needs of emerging applications in terms of performance, quality of service, power consumption, and fault-tolerance. This thesis presents contributions in two important areas of Network-on-Chip research: the design of ultra-flexible high-performance deadlock-free routing algorithms for any topology, and the design and implementation of parallel cycle-accurate Network-on-Chip simulators for a fast evaluation of new NoC architectures. While aggressive technology scaling has its benefits in terms of delay, area and power, it is also known to increase the vulnerability of circuits, suggesting the need for fault-tolerant designs. Fault-tolerance in NoCs is directly tied to the degree of flexibility of the routing algorithm. High routing flexibility is also required in some irregular topologies, as is the case for TSV-based 3D Networks-on-Chip, wherein only a subset of the routers are connected using vertical connections. Unfortunately, routing freedom is often limited by the deadlock-avoidance method, which statically restricts the set of virtual channels that can be acquired by each packet. The first part of this thesis tackles this issue at the source and introduces a new topology-agnostic methodology for designing ultra-flexible routing algorithms for Networks-on-Chip. The theory relies on a novel low-restrictive sufficient condition of deadlock-freedom that is expressed using the local information available at each router during runtime, making it possible to verify the condition dynamically in a distributed manner. A significant gain in both performance and fault-tolerance when using our methodology compared to the existing static channel partitioning methods is reported. Moreover, hardware synthesis results show that the newly introduced mechanisms have a negligible impact on the overall router area. In the second part, a novel routing algorithm for vertically-partially-connected 3D Networks-on-Chip called First-Last is constructed using the previously presented methodology. Thanks to a unique distribution of virtual channels, our algorithm is the only one capable of guaranteeing full connectivity in the presence of one TSV pillar in an arbitrary position, while requiring a low number of extra buffers (1 extra VC in the East and North directions). This makes First-Last a highly appealing, cost-effective alternative to the state-of-the-art Elevator-First algorithm. Finally, the third and last part of this work presents the first detailed and modular parallel NoC simulator design targeting Graphics Processing Units (GPUs). First, a flexible task decomposition approach, specifically geared towards high parallelization, is proposed. Second, all the GPU-specific implementation issues are addressed and several optimizations are proposed.
Our design is evaluated through a reference implementation, which is tested on an NVidia GTX980Ti graphics card and shown to speed up 4K-node NoC simulations by almost 280x.
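To illustrate the kind of task decomposition that makes cycle-accurate NoC simulation amenable to massive parallelism, the sketch below updates every router independently in a compute phase and then commits all states at once; each compute iteration could be mapped to one GPU thread. The one-flit-per-cycle ring model is a deliberately simplified assumption and is not the router microarchitecture or the simulator described in the thesis (C++):

```cpp
#include <cstdio>
#include <vector>

// Two-phase cycle update: every router computes its next state from the current
// state of its upstream neighbour (phase 1), then all routers commit (phase 2).
// Phase-1 iterations are independent, so each could become one GPU thread.
struct RouterState { int buffered; int outgoing; };

void simulateCycle(std::vector<RouterState>& cur, std::vector<RouterState>& next,
                   const std::vector<int>& upstream) {
    // Phase 1: independent per-router updates.
    for (size_t r = 0; r < cur.size(); ++r) {
        int received = cur[upstream[r]].outgoing;       // flit arriving on the input link
        int forwarded = cur[r].buffered > 0 ? 1 : 0;    // forward one flit if any is buffered
        next[r].buffered = cur[r].buffered + received - forwarded;
        next[r].outgoing = forwarded;
    }
    // Phase 2: commit the new state for the next cycle.
    cur.swap(next);
}

int main() {
    std::vector<int> upstream = {3, 0, 1, 2};            // unidirectional 4-router ring
    std::vector<RouterState> cur = {{2, 0}, {0, 0}, {0, 0}, {0, 0}}, next(4);
    for (int cycle = 0; cycle < 8; ++cycle) simulateCycle(cur, next, upstream);
    for (size_t r = 0; r < cur.size(); ++r)
        std::printf("router %zu: %d buffered, %d in flight\n", r, cur[r].buffered, cur[r].outgoing);
}
```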