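181
Projeto de Sistemas Integrados de Propósito Geral Baseados em Redes em Chip Expandindo as Funcionalidades dos Roteadores para Execução de Operações: A plataforma IPNoSys / Design of General-Purpose Integrated Systems Based on Networks-on-Chip, Extending Router Functionality to Execute Operations: The IPNoSys Platform
Araújo, Sílvio Roberto Fernandes de (30 March 2012)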
The next generation of computers is expected to be built on architectures with multiple processors and/or multicore processors. This raises challenges related to interconnection mechanisms, operating frequency, chip area, power dissipation, programmability, and performance. Networks-on-chip are considered the ideal interconnection and communication mechanism for this type of architecture because of their scalability, reusability, and intrinsic parallelism. Communication in a network-on-chip takes place through packets carrying data and instructions that represent requests and responses between the processing elements connected by the network. Packets travel as in a pipeline between the routers, from the source to the destination of the communication, and simultaneous communications between different source-destination pairs are possible. Building on this, the thesis proposes to transform the entire communication infrastructure of a network-on-chip, using its routing, arbitration, and storage mechanisms, into a high-performance parallel processing system. In this proposal, packets are formed by the instructions and data that represent applications, and they are executed in the routers while being transmitted, exploiting the pipelining and parallelism of the transmissions. Traditional processors are not used; only simple cores that control memory access are present. An implementation of this idea is the IPNoSys (Integrated Processing NoC System) architecture, which has its own programming model and a routing algorithm that guarantees the execution of all instructions in the packets while preventing deadlock, livelock, and starvation. The architecture provides mechanisms for input and output, interrupts, and operating system support. As a proof of concept, a programming environment and a simulator for the architecture were developed in SystemC; the simulator allows various architectural parameters to be configured and produces results for evaluating the architecture.
182
Estudo da viabilidade do desenvolvimento de sistemas integrados baseados em redes em chip sem processadores: sistema IPNoSys / The study of viability of development of no processor integrated system based on network-on-chip: IPNoSys system
Araújo, Sílvio Roberto Fernandes de (11 April 2008)
Growth in transistor integration capacity has made it possible to build complete systems, with many components, on a single chip; these are called SoCs (Systems-on-Chip). However, the interconnection subsystem can limit the scalability of an SoC, as buses do, or be an ad hoc solution, as bus hierarchies are. The ideal interconnection solution for SoCs is therefore the network-on-chip (NoC). NoCs allow multiple simultaneous point-to-point connections between components and can be reused in different designs. However, NoCs can increase design complexity, chip area, and/or dissipated power. It is therefore necessary either to broaden the way these systems are used or to change the paradigm of their development. This work proposes a system based on a NoC in which applications are described as packets and executed from router to router along the path between source and destination, without conventional processors. To allow applications to execute regardless of the number of instructions and of the network dimensions, the spiral complement algorithm was developed; it re-routes a packet to a new destination until all the instructions it contains have been executed. The objective of the work was to study the viability of developing this system, named IPNoSys. For the study, a cycle-accurate simulation tool was developed in SystemC; it executes applications written in a packet description language that was also developed for this purpose. The tool produces several results that can be used to evaluate the functioning and performance of the system. The methodology for describing an application consists of obtaining the data-flow graph of the high-level application and describing that graph as one or more packets. This methodology was applied in three case studies: a counter, a 2D DCT (DCT-2D), and floating-point addition. The counter was used to evaluate how the system handles deadlock and runs applications in parallel. The DCT-2D was used for comparison with the STORM platform. The floating-point addition was used as a software handler routine for an instruction not implemented in hardware. The simulation results support the viability of the IPNoSys system: they show that applications described as packets can be executed, sequentially or in parallel, without interruptions caused by deadlock, and they indicate that IPNoSys achieves shorter execution times than the STORM platform.
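To make the idea of executing a packet while it is routed more concrete, the following Python sketch models a packet as a list of instructions and lets each router on the path execute one of them. It is only an illustration under stated assumptions: the instruction format, the XY routing, and the `complement` re-routing rule are hypothetical stand-ins and are not the spiral complement algorithm or the packet format of the thesis, whose details are not given in the abstract. Data dependencies between instructions are not modeled.

```python
# Toy model: each router visited by a packet executes one pending instruction.
# All names and rules here are illustrative assumptions, not the IPNoSys design.

OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}


def xy_path(src, dst):
    """Routers visited by plain XY routing from src to dst (both included)."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def complement(node, width, height):
    """Hypothetical re-routing rule: mirror the node across the mesh centre."""
    x, y = node
    return (width - 1 - x, height - 1 - y)


def execute_in_network(instructions, src, dst, width, height):
    """Execute (op, a, b) instructions hop by hop; re-route if any remain."""
    pending = list(instructions)
    results = []
    reroutes = 0
    route = xy_path(src, dst)
    while pending:
        for _router in route:
            if not pending:
                break
            op, a, b = pending.pop(0)
            results.append(OPS[op](a, b))
        if pending:
            # Destination reached with work left: pick a new destination.
            new_dst = complement(dst, width, height)
            if new_dst == dst:
                raise ValueError("toy re-routing rule has no new destination here")
            route = xy_path(dst, new_dst)[1:]  # skip the router we are already at
            dst = new_dst
            reroutes += 1
    return results, reroutes


if __name__ == "__main__":
    program = [("add", i, 1) for i in range(10)]
    print(execute_in_network(program, (0, 0), (3, 3), 4, 4))
```

In this sketch a packet with ten instructions sent across a 4x4 mesh uses the seven routers of the first path and then needs one re-route to finish, which mirrors the abstract's point that execution must not depend on the number of instructions or on the network dimensions.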
183
Simulação de reservatórios de petróleo em ambiente MPSoC / Reservoir simulation in a MPSOC environment
Oliveira, Bruno Cruz de (22 May 2009)
The constant increase in the complexity of computer applications demands more powerful hardware support. With processor operating frequencies approaching their limit, the most viable solution is parallelism. Parallelism techniques, combined with the growing capacity to integrate transistors on a single chip, gave rise to MPSoCs (Multiprocessor Systems-on-Chip). MPSoCs are expected to become a cheaper and faster alternative to supercomputers and clusters, and applications developed for those high-performance systems should migrate to computers equipped with MPSoCs containing dozens to hundreds of computation cores. Applications in oil and natural gas exploration also require very high processing capacity and would benefit greatly from such systems. This work evaluates a traditional and complex application of the oil and gas industry, reservoir simulation, from the perspective of integrated computational systems on a single chip with dozens to hundreds of functional units. To that end, a distributed memory model was developed for the STORM (MPSoC Directory-Based Platform) platform, which already had a shared memory model. A message-passing library following the MPI standard was also developed for this memory model.
184
Avaliação da execução de aplicações orientadas à dados na arquitetura de redes em chip IPNoSys / Evaluation of the execution of data-oriented applications on the IPNoSys network-on-chip architecture
Nobre, Christiane de Araújo (17 August 2012)
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / The increasing complexity of integrated circuits has driven the development of communication architectures such as Networks-on-Chip (NoCs) as an alternative interconnection architecture for Systems-on-Chip (SoCs). Networks-on-chip offer component reuse, parallelism, and scalability, which allows them to be reused across different designs. The literature contains many proposals with different network-on-chip configurations. Among the networks-on-chip studied, IPNoSys has an unconventional architecture, since it allows operations to be executed together with the communication activities. This study evaluates the execution of data-flow applications on IPNoSys, focusing on how well they fit its design constraints. Data-flow applications are characterized by a continuous stream of data on which operations are executed. These applications are expected to benefit from running on IPNoSys because of their high degree of parallelism and because their programming model is similar to the execution model of this network. After observing how these applications ran on IPNoSys, changes were made to the execution model of the network to allow instruction-level parallelism to be exploited. To this end, executions of data-flow applications were analyzed and compared.
185
Hiérarchie mémoire dans les systèmes intégrés multiprocesseurs construits autour de réseaux sur puce / Memory hierarchy in embedded multiprocessor system built around networks on chip
Belhadj Amor, Hela (05 October 2017)
Multi/many-core parallel systems that deliver high computing power at low energy cost are now a reality. However, exploiting the performance of these architectures depends on how efficiently the system manages data accesses. The aim of our work is to improve the efficiency of these accesses by exploiting the characteristics of the hardware architecture. In the first part, we propose a new cache hierarchy organization that maximizes the use of the storage space available at each level. This solution, based on non-uniform cache access (NUCA) architectures, supports inter-level and intra-level transfers in the hierarchy. It requires a cache coherence protocol adapted to its specifications. Data transfer within the hierarchy is also a determinant of system performance. In the second part, we therefore take into account the specific communication needs of the protocol. We propose a virtualized network as an ad hoc communication medium to manage coherence traffic at lower cost. This network links the caches of the same level to support intra-level transfers, which are specific to our protocol, in order to reduce the average access latency.
186
Worst-case delay analysis of core-to-IO flows over many-cores architectures / Analyse des délais pire cas des flux entre coeur et interfaces entrées/sorties sur des architectures pluri-coeurs
Abdallah, Laure (05 April 2017)
Many-core architectures are more promising than multi-core systems for designing real-time systems, because they should be easier to master and can integrate a larger number of applications, potentially of different criticality levels. In embedded real-time systems, these architectures can serve as processing elements within a backbone Ethernet network, since they provide many input/output (I/O) interfaces such as Ethernet controllers and DDR-SDRAM memory interfaces. Applications of different criticality levels can thus be allocated on the many-core; they communicate with each other through its Network-on-Chip (NoC) and with sensors and actuators through the Ethernet interface. To guarantee the real-time constraints of these applications, worst-case transmission delays must be computed for both inter-core and core-to-I/O flows. Several NoCs targeting hard real-time systems have been designed using specific hardware extensions. However, none of these extensions is currently available in commercial NoC-based many-core architectures, which instead rely on wormhole switching with round-robin arbitration. With this switching strategy, interference can occur between direct and indirect flows. In addition, the mapping of critical and non-critical applications onto the NoC affects the contention experienced by core-to-I/O flows. These flows, which come from the Ethernet interface of the NoC, cross two networks of different speeds: the NoC and Ethernet. On the NoC, the maximum packet size is much smaller than the size of an Ethernet frame, so an Ethernet frame transmitted over the NoC is divided into many packets. The frame is removed from the buffer of the Ethernet interface only when all of its data has been received by the DDR-SDRAM memory on the NoC. Congestion on the NoC, due to wormhole switching, can delay these flows, and the buffer of the Ethernet interface has a limited capacity. As a result, this behavior may cause Ethernet frames to be dropped. The idea is therefore to analyze the worst-case transmission delays on the NoC and to reduce the delays of core-to-I/O flows. In this thesis, we show that the pessimism of existing Worst-Case Traversal Time (WCTT) computation methods, together with existing mapping strategies, leads to Ethernet frames being dropped because of internal congestion in the NoC. We define and validate properties of wormhole-switched NoCs that reduce this pessimism when modeling contending flows. We then propose a task mapping strategy that takes I/O communications into account and minimizes the contention of core-to-I/O flows. The WCTT values obtained with our computation method are up to 50% lower than those of state-of-the-art real-time packet schedulability analysis, because the method models the real impact of contending flows. In addition, experimental results on real avionics applications show significant improvements in the transmission delays of core-to-I/O flows, up to 94%, without significantly affecting those of core-to-core flows. These improvements come from our mapping strategy, which allocates applications so as to reduce the impact of non-critical flows on critical flows. The reduced WCTT of core-to-I/O flows avoids the dropping of Ethernet frames.
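The frame-dropping problem described in this abstract can be illustrated with a small Python sketch. It shows how an Ethernet frame is split into NoC packets and how a longer per-frame transfer time over the NoC fills a bounded interface buffer. The sizes, the FIFO service model, and the function names are assumptions made for illustration; they are not taken from the thesis and do not reproduce its WCTT analysis.

```python
import math


def packets_per_frame(frame_bytes, noc_payload_bytes):
    """Number of NoC packets needed to carry one Ethernet frame."""
    return math.ceil(frame_bytes / noc_payload_bytes)


def frames_dropped(arrival_times, transfer_time, buffer_slots):
    """Count frames dropped at a bounded Ethernet interface buffer.

    Assumed model: frames are sent over the NoC one at a time in FIFO order,
    each taking `transfer_time`, and a frame keeps its buffer slot until its
    transfer is complete. `arrival_times` must be sorted in increasing order.
    """
    completion_times = []  # completion times of frames still in the buffer
    last_done = 0.0
    dropped = 0
    for t in arrival_times:
        # Frames whose transfer finished by time t have left the buffer.
        completion_times = [done for done in completion_times if done > t]
        if len(completion_times) >= buffer_slots:
            dropped += 1  # no free slot: the incoming frame is lost
            continue
        start = max(t, last_done)
        last_done = start + transfer_time
        completion_times.append(last_done)
    return dropped


if __name__ == "__main__":
    print(packets_per_frame(1500, 64))
    print(frames_dropped([0, 1, 2, 3], transfer_time=1, buffer_slots=2))
    print(frames_dropped([0, 1, 2, 3], transfer_time=4, buffer_slots=2))
```

With the same arrival pattern and buffer size, raising the per-frame transfer time from 1 to 4 time units turns zero drops into two, which is the effect that reducing the delay of core-to-I/O flows is meant to avoid.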
187
Contributions aux processeurs multi-coeurs massivement parallèles en technologie en rupture : routage tolérant aux fautes de réseau d'interconnexion et auto-adaptabilité des applications / Algorithms for the efficiency of unreliable multicore processors and their On-Chip interconnect
Chaix, Fabien (28 October 2013)
The prospect of nanometric technologies foreshadows the advent of processors made up of hundreds of computation cores. However, using these processors will require coping with the reliability and variability issues inherent to these aggressive manufacturing processes. In this thesis, we present a coherent set of techniques for using massively parallel many-core processors subject to high variability and defect rates. First, we address the reliability of the interconnection network by presenting several fault-tolerant routing algorithms that are deadlock-free and use no routing tables, which improves scalability. The variants of these algorithms make it possible to adjust the complexity of the network-on-chip to the reliability requirements of the applications. For example, the best-performing routing algorithm can deliver packets as long as a fault-free path exists, with up to 40% of resources defective. Several extensions were also studied to improve network performance in the presence of a large number of faults. Second, we propose a self-adaptive technique for managing parallel applications, based on fault-tolerant routing. Dynamic task mapping relies on an adaptive search for computing nodes in order to reduce the application's energy consumption in the presence of variability. Third, we present a high-level simulation model named VOCIS (Versatile On-Chip Interconnect Simulator), developed during this thesis. It allows in-depth study of interconnection networks and fault-tolerant routing under complex conditions, in order to meet the constraints specific to this work. We describe its architecture and visualization capabilities. Finally, we analyze and illustrate several original experimental results obtained with this model.
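The condition "as long as a fault-free path exists" can be checked on a small example with a breadth-first search over a 2D mesh. The sketch below is a centralized reference check only, written under the assumption that faults disable whole routers in a mesh; it is not one of the table-free, distributed routing algorithms of the thesis, and the function name is hypothetical.

```python
from collections import deque


def fault_free_path_exists(width, height, faulty, src, dst):
    """Return True if dst is reachable from src in a width x height mesh
    while avoiding the routers listed in `faulty` (a set of (x, y) tuples)."""
    if src in faulty or dst in faulty:
        return False
    seen = {src}
    queue = deque([src])
    while queue:
        x, y = queue.popleft()
        if (x, y) == dst:
            return True
        # Visit the four mesh neighbours that are inside the grid and healthy.
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            if 0 <= nx < width and 0 <= ny < height:
                if nxt not in faulty and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return False


if __name__ == "__main__":
    # Two faulty routers leave a detour around the top of the column.
    print(fault_free_path_exists(3, 3, {(1, 0), (1, 1)}, (0, 0), (2, 0)))
    # A fully faulty column cuts the mesh in two.
    print(fault_free_path_exists(3, 3, {(1, 0), (1, 1), (1, 2)}, (0, 0), (2, 0)))
```

A check of this kind is useful as an oracle when evaluating a fault-tolerant routing algorithm in simulation: any packet the algorithm fails to deliver although the oracle reports a path counts against its coverage.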
188
Desenvolvimento de um sistema dinamicamente reconfigurável baseado em redes intra-chip e ferramenta para posicionamento de módulos. / Development of a dynamically reconfigurable systems under noc and CAD for modules mapping.
Mario Andrés Raffo Jara (05 February 2010)
Os sistemas dinamicamente reconfiguráveis (SDRs) são uma alternativa para o desenvolvimento de sistemas sobre silício baseados em circuitos programáveis (SoPC), cujo principal beneficio é o bom aproveitamento da área do dispositivo. Sendo neles implementados circuitos que representam as tarefas que devem operar numa etapa específica do tempo de operação do sistema, permitem um menor consumo de área e de energia, parâmetros importantes nos sistemas portáveis. Isto tem gerado muito interesse no que se refere às metodologias de projeto utilizando FPGAs (Field Programmable Gate Arrays) dinamicamente reconfiguráveis (DRFPGAs) e à definição de um meio de comunicação estruturado para tratar da transferência de dados entre as partes reconfiguráveis e as fixas, mas estas tarefas, assim como a concretização de sua comunicação, seguem sendo ainda essencialmente manuais, devido à falta de metodologias de projeto e ferramentas de CAD que simplifiquem o projeto de SDRs. Este trabalho foca uma das limitações mais efetivas para a adoção da reconfiguração dinâmica: a falta de ferramentas de CAD que suportem o projeto de SDRs, inclusive os baseados em redes intra-chip (NoCs), em particular, no posicionamento dos módulos. Neste trabalho, uma arquitetura para SDRs baseado em NoCs é proposta e um algoritmo de posicionamento dos módulos de um SDR baseado em aspectos reais da família do DRFPGAs é desenvolvido, dentro de uma ferramenta denominada DynoPlace. Desenvolveu-se também um modelo de validação e simulação de SDRs, em tempo de operação, utilizando-se a técnica de chaveamento dinâmico de circuitos. Para o estudo do caso, de validação da arquitetura e metodologia, propõe-se uma aplicação teste baseada em computação de operações aritméticas. A metodologia de simulação permite determinar o tempo da reconfiguração e verificar o comportamento do SDR no momento da reconfiguração. 
A ferramenta DynoPlace permite gerar os arquivos de restrição de usuário (UCF) de posicionamento dos módulos do SDR no DRFPGA Virtex-4LX25. Este contém informações do posicionamento dos módulos do sistema, dos dispositivos usados para as entradas e saídas do sistema além do posicionamento dos bus-macros. Com os arquivos gerados pela metodologia e ferramenta DynoPlace, pode-se executar com sucesso os scripts da metodologia Early Access da Xilinx para gerar o SDR de forma automática. / Dynamically Reconfigurable Systems (DRSs) are an alternative for developing Systems on a Programmable Chip (SoPC), and efficient use of the device's area is one of their main advantages. Because the implemented circuits represent only the tasks that must be active at a specific stage of system operation, DRSs save area and energy, both important parameters for portable systems. This has generated interest in design methodologies using Dynamically Reconfigurable Field Programmable Gate Arrays (DRFPGAs) and in the definition of a structured communication medium for handling data transfer between the static and reconfigurable partitions. However, these tasks, as well as the implementation of the communication structure, are still carried out essentially by hand, owing to the lack of design methodologies and CAD tools that simplify DRS design. This work addresses one of the main obstacles to the adoption of dynamic reconfiguration: the absence of CAD tools that support DRS design, including designs based on Networks-on-Chip (NoCs), particularly for module positioning. An architecture for NoC-based DRSs is proposed, and a module positioning algorithm based on real characteristics of the DRFPGA family is developed within a tool called DynoPlace. A model for run-time validation and simulation of DRSs, using the dynamic circuit switching technique, was also developed. As a case study for validating the architecture and methodology, a test application based on arithmetic operations is proposed.
The simulation methodology makes it possible to determine the reconfiguration time and to verify the DRS behavior at the moment of reconfiguration. The DynoPlace tool generates the User Constraint Files (UCF) that position the DRS modules on the Virtex-4LX25 DRFPGA. These files contain the positions of the system's modules, of the devices used for system inputs and outputs, and of the bus-macros. With the files generated by the methodology and the DynoPlace tool, the scripts of the Xilinx Early Access methodology can be executed successfully to generate the DRS automatically.
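The abstract does not describe DynoPlace's output format, so the following is only a minimal Python sketch of what emitting placement constraints in general UCF syntax might look like. The `ModulePlacement` record, the `ucf_lines` function, and every instance name, slice coordinate, and pin name are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass
class ModulePlacement:
    """Hypothetical placement record: an instance name and a slice rectangle."""
    instance: str
    x0: int
    y0: int
    x1: int
    y1: int
    reconfigurable: bool = False


def ucf_lines(modules, bus_macros, pins):
    """Return UCF constraint lines for modules, bus macros, and I/O pins.

    modules    -- list of ModulePlacement
    bus_macros -- dict: bus-macro instance name -> (slice_x, slice_y)
    pins       -- dict: top-level net name -> package pin (assumed names)
    """
    lines = []
    for m in modules:
        group = f"AG_{m.instance}"
        # Bind the instance to an area group, then give the group a slice range.
        lines.append(f'INST "{m.instance}" AREA_GROUP = "{group}";')
        lines.append(
            f'AREA_GROUP "{group}" RANGE = '
            f"SLICE_X{m.x0}Y{m.y0}:SLICE_X{m.x1}Y{m.y1};"
        )
        if m.reconfigurable:
            # Marks the region as reconfigurable in the partial-reconfiguration flow.
            lines.append(f'AREA_GROUP "{group}" MODE = RECONFIG;')
    for name, (x, y) in bus_macros.items():
        lines.append(f'INST "{name}" LOC = SLICE_X{x}Y{y};')
    for net, pin in pins.items():
        lines.append(f'NET "{net}" LOC = "{pin}";')
    return lines


if __name__ == "__main__":
    demo = ucf_lines(
        [ModulePlacement("rm_adder", 0, 0, 15, 31, reconfigurable=True)],
        {"bm_0": (16, 4)},
        {"clk": "B15"},
    )
    print("\n".join(demo))
```

A real placement tool would also have to check that regions do not overlap and that bus macros straddle the boundary between static and reconfigurable regions; the sketch only covers text generation.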
|
189 |
Modélisation comportementale d'un réseau sur puce basé sur des interconnexions RF. / Behavioral modeling of a network on chip based on RF interconnections. Zerioul, Lounis, 01 September 2015
Le développement des systèmes multiprocesseurs intégrés sur puce (MPSoC) répond au besoin grandissant des architectures de calcul intensif. En revanche, l'évolution de leurs performances est entravée par leurs réseaux de communication sur puce (NoC) à cause de leur consommation d'énergie ainsi que du retard. C'est dans ce contexte que les NoC à base d'interconnexions RF et filaires (RFNoC) ont émergé. Afin de gérer au mieux et d'optimiser la conception d'un RFNoC, il est indispensable de développer une plateforme de simulation intégrant à la fois des circuits analogiques et numériques. Dans un premier temps, la simulation temporelle d'un RFNoC avec des composants dont les modèles sont idéaux est utilisée pour optimiser l'allocation des ressources spectrales disponibles. Le cas échéant, nous proposons des solutions pour améliorer la qualité de signal transmis. Dans un deuxième temps, nous avons développé en VHDL-AMS des modèles comportementaux et précis de chacun des composants du RFNoC. Les modèles de l'amplificateur faible bruit (LNA) et du mélangeur, prennent en compte les paramètres concernant, l'amplification, les non-linéarités, le bruit et la bande passante. Le modèle de l'oscillateur local considère les paramètres conventionnels, notamment le bruit de phase. Quant à la ligne de transmission, un modèle fréquentiel précis, incluant l'effet de peau est adapté pour les simulations temporelles. Ensuite, l'impact des paramètres des composants sur les performances du RFNoC est évalué afin d'anticiper les contraintes qui s'imposeront lors de la conception du RFNoC. / The development of multiprocessor systems-on-chip (MPSoC) responds to the growing need for compute-intensive architectures. However, the evolution of their performance is hampered by their on-chip communication networks (NoC), because of their energy consumption and delay. It is in this context that NoCs based on RF and wired interconnections (RFNoC) emerged.
To better manage and optimize the design of an RFNoC, it is essential to develop a simulation platform that integrates both analog and digital circuits. First, a time-domain simulation of an RFNoC whose components have ideal models is used to optimize the allocation of the available spectral resources. Where appropriate, we propose solutions to improve the quality of the transmitted signal. Second, we developed accurate behavioral models of each RFNoC component in VHDL-AMS. The models of the low-noise amplifier (LNA) and the mixer take into account the parameters for amplification, nonlinearities, noise, and bandwidth. The model of the local oscillator considers the conventional parameters, notably phase noise. For the transmission line, an accurate frequency-domain model that includes the skin effect is adapted for time-domain simulations. Finally, the impact of the component parameters on RFNoC performance is evaluated in order to anticipate the constraints that will arise in the RFNoC design.
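To make the idea of a behavioral LNA model concrete, here is a minimal Python sketch, not the thesis's VHDL-AMS model. It assumes a memoryless third-order polynomial, a common textbook form for gain plus nonlinearity, and it omits the noise and bandwidth parameters the abstract also mentions. The function name `lna_output`, its parameters, and the 50-ohm reference impedance are assumptions made for illustration.

```python
import math


def lna_output(v_in, gain_db, iip3_dbm, r_ohm=50.0):
    """Memoryless third-order amplifier model: y = a1*x + a3*x**3.

    v_in     -- instantaneous input voltage (V)
    gain_db  -- small-signal voltage gain (dB)
    iip3_dbm -- input-referred third-order intercept point (dBm)
    r_ohm    -- reference impedance (assumed 50 ohms)
    """
    a1 = 10 ** (gain_db / 20)
    # Convert IIP3 from dBm to the peak voltage amplitude across r_ohm.
    p_watt = 10 ** ((iip3_dbm - 30) / 10)
    a_iip3 = math.sqrt(2 * r_ohm * p_watt)
    # From A_IIP3 = sqrt((4/3) * |a1 / a3|); a3 is negative for compression.
    a3 = -(4.0 / 3.0) * a1 / a_iip3 ** 2
    return a1 * v_in + a3 * v_in ** 3


if __name__ == "__main__":
    for v in (1e-3, 1e-2, 1e-1):
        print(f"v_in={v:g} V -> v_out={lna_output(v, 20.0, 0.0):.6g} V")
```

With a 20 dB gain and a 0 dBm IIP3, an instantaneous input of 0.1 V gives about 0.867 V instead of the linear 1.0 V, which is the kind of compression a time-domain system simulation would need to capture.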
|
190 |
Architectural exploration of network Interface for energy efficient 3D optical network-on-chip / Exploration architecturale d'un système 3D multi-coeurs communiquant par réseau optique embarqué sur puce. Pham, Van Dung, 13 December 2018
Depuis quelques années, les réseaux optiques sur puce (ONoC) sont devenus une solution intéressante pour surpasser les limitations des interconnexions électriques, compte tenu de leurs caractéristiques attractives concernant la consommation d’énergie, le délai de transfert et la bande passante. Cependant, les éléments optiques nécessaires pour définir un tel réseau souffrent d’imperfections qui introduisent des pertes durant les communications. De plus, l'utilisation de la technique de multiplexage en longueurs d'ondes (WDM) permet d'augmenter les performances, mais introduit de nouvelles pertes et de la diaphonie entre les longueurs d'ondes, ce qui a pour effet de réduire le rapport signal sur bruit et donc la qualité de la communication. Les contributions présentées dans ce manuscrit adressent cette problématique d’amélioration de performance des liens optiques dans un ONoC. Pour cela, nous proposons tout d’abord un modèle analytique des pertes et de la diaphonie dans un réseau optique sur puce WDM. Nous proposons ensuite une méthodologie pour améliorer les performances globales du système s'appuyant sur l'utilisation de codes correcteurs d'erreurs. Nous présentons deux types de codes, le premier (Hamming) est d'une complexité d'implémentation faible alors que le second (Reed-Solomon) est plus complexe, mais offre un meilleur taux de correction. Nous avons implémenté des blocs matériels supportant ces corrections d'erreurs avec une technologie 28nm FDSOI. Finalement, nous proposons la définition d'une interface complète entre le domaine électrique et le domaine optique permettant d'allouer les longueurs d'ondes, de coder l'information, de sérialiser le flux de données et de contrôler le driver du laser pour obtenir la modulation à la puissance optique souhaitée. / Electrical Network-on-Chip (ENoC) has long been considered the de facto technology for interconnects in multiprocessor systems-on-chip (MPSoCs).
However, as the number of cores integrated on a single chip increases, ENoCs are less and less able to meet the bandwidth and latency requirements of today's complex, highly parallel applications. In recent years, given power consumption constraints and the need for low latency and high data bandwidth, optical interconnects have become an attractive way to overcome these limitations. Optical Networks-on-Chip (ONoC) are based on waveguides that carry optical signals from source to destination with very low latency. Unfortunately, the optical devices used to build ONoCs suffer from imperfections that introduce losses during communication. These impairments (crosstalk noise and optical losses) strongly affect the energy efficiency and performance of the system. Wavelength Division Multiplexing (WDM) can help the designer improve ONoC performance, especially bandwidth and latency. However, WDM introduces additional losses and crosstalk noise, which degrade the Signal-to-Noise Ratio (SNR) and the Bit Error Rate (BER). Specifically, this results in a higher BER and increased power consumption, which reduces the energy efficiency of the optical interconnects. The contributions presented in this manuscript address these issues. We first model and analyze the optical losses and crosstalk in a WDM-based ONoC. The model provides an analytical evaluation of worst-case loss and crosstalk under different parameters for an optical ring network-on-chip. Based on this model, we propose a methodology that relies on forward error correction (FEC) to improve the performance and then reduce the power consumption of optical interconnects. We present two case studies of lightweight FEC with low implementation complexity and high error-correction performance in 28nm Fully-Depleted Silicon-On-Insulator (FDSOI) technology.
The results demonstrate the advantages of using FEC on the optical interconnect in the context of the CHAMELEON ONoC. Second, we propose a complete design of an Optical Network Interface (ONI), which comprises data flow allocation, integrated FECs, data serialization/deserialization, and control of the laser driver. The details of these elements are presented in this manuscript. Building on this network interface, allocation management that improves energy efficiency can be supported at runtime according to application demands. This runtime management of the energy/performance trade-off can be integrated into the ONI manager through a configuration manager located in each ONI. Finally, the design of an ONoC configuration sequencer (OCS), located at the center of the optical layer, is presented. Using the ONI manager, the OCS can configure the ONoC at runtime according to the application's performance and energy requirements.
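The French abstract names Hamming and Reed-Solomon as the two error-correcting codes but gives no code parameters. As an illustration of why a Hamming code counts as lightweight, here is a minimal Python sketch of the textbook Hamming(7,4) code, which corrects any single bit error per 7-bit codeword. The choice of (7,4) and the function names are assumptions; the thesis's hardware blocks may use different parameters.

```python
def hamming74_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword.

    Bit positions 1..7 hold p1 p2 d1 p3 d2 d3 d4, with parity bits
    at the power-of-two positions.
    """
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(c):
    """Correct at most one flipped bit and return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    # The syndrome equals the 1-based position of the erroneous bit (0 = none).
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


if __name__ == "__main__":
    word = [1, 0, 1, 1]
    sent = hamming74_encode(word)
    received = list(sent)
    received[4] ^= 1  # simulate one bit corrupted on the link
    print(sent, received, hamming74_decode(received))
```

Encoding and decoding need only a few XOR operations per codeword, which is consistent with the low implementation complexity the abstract attributes to the Hamming option; the price is 3 parity bits for every 4 data bits and no protection against two errors in the same codeword.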
|