11 |
Architecture of an interconnection network with shared memory based on the crossbar topology. Fábio Gonçalves Pessanha, 22 March 2013
Multi-Processor Systems-on-Chip (MPSoCs) integrate multiple processors on a single chip. Multiple applications can be executed in parallel, or a parallelizable application can be partitioned and allocated across the processors to accelerate its execution. One problem in MPSoCs is the communication between processors that these applications require. In this work, we propose an interconnection network architecture based on the crossbar topology, with shared memory. The architecture is parameterizable, with N processors and N memory modules, and processors exchange information via shared memory. In this implementation, each processor executes its application from its own memory module. Through the network, all processors can access their own memory modules simultaneously, allowing the applications to run concurrently. Moreover, a processor can access other memory modules whenever it needs data generated by another processor. The proposed architecture is modelled in VHDL, and its performance is analysed by comparing the parallel execution of an application with its sequential execution. The chosen application optimizes objective functions using the Particle Swarm Optimization (PSO) method. In this method, the particles of a swarm are distributed equally among the processors and, at the end of each iteration, a processor accesses the memory module of another processor to obtain the best position found by the sub-swarm allocated to it. Communication between processors is based on three strategies: ring, neighbourhood and broadcast. This application was chosen because it is computationally intensive and therefore a strong candidate for parallelization.
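To make the communication pattern concrete, here is a minimal Python sketch, not the thesis's VHDL design. Each hypothetical `Processor` object stands in for one processor plus its private memory module holding one sub-swarm; at the end of every iteration each processor reads the best position stored in other processors' modules according to the ring, neighbourhood or broadcast strategy. The objective function (`sphere`), the PSO coefficients, the direction of the ring, and the end-of-iteration synchronisation are all assumptions made for illustration.

```python
import random


def sphere(x):
    """Hypothetical objective: sum of squares, global minimum 0 at the origin."""
    return sum(v * v for v in x)


class Processor:
    """Stands in for one processor plus its private memory module (one sub-swarm)."""

    def __init__(self, n_particles, dim, rng):
        self.rng = rng
        self.pos = [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(n_particles)]
        self.vel = [[0.0] * dim for _ in range(n_particles)]
        self.pbest = [p[:] for p in self.pos]
        self.pbest_val = [sphere(p) for p in self.pos]
        k = min(range(n_particles), key=lambda i: self.pbest_val[i])
        # Best position of this sub-swarm: the value other processors read
        self.best, self.best_val = self.pbest[k][:], self.pbest_val[k]

    def step(self, w=0.7, c1=1.5, c2=1.5):
        """One PSO iteration of the local sub-swarm, guided by self.best."""
        guide = self.best[:]
        for i, x in enumerate(self.pos):
            for d in range(len(x)):
                r1, r2 = self.rng.random(), self.rng.random()
                self.vel[i][d] = (w * self.vel[i][d]
                                  + c1 * r1 * (self.pbest[i][d] - x[d])
                                  + c2 * r2 * (guide[d] - x[d]))
                x[d] += self.vel[i][d]
            val = sphere(x)
            if val < self.pbest_val[i]:
                self.pbest[i], self.pbest_val[i] = x[:], val
                if val < self.best_val:
                    self.best, self.best_val = x[:], val


def sources(p, n, strategy):
    """Indices of the memory modules processor p reads at the end of an iteration."""
    if strategy == "ring":
        return [(p - 1) % n]
    if strategy == "neighbourhood":
        return [(p - 1) % n, (p + 1) % n]
    if strategy == "broadcast":
        return [q for q in range(n) if q != p]
    raise ValueError(f"unknown strategy: {strategy}")


def parallel_pso(n_procs=4, particles=10, dim=3, iters=200, strategy="ring", seed=0):
    rng = random.Random(seed)
    procs = [Processor(particles, dim, rng) for _ in range(n_procs)]
    for _ in range(iters):
        for p in procs:  # in hardware these run concurrently, one per processor
            p.step()
        # Assumed synchronisation: all reads see the state at the end of the iteration
        snapshot = [(p.best[:], p.best_val) for p in procs]
        for idx, p in enumerate(procs):
            for q in sources(idx, n_procs, strategy):
                pos, val = snapshot[q]
                if val < p.best_val:
                    p.best, p.best_val = pos[:], val
    return min(p.best_val for p in procs)


if __name__ == "__main__":
    for s in ("ring", "neighbourhood", "broadcast"):
        print(s, parallel_pso(strategy=s))
```

The three strategies differ only in `sources`: the ring reads one module per iteration, the neighbourhood reads two, and broadcast reads all N-1 others, which in a crossbar trades more memory traffic for faster propagation of the best position.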
|
12 |
PCRAM optimisation for sub-40 nm technology nodes: integration of alternative materials and innovative structures. Hubert, Quentin, 17 December 2013
In the past few years, the growing demand for high-performance non-volatile memory (NVM) devices has led to the development of NOR Flash and NAND Flash technologies, which now dominate the NVM market. However, because of limitations such as performance degradation and diminishing cost reduction, the scaling of these technologies is expected to reach its limits within the next few years. New NVM technologies are therefore under development, and among them, phase-change memory (PCM, also called PCRAM) has attracted strong interest and become one of the most promising candidates both to replace Flash, especially NOR Flash, and to address new markets such as storage-class memory (SCM). However, to be fully competitive with other memory technologies, some aspects of PCM performance still need to be improved. In this PhD, we therefore sought to improve the electrical performance of PCM devices by reducing programming currents and energy consumption while increasing high-temperature data retention. To this end, we studied innovative device structures and an alternative phase-change material. Moreover, using innovative solutions, we show that our PCM devices can retain data through the soldering step of the memory chip. Finally, we designed, developed and validated a process flow to fabricate 1D1R PCM cells in which a silicon vertical PN selector diode is placed in series with the PCM resistive element, demonstrating the suitability of this selector for PCRAM-based crossbar architectures.
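One common reason a crossbar needs a selector such as the PN diode above is the sneak-path problem. The following Python sketch is an idealized illustration, not taken from the thesis: it assumes a 2x2 array, unselected row and column lines left floating, ideal diodes oriented row-to-column that block all reverse current, and hypothetical resistance values. It computes the resistance seen when reading one cell with and without selectors.

```python
def parallel(r1, r2):
    """Equivalent resistance of two resistors in parallel."""
    return r1 * r2 / (r1 + r2)


def read_resistance_2x2(r, target=(0, 0), with_diodes=False):
    """Resistance seen between the selected row and column of a 2x2 crossbar.

    r[i][j] is the resistance of the cell joining row i to column j.
    Without selectors, a sneak path runs through the other three cells:
    row i -> cell (i, oj) -> column oj -> cell (oi, oj) -> row oi -> cell (oi, j) -> column j.
    The middle cell is crossed from column to row, i.e. against an (assumed)
    row-to-column diode, so an ideal diode in every cell blocks that path.
    """
    i, j = target
    direct = r[i][j]
    if with_diodes:
        return direct
    oi, oj = 1 - i, 1 - j
    sneak = r[i][oj] + r[oi][oj] + r[oi][j]
    return parallel(direct, sneak)


if __name__ == "__main__":
    R_LOW, R_HIGH = 10e3, 1e6  # hypothetical SET / RESET resistances in ohms
    cells = [[R_HIGH, R_LOW],
             [R_LOW, R_LOW]]
    print("no selector:", read_resistance_2x2(cells))
    print("diode selector:", read_resistance_2x2(cells, with_diodes=True))
```

Without selectors, the high-resistance cell in this example reads as roughly 29 kΩ, close to the low-resistance state, so its data would be misread; with ideal diodes the full 1 MΩ is seen.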
|
13 |
Performance and Energy Efficient Building Blocks for Network-on-Chip Architectures. Vangal, Sriram R., January 2006
The ever-shrinking size of MOS transistors brings the promise of scalable Network-on-Chip (NoC) architectures containing hundreds of processing elements with on-chip communication, all integrated on a single die. Such a computational fabric will provide high levels of performance in an energy-efficient manner. To mitigate the emerging wire-delay problem and to address the need for substantial interconnect bandwidth, packet-switched routers are fast replacing shared buses and dedicated wires as the interconnect fabric of choice. With on-chip communication consuming a significant portion of the chip power and area budgets, there is a compelling need for compact, low-power routers. While applications dictate the choice of compute core, the advent of multimedia applications, such as 3D graphics and signal processing, places stronger demands on self-contained, low-latency floating-point processors with increased throughput. This work therefore focuses on two key building blocks critical to the success of NoC design: high-performance, area- and energy-efficient router and floating-point processor architectures. This thesis first presents a six-port, four-lane, 57 GB/s non-blocking router core based on wormhole switching. The router features double-pumped crossbar channels and destination-aware channel drivers that are dynamically configured based on the current packet destination. This enables a 45% reduction in crossbar channel area, a 23% reduction in overall router area, up to a 3.8X reduction in peak channel power, and a 7.2% improvement in average channel power, with no performance penalty compared with a published design. In a 150 nm six-metal CMOS process, the 12.2 mm² router contains 1.9 million transistors and operates at 1 GHz at 1.2 V. We next present a new pipelined single-precision floating-point multiply-accumulate core (FPMAC) featuring a single-cycle accumulate loop that uses base-32 and internal carry-save arithmetic with delayed-addition techniques.
Combined algorithmic, logic and circuit techniques enable multiply-accumulate operations at speeds exceeding 3 GHz with single-cycle throughput. Unlike existing FPMAC architectures, the design eliminates scheduling restrictions between consecutive FPMAC instructions. The optimizations allow the costly normalization step to be removed from the critical accumulate loop and to be conditionally powered down using dynamic sleep transistors during long accumulate operations, saving active and leakage power. In addition, an improved leading-zero anticipator (LZA) and overflow-detection logic applicable to the carry-save format are presented. In a 90 nm seven-metal dual-VT CMOS process, the 2 mm² custom design contains 230K transistors. The fully functional first silicon achieves 6.2 GFLOPS while dissipating 1.2 W at 3.1 GHz and a 1.3 V supply. Successful NoC designs clearly require well-balanced decisions at all levels: architecture, logic, circuit and physical design. Our results from key building blocks demonstrate the feasibility of pushing the performance limits of compute cores and communication routers while keeping active power, leakage power and area under control. / Report code: LiU-TEK-LIC-2006:36.
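The single-cycle accumulate loop above relies on carry-save arithmetic, which avoids propagating carries on every addition. As an illustration only, the following Python sketch shows the carry-save principle on non-negative integers: the running total is kept as a (sum, carry) pair, updated by a 3:2 compressor, and resolved with one carry-propagate addition at the end. It does not model the thesis's floating-point FPMAC, its base-32 representation, or its delayed-addition techniques; the function names are hypothetical.

```python
def csa(a, b, c):
    """3:2 carry-save adder on non-negative integers.

    Returns (sum, carry) with a + b + c == sum + carry. The sum bits are the
    bitwise XOR and the carries are the bitwise majority shifted left by one,
    so no carry ripples across bit positions.
    """
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry


def carry_save_accumulate(values):
    """Accumulate values in redundant (sum, carry) form."""
    s, c = 0, 0
    for v in values:
        s, c = csa(s, c, v)  # constant-depth update inside the loop
    return s + c              # single carry-propagate addition at the end


if __name__ == "__main__":
    data = [3, 5, 7, 250, 1023]
    print(carry_save_accumulate(data), sum(data))
```

In hardware, the point of this redundant form is that each `csa` step has the delay of a single full adder regardless of word width, which is what makes a short, single-cycle accumulate loop feasible.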
|
15 |
Exploring hierarchy, adaptability and 3D in NoCs for the next generation of MPSoCs. Matos, Débora da Silva Motta, January 2014
The demand for high-performance systems has driven an increase in the number of processing cores, giving rise to Multi-Processor Systems-on-Chip (MPSoCs). In addition, shrinking feature sizes in the deep-submicron era allow more devices to be integrated, making chips even more complex. However, as the number of processing elements grows, interconnections become the main bottleneck in many-core systems-on-chip. This raises concerns about how these elements communicate and are interconnected, since these aspects are crucial for performance, energy and power consumption, particularly in embedded systems. This need led to the advent of Networks-on-Chip (NoCs), and numerous studies have already analysed such interconnection devices. However, the current pace of technological change demands even more complex systems that consume less energy and allow applications to be updated continually without losing performance, and traditional interconnect architectures will not be sufficient to meet these challenges. Other interconnection alternatives for MPSoCs need to be investigated, and this work presents novel NoC architectures that meet these requirements. The proposed solutions explore hierarchy, adaptability and three-dimensional interconnections. This work addresses the need for different NoC strategies to meet the performance and low-power requirements of current and future MPSoCs. Accordingly, it evaluates several interconnection architectures for heterogeneous systems, their scalability, their main features, and the advantages of the proposed strategies over other solutions.
|
18 |
Multi-core Architectures for Feed-forward Neural Networks. Hasan, Md. Raqibul, 05 June 2014
No description available.
|
19 |
Memristor Based Low Power High Throughput Circuits and Systems Design. Hasan, Md Raqibul, 17 May 2016
No description available.
|
20 |
Balancing Performance, Area, and Power in an On-Chip Network. Gold, Brian, 06 August 2003
Several trends can be observed in modern microprocessor design. Architectures have become increasingly complex while design time continues to shrink. As feature sizes shrink, wire resistance and delay increase, preventing architects from scaling designs centered around a single thread of execution. Whereas previous decades focused on exploiting instruction-level parallelism, emerging applications such as streaming media and online transaction processing exhibit greater thread-level parallelism. Finally, the growing gap between processor and off-chip memory speeds has constrained the performance of memory-intensive applications.
The Single-Chip Message Passing (SCMP) parallel computer sits at the confluence of these trends. SCMP is a tiled architecture consisting of numerous thread-parallel processor and memory nodes connected through a structured interconnection network. Using an interconnection network removes the global, ad hoc wiring that limits scalability and introduces design complexity. However, routing data through a general-purpose interconnection network can mean giving up dedicated bandwidth and incurring longer latency, increased area, and higher power consumption. Understanding the impact of architectural decisions on cost and performance will aid the eventual adoption of general-purpose interconnects.
This thesis covers the design and analysis of the on-chip network and its integration with the SCMP system. The result of these efforts is a framework for analyzing on-chip interconnection networks that considers network performance, circuit area, and power consumption. / Master of Science
|