61
Estimativa de desempenho de uma NoC a partir de seu modelo em SystemC-TLM. / A NoC performance evaluation from a SystemC-TLM model. Sepúlveda Flórez, Martha Johanna, 16 October 2006
The wide variety of interconnection structures currently available for SoCs (Systems-on-Chip), buses and Networks-on-Chip (NoCs), each with a wide set of setup parameters, yields a huge number of design alternatives. Although the interconnection structure is a key SoC component, there are few design tools to set the appropriate configuration parameters for a given application. An efficient SoC project may require an exploration stage among the possible solutions for the communication structure during the first steps of the design process. The absence of appropriate tools for that exploration makes the designer's judgment critical. The present study aims to enhance the design of SoC communication structures when a NoC is used. This work proposes a methodology that allows the NoC communication parameters to be established using a high-level model (timed SystemC TLM). Our approach analyzes and evaluates NoC performance under a wide variety of traffic conditions. The experimental stage employed a timed SystemC TLM model of a network (Hermes_Temp). Parametric and pseudo-random generators control the network traffic. The analysis was carried out with a tool designed for this purpose, which generates a set of performance metrics. The results elucidate the global and internal network behavior. The performance values are useful for heterogeneous and homogeneous NoC design projects, broadening the scope of performance evaluation studies.
63
Advanced Connection Allocation Techniques in Circuit Switching Network on Chip. Chen, Yong, 14 September 2017
With the advancement of semiconductor technology, Systems on Chip (SoCs) are becoming more and more complex, so on-chip communication has become a bottleneck of SoC design. Since the traditional bus system is inefficient and not scalable, the Network-on-Chip (NoC) has emerged as the promising communication mechanism for complex SoCs. As some systems have specific performance requirements, such as a minimum throughput (for real-time streaming data) or bounded latency (for interrupts, process synchronization, etc.), communication with Guaranteed Service (GS) support becomes crucial for predictable SoC architectures. Circuit Switching (CS) is a popular approach to support GS: it first allocates an exclusive connection (circuit) between the source and destination nodes, and the data packets are then delivered over this connection. However, it is inefficient and inflexible because the resources are occupied by a single connection during its whole lifetime, which can block other communications. Hence, two extensions of CS have been proposed to share resources: i) Time-Division Multiplexing (TDM), in which the available link capacity is split into multiple time slots that can be shared by different flows; and ii) Space-Division Multiplexing (SDM), in which only a subset (sub-channel) of the link wires is exclusively allocated to a specific connection, while the remaining wires of the link can be used by other flows.
Connection allocation is critical for CS, since data delivery can start only after the associated connection has been allocated. In this thesis, we propose a dedicated hardware connection allocator to solve the dynamic connection allocation problem for CS NoCs, which has to i) allocate a contention-free path between source-destination pairs and ii) allocate appropriate portions of link bandwidth (an appropriate number of time slots and sub-channels) along the path. The dedicated connection allocator, called NoCManager, solves the connection allocation problem by employing a trellis-search-based shortest-path algorithm. The trellis search can explore all possible paths between the source and destination nodes. Moreover, it finds the requested path within a fixed, low latency and guarantees path optimality in terms of path length whenever a path is available.
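The thesis's trellis-search hardware is not reproduced in the abstract; as a rough software illustration of a stage-by-stage search that explores paths in parallel and backtracks a shortest contention-free path, the sketch below runs a breadth-first expansion over a small mesh while skipping busy links. The mesh size, node coordinates and busy-link set are invented for the example.

```python
from collections import deque

def mesh_neighbors(node, cols=4, rows=4):
    """4-connected neighbors of an (x, y) node in a cols x rows mesh."""
    x, y = node
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < cols and 0 <= ny < rows:
            yield (nx, ny)

def allocate_path(src, dst, busy_links):
    """Breadth-first, stage-by-stage expansion: each node reached stores its
    predecessor, so a shortest free path (if any) can be traced back, much
    like backtracking the survivor path in a trellis."""
    prev = {src: None}
    frontier = deque([src])
    while frontier:
        node = frontier.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in mesh_neighbors(node):
            if nxt not in prev and (node, nxt) not in busy_links:
                prev[nxt] = node           # remember the survivor predecessor
                frontier.append(nxt)
    return None                            # no contention-free path available

# Hypothetical occupancy: the link (1,0)->(1,1) is taken by another circuit.
print(allocate_path((0, 0), (3, 3), busy_links={((1, 0), (1, 1))}))
```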
In this thesis, two different trellis graphs, the Forward-Backtrack trellis and the Register-Exchange trellis, are proposed. The Forward-Backtrack trellis completes the path search in two steps: forward search and backtracking. First, the forward search begins at the source node and traverses the network to find a free path. When the destination node is reached, the backtracking starts from the destination to select the survivor path and collect the associated path parameters. The Register-Exchange trellis, in contrast, saves the entire survivor path sequences during the forward search. Consequently, the backtracking step can be omitted, and the allocation time is halved compared to the forward-backtrack approach. Moreover, each trellis graph comes in three variants: an unfolded structure, a folded structure and a bidirectional structure. The unfolded structure provides high allocation speed, while the folded structure is more efficient from a hardware point of view. The bidirectional structure starts the search from both sides, the source node and the destination node, simultaneously, so the allocation is twice as fast as the unidirectional search. Furthermore, in order to address the scalability issue of previous centralized systems, a partitioned architecture (i.e., a spatial partitioning technique) is proposed to divide a large system into multiple smaller differentiated logical partitions served by local NoCManagers. This partitioning technique keeps the request load of each manager and the manager-node communication overhead moderate. Inside each partition, the path search problem is solved by a local manager with the trellis-search algorithm. To establish a path that crosses partitions, the managers communicate with each other in a distributed manner to converge on the global path.
In order to further enhance path diversity and resource utilization, we adopt a combined TDM and SDM technique. In the combined TDM-SDM approach, each SDM sub-channel is split into multiple time slots so that it can be shared by multiple flows. Hence, the number of sub-channels can be kept moderate to reduce router complexity, while still providing higher path diversity than a pure TDM scheme. In order to investigate and optimize the TDM-SDM partitioning strategy, we studied the influence of different TDM-SDM link partitioning strategies on success rate and path length, which allowed us to find the optimal solution. The dedicated connection allocator using the trellis-search algorithm is employed for TDM, SDM and TDM-SDM CS.
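As a minimal sketch of the combined TDM-SDM link model described above, assuming an arbitrary number of sub-channels and time slots per link, the occupancy table below records which (sub-channel, slot) pairs are already reserved and finds room for a new connection.

```python
class LinkTDMSDM:
    """Occupancy table of one link: slots[s][t] is True when time slot t of
    SDM sub-channel s is already reserved by some connection."""

    def __init__(self, num_subchannels=4, num_slots=8):
        self.slots = [[False] * num_slots for _ in range(num_subchannels)]

    def find_free(self, slots_needed):
        """Return (sub-channel, slot indices) able to host slots_needed time
        slots on a single sub-channel, or None if the link is saturated."""
        for s, row in enumerate(self.slots):
            free = [t for t, used in enumerate(row) if not used]
            if len(free) >= slots_needed:
                return s, free[:slots_needed]
        return None

    def reserve(self, subchannel, slot_indices):
        for t in slot_indices:
            self.slots[subchannel][t] = True

link = LinkTDMSDM()
grant = link.find_free(slots_needed=3)   # e.g. 3/8 of one sub-channel's bandwidth
if grant is not None:
    link.reserve(*grant)
print(grant)
```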
Finally, we present a router architecture that combines the circuit-switching network (for GS communication) and the packet-switching network (for best-effort communication).
64
Power Optimal Network-On-Chip Interconnect Design. Vikas, G, 02 1900 (PDF)
A large part of today's multi-core chips is interconnect. Increasing communication complexity has made new interconnect strategies, such as the Network on Chip, essential. Power dissipation in interconnects has become a substantial part of the total power dissipation; hence, techniques to reduce interconnect power have become a necessity. In this thesis, we present a design methodology that gives the bus width of the interconnect links and the operating frequency of the routers, in a Network-on-Chip scenario, that satisfy the required throughput while dissipating minimal switching power. We develop closed-form analytical expressions for the power dissipation, with bus width and frequency as variables, and then use the Lagrange multiplier method to arrive at the optimal values.
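The thesis's closed-form power expressions are not given in the abstract. Purely to illustrate the Lagrange-multiplier step, the sketch below assumes an invented toy model in which switching power grows as a·w²·f (crossbar/link capacitance) plus b·f³ (router logic with voltage tracking frequency), under a throughput constraint w·f = T, and solves the stationarity conditions symbolically; neither the model nor the coefficients come from the thesis.

```python
import sympy as sp

w, f, lam, a, b, T = sp.symbols("w f lam a b T", positive=True)

P = a * w**2 * f + b * f**3    # invented toy power model: links + router logic
g = w * f - T                  # throughput constraint: bus width * frequency = T
L = P - lam * g                # Lagrangian

# Stationarity: dL/dw = 0, dL/df = 0, and the constraint itself.
solutions = sp.solve([sp.diff(L, w), sp.diff(L, f), sp.diff(L, lam)],
                     [w, f, lam], dict=True)
for sol in solutions:
    print("w* =", sp.simplify(sol[w]), " f* =", sp.simplify(sol[f]))
```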
To validate our methodology, we implement the router design in 90 nm technology and measure power for various bus-width and frequency combinations. We find that the experimental results are in good agreement with the predicted theoretical results. Further, we present the scenario of an Application-Specific System on Chip (ASSoC), where the throughput requirements differ across links. We show that our analytical model holds in this case as well. We then present a modified version of the solution considered for the Chip Multi-Processor (CMP) case that can also solve the ASSoC scenario.
65
Design and Programming Methods for Reconfigurable Multi-Core Architectures using a Network-on-Chip-Centric Approach. Rettkowski, Jens, 12 July 2022
A current trend in the semiconductor industry is the use of Multi-Processor Systems-on-Chip (MPSoCs) for a wide variety of applications such as image processing, automotive, multimedia, and robotic systems. Most applications gain performance advantages by executing parallel tasks on multiple processors due to the inherent parallelism. Moreover, heterogeneous structures provide high performance/energy efficiency, since application-specific processing elements (PEs) can be exploited. The increasing number of heterogeneous PEs leads to challenging communication requirements. To overcome this challenge, Networks-on-Chip (NoCs) have emerged as scalable on-chip interconnect. Nevertheless, NoCs have to deal with many design parameters such as virtual channels, routing algorithms and buffering techniques to fulfill the system requirements.
This thesis makes several contributions to the state of the art of FPGA-based MPSoCs and NoCs. In the following, the three major contributions are introduced.
The first major contribution is a novel router concept that efficiently utilizes communication time by performing sequences of arithmetic operations on the data being transferred. The internal input buffers of the routers are replaced with processing units capable of executing operations. Two different architectures of such processing units are presented. The first architecture provides multiply-and-accumulate operations, which are often used in signal processing applications. The second architecture, introduced as Application-Specific Instruction Set Routers (ASIRs), contains a processing unit capable of executing any operation and is hence not limited to multiply-and-accumulate operations. The internal processing core located in ASIRs can be developed in C/C++ using high-level synthesis.
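The ASIR hardware itself is RTL generated via high-level synthesis; the toy model below is only a behavioral illustration of the idea of an input buffer that multiplies and accumulates operand fields of the flits it stores before forwarding them. The flit fields are invented.

```python
from dataclasses import dataclass

@dataclass
class Flit:
    dest: int
    a: int
    b: int

class MACInputUnit:
    """Toy stand-in for a router input buffer with a processing unit: instead
    of only storing flits, it multiplies two operand fields and accumulates
    the result while the flit waits to be forwarded."""

    def __init__(self):
        self.acc = 0
        self.queue = []

    def receive(self, flit):
        self.acc += flit.a * flit.b       # compute during communication
        self.queue.append(flit)

    def forward(self):
        return self.queue.pop(0) if self.queue else None

unit = MACInputUnit()
for flit in [Flit(dest=3, a=2, b=5), Flit(dest=3, a=4, b=1)]:
    unit.receive(flit)
print(unit.acc)   # 14: a partial result produced "in transit"
```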
The second major contribution comprises application and performance explorations of the novel router concept. Models that approximate the achievable speedup and the end-to-end latency of ASIRs are derived and discussed to show the benefits in terms of performance. Furthermore, two applications using an ASIR-based MPSoC are implemented and evaluated on a Xilinx Zynq SoC. The first application is an image processing algorithm consisting of a Sobel filter, an RGB-to-grayscale conversion, and a threshold operation. The second application is a system that helps visually impaired people by navigating them through unknown indoor environments. A Light Detection and Ranging (LIDAR) sensor scans the environment, while Inertial Measurement Units (IMUs) measure the orientation of the user, to generate an audio signal that makes the distance as well as the orientation of obstacles audible. This application consists of multiple parallel tasks that are mapped to an ASIR-based MPSoC. Both applications show the performance advantages of ASIRs compared to a conventional NoC-based MPSoC. Furthermore, dynamic partial reconfiguration is investigated in terms of relocation and security aspects.
The third major contribution refers to development and programming methodologies for NoC-based MPSoCs. A software-defined approach is presented that combines the design and programming of heterogeneous MPSoCs. In addition, a Kahn-Process-Network (KPN)-based model is designed to describe parallel applications for MPSoCs using ASIRs. The KPN-based model is extended to support not only the mapping of tasks to NoC-based MPSoCs but also the mapping to ASIR-based MPSoCs. A static mapping methodology is presented that assigns tasks to ASIRs and processors for a given KPN model. The impact of external hardware components such as sensors, actuators and accelerators connected to the processors is also discussed, which makes the approach highly interesting for embedded systems.
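The thesis's KPN-based model and mapping methodology are not reproduced here; the sketch below only illustrates the underlying Kahn-process-network idea of tasks communicating over FIFO channels with blocking reads, using two toy tasks.

```python
import threading, queue

def producer(out_ch):
    for x in range(5):
        out_ch.put(x)          # write a token to the FIFO channel
    out_ch.put(None)           # end-of-stream token

def consumer(in_ch, results):
    while True:
        x = in_ch.get()        # blocking read: the defining KPN rule
        if x is None:
            break
        results.append(x * x)

channel, results = queue.Queue(), []
threads = [threading.Thread(target=producer, args=(channel,)),
           threading.Thread(target=consumer, args=(channel, results))]
for t in threads: t.start()
for t in threads: t.join()
print(results)   # [0, 1, 4, 9, 16]
```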
66
Improving Network-on-Chip Performance in Multi-Core Systems. Gorgues Alonso, Miguel, 10 September 2018
The Network-on-Chip (NoC) has become the key element for efficient communication between cores within a chip multiprocessor (CMP). The use of parallel applications in CMPs and the increase in the amount of memory needed by applications have made the communication network ever more important. The NoC is in charge of transporting all the data needed by the processor cores. Moreover, the increase in the number of cores pushes NoCs to be designed in a scalable way, but at the same time without affecting network performance (latency and throughput). Thus, network-on-chip design becomes critical.
This thesis presents different proposals that attack the problem of improving network performance in three different scenarios: 1) NoCs with an adaptive routing algorithm, 2) scenarios with low memory access time requirements, and 3) high-assurance NoCs. The first proposals focus on increasing network throughput with adaptive routing algorithms, by improving the utilization of network resources (first proposal, SUR) and by avoiding congestion spreading when intense traffic to a single destination occurs (second proposal, ECP). The third and main contribution of this thesis focuses on the problem of reducing memory access latency. PROSA, through a hybrid circuit-packet switching architecture, reduces network latency by exploiting the memory access latency slack to establish circuits during that delay. In this way, when the data arrives at the NoC it is served without any delay. Finally, the Token-Based TDM proposal targets the scenario of high-assurance networks-on-chip. In this type of NoC the applications are divided into domains, and the network must guarantee that there is no interference between the different domains, thus preventing intrusion by possible malicious applications. Token-Based TDM allows domain isolation with no design impact on the NoC routers.
The results show how these proposals improve the performance of the network in each scenario. The implementation and simulation of the proposals show that the efficient use of network resources in CMPs with adaptive routing algorithms leads to an increase in the injected traffic supported by the network. In addition, using a filter to limit the adaptivity of the routing algorithm under congested situations prevents messages from spreading the congestion along the network. Furthermore, the results show that the combined use of circuit and packet switching reduces the memory access latency significantly, contributing to a significant reduction in application execution time. Finally, Token-Based TDM increases the performance of TDM networks due to its high flexibility and efficient arbitration. Moreover, Token-Based TDM does not require any modification in the network to support a different number of domains, while improving latency and keeping strong traffic isolation between domains.
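The abstract gives no implementation details for Token-Based TDM; purely as an illustration of the kind of domain isolation it describes, the toy schedule below rotates a token among partitions and lets only the current token holder inject in a time slot. Domain names, slot counts and pending-traffic figures are invented.

```python
from itertools import cycle

domains = ["secure", "multimedia", "best_effort"]   # invented domain names
token = cycle(domains)                              # token rotates round-robin

def simulate(num_slots, pending):
    """One flit per time slot; only the token-holding domain may inject,
    so traffic from different domains never shares a slot."""
    schedule = []
    for _ in range(num_slots):
        holder = next(token)
        if pending.get(holder, 0) > 0:
            pending[holder] -= 1
            schedule.append(holder)
        else:
            schedule.append(None)      # slot left idle, isolation preserved
    return schedule

print(simulate(9, {"secure": 2, "multimedia": 4, "best_effort": 1}))
```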
Gorgues Alonso, M. (2018). Improving Network-on-Chip Performance in Multi-Core Systems [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/107336 / Thesis by compendium of publications.
67
Performance and Energy Efficient Building Blocks for Network-on-Chip Architectures. Vangal, Sriram R., January 2006
The ever-shrinking size of MOS transistors brings the promise of scalable Network-on-Chip (NoC) architectures containing hundreds of processing elements with on-chip communication, all integrated into a single die. Such a computational fabric will provide high levels of performance in an energy-efficient manner. To mitigate the emerging wire-delay problem and to address the need for substantial interconnect bandwidth, packet-switched routers are fast replacing shared buses and dedicated wires as the interconnect fabric of choice. With on-chip communication consuming a significant portion of the chip power and area budgets, there is a compelling need for compact, low-power routers. While applications dictate the choice of the compute core, the advent of multimedia applications, such as 3D graphics and signal processing, places stronger demands for self-contained, low-latency floating-point processors with increased throughput. Therefore, this work focuses on two key building blocks critical to the success of NoC design: high-performance, area- and energy-efficient router and floating-point processor architectures.

This thesis first presents a six-port, four-lane, 57 GB/s non-blocking router core based on wormhole switching. The router features double-pumped crossbar channels and destination-aware channel drivers that dynamically configure based on the current packet destination. This enables a 45% reduction in crossbar channel area, a 23% reduction in overall router area, up to a 3.8X reduction in peak channel power, and a 7.2% improvement in average channel power, with no performance penalty over a published design. In a 150 nm six-metal CMOS process, the 12.2 mm² router contains 1.9 million transistors and operates at 1 GHz at 1.2 V.

We next present a new pipelined single-precision floating-point multiply-accumulator core (FPMAC) featuring a single-cycle accumulate loop using base-32 and internal carry-save arithmetic, with delayed addition techniques. Combined algorithmic, logic and circuit techniques enable multiply-accumulates at speeds exceeding 3 GHz, with single-cycle throughput. Unlike existing FPMAC architectures, the design eliminates scheduling restrictions between consecutive FPMAC instructions. The optimizations allow the costly normalization step to be removed from the critical accumulate loop and to be conditionally powered down, using dynamic sleep transistors, during long accumulate operations, saving active and leakage power. In addition, an improved leading-zero anticipator (LZA) and overflow detection logic applicable to the carry-save format are presented. In a 90 nm seven-metal dual-VT CMOS process, the 2 mm² custom design contains 230K transistors. The fully functional first silicon achieves 6.2 GFLOPS of performance while dissipating 1.2 W at 3.1 GHz and a 1.3 V supply.

It is clear that the realization of successful NoC designs requires well-balanced decisions at all levels: architecture, logic, circuit and physical design. Our results from key building blocks demonstrate the feasibility of pushing the performance limits of compute cores and communication routers, while keeping active and leakage power, and area, under control. / Report code: LiU-TEK-LIC-2006:36.
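The FPMAC is a custom floating-point datapath; as a small numerical illustration of why carry-save arithmetic shortens an accumulate loop, the generic integer sketch below keeps the running total as a (sum, carry) pair and defers the full carry-propagating addition to a single final step. It is not the thesis's base-32 floating-point design.

```python
def csa(a, b, c):
    """3:2 carry-save adder: three operands in, a (sum, carry) pair out,
    with no carry ripple across bit positions (a + b + c == s + carry)."""
    s = a ^ b ^ c
    carry = ((a & b) | (b & c) | (a & c)) << 1
    return s, carry

def accumulate(values):
    s, c = 0, 0
    for v in values:                 # each iteration is free of carry propagation
        s, c = csa(s, c, v)
    return s + c                     # one full (carry-propagate) add at the end

vals = [13, 250, 7, 99, 1024]
assert accumulate(vals) == sum(vals)
print(accumulate(vals))
```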
68
Self-adaptive QoS at communication and computation levels for many-core system-on-chip. Ruaro, Marcelo, 16 March 2018
Many-core systems-on-chip are the state of the art in processing power, reaching from a dozen to thousands of processing elements (PEs) in a single integrated circuit. General-purpose many-cores assume dynamic application admission, where the application set is unknown at design time and applications may start their execution at any moment, inducing interference between them. Some applications may have real-time constraints to fulfill, requiring levels of quality of service (QoS) from the system. Due to the high degree of unpredictability in resource utilization and the number of components to manage, self-adaptive properties become fundamental to support QoS at run time. The literature provides several self-adaptive QoS proposals, targeting either communication (e.g., Network-on-Chip) or computation resources (e.g., CPU). However, to offer complete QoS support, it is fundamental to provide comprehensive self-awareness of the system's resources, assuming adaptive techniques able to act simultaneously at the communication and computation levels to meet the applications' constraints. To cope with these requirements, this Thesis proposes a self-adaptive QoS infrastructure and management techniques covering both the computation and communication levels. At the computation level, the QoS-driven infrastructure comprises a dynamic real-time task scheduler and a low-overhead task migration protocol. These techniques ensure computation QoS by managing CPU utilization and allocation. The novelty of the task scheduler is the support for dynamic real-time constraints, which gives tasks more flexibility to use the CPU according to a variable workload. The novelty of the task migration protocol is its low execution-time overhead compared to the state of the art. At the communication level, the proposed technique is a Circuit-Switching (CS) approach based on the Software-Defined Networking (SDN) paradigm. The SDN paradigm for NoCs is an innovation of this Thesis and is achieved through a generic software and hardware architecture. For communication QoS, SDN is used to define CS paths at run time. A self-adaptive QoS management following the ODA (Observe, Decide, Act) paradigm controls these QoS-driven infrastructures in an integrated way, implementing a closed loop for run-time adaptations. The QoS management is self-aware of the system and the running applications and can decide to take adaptations at the computation or communication level based on task feedback, environment monitoring, and QoS fulfillment monitoring. The self-adaptation decides reactively as well as proactively. An online application profile learning technique is proposed to trace the behavior of the RT tasks and enable proactive actions. Results show that the proposed self-adaptive QoS management can restore the QoS level for the applications with a low overhead on the applications' execution time. A broad evaluation, using known benchmarks, shows that even under severe QoS disturbances at the computation and communication levels, the execution time of the applications is restored close to the optimal scenario, mitigating 99.5% of deadline misses.
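The QoS manager described above is on-chip hardware/software infrastructure; the sketch below only illustrates the Observe-Decide-Act closed loop it follows, reacting to monitored deadline misses or NoC latency with a (stubbed) computation- or communication-level adaptation. All thresholds, field names and actions are invented.

```python
def observe(monitors):
    """Collect QoS monitoring data (stubbed with static numbers here)."""
    return {"deadline_miss_rate": monitors["miss"], "noc_latency_us": monitors["lat"]}

def decide(state, miss_threshold=0.05, latency_threshold=10.0):
    if state["deadline_miss_rate"] > miss_threshold:
        return "migrate_task"          # computation-level adaptation
    if state["noc_latency_us"] > latency_threshold:
        return "setup_cs_path"         # communication-level adaptation
    return "none"

def act(action):
    print(f"adaptation applied: {action}")

# One iteration of the closed loop for an invented observation.
state = observe({"miss": 0.08, "lat": 4.2})
action = decide(state)
if action != "none":
    act(action)
```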
69
DHyANA: neuromorphic architecture for liquid computing / DHyANA: uma arquitetura digital neuromórfica hierárquica para máquinas de estado líquido. Holanda, Priscila Cavalcante, January 2016
Neural networks have been a subject of research for at least sixty years. From the effectiveness in processing information to the amazing ability to tolerate faults, there are countless processing mechanisms in the brain that fascinate us. It therefore comes as no surprise that, as enabling technologies have become available, scientists and engineers have raised their efforts to understand, simulate and mimic parts of it. In an approach similar to that of the Human Genome Project, the quest for innovative technologies within the field has given birth to billion-dollar projects and global efforts, what some call a global blossom of neuroscience research. Advances in hardware have made the simulation of millions or even billions of neurons possible. However, existing approaches cannot yet provide the even denser interconnect required for the massive number of neurons and synapses. In this regard, this work proposes DHyANA (Digital HierArchical Neuromorphic Architecture), a new hardware architecture for a spiking neural network using hierarchical network-on-chip communication. The architecture is optimized for Liquid State Machine (LSM) implementations. DHyANA was exhaustively tested on simulation platforms, as well as implemented on an Altera Stratix IV FPGA. Furthermore, a logic synthesis analysis in 65 nm CMOS technology was performed in order to evaluate and better compare the resulting system with similar designs, achieving an area of 0.23 mm² and a power dissipation of 147 mW for a 256-neuron implementation.
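DHyANA's neuron model and NoC mapping are not detailed in the abstract; the snippet below is only a tiny software illustration of the liquid-state-machine idea it targets: a fixed, sparse, random recurrent reservoir of simple threshold neurons driven by an input spike stream, whose states would then feed a trained readout. Sizes and constants are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 64                                       # reservoir size (arbitrary)
W = rng.normal(0.0, 0.4, (N, N)) * (rng.random((N, N)) < 0.1)   # sparse recurrence
W_in = rng.normal(0.0, 1.0, N)               # input projection

def run_reservoir(spikes, leak=0.9, threshold=1.0):
    """Leaky threshold units: the membrane leaks, integrates recurrent plus
    input drive, and emits a binary spike when it crosses the threshold."""
    v = np.zeros(N)
    out = np.zeros(N)
    states = []
    for s in spikes:                          # s is a 0/1 input spike per step
        v = leak * v + W @ out + W_in * s
        out = (v > threshold).astype(float)
        v = np.where(out > 0, 0.0, v)         # reset neurons that fired
        states.append(out.copy())
    return np.array(states)                   # liquid states for a readout layer

states = run_reservoir(spikes=[1, 0, 0, 1, 1, 0, 0, 0])
print(states.shape, states.sum())
```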
70
Alocação dinâmica de tarefas periódicas em NoCs malha com redução do consumo de energia / Energy-aware dynamic allocation of periodic tasks on mesh NoCs. Wronski, Fabio, January 2007
The goal of this work is to propose dynamic allocation techniques for periodic tasks in homogeneous MPSoCs whose processors are interconnected by a mesh network-on-chip, aiming to reduce the system's power consumption. The main focus is the definition of an allocation heuristic; distributed scheduling protocols are not considered, since this is still a first study toward the development of a dynamic allocation tool. In the target architecture, each system node is self-contained, that is, the nodes contain their own EDF scheduler. In addition, voltage-scaling and power-management techniques are applied to reduce power consumption during scheduling. To the best of our knowledge, this is the first research effort considering both temporal constraints and power consumption minimization in the dynamic allocation of tasks on a mesh NoC. Therefore, this work concentrates on evaluating conventional allocation techniques, such as bin-packing and graph-theory-based techniques generally used in distributed systems, in the embedded systems context. The estimation model for power consumption is based on task-graph scheduling and was used to implement the Serpens tool for this purpose.

The task graphs used in the experiments were obtained from the E3S benchmark (Embedded System Synthesis Benchmark Suite), which is composed of a set of task graphs randomly generated with the TGFF tool (Task Graphs For Free) from data on common embedded-system applications obtained from the EEMBC (Embedded Microprocessor Benchmark Consortium). Among the bin-packing heuristics, Best-Fit, First-Fit, and Next-Fit generate allocations with load concentration, while the Worst-Fit heuristic performs load balancing. Load balancing favors the application of voltage scaling, while load concentration favors the use of power management. Since the bin-packing model does not consider inter-task communication and dependency, it was modified to fulfill this need. In the experiments, the initial allocation using the original bin-packing model presented deadline losses of up to 84% for the Worst-Fit heuristic, dropping to losses of around 16% in the final allocation after the modification of the model, while maintaining almost the same power consumption.
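As a small illustration of the load-concentration versus load-balancing behavior the abstract attributes to the bin-packing heuristics, the sketch below assigns task utilizations to processors with a Best-Fit and a Worst-Fit policy; the task set and processor count are invented, and communication between tasks is ignored.

```python
def assign(tasks, num_cpus, policy):
    """Assign task utilizations (0..1) to CPUs.
    'best' picks the most loaded CPU that still fits (load concentration),
    'worst' picks the least loaded CPU (load balancing)."""
    loads = [0.0] * num_cpus
    placement = []
    for u in tasks:
        candidates = [i for i, load in enumerate(loads) if load + u <= 1.0]
        if not candidates:
            raise RuntimeError("task set not schedulable with this heuristic")
        pick = max if policy == "best" else min
        cpu = pick(candidates, key=lambda i: loads[i])
        loads[cpu] += u
        placement.append(cpu)
    return loads, placement

tasks = [0.3, 0.2, 0.4, 0.1, 0.25, 0.15]
print(assign(tasks, 4, "best"))    # concentrates load, favors power management
print(assign(tasks, 4, "worst"))   # balances load, favors voltage scaling
```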