21

Conception d'une plate-forme de prototypage virtuel de réseaux d'interconnexion / Designing a virtual prototyping framework of interconnection networks

Nguyen, Tuan-Anh 17 December 2014
High-Performance Computing (HPC) systems are distributed systems made of hundreds of thousands of processing nodes communicating through large packet-switched interconnection networks with a variety of topologies. The design of these interconnection networks is critical to the overall performance of HPC systems. Due to increasing system complexity, virtual prototyping is becoming necessary at the earliest stages of the design to help analyze and validate micro- and macro-architectural hypotheses and options.
This thesis is dedicated to the development of such a virtual prototyping framework, named CoSIN ("Composition and Simulation of Interconnected Network"), to support the architectural design of HPC systems at Bull S.A.S. The technical challenge of the work lies in modelling and simulating large interconnection networks (from 10^4 to 10^5 nodes) within acceptable times. To this end, the SystemC programming environment was parallelized to provide distributed computing power and memory capacity. Beyond the conceptual aspect, the thesis is also pragmatic: it produces a tool that is already applicable to industrial design projects.
22

DESIGN OF EFFICIENT PACKET MARKING-BASED CONGESTION MANAGEMENT TECHNIQUES FOR CLUSTER INTERCONNECTS

Ferrer Pérez, Joan Lluís 19 December 2012
The growth of parallel computers based on high-performance networks has increased the research community's interest in, and effort toward, developing new techniques that obtain the best performance from these networks, in particular techniques that enable efficient routing and reduce packet latency, thereby increasing network throughput. However, high network utilization can lead to what is known as "network congestion", which can degrade performance. Congestion control in multistage networks is an important problem that has not been completely solved. To avoid network performance degradation when congestion appears, various congestion control mechanisms have been proposed. Many of these mechanisms are based on explicit congestion notification: switches detect congestion and, depending on the strategy applied, mark packets in order to warn the source nodes. In response, the source nodes apply corrective actions to adjust their packet injection rate. The purpose of this thesis is to analyze the different congestion detection and correction strategies in multistage networks and to propose new congestion control mechanisms for this type of lossless network (networks that do not drop packets). The new proposals are based on a more refined packet-marking strategy combined with a set of fair corrective actions, which make the mechanism able to control congestion effectively regardless of the degree of congestion and the traffic conditions. / Ferrer Pérez, JL. (2012). DESIGN OF EFFICIENT PACKET MARKING-BASED CONGESTION MANAGEMENT TECHNIQUES FOR CLUSTER INTERCONNECTS [Unpublished doctoral thesis]. Universitat Politècnica de València.
https://doi.org/10.4995/Thesis/10251/18197 / Palancia
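The detection/correction loop described in this abstract (switches mark packets when they detect congestion; sources react by adjusting their injection rate) can be illustrated with a small Python sketch. The queue-occupancy threshold and the additive-increase/multiplicative-decrease reaction are generic assumptions chosen for illustration; they are not the refined marking strategy or the fair corrective actions proposed in the thesis, and all class names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Packet:
    src: int
    marked: bool = False  # set by a congested switch (explicit congestion notification)


@dataclass
class Switch:
    threshold: int  # hypothetical queue-occupancy threshold that defines "congested"
    queue: deque = field(default_factory=deque)

    def enqueue(self, pkt: Packet) -> None:
        # Detection: mark packets that arrive while the queue is at or above the threshold
        if len(self.queue) >= self.threshold:
            pkt.marked = True
        self.queue.append(pkt)

    def dequeue(self) -> Packet:
        return self.queue.popleft()


@dataclass
class Source:
    rate: float = 1.0       # injection rate as a fraction of link bandwidth
    min_rate: float = 0.05  # floor so a throttled source can still make progress
    decrease: float = 0.5   # hypothetical multiplicative decrease on a marked ack
    increase: float = 0.05  # hypothetical additive recovery on an unmarked ack

    def on_ack(self, marked: bool) -> None:
        # Correction: throttle on congestion notifications, recover slowly otherwise
        if marked:
            self.rate = max(self.min_rate, self.rate * self.decrease)
        else:
            self.rate = min(1.0, self.rate + self.increase)


if __name__ == "__main__":
    sw, src = Switch(threshold=2), Source()
    for _ in range(4):
        sw.enqueue(Packet(src=0))
    while sw.queue:
        pkt = sw.dequeue()
        src.on_ack(pkt.marked)  # the destination echoes the mark back to the source
        print(f"marked={pkt.marked} rate={src.rate:.2f}")
```

Because packets are marked rather than dropped, the network stays lossless, which matches the kind of networks the thesis targets.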
23

Paralelní výpočetní architektury založené na numerické integraci / Parallel Computer Systems Based on Numerical Integrations

Kraus, Michal Unknown Date
This thesis deals with continuous system simulation. Such systems can be described by a system of differential equations or by a block diagram. Differential equations are usually solved by numerical methods integrated into simulation software such as Matlab, Maple or TKSL. The Taylor series method has been used for the numerical solution of the differential equations. The presented method has been shown to be both very accurate and fast, and it can also be processed in parallel systems. The aim of the thesis is to design, implement and compare several versions of the parallel system.
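The Taylor series method mentioned above computes each new Taylor term from the previous one, which keeps every step cheap and lets the order grow until the terms become negligible. The sketch below assumes a linear system y' = Ay, so the recurrence is exact and simple; it is not TKSL's implementation, nonlinear systems would need additional recurrences for products and functions of the unknowns, and the function names are hypothetical.

```python
import math


def matvec(a, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def taylor_step(a, y, h, tol=1e-15, max_order=60):
    """One step of the Taylor series method for the linear system y' = A y.

    Each Taylor term is computed from the previous one,
    DY_k = (h / k) * A * DY_{k-1} with DY_0 = y, and terms are added
    until they become negligible, so the order adapts to the step size.
    """
    term = list(y)
    result = list(y)
    for k in range(1, max_order + 1):
        term = [h / k * t for t in matvec(a, term)]
        result = [r + t for r, t in zip(result, term)]
        if max(abs(t) for t in term) < tol:
            break
    return result


def integrate(a, y0, h, steps):
    """Advance y0 by `steps` Taylor steps of size h."""
    y = list(y0)
    for _ in range(steps):
        y = taylor_step(a, y, h)
    return y


if __name__ == "__main__":
    # Harmonic oscillator y1' = y2, y2' = -y1 with y(0) = (1, 0); exact y1(t) = cos t
    A = [[0.0, 1.0], [-1.0, 0.0]]
    y = integrate(A, [1.0, 0.0], h=0.1, steps=10)
    print("error:", y[0] - math.cos(1.0), y[1] + math.sin(1.0))
```

Within one step, the rows of each matrix-vector product are independent of one another, which suggests why this kind of method lends itself to parallel evaluation.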
24

Performance modelling and evaluation of heterogeneous wired/wireless networks under bursty traffic: analytical models for performance analysis of communication networks in multi-computer systems, multi-cluster systems, and integrated wireless systems

Wu, Yulei January 2010
Computer networks can be classified into two broad categories, wired networks and wireless networks, according to the hardware and software technologies used to interconnect the individual devices. Wired interconnection networks are hardware fabrics supporting communication between individual processors in high-performance computing systems (e.g., multi-computer systems and cluster systems). On the other hand, owing to the rapid development of wireless technologies, wireless networks have emerged and become an indispensable part of people's lives. Integrating different wireless technologies is an effective approach to accommodating users' increasing demand to communicate with each other and access the Internet. This thesis aims to investigate the performance of wired interconnection networks and integrated wireless networks under realistic working conditions. Traffic patterns have a significant impact on network performance. A number of recent measurement studies have convincingly demonstrated that the traffic generated by many real-world applications in communication networks exhibits bursty arrivals and that message destinations are non-uniformly distributed. Analytical models for the performance evaluation of wired interconnection networks and integrated wireless networks have been widely reported. However, most of these models are developed under the simplifying assumption of a non-bursty Poisson process with uniformly distributed message destinations. To fill this gap, this thesis first presents an analytical model to investigate the performance of wired interconnection networks in multi-computer systems. Secondly, analytical models for wired interconnection networks in multi-cluster systems are developed. Finally, this thesis proposes analytical models to evaluate the end-to-end delay and throughput of integrated wireless local area networks and wireless mesh networks.
These models are derived for networks subject to bursty traffic with non-uniformly distributed message destinations, which captures the burstiness of real-world network traffic in both the temporal and spatial domains. Extensive simulation experiments are conducted to validate the accuracy of the analytical models. The models are then used as practical and cost-effective tools to investigate the performance of heterogeneous wired or wireless networks under the traffic patterns exhibited by real-world applications.
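To make "bursty traffic with non-uniformly distributed destinations" concrete, the sketch below generates arrivals from a two-state Markov-modulated Poisson process (MMPP) and destinations from a hot-spot model. Both are common choices in the literature, but the abstract does not specify the exact traffic models used, so treat the models, parameters and function names as assumptions.

```python
import random


def mmpp2_arrivals(rate_high, rate_low, r_hl, r_lh, t_end, rng):
    """Arrival times of a two-state Markov-modulated Poisson process (MMPP-2).

    Arrivals occur at rate_high in the "high" (burst) state and at rate_low in
    the "low" state; the state switches high->low at rate r_hl and low->high at r_lh.
    """
    t, high, arrivals = 0.0, False, []
    while True:
        lam = rate_high if high else rate_low
        switch = r_hl if high else r_lh
        # Two competing exponential clocks: the next arrival and the next state change
        t += rng.expovariate(lam + switch)
        if t >= t_end:
            return arrivals
        if rng.random() < lam / (lam + switch):
            arrivals.append(t)
        else:
            high = not high


def hotspot_destination(src, n_nodes, hot_node, h, rng):
    """Hot-spot destinations: with probability h target hot_node, otherwise
    pick uniformly among all nodes other than the source."""
    if src != hot_node and rng.random() < h:
        return hot_node
    dst = rng.randrange(n_nodes - 1)
    return dst if dst < src else dst + 1  # shift to skip the source itself


if __name__ == "__main__":
    rng = random.Random(42)
    times = mmpp2_arrivals(10.0, 1.0, 0.5, 0.5, t_end=100.0, rng=rng)
    dsts = [hotspot_destination(3, 16, 0, 0.2, rng) for _ in times]
    print(len(times), "arrivals;", dsts.count(0), "sent to the hot spot")
```

Setting rate_high equal to rate_low and h = 0 recovers the Poisson arrivals with uniform destinations that the thesis identifies as too simplistic.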
25

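Impacto del subsistema de comunicación en el rendimiento de los computadores paralelos: desde el hardware hasta las aplicaciones / Impact of the communication subsystem on the performance of parallel computers: from hardware to applications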

Puente Varona, Valentín 20 February 2000
Despite the explosive growth in the computing capacity of conventional computers, driven mainly by the rapid evolution of processors, many problems of considerable importance still cannot be tackled satisfactorily. The most feasible solution for this type of problem is the use of parallel computers. This thesis focuses on the interconnection network of parallel computers, providing effective solutions to improve its performance. Improvements are proposed for the critical elements of the network: the routers and the topology itself. The new proposals derived from the work are:
· An effective routing mechanism with a lower cost. This idea was used by IBM in the IBM BlueGene/L supercomputer.
· Improved internal management of the routers at a bounded cost.
· Buffer architectures for the routers with a favorable cost-performance ratio.
· A new arrangement of the interconnection network that notably improves its topological properties compared with the arrangements commonly used.
26

Conception d'une architecture extensible pour le calcul massivement parallèle / Designing a scalable architecture for massively parallel computing

Kaci, Ania 14 December 2016
In response to the growing demand for performance from a wide variety of applications (e.g., financial modeling, sub-atomic simulation, bioinformatics), computer systems are becoming more complex and larger (in number of computing components, memory and storage capacity). The increasing complexity of these systems is reflected in an evolution of their architecture towards heterogeneous computing technologies and programming models. Harmoniously managing this heterogeneity, optimizing resources and minimizing energy consumption are major technical challenges in the design of future computer systems.
This thesis addresses one area of this complexity by focusing on shared-memory subsystems, in which all processors share a common address space. The work focuses mainly on implementing a cache coherence and memory consistency protocol on a scalable architecture, and on a methodology for validating this implementation. In our approach, we selected 64-bit ARM processors and generic co-processors (GPU, DSP, etc.) as computing components, and the AMBA/ACE and AMBA/ACE-Lite shared-memory protocols together with the associated "CoreLink CCN" architecture as a starting solution. The generalization and parameterization of this architecture, and its validation in the Gem5 simulation environment, form the backbone of this thesis. The results obtained at the end of the thesis tend to show that the stated objectives were achieved.
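As background for the coherence work described above, the sketch below shows the single-writer/multiple-reader invariant that a write-invalidate protocol maintains, using the textbook MSI protocol on a single cache line. It is deliberately simpler than AMBA ACE, which distinguishes more line states (unique or shared, clean or dirty), and the class names are hypothetical.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    SHARED = "S"
    INVALID = "I"


class SnoopingBus:
    """Toy write-invalidate MSI protocol for a single cache line shared by n caches."""

    def __init__(self, n_caches):
        self.states = [State.INVALID] * n_caches
        self.values = [None] * n_caches
        self.memory = 0  # copy of the line held in main memory

    def read(self, cpu):
        if self.states[cpu] is State.INVALID:
            # Read miss (BusRd): a Modified owner writes back and downgrades to Shared
            for other, st in enumerate(self.states):
                if st is State.MODIFIED:
                    self.memory = self.values[other]
                    self.states[other] = State.SHARED
            self.values[cpu] = self.memory
            self.states[cpu] = State.SHARED
        return self.values[cpu]

    def write(self, cpu, value):
        if self.states[cpu] is not State.MODIFIED:
            # Write miss or upgrade (BusRdX): invalidate every other copy,
            # which enforces the single-writer / multiple-reader invariant
            for other, st in enumerate(self.states):
                if other != cpu and st is not State.INVALID:
                    if st is State.MODIFIED:
                        self.memory = self.values[other]  # write back before invalidating
                    self.states[other] = State.INVALID
                    self.values[other] = None
        self.values[cpu] = value
        self.states[cpu] = State.MODIFIED


if __name__ == "__main__":
    bus = SnoopingBus(2)
    bus.write(0, 42)
    print(bus.read(1), [s.value for s in bus.states])  # both caches now hold the line Shared
```

Validating a real implementation amounts to checking invariants like this one (at most one Modified copy, readers see the latest write) over many interleavings, which is the kind of check a simulation environment such as Gem5 makes possible at scale.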
28

Efficient mechanisms to provide fault tolerance in interconnection networks for pc clusters

Montañana Aliaga, José Miguel 21 July 2008
Currently, PC clusters are a cost-effective alternative to parallel computers. In these systems, thousands of components (processors and/or hard disks) are connected through high-performance interconnection networks. Among the network technologies currently available for building clusters, InfiniBand (IBA) has emerged as a new interconnect standard for clusters; in fact, it has been adopted by many of the most powerful systems built today (TOP500 list). As the number of nodes in these systems increases, the interconnection network also grows. Along with the number of components, the probability of failures increases dramatically, so fault tolerance in the system in general, and in the interconnection network in particular, becomes a necessity. Unfortunately, most of the fault-tolerant routing strategies proposed for massively parallel computers cannot be applied, because routing and virtual channel transitions are deterministic in IBA, which prevents packets from avoiding faults. New fault-tolerance strategies are therefore needed. For this reason, this thesis focuses on providing adequate levels of fault tolerance to PC clusters, and to IBA networks in particular. In this thesis we propose and evaluate several mechanisms suitable for cluster interconnection networks. The first mechanism for providing fault tolerance in IBA (which we refer to as transition-based fault-tolerant routing, TFTR) consists of using several disjoint paths between each source-destination pair and selecting the appropriate path at the source node using the APM mechanism provided by IBA. It migrates the paths affected by a fault to fault-free alternative paths.
However, for this purpose, an efficient routing algorithm capable of providing enough / Montañana Aliaga, JM. (2008). Efficient mechanisms to provide fault tolerance in interconnection networks for pc clusters [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/2603 / Palancia
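The first mechanism described above, several disjoint paths per source-destination pair plus a switch to a fault-free alternate path when a fault appears, can be sketched as follows. Path computation here uses a simple greedy heuristic (a shortest path, then a second shortest path avoiding its links), which may miss disjoint pairs that exist in some topologies; it is an illustrative assumption, not the TFTR routing algorithm, and `migrate()` only mimics the effect of InfiniBand's Automatic Path Migration (APM), not its actual interface. All function names are hypothetical.

```python
from collections import deque


def bfs_path(adj, src, dst, banned=frozenset()):
    """Shortest path (list of nodes) from src to dst avoiding the links in `banned`."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for v in adj[u]:
            if v not in prev and frozenset((u, v)) not in banned:
                prev[v] = u
                queue.append(v)
    return None  # dst unreachable


def links(path):
    """Undirected links used by a path, as frozensets {u, v}."""
    return {frozenset(e) for e in zip(path, path[1:])}


def disjoint_pair(adj, src, dst):
    """Primary path plus a link-disjoint alternate path (greedy heuristic)."""
    primary = bfs_path(adj, src, dst)
    if primary is None:
        return None, None
    alternate = bfs_path(adj, src, dst, banned=frozenset(links(primary)))
    return primary, alternate


def migrate(routes, failed_link):
    """Switch every pair whose active path crosses failed_link to its alternate path."""
    for r in routes.values():
        alt = r["alternate"]
        if failed_link in links(r["active"]) and alt and failed_link not in links(alt):
            r["active"] = alt


if __name__ == "__main__":
    ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}  # 4-node ring
    primary, alternate = disjoint_pair(ring, 0, 2)
    routes = {(0, 2): {"active": primary, "alternate": alternate}}
    migrate(routes, frozenset((1, 2)))  # link 1-2 fails
    print(routes[(0, 2)]["active"])
```

Because both paths are computed in advance and the source simply selects the other one, recovery needs no rerouting inside the switches, which is the property that makes this approach compatible with IBA's deterministic routing.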
29

Photonic Interconnection Networks for Exascale Computers

Duro Gómez, José 24 May 2021
In recent years, multiple research projects around the world have focused on the design of supercomputers able to reach the exascale computing barrier, with the aim of supporting the execution of applications important to society in fields such as health, artificial intelligence and meteorology. Given the growth in computational power with each supercomputer generation, this objective is expected to be reached in the coming years. However, achieving it requires addressing several major challenges in the design and development of the system. One of the main ones is to achieve fast and efficient communication between the huge number of computational nodes and the memory systems. Photonic technology provides several advantages over current electrical networks, such as higher link bandwidth, greater network parallelism thanks to DWDM, and better cable management due to its much smaller size. This thesis presents a feasibility study and the development of interconnection networks based on photonic technology for future exascale systems within the European project ExaNeSt. First, a study characterizing the network requirements of exascale applications has been carried out. Its results help understand how exascale applications use the network and thereby guide the design of the system network. This analysis considers three main parameters: the distribution of messages by size and type, the bandwidth consumed throughout the execution, and the spatial communication patterns between nodes.
The study reveals the need for a fast and efficient interconnection network, since most communications consist of bursts of transmissions, each with an average message size of 50 KB. Next, this dissertation concentrates on identifying the main elements that differentiate photonic networks from electrical ones. We identify a sequence of steps in the design and implementation of a simulator that either i) models photonic technology from scratch or ii) extends an existing electrical network simulator to model photonics. After that, two main performance comparison studies between electrical networks and different configurations of photonic networks are presented using classical topologies. In the former study, carried out with both synthetic traffic and ExaNeSt traces on a torus, a fat tree and a dragonfly, we found that photonic technology represents a noticeable improvement over electrical technology. Furthermore, the study shows that the parameter that most affects performance is the bandwidth of the photonic channel. The latter study analyzes the performance of real applications in large-scale simulations on a jellyfish topology. Its results corroborate the conclusions of the former study and also reveal that photonic technology allows reducing the complexity of some topologies and, therefore, the cost of the network. In these studies we observed that the network was underutilized, mainly because the topologies designed for electrical networks do not take advantage of the features provided by photonic technology. For this reason, we propose Segment Switching, a switching strategy aimed at reducing the length of routes by implementing buffers at intermediate nodes along the path. Experimental results show that each of the studied topologies has different buffering requirements. For the torus, the higher the number of buffers in the network, the higher the performance.
For the fat tree, the key parameter is the buffer size, and a configuration with buffers on all switches performs similarly to one with buffers only at the top level. In summary, this thesis studies the use of photonic technology for networks of exascale systems and proposes to take advantage of the characteristics of this technology in current electrical network topologies. / This thesis has been conceived from the work carried out by the Polytechnic University of Valencia in the ExaNeSt European project / Duro Gómez, J. (2021). Photonic Interconnection Networks for Exascale Computers [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/166796 / TESIS
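The abstract's observations that most communications are bursts of roughly 50 KB messages and that photonic channel bandwidth matters most can be related through a back-of-the-envelope serialization-time calculation. With DWDM each wavelength carries an independent channel, so link bandwidth scales with the number of wavelengths. The wavelength count, per-wavelength rate and electrical link rate below are hypothetical values, 1 KB is taken as 1000 bytes, and propagation, electro-optical conversion and contention delays are ignored.

```python
def dwdm_bandwidth_gbps(wavelengths: int, gbps_per_wavelength: float) -> float:
    """Aggregate bandwidth of a DWDM link: each wavelength is an independent channel."""
    return wavelengths * gbps_per_wavelength


def serialization_ns(message_bytes: int, link_gbps: float) -> float:
    """Time to inject a message onto a link; bits divided by Gbit/s yields nanoseconds."""
    return message_bytes * 8 / link_gbps


if __name__ == "__main__":
    msg = 50_000                               # 50 KB, taking 1 KB = 1000 bytes
    photonic = dwdm_bandwidth_gbps(64, 10.0)   # hypothetical: 64 wavelengths x 10 Gbit/s
    electrical = 100.0                         # hypothetical electrical link, Gbit/s
    print(serialization_ns(msg, photonic), serialization_ns(msg, electrical))
```

Under these assumed values, a 64 × 10 Gbit/s DWDM link serializes a 50 KB message in 625 ns, versus 4 µs on a 100 Gbit/s link, so for medium-sized bursty messages the channel bandwidth dominates the transfer time.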
30

NoC Design & Optimization of Multicore Media Processors

Basavaraj, T January 2013
Network on Chips[1][2][3][4] are critical elements of modern System on Chip(SoC) as well as Chip Multiprocessor(CMP)designs. Network on Chips (NoCs) help manage high complexity of designing large chips by decoupling computation from communication. SoCs and CMPs have a multiplicity of communicating entities like programmable processing elements, hardware acceleration engines, memory blocks as well as off-chip interfaces. With power having become a serious design constraint[5], there is a great need for designing NoC which meets the target communication requirements, while minimizing power using all the tricks available at the architecture, microarchitecture and circuit levels of the de-sign. This thesis presents a holistic, QoS based, power optimal design solution of a NoC inside a CMP taking into account link microarchitecture and processor tile configurations. Guaranteeing QoS by NoCs involves guaranteeing bandwidth and throughput for connections and deterministic latencies in communication paths. Label Switching based Network-on-Chip(LS-NoC) uses a centralized LS-NoC Management framework that engineers traffic into QoS guaranteed routes. LS-NoC uses label switching, enables band-width reservation, allows physical link sharing and leverages advantages of both packet and circuit switching techniques. A flow identification algorithm takes into account band-width available in individual links to establish QoS guaranteed routes. LS-NoC caters to the requirements of streaming applications where communication channels are fixed over the lifetime of the application. The proposed NoC framework inherently supports heterogeneous and ad-hoc SoC designs. A multicast, broadcast capable label switched router for the LS-NoC has been de-signed, verified, synthesized, placed and routed and timing analyzed. A 5 port, 256 bit data bus, 4 bit label router occupies 0.431 mm2 in 130nm and delivers peak band-width of80Gbits/s per link at312.5MHz. 
The LS router is estimated to consume 43.08 mW. The bandwidth and latency guarantees of LS-NoC have been demonstrated on streaming applications such as HiperLAN/2 and an Object Recognition Processor, on Constant Bit Rate traffic patterns, and on video decoder traffic representing Variable Bit Rate traffic. LS-NoC was found to have a competitive figure of merit compared with state-of-the-art NoCs that provide QoS. We envision the use of LS-NoC in general-purpose CMPs where applications demand deterministic latencies and hard bandwidth guarantees. Design variables for interconnect exploration include wire width, wire spacing, repeater size and spacing, degree of pipelining, supply and threshold voltages, and activity and coupling factors. The optimal link configuration, in terms of the number of pipeline stages for a given link length and desired operating frequency, is determined. Optimal configurations of all links in the NoC are identified, and a power-performance optimal NoC is presented. We present a latency, power and performance trade-off study of NoCs based on link microarchitecture exploration, together with the design and implementation of a framework for this design space exploration. The trade-off study varies microarchitectural (e.g. pipelining) and circuit-level (e.g. frequency and voltage) parameters. A SystemC-based NoC exploration framework is used to explore the impact of various architectural and microarchitectural parameters of NoC elements on the power and performance of the NoC. The framework lets the designer choose from a variety of architectural options, such as topology and routing policy, and experiment with microarchitectural options for individual links, such as length, wire width, pitch, pipelining, supply voltage and frequency. The framework also supports flexible traffic generation and communication models. Latency, power and throughput results obtained with this framework for a 4x4 CMP are presented.
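A first-order way to choose the number of pipeline stages for a link is to require that each wire segment, plus a register's overhead, fit within one clock period. This assumes a repeated wire whose delay grows linearly with length and a fixed register overhead per stage. The sketch below encodes that rule, n = ceil(d·L / (1/f − t_reg)). The 100 ps/mm wire delay and 50 ps register overhead are hypothetical defaults. The thesis's own model also varies wire width, spacing, repeater sizing and voltage.

```python
import math


def min_pipeline_stages(length_mm, freq_ghz, delay_ps_per_mm=100.0, flop_overhead_ps=50.0):
    """Smallest stage count such that each wire segment plus register overhead fits one cycle.

    Assumes (hypothetically) a repeated wire with delay linear in length and a
    fixed per-stage register overhead; both defaults are illustrative only.
    """
    period_ps = 1000.0 / freq_ghz
    budget_ps = period_ps - flop_overhead_ps  # time left for the wire in each stage
    if budget_ps <= 0:
        raise ValueError("clock period too short for the register overhead")
    wire_delay_ps = delay_ps_per_mm * length_mm
    return max(1, math.ceil(wire_delay_ps / budget_ps))


# Usage: hypothetical link lengths (mm) for two NoC links at 2 GHz
for name, length in {"short": 1.5, "long": 6.0}.items():
    print(name, min_pipeline_stages(length, freq_ghz=2.0))
# short 1
# long 2
```

Applying the same function to every link length in a floorplan gives a per-link stage count. This mirrors, in simplified form, the per-link optimization the abstract describes.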
The framework is used to study NoC designs of a CMP using different classes of parallel computing benchmarks [6]. One key finding is that increasing pipeline depth reduces the average latency of a link up to a point, because it enables link operation at higher frequencies. There exists an optimum degree of pipelining that minimizes the energy-delay product of the link. In a 2D Torus, the least latency (1.56 times the minimum) is achieved when the longest link is pipelined with 4 stages, at which point power (40% of max) and throughput (64% of max) are nominal. Frequency-scaling experiments show power variations of up to 40%, 26.6% and 24% in the 2D Torus, Reduced 2D Torus and Tree-based NoC, respectively, between pipeline configurations that achieve the same frequency at constant voltage. In some cases, switching to a deeper pipelining configuration can actually reduce power, because the links can then be designed with smaller repeaters. We also find that the overall performance of the interconnection networks (ICNs) is determined by the lengths of the links needed to support the communication patterns. Thus the mesh appears to perform best among the three topologies (Mesh, Torus and Folded Torus) considered in the case studies. We also present the effects of communication overheads on the performance, power and energy of a multiprocessor chip, using L1 and L2 cache sizes as the primary exploration parameters together with accurate interconnect, processor, on-chip and off-chip memory models. On-chip and off-chip communication times have a significant impact on the execution time and energy efficiency of CMPs. Larger caches imply larger tile areas, which result in longer inter-tile links and higher link latencies, adversely affecting communication time. Smaller caches potentially incur more misses and more frequent off-tile communication.
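The existence of an optimum pipeline depth can be illustrated with a toy model. Each added stage adds register delay and energy, while the extra timing slack lets each segment use smaller repeaters. The sweep below picks the depth that minimizes the energy-delay product. All coefficients are hypothetical and are not fitted to the thesis's results. A measured or simulated link model could be passed in instead without changing the sweep.

```python
def edp_sweep(max_stages, delay_ps, energy_pj):
    """Return (best_n, {n: energy * delay}) over pipeline depths 1..max_stages."""
    edp = {n: energy_pj(n) * delay_ps(n) for n in range(1, max_stages + 1)}
    best = min(edp, key=edp.get)
    return best, edp


# Toy link model (all coefficients hypothetical):
# delay  = fixed repeated-wire delay + one register overhead per stage
def toy_delay_ps(n):
    return 500.0 + 50.0 * n


# energy = fixed wire energy + per-stage register energy
#          + repeater energy that shrinks as each segment gains timing slack
def toy_energy_pj(n):
    return 1.0 + 0.1 * n + 2.0 / n


best, table = edp_sweep(8, toy_delay_ps, toy_energy_pj)
print(best)  # 3
```

In this toy model, shallow pipelines pay for large repeaters and deep pipelines pay for register overhead, so the energy-delay product has an interior minimum.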
Energy-efficient tile design is a configuration exploration and trade-off study that uses different cache sizes and tile areas to identify a power-performance optimal configuration for the CMP. Trade-offs are explored using a detailed, cycle-accurate multicore simulation framework that includes superscalar processor cores, cache-coherent memory hierarchies, on-chip point-to-point communication networks and a detailed interconnect model that covers pipelining and latency. Sapphire, a detailed multiprocessor execution environment integrating SESC, Ruby and DRAMSim, was used to run applications from the Splash2 benchmark suite (64K-point FFT). Link latencies are estimated for a 16-core CMP simulation on Sapphire. Each tile has a single processor, L1 and L2 caches and a router. Different L1 and L2 sizes lead to different tile clock speeds, miss rates and tile areas, and hence to different interconnect latencies. Simulations across various L1 and L2 sizes indicate that the tile configuration that maximizes energy efficiency is related to minimizing communication time. Experiments also indicate different optimal tile configurations for performance, energy and energy efficiency. A clustered interconnection network, communication-aware cache bank mapping and thread mapping to physical cores are also explored as potential energy-saving solutions. Results indicate that ignoring link latencies can lead to errors of up to 17% in estimates of program completion time. Performance-optimal configurations are achieved with smaller L1 caches and moderate L2 cache sizes, owing to higher operating frequencies, shorter links and comparatively less communication. Using the minimal L1 cache size to operate at the highest frequency may not always be the performance-power optimal choice. Larger L1 caches, despite a drop in frequency, offer an energy advantage because fewer misses generate less communication. Clustered tile placement experiments for FFT show a performance-per-watt improvement of 1.2%.
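Performance, energy and energy efficiency can favour different tile configurations, so the selection step of such a sweep amounts to taking a separate minimum per metric. The sketch below assumes each (L1, L2) configuration has a simulated execution time and energy. It treats energy efficiency as the energy-delay product, which is an assumption because the abstract does not define the metric. The measurements are hypothetical placeholders, not results from the thesis.

```python
from typing import Dict, Tuple

Config = Tuple[int, int]  # (L1 size in kB, L2 size in kB)


def best_configs(results: Dict[Config, Tuple[float, float]]) -> Dict[str, Config]:
    """results maps (L1_kB, L2_kB) -> (execution time in s, energy in J)."""
    return {
        "performance": min(results, key=lambda c: results[c][0]),
        "energy": min(results, key=lambda c: results[c][1]),
        # energy efficiency taken here as the energy-delay product (an assumption)
        "energy_efficiency": min(results, key=lambda c: results[c][0] * results[c][1]),
    }


# Hypothetical placeholder measurements, NOT results from the thesis
sweep = {
    (8, 256): (1.00, 2.00),
    (32, 512): (1.10, 1.60),
    (64, 1024): (1.40, 1.50),
}
print(best_configs(sweep))
# {'performance': (8, 256), 'energy': (64, 1024), 'energy_efficiency': (32, 512)}
```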
Remapping the L2 banks most accessed by a process onto the same core or onto neighbouring cores, after communication traffic analysis, offers power and performance advantages. Remapped processes and banks in clustered tile placement show a performance-per-watt improvement of 5.25% and an energy reduction of 2.53%. This suggests that processors could execute a program in multiple modes, for example a minimum-energy mode or a maximum-performance mode.
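Communication-aware bank mapping can be sketched as a greedy placement. After profiling how often each process accesses each logical L2 bank, the placement visits the heaviest process-bank pairs first. It puts each bank on the free tile closest, in mesh hops, to the core running that process. The function names, the one-bank-per-tile assumption, the row-major 4x4 mesh indexing and the access counts below are hypothetical. The abstract does not specify the thesis's remapping heuristic.

```python
def manhattan(a, b, width=4):
    """Hop distance between tiles a and b on a row-major width x width mesh."""
    ar, ac = divmod(a, width)
    br, bc = divmod(b, width)
    return abs(ar - br) + abs(ac - bc)


def remap_banks(accesses, core_of, num_tiles=16, width=4):
    """Greedy, traffic-aware placement of logical L2 banks on physical tiles.

    accesses: {(process, bank): access count} from a traffic profile
    core_of:  {process: tile index of the core running it}
    Returns {bank: tile}; assumes at most one bank per tile.
    """
    placement = {}
    free = set(range(num_tiles))
    # visit the heaviest (process, bank) pairs first
    for (proc, bank), _count in sorted(accesses.items(), key=lambda kv: -kv[1]):
        if bank in placement:
            continue  # already placed for a heavier user
        home = core_of[proc]
        # nearest free tile; ties broken by lowest tile index
        tile = min(free, key=lambda t: (manhattan(t, home, width), t))
        placement[bank] = tile
        free.remove(tile)
    return placement


def traffic_cost(accesses, core_of, placement, width=4):
    """Sum of access count x hop distance between each process's core and bank."""
    return sum(count * manhattan(core_of[p], placement[b], width)
               for (p, b), count in accesses.items())


# Usage with hypothetical profile data
accesses = {("P0", 15): 100, ("P1", 0): 80, ("P0", 5): 10}
core_of = {"P0": 0, "P1": 15}
identity = {b: b for b in (0, 5, 15)}  # default: bank i stored on tile i
placement = remap_banks(accesses, core_of)
print(traffic_cost(accesses, core_of, identity), traffic_cost(accesses, core_of, placement))
# 1100 10
```

A greedy pass is only one possible heuristic. An optimal placement is a quadratic assignment problem, which is why a cheap approximation driven by traffic analysis is a natural first step.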
