• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 136
  • 37
  • 8
  • 8
  • 8
  • 8
  • 8
  • 8
  • 6
  • 4
  • 4
  • 3
  • 2
  • 1
  • 1
  • Tagged with
  • 239
  • 112
  • 80
  • 71
  • 68
  • 61
  • 46
  • 39
  • 36
  • 35
  • 33
  • 31
  • 28
  • 23
  • 22
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
191

Resource-aware clustering design for NoC-based MPSoCs / Projeto de MPSoCs baseados em NoC utilizando clusterização e gerenciamento de recursos

Silva, Gustavo Girão Barreto da January 2014 (has links)
Atualmente, o paradigma multicore é uma tendência fortemente estabelecida também na área de sistemas embarcados. O grau de paralelismo provido por tal arquitetura tem sido a principal causa de avanços de performance na área além de economia de energia e potência. Entretanto, para obter paralelismo eficiente desta arquitetura não é uma tarefa simples. Assim, desenvolvedores propuseram diversos modelos de ambientes de programação tentando prover o máximo de transparência possível. No nível do hardware, este crescente aumento no número de componentes dentro chip cria um problema de gerenciamento a ser tratado. No contexto deste cenário complexo, esta tese propõe o uso de abordagens de gerenciamento de recursos para aumentar a eficiência, levando em consideração tanto performance quanto consumo de energia, de ambientes MPSoC em diferentes níveis. Além disso, estas abordagens tem em comum a noção de clusterização, a qual tenta agregar recursos logicamente de acordo com as demandas da aplicação. Primeiramente no nível do processador/aplicação, é proposto um hardware dinamicamente adaptável para suportar modelos de programação paralelos distintos sem nenhum sobrecusto computacional uma vez que todo o processo é completamente transparente para o programador. Ainda neste ambiente, onde aplicações distintas podem ser executadas, é proposto um mecanismo de escalonamento visando gerenciamento de recursos para aumentar a performance chamado Processor Clustering. São propostas quatro diferentes políticas de mapeamento de recursos que tiram vantagem de aspectos distintos da natureza paralela das aplicações e das restrições arquiteturais do sistema. Entretanto, algumas aplicações tem demandas de memória mais altas do que demandas computacionais. Logo, uma abordagem similar pode ser utilizada no nível da hierarquia de memória. Neste caso, o objetivo é redistribuir recursos de memória de acordo com as demandas da aplicação. Redistribuição de memória é explorada tanto em tempo de projeto quanto em tempo de execução. Um mecanismo de mapeamento de distribuição é proposto baseado na quantidade de requisições de acesso à memória externa. Finalmente, é proposto um mecanismo de tolerância à falhas baseado em gerenciamento de recursos para memórias distribuídas dentro do chip em NoCs. É introduzido um modelo de Reliability Clustering que tira proveito da infraestrutura da NoC. Neste caso, os roteadores tem conhecimento dos blocos com falhas e blocos redundantes. Baseado neste conhecimento, o mecanismo é capaz evitar altas latências de acesso à memória. / The multicore paradigm is a solid trend nowadays, also in the field of embedded systems. The degree of parallelism provided by such architecture has been the foundation of performance advancements in the field as well as for power and energy savings. However, to obtain efficient parallelism of such architecture is not an easy task. Therefore, developers come up with several proposals of programming environments trying to provide as much transparency as possible. On the hardware side, this increasing number of on-chip components creates a management issue to be handled. In the context of this complex scenario this thesis proposes the use of resource management approaches to improve the efficiency, regarding both performance and energy consumption, of MPSoC environments at different levels. Also, these approaches have in common the notion of clustering, which tries to logically aggregate resources according to application demands. First, at the processor/application level, we propose a dynamically adaptable hardware to support distinct parallel programming models at no computational overhead, since the entire process is completely transparent to the programmer. Also, in this environment, where distinct applications can be executed, we propose a resource-aware scheduling mechanism to improve performance named Processor Clustering. We propose four different resource mapping policies that leverage on distinct aspects of the parallel nature of the applications and on architecture constraints. However, some applications have higher memory demands than computational demands. Therefore, a similar approach can be used at the memory level. In this case, we aim at redistributing memory resources according to application demands. We explore memory redistribution at both design time and runtime and propose a distribution mapping mechanism based on the amount of off-chip memory requests. Finally, we propose a resource-aware fault-tolerance mechanism for distributed on-chip memories in NoCs. We introduce a Reliability Clustering model that leverages on the NoC infrastructure. In this case, the routers have knowledge of faulty blocks and redundancy blocks and, based on that, they are able to avoid higher memory access latency.
192

Torus routing in the presence of multicasts

Ishibashi, Hiroki 01 January 1996 (has links)
No description available.
193

Compiling for a multithreaded dataflow architecture : algorithms, tools, and experience / Compilation pour une architecture multi-thread à flot de données : algorithmes, outils et retour d'expérience

Li, Feng 20 May 2014 (has links)
Quelque-soit le multiprocesseur et son architecture, la facilité de leur programmation demeure une difficulté majeure. Une croyance bien installée est que l’exploitation correcte et efficace du parallélisme dans une application est une question pour les concepteurs d’outils de développement logiciel. Selon cette vision, nous avons besoin de techniques de compilation plus sophistiqués pour partitionner une application en threads simultanés. Mais de nombreux experts revendiquent que l'architecture joue un rôle tout aussi important: il faut opérer un changement fondamental dans l'architecture de processeurs avant que l’on puisse espérer des progrès importants au niveau de leur programmabilité. Notre approche favorise la convergence de ces points de vue. La convergence entre le calcul parallèle “en flot de données” avec l'architecture de von Neumann est porteuse de nombreuses promesses. En particulier en termes de tolérance à la latence, en termes d’exploitation d'un haut degré de parallélisme, le tout pour un très faible coût de changement de contexte entre threads. Les architectures à flot de données multithread exigent un haut degré de parallélisme pour tolérer la latence. D'autre part, le partitionnement d’un programme en un grand nombre de threads à grain fin est une source d'erreurs commune pour les développeurs. Pour reconcilier ces faits, nous nous efforçons de faire progresser l'état de l'art dans le partitionnement automatique de threads, conjointement avec le support du langage de programmation pour l’exploitation de parallélisme à plus gros grain, tout en préservant un concurrence déterministe. Cette thèse présente un algorithme général de partitionnement de threads, pour transformer du code séquentiel en un programme exprimant du parallélisme en flot de données. Notre algorithme fonctionne sur le Program Dependence Graph (PDG) et la forme en assignation unique statique (Static Single Assignment, SSA), pour extraire du parallélisme de tâche, pipeline, et de données, en présence de flot de contrôle arbitraire. Nous avons conçu une nouvelle représentation intermédiaire pour faciliter la génération de code, et son exécution parallèle en flot de données. Nous avons également mis en œuvre ces algorithmes dans un prototype fondé sur GCC, et contribué au développement d’une plateforme de simulation permettant d’explorer la parallélisation en flot de données à grande échelle. Ces extensions et l'architecture simulée permettent l'exploration de modèles innovants de mémoire pour le parallélisme en flot de données. Ces outils et modèles ont également été évalués sur des applications réalistes. / Across the wide range of multiprocessor architectures, all seem to share one common problem: they are hard to program. It is a general belief that parallelism is a software problem, and that perhaps we need more sophisticated compilation techniques to partition the application into concurrent threads. Many experts also make the point that the underlining architecture plays an equally important architecture before one may expect significant progress in the programmability of multiprocessors. Our approach favors a convergence of these viewpoints. The convergence of dataflow and von Neumann architecture promises latency tolerance, the exploitation of a high degree of parallelism, and light thread switching cost. Multithreaded dataflow architectures require a high degree of parallelism to tolerate latency. On the other hand, it is error-prone for programmers to partition the program into large number of fine grain threads. To reconcile these facts, we aim to advance the state of the art in automatic thread partitioning, in combination with programming language support for coarse-grain, functionally deterministic concurrency. This thesis presents a general thread partitioning algorithm for transforming sequential code into a parallel data-flow program targeting a multithreaded dataflow architecture. Our algorithm operates on the program dependence graph and on the static single assignment form, extracting task, pipeline, and data parallelism from arbitrary control flow, and coarsening its granularity using a generalized form of typed fusion. We design a new intermediate representation to ease code generation for an explicit token match dataflow execution model. We also implement a GCC-based prototype. We also evaluate coarse-grain dataflow extensions of OpenMP in the context of a large-scale 1024-core, simulated multithreaded dataflow architecture. These extension and simulated architecture allow the exploration of innovative memory models for dataflow computing. We evaluate these tools and models on realistic applications.
194

Simulation and emulation of massively parallel processor for solving constraint satisfaction problems based on oracles

Chaudhari, Gunavant Dinkar 01 January 2011 (has links)
Most part of my thesis is devoted to efficient automated logic synthesis of oracle processors. These Oracle Processors are of interest to several modern technologies, including Scheduling and Allocation, Image Processing and Robot Vision, Computer Aided Design, Games and Puzzles, and Cellular Automata, but so far the most important practical application is to build logic circuits to solve various practical Constraint Satisfaction Problems in Intelligent Robotics. For instance, robot path planning can be reduced to Satisfiability. In short, an oracle is a circuit that has some proposition of solution on the inputs and answers yes/no to this proposition. In other language, it is a predicate or a concept-checking machine. Oracles have many applications in AI and theoretical computer science but so far they were not used much in hardware architectures. Systematic logic synthesis methodologies for oracle circuits were so far not a subject of a special research. It is not known how big advantages these processors will bring when compared to parallel processing with CUDA/GPU processors, or standard PC processing. My interest in this thesis is only in architectural and logic synthesis aspects and not in physical (technological) design aspects of these circuits. In future, these circuits will be realized using reversible, nano and some new technologies, but the interest in this thesis is not in the future realization technologies. We want just to answer the following question: Is there any speed advantage of the new oracle-based architectures, when compared with standard serial or parallel processors?
195

REAL-TIME SCHEDULING ALGORITHMS FOR PRECEDENCE RELATED TASKS ON HETEROGENEOUS MULTIPROCESSORS

AULUCK, NITIN 23 May 2005 (has links)
No description available.
196

Capacity Metric for Chip Heterogeneous Multiprocessors

Otoom, Mwaffaq Naif 05 March 2012 (has links)
The primary contribution of this thesis is the development of a new performance metric, Capacity, which evaluates the performance of Chip Heterogeneous Multiprocessors (CHMs) that process multiple heterogeneous channels. Performance metrics are required in order to evaluate any system, including computer systems. A lack of appropriate metrics can lead to ambiguous or incorrect results, something discovered while developing the secondary contribution of this thesis, that of workload modes for CHMs — or Workload Specific Processors (WSPs). For many decades, computer architects and designers have focused on techniques that reduce latency and increase throughput. The change in modern computer systems built around CHMs that process multi-channel communications in the service of single users calls this focus into question. Modern computer systems are expected to integrate tens to hundreds of processor cores onto single chips, often used in the service of single users, potentially as a way to access the Internet. Here, the design goal is to integrate as much functionality as possible during a given time window. Without the ability to correctly identify optimal designs, not only will the best performing designs not be found, but resources will be wasted and there will be a lack of insight to what leads to better performing designs. To address performance evaluation challenges of the next generation of computer systems, such as multicore computers inside of cell phones, we found that a structurally different metric is needed and proceeded to develop such a metric. In contrast to single-valued metrics, Capacity is a surface with dimensionality related to the number of input streams, or channels, processed by the CHM. We develop some fundamental Capacity curves in two dimensions and show how Capacity shapes reveal interaction of not only programs and data, but the interaction of multiple data streams as they compete for access to resources on a CHM as well. For the analysis of Capacity surface shapes, we propose the development of a demand characterization method in which its output is in the form of a surface. By overlaying demand surfaces over Capacity surfaces, we are able to identify when a system meets its demands and by how much. Using the Capacity metric, computer performance optimization is evaluated against workloads in the service of individual users instead of individual applications, aggregate applications, or parallel applications. Because throughput was originally derived by drawing analogies between processor design and pipelines in the automobile industry, we introduce our Capacity metric for CHMs by drawing an analogy to automobile production, signifying that Capacity is the successor to throughput. By developing our Capacity metric, we illustrate how and why different processor organizations cannot be understood as being better performers without both magnitude and shape analysis in contrast to other metrics, such as throughput, that consider only magnitude. In this work, we make the following major contributions: • Definition and development of the Capacity metric as a surface with dimensionality related to the number of input streams, or channels, processed by the CHM. • Techniques for analysis of the Capacity metric. Since the Capacity metric was developed out of necessity, while pursuing the development of WSPs, this work also makes the following minor contributions: • Definition and development of three foundations in order to establish an experimental foundation — a CHM model, a multimedia cell phone example, and a Workload Specific Processor (WSP). • Definition of Workload Modes, which was the original objective of this thesis. • Definition and comparison of two approaches to workload mode identification at run time; The Workload Classification Model (WCM) and another model that is based on Hidden Markov Models (HMMs). • Development of a foundation for analysis of the Capacity metric, so that the impact of architectural features in a CHM may be better understood. In order to do this, we develop a Demand Characterization Method (DCM) that characterizes the demand of a specific usage pattern in the form of a curve (or a surface in general). By doing this, we will be able to overlay demand curves over Capacity curves of different architectures to compare their performance and thus identify optimal performing designs. / Ph. D.
197

Utility Accrual Real-Time Scheduling and Synchronization on Single and Multiprocessors: Models, Algorithms, and Tradeoffs

Cho, Hyeonjoong 26 September 2006 (has links)
This dissertation presents a class of utility accrual scheduling and synchronization algorithms for dynamic, single and multiprocessor real-time systems. Dynamic real-time systems operate in environments with run-time uncertainties including those on activity execution times and arrival behaviors. We consider the time/utility function (or TUF) timing model for specifying application time constraints, and the utility accrual (or UA) timeliness optimality criteria of satisfying lower bounds on accrued activity utility, and maximizing the total accrued utility. Efficient TUF/UA scheduling algorithms exist for single processors---e.g., the Resource-constrained Utility Accrual scheduling algorithm (RUA), and the Dependent Activity Scheduling Algorithm (DASA). However, they all use lock-based synchronization. To overcome shortcomings of lock-based (e.g., serialized object access, increased run-time overhead, deadlocks), we consider non-blocking synchronization including wait-free and lock-free synchronization. We present a buffer-optimal, scheduler-independent wait-free synchronization protocol (the first such), and develop wait-free versions of RUA and DASA. We also develop their lock-free versions, and upper bound their retries under the unimodal arbitrary arrival model. The tradeoff between wait-free, lock-free, and lock-based is fundamentally about their space and time costs. Wait-free sacrifices space efficiency in return for no additional time cost, as opposed to the blocking time of lock-based and the retry time of lock-free. We show that wait-free RUA/DASA outperform lock-based RUA/DASA when the object access times of both approaches are the same, e.g., when the shared data size is so large that the data copying process dominates the object access time of two approaches. We derive lower bounds on the maximum accrued utility that is possible with wait-free over lock-based. Further, we show that when maximum sojourn times under lock-free RUA/DASA is shorter than under lock-based, it is a necessary condition that the object access time of lock-free is shorter than that of lock-based. We also establish the maximum increase in activity utility that is possible under lock-free and lock-based. Multiprocessor TUF/UA scheduling has not been studied in the past. For step TUFs, periodic arrivals, and under-loads, we first present a non-quantum-based, optimal scheduling algorithm called Largest Local Remaining Execution time-tasks First (or LLREF) that yields the optimum total utility. We then develop another algorithm for non-step TUFs, arbitrary arrivals, and overloads, called the global Multiprocessor Utility Accrual scheduling algorithm (or gMUA). We show that gMUA lower bounds each activity's accrued utility, as well as the system-wide, total accrued utility. We consider lock-based, lock-free, and wait-free synchronization under LLREF and gMUA. We derive LLREF's and gMUA's minimum-required space cost for wait-free synchronization using our space-optimal wait-free algorithm, which also applies for multiprocessors. We also develop lock-free versions of LLREF and gMUA with bounded retries. While the tradeoff between wait-free LLREF/gMUA versus lock-based LLREF/gMUA is similar to that for the single processor case, that between lock-free LLREF/gMUA and lock-based LLREF/gMUA hinges on the cost of the lock-free retry, blocking time under lock-based, and the operating system overhead. / Ph. D.
198

High Performance Applications for the Single-Chip Message-Passing Parallel Computer

Dickenson, William Wesley 05 May 2004 (has links)
Computer architects continue to push the limits of modern microprocessors. By using techniques such as out-of-order execution, branch prediction, and dynamic scheduling, designers have found ways to speed execution. However, growing architectural complexity has led to unsustained development and testing times. Shrinking feature sizes are causing increased wire resistances and signal propagation, thereby limiting a design's scalability. Indeed, the method of exploiting instruction-level parallelism (ILP) within applications is reaching a point of diminishing returns. One approach to the aforementioned challenges is the Single-Chip Message-Passing (SCMP) Parallel Computer, developed at Virginia Tech. SCMP is a unique, tiled architecture aimed at thread-level parallelism (TLP). Identical cores are replicated across the chip, and global wire traces have been eliminated. The nodes are connected via a 2-D grid network and each contains a local memory bank. This thesis presents the design and analysis of three high-performance applications for SCMP. The results show that the architecture proves itself as a formidable opponent to several current systems. / Master of Science
199

Smart Memory and Network-On-Chip Design for High-Performance Shared-Memory Chip Multiprocessors

Lodde, Mario 04 February 2014 (has links)
La jerarquía de caches y la red en el chip (NoC) son dos componentes clave de los chip multiprocesadores (CMPs). La mayoría del trafico en la NoC se debe a mensajes que las caches envían según lo que establece el protocolo de coherencia. La cantidad de trafico, el porcentaje de mensajes cortos y largos y el patrón de trafico en general varían dependiendo de la geometría de las caches y del protocolo de coherencia. La arquitectura de la NoC y la jerarquía de caches están de hecho firmemente acopladas, y estos dos componentes deben ser diseñados y evaluados conjuntamente para estudiar como el variar uno afecta a las prestaciones del otro. Además, cada componente debe ajustarse a los requisitos y a las oportunidades del otro, y al revés. Normalmente diferentes clases de mensajes se envían por diferentes redes virtuales o por NoCs con diferente ancho de banda, separando mensajes largos y cortos. Sin embargo, otra clasificación de los mensajes se puede hacer dependiendo del tipo de información que proveen: algunos mensajes, como las peticiones de datos, necesitan campos para almacenar información (dirección del bloque, tipo de petición, etc.); otros, como los mensajes de reconocimiento (ACK), no proporcionan ninguna información excepto por el ID del nodo destino: solo proveen una información de tipo temporal, en el sentido que la recepción de un ACK indica que el nodo fuente ha recibido el mensaje al que está contestando con el ACK y completado todas las operaciones determinadas por el protocolo de coherencia. Esta segunda clase de mensaje no necesita de mucho ancho de banda: la latencia es mucho mas importante, dado que el nodo destino esta típicamente bloqueado esperando la recepción de ellos. En este trabajo de tesis se desarrolla una red dedicada para trasmitir la segunda clase de mensajes; la red es muy sencilla y rápida, y permite la entrega de los ACKs con una latencia de pocos ciclos de reloj. Reduciendo la latencia y el trafico en la NoC debido a los ACKs, es posible: -acelerar la fase de invalidación en fase de escritura en un sistema que usa un protocolo de coherencia basado en directorios -mejorar las prestaciones de un protocolo de coerencia basado en broadcast, hasta llegar a prestaciones comparables con las de un protocolo de directorios pero sin el coste de área debido a la necesidad de almacenar el directorio -implementar un mapeado dinámico de bloques a las caches de ultimo nivel de forma eficiente, con el objetivo de acercar cuanto al máximo los bloques a los cores que los utilizan El objetivo final es obtener un co-diseño de NoC y jerarquía de caches que minimice los problemas de escalabilidad de los protocolos de coherencia. Como gran objetivo final, se pretende la implementación de un CMP con ubicación dinámica de los recursos de cache y red, tal que estos recursos se puedan particionar de forma eficiente e independiente para asignar diferentes particiones a diferentes aplicaciones en un entorno virtualizado. / Lodde, M. (2014). Smart Memory and Network-On-Chip Design for High-Performance Shared-Memory Chip Multiprocessors [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/35325
200

Head-of-Line Blocking Reduction in Power-Efficient Networks-on-Chip

Escamilla López, José Vicente 03 November 2017 (has links)
Tesis por compendio / Nowadays, thanks to the continuous improvements in the integration scale, more and more cores are added on the same chip, leading to higher system performance. In order to interconnect all nodes, a network-on-chip (NoC) is used, which is in charge of delivering data between cores. However, increasing the number of cores leads to a significant power consumption increase, leading the NoC to be one of the most expensive components in terms of power. Because of this, during the last years, several mechanisms have been proposed to address the NoC power consumption by means of DVFS (Dynamic Voltage and Frequency Scaling) and power-gating strategies. Nevertheless, improvements achieved by these mechanisms are achieved, to a greater or lesser extent, at the cost of system performance, potentially increasing the risk of saturating the network by forming congested points which, in turn, compromise the rest of the system functionality. One side effect is the creation of the "Head-of-Line blocking" effect where congested packets at the head of queues prevent other non-blocked packets from advancing. To address this issue, in this thesis, on one hand, we propose novel congestion control techniques in order to improve system performance by removing the "Head-of-Line" blocking effect. On the other hand, we propose combined solutions adapted to DVFS in order to achieve improvements in terms of performance and power. In addition to this, we propose a path-aware power-gating-based mechanism, which is capable of detecting the flows sharing buffer resources along data paths and perform to switch them off when not needed. With all these combined solutions we can significantly reduce the power consumption of the NoC when compared with state-of-the-art proposals. / Hoy en día, gracias a las mejoras en la escala de integración cada vez se integran más y más núcleos en un mismo chip, mejorando así sus prestaciones. Para interconectar todos los nodos dentro del chip se emplea una red en chip (NoC, Network-on-Chip), la cual es la encargada de intercambiar información entre núcleos. No obstante, aumentar el número de núcleos en el chip también conlleva a su vez un importante incremento en el consumo de la NoC, haciendo que ésta se convierta en una de las partes más caras del chip en términos de consumo. Por ello, en los últimos años se han propuesto diversas técnicas de ahorro de energía orientadas a reducir el consumo de la NoC mediante el uso de DVFS (Dynamic Voltage and Frequency Scaling) o estrategias basadas en "power-gating". Sin embargo, éstas mejoras de consumo normalmente se obtienen a costa de sacrificar, en mayor o menor medida, las prestaciones del sistema, aumentado potencialmente así el riesgo de saturar la red, generando puntos de congestión que, a su vez, comprometen el rendimiento del resto del sistema. Un efecto colateral es el "Head-of-Line blocking", mediante el que paquetes congestionados en la cabeza de la cola impiden que otros paquetes no congestionados avancen. Con el fin de solucionar este problema, en ésta tesis, en primer lugar, proponemos técnicas novedosas de control de congestión para incrementar el rendimiento del sistema mediante la eliminación del "Head-of-Line blocking", mientras que, por otra parte, proponemos soluciones combinadas adaptadas a DVFS con el fin de conseguir mejoras en términos de rendimiento y energía. Además, proponemos una técnica de "power-gating" orientada a rutas de datos, la cual es capaz de detectar flujos de datos compartiendo recursos a lo largo de rutas y apagar dichos recursos de forma dinámica cuando no son necesarios. Con todas éstas soluciones combinadas podemos reducir el consumo de energía de la NoC en comparación con otras técnicas presentes en el estado del arte. / Hui en dia, gr\`acies a les millores en l'escala d'integraci\'o, cada vegada s'integren m\'es i m\'es nuclis en un mateix xip, la qual cosa millora les seues prestacions. Per tal d'interconectar tots els nodes dins el xip es fa \'us d'una Xarxa en Xip (NoC; Network-on-Chip), la qual \'es l'encarregada d'intercanviar informaci\'o entre els nuclis. No obstant aix\`o, incrementar el nombre de nuclis en el xip tamb\'e comporta un important augment en el consum de la NoC, la qual cosa fa que aquesta es convertisca en una de les parts m\'es costoses del xip en termes de consum. Per aix\`o, en els \'ultims anys s'han proposat diverses t\`ecniques d'estalvi d'energia orientades a reduir el consum de la NoC mitjançant l'\'us de DVFS (Dynamic Voltage and Frequency Scaling) o estrat\`egies basades en ``power-gating''. Malgrat aix\`o, aquestes millores en les prestacions normalment s'obtenen a costa de sacrificar, en major o menor mesura, les prestacions del sistema i augmenta aix\'i el risc de saturar la xarxa al generar-se punts de congesti\'o, que al mateix temps, comprometen el rendiment de la resta del sistema. Un efecte col-lateral \'es el ``Head-of- Line blocking'', mitjançant el qual, els paquets congestionats al cap de la cua, impedixen que altres paquets no congestionats avancen. A fi de solucionar eixe problema, en aquesta tesi, en primer lloc, proposem noves t\`ecniques de control de congesti\'o amb l'objectiu d'incrementar el rendiment del sistema per mitj\`a de l'eliminaci\'o del ``Head-of- Line blocking'', i d'altra banda, proposem solucions combinades adaptades a DVFS amb la finalitat d'aconseguir millores en termes de rendiment i energia. A m\'es, proposem una t\`ecnica de ``power-gating'' orientada a rutes de dades, la qual \'es capa\c c de detectar fluxos de dades al compartir recursos al llarg de les rutes i apagar eixos recursos de forma din\`amica quan no s\'on necessaris. Amb totes aquestes solucions combinades podem reduir el consum d'energia de la NoC en comparaci\'o amb altres t\`ecniques presents en l'estat de l'art. / Escamilla López, JV. (2017). Head-of-Line Blocking Reduction in Power-Efficient Networks-on-Chip [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/90419 / Compendio

Page generated in 0.0693 seconds