1

Dodec: A Random-link Approach for Low-radix On-chip Networks

Yang, Haofan 11 December 2013 (has links)
Network topologies play a vital role in chip design; they largely determine the cost of the network and significantly impact performance in many-core architectures. We propose a novel set of on-chip networks, dodecs, and illustrate how they reduce network diameter with randomized low-radix router connections. In addition, we design an adaptive routing algorithm for dodec networks to achieve high throughput. By introducing randomness, dodec networks exhibit more uniform message latency. By using low-radix routers, dodec networks simplify the router microarchitecture and reduce area by 20% and power by 22% compared to mesh routers while delivering the same overall application performance for PARSEC. We compare our dodec network to alternative low-radix network topologies and show that, at the same cost, dodec networks increase throughput by up to 50% while reducing average latency by 10% compared to a mesh.
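The abstract's claim that randomized low-radix links can reduce network diameter can be explored with a small experiment. The sketch below is not the dodec construction, which the abstract does not specify; it assumes a hypothetical stand-in topology (a ring plus one random extra link per router, so at most three network links each) and compares its diameter with that of an 8x8 mesh. The function names and the seed are illustrative only.

```python
import random
from collections import deque


def bfs_eccentricity(adj, src):
    """Return the largest hop count from src to any other node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    if len(dist) != len(adj):
        raise ValueError("graph is not connected")
    return max(dist.values())


def diameter(adj):
    """Diameter = maximum eccentricity over all nodes."""
    return max(bfs_eccentricity(adj, s) for s in adj)


def mesh(rows, cols):
    """2D mesh: each router links to its right and lower neighbours."""
    adj = {r * cols + c: set() for r in range(rows) for c in range(cols)}
    for r in range(rows):
        for c in range(cols):
            u = r * cols + c
            if c + 1 < cols:
                adj[u].add(u + 1)
                adj[u + 1].add(u)
            if r + 1 < rows:
                adj[u].add(u + cols)
                adj[u + cols].add(u)
    return adj


def ring_with_random_links(n, seed):
    """Hypothetical stand-in topology (NOT the thesis's dodec network):
    a ring plus a random pairing of routers, so each router has at most
    three network links. n must be even."""
    rng = random.Random(seed)
    adj = {u: {(u - 1) % n, (u + 1) % n} for u in range(n)}
    nodes = list(range(n))
    rng.shuffle(nodes)
    for a, b in zip(nodes[0::2], nodes[1::2]):
        adj[a].add(b)
        adj[b].add(a)
    return adj


if __name__ == "__main__":
    print("8x8 mesh diameter:", diameter(mesh(8, 8)))
    print("64-node ring + random links diameter:",
          diameter(ring_with_random_links(64, seed=1)))
```

Running the script prints both diameters; the random-link value depends on the seed, so trying several seeds gives a feel for how much the shortcuts help at a lower router radix than the mesh.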
3

Shared resource management for efficient heterogeneous computing

Lee, Jaekyu 13 January 2014 (has links)
The demand for heterogeneous computing, driven by its performance and energy efficiency, has made on-chip heterogeneous chip multiprocessors (HCMPs) the mainstream computing platform across a wide spectrum, from smartphone application processors to desktop and low-end server processors. The performance of on-chip GPUs is not yet comparable to that of discrete GPU cards, but vendors have integrated increasingly powerful GPUs, and this trend will continue in upcoming processors. In this architecture, several system resources are shared between CPUs and GPUs. Sharing enables easier and cheaper data transfer between CPUs and GPUs, but it also causes resource contention between cores. The resource sharing problem has existed since the homogeneous (CPU-only) chip multiprocessor (CMP) was introduced. However, resource sharing in HCMPs behaves differently because CPU and GPU cores differ in nature. To address the resource sharing problem in HCMPs, we consider efficient shared resource management schemes, focusing on the shared last-level cache and the interconnection network. In this thesis, we propose four resource sharing mechanisms. First, we propose an efficient cache sharing mechanism that exploits the different characteristics of CPU and GPU cores to share cache space effectively between them. Second, we propose adaptive virtual channel partitioning for the on-chip interconnection network to isolate inter-application interference. By partitioning virtual channels between CPUs and GPUs, we prevent interference while guaranteeing quality of service (QoS) for both types of cores. Third, we propose a dynamic frequency control mechanism to share system resources efficiently. When both types of cores are active, the degree of resource contention and the system throughput are affected by the operating frequencies of the CPUs and GPUs. The proposed mechanism tries to find optimal operating frequencies for both, reducing resource contention while improving system throughput. Finally, we propose a second cache sharing mechanism that exploits GPU-semantic information. The programming and execution models of GPUs are stricter and simpler than those of CPUs, and programmers are asked to provide more information to the hardware. By exploiting these characteristics, GPUs can use the cache energy-efficiently, and simpler yet more efficient cache partitioning can be enabled for HCMPs.
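To make the idea of partitioning virtual channels between CPUs and GPUs concrete, here is a minimal sketch. It is not the thesis's mechanism, whose adaptation policy the abstract does not describe; it assumes a hypothetical policy that splits a port's virtual channels in proportion to recently observed traffic demand while reserving a minimum for each class so that neither is starved. The function name `partition_vcs` and its parameters are illustrative.

```python
def partition_vcs(total_vcs, cpu_demand, gpu_demand, min_per_class=1):
    """Split total_vcs virtual channels between CPU and GPU traffic.

    Hypothetical policy: proportional to observed demand (e.g. packets
    injected in the last epoch), with at least min_per_class channels
    reserved for each class.
    Returns (cpu_vcs, gpu_vcs).
    """
    if total_vcs < 2 * min_per_class:
        raise ValueError("not enough virtual channels for both classes")
    if cpu_demand < 0 or gpu_demand < 0:
        raise ValueError("demand must be non-negative")

    total_demand = cpu_demand + gpu_demand
    if total_demand == 0:
        # No traffic observed: fall back to an even split.
        cpu_vcs = total_vcs // 2
    else:
        cpu_vcs = round(total_vcs * cpu_demand / total_demand)

    # Clamp so each class keeps its reserved minimum.
    cpu_vcs = max(min_per_class, min(total_vcs - min_per_class, cpu_vcs))
    return cpu_vcs, total_vcs - cpu_vcs


if __name__ == "__main__":
    # Example: 4 channels, GPU traffic three times heavier than CPU traffic.
    print(partition_vcs(4, cpu_demand=10, gpu_demand=30))
```

Re-evaluating the split every epoch would make the partition adaptive; the reserved minimum is one simple way to keep a floor on service for the lighter traffic class.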
4

Réconcilier performance et prédictibilité sur un many-coeur en utilisant des techniques d'ordonnancement hors-ligne / Reconciling performance and predictability on a NoC-based MPSoC using off-line scheduling techniques

Fakhfakh, Manel 27 June 2014 (has links)
On-chip networks (NoCs) used in multiprocessor systems-on-chips (MPSoCs) pose significant challenges to both on-line (dynamic) and off-line (static) real-time scheduling approaches. They have large numbers of potential contention points and limited internal buffering capabilities, and network control operates at the scale of small data packets. Therefore, efficient resource allocation requires scalable algorithms working on hardware models with a level of detail that is unprecedented in real-time scheduling. We consider in this thesis a static scheduling approach, and we target massively parallel processor arrays (MPPAs), which are MPSoCs with large numbers (hundreds) of processing cores. We first identify and compare the hardware mechanisms supporting precise timing analysis and efficient resource allocation in existing MPPA platforms. We determine that the NoC should ideally provide the means of enforcing a global communications schedule that is computed off-line (before execution) and synchronized with the scheduling of computations on processors. On the software side, we propose a novel allocation and scheduling method capable of synthesizing such global computation and communication schedules covering all the execution, communication, and memory resources in an MPPA. To allow an efficient use of the hardware resources, our method takes into account the specificities of MPPA hardware and implements advanced scheduling techniques such as pre-computed preemption of data transmissions. We evaluate our technique by mapping two signal processing applications, for which we obtain good latency, throughput, and resource use figures.
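The notion of a communication schedule computed off-line can be illustrated with a small reservation-table sketch. This is not the thesis's method: it assumes a simplified, hypothetical model in which each transfer holds every link on its route for its whole duration, transfers are placed greedily in the order given, and there is no preemption. The name `schedule_transfers` and the tuple layout are illustrative.

```python
def schedule_transfers(transfers):
    """Greedy off-line link reservation (simplified, hypothetical model).

    transfers: list of (name, release_time, duration, route), where
    route is a list of link identifiers held for the whole transfer.
    Returns a dict mapping each transfer name to its start time.
    """
    busy = {}    # link -> list of reserved (start, end) intervals
    starts = {}
    for name, release, duration, route in transfers:
        t = release
        while True:
            # Latest end among reservations overlapping [t, t + duration).
            conflict_end = max(
                (end
                 for link in route
                 for (start, end) in busy.get(link, [])
                 if start < t + duration and t < end),
                default=None,
            )
            if conflict_end is None:
                break
            t = conflict_end  # retry after the blocking reservation
        for link in route:
            busy.setdefault(link, []).append((t, t + duration))
        starts[name] = t
    return starts


if __name__ == "__main__":
    plan = schedule_transfers([
        ("a", 0, 3, ["l1", "l2"]),
        ("b", 0, 2, ["l2", "l3"]),  # shares link l2 with "a"
        ("c", 1, 2, ["l4"]),        # independent route
    ])
    print(plan)  # {'a': 0, 'b': 3, 'c': 1}
```

Because every start time is fixed before execution, link contention is resolved in the table rather than at run time, which is the property that makes timing predictable in this simplified setting.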
