1 |
Thermal modeling of many-core processorsSathe, Nikhil 07 July 2010 (has links)
Sustaining high performance demand has led to the development of manycore processors. These manycore processors have thermal properties which are different from conventional processors. In order to understand the thermal characteristics of such manycore processors, we have developed a modeling environment with a rich set of features which can be used to used to model different scenarios in manycore processors. Using this modeling framework, we have developed a thermal management policy called 'Weight based management policy'. We have also developed a GUI based modeling tool which can be integrated into the computer architecture curriculum so as to enable students to understand the importance of thermal limitations right during the design phase.
|
2 |
Compilation efficace d'applications de traitement d'images pour processeurs manycore / Efficient Compilation of Image Processing Applications for Manycore ProcessorsGuillou, Pierre 30 November 2016 (has links)
Nous assistons à une explosion du nombre d’appareils mobiles équipés de capteurs optiques : smartphones, tablettes, drones... préfigurent un Internet des objets imminent. De nouvelles applications de traitement d’images (filtres, compression, réalité augmentée) exploitent ces capteurs mais doivent répondre à des contraintes fortes de vitesse et d’efficacité énergétique. Les architectures modernes — processeurs manycore, GPUs,... — offrent un potentiel de performance, avec cependant une hausse sensible de la complexité de programmation.L’ambition de cette thèse est de vérifier l’adéquation entre le domaine du traitement d’images et ces architectures modernes : concilier programmabilité, portabilité et performance reste encore aujourd’hui un défi. Le domaine du traitement d’images présente un fort parallélisme intrinsèque, qui peut potentiellement être exploité par les différents niveaux de parallélisme offerts par les architectures actuelles. Nous nous focalisons ici sur le domaine du traitement d’images par morphologie mathématique, et validons notre approche avec l’architecture manycore du processeur MPPA de la société Kalray.Nous prouvons d’abord la faisabilité de chaînes de compilation intégrées, composées de compilateurs, bibliothèques et d’environnements d’exécution, qui à partir de langages de haut niveau tirent parti de différents accélérateurs matériels. Nous nous concentrons plus particulièrement sur les processeurs manycore, suivant les différents modèles de programmation : OpenMP ; langage flot de données ; OpenCL ; passage de messages. Trois chaînes de compilation sur quatre ont été réalisées, et sont accessibles à des applications écrites dans des langages spécifiques au domaine du traitement d’images intégrés à Python ou C. Elles améliorent grandement la portabilité de ces applications, désormais exécutables sur un plus large panel d’architectures cibles.Ces chaînes de compilation nous ont ensuite permis de réaliser des expériences comparatives sur un jeu de sept applications de traitement d’images. Nous montrons que le processeur MPPA est en moyenne plus efficace énergétiquement qu’un ensemble d’accélérateurs matériels concurrents, et ceci particulièrement avec le modèle de programmation flot de données. Nous montrons que la compilation d’un langage spécifique intégré à Python vers un langage spécifique intégré à C permet d’augmenter la portabilité et d’améliorer les performances des applications écrites en Python.Nos chaînes de compilation forment enfin un environnement logiciel complet dédié au développement d’applications de traitement d’images par morphologie mathématique, capable de cibler efficacement différentes architectures matérielles, dont le processeur MPPA, et proposant des interfaces dans des langages de haut niveau. / Many mobile devices now integrate optic sensors; smartphones, tablets, drones... are foreshadowing an impending Internet of Things (IoT). New image processing applications (filters, compression, augmented reality) are taking advantage of these sensors under strong constraints of speed and energy efficiency. Modern architectures, such as manycore processors or GPUs, offer good performance, but are hard to program.This thesis aims at checking the adequacy between the image processing domain and these modern architectures: conciliating programmability, portability and performance is still a challenge today. Typical image processing applications feature strong, inherent parallelism, which can potentially be exploited by the various levels of hardware parallelism inside current architectures. We focus here on image processing based on mathematical morphology, and validate our approach using the manycore architecture of the Kalray MPPA processor.We first prove that integrated compilation chains, composed of compilers, libraries and run-time systems, allow to take advantage of various hardware accelerators from high-level languages. We especially focus on manycore processors, through various programming models: OpenMP, data-flow language, OpenCL, and message passing. Three out of four compilation chains have been developed, and are available to applications written in domain-specific languages (DSL) embedded in C or Python. They greatly improve the portability of applications, which can now be executed on a large panel of target architectures.Then, these compilation chains have allowed us to perform comparative experiments on a set of seven image processing applications. We show that the MPPA processor is on average more energy-efficient than competing hardware accelerators, especially with the data-flow programming model. We show that compiling a DSL embedded in Python to a DSL embedded in C increases both the portability and the performance of Python-written applications.Thus, our compilation chains form a complete software environment dedicated to image processing application development. This environment is able to efficiently target several hardware architectures, among them the MPPA processor, and offers interfaces in high-level languages.
|
3 |
Models and Methods for Development of DSP Applications on Manycore ProcessorsBengtsson, Jerker January 2009 (has links)
Advanced digital signal processing systems require specialized high-performance embedded computer architectures. The term high-performance translates to large amounts of data and computations per time unit. The term embedded further implies requirements on physical size and power efficiency. Thus the requirements are of both functional and non-functional nature. This thesis addresses the development of high-performance digital signal processing systems relying on manycore technology. We propose building two-level hierarchical computer architectures for this domain of applications. Further, we outline a tool flow based on methods and analysis techniques for automated, multi-objective mapping of such applications on distributed memory manycore processors. In particular, the focus is put on how to provide a means for tunable strategies for mapping of task graphs on array structured distributed memory manycores, with respect to given application constraints. We argue for code mapping strategies based on predicted execution performance, which can be used in an auto-tuning feedback loop or to guide manual tuning directed by the programmer. Automated parallelization, optimisation and mapping to a manycore processor benefits from the use of a concurrent programming model as the starting point. Such a model allows the programmer to express different types and granularities of parallelism as well as computation characteristics of importance in the addressed class of applications. The programming model should also abstract away machine dependent hardware details. The analytical study of WCDMA baseband processing in radio base stations, presented in this thesis, suggests dataflow models as a good match to the characteristics of the application and as execution model abstracting computations on a manycore. Construction of portable tools further requires a manycore machine model and an intermediate representation. The models are needed in order to decouple algorithms, used to transform and map application software, from hardware. We propose a manycore machine model that captures common hardware resources, as well as resource dependent performance metrics for parallel computation and communication. Further, we have developed a multifunctional intermediate representation, which can be used as source for code generation and for dynamic execution analysis. Finally, we demonstrate how we can dynamically analyse execution using abstract interpretation on the intermediate representation. It is shown that the performance predictions can be used to accurately rank different mappings by best throughput or shortest end-to-end computation latency.
|
Page generated in 0.0472 seconds