Global ETD Search

11	Estudo sobre o impacto da hierarquia de memória em MPSoCs baseados em NoC Silva, Gustavo Girão Barreto da January 2009 (has links) Ao longo dos últimos anos, os sistemas embarcados vêm se tornando cada vez mais complexos tanto em termos de hardware quanto de software. Ultimamente têm-se adotado como solução o uso de MPSoCs (sistemas multiprocessados integrados em chip) para uma maior eficiência energética e computacional nestes sistemas. Com o uso de diversos elementos de processamento, redes-em-chip (NoC - networks-on-chip) aparecem como soluções de melhor desempenho do que barramentos. Nestes ambientes cujo desempenho depende da eficiência do modelo de comunicação, a hierarquia de memória se torna um elemento chave. Baseando-se neste cenário, este trabalho realiza uma investigação sobre o impacto da hierarquia de memória em MPSoCs baseados em NoC. Dentro deste escopo foi desenvolvida uma nova organização de memória fisicamente centralizada com diferentes espaços de endereçamentos denominada nDMA. Este trabalho também apresenta uma comparação entre a nova organização e outras três organizações bastante difundidas tais como memória distribuída, memória compartilhada e memória compartilhada distribuída. Estas duas ultimas adotam um modelo de coerência de cache baseado em diretório completamente desenvolvido em hardware. Os modelos de memória foram implementados na plataforma virtual SIMPLE (SIMPLE Multiprocessor Platform Environment). Resultados experimentais mostram uma forte dependência com relação à carga de comunicação gerada pelas aplicações. O modelo de memória distribuída apresenta melhores resultados conforme a carga de comunicação das aplicações é baixa. Por outro lado, o novo modelo de memória fisicamente compartilhado com diferentes espaços de endereçamento apresenta melhores resultados conforme a carga de comunicação das aplicações é alta. Também foram realizados experimentos objetivando analisar o desempenho dos modelos de memória em situações de alta latência de comunicação na rede. Resultados mostram melhores resultados do modelo de memória distribuída quando a carga de comunicação das aplicações é alta e, caso contrário, o modelo nDMA apresenta melhores resultados. Por fim, foram analisados os desempenhos dos modelos de memória durante o processo de migração de tarefas. Neste caso, os modelos de memória compartilhada e compartilhada distribuída apresentaram melhores resultados devido ao fato de que não se faz necessária o envio dos dados da aplicação nestes modelos e também devido ao menor tamanho de código se comparado com os outros modelos. / In the past few the years, embedded systems have become even more complex both on terms of hardware and software. Lately, the use of MPSoCs (Multi-Processor Systems-on-Chip) has been adopted on these systems for a better energetic and computational efficiency. Due to the use of several processing elements, Networks-on-Chip arise as better performance solutions than buses. Considering this scenario, this work performs an investigation on the impact of memory hierarchy in NoC-based MPSoCs. In this context, a new physically centralized and shared memory organization with different address spaces named nDMA was developed. This work also presents a comparison between the new memory organization and three different well-known memory hierarchy models such as distributed memory and shared and distributed shared memories that make use of a fully hardware cache coherence solution. The memory models were implemented in the SIMPLE (SIMPLE Multiprocessor Platform Environment) virtual platform. Experimental results shows a strong dependency on the application communication workload. The distributed memory model presents better results as the application communication workload is low. On the other hand, the new memory model (physically shared with different address spaces) presents better results as the application communication workload is high. There were also experiments aiming at observing the performance of the memory models in situations where the communication latency on the network is high. Results show better results of the distributed memory model when the application communication workload is high, and the nDMA model presents better results otherwise. Finally, the performance of the memory models during a task migration process were evaluated. In this case, the shared memory and distributed shared memory models presented better results due to the fact that in this case the data memory does not need to be transferred from one point to another and also due to the low size of the memory code in these cases if compared to other memory models. Microeletrônica MPSoC NoC Embedded systems Multiprocessor system-on-chip Network-on-chip Cache coherence Task migration
12	Paralelní genetický algoritmus pro vícejádrové systémy / The Parallel Genetic Algorithm for Multicore Systems Vrábel, Lukáš January 2010 (has links) Genetický algoritmus je optimalizačná metóda zameraná na efektívne hľadanie riešení rozličných problémov. Je založená na princípe evolúcie a prirodzeného výberu najschopnejších jedincov v prírode. Keďže je táto metóda výpočtovo náročná, bolo vymyslených veľa spôsobov na jej paralelizáciu. Avšak väčšina týchto metód je z historických dôvodov založená na superpočítačoch alebo rozsiahlych počítačových systémoch. Moderný vývoj v oblasti informačných technológií prináša na trh osobných počítačov stále lacnejšie a výkonnejšie viacjadrové systémy. Táto práca sa zaoberá návrhom nových metód paralelizácie genetického algoritmu, ktoré sa snažia naplno využiť možnosti práve týchto počítačových systémov. Tieto metódy sú následne naimplementované v programovacom jazyku C za využitia knižnice OpenMP určenej na paralelizáciu. Implementácia je následne použitá na experimentálne ohodnotenie rozličných charakteristík každej z prezentovaných metód (zrýchlenie oproti sekvenčnej verzii, závislosť konvergencie výsledných hodnôt od miery paralelizácie alebo od vyťaženia procesoru, ...). V poslednej časti práce sú prezentované porovnania nameraných hodnôt a závery vyplývajúce z týchto meraní. Následne sú prediskutované možné vylepšenia daných metód vyplývajúce z týchto záverov, ako aj možnosti spracovania väčšieho množstva charakteristík na presnejšie ohodnotenie efektivity paralelizácie genetických algoritmov.
13	Mapping to a Time-predictable Multiprocessor System-on-Chip Amstutz, Christian January 2012 (has links) Traditional design methods could not cope with the recent development of multiprocessorsystems-on-chip (MPSoC). Especially, hard real-time systems that requiretime-predictability are cumbersome to develop. What is needed, is an efficient, automaticprocess that abstracts away all the implementation details. ForSyDe, a designmethodology developed at KTH, allows this on the system modelling side. The NoCSystem Generator, another project at KTH, has the ability to create automaticallycomplex systems-on-chip based on a network-on-chip on an FPGA. Both of themsupport the synchronous model of computation to ensure time-predictability. Inthis thesis, these two projects are analysed and modelled. Considering the characteristicsof the projects and exploiting the properties of the synchronous model ofcomputation, a mapping process to map processes to the processors at the differentnetwork nodes of the generated system-on-chip was developed. The mapping processis split into three steps: (1) Binding processes to processors, (2) Placement of theprocessors on net network nodes, and (3) scheduling of the processes on the nodes.An implementation of the mapping process is described and some synthetic exampleswere mapped to show the feasibility of algorithms. Mapping Synchronous Systems Multiprocessor System-on-Chip Design Methodology Time-predictability Engineering and Technology Teknik och teknologier
14	Scheduling of tasks in multiprocessor system using hybrid genetic algorithms Varghese, B., Hossain, M. Alamgir, Dahal, Keshav P. January 2007 (has links) This paper presents an investigation into the optimal scheduling of realtime tasks of a multiprocessor system using hybrid genetic algorithms (GAs). A comparative study of heuristic approaches such as `Earliest Deadline First (EDF)¿ and `Shortest Computation Time First (SCTF)¿ and genetic algorithm is explored and demonstrated. The results of the simulation study using MATLAB is presented and discussed. Finally, conclusions are drawn from the results obtained that genetic algorithm can be used for scheduling of real-time tasks to meet deadlines, in turn to obtain high processor utilization. Optimal scheduling Hard real-time tasks Multiprocessor system Heuristics Genetic algorithms
15	Compilation d'applications flot de données paramétriques pour MPSoC dédiés à la radio logicielle / Compilation of Parametric Dataflow Applications for Software-Defined-Radio-Dedicated MPSoCs Dardaillon, Mickaël 19 November 2014 (has links) Le développement de la radio logicielle fait suite à l’évolution rapide du domaine des télécommunications. Les besoins en performance et en dynamicité ont donné naissance à des MPSoC dédiés à la radio logicielle. La spécialisation de ces MPSoC rend cependant leur pro- grammation et leur vérification complexes. Des travaux proposent d’atténuer cette complexité par l’utilisation de paradigmes tels que le modèle de calcul flot de données. Parallèlement, le besoin de modèles flexibles et vérifiables a mené au développement de nouveaux modèles flot de données paramétriques. Dans cette thèse, j’étudie la compilation d’applications utilisant un modèle de calcul flot de données paramétrique et ciblant des plateformes de radio logicielle. Après un état de l’art du matériel et logiciel du domaine, je propose un raffinement de l’ordonnancement flot de données, et présente son application à la vérification des tailles mémoires. Ensuite, j’introduis un nouveau format de haut niveau pour définir le graphe et les acteurs flot de données, ainsi que le flot de compilation associé. J’applique ces concepts à la génération de code optimisé pour la plateforme de radio logicielle Magali. La compilation de parties du protocole LTE permet d’évaluer les performances du flot de compilation proposé. / The emergence of software-defined radio follows the rapidly evolving telecommunication domain. The requirements in both performance and dynamicity has engendered software- defined-radio-dedicated MPSoCs. Specialization of these MPSoCs make them difficult to program and verify. Dataflow models of computation have been suggested as a way to mi- tigate this complexity. Moreover, the need for flexible yet verifiable models has led to the development of new parametric dataflow models. In this thesis, I study the compilation of parametric dataflow applications targeting software-defined-radio platforms. After a hardware and software state of the art in this field, I propose a new refinement of dataflow scheduling, and outline its application to buffer size’s verification. Then, I introduce a new high-level format to define dataflow actors and graph, with the associated compilation flow. I apply these concepts to optimised code generation for the Magali software-defined-radio platform. Compilation of parts of the LTE protocol are used to evaluate the performances of the proposed compilation flow. Télécommunications Radio logicielle Flot de données Système embarqué Multiprocessor System-On-Chip - MPSoC Compilation de données Protocole LTE Telecommunications Software radio Data Flow Embedded System Multiprocessor System-On-Chip - MPSoC Data compilation LTE Protocol 621.384 028 507 2
16	Design and Multi-Technology Multi-objective Comparative Analysis of Families of MPSOC. Wang, Zhoukun 12 November 2009 (has links) (PDF) Multiprocessor system on chip (MPSOC) have strongly emerged in the past decade in communication, multimedia, networking and other embedded domains. MPSOC became a new paradigm of high performance embedded application design. This thesis addresses the design and the physical implementation of a Network on Chip (NoC) based Multiprocessor System on Chip. We studied several aspects at different design stages: high level synthesis, architecture design, FPGA implementation, application evaluation and ASIC physical implementation. We try to analysis and find the impacts of these aspects for the MPSOC's final performance, power consumption and area cost. We implemented a NoC based 16 processors embedded system on FPGA prototyping. Three NoCs provide different functionalities for sixteen PE tiles. We also demonstrated the use of our performance monitoring system for software debugging and tuning. With the bi-synchronous FIFO method, our GALS architecture successfully solves the long clock signal distribution problem and allows that each clock domain can run at its own clock frequency. On the other hand we successfully implemented AES and TDES block cipher cryptographic algorithms on this platform and results show linear speedup in computation time. The network part of our architecture has been implemented on ASIC technology and has been explored with different timing constraints and different library categories of STmicroelectronics' 65nm/45nm technologies. The experimental results of ASIC and FPGA are compared, and we inducted the discussion of technology change impact on parallel programming. Network-on-Chip Multiprocessor system on chip High level synthesis Fpga Asic
17	Stratégie de réduction des cycles thermiques pour systèmes temps-réel multiprocesseurs sur puce / Strategy to reduce thermal cycles for real-time multiprocessor systems-on-chip Baati, Khaled 19 December 2013 (has links) L'augmentation de la densité des transistors dans les circuits électroniques conduit à une augmentation de la consommation d'énergie induisant des phénomènes thermiques plus complexes à maitriser. Dans le cas de systèmes embarqués en environnement où la température ambiante varie dans des proportions importantes (automobile par exemple), ces phénomènes peuvent conduire à des problèmes de fiabilité. Parmi les mécanismes de défaillance observés, on peut citer les cycles thermiques (CT) qui induisent des déformations dans les couches métalliques de la puce pouvant conduire à des fissurations. L’objectif de la thèse est de proposer pour des architectures de type multiprocesseur sur puce une technique de réduction des CT subis par les processeurs, et ce en respectant les contraintes temps réel des applications. L’exemple du circuit MPC5517 de Freescale a été considéré. Dans un premier temps un modèle thermique de ce circuit a été élaboré à partir de mesures par une caméra thermique sur ce circuit décapsulé. Un environnement de simulation a été mis en oeuvre pour permettre d’effectuer simultanément des analyses thermiques et d’ordonnancement de tâches et mettre en évidence l’influence de la température sur la puissance dissipée. Une heuristique globale pour réduire à la fois les CT et la température maximale des processeurs a été étudiée. Elle tient compte des variations de la température ambiante et se base sur les techniques DVFS et DPM. Les résultats de simulation avec les algorithmes d’ordonnancement globaux RM, EDF et EDZL et avec différentes charges processeur (sur un circuit type MPC5517 et un UltraSparc T1) illustrent l’efficacité de la technique proposée. / Increasing the density of transistors in electronic circuits leads to an increase in energy consumption resulting in more complex thermal phenomena to master. For systems embedded in environments where the ambient temperature can vary in large range (e.g. automotive), these thermal effects can induce reliability problems. Among classical failure mechanisms thermal cycles (CTs) produce deformations in materials and play a major role in the cracking of the metal layers in the chip. The aim of the thesis is to propose a reduction technique of CTs suffered by the processor cores in a multiprocessor on chip architecture such that real-time application constraints are met. The example of the Freescale MPC5517 circuit has been considered. In a first step a thermal model of this circuit was developed. This was achieved from measurements taken by a thermal camera on a decapsulated circuit. Next, a simulation environment has been implemented allowing both the analysis of thermal behavior and the scheduling of tasks so as to highlight the influence of temperature on the dissipated power. A global heuristic to reduce both the CTs and the maximum temperature of processors has been studied. It takes into account variations in the ambient temperature and is based on DVFS and DPM techniques. Simulation results with global scheduling algorithms RM, EDF and EDZL and different processor loads (for a MPC5517 type circuit and a T1 UltraSparc from Sun Microsystems) illustrate the effectiveness of the proposed technique. Modélisation thermique Conception système Système sur puce Ordonnancement temps réel Basse consommation Fiabilité Cycles thermiques Co-simulation Temperature aware system-level design Multiprocessor-system on chip Real time scheduling Low-power Reliability Themal cycles Co-simulation
18	Simula??o de reservat?rios de petr?leo em ambiente MPSoC / Reservoir simulation in a MPSOC environment Oliveira, Bruno Cruz de 22 May 2009 (has links) Made available in DSpace on 2014-12-17T15:47:50Z (GMT). No. of bitstreams: 1 BrunoCO.pdf: 708202 bytes, checksum: 3eb4368a0c268064bcd6ad892e1f2c0c (MD5) Previous issue date: 2009-05-22 / The constant increase of complexity in computer applications demands the development of more powerful hardware support for them. With processor's operational frequency reaching its limit, the most viable solution is the use of parallelism. Based on parallelism techniques and the progressive growth in the capacity of transistors integration in a single chip is the concept of MPSoCs (Multi-Processor System-on-Chip). MPSoCs will eventually become a cheaper and faster alternative to supercomputers and clusters, and applications developed for these high performance systems will migrate to computers equipped with MP-SoCs containing dozens to hundreds of computation cores. In particular, applications in the area of oil and natural gas exploration are also characterized by the high processing capacity required and would benefit greatly from these high performance systems. This work intends to evaluate a traditional and complex application of the oil and gas industry known as reservoir simulation, developing a solution with integrated computational systems in a single chip, with hundreds of functional unities. For this, as the STORM (MPSoC Directory-Based Platform) platform already has a shared memory model, a new distributed memory model were developed. Also a message passing library has been developed folowing MPI standard / O constante aumento da complexidade das aplica??es demanda um suporte de hardware computacionalmente mais poderoso. Com a aproxima??o do limite de velocidade dos processadores, a solu??o mais vi?vel ? o paralelismo. Baseado nisso e na crescente capacidade de integra??o de transistores em um ?nico chip surgiram os chamados MPSoCs (Multiprocessor System-on-Chip) que dever?o ser, em um futuro pr?ximo, uma alternativa mais r?pida e mais barata aos supercomputadores e clusters. Aplica??es tidas como destinadas exclusivamente a execu??o nesses sistemas de alto desempenho dever?o migrar para m?quinas equipadas com MPSoCs dotados de dezenas a centenas de n?cleos computacionais. Aplica??es na ?rea de explora??o de petr?leo e g?s natural tamb?m se caracterizam pela enorme capacidade de processamento requerida e dever?o se beneficiar desses novos sistemas de alto desempenho. Esse trabalho apresenta uma avalia??o de uma tradicional e complexa aplica??o da ind?stria de petr?leo e g?s natural, a simula??o de reservat?rios, sob a nova ?tica do desenvolvimento de sistemas computacionais integrados em um ?nico chip, dotados de dezenas a centenas de unidades funcionais. Para isso, um modelo de mem?ria distribu?da foi desenvolvido para a plataforma STORM (MPSoC Directory-Based Platform), que j? contava com um modelo de mem?ria compartilhada. Foi desenvolvida, ainda, uma biblioteca de troca de mensagens para esse modelo de mem?ria seguindo o padr?o MPI Sistemas-em-chip multiprocessados Redes-em-chip Modelos de mem?ria Simula??o de reservat?rios Multiprocessor system-on-chip Network-on-chip Memory models Reservoir simulation
19	High level design and control of adaptive multiprocessor system-on-chips / Conception et contrôle de haut niveau pour les systèmes sur puce multiprocesseurs adaptatifs An, Xin 16 October 2013 (has links) La conception de systèmes embarqués modernes est de plus en plus complexe, car plus de fonctionnalités sont intégrées dans ces systèmes. En même temps, afin de répondre aux exigences de calcul tout en conservant une consommation d'énergie de faible niveau, MPSoCs sont apparus comme les principales solutions pour tels systèmes embarqués. En outre, les systèmes embarqués sont de plus en plus adaptatifs, comme l’adaptabilité peut apporter un certain nombre d'avantages, tels que la flexibilité du logiciel et l'efficacité énergétique. Cette thèse vise la conception sécuritaire de ces MPSoCs adaptatifs. Tout d'abord, chaque configuration de système doit être analysée en ce qui concerne ses propriétés fonctionnelles et non fonctionnelles. Nous présentons un cadre abstraite de conception et d’analyse qui permet des décisions d’implémentation plus rapide et plus rentable. Ce cadre est conçu comme un support de raisonnement intermédiaire pour les environnements de co-conception de logiciel / matériel au niveau de système. Il peut élaguer l'espace de conception à sa plus grande portée, et identifier les candidats de solutions de conception de manière rapide et efficace. Dans ce cadre, nous utilisons un codage basé sur l’horloge abstrait pour modéliser les comportements du système. Différents scénarios d'applications de mapping et de planification sur MPSoCs sont analysés via les traces d'horloge qui représentent les simulations du système. Les propriétés d'intérêt sont l’exactitude du comportement fonctionnel, la performance temporelle et la consommation d'énergie. Deuxièmement, la gestion de la reconfiguration de MPSoCs adaptatifs doit être abordée. Nous sommes particulièrement intéressés par les MPSoCs implémentés sur des architectures reconfigurables de hardware (ex. FPGA tissus) qui offrent une bonne flexibilité et une efficacité de calcul pour les MPSoCs adaptatifs. Nous proposons un cadre général de conception basésur la technique de la synthèse de contrôleurs discrets (SCD) pour résoudre ce problème. L’avantage principal de cette technique est qu'elle permet une synthèse d'un contrôleur automatique vis-à-vis d’une spécification donnée des objectifs de contrôle. Dans ce cadre, le comportement de reconfiguration du système est modélisé en termes d'automates synchrones en parallèle. Le problème de calcul de la gestion reconfiguration vis-à-vis de multiples objectifs concernant, par exemple, les usages des ressources, la performance et la consommation d’énergie est codé comme un problème de SCD . Le langage de programmation BZR existant et l’outil Sigali sont employés pour effectuer SCD et générer un contrôleur qui satisfait aux exigences du système. Finalement, nous étudions deux façons différentes de combiner les deux cadres de conception proposées pour MPSoCs adaptatifs. Tout d'abord, ils sont combinés pour construire un flot de conception complet pour MPSoCs adaptatifs. Deuxièmement, ils sont combinés pour présenter la façon dont le gestionnaire d'exécution conçu dans le second cadre peut être intégré dans le premier cadre de sorte que les simulations de haut niveau peuvent être effectuées pour évaluer le gestionnaire d'exécution. / The design of modern embedded systems is getting more and more complex, as more func- tionality is integrated into these systems. At the same time, in order to meet the compu- tational requirements while keeping a low level power consumption, MPSoCs have emerged as the main solutions for such embedded systems. Furthermore, embedded systems are be- coming more and more adaptive, as the adaptivity can bring a number of benefits, such as software flexibility and energy efficiency. This thesis targets the safe design of such adaptive MPSoCs. First, each system configuration must be analyzed concerning its functional and non- functional properties. We present an abstract design and analysis framework, which allows for faster and cost-effective implementation decisions. This framework is intended as an intermediate reasoning support for system level software/hardware co-design environments. It can prune the design space at its largest, and identify candidate design solutions in a fast and efficient way. In the framework, we use an abstract clock-based encoding to model system behaviors. Different mapping and scheduling scenarios of applications on MPSoCs are analyzed via clock traces representing system simulations. Among properties of interest are functional behavioral correctness, temporal performance and energy consumption. Second, the reconfiguration management of adaptive MPSoCs must be addressed. We are specially interested in MPSoCs implemented on reconfigurable hardware architectures (i.e., FPGA fabrics), which provide a good flexibility and computational efficiency for adap- tive MPSoCs. We propose a general design framework based on the discrete controller syn- thesis (DCS) technique to address this issue. The main advantage of this technique is that it allows the automatic controller synthesis w.r.t. a given specification of control objectives. In the framework, the system reconfiguration behavior is modeled in terms of synchronous parallel automata. The reconfiguration management computation problem w.r.t. multiple objectives regarding e.g., resource usages, performance and power consumption is encoded as a DCS problem. The existing BZR programming language and Sigali tool are employed to perform DCS and generate a controller that satisfies the system requirements. Finally, we investigate two different ways of combining the two proposed design frame- works for adaptive MPSoCs. Firstly, they are combined to construct a complete design flow for adaptive MPSoCs. Secondly, they are combined to present how the designed run-time manager by the second framework can be integrated into the first framework so that high level simulations can be performed to assess the run-time manager. Approche synchrone Synthèse de contrôleurs discrets Exploration de l'espace de conception Conception et validation formelle Adaptive Multiprocessor System-on-Chips Synchronous approach Discrete controller syntheis Design space exploration Formal design and validation 004
20	Polymorphic ASIC : For Video Decoding Adarsha Rao, S J January 2013 (has links) (PDF) Video applications are becoming ubiquitous in recent times due to an explosion in the number of devices with video capture and display capabilities. Traditionally, video applications are implemented on a variety of devices with each device targeting a speciﬁc application. However, the advances in technology have created a need to support multiple applications from a single device like a smart phone or tablet. Such convergence of applications necessitates support for interoperability among various applications, scalable performance meet the requirements of different applications and a high degree of reconﬁgurability to accommodate rapid evolution in applications features. In addition, low power consumption requirement is also very stringent for many video applications. The conventional custom hardware implementations of video applications deliver high performance at low power consumption while the recent MPSoC implementations enable high degree of interoperability and are useful to support application evolution. In this thesis, we combine the best features of custom hardware and MPSoC approaches to design a Polymorphic ASIC. A Polymorphic ASIC is an integrated circuit designed to meet the requirements of several applications belonging to a particular domain. A polymorphic ASIC consists of a fabric of computation, storage and communication resources, using which applications are composed dynamically. Although different video applications differ widely in the internal de-tails of operation, at the heart of almost every video application is a video codec (encoder and decoder). The requirements of scalability, high performance and low power consumption are very stringent for video decoding. Therefore this thesis focuses mainly on the architectural design of a Polymorphic ASIC for video decoding. We present an uniﬁed software and hardware architecture (USHA) for Polymorphic ASIC. USHA is a tiled architecture which uses loosely coupled processor and hardware tiles that are software programmable and hardware reconﬁgurable respectively. The distinctive feature of Polymorphic ASIC is the static partitioning of the application and dynamic mapping of ap-plication processes onto the computational tiles. Depending on the application scenarios, a process may be mapped onto one of the hardware or processor tiles. Polymorphic ASIC incor-porates a network–on–chip (NoC) to achieve ﬂexible communication across different tiles. Formulation of a programming framework for Polymorphic ASIC requires an implementation model that captures the structure of video decoder applications as well as the properties of the Polymorphic ASIC architecture. We derive an implementation model based on a combination of parametric polyhedral process networks, stream based functions and windowed dataﬂow models of computation. The implementation model leads to a process network oriented compilation ﬂow that achieves realization agnostic application partitioning and enables seamless migration across uniprocessor, multi–processor, semi hardware and full hardware conﬁgurations of a video decoder. The thesis also presents an application QoS aware scheduler that selects a decoder conﬁguration that best meets the application performance requirements, thereby enabling dynamic performance scaling. The memory hierarchy of Polymorphic ASIC makes use of an application speciﬁc cache. Through a combined analysis of miss rate and external memory bandwidth, we show that the degradation in decoder performance due to memory stall cycles depends on the properties of the video being decoded as well as the behavior of the external memory interface. Based on this observation, we present the design of a reconﬁgurable 2–D cache architecture which can adjust its parameters in accordance with the characteristics of the video stream being decoded. We validate the Polymorphic ASIC using a proof–of–concept implementation on an FPGA. The performance of H.264 decoder on Polymorphic ASIC is evaluated for uniprocessor, multi processor, hardware accelerated and full hardware conﬁgurations. The scaling in performance delivered by these conﬁgurations shows that the Polymorphic ASIC enables the application to achieve super linear speedups [1]. The experimental results show that different implementations of a H.264 video decoder on the Polymorphic ASIC can deliver performance comparable to a wide spectrum of devices ranging from embedded processor like ARM 9 to MPSoCs like IBM Cell. We also present the energy consumption of various conﬁgurations of video decoders on Polymorphic ASIC and an application to conﬁguration mapping aimed at minimizing the overall energy consumption of a Polymorphic ASIC. Video Decoding Polymorphic ASIC Architecture Multiprocessor System-On-Chip (MPSoC) H.264 Decoder Video Encoding H.264 Decoders H.264 Decoding Computer Science

Search results