• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 7
  • 4
  • 3
  • 2
  • Tagged with
  • 16
  • 16
  • 16
  • 10
  • 8
  • 7
  • 7
  • 6
  • 5
  • 5
  • 5
  • 4
  • 4
  • 4
  • 3
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

Stratégie de réduction des cycles thermiques pour systèmes temps-réel multiprocesseurs sur puce / Strategy to reduce thermal cycles for real-time multiprocessor systems-on-chip

Baati, Khaled 19 December 2013 (has links)
L'augmentation de la densité des transistors dans les circuits électroniques conduit à une augmentation de la consommation d'énergie induisant des phénomènes thermiques plus complexes à maitriser. Dans le cas de systèmes embarqués en environnement où la température ambiante varie dans des proportions importantes (automobile par exemple), ces phénomènes peuvent conduire à des problèmes de fiabilité. Parmi les mécanismes de défaillance observés, on peut citer les cycles thermiques (CT) qui induisent des déformations dans les couches métalliques de la puce pouvant conduire à des fissurations. L’objectif de la thèse est de proposer pour des architectures de type multiprocesseur sur puce une technique de réduction des CT subis par les processeurs, et ce en respectant les contraintes temps réel des applications. L’exemple du circuit MPC5517 de Freescale a été considéré. Dans un premier temps un modèle thermique de ce circuit a été élaboré à partir de mesures par une caméra thermique sur ce circuit décapsulé. Un environnement de simulation a été mis en oeuvre pour permettre d’effectuer simultanément des analyses thermiques et d’ordonnancement de tâches et mettre en évidence l’influence de la température sur la puissance dissipée. Une heuristique globale pour réduire à la fois les CT et la température maximale des processeurs a été étudiée. Elle tient compte des variations de la température ambiante et se base sur les techniques DVFS et DPM. Les résultats de simulation avec les algorithmes d’ordonnancement globaux RM, EDF et EDZL et avec différentes charges processeur (sur un circuit type MPC5517 et un UltraSparc T1) illustrent l’efficacité de la technique proposée. / Increasing the density of transistors in electronic circuits leads to an increase in energy consumption resulting in more complex thermal phenomena to master. For systems embedded in environments where the ambient temperature can vary in large range (e.g. automotive), these thermal effects can induce reliability problems. Among classical failure mechanisms thermal cycles (CTs) produce deformations in materials and play a major role in the cracking of the metal layers in the chip. The aim of the thesis is to propose a reduction technique of CTs suffered by the processor cores in a multiprocessor on chip architecture such that real-time application constraints are met. The example of the Freescale MPC5517 circuit has been considered. In a first step a thermal model of this circuit was developed. This was achieved from measurements taken by a thermal camera on a decapsulated circuit. Next, a simulation environment has been implemented allowing both the analysis of thermal behavior and the scheduling of tasks so as to highlight the influence of temperature on the dissipated power. A global heuristic to reduce both the CTs and the maximum temperature of processors has been studied. It takes into account variations in the ambient temperature and is based on DVFS and DPM techniques. Simulation results with global scheduling algorithms RM, EDF and EDZL and different processor loads (for a MPC5517 type circuit and a T1 UltraSparc from Sun Microsystems) illustrate the effectiveness of the proposed technique.
12

Simula??o de reservat?rios de petr?leo em ambiente MPSoC / Reservoir simulation in a MPSOC environment

Oliveira, Bruno Cruz de 22 May 2009 (has links)
Made available in DSpace on 2014-12-17T15:47:50Z (GMT). No. of bitstreams: 1 BrunoCO.pdf: 708202 bytes, checksum: 3eb4368a0c268064bcd6ad892e1f2c0c (MD5) Previous issue date: 2009-05-22 / The constant increase of complexity in computer applications demands the development of more powerful hardware support for them. With processor's operational frequency reaching its limit, the most viable solution is the use of parallelism. Based on parallelism techniques and the progressive growth in the capacity of transistors integration in a single chip is the concept of MPSoCs (Multi-Processor System-on-Chip). MPSoCs will eventually become a cheaper and faster alternative to supercomputers and clusters, and applications developed for these high performance systems will migrate to computers equipped with MP-SoCs containing dozens to hundreds of computation cores. In particular, applications in the area of oil and natural gas exploration are also characterized by the high processing capacity required and would benefit greatly from these high performance systems. This work intends to evaluate a traditional and complex application of the oil and gas industry known as reservoir simulation, developing a solution with integrated computational systems in a single chip, with hundreds of functional unities. For this, as the STORM (MPSoC Directory-Based Platform) platform already has a shared memory model, a new distributed memory model were developed. Also a message passing library has been developed folowing MPI standard / O constante aumento da complexidade das aplica??es demanda um suporte de hardware computacionalmente mais poderoso. Com a aproxima??o do limite de velocidade dos processadores, a solu??o mais vi?vel ? o paralelismo. Baseado nisso e na crescente capacidade de integra??o de transistores em um ?nico chip surgiram os chamados MPSoCs (Multiprocessor System-on-Chip) que dever?o ser, em um futuro pr?ximo, uma alternativa mais r?pida e mais barata aos supercomputadores e clusters. Aplica??es tidas como destinadas exclusivamente a execu??o nesses sistemas de alto desempenho dever?o migrar para m?quinas equipadas com MPSoCs dotados de dezenas a centenas de n?cleos computacionais. Aplica??es na ?rea de explora??o de petr?leo e g?s natural tamb?m se caracterizam pela enorme capacidade de processamento requerida e dever?o se beneficiar desses novos sistemas de alto desempenho. Esse trabalho apresenta uma avalia??o de uma tradicional e complexa aplica??o da ind?stria de petr?leo e g?s natural, a simula??o de reservat?rios, sob a nova ?tica do desenvolvimento de sistemas computacionais integrados em um ?nico chip, dotados de dezenas a centenas de unidades funcionais. Para isso, um modelo de mem?ria distribu?da foi desenvolvido para a plataforma STORM (MPSoC Directory-Based Platform), que j? contava com um modelo de mem?ria compartilhada. Foi desenvolvida, ainda, uma biblioteca de troca de mensagens para esse modelo de mem?ria seguindo o padr?o MPI
13

Polymorphic ASIC : For Video Decoding

Adarsha Rao, S J January 2013 (has links) (PDF)
Video applications are becoming ubiquitous in recent times due to an explosion in the number of devices with video capture and display capabilities. Traditionally, video applications are implemented on a variety of devices with each device targeting a specific application. However, the advances in technology have created a need to support multiple applications from a single device like a smart phone or tablet. Such convergence of applications necessitates support for interoperability among various applications, scalable performance meet the requirements of different applications and a high degree of reconfigurability to accommodate rapid evolution in applications features. In addition, low power consumption requirement is also very stringent for many video applications. The conventional custom hardware implementations of video applications deliver high performance at low power consumption while the recent MPSoC implementations enable high degree of interoperability and are useful to support application evolution. In this thesis, we combine the best features of custom hardware and MPSoC approaches to design a Polymorphic ASIC. A Polymorphic ASIC is an integrated circuit designed to meet the requirements of several applications belonging to a particular domain. A polymorphic ASIC consists of a fabric of computation, storage and communication resources, using which applications are composed dynamically. Although different video applications differ widely in the internal de-tails of operation, at the heart of almost every video application is a video codec (encoder and decoder). The requirements of scalability, high performance and low power consumption are very stringent for video decoding. Therefore this thesis focuses mainly on the architectural design of a Polymorphic ASIC for video decoding. We present an unified software and hardware architecture (USHA) for Polymorphic ASIC. USHA is a tiled architecture which uses loosely coupled processor and hardware tiles that are software programmable and hardware reconfigurable respectively. The distinctive feature of Polymorphic ASIC is the static partitioning of the application and dynamic mapping of ap-plication processes onto the computational tiles. Depending on the application scenarios, a process may be mapped onto one of the hardware or processor tiles. Polymorphic ASIC incor-porates a network–on–chip (NoC) to achieve flexible communication across different tiles. Formulation of a programming framework for Polymorphic ASIC requires an implementation model that captures the structure of video decoder applications as well as the properties of the Polymorphic ASIC architecture. We derive an implementation model based on a combination of parametric polyhedral process networks, stream based functions and windowed dataflow models of computation. The implementation model leads to a process network oriented compilation flow that achieves realization agnostic application partitioning and enables seamless migration across uniprocessor, multi–processor, semi hardware and full hardware configurations of a video decoder. The thesis also presents an application QoS aware scheduler that selects a decoder configuration that best meets the application performance requirements, thereby enabling dynamic performance scaling. The memory hierarchy of Polymorphic ASIC makes use of an application specific cache. Through a combined analysis of miss rate and external memory bandwidth, we show that the degradation in decoder performance due to memory stall cycles depends on the properties of the video being decoded as well as the behavior of the external memory interface. Based on this observation, we present the design of a reconfigurable 2–D cache architecture which can adjust its parameters in accordance with the characteristics of the video stream being decoded. We validate the Polymorphic ASIC using a proof–of–concept implementation on an FPGA. The performance of H.264 decoder on Polymorphic ASIC is evaluated for uniprocessor, multi processor, hardware accelerated and full hardware configurations. The scaling in performance delivered by these configurations shows that the Polymorphic ASIC enables the application to achieve super linear speedups [1]. The experimental results show that different implementations of a H.264 video decoder on the Polymorphic ASIC can deliver performance comparable to a wide spectrum of devices ranging from embedded processor like ARM 9 to MPSoCs like IBM Cell. We also present the energy consumption of various configurations of video decoders on Polymorphic ASIC and an application to configuration mapping aimed at minimizing the overall energy consumption of a Polymorphic ASIC.
14

Polymorphic ASIC : For Video Decoding

Adarsha Rao, S J January 2013 (has links) (PDF)
Video applications are becoming ubiquitous in recent times due to an explosion in the number of devices with video capture and display capabilities. Traditionally, video applications are implemented on a variety of devices with each device targeting a specific application. However, the advances in technology have created a need to support multiple applications from a single device like a smart phone or tablet. Such convergence of applications necessitates support for interoperability among various applications, scalable performance meet the requirements of different applications and a high degree of reconfigurability to accommodate rapid evolution in applications features. In addition, low power consumption requirement is also very stringent for many video applications. The conventional custom hardware implementations of video applications deliver high performance at low power consumption while the recent MPSoC implementations enable high degree of interoperability and are useful to support application evolution. In this thesis, we combine the best features of custom hardware and MPSoC approaches to design a Polymorphic ASIC. A Polymorphic ASIC is an integrated circuit designed to meet the requirements of several applications belonging to a particular domain. A polymorphic ASIC consists of a fabric of computation, storage and communication resources, using which applications are composed dynamically. Although different video applications differ widely in the internal de-tails of operation, at the heart of almost every video application is a video codec (encoder and decoder). The requirements of scalability, high performance and low power consumption are very stringent for video decoding. Therefore this thesis focuses mainly on the architectural design of a Polymorphic ASIC for video decoding. We present an unified software and hardware architecture (USHA) for Polymorphic ASIC. USHA is a tiled architecture which uses loosely coupled processor and hardware tiles that are software programmable and hardware reconfigurable respectively. The distinctive feature of Polymorphic ASIC is the static partitioning of the application and dynamic mapping of ap-plication processes onto the computational tiles. Depending on the application scenarios, a process may be mapped onto one of the hardware or processor tiles. Polymorphic ASIC incor-porates a network–on–chip (NoC) to achieve flexible communication across different tiles. Formulation of a programming framework for Polymorphic ASIC requires an implementation model that captures the structure of video decoder applications as well as the properties of the Polymorphic ASIC architecture. We derive an implementation model based on a combination of parametric polyhedral process networks, stream based functions and windowed dataflow models of computation. The implementation model leads to a process network oriented compilation flow that achieves realization agnostic application partitioning and enables seamless migration across uniprocessor, multi–processor, semi hardware and full hardware configurations of a video decoder. The thesis also presents an application QoS aware scheduler that selects a decoder configuration that best meets the application performance requirements, thereby enabling dynamic performance scaling. The memory hierarchy of Polymorphic ASIC makes use of an application specific cache. Through a combined analysis of miss rate and external memory bandwidth, we show that the degradation in decoder performance due to memory stall cycles depends on the properties of the video being decoded as well as the behavior of the external memory interface. Based on this observation, we present the design of a reconfigurable 2–D cache architecture which can adjust its parameters in accordance with the characteristics of the video stream being decoded. We validate the Polymorphic ASIC using a proof–of–concept implementation on an FPGA. The performance of H.264 decoder on Polymorphic ASIC is evaluated for uniprocessor, multi processor, hardware accelerated and full hardware configurations. The scaling in performance delivered by these configurations shows that the Polymorphic ASIC enables the application to achieve super linear speedups [1]. The experimental results show that different implementations of a H.264 video decoder on the Polymorphic ASIC can deliver performance comparable to a wide spectrum of devices ranging from embedded processor like ARM 9 to MPSoCs like IBM Cell. We also present the energy consumption of various configurations of video decoders on Polymorphic ASIC and an application to configuration mapping aimed at minimizing the overall energy consumption of a Polymorphic ASIC.
15

Réalisation d'un réseau de neurones "SOM" sur une architecture matérielle adaptable et extensible à base de réseaux sur puce "NoC" / Neural Network Implementation on an Adaptable and Scalable Hardware Architecture based-on Network-on-Chip

Abadi, Mehdi 07 July 2018 (has links)
Depuis son introduction en 1982, la carte auto-organisatrice de Kohonen (Self-Organizing Map : SOM) a prouvé ses capacités de classification et visualisation des données multidimensionnelles dans différents domaines d’application. Les implémentations matérielles de la carte SOM, en exploitant le taux de parallélisme élevé de l’algorithme de Kohonen, permettent d’augmenter les performances de ce modèle neuronal souvent au détriment de la flexibilité. D’autre part, la flexibilité est offerte par les implémentations logicielles qui quant à elles ne sont pas adaptées pour les applications temps réel à cause de leurs performances temporelles limitées. Dans cette thèse nous avons proposé une architecture matérielle distribuée, adaptable, flexible et extensible de la carte SOM à base de NoC dédiée pour une implantation matérielle sur FPGA. A base de cette approche, nous avons également proposé une architecture matérielle innovante d’une carte SOM à structure croissante au cours de la phase d’apprentissage / Since its introduction in 1982, Kohonen’s Self-Organizing Map (SOM) showed its ability to classify and visualize multidimensional data in various application fields. Hardware implementations of SOM, by exploiting the inherent parallelism of the Kohonen algorithm, allow to increase the overall performances of this neuronal network, often at the expense of the flexibility. On the other hand, the flexibility is offered by software implementations which on their side are not suited for real-time applications due to the limited time performances. In this thesis we proposed a distributed, adaptable, flexible and scalable hardware architecture of SOM based on Network-on-Chip (NoC) designed for FPGA implementation. Moreover, based on this approach we also proposed a novel hardware architecture of a growing SOM able to evolve its own structure during the learning phase
16

High-Performance Network-on-Chip Design for Many-Core Processors

Wang, Boqian January 2020 (has links)
With the development of on-chip manufacturing technologies and the requirements of high-performance computing, the core count is growing quickly in Chip Multi/Many-core Processors (CMPs) and Multiprocessor System-on-Chip (MPSoC) to support larger scale parallel execution. Network-on-Chip (NoC) has become the de facto solution for CMPs and MPSoCs in addressing the communication challenge. In the thesis, we tackle a few key problems facing high-performance NoC designs. For general-purpose CMPs, we encompass a full system perspective to design high-performance NoC for multi-threaded programs. By exploring the cache coherence under the whole system scenario, we present a smart communication service called Advance Virtual Channel Reservation (AVCR) to provide a highway to target packets, which can greatly reduce their contention delay in NoC. AVCR takes advantage of the fact that we can know or predict the destination of some packets ahead of their arrival at the Network Interface (NI). Exploiting the time interval before a packet is ready, AVCR establishes an end-to-end highway from the source NI to the destination NI. This highway is built up by reserving the Virtual Channel (VC) resources ahead of the target packet transmission and offering priority service to flits in the reserved VC in the wormhole router, which can avoid the target packets’ VC allocation and switch arbitration delay. Besides, we also propose an admission control method in NoC with a centralized Artificial Neural Network (ANN) admission controller, which can improve system performance by predicting the most appropriate injection rate of each node using the network performance information. In the online control process, a data preprocessing unit is applied to simplify the ANN architecture and make the prediction results more accurate. Based on the preprocessed information, the ANN predictor determines the control strategy and broadcasts it to each node where the admission control will be applied. For application-specific MPSoCs, we focus on developing high-performance NoC and NI compatible with the common AMBA AXI4 interconnect protocol. To offer the possibility of utilizing the AXI4 based processors and peripherals in the on-chip network based system, we propose a whole system architecture solution to make the AXI4 protocol compatible with the NoC based communication interconnect in the many-core system. Due to possible out-of-order transmission in the NoC interconnect, which conflicts with the ordering requirements specified by the AXI4 protocol, in the first place, we especially focus on the design of the transaction ordering units, realizing a high-performance and low cost solution to the ordering requirements. The microarchitectures and the functionalities of the transaction ordering units are also described and explained in detail for ease of implementation. Then, we focus on the NI and the Quality of Service (QoS) support in NoC. In our design, the NI is proposed to make the NoC architecture independent from the AXI4 protocol via message format conversion between the AXI4 signal format and the packet format, offering high flexibility to the NoC design. The NoC based communication architecture is designed to support high-performance multiple QoS schemes. The NoC system contains Time Division Multiplexing (TDM) and VC subnetworks to apply multiple QoS schemes to AXI4 signals with different QoS tags and the NI is responsible for traffic distribution between two subnetworks. Besides, a QoS inheritance mechanism is applied in the slave-side NI to support QoS during packets’ round-trip transfer in NoC. / Med utvecklingen av tillverkningsteknologi av on-chip och kraven på högpresterande da-toranläggning växer kärnantalet snabbt i Chip Multi/Many-core Processors (CMPs) ochMultiprocessor Systems-on-Chip (MPSoCs) för att stödja större parallellkörning. Network-on-Chip (NoC) har blivit den de facto lösningen för CMP:er och MPSoC:er för att mötakommunikationsutmaningen. I uppsatsen tar vi upp några viktiga problem med hög-presterande NoC-konstruktioner.Allmänna CMP:er omfattas ett fullständigt systemperspektiv för att design högprester-ande NoC för flertrådad program. Genom att utforska cachekoherensen under hela system-scenariot presenterar vi en smart kommunikationstjänst, AVCR (Advance Virtual ChannelReservation) för att tillhandahålla en motorväg till målpaket, vilket i hög grad kan min-ska deras förseningar i NoC. AVCR utnyttjar det faktum att vi kan veta eller förutsägadestinationen för vissa paket före deras ankomst till nätverksgränssnittet (Network inter-face, NI). Genom att utnyttja tidsintervallet innan ett paket är klart, etablerar AVCRen ände till ände motorväg från källan NI till destinationen NI. Denna motorväg byggsupp genom att reservera virtuell kanal (Virtual Channel, VC) resurser före målpaket-söverföringen och erbjuda prioriterade tjänster till flisar i den reserverade VC i wormholerouter. Dessutom föreslår vi också en tillträdeskontrollmetod i NoC med en centraliseradartificiellt neuronät (Artificial Neural Network, ANN) tillträdeskontroll, som kan förbättrasystemets prestanda genom att förutsäga den mest lämpliga injektionshastigheten för varjenod via nätverksprestationsinformationen. I onlinekontrollprocessen används en förbehan-dlingsenhet på data för att förenkla ANN-arkitekturen och göra förutsägningsresultatenmer korrekta. Baserat på den förbehandlade informationen bestämmer ANN-prediktornkontrollstrategin och sänder den till varje nod där tillträdeskontrollen kommer att tilläm-pas.För applikationsspecifika MPSoC:er fokuserar vi på att utveckla högpresterande NoCoch NI kompatibla med det gemensamma AMBA AXI4 protokoll. För att erbjuda möj-ligheten att använda AXI4-baserade processorer och kringutrustning i det on-chip baseradenätverkssystemet föreslår vi en hel systemarkitekturlösning för att göra AXI4 protokolletkompatibelt med den NoC-baserade kommunikation i det multikärnsystemet. På grundav den out-of-order överföring i NoC, som strider mot ordningskraven som anges i AXI4-protokollet, fokuserar vi i första hand på utformningen av transaktionsordningsenheterna,för att förverkliga en hög prestanda och låg kostnad-lösning på ordningskraven. Sedanfokuserar vi på NI och Quality of Service (QoS)-stödet i NoC. I vår design föreslås NI attgöra NoC-arkitekturen oberoende av AXI4-protokollet via meddelandeformatkonverteringmellan AXI4 signalformatet och paketformatet, vilket erbjuder NoC-designen hög flexi-bilitet. Den NoC-baserade kommunikationsarkitekturen är utformad för att stödja fleraQoS-schema med hög prestanda. NoC-systemet innehåller Time-Division Multiplexing(TDM) och VC-subnät för att tillämpa flera QoS-scheman på AXI4-signaler med olikaQoS-taggar och NI ansvarar för trafikdistribution mellan två subnät. Dessutom tillämpasen QoS-arvsmekanism i slav-sidan NI för att stödja QoS under paketets tur-returöverföringiNoC / <p>QC 20201008</p>

Page generated in 0.0749 seconds