Global ETD Search

21	Adaptive mpsoc architectures : principLes, methods and tools / Architectures multi-processeurs adaptatives : principes, méthodes et outils Marchesan Almeida, Gabriel 21 November 2011 (has links) Les systèmes multiprocesseurs sur puce (MPSoC) offrent des performances supérieures tout en conservant la flexibilité et la réutilisabilité grâce à la customisation du logiciel embarqué. Alors que la plupart de MPSoC sont aujourd'hui hétérogènes pour mieux répondre aux besoins des applications ciblées, les MPSoCs homogènes pourraient devenir dans un proche avenir une alternative viable apportant d'autres avantages tels que l'équilibrage de charge de l'exécution, la migration des tâches et l'ájustement de fréquence dynamique. Cette thèse s'appuie sur une plateforme MPSoC homogène, développée pour explorer techniques d'adaptation en ligne. Chaque processeur de ce système est compact et exécute un système d'exploitation préemptif qui surveille diverses métriques et est habilité à prendre des décisions de remapping grâce à des techniques de migration de code et du changement dynamique de la fréquence. Cette approche permet la mise en œuvre des capacités de raffinage d'application à l'exécution en fonction de différents critères. / Multiprocessor Systems-on-Chip (MPSoC) offer superior performance while maintaining flexibility and reusability thanks to software oriented personalization. While most MPSoCs are today heterogeneous for better meeting the targeted application requirements, homogeneous MPSoCs may become in a near future a viable alternative bringing other benefits such as run-time load balancing, task migration and dynamic frequency scaling. This thesis relies on a homogeneous NoC-based MPSoC platform developed for exploring scalable and adaptive on-line continuous mapping techniques. Each processor of this system is compact and runs a tiny preemptive operating system that monitors various metrics and is entitled to take remapping decisions through code migration techniques and dynamic frequency scaling. This approach that endows the architecture with decisional capabilities permits refining application implementation at run-time according to various criteria. MPSoC Architectures homogènes NoC Rtos Adaptable Vhdl MPSoC Homogeneous NoC Rtos Adaptive Vhdl
22	Communication et contrôle dans les architectures homogènes de circuits pour télécommunications / Communication and Control in homogeneous architectures for telecommunication design Jalier, Camille 05 July 2010 (has links) Les travaux de thèse s'intéressent à la problématique de contrôle et de communication dans le domaine de la conception des systèmes numériques embarqués pour les applications de télécommunication de quatrième génération. La complexité des applications couplée aux besoins de productivité croissants impose de repenser les méthodologies de conception et les architectures sous jacentes. Afin de lever ces verrous, nous proposons plusieurs contributions originales. En effet, une méthodologie d'exploration d'un espace de conception ainsi qu'une architecture basée sur des noeuds de traitements homogènes et flexibles interconnectés à travers un réseau sur silicium sont proposées. Chaque noeud de traitement possède plusieurs blocs visant à exécuter efficacement et dynamiquement les applications de télécommunication. Pour répondre aux contraintes de faible consommation, nous proposons plusieurs solutions innovantes afin de minimiser cette métrique notamment au travers de techniques de migration de tâches. / This PhD research aims to solve challenges about control and communication in the design of digital embedded systems for 4G telecom applications. The application complexity added to the increasing productivity gap force to think about new design methodologies and the underlying architectures. Several new research directions is proposed in this work. A methodology for design space exploration and a digital architecture based on homogeneous and flexible processing units interconnected by a Network-on-Chip is proposed. A processing unit is a cluster of DSPs controled by a MIPS processor to compute telecom applications. To meet low power constraints, we propose optimization techniques based on resource management including task migration. MPSOC 4G LTE multiprocesseur NOC DSP MPSOC 4G LTE multiprocessor NOC DSP
23	Acceleration of deep convolutional neural networks on multiprocessor system-on-chip Reiche Myrgård, Martin January 2019 (has links) In this master thesis some of the most promising existing frameworks and implementations of deep convolutional neural networks on multiprocessor system-on-chips (MPSoCs) are researched and evaluated. The thesis’ starting point was a previousthesis which evaluated possible deep learning models and frameworks for object detection on infra-red images conducted in the spring of 2018. In order to fit an existing deep convolutional neural network (DCNN) on a Multiple-Processor-System on Chip it needs modifications. Most DCNNs are trained on Graphic processing units (GPUs) with a bit width of 32 bit. This is not optimal for a platform with hard memory constraints such as the MPSoC which means it needs to be shortened. The optimal bit width depends on the network structure and requirements in terms of throughput and accuracy although most of the currently available object detection networks drop significantly when reduced below 6 bits width. After reducing the bit width, the network needs to be quantized and pruned for better memory usage. After quantization it can be implemented using one of many existing frameworks. This thesis focuses on Xilinx CHaiDNN and DNNWeaver V2 though it touches a little on revision, HLS4ML and DNNWeaver V1 as well. In conclusion the implementation of two network models on Xilinx Zynq UltraScale+ ZCU102 using CHaiDNN were evaluated. Conversion of existing network were done and quantization tested though not fully working. The results were a two to six times more power efficient implementation in comparison to GPU inference. Neurala nätverk MPSoC FPGA DCNN Embedded Systems Inbäddad systemteknik
24	Conception d'une architecture journalisée tolérante aux fautes pour un processeur à pile de données / Design of a fault-tolerant journalized architecture for a stack processor Amin, Mohsin 09 June 2011 (has links) Dans cette thèse, nous proposons une nouvelle approche pour la conception d'un processeur tolérant aux fautes. Celle-ci répond à plusieurs objectifs dont celui d'obtenir un niveau de protection élevé contre les erreurs transitoires et un compromis raisonnable entre performances temporelles et coût en surface. Le processeur résultant sera utilisé ultérieurement comme élément constitutif d'un système multiprocesseur sur puce (MPSoC) tolérant aux fautes. Les concepts mis en œuvre pour la tolérance aux fautes reposent sur l'emploi de techniques de détection concurrente d'erreurs et de recouvrement par réexécution. Les éléments centraux de la nouvelle architecture sont, un cœur de processeur à pile de données de type MISC (Minimal Instruction Set Computer) capable d'auto-détection d'erreurs, et un mécanisme matériel de journalisation chargé d'empêcher la propagation d'erreurs vers la mémoire centrale (supposée sûre) et de limiter l'impact du mécanisme de recouvrement sur les performances temporelles. L'approche méthodologique mise en œuvre repose sur la modélisation et la simulation selon différents modes et niveaux d'abstraction, le développement d'outils logiciels dédiées, et le prototypage sur des technologies FPGA. Les résultats, obtenus sans recherche d'optimisation poussée, montrent clairement la pertinence de l'approche proposée, en offrant un bon compromis entre protection et performances. En effet, comme le montrent les multiples campagnes d'injection d'erreurs, le niveau de tolérance au fautes est élevé avec 100% des erreurs simples détectées et recouvrées et environ 60% et 78% des erreurs doubles et triples. Le taux recouvrement reste raisonnable pour des erreurs à multiplicité plus élevée, étant encore de 36% pour des erreurs de multiplicité 8 / In this thesis, we propose a new approach to designing a fault tolerant processor. The methodology is addressing several goals including high level of protection against transient faults along with reasonable performance and area overhead trade-offs. The resulting fault-tolerant processor will be used as a building block in a fault tolerant MPSoC (Multi-Processor System-on-Chip) architecture. The concepts being used to achieve fault tolerance are based on concurrent detection and rollback error recovery techniques. The core elements in this architecture are a stack processor core from the MISC (Minimal Instruction Set Computer) class and a hardware journal in charge of preventing error propagation to the main memory (supposedly dependable) and limiting the impact of the rollback mechanism on time performance. The design methodology relies on modeling at different abstraction levels and simulating modes, developing dedicated software tools, and prototyping on FPGA technology. The results, obtained without seeking a thorough optimization, show clearly the relevance of the proposed approach, offering a good compromise in terms of protection and performance. Indeed, fault tolerance, as revealed by several error injection campaigns, prove to be high with 100% of errors being detected and recovered for single bit error patterns, and about 60% and 78% for double and triple bit error patterns, respectively. Furthermore, recovery rate is still acceptable for larger error patterns, with yet a recovery rate of 36%on 8 bit error patterns Tolérance aux fautes Processeurs à pile de données MPSoC Modélisation RTL
25	Automatic Communication Synthesis with Hardware Sharing for Multi-Processor SoC Design TAKADA, Hiroaki, TOMIYAMA, Hiroyuki, HONDA, Shinya, SHIBATA, Seiya, ANDO, Yuki 01 December 2010 (has links) No description available. MPSoC design space exploration hardware sharing system-level design
26	System-Level Synthesis of Dataplane Subsystems for MPSoCs January 2013 (has links) abstract: In recent years we have witnessed a shift towards multi-processor system-on-chips (MPSoCs) to address the demands of embedded devices (such as cell phones, GPS devices, luxury car features, etc.). Highly optimized MPSoCs are well-suited to tackle the complex application demands desired by the end user customer. These MPSoCs incorporate a constellation of heterogeneous processing elements (PEs) (general purpose PEs and application-specific integrated circuits (ASICS)). A typical MPSoC will be composed of a application processor, such as an ARM Coretex-A9 with cache coherent memory hierarchy, and several application sub-systems. Each of these sub-systems are composed of highly optimized instruction processors, graphics/DSP processors, and custom hardware accelerators. Typically, these sub-systems utilize scratchpad memories (SPM) rather than support cache coherency. The overall architecture is an integration of the various sub-systems through a high bandwidth system-level interconnect (such as a Network-on-Chip (NoC)). The shift to MPSoCs has been fueled by three major factors: demand for high performance, the use of component libraries, and short design turn around time. As customers continue to desire more and more complex applications on their embedded devices the performance demand for these devices continues to increase. Designers have turned to using MPSoCs to address this demand. By using pre-made IP libraries designers can quickly piece together a MPSoC that will meet the application demands of the end user with minimal time spent designing new hardware. Additionally, the use of MPSoCs allows designers to generate new devices very quickly and thus reducing the time to market. In this work, a complete MPSoC synthesis design flow is presented. We first present a technique \cite{leary1_intro} to address the synthesis of the interconnect architecture (particularly Network-on-Chip (NoC)). We then address the synthesis of the memory architecture of a MPSoC sub-system \cite{leary2_intro}. Lastly, we present a co-synthesis technique to generate the functional and memory architectures simultaneously. The validity and quality of each synthesis technique is demonstrated through extensive experimentation. / Dissertation/Thesis / Ph.D. Computer Science 2013 Computer science Computer Science Dataplane MPSoC Network-on-Chip PhD Synthesis
27	Constrained Energy Optimization in Heterogeneous Platforms using Generalized Scaling Models January 2014 (has links) abstract: Mobile platforms are becoming highly heterogeneous by combining a powerful multiprocessor system-on-chip (MpSoC) with numerous resources including display, memory, power management IC (PMIC), battery and wireless modems into a compact package. Furthermore, the MpSoC itself is a heterogeneous resource that integrates many processing elements such as CPU cores, GPU, video, image, and audio processors. As a result, optimization approaches targeting mobile computing needs to consider the platform at various levels of granularity. Platform energy consumption and responsiveness are two major considerations for mobile systems since they determine the battery life and user satisfaction, respectively. In this work, the models for power consumption, response time, and energy consumption of heterogeneous mobile platforms are presented. Then, these models are used to optimize the energy consumption of baseline platforms under power, response time, and temperature constraints with and without introducing new resources. It is shown, the optimal design choices depend on dynamic power management algorithm, and adding new resources is more energy efficient than scaling existing resources alone. The framework is verified through actual experiments on Qualcomm Snapdragon 800 based tablet MDP/T. Furthermore, usage of the framework at both design and runtime optimization is also presented. / Dissertation/Thesis / Masters Thesis Electrical Engineering 2014 Electrical engineering Computer engineering Energy consumption Mobile platoforms MpSoC Optimization
28	Low Power Design Methodology and Photonics Networks on Chip for Multiprocessor System on Chip / Méthodologie de conception basse consommation et réseaux optiques sur puces pour multiprocesseur système sur puce Hamwi, Khawla 30 May 2013 (has links) Les systèmes multiprocesseurs sur puce (MPSoC)s sont fortement émergent comme principaux composants dans les systèmes embarqués à hautes performances. La principale complexité dans la conception et l’implémentation des MPSoC est la communication entre les cœurs. Les réseaux sur puce (NoC) sont considérés comme la solution pour cet effet. ITRS prédit que des centaines de cœurs seront utilisées dans la génération future de système sur puce (SoC), ce qui va donc augmenter les coûts de l’évolutivité, de bande passante et de l’implémentation des réseaux sur puce (NoC)s. Ces problèmes sont présents dans diverses tendances technologiques dans le domaine des semiconducteurs et de la photonique. Cette thèse préconise l'utilisation de la synthèse NoC comme l'approche la plus appropriée pour exploiter ces tendances technologiques et rattraper les exigences des applications. A partir de plusieurs méthodologies de conception basées sur la technologie FPGA et des techniques d'estimation basse énergie (HLS) pour plusieurs IPs, nous proposons une implémentation ASIC basée sur la technologie 3D Tezzaron. Multi-FPGA technologie est utilisée pour valider la conception MPSoC avec 64 processeurs Butterfly NoC. La synthèse NoC est basée sur le regroupement de maîtres et d’esclaves générant des architectures asymétriques avec un soutien approprié pour les demandes très haut débit par optique NoC (ONoC), tandis que les demandes de bande passante inférieure sont traitées par électronique NoC. Une programmation linéaire est proposée comme une solution pour la synthèse NoC. / Multiprocessor systems on chip (MPSoC)s are strongly emerging as main components in high performance embedded systems. Several challenges can be determined in MPSoC design like the challenge which comes from interconnect infrastructure. Network-on-Chip (NOC) with multiple constraints to be satisfied is a promising solution for these challenges. ITRS predicts that hundreds of cores will be used in future generation system on chip (SoC) and thus raises the issue of scalability, bandwidth and implementation costs for NoCs. These issues are raised within the various technological trends in semiconductors and photonics. This PhD thesis advocates the use of NoC synthesis as the most appropriate approach to exploit these technological trends catch up with the applications requirements. Starting with several design methodologies based on FPGA technology and low power estimation techniques (HLS) for several IPs, we propose an ASIC implementation based on 3D Tezzaron technology. Multi-FPGA technology is used to validate MPSoC design with up to 64 processors with Butterfly NoC. NoC synthesis is based on a clustering of masters and slaves generating asymmetric architectures with appropriate support for very high bandwidth requests through Optical NoC (ONoC) while lower bandwidth requests are processed by electronic NoC. A linear programming is proposed as a solution to the NoC synthesis. Consommation d'énergie MPSoC NoC Optique NoC Synthèse NoC hybride Power consumption MPSoC NoC Optical NoC Hybrid NoC synthesis
29	Une approche de fouille de données pour le débogage temporel des applications embarquées de streaming / Data Mining Approach to Temporal Debugging of Embedded Streaming Applications Iegorov, Oleg 08 April 2016 (has links) Le déboggage des applications de streaming qui s'exécutent sur les systèmes embarqués multimédia est l'un des domaines les plus exigeants dans le développement de logiciel embarqué. Les nouvelles générations de materiel embarqué introduisent de nouvelles systèmes sur une puce, qui fait que les développeurs du logiciel doivent adapter leurs logiciels aux nouvelles platformes. Le logiciel embarqué doit non seulement fournir des résultats corrects mais aussi le faire en temps réel afin de respecter les propriétés de qualité de service (Quality-of-Service, QoS) du système. Lorsque les propriétés QoS ne sont pas respectées, des bugs temporels font leur apparition. Ces bugs se manifestent comme, par exemple, des glitches dans le flux vidéo ou des craquements dans le flux audio. Le déboggage temporel est en général difficile à effectuer car les bugs temporels n'ont pas souvent de rapport avec l'exactitude fonctionnelle du code des applications, ce qui rend les outils de débogage traditionels, comme GDB, peu utiles. Le non-respect des propriétés QoS peut provenir des interactions entre les applications, ou entre les applications et les processus systèmes. Par conséquent, le contexte d'exécution entier doit être pris en compte pour le déboggage temporel. Les avancements récents en collecte des traces d'exécution permettent aux développeurs de recueillir des traces et de les analyser après la fin d'exécution pour comprendre quelle activité système est responsable des bugs temporels. Cependant, les traces d'exécution ont une taille conséquente, ce qui demande aux devéloppeurs des connaissainces en analyse de données qu'ils n’ont souvent pas.Dans cette thèse, nous proposons SATM - une approche novatrice pour le déboggage temporel des applications de streaming. SATM repose sur la prémisse que les applications sont conçues avec le modèle dataflow, i.e. peuvent être représentées comme un graphe orienté où les données sont transmises entre des unités de calcul (fontions, modules, etc.) appelées "acteurs". Les acteurs doivent être exécutés de manière périodique afin de respecter les propriétés QoS représentées par les contraintes de temps-réél. Nous montrons qu'un acteur qui ne respecte pas de façon répétée sa période pendant l'exécution de l'application cause la violation des contraintes temps-reel de l'application. En pratique, SATM est un workflow d'analyse de données venant des traces d'exécution qui combine des mesures statistiques avec des algorithmes de fouille de données. SATM fournit une méthode automatique du débogage temporel des applications de streaming. Notre approche prend en entrée une trace d'exécution d'une application ayant une QoS basse ainsi qu'une liste de ses acteurs, et tout d'abord détecte des invocations des acteurs dans la trace. SATM découvre ensuite les périodes des acteurs ainsi que les séctions de la trace où la période n'a pas été respectée. Enfin, ces séctions sont analysées afin d'extraire des motifs de l'activité système qui différencient ces sections des autres séctions de la trace. De tels motifs peuvent donner des indices sur l'origine du problème temporel dans le systeme et sont rendus au devéloppeur. Plus précisément, nous représentons ces motifs comme des séquences contrastes minimales et nous étudions des différentes solutions pour fouiller ce type de motifs à partir des traces d'exécution.Enfin, nous montrons la capacité de SATM de détecter une perturbation temporelle injectée artificiellement dans un framework multimedia GStreamer, ainsi que des bugs temporels dans deux cas d'utilisation des applications de streaming industrielles provenant de la société STMicroelectronics. Nous fournissons également une analyse détaillée des algorithmes de fouille de motifs séquentiels appliqués sur les données venant des traces d'exécution, et nous expliquons pour quelle est la raison les algorithmes de pointe n'arrivent pas à fouiller les motifs séquentiels à partir des traces d'exécution de façon efficace. / Debugging streaming applications run on multimedia embedded systems found in modern consumer electronics (e.g. in set-top boxes, smartphones, etc) is one of the most challenging areas of embedded software development. With each generation of hardware, more powerful and complex Systems-on-Chip (SoC) are released, and developers constantly strive to adapt their applications to these new platforms. Embedded software must not only return correct results but also deliver these results on time in order to respect the Quality-of-Service (QoS) properties of the entire system. The non-respect of QoS properties lead to the appearance of temporal bugs which manifest themselves in multimedia embedded systems as, for example, glitches in the video or cracks in the sound. Temporal debugging proves to be tricky as temporal bugs are not related to the functional correctness of the code, thus making traditional GDB-like debuggers essentially useless. Violations of QoS properties can stem from complex interactions between a particular application and the system or other applications; the complete execution context must be, therefore, taken into account in order to perform temporal debugging. Recent advances in tracing technology allow software developers to capture a trace of the system's execution and to analyze it afterwards to understand which particular system activity is responsible for the violations of QoS properties. However, such traces have a large volume, and understanding them requires data analysis skills that are currently out of the scope of the developers' education.In this thesis, we propose SATM (Streaming Application Trace Miner) - a novel temporal debugging approach for embedded streaming applications. SATM is based on the premise that such applications are designed under the dataflow model of computation, i.e. as a directed graph where data flows between computational units called actors. In such setting, actors must be scheduled in a periodic way in order to meet QoS properties expressed as real-time constraints, e.g. displaying 30 video frames per second. We show that an actor which does not eventually respect its period at runtime causes the violation of the application’s real-time constraints. In practice, SATM is a data analysis workflow combining statistical measures and data mining algorithms. It provides an automatic solution to the problem of temporal debugging of streaming applications. Given an execution trace of a streaming application exhibiting low QoS as well as a list of its actors, SATM firstly determines exact actors’ invocations found in the trace. It then discovers the actors’ periods, as well as parts of the trace in which the periods are not respected. Those parts are further analyzed to extract patterns of system activity that differentiate them from other parts of the trace. Such patterns can give strong hints on the origin of the problem and are returned to the developer. More specifically, we represent those patterns as minimal contrast sequences and investigate various solutions to mine such sequences from execution trace data.Finally, we demonstrate SATM’s ability to detect both an artificial perturbation injected in an open source multimedia framework, as well as temporal bugs from two industrial use cases coming from STMicroelectronics. We also provide an extensive analysis of sequential pattern mining algorithms applied on execution trace data and explain why state-of-the-art algorithms fail to efficiently mine sequential patterns from real-world traces. MPSoC Analyse de Traces Fouille de Données Clustering Fouille de Motifs Séquentietiels MPSoC Trace Analysis Data Mining Clustering Sequential Pattern Mining 004
30	Gestion de l'activité et de la consommation dans les architectures multi-coeurs massivement parallèles / Activity and Power Management in Massively Parallel Multi-core Architectures Bizot, Gilles 25 October 2012 (has links) Les variabilités du processus de fabrication des technologies avancées (typ. < 32nm) sont de plus en plus difficile à maîtriser. Elles impactent plus sévèrement la fréquence de fonctionnement et la consommation d'énergie, et induisent de plus en plus de défaillances dans le circuit. Ceci est particulièrement vrai pour les MPSoCs, où le nombre de coeurs de calculs est très important. Les besoins (performances, fonctionnalités, faible consommation, tolérance aux fautes) ne cessent de croître et les caractéristiques hétérogènes (fréquence, énergie, défaillances) rendent difficile la mise en oeuvre de systèmes répondant à ces exigences. Ces travaux s'inscrivent dans l'optique de traiter ces problèmes pour des systèmes MPSoCs massivement parallèles, basés sur une topologie en maille 2D. Cette thèse propose une méthodologie automatisée qui permet le placement et l'ordonnancement d'applications dans les systèmes ciblés. Les aspects variabilité, consommation et performance sont pris en compte. D'autre part, cette thèse propose une technique de placement adaptatif tolérant aux fautes basée sur une stratégie de recouvrement des erreurs. Cette stratégie permet de garantir la terminaison de l'application en présence de défaillances, sans avoir recours à la prise de « check-points ». Cette technique est complété par des algorithmes adaptatifs distribués, prenant en compte la variabilité et la consommation d'énergie. / With the advanced technologies (typ. < 32nm), it is more and more difficult to control the manufacturing variabilities. It impacts more severely the working frequency and the consumed energy, and induces more and more failure inside the device. This is particularly true for MPSoC with a large number of computing cores. With the increasing needs (performance, functionalities, low power, fault tolerance) and heterogeneous characteristics (frequency, energy, failures) it becomes difficult to apply to systems able to meet these requirements. This work focus on this perspective to deal with these issues for the massively parallel MPSoC, based on 2D mesh topology. This thesis proposes an automated methodology, allowing the mapping and scheduling of application on the targeted system. It takes into account the variability, energy and computing power. Furthermore, this thesis proposes a fault tolerant adaptive mapping technique, paired with an original failure recovering strategy. This strategy allows to guarantee the termination of the application in the presence of failures, without the check-point requirement. The technique has been extended with an adaptive distributed algorithm, taking into account the manufacturing variability and aimed at reducing the consumed energy. MPSoC Faible consommation Ordonnancement Variabilité Tolérance aux fautes NoC MPSoC Low power Scheduling Variability Fault tolerance NoC

Search results