81.
Automatic Design Space Exploration of Fault-tolerant Embedded Systems Architectures. Tierno, Antonio, 26 January 2023.
Embedded systems may have competing design objectives, such as maximizing reliability, increasing functional safety, minimizing product cost, and minimizing energy consumption. Architectures must therefore be configured to meet varied requirements and multiple design objectives. Reliability and safety in particular are receiving increasing attention, so the configuration of fault-tolerance mechanisms is a critical design decision. This work proposes a method for the automatic selection of appropriate fault-tolerant design patterns that simultaneously optimizes multiple objective functions. First, we present an exact method that leverages the power of Satisfiability Modulo Theories to encode the problem symbolically; it is based on a novel reliability assessment that forms part of the evaluation of alternative designs. We then empirically evaluate the performance of a near-optimal approximation variant that allows the problem to be solved even when the instance size makes the exact method intractable in terms of computing resources. The efficiency and scalability of the method are validated with a series of experiments of different sizes and characteristics, and by comparison with existing methods on a test problem widely used in the reliability optimization literature.
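The symbolic encoding idea can be illustrated with a toy sketch using Z3's Optimize engine. The component names, pattern costs, and reliability scores below are invented for the example, and a single weighted objective stands in for the thesis's multi-objective formulation:

```python
# Toy SMT-based selection of fault-tolerant design patterns (hypothetical data).
# Requires: pip install z3-solver
from z3 import Bool, If, Sum, Optimize, sat

components = ["sensor", "controller", "actuator"]
# pattern -> (cost, reliability score); values are illustrative only
patterns = {"none": (0, 10), "duplication": (3, 25), "tmr": (5, 40)}

opt = Optimize()
choice = {(c, p): Bool(f"{c}_{p}") for c in components for p in patterns}

for c in components:
    # exactly one fault-tolerance pattern per component
    opt.add(Sum([If(choice[(c, p)], 1, 0) for p in patterns]) == 1)

cost = Sum([If(choice[(c, p)], patterns[p][0], 0)
            for c in components for p in patterns])
reliability = Sum([If(choice[(c, p)], patterns[p][1], 0)
                   for c in components for p in patterns])

opt.add(cost <= 10)          # budget constraint
opt.maximize(reliability)    # single objective standing in for the multi-objective case

if opt.check() == sat:
    m = opt.model()
    for c in components:
        for p in patterns:
            if m.evaluate(choice[(c, p)]):
                print(c, "->", p)
```

Z3's Optimize engine also accepts several objectives at once, which is closer to the multi-objective setting described above.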
82.
Design, Analysis, and Applications of Approximate Arithmetic Modules. Ullah, Salim, 6 April 2022.
From the initial computing machines, Colossus of 1943 and ENIAC of 1945, to modern high-performance data centers and Internet of Things (IoT) devices, four design goals, i.e., high performance, energy efficiency, resource utilization, and ease of programmability, have remained a beacon of development for the computing industry. During this period, the computing industry has exploited the advantages of technology scaling and microarchitectural enhancements to achieve these goals. However, with the end of Dennard scaling, these techniques offer diminishing energy and performance advantages. Therefore, it is necessary to explore alternative techniques for satisfying the computational and energy requirements of modern applications. Towards this end, one promising technique is analyzing and relaxing the strict notion of correctness in various layers of the computation stack. Most modern applications across the computing spectrum, from data centers to IoT devices, interact with and analyze real-world data and make decisions accordingly. These applications are broadly classified as Recognition, Mining, and Synthesis (RMS) applications. Instead of producing a single golden answer, they produce several feasible answers and possess an inherent error resilience to the inexactness of the processed data and the corresponding operations. Exploiting this inherent error resilience, the paradigm of Approximate Computing relaxes the strict notion of computational correctness to realize high-performance and energy-efficient systems with outputs of acceptable quality.
Prior works on circuit-level approximations have mainly focused on Application-Specific Integrated Circuits (ASICs). However, ASIC-based solutions suffer from long time-to-market and high-cost development cycles. These limitations can be overcome by utilizing the reconfigurable nature of Field Programmable Gate Arrays (FPGAs). However, due to architectural differences between ASICs and FPGAs, applying ASIC-based approximation techniques to FPGA-based systems does not yield proportional performance and energy gains. Therefore, to exploit the principles of approximate computing in FPGA-based hardware accelerators for error-resilient applications, FPGA-optimized approximation techniques are required. Further, most state-of-the-art approximate arithmetic operators lack a generic approximation methodology for implementing new approximate designs as an application's accuracy and performance requirements change. These works also lack a methodology whereby a machine learning model can be used to correlate an approximate operator with its impact on the output quality of an application. This thesis addresses these research challenges by designing and exploring FPGA-optimized, logic-based approximate arithmetic operators. As multiplication is one of the most computationally complex and frequently used arithmetic operations in modern applications, such as Artificial Neural Networks (ANNs), it is the focus of most of the approximation techniques proposed in this thesis.
The primary focus of the work is to provide a framework for generating FPGA-optimized approximate arithmetic operators and efficient techniques to explore approximate operators for implementing hardware accelerators for error-resilient applications.
Towards this end, we first present various designs of resource-optimized, high-performance, and energy-efficient accurate multipliers. Although modern FPGAs host high-performance DSP blocks to perform multiplication and other arithmetic operations, our analysis and results show that the orthogonal approach of having resource-efficient and high-performance multipliers is necessary for implementing high-performance accelerators. Due to the differences in the type of data processed by various applications, the thesis presents individual designs for unsigned, signed, and constant multipliers. Compared to the multiplier IPs provided by the FPGA Synthesis tool, our proposed designs provide significant performance gains. We then explore the designed accurate multipliers and provide a library of approximate unsigned/signed multipliers. The proposed approximations target the reduction in the total utilized resources, critical path delay, and energy consumption of the multipliers. We have explored various statistical error metrics to characterize the approximation-induced accuracy degradation of the approximate multipliers. We have also utilized the designed multipliers in various error-resilient applications to evaluate their impact on applications' output quality and performance.
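To make the error characterization concrete, here is a small sketch (not one of the proposed designs) that exhaustively evaluates a naive 4x4 truncated multiplier, which simply drops low-order result bits, using two common statistical error metrics: mean error distance (MED) and mean relative error distance (MRED).

```python
# Exhaustive error characterization of a toy truncated 4x4 multiplier.
def approx_mul4(a: int, b: int, drop: int = 2) -> int:
    """Truncate the 'drop' least-significant result bits (a simple, generic
    approximation; not one of the thesis's logic-optimized designs)."""
    return ((a * b) >> drop) << drop

errors, rel_errors = [], []
for a in range(16):
    for b in range(16):
        exact = a * b
        ed = abs(exact - approx_mul4(a, b))         # error distance
        errors.append(ed)
        rel_errors.append(ed / exact if exact else 0.0)

print("MED  =", sum(errors) / len(errors))          # mean error distance
print("MRED =", sum(rel_errors) / len(rel_errors))  # mean relative error distance
```

The thesis's designs target the FPGA LUT structure directly; this sketch only shows how such error metrics are computed.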
Based on our analysis of the designed approximate multipliers, we identify the need for a framework to design application-specific approximate arithmetic operators. An application-specific approximate arithmetic operator intends to implement only the logic that can satisfy the application's overall output accuracy and performance constraints.
Towards this end, we present a generic design methodology for implementing FPGA-based application-specific approximate arithmetic operators from their accurate implementations according to the applications' accuracy and performance requirements. In this regard, we utilize various machine learning models to identify feasible approximate arithmetic configurations for various applications. We also utilize different machine learning models and optimization techniques to efficiently explore the large design space of individual operators and their utilization in various applications. In this thesis, we have used the proposed methodology to design approximate adders and multipliers.
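The flavor of such ML-assisted exploration can be sketched as follows: a surrogate model is trained on a few synthesized configurations and then used to rank untried ones, so that only promising candidates are actually synthesized. Everything here (the feature encoding, the synthetic data, the model choice) is a placeholder, not the thesis's actual setup.

```python
# Surrogate-assisted design space exploration (illustrative placeholder).
# Requires: pip install scikit-learn numpy
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Each configuration: a bit-vector deciding which partial products to keep.
space = rng.integers(0, 2, size=(500, 16))          # toy design space
seen = rng.choice(len(space), size=40, replace=False)

def measured_quality(cfg):
    # Stand-in for synthesizing a config and running the application on it.
    return cfg.sum() - 0.1 * (cfg[:4].sum() ** 2)

y = np.array([measured_quality(space[i]) for i in seen])

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(space[seen], y)

# Rank all configurations by predicted quality; synthesize only the top few.
pred = model.predict(space)
best = np.argsort(pred)[::-1][:5]
print("candidates to synthesize next:", best)
```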
This thesis also explores other layers of the computation stack (cross-layer) for possible approximations to satisfy an application's accuracy and performance requirements. Towards this end, we first present a low bit-width and highly accurate quantization scheme for pre-trained Deep Neural Networks (DNNs). The proposed quantization scheme does not require re-training (fine-tuning of the parameters) after quantization. We also present a resource-efficient FPGA-based multiplier that utilizes our proposed quantization scheme. Finally, we present a framework to allow the intelligent exploration and highly accurate identification of feasible design points in the large design space enabled by cross-layer approximations. The proposed framework utilizes a novel Polynomial Regression (PR)-based method to model approximate arithmetic operators. The PR-based representation enables machine learning models to better correlate an approximate operator's coefficients with their impact on an application's output quality.

Contents:
1. Introduction
1.1 Inherent Error Resilience of Applications
1.2 Approximate Computing Paradigm
1.2.1 Software Layer Approximation
1.2.2 Architecture Layer Approximation
1.2.3 Circuit Layer Approximation
1.3 Problem Statement
1.4 Focus of the Thesis
1.5 Key Contributions and Thesis Overview
2. Preliminaries
2.1 Xilinx FPGA Slice Structure
2.2 Multiplication Algorithms
2.2.1 Baugh-Wooley’s Multiplication Algorithm
2.2.2 Booth’s Multiplication Algorithm
2.2.3 Sign Extension for Booth’s Multiplier
2.3 Statistical Error Metrics
2.4 Design Space Exploration and Optimization Techniques
2.4.1 Genetic Algorithm
2.4.2 Bayesian Optimization
2.5 Artificial Neural Networks
3. Accurate Multipliers
3.1 Introduction
3.2 Related Work
3.3 Unsigned Multiplier Architecture
3.4 Motivation for Signed Multipliers
3.5 Baugh-Wooley’s Multiplier
3.6 Booth’s Algorithm-based Signed Multipliers
3.6.1 Booth-Mult Design
3.6.2 Booth-Opt Design
3.6.3 Booth-Par Design
3.7 Constant Multipliers
3.8 Results and Discussion
3.8.1 Experimental Setup and Tool Flow
3.8.2 Performance comparison of the proposed accurate unsigned multiplier
3.8.3 Performance comparison of the proposed accurate signed multiplier with the state-of-the-art accurate multipliers
3.8.4 Performance comparison of the proposed constant multiplier with the state-of-the-art accurate multipliers
3.9 Conclusion
4. Approximate Multipliers
4.1 Introduction
4.2 Related Work
4.3 Unsigned Approximate Multipliers
4.3.1 Approximate 4 × 4 Multiplier (Approx-1)
4.3.2 Approximate 4 × 4 Multiplier (Approx-2)
4.3.3 Approximate 4 × 4 Multiplier (Approx-3)
4.4 Designing Higher Order Approximate Unsigned Multipliers
4.4.1 Accurate Adders for Implementing 8 × 8 Approximate Multipliers from 4 × 4 Approximate Multipliers
4.4.2 Approximate Adders for Implementing Higher-order Approximate Multipliers
4.5 Approximate Signed Multipliers (Booth-Approx)
4.6 Results and Discussion
4.6.1 Experimental Setup and Tool Flow
4.6.2 Evaluation of the Proposed Approximate Unsigned Multipliers
4.6.3 Evaluation of the Proposed Approximate Signed Multiplier
4.7 Conclusion
5. Designing Application-specific Approximate Operators
5.1 Introduction
5.2 Related Work
5.3 Modeling Approximate Arithmetic Operators
5.3.1 Accurate Multiplier Design
5.3.2 Approximation Methodology
5.3.3 Approximate Adders
5.4 DSE for FPGA-based Approximate Operators Synthesis
5.4.1 DSE using Bayesian Optimization
5.4.2 MOEA-based Optimization
5.4.3 Machine Learning Models for DSE
5.5 Results and Discussion
5.5.1 Experimental Setup and Tool Flow
5.5.2 Accuracy-Performance Analysis of Approximate Adders
5.5.3 Accuracy-Performance Analysis of Approximate Multipliers
5.5.4 AppAxO MBO
5.5.5 ML Modeling
5.5.6 DSE using ML Models
5.5.7 Proposed Approximate Operators
5.6 Conclusion
6. Quantization of Pre-trained Deep Neural Networks
6.1 Introduction
6.2 Related Work
6.2.1 Commonly Used Quantization Techniques
6.3 Proposed Quantization Techniques
6.3.1 L2L: Log_2_Lead Quantization
6.3.2 ALigN: Adaptive Log_2_Lead Quantization
6.3.3 Quantitative Analysis of the Proposed Quantization Schemes
6.3.4 Proposed Quantization Technique-based Multiplier
6.4 Results and Discussion
6.4.1 Experimental Setup and Tool Flow
6.4.2 Image Classification
6.4.3 Semantic Segmentation
6.4.4 Hardware Implementation Results
6.5 Conclusion
7. A Framework for Cross-layer Approximations
7.1 Introduction
7.2 Related Work
7.3 Error-analysis of approximate arithmetic units
7.3.1 Application Independent Error-analysis of Approximate Multipliers
7.3.2 Application Specific Error Analysis
7.4 Accelerator Performance Estimation
7.5 DSE Methodology
7.6 Results and Discussion
7.6.1 Experimental Setup and Tool Flow
7.6.2 Behavioral Analysis
7.6.3 Accelerator Performance Estimation
7.6.4 DSE Performance
7.7 Conclusion
8. Conclusions and Future Work
83.
Processor design-space exploration through fast simulation / Exploration de l'espace de conception de processeurs via simulation accélérée. Khan, Taj Muhammad, 12 May 2011.
Simulation is a vital tool used by architects to develop new architectures. However, because of the complexity of modern architectures and the length of recent benchmarks, detailed simulation of programs can take extremely long times. This impedes the exploration of the processor design space that architects must perform to find the optimal configuration of processor parameters. Sampling is one technique that reduces simulation time without adversely affecting the accuracy of the results. It exploits the fact that a program's execution is composed of repeating code regions, or phases: rather than simulating the entire program, each phase is simulated only once, and the performance of the whole program is computed from the per-phase results. Two important questions then arise: which parts of the program should be simulated, and how is the state of the system restored (warmed up) before each sample? For the first question there are two approaches: representative sampling, which analyzes the program's execution in terms of phases and simulates each phase once, and statistical sampling, which picks the samples at random. For the second, recently developed techniques warm up the system adaptively, according to the needs of the code region being simulated; however, sample-selection techniques either ignore warm-up entirely or propose alternatives requiring substantial simulator modifications, and adaptive warm-up techniques are incompatible with most sampling schemes. In this thesis we tackle the problem of reconciling state-of-the-art warm-up techniques with the latest sampling mechanisms, with the triple objective of keeping user effort minimal, achieving good accuracy, and being agnostic to software and hardware changes. We show that both representative and statistical sampling can be adapted to use warm-up mechanisms that accommodate the underlying architecture's warm-up requirements on the fly. Our experimental results show accuracy and speed comparable to the latest research, while freeing the user from warm-up concerns and hiding the details of simulation. We also found that statistical sampling gives better results than representative sampling, and we leverage statistical calculations to provide an estimate of the robustness of the final results.
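The statistical machinery mentioned above can be illustrated with a short sketch: sample simulation intervals at random, estimate the program's mean CPI from the sample, and attach a confidence interval that quantifies the robustness of the estimate. The per-interval CPI trace here is synthetic; in practice each sampled value would come from a detailed, warmed-up simulation of one interval.

```python
# Statistical sampling with a confidence interval on the estimated CPI.
import math
import random

random.seed(0)
# Synthetic per-interval CPI values standing in for detailed simulation results.
program = [1.0 + 0.3 * math.sin(i / 50.0) + random.gauss(0, 0.05)
           for i in range(100_000)]

sample = random.sample(program, k=200)   # simulate only 200 random intervals
n = len(sample)
mean = sum(sample) / n
var = sum((x - mean) ** 2 for x in sample) / (n - 1)
sem = math.sqrt(var / n)                 # standard error of the mean

# 95% confidence interval (normal approximation, valid for large samples)
lo, hi = mean - 1.96 * sem, mean + 1.96 * sem
print(f"estimated CPI = {mean:.4f}, 95% CI = [{lo:.4f}, {hi:.4f}]")
print(f"true CPI      = {sum(program) / len(program):.4f}")
```

With representative sampling, the sample would instead contain one interval per detected phase, weighted by how often that phase occurs.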
84.
Dynamic instruction set extension of microprocessors with embedded FPGAs. Bauer, Heiner, 13 April 2017.
Increasingly complex applications and recent shifts in technology scaling have created a large demand for microprocessors which can perform tasks more quickly and more energy-efficiently. Conventional microarchitectures exploit multiple levels of parallelism to increase instruction throughput and use application-specific instruction sets or hardware accelerators to increase energy efficiency. Reconfigurable microprocessors adopt the same principle of providing application-specific hardware, however, with the significant advantage of post-fabrication flexibility. Not only does this offer similar gains in performance, but also the flexibility to configure each device individually.
This thesis explored the benefit of a tightly coupled and fine-grained reconfigurable microprocessor. In contrast to previous research, a detailed design space exploration of logical architectures for island-style field programmable gate arrays (FPGAs) has been performed in the context of a commercial 22nm process technology. Other research projects either reused general-purpose architectures or spent little effort designing and characterizing custom fabrics, which are critical to system performance and to the practicality of frequently proposed high-level software techniques. Here, detailed circuit implementations and a custom area model were used to estimate the performance of over 200 different logical FPGA architectures with single-driver routing. Results of this exploration revealed tradeoffs and trends similar to those described by previous studies. The number of lookup table (LUT) inputs and the structure of the global routing network were shown to have a major impact on the area-delay product. However, the results suggested a much larger region of efficient architectures than previously reported. Finally, an architecture with 5-LUTs and 8 logic elements per cluster was selected. Modifications to the microprocessor, which was based on an industry-proven instruction set architecture, and to its software toolchain provided access to this embedded reconfigurable fabric via custom instructions. The baseline microprocessor was characterized with estimates from signoff data for a 28nm hardware implementation. A modified academic FPGA tool flow was used to transform Verilog implementations of custom instructions into a post-routing netlist with timing annotations. Simulation-based verification of the system was performed with a cycle-accurate processor model and diverse application benchmarks, ranging from signal processing over encryption to the computation of elementary functions.
For these benchmarks, a significant increase in performance, with speedups from 3 to 15 relative to the baseline microprocessor, was achieved with the extended instruction set. Except for one case, application speedup clearly outweighed the area overhead of the extended system, even though the modeled fabric architecture was primitive and contained no explicit arithmetic enhancements. Insights into fundamental tradeoffs of island-style FPGA architectures, the developed exploration flow, and a concrete cost model are relevant for the development of more advanced architectures. Hence, this work is a successful proof of concept and has laid the basis for further investigations into architectural extensions and physical implementations. Potential for further optimization was identified on multiple levels, and numerous directions for future research were described.
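The shape of such an exploration flow can be conveyed with a deliberately crude sketch: enumerate (LUT size, cluster size) pairs, score each with an area and a delay model, and rank by area-delay product. The cost functions below are invented placeholders and will not reproduce the thesis's 5-LUT result, which relied on detailed circuit implementations and a process-specific area model.

```python
# Sweep of logical FPGA architecture parameters with a toy area/delay model.
def area(k_lut: int, cluster: int) -> float:
    lut_area = 2 ** k_lut            # LUT SRAM cells grow exponentially with k
    local_mux = cluster * k_lut * 4  # crude local interconnect estimate
    return cluster * lut_area + local_mux

def delay(k_lut: int, cluster: int) -> float:
    logic_depth = 12 / k_lut         # bigger LUTs absorb more logic levels
    return logic_depth * (1.0 + 0.05 * cluster)

candidates = [(k, n) for k in range(3, 8) for n in (4, 6, 8, 10)]
ranked = sorted(candidates, key=lambda c: area(*c) * delay(*c))
for k, n in ranked[:3]:
    print(f"{k}-LUT, {n} elements/cluster: ADP = {area(k, n) * delay(k, n):.0f}")
```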
85.
Methods for parameterizing and exploring Pareto frontiers using barycentric coordinates. Daskilewicz, Matthew John, 8 April 2013.
The research objective of this dissertation is to create and demonstrate methods for parameterizing the Pareto frontiers of continuous multi-attribute design problems using barycentric coordinates, and in doing so, to enable intuitive exploration of optimal trade spaces. This work is enabled by two observations about Pareto frontiers that have not been previously addressed in the engineering design literature. First, the observation that the mapping between non-dominated designs and Pareto efficient response vectors is a bijection almost everywhere suggests that points on the Pareto frontier can be inverted to find their corresponding design variable vectors. Second, the observation that certain common classes of Pareto frontiers are topologically equivalent to simplices suggests that a barycentric coordinate system will be more useful for parameterizing the frontier than the Cartesian coordinate systems typically used to parameterize the design and objective spaces.
By defining such a coordinate system, the design problem may be reformulated from y = f(x) to (y,x) = g(p) where x is a vector of design variables, y is a vector of attributes and p is a vector of barycentric coordinates. Exploration of the design problem using p as the independent variables has the following desirable properties: 1) Every vector p corresponds to a particular Pareto efficient design, and every Pareto efficient design corresponds to a particular vector p. 2) The number of p-coordinates is equal to the number of attributes regardless of the number of design variables. 3) Each attribute y_i has a corresponding coordinate p_i such that increasing the value of p_i corresponds to a motion along the Pareto frontier that improves y_i monotonically.
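For the simplest case of k = 2 attributes, a barycentric coordinate p = (p1, p2) with p1 + p2 = 1 can be swept to walk along the frontier; the toy bi-objective problem below is convex, so each weighted-sum minimization lands on a distinct Pareto point. This is only a schematic of the coordinate idea, not the dissertation's construction methods.

```python
# Walking a Pareto frontier with a 2-attribute barycentric coordinate.
# Toy convex problem: minimize y1 = x^2 and y2 = (1 - x)^2 over x in [0, 1].
def pareto_point(p1: float):
    """Map barycentric coordinate (p1, 1 - p1) to a Pareto-efficient design."""
    p2 = 1.0 - p1
    # Closed-form minimizer of p1*x^2 + p2*(1-x)^2 (set the derivative to zero)
    x = p2 / (p1 + p2)
    return x, (x ** 2, (1.0 - x) ** 2)

for i in range(5):
    p1 = i / 4.0
    x, (y1, y2) = pareto_point(p1)
    print(f"p = ({p1:.2f}, {1 - p1:.2f}) -> x = {x:.3f}, y = ({y1:.3f}, {y2:.3f})")
```

Note that increasing p1 improves y1 monotonically, which is exactly property 3 above.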
The primary contribution of this work is the development of three methods for forming a barycentric coordinate system on the Pareto frontier, two of which are entirely original. The first method, named "non-domination level coordinates," constructs a coordinate system based on the (k-1)-attribute non-domination levels of a discretely sampled Pareto frontier. The second method is based on a modification to an existing "normal boundary intersection" multi-objective optimizer that adaptively redistributes its search basepoints in order to sample from the entire frontier uniformly. The weights associated with each basepoint can then serve as a coordinate system on the frontier. The third method, named "Pareto simplex self-organizing maps," uses a modified self-organizing map training algorithm with a barycentric-grid node topology to iteratively conform a coordinate grid to the sampled Pareto frontier.
86.
Design, Implementation and Evaluation of a Configurable NoC for AcENoCs FPGA Accelerated Emulation Platform. Lotlikar, Swapnil Subhash, August 2010.
The heterogeneous nature and the demand for extensive parallel processing in modern applications have resulted in widespread use of multicore System-on-Chip (SoC) architectures. The emerging Network-on-Chip (NoC) architecture provides an energy-efficient and scalable communication solution for multicore SoCs, serving as a powerful replacement for traditional bus-based solutions. The key to successful realization of such architectures is a flexible, fast and robust emulation platform for fast design space exploration. In this research, we present the design and evaluation of a highly configurable NoC used in AcENoCs (Accelerated Emulation platform for NoCs), a flexible and cycle-accurate field programmable gate array (FPGA) emulation platform for validating NoC architectures. Along with the implementation details, we also discuss the various design optimizations and tradeoffs, and assess the performance improvements of AcENoCs over existing simulators and emulators. We design a hardware library consisting of routers and links using the Verilog hardware description language (HDL). The router is parameterized and has a configurable number of physical ports, virtual channels (VCs) and pipeline depth. A packet-switched NoC is constructed by connecting the routers in either a 2D-mesh or 2D-torus topology. The NoC is integrated in the AcENoCs platform and prototyped on a Xilinx Virtex-5 FPGA. The NoC was evaluated under various synthetic and realistic workloads generated by AcENoCs' traffic generators implemented on the Xilinx MicroBlaze embedded processor. In order to validate the NoC design, performance metrics like average latency and throughput were measured and compared against results obtained using standard network simulators. FPGA implementation of the NoC using Xilinx tools indicated 76% LUT utilization for a 5x5 2D-mesh network. The VC allocator was found to be the single largest consumer of hardware resources within a router. The router design synthesized at frequencies of 135MHz, 124MHz and 109MHz for the 3-port, 4-port and 5-port configurations, respectively. The operational frequency of the router in the AcENoCs environment was limited only by the software execution latency, even though the hardware itself could be clocked at a much higher rate. The AcENoCs emulator showed speedups of 10000-12000X over HDL simulators and 5-15X over software simulators, without sacrificing cycle accuracy.
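As a sketch of how such metrics are typically derived from emulation records (the record format here is invented, not AcENoCs' actual output), average latency and throughput can be computed as follows.

```python
# Computing average packet latency and throughput from emulation records.
# Each record: (injection_cycle, ejection_cycle, flits); format is hypothetical.
records = [(10, 42, 4), (12, 55, 4), (30, 61, 4), (33, 80, 4)]

latencies = [eject - inject for inject, eject, _ in records]
avg_latency = sum(latencies) / len(latencies)          # cycles per packet

elapsed = max(e for _, e, _ in records) - min(i for i, _, _ in records)
throughput = sum(f for _, _, f in records) / elapsed   # flits per cycle

print(f"average latency = {avg_latency:.1f} cycles")
print(f"throughput      = {throughput:.3f} flits/cycle")
```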
87.
A Method for Optimised Allocation of System Architectures with Real-time Constraints. Ventovaara, Marcus; Hasanbegović, Arman, January 2018.
Optimised allocation of system architectures is a well-researched area, as it can greatly reduce the development cost of systems and increase performance and reliability in their respective applications. In conjunction with the recent shift from federated to integrated architectures in automotive, and the increasing complexity of computer systems in terms of both software and hardware, the applications of design space exploration and optimised allocation of system architectures are of great interest. This thesis proposes a method to derive architectures and their allocations for systems with real-time constraints. The method implements integer linear programming to solve for an optimised allocation of system architectures according to a set of linear constraints, while taking resource requirements, communication dependencies, and manual design choices into account. Additionally, this thesis describes and evaluates an industrial use case of the method, wherein the timing characteristics of a system were evaluated and the method was applied to simultaneously derive a system architecture and an optimised allocation of that architecture. This thesis presents evidence and validations that suggest the viability of the method and its use case in an industrial setting. The work in this thesis sets a precedent for future research and development, as well as future applications of the method in both industry and academia.
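A minimal sketch of such an ILP formulation (using the PuLP modeling library; the task set, utilizations, and cost function are invented, not the thesis's model) might look like this: binary variables map each task to one node, a utilization bound on each node stands in for the real-time schedulability constraints, and the objective penalizes separating communicating tasks.

```python
# Toy ILP for allocating tasks to processing nodes (pip install pulp).
import pulp

tasks = {"t1": 0.3, "t2": 0.4, "t3": 0.2, "t4": 0.5}   # CPU utilizations
nodes = ["n1", "n2"]
comm = [("t1", "t2"), ("t3", "t4")]                     # communicating pairs

prob = pulp.LpProblem("allocation", pulp.LpMinimize)
x = pulp.LpVariable.dicts("x", (tasks, nodes), cat="Binary")
# split[i] = 1 if communicating pair i is placed on different nodes
split = pulp.LpVariable.dicts("split", range(len(comm)), cat="Binary")

for t in tasks:                       # every task on exactly one node
    prob += pulp.lpSum(x[t][n] for n in nodes) == 1
for n in nodes:                       # schedulability proxy: utilization bound
    prob += pulp.lpSum(tasks[t] * x[t][n] for t in tasks) <= 0.8
for i, (a, b) in enumerate(comm):     # linearization of "a and b separated"
    for n in nodes:
        prob += split[i] >= x[a][n] - x[b][n]

prob += pulp.lpSum(split)             # minimize inter-node communication
prob.solve(pulp.PULP_CBC_CMD(msg=False))
for t in tasks:
    print(t, "->", next(n for n in nodes if pulp.value(x[t][n]) > 0.5))
```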
88.
High level design and control of adaptive multiprocessor system-on-chips / Conception et contrôle de haut niveau pour les systèmes sur puce multiprocesseurs adaptatifs. An, Xin, 16 October 2013.
The design of modern embedded systems is getting more and more complex, as more functionality is integrated into these systems. At the same time, in order to meet the computational requirements while keeping power consumption low, MPSoCs have emerged as the main solution for such embedded systems. Furthermore, embedded systems are becoming more and more adaptive, as adaptivity can bring a number of benefits, such as software flexibility and energy efficiency. This thesis targets the safe design of such adaptive MPSoCs. First, each system configuration must be analyzed concerning its functional and non-functional properties. We present an abstract design and analysis framework, which allows for faster and more cost-effective implementation decisions. This framework is intended as an intermediate reasoning support for system-level software/hardware co-design environments. It can prune the design space at its largest and identify candidate design solutions in a fast and efficient way. In the framework, we use an abstract clock-based encoding to model system behaviors. Different mapping and scheduling scenarios of applications on MPSoCs are analyzed via clock traces representing system simulations. Among the properties of interest are functional behavioral correctness, temporal performance and energy consumption. Second, the reconfiguration management of adaptive MPSoCs must be addressed. We are especially interested in MPSoCs implemented on reconfigurable hardware architectures (i.e., FPGA fabrics), which provide good flexibility and computational efficiency for adaptive MPSoCs. We propose a general design framework based on the discrete controller synthesis (DCS) technique to address this issue. The main advantage of this technique is that it allows the automatic synthesis of a controller w.r.t. a given specification of control objectives. In the framework, the system reconfiguration behavior is modeled in terms of synchronous parallel automata. The reconfiguration management computation problem w.r.t. multiple objectives regarding, e.g., resource usage, performance and power consumption is encoded as a DCS problem. The existing BZR programming language and Sigali tool are employed to perform DCS and generate a controller that satisfies the system requirements. Finally, we investigate two different ways of combining the two proposed design frameworks for adaptive MPSoCs. First, they are combined to construct a complete design flow for adaptive MPSoCs. Second, they are combined to show how the run-time manager designed with the second framework can be integrated into the first framework, so that high-level simulations can be performed to assess the run-time manager.
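The essence of discrete controller synthesis can be conveyed with a toy safety-control sketch (unrelated to the actual BZR/Sigali tooling): starting from the full state set, repeatedly discard states that are forbidden or from which an uncontrollable transition can escape into discarded territory; the controller then simply disables controllable transitions that would leave the surviving set. The states and transitions below are invented.

```python
# Toy discrete controller synthesis: fixed point for a safety objective.
# 'u' edges are uncontrollable (environment), 'c' edges are controllable.
states = {"idle", "low", "high", "overload"}
trans = [  # (source, target, kind)
    ("idle", "low", "c"), ("low", "high", "c"),
    ("high", "overload", "u"),     # environment may overload a busy node
    ("overload", "low", "u"),
    ("high", "low", "c"), ("low", "idle", "c"),
]
bad = {"overload"}

safe = set(states) - bad
changed = True
while changed:                     # iterate until the safe set stabilizes
    changed = False
    for s, t, kind in trans:
        # A state is unsafe if an *uncontrollable* step can leave the safe set.
        if kind == "u" and s in safe and t not in safe:
            safe.remove(s)
            changed = True

# Controller: allow only controllable moves that stay inside the safe set.
allowed = [(s, t) for s, t, k in trans
           if k == "c" and s in safe and t in safe]
print("safe states:", safe)
print("allowed controllable moves:", allowed)
```

Here the synthesized controller keeps the system between "idle" and "low", because entering "high" would expose it to an unavoidable overload.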
89.
Visualisation d'information pour une décision informée en exploration d'espace de conception par shopping / Information visualization for an informed decision in design space exploration by shopping. Abi Akle, Audrey, 10 July 2015.
In design space exploration, the data resulting from the simulation of a large number of design alternatives can lead to information overload when a single good design solution must be chosen. Design space exploration resembles a multi-criteria design optimization method operated in manual mode, in which appropriate tools for multi-dimensional data visualization are employed. The designer follows a three-phase process (discovery, optimization, selection) according to a paradigm called Design by Shopping. Exploring the design space helps designers gain insight into the feasible and infeasible regions of the solution space and into the solutions presenting good trade-offs. Designers learn during these graphical data manipulations, and the selection of an optimal solution is then based on a so-called informed decision. The objective of this research is to assess the performance of graphs for design space exploration across the three phases of the Design by Shopping process. To this end, five graphs, identified as potentially efficient, are tested in two experiments. In the first, thirty participants tested three graphs, in three design scenarios in which one car must be chosen out of forty according to stated preferences, for the selection phase in a multi-attribute situation. A response quality index is proposed to compute the quality of the designer's choice for each of the three scenarios, the optimal solution under this index being compared with the ones resulting from the graphical manipulations. In the second experiment, forty-two novice designers solved two design problems with three graphs; here the performance of the graphs is tested for informed decision-making and for the three phases of the process in a multi-objective situation. The results reveal a suitable graph for each phase of Design by Shopping: the Scatter Plot Matrix for the discovery phase and for informed decision-making, the Simple Scatter Plot for the optimization phase, and the Parallel Coordinate Plot for the selection phase in both multi-attribute and multi-objective situations.
90.
Emulation platform synthesis and NoC evaluation for embedded systems: towards next generation networks / Synthèse de plateformes d'émulation et évaluation de NoCs pour les systèmes embarqués : vers les réseaux du futur. Alcantara de Lima, Otavio Junior, 9 September 2015.
The ever-increasing complexity of many-core embedded system applications demands a flexible communication structure capable of supporting different traffic requirements at run-time. Networks-on-Chip (NoCs) have emerged as the most promising communication technology for modern many-core SoCs (Systems-on-Chip), since they scale better than other solutions such as buses and point-to-point connections. As NoCs become the de facto standard for on-chip systems, NoC performance evaluation tools become critical for SoC design. FPGA-based emulation platforms accelerate NoC benchmarking as well as design space exploration, offering high accuracy and low execution time relative to NoC simulators. An FPGA-based emulation platform is composed of tens or hundreds of distributed components that must be managed in a timely manner in order to execute an evaluation scenario. There is, however, a lack of standard protocols to drive FPGA-based NoC emulators. Such protocols could ease the integration of emulation components developed by different designers, enable the configuration of emulation nodes without FPGA re-synthesis, and support the extraction of emulation results. NoC hardware emulation is also challenging in itself: it is important to validate new NoC architectures with realistic workloads, because these provide much more accurate results, so the generation of application traffic patterns is a key concern. Dependency-aware traces are an appealing solution for generating realistic traffic workloads; they are more accurate than ordinary traces for a broad range of NoC architectures because they contain packet dependency information. However, they tend to be bigger than the original traces, which demands more FPGA resources. This thesis targets the synthesis of FPGA-based NoC emulation platforms for future multi-core embedded systems. We investigate strategies to generate realistic traffic patterns for NoCs emulated on FPGAs, as well as the management of the emulation platform using standard protocols inspired by computer network protocols. One contribution of this thesis is a trace analysis framework which addresses the packet dependency extraction problem. The proposed framework analyzes traces from a message-passing application in order to build a Model of Computation (MoC) that reproduces the communicative behavior of an application node. A dependency-aware Traffic Generator (TG) is created from the proposed MoC; this TG generates the application's traffic pattern during an FPGA-based NoC emulation. Another contribution is a light version of SNMP (Simple Network Management Protocol) to manage an FPGA-based NoC emulation platform. An FPGA-based emulation platform architecture is proposed based on the principles of the SNMP protocol. This platform has a high-level interface to the emulation components provided by that protocol, which also eases the integration of emulation components created by different designers. The emulation platform and the protocol's capabilities are evaluated during a task mapping and mesh topology design space exploration. A prospective analysis of future NoC architectures is also a contribution of this thesis. In this analysis, a conceptual architecture of a future multi-core embedded system is used as a model to extract the requirements of such networks, and several networking mechanisms are proposed. The first is a congestion-aware routing algorithm, an adaptive algorithm that selects the output path for a given packet based on a simple prioritized scheme of sets of rules. Another is a fault-tolerant NoC based on bypass links. Finally, a congestion-control mechanism is proposed for the vertical links interconnecting the layers of a 3D NoC, based on the diffusion of congestion information by a piggyback protocol.
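The idea of a dependency-aware traffic generator can be sketched as follows (the trace format and node behavior are simplified assumptions, not the thesis's MoC): each packet is injected only once the packets it depends on have been received, so the generated traffic adapts to the network's actual timing instead of replaying fixed timestamps.

```python
# Dependency-aware traffic generation (simplified sketch).
# Each trace entry: packet id, destination node, and ids it must wait for.
trace = [
    {"id": 0, "dst": 3, "deps": []},
    {"id": 1, "dst": 2, "deps": []},
    {"id": 2, "dst": 3, "deps": [0]},      # reply to packet 0
    {"id": 3, "dst": 1, "deps": [1, 2]},   # needs both earlier exchanges
]

received = set()
pending = list(trace)
cycle = 0
while pending:
    ready = [p for p in pending if all(d in received for d in p["deps"])]
    for p in ready:
        # In a real emulator this would inject into the NoC, and the arrival
        # time would depend on network latency; here delivery is immediate.
        print(f"cycle {cycle}: inject packet {p['id']} -> node {p['dst']}")
        received.add(p["id"])
        pending.remove(p)
    cycle += 1
```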