Global ETD Search

91	Custom floating-point arithmetic for integer processors : algorithms, implementation, and selection / Arithmétique à virgule flottante spécifique pour processeurs entiers : algorithmes, implémentation et sélection Jourdan, Jingyan 15 November 2012 (has links) Les applications multimédia se composent généralement de blocs numériques exhibant des schémas de calcul flottant réguliers. Sur les processeurs sans support architectural pour l'arithmétique flottante, ils peuvent être profitablement transformés en opérateurs dédiés, s'ajoutant aux 5 opérateurs élémentaires (+, -, X, / et √) : en traitant plus d'opérations simultanément, ils permettent d'obtenir de meilleures performances. Cette thèse porte sur la conception de tels opérateurs, et les techniques de compilation mises en œuvre pour les sélectionner. Nous avons réalisé des implémentations optimisées pour un ensemble d'opérateurs dédiés : élévation au carré, mise à l'échelle, fused multiply-add, produit scalaire en dimension deux (DP2), addition/soustraction simultané et sinus/cosinus simultanés. En proposant de nouveaux algorithmes cherchant à maximiser le parallélisme d'instructions et détaillés ici, nous obtenons des accélérations d'un facteur allant jusqu'à 4.2 par appel. Nous détaillons également les changements apportés dans le compilateur pour effectuer la sélection. La plupart des opérateurs sont sélectionnés au niveau syntaxique. Cependant, pour certains opérateurs, nous avons dû améliorer l'analyse d'intervalles entiers pour prendre en compte les variables de type flottant, afin de prouver certaines conditions de positivité requises à leur sélection. Enfin, nous apportons la preuve en pratique de la pertinence de cette approche : sur des noyaux typiques du traitement du signal et sur certaines applications, nous mesurons une amélioration de performance allant jusqu'à 1.59x en comparaison avec la performance obtenue avec les seuls opérateurs élémentaires. / Media processing applications typically involve numerical blocks that exhibit regular floating-point computation patterns. For processors whose architecture supports only integer arithmetic, these patterns can be profitably turned into custom operators, coming in addition to the five basic ones (+, -, X, / and √), but achieving better performance by treating more operations. This thesis addresses the design of such custom operators as well as the techniques developed in the compiler to select them in application codes. We have designed optimized implementations for a set of custom operators which includes squaring, scaling, adding two nonnegative terms, fused multiply-add, fused square-add (x*x+z, with z>=0), two-dimensional dot products (DP2), sums of two squares, as well as simultaneous addition/subtraction and sine/cosine. With novel algorithms targeting high instruction-level parallelism and detailed here for squaring, scaling, DP2, and sin/cos, we achieve speedups of up to 4.2x for individual custom operators even when subnormal numbers are fully supported. Furthermore, we introduce the optimizations developed in the ST231 C/C++ compiler for selecting such operators. Most of the selections are achieved at high level, using syntactic criteria. However, for fused square-add, we also enhance the framework of integer range analysis to support floating-point variables in order to prove the required positivity condition z>= 0. Finally, we provide quantitative evidence of the benefits to support this selection of custom operations: on DSP kernels and benchmarks, our approach allows us to be up to 1.59x faster compared to the sole usage of basic ones. Arithmétique virgule flottante Opérateur dédié Processeur embarqué entier Architecture VLIW Optimisation du compilateur Sélection du code par le compilateur IEEE floating-point arithmetic Custom operator Embedded integer processor VLIW architecture Compiler optimization Compiler code selection
92	Implementation trade-offs for FGPA accelerators / Compromis pour l'implémentation d'accélérateurs sur FPGA Deest, Gaël 14 December 2017 (has links) L'accélération matérielle désigne l'utilisation d'architectures spécialisées pour effectuer certaines tâches plus vite ou plus efficacement que sur du matériel générique. Les accélérateurs ont traditionnellement été utilisés dans des environnements contraints en ressources, comme les systèmes embarqués. Cependant, avec la fin des règles empiriques ayant régi la conception de matériel pendant des décennies, ces quinze dernières années ont vu leur apparition dans les centres de calcul et des environnements de calcul haute performance. Les FPGAs constituent une plateforme d'implémentation commode pour de tels accélérateurs, autorisant des compromis subtils entre débit/latence, surface, énergie, précision, etc. Cependant, identifier de bons compromis représente un défi, dans la mesure où l'espace de recherche est généralement très large. Cette thèse propose des techniques de conception pour résoudre ce problème. Premièrement, nous nous intéressons aux compromis entre performance et précision pour la conversion flottant vers fixe. L'utilisation de l'arithmétique en virgule fixe au lieu de l'arithmétique flottante est un moyen efficace de réduire l'utilisation de ressources matérielles, mais affecte la précision des résultats. La validité d'une implémentation en virgule fixe peut être évaluée avec des simulations, ou en dérivant des modèles de précision analytiques de l'algorithme traité. Comparées aux approches simulatoires, les méthodes analytiques permettent une exploration plus exhaustive de l'espace de recherche, autorisant ainsi l'identification de solutions potentiellement meilleures. Malheureusement, elles ne sont applicables qu'à un jeu limité d'algorithmes. Dans la première moitié de cette thèse, nous étendons ces techniques à des filtres linéaires multi-dimensionnels, comme des algorithmes de traitement d'image. Notre méthode est implémentée comme une analyse statique basée sur des techniques de compilation polyédrique. Elle est validée en la comparant à des simulations sur des données réelles. Dans la seconde partie de cette thèse, on se concentre sur les stencils itératifs. Les stencils forment un motif de calcul émergeant naturellement dans de nombreux algorithmes utilisés en calcul scientifique ou dans l'embarqué. À cause de cette diversité, il n'existe pas de meilleure architecture pour les stencils de façon générale : chaque algorithme possède des caractéristiques uniques (intensité des calculs, nombre de dépendances) et chaque application possède des contraintes de performance spécifiques. Pour surmonter ces difficultés, nous proposons une famille d'architectures pour stencils. Nous offrons des paramètres de conception soigneusement choisis ainsi que des modèles analytiques simples pour guider l'exploration. Notre architecture est implémentée sous la forme d'un flot de génération de code HLS, et ses performances sont mesurées sur la carte. Comme les résultats le démontrent, nos modèles permettent d'identifier les solutions les plus intéressantes pour chaque cas d'utilisation. / Hardware acceleration is the use of custom hardware architectures to perform some computations faster or more efficiently than on general-purpose hardware. Accelerators have traditionally been used mostly in resource-constrained environments, such as embedded systems, where resource-efficiency was paramount. Over the last fifteen years, with the end of empirical scaling laws, they also made their way to datacenters and High-Performance Computing environments. FPGAs constitute a convenient implementation platform for such accelerators, allowing subtle, application-specific trade-offs between all performance metrics (throughput/latency, area, energy, accuracy, etc.) However, identifying good trade-offs is a challenging task, as the design space is usually extremely large. This thesis proposes design methodologies to address this problem. First, we focus on performance-accuracy trade-offs in the context of floating-point to fixed-point conversion. Usage of fixed-point arithmetic instead of floating-point is an affective way to reduce hardware resource usage, but comes at a price in numerical accuracy. The validity of a fixed-point implementation can be assessed using either numerical simulations, or with analytical models derived from the algorithm. Compared to simulation-based methods, analytical approaches enable more exhaustive design space exploration and can thus increase the quality of the final architecture. However, their are currently only applicable to limited sets of algorithms. In the first part of this thesis, we extend such techniques to multi-dimensional linear filters, such as image processing kernels. Our technique is implemented as a source-level analysis using techniques from the polyhedral compilation toolset, and validated against simulations with real-world input. In the second part of this thesis, we focus on iterative stencil computations, a naturally-arising pattern found in many scientific and embedded applications. Because of this diversity, there is no single best architecture for stencils: each algorithm has unique computational features (update formula, dependences) and each application has different performance constraints/requirements. To address this problem, we propose a family of hardware accelerators for stencils, featuring carefully-chosen design knobs, along with simple performance models to drive the exploration. Our architecture is implemented as an HLS-optimized code generation flow, and performance is measured with actual execution on the board. We show that these models can be used to identify the most interesting design points for each use case. FPGA Accélérateurs matériels Optimisation des largeurs Conversion flottantvers fixe Analyse de précision Stencils itératifs Synthèse de haut niveau Modèles de performance FPGA Hardware Accelerators Wordlength Optimization Floating-Point to fixed-Point conversion Accuracy Analysis Iterative Stencil Computations High-Level Synthesis Performance Models
93	Μονάδες επεξεργασίας δεδομένων για μικροεπεξεργαστές υψηλών αποδόσεων Δημητρακόπουλος, Γεώργιος 16 March 2009 (has links) Οι μονάδες επεξεργασίας δεδομένων αποτελούν τις βασικές δομικές μονάδες όλων των μικροεπεξεργαστών. Κάποια από τα κυκλώματα αυτής της κατηγορίας υλοποιούν τις βασικές αριθμητικές πράξεις πάνω σε δεδομένα τόσο σταθερής όσο και κινητής υποδιαστολής, ενώ κάποια άλλα αναλαμβάνουν την αναδιοργάνωση των δεδομένων αυτών για την επιτάχυνση του υπολογισμού. Σε επεξεργαστές ειδικού σκοπού, όπως οι επεξεργαστές πολυμέσων και γραφικών, οι μονάδες επεξεργασίας δεδομένων καταλαμβάνουν περισσότερο από το 30% του ολοκληρωμένου και η αποτελεσματική σχεδίαση τους έχει άμεσο αντίκτυπο στην απόδοση ολόκληρου του συστήματος. Στο μέλλον, αναμένεται πως ακόμα και οι επεξεργαστές γενικού σκοπού, θα είναι εξοπλισμένοι από εξειδικευμένους επιταχυντές, οι οποίοι θα εκτελούν απ’ ευθείας σε υλικό σύνθετους αλγορίθμους με μεγάλες υπολογιστικές απαιτήσεις. Η βάση όλων των προτεινόμενων λύσεων σ’ αυτή τη διατριβή είναι η αναλυτική εύρεση ενός εγγενώς απλούστερου αλγορίθμου, ο οποίος θα επιτρέπει την αποτελεσματική υλοποίηση των αντίστοιχων κυκλωμάτων ανεξάρτητα από την τεχνολογία που θα χρησιμοποιηθεί και από τους επιπλέον περιορισμούς που τυχόν θα επιβληθούν στο μέλλον κατά την κατασκευή των κυκλωμάτων αυτών. Η ανάλυση και τα πειραματικά αποτελέσματα που συλλέξαμε βασίζονται τόσο σε υλοποιήσεις σε επίπεδο τρανζίστορ, που είναι η κύρια μέχρι τώρα πρακτική σχεδίασης των μικροεπεξεργαστών υψηλών επιδόσεων, όσο και σε πλήρως αυτοματοποιημένες υλοποιήσεις. Φυσικά, στη δεύτερη περίπτωση η απόδοση των κυκλωμάτων επιβαρύνεται, τόσο σε καθυστέρηση όσο και σε ενέργεια, εξαιτίας των περιορισμών των αυτοματοποιημένων εργαλείων και την αναγκαστική χρήση των προσχεδιασμένων βιβλιοθηκών βασικών πυλών. Η μελέτη που πραγματοποιήσαμε στοχεύει στην πλήρη εξερεύνηση του χώρου λύσεων των κυκλωμάτων αυτών. Η ανάλυση της συμπεριφοράς τους πραγματοποιήθηκε χρησιμοποιώντας τις βέλτιστες καμπύλες της ενέργειας ως προς την καθυστέρηση, οι οποίες αποτελούν τον πιο έγκυρο τρόπο περιγραφής της απόδοσης ενός κυκλώματος. Τα κυκλώματα που παρουσιάζονται ανήκουν σε τρεις βασικές κατηγορίες. Στην πρώτη ανήκουν οι αθροιστές παράλληλου προθέματος, που χρησιμοποιούν τα κρατούμενα του Ling για την υλοποίηση της δυαδικής πρόσθεσης. Τα κρατούμενα που προτάθηκαν από τον Ling αποτελούν απλοποιημένες μορφές των κλασικών σχέσεων πρόβλεψης κρατουμένου και χρησιμοποιούνται αυτή τη στιγμή στην πλειοψηφία των εμπορικών επεξεργαστών. Το νέο κύκλωμα, που προτείναμε, αποτελεί ουσιαστικά τη γενίκευση των σχέσεων αυτών, επιτρέποντας την υλοποίηση τους με απλοποιημένες δομές παράλληλου προθέματος, με αποτέλεσμα τη μείωση τόσο της καθυστέρησης όσο και της απαιτούμενης ενέργειας. Η νέα τεχνική οδηγεί σε γρηγορότερα κυκλώματα ανεξάρτητα από τη λογική οικογένεια που θα χρησιμοποιηθεί (στατική ή δυναμική CMOS λογική) και το δένδρο παράλληλου προθέματος που θα επιλεγεί. Η δεύτερη κατηγορία αναφέρεται σε κυκλώματα αναδιάταξης των δεδομένων που είναι αποθηκευμένα μέσα στους καταχωρητές του επεξεργαστή. Η αποδοτική αναδιάταξη των δεδομένων καταλήγει να είναι σε πολλούς αλγορίθμους (κρυπτογραφία, ψηφιακή επεξεργασία σήματος, πολυμέσα) τόσο αναγκαία όσο και η γρήγορη υλοποίηση των βασικών αριθμητικών πράξεων, αλλά και η ταχεία επικοινωνία με τη μνήμη. H προσπάθεια μας εστιάστηκε στην αποδοτική υλοποίηση μιας γενικής εντολής αναδιάταξης δεδομένων, στοχεύοντας σε όσο το δυνατόν ταχύτερες υλοποιήσεις. Όλες οι εκδοχές που προτείναμε στηρίζονται σε μια νέα μορφή δικτύων ταξινόμησης, η οποία μας επιτρέπει να παρέχουμε λύσεις που είναι σημαντικά πιο αποδοτικές σε σχέση με τις ήδη υπάρχουσες. Τα κυκλώματα που προτείνουμε κατασκευάζονται με τη χρήση ενός μόνο κελιού υπολογισμού (διαφορετικό για κάθε δίκτυο ταξινόμησης) και διατηρούν μια πλήρως κανονική δομή. Το στοιχείο αυτό, συμβάλλει, πέρα από τη βελτίωση της απόδοσης, στην αποτελεσματικότερη χωροθέτηση του κυκλώματος και στη μείωση των αρνητικών επιδράσεων των γραμμών διασύνδεσης. Η τελευταία κατηγορία κυκλωμάτων αναφέρεται σε κυκλώματα που χρησιμοποιούνται για την υλοποίηση της πρόσθεσης αριθμών κινητής υποδιαστολής. Τα κυκλώματα που προτείνουμε χρησιμοποιούνται στα πιο κρίσιμα στάδια, από πλευράς καθυστέρησης, του υπολογισμού του αθροίσματος και αφορούν στην πρόσθεση των μεγεθών και στην κανονικοποίηση του αποτελέσματος. Αρχικά, περιγράφουμε μια εναλλακτική προσέγγιση για την υλοποίηση των αθροιστών μεγέθους των αριθμών κινητής υποδιαστολής. Οι νέες μονάδες εκμεταλλεύονται την αναπαράσταση συμπληρώματος ως προς ένα και τις γρήγορες μονάδες υπολογισμού του κρατουμένου, που βασίζονται στην τεχνική παράλληλου προθέματος. Προτείνουμε μια ενοποιημένη μεθοδολογία για το πως μπορούμε να παράγουμε δομές παράλληλου προθέματος ανεξάρτητα από το μέγεθος της λέξης εισόδου, ενώ καταφέρνουμε να ενώσουμε για πρώτη φορά τις απλοποιημένες σχέσεις κρατουμένου του Ling με την πρόσθεση αριθμών που ακολουθούν την αναπαράσταση συμπληρώματος ως προς ένα. Στη συνέχεια, περιγράφεται ένας νέος απλός τρόπος για την υλοποίηση της πρόβλεψης και της μέτρησης των προπορευόμενων μηδενικών που εμφανίζονται στα αποτελέσματα των πράξεων αριθμών κινητής υποδιαστολής. Με τη χρήση των νέων κυκλωμάτων η κανονικοποίηση του αποτελέσματος μπορεί να πραγματοποιηθεί σε λιγότερο χρόνο και με σημαντικά μικρότερη ενέργεια. / Data processing units (or simply datapath) constitute a major part of all microprocessors. They take over the execution of all arithmetic operations either of fixed point or floating-point data, while they are also responsible for the execution of the needed data rearrangements in order to speed up the computation. In application-specific processors used for media and graphics applications, datapath circuits occupy more than one third of the processor’s core area and their efficient design directly affects the energy-delay behavior of the whole circuit. In the near future, it is expected that even general-purpose processors will be equipped we specialized accelerators that will execute directly in hardware complex algorithms with large computational demands. The basis of all circuits presented in this thesis is the derivation of an inherently simpler algorithm that would allow their efficient implementation irrespective the technology used and the constraints that would be imposed in the future, concerning the reliable and more predictable circuit fabrication in very deep submicron technologies. Our analysis relies on full-custom transistor-level designs that is the most common technique employed in high-performance microprocessor design. The performance of some of the presented circuits has also been investigated using an automated design flow. It is expected that, in these cases, the performance of the presented circuits will be aggravated due to the limitations imposed by the design automation tools and the available standard cell library. In this study, we aim at fully exploring the design space of our circuits. For this reason, we derived an optimal energy-delay curve for each one of the examined circuits in order to analyze its behavior. An energy-delay curve is the most reliable metric for presenting the performance of a circuit and allows the designer to perform a fair comparison among various design alternatives and circuit topologies. The new circuits presented in this thesis belong to three categories. In the first class, we find the parallel prefix adders that adopt the carries proposed by Ling. These carries are a simplified form of the classic carry lookahead equations and they are used at the moment in the majority of commercial high-speed microprocessors. The newly proposed circuits are based on a transformation of the Ling carries that leads to more efficient parallel prefix structures, which are better suited for Ling-carry computation. This new technique offers faster implementations irrespective the logic family used (either static or dynamic CMOS) and the prefix structure selected for the implementation. The second class refers to circuits that rearrange the data stored inside one or more of the processor’s registers. Efficient data rearrangement ends up being, in many cases, such as cryptography, digital signal processing, and multimedia applications, as essential as the fast implementation of basic arithmetic operations and the high bandwidth processor-memory communication. Our effort has focused on the efficient implementation of one of the most versatile permutation instruction, aiming to the reduction of the delay of the corresponding circuit. The design of the proposed permutation units is put under a common framework and their functionality resembles that of sorting networks. All the presented variants are designed using a single processing element (different for each sorting network) and have a very regular structure. This fact significantly contributes to the delay reduction because of the regular placement of the circuits’ cells that also alleviates the interconnect delay overhead. The last class of circuits is used for the implementation of high-speed floating-point units. The proposed circuits participate in two of the most time critical parts of any floating-point adder that is the significand (or fraction) adder and the result normalization unit. At first, we describe an alternative implementation of the significant adder that employs the one’s complement representation in order to reduce the delay of the circuit. The proposed parallel-prefix structures are derived using a general design methodology that leads to efficient designs irrespective the wordlength of the input operands. Also, we managed for the first time to produce simplified parallel-prefix carry computation units for the case of one’s complement addition that rely on the definition of Ling carries. Secondly, we describe a simple and practical algorithm for counting the number of leading zeros that may appear in the result of floating-point addition. New circuits are also presented that simplify the design of the corresponding leading zero anticipation logic. Using the proposed structures, normalization can be performed with less delay and significantly reduced power dissipation compared to already known implementations. Μικροεπεξεργαστές 621.39 Very large scale integration (VLSI) Datapath design Floating point units
94	Efficient algorithms for verified scientific computing : Numerical linear algebra using interval arithmetic / Algorithmes efficaces pour le calcul scientifique vérifié : algèbre linéaire numérique et arithmétique par intervalles Nguyen, Hong Diep 18 January 2011 (has links) L'arithmétique par intervalles permet de calculer et simultanément vérifier des résultats. Cependant, une application naïve de cette arithmétique conduit à un encadrement grossier des résultats. De plus, de tels calculs peuvent être lents.Nous proposons des algorithmes précis et des implémentations efficaces, utilisant l'arithmétique par intervalles, dans le domaine de l'algèbre linéaire. Deux problèmes sont abordés : la multiplication de matrices à coefficients intervalles et la résolution vérifiée de systèmes linéaires. Pour le premier problème, nous proposons deux algorithmes qui offrent de bons compromis entre vitesse et précision. Pour le second problème, nos principales contributions sont d'une part une technique de relaxation, qui réduit substantiellement le temps d'exécution de l'algorithme, et d'autre part l'utilisation d'une précision étendue en quelques portions bien choisies de l'algorithme, afin d'obtenir rapidement une grande précision. / Interval arithmetic is a means to compute verified results. However, a naive use of interval arithmetic does not provide accurate enclosures of the exact results. Moreover, interval arithmetic computations can be time-consuming. We propose several accurate algorithms and efficient implementations in verified linear algebra using interval arithmetic. Two fundamental problems are addressed, namely the multiplication of interval matrices and the verification of a floating-point solution of a linear system. For the first problem, we propose two algorithms which offer new tradeoffs between speed and accuracy. For the second problem, which is the verification of the solution of a linear system, our main contributions are twofold. First, we introduce a relaxation technique, which reduces drastically the execution time of the algorithm. Second, we propose to use extended precision for few, well-chosen parts of the computations, to gain accuracy without losing much in term of execution time. Calcul scientifique vérifié Arithmétique par intervalles Arithmétique flottante Algèbre linéaire Produit de matrices Système linéaire Précision Efficacité Performances Verified scientific computing Interval arithmetic Floating-point arithmetic Linear algebra Matrix product Linear system Precision Accuracy Efficiency Performances
95	Numerical Quality and High Performance In Interval Linear Algebra on Multi-Core Processors / Algèbre linéaire d'intervalles - Qualité Numérique et Hautes Performances sur Processeurs Multi-Cœurs Theveny, Philippe 31 October 2014 (has links) L'objet est de comparer des algorithmes de multiplication de matrices à coefficients intervalles et leurs implémentations.Le premier axe est la mesure de la précision numérique. Les précédentes analyses d'erreur se limitent à établir une borne sur la surestimation du rayon du résultat en négligeant les erreurs dues au calcul en virgule flottante. Après examen des différentes possibilités pour quantifier l'erreur d'approximation entre deux intervalles, l'erreur d'arrondi est intégrée dans l'erreur globale. À partir de jeux de données aléatoires, la dispersion expérimentale de l'erreur globale permet d'éclairer l'importance des différentes erreurs (de méthode et d'arrondi) en fonction de plusieurs facteurs : valeur et homogénéité des précisions relatives des entrées, dimensions des matrices, précision de travail. Cette démarche conduit à un nouvel algorithme moins coûteux et tout aussi précis dans certains cas déterminés.Le deuxième axe est d'exploiter le parallélisme des opérations. Les implémentations précédentes se ramènent à des produits de matrices de nombres flottants. Pour contourner les limitations d'une telle approche sur la validité du résultat et sur la capacité à monter en charge, je propose une implémentation par blocs réalisée avec des threads OpenMP qui exécutent des noyaux de calcul utilisant les instructions vectorielles. L'analyse des temps d'exécution sur une machine de 4 octo-coeurs montre que les coûts de calcul sont du même ordre de grandeur sur des matrices intervalles et numériques de même dimension et que l'implémentation par bloc passe mieux à l'échelle que l'implémentation avec plusieurs appels aux routines BLAS. / This work aims at determining suitable scopes for several algorithms of interval matrices multiplication.First, we quantify the numerical quality. Former error analyses of interval matrix products establish bounds on the radius overestimation by neglecting the roundoff error. We discuss here several possible measures for interval approximations. We then bound the roundoff error and compare experimentally this bound with the global error distribution on several random data sets. This approach enlightens the relative importance of the roundoff and arithmetic errors depending on the value and homogeneity of relative accuracies of inputs, on the matrix dimension, and on the working precision. This also leads to a new algorithm that is cheaper yet as accurate as previous ones under well-identified conditions.Second, we exploit the parallelism of linear algebra. Previous implementations use calls to BLAS routines on numerical matrices. We show that this may lead to wrong interval results and also restrict the scalability of the performance when the core count increases. To overcome these problems, we implement a blocking version with OpenMP threads executing block kernels with vector instructions. The timings on a 4-octo-core machine show that this implementation is more scalable than the BLAS one and that the cost of numerical and interval matrix products are comparable. Algèbre linéaire numérique Multiplication de matrices Implémentation parallèle Processeurs multi-cœurs Memoire partagée Virgule flottante Analyse d’erreur Arithmétique d’intervalles Numerical linear algebra Matrix multiplication Parallel implementation Multi-core processors Shared memory Floating-point number Error analysis Interval arithmetic
96	Simulation temps-réel embarquée de systèmes électriques au moyen de FPGA / FPGA-based Embedded real time simulation of electrical systems Dagbagi, Mohamed 08 October 2015 (has links) L'objectif de ce travail de thèse est de développer une bibliothèque de modules IPs (Intellectual Properties) de simulateurs temps réel embarqués qui simulent différents éléments d'un système électrique. Ces modules ont été conçus pour être utiliser non seulement pour une validation HIL (Hardware-In-the-Loop) des commandes numériques mais aussi pour des applications de commande embarquées, où le module IP de simulateur et le contrôleur sont tous les deux implémentés et exécutés dans la même cible FPGA. Cette nouvelle classe de simulateurs temps réel devrait être de plus en plus incluse dans la prochaine génération de contrôleurs numériques. En effet, ces modules IPs de simulateurs temps réel embarqués peuvent être avantageusement intégrés dans les contrôleurs numériques pour assurer des fonctions comme l'observation, l'estimation, le diagnostic où la surveillance de la santé. Inversement aux cas de HIL, le principal défi lors de la conception de tels simulateurs est de faire face à leur complexité ayant à l'esprit que, dans le cas des systèmes embarqués, les ressources matérielles disponibles sont limitées en raison du coût. En outre, ce problème est renforcé par la nécessité des pas de simulation très petit. Ceci est généralement le cas lors de la simulation des convertisseurs de puissance.Pour développer ces modules IPs, des lignes directrices dédiés de conception ont été proposées pour être suivies pour gérer la complexité de ces simulateurs (solveur de modèle, solveur numérique, pas de simulation, conditionnement de données) tout en tenant compte des contraintes temporelles et matérielles/coût (temps de calcul limité, ressources matérielles limitées ...).Les modules IPs de simulateurs à développer ont été organisés en deux catégories principales: ceux qui sont consacrées aux éléments électromagnétiques d'un système électrique, et ceux dédiés à ses éléments commutés.La première catégorie regroupe les éléments où les phénomènes électriques, magnétiques sont modélisés en plus de phénomènes mécaniques (pour les parties mécaniques) et des phénomènes potentiellement thermiques. Trois cas sont traités: le simulateur temps réel embarqué d'une machine synchrone triphasée, celui d'une machine asynchrone triphasée et celui d'un alternateur synchrone à trois étages. En plus de cela, les avantages de l'utilisation de la transformation delta pour améliorer la stabilité du solveur numérique lorsque un petit pas de calcul et le codage virgule fixe (avec une précision de données limitée) sont utilisés, ont été étudiés.La deuxième catégorie concerne des éléments commutés tels que les convertisseurs de puissance où les événements de commutation sont considérés. Là encore, plusieurs topologies de convertisseurs ont été étudiées: un redresseur simple alternance, un hacheur série, un hacheur réversible en courant, un hacheur quatre quadrant, un onduleur monophasé, un onduleur triphasé, un redresseur à diodes triphasé et un redresseur MLI triphasé. Pour tous ces modules IPs de simulateurs, l'approche de modélisation ADC (Associated Discrete Circuit) est adoptée.Le module IP de simulateur temps réel embarqué du redresseur MLI a été appliqué dans un contexte d'une application embarquée. Cette dernière consiste en une commande tolérante aux défauts d'un convertisseur de tension coté réseau. Ainsi, ce module IP est associé à celui d'un simulateur temps réel d'un filtre RL triphasé et les deux sont embarqués dans le dispositif de commande du redresseur pour estimer les courants de lignes. Ces courants sont injectés dans le dispositif de commande dans le cas d'un défaut de capteur de courant. La capacité de cet estimateur de garantir la continuité de service en cas de défauts est validée par des tests HIL et expérimentalement. / The aim of this thesis work is to develop an IP-Library of FPGA-based embedded real-time simulator IPs (Intellectual Properties) that simulate different elements of an electrical system. These IPs have been designed to be used not only for Hardware-In-the-Loop (HIL) testing of digital controllers but also for low cost embedded control applications, where the simulator IP and the controller are both implemented and run altogether in the same FPGA device. This emerging class of real-time simulators is expected to be more and more included in the next generation of digital controllers. Indeed, such embedded real-time simulator IPs can be advantageously embedded within digital controllers to ensure functions like observation, estimation, diagnostic or health-monitoring. Conversely to the HIL case, the main challenge when designing such simulator IPs is to cope with their complexity having in mind that, in the case of embedded systems, the available hardware resources are limited due to the cost. Furthermore, this challenge is strengthened by the need of very short simulation time-steps which is typically the case when simulating power converters.To develop these IPs, dedicated design guidelines have been proposed to be followed to manage the complexity of these simulator IPs (model solver, numerical solver, time-step, data conditioning) with regards to the timing and the area/cost constraints (computation time limit, limited hardware resources …).The simulators IPs to be developed have been organized into two main categories: those dedicated to electromagnetic elements of an electrical system and those dedicated to their switching elements.The first category gathers elements where electric, magnetic phenomena are modelized in addition to mechanical phenomena (for moving systems) and potentially thermal phenomena. Three cases are dealt with: the embedded real-time simulator of a three-phase synchronous machine, the one of a three-phase induction machine and the one of a brushless synchronous generator. Also, the advantages of using delta transformation to improve the stability of the numerical solver when short simulation time-step and fixed-point (with limited data precision) are used, have been studied.The second category concerns switching elements such as power converters where switching events are considered. Here again, several converter topologies have been studied: a half-wave rectifier, a buck DC-DC converter, a bidirectional buck DC-DC converter, a H-bridge DC-DC converter, a single-phase H-bridge DC-AC converter, a three-phase voltage source inverter, a three-phase diode rectifier and a three-phase PWM rectifier. For all these IPs, the Associated Discrete Circuit (ADC) modeling approach is adopted.The embedded real-time simulator IP of the three-phase PWM rectifier has been applied in the context of an embedded application. The latter consists of a fault-tolerant control of a grid-connected voltage source rectifier. Thus, this simulator IP is associated with the one of a three-phase RL-filter and are both implemented within the rectifier controller to estimate the grid currents. These currents are injected in the controller in the case of a current sensor fault. The ability of this estimator to guarantee the service continuity in the case of faults is validated through HIL tests and experiments. Fpga Simulation temps réel embarquée Systèmes électriques Codage virgule flottante Opérateur delta Codage virgule fixe Fpga Embedded real-Time simulation Electrical systems Floating-Point quantization Delta operator Fixed-Point quantization
97	Automatické řízení výpočtu ve specializovaném výpočetním systému / Specialized Computer System Automatic Control Opálka, Jan January 2016 (has links) This work deals with the automatic control of calculations of specialized system. The reader is acquainted with the numerical solution of differential equations by Taylor series method and numerical integrators. The practical aim of this work is to analyze parallel characteristics of Taylor series method, specification of parallel math operations and design of control of this operations.
98	Paralelní řešení parciálních diferenciálnich rovnic / Partial Differential Equations Parallel Solutions Čambor, Michal January 2011 (has links) This thesis deals with the concepts of numerical integrator using floating point arithmetic for solving partial differential equations. The integrator uses Euler method and Taylor series. Thesis shows parallel and serial approach to computing with exponents and significands. There is also a comparison between modern parallel systems and the proposed concepts.
99	Výpočet vlastních čísel a vlastních vektorů hermitovské matice / Computation of the eigenvalues and eigenvectors of Hermitian matrix Štrympl, Martin January 2016 (has links) This project deals with computation of eigenvalues and eigenvectors of Hermitian positive-semidefinite complex square matrix of order 4. The target is an implementation of computation in language VHDL to field-programmable gate array of type Xilinx Zynq-7000. This master project deals with algorithms used for computation of eigenvalues and eigenvectors of positive-semidefinite symmetric real square and positive-semidefinite complex Hermitian matrix and the analysis of algorithms by AnalyzeAlgorithm program assembled for this purpose. The closing part of this project describes implementation of the computation into field-programmable gate array with use of IP core Xilinx® Floating-Point \linebreak Operator and SVAOptimalizer, SVAInterpreter and SVAToDSPCompiler programs.
100	Leveraging Posits for the Conjugate Gradient Linear Solver on an Application-Level RISC-V Core Mallasén Quintana, David January 2022 (has links) Emerging floating-point arithmetics provide a way to optimize the execution of computationally-intensive algorithms. This is the case with scientific computational kernels such as the Conjugate Gradient (CG) linear solver. Exploring new arithmetics is of paramount importance to maximize the accuracy and timing performance of these algorithms. In this thesis, I have studied the use of the novel posit arithmetic in hardware to improve the accuracy of the CG method. In particular, on PERCIVAL, an application-level RISC-V core with support for posits and quire. The open RISC-V architecture supplies a flexible platform for the exploration of new computer architecture studies. Previous works have tackled the use of posits in the high-performance computing and machine learning fields, amongst others. However, until recently, the lack of hardware support has been a significant barrier to their scalability. The key results from this thesis show that posits are a promising alternative when solving 1D and 2D Poisson equations using the CG linear solver. Notably, this novel arithmetic can execute as fast as IEEE 754 floating-point numbers on specialized hardware, and provide up to 2 orders of magnitude higher accuracy. This accuracy improvement spans both the error of the output values of the algorithms and the value of the final residual in the CG iterative method. Furthermore, the use of the quire accumulator register in the computation of dot-products in posit arithmetic significantly boosts the accuracy of the outputs. Since 32-bit posits perform practically as fast as 32-bit floats, and thus faster than 64-bit floats, they present an intermediate solution between single- and double-precision arithmetic. This paves the way for the deployment of high-efficiency solutions that make intensive use of floating-point operations. / Ny kommande flyttalsaritmetik ger ett sätt att optimera exekveringen av beräkningsintensiva algoritmer. Detta är fallet med vetenskapliga beräkningskärnor som den Conjugate Gradient (CG) metoden kräver. Att utforska ny aritmetik är av största vikt för att minska energikostnaderna för dessa algoritmer. I detta examensarbete har jag studerat användningen av den nya positaritmetiken i hårdvara för att förbättra noggrannheten i CG-metoden. I synnerhet på PERCIVAL, en RISC-V-kärna på applikationsnivå med stöd för posits och quire. Den öppna RISC-V-arkitekturen tillhandahåller en flexibel plattform för utforskning av nya dator arkitekturstudier. Tidigare arbeten har tagit itu med användningen av positurer inom områdena högpresterande datorer och maskininlärning, bland annat. Men fram till nyligen har bristen på hårdvarustöd varit ett betydande hinder för deras skalbarhet. Nyckelresultaten från denna avhandling visar att posits är ett lovande alternativ när man löser 1D och 2D Poisson-ekvationer med den linjära CG-lösaren. Noterbart kan denna nya aritmetik köra så snabbt som IEEE 754 flyttal på specialiserad hårdvara och ge upp till två storleksordningar högre noggrannhet. Denna noggrannhetsförbättring sträcker sig över både felet i algoritmernas utvärden och värdet på den slutliga residualen i den iterativa CG-metoden. Dessutom ökar användningen av quire-ackumulatorregistret vid beräkning av punktprodukter i positaritmetik avsevärt noggrannheten hos utsignalerna. Eftersom 32-bitars posits presterar praktiskt taget lika snabbt som 32-bitars flöten, och därmed snabbare än 64-bitars flöten, presenterar de en mellanlösning mellan enkel-och dubbelprecisionsaritmetik. Detta banar väg för utbyggnaden av högeffektiva lösningar som intensivt utnyttjar flyttalsoperationer. Computer Arithmetic Conjugate Gradient Posit IEEE-754 Floating Point High-Performance Computing RISC-V Datoraritmetik Konjugerad Gradient Posit IEEE-754 Flytande Punkt Högpresterande Beräkningar RISC-V Computer and Information Sciences Data- och informationsvetenskap

Search results