Global ETD Search

71	Mitchell-Based Approximate Operations on Floating-Point Numbers Hellman, Noah January 2021 By adapting Mitchell's algorithm to floating-point numbers, one can efficiently perform floating-point arithmetic in an approximate logarithmic domain, yielding approximate computations of functions such as multiplication, division, square root and others. This work examines how the algorithm can be improved in terms of accuracy and hardware complexity by applying a set of parametrized methods that together offer a large design space. Optimal coefficients are determined for a large portion of this space and used to synthesize circuits for both ASIC and FPGA targets using the bfloat16 format. Optimal configurations are then extracted to create an optimal curve from which one can select an acceptable error range and obtain a circuit with minimal hardware cost. floating-point numbers approximations arithmetics asic fpga mitchell Computer Systems Computer Sciences Computer and Information Sciences
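As a rough illustration of the idea in entry 71, the sketch below applies Mitchell's approximation log2(1 + m) ≈ m directly to IEEE 754 bit patterns: reinterpreted as an integer, a positive float is approximately a scaled and biased log2 of its value, so multiplication, division and square root reduce to integer addition, subtraction and a shift. This is not the thesis's design. It uses float32 rather than bfloat16 for convenience, omits the accuracy-improving corrections the abstract mentions, and all function names are hypothetical. It assumes positive, normal inputs and does not handle overflow or underflow.

```python
import struct

# float32 exponent bias (127) shifted into the exponent field.
BIAS_BITS = 127 << 23


def float_to_bits(x: float) -> int:
    """Reinterpret x (rounded to float32) as an unsigned 32-bit integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(b: int) -> float:
    """Reinterpret an unsigned 32-bit integer as a float32 value."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def mitchell_mul(x: float, y: float) -> float:
    """Approximate x * y: add the 'logarithms', remove the doubled bias."""
    return bits_to_float(float_to_bits(x) + float_to_bits(y) - BIAS_BITS)


def mitchell_div(x: float, y: float) -> float:
    """Approximate x / y: subtract the 'logarithms', restore the bias."""
    return bits_to_float(float_to_bits(x) - float_to_bits(y) + BIAS_BITS)


def mitchell_sqrt(x: float) -> float:
    """Approximate sqrt(x): halve the 'logarithm' while keeping one bias."""
    return bits_to_float((float_to_bits(x) + BIAS_BITS) >> 1)


if __name__ == "__main__":
    for x, y in [(2.0, 4.0), (1.5, 1.5), (3.0, 7.0)]:
        approx = mitchell_mul(x, y)
        rel_err = abs(approx - x * y) / (x * y)
        print(f"{x} * {y} ~ {approx} (relative error {rel_err:.3%})")
```

Powers of two are handled exactly, while mitchell_mul(1.5, 1.5) returns 2.0 instead of 2.25, an error of about 11.1%, which is the worst case of the uncorrected approximation.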
72	C++ knihovna pro práci s čísly v pohyblivé řádové čárce s libovolnou přesností / C++ Arbitrary Precision Floating Point Library Závada, Vladislav January 2019 This thesis deals with the design of a floating-point module that performs operations on floating-point operands of any bit width. For this purpose, the module is implemented as a template class in C++. The module is designed so that it can be used when designing an application-specific processor. First, floating-point numbers and template functions in C++ are described. The practical part then describes the algorithms of the individual operations and the design of the module itself as a template library.
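To make "operands of any bit width" in entry 72 concrete, here is a minimal sketch of decoding a bit pattern whose exponent and mantissa widths are parameters. The thesis implements its module as a C++ template class; this Python function is only an illustration in which the two width arguments play the role of template parameters. The IEEE 754-style layout (sign, biased exponent, fraction, with subnormals, infinities and NaN) is an assumption, since the abstract does not specify the encoding, and the name `decode` is hypothetical.

```python
import math


def decode(bits: int, exp_bits: int, man_bits: int) -> float:
    """Decode an IEEE 754-style pattern with the given field widths.

    Layout assumed (most to least significant): 1 sign bit,
    exp_bits biased-exponent bits, man_bits fraction bits.
    """
    bias = (1 << (exp_bits - 1)) - 1
    exp_max = (1 << exp_bits) - 1

    sign = -1.0 if (bits >> (exp_bits + man_bits)) & 1 else 1.0
    exp = (bits >> man_bits) & exp_max
    man = bits & ((1 << man_bits) - 1)

    if exp == exp_max:
        # All-ones exponent: infinity if the fraction is zero, else NaN.
        return sign * math.inf if man == 0 else math.nan
    if exp == 0:
        # Zero or subnormal: no implicit leading 1.
        return sign * man * 2.0 ** (1 - bias - man_bits)
    # Normal number: implicit leading 1.
    return sign * (1 + man / (1 << man_bits)) * 2.0 ** (exp - bias)


if __name__ == "__main__":
    print(decode(0x3C00, exp_bits=5, man_bits=10))  # binary16 pattern
    print(decode(0x3F80, exp_bits=8, man_bits=7))   # bfloat16 pattern
```

Both calls decode the value 1.0, once in the binary16 layout and once in the bfloat16 layout. The same function covers any other split of exponent and fraction bits whose values fit in a Python float.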
73	Towards a modern floating-point environment / Vers l'environnement flottant moderne Kupriianova, Olga 11 December 2015 This work investigates two ways of enlarging the current floating-point environment. The first is to support several implementation versions of each mathematical function (elementary functions such as $\exp$ or $\log$ and special functions such as $\operatorname{erf}$ or $\Gamma$); the second is to provide IEEE 754 operations whose inputs and output may be in different radices. As the number of possible implementations of each mathematical function is large, this work focuses on code generation. Our code generator supports a wide variety of functions: it generates parametrized implementations for user-specified functions, so it may be considered a black-box function generator. This work contains a novel algorithm for domain splitting and an approach that replaces branching in the reconstruction step by a polynomial. The new domain splitting algorithm produces fewer subdomains, and the polynomial degrees on adjacent subdomains do not vary much. To produce vectorizable implementations, if-else statements in the reconstruction step have to be avoided. Since the 2008 revision of the IEEE 754 standard, it has been possible to mix numbers of different precisions in one operation. However, there is no mechanism that allows users to mix numbers of different radices in one operation. This research begins the study of mixed-radix arithmetic with a worst-case search for the FMA. A novel algorithm that converts a decimal character sequence of arbitrary length to a binary floating-point number is presented. It is independent of the currently set rounding mode and produces correctly rounded results. Computer arithmetic Floating-point numbers Elementary functions Code generator Metalibm Mixed-radix arithmetic 004
74	Analysis of Fix‐point Aspects for Wireless Infrastructure Systems Grill, Andreas, Englund, Robin January 2009 A large share of today's telecommunication consists of mobile and short-distance wireless applications, where the effect of the channel is unknown and changes over time and thus needs to be described statistically. The received signal therefore cannot be accurately predicted and has to be estimated. Since telecom systems operate in real time, the receiver hardware that estimates the sent signal can, for example, be based on a DSP where the statistical calculations are performed. A fixed-point DSP, with a limited number of bits and a fixed binary point, causes larger quantization errors than floating-point operations, which have higher accuracy. The focus of this thesis has been to build a library of functions for handling fixed-point data. A class that handles the most common arithmetic operations and a least squares solver for fixed-point data have been implemented in MATLAB code. The MATLAB Fixed-Point Toolbox could have been used for this task, but in order to have full control of the algorithms and the fixed-point handling, an independent library was created. The conclusion of the simulation in this thesis is that the least squares result depends more on the number of integer bits than on the number of fractional bits. fixed-point fixed point telecommunication DSP MATLAB Fixed‐Point Toolbox least squares solution floating point Householder QR decomposition saturation quantization error Electronics Telecommunication Electrical engineering
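The trade-off between integer and fractional bits described in entry 74 can be illustrated with a minimal quantizer. The thesis's library is written in MATLAB and is not reproduced here. The function name `to_fixed`, the signed format with a separate sign bit, the round-to-nearest behavior and the saturating overflow are all assumptions made for this sketch.

```python
def to_fixed(x: float, int_bits: int, frac_bits: int) -> float:
    """Quantize x to a signed fixed-point grid and return the represented value.

    Assumed format: 1 sign bit + int_bits integer bits + frac_bits
    fractional bits, two's-complement range, saturation on overflow.
    """
    scale = 1 << frac_bits
    max_raw = (1 << (int_bits + frac_bits)) - 1   # largest positive code
    min_raw = -(1 << (int_bits + frac_bits))      # most negative code

    raw = round(x * scale)                # round to the nearest grid point
    raw = max(min_raw, min(max_raw, raw)) # saturate instead of wrapping
    return raw / scale


if __name__ == "__main__":
    # Too few fractional bits: a small, bounded rounding error.
    print(to_fixed(3.14159, int_bits=3, frac_bits=4))
    # Too few integer bits: the value saturates at the top of the range.
    print(to_fixed(100.0, int_bits=3, frac_bits=4))
```

With 3 integer and 4 fractional bits, 3.14159 becomes 3.125 (an error below one grid step of 1/16), whereas 100.0 saturates to 7.9375. One plausible reading of the abstract's conclusion is that saturation errors of this kind, caused by too few integer bits, can be far larger than the rounding errors caused by too few fractional bits.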
75	Improving the Numerical Accuracy of Floating-Point Programs with Automatic Code Transformation Methods / Amélioration de la précision numérique de programmes basés sur l'arithmétique flottante par les méthodes de transformation automatique Damouche, Nasrine 12 December 2016 Critical software based on floating-point arithmetic requires a rigorous verification and validation process to improve our confidence in its reliability and safety. Unfortunately, the available techniques for this task often overestimate the round-off errors. Ariane 5 and the Patriot missile are well-known examples of disasters caused by computation errors. In recent years, several techniques have been proposed for transforming arithmetic expressions in order to improve their numerical accuracy. In this work, we go one step further by automatically transforming larger pieces of code containing assignments, control structures and functions. We define a set of transformation rules that allow the generation, under certain conditions and in polynomial time, of larger expressions by performing limited formal computations, possibly across several iterations of a loop. These larger expressions are better suited to improving, by re-parenthesizing, the numerical accuracy of the program results. We use static analysis techniques based on abstract interpretation to over-approximate the round-off errors in programs and during the transformation of expressions. A tool has been implemented, and experimental results are presented for classical numerical algorithms and algorithms for embedded systems. Floating-point arithmetic Numerical accuracy Rounding errors Automatic transformation of programs Static analysis Abstract interpretation 004
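Why re-parenthesizing an expression can change its numerical accuracy, as entry 75 relies on, is easy to see on a three-term sum. The sketch below is not the thesis's tool or its transformation rules. It only compares two parenthesizations of the same sum against an exact rational reference, with hypothetical helper names, and assumes IEEE 754 double precision with round-to-nearest-even (the usual behavior of Python floats).

```python
from fractions import Fraction


def exact_sum(terms):
    """Exact rational sum of the given floats (no rounding at all)."""
    return sum((Fraction(t) for t in terms), Fraction(0))


def abs_error(computed, terms):
    """Absolute error of a computed float against the exact sum."""
    return abs(Fraction(computed) - exact_sum(terms))


a, b, c = 1e16, -1e16, 1.0

# Cancel the two large terms first, then add the small one.
left = (a + b) + c

# Add the small term to a large one first: at magnitude 1e16 the spacing
# between doubles is 2, so b + c rounds back to b and c is lost.
right = a + (b + c)

if __name__ == "__main__":
    print("(a + b) + c =", left, "error", abs_error(left, (a, b, c)))
    print("a + (b + c) =", right, "error", abs_error(right, (a, b, c)))
```

The first parenthesization returns the exact result 1.0, while the second returns 0.0. Both are mathematically the same expression, which is what makes an automatic search over parenthesizations worthwhile.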
77	Taking architecture and compiler into account in formal proofs of numerical programs Nguyen, Thi Minh Tuyen 11 June 2012 On some recently developed architectures, a numerical program may give different answers depending on the execution hardware and the compilation. These discrepancies arise because floating-point computations may be carried out with different precisions. The goal of this thesis is to formally prove properties of numerical programs while taking the architecture and the compiler into account. To do so, we propose two different approaches. The first approach is to prove properties of floating-point programs that hold for multiple architectures and compilers. It states the rounding error of each floating-point computation regardless of the environment and the compiler choices. It is implemented in the Frama-C platform for static analysis of C code. The second approach is to prove behavioral properties of numerical programs by analyzing their compiled assembly code. We focus on the issues and traps that may arise in floating-point computations. Direct analysis of the assembly code allows us to take into account architecture- or compiler-dependent features such as the possible use of extended precision registers. It is implemented on top of the Why platform for deductive verification. Computer Science/Other Floating-point arithmetic Numerical programs Static analysis Compile-time optimizations The Why platform The Frama-C platform
78	Custom floating-point arithmetic for integer processors: algorithms, implementation, and selection Jourdan, Jingyan 15 November 2012 Media processing applications typically involve numerical blocks that exhibit regular floating-point computation patterns. For processors whose architecture supports only integer arithmetic, these patterns can be profitably turned into custom operators that come in addition to the five basic ones (+, -, ×, / and √) but achieve better performance by handling more operations at once. This thesis addresses the design of such custom operators as well as the techniques developed in the compiler to select them in application codes. We have designed optimized implementations for a set of custom operators that includes squaring, scaling, adding two nonnegative terms, fused multiply-add, fused square-add (x*x+z, with z>=0), two-dimensional dot products (DP2), sums of two squares, as well as simultaneous addition/subtraction and sine/cosine. With novel algorithms targeting high instruction-level parallelism, detailed here for squaring, scaling, DP2, and sin/cos, we achieve speedups of up to 4.2x for individual custom operators even when subnormal numbers are fully supported. Furthermore, we introduce the optimizations developed in the ST231 C/C++ compiler for selecting such operators. Most of the selections are made at a high level, using syntactic criteria. However, for fused square-add, we also enhance the framework of integer range analysis to support floating-point variables in order to prove the required positivity condition z>=0. Finally, we provide quantitative evidence of the benefits of selecting custom operators: on DSP kernels and benchmarks, our approach is up to 1.59x faster than using the basic operators alone. Computer Science/Other IEEE floating-point arithmetic Custom operator Embedded integer processor VLIW architecture Compiler optimization Compiler code selection
79	Efficient algorithms for verified scientific computing: Numerical linear algebra using interval arithmetic Nguyen, Hong Diep 18 January 2011 Interval arithmetic is a means to compute verified results. However, a naive use of interval arithmetic does not provide accurate enclosures of the exact results. Moreover, interval arithmetic computations can be time-consuming. We propose several accurate algorithms and efficient implementations in verified linear algebra using interval arithmetic. Two fundamental problems are addressed, namely the multiplication of interval matrices and the verification of a floating-point solution of a linear system. For the first problem, we propose two algorithms that offer new tradeoffs between speed and accuracy. For the second problem, our main contributions are twofold. First, we introduce a relaxation technique that drastically reduces the execution time of the algorithm. Second, we propose to use extended precision for a few well-chosen parts of the computations, to gain accuracy without losing much in terms of execution time. Computer Science/Other Verified scientific computing Interval arithmetic Floating-point arithmetic Linear algebra Matrix product Linear system Precision Accuracy Efficiency Performances
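The remark in entry 79 that a naive use of interval arithmetic gives inaccurate enclosures can be seen on the simplest possible case. The sketch below is a toy interval type, not the thesis's algorithms. The class and helper names are hypothetical, and outward rounding is approximated by stepping one floating-point number outward with `math.nextafter` (available from Python 3.9), which is safe but slightly wider than true directed rounding.

```python
import math
from dataclasses import dataclass


def _down(x: float) -> float:
    """Next float below x, used to round a lower bound outward."""
    return math.nextafter(x, -math.inf)


def _up(x: float) -> float:
    """Next float above x, used to round an upper bound outward."""
    return math.nextafter(x, math.inf)


@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __add__(self, other: "Interval") -> "Interval":
        return Interval(_down(self.lo + other.lo), _up(self.hi + other.hi))

    def __sub__(self, other: "Interval") -> "Interval":
        return Interval(_down(self.lo - other.hi), _up(self.hi - other.lo))

    def __mul__(self, other: "Interval") -> "Interval":
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(_down(min(products)), _up(max(products)))

    def __contains__(self, x: float) -> bool:
        return self.lo <= x <= self.hi

    def width(self) -> float:
        return self.hi - self.lo


if __name__ == "__main__":
    x = Interval(1.0, 2.0)
    d = x - x  # exact answer is 0 for every x in [1, 2]
    print(d, "width", d.width())
```

The enclosure of x - x is verified, since it contains 0, but it is roughly [-1, 1] instead of [0, 0]: the arithmetic treats the two occurrences of x as independent. Overestimation of this kind is what more careful algorithms have to work around.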
80	Tools for the Design of Reliable and Efficient Functions Evaluation Libraries / Outils pour la conception de bibliothèques de calcul de fonctions efficaces et fiables Torres, Serge 22 September 2016 The design of function evaluation libraries is a complex task that requires great care and dedication, especially when one wants to satisfy high standards of reliability and performance. In practice, it cannot be performed correctly, as a routine operation, without tools that not only help the designer find a way through a complex and extended solution space but also guarantee that the solutions are correct and (almost) optimal. In the present state of the art, one has to think in terms of a “toolbox” from which the designer can mix and match the tools that best fit the goals at hand, rather than expect to have a solve-all automatic device. The work presented here is dedicated to the design and implementation of such tools in two areas: ∙ the consolidation of Ziv’s rounding test, which has so far been used in a more or less empirical way in the implementation of function approximations; ∙ the development of an implementation of the SLZ algorithm in order to solve the Table Maker's Dilemma for functions with quad-precision floating-point (IEEE-754 Binary128 format) arguments and images. Computer arithmetic Function approximation Floating-point arithmetic Table Maker's Dilemma Correct rounding
