61

The Design, Simulation and Synthesis of Pipelined Floating-Point Radix-4 Fast Fourier Transform Data Path in VHDL

Nicklous, Francis Edward January 2010
The Fast Fourier Transform (FFT) converts time or spatial information into the frequency domain and is one of the most widely used digital signal processing (DSP) algorithms. DSP is used in applications ranging from communications and controls to speech and image processing, and has also found its way into toys, music synthesizers and most digital instruments. Many applications have relied on digital signal processors (DSPs) and Application Specific Integrated Circuits (ASICs) for most of their signal processing needs. DSPs provide adequate performance and efficiency for many applications, as well as robust tools that ease the development process. However, the requirements of important emerging DSP applications have begun to exceed the capabilities of DSPs. With this in mind, system developers have begun to consider alternatives such as ASICs and Field Programmable Gate Arrays (FPGAs). Although ASICs can provide excellent performance and efficiency, the time, cost and risk associated with ASIC design are leading developers towards FPGAs. A number of significant advances in FPGA technology have improved the suitability of FPGAs for DSP applications. These advances include increased device capacity and speed, DSP-oriented architectural enhancements, better DSP-oriented tools, and the increasing availability of DSP-oriented IP libraries. This thesis focuses on the design of a single-precision floating-point radix-4 FFT on an FPGA, written in VHDL, for real-time DSP applications. It details the FFT algorithm used, the design steps taken, and the results from both simulation and synthesis. / Electrical and Computer Engineering
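The radix-4 decomposition named in this abstract can be sketched in software. The Python below is only an illustrative, recursive decimation-in-time version checked against a direct DFT; it is not the thesis's pipelined VHDL datapath, it runs in double rather than single precision, and the function names `dft` and `fft_radix4` are assumptions made for this example.

```python
import cmath

def dft(x):
    """Direct O(n^2) DFT, used here only as a reference."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]

def fft_radix4(x):
    """Recursive radix-4 decimation-in-time FFT; len(x) must be a power of 4."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 4 != 0:
        raise ValueError("length must be a power of 4")
    q = n // 4
    # Four interleaved sub-sequences, each transformed recursively.
    subs = [fft_radix4(x[r::4]) for r in range(4)]
    out = [0j] * n
    for k in range(q):
        # Apply the twiddle factors W_n^(r*k) to sub-transforms r = 1, 2, 3.
        a = subs[0][k]
        b = cmath.exp(-2j * cmath.pi * k / n) * subs[1][k]
        c = cmath.exp(-2j * cmath.pi * 2 * k / n) * subs[2][k]
        d = cmath.exp(-2j * cmath.pi * 3 * k / n) * subs[3][k]
        # Radix-4 butterfly: only additions and multiplications by +/-1, +/-j.
        out[k] = a + b + c + d
        out[k + q] = a - 1j * b - c + 1j * d
        out[k + 2 * q] = a - b + c - d
        out[k + 3 * q] = a + 1j * b - c - 1j * d
    return out

samples = [complex(i % 5, -i) for i in range(16)]
max_diff = max(abs(u - v) for u, v in zip(fft_radix4(samples), dft(samples)))
print("largest deviation from the direct DFT:", max_diff)
```

The butterfly itself needs no general multiplications, which is one commonly cited reason radix-4 is attractive in hardware; the twiddle multiplications remain.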
62

Characterization of FPGA-based High Performance Computers

Pimenta Pereira, Karl Savio 02 September 2011
As CPU clock frequencies plateau and the doubling of CPU cores per processor exacerbates the memory wall, hybrid-core computing, which augments CPUs with FPGAs and/or GPUs, holds the promise of addressing high-performance computing demands, particularly with respect to performance, power and productivity. Traditional benchmarks for high-performance computers, such as SPEC, take an architecture-based approach and do not completely express the parallelism that exists in FPGA and GPU accelerators. This thesis follows an application-centric approach, comparing two key computational idioms with respect to sustained performance, power and productivity. Specifically, a complex, single-precision, floating-point, 1D Fast Fourier Transform (FFT) and a molecular dynamics modeling application are implemented on state-of-the-art FPGA and GPU accelerators. The results show that FPGA floating-point FFT performance is highly sensitive to the mix of dedicated FPGA resources, in particular DSP48E slices, block RAMs and FPGA I/O banks. Estimated results show that, for the floating-point FFT benchmark on FPGAs, these resources are the performance-limiting factor. Fixed-point FFTs are important in many high-performance embedded applications. For an integer-point FFT, FPGAs exploit a flexible data-path width to trade off circuit cost against speed of computation, improving performance and resource utilization. GPUs, having a fixed data-width architecture, cannot fully take advantage of this. For the molecular dynamics application, FPGAs benefit from the flexibility to create a custom, tightly pipelined datapath and a highly optimized memory subsystem on the accelerator. This can provide a 250-fold improvement over an optimized CPU implementation and a 2-fold improvement over an optimized GPU implementation, along with massive power savings.
Finally, extracting the maximum performance from the FPGA requires each implementation to balance the formulation of the algorithm on the platform, the optimum use of available external memory bandwidth, and the availability of computational resources, at the expense of greater programming effort. / Master of Science
63

Contribution to error analysis of algorithms in floating-point arithmetic / Contribution à l'analyse d'algorithmes en arithmétique à virgule flottante

Plet, Antoine 07 July 2017
Floating-point arithmetic is an approximation of real arithmetic in which each operation may introduce a rounding error. The IEEE 754 standard requires elementary operations to be as accurate as possible. However, over the course of a computation, rounding errors may accumulate and lead to totally wrong results. This happens, for example, with an expression as simple as $ab + cd$, for which the naive algorithm sometimes returns a result with a relative error larger than 1. It is therefore important to analyze algorithms in floating-point arithmetic to understand the generated error as thoroughly as possible. In this thesis, we are interested in the analysis of small building blocks of numerical computing, for which we look for sharp bounds on the relative error. For building blocks of this kind, in base $\beta$ and precision $p$, we often succeed in proving error bounds of the form $\alpha \cdot u + o(u^2)$, where $\alpha > 0$ and $u = \frac{1}{2}\beta^{1-p}$ is the unit roundoff. To characterize the sharpness of such a bound, one can provide numerical examples for the standard precisions that come close to the bound, or examples that are parametrized by the precision and generate an error of the same form $\alpha \cdot u + o(u^2)$, thus proving the asymptotic optimality of the bound. However, checking such parametrized examples with paper and pencil is a tedious and error-prone task. We worked on the formalization of a symbolic floating-point arithmetic, over numbers that are parametrized by the precision, and implemented it as a library in the Maple computer algebra system. We also worked on the error analysis of the basic operations for complex numbers in floating-point arithmetic. We proved a very sharp error bound for an algorithm for the inversion of a complex number in floating-point arithmetic. This result suggests that computing a complex division according to $x/y = (1/y) \cdot x$ may be preferred over the more classical formula $x/y = (x \cdot \bar{y})/|y|^2$. Indeed, for any complex multiplication algorithm, the error bound is smaller with the algorithms described by the "inverse and multiply" approach. This is joint work with my PhD advisors, in collaboration with Claude-Pierre Jeannerod (CR Inria in AriC, at LIP).
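The failure of the naive evaluation of ab + cd mentioned in this abstract can be reproduced with a small hand-picked input. The sketch below is an illustration under stated assumptions, not the thesis's analysis: it uses Python's binary64 floats, exact rational arithmetic from the standard `fractions` module as the reference, and input values chosen here so that the two rounded products cancel completely. The function names are hypothetical.

```python
from fractions import Fraction

def naive_ab_plus_cd(a, b, c, d):
    """Naive evaluation: two rounded products followed by a rounded sum."""
    return a * b + c * d

def exact_ab_plus_cd(a, b, c, d):
    """Exact value of ab + cd, computed with rational arithmetic."""
    return Fraction(a) * Fraction(b) + Fraction(c) * Fraction(d)

def relative_error(approx, exact):
    """Exact relative error of a float approximation to a nonzero rational."""
    return abs(Fraction(approx) - exact) / abs(exact)

# Inputs chosen (an assumption of this example) so that the rounding error
# of a*b is exactly the true result: a*b = 1 + 2^-26 + 2^-54 exactly, and
# the 2^-54 term is lost when the product is rounded to binary64.
a = b = 1.0 + 2.0 ** -27
c = -(1.0 + 2.0 ** -26)
d = 1.0

approx = naive_ab_plus_cd(a, b, c, d)
exact = exact_ab_plus_cd(a, b, c, d)
rel = relative_error(approx, exact)
print("naive result:", approx)
print("exact result:", float(exact))
print("relative error:", float(rel))
```

Here the naive result carries no correct digit at all, even though each individual operation is correctly rounded.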
64

Contribution à l'arithmétique des ordinateurs et applications aux systèmes embarqués / Contributions to computer arithmetic and applications to embedded systems

Brunie, Nicolas 16 May 2014
In recent decades, embedded systems have been challenged by an ever greater variety of applications under ever tighter constraints. This implies an ever-growing need for performance and energy efficiency in arithmetic units. This work studies solutions ranging from hardware to software, and the interactions between the two, to improve arithmetic support in embedded systems. Some of these solutions were integrated into Kalray's MPPA processor. The first part of this work focuses on floating-point arithmetic support in the MPPA. It starts with the design of a floating-point unit (FPU) based on the classical FMA (fused multiply-add) operator. The improvements we propose, implement and evaluate include a mixed-precision FMA, a 3-operand add and a 2D scalar product, each with a single rounding and support for subnormal numbers. It then considers the implementation of the other standardized floating-point primitives, division and square root. The FPU is reused and modified to optimize the software implementations of those primitives at a lower cost. Finally, this first part opens onto the development of a code generator designed for the implementation of highly optimized mathematical libraries in different contexts (architecture, accuracy, latency, throughput). The second part studies a reconfigurable coprocessor, a hardware operator that can be dynamically modified to adapt on the fly to various application needs. It aims to provide performance close to that of a dedicated hardware (ASIC) implementation while retaining some of the flexibility of software. One of the challenges addressed is the integration of such a reconfigurable coprocessor into the low-power embedded cluster of the MPPA. Another is the development of a software framework that targets the coprocessor and allows design space exploration. The last part of this work leaves microarchitecture considerations aside to study the efficient use of parallel arithmetic resources. It presents an improvement of regular Single Instruction Multiple Data (SIMD) architectures, like those found in graphics processing units (GPUs), for the execution of divergent control flow graphs.
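The benefit of performing a 3-operand add "with a single rounding" can be illustrated in software. The sketch below is an assumption-laden emulation in Python's binary64 arithmetic, not a model of the MPPA hardware: the single-rounding result is obtained by summing exactly with `fractions.Fraction` and rounding once on conversion to `float`, and the function names and operand values are chosen here for illustration.

```python
from fractions import Fraction

def add3_two_roundings(a, b, c):
    """Two ordinary floating-point additions: the result is rounded twice."""
    return (a + b) + c

def add3_single_rounding(a, b, c):
    """Emulated fused 3-operand add: exact sum, then one rounding to binary64."""
    return float(Fraction(a) + Fraction(b) + Fraction(c))

# Each small term is exactly half a unit in the last place of 1.0, so it is
# rounded away (ties-to-even) when added on its own, but the two together
# amount to one full unit in the last place.
a, b, c = 1.0, 2.0 ** -53, 2.0 ** -53

twice_rounded = add3_two_roundings(a, b, c)
once_rounded = add3_single_rounding(a, b, c)
print("two roundings :", twice_rounded.hex())
print("single rounding:", once_rounded.hex())
```

With these operands the two-step sum returns 1.0 while the single-rounding sum returns the next representable number above 1.0, which is the exact result.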
65

High-Level Language Programming Environment for Parallel Real-Time Telemetry Processor

LaPlante, John R., Barge, Steve G. 11 1900
International Telemetering Conference Proceedings / October 30-November 02, 1989 / Town & Country Hotel & Convention Center, San Diego, California / The difficulty of incorporating custom real-time processing into a conventional telemetry system frustrates many design engineers. Custom algorithms such as data compression/conversion, software decommutation, signal processing or sensitive defense-related algorithms are often executed on expensive mainframe computers during time-consuming post-processing. The cost of implementing such algorithms on real-time hardware is greater, because programming for such hardware is usually done in assembly language or microcode, resulting in: the need for specially trained software specialists; long and often unpredictable development time; poor maintainability; and non-portability to new applications or hardware. This paper presents an alternative to host-based, post-processing telemetry systems. The Loral System 500 offers an easy-to-use, high-level language programming environment that couples real-time performance with fast development time, portability and easy maintenance. Targeted to Weitek's XL-Series 32- and 64-bit floating-point processors, which deliver 20 MFLOPS peak performance, the environment transparently integrates the C programming environment with a parallel data-flow telemetry processing architecture. Supporting automatic human interface generation, symbolic high-level debugging and a complete floating-point math library, the System 500 programming environment extends to parallel execution transparently. It handles process scheduling, memory management and data conversion automatically. Configured to run under UNIX, the system's development environment is powerful and portable. The platform can be migrated to PCs and other hosts, facilitating eventual integration with an array of standard off-the-shelf tools.
67

Evaluation of Word Length Effects on Multistandard Soft Decision Viterbi Decoding

Salim, Ahmed January 2011
Many parity-inducing techniques, such as Forward Error Correction (FEC), have been proposed to mitigate, if not completely eliminate, channel-induced errors. Convolutional codes are widely recognized as among the most efficient of the known channel coding techniques. Decoding the convolutionally encoded data stream at the receiving node, however, can be complex, time-consuming and memory-inefficient. This thesis outlines the implementation of a multistandard soft-decision Viterbi decoder and the effects of word length on it. The classic Viterbi algorithm and its soft-decision variant are discussed, along with zero-tail and tail-biting termination of the trellis. For the final implementation in C, the zero-tail termination approach with soft-decision Viterbi decoding is adopted. This memory-efficient implementation is flexible for any code rate and any constraint length. The results are compared with a MATLAB reference decoder. Simulation results show the performance of the decoder and reveal the trade-off between finite word length and system performance. Such an investigation can be very beneficial for the hardware design of communication systems. It is of particular interest for the Viterbi algorithm because convolutional codes have been selected in several well-known standards such as WiMAX, EDGE, IEEE 802.11a, GPRS, WCDMA, GSM, CDMA 2000 and 3GPP-LTE.
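To make the combination of soft-decision decoding, zero-tail termination and finite word length concrete, here is a minimal Python sketch. It is not the thesis's multistandard C decoder: the rate-1/2, constraint-length-3 code with generators (7, 5) in octal, the uniform quantizer, the correlation branch metric and all function names are assumptions chosen for this example. The word length must be at least 2 bits for this quantizer.

```python
def parity(v):
    """Parity (XOR of all bits) of a non-negative integer."""
    return bin(v).count("1") & 1

def encode(bits, K=3, polys=(0b111, 0b101)):
    """Rate-1/len(polys) convolutional encoder with zero-tail termination."""
    state, out = 0, []
    for b in list(bits) + [0] * (K - 1):      # K-1 zeros flush the register
        reg = (b << (K - 1)) | state          # newest bit enters at the top
        out.extend(parity(reg & p) for p in polys)
        state = reg >> 1
    return out

def quantize(r, word_length):
    """Map a soft value (nominally in [-1, 1]) to a signed integer of the given width."""
    levels = (1 << (word_length - 1)) - 1
    return max(-levels, min(levels, round(r * levels)))

def viterbi_decode(soft, n_info, K=3, polys=(0b111, 0b101)):
    """Soft-decision Viterbi decoding; soft > 0 means a coded bit of 0 is likely."""
    n_states, n_out = 1 << (K - 1), len(polys)
    NEG = float("-inf")
    metric = [0] + [NEG] * (n_states - 1)     # the encoder starts in state 0
    history = []
    for t in range(len(soft) // n_out):
        r = soft[t * n_out:(t + 1) * n_out]
        new_metric, prev = [NEG] * n_states, [0] * n_states
        for s in range(n_states):
            if metric[s] == NEG:
                continue
            for b in (0, 1):
                reg = (b << (K - 1)) | s
                ns = reg >> 1
                # Correlation metric: reward agreement between the received
                # soft values and the branch's expected antipodal symbols.
                m = metric[s] + sum(q * (1 - 2 * parity(reg & p))
                                    for q, p in zip(r, polys))
                if m > new_metric[ns]:
                    new_metric[ns], prev[ns] = m, s
        history.append(prev)
        metric = new_metric
    s, bits = 0, []                           # zero tail: the path ends in state 0
    for prev in reversed(history):
        bits.append(s >> (K - 2))             # the input bit is the top state bit
        s = prev[s]
    bits.reverse()
    return bits[:n_info]

message = [1, 0, 1, 1, 0, 0, 1, 0]
coded = encode(message)
soft = [1.0 - 2.0 * c for c in coded]         # coded bit 0 -> +1.0, 1 -> -1.0
soft[5] = -soft[5]                            # one deliberately corrupted symbol
quantized = [quantize(r, 3) for r in soft]    # 3-bit soft inputs
decoded = viterbi_decode(quantized, len(message))
print(decoded == message)
```

Varying the `word_length` argument of `quantize` while adding channel noise to `soft` is one simple way to explore the kind of word-length trade-off the abstract describes.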
68

Complexity issues in counting, polynomial evaluation and zero finding

Briquel, Irénée 29 November 2011
In the present thesis, we compare classical boolean complexity with algebraic complexity by studying problems related to polynomials. We consider the algebraic models of Valiant and of Blum, Shub and Smale (BSS). To study the algebraic complexity classes, one can start from results and open questions from the boolean case and look at their translation into the algebraic context. Comparing the results obtained in the two settings then improves our understanding of both complexity theories. The first part follows this framework. By considering a polynomial canonically associated with a boolean formula, we obtain a link between boolean complexity issues on the formula and algebraic complexity problems on the polynomial. We studied the complexity of computing the polynomial in Valiant's model as a function of the complexity of the boolean formula. We found algebraic counterparts to some boolean results. Along the way, we could also use some algebraic methods to improve boolean results, in particular by obtaining better counting reductions. Another motivation for algebraic models of computation is to offer an elegant framework for the study of numerical algorithms. The second part of this thesis follows this approach. We started from new algorithms for finding approximate zeros of complex systems of n polynomials in n variables. Up to now, those were BSS machine algorithms. We studied the implementation of these algorithms on digital computers and propose an algorithm that uses floating-point arithmetic for this problem.
69

Analyses de terminaison des calculs flottants / Termination Analysis of Floating-Point Computations

Maurica Andrianampoizinimaro, Fonenantsoa 08 December 2017
The infamous Windows Blue Screen of Death is an apt introduction to the problem at hand. This bug is often caused by a non-terminating device driver: the program runs forever, blocking all the resources it has acquired for its computations. This thesis develops techniques for deciding, before runtime, whether a given program terminates for every possible value of its inputs. In particular, we are interested in programs that manipulate floating-point numbers. These numbers are ubiquitous in current processors and are used by nearly all software developers. Yet they are often misunderstood and hence a source of bugs. Indeed, floating-point computations are tainted with errors, because they are performed within a finite amount of memory. For example, although true in the reals, the equality 0.2 + 0.3 = 0.5 is false in the floats. If not handled properly, these errors can lead to catastrophic events, such as the Patriot missile incident that killed 28 people. The theories we develop are illustrated, and put to the test, by code snippets taken from widely used programs. Notably, we were able to exhibit termination bugs due to incorrect floating-point computations in some packages of the Ubuntu distribution.
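A floating-point termination bug of the general kind described here can be sketched with a loop guarded by exact equality. The example below is a hypothetical illustration in Python's binary64 floats, not a snippet from the thesis or from any Ubuntu package; the helper name `reaches_exactly` and its iteration cap are assumptions added so that the demonstration itself always terminates. Whether a particular decimal identity fails depends on the operands, the format and the rounding mode: under binary64, 0.1 + 0.2 == 0.3 evaluates to False, while 0.2 + 0.3 == 0.5 happens to evaluate to True.

```python
def reaches_exactly(target, step, max_iters=1000):
    """Return True if repeatedly adding `step` to 0.0 ever hits `target` exactly.

    Models a loop of the form `while x != target: x += step`, with an
    iteration cap so that this demonstration always terminates.
    """
    x = 0.0
    for _ in range(max_iters):
        if x == target:
            return True
        x += step
    return False

# 0.125 is a power of two, so every partial sum is exact and 1.0 is reached.
print(reaches_exactly(1.0, 0.125))

# 0.1 is not exactly representable in binary; the accumulated sum steps
# over 1.0 without ever being equal to it, so the uncapped loop would run on.
print(reaches_exactly(1.0, 0.1))

print(0.1 + 0.2 == 0.3)
```

Replacing the equality guard with an inequality such as `x < target` is the usual repair, at the cost of possibly running one iteration more or fewer than intended.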
70

Methods to evaluate accuracy-energy trade-off in operator-level approximate computing / Méthodes d'évaluation du compromis précision-énergie pour le calcul approximatif niveau opérateur

Barrois, Benjamin 11 December 2017
As the physical limits of silicon-based computing are being reached, new ways must be found to overcome the predicted end of Moore's law. Many applications can tolerate approximations in their computations at several levels without degrading the quality of their output, or while degrading it in an acceptable way. This thesis focuses on approximate arithmetic architectures to seize this opportunity. First, a critical study of state-of-the-art approximate adders and multipliers is presented. Then, a model for fixed-point error propagation that leverages power spectral density is proposed, followed by a model for the propagation of the bitwise error rate of approximate operators. Approximate operators are then used to reproduce voltage over-scaling (VOS) effects in exact arithmetic operators. Using our open-source framework ApxPerf and its synthesizable template-based C++ libraries, apx_fixed for approximate operators and ct_float for low-power floating-point arithmetic, two consecutive studies based on complex signal processing applications are proposed. In the first, approximate operators are compared to fixed-point arithmetic, and the superiority of fixed-point is highlighted. In the second, fixed-point is compared to small-width floating-point under equivalent conditions. Depending on the application conditions, floating-point proves unexpectedly competitive with fixed-point. The results and discussions of this thesis offer a fresh look at approximate arithmetic and suggest new directions for the future of energy-efficient architectures.
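The comparison between fixed-point and small-width floating-point can be given some intuition with a toy quantization model. The Python sketch below is not ApxPerf, apx_fixed or ct_float, and it models only rounding error, not energy, area, overflow or exponent range; the function names and the 8-bit widths are assumptions made for illustration. It shows the basic contrast: fixed-point keeps a constant absolute error, while floating-point keeps a roughly constant relative error.

```python
import math

def to_fixed(x, frac_bits):
    """Round x to the nearest multiple of 2**-frac_bits (no overflow modelled)."""
    scale = 2 ** frac_bits
    return round(x * scale) / scale

def to_small_float(x, mant_bits):
    """Round x to mant_bits significant bits (unbounded exponent range assumed)."""
    if x == 0:
        return 0.0
    m, e = math.frexp(x)                  # x = m * 2**e with 0.5 <= |m| < 1
    scale = 2 ** mant_bits
    return math.ldexp(round(m * scale) / scale, e)

for value in (0.001, 1.7, 100.3):
    fx = to_fixed(value, 8)
    fl = to_small_float(value, 8)
    print(value,
          "fixed abs err:", abs(fx - value),
          "float abs err:", abs(fl - value))
```

For small magnitudes the fixed-point value collapses to zero while the small float stays relatively accurate; for large magnitudes the situation reverses. This is one reason the outcome of such a comparison depends on the application's signal range.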
