Global ETD Search

61	Minimization Of Power Dissipation In Digital Circuits Using Pipelining And A Study Of Clock Gating Technique Panchangam, Ranganath 01 January 2004 (has links) Power dissipation is one of the major design issues of digital circuits. The power dissipated by a circuit affects its speed and performance. Multiplier is one of the most commonly used circuits in the digital devices. There are various types of multipliers available depending upon the application in which they are used. In the present thesis report, the importance of power dissipation in today's digital technology is discussed and the various types and sources of power dissipation have been elaborated. Different types of multipliers have been designed which vary in their structure and amount of power dissipation. The concept of pipelining is explained and the reduction in the power dissipation of the multipliers after pipelining is experimentally determined. Clock gating is a very important technique used in the design of digital circuits to reduce power dissipation. Various types of clock gating techniques have been presented as a case study. The technology used in the simulation of these circuits is 0.35µm CMOS and the simulator used is SPECTRE S. Dissertations Academic Clock gating Digital circuits Pipelining Power dissipation Electrical and Computer Engineering Engineering
62	Traitement des signaux et images en temps réel : "implantation de H.264 sur MPSoC" Messaoudi, Kamel 19 December 2012 (has links) Cette thèse est élaborée en cotutelle entre l’université Badji Mokhtar (Laboratoire LERICA) et l’université de bourgogne (Laboratoire LE2I, UMR CNRS 5158). Elle constitue une contribution à l’étude et l’implantation de l’encodeur H.264/AVC. Durent l’évolution des normes de compression vidéo, une réalité sure est vérifiée de plus en plus : avoir une bonne performance du processus de compression nécessite l’élaboration d’équipements beaucoup plus performants en termes de puissance de calcul, de flexibilité et de portabilité et ceci afin de répondre aux exigences des différents traitements et satisfaire au critère « Temps Réel ». Pour assurer un temps réel pour ce genre d’applications, une solution reste possible est l’utilisation des systèmes sur puce (SoC) ou bien des systèmes multiprocesseurs sur puce (MPSoC) implantés sur des plateformes reconfigurables à base de circuit FPGA. L’objective de cette thèse consiste à l’étude et l’implantation des algorithmes de traitement des signaux et images et en particulier la norme H.264/AVC, et cela dans le but d’assurer un temps réel pour le cycle codage-décodage. Nous utilisons deux plateformes FPGA de Xilinx (ML501 et XUPV5). Dans la littérature, il existe déjà plusieurs implémentations du décodeur. Pour l’encodeur, malgré les efforts énormes réalisés, il reste toujours du travail pour l’optimisation des algorithmes et l’extraction des parallélismes possibles surtout avec une variété de profils et de niveaux de la norme H.264/AVC.Dans un premier temps de cette thèse, nous proposons une implantation matérielle d’un contrôleur mémoire spécialement pour l’encodeur H.264/AVC. Ce contrôleur est réalisé en ajoutant, au contrôleur mémoire DDR2 des deux plateformes de Xilinx, une couche intelligente capable de calculer les adresses et récupérer les données nécessaires pour les différents modules de traitement de l’encodeur. Ensuite, nous proposons des implantations matérielles (niveau RTL) des modules de traitement de l’encodeur H.264. Sur ces implantations, nous allons exploiter les deux principes de parallélisme et de pipelining autorisé par l’encodeur en vue de la grande dépendance inter-blocs. Nous avons ainsi proposé plusieurs améliorations et nouvelles techniques dans les modules de la chaine Intra et le filtre anti-blocs. A la fin de cette thèse, nous utilisons les modules réalisés en matériels pour la l’implantation Matérielle/logicielle de l’encodeur H.264/AVC. Des résultats de synthèse et de simulation, en utilisant les deux plateformes de Xilinx, sont montrés et comparés avec les autres implémentations existantes / This thesis has been carried out in joint supervision between the Badji Mokhtar University (LERICA Laboratory) and the University of Burgundy (LE2I laboratory, UMR CNRS 5158). It is a contribution to the study and implementation of the H.264/AVC encoder. The evolution in video coding standards have historically demanded stringent performances of the compression process, which imposes the development of platforms that perform much better in terms of computing power, flexibility and portability. Such demands are necessary to fulfill requirements of the different treatments and to meet "Real Time" processing constraints. In order to ensure real-time performances, a possible solution is to made use of systems on chip (SoC) or multiprocessor systems on chip (MPSoC) built on platforms based reconfigurable FPGAs. The objective of this thesis is the study and implementation of algorithms for signal and image processing (in particular the H.264/AVC standard); especial attention was given to provide real-time coding-decoding cycles. We use two FPGA platforms (ML501 and XUPV5 from Xilinx) to implement our architectures. In the literature, there are already several implementations of the decoder. For the encoder part, despite the enormous efforts made, work remains to optimize algorithms and extract the inherent parallelism of the architecture. This is especially true with a variety of profiles and levels of H.264/AVC. Initially, we proposed a hardware implementation of a memory controller specifically targeted to the H.264/AVC encoder. This controller is obtained by adding, to the DDR2 memory controller, an intelligent layer capable of calculating the addresses and to retrieve the necessary data for several of the processing modules of the encoder. Afterwards, we proposed hardware implementations (RTL) for the processing modules of the H.264 encoder. In these implementations, we made use of principles of parallelism and pipelining, taking into account the constraints imposed by the inter-block dependency in the encoder. We proposed several enhancements and new technologies in the channel Intra modules and the deblocking filter. At the end of this thesis, we use the modules implemented in hardware for implementing the H.264/AVC encoder in a hardware/software design. Synthesis and simulation results, using both platforms for Xilinx, are shown and compared with other existing implementations Implantation matérielle Codesign L’encodeur H.264/AVC Temps réel Gestion de mémoire Contrôleur mémoire Parallélisme Pipelining SoC MPSoC FPGA Plateforme de prototypage Xilinx ML501 XUPV5 Hardware implementation Codesign H.264/AVC encoder Real time Memory management Memory controller Parallelism Pipelining SoC MPSoC FPGA Prototyping platform Xilinx ML501 XUPV5 005.1 006.6 621.38
63	Hardware bidirectional real time motion estimator on a Xilinx Virtex II Pro FPGA Iqbal, Rashid January 2006 (has links) <p>This thesis describes the implementation of a real-time, full search, 16x16 bidirectional motion estimation at 24 frames per second with the record performance of 155 Gop/s (1538 ops/pixel) at a high clock rate of 125 MHz. The core of bidirectional motion estimation uses close to 100% FPGA resources with 7 Gbit/s bandwidth to external memory. The architecture allows extremely controlled, macro level floor-planning with parameterized block size, image size, placement coordinates and data words length. The FPGA chip is part of the board that was developed at the Institute of Computer & Communication Networking Engineering, Technical University Braunschweig Germany, in collaboration with Grass Valley Germany in the FlexFilm research project. The goal of the project was to develop hardware and programming methodologies for real-time digital film image processing. Motion estimation core uses FlexWAFE reconfigurable architecture where FPGAs are configured using macro components that consist of weakly programmable address generation units and data stream processing units. Bidirectional motion estimation uses two cores of motion estimation engine (MeEngine) forming main data processing unit for backward and forward motion vectors. The building block of the core of motion estimation is an RPM-macro which represents one processing element and performs 10-bit difference, a comparison, and 19-bit accumulation on the input pixel streams. In order to maximize the throughput between elements, the processing element is replicated and precisely placed side-by-side by using four hierarchal levels, where each level is a very compact entity with its own local control and placement methodology. The achieved speed was further improved by regularly inserting pipeline stages in the processing chain.</p> Bidirectional motion estimation FPGA Relationally placed macro CLB slice tristate buffers comparator pipelining search upper search lower VHDL Xilinx block matching MeEngine LMC CMC. sum of absolute differences systolic array SARow PE2X8 MeProC 125 MHz Virtex II Pro Electronics Elektronik
64	Hardware bidirectional real time motion estimator on a Xilinx Virtex II Pro FPGA Iqbal, Rashid January 2006 (has links) This thesis describes the implementation of a real-time, full search, 16x16 bidirectional motion estimation at 24 frames per second with the record performance of 155 Gop/s (1538 ops/pixel) at a high clock rate of 125 MHz. The core of bidirectional motion estimation uses close to 100% FPGA resources with 7 Gbit/s bandwidth to external memory. The architecture allows extremely controlled, macro level floor-planning with parameterized block size, image size, placement coordinates and data words length. The FPGA chip is part of the board that was developed at the Institute of Computer & Communication Networking Engineering, Technical University Braunschweig Germany, in collaboration with Grass Valley Germany in the FlexFilm research project. The goal of the project was to develop hardware and programming methodologies for real-time digital film image processing. Motion estimation core uses FlexWAFE reconfigurable architecture where FPGAs are configured using macro components that consist of weakly programmable address generation units and data stream processing units. Bidirectional motion estimation uses two cores of motion estimation engine (MeEngine) forming main data processing unit for backward and forward motion vectors. The building block of the core of motion estimation is an RPM-macro which represents one processing element and performs 10-bit difference, a comparison, and 19-bit accumulation on the input pixel streams. In order to maximize the throughput between elements, the processing element is replicated and precisely placed side-by-side by using four hierarchal levels, where each level is a very compact entity with its own local control and placement methodology. The achieved speed was further improved by regularly inserting pipeline stages in the processing chain. Bidirectional motion estimation FPGA Relationally placed macro CLB slice tristate buffers comparator pipelining search upper search lower VHDL Xilinx block matching MeEngine LMC CMC. sum of absolute differences systolic array SARow PE2X8 MeProC 125 MHz Virtex II Pro Electronics Elektronik
65	Traitement des signaux et images en temps réel : "implantation de H.264 sur MPSoC" Messaoudi, Kamel 19 December 2012 (has links) (PDF) Cette thèse est élaborée en cotutelle entre l'université Badji Mokhtar (Laboratoire LERICA) et l'université de bourgogne (Laboratoire LE2I, UMR CNRS 5158). Elle constitue une contribution à l'étude et l'implantation de l'encodeur H.264/AVC. Durent l'évolution des normes de compression vidéo, une réalité sure est vérifiée de plus en plus : avoir une bonne performance du processus de compression nécessite l'élaboration d'équipements beaucoup plus performants en termes de puissance de calcul, de flexibilité et de portabilité et ceci afin de répondre aux exigences des différents traitements et satisfaire au critère " Temps Réel ". Pour assurer un temps réel pour ce genre d'applications, une solution reste possible est l'utilisation des systèmes sur puce (SoC) ou bien des systèmes multiprocesseurs sur puce (MPSoC) implantés sur des plateformes reconfigurables à base de circuit FPGA. L'objective de cette thèse consiste à l'étude et l'implantation des algorithmes de traitement des signaux et images et en particulier la norme H.264/AVC, et cela dans le but d'assurer un temps réel pour le cycle codage-décodage. Nous utilisons deux plateformes FPGA de Xilinx (ML501 et XUPV5). Dans la littérature, il existe déjà plusieurs implémentations du décodeur. Pour l'encodeur, malgré les efforts énormes réalisés, il reste toujours du travail pour l'optimisation des algorithmes et l'extraction des parallélismes possibles surtout avec une variété de profils et de niveaux de la norme H.264/AVC.Dans un premier temps de cette thèse, nous proposons une implantation matérielle d'un contrôleur mémoire spécialement pour l'encodeur H.264/AVC. Ce contrôleur est réalisé en ajoutant, au contrôleur mémoire DDR2 des deux plateformes de Xilinx, une couche intelligente capable de calculer les adresses et récupérer les données nécessaires pour les différents modules de traitement de l'encodeur. Ensuite, nous proposons des implantations matérielles (niveau RTL) des modules de traitement de l'encodeur H.264. Sur ces implantations, nous allons exploiter les deux principes de parallélisme et de pipelining autorisé par l'encodeur en vue de la grande dépendance inter-blocs. Nous avons ainsi proposé plusieurs améliorations et nouvelles techniques dans les modules de la chaine Intra et le filtre anti-blocs. A la fin de cette thèse, nous utilisons les modules réalisés en matériels pour la l'implantation Matérielle/logicielle de l'encodeur H.264/AVC. Des résultats de synthèse et de simulation, en utilisant les deux plateformes de Xilinx, sont montrés et comparés avec les autres implémentations existantes [INFO:INFO_OH] Computer Science/Other [INFO:INFO_OH] Informatique/Autre [SPI:OTHER] Engineering Sciences/Other Implantation matérielle Codesign L'encodeur H.264/AVC Temps réel Gestion de mémoire Contrôleur mémoire Parallélisme Pipelining SoC MPSoC FPGA Plateforme de prototypage Xilinx ML501 XUPV5
66	Chemnitzer Linux-Tage 2012: Tagungsband – 17. und 18. März 2012 Schöner, Axel, Meier, Wilhelm, Kubieziel, Jens, Berger, Uwe, Götz, Sebastian, Leuthäuser, Max, Piechnick, Christian, Reimann, Jan, Richly, Sebastian, Schroeter, Julia, Wilke, Claas, Aßmann, Uwe, Schütz, Georg, Kastrup, David, Lang, Jens, Luithardt, Wolfram, Gachet, Daniel, Nasrallah, Olivier, Kölbel, Cornelius, König, Harald, Wachtler, Axel, Wunsch, Jörg, Vorwerk, Matthias, Knopper, Klaus, Kramer, Frederik, Jamous, Naoum 20 April 2012 (has links) Die Chemnitzer Linux-Tage sind eine Veranstaltung rund um das Thema Open Source. Im Jahr 2012 wurden 104 Vorträge und Workshops gehalten. Der Band enthält ausführliche Beiträge zu 14 Hauptvorträgen sowie Zusammenfassungen zu 90 weiteren Vorträgen. / The "Chemnitz Linux Days" is a conference that deals with Linux and Open Source Software. In 2012 104 talks and workshops were given. This volume contains papers of 14 main lectures and 90 abstracts. info:eu-repo/classification/ddc/004 ddc:004
67	Reducing Power in FPGA Designs Through Glitch Reduction Rollins, Nathaniel Hatley 27 February 2007 (has links) (PDF) While FPGAs provide flexibility for performing high performance DSP functions, they consume a significant amount of power. Often, a large portion of the dynamic power is wasted on unproductive signal glitches. Reducing glitching reduces dynamic energy consumption. In this study, retiming is used to reduce the unproductive energy wasted in signal glitches. Retiming can reduce energy by up to 92%. Evaluating energy consumption is an important part of energy reduction. In this work, an activity rate-based power estimation tool is introduced to provide FPGA architecture independent energy estimations at the gate level. This tool can accurately estimate power consumption to within 13% on average. This activation rate-based tool and retiming are combined in a single algorithm to reduce energy consumption of FPGA designs at the gate level. In this work, an energy evaluation metric called energy area delay is used to weigh the energy reduction and clock rate improvements gained from retiming against the area and latency costs. For a set of benchmark designs, the algorithm that combines retiming and the activation rate-based power estimator reduces power on average by 40% and improves clock rate by 54% for an average 1.1x area cost and a 1.5x latency increase. FPGA digital design glitch reduction power estimation power reduction RPower JPower XPower activity-rate based power estimation pipelining retiming energy area delay energy metrics probability model transition model Electrical and Computer Engineering
68	Kompiliatorių optimizavimas IA-64 architektūroje / Compiler optimizations on ia-64 architecture Valiukas, Tadas 01 July 2014 (has links) Tradicinės x86 architektūros spartinimui artėjant prie galimybių ribos, kompanija Intel pradėjo kurti naują IA-64 architektūrą, paremtą EPIC – išreikštinai lygiagrečiai vykdomomis instrukcijomis vieno takto metu. Ši pagrindinė savybė leidžia vykdyti iki šešių instrukcijų per vieną taktą. Taipogi architektūra pasižymi tokiomis savybėmis, kurios leido efektyviai spręsti su kodo optimizavimu susijusias problemas tradicinėse architektūrose. Tačiau kompiliatorių optimizavimo algoritmai ilgą laiką buvo tobulinami tradicinėse architektūrose, todėl norint išnaudoti naująją architektūrą, reikia ieškoti būdų tobulinti esamus kompiliatorius. Vienas iš būdų – kompiliatoriaus vidinių parametrų atsakingų už optimizacijas reikšmių pritaikymas IA-64. Būtent toks yra šio darbo tikslas, kuriam pasiekti reikia išnagrinėti IA-64 savybes, jas vėliau eksperimentiškai taikyti realaus kodo pavyzdžiuose bei įvertinti jų įtaką kodo vykdymo spartai. Pagal gautus rezultatus nagrinėjami kompiliatoriaus vidiniai parametrai ir su specialia kompiliatorių testavimo programa randamas geriausias reikšmių rinkinys šiai architektūrai. Vėliau šis rinkinys išbandomas su taikomosiomis programomis. Gauto parametrų rinkinio reikšmės turėtų leisti generuoti efektyvesnį kodą IA-64 architektūrai. / After performance optimization of traditional architectures began to reach their limits, Intel corporation started to develop new architecture based on EPIC – Explicitly Parallel Instruction Counting. This main feature allowed up to six instructions to be executed in single CPU cycle. Also this architecture includes more features, which allowed efficient solution of traditional architectures code optimization problems. However for long time code optimization algorithms have been improved for traditional architectures only, as a result those algorithms should be adopted to new architecture. One of the ways to do that – exploration of internal compilers parameters, which are responsible for code optimizations. That is the primary target of this work and in order to reach it the features of the IA-64 architecture and impact to execution performance must be explored using real-life code examples. Tests results may be used later for internal parameters selection and further exploration of these parameters values by using special compiler performance testing benchmarks. The set of those new values could be tested with real life applications in order to prove efficiency of IA-64 architecture features. IA-64 architektūra Itanium Predikacija Išankstinis duomenų užkrovimas Ciklų optimizavimas Valdymo spėjimas IA-64 architecture VLIW – Very Long Instruction Word Predication Control and data speculation Software pipelining Branch prediction Prefetching RSE – rotating register engine Gcc optimization
69	Calcul flottant haute performance sur circuits reconfigurables / High-performance floating-point computing on reconfigurable circuits Pasca, Bogdan Mihai 21 September 2011 (has links) De plus en plus de constructeurs proposent des accélérateurs de calculs à base de circuits reconfigurables FPGA, cette technologie présentant bien plus de souplesse que le microprocesseur. Valoriser cette flexibilité dans le domaine de l'accélération de calcul flottant en utilisant les langages de description de circuits classiques (VHDL ou Verilog) reste toutefois très difficile, voire impossible parfois. Cette thèse a contribué au développement du logiciel FloPoCo, qui offre aux utilisateurs familiers avec VHDL un cadre C++ de description d'opérateurs arithmétiques génériques adapté au calcul reconfigurable. Ce cadre distingue explicitement la fonctionnalité combinatoire d'un opérateur, et la problématique de son pipeline pour une précision, une fréquence et un FPGA cible donnés. Afin de pouvoir utiliser FloPoCo pour concevoir des opérateurs haute performance en virgule flottante, il a fallu d'abord concevoir des blocs de bases optimisés. Nous avons d'abord développé des additionneurs pipelinés autour des lignes de propagation de retenue rapides, puis, à l'aide de techniques de pavages, nous avons conçu de gros multiplieurs, possiblement tronqués, utilisant des petits multiplieurs. L'évaluation de fonctions élémentaires en flottant implique souvent l'évaluation en virgule fixe d'une fonction. Nous présentons un opérateur générique de FloPoCo qui prend en entrée l'expression de la fonction à évaluer, avec ses précisions d'entrée et de sortie, et construit un évaluateur polynomial optimisé de cette fonction. Ce bloc de base a permis de développer des opérateurs en virgule flottante pour la racine carrée et l'exponentielle qui améliorent considérablement l'état de l'art. Nous avons aussi travaillé sur des techniques de compilation avancée pour adapter l'exécution d'un code C aux pipelines flexibles de nos opérateurs. FloPoCo a pu ainsi être utilisé pour implanter sur FPGA des applications complètes. / Due to their potential performance and unmatched flexibility, FPGA-based accelerators are part of more and more high-performance computing systems. However, exploiting this flexibility for accelerating floating-point computations by manually using classical circuit description languages (VHDL or Verilog) is very difficult, and sometimes impossible. This thesis has contributed to the development of the FloPoCo software, a C++ framework for describing flexible FPGA-specific arithmetic operators. This framework explicitly separates the description of the combinatorial functionality of an arithmetic operator, and its pipelining for a given precision, operating frequency and target FPGA.In order to be able to use FloPoCo for designing high performance floating-point operators, we first had to design the optimized basic blocks. We first developed pipelined addition architectures exploiting the fast-carry lines present in modern FPGAs. Next, we focused on multiplication architectures. Using tiling techniques, we proposed novel architectures for large multipliers, but also truncated multipliers, based on the multipliers found in modern FPGA DSP blocks. We also present a generic FloPoCo operator which inputs the expression of a function, its input and output precisions, and builds an optimized polynomial evaluator for the fixed-point evaluation of this function. Using this building block we have designed floating-point operators for the square-root and exponential functions which significantly outperform existing operators. Finally, we also made use of advanced compilation techniques for adapting the execution of a C program to the flexible pipelines of our operators. FPGA Virgule flottante FloPoCo Chemin de données arithmétique Pipeline pour une fréquence donnée Addition pipelinée Additionneur rapide Multiplication Karatsuba-Offman Carré Multiplieur tronqué Multiplication par pavage Virgule fixe Approximation polynomiale Racine carrée flottante Exponentielle flottante Accumulation flottante Schéma d'évaluation de Horner Somme de carrés flottante Synthèse de haut niveau Nid de boucles parfait Multiplication de matrices Jacobi Dilemme du fabricant de table Méthode des différences tabulées Communications pipelinées FPGA Floating-point FloPoCo Arithmetic datapath Frequency-driven pipelining Pipelined addition Short-latency adder Multiplication Karatsuba-Offman Squarer Truncated multiplier Multiplication tiling Fixed-point Polynomial approximation Floating-point square root Floating-point exponential Floating-point accumulation Horner datapath Floating-point sum-of-products High-level synthesis Perfect loop nests Matrix-matrix multiply Jacobi stencil Table maker's dilemma Pipelined communications Pipelined communications

Search results