• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 51
  • 10
  • 6
  • 5
  • 3
  • 3
  • 2
  • 1
  • 1
  • Tagged with
  • 103
  • 103
  • 103
  • 36
  • 27
  • 25
  • 21
  • 21
  • 20
  • 19
  • 19
  • 19
  • 17
  • 16
  • 16
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
81

A design flow to automatically Generate on chip monitors during high-level synthesis of Hardware accelarators / Un flot de conception pour générer automatiquement des moniteurs sur puce pendant la synthèse de haut niveau d'accélérateurs matériels

Ben Hammouda, Mohamed 11 December 2014 (has links)
Les systèmes embarqués sont de plus en plus utilisés dans des domaines divers tels que le transport, l’automatisation industrielle, les télécommunications ou la santé pour exécuter des applications critiques et manipuler des données sensibles. Ces systèmes impliquent souvent des intérêts financiers et industriels, mais aussi des vies humaines ce qui impose des contraintes fortes de sûreté. Par conséquent, un élément clé réside dans la capacité de tels systèmes à répondre correctement quand des erreurs se produisent durant l’exécution et ainsi empêcher des comportements induits inacceptables. Les erreurs peuvent être d’origines naturelles telles que des impacts de particules, du bruit interne (problème d’intégrité), etc. ou provenir d’attaques malveillantes. Les architectures de systèmes embarqués comprennent généralement un ou plusieurs processeurs, des mémoires, des contrôleurs d’entrées/sorties ainsi que des accélérateurs matériels utilisés pour améliorer l’efficacité énergétique et les performances. Avec l’évolution des applications, le cycle de conception d’accélérateurs matériels devient de plus en plus complexe. Cette complexité est due en partie aux spécifications des accélérateurs matériels qui reposent traditionnellement sur l’écriture manuelle de fichiers en langage de description matérielle (HDL).Cependant, la synthèse de haut niveau (HLS) qui favorise la génération automatique ou semi-automatique d’accélérateurs matériels à partir de spécifications logicielles, comme du code C, permet de réduire cette complexité.Le travail proposé dans ce manuscrit cible l’intégration d’un support de vérification dans les outils de HLS pour générer des moniteurs sur puce au cours de la synthèse de haut niveau des accélérateurs matériels. Trois contributions distinctes ont été proposées. La première contribution consiste à contrôler les erreurs de comportement temporel des entrées/sorties (impactant la synchronisation avec le reste du système) ainsi que les erreurs du flot de contrôle (sauts illégaux ou problèmes de boucles infinies). La synthèse des moniteurs est automatique sans qu’aucune modification de la spécification utilisée en entrée de la HLS ne soit nécessaire. La deuxième contribution vise la synthèse des propriétés de haut niveau (ANSI-C asserts) qui ont été ajoutées dans la spécification logicielle de l’accélérateur matériel. Des options de synthèse ont été proposées pour arbitrer le compromis entre le surcout matériel, la dégradation de la performance et le niveau de protection. La troisième contribution améliore la détection des corruptions des données qui peuvent modifier les valeurs stockées, et/ou modifier les transferts de données, sans violer les assertions (propriétés) ni provoquer de sauts illégaux. Ces erreurs sont détectées en dupliquant un sous-ensemble des données du programme, limité aux variables les plus critiques. En outre, les propriétés sur l’évolution des variables d’induction des boucles ont été automatiquement extraites de la description algorithmique de l’accélérateur matériel. Il faut noter que l’ensemble des approches proposées dans ce manuscrit, ne s’intéresse qu’à la détection d’erreurs lors de l’exécution. La contreréaction c.à.d. la manière dont le moniteur réagit si une erreur est détectée n’est pas abordée dans ce document. / Embedded systems are increasingly used in various fields like transportation, industrial automation, telecommunication or healthcare to execute critical applications and manipulate sensitive data. These systems often involve financial and industrial interests but also human lives which imposes strong safety constraints.Hence, a key issue lies in the ability of such systems to respond safely when errors occur at runtime and prevent unacceptable behaviors. Errors can be due to natural causes such as particle hits as well as internal noise, integrity problems, but also due to malicious attacks. Embedded system architecture typically includes processor (s), memories, Input / Output interface, bus controller and hardware accelerators that are used to improve both energy efficiency and performance. With the evolution of applications, the design cycle of hardware accelerators becomes more and more complex. This complexity is partly due to the specification of hardware accelerators traditionally based on handwritten Hardware Description Language (HDL) files. However, High-Level Synthesis (HLS) that promotes automatic or semi-automatic generation of hardware accelerators according to software specification, like C code, allows reducing this complexity.The work proposed in this document targets the integration of verification support in HLS tools to generate On-Chip Monitors (OCMs) during the high-level synthesis of hardware accelerators (HWaccs). Three distinct contributions are proposed. The first one consists in checking the Input / Output timing behavior errors (synchronization with the whole system) as well as the control flow errors (illegal jumps or infinite loops). On-Chip Monitors are automatically synthesized and require no modification in their high-level specification. The second contribution targets the synthesis of high-level properties (ANSI-C asserts) that are added into the software specification of HWacc. Synthesis options are proposed to trade-off area overhead, performance impact and protection level. The third contribution improves the detection of data corruptions that can alter the stored values or/and modify the data transfers without causing assertions violations or producing illegal jumps. Those errors are detected by duplicating a subset of program’s data limited to the most critical variables. In addition, the properties over the evolution of loops induction variables are automatically extracted from the algorithmic description of HWacc. It should be noticed that all the proposed approaches, in this document, allow only detecting errors at runtime. The counter reaction i.e. the way how the HWacc reacts if an error is detected is out of scope of this work.
82

Implementation trade-offs for FGPA accelerators / Compromis pour l'implémentation d'accélérateurs sur FPGA

Deest, Gaël 14 December 2017 (has links)
L'accélération matérielle désigne l'utilisation d'architectures spécialisées pour effectuer certaines tâches plus vite ou plus efficacement que sur du matériel générique. Les accélérateurs ont traditionnellement été utilisés dans des environnements contraints en ressources, comme les systèmes embarqués. Cependant, avec la fin des règles empiriques ayant régi la conception de matériel pendant des décennies, ces quinze dernières années ont vu leur apparition dans les centres de calcul et des environnements de calcul haute performance. Les FPGAs constituent une plateforme d'implémentation commode pour de tels accélérateurs, autorisant des compromis subtils entre débit/latence, surface, énergie, précision, etc. Cependant, identifier de bons compromis représente un défi, dans la mesure où l'espace de recherche est généralement très large. Cette thèse propose des techniques de conception pour résoudre ce problème. Premièrement, nous nous intéressons aux compromis entre performance et précision pour la conversion flottant vers fixe. L'utilisation de l'arithmétique en virgule fixe au lieu de l'arithmétique flottante est un moyen efficace de réduire l'utilisation de ressources matérielles, mais affecte la précision des résultats. La validité d'une implémentation en virgule fixe peut être évaluée avec des simulations, ou en dérivant des modèles de précision analytiques de l'algorithme traité. Comparées aux approches simulatoires, les méthodes analytiques permettent une exploration plus exhaustive de l'espace de recherche, autorisant ainsi l'identification de solutions potentiellement meilleures. Malheureusement, elles ne sont applicables qu'à un jeu limité d'algorithmes. Dans la première moitié de cette thèse, nous étendons ces techniques à des filtres linéaires multi-dimensionnels, comme des algorithmes de traitement d'image. Notre méthode est implémentée comme une analyse statique basée sur des techniques de compilation polyédrique. Elle est validée en la comparant à des simulations sur des données réelles. Dans la seconde partie de cette thèse, on se concentre sur les stencils itératifs. Les stencils forment un motif de calcul émergeant naturellement dans de nombreux algorithmes utilisés en calcul scientifique ou dans l'embarqué. À cause de cette diversité, il n'existe pas de meilleure architecture pour les stencils de façon générale : chaque algorithme possède des caractéristiques uniques (intensité des calculs, nombre de dépendances) et chaque application possède des contraintes de performance spécifiques. Pour surmonter ces difficultés, nous proposons une famille d'architectures pour stencils. Nous offrons des paramètres de conception soigneusement choisis ainsi que des modèles analytiques simples pour guider l'exploration. Notre architecture est implémentée sous la forme d'un flot de génération de code HLS, et ses performances sont mesurées sur la carte. Comme les résultats le démontrent, nos modèles permettent d'identifier les solutions les plus intéressantes pour chaque cas d'utilisation. / Hardware acceleration is the use of custom hardware architectures to perform some computations faster or more efficiently than on general-purpose hardware. Accelerators have traditionally been used mostly in resource-constrained environments, such as embedded systems, where resource-efficiency was paramount. Over the last fifteen years, with the end of empirical scaling laws, they also made their way to datacenters and High-Performance Computing environments. FPGAs constitute a convenient implementation platform for such accelerators, allowing subtle, application-specific trade-offs between all performance metrics (throughput/latency, area, energy, accuracy, etc.) However, identifying good trade-offs is a challenging task, as the design space is usually extremely large. This thesis proposes design methodologies to address this problem. First, we focus on performance-accuracy trade-offs in the context of floating-point to fixed-point conversion. Usage of fixed-point arithmetic instead of floating-point is an affective way to reduce hardware resource usage, but comes at a price in numerical accuracy. The validity of a fixed-point implementation can be assessed using either numerical simulations, or with analytical models derived from the algorithm. Compared to simulation-based methods, analytical approaches enable more exhaustive design space exploration and can thus increase the quality of the final architecture. However, their are currently only applicable to limited sets of algorithms. In the first part of this thesis, we extend such techniques to multi-dimensional linear filters, such as image processing kernels. Our technique is implemented as a source-level analysis using techniques from the polyhedral compilation toolset, and validated against simulations with real-world input. In the second part of this thesis, we focus on iterative stencil computations, a naturally-arising pattern found in many scientific and embedded applications. Because of this diversity, there is no single best architecture for stencils: each algorithm has unique computational features (update formula, dependences) and each application has different performance constraints/requirements. To address this problem, we propose a family of hardware accelerators for stencils, featuring carefully-chosen design knobs, along with simple performance models to drive the exploration. Our architecture is implemented as an HLS-optimized code generation flow, and performance is measured with actual execution on the board. We show that these models can be used to identify the most interesting design points for each use case.
83

Stream Computing on FPGAs

Plavec, Franjo 01 September 2010 (has links)
Field Programmable Gate Arrays (FPGAs) are programmable logic devices used for the implementation of a wide range of digital systems. In recent years, there has been an increasing interest in design methodologies that allow high-level design descriptions to be automatically implemented in FPGAs. This thesis describes the design and implementation of a novel compilation flow that implements circuits in FPGAs from a streaming programming language. The streaming language supported is called FPGA Brook, and is based on the existing Brook and GPU Brook languages, which target streaming multiprocessors and graphics processing units (GPUs), respectively. A streaming language is suitable for targeting FPGAs because it allows system designers to express applications in a way that exposes parallelism, which can then be exploited through parallel hardware implementation. FPGA Brook supports replication, which allows the system designer to trade-off area for performance, by specifying the parts of an application that should be implemented as multiple hardware units operating in parallel, to achieve desired application throughput. Hardware units are interconnected through FIFO buffers, which effectively utilize the small memory modules available in FPGAs. The FPGA Brook design flow uses a source-to-source compiler, and combines it with a commercial behavioural synthesis tool to generate hardware. The source-to-source compiler was developed as a part of this thesis and includes novel algorithms for implementation of complex reductions in FPGAs. The design flow is fully automated and presents a user-interface similar to traditional software compilers. A suite of benchmark applications was developed in FPGA Brook and implemented using our design flow. Experimental results show that applications implemented using our flow achieve much higher throughput than the Nios II soft processor implemented in the same FPGA device. Comparison to the commercial C2H compiler from Altera shows that while simple applications can be effectively implemented using the C2H compiler, complex applications achieve significantly better throughput when implemented by our system. Performance of many applications implemented using our design flow would scale further if a larger FPGA device were used. The thesis demonstrates that using an automated design flow to implement streaming applications in FPGAs is a promising methodology.
84

Stream Computing on FPGAs

Plavec, Franjo 01 September 2010 (has links)
Field Programmable Gate Arrays (FPGAs) are programmable logic devices used for the implementation of a wide range of digital systems. In recent years, there has been an increasing interest in design methodologies that allow high-level design descriptions to be automatically implemented in FPGAs. This thesis describes the design and implementation of a novel compilation flow that implements circuits in FPGAs from a streaming programming language. The streaming language supported is called FPGA Brook, and is based on the existing Brook and GPU Brook languages, which target streaming multiprocessors and graphics processing units (GPUs), respectively. A streaming language is suitable for targeting FPGAs because it allows system designers to express applications in a way that exposes parallelism, which can then be exploited through parallel hardware implementation. FPGA Brook supports replication, which allows the system designer to trade-off area for performance, by specifying the parts of an application that should be implemented as multiple hardware units operating in parallel, to achieve desired application throughput. Hardware units are interconnected through FIFO buffers, which effectively utilize the small memory modules available in FPGAs. The FPGA Brook design flow uses a source-to-source compiler, and combines it with a commercial behavioural synthesis tool to generate hardware. The source-to-source compiler was developed as a part of this thesis and includes novel algorithms for implementation of complex reductions in FPGAs. The design flow is fully automated and presents a user-interface similar to traditional software compilers. A suite of benchmark applications was developed in FPGA Brook and implemented using our design flow. Experimental results show that applications implemented using our flow achieve much higher throughput than the Nios II soft processor implemented in the same FPGA device. Comparison to the commercial C2H compiler from Altera shows that while simple applications can be effectively implemented using the C2H compiler, complex applications achieve significantly better throughput when implemented by our system. Performance of many applications implemented using our design flow would scale further if a larger FPGA device were used. The thesis demonstrates that using an automated design flow to implement streaming applications in FPGAs is a promising methodology.
85

System-level design of power efficient FSMD architectures

Agarwal, Nainesh 06 May 2009 (has links)
Power dissipation in CMOS circuits is of growing concern as the computational requirements of portable, battery operated devices increases. The ability to easily develop application specific circuits, rather than program general-purpose architectures can provide tremendous power savings. To this end, we present a design platform for rapidly developing power efficient hardware architectures starting at a system level. This high level VLSI design platform, called CoDeL, allows hardware description at the algorithm level, and thus dramatically reduces design time and power dissipation. We compare the CoDeL platform to a modern DSP and find that the CoDeL platform produces designs with somewhat slower run times but dramatically lower power dissipation. The CoDeL compiler produces an FSMD (Finite State Machine with Datapath) implementation of the circuit. This regular structure can be exploited to further reduce power through various techniques. To reduce dynamic power dissipation in the resulting architecture, the CoDeL compiler automatically inserts clock gating for registers. Power analysis shows that CoDeL's automated, high-level clock gating provides considerably more power savings than existing automated clock gating tools. To reduce static power, we use the CoDeL platform to analyze the potential and performance impact of power gating individual registers. We propose a static gating method, with very low area overhead, which uses the information available to the CoDeL compiler to predict, at compile time, when the registers can be powered off and powered on. Static branch prediction is used to more intelligently traverse the finite state machine description of the circuit to discover gating opportunities. Using simulation and estimation, we find that CoDeL with backward branch prediction gives the best overall combination of gating potential and performance. Compared to a dynamic time-based technique, this method gives dramatically more power savings, without any additional performance loss. Finally, we propose techniques to efficiently partition a FSMD using Integer Linear Programming and a simulated annealing approach. The FSMD is split into two or more simpler communicating processors. These separate processors can then be clock gated or power gated to achieve considerable power savings since only one processor is active at any given time. Implementation and estimation shows that significant power savings can be expected, when the original machine is partitioned into two or more submachines.
86

Flot de conception pour l'ultra faible consommation : échantillonnage non-uniforme et électronique asynchrone / Design flow for ultra-low power : non-uniform sampling and asynchronous circuits

Simatic, Jean 07 December 2017 (has links)
Les systèmes intégrés sont souvent des systèmes hétérogènes avec des contraintes fortes de consommation électrique. Ils embarquent aujourd'hui des actionneurs, des capteurs et des unités pour le traitement du signal. Afin de limiter l'énergie consommée, ils peuvent tirer profit des techniques évènementielles que sont l'échantillonnage non uniforme et l'électronique asynchrone. En effet, elles permettent de réduire drastiquement la quantité de données échantillonnées pour de nombreuses classes de signaux et de diminuer l'activité. Pour aider les concepteurs à développer rapidement des plateformes exploitant ces deux techniques évènementielles, nous avons élaboré un flot de conception nommé ALPS. Il propose un environnement permettant de déterminer et de simuler au niveau algorithmique le schéma d'échantillonnage et les traitements associés afin de sélectionner les plus efficients en fonction de l'application ciblée. ALPS génère directement le convertisseur analogique/numérique à partir des paramètres d'échantillonnage choisis. L'élaboration de la partie de traitement s'appuie quant à elle sur un outil de synthèse de haut niveau synchrone et une méthode de désynchronisation exploitant des protocoles asynchrones spécifiques, capables d'optimiser la surface et la consommation du circuit. Enfin, des simulations au niveau porteslogiques permettent d'analyser et de valider l'énergie consommée avant de poursuivre par un flot classique de placement et routage. Les évaluations conduites montrent une réduction d'un facteur 3 à 8 de la consommation des circuits automatiquement générés. Le flot ALPS permet à un concepteur non-spécialiste de se concentrer sur l'optimisation de l'échantillonnage et de l'algorithme en fonction de l'application et de potentiellement réduire d'un ou plusieurs ordres de grandeur la consommation du circuit. / Integrated systems are mainly heterogeneous systems with strong powerconsumption constraints. They embed actuators, sensors and signalprocessing units. To limit the energy consumption, they can exploitevent-based techniques, namely non-uniform sampling and asynchronouscircuits. Indeed, they allow cutting drastically the amount of sampleddata for many types of signals and reducing the system activity. To helpdesigners in quickly developing platforms that exploit those event-basedtechniques, we elaborated a design framework called ALPS. It proposes anenvironment to determine and simulate at algorithmic level the samplingscheme and the associated processing in order to select the mostefficient ones depending on the targetted application. ALPS generatesdirectly the analog-to-digital converter based on the chosen samplingparameters. The elaboration of the processing unit uses a synchronoushigh-level synthesis tool and a desynchronization method that exploitsspecific asynchronous protocols to optimize the circuit area and powerconsumption. Finally, gate-level simulations allow analyzing andvalidating the energy consumption before continuing with a standardplacement and routing flow. The conducted evaluations show a reductionfactor of 3 to 8 of the consumption of the automatically generatedcirctuis. The flow ALPS allow non-specialists to concentrate on theoptimization of the sampling and the processing in function of theirapplication and to reduice the circuit power consumptions by one toseveral orders of magnitude.
87

Méthode de modélisation et de raffinement pour les systèmes hétérogènes. Illustration avec le langage System C-AMS / Study and development of a AMS design-flow in SytemC : semantic, refinement and validation

Paugnat, Franck 25 October 2012 (has links)
Les systèmes sur puces intègrent aujourd’hui sur le même substrat des parties analogiques et des unités de traitement numérique. Tandis que la complexité de ces systèmes s’accroissait, leur temps de mise sur le marché se réduisait. Une conception descendante globale et coordonnée du système est devenue indispensable de façon à tenir compte des interactions entre les parties analogiques et les partis numériques dès le début du développement. Dans le but de répondre à ce besoin, cette thèse expose un processus de raffinement progressif et méthodique des parties analogiques, comparable à ce qui existe pour le raffinement des parties numériques. L'attention a été plus particulièrement portée sur la définition des niveaux analogiques les plus abstraits et à la mise en correspondance des niveaux d’abstraction entre parties analogiques et numériques. La cohérence du raffinement analogique exige de détecter le niveau d’abstraction à partir duquel l’utilisation d’un modèle trop idéalisé conduit à des comportements irréalistes et par conséquent d’identifier l’étape du raffinement à partir de laquelle les limitations et les non linéarités aux conséquences les plus fortes sur le comportement doivent être introduites. Cette étape peut être d’un niveau d'abstraction élevé. Le choix du style de modélisation le mieux adapté à chaque niveau d'abstraction est crucial pour atteindre le meilleur compromis entre vitesse de simulation et précision. Les styles de modélisations possibles à chaque niveau ont été examinés de façon à évaluer leur impact sur la simulation. Les différents modèles de calcul de SystemC-AMS ont été catégorisés dans cet objectif. Les temps de simulation obtenus avec SystemC-AMS ont été comparés avec Matlab Simulink. L'interface entre les modèles issus de l'exploration d'architecture, encore assez abstraits, et les modèles plus fin requis pour l'implémentation, est une question qui reste entière. Une bibliothèque de composants électroniques complexes décrits en SystemC-AMS avec le modèle de calcul le plus précis (modélisation ELN) pourrait être une voie pour réussir une telle interface. Afin d’illustrer ce que pourrait être un élément d’une telle bibliothèque et ainsi démontrer la faisabilité du concept, un modèle d'amplificateur opérationnel a été élaboré de façon à être suffisamment détaillé pour prendre en compte la saturation de la tension de sortie et la vitesse de balayage finie, tout en gardant un niveau d'abstraction suffisamment élevé pour rester indépendant de toute hypothèse sur la structure interne de l'amplificateur ou la technologie à employer. / Systems on Chip (SoC) embed in the same chip analogue parts and digital processing units. While their complexity is ever increasing, their time to market is becoming shorter. A global and coordinated top-down design approach of the whole system is becoming crucial in order to take into account the interactions between the analogue and digital parts since the beginning of the development. This thesis presents a systematic and gradual refinement process for the analogue parts comparable to what exists for the digital parts. A special attention has been paid to the definition of the highest abstracted analogue levels and to the correspondence between the analogue and the digital abstraction levels. The analogue refinement consistency requires to detect the abstraction level where a too idealised model leads to unrealistic behaviours. Then the refinement step consist in introducing – for instance – the limitations and non-linearities that have a strong impact on the behaviour. Such a step can be done at a relatively high level of abstraction. Correctly choosing a modelling style, that suits well an abstraction level, is crucial to obtain the best trade-off between the simulation speed and the accuracy. The modelling styles at each abstraction level have been examined to understand their impact on the simulation. The SystemC-AMS models of computation have been classified for this purpose. The SystemC-AMS simulation times have been compared to that obtained with Matlab Simulink. The interface between models arisen from the architectural exploration – still rather abstracted – and the more detailed models that are required for the implementation, is still an open question. A library of complex electronic components described with the most accurate model of computation of SystemC-AMS (ELN modelling) could be a way to achieve such an interface. In order to show what should be an element of such a library, and thus prove the concept, a model of an operational amplifier has been elaborated. It is enough detailed to take into account the output voltage saturation and the finite slew rate of the amplifier. Nevertheless, it remains sufficiently abstracted to stay independent from any architectural or technological assumption.
88

Evaluating Vivado High-Level Synthesis on OpenCV Functions for the Zynq-7000 FPGA

Johansson, Henrik January 2015 (has links)
More complex and intricate Computer Vision algorithms combined with higher resolution image streams put bigger and bigger demands on processing power. CPU clock frequencies are now pushing the limits of possible speeds, and have instead started growing in number of cores. Most Computer Vision algorithms' performance respond well to parallel solutions. Dividing the algorithm over 4-8 CPU cores can give a good speed-up, but using chips with Programmable Logic (PL) such as FPGA's can give even more. An interesting recent addition to the FPGA family is a System on Chip (SoC) that combines a CPU and an FPGA in one chip, such as the Zynq-7000 series from Xilinx. This tight integration between the Programmable Logic and Processing System (PS) opens up for designs where C programs can use the programmable logic to accelerate selected parts of the algorithm, while still behaving like a C program. On that subject, Xilinx has introduced a new High-Level Synthesis Tool (HLST) called Vivado HLS, which has the power to accelerate C code by synthesizing it to Hardware Description Language (HDL) code. This potentially bridges two otherwise very separate worlds; the ever popular OpenCV library and FPGAs. This thesis will focus on evaluating Vivado HLS from Xilinx primarily with image processing in mind for potential use on GIMME-2; a system with a Zynq-7020 SoC and two high resolution image sensors, tailored for stereo vision.
89

Využití funkcionálních jazyků pro hardwarovou akceleraci / Hardware Acceleration Using Functional Languages

Hodaňová, Andrea January 2013 (has links)
The aim of this thesis is to research how the functional paradigm can be used for hardware acceleration with an emphasis on data-parallel tasks. The level of abstraction of the traditional hardware description languages, such as VHDL or Verilog, is becoming to low. High-level languages from the domains of software development and modeling, such as C/C++, SystemC or MATLAB, are experiencing a boom for hardware description on the algorithmic or behavioral level. Functional Languages are not so commonly used, but they outperform imperative languages in verification, the ability to capture inherent paralellism and the compactness of code. Data-parallel task are often accelerated on FPGAs, GPUs and multicore processors. In this thesis, we use a library for general-purpose GPU programs called Accelerate and extend it to produce VHDL. Accelerate is a domain-specific language embedded into Haskell with a backend for the NVIDIA CUDA platform. We use the language and its frontend, and create a new backend for high-level synthesis of circuits in VHDL.
90

Generation of Application Specific Hardware Extensions for Hybrid Architectures: The Development of PIRANHA - A GCC Plugin for High-Level-Synthesis

Hempel, Gerald 11 November 2019 (has links)
Architectures combining a field programmable gate array (FPGA) and a general-purpose processor on a single chip became increasingly popular in recent years. On the one hand, such hybrid architectures facilitate the use of application specific hardware accelerators that improve the performance of the software on the host processor. On the other hand, it obliges system designers to handle the whole process of hardware/software co-design. The complexity of this process is still one of the main reasons, that hinders the widespread use of hybrid architectures. Thus, an automated process that aids programmers with the hardware/software partitioning and the generation of application specific accelerators is an important issue. The method presented in this thesis neither requires restrictions of the used high-level-language nor special source code annotations. Usually, this is an entry barrier for programmers without deeper understanding of the underlying hardware platform. This thesis introduces a seamless programming flow that allows generating hardware accelerators for unrestricted, legacy C code. The implementation consists of a GCC plugin that automatically identifies application hot-spots and generates hardware accelerators accordingly. Apart from the accelerator implementation in a hardware description language, the compiler plugin provides the generation of a host processor interfaces and, if necessary, a prototypical integration with the host operating system. An evaluation with typical embedded applications shows general benefits of the approach, but also reveals limiting factors that hamper possible performance improvements.

Page generated in 0.1259 seconds