  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
51

Design and Multi-Technology Multi-objective Comparative Analysis of Families of MPSOC.

Wang, Zhoukun 12 November 2009 (has links) (PDF)
Multiprocessor systems on chip (MPSOC) have emerged strongly over the past decade in communication, multimedia, networking and other embedded domains, and have become a new paradigm for high-performance embedded application design. This thesis addresses the design and physical implementation of a Network on Chip (NoC) based multiprocessor system on chip. We studied several aspects at different design stages: high-level synthesis, architecture design, FPGA implementation, application evaluation and ASIC physical implementation. We analyze the impact of these aspects on the MPSOC's final performance, power consumption and area cost. We implemented a NoC-based 16-processor embedded system on an FPGA prototype. Three NoCs provide different functionalities for the sixteen PE tiles. We also demonstrate the use of our performance monitoring system for software debugging and tuning. With the bi-synchronous FIFO method, our GALS architecture solves the long clock signal distribution problem and allows each clock domain to run at its own clock frequency. We also implemented the AES and TDES block cipher algorithms on this platform, and the results show linear speedup in computation time. The network part of our architecture has been implemented in ASIC technology and explored with different timing constraints and different library categories of STMicroelectronics' 65 nm and 45 nm technologies. The experimental results for ASIC and FPGA are compared, and we discuss the impact of technology change on parallel programming.
52

Towards hardware synthesis of a flexible radio from a high-level language / Synthèse matérielle d'une radio flexible et reconfigurable depuis un langage de haut niveau dédié aux couches physiques radio

Tran, Mai-Thanh 13 November 2018 (has links)
La radio logicielle est une technologie prometteuse pour répondre aux exigences de flexibilité des nouvelles générations de standards de communication. Elle peut être facilement reprogrammée au niveau logiciel pour implémenter différentes formes d'onde. En s'appuyant sur une technologie dite logicielle telle que les microprocesseurs, cette approche est particulièrement flexible et assez facile à mettre en œuvre. Cependant, ce type de technologie conduit généralement à une faible capacité de calcul et, par conséquent, à des débit faibles. Pour résoudre ce problème, la technologie FPGA s'avère être une bonne alternative pour la mise en œuvre de la radio logicielle. En effet, les FPGAs offrent une puissance de calcul élevée et peuvent être reconfigurés. Ainsi, inclure des FPGAs dans le concept de radio logicielle peut permettre de prendre en charge plus de formes d'onde avec des exigences plus strictes qu'une approche basée sur la technologie logicielle. Cependant, les principaux inconvénients d’une conception à base de FPGAs sont le niveau du langage de description d'entrée qui doit typiquement être le niveau matériel, et le temps de reconfiguration qui peut dépasser les exigences d'exécution si le FPGA est entièrement reconfiguré. Pour surmonter ces problèmes, cette thèse propose une méthodologie de conception qui exploite à la fois la synthèse de haut niveau et la reconfiguration dynamique. La méthodologie proposée donne un cadre pour construire une radio flexible pour la radio logicielle à base de FPGAs et qui peut être reconfigurée pendant l'exécution. / Software defined radio (SDR) is a promising technology to tackle flexibility requirements of new generations of communication standards. It can be easily reprogrammed at a software level to implement different waveforms. When relying on a software-based technology such as microprocessors, this approach is clearly flexible and quite easy to design. 
However, it usually provides low computing capability and therefore low throughput. To tackle this issue, FPGA technology turns out to be a good alternative for implementing SDRs. Indeed, FPGAs offer both high computing power and reconfigurability. Thus, including FPGAs in the SDR concept may make it possible to support more waveforms with stricter requirements than a processor-based approach. However, the main drawbacks of FPGA design are the level of the input description language, which typically needs to be the hardware level, and the reconfiguration time, which may exceed run-time requirements if the complete FPGA is reconfigured. To overcome these issues, this PhD thesis proposes a design methodology that leverages both high-level synthesis tools and dynamic reconfiguration. The proposed methodology provides a framework for building a flexible radio for FPGA-based SDR that can be reconfigured at run-time.
53

Low Power Technology Mapping and Performance Driven Placement for Field Programmable Gate Arrays

Li, Hao 09 November 2004 (has links)
As technology geometries have shrunk to the deep sub-micron (DSM) region, the chip density and clock frequency of FPGAs have increased significantly. This makes computer-aided design (CAD) for FPGAs very important and challenging. Due to the increasing demands of portable devices and mobile computing, low-power design is now crucial in CAD. In this dissertation, we present a framework to optimize power consumption for technology mapping onto FPGAs. We propose a low-power technology mapping scheme that is able to predict the impact of choosing a subnetwork covering on the ultimate mapping solution. We dynamically update the power estimation for a sequence of options and choose the one that yields the least power consumption. This technique outperforms the best low-power mapping algorithms reported in the literature. We further extend this work to generate mapping solutions with optimal delay. We also propose placement algorithms to optimize the performance of the placed circuit. A net-cluster-based methodology is designed to ensure that closely connected nets are routed in the same region. Net clusters are obtained by clique partitioning of the net dependency graph. Net positions and the resulting cell positions are computed with a force-directed approach that drags connected nets closer together. We further study the performance-driven placement problem for high-level synthesis. We use the Automatic Design Instantiation (AUDI) high-level synthesis system to generate a register-transfer level (RTL) netlist. This RTL netlist is fed into a CAD tool for physical synthesis. We do not necessarily go through the entire physical design process, which is usually quite time-consuming. Instead, we have created an accurate wirelength/timing estimator that works on the floorplan. If the estimated timing information does not meet the constraints, guidance is generated and provided to the AUDI system. 
The guidance consists of the estimated timing information and instructions to produce a new netlist in order to improve the performance. Finally, the circuit is placed and routed once a satisfactory design is found. This performance-driven placement framework yields better results than a commercial CAD tool.
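Illustrative note (not part of the thesis record): the abstract above mentions a force-directed approach that pulls connected nets toward each other. The following is a minimal sketch of the general force-directed idea only, assuming a simplified model in which each movable node is repeatedly moved to the centroid of its neighbours while fixed nodes (for example I/O pads) stay in place. The function names, the node model and the example coordinates are hypothetical and are not taken from the dissertation.

```python
def force_directed_place(positions, edges, fixed, iterations=100):
    """Iteratively move each movable node to the centroid of its neighbours.

    positions: dict node -> (x, y) initial coordinates (hypothetical model)
    edges:     list of (node_a, node_b) connections
    fixed:     set of nodes that must not move (e.g. I/O pads)
    """
    pos = dict(positions)
    neighbours = {node: [] for node in pos}
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)

    for _ in range(iterations):
        new_pos = dict(pos)
        for node, nbrs in neighbours.items():
            if node in fixed or not nbrs:
                continue
            # Zero-force position for equal-weight springs: the centroid.
            new_pos[node] = (
                sum(pos[n][0] for n in nbrs) / len(nbrs),
                sum(pos[n][1] for n in nbrs) / len(nbrs),
            )
        pos = new_pos
    return pos


def wirelength(pos, edges):
    """Total Manhattan length of all two-point connections."""
    return sum(
        abs(pos[a][0] - pos[b][0]) + abs(pos[a][1] - pos[b][1])
        for a, b in edges
    )


# Two fixed pads and two movable nodes connected in a chain.
initial = {"padA": (0.0, 0.0), "padB": (10.0, 0.0), "x": (5.0, 8.0), "y": (1.0, 3.0)}
chain = [("padA", "x"), ("x", "y"), ("y", "padB")]
placed = force_directed_place(initial, chain, fixed={"padA", "padB"})
```

In this toy chain the movable nodes settle evenly between the two pads, and the total wirelength drops from 34 to about 10. A real placer would additionally have to legalize positions and avoid overlaps, which this sketch ignores.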
54

CHESS: A Tool for CDFG Extraction and High-Level Synthesis of VLSI Systems

Namballa, Ravi K 08 July 2003 (has links)
In this thesis, a new tool, named CHESS, is designed and developed for control and data-flow graph (CDFG) extraction and the high-level synthesis of VLSI systems. The tool consists of three individual modules for (i) CDFG extraction, (ii) scheduling and allocation of the CDFG, and (iii) binding, which are integrated to form a comprehensive high-level synthesis system. The first module, for CDFG extraction, includes a new algorithm in which certain compiler-level transformations are applied first, followed by a series of behavior-preserving transformations on the given VHDL description. Experimental results indicate that the proposed conversion tool is quite accurate and fast. The CDFG is fed to the second module, which schedules it for resource optimization under a given set of time constraints. The scheduling algorithm improves on the execution time of the Tabu Search based algorithm described in [6]. The improvement is achieved by moving the step of identifying mutually exclusive operations, which is normally done during scheduling, to the CDFG extraction phase. The last module of the proposed tool implements a new binding algorithm based on a game-theoretic approach. The problem of binding is formulated as a non-cooperative finite game, for which a Nash equilibrium function is applied to achieve a power-optimized binding solution. Experimental results for several high-level synthesis benchmarks are presented, which establish the efficacy of the proposed synthesis tool.
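Illustrative note (not part of the thesis record): the abstract above describes scheduling the operations of a CDFG. CHESS itself uses a Tabu Search based, time-constrained scheduler, which is not reproduced here. As a much simpler stand-in, the sketch below shows ASAP (as soon as possible) scheduling of a small data-flow graph, a standard baseline in high-level synthesis. The function name, the graph encoding and the operation names are assumptions made for this example.

```python
from collections import deque


def asap_schedule(ops, deps, latency=None):
    """Return the earliest start step of each operation in a data-flow graph.

    ops:     list of operation names
    deps:    list of (producer, consumer) data dependencies
    latency: optional dict op -> number of steps it occupies (default 1)
    """
    latency = latency or {}
    preds = {op: [] for op in ops}
    succs = {op: [] for op in ops}
    for src, dst in deps:
        preds[dst].append(src)
        succs[src].append(dst)

    # Kahn-style topological traversal: an op is ready once all of its
    # producers have been scheduled.
    remaining = {op: len(preds[op]) for op in ops}
    ready = deque(op for op in ops if remaining[op] == 0)
    start = {}
    while ready:
        op = ready.popleft()
        start[op] = max(
            (start[p] + latency.get(p, 1) for p in preds[op]), default=0
        )
        for s in succs[op]:
            remaining[s] -= 1
            if remaining[s] == 0:
                ready.append(s)

    if len(start) != len(ops):
        raise ValueError("dependency graph contains a cycle")
    return start


# y = (a * b) + (c * d) - e, with hypothetical operation names.
operations = ["mul1", "mul2", "add", "sub"]
dependencies = [("mul1", "add"), ("mul2", "add"), ("add", "sub")]
unit_latency = asap_schedule(operations, dependencies)
slow_multiplier = asap_schedule(operations, dependencies, {"mul1": 2, "mul2": 2})
```

With unit latencies the two multiplications start in step 0, the addition in step 1 and the subtraction in step 2. ASAP ignores resource limits, which is exactly what a resource-optimizing scheduler such as the one in the thesis has to handle on top of this.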
55

Architecture synthesis for adaptive multiprocessor systems on chip

Ishebabi, Harold January 2010 (has links)
This thesis presents methods for the automated synthesis of flexible chip multiprocessor systems from parallel programs targeted at FPGAs, to exploit both task-level parallelism and architecture customization. Automated synthesis is necessitated by the complexity of the design space. A detailed description of the design space is provided in order to determine which parameters should be modeled to facilitate automated synthesis by optimizing a cost function. The emphasis is placed on inclusive modeling of parameters from the application, architectural and physical subspaces, as well as their joint coverage, in order to avoid pre-constraining the design space. Given a parallel program and an IP library, the automated synthesis problem is to simultaneously (i) select processors, (ii) map and schedule tasks to them, and (iii) select one or several networks for inter-task communications such that design constraints and optimization objectives are met. The research objective of this thesis is to find a suitable model for automated synthesis, and to evaluate methods of using the model for architectural optimizations. Our contributions are a holistic approach for the design of such systems, corresponding models to facilitate automated synthesis, an evaluation of optimization methods using state-of-the-art integer linear and answer set programming, and the development of synthesis heuristics to solve runtime challenges. / Aktuelle Technologien erlauben es komplexe Multiprozessorsysteme auf einem Chip mit Milliarden von Transistoren zu realisieren. Der Entwurf solcher Systeme ist jedoch zeitaufwendig und schwierig. Diese Arbeit befasst sich mit der Frage, wie On-Chip Multiprozessorsysteme ausgehend von parallelen Programmen automatisch synthetisiert werden können. Die Implementierung der Multiprozessorsysteme auf rekonfigurierbaren Chips erlaubt es die gesamte Architektur an die Struktur eines vorliegenden parallelen Programms anzupassen. 
Auf diese Weise ist es möglich die aktuellen technologischen Unzulänglichkeiten zu umgehen, insbesondere die nicht weitersteigende Taktfrequenzen sowie den langsamen Zugriff auf Datenspeicher. Eine Automatisierung des Entwurfs von Multiprozessorsystemen ist notwendig, da der Entwurfsraum von Multiprozessorsystemen zu groß ist, um vom Menschen überschaut zu werden. In einem ersten Ansatz wurde das Syntheseproblem mittels linearer Gleichungen modelliert, die dann durch lineare Programmierungswerkzeuge gelöst werden können. Ausgehend von diesem Ansatz wurde untersucht, wie die typischerweise langen Rechenzeiten solcher Optimierungsmethoden durch neuere Methode aus dem Gebiet der Erfüllbarkeitsprobleme der Aussagenlogik minimiert werden können. Dabei wurde die Werkzeugskette Potassco verwendet, in der lineare Programme direkt in Logikprogramme übersetzt werden können. Es wurde gezeigt, dass dieser zweite Ansatz die Optimierungszeit um bis zu drei Größenordnungen beschleunigt. Allerdings lassen sich große Syntheseprobleme auf diese weise wegen Speicherbegrenzungen nicht lösen. Ein weiterer Ansatz zur schnellen automatischen Synthese bietet die Verwendung von Heuristiken. Es wurden im Rahmen diese Arbeit drei Heuristiken entwickelt, die die Struktur des vorliegenden Syntheseproblems ausnutzen, um die Optimierungszeit zu minimieren. Diese Heuristiken wurden unter Berücksichtigung theoretischer Ergebnisse entwickelt, deren Ursprung in der mathematische Struktur des Syntheseproblems liegt. Dadurch lassen sich optimale Architekturen in kurzer Zeit ermitteln. Die durch diese Dissertation offen gewordene Forschungsarbeiten sind u. a. die Berücksichtigung der zeitlichen Reihenfolge des Datenaustauschs zwischen parallelen Tasks, die Optimierung des logik-basierten Ansatzes, die Integration von Prozessor- und Netzwerksimulatoren zur funktionalen Verifikation synthetisierter Architekturen, sowie die Entwicklung geeigneter Architekturkomponenten.
56

Simulation Parallèle en SystemC/TLM de Composants Matériels décrits pour la Synthèse de Haut-Niveau / Parallel SystemC/TLM Simulation of Hardware Components described for High-Level Synthesis

Becker, Denis 11 December 2017 (has links)
Les systèmes sur puce sont constitués d'une partie matérielle (un circuit intégré) et d'une partie logicielle (un programme) qui utilise les ressources matérielles de la puce. La conséquence de cela est que le logiciel d'un système sur puce est intrinsèquement lié à sa partie matérielle. Les composants matériels d'accélération sont des facteurs clés de différenciation d'un produit à l'autre. Il est nécessaire de pouvoir simuler ces systèmes très tôt lors de leur conception; bien avant que la puce ne soit physiquement disponible, et même avant que la puce ne soit complètement spécifiée. Pour cela, un modèle du système sur puce est réalisé à l'aide du langage SystemC, au niveau d'abstraction TLM (Transaction Level Modeling). La partie matérielle d'un système sur puce est constituée de composants, qui s'exécutent en parallèle. Pour autant, la simulation avec le simulateur SystemC de référence est séquentielle. Ceci permet de garantir les bonnes propriétés des simulations SystemC, en particulier la reproductibilité et le confort d'écriture des modèles. Les travaux de cette thèse portent sur la simulation parallèle de modèles SystemC/TLM. L'objectif de l'exécution parallèle est d'accélérer les simulations dans un mode d'utilisation correspondant à la phase de développement, où il est primordial de disposer de simulations qui donnent rapidement un résultat. Afin de cerner le problème de performance remarqué sur des modèles complexes à STMicroelectronics, le premier travail de cette thèse a été d'analyser le profil d'exécution d'une étude de cas représentative de la complexité actuelle des platformes SystemC/TLM. Pour cette étude, nous avons développé un outil de collecte de traces et de visualisation. Les résultats de cette analyse ont indiqué que la lenteur d'exécution en simulation était due à la complexité des composants matériels d'accélération. 
L'étude de l'état de l'art en simulation parallèle de modèles SystemC nous a conduit à chercher d'autres pistes que celles actuellement existantes. Pour réaliser les composants matériels plus rapidement, et permettre d'augmenter la réutilisabilité de composants d'un projet à l'autre, le flot de conception HLS (High Level Synthesis) est utilisé, notamment à STMicroelectronics. Ce flot de conception permet, à partir de la description d'une fonction en C/C++, de générer un plan de composant matériel qui va réaliser la même fonction. La description des composants est découpée en sous-fonctions, individuellement plus simples. Afin d'obtenir de bonnes performances, les sous-fonctions sont assemblées en chaîne, à travers laquelle circulent les données à traiter. Il est indispensable de pouvoir réutiliser le code écrit pour la HLS dans les simulations SystemC/TLM : cette situation deviendra de plus en plus fréquente, et il n'a pas assez de temps pour réécrire ces modèles dans ces projets courts. Nous avons développé une infrastructure de simulation parallèle permettant d'intégrer et de simuler efficacement des composants de traitement de données écrits pour la HLS. L'application de cette infrastructure à un exemple a permis d'accélérer l'exécution de la simulation d'un facteur 1.6 avec 4 processeurs. Au-delà de ce résultat, les conclusions principales de cette thèse sont que la simulation parallèle de modèles à haut niveau d'abstraction, en SystemC/TLM, passe par la combinaison de plusieurs techniques de parallélisation. Il est également important d'identifier les parties parallélisables dans des simulations industrielles, notamment pour les nouveaux défis que sont les simulations multi-physiques et l'internet des objets. / A system on chip consists of a hardware part (an integrated circuit) and a software part (a program) that uses the hardware resources of the chip. Consequently, the embedded software is intrinsically connected to the chip hardware. 
Hardware acceleration components are key differentiation factors from one product to another. It is necessary to simulate systems on chip very early in the design flow, before the chip is physically available and even before it is fully specified. For such simulations, developers write a model of the system on chip in SystemC, at the TLM (Transaction Level Modeling) abstraction level. The hardware part of a chip consists of components that operate in parallel with each other. However, the reference SystemC simulator executes simulations sequentially. Sequential execution preserves good properties of SystemC simulations, namely reproducibility and ease of model writing. This thesis addresses the parallel execution of SystemC/TLM simulations. The goal of parallel simulation is to speed up simulations in the context of model development, where it is important to get results quickly. In order to identify the performance problem of complex models at STMicroelectronics, the first step of this thesis was to analyse the execution profile of a case study representative of the complexity of current platforms. For this study, we developed a trace recording and visualization tool. The results of this study indicated that the performance-critical parts of the simulation are hardware acceleration components. Studying existing parallel simulation approaches led us to look for other parallel simulation techniques. To speed up the development of hardware acceleration components, and to increase reusability from one project to another, the HLS (High Level Synthesis) design flow is used, notably at STMicroelectronics. This design flow makes it possible to generate a logically synthesizable model of a component from a high-level behavioral description in C/C++. This design flow also constrains the development: the description is split into sub-functions, assembled in a pipeline. 
The code written for HLS must be reused in SystemC/TLM models: this situation will become more and more frequent, and there is no time to rewrite the models of such components in short projects. We developed a parallel simulation infrastructure enabling the integration and efficient simulation of hardware components written for HLS. We applied this infrastructure to an example platform, which sped up the simulation by a factor of 1.6 with 4 processors. Beyond this result, one of the main conclusions of this thesis is that parallel simulation of abstract SystemC/TLM models will require combining multiple parallelization techniques. Future research can identify other types of potential parallelism in industrial models. This will become critical with the new challenges of simulation, such as multi-physics simulations and the Internet of Things.
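Illustrative note (not part of the thesis record): the French abstract above reports a simulation speedup of 1.6 with 4 processors. The arithmetic below puts that figure in context. It assumes Amdahl's law as a simple model, which the thesis abstract does not mention, so the derived parallel fraction is only an interpretation under that assumption. The function names are hypothetical.

```python
def parallel_efficiency(speedup, workers):
    """Fraction of the ideal linear speedup that was actually achieved."""
    return speedup / workers


def amdahl_speedup(parallel_fraction, workers):
    """Speedup predicted by Amdahl's law for a given parallel fraction."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / workers)


def amdahl_parallel_fraction(speedup, workers):
    """Invert Amdahl's law: parallel fraction implied by an observed speedup."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / workers)


efficiency = parallel_efficiency(1.6, 4)        # about 0.4
fraction = amdahl_parallel_fraction(1.6, 4)     # about 0.5
ceiling = 1.0 / (1.0 - fraction)                # about 2.0 with unlimited workers
```

Under this model, a speedup of 1.6 on 4 processors corresponds to roughly half of the execution time being parallelized, with an upper bound near 2 however many processors are added. This is consistent with the abstract's conclusion that several parallelization techniques need to be combined.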
57

Automatic synthesis of hardware accelerator from high-level specifications of physical layers for flexible radio / Synthèse automatique d'accélérateurs matériels depuis des spécifications de haut niveau de formes d'ondes pour la radio flexible

Ouedraogo, Ganda Stéphane 10 December 2014 (has links)
L'internet des objets vise à connecter des milliards d'objets physiques ainsi qu'à les rendre accessibles depuis le monde numérique que représente l'internet d'aujourd'hui. Pour ce faire, l'accès à ces objets sera majoritairement réalisé sans fil et sans utiliser d'infrastructures prédéfinies ou de normes spécifiques. Une telle technologie nécessite de définir et d'implémenter des nœuds radio intelligents capables de s'adapter à différents protocoles physiques de communication. Nos travaux de recherches ont consisté à définir un flot de conception pour ces nœuds intelligents partant de leur modélisation à haut niveau jusqu'à leur implémentation sur des cibles de types FPGA. Ce flot vise à améliorer la programmabilité des formes d'ondes par l'utilisation de spécification de haut niveau exécutables et synthétisables, il repose sur la synthèse de haut niveau (HLS pour High Level Synthesis) pour le prototypage rapide des briques de base ainsi que sur le modèle de calcul de types flot de données des formes d'ondes radio. Le point d'entrée du flot consiste en un langage à usage spécifique (DSL pour Domain Specific Language) qui permet de modéliser à haut niveau une forme d'onde tout en insérant des contraintes d'implémentation pour des architectures reconfigurables telles que les FPGA. Il est associé à un compilateur qui permet de générer du code synthétisable ainsi que des scripts de synthèse. La forme d'onde finale est composée d'un chemin de données et d'une entité de contrôle implémentée sous forme d'une machine d'état hiérarchique. / The Internet of Things (IoT) aims at connecting billions of communicating devices through an internet-like network. To this end, access to these things is expected to be wireless, without using any predefined infrastructure or standards. This technology requires defining and implementing smart nodes capable of adapting to different radio communication protocols. 
In this thesis, we define a design flow for such smart nodes, from their high-level specification down to their implementation on FPGA fabrics. This flow aims at improving the programmability of waveforms by using executable and synthesizable high-level specifications. It relies on High-Level Synthesis (HLS) for rapid prototyping of the waveforms' functional blocks, as well as on the dataflow model of computation. Its entry point is a Domain-Specific Language (DSL) that makes it possible to model a waveform while inserting implementation constraints for reconfigurable architectures such as FPGAs. The flow includes a compiler whose purpose is to produce synthesis scripts and generate RTL source code. The final waveform consists of a datapath and a control unit implemented as a Hierarchical Finite State Machine (HFSM).
58

Využití syntézy na systémové úrovni pro aplikace s platformou ZYNQ / Using High-Level Synthesis for ZYNQ Platform Applications

Husák, Jiří January 2015 (has links)
This work describes the use of High-Level Synthesis in an image processing application for the Xilinx ZYNQ platform. The source code of the FPGA components is written in C++, and the Xilinx Vivado HLS tool is used for High-Level Synthesis. The application includes the design and implementation of a Sobel filter, a median filter, a bilateral filter and an architecture for an AdaBoost classifier. As an extension of this work, a component for network traffic processing was implemented; it finds the beginning of a packet.
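Illustrative note (not part of the thesis record): the abstract above lists a Sobel filter among the implemented components. The thesis code is written in C++ for Vivado HLS and is not shown in this record. The sketch below is a plain software reference of the Sobel operator in Python, assuming a grayscale image stored as a list of rows and assuming the common hardware-friendly magnitude approximation |gx| + |gy| saturated to 8 bits. Both choices, and all names, are assumptions of this example.

```python
# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel(image):
    """Return the Sobel edge magnitude of a grayscale image (list of rows).

    Border pixels are left at 0 because the 3x3 window does not fit there.
    """
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gy = 0
            for dy in range(3):
                for dx in range(3):
                    pixel = image[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * pixel
                    gy += SOBEL_Y[dy][dx] * pixel
            # |gx| + |gy| avoids a square root; clamp to the 8-bit range.
            out[y][x] = min(255, abs(gx) + abs(gy))
    return out


# A 5x5 test image with a vertical edge between columns 1 and 2.
edge_image = [[0, 0, 10, 10, 10] for _ in range(5)]
edges = sobel(edge_image)
```

For this test image the response is 40 in the two columns next to the edge and 0 elsewhere. An HLS version would typically stream pixels through line buffers instead of indexing a full frame, which this sketch does not attempt.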
59

Classification of road side material using convolutional neural network and a proposed implementation of the network through Zedboard Zynq 7000 FPGA

Rahman, Tanvir 12 1900 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / In recent years, Convolutional Neural Networks (CNNs) have become the state-of-the-art method for object detection and classification in the field of machine learning and artificial intelligence. In contrast to a fully connected network, each neuron of a convolutional layer of a CNN is connected to fewer selected neurons from the previous layers and kernels of a CNN share same weights and biases across the same input layer dimension. These features allow CNN architectures to have fewer parameters which in turn reduces calculation complexity and allows the network to be implemented in low power hardware. The accuracy of a CNN depends mostly on the number of images used to train the network, which requires a hundred thousand to a million images. Therefore, a reduced training alternative called transfer learning is used, which takes advantage of features from a pre-trained network and applies these features to the new problem of interest. This research has successfully developed a new CNN based on the pre-trained CIFAR-10 network and has used transfer learning on a new problem to classify road edges. Two network sizes were tested: 32 and 16 Neuron inputs with 239 labeled Google street view images on a single CPU. The result of the training gives 52.8% and 35.2% accuracy respectively for 250 test images. In the second part of the research, High Level Synthesis (HLS) hardware model of the network with 16 Neuron inputs is created for the Zynq 7000 FPGA. The resulting circuit has 34% average FPGA utilization and 2.47 Watt power consumption. Recommendations to improve the classification accuracy with deeper network and ways to fit the improved network on the FPGA are also mentioned at the end of the work.
60

FPGA Accelerated Digital Image Correlation For Clamping Force Measurement

Csuvarszki, János Csanád January 2023 (has links)
Digital image correlation is a contactless optical method used for displacement and strain measurement which has become increasingly popular in the field of experimental mechanics. A specialized use case for the algorithm is to measure the clamping force in bolted joints, a crucial metric when considering the longevity and reliability of the constructs. However, in order to measure the clamping force in real time, the digital image correlation has to be carried out rapidly, as the tightening of the bolts can happen in milliseconds. One approach to increase the speed of the process is hardware acceleration. This thesis presents and evaluates multiple variations of a Field Programmable Gate Array (FPGA)-accelerated digital image correlation framework. The goal of the project is to accelerate the image correlation to sufficient speeds so that it can be used for highly dynamic and continuous tightenings, which can take 20 to 200 ms and 200 to 1000 ms or more to finish, respectively. A baseline implementation was created based on an innovative digital image correlation framework. Strain calculation was altered for the specialized use of clamping force determination. Afterward, different parts of the framework were selected and optimized for hardware acceleration. The parts include both preprocessing and correlation steps. The targets for acceleration were optimized using techniques such as quantization and pipelining. The accelerators were created using high-level synthesis, and the resulting implementations utilize both the processor and FPGA parts of a Zynq-7000 system-on-chip. Results show that all accelerators reduce the total execution time of the framework to varying degrees. 
Accelerators targeting the preprocessing parts, such as Gaussian and B-spline filtering, proved to be the most effective in speeding up the process, achieving speedups of 1.56 and 1.12 times for the fixed-point versions and 1.2 and 1.07 times for the double-precision floating-point versions, respectively. A combined version containing multiple accelerators resulted in a 1.9 times average speedup. It can be concluded that the presented approach is not fast enough for all highly dynamic tightening processes, as the fastest execution time achieved is above 100 ms, but it could be used for continuous tightening, depending on the construct. / Digital image correlation (DIC) är en kontaktlös optisk metod, använd för mätning av förskjutning och töjning, som blivit en allt mer populär inom experimentell mekanik. Ett användningsområde för algoritmen är att mäta klämkraften i skruvförband, en avgörande faktor för hållbarhet och tillförlitlighet i konstruktioner. Men för att mäta klämkraft i realtid, behöver DIC utföras väldigt snabbt då åtdragningsförloppet kan ske inom loppet av millisekunder. En metod för att öka hastigheten är hårdvaruacceleration. Denna avhandling presenterar och utvärderar ett flertal varianter av ett Field Programmable Gate Arrays (FPGA)-accelererat DIC ramverk. Avhandlingen syftar till att accelerera bildkorrelationen tillräckligt mycket för att kunna användas till dynamiska och kontinuerliga åtdragningar som tar 20 till 200 ms respektive 200 till 1000 ms eller mer. En referens-implementation skapades baserat på ett innovativt DIC ramverk. Beräkning av töjning anpassades för specialfallet: bestämmandet av klämkraft. Efter det valdes olika delar av ramverket ut och optimerades för hårdvaruacceleration. De valda delarna innehåller både preprocessor- och korrelationssteg. Delarna som valdes ut för acceleration optimerades med hjälp av tekniker som kvantisering och pipelining. 
Acceleratorerna skapades med hjälp av high-level synthesis och de resulterande implementationerna använder både processor och FPGA i en Zynq-7000 system-on-chip. Resultaten visar att alla acceleratorer reducerar ramverkets totala exekveringstid med varierande grad. Acceleratorer som riktar sig mot preprocessing som Gaussian och B-spline filtrering visade sig vara mest effektiva och resulterade i en 1.56 respektive 1.12 gånger snabbare exekveringstid för fixed point, och 1.2 respektive 1.07 gånger snabbare exikveringstid för double floating-point. En kombinerad version som innehöll flera acceleratorer resulterade i en 1.9 gånger snabbare genomsnittlig exekveringstid. Slutsatsen är att den presenterade metoden inte är tillräckligt snabb för alla dynamiska åtdragningsförlopp, då den snabbaste uppnådda exekveringstiden är över 100 ms. Men metoden skulle kunna användas för kontinuerliga åtdragningar beroende på konstruktionen.
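Illustrative note (not part of the thesis record): the abstract above names quantization as one of the optimization techniques and compares fixed-point with double-precision floating-point accelerators. The record does not state which fixed-point format the thesis uses. The sketch below assumes a signed 16-bit format with 8 fractional bits purely for illustration, and shows how values are quantized, saturated and multiplied using integer arithmetic only. All names and the chosen bit widths are hypothetical.

```python
def to_fixed(value, frac_bits=8, total_bits=16):
    """Quantize a real value to a signed fixed-point integer, with saturation."""
    scaled = round(value * (1 << frac_bits))
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 1
    return max(lowest, min(highest, scaled))


def to_float(fixed, frac_bits=8):
    """Convert a fixed-point integer back to a Python float."""
    return fixed / (1 << frac_bits)


def fixed_mul(a, b, frac_bits=8):
    """Multiply two fixed-point integers; shift to restore the scale."""
    return (a * b) >> frac_bits


half = to_fixed(0.5)                      # 128 with 8 fractional bits
one_and_half = to_fixed(1.5)              # 384
product = fixed_mul(one_and_half, half)   # 192, which represents 0.75
error = abs(to_float(to_fixed(0.1)) - 0.1)
```

With 8 fractional bits the rounding error of a single value is at most 1/512, and out-of-range inputs clamp to the largest representable value instead of wrapping. Narrower integer datapaths of this kind are one reason fixed-point accelerators can be cheaper and faster on an FPGA than floating-point ones.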
