• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 51
  • 10
  • 6
  • 5
  • 3
  • 3
  • 2
  • 1
  • 1
  • Tagged with
  • 103
  • 103
  • 103
  • 36
  • 27
  • 25
  • 21
  • 21
  • 20
  • 19
  • 19
  • 19
  • 17
  • 16
  • 16
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
71

Le modèle flot de données appliqué à la synthèse haut-niveau pour le traitement d’images sur caméra intelligente à base de FPGA. Application aux systèmes d’apprentissage supervisés / The dataflow model for High-Level Synthesis on FPGA-based smart camera. Application to supervised machine learning algorithms

Bourrasset, Cédric 09 February 2016 (has links)
La synthèse de haut niveau (High Level Synthesis (HLS)) est un domaine de recherche qui vise à automatiser le passage de la description d’un algorithme à une représentation au niveau registre de celui-ci en vue de son implantation sur un circuit numérique. Si le problème reste à ce jour largement ouvert pour des algorithmes quelconques, des solutions ont commencé à voir le jour au sein de domaines spécifiques. C’est notamment le cas dans le domaine du traitement d’images où l’utilisation du modèle flot de données offre un bon compromis entre expressivité et efficacité. C’est ce que nous cherchons à démontrer dans cette thèse, qui traite de l’applicabilité du modèle flot de données au problème de la synthèse haut niveau à travers deux exemples d’implantation d’applications de vision complexes sur FPGA. Les applications, issues du domaine de l’apprentissage supervisé sont un système de classification à bases de machines à vecteurs supports (SVM) et un système de reconnaissance exploitant un réseau de neurones convolutionnels (CNN). Dans les deux cas, on étudie les problématiques posées par la reformulation, au sein du modèle flot de données, des structures de données et algorithmes associés ainsi que l’impact de cette reformulation sur l’efficacité des implémentations résultantes. Les expérimentations sont menées avec CAPH, un outil de HLS exploitant le modèle flot de données. / High-level synthesis is a field of research that aims to automate the transformation from an high-level algorithmic description to a register level representation for its implementation on a digital circuit. Most of existing tools based on imperative languages try to provide a general solution to any type of existing algorithm. This approach can be inefficient in some applications where the algorithm description relies on a different paradigm from the hardware execution model. This major drawback can be figured out by the use of specific langages, named Domain Specific Language (DSL). Applied to the image processing field, the dataflow model appears as a good compromise between the expressiveness of the algorithm description and the final implementation efficiency. This thesis address the use of the dataflow programming model as response to high-level synthesis problematics for image processing algorithms on FPGA. To demonstrate the effectiveness of the proposed method but also to put forth the algorithmic reformulation effort to be made by the developer, an ambitious class of applications was chosen : supervised machine learning systems. It will be addressed in particular two algorithms, a classification system based on Support Vector Machine and a convolutional neural network. Experiments will be made with the CAPH langage, a specific HLS tool based on the dataflow programming model.
72

Akcelerace kompresního algoritmu LZ4 v FPGA / Acceleration of LZ4 Compression Algorithm in FPGA

Marton, Dominik January 2017 (has links)
This project describes the implementation of an LZ4 compression algorithm in a C/C++-like language, that can be used to generate VHDL programs for FPGA integrated circuits embedded in accelerated network interface controllers (NICs). Based on the algorithm specification, software versions of LZ4 compressor and decompressor are implemented, which are then transformed into a synthesizable language, that is then used to generate fully functional VHDL code for both components. Execution time and compression ratio of all implementations are then compared. The project also serves as a demonstration of usability and influence of high-level synthesis and high-level approach to design and implementation of hardware applications known from common programming languages.
73

Implementace algoritmu dekompozice matice a pseudoinverze na FPGA / Implementation of matrix decomposition and pseudoinversion on FPGA

Röszler, Pavel January 2018 (has links)
The purpose of this thesis is to implement algorithms of matrix eigendecomposition and pseudoinverse computation on a Field Programmable Gate Array (FPGA) platform. Firstly, there are described matrix decomposition methods that are broadly used in mentioned algorithms. Next section is focused on the basic theory and methods of computation eigenvalues and eigenvectors as well as matrix pseudoinverse. Several examples of implementation using Matlab are attached. The Vivado High-Level Synthesis tools and libraries were used for final implementation. After the brief introduction into the FPGA fundamentals the thesis continues with a description of implemented blocks. The results of each variant were compared in terms of timing and FPGA utilization. The selected block has been validated on the development board and its arithmetic precision was analyzed.
74

Methodology to Derive Resource Aware Context Adaptable Architectures for Field Programmable Gate Arrays

Samala, Harikrishna 01 December 2009 (has links)
The design of a common architecture that can support multiple data-flow patterns (or contexts) embedded in complex control flow structures, in applications like multimedia processing, is particularly challenging when the target platform is a Field Programmable Gate Array (FPGA) with a heterogeneous mixture of device primitives. This thesis presents scheduling and mapping algorithms that use a novel area cost metric to generate resource aware context adaptable architectures. Results of a rigorous analysis of the methodology on multiple test cases are presented. Results are compared against published techniques and show an area savings and execution time savings of 46% each.
75

Throughput Constrained and Area Optimized Dataflow Synthesis for FPGAs

Sun, Hua 21 February 2008 (has links) (PDF)
Although high-level synthesis has been researched for many years, synthesizing minimum hardware implementations under a throughput constraint for computationally intensive algorithms remains a challenge. In this thesis, three important techniques are studied carefully and applied in an integrated way to meet this challenging synthesis requirement. The first is pipeline scheduling, which generates a pipelined schedule that meets the throughput requirement. The second is module selection, which decides the most appropriate circuit module for each operation. The third is resource sharing, which reuses a circuit module by sharing it between multiple operations. This work shows that combining module selection and resource sharing while performing pipeline scheduling can significantly reduce the hardware area, by either using slower, more area-efficient circuit modules or by time-multiplexing faster, larger circuit modules, while meeting the throughput constraint. The results of this work show that the combined approach can generate on average 43% smaller hardware than possible when a single technique (resource sharing or module selection) is applied. There are four major contributions of this work. First, given a fixed throughput constraint, it explores all feasible frequency and data introduction interval design points that meet this throughput constraint. This enlarged pipelining design space exploration results in superior hardware architectures than previous pipeline synthesis work because of the larger sapce. Second, the module selection algorithm in this work considers different module architectures, as well as different pipelining options for each architecture. This not only addresses the unique architecture of most FPGA circuit modules, it also performs retiming at the high-level synthesis level. Third, this work proposes a novel approach that integrates the three inter-related synthesis techniques of pipeline scheduling, module selection and resource sharing. To the author's best knowledge, this is the first attempt to do this. The integrated approach is able to identify more efficient hardware implementations than when only one or two of the three techniques are applied. Fourth, this work proposes and implements several algorithms that explore the combined pipeline scheduling, module selection and resource sharing design space, and identifies the most efficient hardware architecture under the synthesis constraint. These algorithms explore the combined design space in different ways which represents the trade off between algorithm execution time and the size of the explored design space.
76

An Embedded System for Classification and Dirt Detection on Surgical Instruments

Hallgrímsson, Guðmundur January 2019 (has links)
The need for automation in healthcare has been rising steadily in recent years, both to increase efficiency and for freeing educated workers from repetitive, menial, or even dangerous tasks. This thesis investigates the implementation of two pre-determined and pre-trained convolutional neural networks on an FPGA for the classification and dirt detection of surgical instruments in a robotics application. A good background on the inner workings and history of artificial neural networks is given and expanded on in the context of convolutional neural networks. The Winograd algorithm for computing convolutional operations is presented as a method for increasing the computational performance of convolutional neural networks. A selection of development platform and toolchains is then made. A high-level design of the overall system is explained, before details of the high-level synthesis implementation of the dirt detection convolutional neural network are shown. Measurements are then made on the performance of the high-level synthesis implementation of the various blocks needed for convolutional neural networks. The main convolutional kernel is implemented both by using the Winograd algorithm and the naive convolution algorithm and comparisons are made. Finally, measurements on the overall performance of the end-to-end system are made and conclusions are drawn. The final product of the project gives a good basis for further work in implementing a complete system to handle this functionality in a manner that is both efficient in power and low in latency. Such a system would utilize the different strengths of general-purpose sequential processing and the parallelism of an FPGA and tie those together in a single system. / Behovet av automatisering inom vård och omsorg har blivit allt större de senaste åren, både vad gäller effektivitet samt att befria utbildade arbetare från repetitiva, enkla eller till och med farliga arbetsmoment. Den här rapporten undersöker implementeringen av två tidigare för-definierade och för-tränade faltade neurala nätverk på en FPGA, för att klassificera och upptäcka föroreningar på kirurgiska verktyg. En bra bakgrund på hur neurala nätverk fungerar, och deras historia, presenteras i kontexten faltade neurala nätverk. Winograd algoritmen, som används för att beräkna faltningar, beskrivs som en metod med syfte att öka beräkningsmässig prestanda. Val av utvecklingsplattform och verktyg utförs. Systemet beskrivs på en hög nivå, innan detaljer om hög-nivå-syntesimplementeringen av förorenings-detekterings-nätverket visas. Mätningar görs sedan av de olika bygg-blockens prestanda. Kärnkoden med faltnings-algoritmen implementeras både med Winograd-algoritmen och med den traditionella, naiva, metoden, och utfallet för bägge metoderna jämförs. Slutligen utförs mätningar på hela systemets prestanda och slutsatser dras därav. Projektets slutprodukt kan användas som en bra bas för vidare utveckling av ett komplett system som både är effektivt angående effektförbrukning och har bra prestanda, genom att knyta ihop styrkan hos traditionella sekventiella processorer med parallelismen i en FPGA till ett enda system.
77

High Data Rate Signal Processing Architectures and Compilation Strategies for Scalable, Multi-Gigabit Digital Systems

Nybo, Daniel Alexander 12 April 2024 (has links) (PDF)
In this study we present a high-performance computing architecture and hardware acceleration strategy for a heterogeneous multi-gigabit computing system. The system architecture integrates a BeeGFS distributed file system, capable of achieving 80 Gbps of sustained write throughput across five nodes, essential for managing the high data volumes generated by a 25 high performance computer (HPC) compute cluster. To ensure operational efficiency and scalability, the tasks performed on the Linux compute cluster consisting of 30 nodes are automated using Ansible, facilitating seamless deployment, management, and updates. We present compilation strategies for a hardware accelerated Polyphase Filter Bank (PFB) channelization routine optimized for Xilinx Ultrascale+ FPGAs, capable of simultaneously processing 2048 channels per 12 input streams. This setup shows the efficiency of High Level Sysnthesis of FPGA-based signal processing in handling demanding data analysis tasks. We also present the implementation and verification of a 1.6 Gsps Direct Memory Access (DMA) transfer from DDR4 memory to a modern Radio Frequency System on Chip (RFSoC) digital to analog converter. The combination of a high-throughput file system, streamlined automation, and advanced signal processing capabilities shows these system's ability to meet the needs of complex, real-time data analysis and processing applications, advancing the field of computational research.
78

Evaluation of FPGA Partial Reconfiguration : for real-time Vision applications

Guo, Guanghao January 2020 (has links)
The usage of programmable logic resources in Field Programmable Gate Arrays, also known as FPGAs, has increased a lot recently due to the complexity of the algorithms, especially for some computer vision algorithms. Due to this reason, sometimes the hardware resources in the FPGA are not sufficient. Partial reconfiguration provides us with the possibility to solve this problem. Partial reconfiguration is a technique that can be used to reconfigure specific parts of the FPGA during run-time. By using this technique, we can reduce the need for programmable logic resources. This master thesis project aims to design a software framework for partial reconfiguration that can load a set of processing components/algorithms (e.g. object detection, optical flow, Harris-Corner detection etc) in the FPGA area without affecting real-time static components such as camera capture, basic image filtering and colour conversion which are continuously running. Partial reconfiguration has been applied to two different video processing pipelines, a direct streaming architecture and a frame buffer streaming architecture respectively. The result shows that reconfiguration time is predictable which depends on the partial bitstream size, and that partial reconfiguration can be used in real-time applications taking the partial bitstream size and the frequency to switch the partial bitstreams into account. / Användningen av programmerbara logiska resurser i Field Programmable Gate Arrayer, även känd som FPGA:er, har ökat mycket nyligen på grund av komplexiteten hos algoritmerna, speciellt för vissa datorvisningsalgoritmer. På grund av detta är det ibland inte tillräckligt med hårdvaruresurser i FPGA:n. Partiell omkonfiguration ger oss möjlighet att lösa detta problem. Partiell omkonfigurering är en teknik som kan användas för att omkonfigurera specifika delar av FPGA:n under körtid. Genom att använda denna teknik kan vi minska behovet av programmerbara logiska resurser. Det här mastersprojektet syftar till att utforma ett programvaru-ramverk för partiell omkonfiguration som kan ladda en uppsättning processkomponenter / algoritmer (t.ex. objektdetektering, optiskt flöde, Harris-Corner detection etc) i FPGA- området utan att påverka statiska realtids-komponenter såsom kamerafångst, grundläggande bildfiltrering och färgkonvertering som körs kontinuerligt. Partiell omkonfiguration har tillämpats på två olika videoprocessnings-pipelines, en direkt-strömmande respektive en rambuffert-strömmande arkitektur. Resultatet visar att omkonfigurationstiden är förutsägbar och att partiell omkonfiguration kan användas i realtids-tillämpningar.
79

Develop a Graphical User Interface for the assembler for SiLago Platform / Utveckla ett grafiskt användargränssnitt för assemblern för SiLago Platform

Wang, Yuxuan January 2023 (has links)
Vesyla-II is developed as the High-Level Synthesis (HLS) tool serving the SiLago platform. The assembler Manas is a part of the Coarse Grain Reconfigurable Architectures (CGRA) compiler in Vesyla-II, which is used to transform the information from source code into the target language. A group of graphical Intermediate Representatives (IRs) associated with the instruction set of Dynamically Reconfigurable Resource Array (DRRA) plays an important role in this transformation. Manual effort is required to optimize the result of transformation by configuring the DRRA instructions and organizing the relationship among them. In this thesis project, we provide a graphical user interface ManasUI to assist Vesyla-II developers in graphically operating the transformation. We follow the normal procedure of software development and start by working out the requirements of ManasUI based on the background knowledge of the graphical IRs applied in Vesyla-II. Then we design and implement the prototype of ManasUI based on the solution of the web application. ManasUI takes the DRRA instruction set or one of the IRs — Hierarchical Multi-Thread Dependency Graph (HMTDG) as input and graphically represents the concepts of node, hierarchy, and dependency relationship in HMTDG. The canvas component in ManasUI provides a handy interface for users to interact with the graph of HMTDG directly. The functionality, scalability, and portability of ManasUI are verified by conducting a group of test cases. The test cases are designed aiming to cover all user scenarios. From the results of testing, the prototype of ManasUI meets all the requirements that we have identified. / Vesyla-II är utvecklat som ett HLS verktyg för SiLago-plattformen. Assembleren Manas är en del av CGRA kompilatorn i Vesyla-II, som används för att omvandla informationen från källkod till det specifika språket. En grupp av IRs associerad med instruktionsuppsättningen av DRRA spelar en viktig roll i denna transformation. Manuellt arbete krävs för att optimera resultatet av transformationen genom att konfigurera DRRA instruktionerna och organisera förhållandet mellan dem. I detta examensarbete skapade vi ett grafiskt användargränssnitt ManasUI för att hjälpa Vesyla-II utvecklare att grafiskt hantera omvandlingen av källkod. Vi följer det normala tillvägagångsättet för mjukvaruutveckling och börjar med att dokumentera och arbeta fram kraven för ManasUI baserat på bakgrundskunskapen om den grafiska IRs som tillämpas i Vesyla-II. Sedan designar och implementerar vi prototypen av ManasUI baserat på webbapplikationens lösning. ManasUI tar DRRA instruktionsuppsättningen eller en av de IRs — HMTDG som indata och representerar grafiskt begreppen om nod, hierarki och beroendeförhållande i HMTDG. Canvas-komponenten i ManasUI ger ett praktiskt gränssnitt för användare att interagera med grafen för HMTDG direkt. Funktionaliteten, skalbarheten och portabiliteten för ManasUI är verifierad genom att genomföra en grupp testfall. Testfallen är utformade för att täcka alla användarscenarier. Testresultaten visar att prototypen av ManasUI uppfyller alla krav som vi har identifierat.
80

Development of Parallel Architectures for Radar/Video Signal Processing Applications

Jarrah, Amin January 2014 (has links)
No description available.

Page generated in 2.3175 seconds