  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

HIERARCHICAL MEMORY SYNTHESIS IN RECONFIGURABLE COMPUTERS

OUAISS, IYAD 14 October 2002 (has links)
No description available.
12

Preemptive HW/SW-Threading by combining ESL methodology and coarse grained reconfiguration

Rößler, Marko, Heinkel, Ulrich 14 January 2014 (has links) (PDF)
Modern systems fulfil calculation tasks across the hardware-software boundary. Tasks are divided into coarse parallel subtasks that run on distributed resources. These resources are classified into a software (SW) and a hardware (HW) domain. The software domain usually contains processors for general-purpose or digital signal calculations. Dedicated co-processors such as encryption or video en-/decoding units belong to the hardware domain. Nowadays, the decision about which domain will execute a given subtask is usually taken during system-level design. This is done on the basis of assumptions about the system requirements that might not hold at runtime. The HW/SW partitioning is static and cannot adapt to dynamically changing system requirements at runtime. Our contribution to tackle this is to combine an ESL-based HW/SW codesign methodology with a coarse-grained reconfigurable System-on-Chip architecture. We propose this as Preemptive HW/SW-Threading.
13

Acceleration of a bioinformatics application using high-level synthesis / Accélération d'une application en bioinformatique utilisant une synthèse de haut niveau

Abbas, Naeem 22 May 2012 (has links)
Advances in bioinformatics have opened new horizons for research in biology and pharmacology. However, the machines and algorithms in use today can no longer meet the exponentially growing demand for computing power. There is therefore a need for specialized computing platforms for this type of processing that can take advantage of all current parallel computing technologies (grids, multi-cores, GPUs, FPGAs). In this thesis we study how high-level synthesis tools can help in designing specialized, massively parallel hardware accelerators. These tools considerably reduce design time but are not designed to produce efficient massively parallel hardware architectures. The work in this thesis focused on identifying parallelization techniques, and the means of expressing this parallelism efficiently, for HLS-type tools. We applied these results to a bioinformatics application known as HMMER. This algorithm, which could be a good candidate for hardware acceleration, is very difficult to parallelize. We proposed an original parallel execution scheme, based on a mathematical rewriting of the algorithm, followed by an exploration of the possible hardware execution schemes on FPGA. This result then led to an implementation on a hardware accelerator and demonstrated encouraging speedup factors. The work also demonstrates the relevance of HLS tools for designing hardware accelerators for high-performance computing in bioinformatics, both to reduce design time and to obtain architectures that are more efficient and easier to retarget from one platform to another.
/ The revolutionary advancements in the field of bioinformatics have opened new horizons in biological and pharmaceutical research. However, the existing bioinformatics tools are unable to meet the computational demands created by the recent exponential growth in biological data. There is therefore a pressing need to build future bioinformatics platforms that incorporate modern parallel computation techniques. In this work, we investigate FPGA-based acceleration of these applications using High-Level Synthesis. High-Level Synthesis tools enable automatic translation of abstract specifications into a hardware design, considerably reducing the design effort. However, generating efficient hardware with these tools is often a challenge for designers. Our research effort encompasses an exploration of the techniques and practices that can lead to the generation of an efficient design from these high-level synthesis tools. We illustrate our methodology by accelerating HMMER, an application widely used in the bioinformatics community. HMMER is well known for its compute-intensive kernels and for data dependencies that lead to sequential execution. We propose an original parallelization scheme based on a rewriting of its mathematical formulation, followed by an in-depth exploration of hardware mapping techniques for these kernels, and finally show on-board acceleration results. Our research work demonstrates the design of flexible hardware accelerators for bioinformatics applications, using design methodologies that are more efficient than the traditional ones and that yield designs scalable enough to meet future requirements.
14

Utilisation du modèle polyédrique pour la synthèse d'architectures pipelinées / Synthesis of pipelined architectures using the polyhedral model

Morvan, Antoine 28 June 2013 (has links)
Thanks to progress in semiconductors, embedded hardware platforms can satisfy the performance constraints of increasingly complex applications. This growth leads to an explosion in design costs, which pushes the designers of these platforms to use tools that work at higher levels of abstraction. Today, high-level synthesis tools operate on C/C++ descriptions to generate specialized hardware accelerators from them. These tools offer significant productivity gains over the previous generation, which operated on structural descriptions of the architecture in VHDL or Verilog. These algorithmic descriptions must be reworked so that the tools can generate high-performance circuits. To ease this task, one solution is to implement a toolbox of source-to-source transformations oriented toward high-level synthesis. In particular, this thesis focuses on loop transformations, with the goal of improving performance by exposing parallel loops and improving the locality of memory accesses. Building on a representation of loops in the polyhedral model, we propose an approach that improves the applicability of loop-nest pipelining by checking its legality more precisely than existing approaches. Moreover, when the check fails, we propose a correction technique that statically inserts wait states to ensure the legality of the pipeline. Finally, this pipeline is implemented using a code generation technique that flattens the loop nests. These contributions were implemented in the Gecos source-to-source compilation infrastructure, before being applied to a set of benchmarks representative of the computation kernels targeted by high-level synthesis. The results show a significant performance gain, with a moderate area overhead.
/ Due to the advances in semiconductor technologies, embedded hardware is capable of satisfying the performance constraints of increasingly complex applications. This leads to a design cost explosion, thus pushing hardware designers to use tools working at higher levels of abstraction. High-Level Synthesis tools generate custom hardware accelerators out of C/C++ specifications. They offer significant productivity gains compared to the previous generation of tools that worked at the level of hardware description languages, such as VHDL or Verilog. These higher-level specifications have to be reworked in order for the High-Level Synthesis tools to generate efficient hardware accelerators. To ease this task, one solution is to provide a source-to-source transformation toolbox targeting High-Level Synthesis. Specifically, this thesis explores loop transformations in order to improve performance by exposing parallel loops and improving the locality of memory accesses. Using a polyhedral representation of loop nests, we propose an approach to improve the applicability of nested loop pipelining by verifying its legality in a more precise way than existing approaches. Moreover, we propose a correction mechanism that statically inserts wait states to enforce pipeline legality in cases where the verification fails. The resulting pipeline is implemented using a code generation technique that flattens the loop nests. These contributions have been implemented within the GeCoS source-to-source compilation infrastructure, and applied to a set of benchmarks targeted towards High-Level Synthesis. Results show significant performance improvement at the price of a moderate area overhead.
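The loop-nest flattening mentioned in this abstract can be illustrated with a small sketch. This is only an assumption-laden illustration of the general idea (a two-level nest rewritten as a single loop with manually updated indices), written in Python for brevity rather than the C/C++ that HLS tools consume; it is not the GeCoS code generator, and the function names `nested` and `flattened` are hypothetical.

```python
def nested(n, m):
    """Reference loop nest: visits (i, j) in row-major order."""
    out = []
    for i in range(n):
        for j in range(m):
            out.append((i, j))
    return out


def flattened(n, m):
    """Same iteration space as one loop with hand-maintained indices."""
    out = []
    i = j = 0
    for _ in range(n * m):
        out.append((i, j))
        j += 1
        if j == m:  # inner index wraps around, outer index advances
            j = 0
            i += 1
    return out
```

A single flat loop of this shape gives a pipelining tool one loop body to schedule, instead of restarting the pipeline at every outer iteration.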
15

High Level Synthesis for Optimising Hybrid Electric Vehicle Fuel Consumption Using FPGAs and Dynamic Programming

Skarman, Frans January 2019 (has links)
The fuel usage of a hybrid electric vehicle can be reduced by strategically combining the usage of the combustion engine with the electric motor. One method to determine an optimal split between the two is to use dynamic programming. However, the number of computations grows exponentially with the number of states, which makes its usage difficult on sequential hardware. This thesis project explores the usage of FPGAs for speeding up the required computations to possibly allow the optimisation to run in real time in the vehicle. A tool to convert a vehicle model to a hardware description language was developed and evaluated. The current version does not run fast enough to run in real time, but some optimisations which would allow that are proposed.
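As a rough illustration of the dynamic-programming approach this abstract describes, the sketch below solves a toy engine/battery split by backward induction. Everything in it is an assumption made for illustration: the single state variable (a discrete battery charge level), the integer power units, the function name `optimal_split`, and the caller-supplied `fuel_cost` function. It is not the vehicle model or the tool from the thesis.

```python
def optimal_split(demand, soc_levels, fuel_cost):
    """Backward DP over time steps and discrete battery charge levels.

    demand[t]    : power units requested at step t (hypothetical unit)
    soc_levels   : number of discrete charge levels, 0 .. soc_levels - 1
    fuel_cost(p) : fuel used when the engine supplies p units
    Returns cost[t][s], the minimum fuel from step t onward at charge s.
    """
    steps = len(demand)
    cost = [[0.0] * soc_levels for _ in range(steps + 1)]
    for t in range(steps - 1, -1, -1):
        for s in range(soc_levels):
            best = float("inf")
            # u = units drawn from the battery; the engine covers the rest
            for u in range(min(s, demand[t]) + 1):
                total = fuel_cost(demand[t] - u) + cost[t + 1][s - u]
                best = min(best, total)
            cost[t][s] = best
    return cost
```

Each table entry depends only on the next time step, so all charge levels at a given step can be evaluated independently, which is the kind of parallelism an FPGA implementation could exploit.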
16

Detection algorithms and FPGA implementations for SC-FDMA uplink receivers

Hänninen, T. (Tuomo) 29 June 2018 (has links)
Abstract The demand in mobile broadband communications is increasing dramatically. It is expected that 1000 times more mobile-network capacity will be needed within 10 years. Multiple-input, multiple-output (MIMO) antenna configuration and spatial multiplexing are among the essential techniques for reaching the targets. This creates motivation for the study of advanced receivers for combating inter-antenna interference (IAI) and inter-symbol interference (ISI). While various receiver structures have been extensively considered for MIMO receivers, the emphasis has been on those operating in downlink orthogonal frequency-division multiple access (OFDMA) systems, wherein ISI is not a problem. Advanced receiver structures for single-carrier frequency-division multiple access (SC-FDMA) uplink systems were studied and analysed. Various receivers were compared via MATLAB simulations, with the objective being to gain a solid understanding of how they perform in different channel environments. An efficient combination of IAI and ISI equalisation for SC-FDMA receivers is proposed. The proposed receiver architecture is shown to be a considerable improvement over the conventional linear minimum mean-square error (LMMSE) receiver. Several MIMO detector algorithms and their performance–complexity characteristics are presented. The K-best algorithm with a list size of 8 is shown to be the best option for practical MIMO detector implementation of this receiver in the 4x4 MIMO 64-level quadrature amplitude modulation (QAM) scenario. The second objective involved examining the implementation aspects of the 8-best receiver to achieve a good understanding of the complexity of various implementation architectures. It emerged that avoiding the sorting operation in the 8-best list sphere detector (LSD) tree-search algorithm implementation is not advisable in the 4x4 MIMO 64-QAM scenario. Several field-programmable gate array (FPGA) implementations were carried out, with a range of high-level synthesis (HLS) tools. It is shown that HLS tools have improved significantly and are especially favourable for prototyping of large designs. Additionally, the importance of FPGA technology selection is addressed. Smaller silicon technology should be exploited if base-station baseband processing power consumption is to be minimised. The potential performance or complexity-related gain with the latest FPGAs should be taken into account in comparison of the performance–complexity characteristics of the algorithms. Differences of a few tens of per cent in estimated complexity or performance between two algorithms are often below the threshold of what can be gained or lost in the practical implementation process.
/ Abstract New wireless services in densely populated cities need communication networks that offer higher data rates and capacity than current mobile networks can provide. It has been estimated that the capacity demand on mobile networks will grow a thousandfold over the next ten years. This thousandfold capacity is expected to be reached by a tenfold increase in each of three areas: the amount of frequency spectrum, spectral efficiency, and base-station density. This dissertation focuses on increasing spectral efficiency, for which a multi-antenna (multiple-input multiple-output, MIMO) implementation is essential. Base-station receivers in cellular systems that use MIMO technology need a fairly complex channel equaliser and detector, and the optimisation and implementation of their algorithms are still rather poorly understood. The main objective of the dissertation research is to study advanced receiver structures that achieve the target data rate of the LTE-A standard with reasonable complexity. The work focuses on the uplink, that is, transmission from the terminal to the base station, which uses single-carrier frequency-division multiple access (SC-FDMA) instead of orthogonal frequency-division multiple access (OFDMA). Different receiver structures and their detection algorithms are compared through computer simulations in MATLAB. The dissertation proposes a two-part receiver structure in which inter-antenna interference (IAI) and inter-symbol interference (ISI) are removed in two separate stages. Computer simulations show that this structure improves performance considerably compared with the conventional linear minimum mean-square error (LMMSE) receiver. The K-best MIMO detection algorithm, which keeps the K best paths, is found with a list size of eight to offer the best trade-off between performance and complexity in the 4x4 MIMO 64-level quadrature amplitude modulation (QAM) scenario. With regard to practical feasibility, the focus is on field-programmable gate array (FPGA) implementation and on the use of high-level synthesis (HLS) tools in receiver design. Architecture comparisons of the K-best MIMO detection algorithm show that it is not always worthwhile to try to avoid the sorting algorithm, demanding as it is, by means of the solution proposed earlier in the literature. Several HLS tools are used in the FPGA implementations, and the tools are found to have improved considerably over the past eight years. In addition, circuits with a 16 nm line width are found to achieve about 15% higher detection speed and 60% lower power consumption than circuits with a 28 nm line width. The potential for minimising power consumption is especially worth exploiting if signal processing plays a significant role in the receiver's total power consumption. Overall, implementation-related choices and their effect on the end result should be taken into account already when selecting the algorithms. A small performance difference between two algorithms is easily masked by the effects of decisions made in the implementation phase.
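The K-best tree search named in this abstract can be sketched as follows. This is a minimal, real-valued illustration under stated assumptions: the channel has already been reduced to an upper-triangular matrix `R` (for example by a QR decomposition, not shown), symbols come from a small real alphabet, and the function name `k_best_detect` is hypothetical. It is not the receiver or the 8-best LSD architecture evaluated in the thesis.

```python
def k_best_detect(R, y, alphabet, K):
    """Breadth-first K-best search for y ~ R x with R upper triangular.

    Layers are processed from the last row upward; at each layer every
    survivor is extended by every symbol, and only the K candidates with
    the smallest accumulated squared error are kept.
    """
    n = len(y)
    # Each survivor: (partial distance, symbols decided for layers i+1 .. n-1)
    survivors = [(0.0, [])]
    for i in range(n - 1, -1, -1):
        candidates = []
        for dist, tail in survivors:
            # Contribution of the already-decided layers to row i
            interference = sum(
                R[i][i + 1 + k] * tail[k] for k in range(len(tail))
            )
            for s in alphabet:
                err = y[i] - interference - R[i][i] * s
                candidates.append((dist + err * err, [s] + tail))
        candidates.sort(key=lambda c: c[0])  # the per-layer sorting step
        survivors = candidates[:K]
    return survivors[0][1]
```

The sort at each layer is the operation whose removal the abstract discusses; with `K` at least as large as the number of candidates, the search reduces to exhaustive enumeration.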
17

Using Source-to-Source Transformations to Add Debug Observability to HLS-Synthesized Circuits

Monson, Joshua Scott 01 March 2016 (has links)
This dissertation introduces a novel approach for exposing the internal, source-level expressions of circuits generated by high-level synthesis (HLS) for in-circuit debug. The approach uses source-to-source transformations to instrument specific source-level expressions with debug ports. These debug ports allow a user to connect a debugging instrument (e.g. an embedded logic analyzer) to record the activity of the expression corresponding to the debug port. This dissertation demonstrates that a debugging solution based on these source-to-source transformations is feasible and that individual debug ports can be added for a cost of a 1-2% increase in circuit area on average. It also introduces another transformation that permits pointer-valued expressions to be instrumented for debug. It is demonstrated that all pointers in the CHStone benchmarks can be instrumented for an average 4% increase in circuit area. The debug port transformations are demonstrated on two HLS tools: Vivado HLS and LegUp. The architecture of the source-to-source compiler allowed the necessary adaptations for the second tool (LegUp) to be implemented using a minimal amount of additional code. Due to limitations in the LegUp compiler, an additional optimization was added to reduce the latency overhead incurred by the debug ports. User manuals and other documentation from 10 additional C-based HLS tools are examined to determine whether they are amenable to debug instrumentation using the source-to-source transformations. Of the 10 additional HLS tools examined, 6 were amenable to the transformations, 3 were likely to be amenable, and 1 was not. This dissertation estimates the cost of a complete debugging solution (i.e. one with debug ports and a debugging instrument) and identifies a possible worst-case bound for adding debug ports. Finally, this dissertation analyzes two different debugging instruments and determines which instrument would be best for most HLS circuits mapped to FPGAs. It then estimates the overhead of this debugging solution.
18

Enabling Hardware/Software Co-design in High-level Synthesis

Choi, Jongsok 21 November 2012 (has links)
A hardware implementation can bring orders-of-magnitude improvements in performance and energy consumption over a software implementation. Hardware design, however, can be extremely difficult. High-level synthesis, the process of compiling software to hardware, promises to make hardware design easier. However, compiling an entire software program to hardware can be inefficient. This thesis proposes hardware/software co-design, where computationally intensive functions are accelerated by hardware, while the remaining program segments execute in software. The work in this thesis builds a framework where user-designated software functions are automatically compiled to hardware accelerators, which can execute serially or in parallel and work in tandem with a processor. To support multiple parallel accelerators, new multi-ported cache designs are presented. These caches provide low-latency, high-bandwidth data access to further improve the performance of accelerators. An extensive range of cache architectures is explored, and results show that certain cache architectures significantly outperform others in a processor/accelerator system.
20

High-Level Synthesis of Software Function Calls

TOMIYAMA, Hiroyuki, KANBARA, Hiroyuki, ISHIMORI, Yoshiyuki, ISHIURA, Nagisa, NISHIMURA, Masanari 01 December 2008 (has links)
No description available.
