• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 15
  • 8
  • 4
  • 3
  • 3
  • Tagged with
  • 33
  • 33
  • 10
  • 9
  • 9
  • 7
  • 7
  • 6
  • 6
  • 6
  • 5
  • 5
  • 5
  • 5
  • 5
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

A Coarse Grained Reconfigurable Architecture Framework Supporting Macro-Dataflow Execution

Varadarajan, Keshavan 12 1900 (has links) (PDF)
A Coarse-Grained Reconfigurable Architecture (CGRA) is a processing platform which constitutes an interconnection of coarse-grained computation units (viz. Function Units (FUs), Arithmetic Logic Units (ALUs)). These units communicate directly, viz. send-receive like primitives, as opposed to the shared memory based communication used in multi-core processors. CGRAs are a well-researched topic and the design space of a CGRA is quite large. The design space can be represented as a 7-tuple (C, N, T, P, O, M, H) where each of the terms have the following meaning: C -choice of computation unit, N -choice of interconnection network, T -Choice of number of context frame (single or multiple), P -presence of partial reconfiguration, O choice of orchestration mechanism, M -design of memory hierarchy and H host-CGRA coupling. In this thesis, we develop an architectural framework for a Macro-Dataflow based CGRA where we make the following choice for each of these parameters: C -ALU, N -Network-on-Chip (NoC), T -Multiple contexts, P -support for partial reconfiguration, O -Macro Dataflow based orchestration, M -data memory banks placed at the periphery of the reconfigurable fabric (reconfigurable fabric is the name given to the interconnection of computation units), H -loose coupling between host processor and CGRA, enabling our CGRA to execute an application independent of the host-processor’s intervention. The motivations for developing such a CGRA are: To execute applications efficiently through reduction in reconfiguration time (i.e. the time needed to transfer instructions and data to the reconfigurable fabric) and reduction in execution time through better exploitation of all forms of parallelism: Instruction Level Parallelism (ILP), Data Level Parallelism (DLP) and Thread/Task Level Parallelism (TLP). We choose a macro-dataflow based orchestration framework in combination with partial reconfiguration so as to ease exploitation of TLP and DLP. Macro-dataflow serves as a light weight synchronization mechanism. We experiment with two variants of the macro-dataflow orchestration units, namely: hardware controlled orchestration unit and the compiler controlled orchestration unit. We employ a NoC as it helps reduce the reconfiguration overhead. To permit customization of the CGRA for a particular domain through the use of domain-specific custom-Intellectual Property (IP) blocks. This aids in improving both application performance and makes it energy efficient. To develop a CGRA which is completely programmable and accepts any program written using the C89 standard. The compiler and the architecture were co-developed to ensure that every feature of the architecture could be automatically programmed through an application by a compiler. In this CGRA framework, the orchestration mechanism (O) and the host-CGRA coupling (H) are kept fixed and we permit design space exploration of the other terms in the 7-tuple design space. The mode of compilation and execution remains invariant of these changes, hence referred to as a framework. We now elucidate the compilation and execution flow for this CGRA framework. An application written in C language is compiled and is transformed into a set of temporal partitions, referred to as HyperOps in this thesis. The macro-dataflow orchestration unit selects a HyperOp for execution when all its inputs are available. The instructions and operands for a ready HyperOp are transferred to the reconfigurable fabric for execution. Each ALU (in the computation unit) is capable of waiting for the availability of the input data, prior to issuing instructions. We permit the launch and execution of a temporal partition to progress in parallel, which reduces the reconfiguration overhead. We further cut launch delays by keeping loops persistent on fabric and thus eliminating the need to launch the instructions. The CGRA framework has been implemented using Bluespec System Verilog. We evaluate the performance of two of these CGRA instances: one for cryptographic applications and another instance for linear algebra kernels. We also run other general purpose integer and floating point applications to demonstrate the generic nature of these optimizations. We explore various microarchitectural optimizations viz. pipeline optimizations (i.e. changing value of T ), different forms of macro dataflow orchestration such as hardware controlled orchestration unit and compiler-controlled orchestration unit, different execution modes including resident loops, pipeline parallelism, changes to the router etc. As a result of these optimizations we observe 2.5x improvement in performance as compared to the base version. The reconfiguration overhead was hidden through overlapping launching of instructions with execution making. The perceived reconfiguration overhead is reduced drastically to about 9-11 cycles for each HyperOp, invariant of the size of the HyperOp. This can be mainly attributed to the data dependent instruction execution and use of the NoC. The overhead of the macro-dataflow execution unit was reduced to a minimum with the compiler controlled orchestration unit. To benchmark the performance of these CGRA instances, we compare the performance of these with an Intel Core 2 Quad running at 2.66GHz. On the cryptographic CGRA instance, running at 700MHz, we observe one to two orders of improvement in performance for cryptographic applications and up to one order of magnitude performance degradation for linear algebra CGRA instance. This relatively poor performance of linear algebra kernels can be attributed to the inability in exploiting ILP across computation units interconnected by the NoC, long latency in accessing data memory placed at the periphery of the reconfigurable fabric and unavailability of pipelined floating point units (which is critical to the performance of linear algebra kernels). The superior performance of the cryptographic kernels can be attributed to higher computation to load instruction ratio, careful choice of custom IP block, ability to construct large HyperOps which allows greater portion of the communication to be performed directly (as against communication through a register file in a general purpose processor) and the use of resident loops execution mode. The power consumption of a computation unit employed on the cryptography CGRA instance, along with its router is about 76mW, as estimated by Synopsys Design Vision using the Faraday 90nm technology library for an activity factor of 0.5. The power of other instances would be dependent on specific instantiation of the domain specific units. This implies that for a reconfigurable fabric of size 5 x 6 the total power consumption is about 2.3W. The area and power ( 84mW) dissipated by the macro dataflow orchestration unit, which is common to both instances, is comparable to a single computation unit, making it an effective and low overhead technique to exploit TLP.
22

Proposta e desenvolvimento de um algoritmo de associatividade reconfigurável em memórias cache. / Proposal and development of a reconfigurable associativity algorithm in cache memories.

Kerr Junior, Roberto Borges 25 June 2008 (has links)
A evolução constante dos processadores está aumentando cada vez o overhead dos acessos à memória. Tentando evitar este problema, os desenvolvedores de processadores utilizam diversas técnicas, entre elas, o emprego de memórias cache na hierarquia de memórias dos computadores. As memórias cache, por outro lado, não conseguem suprir totalmente as suas necessidades, sendo interessante alguma técnica que tornasse possível aproveitar melhor a memória cache. Para resolver este problema, autores propõem a utilização de técnicas de computação reconfigurável. Este trabalho analisa um trabalho na área de reconfiguração na associatividade de memórias cache, e propõe melhorias nele para uma melhor utilização de seus recursos, apresentando resultados práticos de simulações realizadas com diversas organizações de cache. / With the constant evolution of processors architecture, its getting even bigger the overhead generated with memory access. Trying to avoid this problem, some processors developers are using several techniques to improve the performance, as the use of cache memories. By the otherside, cache memories cannot supply all their needs, thats why its important some new technique that could use better the cache memory. Working on this problem, some authors are using reconfigurable computing to improve the cache memorys performance. This work analyses the reconfiguration of the cache memory associativity algorithm, and propose some improvements on this algorithm to better use its resources, showing some practical results from simulations with several cache organizations.
23

Implementation of a Low Cost Reconfigurable Transform Architecture for Multiple Video Codecs

2012 June 1900 (has links)
Currently different types of transform techniques are used by different video codecs to achieve data compression during video frame transmission. Among them, Discrete Cosine Transform (DCT) is supported by most of modern video standards. The integer DCT (Int-DCT) is an integer approximation of DCT. It can be implemented exclusively with integer arithmetic. Int-DCT proves to be highly advantageous in cost and speed for hardware implementations. In particular, the 4x4 and 8x8 block size Int-DCTs have the increased applicability at the current multimedia industry because of their simpler implementation and better de-correlation performance for high definition (HD) video signals. In this thesis, we present a fast and cost-shared reconfigurable architecture to compute variable block size Int-DCT for four modern video codecs – AVS, H.264/AVC, VC-1 and HEVC (under development). Based on the symmetric structure of the transform matrices and the similarity in matrix operations, we have developed a generalized “decompose and share” algorithm to compute the 4x4 and 8x8 block size Int-DCT. The algorithm is later applied to those four video codecs. Our shared hardware approach ensures the maximum circuit reuse during the computation. The entire architecture is multiplier free and designed with only adders and shifters to minimize hardware cost and improve working frequency. Finally, the design is implemented on a FPGA and later synthesized in CMOS 0.18um technology to compare the cost and performance with existing designs. The results show significant reduction in hardware cost and meet the requirements of real time video coding applications.
24

Proposta e implementa??o de uma arquitetura reconfigur?vel h?brida para aplica??es baseadas em fluxo de dados

Pereira, M?nica Magalh?es 21 February 2008 (has links)
Made available in DSpace on 2014-12-17T15:47:47Z (GMT). No. of bitstreams: 1 MonicaMP.pdf: 1183724 bytes, checksum: 59ab47a1731d0a647c07a25b7e4f0a84 (MD5) Previous issue date: 2008-02-21 / The increase of applications complexity has demanded hardware even more flexible and able to achieve higher performance. Traditional hardware solutions have not been successful in providing these applications constraints. General purpose processors have inherent flexibility, since they perform several tasks, however, they can not reach high performance when compared to application-specific devices. Moreover, since application-specific devices perform only few tasks, they achieve high performance, although they have less flexibility. Reconfigurable architectures emerged as an alternative to traditional approaches and have become an area of rising interest over the last decades. The purpose of this new paradigm is to modify the device s behavior according to the application. Thus, it is possible to balance flexibility and performance and also to attend the applications constraints. This work presents the design and implementation of a coarse grained hybrid reconfigurable architecture to stream-based applications. The architecture, named RoSA, consists of a reconfigurable logic attached to a processor. Its goal is to exploit the instruction level parallelism from intensive data-flow applications to accelerate the application s execution on the reconfigurable logic. The instruction level parallelism extraction is done at compile time, thus, this work also presents an optimization phase to the RoSA architecture to be included in the GCC compiler. To design the architecture, this work also presents a methodology based on hardware reuse of datapaths, named RoSE. RoSE aims to visualize the reconfigurable units through reusability levels, which provides area saving and datapath simplification. The architecture presented was implemented in hardware description language (VHDL). It was validated through simulations and prototyping. To characterize performance analysis some benchmarks were used and they demonstrated a speedup of 11x on the execution of some applications / O aumento na complexidade das aplica??es vem exigindo dispositivos cada vez mais flex?veis e capazes de alcan?ar alto desempenho. As solu??es de hardware tradicionais s?o ineficientes para atender as exig?ncias dessas aplica??es. Processadores de prop?sito geral, embora possuam flexibilidade inerente devido ? capacidade de executar diversos tipos de tarefas, n?o alcan?am alto desempenho quando comparados ?s arquiteturas de aplica??o espec?fica. Este ?ltimo, por ser especializado em uma pequena quantidade de tarefas, alcan?a alto desempenho, por?m n?o possui flexibilidade. Arquiteturas reconfigur?veis surgiram como uma alternativa ?s abordagens convencionais e vem ganhado espa?o nas ?ltimas d?cadas. A proposta desse paradigma ? alterar o comportamento do hardware de acordo com a aplica??o a ser executada. Dessa forma, ? poss?vel equilibrar flexibilidade e desempenho e atender a demanda das aplica??es atuais. Esse trabalho prop?e o projeto e a implementa??o de uma arquitetura reconfigur?vel h?brida de granularidade grossa, voltada a aplica??es baseadas em fluxo de dados. A arquitetura, denominada RoSA, consiste de um bloco reconfigur?vel anexado a um processador. Seu objetivo ? explorar paralelismo no n?vel de instru??o de aplica??es com intenso fluxo de dados e com isso acelerar a execu??o dessas aplica??es no bloco reconfigur?vel. A explora??o de paralelismo no n?vel de instru??o ? feita em tempo de compila??o e para tal, esse trabalho tamb?m prop?e uma fase de otimiza??o para a arquitetura RoSA a ser inclu?da no compilador GCC. Para o projeto da arquitetura esse trabalho tamb?m apresenta uma metodologia baseada no reuso de hardware em caminho de dados, denominada RoSE. Sua proposta ? visualizar as unidades reconfigur?veis atrav?s de n?veis de reusabilidade, que permitem a economia de ?rea e a simplifica??o do projeto do caminho de dados da arquitetura. A arquitetura proposta foi implementada em linguagem de descri??o de hardware (VHDL). Sua valida??o deu-se atrav?s de simula??es e da prototipa??o em FPGA. Para an?lise de desempenho foram utilizados alguns estudos de caso que demonstraram uma acelera??o de at? 11 vezes na execu??o de algumas aplica??es
25

Implementa??o de processador banda base ofdma para downlink lte em fpga

Silva, Bruno Leonardo Mendes Tavares 31 March 2011 (has links)
Made available in DSpace on 2014-12-17T14:55:50Z (GMT). No. of bitstreams: 1 BrunoLMTS_DISSERT.pdf: 3836374 bytes, checksum: 430e05d393bcb665a7880036b61844c2 (MD5) Previous issue date: 2011-03-31 / This work treats of an implementation OFDMA baseband processor in hardware for LTE Downlink. The LTE or Long Term Evolution consist the last stage of development of the technology called 3G (Mobile System Third Generation) which offers an increasing in data rate and more efficiency and flexibility in transmission with application of advanced antennas and multiple carriers techniques. This technology applies in your physical layer the OFDMA technical (Orthogonal Frequency Division Multiple Access) for generation of signals and mapping of physical resources in downlink and has as base theoretical to OFDM multiple carriers technique (Orthogonal Frequency Division Multiplexing). With recent completion of LTE specifications, different hardware solutions have been developed, mainly, to the level symbol processing where the implementation of OFDMA processor in base band is commonly considered, because it is also considered a basic architecture of others important applications. For implementation of processor, the reconfigurable hardware offered by devices as FPGA are considered which shares not only to meet the high requirements of flexibility and adaptability of LTE as well as offers possibility of an implementation quick and efficient. The implementation of processor in reconfigurable hardware meets the specifications of LTE physical layer as well as have the flexibility necessary for to meet others standards and application which use OFDMA processor as basic architecture for your systems. The results obtained through of simulation and verification functional system approval the functionality and flexibility of processor implemented / Esta disserta??o trata da implementa??o de um processador banda base em hardware para Downlink LTE. O LTE ou Long Term Evolution compreende o ?ltimo est?gio de desenvolvimento das tecnologias chamadas de 3G (Telefonia M?vel de Terceira Gera??o) que prov? um incremento nas taxas de dados e maior efici?ncia e flexibilidade na transmiss?o com emprego de t?cnicas avan?adas de antenas e de t?cnicas de transmiss?o de m?ltiplas portadoras. Esta tecnologia aplica em sua camada f?sica a t?cnica OFDMA (Orthogonal F requency Division Multiple Access) para gera??o de sinais e mapeamento dos recursos f?sicos no downlink e tem como base te?rica ? t?cnica de m?ltiplas portadoras OFDM (Orthogonal Frequency Division Multiplexing). Com recente finaliza??o das especifica??es da tecnologia LTE, diversas solu??es em hardware tem sido propostas e desenvolvidas, principalmente, ao n?vel de processamento de s?mbolo em que a implementa??o do processador OFDMA em banda base ? comumente considerada, visto que ela ? tamb?m considerada como arquitetura b?sica de outras importantes aplica??es. Para implementa??o do processador, hardwares reconfigur?veis oferecidos por dispositivos como FPGA s?o considerados que visa n?o s? atender os altos requisitos de flexibilidade e adaptabilidade do LTE como tamb?m oferecem a possibilidade de uma implementa??o r?pida e eficiente. A implementa??o do processador em hardware reconfigur?vel atendeu as especifica??es da camada f?sica LTE bem como se mostrou flex?vel o suficiente para atender outros padr?es e aplica??es que utilizem o processador OFDMA como arquitetura b?sica de seus sistemas. Os resultados obtidos atrav?s de simula??o e verifica??o funcional do sistema atestam a funcionalidade e a flexibilidade do processador implementado
26

Proposta e desenvolvimento de um algoritmo de associatividade reconfigurável em memórias cache. / Proposal and development of a reconfigurable associativity algorithm in cache memories.

Roberto Borges Kerr Junior 25 June 2008 (has links)
A evolução constante dos processadores está aumentando cada vez o overhead dos acessos à memória. Tentando evitar este problema, os desenvolvedores de processadores utilizam diversas técnicas, entre elas, o emprego de memórias cache na hierarquia de memórias dos computadores. As memórias cache, por outro lado, não conseguem suprir totalmente as suas necessidades, sendo interessante alguma técnica que tornasse possível aproveitar melhor a memória cache. Para resolver este problema, autores propõem a utilização de técnicas de computação reconfigurável. Este trabalho analisa um trabalho na área de reconfiguração na associatividade de memórias cache, e propõe melhorias nele para uma melhor utilização de seus recursos, apresentando resultados práticos de simulações realizadas com diversas organizações de cache. / With the constant evolution of processors architecture, its getting even bigger the overhead generated with memory access. Trying to avoid this problem, some processors developers are using several techniques to improve the performance, as the use of cache memories. By the otherside, cache memories cannot supply all their needs, thats why its important some new technique that could use better the cache memory. Working on this problem, some authors are using reconfigurable computing to improve the cache memorys performance. This work analyses the reconfiguration of the cache memory associativity algorithm, and propose some improvements on this algorithm to better use its resources, showing some practical results from simulations with several cache organizations.
27

Ferramentas e metodologias de desenvolvimento para sistemas parcialmente reconfiguráveis. / Development tools and methodologies for partial reconfigurable systems.

Valiante Filho, Filippo 19 May 2008 (has links)
Alguns tipos de FPGA (Field Programmable Gate Array) possuem a capacidade de serem reconfigurados parcialmente em tempo de execução formando um Sistema Parcialmente Reconfigurável (SPR), cuja utilização traz diversas vantagens dentre as quais a redução de custos. A maior utilização de SPRs enfrenta, como um dos fatores limitantes, a dificuldade de acesso e de utilização de ferramentas de desenvolvimento apropriadas. Este trabalho aborda os SPRs, suas aplicações e uma análise das ferramentas de desenvolvimento existentes. posteriormente dedica-se ao aperfeiçoamento de uma dessas ferramentas, o PARBIT, com o desenvolvimento de uma interface gráfica de usuário (GUI, -- Graphical User Interface) e a atualização de sua metodologia de desenvolvimento. As metodologias de projeto suportadas pelo fabricante do FPGA também são apresentadas. As metodologias são validadas através do projeto de um SPR. / Some types of FPGA (Field Programmable Gate Array) can be partially reconfigured during run-time forming a Partial Reconfigurable System (PRS). The use of PRSs brings several advantages like cost reduction. A larger use of PRSs faces a limiting factor: the difficult to access and use appropriate development tools. This work shows the PRSs, its applications and an analysis of the existing development tools. Later, it dedicates to the improvement of one of these tools, the PARBIT, developing a graphical user interface (GUI) and updating its project methodology. The project methodologies supported by the manufacturer of the FPGA are also presented. The methodologies are validated through the design of a PRS.
28

Ferramentas e metodologias de desenvolvimento para sistemas parcialmente reconfiguráveis. / Development tools and methodologies for partial reconfigurable systems.

Filippo Valiante Filho 19 May 2008 (has links)
Alguns tipos de FPGA (Field Programmable Gate Array) possuem a capacidade de serem reconfigurados parcialmente em tempo de execução formando um Sistema Parcialmente Reconfigurável (SPR), cuja utilização traz diversas vantagens dentre as quais a redução de custos. A maior utilização de SPRs enfrenta, como um dos fatores limitantes, a dificuldade de acesso e de utilização de ferramentas de desenvolvimento apropriadas. Este trabalho aborda os SPRs, suas aplicações e uma análise das ferramentas de desenvolvimento existentes. posteriormente dedica-se ao aperfeiçoamento de uma dessas ferramentas, o PARBIT, com o desenvolvimento de uma interface gráfica de usuário (GUI, -- Graphical User Interface) e a atualização de sua metodologia de desenvolvimento. As metodologias de projeto suportadas pelo fabricante do FPGA também são apresentadas. As metodologias são validadas através do projeto de um SPR. / Some types of FPGA (Field Programmable Gate Array) can be partially reconfigured during run-time forming a Partial Reconfigurable System (PRS). The use of PRSs brings several advantages like cost reduction. A larger use of PRSs faces a limiting factor: the difficult to access and use appropriate development tools. This work shows the PRSs, its applications and an analysis of the existing development tools. Later, it dedicates to the improvement of one of these tools, the PARBIT, developing a graphical user interface (GUI) and updating its project methodology. The project methodologies supported by the manufacturer of the FPGA are also presented. The methodologies are validated through the design of a PRS.
29

Architecture matérielle et flot de programmation associé pour la conception de systèmes numériques tolérants aux fautes / Hardware architecture and associated programming flow for the design of digital fault-tolerant systems

Peyret, Thomas 02 December 2014 (has links)
Que ce soit dans l’automobile avec des contraintes thermiques ou dans l’aérospatial et lenucléaire soumis à des rayonnements ionisants, l’environnement entraîne l’apparition de fautesdans les systèmes électroniques. Ces fautes peuvent être transitoires ou permanentes et vontinduire des résultats erronés inacceptables dans certains contextes applicatifs. L’utilisation decomposants dits « rad-hard » est parfois compromise par leurs coûts élevés ou les difficultésd’approvisionnement liés aux règles d’exportation.Cette thèse propose une approche conjointe matérielle et logicielle indépendante de la technologied’intégration permettant d’utiliser des composants numériques programmables dans desenvironnements susceptibles de générer des fautes. Notre proposition comporte la définitiond’une Architecture Reconfigurable à Gros Grains (CGRA) capable d’exécuter des codes applicatifscomplets mais aussi l’ensemble des mécanismes matériels et logiciels permettant de rendrecette architecture tolérante aux fautes. Ce résultat est obtenu par l’association de redondance etde reconfiguration dynamique du CGRA en s’appuyant sur une banque de configurations généréepar une chaîne de programmation complète. Cette chaîne outillée repose sur un flot permettantde porter un code sous forme de Control and Data Flow Graph (CDFG) sur l’architecture enobtenant un grand nombre de configurations différentes et qui permet d’exploiter au mieux lepotentiel de l’architecture.Les travaux, qui ont été validés aux travers d’expériences sur des applications du domaine dutraitement du signal et de l’image, ont fait l’objet de publications en conférences internationaleset de dépôts de brevets. / Whether in automotive with heat stress or in aerospace and nuclear field subjected to cosmic,neutron and gamma radiation, the environment can lead to the development of faults in electronicsystems. These faults, which can be transient or permanent, will lead to erroneous results thatare unacceptable in some application contexts. The use of so-called rad-hard components issometimes compromised due to their high costs and supply problems associated with exportrules.This thesis proposes a joint hardware and software approach independent of integrationtechnology for using digital programmable devices in environments that generate faults. Ourapproach includes the definition of a Coarse Grained Reconfigurable Architecture (CGRA) ableto execute entire application code but also all the hardware and software mechanisms to make ittolerant to transient and permanent faults. This is achieved by the combination of redundancyand dynamic reconfiguration of the CGRA based on a library of configurations generated by acomplete conception flow. This implemented flow relies on a flow to map a code represented as aControl and Data Flow Graph (CDFG) on the CGRA architecture by obtaining directly a largenumber of different configurations and allows to exploit the full potential of architecture.This work, which has been validated through experiments with applications in the field ofsignal and image processing, has been the subject of two publications in international conferencesand of two patents.
30

Du prototypage à l’exploitation d’overlays FPGA / From prototyping to exploitation of FPGA overlays

Bollengier, Théotime 15 January 2018 (has links)
De part leur capacité de reconfiguration et les performances qu’ils offrent, les FPGAs sont de bons candidats pour accélérer des applications dans le Cloud. Cependant, les FPGAs présentent certaines caractéristiques qui font obstacle à leur utilisation dans le Cloud et leur adoption par les clients : premièrement, la programmation des FPGAs se fait à bas niveau et demande une certaine expertise, que n’ont pas nécessairement les clients habituels du Cloud. Deuxièmement, les FPGAs ne présentent pas de mécanismes natifs permettant leur intégration dans le modèle de gestion dynamique d’une infrastructure Cloud.Dans ce travail, nous proposons d’utiliser des architectures overlay afin de faciliter l’adoption, l’intégration et l’exploitation de FPGAs dans le Cloud. Les overlays sont des architectures reconfigurables elles-mêmes implémentée sur FPGA. En tant que couche d’abstraction matérielle placée entre le FPGA et les applications, les overlays permettent de monter le niveau d’abstraction du modèle d’exécution présenté aux applications et aux utilisateurs, ainsi que d’implémenter des mécanismes facilitant leur intégration et leur exploitation dans une infrastructure Cloud.Ce travail présente une approche verticale adressant tous les aspects de la mise en œuvre d’overlays dans le Cloud en tant qu’accélérateurs reconfigurables par les clients : de la conception et l’implémentation des overlays, leur intégration sur des plateformes FPGA commerciales, la mise en place de leurs mécanismes d’exploitation, jusqu’à la réalisationde leurs outils de programmation. L’environnement réalisé est complet, modulaire et extensible, il repose en partie sur différents outils existants, et démontre la faisabilité de notre approche. / Due to their reconfigurable capability and the performance they offer, FPGAs are good candidates for accelerating applications in the cloud. However, FPGAs have some features that hinder their use in the Cloud as well as their adoption by customers : first, FPGA programming is done at low level and requires some expertise that usual Cloud clients do not necessarily have. Secondly, FPGAs do not have native mechanisms allowing them to easily fit in the dynamic execution model of the Cloud.In this work, we propose to use overlay architectures to facilitate FPGA adoption, integration, and operation in the Cloud. Overlays are reconfigurable architectures synthesized on FPGA. As hardware abstraction layers placed between the FPGA and applications, overlays allow to raise the abstraction level of the execution model presented to applications and users, as well as to implement mechanisms making them fit in a Cloud infrastructure.This work presents a vertical approach addressing all aspects of overlay operation in the Cloud as reconfigurable accelerators programmable by tenants : from designing and implementing overlays, integrating them on commercial FPGA platforms, setting up their operating mechanisms, to developping their programming tools. The environment developped in this work is complete, modular and extensible, it is partially based on several existing tools, and demonstrate the feasibility of our approach.

Page generated in 0.1175 seconds