Global ETD Search

31	Hardware Support for FPGA Resource Elasticity Aliyeva, Fidan January 2022 (has links) FPGAs are commonly used in cloud computing because they can be programmed as special-purpose processors, achieving high performance at low power. However, FPGAs offer a large pool of resources, much of which is wasted when a device hosts a single application or serves a single user's request. Partial reconfiguration technology enables an FPGA to divide its resources into regions and to reprogram those regions dynamically with different applications at runtime, so it is considered a good remedy for resource underutilization. Nevertheless, the sizes of these regions are static: they cannot be increased or decreased once defined, which in turn leads to underutilization of the resources inside each reconfigurable region. This thesis addresses this problem, i.e., how to dynamically increase or decrease partially reconfigurable FPGA resources to match an application's needs. Our solution enables expanding and contracting the FPGA resources allocated to an application by 1) expressing the application's acceleration requirements as multiple smaller modules, which are configured into multiple reconfigurable regions assigned to the application dynamically, and 2) providing a low-area-overhead, configurable, and isolated communication mechanism among those reconfigurable regions by adapting a crossbar interconnect and the WISHBONE interface. FPGA Elasticity Partial reconfiguration Crossbar WISHBONE Multicast Computer and Information Sciences
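The elasticity idea in entry 31 (regions handed to an application on demand, with communication isolated per application) can be illustrated with a small software model. This is only a sketch under stated assumptions: the `Crossbar` class, its method names, and the first-free allocation policy are hypothetical and are not taken from the thesis, which implements the mechanism in hardware over a WISHBONE interface.

```python
class Crossbar:
    """Toy software model of a configurable crossbar between reconfigurable regions.

    Hypothetical illustration only; not the hardware design from the thesis.
    """

    def __init__(self, num_regions):
        self.num_regions = num_regions
        self.owner = [None] * num_regions  # application id that holds each region

    def allocate(self, app, count):
        """Expand: give `app` up to `count` free regions and return their indices."""
        free = [r for r in range(self.num_regions) if self.owner[r] is None]
        granted = free[:count]
        for r in granted:
            self.owner[r] = app
        return granted

    def release(self, app, count):
        """Contract: take the last `count` regions back from `app`."""
        held = [r for r in range(self.num_regions) if self.owner[r] == app]
        released = held[-count:] if count > 0 else []
        for r in released:
            self.owner[r] = None
        return released

    def allowed(self, src, dst):
        """Isolation rule: a link exists only between regions of the same application."""
        return self.owner[src] is not None and self.owner[src] == self.owner[dst]

    def multicast(self, src, payload):
        """Deliver `payload` to every other region owned by the sender's application."""
        return {dst: payload for dst in range(self.num_regions)
                if dst != src and self.allowed(src, dst)}
```

In this model, growing or shrinking an application only changes the ownership table, and the set of enabled links follows from it, which mirrors the idea of reconfiguring the interconnect rather than resizing a region.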
32	Metodika návrhu systémů odolných proti poruchám do omezeného implementačního prostoru na bázi FPGA / Methodology for Fault Tolerant Systems Design into Limited Implementation Area in FPGA Mičulka, Lukáš January 2017 (has links) This thesis describes a proposed methodology for designing fault-tolerant systems in FPGAs that is able to protect a system against the effects of transient and permanent faults. Transient faults are repaired by partial dynamic reconfiguration. The repair of a limited number of permanent faults is based on fault-tolerant architectures that use fewer resources than the previously used architecture, so the faulty part of the FPGA is no longer used. This technique relies on precompiled configurations stored in external memory. A relocation technique is used to reduce the memory required to store the configuration bitstreams.
33	FPGA Floor-Planning Impact on Implementation Results Lamprecht, Jaren Tyler 14 November 2012 (has links) (PDF) The field programmable gate array (FPGA) is an attractive computational platform for many applications because of its customizable nature and modest development cost, in terms of both time and money. As FPGAs scale to increased logical capacities, designers have increased flexibility. However, the FPGA placement problem becomes more difficult at increased sizes. Increasingly, designers are encouraged to structure designs hierarchically and to floor-plan. Floor-planning is a manual process which maps specified design submodules to selected physical regions of the FPGA device fabric. This thesis explores several of the effects that floor-planning has on submodules and the designs they comprise. A method is developed to explore the floor-planning impact on submodules independent of a full design. Six different submodules are independently subjected to varying timing constraints and to area constraints of varying aspect ratios and area allocations. The resulting submodule minimum clock periods, routing overflows, and relocatabilities are assembled from millions of submodule implementations. The aggregate results suggest that EDA placement and routing tools can meet design constraints even with extreme combinations of submodule aspect ratio and area allocation; however, the probability of implementations meeting constraints may be low at those extremes. Separate sets of submodule floor-planning guidelines are developed for meeting minimum clock period constraints, minimizing routing overflow, and maximizing relocatability. The submodule floor-planning guidelines for meeting minimum clock period constraints are verified in full design implementations. FPGA floor-plan area constraint clock constraint routing spillover partial reconfiguration submodule relocation Xilinx Electrical and Computer Engineering
34	Reconfigurable Technologies for Next Generation Internet and Cluster Computing Unnikrishnan, Deepak C. 01 September 2013 (has links) Modern web applications are marked by distinct networking and computing characteristics. As applications evolve, they continue to operate over a large monolithic framework of networking and computing equipment built from general-purpose microprocessors and Application Specific Integrated Circuits (ASICs) that offers few architectural choices. This dissertation presents techniques to diversify the next-generation Internet infrastructure by integrating Field-programmable Gate Arrays (FPGAs), a class of reconfigurable integrated circuits, with general-purpose microprocessor-based techniques. Specifically, our solutions are demonstrated in the context of two applications - network virtualization and distributed cluster computing. Network virtualization enables the physical network infrastructure to be shared among several logical networks to run diverse protocols and differentiated services. The design of a good network virtualization platform is challenging because the physical networking substrate must scale to support several isolated virtual networks with high packet forwarding rates and offer sufficient flexibility to customize networking features. The first major contribution of this dissertation is a novel high performance heterogeneous network virtualization system that integrates FPGAs and general-purpose CPUs. Salient features of this architecture include the ability to scale the number of virtual networks in an FPGA using existing software-based network virtualization techniques, the ability to map virtual networks to a combination of hardware and software resources on demand, and the ability to use off-chip memory resources to scale virtual router features. Partial-reconfiguration has been exploited to dynamically customize virtual networking parameters. 
An open software framework to describe virtual networking features using a hardware-agnostic language has been developed. Evaluation of our system using a NetFPGA card demonstrates a throughput improvement of one to two orders of magnitude over state-of-the-art network virtualization techniques. The demand for greater computing capacity grows as web applications scale. In state-of-the-art systems, an application is scaled by parallelizing the computation on a pool of commodity hardware machines using distributed computing frameworks. Although this technique is useful, it is inefficient because the sequential nature of execution in general-purpose processors does not suit all workloads equally well. Iterative algorithms form a pervasive class of web and data mining algorithms that execute poorly on general-purpose processors because of the strict synchronization barriers in distributed cluster frameworks. This dissertation presents Maestro, a heterogeneous distributed computing framework that demonstrates how FPGAs can break down such synchronization barriers using asynchronous accumulative updates. These updates allow intermediate results for numerous data points to be accumulated without the need for iteration-based barriers. The benefits of a heterogeneous cluster are illustrated by executing a general class of iterative algorithms on a cluster of commodity CPUs and FPGAs. Computation is dynamically prioritized to accelerate algorithm convergence. We implement three iterative algorithms from this general class on a cluster of four FPGAs. A speedup of 7× is achieved over an implementation of asynchronous accumulative updates on a general-purpose CPU. The system offers a 154× speedup versus a standard Hadoop-based CPU-workstation cluster. Improved performance is achieved by clusters of FPGAs. 
Distributed computing FPGA Heterogeneous computing Network virtualization Partial reconfiguration Reconfigurable computing Computer Engineering Computer Sciences Electrical and Computer Engineering
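The "asynchronous accumulative updates" with dynamic prioritization mentioned in entry 34 can be sketched in software for one familiar iterative algorithm, PageRank. The sketch below is an assumption-laden illustration, not Maestro's implementation: the function name, the choice of PageRank, the unnormalized rank formulation, and the "largest pending delta first" priority rule are all chosen here for illustration.

```python
def accumulative_pagerank(graph, damping=0.85, tol=1e-10):
    """Delta-accumulation PageRank without per-iteration barriers.

    graph maps each vertex to the list of vertices it links to.
    Each vertex keeps an accumulated rank and a pending delta; a vertex is
    processed whenever it is picked, not once per global iteration.
    """
    rank = {v: 0.0 for v in graph}
    delta = {v: 1.0 - damping for v in graph}
    while True:
        # Prioritization: process the vertex with the largest pending delta.
        v = max(delta, key=delta.get)
        d = delta[v]
        if d < tol:
            return rank
        delta[v] = 0.0
        rank[v] += d                      # accumulate the pending contribution
        for w in graph[v]:                # propagate a damped share to successors
            delta[w] += damping * d / len(graph[v])
```

Because every update only adds a pending delta to an accumulated value, updates can be applied in any order, which is the property that removes the need for a barrier between iterations; the result satisfies the usual PageRank fixed-point equation up to the tolerance.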
35	Reconfigurable Computing For Video Coding Huang, Jian 01 January 2010 (has links) Video coding is widely used in our daily life. Due to its high computational complexity, hardware implementation is usually preferred. In this research, we investigate both an ASIC hardware design approach and a reconfigurable hardware design approach for video coding applications. First, we present a unified architecture that can perform the Discrete Cosine Transform (DCT), the Inverse Discrete Cosine Transform (IDCT), and DCT-domain motion estimation and compensation (DCT-ME/MC). Our proposed architecture is a Wavefront Array-based Processor with a highly modular structure consisting of 8×8 Processing Elements (PEs). By utilizing statistical properties and arithmetic operations, it can be used as a high-performance hardware accelerator for video transcoding applications. We show how different core algorithms can be mapped onto the same hardware fabric and executed through the pre-defined PEs. In addition to the simplified design process of the proposed architecture and the savings in hardware resources, we also demonstrate that a high throughput rate can be achieved for IDCT and DCT-MC by fully utilizing the sparseness of the DCT coefficient matrix. Compared to a fixed hardware architecture using the ASIC design approach, the reconfigurable hardware design approach has higher flexibility, lower cost, and faster time-to-market. We propose a self-reconfigurable platform which can reconfigure the architecture of DCT computations during run-time using dynamic partial reconfiguration. The scalable architecture for DCT computations can compute different numbers of DCT coefficients in zig-zag scan order to adapt to different requirements, such as power consumption, hardware resources, and performance. We propose a configuration manager, implemented in the embedded processor, that adaptively controls the reconfiguration of the scalable DCT architecture during run-time. 
In addition, we use the LZSS algorithm to compress the partial bitstreams and on-chip BlockRAM as a cache to reduce the latency overhead of loading the partial bitstreams from off-chip memory for run-time reconfiguration. A hardware module is designed for parallel reconfiguration of the partial bitstreams. The experimental results show that our approach can reduce external memory accesses by 69% and can achieve a reconfiguration rate of 400 MBytes/s. A prediction algorithm for zero-quantized DCT (ZQDCT) coefficients is used to control the run-time reconfiguration of the proposed scalable architecture, and 12 different modes of DCT computation, including zonal coding, multi-block processing, and parallel-sequential stage modes, are supported to reduce power consumption, required hardware resources, and computation time with a small quality degradation. Detailed trade-offs of power, throughput, and quality are investigated and used as a criterion for self-reconfiguration to meet the requirements set by the users. Reconfigurable Computing FPGA ASIC Dynamic Partial Reconfiguration Self-reconfiguration Video Coding DCT Motion Estimation Electrical and Computer Engineering Electrical and Electronics Engineering
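Entry 35 describes a scalable DCT architecture that computes a varying number of coefficients in zig-zag scan order. As a rough software illustration of what "the first k coefficients in zig-zag order" means, the sketch below generates the conventional zig-zag ordering of an 8×8 block and keeps only the first k positions. The function names are hypothetical, and the code says nothing about how the thesis maps this onto hardware.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zig-zag scan order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col identifies one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        diag = [(r, s - r) for r in rows]  # listed from top-right to bottom-left
        # Even diagonals are scanned bottom-left to top-right, odd ones the reverse.
        order.extend(reversed(diag) if s % 2 == 0 else diag)
    return order


def keep_first(block, k):
    """Zero every coefficient except the first k in zig-zag order."""
    n = len(block)
    kept = [[0] * n for _ in range(n)]
    for r, c in zigzag_order(n)[:k]:
        kept[r][c] = block[r][c]
    return kept
```

Choosing a smaller k corresponds to a cheaper mode that drops more high-frequency coefficients, which is the kind of quality versus cost trade-off the abstract refers to.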
36	BitMaT - Bitstream Manipulation Tool for Xilinx FPGAs Morford, Casey Justin 03 January 2006 (has links) With the introduction of partially reconfigurable FPGAs, we are now able to perform dynamic changes to hardware running on an FPGA without halting the operation of the design. Module based partial reconfiguration allows the hardware designer to create multiple hardware modules that perform different tasks and swap them in and out of designated dynamic regions on an FPGA. However, the current mainstream partial reconfiguration flow provides a limited and inefficient approach that requires a strict set of guidelines to be met. This thesis introduces BitMaT, a tool that provides the low-level bitstream manipulation as a member tool of an alternative, automated, modular partial reconfiguration flow. / Master of Science Field programmable gate arrays Partial Reconfiguration Virtex-II Pro Bitstream Manipulation Bitstream Xilinx Virtex-II Dynamic Module Server
37	Management of Dynamic Reconfiguration in a Wireless Digital Communication Context / Gestion de la reconfiguration dynamique dans un contexte de communication numérique sans fil. Rihani, Mohamad-Al-Fadl 18 December 2018 (has links) Today, wireless devices generally feature multiple radio access technologies (LTE, WiFi, WiMax, ...) to handle a rich variety of standards or technologies. These devices should be intelligent and autonomous enough either to reach a given level of performance or to automatically select the best available wireless technology according to the availability of standards. On the hardware side, System on Chip (SoC) devices integrate processors and FPGA logic fabrics on the same chip with a fast interconnection. This allows designing software/hardware systems and implementing new techniques and methodologies that considerably improve the performance of communication systems. In these devices, Dynamic Partial Reconfiguration (DPR) is a well-known technique for reconfiguring only a specific area within the FPGA while other parts continue to operate independently. To evaluate when it is advantageous to perform DPR, adaptive techniques have been proposed. They consist of reconfiguring parts of the system automatically according to specific parameters. In this thesis, an intelligent wireless communication system is presented that implements an adaptive OFDM-based transmitter and performs vertical handover in heterogeneous networks. A unified physical layer for WiFi-WiMax networks is also proposed. An intelligent Vertical Handover Algorithm (VHA) based on Neural Networks (NN) is proposed to select the best available wireless standard in a heterogeneous network. The system was implemented and tested on a ZedBoard, which features a Xilinx Zynq-7000 SoC. The performance of the system is described, and simulation results are presented in order to validate the proposed architecture. Real-time power measurements have been applied to compute the power overhead of the PR operation. In addition, demonstrations have been performed to test and validate the implemented system. Cognitive Radio FPGA Machine Learning Neural Networks Partial Reconfiguration SoC 384.5
38	Chipcflow - validação e implementação do modelo de partição e protocolo de comunicação no grafo a fluxo de dados dinâmico / Chipflow - validation and implementation of the partition model and communication protocol in the dynamic data flow graph Souza Júnior, Francisco de 24 January 2011 (has links) The ChipCflow tool has been developed over the last four years, initially as a dynamic dataflow architecture project in reconfigurable hardware, but now as a compilation tool. It aims to execute algorithms using the dataflow architecture model combined with partially reconfigurable devices. Its main feature is to accelerate the execution of programs written in high-level languages, particularly their most processing-intensive parts. This is done by implementing those parts of the code directly in reconfigurable hardware, using Field-programmable Gate Array (FPGA) technology, making the most of the natural parallelism of the dataflow model and of the characteristics of partially reconfigurable hardware. In this work, the main goal is a proof of concept of the partitioning process and of the communication protocol between the partitions defined from a Data Flow Graph (DFG), for direct execution in reconfigurable hardware using Dynamic Partial Reconfiguration (DPR). A partitioning mechanism and a communication protocol between the partitions had to be devised, since DPR introduces limiting technological constraints not found in more traditional reconfigurable hardware. The mechanism developed proved partially adequate as a proof of concept, showing that DFGs can be executed on the partially reconfigurable platform. However, the reconfiguration time adds a large overhead to the execution time, which made the initial proposal of using DPR to reduce the tag-matching time of dynamic DFGs unfeasible. Dataflow FPGA Hardware description language Partial reconfiguration Reconfigurable computing Reconfigurable hardware Xilinx
39	Méthodologie et architecture adaptative pour le placement efficace de tâches matérielles de tailles variables sur des partitions reconfigurables / Methodology and adaptative architecture for the effective placement of variable size material tasks on reconfigurable partition Marques, Nicolas 26 November 2012 (has links) FPGA-based reconfigurable architectures can deliver suitable solutions for many applications, since they allow the behaviour of one part of the FPGA to be changed while the rest of the circuit continues to run normally. Despite their progress, these architectures still lack adaptability when faced with applications made up of hardware tasks of different sizes. This heterogeneity can cause poor placements, leading to sub-optimal use of resources and therefore to reduced system performance. The contribution of this thesis concerns the problem of placing hardware tasks of different sizes and of generating reconfigurable regions efficiently. A methodology and an intermediate layer between the FPGA and the application are proposed to allow the efficient placement of hardware tasks of different sizes on reconfigurable partitions of a predefined size. To validate the method, we propose an architecture based on partial reconfiguration that adapts the transcoding from one video compression format to another in a flexible and efficient way. A study of the partitioning of the reconfigurable region for the hardware tasks of the entropy encoder (CAVLC / VLC) is presented in order to show the benefit of partitioning. An evaluation of the gain obtained and of the overhead of the method is then presented. Self-adaptive architecture Partitioning methodology Partial reconfiguration FPGA Placement of hardware tasks 629.895 63
40	Design and implementation of a reliable reconfigurable real-time operating system (R3TOS) Iturbe, Xabier January 2013 (has links) Twenty-first century Field-Programmable Gate Arrays (FPGAs) are no longer used for implementing simple “glue logic” functions. They have become complex arrays of reconfigurable logic resources and memories as well as highly optimised functional blocks, capable of implementing large systems on a single chip. Moreover, Dynamic Partial Reconfiguration (DPR) capability makes it possible to adjust some logic resources on the chip at runtime, whilst the rest are still performing active computations. During the last few years, DPR has become a hot research topic with the objective of building more reliable, efficient and powerful electronic systems. For instance, DPR can be used to mitigate spontaneously occurring bit upsets provoked by radiation, or to work around FPGA resources which progressively get damaged as the silicon ages. Moreover, DPR is the enabling technology for a new computing paradigm which combines computation in time and space. In Reconfigurable Computing (RC), a battery of computation-specific circuits (“hardware tasks”) are swapped in and out of the FPGA on demand to hold a continuous stream of input operands, computation and output results. Multitasking, adaptation and specialisation are key properties in RC, as multiple swappable tasks can run concurrently at different positions on chip, each with custom data-paths for efficient execution of specific computations. As a result, considerable computational throughput can be achieved even at low clock frequencies. However, DPR penetration in the commercial market is still marginal, mainly due to the lack of suitable high-level design tools to exploit this technology. Indeed, special skills are currently required to successfully develop a dynamically reconfigurable application. 
In light of the above, this thesis aims at bridging the gap between high-level applications and low-level DPR technology. Its main objective is to develop Operating System (OS)-like support that lets high-level, software-centric application developers exploit the benefits of DPR technology without having to deal with the complex low-level hardware details. The solution developed in this thesis is named R3TOS, which stands for Reliable Reconfigurable Real-Time Operating System. R3TOS defines a flexible infrastructure for reliably executing reconfigurable hardware-based applications under real-time constraints. In R3TOS, the hardware tasks are scheduled in order to meet their computation deadlines and allocated to non-damaged resources, keeping the system fault-free at all times. In addition, R3TOS envisages a computing framework whereby both hardware and software tasks coexist in a seamless manner, allowing the user to access the advanced computation capabilities of modern reconfigurable hardware from a software “look and feel” environment. This thesis covers all of the design and implementation aspects of R3TOS. The thesis proposes a novel EDF-based scheduling algorithm, two novel task allocation heuristics (EAC and EVC) and a novel task allocation strategy (called Snake), addressing many RC-related particularities as well as technological constraints imposed by current FPGA technology. Empirical results show that these approaches improve on the state of the art. Besides, the thesis describes a novel way to harness the internal reconfiguration mechanism of modern FPGAs to perform inter-task communications and synchronisation regardless of the physical location of tasks on-chip. This paves the way for implementing more sophisticated RC solutions which were only possible in theory in the past. 
The thesis illustrates R3TOS through a proof-of-concept prototype with two demonstrator applications: (1) dependability-oriented control of the power chain of a railway traction vehicle, and (2) data-streaming-oriented Software Defined Radio (SDR). 621.39
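Entry 40 mentions an EDF-based scheduling algorithm. For readers unfamiliar with the baseline, the sketch below shows plain Earliest Deadline First scheduling on a single resource in unit time slots. It is not the R3TOS algorithm, which additionally has to account for task placement and reconfiguration constraints on the FPGA; the function name and the task tuple layout are assumptions made for this example.

```python
import heapq


def edf_schedule(tasks):
    """Preemptive EDF on one resource, simulated in unit time slots.

    tasks: list of (name, release_time, exec_time, deadline) with unique names
    and integer exec_time >= 1. Returns (timeline, missed): the task run in
    each slot (None when idle) and the names of tasks that finished late.
    """
    pending = sorted(tasks, key=lambda t: t[1])  # ordered by release time
    ready = []  # min-heap of [deadline, name, remaining_time]
    timeline, missed = [], []
    t, i = 0, 0
    while i < len(pending) or ready:
        # Move every task released by time t into the ready heap.
        while i < len(pending) and pending[i][1] <= t:
            name, _, exec_time, deadline = pending[i]
            heapq.heappush(ready, [deadline, name, exec_time])
            i += 1
        if not ready:
            timeline.append(None)  # nothing released yet: idle slot
            t += 1
            continue
        job = ready[0]  # the ready task with the earliest deadline
        timeline.append(job[1])
        job[2] -= 1
        t += 1
        if job[2] == 0:
            heapq.heappop(ready)
            if t > job[0]:
                missed.append(job[1])
    return timeline, missed
```

For example, with tasks `[("A", 0, 3, 10), ("B", 1, 2, 4), ("C", 2, 1, 3)]`, task A is preempted as soon as B and then C arrive with earlier deadlines, and all three finish on time.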
