  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Component design for application-directed FPGA system generation frameworks

Bandara, Sahan Lakshitha 20 September 2024 (has links)
Field Programmable Gate Arrays (FPGAs) can fulfill many critical and contrasting roles in modern computing due to their combination of powerful computing and communication, inherent hardware flexibility, and energy efficiency. FPGAs are traditionally used in application areas such as emulation, prototyping, telecommunication, network packet processing, Digital Signal Processing (DSP), and a myriad of embedded and edge applications. Over the last decade, this use has expanded to include various functions in data centers including supporting low-latency communication and as a computing resource offered by cloud service providers. There are, however, challenges in development and design portability in FPGAs as the typical design flows involve rebuilding the entire hardware stack for each deployment. To overcome these challenges and make FPGAs more accessible to developers, FPGA vendors and academic researchers have made attempts to add operating system-like abstractions to the FPGA use model. One approach is providing infrastructure logic, typically referred to as an FPGA shell, that implements and manages external interfaces and provides services necessary for application logic to function properly. While they simplify the FPGA use model, fixed implementations of FPGA shells do not fully address the design portability limitations. They often use FPGA resources unnecessarily as most applications do not require all the capabilities of the FPGA shell, and there is no flexibility in terms of the features implemented by the FPGA shell. Automatic generation of FPGA system designs based on application requirements can overcome the limitations of fixed FPGA shells. It allows the infrastructure logic to be customized to match the application requirements and, therefore, to provide better resource utilization. Automatic system generation also makes it easier to port designs across devices. We refer to a system design that manages FPGA resources and provides services to a user application as a “hardware operating system” (hOS); and a framework that maps user requirements and available system components to such system designs as an hOS generator. Critical to automatic system generation for FPGAs are system components designed to be integrated into an hOS generator. In this dissertation, we develop a design strategy that maximizes component reuse and design portability while maintaining the implementation effort at an acceptable level. We also present a component design example that follows the proposed design strategy to implement a host-FPGA PCIe communication subsystem. We demonstrate how the PCIe subsystem is integrated into a system generator framework, used to enable different applications, and ported to different devices. Additionally, we establish a set of characteristics for a good hOS generator design. We also discuss how and to what extent the system generation framework used in this work, named DISL, displays these ideal characteristics. Finally, we attempt to address the open question of how to evaluate a system generator. We discuss qualitative metrics and how they relate to the previously identified ideal characteristics of an hOS generator; and evaluate DISL based on these metrics.
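The abstract above contains no code, but as a rough, purely hypothetical illustration of the requirement-driven selection an hOS generator performs, the C sketch below maps a bitmask of application requirements onto a small catalog of shell components. Component names and the selection scheme are invented for illustration and are not DISL's actual interface.

```c
/* A purely hypothetical sketch of the idea behind an hOS generator: given a
 * set of application requirements, only the matching infrastructure components
 * (PCIe, DRAM controller, Ethernet, ...) are pulled into the generated system.
 * None of these names come from DISL itself. */
#include <stdio.h>

enum requirement {
    REQ_HOST_PCIE = 1 << 0,      /* host<->FPGA communication over PCIe */
    REQ_DDR       = 1 << 1,      /* off-chip memory access */
    REQ_ETHERNET  = 1 << 2,      /* network I/O */
    REQ_UART      = 1 << 3       /* debug console */
};

struct component {
    const char *name;            /* subsystem to instantiate in the generated hOS */
    unsigned    provides;        /* which requirement bits this component satisfies */
};

static const struct component catalog[] = {
    { "pcie_dma_subsystem", REQ_HOST_PCIE },
    { "ddr4_controller",    REQ_DDR       },
    { "eth_100g_mac",       REQ_ETHERNET  },
    { "uart_lite",          REQ_UART      },
};

int main(void)
{
    /* Example application: needs host communication and DRAM, nothing else. */
    unsigned app_requirements = REQ_HOST_PCIE | REQ_DDR;

    printf("generated system contains:\n");
    for (unsigned i = 0; i < sizeof(catalog) / sizeof(catalog[0]); i++)
        if (catalog[i].provides & app_requirements)
            printf("  %s\n", catalog[i].name);
    return 0;
}
```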
2

Vysokorychlostní paketové DMA přenosy do FPGA / High-Speed Packet Data DMA Transfers to FPGA

Kubálek, Jan January 2020 (has links)
This thesis deals with the design, implementation, testing, and measurement of a firmware module for FPGA chips that enables DMA transfers of network data from computer RAM to an FPGA placed on a network interface card. These transfers are carried out over a PCIe bus at speeds of up to 100 Gbps, with possible support for 200 Gbps and 400 Gbps. The goal of this technology is to enable network data processing for the maintenance of backbone network nodes and data centers. The module is designed so that it can be used on different types of FPGA chips, mainly those produced by Xilinx and Intel.
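As a hedged illustration of how host software typically drives a packet DMA engine of this kind, the C sketch below models a simple descriptor ring shared between software and the FPGA; the structure names, field layout and doorbell convention are assumptions, not details from the thesis.

```c
/* Hypothetical sketch of a host-side descriptor ring, the structure that
 * packet DMA engines of this kind typically consume. Names and field layout
 * are illustrative only. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define RING_SIZE 8              /* must be a power of two */
#define DESC_OWNED_BY_HW 0x1     /* set by SW, cleared by HW when the transfer is done */

struct dma_desc {
    uint64_t buf_addr;           /* physical/IOVA address of the packet buffer */
    uint32_t length;             /* packet length in bytes */
    uint32_t flags;              /* ownership and status bits */
};

static struct dma_desc ring[RING_SIZE];
static unsigned head;            /* next descriptor software will fill */

/* Post one packet buffer to the engine; returns 0 on success, -1 if the ring is full. */
static int post_packet(uint64_t buf_addr, uint32_t length)
{
    struct dma_desc *d = &ring[head & (RING_SIZE - 1)];
    if (d->flags & DESC_OWNED_BY_HW)
        return -1;               /* hardware still owns this slot */
    d->buf_addr = buf_addr;
    d->length = length;
    d->flags = DESC_OWNED_BY_HW; /* hand the descriptor over to the engine */
    head++;
    /* A real driver would now write 'head' to a doorbell register over PCIe. */
    return 0;
}

int main(void)
{
    memset(ring, 0, sizeof(ring));
    for (uint32_t i = 0; i < 4; i++)
        post_packet(0x10000000ull + i * 2048, 1500);
    printf("posted %u descriptors\n", head);
    return 0;
}
```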
3

Um cluster de PCs usando nós baseados em módulos aceleradores de hardware (FPGA) como co-processadores / A PC cluster using nodes based on hardware accelerator modules (FPGA) as co-processors

Wanderley Pimentel Araujo, Rodrigo 31 January 2010 (has links)
The creation of new solutions to increase application performance is growing in importance, as conventional processing is becoming obsolete. Different approaches have been studied and used, but several problems have been encountered. One example is multi-core processors, which, despite dissipating little power, offer low transmission speed and limited bandwidth. ASICs provide high performance and low power dissipation, but carry a high engineering cost. In an attempt to reach higher levels of acceleration, platforms that combine clusters of conventional computers with FPGAs have been studied. This type of platform requires high-performance buses to minimize the communication bottleneck between PC and FPGA, and efficient communication between the nodes of the system. This work reviews the main characteristics of several architectures that use PC clusters. Based on this, an architecture is proposed that uses an FPGA as a co-processor in each node of the system, using the MPI interface for communication between nodes and a Linux device driver that allows burst data transfers over the PCIe bus. As a case study to validate the architecture, dense matrix multiplication is implemented, based on level three of the BLAS library.
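For illustration only, the following C/MPI sketch mirrors the node-level structure described above: row blocks of a dense matrix product are scattered across nodes with MPI, and the per-node FPGA offload over PCIe is stubbed as a plain CPU loop. Function names and sizes are hypothetical and not taken from the dissertation.

```c
/* MPI distributes row blocks of C = A * B across nodes; each node would
 * normally hand its block to the FPGA co-processor over PCIe. Here the
 * offload is stubbed as a CPU loop. */
#include <mpi.h>
#include <stdlib.h>

#define N 512                    /* matrix dimension, assumed divisible by nprocs */

/* Stand-in for the PCIe/FPGA offload: multiply a 'rows' x N block of A by B. */
static void fpga_gemm_stub(const double *a, const double *b, double *c, int rows)
{
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < N; j++) {
            double acc = 0.0;
            for (int k = 0; k < N; k++)
                acc += a[i * N + k] * b[k * N + j];
            c[i * N + j] = acc;
        }
}

int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    int rows = N / nprocs;                       /* row block per node */
    double *a_blk = malloc((size_t)rows * N * sizeof(double));
    double *b     = malloc((size_t)N * N * sizeof(double));
    double *c_blk = malloc((size_t)rows * N * sizeof(double));
    double *a = NULL, *c = NULL;

    if (rank == 0) {                             /* root holds the full matrices */
        a = calloc((size_t)N * N, sizeof(double));
        c = calloc((size_t)N * N, sizeof(double));
        for (int i = 0; i < N * N; i++) { a[i] = 1.0; b[i] = 2.0; }
    }

    /* Distribute A by row blocks, broadcast B, compute locally, then gather C. */
    MPI_Scatter(a, rows * N, MPI_DOUBLE, a_blk, rows * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    MPI_Bcast(b, N * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    fpga_gemm_stub(a_blk, b, c_blk, rows);
    MPI_Gather(c_blk, rows * N, MPI_DOUBLE, c, rows * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}
```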
4

Design and Analysis of a Real-time Data Monitoring Prototype for the LWA Radio Telescope

Vigraham, Sushrutha 11 March 2011 (has links)
Increasing computing power has been helping researchers understand many complex scientific problems. Scientific computing helps to model and visualize complex processes such as molecular modelling, medical imaging, astrophysics and space exploration by processing large sets of data streams collected through sensors or cameras. This produces a massive amount of data, which consumes a large amount of processing and storage resources. Monitoring the data streams and filtering out unwanted information enables efficient use of the available resources. This thesis proposes a data-centric system that can monitor high-speed data streams in real time. The proposed system provides a flexible environment where users can plug in application-specific data monitoring algorithms. The Long Wavelength Array (LWA) telescope is an astronomical instrument that works with high-speed data streams, and the proposed data-centric platform is developed to evaluate FPGAs for implementing data monitoring algorithms in the LWA. The throughput of the data-centric system has been modeled, and it is observed that the developed system can deliver a maximum throughput of 164 MB/s.
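As a minimal sketch of the plug-in idea described above, the C example below defines a hypothetical monitor callback that is applied to each block of a stream and decides whether the block is forwarded; the interface and threshold are invented for illustration and are not the thesis' actual API.

```c
/* User-supplied monitoring callbacks are invoked on each block of a stream,
 * and only blocks that pass the check are kept. All names are hypothetical. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define BLOCK_SAMPLES 1024

/* A monitor returns nonzero if the block should be forwarded for storage. */
typedef int (*monitor_fn)(const int16_t *samples, size_t n);

/* Example application-specific monitor: keep blocks whose peak exceeds a threshold. */
static int peak_monitor(const int16_t *samples, size_t n)
{
    int peak = 0;
    for (size_t i = 0; i < n; i++)
        if (abs(samples[i]) > peak) peak = abs(samples[i]);
    return peak > 1000;
}

/* Run a stream of blocks through the plugged-in monitor. */
static void run_stream(monitor_fn monitor, size_t nblocks)
{
    int16_t block[BLOCK_SAMPLES];
    size_t kept = 0;
    for (size_t b = 0; b < nblocks; b++) {
        for (size_t i = 0; i < BLOCK_SAMPLES; i++)
            block[i] = (int16_t)(rand() % 4096 - 2048);   /* stand-in for sensor data */
        if (monitor(block, BLOCK_SAMPLES))
            kept++;                                       /* forward to storage */
    }
    printf("kept %zu of %zu blocks\n", kept, nblocks);
}

int main(void)
{
    run_stream(peak_monitor, 100);
    return 0;
}
```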
5

Implementation of an industrial process control interface for the LSC11 system using Lattice ECP2M FPGA

Murali Baskar Rao, Parthasarathy January 2012 (has links)
Reconfigurable devices are now mainstream in system-on-chip solutions. They offer reduced cost compared with an equivalent custom design, quick time to market, and the ability to reconfigure the design at will and with ease. One such reconfigurable device is the FPGA. In this industrial thesis, a process control interface is designed and implemented using a Lattice ECP2M FPGA and PCIe communication. The interface is built for a 3-D plotter system called LSC11. The implemented FPGA unit drives the plotter device according to specific timing requirements specified by the customer. The FPGA unit is interfaced to a host CPU over PCIe for controlling the LSC11 system using custom software. All the peripherals required for the LSC11 system, such as the ADC, DAC, quadrature decoder and PWM unit, are also implemented as part of this thesis. The thesis further implements an efficient methodology for sending all the inputs of the LSC11 system to the host CPU without the need to issue cyclic read commands on the host CPU. The RTL design is synthesised for the FPGA and the system is verified for correctness and accuracy. The LSC11 system design consumed 79% of the total FPGA resources, and the maximum clock frequency achieved was 130 MHz. This thesis was carried out at Abaxor Engineering GmbH, Germany, and demonstrates how FPGAs aid the quick design and implementation of system-on-chip solutions with PCIe communication.
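The sketch below is a generic software model of one of the peripherals listed above, a quadrature decoder: the previous and current A/B states index a lookup table that increments or decrements a position counter. It illustrates the principle only and is not the thesis' RTL.

```c
/* Generic quadrature-decoder model: gray-code transitions of the A/B lines
 * drive a position counter up or down; invalid transitions are ignored. */
#include <stdint.h>
#include <stdio.h>

static int32_t position = 0;    /* accumulated encoder count */
static uint8_t prev_state = 0;  /* previous 2-bit (A,B) state */

/* Indexed by (prev_state << 2) | new_state: +1, -1, or 0 for no move / invalid. */
static const int8_t qdec_table[16] = {
     0, +1, -1,  0,
    -1,  0,  0, +1,
    +1,  0,  0, -1,
     0, -1, +1,  0
};

/* Feed one new sample of the A and B encoder lines. */
static void qdec_step(uint8_t a, uint8_t b)
{
    uint8_t state = (uint8_t)((a << 1) | b);
    position += qdec_table[(prev_state << 2) | state];
    prev_state = state;
}

int main(void)
{
    /* One full forward quadrature cycle: 00 -> 01 -> 11 -> 10 -> 00. */
    uint8_t seq[][2] = { {0,0}, {0,1}, {1,1}, {1,0}, {0,0} };
    for (unsigned i = 0; i < 5; i++)
        qdec_step(seq[i][0], seq[i][1]);
    printf("position = %d\n", position);  /* expect +4 counts */
    return 0;
}
```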
6

Accelerated Simulation of Modelica Models Using an FPGA-Based Approach

Lundkvist, Herman, Yngve, Alexander January 2018 (has links)
This thesis presents Monza, a system for accelerating the simulation of models of physical systems described by ordinary differential equations, using a general purpose computer with a PCIe FPGA expansion card. The system allows both automatic generation of an FPGA implementation from a model described in the Modelica programming language, and simulation of said system. Monza accomplishes this by using a customizable hardware architecture for the FPGA, consisting of a variable number of simple processing elements. A custom compiler, also developed in this thesis, tailors and programs the architecture to run a specific model of a physical system. Testing was done on two test models, a water tank system and a Weibel lung, with up to several thousand state variables. The resulting system is several times faster for smaller models and somewhat slower for larger models compared to a CPU. The conclusion is that the developed hardware architecture and software toolchain is a feasible way of accelerating model execution, but more work is needed to ensure faster execution at all times.
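As a small worked example of the kind of update a processing element would iterate, the C sketch below applies forward Euler integration to a single-tank level model; the equation and constants are illustrative assumptions, not the thesis' Modelica models.

```c
/* Explicit (forward) Euler integration of a one-state tank model,
 * dh/dt = (q_in - k * sqrt(h)) / A. Constants are made up for illustration. */
#include <math.h>
#include <stdio.h>

int main(void)
{
    const double A    = 1.5;     /* tank cross-section area [m^2] */
    const double k    = 0.2;     /* outflow coefficient */
    const double q_in = 0.1;     /* inflow [m^3/s] */
    const double dt   = 0.01;    /* integration step [s] */

    double h = 0.5;              /* initial level [m], the single state variable */

    for (int step = 0; step < 1000; step++) {
        double dhdt = (q_in - k * sqrt(h)) / A;   /* evaluate the ODE right-hand side */
        h += dt * dhdt;                           /* forward Euler update */
    }
    printf("level after 10 s: %.4f m\n", h);
    return 0;
}
```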
7

Exploitation from malicious PCI express peripherals

Rothwell, Colin Lewis January 2018 (has links)
The thesis of this dissertation is that, despite widespread belief in the security community, systems are still vulnerable to attacks from malicious peripherals delivered over the PCI Express (PCIe) protocol. Malicious peripherals can be plugged directly into internal PCIe slots, or connected via an external Thunderbolt connection. To prove this thesis, we designed and built a new PCIe attack platform. We discovered that a simple platform was insufficient to carry out complex attacks, so we created the first PCIe attack platform that runs a full, conventional OS. To allow us to conduct attacks against higher-level OS functionality built on PCIe, we made the attack platform emulate in detail the behaviour of an Intel 82574L Network Interface Controller (NIC), using a device model extracted from the QEMU emulator. We discovered a number of vulnerabilities in the PCIe protocol itself, and in the way that the defence mechanisms it provides are used by modern OSs. The principal defence mechanism provided is the Input/Output Memory Management Unit (IOMMU). The IOMMU remaps the address space used by peripherals in 4KiB chunks, and can prevent access to areas of address space that a peripheral should not be able to access. We found that, contrary to belief in the security community, the IOMMUs in modern systems were not designed to protect against attacks from malicious peripherals, but to allow virtual machines direct access to real hardware. We discovered that use of the IOMMU is patchy even in modern operating systems. Windows effectively does not use the IOMMU at all; macOS opens windows that are shared by all devices; Linux and FreeBSD map windows into host memory separately for each device, but only if poorly documented boot flags are used. These OSs make no effort to ensure that only data that should be visible to the devices is in the mapped windows. We created novel attacks that subverted control flow and read private data against systems running macOS, Linux and FreeBSD with the highest level of relevant protection enabled. These represent the first use of the relevant exploits in each case. In the final part of this thesis, we evaluate the suitability of a number of proposed general purpose and specific mitigations against DMA attacks, and make a number of recommendations about future directions in IOMMU software and hardware.
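The arithmetic sketch below illustrates the 4 KiB granularity point made above: because IOMMU mappings cover whole pages, a small DMA buffer exposes every byte of the pages it occupies to the peripheral. The address and size are made up for the example.

```c
/* Page-granular DMA window arithmetic: mapping 64 bytes for DMA exposes the
 * full 4 KiB page(s) those bytes live on to the device. */
#include <inttypes.h>
#include <stdio.h>

#define PAGE_SIZE 4096ull

int main(void)
{
    uint64_t buf = 0x7f3a12345e80ull;   /* hypothetical buffer address */
    uint64_t len = 64;                  /* only 64 bytes actually need DMA */

    uint64_t win_start = buf & ~(PAGE_SIZE - 1);                         /* round down to a page */
    uint64_t win_end   = (buf + len + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1); /* round up to a page */

    printf("requested: %" PRIu64 " bytes\n", len);
    printf("exposed to the device: %" PRIu64 " bytes (0x%" PRIx64 " - 0x%" PRIx64 ")\n",
           win_end - win_start, win_start, win_end);
    /* Everything else that happens to share these pages is also accessible to
     * the peripheral for the lifetime of the mapping. */
    return 0;
}
```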
8

Design, implementering och evaluering av en AI accelerator med Google Coral Dual Edge TPU / Design, implementation and evaluation of an AI accelerator using Google Coral Dual Edge TPU

Burwall, Oscar January 2023 (has links)
The rapidly growing development of AI-based applications and the large amount of data these applications process place increased demands on the performance and optimization of conventional computer systems. To satisfy these growing computing requirements, hardware accelerators are used to improve data processing speed, offloading the existing equipment by executing models and complex calculations. The existing solutions currently in use are costly, and MT-R&D at Umeå University Hospital therefore requested an alternative solution that combines smaller integrated accelerators on a larger PCIe card. In this thesis, an AI accelerator using four Google Coral Dual Edge TPU M.2 modules on an x16 PCIe card is designed and implemented. The work was carried out at MT-R&D, and the goal of the thesis was to investigate whether the intended design can improve the performance of AI-based systems and serve as a cheaper alternative for the institution. The schematic and PCB were designed in KiCad, and information on interfaces and components was obtained from manufacturers' websites and data sheets. The circuit's main components are four M.2 E-key connectors, a 16-port/16-lane packet switch and an x16 PCIe connection. The switch divides the lanes from the PCIe port so that the Edge TPUs can be connected in parallel through the M.2 connectors. The Edge TPUs use pipeline parallelism to distribute models across each TPU so that larger, more complex programs can be executed. When assembling the circuit board, problems arose with the soldering of certain components. To avoid these sources of error, assembly should instead be ordered from a factory where a soldering robot is available. Because time for the course ran out, such an order could not be placed, and evaluation of the design was therefore not possible. However, the design that was produced was significantly cheaper than the existing solutions, and by using pipeline parallelism it is expected to be able to perform complex calculations and thus improve the performance of existing systems.
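As a conceptual sketch of the pipeline parallelism described above, the C example below splits a model's layers into four contiguous segments, one per Edge TPU, and prints a pipelined schedule in which up to four inputs are in flight at once; it does not use the actual Edge TPU runtime, and all sizes are assumptions.

```c
/* Pipeline parallelism across four accelerators: each device owns a contiguous
 * range of layers, and successive inputs flow through the stages like an
 * assembly line. Purely illustrative. */
#include <stdio.h>

#define NUM_TPUS   4
#define NUM_LAYERS 22
#define NUM_INPUTS 6

int main(void)
{
    /* Assign each TPU a contiguous range of layers (simple even split). */
    int per_stage = (NUM_LAYERS + NUM_TPUS - 1) / NUM_TPUS;
    for (int t = 0; t < NUM_TPUS; t++) {
        int first = t * per_stage;
        int last  = (t + 1) * per_stage - 1;
        if (last >= NUM_LAYERS) last = NUM_LAYERS - 1;
        printf("TPU %d runs layers %d-%d\n", t, first, last);
    }

    /* Pipelined schedule: at time step s, stage t works on input (s - t),
     * so up to four inputs are in flight at once. */
    for (int s = 0; s < NUM_INPUTS + NUM_TPUS - 1; s++) {
        printf("step %d:", s);
        for (int t = 0; t < NUM_TPUS; t++) {
            int input = s - t;
            if (input >= 0 && input < NUM_INPUTS)
                printf("  TPU%d<-input%d", t, input);
        }
        printf("\n");
    }
    return 0;
}
```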
9

Design and Implementation of the Heterogeneous Computing Device Management Architecture

Schultek, Brian Robert January 2014 (has links)
No description available.
