Global ETD Search

281	Improving Field-Programmable Gate Array Scaling Through Wire Emulation Fong, Ryan Joseph Lim 23 September 2004 (has links) Field-programmable gate arrays (FPGAs) are excellent devices for high-performance computing, system-on-chip realization, and rapid system prototyping. While FPGAs offer flexibility and performance, they continue to lag behind application specific integrated circuit (ASIC) performance and power consumption. As manufacturing technology improves and IC feature size decreases, FPGAs may further lag behind ASICs due to interconnection scalability issues. To improve FPGA scalability, this thesis proposes an architectural enhancement to improve global communications in large FPGAs, where chip-length programmable interconnects are slow. It is expected that this architectural enhancement, based on wire emulation techniques, can reduce chip-length communication latency and routing congestion. A prototype wire emulation system that uses FPGA self-reconfiguration as a non-traditional means of intra-FPGA communication is implemented and verified on a Xilinx Virtex-II XC2V1000 FPGA. Wire emulation benefits and impact to FPGA architecture are examined with quantitative and qualitative analysis. / Master of Science Xilinx wire Field programmable gate arrays scaling emulation ICAP self-reconfiguration Virtex-II
282	On the Programmability and Performance of OpenCL Designs for FPGA Verma, Anshuman 09 February 2018 (has links) Field programmable gate arrays (FPGAs) have been emerging as a promising bedrock to provide opportunities for several types of accelerators that spans across various domains such as finance, web-search, and data center networking, among others. Research interests facilitating the development of accelerators on FPGAs are increasing significantly, in particular, because of their effectiveness with a variety of applications, flexibility, and high performance per watt. However, several key challenges remain that hinder their large-scale deployment. Overcoming these challenges would enable them to match the pervasiveness of graphics processor units (GPUs), their principal competitors in this arena. One of the primary reasons responsible for the slow adaptation by programmers has been the programming model, which uses a low-level hardware description language (HDL). Using HDLs require a detailed understanding of logic design and significant effort to implement and verify the behavioral models, with the latter growing with its complexity. Recent advancements in high-level language synthesis (HLS) tools have addressed this challenge to a considerable extent by allowing the programmers to write their applications in a high-level language named OpenCL. These applications are then compiled and synthesized to create a bitstream that configures the FPGA. This thesis characterizes the efficacy of HLS compiler optimizations that can be employed to improve the performance of these applications. The synthesized hardware from OpenCL kernels is fundamentally different from traditional hardware such as CPUs and GPUs, which exploit instruction level parallelism (ILP) thread level parallelism (TLP), or data level parallelism (DLP) for performance gains. FPGAs typically use deep pipelining (i.e., ILP) for performance. A stall in this pipeline may severely undermine the performance of applications. Thus, it is imperative to identify and remove any such bottlenecks. To this end, this thesis presents and discusses a software-centric framework to debug and profile the synthesized designs generated using HLS tools. This thesis proposes basic code patterns, including a timestamp and a scalable framework, which can be plugged easily into OpenCL kernels, to collect and process run-time information dynamically. This scalable framework has a small overhead for area utilization and frequency but provides fine-grained information about the bottlenecks and latencies in design. Additionally, although HLS tools have improved programmability, this may come at the cost of performance or area utilization. This thesis addresses this design trade-off via a comparative study of a hand-coded design in HDL and an architecturally similar, tool-generated design using an OpenCL compiler in the application area of 3D-stencil (i.e., structured grid) computation. Experiments in this thesis show that the performance of an OpenCL approach can achieve 95% of the peak attainable performance of a microkernel for multiple problem sizes. In comparison to the OpenCL approach, an HDL approach results in approximately 50% less memory usage and only 2% better performance on average. / MS Field programmable gate arrays OpenCL HDL HLS AOCL Verilog Accelerator GEM Stencil
283	Enabling the use of Heterogeneous Computing for Bioinformatics Bijanapalli Chakri, Ramakrishna 02 October 2013 (has links) The huge amount of information in the encoded sequence of DNA and increasing interest in uncovering new discoveries has spurred interest in accelerating the DNA sequencing and alignment processes. The use of heterogeneous systems, that use different types of computational units, has seen a new light in high performance computing in recent years; However expertise in multiple domains and skills required to program these systems is causing an hindrance to bioinformaticians in rapidly deploying their applications into these heterogeneous systems. This work attempts to make an heterogeneous system, Convey HC-1, with an x86-based host processor and FPGA-based co-processor, accessible to bioinformaticians. First, a highly efficient dynamic programming based Smith-Waterman kernel is implemented in hardware, which is able to achieve a peak throughput of 307.2 Giga Cell Updates per Second (GCUPS) on Convey HC-1. A dynamic programming accelerator interface is provided to any application that uses Smith-Waterman. This implementation is also extended to General Purpose Graphics Processing Units (GP-GPUs), which achieved a peak throughput of 9.89 GCUPS on NVIDIA GTX580 GPU. Second, a well known graphical programming tool, LabVIEW is enabled as a programming tool for the Convey HC-1. A connection is established between the graphical interface and the Convey HC-1 to control and monitor the application running on the FPGA-based co-processor. / Master of Science Field programmable gate arrays Hardware Acceleration High Performance Computing DNA Alignment LabVIEW Heterogeneous Computing GP-GPUs
284	Characterization of FPGA-based High Performance Computers Pimenta Pereira, Karl Savio 02 September 2011 (has links) As CPU clock frequencies plateau and the doubling of CPU cores per processor exacerbate the memory wall, hybrid core computing, utilizing CPUs augmented with FPGAs and/or GPUs holds the promise of addressing high-performance computing demands, particularly with respect to performance, power and productivity. While traditional approaches to benchmark high-performance computers such as SPEC, took an architecture-based approach, they do not completely express the parallelism that exists in FPGA and GPU accelerators. This thesis follows an application-centric approach, by comparing the sustained performance of two key computational idioms, with respect to performance, power and productivity. Specifically, a complex, single precision, floating-point, 1D, Fast Fourier Transform (FFT) and a Molecular Dynamics modeling application, are implemented on state-of-the-art FPGA and GPU accelerators. As results show, FPGA floating-point FFT performance is highly sensitive to a mix of dedicated FPGA resources; DSP48E slices, block RAMs, and FPGA I/O banks in particular. Estimated results show that for the floating-point FFT benchmark on FPGAs, these resources are the performance limiting factor. Fixed-point FFTs are important in a lot of high performance embedded applications. For an integer-point FFT, FPGAs exploit a flexible data path width to trade-off circuit cost and speed of computation, improving performance and resource utilization. GPUs cannot fully take advantage of this, having a fixed data-width architecture. For the molecular dynamics application, FPGAs benefit from the flexibility in creating a custom, tightly-pipelined datapath, and a highly optimized memory subsystem of the accelerator. This can provide a 250-fold improvement over an optimized CPU implementation and 2-fold improvement over an optimized GPU implementation, along with massive power savings. Finally, to extract the maximum performance out of the FPGA, each implementation requires a balance between the formulation of the algorithm on the platform, the optimum use of available external memory bandwidth, and the availability of computational resources; at the expense of a greater programming effort. / Master of Science FFT molecular dynamics integer-point floating-point GPU HPC Field programmable gate arrays
285	FPGA-Based Accelerator Development for Non-Engineers Uliana, David Christopher 02 June 2014 (has links) In today's world of big-data computing, access to massive, complex data sets has reached an unprecedented level, and the task of intelligently processing such data into useful information has become a growing concern to the high-performance computing community. However, domain experts, who are the brains behind this processing, typically lack the skills required to build FPGA-based hardware accelerators ideal for their applications, as traditional development flows targeting such hardware require digital design expertise. This work proposes a usable, end-to-end accelerator development methodology that attempts to bridge this gap between domain-experts and the vast computational capacity of FPGA-based heterogeneous platforms. To accomplish this, two development flows were assembled, both targeting the Convey Hybrid-Core HC-1 heterogeneous platform and utilizing existing graphical design environments for design entry. Furthermore, incremental implementation techniques were applied to one of the flows to accelerate bitstream compilation, improving design productivity. The efficacy of these flows in extending FPGA-based acceleration to non-engineers in the life sciences was informally tested at two separate instances of an NSF-funded summer workshop, organized and hosted by the Virginia Bioinformatics Institute at Virginia Tech. In both workshops, groups of four or five non-engineer participants made significant modifications to a bare-bones Smith-Waterman accelerator, extending functionality and improving performance. / Master of Science Big-data HPC Field programmable gate arrays Heterogeneous Computing Life Sciences
286	Flexible and Lightweight Cryptographic Engines for Constrained Systems Gulcan, Ege 04 June 2015 (has links) There is a significant effort in building lightweight cryptographic operations, yet the proposed solutions are typically single purpose modules that can only provide a fixed functionality. However, flexibility is an important aspect of cryptographic designs where a module can perform multiple operations with different configurations. In this work, we combine flexibility with lightweight designs and propose two cryptographic engines based on the SIMON block cipher. The first proposed engine is the Flexible SIMON, which can execute all configurations of SIMON thus enables an adaptive security with variable key sizes. Our second proposed implementation is BitCryptor, a bit-serialized Compact Crypto Engine that can perform symmetric key encryption, hash computation and pseudo-random-number-generation. The implementation results on a Spartan-3 s50 FPGA show that the proposed engines occupies 90 and 95 slices respectively, which are more compact than the majority of their single purpose counterparts. Therefore, these engines are suitable cryptographic blocks for resource-constrained systems. / Master of Science Lightweight Cryptography Block Ciphers Flexible Architectures SIMON Field programmable gate arrays
287	Making Radios with GReasy: GNU Radio With FPGAs Made Easy Marlow, Ryan Lane 29 August 2014 (has links) Radio technology is rapidly evolving and as processing capabilities and algorithms become more complex, the need for alternative compilation and user interface abstraction increases. Field Programmable Gate Array (FPGA) technology introduces unique reconfigurable hardware architectures that can aid in software defined radio (SDR) design. FPGAs have greater processing capability than traditional general purpose processors (GPP) found in desktop workstations. This work builds on an ongoing project, GReasy, that augments a Linux based open source SDR development platform, GNU Radio, with FPGA processing capabilities. By delegating processing intensive portions of a radio design to the Xilinx Zynq FPGA architecture, the domain of deployable radios by GNU Radio can be broadened. Xilinx Zynq, integrates the FPGA fabric and CPU onto a single chip, which eliminates the need for a controlling host computer; thus, providing a single, portable, low-power, embedded platform. This thesis presents a Zynq capable version of GNU Radio -- an open-source rapid radio deployment tool -- with an enhanced flow that utilizes the processing capability of FPGAs. This work features TFlow -- an FPGA back-end compilation accelerator for instant FPGA assembly. GReasy generates a description of the hardware components that are used by TFlow for the instant FPGA assembly. Once the FPGA is programmed with a design based on the description generated by GReasy, modules and the target hardware can be parameterized to realize an even larger class of applications and further solidify the concept of rapid assembly of software defined radios. / Master of Science Field programmable gate arrays GNU Radio Software radio Rapid Assembly Productivity
288	Exploits in Concurrency for Boolean Satisfiability Sohanghpurwala, Ali Asgar Ali Akbar 14 December 2018 (has links) Boolean Satisfiability (SAT) is a problem that holds great theoretical significance along with effective formulations that benefit many real-world applications. While the general problem is NP-complete, advanced solver algorithms and heuristics allow for fast solutions to many large industrial problems. In addition to SAT, many applications rely on generalizations of Satisfiability such as MaxSAT, and Satisfiability Modulo Theories (SMT). Much of the advancement in SAT solver performance has been in the realm of improved sequential solvers with advanced conflict resolution, learning mechanisms, and sophisticated heuristics. There have been some successful demonstrations of massively parallel and hardware-accelerated solvers for SAT, but these have failed to find their way into mainstream usage. This document first presents previous work in Hardware Acceleration of Satisfiability followed by an analysis of why these attempts failed to gain widespread acceptance. It then demonstrates an alternative, hardware-centric approach, based on distributed Stochastic Local Search (SLS) that is better suited to efficient hardware implementation. Then a parallel SLS/CDCL hybrid approach is proposed that is suitable for distributed search with minimal communication overhead while maintaining completeness. Finally the efficacy and flexibility of distributed local search is considered with an adaptation to Weighted Partial MaxSAT (WPMS) and a focused case study on converted Probabilistic Inference instances. / Ph. D. / The Boolean Satisfiability (SAT) problem is an important decision problem that asks whether there exists a solution that satisfies all given constraints over a set of variables that can assume values of either 0 or 1. May real-world decision problems can be translated into SAT, and there exist efficient sequential solvers that can quickly resolve many such instances. Less progress has been made in efficiently scaling SAT solvers to modern multi-core systems and massively parallel hardware accelerators such as GPUs and Field Programmable Gate Arrays (FPGAs). This thesis explore different approaches to solving SAT based decision and optimization problems with the goal of increasing concurrency. Satisfiability Field programmable gate arrays SLS MaxSAT Parallel Local Search SAT
289	The Effects of Caching on Reconfigurable Adaptive Computing Systems Hendry, James Hugh 21 January 2004 (has links) Adaptive computing systems have proven useful for implementing a wide range of algorithms. A limitation of current systems is the relatively small amount of reconfigurable hardware resources. Many algorithms require more hardware resources than are available. One solution to this problem is runtime reconfiguration (RTR). Using RTR techniques, a large algorithm is implemented as a collection of configurations for the reconfigurable hardware. These configurations are loaded onto the reconfigurable hardware as necessary to implement the algorithm. A primary limitation of RTR is that the reconfiguration process is slow. Therefore, methods of decreasing reconfiguration time are desirable. Another method of implementing large algorithms on small hardware is to use multiple configurable computing platforms connected via a communication network. RTR techniques can be used in conjunction with this method to further increase hardware availability. In this case reconfiguration time is increased by the overhead of transmitting data across the communication network. Methods of decreasing network overhead are desirable. This thesis discusses the use of caching techniques to decrease reconfiguration time. An architecture for caching configurations is implemented on a configurable computing system platform. The use of caching to decrease network overhead is discussed and exhibited. An example application is implemented and used to evaluate the effects of caching on reconfiguration time and algorithm performance. / Master of Science Configurable Computing Field programmable gate arrays Adaptive Computing Run-Time Reconfiguration
290	A Development Platform to Evaluate UAV Runtime Verification Through Hardware-in-the-loop Simulation Rafeeq, Akhil Ahmed 17 June 2020 (has links) The popularity and demand for safe autonomous vehicles are on the rise. Advances in semiconductor technology have led to the integration of a wide range of sensors with high-performance computers, all onboard the autonomous vehicles. The complexity of the software controlling the vehicles has also seen steady growth in recent years. Verifying the control software using traditional verification techniques is difficult and thus increases their safety concerns. Runtime verification is an efficient technique to ensure the autonomous vehicle's actions are limited to a set of acceptable behaviors that are deemed safe. The acceptable behaviors are formally described in linear temporal logic (LTL) specifications. The sensor data is actively monitored to verify its adherence to the LTL specifications using monitors. Corrective action is taken if a violation of a specification is found. An unmanned aerial vehicle (UAV) development platform is proposed for the validation of monitors on configurable hardware. A high-fidelity simulator is used to emulate the UAV and the virtual environment, thereby eliminating the need for a real UAV. The platform interfaces the emulated UAV with monitors implemented on configurable hardware and autopilot software running on a flight controller. The proposed platform allows the implementation of monitors in an isolated and scalable manner. Scenarios violating the LTL specifications can be generated in the simulator to validate the functioning of the monitors. / Master of Science / Safety is one of the most crucial factors considered when designing an autonomous vehicle. Modern vehicles that use a machine learning-based control algorithm can have unpredictable behavior in real-world scenarios that were not anticipated while training the algorithm. Verifying the underlying software code with all possible scenarios is a difficult task. Runtime verification is an efficient solution where a relatively simple set of monitors validate the decisions made by the sophisticated control software against a set of predefined rules. If the monitors detect an erroneous behavior, they initiate a predetermined corrective action. Unmanned aerial vehicles (UAVs), like drones, are a class of autonomous vehicles that use complex software to control their flight. This thesis proposes a platform that allows the development and validation of monitors for UAVs using configurable hardware. The UAV is emulated on a high-fidelity simulator, thereby eliminating the time-consuming process of flying and validating monitors on a real UAV. The platform supports the implementation of multiple monitors that can execute in parallel. Scenarios to violate rules and cause the monitors to trigger corrective actions can easily be generated on the simulator. Runtime Verification Drone aircraft Hardware-in-the-loop Simulation Hardware and Software Monitors PSoC Field programmable gate arrays

Search results