1431

Limited Resource Feature Detection, Description, and Matching

Fowers, Spencer G. 20 April 2012 (has links) (PDF)
The aim of this research is to develop a feature detection, description, and matching system for low-resource applications. The work was motivated by the need for a vision sensor to assist the flight of a quad-rotor UAV, a real-world challenge of autonomous drift stabilization using vision sensors. The initial solution implemented a basic feature detector and matching system on an FPGA, and the research then pursued ways to improve this vision system. Research began with color feature detection, leading to the Color Difference of Gaussians (CDoG) feature detector. CDoG provides better results than grayscale DoG and, when implemented in a parallel architecture, requires no more processing than grayscale. The CDoG Scale-Invariant Feature Transform (CDSIFT) modification was then developed, adding color feature detection and description to the grayscale SIFT descriptor. To demonstrate the benefits of color information, the CDSIFT algorithm was applied to a real application: library book inventory. While color adds value to the CDSIFT descriptor, CDSIFT remains computationally intractable for a low-resource hardware implementation. Because of these shortcomings, this research focused on developing a new feature descriptor. The BAsis Sparse-coding Inspired Similarity (BASIS) descriptor was developed with low-resource systems in mind; it uses sparse coding to provide a generic description of feature characteristics. The BASIS descriptor provided improved accuracy over SIFT and similar accuracy to SURF on the task of aerial UAV frame-to-frame feature matching. However, basis dictionaries are non-orthogonal and can contain redundant information. In addition to a feature descriptor, an FPGA-based feature correlation (or matching) system needed to be developed. TreeBASIS was developed to answer this need and to address the redundancy issues of BASIS. TreeBASIS uses a vocabulary tree to drastically reduce descriptor computation time and descriptor size, and it achieves higher accuracy than SIFT, SURF, and BASIS on the UAV aerial imagery task. Both BASIS and TreeBASIS were implemented in VHDL and are well suited for low-resource FPGA applications. TreeBASIS provides a complete feature detection, description, and correlation system-on-a-chip for low-resource FPGA vision systems.
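To make the detection step concrete, the sketch below shows a minimal Difference-of-Gaussians keypoint detector in Python/NumPy, plus a per-channel color variant in the spirit of CDoG. It is an illustration only, not the thesis's CDoG formulation or its FPGA implementation; the function names, thresholds, and the union-of-channels rule are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def dog_keypoints(image, sigma=1.6, k=1.6, threshold=0.02):
    """Difference-of-Gaussians keypoints: blur at two scales, subtract,
    and keep strong local maxima of the response."""
    g1 = gaussian_filter(image.astype(np.float32), sigma)
    g2 = gaussian_filter(image.astype(np.float32), sigma * k)
    dog = g1 - g2
    local_max = maximum_filter(dog, size=3) == dog
    strong = np.abs(dog) > threshold * np.abs(dog).max()
    ys, xs = np.nonzero(local_max & strong)
    return list(zip(xs.tolist(), ys.tolist()))

def color_dog_keypoints(rgb, **kw):
    """Hypothetical per-channel variant in the spirit of CDoG: run DoG on
    each color channel independently (cheap to parallelize in hardware)
    and take the union of the detected keypoints."""
    pts = set()
    for c in range(rgb.shape[2]):
        pts.update(dog_keypoints(rgb[:, :, c], **kw))
    return sorted(pts)
```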
1432

Power Side-Channel DAC Implementations for Xilinx FPGAs

Savory, Daniel Chase 24 April 2014 (has links) (PDF)
This thesis presents a novel power side-channel DAC (PS-DAC) constructed from user-controllable short circuits in FPGAs, which manipulates overall system power through dynamic power dissipation. For comparison, similar PS-DACs are also created from shift-register primitives (SRL16E), which manipulate system power through switching logic. PS-DACs of various sizes are built using both the short-circuit-based and shift-register-based methods and are characterized in terms of output linearity, monotonicity, and frequency distortion. Applications explored in this thesis that use PS-DAC technology include a Simple Power Analysis (SPA) side-channel transmitter and a frequency-watermarking application. These applications serve as proof of concept for PS-DAC use in side-channel communication applications.
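As a rough illustration of the linearity and monotonicity characterization mentioned above, the sketch below models a thermometer-coded power DAC in Python: each enabled cell (a short circuit or a toggling SRL16E) contributes roughly one unit of power with a fixed random mismatch, and DNL/INL are computed from the resulting levels. The cell count, mismatch figure, and model itself are assumptions for illustration, not measurements from the thesis.

```python
import random

def make_psdac(n_cells=16, cell_power=1.0, mismatch=0.05, seed=0):
    """Toy thermometer-coded PS-DAC: input code enables that many
    power-dissipating cells, each with a fixed random mismatch."""
    rng = random.Random(seed)
    weights = [cell_power * (1 + rng.uniform(-mismatch, mismatch))
               for _ in range(n_cells)]
    return lambda code: sum(weights[:code])

def dnl_inl(levels):
    """Differential/integral nonlinearity of measured levels, in LSBs."""
    lsb = (levels[-1] - levels[0]) / (len(levels) - 1)
    dnl = [(b - a) / lsb - 1 for a, b in zip(levels, levels[1:])]
    inl = [(m - (levels[0] + i * lsb)) / lsb for i, m in enumerate(levels)]
    return dnl, inl

dac = make_psdac()
measured = [dac(code) for code in range(17)]
dnl, inl = dnl_inl(measured)
print(max(abs(d) for d in dnl), max(abs(x) for x in inl))  # worst-case DNL/INL
```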
1433

Single Event Mitigation for Aurora Protocol Based MGT FPGA Designs in Space Environments

Harding, Alexander Stanley 17 June 2014 (has links) (PDF)
This work extends an existing Aurora protocol implementation for high-speed serial I/O between FPGAs to provide greater fault recovery in the presence of high-energy radiation. To improve on the Aurora protocol, additional resets that affect larger portions of the system were added, and detection logic was designed for error modes that occurred but were not detected by the Aurora protocol itself. Radiation testing was performed on the Aurora protocol with the additional mitigation hardware, gathering large amounts of data on the protocol's various error modes and on how the additional mitigation circuitry affected the system. The results showed that the recovery circuitry greatly enhanced the Aurora protocol's ability to recover from errors: it recovered from all but 0.01% of the errors that the Aurora protocol alone could not. The recovery circuit further increased the availability of the transmission link by proactively applying resets at much shorter intervals than used in previous testing. This quick recovery caused the recovery mechanism to fix some errors that might have recovered on their own given enough time; even so, the system still showed a performance increase, and unrecoverable errors were reduced by 100x. The estimated unrecoverable error rate of the system is 5.9E-07 in geosynchronous orbit, and the bit error rate of the enhanced system is 8.47754E-15, an order of magnitude improvement.
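A minimal sketch of the kind of escalating reset policy described above: try a narrow channel reset first, and escalate to a reset covering a larger portion of the system if the link does not recover after a few attempts. The class names, the attempt threshold, and the stub link are hypothetical; the thesis's actual recovery circuit is implemented in hardware alongside the Aurora core.

```python
class StubLink:
    """Hypothetical link interface, used only to exercise the supervisor."""
    def channel_reset(self):
        print("narrow channel reset")
    def full_core_reset(self):
        print("full Aurora core / MGT reset")

class ResetSupervisor:
    """Escalating reset policy: cheap resets first, wider resets for
    error modes that a channel reset alone cannot clear."""
    def __init__(self, link, max_channel_attempts=3):
        self.link = link
        self.max_channel_attempts = max_channel_attempts
        self.attempts = 0

    def on_error(self):
        if self.attempts < self.max_channel_attempts:
            self.attempts += 1
            self.link.channel_reset()      # fast, narrow recovery attempt
        else:
            self.attempts = 0
            self.link.full_core_reset()    # escalate to a larger reset

    def on_link_up(self):
        self.attempts = 0                  # a healthy link clears the counter

sup = ResetSupervisor(StubLink())
for _ in range(5):                         # simulate a persistent error
    sup.on_error()
```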
1434

Tincr: Integrating Custom CAD Tool Frameworks with the Xilinx Vivado Design Suite

White, Brad S 01 December 2014 (has links) (PDF)
The field programmable gate array (FPGA) is appealing as a computational platform because of its ability to be repurposed for a number of different applications and its relatively low design cost. Traditionally, FPGA vendors provide a set of electronic design automation (EDA) tools to assist customers with the implementation of their designs. These tools are necessarily general purpose, and the resulting tool flow does not provide the user much in the way of customization. Frameworks such as RapidSmith and Torc allow for the creation of custom CAD tools that are able to target actual Xilinx FPGA devices. However, they are built on the Xilinx Design Language (XDL), which was discontinued with the introduction of Xilinx's new tool suite Vivado. Instead, Vivado provides direct access to its data structures through a Tcl interface, as well as EDIF and Xilinx Design Constraint (XDC) files. This thesis discusses Vivado's ability to support a custom CAD tool framework similar to RapidSmith and Torc. It provides a detailed description of the CAD-related aspects of Vivado's Tcl API and shows how its command set can be used to integrate a custom CAD tool framework. This is demonstrated through the introduction of Tincr, a suite of two Tcl-based libraries that each encapsulate a separate method for implementing such a framework. The first is the TincrCAD library, a high-level CAD tool framework built within Vivado's Tcl environment. The second is TincrIO, a set of Tcl commands that comprise a file-based interface into Vivado, similar to XDL. These libraries are offered up as evidence that the Vivado Design Suite can provide a foundation for the implementation of custom CAD tools that operate on Xilinx FPGAs for the foreseeable future.
1435

Interface Design and Synthesis for Structural Hybrid Microarchitectural Simulators

Ruan, Zhuo 01 December 2013 (has links) (PDF)
Computer architects have discovered the potential of using FPGAs to accelerate software microarchitectural simulators. One type of FPGA-accelerated microarchitectural simulator, the hybrid structural microarchitectural simulator, is very promising because it combines structural software and hardware, an organization that provides both modeling flexibility and fast simulation speed. The performance of a hybrid simulator is significantly affected by how the interface between software and hardware is constructed. This thesis creates an infrastructure named the Simulator Partitioning Research Infrastructure (SPRI) to implement the synthesis of hybrid structural microarchitectural simulators, including simulator partitioning, simulator-to-hardware synthesis, and interface synthesis. With the support of SPRI, this thesis characterizes the design space of interfaces for synthesized hybrid structural microarchitectural simulators and provides implementations for several such interfaces. The evaluation thoroughly studies the important design tradeoffs and performance factors (e.g., hardware capacity, design scalability, and interface latency) involved in choosing an efficient interface. This work is essential to the computer architecture research community: it not only contributes a complete synthesis infrastructure, but also provides guidelines to architects on how to organize software microarchitectural models and choose a proper software/hardware interface so that the hybrid microarchitectural simulators synthesized from these models can achieve desirable speedups.
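The interface-latency tradeoff discussed above can be illustrated with a back-of-the-envelope throughput model: per batch of simulated cycles, the hybrid simulator pays software time, hardware time, and one interface round trip, so the interface cost is amortized over the batch size. The formula and the numbers below are illustrative assumptions, not SPRI's measured results.

```python
def hybrid_speedup(t_sw_only_ns, t_sw_ns, t_hw_ns, t_if_ns, batch):
    """Speedup of a hybrid simulator over a pure-software one, assuming the
    interface round-trip cost t_if_ns is paid once per batch of simulated
    cycles and software/hardware work is serialized."""
    hybrid_per_cycle = t_sw_ns + t_hw_ns + t_if_ns / batch
    return t_sw_only_ns / hybrid_per_cycle

# Example: 10 us/cycle in pure software; the hardware-mapped portion shrinks
# the per-cycle work, but a 50 us interface round trip only pays off once it
# is amortized over a large enough batch.
for batch in (1, 16, 256):
    print(batch, round(hybrid_speedup(10_000, 1_000, 200, 50_000, batch), 2))
```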
1436

Hardware accelerators for post-quantum cryptography and fully homomorphic encryption

Agrawal, Rashmi 16 January 2023 (has links)
With the monetization of user data, data breaches have become very common. In the past five years, there were more than 7000 data breaches involving the theft of personal information of billions of people. In 2020 alone, the global average cost per data breach was $3.86 million, and this number rose to $4.24 million in 2021. The need for maintaining data security and privacy is therefore becoming increasingly critical. Over the years, various encryption schemes, including RSA, ECC, and AES, have been used to provide data security and privacy. However, these schemes are deemed vulnerable to quantum computers with their enormous processing power. As quantum computers are expected to become mainstream in the near future, post-quantum secure encryption schemes are required. To this end, through NIST's standardization efforts, code-based and lattice-based encryption schemes have emerged as plausible ways forward. Both code-based and lattice-based encryption schemes enable public key cryptosystems, key exchange mechanisms, and digital signatures. In addition, lattice-based encryption schemes support fully homomorphic encryption (FHE), which enables computation on encrypted data. Over the years, there have been several efforts to design efficient FPGA-based and ASIC-based solutions for accelerating code-based and lattice-based encryption schemes. The conventional code-based McEliece cryptosystem uses the binary Goppa code, which has a good code rate and error correction capability but suffers from high encoding and decoding complexity. Moreover, the generated public key is several MBs in size, leading to cryptosystem designs that cannot be accommodated on low-end FPGAs. In lattice-based encryption schemes, large polynomial ring operations form the core compute kernel and remain a key challenge for many hardware designers. Supporting large modular arithmetic operations on an FPGA with low latency and low hardware resource utilization requires substantial design effort. Moreover, prior FPGA solutions for lattice-based FHE accelerate only basic FHE primitives for impractical parameter sets, without support for the bootstrapping operation that is critical to building real-time privacy-preserving applications. Similarly, prior ASIC proposals for FHE that include bootstrapping are heavily memory bound, leading to large execution times, underutilized compute resources, and costs of millions of dollars. To respond to these challenges, this dissertation focuses on the design of efficient hardware accelerators for code-based and lattice-based public key cryptosystems (PKC). For code-based PKC, we propose a fully parameterized en/decryption co-processor based on a new variant of the McEliece cryptosystem. This co-processor takes advantage of the non-binary Orthogonal Latin Square Code (OLSC) to achieve lower computational complexity and a smaller key size than the binary Goppa code. Our FPGA-based implementation of the co-processor is ∼3.5× faster than an existing classic McEliece cryptosystem implementation. For lattice-based PKC, we propose a co-processor that implements large polynomial ring operations. It uses a fully pipelined NTT polynomial multiplier to perform fast polynomial multiplications. We also propose a highly optimized Gaussian noise sampler capable of producing millions of high-precision samples per second.
Through an FPGA-based implementation of this lattice-based PKC co-processor, we achieve a speedup of 6.5× while using 5× fewer hardware resources than state-of-the-art implementations. Leveraging our work on the lattice-based PKC implementation, we explore hardware accelerators that perform FHE operations using the Cheon-Kim-Kim-Song (CKKS) scheme. We first perform an in-depth architectural analysis of the various FHE operations in the CKKS scheme to explore ways to accelerate an end-to-end FHE application. For this analysis, we develop a custom architecture modeling tool, SimFHE, to measure the compute and memory bandwidth requirements of hardware-accelerated CKKS. Our analysis using SimFHE reveals that, without a prohibitively large cache, all FHE operations exhibit low arithmetic intensity (<1 Op/byte). To address the memory bottleneck resulting from this low arithmetic intensity, we propose several memory-aware design (MAD) techniques, including caching and algorithmic optimizations, to reduce the memory requirements of CKKS-based application execution. We show that our MAD techniques can yield an ASIC design that is at least 5-10× cheaper than the large-cache proposals while being only ∼2-3× slower. We also design FAB, an FPGA-based accelerator for bootstrappable FHE. FAB, for the first time, accelerates bootstrapping (along with the basic FHE primitives) on an FPGA for a secure and practical parameter set. FAB tackles the memory-bound nature of bootstrappable FHE through judicious datapath modification, smart operation scheduling, and on-chip memory management techniques to maximize overall FHE compute throughput. FAB outperforms all prior CPU/GPU work by 9.5× to 456× and provides practical performance for our target application: secure training of logistic regression models.
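Since the lattice-based co-processor above centers on NTT-based polynomial multiplication, here is a minimal software sketch of that idea: a radix-2 number-theoretic transform over the NTT-friendly prime 998244353, used to multiply two polynomials in O(n log n). This is only a reference-level illustration; the thesis's hardware NTT is fully pipelined and operates on negacyclic rings Z_q[x]/(x^n + 1) with its own parameter set, none of which is reflected here.

```python
P, G = 998244353, 3   # P = 119 * 2^23 + 1, with primitive root 3

def ntt(a, invert=False):
    """In-place iterative radix-2 NTT over Z_P; len(a) must be a power of two."""
    n = len(a)
    j = 0
    for i in range(1, n):                 # bit-reversal permutation
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:                    # butterfly stages
        w = pow(G, (P - 1) // length, P)
        if invert:
            w = pow(w, P - 2, P)
        for start in range(0, n, length):
            wn = 1
            for k in range(start, start + length // 2):
                u = a[k]
                v = a[k + length // 2] * wn % P
                a[k] = (u + v) % P
                a[k + length // 2] = (u - v) % P
                wn = wn * w % P
        length <<= 1
    if invert:
        n_inv = pow(n, P - 2, P)
        for i in range(n):
            a[i] = a[i] * n_inv % P
    return a

def poly_mul(f, g):
    """Multiply polynomials (coefficient lists) modulo P via forward NTT,
    pointwise product, and inverse NTT."""
    n = 1
    while n < len(f) + len(g) - 1:
        n <<= 1
    fa, gb = f + [0] * (n - len(f)), g + [0] * (n - len(g))
    ntt(fa); ntt(gb)
    prod = [x * y % P for x, y in zip(fa, gb)]
    ntt(prod, invert=True)
    return prod[:len(f) + len(g) - 1]

print(poly_mul([1, 2, 3], [4, 5]))   # (1 + 2x + 3x^2)(4 + 5x) -> [4, 13, 22, 15]
```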
1437

High-Speed Communication Scheme in OSI Layer 2 Research and Implementation

Zaklouta, Ahmadmunthar January 2019 (has links)
This thesis is part of a project on Bombardier's Object Controller System. The system acts as a communication interface for several sub-systems that control railway traffic; part of the safety and availability of railway transportation therefore depends on its performance and reliability, especially the digital communication system that handles board-to-board communication. To improve board-to-board communication performance in the Object Controller System, Bombardier has implemented new high-speed LVDS channels to replace the existing RS-485 channels, but these channels lack a transceiver. This thesis explores possible transceiver solutions that meet Bombardier's requirements. Reusability is very important to Bombardier for safety compliance and certification, so the investigation started from what is currently implemented and then examined transceivers used in high-speed communication, checking their suitability for the FPGA and the requirements. This exploration resulted in three experiments with different transceiver architectures. The first experiment used the currently implemented transceiver architecture, which is not suitable for high-speed data rates due to a limitation in its buffer. The second experiment overcame the buffer limitation by using a clock-domain-crossing buffer, resulting in a system roughly 100 times faster. The third experiment aimed at an even higher data rate by using a clock-and-data-recovery transceiver; it produced a promising solution that still needs some enhancements. For testing, a verification methodology following a one-way stress-test architecture was developed in VHDL for both simulation and in-chip testing, and the results were verified using the ChipScope logic analyzer from Xilinx. In addition, a thermal test of the solution from the second experiment was performed.
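The clock-domain-crossing buffer used in the second experiment is typically built as an asynchronous FIFO whose read and write pointers cross domains in Gray code, so that at most one bit changes per increment. The thesis does not spell out the buffer internals, so the sketch below is only a generic illustration of the binary/Gray encoding such a buffer relies on.

```python
def bin_to_gray(b: int) -> int:
    """Gray code: adjacent values differ in exactly one bit, which makes a
    pointer safe to synchronize across clock domains bit by bit."""
    return b ^ (b >> 1)

def gray_to_bin(g: int) -> int:
    """Inverse mapping: fold the Gray code back into a binary count."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b

# Every increment of a 4-bit pointer flips exactly one bit of its Gray code,
# including the wrap-around from 15 back to 0.
codes = [bin_to_gray(i) for i in range(16)]
assert all(bin(a ^ b).count("1") == 1 for a, b in zip(codes, codes[1:] + codes[:1]))
assert all(gray_to_bin(bin_to_gray(i)) == i for i in range(16))
```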
1438

A Data Sorting Hardware Accelerator on FPGA

Liu, Boyan January 2020 (has links)
In recent years, with the rise of big data applications, efficiency has become more important in data processing, and simple sorting methods require higher stability and efficiency in large-scale scenarios. This thesis explores hardware acceleration of data sorting for massive inputs and data streams, leading to three different design approaches: running the whole data processing flow in software (sorting and merging on a PC), a combination of the PC and a field-programmable gate array (FPGA) platform (hardware sorting with software merging), and a fully hardware design (sorting and merging on the FPGA). Parallel hardware sorters have been proposed before, but they do not consider that the loading and off-loading of data is often serial in nature. In this analysis, we explore an insertion-sort solution that can sort data in the same clock cycle it is written to the sorter and compare it with standard parallel sorters. The main contributions of this thesis are techniques for accelerating the sorting of large data streams, comprising a fully software design, a hardware/software co-design, and a fully hardware design on a reconfigurable FPGA platform. The experimental results mostly meet our predictions, and we show that insertion sort implemented in hardware can improve data processing speed for small input data sizes.
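A small software model of the insertion-sort idea described above: the buffer is kept sorted as each value arrives, standing in for a hardware sorter that shifts and inserts in the same clock cycle the data is written. Class and method names are invented for illustration; the thesis's actual sorter is an RTL design on the FPGA.

```python
from bisect import insort

class StreamingInsertionSorter:
    """Keeps its buffer sorted on every write, so unloading the sorted data
    can start as soon as loading finishes (no separate sort pass)."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.buf = []

    def write(self, value):
        if len(self.buf) >= self.capacity:
            raise OverflowError("sorter is full")
        insort(self.buf, value)          # one 'cycle': shift-and-insert

    def read_all(self):
        out, self.buf = self.buf, []
        return out

sorter = StreamingInsertionSorter(capacity=8)
for v in [7, 3, 9, 1, 5]:
    sorter.write(v)                      # data arrives serially
print(sorter.read_all())                 # [1, 3, 5, 7, 9]
```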
1439

Design of a GUI Protocol for the Authentication of FPGA Based ROPUFs

Khaloozadeh, Kiyan January 2021 (has links)
No description available.
1440

FPGA-Based Design of a Maximum-Power-Point Tracking System for Space Applications

Persen, Todd 01 January 2004 (has links)
Satellites need a source of power throughout their missions to remain operational for several years. Their power supplies, provided primarily by solar arrays, must have high efficiency and low weight to meet stringent design constraints. Power conversion from these arrays must be robust and reliable and must perform optimally under varying conditions of peak power, solar flux, and occlusion. Since the role of these arrays is to deliver power, one of the principal factors in achieving maximum power output from an array is tracking and holding its maximum-power point. This point, which varies with temperature, insolation, and loading conditions, must be continuously monitored in order to react to rapid changes. Until recently, maximum power point tracking (MPPT) control has been implemented on microcontrollers and digital signal processors (DSPs). While DSPs provide reasonable performance, they do not offer the advantages that field-programmable gate array (FPGA) chips can bring to MPPT control. Compared to DSP implementations, FPGAs offer lower-cost implementations, since the functions of various components can be integrated onto the same FPGA chip, whereas DSPs can perform only DSP-related computations. In addition, FPGAs can provide equivalent or higher performance with the customization potential of an ASIC. Because FPGAs can be reprogrammed at any time, repairs can be performed in-situ while the system is running, providing a high degree of robustness. Beyond robustness, this reprogrammability provides (i) flexibility, making it easy to upgrade an MPPT control system by merely updating or modifying the MPPT algorithm running on the FPGA chip, and (ii) expandability, making it straightforward to expand an FPGA-based MPPT control system to handle multi-channel control. Reprogrammability also provides a level of testability that DSPs cannot match, by allowing the entire MPPT control system to be emulated on the FPGA chip. This thesis proposes an FPGA-based implementation of an MPPT control system suitable for space applications. At the core of this system, the perturb-and-observe algorithm is used to track the maximum power point. The algorithm runs on an Altera FLEX 10K FPGA chip. Additional functional blocks needed to support the MPPT control system, such as the ADC interface, FIR filter, dither generator, and DAC interface, are integrated within the same FPGA device, streamlining the part composition of the physical prototype used to build this control system.
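For reference, the perturb-and-observe loop at the core of the system can be sketched in a few lines: perturb the operating voltage, observe the change in power, and keep moving in the direction that increases power, reversing when it drops. The toy photovoltaic model, step size, and starting point below are assumptions for illustration; the thesis's implementation runs in hardware with ADC/DAC interfaces, an FIR filter, and a dither generator around this loop.

```python
def pv_current(v, v_oc=20.0, i_sc=5.0):
    """Toy photovoltaic I-V model (illustration only): short-circuit current
    i_sc at v=0, falling to zero at the open-circuit voltage v_oc."""
    return max(i_sc * (1 - (v / v_oc) ** 8), 0.0)

def perturb_and_observe(v0=10.0, step=0.1, iters=200):
    """Classic P&O: keep perturbing in the same direction while power rises,
    reverse when it falls; the operating point oscillates around the MPP."""
    v, p_prev, direction = v0, 0.0, +1
    for _ in range(iters):
        p = v * pv_current(v)
        if p < p_prev:               # power dropped: reverse the perturbation
            direction = -direction
        p_prev = p
        v += direction * step        # perturb toward (hopefully) higher power
    return v, p_prev

v_mpp, p_mpp = perturb_and_observe()
print(round(v_mpp, 2), round(p_mpp, 2))   # settles near the knee of the toy curve
```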
