341

BitMaT - Bitstream Manipulation Tool for Xilinx FPGAs

Morford, Casey Justin 03 January 2006 (has links)
With the introduction of partially reconfigurable FPGAs, we are now able to make dynamic changes to hardware running on an FPGA without halting the operation of the design. Module-based partial reconfiguration allows the hardware designer to create multiple hardware modules that perform different tasks and swap them in and out of designated dynamic regions on an FPGA. However, the current mainstream partial reconfiguration flow is limited and inefficient, requiring a strict set of guidelines to be met. This thesis introduces BitMaT, a tool that provides low-level bitstream manipulation as a member tool of an alternative, automated, modular partial reconfiguration flow. / Master of Science
342

Real-Time Computed Tomography-based Medical Diagnosis Using Deep Learning

Goel, Garvit 24 February 2022 (has links)
Computed tomography (CT) has been widely used in medical diagnosis to generate accurate images of the body's internal organs. However, cancer risk is associated with high X-ray dose CT scans, limiting their applicability in medical diagnosis and telemedicine applications. CT scans acquired at low X-ray dose produce low-quality images with noise and streaking artifacts. We therefore develop a deep learning-based CT image enhancement algorithm for improving the quality of low-dose CT images. Our algorithm uses a convolutional neural network called DenseNet and Deconvolution network (DDnet) to remove noise and artifacts from the input image. To evaluate its advantages in medical diagnosis, we use DDnet to enhance chest CT scans of COVID-19 patients. We show that image enhancement can improve the accuracy of COVID-19 diagnosis (~5% improvement), using a framework consisting of AI-based tools. For training and inference of the image enhancement model, we use a heterogeneous computing platform to accelerate execution and decrease turnaround time. Specifically, we use multiple GPUs in a distributed setup to exploit batch-level parallelism during training. We achieve approximately 7x speedup with 8 GPUs running in parallel compared to training DDnet on a single GPU. For inference, we implement DDnet in OpenCL and evaluate its performance on a multi-core CPU, a many-core GPU, and an FPGA. Our OpenCL implementation is at least 2x faster than the analogous PyTorch implementation on each platform, and the FPGA achieves performance comparable to the CPU while operating at a much lower clock frequency. / Master of Science / Computed tomography has been widely used in the medical diagnosis of diseases such as cancer/tumor, viral pneumonia, and, more recently, COVID-19. However, the cancer risk associated with the X-ray dose of CT scans limits the use of computed tomography in biomedical imaging. We therefore develop a deep learning-based image enhancement algorithm that can be used with low X-ray dose computed tomography scanners to generate high-quality CT images. The algorithm uses a state-of-the-art convolutional neural network for increased performance and computational efficiency. Further, we use the image enhancement algorithm to develop a framework of AI-based tools to improve the accuracy of COVID-19 diagnosis. We test and validate the framework with clinical COVID-19 data. Our framework applies to the diagnosis of COVID-19 and its variants, as well as other diseases that can be diagnosed via computed tomography. We utilize high-performance computing techniques to reduce the execution time of training and testing the AI models in our framework. We also evaluate the efficacy of training and inference of the neural network on heterogeneous computing platforms, including multi-core CPUs, many-core GPUs, and field-programmable gate arrays (FPGAs), in terms of speed and power consumption.
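A minimal sketch of the batch-level parallelism described above, assuming PyTorch's nn.DataParallel and a toy two-stage stand-in for DDnet (the thesis's actual network is much deeper, and its distributed setup may use DistributedDataParallel instead):

```python
import torch
import torch.nn as nn

class TinyDDnet(nn.Module):
    """Toy DenseNet-style encoder + deconvolution decoder for denoising."""
    def __init__(self):
        super().__init__()
        self.c1 = nn.Conv2d(1, 16, 3, padding=1)
        self.c2 = nn.Conv2d(17, 16, 3, padding=1)          # dense connection: input concatenated with c1 features
        self.up = nn.ConvTranspose2d(16, 1, 3, padding=1)  # deconvolution stage

    def forward(self, x):
        f1 = torch.relu(self.c1(x))
        f2 = torch.relu(self.c2(torch.cat([x, f1], dim=1)))
        return self.up(f2)                                  # predicted clean image

model = TinyDDnet()
if torch.cuda.device_count() > 1:
    # Splits each batch across the visible GPUs and gathers gradients on GPU 0.
    model = nn.DataParallel(model)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

opt = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()
noisy = torch.randn(8, 1, 64, 64, device=device)   # stand-in low-dose slices
clean = torch.randn(8, 1, 64, 64, device=device)   # stand-in normal-dose targets
opt.zero_grad()
loss = loss_fn(model(noisy), clean)
loss.backward()
opt.step()
```

Each batch of noisy/clean slice pairs is split across the available GPUs, so training throughput scales with GPU count in much the way the ~7x-on-8-GPUs result suggests.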
343

Hardware-Software Co-Design for Sensor Nodes in Wireless Networks

Zhang, Jingyao 11 June 2013 (has links)
Simulators are important tools for analyzing and evaluating different design options for wireless sensor networks (sensornets) and hence have been intensively studied in the past decades. However, existing simulators only support evaluation of protocols and the software aspects of sensornet design. They cannot accurately capture the significant impact of various hardware designs on sensornet performance. As a result, the performance/energy benefits of customized hardware designs are difficult to evaluate in sensornet research. To fill this technical void, the first section describes the design and implementation of SUNSHINE, a scalable hardware-software emulator for sensornet applications. SUNSHINE is the first sensornet simulator that effectively supports joint evaluation and design of sensor hardware and software performance in a networked context. SUNSHINE captures the performance of network protocols, software, and hardware up to cycle-level accuracy through its seamless integration of three existing sensornet simulators: the network simulator TOSSIM, the instruction-set simulator SimulAVR, and the hardware simulator GEZEL. SUNSHINE solves several sensornet simulation challenges, including data exchange and time synchronization across different simulation domains and simulation accuracy levels. SUNSHINE also provides a hardware specification scheme for simulating flexible and customized hardware designs. Several experiments illustrate SUNSHINE's simulation capability, and evaluation results demonstrate that SUNSHINE is an efficient tool for software-hardware co-design in sensornet research. Although SUNSHINE can simulate flexible sensor nodes (nodes containing FPGA chips as coprocessors) in wireless networks, it does not estimate the power/energy consumption of sensor nodes, and so far no simulators have been developed to evaluate the performance of such flexible nodes in wireless networks. The second section presents PowerSUNSHINE, a power- and energy-estimation tool that fills this void. PowerSUNSHINE is the first scalable power/energy estimation tool for WSNs that provides accurate predictions for both fixed and flexible sensor nodes. The section first describes the requirements and challenges of building PowerSUNSHINE, then presents power/energy models for both fixed and flexible sensor nodes. Two testbeds, a MicaZ platform and a flexible node consisting of a microcontroller, a radio, and an FPGA-based co-processor, demonstrate the simulation fidelity of PowerSUNSHINE. We also discuss several evaluation results based on simulation and testbeds to show that PowerSUNSHINE is a scalable simulation tool that provides accurate estimates of power/energy consumption for both fixed and flexible sensor nodes. Since the main components of a sensor node are a microcontroller and a wireless transceiver (radio), real-time performance may become a bottleneck when executing computation-intensive tasks in sensor networks. A coprocessor can relieve the microcontroller of some of these tasks and hence decrease the probability of dropping packets from the wireless channel. Even though adding a coprocessor benefits sensor networks, designing applications for sensor nodes with coprocessors from scratch is challenging because design details must be considered across multiple domains, including software, hardware, and network.
To solve this problem, we propose a hardware-software co-design framework for network applications that contain multiprocessor sensor nodes. The framework includes a three-layered architecture for multiprocessor sensor nodes and application interfaces under the framework. The layered architecture makes the design of applications for multiprocessor nodes flexible and efficient, while the application interfaces support the deployment of reliable applications on multiprocessor sensor nodes. A resource-sharing technique lets the processor, coprocessor, and radio work in coordination over the communication bus. Several testbeds containing multiprocessor sensor nodes are deployed to evaluate the effectiveness of our framework, and network experiments executed in the SUNSHINE emulator demonstrate the benefits of using multiprocessor sensor nodes in many network scenarios. / Ph. D.
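A minimal sketch of the kind of per-component, state-based energy model a tool like PowerSUNSHINE implements; the currents and duty cycles below are illustrative assumptions, not the calibrated values from the MicaZ/FPGA testbeds:

```python
MCU = {"active": 0.008, "sleep": 0.000015}        # amps, assumed
RADIO = {"tx": 0.0174, "rx": 0.0197, "off": 0.0}  # amps, MicaZ-like assumptions
FPGA = {"compute": 0.025, "idle": 0.003}          # amps, assumed coprocessor draw
VDD = 3.0                                         # volts

def energy(profile, currents, vdd=VDD):
    """Sum E = V * I_state * t_state over a per-component state profile."""
    return sum(vdd * currents[state] * t for state, t in profile)

# One duty cycle: MCU wakes, hands a task to the FPGA, radio sends the result.
mcu_profile = [("active", 0.010), ("sleep", 0.990)]
fpga_profile = [("compute", 0.004), ("idle", 0.996)]
radio_profile = [("tx", 0.002), ("rx", 0.005), ("off", 0.993)]

total = (energy(mcu_profile, MCU)
         + energy(fpga_profile, FPGA)
         + energy(radio_profile, RADIO))
print(f"estimated energy per 1 s cycle: {total * 1e3:.3f} mJ")
```

Summing V·I·t over each component's state profile is what allows fixed nodes and flexible (FPGA-coprocessor) nodes to be compared under the same workload.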
344

HE-MT6D: A Network Security Processor with Hardware Engine for Moving Target IPv6 Defense (MT6D) over 1 Gbps IEEE 802.3 Ethernet

Sagisi, Joseph Lozano 28 July 2017 (has links)
Traditional static network addressing gives attackers the incredible advantage of taking time to plan and execute attacks against a network. To counter this, Moving Target IPv6 Defense (MT6D) provides a network host obfuscation technique that dynamically obscures network and transport layer addresses. Software-driven implementations have posed many challenges, namely constant code maintenance to remain compliant with all library and kernel dependencies, less than optimal throughput, and the requirement for dedicated general-purpose hardware. This thesis presents the Network Security Processor and Hardware Engine for MT6D (HE-MT6D) to overcome these challenges. HE-MT6D is a soft-core Intellectual Property (IP) block developed in full Register Transfer Level (RTL) and is the first hardware-oriented design of MT6D. Its major contributions include the complete separation of the data and control planes; a nonlinear Complex Instruction Set Computer (CISC) Network Security Processor for in-flight packet modification; a specialized Packet Assembly language; a configurable, parallelized memory search through a tag-based Hybrid Content Addressable Memory (HCAM) L1 write-through cache; a full-RTL Network Time Protocol version 4 hardware module; and a modular crypto engine. HE-MT6D supports multiple nodes and provides a 1,025% throughput increase over the earlier C-based MT6D, sustaining 863 Mbps with full encapsulation and decapsulation, and it matches bare-wire throughput for all other traffic. The HE-MT6D IP block can be configured as an independent physical gateway device, built as an embedded Application Specific Integrated Circuit (ASIC), or serve as a System on Chip (SoC) integrated submodule. / Master of Science
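A minimal sketch of the address-rotation idea at the heart of MT6D: both endpoints derive the next interface identifier from a hash over the host identity, a shared secret, and a quantized timestamp, so addresses hop in lockstep without per-hop coordination. The hash choice, field sizes, and rotation interval here are illustrative assumptions, not HE-MT6D's RTL:

```python
import hashlib
import struct
import time

def mt6d_iid(base_iid: bytes, secret: bytes, epoch: int, rotation_s: int = 10) -> bytes:
    """Return a 64-bit obscured interface identifier for the current window."""
    window = epoch // rotation_s                      # quantized time window
    digest = hashlib.sha256(base_iid + secret + struct.pack(">Q", window)).digest()
    return digest[:8]                                 # 64 bits form the new IID

prefix = bytes.fromhex("20010db8000000a1")            # example /64 network prefix
iid = mt6d_iid(b"\x02\x1a\x2b\xff\xfe\x3c\x4d\x5e", b"shared-secret", int(time.time()))
addr = prefix + iid
print(":".join(addr.hex()[i:i + 4] for i in range(0, 32, 4)))
```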
345

An Investigation of Methods to Improve Area and Performance of Hardware Implementations of a Lattice Based Cryptosystem

Beckwith, Luke Parkhurst 05 November 2020 (has links)
With continuing research into quantum computing, current public key cryptographic algorithms such as RSA and ECC will become insecure. These algorithms are based on the difficulty of integer factorization or discrete logarithm problems, which are difficult to solve on classical computers but become easy with quantum computers. Because of this threat, government and industry are investigating new public key standards based on mathematical assumptions that remain secure under quantum computing. This paper investigates methods of improving the area and performance of one of the proposed algorithms for key exchange, "NewHope." We describe a pipelined FPGA implementation of NewHope512cpa which dramatically increases throughput for a similar design area. Our pipelined encryption implementation achieves 652.2 Mbps and a 0.088 Mbps/LUT throughput-to-area (TPA) ratio, which are the best known results to date, and achieves an energy efficiency of 0.94 nJ/bit. This represents TPA and energy efficiency improvements of 10.05× and 8.58×, respectively, over a non-pipelined approach. Additionally, we investigate replacing the large SHAKE XOF (hash) function with a lightweight Trivium-based PRNG, which reduces the area by 32% and improves energy efficiency by 30% for the pipelined encryption implementation, and which could be considered for future cipher specifications. / Master of Science / Cryptography is prevalent in almost every aspect of our lives. It is used to protect communication, banking information, and online transactions. Current cryptographic protections are built upon public key encryption, which allows two people who have never communicated before to set up a secure communication channel. However, due to the nature of current cryptographic algorithms, the development of quantum computers will make it possible to break the algorithms that secure our communications. Because of this threat, new algorithms based on principles that stand up to quantum computing are being investigated to find a suitable alternative to secure our systems. These algorithms will need to be efficient in order to keep up with the demands of the ever-growing internet. This paper investigates four hardware implementations of a proposed quantum-secure algorithm to explore ways to make designs more efficient. The improvements are valuable for high-throughput applications, such as a server which must handle a large number of connections at once.
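The reported figures pin down the implied design size and power draw; a quick derivation (values computed from the abstract's numbers, not taken from the thesis's resource report):

```python
throughput_mbps = 652.2
tpa = 0.088            # Mbps per LUT
nj_per_bit = 0.94

luts = throughput_mbps / tpa                           # ~7,411 LUTs implied
power_w = nj_per_bit * 1e-9 * throughput_mbps * 1e6    # (J/bit) * (bits/s)
print(f"implied area: {luts:,.0f} LUTs, implied power: {power_w:.2f} W")
```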
346

Securing Software Intellectual Property on Commodity and Legacy Embedded Systems

Gora, Michael Arthur 25 June 2010 (has links)
The proliferation of embedded systems into nearly every aspect of modern infrastructure and society has seen their deployment in such diverse roles as monitoring the power grid and processing commercial payments. Software intellectual property (SWIP) is a critical component of these increasingly complex systems and represents a significant investment to its developers. However, deeply immersed in their environment, embedded systems are difficult to secure. As a result, developers want to ensure that their SWIP is protected from being reverse engineered or stolen by unauthorized parties. Many techniques have been proposed to address the issue of SWIP protection for embedded systems. These range from secure memory components to complete shifts in processor architectures. While powerful, these approaches often require the development of systems from the ground up or the application of specialized and often expensive hardware components. As a result they are poorly suited to address the security concerns of legacy embedded systems or systems based on commodity components. This work explores the protection of SWIP on heavily constrained, legacy and commodity embedded systems. We accomplish this by evaluating a generic embedded system to identify the security concerns in the context of SWIP protection. The evaluation is applied to determine the limitations of a software only approach on a real world legacy embedded system that lacks any specialized security hardware features. We improve upon this system by developing a prototype system using only commodity components. Finally we propose a Portable Embedded Software Intellectual Property Security (PESIPS) system that can easily be deployed as a framework on both legacy and commodity systems. / Master of Science
347

Cost Beneficial Solution for High Rate Data Processing

Mirchandani, Chandru, Fisher, David, Ghuman, Parminder October 1999 (has links)
International Telemetering Conference Proceedings / October 25-28, 1999 / Riviera Hotel and Convention Center, Las Vegas, Nevada / GSFC, in keeping with the tenets of NASA, has been aggressively investigating new technologies for spacecraft and ground communications and processing. The application of these technologies, together with standardized telemetry formats, makes it possible to build systems that provide high performance at low cost in a short development cycle. The High Rate Telemetry Acquisition System (HRTAS) Prototype is one such effort that has validated Goddard's push towards faster, better and cheaper. The HRTAS system architecture is based on the Peripheral Component Interconnect (PCI) bus and VLSI Application-Specific Integrated Circuits (ASICs). These ASICs perform frame synchronization, bit-transition density decoding, cyclic redundancy code (CRC) error checking, Reed-Solomon error detection/correction, data unit sorting, packet extraction, annotation and other service processing. This processing is performed at sustained rates of up to and greater than 150 Mbps on a high-end workstation running a standard UNIX OS (DEC 4100 with DEC UNIX or better). ASICs are also used for the digital reception of Intermediate Frequency (IF) telemetry as well as the spacecraft command interface for commands and data simulations. To improve the efficiency of the back-end processing, the level-zero processing sorting element is being developed. This will provide a complete hardware solution to extracting and sorting source data units and making these available in separate files on a remote disk system. Research is ongoing to extend this development to higher levels of the science data processing pipeline. Because level 1 and higher processing is instrument dependent, an acceleration approach utilizing ASICs is not feasible. The advent of field programmable gate array (FPGA) based computing, referred to as adaptive or reconfigurable computing, provides processing performance close to ASIC levels while maintaining much of the programmability of traditional microprocessor-based systems. This adaptive computing paradigm has been successfully demonstrated and its cost performance validated, making it a viable technology for the level-one and higher processing element of the HRTAS. Higher levels of processing are defined as the extraction of useful information from source telemetry data. This information has to be made available to the science data user in a very short period of time. This paper describes this low-cost solution for high rate data processing at level one and higher processing levels. The paper further discusses the cost-benefit of this technology in terms of cost, schedule, reliability and performance.
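A minimal sketch of the CRC stage in such a frame-processing pipeline, assuming the CRC-16/CCITT-FALSE parameters (polynomial 0x1021, initial value 0xFFFF) commonly used for CCSDS-style telemetry transfer frames:

```python
def crc16_ccitt(data: bytes, crc: int = 0xFFFF) -> int:
    """Bit-serial CRC-16/CCITT-FALSE over the given bytes."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) & 0xFFFF if crc & 0x8000 else (crc << 1) & 0xFFFF
    return crc

def frame_ok(frame: bytes) -> bool:
    """Last two bytes hold the transmitted CRC over the preceding bytes."""
    body, trailer = frame[:-2], frame[-2:]
    return crc16_ccitt(body) == int.from_bytes(trailer, "big")

payload = b"\x1a\xcf\xfc\x1d" + bytes(16)              # sync marker + dummy body
frame = payload + crc16_ccitt(payload).to_bytes(2, "big")
assert frame_ok(frame)
print("frame passes CRC check")
```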
348

Implementation of decision trees for embedded systems

Badr, Bashar January 2014 (has links)
This research work develops real-time incremental-learning decision tree solutions suitable for real-time embedded systems, by virtue of having both a defined memory requirement and an upper bound on the computation time per training vector. In addition, the work provides embedded systems with the capability of rapidly processing and training on streamed data problems, and adopts electronic hardware solutions to improve the performance of the developed algorithm. Two novel decision tree approaches, namely the Multi-Dimensional Frequency Table (MDFT) and the Hashed Frequency Table Decision Tree (HFTDT), represent the core of this research work. Both methods successfully incorporate a frequency table technique to produce a complete decision tree. The MDFT and HFTDT learning methods were designed with the ability to generate application-specific code for both training and classification purposes, according to the requirements of the targeted application. The MDFT allows the memory architecture to be specified statically before learning takes place, within a deterministic execution time. The HFTDT method is a development of the MDFT in which a reduction in memory requirements is achieved, again within a deterministic execution time. The HFTDT achieved low memory usage compared to existing decision tree methods, and hardware acceleration improved performance by up to 10 times in terms of execution time.
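A minimal sketch of the frequency-table idea behind MDFT/HFTDT — an illustration of the bounded-memory, bounded-time-per-vector property, not the thesis's exact algorithms:

```python
from collections import defaultdict

class HashedFrequencyTable:
    def __init__(self, n_buckets: int = 4096):
        self.n_buckets = n_buckets                       # fixed memory budget
        self.counts = defaultdict(lambda: defaultdict(int))

    def _bucket(self, attr: int, value) -> int:
        return hash((attr, value)) % self.n_buckets      # collisions are tolerated

    def train(self, vector, label):
        for attr, value in enumerate(vector):            # O(#attributes) per vector
            self.counts[self._bucket(attr, value)][label] += 1

    def classify(self, vector):
        votes = defaultdict(int)
        for attr, value in enumerate(vector):
            for label, c in self.counts[self._bucket(attr, value)].items():
                votes[label] += c
        return max(votes, key=votes.get) if votes else None

ft = HashedFrequencyTable()
ft.train(["sunny", "hot"], "stay-in")
ft.train(["rainy", "cool"], "go-out")
print(ft.classify(["sunny", "hot"]))                     # -> 'stay-in'
```

Because the table size is fixed up front and each training vector touches one bucket per attribute, both the memory requirement and the per-vector training time are bounded, which is what makes this style of learner suitable for real-time embedded targets.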
349

Towards the development of a reliable reconfigurable real-time operating system on FPGAs

Hong, Chuan January 2013 (has links)
In the last two decades, Field Programmable Gate Arrays (FPGAs) have rapidly developed from simple “glue logic” into a powerful platform capable of implementing a System on Chip (SoC). Modern FPGAs not only achieve high performance compared with General Purpose Processors (GPPs), thanks to hardware parallelism and dedication, but also offer better programming flexibility than Application Specific Integrated Circuits (ASICs). Moreover, the hardware programming flexibility of FPGAs is further harnessed for both performance and manipulability, which makes Dynamic Partial Reconfiguration (DPR) possible. DPR allows a part or parts of a circuit to be reconfigured at run-time without interrupting the rest of the chip's operation. As a result, hardware resources can be exploited more efficiently, since chip resources can be reused by swapping hardware tasks in and out of the chip in a time-multiplexed fashion. In addition, DPR improves fault tolerance: transient errors and permanent damage, such as Single Event Upsets (SEUs), can be mitigated by reconfiguring the FPGA to avoid error accumulation. Furthermore, power and heat can be reduced by removing finished or idle tasks from the chip. For all these reasons, DPR has significantly promoted Reconfigurable Computing (RC) and has become a very active topic. However, since hardware integration is increasing at an exponential rate and applications are becoming more complex with growing user demands, high-level application design and low-level hardware implementation are increasingly separated and layered. As a consequence, users gain little advantage from DPR without the support of system-level middleware. To bridge the gap between high-level applications and low-level hardware implementation, this thesis presents contributions towards a Reliable, Reconfigurable and Real-Time Operating System (R3TOS), which facilitates user exploitation of DPR from the application level by managing the complex hardware in the background. In R3TOS, hardware tasks behave just like software tasks: they can be created, scheduled, and mapped to different computing resources on the fly. The novel contributions of this work are: 1) a novel implementation of an efficient task scheduler and allocator; 2) the implementation of a novel real-time scheduling algorithm (FAEDF) and two efficacious allocation algorithms (EAC and EVC), which schedule tasks in real time and circumvent emerging faults while maintaining more compact empty areas; 3) the design and implementation of a fault-tolerant microprocessor that harnesses existing FPGA resources, such as Error Correction Code (ECC) and configuration primitives; 4) a novel symmetric multiprocessing (SMP)-based architecture that supports a shared-memory programming interface; and 5) two demonstrations of the integrated system: a) a K-Nearest Neighbour classifier, a non-parametric classification algorithm widely used in various fields of data mining, and b) pairwise sequence alignment, namely the Smith-Waterman algorithm, used for identifying similarities between two biological sequences. R3TOS gives considerably higher flexibility to support scalable multi-user, multitasking applications, whereby resources can be dynamically managed with respect to user requirements and hardware availability. Benefiting from this, not only can hardware resources be used more efficiently, but system performance can also be significantly increased.
Results show that scheduling and allocation efficiency have been improved by up to 2x, and overall system performance is further improved by ~2.5x. Future work includes the development of a Network on Chip (NoC), which is expected to further increase communication throughput, as well as the standardization and automation of our system design, which will be carried out in line with the enablement of other high-level synthesis tools, to allow application developers to benefit from the system in a more efficient manner.
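A minimal sketch of the scheduling/allocation split that R3TOS manages behind the scenes: deadline-ordered dispatch of hardware tasks plus first-fit placement over a 1-D model of reconfigurable columns. FAEDF, EAC, and EVC are considerably more sophisticated (fault avoidance, compaction of empty areas); the task names and sizes below are illustrative:

```python
import heapq

class Region:
    def __init__(self, columns: int):
        self.free = [(0, columns)]              # list of (start, length) gaps

    def allocate(self, width: int):
        for i, (start, length) in enumerate(self.free):
            if length >= width:                 # first fit
                self.free[i] = (start + width, length - width)
                return start
        return None                             # no contiguous gap wide enough

ready = []                                      # (deadline, name, width) min-heap
for task in [(30, "fir_filter", 4), (10, "aes_core", 6), (20, "knn", 3)]:
    heapq.heappush(ready, task)

region = Region(columns=10)
while ready:
    deadline, name, width = heapq.heappop(ready)  # earliest deadline first
    slot = region.allocate(width)
    if slot is None:
        print(f"{name}: deferred, no free columns")
    else:
        print(f"{name}: placed at column {slot}, deadline {deadline}")
```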
350

Modeling, Exploration and Estimation of Power Consumption for Dynamically Reconfigurable Heterogeneous Architectures

Bonamy, Robin 12 July 2013 (has links)
The use of reconfigurable accelerators when designing heterogeneous systems-on-chip has the potential to increase performance and reduce energy consumption. Indeed, these accelerators are commonly used alongside one (or more) processor(s) to offload intensive computations and data-stream processing. The concept of dynamic reconfiguration, supported by some FPGA vendors, allows more flexible systems to be considered, including the ability to sequence the execution of accelerators on the same silicon area while reducing resource requirements. However, dynamic reconfiguration may impact overall system performance, and it is hard to estimate the impact of configuration decisions on energy consumption. The main objective of this thesis is to provide an exploration methodology to assess the impact of the implementation choices for the tasks of an application on a system-on-chip containing a dynamically reconfigurable resource, in order to optimize energy consumption or processing time. To this end, we have established consumption models of reconfigurable components, particularly FPGAs, which assist the designer.
Using a measurement methodology on a Virtex-5, we first show that it is possible to generate hardware accelerators of various sizes with diverse execution times and energy consumptions. Then, in order to quantify the implementation costs of these accelerators, we build three power models of dynamic partial reconfiguration. Finally, from these models and the generated accelerators, we develop an algorithm that explores the implementation and allocation possibilities for a complete system. Based on a high-level modeling platform, it analyzes the implementation costs of the tasks and their execution on the various available resources (CPU or reconfigurable region). The solutions offering the best performance under the design constraints are retained for exploitation.
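A minimal sketch of the kind of cost trade-off such an exploration algorithm evaluates: mapping a task to the reconfigurable region only pays off once the speedup amortizes the partial-bitstream load. All constants are illustrative assumptions, not the thesis's measured Virtex-5 figures:

```python
def reconfig_cost(bitstream_bytes, icap_bytes_per_s=100e6, p_reconfig_w=0.4):
    """Time and energy to load a partial bitstream through the ICAP."""
    t = bitstream_bytes / icap_bytes_per_s
    return t, t * p_reconfig_w                  # (seconds, joules)

def best_mapping(t_cpu, p_cpu_w, t_fpga, p_fpga_w, bitstream_bytes):
    """Pick the lower-energy implementation, charging the FPGA for reconfiguration."""
    t_r, e_r = reconfig_cost(bitstream_bytes)
    cpu = {"target": "cpu", "time": t_cpu, "energy": t_cpu * p_cpu_w}
    fpga = {"target": "fpga", "time": t_fpga + t_r,
            "energy": t_fpga * p_fpga_w + e_r}
    return min((cpu, fpga), key=lambda m: m["energy"])

# A task that runs 20x faster on the accelerator, loaded from a 300 KB bitstream.
choice = best_mapping(t_cpu=0.040, p_cpu_w=0.6,
                      t_fpga=0.002, p_fpga_w=0.9,
                      bitstream_bytes=300_000)
print(choice)   # the FPGA mapping wins once the reconfiguration energy is amortized
```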
