31

Testing and Validation of a Prototype GPGPU Design for FPGAs

Merchant, Murtaza 01 January 2013 (has links) (PDF)
Due to their suitability for highly parallel and pipelined computation, field programmable gate arrays (FPGAs) and general-purpose graphics processing units (GPGPUs) have emerged as top contenders for hardware acceleration of high-performance computing applications. FPGAs are highly specialized devices that can be customized to a specific application, whereas GPGPUs are made of a fixed array of multiprocessors with a rigid architectural model. To alleviate this rigidity as well as to combine some other benefits of the two platforms, it is desirable to explore the implementation of a flexible GPGPU (soft GPGPU) using the reconfigurable fabric found in an FPGA. This thesis describes an aggressive effort to test and validate a prototype GPGPU design targeted to a Virtex-6 FPGA. Individual design stages are tested and integrated together using manually-generated RTL testbenches and logic simulation tools. The soft GPGPU design is validated by benchmarking the platform against five standard CUDA benchmarks. The platform is fully CUDA-compatible and supports direct execution of CUDA compiled binaries. Platform scalability is validated by varying the number of processing cores as well as multiprocessors, and evaluating their effects on area and performance. Experimental results show an average speedup of 25x for a 32-core soft GPGPU configuration over a fully optimized MicroBlaze soft microprocessor, accentuating the benefits of the thread-based execution model of GPUs and their ability to perform complex control flow operations in hardware. The testing and validation of the designed soft GPGPU system serves as a prerequisite for rapid design exploration of the platform in the future.
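The thread-based execution model the results highlight can be pictured as follows. This is an illustrative Python sketch of lock-step SIMT execution with predicate masking, not the thesis's RTL; the opcode names and register layout are invented for the example.

```python
# Sketch of the SIMT model a soft GPGPU implements in fabric: one decoded
# instruction is broadcast to every scalar lane, and a per-lane predicate
# mask realizes divergent control flow without branching.
# Opcodes and register names here are illustrative, not from the thesis.

def simt_execute(program, lanes):
    """Run a straight-line predicated program over per-lane register files."""
    mask = [True] * len(lanes)
    for op, *args in program:
        if op == "setp_lt":              # predicate: reg < immediate
            reg, imm = args
            mask = [regs[reg] < imm for regs in lanes]
        elif op == "add":                # dst = a + b on active lanes only
            dst, a, b = args
            for active, regs in zip(mask, lanes):
                if active:               # inactive lanes are masked, not branched
                    regs[dst] = regs[a] + regs[b]
    return lanes
```

For example, four lanes with per-thread ids diverge on the predicate, yet the hardware still issues one instruction stream.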
32

Volume Visualization Using Advanced Graphics Hardware Shaders

XUE, Daqing 12 September 2008 (has links)
No description available.
33

A Scalable Framework for Monte Carlo Simulation Using FPGA-based Hardware Accelerators with Application to SPECT Imaging

Kinsman, Phillip J. 04 1900 (has links)
<p>As the number of transistors that are integrated onto a silicon die continues to increase, compute power is becoming a commodity. This has enabled a whole host of new applications that rely on high-throughput computations. Recently, the need for faster and cost-effective applications in form-factor constrained environments has driven an interest in on-chip acceleration of algorithms based on Monte Carlo simulations. Though Field Programmable Gate Arrays (FPGAs), with hundreds of on-chip arithmetic units, show significant promise for accelerating these embarrassingly parallel simulations, a challenge exists in sharing access to simulation data amongst many concurrent experiments. This thesis presents a compute architecture for accelerating Monte Carlo simulations based on the Network-on-Chip (NoC) paradigm for on-chip communication. We demonstrate, through the complete implementation of a Monte Carlo-based image reconstruction algorithm for Single-Photon Emission Computed Tomography (SPECT) imaging, that this complex problem can be accelerated by two orders of magnitude on even a modestly-sized FPGA over a 2GHz Intel Core 2 Duo processor. Furthermore, we have created a framework for further increasing parallelism by scaling our architecture across multiple compute devices; by extending our original design to a multi-FPGA system, a nearly linear increase in acceleration with logic resources was achieved.</p> / Master of Applied Science (MASc)
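The embarrassingly parallel structure that makes Monte Carlo simulation attractive for NoC-based acceleration can be sketched in a few lines: each compute core runs independent experiments from its own random stream, and only small partial results are combined. The pi estimator below is a stand-in for the SPECT photon-transport kernel, which the abstract does not detail.

```python
import random

def mc_pi_partial(n_samples, seed):
    """One core's share of experiments, with an independent RNG stream."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:          # point landed inside the quarter circle
            hits += 1
    return hits

def mc_pi(total_samples, n_cores):
    """Partition the experiments across cores and merge the partial counts."""
    per_core = total_samples // n_cores
    hits = sum(mc_pi_partial(per_core, seed=core) for core in range(n_cores))
    return 4.0 * hits / (per_core * n_cores)
```

Because each core only reads shared simulation data and emits a scalar, scaling across cores (or FPGAs) is limited mainly by how that shared data is distributed — the problem the NoC addresses.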
34

Enabling the use of Heterogeneous Computing for Bioinformatics

Bijanapalli Chakri, Ramakrishna 02 October 2013 (has links)
The huge amount of information encoded in DNA sequences and increasing interest in uncovering new discoveries have spurred interest in accelerating DNA sequencing and alignment. The use of heterogeneous systems, which combine different types of computational units, has gained traction in high-performance computing in recent years; however, the expertise in multiple domains and the skills required to program these systems hinder bioinformaticians from rapidly deploying their applications on them. This work attempts to make a heterogeneous system, the Convey HC-1, with an x86-based host processor and FPGA-based co-processor, accessible to bioinformaticians. First, a highly efficient dynamic-programming-based Smith-Waterman kernel is implemented in hardware, which is able to achieve a peak throughput of 307.2 Giga Cell Updates per Second (GCUPS) on the Convey HC-1. A dynamic programming accelerator interface is provided to any application that uses Smith-Waterman. This implementation is also extended to General Purpose Graphics Processing Units (GP-GPUs), achieving a peak throughput of 9.89 GCUPS on an NVIDIA GTX580 GPU. Second, a well-known graphical programming tool, LabVIEW, is enabled as a programming tool for the Convey HC-1. A connection is established between the graphical interface and the Convey HC-1 to control and monitor the application running on the FPGA-based co-processor. / Master of Science
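The Smith-Waterman kernel mentioned above is a classic local-alignment dynamic program; hardware implementations parallelize its anti-diagonals, performing one cell update per clock per processing element, which is what the GCUPS figure counts. A minimal scoring-only software reference follows — the thesis's actual scoring parameters and gap model are not given here, so simple linear gap costs are assumed:

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local-alignment score between sequences a and b.

    Fills the DP matrix H row by row; a hardware version instead updates
    an entire anti-diagonal of cells in parallel each clock cycle.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # local alignment clamps at zero: a bad prefix can be discarded
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best
```

A throughput of 307.2 GCUPS means the accelerator completes over 3×10¹¹ of these `H[i][j]` cell updates per second.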
35

Compression temps réel de séquences d'images médicales sur les systèmes embarqués / Real time medical image compression in embedded System

Bai, Yuhui 18 November 2014 (has links)
Dans le domaine des soins de santé, l'imagerie médicale a rapidement progressé et est aujourd'hui largement utilisée pour le diagnostic médical et le traitement du patient. La santé mobile devient une tendance émergente qui fournit des soins de santé et un diagnostic à distance. De plus, à l'aide des télécommunications, les données médicales, incluant l'imagerie médicale et les informations du patient, peuvent être facilement et rapidement partagées entre les hôpitaux et les services de soins de santé. En raison de la grande capacité de stockage et de la bande passante de transmission limitée, une technique de compression efficace est nécessaire. En tant que technique de compression d'image certifiée médicale, WAAVES fournit des taux de compression élevés, tout en assurant une qualité d'image exceptionnelle pour le diagnostic médical. Le défi consiste à transmettre à distance l'image médicale de l'appareil mobile au centre de soins de santé via un réseau à faible bande passante. Nos objectifs sont de proposer une solution de compression d'image embarquée avec une vitesse de compression de 10 Mo/s, tout en maintenant la qualité de compression. Nous examinons d'abord l'algorithme WAAVES et évaluons sa complexité logicielle : un profilage précis du logiciel révèle une complexité très élevée de l'algorithme WAAVES, très difficile à optimiser sous des contraintes très sévères en termes de surface, de temps d'exécution ou de consommation d'énergie. L'un des principaux défis est que les modules Adaptive Scanning et Hierarchical Enumerative Coding de WAAVES prennent plus de 90 % du temps d'exécution. Par conséquent, nous avons exploité plusieurs possibilités d'optimisation de l'algorithme WAAVES pour simplifier sa mise en œuvre matérielle. Nous avons proposé des méthodologies de mise en œuvre possibles de WAAVES, en premier lieu une mise en œuvre logicielle sur plateforme DSP. Ensuite, nous avons réalisé notre implémentation matérielle de WAAVES. 
Comme les FPGA sont largement utilisés pour le prototypage ou la mise en œuvre de systèmes sur puce pour les applications de traitement du signal, leurs capacités de parallélisme massif et leur mémoire sur puce abondante permettent une mise en œuvre efficace, souvent supérieure aux CPU et DSP. Nous avons conçu notre encodeur WAAVES sous forme de SoC basé sur un FPGA Stratix IV d'Altera ; les deux grands blocs coûteux en temps, Adaptive Scanning et Hierarchical Enumerative Coding, sont implémentés comme des accélérateurs matériels. Nous avons réalisé ces accélérateurs avec deux niveaux d'optimisation différents et les avons intégrés dans notre encodeur SoC. La mise en œuvre matérielle, fonctionnant à 100 MHz, fournit des accélérations significatives par rapport aux implémentations logicielles, y compris sur ARM Cortex A9, DSP et CPU, et peut atteindre une vitesse de codage de 10 Mo/s, ce qui répond bien aux objectifs de notre thèse. / In the field of healthcare, developments in medical imaging are progressing very fast, and new technologies are widely used to support patient diagnosis and treatment. Mobile healthcare is becoming an emerging trend, providing remote healthcare and diagnostics. By using telecommunication networks and information technology, medical records, including medical imaging and patient information, can be easily and rapidly shared between hospitals and healthcare services. Due to the large storage size and limited transmission bandwidth, an efficient compression technique is necessary. As a medically certified image compression technique, WAAVES provides a high compression ratio while ensuring outstanding image quality for medical diagnosis. The challenge is to remotely transmit the medical image from the mobile device to the healthcare center over a low-bandwidth network. 
Our goal is to propose a high-speed embedded image compression solution that can provide a compression speed of 10 MB/s while maintaining compression quality equivalent to the software version. We first analyzed the WAAVES encoding algorithm and evaluated its software complexity. Based on precise software profiling, we found that the complexity of the WAAVES algorithm makes it difficult to optimize under hard constraints on area, timing and power consumption. One of the key challenges is that the Adaptive Scanning and Hierarchical Enumerative Coding blocks in WAAVES take more than 90% of the total execution time. Therefore, we exploited several optimization opportunities in the WAAVES algorithm to simplify the hardware implementation. We proposed methodologies for possible implementations of WAAVES, starting from an evaluation of software implementations on DSP platforms; following this evaluation, we carried out our hardware implementation of WAAVES. Since FPGAs are widely used for prototyping and for actual SoC implementations of signal processing applications, their massive parallelism and abundant on-chip memory allow efficient implementations that often rival CPUs and DSPs. We designed our WAAVES encoder SoC based on an Altera Stratix IV FPGA; the two major time-consuming blocks, Adaptive Scanning and Hierarchical Enumerative Coding, are designed as IP accelerators. We realized the IPs with two different optimization levels and integrated them into our encoder SoC. The hardware implementation, running at 100 MHz, provides significant speedup compared to software implementations on ARM Cortex A9, DSP and CPU platforms, and achieves a coding speed of 10 MB/s, fulfilling the goals of this thesis.
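The profiling result above — more than 90% of runtime in the Adaptive Scanning and Hierarchical Enumerative Coding blocks — is exactly why those two blocks are the right accelerator targets: by Amdahl's law, accelerating only them caps the overall speedup at 10x, so both must be pushed hard in hardware to approach the 10 MB/s goal. A quick numeric check:

```python
def amdahl_speedup(hot_fraction, hot_speedup):
    """Overall speedup when only hot_fraction of runtime is accelerated
    by a factor of hot_speedup (Amdahl's law)."""
    return 1.0 / ((1.0 - hot_fraction) + hot_fraction / hot_speedup)

# With 90% of the time in the two blocks, even an infinitely fast
# accelerator yields at most a 10x overall speedup; a 9x accelerator
# already delivers half of that ceiling.
```

The numbers here (0.9 hot fraction) come from the abstract's ">90%" figure; the per-block accelerator speedups are illustrative.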
36

Accélération matérielle pour la traduction dynamique de programmes binaires / Hardware acceleration of dynamic binary translation

Rokicki, Simon 17 December 2018 (has links)
Cette thèse porte sur l’utilisation de techniques d’accélération matérielle pour la conception de processeurs basés sur l’optimisation dynamique de binaires. Dans ce type de machine, les instructions du programme exécuté par le processeur sont traduites et optimisées à la volée par un outil de compilation dynamique intégré au processeur. Ce procédé permet de mieux exploiter les ressources du processeur cible, mais est délicat à mettre en œuvre car le temps de cette recompilation impacte de manière très significative l’effet global de ces optimisations. Dans cette thèse, nous montrons que l’utilisation d’accélérateurs matériels pour certaines étapes clés de cette compilation (construction de la représentation intermédiaire, ordonnancement des instructions) permet de ramener le temps de compilation à des valeurs très faibles (en moyenne 6 cycles par instruction, contre plusieurs centaines dans le cas d’une mise en œuvre classique). Nous avons également montré comment ces techniques peuvent être exploitées pour offrir de meilleurs compromis performance/consommation sur certains types de noyaux de calcul. La thèse a également débouché sur la mise à disposition du compilateur développé auprès de la communauté de recherche. / This thesis is focused on the hardware acceleration of processors based on Dynamic Binary Translation. Such architectures execute binaries by translating and optimizing each instruction at run-time, thanks to a DBT toolchain embedded in the system. This process leads to better resource utilization but also induces execution time overheads, which affect overall performance. During this thesis, we have shown that the use of hardware components to accelerate critical parts of the DBT process (first translation, generation of an intermediate representation, and instruction scheduling) drastically reduces the compilation time (around 6 cycles to schedule one instruction, against several hundred for a fully software DBT). 
We have also demonstrated that the proposed approach enables better energy/performance trade-offs on certain types of compute kernels. Finally, the DBT toolchain is open source and available online.
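The translate-and-cache structure at the heart of a DBT system can be sketched compactly: each guest instruction is translated once (the expensive step this thesis moves into hardware) and the cached result is reused on every subsequent execution. A toy Python model with an invented two-opcode guest ISA, purely for illustration:

```python
# Toy dynamic binary translator: guest ops are translated once into host
# callables, cached, and reused — the translate-then-cache structure a DBT
# processor implements, with translation being the part worth accelerating.
# The guest ISA ("addi", "muli") is invented for this sketch.

TRANSLATIONS = {}

def translate(op):
    """Expensive step in a real DBT; here it just builds a host callable."""
    name, *args = op
    if name == "addi":
        reg, imm = args
        return lambda state: state.__setitem__(reg, state[reg] + imm)
    if name == "muli":
        reg, imm = args
        return lambda state: state.__setitem__(reg, state[reg] * imm)
    raise ValueError(f"unknown guest opcode: {name}")

def execute(program, state):
    """Run a guest program, translating each op on first encounter only."""
    for op in program:
        if op not in TRANSLATIONS:          # translation cache miss
            TRANSLATIONS[op] = translate(op)
        TRANSLATIONS[op](state)
    return state
```

In a loop-heavy workload the cache hit rate approaches 100%, so the amortized cost per executed instruction collapses toward the cost of the cache lookup — which is why cutting first-translation latency from hundreds of cycles to about 6 matters most for cold code.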
37

OPTIMIZATION OF IMAGE GUIDED RADIATION THERAPY USING LIMITED ANGLE PROJECTIONS

Ren, Lei January 2009 (has links)
<p>Digital tomosynthesis (DTS) is a quasi-three-dimensional (3D) imaging technique which reconstructs images from a limited angle of cone-beam projections with shorter acquisition time, lower imaging dose, and less mechanical constraint than full cone-beam CT (CBCT). However, DTS images reconstructed by the conventional filtered back projection method have low plane-to-plane resolution, and they do not provide full volumetric information for target localization due to the limited angle of the DTS acquisition. </p><p>This dissertation presents the optimization and clinical implementation of image guided radiation therapy using limited-angle projections.</p><p>A hybrid multiresolution rigid-body registration technique was developed to automatically register reference DTS images with on-board DTS images to guide patient positioning in radiation therapy. This hybrid registration technique uses a faster but less accurate static method to achieve an initial registration, followed by a slower but more accurate adaptive method to fine-tune the registration. A multiresolution scheme is employed in the registration to further improve the registration accuracy, robustness and efficiency. Normalized mutual information is selected as the criterion for the similarity measure, and the downhill simplex method is used as the search engine. This technique was tested using image data both from an anthropomorphic chest phantom and from head-and-neck cancer patients. The effects of the scan angle and the region-of-interest size on the registration accuracy and robustness were investigated. 
The average capture ranges in single-axis simulations with a 44° scan angle and a large ROI covering the entire DTS volume were between -31 and +34 deg for rotations and between -89 and +78 mm for translations in the phantom study, and between -38 and +38 deg for rotations and between -58 and +65 mm for translations in the patient study.</p><p>Additionally, a novel limited-angle CBCT estimation method using a deformation field map was developed to optimally estimate volumetric information of organ deformation for soft-tissue alignment in image guided radiation therapy. The deformation field map is solved by using prior information, a deformation model, and new projection data. The patient's previous CBCT data are used as the prior information, and the new patient volume to be estimated is considered a deformation of the prior patient volume. The deformation field is solved by minimizing bending energy and maintaining fidelity to the new projection data using a nonlinear conjugate gradient method. The new patient CBCT volume is then obtained by deforming the prior patient CBCT volume according to the solved deformation field. The method was tested for different scan angles in 2D and 3D cases using simulated and real projections of a Shepp-Logan phantom and of liver, prostate and head-and-neck patient data. Hardware acceleration and a multiresolution scheme are used to accelerate the 3D estimation process. The accuracy of the estimation was evaluated by comparing organ volume, similarity and pixel value differences between limited-angle CBCT and full-rotation CBCT images. Results showed that the respiratory motion in the liver patient, the rectum volume change in the prostate patient, and the weight loss and airway volume change in the head-and-neck patient were accurately estimated in the 60° CBCT images. This new estimation method is able to optimally estimate the volumetric information using 60° projection images. 
It is both technically and clinically feasible for image guidance in radiation therapy.</p> / Dissertation
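Normalized mutual information, the similarity criterion chosen for the registration above, can be computed directly from intensity histograms. A minimal sketch assuming discretized, non-constant intensity samples (the dissertation's binning and interpolation choices are not specified here):

```python
from collections import Counter
from math import log2

def entropy(counts, total):
    """Shannon entropy (bits) of a discrete distribution given by counts."""
    return -sum(c / total * log2(c / total) for c in counts.values())

def normalized_mutual_information(img_a, img_b):
    """NMI = (H(A) + H(B)) / H(A, B) over paired intensity samples.

    Ranges from 1.0 (statistically independent images) up to 2.0
    (perfectly aligned identical images). Assumes flattened, equally
    sized, non-constant intensity sequences.
    """
    assert len(img_a) == len(img_b)
    n = len(img_a)
    h_a = entropy(Counter(img_a), n)
    h_b = entropy(Counter(img_b), n)
    h_ab = entropy(Counter(zip(img_a, img_b)), n)   # joint histogram
    return (h_a + h_b) / h_ab
```

During registration, an optimizer such as the downhill simplex method perturbs the rigid-body transform and re-evaluates this score, seeking the pose that maximizes it.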
38

Towards IQ-Appliances: Quality-awareness in Information Virtualization

Niranjan Mysore, Radhika 03 May 2007 (has links)
Our research addresses two important problems that arise in modern large-scale distributed systems: 1. The necessity to virtualize their data flows by applying actions such as filtering, format translation, coalescing or splitting, etc. 2. The desire to separate such actions from application-level logic, to make it easier for future service-oriented codes to inter-operate in diverse and dynamic environments. This research considers the runtimes of the 'information appliances' used for these purposes, particularly with respect to their ability to provide diverse levels of Quality of Service (QoS) in the face of dynamic application behaviors and the consequent changes in the resource needs of their data flows. Our specific contribution is the enrichment of these runtimes with methods for QoS-awareness, thereby giving them the ability to deliver desired levels of QoS even under sudden requirement changes; we term the resulting appliances IQ-appliances. For experimental evaluation, we enrich a prototype implementation of an IQ-appliance, based on the Intel IXP network processor, with the additional functionality needed to guarantee QoS constraints for diverse data streams. Measurements demonstrate the feasibility and utility of the approach. Further, we enhance the Self-Virtualized Network Interface developed in previous work from our group with QoS awareness and demonstrate the importance of such functionality in end-to-end virtualized infrastructures.
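One standard mechanism for the kind of per-flow QoS enforcement described — keeping each data flow within its share of appliance resources even as application behavior changes — is a token bucket per flow. This Python sketch is illustrative only; the IXP-based prototype's actual scheduling mechanism is not described in the abstract:

```python
class TokenBucket:
    """Per-flow rate limiter: a flow may send only while it holds tokens.

    rate controls the sustained throughput granted to the flow; burst
    bounds how far it can momentarily exceed that rate.
    """

    def __init__(self, rate, burst):
        self.rate = rate            # tokens added per unit of time
        self.burst = burst          # bucket capacity (max accumulated credit)
        self.tokens = burst
        self.last = 0.0

    def allow(self, now, cost=1):
        """Return True if the flow may send a packet of the given cost at time now."""
        # refill proportionally to elapsed time, capped at the burst size
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

An IQ-appliance runtime would hold one such bucket per data flow and adjust `rate` as flow-level QoS requirements change, admitting or deferring work accordingly.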
39

FPGA-Based Acceleration of LTE Protocol Decoding

Thelin, William January 2021 (has links)
This work investigates the possibility of accelerating a procedure in 4G/LTE systems known as control channel analysis. The aim is to perform the procedure in real time on cheap and accessible hardware. An LTE decoder implemented in software is modified to perform the procedure. The modified software is analyzed and profiled, the most time-consuming decoding steps are identified, and these steps are implemented in a hardware description language. The results show an acceleration of the most time-consuming steps of almost 50 times compared to the software-only implementation. Furthermore, the resource utilization of the hardware design scales linearly with respect to decode time, so the acceleration can be increased if necessary. However, the profiling and time measurements of the software show that the time requirement is violated by other decoding steps. The thesis concludes that hardware acceleration of the most time-consuming steps is possible; however, to satisfy the time requirement, further decoding steps must be accelerated and/or a faster processor must be used.
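As a concrete flavor of the compute-heavy decoding such work targets: LTE control-channel processing involves Viterbi decoding of a convolutional code, a step whose regular add-compare-select structure maps well to hardware. The sketch below uses a simplified rate-1/2, constraint-length-3 terminated code (LTE itself uses a rate-1/3, constraint-length-7 tail-biting code), purely to illustrate the algorithm:

```python
def conv_encode(bits):
    """Rate-1/2, constraint-length-3 encoder, generators (7, 5) octal."""
    m1 = m2 = 0
    out = []
    for b in bits + [0, 0]:          # two tail bits terminate in state 0
        out.append(b ^ m1 ^ m2)      # generator 111
        out.append(b ^ m2)           # generator 101
        m2, m1 = m1, b
    return out

def viterbi_decode(received, n_bits):
    """Hard-decision Viterbi over the 4-state trellis of the code above."""
    INF = float("inf")
    metrics = [0.0, INF, INF, INF]   # start in the all-zero state
    paths = [[], [], [], []]
    for i in range(n_bits + 2):      # includes the two tail bits
        r0, r1 = received[2 * i], received[2 * i + 1]
        new_metrics = [INF] * 4
        new_paths = [[]] * 4
        for s in range(4):
            if metrics[s] == INF:
                continue
            m1, m2 = s >> 1, s & 1
            for b in (0, 1):
                # add-compare-select: the kernel hardware parallelizes per state
                cost = metrics[s] + ((b ^ m1 ^ m2) != r0) + ((b ^ m2) != r1)
                ns = (b << 1) | m1
                if cost < new_metrics[ns]:
                    new_metrics[ns] = cost
                    new_paths[ns] = paths[s] + [b]
        metrics, paths = new_metrics, new_paths
    return paths[0][:n_bits]         # terminated code ends in state 0; drop tail
```

In hardware, all states' add-compare-select updates run in parallel each cycle, which is the source of the large speedups over a sequential software loop.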
40

Návrh protokolu hardwarového akcelerátoru náročných výpočtů nad více jádry / A Hardware-acceleration Protocol Design for Demanding Computations over Multiple Cores

Bareš, Jan January 2018 (has links)
This work deals with the design of a communication protocol for data transmission between a control computer and computing cores implemented on FPGA chips. The purpose of the communication is to speed up performance-demanding software algorithms for non-stream data processing by computing them in hardware on an accelerating system. The work defines the terminology used for protocol design and analyses current solutions to the given issue. It then presents the design of the structure of the accelerating system and of the communication protocol. In the main part, the work describes the implementation of the protocol in the VHDL language and the simulation of the implemented modules. At the end of the work, the application of the designed solution is presented, along with possible extensions of this work.
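A transmission protocol of this kind typically frames each transfer with a fixed header (addressed core, operation, payload length) and an integrity check. The frame layout below is invented for illustration and is not the protocol designed in the thesis; it shows the host-side packing that would mirror a VHDL deframer:

```python
import struct
import zlib

# Hypothetical frame layout: core id, opcode, sequence number, payload
# length, then the payload and a CRC-32 so the FPGA side can detect
# corruption. Big-endian, fixed-width fields keep the hardware parser simple.
HEADER = struct.Struct(">BBHI")      # core_id, opcode, seq, payload_len

def pack_frame(core_id, opcode, seq, payload: bytes) -> bytes:
    """Build one frame for transmission to a computing core."""
    header = HEADER.pack(core_id, opcode, seq, len(payload))
    crc = zlib.crc32(header + payload)
    return header + payload + struct.pack(">I", crc)

def unpack_frame(frame: bytes):
    """Parse a frame and verify its CRC; raises ValueError on corruption."""
    header = frame[:HEADER.size]
    core_id, opcode, seq, length = HEADER.unpack(header)
    payload = frame[HEADER.size:HEADER.size + length]
    (crc,) = struct.unpack(">I", frame[HEADER.size + length:])
    if zlib.crc32(header + payload) != crc:
        raise ValueError("CRC mismatch")
    return core_id, opcode, seq, payload
```

Fixed-width, byte-aligned fields are a deliberate choice here: a VHDL receiver can latch each field on known byte offsets without variable-length parsing logic.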
