• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 62
  • 37
  • 3
  • 2
  • 2
  • Tagged with
  • 137
  • 137
  • 68
  • 57
  • 57
  • 25
  • 24
  • 24
  • 23
  • 20
  • 20
  • 19
  • 19
  • 18
  • 18
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Design of the Software/Hardware Codesign Platform-IRES

Yeh, Ta-li 20 August 2008 (has links)
High-performance reconfigurable computing has demonstrated its potential to accelerate demanding computational applications. Thus, the current trend is towards combining the microprocessor with the power of reconfigurable hardware in embedded system research area. However, integrating hardware and software that is the interface of communication is challenging. In this thesis, we present a methodology flow to improve the cohesion between hardware and software for reconfigurable embedded system design through IRES (I-link for Reconfigurable Embedded System), Hardware-Software integration platform. In IRES, we set up the platform and produce the Executor through I-link (Hardware-Software Integration Link). The Executor consists of tasks and hardware bitstreams which are provided by user design, bootloader and operation system which are provided by system, and PSPs (Program Segment Prefix) which are from the files given above. We initial the system through bootloader which will scan the PSPs of Executor to construct Task Control Block (TCB), Hardware Control Block (HCB) and Netlist IP Information Block (NIB) data structure. User can get the hardware information from those data structures, and communicate with hardware by using simple functions like ¡§read()¡¨ and ¡§write()¡¨. Then, the system transmits the data to and from multi-hardware through Hardware Management Unit (HMU) which also has data buffering ability. Finally, we successfully accomplish IRES Hardware-Software integration platform in HSCP, which is developed in our laboratory, and verify the feasibility of communication between hardware and software.
2

A Multicore Computing Platform for Benchmarking Dynamic Partial Reconfiguration Based Designs

Thorndike, David Andrew 27 August 2012 (has links)
No description available.
3

Timing-Aware Automatic Floorplanning of Partially Reconfigurable Designs for Accelerating FPGA Productivity

Raja Gopalan, Sureshwar 24 September 2010 (has links)
FPGA implementation tool speed has not kept pace with the increase in design size and FPGA density. It is difficult to parallelize place-and-route algorithms without sacrificing determinism or quality of results. We address the implementation problem using a divide-and-conquer approach. The PATIS automatic floorplanner enables dynamic modular design, which sacrifices some design speed and area optimization for faster implementation of layout changes, including addition of debug logic. Automatic generation of a timing-driven floorplan for a partially reconfigurable design aims to remove the need for implementation iterations to meet all constraints. Floorplan speculation may anticipate small changes to a design. Although PATIS supports incremental design, complete re-implementation is still rapid because the partial bitstream for each block is generated by independent concurrent invocations of the standard Xilinx tools. / Master of Science
4

Accelerating Incremental Floorplanning of Partially Reconfigurable Designs to Improve FPGA Productivity

Chandrasekharan, Athira 17 August 2010 (has links)
FPGA implementation tool turnaround time has unfortunately not kept pace with FPGA density advances. It is difficult to parallelize place-and-route algorithms without sacrificing determinism or quality of results. We approach the problem in a different way for development environments in which some circuit speed and area optimization may be sacrificed for improved implementation and debug turnaround. The PATIS floorplanner enables dynamic modular design, which accelerates non-local changes to the physical layout arising from design exploration and the addition of debug circuitry. We focus in this work on incremental and speculative floorplanning in PATIS, to accommodate minor design changes and to proactively generate possible floorplan variants. Current floorplan topology is preserved to minimize ripple effects and maintain reasonable module aspect ratios. The design modules are run-time reconfigurable to enable concurrent module implementation by independent invocations of the standard FPGA tools running on separate cores or hosts. / Master of Science
5

Efficient reconfigurable architectures for 3D medical image compression

Afandi, Ahmad January 2010 (has links)
Recently, the more widespread use of three-dimensional (3-D) imaging modalities, such as magnetic resonance imaging (MRI), computed tomography (CT), positron emission tomography (PET), and ultrasound (US) have generated a massive amount of volumetric data. These have provided an impetus to the development of other applications, in particular telemedicine and teleradiology. In these fields, medical image compression is important since both efficient storage and transmission of data through high-bandwidth digital communication lines are of crucial importance. Despite their advantages, most 3-D medical imaging algorithms are computationally intensive with matrix transformation as the most fundamental operation involved in the transform-based methods. Therefore, there is a real need for high-performance systems, whilst keeping architectures exible to allow for quick upgradeability with real-time applications. Moreover, in order to obtain efficient solutions for large medical volumes data, an efficient implementation of these operations is of significant importance. Reconfigurable hardware, in the form of field programmable gate arrays (FPGAs) has been proposed as viable system building block in the construction of high-performance systems at an economical price. Consequently, FPGAs seem an ideal candidate to harness and exploit their inherent advantages such as massive parallelism capabilities, multimillion gate counts, and special low-power packages. The key achievements of the work presented in this thesis are summarised as follows. Two architectures for 3-D Haar wavelet transform (HWT) have been proposed based on transpose-based computation and partial reconfiguration suitable for 3-D medical imaging applications. These applications require continuous hardware servicing, and as a result dynamic partial reconfiguration (DPR) has been introduced. Comparative study for both non-partial and partial reconfiguration implementation has shown that DPR offers many advantages and leads to a compelling solution for implementing computationally intensive applications such as 3-D medical image compression. Using DPR, several large systems are mapped to small hardware resources, and the area, power consumption as well as maximum frequency are optimised and improved. Moreover, an FPGA-based architecture of the finite Radon transform (FRAT)with three design strategies has been proposed: direct implementation of pseudo-code with a sequential or pipelined description, and block random access memory (BRAM)- based method. An analysis with various medical imaging modalities has been carried out. Results obtained for image de-noising implementation using FRAT exhibits promising results in reducing Gaussian white noise in medical images. In terms of hardware implementation, promising trade-offs on maximum frequency, throughput and area are also achieved. Furthermore, a novel hardware implementation of 3-D medical image compression system with context-based adaptive variable length coding (CAVLC) has been proposed. An evaluation of the 3-D integer transform (IT) and the discrete wavelet transform (DWT) with lifting scheme (LS) for transform blocks reveal that 3-D IT demonstrates better computational complexity than the 3-D DWT, whilst the 3-D DWT with LS exhibits a lossless compression that is significantly useful for medical image compression. Additionally, an architecture of CAVLC that is capable of compressing high-definition (HD) images in real-time without any buffer between the quantiser and the entropy coder is proposed. Through a judicious parallelisation, promising results have been obtained with limited resources. In summary, this research is tackling the issues of massive 3-D medical volumes data that requires compression as well as hardware implementation to accelerate the slowest operations in the system. Results obtained also reveal a significant achievement in terms of the architecture efficiency and applications performance.
6

A custom coprocessor for matrix algorithms

Amira, A. A. January 2001 (has links)
No description available.
7

OpenCL Framework for a CPU, GPU, and FPGA Platform

Ahmed, Taneem 01 December 2011 (has links)
With the availability of multi-core processors, high capacity FPGAs, and GPUs, a heterogeneous platform with tremendous raw computing capacity can be constructed consisting of any number of these computing elements. However, one of the major challenges for constructing such a platform is the lack of a standardized framework under which an application’s computational task and data can be easily and effectively managed amongst the computing elements. In this thesis work such a framework is developed based on OpenCL (Open Computing Language). An OpenCL API and run time framework, called O4F, was implemented to incorporate FPGAs in a platform with CPUs and GPUs under the OpenCL framework. O4F help explore the possibility of using OpenCL as the framework to incorporate FPGAs with CPUs and GPUs. This thesis details the findings of this first-generation implementation and provides recommendations for future work.
8

OpenCL Framework for a CPU, GPU, and FPGA Platform

Ahmed, Taneem 01 December 2011 (has links)
With the availability of multi-core processors, high capacity FPGAs, and GPUs, a heterogeneous platform with tremendous raw computing capacity can be constructed consisting of any number of these computing elements. However, one of the major challenges for constructing such a platform is the lack of a standardized framework under which an application’s computational task and data can be easily and effectively managed amongst the computing elements. In this thesis work such a framework is developed based on OpenCL (Open Computing Language). An OpenCL API and run time framework, called O4F, was implemented to incorporate FPGAs in a platform with CPUs and GPUs under the OpenCL framework. O4F help explore the possibility of using OpenCL as the framework to incorporate FPGAs with CPUs and GPUs. This thesis details the findings of this first-generation implementation and provides recommendations for future work.
9

Image Processing On Reconfigurable System-on-Chip

Han, Jie Unknown Date (has links)
Real-time image processing requires not only sophisticated heuristic algorithms customized for a particular application, but also needs substantial computational power to handle a massive quantity of input image data. Reconfigurable System-on- Chip (rSoC), a powerful method to harness the power of FPGA technology, is well suited to real-time image processing. It balances the design cost and performance via a combination of hardware and software. However, hardware/software co-design requires specialized design skills, and designs are complex. This thesis investigates how best to use FPGA-based reconfigurable computing to provide efficient speed-up of real-time image processing algorithms. Existing rSoC systems, face detection and recognition algorithms, hardware/software co-design methods are first reviewed and analyzed. The advantages and disadvantages of existing research results are also presented. However, these existing approaches all have shortcomings. A new rSoC system without a separate host machine is presented for standalone embedded platforms. A new hardware/software co-design method including hardware/software communication and partitioning is also explained. This rSoC system is a highly modular system, it runs without a host machine and it supports the Linux operating systems. Hardware and software designs can be rapidly implemented on this new platform. A new method for hardware/software communication in rSoC design is presented, which is based on shared memory and semaphores, and makes hardware coprocessors appear like software processes. Individual processes in hardware-software systems can communicate without knowing whether other co-operating processes are hardware or software. This approach enables re-useable hardware components to be readily accessed by designers, without specialist hardware knowledge. Processes also can be easily swapped between hardware and software. The partitioning method handles the software/hardware partition iteratively during the implementation. The partition is based on experimental profiling, so it is easier to realize and may achieve a more optimal result than a fixed a priori partition. An example face recognition system has been implemented to test the new design method. It is a four-stage pipeline architecture which contains image capture, face detection, image enhancement, and face recognition. Firstly, a software-only solution using semaphores and shared memory method is implemented on a Linux PC. Results of 5.5 frames per second indicate that the speed may not be fast enough for real-time image processing. Secondly, that software-only solution is moved to the new rSoC platform. The performance of 0.1 frames per second is worse than PC platform since the PC’s CPU is much more powerful than the rSoC’s. Finally the new design method is used to move some bottleneck modules to hardware. The new hardware/software communication method is used, so software modules remain unchanged and unaware of the movement of other modules to hardware. Results show that moving only one module to hardware was not helpful. However when both the bottleneck modules were moved to hardware, the system speedup was approximately 200 with a final system speed of 19 frames per second.
10

Reconfigurable cellular automata computing for complex systems on the SPACE machine

George, David Frederick James January 2006 (has links)
Many complex natural and man made systems are inherently concurrent in nature, consisting of many autonomous parts that interact with each other. Cellular automata allow the concurrency and interactions of these complex systems to be modelled. Using a reconfigurable a computing platform for running cellular automata models allows the natural concurrency of digital electronics to be directly exploited by the system being modelled. This thesis investigates methods and philosophies for developing cellular automata models on a reconfigurable computing platform, the SPACE machine. Modelling and verification techniques are developed using a process algebra, Circal. These techniques allow the desired behaviour of a system to be specified and simulated. The model is then translated into a digital design, which can be verified as correct against the behavioural model using the Circal system. Three cellular automata system are used to develop the methods and philosophies. The Game of Life is used to investigate how to model and implement CA on the SPACE machine. The Philosophies and techniques that are developed for the Game of Life are used in the following systems. More complex cellular automata models of road traffic are used to further develop the modelling techniques developed in the Game a Life. A user interface, which was created for viewing the outputs from the Game a Life, is extended to allow cellular automata cells to be dynamically placed and moved about on the computing surface, allowing the user to observe and modify experiment in real time. A cellular automata based cryptography system is then used to further enhance the techniques developed, and particularly to explore the area of producing dynamically reconfigured circuits as the inputs to the system change. The thesis concludes that there are many real life complex systems, such as road traffic simulation and cryptography, which require high performs systems to run on. The methods and philosophies developed in this thesis allow CA systems to be modelled using process algebra and run directly in digital hardware, allowing the natural concurrency of the hardware to be fully exploited.

Page generated in 0.1121 seconds