241

A multi-channel real-time GPS position location system

Parkinson, Kevin James, Surveying & Spatial Information Systems, Faculty of Engineering, UNSW January 2008 (has links)
Since its introduction in the early 1980s, the Global Positioning System (GPS) has become an important worldwide resource. Although the primary use of GPS is for position location, the inherent timing accuracy built into the system has allowed it to become an important synchronisation resource for other systems. In most cases the GPS end user only requires a position estimate without awareness of the timing and synchronisation aspects of the system. A low-accuracy position (at the several-metre level) with a low update rate of about 1 Hz is often acceptable. However, obtaining more accurate position estimates (at the sub-metre level) at higher update rates requires the use of differential correction signals (DGPS) and greater processing power in the receiver. Furthermore, extra challenges arise when simultaneously gathering information from a group of independently moving remote GPS receivers (rovers) at increased sampling rates (10 Hz). This creates the need for a high-bandwidth telemetry system and techniques to synchronise the position measurements for tracking each rover. This thesis investigates and develops an overall solution to these problems using GPS for both position location and synchronisation. A system is designed to generate relative position information from 30 or more rovers in real-time. The important contributions of this research are as follows: a) A GPS-synchronised telemetry system is developed to transport GPS data from each rover. Proof-of-concept experiments show why a conventional RF Local Area Network (LAN) is not suitable for this application. The new telemetry system is developed using Field Programmable Gate Array (FPGA) devices to embed both the synchronising logic and the central processor. b) A new system architecture is developed to reduce the processing load of the GPS receiver. Furthermore, the need to transfer the DGPS correction data to the rover is eliminated. Instead, the receiver raw data is processed in a centralised Kalman filter to produce multiple position estimates in real-time. c) Steps are taken to optimise the telemetry data stream by using only the essential data from each rover. A custom protocol is developed to deliver the GPS receiver raw data to the central point with minimal latency. The central software is designed to extract and manage common elements, such as satellite ephemeris data, from the central reference receiver only. d) Methods are developed to make the overall system more robust by identifying and understanding the points of failure, providing fallback options to allow recovery with minimal impact. Based on the above, a system is designed and integrated using a mixture of custom hardware, custom software and off-the-shelf hardware. Overall tests show that efforts to minimise latency, minimise power requirements and improve reliability have delivered good results.
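The abstract notes that DGPS corrections are no longer sent to the rovers; instead, raw rover data is combined centrally with data from a reference receiver. As a rough illustration of why this works, the sketch below shows the textbook differential-correction idea, in which errors common to a reference receiver and a nearby rover cancel. It is a hedged toy example with invented positions, bias value and function names, not the thesis's actual processing chain or Kalman filter.

```python
import numpy as np

# A minimal, hypothetical sketch of the DGPS principle: a reference receiver at a
# surveyed position and a nearby rover see (almost) the same satellite clock,
# ephemeris and atmospheric errors, so a correction computed at the reference
# cancels the common bias in the rover's measurement. All values are invented.

def pseudorange_correction(ref_pos, sat_pos, measured_pr_ref):
    """Correction = true geometric range from the surveyed reference antenna
    minus the pseudorange the reference receiver actually measured."""
    return np.linalg.norm(sat_pos - ref_pos) - measured_pr_ref

def apply_correction(measured_pr_rover, correction):
    # Applied to the rover's pseudorange for the same satellite at the same
    # GPS-synchronised epoch.
    return measured_pr_rover + correction

# Toy geometry in metres (ECEF-like, but arbitrary).
sat_pos   = np.array([-1.5e7, 1.0e7, 2.0e7])
ref_pos   = np.array([-4.6e6, 2.6e6, -3.5e6])
rover_pos = np.array([-4.6e6 + 500.0, 2.6e6, -3.5e6])   # rover 500 m from the reference
common_bias = 4.2                                        # metres of shared error

pr_ref   = np.linalg.norm(sat_pos - ref_pos) + common_bias
pr_rover = np.linalg.norm(sat_pos - rover_pos) + common_bias

corr = pseudorange_correction(ref_pos, sat_pos, pr_ref)
corrected = apply_correction(pr_rover, corr)
print("residual error after correction (m):",
      corrected - np.linalg.norm(sat_pos - rover_pos))   # ~0
```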
242

Customization of floating-point units for embedded systems and field programmable gate arrays

Chong, Michael Yee Jern, Computer Science & Engineering, Faculty of Engineering, UNSW January 2009 (has links)
While Application Specific Instruction Set Processors (ASIPs) have allowed designers to create processors with custom instructions to target specific applications, floating-point units (FPUs) are still instantiated as non-customizable general-purpose units, which, if underutilized, waste area and degrade performance. However, customizing FPUs manually is a complex and time-consuming process. Therefore, there is a need for an automated custom FPU generation scheme. This thesis presents a methodology for generating application-specific FPUs customized at the instruction level, with integrated datapath merging to minimize area. The methodology reduces the subset of floating-point instructions implemented to the minimum required for the application. Datapath merging is then performed on the required datapaths to minimize area. Previous datapath merging techniques failed to consider merging components of different bit-widths and thus ignored the bit-alignment problem in datapath merging. This thesis presents a novel bit-alignment solution during datapath merging. In creating the custom FPU, the subset of floating-point instructions that should be implemented in hardware has to be determined. Implementing more instructions in hardware reduces the cycle count of the application, but may lead to increased delay due to multiplexers inserted on the critical path during datapath merging. A rapid design space exploration was performed to explore the trade-offs. By performing this exploration, a designer can determine the number of instructions that should be implemented in the custom FPU and the number that should be left to software emulation, such that performance and area meet the designer's requirements. Customized FPUs were generated for different Mediabench applications and compared to a fully-featured reference FPU that implemented all floating-point operations. Reducing the floating-point instruction set reduced the FPU area by an average of 55%. Performing instruction reduction and then datapath merging reduced the FPU area by an average of 68%. Experiments showed that datapath merging without bit-alignment achieved an average area reduction of 10.1%; with bit-alignment, an average of 16.5% was achieved. Bit-alignment proved most beneficial when there was a diverse mix of bit-widths in the datapaths. Performance of Field-Programmable Gate Arrays (FPGAs) used for floating-point applications is poor due to the complexity of floating-point arithmetic, and implementing floating-point units on FPGAs consumes a large amount of resources. Therefore, there is a need for embedded FPUs in FPGAs. However, if unutilized, they waste area on the FPGA die. To overcome this issue, a novel flexible multi-mode embedded FPU for FPGAs is presented in this thesis that can be configured to perform a wide range of operations. The floating-point adder and multiplier in the embedded FPU can each be configured to perform one double-precision operation or two single-precision operations in parallel. To increase flexibility further, access to the large integer multiplier, adder and shifters in the FPU is provided. It is also capable of floating-point and integer multiply-add operations. Benchmark circuits were implemented on both a standard Xilinx Virtex-II FPGA and on the FPGA with embedded FPU blocks. The implementations on the FPGA with embedded FPUs showed mean area and delay improvements of 5.2x and 5.8x respectively for the double-precision benchmarks, and 4.4x and 4.2x for the single-precision benchmarks.
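The instruction-level trade-off described here, hardware coverage versus area, can be prototyped in a few lines: the sketch below greedily selects floating-point operations for hardware by cycles-saved-per-area ratio under an area budget, leaving the rest to software emulation. The profile counts, operation names and cost figures are all invented for illustration; the thesis's actual exploration and datapath-merging algorithms are not reproduced.

```python
# Hypothetical sketch of the FPU instruction-subset trade-off: each floating-point
# operation the application uses is either implemented in the custom FPU (costing
# area, saving cycles) or left to software emulation. All numbers are invented.

profile = {            # op -> (dynamic count, cycles saved per op in HW, area cost)
    "fadd":  (1_200_000,  40,  900),
    "fmul":  (  950_000,  55, 1400),
    "fdiv":  (   30_000, 300, 2600),
    "fsqrt": (    2_000, 350, 2100),
    "f2i":   (  400_000,  25,  300),
}

def explore(area_budget):
    """Greedy exploration: pick ops with the best cycles-saved-per-area ratio
    until the area budget is exhausted; everything else stays in software."""
    ranked = sorted(profile.items(),
                    key=lambda kv: kv[1][0] * kv[1][1] / kv[1][2],
                    reverse=True)
    chosen, area, saved = [], 0, 0
    for op, (count, cyc, cost) in ranked:
        if area + cost <= area_budget:
            chosen.append(op)
            area += cost
            saved += count * cyc
    return chosen, area, saved

for budget in (1000, 3000, 6000):
    ops, area, saved = explore(budget)
    print(f"budget={budget:5}: hw ops={ops}, area={area}, cycles saved={saved}")
```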
243

Interactive Online Laboratories

Gang Wang Unknown Date (has links)
No description available.
244

Enabling Gigabit IP for Embedded Systems

Tsakiris, Nicholas, n.tsakiris@internode.on.net January 2009 (has links)
For any practical chip design, a hardware platform needs to be available for prototyping and implementing FPGA-based designs, whether they are written in VHDL or Verilog. Communication between the platform and a computer is a useful feature of many hardware solutions, as it allows regular data transmission between the two devices. Furthermore, communicating between the platform and a computer at high speed requires a specially constructed interface, one that the designer can modify as needed. There are a number of commercial packages which provide a hardware platform to perform this task; however, there are drawbacks to many of the available options. Some may require special hardware to connect to a computer using proprietary connectors or boards, which increases the cost and reduces the flexibility of any solution. Other options may have limited access to the internal structure of the interface, limiting the ability of the developer to modify the interface to suit their needs. There may also be an extra cost for the interface code, separate from the board, which can further tax design budgets. This dissertation provides a solution in the form of a Gigabit Ethernet connection with a custom IP/network layer written in VHDL to facilitate the connection. With an increasing number of IP-enabled devices available, such as IPTV receivers and set-top boxes, the ability to link hardware using Ethernet is very useful, so the development of a lean and capable network layer was considered a suitable focus for the project. The overall goal has been to provide an interface which is cheap, open, robust and efficient, retaining the flexibility a developer might require to modify the code to their needs. After covering some basic background information about the project, the dissertation looks at the requirements of the board and interface, as well as the alternative interface solutions which were examined before deciding on Gigabit Ethernet. The protocols used in Ethernet are then covered, with an explanation of the structure of each and their relevance to the implementation. The Finite State Machines which control operation of the interface are covered in depth, with an explanation of their inter-connectivity and how they fit into the data flow between the computer and the board. Error correction and reliability are discussed, as well as the remaining components critical to the operation of the interface. Pipelining, the design method which provides the speed required for Gigabit Ethernet, is covered along with the extra speed-optimisation techniques used in the design, such as RAM swinging buffers. Testing and synthesis are covered, which ensure the design is as robust as possible, both in simulation and in real-world applications. The final design was implemented on a Xilinx Spartan 3 FPGA (XC3S5000-5FG900C) and is capable of a maximum clock speed of 128.287 MHz, which is more than enough to satisfy the requirements of Gigabit Ethernet under a variety of network conditions. The interface code occupies 1,166 slices of logic on the FPGA (3% of the total logic available), making it sufficiently compact to run large projects on the same chip. The core was tested on physical hardware and performed correctly at real line Gigabit speeds.
The configuration of the computer, along with the method of connecting to the board and transferring data, is described, with an explanation of the code run on the computer to make this possible. Finally, the dissertation provides an example application through the use of JPEG2000 image compression/decompression.
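As an illustration of the host side of such an interface, the following is a minimal sketch of streaming sequence-numbered UDP datagrams to a board over Gigabit Ethernet. The address, port, payload size and framing are assumptions made for the example; they are not the custom protocol developed in the dissertation.

```python
import socket

# Hypothetical host-side sketch of streaming data to an FPGA board over Gigabit
# Ethernet with a lightweight UDP-based framing. The IP address, port and header
# layout are invented for illustration.

BOARD_ADDR = ("192.168.1.10", 50000)   # assumed address of the FPGA interface
PAYLOAD = 1024                          # bytes of data per frame (assumption)

def send_block(sock, data: bytes):
    """Split a block into sequence-numbered UDP datagrams so the receiver
    can detect loss or reordering with a simple counter check."""
    for seq, off in enumerate(range(0, len(data), PAYLOAD)):
        header = seq.to_bytes(4, "big")             # 32-bit sequence number
        sock.sendto(header + data[off:off + PAYLOAD], BOARD_ADDR)

if __name__ == "__main__":
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    send_block(sock, bytes(range(256)) * 64)        # 16 KiB of test data
    sock.close()
```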
247

Image Processing On Reconfigurable System-on-Chip

Han, Jie Unknown Date (has links)
Real-time image processing requires not only sophisticated heuristic algorithms customized for a particular application, but also substantial computational power to handle a massive quantity of input image data. Reconfigurable System-on-Chip (rSoC), a powerful way to harness FPGA technology, is well suited to real-time image processing: it balances design cost and performance via a combination of hardware and software. However, hardware/software co-design requires specialized design skills, and designs are complex. This thesis investigates how best to use FPGA-based reconfigurable computing to provide efficient speed-up of real-time image processing algorithms. Existing rSoC systems, face detection and recognition algorithms, and hardware/software co-design methods are first reviewed and analyzed, and the advantages and disadvantages of existing research results are presented. These existing approaches all have shortcomings. A new rSoC system without a separate host machine is presented for standalone embedded platforms, together with a new hardware/software co-design method covering hardware/software communication and partitioning. The rSoC system is highly modular, runs without a host machine and supports the Linux operating system; hardware and software designs can be rapidly implemented on this new platform. A new method for hardware/software communication in rSoC design is presented, which is based on shared memory and semaphores and makes hardware coprocessors appear like software processes. Individual processes in hardware-software systems can communicate without knowing whether other co-operating processes are hardware or software. This approach enables re-usable hardware components to be readily accessed by designers without specialist hardware knowledge, and processes can easily be swapped between hardware and software. The partitioning method handles the software/hardware partition iteratively during implementation. The partition is based on experimental profiling, so it is easier to realize and may achieve a better result than a fixed a priori partition. An example face recognition system has been implemented to test the new design method. It is a four-stage pipeline architecture comprising image capture, face detection, image enhancement and face recognition. First, a software-only solution using the semaphore and shared-memory method was implemented on a Linux PC; the result of 5.5 frames per second indicates that this may not be fast enough for real-time image processing. Second, the software-only solution was moved to the new rSoC platform; its performance of 0.1 frames per second is worse than on the PC platform, since the PC's CPU is much more powerful than the rSoC's. Finally, the new design method was used to move the bottleneck modules to hardware. Because the new hardware/software communication method was used, the software modules remained unchanged and unaware of the movement of other modules to hardware. Results show that moving only one module to hardware was not helpful; however, when both bottleneck modules were moved to hardware, the system speedup was approximately 200, with a final system speed of 19 frames per second.
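The shared-memory-and-semaphore communication model described above can be illustrated in software alone: in the sketch below, two co-operating stages exchange a frame buffer guarded by semaphores, and neither needs to know whether its peer is a software process or a hardware coprocessor mapped behind the same buffer. The stage names, buffer size and data are invented for illustration; this is not the thesis's implementation.

```python
from multiprocessing import Process, Semaphore, Array

FRAME = 64  # toy "image" size

def capture(buf, full, empty, n_frames):
    # Producer stage: writes a frame into shared memory when the buffer is free.
    for f in range(n_frames):
        empty.acquire()
        for i in range(FRAME):
            buf[i] = f                      # pretend to capture pixel data
        full.release()

def detect(buf, full, empty, n_frames):
    # Consumer stage: reads the frame once the producer signals it is ready.
    for _ in range(n_frames):
        full.acquire()
        checksum = sum(buf)                 # pretend to run face detection
        empty.release()
        print("processed frame, checksum", checksum)

if __name__ == "__main__":
    buf   = Array('i', FRAME, lock=False)   # shared frame buffer
    full  = Semaphore(0)                    # frame ready for the consumer
    empty = Semaphore(1)                    # buffer free for the producer
    p1 = Process(target=capture, args=(buf, full, empty, 3))
    p2 = Process(target=detect,  args=(buf, full, empty, 3))
    p1.start(); p2.start()
    p1.join();  p2.join()
```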
248

PMEMD-HW: simulação por dinâmica molecular usando hardware reconfigurável

Mohr, Adilson Arthur January 2010 (has links)
Molecular dynamics systems are defined by the position and energy of their component particles, as well as by the interactions among these. Such systems can be simulated through mathematical methods such as the computation of electrostatic forces based on Coulomb's law. Predicting the states through which such a system evolves by computing the interaction of each particle with its neighbours is a computationally costly task, even for a small number of particles. Thus, it can only be beneficial to apply specific techniques for accelerating these computations. While some studies propose the use of new algorithms, others advocate the use of specific processors or custom-designed hardware, the latter being the technique employed in this dissertation. This work describes the design and prototyping of a hardware architecture that has the potential to accelerate an application based on the computation of electrostatic forces among non-bonded particles. Special emphasis is given to the aspects of integration between the accelerating hardware and the modified target application, the PMEMD (Particle Mesh Ewald Molecular Dynamics) software, part of the AMBER (Assisted Model Building with Energy Refinement) platform. The costliest computations of PMEMD were identified and moved to an FPGA hardware implementation, creating a custom coprocessor, PMEMD-HW. The choice of reconfigurable hardware is due, among other reasons, to the ease with which it enables the evolution of the design towards the target acceleration. The main contribution of this work is the mastering of the technology to design and analyse hardware coprocessors that target the acceleration of applications in Biology and Biophysics. A working prototype is available, using a commercial hardware prototyping platform. The proof-of-concept implementation demonstrates the viability of successfully using the proposed techniques.
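To make the cost of the non-bonded electrostatics concrete, the sketch below implements the naive O(N²) Coulomb force sum that motivates hardware acceleration. It is a toy model with arbitrary units (Coulomb constant set to 1) and does not reflect PMEMD's actual Particle Mesh Ewald machinery, cut-offs or periodic boundary conditions.

```python
import numpy as np

# Minimal sketch of the O(N^2) non-bonded electrostatics kernel: the Coulomb
# force on each particle from every other particle. Values are arbitrary.

def coulomb_forces(pos, charge):
    n = len(pos)
    forces = np.zeros_like(pos)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            r = pos[i] - pos[j]
            d = np.linalg.norm(r)
            forces[i] += charge[i] * charge[j] * r / d**3   # k_e = 1
    return forces

rng = np.random.default_rng(0)
pos = rng.random((200, 3))                      # 200 random particles in a unit box
charge = rng.choice([-1.0, 1.0], size=200)
f = coulomb_forces(pos, charge)
print("net force on particle 0:", f[0])
```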
249

Multidimensional DFT IP Generators for FPGA Platforms

January 2012 (has links)
Multidimensional (MD) discrete Fourier transform (DFT) is a key kernel algorithm in many signal processing applications, such as radar imaging and medical imaging. Traditionally, a two-dimensional (2-D) DFT is computed using Row-Column (RC) decomposition, where one-dimensional (1-D) DFTs are computed along the rows followed by 1-D DFTs along the columns. However, architectures based on RC decomposition are not efficient for large input sizes, where the data have to be stored in external memories based on Synchronous Dynamic RAM (SDRAM). In this dissertation, an efficient architecture to implement the 2-D DFT for large-sized input data is first proposed. This architecture achieves very high throughput by exploiting the inherent parallelism due to a novel 2-D decomposition and by utilizing the row-wise burst access pattern of the SDRAM external memory. In addition, an automatic IP generator is provided for mapping this architecture onto a reconfigurable platform of Xilinx Virtex-5 devices. For a 2048x2048 input size, the proposed architecture is 1.96 times faster than an RC-decomposition-based implementation under the same memory constraints, and also outperforms other existing implementations. While the proposed 2-D DFT IP can achieve high performance, its output is bit-reversed. For systems where the output is required to be in natural order, use of this DFT IP would result in timing overhead. To solve this problem, a new bandwidth-efficient MD DFT IP that is transpose-free and produces outputs in natural order is proposed. It is based on a novel decomposition algorithm that takes into account the output order, FPGA resources, and the characteristics of off-chip memory access. An IP generator is designed and integrated into an in-house FPGA development platform, AlgoFLEX, for easy verification and fast integration. The corresponding 2-D and 3-D DFT architectures are ported onto the BEE3 board and their performance measured and analyzed. The results show that the architecture can maintain the maximum memory bandwidth throughout the whole procedure while avoiding the matrix transpose operations used in most other MD DFT implementations. The proposed architecture has also been ported onto the Xilinx ML605 board. When clocked at 100 MHz, 2048x2048 images with complex single-precision data can be processed in less than 27 ms. Finally, transpose-free imaging flows for the range-Doppler algorithm (RDA) and chirp-scaling algorithm (CSA) in SAR imaging are proposed. The corresponding implementations take advantage of the memory access patterns designed for the MD DFT IP and have superior timing performance. The RDA and CSA flows are mapped onto a unified architecture which is implemented on an FPGA platform. When clocked at 100 MHz, the RDA and CSA computations with data size 4096x4096 can be completed in 323 ms and 162 ms, respectively. This implementation outperforms existing SAR image accelerators based on FPGA and GPU. / Dissertation/Thesis / Ph.D. Electrical Engineering 2012
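The Row-Column decomposition mentioned above can be checked numerically in a few lines; the sketch below confirms that row-wise 1-D DFTs followed by column-wise 1-D DFTs reproduce the direct 2-D DFT. This only illustrates RC decomposition itself; the dissertation's SDRAM-aware decomposition is not reproduced here.

```python
import numpy as np

# Row-Column (RC) decomposition: a 2-D DFT computed as 1-D DFTs along the rows
# followed by 1-D DFTs along the columns. Numerical illustration only.

def dft2_row_column(x):
    rows_done = np.fft.fft(x, axis=1)     # 1-D DFTs along each row
    return np.fft.fft(rows_done, axis=0)  # then 1-D DFTs along each column

x = np.random.default_rng(1).random((8, 8)) + 0j
assert np.allclose(dft2_row_column(x), np.fft.fft2(x))
print("RC decomposition matches the direct 2-D DFT")
```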
250

Circuitos assíncronos na plataforma FPGA

Mocho, Renato Ubiratan Reis January 2006 (has links)
Digital circuits face ever-increasing demands for performance and modularity in today's applications. To meet these demands, industry relies heavily on synchronous digital circuits, whose timing is controlled by a central clock. Although these circuits are easy to implement and follow a well-established design methodology, they show limitations when the distribution of the synchronisation signals, interference from the environment and possible delays are considered. Asynchronous circuits offer a natural answer to these requirements, since they are independent of a clock signal and their construction is entirely modular. This work presents a comparative study of several design styles for building asynchronous circuits on programmable logic devices (PLDs), using commercial logic synthesis tools intended for synchronous circuits. The asynchronous circuits are described in VHDL, covering Muller C-elements, M-of-N elements, an asynchronous register, adders and more complex asynchronous ring circuits, and are implemented on CPLDs and FPGAs. The more complex adder circuits are built in four design styles: behavioural description with strong indication, DIMS, NCL and derivation from a synchronous combinational circuit. This evaluation made it possible to assess the trends in programmable-element cost and computation delay relative to similar synchronous circuits. / This work presents a study of the implementation of asynchronous circuits on programmable-device platforms. It investigates four different ways of implementing asynchronous circuits, including the implementation of several different circuits on platforms provided by three different manufacturers. The implemented asynchronous circuits have very poor performance when compared to their synchronous counterparts; however, this was expected, as the platforms used were developed for synchronous designs. The contributions of this work are in the following areas. First, it is described in detail how to write VHDL code for self-timed designs. Second, different designs were implemented to test the VHDL descriptions on the chosen platforms. Third, by comparing four different asynchronous styles, it is possible to find the style that is most adequate for use in current FPGAs. Fourth, by analyzing the results obtained, it was possible to draw some conclusions on why asynchronous designs are so costly on these platforms and to derive some suggestions for the implementation of asynchronous FPGAs.
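For intuition, the behaviour of the Muller C-element named in the abstract can be modelled in a few lines of software: the output follows the inputs when they agree and holds its previous value otherwise. This is a behavioural toy model only, not the VHDL descriptions evaluated in the thesis.

```python
# Behavioural model of a two-input Muller C-element.

def c_element(a, b, prev):
    # Output copies the inputs when they agree; otherwise it holds its state.
    return a if a == b else prev

out = 0
for a, b in [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]:
    out = c_element(a, b, out)
    print(f"a={a} b={b} -> out={out}")
# Expected trace: 0, 0 (hold), 1, 1 (hold), 0
```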
