Global ETD Search

1	Heterogeneous Processing in Software Defined Radio: Flexible Implementation and Optimal Resource Mapping Bieberly, Frank 05 April 2012 (has links) The advantages provided by Software Defined Radios (SDRs) have made them useful tools for communication engineers and academics alike. The ability to support a wide range of communication waveforms with varying modulation, encoding, or frequencies on a single hardware platform can decrease production costs while accelerating wave-form development. SDR applications are expanding in military and commercial environments as advances in transistor technology allow greater computational density with decreased power-consumption, size, and weight. As the demand for greater performance continues to increase, some SDR manufacturers are experimenting with heterogeneous processing platforms to meet these requirements. Heterogeneous processing, a method of dividing computational tasks among dissimilar processors, is well-suited to the data flow programming paradigm used in many common SDR software frameworks. Particularly on embedded platforms, heterogeneous processing can offer significant gains in computational power while maintaining low power-consumption, opening the door for affordable and useful mobile SDR platforms. Many past SDR hardware implementations utilize a partially heterogeneous processing approach. A field programmable gate array (FPGA) is often used to perform high-speed processing (DDC, decimation) near the radio front-end while another processor (GPP, DSP or FPGA) performs the rest of the SDR application signal processing (gain control, filtering, demodulation). A few recent SDR hardware platforms are designed to allow the use of multiple processor types throughout the SDR application's processing chain. This can result in significant benefit to SDR software that can take advantage of the greater heterogeneous processing now available. This thesis will present a new method of heterogeneous processing in the framework of GNU Radio. In this implementation a software wrapper allows a DSP to participate seamlessly in GNU Radio applications. The DSP can be directly substituted for existing GNU Radio signal processing blocks—significantly expanding the platform's capabilities while maintaining the benefits of the component-based design methodology. A similar approach could be applied to additional processing elements (e.g. FPGAs and co-processors) and to other SDR software frameworks. As the capabilities of this heterogeneous framework increase users will be required to assign hardware resources to signal processing tasks to maximize performance. To remove this burden, a method of predicting GNU Radio application performance and a heuristic resource mapping algorithm, which seems to perform well in practice, are presented. / Master of Science oftware Defined Radio Heterogeneous Processing Resource Mapping
2	Metodologia e ferramentas para paralelização de laços perfeitamente aninhados com processamento heterogêneo. / Methodology and tools for parallelization of nested perfectly loops with heterogeneous processing. Luz, Cleber Silva Ferreira da 01 February 2018 (has links) Aplicações podem apresentar laços perfeitamente aninhados que demandam um alto poder de processamento. Diversas aplicações científicas contêm laços aninhados em suas estruturas. Tais laços podem processar computações heterogêneas. Uma solução para reduzir o tempo de execução desta classe de aplicações é a paralelização destes laços. A heterogeneidade dos tempos de execução de computações presentes nas iterações de laços perfeitamente aninhados demanda uma paralelização adequada visando uma distribuição de carga homogênea entre os recursos computacionais para reduzir a ociosidade de tais recursos. Esta heterogeneidade implica em um número ideal de recursos computacionais a partir do qual, o seu aumento não impactaria no ganho de desempenho, uma vez que, o tempo mínimo possível é o tempo de execução da tarefa que consome o maior tempo de processamento. Neste trabalho é proposta uma metodologia e ferramentas para paralelização de laços perfeitamente aninhados sem dependência de dados e com processamento heterogêneo em sistemas paralelos e distribuídos. A implementação da metodologia proposta em aplicações melhora o desempenho da execução e reduz a ociosidade dos recursos de processamento. Na metodologia proposta, alguns procedimentos são apoiados por ferramentas desenvolvidas para auxiliá-los. O sistema de processamento poderá ser: um computador Multicore, um Cluster real ou virtual alocado na nuvem. Resultados experimentais são apresentados neste trabalho. Tais resultados mostram a viabilidade e eficiência da metodologia proposta. / Applications may have nested perfectly loops that require a high processing power. Various scientific applications contain nested loops in their structures. Such loops can process heterogeneous computations. A solution to reduce the execution time of this class of applications is the parallelization of these loops. The heterogeneity of the execution times of computations present in the iterations of nested perfectly loops demands an adequate parallelization aiming at a homogeneous load distribution among the computational resources to reduce the idleness of such resources. This heterogeneity implies an ideal number of computational resources which, its increase would not impact the performance gain, since the minimum possible time is the execution time of the task that consumes the longest processing time. In this work is proposed a methodology and tools for parallelization of loops perfectly nested with heterogeneous processing in parallel and distributed systems. The implementation of proposed methodology in application improves execution performance and reduce idles of the processing resources. In the methodology proposed, some procedures are supported by tools developed to assist them. The processing system can be: a computer multicore, a cluster real or virtual allocated in cloud. Experimental results are presented in this work. These results show the feasibility and efficiency of the proposed methodology. Computação distribuída Distribution computation Nested loops Parallel computation Programação concorrente Programação paralela
3	Efficiency of CNN on Heterogeneous Processing Devices Ringenson, Josefin January 2019 (has links) In the development of advanced driver assistance systems, computer vision problemsneed to be optimized to run efficiently on embedded platforms. Convolutional neural network(CNN) accelerators have proven to be very efficient for embedded camera platforms,such as the ones used for automotive vision systems. Therefore, the focus of this thesisis to evaluate the efficiency of a CNN on a future embedded heterogeneous processingdevice. The memory size in an embedded system is often very limited, and it is necessary todivide the input into multiple tiles. In addition, there are power and speed constraintsthat needs to be met to be able to use a computer vision system in a car. To increaseefficiency and optimize the memory usage, different methods for CNN layer fusion areproposed and evaluated for a variety of tile sizes. Several different layer fusion methods and input tile sizes are chosen as optimal solutions,depending on the depth of the layers in the CNN. The solutions investigated inthe thesis are most efficient for deep CNN layers, where the number of channels is high. CNN Accelerator Convolution Heterogeneous Processing Device AI Engine FPGA Hardware Architecture Automotive Security Computer Vision Layer Fusion Computer Systems Datorsystem Embedded Systems Inbäddad systemteknik
4	Metodologia e ferramentas para paralelização de laços perfeitamente aninhados com processamento heterogêneo. / Methodology and tools for parallelization of nested perfectly loops with heterogeneous processing. Cleber Silva Ferreira da Luz 01 February 2018 (has links) Aplicações podem apresentar laços perfeitamente aninhados que demandam um alto poder de processamento. Diversas aplicações científicas contêm laços aninhados em suas estruturas. Tais laços podem processar computações heterogêneas. Uma solução para reduzir o tempo de execução desta classe de aplicações é a paralelização destes laços. A heterogeneidade dos tempos de execução de computações presentes nas iterações de laços perfeitamente aninhados demanda uma paralelização adequada visando uma distribuição de carga homogênea entre os recursos computacionais para reduzir a ociosidade de tais recursos. Esta heterogeneidade implica em um número ideal de recursos computacionais a partir do qual, o seu aumento não impactaria no ganho de desempenho, uma vez que, o tempo mínimo possível é o tempo de execução da tarefa que consome o maior tempo de processamento. Neste trabalho é proposta uma metodologia e ferramentas para paralelização de laços perfeitamente aninhados sem dependência de dados e com processamento heterogêneo em sistemas paralelos e distribuídos. A implementação da metodologia proposta em aplicações melhora o desempenho da execução e reduz a ociosidade dos recursos de processamento. Na metodologia proposta, alguns procedimentos são apoiados por ferramentas desenvolvidas para auxiliá-los. O sistema de processamento poderá ser: um computador Multicore, um Cluster real ou virtual alocado na nuvem. Resultados experimentais são apresentados neste trabalho. Tais resultados mostram a viabilidade e eficiência da metodologia proposta. / Applications may have nested perfectly loops that require a high processing power. Various scientific applications contain nested loops in their structures. Such loops can process heterogeneous computations. A solution to reduce the execution time of this class of applications is the parallelization of these loops. The heterogeneity of the execution times of computations present in the iterations of nested perfectly loops demands an adequate parallelization aiming at a homogeneous load distribution among the computational resources to reduce the idleness of such resources. This heterogeneity implies an ideal number of computational resources which, its increase would not impact the performance gain, since the minimum possible time is the execution time of the task that consumes the longest processing time. In this work is proposed a methodology and tools for parallelization of loops perfectly nested with heterogeneous processing in parallel and distributed systems. The implementation of proposed methodology in application improves execution performance and reduce idles of the processing resources. In the methodology proposed, some procedures are supported by tools developed to assist them. The processing system can be: a computer multicore, a cluster real or virtual allocated in cloud. Experimental results are presented in this work. These results show the feasibility and efficiency of the proposed methodology. Computação distribuída Programação concorrente Programação paralela Distribution computation Nested loops Parallel computation
5	Efficient FPGA SoC Processing Design for a Small UAV Radar Newmeyer, Luke Oliver 01 April 2018 (has links) Modern radar technology relies heavily on digital signal processing. As radar technology pushes the boundaries of miniaturization, computational systems must be developed to support the processing demand. One particular application for small radar technology is in modern drone systems. Many drone applications are currently inhibited by safety concerns of autonomous vehicles navigating shared airspace. Research in radar based Detect and Avoid (DAA) attempts to address these concerns by using radar to detect nearby aircraft and choosing an alternative flight path. Implementation of radar on small Unmanned Air Vehicles (UAV), however, requires a lightweight and power efficient design. Likewise, the radar processing system must also be small and efficient.This thesis presents the design of the processing system for a small Frequency Modulated Continuous Wave (FMCW) phased array radar. The radar and processing is designed to be light-weight and low-power in order to fly onboard a UAV less than 25 kg in weight. The radar algorithms for this design include a parallelized Fast Fourier Transform (FFT), cross correlation, and beamforming. Target detection algorithms are also implemented. All of the computation is performed in real-time on a Xilinx Zynq 7010 System on Chip (SoC) processor utilizing both FPGA and CPU resources.The radar system (excluding antennas) has dimensions of 2.25 x 4 x 1.5 in3, weighs 120 g, and consumes 8 W of power of which the processing system occupies 2.6 W. The processing system performs over 652 million arithmetic operations per second and is capable of performing the full processing in real-time. The radar has also been tested in several scenarios both airborne on small UAVs as well as on the ground. Small UAVs have been detected to ranges of 350 m and larger aircraft up to 800 m. This thesis will describe the radar design architecture, the custom designed radar hardware, the FPGA based processing implementations, and conclude with an evaluation of the system's effectiveness and performance. Efficient Computing Phased Array Radar FMCW Radar Digital Beamforming FPGA SoC Xilinx Zynq Heterogeneous Processing Hardware Acceleration Detect and Avoid Sense and Avoid Unmanned Air Vehicles Drone Technology Engineering
6	Optimizing array processing on complex I/O stacks usingindices and data summarization Xing, Haoyuan January 2021 (has links) No description available. Computer Science Computer Engineering
7	Efficient FPGA SoC Processing Design for a Small UAV Radar Newmeyer, Luke Oliver 01 April 2018 (has links) Modern radar technology relies heavily on digital signal processing. As radar technology pushes the boundaries of miniaturization, computational systems must be developed to support the processing demand. One particular application for small radar technology is in modern drone systems. Many drone applications are currently inhibited by safety concerns of autonomous vehicles navigating shared airspace. Research in radar based Detect and Avoid (DAA) attempts to address these concerns by using radar to detect nearby aircraft and choosing an alternative flight path. Implementation of radar on small Unmanned Air Vehicles (UAV), however, requires a lightweight and power efficient design. Likewise, the radar processing system must also be small and efficient. This thesis presents the design of the processing system for a small Frequency Modulated Continuous Wave (FMCW) phased array radar. The radar and processing is designed to be light-weight and low-power in order to fly onboard a UAV less than 25 kg in weight. The radar algorithms for this design include a parallelized Fast Fourier Transform (FFT), cross correlation, and beamforming. Target detection algorithms are also implemented. All of the computation is performed in real-time on a Xilinx Zynq 7010 System on Chip (SoC) processor utilizing both FPGA and CPU resources. The radar system (excluding antennas) has dimensions of 2.25 x 4 x 1.5 in3, weighs 120 g, and consumes 8 W of power of which the processing system occupies 2.6 W. The processing system performs over 652 million arithmetic operations per second and is capable of performing the full processing in real-time. The radar has also been tested in several scenarios both airborne on small UAVs as well as on the ground. Small UAVs have been detected to ranges of 350 m and larger aircraft up to 800 m. This thesis will describe the radar design architecture, the custom designed radar hardware, the FPGA based processing implementations, and conclude with an evaluation of the system's effectiveness and performance. Efficient Computing Phased Array Radar FMCW Radar Digital Beamforming FPGA SoC Xilinx Zynq Heterogeneous Processing Hardware Acceleration Detect and Avoid Sense and Avoid Unmanned Air Vehicles Drone Technology Electrical and Computer Engineering

1

Page generated in 0.0713 seconds