61 |
A GPU Stream Computing Approach to Terrain Database Integrity Monitoring
McKeon, Sean Patrick 10 July 2009
Synthetic Vision Systems (SVS) provide an aircraft pilot with a virtual 3-D image of surrounding terrain which is generated from a digital elevation model stored in an onboard database. SVS improves the pilot's situational awareness at night and in inclement weather, thus reducing the chance of accidents such as controlled flight into terrain. A terrain database integrity monitor is needed to verify the accuracy of the displayed image due to potential database and navigational system errors. Previous research has used existing aircraft sensors to compare the real terrain position with the predicted position. We propose an improvement to one of these models by leveraging the stream computing capabilities of commercial graphics hardware. "Brook for GPUs," a system for implementing stream computing applications on programmable graphics processors, is used to execute a streaming ray-casting algorithm that correctly simulates the beam characteristics of a radar altimeter during all phases of flight.
|
62 |
Hybrid Spectral Ray Tracing Method for Multi-scale Millimeter-wave and Photonic Propagation Problems
Hailu, Daniel 30 September 2011
This thesis presents an efficient self-consistent Hybrid Spectral Ray Tracing (HSRT) technique for analysis and design of multi-scale sub-millimeter wave problems, where sub-wavelength features are modeled using rigorous methods, and complex structures with dimensions in the order of tens or even hundreds of wavelengths are modeled by asymptotic methods.
Quasi-optical devices are used in imaging arrays for sub-millimeter and terahertz applications, THz time-domain spectroscopy (THz-TDS), high-speed wireless communications, and space applications to couple terahertz radiation from space to a hot electron bolometer. These devices and structures, physically small as they have become, are very large in terms of the wavelength of the driving quasi-optical sources and may have dimensions of tens or even hundreds of wavelengths. Simulation and design optimization of these devices and structures is an extremely challenging electromagnetic problem. Analyzing complex, electrically large, unbounded wave structures with rigorous methods such as the method of moments (MoM), the finite element method (FEM), and the finite-difference time-domain (FDTD) method can become almost impossible because of the large computational resources required. Asymptotic high-frequency techniques are used to analyze electrically large quasi-optical systems, and hybrid methods are used to solve multi-scale problems.
Spectral Ray Tracing (SRT) has a number of unique advantages as a candidate for hybridization. The SRT method inherits the advantages of the Spectral Theory of Diffraction (STD). STD can model reflection, refraction and diffraction of an arbitrary wave incident on a complex structure, which is not the case for diffraction theories such as the Geometrical Theory of Diffraction (GTD), the Uniform Theory of Diffraction (UTD) and the Uniform Asymptotic Theory (UAT). By including complex rays, SRT can analyze both near fields and far fields accurately with minimal approximations. In this thesis, a novel matrix representation of SRT is presented that uses only one spectral integration per observation point, and it is applied to modeling hemispherical and hyper-hemispherical lenses. The hybridization of SRT with commercially available FEM and MoM software is proposed to handle the complexity of multi-scale analysis. This yields a computationally efficient, self-consistent HSRT algorithm. Various arrangements of the hybrid SRT method, such as FEM-SRT and MoM-SRT, are investigated and validated by comparing radiation patterns with Ansoft HFSS for FEM, FEKO for MoM, the Multi-level Fast Multipole Method (MLFMM) and physical optics (PO). Three sample structures are analyzed using the HSRT: a bow-tie terahertz antenna backed by a hyper-hemispherical silicon lens, an on-chip planar dipole fabricated in SiGe:C BiCMOS technology and attached to a hyper-hemispherical silicon lens, and a double-slot antenna backed by a silica lens. The computational performance (memory requirement, CPU/GPU time) of the developed algorithm is compared with that of other methods in commercially available software. It is shown that MoM-SRT, in its present implementation, is more accurate than MoM-PO but comparable in speed. However, as shown in this thesis, MoM-SRT can take advantage of parallel processing and GPUs.
The HSRT algorithm is applied to the simulation of an on-chip dipole antenna backed by a silicon lens and integrated with a 180-GHz VCO, and the simulated radiation pattern is compared with measurements. The radiation pattern is measured in a quasi-optical configuration using a power detector. In addition, it is shown that the matrix formulation of SRT and the HSRT are promising approaches for solving complex, electrically large problems with high accuracy.
This thesis also describes a new measurement setup developed specifically for measuring integrated antennas, in particular the radiation pattern and gain of an embedded on-chip antenna, in the mmW/terahertz range. In this method, the radiation pattern is first measured in a quasi-optical configuration using a power detector. Subsequently, the radiated power is estimated from the integration over the radiation pattern. Finally, the antenna gain is obtained from the measurement of a two-antenna system.
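The two steps named in the measurement procedure correspond to standard textbook relations, sketched here for orientation. This is not taken from the thesis: the symbols are assumptions introduced for illustration ($U(\theta,\phi)$ is the radiation intensity, $R$ the antenna separation, $\lambda$ the free-space wavelength, $P_t$ and $P_r$ the transmitted and received powers), and the gain expression assumes the common form of the two-antenna method with two identical, polarization-matched and impedance-matched antennas in each other's far field.

```latex
% Radiated power from integrating the measured pattern over the sphere
P_{\mathrm{rad}} = \int_{0}^{2\pi}\!\int_{0}^{\pi} U(\theta,\phi)\,\sin\theta \,\mathrm{d}\theta\,\mathrm{d}\phi

% Friis transmission equation for a two-antenna link
\frac{P_r}{P_t} = G_t\,G_r\left(\frac{\lambda}{4\pi R}\right)^{2}

% Special case of two identical antennas (G_t = G_r = G)
G = \frac{4\pi R}{\lambda}\sqrt{\frac{P_r}{P_t}}
```

If the two antennas are not identical, the link measurement only yields the product $G_t G_r$, and a third measurement or a reference antenna is needed to separate the two gains.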
|
63 |
Hardware Acceleration of a Monte Carlo Simulation for Photodynamic Therapy Treatment Planning
Lo, William Chun Yip 15 February 2010
Monte Carlo (MC) simulations are widely used in the field of medical biophysics, particularly for modelling light propagation in biological tissue. The iterative nature of MC simulations and their high computation time currently limit their use to solving the forward solution for a given set of source characteristics and tissue optical properties. However, applications such as photodynamic therapy treatment planning or image reconstruction in diffuse optical tomography require solving the inverse problem given a desired light dose distribution or absorber distribution, respectively. A faster means for performing MC simulations would enable the use of MC-based models for such tasks. In this thesis, a gold standard MC code called MCML was accelerated using two distinct hardware-based approaches, namely designing custom hardware on field-programmable gate arrays (FPGAs) and programming commodity graphics processing units (GPUs). Currently, the GPU-based approach is promising, offering approximately 1000-fold speedup with 4 GPUs compared to an Intel Xeon CPU.
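To make concrete the kind of per-photon loop that such accelerators parallelize, the following is a minimal Python sketch of a weighted photon-packet random walk. It is not the MCML code and not the thesis implementation. The function names, the infinite homogeneous medium, isotropic scattering, depth binning along z only, and termination by depositing the residual weight (where MCML uses a roulette scheme) are all simplifying assumptions made for illustration.

```python
import math
import random
from collections import defaultdict


def sample_step(mu_t, xi):
    """Free path length s = -ln(xi) / mu_t for xi uniform in (0, 1]."""
    return -math.log(xi) / mu_t


def deposit(weight, mu_a, mu_s):
    """Split a packet's weight into (absorbed, remaining) at an interaction site."""
    absorbed = weight * mu_a / (mu_a + mu_s)
    return absorbed, weight - absorbed


def simulate(mu_a, mu_s, n_photons, dz=0.01, w_min=1e-4, seed=0):
    """Return absorbed weight per depth bin for packets launched at z = 0.

    Assumptions (hypothetical, for illustration): infinite homogeneous medium,
    isotropic scattering, mu_a > 0 so that every packet eventually terminates.
    """
    rng = random.Random(seed)
    mu_t = mu_a + mu_s
    absorbed_by_bin = defaultdict(float)
    for _ in range(n_photons):
        z, uz, weight = 0.0, 1.0, 1.0
        while True:
            xi = 1.0 - rng.random()          # uniform in (0, 1], avoids log(0)
            z += uz * sample_step(mu_t, xi)  # move to the next interaction site
            absorbed, weight = deposit(weight, mu_a, mu_s)
            depth_bin = math.floor(z / dz)
            absorbed_by_bin[depth_bin] += absorbed
            if weight < w_min:
                # Simplified termination: dump the residual weight here.
                absorbed_by_bin[depth_bin] += weight
                break
            uz = 2.0 * rng.random() - 1.0    # isotropic: cos(theta) uniform in [-1, 1]
    return dict(absorbed_by_bin)
```

Each packet is independent of every other packet, which is why this loop maps naturally onto many GPU threads or replicated FPGA pipelines; the shared accumulation into the absorption bins is the part that needs care in a parallel version.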
|
64 |
Parallel Sorting on the Heterogeneous AMD Fusion Accelerated Processing Unit
Delorme, Michael Christopher 18 March 2013
We explore efficient parallel radix sort for the AMD Fusion Accelerated Processing Unit (APU). Two challenges arise: efficiently partitioning data between the CPU and GPU, and allocating data in memory regions. Our coarse-grained implementation utilizes both the GPU and CPU by sharing data at the beginning and end of the sort. Our fine-grained implementation utilizes the APU’s integrated memory system to share data throughout the sort. Both implementations outperform the current state-of-the-art GPU radix sort from NVIDIA. We therefore demonstrate that the CPU can be efficiently used to speed up radix sort on the APU.
Our fine-grained implementation slightly outperforms our coarse-grained implementation. This demonstrates the benefit of the APU’s integrated architecture. This performance benefit is hindered by limitations in the APU’s architecture and programming model. We believe that the performance benefits will increase once these limitations are addressed in future generations of the APU.
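For readers unfamiliar with the algorithm, here is a minimal sequential Python sketch of least-significant-digit radix sort, followed by a toy version of the coarse-grained idea in which the data is split once, each part is sorted independently, and the parts are merged at the end. It is not the thesis code. The function names, the `cpu_fraction` parameter and the use of a merge to recombine the parts are assumptions made for illustration, and both "devices" are simulated by ordinary function calls.

```python
import heapq


def radix_sort(keys, bits_per_pass=8):
    """LSD radix sort for non-negative integers; returns a new sorted list."""
    out = list(keys)
    if not out:
        return out
    radix = 1 << bits_per_pass
    mask = radix - 1
    max_key = max(out)
    shift = 0
    while (max_key >> shift) > 0:
        # 1. Histogram of the current digit.
        counts = [0] * radix
        for k in out:
            counts[(k >> shift) & mask] += 1
        # 2. Exclusive prefix sum gives each digit's first output slot.
        offsets = [0] * radix
        total = 0
        for d in range(radix):
            offsets[d] = total
            total += counts[d]
        # 3. Stable scatter into the output buffer.
        scattered = [0] * len(out)
        for k in out:
            d = (k >> shift) & mask
            scattered[offsets[d]] = k
            offsets[d] += 1
        out = scattered
        shift += bits_per_pass
    return out


def coarse_grained_sort(keys, cpu_fraction=0.25):
    """Split once, sort each part independently, merge once at the end.

    cpu_fraction is a hypothetical tuning knob: the share of keys given to
    the (simulated) CPU, with the rest going to the (simulated) GPU.
    """
    split = int(len(keys) * cpu_fraction)
    cpu_part = radix_sort(keys[:split])
    gpu_part = radix_sort(keys[split:])
    return list(heapq.merge(cpu_part, gpu_part))
```

The histogram, prefix-sum and scatter phases are the steps a parallel implementation distributes across threads. A fine-grained scheme would instead share work between the processors within each pass, which this sketch does not attempt to model.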
|
67 |
Avaliação de unidade de beneficiamento de milho (Zea mays L.) e diretrizes para implantação de sistema de gestão da qualidade / Evaluation of a corn (Zea mays L.) processing unit and guidelines for implementing a quality management system
Domene, Maria Paula 17 August 2018
Advisors: João Domingos Biagi, Benedito Carlos Benedetti / Thesis (doctorate) - Universidade Estadual de Campinas, Faculdade de Engenharia Agrícola / Previous issue date: 2010
Abstract: The seed and food industries are under market pressure to produce raw material that is safe with respect to physical, chemical and biological contamination. An inverse approach is beginning to be adopted, in which the focus shifts away from quality control of the final product alone and the entire production chain is traced, so that processes can be controlled and preventive measures taken to improve quality and reduce losses. This model requires that norms and standards be followed to ensure food safety. However, these standards are effective only in systems that have some level of organization, and in practice farmers have not mastered the management tools, which compromises quality control. One factor observed in grain and seed processing units (UB, from the Portuguese "unidade de beneficiamento") is the variability of the product received with respect to its physical and sanitary condition. To evaluate the corn processing operation and bring it further into line with the demands of the consumer market and of legislation, this work was carried out jointly by COPLACANA (Cooperativa dos Plantadores de Cana do Estado de São Paulo), based in Piracicaba-SP, FEAGRI/UNICAMP, ESALQ/USP and CATI. Surveys were conducted to determine the critical control points (CCPs) by evaluating the corn grain processing system of the COPLACANA UB. At the end of the survey, variability in the received material was observed with respect to damaged grains and, based on this variability in the raw material, we tested whether impurities influence the physical properties of the grains. Seeking alternatives that are safe to handle and safe for the environment, we tested the effect of essential oils of Eucalyptus citriodora and E. camaldulensis, and their synergistic effect, on seed germination. Four main critical control points were identified as needing attention: reception, cleaning, drying and storage. The critical control points related to infrastructure, environment and training are complementary. The grain received at the UB had an average moisture content of 18.5%, with February the most critical month, which may have affected grain quality with respect to rot-damaged grains ("grãos ardidos"). Monitoring during the pre-harvest and harvest stages is recommended to minimize immediate damage and rot damage in the grains. Regarding physical properties, the percentage of impurities influenced the apparent specific mass, the terminal velocity and the coefficient of friction on a concrete plate. The essential oil of Eucalyptus camaldulensis and its interaction with the essential oil of Eucalyptus citriodora did not negatively affect seed germination. The fungi Penicillium spp. and Fusarium sp. were controlled by the essential oils of E. camaldulensis and E. citriodora. / Doctorate / Post-Harvest Technology / Doctor of Agricultural Engineering
|
68 |
Programmable MIMO detectors
Janhunen, J. (Janne) 22 November 2011
Abstract
The multiple-input multiple-output (MIMO) technique combined with orthogonal frequency division multiplexing (MIMO-OFDM) has been introduced as a promising approach for meeting the ever-increasing capacity and quality of service (QoS) requirements of wireless communication systems. Efficient radio spectrum utilization requires a flexible transceiver solution, which has motivated the development of software defined radio (SDR) technologies, which in turn are expected to enable the creation of cognitive radios. As a result, any radio solution could be invoked on demand on any platform.
In this thesis work, we have studied detector algorithms and programmable processor architectures in order to find practical solutions for the future wireless systems. A programmable receiver can reduce the energy dissipation of the receiver by changing the detection algorithm based on the current channel realizations. To provide a realistic aspect to the implementations in different channel realizations, we present a wide state-of-the-art detector comparison. In addition, we present an extensive number arithmetic and word length study in order to evaluate realistic hardware complexity and energy dissipations of the implementations. The study includes a comprehensive design chain from the algorithm development to the actual processor design and finally programming software for the platforms.
We evaluate single and multi-core processor implementations by comparing the achieved results to the Long Term Evolution (LTE) performance requirements. We implement detectors on digital signal processors (DSPs), a graphics processing unit (GPU) and a transport triggered architecture (TTA). The implementation results are compared in terms of throughput, silicon area and energy efficiency. Finally, we discuss the advantages and disadvantages of the architectures and the implementation effort.
|
69 |
Faster upper body pose recognition and estimation using compute unified device architecture
Brown, Dane January 2013
Magister Scientiae - MSc / The SASL project is developing a machine translation system that can translate fully-fledged phrases between SASL and English in real time. To date, the project has developed several systems focusing on facial expression, hand shape, hand motion, hand orientation and hand location recognition and estimation. Achmed developed a highly accurate upper body pose recognition and estimation system. The system can recognize and estimate the location of the arms from a two-dimensional video captured from a monocular view at an accuracy of 88%. However, it operates at well below real-time speeds. This research investigates the use of optimizations and parallel processing techniques using the CUDA framework on Achmed’s algorithm to achieve real-time upper body pose recognition and estimation. A detailed analysis of Achmed’s algorithm identified potential improvements to the algorithm. A re-implementation of Achmed’s algorithm on the CUDA framework, coupled with these improvements, culminated in an enhanced upper body pose recognition and estimation system that operates in real time with increased accuracy.
|
70 |
Efficient Execution Of AMR Computations On GPU Systems
Raghavan, Hari K 11 1900
Adaptive Mesh Refinement (AMR) is a method that dynamically varies the spatio-temporal resolution of localized mesh regions in numerical simulations, based on the strength of the solution features. By discretizing only localized regions of interest at high resolution, into rectangular mesh units called patches, AMR provides low computational cost and a high degree of accuracy. General purpose graphics processing units (GPGPUs), with their support for fine-grained parallelism, offer an attractive option for obtaining high performance for AMR applications. The data parallel computations of the finite difference schemes of AMR can be efficiently performed on GPGPUs. This research deals with challenges and develops techniques for efficient execution of AMR applications with uniform and non-uniform patches on GPUs.
In the first part of the thesis, we optimize an AMR model with uniform patches. We have developed strategies for continuous online visualization of time-evolving data for AMR applications executed on GPUs. In-situ visualization plays an important role in analyzing the time-evolving characteristics of the domain structures. Continuous visualization of the output data for successive time steps enables better study of the underlying domain and of the model used for simulating it. We reorder the meshes for computation on the GPU based on the user's input about the subdomain to be visualized. This makes the data available for visualization at a faster rate. We then perform asynchronous execution of the visualization steps and fix-up operations on the coarse meshes on the CPUs while the GPU advances the solution. By performing experiments on Tesla S1070 and Fermi C2070 clusters, we found that our strategies result in up to 60% improvement in response time and 16% improvement in the rate of visualization of frames over the existing strategy of performing fix-ups and visualization at the end of the time steps.
The second part of the thesis deals with adaptive strategies for efficient execution of block structured AMR applications with non-uniform patches on GPUs. Most AMR approaches use patches of uniform sizes over regions of interests. Since this leads to over-refinement, some efforts have focused on forming patches of non-uniform dimensions to improve computational efficiency since the dimensions of a patch can be tuned to the geometry of a region of interest. While effective hybrid execution strategies exist for applications with uniform patches, our work considers efficient execution of non-uniform patches with different workloads. Our techniques include a geometric bin-packing method to load balance GPU computations and reduce thread idling, adaptive determination of amount of work to maximize asynchronism between CPU and GPU executions using a knapsack formulation, and scheduling communications for multi-GPU executions. We test our strategies for synthetic inputs as well as for traces from real applications. Our experiments on Tesla S1070 and Fermi C2070 clusters with both single-GPU and multi-GPU executions show that our strategies result in up to 69% improvement in performance over existing strategies. Our bin-packing based load balancing gives performance gains up to 39%, kernel optimizations give an improvement of up to 20%, and our strategies for adaptive asynchronism between CPU-GPU executions give performance improvements of up to 17% over default static asynchronous executions.
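As a rough illustration of why bin packing helps with load balancing, the sketch below applies the classic first-fit-decreasing heuristic to one-dimensional workloads. It is not the geometric bin-packing method of the thesis, which packs patches of differing dimensions. The function name, the scalar "cost" per patch and the fixed per-bin capacity are assumptions introduced for illustration.

```python
def first_fit_decreasing(workloads, capacity):
    """Pack scalar workloads into bins of equal capacity (FFD heuristic).

    Returns a list of bins, each a list of the workloads assigned to it.
    Here a "bin" stands in for one unit of GPU work (hypothetical), so fewer,
    fuller bins means less idle capacity per unit.
    """
    bins = []  # each entry: {"free": remaining capacity, "items": [...]}
    for w in sorted(workloads, reverse=True):
        if w > capacity:
            raise ValueError(f"workload {w} exceeds bin capacity {capacity}")
        for b in bins:
            if b["free"] >= w:
                # First bin with enough room takes the item.
                b["items"].append(w)
                b["free"] -= w
                break
        else:
            # No existing bin fits: open a new one.
            bins.append({"free": capacity - w, "items": [w]})
    return [b["items"] for b in bins]
```

For example, `first_fit_decreasing([4, 8, 1, 4, 2, 1], 10)` returns `[[8, 2], [4, 4, 1, 1]]`, filling both bins exactly. Sorting largest-first is what lets the small items fill the gaps left by the large ones.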
|