921

An Efficient, Extensible, Hardware-aware Indexing Kernel

Sadoghi Hamedani, Mohammad 20 June 2014 (has links)
Modern hardware has the potential to play a central role in scalable data management systems. A realization of this potential arises in the context of indexing queries, a recurring theme in real-time data analytics, targeted advertising, algorithmic trading, and data-centric workflows, and of indexing data, a challenge in multi-version analytical query processing. To enhance query and data indexing, in this thesis, we present an efficient, extensible, and hardware-aware indexing kernel. This indexing kernel rests upon novel data structures and (parallel) algorithms that utilize the capabilities offered by modern hardware, especially the abundance of main memory, multi-core architectures, hardware accelerators, and solid-state drives. This thesis focuses on presenting our query indexing techniques to cope with query processing in data-intensive applications subject to ever-increasing data volume and velocity. At the core of our query indexing kernel lies the BE-Tree family of memory-resident indexing structures that scales by overcoming the curse of dimensionality through a novel two-phase space-cutting technique, effective top-k processing, and adaptive parallel algorithms that operate directly on compressed data and exploit multi-core architectures. Furthermore, we achieve line-rate processing by harnessing the unprecedented degrees of parallelism and pipelining only available through low-level logic design using FPGAs. Finally, we present a comprehensive evaluation that establishes the superiority of BE-Tree in comparison with state-of-the-art algorithms. In this thesis, we further expand the scope of our indexing kernel and describe how to accelerate analytical queries on (multi-version) databases by enabling indexes on the most recent data. Our goal is to reduce the overhead of index maintenance, so that indexes can be used effectively for analytical queries without being a heavy burden on transaction throughput. To this end, we re-design the data structures in the storage hierarchy to employ an extra level of indirection over solid-state drives. This indirection layer dramatically reduces the number of magnetic disk I/Os needed to update indexes and localizes index maintenance. As a result, by rethinking how data is indexed, we eliminate the dilemma between update and query performance and reduce index maintenance and query processing cost substantially.
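To give a flavor of the space-cutting idea, here is a toy two-level predicate index in Python. It is an illustrative sketch only, not the BE-Tree data structure: it partitions interval subscriptions first by attribute and then by value bucket, and matches single-predicate subscriptions against an event. All class and variable names are hypothetical.

from collections import defaultdict

class ToyPredicateIndex:
    """Two-level toy index: partition by attribute, then by value bucket."""
    def __init__(self, bucket_width=10):
        self.bucket_width = bucket_width
        self.index = defaultdict(lambda: defaultdict(list))

    def insert(self, sub_id, attr, low, high):
        # Phase 1 cut: by attribute. Phase 2 cut: by value bucket.
        for b in range(low // self.bucket_width, high // self.bucket_width + 1):
            self.index[attr][b].append((sub_id, low, high))

    def match(self, event):
        # Collect subscriptions whose interval covers the event's value.
        hits = set()
        for attr, value in event.items():
            bucket = self.index[attr].get(value // self.bucket_width, [])
            hits.update(s for s, lo, hi in bucket if lo <= value <= hi)
        return hits

idx = ToyPredicateIndex()
idx.insert("s1", "price", 20, 35)
print(idx.match({"price": 30}))   # -> {'s1'}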
922

Tuned and asynchronous stencil kernels for CPU/GPU systems

Venkatasubramanian, Sundaresan 18 May 2009 (has links)
We describe heterogeneous multi-CPU and multi-GPU implementations of Jacobi's iterative method for the 2-D Poisson equation on a structured grid, in both single- and double-precision. Properly tuned, our best implementation achieves 98% of the empirical streaming GPU bandwidth (66% of peak) on an NVIDIA C1060. Motivated to find a still faster implementation, we further consider "wildly asynchronous" implementations that can reduce or even eliminate the synchronization bottleneck between iterations. In these versions, which are based on the principle of chaotic relaxation (Chazan and Miranker, 1969), we simply remove or delay synchronization between iterations, thereby potentially trading off more flops (via more iterations to converge) for a higher degree of asynchronous parallelism. Our relaxed-synchronization implementations on a GPU can be 1.2-2.5x faster than our best synchronized GPU implementation while achieving the same accuracy. Looking forward, this result suggests research on similarly "fast-and-loose" algorithms in the coming era of increasingly massive concurrency and relatively high synchronization or communication costs.
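The contrast between the two regimes can be sketched in a few lines of serial Python (assumed parameters; the thesis's implementations are tuned CUDA kernels). The first function is a synchronized Jacobi sweep for the 2-D Poisson problem; the second updates the grid in place, so some points read already-updated neighbors, a serial stand-in for the effect of removing synchronization between iterations.

import numpy as np

def jacobi_step(u, f, h):
    # Synchronized sweep: every update reads only the previous iterate.
    new = u.copy()
    new[1:-1, 1:-1] = 0.25 * (u[2:, 1:-1] + u[:-2, 1:-1] +
                              u[1:-1, 2:] + u[1:-1, :-2] +
                              h * h * f[1:-1, 1:-1])
    return new

def relaxed_step(u, f, h):
    # In-place sweep: later points may read already-updated neighbors,
    # mimicking dropped synchronization (chaotic-relaxation flavor).
    for i in range(1, u.shape[0] - 1):
        for j in range(1, u.shape[1] - 1):
            u[i, j] = 0.25 * (u[i + 1, j] + u[i - 1, j] +
                              u[i, j + 1] + u[i, j - 1] + h * h * f[i, j])
    return u

n = 64
h = 1.0 / (n - 1)
f = np.ones((n, n))
u = np.zeros((n, n))
for _ in range(200):
    u = jacobi_step(u, f, h)   # swap in relaxed_step to compare convergence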
923

Medical Image Processing on the GPU: Past, Present and Future

Eklund, Anders, Dufort, Paul, Forsberg, Daniel, LaConte, Stephen January 2013 (has links)
Graphics processing units (GPUs) are used today in a wide range of applications, mainly because they can dramatically accelerate parallel computations and are affordable and energy-efficient. In the field of medical imaging, GPUs are in some cases crucial for enabling practical use of computationally demanding algorithms. This review presents past and present work on GPU-accelerated medical image processing, and is meant to serve as an overview and introduction to existing GPU implementations. The review covers GPU acceleration of basic image processing operations (filtering, interpolation, histogram estimation and distance transforms), the most commonly used algorithms in medical imaging (image registration, image segmentation and image denoising) and algorithms that are specific to individual modalities (CT, PET, SPECT, MRI, fMRI, DTI, ultrasound, optical imaging and microscopy). The review ends by highlighting some future possibilities and challenges.
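As an illustration of why the basic operations in this list parallelize so well, here is a minimal NumPy sketch of separable Gaussian filtering (parameters arbitrary): every output pixel is computed independently, which is exactly the access pattern GPUs accelerate.

import numpy as np

def gaussian_kernel(sigma, radius):
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def separable_gaussian(image, sigma=1.5, radius=4):
    # Filter rows, then columns: two cheap 1-D passes instead of one 2-D pass.
    k = gaussian_kernel(sigma, radius)
    rows = np.apply_along_axis(np.convolve, 1, image, k, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="same")

img = np.random.rand(128, 128)
smooth = separable_gaussian(img)   # each output pixel is independent work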
924

Photodynamic therapies of high-grade gliomas: from theory to clinical perspectives

Dupont, Clément 24 November 2017 (has links)
Gliomas are the most common primary brain tumors in adults. Among them, glioblastoma (GBM) is the most frequent and carries the most dismal prognosis. Its annual incidence is about 3 to 5 cases per 100,000 persons (about 3,000 new cases each year in France). Median survival varies between 11 and 13 months depending on the extent of tumor resection. The standard of care includes surgery followed by radiation therapy and chemotherapy. Maximal resection is expected to delay recurrence. Although intraoperative photodynamic diagnosis, or fluorescence-guided resection (FGR), improves the extent of resection, relapse still occurs in the resection margins in 85% of cases. Alternative therapies have to be developed to improve patients' overall survival. In this context, photodynamic therapy (PDT) seems relevant. PDT is based on the synergy of three parameters: a photosensitizing molecule, the photosensitizer (PS), which concentrates preferentially in tumor cells; laser light; and oxygen. Laser light induces a reaction between the PS and the oxygen of the cell. This reaction produces highly cytotoxic molecules (including singlet oxygen) and leads to the death of tumor cells. Two treatment modalities are investigated: interstitial PDT (iPDT) and intraoperative PDT. The main goal of this thesis is to provide technological tools to develop PDT for GBM treatment; both treatment modalities have therefore been investigated. When tumor resection is not achievable (about 20% to 30% of cases), iPDT may be preferred. This modality inserts optical fibers directly into the target to illuminate tumor tissue, so simulating light propagation in brain tissue is required to plan the location of the optical fibers. A Monte Carlo model, considered the reference method, accelerated on a graphics processing unit was developed. This model computes the propagation of light emitted by a cylindrical diffuser inside heterogeneous media. The accuracy of the model was evaluated against experimental measurements, and the acceleration provided by parallelization makes it usable in clinical routine. iPDT has to be planned using a Treatment Planning System (TPS). A proof of concept of a TPS dedicated to stereotactic iPDT treatment of GBM was developed. This software provides basic tools to plan the stereotactic insertion of cylindrical diffusers in the patient's brain and to compute the associated dosimetry. The stereotactic registration and the accuracy of the dosimetry computation were evaluated with specific methodologies. When tumor resection is achievable, intraoperative PDT may be applied early after the FGR. It takes advantage of the presence of the PS (protoporphyrin IX) used for FGR, which has already concentrated in the tumor cells; the proposed treatment strategy thus fits into the current standard of care. A medical device was designed to fit the resection cavity and illuminate its margins homogeneously. The device consists of two parts: a trocar coupled to an inflatable balloon, and a fiber guide developed in the ONCO-THAI laboratory that allows insertion of the light source. Specific methodologies were developed to calibrate and assess the device in terms of mechanical properties and dosimetry. The calibration process led to a transfer function that provides fast, robust, and easy prescription of the treatment duration needed to induce a PDT response in the cavity margins. Furthermore, a comprehensive experimental design was worked out prior to the clinical trial that evaluates the safety of the procedure.
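The Monte Carlo method mentioned above can be illustrated with a heavily simplified serial sketch: isotropic scattering in a homogeneous medium with arbitrary coefficients, whereas the thesis's model handles cylindrical diffusers, heterogeneous tissue, and GPU parallelization.

import numpy as np

rng = np.random.default_rng(0)
mu_a, mu_s = 0.2, 10.0     # absorption/scattering coefficients in 1/cm (arbitrary)
mu_t = mu_a + mu_s

def propagate(n_photons=10_000, n_steps=200):
    # Photon packets random-walk and deposit part of their weight each step.
    pos = np.zeros((n_photons, 3))
    weight = np.ones(n_photons)
    absorbed = 0.0
    for _ in range(n_steps):
        step = -np.log(1.0 - rng.random(n_photons)) / mu_t   # sampled free path
        d = rng.normal(size=(n_photons, 3))                  # isotropic direction
        d /= np.linalg.norm(d, axis=1, keepdims=True)
        pos += step[:, None] * d
        deposit = weight * (mu_a / mu_t)                     # absorbed fraction
        absorbed += deposit.sum()
        weight -= deposit
    return pos, weight, absorbed

positions, weights, total_absorbed = propagate()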
925

Robot Goalkeeper: A robotic goalkeeper based on machine vision and motor control

Adeboye, Taiyelolu January 2018 (has links)
This report presents a robust and efficient implementation of a speed-optimized algorithm for object recognition, 3D real-world localization, and tracking in real time. It details a design that was focused on detecting and following objects in flight as applied to a football in motion. An overall goal of the design was to develop a system capable of recognizing an object and its present and near-future location while also actuating a robotic arm in response to the motion of the ball in flight. The implementation made use of image processing functions in C++ on an NVIDIA Jetson TX1 with Stereolabs' ZED stereoscopic camera, connected to an embedded system controller for the robot arm. The image processing was done with a textured background, and the 3D location coordinates were applied to the correction of a Kalman filter model that was used for estimating and predicting the ball location. A capture and processing speed of 59.4 frames per second was obtained, with good accuracy in depth detection, and the ball was tracked well in the tests carried out.
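A constant-velocity Kalman filter of the kind described can be sketched as follows (noise covariances are assumed values; a real ball model would add gravity to the state transition).

import numpy as np

dt = 1.0 / 59.4                                   # frame period at 59.4 fps
F = np.eye(6)                                     # state: [x, y, z, vx, vy, vz]
F[:3, 3:] = dt * np.eye(3)                        # constant-velocity model
H = np.hstack([np.eye(3), np.zeros((3, 3))])      # we measure position only
Q = 1e-3 * np.eye(6)                              # process noise (assumed)
R = 1e-2 * np.eye(3)                              # measurement noise (assumed)

def kalman_step(x, P, z):
    # Predict, then correct with the stereo camera's 3D ball measurement z.
    x, P = F @ x, F @ P @ F.T + Q
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)  # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(6) - K @ H) @ P
    return x, P

x, P = np.zeros(6), np.eye(6)
x, P = kalman_step(x, P, np.array([1.0, 0.5, 4.0]))
predicted_next = F @ x                            # near-future ball position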
926

Optimisation of numerical methods for plasma physics: application to charged particle beams

Crestetto, Anaïs 04 October 2012 (has links)
This thesis presents different numerical methods for simulating plasmas or charged particle beams at reduced cost. The motion of charged particles in an electromagnetic field is governed by the Vlasov equation, coupled to the Maxwell equations for the electromagnetic field, or to the Poisson equation. In the first part, a multi-fluid method is used for solving the 1D Vlasov-Poisson system. It is based on a priori knowledge of the shape of the distribution function f. This kind of method is well suited to systems that remain close to equilibrium. The second part presents a decomposition of f into an equilibrium part and a perturbation. The equilibrium part is solved by a fluid method, whereas a more accurate kinetic method is used for the perturbation. We construct an asymptotic-preserving scheme for the Vlasov-Poisson-BGK system using such a decomposition. The third part deals with the PIC method in 2D axisymmetric geometry. A work based on isogeometric analysis is presented, followed by a PIC - Discontinuous Galerkin code parallelized on a graphics card.
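For the kinetic part, a minimal 1-D electrostatic PIC step conveys the deposit-solve-push cycle (illustrative only; the thesis works in 2-D axisymmetric geometry with isogeometric and Discontinuous Galerkin field solvers).

import numpy as np

L, ng, npart, dt = 2 * np.pi, 64, 10_000, 0.1
dx = L / ng
rng = np.random.default_rng(1)
xp = rng.uniform(0, L, npart)            # particle positions
vp = rng.normal(0.0, 1.0, npart)         # particle velocities

def pic_step(xp, vp):
    cells = (xp / dx).astype(int) % ng
    # 1) deposit charge (nearest-grid-point weighting), neutralizing background
    rho = np.bincount(cells, minlength=ng) / (npart / ng) - 1.0
    # 2) solve Poisson d2(phi)/dx2 = -rho in Fourier space
    k = 2 * np.pi * np.fft.fftfreq(ng, d=dx)
    rho_h = np.fft.fft(rho)
    phi_h = np.zeros_like(rho_h)
    phi_h[1:] = rho_h[1:] / k[1:] ** 2
    E = np.real(np.fft.ifft(-1j * k * phi_h))    # E = -d(phi)/dx
    # 3) gather the field at the particles and push (electrons, q/m = -1)
    vp = vp - E[cells] * dt
    xp = (xp + vp * dt) % L
    return xp, vp

xp, vp = pic_step(xp, vp)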
927

Task-based multifrontal QR solver for heterogeneous architectures

Lopez, Florent 11 December 2015 (has links)
To face the advent of multicore processors and the ever-increasing complexity of hardware architectures, programming models based on DAG parallelism have regained popularity in the high-performance scientific computing community. Modern runtime systems offer a programming interface that complies with this paradigm and powerful engines for scheduling the tasks into which the application is decomposed. These tools have already proved their effectiveness on a number of dense linear algebra applications. In this study we investigate the design of task-based sparse direct solvers, which constitute extremely irregular workloads with tasks of different granularities and characteristics and variable memory consumption, on top of runtime systems. In the context of the qr mumps solver, we prove the usability and effectiveness of our approach with the implementation of a sparse matrix multifrontal factorization based on a Sequential Task Flow parallel programming model. Using this programming model, we developed features such as the integration of dense 2D Communication Avoiding algorithms in the multifrontal method, allowing for better scalability compared to the original approach used in qr mumps. In addition we introduced a memory-aware algorithm to control the memory behaviour of our solver and show, in the context of multicore architectures, an important reduction of the memory footprint for the multifrontal QR factorization with a small impact on performance. Following this approach, we move to heterogeneous architectures where task granularity and scheduling strategies are critical to achieving performance. We present, for the multifrontal method, a hierarchical strategy for data partitioning and a scheduling algorithm capable of handling the heterogeneity of resources. Finally we present a study on the reproducibility of executions and the use of alternative programming models for the implementation of the multifrontal method. All the experimental results presented in this study are evaluated with a detailed performance analysis measuring the impact of several identified effects on performance and scalability. Thanks to this original analysis, presented in the first part of this study, we are capable of fully understanding the results obtained with our solver.
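The Sequential Task Flow idea can be miniaturized as follows: tasks are submitted in program order with declared data access modes, and the runtime infers dependencies before executing them asynchronously. This hypothetical toy tracks only after-write hazards; production runtime systems such as StarPU do far more.

from concurrent.futures import ThreadPoolExecutor

class ToySTF:
    # Tasks are submitted in sequential order; declared data accesses let
    # the runtime infer dependencies (here only dependencies on the last
    # writer of each handle, i.e. read-after-write and write-after-write).
    def __init__(self, workers=4):
        self.pool = ThreadPoolExecutor(workers)
        self.last_writer = {}                 # data handle -> future

    def submit(self, fn, reads=(), writes=()):
        deps = [self.last_writer[h] for h in (*reads, *writes)
                if h in self.last_writer]
        def run():
            for d in deps:                    # wait for inferred dependencies
                d.result()
            return fn()
        fut = self.pool.submit(run)
        for h in writes:
            self.last_writer[h] = fut
        return fut

stf = ToySTF()
a = stf.submit(lambda: print("factor front 0"), writes=["f0"])
b = stf.submit(lambda: print("assemble front 1"), reads=["f0"], writes=["f1"])
b.result()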
928

Audiovisual voice activity detection and localization of simultaneous speech sources

Minotto, Vicente Peruffo January 2013 (has links)
Given the tendency to create interfaces between humans and machines that increasingly allow simple ways of interaction, it is only natural that research effort is put into techniques that seek to simulate the most conventional means of communication humans use: speech. In the human auditory system, voice is automatically processed by the brain in an effortless and effective way, commonly aided by visual cues such as mouth movement and the location of the speakers. This processing done by the brain includes two important components that speech-based communication requires: Voice Activity Detection (VAD) and Sound Source Localization (SSL). Consequently, VAD and SSL also serve as mandatory preprocessing tools for high-end Human Computer Interface (HCI) applications, such as automatic speech recognition and speaker identification. However, VAD and SSL are still challenging problems when dealing with realistic acoustic scenarios, particularly in the presence of noise, reverberation, and multiple simultaneous speakers. In this work we propose approaches for tackling these problems using audiovisual information, both for the single-source and the competing-sources scenarios, exploiting distinct ways of fusing the audio and video modalities. Our work also employs a microphone array for the audio processing, which allows the spatial information of the acoustic signals to be explored through the state-of-the-art Steered Response Power (SRP) method. As an additional contribution, a very fast GPU version of the SRP is developed, so that real-time processing is achieved. Our experiments show an average accuracy of 95% when performing VAD of up to three simultaneous speakers and an average error of 10 cm when locating such speakers.
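The core of the SRP method is a steered delay-and-sum power maximization over candidate source locations, sketched below in plain NumPy. This is a bare-bones time-domain version; the practical algorithm applies PHAT weighting in the frequency domain and a much denser search grid.

import numpy as np

c, fs = 343.0, 16_000                      # speed of sound (m/s), sample rate

def srp_localize(frames, mic_pos, candidates):
    # For each candidate location, delay-align the channels and sum:
    # the true source location maximizes the steered beam's power.
    best, best_p = None, -np.inf
    for q in candidates:
        delays = np.linalg.norm(mic_pos - q, axis=1) / c
        shifts = np.round((delays - delays.min()) * fs).astype(int)
        n = frames.shape[1] - shifts.max()
        beam = sum(frames[m, s:s + n] for m, s in enumerate(shifts))
        p = float(np.sum(beam ** 2))
        if p > best_p:
            best, best_p = q, p
    return best

mic_pos = np.array([[0.0, 0, 0], [0.1, 0, 0], [0.2, 0, 0], [0.3, 0, 0]])
frames = np.random.randn(4, 2048)          # one analysis frame per microphone
grid = [np.array([x, 1.0, 0.5]) for x in np.linspace(-1.0, 1.0, 21)]
print(srp_localize(frames, mic_pos, grid))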
929

Noise Reduction in Flash X-ray Imaging Using Deep Learning

Sundman, Tobias January 2018 (has links)
Recent improvements in deep learning architectures, combined with the strength of modern computing hardware such as graphics processing units, have led to significant results in the field of image analysis. In this thesis work, locally connected architectures are employed to reduce noise in flash X-ray diffraction images. The layers in these architectures use convolutional kernels, but without shared weights. This combines the benefit of the lower model memory footprint of convolutional networks with the higher model capacity of fully connected networks. Since the camera used to capture the diffraction images has pixelwise unique characteristics, and thus lacks equivariance, this compromise can be beneficial. The background images of this thesis work were generated with an active laser but without injected samples. Artificial diffraction patterns were then added to these background images, allowing U-Net architectures to be trained to separate them. Architecture A achieved a performance of 0.187 on the test set, roughly translating to 35 fewer photon errors than a model similar to the state of the art. After smoothing the photon errors this performance increased to 0.285, since the U-Net architectures managed to remove flares where the state of the art could not. This could be taken as a proof of concept that locally connected networks are able to separate diffraction from background in flash X-ray imaging.
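The compromise can be made concrete with a toy forward pass (single channel, no bias or activation): a locally connected layer slides the same window pattern as a convolution but stores one kernel per output position.

import numpy as np

def locally_connected_2d(x, weights):
    # Same sliding window as a convolution, but weights[i, j] is a private
    # kernel for output position (i, j): no weight sharing.
    oh, ow, k, _ = weights.shape
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + k, j:j + k] * weights[i, j])
    return out

k, H, W = 3, 8, 8
x = np.random.rand(H, W)
w_local = np.random.rand(H - k + 1, W - k + 1, k, k)   # one kernel per position
y = locally_connected_2d(x, w_local)
# Parameter counts: conv layer k*k; locally connected (H-k+1)*(W-k+1)*k*k;
# fully connected H*W*(H-k+1)*(W-k+1): the middle ground described above.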
930

Hybrid computational platform for distributed parallel coprocessing via web services applied to radio interferometry

Silva, Gustavo Poli Lameirão da 19 August 2013 (has links)
The requirements imposed by new applications present great challenges to computing. No single computer architecture is capable of meeting all of these requirements. Parallel and hybrid computer arrangements arise as a solution to this scenario: an arrangement of CPU-coprocessor pairs can form a computing instrument specialized for a particular application task. This doctoral thesis proposes a parallel and hybrid computational platform, denoted CoP-WS, that uses the interoperability technology known as Web Services. The coprocessor is the graphics processing unit (GPU), which in recent years has served as a thread-level parallel processor for general-purpose applications. The feasibility test of the platform was inspired by radio astronomy, and two applications were implemented: a complex correlator for the signals provided by a radio interferometric array, and a flare recognition system for solar radio interferometer images. Both processing stages can be inserted into a pipeline execution context, using a sufficient configuration of CPU-GPU pairs, with the signals from the interferometric array's antennas entering on one side and the result of solar flare recognition emerging on the other. The results obtained for both applications show the feasibility of the CoP-WS platform for large volumes of data processed in quasi real time. For the correlator, the average processing time for each integration period was around 160 ms; for solar flare recognition, 48 ms per solar disk image.
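The first application's kernel is essentially a complex cross-correlation accumulated over an integration period, which can be sketched via the FFT correlation theorem (illustrative parameters; the platform distributes this work across CPU-GPU pairs behind web services).

import numpy as np

def complex_correlator(x, y, n_lags=64):
    # Cross-correlate two antenna voltage streams over one integration
    # period via the FFT correlation theorem; |r[k]| peaks at the lag
    # where y is delayed by k samples relative to x.
    n = len(x)
    r = np.fft.ifft(np.conj(np.fft.fft(x)) * np.fft.fft(y)) / n
    return r[:n_lags]

rng = np.random.default_rng(2)
n = 100_000
sky = rng.standard_normal(n) + 1j * rng.standard_normal(n)   # common sky signal
x = sky + 0.5 * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
y = np.roll(sky, 25) + 0.5 * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
r = complex_correlator(x, y)
print(np.argmax(np.abs(r)))                                  # -> 25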
