Global ETD Search

81	Parallel Sorting on the Heterogeneous AMD Fusion Accelerated Processing Unit Delorme, Michael Christopher 18 March 2013 We explore efficient parallel radix sort for the AMD Fusion Accelerated Processing Unit (APU). Two challenges arise: efficiently partitioning data between the CPU and GPU, and allocating data in memory regions. Our coarse-grained implementation utilizes both the GPU and CPU by sharing data at the beginning and end of the sort. Our fine-grained implementation utilizes the APU’s integrated memory system to share data throughout the sort. Both implementations outperform the current state-of-the-art GPU radix sort from NVIDIA. We therefore demonstrate that the CPU can be used efficiently to speed up radix sort on the APU. Our fine-grained implementation slightly outperforms our coarse-grained implementation, which demonstrates the benefit of the APU’s integrated architecture. This performance benefit is hindered by limitations in the APU’s architecture and programming model. We believe that the performance benefits will increase once these limitations are addressed in future generations of the APU. Parallel sorting Radix sort Heterogeneous computing GPU GPGPU AMD Fusion Llano APU Accelerated Processing Unit OpenCL Fusion Sort GPU computing 0984
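The coarse-grained scheme in entry 81 (share data only at the beginning and end of the sort) can be illustrated with a minimal Python sketch. This is not the thesis's OpenCL implementation: both "devices" are simulated by ordinary function calls, and the names `radix_sort`, `coarse_grained_sort`, and the `gpu_share` split ratio are hypothetical choices made for illustration.

```python
from heapq import merge


def radix_sort(keys, bits_per_pass=8, key_bits=32):
    """LSD radix sort for non-negative integers below 2**key_bits."""
    mask = (1 << bits_per_pass) - 1
    for shift in range(0, key_bits, bits_per_pass):
        # Stable bucketing on the current digit, least significant first.
        buckets = [[] for _ in range(mask + 1)]
        for k in keys:
            buckets[(k >> shift) & mask].append(k)
        keys = [k for bucket in buckets for k in bucket]
    return keys


def coarse_grained_sort(keys, gpu_share=0.75):
    """Split the input once, sort each part independently, merge once.

    gpu_share is an assumed tuning parameter: the fraction of keys
    handed to the (simulated) GPU; the rest goes to the (simulated) CPU.
    """
    split = int(len(keys) * gpu_share)
    gpu_part = radix_sort(keys[:split])  # stand-in for the GPU partition
    cpu_part = radix_sort(keys[split:])  # stand-in for the CPU partition
    # Data is shared again only here, at the end of the sort.
    return list(merge(gpu_part, cpu_part))
```

A fine-grained variant would instead exchange data between the two processors during each radix pass, which this sketch does not attempt to model.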
82	Efficient Document Image Binarization using Heterogeneous Computing and Interactive Machine Learning Westphal, Florian January 2018 Large collections of historical document images have been collected by companies and government institutions for decades. More recently, these collections have been made available to a larger public via the Internet. However, to make accessing them truly useful, the contained images need to be made readable and searchable. One step in that direction is document image binarization, the separation of text foreground from page background. This separation makes the text shown in the document images easier to process, both for humans and for other image processing algorithms. While binarization algorithms that work reasonably well exist, it is not sufficient merely to separate foreground from background well. The separation also has to be achieved efficiently, both in terms of execution time and in terms of the training data used by machine-learning-based methods. This is necessary to make binarization not only theoretically possible, but also practically viable. In this thesis, we explore different ways to achieve efficient binarization in terms of execution time by improving the implementation and the algorithm of a state-of-the-art binarization method. We find that parameter prediction and mapping the algorithm onto the graphics processing unit (GPU) both help to improve its execution performance. Furthermore, we propose a binarization algorithm based on recurrent neural networks and evaluate the choice of its design parameters with respect to their impact on execution time and binarization quality. Here, we identify a trade-off between binarization quality and execution performance based on the algorithm’s footprint size and show that dynamically weighted training loss tends to improve the binarization quality. 
Lastly, we address the problem of training data efficiency by evaluating the use of interactive machine learning for reducing the required amount of training data for our recurrent neural network based method. We show that user feedback can help to achieve better binarization quality with less training data and that visualized uncertainty helps to guide users to give more relevant feedback. / Scalable resource-efficient systems for big data analytics image binarization heterogeneous computing recurrent neural networks interactive machine learning historical documents Computer Engineering Datorteknik Computer Sciences Datavetenskap (datalogi)
83	Design and Implementation of the Heterogeneous Computing Device Management Architecture Schultek, Brian Robert January 2014 No description available. Electrical Engineering Computer Engineering Heterogeneous Computing Hardware Acceleration Algorithm Acceleration PCIe Device Management High Throughput Applications
84	Construção de mosaico de imagens aéreas em plataformas heterogêneas para aplicações agrícolas / Construction of aerial imagery mosaics on heterogeneous platforms for agricultural applications Candido, Leandro Rosendo 29 March 2019 Precision agriculture has added considerable value for farmers because of the technologies linked to it. Systems that extract information from digital images are widely used to help farmers make decisions that increase their productivity. One monitoring technique is the construction of an aerial image mosaic from images captured by aircraft flying at low altitude. This technique may take tens of hours to complete, depending on the configuration of the computer that runs it. To reduce construction time and make it feasible to embed the application on board, this thesis presents a simplified way to build the aerial image mosaic based on direct georeferencing, using heterogeneous computing to accelerate performance. The approach consists of only three techniques that also form part of the classic approach to building mosaics (warping, feature extraction, and feature matching), and it additionally feeds the data provided by the GPS and IMU sensors into its calculations in order to orient and position each image in the set that will form the mosaic. The heterogeneous computing platform used in this work is the NVIDIA Jetson TK1, chosen because it provides a GPU that supports the CUDA programming language. With this approach, the lack of perspective correction of the image content (geometry) produces an unexpected result, because the data provided by the IMU, contrary to what one might expect, serve only to correct the position of the GPS coordinates recorded at the moment each image in the mosaic was captured. The execution time of the developed application is satisfactory, making adoption of this approach feasible. 
Aerial image mosaic Precision agriculture Bundle adjustment Heterogeneous computing CUDA Direct georeferencing GPS IMU NVIDIA Jetson TK1 Triangulation
85	A framework for efficient execution on GPU and CPU+GPU systems / Framework pour une exécution efficace sur systèmes GPU et CPU+GPU Dollinger, Jean-François 01 July 2015 Technological limitations faced by semiconductor manufacturers in the early 2000s halted the rapid growth in performance of sequential computation units. The current trend is to increase the number of processor cores per socket and to make progressively greater use of GPU cards for highly parallel computations. The complexity of recent architectures makes it difficult to statically predict the performance of a program. We describe a reliable and accurate method for predicting the execution time of parallel loop nests on GPUs, based on three stages: static code generation, offline profiling, and online prediction. In addition, we present two techniques to fully exploit the computing resources available on a system. The first technique jointly uses the CPU and GPU to execute a code. To preserve performance, load balance must be considered, in particular by predicting execution times. The runtime uses the profiling results, and a scheduler computes execution times and adjusts the load distributed to the processors. The second technique puts the CPU and GPU in competition: instances of the target code are executed simultaneously on the CPU and GPU. The winner of the competition notifies the other instance of its completion, causing the latter to terminate.
GPGPU Polyhedral model Performance prediction Adaptive code selection Heterogeneous computing Scheduling CPU vs GPU CPU + GPU 004.2 004.3
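The load-balancing idea in entry 85 (adjusting the load given to each processor from predicted execution times) can be sketched as follows. This is a simplified illustration and not the framework's actual scheduler: it assumes a hypothetical function `balance_load` and a single predicted cost per iteration for each processor, and it gives each processor a share of iterations inversely proportional to that cost, so that predicted finish times come out roughly equal.

```python
from fractions import Fraction


def balance_load(total_iterations, predicted_time_per_iteration):
    """Split iterations so that predicted finish times are about equal.

    predicted_time_per_iteration maps a processor name to its predicted
    cost per iteration (any consistent time unit). Both the mapping and
    the processor names are assumptions made for this sketch.
    """
    # A processor's speed is the inverse of its predicted cost.
    speeds = {name: 1 / Fraction(t)
              for name, t in predicted_time_per_iteration.items()}
    total_speed = sum(speeds.values())
    # Each share is proportional to speed, rounded down to whole iterations.
    shares = {name: int(total_iterations * s / total_speed)
              for name, s in speeds.items()}
    # Iterations lost to rounding go to the fastest processor.
    fastest = max(speeds, key=speeds.get)
    shares[fastest] += total_iterations - sum(shares.values())
    return shares


# A GPU predicted to be four times faster per iteration than the CPU.
print(balance_load(1000, {"cpu": 4, "gpu": 1}))  # {'cpu': 200, 'gpu': 800}
```

Exact fractions are used instead of floats so that the rounding step behaves predictably; a real runtime would also have to account for transfer costs, which this sketch ignores.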
86	A Trusted Autonomic Architecture to Safeguard Cyber-Physical Control Leaf Nodes and Protect Process Integrity Chiluvuri, Nayana Teja 16 September 2015 Cyber-physical systems are networked through IT infrastructure and are susceptible to malware. Threats targeting process control are much more safety-critical than those targeting traditional computing systems, since they jeopardize the integrity of physical infrastructure. Existing defence mechanisms address security at the network nodes but do not protect the physical infrastructure if network integrity is compromised. An interface guardian architecture is implemented on cyber-physical control leaf nodes to maintain process integrity by enforcing high-level safety and stability policies. Preemptive detection schemes are implemented to monitor process behavior and anticipate malicious activity before process safety and stability are compromised. Autonomic properties are employed to automatically protect process integrity by initiating switch-over to a verified backup controller. Subsystems adhere to strict trust requirements that safeguard them from adversarial intrusion. The preemptive detection schemes, switch-over logic, backup controller, and process communication are all trusted components that are separated from the untrusted production controller. The proposed architecture is applied to a rotary inverted pendulum experiment and implemented on a Xilinx Zynq-7000 configurable SoC. The leaf node implementation is integrated into a cyber-physical control topology. Simulated attack scenarios show strengthened resilience to both network integrity and reconfiguration attacks. Threats attempting to disrupt process behavior are successfully thwarted by having a backup controller maintain process stability. The system ensures both safety and liveness properties even under adversarial conditions. 
/ Master of Science Process control systems cyber-physical systems autonomic systems programmable logic controller remote terminal unit human-machine interface Field programmable gate arrays trust configurable system-on-chip heterogeneous computing high-level synthesis
87	New Computational Methods for 3D Structure Determination of Macromolecular Complexes by Single Particle Cryo-Electron Microscopy / Methodische Entwicklungen in der Bildverarbeitung kryoelektronenmikroskopischer Aufnahmen und deren Anwendung in der Strukturbestimmung biologischer Makromoleküle Schmeißer, Martin 17 April 2009 No description available. 500 Natural sciences, general Mathematics and Computer Science Parallel processing GPU processing Distributed heterogeneous computing Non-dedicated systems Multicore Performance Cluster computing Electron microscopy RA 000: General natural sciences
