81 |
Coordinated system level resource management for heterogeneous many-core platforms
Gupta, Vishakha (24 August 2011)
A challenge posed by future computer architectures is the efficient exploitation of their many and sometimes
heterogeneous computational cores. This challenge is exacerbated by the multiple facilities for data movement
and sharing across cores resident on such platforms. To answer the question of how systems software should treat heterogeneous
resources, this dissertation describes an approach that (1) creates a common manageable pool for all the
resources present in the platform, and then (2) provides virtual machines (VMs) with multiple 'personalities',
flexibly mapped to and efficiently run on the heterogeneous underlying hardware. A VM's personality is its execution
context on the different types of available processing resources usable by the VM. We provide mechanisms for
making such platforms manageable and evaluate coordinated scheduling policies for mapping different VM personalities on
heterogeneous hardware.
Towards that end, this dissertation contributes technologies that include
(1) restructuring hypervisor and system functions to create high performance environments that enable flexibility
of execution and data sharing,
(2) scheduling and other resource management infrastructure for supporting diverse application needs and
heterogeneous platform characteristics, and
(3) hypervisor level policies to permit efficient and coordinated resource usage and sharing.
Experimental evaluations on multiple heterogeneous platforms, such as one comprising x86-based cores with attached
NVIDIA accelerators and others with asymmetric elements on chip,
demonstrate the utility of the approach and its ability to efficiently host diverse applications
and resource management methods.
|
82 |
Performance and energy efficiency via an adaptive MorphCore architecture
Khubaib (09 July 2014)
The level of Thread-Level Parallelism (TLP), Instruction-Level Parallelism (ILP), and Memory-Level Parallelism (MLP) varies across programs and across program phases. Hence, every program requires different underlying core microarchitecture resources for high performance and/or energy efficiency. Current core microarchitectures are inefficient because they are fixed at design time and do not adapt to variable TLP, ILP, or MLP. I show that if a core microarchitecture can adapt to the variation in TLP, ILP, and MLP, significantly higher performance and/or energy efficiency can be achieved. I propose MorphCore, a low-overhead adaptive microarchitecture built from a traditional OOO core with small changes. MorphCore adapts to TLP by operating in two modes: (a) as a wide-width large-OOO-window core when TLP is low and ILP is high, and (b) as a high-performance low-energy highly-threaded in-order SMT core when TLP is high. MorphCore adapts to ILP and MLP by varying the superscalar width and the out-of-order (OOO) window size, operating in four modes: (1) as a wide-width large-OOO-window core, (2) as a wide-width medium-OOO-window core, (3) as a medium-width large-OOO-window core, and (4) as a medium-width medium-OOO-window core. My evaluation with single-thread and multi-thread benchmarks shows that when highest single-thread performance is desired, MorphCore achieves performance similar to a traditional out-of-order core. When energy efficiency is desired on single-thread programs, MorphCore reduces energy by up to 15% (on average 8%) over an out-of-order core. When high multi-thread performance is desired, MorphCore increases performance by 21% and reduces energy consumption by 20% over an out-of-order core. Thus, for multi-thread programs, MorphCore's energy efficiency is similar to highly-threaded throughput-optimized small and medium core architectures, and its performance is two-thirds of their potential. / text
|
83 |
Parallel Sorting on the Heterogeneous AMD Fusion Accelerated Processing Unit
Delorme, Michael Christopher (18 March 2013)
We explore efficient parallel radix sort for the AMD Fusion Accelerated Processing Unit (APU). Two challenges arise: efficiently partitioning data between the CPU and GPU, and allocating data in memory regions. Our coarse-grained implementation utilizes both the GPU and CPU by sharing data at the beginning and end of the sort. Our fine-grained implementation utilizes the APU’s integrated memory system to share data throughout the sort. Both of these implementations outperform the current state-of-the-art GPU radix sort from NVIDIA. We therefore demonstrate that the CPU can be efficiently used to speed up radix sort on the APU.
Our fine-grained implementation slightly outperforms our coarse-grained implementation. This demonstrates the benefit of the APU’s integrated architecture. This performance benefit is hindered by limitations in the APU’s architecture and programming model. We believe that the performance benefits will increase once these limitations are addressed in future generations of the APU.
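The coarse-grained scheme described above (share data only at the beginning and end of the sort) can be sketched as follows. This is a minimal illustration, not the thesis implementation: both "devices" are simulated by ordinary Python function calls, and the names `radix_sort`, `coarse_grained_sort`, and the `gpu_fraction` split ratio are assumptions made for the example.

```python
from heapq import merge


def radix_sort(keys, bits_per_pass=8, key_bits=32):
    """Least-significant-digit radix sort for non-negative integers < 2**key_bits."""
    radix = 1 << bits_per_pass
    mask = radix - 1
    for shift in range(0, key_bits, bits_per_pass):
        # Stable bucket pass on the current digit.
        buckets = [[] for _ in range(radix)]
        for k in keys:
            buckets[(k >> shift) & mask].append(k)
        keys = [k for bucket in buckets for k in bucket]
    return keys


def coarse_grained_sort(keys, gpu_fraction=0.75):
    """Partition once, sort each part independently, merge once at the end.

    gpu_fraction is a hypothetical tuning knob: the share of keys handed
    to the faster device.
    """
    split = int(len(keys) * gpu_fraction)
    gpu_part = radix_sort(keys[:split])  # stand-in for the GPU's share
    cpu_part = radix_sort(keys[split:])  # stand-in for the CPU's share
    return list(merge(gpu_part, cpu_part))
```

A fine-grained variant would instead exchange data between the two devices during the passes, which is where the abstract locates the benefit of the APU's integrated memory system.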
|
84 |
Efficient Document Image Binarization using Heterogeneous Computing and Interactive Machine Learning
Westphal, Florian (January 2018)
Large collections of historical document images have been assembled by companies and government institutions for decades. More recently, these collections have been made available to a larger public via the Internet. However, to make accessing them truly useful, the contained images need to be made readable and searchable. One step in that direction is document image binarization, the separation of text foreground from page background. This separation makes the text shown in the document images easier for humans and image processing algorithms alike to process. While reasonably well-working binarization algorithms exist, it is not sufficient merely to perform the separation of foreground and background well. This separation also has to be achieved in an efficient manner, in terms of execution time, but also in terms of training data used by machine learning based methods. This is necessary to make binarization not only theoretically possible, but also practically viable. In this thesis, we explore different ways to achieve efficient binarization in terms of execution time by improving the implementation and the algorithm of a state-of-the-art binarization method. We find that both parameter prediction and mapping the algorithm onto the graphics processing unit (GPU) help to improve its execution performance. Furthermore, we propose a binarization algorithm based on recurrent neural networks and evaluate the choice of its design parameters with respect to their impact on execution time and binarization quality. Here, we identify a trade-off between binarization quality and execution performance based on the algorithm’s footprint size and show that dynamically weighted training loss tends to improve the binarization quality. Lastly, we address the problem of training data efficiency by evaluating the use of interactive machine learning for reducing the required amount of training data for our recurrent neural network based method. 
We show that user feedback can help to achieve better binarization quality with less training data and that visualized uncertainty helps to guide users to give more relevant feedback. / Scalable resource-efficient systems for big data analytics
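To make concrete what binarization (separating text foreground from page background) produces, here is a small sketch using Otsu's global threshold. Otsu's method is swapped in purely for illustration; it is not the state-of-the-art method or the recurrent-network algorithm the thesis studies. The function names and the convention that dark pixels are text are assumptions of the example.

```python
def otsu_threshold(pixels):
    """Return the grey level (0..255) that maximizes between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue  # no pixels at or below t yet
        w_fg = total - w_bg
        if w_fg == 0:
            break  # every pixel is at or below t
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        between_var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize(image):
    """Map a 2-D list of grey levels to 1 (text, assumed dark) or 0 (background)."""
    t = otsu_threshold([p for row in image for p in row])
    return [[1 if p <= t else 0 for p in row] for row in image]
```

A single global threshold like this tends to struggle on degraded historical pages with uneven illumination or stains, which is one reason more elaborate, and more expensive, methods are used in practice.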
|
85 |
Design and Implementation of the Heterogeneous Computing Device Management Architecture
Schultek, Brian Robert (January 2014)
No description available.
|
86 |
Construção de mosaico de imagens aéreas em plataformas heterogêneas para aplicações agrícolas / Construction of aerial imagery mosaic on platforms for agricultural applications
Candido, Leandro Rosendo (29 March 2019)
Precision agriculture has added considerable value for farmers thanks to the technologies linked to it. Systems that extract information from digital images are widely used to help farmers make decisions that increase their productivity. One monitoring technique is the construction of a mosaic of aerial images taken by aircraft flying at low altitude. This technique can take tens of hours to complete, depending on the configuration of the computer running it. To reduce this construction time and make it possible to run the application on board, this work presents a simplified way to build the aerial image mosaic based on direct georeferencing, which uses heterogeneous computing to accelerate performance. The approach consists of only three techniques that also make up the classic approach to building mosaics (warping, feature extraction, and feature matching), and it additionally feeds the data provided by the GPS and IMU sensors into its calculations in order to orient and position each image in the set that will form the mosaic. The heterogeneous computing platform used in this work is the NVIDIA Jetson TK1, chosen because it provides a GPU that supports the CUDA programming language. With this approach, the lack of correction of the perspective of the image content (geometry) produces an unexpected result, because the data provided by the IMU, contrary to what one might expect, serve only to correct the position of the GPS coordinates recorded at the moment each image in the mosaic was captured. The execution time of the developed application is satisfactory, making the adoption of this approach feasible.
|
87 |
A framework for efficient execution on GPU and CPU+GPU systems / Framework pour une exécution efficace sur systèmes GPU et CPU+GPU
Dollinger, Jean-François (01 July 2015)
Technological barriers faced by semiconductor manufacturers in the early 2000s ended the rapid growth in performance of sequential computation units. The current trend is to increase the number of processor cores per socket and to progressively use GPU cards for highly parallel computations. The complexity of recent architectures makes it difficult to statically predict the performance of a program. We describe a reliable and accurate method for predicting the execution time of parallel loop nests on GPUs, based on three stages: static code generation, offline profiling, and online prediction. In addition, we present two techniques to fully exploit the computing resources available on a system. The first technique consists in jointly using the CPU and GPU to execute a code. To achieve higher performance, it is necessary to consider load balance, in particular by predicting execution time. The runtime uses the profiling results, and the scheduler computes the execution times and adjusts the load distributed to the processors. The second technique puts the CPU and GPU in competition: instances of the considered code are executed simultaneously on the CPU and GPU. The winner of the competition notifies the other instance of its completion, causing the latter to terminate.
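The two techniques in this abstract can be sketched in a few lines. This is an illustrative sketch only: the devices are simulated with Python threads, and `split_by_predicted_time`, `race`, `make_summer`, and the per-item time parameters are hypothetical names, not part of the framework described in the thesis.

```python
import threading


def split_by_predicted_time(n_items, t_cpu, t_gpu):
    """Technique 1: split work so both devices are predicted to finish together.

    t_cpu and t_gpu are predicted seconds per item (assumed to come from
    profiling). Solves cpu_items * t_cpu == gpu_items * t_gpu.
    """
    gpu_items = round(n_items * t_cpu / (t_cpu + t_gpu))
    return n_items - gpu_items, gpu_items


def race(workers):
    """Technique 2: run every worker on the same job; the first to finish wins.

    Each worker is a callable taking a threading.Event and returning a
    result, or None if it saw the stop signal and gave up.
    """
    stop = threading.Event()
    lock = threading.Lock()
    outcome = {}

    def run(name, fn):
        result = fn(stop)
        with lock:
            if result is not None and "winner" not in outcome:
                outcome["winner"], outcome["result"] = name, result
                stop.set()  # notify the other instance so it terminates

    threads = [threading.Thread(target=run, args=item) for item in workers.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcome["winner"], outcome["result"]


def make_summer(data, chunk):
    """Build a cancellable worker that sums data chunk by chunk."""
    def work(stop):
        total = 0
        for i in range(0, len(data), chunk):
            if stop.is_set():
                return None  # the other instance already finished
            total += sum(data[i:i + chunk])
        return total
    return work
```

For example, `race({"cpu": make_summer(data, 10), "gpu": make_summer(data, 1000)})` returns the name of whichever instance completed first together with the sum; which one wins depends on thread scheduling. The competition approach duplicates work, so it trades wasted computation for never having to predict which device is faster.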
|
88 |
A Trusted Autonomic Architecture to Safeguard Cyber-Physical Control Leaf Nodes and Protect Process Integrity
Chiluvuri, Nayana Teja (16 September 2015)
Cyber-physical systems are networked through IT infrastructure and are susceptible to malware. Threats targeting process control are much more safety-critical than those targeting traditional computing systems, since they jeopardize the integrity of physical infrastructure. Existing defence mechanisms address security at the network nodes but do not protect the physical infrastructure if network integrity is compromised. An interface guardian architecture is implemented on cyber-physical control leaf nodes to maintain process integrity by enforcing high-level safety and stability policies.
Preemptive detection schemes are implemented to monitor process behavior and anticipate malicious activity before process safety and stability are compromised. Autonomic properties are employed to automatically protect process integrity by initiating switch-over to a verified backup controller. Subsystems adhere to strict trust requirements safeguarding them from adversarial intrusion. The preemptive detection schemes, switch-over logic, backup controller, and process communication are all trusted components that are separated from the untrusted production controller.
The proposed architecture is applied to a rotary inverted pendulum experiment and implemented on a Xilinx Zynq-7000 configurable SoC. The leaf node implementation is integrated into a cyber-physical control topology. Simulated attack scenarios show strengthened resilience to both network integrity and reconfiguration attacks. Threats attempting to disrupt process behavior are successfully thwarted by having a backup controller maintain process stability. The system ensures both safety and liveness properties even under adversarial conditions. / Master of Science
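The guardian's switch-over behaviour can be sketched as below. This is a toy model under stated assumptions, not the architecture from the thesis: the one-step linear prediction, the `angle_limit` safety bound, the latching behaviour, and the `InterfaceGuardian` class name are all hypothetical, and the real system runs on a Xilinx Zynq-7000 rather than in Python.

```python
class InterfaceGuardian:
    """Sits between the controllers and the process; trusted component."""

    def __init__(self, production, backup, angle_limit):
        self.production = production    # untrusted production controller
        self.backup = backup            # verified backup controller
        self.angle_limit = angle_limit  # assumed safety bound, in radians
        self.using_backup = False

    def step(self, angle, velocity, dt):
        """Return the command to apply to the process for this sample."""
        # Preemptive check: estimate where the pendulum will be one step
        # ahead and act before the bound is actually crossed.
        predicted = angle + velocity * dt
        if abs(predicted) > self.angle_limit:
            self.using_backup = True  # latch: never hand control back here
        controller = self.backup if self.using_backup else self.production
        return controller(angle, velocity)
```

Latching the switch-over is a deliberate choice in this sketch: once the production controller has driven the process toward an unsafe state, its later outputs are not trusted again without outside intervention.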
|
89 |
NEW COMPUTATIONAL METHODS FOR 3D STRUCTURE DETERMINATION OF MACROMOLECULAR COMPLEXES BY SINGLE PARTICLE CRYO-ELECTRON MICROSCOPY / Methodische Entwicklungen in der Bildverarbeitung kryoelektronenmikroskopischer Aufnahmen und deren Anwendung in der Strukturbestimmung biologischer Makromoleküle
Schmeißer, Martin (17 April 2009)
No description available.
|