  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
291

Optimizations for Deep Learning-Based CT Image Enhancement

Chaturvedi, Ayush 04 March 2024 (has links)
Computed tomography (CT) combined with deep learning (DL) has recently shown great potential in biomedical imaging. Complex DL models with varying architectures inspired by the human brain are improving imaging software and aiding diagnosis. However, the accuracy of these DL models depends heavily on the datasets used for training, which often contain low-quality CT images from low-dose CT (LDCT) scans. Moreover, in contrast to the neural architecture of the human brain, today's DL models are dense and complex, resulting in a significant computational footprint. In this work, we therefore propose sparse optimizations to minimize the complexity of DL models and leverage architecture-aware optimizations to reduce their total training time. To that end, we use a DL model called DenseNet and Deconvolution Network (DDNet), which enhances LDCT chest images into high-quality (HQ) ones but requires many hours to train. To further improve the quality of the final HQ images, we first modify DDNet's architecture with a more robust multi-level VGG (ML-VGG) loss function to achieve state-of-the-art CT image enhancement; however, the improved loss function increases the computational cost. Hence, we introduce sparse optimizations to reduce the complexity of the improved DL model and then propose architecture-aware optimizations to utilize the underlying computing hardware efficiently and reduce the overall training time. Finally, we evaluate our techniques for performance and accuracy using state-of-the-art hardware resources. / Master of Science / Deep learning (DL) techniques that leverage computed tomography (CT) are becoming omnipresent in diagnosing diseases and abnormalities in different parts of the human body. However, their diagnostic accuracy depends directly on the quality of the CT images used to train the DL models, which is largely governed by the radiation dose of the X-ray in the CT scanner. DL-based techniques show promising improvements in the quality of low-dose CT (LDCT) images, but they require substantial computational resources and time to train. Therefore, in this work, we incorporate algorithmic techniques inspired by the sparse neural architecture of the human brain to reduce the complexity of such DL models. To that end, we leverage a DL model called DenseNet and Deconvolution Network (DDNet), which enhances low-dose CT images into high-quality ones but, owing to its architecture, takes hours to train even on state-of-the-art hardware. Hence, we propose techniques that utilize the hardware resources efficiently and reduce the time required to train DDNet, and we evaluate the efficacy of our techniques on modern supercomputers in terms of speed and accuracy.
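One common way to realize sparse optimizations of the kind described above is magnitude-based weight pruning, which zeroes the smallest weights of a layer. The sketch below is a minimal, framework-free illustration of the idea; the function name and the 50% sparsity target are invented for the example and are not taken from the thesis.

```python
# Illustrative magnitude pruning: zero out the smallest fraction of the
# weights in a layer, a common way to sparsify a dense DL model.
# (Hypothetical sketch; the thesis's actual sparse optimizations may differ.)

def prune_by_magnitude(weights, sparsity):
    """Return a copy of `weights` with the smallest `sparsity` fraction zeroed."""
    flat = sorted(abs(w) for row in weights for w in row)
    k = int(len(flat) * sparsity)              # number of weights to remove
    threshold = flat[k - 1] if k > 0 else -1.0  # ties at the threshold may drop extra weights
    return [[0.0 if abs(w) <= threshold else w for w in row] for row in weights]

weights = [[0.9, -0.05, 0.3],
           [0.01, -0.7, 0.2]]
pruned = prune_by_magnitude(weights, 0.5)  # drop the 3 smallest of 6 weights
```

In a real training loop the same rule would be applied to each layer's weight tensor, with the surviving sparsity pattern exploited by sparse kernels.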
292

Addressing Challenges in Utilizing GPUs for Accelerating Privacy-Preserving Computation

Yudha, Ardhi Wiratama Baskara 01 January 2024 (has links) (PDF)
Cloud computing increasingly handles confidential data, as in private inference and private database queries. Two strategies are used for secure computation: (1) employing CPU Trusted Execution Environments (TEEs) such as AMD SEV, Intel SGX, or ARM TrustZone, and (2) utilizing emerging cryptographic methods such as Fully Homomorphic Encryption (FHE) with libraries like HElib, Microsoft SEAL, and PALISADE. GPUs are often employed to accelerate such computation, but using them for secure computation introduces challenges that we address in three works. In the first work, we tackle GPU acceleration for secure computation with CPU TEEs. While TEEs perform computations on confidential data, extending their capabilities to GPUs is essential for leveraging their power. Existing approaches assume co-designed CPU-GPU setups, but we contend that such co-design is difficult to achieve, as it requires early coordination between CPU and GPU manufacturers. To address this, we propose software-based memory encryption for CPU-GPU TEEs, realized in the software layer. This, however, introduces issues due to AES's 128-bit encryption granularity; we present optimizations that mitigate these problems, resulting in execution time overheads of 1.1% for regular applications and 56% for irregular ones. In the second work, we focus on GPU acceleration for the CPU FHE library HElib, particularly for comparison operations on encrypted data. These operations are vital in machine learning, image processing, and private database queries, yet their acceleration is often overlooked. We extend HElib to harness GPU acceleration for its resource-intensive components, such as BluesteinNTT, BluesteinFFT, and element-wise operations, and we employ several optimizations to address the challenges of memory separation, dynamic allocation, and parallelization. With all optimizations and hybrid CPU-GPU parallelism, we achieve an 11.1× average speedup over the state-of-the-art CPU FHE library. In our latest work, we concentrate on minimizing the ciphertext size, leveraging insights from algorithms, data access patterns, and application requirements to reduce the operational footprint of an FHE application, particularly targeting neural network inference. By implementing all three levels of ciphertext compression (precision reduction in comparisons, optimization of access patterns, and adjustments in data layout), we achieve a 5.6× speedup over 100x, the state-of-the-art GPU implementation. Overcoming these challenges is crucial for achieving significant GPU-driven performance improvements, and this dissertation provides solutions to these hurdles, aiming to facilitate GPU-based acceleration of confidential data computation.
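The 128-bit granularity issue mentioned above can be illustrated in a few lines: when memory is encrypted in 16-byte blocks, even a 4-byte store forces a decrypt-modify-re-encrypt of the entire enclosing block. The sketch below stands in a toy SHA-256 keystream for AES-CTR and uses invented helper names; it is a conceptual model of the overhead, not the proposed TEE mechanism.

```python
import hashlib

BLOCK = 16  # AES-style 128-bit encryption granularity

def keystream(key, block_idx):
    # Toy keystream standing in for AES-CTR; NOT cryptographically sound.
    return hashlib.sha256(key + block_idx.to_bytes(8, "little")).digest()[:BLOCK]

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def encrypt(key, data):
    # XOR stream cipher: calling encrypt() on ciphertext decrypts it.
    out = b""
    for i in range(0, len(data), BLOCK):
        out += xor(data[i:i + BLOCK], keystream(key, i // BLOCK))
    return out

def write_4_bytes(key, ciphertext, offset, new_bytes):
    """A 4-byte store forces a 16-byte read-modify-write of its block."""
    b = offset // BLOCK
    block = xor(ciphertext[b * BLOCK:(b + 1) * BLOCK], keystream(key, b))  # decrypt
    block = block[:offset % BLOCK] + new_bytes + block[offset % BLOCK + 4:]
    reenc = xor(block, keystream(key, b))                                  # re-encrypt
    return ciphertext[:b * BLOCK] + reenc + ciphertext[(b + 1) * BLOCK:]

key = b"secret"
data = bytes(range(32))                      # two 16-byte blocks
ct = encrypt(key, data)
ct = write_4_bytes(key, ct, 4, b"\xff" * 4)  # touches only block 0
```

Irregular access patterns hit this read-modify-write path far more often than regular ones, which is consistent with the asymmetric overheads reported above.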
293

Implementierung des Genom-Alignments auf modernen hochparallelen Plattformen / Implementing Genome Alignment Algorithms on Highly Parallel Platforms

Knodel, Oliver 26 March 2014 (has links) (PDF)
Advances in DNA sequencing devices have raised their throughput to the point where they deliver millions of short nucleotide sequences within a few days. Finding the positions of these sequences in databases of known sequences is one of the most important tasks in modern molecular biology, yet the current algorithms and programs that can process such data volumes in acceptable time find only a fraction of the positions. This thesis investigates the porting of modern genome alignment programs to highly parallel platforms, namely FPGA and GPU. The programs and algorithms currently adapted to the problem are reviewed and analyzed with regard to their parallelizability on both platforms. After an evaluation of the alternatives, one algorithm is chosen, and its implementation on both platforms is designed and realized, with search speed, the number of positions found, and usability as the main criteria. The reduced Smith & Waterman algorithm implemented on the GPU is efficiently adapted to the problem and achieves higher speeds for short sequences than previous GPU realizations. A comparable implementation on the FPGA requires a considerably shorter runtime, likewise finds every position in the database, and reaches speeds similar to modern high-performance programs that work heuristically. On both FPGA and GPU, the number of positions found is thus more than twice that of all comparable programs.
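For reference, the textbook Smith & Waterman recurrence that the thesis's reduced variant builds on can be sketched as follows; the scoring parameters are illustrative defaults, not the values used in the implementation.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Plain Smith & Waterman local alignment score (dynamic programming).
    The thesis uses a reduced variant tuned for short reads; this is the
    textbook form, for illustration only."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are floored at 0 so alignments can
            # restart anywhere, which is what makes the search exhaustive.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best

score = smith_waterman("ACGTACGT", "CGTA")  # "CGTA" matches exactly once
```

The GPU and FPGA versions parallelize exactly this dynamic-programming table, typically along its anti-diagonals, which is why they can guarantee every position is found, unlike heuristic seed-and-extend mappers.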
294

Throughput-oriented analytical models for performance estimation on programmable hardware accelerators / Analyse de performance potentielle d'une simulation de QCD sur réseau sur processeur Cell et GPU

Lai, Junjie 15 February 2013 (has links)
In this thesis, we have worked mainly on two topics in GPU (Graphics Processing Unit) performance analysis. First, we developed an analytical method and a timing estimation tool (TEG) to predict the performance of CUDA applications running on GT200-generation GPUs; TEG predicts performance with accuracy approaching that of cycle-accurate tools. Second, we developed an approach to estimate the performance upper bound of a GPU application, based on application analysis and assembly-code-level benchmarking. With the upper bound of an application, we know how much optimization headroom is left and can decide how much optimization effort to invest; the analysis also reveals which parameters are critical to performance.
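A coarse way to bound GPU performance, in the spirit of (but much simpler than) the assembly-level analysis above, is the roofline model: attainable throughput is the minimum of peak compute and peak bandwidth times the kernel's arithmetic intensity. The figures below are hypothetical GT200-class numbers used only for the example.

```python
# Minimal roofline-style upper bound. (Illustrative only; the thesis
# derives tighter bounds from assembly-level benchmarking.)

def roofline_gflops(peak_gflops, peak_gbps, flops, bytes_moved):
    intensity = flops / bytes_moved               # flops per byte
    return min(peak_gflops, peak_gbps * intensity)

# Hypothetical GT200-class figures: ~936 GFLOP/s peak, ~141.7 GB/s bandwidth.
# A kernel doing 2 GFLOP over 4 GB of traffic is memory bound.
bound = roofline_gflops(936.0, 141.7, flops=2e9, bytes_moved=4e9)
```

Comparing a kernel's measured throughput against such a bound tells the developer whether further optimization can pay off at all.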
295

Modélisation et rendu temps-réel de milieux participants à l'aide du GPU / Modeling and real-time rendering of participating media using the GPU

Giroud, Anthony 18 December 2012 (has links)
This thesis deals with modeling, illuminating, and rendering participating media in real time on the GPU. In a first part, we develop a method to render heterogeneous layers of fog for outdoor scenes. The medium is modeled horizontally in a 2D basis of Haar or linear/quadratic B-spline functions, whose coefficients can be loaded from a fogmap, i.e. a grayscale density image. To give the fog its vertical thickness, a coefficient parameterizes how quickly the density attenuates as the altitude above the medium increases. To prepare the rendering step, we apply a wavelet transform to the fog's density map, extracting a coarse approximation (B-spline basis) and a series of detail layers (B-spline wavelet bases) ordered by frequency; summed together, they reconstitute the original density map. Each of these 2D function bases can be viewed as a grid of coefficients. At rendering time on the GPU, each grid is traversed step by step, cell by cell, from the viewer's position to the nearest solid surface. Thanks to the separation of detail frequencies during precomputation, we can optimize the rendering by visualizing only the details that contribute most to the final image, aborting the grid traversal at a distance that depends on the grid's frequency. We then present further work on the same type of fog: the use of the wavelet transform to represent its density in a non-uniform grid, the automatic generation of density maps and their animation based on Julia fractals, and a first step toward real-time single-scattering illumination of the fog, in which we simulate shadows cast by the medium and by the geometry. In a second part, we deal with modeling, illuminating, and rendering fully 3D single-scattering sampled media such as smoke (without physical simulation) on the GPU. Our method is inspired by light propagation volumes (LPV), a technique originally intended only to propagate fully diffuse indirect lighting after a first bounce on the geometry. We adapt it to direct lighting and to the single-scattering illumination of both surfaces and participating media. The medium is provided as a set of radial basis functions (blobs) and is then transformed into a set of voxels, together with the solid surfaces, so that both can be handled in a common representation. By analogy to the LPV, we introduce an occlusion propagation volume, which we use to compute the integral of the optical density between each light source and every other cell containing a voxel generated either from the medium or from a surface. This step is integrated into the rendering loop, which allows the participating media and the light sources to be animated without any further constraint. We simulate all types of shadows: cast by the medium or by surfaces, projected onto the medium or onto surfaces.
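One level of the wavelet decomposition applied to the density map can be sketched as below, using the Haar basis for brevity (the thesis also uses B-spline wavelets); the LL block is the coarse approximation, and LH/HL/HH are the detail layers that, added back, reconstitute the original map.

```python
def haar_1d(row):
    """One level of the (unnormalized) Haar transform: averages + details."""
    avg = [(row[i] + row[i + 1]) / 2 for i in range(0, len(row), 2)]
    det = [(row[i] - row[i + 1]) / 2 for i in range(0, len(row), 2)]
    return avg, det

def haar_2d_level(grid):
    """Split a density map into a coarse approximation (LL) and detail
    coefficients (LH, HL, HH) by transforming rows, then columns."""
    rows = [haar_1d(r) for r in grid]
    lo = [r[0] for r in rows]   # row averages
    hi = [r[1] for r in rows]   # row details

    def transform_columns(m):
        cols = list(zip(*m))                       # transpose to columns
        pairs = [haar_1d(list(c)) for c in cols]
        back = lambda cs: [list(r) for r in zip(*cs)]  # transpose back
        return back([p[0] for p in pairs]), back([p[1] for p in pairs])

    LL, LH = transform_columns(lo)
    HL, HH = transform_columns(hi)
    return LL, LH, HL, HH

LL, LH, HL, HH = haar_2d_level([[4, 2], [2, 0]])
```

Repeating the step on LL yields the frequency-ordered hierarchy that lets the renderer discard low-contribution detail grids early during ray marching.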
296

Contribution à la définition, à l'optimisation et à l'implantation d'IP de traitement du signal et des données en temps réel sur des cibles programmables / Contribution to the definition, optimization and implementation of signal processing IPs on programmable target

Ouerhani, Yousri 16 November 2012 (has links)
Despite the success that optical implementations of image-processing applications have enjoyed, optical information processing attracts less interest today than in the 1980s and 1990s, owing to the bulk of optical setups, the quality of the processed images, and the cost of optical components; optical realizations have also struggled against the rise of digital circuits. In this context, the main objective of this thesis is to propose a digital implementation of optical image- and signal-processing methods. To achieve this, we opted for FPGA (Field Programmable Gate Array) and GPU (Graphics Processing Unit) devices, a choice justified by their high performance in terms of speed. In addition, to improve productivity, we focused on the reuse of predesigned blocks, or IP ("Intellectual Property") cores. Although existing commercial IP cores are optimized, they are usually proprietary and tied to the board family used. The first contribution is an optimized IP core for computing the Fourier transform (FFT) and the cosine transform (DCT); these two transforms were chosen because of their widespread use in pattern recognition and compression algorithms, respectively. The second contribution is the validation of the proposed IP cores on a test and measurement bench. The last contribution is the design of FPGA and GPU implementations of pattern recognition and compression applications. One of the convincing results of this thesis is a proposed FFT IP core three times faster than the Xilinx FFT IP, able to perform 4700 correlations per second.
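The FFT-based correlation underlying such optical-style pattern recognition can be sketched as follows; this is a minimal textbook radix-2 FFT and circular cross-correlation, not the optimized IP core proposed in the thesis.

```python
import cmath

def fft(x, inv=False):
    """Minimal radix-2 Cooley-Tukey FFT (length must be a power of two)."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inv else -1
    even, odd = fft(x[0::2], inv), fft(x[1::2], inv)
    out = [0] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def correlate(a, b):
    """Circular cross-correlation via the FFT: corr = IFFT(FFT(a) * conj(FFT(b))).
    This is the pattern behind FFT-based (optical-style) correlators."""
    n = len(a)
    A, B = fft(a), fft(b)
    C = [x * y.conjugate() for x, y in zip(A, B)]
    return [v.real / n for v in fft(C, inv=True)]  # 1/n completes the inverse

# The correlation peak's index reveals the shift between the two signals.
c = correlate([1, 0, 0, 0], [0, 1, 0, 0])
```

A hardware IP core replaces the recursion with a pipelined butterfly network, but the dataflow is the same.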
297

[en] VISUALIZATION OF ARBITRARY CROSS SECTION OF UNSTRUCTURED MESHES / [pt] VISUALIZAÇÃO DE SEÇÕES DE CORTE ARBITRÁRIAS DE MALHAS NÃO ESTRUTURADAS

BERNARDO BIANCHI FRANCESCHIN 13 January 2015 (has links)
[en] For the visualization of scalar fields in volume data, the use of cross sections is an effective technique to inspect the field variation inside the domain. The technique consists in mapping, onto the cross-section surface, a colormap that represents the scalar field on the surface-volume intersection. In this work, we propose an efficient method for mapping scalar fields of unstructured meshes on arbitrary cross sections. It is a direct-rendering method (the intersection of the surface with the model is not extracted) that uses the GPU to ensure good performance. The basic idea is to use the graphics rasterizer to generate the fragments of the cross-section surface and to compute the intersection of each fragment with the model on the GPU. This requires testing the location of each fragment with respect to the unstructured mesh efficiently. As acceleration data structures, we tested three variations of regular grids to store the elements (cells) of the mesh; each element is represented by the list of its face planes, easing the in-out test between fragments and elements. Once the element containing a fragment is determined, procedures are applied to interpolate the scalar field and to check whether the fragment is close to the element boundary, in order to draw the mesh wireframe on the cross-section surface. The results obtained demonstrate the effectiveness and efficiency of the proposed method.
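The fragment-element in-out test built on face-plane lists can be sketched as below: a point is inside a convex cell iff it lies on the inner side of every face plane. The unit-cube cell and the sign convention (outward normals, n·p + d ≤ 0 inside) are assumptions chosen for the example.

```python
def inside_cell(point, face_planes, eps=1e-9):
    """A fragment lies inside a convex cell iff it is on the inner side of
    every face plane: n·p + d <= 0 with outward-facing normals (n, d)."""
    return all(
        nx * point[0] + ny * point[1] + nz * point[2] + d <= eps
        for (nx, ny, nz, d) in face_planes
    )

# Unit cube [0,1]^3 described by six outward-facing planes (nx, ny, nz, d).
cube = [(-1, 0, 0, 0), (1, 0, 0, -1),   # x = 0 and x = 1 faces
        (0, -1, 0, 0), (0, 1, 0, -1),   # y faces
        (0, 0, -1, 0), (0, 0, 1, -1)]   # z faces
```

On the GPU the same dot products are evaluated per fragment against the candidate cells returned by the regular-grid lookup, and the per-face distances double as the proximity measure used to draw the wireframe.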
298

Desenvolvimento de um software de Monte Carlo para transporte de fótons em estruturas de voxels usando unidades de processamento gráfico / Development of a GPU Monte Carlo software for photon transport in voxel structures

Bellezzo, Murillo 26 June 2014 (has links)
As the most accurate method for estimating absorbed dose in radiotherapy, the Monte Carlo method (MCM) has been widely used in radiotherapy treatment planning; nevertheless, its efficiency can be improved for routine clinical applications. This master's thesis presents CUBMC, a GPU-based Monte Carlo photon-transport code for dose calculation, developed on the CUDA (Compute Unified Device Architecture) platform. The simulation of physical events is based on the algorithm used in PENELOPE, and the cross-section tables are generated by the MATERIAL routine, also part of the PENELOPE code. Photons are transported in voxel-based phantoms with different material compositions. Two distinct approaches are used for the transport simulation: the first forces the photon to stop at every voxel boundary, while the second is the Woodcock method, in which the photon ignores the existence of boundaries and travels in a homogeneous fictitious medium. CUBMC aims to be an alternative Monte Carlo simulation code that, by exploiting the parallel processing capability of graphics processing units (GPUs), delivers high-performance simulations on compact, low-cost machines, so that it can be applied to clinical cases and incorporated into radiotherapy treatment planning systems.
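The Woodcock method mentioned above can be sketched in a few lines: the photon samples free flights against a majorant cross section and accepts a collision with probability μ(x)/μ_max, so voxel boundaries never have to be crossed explicitly. The function names and the uniform test medium below are illustrative, not taken from CUBMC.

```python
import math
import random

def woodcock_distance(mu_max, mu_at, rng):
    """Sample a free-flight distance by Woodcock (delta) tracking: fly with
    the majorant cross section mu_max and accept a collision with probability
    mu(x)/mu_max, treating the rest as virtual collisions. In a voxel phantom,
    mu_at(x) would look up the attenuation of the voxel containing x."""
    x = 0.0
    while True:
        x += -math.log(1.0 - rng.random()) / mu_max  # exponential step
        if rng.random() < mu_at(x) / mu_max:          # real vs. virtual collision
            return x

# Sanity check on a uniform medium: mean free path should approach 1/mu_max.
rng = random.Random(1)
mu_max = 2.0
samples = [woodcock_distance(mu_max, lambda x: mu_max, rng) for _ in range(20000)]
mean_path = sum(samples) / len(samples)
```

Because every thread runs the same boundary-free loop, this formulation maps well onto the GPU's SIMT execution model, at the cost of wasted virtual collisions in low-density voxels.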
299

Segmentação e reconhecimento de gestos em tempo real com câmeras e aceleração gráfica / Real-time segmentation and gesture recognition with cameras and graphical acceleration

Dantas, Daniel Oliveira 15 March 2010 (has links)
Our aim in this work is to recognize gestures in real time using only cameras, without markers, special clothes, or any other kind of sensor. The capture environment is simple to set up, requiring just two cameras and a computer. The background must be static and contrast with the user. The absence of markers or special clothes makes locating the user's limbs harder. The motivation of this thesis is to create a virtual reality environment for goalkeeper training that makes it possible to correct errors of movement, positioning, and choice of defense method, but the technique can be applied to any activity involving gestures or body movements. Gesture recognition starts with background subtraction to detect the image region occupied by the user. Within the foreground, we locate the most salient regions as candidates for body extremities, that is, hands, feet, and head. Each extremity found receives a label indicating the body part it is likely to represent, and a vector with the extremity coordinates is built. To classify the user's pose, this vector is compared against stored key poses and the best match is selected. The final step is temporal classification, that is, recognition of the gesture itself. The developed technique is robust, working well even when the system is trained with one user and applied to data from another.
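The key-pose matching step can be sketched as a nearest-neighbor search over extremity-coordinate vectors; the pose names and coordinates below are invented for the example, and the vectors are 2D (x, y per extremity) and truncated to two extremities for brevity.

```python
import math

def classify_pose(extremities, keyposes):
    """Match a vector of extremity coordinates against stored key poses by
    Euclidean distance and return the best-matching pose name. The pose
    vocabulary here is hypothetical, not the thesis's actual set."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(keyposes, key=lambda name: dist(extremities, keyposes[name]))

# Invented key poses: [hand_x, hand_y, foot_x, foot_y].
keyposes = {
    "dive_left":  [-2.0, 1.0, -1.5, 0.5],
    "dive_right": [2.0, 1.0, 1.5, 0.5],
    "standing":   [0.0, 2.0, 0.0, 0.0],
}
pose = classify_pose([1.8, 1.1, 1.4, 0.6], keyposes)
```

The temporal classification stage then operates on the resulting sequence of pose labels rather than on raw coordinates, which is what makes the recognizer tolerant of users it was not trained on.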
300

Gestion dynamique du parallélisme dans les architectures multi-cœurs pour applications mobiles / Dynamic parallelism adaptation in multicore architectures for mobile applications

Texier, Matthieu 08 December 2014 (has links)
Le nombre de smartphones vendus a récemment dépassé celui des ordinateurs. Ces appareils tendent à regrouper de plus en plus de fonctions, ceci grâce à des applications de plus en plus variées telles que la vidéo conférence, la réalité augmentée, ou encore les jeux vidéo. Le support de ces applications est assuré par des ressources de calculs hétérogènes qui sont spécifiques aux différents types de traitements et qui respectent les performances requises et les contraintes de consommation du système. Les applications graphiques, telles que les jeux vidéo, sont par exemple accélérées par un processeur graphique. Cependant les applications deviennent de plus en plus complexes. Une application de réalité augmentée va par exemple nécessiter du traitement d'image, du rendu graphique et un traitement des informations à afficher. Cette complexité induit souvent une variation de la charge de travail qui impacte les performances et donc les besoins en puissance de calcul de l'application. Ainsi, la parallélisation de l'application, généralement prévue pour une certaine charge, devient inappropriée. Ceci induit un gaspillage des ressources de calcul qui pourraient être exploitées par d'autres applications ou par d'autres étages de l'application. Un pipeline de rendu graphique a été choisi comme cas d'utilisation car c'est une application dynamique et qui est de plus en plus répandu dans les appareils mobiles. Cette application a été implémentée et parallélisée sur un simulateur d'architecture multi-cœurs. Un profilage a confirmé l'aspect dynamique, le temps de calcul de chaque donnée ainsi que le nombre d'objets à calculer variant de manière significative dans le temps et que la meilleure répartition du parallélisme évolue en fonction de la scène rendue. Ceci nous a amenés à définir un système permettant d'adapter, au fil de l'exécution, le parallélisme d'une application en fonction d'une prédiction faite de ses besoins. 
Choosing a new parallelism requires knowing the computing-power needs of the different stages, obtained by monitoring the data transfers between the application's stages. Adapting the parallelism then implies a new distribution of tasks according to the needs of the different stages, which is carried out by a central controller. The system was implemented in a Timed-TLM (TTLM) simulator to estimate the performance gains enabled by the dynamic adaptation. An architecture capable of accelerating different types of applications, both general-purpose and graphics, was defined and compared with other multicore architectures, and its hardware cost was quantified. For a hardware support whose complexity is below 1.5% of the complete design, we demonstrate performance gains of up to 20% over certain static deployments, as well as the ability to dynamically manage a variable number of computing resources. / Smartphone sales recently surpassed desktop computer sales. This is mainly due to the smart integration of many functionalities into the same architecture, and to the wide variety of supported applications such as augmented reality, video conferencing, and video games. These applications are supported by heterogeneous computing resources specialized for each application type, allowing the required performance and power consumption to be met. For example, multimedia applications are accelerated by hardware modules for video encoding and decoding, and 3D rendering for video games is accelerated by specialized processors (GPUs). However, applications are becoming more and more complicated. As an example, augmented reality requires image processing, 3D rendering, and computing the information to display.
This complexity often comes with a variation of the computing load, which dynamically changes the application's performance requirements. When the application is implemented in parallel, a parallelism chosen for a specific workload becomes inefficient for a different one. This leads to a waste of computing resources, and our objective is to optimize the usage of all available computing resources at runtime. The selected use case is a graphics rendering pipeline, because it is a dynamic application that is also widely used in mobile devices. This application has been implemented and parallelized on a multicore architecture simulator. Profiling confirms the application's dynamicity: the computation time and the amount of data to process vary over time, and the best balance of parallelism depends on the rendered scene, so dynamic load balancing is required for this application. These studies led us to define a system that dynamically adapts the application's parallelism based on a prediction of its computing requirements, which can be performed by monitoring the data exchanges between the application's tasks. The new parallelism is then calculated for each stage by a central controller that manages the whole application. This system has been implemented in a Timed-TLM simulator in order to estimate the performance improvements enabled by the dynamic adaptation. An architecture for accelerating mobile applications, both general-purpose and 3D, has been defined and compared to other multicore architectures; its hardware complexity and performance have also been estimated. For an added hardware complexity lower than 1.5%, we demonstrate performance improvements of up to 20% compared with static parallelizations, and we also demonstrate the ability to support a variable amount of resources.
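The central controller's job — redistributing a fixed pool of computing resources across pipeline stages according to their predicted needs — can be sketched as a simple proportional allocation. This is a minimal illustrative sketch, not the thesis's implementation: the function name, the load metric, and the one-worker-per-stage floor are all assumptions.

```python
# Hypothetical sketch: redistribute a fixed pool of workers across
# pipeline stages in proportion to each stage's observed/predicted load.
def rebalance(loads, total_workers):
    """loads: predicted load per stage; returns a worker count per stage,
    guaranteeing at least one worker per stage."""
    total_load = sum(loads) or 1  # avoid division by zero on an idle pipeline
    # Every stage must keep making progress, so reserve one worker each.
    alloc = [1] * len(loads)
    remaining = total_workers - len(loads)
    # Hand out the remaining workers proportionally to load (floored).
    for i, load in enumerate(loads):
        alloc[i] += int(load / total_load * remaining)
    # Give any workers lost to rounding to the most loaded stages first.
    leftover = total_workers - sum(alloc)
    for i in sorted(range(len(loads)), key=lambda i: loads[i], reverse=True)[:leftover]:
        alloc[i] += 1
    return alloc
```

For example, with 8 workers and stage loads of 10, 30, and 60 items, the heaviest stage receives most of the pool while the lightest keeps its guaranteed worker. A real controller would also account for task-migration cost before acting on a new allocation.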
