Global ETD Search

1	Parallel implementation of fuzzy artmap on a hypercube (iPSC) Malkani, Anil January 1992 (has links) No description available. parallel implementation fuzzy artmap hypercube (iPSC)
2	Parallel implementation of template matching on hypercube array processors Chai, Sin-Kuo January 1989 (has links) No description available. Parallel Implementation Template Matching Hypercube Array Processors
3	Parallel implementation of the split and merge algorithm on the hypercube machine Lakshman, Prabhashankar January 1989 (has links) No description available. Parallel Implementation Split and Merge Algorithm Hypercube
4	A Parallel FPGA Implementation of Image Convolution Ström, Henrik January 2016 (has links) Image convolution is a common algorithm found in most graphics editors. It filters an image by multiplying pixel values by the coefficients of a filter kernel and summing the products. Previous research has implemented this algorithm on different platforms, such as FPGAs, CUDA, and C, and compared the performance of those implementations. FPGA implementations have almost always used a single convolution. The goal of this thesis was to investigate and present one possible way to implement the algorithm with 16 parallel convolutions on a Xilinx Spartan 6 LX9 FPGA and then compare its performance with results from previous work. The final system performs better than multi-threaded implementations on both a GPU and a CPU. FPGA Image Convolution Parallel Implementation VHDL Computer Engineering Datorteknik
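As a rough illustration of the multiply-and-add filtering described in entry 4, the following is a minimal sequential sketch in Python. It is not the thesis's VHDL design: the function name `convolve2d`, the list-of-lists image layout, and the "valid" border handling (no padding) are all assumptions made for this example.

```python
def convolve2d(image, kernel):
    """Valid-mode 2D convolution of a list-of-lists image with a small kernel.

    Hypothetical helper for illustration only; borders are not padded,
    so the output is smaller than the input.
    """
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    # Flipping the kernel in both axes is what distinguishes
    # convolution from plain correlation.
    flipped = [row[::-1] for row in kernel[::-1]]
    out = []
    for y in range(ih - kh + 1):
        row = []
        for x in range(iw - kw + 1):
            acc = 0
            # Multiply each pixel under the window by its coefficient and sum.
            for j in range(kh):
                for i in range(kw):
                    acc += image[y + j][x + i] * flipped[j][i]
            row.append(acc)
        out.append(row)
    return out
```

Each output pixel depends only on its own window of input pixels, which is why several convolutions can run side by side in hardware.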
5	Fast 3D Deformable Image Registration on a GPU Computing Platform Mousazadeh, Mohammad Hamed 10 1900 (has links) Image registration has become an indispensable tool in medical diagnosis and intervention. The increasing need for speed and accuracy in clinical applications has motivated researchers to focus on developing fast and reliable registration algorithms. In particular, advanced deformable registration routines are emerging for medical applications involving soft-tissue organs such as the brain, breast, kidney, liver, and prostate. The computational complexity of such algorithms is significantly higher than that of conventional rigid and affine methods, leading to substantial increases in execution time. In this thesis, we present a parallel implementation of a newly developed deformable image registration algorithm by Marami et al. [1] using the Compute Unified Device Architecture (CUDA). The focus of this study is on accelerating the computations on a Graphics Processing Unit (GPU) to reduce the execution time to nearly real time for diagnostic and interventional applications. The algorithm co-registers preoperative and intraoperative 3-dimensional magnetic resonance (MR) images of a deforming organ. It employs a linear elastic dynamic finite-element model of the deformation and distance measures such as mutual information and the sum of squared differences (SSD) to align volumetric image data sets. In this study, we report a parallel implementation of the algorithm for 3D-3D MR registration based on SSD on a CUDA-capable NVIDIA GTX 480 GPU. Computationally expensive tasks such as interpolation, displacement, and force calculation are significantly accelerated using the GPU. Experiments carried out with a realistic breast tissue phantom show a 37-fold speedup for the GPU-based implementation compared with an optimized CPU-based implementation in high-resolution MR image registration.
The CPU is a 3.20 GHz Intel Core i5 650 processor with 4 GB of RAM that also hosts the GTX 480 GPU. This GPU has 15 streaming multiprocessors, each with 32 streaming processors, for a total of 480 cores. The GPU implementation registers 3D-3D high-resolution (512×512×136) image sets in just over 2 seconds, compared to 1.38 and 23.25 minutes for the CPU and MATLAB-based implementations, respectively. Most GPU kernels employed in the 3D-3D registration algorithm can also be used to accelerate the 2D-3D registration algorithm in [1]. / Master of Applied Science (MASc) GPU Parallel Implementation Image Registration CUDA Computer Engineering
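The sum of squared differences (SSD) named in entry 5 can be sketched in a few lines of Python. This is only the similarity measure, shown on a toy 1D signal with a rigid integer shift; the thesis's finite-element deformation model and CUDA kernels are not reproduced here. The names `ssd` and `best_shift`, and the use of the mean over the overlap, are assumptions of this example.

```python
def ssd(fixed, moving):
    """Sum of squared intensity differences between two equally sized signals."""
    if len(fixed) != len(moving):
        raise ValueError("signals must have the same length")
    return sum((f - m) ** 2 for f, m in zip(fixed, moving))


def best_shift(fixed, moving, max_shift):
    """Return the integer shift s that best aligns moving[i + s] with fixed[i].

    The cost is the SSD over the overlapping samples divided by the overlap
    length, so that small overlaps are not favoured. Assumes max_shift is
    smaller than the signal length.
    """
    best_s, best_cost = None, None
    for s in range(-max_shift, max_shift + 1):
        pairs = [(fixed[i], moving[i + s])
                 for i in range(len(fixed)) if 0 <= i + s < len(moving)]
        cost = ssd([f for f, _ in pairs], [m for _, m in pairs]) / len(pairs)
        if best_cost is None or cost < best_cost:
            best_s, best_cost = s, cost
    return best_s
```

Every candidate shift can be scored independently of the others, which is the kind of per-voxel, per-candidate work that maps well onto a GPU.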
6	Real-time Stereo To Multi-view Video Conversion Cigla, Cevahir 01 July 2012 (has links) (PDF) A novel and efficient methodology is presented for the conversion of stereo to multi-view video in order to address the 3D content requirements of next-generation 3D-TVs and auto-stereoscopic multi-view displays. There are two main algorithmic blocks in such a conversion system: stereo matching and virtual view rendering, which enable the extraction of 3D information from stereo video and the synthesis of nonexistent virtual views, respectively. In the intermediate steps of these functional blocks, a novel edge-preserving filter is proposed that recursively constructs connected support regions for each pixel among neighboring pixels of similar color. The proposed recursive update structure eliminates the dependency of conventional approaches on a pre-defined window, providing complete content adaptability with quite low computational complexity. Extensive tests show that the proposed filtering technique yields results better than or competitive with some leading techniques in the literature. The proposed filter is mainly applied in stereo matching to aggregate cost functions, and it also handles occlusions, which enables high-quality disparity maps for the stereo pairs. Similar to the box filter paradigm, this novel technique matches arbitrarily shaped regions in constant time. Based on Middlebury benchmarking, the proposed technique is currently the best local matching technique in the literature in terms of both precision and complexity. Next, virtual view synthesis is conducted through depth-image-based rendering, in which the reference color views of the left and right pairs are warped to the desired virtual view using the estimated disparity maps. A feedback mechanism based on disparity error is introduced at this step to remove salient distortions for the sake of visual quality.
Furthermore, the proposed edge-aware filter is reused to assign proper texture to holes and occluded regions during view synthesis. The efficiency of the proposed scheme is validated by a real-time implementation on a special graphics card that enables parallel computing. Based on extensive experiments on stereo matching and virtual view rendering, the proposed method yields fast execution, low memory requirements, and high-quality outputs, with superior performance compared to most state-of-the-art techniques.
7	Résolution du problème du p-médian, application à la restructuration de bases de données semi-structurées / Resolution of the p-median problem : application to restructuring semi-structured data Gay, Jean-Christophe 19 October 2011 (has links) The problems considered in this thesis are combinatorial in nature. Our main interest is the problem of restructuring semi-structured data (approximate typing); data stored as an XML file, for example, are semi-structured. This problem can be reduced to an instance of the p-median problem. The main obstacle is the size of the instances, which can become very large: some have 10000 or 20000 nodes, which implies several hundred million variables. For these instances, solving even the linear relaxation is very difficult. In preliminary experiments we observed that CPLEX can solve instances with 1000 nodes in reasonable time, but for instances with 5000 nodes it can take up to 14 days to solve the linear relaxation alone. We therefore cannot use methods that treat solving the linear relaxation as an elementary operation, such as branch-and-cut methods. Instead of CPLEX we use a parallel implementation (on 32 processors) of the Volume algorithm. The instance for which CPLEX needs 14 hours is solved in 24 minutes by the sequential implementation and in 10 minutes by the parallel implementation of the Volume algorithm. The solution of the linear relaxation is then used to build a feasible solution by applying a greedy construction heuristic followed by a local search.
Our results are comparable to those of the best heuristics known to date, which use much more memory and perform many more operations. Memory matters in our case because we work with very large data sets. We study the dominant of the polytope associated with the p-median problem, and discuss its linear relaxation and its polyhedral characterization. Finally, we consider a more realistic version of the problem of restructuring semi-structured data: roughly speaking, we add new nodes to the original p-median problem if they help reduce the overall assignment cost. P-median Volume algorithm Linear relaxation Parallel implementation Semi-structured database
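To make the "greedy construction followed by local search" step of entry 7 concrete, here is a minimal Python sketch for a small p-median instance given as a full distance matrix. It is an assumption-laden illustration, not the thesis's method: it starts from scratch rather than from a linear-relaxation solution, and the names `assignment_cost`, `greedy_p_median`, and `local_search` are hypothetical.

```python
def assignment_cost(dist, medians):
    """Total cost when every node is assigned to its closest median."""
    return sum(min(row[m] for m in medians) for row in dist)


def greedy_p_median(dist, p):
    """Open p medians one at a time, each time taking the cheapest addition."""
    medians = []
    for _ in range(p):
        candidates = [j for j in range(len(dist)) if j not in medians]
        best = min(candidates,
                   key=lambda j: assignment_cost(dist, medians + [j]))
        medians.append(best)
    return medians, assignment_cost(dist, medians)


def local_search(dist, medians):
    """Swap one open median for one closed node while the cost decreases."""
    medians = list(medians)
    best = assignment_cost(dist, medians)
    improved = True
    while improved:
        improved = False
        for i in range(len(medians)):
            for j in range(len(dist)):
                if j in medians:
                    continue
                trial = medians[:i] + [j] + medians[i + 1:]
                cost = assignment_cost(dist, trial)
                if cost < best:
                    medians, best, improved = trial, cost, True
    return medians, best
```

On six points placed at 0, 1, 2, 10, 11, 12 on a line with p = 2, the greedy pass opens the nodes at positions 2 and 11 (cost 5), and the swap-based local search then improves this to positions 1 and 11 (cost 4). Storing the full distance matrix is only workable for small instances; at the sizes the abstract mentions it would itself be a memory problem.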
8	Detekce deformovatelného pole markerů / Detection of Deformable Marker Field Schery, Miroslav January 2013 (has links) This thesis focuses on the study of augmented reality and the creation of an algorithm for a uniform marker field detector. The marker field is modified to tolerate a high degree of deformation. Existing marker types are studied. An important part of the thesis is a description of the uniform marker field technique, from which a modified assignment is derived. The thesis also describes the CUDA architecture, on which the first part of the detection algorithm is implemented. Deformation tolerance, detection rate, and speed tests are performed on the resulting detector.
9	Numerical Quality and High Performance In Interval Linear Algebra on Multi-Core Processors / Algèbre linéaire d'intervalles - Qualité Numérique et Hautes Performances sur Processeurs Multi-Cœurs Theveny, Philippe 31 October 2014 (has links) This work compares algorithms for multiplying matrices with interval coefficients, and their implementations, in order to determine the suitable scope of each. The first axis is the measurement of numerical quality. Previous error analyses of interval matrix products only establish a bound on the overestimation of the radius of the result and neglect the errors due to floating-point computation. After examining several possible measures of the approximation error between two intervals, we bound the roundoff error and integrate it into the global error. Using random data sets, we compare this bound experimentally with the distribution of the global error, which clarifies the relative importance of the method (arithmetic) and roundoff errors as a function of the value and homogeneity of the relative accuracies of the inputs, the matrix dimensions, and the working precision. This approach leads to a new algorithm that is cheaper yet just as accurate as previous ones under well-identified conditions.
The second axis is exploiting the parallelism of the operations. Previous implementations reduce to products of floating-point matrices through calls to BLAS routines. We show that this may lead to wrong interval results and also restricts scalability as the core count increases. To overcome these problems, we propose a blocked implementation with OpenMP threads executing block kernels that use vector instructions. Timings on a machine with four octo-core processors show that the costs of numerical and interval matrix products of the same dimension are of the same order of magnitude, and that the blocked implementation scales better than the implementation based on multiple BLAS calls. Numerical linear algebra Matrix multiplication Parallel implementation Multi-core processors Shared memory Floating-point number Error analysis Interval arithmetic
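For readers unfamiliar with the interval matrix product discussed in entry 9, the following Python sketch shows the textbook endpoint-based definition. It is an illustration under stated assumptions, not one of the algorithms compared in the thesis: intervals are plain `(lo, hi)` tuples, the helper names `iadd`, `imul`, and `interval_matmul` are hypothetical, and no directed rounding is applied, so with floating-point inputs the result is not guaranteed to enclose the exact product.

```python
def iadd(a, b):
    """Sum of two intervals given as (lo, hi) tuples."""
    return (a[0] + b[0], a[1] + b[1])


def imul(a, b):
    """Product of two intervals: the hull of the four endpoint products."""
    products = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return (min(products), max(products))


def interval_matmul(A, B):
    """Naive triple-loop product of two matrices of (lo, hi) intervals.

    Caveat: a rigorous version must round lower bounds down and upper
    bounds up; this sketch is exact only for inputs such as small
    integers whose sums and products are exactly representable.
    """
    n, k, m = len(A), len(B), len(B[0])
    C = [[(0, 0)] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = (0, 0)
            for t in range(k):
                acc = iadd(acc, imul(A[i][t], B[t][j]))
            C[i][j] = acc
    return C
```

For example, multiplying the row `[(1, 2), (-1, 1)]` by the column `[(3, 4)], [(2, 5)]` gives `(3, 8) + (-5, 5) = (-2, 13)`.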
10	Contrer l'attaque Simple Power Analysis efficacement dans les applications de la cryptographie asymétrique, algorithmes et implantations / Thwart simple power analysis efficiently in asymmetric cryptographic applications, algorithms and implementations Robert, Jean-Marc 08 December 2015 (has links) With the development of online communications and the Internet, the exchange of encrypted data has grown rapidly. This growth has been made possible by the development of asymmetric cryptographic protocols, which rely on arithmetic computations such as modular exponentiation of large integers or elliptic curve scalar multiplication. These computations are performed on a variety of platforms, from smart cards to large and powerful servers. The platforms are subject to attacks that exploit information leaked through side channels, such as the instantaneous power consumption or the electromagnetic radiation of the running platform.
In this thesis, we improve the performance of cryptographic operations resistant to Simple Power Analysis. For modular exponentiation, we propose using optimized multiple modular multiplications that share a common operand. For elliptic curve scalar multiplication, we propose three improvements: over binary fields, we use improved combined operations AB,AC and AB+CD in the Double-and-add, Halve-and-add, and Double/halve-and-add approaches and in the Montgomery ladder; over binary fields, we propose a parallel Montgomery ladder; and we implement a parallel approach based on the Right-to-left Double-and-add algorithm over binary and prime fields, and extend it to Halve-and-add and Double/halve-and-add over binary fields. Cryptography Asymmetric protocol Finite field arithmetic Modular exponentiation Elliptic curve scalar multiplication Combined operation Parallel implementation Montgomery's binary ladder 005.8
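The Montgomery ladder mentioned in entry 10 is easiest to see on modular exponentiation. The Python sketch below is a generic textbook form, not the thesis's implementation, and the function name `montgomery_ladder_pow` is hypothetical. It only shows the regular operation pattern (one multiplication and one squaring per exponent bit, whatever the bit value); Python integers and branches are not constant-time, so this code offers no real side-channel protection.

```python
def montgomery_ladder_pow(base, exponent, modulus):
    """Compute base**exponent mod modulus with a Montgomery ladder.

    Invariant after each step: r1 == r0 * base (mod modulus).
    Every bit costs exactly one multiplication and one squaring,
    which is the regularity that hinders Simple Power Analysis.
    """
    if exponent < 0:
        raise ValueError("exponent must be non-negative")
    r0, r1 = 1 % modulus, base % modulus
    for bit in bin(exponent)[2:]:  # most significant bit first
        if bit == "1":
            r0, r1 = (r0 * r1) % modulus, (r1 * r1) % modulus
        else:
            r0, r1 = (r0 * r0) % modulus, (r0 * r1) % modulus
    return r0
```

The result can be checked against Python's built-in three-argument `pow`, for instance `montgomery_ladder_pow(7, 560, 561) == pow(7, 560, 561)`.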
