Global ETD Search

371	Algoritmy grafiky a video v GP-GPU / Graphics and Video Algorithms in GP-GPU Kula, Michal January 2013 This diploma thesis focuses on object detection using general-purpose computing on graphics processing units. It explains how graphics adapters work and the basics of their architecture. Building on this, it demonstrates how to work effectively with libraries for general-purpose computing on graphics processing units. The thesis then surveys the available object detection algorithms and identifies which of them can be parallelized effectively. It concludes by comparing object detection speeds against common implementations on conventional processors.
372	Bayesian iterative reconstruction methods for 3D X-ray Computed Tomography / Méthodes bayésiennes de reconstruction itérative pour la tomographie 3D à rayons X Chapdelaine, Camille 12 April 2019 In industry, 3D X-ray computed tomography aims to image a part virtually in order to inspect its interior. The virtual volume is obtained by a reconstruction algorithm that takes as input the projections of X-rays sent through the part. These projections carry many uncertainties caused by uncontrolled phenomena such as scattering and beam hardening, which produce artifacts in conventional filtered backprojection reconstructions. To compensate for these uncertainties, iterative reconstruction methods enforce a prior model on the volume to reconstruct; combined with the information provided by the projections, this improves reconstruction quality. In this context, this thesis proposes new iterative reconstruction methods for the inspection of aeronautical parts made by the SAFRAN group. Because the acquisition process is modeled by many repeated projection and backprojection operations, iterative reconstruction methods can be accelerated by highly parallel computing on graphics processing units (GPUs). The thesis details GPU implementations of several projector-backprojector pairs and, in particular, proposes a new GPU implementation of the matched Separable Footprint pair.
Since many of SAFRAN's industrial parts can be seen as piecewise-constant volumes, a Gauss-Markov-Potts prior model is introduced, from which a joint reconstruction and segmentation algorithm is derived. This algorithm is based on a Bayesian approach, which makes it possible to explain the role of each parameter. The polychromatic nature of X-rays, which is responsible for scattering and beam hardening, is taken into account through an error-splitting forward model that separates the uncertainties on the projections. Combined with a Gauss-Markov-Potts prior on the volume, this new forward model is shown experimentally to improve accuracy and robustness. Finally, the uncertainties on the reconstruction are estimated with a variational Bayesian approach. It is shown that a matched projector-backprojector pair is necessary to obtain this estimate in a reasonable computation time.
Tomographie à rayons X Reconstruction 3D Inférence bayésienne Calcul parallèle Gpu X-Ray computed tomography 3D Reconstruction Bayesian inference Parallel computing Gpu
373	Implementierung des Genom-Alignments auf modernen hochparallelen Plattformen Knodel, Oliver 28 June 2011 Advances in DNA sequencing devices have raised their throughput to the point where they deliver millions of short nucleotide sequences within a few days. Finding the positions of these sequences in databases of known sequences is one of the most important tasks in modern molecular biology. Current heuristic algorithms and programs, which can process the resulting volumes of data in acceptable time, find only a fraction of these positions. This thesis investigates how modern genome alignment programs can be ported to massively parallel platforms such as FPGAs and GPUs. The programs and algorithms currently adapted to the problem are reviewed and analyzed with regard to how well they can be parallelized on both platforms. After the alternatives are evaluated, one algorithm is selected, and its port to both platforms is designed and implemented, with search speed, the number of positions found, and usability as the main concerns. The reduced Smith & Waterman algorithm implemented on the GPU is efficiently adapted to the problem and, for short sequences, reaches higher speeds than previous implementations on graphics cards. A comparable implementation on the FPGA has a much lower runtime, also finds every position in the database, and reaches speeds similar to those of modern high-performance programs that work heuristically. The number of positions found on the FPGA and the GPU is thus more than twice that found by comparable programs.
info:eu-repo/classification/ddc/004 ddc:004
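For orientation, the Smith & Waterman algorithm named in entry 373 is a dynamic-programming method for local sequence alignment. The sketch below is a plain sequential Python version, not the reduced GPU or FPGA variant described in the thesis. The function name, the linear gap penalty, and the scoring values (match +2, mismatch -1, gap -1) are illustrative assumptions.

```python
def smith_waterman(query, reference, match=2, mismatch=-1, gap=-1):
    """Return (best_score, end_index) of the best local alignment.

    end_index is the 1-based position in `reference` where the best
    alignment ends. Scoring values are illustrative assumptions.
    """
    rows, cols = len(query) + 1, len(reference) + 1
    # H[i][j] holds the best score of any local alignment that ends
    # at query[i-1] and reference[j-1]; row 0 and column 0 stay at 0.
    H = [[0] * cols for _ in range(rows)]
    best_score, best_end = 0, 0
    for i in range(1, rows):
        for j in range(1, cols):
            same = query[i - 1] == reference[j - 1]
            diag = H[i - 1][j - 1] + (match if same else mismatch)
            up = H[i - 1][j] + gap      # gap in the reference
            left = H[i][j - 1] + gap    # gap in the query
            # Clamping at 0 is what makes the alignment local.
            H[i][j] = max(0, diag, up, left)
            if H[i][j] > best_score:
                best_score, best_end = H[i][j], j
    return best_score, best_end


if __name__ == "__main__":
    print(smith_waterman("ACGT", "TTACGTTT"))
```

Each cell depends only on its upper, left, and upper-left neighbors, so all cells on one anti-diagonal are independent of each other. That property is commonly what parallel implementations exploit.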
374	Tolkning av handskrivna siffror i formulär : Betydelsen av datauppsättningens storlek vid maskininlärning Kirik, Engin January 2021 This study investigates how much the size of the dataset affects results in object recognition. To this end, a computer vision model was trained to identify handwritten digits on physical forms and convert them to digital format. Two frameworks, TensorFlow and PyTorch, were used. Training was carried out in two environments: one model was trained in a CPU environment and the other in the Google Cloud GPU environment. The study aims to improve on the results of a previous degree project and to extend that work with a model that identifies and digitizes several handwritten digits simultaneously on a complete form. Such a model could later be used in applications that, for example, add up the points on a form using a mobile camera for recognition. The project achieved error-free recognition of several digits at the same time as the dataset was continually expanded. For individual digits, all digits from 0 to 9 were identified with both TensorFlow and PyTorch.
Machine learning Neural networks Object recognition TensorFlow PyTorch CPU GPU Maskininlärning Neurala nätverk Objektigenkänning TensorFlow PyTorch CPU GPU Software Engineering Programvaruteknik
375	Konstruktion av variabel last : Utvärdering av GPU:er genom simulerad flygfarkost Schleu, Anton January 2020 When an aircraft is on the ground, it is powered through a connection to a GPU (Ground Power Unit), which supplies the aircraft with a voltage of 115 V at a frequency of 400 Hz. Modern aircraft contain more advanced electronics, which can cause disturbances on the supplying GPU that did not previously occur. The purpose of this report is to investigate which load cases an aircraft can generate and to reproduce them by constructing a variable load. The load is then used to evaluate the ability of GPUs to maintain voltage and frequency despite disturbances in the form of alternating loads. The main objectives are to determine which load cases can be generated, to recreate them with a variable load, and finally to verify and test the resulting product together with a GPU. An important delimitation is that the constructed load cannot generate harmonics, which commonly occur in today's aircraft; the load can therefore only switch between being inductive and capacitive. The results show that a load with the load cases cos φ = 0.5, 0.6, 0.7, 0.8, 0.9 and -0.5, -0.6, -0.7, -0.8, -0.9 can simulate an aircraft in the form of inductive and capacitive alternations. Measurements also show that the constructed load causes disturbances in voltage and current similar to those seen in measurements on a real aircraft. The report concludes that it is possible, both theoretically and practically, to construct a variable load that generates load cases similar to those of a modern aircraft. With this load it is also possible to mimic real disturbances measured on a GPU and thereby evaluate its ability to handle complex load cases.
Design aircraft complex GPU 400 Hz Konstruktion flygfarkost komplex GPU 400 Hz Annan elektroteknik och elektronik
376	Big Data causing Big (TLB) Problems: Taming Random Memory Accesses on the GPU Karnagel, Tomas, Ben-Nun, Tal, Werner, Matthias, Habich, Dirk, Lehner, Wolfgang 13 June 2022 GPUs are increasingly adopted for large-scale database processing, where data accesses represent the major part of the computation. If the data accesses are irregular, like hash table accesses or random sampling, the GPU performance can suffer. Especially when scaling such accesses beyond 2GB of data, a performance decrease of an order of magnitude is encountered. This paper analyzes the source of the slowdown through extensive micro-benchmarking, attributing the root cause to the Translation Lookaside Buffer (TLB). Using the micro-benchmarks, the TLB hierarchy and structure are fully analyzed on two different GPU architectures, identifying never-before-published TLB sizes that can be used for efficient large-scale application tuning. Based on the gained knowledge, we propose a TLB-conscious approach to mitigate the slowdown for algorithms with irregular memory access. The proposed approach is applied to two fundamental database operations - random sampling and hash-based grouping - showing that the slowdown can be dramatically reduced, and resulting in a performance increase of up to 13×. info:eu-repo/classification/ddc/004 ddc:004
377	Fuites d'information dans les processeurs récents et applications à la virtualisation / Information leakage on shared hardware : evolutions in recent hardware and applications to virtualization Maurice, Clémentine 28 October 2015 In a virtualized environment, the hypervisor provides isolation at the software level, but the shared infrastructure makes attacks possible at the hardware level. Side channels and covert channels are well-known issues of shared hardware, and of shared processors in particular. However, these attacks rely on microarchitectural features that change from one hardware generation to the next. Recent years have also seen the rise of general-purpose computing on graphics processing units (GPGPU), coupled with so-called cloud environments. This thesis explores these recent evolutions and their consequences for information leakage in virtualized environments. We first investigate the microarchitectures of recent processors. Our first contribution is C5, a cross-core cache covert channel, evaluated between two virtual machines. Our second contribution is the reverse engineering of the complex addressing function of the last-level cache of Intel processors, which makes the class of cache attacks highly practical. In the last part, we investigate the security of GPU virtualization. Our third contribution shows that virtualized environments are susceptible to information leakage from GPU memory.
Sécurité informatique Fuite d'information Virtualisation Canal caché Canal auxiliaire Processeur Cache GPU Computer security Information leakage Virtualization Covert channel Side channel Processor Cache GPU
378	A vision system based real-time SLAM applications / Un système de vision pour la localisation et cartographie temps-réel Nguyen, Dai-Duong 07 December 2018 SLAM (Simultaneous Localization And Mapping) plays an important role in several applications such as autonomous robots, smart vehicles and unmanned aerial vehicles (UAVs). Real-time vision-based SLAM has become a subject of widespread research interest. One way to address the computational complexity of the image processing algorithms used in SLAM applications is to perform high-level and/or low-level processing on co-processors in order to build a System on Chip (SoC). Heterogeneous architectures have shown their potential as candidates for a system on chip in a hardware/software co-design approach. The aim of this thesis is to propose a vision system implementing a SLAM algorithm on a heterogeneous architecture (CPU-GPU or CPU-FPGA). The study evaluates whether these types of heterogeneous architectures are advantageous, which elementary functions and/or operators should be implemented on chip, and how to integrate image processing and the SLAM kernel on a heterogeneous architecture (i.e., how to map the vision SLAM onto a System on Chip).
A visual SLAM system has two parts: the Front-end (feature extraction, image processing) and the Back-end (SLAM kernel). For the Front-end, we studied several feature detection and description algorithms. We developed our own algorithm, the HOOFR (Hessian ORB Overlapped FREAK) extractor, which offers a better compromise between precision and processing time than state-of-the-art algorithms. It is based on a modification of the ORB (Oriented FAST and Rotated BRIEF) detector and the bio-inspired FREAK (Fast Retina Keypoint) descriptor. The improvements were validated on well-known real datasets. For the Back-end, we then proposed the HOOFR-SLAM Stereo algorithm, which uses images acquired by a stereo camera to perform simultaneous localization and mapping. Its performance was evaluated on several datasets (KITTI, New-College, Malaga, MRT, St-Lucia, ...).
Next, to reach a real-time system, we studied the algorithmic complexity of HOOFR SLAM as well as current hardware architectures dedicated to embedded systems. We used a methodology based on the complexity of the algorithm and the partitioning of functional blocks. The processing time of each block is analyzed taking into account the constraints of the targeted architectures. We implemented HOOFR SLAM on a massively parallel CPU-GPU architecture and evaluated its performance on a powerful workstation and on embedded systems. In this study, we propose a system-level architecture and a design methodology for integrating a vision SLAM algorithm on a SoC. This system highlights a compromise between versatility, parallelism, processing speed and localization results, and it is compared with conventional systems to evaluate the defined architecture. To reduce energy consumption, we studied the implementation of the Front-end (image processing) on an FPGA-based SoC, with the SLAM kernel intended to run on a CPU. We proposed a parallelized architecture using the HLS (High-Level Synthesis) method and the OpenCL programming language, and validated it on an Altera Arria 10 SoC. A comparison with state-of-the-art systems showed that the designed architecture offers better performance and a compromise between power consumption and processing time.
Traitement d'image Systèmes embarqués SLAM GPU FPGA HW/SW mapping Image Processing Embedded systems SLAM GPU FPGA
379	Approche haut niveau pour l’accélération d’algorithmes sur des architectures hétérogènes CPU/GPU/FPGA. Application à la qualification des radars et des systèmes d’écoute électromagnétique / High-Level Approach for the Acceleration of Algorithms on CPU/GPU/FPGA Heterogeneous Architectures. Application to Radar Qualification and Electromagnetic Listening Systems Martelli, Maxime 13 December 2019 As the semiconductor industry faces major challenges in sustaining its growth, new High-Level Synthesis tools are repositioning FPGAs as a leading technology for the hardware acceleration of algorithms, in competition with CPU- and GPU-based clusters. As they stand, however, these tools do not guarantee that a software engineer without expertise in the underlying hardware can use these technologies to their full potential, which can hinder their wider adoption. We therefore propose a methodology for accelerating algorithms on FPGAs. After presenting a high-level architecture model of this target, we detail possible optimizations in OpenCL and then define a relevant exploration strategy for accelerating algorithms on FPGAs. We apply the methodology to several case studies, from tomographic reconstruction to the modelling of an airborne radar jammer, and evaluate it according to three main performance criteria: development time, execution time, and energy efficiency.
Adéquation algorithme architecture Radar OpenCL FPGA GPU Calcul haute performance Algorithm architecture co-design Radar OpenCL FPGA GPU High performance computing
380	Comparing Julia and Python : An investigation of the performance on image processing with deep neural networks and classification Axillus, Viktor January 2020 Python is the most popular language for prototyping and developing machine learning algorithms. Because Python is an interpreted language, it suffers a significant performance loss compared to compiled languages. Julia is a newly developed language that tries to bridge the gap between high-performance but cumbersome languages such as C++ and highly abstracted but typically slow languages such as Python. Over the years, however, the Python community has developed many tools that address its performance problems. This raises the question of whether choosing one language over the other makes any significant performance difference. This thesis compares the performance, in terms of execution time, of the two languages in the machine learning domain, specifically image processing with GPU-accelerated deep neural networks and classification with k-nearest neighbor on the MNIST and EMNIST datasets. Python with Keras and TensorFlow is compared against Julia with Flux for GPU-accelerated neural networks. For classification, Python with Scikit-learn is compared against Julia with NearestNeighbors.jl. The results suggest that Julia has a performance edge for GPU-accelerated deep neural networks, outperforming Python by roughly 1.25x to 1.5x. For classification with k-nearest neighbor the results were more varied, with Julia outperforming Python in 5 out of 8 measurements. However, there are threats to validity, and additional research covering all the frameworks available for both languages is needed to provide a more conclusive and general answer.
julia python performance comparison machine learning image processing GPU GPU-acceleration neural networks autoencoder classification knn k-nearest neighbor Software Engineering Programvaruteknik
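Entry 380 benchmarks k-nearest-neighbor classification. As a reminder of what that technique computes, here is a minimal pure-Python sketch. It is not the Scikit-learn or NearestNeighbors.jl code measured in the thesis; the function name `knn_predict` and the toy data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair every training label with its Euclidean distance to the query,
    # then sort so the closest points come first.
    by_distance = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Count the labels of the k closest points and return the most common.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated clusters.
    points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
    labels = ["a", "a", "a", "b", "b", "b"]
    print(knn_predict(points, labels, (0.2, 0.1)))
    print(knn_predict(points, labels, (5.5, 5.5)))
```

This brute-force version compares the query against every training point. Libraries typically use tree-based or otherwise optimized neighbor searches, which is one reason measured runtimes can differ between frameworks.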
