  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
81

Prédiction de performance d'algorithmes de traitement d'images sur différentes architectures hardwares / Image processing algorithm performance prediction on different hardware architectures

Soucies, Nicolas 07 May 2015 (has links)
Dans le contexte de la vision par ordinateur, le choix d’une architecture de calcul est devenu de plus en plus complexe pour un spécialiste du traitement d’images. Le nombre d’architectures permettant de résoudre des algorithmes de traitement d’images augmente d’année en année. Ces algorithmes s’intègrent dans des cadres eux-mêmes de plus en plus complexes répondant à de multiples contraintes, que ce soit en termes de capacité de calcul, mais aussi en termes de consommation ou d’encombrement. À ces contraintes s’ajoute le nombre grandissant de types d’architectures de calcul pouvant répondre aux besoins d’une application (CPU, GPU, FPGA). L’enjeu principal de l’étude est la prédiction de la performance d’un système, cette prédiction pouvant être réalisée en phase amont d’un projet de développement dans le domaine de la vision. Dans un cadre de développement, industriel ou de recherche, l’impact en termes de réduction des coûts de développement est d’autant plus important que le choix de l’architecture de calcul est réalisé tôt. De nombreux outils et méthodes d’évaluation de la performance ont été développés, mais ceux-ci se concentrent rarement sur un domaine précis et ne permettent pas d’évaluer la performance sans une étude complète du code ou sans la réalisation de tests sur l’architecture étudiée. Notre but étant de nous affranchir totalement de tout benchmark, nous nous sommes concentrés sur le domaine du traitement d’images pour pouvoir décomposer les algorithmes du domaine en éléments simples, ici nommés briques élémentaires. Dans cette optique, un nouveau paradigme qui repose sur une décomposition de tout algorithme de traitement d’images en ces briques élémentaires a été conçu. Une méthode est proposée pour modéliser ces briques en fonction de paramètres software et hardware. L’étude démontre que la décomposition en briques élémentaires est réalisable et que ces briques élémentaires peuvent être modélisées.
Les premiers tests sur différentes architectures avec des données réelles et des algorithmes comme la convolution et les ondelettes ont permis de valider l'approche. Ce paradigme est un premier pas vers la réalisation d’un outil qui permettra de proposer des architectures pour le traitement d’images et d’aider à l’optimisation d’un programme dans ce domaine. / In computer vision, the choice of a computing architecture is becoming more and more difficult for image processing experts. Indeed, the number of architectures able to run image processing algorithms increases every year, and so does the number of computer vision applications constrained by computing capacity, power consumption, and size. Furthermore, selecting a hardware architecture, such as a CPU, GPU, or FPGA, is an important issue for computer vision applications. The main goal of this study is to predict system performance at the beginning of a computer vision project. Indeed, for a manufacturer or a researcher, the computing architecture should be selected as early as possible to minimize the impact on development. A large variety of methods and tools has been developed to predict the performance of computing systems, but they rarely cover a specific area, and they cannot predict performance without analyzing the code or running benchmarks on the target architecture. In this work, we focus specifically on predicting the performance of computer vision algorithms without any benchmarking, which is made possible by splitting image processing algorithms into primitive blocks. In this context, a new paradigm based on splitting every image processing algorithm into primitive blocks has been developed, together with a method to model the primitive blocks according to software and hardware parameters. The decomposition into primitive blocks and their modeling were demonstrated to be feasible.
Experiments on different architectures, with real data, using algorithms such as convolution and wavelets, validated the proposed paradigm. This approach is a first step towards a tool that helps choose a hardware architecture and optimize image processing algorithms.
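The primitive-block ("brique élémentaire") idea can be illustrated with a toy sketch: each elementary block gets a simple calibrated cost model per architecture, and an algorithm's runtime is predicted by summing its blocks, with no benchmark of the full algorithm. All block names and coefficients below are invented for illustration and are not taken from the thesis.

```python
# Toy sketch of the primitive-block idea: hypothetical calibrated cost
# models of the form runtime_us = a * pixels + b, one per block and per
# architecture. Coefficients are invented for illustration.
COST_MODELS = {
    "cpu": {"convolution3x3": (0.004, 50.0), "wavelet_level": (0.006, 80.0)},
    "gpu": {"convolution3x3": (0.0005, 300.0), "wavelet_level": (0.0008, 350.0)},
}

def predict_runtime_us(architecture, blocks, width, height):
    """Predict total runtime (in microseconds) of an algorithm expressed
    as a sequence of primitive blocks, without running any benchmark."""
    pixels = width * height
    total = 0.0
    for block in blocks:
        a, b = COST_MODELS[architecture][block]
        total += a * pixels + b
    return total

# Example: a convolution followed by two wavelet levels on a 1280x720 image.
algo = ["convolution3x3", "wavelet_level", "wavelet_level"]
cpu_estimate = predict_runtime_us("cpu", algo, 1280, 720)
gpu_estimate = predict_runtime_us("gpu", algo, 1280, 720)
```

In the thesis the models depend on further software and hardware parameters; the point here is only the decompose-model-and-sum structure that avoids benchmarking.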
82

Uppdatering av IT-stöd hos Markbyggarna AB

Zakharina, Tatiana January 2019 (has links)
Markbyggarna AB is a company that mainly performs groundwork and machine services. The company's IT system is designed to support typical office work. The project focused on helping the company make the existing system more efficient and, to a greater extent, make use of the free alternatives available today. The work started by defining the system's critical aspects. For the hardware, requirements were set on CPU, RAM, and hard-drive load. For the software, the functions the system should provide were defined. For the wired and wireless networks, requirements on the internet connection were defined. Information about the performance of the existing system was then collected using various monitoring systems, tools, and interviews. PGRG was used for hardware monitoring. For network monitoring, a custom monitoring setup was built with the command-line tools Iperf and Speedtest-cli and the tool Vistumbler. The monitoring results were compared with the desired properties, and the difference between the two formed the basis for the improvement work, with the client's wishes also taken into account. The practical part of the project thus included strengthening the wireless signal by installing an access point, introducing new applications, setting up remote access to the company's PC from home, and creating a backup system. A number of other security measures were also taken. To evaluate the completed work, the wireless network was monitored again. These changes have made the IT system more secure, with a better-performing network; flexibility and functionality have also increased. In total, the project cost SEK 1,660 for hardware purchases, and most of the company's needs could be covered with free software.
83

Prestanda och precision på en enkortsdator i ett system med realtidskrav / Performance and precision of a single-board computer in a system with real-time requirements

Wikman, Torbjörn, Hassel, Philip January 2014 (has links)
The report investigates how well a certain type of affordable embedded single-board computer holds up against today's more expensive computers, by running various tests on a system with specified requirements. The system uses a Raspberry Pi as the single-board computer, whose task is to steer a camera based on coordinates obtained from a server and to capture and stream a video signal over a network. The investigations measured how much network traffic the single-board computer generated with different video formats and how much CPU utilization was required. Tests were also made to verify the precision of the camera steering. All investigations were experimental: several tests were performed and analyzed. The results show that sufficiently good precision can be obtained from the camera steering unit, for which two different servos were examined. With the MJPEG and H.264 video formats, the single-board computer can transmit a video signal at up to 1280x720 and 15 fps. The system managed to fetch an object from the server and perform calculations on it in 42.3 ms. However, when the entire system was running at once, the Raspberry Pi could not deliver a video signal and fetch coordinates from the server quickly enough. Performance varied with the video format, but no configuration kept the system stable enough to meet the requirements. / Rapportens syfte är att undersöka hur väl en viss typ av billigare enkortsdator kan stå sig mot dagens dyrare datorer i ett datorsystem genom att göra olika undersökningar på ett system med uppsatta krav. Systemet har en Raspberry Pi som enkortsdator och har till uppgift att styra en kamera utifrån koordinater som fås från en server samt fånga och strömma en videosignal ut på ett nätverk.
De undersökningar som gjordes var att kontrollera hur mycket nätverkstrafik som enkortsdatorn sände vid olika format på videosignalen samt hur mycket CPU- utnyttjande som krävdes. Undersökningar gjordes också för att säkerställa precisionen på kamerastyrningen. Alla undersökningar har varit experimentella, där flera olika tester har utförts och analyserats. Resultatet från undersökningarna visar att en tillräckligt god precision kan fås från kamerastyrningen, där två olika servon har undersökts. När videoformaten MJPEG och H.264 används kan enkortsdatorn klara av att sända ut en videosignal upp till 1280x720 med 15 bildrutor per sekund. I systemet som testerna utfördes på klarade enkortsdatorn av att hämta och utföra beräkningar på ett objekt från servern på 42,3 ms. När hela systemet var igång samtidigt klarade dock inte Raspberry Pi av att leverera en videosignal och hämta koordinater från servern tillräckligt snabbt. Beroende på vilket videoformat som användes presterade enkortsdatorn olika bra, men det var ingen inställning som stabilt klarade av att nå kraven.
84

SYSTEMS SUPPORT FOR DATA ANALYTICS BY EXPLOITING MODERN HARDWARE

Hongyu Miao (11751590) 03 December 2021 (has links)
<p>A large volume of data is continuously being generated by data centers, humans, and the Internet of Things (IoT). To extract useful insights, these enormous volumes of data must be processed in time, with high throughput, low latency, and high accuracy. To meet such performance demands, vendors are shipping a large body of new hardware, such as multi-core CPUs, 3D-stacked memory, embedded microcontrollers, and other accelerators.</p><br><p>However, traditional operating systems (OSes) and data analytics frameworks, the key layer that bridges high-level data processing applications and low-level hardware, fail to meet these requirements, because new hardware evolves quickly and data volumes keep exploding. For instance, general-purpose OSes are not aware of the unique characteristics and demands of data processing applications. Data analytics engines for stream processing, e.g., Apache Spark and Beam, add more machines to deal with more data but leave every single machine underutilized, without fully exploiting the underlying hardware features, which leads to poor efficiency. Data analytics frameworks for machine learning inference on IoT devices cannot run neural networks that exceed SRAM size, which disqualifies many important use cases.</p><br><p>To bridge the gap between the performance demands of data analytics and the features of emerging hardware, this thesis explores runtime system designs for high-level data processing applications that exploit low-level modern hardware features. We study two important data analytics applications, real-time stream processing and on-device machine learning inference, on three important hardware platforms across the cloud and the edge: multicore CPUs, a hybrid memory system combining 3D-stacked memory with general DRAM, and embedded microcontrollers with limited resources. 
</p><br><p>In order to speed up and enable the two data analytics applications on the three hardware platforms, this thesis contributes three related research projects. In project StreamBox, we exploit the parallelism and memory hierarchy of modern multicore hardware on single machines for stream processing, achieving scalable and highly efficient performance. In project StreamBox-HBM, we exploit hybrid memories to balance bandwidth and latency, achieving memory scalability and highly efficient performance. StreamBox and StreamBox-HBM both offer orders of magnitude performance improvements over the prior state of the art, opening up new applications with higher data processing needs. In project SwapNN, we investigate a system solution for microcontrollers (MCUs) to execute neural network (NN) inference out-of-core without losing accuracy, enabling new use cases and significantly expanding the scope of NN inference on tiny MCUs. </p><br><p>We report the system designs, system implementations, and experimental results. Based on our experience in building the above systems, we provide general guidance on designing runtime systems across the hardware/software stack for a wider range of new applications on future hardware platforms.</p><div><br></div>
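The single-machine stream processing idea can be sketched, very loosely, as splitting a record stream into independent windows whose operators run on a worker pool. This is a minimal illustration of window-parallel processing under placeholder choices (window size, a sum operator, a thread pool), not the StreamBox design, which schedules fine-grained work across cores and memory domains.

```python
# Minimal sketch: windows of a record stream processed by a worker pool.
# A real engine handles out-of-order data, cache-conscious queues, and
# core pinning; this only shows the shape of window-parallel processing.
from concurrent.futures import ThreadPoolExecutor

def window(records, size):
    """Split a stream of records into fixed-size windows."""
    return [records[i:i + size] for i in range(0, len(records), size)]

def aggregate(win):
    # Per-window operator; a real engine would run a user-defined function.
    return sum(win)

def process_stream(records, size, workers=4):
    wins = window(records, size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(aggregate, wins))
```

Because windows are independent, adding workers scales the per-machine throughput instead of adding machines, which is the efficiency argument made above.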
85

Zpracování obrazu s velkými datovými toky - využití CUDA/OpenCL / High data rate image processing using CUDA/OpenCL

Sedláček, Filip January 2018 (has links)
The main objective of this research is to optimize a defect-detection algorithm used in the production of nonwoven textile. The algorithm was developed by CAMEA spol. s.r.o. As a consequence of upgrading the current camera system to a more powerful one, it is necessary to optimize the current algorithm and to choose hardware with an appropriate architecture on which the calculations will be performed. This work describes useful programming techniques of the CUDA architecture and the OpenCL framework in detail. Using these tools, we implement a parallel equivalent of the current algorithm, describe various optimization methods, and design a GUI to test these methods.
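The kind of kernel such a port typically targets is a 2D convolution over the image, the core of many defect-detection filters. A plain-Python reference version is sketched below purely for illustration (the thesis's actual algorithm belongs to CAMEA and is not reproduced here); on a GPU, each output pixel becomes one CUDA/OpenCL work-item.

```python
# Pure-Python reference 2D convolution ("valid" mode, correlation
# convention, as is common in image processing). On a GPU this loop nest
# maps naturally to one thread per output pixel, which is exactly what a
# CUDA/OpenCL port parallelizes.
def convolve2d(image, kernel):
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for y in range(ih - kh + 1):
        row = []
        for x in range(iw - kw + 1):
            acc = 0
            for dy in range(kh):
                for dx in range(kw):
                    acc += image[y + dy][x + dx] * kernel[dy][dx]
            row.append(acc)
        out.append(row)
    return out
```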
86

Expertní systém / Expert system

Šimková, Jana January 2010 (has links)
The main goal of this work is to become familiar with the NPS32 expert system and to describe the ways of acquiring knowledge. After choosing a suitable domain for the expert system application, the result of the work is a proposed knowledge base for that domain.
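How a rule-based expert system derives conclusions from a knowledge base can be sketched with a toy forward-chaining loop; the rules below are invented examples and have nothing to do with the actual NPS32 knowledge base.

```python
# Toy forward-chaining inference: repeatedly fire every rule whose
# conditions are satisfied until no new fact can be derived. The rules
# are invented for illustration.
RULES = [
    ({"fever", "cough"}, "flu_suspected"),
    ({"flu_suspected", "short_breath"}, "see_doctor"),
]

def infer(facts, rules=RULES):
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts
```

Building a knowledge base for a chosen domain then amounts to writing down such condition/conclusion rules with a domain expert.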
87

Využití GPU pro akceleraci optimalizace systému vodních děl / The GPU Accelerated Optimisation of the Water Management Systems

Marek, Jan January 2014 (has links)
The subject of this thesis is the optimization of the storage function of a water management system. The work builds on the dissertation of Ing. Pavel Menšík, Ph.D., Automatization of the storage function of a water management system. Differential evolution was chosen as the optimization method. A sequential version of the method is implemented first, followed by CPU-accelerated and GPU-accelerated versions.
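Differential evolution itself is compact enough to sketch; below is a generic DE/rand/1/bin loop run on an illustrative objective, not the water-management model from the thesis. The control parameters (F, CR, population size) are common textbook defaults.

```python
# Generic differential evolution (DE/rand/1/bin) sketch. Objective,
# bounds, and control parameters are illustrative defaults, not the
# reservoir model optimized in the thesis.
import random

def differential_evolution(f, bounds, pop_size=20, F=0.8, CR=0.9,
                           generations=100, seed=0):
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [f(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: combine three distinct other individuals.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            jrand = rng.randrange(dim)
            trial = []
            for j in range(dim):
                if rng.random() < CR or j == jrand:
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])
                    lo, hi = bounds[j]
                    v = min(max(v, lo), hi)  # clamp to the search bounds
                else:
                    v = pop[i][j]
                trial.append(v)
            # Greedy selection between parent and trial vector.
            ts = f(trial)
            if ts <= scores[i]:
                pop[i], scores[i] = trial, ts
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]
```

Each trial-vector evaluation is independent of the others within a generation, which is what the CPU- and GPU-accelerated versions can parallelize.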
88

Akcelerace částicových rojů PSO pomocí GPU / Particle Swarm Optimization on GPUs

Záň, Drahoslav January 2013 (has links)
This thesis deals with the population-based stochastic optimization technique PSO (Particle Swarm Optimization) and its acceleration. This simple but very effective technique is designed for solving difficult multidimensional problems in a wide range of applications. The aim of this work is to develop a parallel implementation of the algorithm with an emphasis on accelerating the search for a solution. For this purpose, a graphics card (GPU), which provides massive computing performance, was chosen. To evaluate the benefits of the proposed implementation, CPU and GPU implementations were created for solving a problem derived from the well-known NP-hard Knapsack problem. The GPU application achieves an average speedup of 5x and a maximum speedup of almost 10x over the optimized CPU application on which it is based.
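A minimal global-best PSO loop is sketched below to make the technique concrete; the inertia and acceleration coefficients are common textbook values, and the simple continuous objective in the test stands in for the Knapsack-derived problem of the thesis. Since each particle's update and fitness evaluation are independent, the loop over particles is what a GPU implementation runs as one thread per particle.

```python
# Minimal global-best PSO sketch. Parameters (w, c1, c2) are common
# textbook values; the objective optimized is supplied by the caller.
import random

def pso(f, bounds, swarm=30, w=0.7, c1=1.5, c2=1.5, iters=100, seed=0):
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(swarm)]
    vel = [[0.0] * dim for _ in range(swarm)]
    pbest = [p[:] for p in pos]                  # personal best positions
    pbest_val = [f(p) for p in pos]
    g = min(range(swarm), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # global best
    for _ in range(iters):
        for i in range(swarm):                   # one GPU thread per particle
            for j in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][j] = (w * vel[i][j]
                             + c1 * r1 * (pbest[i][j] - pos[i][j])
                             + c2 * r2 * (gbest[j] - pos[i][j]))
                lo, hi = bounds[j]
                pos[i][j] = min(max(pos[i][j] + vel[i][j], lo), hi)
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```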
89

A comparison of Hybrid and Progressive Web Applications for the Android platform

Eleskovic, Denis January 2021 (has links)
The Hybrid approach has long been the dominant way to develop cross-platform applications targeting both the web and mobile. In recent years, a new combination of technologies has appeared, the Progressive Web Application (PWA), which aims to combine Native capabilities with best practices of the web to deliver a Native-like experience to users without the need for Native wrappers. So far, PWAs have proven to be the inferior choice in terms of performance and platform support. The purpose of this study is to compare the two technologies based on a literature review and to evaluate their current performance in an experiment across three parameters: battery consumption, CPU utilization, and time to first activity. Two applications were developed, one with each technique: the Apache Cordova framework for the Hybrid approach and the React framework to implement the PWA features. The results showed that the Hybrid approach is better in the majority of tests, offering more platform API access and better performance, and being slower only in time to first activity; notably, though, the PWA approach was not far behind. The study concludes that PWAs have developed significantly since previous studies and can almost match Hybrid apps in terms of APIs and performance, but that Hybrid apps are still the preferred choice when performance matters. Further development and wider adoption of the PWA specification could well change how developers approach mobile app development in the future and bring the web closer to the mobile platform.
90

Parallelizing Digital Signal Processing for GPU

Ekstam Ljusegren, Hannes, Jonsson, Hannes January 2020 (has links)
Because of the increasing importance of signal processing in today's society, there is a need to experiment easily with new ways to process signals. Fast digital signal processing is usually done on special-purpose hardware that is difficult to develop for. GPUs offer an alternative for high-performance digital signal processing. The work in this thesis is an analysis and implementation of a GPU version of a digital signal processing chain provided by SAAB. Through an iterative process of development and testing, a final implementation was achieved. Two benchmarks, both comprising 4.2 M test samples, were made to compare the CPU implementation with the GPU implementation. The benchmarks were run on three different platforms: a desktop computer, an NVIDIA Jetson AGX Xavier, and an NVIDIA Jetson TX2. The results show that the parallelized version can reach several orders of magnitude higher throughput than the CPU implementation.
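One representative stage of such a processing chain is a FIR filter; a plain-Python reference is sketched below (the taps in the test are illustrative, and SAAB's actual chain is not public). On a GPU, every output sample is an independent dot product, so one thread can compute one sample, which is where the throughput gain comes from.

```python
# Reference FIR filter, one typical stage of a DSP chain:
#   y[n] = sum_k taps[k] * x[n - k]
# Each y[n] is independent of the others, so a GPU version assigns one
# thread per output sample.
def fir_filter(samples, taps):
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, t in enumerate(taps):
            if n - k >= 0:
                acc += t * samples[n - k]
        out.append(acc)
    return out
```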
