91. Parallelizing Digital Signal Processing for GPU. Ekstam Ljusegren, Hannes; Jonsson, Hannes. January 2020.
Because of the increasing importance of signal processing in today's society, there is a need to easily experiment with new ways to process signals. Usually, fast-performing digital signal processing is done with special-purpose hardware that is difficult to develop for. GPUs offer an alternative for fast digital signal processing. The work in this thesis is an analysis and GPU implementation of a digital signal processing chain provided by SAAB. Through an iterative process of development and testing, a final implementation was achieved. Two benchmarks, each comprising 4.2 M test samples, were made to compare the CPU implementation with the GPU implementation. The benchmarks were run on three different platforms: a desktop computer, an NVIDIA Jetson AGX Xavier, and an NVIDIA Jetson TX2. The results show that the parallelized version can reach several orders of magnitude higher throughput than the CPU implementation.
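A DSP chain of the kind described is GPU-friendly because each output sample of a stage can be computed independently. As a hedged sketch (the thesis's actual chain is not public here; the filter and coefficients below are invented for illustration), a single FIR stage written so that every output is independent might look like:

```python
# Hedged sketch: one stage of a DSP chain (an FIR filter) written so that
# every output sample is independent, which is what makes such chains
# amenable to GPU parallelization. Coefficients are illustrative only.

def fir(samples, taps):
    """y[n] = sum_k taps[k] * x[n - k]; each y[n] can be computed in parallel."""
    out = []
    for n in range(len(samples)):      # on a GPU, one thread per n
        acc = 0.0
        for k, t in enumerate(taps):
            if n - k >= 0:             # treat samples before t=0 as zero
                acc += t * samples[n - k]
        out.append(acc)
    return out

# A 3-tap moving-average filter over a short test signal
signal = [0.0, 3.0, 6.0, 3.0, 0.0]
smoothed = fir(signal, [1 / 3] * 3)
```

Because no `y[n]` depends on any other output, the outer loop maps directly onto one GPU thread per sample.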
92. Att procedurellt generera ett 2D landskap parallellt på GPU vs seriellt på CPU / To procedurally generate a 2D landscape in parallel on the GPU vs serially on the CPU. Wahlberg, Björn. January 2019.
Procedurally generated content (PCG) is very common in games today, largely as a way to increase a game's replayability. Popular examples of games that use PCG are Terraria (2011) and Minecraft (2011). As hardware becomes more and more powerful, the demands on games that use these techniques also increase, since content can be generated in real time. But is there untapped potential in the graphics card? The trend of rising processor clock frequencies has slowed in recent years, replaced instead by a growing number of cores. Here, parallelization of program code can be exploited to get more out of the computer's hardware. A technology-oriented experiment was performed, first on a serial CPU solution and then on a parallel GPU solution, to examine how long each method took. This was done on maps of varying sizes in order to determine whether there was a relationship between size and time. The implementation used the SFML library for the GPU variant, where a fragment shader performed all parallel computations for the map generation. The CPU method used the same techniques as the GPU method, but without any parallelization. Both techniques were validated by using SFML to draw the generated maps with simple graphics.
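The key property exploited by the fragment-shader variant is that each tile of the map depends only on its own coordinates. A hedged Python sketch of that idea (the hash-based noise function and tile thresholds below are invented, not the thesis's generator) might look like:

```python
# Hedged sketch: a tiny deterministic height-based 2D landscape generator.
# The thesis used an SFML fragment shader; this CPU version only
# illustrates the per-tile independence that makes the task GPU-friendly.

import math

def height(x, y, seed=1337):
    """Deterministic pseudo-noise in [0, 1); every tile is independent."""
    h = math.sin(x * 12.9898 + y * 78.233 + seed) * 43758.5453
    return h - math.floor(h)

def generate_map(width, height_tiles):
    tiles = []
    for y in range(height_tiles):      # in a fragment shader, every
        row = ""                       # (x, y) pair runs in parallel
        for x in range(width):
            h = height(x, y)
            row += "~" if h < 0.3 else ("^" if h > 0.8 else ".")
        tiles.append(row)
    return tiles

for row in generate_map(8, 4):
    print(row)
```

Since `height(x, y)` reads no neighbouring tiles, the two nested loops can be replaced by one shader invocation per pixel with no synchronization.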
93. Effective and Accelerated Informative Frame Filtering in Colonoscopy Videos Using Graphic Processing Units. Karri, Venkata Praveen. 08 1900.
Colonoscopy is an endoscopic technique that allows a physician to inspect the mucosa of the human colon. Previous methods and software solutions to detect informative frames in a colonoscopy video (a process called informative frame filtering, or IFF) have been largely ineffective in (1) covering the proper definition of an informative frame in the broadest sense and (2) striking an optimal balance between accuracy and speed of classification in both real-time and non-real-time medical procedures. In my thesis, I propose a more effective method and faster software solutions for IFF. The method is more effective due to the introduction of a heuristic algorithm for classification, derived from experimental analysis of typical colon features; it contributed a 5-10% boost in various performance metrics for IFF. The software modules are faster due to the incorporation of sophisticated parallel-processing-oriented coding techniques on modern microprocessors. Two IFF modules were created, one for post-procedure use and the other for real-time use. Code optimizations through NVIDIA CUDA for GPU processing and/or CPU multithreading concepts embedded in two significant microprocessor design philosophies (multi-core design and many-core design) resulted in a 5-fold acceleration for the post-procedure module and a 40-fold acceleration for the real-time module. Some innovative software modules, still in the testing phase, have recently been created to exploit the power of multiple GPUs together.
94. Scalability of Kubernetes Running Over AWS - A Performance Study while deploying CPU intensive application containers. Mogallapu, Raja. January 2019.
Background: Nowadays many companies enjoy the benefits of Kubernetes by running their containerized applications on it. AWS is one of the leading cloud computing service providers, and many well-known companies are its clients. Much research has been conducted on Kubernetes, Docker containers, and cloud computing platforms, but confusion remains about how to deploy applications on Kubernetes. In particular, there is a research gap concerning the impact of CPU limits and requests when deploying applications on Kubernetes. Through this thesis I therefore analyze the performance of a CPU-intensive containerized application, which should help companies avoid this confusion when deploying their applications on Kubernetes. Objectives: We measure the scalability of Kubernetes under a CPU-intensive containerized application running over AWS, and we study the impact of changing CPU limits and requests when deploying the application in Kubernetes. Methods: We chose a blend of literature study and experimentation to conduct the research. Results and Conclusion: The experiments show that the application performs better when we allocate higher CPU limits and lower CPU requests, compared to equal CPU requests and CPU limits, in the deployment file. CPU metrics collected from SAR and the Kubernetes metrics server are similar. For better performance, it is preferable to allocate pods with CPU limits higher than their CPU requests rather than with the two set equal. Keywords: Kubernetes, CPU intensive containerized application, AWS, Stress-ng.
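The deployment-file settings under comparison are the container's CPU requests and limits. A minimal, hypothetical Deployment fragment (the name, image, and values below are invented for illustration, not taken from the thesis) showing a CPU request set below the CPU limit, the combination the experiments favoured, might look like:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: stress-app                  # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: stress-app
  template:
    metadata:
      labels:
        app: stress-app
    spec:
      containers:
        - name: stress-ng           # the abstract's Stress-ng workload
          image: example/stress-ng  # hypothetical image
          resources:
            requests:
              cpu: "250m"           # guaranteed share used for scheduling
            limits:
              cpu: "1"              # ceiling before CPU throttling
```

The request is what the scheduler reserves on a node; the limit is where the kernel starts throttling the container, which is why the gap between the two affects performance under load.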
95. React Native vs. Flutter: A performance comparison between cross-platform mobile application development frameworks. Tollin, Gustav; Lidekrans, Marcus. January 2023.
This study compares the performance of two popular cross-platform mobile application development frameworks, Flutter and React Native. As the number of mobile users continues to grow, the ability to target multiple platforms from a single codebase is increasingly important for developers and companies. We conducted three manual UI tests (scrolling through a list, testing the camera, and filtering a large dataset) to measure the performance of the frameworks in terms of CPU usage, memory usage, and janky frames on an Android device. The results indicate that Flutter may provide better performance than React Native in specific situations. The study contributes to the existing research by providing additional insights into the performance of these frameworks under specific test scenarios.
96. Performance comparison between OOD and DOD with multithreading in games. Wingqvist, David; Wickström, Filip. January 2022.
Background. The frame rate of a game is important for both the end-user and the developer. Maintaining at least 60 FPS in a PC game is the current standard, and demands for efficient game applications rise. Currently, the industry standard within programming is Object-Oriented Design (OOD). But with the trend towards larger games, this frame rate might not be maintainable using OOD. A design pattern that mitigates this is Data-Oriented Design (DOD), which focuses on utilizing the CPU and memory efficiently. These design patterns differ in how they handle the data associated with them. Objectives. In this thesis, two games were created, each in two versions that used either OOD or DOD. The first game also included multithreading. New hardware provides several CPU cores; therefore, this thesis compares both singlethreaded and multithreaded versions of these design patterns. Methods. Experiments were made to measure the execution time and cache misses on the CPU. Each experiment started with a baseline that was gradually increased to stress the systems under test. Results. The results gathered from the experiments showed that the sections of the code that used DOD were significantly faster than OOD. DOD also had a better affinity with multithreading and, in certain parts, achieved up to 13 times the speed of OOD under equivalent conditions. In the special-case comparison, DOD proved to be faster than OOD even though it had larger objects. Conclusions. DOD was shown to be significantly faster in execution time, with fewer cache misses, compared to OOD. Using multithreading with DOD proved to be the most efficient.
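The difference in how the two patterns "handle the data associated with them" comes down to memory layout. As a hedged sketch (the thesis used C++ games; this Python fragment, with an invented particle example, only illustrates the array-of-objects versus parallel-arrays layouts):

```python
# Hedged sketch: the data-layout difference between object-oriented design
# (OOD, an array of objects) and data-oriented design (DOD, one packed
# array per field). The Particle example is invented for illustration.

class Particle:                       # OOD: each entity bundles all its fields
    def __init__(self, x, vx):
        self.x = x
        self.vx = vx

def ood_update(particles, dt):
    for p in particles:               # fields of unrelated entities end up
        p.x += p.vx * dt              # scattered in memory: poor locality

def dod_update(xs, vxs, dt):
    for i in range(len(xs)):          # one tightly packed array per field:
        xs[i] += vxs[i] * dt          # cache-friendly, easy to split across threads

particles = [Particle(float(i), 1.0) for i in range(4)]
xs, vxs = [float(i) for i in range(4)], [1.0] * 4
ood_update(particles, 0.5)
dod_update(xs, vxs, 0.5)
```

In a compiled language, the DOD loop touches one contiguous stream per field, which is what reduces cache misses and makes splitting the index range across threads trivial.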
97. Performance Modeling of OpenStack Controller. Samadi Khah, Pouya. January 2016.
OpenStack is currently the most popular open-source platform for Infrastructure as a Service (IaaS) clouds. OpenStack lets users deploy virtual machines and other instances, which handle different tasks for managing a cloud environment on the fly. A lot of cloud platform offerings, including the Ericsson Cloud System, are based on OpenStack. Despite its popularity, there is currently a limited understanding of how many resources are consumed or needed by the components of OpenStack under different operating conditions, such as the number of compute nodes, the number of running VMs, the number of users, and the rate of requests to the various services. This master thesis attempts to model the resource demand of the various components of OpenStack as a function of different operating conditions, to identify correlations, and to evaluate how accurate the predictions are. For this purpose, a physical OpenStack was set up with one strong controller node and eight compute nodes. All experiments and measurements were performed on virtual OpenStack components on top of the main physical one. In conclusion, a simple model is generated for the idle behavior of OpenStack and for the API calls that start and stop a virtual machine (VM); it predicts the total CPU utilization based on the number of compute nodes and VMs.
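A model that "predicts the total CPU utilization based on the number of compute nodes and VMs" can be as simple as a two-feature linear fit. As a hedged sketch (the measurements below are invented for illustration and the thesis's actual model may differ), fitting such a model by least squares looks like:

```python
# Hedged sketch: fitting cpu ~ c0 + c1*nodes + c2*vms by least squares
# via the normal equations. All sample "measurements" are invented.

def fit_linear(samples):
    """Least-squares fit of cpu ~ c0 + c1*nodes + c2*vms."""
    X = [(1.0, n, v) for n, v, _ in samples]
    y = [cpu for _, _, cpu in samples]
    # Normal equations: (X^T X) coeffs = X^T y
    A = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
    b = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(3)]
    for col in range(3):                    # Gaussian elimination with
        piv = max(range(col, 3), key=lambda r: abs(A[r][col]))  # pivoting
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for row in range(col + 1, 3):
            f = A[row][col] / A[col][col]
            for k in range(col, 3):
                A[row][k] -= f * A[col][k]
            b[row] -= f * b[col]
    coeffs = [0.0, 0.0, 0.0]
    for row in (2, 1, 0):                   # back substitution
        s = sum(A[row][k] * coeffs[k] for k in range(row + 1, 3))
        coeffs[row] = (b[row] - s) / A[row][row]
    return coeffs

# Invented measurements: (compute nodes, running VMs, controller CPU %)
data = [(1, 0, 5.2), (2, 0, 5.9), (4, 8, 12.1), (8, 16, 19.7), (8, 32, 29.3)]
c0, c1, c2 = fit_linear(data)

def predict(nodes, vms):
    return c0 + c1 * nodes + c2 * vms
```

Checking the fitted coefficients against held-out measurements is then how the accuracy of such predictions would be evaluated.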
98. Efficient Search for Cost-Performance Optimal Caches. Lima-Engelmann, Tobias. January 2024.
CPU cache hierarchies are the central solution to bridging the memory wall. A proper understanding of how to trade off their high cost against performance can lead to cost savings without sacrificing performance. Due to the combinatorial nature of the problem, there is a large number of configurations to investigate, making design space exploration slow and cumbersome. To improve this process, this thesis develops and evaluates a model for optimally trading off the cost and performance of CPU cache hierarchies, named the Optimal Cache Problem (OCP), in the form of a non-linear integer program. A second goal of this work is the development of an efficient solver for the OCP; a branch-and-bound algorithm was found to be suitable and was proven to function correctly. Experiments were conducted to empirically analyse and validate the model and to showcase possible use cases. There, it was possible to relate the model outputs to measurable performance metrics. The model succeeded in formalising the inherent trade-off between cost and performance in a way that allows an efficient and complete search of the configuration space of possible cache hierarchies. In future work, the model needs to be refined and extended to allow the simultaneous analysis of multiple programs.
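A branch-and-bound search over cache configurations can be sketched in miniature. As a hedged illustration (the level options, costs, and the simplistic AMAT model below are invented, not the thesis's actual OCP formulation), a toy solver that finds the cheapest hierarchy meeting a performance bound might look like:

```python
# Hedged sketch: toy branch-and-bound over cache configurations.
# Per level: list of (capacity_kib, cost, hit_rate, hit_latency) options;
# all numbers are invented for illustration.
LEVELS = [
    [(32, 4, 0.90, 4), (64, 7, 0.93, 5)],        # L1 options
    [(256, 6, 0.70, 12), (1024, 14, 0.85, 16)],  # L2 options
]
MEM_LATENCY = 200

def amat(config):
    """Average memory access time, assuming independent per-level miss rates."""
    t, reach = 0.0, 1.0
    for _, _, hit, lat in config:
        t += reach * lat            # every access reaching this level pays lat
        reach *= 1.0 - hit          # misses fall through to the next level
    return t + reach * MEM_LATENCY

def branch_and_bound(max_amat):
    """Cheapest one-option-per-level configuration with amat <= max_amat."""
    best = [None, float("inf")]     # (config, cost)
    def rec(level, chosen, cost):
        if cost >= best[1]:         # bound: partial cost already beats nothing
            return
        if level == len(LEVELS):
            if amat(chosen) <= max_amat:
                best[0], best[1] = list(chosen), cost
            return
        for opt in LEVELS[level]:   # branch on each option for this level
            chosen.append(opt)
            rec(level + 1, chosen, cost + opt[1])
            chosen.pop()
    rec(0, [], 0)
    return best
```

The real OCP is a non-linear integer program over a far larger space, but the pruning idea is the same: discard any partial configuration whose cost already exceeds the best complete solution found so far.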
99. Accelerating Physical Design Algorithms Using CUDA. Agarwal, Abhinav. 13 December 2023.
The intricate domain of chip design encompasses the creation of detailed blueprints for integrated circuits (ICs). Algorithms are pivotal in this realm, taking on the role of optimizing IC performance and functionality. This thesis examines the use of algorithms in chip design, highlighting their potential to improve the efficiency and efficacy of the design process. Notably, this study undertakes a comprehensive comparison of algorithmic performance on both Central Processing Units (CPUs) and Graphics Processing Units (GPUs). A cornerstone application of algorithms in chip design lies in logic synthesis, which transforms a high-level circuit description into a silicon-compatible, low-level representation. By minimizing gate requirements, curtailing power consumption, and bolstering performance, algorithms drive optimized logic synthesis. The arena of physical design likewise harnesses algorithms to translate logical designs into physically realizable layouts on silicon wafers, with meticulous attention to considerations such as routing congestion and power efficiency. This thesis extensively explores the implementation of two pivotal physical design algorithms: the Kernighan-Lin (KL) partitioning algorithm, featured for optimizing placement and partitioning, and Lee's algorithm, which provides valuable insights for enhancing routing. Through a meticulous comparison of dataset efficiency and run time across both hardware platforms, noteworthy insights emerged. For the KL algorithm, on datasets categorized as small (sizes < 10^5), the CPU demonstrates 1.2X faster processing than the GPU. However, as dataset sizes surpass this threshold, a distinct trend emerges: while GPU run times remain relatively consistent, CPU run times undergo a threefold increase at certain points. In the case of Lee's algorithm, the CPU demonstrated superior execution time despite having fewer cores and threads than the GPU. This can be attributed to the inherently sequential nature of Lee's algorithm, where each step depends on the preceding one, aligning with the CPU's strength in handling sequential tasks. Overall, the thesis analyzes this interplay between algorithm structure and hardware platform.
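The sequential dependence noted for Lee's algorithm is visible in a minimal CPU implementation: each wavefront of the BFS expansion depends on the previous one. A hedged sketch (the grid contents are invented; the thesis versions target CUDA and larger instances):

```python
# Hedged sketch: Lee's maze-routing algorithm as BFS wave expansion on a
# small grid. Each frontier depends on the previous one, which is the
# sequential structure discussed above. 1 marks an obstacle cell.

from collections import deque

def lee_route(grid, src, dst):
    """Return the shortest routed path length from src to dst, or -1."""
    rows, cols = len(grid), len(grid[0])
    dist = [[-1] * cols for _ in range(rows)]
    dist[src[0]][src[1]] = 0
    q = deque([src])
    while q:
        r, c = q.popleft()              # wave expands cell by cell:
        if (r, c) == dst:               # frontier k+1 needs frontier k,
            return dist[r][c]           # limiting cross-frontier parallelism
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols \
                    and grid[nr][nc] == 0 and dist[nr][nc] == -1:
                dist[nr][nc] = dist[r][c] + 1
                q.append((nr, nc))
    return -1

grid = [[0, 0, 0, 0],
        [1, 1, 0, 1],
        [0, 0, 0, 0],
        [0, 1, 1, 0]]
length = lee_route(grid, (0, 0), (3, 3))
```

A GPU version can parallelize within one frontier, but the frontier-to-frontier dependency remains, which is consistent with the CPU result reported above.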
100. Evaluation of FPGA-based High Performance Computing Platforms. Frick-Lundgren, Martin. January 2023.
High performance computing (HPC) is a topic that has risen to the top in the era of digitalization, AI, and automation. Therefore, the search for more cost- and time-effective ways to carry out HPC work is a subject of extensive research. One part of this is having hardware capable of improving on these criteria. Different hardware usually requires different code languages, though, so cross-platform solutions like Intel's oneAPI framework are gaining popularity. In this thesis, the capabilities of Intel's oneAPI framework to implement and execute HPC benchmarks on different hardware platforms are discussed. Using the hardware available through Intel's DevCloud service, Intel's Xeon Gold 6128 CPU, Intel's UHD Graphics P630 GPU, and the Arria 10 FPGA board were chosen for the implementation. The benchmarks chosen were GEMM (General Matrix Multiplication) and BUDE (Bristol University Docking Engine). They were implemented using DPC++ (Data Parallel C++), Intel's own SYCL-based C++ extension. Attempts were also made to improve the benchmarks with HPC speed-up methods such as loop unrolling and some hardware manipulation. The performance for the CPU and GPU was recorded and compared; the FPGA implementation could not be performed because of technical difficulties. The results compare well with related work but did not improve much upon it, because the hardware used is quite weak compared to the industry standard. Further research on the topic would be interesting, to compare a working FPGA implementation against the other results and against results from other studies; that implementation probably also has the greatest improvement potential. Testing some other, more complex benchmarks could also be interesting.
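The GEMM benchmark at the core of this comparison is a small, well-defined kernel. As a hedged sketch (the thesis versions are DPC++ kernels targeting CPU/GPU/FPGA; this plain Python triple loop only shows the computation being benchmarked):

```python
# Hedged sketch: the GEMM kernel, written as a plain triple loop for
# clarity. Loop unrolling of the innermost loop is one of the speed-up
# methods mentioned above.

def gemm(alpha, A, B, beta, C):
    """C <- alpha * A @ B + beta * C for dense row-major matrices."""
    n, k, m = len(A), len(A[0]), len(B[0])
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for p in range(k):            # the innermost loop is what
                acc += A[i][p] * B[p][j]  # unrolling targets on FPGA/CPU
            C[i][j] = alpha * acc + beta * C[i][j]
    return C

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
C = [[0.0, 0.0], [0.0, 0.0]]
gemm(1.0, A, B, 0.0, C)   # C becomes A @ B
```

In DPC++ the two outer loops become a 2D kernel range, and each (i, j) work-item computes one element of C independently.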