  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
131

Geometrické transformace obrazu / Geometrical Image Transforms

Němeček, Petr Unknown Date
This master's thesis deals with accelerating geometric image transforms using the GPU and the NVIDIA® CUDA™ architecture. Time-critical parts of the code are moved to the GPU and executed in parallel. One of the results is a demonstration application that compares the performance of the two setups: the CPU alone, and the GPU in combination with the CPU. Highly optimized routines from the OpenCV library, developed by Intel, serve as the reference implementation.
132

Jämförelse av bluetooth codecs med fokus på batteriladdning, CPU användning och räckvidd / Comparison of bluetooth codecs with focus on battery drainage, CPU usage and range

Larsson, Daniel, Ly Khuu, Kevin January 2022
With the constant advances in technology, people are using more wireless products, such as earphones and speakers, many of which use Bluetooth. Bluetooth technology is advancing so quickly that consumers and manufacturers have a hard time keeping up. As a result, knowledge is missing about factors such as battery drainage, CPU usage, and range. This study investigates what effect different codecs have on these factors by comparing the two most commonly used codecs, SBC and AAC. Using a codec that has lower battery drainage while still providing good enough audio quality can have a positive impact on society and the environment: needing less electricity lowers overall energy consumption and, in turn, energy production. Our results indicate that there is a significant difference in CPU usage but not in battery drainage or range.
133

Spatial prestandaprofilering i spel : Lokalisering av prestandaproblem i spelnivåer / Spatial performance profiling in games : Localisation of performance problems in game levels

Chanane, Karim January 2022
Profiling is an underexplored area of game development, despite the high performance requirements, and hence the need for optimization, in modern games. This work aims to ease profiling by visualizing profiling data spatially in the form of heat maps, and by also visualizing GPU- and CPU-boundness. The goal of this work was to advance the field of profiling in game development and to encourage further development of profiling tools, in order to cut costs and time that can instead be spent on implementing new functionality. The project has contributed to the field of automated profiling by producing a method that eases the interpretation side of profiling work, and it can thereby help make profiling more accessible to developers who lack in-depth knowledge of software and hardware. / There is other digital material (e.g. film, image or audio files) or models/artifacts belonging to the thesis that need to be archived.
134

Resource optimization of edge servers dealing with priority-based workloads by utilizing service level objective-aware virtual rebalancing

Shahid, Amna 08 August 2023
IoT enables profitable communication between sensor/actuator devices and the cloud. Slow networks, which keep Edge data from reaching Cloud analytics, hinder the adoption of real-time analytics. VRebalance addresses the performance of priority-based workloads for stream processing at the Edge. VRebalance uses BO to prioritize workloads and to find optimal resource configurations for efficient resource management. The Apache Storm platform and the RIoTBench IoT benchmark tool were used for real-time stream processing in the evaluation of VRebalance. The study shows that VRebalance is more effective than traditional methods, meeting SLO targets despite system changes. Compared to a hill-climbing algorithm, VRebalance decreased SLO violation rates by almost 30% for static priority-based workloads and by 52.2% for dynamic priority-based workloads. Compared to Apache Storm's default allocation, VRebalance decreased SLO violations by 66.1%.
135

A Performance comparison Between ASP.NET Core and Express.js for creating Web APIs

Karlsson, Oliver January 2021
Modern web applications are growing in complexity and becoming more widely used. Using frameworks to build APIs is a popular way for both hobby developers and businesses to speed up development and save costs. With this dependence on frameworks as the foundation for potentially large applications comes the need to understand their performance characteristics and the areas they are best suited for. This study compares the performance of two similarly popular frameworks, ASP.NET Core and Express.js, when used together with a MySQL database to build Web APIs. This was done by building two different API implementations in each framework, one employing a RESTful approach and the other using the newer query language GraphQL. Experiments were run in which peak CPU usage, peak memory usage and response times were measured. The results show that in a RESTful API, ASP.NET Core is faster at serving requests under lower loads, whereas Express.js outperforms ASP.NET Core when faced with a higher number of concurrent requests that fetch a lot of data. In a GraphQL API, Express.js performed similarly or better in all cases in terms of response times and resource usage compared to ASP.NET Core.
136

Establishing Effective Techniques for Increasing Deep Neural Networks Inference Speed / Etablering av effektiva tekniker för att öka inferenshastigheten i djupa neurala nätverk

Sunesson, Albin January 2017
A recent trend in deep learning research is to build ever deeper networks (i.e. to increase the number of layers) to solve real-world classification/optimization problems. This introduces challenges for applications with a latency dependence. The problem arises from the number of computations that need to be performed for each evaluation. This is addressed by increasing inference speed. In this study we analyze two different methods for speeding up the evaluation of deep neural networks. The first method reduces the number of weights in a convolutional layer by decomposing its convolutional kernel. The second method lets samples exit a network through early exit branches when classifications are certain. Both methods were evaluated on several network architectures with consistent results. Convolutional kernel decomposition shows a 20-70% speedup with no more than 1% loss in classification accuracy in the setups evaluated. Early exit branches show up to a 300% speedup with no loss in classification accuracy when evaluated on CPUs.
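The early-exit idea in the abstract above can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `predict_with_early_exit`, the representation of a network as a list of `(transform, exit_head)` pairs, and the confidence threshold of 0.9 are all assumptions made for illustration. A sample leaves the network at the first branch whose softmax confidence reaches the threshold, so easy samples skip the deeper (more expensive) stages.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / exps.sum()


def predict_with_early_exit(x, stages, threshold=0.9):
    """Run x through the stages, leaving at the first confident exit branch.

    stages: list of (transform, exit_head) callables (hypothetical layout).
            transform maps features to deeper features,
            exit_head maps features to class logits.
    Returns (predicted_class, depth_at_which_the_sample_exited).
    """
    if not stages:
        raise ValueError("at least one stage is required")
    features = x
    for depth, (transform, exit_head) in enumerate(stages, start=1):
        features = transform(features)
        probs = softmax(exit_head(features))
        # Exit early when the branch is confident; the last stage always answers.
        if probs.max() >= threshold or depth == len(stages):
            return int(probs.argmax()), depth
```

With a toy two-stage network, an input with one dominant feature exits at depth 1, while an ambiguous input continues to depth 2. Any speedup depends on how many samples take the early branches, which this sketch does not measure.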
137

Operating System Support for Modern Applications

Yang, Ting 01 May 2009
Computer systems now run drastically different workloads than they did two decades ago. The enormous advances in hardware power, such as processor speed, memory and storage capacity, and network bandwidth, enable them to run new kinds of applications, and a large number of applications simultaneously. Software technologies, such as garbage collection and multi-threading, also reshape applications and their behaviors, introducing more challenges to system resource management. However, existing general-purpose operating systems do not provide adequate support for these modern applications. These operating systems were designed over two decades ago, when garbage-collected applications were not prevalent and users interacted with systems using consoles and command lines, rather than graphical user interfaces. As a result, they fail to allow the necessary coordination among resource management components to ensure consistent performance guarantees. For example, garbage-collected applications cannot adjust themselves to maintain high throughput under dynamic memory pressure, simply because existing virtual memory managers do not collect and expose enough information to them. Furthermore, despite the increasing demand for supporting co-existing interactive applications in desktop environments, resource managers (especially memory and disk I/O) mostly focus on optimizing throughput. They each work independently, ignoring the response time requirements that the CPU scheduler attempts to satisfy. Consequently, pressure on any of these resources can significantly degrade application responsiveness. In order to deliver robust performance to these modern applications, an operating system has to coordinate its resource managers (e.g., CPU, memory, and disk I/O), as well as cooperate with resource managers in user space, such as the garbage collector and the thread manager.
To support garbage-collected applications, we present CRAMM, a system that enables them to predict an appropriate heap size using information supplied by the underlying operating system, allowing them to maintain high throughput in the face of changing memory pressure. To support highly interactive workloads, we present Redline, a system that manages CPU, memory, and disk I/O in an integrated manner. It uses lightweight specifications to drive CPU scheduling and to coordinate memory and disk I/O management to serve the needs of interactive applications. Such coordination enables it to maintain responsiveness in the face of extreme resource contention, without sacrificing resource utilization. We also show that Redline can be used to support response-time-sensitive multi-threaded server applications. Our experiences and extensive experiments show that we can coordinate resource managers, both inside and outside the operating system, efficiently without destroying the modularity of the existing system. Such coordination prevents resource managers from working at cross purposes, and dramatically improves the performance of applications when facing heavy resource contention, sometimes by orders of magnitude.
138

Automatic methods for distribution of data-parallel programs on multi-device heterogeneous platforms

Moreń, Konrad 07 February 2024
This thesis deals with the problem of finding effective methods for programming and distributing data-parallel applications for heterogeneous multiprocessor systems. These systems are ubiquitous today. They range from embedded devices with low power consumption to high-performance distributed systems. The demand for these systems is growing steadily, due to the growing number of data-intensive applications and the general growth of digital applications. Systems with multiple devices offer higher performance but unfortunately add complexity to software development. Programming heterogeneous multiprocessor systems presents several unique challenges compared to single-device systems. The first challenge is the programmability of such systems. Despite constant innovations in programming languages and frameworks, they are still limited. They are either platform specific, like CUDA, which supports only NVIDIA GPUs, or applied at a low level of abstraction, such as OpenCL. Application developers who design OpenCL programs must manually distribute data to the different devices and synchronize the distributed computations. These manual tasks have an impact on developer productivity. To reduce programming complexity and development time, this thesis introduces two approaches that automatically distribute and synchronize data-parallel workloads. Another challenge is multi-device hardware utilization. In contrast to single-device platforms, the application optimization process for a multi-device system is even more complicated. Application designers need to apply not only optimization strategies specific to a single-device architecture; they also need to focus on careful workload balancing between all the platform processors. For the balancing problem, this thesis proposes a method based on the platform model. The platform model is created with machine learning techniques.
Using machine learning, this thesis automatically builds a reliable platform model, which is portable and adaptable to different platform setups, with minimal manual involvement of the programmers.
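To make the workload-balancing problem in the abstract above concrete, here is a minimal sketch of one simple balancing rule: splitting work items across devices in proportion to each device's predicted throughput. This is not the method proposed in the thesis, whose platform model is built with machine learning; the function name `split_workload`, the device names, and the throughput figures are hypothetical and stand in for whatever a platform model would predict.

```python
def split_workload(total_items, throughput):
    """Split total_items across devices proportionally to predicted throughput.

    throughput: dict mapping a device name to its predicted rate (items/second).
                In this sketch the values are simply given; a platform model
                would normally supply them.
    Returns a dict mapping each device name to its number of work items.
    """
    total_rate = sum(throughput.values())
    if total_rate <= 0:
        raise ValueError("total predicted throughput must be positive")

    # Floor of the proportional share for every device.
    shares = {
        device: int(total_items * rate / total_rate)
        for device, rate in throughput.items()
    }

    # Rounding down can leave a few items unassigned; give them to the
    # fastest devices first, one item each.
    leftover = total_items - sum(shares.values())
    fastest_first = sorted(throughput, key=throughput.get, reverse=True)
    for device in fastest_first[:leftover]:
        shares[device] += 1
    return shares
```

For example, `split_workload(1000, {"cpu": 1.0, "gpu0": 3.0, "gpu1": 4.0})` assigns 125, 375 and 500 items respectively. A static proportional split like this ignores data transfer costs and synchronization, which a real multi-device scheduler would also have to account for.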
139

Dynamic Bandwidth and Laser Scaling for CPU-GPU Heterogenous Network-on-Chip Architectures

Van Winkle, Scott E. 20 September 2017
No description available.
140

Grafikkort till parallella beräkningar / Graphics cards for parallel computations

Music, Sani January 2012
This study describes how graphics cards can be used for general-purpose computing, which differs from the field where graphics cards are most commonly used, multimedia. The study describes and discusses present-day alternatives for using graphics cards for general operations. In this study we use and describe the Nvidia CUDA architecture. The study describes how graphics cards can be used for general operations in practice, from the point of view of a reader who can already program in some high-level language and knows the basics of how a computer works. We use accelerated libraries on the graphics card (THRUST and CUBLAS) to achieve our goals, which are software development and benchmarking. The results are programs that solve certain problems (matrix multiplication, sorting, binary search, vector inversion), together with the execution time and speedup of these programs. The graphics card is compared to the processor running serially and to the processor running in parallel. The results show a speedup of up to approximately 50 times compared to serial implementations on the processor.
