151 |
Investigating the effect of implementing Data-Oriented Design principles on performance and cache utilization
Nyberg, Frank. January 2021
Game engines process a lot of data under strict deadlines. Therefore, measures to increase performance are important in this area. Data-Oriented Design (DOD) promotes principles that are meant to increase performance by better cache utilization. The purpose of this thesis is to examine a selection of these principles to give a better understanding of how DOD affects CPU time and the rate of cache misses, with focus on the area of game development. More specifically, the principles examined are removal of run-time polymorphism, iteration over contiguous data, and lowering the amount of data in hot loops. Also, the Entity-Component-System pattern is examined, which is based upon DOD principles. The approach was to first present a theoretical background on the subject, and then to conduct tests by implementing a simulation of movement and collision detection utilizing said principles. The tests were written in C++ and executed on an Intel Core i7 4770k with no rendering. CPU time was measured in updated entities per μs, and cache utilization was measured in the form of cache miss rate. The results showed that the DOD principles did increase performance. Cache miss rate was also lower, with the exception of when removing run-time polymorphism. The conclusion is that Data-Oriented Design, used in game development, is likely to result in better performance, mostly as a result of better cache utilization.
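As a rough illustration of two of the principles named above (iteration over contiguous data and keeping cold data out of hot loops), the following sketch contrasts an array-of-structures layout with a structure-of-arrays layout. It is not the thesis code, which was written in C++; all class, function, and field names here are hypothetical, and Python will not reproduce the cache effects the thesis measured. It only shows the shape of the data layout.

```python
from array import array

# Array-of-structures style: each entity object holds hot and cold fields
# together, so the movement loop touches objects that also carry cold data.
class EntityAoS:
    def __init__(self, x, vx, name):
        self.x = x        # hot: read and written on every update
        self.vx = vx      # hot: read on every update
        self.name = name  # cold: not needed by the movement loop

def update_aos(entities, dt):
    for e in entities:
        e.x += e.vx * dt

# Structure-of-arrays style: hot fields live in contiguous typed arrays,
# and the cold field is stored separately (hypothetical layout).
class EntitiesSoA:
    def __init__(self, xs, vxs, names):
        self.x = array("d", xs)    # contiguous doubles
        self.vx = array("d", vxs)  # contiguous doubles
        self.names = list(names)   # cold data kept out of the hot loop

def update_soa(entities, dt):
    x, vx = entities.x, entities.vx
    for i in range(len(x)):
        x[i] += vx[i] * dt

aos = [EntityAoS(0.0, 1.0, "a"), EntityAoS(10.0, -2.0, "b")]
soa = EntitiesSoA([0.0, 10.0], [1.0, -2.0], ["a", "b"])
update_aos(aos, 0.5)
update_soa(soa, 0.5)
```

Both loops compute the same positions; the difference lies only in which data the hot loop has to walk through.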
|
152 |
Neuronové sítě s ozvěnou stavu pro předpověď vývoje finančních trhů / Echo state neural network for stock market prediction
Pospíchal, Ondřej. January 2018
This thesis deals with the echo state network and with accelerating its learning by implementing it on a graphics processor. The theoretical part describes neural networks in general and the selected network types on which the echo state network is based. It then describes other algorithms used for time series analysis and, briefly, the tools used in the practical part. The practical part describes the creation of the accelerated version of the echo state network, followed by the creation of input data sets from real financial indexes, on which the echo state network and the other algorithms were tested. Analysis of the accelerated version showed that its learning speed did not reach theoretical expectations: the accelerated version runs slower, but with greater precision. Analysis of the measurements for the other algorithms showed that the highest precision is achieved by solutions based on neural networks.
|
153 |
Prostředí pro spouštění testů kompatibility RISC-V / Framework for RISC-V Compliance Tests Execution
Skála, Milan. January 2018
This thesis focuses on the design and implementation of a testing framework for different types of RISC-V implementations. It describes the history, instruction set, and processor modes supported by the architecture. It then discusses current methodologies and testing frameworks implemented in Python, with emphasis on the analysis of compliance tests. The practical part covers the design and implementation of a framework for executing compliance tests on models that can be implemented in various ways, either as an ISA simulator or as a hardware model. A secondary aim of the thesis is to create a graphical user interface for quick and easy test configuration. Finally, the results are evaluated and possibilities for further development are discussed.
|
154 |
Brustschmerzambulanz - Chest Pain Unit am Herzzentrum der Universität Leipzig: Eine retrospektive Analyse für das Jahr 2009 / Chest Pain Outpatient Clinic - Chest Pain Unit at the Heart Center of the University of Leipzig: A Retrospective Analysis for the Year 2009
Heumesser, Christian Eugen. 24 September 2015
Chest pain is a common symptom. It requires rapid differentiation to rule out life-threatening conditions such as myocardial infarction or aortic dissection. Chest pain units (CPUs) and chest pain outpatient clinics (Brustschmerzambulanzen, BSA) were established for this purpose. In 2008, the German Cardiac Society (Deutsche Gesellschaft für Kardiologie) introduced minimum standards for their equipment and structure. In 2009, the BSA at the Heart Center Leipzig (Herzzentrum Leipzig, HZL), founded two years earlier, was certified.
In this work, a retrospective analysis of 2,220 patient records from 2009 was carried out. With patient numbers rising, the BSA was visited most often on Mondays and around midday. Symptom duration ranged from a few minutes to several years. The largest group, 19.1% of patients, presented with a symptom duration between one week and one month; 11.6% of patients presented within six hours. Symptoms and comorbidities varied widely. 24.7% of patients presented without pain. 66.4% of patients remained outpatients, and patients spent an average of 4.8 hours in the BSA. 59.9% of patients without a primarily apparent cardiac symptom constellation had a cardiac disease. Self-referred and physician-referred patients, as well as inpatient and outpatient courses, differed in symptoms, comorbidities, examinations, interventions, and discharge diagnoses. 26.9% of patients underwent cardiac catheterization. Of these, 31.4% received an intervention and 62.4% received drug therapy. Coronary artery disease (CAD; German: KHK) was the discharge diagnosis in 19.1% of patients. In half of these cases, the diagnosis was made for the first time. From symptoms, symptom duration, and cardiovascular risk factors, the Symptome-30-2-CRF score was derived, which rules out CAD at ≤ 9 points and supports the suspicion of CAD at ≥ 14 points.
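The two cut-offs of the Symptome-30-2-CRF score can be written as a small lookup. The sketch below encodes only the thresholds stated in the abstract (≤ 9 points and ≥ 14 points, referring to coronary artery disease, KHK in the abstract); how the points themselves are computed is not given there, so the function takes a finished score as input. The function name and the label for the intermediate range are assumptions made for illustration.

```python
def interpret_score(points):
    """Map a finished score to the two thresholds stated in the abstract."""
    if points <= 9:
        return "coronary artery disease ruled out"
    if points >= 14:
        return "suspicion of coronary artery disease supported"
    # The abstract does not say how 10 to 13 points are interpreted.
    return "not specified in the abstract"

low = interpret_score(9)
middle = interpret_score(12)
high = interpret_score(14)
```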
|
155 |
Parallellisering av Sliding Extensive Cancellation Algorithm (ECA-S) för passiv radar med OpenMP / Parallelization of Sliding Extensive Cancellation Algorithm (ECA-S) for Passive Radar with OpenMP
Johansson Hultberg, Andreas. January 2021
Software parallelization has attracted increasing interest since the manufacturing of ever smaller transistors within integrated circuits has begun to stagnate. This has led to the development of new processing units with an increasing number of cores. Parallelization is an optimization technique that lets the user exploit parallel processes to streamline algorithm flows. This study examines the performance benefits that a passive bistatic radar system can obtain through parallelization and code refactoring. It focuses mainly on the use of parallel instructions within a shared memory model on a central processing unit (CPU) through the application programming interface OpenMP. Quantitative data is collected to compare the runtime of the most central algorithm in the passive radar system, the Extensive Cancellation Algorithm (ECA). ECA can be used to suppress unwanted clutter in the surveillance signal, with the purpose of producing clear detections of airborne targets. The algorithm is computationally demanding, however, which has led to the development of faster versions such as the Sliding ECA (ECA-S). Despite this development, the algorithm remains relatively computationally demanding, which can lead to long execution times within the radar system. In this study, a MATLAB implementation of ECA-S is converted to C in order to take advantage of the fast execution time of the procedural programming language. Parallelism is introduced into the converted algorithm using Intel's thread methodology and then applied on two different operating systems. The study shows that a speedup of up to a factor of 24 can be obtained in C while preserving the correctness of the results. The results also showed that refactoring a MATLAB algorithm could yield 73% faster code and that C-MEX implementations are twice as slow as a C implementation. Finally, the study indicated that real-time operation can be achieved for a passive bistatic radar system by using C together with parallel instructions within a shared memory model on a CPU.
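The general pattern behind shared-memory loop parallelization can be sketched as follows: independent loop iterations are split into contiguous chunks, each chunk is handled by one worker, and the parallel result is checked against the serial one. This is an illustration only. The thesis used OpenMP in C, whereas this sketch uses Python threads, which do not speed up CPU-bound work because of the interpreter lock. The `process` function is a hypothetical stand-in for one unit of work, not the ECA-S computation, and all names are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

def split_static(n_items, n_workers):
    """Split range(n_items) into contiguous, near-equal chunks, one per worker."""
    base, extra = divmod(n_items, n_workers)
    chunks, start = [], 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks

def process(i):
    # Hypothetical stand-in for one independent unit of work.
    return i * i

def run_serial(n_items):
    return [process(i) for i in range(n_items)]

def run_parallel(n_items, n_workers):
    chunks = split_static(n_items, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map preserves chunk order, so results line up with the serial run.
        parts = pool.map(lambda chunk: [process(i) for i in chunk], chunks)
        return [y for part in parts for y in part]

def speedup(t_before, t_after):
    """Speedup factor: how many times shorter the new runtime is."""
    return t_before / t_after

serial = run_serial(10)
parallel = run_parallel(10, 4)
```

With this definition, a runtime that drops from 24 time units to 1 corresponds to a speedup factor of 24, and an implementation that is "twice as slow" has a speedup of 0.5 relative to the baseline.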
|
156 |
IMPLEMENTATION OF OXYFUEL COMBUSTION IN A WASTE INCINERATION CHP PLANT: A Techno-Economic Assessment
Saleh, Mostafa; Hedén Sandberg, Anton. January 2021
Global energy demand is predicted to rise in the coming decades, necessitating a shift to renewable energy sources to mitigate greenhouse gas emissions. However, because renewable energy cannot be supplied around the clock, it is estimated that cutting greenhouse gas emissions by 80% of their 1990s level is possible only with the addition of an important technology, carbon capture and storage (CCS). CCS aims to reduce anthropogenic carbon emissions by capturing CO2 from flue gases, transporting it, and either storing it permanently or reusing it industrially. The CCS approach includes three technologies: post-combustion capture, pre-combustion capture, and oxyfuel combustion, with the latter being the emphasis of this thesis. Based on a case study of Mälarenergi's refuse-derived waste-fired CHP plant, this thesis investigates the viability of converting existing non-fossil-fueled CHP plants to oxyfuel combustion. A thorough technical investigation of the impact of oxyfuel combustion on system performance was conducted through system modeling in the process simulator Aspen Plus. The model considers an air separation unit (ASU), a CHP plant, and a cryogenic CO2 purification unit (CPU), all of which are validated through calibration and comparison with real-world data and similar work. To investigate the influence of oxyfuel combustion on the generation of both heat and electricity, two scenarios were considered: recirculating flue gas before and after flue gas condensation. In addition, the oxygen purity was analyzed to identify the parameters with the least impact on system performance. Moreover, a detailed economic assessment of the costs of integrating oxyfuel combustion was conducted.
The findings of this thesis show that integrating waste incineration CHP plants with oxyfuel combustion for CO2 capture has promising features under the condition of 97% oxygen purity and flue gas recirculation taking place after flue gas condensation. This is owing to (i) a modest imposed energy penalty of approximately 8.7%, (ii) a high CO2 recovery ratio of around 92.4%, (iii) a total investment cost of approximately 554 M$ over a 20-year lifetime, and (iv) a cost of captured CO2 of around 76 $/ton. Aside from system modeling, this thesis presents an overview of the current state-of-the-art technology for the different separation and capture mechanisms. The goal is not to provide a comprehensive review but rather to present an overall picture of the maturity of the different mechanisms. The findings point to cryogenic separation as the most mature technology both for oxygen production and for capturing CO2 during oxyfuel combustion.
|
157 |
Hybridapplikationer med Apache Cordova och Flutter : En jämförande studie ur ett prestandaperspektiv / Hybrid Applications with Apache Cordova and Flutter : A comparative performance study
Malki, Ara. January 2021
Today we see increasing interest in application development, and there are over three billion smartphone users. Hybrid application development is something companies can no longer overlook, because its benefits weigh heavily. One benefit of hybrid applications is that they use a single code base for multiple platforms, which leads to significant savings of time and resources during development. The two hybrid frameworks chosen for comparison are Apache Cordova and Flutter, and the purpose is to identify which framework performs best. The question answered in the study is: which of the two frameworks for creating a hybrid application, Apache Cordova or Flutter, is recommended from a performance perspective? The study measures the performance-related variables execution speed, start-up time, and CPU, RAM, and battery usage. It does this by creating two functionally identical applications, one in each framework. Data is collected with Android Profiler, Logcat, and the author's own instrumentation code. The results present measurements for each performance-related variable. Overall, they show that Flutter is the faster framework but also the one with higher resource usage.
|
158 |
Heat Transfer Characteristics of Natural Convection within an Enclosure Using Liquid Cooling System
Gdhaidh, Farouq A.S. January 2015
In this investigation, a single-phase fluid is used to study the coupling between natural convection heat transfer within an enclosure and forced convection through the computer case to cool the electronic chip. Two working fluids (water and air) are used within a rectangular enclosure, and the air flow through the computer case is created by an exhaust fan installed at the back of the case. The optimum enclosure size configuration that keeps the maximum temperature of the heat source at a safe level (85 °C) is determined. The cooling system is tested for applied power in the range of 15–40 W.
The study is based on both numerical models and experimental observations. The numerical work was carried out with the commercial software ANSYS Icepak to simulate the flow and temperature fields for the desktop computer and the cooling system. The numerical simulation has the same physical geometry as that used in the experimental investigations. The experimental work aimed to gather details of the temperature field and use them to validate the numerical predictions.
The results showed that variations in cavity size influence both the heat transfer process and the maximum temperature. Furthermore, the experimental results compared favourably with those obtained numerically, with a maximum deviation in the maximum system temperature within 3.5%. Moreover, using water as the working fluid within the enclosure keeps the maximum temperature under 77 °C for a heat source of 40 W, which is below the recommended maximum electronic chip temperature of 85 °C. As a result, the noise and vibration level is reduced. In addition, the proposed cooling system saved about 65% of the CPU fan power.
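The validation step described above amounts to a relative deviation between the simulated and the measured maximum temperature, plus a check against the 85 °C limit. The sketch below shows that arithmetic under stated assumptions: the deviation is taken relative to the experimental value, which the abstract does not specify, and the two sample temperatures are hypothetical, not results from the thesis.

```python
def relative_deviation_percent(numerical, experimental):
    """Deviation of the numerical value from the experimental one, in percent."""
    return abs(numerical - experimental) / abs(experimental) * 100.0

def within_limit(max_temp_c, limit_c=85.0):
    """True if the maximum temperature does not exceed the safe limit."""
    return max_temp_c <= limit_c

# Hypothetical sample values in degrees Celsius, for illustration only.
numerical_max = 77.0
experimental_max = 75.0
deviation = relative_deviation_percent(numerical_max, experimental_max)
safe = within_limit(numerical_max)
```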
|
159 |
Predicting a business application's cloud server CPU utilization using the machine learning model LSTM
Nääs Starberg, Filip; Rooth, Axel. January 2021
Cloud computing is seeing increased adoption as companies seek to increase flexibility and reduce cost. Although the large cloud service providers employ a pay-as-you-go pricing model and enable customers to scale up and down quickly, there is still room for improvement. Workload in the form of CPU utilization often fluctuates, which leads to unnecessary cost and environmental impact when capacity goes unused. To help mitigate this issue, this paper aims to predict future CPU utilization using a long short-term memory (LSTM) machine learning model. By predicting utilization up to 30 minutes into the future, companies can scale their capacity just in time and avoid unnecessary cost and damage to the environment. The study is divided into two parts. The first part analyses how well the LSTM model performs when predicting one step at a time compared with a state-of-the-art model. The second part analyses the accuracy of the LSTM when making predictions up to 30 minutes into the future. To allow for an objective analysis of the results, the LSTM is compared with a standard recurrent neural network (RNN), which is similar to the LSTM in its inherent algorithmic structure. The results show that the LSTM appears to be superior to the RNN, both when predicting one time step ahead and when predicting several time steps ahead. The LSTM model was able to predict CPU utilization 30 minutes into the future with largely retained accuracy, which was the goal of the study. In conclusion, the results suggest that this LSTM model, and possibly similar LSTM models, may be a useful tool for reducing cost and unnecessary environmental impact for business applications hosted on a public cloud.
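Forecasting several steps ahead is commonly set up by cutting the utilization series into sliding windows, each pairing a stretch of past values with the values that follow it. The sketch below shows that data preparation step only, not the LSTM itself. The function name, window sizes, and sample values are hypothetical and not taken from the thesis.

```python
def make_windows(series, n_inputs, horizon):
    """Pair each run of n_inputs past values with the next `horizon` values."""
    samples = []
    last_start = len(series) - n_inputs - horizon
    for start in range(last_start + 1):
        inputs = series[start:start + n_inputs]
        targets = series[start + n_inputs:start + n_inputs + horizon]
        samples.append((inputs, targets))
    return samples

# Hypothetical CPU utilization readings in percent, one per time step.
cpu = [10, 12, 15, 11, 9, 14]
pairs = make_windows(cpu, n_inputs=3, horizon=2)
```

Setting `horizon` to 1 gives the one-step-ahead case from the first part of the study; a larger `horizon` gives the multi-step case from the second part.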
|
160 |
Low-power high-resolution image detection
Merchant, Caleb. 09 August 2019
Many image processing algorithms exist that can accurately detect humans and other objects such as vehicles and animals. Many of these algorithms require large amounts of processing, often requiring hardware acceleration with powerful central processing units (CPUs), graphics processing units (GPUs), field-programmable gate arrays (FPGAs), etc. Detecting objects such as humans at longer ranges makes these hardware requirements even more demanding, as the number of pixels necessary to detect objects at both close and long ranges is greatly increased. Comparing the performance of different low-power implementations can be used to determine a trade-off between performance and power. An image differencing algorithm is proposed, along with selected low-power hardware, that is capable of detecting humans at ranges of 500 m. Multiple versions of the detection algorithm are implemented on the selected hardware and compared for run-time performance on a low-power system.
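In its simplest form, image differencing compares two frames pixel by pixel and flags the pixels whose intensity changed by more than a threshold. The abstract does not describe the details of the proposed algorithm, so the sketch below is a generic frame-differencing example rather than the author's method. The function names, threshold, and pixel values are assumptions.

```python
def difference_mask(previous, current, threshold):
    """Return a binary mask: 1 where a pixel changed by more than threshold."""
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prev_row, cur_row)]
        for prev_row, cur_row in zip(previous, current)
    ]

def changed_pixels(mask):
    """Count the flagged pixels in a binary mask."""
    return sum(sum(row) for row in mask)

# Hypothetical 2x3 grayscale frames with intensities from 0 to 255.
prev_frame = [[10, 10, 10], [10, 10, 10]]
cur_frame = [[10, 200, 10], [10, 190, 12]]
mask = difference_mask(prev_frame, cur_frame, threshold=25)
count = changed_pixels(mask)
```

The threshold keeps small intensity changes, such as the 10 to 12 change in the last pixel, from being flagged.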
|