141

Environments for the experimental analysis of HPC applications

Perarnau, Swann 01 December 2011
High performance computing (HPC) systems are increasingly complex. Nowadays, each compute node can contain several sockets or several cores sharing multiple levels of cache memory in a hierarchical way. To understand an application's performance on such systems, or to develop new algorithms and validate their behavior, an experimental study is often required. In this thesis, we consider two types of experimental analysis: execution on real systems and simulation using randomly generated inputs. In both cases, controlling the environment (hardware or input data) allows a better analysis of the application under study. We therefore propose two methods to control the hardware resources an application uses inside a machine: one for the processing time it is given, the other for the amount of cache memory available to it. Both methods let us study how an application's behavior changes with the amount of resources allocated. Implemented as modifications of the operating system's behavior on Linux, they are demonstrated on the analysis of several parallel applications.
Regarding simulation, we studied the random generation of directed acyclic graphs (DAGs) for scheduler simulations. While numerous generation algorithms exist for this problem, most papers in the field rely on ad hoc, poorly validated implementations of them. To address this issue, we propose a generation environment that includes most of the methods found in the literature. We validated this environment through large analysis campaigns on Grid'5000, in particular by checking the known statistical properties of several methods. We also show that a scheduler's measured performance is strongly influenced by the chosen input-generation method, to the point of producing inversions: changing the generation algorithm can reverse the outcome of a comparison between two schedulers.
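A common family of generators in this literature builds the DAG layer by layer, which guarantees acyclicity by construction. The C++ sketch below illustrates that idea under simple assumptions (uniform layer sizes and a fixed edge probability); all names and parameters are illustrative, and it is not code from the thesis's generation environment.

```cpp
#include <cstdint>
#include <cstdio>
#include <random>
#include <utility>
#include <vector>

// Minimal layer-by-layer random DAG generator: nodes are grouped into layers,
// and each node may receive edges only from nodes in earlier layers, which
// guarantees acyclicity by construction.
struct Dag {
    int n = 0;
    std::vector<std::pair<int, int>> edges;  // (from, to) with from < to
};

Dag generateLayeredDag(int layers, int nodesPerLayer, double edgeProb,
                       std::uint64_t seed) {
    std::mt19937_64 rng(seed);
    std::bernoulli_distribution coin(edgeProb);

    Dag dag;
    dag.n = layers * nodesPerLayer;
    for (int layer = 1; layer < layers; ++layer) {
        for (int v = layer * nodesPerLayer; v < (layer + 1) * nodesPerLayer; ++v) {
            // Candidate parents are all nodes of previous layers.
            for (int u = 0; u < layer * nodesPerLayer; ++u) {
                if (coin(rng)) dag.edges.emplace_back(u, v);
            }
        }
    }
    return dag;
}

int main() {
    Dag d = generateLayeredDag(4, 8, 0.2, 42);
    std::printf("nodes=%d edges=%zu\n", d.n, d.edges.size());
}
```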
142

Chest Pain Clinic (Chest Pain Unit) at the Heart Center of the University of Leipzig: A Retrospective Analysis for the Year 2009

Heumesser, Christian Eugen 15 October 2015
Chest pain is a common symptom. It requires rapid differentiation to rule out life-threatening conditions such as myocardial infarction or aortic dissection. Chest pain units (CPUs) and chest pain outpatient clinics (BSA) were established for this purpose. In 2008, the German Cardiac Society introduced minimum standards for their equipment and structure. In 2009, the BSA at the Heart Center Leipzig (HZL), founded two years earlier, was certified. In this work, a retrospective analysis of data from 2,220 patients from the year 2009 was performed. Against a background of rising patient numbers, the BSA was visited most frequently on Mondays and around midday. Symptom duration ranged from a few minutes to several years. The largest share of patients, 19.1%, presented with a symptom duration between one week and one month; 11.6% presented within six hours. Symptoms and comorbidities varied widely. 24.7% of patients presented without pain. 66.4% of patients remained outpatients, and patients spent an average of 4.8 hours in the BSA. 59.9% of patients without a primarily apparent cardiac symptom constellation turned out to have a cardiac disease. Self-referred and physician-referred patients, as well as inpatient and outpatient courses, differed in symptoms, comorbidities, examinations, interventions, and discharge diagnoses. 26.9% of patients underwent cardiac catheterization; of these, 31.4% received an intervention and 62.4% drug therapy. Coronary artery disease (CAD) was the discharge diagnosis in 19.1% of patients, and in half of these cases the diagnosis was made for the first time. From symptoms, symptom duration, and cardiovascular risk factors, the Symptome-30-2-CRF score was derived, which rules out CAD at ≤ 9 points and supports a suspicion of CAD at ≥ 14 points.
143

Management of shared resources in chip multiprocessor systems

Πετούμενος, Παύλος 06 October 2011
This dissertation proposes methodologies for the management of shared resources in chip multiprocessors (CMPs). Until recently, the design of a computing system had to satisfy the computational and storage needs of a single program during each time period. Now the designer must instead balance the, perhaps conflicting, needs of multiple programs competing for the same resources. In many cases, even this is not enough: even with a perfect way to manage sharing, without optimizing how each processor uses the shared resource, the resource cannot deal efficiently with the increased load. To handle the negative effects of resource sharing, this dissertation proposes three management mechanisms. The first introduces a novel theoretical model of shared-cache contention that can be used at run time. Our methodology then uses this model to control sharing and achieve fairness in how cache space is divided among the processors.
Our second methodology presents a new technique for predicting the locality of cache accesses. Since locality determines, almost entirely, the usefulness of cached data, our technique can drive any management mechanism that strives to improve cache efficiency. As part of the methodology, we present such a mechanism: a new cache replacement policy that minimizes cache misses through near-optimal replacement decisions. The last methodology in this dissertation targets the energy consumption of the processor. It shows that the key to reducing the power consumption of the Issue Queue, without disproportionate performance degradation, lies in the interaction of the Issue Queue with the memory subsystem: as long as managing the Issue Queue does not reduce the utilization of the memory subsystem, the effect on the processor's performance will be minimal. Based on this conclusion, we introduce a new mechanism for dynamically resizing the Issue Queue, which achieves aggressive downsizing and energy savings with almost no performance degradation.
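To make the locality-driven replacement idea concrete, the following C++ sketch shows one set of a small set-associative cache in which victims are chosen preferentially among lines a predictor has flagged as unlikely to be reused. The predictor itself, the structure names, and the fallback policy are assumptions for illustration, not the dissertation's mechanism.

```cpp
#include <array>
#include <cstdint>

// One cache set with 4 ways. Each line stores a tag and a hint from a
// locality predictor: "likely reused" or not. On a miss, the replacement
// policy first evicts a line predicted dead; only if every line is
// predicted live does it fall back to a plain round-robin choice.
struct Line {
    bool valid = false;
    std::uint64_t tag = 0;
    bool predictedReused = false;  // filled in by some locality predictor
};

struct Set {
    std::array<Line, 4> ways;
    unsigned nextRoundRobin = 0;

    unsigned chooseVictim() {
        for (unsigned w = 0; w < ways.size(); ++w) {
            if (!ways[w].valid) return w;              // free way first
        }
        for (unsigned w = 0; w < ways.size(); ++w) {
            if (!ways[w].predictedReused) return w;    // predicted-dead line
        }
        unsigned v = nextRoundRobin;                   // fallback
        nextRoundRobin = (nextRoundRobin + 1) % ways.size();
        return v;
    }

    void insert(std::uint64_t tag, bool predictedReused) {
        unsigned victim = chooseVictim();
        ways[victim] = Line{true, tag, predictedReused};
    }
};
```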
144

Large Scale Graph Processing in a Distributed Environment

Upadhyay, Nitesh January 2017
Graph algorithms are used ubiquitously across domains. They exhibit parallelism that can be exploited on parallel architectures such as multi-core processors and accelerators. However, real-world graphs are massive and cannot fit into the memory of a single machine. Such large graphs are partitioned and processed in a distributed cluster environment consisting of multiple GPUs and CPUs. Existing frameworks that facilitate large-scale graph processing on distributed clusters each have their own style of programming and require extensive involvement by the user in communication and synchronization aspects. Adapting to these frameworks is an overhead for the programmer. Furthermore, these frameworks target only CPU clusters and cannot harness GPU architectures. We provide a back-end framework for the graph domain-specific language Falcon for large-scale graph processing on CPU and GPU clusters. The motivation behind choosing this DSL as a front end is its shared-memory, imperative programming model. Our framework generates Giraph code for CPU clusters; Giraph code runs on a Hadoop cluster and is known for scalable and fault-tolerant graph processing. For GPU clusters, our framework applies a set of optimizations to reduce computation and communication latency and generates efficient CUDA code coupled with MPI. Experimental evaluations show the scalability and performance of our framework for both CPU and GPU clusters. The performance of the generated code is comparable to manual implementations of various algorithms in distributed environments.
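Giraph implements the Pregel-style vertex-centric, bulk-synchronous model that the generated CPU code targets. The C++ sketch below illustrates that execution model on a toy single-source shortest-paths computation; it is a conceptual illustration only, with assumed names, and is neither the framework's API nor the code emitted by the back end.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <limits>
#include <utility>
#include <vector>

// Toy vertex-centric BSP loop (Pregel style): in each superstep every vertex
// reads the messages sent to it in the previous superstep, updates its value,
// and sends messages along its out-edges. Execution stops when no messages
// are produced.
struct Edge { int to; double weight; };

void ssspSupersteps(const std::vector<std::vector<Edge>>& graph, int source) {
    const double inf = std::numeric_limits<double>::infinity();
    std::vector<double> dist(graph.size(), inf);
    std::vector<std::vector<double>> inbox(graph.size()), outbox(graph.size());
    inbox[source].push_back(0.0);

    bool anyMessage = true;
    while (anyMessage) {
        anyMessage = false;
        for (std::size_t v = 0; v < graph.size(); ++v) {
            double best = dist[v];
            for (double m : inbox[v]) best = std::min(best, m);
            if (best < dist[v]) {                      // vertex "wakes up"
                dist[v] = best;
                for (const Edge& e : graph[v]) {
                    outbox[e.to].push_back(best + e.weight);
                    anyMessage = true;
                }
            }
            inbox[v].clear();
        }
        std::swap(inbox, outbox);                      // superstep barrier
    }
    for (std::size_t v = 0; v < dist.size(); ++v)
        std::printf("vertex %zu: %g\n", v, dist[v]);
}

int main() {
    std::vector<std::vector<Edge>> g = {
        {{1, 1.0}, {2, 4.0}}, {{2, 1.5}}, {}};
    ssspSupersteps(g, 0);
}
```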
145

Dynamic real-time scene voxelization and an application for large scale scenes

Valter, Andreas January 2015
This report describes a basic implementation of scene voxelization within EA's Frostbite engine. The algorithm supports dynamic scenes by voxelizing in real time on the graphics processing unit (GPU). The voxel grid is stored in a buffer with a binary representation, using clip mapping and multiple levels of detail. An ambient occlusion algorithm is implemented to show the benefits of the structure. Results from running the application within the engine are presented, both as figures showing the resulting image and as timings for different parts of the algorithm. Several future improvements that would make the algorithm more competitive are suggested as well.
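The binary representation mentioned above can be pictured as one bit per voxel packed into a word buffer. The following CPU-side C++ sketch shows such bit packing for a single dense grid level; the class and function names are assumptions, and the thesis's GPU clip-map structure is considerably more involved.

```cpp
#include <cstdint>
#include <vector>

// One level of a binary voxel grid: each voxel is a single bit, packed 32 per
// word. Index math assumes a dense res x res x res grid.
class BinaryVoxelGrid {
public:
    explicit BinaryVoxelGrid(std::uint32_t res)
        : res_(res), words_((std::uint64_t(res) * res * res + 31) / 32, 0u) {}

    void set(std::uint32_t x, std::uint32_t y, std::uint32_t z) {
        std::uint64_t bit = linearIndex(x, y, z);
        words_[bit >> 5] |= 1u << (bit & 31);
    }

    bool occupied(std::uint32_t x, std::uint32_t y, std::uint32_t z) const {
        std::uint64_t bit = linearIndex(x, y, z);
        return (words_[bit >> 5] >> (bit & 31)) & 1u;
    }

private:
    std::uint64_t linearIndex(std::uint32_t x, std::uint32_t y,
                              std::uint32_t z) const {
        return (std::uint64_t(z) * res_ + y) * res_ + x;
    }

    std::uint32_t res_;
    std::vector<std::uint32_t> words_;
};
```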
146

Improving and Extending a High Performance Processor Optimized for FPGAs

Källming, Daniel, Hultenius, Kristoffer January 2010
This thesis covers a number of improvements and additions to a soft CPU optimized for field-programmable gate arrays (FPGAs). The goal has been to implement the changes without substantially lowering the CPU's ability to operate at high clock frequencies. The result is a number of high-clock-frequency modules which, when added, complete the CPU's hardware functionality in certain areas. The maximum frequency of the CPU is, however, somewhat lowered after the modules have been added.
147

Heat transfer characteristics of natural convection within an enclosure using liquid cooling system

Gdhaidh, Farouq Ali S. January 2015
In this investigation, a single-phase fluid is used to study the coupling between natural convection heat transfer within an enclosure and forced convection through the computer case to cool the electronic chip. Two working fluids (water and air) are used within a rectangular enclosure, and the air flow through the computer case is created by an exhaust fan installed at the back of the case. The optimum enclosure size that keeps the maximum temperature of the heat source at a safe level (85°C) is determined. The cooling system is tested for applied power in the range of 15-40 W. The study is based on both numerical models and experimental observations. The numerical work was developed using the commercial software ANSYS Icepak to simulate the flow and temperature fields for the desktop computer and the cooling system; the simulation has the same physical geometry as that used in the experiments. The experimental work was aimed at gathering detailed temperature-field data and using it to validate the numerical predictions. The results show that variations in cavity size influence both the heat transfer process and the maximum temperature. Furthermore, the experimental results compare favourably with those obtained numerically: the maximum deviation in the maximum system temperature is within 3.5%. Using water as the working fluid within the enclosure keeps the maximum temperature under 77°C for a heat source of 40 W, below the recommended limit of 85°C for electronic chips. As a result, the noise and vibration level is reduced. In addition, the proposed cooling system saves about 65% of the CPU fan power.
148

Three-and four-derivative Hermite-Birkhoff-Obrechkoff solvers for stiff ODE

Albishi, Njwd January 2016
Three- and four-derivative k-step Hermite-Birkhoff-Obrechkoff (HBO) methods are constructed for solving stiff systems of first-order differential equations of the form y' = f(t, y), y(t0) = y0. These methods use higher derivatives of the solution y, as in Obrechkoff methods. We compute their regions of absolute stability and show that the three- and four-derivative HBO methods are A(α)-stable with α > 71° and α > 78°, respectively. We conduct numerical tests and show that our new methods are more efficient than several existing well-known methods.
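For context, a k-step method using derivatives up to order l can be written in the classical Obrechkoff form sketched below, where the coefficient names are generic and the formula omits the off-step (Hermite-Birkhoff) stages that the thesis's schemes add; with l = 3 or 4 one obtains three- and four-derivative methods.

```latex
% Generic k-step, l-derivative (Obrechkoff-type) formula; \alpha_j and
% \beta_{i,j} are method coefficients, and y^{(i)} denotes the i-th total
% derivative of y, obtained by repeatedly differentiating f(t, y) along the
% solution.
\[
  \sum_{j=0}^{k} \alpha_j \, y_{n+j}
  \;=\;
  \sum_{i=1}^{l} h^{i} \sum_{j=0}^{k} \beta_{i,j} \, y^{(i)}_{n+j},
  \qquad l = 3 \text{ or } 4 .
\]
```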
149

A Comparison Between Data-Oriented Design and Object-Oriented Design

Westerberg, Charlotte January 2020
Today's applications handle more and more data, which makes them increasingly resource-intensive and demanding of the hardware. In the long run this can mean that the hardware must be replaced at regular intervals for the software to run in a way that satisfies the user. This thesis investigates whether changing design technique makes it possible to develop less resource-intensive applications. The thesis presents a comparison between object-oriented design (also known as object-oriented programming, OOP) and data-oriented design (DOD), both by reviewing the known advantages and disadvantages of each design technique and by measuring the performance of each. The main advantages attributed to OOP are code reuse, maintainability, security in the form of encapsulation, and objects that reflect human reality. These advantages, however, also contribute to what is considered OOP's main disadvantage: it is performance-intensive. For DOD, the main advantages are considered to be cache-friendlier code that leads to fewer cache misses, and code that is easier to parallelize than its OOP counterpart. The disadvantage raised for DOD is that it takes time to learn and requires some practice. DOD is, however, not widely known, which resulted in a narrow basis of sources. Two simulations were developed in Unity, one of which uses the new data-oriented technology stack, DOTS. The measurements indicate that DOD uses less of the hardware resources in performance-intensive applications.
If the application is not performance-intensive, however, no difference between the two techniques is noticeable in CPU usage.
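To illustrate the cache-friendliness argument above, the C++ sketch below contrasts an object-oriented array-of-structures update with a data-oriented structure-of-arrays update. It is a schematic example with invented field names, not code from the thesis's Unity/DOTS simulations.

```cpp
#include <cstddef>
#include <vector>

// Object-oriented flavor: an array of "fat" objects. Updating positions
// drags every member of each object through the cache.
struct ParticleOO {
    float px, py, pz;
    float vx, vy, vz;
    float mass, radius, age, health;   // unrelated to the movement update
};

void updateOO(std::vector<ParticleOO>& particles, float dt) {
    for (ParticleOO& p : particles) {
        p.px += p.vx * dt;
        p.py += p.vy * dt;
        p.pz += p.vz * dt;
    }
}

// Data-oriented flavor: structure of arrays. The movement loop touches only
// the position and velocity streams, so fetched cache lines carry no cold data.
struct ParticlesDOD {
    std::vector<float> px, py, pz;
    std::vector<float> vx, vy, vz;
    std::vector<float> mass, radius, age, health;
};

void updateDOD(ParticlesDOD& p, float dt) {
    for (std::size_t i = 0; i < p.px.size(); ++i) {
        p.px[i] += p.vx[i] * dt;
        p.py[i] += p.vy[i] * dt;
        p.pz[i] += p.vz[i] * dt;
    }
}
```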
150

Investigating the effect of implementing Data-Oriented Design principles on performance and cache utilization

Nyberg, Frank January 2021
Game engines process a lot of data under strict deadlines, so measures that increase performance are important in this area. Data-Oriented Design (DOD) promotes principles meant to increase performance through better cache utilization. The purpose of this thesis is to examine a selection of these principles to give a better understanding of how DOD affects CPU time and the rate of cache misses, with a focus on game development. More specifically, the principles examined are the removal of run-time polymorphism, iteration over contiguous data, and lowering the amount of data in hot loops. The Entity-Component-System pattern, which is based on DOD principles, is also examined. The approach was to first present a theoretical background on the subject and then to conduct tests by implementing a simulation of movement and collision detection utilizing said principles. The tests were written in C++ and executed on an Intel Core i7 4770K with no rendering. CPU time was measured in updated entities per μs, and cache utilization was measured as the cache miss rate. The results showed that the DOD principles did increase performance. The cache miss rate was also lower, except when removing run-time polymorphism. The conclusion is that Data-Oriented Design, used in game development, is likely to result in better performance, mostly as a result of better cache utilization.
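As an illustration of the first principle, removing run-time polymorphism, the C++ sketch below replaces a virtual update call over a heterogeneous collection with one contiguous, plainly typed array per concrete type. The entity types and fields are invented for illustration and do not correspond to the thesis's test simulation.

```cpp
#include <memory>
#include <vector>

// Polymorphic version: every update goes through a virtual call, and the
// objects live behind pointers, so iteration chases scattered heap memory.
struct Entity {
    virtual ~Entity() = default;
    virtual void update(float dt) = 0;
    float x = 0.0f, vx = 0.0f;
};
struct Monster : Entity { void update(float dt) override { x += vx * dt; } };
struct Bullet  : Entity { void update(float dt) override { x += 2.0f * vx * dt; } };

void updatePolymorphic(std::vector<std::unique_ptr<Entity>>& entities, float dt) {
    for (auto& e : entities) e->update(dt);
}

// De-virtualized version: one tightly packed vector per concrete type,
// updated in straight loops the compiler can vectorize.
struct MonsterData { float x = 0.0f, vx = 0.0f; };
struct BulletData  { float x = 0.0f, vx = 0.0f; };

void updateDataOriented(std::vector<MonsterData>& monsters,
                        std::vector<BulletData>& bullets, float dt) {
    for (MonsterData& m : monsters) m.x += m.vx * dt;
    for (BulletData& b : bullets)   b.x += 2.0f * b.vx * dt;
}
```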
