Global ETD Search

681	Architectural enhancement for message passing interconnects Khun Jush, Farshad 21 October 2008 (has links) Research in high-performance architecture has been focusing on achieving more computing power to solve computationally-intensive problems. Advancements in the processor industry are not applicable in applications that need several hundred or thousand-fold improvement in performance. The parallel architecture approach promises to provide more computing power and scalability. Cluster computing, consisting of low-cost and high-performance processors, has been an alternative to proprietary and expensive supercomputer platforms. As in any other parallel system, communication overhead (including hardware, software, and network) adversely affects the computation performance in a cluster environment. Therefore, decreasing this overhead is the main concern in such environments. Communication overhead is the key obstacle to reaching hardware performance limits and is mostly associated with software overhead, a significant portion of which is attributed to message copying. Message copying is largely caused by a lack of knowledge of the next received message, which can be dealt with through speculation. To reduce this copying overhead and advance toward a finer granularity, architectural extensions comprised of a specialized network cache and instructions to manage the operations of these extensions were introduced. In order to investigate the effectiveness of the proposed architectural enhancement, a simulation environment was established by expanding an existing single-thread infrastructure to one that can run MPI applications. Then the proposed extensions were implemented, along with the MPI functions on top of the SimpleScalar infrastructure. Further, two techniques were proposed in order to achieve zero-copy data transfer in message passing environments, two policies that determine when a message is to be bound and sent to the data cache. 
These policies are called Direct to Cache Transfer (DTCT) and lazy DTCT. The simulations showed that using the proposed network extension together with the DTCT techniques resulted in fewer data cache misses than when the DTCT techniques were not used. This involved a study of the possible overhead and cache pollution introduced by the operating system and the communications stack, as exemplified by Linux, TCP/IP and M-VIA. The effects of these on the proposed extensions were then explored. Ultimately, this enabled a comparison of the performance achieved by applications running on a system incorporating the proposed extension with the performance of the same applications running on a standard system. The results showed that the proposed approach could improve the performance of MPI applications by 15 to 20%. Moreover, data transfer mechanisms and the associated components in the Cell BE processor were studied. For this, two general data transfer methods involving the PUT and GET functions were explored, demonstrating that SPE-initiated DMA data transfers are faster than the corresponding PPE-initiated DMAs. The main components of each data transfer were also investigated. In the SPE-initiated GET function, the main component is data delivery. However, the PPE-initiated GET function shows a long DMA issue time as well as a lengthy gap in receiving successive messages. It was demonstrated that the main components of the SPE-initiated PUT function are data delivery and latency (that is, the time to receive the first byte), and that the main components of the PPE-initiated PUT function are the DMA issue time and latency. Further, an investigation revealed that memory-management overhead is comparable to the data transfer time; this calls for techniques to hide the unavoidable overhead in order to reach high-throughput communication in MPI implementations on the Cell BE processor. Network Cache MPI Memory Hierarchy System DTCT Techniques
682	Διαχείριση κοινών πόρων σε πολυπύρηνους επεξεργαστές Αλεξανδρής, Φωκίων 27 June 2012 (has links) Modern trends in computer system design have adopted the use of cache memories, aiming to hide main-memory latency and to bridge the processor-memory performance gap. Cache memories have thus unquestionably acquired a primary role in the memory hierarchy of computers. Newer design trends brought the concept of parallelism to the fore. Instruction-level parallelism was investigated first, but the resulting increase in computer performance soon reached a maximum. Over the last decade the focus of designers has shifted again, as a new type of processor has come to the forefront: multicore processors, also known as on-chip multiprocessors (CMP). These developments, combined with the ever-increasing complexity of the "behaviour" of executed applications, pushed design interest towards exploiting a newly established type of parallelism. Memory-level parallelism (MLP) has in recent years been the most powerful means of increasing the performance of computing systems and, together with multicore processors, will dominate developments in the coming years. The purpose of this diploma thesis is to develop a statistical-probabilistic model for studying and predicting the phenomena that arise in cache memories that store data from running applications with intense memory-level parallelism. An estimator will be defined for the load imposed on the system by memory-level parallelism (MLP) phenomena. 
Then, based on the model we develop, a satisfactory set of applications will be investigated and an estimate-prediction of the MLP load of the system will be derived. If our predictions are judged successful, the MLP load prediction model we developed can be a useful tool in the hands of designers working to increase the performance of modern computing systems. / - Κρυφές μνήμες Έννοια παραλληλισμού 004.5 Cache memories Memory latency On-chip multiprocessors (CMP) Memory level parallelism (MLP)
683	User-Based Predictive Caching of Streaming Media / Användarbaserad predektiv cachning av strömmande media Håkansson, Fredrik, Larsson, Carl-Johan January 2018 (has links) Streaming media is a growing market all over the world, which places strict requirements on mobile connectivity. The foundation for a good user experience when supplying a streaming media service on a mobile device is to ensure that the user can access the requested content. Due to the varying availability of mobile connectivity, measures have to be taken to remove as much dependency as possible on the quality of the connection. This thesis investigates the use of a Long Short-Term Memory machine learning model for predicting a future geographical location for a mobile device. The predicted location, in combination with information about cellular connectivity in the geographical area, is used to schedule prefetching of media content in order to improve user experience and to reduce mobile data usage. The Long Short-Term Memory model suggested in this thesis achieves an accuracy of 85.15% averaged over 20000 routes, and the predictive caching managed to retain user experience while decreasing the amount of data consumed. / This thesis is written as a joint thesis between two students from different universities. This means the exact same thesis is published at two universities (LiU and KTH) but with different style templates. The other report has identification number: TRITA-EECS-EX-2018:403 cache software media streaming ml ai machine learning LSTM GRU network coverage Software Engineering Programvaruteknik Computer Sciences Datavetenskap (datalogi) Information Systems
684	Fyzická zátěž při geocachingu v jednotlivých věkových kategoriích ve Středočeském kraji / The physical strain of geocaching in individual age categories in the Central Bohemian region BEČVÁŘOVÁ, Ivana January 2015 (has links) Abstract: This thesis focuses on the physical strain of geocaching in individual age categories in the Central Bohemian region. Geocaching is a new outdoor activity based on a popular game that was played by generations of children and young people. The players search for a well-hidden treasure, a container called a "cache". The game connects younger and older age groups who have a common aim. It supports cooperation between generations, whose family trips gain a new motivation. Geocachers get to know interesting towns and cities, their history, national parks and other natural attractions. This thesis compares the physical strain of several age groups of geocachers from the Central Bohemian region. It concentrates on the frequency, length and energy demands of their routes.
685	Co-scheduling for large-scale applications : memory and resilience / Ordonnancement concurrent d’applications à grande échelle : mémoire et résilience Pottier, Loïc 18 September 2018 (has links) This thesis explores problems related to co-scheduling in the context of massively parallel applications, from two points of view: the memory side (in particular cache memory) and the fault-tolerance side. With the recent advent of so-called many-core architectures, such as recent multi-core processors, the number of processing units is increasing significantly. In this context, the benefits of co-scheduling techniques have been demonstrated in numerous studies. Co-scheduling consists of executing applications concurrently rather than one after another, in order to improve the overall throughput of the platform. But sharing resources can often generate interference. One solution for substantially reducing this interference is cache partitioning. Through a theoretical model, simulations, and experiments on an existing platform, we show the usefulness and importance of co-scheduling when our cache partitioning strategies are used. Moreover, with this growing number of processors, the probability of a failure also increases. The effectiveness of co-scheduling techniques has been demonstrated in a failure-free context, but massively parallel platforms face frequent failures, and fault-tolerance techniques must be put in place to improve the efficiency of these platforms. We study the complexity of the problem with a theoretical model, design heuristics, and carry out a comprehensive set of simulations with a failure simulator, which demonstrates the effectiveness of the proposed heuristics. 
/ This thesis explores co-scheduling problems in the context of large-scale applications, with two main focuses: the memory side, in particular the cache memory, and the resilience side. With the recent advent of many-core architectures such as chip multiprocessors (CMP), the number of processing units is increasing. In this context, the benefits of co-scheduling techniques have been demonstrated. Recall that the main idea behind co-scheduling is to execute applications concurrently rather than in sequence in order to improve the global throughput of the platform. But sharing resources often generates interference. With the rising number of processing units accessing the same last-level cache, this interference among co-scheduled applications becomes critical. In addition, with the increasing number of processors, the probability of a failure increases too. Resilience aspects must be taken into account, especially for co-scheduling, because failure-prone resources might be shared between applications. On the memory side, we focus on interference in the last-level cache; one solution used to reduce this interference is cache partitioning. Extensive simulations demonstrate the usefulness of co-scheduling when our efficient cache partitioning strategies are deployed. We also investigate the same problem on a real cache-partitioned chip multiprocessor, using the Cache Allocation Technology recently provided by Intel. Second, still on the memory side, we study how to model and schedule task graphs on the new many-core architectures, such as the Knights Landing architecture. These architectures offer a new level in the memory hierarchy through a new on-package high-bandwidth memory. 
Current approaches usually do not take this new memory level into account; however, new scheduling algorithms and data partitioning schemes are needed to take advantage of this deep memory hierarchy. On the resilience side, we explore the impact of failures on co-scheduling performance. The co-scheduling approach has been demonstrated in a fault-free context, but large-scale computer systems are confronted by frequent failures, and resilience techniques must be employed for large applications to execute efficiently. Indeed, failures may create severe imbalance between applications and significantly degrade performance. We aim at minimizing the expected completion time of a set of co-scheduled applications in a failure-prone context by redistributing processors. Ordonnancement concurrent Hiérarchie mémoire Algorithme d’ordonnancement Résilience Informatique haute performance HPC Antémémoire Co-scheduling algorithm Memory hierarchy Cache memory Scheduling Resilience High performance computing HPC Memory Many-core
686	Heuristisk profilbaserad optimering av instruktionscache i en online Just-In-Time kompilator / Heuristic Online Profile Based Instruction Cache Optimisation in a Just-In-Time Compiler Eng, Stefan January 2004 (has links) This master’s thesis examines the possibility of heuristically optimising instruction cache performance in a Just-In-Time (JIT) compiler. Programs that do not fit inside the cache all at once may suffer from cache misses as a result of frequently executed code segments competing for the same cache lines. A new heuristic algorithm, LHCPA, was created to place frequently executed code segments so as to avoid cache conflicts between them, reducing overall cache misses and performance bottlenecks. Set-associative caches are taken into consideration, not only direct-mapped caches. In Ahead-Of-Time (AOT) compilers, the problem of frequent cache misses is often avoided by using call graphs derived from profiling and more or less complex algorithms to estimate the performance of different placement approaches. This often results in heavy computation during compilation, which is not acceptable in a JIT compiler. A case study is presented on an Alpha processor and a JIT compiler developed at Ericsson. The results of the case study show that cache performance can be improved using this technique, but also that many other factors influence cache performance. Examples include whether the cache is set-associative and, especially, the size of the cache. Datorsystem Alpha processor Cache Compiler Heuristic Hot Instruction Model Online Optimisation Profile Just-In-Time Set-Associative Datorsystem Information Systems
687	Caracterização de memorias analogicas implementadas com transistores MOS floating gate / Analogic memories characterization implemented with floating gate MOS transistors Couto, Andre Luis do 28 November 2005 (has links) Advisor: Carlos Alberto dos Reis Filho / Dissertation (master's) - Universidade Estadual de Campinas, Faculdade de Engenharia Eletrica e de Computação / Made available in DSpace on 2018-08-07T11:14:24Z (GMT). No. of bitstreams: 1 Couto_AndreLuisdo_M.pdf: 2940356 bytes, checksum: 959908541a3bc46b7b7035eb035de186 (MD5) Previous issue date: 2005 / Summary: Integrating memories and analog circuits on the same die offers several advantages: less board space, higher reliability, and lower cost. To that end, such integration requires dispensing with memory-specific fabrication technology and using only conventional CMOS technology. The integration can be all the more efficient the greater the data storage capacity, that is, the higher the information density. Analog memories are much better suited to this, since a single cell (one or two transistors) can store data that would require several digital memory cells and therefore more area. In this work, floating-gate MOS transistors proved feasible to fabricate, and characterization results such as programming types, data retention, and endurance were obtained. The work presents the main characteristics of FGMOS (Floating Gate MOS) devices and serves as a reference for future work in the area / Abstract: Monolithic integration of memories and analog circuits on the same die offers interesting advantages such as smaller application boards, higher robustness, and, mainly, lower costs. Today, profitable integration of these kinds of circuits is only possible using conventional CMOS technology, which efficiently allows extraordinary levels of integration. 
Thus, integrating analog memories looks more suitable, since one single cell (usually using one or two transistors) can store the same data as several digital memory cells and therefore requires less area. In this work, different memory cells, together with a few devices using floating gate MOS transistors, were implemented and manufactured in a conventional CMOS technology. Different types of programming, data retention, and endurance were characterized, and the main characteristics of the FGMOS (Floating Gate MOS) were obtained. The characterization results reveal that it is possible to make and to program floating gate MOSFET analog memories, and they should serve as a starting point and reference for new academic studies / Master's / Electronics, Microelectronics and Optoelectronics / Master in Electrical Engineering Memória cache Sistemas de memória de computadores Transistores de efeito de campo Transistores Circuitos integrados Microeletrônica Microelectronics Integrated circuits Transistors Field effect transistor Computer memory system
688	Web Map Service implementation i .NET Lundmark, Anton January 2011 (has links) In today's society the internet is used more and more to obtain information, and this is also the case for maps. This thesis, carried out on behalf of Tieto Sweden Healthcare & Welfare for use in the Laps Care system, presents solutions for retrieving geographic data through map services using Web Map Service (WMS) services in a .NET application. The thesis covers, at a basic level, how the WMS standard can be used by a client to display digital maps from a WMS service, and briefly covers alternatives to WMS such as Web Map Service Tile Cache (WMS-C) and Tile Map Service (TMS) services. Several open source components that can handle such services are suggested, with a focus on SharpMap, which was chosen for the prototype built to show what such a client can look like. The thesis also addresses cartography, briefly explaining which guidelines should be followed for a map. The second section briefly describes how web services work and what the advantages and disadvantages of using such services are. It also explains what geographic information systems (GIS) are and how they are used today. In summary, a working prototype was developed using the open source component SharpMap; it can display maps from WMS, WMS-C and TMS services, and if an ESRI shapefile with road data is available, it is possible to search for streets. Web Map Service Web Map Service Tile Cache Tile Map Service GIS geografisk data kartografi webbtjänster .NET SharpMap Computer and Information Sciences Data- och informationsvetenskap
689	Improving and Extending a High Performance Processor Optimized for FPGAs / Förbättring och utökning av en högpresterande processor anpassad för FPGAer Källming, Daniel, Hultenius, Kristoffer January 2010 (has links) This thesis is about a number of improvements and additions made to a soft CPU optimized for field programmable gate arrays (FPGAs). The goal has been to implement the changes without substantially lowering the CPU's ability to operate at high clock frequencies. The result of the thesis is a number of high clock frequency modules which, when added, complete the CPU's hardware functionality in certain areas. The maximum frequency of the CPU is, however, somewhat lowered after the modules have been added. / This thesis concerns a number of improvements and extensions to a soft processor specially adapted for field-programmable gate arrays (FPGAs). The goal has been to make the changes without significantly compromising the processor's ability to operate at high clock frequencies. The result of the thesis is a number of modules that can handle high frequencies and complement the processor's hardware functions. However, the maximum frequency of the processor is reduced somewhat with the modules added. FPGA Soft CPU Xi2 Embedded Cache Division Interrupts DSP Computer Engineering Datorteknik Annan elektroteknik och elektronik
690	Dynamic Load Generator: Synthesising dynamic hardware load characteristics Karlsson, Stefan, Hansson, Erik January 2015 (has links) In this thesis we proposed and tested a new method for creating synthetic workloads. Our method takes the dynamic behaviour into consideration, whereas previous studies only consider the static behaviour. This was done by recording performance monitor counter (PMC) events from a reference application. These events were then used to calculate the hardware load characteristics, in our case cache miss ratios, which were stored for each sample and used as input to a load regulator. A signalling application was then used together with the load regulator and a cache miss generator to tune the hardware characteristics until they were similar to those of the reference application. For each sample, the final parameters from the load regulator were stored in order to be able to simulate it. By simulating all samples with the same sampling period with which they were recorded, the dynamic behaviour of the reference application could be simulated. Measurements show that this was successful for the L1 D$ miss ratio, but not for the L1 I$ miss ratio, and only to a small extent for the L2 D$ miss ratio. We were also able to show that the total convergence time for the regulator could be reduced by using case-based reasoning to select the initial parameters from similar samples. Performance evaluation Synthetic hardware load characteristics measurement cache instructions per cycle simulation performance monitoring performance monitor counters Computer Sciences Datavetenskap (datalogi)
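Note on entry 690: the regulation loop described in that abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The names `miss_ratio`, `regulate`, `nearest_case` and `fake_generator` are hypothetical, and so are the proportional controller, the linear toy model of the miss generator, and the nearest-neighbour lookup, which stands in for the case-based reasoning step.

```python
def miss_ratio(misses: int, accesses: int) -> float:
    """Cache miss ratio from two PMC event counts (0.0 if there were no accesses)."""
    return misses / accesses if accesses else 0.0


def regulate(target, measure, start=0.5, gain=0.5, tol=1e-3, max_steps=200):
    """Tune one generator parameter until measure(param) is within tol of target.

    `measure` is a hypothetical callback that runs the miss generator with the
    given parameter and returns the observed miss ratio.
    Returns (param, steps_used).
    """
    param = start
    for step in range(max_steps):
        error = target - measure(param)
        if abs(error) <= tol:
            return param, step
        # Proportional correction, clamped to the valid parameter range [0, 1].
        param = min(1.0, max(0.0, param + gain * error))
    return param, max_steps


def nearest_case(cases, target, default=0.5):
    """Pick the stored parameter whose target ratio is closest to `target`."""
    if not cases:
        return default
    return min(cases, key=lambda case: abs(case[0] - target))[1]


def fake_generator(param: float) -> float:
    """Toy stand-in for real hardware: miss ratio grows linearly with param."""
    return 0.4 * param


# One (misses, accesses) pair per recorded sample of the reference application.
samples = [(120, 1000), (300, 1000), (50, 1000)]
targets = [miss_ratio(m, a) for m, a in samples]

cases = []   # (target ratio, final parameter) for every sample already tuned
params = []  # final regulator parameter stored per sample
for target in targets:
    start = nearest_case(cases, target)
    param, _ = regulate(target, fake_generator, start=start)
    cases.append((target, param))
    params.append(param)
```

Replaying `params` at the original sampling period would then reproduce the recorded sequence of miss ratios on the toy model. On real hardware the relation between parameter and miss ratio is neither linear nor noise-free, so the gain and tolerance used here are placeholders.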

Search results