621

Implementação de um algoritmo de mecânica dos fluidos computacional projetado para plataformas de processamento paralelo com memória distribuída / Implementation of a computational fluid dynamics algorithm designed for distributed-memory parallel processing platforms

Angeli, João Paulo de 30 June 2005 (has links)
This work discusses the implementation of a numerical algorithm for simulating incompressible fluid flows, based on the finite difference method and designed for distributed-memory parallel computing platforms, particularly clusters of workstations. The solution algorithm for the Navier-Stokes equations uses an explicit scheme for pressure and an implicit scheme for velocities. The parallel implementation is based on domain decomposition: the computational domain is decomposed into several blocks, with one or more blocks assigned to each processing node. All nodes then execute the computations on their assigned blocks in parallel. The parallel processing includes initialization, coefficient generation, linear solution on the sub-domains, and inter-node communication. The exchange of information between the processes handling each sub-domain uses the Message Passing Interface (MPI) library, which ensures portability across computing platforms ranging from massively parallel machines (MPP) to clusters of workstations. To improve the performance of the algorithm, techniques were investigated for reducing the communication volume between processors and for using the microprocessors' cache memory more efficiently. To evaluate the performance of the algorithm and to analyze the different parallelization strategies, simulations were executed on a 64-node cluster using 2 to 56 processors, measuring execution time, speed-up, and parallel efficiency. The experimental results show that the communication-related optimizations improve speed-up by up to 165%, and that the cache-oriented technique can improve speed-up by a further 40% beyond the communication optimization.
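As a concrete illustration of the domain-decomposition pattern the abstract describes, here is a minimal sketch of the per-iteration halo (ghost-cell) exchange in C with MPI. The 1-D decomposition, array sizes, and names are illustrative assumptions, not details from the thesis.

```c
/* Halo exchange sketch for a 1-D domain decomposition.
 * Each rank owns NX interior cells plus one ghost cell per side. */
#include <mpi.h>
#include <stdlib.h>

#define NX 128   /* interior cells per block, illustrative */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double *u = calloc(NX + 2, sizeof(double));
    int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    /* exchange boundary values with neighbours; MPI_Sendrecv
     * avoids deadlock without any explicit message ordering */
    MPI_Sendrecv(&u[1],      1, MPI_DOUBLE, left,  0,
                 &u[NX + 1], 1, MPI_DOUBLE, right, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Sendrecv(&u[NX],     1, MPI_DOUBLE, right, 1,
                 &u[0],      1, MPI_DOUBLE, left,  1,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    /* ... interior update (explicit pressure step, implicit
     * velocity sweeps) would go here ... */
    free(u);
    MPI_Finalize();
    return 0;
}
```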
622

Cache Prediction and Execution Time Analysis on Real-Time MPSoC

Neikter, Carl-Fredrik January 2008 (has links)
Real-time systems require not only that the logical operations be correct; equally important is that the specified time constraints are always met. This has been studied successfully for mono-processor systems. However, as the hardware in such systems grows more complex, the previous approaches are invalidated. For example, multi-processor systems-on-chip (MPSoC) are becoming more common every day, and with a shared memory the bus access time is unpredictable in nature. This has recently been resolved, but a safe and not overly pessimistic cache analysis approach for MPSoC had not been investigated before. This thesis has resulted in the design and implementation of algorithms for cache analysis on real-time MPSoC with a shared communication infrastructure. An additional advantage is that the algorithms include improvements over previous approaches for mono-processor systems. The verification of these algorithms was performed with the help of data flow analysis theory. Furthermore, it is not known how different types of cache-miss characteristics of a task influence the worst-case execution time on MPSoC. Therefore, a program that generates randomized tasks according to different parameters was constructed. The parameters can, for example, influence the complexity of the control flow graph and the average distance between cache misses.
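To make the hit/miss classification at the core of such an analysis concrete, the toy sketch below classifies a straight-line access trace against a direct-mapped cache model in C. Real analyses, as in the thesis, operate on the control-flow graph with abstract cache states; all sizes and addresses here are illustrative assumptions.

```c
/* Toy classification of accesses as guaranteed hits or potential
 * misses, the bookkeeping that feeds a WCET bound. */
#include <stdio.h>
#include <stdint.h>

#define LINES      64    /* cache lines (assumption) */
#define LINE_BYTES 32    /* line size (assumption) */

int main(void) {
    uint32_t tag[LINES] = {0};
    int      valid[LINES] = {0};
    uint32_t trace[] = {0x1000, 0x1004, 0x2000, 0x1008, 0x3000};
    int misses = 0;

    for (unsigned i = 0; i < sizeof trace / sizeof trace[0]; i++) {
        uint32_t block = trace[i] / LINE_BYTES;
        uint32_t set   = block % LINES;
        if (!valid[set] || tag[set] != block) {   /* miss */
            valid[set] = 1;
            tag[set]   = block;
            misses++;
            printf("0x%04x: MISS (worst-case bus delay added)\n", trace[i]);
        } else {
            printf("0x%04x: always-hit\n", trace[i]);
        }
    }
    printf("worst-case misses on this path: %d\n", misses);
    return 0;
}
```

On an MPSoC, each classified miss must additionally be charged the worst-case delay of the shared bus, which is why a safe but tight miss classification matters so much.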
623

Efficient techniques to provide scalability for token-based cache coherence protocols

Cuesta Sáez, Blas Antonio 17 July 2009 (has links)
Cache coherence protocols based on tokens can provide low latency without relying on non-scalable interconnects, thanks to the use of efficient, unordered requests. However, when these unordered requests contend for the same memory block, they may cause protocol races. To resolve the races and ensure the completion of all cache misses, token protocols use a starvation prevention mechanism that is inefficient and non-scalable in terms of the required storage structures and generated traffic. Moreover, token protocols use non-silent invalidations, which increase the latency of write misses proportionally to the system size. All of these problems make token protocols non-scalable. To overcome the main problems of token protocols and increase their scalability, we propose a new starvation prevention mechanism named Priority Requests. This mechanism resolves contention through an efficient, elegant, and flexible method based on ordered requests. Furthermore, thanks to Priority Requests, efficient techniques can be applied to limit the storage requirements of the starvation prevention mechanism, to reduce the total traffic generated in managing protocol races, and to reduce the latency of write misses. Thus, the main problems of token protocols can be solved, which in turn improves their efficiency and scalability. / Cuesta Sáez, BA. (2009). Efficient techniques to provide scalability for token-based cache coherence protocols [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/6024
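For readers unfamiliar with token coherence, the sketch below shows the token-counting rule the abstract builds on: each block has a fixed number of tokens, holding all of them permits writing, and holding at least one permits reading. It is an illustrative simplification, not the protocol from the thesis; real protocols also track the owner token and in-flight messages.

```c
/* Token-counting invariant at the heart of token coherence. */
#include <stdbool.h>
#include <stdio.h>

#define TOTAL_TOKENS 4   /* e.g. one per cache in a 4-core system */

typedef struct { int tokens; } cache_block_t;

static bool can_read(const cache_block_t *b)  { return b->tokens >= 1; }
static bool can_write(const cache_block_t *b) { return b->tokens == TOTAL_TOKENS; }

int main(void) {
    cache_block_t mine = { .tokens = 1 };
    printf("read allowed:  %d\n", can_read(&mine));   /* 1 */
    printf("write allowed: %d\n", can_write(&mine));  /* 0: must first
        collect the remaining tokens from the other caches; a
        requester that keeps failing would escalate, e.g. to an
        ordered Priority Request in the thesis's proposal */
    return 0;
}
```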
624

Design and Implementation of an Architecture-aware In-memory Key-Value Store

Giordano, Omar January 2021 (has links)
Key-Value Stores (KVSs) are a type of non-relational database in which data is represented as key-value pairs; they are often used for cache and session data storage. Among them, Memcached is one of the most popular, widely used in Internet services such as social networks and streaming platforms. Given the continuous and increasingly rapid growth in the number of networked devices that use these services, the commodity hardware on which such databases run must process packets faster to meet the needs of the market. In recent years, however, the performance improvements delivered by new hardware have become smaller and smaller. Since purchasing new products is no longer synonymous with significant performance improvements, companies need to exploit the full potential of the hardware they already own, postponing the purchase of newer hardware. One of the latest ideas for increasing the performance of commodity hardware is slice-aware memory management. This technique exploits the Last Level Cache (LLC) by ensuring that individual cores take data from memory locations that are mapped to their respective cache portions (i.e., LLC slices). This thesis focuses on the realisation of a KVS prototype, based on the Intel Haswell micro-architecture and built on top of the Data Plane Development Kit (DPDK), to which the principles of slice-aware memory management are applied. To test its performance, and given the non-existence of a DPDK-based traffic generator that supports the Memcached protocol, an additional prototype of a traffic generator supporting these features was also developed. Performance was measured using two distinct machines: one for the traffic generator and one for the KVS. First the "regular" KVS prototype was tested, then, to see the actual benefits, the slice-aware one. Both KVS prototypes were subjected to two types of traffic: (i) uniform traffic, where the keys are always different from each other, and (ii) skewed traffic, where keys are repeated and some keys are more likely to be repeated than others. The experiments show that, in real-world scenarios (i.e., characterised by skewed key distributions), employing a slice-aware memory management technique in a KVS can slightly improve end-to-end latency (by ~2%). Additionally, the technique strongly affects the look-up time required by the CPU to find the key and the corresponding value in the database, decreasing the mean time by ~22.5% and improving the 99th percentile by ~62.7%.
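The heart of slice-aware memory management is serving each request from memory that maps to the LLC slice closest to the handling core. The C sketch below illustrates the idea with per-slice memory pools; the slice_of() mapping is a placeholder, since on Intel CPUs the physical-address-to-slice hash is undocumented and must be measured, and all names and sizes are assumptions rather than the thesis's implementation.

```c
/* Per-slice pooling: allocate buffers, sort them by the LLC slice
 * they map to, and serve each core from "its" pool. */
#include <stdint.h>
#include <stdlib.h>

#define NUM_SLICES 8      /* one LLC slice per core (assumption) */
#define POOL_CAP   1024

/* stand-in for the real, undocumented address-to-slice hash;
 * real implementations recover it experimentally, e.g. by timing */
static unsigned slice_of(const void *addr) {
    return ((uintptr_t)addr >> 6) % NUM_SLICES;   /* placeholder */
}

static void *pool[NUM_SLICES][POOL_CAP];
static int   fill[NUM_SLICES];

static void pool_add(void *p) {
    unsigned s = slice_of(p);
    if (fill[s] < POOL_CAP) pool[s][fill[s]++] = p;
    else free(p);   /* pool full: discard */
}

/* hand out a buffer believed to map to the requesting core's slice */
static void *slice_alloc(unsigned core) {
    unsigned s = core % NUM_SLICES;
    return fill[s] ? pool[s][--fill[s]] : malloc(64);
}

int main(void) {
    for (int i = 0; i < 4096; i++) pool_add(malloc(64));
    void *buf = slice_alloc(2);   /* buffer "close to" core 2 */
    (void)buf;
    return 0;
}
```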
625

Jämförelse av prestanda mellan GraphQL och REST / Comparison of performance between GraphQL and REST

Onval, Sara, Dualeh, Iman January 2020 (has links)
With today's rapid development of information technology and the increase in the number of people connected to the Internet, the development of web services is becoming ever more important. As web services play a significant role in the development of the Internet, the question arises as to which tools should be used to achieve the performance required by today's users. A common approach to implementing web services is the REST architecture. However, REST has performance weaknesses such as overfetching, underfetching, and the maintenance of endpoints, which arise in cases where multiple endpoints are accessed. An alternative to REST is the GraphQL query language, which was developed to eliminate the weaknesses of REST and thus improve the performance of data retrieval. In this work, performance measurements were conducted in which latency and data volume were measured for different types of queries for GraphQL, REST without cache, and REST with cache. Latency is the time interval between a client sending a request and the client receiving the response, and data volume refers to the size of the data in a response packet transmitted from a server to a client. REST with cache was included in the experiment as it had not been investigated in previous work comparing the performance of GraphQL and REST. The results showed that GraphQL performs better, in terms of both latency and data volume, compared to the other systems in cases where requests are made to two or more endpoints in REST. GraphQL performed worse than the other systems, in terms of latency, when only one REST endpoint was contacted. However, GraphQL performed better than the other systems in terms of data volume in all cases. When comparing REST with and without cache, it turned out that the more endpoints were contacted, the better REST without cache performed in terms of data volume, while REST with cache performed better in terms of latency.
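The measurement setup the abstract describes can be approximated in a few lines of C with libcurl: time one GraphQL query that fetches nested data in a single round trip against two REST calls for the same data. The URLs, query body, and endpoints below are illustrative placeholders, not the ones used in the study.

```c
#include <curl/curl.h>
#include <stdio.h>
#include <time.h>

/* discard response bodies; only the timing matters here */
static size_t discard(void *p, size_t s, size_t n, void *u) {
    (void)p; (void)u;
    return s * n;
}

/* issue one request and return its latency in milliseconds */
static double timed_request(const char *url, const char *json_body) {
    CURL *h = curl_easy_init();
    struct curl_slist *hdr = NULL;
    struct timespec t0, t1;

    curl_easy_setopt(h, CURLOPT_URL, url);
    curl_easy_setopt(h, CURLOPT_WRITEFUNCTION, discard);
    if (json_body) {   /* GraphQL queries are POSTed as JSON */
        hdr = curl_slist_append(NULL, "Content-Type: application/json");
        curl_easy_setopt(h, CURLOPT_HTTPHEADER, hdr);
        curl_easy_setopt(h, CURLOPT_POSTFIELDS, json_body);
    }

    clock_gettime(CLOCK_MONOTONIC, &t0);
    curl_easy_perform(h);   /* request sent, response received */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    curl_slist_free_all(hdr);
    curl_easy_cleanup(h);
    return (t1.tv_sec - t0.tv_sec) * 1e3
         + (t1.tv_nsec - t0.tv_nsec) / 1e6;
}

int main(void) {
    curl_global_init(CURL_GLOBAL_DEFAULT);

    /* one round trip, asking only for the fields that are needed */
    double gql = timed_request("http://localhost:4000/graphql",
        "{\"query\":\"{ user(id: 1) { name posts { title } } }\"}");

    /* two round trips against two REST endpoints for the same data */
    double rest = timed_request("http://localhost:3000/users/1", NULL)
                + timed_request("http://localhost:3000/users/1/posts", NULL);

    printf("GraphQL: %.2f ms, REST (2 endpoints): %.2f ms\n", gql, rest);
    curl_global_cleanup();
    return 0;
}
```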
626

Realizing Low-Latency Internet Services via Low-Level Optimization of NFV Service Chains : Every nanosecond counts!

Farshin, Alireza January 2019 (has links)
By virtue of recent technological developments in cloud computing, more and more applications are deployed in the cloud. Among these modern cloud-based applications, some require bounded and predictable low-latency responses. However, the current cloud infrastructure is unsuitable, as it cannot satisfy these requirements due to many limitations in both hardware and software. This licentiate thesis describes attempts to reduce the latency of Internet services by carefully studying the currently available infrastructure, optimizing it, and improving its performance. The focus is on optimizing the performance of network functions deployed on commodity hardware, known as network function virtualization (NFV), since NFV performance is one of the major sources of latency for Internet services. The first contribution is related to optimizing the software. The project began by investigating the possibility of superoptimizing virtualized network functions (VNFs), starting with a literature review of available superoptimization techniques, after which one of the state-of-the-art superoptimization tools was selected to analyze the crucial metrics affecting application performance. The result of our analysis demonstrated that better cache metrics could potentially improve the performance of all applications. The second contribution of this thesis employs the results of the first part by taking a step toward optimizing the cache performance of time-critical NFV service chains. By doing so, we reduced the tail latencies of such systems running at 100 Gbps. This is an important achievement, as it increases the probability of realizing bounded and predictable latency for Internet services.
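A representative low-level cache optimization in packet-processing frameworks of the kind the thesis targets is software prefetching across a batch: while packet i is processed, packet i+1 is pulled into cache so its header is warm when needed. The C sketch below is an illustrative stand-in, with assumed structure names and batch size, not code from the thesis.

```c
/* Batch processing with software prefetch, a common NFV idiom. */
#include <stdint.h>

#define BATCH 32

struct pkt { uint8_t hdr[64]; uint8_t payload[1450]; };

static void process(struct pkt *p) { p->hdr[0] ^= 1; /* placeholder work */ }

void handle_batch(struct pkt *pkts[BATCH]) {
    for (int i = 0; i < BATCH; i++) {
        if (i + 1 < BATCH)
            /* hint: read access, high temporal locality */
            __builtin_prefetch(pkts[i + 1]->hdr, 0, 3);
        process(pkts[i]);
    }
}

int main(void) {
    static struct pkt storage[BATCH];
    struct pkt *batch[BATCH];
    for (int i = 0; i < BATCH; i++) batch[i] = &storage[i];
    handle_batch(batch);
    return 0;
}
```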
627

Performance improvements using dynamic performance stubs

Trapp, Peter January 2011 (has links)
This thesis proposes a new methodology to extend the software performance engineering process. Common performance measurement and tuning principles mainly aim to improve the software function itself: the application source code is studied and improved independently of the overall system performance behavior. Moreover, the optimization of the software function has to be done without an estimate of the expected optimization gain. This often leads to under- or over-optimization, and hence does not utilize the system sufficiently. The proposed performance improvement methodology and framework, called dynamic performance stubs, addresses these insufficiencies by evaluating the overall system performance improvement. This is achieved by simulating the performance behavior of the original software functionality at an adjustable optimization level, prior to the real optimization. It thus enables the software performance analyst to determine the system's overall performance behavior under the possible outcomes of different improvement approaches. Moreover, using the dynamic performance stubs methodology, a cost-benefit analysis of different optimizations with respect to performance behavior can be performed. The approach of dynamic performance stubs is to replace the software bottleneck with a stub that combines a simulation of the software functionality with the ability to adjust the performance behavior along one or more performance aspects of the replaced software function. A general methodology for using dynamic performance stubs, as well as several methodologies for simulating different performance aspects, is discussed. Finally, several case studies are presented to show the application and usability of the dynamic performance stubs approach.
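As a minimal sketch of the stub idea, assuming a busy-wait cost model and invented names (the thesis's actual framework is richer): the bottleneck is replaced by a stand-in whose simulated CPU cost scales with an adjustable optimization level, so the system-wide effect of a hypothetical optimization can be observed before implementing it.

```c
/* Dynamic performance stub sketch: dial the bottleneck's cost
 * up or down and watch the whole system's behaviour. */
#include <stdio.h>
#include <time.h>

/* simulate CPU consumption for a configurable number of microseconds */
static void stub_burn_us(long us) {
    struct timespec t0, t;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    do {
        clock_gettime(CLOCK_MONOTONIC, &t);
    } while ((t.tv_sec - t0.tv_sec) * 1000000L
           + (t.tv_nsec - t0.tv_nsec) / 1000L < us);
}

/* stands in for the original bottleneck; the optimization level
 * (0.0 = unchanged, 1.0 = cost fully removed) scales the cost */
static void bottleneck_stub(double optimization_level) {
    long original_cost_us = 500;   /* measured baseline (assumption) */
    stub_burn_us((long)(original_cost_us * (1.0 - optimization_level)));
}

int main(void) {
    /* sweep assumed optimization gains before doing the real work */
    for (double lvl = 0.0; lvl <= 1.0; lvl += 0.25) {
        bottleneck_stub(lvl);
        printf("simulated %.0f%% optimization\n", lvl * 100);
    }
    return 0;
}
```

Measuring end-to-end throughput or latency at each level yields the cost-benefit curve the methodology calls for, before any real optimization effort is spent.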
628

Blocs nuls dans la hiérarchie mémoire / Null blocks in the memory hierarchy

Dusser, Julien 16 December 2010 (has links) (PDF)
The memory hierarchy is under ever-increasing pressure. This pressure originally came from rising processor clock frequencies. Now that frequency has stalled at around 3 GHz, however, the number of execution cores, and therefore the number of simultaneously running processes, is increasing in its turn. The memory hierarchy thus receives a growing number of requests, saturating its bandwidth. The work presented in this thesis shows that the memory hierarchy is often used to transport data blocks that are entirely null. These trivially valued blocks are particularly numerous at the last-level cache and in main memory. This document proposes a cache specialized in handling these null blocks, the Zero-Content Augmented Cache, composed of a traditional cache and a cache dedicated to null blocks. This proposal both increases overall system performance and significantly reduces the memory bandwidth used. This document also proposes a compressed memory architecture that exploits the presence of null blocks, the Decoupled Zero-Compressed Memory. This memory can store a working set larger than the physical memory size, significantly reducing the number of accesses to mass-storage devices.
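The observation the Zero-Content Augmented Cache exploits is easy to state in code: an all-zero cache line carries no information beyond its nullity, so one bit per block suffices to serve it. The C sketch below is an illustrative simplification with assumed sizes and names, not the design from the thesis.

```c
/* Toy "zero cache": remember which block addresses are null so
 * reads can be answered without storing or fetching any data. */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LINE_BYTES 64
#define ZC_ENTRIES 256

/* true if the whole 64-byte block is null */
static bool block_is_zero(const uint8_t *block) {
    static const uint8_t zeros[LINE_BYTES];
    return memcmp(block, zeros, LINE_BYTES) == 0;
}

static uint64_t zc_addr[ZC_ENTRIES];
static bool     zc_valid[ZC_ENTRIES];

static void zc_insert(uint64_t addr, const uint8_t *data) {
    unsigned idx = (addr / LINE_BYTES) % ZC_ENTRIES;
    zc_valid[idx] = block_is_zero(data);   /* one bit of payload */
    zc_addr[idx]  = addr;
}

static bool zc_hit(uint64_t addr) {   /* hit => block is known null */
    unsigned idx = (addr / LINE_BYTES) % ZC_ENTRIES;
    return zc_valid[idx] && zc_addr[idx] == addr;
}

int main(void) {
    uint8_t line[LINE_BYTES] = {0};
    zc_insert(0x1000, line);
    return !zc_hit(0x1000);   /* exit code 0 on expected behaviour */
}
```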
629

Une analyse de système / A system analysis

Leroudier, Jacques 10 April 1973 (has links) (PDF)
630

Caching rodents disproportionately disperse seed beneath invasive grass

Sommers, Pacifica, Chesson, Peter 07 February 2017 (has links)
Seed dispersal by caching rodents is a context-dependent mutualism in many systems. Plants benefit when seed remaining in shallow caches germinates before being eaten, often gaining protection from beetles and a favorable microsite in the process. Caching in highly unfavorable microsites, conversely, could undermine the dispersal benefit for the plant. Plant invasions could disrupt dispersal benefits of seed caching by attracting rodents to the protection of a dense invasive canopy which inhibits the establishment of native seedlings beneath it. To determine whether rodents disproportionately cache seed under the dense canopy of an invasive grass in southeastern Arizona, we used nontoxic fluorescent powder and ultraviolet light to locate caches of seed offered to rodents in the field. We fitted a general habitat-use model, which showed that disproportionate use of plant cover by caching rodents (principally Chaetodipus spp.) increased with moonlight. Across all moon phases, when rodents cached under plants, they cached under the invasive grass disproportionately to its relative cover. A greenhouse experiment showed that proximity to the invasive grass reduced the growth and survival of seedlings of a common native tree (Parkinsonia microphylla) whose seeds are dispersed by caching rodents. Biased dispersal of native seed to the base of an invasive grass could magnify the competitive effect of this grass on native plants, further reducing their recruitment and magnifying the effect of the invasion.
