Global ETD Search

401	Conception d'une architecture extensible pour le calcul massivement parallèle / Designing a scalable architecture for massively parallel computing Kaci, Ania 14 December 2016 (has links) In response to the growing demand for performance from a wide variety of applications (e.g., financial modeling, sub-atomic simulation, bioinformatics), computer systems are becoming more complex and larger (in number of computing components, memory, and storage capacity). This growing complexity is pushing their architecture toward heterogeneous computing technologies and programming models. Managing this heterogeneity harmoniously, optimizing resources, and minimizing power consumption are major technical challenges in the design of future computer systems. This thesis addresses one aspect of this complexity by focusing on shared-memory subsystems, in which all processors share a common address space. The work focuses on implementing a cache coherence and memory consistency protocol on a scalable architecture, and on a methodology for validating this implementation. In our approach, we selected ARM 64-bit processors and generic co-processors (GPU, DSP, etc.) as computing components, and the AMBA ACE and ACE-Lite shared-memory protocols together with the associated "CoreLink CCN" architecture as a starting point. Generalizing and parameterizing this architecture, and validating it in the gem5 simulation environment, form the backbone of this thesis. The results obtained at the end of the thesis tend to show that the stated objectives were achieved. Computing systems Shared memory Cache coherency Memory consistency Transaction-level modeling Interconnection networks
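As a generic illustration of the state a cache coherence protocol maintains, the following Python sketch models the classic MESI invalidation protocol for a single cache line shared by several caches on a snooping bus. It is a simplified toy under stated assumptions, not the AMBA ACE state machine or the CCN-based architecture studied in the thesis; the class `MesiLine` and its methods are hypothetical.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


class MesiLine:
    """MESI state of one cache line replicated across n private caches (toy model)."""

    def __init__(self, n_caches: int) -> None:
        self.states = [State.INVALID] * n_caches
        self.values = [None] * n_caches  # each cache's copy of the line's data
        self.memory_value = 0            # value held by main memory

    def _others(self, cid: int) -> list:
        return [i for i in range(len(self.states)) if i != cid]

    def read(self, cid: int):
        if self.states[cid] is not State.INVALID:
            return self.values[cid]  # read hit: no bus transaction needed
        # Read miss: snoop the other caches; a Modified copy is written back first.
        holders = [i for i in self._others(cid) if self.states[i] is not State.INVALID]
        for i in holders:
            if self.states[i] is State.MODIFIED:
                self.memory_value = self.values[i]
            self.states[i] = State.SHARED
        self.values[cid] = self.memory_value
        self.states[cid] = State.SHARED if holders else State.EXCLUSIVE
        return self.values[cid]

    def write(self, cid: int, value) -> None:
        # Invalidate every other copy (writing back a Modified one), then own the line.
        for i in self._others(cid):
            if self.states[i] is State.MODIFIED:
                self.memory_value = self.values[i]
            self.states[i] = State.INVALID
            self.values[i] = None
        self.values[cid] = value
        self.states[cid] = State.MODIFIED
```

A read miss downgrades any remote Modified or Exclusive copy to Shared, and a write invalidates all other copies, so at most one cache ever holds the line in the Modified state.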
402	Adaptive and intelligent memory systems / Système mémoire adaptatif intelligent Sridharan, Aswinkumar 15 December 2016 (has links) In this thesis, we focus on interference at the shared resources of the memory hierarchy, namely the last-level cache (LLC) and off-chip memory access, in the context of large-scale multicore systems. Our first work targets shared last-level caches in which the number of applications sharing the cache can exceed the cache's associativity. To manage caches in such situations, our solution estimates the cache footprint of each application to approximate how well it could utilize the cache. This quantitative estimate of cache utility makes it possible to explicitly enforce different priorities across applications. The second part brings prefetch awareness into cache management. In particular, we observe that prefetched cache blocks exhibit good reuse behavior in the context of larger caches. Our third work addresses interference between demand and prefetch requests at the shared off-chip memory access. It builds on two fundamental observations about the fraction of prefetch requests generated and its correlation with prefetch usefulness and prefetcher-caused interference. Together, these two observations are used to control the flow of prefetch requests between the LLC and off-chip memory. Cache (computing) Prefetching Cache Management Large-Scale multicores Prefetcher aggressiveness control Memory system Footprint-Number Prefetch-Fraction
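The footprint idea can be sketched as follows, assuming an application's footprint is measured as the average number of distinct blocks it touches per cache set during an interval and is then compared against the associativity to choose an insertion priority. The function names, the per-set averaging, and the thresholds are hypothetical illustrations, not the exact Footprint-Number policy of the thesis.

```python
from collections import defaultdict


def footprint_numbers(accesses, num_sets, block_size=64):
    """Average number of distinct blocks each application touches per cache set.

    accesses: iterable of (app_id, byte_address) pairs observed in one interval.
    """
    touched = defaultdict(set)  # app_id -> {(set_index, block_address), ...}
    for app, addr in accesses:
        block = addr // block_size
        touched[app].add((block % num_sets, block))
    return {app: len(blocks) / num_sets for app, blocks in touched.items()}


def insertion_priority(footprint, associativity):
    """Map a footprint estimate to a (hypothetical) insertion priority."""
    if footprint <= associativity / 2:
        return "high"    # small footprint: likely to fit and reuse its blocks
    if footprint <= associativity:
        return "medium"
    return "low"         # footprint exceeds associativity: likely to thrash
```

Repeated accesses to the same block do not grow the footprint, so a streaming application that touches many distinct blocks ends up with a large estimate and a low priority.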
403	Cache architectures based on heterogeneous technologies to deal with manufacturing errors Lorente Garcés, Vicente Jesús 02 December 2015 (has links) [EN] SRAM technology has traditionally been used to implement processor caches because it is the fastest existing RAM technology. However, one of its major drawbacks is its high energy consumption. To reduce this consumption, modern processors mainly use two complementary techniques: i) low-power operating modes and ii) low-power memory technologies. The first technique lets the processor work at lower clock frequencies and supply voltages. Its main limitation is that manufacturing defects can significantly affect the reliability of SRAM cells in these modes. The second technique brings in alternative technologies such as eDRAM, which offers minimal area and power consumption. The main drawback of eDRAM is that reads are destructive and its cells are slower than SRAM cells. This thesis presents three main contributions on low-power caches and heterogeneous technologies: i) a study that identifies the optimal capacitance of eDRAM cells, ii) a novel cache design that tolerates the faults produced by SRAM cells in low-power modes, and iii) a methodology for obtaining the optimal operating frequency/voltage level in low-power modes.
Regarding the first contribution, SRAM and eDRAM technologies are combined to achieve a fast, low-power cache that requires less area than conventional designs and tolerates SRAM failures. The dissertation first focuses on one of the most critical aspects of heterogeneous cache design: eDRAM cell capacitance. The optimal capacitance for a heterogeneous L1 data cache is identified by analyzing the trade-off between performance and energy consumption. Experimental results show that a heterogeneous cache implemented with 10 fF capacitors offers performance similar to a conventional SRAM cache while saving 55% energy and reducing cache area by 29%. Regarding the second contribution, this thesis proposes a novel organization for a fault-tolerant heterogeneous cache. Reducing the supply voltage is currently a widely used mechanism for lowering consumption when system workload activity decreases. However, SRAM cells exhibit different types of failures when the supply voltage is reduced, which limits the minimum operating voltage of the microprocessor. In the proposal, memory cells implemented in eDRAM serve as a backup when SRAM cells fail, because reduced voltages do not affect the correct operation of eDRAM cells. The proposed architecture has two working modes: a high-performance mode for supply voltages that do not induce SRAM cell failures, and a low-power mode for voltages that do. In high-performance mode, the cache provides its full capacity, which enables the processor to achieve maximum performance. In low-power mode, the effective capacity of the cache is reduced because some of the eDRAM cells are dedicated to recovering from SRAM failures. Experimental results show that performance is only slightly reduced (less than 2.7% across all studied benchmarks) with respect to an ideal SRAM cache without failures.
Finally, this thesis proposes a methodology for finding the optimal frequency/voltage level, in terms of energy consumption, for the designed heterogeneous cache. To this end, the SRAM failure types and their probabilities are first characterized. Then the energy consumption of different frequency/voltage levels is evaluated with the system working in low-power mode. The study shows that, mainly because of the impact of SRAM failures on performance, the energy-optimal combination of voltage and frequency does not always correspond to the minimum voltage. / Lorente Garcés, VJ. (2015). Cache architectures based on heterogeneous technologies to deal with manufacturing errors [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/58428 / Thesis Cache Memory Fault-Tolerant cache SRAM DRAM EDRAM Memory cell Process variation Low-Power Performance Area Leakage Energy
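The conclusion of the third contribution follows from a simple trade-off, illustrated here with a toy energy model: lowering the supply voltage cuts dynamic energy roughly with V², but SRAM failures shrink the effective cache capacity and stretch the runtime, during which leakage keeps accumulating. The operating points, slowdown factors, and constants below are hypothetical, and leakage is simplified to a constant current; none of these are the thesis's measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    volts: float
    freq_ghz: float
    slowdown: float  # runtime multiplier caused by SRAM failures (1.0 = no failures)


def energy_joules(p: OperatingPoint, cycles: float = 1e9,
                  c_eff: float = 1e-9, leak_amps: float = 1.0) -> float:
    """Dynamic energy (C_eff * V^2 per useful cycle) plus static energy (V * I_leak * runtime)."""
    dynamic = c_eff * p.volts ** 2 * cycles
    runtime_s = cycles * p.slowdown / (p.freq_ghz * 1e9)  # failures stretch the runtime
    static = p.volts * leak_amps * runtime_s
    return dynamic + static


def best_point(points):
    """Operating point with the lowest total energy under this toy model."""
    return min(points, key=energy_joules)


points = [
    OperatingPoint(1.0, 2.0, 1.0),   # nominal voltage, no SRAM failures
    OperatingPoint(0.8, 1.5, 1.02),  # few failures, small slowdown
    OperatingPoint(0.6, 1.0, 2.0),   # many failures, large slowdown (assumed)
]
```

With these assumed numbers, the 0.8 V point minimizes energy; if the 0.6 V point suffered no failure-induced slowdown, it would become the optimum instead.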
404	Workload Driven Designs for Cost-Effective Non-Volatile Memory Hierarchies Timothy A Pritchett (9179468) 28 July 2020 (has links) Compared to traditional hard-disk drives (HDDs), non-volatile (NV) memory technologies offer significant performance advantages on the one hand, but incur significant cost and asymmetric write performance on the other. A common strategy for managing such cost and performance differentials is to use hierarchies in which a small but intensely accessed working set is staged in the NV storage (selective caching). However, when this working set includes write-heavy data, the limited write endurance of NV storage necessitates significant over-provisioning to maintain required lifespans (e.g., storage lifespan must match or exceed a 3-year server lifespan). One might think that DRAM-based write buffers could filter the writes that trickle through to the NV storage and thus relieve the write pressure it experiences. Unfortunately, selective caches used with common recency-based or frequency-based replacement have access patterns that require large write buffers (e.g., 100 MB+ relative to a 12 GB cache) to filter writes adequately. Moreover, these large DRAM write buffers require backup power to ensure the durability of disk writes. More sophisticated replacement policies that combine recency and frequency can reduce the size of the DRAM buffer (while preserving write filtering), but are so computationally expensive that they can limit the I/O rate, especially on simple controllers (e.g., RAID controllers).
My first contribution is the design and implementation of WriteGuard, a self-tuning sieving write-buffer algorithm that filters writes as well as the highly effective (but computationally expensive) algorithms while requiring only lightweight computation comparable to a simple LRU-based write buffer. While WriteGuard reduces the capacity needed for DRAM buffering (to approx. 64 MB), it does not eliminate the need for DRAM buffers (and the corresponding power backup).
For my second thrust, I identify two specific application characteristics: (1) the vast majority of the write buffer's contents consists of write-dominant blocks, and (2) the vast majority of blocks in the write buffer are overwritten within 28 hours. I show that these characteristics enable a high-density, optimized STT-MRAM to replace DRAM, yielding durable write buffers (and thus eliminating the cost of power backup for the write buffer). My optimized STT-MRAM-based write buffer achieves higher density by (a) trading off superfluous durability, exploiting characteristic (2), and (b) de-optimizing the read performance of STT-MRAM, leveraging characteristic (1). Together, these techniques increase the density of STT-MRAM by 20% with little or no impact on write-buffer performance. Computer Engineering Software Engineering SSD Caching Storage Caches Write Buffering Selective Allocation STT-MRAM Flash Cache Non-Volatile Cache Non-Volatile Write Buffer
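The write-filtering effect discussed above can be sketched with a plain LRU write buffer, the kind of simple baseline the abstract compares WriteGuard against (this is not WriteGuard itself, whose algorithm the abstract does not describe). Blocks overwritten while still buffered never reach NV storage; the class and function names are hypothetical.

```python
from collections import OrderedDict


class LruWriteBuffer:
    """DRAM write buffer in front of NV storage; overwrites of buffered blocks are absorbed."""

    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.buffer = OrderedDict()  # block_id -> data, ordered from LRU to MRU
        self.nv_writes = 0           # writes that actually reached NV storage (wear)

    def write(self, block_id, data):
        if block_id in self.buffer:
            self.buffer.move_to_end(block_id)  # overwrite absorbed: no NV write
        elif len(self.buffer) >= self.capacity:
            self.buffer.popitem(last=False)      # evict the LRU block...
            self.nv_writes += 1                  # ...which is flushed to NV storage
        self.buffer[block_id] = data

    def flush(self):
        self.nv_writes += len(self.buffer)
        self.buffer.clear()


def filtered_fraction(trace, capacity):
    """Fraction of writes in `trace` that never reach NV storage."""
    if not trace:
        return 0.0
    buf = LruWriteBuffer(capacity)
    for block in trace:
        buf.write(block, b"")
    buf.flush()
    return 1 - buf.nv_writes / len(trace)
```

A trace that cycles over more blocks than the buffer holds (for example `[1, 2, 3, 1, 2, 3]` with a capacity of 2) filters nothing under LRU, which is one way to see why recency-based buffers for selective caches may need to be large.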
405	Design and Implementation of a Graceful Degradation Approach for Polymorphic Role Invocation in Object Teams Kummer, Cornelius 07 September 2021 (has links) In the ever-evolving world of modern software engineering, dynamic and context-dependent adaptability is becoming increasingly important. One promising paradigm is role-oriented programming, an extension of object-oriented programming that allows collaborative relationships between objects to be modeled. By introducing roles and contexts, the behavior of objects can be adapted at run time by adding or modifying attributes and methods. This dynamism, however, incurs high overhead, especially for role function invocation. Recent research has found a remedy inspired by polymorphic inline caches, which allows reuse of so-called dispatch plans that encode the steps required to execute adaptations. With this optimization, an average speedup of 4.0× was achieved in static contexts and 1.1× in variable contexts. Still, performance drops sharply beyond a certain degree of volatility because cache capacity is exhausted. This thesis presents a fallback mechanism for highly variable call sites, which would otherwise cause a significant slowdown under the new approach. In addition, an optimized reuse mechanism is proposed that further improves execution efficiency. Benchmark evaluation shows that the aforementioned overhead is completely eliminated, amounting to a speedup of 16.5×, while the previously achieved speedups are maintained. info:eu-repo/classification/ddc/004 ddc:004
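The combination of a polymorphic inline cache with a graceful-degradation fallback can be sketched as below. Object Teams targets Java, so this Python version is only an illustration under assumed parameters: the `capacity` and `max_misses` thresholds, the oldest-first eviction, and the `resolve` callback that builds a dispatch plan are hypothetical, not the thesis's design.

```python
class CallSite:
    """Polymorphic inline cache for one call site, with a megamorphic fallback (sketch)."""

    def __init__(self, resolve, capacity=4, max_misses=8):
        self.resolve = resolve        # slow path: builds a dispatch plan for a key
        self.capacity = capacity
        self.max_misses = max_misses
        self.entries = []             # (key, plan) pairs, searched linearly like a PIC
        self.misses = 0
        self.megamorphic = False

    def dispatch(self, key):
        if self.megamorphic:
            return self.resolve(key)  # graceful degradation: skip cache bookkeeping
        for cached_key, plan in self.entries:
            if cached_key == key:
                return plan           # hit: reuse the cached dispatch plan
        self.misses += 1
        plan = self.resolve(key)
        if self.misses > self.max_misses:
            self.megamorphic = True   # site too volatile: stop caching here
            self.entries.clear()
        elif len(self.entries) < self.capacity:
            self.entries.append((key, plan))
        else:
            self.entries.pop(0)       # evict the oldest entry
            self.entries.append((key, plan))
        return plan
```

Once a site is marked megamorphic it pays only the slow-path lookup, instead of also paying for repeated cache misses and evictions.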
406	Investigating the effect of implementing Data-Oriented Design principles on performance and cache utilization Nyberg, Frank January 2021 (has links) Game engines process large amounts of data under strict deadlines, so measures that increase performance are important in this area. Data-Oriented Design (DOD) promotes principles meant to increase performance through better cache utilization. The purpose of this thesis is to examine a selection of these principles to better understand how DOD affects CPU time and the cache miss rate, with a focus on game development. Specifically, the principles examined are removing run-time polymorphism, iterating over contiguous data, and reducing the amount of data in hot loops. The Entity-Component-System (ECS) pattern, which is based on DOD principles, is also examined. The approach was to first present a theoretical background on the subject and then to run tests by implementing a simulation of movement and collision detection using these principles. The tests were written in C++ and executed on an Intel Core i7-4770K without rendering. CPU performance was measured in entities updated per μs, and cache utilization was measured as the cache miss rate. The results showed that the DOD principles did increase performance. The cache miss rate was also lower, except when removing run-time polymorphism. The conclusion is that Data-Oriented Design, when used in game development, is likely to result in better performance, mostly as a result of better cache utilization. Data-oriented design DOD ECS Entity Component System HPC CPU cache cache misses games game architecture game engines design patterns Computer Sciences
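Two of the examined principles, iterating over contiguous data and keeping hot loops free of unneeded data, amount to switching from an array-of-structs to a struct-of-arrays layout. The Python sketch below shows only the layout transformation (CPython cannot reproduce the cache effects measured in the thesis's C++ tests); the names `Entity`, `update_aos`, and `MovementSoA` are hypothetical.

```python
from array import array
from dataclasses import dataclass


@dataclass
class Entity:
    """Array-of-structs style: each entity is a separate heap object holding all its fields."""
    x: float
    y: float
    vx: float
    vy: float
    name: str = ""  # cold data that the movement loop never needs


def update_aos(entities, dt):
    for e in entities:
        e.x += e.vx * dt
        e.y += e.vy * dt


class MovementSoA:
    """Struct-of-arrays style: each hot field is one contiguous buffer of C doubles."""

    def __init__(self, entities):
        # Only the fields the movement loop reads are copied; `name` stays out of the hot data.
        self.x = array("d", (e.x for e in entities))
        self.y = array("d", (e.y for e in entities))
        self.vx = array("d", (e.vx for e in entities))
        self.vy = array("d", (e.vy for e in entities))

    def update(self, dt):
        x, y, vx, vy = self.x, self.y, self.vx, self.vy
        for i in range(len(x)):
            x[i] += vx[i] * dt
            y[i] += vy[i] * dt
```

In a compiled language, the struct-of-arrays loop streams through four dense arrays, so every byte brought into a cache line is used by the update.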
407	Jádra schématu lifting pro vlnkovou transformaci / Lifting Scheme Cores for Wavelet Transform Bařina, David Unknown Date (has links) This thesis focuses on the efficient computation of the two-dimensional discrete wavelet transform. Existing methods are extended in several directions so that they compute the transform in a single pass, possibly over multiple levels, using a compact core. This core can further be reorganized to minimize the use of certain resources. The presented approach fits naturally into commonly used SIMD extensions, exploits the cache hierarchy of modern processors, and is suitable for parallel computation. Finally, the approach is integrated into the JPEG 2000 compression chain, where it proved substantially faster than widely used implementations.
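For reference, one level of a one-dimensional lifting scheme is shown below, using the reversible LeGall 5/3 integer wavelet from JPEG 2000 with symmetric boundary extension: a predict step computes the detail coefficients, and an update step computes the smoothed approximation. The thesis's contribution is a fused, single-pass two-dimensional core, which this sketch does not attempt to reproduce; the function names are hypothetical.

```python
def lifting_53_forward(x):
    """One level of the reversible LeGall 5/3 lifting transform on an even-length int list."""
    n = len(x)
    if n < 2 or n % 2:
        raise ValueError("input length must be even and at least 2")
    half = n // 2

    def even(i):
        # Symmetric extension at the right edge: x[n] is mirrored to x[n - 2].
        return x[2 * min(i, half - 1)]

    # Predict step: detail = odd sample minus the floor-average of its even neighbours.
    d = [x[2 * i + 1] - (x[2 * i] + even(i + 1)) // 2 for i in range(half)]
    # Update step: approximation = even sample plus a rounded quarter of nearby details
    # (left edge mirrored: d[-1] is taken as d[0]).
    s = [x[2 * i] + (d[max(i - 1, 0)] + d[i] + 2) // 4 for i in range(half)]
    return s, d


def lifting_53_inverse(s, d):
    """Undo the lifting steps in reverse order; integer arithmetic makes it exact."""
    half = len(s)
    even = [s[i] - (d[max(i - 1, 0)] + d[i] + 2) // 4 for i in range(half)]
    odd = [d[i] + (even[i] + even[min(i + 1, half - 1)]) // 2 for i in range(half)]
    out = []
    for e, o in zip(even, odd):
        out.extend((e, o))
    return out
```

Because each lifting step only adds a function of the other half of the samples, the inverse simply subtracts the same quantities in reverse order, which is what makes in-place, cache-friendly implementations possible.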
408	Performance Analysis of Complex Shared Memory Systems Molka, Daniel 10 March 2017 (has links) Systems for high performance computing are getting increasingly complex. On the one hand, the number of processors is increasing. On the other hand, the individual processors are getting more and more powerful. In recent years, the latter has largely been achieved by increasing the number of cores per processor. Unfortunately, scientific applications often fail to fully utilize the available computational performance. Therefore, performance analysis tools that help to localize and fix performance problems are indispensable. Large-scale systems for high performance computing typically consist of multiple compute nodes connected via a network. Performance analysis tools for problems that arise from using multiple nodes are readily available. However, the increasing number of cores per processor observed within the last decade represents a major change in node architecture. Therefore, this work concentrates on analyzing node performance. The goal of this thesis is to improve the understanding of the achieved application performance on existing hardware. It can be observed that the scaling of parallel applications on multi-core processors differs significantly from the scaling on multiple processors. Therefore, the properties of shared resources in contemporary multi-core processors as well as remote accesses in multi-processor systems are investigated, and their respective impact on application performance is analyzed. As a first step, a comprehensive suite of highly optimized micro-benchmarks is developed. These benchmarks determine the performance of memory accesses depending on the location and coherence state of the data. They are used to perform an in-depth analysis of the characteristics of memory accesses in contemporary multi-processor systems, which identifies potential bottlenecks.
However, to localize performance problems, it is also necessary to determine to what extent application performance is limited by certain resources. Therefore, in the second step, a methodology is developed to derive metrics for the utilization of individual components in the memory hierarchy and for the waiting times caused by memory accesses. The approach is based on hardware performance counters, which record the number of occurrences of certain hardware events. The developed micro-benchmarks are used to selectively stress individual components, which makes it possible to identify the events that provide a reasonable assessment of the utilization of the respective component and of the time spent waiting for memory accesses to complete. Finally, the knowledge gained from this process is used to implement a visualization of memory-related performance issues in existing performance analysis tools. The results of the micro-benchmarks reveal that the increasing number of cores per processor and the use of multiple processors per node lead to complex systems in which the performance of memory accesses varies vastly depending on the location of the accessed data. Furthermore, the aggregated throughput of shared resources in multi-core processors does not necessarily scale linearly with the number of cores accessing them concurrently, which limits the scalability of parallel applications. It is shown that the proposed methodology for identifying meaningful hardware performance counters yields useful metrics for localizing memory-related performance limitations. info:eu-repo/classification/ddc/004 ddc:004
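Latency micro-benchmarks of the kind described above commonly use pointer chasing: each load's address depends on the previous load, so memory latencies cannot overlap, and a random single-cycle permutation defeats simple hardware prefetchers. The sketch below builds such a chain with Sattolo's algorithm. The timing loop is illustrative only, since CPython's interpreter overhead dominates, whereas the thesis's benchmarks are highly optimized; the function names are hypothetical.

```python
import random
import time


def make_chase_chain(n, seed=0):
    """Random cyclic permutation via Sattolo's algorithm: following nxt[i] visits all n slots."""
    rng = random.Random(seed)
    nxt = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i, which guarantees a single cycle
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt


def chase(nxt, steps, start=0):
    """Dependent loads: each index comes from the previous lookup."""
    i = start
    for _ in range(steps):
        i = nxt[i]
    return i


def ns_per_access(n, steps=1_000_000, seed=0):
    """Average time per dependent access for a chain of n slots (illustrative only)."""
    nxt = make_chase_chain(n, seed)
    t0 = time.perf_counter()
    chase(nxt, steps)
    return (time.perf_counter() - t0) / steps * 1e9
```

In a native implementation, sweeping `n` from a few kilobytes to well beyond the last-level cache size exposes the latency of each level of the memory hierarchy as a step in the measured time per access.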
409	Cache-Efficient Aggregation: Hashing Is Sorting Müller, Ingo, Sanders, Peter, Lacurie, Arnaud, Lehner, Wolfgang, Färber, Franz 14 June 2022 (has links) For decades, researchers have studied the duality of hashing and sorting for the implementation of relational operators, especially for efficient aggregation. Depending on the underlying hardware and software architecture, the specific algorithms implemented, and the data sets used in the experiments, different authors have reached different conclusions about which approach is better. In this paper we argue that, in terms of cache efficiency, the two paradigms are actually the same. We support this claim by showing that hashing has the same complexity as sorting in the external memory model. Furthermore, we make the similarity of the two approaches explicit by designing an algorithmic framework that can switch seamlessly between hashing and sorting during execution. Mixing hashing and sorting routines in the same framework also lets us leverage the advantages of both. On a more practical note, we show how to achieve very low constant factors by tuning both the hashing and the sorting routines to modern hardware. Since we observe that the constant factors of the two routines depend in complementary ways on the locality of the input, we use our framework to switch to the faster routine where appropriate. The result is a novel relational aggregation algorithm that is cache-efficient (independently of, and without prior knowledge of, input skew and output cardinality), highly parallelizable on modern multi-core systems, and operates at a speed close to the memory bandwidth, outperforming the state of the art by up to 3.7x. info:eu-repo/classification/ddc/004 ddc:004
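The observation that a hash-partitioning pass behaves like one pass of a most-significant-digit radix sort on the hash values can be sketched as a recursive `SUM(value) GROUP BY key`. This is a simplified illustration under assumed parameters (`fanout_bits`, and `small` as a stand-in for "fits in cache"), not the paper's framework, which also switches between hashing and sorting routines at run time.

```python
from collections import defaultdict


def aggregate_sum(rows, fanout_bits=4, small=64, shift=0, max_bits=32):
    """SUM(value) GROUP BY key via recursive hash partitioning (simplified sketch).

    Each partitioning pass routes rows on `fanout_bits` bits of hash(key), like one pass
    of an MSD radix sort on the hash values. Once a partition has at most `small` rows
    (a stand-in for "fits in cache"), it is aggregated with an in-memory hash table.
    """
    if len(rows) <= small or shift >= max_bits:
        table = defaultdict(int)
        for key, value in rows:
            table[key] += value
        return dict(table)
    mask = (1 << fanout_bits) - 1
    parts = [[] for _ in range(1 << fanout_bits)]
    for key, value in rows:
        parts[(hash(key) >> shift) & mask].append((key, value))
    result = {}
    for part in parts:
        # Equal keys share a hash, so partitions hold disjoint key sets and can be merged.
        result.update(aggregate_sum(part, fanout_bits, small, shift + fanout_bits, max_bits))
    return result
```

The `max_bits` cut-off bounds the recursion when many rows share one key (heavy skew), at which point the partition is simply aggregated by the hash table.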
410	Décomposition de multi-flots et localisation de caches dans les réseaux / Multi flow decomposition methods and network cache location Bauguion, Pierre-Olivier 22 September 2014 (has links) New actors, new services and new multimedia content on the internet generate ever-higher traffic volumes and data rates. This can cause congestion, which leads to latency and degrades the quality of service and experience perceived by users. An internet service provider (ISP) that aims to guarantee a first-class network must therefore take measures to continually improve the fluidity of the traffic on its network. One way to do so is to build a Content Delivery Network (CDN), which mainly consists of deploying devices on an existing network. First, this thesis presents dynamic programming approaches for optimal server location in tree networks. We also solve the CDN deployment and k-server/cache location problem using the exact, polynomial-time matroid intersection algorithm. We then define what a cache is and describe its characteristics. We state the hypotheses and the associated model for deploying transparent caches in a tree network, and relate this model to the algorithms presented earlier. We formulate the problem as a Mixed Integer Linear Program (MILP) and propose a new dynamic programming paradigm to solve it. We show how this approach generalizes to related location problems in trees, and report its practical performance. 
From a more theoretical point of view, we measure the capacity of a given network through the optimal routing of its demands, and thereby identify its critical links. This leads us to the Maximum Concurrent Flow (MCF) problem, a classical problem in the operations research literature. We propose new exact formulations for this problem, and for multi-flow problems more generally. We also propose a heuristic that builds an MCF formulation tailored to the specific distribution of capacities in a given instance. Comparisons show that these new formulations outperform existing ones. Finally, we describe the first exact, strongly polynomial algorithm for the maximum concurrent flow problem in the single-source case. We show its practical efficiency, even compared with the best formulations presented earlier. Location Multi-flow decomposition Trees (graph theory) Arborescences Cache Maximum concurrent flow Polynomial
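To illustrate the kind of tree dynamic programming used for cache location, here is a minimal Python sketch of a classic formulation. It is not the thesis's model, MILP, or new dynamic programming paradigm. The sketch assumes the following:
- Node 0 is the origin server.
- Every request travels up the tree and is served by the first cache on its path to the origin (a transparent cache with a perfect hit ratio), or by the origin itself.
- The cost of a request is its volume times the length of the path it travels.
- The function `place_caches` and its parameters are hypothetical names, and nodes are assumed to be numbered so that `parent[v] < v`.

The state `solve(v, a)` records, for each budget `j`, the cheapest way to place exactly `j` caches in the subtree of `v`, given that the nearest serving node above `v` is `a`. Children are combined by a min-plus convolution over the cache budget.

```python
import math
from functools import lru_cache


def place_caches(parent, weight, demand, k):
    """Minimum total demand-weighted distance to the serving node when at most
    k caches are placed on a tree rooted at the origin server (node 0).

    parent[v]: parent of v (parent[0] is None); assumes parent[v] < v
    weight[v]: length of edge (v, parent[v]); weight[0] is unused
    demand[v]: request volume originating at node v
    """
    n = len(parent)
    children = [[] for _ in range(n)]
    depth = [0] * n
    for v in range(1, n):
        children[parent[v]].append(v)
        depth[v] = depth[parent[v]] + weight[v]
    INF = math.inf

    def merge(a, b):
        # Min-plus convolution over cache budgets, truncated at k.
        out = [INF] * (k + 1)
        for i, x in enumerate(a):
            if x == INF:
                continue
            for j, y in enumerate(b):
                if i + j > k:
                    break
                out[i + j] = min(out[i + j], x + y)
        return out

    @lru_cache(maxsize=None)
    def solve(v, a):
        # cost[j]: best cost in subtree(v) with exactly j caches there,
        # given that the nearest cache (or the origin) above v is node a.
        no = [INF] * (k + 1)          # no cache at v: v is served by a
        no[0] = demand[v] * (depth[v] - depth[a])
        for c in children[v]:
            no = merge(no, solve(c, a))
        yes = [INF] * (k + 1)         # cache at v: v's subtree is served by v
        if k >= 1:
            yes[1] = 0
            for c in children[v]:
                yes = merge(yes, solve(c, v))
        return tuple(min(x, y) for x, y in zip(no, yes))

    total = [0] + [INF] * k           # the origin itself needs no cache
    for c in children[0]:
        total = merge(total, solve(c, 0))
    return min(total)
```

Consider a path 0-1-2-3 with unit edge lengths where only node 3 requests 10 units. With no cache, the cost is 10 × 3 = 30. With one cache, placed at node 3, the cost is 0. The number of (node, ancestor) states is at most the number of nodes times the tree height, and each child merge costs O(k²). This keeps the sketch polynomial, in the spirit of the tree location algorithms described above.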
