Global ETD Search

11	Energy-Aware Data Management on NUMA Architectures Kissinger, Thomas 23 March 2017 The ever-increasing need for more computing and data processing power demands a continuous and rapid growth of power-hungry data center capacities all over the world. As a first study in 2008 revealed, energy consumption of such data centers is becoming a critical problem, since their power consumption is about to double every 5 years. However, a follow-up study released in 2016 points out that this threatening trend was dramatically throttled within the past years, due to the increased energy efficiency actions taken by data center operators. Furthermore, the authors of the study emphasize that making and keeping data centers energy-efficient is a continuous task, because more and more computing power is demanded from the same or an even lower energy budget, and that this threatening energy consumption trend will resume as soon as energy efficiency research efforts and their market adoption are reduced. An important class of applications running in data centers are data management systems, which are a fundamental component of nearly every application stack. While those systems were traditionally designed as disk-based databases that are optimized for keeping disk accesses as low as possible, modern state-of-the-art database systems are main memory-centric and store the entire data pool in the main memory, which replaces the disk as the main bottleneck. To scale up such in-memory database systems, non-uniform memory access (NUMA) hardware architectures are employed that face a decreased bandwidth and an increased latency when accessing remote memory compared to the local memory. In this thesis, we investigate energy awareness aspects of large scale-up NUMA systems in the context of in-memory data management systems. 
To do so, we pick up the idea of a fine-grained data-oriented architecture and improve the concept so that it keeps pace with the increased absolute performance numbers of a pure in-memory DBMS and scales up on large NUMA systems. To achieve this goal, we design and build ERIS, the first scale-up in-memory data management system that is designed from scratch to implement a data-oriented architecture. With the help of the ERIS platform, we explore our novel core concept for energy awareness, which is Energy Awareness by Adaptivity. The concept describes that software, and especially database systems, have to respond quickly to environmental changes (i.e., workload changes) by adapting themselves to enter a state of low energy consumption. We present the hierarchically organized Energy-Control Loop (ECL), which is a reactive control loop and provides two concrete implementations of our Energy Awareness by Adaptivity concept, namely the hardware-centric Resource Adaptivity and the software-centric Storage Adaptivity. Finally, we give an exhaustive evaluation of the scalability of ERIS as well as of our adaptivity facilities. info:eu-repo/classification/ddc/004 ddc:004
12	BUZZARD: A NUMA-Aware In-Memory Indexing System Maas, Lukas M., Kissinger, Thomas, Habich, Dirk, Lehner, Wolfgang 14 June 2022 With the availability of large main memory capacities, in-memory index structures have become an important component of modern data management platforms. Current research even suggests index-based query processing as an alternative or supplement for traditional tuple-at-a-time processing models. However, while simple sequential scan operations can fully exploit the high bandwidth provided by main memory, indexes are mainly latency bound and spend most of their time waiting for memory accesses. Considering current hardware trends, the problem of high memory latency is further exacerbated as modern shared-memory multiprocessors with non-uniform memory access (NUMA) become increasingly common. On those NUMA platforms, the execution time of index operations is dominated by memory access latency that increases dramatically when accessing memory on remote sockets. Therefore, good index performance can only be achieved through careful optimization of the index structure to the given topology. BUZZARD is a NUMA-aware in-memory indexing system. Using adaptive data partitioning techniques, BUZZARD distributes a prefix-tree-based index across the NUMA system and hands off incoming requests to worker threads located on each partition's respective NUMA node. This approach reduces the number of remote memory accesses to a minimum and improves cache utilization. In addition, all indexes inside BUZZARD are only accessed by their respective owner, eliminating the need for synchronization primitives like compare-and-swap. NUMA, in-memory indexing, prefix trees info:eu-repo/classification/ddc/004 ddc:004
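The hand-off scheme described in the BUZZARD abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not BUZZARD's actual design: the names `partition_of`, `PartitionWorker`, and `PartitionedIndex` are hypothetical, a plain dictionary stands in for the prefix tree, keys are split statically by their first byte rather than adaptively, and the threads are not pinned to NUMA nodes. The point it shows is ownership: each partition is touched by exactly one worker thread, so the index structure itself needs no locks or compare-and-swap. The request queues used for the hand-off are still synchronized internally.

```python
import queue
import threading

NUM_PARTITIONS = 4  # assumption: one partition per NUMA node


def partition_of(key: bytes) -> int:
    """Map a key to a partition by its first byte (a static range split)."""
    first = key[0] if key else 0
    return first * NUM_PARTITIONS // 256


class PartitionWorker(threading.Thread):
    """Sole owner of one partition; a dict stands in for the prefix tree."""

    def __init__(self, pid: int):
        super().__init__(daemon=True)
        self.pid = pid
        self.requests = queue.Queue()
        self.index = {}  # only this thread reads or writes it

    def run(self):
        while True:
            op, key, value, reply = self.requests.get()
            if op == "stop":
                break
            if op == "put":
                self.index[key] = value
                reply.put(None)
            elif op == "get":
                reply.put(self.index.get(key))


class PartitionedIndex:
    """Routes every request to the worker that owns the key's partition."""

    def __init__(self):
        self.workers = [PartitionWorker(i) for i in range(NUM_PARTITIONS)]
        for worker in self.workers:
            worker.start()

    def _call(self, op, key, value=None):
        reply = queue.Queue(maxsize=1)
        owner = self.workers[partition_of(key)]
        owner.requests.put((op, key, value, reply))  # hand-off to the owner
        return reply.get()

    def put(self, key: bytes, value):
        self._call("put", key, value)

    def get(self, key: bytes):
        return self._call("get", key)

    def close(self):
        for worker in self.workers:
            worker.requests.put(("stop", b"", None, None))
        for worker in self.workers:
            worker.join()
```

On a real NUMA machine each worker would additionally be bound to the cores of its node and allocate its partition from local memory, which this sketch leaves out.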
13	Virtualisation efficace d'architectures NUMA / Efficient virtualization of NUMA architectures Voron, Gauthier 08 March 2018 While virtualization introduces only a negligible overhead on machines with few cores, this is not the case when the number of cores increases. Machines with tens of cores can be found in today's data centers dedicated to cloud computing, a resource management model that relies heavily on virtualization. These large multicore machines have a complex architecture, called Non Uniform Memory Access (NUMA). Achieving high performance on a NUMA architecture requires carefully placing application threads on the appropriate cores and application data in the appropriate memory bank. In this thesis, we show how virtualization techniques change application behavior by preventing applications from placing their data in memory efficiently. We show that the resulting data misplacement leads to a serious performance degradation of up to 700%. Additionally, we propose a method that allows the Xen hypervisor to virtualize NUMA architectures efficiently by implementing a set of generic memory placement policies. With an evaluation over a set of 29 applications on a 48-core machine, we show that these NUMA policies multiply the performance of 9 applications by a factor of 2 or more and reduce the virtualization overhead to below 50% for 23 of them. System Virtualization NUMA Cloud computing Hypervisor Memory 004.2
14	On Optimizing Transactional Memory: Transaction Splitting, Scheduling, Fine-grained Fallback, and NUMA Optimization Mohamedin, Mohamed Ahmed Mahmoud 01 September 2015 The industrial shift from single-core processors to multi-core ones introduced many challenges. Among them, a program cannot get a free performance boost just by upgrading to new hardware, because new chips include more processing units but at the same (or comparable) clock speed as the previous generation. In order to effectively exploit the newly available hardware and thus gain performance, a program should maximize parallelism. Unfortunately, parallel programming poses several challenges, especially when synchronization is involved because parallel threads need to access the same shared data. Locks are the standard synchronization mechanism, but gaining performance using locks is difficult for non-expert programmers and without deep knowledge of the application logic. A new, easier synchronization abstraction is therefore required, and Transactional Memory (TM) is the concrete candidate. TM is a new programming paradigm that simplifies the implementation of synchronization. The programmer just defines atomic parts of the code and the underlying TM system handles the required synchronization, optimistically. In the past decade, TM researchers worked extensively to improve TM-based systems. Most of the work has been dedicated to Software TM (or STM) as it does not require special transactional hardware support. Very recently (in the past two years), such hardware support has become commercially available in commodity processors, so a large number of customers can finally take advantage of it. Hardware TM (or HTM) provides the potential to obtain the best performance of any TM-based system, but current HTM systems are best-effort, thus transactions are not guaranteed to commit in any case. 
In fact, HTM transactions are limited in size and time as well as prone to livelock at high contention levels. Another challenge posed by current multi-core hardware platforms is the internal architecture they use for interfacing with the main memory. Specifically, when the common computer deployment changed from having a single processor to having multiple multi-core processors, the architects also redesigned the hardware subsystem that manages memory access. It moved from providing Uniform Memory Access (UMA), where the latency needed to fetch a memory location is the same regardless of the core on which the thread executes, to providing Non-Uniform Memory Access (NUMA), where this latency differs according to the core used and the memory socket accessed. This switch in technology has an implication on the performance of concurrent applications. In fact, the building blocks commonly used for designing concurrent algorithms under the assumptions of UMA (e.g., relying on centralized meta-data) may not provide the same high performance and scalability when deployed on NUMA-based architectures. In this dissertation, we tackle the performance and scalability challenges of multi-core architectures by providing three solutions for increasing performance using HTM (i.e., Part-HTM, Octonauts, and Precise-TM), and one solution for solving the scalability issues posed by NUMA architectures (i.e., Nemo).
• Part-HTM is the first hybrid transactional memory protocol that solves the problem of transactions aborted due to the resource limitations (space/time) of current best-effort HTM. The basic idea of Part-HTM is to partition those transactions into multiple sub-transactions, which can likely be committed in hardware. Due to the eager nature of HTM, we designed a low-overhead software framework to preserve transaction correctness (with and without opacity) and isolation. 
Part-HTM is efficient: our evaluation study confirms that its performance is the best in all tested cases, except for those where HTM cannot be outperformed. However, in such a workload, Part-HTM still performs better than all other software and hybrid competitors.
• Octonauts tackles the livelock problem of HTM at high contention levels. HTM lacks advanced contention management (CM) policies. Octonauts is an HTM-aware scheduler that orchestrates conflicting transactions. It uses a priori knowledge of transactions' working sets to prevent conflicting transactions from being activated simultaneously. Octonauts also accommodates both HTM and STM with minimal overhead by exploiting adaptivity. Based on the transaction's size, time, and irrevocable calls (e.g., system calls), Octonauts selects the best path among HTM, STM, or global locking. Results show a performance improvement of up to 60% when Octonauts is deployed, in comparison with pure HTM that falls back to global locking.
• Precise-TM is a unique approach to the granularity problem of the software fallback path of best-effort HTM. It provides an efficient and precise technique for HTM-STM communication so that HTM transactions are not interfered with by concurrent STM transactions. In addition, the added overhead is marginal in terms of space and execution time. Precise-TM uses address-embedded locks (pointer bit-stealing) for precise communication between STM and HTM. Results show that our precise fine-grained locking pays off as it allows more concurrency between hardware and software transactions. Specifically, it gains up to 5x over the default HTM implementation with a single global lock as fallback path.
• Nemo is a new STM algorithm that ensures high and scalable performance when an application workload with a data locality property is deployed. 
Existing STM algorithms rely on centralized shared meta-data (e.g., a global timestamp) to synchronize concurrent accesses, but in such a workload, this scheme may hamper scalable performance, given the high latency that NUMA architectures introduce for updating those centralized meta-data. Nemo overcomes these limitations by allowing only those transactions that actually conflict with each other to perform inter-socket communication. As a result, if two transactions are non-conflicting, they cannot interact with each other through any meta-data. This policy does not apply to application threads running in the same socket. In fact, they are allowed to share any meta-data even if they execute non-conflicting operations because, as our evaluation study shows, the local processing happening inside one socket does not interfere with the work done by parallel threads executing on other sockets. Nemo's evaluation study shows improvement over state-of-the-art TM algorithms by as much as 65%. / Ph. D. Transaction Memory Hardware Transaction Memory (HTM) Best-efforts HTM Transactions Partitioning Transactions Scheduling NUMA NUMA Optimization NUMA-aware STM Fine-grained Fallback
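The address-embedded locks mentioned for Precise-TM in the abstract above rely on pointer bit-stealing: an aligned address always has zero low bits, so a lock flag can be stored there without extra space. The following is a minimal sketch of that general technique only, not of Precise-TM's protocol. It models addresses as plain Python integers and assumes 8-byte alignment; the names `pack`, `address_of`, `is_locked`, `try_lock`, and `unlock` are hypothetical.

```python
ALIGNMENT = 8             # assumption: objects are 8-byte aligned
LOCK_BIT = 0x1            # lowest spare bit marks the word as locked
TAG_MASK = ALIGNMENT - 1  # the three low bits that alignment leaves free


def pack(address: int, locked: bool) -> int:
    """Embed the lock flag in the low bit of an aligned address."""
    if address & TAG_MASK:
        raise ValueError("address is not aligned; no spare bits to steal")
    return address | (LOCK_BIT if locked else 0)


def address_of(word: int) -> int:
    """Strip the stolen bits to recover the original address."""
    return word & ~TAG_MASK


def is_locked(word: int) -> bool:
    return bool(word & LOCK_BIT)


def try_lock(word: int):
    """Return (new_word, True) on success, or (word, False) if already locked."""
    if is_locked(word):
        return word, False
    return word | LOCK_BIT, True


def unlock(word: int) -> int:
    return word & ~LOCK_BIT
```

In a real concurrent implementation the check and update inside `try_lock` would have to be a single atomic operation on the shared word, such as a compare-and-swap; the sketch only shows how the flag and the address share one machine word.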
15	De l'exécution structurée d'applications scientifiques OpenMP sur architectures hiérarchiques Broquedis, François 09 December 2010 The application domain of numerical simulation demands ever more computing power. Multicore technology helps meet these needs, but it imposes new constraints on programmers of scientific applications, which they must respect if they want to get the most out of it. In particular, it is becoming more necessary than ever to structure the parallelism of applications so that it fits the topology imposed by the memory hierarchy of multicore architectures. Existing approaches to programming these architectures do not take this characteristic into account, and preserving the structure of the parallelism remains the programmer's responsibility. As a result, it is still very difficult to develop an application that is both efficient and portable. The contribution of this thesis has three parts. First, it relies on the OpenMP language to generate structured parallelism and lets the programmer pass this structure on to the ForestGOMP runtime system. The structured execution of these computation flows is then left to the Cache and Memory schedulers developed during this thesis, which respectively maximize the reuse of shared caches and maximize the memory bandwidth available to OpenMP programs. Finally, we studied the composition of these schedulers, and more generally of parallel libraries, regarding this as a serious avenue for efficiently exploiting the many compute units of multicore architectures. The gains obtained on scientific applications show the value of strong communication between the application and the runtime system, which enables dynamic and portable scheduling of structured parallelism on hierarchical architectures. High-performance computing Runtime system OpenMP Multicore NUMA
16	Support for NUMA hardware in HelenOS Horký, Vojtěch January 2011 The goal of this master's thesis is to extend the HelenOS operating system with support for ccNUMA hardware. The text of the thesis contains a brief introduction to ccNUMA hardware, an overview of NUMA features, and relevant features of HelenOS (memory management, scheduling, etc.). The thesis analyses various design decisions in the implementation of NUMA support: introducing the hardware topology into the kernel data structures, propagating this information to user space, thread affinity to cores and nodes, memory allocation policies, load balancing, etc. The thesis also contains a prototype implementation of ccNUMA support in HelenOS for the AMD64 platform and a brief evaluation and comparison with ccNUMA support in other monolithic and microkernel-based operating systems.
17	Avaliação probabilística da capacidade de receção Nodal de uma rede de transporte Almeida, Jorge Miguel Martins de January 2012 Integrated Master's thesis. Integrated Master in Electrical and Computer Engineering. Faculdade de Engenharia. Universidade do Porto. 2012 Transmission networks Power generation plants Nodal reception in a transmission network
18	Análise de sensibilidade e risco no resgate da concessão e gestão da rede de distribuição de energia numa autarquia Fernandes, Ruben Diogo Salgueiro January 2012 Integrated Master's thesis. Electrical and Computer Engineering. Energy specialization area. Faculdade de Engenharia. Universidade do Porto. 2012 Energy networks Municipal expenditure
19	Contribution à la modélisation numérique de la propagation des ondes sismiques sur architectures multicœurs et hiérarchiques Dupros, Fabrice 13 December 2010 One major goal of strong motion seismology is the estimation of damage in future earthquake scenarios. Simulation of large-scale seismic wave propagation is of great importance for efficient strong motion analysis and risk mitigation. Being particularly CPU-consuming, this three-dimensional problem relies on high-performance computing technologies to make realistic simulation feasible on a regional scale at relatively high frequencies. Several evolutions at the chip level have an important impact on the performance of classical implementations of seismic applications. The trend in parallel computing is to increase the number of cores available at the shared-memory level, with possibly non-uniform cost of memory accesses. The increasing number of cores per processor and the effort made to overcome the limitations of classical symmetric multiprocessor (SMP) systems make a growing number of NUMA (Non Uniform Memory Access) architectures available as computing nodes. We therefore need to consider new approaches more suitable to such parallel systems. This PhD work addresses both the algorithmic issues and the integration of efficient programming models for multicore architectures. The proposed contributions are validated with two large-scale examples. The first case is the modeling of the Niigata-Chuetsu, Japan earthquake of 16 July 2007, based on the finite difference method. The second example considers a potential seismic event in the Nice sedimentary basin on the French Riviera. The finite element method is used and the nonlinear soil behavior is taken into account. High performance computing Seismic modeling NUMA architecture Multicore processor
20	Mécanisme et importance développementale de l'orientation du fuseau mitotique des progéniteurs neuraux chez les vertébrés : rôle du complexe Gαi\LGN\NUMA Peyre, Elise 12 October 2011 To maintain tissue architecture, epithelial cells divide in a planar fashion, perpendicular to their main polarity axis. As the centrosome resumes an apical localization in interphase, planar spindle orientation is reset at each cell cycle. We used three-dimensional live imaging of GFP-labeled centrosomes to investigate the dynamics of spindle orientation in chick neuroepithelial cells. The mitotic spindle displays stereotypic movements during metaphase, with an active phase of planar orientation and a subsequent phase of planar maintenance before anaphase. We describe the localization of the NuMA and LGN proteins in a belt at the lateral cell cortex during spindle orientation. Finally, we show that the complex formed by LGN, NuMA, and cortically located Gαi subunits is necessary for spindle movements and regulates the dynamics of spindle orientation. The restricted localization of LGN and NuMA in the lateral belt is instructive for the planar alignment of the mitotic spindle, and required for its planar maintenance. Mitotic spindle LGN NuMA Gαi Neural progenitor Division orientation
