1

Case for holistic query evaluation

Krikellas, Konstantinos January 2010 (has links)
In this thesis we present the holistic query evaluation model. We propose a novel query engine design that exploits the characteristics of modern processors when queries execute inside main memory. The holistic model (a) is based on template-based code generation for each executed query, (b) uses multithreading to adapt to multicore processor architectures, and (c) addresses the optimization problem of scheduling multiple threads for intra-query parallelism. Main-memory query execution is a common operation in modern database servers equipped with tens or hundreds of gigabytes of RAM. In such an execution environment, the query engine needs to adapt to the CPU characteristics to boost performance. For this purpose, holistic query evaluation applies customized code generation to database query evaluation. The idea is to use a collection of highly efficient code templates and dynamically instantiate them to create query- and hardware-specific source code. The source code is compiled and dynamically linked to the database server for processing. Code generation diminishes the bloat of the higher-level programming abstractions necessary for implementing generic, interpreted SQL query engines. At the same time, the generated code is customized for the hardware it will run on. The holistic model supports the most frequently used query processing algorithms, namely sorting, partitioning, join evaluation, and aggregation, thus allowing the efficient evaluation of complex DSS or OLAP queries. Modern CPUs follow multicore designs with multiple threads running in parallel. The dataflow of query engine algorithms needs to be adapted to exploit such designs. We identify memory accesses and thread synchronization as the main bottlenecks in a multicore execution environment. We extend the holistic query evaluation model and propose techniques to mitigate the impact of these bottlenecks on multithreaded query evaluation. We analytically model the expected performance and scalability of the proposed algorithms according to the hardware specifications. The analytical performance expressions can be used by the optimizer to statically estimate the speedup of multithreaded query execution. Finally, we examine the problem of thread scheduling in the context of multithreaded query evaluation on multicore CPUs. The search space of possible operator execution schedules grows rapidly, precluding exhaustive techniques. We model intra-query parallelism on multicore systems and present scheduling heuristics that result in different degrees of schedule quality and optimization cost. We identify cases where each of our proposed algorithms, or combinations of them, is expected to generate schedules of high quality at an acceptable running cost.
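As a minimal sketch of the code-generation idea (using Python's `compile`/`exec` in place of the C templates the thesis compiles and dynamically links; the template and names here are illustrative, not the thesis's actual engine):

```python
# Minimal sketch of template-based query code generation: a code template
# is instantiated for one specific query, so the predicate is inlined and
# no interpretation happens per row. Hypothetical example, not the
# thesis's C-based implementation.
FILTER_TEMPLATE = """
def scan(rows, out):
    for row in rows:
        if row[{col}] {op} {const}:   # predicate inlined at "compile" time
            out.append(row)
"""

def instantiate(col, op, const):
    # Fill the template for one query and "link" the result dynamically.
    src = FILTER_TEMPLATE.format(col=col, op=op, const=const)
    namespace = {}
    exec(compile(src, "<generated>", "exec"), namespace)
    return namespace["scan"]

rows = [(1, "a"), (5, "b"), (9, "c")]
out = []
instantiate(col=0, op=">", const=4)(rows, out)   # SELECT * WHERE c0 > 4
print(out)                                       # [(5, 'b'), (9, 'c')]
```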
2

Resource management for efficient single-ISA heterogeneous computing

Chen, Jian, doctor of electrical and computer engineering 11 July 2012 (has links)
Single-ISA heterogeneous multi-core processors (SHMP) have become increasingly important due to their potential to significantly improve the execution efficiency for diverse workloads and thereby alleviate the power density constraints in Chip Multiprocessors (CMP). The importance of SHMP is further underscored by the fact that manufacturing defects and process variation can also cause single-ISA heterogeneity in CMPs even when the CMP is originally designed as homogeneous. However, to fully exploit the execution efficiency that SHMP has to offer, programs have to be efficiently mapped/scheduled to the appropriate cores such that the hardware resources of the cores match the resource demands of the programs, which is challenging and remains an open problem. This dissertation presents a comprehensive set of off-line and on-line techniques that leverage analytical performance modeling to bridge the gap between workload diversity and hardware heterogeneity. For the off-line scenario, this dissertation presents an efficient resource demand analysis framework that can estimate the resource demands of a program based on the inherent characteristics of the program, without any detailed simulation. Based on the estimated resource demands, this dissertation further proposes a multi-dimensional program-core matching technique that projects program resource demands and core configurations onto a unified multi-dimensional space and uses the weighted Euclidean distance between the two to identify the matching program-core pair. This dissertation also presents a dynamic and predictive application scheduler for SHMPs. It uses a set of hardware-efficient online profilers and an analytical performance model to simultaneously predict an application's performance on different cores. Based on the predicted performance, the scheduler identifies and enforces a near-optimal application assignment for each scheduling interval without any trial runs or off-line profiling. Using only a few kilobytes of extra hardware, the proposed heterogeneity-aware scheduler improves the weighted speedup by 11.3% compared with the commodity OpenSolaris scheduler and by 6.8% compared with the best known research scheduler. Finally, this dissertation presents a predictive yet cost-effective mechanism to manage intra-core and/or inter-core resources in dynamic SHMPs. It likewise uses a set of hardware-efficient online profilers and an analytical performance model to predict the application's performance under different resource allocations. Based on the predicted performance, the resource allocator identifies and enforces near-optimal resource partitions for each epoch without any trial runs. The experimental results show that the proposed predictive resource management framework improves the weighted speedup of the CMP system by an average of 11.6% compared with an equal-partition scheme and by 9.3% compared with an existing reactive resource management scheme.
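A toy sketch of the multi-dimensional matching step; the dimensions, weights, and core configurations below are made up for illustration, not the dissertation's actual parameters:

```python
import math

# Project program resource demands and core configurations into one
# normalized space and pick the core at the smallest weighted Euclidean
# distance. All numbers are illustrative.
def weighted_distance(demand, config, weights):
    return math.sqrt(sum(w * (d - c) ** 2
                         for d, c, w in zip(demand, config, weights)))

# Dimensions: (issue width, L2 capacity, branch-predictor strength), normalized.
cores = {"big": (1.0, 1.0, 0.9), "little": (0.3, 0.2, 0.4)}
demand = (0.4, 0.3, 0.5)           # estimated off-line, per program
weights = (1.0, 0.8, 0.6)

best = min(cores, key=lambda n: weighted_distance(demand, cores[n], weights))
print(best)                        # -> "little": demands match the small core
```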
3

Towards Energy Efficient Computing with Linux: Enabling Task Level Power Awareness and Support for Energy Efficient Accelerator

January 2013 (has links)
abstract: With increasing transistor counts and shrinking feature sizes, reducing power consumption has become a major design constraint. This has given rise to aggressive architectural changes for on-chip power management and to the rapid development of energy-efficient hardware accelerators. Accordingly, the objective of this research work is to help software developers leverage these hardware techniques and improve the energy efficiency of the system. To achieve this, I propose two solutions for the Linux kernel. Optimal use of these architectural enhancements to achieve greater energy efficiency requires accurate modeling of processor power consumption. Though many models are available in the literature for modeling processor power consumption, there is a lack of models that capture power consumption at the task level. Task-level energy models are a requirement for an operating system (OS) to perform real-time power management, as the OS time-multiplexes tasks to enable sharing of hardware resources. I propose a detailed design methodology for constructing an architecture-agnostic task-level power model and incorporating it into a modern operating system to build an online task-level power profiler. The profiler is implemented inside the latest Linux kernel and validated on an Intel Sandy Bridge processor. It has a negligible overhead, consuming less than 1% of hardware resources. The profiler's power predictions were demonstrated for various application benchmarks, from SPEC to PARSEC, with less than 4% error. I also demonstrate the importance of the proposed profiler for emerging architectural techniques through use-case scenarios, which include heterogeneous computing and fine-grained per-core DVFS. Along with architectural enhancements in general-purpose processors to improve energy efficiency, hardware accelerators such as Coarse-Grained Reconfigurable Architectures (CGRAs) are gaining popularity. Unlike vector processors, which rely on data parallelism, a CGRA can provide greater flexibility and compiler-level control, making it more suitable for the present SoC environment. To provide a streamlined development environment, I propose a flexible Linux framework for CGRA design-space exploration. With accurate and flexible hardware models, fine-grained integration with an accurate architectural simulator, and Linux memory-management and DMA support, a user can carry out a wide range of CGRA experiments in a full-system environment. / Dissertation/Thesis / M.S. Electrical Engineering 2013
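A hedged sketch of what a counter-based task-level power model looks like; real profilers read per-task performance counters at context switches, and the coefficients come from regression against measured power. The counters, coefficients, and idle-power figure below are invented for illustration, not taken from the thesis:

```python
# Illustrative linear counter-based task-level power model. All constants
# are made up; a real model is fitted per microarchitecture.
COEFF = {"cycles": 2.1e-9, "llc_misses": 1.4e-7, "retired_uops": 0.9e-9}
IDLE_POWER_W = 4.0   # static/uncore share attributed to the socket

def task_power(counter_deltas, interval_s):
    """Estimate a task's average power over one scheduling interval."""
    dynamic_j = sum(COEFF[c] * v for c, v in counter_deltas.items())
    return IDLE_POWER_W + dynamic_j / interval_s

sample = {"cycles": 3.2e9, "llc_misses": 1.1e6, "retired_uops": 4.0e9}
print(f"{task_power(sample, interval_s=1.0):.2f} W")   # ~14.47 W
```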
4

Performance monitoring of throughput constrained dataflow programs executed on shared-memory multi-core architectures / Evaluation de performance d'applications flot de données executées sur des architectures multi-coeur

Selva, Manuel 02 July 2015 (has links)
Because of physical limits and the problem of managing dissipated power, hardware designers turned to multicore chips in the early 2000s to exploit the still-growing number of transistors per square millimeter of silicon. These processors are made of several independent computing units. Unlike previous hardware advances, software must largely be rethought to benefit from them: existing sequential applications have to be split into tasks that are as independent as possible so they can execute in parallel on the different computing units. To that end, many concurrent programming models have been proposed and are in use today. This thesis focuses on programs written in the dataflow model, the typical form of video stream processing applications and communication protocols, and on the performance evaluation of such programs on shared-memory multicore architectures. Specifically, it extends dataflow programming models with elements expressing quality-of-service (throughput) constraints and takes these constraints into account to detect, at runtime, the performance bottlenecks within the programs.

The bottleneck information gathered during execution is used both for off-line analysis and for adapting the application while it runs. In the former case, the developer uses it to determine which parts of the dataflow program to optimize and how to distribute the program efficiently across the computing units. In the latter case, the profiling information feeds runtime adaptation mechanisms that redistribute the work across the computing units more effectively. Particular attention is paid to profiling how dataflow applications use the memory subsystem. The data-exchange information provided by the programming model makes it possible to exploit the memory architecture of multicore machines intelligently; nevertheless, the complexity of modern memory systems generally prevents a static evaluation of the performance impact of memory accesses. We therefore propose a memory-profiling system for dataflow applications based on hardware profiling mechanisms.
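A minimal sketch of the runtime bottleneck test such throughput constraints enable, assuming illustrative actor names, repetition counts, and measured firing rates:

```python
# An actor is a bottleneck when its observed service rate, scaled by how
# many times it must fire per output frame, falls below the throughput
# the constraint requires. Graph and rates are illustrative.
required_fps = 25.0                      # throughput constraint (frames/s)

observed = {                             # firings/s measured by the profiler
    "parse":   190.0,
    "decode":   24.0,                    # cannot sustain the constraint
    "display": 300.0,
}
repetitions = {"parse": 4, "decode": 1, "display": 1}  # firings per frame

def bottlenecks(observed, repetitions, required_fps):
    return [a for a, rate in observed.items()
            if rate / repetitions[a] < required_fps]

print(bottlenecks(observed, repetitions, required_fps))  # ['decode']
```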
5

Emerging Technologies in On-Chip and Off-Chip Interconnection Network

Sikder, Md Ashif Iqbal 23 September 2016 (has links)
No description available.
6

Contribution à la modélisation numérique de la propagation des ondes sismiques sur architectures multicœurs et hiérarchiques / Contribution to the numerical modeling of seismic wave propagation on multicore and hierarchical architectures

Dupros, Fabrice 13 December 2010 (has links)
One major goal of strong-motion seismology is the estimation of damage in future earthquake scenarios. Simulation of large-scale seismic wave propagation is of great importance for efficient strong-motion analysis and risk mitigation. Being particularly CPU-consuming, this three-dimensional problem relies on high-performance computing to make realistic simulation feasible on a regional scale at relatively high frequencies. Several evolutions at the chip level have an important impact on the performance of classical implementations of seismic applications. The trend in parallel computing is to increase the number of cores available at the shared-memory level, with possibly non-uniform memory-access costs: the growing number of cores per processor and the effort made to overcome the limitations of classical symmetric multiprocessor (SMP) systems make NUMA (Non-Uniform Memory Access) architectures, often of considerable depth, increasingly common as computing nodes. We therefore need approaches better suited to such parallel systems. This PhD work addresses both the algorithmic and numerical issues and their integration with runtime systems optimized for multicore architectures. The proposed contributions are validated at large scale on two seismic modeling examples. The first is the modeling of the 16 July 2007 Niigata-Chuetsu, Japan earthquake using the finite-difference method. The second considers a hypothetical seismic event in the Nice sedimentary basin on the French Riviera, using the finite-element method and taking the nonlinear behavior of the soil into account.
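For intuition, a one-dimensional finite-difference sketch of the wave-equation update that such 3-D codes generalize; on multicore/NUMA nodes it is this spatial loop that gets partitioned across cores. Parameters are illustrative, not those of the Niigata-Chuetsu model:

```python
import numpy as np

# Second-order finite-difference update for the 1-D wave equation
# u_tt = c^2 * u_xx, with a CFL-stable time step. Toy parameters.
nx, nt = 200, 400
c, dx = 3000.0, 10.0                  # wave speed (m/s), grid step (m)
dt = 0.5 * dx / c                     # satisfies the CFL condition
r2 = (c * dt / dx) ** 2

u_prev = np.zeros(nx)
u = np.zeros(nx)
u[nx // 2] = 1.0                      # point source in the middle

for _ in range(nt):
    lap = np.zeros(nx)
    lap[1:-1] = u[:-2] - 2 * u[1:-1] + u[2:]   # discrete Laplacian
    u_next = 2 * u - u_prev + r2 * lap
    u_prev, u = u, u_next

print(abs(u).max())                   # amplitude after nt steps
```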
7

Problematika přechodu od jednojádrové k vícejádrové implementaci operačního systému / Issue of Migrating from Single-Core to Multi-Core Implementation of Operating System

Matyáš, Jan January 2014 (has links)
This thesis discusses the changes needed to run MicroC/OS-II on a multicore processor, specifically the Zynq 7000 All Programmable SoC, which integrates two ARM Cortex-A9 cores. Problems that arise during this transition are also discussed.
8

Dynamic Power Management in a Heterogeneous Processor Architecture

Arega, Frehiwot Melak, Hähnel, Markus, Dargie, Waltenegus 15 May 2023 (has links)
Emerging mobile platforms integrate heterogeneous multicore processors to deal efficiently with the heterogeneity of data (in magnitude, type, and quality). The main goal is to achieve a high degree of energy proportionality, matching the nature and fluctuation of mobile workloads. Most existing power and energy consumption analyses of these architectures rely on simulation or static benchmarks, neither of which truly reflects the type of workload the processors handle in reality. By contrast, we generate two types of stochastic workloads and employ four dynamic voltage and frequency scaling (DVFS) policies to investigate the energy proportionality and dynamic power consumption characteristics of a heterogeneous processor architecture operating in different configurations. The analysis illustrates, both qualitatively and quantitatively, that knowledge of the statistics of the incoming workload is critical to determining the appropriate processor configuration.
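As a sketch of the kind of utilization-driven DVFS policy under study, applied to a stochastic workload; the frequencies, thresholds, and workload distribution below are illustrative, not the paper's actual operating points or policies:

```python
import random

# "Ondemand"-style policy sketch: jump to the highest frequency under
# heavy load, step down gradually when utilization is low. Illustrative.
FREQS_GHZ = [0.6, 1.0, 1.4, 2.0]

def ondemand_step(util, f_idx):
    if util > 0.8:
        return len(FREQS_GHZ) - 1          # jump to max
    if util < 0.3 and f_idx > 0:
        return f_idx - 1                   # step down one level
    return f_idx

f_idx = 0
for _ in range(10):
    util = random.betavariate(0.5, 0.5)    # bursty stochastic utilization
    f_idx = ondemand_step(util, f_idx)
    print(f"util={util:.2f} -> {FREQS_GHZ[f_idx]} GHz")
```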
9

Viab-Cell, développement d'un logiciel viabiliste sur processeur multicoeurs pour la simulation de la morphogénèse / Development of a viabilist software on multi-core CPU for morphogenesis simulation

Sarr, Abdoulaye 08 December 2016 (has links)
This work presents a theoretical model of animal morphogenesis, as a complex system from which numerous cellular behaviors, internal processes, interactions, and expressions emerge. Its implementation rests on a multi-agent, cellular-automaton substrate with an energetic-genetic coupling between cellular dynamics and resources. Our main purpose is to provide tools for the numerical study of tissue development through a hybrid approach (discrete/continuous and qualitative/quantitative) that models the genetic, behavioral, and energetic aspects of cells. The modeling of these aspects draws on the principles of viability theory and on experimental data from the early division stages of the zebrafish embryo. Applying viability theory to morphogenesis, however, raises new challenges in computer science for implementing algorithms dedicated to morphological dynamics.

The choice of relevant biological data to consider in the model, the design of a model based on a new theory, the implementation of suitable algorithms on powerful processors, and the choice of experiments to test our proposals are the fundamental issues of this work. The hypotheses we propose are discussed through in silico experiments that focused mainly on the reachability and capturability of tissue forms; on the viability of a tissue's evolution over a time horizon; on uncovering new tissue properties and simulating tissue mechanisms essential to their controllability under perturbations; on new methods for characterizing pathological tissues; and so on. Such proposals are meant to complement in vitro and in vivo experiments and, in time, to allow a better understanding of the mechanisms governing tissue development. In particular, by computing viability kernels we highlighted the bottom-up causal relationship between the maintenance of cells, as a function of the available energy resources, and the viability of the growing tissue. The dynamics of each cell are tied to its energetic and genetic constitution. The model is parameterized through an interface that lets the user choose the number of cores to use for a simulation, so as to exploit the computing power of multicore hardware.
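For intuition, a minimal sketch of a discrete viability-kernel computation of the kind underlying such in silico experiments, on a toy state space (the states and transitions are invented, not the thesis's tissue model):

```python
# Discrete viability kernel: iteratively prune states from which every
# admissible transition leaves the constraint set; what remains is the
# set of states from which viable evolutions exist. Toy example.
transitions = {            # state -> states reachable in one step
    "A": {"B"}, "B": {"A", "C"}, "C": {"D"}, "D": set(),
}
constraint_set = {"A", "B", "C"}   # "viable" tissue states

def viability_kernel(states, succ):
    kernel = set(states)
    changed = True
    while changed:
        changed = False
        for s in list(kernel):
            if not (succ[s] & kernel):     # no successor stays viable
                kernel.discard(s)
                changed = True
    return kernel

print(viability_kernel(constraint_set, transitions))  # {'A', 'B'}
```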
10

Conception et intégration d'un convertisseur buck en technologie 28 nm CMOS orientée plateformes mobiles / Design and Integration of a buck converter in 28 nm CMOS technology for mobile platforms

Toni, Kotchikpa Arnaud 10 July 2019 (has links)
This thesis work presents the design of a three-state buck converter aimed at improving the dynamic regulation of microprocessor supply voltages. The converter topology is first implemented in IBM 180 nm CMOS technology to validate the transient performance of the three-state structure. This prototype uses a 3.6 V input and generates an output voltage of 0.8 V to 2 V. Its response to load transients shows only about 1% undershoot and 2% overshoot, demonstrating good dynamic behavior for a simple structure compared with the state of the art. The three-state converter is then integrated in 28 nm CMOS HPM, a technology mostly used for microprocessor design. Measurements on this prototype confirm the expected savings in energy and area as well as the dynamic response: the chip delivers 0.5 V to 1.2 V from a 1.8 V supply and reaches a 90% peak efficiency. Dynamic-regulation measurements show less than 5% noise on the processor supply and 10 mV/ns output-voltage switching for DVFS purposes.
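For reference, the operating points reported above are consistent with the ideal continuous-conduction buck relation Vout = D * Vin; a quick sanity check under that standard (idealized) assumption:

```python
# Ideal continuous-conduction buck approximation: Vout = D * Vin.
# Operating points taken from the two prototypes described above.
def duty_cycle(v_out, v_in):
    return v_out / v_in

for v_in, v_out in [(3.6, 0.8), (3.6, 2.0), (1.8, 0.5), (1.8, 1.2)]:
    print(f"Vin={v_in} V, Vout={v_out} V -> D = {duty_cycle(v_out, v_in):.2f}")
```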
