Global ETD Search

1	Constraint Programming for Wireless Sensor Networks Hassani Bijarbooneh, Farshid January 2015 (has links) In recent years, wireless sensor networks (WSNs) have grown rapidly and have had a substantial impact in many applications. A WSN is a network that consists of interconnected autonomous nodes that monitor physical and environmental conditions, such as temperature, humidity, pollution, etc. If required, nodes in a WSN can perform actions to affect the environment. WSNs present an interesting and challenging field of research due to the distributed nature of the network and the limited resources of the nodes. It is necessary for a node in a WSN to be small to enable easy deployment in an environment and consume as little energy as possible to prolong its battery lifetime. There are many challenges in WSNs, such as programming a large number of nodes, designing communication protocols, achieving energy efficiency, respecting limited bandwidth, and operating with limited memory. WSNs are further constrained due to the deployment of the nodes in indoor and outdoor environments and obstacles in the environment. In this dissertation, we study some of the fundamental optimisation problems related to the programming, coverage, mobility, data collection, and data loss of WSNs, modelled as standalone optimisation problems or as optimisation problems integrated with protocol design. Our proposed solution methods come from various fields of research including constraint programming, integer linear programming, heuristic-based algorithms, and data inference techniques. / ProFuN Constraint programming wireless sensor networks optimisation macroprogramming task mapping
2	Energy-efficient computation and communication scheduling for cluster-based in-network processing in large-scale wireless sensor networks Tian, Yuan 14 September 2006 (has links) No description available. wireless sensor networks clusered networks task mapping and scheduling
3	EXPLORATION OF RUNTIME DISTRIBUTED MAPPING TECHNIQUES FOR EMERGING LARGE SCALE MPSOCS / EXPLORATION DE TECHNIQUES D’ALLOCATION DE TÂCHES DYNAMIQUES ET DISTRIBUÉES POUR MPSOCS DE LARGE ÉCHELLE Grandi Mandelli, Marcelo 13 July 2015 (has links) MPSoCs (systèmes multiprocesseurs sur puces) avec des centaines de cœurs sont déjà disponibles sur le marché. Selon le ITRS, ces systèmes intégreront des milliers de cœurs à la fin de la décennie. La définition du cœur, où chaque tâche sera exécutée dans le système, est une question majeure dans la conception de MPSoCs. Dans la littérature, cette question est définie comme allocation de tâches. La croissance du nombre de cœurs augmente la complexité de l'allocation de tâches. Les principales préoccupations en matière d'allocation de tâches dans des grands MPSoCs incluent: (i) l'évolutivité; (ii) la charge de travail dynamique; et (iii) la fiabilité. Il est nécessaire de distribuer la décision d'allocation de tâches à travers le système afin d'assurer l'évolutivité. La charge de travail de grands MPSoCs peut être dynamique, à savoir, de nouvelles applications peuvent commencer à tout moment, conduisant à différents scénarios d'allocation. Par conséquent, il est nécessaire d'exécuter le processus d'allocation à l'exécution pour soutenir une charge de travail dynamique. La fiabilité est étroitement liée à la distribution de la charge de travail du système. Un déséquilibre de charge peut générer des hotspots et autres implications thermiques, ce qui peut entraîner un fonctionnement peu fiable du système. Dans de grands MPSoCs, les problèmes de fiabilité empirent puisque l'augmentation du nombre de cœurs sur la même puce augmente la densité de puissance et, par conséquent, la température du système. La littérature présente différentes techniques d'allocation de tâches pour améliorer la fiabilité du système. Cependant, ces techniques utilisent des approches d'allocation centralisées, qui ne sont pas évolutives. Pour répondre à ces trois défis, l'objectif principal de cette Thèse est de proposer et évaluer des heuristiques d'allocation de tâches distribuées et dynamiques en assurant l'évolutivité et une distribution équitable de la charge de travail. Une distribution équitable de la charge de travail et du trafic du NoC (réseau sur puce) augmente la fiabilité du système dans le long terme, en raison de la minimisation des régions de hotspot. Pour permettre l'exploration de l'espace de conception de grands MPSoCs, la première contribution de cette Thèse se situe dans le cadre d'une modélisation multi-niveaux, qui prend en compte différents modèles et de capacités de débogage qui enrichissent et facilitent la conception des MPSoCs. La simulation de modèles de niveau inférieur (par exemple RTL) génère des paramètres de performance utilisés pour calibrer des modèles abstraits (sans précision d'horloge). Les modèles abstraits permettent d'explorer des heuristiques d'allocation de tâches dans de grands systèmes. La plupart des techniques d'allocation de tâches se focalisent sur l'optimisation du volume de communication, ce qui peut compromettre la fiabilité du système, en raison d'une surcharge des processeurs. D'autre part, une heuristique qui optimise seulement la distribution de la charge de travail peut surcharger le NoC et compromettre sa fiabilité. La deuxième contribution importante de cette Thèse est la proposition d'heuristiques d'allocation de tâches dynamiques et distribuées, qui réalisent un compromis entre le volume de communication (liens du NoC) et la distribution de la charge de travail (de l'utilisation des processeurs). Des résultats liés au temps d'exécution, au volume de la communication, à la consommation d'énergie, aux traces de puissance et à la distribution de la température dans les grands MPSoCs (144 processeurs) confirment l'hypothèse de compromis. Faire un compromis entre la réduction du volume de communication et une distribution équitable de la charge de travail améliore le système de manière fiable grâce à la réduction des régions de hotspots, sans compromettre la performance du système. / MPSoCs with hundreds of cores are already available in the market. According to the ITRS roadmap, such systems will integrate thousands of cores by the end of the decade. The definition of where each task will execute in the system is a major issue in the MPSoC design. In the literature, this issue is defined as task mapping. The growth in the number of cores increases the complexity of the task mapping. The main concerns in task mapping in large systems include: (i) scalability; (ii) dynamic workload; and (iii) reliability. It is necessary to distribute the mapping decision across the system to ensure scalability. The workload of emerging large MPSoCs may be dynamic, i.e., new applications may start at any moment, leading to different mapping scenarios. Therefore, it is necessary to execute the mapping process at runtime to support a dynamic workload. Reliability is tightly connected to the system workload distribution. Load imbalance may generate hotspots zones and consequently thermal implications, which may result in unreliable system operation. In large scale MPSoCs, reliability issues get worse since the growing number of cores on the same die increases power densities and, consequently, the system temperature. The literature presents different task mapping techniques to improve system reliability. However, such approaches use a centralized mapping approach, which are not scalable. To address these three challenges, the main goal of this Thesis is to propose and evaluate distributed mapping heuristics, executed at runtime, ensuring scalability and a fair workload distribution. Distributing the workload and the traffic inside the NoC increases the system reliability in long-term, due to the minimization of hotspot regions. To enable the design space exploration of large MPSoCs the first contribution of the Thesis lies in a multi-level modeling framework, which supports different models and debugging capabilities that enrich and facilitate the design of MPSoCs. The simulation of lower level models (e.g. RTL) generates performance parameters used to calibrate abstract models (e.g. untimed models). The abstract models pave the way to explore mapping heuristics in large systems. Most mapping techniques focus on optimizing communication volume in the NoC, which may compromise reliability due to overload processors. On the other hand, a heuristic optimizing only the workload distribution may overload NoC links, compromising its reliability. The second significant contribution of the Thesis is the proposition of dynamic and distributed mapping heuristics, making a tradeoff between communication volume (NoC links) and workload distribution (CPU usage). Results related to execution time, communication volume, energy consumption, power traces and temperature distribution in large MPSoCs (144 processors) confirm the tradeoff hypothesis. Trading off workload and communication volume improves system reliably through the reduction of hotspots regions, without compromising system performance. Modélisation Gestion de MPSoCs Allocation de Tâches NoC MPSoC SoC Modeling System Management Task Mapping NoC MPSoC SoC
4	Energy-efficient mapping and pipeline for the multi-resource systems with multiple supply voltages Wu, Kun-Yi 13 August 2007 (has links) Since the development of SoC is very fast, how to reduce the power consumption of SoC and improve the performance of SoC has become a very important issue. The power consumption of a system depends upon the hardware and software of a system. To overcome the issue of power consumption, the hardware circuit provides multi-voltage method to reduce task power consumption. On the other hand, the software tool decides the exact voltage for each task to minimize the total power consumption and finds a pipelined schedule of the periodic tasks to enhance the total throughput. In this thesis, a Tabu search is used to solve the voltage mapping and resource mapping problems of multi-voltage systems. This goal of this Tabu search is to find the solution with minimal power consumption for the multi-voltage system under the time constraints and resource constraints at the same time in the multi-voltage system to. Under the throughput constraints we use Tabu search to find solutions including the task¡¦s execution voltage and resource mapping, and then use list pipelined scheduling to schedule task and data communication and check their correctness. This method can reduce total power consumption. Experimental results show that our proposed algorithm can decide the resources mapping and pipeline in seconds, and it can reduce the power consumption efficiently. tabu search pipeline schedule Task mapping system level low power voltage mapping DVS
5	Automatic task and data mapping in shared memory architectures / Mapeamento automático de processos e dados em arquiteturas de memória compartilhada Diener, Matthias January 2015 (has links) Arquiteturas paralelas modernas têm hierarquias de memória complexas, que consistem de vários níveis de memórias cache privadas e compartilhadas, bem como Non-Uniform Memory Access (NUMA) devido a múltiplos controladores de memória por sistema. Um dos grandes desafios dessas arquiteturas é melhorar a localidade e o balanceamento de acessos à memória de tal forma que a latência média de acesso à memória é reduzida. Dessa forma, o desempenho e a eficiência energética de aplicações paralelas podem ser melhorados. Os acessos podem ser melhorados de duas maneiras: (1) processos que acessam dados compartilhados (comunicação entre processos) podem ser alocados em unidades de execução próximas na hierarquia de memória, a fim de melhorar o uso das caches. Esta técnica é chamada de mapeamento de processos. (2) Mapear as páginas de memória que cada processo acessa ao nó NUMA que ele está sendo executado, assim, pode-se reduzir o número de acessos a memórias remotas em arquiteturas NUMA. Essa técnica é conhecida como mapeamento de dados. Para melhores resultados, os mapeamentos de processos e dados precisam ser realizados de forma integrada. Trabalhos anteriores nesta área executam os mapeamentos separadamente, o que limita os ganhos que podem ser alcançados. Além disso, a maioria dos mecanismos anteriores exigem operações caras, como traços de acessos à memória, para realizar o mapeamento, além de exigirem mudanças no hardware ou na aplicação paralela. Estes mecanismos não podem ser considerados soluções genéricas para o problema de mapeamento. Nesta tese, fazemos duas contribuições principais para o problema de mapeamento. Em primeiro lugar, nós introduzimos um conjunto de métricas e uma metodologia para analisar aplicações paralelas, a fim de determinar a sua adequação para um melhor mapeamento e avaliar os possíveis ganhos que podem ser alcançados através desse mapeamento otimizado. Em segundo lugar, propomos um mecanismo que executa o mapeamento de processos e dados online. Este mecanismo funciona no nível do sistema operacional e não requer alterações no hardware, os códigos fonte ou bibliotecas. Uma extensa avaliação com múltiplos conjuntos de carga de trabalho paralelos mostram consideráveis melhorias em desempenho e eficiência energética. / Reducing the cost of memory accesses, both in terms of performance and energy consumption, is a major challenge in shared-memory architectures. Modern systems have deep and complex memory hierarchies with multiple cache levels and memory controllers, leading to a Non-Uniform Memory Access (NUMA) behavior. In such systems, there are two ways to improve the memory affinity: First, by mapping tasks that share data (communicate) to cores with a shared cache, cache usage and communication performance are improved. Second, by mapping memory pages to memory controllers that perform the most accesses to them and are not overloaded, the average cost of accesses is reduced. We call these two techniques task mapping and data mapping, respectively. For optimal results, task and data mapping need to be performed in an integrated way. Previous work in this area performs the mapping only separately, which limits the gains that can be achieved. Furthermore, most previous mechanisms require expensive operations, such as communication or memory access traces, to perform the mapping, require changes to the hardware or to the parallel application, or use a simple static mapping. These mechanisms can not be considered generic solutions for the mapping problem. In this thesis, we make two contributions to the mapping problem. First, we introduce a set of metrics and a methodology to analyze parallel applications in order to determine their suitability for an improved mapping and to evaluate the possible gains that can be achieved using an optimized mapping. Second, we propose two automatic mechanisms that perform task mapping and combined task/data mapping, respectively, during the execution of a parallel application. These mechanisms work on the operating system level and require no changes to the hardware, the applications themselves or their runtime libraries. An extensive evaluation with parallel applications from multiple benchmark suites as well as real scientific applications shows substantial performance and energy efficiency improvements that are significantly higher than simple mechanisms and previous work, while maintaining a low overhead. Arquiteturas paralelas Processamento paralelo Banco : Dados Memória compartilhada Task mapping Data mapping Shared memory Multicore NUMA
6	Automatic task and data mapping in shared memory architectures / Mapeamento automático de processos e dados em arquiteturas de memória compartilhada Diener, Matthias January 2015 (has links) Arquiteturas paralelas modernas têm hierarquias de memória complexas, que consistem de vários níveis de memórias cache privadas e compartilhadas, bem como Non-Uniform Memory Access (NUMA) devido a múltiplos controladores de memória por sistema. Um dos grandes desafios dessas arquiteturas é melhorar a localidade e o balanceamento de acessos à memória de tal forma que a latência média de acesso à memória é reduzida. Dessa forma, o desempenho e a eficiência energética de aplicações paralelas podem ser melhorados. Os acessos podem ser melhorados de duas maneiras: (1) processos que acessam dados compartilhados (comunicação entre processos) podem ser alocados em unidades de execução próximas na hierarquia de memória, a fim de melhorar o uso das caches. Esta técnica é chamada de mapeamento de processos. (2) Mapear as páginas de memória que cada processo acessa ao nó NUMA que ele está sendo executado, assim, pode-se reduzir o número de acessos a memórias remotas em arquiteturas NUMA. Essa técnica é conhecida como mapeamento de dados. Para melhores resultados, os mapeamentos de processos e dados precisam ser realizados de forma integrada. Trabalhos anteriores nesta área executam os mapeamentos separadamente, o que limita os ganhos que podem ser alcançados. Além disso, a maioria dos mecanismos anteriores exigem operações caras, como traços de acessos à memória, para realizar o mapeamento, além de exigirem mudanças no hardware ou na aplicação paralela. Estes mecanismos não podem ser considerados soluções genéricas para o problema de mapeamento. Nesta tese, fazemos duas contribuições principais para o problema de mapeamento. Em primeiro lugar, nós introduzimos um conjunto de métricas e uma metodologia para analisar aplicações paralelas, a fim de determinar a sua adequação para um melhor mapeamento e avaliar os possíveis ganhos que podem ser alcançados através desse mapeamento otimizado. Em segundo lugar, propomos um mecanismo que executa o mapeamento de processos e dados online. Este mecanismo funciona no nível do sistema operacional e não requer alterações no hardware, os códigos fonte ou bibliotecas. Uma extensa avaliação com múltiplos conjuntos de carga de trabalho paralelos mostram consideráveis melhorias em desempenho e eficiência energética. / Reducing the cost of memory accesses, both in terms of performance and energy consumption, is a major challenge in shared-memory architectures. Modern systems have deep and complex memory hierarchies with multiple cache levels and memory controllers, leading to a Non-Uniform Memory Access (NUMA) behavior. In such systems, there are two ways to improve the memory affinity: First, by mapping tasks that share data (communicate) to cores with a shared cache, cache usage and communication performance are improved. Second, by mapping memory pages to memory controllers that perform the most accesses to them and are not overloaded, the average cost of accesses is reduced. We call these two techniques task mapping and data mapping, respectively. For optimal results, task and data mapping need to be performed in an integrated way. Previous work in this area performs the mapping only separately, which limits the gains that can be achieved. Furthermore, most previous mechanisms require expensive operations, such as communication or memory access traces, to perform the mapping, require changes to the hardware or to the parallel application, or use a simple static mapping. These mechanisms can not be considered generic solutions for the mapping problem. In this thesis, we make two contributions to the mapping problem. First, we introduce a set of metrics and a methodology to analyze parallel applications in order to determine their suitability for an improved mapping and to evaluate the possible gains that can be achieved using an optimized mapping. Second, we propose two automatic mechanisms that perform task mapping and combined task/data mapping, respectively, during the execution of a parallel application. These mechanisms work on the operating system level and require no changes to the hardware, the applications themselves or their runtime libraries. An extensive evaluation with parallel applications from multiple benchmark suites as well as real scientific applications shows substantial performance and energy efficiency improvements that are significantly higher than simple mechanisms and previous work, while maintaining a low overhead. Arquiteturas paralelas Processamento paralelo Banco : Dados Memória compartilhada Task mapping Data mapping Shared memory Multicore NUMA
7	Automatic task and data mapping in shared memory architectures / Mapeamento automático de processos e dados em arquiteturas de memória compartilhada Diener, Matthias January 2015 (has links) Arquiteturas paralelas modernas têm hierarquias de memória complexas, que consistem de vários níveis de memórias cache privadas e compartilhadas, bem como Non-Uniform Memory Access (NUMA) devido a múltiplos controladores de memória por sistema. Um dos grandes desafios dessas arquiteturas é melhorar a localidade e o balanceamento de acessos à memória de tal forma que a latência média de acesso à memória é reduzida. Dessa forma, o desempenho e a eficiência energética de aplicações paralelas podem ser melhorados. Os acessos podem ser melhorados de duas maneiras: (1) processos que acessam dados compartilhados (comunicação entre processos) podem ser alocados em unidades de execução próximas na hierarquia de memória, a fim de melhorar o uso das caches. Esta técnica é chamada de mapeamento de processos. (2) Mapear as páginas de memória que cada processo acessa ao nó NUMA que ele está sendo executado, assim, pode-se reduzir o número de acessos a memórias remotas em arquiteturas NUMA. Essa técnica é conhecida como mapeamento de dados. Para melhores resultados, os mapeamentos de processos e dados precisam ser realizados de forma integrada. Trabalhos anteriores nesta área executam os mapeamentos separadamente, o que limita os ganhos que podem ser alcançados. Além disso, a maioria dos mecanismos anteriores exigem operações caras, como traços de acessos à memória, para realizar o mapeamento, além de exigirem mudanças no hardware ou na aplicação paralela. Estes mecanismos não podem ser considerados soluções genéricas para o problema de mapeamento. Nesta tese, fazemos duas contribuições principais para o problema de mapeamento. Em primeiro lugar, nós introduzimos um conjunto de métricas e uma metodologia para analisar aplicações paralelas, a fim de determinar a sua adequação para um melhor mapeamento e avaliar os possíveis ganhos que podem ser alcançados através desse mapeamento otimizado. Em segundo lugar, propomos um mecanismo que executa o mapeamento de processos e dados online. Este mecanismo funciona no nível do sistema operacional e não requer alterações no hardware, os códigos fonte ou bibliotecas. Uma extensa avaliação com múltiplos conjuntos de carga de trabalho paralelos mostram consideráveis melhorias em desempenho e eficiência energética. / Reducing the cost of memory accesses, both in terms of performance and energy consumption, is a major challenge in shared-memory architectures. Modern systems have deep and complex memory hierarchies with multiple cache levels and memory controllers, leading to a Non-Uniform Memory Access (NUMA) behavior. In such systems, there are two ways to improve the memory affinity: First, by mapping tasks that share data (communicate) to cores with a shared cache, cache usage and communication performance are improved. Second, by mapping memory pages to memory controllers that perform the most accesses to them and are not overloaded, the average cost of accesses is reduced. We call these two techniques task mapping and data mapping, respectively. For optimal results, task and data mapping need to be performed in an integrated way. Previous work in this area performs the mapping only separately, which limits the gains that can be achieved. Furthermore, most previous mechanisms require expensive operations, such as communication or memory access traces, to perform the mapping, require changes to the hardware or to the parallel application, or use a simple static mapping. These mechanisms can not be considered generic solutions for the mapping problem. In this thesis, we make two contributions to the mapping problem. First, we introduce a set of metrics and a methodology to analyze parallel applications in order to determine their suitability for an improved mapping and to evaluate the possible gains that can be achieved using an optimized mapping. Second, we propose two automatic mechanisms that perform task mapping and combined task/data mapping, respectively, during the execution of a parallel application. These mechanisms work on the operating system level and require no changes to the hardware, the applications themselves or their runtime libraries. An extensive evaluation with parallel applications from multiple benchmark suites as well as real scientific applications shows substantial performance and energy efficiency improvements that are significantly higher than simple mechanisms and previous work, while maintaining a low overhead. Arquiteturas paralelas Processamento paralelo Banco : Dados Memória compartilhada Task mapping Data mapping Shared memory Multicore NUMA
8	Load-Balancing and Task Mapping for Exascale Systems Deveci, Mehmet 22 May 2015 (has links) No description available. Computer Science
9	Methodology and tools for energy-aware task mapping on heterogeneous multiprocessor architectures / Méthodes et outils permettant le placement de taches efficaces en énergie sur architectures multicoeurs hétérogènes Roux, Baptiste 23 November 2017 (has links) Au cours de la dernière décennie, la conception des systèmes embarqués a évolué dans l'optique d'augmenter la puissance de calcul tout en conservant une faible consommation d'énergie. À titre d'exemple, les véhicules autonomes tels que les drones sont un domaine d'application représentatif qui combine de la vision, des communications sans fil avec d'autres noyaux de calculs intensifs, le tout avec un budget énergétique limité. Avec l'avènement des systèmes multicœurs sur puce (MpSoC), la simplification des processeurs a diminué la consommation d'énergie par opération, alors que leur multiplication a amélioré les performances. Cependant, l'apparition du phénomène de ''dark silicon'' a conduit à l'intégration d'accélérateurs matériels spécialisés au sein des systèmes multicœurs. C'est ainsi que sont nées les architectures massivement multicœurs hétérogènes (HMpSoC) combinant des processeurs généralistes (SW) et des accélérateurs matériels (HW). Pour ces architectures hétérogènes, les performances et la consommation d'énergie dépendent d'un large ensemble de paramètres tels que le partitionnement HW/SW, le type d'implémentation HW et le coût de communication entre les organes de calcul HW et SW conduisant ainsi à un immense espace de conception. Dans cette thèse, nous étudions des méthodes permettant la réduction de la complexité de développement et de mise en oeuvre d'applications efficaces en énergie sur HMpSoC. De nombreuses contributions sont proposées pour améliorer les outils d'exploration de l'espace de conception (DSE) avec des objectifs énergétiques. Tout d'abord, une définition formelle de la structure HMpSoC est introduite ainsi qu'une méthode de représentation générique axée sur la hiérarchie mémoire. Ensuite, un outil de modélisation rapide de l'énergie est proposé et validé sur plusieurs applications. Ce modèle énergétique sépare les sources d'énergie en trois catégories (calcul statique, dynamique et communications) et calcule leurs contributions sur la consommation globale de manière indépendante. Basée sur une étude précise des communications, cette approche calcule rapidement la consommation d'énergie pour une répartition donnée d'application sur un HMpSoC. Dans un deuxième temps, nous proposons une méthodologie permettant l'exploration énergétique d'accélérateurs sur HmpSoC. Cette méthode s'appuie sur le modèle de consommation précédent couplé à une formulation de programmation linéaire en nombre entier mixte (MILP). Cela permet de sélectionner efficacement les accélérateurs HW et le partitionnement HW/SW et ainsi d'obtenir une implémentation efficace en énergie pour une application tuilée. Les expériences réalisées ont montré la complexité du processus de validation d'outils/algorithmes de DSE sur une large gamme d'applications et d'architectures. Afin de résoudre ce problème, nous proposons un simulateur d'architectures HMpSoC intégrant un modèle de consommation permettant d'observer l'exécution d'applications. La structure de l'architecture cible est décrite à l'aide d'un fichier de configuration basé sur le modèle de représentation générique précédent. Ce fichier est chargé dynamiquement lors du démarrage du simulateur. De plus, ce simulateur est associé à un générateur d'applications permettant la création d'un large ensemble d'applications représentatives du domaine. Ce générateur se base sur un ensemble de schémas de calcul et de communication élémentaire qu'il combine pour obtenir une application complète. Les applications ainsi obtenues peuvent être enrichies par des informations de placement et automatiquement exécutées sur le simulateur. Cet ensemble d'outils a pour objectif de faciliter la validation de nouveaux algorithmes ciblant le placement efficace en énergie d'application sur une large gamme d'architectures HMpSoC. / During the last decade, the design of embedded systems was pushed to increase computational power while maintaining low energy consumption. As an example, autonomous vehicles such as drones are a representative application domain which combines vision, wireless communications and other computation intensive kernels constrained with a limited energy budget. With the advent of Multiprocessor System-on-Chip (MpSoC) architectures, simplification of processor cores decreased power consumption per operation, while the multiplication of cores brought performance improvement. However, the ''dark silicon'' issue led to the benefit of augmenting programmable processors with specialized hardware accelerators and to the rise of Heterogeneous MpSoC (HMpSoC) combining both software (SW) and hardware (HW) computational resources. For these heterogeneous architectures, performance and energy consumption depend on a large set of parameters such as the HW/SW partitioning, the type of HW implementation or the communication cost between HW and SW cores therefore leading to a huge design space. In this thesis, we study how to reduce the development and implementation complexity of energy-efficient applications on HMpSoC. Multiple contributions are proposed to enhance Design Space Exploration (DSE) tools with energy objectives. First, a formal definition of HMpSoC structure is introduced alongside with a generic representation focused on the memory hierarchy. Then, a fast power modelling tool is proposed and validated on several applications. This power model separates the power sources in three families (static, dynamic computation and dynamic communication) and computes their contributions on global consumption independently. With a fine grain communications study, this approach rapidly computes energy consumption for a given application mapping on a HMpSoC. In a second time, we propose a methodology for energy-driven accelerator exploration on HMpSoC. This method builds upon the previous power model coupled with an Mixed Integer Linear Programming (MILP) formulation and enables to efficiently select HW accelerators and HW/SW partitioning which achieve energy efficient-mapping of a tiled application. The experiments involved in these contributions show the complexity of DSE validation process on a wide range of applications and architectures. To address these issues, we introduce a HMpSoC simulator embedding a power model to monitor application execution. Properties of targeted architectures are described, at run-time with the previous generic representation model. Furthermore, this simulator is coupled with an application generator framework that could build an infinite set of representative applications following predefined computation models. The obtained applications could then be enriched with mapping directive and executed on the simulator. This combination enables to ease the research and validation of new DSE algorithms targeting energy-aware application mapping on a wide range of HMpSoC architectures. Efficacité énergétique Modèle de consommation Energy-Aware Fast power model Heterogeneous architecture Task mapping
10	Stratégie de placement et d'ordonnancement de taches logicielles pour architectures reconfigurables sous contrainte énergétique / Mapping and scheduling strategy of OS tasks into reconfigurable architectures under energy constraint Gammoudi, Aymen 26 June 2018 (has links) La conception de systèmes temps-réel embarqués se développe de plus en plus avec l’intégration croissante de fonctionnalités critiques pour les applications de surveillance, notamment dans le domaine biomédical, environnemental, domotique, etc. Le développement de ces systèmes doit relever divers défis en termes de minimisation de la consommation énergétique. Gérer de tels dispositifs embarqués, entièrement autonomes, nécessite cependant de résoudre différents problèmes liés à la quantité d’énergie disponible dans la batterie, à l’ordonnancement temps-réel des tâches qui doivent être exécutées avant leurs échéances, aux scénarios de reconfiguration, particulièrement dans le cas d’ajout de tâches, et à la contrainte de communication pour pouvoir assurer l’échange des messages entre les processeurs, de façon à assurer une autonomie durable jusqu’à la prochaine recharge et ce, tout en maintenant un niveau de qualité de service acceptable du système de traitement. Pour traiter cette problématique, nous proposons dans ces travaux une stratégie de placement et d’ordonnancement de tâches permettant d’exécuter des applications temps-réel sur une architecture contenant des cœurs hétérogènes. Dans cette thèse, nous avons choisi d’aborder cette problématique de façon incrémentale pour traiter progressivement les problèmes liés aux contraintes temps-réel, énergétique et de communications. Tout d’abord, nous nous intéressons particulièrement à l’ordonnancement des tâches sur une architecture mono-cœur. Nous proposons une stratégie d’ordonnancement basée sur le regroupement des tâches dans des packs pour pouvoir calculer facilement les nouveaux paramètres des tâches afin de réobtenir la faisabilité du système. Puis, nous l’avons étendu pour traiter le cas de l’ordonnancement sur une architecture multi-cœurs homogènes. Finalement, une extension de ce dernier sera réalisée afin d’arriver à l’objectif principal qui est l’ordonnancement des tâches pour les architectures hétérogènes. L’idée est de prendre progressivement en compte des contraintes d’exécution de plus en plus complexes. Nous formalisons tous les problèmes en utilisant la formulation ILP afin de pouvoir produire des résultats optimaux. L’idée est de pouvoir situer nos solutions proposées par rapport aux solutions optimales produites par un solveur et par rapport aux autres algorithmes de l’état de l’art. Par ailleurs, la validation par simulation des stratégies proposées montre qu’elles engendrent un gain appréciable vis-à-vis des critères considérés importants dans les systèmes embarqués, notamment le coût de la communication entre cœurs et le taux de rejet des tâches. / The design of embedded real-time systems is developing more and more with the increasing integration of critical functionalities for monitoring applications, particularly in the biomedical, environmental, home automation, etc. The developement of these systems faces various challenges particularly in terms of minimizing energy consumption. Managing such autonomous embedded devices, requires solving various problems related to the amount of energy available in the battery and the real-time scheduling of tasks that must be executed before their deadlines, to the reconfiguration scenarios, especially in the case of adding tasks, and to the communication constraint to be able to ensure messages exchange between cores, so as to ensure a lasting autonomy until the next recharge, while maintaining an acceptable level of quality of services for the processing system. To address this problem, we propose in this work a new strategy of placement and scheduling of tasks to execute real-time applications on an architecture containing heterogeneous cores. In this thesis, we have chosen to tackle this problem in an incremental manner in order to deal progressively with problems related to real-time, energy and communication constraints. First of all, we are particularly interested in the scheduling of tasks for single-core architecture. We propose a new scheduling strategy based on grouping tasks in packs to calculate the new task parameters in order to re-obtain the system feasibility. Then we have extended it to address the scheduling tasks on an homogeneous multi-core architecture. Finally, an extension of the latter will be achieved in order to realize the main objective, which is the scheduling of tasks for the heterogeneous architectures. The idea is to gradually take into account the constraints that are more and more complex. We formalize the proposed strategy as an optimization problem by using integer linear programming (ILP) and we compare the proposed solutions with the optimal results provided by the CPLEX solver. Inaddition, the validation by simulation of the proposed strategies shows that they generate a respectable gain compared with the criteria considered important in embedded systems, in particular the cost of communication between cores and the rate of new tasks rejection. Architecture multi-Cœur Reconfiguration Placement de tâches Ordonnancement temps-Réel Contrainte énergétique Multi-Core architecture Reconfiguration Task mapping Real-Time scheduling Energy constraint

Search results