Global ETD Search

21	Problème du Consensus dans le Modèle Homonyme Tran-The, Hung 06 June 2013 (has links) (PDF) So far, the distributed computing community has either assumed that all the processes of a distributed system have distinct identifiers or, more rarely, that the processes are anonymous and have no identifiers. These are two extremes of the same general model, in which n processes use l different identifiers, where 1 ≤ l ≤ n. We call this the homonymous model. To determine the power of the homonymous model as well as the importance of identifiers in distributed computing, this thesis studies the consensus problem, one of the most famous problems in distributed computing. We give necessary and sufficient conditions on the number of identifiers for solving consensus in a distributed system with t faulty processes in the synchronous case. We show that in the crash, send-omission, and general-omission failure models, uniform consensus is solvable even if processes are anonymous. Thus, identifiers are not useful in that case. However, identifiers become important in the Byzantine failure model: 3t + 1 identifiers are necessary and sufficient for Byzantine agreement. Surprisingly, the number of identifiers must be greater than (n + 3t)/2 in the presence of three facets of uncertainty: partial synchrony, Byzantine failures, and homonyms. This demonstrates two differences from the classical model (which has l = n): there are situations where relaxing synchrony to partial synchrony renders agreement impossible, and, in the partially synchronous case, increasing the number of correct processes can actually make it harder to reach agreement. consensus
22	Compiling for a multithreaded dataflow architecture : algorithms, tools, and experience Li, Feng 20 May 2014 (has links) (PDF) Across the wide range of multiprocessor architectures, all seem to share one common problem: they are hard to program. It is a general belief that parallelism is a software problem, and that perhaps we need more sophisticated compilation techniques to partition the application into concurrent threads. Many experts also make the point that the underlying architecture plays an equally important role, and must be addressed before one may expect significant progress in the programmability of multiprocessors. Our approach favors a convergence of these viewpoints. The convergence of dataflow and von Neumann architectures promises latency tolerance, the exploitation of a high degree of parallelism, and low thread-switching cost. Multithreaded dataflow architectures require a high degree of parallelism to tolerate latency. On the other hand, it is error-prone for programmers to partition a program into a large number of fine-grain threads. To reconcile these facts, we aim to advance the state of the art in automatic thread partitioning, in combination with programming language support for coarse-grain, functionally deterministic concurrency. This thesis presents a general thread partitioning algorithm for transforming sequential code into a parallel dataflow program targeting a multithreaded dataflow architecture. Our algorithm operates on the program dependence graph and on the static single assignment form, extracting task, pipeline, and data parallelism from arbitrary control flow, and coarsening its granularity using a generalized form of typed fusion. We design a new intermediate representation to ease code generation for an explicit token match dataflow execution model. We also implement a GCC-based prototype.
We also evaluate coarse-grain dataflow extensions of OpenMP in the context of a large-scale, 1024-core simulated multithreaded dataflow architecture. These extensions and the simulated architecture allow the exploration of innovative memory models for dataflow computing. We evaluate these tools and models on realistic applications. Dataflow Multiprocessors
23	Benchmark-driven Approaches to Performance Modeling of Multi-Core Architectures Putigny, Bertrand 27 March 2014 (has links) (PDF) This manuscript falls within the field of high-performance computing (HPC), where the growing demand for performance pushes processor manufacturers to integrate increasingly sophisticated mechanisms. This growing complexity makes the architectures difficult to use. Performance modeling of multi-core architectures makes it possible to feed information back to users, that is, programmers, so that they can better exploit the hardware. However, because of the lack of documentation and the complexity of modern processors, this modeling is often difficult. The goal of this manuscript is to use performance measurements of small code fragments to compensate for the lack of information about the hardware. These experiments, called micro-benchmarks, make it possible to understand the performance of modern architectures without depending on the availability of technical documentation. The first chapter presents the hardware architecture of modern processors and, in particular, the features that make performance modeling complex. The second chapter presents an automatic methodology for measuring the performance of arithmetic instructions. The information obtained by this method forms the basis for computational models that predict the execution time of arithmetic code fragments. This chapter also shows how such models can be used to optimize energy efficiency, taking the SCC processor as an example. The last part of this chapter motivates building a memory model that takes cache coherence into account to predict data access time.
The third chapter presents the micro-benchmark development environment used to characterize memory hierarchies with cache coherence. This chapter also presents a comparative study of the memory performance of different architectures and of the impact of the choice of coherence protocol on performance. Finally, the fourth chapter presents a memory model for predicting data access time for regular OpenMP-style applications. The model relies on the state of the data in the coherence protocol. This state evolves during program execution according to the memory accesses. A cost function is associated with each transition. This function is derived directly from the results of the experiments carried out in the third chapter and makes it possible to predict memory access time. A proof of concept of the reliability of this model is given, on the one hand on algebra and numerical analysis applications, and on the other hand by using the model to model the performance of MPI communications in shared memory. modeling performance HPC
24	Programmation efficace et sécurisé d'applications à mémoire partagée Sifakis, Emmanuel 06 May 2013 (has links) (PDF) The massive use of multi-core and multi-processor platforms favors shared-memory parallel programming. Nevertheless, exploiting parallelism on these platforms efficiently and correctly remains an open research problem. Moreover, their underlying execution model, and in particular "relaxed" memory models, poses new challenges for static and dynamic analysis tools. In this thesis we address two important aspects of programming on multi-core and multi-processor platforms: the optimization of critical sections implemented with the pessimistic approach, and dynamic information flow analysis. Critical sections define a set of memory accesses that must be executed atomically. Their pessimistic implementation relies on acquiring and releasing synchronization mechanisms, such as locks, at the beginning and end of critical sections. We present a generic algorithm for acquiring and releasing synchronization mechanisms, and we define on this algorithm a particular set of policies that aim to increase parallelism by reducing the time during which the various threads hold locks. We then prove the correctness of these policies (atomicity is respected and deadlocks are absent) and validate their benefit experimentally. The second topic is dynamic information flow analysis for parallel executions. In this type of analysis, the challenge is to define precisely the order in which accesses to shared memory can occur at run time. Most existing work on this topic relies on a serialized execution of the target program.
This yields an explicit serialization of memory accesses but incurs an execution-time overhead and ignores the effect of relaxed memory models. In contrast, the technique we propose predicts the set of possible serializations with respect to this memory model from a single parallel execution ("runtime prediction"). We developed this approach in the context of taint analysis, which is widely used in vulnerability detection. To improve the precision of this analysis, we also take into account the semantics of synchronization primitives, which reduce the number of valid serializations. The proposed work has been implemented in prototype tools, which allowed its evaluation on representative examples. Concurrency Atomicity Vulnerability
25	Une démarche orientée modèle pour le déploiement de systèmes en environnements ouverts distribués Dubus, Jérémy 10 October 2008 (has links) (PDF) Deployment remains one of the least standardized and least tool-supported stages of the software life cycle to date. In this work, we identify four major challenges to address in order to deploy distributed and heterogeneous software systems. The first challenge is to initiate the missing consensus around a generic software deployment language. The second challenge is the static verification of software deployments described in this language, to ensure that they will proceed correctly before the deployment operations are executed. The third challenge is to build a middleware platform capable of interpreting this language and deploying any distributed software system. Finally, the fourth challenge is to apply these system deployments in open distributed environments, that is, fluctuating, large-scale networks such as ubiquitous networks or computing grids. Our contribution is to define a deployment approach for distributed systems centered on four roles in order to meet these challenges: the network expert, the software expert, the system administrator, and the business architect. On the one hand, the DeployWare approach, which conforms to model-driven engineering, is defined by a multi-role metamodel for describing the deployment of the middleware layer of the system, together with a virtual machine capable of automatically executing the deployment of this layer. Using a metamodeling language makes it possible to write programs for the static verification of deployment models. On the other hand, the DACAR approach proposes a generic architecture metamodel for expressing and executing the deployment of a component-based business application.
This dual DeployWare/DACAR approach makes it possible to take into account, when describing the deployment, the properties of open distributed environments, in a manner consistent with autonomic computing. Our contribution is validated by several experiments that assess its ability to handle ubiquitous open environments and test the heterogeneity of the technologies that can be deployed in the world of enterprise services. distributed system deployment
26	AN EFFECTIVE PARALLEL PARTICLE SWARM OPTIMIZATION ALGORITHM AND ITS PERFORMANCE EVALUATION Maripi, Jagadish Kumar 01 December 2010 (has links) Population-based global optimization algorithms, including Particle Swarm Optimization (PSO), have become popular for solving multi-optima problems much more efficiently than traditional mathematical techniques. In this research, we present and evaluate a new parallel PSO algorithm that provides a significant performance improvement compared to the serial PSO algorithm. Instead of merely assigning parts of the serial version's task to several processors, the new algorithm places multiple swarms on the available nodes, where they operate independently while collaborating on the same task. With the reduction of the communication bottleneck as well as the ability to manipulate the individual swarms independently, the proposed approach outperforms the original PSO algorithm while maintaining its simplicity and ease of implementation. artificial intelligence cluster computing parallel computing Particle swarm optimization pso swarm intelligence
27	System Support for Large-scale Geospatial Data Analytics January 2020 (has links) abstract: The volume of available spatial data has increased tremendously. Such data includes but is not limited to: weather maps, socioeconomic data, vegetation indices, geotagged social media, and more. These applications need a powerful data management platform to support scalable and interactive analytics on big spatial data. Even though existing single-node spatial database systems (DBMSs) provide support for spatial data, they suffer from performance issues when dealing with big spatial data. Challenges to building large-scale spatial data systems are as follows: (1) System Scalability: The massive-scale of available spatial data hinders making sense of it using traditional spatial database management systems. Moreover, large-scale spatial data, besides its tremendous storage footprint, may be extremely difficult to manage and maintain due to the heterogeneous shapes, skewed data distribution and complex spatial relationship. (2) Fast analytics: When the user runs spatial data analytics applications using graphical analytics tools, she does not tolerate delays introduced by the underlying spatial database system. Instead, the user needs to see useful information quickly. In this dissertation, I focus on designing efficient data systems and data indexing mechanisms to bolster scalable and interactive analytics on large-scale geospatial data. I first propose a cluster computing system GeoSpark which extends the core engine of Apache Spark and Spark SQL to support spatial data types, indexes, and geometrical operations at scale. In order to reduce the indexing overhead, I propose Hippo, a fast, yet scalable, sparse database indexing approach. In contrast to existing tree index structures, Hippo stores disk page ranges (each works as a pointer of one or many pages) instead of tuple pointers in the indexed table to reduce the storage space occupied by the index.
Moreover, I present Tabula, a middleware framework that sits between a SQL data system and a spatial visualization dashboard to make the user experience with the dashboard more seamless and interactive. Tabula adopts a materialized sampling cube approach, which pre-materializes samples, not for the entire table as in the SampleFirst approach, but for the results of potentially unforeseen queries (represented by an OLAP cube cell). / Dissertation/Thesis / Doctoral Dissertation Computer Science 2020 Computer science big data cluster computing data visualization database index distributed data systems geospatial data
28	Performance analysis and improvement of InfiniBand networks. Modelling and effective Quality-of-Service mechanisms for interconnection networks in cluster computing systems. Yan, Shihang January 2012 (has links) The InfiniBand Architecture (IBA) network has been proposed as a new industrial standard with high bandwidth and low latency, suitable for constructing high-performance interconnected cluster computing systems. This architecture replaces the traditional bus-based interconnection with a switch-based network for server Input-Output (I/O) and inter-processor communications. An efficient Quality-of-Service (QoS) mechanism is fundamental to ensure the important QoS metrics, such as maximum throughput and minimum latency, leaving aside other aspects like guarantee to reduce the delay, blocking probability, and mean queue length, etc. Performance modelling and analysis has been and continues to be of great theoretical and practical importance in the design and development of communication networks. This thesis aims to investigate efficient and cost-effective QoS mechanisms for performance analysis and improvement of InfiniBand networks in cluster-based computing systems. Firstly, a rate-based source-response link-by-link admission and congestion control function with an improved Explicit Congestion Notification (ECN) packet marking scheme is developed. This function adopts rate control to reduce congestion of multiple-class traffic. Secondly, a credit-based flow control scheme is presented to reduce the mean queue length, throughput and response time of the system. In order to evaluate the performance of this scheme, a new queueing network model is developed. Theoretical analysis and simulation experiments show that these two schemes are quite effective and suitable for InfiniBand networks.
Finally, to obtain a thorough and deep understanding of the performance attributes of the InfiniBand Architecture network, two efficient threshold function flow control mechanisms are proposed to enhance the QoS of InfiniBand networks; one is Entry Threshold, which sets the threshold for each entry in the arbitration table, and the other is Arrival Job Threshold, which sets the threshold based on the number of jobs in each Virtual Lane. Furthermore, the principle of Maximum Entropy is adopted to analyse these two new mechanisms with the Generalized Exponential (GE)-Type distribution for modelling the inter-arrival times and service times of the input traffic. Extensive simulation experiments are conducted to validate the accuracy of the analytical models. Performance analysis Modelling Quality-of-Service (QoS) InfiniBand Interconnection Networks Cluster Computing Systems.
29	Designing high performance and scalable MPI over InfiniBand Liu, Jiuxing 12 October 2004 (has links) No description available. Computer Science Cluster Computing InfiniBand MPI High Performance Computing High Speed Interconnect Performance Scalability
30	Publish Subscribe on Large-Scale Dynamic Topologies: Routing and Overlay Management Frey, Davide 18 May 2006 (has links) (PDF) Content-based publish-subscribe is emerging as a communication paradigm able to meet the demands of highly dynamic distributed applications, such as those made popular by mobile computing and peer-to-peer networks. Nevertheless, the available systems implementing this communication model are still unable to cope efficiently with dynamic changes to the topology of their distributed dispatching infrastructure. This hampers their applicability in the aforementioned scenarios. This thesis addresses this problem and presents a complete approach to the reconfiguration of content-based publish-subscribe systems. In Part I, it proposes a layered architecture for reconfigurable publish-subscribe middleware consisting of an overlay, a routing, and an event-recovery layer. This architecture allows the same routing components to operate in different types of dynamic network environments, by exploiting different underlying overlays. Part II addresses the routing layer with new protocols to manage the reconfiguration of the routing information enabling the correct delivery of events to subscribers. When the overlay changes as a result of nodes joining or leaving the network or as a result of mobility, this information is updated so that routing can adapt to the new environment. Our protocols manage to achieve this with as little overhead as possible. Part III addresses the overlay layer and proposes two novel approaches for building and maintaining a connected topology in highly dynamic network scenarios. The protocols we present achieve this goal, while managing node degree and keeping reconfigurations localized when possible. These properties allow our overlay managers to be applied not only in the context of publish-subscribe middleware but also as enabling technologies for other communication paradigms like application-level multicast.
Finally, the thesis integrates the overlay and routing layers into a single framework and evaluates their combined performance both in wired and in wireless scenarios. Results show that the optimizations provided by our routing reconfiguration protocols allow the middleware to achieve very good performance in such networks. Moreover, they highlight that our overlay layer is able to optimize this performance even further, significantly reducing the network traffic generated by the routing layer. The protocols presented in this thesis are implemented in the REDS middleware framework developed at Politecnico di Milano. Their use enables REDS to operate efficiently in dynamic network scenarios ranging from large-scale peer-to-peer to mobile ad hoc networks. publish-subscribe middleware overlay content-based routing

Search results