Global ETD Search

111	Algorithmique distribuée asynchrone avec une majorité de pannes / Asynchronous distributed computing with a majority of crashes Bonnin, David 24 November 2015 En algorithmique distribuée, le modèle asynchrone par envoi de messages et à pannes est connu et utilisé dans de nombreux articles de par son réalisme ; par ailleurs, il est suffisamment simple pour être utilisé et suffisamment complexe pour représenter des problèmes réels. Dans ce modèle, les n processus communiquent en s'échangeant des messages, mais sans borne sur les délais de communication, c'est-à-dire qu'un message peut mettre un temps arbitrairement long à atteindre sa destination. De plus, jusqu'à f processus peuvent tomber en panne, et ainsi arrêter définitivement de fonctionner. Ces pannes indétectables à cause de l'asynchronisme du système limitent les possibilités de ce modèle. Dans de nombreux cas, les résultats connus dans ces systèmes sont limités à une stricte minorité de pannes. C'est par exemple le cas de l'implémentation de registres atomiques et de la résolution du renommage. Cette barrière de la majorité de pannes, expliquée par le théorème CAP, s'applique à de nombreux problèmes, et fait que le modèle asynchrone par envoi de messages avec une majorité de pannes est peu étudié. Il est donc intéressant d'étudier ce qu'il est possible de faire dans ce cadre. Cette thèse cherche donc à mieux comprendre ce modèle à majorité de pannes, au travers de deux principaux problèmes. Dans un premier temps, on étudie l'implémentation d'objets partagés similaires aux registres habituels, en définissant les bancs de registres x-colorés et les α-registres. Dans un second temps, le problème du renommage est étendu en renommage k-redondant, dans ses versions à-un-coup et réutilisable, et de même pour les objets partagés diviseurs, étendus en k-diviseurs.
/ In distributed computing, the asynchronous message-passing model with crashes is well known and used in many articles because of its realism: it is simple enough to be used and complex enough to represent many real problems. In this model, n processes communicate by exchanging messages, but without any bound on communication delays, i.e. a message may take an arbitrarily long time to reach its destination. Moreover, up to f of the n processes may crash, and thus permanently stop working. These crashes are undetectable because of the asynchrony of the system, and they restrict what can be achieved in this model. In many cases, known results in these systems hold only when a strict minority of processes may crash. This is the case, for example, for the implementation of atomic registers and for solving renaming. This majority-of-crashes barrier, explained by the CAP theorem, applies to numerous problems, and the asynchronous message-passing model with a majority of crashes has therefore been little studied. It is thus worth studying what can be done in this setting. This thesis seeks to better understand this model through two main problems. The first part studies the implementation of shared objects similar to the usual registers, by defining x-colored register banks and α-registers. The second part extends the renaming problem to k-redundant renaming, in both its one-shot and long-lived versions, and similarly extends the shared objects called splitters to k-splitters. Algorithmes distribués Asynchrone Envoi de message Panne Mémoire partagée Registre Renommage Diviseur Distributed algorithms Asynchronous Message-passing Crash Shared memory Register Renaming Splitter
112	Arquitetura de uma rede de interconexão com memória compartilhada baseada na topologia crossbar / Architecture of an interconnection network with shared memory based on the topology crossbar. Fábio Gonçalves Pessanha 22 March 2013 Multi-Processor System-on-Chip (MPSoC) possui vários processadores, em um único chip. Várias aplicações podem ser executadas de maneira paralela ou uma aplicação paralelizável pode ser particionada e alocada em cada processador, a fim de acelerar a sua execução. Um problema em MPSoCs é a comunicação entre os processadores, necessária para a execução destas aplicações. Neste trabalho, propomos uma arquitetura de rede de interconexão baseada na topologia crossbar, com memória compartilhada. Esta arquitetura é parametrizável, possuindo N processadores e N módulos de memórias. A troca de informação entre os processadores é feita via memória compartilhada. Neste tipo de implementação cada processador executa a sua aplicação em seu próprio módulo de memória. Através da rede, todos os processadores têm completo acesso a seus módulos de memória simultaneamente, permitindo que cada aplicação seja executada concorrentemente. Além disso, um processador pode acessar outros módulos de memória, sempre que necessite obter dados gerados por outro processador. A arquitetura proposta é modelada em VHDL e seu desempenho é analisado através da execução paralela de uma aplicação, em comparação à sua respectiva execução sequencial. A aplicação escolhida consiste na otimização de funções objetivo através do método de Otimização por Enxame de Partículas (Particle Swarm Optimization - PSO). Neste método, um enxame de partículas é distribuído igualmente entre os processadores da rede e, ao final de cada iteração, um processador acessa o módulo de memória de outro processador, a fim de obter a melhor posição encontrada pelo enxame alocado neste. A comunicação entre processadores é baseada em três estratégias: anel, vizinhança e broadcast.
Essa aplicação foi escolhida por ser computacionalmente intensiva e, dessa forma, uma forte candidata a paralelização. / A Multi-Processor System-on-Chip (MPSoC) has multiple processors on a single chip. Multiple applications can be executed in parallel, or a parallelizable application can be partitioned and allocated to the processors in order to accelerate its execution. One problem in MPSoCs is the communication between processors that the execution of these applications requires. In this work, we propose an interconnection network architecture based on the crossbar topology, with shared memory. This architecture is parameterizable, having N processors and N memory modules. Information is exchanged between processors via shared memory. In this type of implementation, each processor executes its application stored in its own memory module. Through the network, all processors have full access to their own memory modules simultaneously, allowing the applications to run concurrently. Moreover, a processor can access other memory modules whenever it needs to retrieve data generated by another processor. The proposed architecture is modelled in VHDL, and its performance is analysed by comparing the parallel execution of an application with its sequential execution. The chosen application consists of optimizing objective functions with the Particle Swarm Optimization (PSO) method. In this method, the particles of a swarm are distributed equally among the processors of the network and, at the end of each iteration, a processor accesses the memory module of another processor in order to obtain the best position found by the sub-swarm allocated to it. The communication between processors is based on three strategies: ring, neighbourhood and broadcast. This application was chosen because it is computationally intensive and therefore a strong candidate for parallelization.
Engenharia Eletrônica Redes de Interconexão Memória Compartilhada Redes Crossbar Arquitetura de redes Electronic Engineering Interconnection Network Shared Memory Crossbar Switch Network architecture ENGENHARIAS
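Illustrative sketch for entry 112: the ring strategy described in the abstract can be mimicked in a few lines of Python. This is not the VHDL design of the dissertation. The N processors are simulated sequentially, and at the end of each iteration sub-swarm k reads the best position stored by sub-swarm k-1. The sphere objective, the coefficient values and all names (`sphere`, `pso_ring`) are assumptions made for this example.

```python
import random


def sphere(x):
    """Objective to minimise: sum of squares, with minimum 0 at the origin."""
    return sum(v * v for v in x)


def pso_ring(n_procs=4, particles_per_proc=5, dim=2, iterations=50, seed=0):
    """Run PSO on n_procs simulated processors with a ring exchange.

    Returns (best objective value before the run, best value after it).
    """
    rng = random.Random(seed)
    w, c1, c2 = 0.7, 1.5, 1.5  # inertia and acceleration coefficients (assumed values)

    # One sub-swarm per simulated processor, each in its own "memory module".
    swarms = []
    for _ in range(n_procs):
        pos = [[rng.uniform(-5, 5) for _ in range(dim)]
               for _ in range(particles_per_proc)]
        vel = [[0.0] * dim for _ in range(particles_per_proc)]
        pbest = [p[:] for p in pos]
        best = min(pbest, key=sphere)[:]
        swarms.append({"pos": pos, "vel": vel, "pbest": pbest, "best": best})
    initial = min(sphere(s["best"]) for s in swarms)

    for _ in range(iterations):
        # Local PSO update inside every sub-swarm.
        for s in swarms:
            for i in range(particles_per_proc):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    s["vel"][i][d] = (w * s["vel"][i][d]
                                      + c1 * r1 * (s["pbest"][i][d] - s["pos"][i][d])
                                      + c2 * r2 * (s["best"][d] - s["pos"][i][d]))
                    s["pos"][i][d] += s["vel"][i][d]
                if sphere(s["pos"][i]) < sphere(s["pbest"][i]):
                    s["pbest"][i] = s["pos"][i][:]
                    if sphere(s["pbest"][i]) < sphere(s["best"]):
                        s["best"] = s["pbest"][i][:]

        # Ring exchange: processor k reads the best position held by processor k-1.
        snapshot = [s["best"][:] for s in swarms]
        for k, s in enumerate(swarms):
            neighbour_best = snapshot[(k - 1) % n_procs]
            if sphere(neighbour_best) < sphere(s["best"]):
                s["best"] = neighbour_best[:]

    final = min(sphere(s["best"]) for s in swarms)
    return initial, final
```

With the ring strategy a good position needs up to N-1 iterations to reach every processor. A broadcast would propagate it in one iteration, at the price of every processor reading every memory module.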
113	LX-MCAPI : biblioteca de comunicação para suporte a programação paralela em sistemas multi-core Ideguchi, Antonio Diogo Hidee 12 May 2016 No funding received. / Multi-core processors represent the industry's response to the physical barriers encountered in the development of processors over the last decades, and have brought new advances in computing system performance. Complex superscalar single-core processors with high clock frequencies gave way to processing units with two or more cores in a single package, generally with lower clock frequencies, allowing one or more execution threads per core. In this context, the existing programming models based on the serial and concurrent paradigms do not allow the real potential of the new hardware elements to be exploited, creating a need for new programming methodologies that do exploit the parallelism made available by multi-core processors.
This work presents LX-MCAPI, a library based on modern IPC (Inter-Process Communication) and memory sharing mechanisms, developed on the hypothesis that message passing is a viable, flexible and scalable abstraction compared to conventional programming methods using shared memory on multi-core systems. LX-MCAPI offers a message-passing, zero-copy memory sharing mechanism between processes and ready-to-use scalability patterns that facilitate the abstraction and construction of applications. It has performed well in terms of transmission latency and transfer rate on x86-64 and ARM environments. / Os processadores multi-core representaram a resposta da indústria às barreiras físicas encontradas no desenvolvimento de processadores computacionais nas últimas décadas, e trouxeram novo fôlego ao avanço do desempenho de sistemas computacionais. Os complexos processadores superescalares de núcleo único com frequências de clock relativamente altas deram espaço a unidades de processamento com dois ou mais núcleos em um mesmo encapsulamento, geralmente mais “lentos”, possibilitando uma ou mais threads por núcleo. Nesse contexto, os modelos de programação existentes utilizando os paradigmas sequencial e concorrente não permitiam a exploração do potencial real proporcionado pelos novos elementos de hardware introduzidos, gerando uma necessidade de criação de novas metodologias de programação que permitissem tirar proveito do paralelismo agregado à utilização dos processadores multi-core. Este trabalho apresenta a LX-MCAPI, biblioteca baseada em mecanismos modernos de IPC (Inter-Process Communication) e compartilhamento de memória, desenvolvida sobre a hipótese em que a passagem de mensagens é uma abstração viável, flexível e escalável, quando comparada a métodos de programação convencionais utilizando memória-compartilhada em sistemas multi-core.
LX-MCAPI oferece um mecanismo de passagem de mensagem e compartilhamento zero-copy de memória entre processos, além de padrões de programação paralela prontos para uso, que facilitam o processo de abstração e construção de aplicações. Além disso, apresentando bom desempenho em termos de latências de transmissão e taxas de transferência em ambientes x86-64 e ARM. Multi-core Paralelismo Metodologia Programação Desempenho parallelism methodology programming performance message passing shared-memory
114	Scaling the solution of large sparse linear systems using multifrontal methods on hybrid shared-distributed memory architectures / Scalabilité des méthodes multifrontales pour la résolution de grands systèmes linéaires creux sur architectures hybrides à mémoire partagée et distribuée Sid Lakhdar, Mohamed Wissam 01 December 2014 La résolution de systèmes d'équations linéaires creux est au cœur de nombreux domaines d'applications. De même que la quantité de ressources de calcul augmente dans les architectures modernes, offrant ainsi de nouvelles perspectives, la taille des problèmes rencontrés de nos jours dans les applications de simulations numériques augmente aussi et de façon significative. L'exploitation des architectures modernes pour la résolution efficace de problèmes de très grande taille devient ainsi un défi à relever, aussi bien d'un point de vue théorique que d'un point de vue algorithmique. L'objectif de cette thèse est d'adresser les problèmes de scalabilité des solveurs creux directs basés sur les méthodes multifrontales en environnements parallèles asynchrones. Dans la première partie de la thèse, nous nous intéressons à l'exploitation du parallélisme multicoeur sur les architectures à mémoire partagée. Nous introduisons une variante de l'algorithme Geist-Ng afin de gérer aussi bien un parallélisme à grain fin, à travers l'utilisation de librairies BLAS séquentielles et parallèles optimisées, qu'un parallélisme à plus gros grain, à travers l'utilisation de parallélisme à base de directives OpenMP.
Nous considérons aussi des aspects mémoire afin d'améliorer les performances sur des architectures NUMA: (i) d'une part, nous analysons l'influence de la localité mémoire et utilisons des stratégies d'allocation mémoire adaptatives pour gérer les espaces de travail privés et partagés; (ii) d'autre part, nous nous intéressons au problème de partage de ressources sur les architectures multicoeurs, qui induit des pénalités en termes de performances. Enfin, afin d'éviter que des ressources ne restent inertes à la fin de l'exécution de leurs tâches, et ainsi, afin d'exploiter au mieux les ressources disponibles, nous proposons un algorithme conceptuellement proche de l'approche dite de vol de travail, et qui consiste à assigner les ressources de calcul inactives aux tâches de travail actives de façon dynamique. Dans la deuxième partie de cette thèse, nous nous intéressons aux architectures hybrides, à base de mémoire partagée et de mémoire distribuée, pour lesquelles un travail particulier est nécessaire afin d'améliorer la scalabilité du traitement de problèmes de grande taille. Nous étudions et optimisons tout d'abord les noyaux d'algèbre linéaire dense utilisés dans les méthodes multifrontales en environnement distribué asynchrone, en repensant les variantes right-looking et left-looking de la factorisation LU avec pivotage partiel dans notre contexte distribué. De plus, du fait du parallélisme multicoeur, la proportion des communications relativement aux calculs est plus importante. Nous expliquons comment construire des algorithmes de mapping qui minimisent les communications entre nœuds de l'arbre de dépendances de la méthode multifrontale. Nous montrons aussi que les communications asynchrones collectives deviennent critiques sur un grand nombre de processeurs, et que les broadcasts asynchrones à base d'arbres de broadcast doivent être utilisés.
Nous montrons ensuite que dans un contexte multifrontal complètement asynchrone, où plusieurs instances de telles communications ont lieu, de nouveaux problèmes de synchronisation apparaissent. Nous analysons et caractérisons les situations de deadlock possibles et établissons formellement des propriétés générales simples afin de résoudre ces problèmes de deadlock. Nous établissons par la suite des propriétés nous permettant de relâcher les synchronisations induites par les solutions précédentes, et ainsi, d'améliorer les performances. Enfin, nous montrons que les synchronisations peuvent être relâchées dans un solveur creux dense et illustrons les gains en performances, sur des problèmes de grande taille issus d'applications réelles, dans notre environnement multifrontal complètement asynchrone. / The solution of sparse systems of linear equations is at the heart of numerous application fields. While the amount of computational resources in modern architectures increases and offers new perspectives, the size of the problems arising in today’s numerical simulation applications also grows significantly. Exploiting modern architectures to solve very large problems efficiently is thus a challenge, from both a theoretical and an algorithmic point of view. The aim of this thesis is to address the scalability of sparse direct solvers based on multifrontal methods in parallel asynchronous environments. In the first part of this thesis, we focus on exploiting multi-threaded parallelism on shared-memory architectures. A variant of the Geist-Ng algorithm is introduced to handle both fine-grain parallelism, through the use of optimized sequential and multi-threaded BLAS libraries, and coarser-grain parallelism, through explicit OpenMP-based parallelization.
Memory aspects are then considered to further improve performance on NUMA architectures: (i) on the one hand, we analyse the influence of memory locality and exploit adaptive memory allocation strategies to manage private and shared workspaces; (ii) on the other hand, resource sharing on multicore processors induces performance penalties when many cores are active (machine load effects), which we also consider. Finally, in order to avoid resources remaining idle when they have finished their share of the work, and thus to efficiently exploit all available computational resources, we propose an algorithm which is conceptually very close to the work-stealing approach and which consists in dynamically assigning idle cores to busy threads/activities. In the second part of this thesis, we target hybrid shared-distributed memory architectures, for which specific work to improve scalability is needed when processing large problems. We first study and optimize the dense linear algebra kernels used in distributed asynchronous multifrontal methods. Simulation, experimentation and profiling have been performed to tune the parameters controlling the algorithm, in correlation with problem size and computer architecture characteristics. To do so, right-looking and left-looking variants of the LU factorization with partial pivoting have been revisited in our distributed context. Furthermore, when computations are accelerated with multiple cores, the relative weight of communication with respect to computation is higher. We explain how to design mapping algorithms minimizing the communication between nodes of the dependency tree of the multifrontal method, and show that collective asynchronous communications become critical on large numbers of processors. We explain why asynchronous broadcasts using standard tree-based communication algorithms must be used.
We then show that, in a fully asynchronous multifrontal context where several such asynchronous communication trees coexist, new synchronization issues must be addressed. We analyse and characterize the possible deadlock situations and formally establish simple global properties to handle deadlocks. Such properties partially force synchronization and may limit performance. Hence, we define properties which enable us to relax synchronization and thus improve performance. Our approach is based on the observation that, in our case, as long as memory is available, deadlocks cannot occur and, consequently, we just need to keep enough memory to guarantee that a deadlock can always be avoided. Finally, we show that synchronizations can be relaxed in a state-of-the-art solver and illustrate the performance gains on large real problems in our fully asynchronous multifrontal approach. Algèbre linéaire creuse Méthode multifrontale Mémoire partagée Mémoire distribuée Architectures NUMA Asynchronisme Sparse linear algebra Multifrontal method Shared-memory Distributed-memory NUMA architectures Asynchronism
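Illustrative sketch for entry 114: the abstract names right-looking LU factorization with partial pivoting as one of the dense kernels it revisits. The sequential Python sketch below shows that variant on a small dense matrix. It is not the distributed, asynchronous kernel of the thesis; the function name `lu_right_looking` and the list-of-lists storage are assumptions made for illustration.

```python
def lu_right_looking(a):
    """Factor a square matrix with partial pivoting, right-looking variant.

    Returns (lu, perm): lu holds the multipliers of the unit lower-triangular
    factor L below the diagonal and U on and above it; perm[i] is the index of
    the input row that ended up in row i, so that L * U equals the input with
    its rows reordered by perm.
    """
    n = len(a)
    a = [row[:] for row in a]  # work on a copy, leave the input untouched
    perm = list(range(n))
    for k in range(n):
        # Partial pivoting: pick the largest entry in column k, rows k..n-1.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0:
            raise ValueError("matrix is singular")
        a[k], a[p] = a[p], a[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            a[i][k] /= a[k][k]  # multiplier, stored in place of the zeroed entry
            # Right-looking: update the whole trailing submatrix immediately.
            for j in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]
    return a, perm
```

A left-looking variant performs the same arithmetic in a different order: it delays the updates and applies the contributions of all previously factored columns to column k only when that column is about to be factored.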
115	[en] HIGH RESOLUTION GRAPHIC SYSTEM / [pt] PROCESSADOR GRÁFICO PARA SISTEMAS DE ALTA RESOLUÇÃO MARCELO ROBERTO BAPTISTA PEREIRA LUIS JIMENEZ 18 June 2007 [pt] Neste trabalho a arquitetura de placas gráficas que usam a tecnologia de varredura (raster scan) é analisada. É discutido então o uso de memórias dinâmicas do tipo VRAM para lidar com o problema do gargalo dos acessos à memória de vídeo. São analisados então alguns módulos importantes que podem ser considerados opcionais numa placa de vídeo, uma vez que a escolha por estes módulos depende da aplicação específica da placa gráfica. Finalmente, apresentamos a descrição do projeto e implementação de uma placa gráfica utilizando o processador gráfico TMS34010 com capacidade para realizar aquisição de imagens. / [en] In the present work, we analyse the architecture of raster-scan graphics boards. We then discuss the use of VRAM dynamic memories to deal with the bottleneck of video memory accesses. We also analyse a few important modules that may be considered optional, since the choice of these modules depends on the specific application of the graphics board. Finally, we present the design and implementation of a graphics board with image acquisition capabilities, using the TMS34010 graphics processor. [pt] PROCESSADOR GRAFICO [en] GRAPHIC PROCESSOR [pt] MEMORIA DINAMICA VRAM [en] VRAM DYNAMIC MEMORY [pt] PROCESSADOR DE COMUNICACAO [en] COMMUNICATION PROCESSOR [pt] MEMORIA COMPARTILHADA [en] SHARED MEMORY
116	Comparison of Shared memory based parallel programming models Ravela, Srikar Chowdary January 2010 Parallel programming models are a challenging and emerging topic in the parallel computing era. These models allow a developer to port a sequential application onto a platform with a larger number of processors so that the problem or application can be solved more easily. Adapting applications in this way is often influenced by the type of application, the type of platform and many other factors. Several parallel programming models have been developed, and they fall into two main classes: shared memory based and distributed memory based parallel programming models. The recognition that some computing applications entail immense computing requirements led to the challenge of developing efficient programming models that bridge the gap between the hardware's ability to perform the computations and the software's ability to support that performance for those applications [25][9]. A better programming model is therefore needed, one that facilitates easy development while also delivering high performance. To answer this challenge, this thesis selects and compares four different shared memory based parallel programming models, weighing the development time of an application under each model against the performance achieved by that application in the same model. The programming models are evaluated on data parallel applications, to verify their ability to support data parallelism with respect to the development time of those applications. The data parallel applications are borrowed from the Dense Matrix dwarfs; the dwarfs used are Matrix-Matrix multiplication, Jacobi Iteration and Laplace Heat Distribution.
The experimental method consists of selecting three data parallel benchmarks and developing them under the four shared memory based parallel programming models considered for the evaluation. The performance of those applications under each programming model is also recorded, and finally the results are used to compare the parallel programming models analytically. The results of the study show that, at the cost of development time, better performance is achieved for the chosen data parallel applications developed in Pthreads. On the other hand, at the cost of a little performance, data parallel applications are extremely easy to develop in task based parallel programming models. The directive models are moderate from both perspectives and are rated between the tasking models and the threading models. / From this study it is clear that the threading model Pthreads is the dominant programming model in terms of performance, supporting high speedups for two of the three dwarfs. The tasking models, on the other hand, are dominant in development time and in reducing the number of errors; they support high growth in speedup for applications without any communication and lower growth in self-relative speedup for applications involving communication. The performance of the tasking models degrades for communication-based problems because task based models are designed to execute tasks in parallel without any interruption or preemption during their computations. Introducing communication violates that purpose and thereby results in lower performance. The directive model OpenMP is moderate in both aspects and stands between these models. In general, the directive models and tasking models offer better speedup than the other models for task based problems that follow the divide and conquer strategy. For data parallelism, however, the speedup growth achieved is low (i.e.
they are less scalable for data parallel applications), but they are comparable in execution time with the threading models. Their development times are also considerably lower for data parallel applications, because of the ease of development these models provide: fewer functional routines are required to parallelize the applications. This thesis is concerned with comparing shared memory based parallel programming models in terms of speedup. Work of this kind serves as a handy guide that programmers can consult when developing applications under shared memory based parallel programming models. We suggest that this work can be extended in two ways: one from the developer's perspective, and the other as a cross-referential study of parallel programming models. The former can be done by having a different programmer carry out a similar study and comparing the two studies. The latter can be done by including multiple data points in the same programming model or by using a different set of parallel programming models for the study. Parallel Programming models Distributed memory Shared memory Dwarfs Development time Speedup Data parallelism Dense Matrix dwarfs threading models Tasking models Directive models. Computer Sciences Datavetenskap (datalogi)
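Illustrative sketch for entry 116: Jacobi Iteration, one of the three benchmarks named in the abstract, is data parallel because every component of the new iterate depends only on the previous iterate. The sequential Python sketch below makes that independence visible; it is not code from the thesis, and the function name `jacobi` and the fixed iteration count are assumptions made for this example.

```python
def jacobi(a, b, iterations=100):
    """Approximate the solution of a x = b with Jacobi iteration.

    Convergence is guaranteed when a is strictly diagonally dominant;
    other matrices may or may not converge.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        # Every entry of the new vector reads only the old vector x, so the
        # n updates are independent and could be split across threads.
        x = [
            (b[i] - sum(a[i][j] * x[j] for j in range(n) if j != i)) / a[i][i]
            for i in range(n)
        ]
    return x
```

For example, `jacobi([[4.0, 1.0], [2.0, 5.0]], [6.0, 12.0])` converges to the solution x = 1, y = 2 of the system 4x + y = 6, 2x + 5y = 12. In a shared memory version, the only synchronization needed is a barrier between iterations, so that no thread reads a partially updated vector.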
117	Conception d'une architecture extensible pour le calcul massivement parallèle / Designing a scalable architecture for massively parallel computing Kaci, Ania 14 December 2016 En réponse à la demande croissante de performance par une grande variété d’applications (exemples : modélisation financière, simulation sub-atomique, bio-informatique, etc.), les systèmes informatiques se complexifient et augmentent en taille (nombre de composants de calcul, mémoire et capacité de stockage). L’accroissement de la complexité de ces systèmes se traduit par une évolution de leur architecture vers une hétérogénéité des technologies de calcul et des modèles de programmation. La gestion harmonieuse de cette hétérogénéité, l’optimisation des ressources et la minimisation de la consommation constituent des défis techniques majeurs dans la conception des futurs systèmes informatiques. Cette thèse s’adresse à un domaine de cette complexité en se focalisant sur les sous-systèmes à mémoire partagée où l’ensemble des processeurs partagent un espace d’adressage commun. Les travaux porteront essentiellement sur l’implémentation d’un protocole de cohérence de cache et de consistance mémoire, sur une architecture extensible et sur la méthodologie de validation de cette implémentation. Dans notre approche, nous avons retenu les processeurs 64-bits d’ARM et des co-processeurs génériques (GPU, DSP, etc.) comme composants de calcul, les protocoles de mémoire partagée AMBA/ACE et AMBA/ACE-Lite ainsi que l’architecture associée « CoreLink CCN » comme solution de départ.
La généralisation et la paramétrisation de cette architecture ainsi que sa validation dans l’environnement de simulation Gem5 constituent l’épine dorsale de cette thèse. Les résultats obtenus à la fin de la thèse tendent à démontrer l’atteinte des objectifs fixés. / In response to the growing demand for performance from a wide variety of applications (e.g. financial modeling, sub-atomic simulation, bioinformatics), computer systems are becoming more complex and growing in size (number of computing components, memory and storage capacity). The increasing complexity of these systems is reflected in an evolution of their architecture towards heterogeneous computing technologies and programming models. The harmonious management of this heterogeneity, resource optimization and the minimization of consumption are major technical challenges in the design of future computer systems. This thesis addresses one area of this complexity by focusing on shared-memory subsystems, where all processors share a common address space. The work focuses on the implementation of a cache coherence and memory consistency protocol on an extensible architecture, and on the methodology for validating this implementation. In our approach, we selected 64-bit ARM processors and generic co-processors (GPU, DSP, etc.) as computing components, and the AMBA/ACE and AMBA/ACE-Lite shared-memory protocols together with the associated "CoreLink CCN" architecture as a starting solution. The generalization and parameterization of this architecture and its validation in the Gem5 simulation environment are the backbone of this thesis. The results obtained at the end of the thesis tend to demonstrate that the objectives were achieved. Systèmes informatiques Mémoire partagée Cohérence de cache Consistance mémoire Modélisation niveau transactionnel Réseaux d’interconnexion Computing systems Shared memory Cache coherency Memory consistency Transactional level modeling Interconnection networks
118	Fault tolerance for stream programs on parallel platforms Sanz-Marco, Vicent January 2015 A distributed system is defined as a collection of autonomous computers connected by a network, with the appropriate distributed software for the system to be seen by users as a single entity capable of providing computing facilities. Distributed systems with centralised control have a distinguished control node, called the leader node. The main role of a leader node is to distribute and manage shared resources in a resource-efficient manner. A distributed system with centralised control can use stream processing networks for communication. In a stream processing system, applications typically act as continuous queries, ingesting data continuously, analyzing and correlating the data, and generating a stream of results. Fault tolerance is the ability of a system to keep processing information even if a failure or anomaly occurs in the system. Fault tolerance has become an important requirement for distributed systems, because the likelihood of failure has risen with the increase in the number of nodes and in the runtime of applications in distributed systems. To address this problem, it is important to add fault tolerance mechanisms in order to provide the internal capacity to preserve the execution of tasks despite the occurrence of faults. If the leader of a centralised control system fails, it is necessary to elect a new leader. While leader election has received a lot of attention in message-passing systems, very few solutions have been proposed for shared memory systems, as we propose here. In addition, rollback-recovery strategies are important fault tolerance mechanisms for distributed systems: they store information in stable storage while the system is failure-free, and when a failure affects a node, the system uses the stored information to recover the state the node had before the failure occurred.
In this thesis, we focus on creating two fault tolerance mechanisms for distributed systems with centralised control that use stream processing for communication. The two mechanisms are leader election and log-based rollback-recovery, both implemented using LPEL. The proposed leader election method is based on an atomic Compare-And-Swap (CAS) instruction, which is directly available on many processors. It works with idle nodes: only the non-busy nodes compete to become the new leader, while the busy nodes continue with their tasks and update their leader reference later. The method also has a short completion time and low space complexity. The proposed log-based rollback-recovery method for distributed systems with stream processing networks is a novel approach that is free from the domino effect and does not generate orphan messages, thereby satisfying the always-no-orphans consistency condition. Additionally, this approach imposes lower overhead on the system than other approaches, and it scales well because it is insensitive to the number of nodes in the system. 004.2
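The idea of electing a leader with a single Compare-And-Swap can be illustrated with a minimal sketch. It is not the thesis's LPEL implementation: Python has no native CAS instruction, so the hypothetical `AtomicCell` class emulates one with a lock, and the names `NO_LEADER`, `elect`, and `run_election` are assumptions made for illustration. Each idle node tries to swap its own id into a shared word that initially holds "no leader"; exactly one swap can succeed, and every node then reads the winner.

```python
import threading

NO_LEADER = -1  # hypothetical sentinel meaning "no leader elected yet"


class AtomicCell:
    """Emulates one hardware CAS word with a lock (assumption: Python has no native CAS)."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        with self._lock:
            return self._value

    def compare_and_swap(self, expected, new):
        """Store `new` only if the cell holds `expected`; return True on success."""
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False


def elect(leader, node_id):
    """An idle node tries to claim leadership, then returns the elected leader's id."""
    leader.compare_and_swap(NO_LEADER, node_id)  # at most one node succeeds
    return leader.load()


def run_election(node_ids):
    """Run one election among concurrent nodes; map each node id to the leader it observed."""
    leader = AtomicCell(NO_LEADER)
    observed = {}

    def worker(node_id):
        observed[node_id] = elect(leader, node_id)

    threads = [threading.Thread(target=worker, args=(nid,)) for nid in node_ids]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return observed
```

Calling `run_election(range(8))` returns a dictionary in which all eight nodes report the same leader id. Which node wins depends on thread scheduling, so the sketch makes no claim about the specific value.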
119	Conception et implémentation d'un langage de programmation concurrente modulaire / Design and implementation of a modular concurrent programming language Grande, Johan 28 September 2015 (has links) La programmation concurrente à mémoire partagée est un modèle classique de concurrence qui permet notamment de tirer parti des processeurs multicoeurs aujourd'hui très répandus dans les ordinateurs personnels. Les programmes concurrents sont sujets au problème des interblocages, notoirement difficiles à prévoir et à éliminer, en particulier dans le cas de l'utilisation du mécanisme de synchronisation très populaire que sont les mutex. Dans cette thèse nous avons travaillé à rendre plus aisée la programmation avec des mutex en étudiant des méthodes d'évitement des interblocages. Nous avons d'abord étudié une méthode utilisant une analyse statique par un système de types et d'effets, puis une variante de cette méthode dans un langage à typage dynamique. La seconde méthode est celle que nous avons le plus développée. Elle combine prévention et évitement des interblocages pour fournir une fonction de verrouillage sans interblocages expressive et utilisable. Nous l'avons implémentée sous forme d'une bibliothèque Hop (dialecte de Scheme). Ce faisant, nous avons développé un algorithme sans famine pour l'acquisition simultanée d'un nombre arbitraire de mutex, et identifié le concept d'interblocage asymptotique. Nous avons également été amenés à proposer une optimisation des exceptions (blocs finally). Nos tests de performances semblent indiquer un impact négligeable de l'utilisation de notre bibliothèque sur des applications concurrentes réelles. La majeure partie de notre recherche pourrait être appliquée à d'autres langages de programmation structurée tels que Java. / Shared-memory concurrency is a classic concurrency model which, among other things, makes it possible to take advantage of multicore processors that are now widespread in personal computers. 
Concurrent programs are prone to deadlocks, which are notoriously hard to predict and debug. Programs using mutexes, a very popular synchronization mechanism, are no exception. In this thesis we studied deadlock avoidance methods with the aim of making programming with mutexes easier. We first studied a method that uses static analysis by means of a type and effect system, then a variation on this method in a dynamically typed language. We developed the second method further. It combines deadlock prevention and avoidance to provide an easy-to-use and expressive deadlock-free locking function. We implemented it as a library for Hop (a dialect of Scheme). This led us to develop a starvation-free algorithm for simultaneously acquiring an arbitrary number of mutexes, and to identify the concept of asymptotic deadlock. While doing so, we also developed an optimization of exceptions (finally blocks). Our performance tests seem to show that using our library has a negligible impact on the performance of real-life applications. Most of our work could be applied to other structured programming languages such as Java. Concurrence à mémoire partagée Mutex Verrouillage imbriqué Évitement des interblocages Interblocage asymptotique Shared memory concurrency Thread Mutex Nested lock Deadlock avoidance Asymptotic deadlock
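Acquiring several mutexes at once without deadlock can be sketched as follows. This is not the starvation-free algorithm of the thesis or its Hop library: it swaps in the classic prevention technique of taking locks in a fixed global order, and it makes no fairness guarantee. The names `acquire_all` and `demo` are hypothetical, and ordering locks by `id()` is an assumption chosen only because it gives every lock a stable, unique rank within one process.

```python
import threading
from contextlib import contextmanager


@contextmanager
def acquire_all(*locks):
    """Hold all given locks for the duration of the block, taking them in a global order.

    Because every thread acquires in the same order (by id), no cycle of
    threads waiting on each other can form.
    """
    unique = {id(lock): lock for lock in locks}  # drop duplicates of the same lock
    acquired = []
    try:
        for key in sorted(unique):
            unique[key].acquire()
            acquired.append(unique[key])
        yield
    finally:
        # Release in reverse order of acquisition.
        for lock in reversed(acquired):
            lock.release()


def demo(iterations=1000):
    """Two threads request the same two locks in opposite orders."""
    a, b = threading.Lock(), threading.Lock()
    counter = {"value": 0}

    def worker(first, second):
        for _ in range(iterations):
            with acquire_all(first, second):
                counter["value"] += 1  # protected by both locks

    t1 = threading.Thread(target=worker, args=(a, b))
    t2 = threading.Thread(target=worker, args=(b, a))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return counter["value"]
```

With naive nested `with a: with b:` blocks, the two workers in `demo` could each hold one lock and wait forever for the other. With `acquire_all`, both threads take the locks in the same order regardless of the argument order, so the run completes and the counter equals twice the iteration count.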
120	Securing Multiprocessor Systems-on-Chip Biswas, Arnab Kumar 16 August 2016 (has links) (PDF) MHRD PhD scholarship / With Multiprocessor Systems-on-Chips (MPSoCs) pervading our lives, security issues are emerging as a serious problem, and attacks against these systems are becoming more critical and sophisticated. We have designed and implemented different hardware-based solutions to ensure the security of an MPSoC. Security-assisting modules can be implemented at different abstraction levels of an MPSoC design. We propose solutions at both the circuit level and the system level of abstraction. At the VLSI circuit level, we consider the problem of noise voltage in an input signal coming from the outside world. This noise voltage disturbs normal circuit operation inside a chip, causing false logic reception. If the disturbance is caused intentionally, it constitutes a glitch/transient attack that may compromise the security of the chip. We propose an input receiver with a hysteresis characteristic that can work at voltage levels between 0.9V and 5V. The circuit can protect the MPSoC from glitch/transient attacks. At the system level, we propose solutions targeting the Network-on-Chip (NoC) as the on-chip communication medium. We survey the possible attack scenarios on present-day MPSoCs and investigate a new attack scenario, the router attack, targeted at NoC-enabled MPSoCs. We propose different monitoring-based countermeasures against routing-table-based router attacks in an MPSoC having multiple Trusted Execution Environments (TEEs). Software attacks, the most common type of attack, mainly exploit vulnerabilities such as buffer overflows. This is possible if proper access control to memory is absent in the system. We propose four hardware-based mechanisms to implement the Role Based Access Control (RBAC) model in NoC-based MPSoCs.
Network-on-Chip (NoC) Computer Security Multiprocessor Systems-on-Chips (MPSoCs) MP-SoC Role Based Access Control (RBAC) Role Based Shared Memory Access Control MPSoC Electronics Engineering
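The effect of a hysteresis characteristic on a noisy input can be shown with a small software model. This is a behavioural sketch only, not the circuit proposed in the thesis: the function name `hysteresis_receiver` and the threshold values in the usage note are assumptions chosen for illustration. The output switches to 1 only when the input rises to the upper threshold and back to 0 only when it falls to the lower threshold, so excursions that stay between the two thresholds do not change the received logic value.

```python
def hysteresis_receiver(samples, v_low, v_high, initial=0):
    """Convert sampled input voltages to logic levels using two thresholds.

    The state becomes 1 when a sample is at or above v_high, becomes 0 when a
    sample is at or below v_low, and otherwise keeps its previous value.
    """
    if v_low >= v_high:
        raise ValueError("v_low must be strictly below v_high")
    state = initial
    output = []
    for voltage in samples:
        if voltage >= v_high:
            state = 1
        elif voltage <= v_low:
            state = 0
        # Between the thresholds the previous state is retained.
        output.append(state)
    return output
```

For example, with the hypothetical thresholds `v_low=0.3` and `v_high=0.8`, the samples `[0.0, 0.5, 0.7, 0.5, 0.2, 0.4, 0.6, 0.9, 0.6, 0.1]` yield `[0, 0, 0, 0, 0, 0, 0, 1, 1, 0]`. The bumps to 0.7 and 0.6, which a single threshold at the midpoint would read as logic 1, are ignored.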
