Spelling suggestions: "subject:"sharedmemory"" "subject:"sharememory""
81 |
Abortable and Query-abortable Types and Their Efficient ImplementationHorn, Stephanie Lorraine 24 September 2009 (has links)
We introduce abortable and query-abortable object types intended for implementation in asynchronous shared-memory systems with low contention. Implementations of such types behave like ordinary objects when accessed sequentially, but may abort operations when accessed concurrently. An aborted operation may or may not take effect, i.e., cause a state transition, and it returns no indication of which possibility occurred. Since this uncertainty can be problematic, a query-abortable type supports a QUERY operation that each process can use to determine its last non-QUERY operation on the object that caused a state transition, and the response associated with this state transition.
Our research is closely related to obstruction-free implementations (introduced by Herlihy, Luchangco and Moir) and responsive obstruction-free implementations (introduced by Attiya, Guerraoui and Kouznetsov). Like abortable and query-abortable types, these implementations may exhibit degraded behaviour in the face of contention. We show that abortable registers--registers strictly weaker than safe registers--can be used to obtain obstruction-free and responsive obstruction-free implementations for any type.
We present universal constructions for abortable and query-abortable types that are novel and efficient in the number of registers used. Specifically, they are based on a simple timestamping mechanism for detecting concurrent executions, and, in systems with n processes, use only n abortable registers or only O(n^2) single-reader, single-writer abortable registers. The timestamping mechanism we introduce is based on the inc&read counter type and appears to be interesting in its own right. As a generalization, we study the k-inc&read counter types, for k>0.
We also identify a potential problem with correctness properties based on step contention: with such properties, the composition of correct object implementations may result in an implementation that is not correct. In other words, implementations defined in terms of step contention are not always composable. To avoid this problem, we introduce a property based on interval contention, namely non-triviality, to define the correct behaviour of abortable and query-abortable object implementations.
|
82 |
An API for adaptive loop scheduling in shared address space architecturesGovindaswamy, Kirthilakshmi. January 2003 (has links) (PDF)
Thesis (M.S.)--Mississippi State University. Department of Computer Science and Engineering. / Title from title screen. Includes bibliographical references.
|
83 |
Dynamic detection of the communication pattern in shared memory environments for thread mapping / Detecção dinâmica do padrão de comunicação em ambientes de memória compartilhada para o mapeamento de threadsCruz, Eduardo Henrique Molina da January 2012 (has links)
As threads de aplicações paralelas cooperam a fim de cumprir suas tarefas, dessa forma, comunicação é realizada entre elas. A latência de comunicação entre os núcleos em arquiteturas multiprocessadas diferem dependendo da hierarquia de memória e das interconexões. Com o aumento do número de núcleos por chip e número de threads por núcleo, esta diferença entre as latências de comunicação está aumentando. Portanto, é importante mapear as threads de aplicações paralelas levando em conta a comunicação entre elas. Em aplicações paralelas baseadas no paradigma de memória compartilhada, a comunicação é implícita e ocorre através de acessos à variáveis compartilhadas, o que torna difícil a descoberta do padrão de comunicação entre as threads. Mecanismos tradicionais usam simulação para monitorar os acessos à memória realizados pela aplicação, requerendo modificações no código fonte e aumentando drasticamente a sobrecarga. Nesta dissertação de mestrado, são introduzidos dois mecanismos inovadores com uma baixa sobrecarga para se detectar o padrão de comunicação entre threads. O primeiro mecanismo faz uso de informações sobre linhas compartilhadas de caches providas por protocolos de coerência de cache. O segundo mecanismo utiliza a Translation Lookaside Buffer (TLB) para detectar quais páginas de memória cada núcleo está acessando. Ambos os mecanismos dependem totalmente do hardware, o que torna o mapeamento de threads transparente aos programadores e permite que ele seja realizado dinamicamente pelo sistema operacional. Além disto, nenhuma tarefa de alta sobrecarga, como simulação, é requerida. As propostas foram avaliadas com o NAS Parallel Benchmarks (NPB), obtendo representações precisas dos padrões de comunicação. Mapeamentos para as threads foram gerados utilizando os padrões de comunicação descobertos e um algoritmo de mapeamento. O problema do mapeamento é NP-Difícil. Portanto, de forma a se atingir uma complexidade polinomial, o algoritmo empregado é heurístico, baseado no algoritmo de emparelhamento de grafos de Edmonds. Executando as aplicações com o mapeamento resultou em um ganho de desempenho de até 15; 3%. O número de faltas na cache, invalidações em linhas de cache e transações de espionagem foram reduzidos em até 31; 9%, 41% e 65; 4%, respectivamente. / The threads of parallel applications cooperate in order to fulfill their tasks, thereby communication is performed among themselves. The communication latency between the cores in a multiprocessor architecture differs depending on the memory hierarchy and the interconnections. With the increase in the number of cores per chip and the number of threads per core, this difference between the communication latencies is increasing. Therefore, it is important to map the threads of parallel applications taking into account the communication between them. In parallel applications based on the shared memory paradigm, the communication is implicit and occurs through accesses to shared variables, which makes difficult to detect the communication pattern between the threads. Traditional approaches use simulation to monitor the memory accesses performed by the application, requiring modifications to the source code and drastically increasing the overhead. In this master thesis, we introduce two novel light-weight mechanisms to find the communication pattern of threads. The first mechanism makes use of the information about shared cache lines provided by cache coherence protocols. The second mechanism makes use of the Translation Lookaside Buffer (TLB) to detect which memory pages each core is accessing. Both our mechanisms rely entirely on hardware features, which makes the thread mapping transparent to the programmer and allows it to be performed dynamically by the operating system. Moreover, no time consuming task, such as simulation, is required. We evaluated our mechanisms with the NAS Parallel Benchmarks (NPB) and obtained accurate representations of the communication patterns. We generated thread mappings from the detected communication patterns using a mapping algorithm. Mapping is a NP-Hard problem. Therefore, in order to achieve a polynomial complexity, we designed a heuristic method based on the Edmonds graph matching algorithm. Running the applications with these mappings resulted in performance improvements of up to 15.3% compared to the original scheduler of the operating system. The number of cache misses, cache line invalidations and snoop transactions were reduced by up to 31.9%, 41% and 65.4%, respectively.
|
84 |
Dynamic detection of the communication pattern in shared memory environments for thread mapping / Detecção dinâmica do padrão de comunicação em ambientes de memória compartilhada para o mapeamento de threadsCruz, Eduardo Henrique Molina da January 2012 (has links)
As threads de aplicações paralelas cooperam a fim de cumprir suas tarefas, dessa forma, comunicação é realizada entre elas. A latência de comunicação entre os núcleos em arquiteturas multiprocessadas diferem dependendo da hierarquia de memória e das interconexões. Com o aumento do número de núcleos por chip e número de threads por núcleo, esta diferença entre as latências de comunicação está aumentando. Portanto, é importante mapear as threads de aplicações paralelas levando em conta a comunicação entre elas. Em aplicações paralelas baseadas no paradigma de memória compartilhada, a comunicação é implícita e ocorre através de acessos à variáveis compartilhadas, o que torna difícil a descoberta do padrão de comunicação entre as threads. Mecanismos tradicionais usam simulação para monitorar os acessos à memória realizados pela aplicação, requerendo modificações no código fonte e aumentando drasticamente a sobrecarga. Nesta dissertação de mestrado, são introduzidos dois mecanismos inovadores com uma baixa sobrecarga para se detectar o padrão de comunicação entre threads. O primeiro mecanismo faz uso de informações sobre linhas compartilhadas de caches providas por protocolos de coerência de cache. O segundo mecanismo utiliza a Translation Lookaside Buffer (TLB) para detectar quais páginas de memória cada núcleo está acessando. Ambos os mecanismos dependem totalmente do hardware, o que torna o mapeamento de threads transparente aos programadores e permite que ele seja realizado dinamicamente pelo sistema operacional. Além disto, nenhuma tarefa de alta sobrecarga, como simulação, é requerida. As propostas foram avaliadas com o NAS Parallel Benchmarks (NPB), obtendo representações precisas dos padrões de comunicação. Mapeamentos para as threads foram gerados utilizando os padrões de comunicação descobertos e um algoritmo de mapeamento. O problema do mapeamento é NP-Difícil. Portanto, de forma a se atingir uma complexidade polinomial, o algoritmo empregado é heurístico, baseado no algoritmo de emparelhamento de grafos de Edmonds. Executando as aplicações com o mapeamento resultou em um ganho de desempenho de até 15; 3%. O número de faltas na cache, invalidações em linhas de cache e transações de espionagem foram reduzidos em até 31; 9%, 41% e 65; 4%, respectivamente. / The threads of parallel applications cooperate in order to fulfill their tasks, thereby communication is performed among themselves. The communication latency between the cores in a multiprocessor architecture differs depending on the memory hierarchy and the interconnections. With the increase in the number of cores per chip and the number of threads per core, this difference between the communication latencies is increasing. Therefore, it is important to map the threads of parallel applications taking into account the communication between them. In parallel applications based on the shared memory paradigm, the communication is implicit and occurs through accesses to shared variables, which makes difficult to detect the communication pattern between the threads. Traditional approaches use simulation to monitor the memory accesses performed by the application, requiring modifications to the source code and drastically increasing the overhead. In this master thesis, we introduce two novel light-weight mechanisms to find the communication pattern of threads. The first mechanism makes use of the information about shared cache lines provided by cache coherence protocols. The second mechanism makes use of the Translation Lookaside Buffer (TLB) to detect which memory pages each core is accessing. Both our mechanisms rely entirely on hardware features, which makes the thread mapping transparent to the programmer and allows it to be performed dynamically by the operating system. Moreover, no time consuming task, such as simulation, is required. We evaluated our mechanisms with the NAS Parallel Benchmarks (NPB) and obtained accurate representations of the communication patterns. We generated thread mappings from the detected communication patterns using a mapping algorithm. Mapping is a NP-Hard problem. Therefore, in order to achieve a polynomial complexity, we designed a heuristic method based on the Edmonds graph matching algorithm. Running the applications with these mappings resulted in performance improvements of up to 15.3% compared to the original scheduler of the operating system. The number of cache misses, cache line invalidations and snoop transactions were reduced by up to 31.9%, 41% and 65.4%, respectively.
|
85 |
Dynamic detection of the communication pattern in shared memory environments for thread mapping / Detecção dinâmica do padrão de comunicação em ambientes de memória compartilhada para o mapeamento de threadsCruz, Eduardo Henrique Molina da January 2012 (has links)
As threads de aplicações paralelas cooperam a fim de cumprir suas tarefas, dessa forma, comunicação é realizada entre elas. A latência de comunicação entre os núcleos em arquiteturas multiprocessadas diferem dependendo da hierarquia de memória e das interconexões. Com o aumento do número de núcleos por chip e número de threads por núcleo, esta diferença entre as latências de comunicação está aumentando. Portanto, é importante mapear as threads de aplicações paralelas levando em conta a comunicação entre elas. Em aplicações paralelas baseadas no paradigma de memória compartilhada, a comunicação é implícita e ocorre através de acessos à variáveis compartilhadas, o que torna difícil a descoberta do padrão de comunicação entre as threads. Mecanismos tradicionais usam simulação para monitorar os acessos à memória realizados pela aplicação, requerendo modificações no código fonte e aumentando drasticamente a sobrecarga. Nesta dissertação de mestrado, são introduzidos dois mecanismos inovadores com uma baixa sobrecarga para se detectar o padrão de comunicação entre threads. O primeiro mecanismo faz uso de informações sobre linhas compartilhadas de caches providas por protocolos de coerência de cache. O segundo mecanismo utiliza a Translation Lookaside Buffer (TLB) para detectar quais páginas de memória cada núcleo está acessando. Ambos os mecanismos dependem totalmente do hardware, o que torna o mapeamento de threads transparente aos programadores e permite que ele seja realizado dinamicamente pelo sistema operacional. Além disto, nenhuma tarefa de alta sobrecarga, como simulação, é requerida. As propostas foram avaliadas com o NAS Parallel Benchmarks (NPB), obtendo representações precisas dos padrões de comunicação. Mapeamentos para as threads foram gerados utilizando os padrões de comunicação descobertos e um algoritmo de mapeamento. O problema do mapeamento é NP-Difícil. Portanto, de forma a se atingir uma complexidade polinomial, o algoritmo empregado é heurístico, baseado no algoritmo de emparelhamento de grafos de Edmonds. Executando as aplicações com o mapeamento resultou em um ganho de desempenho de até 15; 3%. O número de faltas na cache, invalidações em linhas de cache e transações de espionagem foram reduzidos em até 31; 9%, 41% e 65; 4%, respectivamente. / The threads of parallel applications cooperate in order to fulfill their tasks, thereby communication is performed among themselves. The communication latency between the cores in a multiprocessor architecture differs depending on the memory hierarchy and the interconnections. With the increase in the number of cores per chip and the number of threads per core, this difference between the communication latencies is increasing. Therefore, it is important to map the threads of parallel applications taking into account the communication between them. In parallel applications based on the shared memory paradigm, the communication is implicit and occurs through accesses to shared variables, which makes difficult to detect the communication pattern between the threads. Traditional approaches use simulation to monitor the memory accesses performed by the application, requiring modifications to the source code and drastically increasing the overhead. In this master thesis, we introduce two novel light-weight mechanisms to find the communication pattern of threads. The first mechanism makes use of the information about shared cache lines provided by cache coherence protocols. The second mechanism makes use of the Translation Lookaside Buffer (TLB) to detect which memory pages each core is accessing. Both our mechanisms rely entirely on hardware features, which makes the thread mapping transparent to the programmer and allows it to be performed dynamically by the operating system. Moreover, no time consuming task, such as simulation, is required. We evaluated our mechanisms with the NAS Parallel Benchmarks (NPB) and obtained accurate representations of the communication patterns. We generated thread mappings from the detected communication patterns using a mapping algorithm. Mapping is a NP-Hard problem. Therefore, in order to achieve a polynomial complexity, we designed a heuristic method based on the Edmonds graph matching algorithm. Running the applications with these mappings resulted in performance improvements of up to 15.3% compared to the original scheduler of the operating system. The number of cache misses, cache line invalidations and snoop transactions were reduced by up to 31.9%, 41% and 65.4%, respectively.
|
86 |
Teste de programas concorrentes com memória compartilhada / Test of shared memory concurrent programsFelipe Santos Sarmanho 13 April 2009 (has links)
Este trabalho propõe um modelo de teste para programas concorrentes que utilizam memória compartilhada. O modelo é inovador em três aspectos principais: (1) tratar a sincronização e a comunicação de threads de forma separada, (2) considerar a sincronização decorrente da inicialização/finalização de threads, e (3) apresenta um método baseado em timestamps para determinar as comunicações exercitadas em uma dada execução do programa. Os critérios de cobertura existentes para programas concorrentes foram adaptados ao contexto de programas baseados no paradigma de memória compartilhada. A ferramenta chamada ValiPThread foi implementada neste trabalho para apoiar a aplicação do modelo e dos critérios definidos. Com essa ferramenta é possível criar sessões de teste que podem ser salvas, interrompidas e retomadas a qualquer momento. Também é possível adicionar e executar casos de teste, avaliando a cobertura do código fonte em relação aos critérios de teste. A implementação da ferramenta mostra que é possível instanciar o modelo proposto em um software que auxilie a atividade de teste no contexto de programas com memória compartilhada. O trabalho apresenta soluções significativas para os principais desafios impostos pela programação concorrente para a atividade de teste, destacando-se dentre eles: (1) desenvolvimento de novas técnicas de análise estática para analisar programas concorrentes no contexto de memória compartilhada; (2) testar aspectos cruciais à programação concorrente como: sincronização, comunicação e fluxo de dados; (3) reproduzir uma execução de maneira controlada; (4) mapeamento de critérios de teste já existentes para programas concorrentes com passagem de mensagem para o contexto de memória compartilhada; (5) projetar critérios de fluxo de dados para programas concorrentes, considerando variáveis compartilhadas e (6) desenvolvimento de uma ferramenta de apoio a essas atividades / This work presents a novel test model for shared memory concurrent programs. Some important new features in this model are: (1) analysis the communication and synchronization in an isolated manner, (2) examines the synchronization due the start and the finish of threads, and (3) employs a method based on timestamps to check the communication exercised for an execution of the program. The coverage criteria set defined for concurrent programs was adjusted to the shared memory programs. In this work, the tool ValiPThread was implemented to support the application of the test model and of the coverage criteria. This tool allows to create a test session that can be saved, stopped and resumed at any time. In this tool is also possible to add and to execute test cases, analyzing the source code coverage with respect to the coverage criteria. The new tool shows that is possible instance the proposed model in a software that supports the test activity in context of shared memory. This work presents solutions for the major challenges related to task of to test concurrent programs, such as: (1) develop new techniques to do static analyzes of shared memory programs; (2) test important aspects to concurrent programs such as like: synchronization, communication and data flow; (3) replay and deterministic re-execution; (4) adjust coverage criteria from the context of message passing concurrent programs to shared memory context; (5) design data flow criteria based on shared variables and (6) build a tool to support these activities
|
87 |
Performance Prediction of Parallel Programs in a Linux EnvironmentFarooq, Mohammad Habibur Rahman & Qaisar January 2010 (has links)
Context. Today’s parallel systems are widely used in different computational tasks. Developing parallel programs to make maximum use of the computing power of parallel systems is tricky and efficient tuning of parallel programs is often very hard. Objectives. In this study we present a performance prediction and visualization tool named VPPB for a Linux environment, which had already been introduced by Broberg et.al, [1] for a Solaris2.x environment. VPPB shows the predicted behavior of a multithreaded program using any number of processors and the behavior is shown on two different graphs. The prediction is based on a monitored uni-processor execution. Methods. An experimental evaluation was carried out to validate the prediction reliability of the developed tool. Results. Validation of prediction is conducted, using an Intel multiprocessor with 8 processors and PARSEC 2.0 benchmark suite application programs. The validation shows that the speed-up predictions are +/-7% of a real execution. Conclusions. The experimentation of the VPPB tool showed that the prediction of VPPB is reliable and the incurred overhead into the application programs is low. / contact: +46(0)736368336
|
88 |
Programmation haute performance pour architectures hybrides / High Performance Programming for Hybrid ArchitecturesHabel, Rachid 19 November 2014 (has links)
Les architectures parallèles hybrides constituées d'un grand nombre de noeuds de calcul multi-coeurs/GPU connectés en réseau offrent des performances théoriques très élevées, de l'ordre de quelque dizaines de TeraFlops. Mais la programmation efficace de ces machines reste un défi à cause de la complexité de l'architecture et de la multiplication des modèles de programmation utilisés. L'objectif de cette thèse est d'améliorer la programmation des applications scientifiques denses sur les architectures parallèles hybrides selon trois axes: réduction des temps d'exécution, traitement de données de très grande taille et facilité de programmation. Nous avons pour cela proposé un modèle de programmation à base de directives appelé DSTEP pour exprimer à la fois la distribution des données et des calculs. Dans ce modèle, plusieurs types de distribution de données sont exprimables de façon unifiée à l'aide d'une directive "dstep distribute" et une réplication de certains éléments distribués peut être exprimée par un "halo". La directive "dstep gridify" exprime à la fois la distribution des calculs ainsi que leurs contraintes d'ordonnancement. Nous avons ensuite défini un modèle de distribution et montré la correction de la transformation de code du domaine séquentiel au domaine distribué. À partir du modèle de distribution, nous avons dérivé un schéma de compilation pour la transformation de programmes annotés de directives DSTEP en des programmes parallèles hybrides. Nous avons implémenté notre solution sous la forme d'un compilateur intégré à la plateforme de compilation PIPS ainsi qu'une bibliothèque fournissant les fonctionnalités du support d'exécution, notamment les communications. Notre solution a été validée sur des programmes de calcul scientifiques standards tirés des NAS Parallel Benchmarks et des Polybenchs ainsi que sur une application industrielle. / Clusters of multicore/GPU nodes connected with a fast network offer very high therotical peak performances, reaching tens of TeraFlops. Unfortunately, the efficient programing of such architectures remains challenging because of their complexity and the diversity of the existing programming models. The purpose of this thesis is to improve the programmability of dense scientific applications on hybrid architectures in three ways: reducing the execution times, processing larger data sets and reducing the programming effort. We propose DSTEP, a directive-based programming model expressing both data and computation distribution. A large set of distribution types are unified in a "dstep distribute" directive and the replication of some distributed elements can be expressed using a "halo". The "dstep gridify" directive expresses both the computation distribution and the schedule constraints of loop iterations. We define a distribution model and demonstrate the correctness of the code transformation from the sequential domain to the parallel domain. From the distribution model, we derive a generic compilation scheme transforming DSTEP annotated input programs into parallel hybrid ones. We have implemented such a tool as a compiler integrated to the PIPS compilation workbench together with a library offering the runtime functionality, especially the communication. Our solution is validated on scientific programs from the NAS Parallel Benchmarks and the PolyBenchs as well as on an industrial signal procesing application.
|
89 |
Directory scalability in multi-agent based systemsHussain, Shahid, Shabbir, Hassan January 2008 (has links)
Simulation is one approach to analyze and model the real world complex problems. Multi-agent based systems provide a platform to develop simulations based on the concept of agent-oriented programming. In multi-agent systems, the local interaction between agents contributes to the emergence of the global phenomena by getting the result of the simulation runs. In MABS systems, interaction is one common aspect for all agents to perform their tasks. To interact with each other the agents require yellow page services from the platform to search for other agents. As more and more agents perform searches on this yellow page directory, there is a decrease in the performance due to a central bottleneck. In this thesis, we have investigated multiple solutions for this problem. The most promising solution is to integrate distributed shared memory with the directory systems. With our proposed solution, empirical analysis shows a statistically significant increase in performance of the directory service. We expect this result to make a considerable contribution to the state of the art in multiagent platforms.
|
90 |
Some Theoretical Contributions To The Mutual Exclusion ProblemAlagarsamy, K 04 1900 (has links) (PDF)
No description available.
|
Page generated in 0.0485 seconds