• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 74
  • 16
  • 7
  • 5
  • 3
  • 3
  • 2
  • 1
  • 1
  • 1
  • 1
  • 1
  • Tagged with
  • 146
  • 146
  • 57
  • 23
  • 21
  • 21
  • 19
  • 19
  • 19
  • 17
  • 16
  • 16
  • 15
  • 15
  • 14
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
111

Analyzing hybrid architectures for massively parallel graph analysis

Ediger, David 08 April 2013 (has links)
The quantity of rich, semi-structured data generated by sensor networks, scientific simulation, business activity, and the Internet grows daily. The objective of this research is to investigate architectural requirements for emerging applications in massive graph analysis. Using emerging hybrid systems, we will map applications to architectures and close the loop between software and hardware design in this application space. Parallel algorithms and specialized machine architectures are necessary to handle the immense size and rate of change of today's graph data. To highlight the impact of this work, we describe a number of relevant application areas ranging from biology to business and cybersecurity. With several proposed architectures for massively parallel graph analysis, we investigate the interplay of hardware, algorithm, data, and programming model through real-world experiments and simulations. We demonstrate techniques for obtaining parallel scaling on multithreaded systems using graph algorithms that are orders of magnitude faster and larger than the state of the art. The outcome of this work is a proposed hybrid architecture for massive-scale analytics that leverages key aspects of data-parallel and highly multithreaded systems. In simulations, the hybrid systems incorporating a mix of multithreaded, shared memory systems and solid state disks performed up to twice as fast as either homogeneous system alone on graphs with as many as 18 trillion edges.
112

Algorithms for large graphs

Das Sarma, Atish 01 July 2010 (has links)
No description available.
113

Objektų sekimo vaizde algoritmų įgyvendinimo LPLM įrenginiu tyrimas / Investigation of Object Tracking Algorithms Based on FPGA

Sledevič, Tomyslav 26 July 2012 (has links)
Magistro baigiamojo darbo tikslas – įgyvendinti realiuoju laiku veikiančius objektų sekimo vaizde algoritmus lauku programuojamų loginių matricų įrenginyje (LPLM) ir ištirti šių algoritmų veikimą. Iškelti uždaviniai pasiekti 3 etapais. Atlikta analitinė objektų sekimo vaizde literatūros apžvalga, išanalizuoti objektų sekimo vaizde algoritmai bei jų įgyvendinimo galimybės LPLM įrenginiuose. Sukurti algoritmai ir programos įgyvendintos viename ir keliuose LPLM įrenginiuose (sinchroniškai) taikant VHDL programavimo kalbą ir veikia realiu laiku. Atlikti sukurtų algoritmų tyrimai ir gautų rezultatų analizė. Ištirtas objektų sekimo stabilumas keičiant apšviestumo lygį, fono sudėtingumą, objekto spalvą, judesio greitį, atstumą iki kameros ir posūkio kampą. Darbo apimtis – 69 psl. teksto be priedų, 72 iliustr., 70 bibliografinių šaltinių, 3 priedai. / The aim of master’s thesis is to investigate the object tracking methods and implement the object tracking algorithms in field programmable gate array (FPGA) devices for real-time execution. The aim is achieved by performing 3 tasks. The analytical review of object tracking methods is performed, reviewing the abilities of algorithms implementation on FPGAs. The object tracking algorithms are implemented in VHDL and distributed on one and few FPGA chips in parallel and works in real-time. The implemented algorithms are investigated and results are analyzed. The stability of different object tracking is investigated by changing the illumination, background complexity, object color, moving velocity, distance to camera and rotation angle. Thesis consists of: 69 p. text without appendixes, 72 figures, 70 bibliographical entries, 3 appendixes included.
114

A Complexity Theory for VLSI

Thompson, C. D. 01 August 1980 (has links)
The established methodologies for studying computational complexity can be applied to the new problems posed by very large-scale integrated (VLSI) circuits. This thesis develops a ''VLSI model of computation'' and derives upper and lower bounds on the silicon area and time required to solve the problems of sorting and discrete Fourier transformation. In particular, the area A and time T taken by any VLSI chip using any algorithm to perform an N-point Fourier transform must satisfy AT2 ≥ c N2 log2 N, for some fixed c > 0. A more general result for both sorting and Fourier transformation is that AT2x = Ω(N1 + x log2x N) for any x in the range 0 < x < 1. Also, the energy dissipated by a VLSI chip during the solution of either of these problems is at least Ω(N3/2 log N). The tightness of these bounds is demonstrated by the existence of nearly optimal circuits for both sorting and Fourier transformation. The circuits based on the shuffle-exchange interconnection pattern are fast but large: T = O(log2 N) for Fourier transformation, T = O(log3 N) for sorting; both have area A of at most O(N2 / log1/2 N). The circuits based on the mesh interconnection pattern are slow but small: T = O(N1/2 loglog N), A = O(N log2 N).
115

High performance computing for irregular algorithms and applications with an emphasis on big data analytics

Green, Oded 22 May 2014 (has links)
Irregular algorithms such as graph algorithms, sorting, and sparse matrix multiplication, present numerous programming challenges, including scalability, load balancing, and efficient memory utilization. In this age of Big Data we face additional challenges since the data is often streaming at a high velocity and we wish to make near real-time decisions for real-world events. For instance, we may wish to track Twitter for the pandemic spread of a virus. Analyzing such data sets requires combing algorithmic optimizations and utilization of massively multithreaded architectures, accelerator such as GPUs, and distributed systems. My research focuses upon designing new analytics and algorithms for the continuous monitoring of dynamic social networks. Achieving high performance computing for irregular algorithms such as Social Network Analysis (SNA) is challenging as the instruction flow is highly data dependent and requires domain expertise. The rapid changes in the underlying network necessitates understanding real-world graph properties such as the small world property, shrinking network diameter, power law distribution of edges, and the rate at which updates occur. These properties, with respect to a given analytic, can help design load-balancing techniques, avoid wasteful (redundant) computations, and create streaming algorithms. In the course of my research I have considered several parallel programming paradigms for a wide range systems of multithreaded platforms: x86, NVIDIA's CUDA, Cray XMT2, SSE-SIMD, and Plurality's HyperCore. These unique programming models require examination of the parallel programming at multiple levels: algorithmic design, cache efficiency, fine-grain parallelism, memory bandwidths, data management, load balancing, scheduling, control flow models and more. This thesis deals with these issues and more.
116

Fast Algorithms for Mining Co-evolving Time Series

Li, Lei 01 September 2011 (has links)
Time series data arise in many applications, from motion capture, environmental monitoring, temperatures in data centers, to physiological signals in health care. In the thesis, I will focus on the theme of learning and mining large collections of co-evolving sequences, with the goal of developing fast algorithms for finding patterns, summarization, and anomalies. In particular, this thesis will answer the following recurring challenges for time series: 1. Forecasting and imputation: How to do forecasting and to recover missing values in time series data? 2. Pattern discovery and summarization: How to identify the patterns in the time sequences that would facilitate further mining tasks such as compression, segmentation and anomaly detection? 3. Similarity and feature extraction: How to extract compact and meaningful features from multiple co-evolving sequences that will enable better clustering and similarity queries of time series? 4. Scale up: How to handle large data sets on modern computing hardware? We develop models to mine time series with missing values, to extract compact representation from time sequences, to segment the sequences, and to do forecasting. For large scale data, we propose algorithms for learning time series models, in particular, including Linear Dynamical Systems (LDS) and Hidden Markov Models (HMM). We also develop a distributed algorithm for finding patterns in large web-click streams. Our thesis will present special models and algorithms that incorporate domain knowledge. For motion capture, we will describe the natural motion stitching and occlusion filling for human motion. In particular, we provide a metric for evaluating the naturalness of motion stitching, based which we choose the best stitching. Thanks to domain knowledge (body structure and bone lengths), our algorithm is capable of recovering occlusions in mocap sequences, better in accuracy and longer in missing period. We also develop an algorithm for forecasting thermal conditions in a warehouse-sized data center. The forecast will help us control and manage the data center in a energy-efficient way, which can save a significant percentage of electric power consumption in data centers.
117

Analysis of synchronizations in greedy-scheduled executions and applications to efficient generation of pseudorandom numbers in parallel / Análise de sincronizações em execuções por escalonamento guloso e aplicações para geração eficiente de números pseudoaleatórios em paralelo / Analyse des synchronisations dans un programme parallèle ordonnancé par vol de travail applications à la génération déterministe de nombres pseudo-aléatoires

Mor, Stefano Drimon Kurz January 2015 (has links)
Nous présentons deux contributions dans le domaine de la programmation parallèle. La première est théorique : nous introduisons l’analyse SIPS, une approche nouvelle pour dénombrer le nombre d’opérations de synchronisation durant l’exécution d’un algorithme parallèle ordonnancé par vol de travail. Basée sur le concept d’horloges logiques, elle nous permet : d’une part de donner de nouvelles majorations de coût en moyenne; d’autre part de concevoir des programmes parallèles plus efficaces par adaptation dynamique de la granularité. La seconde contribution est pragmatique : nous présentons une parallélisation générique d’algorithmes pour la génération déterministe de nombres pseudo-aléatoires, indépendamment du nombre de processus concurrents lors de l’exécution. Alternative à l’utilisation d’un générateur pseudo-aléatoire séquentiel par processus, nous introduisons une API générique, appelée Par-R qui est conçue et analysée grâce à SIPS. Sa caractéristique principale est d’exploiter un générateur séquentiel qui peut “sauter” directement d’un nombre à un autre situé à une distance arbitraire dans la séquence pseudo-aléatoire. Grâce à l’analyse SIPS, nous montrons qu’en moyenne, lors d’une exécution par vol de travail d’un programme très parallèle (dont la profondeur ou chemin critique est très petite devant le travail ou nombre d’opérations), ces opérations de saut sont rares. Par-R est comparé au générateur pseudo-aléatoire DotMix écrit pour Cilk Plus, une extension de C/C++ pour la programmation parallèle par vol de travail. Le surcout théorique de Par-R se compare favorablement au surcoput de DotMix, ce qui apparait aussi expériemntalement. De plus, étant générique, Par-R est indépendant du générateur séquentiel sous-jacent. / Nós apresentamos duas contribuições para a área de programação paralela. A primeira contribuição é teórica: nós introduzimos a análise SIPS, uma nova abordagem para a estimar o número de sincronizações realizadas durante a execução de um algoritmo paralelo. SIPS generaliza o conceito de relógios lógicos para contar o número de sincronizações realizadas por um algoritmo paralelo e é capaz de calcular limites do pior caso mesmo na presença de execuções paralelas não-determinísticas, as quais não são geralmente cobertas por análises no estado-da-arte. Nossa análise nos permite estimar novos limites de pior caso para computações escalonadas pelo popular algoritmo de roubo de tarefas e também projetar programas paralelos e adaptáveis que são mais eficientes. A segunda contribuição é pragmática: nós apresentamos uma estratégia de paralelização eficiente para a geração de números pseudoaleatórios. Como uma alternativa para implementações fixas de componentes de geração aleatória nós introduzimos uma API chamada Par-R, projetada e analisada utilizando-se SIPS. Sua principal idea é o uso da capacidade de um gerador sequencial R de realizar um “pulo” eficiente dentro do fluxo de números gerados; nós os associamos a operações realizadas pelo escalonador por roubo de tarefas, o qual nossa análise baseada em SIPS demonstra ocorrer raramente em média. Par-R é comparado com o gerador paralelo de números pseudoaleatórios DotMix, escrito para a plataforma de multithreading dinâmico Cilk Plus. A latência de Par-R tem comparação favorável à latência do DotMix, o que é confirmado experimentalmente, mas não requer o uso subjacente fixado de um dado gerador aleatório. / We present two contributions to the field of parallel programming. The first contribution is theoretical: we introduce SIPS analysis, a novel approach to estimate the number of synchronizations performed during the execution of a parallel algorithm. Based on the concept of logical clocks, it allows us: on one hand, to deliver new bounds for the number of synchronizations, in expectation; on the other hand, to design more efficient parallel programs by dynamic adaptation of the granularity. The second contribution is pragmatic: we present an efficient parallelization strategy for pseudorandom number generation, independent of the number of concurrent processes participating in a computation. As an alternative to the use of one sequential generator per process, we introduce a generic API called Par-R, which is designed and analyzed using SIPS. Its main characteristic is the use of a sequential generator that can perform a “jump-ahead” directly from one number to another on an arbitrary distance within the pseudorandom sequence. Thanks to SIPS, we show that, in expectation, within an execution scheduled by work stealing of a “very parallel” program (whose depth or critical path is subtle when compared to the work or number of operations), these operations are rare. Par-R is compared with the parallel pseudorandom number generator DotMix, written for the Cilk Plus dynamic multithreading platform. The theoretical overhead of Par-R compares favorably to DotMix’s overhead, what is confirmed experimentally, while not requiring a fixed generator underneath.
118

Analysis of synchronizations in greedy-scheduled executions and applications to efficient generation of pseudorandom numbers in parallel / Análise de sincronizações em execuções por escalonamento guloso e aplicações para geração eficiente de números pseudoaleatórios em paralelo / Analyse des synchronisations dans un programme parallèle ordonnancé par vol de travail applications à la génération déterministe de nombres pseudo-aléatoires

Mor, Stefano Drimon Kurz January 2015 (has links)
Nous présentons deux contributions dans le domaine de la programmation parallèle. La première est théorique : nous introduisons l’analyse SIPS, une approche nouvelle pour dénombrer le nombre d’opérations de synchronisation durant l’exécution d’un algorithme parallèle ordonnancé par vol de travail. Basée sur le concept d’horloges logiques, elle nous permet : d’une part de donner de nouvelles majorations de coût en moyenne; d’autre part de concevoir des programmes parallèles plus efficaces par adaptation dynamique de la granularité. La seconde contribution est pragmatique : nous présentons une parallélisation générique d’algorithmes pour la génération déterministe de nombres pseudo-aléatoires, indépendamment du nombre de processus concurrents lors de l’exécution. Alternative à l’utilisation d’un générateur pseudo-aléatoire séquentiel par processus, nous introduisons une API générique, appelée Par-R qui est conçue et analysée grâce à SIPS. Sa caractéristique principale est d’exploiter un générateur séquentiel qui peut “sauter” directement d’un nombre à un autre situé à une distance arbitraire dans la séquence pseudo-aléatoire. Grâce à l’analyse SIPS, nous montrons qu’en moyenne, lors d’une exécution par vol de travail d’un programme très parallèle (dont la profondeur ou chemin critique est très petite devant le travail ou nombre d’opérations), ces opérations de saut sont rares. Par-R est comparé au générateur pseudo-aléatoire DotMix écrit pour Cilk Plus, une extension de C/C++ pour la programmation parallèle par vol de travail. Le surcout théorique de Par-R se compare favorablement au surcoput de DotMix, ce qui apparait aussi expériemntalement. De plus, étant générique, Par-R est indépendant du générateur séquentiel sous-jacent. / Nós apresentamos duas contribuições para a área de programação paralela. A primeira contribuição é teórica: nós introduzimos a análise SIPS, uma nova abordagem para a estimar o número de sincronizações realizadas durante a execução de um algoritmo paralelo. SIPS generaliza o conceito de relógios lógicos para contar o número de sincronizações realizadas por um algoritmo paralelo e é capaz de calcular limites do pior caso mesmo na presença de execuções paralelas não-determinísticas, as quais não são geralmente cobertas por análises no estado-da-arte. Nossa análise nos permite estimar novos limites de pior caso para computações escalonadas pelo popular algoritmo de roubo de tarefas e também projetar programas paralelos e adaptáveis que são mais eficientes. A segunda contribuição é pragmática: nós apresentamos uma estratégia de paralelização eficiente para a geração de números pseudoaleatórios. Como uma alternativa para implementações fixas de componentes de geração aleatória nós introduzimos uma API chamada Par-R, projetada e analisada utilizando-se SIPS. Sua principal idea é o uso da capacidade de um gerador sequencial R de realizar um “pulo” eficiente dentro do fluxo de números gerados; nós os associamos a operações realizadas pelo escalonador por roubo de tarefas, o qual nossa análise baseada em SIPS demonstra ocorrer raramente em média. Par-R é comparado com o gerador paralelo de números pseudoaleatórios DotMix, escrito para a plataforma de multithreading dinâmico Cilk Plus. A latência de Par-R tem comparação favorável à latência do DotMix, o que é confirmado experimentalmente, mas não requer o uso subjacente fixado de um dado gerador aleatório. / We present two contributions to the field of parallel programming. The first contribution is theoretical: we introduce SIPS analysis, a novel approach to estimate the number of synchronizations performed during the execution of a parallel algorithm. Based on the concept of logical clocks, it allows us: on one hand, to deliver new bounds for the number of synchronizations, in expectation; on the other hand, to design more efficient parallel programs by dynamic adaptation of the granularity. The second contribution is pragmatic: we present an efficient parallelization strategy for pseudorandom number generation, independent of the number of concurrent processes participating in a computation. As an alternative to the use of one sequential generator per process, we introduce a generic API called Par-R, which is designed and analyzed using SIPS. Its main characteristic is the use of a sequential generator that can perform a “jump-ahead” directly from one number to another on an arbitrary distance within the pseudorandom sequence. Thanks to SIPS, we show that, in expectation, within an execution scheduled by work stealing of a “very parallel” program (whose depth or critical path is subtle when compared to the work or number of operations), these operations are rare. Par-R is compared with the parallel pseudorandom number generator DotMix, written for the Cilk Plus dynamic multithreading platform. The theoretical overhead of Par-R compares favorably to DotMix’s overhead, what is confirmed experimentally, while not requiring a fixed generator underneath.
119

Calcul haute performance pour la simulation d'interactions fluide-structure / High performance computing for the simulation of fluid-structure interactions

Partimbene, Vincent 25 April 2018 (has links)
Cette thèse aborde la résolution des problèmes d'interaction fluide-structure par un algorithme consistant en un couplage entre deux solveurs : un pour le fluide et un pour la structure. Pour assurer la cohérence entre les maillages fluide et structure, on considère également une discrétisation de chaque domaine par volumes finis. En raison des difficultés de décomposition du domaine en sous-domaines, nous considérons pour chaque environnement un algorithme parallèle de multi-splitting (ou multi-décomposition) qui correspond à une présentation unifiée des méthodes de sous-domaines avec ou sans recouvrement. Cette méthode combine plusieurs applications de points fixes contractantes et nous montrons que, sous des hypothèses appropriées, chaque application de points fixes est contractante dans des espaces de dimensions finies normés par des normes hilbertiennes et non-hilbertiennes. De plus, nous montrons qu'une telle étude est valable pour les résolutions parallèles synchrones et plus généralement asynchrones de grands systèmes linéaires apparaissant lors de la discrétisation des problèmes d'interaction fluide-structure et peut être étendue au cas où le déplacement de la structure est soumis à des contraintes. Par ailleurs, nous pouvons également considérer l’analyse de la convergence de ces méthodes de multi-splitting parallèles asynchrones par des techniques d’ordre partiel, lié au principe du maximum discret, aussi bien dans le cadre linéaire que dans celui obtenu lorsque les déplacements de la structure sont soumis à des contraintes. Nous réalisons des simulations parallèles pour divers cas test fluide-structure sur différents clusters, en considérant des communications bloquantes et non bloquantes. Dans ce dernier cas nous avons eu à résoudre une difficulté d'implémentation dans la mesure où une erreur irrécupérable survenait lors de l'exécution ; cette difficulté a été levée par introduction d’une méthode assurant la terminaison de toutes les communications non bloquantes avant la mise à jour du maillage. Les performances des simulations parallèles sont présentées et analysées. Enfin, nous appliquons la méthodologie présentée précédemment à divers contextes d'interaction fluide-structure de type industriel sur des maillages non structurés, ce qui constitue une difficulté supplémentaire. / This thesis deals with the solution of fluid-structure interaction problems by an algorithm consisting in the coupling between two solvers: one for the fluid and one for the structure. In order to ensure the consistency between fluid and structure meshes, we also consider a discretization of each domain by finite volumes. Due to the difficulties of decomposing the domain into sub-domains, we consider a parallel multi-splitting algorithm for each environment which represents a unified presentation of sub-domain methods with or without overlapping. This method combines several contracting fixed point mappings and we show that, under appropriate assumptions, each fixed point mapping is contracting in finite dimensional spaces normalized by Hilbertian and non-Hilbertian norms. In addition, we show that such a study is valid for synchronous parallel solutions and more generally asynchronous of large linear systems arising from the discretization of fluidstructure interaction problems and can be extended to cases where the displacement of the structure is subject to constraints. Moreover, we can also consider the analysis of the convergence of these asynchronous parallel multi-splitting methods by partial ordering techniques, linked to the discrete maximum principle, both in the linear frame and in the one obtained when the structure's displacements are subjected to constraints. We carry out parallel simulations for various fluidstructure test cases on different clusters considering blocking and non-blocking communications. In the latter case, we had to solve an implementation problem due to the fact that an unrecoverable error occurred during execution; this issue has been overcome by introducing a method to ensure the termination of all non-blocking communications prior to the mesh update. Performances of parallel simulations are presented ans analyzed. Finally, we apply the methodology presented above to various fluid-structure interaction cases on unstructured meshes, which represents an additional difficulty.
120

Analysis of synchronizations in greedy-scheduled executions and applications to efficient generation of pseudorandom numbers in parallel / Análise de sincronizações em execuções por escalonamento guloso e aplicações para geração eficiente de números pseudoaleatórios em paralelo / Analyse des synchronisations dans un programme parallèle ordonnancé par vol de travail applications à la génération déterministe de nombres pseudo-aléatoires

Mor, Stefano Drimon Kurz January 2015 (has links)
Nous présentons deux contributions dans le domaine de la programmation parallèle. La première est théorique : nous introduisons l’analyse SIPS, une approche nouvelle pour dénombrer le nombre d’opérations de synchronisation durant l’exécution d’un algorithme parallèle ordonnancé par vol de travail. Basée sur le concept d’horloges logiques, elle nous permet : d’une part de donner de nouvelles majorations de coût en moyenne; d’autre part de concevoir des programmes parallèles plus efficaces par adaptation dynamique de la granularité. La seconde contribution est pragmatique : nous présentons une parallélisation générique d’algorithmes pour la génération déterministe de nombres pseudo-aléatoires, indépendamment du nombre de processus concurrents lors de l’exécution. Alternative à l’utilisation d’un générateur pseudo-aléatoire séquentiel par processus, nous introduisons une API générique, appelée Par-R qui est conçue et analysée grâce à SIPS. Sa caractéristique principale est d’exploiter un générateur séquentiel qui peut “sauter” directement d’un nombre à un autre situé à une distance arbitraire dans la séquence pseudo-aléatoire. Grâce à l’analyse SIPS, nous montrons qu’en moyenne, lors d’une exécution par vol de travail d’un programme très parallèle (dont la profondeur ou chemin critique est très petite devant le travail ou nombre d’opérations), ces opérations de saut sont rares. Par-R est comparé au générateur pseudo-aléatoire DotMix écrit pour Cilk Plus, une extension de C/C++ pour la programmation parallèle par vol de travail. Le surcout théorique de Par-R se compare favorablement au surcoput de DotMix, ce qui apparait aussi expériemntalement. De plus, étant générique, Par-R est indépendant du générateur séquentiel sous-jacent. / Nós apresentamos duas contribuições para a área de programação paralela. A primeira contribuição é teórica: nós introduzimos a análise SIPS, uma nova abordagem para a estimar o número de sincronizações realizadas durante a execução de um algoritmo paralelo. SIPS generaliza o conceito de relógios lógicos para contar o número de sincronizações realizadas por um algoritmo paralelo e é capaz de calcular limites do pior caso mesmo na presença de execuções paralelas não-determinísticas, as quais não são geralmente cobertas por análises no estado-da-arte. Nossa análise nos permite estimar novos limites de pior caso para computações escalonadas pelo popular algoritmo de roubo de tarefas e também projetar programas paralelos e adaptáveis que são mais eficientes. A segunda contribuição é pragmática: nós apresentamos uma estratégia de paralelização eficiente para a geração de números pseudoaleatórios. Como uma alternativa para implementações fixas de componentes de geração aleatória nós introduzimos uma API chamada Par-R, projetada e analisada utilizando-se SIPS. Sua principal idea é o uso da capacidade de um gerador sequencial R de realizar um “pulo” eficiente dentro do fluxo de números gerados; nós os associamos a operações realizadas pelo escalonador por roubo de tarefas, o qual nossa análise baseada em SIPS demonstra ocorrer raramente em média. Par-R é comparado com o gerador paralelo de números pseudoaleatórios DotMix, escrito para a plataforma de multithreading dinâmico Cilk Plus. A latência de Par-R tem comparação favorável à latência do DotMix, o que é confirmado experimentalmente, mas não requer o uso subjacente fixado de um dado gerador aleatório. / We present two contributions to the field of parallel programming. The first contribution is theoretical: we introduce SIPS analysis, a novel approach to estimate the number of synchronizations performed during the execution of a parallel algorithm. Based on the concept of logical clocks, it allows us: on one hand, to deliver new bounds for the number of synchronizations, in expectation; on the other hand, to design more efficient parallel programs by dynamic adaptation of the granularity. The second contribution is pragmatic: we present an efficient parallelization strategy for pseudorandom number generation, independent of the number of concurrent processes participating in a computation. As an alternative to the use of one sequential generator per process, we introduce a generic API called Par-R, which is designed and analyzed using SIPS. Its main characteristic is the use of a sequential generator that can perform a “jump-ahead” directly from one number to another on an arbitrary distance within the pseudorandom sequence. Thanks to SIPS, we show that, in expectation, within an execution scheduled by work stealing of a “very parallel” program (whose depth or critical path is subtle when compared to the work or number of operations), these operations are rare. Par-R is compared with the parallel pseudorandom number generator DotMix, written for the Cilk Plus dynamic multithreading platform. The theoretical overhead of Par-R compares favorably to DotMix’s overhead, what is confirmed experimentally, while not requiring a fixed generator underneath.

Page generated in 0.0878 seconds