Global ETD Search

71	Kinetic Monte Carlo simulations of submonolayer and multilayer epitaxial growth over extended time- and length-scales Giridhar, Nandipati 23 September 2009 (has links) No description available. Physics kinetic Monte Carlo surface physics thin film growth parallel algorithms first passage time simulations
72	Parallel Algorithms for Switching Edges and Generating Random Graphs from Given Degree Sequences using HPC Platforms Bhuiyan, Md Hasanuzzaman 09 November 2017 (has links) Networks (or graphs) are an effective abstraction for representing many real-world complex systems. Analyzing the structural properties of such networks, and the dynamics on them, reveals valuable insights into the behavior of these systems. In today's data-rich world, we are deluged by massive amounts of heterogeneous data from sources such as the web, infrastructure, and online social media. Analyzing this data may take prohibitively long, and the data may not even fit into the main memory of a single processing unit, which motivates efficient parallel algorithms on various high-performance computing (HPC) platforms. In this dissertation, we present distributed and shared memory parallel algorithms for several important network analytic problems. First, we present distributed memory parallel algorithms for switching edges in a network. An edge switch is an operation on a network in which two edges are selected at random and an end vertex of one is swapped with an end vertex of the other. This operation is repeated either a given number of times or until a specified criterion is satisfied. It has diverse real-world applications, such as generating simple random networks with a given degree sequence and modeling and studying various dynamic networks. One of the steps in our edge switch algorithm requires generating multinomial random variables in parallel; we also present the first non-trivial parallel algorithm for generating multinomial random variables. Next, we present efficient algorithms for assortative edge switch in a labeled network.
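To see where multinomial random variables arise, suppose the total number of switches is divided among processors in proportion to how many edges each one holds (an assumption for illustration; the abstract does not give the distribution scheme). The sketch below draws such a split sequentially with the standard conditional-binomial decomposition of a multinomial. It is not the dissertation's parallel algorithm, and `binomial` and `split_switches` are hypothetical names.

```python
import random


def binomial(n: int, p: float, rng: random.Random) -> int:
    """Draw Binomial(n, p) as a sum of n Bernoulli(p) trials (simple but O(n))."""
    return sum(1 for _ in range(n) if rng.random() < p)


def split_switches(n_switches: int, edge_counts: list[int], seed: int = 0) -> list[int]:
    """Multinomial split of n_switches over processors, with probabilities
    proportional to each processor's (hypothetical) edge count."""
    rng = random.Random(seed)
    counts = []
    remaining_n = n_switches
    remaining_w = sum(edge_counts)
    for w in edge_counts[:-1]:
        # P(category i | not one of the earlier categories) = w_i / sum_{j >= i} w_j
        p = w / remaining_w if remaining_w > 0 else 0.0
        x = binomial(remaining_n, p, rng)
        counts.append(x)
        remaining_n -= x
        remaining_w -= w
    counts.append(remaining_n)  # the last category receives whatever is left
    return counts


print(split_switches(1000, [10, 0, 30, 60], seed=1))
```

Drawing each binomial as a sum of Bernoulli trials keeps the sketch dependency-free but costs O(n) per draw; a production version would use a faster binomial sampler.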
Assuming each vertex has a label, an assortative edge switch operation imposes an extra constraint: two edges are randomly selected, and an end vertex of one is swapped with an end vertex of the other only if the labels of the end vertices of each edge remain the same as before. It can be used to study the effect of network structural properties on the dynamics over a network. Although assortative edge switch seems similar to (regular) edge switch, the constraint on vertex labels creates a new difficulty that requires an entirely new algorithmic approach. We first present a novel sequential algorithm for assortative edge switch and then an efficient distributed memory parallel algorithm based on it. Finally, we present efficient shared memory parallel algorithms for generating random networks with an exact given degree sequence using a direct graph construction method, which computes a candidate list for creating an edge incident on a vertex using the Erdős-Gallai characterization and then randomly creates the edges from the candidates. / Ph. D. / Network analysis has become a popular topic in many disciplines, including the social sciences, epidemiology, biology, and business, as it provides valuable insights into many real-world systems represented as networks. Recent advances in science and technology have resulted in massive growth of such networks, and mining and processing these massive networks poses significant challenges, which can be addressed by various high-performance computing (HPC) platforms. In this dissertation, we present parallel algorithms for a few network analytic problems using HPC platforms. Random networks are widely used for modeling many complex real-world systems such as the Internet and biological, social, and infrastructure networks.
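A minimal sequential sketch of edge switching with the optional assortative constraint described above, assuming edges are stored as a Python list of vertex pairs and labels as a dict. `edge_switch` is a hypothetical name, the cap of 100 attempts per requested switch is an arbitrary assumption, and this is not the dissertation's sequential or distributed algorithm. A switch replaces edges (u, v) and (x, y) with (u, y) and (x, v); it is rejected if the two edges share a vertex, if it would create a parallel edge, or, when labels are supplied, if v and y carry different labels, which keeps every edge's label pair unchanged.

```python
import random


def edge_switch(edges, n_switches, labels=None, seed=0):
    """Edge switches on a simple undirected graph.

    edges: list of (u, v) pairs; labels: optional dict vertex -> label.
    Returns (new_edges, number_of_accepted_switches).
    """
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]          # work on a copy
    present = {frozenset(e) for e in edges}    # O(1) parallel-edge checks
    done, attempts = 0, 0
    while done < n_switches and attempts < 100 * n_switches:  # hypothetical cap
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (u, v), (x, y) = edges[i], edges[j]
        if rng.random() < 0.5:
            x, y = y, x                        # choose which endpoint of edge j moves
        # Proposed switch: (u, v), (x, y) -> (u, y), (x, v)
        if len({u, v, x, y}) < 4:
            continue                           # shared vertex (also rules out self-loops)
        if frozenset((u, y)) in present or frozenset((x, v)) in present:
            continue                           # would create a parallel edge
        if labels is not None and labels[v] != labels[y]:
            continue                           # assortative constraint violated
        present -= {frozenset((u, v)), frozenset((x, y))}
        present |= {frozenset((u, y)), frozenset((x, v))}
        edges[i], edges[j] = (u, y), (x, v)
        done += 1
    return edges, done


new_edges, accepted = edge_switch([(0, 1), (2, 3), (4, 5)], 2, seed=1)
print(new_edges, accepted)
```

Rejection keeps the sketch short: a proposal that violates a constraint is simply discarded, which is why long runs need some cap on attempts.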
Most prior work on generating random graphs involves sequential algorithms, which can be broadly categorized into two classes: (i) edge switching and (ii) stub-matching. We present parallel algorithms for generating random graphs using both the edge switching and stub-matching methods. Our parallel algorithms for switching edges can generate random networks with billions of edges in a few minutes with 1024 processors. We have studied several load balancing methods for distributing the workload evenly among the processors to achieve the best performance. The parallel algorithm for generating random graphs using the stub-matching method also shows good speedup for medium-sized networks. We believe the proposed parallel algorithms will prove useful in analyzing and mining emerging networks. Network Science Parallel Algorithms High Performance Computing Edge Switch Random Networks
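The direct construction method described in this abstract relies on the Erdős-Gallai characterization: a non-increasing sequence $d_1 \ge \dots \ge d_n$ of non-negative integers is the degree sequence of a simple graph if and only if its sum is even and $\sum_{i=1}^{k} d_i \le k(k-1) + \sum_{i=k+1}^{n} \min(d_i, k)$ for every $1 \le k \le n$. A direct O(n²) check, shown only to make the condition concrete; `is_graphical` is a hypothetical name, and the dissertation's candidate-list computation is not reproduced here.

```python
def is_graphical(degrees: list[int]) -> bool:
    """Erdős-Gallai test: can this degree sequence be realized by a simple graph?"""
    d = sorted(degrees, reverse=True)
    n = len(d)
    if any(x < 0 for x in d) or sum(d) % 2 == 1:
        return False
    for k in range(1, n + 1):
        lhs = sum(d[:k])                                    # degrees of the k largest
        rhs = k * (k - 1) + sum(min(x, k) for x in d[k:])   # what they can absorb
        if lhs > rhs:
            return False
    return True


print(is_graphical([3, 3, 3, 3]), is_graphical([3, 3, 1, 1]))
```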
73	Computational Cost Analysis of Large-Scale Agent-Based Epidemic Simulations Kamal, Tariq 21 September 2016 (has links) Agent-based epidemic simulation (ABES) is a powerful and realistic approach for studying the impacts of disease dynamics and complex interventions on the spread of an infection in a population. Among many ABES systems, EpiSimdemics comes closest to the popular agent-based epidemic simulation systems developed by Eubank, Longini, Ferguson, and Parker. EpiSimdemics is a general framework that can model many reaction-diffusion processes besides the Susceptible-Exposed-Infectious-Recovered (SEIR) model. It allows the study of complex systems as they interact, enabling researchers to model and observe socio-technical trends and forces. Pandemic planning at the world level requires simulating over 6 billion agents, each with a unique set of demographics, daily activities, and behaviors. Moreover, the stochastic nature of epidemic models, the uncertainty in the initial conditions, and the variability of reactions require computing several replicates of a simulation for a meaningful study. Given tight response timelines, running many replicates (15-25) of several configurations (10-100) of these compute-heavy simulations is feasible only on high-performance clusters (HPC). These simulations are irregular and perform poorly on high-performance clusters because of their evolving workload, large irregular communication, and load imbalance. For better utilization of HPC clusters, the simulation needs to be scalable. Many challenges arise when improving the performance of agent-based epidemic simulations on high-performance clusters. First, large-scale graph-structured computation is central to these simulations, and star-motif nodes in natural graphs create large computational imbalances and communication hotspots.
Second, the computation is performed by classes of tasks separated by global synchronization. These non-overlapping computations cause idle time, which introduces load-balancing and cost-estimation challenges. Third, computation is overlapped with communication, which is difficult to measure with simple methods and makes cost estimation very challenging. Finally, the simulations are iterative, and the workload (computation and communication) may change across iterations, introducing load imbalance. This dissertation focuses on developing a cost estimation model and load balancing schemes to increase the runtime efficiency of agent-based epidemic simulations on high-performance clusters. While developing the cost model and load balancing schemes, we perform static and dynamic load analysis of such simulations. We also statically quantify the computational and communication workloads in EpiSimdemics. We designed, developed, and evaluated a cost model for estimating the execution cost of large-scale parallel agent-based epidemic simulations (and, more generally, of all constrained producer-consumer parallel algorithms). This cost model uses computational imbalances and communication latencies, and it enables cost estimation for applications in which the computation is performed by classes of tasks separated by synchronization. It supports performance analysis of parallel applications by computing their execution times on a given number of partitions. Our evaluations show that the model is helpful for performance prediction, resource allocation, and evaluation of load balancing schemes. For load balancing, we adopted the Metis library for partitioning bipartite graphs, and we also developed lower-overhead custom schemes called Colocation and MetColoc. We evaluated Metis, Colocation, and MetColoc.
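To make the idea of a cost model for task classes separated by global synchronization concrete, here is a simplified, BSP-style sketch: each phase lasts as long as its slowest partition, so the estimate is $T = \sum_{\text{phases}} \max_p (\text{compute}_p + \text{latency} \cdot \text{messages}_p)$. It assumes per-partition compute times and message counts are known, charges a fixed hypothetical `latency` per message, and ignores the computation/communication overlap that the abstract identifies as hard to measure; it is not the dissertation's model.

```python
from statistics import mean


def phase_time(phase, latency):
    """Time of one phase = slowest partition (all wait at the global barrier)."""
    return max(compute + latency * messages for compute, messages in phase)


def estimate_runtime(phases, latency):
    """phases: list of phases; each phase is a list with one
    (compute_seconds, messages_sent) tuple per partition."""
    return sum(phase_time(phase, latency) for phase in phases)


def imbalance(phase):
    """Max/mean compute time across partitions (1.0 = perfectly balanced)."""
    times = [compute for compute, _ in phase]
    return max(times) / mean(times)


phases = [
    [(1.0, 2), (2.0, 0)],   # phase 1: partition 1 is compute-bound
    [(0.5, 10), (0.5, 0)],  # phase 2: partition 0 is communication-bound
]
print(estimate_runtime(phases, latency=0.1))
```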
Our analysis showed that the MetColoc scheme performs similarly to Metis but with half the partitioning overhead (runtime and memory). The Colocation scheme, on the other hand, performs similarly to Metis on larger numbers of partitions, at far lower partitioning overhead. Moreover, the memory requirements of the Colocation scheme do not increase as more partitions are created. We have also performed dynamic load analysis of agent-based epidemic simulations. For this, we studied the individual and joint effects of three disease parameters (transmissibility, infection period, and incubation period) and quantified these effects using an analytical equation with separate constants for the SIS, SIR, and SI disease models. The metric developed in this work is useful for cost estimation of constrained producer-consumer algorithms; however, it has some limitations: it is application-, machine-, and data-specific. In the future, we plan to extend the metric to a larger set of machine architectures, applications, and datasets. / Ph. D. Cost Analysis and Estimation Parallel Algorithms Graph Partitioning Computational Epidemiology Disease Dynamics Statistical Analysis
74	Efficient Parallel Algorithms and Data Structures Related to Trees Chen, Calvin Ching-Yuen 12 1900 (has links) The main contribution of this dissertation is a new paradigm, called the parentheses matching paradigm, which it argues is well suited for designing efficient parallel algorithms for a broad class of nonnumeric problems. To demonstrate its applicability, we present three cost-optimal parallel algorithms: for breadth-first traversal of general trees, for sorting a special class of integers, and for coloring an interval graph with the minimum number of colors. trees in graph theory algorithms parallel algorithms data structure Trees (Graph theory) Algorithms. Data structures (Computer science)
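The basic matching step behind this paradigm parallelizes well because it reduces to a prefix sum and a sort. A sequential sketch of that general technique, assuming the input contains only '(' and ')'; `match_parentheses` is a hypothetical name, and the dissertation's three algorithms are not reproduced. Depths come from a prefix sum, each parenthesis gets a level such that matching pairs share a level, and a stable sort by level then puts every pair in adjacent slots. Prefix sums and sorting both have well-known parallel algorithms.

```python
from itertools import accumulate


def match_parentheses(s: str) -> list[int]:
    """match[i] is the index of the parenthesis paired with s[i].
    Assumes s contains only '(' and ')'."""
    steps = [1 if ch == '(' else -1 for ch in s]
    depth = list(accumulate(steps))            # prefix sum: depth after each symbol
    if any(d < 0 for d in depth) or (depth and depth[-1] != 0):
        raise ValueError("unbalanced parentheses")
    # '(' gets the depth after opening, ')' the depth before closing,
    # so both members of a matching pair receive the same level.
    level = [d if ch == '(' else d + 1 for ch, d in zip(s, depth)]
    # Within one level, '(' and ')' strictly alternate, so after a stable
    # sort by (level, position) each pair occupies two adjacent slots.
    order = sorted(range(len(s)), key=lambda i: (level[i], i))
    match = [-1] * len(s)
    for a, b in zip(order[0::2], order[1::2]):
        match[a], match[b] = b, a
    return match


print(match_parentheses("(()())"))  # [5, 2, 1, 4, 3, 0]
```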
75	Parallel programming on General Block Min Max Criterion Lee, ChuanChe 01 January 2006 (has links) The purpose of this thesis is to develop a parallel implementation of the General Block Min Max Criterion (GBMM). The thesis deals with two kinds of parallel overhead: Redundant Calculations Parallel Overhead (RCPO) and Communication Parallel Overhead (CPO). Parallel programming (Computer science) High performance computing Computer algorithms Parallel algorithms Software Engineering
76	Solução paralela para sistemas de balanço não-lineares / Parallel solution of nonlinear balance systems Hime, Gustavo 27 September 2007 (has links) Models of many phenomena are based on balance or conservation equations. Depending on the phenomenon and on what the model assumes, the equations are simplified and solved in different ways. The problem of injecting into a porous medium a two-phase fluid whose equilibrium depends on temperature, for example, can be modeled by a mass conservation equation that includes a diffusive term; this equation, in turn, can be discretized by finite differences in both time and space and solved numerically. A strictly analytical study of these models is very limited. A more detailed understanding of the model's behavior can only be obtained through numerical simulations and the qualitative study of their results. The results of a simulation can only be visualized once it has finished, but high-quality results require simulations on finer meshes, which need more computing time. Even for one-dimensional flows, the interactive cycle of specifying the parameters for a new simulation based on conclusions drawn from previous simulations necessarily includes an undesirable waiting time. Systems capable of solving this class of numerical problems quickly and efficiently are therefore the main objective of this work.
To achieve high performance in computing these solutions, many factors must be taken into account: the computational cost inherent in the constitutive equations used in the model, the specific type of linear system resulting from the discretization of the problem, the different alternatives for the solution algorithm and its implementations, and the strengths and limitations of each computational environment to be exploited. As a result of testing several approaches on different machines, we obtain not only an efficient numerical engine for the case studies presented in this work, but also a guide for applying these techniques to similar problems. Parallel algorithms Block tridiagonal systems Performance analysis
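Finite-difference discretization of a one-dimensional balance equation leads to (block) tridiagonal linear systems, as the keywords indicate. The standard sequential solver for such systems is the Thomas algorithm; the sketch below shows the scalar case only (a block version replaces each division with a solve against a small diagonal block) and is not the numerical engine developed in the thesis. `thomas` is a hypothetical name. It performs no pivoting, so it assumes a well-conditioned system, for example a diagonally dominant one.

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system A x = d.

    a: sub-diagonal (a[0] unused), b: diagonal, c: super-diagonal (c[-1] unused),
    d: right-hand side; all lists of length n.
    """
    n = len(b)
    cp = [0.0] * n   # modified super-diagonal
    dp = [0.0] * n   # modified right-hand side
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):                      # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):             # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


# tridiag(-1, 2, -1), the matrix of an implicit 1D diffusion step up to scaling
print(thomas([0, -1, -1, -1], [2, 2, 2, 2], [-1, -1, -1, 0], [1, 0, 0, 1]))
```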
77	Algoritmos paralelos para alocação e gerência de processadores em máquinas multiprocessadoras hipercúbicas / Parallel algorithms for processor allocation in hypercubes De Rose, Cesar Augusto Fonticielha January 1993 (has links) In recent years, massively parallel machines built with hundreds of processors have become an alternative for building supercomputers. In this new data-processing paradigm, high performance is achieved through processor cooperation in solving a problem. Many commercial massively parallel machines use a hypercube topology to interconnect their processors, or can be configured as hypercubes. An attractive way to share the processing power of these machines is to attach them to a network as a shared computer serving multiple users [DUT 91]. In such an environment, the hypercube behaves like a processor server, allowing users to allocate part of its processors for their own use. This raises the performance of workstation networks to supercomputer level at relatively low cost and allows higher-dimensional hypercubes to be better utilized. In such an environment, the operating system is responsible for serving the users of the shared multiprocessor efficiently, avoiding rapid fragmentation of the hypercube and respecting the maximum waiting time of each application. Processor allocation and management algorithms are responsible for obtaining and controlling one or more processors of the shared machine to execute a user's task. In this study, parallel versions of the most important processor allocation and management algorithms for hypercubes found in the literature are proposed. The goal of this parallelization is to achieve a better response time for the more complex algorithms, making their use feasible in a real-time sharing environment.
Because allocation is considered the most important part of the processor server, using more complex algorithms allows better utilization of the shared processors and thus higher performance of the parallel machine. Based on the proposed algorithms, a processor server is defined for sharing a hypercube multiprocessor in a workstation network. Some functions of this server are implemented in a prototype called Sub-Cube RPC. To analyze how the network behaves when this new shared resource is added, a simulator for the SUN/UNIX environment was developed together with the Performance Evaluation Group ADMP. With this tool and the response times of the server prototype, it is possible to evaluate the cost of the additional network traffic generated by the server and to vary parameters of the server and the network. The results of the parallel implementations are compared with the performance of the sequential algorithms. To make this comparison possible, all the sequential algorithms found in the literature were also implemented in C and are included in an appendix. The parallel versions were implemented both on the workstation network itself, using sockets, and on Transputers in parallel C. The processor server prototype was implemented as an RPC server for a UNIX network, also in C. The simulation tool was written in C, and its I/O interface uses X-Windows. The results of this study shed light on the effects and difficulties of parallelizing allocation algorithms for hypercube machines.
The information in this study will help operating system designers improve the response time of the sequential algorithms found in the literature and develop new, more complex algorithms that remain practical in a real-time environment thanks to parallelism. The Sub-Cube RPC prototype demonstrates how the algorithms studied in this work can be applied to build a processor server for multiprocessors. The prototype is the first step toward implementing a similar server at CPGCC/UFRGS that will make a Transputer board available to the workstation network of the parallel processing group. Computer architecture Parallel processing Processor allocation Parallel algorithms Hypercubes
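The abstract does not name the allocation algorithms that were parallelized. As one classic example from the hypercube literature, the buddy strategy allocates a k-dimensional subcube of a d-dimensional hypercube as an aligned block of 2^k node addresses. A minimal sequential sketch, assuming node addresses 0 to 2^d - 1 and a hypothetical `BuddyAllocator` class; the scan over candidate blocks is the kind of search a parallel version could split across processors.

```python
class BuddyAllocator:
    """Buddy-strategy subcube allocation in a d-dimensional hypercube.

    Nodes have addresses 0 .. 2**d - 1. A k-subcube is an aligned block of
    2**k consecutive addresses: its nodes agree on the d - k high-order bits.
    """

    def __init__(self, d: int):
        self.d = d
        self.busy = [False] * (1 << d)

    def allocate(self, k: int):
        """Return the base address of a free k-subcube, or None if none is free."""
        if not 0 <= k <= self.d:
            raise ValueError("subcube dimension out of range")
        size = 1 << k
        for base in range(0, len(self.busy), size):   # candidate aligned blocks
            if not any(self.busy[base:base + size]):
                self.busy[base:base + size] = [True] * size
                return base
        return None   # request must wait: no free block of this size

    def release(self, base: int, k: int) -> None:
        """Free the k-subcube starting at base."""
        size = 1 << k
        self.busy[base:base + size] = [False] * size


cube = BuddyAllocator(3)   # 8 nodes
print(cube.allocate(1), cube.allocate(2), cube.allocate(1))  # prints: 0 4 2
```

Because only aligned blocks are considered, the buddy strategy can miss free subcubes that other strategies would recognize; that trade-off between search cost and fragmentation is what makes more complex allocation algorithms attractive when their response time can be reduced.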
78	Shell-based geometric image and video inpainting Hocking, Laird Robert January 2018 (has links) The subject of this thesis is a class of fast inpainting methods (image or video) based on the idea of filling the inpainting domain in successive shells from its boundary inwards. Image pixels (or video voxels) are filled by assigning them a color equal to a weighted average of either their already filled neighbors (the "direct" form of the method) or those neighbors plus additional neighbors within the current shell (the "semi-implicit" form). In the direct form, pixels (voxels) in the current shell may be filled independently, but in the semi-implicit form they are filled simultaneously by solving a linear system. This thesis focuses mainly on the image inpainting case, where the literature contains several methods corresponding to the direct form; the semi-implicit form is introduced here for the first time. These methods effectively differ only in the order in which pixels (voxels) are filled, the weights used for averaging, and the neighborhood that is averaged over. All of them are very fast, but all of them also leave undesirable artifacts such as "kinking" (bending) or blurring of extrapolated isophotes. This thesis has two main goals. First, we introduce new algorithms within this class that aim to reduce or eliminate these artifacts and that target a specific application: the 3D conversion of images and film. The first part of this thesis introduces 3D conversion as well as Guidefill, a method in the above class adapted to the inpainting problems arising in 3D conversion. The second and more significant goal, however, is to study these algorithms as a class. In particular, we develop a mathematical theory aimed at understanding the origins of the artifacts mentioned above.
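A minimal sketch of the direct form, assuming a grayscale image stored as nested lists, uniform weights, and the 8-neighborhood. These are simplifications: the methods discussed (such as coherence transport and Guidefill) use carefully chosen, direction-dependent weights and fill orders. `shell_inpaint` is a hypothetical name. Each pass computes values for the whole current shell from pixels known before the pass and only then writes them, so pixels within a shell are filled independently, as the direct form allows.

```python
def shell_inpaint(img, mask):
    """Direct-form shell inpainting with uniform weights (illustrative only).

    img: 2D list of floats; mask: 2D list of bools, True marks unknown pixels.
    Returns a new image; the input is not modified.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    unknown = {(i, j) for i in range(h) for j in range(w) if mask[i][j]}
    while unknown:
        shell = {}
        for i, j in unknown:
            # Already-known pixels among the 8 neighbors of (i, j).
            known = [out[i + di][j + dj]
                     for di in (-1, 0, 1) for dj in (-1, 0, 1)
                     if (di or dj)
                     and 0 <= i + di < h and 0 <= j + dj < w
                     and (i + di, j + dj) not in unknown]
            if known:                      # (i, j) lies on the current shell
                shell[(i, j)] = sum(known) / len(known)
        if not shell:
            break                          # no known pixels left to extrapolate from
        for (i, j), value in shell.items():
            out[i][j] = value              # write the whole shell at once
        unknown.difference_update(shell)
    return out
```

A semi-implicit variant would instead couple the pixels of each shell through a small linear system rather than reading only values fixed before the pass.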
Through this theory, we seek to understand which artifacts can be eliminated (and how), and which are inevitable (and why). Most of the thesis is occupied with this second goal. Our theory is based on two separate limits. The first is a continuum limit, in which the pixel width $h \to 0$ and the algorithm converges to a partial differential equation. The second is an asymptotic limit in which $h$ is very small but non-zero. This latter limit, which is based on a connection to random walks, relates the inpainted solution to a type of discrete convolution. The former is useful for studying kinking artifacts, the latter for studying blur. Although all the theoretical work has been done in the context of image inpainting, experimental evidence is presented suggesting a simple generalization to video. Finally, the last part of the thesis explores shell-based video inpainting. In particular, we introduce spacetime transport, a natural generalization of the ideas of Guidefill and its predecessor, coherence transport, to three dimensions (two spatial dimensions plus one time dimension). Spacetime transport is shown to have much in common with shell-based image inpainting methods: kinking and blur artifacts persist, and the former may be alleviated in exactly the same way as in two dimensions. At the same time, spacetime transport is shown to be related to optical-flow-based video inpainting; in particular, a connection is derived between spacetime transport and a generalized Lucas-Kanade optical flow that does not distinguish between time and space.
79	Scalable Community Detection using Distributed Louvain Algorithm Sattar, Naw Safrin 23 May 2019 (has links) Community detection (or clustering) in large-scale graphs is an important problem in graph mining. Communities reveal interesting characteristics of a network. Louvain is an efficient sequential algorithm, but it does not scale to emerging large-scale data. Developing distributed-memory parallel algorithms is challenging because of inter-process communication and load-balancing issues. In this work, we design a shared-memory algorithm using OpenMP, which shows a 4-fold speedup but is limited by the number of available physical cores. Our second algorithm is an MPI-based parallel algorithm that scales to a moderate number of processors. We also implement a hybrid algorithm combining both. Finally, we incorporate dynamic load balancing into our final algorithm, DPLAL (Distributed Parallel Louvain Algorithm with Load-balancing). DPLAL overcomes the performance bottleneck of the previous algorithms and shows around a 12-fold speedup, scaling to a larger number of processors. Overall, we present the challenges, our solutions, and the empirical performance of our algorithms on several large real-world networks. Community Detection Louvain Method Parallel Algorithms MPI OpenMP Load-balancing Graph Mining Computer Engineering Computer Sciences Theory and Algorithms
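Louvain alternates two phases: moving individual nodes between communities to increase the modularity $Q = \sum_c \left[\frac{L_c}{m} - \left(\frac{D_c}{2m}\right)^2\right]$ (where $L_c$ is the number of edges inside community $c$, $D_c$ its total degree, and $m$ the number of edges), and aggregating communities into super-nodes. A minimal sketch of the first phase only, on a small undirected, unweighted graph given as a dict of neighbor sets. For clarity Q is recomputed from scratch for each candidate move, whereas practical (and distributed) implementations use an incremental modularity-gain formula. `modularity` and `local_moving` are hypothetical names, and this is not DPLAL.

```python
from collections import defaultdict


def modularity(adj, comm):
    """Newman modularity of a partition of an undirected, unweighted graph.
    adj: dict node -> set of neighbors; comm: dict node -> community id."""
    two_m = sum(len(nbrs) for nbrs in adj.values())
    internal = defaultdict(int)  # 2 * (edges inside community c)
    degree = defaultdict(int)    # total degree of community c
    for u, nbrs in adj.items():
        degree[comm[u]] += len(nbrs)
        for v in nbrs:
            if comm[u] == comm[v]:
                internal[comm[u]] += 1
    return sum(internal[c] / two_m - (degree[c] / two_m) ** 2 for c in degree)


def local_moving(adj, max_passes=10):
    """Louvain phase 1: move each node to the neighboring community that most
    increases modularity, until a full pass makes no move."""
    comm = {u: u for u in adj}   # start with singleton communities
    for _ in range(max_passes):
        improved = False
        for u in adj:
            original = comm[u]
            best_c, best_q = original, modularity(adj, comm)
            for c in {comm[v] for v in adj[u]}:
                comm[u] = c                      # tentatively move u into c
                q = modularity(adj, comm)
                if q > best_q + 1e-12:
                    best_c, best_q = c, q
            comm[u] = best_c
            if best_c != original:
                improved = True
        if not improved:
            break
    return comm
```

On two triangles joined by a single edge, this phase separates the two triangles, the partition with the highest modularity for that graph.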
80	Designing Efficient Parallel Algorithms for Graph Problems Liang, Weifa, wliang@cs.anu.edu.au January 1997 (has links) Graph algorithms are concerned with the algorithmic aspects of solving graph problems. The problems are motivated by, and have applications in, diverse areas of computer science, engineering, and other disciplines. Problems arising from these areas are good candidates for parallelization, since they often have both intense computational needs and stringent response time requirements. Motivated by these concerns, this thesis investigates parallel algorithms for graph problems that have at least one of the following properties: the problems involve some type of dynamic update; the sparsification technique is applicable; or the problems are closely related to communications network issues. The models of parallel computation used are the Parallel Random Access Machine (PRAM) model and practical interconnection network models such as meshes and hypercubes. Consider a communications network that can be represented by a graph G = (V, E), where V is a set of sites (processors) and E is a set of links connecting the sites (processors). In some cases, we also assign weights and/or directions to the edges in E.
Associated with this network are many problems, such as: (i) whether the network is k-edge (k-vertex) connected for fixed k; (ii) whether there are k edge-disjoint (vertex-disjoint) paths between a given pair of vertices u and v after the network is dynamically updated by adding and/or deleting an edge; (iii) whether the sites in the network can still communicate with each other when some sites and links fail; (iv) identifying, for fixed k, the first k edges whose deletion results in the maximum increase in the routing cost of the resulting network; (v) how to augment the network at optimal cost with a given feasible set of weighted edges so that the augmented network is k-edge (k-vertex) connected; and (vi) how to route messages through the network efficiently. In this thesis we address these problems by presenting efficient parallel algorithms for them. As far as we know, most of the proposed algorithms are the first ones in the parallel setting. Even though most of the problems studied in this thesis are related to communications networks, we also study the classic edge-coloring problem. The main difficulty in solving this problem in parallel is that it is not yet known whether it is in NC. In this thesis we present an improved parallel algorithm for the problem that runs in $O(\Delta^{4.5}\log^{3}\Delta\,\log n + \Delta^{4}\log^{4} n)$ time using $O(n^{2}\Delta + n\Delta^{3})$ processors, where $n$ is the number of vertices and $\Delta$ is the maximum vertex degree. Compared with a previously known result on the same model, this improves the running time by a factor of $O(\Delta^{1.5})$. The non-trivial part is to reduce this problem to the edge-coloring update problem.
We also generalize this problem to the approximate edge-coloring problem, for which we give a faster parallel algorithm. Throughout the design and analysis of parallel graph algorithms, we also found that the sparsification technique is very powerful for designing efficient sequential and parallel algorithms on dense undirected graphs. We believe that this technique may be useful in its own right for guiding the design of efficient sequential and parallel algorithms for problems in other areas as well as in graph theory. efficient parallel algorithms graph problems graph algorithms parallelization dynamic updates sparsification communication networks Parallel Random Access Machine PRAM meshes hypercubes
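Problem (iii) above, whether the surviving sites can still communicate after some sites and links fail, reduces to asking whether the surviving subgraph is connected. A minimal sequential sketch with union-find, assuming sites are numbered 0 to n - 1 and links are vertex pairs; `can_communicate` is a hypothetical name, and this is not one of the thesis's parallel algorithms. A PRAM version would replace the union-find loop with a parallel connected-components algorithm.

```python
def can_communicate(n, links, failed_sites=(), failed_links=()):
    """True if all surviving sites of G = (V, E), V = {0, ..., n-1}, can still
    reach each other after the given sites and links fail."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    dead = set(failed_sites)
    broken = {frozenset(link) for link in failed_links}
    for u, v in links:
        if u in dead or v in dead or frozenset((u, v)) in broken:
            continue                        # link unusable after the failures
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv                 # merge the two components
    roots = {find(v) for v in range(n) if v not in dead}
    return len(roots) <= 1


ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(can_communicate(4, ring, failed_links=[(0, 1)]))
```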
