61 |
Solução paralela para sistemas de balanço não-lineares / Parallel solution of nonlinear balance systems. Hime, Gustavo, 27 September 2007 (has links)
Models for many phenomena are based on balance or conservation equations. Depending on the phenomenon and on what the model assumes, these equations are simplified and solved in different ways. The problem of injecting into a porous medium a two-phase fluid whose equilibrium depends on temperature, for example, can be modeled by a mass conservation equation that includes a diffusive term; this equation, in turn, can be discretized by finite differences in both time and space and solved numerically. A strictly analytical study of these models is very limited. A more detailed understanding of a model's behavior can only be obtained through numerical simulations and a qualitative study of their results. The results of a simulation can only be visualized once it has finished, yet high-quality results require simulations on finer meshes, which demand more computational time. Even for one-dimensional flows, the interactive cycle of specifying the parameters of a new simulation based on the conclusions drawn from previous ones necessarily includes an undesirable waiting time. Systems capable of solving this class of numerical problems quickly and efficiently are therefore the main goal of this work. To compute these solutions with high performance, many factors must be taken into account: the computational cost inherent to the constitutive equations used in the model, the specific type of linear system resulting from the discretization of the problem, the alternative algorithms for solving that system and their implementations, and the strengths and limitations imposed by each computational environment to be exploited. As a result of testing several approaches on different machines, we obtain not only an efficient numerical engine for the case studies presented in this work, but also a guide for applying these techniques to similar problems.
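The abstract does not give the discretization itself; as a minimal sketch of the kind of scheme it refers to, the code below advances a one-dimensional conservation law with a diffusive term by explicit finite differences in time and space. The Burgers-type flux, the grid and time-step sizes, and the boundary treatment are assumptions made for this illustration, not values taken from the thesis.

```c
#include <stdio.h>

#define N 101          /* grid points  */
#define STEPS 2000     /* time steps   */

/* Example flux (Burgers-type); any constitutive flux could be plugged in here. */
static double f(double u) { return 0.5 * u * u; }

int main(void)
{
    double u[N], unew[N];
    const double dx = 1.0 / (N - 1), dt = 1e-4, eps = 1e-3;

    for (int i = 0; i < N; i++)            /* initial condition: a rectangular bump */
        u[i] = (i > N / 4 && i < N / 2) ? 1.0 : 0.0;

    for (int n = 0; n < STEPS; n++) {
        for (int i = 1; i < N - 1; i++) {
            /* first-order upwind difference for the convective flux
               (wave speed assumed non-negative) */
            double conv = (f(u[i]) - f(u[i - 1])) / dx;
            /* centered second difference for the diffusive term */
            double diff = eps * (u[i + 1] - 2.0 * u[i] + u[i - 1]) / (dx * dx);
            unew[i] = u[i] + dt * (diff - conv);
        }
        unew[0] = unew[1];                 /* zero-gradient boundaries */
        unew[N - 1] = unew[N - 2];
        for (int i = 0; i < N; i++) u[i] = unew[i];
    }

    for (int i = 0; i < N; i++)
        printf("%g %g\n", i * dx, u[i]);   /* final profile, suitable for plotting */
    return 0;
}
```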
|
62 |
Algoritmos paralelos para alocação e gerência de processadores em máquinas multiprocessadoras hipercúbicas / Parallel algorithms for processor allocation in hypercubes. De Rose, Cesar Augusto Fonticielha, January 1993 (has links)
In recent years, massively parallel machines built with hundreds of processors have become an alternative for the construction of supercomputers. In this approach to data processing, high performance is achieved through the cooperation of many processors in solving a problem. Many commercial massively parallel machines use the hypercube topology to interconnect their processors, or can be configured as hypercubes. An attractive way of sharing the processing power of these machines is to attach them to a network as an aggregated computer serving several users [DUT 91]. In such an environment the hypercube behaves as a processor bank, allowing each user to allocate part of its processors for personal use. This raises the performance of the workstation network to supercomputer levels at relatively low cost, and makes it viable to build high-dimensional hypercubes without leaving them under-utilized. In this context the operating system must serve the requests of the users of the shared hypercube efficiently, avoiding rapid fragmentation of the cube and respecting the maximum waiting time of each application. The processor allocation and management algorithms are responsible for obtaining and controlling one or more processors of the shared machine for the execution of user tasks. In this work, parallel versions of the most important processor allocation and management algorithms for hypercubes found in the literature are proposed. The aim of this parallelization is to achieve better response times for the more complex algorithms, making their use feasible in an interactive, time-sharing environment. Since allocation is the most important task of the processor server, using more sophisticated algorithms leads to better utilization of the shared processors and, consequently, to higher performance of the parallel machine. Based on the proposed algorithms, a processor server is defined for sharing a hypercubic multiprocessor in a workstation network. Some functions of this server are implemented in a prototype called Sub-Cube RPC. To analyze how the workstation network behaves when this new shared resource is added, a simulator for the SUN/UNIX environment was developed together with the ADMP performance evaluation group. Using this tool and the response times measured with the server prototype, the cost of the additional network traffic generated by the server can be evaluated, with both server and network parameters open to manipulation. The results obtained with the implemented parallel versions are compared with the performance of the sequential versions. To make this comparison possible, all the sequential algorithms found in the literature were also implemented in the C language in the target UNIX environment and are listed in an appendix. The parallel versions were implemented both on the workstation network itself, using sockets, and on Transputers in parallel C. The processor server prototype was implemented as an RPC server for a UNIX workstation network, also in C. The simulation tool was likewise coded in C, and its data input and visualization interface uses X-Windows. The results of this work give a good picture of the effects of, and the difficulties involved in, parallelizing processor allocation and management algorithms for hypercube machines. The information presented here helps to improve the response time of current sequential algorithms and to develop new, more sophisticated algorithms that remain practical in interactive environments thanks to the use of parallelism. The Sub-Cube RPC prototype demonstrates how the algorithms studied in this work can be applied in the construction of a processor server for multiprocessor machines. It will serve as the basis for the implementation of a similar server at CPGCC/UFRGS, which will make a Transputer board available to the workstation network of the parallel processing group.
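The abstract does not reproduce any of the allocation algorithms; as a minimal illustration of the problem the processor server has to solve, the sketch below implements the classic sequential buddy strategy for subcube allocation in a hypercube, in which a request for a k-dimensional subcube is served by the first free, 2^k-aligned block of node addresses. The bitmap data structure and the toy request sequence are assumptions for the example; the thesis is concerned with more sophisticated allocation strategies and with their parallel versions.

```c
#include <stdio.h>
#include <string.h>

#define CUBE_DIM 5                     /* n: a 32-node hypercube */
#define NODES (1 << CUBE_DIM)

static unsigned char busy[NODES];      /* 1 = node currently allocated */

/* Buddy strategy: a request for a k-subcube is satisfied by the first block of
 * 2^k consecutive addresses, aligned on a 2^k boundary, that is entirely free.
 * Returns the base address of the subcube, or -1 if the request cannot be served. */
static int alloc_subcube(int k)
{
    int size = 1 << k;
    for (int base = 0; base < NODES; base += size) {
        int block_free = 1;
        for (int i = 0; i < size && block_free; i++)
            if (busy[base + i]) block_free = 0;
        if (block_free) {
            for (int i = 0; i < size; i++) busy[base + i] = 1;
            return base;
        }
    }
    return -1;
}

static void release_subcube(int base, int k)
{
    for (int i = 0; i < (1 << k); i++) busy[base + i] = 0;
}

int main(void)
{
    memset(busy, 0, sizeof busy);
    int a = alloc_subcube(3);           /* user 1 requests 8 processors */
    int b = alloc_subcube(2);           /* user 2 requests 4 processors */
    printf("3-cube at %d, 2-cube at %d\n", a, b);
    release_subcube(a, 3);              /* user 1 finishes */
    printf("after release, 4-cube at %d\n", alloc_subcube(4));
    return 0;
}
```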
|
63 |
Shell-based geometric image and video inpainting. Hocking, Laird Robert, January 2018 (has links)
The subject of this thesis is a class of fast inpainting methods (image or video) based on the idea of filling the inpainting domain in successive shells from its boundary inwards. Image pixels (or video voxels) are filled by assigning them a color equal to a weighted average of either their already filled neighbors (the "direct" form of the method) or those neighbors plus additional neighbors within the current shell (the "semi-implicit" form). In the direct form, pixels (voxels) in the current shell may be filled independently, but in the semi-implicit form they are filled simultaneously by solving a linear system. We focus in this thesis mainly on the image inpainting case, where the literature contains several methods corresponding to the direct form of the method; the semi-implicit form is introduced for the first time here. These methods effectively differ only in the order in which pixels (voxels) are filled, the weights used for averaging, and the neighborhood that is averaged over. All of them are very fast, but at the same time all of them leave undesirable artifacts such as "kinking" (bending) or blurring of extrapolated isophotes. This thesis has two main goals. First, we introduce new algorithms within this class, which are aimed at reducing or eliminating these artifacts, and which also target a specific application: the 3D conversion of images and film. The first part of this thesis is concerned with introducing 3D conversion as well as Guidefill, a method in the above class adapted to the inpainting problems arising in 3D conversion. However, the second and more significant goal of this thesis is to study these algorithms as a class. In particular, we develop a mathematical theory aimed at understanding the origins of the artifacts mentioned. Through this, we seek to understand which artifacts can be eliminated (and how), and which artifacts are inevitable (and why). Most of the thesis is occupied with this second goal. Our theory is based on two separate limits: the first is a continuum limit, in which the pixel width h → 0 and in which the algorithm converges to a partial differential equation. The second is an asymptotic limit in which h is very small but non-zero. This latter limit, which is based on a connection to random walks, relates the inpainted solution to a type of discrete convolution. The former is useful for studying kinking artifacts, while the latter is useful for studying blur. Although all the theoretical work has been done in the context of image inpainting, experimental evidence is presented suggesting a simple generalization to video. Finally, in the last part of the thesis we explore shell-based video inpainting. In particular, we introduce spacetime transport, which is a natural generalization of the ideas of Guidefill and its predecessor, coherence transport, to three dimensions (two spatial dimensions plus one time dimension). Spacetime transport is shown to have much in common with shell-based image inpainting methods. In particular, kinking and blur artifacts persist, and the former of these may be alleviated in exactly the same way as in two dimensions. At the same time, spacetime transport is shown to be related to optical flow based video inpainting. In particular, a connection is derived between spacetime transport and a generalized Lucas-Kanade optical flow that does not distinguish between time and space.
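As a concrete, much-simplified instance of the direct form described above, the sketch below inpaints a square hole in a one-channel image by filling successive shells from the boundary inwards, assigning each pixel the plain average of its already filled 8-neighbours. The uniform weights, the fixed fill order and the binary mask are assumptions made for the example; methods such as coherence transport and Guidefill differ precisely in their choice of weights, neighbourhood and fill order.

```c
#include <stdio.h>

#define W 16
#define H 16

static double img[H][W];      /* grey values                          */
static int    known[H][W];    /* 1 = pixel is known or already filled */

/* One pass fills the current shell: unfilled pixels that have at least one
 * known 8-neighbour.  Repeating the pass fills the hole from its boundary
 * inwards ("onion peel").  Returns the number of pixels filled.             */
static int fill_shell(void)
{
    double val[H][W];
    int    hit[H][W] = {{0}};
    int    filled = 0;

    for (int y = 1; y < H - 1; y++)
        for (int x = 1; x < W - 1; x++) {
            if (known[y][x]) continue;
            double sum = 0.0; int cnt = 0;
            for (int dy = -1; dy <= 1; dy++)
                for (int dx = -1; dx <= 1; dx++)
                    if ((dy || dx) && known[y + dy][x + dx]) {
                        sum += img[y + dy][x + dx];
                        cnt++;
                    }
            if (cnt > 0) { val[y][x] = sum / cnt; hit[y][x] = 1; }
        }

    for (int y = 0; y < H; y++)          /* commit the whole shell at once, so */
        for (int x = 0; x < W; x++)      /* pixels in a shell stay independent */
            if (hit[y][x]) { img[y][x] = val[y][x]; known[y][x] = 1; filled++; }
    return filled;
}

int main(void)
{
    for (int y = 0; y < H; y++)          /* known ramp image with a square hole */
        for (int x = 0; x < W; x++) {
            img[y][x] = (double)x / (W - 1);
            known[y][x] = !(x > 4 && x < 11 && y > 4 && y < 11);
        }
    while (fill_shell() > 0) ;           /* peel shells until nothing remains   */
    printf("centre value after inpainting: %.3f\n", img[8][8]);
    return 0;
}
```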
|
64 |
Scalable Community Detection using Distributed Louvain Algorithm. Sattar, Naw Safrin, 23 May 2019 (has links)
Community detection (or clustering) in large-scale graphs is an important problem in graph mining. Communities reveal interesting characteristics of a network. Louvain is an efficient sequential algorithm, but it fails to scale to emerging large-scale data. Developing distributed-memory parallel algorithms is challenging because of inter-process communication and load-balancing issues. In this work, we design a shared-memory algorithm using OpenMP, which shows a 4-fold speedup but is limited to the available physical cores. Our second algorithm is an MPI-based parallel algorithm that scales to a moderate number of processors. We also implement a hybrid algorithm combining both. Finally, we incorporate dynamic load balancing in our final algorithm, DPLAL (Distributed Parallel Louvain Algorithm with Load-balancing). DPLAL overcomes the performance bottlenecks of the previous algorithms and shows around a 12-fold speedup, scaling to a larger number of processors. Overall, we present the challenges, our solutions, and the empirical performance of our algorithms on several large real-world networks.
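DPLAL itself is distributed and MPI-based and is not reproduced in the abstract; the sketch below shows only the shared-memory flavour of the idea: a Louvain-style local-moving sweep over a toy edge-list graph, parallelised with an OpenMP loop. The simplified modularity-gain expression (it ignores the cost of leaving the current community), the tiny graph, and the tolerated data races on the shared community arrays (a common simplification in parallel Louvain heuristics) are assumptions for illustration, not details of the algorithms in the thesis.

```c
#include <stdio.h>
#include <omp.h>

#define NV 6                                /* toy graph: two triangles joined  */
#define NE 7                                /* by one bridge edge               */

static const int edges[NE][2] = {
    {0,1},{1,2},{0,2},   {3,4},{4,5},{3,5},   {2,3}
};

int main(void)
{
    int    comm[NV], deg[NV] = {0};
    double tot[NV];                          /* total degree per community      */
    double m = NE;                           /* total edge weight               */

    for (int e = 0; e < NE; e++) { deg[edges[e][0]]++; deg[edges[e][1]]++; }
    for (int v = 0; v < NV; v++) { comm[v] = v; tot[v] = deg[v]; }

    for (int sweep = 0; sweep < 4; sweep++) {
        /* Local-moving phase: each vertex greedily joins the neighbouring
         * community with the best (simplified) modularity gain.  Vertices are
         * processed concurrently; races are tolerated, so the result is
         * approximate rather than deterministic.                             */
        #pragma omp parallel for schedule(dynamic)
        for (int v = 0; v < NV; v++) {
            double best_gain = 0.0; int best_c = comm[v];
            for (int e = 0; e < NE; e++) {
                int u = -1;
                if (edges[e][0] == v) u = edges[e][1];
                if (edges[e][1] == v) u = edges[e][0];
                if (u < 0 || comm[u] == comm[v]) continue;
                int c = comm[u];
                double k_in = 0.0;           /* edges from v into community c   */
                for (int f = 0; f < NE; f++) {
                    int a = edges[f][0], b = edges[f][1];
                    if (a == v && comm[b] == c) k_in += 1.0;
                    if (b == v && comm[a] == c) k_in += 1.0;
                }
                double gain = k_in / m - deg[v] * tot[c] / (2.0 * m * m);
                if (gain > best_gain) { best_gain = gain; best_c = c; }
            }
            if (best_c != comm[v]) {
                #pragma omp atomic
                tot[comm[v]] -= deg[v];
                #pragma omp atomic
                tot[best_c]  += deg[v];
                comm[v] = best_c;
            }
        }
    }
    for (int v = 0; v < NV; v++) printf("vertex %d -> community %d\n", v, comm[v]);
    return 0;
}
```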
|
65 |
Designing Efficient Parallel Algorithms for Graph Problems. Liang, Weifa, wliang@cs.anu.edu.au, January 1997 (has links)
Graph algorithms are concerned with the algorithmic aspects of solving graph problems. These problems are motivated by, and have applications in, diverse areas of computer science, engineering and other disciplines. Problems arising from these areas of application are good candidates for parallelization, since they often have both intense computational needs and stringent response-time requirements. Motivated by these concerns, this thesis investigates parallel algorithms for graph problems that have at least one of the following properties: they involve some type of dynamic update; the sparsification technique is applicable; or they are closely related to communications network issues. The models of parallel computation used in our studies are the Parallel Random Access Machine (PRAM) model and practical interconnection network models such as meshes and hypercubes.
Consider a communications network which can be represented by a graph G = (V, E), where V is a set of sites (processors) and E is a set of links connecting the sites (processors). In some cases we also assign weights and/or directions to the edges in E. Associated with this network there are many problems, such as: (i) whether the network is k-edge (k-vertex) connected, for fixed k; (ii) whether there are k edge-disjoint (vertex-disjoint) paths between a given pair of vertices u and v after the network is dynamically updated by adding and/or deleting an edge, etc.; (iii) whether the sites in the network can communicate with each other when some sites and links fail; (iv) identifying the first k edges in the network whose deletion results in the maximum increase in the routing cost of the resulting network, for fixed k; (v) how to augment the network at optimal cost with a given feasible set of weighted edges so that the augmented network is k-edge (k-vertex) connected; (vi) how to route messages through the network efficiently. In this thesis we answer the problems mentioned above by presenting efficient parallel algorithms to solve them. As far as we know, most of the proposed algorithms are the first ones in the parallel setting.
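The parallel algorithms themselves are not reproduced in the abstract; as a sequential baseline for problem (ii), the sketch below checks whether k edge-disjoint paths join two vertices by computing a unit-capacity maximum flow with breadth-first augmentation (Menger's theorem). The adjacency-matrix representation and the small example graph are assumptions made for illustration; the thesis is concerned with parallel and dynamically updatable versions of such tests.

```c
#include <stdio.h>
#include <string.h>

#define MAXV 16

static int cap[MAXV][MAXV];                 /* residual capacities              */
static int n;                               /* number of vertices               */

/* One breadth-first search for an augmenting path in the residual graph.     */
static int augment(int s, int t)
{
    int prev[MAXV], queue[MAXV], head = 0, tail = 0;
    memset(prev, -1, sizeof prev);
    prev[s] = s; queue[tail++] = s;
    while (head < tail && prev[t] < 0) {
        int v = queue[head++];
        for (int w = 0; w < n; w++)
            if (prev[w] < 0 && cap[v][w] > 0) { prev[w] = v; queue[tail++] = w; }
    }
    if (prev[t] < 0) return 0;
    for (int w = t; w != s; w = prev[w]) {  /* push one unit along the path     */
        cap[prev[w]][w] -= 1;
        cap[w][prev[w]] += 1;
    }
    return 1;
}

/* By Menger's theorem, u and v are joined by k edge-disjoint paths exactly
 * when the unit-capacity maximum flow from u to v has value at least k.      */
static int k_edge_disjoint(int u, int v, int k)
{
    int flow = 0;
    while (flow < k && augment(u, v)) flow++;
    return flow >= k;
}

int main(void)
{
    n = 4;                                   /* a 4-cycle: 0-1-2-3-0            */
    int ring[4][2] = {{0,1},{1,2},{2,3},{3,0}};
    for (int e = 0; e < 4; e++) {
        cap[ring[e][0]][ring[e][1]] = 1;     /* undirected edge = capacity 1     */
        cap[ring[e][1]][ring[e][0]] = 1;     /* in both directions               */
    }
    printf("2 edge-disjoint 0-2 paths: %s\n", k_edge_disjoint(0, 2, 2) ? "yes" : "no");
    return 0;
}
```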
Even though most of the problems considered in this thesis are related to communications networks, we also study the classic edge-coloring problem. The outstanding difficulty in solving this problem in parallel is that we do not yet know whether or not it is in NC. In this thesis we present an improved parallel algorithm for the problem which needs O(Δ^4.5 log^3 Δ log n + Δ^4 log^4 n) time using O(n^2 Δ + n Δ^3) processors, where n is the number of vertices and Δ is the maximum vertex degree. Compared with a previously known result on the same model, we improve by an O(Δ^1.5) factor in time. The non-trivial part is to reduce this problem to the edge-coloring update problem. We also generalize this problem to the approximate edge-coloring problem by giving a faster parallel algorithm for the latter case.
Throughout the design and analysis of parallel graph algorithms, we also find that a technique called sparsification is very powerful in the design of efficient sequential and parallel algorithms on dense undirected graphs. We believe that this technique may be useful in its own right for guiding the design of efficient sequential and parallel algorithms for problems in other areas as well as in graph theory.
|
66 |
SYMPAD - A Class Library for Processing Parallel Algorithm Specifications. Rullmann, Markus; Schaffer, Rainer; Siegel, Sebastian; Merker, Renate. 08 June 2007 (has links) (PDF)
In this paper we introduce a new class library to model transformations of parallel algorithms. SYMPAD serves as a basis for developing automated tools and methods to generate efficient implementations of such algorithms. The paper gives an overview of the general structure as well as the features of the library. We further describe the fundamental design process that is controlled by the methods we have developed.
|
67 |
Development, analysis and applications of the technology for parallelization of numerical algorithms for solution of PDE and systems of PDEs / Diferencialinių lygčių ir jų sistemų skaitinio sprendimo algoritmų lygiagretinimo technologijos kūrimas, analizė ir taikymai. Jakušev, Aleksandr, 20 June 2008 (has links)
A new parallelization technology is presented in this work. It is suited to parallelizing the linear algebra problems that arise in the solution of PDEs and systems of PDEs. The technology combines the strong points of the "data parallel" and "global memory" parallel programming models and, by exploiting the peculiarities of this class of problems, makes it easy to write efficient code while also allowing semi-automatic parallelization. The work consists of three parts: a review of existing technologies, a description of the new one, and various applications.
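The abstract does not show the library's interface; as a rough illustration of the kind of low-level pattern such a parallelization technology is meant to hide behind a higher-level, semi-automatic layer, the sketch below applies a discrete 1D Laplacian to a block-distributed vector with explicit ghost-cell exchange, written directly in MPI. The block size, data layout and neighbour protocol are assumptions made for the example, not taken from the thesis.

```c
#include <stdio.h>
#include <mpi.h>

#define NLOC 4                              /* local block size per process     */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Block-distributed vector with one ghost cell on each side: the kind of
     * bookkeeping a parallelization layer for PDE solvers is meant to hide.   */
    double u[NLOC + 2], lap[NLOC];
    for (int i = 1; i <= NLOC; i++)
        u[i] = (double)(rank * NLOC + i - 1);    /* global index as dummy data   */
    u[0] = u[NLOC + 1] = 0.0;                    /* physical boundary values     */

    int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    /* Exchange ghost cells with the neighbouring blocks. */
    MPI_Sendrecv(&u[1],        1, MPI_DOUBLE, left,  0,
                 &u[NLOC + 1], 1, MPI_DOUBLE, right, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Sendrecv(&u[NLOC],     1, MPI_DOUBLE, right, 1,
                 &u[0],        1, MPI_DOUBLE, left,  1,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    for (int i = 1; i <= NLOC; i++)              /* discrete 1D Laplacian        */
        lap[i - 1] = u[i - 1] - 2.0 * u[i] + u[i + 1];

    printf("rank %d: lap[0] = %g\n", rank, lap[0]);
    MPI_Finalize();
    return 0;
}
```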
|
68 |
Diferencialinių lygčių ir jų sistemų skaitinio sprendimo algoritmų lygiagretinimo technologijos kūrimas, analizė ir taikymai / Development, analysis and applications of the technology for parallelization of numerical algorithms for solution of PDE and systems of PDEs. Jakušev, Aleksandr, 17 February 2009 (has links)
A new parallelization technology is presented in this work. It is suited to parallelizing the linear algebra problems that arise in the solution of PDEs and systems of PDEs. The technology combines the strong points of the "data parallel" and "global memory" parallel programming models and, by exploiting the peculiarities of this class of problems, makes it easy to write efficient code while also allowing semi-automatic parallelization. The work consists of three parts: a review of existing technologies, a description of the new one, and various applications.
|
69 |
Parallelization of random search global optimization algorithms / Atsitiktinės paieškos globaliojo optimizavimo algoritmų lygiagretinimas. Lančinskas, Algirdas, 20 June 2013 (has links)
Global optimization problems arise in various fields of research and industry, such as chemistry, biology, biomedicine and operational research. Optimization problems are normally easier to solve when the objective function has specific properties such as linearity, convexity or differentiability. However, many practical problems do not satisfy these properties, or cannot even be expressed in an adequate mathematical form; random search optimization methods are therefore popular for solving such problems.
The dissertation investigates random search global optimization algorithms, their parallelization, and their application to practical problems. The work focuses on the modification and parallelization of particle swarm optimization and genetic algorithms.
A modification of the particle swarm optimization algorithm based on reduction of the search area is proposed, and several strategies for parallelizing the algorithm are investigated. The algorithm is applied to the Multiple Gravity Assist problem on a parallel computing system.
A hybrid global multi-objective optimization algorithm is developed by modifying a single-agent stochastic search strategy and incorporating it into a multi-objective optimization genetic algorithm. Several strategies for parallelizing the multi-objective genetic algorithm are proposed, and the resulting parallel algorithms are investigated experimentally by solving competitive facility location... [to full text]
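The abstract describes the particle swarm modification and its parallelization without giving formulas; as a point of reference, the sketch below is only the textbook particle swarm optimization loop (velocity and position updates driven by personal and global bests) applied to a toy sphere function. The inertia and acceleration coefficients, the test function and the swarm size are conventional choices assumed for illustration, not values from the dissertation; the per-particle loop marked in the comments is the natural candidate for the parallelization strategies the work investigates.

```c
#include <stdio.h>
#include <stdlib.h>

#define DIM   2
#define NPART 20
#define ITERS 200

static double sphere(const double *x)        /* toy objective: sum of squares    */
{
    double s = 0.0;
    for (int d = 0; d < DIM; d++) s += x[d] * x[d];
    return s;
}

static double rnd(void) { return (double)rand() / RAND_MAX; }

int main(void)
{
    double x[NPART][DIM], v[NPART][DIM], pbest[NPART][DIM], pval[NPART];
    double gbest[DIM], gval = 1e300;
    const double w = 0.7, c1 = 1.5, c2 = 1.5;   /* conventional PSO coefficients */

    srand(1);
    for (int p = 0; p < NPART; p++) {           /* random initial swarm          */
        for (int d = 0; d < DIM; d++) {
            x[p][d] = 10.0 * rnd() - 5.0;
            v[p][d] = 0.0;
            pbest[p][d] = x[p][d];
        }
        pval[p] = sphere(x[p]);
        if (pval[p] < gval) { gval = pval[p]; for (int d = 0; d < DIM; d++) gbest[d] = x[p][d]; }
    }

    for (int it = 0; it < ITERS; it++) {
        for (int p = 0; p < NPART; p++) {        /* this loop over particles is   */
            for (int d = 0; d < DIM; d++) {      /* the natural parallel target   */
                v[p][d] = w * v[p][d]
                        + c1 * rnd() * (pbest[p][d] - x[p][d])
                        + c2 * rnd() * (gbest[d]    - x[p][d]);
                x[p][d] += v[p][d];
            }
            double f = sphere(x[p]);
            if (f < pval[p]) { pval[p] = f; for (int d = 0; d < DIM; d++) pbest[p][d] = x[p][d]; }
            if (f < gval)    { gval = f;    for (int d = 0; d < DIM; d++) gbest[d]    = x[p][d]; }
        }
    }
    printf("best value found: %g\n", gval);
    return 0;
}
```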
|
70 |
Parallel computation in low-level vision. Blake, Andrew, January 1984 (has links)
This thesis is concerned with problems of using computers to interpret scenes from television camera pictures. In particular, it tackles the problem of interpreting the picture in terms of lines and curves, rather like an artist's line drawing. This is very time consuming if done by a single, serial processor. However, if many processors were used simultaneously it could be done much more rapidly. In this thesis the task of line and curve extraction is expressed in terms of constraints, in a form that is susceptible to parallel computation. Iterative algorithms to perform this task have been designed and tested. They are proved to be convergent and to achieve the computation specified. Some previous work on the design of properly convergent, parallel algorithms has drawn on the mathematics of optimisation by relaxation. This thesis develops the use of these techniques for applying "continuity constraints" in line and curve description. First, the constraints are imposed "almost everywhere" on the grey-tone picture data, in two dimensions. Some "discontinuities" - places where the constraints are not satisfied - remain, and they form the lines and curves required for picture interpretation. Secondly, a similar process is applied along each line or curve to segment it. Discontinuities in the angle of the tangent along the line or curve mark the positions of vertices. In each case the process is executed in parallel throughout the picture. It is shown that the specification of such a process as an optimisation problem is non-convex, and this means that an optimal solution cannot necessarily be found in a reasonable time. A method is developed for efficiently achieving a good sub-optimal solution. A parallel array processor is a large array of processor cells which can act simultaneously, throughout a picture. A software emulator of such a processor array was coded in C, and a POP-2 based high level language, PARAPIC, was written to drive it and used to validate the parallel algorithms developed in the thesis. It is argued that the scope, in a vision system, of parallel methods such as those exploited in this work is extensive. The implications for the design of hardware to perform low-level vision are discussed, and it is suggested that a machine consisting of fewer, more powerful cells than in a parallel array processor would execute the parallel algorithms more efficiently.
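As a rough, one-dimensional illustration of the kind of iterative, locally parallel relaxation described above, the sketch below smooths a grey-level signal under a continuity constraint and records the positions where the constraint is broken as discontinuities, alternating a line-detection step with a Jacobi relaxation sweep. The penalty values, the threshold rule and the synthetic step signal are assumptions made for this example rather than the thesis's exact formulation; in a parallel array processor every site update in the sweep could be carried out by its own cell.

```c
#include <stdio.h>
#include <math.h>

#define N 32
#define SWEEPS 100

int main(void)
{
    double d[N], u[N];
    int    line[N - 1];                      /* 1 = discontinuity between i, i+1 */
    const double lambda = 4.0, alpha = 1.0;  /* smoothness and line penalties    */

    for (int i = 0; i < N; i++) {            /* noisy step: an "edge" at i = 16  */
        d[i] = (i < N / 2) ? 0.0 : 1.0;
        d[i] += 0.05 * sin(13.0 * i);        /* deterministic pseudo-noise       */
        u[i] = d[i];
    }
    for (int i = 0; i < N - 1; i++) line[i] = 0;

    for (int s = 0; s < SWEEPS; s++) {
        /* Step 1: break the continuity constraint wherever keeping it would
         * cost more than the line penalty.                                      */
        for (int i = 0; i < N - 1; i++) {
            double jump = u[i + 1] - u[i];
            line[i] = (lambda * jump * jump > alpha);
        }
        /* Step 2: one Jacobi-style relaxation sweep for u; every site could be
         * updated simultaneously by its own processor in an array machine.      */
        double unew[N];
        for (int i = 0; i < N; i++) {
            double num = d[i], den = 1.0;
            if (i > 0 && !line[i - 1]) { num += lambda * u[i - 1]; den += lambda; }
            if (i < N - 1 && !line[i]) { num += lambda * u[i + 1]; den += lambda; }
            unew[i] = num / den;
        }
        for (int i = 0; i < N; i++) u[i] = unew[i];
    }

    for (int i = 0; i < N - 1; i++)
        if (line[i]) printf("discontinuity detected between %d and %d\n", i, i + 1);
    return 0;
}
```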
|