81

Loop parallelization in the cloud using OpenMP and MapReduce / Paralelização de laços na nuvem usando OpenMP e MapReduce

Wottrich, Rodolfo Guilherme, 1990- 04 September 2014
Orientadores: Guido Costa Souza de Araújo, Rodolfo Jardim de Azevedo / Dissertação (mestrado) - Universidade Estadual de Campinas, Instituto de Computação / Abstract: The pursuit of parallelism has always been an important goal in the design of computer systems, driven mainly by the constant interest in reducing program execution time. Parallel programming is an active research area, whose interest has grown with the emergence of multicore architectures. On the other hand, harnessing the large computing and storage capabilities of the cloud, with its desirable flexibility and scaling features, offers a number of interesting opportunities to address relevant research problems in scientific computing. Unfortunately, in many cases implementing applications on the cloud demands specific knowledge of parallel programming interfaces and APIs, which may become a burden when programming complex applications. To overcome such limitations, in this work we propose OpenMR, an execution model based on the syntax and principles of the OpenMP API which eases the task of programming distributed systems (i.e., local clusters or the remote cloud). Specifically, this work addresses the problem of performing loop parallelization, using OpenMR, in a distributed environment, by mapping loop iterations to MapReduce nodes. By doing so, the programming language itself becomes the cloud programming interface, freeing the developer from worrying about the details of distributing workload and data.
To assess the validity of the proposal, we modified benchmarks from the SPEC OMP2012 suite to fit the proposed model, developed other I/O-bound toy benchmarks, and executed them in two settings: (a) a computer cluster locally available through a standard LAN; and (b) clusters remotely available through the Amazon AWS services. We compare the results to execution using OpenMP on an SMP architecture and show that the proposed parallelization technique is feasible and exhibits good scalability. / Mestrado / Ciência da Computação / Mestre em Ciência da Computação
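The core idea in the record above, mapping the iterations of an OpenMP-style parallel loop onto MapReduce map tasks, can be sketched in a few lines. This is a hypothetical, sequential simulation, not OpenMR itself: each contiguous chunk of iterations stands in for a map task that OpenMR would ship to a MapReduce node, and the final sum plays the role of the reduce step.

```python
def chunks(n, k):
    # Split range(n) into k nearly equal contiguous chunks,
    # like an OpenMP static loop schedule.
    base, rem = divmod(n, k)
    out, start = [], 0
    for i in range(k):
        size = base + (1 if i < rem else 0)
        out.append(range(start, start + size))
        start += size
    return out

def map_task(chunk):
    # One "map" task: compute a partial result over its chunk of
    # iterations. (A real OpenMR run would execute this on a node.)
    return sum(i * i for i in chunk)

# "Reduce" step: combine the partial results of all map tasks.
total = sum(map(map_task, chunks(1000, 8)))
```

Because iterations are independent, the chunked map/reduce result matches the plain sequential loop, which is exactly the property that lets loop iterations be distributed.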
82

Machine learning based mapping of data and streaming parallelism to multi-cores

Wang, Zheng January 2011
Multi-core processors are now ubiquitous and are widely seen as the most viable means of delivering performance with increasing transistor densities. However, this potential can only be realised if the application programs are suitably parallel. Applications can either be written in parallel from scratch or converted from existing sequential programs. Regardless of how applications are parallelised, the code must be efficiently mapped onto the underlying platform to fully exploit the hardware's potential. This thesis addresses the problem of finding the best mappings of data and streaming parallelism, two types of parallelism that exist in broad and important domains such as scientific, signal processing and media applications. Despite significant progress over the past few decades, state-of-the-art mapping approaches still largely rely upon hand-crafted, architecture-specific heuristics. Developing a heuristic by hand, however, often requires months of development time. As multicore designs become increasingly diverse and complex, manually tuning a heuristic for a wide range of architectures is no longer feasible. What is needed are innovative techniques that can automatically scale with advances in multi-core technologies. In this thesis two distinct areas of computer science, namely parallel compiler design and machine learning, are brought together to develop new compiler-based mapping techniques. Using machine learning, it is possible to automatically build high-quality mapping schemes, which adapt to evolving architectures, with little human involvement. First, two techniques are proposed to find the best mapping of data parallelism. The first technique predicts whether parallel execution of a data-parallel candidate is profitable on the underlying architecture.
On a typical multi-core platform, it achieves almost the same (and sometimes a better) level of performance when compared to the manually parallelised code developed by independent experts. For a profitable candidate, the second technique predicts how many threads should be used to execute the candidate across different program inputs. The second technique achieves, on average, over 96% of the maximum available performance on two different multi-core platforms. Next, a new approach is developed for partitioning stream applications. This approach predicts the ideal partitioning structure for a given stream application. Based on the prediction, a compiler can rapidly search the program space (without executing any code) to generate a good partition. It achieves, on average, a 1.90x speedup over the already tuned partitioning scheme of a state-of-the-art streaming compiler.
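The thread-count prediction described above can be illustrated with a deliberately simple model. Everything here is hypothetical: the features, the training examples, and the 1-nearest-neighbour learner are stand-ins, since the abstract does not specify the thesis's actual features or model; it only establishes the pattern of mapping program features to a good thread count learned offline.

```python
# Hypothetical training set: (feature vector, best thread count) pairs
# gathered offline. Features here are invented for illustration:
# (loop trip count, working-set size in KB, branch ratio).
TRAIN = [
    ((1e6, 64, 0.10), 8),
    ((1e3, 4, 0.40), 1),
    ((1e5, 512, 0.05), 4),
]

def predict(features):
    # 1-nearest-neighbour: return the thread count of the closest
    # training example (squared Euclidean distance; a real system
    # would normalise features first).
    def sq_dist(example):
        return sum((a - b) ** 2 for a, b in zip(example[0], features))
    return min(TRAIN, key=sq_dist)[1]
```

At compile or load time, a new program's feature vector would be fed to `predict` to choose how many threads to spawn, with no hand-tuned heuristic involved.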
83

Tracing de aplicações paralelas com informações de alto nível de abstração / Tracing of parallel applications with information of high level abstraction

Piola, Thatyana de Faria 06 July 2007
Parallel computing has become an essential tool to achieve the performance needed by many scientific applications, so it is important to evaluate the factors that limit the performance of a parallel application. This work presents the development and implementation of a tool called Hierarchical Analyses, which facilitates data collection for the performance analysis of parallel programs with hierarchical information; that is, the information is collected at the various abstraction levels used in the application program. The tool consists of a collection module and a transformation module. The collection module (HieraCollector) collects the data and stores it in XML format, without requiring the user to change the application's source code.
The transformation module (HieraTransform) processes the collected data, computing measurements to be used in the analysis of the parallel code. To validate the tool, implementations adapted to MPI and to the object-oriented OOPS framework have been developed. Another contribution of this work is a visual tool called HieraOLAP, which helps the user analyse parallel program performance.
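The collector/transformer split described above can be sketched as follows. The class, method, and tag names are hypothetical stand-ins for the behaviour the abstract attributes to HieraCollector (nested program regions recorded to XML, later processed for measurements), not the tool's real API.

```python
import time
import xml.etree.ElementTree as ET

class TraceCollector:
    # Hypothetical stand-in for a HieraCollector-style module:
    # nested enter/leave calls build an XML tree that mirrors the
    # abstraction levels of the traced program.
    def __init__(self):
        self.root = ET.Element("trace")
        self._stack = [(self.root, None)]

    def enter(self, name):
        elem = ET.SubElement(self._stack[-1][0], "region", name=name)
        self._stack.append((elem, time.perf_counter()))

    def leave(self):
        # Transformation step in miniature: turn raw timestamps into
        # a duration measurement stored on the region element.
        elem, t0 = self._stack.pop()
        elem.set("duration_s", repr(time.perf_counter() - t0))

tc = TraceCollector()
tc.enter("solver")
tc.enter("iteration")
tc.leave()
tc.leave()
xml_bytes = ET.tostring(tc.root)  # what a collector would write to disk
```

The nesting of `<region>` elements preserves the hierarchy, so an analysis front end (the HieraOLAP role) can aggregate durations at any abstraction level.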
84

Data-parallel concurrent constraint programming.

January 1994
by Bo-ming Tong. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1994. / Includes bibliographical references (leaves 104-[110]).

Contents:
1 Introduction --- p.1
1.1 Concurrent Constraint Programming --- p.2
1.2 Finite Domain Constraints --- p.3
2 The Firebird Language --- p.5
2.1 Finite Domain Constraints --- p.6
2.2 The Firebird Computation Model --- p.6
2.3 Miscellaneous Features --- p.7
2.4 Clause-Based Nondeterminism --- p.9
2.5 Programming Examples --- p.10
2.5.1 Magic Series --- p.10
2.5.2 Weak Queens --- p.14
3 Operational Semantics --- p.15
3.1 The Firebird Computation Model --- p.16
3.2 The Firebird Commit Law --- p.17
3.3 Derivation --- p.17
3.4 Correctness of Firebird Computation Model --- p.18
4 Exploitation of Data-Parallelism in Firebird --- p.24
4.1 An Illustrative Example --- p.25
4.2 Mapping Partitions to Processor Elements --- p.26
4.3 Masks --- p.27
4.4 Control Strategy --- p.27
4.4.1 A Control Strategy Suitable for Linear Equations --- p.28
5 Data-Parallel Abstract Machine --- p.30
5.1 Basic DPAM --- p.31
5.1.1 Hardware Requirements --- p.31
5.1.2 Procedure Calling Convention and Process Creation --- p.32
5.1.3 Memory Model --- p.34
5.1.4 Registers --- p.41
5.1.5 Process Management --- p.41
5.1.6 Unification --- p.49
5.1.7 Variable Table --- p.49
5.2 DPAM with Backtracking --- p.50
5.2.1 Choice Point --- p.52
5.2.2 Trailing --- p.52
5.2.3 Recovering the Process Queues --- p.57
6 Implementation --- p.58
6.1 The DECmpp Massively Parallel Computer --- p.58
6.2 Implementation Overview --- p.59
6.3 Constraints --- p.60
6.3.1 Breaking Down Equality Constraints --- p.61
6.3.2 Processing the Constraint 'As Is' --- p.62
6.4 The Wide-Tag Architecture --- p.63
6.5 Register Window --- p.64
6.6 Dereferencing --- p.65
6.7 Output --- p.66
6.7.1 Collecting the Solutions --- p.66
6.7.2 Decoding the Solution --- p.68
7 Performance --- p.69
7.1 Uniprocessor Performance --- p.71
7.2 Solitary Mode --- p.73
7.3 Bit Vectors of Domain Variables --- p.75
7.4 Heap Consumption of the Heap Frame Scheme --- p.77
7.5 Eager Nondeterministic Derivation vs Lazy Nondeterministic Derivation --- p.78
7.6 Priority Scheduling --- p.79
7.7 Execution Profile --- p.80
7.8 Effect of the Number of Processor Elements on Performance --- p.82
7.9 Change of the Degree of Parallelism During Execution --- p.84
8 Related Work --- p.88
8.1 Vectorization of Prolog --- p.89
8.2 Parallel Clause Matching --- p.90
8.3 Parallel Interpreter --- p.90
8.4 Bounded Quantifications --- p.91
8.5 SIMD MultiLog --- p.91
9 Conclusion --- p.93
9.1 Limitations --- p.94
9.1.1 Data-Parallel Firebird is Specialized --- p.94
9.1.2 Limitations of the Implementation Scheme --- p.95
9.2 Future Work --- p.95
9.2.1 Extending Firebird --- p.95
9.2.2 Improvements Specific to DECmpp --- p.99
9.2.3 Labeling --- p.100
9.2.4 Parallel Domain Consistency --- p.101
9.2.5 Branch and Bound Algorithm --- p.102
9.2.6 Other Possible Future Work --- p.102
Bibliography --- p.104
85

Java message passing interface.

January 1998
by Wan Lai Man. / Thesis (M.Phil.)--Chinese University of Hong Kong, 1998. / Includes bibliographical references (leaves 76-80). / Abstract also in Chinese.

Contents:
1 Introduction --- p.1
1.1 Background --- p.1
1.2 Objectives --- p.3
1.3 Contributions --- p.4
1.4 Overview --- p.4
2 Literature Review --- p.6
2.1 Message Passing Interface --- p.6
2.1.1 Point-to-Point Communication --- p.7
2.1.2 Persistent Communication Request --- p.8
2.1.3 Collective Communication --- p.8
2.1.4 Derived Datatype --- p.9
2.2 Communications in Java --- p.10
2.2.1 Object Serialization --- p.10
2.2.2 Remote Method Invocation --- p.11
2.3 Performance Issues in Java --- p.11
2.3.1 Byte-code Interpreter --- p.11
2.3.2 Just-in-time Compiler --- p.12
2.3.3 HotSpot --- p.13
2.4 Parallel Computing in Java --- p.14
2.4.1 JavaMPI --- p.15
2.4.2 Bayanihan --- p.15
2.4.3 JPVM --- p.15
3 Infrastructure --- p.17
3.1 Layered Model --- p.17
3.2 Java Parallel Environment --- p.19
3.2.1 Job Coordinator --- p.20
3.2.2 HostApplet --- p.20
3.2.3 Formation of Java Parallel Environment --- p.21
3.2.4 Spawning Processes --- p.24
3.2.5 Message-passing Mechanism --- p.28
3.3 Application Programming Interface --- p.28
3.3.1 Message Routing --- p.29
3.3.2 Language Binding for MPI in Java --- p.31
4 Programming in JMPI --- p.35
4.1 JMPI Package --- p.35
4.2 Application Startup Procedure --- p.37
4.2.1 MPI --- p.38
4.2.2 JMPI --- p.38
4.3 Example --- p.39
5 Processes Management --- p.42
5.1 Background --- p.42
5.2 Scheduler Model --- p.43
5.3 Load Estimation --- p.45
5.3.1 Cost Ratios --- p.47
5.4 Task Distribution --- p.49
6 Performance Evaluation --- p.51
6.1 Testing Environment --- p.51
6.2 Latency from Java --- p.52
6.2.1 Benchmarking --- p.52
6.2.2 Experimental Results in Computation Costs --- p.52
6.2.3 Experimental Results in Communication Costs --- p.55
6.3 Latency from JMPI --- p.56
6.3.1 Benchmarking --- p.56
6.3.2 Experimental Results --- p.58
6.4 Application Granularity --- p.62
6.5 Scheduling Enable --- p.64
7 Conclusion --- p.66
7.1 Summary of the Thesis --- p.66
7.2 Future Work --- p.67
A Performance Metrics and Benchmark --- p.69
A.1 Model and Metrics --- p.69
A.1.1 Measurement Model --- p.69
A.1.2 Performance Metrics --- p.70
A.1.3 Communication Parameters --- p.72
A.2 Benchmarking --- p.73
A.2.1 Ping --- p.73
A.2.2 PingPong --- p.74
A.2.3 Collective --- p.74
Bibliography --- p.76
86

Finding, Measuring, and Reducing Inefficiencies in Contemporary Computer Systems

Kambadur, Melanie Rae January 2016
Computer systems have become increasingly diverse and specialized in recent years. This complexity supports a wide range of new computing uses and users, but is not without cost: it has become difficult to maintain the efficiency of contemporary general purpose computing systems. Computing inefficiencies, which include nonoptimal runtimes, excessive energy use, and limits to scalability, are a serious problem that can result in an inability to apply computing to solve the world's most important problems. Beyond the complexity and vast diversity of modern computing platforms and applications, a number of factors make improving general purpose efficiency challenging, including the requirement that multiple levels of the computer system stack be examined, that legacy hardware devices and software may stand in the way of achieving efficiency, and the need to balance efficiency with reusability, programmability, security, and other goals. This dissertation presents five case studies, each demonstrating different ways in which the measurement of emerging systems can provide actionable advice to help keep general purpose computing efficient. The first of the five case studies is Parallel Block Vectors, a new profiling method that examines parallel programs from a fine-grained, code-centric perspective, aiding both future hardware design and the optimization of software to map better onto existing hardware. Second is a project that defines a new way of measuring application interference on a datacenter's worth of chip-multiprocessors, leading to improved scheduling in which applications can more effectively utilize available hardware resources. Next is a project that uses the GT-Pin tool to define a method for accelerating the simulation of GPGPUs, ultimately allowing for the development of future hardware with fewer inefficiencies.
The fourth project is an experimental energy survey that compares and combines the latest energy efficiency solutions at different levels of the stack to properly evaluate the state of the art and to find paths forward for future energy efficiency research. The final project presented is NRG-Loops, a language extension that allows programs to measure and intelligently adapt their own power and energy use.
87

Diagonalizing quantum spin models on parallel machine. / 並行機上量子自旋模型的對角化 / Diagonalizing quantum spin models on parallel machine. / Bing xing ji shang liang zi zi xuan mo xing de dui jiao hua

January 2005
Chan Yuk-Lin = 並行機上量子自旋模型的對角化 / 陳玉蓮. / Thesis submitted in: September 2004. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2005. / Includes bibliographical references (leaves 121-123). / Text in English; abstracts in English and Chinese. / Chan Yuk-Lin = Bing xing ji shang liang zi zi xuan mo xing de dui jiao hua / Chen Yulian.

Contents:
Abstract --- p.i
摘要 --- p.ii
Acknowledgement --- p.iii
1 Introduction --- p.1
1.1 Motivation --- p.1
1.2 Development of Theory of Magnetism --- p.2
1.3 Heisenberg Model --- p.5
1.4 Thesis Organization --- p.9
2 Introduction to Parallel Computing --- p.11
2.1 Architecture of Parallel Computer --- p.12
2.2 Symmetric Multiprocessors and Clusters --- p.16
2.2.1 Symmetric Multiprocessors --- p.16
2.2.2 Cluster --- p.18
2.2.3 Clusters versus SMP --- p.19
2.3 Hybrid Architectures (Cluster of SMPs) --- p.20
2.4 Hardware Platform for Parallel Computing --- p.21
2.4.1 SGI Origin 2000 (Origin) --- p.21
2.4.2 IBM RS/6000 SP (Orbit) --- p.22
3 Parallelization --- p.23
3.1 Models of Parallel Programming --- p.24
3.2 Parallel Programming Paradigm --- p.26
3.2.1 Programming for Distributed Memory Systems: MPI --- p.26
3.2.2 Programming for Shared Memory Systems: OpenMP --- p.31
3.2.3 Programming for Hybrid Systems: MPI + OpenMP --- p.39
4 Performance --- p.42
4.1 Writing a Parallel Program --- p.42
4.2 Performance Analysis --- p.43
4.3 Synchronization and Communication --- p.47
4.3.1 Communication Modes --- p.47
5 Exact Diagonalization --- p.50
5.1 Symmetry Invariance --- p.52
5.2 Lanczos Method --- p.53
5.2.1 Basic Lanczos Algorithm --- p.54
5.2.2 Modified Lanczos Method --- p.56
5.3 Dynamic Memory Allocation --- p.58
6 Parallelization of Exact Diagonalization --- p.62
6.1 Parallelization of Lanczos Method --- p.62
6.2 Hamiltonian Matrix Decomposition --- p.66
6.2.1 Row-Wise Block Decomposition --- p.67
6.2.2 Column-Wise Block Decomposition --- p.69
7 Results and Discussion --- p.71
7.1 Lattice Structure --- p.71
7.2 Definition of Timing --- p.72
7.3 Rowwise vs Columnwise --- p.73
7.4 SGI Origin 2000 (Origin) --- p.77
7.4.1 Timing Results --- p.77
7.4.2 Performance --- p.79
7.5 IBM RS/6000 SP (Orbit) --- p.82
7.5.1 MPI vs Hybrid --- p.82
7.5.2 Timing and Performance --- p.84
7.6 Timing on Origin vs Orbit --- p.89
8 Conclusion --- p.91
A Basic MPI Concepts --- p.95
A.1 Message Passing Interface --- p.95
A.2 MPI Routine Format --- p.96
A.3 Start Writing an MPI Program --- p.96
A.3.1 The First MPI Program --- p.97
A.3.2 Sample MPI Program #1 --- p.100
A.3.3 Sample MPI Program #2 --- p.106
B Compiling and Running Parallel Jobs in IBM SP --- p.111
B.1 Compilation --- p.111
B.1.1 Compiler Options --- p.112
B.2 Running Jobs --- p.114
B.2.1 LoadLeveler --- p.114
B.2.2 Serial Job Script --- p.114
B.2.3 Parallel Job Script: MPI Program --- p.115
B.2.4 Parallel Job Script: OpenMP Program --- p.117
B.2.5 Parallel Job Script: Hybrid MPI/OpenMP Program --- p.118
B.2.6 LoadLeveler Commands --- p.120
Bibliography --- p.123
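As a rough illustration of the Lanczos method listed in Chapter 5 of the record above, here is a minimal pure-Python sketch of the basic Lanczos recurrence, which reduces a symmetric matrix to tridiagonal form (diagonal entries `alphas`, off-diagonals `betas`). This is a textbook sketch, not the thesis's parallel implementation. One checkable invariant: a full run is a similarity transform, so the trace of the tridiagonal matrix (the sum of `alphas`) equals the trace of the original matrix.

```python
import math

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def lanczos(A, v0, m):
    # Basic Lanczos recurrence: build m steps of the tridiagonal
    # reduction of symmetric A, starting from vector v0.
    n = len(A)
    nrm = math.sqrt(dot(v0, v0))
    v = [x / nrm for x in v0]
    v_prev = [0.0] * n
    beta = 0.0
    alphas, betas = [], []
    for j in range(m):
        w = matvec(A, v)
        alpha = dot(w, v)
        alphas.append(alpha)
        # Orthogonalize against the two previous Lanczos vectors.
        w = [wi - alpha * vi - beta * pi for wi, vi, pi in zip(w, v, v_prev)]
        beta = math.sqrt(dot(w, w))
        if j < m - 1:
            betas.append(beta)
            v_prev, v = v, [wi / beta for wi in w]
    return alphas, betas

# Small symmetric test matrix (a 4-site chain-like Hamiltonian shape).
A = [[2.0, 1.0, 0.0, 0.0],
     [1.0, 2.0, 1.0, 0.0],
     [0.0, 1.0, 2.0, 1.0],
     [0.0, 0.0, 1.0, 2.0]]
alphas, betas = lanczos(A, [1.0, 0.0, 0.0, 0.0], 4)
```

The parallelization discussed in Chapter 6 enters precisely in `matvec`: the Hamiltonian's rows or columns are distributed across processors, and each Lanczos step performs a distributed matrix-vector product.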
88

Genetic parallel programming. / CUHK electronic theses & dissertations collection / Digital dissertation consortium

January 2005
Sin Man Cheang. / "March 2005." / Thesis (Ph.D.)--Chinese University of Hong Kong, 2005. / Includes bibliographical references (p. 219-233). / Abstracts in English and Chinese.
89

Aplicação de técnicas de computação paralela para simulação de fluidos com métodos de partículas explícitos. / Application of parallel computing on explicit particle methods for fluid simulation.

Taniguchi, Denis 07 February 2014
MPS is a meshless Lagrangian method for computational fluid dynamics, created to study incompressible free-surface flows. It has many advantages over traditional mesh-based methods, such as the ability to represent complex geometries and interface problems, and the absence of the advection term in the algebraic equations. This work focuses on the use of parallel computing methods for fluid dynamics simulation, specifically the explicit variant of the MPS method, named E-MPS, to decrease the processing time of a simulation and increase the number of particles, enabling the simulation of real, complex engineering problems. The proposed method is composed of two levels of parallelism: a distributed-memory level based on spatial domain decomposition, and a shared-memory level, using either Graphics Processing Unit (GPU) devices or multicore CPUs, for fast computation of each subdomain. Static (non-adaptive), Orthogonal Recursive Bisection (ORB), orthogonal, and cell transfer spatial decomposition methods are investigated; the last of these, cell transfer, is originally proposed in this work to overcome the geometric limitations of the other methods by taking the nature of the flow into account when balancing the load among subdomains. Among the existing methods, the orthogonal proved the most attractive due to its simplicity, maintaining better load balance than the static method in the test case. One of the main contributions of this work is a new, generic method for communication between subdomains, which avoids reordering the particles and works with all the decomposition methods investigated.
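The Orthogonal Recursive Bisection (ORB) decomposition named above can be sketched in a few lines: recursively split the particle set at the median along the axis of largest bounding-box extent, so each subdomain ends up with an equal share of particles. This is a minimal sketch for 2D point particles under invented coordinates, not the thesis's implementation.

```python
def orb(points, depth):
    # Orthogonal Recursive Bisection: split at the median along the
    # axis with the largest bounding-box extent, then recurse.
    if depth == 0:
        return [points]
    extents = []
    for axis in range(len(points[0])):
        coords = [p[axis] for p in points]
        extents.append(max(coords) - min(coords))
    axis = extents.index(max(extents))
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return orb(ordered[:mid], depth - 1) + orb(ordered[mid:], depth - 1)

# A 4x4 grid of particles split into 2^2 = 4 subdomains.
grid = [(i % 4, i // 4) for i in range(16)]
parts = orb(grid, 2)
```

Because the split is at the median, each bisection halves the particle count exactly, which is the load-balancing property that makes ORB attractive; its weakness, as the abstract notes, is that the cuts ignore the flow, which is what the proposed cell transfer scheme addresses.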
