91

A novel approach to reduce the computation time for CFD : hybrid LES-RANS modelling on parallel computers

Turnbull, Julian January 2003 (has links)
Large Eddy Simulation (LES) is a method of obtaining high-accuracy computational results for modelling fluid flow. Unfortunately it is computationally expensive, limiting it to users of large parallel machines. However, LES may over-resolve the problem, because the bulk of the computational domain could be adequately modelled using the Reynolds-averaged (RANS) approach. A study has been undertaken to assess the feasibility, in both accuracy and computational efficiency, of using a parallel computer to solve both LES and RANS turbulence models on the same domain, for the problem of flow over a circular cylinder at Reynolds number 3,900. To do this, the domain has been created and then divided into two sub-domains, one for the LES model and one for the kappa-epsilon turbulence model. The hybrid model has been developed specifically for a parallel computing environment, and the user is able to allocate modelling techniques to processors in a way which enables expansion of the model to any number of processors. Computational experimentation has shown that the Smagorinsky model can be used to capture the vortex shedding from the cylinder, and that this information is successfully passed to the kappa-epsilon model for the dissipation of the vortices further downstream. The results have been compared with high-accuracy LES results and with both kappa-epsilon and Smagorinsky LES computations on the same domain. The hybrid models developed compare well with the Smagorinsky model, capturing the vortex shedding with the correct periodicity. Suggestions for future work have been made to develop this idea further, and to investigate the possibility of using the technology for the modelling of mixing and fast chemical reactions, based on the more accurate prediction of the turbulence levels in the LES sub-domain.
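The LES sub-domain in work like this typically closes the subgrid stresses with the Smagorinsky model, whose eddy viscosity is nu_t = (C_s * Delta)^2 * |S|. A minimal 2D sketch in Python (illustrative only; the constant C_s and the velocity gradients are assumed, not taken from the thesis):

```python
import math

def smagorinsky_nu_t(dudx, dudy, dvdx, dvdy, delta, cs=0.17):
    """Subgrid eddy viscosity nu_t = (cs*delta)^2 * |S| for a 2D velocity field,
    where |S| = sqrt(2 S_ij S_ij) and S_ij is the resolved strain-rate tensor."""
    s11 = dudx
    s22 = dvdy
    s12 = 0.5 * (dudy + dvdx)  # symmetric off-diagonal component
    s_mag = math.sqrt(2.0 * (s11 * s11 + s22 * s22 + 2.0 * s12 * s12))
    return (cs * delta) ** 2 * s_mag

# Pure shear du/dy = 1 gives |S| = 1, so nu_t = (cs*delta)^2
print(smagorinsky_nu_t(0.0, 1.0, 0.0, 0.0, delta=0.1))
```

In a hybrid scheme like the one described, this viscosity would be evaluated only on cells assigned to the LES sub-domain, while the kappa-epsilon model supplies the turbulent viscosity elsewhere.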
92

Parallelisation of micromagnetic simulations

Nagy, Lesleis January 2016 (has links)
The field of paleomagnetism attempts to understand in detail the processes of the Earth by studying naturally occurring magnetic samples. These samples are quite unlike those fabricated in the laboratory: they have irregular shapes; they have been squeezed and stretched, heated and cooled, and subjected to oxidation. However, micromagnetic modelling allows us to simulate such samples and gain some understanding of how a paleomagnetic signal is acquired and how it is retained. Micromagnetics provides a theory for understanding how the domain structure of a magnetic sample alters subject to what it is made from and the environment it is in. It furnishes the mathematics that describes the energy of a given domain structure and how that domain structure evolves in time. Combining micromagnetics with ever-increasing computer power, it has been possible to produce simulations of small to medium-sized grains within the so-called single to pseudo-single domain state range. However, processors are no longer built with increasing speed but with increasing parallelism, and it is this that must be exploited to model larger and larger paleomagnetic samples. The purpose of the work presented here is twofold. First, a micromagnetics code that is parallel and scalable is presented. This code is based on FEniCS, an existing finite element framework, and is shown to run on ARCHER, the UK's national supercomputing service. The strategy of using existing libraries and frameworks allows future extension and the inclusion of new science in the code base. In order to achieve scalability, a spatial mapping technique is used to calculate the demagnetising field, the most computationally intensive part of micromagnetic calculations. This allows grain geometries to be partitioned in such a way that no global communication is required between parallel processes, which is the source of the favourable scaling behaviour.
The second part of the thesis presents an exploration of domain state evolution in increasing sizes of magnetite grains. This simulation, whilst a first approximation that excludes magneto-elastic effects, is the first attempt to map out the transition from pseudo-single domain states to multi-domain states using a full micromagnetic simulation.
93

Parallelization of the MUSCLE sequence alignment tool for a distributed environment

Marucci, Evandro Augusto. January 2009 (has links)
Advisor: José Márcio Machado / Committee: Liria Matsumoto Sato / Committee: Aleardo Manacero Junior / Abstract: Due to the increasing amount of genetic data for comparison, parallel computing is becoming increasingly necessary to perform one of the most important operations in bioinformatics, multiple sequence alignment. Nowadays, many software tools are used to solve sequence alignments, and the use of parallel computing is becoming more and more widespread. However, although different parallel algorithms have been developed to support genetic research, many of them do not consider fundamental aspects of parallel computing. MUSCLE [1] is a tool that performs multiple sequence alignment with good computational performance and significantly precise biological results [2]. Although the methods it uses have different parallel versions proposed in the literature, only one parallel version of the MUSCLE tool has been proposed [3]. That version, however, was developed for shared-memory systems. The development of a parallel MUSCLE tool for distributed systems is important given the wide use of such systems in genomic research laboratories. This parallelization is the aim of this work; it was carried out using existing parallel approaches and by creating new ones. As a result, different parallel strategies have been proposed. These strategies can be incorporated into other alignment tools that use, in a given stage, the same sequential approach. In each parallelized method, we considered mainly efficiency, scalability and the ability to address real biological problems. The tests show that, for each parallel step, at least one of the defined strategies meets all these criteria. In addition to this novel MUSCLE parallelization, enabling it to execute on distributed systems, the results show that the defined strategies perform better than the existing ones. / Master's
94

Integrace procedurálního kódu do proudových paralelních systémů / Procedural code integration in streaming environments

Brabec, Michal January 2018 (has links)
Title: Procedural code integration in streaming environments Author: Mgr. Michal Brabec Department: Department of Software Engineering Supervisor: David Bednárek, Ph.D. Abstract: Streaming environments and similar parallel platforms are widely used in image, signal, or general data processing as means of achieving high performance. Unfortunately, they are often associated with domain specific programming languages, and thus hardly accessible for non-experts. In this work, we present a framework for transformation of a procedural code to a streaming application. We selected a restricted version of the C# language as the interface for our system, because it is widely taught and many programmers are familiar with it. This approach will allow creating streaming applications or their parts using a widely known imperative language instead of the intricate languages specific to streaming. The transformation process is based on the Hybrid Flow Graph - a novel intermediate code which employs the streaming paradigm and can be further converted into streaming applications. The intermediate code shares the features and limitations of the streaming environments, while representing the applications without platform specific technical details, which allows us to use well known graph algorithms to work with the...
95

Artificial intelligence models for large scale buildings energy consumption analysis / Modèles d'intelligence artificielle pour analyse énergétique des bâtiments de la consommation

Zhao, Haixiang 28 September 2011 (has links)
The energy performance of buildings is influenced by many factors, such as ambient weather conditions, building structure and characteristics, occupancy and occupant behaviour, and the operation of sub-level components like the Heating, Ventilation and Air-Conditioning (HVAC) system. This complexity makes the prediction, analysis, or fault detection/diagnosis of building energy consumption very difficult to perform accurately and quickly. This thesis mainly focuses on up-to-date artificial intelligence models and their application to these problems.
First, we review recently developed models for solving these problems, including detailed and simplified engineering methods, statistical methods and artificial intelligence methods. Then we simulate energy consumption profiles for single and multiple buildings; based on these datasets, support vector machine models are trained and tested to make predictions. The results from extensive experiments demonstrate the high prediction accuracy and robustness of these models. Second, a Recursive Deterministic Perceptron (RDP) neural network model is used to detect and diagnose faulty building energy consumption. The abnormal consumption is simulated by manually introducing performance degradation into electric devices. In the experiments, the RDP model shows very high detection ability. A new approach is proposed to diagnose faults; it is based on the evaluation of RDP models, each of which is able to detect one equipment fault. Third, we investigate how the selection of subsets of features influences model performance. The optimal features are selected based on the feasibility of obtaining them and on the scores they receive under the evaluation of two filter methods. Experimental results confirm the validity of the selected subset and show that the proposed feature selection method can guarantee model accuracy while reducing computational time. One challenge of predicting building energy consumption is to accelerate model training when the dataset is very large. This thesis proposes an efficient parallel implementation of support vector machines based on the decomposition method for solving such problems. The parallelization is performed on the most time-consuming part of training, namely updating the gradient vector f. The inner problems are handled by a sequential minimal optimization solver. The underlying parallelism is realised with a shared-memory version of the Map-Reduce paradigm, making the system particularly suitable for multi-core and multiprocessor systems.
Experimental results show that our implementation offers a high speedup compared to LIBSVM, and that it is superior to the state-of-the-art MPI implementation PiSVM in both speed and storage requirements.
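The parallelised step described in the abstract — updating the gradient vector after each decomposition iteration — can be sketched as follows. This is an illustration under assumed conventions (a LIBSVM-style gradient G_i = sum_j alpha_j y_i y_j K(x_i, x_j) - 1, an RBF kernel, and a thread-based map over disjoint row chunks), not the thesis implementation:

```python
import math
from concurrent.futures import ThreadPoolExecutor

def rbf(a, b, gamma=0.5):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-gamma * sum((x - z) ** 2 for x, z in zip(a, b)))

def update_gradient(G, X, y, delta_alpha, workers=4):
    """After the inner solver changes alpha_j by delta_alpha[j] for j in the
    working set, apply G_i += y_i * y_j * K(x_i, x_j) * delta_alpha[j] for
    every i. Rows are partitioned into disjoint chunks, one per thread, so
    no synchronisation on G is needed (shared-memory map style)."""
    n = len(G)
    cols = list(delta_alpha.items())
    chunk = (n + workers - 1) // workers

    def work(lo):
        for i in range(lo, min(lo + chunk, n)):
            G[i] += sum(y[i] * y[j] * rbf(X[i], X[j]) * d for j, d in cols)

    with ThreadPoolExecutor(max_workers=workers) as ex:
        list(ex.map(work, range(0, n, chunk)))  # force completion
    return G
```

Since only the working-set columns of the kernel matrix are touched, the cost per iteration is O(n * |B|) kernel evaluations, and the row-chunk partitioning mirrors the parallelism described in the abstract.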
96

Accelerating the knapsack problem on GPUs

Suri, Bharath January 2011 (has links)
The knapsack problem manifests itself in many domains, such as cryptography, finance and bioinformatics. Knapsack problems often appear inside optimization loops in system-level design and analysis of embedded systems as well. Given a set of items, each associated with a profit and a weight, the knapsack problem deals with how to choose a subset of items such that the profit is maximized and the total weight of the chosen items does not exceed the capacity of the knapsack. There exist several variants and extensions of this knapsack problem. In this thesis, we focus on the multiple-choice knapsack problem, where the items are grouped into disjoint classes. The multiple-choice knapsack problem is known to be NP-hard. While many different heuristics and approximation schemes have been proposed to solve the problem in polynomial time, such techniques do not return the optimal solution. A dynamic programming algorithm that solves the problem optimally is known, but it has a pseudo-polynomial running time. This leads to high running times of tools in various application domains where knapsack problems must be solved. Many system-level design tools in the embedded systems domain, in particular, would suffer from high running times when such a knapsack problem must be solved inside a larger optimization loop. To mitigate the high running times of such algorithms, in this thesis we propose a GPU-based technique to solve the multiple-choice knapsack problem. We study different approaches to map the dynamic programming algorithm onto the GPU and compare their performance in terms of running times. We employ GPU-specific methods to further improve the running times, such as exploiting the GPU's on-chip shared memory. Apart from results on synthetic test cases, we also demonstrate the applicability of our technique in practice by considering a case study from system-level design: the problem of instruction-set selection for customizable processors.
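The pseudo-polynomial dynamic program referred to above processes one class at a time: dp[c] holds the best profit achievable at total weight c after choosing exactly one item from each class seen so far. A sequential Python sketch (the thesis maps this recurrence onto the GPU; the item data here is purely illustrative):

```python
def mckp(classes, capacity):
    """Multiple-choice knapsack: pick exactly one (weight, profit) item from
    each class with total weight <= capacity, maximising total profit.
    Returns the best profit, or None if no feasible selection exists."""
    NEG = float("-inf")
    dp = [NEG] * (capacity + 1)
    dp[0] = 0  # zero classes processed, zero weight used
    for items in classes:
        ndp = [NEG] * (capacity + 1)
        for c in range(capacity + 1):
            if dp[c] == NEG:
                continue
            for w, p in items:  # extend every reachable state by one item
                if c + w <= capacity and dp[c] + p > ndp[c + w]:
                    ndp[c + w] = dp[c] + p
        dp = ndp
    best = max(dp)
    return None if best == NEG else best

# Two classes, capacity 5: best is (3,4) from the first plus (1,2) from the second
print(mckp([[(2, 3), (3, 4)], [(1, 2), (4, 6)]], 5))  # → 6
```

The inner loop over capacities is independent for each c once the previous dp row is fixed, which is what makes a GPU mapping of this recurrence natural: one thread per capacity value per class-stage.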
97

High-performance particle simulation using CUDA

Kalms, Mikael January 2015 (has links)
Over the past 15 years, modern PC graphics cards (GPUs) have changed from being pure graphics accelerators into parallel computing platforms. Several new parallel programming languages have emerged, including NVIDIA's parallel programming language for GPUs (CUDA). This report explores two related problems in parallel: how well-suited is CUDA for implementing algorithms that utilize non-trivial data structures? And how does one develop a complex algorithm that uses a CUDA system efficiently? A guide for how to implement complex algorithms in CUDA is presented. Simulation of a dense 2D particle system is chosen as the problem domain for algorithm optimization. Two algorithmic optimization strategies are presented which reduce the computational workload when simulating the particle system. The strategies can either be used independently, or combined for slightly improved results. Finally, the resulting implementations are benchmarked against a simpler implementation on a normal PC processor (CPU) as well as a simpler GPU algorithm. A simple GPU solution is shown to run at least 10 times faster than a simple CPU solution. An improved GPU solution can then yield another 10 times speed-up, while sacrificing some accuracy.
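One standard way to cut the O(n²) pair-interaction cost in a dense 2D particle system — a common workload-reduction technique, not necessarily one of the two strategies in this report — is uniform-grid spatial binning: with cell size equal to the interaction cutoff, each particle only needs to be tested against the 3×3 block of neighbouring cells. A CPU sketch in Python:

```python
from math import floor

def neighbor_pairs(points, cutoff):
    """Return all index pairs (i, j), i < j, whose points lie within `cutoff`
    of each other, using a uniform grid with cell size = cutoff so that only
    the 3x3 block of adjacent cells must be scanned per particle."""
    grid = {}
    for idx, (x, y) in enumerate(points):
        grid.setdefault((floor(x / cutoff), floor(y / cutoff)), []).append(idx)
    pairs = set()
    for (cx, cy), members in grid.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in grid.get((cx + dx, cy + dy), ()):
                    for i in members:
                        if i < j:  # each unordered pair is kept once
                            px, py = points[i]
                            qx, qy = points[j]
                            if (px - qx) ** 2 + (py - qy) ** 2 <= cutoff ** 2:
                                pairs.add((i, j))
    return pairs
```

The same binning idea is what makes GPU particle simulations efficient: sorting particles by cell index turns the neighbour search into coalesced reads over small contiguous lists instead of an all-pairs sweep.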
98

gcn.MOPS: accelerating cn.MOPS with GPU

Alkhamis, Mohammad 16 June 2017 (has links)
cn.MOPS is a model-based algorithm used to quantitatively detect copy-number variations in next-generation, DNA-sequencing data. The algorithm is implemented as an R package and can speed up processing with multi-CPU parallelism. However, the maximum achievable speedup is limited by the overhead of multi-CPU parallelism, which increases with the number of CPU cores used. In this thesis, an alternative mechanism of process acceleration is proposed. Using one CPU core and a GPU device, the proposed solution, gcn.MOPS, achieved a speedup factor of 159× and decreased memory usage by more than half. This speedup was substantially higher than the maximum achievable speedup in cn.MOPS, which was ∼20×. / Graduate
99

Parallel methods for classical and disordered spin models

Navarro Guerrero, Cristóbal Alejandro January 2015 (has links)
Doctor of Science, Computer Science / In recent decades, a growing body of work has sought efficient methods for describing the macroscopic behaviour of spin systems starting from a microscopic definition. The results obtained from these systems serve not only the physics community but also other areas such as molecular dynamics, social networks and optimization problems, among others. The fact that spin systems can explain phenomena in other areas has generated global interest in the topic. The problem, however, is that the computational cost of the methods involved becomes too high for practical purposes. It is therefore of great interest to study how parallel computing, combined with new algorithmic strategies, can improve the speed and efficiency of current methods. This thesis presents two contributions: (1) an exact distributed multi-core transfer-matrix algorithm, and (2) a multi-GPU Monte Carlo method for simulating the 3D Random Field Ising Model (RFIM). The first contribution takes advantage of the hierarchical relations found in the configuration space of the problem to group configurations into family trees that are solved in parallel. The second contribution extends the Exchange Monte Carlo method into a parallel multi-GPU algorithm that includes a temperature-adaptation phase to dynamically improve the quality of the simulation in the most difficult temperature zones. The results show that the new transfer-matrix algorithm reduces the configuration space from O(4^m) to O(3^m) and achieves a nearly linear fixed-size speedup with approximately 90% efficiency on the largest problems. For the multi-GPU Monte Carlo method, two levels of parallelism are proposed: local, which scales with faster GPUs, and global, which scales with multiple GPUs.
The method achieves a speedup of one to two orders of magnitude over a reference CPU implementation, and its parallelism scales with approximately 99% efficiency. The adaptive temperature-distribution strategy increases the exchange rate in the zones that were most compromised without raising it elsewhere, yielding a simulation that is both faster and of better quality than one using a uniform temperature distribution. These contributions have enabled new results for physics, such as the computation of the transfer matrix for the kagome lattice at m = 9 and the simulation of the 3D Random Field Ising Model at L = {32, 64}.
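The Exchange (replica-exchange) Monte Carlo step at the heart of the second contribution swaps configurations between neighbouring temperatures with the standard Metropolis criterion p = min(1, exp((β_a − β_b)(E_a − E_b))). A minimal generic sketch (the thesis's GPU kernels and adaptive temperature placement are not reproduced here):

```python
import math
import random

def swap_prob(beta_a, beta_b, e_a, e_b):
    """Acceptance probability for exchanging two replicas at inverse
    temperatures beta_a, beta_b with current energies e_a, e_b."""
    return min(1.0, math.exp((beta_a - beta_b) * (e_a - e_b)))

def exchange_step(betas, energies, states, rng=random):
    """One sweep of neighbour swaps: attempt to exchange each adjacent pair
    of replicas; on acceptance the configurations (and their energies)
    change temperature while the temperature ladder stays fixed."""
    accepted = 0
    for k in range(len(betas) - 1):
        if rng.random() < swap_prob(betas[k], betas[k + 1],
                                    energies[k], energies[k + 1]):
            states[k], states[k + 1] = states[k + 1], states[k]
            energies[k], energies[k + 1] = energies[k + 1], energies[k]
            accepted += 1
    return accepted
```

The adaptive phase described in the abstract would monitor the per-pair acceptance rate returned here and move temperatures closer together wherever exchanges become rare.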
100

Finding Community Structures In Social Activity Data

Peng, Chengbin 19 May 2015 (has links)
Social activity data sets are increasing in number and volume. Finding community structure in such data is valuable in many applications. For example, understanding the community structure of social networks may reduce the spread of epidemics or boost advertising revenue; discovering partitions in traffic networks can help to optimize routing and to reduce congestion; finding a group of users with common interests can allow a system to recommend useful items. Among many aspects, quality of inference and efficiency in finding community structures in such data sets are of paramount concern. In this thesis, we propose several approaches to improve community detection in these aspects. The first approach utilizes the concept of K-cores to reduce the size of the problem. The K-core of a graph is the largest subgraph within which each node has at least K connections. We propose a framework that accelerates community detection. It first applies a traditional algorithm that is relatively slow to the K-core, and then uses a fast heuristic to infer community labels for the remaining nodes. The second approach is to scale the algorithm to multi-processor systems. We devise a scalable community detection algorithm for large networks based on stochastic block models. It is an alternating iterative algorithm using a maximum likelihood approach. Compared with traditional inference algorithms for stochastic block models, our algorithm can scale to large networks and run on multi-processor systems. The time complexity is linear in the number of edges of the input network. The third approach is to improve the quality. We propose a framework for non-negative matrix factorization that allows the imposition of linear or approximately linear constraints on each factor. An example of the applications is to find community structures in bipartite networks, which is useful in recommender systems.
Our algorithms are compared with the results in recent papers, and their quality and efficiency are verified by experiments.
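The K-core reduction used in the first approach can be computed by iterative peeling: repeatedly delete any node with degree below K until every remaining node has at least K surviving neighbours. A small sketch (using a hypothetical adjacency-dict representation; the framework described above would then run the slower community algorithm only on the surviving core):

```python
def k_core(adj, k):
    """Return the node set of the k-core of an undirected graph given as
    {node: [neighbours]}: the largest subgraph in which every node keeps
    at least k neighbours. Runs in O(V + E) by peeling low-degree nodes."""
    degree = {u: len(vs) for u, vs in adj.items()}
    stack = [u for u, d in degree.items() if d < k]
    removed = set(stack)
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in removed:
                degree[v] -= 1
                if degree[v] < k:  # v just dropped below k: peel it too
                    removed.add(v)
                    stack.append(v)
    return set(adj) - removed

# Triangle 1-2-3 with pendant node 4: the 2-core is the triangle
print(sorted(k_core({1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}, 2)))  # → [1, 2, 3]
```

Because peeling only ever shrinks the graph, nodes outside the core can afterwards be labelled by the fast heuristic without revisiting the expensive algorithm, which is the source of the acceleration claimed above.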
