71

PERFORMANCE EVALUATION OF MEMORY AND COMPUTATIONALLY BOUND CHEMISTRY APPLICATIONS ON STREAMING GPGPUS AND MULTI-CORE X86 CPUS

Weber III, Frederick E 01 May 2010 (has links)
In recent years, multi-core processors have come to dominate the field in desktop and high performance computing. Graphics processors, traditionally used in CAD, video games, and other 3D applications, have become more programmable and are now suitable for general purpose computing. This thesis explores the performance and limitations of multi-core processors and GPUs in two computational chemistry applications: a memory-bound component of ab-initio modeling and a computationally bound Monte Carlo simulation. For the applications presented in this thesis, multiple processors are exploited using a variety of tools and languages, including OpenMP and MKL. The Brook+ and Compute Abstraction Layer streaming environments are used to accelerate applications on AMD GPUs. This thesis gives qualitative assessments of these languages and tools regarding ease of use and optimization, in addition to quantitative analyses of performance. GPUs can yield modest performance improvements with little effort in some applications, and even larger speedups with simple optimizations.
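As a rough illustration of the shared-memory side of such work (the thesis itself uses OpenMP and MKL alongside Brook+/CAL; this sketch is not taken from it), a minimal OpenMP loop in C looks like the following, with illustrative array names and sizes:

```c
#include <stdio.h>
#include <omp.h>

#define N 1000000

/* Minimal OpenMP sketch: parallelize an element-wise update across
 * CPU cores, the kind of loop-level parallelism exploited via OpenMP.
 * Array contents and sizes are illustrative only. */
int main(void) {
    static double a[N], b[N];

    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        a[i] = 2.0 * b[i] + 1.0;   /* independent iterations -> safe to split */
    }

    printf("a[0] = %f (ran with up to %d threads)\n", a[0], omp_get_max_threads());
    return 0;
}
```

Compiled with an OpenMP-enabled compiler (e.g. `gcc -fopenmp`), the pragma splits the loop across all available cores with no other code changes.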
72

Performance Analysis of kNN on large datasets using CUDA & Pthreads : Comparing between CPU & GPU

Kankatala, Sriram January 2015 (has links)
Several organizations have large databases which grow rapidly and need to be maintained regularly. Content-based searches are similarity searches based on features extracted from multimedia data. For applications like multimedia content retrieval, data mining, and pattern recognition, performing the nearest neighbor search on multidimensional data is a challenging task. The important factors in a kNN nearest neighbor search are searching speed and accuracy. Implementing kNN on the GPU has been an active research topic for the last few years, focused on improving kNN performance. Considering these aspects, we identified a gap in this research area. This master thesis demonstrates effective and efficient parallelism on multi-core CPUs and the GPU, comparing their performance with a single-core CPU. It presents an experimental implementation of kNN on a single-core CPU, a multi-core CPU, and a GPU using C, Pthreads, and CUDA respectively. We considered different levels of inputs (size, dimensions) to evaluate performance. The experiments show that for kNN the GPU outperforms the single-core CPU by a factor of approximately 5.8 to 16, and the multi-core CPU by a factor of approximately 1.2 to 3, across the different levels of inputs.
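As a hedged sketch of the Pthreads approach compared in this thesis (the structure and names below are illustrative, not the author's code), the distance stage of kNN parallelizes naturally by partitioning the reference points across threads; selecting the k smallest distances afterwards is omitted:

```c
#include <pthread.h>
#include <math.h>

#define NUM_THREADS 4

/* Illustrative kNN building block: each thread computes Euclidean
 * distances from one query point to its slice of the reference set. */
typedef struct {
    const float *query;   /* one query point, dim values       */
    const float *refs;    /* reference points, n * dim values  */
    float       *dists;   /* output distances, n values        */
    int          dim, start, end;
} slice_t;

static void *dist_worker(void *arg) {
    slice_t *s = (slice_t *)arg;
    for (int i = s->start; i < s->end; i++) {
        float acc = 0.0f;
        for (int d = 0; d < s->dim; d++) {
            float diff = s->refs[i * s->dim + d] - s->query[d];
            acc += diff * diff;
        }
        s->dists[i] = sqrtf(acc);
    }
    return NULL;
}

void knn_distances(const float *query, const float *refs,
                   float *dists, int n, int dim) {
    pthread_t tid[NUM_THREADS];
    slice_t   sl[NUM_THREADS];
    int chunk = (n + NUM_THREADS - 1) / NUM_THREADS;
    for (int t = 0; t < NUM_THREADS; t++) {
        int start = t * chunk;
        int end   = (start + chunk < n) ? start + chunk : n;
        sl[t] = (slice_t){ query, refs, dists, dim, start, end };
        pthread_create(&tid[t], NULL, dist_worker, &sl[t]);
    }
    for (int t = 0; t < NUM_THREADS; t++)
        pthread_join(tid[t], NULL);
}
```

The CUDA variant follows the same decomposition, with one GPU thread per reference point instead of one POSIX thread per slice.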
73

Parallel Computing in Statistical-Validation of Clustering Algorithm for the Analysis of High throughput Data

Atlas, Mourad 12 May 2005 (has links)
Currently, clustering applications use classical methods to partition a set of data (or objects) into a set of meaningful sub-classes, called clusters. A cluster is a collection of objects which are “similar” among themselves, and thus can be treated collectively as one group, and “dissimilar” to the objects belonging to other clusters. However, clustering presents a number of problems. Among them, as mentioned in [Datta03], dealing with a large number of dimensions and a large number of data items can be problematic because of computational time. In this thesis, we investigate all the clustering algorithms used in [Datta03] and present a parallel solution to minimize the computational time. We apply parallel programming techniques to the statistical algorithms as a natural extension of the sequential implementation in R. The proposed parallel model has been tested on a high-throughput dataset: microarray data on the transcriptional profile during sporulation in budding yeast, containing more than 6,000 genes. Our evaluation covers the scalability of the clustering algorithms on datasets with varying dimensions, the speedup factor, and the efficiency of the parallel model over the sequential implementation. Our experiments show that the gene expression data follow the pattern predicted in [Datta03]: Diana appears to be a solid performer, and the group means for each cluster coincide with those in [Datta03]. We show that our parallel model is applicable to the clustering algorithms and especially useful in applications that deal with high-throughput data, such as gene expression data.
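The speedup factor and efficiency evaluated here are presumably the standard parallel metrics; for p processors, with sequential running time T_1 and parallel running time T_p:

```latex
S(p) = \frac{T_1}{T_p}, \qquad E(p) = \frac{S(p)}{p} = \frac{T_1}{p\,T_p}
```

An efficiency near 1 indicates the parallel model makes nearly full use of the added processors.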
74

Sprendimų priėmimas lygiagrečiuose skaičiavimuose / Decision making in parallel computing

Filatovas, Ernestas 14 June 2006 (has links)
This work describes the principles of parallel computing and the MPI package used here, how they can be adapted to solve tasks over computer networks, and the peculiarities of deploying this package. It clarifies the specific character of multiple-criteria optimization tasks and ways of solving them using computer networks. Decision theory and the basic methods of decision making are also described.
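As a minimal sketch of the MPI pattern such work builds on (the scoring function below is a hypothetical stand-in, not from the thesis), candidate decisions can be evaluated in parallel and the best score reduced to one process:

```c
#include <stdio.h>
#include <mpi.h>

/* Hypothetical stand-in for scoring one candidate decision against
 * the criteria; a real multiple-criteria evaluation would replace it. */
static double evaluate_candidate(int id) {
    return 1.0 / (1.0 + id);
}

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each process scores its own candidate ... */
    double local = evaluate_candidate(rank);

    /* ... and the best score is combined on rank 0. */
    double best;
    MPI_Reduce(&local, &best, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("best score over %d processes: %f\n", size, best);
    MPI_Finalize();
    return 0;
}
```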
75

A Case Study of A Multithreaded Buchberger Normal Form Algorithm

Linfoot, Andy James January 2006 (has links)
Groebner bases have many applications in mathematics, science, and engineering. This dissertation deals with the algorithmic aspects of computing these bases. It begins with a brief introduction of fundamental concepts about Groebner bases, followed by a discussion of various implementation issues. Much of the practical difficulty of using Groebner basis algorithms and techniques stems from their high computational complexity. It is shown that the algorithmic complexity of computing a Groebner basis stems primarily from the calculation of normal forms; this is established by studying run profiles of various computations. This leads to two options for making Groebner basis techniques more practical: reduce the complexity by developing new algorithms (heuristics), or reduce the running time of normal form calculations by introducing concurrency. The latter approach is taken in the remainder of the dissertation, where a multithreaded normal form algorithm is presented and discussed. It is shown with a simple example that the new algorithm demonstrates speedup and scalability. The algorithm also has the advantage of being independent of the completion strategy. We conclude with an outline of future research involving the new algorithm.
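The dissertation's own multithreaded algorithm is not reproduced here; as a generic sketch of the concurrency pattern it suggests, assuming each normal-form reduction can be treated as an independent task, a pool of threads can claim tasks from a shared counter:

```c
#include <pthread.h>
#include <stdio.h>

#define NUM_TASKS   64
#define NUM_THREADS 4

/* Generic worker-pool pattern: threads atomically claim the next task
 * index until none remain. In the dissertation's setting a "task"
 * would be one normal-form reduction; here it is a stub. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int next_task = 0;

static void reduce_to_normal_form(int task) {
    (void)task;                     /* placeholder for a polynomial reduction */
}

static void *worker(void *arg) {
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&lock);
        int t = (next_task < NUM_TASKS) ? next_task++ : -1;
        pthread_mutex_unlock(&lock);
        if (t < 0) break;           /* queue drained */
        reduce_to_normal_form(t);
    }
    return NULL;
}

int main(void) {
    pthread_t tid[NUM_THREADS];
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_create(&tid[i], NULL, worker, NULL);
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(tid[i], NULL);
    printf("processed %d reduction tasks\n", next_task);
    return 0;
}
```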
76

Shared memory abstraction: new approach under high concurrency conditions / Αφαίρεση κοινής μνήμης: νέα προσέγγιση υπό συνθήκες υψηλής συγχρονικότητας

Καραντάσης, Κωνσταντίνος 15 May 2012 (has links)
In this dissertation, an implementation of shared memory abstraction on top of contemporary multi-core and many-core clusters is presented. The results of the research effort are mainly embodied in the cluster middleware platform Pleiad, a Java-based prototype that incorporates best practices from the field of distributed shared memory systems. Briefly, the main results and contributions of the dissertation are:
• The presented middleware, Pleiad, is characterized by a highly modular design. Moreover, in contrast to most related efforts, which are usually bound to a specific implementation of consistency, Pleiad has the infrastructure to incorporate multiple implementations of a given mechanism and can even interchange such implementations at runtime.
• Reference implementations are offered for the relaxed consistency models of Lazy Release Consistency (LRC) and Scope Consistency (ScC). Pleiad is the first Java-based middleware to incorporate implementations of both protocols.
• The dissertation presents one of the few evaluations on a cluster equipped with low-power processors (Intel Atom), which can be regarded as a characteristic case of embedded-oriented multi-core clusters.
• One of the first implementations of shared memory abstraction on top of GPU clusters is presented. Shared memory abstraction is evaluated under two schemes. In the first, shared memory programming on GPU clusters is achieved through a hybrid combination of the first commercial implementation of OpenMP for clusters, Intel Cluster OpenMP, and the CUDA platform; this is the first evaluation of OpenMP and CUDA in the context of GPU clusters. The second scheme enhances Pleiad to support GPU clusters, making it one of the few unified implementations of a shared memory abstraction programming environment that extends to GPU clusters.
• At the moment there are no established, widely used benchmarks or application codes that utilize multiple GPUs, either on a cluster or on a single node. Thus, among the thesis contributions is the evaluation of shared memory abstraction with real application codes, since the few related systems have either used simple kernels or been evaluated on a single node.
• Specifically, applications from two characteristic domains, computational fluid dynamics (CFD) and data clustering, have been implemented and evaluated using GPU clusters and single GPUs. In the first case, a computationally intensive CFD code that operates on structured grids has been accelerated on a GPU cluster, while a simulation that manipulates an unstructured grid has been accelerated on a single GPU and demonstrates its potential for GPU cluster acceleration. Accordingly, a partitional data clustering algorithm is accelerated using shared memory abstraction on GPU clusters, and a preliminary implementation of a hierarchical data clustering algorithm on GPUs is described.
77

Performance Evaluation of Node.js on Multi-core Computing Systems

Azmat, Janty January 2018 (has links)
Since JavaScript code executed by the Node.js run-time environment runs in a single thread, without really utilizing the full power of multi-core systems, fairly new approaches attempt to address this limitation. Some of these approaches are considered well tested and are widely used at the time of writing. The objectives of this study are to determine which of these approaches achieves better scalability with respect to the number of handled requests, and to what extent these approaches utilize multi-core power compared to the raw Node.js environment with normal CPU scheduling.
78

A novel approach to reduce the computation time for CFD : hybrid LES-RANS modelling on parallel computers

Turnbull, Julian January 2003 (has links)
Large Eddy Simulation (LES) is a method of obtaining high-accuracy computational results for modelling fluid flow. Unfortunately, it is computationally expensive, limiting it to users of large parallel machines. However, the use of LES may lead to an over-resolution of the problem, because the bulk of the computational domain could be adequately modelled using the Reynolds-averaged approach. A study has been undertaken to assess the feasibility, both in accuracy and computational efficiency, of using a parallel computer to solve both LES and RANS turbulence models on the same domain, for the problem of flow over a circular cylinder at Reynolds number 3,900. To do this, the domain has been created and then divided into two sub-domains, one for the LES model and one for the kappa-epsilon turbulence model. The hybrid model has been developed specifically for a parallel computing environment, and the user is able to allocate modelling techniques to processors in a way which enables expansion of the model to any number of processors. Computational experimentation has shown that the Smagorinsky model can be used to capture the vortex shedding from the cylinder, and that this information can be passed successfully to the kappa-epsilon model for the dissipation of the vortices further downstream. The results have been compared to high-accuracy LES results and with both kappa-epsilon and Smagorinsky LES computations on the same domain. The hybrid models developed compare well with the Smagorinsky model, capturing the vortex shedding with the correct periodicity. Suggestions have been made for future work to develop this idea further, and to investigate the possibility of using the technology for modelling mixing and fast chemical reactions, based on the more accurate prediction of the turbulence levels in the LES sub-domain.
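For reference, the Smagorinsky model mentioned above closes the LES equations with an eddy viscosity computed from the resolved strain rate; in its standard form (the thesis may use a variant):

```latex
\nu_t = (C_s \Delta)^2 \, |\bar{S}|, \qquad
|\bar{S}| = \sqrt{2\,\bar{S}_{ij}\bar{S}_{ij}}, \qquad
\bar{S}_{ij} = \tfrac{1}{2}\left(\frac{\partial \bar{u}_i}{\partial x_j} + \frac{\partial \bar{u}_j}{\partial x_i}\right)
```

where C_s is the Smagorinsky constant and Delta the filter width tied to the local grid spacing.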
79

Parallelisation of micromagnetic simulations

Nagy, Lesleis January 2016 (has links)
The field of paleomagnetism attempts to understand in detail the processes of the Earth by studying naturally occurring magnetic samples. These samples are quite unlike those fabricated in the laboratory: they have irregular shapes; they have been squeezed and stretched, heated and cooled, and subjected to oxidation. However, micromagnetic modelling allows us to simulate such samples and gain some understanding of how a paleomagnetic signal is acquired and how it is retained. Micromagnetics provides a theory for understanding how the domain structure of a magnetic sample alters subject to what it is made from and the environment it is in. It furnishes the mathematics that describe the energy of a given domain structure and how that domain structure evolves in time. Combining micromagnetics with ever increasing computer power, it has been possible to produce simulations of small to medium size grains within the so-called single to pseudo-single domain state range. However, processors are no longer built with increasing speed but with increasing parallelism, and it is this that must be exploited to model larger and larger paleomagnetic samples. The purpose of the work presented here is twofold. Firstly, a micromagnetics code that is parallel and scalable is presented. This code is based on FEniCS, an existing finite element framework, and is shown to run on ARCHER, the UK's national supercomputing service. The strategy of using existing libraries and frameworks allows future extension and the inclusion of new science in the code base. In order to achieve scalability, a spatial mapping technique is used to calculate the demagnetising field - the most computationally intensive part of micromagnetic calculations. This allows grain geometries to be partitioned in such a way that no global communication is required between parallel processes - the source of the favourable scaling behaviour. The second part of the thesis presents an exploration of domain state evolution in increasing sizes of magnetite grains. This simulation, whilst a first approximation that excludes magneto-elastic effects, is the first attempt to map out the transition from pseudo-single domain states to multi-domain states using a full micromagnetic simulation.
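For context, the time evolution such micromagnetic codes integrate is conventionally governed by the Landau-Lifshitz-Gilbert equation (standard form shown here; the thesis's exact formulation is not reproduced):

```latex
\frac{\partial \mathbf{m}}{\partial t} = -\gamma\, \mathbf{m} \times \mathbf{H}_{\text{eff}} + \alpha\, \mathbf{m} \times \frac{\partial \mathbf{m}}{\partial t}
```

where m is the unit magnetisation, H_eff the effective field (which includes the demagnetising field mentioned above), gamma the gyromagnetic ratio, and alpha the damping constant.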
80

Paralelização da ferramenta de alinhamento de sequências MUSCLE para um ambiente distribuído / Parallelization of the MUSCLE sequence alignment tool for a distributed environment

Marucci, Evandro Augusto. January 2009 (has links)
Advisor: José Márcio Machado / Committee: Liria Matsumoto Sato / Committee: Aleardo Manacero Junior / Abstract: Due to the increasing amount of genetic data for comparison, parallel computing is becoming increasingly necessary to perform one of the most important operations in bioinformatics, multiple sequence alignment. Nowadays, many software tools are used to solve sequence alignments, and the use of parallel computing is becoming more and more widespread. However, although different parallel algorithms have been developed to support genetic research, many of them do not consider fundamental aspects of parallel computing. MUSCLE [1] is a tool that performs multiple sequence alignments with good computational performance and significantly precise biological results [2]. Although the methods it uses have different parallel versions proposed in the literature, only one parallel version of the MUSCLE tool has been proposed [3]. That version, however, was developed for shared memory systems. The development of a parallel MUSCLE tool for distributed systems is important given the wide use of such systems in genomic research laboratories. This parallelization is the aim of this work, and it was carried out by using existing parallel approaches and creating new ones. Consequently, different parallel strategies have been proposed. These strategies can be incorporated into other alignment tools that use, at a given stage, the same sequential approach. In each parallelized method, we considered mainly the efficiency, the scalability, and the ability to address real biological problems. The tests show that, for each parallel step, at least one defined strategy meets all these criteria well. In addition to the novel MUSCLE parallelization, enabling the tool to execute on distributed systems, the results show that the defined strategies perform better than the existing ones. / Master
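As a hedged illustration of the kind of distributed-memory strategy such a parallelization involves (MUSCLE's progressive alignment begins from pairwise distances; the function names below are illustrative, not from this work), the independent pairwise computations can be spread across MPI ranks:

```c
#include <stdio.h>
#include <mpi.h>

#define NSEQ   128                       /* illustrative sequence count */
#define NPAIRS (NSEQ * (NSEQ - 1) / 2)

/* Hypothetical stand-in: compute the distance between sequences i and j. */
static double pair_distance(int i, int j) {
    return (double)(i + j);              /* placeholder */
}

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    static double local[NPAIRS] = {0}, dist[NPAIRS];

    /* Cyclic distribution: rank r takes pairs r, r+size, r+2*size, ... */
    int p = 0;
    for (int i = 0; i < NSEQ; i++)
        for (int j = i + 1; j < NSEQ; j++, p++)
            if (p % size == rank)
                local[p] = pair_distance(i, j);

    /* Each pair was computed by exactly one rank (zero elsewhere),
     * so a sum-reduction assembles the full matrix on every rank. */
    MPI_Allreduce(local, dist, NPAIRS, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("distance matrix of %d pairs assembled\n", NPAIRS);
    MPI_Finalize();
    return 0;
}
```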
