81 |
PERFORMANCE EVALUATION OF MEMORY AND COMPUTATIONALLY BOUND CHEMISTRY APPLICATIONS ON STREAMING GPGPUS AND MULTI-CORE X86 CPUS
Weber III, Frederick E, 01 May 2010
In recent years, multi-core processors have come to dominate desktop and high-performance computing. Graphics processors, traditionally used in CAD, video games, and other 3-D applications, have become more programmable and are now suitable for general-purpose computing. This thesis explores the performance and limitations of multi-core processors and GPUs in two computational chemistry applications: a memory-bound component of ab initio modeling and a computationally bound Monte Carlo simulation. For these applications, multiple processors are exploited using a variety of tools and languages, including OpenMP and MKL. The Brook+ and Compute Abstraction Layer (CAL) streaming environments are used to accelerate the applications on AMD GPUs. In addition to quantitative analyses of performance, the thesis gives qualitative assessments of these languages and tools regarding ease of use and optimization. GPUs can yield modest performance improvements with little effort in some applications, and even larger speedups with simple optimizations.
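The distinction between memory-bound and compute-bound kernels drawn in this abstract can be made concrete with the roofline model. The abstract does not mention the roofline model; it is used here only as an illustration. The sketch assumes double-precision data and hypothetical hardware figures, and `attainable_gflops`, `classify`, and `DAXPY_INTENSITY` are illustrative names, not part of any tool used in the thesis.

```python
def attainable_gflops(peak_gflops: float, bandwidth_gbs: float, intensity: float) -> float:
    """Roofline bound: performance is capped either by peak compute or by
    memory bandwidth times arithmetic intensity (FLOPs per byte moved)."""
    return min(peak_gflops, intensity * bandwidth_gbs)


def classify(peak_gflops: float, bandwidth_gbs: float, intensity: float) -> str:
    """Label a kernel by comparing its intensity with the ridge point,
    where the bandwidth ceiling meets the compute ceiling."""
    ridge = peak_gflops / bandwidth_gbs  # FLOPs per byte at the ridge point
    return "memory-bound" if intensity < ridge else "compute-bound"


# A DAXPY-like update y[i] = a * x[i] + y[i] in double precision performs
# 2 FLOPs per element while moving 24 bytes (load x, load y, store y).
DAXPY_INTENSITY = 2 / 24

if __name__ == "__main__":
    # Hypothetical device: 1000 GFLOP/s peak, 100 GB/s memory bandwidth.
    print(classify(1000.0, 100.0, DAXPY_INTENSITY))
    print(round(attainable_gflops(1000.0, 100.0, DAXPY_INTENSITY), 2))
```

With these hypothetical figures the ridge point is 10 FLOP/byte. The DAXPY-like kernel, at about 0.083 FLOP/byte, is therefore memory-bound and capped near 8.3 GFLOP/s, far below peak. A kernel whose intensity lies above the ridge point can instead approach peak compute.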
|
82 |
Performance Analysis of kNN on large datasets using CUDA & Pthreads: Comparing between CPU & GPU
Kankatala, Sriram, January 2015
Several organizations have large databases that grow rapidly and need regular maintenance. Content-based searches are similarity searches based on features extracted from multimedia data. In applications such as multimedia content retrieval, data mining, and pattern recognition, nearest neighbor search in multidimensional data is a challenging task, and the key factors in k-nearest-neighbor (kNN) search are speed and accuracy. Implementing kNN on GPUs has been an active research topic in recent years, with a focus on improving performance. With these aspects in mind, this research identified a gap in the area. This master's thesis demonstrates effective and efficient parallelism on multi-core CPUs and GPUs, and compares their performance with that of a single-core CPU. It presents experimental implementations of kNN on a single-core CPU, a multi-core CPU, and a GPU, using C, Pthreads, and CUDA respectively. Different levels of input (size and dimensionality) were used to evaluate performance. The experiments show that the GPU outperforms the single-core CPU by a factor of approximately 5.8 to 16, and the multi-core CPU by a factor of approximately 1.2 to 3, depending on the level of input.
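The multi-core strategy can be sketched as follows: split the reference set across threads, let each thread find its local k nearest neighbors, then merge. This is a minimal Python illustration, not the thesis's C/Pthreads code. It assumes brute-force search and squared Euclidean distance, and uses `ThreadPoolExecutor` as a stand-in for Pthreads. Because of CPython's global interpreter lock, it shows the decomposition rather than a real speedup. `parallel_knn` and `_local_knn` are hypothetical names.

```python
import heapq
import math
from concurrent.futures import ThreadPoolExecutor


def _local_knn(data, start, stop, query, k):
    """Return the k best (squared distance, index) pairs within data[start:stop]."""
    # Squared Euclidean distance preserves neighbor ordering, so sqrt is skipped.
    candidates = (
        (sum((a - b) ** 2 for a, b in zip(data[i], query)), i)
        for i in range(start, stop)
    )
    return heapq.nsmallest(k, candidates)


def parallel_knn(data, query, k, workers=4):
    """Brute-force kNN: each worker scans one contiguous block, then the
    per-block top-k lists are merged into a global top-k."""
    n = len(data)
    block = max(1, math.ceil(n / workers))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(_local_knn, data, s, min(s + block, n), query, k)
            for s in range(0, n, block)
        ]
        partial = [pair for f in futures for pair in f.result()]
    # Ties on distance are broken by index, so the result is deterministic.
    return [i for _, i in heapq.nsmallest(k, partial)]
```

The merge is correct because every global nearest neighbor must also be among the k nearest in its own block. Each thread therefore only needs to return k candidates rather than its whole block.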
|
83 |
Parallel Computing in Statistical-Validation of Clustering Algorithm for the Analysis of High throughput Data
Atlas, Mourad, 12 May 2005
Currently, clustering applications use classical methods to partition a set of data (or objects) into meaningful subclasses, called clusters. A cluster is therefore a collection of objects that are “similar” to one another, and can thus be treated collectively as one group, and “dissimilar” to the objects belonging to other clusters. However, clustering poses a number of problems. Among them, as mentioned in [Datta03], handling a large number of dimensions and a large number of data items can be problematic because of the computational time required. In this thesis, we investigate all the clustering algorithms used in [Datta03] and present a parallel solution that minimizes computational time. We apply parallel programming techniques to the statistical algorithms in R, as a natural extension of sequential programming. The proposed parallel model has been tested on a high-throughput dataset: microarray data on the transcriptional profile of budding yeast during sporulation, containing more than 6,000 genes. Our evaluation covers the scalability of the clustering algorithms on datasets of varying dimensions, the speedup factor, and the efficiency of the parallel model relative to the sequential implementation. Our experiments show that the gene expression data follow the pattern found in [Datta03]: DIANA appears to be a solid performer, and the group means of each cluster coincide with those in [Datta03]. We show that our parallel model is applicable to these clustering algorithms and is especially useful in applications that deal with high-throughput data, such as gene expression data.
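Statistical validation is expensive because the clustering is rerun many times on perturbed data, and those reruns are independent of one another. The sketch below illustrates that structure under several assumptions. It uses a leave-one-column-out scheme and a plain k-means with deterministic first-k-rows initialization as a stand-in for the algorithms actually compared. The score is APN-style (average proportion of non-overlap), and Python threads stand in for the thesis's parallel R setup. `kmeans`, `drop_column`, and `apn` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def kmeans(rows, k, iters=20):
    """Plain Lloyd's k-means; initial centers are the first k rows (an assumption)."""
    centers = [list(r) for r in rows[:k]]
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(r, centers[c])))
            for r in rows
        ]
        # Update step: move each non-empty cluster's center to its members' mean.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def drop_column(rows, j):
    """Return a copy of the data with column j removed."""
    return [tuple(r[:j]) + tuple(r[j + 1:]) for r in rows]


def apn(rows, k, workers=4):
    """APN-style stability: average fraction of each object's full-data cluster
    that no longer shares its cluster after one column is deleted."""
    full = kmeans(rows, k)
    n_rows, n_cols = len(rows), len(rows[0])
    # The reruns with one column removed are independent, so they run in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        reduced = list(pool.map(lambda j: kmeans(drop_column(rows, j), k), range(n_cols)))
    total = 0.0
    for labels in reduced:
        for i in range(n_rows):
            same_full = {g for g in range(n_rows) if full[g] == full[i]}
            same_reduced = {g for g in range(n_rows) if labels[g] == labels[i]}
            total += 1 - len(same_full & same_reduced) / len(same_full)
    return total / (n_rows * n_cols)
```

A score of 0 means every cluster survived the deletion of every column unchanged. Because only set overlaps are compared, the score does not depend on how cluster labels happen to be numbered in each run.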
|
84 |
Sprendimų priėmimas lygiagrečiuose skaičiavimuose / Decisions making in parallel computing
Filatovas, Ernestas, 14 June 2006
This work describes the principles of parallel computing and the MPI software package used in it. It shows how they can be adapted to solve tasks on computer networks and discusses the peculiarities of implementing this package. The specific nature of multiple-criteria optimization tasks, and ways of solving them on computer networks, are clarified. Decision theory and the basic decision-making methods are also described.
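A common way to distribute a multiple-criteria optimization task is to have each process keep only the non-dominated (Pareto-optimal) candidates from its share, then merge and filter once more. The sketch below shows that scatter/filter/gather pattern with Python threads standing in for MPI processes. It is not the method described in the thesis. The assumption that every criterion is minimized, and the names `dominates`, `pareto_front`, and `parallel_pareto`, are illustrative only.

```python
from concurrent.futures import ThreadPoolExecutor


def dominates(a, b):
    """a dominates b if it is no worse on every criterion and strictly better
    on at least one (all criteria are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Keep the points that no other point dominates (O(n^2) filter)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def parallel_pareto(points, workers=4):
    """Scatter the candidates, filter each share locally, gather the local
    fronts, and filter again to obtain the global front."""
    chunk = max(1, -(-len(points) // workers))  # ceiling division
    shares = [points[i:i + chunk] for i in range(0, len(points), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        local_fronts = list(pool.map(pareto_front, shares))
    return pareto_front([p for front in local_fronts for p in front])
```

The final filter is sufficient because dominance is transitive. Any point that survives locally but is globally dominated is also dominated by some point that survived in its own share.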
|
85 |
A Case Study of A Multithreaded Buchberger Normal Form Algorithm
Linfoot, Andy James, January 2006
Groebner bases have many applications in mathematics, science, and engineering. This dissertation deals with the algorithmic aspects of computing these bases. It begins with a brief introduction to fundamental concepts of Groebner bases, followed by a discussion of various implementation issues. Much of the practical difficulty of using Groebner basis algorithms and techniques stems from their high computational complexity. By studying run profiles of various computations, it is shown that the cost of computing a Groebner basis stems primarily from the calculation of normal forms. This suggests two ways of making Groebner basis techniques more practical. The first is to reduce the complexity by developing new algorithms (heuristics). The second is to reduce the running time of normal form calculations by introducing concurrency. The latter approach is taken in the remainder of the dissertation, where a multithreaded normal form algorithm is presented and discussed. A simple example shows that the new algorithm achieves speedup and scales. The algorithm also has the advantage of being independent of the completion strategy. We conclude with an outline of future research involving the new algorithm.
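The normal-form (reduction) step that dominates Groebner basis computation can be sketched as multivariate polynomial division. The Python below is an illustration under several assumptions. Polynomials are dicts mapping exponent tuples to exact `Fraction` coefficients, and the monomial order is lexicographic, obtained from Python's tuple comparison. Concurrency is applied across independent normal-form computations, for example several S-polynomials reduced at once. That is not necessarily how the thesis's multithreaded algorithm distributes work. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from fractions import Fraction


def leading(p):
    """Leading monomial and coefficient under lex order (Python tuple comparison)."""
    m = max(p)
    return m, p[m]


def divides(a, b):
    """True if monomial a divides monomial b."""
    return all(x <= y for x, y in zip(a, b))


def normal_form(f, basis):
    """Fully reduce f modulo the polynomials in basis and return the remainder."""
    p = {m: Fraction(c) for m, c in f.items() if c}
    remainder = {}
    while p:
        m, c = leading(p)
        for g in basis:
            gm, gc = leading(g)
            if divides(gm, m):
                # Cancel the leading term: p -= (c / gc) * x^(m - gm) * g
                shift = tuple(x - y for x, y in zip(m, gm))
                factor = c / gc
                for mon, coef in g.items():
                    t = tuple(x + y for x, y in zip(mon, shift))
                    v = p.get(t, 0) - factor * coef
                    if v:
                        p[t] = v
                    else:
                        p.pop(t, None)
                break
        else:
            # No basis leading term divides m: move this term to the remainder.
            remainder[m] = p.pop(m)
    return remainder


def normal_forms(polys, basis, workers=4):
    """Reduce several independent polynomials concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda f: normal_form(f, basis), polys))
```

For example, with variables (x, y) ordered x > y and the basis {x - y}, represented as `{(1, 0): 1, (0, 1): -1}`, the polynomial x^2 reduces to y^2.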
|
86 |
Local independence in computed tomography as a basis for parallel computing
Martin, Daniel Morris, 14 September 2007
Iterative CT reconstruction algorithms are superior to the standard convolution backprojection (CBP) methods when reconstructing from a small number of views (and hence with less radiation), but they are computationally costly. To reduce execution time, this work implements and tests a parallel approach to iterative algorithms on a cluster of workstations, a low-cost system found in many offices and non-academic sites. A previous implementation showed little speedup because of the significant cost of inter-processor communication. In this thesis, several data partitioning methods are examined, including image tiling methods that exploit the spatial locality demonstrated by local CT. With these methods, computation can proceed locally, without inter-processor communication in every iteration. A relative speedup of up to 17 times is obtained on 25 processors. This demonstrates that computationally intensive CT reconstruction algorithms can achieve good performance on distributed-memory hardware.
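Image tiling gives each processor a rectangular block of the reconstruction grid. Each block can optionally be padded by a halo of neighboring pixels so that local updates need little communication. The sketch below is a hypothetical helper, not the thesis's code; the halo width and grid shape are assumptions. It also computes parallel efficiency, which for the reported speedup of 17 on 25 processors is 17/25 = 0.68.

```python
def _cuts(n, parts):
    """Boundaries that split range(n) into `parts` nearly equal pieces."""
    return [n * i // parts for i in range(parts + 1)]


def tile_bounds(n_rows, n_cols, grid_rows, grid_cols, halo=0):
    """Return (row_start, row_stop, col_start, col_stop) for each tile of a
    grid_rows x grid_cols decomposition, widened by `halo` pixels on each
    side and clipped to the image."""
    rs, cs = _cuts(n_rows, grid_rows), _cuts(n_cols, grid_cols)
    tiles = []
    for i in range(grid_rows):
        for j in range(grid_cols):
            tiles.append((
                max(0, rs[i] - halo), min(n_rows, rs[i + 1] + halo),
                max(0, cs[j] - halo), min(n_cols, cs[j + 1] + halo),
            ))
    return tiles


def efficiency(speedup, processors):
    """Parallel efficiency: achieved speedup divided by processor count."""
    return speedup / processors
```

With `halo=0` the tiles partition the image exactly, one per processor on a 5 x 5 grid of 25 processors. A positive halo lets each tile see a border of neighboring pixels at the cost of some redundant work.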
|
88 |
Shared memory abstraction: new approach under high concurrency conditions / Αφαίρεση κοινής μνήμης: νέα προσέγγιση υπό συνθήκες υψηλής συγχρονικότητας
Καραντάσης, Κωνσταντίνος, 15 May 2012
This dissertation implements shared memory abstraction on top of contemporary multi-core and many-core clusters. The results of this research are embodied mainly in the cluster middleware platform Pleiad, a Java-based prototype that incorporates best practices from the field of distributed shared memory systems along with some original characteristics. The main results and contributions of the dissertation are briefly reviewed below:
• The presented middleware, Pleiad, has a highly modular design. Moreover, in contrast to most related efforts, which are usually bound to a specific implementation of consistency, Pleiad has the infrastructure to incorporate multiple implementations of a given mechanism and can even interchange them at runtime.
• Reference implementations are offered for two relaxed consistency models, Lazy Release Consistency (LRC) and Scope Consistency (ScC). Pleiad is the first Java-based middleware to incorporate implementations of both protocols.
• The dissertation presents one of the few evaluations on a cluster equipped with low-power processors (Intel Atom), which can be considered a characteristic case of embedded-oriented multi-core clusters.
• The dissertation presents one of the first implementations of shared memory abstraction on top of GPU clusters. Shared memory abstraction is evaluated under two schemes. In the first, shared memory programming on GPU clusters is achieved through a hybrid combination of the CUDA platform and Intel Cluster OpenMP, the first commercial implementation of OpenMP for clusters. This is the first evaluation of OpenMP and CUDA in the context of GPU clusters. The second scheme extends Pleiad to support GPU clusters. This implementation is one of the few unified implementations of a shared memory abstraction programming environment that
• There are currently no established, widely used benchmarks or application codes that utilize multiple GPUs, either on a cluster or on a single node. Thus, one of the thesis contributions is the evaluation of shared memory abstraction with real application codes, since the few related systems have either used simple kernels or been evaluated on a single node.
• Specifically, applications from two characteristic domains, computational fluid dynamics (CFD) and data clustering, have been implemented and evaluated on GPU clusters and single GPUs. In the first domain, a computationally intensive CFD code that operates on structured grids has been accelerated on a GPU cluster. A simulation that manipulates unstructured grids has been accelerated on a single GPU, demonstrating its potential for GPU cluster acceleration. In the second domain, a partitional data clustering algorithm is accelerated using shared memory abstraction on GPU clusters, and a preliminary implementation of a hierarchical data clustering algorithm on GPUs is described.
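The lazy propagation idea behind release consistency models such as LRC can be illustrated with a deliberately simplified, sequential toy model. In this model, a node's writes become visible to another node only after the writer releases a lock and the other node acquires it. The sketch is hypothetical Python, not Pleiad's Java API. It performs no real mutual exclusion, and it omits the vector timestamps and transitive propagation of causally preceding intervals that real LRC uses. `Lock` and `Node` are illustrative names.

```python
class Lock:
    """A lock that carries the write diffs published at each release."""

    def __init__(self):
        self.log = []  # one dict of {key: value} diffs per release


class Node:
    """One cluster node holding a private copy of the shared data."""

    def __init__(self, shared):
        self.view = dict(shared)  # node-local copy of shared memory
        self.dirty = {}           # writes made since this node's last release
        self.applied = {}         # id(lock) -> number of log entries already seen

    def read(self, key):
        return self.view.get(key)

    def write(self, key, value):
        self.view[key] = value
        self.dirty[key] = value

    def acquire(self, lock):
        # Lazy propagation: remote diffs are pulled only when the lock is acquired.
        start = self.applied.get(id(lock), 0)
        for diffs in lock.log[start:]:
            self.view.update(diffs)
        self.applied[id(lock)] = len(lock.log)

    def release(self, lock):
        # Publish this node's diffs with the lock instead of pushing them to every node.
        lock.log.append(dict(self.dirty))
        self.dirty.clear()
        self.applied[id(lock)] = len(lock.log)
```

In this model, node B still reads the old value after node A releases the lock. B sees A's write only once it acquires the same lock, which is the point at which a lazy protocol transfers updates.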
|
89 |
Performance Evaluation of Node.js on Multi-core Computing Systems
Azmat, Janty, January 2018
JavaScript code executed by the Node.js runtime environment runs in a single thread and does not fully utilize the power of multi-core systems, so fairly new approaches attempt to address this limitation. Some of these approaches have been extensively tested in public and are widely used at the time of writing. The objectives of this study are to determine which of these approaches achieves the best scalability with respect to the number of handled requests, and to what extent they utilize multi-core power compared with the raw Node.js environment under normal CPU scheduling.
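The scalability question can be illustrated with a toy model of the pre-fork approach, in which a primary process hands incoming requests to several single-threaded workers in turn. Node's `cluster` module uses round-robin distribution by default on most platforms. The sketch below is a hypothetical Python model, not Node.js code and not one of the approaches the thesis measured. It assumes CPU-bound requests that are all queued at time zero, and it ignores I/O and dispatch overhead.

```python
def round_robin_makespan(service_times, workers):
    """Time until all requests finish when request i goes to worker i % workers
    and each worker, like a single Node.js event loop, serves one CPU-bound
    request at a time."""
    busy = [0.0] * workers
    for i, t in enumerate(service_times):
        busy[i % workers] += t
    return max(busy)


def speedup(service_times, workers):
    """Makespan with one worker divided by makespan with `workers` workers."""
    return round_robin_makespan(service_times, 1) / round_robin_makespan(service_times, workers)
```

With equal request costs, the model scales linearly with the number of workers. With uneven costs, blind round-robin can leave one worker finishing well after the others. For example, requests costing 4, 1, 1, 1 on two workers finish at 5 rather than the ideal 3.5.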
|
90 |
Realizace výpočetních úloh na MetaCentru / Realization of demanding computing tasks on MetaCentrum
HORELICA, Josef, January 2013
This thesis deals with the realization of demanding computing tasks on MetaCentrum. For this purpose, a manual was created to help beginners on MetaCentrum. The first part covers the basics of parallel computation, the characteristics of computing resources, the MetaCentrum project, and some of the applications offered by MetaCentrum. The second part gives practical examples of computing on MetaCentrum. The enclosed CD contains a multimedia tutorial.