81

PERFORMANCE EVALUATION OF MEMORY AND COMPUTATIONALLY BOUND CHEMISTRY APPLICATIONS ON STREAMING GPGPUS AND MULTI-CORE X86 CPUS

Weber III, Frederick E 01 May 2010 (has links)
In recent years, multi-core processors have come to dominate the field in desktop and high performance computing. Graphics processors, traditionally used in CAD, video games, and other 3-D applications, have become more programmable and are now suitable for general purpose computing. This thesis explores the performance and limitations of multi-core processors and GPUs in two computational chemistry applications: a memory bound component of ab-initio modeling and a computationally bound Monte Carlo simulation. For the applications presented in this thesis, multiple processors are exploited using a variety of tools and languages, including OpenMP and MKL. The Brook+ and Compute Abstraction Layer streaming environments are used to accelerate applications on AMD GPUs. This thesis gives qualitative assessments of these languages and tools regarding ease of use and optimization, in addition to quantitative analyses of performance. GPUs can yield modest performance improvements with little effort in some applications, and even larger speedups with simple optimizations.
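As a rough illustration of the multi-core side of such work, a minimal OpenMP sketch in C of parallelizing a computationally bound Monte Carlo loop (compiled with -fopenmp). This is not the thesis code; `trial_energy` and the trial count are hypothetical stand-ins for one Monte Carlo move:

```c
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical per-trial work; stands in for one Monte Carlo move. */
static double trial_energy(unsigned int *seed)
{
    double x = (double)rand_r(seed) / RAND_MAX;
    double y = (double)rand_r(seed) / RAND_MAX;
    return x * x + y * y;   /* placeholder computation */
}

int main(void)
{
    const long n_trials = 10000000L;
    double sum = 0.0;

    /* Each thread gets its own RNG seed; the reduction combines results. */
    #pragma omp parallel reduction(+:sum)
    {
        unsigned int seed = 1234u + (unsigned int)omp_get_thread_num();
        #pragma omp for
        for (long i = 0; i < n_trials; i++)
            sum += trial_energy(&seed);
    }

    printf("mean = %f\n", sum / n_trials);
    return 0;
}
```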
82

Performance Analysis of kNN on large datasets using CUDA & Pthreads: Comparing between CPU & GPU

Kankatala, Sriram January 2015 (has links)
Several organizations have large databases that grow rapidly and need to be regularly maintained. Content-based searches are similarity searches based on certain features obtained from various multimedia data. For applications such as multimedia content retrieval, data mining, and pattern recognition, performing a nearest neighbor search in multidimensional data is a challenging task. The important factors in nearest neighbor (kNN) search are searching speed and accuracy. Implementing kNN on GPUs has been an active research topic over the last few years, focused on improving its performance. Considering these aspects, we identified a gap in this research area and began our study. This master thesis demonstrates effective and efficient parallelism on multi-core CPUs and GPUs, comparing their performance with a single-core CPU. It presents an experimental implementation of kNN on a single-core CPU, a multi-core CPU, and a GPU using C, Pthreads, and CUDA respectively. We considered different levels of input (size, dimensions) to evaluate the performance. The experiments show that for kNN the GPU outperforms the single-core CPU by a factor of approximately 5.8 to 16, and the multi-core CPU by a factor of approximately 1.2 to 3, across different levels of input.
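A minimal sketch in C of the Pthreads part of such an experiment: the brute-force kNN distance computations are partitioned across threads, each covering a contiguous slice of the dataset. The sizes, dimensionality, and thread count are illustrative assumptions, not the thesis configuration:

```c
#include <pthread.h>
#include <math.h>
#include <stdio.h>

#define N 100000   /* reference points (assumed size) */
#define D 16       /* dimensions (assumed) */
#define NTHREADS 4

static float points[N][D], query[D], dist[N];

/* Each thread computes distances for a contiguous slice of the dataset. */
static void *worker(void *arg)
{
    long t = (long)arg;
    long lo = t * N / NTHREADS, hi = (t + 1) * N / NTHREADS;
    for (long i = lo; i < hi; i++) {
        float s = 0.0f;
        for (int d = 0; d < D; d++) {
            float diff = points[i][d] - query[d];
            s += diff * diff;
        }
        dist[i] = sqrtf(s);
    }
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];
    for (long t = 0; t < NTHREADS; t++)
        pthread_create(&tid[t], NULL, worker, (void *)t);
    for (long t = 0; t < NTHREADS; t++)
        pthread_join(tid[t], NULL);
    /* A k-selection over dist[] (omitted) would then yield the k nearest. */
    printf("dist[0] = %f\n", dist[0]);
    return 0;
}
```

A CUDA variant would typically follow the same decomposition, with one GPU thread per reference point instead of one CPU thread per slice.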
83

Parallel Computing in Statistical-Validation of Clustering Algorithm for the Analysis of High throughput Data

Atlas, Mourad 12 May 2005 (has links)
Currently, clustering applications use classical methods to partition a set of data (or objects) into a set of meaningful sub-classes, called clusters. A cluster is therefore a collection of objects that are “similar” among themselves, and thus can be treated collectively as one group, and are “dissimilar” to the objects belonging to other clusters. However, there are a number of problems with clustering. Among them, as mentioned in [Datta03], dealing with a large number of dimensions and a large number of data items can be problematic because of computational time. In this thesis, we investigate all the clustering algorithms used in [Datta03] and present a parallel solution to minimize the computational time. We apply parallel programming techniques to the statistical algorithms as a natural extension of the sequential programming techniques used in R. The proposed parallel model has been tested on a high-throughput dataset: microarray data on the transcriptional profile during sporulation in budding yeast, containing more than 6,000 genes. Our evaluation includes clustering algorithm scalability on datasets with varying dimensions, the speedup factor, and the efficiency of the parallel model over the sequential implementation. Our experiments show that the gene expression data follow the pattern predicted in [Datta03]: Diana appears to be a solid performer, and the group means for each cluster coincide with those in [Datta03]. We show that our parallel model is applicable to the clustering algorithms and most useful in applications that deal with high-throughput data, such as gene expression data.
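The original work parallelizes statistical code written in R, but the underlying pattern is embarrassingly parallel: independent validation runs distributed across workers. A language-neutral sketch of that structure in C with OpenMP, where `validate` is a hypothetical stand-in for one expensive validation run and the loop bounds are assumed:

```c
#include <omp.h>
#include <stdio.h>

/* Hypothetical stand-in for one statistical-validation run, e.g. scoring
 * one clustering algorithm at one number of clusters k. */
static double validate(int algorithm, int k)
{
    /* ... expensive resampling/scoring would go here ... */
    return (double)(algorithm + k);   /* placeholder */
}

int main(void)
{
    const int n_algorithms = 6, k_max = 10;
    double score[6][11];

    /* The runs are independent, so the two loops collapse into one
     * parallel work list -- the structure such a parallel model exploits. */
    #pragma omp parallel for collapse(2)
    for (int a = 0; a < n_algorithms; a++)
        for (int k = 2; k <= k_max; k++)
            score[a][k] = validate(a, k);

    printf("score[0][2] = %f\n", score[0][2]);
    return 0;
}
```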
84

Sprendimų priėmimas lygiagrečiuose skaičiavimuose / Decision making in parallel computing

Filatovas, Ernestas 14 June 2006 (has links)
This work presents the principles of parallel computing and the MPI software package used in the work, showing how they can be adapted to solving tasks over computer networks, together with the peculiarities of deploying this package. The specific character of multiple-criteria optimization tasks and ways of solving them using computer networks are clarified. Decision theory and the basic methods of decision making are also described.
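A minimal C sketch of the MPI pattern such work commonly builds on: distributing independent evaluations of candidate solutions across processes and gathering the results on one process for decision making. The `evaluate` function is an illustrative stand-in, not taken from the thesis:

```c
#include <mpi.h>
#include <stdio.h>

/* Hypothetical stand-in for evaluating one candidate solution. */
static double evaluate(int index) { return (double)index * 0.5; }

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each process evaluates its own candidate... */
    double local = evaluate(rank);

    /* ...and the results are gathered on the root for decision making.
     * (Assumes at most 64 processes.) */
    double all[64];
    MPI_Gather(&local, 1, MPI_DOUBLE, all, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    if (rank == 0)
        for (int i = 0; i < size; i++)
            printf("candidate %d -> %f\n", i, all[i]);

    MPI_Finalize();
    return 0;
}
```

Compiled with mpicc and launched with, e.g., mpirun -np 4, the root process collects one value per process.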
85

A Case Study of A Multithreaded Buchberger Normal Form Algorithm

Linfoot, Andy James January 2006 (has links)
Groebner bases have many applications in mathematics, science, and engineering. This dissertation deals with the algorithmic aspects of computing these bases. The dissertation begins with a brief introduction of fundamental concepts about Groebner bases. Following this, various implementation issues are discussed. Much of the practical difficulty of using Groebner basis algorithms and techniques stems from their high computational complexity. It is shown that the algorithmic complexity of computing a Groebner basis primarily stems from the calculation of normal forms. This is established by studying run profiles of various computations. This leads to two options for making Groebner basis techniques more practical: reduce the complexity by developing new algorithms (heuristics), or reduce the running time of normal form calculations by introducing concurrency. The latter approach is taken in the remainder of the dissertation, where a multithreaded normal form algorithm is presented and discussed. It is shown with a simple example that the new algorithm demonstrates speedup and scalability. The algorithm also has the advantage of being completion strategy independent. We conclude with an outline of future research involving the new algorithm.
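The dissertation's algorithm is not reproduced here, but a shared work queue is one common way to organize this kind of concurrency: threads repeatedly claim the next reduction task until the queue is drained, independent of any particular completion strategy. A minimal Pthreads sketch, with `reduce_spoly` as a hypothetical stand-in for one normal form reduction:

```c
#include <pthread.h>
#include <stdio.h>

#define NTASKS 1000
#define NTHREADS 4

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int next_task = 0;

/* Hypothetical stand-in for one normal-form reduction. */
static void reduce_spoly(int i) { (void)i; /* ... polynomial reduction ... */ }

/* Workers repeatedly claim the next task until the queue is drained. */
static void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&lock);
        int task = next_task < NTASKS ? next_task++ : -1;
        pthread_mutex_unlock(&lock);
        if (task < 0) break;          /* queue drained */
        reduce_spoly(task);           /* reductions run concurrently */
    }
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];
    for (int t = 0; t < NTHREADS; t++)
        pthread_create(&tid[t], NULL, worker, NULL);
    for (int t = 0; t < NTHREADS; t++)
        pthread_join(tid[t], NULL);
    printf("all %d reductions done\n", NTASKS);
    return 0;
}
```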
86

Local independence in computed tomography as a basis for parallel computing

Martin, Daniel Morris 14 September 2007 (has links)
Iterative CT reconstruction algorithms are superior to the standard convolution backprojection (CBP) methods when reconstructing from a small number of views (hence less radiation), but are computationally costly. To reduce the execution time, this work implements and tests a parallel approach to iterative algorithms using a cluster of workstations, a low-cost system found in many offices and non-academic sites. A previous implementation showed little speedup because of the significant cost of inter-processor communication. In this thesis, several data partitioning methods are examined, including image tiling methods that exploit the spatial locality demonstrated by local CT. Using these methods, computation can proceed locally, without the need for inter-processor communication during every iteration. A relative speedup of up to 17 times is obtained using 25 processors, demonstrating that good performance can be obtained running computationally intensive CT reconstruction algorithms on distributed memory hardware.
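A minimal MPI sketch of the tiling idea described above: each process iterates on its own image tile entirely locally, with no inter-processor communication inside the iteration loop. The tile size, iteration count, and update rule are placeholders, not the thesis method:

```c
#include <mpi.h>
#include <stdio.h>
#include <string.h>

#define TILE 64          /* assumed tile width/height in pixels */
#define ITERS 50         /* assumed iteration count */

/* Hypothetical stand-in for one local iterative-reconstruction update
 * applied only to this rank's image tile. */
static void update_tile(float tile[TILE][TILE])
{
    for (int y = 0; y < TILE; y++)
        for (int x = 0; x < TILE; x++)
            tile[y][x] *= 0.99f;   /* placeholder update */
}

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    float tile[TILE][TILE];
    memset(tile, 0, sizeof tile);
    tile[TILE / 2][TILE / 2] = 1.0f;   /* toy initial image */

    /* The point of the tiling scheme: every iteration is purely local,
     * with no inter-processor communication inside the loop. */
    for (int it = 0; it < ITERS; it++)
        update_tile(tile);

    /* Tiles would be assembled into the full image only at the end. */
    if (rank == 0)
        printf("rank 0 finished %d local iterations\n", ITERS);

    MPI_Finalize();
    return 0;
}
```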
88

Shared memory abstraction: new approach under high concurrency conditions / Αφαίρεση κοινής μνήμης: νέα προσέγγιση υπό συνθήκες υψηλής συγχρονικότητας

Καραντάσης, Κωνσταντίνος 15 May 2012 (has links)
This dissertation implements a shared memory abstraction on top of contemporary multi-core and many-core clusters. The results of the presented research effort are mainly embodied in the cluster middleware platform Pleiad, a Java-based prototype that incorporates best practices from the field of distributed shared memory systems. The main results and contributions of the dissertation are briefly reviewed below:
• The presented middleware, Pleiad, is characterized by a highly modular design. Moreover, in contrast to most other related efforts, which are usually bound to a specific implementation of consistency, Pleiad has the infrastructure to incorporate many implementations of a given mechanism, and can even interchange such implementations at runtime.
• Reference implementations are offered for the relaxed consistency models of Lazy Release Consistency (LRC) and Scope Consistency (ScC). Pleiad is the first Java-based middleware to incorporate implementations of both protocols.
• The dissertation presents one of the few evaluations on a cluster equipped with low-power processors (Intel Atom), which can be considered a characteristic case of embedded-oriented multi-core clusters.
• The dissertation presents one of the first implementations of shared memory abstraction on top of GPU clusters. The shared memory abstraction is evaluated under two schemes. In the first scheme, shared memory programming with GPU clusters is achieved through a hybrid combination of the first commercial implementation of OpenMP for clusters, Intel Cluster OpenMP, and the CUDA platform; this is the first evaluation of OpenMP and CUDA in the context of GPU clusters. The second scheme enhances Pleiad to support GPU clusters, one of the few unified implementations of a shared memory abstraction programming environment.
• At the moment there are no established, widely used benchmarks or application codes that utilize multiple GPUs, either on a cluster or on a single node. Thus, among the thesis contributions is the evaluation of shared memory abstraction with real application codes, since the few related systems have either used simple kernels or been evaluated on a single node.
• Specifically, applications from two characteristic domains, computational fluid dynamics (CFD) and data clustering, have been implemented and evaluated using GPU clusters and single GPUs. In the first case, a computationally intensive CFD code that operates on structured grids has been accelerated on a GPU cluster, while a simulation that manipulates an unstructured grid has been accelerated on a single GPU, demonstrating its potential for GPU cluster acceleration. Accordingly, a partitional data clustering algorithm is accelerated using the shared memory abstraction on GPU clusters, and a preliminary implementation of a hierarchical data clustering algorithm on GPUs is described.
89

Performance Evaluation of Node.js on Multi-core Computing Systems

Azmat, Janty January 2018 (has links)
Since JavaScript code executed by the Node.js run-time environment runs in a single thread, without really utilizing the full power of multi-core systems, fairly new approaches attempt to remedy this situation. Some of these approaches are considered well tested publicly and are widely used at the time of writing. The objectives of this study are to check which of these approaches achieves better scalability with respect to the number of handled requests, and to what extent those approaches utilize multi-core power compared to the raw Node.js environment with normal CPU scheduling.
90

Realizace výpočetních úloh na MetaCentru / Realization of demanding computing tasks on MetaCentrum

HORELICA, Josef January 2013 (has links)
This thesis deals with the realization of demanding computing tasks on MetaCentrum. For these purposes, a manual was created to help beginners on MetaCentrum. The first part covers basic knowledge of parallel computation, a characterization of the computing resources, the MetaCentrum project, and some applications offered by MetaCentrum. The second part presents practical examples of computing on MetaCentrum. The enclosed CD contains a multimedia tutorial.
