11

Graphic-Processing-Units Based Adaptive Parameter Estimation of a Visual Psychophysical Model

Gu, Hairong 17 December 2012
No description available.
12

Accelerating a Coupled SPH-FEM Solver through Heterogeneous Computing for use in Fluid-Structure Interaction Problems

Gilbert, John Nicholas 08 June 2015
This work presents a partitioned approach to simulating free-surface flow interaction with hyper-elastic structures in which a smoothed particle hydrodynamics (SPH) solver is coupled with a finite-element (FEM) solver. SPH is a mesh-free, Lagrangian numerical technique frequently employed to study physical phenomena involving large deformations, such as fragmentation or breaking waves. As a mesh-free Lagrangian method, SPH makes an attractive alternative to traditional grid-based methods for modeling free-surface flows and/or problems with rapid deformations where frequent re-meshing and additional free-surface tracking algorithms are non-trivial. This work continues and extends the earlier coupled 2D SPH-FEM approach of Yang et al. [1,2] by linking a double-precision GPU implementation of a 3D weakly compressible SPH formulation [3] with the open source finite element software Code_Aster [4]. Using this approach, the fluid domain is evolved on the GPU, while the CPU updates the structural domain. Finally, the partitioned solutions are coupled using a traditional staggered algorithm. / Ph. D.
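The staggered coupling pattern described above can be sketched in a few lines. The following is a hypothetical, serial one-degree-of-freedom stand-in (the toy `fluid_step` and `structure_step` functions are illustrative, not the thesis's SPH or FEM solvers); it shows only the data-exchange pattern: the fluid advances using the last known interface position, then the structure advances under the resulting fluid load.

```python
# Sketch of a conventional staggered (partitioned) coupling loop.
# In the thesis, fluid_step would be the GPU SPH solve and
# structure_step the CPU FEM (Code_Aster) solve; here both are
# 1-DOF toys so the exchange pattern is visible.

def fluid_step(wall_pos, dt):
    """Stand-in for the fluid solve: returns the load the fluid
    exerts on the structure, given the current interface position."""
    stiffness = 50.0                     # toy fluid "stiffness"
    return -stiffness * wall_pos

def structure_step(state, load, dt):
    """Stand-in for the structural solve: a damped spring-mass
    system advanced with semi-implicit Euler under the fluid load."""
    pos, vel = state
    mass, k, c = 1.0, 200.0, 2.0
    acc = (load - k * pos - c * vel) / mass
    vel += acc * dt
    pos += vel * dt
    return pos, vel

def staggered_couple(n_steps, dt=1e-3, init=(0.01, 0.0)):
    state = init
    history = []
    for _ in range(n_steps):
        load = fluid_step(state[0], dt)          # fluid uses last interface position
        state = structure_step(state, load, dt)  # structure uses new fluid load
        history.append(state[0])
    return history

traj = staggered_couple(2000)   # damped interface oscillation
```

The one-step lag between the two solves is what makes the scheme "staggered" (loosely coupled); strongly coupled schemes would instead iterate the exchange within each time step.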
13

Exploring the dynamic radio sky with many-core high-performance computing

Malenta, Mateusz January 2018

As new radio telescopes and processing facilities are built, the amount of data that has to be processed grows continuously. This poses significant challenges, especially if real-time processing is required, which is important for surveys looking for poorly understood objects, such as Fast Radio Bursts (FRBs), where quick detection and localisation can enable rapid follow-up observations at different frequencies. With data rates increasing all the time, new processing techniques using the newest hardware, such as GPUs, have to be developed. A new pipeline, called PAFINDER, has been developed to process data taken with a phased array feed, which can generate up to 36 beams on the sky with data rates of 25 GBps per beam. With the majority of the work done on GPUs, the pipeline reaches real-time performance when generating filterbank files for offline processing. Full real-time processing, including single-pulse searches, has also been implemented and shown to perform well under favourable conditions. The pipeline was successfully used to record and process observations of RRAT J1819-1458 and of sky positions where three FRBs had been observed previously, including the repeating FRB121102. Detailed examination of the J1819-1458 single-pulse detections revealed a complex emission environment, with pulses coming from three different rotation-phase bands and a number of multi-component emissions. No new FRBs and no repeated bursts from FRB121102 were detected. The GMRT High Resolution Southern Sky survey observes the sky at high galactic latitudes, searching for new pulsars and FRBs. 127 hours of data were searched for the presence of any new bursts with the help of a new pipeline developed for this survey. No new FRBs were found, which may be the result of heavy RFI pollution that was not fully removed despite new mitigation techniques being developed and combined with existing solutions.
Using the best estimates of the total amount of data processed correctly, obtained with new single-pulse simulation software, no detections were found to be consistent with the expected rates for standard-candle FRBs with a flat or positive spectrum.
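The core operation behind such single-pulse searches is incoherent dedispersion, which can be sketched in pure Python as follows. The 4.15 ms dispersion constant is the standard cold-plasma value; the channel layout, DM, and function names are illustrative, and a production pipeline like the one described performs this shift-and-sum on the GPU across many trial DMs.

```python
# Minimal incoherent dedispersion sketch: a pulse arrives later at
# lower frequencies; shifting each channel back by its dispersion
# delay and summing over channels recovers the pulse.

def disp_delay_ms(dm, f_ghz, f_ref_ghz):
    """Dispersion delay (ms) of f_ghz relative to the reference channel."""
    return 4.15 * dm * (f_ghz ** -2 - f_ref_ghz ** -2)

def make_dispersed_pulse(nchan, nsamp, dm, t0, freqs, tsamp_ms):
    """Synthetic filterbank block with one dispersed unit-height pulse."""
    data = [[0.0] * nsamp for _ in range(nchan)]
    for c, f in enumerate(freqs):
        shift = round(disp_delay_ms(dm, f, freqs[0]) / tsamp_ms)
        data[c][t0 + shift] = 1.0
    return data

def dedisperse(data, dm, freqs, tsamp_ms):
    """Shift each channel back by its delay and sum over channels."""
    nsamp = len(data[0])
    series = [0.0] * nsamp
    for c, f in enumerate(freqs):
        shift = round(disp_delay_ms(dm, f, freqs[0]) / tsamp_ms)
        for t in range(nsamp - shift):   # tail samples are only partly summed
            series[t] += data[c][t + shift]
    return series

FREQS = [1.4 - 0.005 * c for c in range(32)]   # GHz, highest channel first
data = make_dispersed_pulse(32, 256, 100.0, 20, FREQS, 1.0)
series = dedisperse(data, 100.0, FREQS, 1.0)   # pulse realigned at sample 20
```

Each channel's shift is independent of the others, which is why this step maps so well onto GPUs.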
14

Analysis of Single Phase Fluid Flow and Heat Transfer in Slip Flow Regime by Parallel Implementation of Lattice Boltzmann Method on GPUs

Celik, Sitki Berat 01 September 2012

In this thesis, fluid flow and heat transfer in two-dimensional microchannels are studied numerically. A computer code based on the lattice Boltzmann method (LBM) is developed for this purpose. The code is written in MATLAB with the Jacket GPU toolbox and has the important feature of being able to run in parallel on graphics processing units (GPUs). The code is used to simulate flow and heat transfer inside micro and macro channels. The obtained velocity profiles and Nusselt numbers are compared with Navier-Stokes based analytical and numerical results available in the literature, and good agreement is observed. Slip-velocity and temperature-jump boundary conditions are used for the microchannel simulations, with Knudsen number values covering the slip flow regime. The speed of the parallel version of the code running on GPUs is compared with that of the serial version running on a CPU, and for sufficiently large meshes a speedup of more than 14x is observed.
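For readers unfamiliar with the method, the LBM update at each lattice node can be illustrated with the standard D2Q9 stencil and a single BGK collision step, sketched below in pure Python. This is only an illustration of the update the abstract refers to (the thesis code runs in MATLAB/Jacket on GPUs, with slip-velocity and temperature-jump boundaries on top); the weights and lattice velocities are the standard D2Q9 values, everything else is a toy.

```python
# D2Q9 lattice constants and one BGK collision step.
W = [4/9, 1/9, 1/9, 1/9, 1/9, 1/36, 1/36, 1/36, 1/36]   # D2Q9 weights
C = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
     (1, 1), (-1, 1), (-1, -1), (1, -1)]                 # lattice velocities

def equilibrium(rho, ux, uy):
    """Second-order truncated Maxwell-Boltzmann equilibrium."""
    usq = ux * ux + uy * uy
    feq = []
    for w, (cx, cy) in zip(W, C):
        cu = cx * ux + cy * uy
        feq.append(w * rho * (1 + 3 * cu + 4.5 * cu * cu - 1.5 * usq))
    return feq

def bgk_collide(f, tau):
    """Relax f toward equilibrium; conserves mass and momentum exactly."""
    rho = sum(f)
    ux = sum(fi * c[0] for fi, c in zip(f, C)) / rho
    uy = sum(fi * c[1] for fi, c in zip(f, C)) / rho
    feq = equilibrium(rho, ux, uy)
    return [fi + (fe - fi) / tau for fi, fe in zip(f, feq)]

f_in = equilibrium(1.0, 0.05, 0.02)
f_in[1] += 0.01       # perturb away from equilibrium
f_in[3] -= 0.004
f_out = bgk_collide(f_in, tau=0.8)
```

Because collision is purely local to each node (streaming is a fixed-pattern neighbour copy), the method parallelises naturally across GPU threads, which is what the thesis exploits.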
15

Efficient and Private Processing of Analytical Queries in Scientific Datasets

Kumar, Anand 01 January 2013

Large amounts of data are generated by applications used in basic-science research and development. The size of the data introduces great challenges in storage, analysis, and privacy preservation. This dissertation proposes novel techniques to efficiently analyze the data and to reduce storage space requirements through a data compression technique, while preserving privacy and providing data security. We present an efficient technique to compute an analytical query called the spatial distance histogram (SDH) using spatiotemporal properties of the data. Special spatiotemporal properties present in the data are exploited to process SDH efficiently on the fly. General-purpose graphics processing units (GPGPUs, or simply GPUs) are employed to further boost the performance of the algorithm. The size of the data generated in scientific applications poses problems of disk space requirements, input/output (I/O) delays, and data transfer bandwidth requirements. These problems are addressed by applying the proposed compression technique. We also address the issue of preserving privacy and security in scientific data by proposing a security model. The security model monitors user queries input to the database that stores and manages the scientific data. Outputs of user queries are also inspected to detect privacy breaches. Privacy policies are enforced by the monitor to allow only those queries and results that satisfy data-owner-specified policies.
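The SDH query itself is simple to state: bucket all pairwise particle distances into a histogram. A brute-force baseline is sketched below (names illustrative); the dissertation's contribution is precisely avoiding this O(n²) all-pairs computation by exploiting spatiotemporal structure and GPUs.

```python
import math

def spatial_distance_histogram(points, bucket_width, n_buckets):
    """Naive O(n^2) SDH: count each unordered pair (i, j) into the
    bucket floor(d(i, j) / bucket_width); distances past the last
    bucket are dropped."""
    hist = [0] * n_buckets
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.dist(points[i], points[j])
            b = int(d // bucket_width)
            if b < n_buckets:
                hist[b] += 1
    return hist

# Three points with pairwise distances 5, 5, and 8:
h = spatial_distance_histogram([(0, 0), (3, 4), (0, 8)], 2.0, 5)
```

On a GPU, each thread typically accumulates a private or shared-memory histogram over a tile of pairs, and the partial histograms are reduced at the end, avoiding contention on a single global histogram.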
16

Nonnegative matrix factorization for clustering

Kuang, Da 27 August 2014
This dissertation shows that nonnegative matrix factorization (NMF) can be extended to a general and efficient clustering method. Clustering is one of the fundamental tasks in machine learning. It is useful for unsupervised knowledge discovery in a variety of applications such as text mining and genomic analysis. NMF is a dimension reduction method that approximates a nonnegative matrix by the product of two lower rank nonnegative matrices, and has shown great promise as a clustering method when a data set is represented as a nonnegative data matrix. However, challenges in the widespread use of NMF as a clustering method lie in its correctness and efficiency: First, we need to know why and when NMF could detect the true clusters and guarantee to deliver good clustering quality; second, existing algorithms for computing NMF are expensive and often take longer time than other clustering methods. We show that the original NMF can be improved from both aspects in the context of clustering. Our new NMF-based clustering methods can achieve better clustering quality and run orders of magnitude faster than the original NMF and other clustering methods. Like other clustering methods, NMF places an implicit assumption on the cluster structure. Thus, the success of NMF as a clustering method depends on whether the representation of data in a vector space satisfies that assumption. Our approach to extending the original NMF to a general clustering method is to switch from the vector space representation of data points to a graph representation. The new formulation, called Symmetric NMF, takes a pairwise similarity matrix as an input and can be viewed as a graph clustering method. We evaluate this method on document clustering and image segmentation problems and find that it achieves better clustering accuracy. In addition, for the original NMF, it is difficult but important to choose the right number of clusters. 
We show that the widely used consensus NMF in genomic analysis for choosing the number of clusters has critical flaws and can produce misleading results. We propose a variation of the prediction strength measure arising from statistical inference to evaluate the stability of clusters and select the right number of clusters. Our measure shows promising performance in artificial simulation experiments. Large-scale applications bring substantial efficiency challenges to existing algorithms for computing NMF. An important example is topic modeling, where users want to uncover the major themes in a large text collection. Our strategy for accelerating NMF-based clustering is to design algorithms that better suit the computer architecture and exploit the computing power of parallel platforms such as graphics processing units (GPUs). A key observation is that applying rank-2 NMF, which partitions a data set into two clusters, in a recursive manner is much faster than applying the original NMF to obtain a flat clustering. We take advantage of a special property of rank-2 NMF and design an algorithm that runs faster than existing algorithms due to continuous memory access. Combined with a criterion to stop the recursion, our hierarchical clustering algorithm runs significantly faster and achieves even better clustering quality than existing methods. Another bottleneck of NMF algorithms, which is also a common bottleneck in many other machine learning applications, is multiplying a large sparse data matrix with a tall-and-skinny dense matrix. We use GPUs to accelerate this routine for sparse matrices with an irregular sparsity structure. Overall, our algorithm shows significant improvement over popular topic modeling methods such as latent Dirichlet allocation, and runs more than 100 times faster on data sets with millions of documents.
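As a concrete reference point for the factorization under discussion, here is a minimal NMF with the classic multiplicative updates in pure Python. This is the baseline algorithm, not the faster rank-2 or Symmetric NMF methods this dissertation develops, and all names and the tiny example matrix are illustrative. Each data column is assigned to the cluster whose row of H has the largest entry in that column.

```python
# Lee-Seung multiplicative updates for min ||A - WH||_F^2 with W, H >= 0.
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def frob_err(A, W, H):
    WH = matmul(W, H)
    return sum((A[i][j] - WH[i][j]) ** 2
               for i in range(len(A)) for j in range(len(A[0])))

def nmf_mu(A, W, H, iters, eps=1e-9):
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, A), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps)
              for j in range(len(H[0]))] for i in range(len(H))]
        Ht = transpose(H)
        num, den = matmul(A, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps)
              for j in range(len(W[0]))] for i in range(len(W))]
    return W, H

# Two clearly separated column clusters: {0, 1} and {2, 3}.
A = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
W0 = [[0.6, 0.3], [0.55, 0.35], [0.3, 0.6], [0.35, 0.55]]
H0 = [[0.6, 0.55, 0.3, 0.35], [0.3, 0.35, 0.6, 0.55]]
W, H = nmf_mu(A, W0, H0, 200)
clusters = [max(range(2), key=lambda k: H[k][j]) for j in range(4)]
```

The multiplicative updates keep the factors nonnegative and monotonically decrease the reconstruction error, but converge slowly; this slowness is one motivation for the faster algorithms the dissertation proposes.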
17

MR-CUDASW - GPU accelerated Smith-Waterman algorithm for medium-length (meta)genomic data

2014 November 1900
The idea of using a graphics processing unit (GPU) for more than simply graphics output has been around for quite some time in scientific communities. However, it is only recently that its benefits for a range of compute-intensive bioinformatics and life-sciences tasks have been recognized. This thesis investigates the possibility of improving the performance of the overlap determination stage of an Overlap Layout Consensus (OLC)-based assembler by using a GPU-based implementation of the Smith-Waterman algorithm. In this thesis an existing GPU-accelerated sequence alignment algorithm is adapted and expanded to reduce its completion time. A number of improvements and changes are made to the original software. Workload distribution, query profile construction, and thread scheduling techniques implemented by the original program are replaced by custom methods specifically designed to handle medium-length reads. Accordingly, this algorithm is the first highly parallel solution that has been specifically optimized to process medium-length nucleotide reads (DNA/RNA) from modern sequencing machines (e.g. Ion Torrent). Results show that the software reaches up to 82 GCUPS (giga cell updates per second) on a single-GPU graphics card in a commodity desktop machine. As a result it is the fastest GPU-based implementation of the Smith-Waterman algorithm tailored for processing medium-length nucleotide reads. Although designed for performing the Smith-Waterman algorithm on medium-length nucleotide sequences, this program also presents great potential for improving heterogeneous computing with CUDA-enabled GPUs in general and is expected to contribute to other research problems that require sensitive pairwise alignment applied to a large number of reads.
Our results show that it is possible to improve the performance of bioinformatics algorithms by taking full advantage of the compute resources of the underlying commodity hardware; these results are especially encouraging since GPU performance grows faster than that of multi-core CPUs.
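For reference, the Smith-Waterman recurrence whose cell updates the GCUPS figure counts looks like this in a naive pure-Python form (scoring parameters here are illustrative). A GPU implementation parallelises across many read pairs and across the anti-diagonals of this dynamic-programming matrix, since cells on the same anti-diagonal are independent.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between sequences a and b.
    H[i][j] = best score of an alignment ending at a[i-1], b[j-1]."""
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,                      # local alignment may restart
                          H[i - 1][j - 1] + s,    # match / mismatch
                          H[i - 1][j] + gap,      # gap in b
                          H[i][j - 1] + gap)      # gap in a
            best = max(best, H[i][j])
    return best
```

The work is len(a) x len(b) cell updates per pair, so GCUPS is simply total cells divided by wall-clock seconds and 1e9.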
18

Accelerating interpreted programming languages on GPUs with just-in-time compilation and runtime optimisations

Fumero Alfonso, Juan José January 2017

Nowadays, most computer systems are equipped with powerful parallel devices such as graphics processing units (GPUs). They are present in almost every computer system, including mobile devices, tablets, desktop computers and servers. These parallel systems have unlocked the possibility for many scientists and companies to process significant amounts of data in less time. But using these parallel systems is very challenging due to their programming complexity. The most common programming languages for GPUs, such as OpenCL and CUDA, are created for expert programmers, and developers are required to know hardware details to use GPUs. However, many users of heterogeneous and parallel hardware, such as economists, biologists, physicists or psychologists, are not necessarily expert GPU programmers. They need to speed up their applications, which are often written in high-level and dynamic programming languages such as Java, R or Python. Little work has been done to generate GPU code automatically from these high-level interpreted and dynamic programming languages. This thesis presents a combination of a programming interface and a set of compiler techniques which enable the automatic translation of a subset of Java and R programs into OpenCL for execution on a GPU. The goal is to reduce the programmability and usability gaps between interpreted programming languages and GPUs. The first contribution is an Application Programming Interface (API) for programming heterogeneous and multi-core systems. This API combines ideas from functional programming and algorithmic skeletons to compose and reuse parallel operations. The second contribution is a new OpenCL Just-In-Time (JIT) compiler that automatically translates a subset of Java bytecode to GPU code. This is combined with a new runtime system that optimises data management and avoids data transformations between Java and OpenCL.
This OpenCL framework and runtime system achieve speedups of up to 645x compared to Java, while staying within a 23% slowdown of handwritten native OpenCL code. The third contribution is a new OpenCL JIT compiler for dynamic and interpreted programming languages. While the R language is used in this thesis, the developed techniques are generic to dynamic languages. This JIT compiler uniquely combines a set of existing compiler techniques, such as specialisation and partial evaluation, for OpenCL compilation, together with an optimising runtime that compiles and executes R code on GPUs. This JIT compiler for the R language achieves speedups of up to 1300x compared to GNU-R, with a 1.8x slowdown compared to native OpenCL.
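The flavour of such a skeleton-based API can be sketched as follows. This is a hypothetical pure-Python stand-in: the thesis API targets Java and R and JIT-compiles each operation to an OpenCL kernel, whereas here `map` and `reduce` simply record a pipeline that is executed sequentially.

```python
class Pipeline:
    """Records composable parallel operations. A GPU backend would
    JIT-compile each recorded stage to a kernel and manage device
    buffers; this stand-in just replays the stages on the host."""

    def __init__(self):
        self.stages = []

    def map(self, fn):
        self.stages.append(("map", fn))
        return self            # chaining enables composition and reuse

    def reduce(self, fn, init):
        self.stages.append(("reduce", fn, init))
        return self

    def apply(self, data):
        for stage in self.stages:
            if stage[0] == "map":
                data = [stage[1](x) for x in data]
            else:
                acc = stage[2]
                for x in data:
                    acc = stage[1](acc, x)
                data = acc
        return data

# A reusable pipeline: sum of squares.
dot_self = Pipeline().map(lambda x: x * x).reduce(lambda a, b: a + b, 0)
```

Because the pipeline is a value, the same composed computation can be applied to many inputs, and a smarter backend is free to fuse the map into the reduce before generating device code.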
19

Performance analysis of GPGPU and CPU on AES Encryption

Neelap, Akash Kiran January 2014

Advancements in computing have led to a tremendous increase in the amount of data generated every minute, which needs to be stored or transferred while maintaining a high level of security. The military and armed forces today rely heavily on computers to store large amounts of important and secret data that bear directly on national security. The standard AES encryption algorithm, at the heart of almost every secure application today, provides a high level of security but is time-consuming with the traditional sequential approach. Implementation of AES on GPUs has been an ongoing research topic for several years, but existing work is either inefficient or incomplete and demands further optimisation for better performance. Taking the limitations of previous research as a research gap, this thesis aims to exploit efficient parallelism on the GPU and on the multi-core CPU to make a fair and reliable comparison, and to deduce implementation techniques for multi-core CPUs and GPUs that can be reused in future implementations. It experimentally examines the performance of a CPU and a GPGPU at different levels of optimisation using Pthreads, CUDA and CUDA streams. It critically explores the behaviour of the GPU for different granularity levels and different grid dimensions to examine the effect on performance. The results show considerable acceleration on an NVIDIA GPU (Quadro K4000) over single-threaded and multi-threaded implementations on a CPU (Intel® Xeon® E5-1650).
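The structure that makes AES amenable to GPUs and multi-core CPUs is that, in counter-style modes, every 16-byte block can be processed independently. The sketch below illustrates exactly that block-level parallelism with Python threads; note the "cipher" here is a toy CTR-mode construction using SHA-256 as a stand-in keystream function, NOT real AES, so only the parallel structure (not the cryptography) mirrors the thesis's experiments.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

BLOCK = 16  # bytes, the AES block size

def keystream_block(key, counter):
    # Toy CTR keystream: SHA-256 as a stand-in PRF for the AES
    # block cipher. Illustrative only -- not a real cipher.
    return hashlib.sha256(key + counter.to_bytes(8, "big")).digest()[:BLOCK]

def xor_block(args):
    key, counter, block = args
    ks = keystream_block(key, counter)
    return bytes(b ^ k for b, k in zip(block, ks))

def ctr_encrypt(key, data, workers=None):
    blocks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]
    jobs = [(key, c, blk) for c, blk in enumerate(blocks)]
    if workers:   # each counter block is independent -> trivially parallel
        with ThreadPoolExecutor(max_workers=workers) as pool:
            out = list(pool.map(xor_block, jobs))
    else:
        out = [xor_block(j) for j in jobs]
    return b"".join(out)

key = b"sixteen byte key"
msg = b"performance comparison of CPU and GPU encryption paths"
ct_serial = ctr_encrypt(key, msg)
ct_parallel = ctr_encrypt(key, msg, workers=4)
```

A correctness check for any parallel implementation, as in the thesis's CPU/GPU comparison, is that the parallel ciphertext is byte-identical to the sequential one; in CTR-style modes, applying the same operation again also decrypts.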
20

Collective behaviour of model microswimmers

Putz, Victor B. January 2010

At small length scales, low velocities, and high viscosity, the effects of inertia on motion through fluid become insignificant and viscous forces dominate. Microswimmer propulsion is therefore, of necessity, achieved through different means than those used by macroscopic organisms. We describe in detail the hydrodynamics of microswimmers consisting of colloidal particles and their interactions. In particular we focus on two-bead swimmers and the effects of asymmetry on collective motion, calculating analytical formulae for time-averaged pair interactions and verifying them with microscopic, time-resolved numerical simulation, finding good agreement. We then examine the long-term effect of a swimmer's passing on a passive tracer particle, finding that the force-free nature of these microswimmers leads to loop-shaped tracer trajectories. Even in the presence of Brownian motion, the loop-shaped structure of these trajectories can be recovered by averaging over a large enough sample size. Finally, we explore the phenomenon of synchronisation between microswimmers through hydrodynamic interactions, using the method of constraint forces on a force-based swimmer. We find that the hydrodynamic interactions between swimmers can alter the relative phase between them such that phase-locking can occur over the long term, altering their collective motion.
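The basic far-field building block in such calculations is the Stokeslet, the flow due to a point force in Stokes flow. The sketch below (illustrative parameters, simple Euler integration, not the thesis's simulation code) advects a passive tracer past a translating force-free pair of opposite point forces, the simplest stand-in for the force-free swimmers discussed above, and exhibits the loop-like excursion with small net displacement.

```python
import math

def stokeslet(force, src, field_pt, mu=1.0):
    """Oseen-tensor flow of a point force:
    u(r) = (F + (F . r_hat) r_hat) / (8 pi mu |r|)."""
    r = [a - b for a, b in zip(field_pt, src)]
    d = math.sqrt(sum(x * x for x in r))
    rh = [x / d for x in r]
    fdot = sum(f * x for f, x in zip(force, rh))
    pref = 1.0 / (8.0 * math.pi * mu * d)
    return [pref * (f + fdot * x) for f, x in zip(force, rh)]

def tracer_past_dipole(h=1.0, sep=0.2, speed=1.0, dt=0.01, span=20.0):
    """Advect a tracer at height h past two opposite point forces
    (zero net force) translating along x at constant speed."""
    tracer = [0.0, h, 0.0]
    path = [tuple(tracer)]
    t = -span
    while t < span:
        xc = speed * t
        u1 = stokeslet([1.0, 0.0, 0.0], [xc + sep / 2, 0.0, 0.0], tracer)
        u2 = stokeslet([-1.0, 0.0, 0.0], [xc - sep / 2, 0.0, 0.0], tracer)
        tracer = [p + (a + b) * dt for p, a, b in zip(tracer, u1, u2)]
        path.append(tuple(tracer))
        t += dt
    return path

path = tracer_past_dipole()
start = path[0]
dists = [math.dist(p, start) for p in path]
net, peak = dists[-1], max(dists)   # net drift is small compared to the loop
```

Because the pair carries no net force, the leading far field is a force dipole decaying as 1/r², and the tracer largely retraces its excursion as the swimmer passes, which is the origin of the loop-shaped trajectories described in the abstract.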
