11

Enabling rapid iterative model design within the laboratory environment

Clayton, Thomas F. January 2009 (has links)
This thesis presents a proof-of-concept study for the better integration of the electrophysiological and modelling aspects of neuroscience. Members of these two sub-disciplines collaborate regularly, but owing to differing resource requirements and largely incompatible spheres of knowledge, cooperation is often impeded by miscommunication and delays. To reduce the model design time, and to provide a platform for more efficient experimental analysis, a rapid iterative model design method is proposed. The main achievement of this work is the development of a rapid model evaluation method based on parameter estimation, utilising a combination of evolutionary algorithms (EAs) and graphics processing unit (GPU) hardware acceleration. This method is the primary force behind the better integration of modelling and laboratory-based electrophysiology, as it provides a generic model evaluation method that does not require prior knowledge of model structure, or expertise in modelling, mathematics, or computer science. If combined with a suitably intuitive and user-targeted graphical user interface, the ideas presented in this thesis could be developed into a suite of tools that would enable new forms of experimentation to be performed. The latter part of this thesis investigates the use of excitability-based models as the basis of an iterative design method. They were found to be computationally and structurally simple, easily extensible, and able to reproduce a wide range of neural behaviours while still faithfully representing the underlying cellular mechanisms. A case study was performed to assess the iterative design process through the implementation of an excitability-based model. The model was extended iteratively, using the rapid model evaluation method, to represent a vasopressin-releasing neuron. Not only was the model implemented successfully, but it was also able to suggest the existence of other, more subtle cell mechanisms, in addition to highlighting potential failings in previous implementations of this class of neuron.
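As an illustration of the parameter-estimation approach described in this abstract, below is a minimal sketch of an evolutionary search over model parameters. The `simulate` function, the recorded `target` trace, and the parameter `bounds` are hypothetical placeholders, and the fitness evaluations run serially on the CPU; in the thesis they are additionally accelerated on the GPU.

```python
import random

def fitness(params, simulate, target):
    """Mean squared error between a simulated trace and recorded data."""
    trace = simulate(params)
    return sum((s - t) ** 2 for s, t in zip(trace, target)) / len(target)

def evolve(simulate, target, bounds, pop_size=64, generations=200, sigma=0.1):
    """Truncation-selection evolutionary search over model parameters.

    `bounds` is a list of (low, high) tuples, one per model parameter.
    """
    pop = [[random.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=lambda p: fitness(p, simulate, target))
        parents = scored[: pop_size // 4]              # keep the best quarter
        children = []
        while len(children) < pop_size - len(parents):
            parent = random.choice(parents)
            child = [min(max(g + random.gauss(0, sigma * (hi - lo)), lo), hi)
                     for g, (lo, hi) in zip(parent, bounds)]
            children.append(child)                     # Gaussian mutation, clamped to bounds
        pop = parents + children
    return min(pop, key=lambda p: fitness(p, simulate, target))
```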
12

Genetic programming and cellular automata for fast flood modelling on multi-core CPU and many-core GPU computers

Gibson, Michael John January 2015 (has links)
Many complex systems in nature are governed by simple local interactions, although a number are also described by global interactions. For example, within the field of hydraulics the Navier-Stokes equations describe free-surface water flow through the global preservation of water volume, momentum and energy. However, solving such partial differential equations (PDEs) is computationally expensive when applied to large 2D flow problems. An alternative that reduces the computational complexity is to approximate the PDEs with local schemes, such as finite difference methods or Cellular Automata (CA). The high-speed processing of such simulations is important to modern scientific investigation, especially within urban flood modelling, as urban expansion continues to increase the number of impervious areas that need to be modelled. Large numbers of model runs, or simulations at high spatial or temporal resolution, are required in order to investigate, for example, climate change, early warning systems, and sewer design optimisation. The recent introduction of the Graphics Processing Unit (GPU) as a general-purpose computing device (General-Purpose GPU, GPGPU) allows this hardware to be used for the accelerated processing of such locally driven simulations. A novel CA transformation for use with GPUs is proposed here to make maximum use of the GPU hardware. CA models are defined by local state transition rules, which are applied in every cell in parallel, and they provide an excellent platform for a comparative study of possible alternative state transition rules. Writing local state transition rules for CA systems is a difficult task for humans, due to the number and complexity of possible interactions, and is known as the ‘inverse problem’ for CA. Therefore, the use of Genetic Programming (GP) algorithms for the automatic development of state transition rules from example data is also investigated in this thesis. GP is investigated as it is capable of searching the intractably large space of possible state transition rules and producing near-optimal solutions. However, such population-based optimisation algorithms are limited by the cost of many repeated evaluations of the fitness function, which in this case requires the comparison of a CA simulation to given target data. Therefore, the use of GPGPU hardware for the accelerated learning of local rules is also developed. Speed-up factors of up to 50 times over serial Central Processing Unit (CPU) processing are achieved on simple CA, and of 5-10 times over a fully parallel CPU implementation for the learning of urban flood modelling rules. Furthermore, it is shown that GP can generate rules which perform competitively when compared with human-formulated rules. This is achieved with generalisation to unseen terrains, using similar input conditions and different spatial/temporal resolutions, in this important application domain.
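For illustration only, a minimal Cellular Automaton flood-update rule of the kind such a system operates on is sketched below; it simply pushes water towards lower neighbouring cells on a regular grid and is not one of the rules evolved in the thesis.

```python
import numpy as np

def flood_step(water, terrain, rate=0.25):
    """One synchronous CA update: each cell sends a fraction of its
    free-surface head difference to each lower von Neumann neighbour.
    Periodic boundaries are a simplification of this sketch."""
    head = terrain + water
    new_water = water.copy()
    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        neigh = np.roll(head, (dy, dx), axis=(0, 1))          # neighbour's head
        flow = np.clip(head - neigh, 0.0, None) * rate        # only downhill flow
        flow = np.minimum(flow, water / 4.0)                  # never send more than is present
        new_water -= flow                                     # outflow from this cell
        new_water += np.roll(flow, (-dy, -dx), axis=(0, 1))   # inflow at the receiving cell
    return new_water
```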
13

Application of stream processing to hydraulic network solvers

24 October 2011 (has links)
M.Ing. / The aim of this research was to investigate the use of stream processing on the graphics processing unit (GPU) and to apply it to the hydraulic modelling of a water distribution system. The stream processing model was implemented and compared with the conventional sequential programming model on the CPU. The use of the GPU as a parallel processor has been widely adopted in many non-graphics applications, and the benefits of implementing parallel processing in these fields have been significant. GPUs have the capacity to perform from billions to trillions of floating-point operations per second using programmable shader programs. These great advances in GPU architecture have been driven by the gaming industry and a demand for better gaming experiences. The computational performance of the GPU is much greater than that of CPU processors. Hydraulic modelling has become vital to the construction of new water distribution systems, because such networks are complex and nonlinear in nature, and modelling makes it possible to anticipate and prevent problems in a system without physically building it. The hydraulic model used was the Gradient Method, which is the hydraulic model employed in the EPANET software package. The Gradient Method produces a linear system which is both positive-definite and symmetric. The Cholesky method is currently used in the EPANET algorithm to solve the linear equations produced by the Gradient Method. Thus, a linear solution method had to be selected for use both in parallel processing on the GPU and as a hydraulic network solver. The Conjugate Gradient algorithm was selected as it works well with the hydraulic solver and can be converted into a parallel algorithm on the GPU; it is one of the best-known iterative techniques for the solution of sparse symmetric positive-definite linear systems. The Conjugate Gradient Method was implemented both in the sequential programming model and in the stream processing model, using the CPU and the GPU respectively, on two different computer systems. The Cholesky method was also programmed in the sequential programming model on both computer systems. A comparison was made between the Cholesky and Conjugate Gradient Methods in order to evaluate the two methods relative to each other. The findings of this study show that stream processing on the GPU can be used to perform general-purpose algorithms on its parallel architecture. The results further affirm that iterative linear solution methods should only be used for large linear systems.
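The Conjugate Gradient iteration referred to above is standard; a compact CPU-side sketch follows. The matrix-vector product and the dot products inside the loop are the operations that map naturally onto the GPU's stream processing model.

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-8, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite matrix A."""
    x = np.zeros_like(b, dtype=float)
    r = b - A @ x                      # residual
    p = r.copy()                       # search direction
    rs_old = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs_old / (p @ Ap)      # optimal step length along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p  # new conjugate search direction
        rs_old = rs_new
    return x
```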
14

A multiple-precision integer arithmetic library for GPUs and its applications

Zhao, Kaiyong 01 January 2011 (has links)
No description available.
15

Performance Analysis of Hybrid CPU/GPU Environments

Smith, Michael Shawn 01 January 2010 (has links)
We present two metrics that help the performance analyst gain a unified view of application performance in a hybrid environment: GPU Computation Percentage and GPU Load Balance. We analyze the metrics using a matrix multiplication benchmark suite and a real scientific application. We also extend an experiment management system to support GPU performance data and to calculate and store our GPU Computation Percentage and GPU Load Balance metrics.
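The abstract does not give formulas for the two metrics; one plausible reading, offered here purely as an illustrative assumption, is sketched below.

```python
def gpu_computation_percentage(gpu_time, total_time):
    """Share of total wall-clock time spent in GPU computation (assumed definition)."""
    return 100.0 * gpu_time / total_time

def gpu_load_balance(per_gpu_times):
    """Ratio of mean to maximum per-GPU busy time: 1.0 means perfectly balanced,
    lower values mean one device dominates (assumed definition)."""
    return sum(per_gpu_times) / (len(per_gpu_times) * max(per_gpu_times))
```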
16

A recurrent neural network implementation using the graphics processing unit /

Moore, Christopher January 1900 (has links)
Thesis (M.S.)--Oregon State University, 2010. / Printout. Includes bibliographical references (leaves 103-104). Also available on the World Wide Web.
17

Run-time loop parallelization with efficient dependency checking on GPU-accelerated platforms

Zhang, Chenggang, 张呈刚 January 2011 (has links)
General-Purpose computing on Graphics Processing Units (GPGPU) has attracted a lot of attention recently. Exciting results have been reported in using GPUs to accelerate applications in various domains such as scientific simulations, data mining, bio-informatics and computational finance. However, up to now GPUs can only accelerate data-parallel loops with statically analyzable parallelism. Loops with dynamic parallelism (e.g., with array accesses through subscripted subscripts), an important pattern in many general-purpose applications, cannot be parallelized on GPUs using existing technologies. Run-time loop parallelization using Thread Level Speculation (TLS) has been proposed in the literature to parallelize loops with statically un-analyzable dependencies. However, most existing TLS systems are designed for multiprocessor/multi-core CPUs. GPUs differ fundamentally from CPUs in both hardware architecture and execution model, making previous TLS designs either non-functional or inefficient when ported to GPUs. This thesis presents GPU-TLS, a runtime system designed to support speculative loop parallelization on GPUs. The design of GPU-TLS addresses several key problems encountered when adapting TLS to GPUs: (1) To reduce the possibility of mis-speculation, a deferred-update memory versioning scheme is adopted to avoid mis-speculations caused by inter-iteration WAR and WAW dependencies. A technique named intra-warp value forwarding is proposed to respect some inter-iteration RAW dependencies, which further reduces the mis-speculation possibility. (2) An incremental speculative execution scheme is designed to exploit partial parallelism within loops. This avoids excessive re-executions and reduces the mis-speculation penalty. (3) The dependency checking among thousands of speculative GPU threads poses a large overhead and can easily become the performance bottleneck. To lower the overhead, we design several efficient dependency checking schemes, named PRW+BDC, SW, SR, SRW+EDC, and SRW+LDC respectively. (4) We devise a novel parallel commit scheme to avoid the overhead incurred by the serial commit phase in most existing TLS designs. We have carried out extensive experiments on two platforms with different NVIDIA GPUs, using both a synthetic loop that can simulate loops with different characteristics and several loops from real-life applications. Testing results show that the proposed intra-warp value forwarding and eager dependency checking techniques can improve the performance for almost all kinds of loop patterns. We observe that, compared with other dependency checking schemes, SR and SW achieve better performance in most cases. It is also shown that the proposed parallel commit scheme is especially useful for loops with a large write set size and a small number of inter-iteration WAW dependencies. Overall, GPU-TLS can achieve speedups ranging from 5 to 105 for loops with dynamic parallelism. / published_or_final_version / Computer Science / Master / Master of Philosophy
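As background for the dependency-checking problem described above, the following is a minimal CPU-side sketch of a read/write-set check for inter-iteration RAW violations after a speculative run; the specific schemes (PRW+BDC, SW, SR, SRW+EDC, SRW+LDC) and their GPU implementations are particular to the thesis and are not reproduced here.

```python
def first_raw_violation(read_sets, write_sets):
    """Return the earliest iteration that read an address written by an
    earlier iteration (a RAW violation under deferred-update speculation),
    or None if the speculative execution can be committed as-is.

    read_sets[i] / write_sets[i] are the sets of addresses that
    iteration i read and wrote during the speculative run.
    """
    written_so_far = set()
    for j, reads in enumerate(read_sets):
        if reads & written_so_far:      # a stale value may have been used
            return j
        written_so_far |= write_sets[j]
    return None
```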
18

Profile-guided loop parallelization and co-scheduling on GPU-based heterogeneous many-core architectures

Han, Guodong, 韩国栋 January 2013 (has links)
GPU-based heterogeneous architectures (e.g., Tianhe-1A, Nebulae), comprising multi-core CPUs and GPUs, have drawn increasing adoption and are becoming the norm in supercomputing, as they are cost-effective and power-efficient. However, programming such heterogeneous architectures still requires significant effort from application developers using sophisticated GPU programming languages such as CUDA and OpenCL. Although some automatic parallelization tools utilizing static analysis can ease the programming effort, this approach can only parallelize loops that are completely free of inter-iteration dependencies (i.e., loops determined to be DO-ALL), because of the imprecision of static analysis. To exploit the abundant runtime parallelism and take full advantage of the computing resources of both CPU and GPU, in this work we propose a new user-friendly compiler framework and runtime system which helps Java applications harness the full power of a heterogeneous system. It unveils an all-round system design unifying the programming style and language for transparent use of both CPUs and GPUs, automatically parallelizing all kinds of loops, and scheduling workloads efficiently across CPU and GPU resources while ensuring data coherence during highly-threaded execution. By means of simple user annotations, sequential Java source code is analyzed, translated and compiled into a dual executable consisting of CUDA kernels and multiple Java threads running on GPU and CPU cores respectively. Annotated loops are automatically split into loop chunks (or tasks), which are scheduled to execute on all available GPU/CPU cores. To guide the runtime task scheduling, we develop a novel dynamic loop profiler which generates the program dependency graph (PDG) and computes the density of dependencies across iterations through a hybrid checking scheme combining intra-warp and inter-warp analyses. Implementing a GPU-tailored thread-level speculation (TLS) model, our system supports speculative execution of loops with moderate dependency densities and privatization of loops having only false dependencies on the GPU side. Our scheduler also supports task stealing and task sharing algorithms that allow swift load redistribution across GPU and CPU. We have carried out several experiments to evaluate the profiling overhead, and used up to 11 real-life applications to evaluate system performance. Testing results show that the overhead is moderate compared with sequential execution and that almost all the applications can benefit from our system. / published_or_final_version / Computer Science / Master / Master of Philosophy
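A simple sketch of how a profiler might estimate the dependency density mentioned above from traced per-iteration read and write sets is given below; the thesis' PDG construction and intra-/inter-warp analyses are considerably more involved.

```python
def dependency_density(read_sets, write_sets):
    """Fraction of ordered iteration pairs (i, j), i < j, that carry a
    cross-iteration dependency (flow, anti, or output).  A scheduler can
    treat a low density as a hint that speculative parallel execution of
    the loop is worthwhile."""
    n = len(read_sets)
    pairs = n * (n - 1) // 2
    dependent = 0
    for i in range(n):
        for j in range(i + 1, n):
            flow = write_sets[i] & read_sets[j]      # read-after-write
            anti = read_sets[i] & write_sets[j]      # write-after-read
            output = write_sets[i] & write_sets[j]   # write-after-write
            if flow or anti or output:
                dependent += 1
    return dependent / pairs if pairs else 0.0
```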
19

Modelo de performance para código com desvios de execução em hardware gráfico / Performance model for code with execution branches in graphics hardware

Vasconcelos, Atila Bohlke January 2006 (has links)
The advent of Graphics Processing Units (GPUs) with programmable shaders brought a new computational model that can be used in several applications. Based on a parallel streaming architecture, current GPU generations offer vertex and fragment shaders that can drastically improve performance compared to CPU-only solutions. However, obtaining optimal performance in the highly parallel and complex GPU model with limited debugging tools is a challenging and important task. In this work we describe a simple approach to evaluate several GPU alternatives to a given solution. It consists of a performance estimation model that aims to reproduce, within acceptable errors, the measured performance of the fragment shader. We evaluate our proposal on last-generation graphics cards from NVIDIA and ATI, using synthetic benchmarks as well as a real-time graphics application case study.
