
GPU computing for cognitive robotics

Peniak, Martin January 2014 (has links)
This thesis presents the first investigation of the impact of GPU computing on cognitive robotics by providing a series of novel experiments in the area of action and language acquisition in humanoid robots and computer vision. Cognitive robotics is concerned with endowing robots with high-level cognitive capabilities to enable the achievement of complex goals in complex environments. Reaching the ultimate goal of developing cognitive robots will require tremendous amounts of computational power, which until recently was provided mostly by standard CPU processors. CPU cores are optimised for serial code execution at the expense of parallel execution, which renders them relatively inefficient for high-performance computing applications. The ever-increasing market demand for high-performance, real-time 3D graphics has evolved the GPU into a highly parallel, multithreaded, many-core processor with extraordinary computational power and very high memory bandwidth. These vast computational resources of modern GPUs can now be used by most cognitive robotics models, as they tend to be inherently parallel. Various interesting and insightful cognitive models have been developed to address important scientific questions concerning action-language acquisition and computer vision. While they have provided us with important scientific insights, their complexity and range of application have not improved much in recent years. The experimental tasks, as well as the scale of these models, are often minimised to avoid excessive training times, which grow exponentially with the number of neurons and the amount of training data. This impedes further progress towards the complex neurocontrollers that would take cognitive robotics research a step closer to the ultimate goal of creating intelligent machines. This thesis presents several cases where applying GPU computing to cognitive robotics algorithms resulted in large-scale neurocontrollers of previously unseen complexity, enabling the novel experiments described herein.
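The "inherently parallel" claim is easy to see concretely: in a neural controller, every neuron in a layer updates independently of its siblings. A minimal NumPy sketch of that structure (illustrative only, not the thesis's code; the sizes are arbitrary):

```python
import numpy as np

# Illustrative sizes for a fully connected neurocontroller layer.
n_inputs, n_neurons, batch = 1024, 1024, 128

rng = np.random.default_rng(0)
W = rng.standard_normal((n_inputs, n_neurons)) * 0.01  # weight matrix
x = rng.standard_normal((batch, n_inputs))             # batch of input patterns

# Each neuron's activation is independent of every other neuron's, so the
# whole layer update collapses into one matrix multiply: exactly the
# data-parallel workload that GPU hardware executes efficiently.
activations = np.tanh(x @ W)
print(activations.shape)  # (128, 1024)
```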

Improving Visualisation of Large Multi-Variate Datasets: New Hardware-Based Compression Algorithms and Rendering Techniques

Chernoglazov, Alexander Igorevich January 2012 (has links)
Spectral computed tomography (CT) is a novel medical imaging technique that involves simultaneously counting photons at several energy levels of the x-ray spectrum to obtain a single multi-variate dataset. Visualisation of such data poses significant challenges due to its extremely large size and the need for interactive performance for scientific and medical end-users. This thesis explores the properties of spectral CT datasets and presents two algorithms for GPU-accelerated real-time rendering from compressed spectral CT data formats. In addition, we describe an optimised implementation of a volume raycasting algorithm on modern GPU hardware, tailored to the visualisation of spectral CT data.
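For readers unfamiliar with volume raycasting, the core of such a renderer is front-to-back compositing along each ray. A minimal sketch of that idea (illustrative Python, not the thesis's optimised GPU implementation; the transfer function here is a toy):

```python
import numpy as np

def composite_ray(samples, transfer):
    """Front-to-back alpha compositing along a single ray.

    samples  -- scalar values sampled along the ray, near to far
    transfer -- maps a scalar sample to (colour, opacity)
    """
    colour, alpha = 0.0, 0.0
    for s in samples:
        c, a = transfer(s)
        colour += (1.0 - alpha) * a * c
        alpha += (1.0 - alpha) * a
        if alpha > 0.99:  # early ray termination, a standard optimisation
            break
    return colour

# Toy transfer function: denser samples are brighter and more opaque.
transfer = lambda s: (s, min(1.0, 0.1 * s))
ray_samples = np.linspace(0.0, 1.0, 64)
print(composite_ray(ray_samples, transfer))
```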

Computational Tools and Methods for Objective Assessment of Image Quality in X-Ray CT and SPECT

Palit, Robin January 2012 (has links)
Computational tools of use in the objective assessment of image quality for tomography systems were developed for central processing units (CPUs) and graphics processing units (GPUs) in the image quality lab at the University of Arizona. Fast analytic x-ray projection code called IQCT was created to compute the mean projection image for cone beam multi-slice helical computed tomography (CT) scanners. IQCT was optimized to take advantage of the massively parallel architecture of GPUs. CPU code for computing single photon emission computed tomography (SPECT) projection images was written calling upon previous research in the image quality lab. IQCT and the SPECT modeling code were used to simulate data for multimodality SPECT/CT observer studies. The purpose of these observer studies was to assess the benefit in image quality of using attenuation information from a CT measurement in myocardial SPECT imaging. The observer chosen for these studies was the scanning linear observer. The tasks for the observer were localization of a signal and estimation of the signal radius. For the localization study, area under the localization receiver operating characteristic curve (A(LROC)) was computed as A(LROC)^Meas = 0.89332 ± 0.00474 and A(LROC)^No = 0.89408 ± 0.00475, where "Meas" implies the use of attenuation information from the CT measurement, and "No" indicates the absence of attenuation information. For the estimation study, area under the estimation receiver operating characteristic curve (A(EROC)) was quantified as A(EROC)^Meas = 0.55926 ± 0.00731 and A(EROC)^No = 0.56167 ± 0.00731. Based on these results, it was concluded that the use of CT information did not improve the scanning linear observer's ability to perform the stated myocardial SPECT tasks. The risk to the patient of the CT measurement was quantified in terms of excess effective dose as 2.37 mSv for males and 3.38 mSv for females. Another image quality tool generated within this body of work was a singular value decomposition (SVD) algorithm to reduce the dimension of the eigenvalue problem for tomography systems with rotational symmetry. Agreement between the results of this reduced-dimension SVD algorithm and those of a standard SVD algorithm is shown for a toy problem. The use of SVD toward image quality metrics, such as the measurement and null spaces, is also presented.
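The scanning linear observer used in these studies applies a linear template at every candidate location and reports the location of the peak response. A brute-force sketch of that idea (illustrative Python, not the lab's tooling; the image, template, and planted signal here are synthetic):

```python
import numpy as np

def scanning_linear_observer(image, template):
    """Slide a linear template over the image; return the peak response
    and its location. A minimal stand-in for the localisation task: the
    observer reports the signal wherever the template response peaks."""
    ih, iw = image.shape
    th, tw = template.shape
    best, best_pos = -np.inf, (0, 0)
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            resp = np.sum(image[r:r + th, c:c + tw] * template)
            if resp > best:
                best, best_pos = resp, (r, c)
    return best, best_pos

rng = np.random.default_rng(1)
img = rng.normal(size=(64, 64))
g = np.arange(7) - 3
sig = np.exp(-(g[:, None] ** 2 + g[None, :] ** 2) / 4.0)  # Gaussian signal
img[20:27, 30:37] += 2.0 * sig                            # plant the signal
print(scanning_linear_observer(img, sig)[1])              # near (20, 30)
```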

An Embedded Shading Language

Qin, Zheng January 2004 (has links)
Modern graphics accelerators have embedded programmable components in the form of vertex and fragment shading units. Current APIs permit specification of the programs for these components using an assembly-language level interface. Compilers for high-level shading languages are available but these read in an external string specification, which can be inconvenient. It is possible, using standard C++, to define an embedded high-level shading language. Such a language can be nearly indistinguishable from a special-purpose shading language, yet permits more direct interaction with the specification of textures and parameters, simplifies implementation, and enables on-the-fly generation, manipulation, and specification of shader programs. An embedded shading language also permits the lifting of C++ host language type, modularity, and scoping constructs into the shading language without any additional implementation effort.
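The thesis's technique is C++-specific, but the core trick (overloaded operators that record an expression tree instead of computing values, so the tree can later be emitted as shader code) can be sketched in any language with operator overloading. A toy Python analogue, with all names illustrative:

```python
class Expr:
    """Tiny expression-tree node: overloaded operators record the
    computation instead of performing it, so the tree can later be
    'compiled' -- here, simply pretty-printed as shader-like source."""
    def __init__(self, op, *args):
        self.op, self.args = op, args
    def __add__(self, other): return Expr('+', self, other)
    def __mul__(self, other): return Expr('*', self, other)
    def emit(self):
        if self.op == 'var':
            return self.args[0]
        a, b = (x.emit() for x in self.args)
        return f'({a} {self.op} {b})'

def var(name): return Expr('var', name)

# Host-language variables *are* shader expressions, so shader programs
# can be generated and manipulated on the fly from ordinary code.
normal, light = var('normal'), var('light')
diffuse = normal * light + var('ambient')
print(diffuse.emit())  # ((normal * light) + ambient)
```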

Type-2 fuzzy alpha-cuts

Hamrawi, Hussam January 2011 (has links)
Systems that utilise type-2 fuzzy sets to handle uncertainty have not been implemented in real world applications, unlike the astonishing number of applications involving standard fuzzy sets. The main reason behind this is the complex mathematical nature of type-2 fuzzy sets, which is the source of two major problems. On one hand, it is difficult to mathematically manipulate type-2 fuzzy sets, and on the other, the computational cost of processing and performing operations using these sets is very high. Most of the current research carried out on type-2 fuzzy logic concentrates on finding mathematical means to overcome these obstacles. One way of accomplishing the first task is to develop a meaningful mathematical representation of type-2 fuzzy sets that allows functions and operations to be extended from well known mathematical forms to type-2 fuzzy sets. To this end, this thesis presents a novel alpha-cut representation theorem to be this meaningful mathematical representation. It is the decomposition of a type-2 fuzzy set into a number of classical sets. The alpha-cut representation theorem is the main contribution of this thesis. This dissertation also presents a methodology to allow functions and operations to be extended directly from classical sets to type-2 fuzzy sets. A novel alpha-cut extension principle is presented in this thesis and used to define uncertainty measures and arithmetic operations for type-2 fuzzy sets. Throughout this investigation, a plethora of concepts and definitions have been developed for the first time in order to make the manipulation of type-2 fuzzy sets a simple and straightforward task. Worked examples are used to demonstrate the usefulness of these theorems and methods. Finally, the crisp alpha-cuts of this fundamental decomposition theorem are by definition independent of each other. This dissertation shows that operations on type-2 fuzzy sets using the alpha-cut extension principle can be processed in parallel. This feature is found to be extremely powerful, especially when performing computations on massively parallel graphics processing units. This thesis explores this capability and shows through different experiments that significant reductions in processing time can be achieved.
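The parallelism claim rests on the cuts being mutually independent. A sketch of the idea using ordinary (type-1) alpha-cuts on a discretised fuzzy set, which is where the type-2 construction starts (illustrative Python; the domain and membership grades are arbitrary):

```python
import numpy as np

# A discretised fuzzy set: membership grades over a numeric domain.
domain = np.linspace(0, 10, 11)
mu = np.array([0.0, 0.2, 0.5, 0.8, 1.0, 1.0, 0.8, 0.5, 0.2, 0.0, 0.0])

def alpha_cut(domain, mu, alpha):
    """The alpha-cut is the crisp set of points with membership >= alpha."""
    return domain[mu >= alpha]

# Each cut depends only on (domain, mu, alpha) and on no other cut, so all
# levels can be computed independently -- the property that makes the
# decomposition amenable to massively parallel (GPU) processing.
for a in [0.2, 0.5, 0.8, 1.0]:
    cut = alpha_cut(domain, mu, a)
    print(f"alpha={a}: [{cut.min()}, {cut.max()}]")
```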

High performance bioinformatics and computational biology on general-purpose graphics processing units

Ling, Cheng January 2012 (has links)
Bioinformatics and Computational Biology (BCB) is a relatively new multidisciplinary field which brings together many aspects of the fields of biology, computer science, statistics, and engineering. Bioinformatics extracts useful information from biological data and makes it more intuitive and understandable by applying principles of information sciences, while computational biology harnesses computational approaches and technologies to answer biological questions conveniently. Recent years have seen an explosion in the size of biological data at a rate which outpaces increases in the computational power of mainstream computer technologies, namely general-purpose processors (GPPs). The aim of this thesis is to explore the use of off-the-shelf Graphics Processing Unit (GPU) technology in the high-performance and efficient implementation of BCB applications, in order to meet the demands of growing biological data at affordable cost. The thesis presents detailed designs and implementations of GPU solutions for a number of BCB algorithms in two widely used BCB applications, namely biological sequence alignment and phylogenetic analysis. Biological sequence alignment can be used to determine potential information about a newly discovered biological sequence from other well-known sequences through similarity comparison. Phylogenetic analysis, on the other hand, is concerned with the investigation of the evolution of and relationships among organisms, and has many uses in the fields of systems biology and comparative genomics. In molecular-based phylogenetic analysis, the relationship between species is estimated by inferring the common history of their genes, and phylogenetic trees are then constructed to illustrate evolutionary relationships among genes and organisms. However, both biological sequence alignment and phylogenetic analysis are computationally expensive applications, as their computing and memory requirements grow polynomially or even worse with the size of sequence databases. The thesis first presents a multi-threaded parallel design of the Smith-Waterman (SW) algorithm alongside an implementation on NVIDIA GPUs. A novel technique is put forward to remove the restriction on the length of the query sequence found in previous GPU-based implementations of the SW algorithm. Based on this implementation, the difference between the two main task parallelisation approaches (inter-task and intra-task parallelisation) is presented. The resulting GPU implementation matches the speed of existing GPU implementations while providing more flexibility, i.e. flexible sequence lengths in real-world applications. It also outperforms an equivalent GPP-based implementation by 15x-20x. After this, the thesis presents the first reported multi-threaded design and GPU implementation of the Gapped BLAST with Two-Hit method algorithm, which is widely used for aligning biological sequences heuristically. This achieved up to a 3x speed-up compared to the most optimised GPP implementations. The thesis then presents a multi-threaded design and GPU implementation of a Neighbor-Joining (NJ)-based method for phylogenetic tree construction and multiple sequence alignment (MSA). This achieves an 8x-20x speed-up compared to an equivalent GPP implementation based on the widely used ClustalW software. The NJ method, however, only gives one possible tree, which strongly depends on the evolutionary model used.
A more advanced method uses maximum likelihood (ML) for scoring phylogenies with Markov Chain Monte Carlo (MCMC)-based Bayesian inference. The latter was the subject of another multi-threaded design and GPU implementation presented in this thesis, which achieved a 4x-8x speed-up compared to an equivalent GPP implementation based on the widely used MrBayes software. Finally, the thesis presents a general evaluation of the designs and implementations achieved in this work as a step towards the evaluation of GPU technology in BCB computing, in the context of other computer technologies including GPPs and Field Programmable Gate Array (FPGA) technology.
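The intra-task parallelisation of Smith-Waterman rests on a structural property of its recurrence: every cell on an anti-diagonal of the scoring matrix depends only on the two preceding anti-diagonals, so all cells of one anti-diagonal can be computed simultaneously. A serial Python sketch that makes this wavefront ordering explicit (the scoring constants are arbitrary; a GPU version would assign the inner loop to threads):

```python
import numpy as np

def sw_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local-alignment scoring with a linear gap penalty.
    Cells are filled anti-diagonal by anti-diagonal: every cell on one
    anti-diagonal depends only on the two previous diagonals, so all of
    its cells are mutually independent (the intra-task GPU scheme)."""
    n, m = len(a), len(b)
    H = np.zeros((n + 1, m + 1), dtype=int)
    for d in range(2, n + m + 1):                 # anti-diagonal index i + j
        for i in range(max(1, d - m), min(n, d - 1) + 1):  # parallelisable
            j = d - i
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i, j] = max(0, H[i - 1, j - 1] + s,
                          H[i - 1, j] + gap, H[i, j - 1] + gap)
    return H.max()

print(sw_score("GGTTGACTA", "TGTTACGG"))  # best local alignment score
```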

Optimisation and computational methods to model the oculomotor system with focus on nystagmus

Avramidis, Eleftherios January 2015 (has links)
Infantile nystagmus is a condition that causes involuntary, bilateral and conjugate oscillations of the eyes, which are predominantly restricted to the horizontal plane. In order to investigate the cause of nystagmus, computational models and nonlinear dynamics techniques have been used to model and analyse the oculomotor system. Computational models are important in making predictions and creating a quantitative framework for the analysis of the oculomotor system. Parameter estimation is a critical step in the construction and analysis of these models. A nonlinear dynamics model proposed by Broomhead et al. [1], with a preliminary parameter estimation, has been shown to be able to simulate both normal rapid eye movements (i.e. saccades) and nystagmus oscillations. The application of nonlinear analysis to experimental jerk nystagmus recordings has shown that the local dimension of the oscillation varies across the phase angle of the nystagmus cycle. It has been hypothesised that this is due to the impact of signal-dependent noise (SDN) on the neural commands in the oculomotor system. The main aims of this study were: (i) to develop parameter estimation methods for the Broomhead et al. [1] model in order to explore its predictive capacity by fitting it to experimental recordings of nystagmus waveforms and saccades; (ii) to develop a stochastic oculomotor model and examine the hypothesis that noise on the neural commands could be the cause of the behavioural characteristics measured from experimental nystagmus time series using nonlinear analysis techniques. In this work, two parameter estimation methods were developed, one for fitting the model to experimental nystagmus waveforms and one for fitting it to saccades. Using the former method, we successfully fitted the model to experimental nystagmus waveforms, which allowed us to find the specific parameter values that set the model to generate those waveforms. The waveform types we successfully fitted were asymmetric pseudo-cycloid, jerk, and jerk with extended foveation; fits to other types of nystagmus waveform were not examined in this work. The results also showed which waveforms the model can generate almost perfectly, and identified the characteristics of a number of jerk waveforms that it cannot exactly generate, namely jerk nystagmus waveforms with a very extreme fast phase. The latter parameter estimation method allowed us to explore whether the model can generate horizontal saccades of different amplitudes with the same behaviour as observed experimentally. The results suggest that the model can generate the experimental saccadic velocity profiles of different saccadic amplitudes; however, the best fits to the experimental data were obtained when different model parameter values were used for each saccadic amplitude. Our parameter estimation methods are based on multi-objective genetic algorithms (MOGA), which have the advantage of optimising biological models over a multi-objective, high-dimensional and complex search space. However, integrating these models for a wide range of parameter combinations is very computationally intensive on a single central processing unit (CPU). To overcome this obstacle, we accelerated the parameter estimation method by utilising the parallel capabilities of a graphics processing unit (GPU); depending on the GPU model, this provided a speedup of up to 30x compared to a midrange CPU.
The stochastic model that we developed is based on the Broomhead et al. [1] model, with signal-dependent noise (SDN) and constant noise (CN) added to the neural commands. We fitted the stochastic model to saccades and to jerk nystagmus waveforms. It was found that SDN and CN can produce variability in the local dimension of the oscillation similar to that found in the experimental jerk nystagmus waveforms and, in the case of saccade generation, can reproduce the saccadic variability recorded experimentally. However, there are small differences in the simulated behaviour compared to the experimental nystagmus data, which we hypothesise are caused by the model's inability to simulate exactly some key jerk waveform characteristics. Moreover, the differences between the simulations and the experimental nystagmus waveforms indicate that the proposed model requires further expansion, which could include other oculomotor subsystem(s).
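The GPU speedup comes from the structure of MOGA fitness evaluation: each candidate parameter vector requires an independent integration of the model, so the population dimension is embarrassingly parallel. A sketch of that structure, with a toy damped oscillator standing in for the oculomotor model (illustrative Python; these are not the Broomhead et al. [1] equations):

```python
import numpy as np

def simulate_population(params, x0=1.0, v0=0.0, dt=1e-3, steps=2000):
    """Integrate a toy damped oscillator x'' = -k*x - c*x' for a whole
    population of (k, c) candidates at once. Each candidate's integration
    is independent of the others, so the population axis vectorises and,
    on a GPU, maps directly onto parallel threads."""
    k, c = params[:, 0], params[:, 1]
    x = np.full_like(k, x0)
    v = np.full_like(k, v0)
    for _ in range(steps):  # explicit Euler step, applied per candidate
        x, v = x + dt * v, v + dt * (-k * x - c * v)
    return x

rng = np.random.default_rng(2)
population = rng.uniform([1.0, 0.1], [10.0, 1.0], size=(64, 2))  # 64 candidates
final_states = simulate_population(population)
print(final_states.shape)  # (64,) -- one trajectory endpoint per candidate
```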

High Performance Portability with RAJA and Agency

Obermiller, Dan 01 January 2017 (has links)
High performance and scientific computing take advantage of high-end and high-spec computer architectures. As these architectures evolve, and new architectures are created, applications may be able to run at greater and greater speeds. These changes present challenges to implementers who wish to take advantage of the newest features and machines. Portability layers such as RAJA and Agency seek to abstract away machine-specific details and allow scientists to take advantage of new features as they become available. We enhance RAJA with a lower-level framework, Agency, to determine if these layered abstractions provide performance or maintainability benefits.
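The idea behind such portability layers can be sketched compactly: the loop body is written once against an abstract forall, and the execution policy is a separate, swappable argument. A toy Python analogue of the RAJA-style pattern (RAJA and Agency themselves are C++; all names here are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def forall_serial(n, body):
    for i in range(n):
        body(i)

def forall_threaded(n, body, workers=4):
    with ThreadPoolExecutor(workers) as pool:
        list(pool.map(body, range(n)))

def saxpy(policy, a, x, y):
    """The kernel is written once against an abstract forall; the
    execution policy (serial, threaded, or GPU in RAJA's case) is a
    separate, swappable choice -- the portability-layer idea."""
    def body(i):
        y[i] = a * x[i] + y[i]
    policy(len(x), body)

x, y = [1.0] * 8, [2.0] * 8
saxpy(forall_serial, 3.0, x, y)    # y[i] becomes 5.0
saxpy(forall_threaded, 3.0, x, y)  # same kernel, new policy: y[i] becomes 8.0
print(y[0])
```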

True random number generation using genetic algorithms on high performance architectures

Mijares Chan, Jose Juan 01 September 2016 (has links)
Many real-world applications use random numbers generated by pseudo-random number generators and true random number generators (TRNGs). Unlike pseudo-random number generators, which rely on an input seed to generate random numbers, a TRNG relies on a non-deterministic source to generate aperiodic random numbers. In this research, we develop a novel and generic software-based TRNG using a random source extracted from today's compute architectures. We show that non-deterministic events such as race conditions between compute threads follow a near-Gamma distribution, independent of the architecture, multi-cores or co-processors. Our design improves the distribution towards a uniform distribution, ensuring the stationarity of the sequence of random variables. We address the statistical deficiencies of the random numbers with a post-processing stage based on a heuristic evolutionary algorithm. Our post-processing algorithm is composed of two phases: (i) histogram specification and (ii) stationarity enforcement. We propose two techniques for histogram equalization, Exact Histogram Equalization (EHE) and Adaptive EHE (AEHE), that map the distribution of the random numbers to a user-specified distribution. EHE is an offline algorithm with O(N log N) complexity; AEHE is an online algorithm that improves performance using a sliding window and achieves O(N). Both algorithms ensure a normalized entropy in (0.95, 1.0]. The stationarity enforcement phase uses genetic algorithms to mitigate the statistical deficiencies remaining after histogram equalization by permuting the random numbers until wide-sense stationarity is achieved. By measuring the standard deviation of the power spectral density, we ensure that the quality of the numbers generated by the genetic algorithms is within the level of error specified by the user. We develop two algorithms: a naive algorithm with an expected exponential complexity of E[O(e^N)], and an accelerated FFT-based algorithm with an expected quadratic complexity of E[O(N^2)]. The accelerated FFT-based algorithm exploits the parallelism found in genetic algorithms on a homogeneous multi-core cluster. We evaluate the effects of scalability and data size on a standardized battery of tests, TestU01, finding the tuning parameters that ensure wide-sense stationarity on long runs.
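Exact histogram specification of the kind described can be sketched in a few lines: sort-rank the samples (the O(N log N) step) and map each rank onto the target distribution's quantiles. An illustrative Python sketch (not the thesis's implementation; the Gamma-like input mimics the reported race-condition timings):

```python
import numpy as np

def exact_histogram_specification(samples, target_quantile):
    """Exact histogram specification: rank the samples via a sort (the
    O(N log N) step) and replace each sample by the target distribution's
    quantile at its normalised rank. Output order matches input order."""
    n = len(samples)
    ranks = np.empty(n, dtype=int)
    ranks[np.argsort(samples, kind='stable')] = np.arange(n)
    u = (ranks + 0.5) / n  # normalised ranks in (0, 1)
    return target_quantile(u)

# Raw numbers with a skewed, roughly Gamma-like distribution, as the
# thesis reports for race-condition timing measurements.
rng = np.random.default_rng(3)
raw = rng.gamma(shape=2.0, scale=1.0, size=10000)
uniform = exact_histogram_specification(raw, lambda u: u)  # target: U(0, 1)
print(uniform.min(), uniform.max())  # ~0.00005 and ~0.99995
```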

GPU-Based Visualisation of Viewshed from Roads or Areas in a 3D Environment

Heilmair, Christoph January 2016 (has links)
Viewshed refers to the calculation and visualisation of which part of a terrain is visible from a given observer point. It is used within many fields, such as military planning or telecommunication tower placement. So far, no general fast methods exist for calculating the viewshed for multiple observers that may, for instance, represent a road within the terrain. Additionally, if the terrain contains overlapping structures, such as man-made constructions like bridges, most current viewshed algorithms fail. This report describes two novel methods for viewshed calculation using multiple observers for terrain that may contain overlapping structures. The methods have been developed at Vricon in Linköping as a Master's Thesis project. Both methods are implemented using the graphics processing unit and the OpenGL graphics library, using a computer graphics approach. Results are presented in the form of figures and images, as well as running-time tables using two different test setups. Lastly, possible future improvements are also discussed. The results show that the first method is a viable real-time solution and that the second method requires some additional work.
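Underlying any viewshed computation is a line-of-sight test against the terrain. A minimal heightfield sketch of that test (illustrative Python; the report's methods go further, handling multiple observers and the overlapping structures that a plain heightfield cannot represent):

```python
import numpy as np

def visible(height, obs, target, eye=2.0, samples=64):
    """Line-of-sight test on a 2.5D heightfield: the target cell is
    visible if no terrain sample along the ray rises above the sight
    line from the observer's eye point to the target."""
    (r0, c0), (r1, c1) = obs, target
    h0 = height[r0, c0] + eye
    h1 = height[r1, c1]
    for t in np.linspace(0.0, 1.0, samples)[1:-1]:
        r = r0 + t * (r1 - r0)
        c = c0 + t * (c1 - c0)
        terrain = height[int(round(r)), int(round(c))]
        if terrain > h0 + t * (h1 - h0):  # terrain pokes above the sight line
            return False
    return True

terrain = np.zeros((64, 64))
terrain[30:34, :] = 10.0                     # a ridge across the map
print(visible(terrain, (10, 10), (50, 50)))  # False: the ridge blocks the ray
```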
