31

Scalability of fixed-radius searching in meshless methods for heterogeneous architectures

Pols, LeRoi Vincent 12 1900 (has links)
Thesis (MEng)--Stellenbosch University, 2014. / ENGLISH ABSTRACT: In this thesis we set out to design an algorithm for solving the all-pairs fixed-radius nearest-neighbours search problem on a massively parallel heterogeneous system. The all-pairs search problem is stated as follows: given a set of N points in d-dimensional space, find all pairs of points within a horizon distance of one another. This search is required by any nonlocal or meshless numerical modelling method to construct the neighbour list of each mesh point in the problem domain, so this work is applicable to a wide variety of fields, ranging from molecular dynamics to pattern recognition and geographical information systems. Here we focus on nonlocal solid mechanics methods. The basic method of solving the all-pairs search is to calculate, for each mesh point, the distance to every other mesh point and compare it with the horizon value to determine whether the points are neighbours. This can be a very computationally intensive procedure, especially if the neighbourhood needs to be updated at every time step to account for changes in the material configuration. The problem also becomes more complex if the analysis is done in parallel. Furthermore, GPU computing has become very popular in the last decade: most of the fastest supercomputers in the world today employ GPUs as accelerators alongside CPUs, and the next-generation exascale supercomputers are expected to be heterogeneous. The focus is therefore on developing a neighbour-searching algorithm that takes advantage of next-generation hardware. In this thesis we propose a combined CPU and multi-GPU algorithm, an extension of the fixed-grid method, for the fixed-radius nearest-neighbours search on massively parallel systems.
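The fixed-grid method mentioned in the abstract can be sketched in a few lines: points are binned into cells whose side equals the horizon, so each point only needs to be compared against points in its own and adjacent cells. The Python sketch below is illustrative only and is not the CPU/multi-GPU implementation developed in the thesis; the function name and parameters are invented for the example.

```python
# Sketch of the fixed-grid (cell list) approach to the all-pairs fixed-radius
# search: bin points into cells of side `horizon`, then compare each point only
# against points in its own and neighbouring cells.  Illustrative only; the
# thesis extends this idea to a combined CPU/multi-GPU setting.
import numpy as np
from collections import defaultdict
from itertools import product

def fixed_radius_neighbours(points, horizon):
    """Return {i: [j, ...]} for all pairs of points within `horizon` of each other."""
    cells = defaultdict(list)
    for i, p in enumerate(points):
        cells[tuple((p // horizon).astype(int))].append(i)

    neighbours = defaultdict(list)
    offsets = list(product((-1, 0, 1), repeat=points.shape[1]))
    for cell, members in cells.items():
        for off in offsets:
            other = tuple(c + o for c, o in zip(cell, off))
            for i in members:
                for j in cells.get(other, ()):
                    if i < j and np.linalg.norm(points[i] - points[j]) <= horizon:
                        neighbours[i].append(j)
                        neighbours[j].append(i)
    return neighbours

pts = np.random.rand(1000, 3)          # hypothetical mesh points in the unit cube
nbrs = fixed_radius_neighbours(pts, horizon=0.1)
```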
32

Proteins, anatomy and networks of the fruit fly brain

Knowles-Barley, Seymour Francis January 2012 (has links)
Our understanding of the complexity of the brain is limited by the data we can collect and analyze. Because of experimental limitations and a desire for greater detail, most investigations focus on just one aspect of the brain. For example, brain function can be studied at many levels of abstraction including, but not limited to, gene expression, protein interactions, anatomical regions, neuronal connectivity, synaptic plasticity, and the electrical activity of neurons. By focusing on each of these levels, neuroscience has built up a detailed picture of how the brain works, but each level is understood mostly in isolation from the others. It is likely that interaction between all these levels is just as important. Therefore, a key hypothesis is that functional units spanning multiple levels of biological organization exist in the brain. This project attempted to combine neuronal circuitry analysis with functional proteomics and anatomical regions of the brain to explore this hypothesis, and took an evolutionary view of the results obtained. During the process we had to solve a number of technical challenges, as the tools to undertake this type of research did not exist. Two informatics challenges for this research were to develop ways of analyzing neurobiological data, such as brain protein expression patterns, to extract useful information, and to share and present these data in a way that is fast and easy for anyone to access. This project contributes towards a more holistic understanding of the fruit fly brain in three ways. Firstly, a screen was conducted to record the expression of proteins in the brain of the fruit fly, Drosophila melanogaster. Protein expression patterns in the fruit fly brain were recorded from 535 protein trap lines using confocal microscopy. A total of 884 3D images were annotated and made available in an easy-to-use website database, BrainTrap, available at fruitfly.inf.ed.ac.uk/braintrap. The website allows 3D images of the protein expression to be viewed interactively in the web browser, and an ontology-based search tool allows users to search for protein expression patterns in specific areas of interest. Different expression patterns mapped to a common template can be viewed simultaneously in multiple colours. These data bridge the gap between the anatomical and biomolecular levels of understanding. Secondly, protein trap expression patterns were used to investigate the properties of the fruit fly brain. Thousands of protein-protein interactions have been recorded by methods such as yeast two-hybrid; however, many of these protein pairs are not expressed in the same regions of the fruit fly brain. Using 535 protein expression patterns it was possible to rule out 149 protein-protein interactions. Also, protein expression patterns registered against a common template brain were used to produce new anatomical breakdowns of the fruit fly brain. Clustering techniques were able to naturally segment brain regions based only on the protein expression data. This is just one example of how, by combining proteomics with anatomy, we were able to learn more about both levels of understanding. Results are analysed further in combination with networks such as genetic homology networks and connectivity networks. We show how the wealth of biological and neuroscience data now available in public databases can be combined with the BrainTrap data to reveal similarities between areas of the fruit fly and mammalian brain. The BrainTrap data also inform us about the process of evolution: we show that genes found in fruit fly, yeast and mouse are more likely to be expressed generally throughout the brain, whereas genes found only in fruit fly and mouse, but not yeast, are more likely to have a specific expression pattern in the fruit fly brain. Thus, by combining data from multiple sources we can gain further insight into the complexity of the brain. Neural connectivity data are also analyzed, and a new enhanced-motifs technique is developed for the combined analysis of connectivity data with other information such as neuron type data and, potentially, protein expression data. Thirdly, I investigated techniques for imaging the protein trap lines at higher resolution using electron microscopy (EM) and developed new informatics techniques for the automated analysis of neural connectivity data collected from serial section transmission electron microscopy (ssTEM). Measurement of the connectivity between neurons requires high-resolution imaging techniques, such as electron microscopy, and images produced by this method are currently annotated manually to produce very detailed maps of cell morphology and connectivity. This is an extremely time-consuming process, and the volume of tissue and the number of neurons that can be reconstructed are severely limited by the annotation step. I developed a set of computer vision algorithms to improve the alignment between consecutive images and to perform partial annotation automatically by detecting membranes, synapses and mitochondria present in the images. The accuracy of the automatic annotation was evaluated on a small dataset: 96% of membrane could be identified at the cost of 13% false positives. This research demonstrates that informatics technology can help us to automatically analyze biological images and bring together genetic, anatomical, and connectivity data in a meaningful way. This combination of multiple data sources reveals more detail about each individual level of understanding, and gives us a more holistic view of the fruit fly brain.
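As a toy illustration of the expression-based segmentation idea described above, per-voxel expression profiles registered to a common template could be clustered directly. The snippet below is a hypothetical sketch: the array shapes, the random data and the choice of k-means are assumptions for illustration, not the clustering procedure used in the thesis; only the number of protein trap lines (535) comes from the text.

```python
# Hypothetical sketch: segmenting a template brain by clustering per-voxel
# protein expression profiles, in the spirit of the expression-based anatomical
# breakdowns described above.  Shapes and random data are stand-ins.
import numpy as np
from sklearn.cluster import KMeans

n_voxels, n_lines = 5000, 535
expression = np.random.rand(n_voxels, n_lines)    # stand-in for registered image data

# Each voxel is described by its expression vector across all protein trap lines;
# voxels with similar profiles are grouped into candidate anatomical regions.
labels = KMeans(n_clusters=30, n_init=10, random_state=0).fit_predict(expression)
print(np.bincount(labels))                        # voxels per putative region
```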
33

Parallel Sorting on the Heterogeneous AMD Fusion Accelerated Processing Unit

Delorme, Michael Christopher 18 March 2013 (has links)
We explore efficient parallel radix sort for the AMD Fusion Accelerated Processing Unit (APU). Two challenges arise: efficiently partitioning data between the CPU and GPU, and allocating data among memory regions. Our coarse-grained implementation utilizes both the GPU and CPU by sharing data at the beginning and end of the sort. Our fine-grained implementation utilizes the APU’s integrated memory system to share data throughout the sort. Both implementations outperform the current state-of-the-art GPU radix sort from NVIDIA. We therefore demonstrate that the CPU can be used efficiently to speed up radix sort on the APU. Our fine-grained implementation slightly outperforms our coarse-grained implementation, demonstrating the benefit of the APU’s integrated architecture. This performance benefit is, however, hindered by limitations in the APU’s architecture and programming model. We believe that the performance benefits will increase once these limitations are addressed in future generations of the APU.
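For reference, the serial skeleton of a least-significant-digit radix sort, the algorithm being parallelized here, looks roughly as follows. This is a plain Python/NumPy sketch of the per-digit counting and scatter passes, not the APU implementation described in the abstract.

```python
# Serial least-significant-digit radix sort over 8-bit digits, illustrating the
# counting and scatter passes that a parallel implementation would split between
# the CPU and GPU.  This is a plain sketch, not the thesis implementation.
import numpy as np

def radix_sort_u32(keys, bits_per_pass=8):
    keys = keys.astype(np.uint32)
    mask = (1 << bits_per_pass) - 1
    for shift in range(0, 32, bits_per_pass):
        digits = (keys >> shift) & mask
        counts = np.bincount(digits, minlength=mask + 1)
        offsets = np.cumsum(counts) - counts          # exclusive prefix sum
        out = np.empty_like(keys)
        for k, d in zip(keys, digits):                # stable scatter by digit
            out[offsets[d]] = k
            offsets[d] += 1
        keys = out
    return keys

data = np.random.randint(0, 2**32, size=1000, dtype=np.int64).astype(np.uint32)
assert np.array_equal(radix_sort_u32(data), np.sort(data))
```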
34

A model of dynamic compilation for heterogeneous compute platforms

Kerr, Andrew 10 December 2012 (has links)
Trends in computer engineering place renewed emphasis on increasing parallelism and heterogeneity. The rise of parallelism adds an additional dimension to the challenge of portability, as different processors support different notions of parallelism, whether vector parallelism executing in a few threads on multicore CPUs or large-scale thread hierarchies on GPUs. Thus, software faces obstacles to portability and efficient execution beyond differences in instruction sets; rather, the underlying execution models of radically different architectures may not be compatible. Dynamic compilation applied to data-parallel heterogeneous architectures provides an abstraction layer decoupling program representations from optimized binaries, thus enabling portability without encumbering performance. This dissertation proposes several techniques that extend dynamic compilation to data-parallel execution models. These contributions include:
- characterization of data-parallel workloads
- machine-independent application metrics
- a framework for performance modeling and prediction
- execution model translation for vector processors
- region-based compilation and scheduling
We evaluate these claims via the development of a novel dynamic compilation framework, GPU Ocelot, with which we execute real-world workloads from GPU computing. This enables GPU computing workloads to run efficiently on multicore CPUs, GPUs, and a functional simulator. We show that data-parallel workloads exhibit performance scaling, take advantage of vector instruction set extensions, and effectively exploit data locality via scheduling that attempts to maximize control locality.
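Execution model translation is easiest to picture with a toy example: a kernel written against a CUDA-like thread hierarchy can be run on a multicore CPU by serializing the threads within each block and distributing blocks over worker threads. The sketch below only illustrates that idea; it bears no relation to GPU Ocelot's actual PTX translation pipeline, and all names are invented.

```python
# Toy illustration of execution model translation: a kernel written against a
# CUDA-like thread hierarchy is run on a multicore CPU by serializing threads
# within each block and spreading blocks over worker threads.  This only sketches
# the idea and is unrelated to GPU Ocelot's actual PTX translation pipeline.
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def saxpy_kernel(block_idx, thread_idx, block_dim, a, x, y, out):
    i = block_idx * block_dim + thread_idx        # global thread index, as on a GPU
    if i < len(x):
        out[i] = a * x[i] + y[i]

def launch_on_cpu(kernel, grid_dim, block_dim, *args, workers=4):
    def run_block(b):
        for t in range(block_dim):                # serialize the threads of one block
            kernel(b, t, block_dim, *args)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(run_block, range(grid_dim)))

n = 1000
x, y, out = np.random.rand(n), np.random.rand(n), np.zeros(n)
launch_on_cpu(saxpy_kernel, (n + 255) // 256, 256, 2.0, x, y, out)
assert np.allclose(out, 2.0 * x + y)
```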
36

A GPU Accelerated Tensor Spectral Method for Subspace Clustering

Pai, Nithish January 2016 (has links) (PDF)
In this thesis we consider the problem of clustering data lying in a union of subspaces using spectral methods. Though the data generated may have high dimensionality, in many applications, such as motion segmentation and illumination-invariant face clustering, the data reside in a union of subspaces of small dimension. Furthermore, for a number of classification and inference problems, it is often useful to identify these subspaces and work with the data in this smaller-dimensional manifold. If the observations in each cluster were distributed around a centroid, applying spectral clustering to an affinity matrix built using distance-based similarity measures between the data points would solve the problem successfully. But it has been observed that using such pairwise distance-based measures between the data points to construct a similarity matrix is not sufficient to solve the subspace clustering problem. Hence, a major challenge is to find a similarity measure that can capture information about the subspace in which the data lie. This is the motivation to develop methods that use an affinity tensor obtained by calculating similarity between multiple data points. One can then use spectral methods on these tensors to solve the subspace clustering problem. In order to keep the algorithm computationally feasible, one can employ column sampling strategies. However, the computational cost of performing the tensor factorization increases very quickly with the sampling rate. Fortunately, advances in GPU computing have made it possible to perform many linear algebra operations several orders of magnitude faster than with traditional CPU and multicore computing. In this work, we develop parallel algorithms for subspace clustering in a GPU computing environment. We show that this gives a significant speedup over the implementations on the CPU, which allows us to sample a larger fraction of the tensor and thereby achieve better accuracies. We empirically analyze the performance of these algorithms on a number of synthetically generated subspace configurations. We finally demonstrate the effectiveness of these algorithms on motion segmentation, handwritten digit clustering and illumination-invariant face clustering, and show that their performance is comparable with state-of-the-art approaches.
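The pairwise-affinity baseline that the abstract argues is insufficient for subspace-structured data can be written compactly. The sketch below shows that standard pipeline (affinity matrix, Laplacian eigenvectors, k-means) with NumPy and scikit-learn, using illustrative parameters and toy data; the thesis instead builds a sampled affinity tensor over multiple points and factorizes it on the GPU.

```python
# Baseline pairwise-affinity spectral clustering: the pipeline the abstract
# contrasts against (affinity matrix -> Laplacian eigenvectors -> k-means).
# Sigma and the toy data here are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(X, n_clusters, sigma=1.0):
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-d2 / (2 * sigma ** 2))            # Gaussian (distance-based) affinity
    L = np.diag(W.sum(axis=1)) - W                # unnormalized graph Laplacian
    _, vecs = np.linalg.eigh(L)
    embedding = vecs[:, :n_clusters]              # eigenvectors of the smallest eigenvalues
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(embedding)

X = np.vstack([np.random.randn(50, 2) + c for c in ([0, 0], [5, 5], [0, 6])])
labels = spectral_clustering(X, n_clusters=3)
```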
37

Analysis, Implementation and Evaluation of Direction Finding Algorithms using GPU Computing / Analys, implementering och utvärdering av riktningsbestämningsalgoritmer på GPU

Andersdotter, Regina January 2022 (has links)
Direction Finding (DF) algorithms are used by the Swedish Defence Research Agency (FOI) in the context of electronic warfare against radio. Parallelizing these algorithms on a Graphics Processing Unit (GPU) might improve performance, and thereby increase military support capabilities. This thesis selects the DF algorithms Correlative Interferometer (CORR), Multiple Signal Classification (MUSIC) and Weighted Subspace Fitting (WSF), and examines to what extent a GPU implementation of these algorithms is suitable, through analysis, implementation and evaluation. Firstly, six general criteria for GPU suitability are formulated. The three algorithms are then analyzed with regard to these criteria, showing that MUSIC and WSF are each 58% suitable, closely followed by CORR at 50%. MUSIC is selected for implementation, and an open-source implementation is extended to three versions: a multicore CPU version, a GPU version (with the Eigenvalue Decomposition (EVD) and pseudo-spectrum calculation performed on the GPU), and a MIXED version (with only the pseudo-spectrum calculation on the GPU). These versions are then evaluated for angle resolutions between 1° and 0.025°, and CUDA block sizes between 8 and 1024. It is found that the GPU version is faster than the CPU version for angle resolutions above 0.1°, and the largest measured speedup is 1.4 times. The block size has no large impact on the total runtime. In conclusion, the overall results indicate that it is not entirely suitable, yet somewhat beneficial for large angle resolutions, to implement MUSIC using GPU computing.
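The two stages moved to the GPU in this work, the eigenvalue decomposition of the sample covariance and the pseudo-spectrum evaluation over a grid of candidate angles, are the heart of MUSIC. The NumPy sketch below shows those stages for a uniform linear array; the array geometry, noise level and angle grid are illustrative assumptions, and this is not the thesis implementation.

```python
# NumPy sketch of MUSIC for a uniform linear array: sample covariance, eigenvalue
# decomposition, noise subspace, and a pseudo-spectrum swept over candidate angles.
# Array geometry, noise level and angle grid are illustrative assumptions.
import numpy as np

def music_spectrum(snapshots, n_sources, spacing=0.5, grid_deg=np.arange(-90, 90, 0.1)):
    """snapshots: (n_antennas, n_snapshots) complex baseband samples."""
    m = snapshots.shape[0]
    R = snapshots @ snapshots.conj().T / snapshots.shape[1]    # sample covariance
    _, eigvecs = np.linalg.eigh(R)                             # ascending eigenvalues
    En = eigvecs[:, : m - n_sources]                           # noise subspace
    k = np.arange(m)
    P = np.empty(len(grid_deg))
    for i, theta in enumerate(np.deg2rad(grid_deg)):
        a = np.exp(-2j * np.pi * spacing * k * np.sin(theta))  # steering vector
        P[i] = 1.0 / np.real(a.conj() @ En @ En.conj().T @ a)  # pseudo-spectrum value
    return grid_deg, P

# Two sources at -20 and 35 degrees on an 8-element array, with additive noise.
rng = np.random.default_rng(0)
m, n = 8, 500
A = np.exp(-2j * np.pi * 0.5 * np.outer(np.arange(m), np.sin(np.deg2rad([-20, 35]))))
S = rng.standard_normal((2, n)) + 1j * rng.standard_normal((2, n))
X = A @ S + 0.1 * (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n)))
angles, P = music_spectrum(X, n_sources=2)
print(angles[np.argmax(P)])        # the global peak lands near one of the true angles
```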
38

3D Printable Designs of Rigid and Deformable Models

Yao, Miaojun January 2017 (has links)
No description available.
39

[en] TOWARD GPU-BASED GROUND STRUCTURES FOR LARGE SCALE TOPOLOGY OPTIMIZATION / [pt] OTIMIZAÇÃO TOPOLÓGICA DE ESTRUTURAS DE GRANDE PORTE UTILIZANDO O MÉTODO DE GROUND STRUCTURES EM GPU

ARTURO ELI CUBAS RODRIGUEZ 14 May 2019 (has links)
[en] Topology optimization aims to find the most efficient material distribution in a specified domain without violating user-defined design constraints. When applied to continuum structures, topology optimization is usually performed by means of the well-known density methods. In this work we focus on the application of its discrete formulation, in which a given domain is discretized into a ground structure, i.e., a finite spatial distribution of nodes connected by truss members. The ground structure method provides an approximation to optimal Michell-type structures, which are composed of an infinite number of members, by using a reduced number of truss members. Finding the optimal least-weight truss for a single load case, under linear elastic conditions and subject to stress constraints, can be posed as a linear programming problem. The aim of this work is to provide a scalable implementation for the optimization of least-weight trusses embedded in any domain geometry. The method removes unnecessary members from a truss that has a user-defined degree of connectivity while keeping the nodal locations fixed. We discuss in detail the scalable implementation of the ground structure method using an efficient and robust interior-point algorithm within a parallel computing environment (involving Graphics Processing Units, or GPUs). The capabilities of the proposed implementation are illustrated by means of large-scale applications to practical problems with millions of members in both 2D and 3D structures.
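On a toy domain, the least-weight truss problem described above reduces to a small linear program. The SciPy sketch below sets up the equilibrium constraints and stress-limited member areas for a fully connected ground structure; the geometry, load and stress limit are invented for illustration, and the thesis solves vastly larger instances with a GPU-based interior-point solver.

```python
# Toy plastic-design LP for the ground structure method: minimize truss volume
# subject to nodal equilibrium, with member areas recovered from tension and
# compression force variables divided by an allowable stress.  Geometry, load and
# stress limit are invented for illustration.
import numpy as np
from itertools import combinations
from scipy.optimize import linprog

nodes = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [2, 0], [2, 1]], dtype=float)
fixed = {0, 1}                                        # supported nodes (left edge)
members = list(combinations(range(len(nodes)), 2))    # fully connected ground structure

free_dofs = [2 * n + d for n in range(len(nodes)) if n not in fixed for d in (0, 1)]
dof_index = {dof: i for i, dof in enumerate(free_dofs)}

B = np.zeros((len(free_dofs), len(members)))          # equilibrium matrix, free DOFs only
lengths = np.zeros(len(members))
for m, (p, q) in enumerate(members):
    vec = nodes[q] - nodes[p]
    lengths[m] = np.linalg.norm(vec)
    u = vec / lengths[m]
    for d in range(2):
        if 2 * p + d in dof_index:
            B[dof_index[2 * p + d], m] += u[d]        # tension pulls p toward q
        if 2 * q + d in dof_index:
            B[dof_index[2 * q + d], m] -= u[d]        # and q toward p

f = np.zeros(len(free_dofs))
f[dof_index[2 * 4 + 1]] = -1.0                        # unit downward load at node 4

sigma = 1.0                                           # allowable stress (illustrative)
cost = np.concatenate([lengths, lengths]) / sigma     # volume = sum of length * area
A_eq = np.hstack([B, -B])                             # axial force = tension - compression
res = linprog(cost, A_eq=A_eq, b_eq=-f, bounds=(0, None), method="highs")
areas = (res.x[:len(members)] + res.x[len(members):]) / sigma
print(res.fun, np.count_nonzero(areas > 1e-9))        # optimal volume, members retained
```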
40

Echantillonage d'importance des sources de lumières réalistes / Importance Sampling of Realistic Light Sources

Lu, Heqi 27 February 2014 (has links)
Realistic images can be rendered by simulating light transport with Monte Carlo techniques. The possibility of using realistic light sources for synthesizing images greatly contributes to their physical realism. Among existing models, those based on environment maps and light fields are attractive due to their ability to capture faithfully the far-field and near-field effects, as well as the possibility of acquiring them directly. Since acquired light sources have arbitrary frequencies and possibly high dimension (4D), using such light sources for realistic rendering leads to performance problems. In this thesis, we focus on how to balance the accuracy of the representation and the efficiency of the simulation. Our work relies on generating high-quality samples from the input light sources for unbiased Monte Carlo estimation, and we introduce three novel methods. The first generates high-quality samples efficiently from dynamic environment maps that change over time. We achieve this by introducing a GPU approach that generates light samples according to an approximation of the form factor and combines them with samples from BRDF sampling for each pixel of a frame. Our method is accurate and efficient: with only 256 samples per pixel, we achieve high-quality results in real time at 1024 × 768 resolution. The second is an adaptive sampling strategy for light field light sources (4D): we generate high-quality samples efficiently by conservatively restricting the sampling area without reducing accuracy. With a GPU implementation and without any visibility computations, we achieve high-quality results with 200 samples per pixel in real time at 1024 × 768 resolution. The rendering remains interactive as long as the visibility is computed using our shadow map technique, and we also provide a fully unbiased approach by replacing the visibility test with an offline CPU approach. Since light-based importance sampling is not very effective when the underlying material of the geometry is specular, we introduce a new balancing technique for Multiple Importance Sampling. This allows us to combine other sampling techniques with our light-based importance sampling. By minimizing the variance based on a second-order approximation, we are able to find a good balance between the different sampling techniques without any prior knowledge. Our method is effective, since it reduces the variance on average for all of our test scenes with different light sources, visibility complexity, and materials. It is also efficient: the overhead of our "black-box" approach is constant and represents 1% of the whole rendering process.
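The role of the balancing technique is easiest to see on a one-dimensional toy problem: two sampling strategies cover different parts of the integrand, and multiple importance sampling with the balance heuristic combines them without bias. The Python sketch below shows only that standard combination; the second-order, variance-minimizing weighting contributed by the thesis is not reproduced, and the integrand and strategies are invented for the example.

```python
# Toy 1D multiple importance sampling with the balance heuristic, combining a
# "light-like" strategy (concentrated where the integrand peaks) with a
# "BRDF-like" strategy (uniform over the domain).  The thesis additionally picks
# the combination weights by minimizing a second-order approximation of the
# variance; that step is not reproduced here.
import numpy as np

rng = np.random.default_rng(1)

def f(x):                                   # integrand on [0, 1]: sharp peak plus broad term
    return np.exp(-((x - 0.2) ** 2) / 0.02) + 0.5 * x

pdf_a = lambda x: 2.5 * ((x >= 0.0) & (x <= 0.4))   # uniform on [0, 0.4], near the peak
sample_a = lambda n: rng.uniform(0.0, 0.4, n)
pdf_b = lambda x: np.ones_like(x)                   # uniform on the whole domain
sample_b = lambda n: rng.uniform(0.0, 1.0, n)

def mis_estimate(n):
    xa, xb = sample_a(n), sample_b(n)
    def contrib(x, pdf_own):
        # Balance heuristic: w_i(x) = n_i p_i(x) / sum_j n_j p_j(x)
        w = n * pdf_own(x) / (n * pdf_a(x) + n * pdf_b(x))
        return w * f(x) / pdf_own(x)
    return (contrib(xa, pdf_a).sum() + contrib(xb, pdf_b).sum()) / n

xs = np.linspace(0.0, 1.0, 100001)
print(mis_estimate(10000), f(xs).mean())    # MIS estimate vs. a dense grid reference
```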
