Global ETD Search

171	Global Illumination in Real-Time using Voxel Cone Tracing on Mobile Devices / Global illuminering i realtid på mobila enheter Wahlén, Conrad January 2016 This thesis explores Voxel Cone Tracing as a possible Global Illumination solution on mobile devices. The rapid increase of performance on low-power graphics processors has made a big impact. More advanced computer graphics algorithms are now possible on a new range of devices. One category of such algorithms is Global Illumination, which calculates realistic lighting in rendered scenes. The combination of advanced graphics and portability is of special interest to implement in new technologies like Virtual Reality. The result of this thesis shows that while possible to implement a state-of-the-art Global Illumination algorithm, the performance of mobile Graphics Processing Units is still not enough to make it usable in real-time. global illumination mobile android voxel cone tracing opengl graphics gpgpu light simulation Computer Systems Datorsystem
172	Enhancing productivity and performance portability of OpenCL applications on heterogeneous systems using runtime optimizations Lutz, Thibaut January 2015 Initially driven by a strong need for increased computational performance in science and engineering, heterogeneous systems have become ubiquitous and are getting increasingly complex. The single processor era has been replaced with multi-core processors, which have quickly been surrounded by satellite devices aiming to increase the throughput of the entire system. These auxiliary devices, such as Graphics Processing Units, Field Programmable Gate Arrays or other specialized processors, have very different architectures. This puts an enormous strain on programming models and software developers to take full advantage of the computing power at hand. Because of this diversity and the unachievable flexibility and portability necessary to optimize for each target individually, heterogeneous systems typically remain vastly under-utilized. In this thesis, we explore two distinct ways to tackle this problem. Providing automated, non-intrusive methods in the form of compiler tools and implementing efficient abstractions to automatically tune parameters for a restricted domain are two complementary approaches investigated to better utilize compute resources in heterogeneous systems. First, we explore a fully automated compiler-based approach, where a runtime system analyzes the computation flow of an OpenCL application and optimizes it across multiple compute kernels. This method can be deployed on any existing application transparently and replaces significant software engineering effort spent to tune applications for a particular system. We show that this technique achieves speedups of up to 3x over unoptimized code and an average of 1.4x over manually optimized code for highly dynamic applications.
Second, a library-based approach is designed to provide a high-level abstraction for complex problems in a specific domain, stencil computation. Using domain-specific techniques, the underlying framework optimizes the code aggressively. We show that even in a restricted domain, automatic tuning mechanisms and robust architectural abstraction are necessary to improve performance. Using the abstraction layer, we demonstrate strong scaling of various applications to multiple GPUs with a speedup of up to 1.9x on two GPUs and 3.6x on four. 006.6
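For readers unfamiliar with the term, a stencil computation updates every grid cell from a fixed pattern of neighbouring cells. The following is a minimal CPU sketch of a 5-point Jacobi stencil in plain Python; the function name `jacobi_step` and the averaging rule are illustrative assumptions and are not taken from the thesis or its library.

```python
def jacobi_step(grid):
    """Apply one 5-point Jacobi stencil sweep to a 2D list of floats.

    Boundary cells are left unchanged; each interior cell becomes the
    average of its four direct neighbours. Illustrative only: this is
    not the framework described in the abstract.
    """
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]  # copy so that reads use the old values
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            new[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                + grid[i][j - 1] + grid[i][j + 1])
    return new
```

Because every interior cell reads only the previous grid, all cell updates within a sweep are independent, which is the property that makes stencils natural candidates for GPU parallelisation.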
173	GPU-Accelerated Contour Extraction on Large Images Using Snakes Kienel, Enrico, Brunnett, Guido 16 February 2009 Active contours have been proven to be a powerful semiautomatic image segmentation approach that seems to cope with many applications and different image modalities. However, they exhibit inherent drawbacks, including sensitivity to contour initialization due to the limited capture range of image edges, and problems with concave boundary regions. The Gradient Vector Flow replaces the traditional image force and provides an enlarged capture range as well as enhanced concavity extraction capabilities, but it involves an expensive computational effort and considerably increased memory requirements at the time of computation. In this paper, we present an enhancement of the active contour model to facilitate semiautomatic contour detection in huge images. We propose a tile-based image decomposition accompanied by an on-demand image force computation scheme in order to minimize both computational and memory requirements. We show an efficient implementation of this approach based on general purpose GPU processing, which provides continuous active contour deformation without considerable delay. Active Contours GPGPU Gradient Vector Flow Image Segmentation Snakes Tiling ddc:004 Bildsegmentierung Computergraphik Konturfindung
174	Study, Modelling and Implementation of the Level Set Method Used in Micromachining Processes Montoliu Álvaro, Carles 09 December 2015 [EN] The main topic of the present thesis is the improvement of fabrication process simulation by means of the Level Set (LS) method. The LS method is a mathematical approach for evolving fronts according to a motion defined by certain laws. Its main advantage is that the front is embedded inside a higher-dimensional function, so that updating this function instead of the front itself enables trivial handling of complex situations such as the splitting or coalescing of multiple fronts. In particular, this document focuses on wet and dry etching processes, which are widely used in the micromachining of Micro-Electro-Mechanical Systems (MEMS). A MEMS is a system formed by mechanical elements, sensors, actuators, and electronics. These devices have gained a lot of popularity in recent decades and are employed in several industry fields such as automotive safety, motion sensors, and smartphones. Wet etching consists in selectively removing substrate material (e.g. silicon or quartz) with a liquid solution in order to form a certain structure. This is a complex process, since the result of a particular experiment depends on many factors, such as the crystallographic structure of the material, the etchant solution or its temperature. Similarly, dry etching processes are used for removing substrate material; however, gaseous substances are employed in the etching stage. In both cases, a simulator capable of accurately predicting the result of a given experiment would imply a significant reduction of design time and costs. There exist a few LS-based wet etching simulators, but they have many limitations and have never been validated with real experiments. On the other hand, atomistic models are currently considered the most advanced simulators.
Nevertheless, atomistic simulators present some drawbacks, such as requiring a prior calibration process in order to use the experimental data. Additionally, a lot of effort must be invested to create an atomistic model for simulating the etching of substrate materials with different atomistic structures. Furthermore, the final result is always formed by unconnected atoms, which makes proper visualization and understanding of complex structures difficult; thus, an additional visualization technique must usually be employed. For their part, dry etching simulators usually employ an explicit representation technique to evolve the surface being etched according to etching models. This strategy can produce unrealistic results, especially in complex situations like the interaction of multiple surfaces. Although some models that use implicit representation have been published, they have never been directly compared with real experiments, and the computational performance of their implementations has not been properly analysed. These limitations are addressed in the various chapters of the present thesis, producing the following contributions: - An efficient LS implementation in order to improve the visual representation of atomistic wet etching simulators. This implementation produces continuous surfaces from atomistic results. - Definition of a new LS-based model which can directly use experimental data of many etchant solutions (such as KOH, TMAH, NH4HF2, and IPA and Triton additives) to simulate wet etching processes of various substrate materials (e.g. silicon and quartz). - Validation of the developed wet etching simulator by comparing it to experimental and atomistic simulator results. - Implementation of an LS-based tool which evolves the surface being etched according to dry etching models in order to enable the simulation of complex processes. This implementation is also validated experimentally.
- Acceleration of the developed wet and dry etching simulators by using Graphics Processing Units (GPUs). / Montoliu Álvaro, C. (2015). Study, Modelling and Implementation of the Level Set Method Used in Micromachining Processes [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/58609 / TESIS Level Set Wet Etching Dry Etching Parallel Computing CUDA GPGPU TECNOLOGIA ELECTRONICA
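The claim that a level set handles merging fronts without special treatment can be illustrated with a small 1D sketch. It evolves phi_t + F |phi_x| = 0 with a first-order upwind (Godunov) scheme; the grid, the speed F = 1, and the helper names `evolve` and `count_regions` are assumptions made for this illustration and are not the thesis's simulator.

```python
def evolve(phi, speed, dx, dt, steps):
    """Advance phi_t + speed * |phi_x| = 0 (speed > 0) with first-order upwinding."""
    n = len(phi)
    for _ in range(steps):
        new = phi[:]
        for i in range(n):
            # One-sided differences; treated as zero at the domain edges.
            back = (phi[i] - phi[i - 1]) / dx if i > 0 else 0.0
            fwd = (phi[i + 1] - phi[i]) / dx if i < n - 1 else 0.0
            # Godunov choice of |phi_x| for an expanding front (speed > 0).
            grad = max(max(back, 0.0), -min(fwd, 0.0))
            new[i] = phi[i] - dt * speed * grad
        phi = new
    return phi


def count_regions(phi):
    """Count connected runs of grid points where phi < 0 (the inside region)."""
    regions, inside = 0, False
    for value in phi:
        if value < 0 and not inside:
            regions += 1
        inside = value < 0
    return regions


dx = 0.01
xs = [i * dx for i in range(101)]
# Two separate "inside" intervals centred at 0.3 and 0.7, each of radius 0.1.
phi0 = [min(abs(x - 0.3), abs(x - 0.7)) - 0.1 for x in xs]
# Expand both fronts at unit speed up to time t = 0.15 (CFL number 0.5).
phi1 = evolve(phi0, speed=1.0, dx=dx, dt=0.005, steps=30)
```

The two regions start 0.2 apart, so they meet once each front has advanced by 0.1. The update loop contains no merge logic: the change in topology shows up only in the sign pattern of `phi`, which `count_regions` reads off.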
175	Inexact Mapping of Short Biological Sequences in High Performance Computational Environments Salavert Torres, José 30 October 2014 Bioinformatics is the application of computer science to the management and analysis of biological data. From 2005 onwards, the arrival of new-generation DNA sequencers gave rise to what is known as Next Generation Sequencing (NGS). A single biological experiment run on an NGS sequencing machine can easily produce hundreds of gigabytes or even terabytes of data. Depending on the technique chosen, this process can take a few hours or days. The availability of affordable local resources, such as multicore processors or the new graphics cards designed for general-purpose computation, GPGPU (General Purpose Graphic Processing Unit), is a great opportunity to tackle these problems. A frequently addressed topic at present is the alignment of DNA sequences. In bioinformatics, alignment makes it possible to compare two or more DNA or RNA sequences, or primary protein structures, highlighting their regions of similarity. Such similarities may indicate functional or evolutionary relationships between the genes or proteins queried. Moreover, similarities between the sequences of a patient and those of another individual with a detected genetic disease could be used effectively in diagnostic medicine. The problem at the core of this doctoral thesis is the location of short sequence fragments within DNA, known as sequence mapping. This mapping must tolerate errors, so that sequences can be mapped even in the presence of genetic variability or read errors.
Several techniques exist for mapping, but since the arrival of NGS, search over prefixes indexed and grouped by means of the Burrows-Wheeler transform [28] (BWT hereafter) stands out. This transform was originally used in data compression techniques, as in the bzip2 algorithm. Its use as a tool for indexing and subsequently searching information is more recent [22]. Its advantage is that its computational complexity depends only on the length of the sequence to be mapped. On the other hand, a large number of alignment techniques are based on dynamic programming algorithms, either Smith-Waterman or hidden Markov models (HMM). These provide greater sensitivity, allowing more errors, but their computational cost is higher and depends on the size of the sequence multiplied by that of the reference string. Many tools combine a first phase that uses the BWT to search for candidate alignment regions with a second, local alignment phase in which strings are mapped with Smith-Waterman or HMM. When mapping with few errors allowed, a second phase with a dynamic programming algorithm is too costly, so an inexact search based on the BWT can be more efficient. The main motivation of the doctoral thesis is the implementation of an inexact search algorithm based solely on the BWT, adapted to modern parallel architectures, both CPU and GPGPU. The algorithm will constitute a new branch-and-bound method adapted to genomic information. During the research stay, hidden Markov models will be studied and an implementation will be built on the GTA (Aggregate ∘ Test ∘ Generate) functional computation model, together with shared- and distributed-memory parallelization of that functional programming platform. / Salavert Torres, J. (2014).
Inexact Mapping of Short Biological Sequences in High Performance Computational Environments [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/43721 / TESIS Inexact mapping Backward search BWT Burrows-Wheeler Transform Suffix Array GPGPU GPU
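To make the BWT-based search mentioned in this abstract concrete, here is a minimal Python sketch of exact backward search over a BWT index. It covers exact matching only; the inexact, error-tolerant search that the thesis develops is not shown. The helper names `bwt_index` and `backward_search` and the naive suffix sorting are assumptions chosen for brevity, not the thesis's implementation.

```python
def bwt_index(text):
    """Build a toy BWT index: suffix array, C table and occurrence table."""
    text += "$"  # unique terminator that sorts before A, C, G, T
    # Naive suffix array construction; fine for a demo, too slow for genomes.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)
    counts = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1
    # C[ch]: number of characters in the text strictly smaller than ch.
    C, total = {}, 0
    for ch in sorted(counts):
        C[ch] = total
        total += counts[ch]
    # occ[ch][i]: number of occurrences of ch in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, b in enumerate(bwt):
        for ch in occ:
            occ[ch][i + 1] = occ[ch][i] + (1 if b == ch else 0)
    return sa, C, occ


def backward_search(pattern, sa, C, occ):
    """Return sorted start positions of exact matches of pattern in the text."""
    lo, hi = 0, len(sa)  # half-open suffix array interval
    for ch in reversed(pattern):
        if ch not in C:
            return []
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[i] for i in range(lo, hi))
```

For example, after `sa, C, occ = bwt_index("GATTACATTAG")`, calling `backward_search("TTA", sa, C, occ)` returns `[2, 7]`. Each pattern character costs two table lookups, so the interval is narrowed in a number of steps equal to the pattern length, independent of the reference length, which is the property the abstract highlights.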
176	GPGPU-accelerated nonlinear state estimators : application to MPC-controlled bioreactor performance Roos, Darren Craig January 2021 Practical control problems must deal with instrumentation noise and inaccurate models. These can be modelled as measurement and state noise, respectively. Nonlinear state estimators, for example a particle filter, can be used to mitigate these effects. However, they are usually computationally expensive, which makes them impractical for industrial use. This text investigates using General Purpose Graphics Processing Units (GPGPU) to improve the performance of particle and Gaussian sum filters by parallelizing their prediction, update and resampling steps. GPGPU-accelerated filters are found to outperform non-accelerated filters as the number of particles increases. GPGPU acceleration also allows particle filters with 2^19.5 particles to be used on systems with dynamic time constants on the order of 0.1 second, and Gaussian sum filters with 2^18.5 particles to be used with time constants on the order of 1 second. The filters are applied to a bioreactor system containing R. oryzae, where MPC control is applied to the production-phase fumaric acid and glucose concentrations. The bioreactor is modelled using results from Iplik (2017) and Swart (2019). It is found that the improved run times of the GPGPU filters allow more particles to be used, which provides increased filter accuracy and thus better performance. This improved performance comes at the cost of consuming more energy. Thus, it is believed that the GPGPU implementations should be used for applications with complex dynamics/noise that require large numbers of particles and/or high sampling rates. / Dissertation (MEng (Control Engineering))--University of Pretoria, 2021. / Chemical Engineering / MEng (Control Engineering) / Unrestricted State estimation GPGPU acceleration Gaussian sum filter Particle filter Numba/CuPy UCTD
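The three steps named in this abstract (prediction, update, resampling) can be sketched as a sequential bootstrap particle filter in plain Python. This is a CPU-only illustration under stated assumptions: the scalar model x' = 0.9 x + noise, the Gaussian measurement model, the noise levels and all function names are hypothetical and are not the dissertation's bioreactor model or its Numba/CuPy code.

```python
import math
import random


def predict(particles, rng, process_std):
    # Propagate each particle through a toy model x' = 0.9 * x + noise (hypothetical).
    return [0.9 * x + rng.gauss(0.0, process_std) for x in particles]


def update(particles, weights, measurement, meas_std):
    # Reweight by the Gaussian likelihood of the measurement, then normalise.
    raw = [w * math.exp(-0.5 * ((measurement - x) / meas_std) ** 2)
           for x, w in zip(particles, weights)]
    total = sum(raw)
    if total == 0.0:  # all likelihoods underflowed; fall back to uniform weights
        return [1.0 / len(particles)] * len(particles)
    return [w / total for w in raw]


def resample(particles, weights, rng):
    # Systematic resampling: one random offset, then evenly spaced picks.
    n = len(particles)
    start = rng.random() / n
    chosen, index, cumulative = [], 0, weights[0]
    for k in range(n):
        u = start + k / n
        while u >= cumulative and index < n - 1:
            index += 1
            cumulative += weights[index]
        chosen.append(particles[index])
    return chosen, [1.0 / n] * n


def run_filter(measurements, n_particles=500, seed=0):
    # Run predict, update, estimate and resample once per measurement.
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    weights = [1.0 / n_particles] * n_particles
    estimates = []
    for z in measurements:
        particles = predict(particles, rng, process_std=0.2)
        weights = update(particles, weights, z, meas_std=0.5)
        estimates.append(sum(w * x for w, x in zip(weights, particles)))
        particles, weights = resample(particles, weights, rng)
    return estimates
```

In `predict` and in the reweighting part of `update`, each particle is processed independently of the others, which is why these steps map well onto GPU threads; normalisation and resampling need a sum or cumulative sum across particles and are typically the harder parts to parallelise.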
177	GPUMap: A Transparently GPU-Accelerated Map Function Pachev, Ivan 01 March 2017 As GPGPU computing becomes more popular, it will be used to tackle a wider range of problems. However, due to the current state of GPGPU programming, programmers are typically required to be familiar with the architecture of the GPU in order to program it effectively. Fortunately, there are software packages that attempt to simplify GPGPU programming in higher-level languages such as Java and Python. However, these software packages do not attempt to abstract the GPU-acceleration process completely. Instead, they require programmers to be somewhat familiar with the traditional GPGPU programming model, which involves some understanding of GPU threads and kernels. In addition, prior to using these software packages, programmers are required to transform the data they would like to operate on into arrays of primitive data. Typically, such software packages restrict the use of object-oriented programming when implementing the code to operate on this data. This thesis presents GPUMap, which is a proof-of-concept GPU-accelerated map function for Python. GPUMap aims to hide all the details of the GPU from the programmer, and allows the programmer to accelerate programs written in normal Python code that operate on arbitrarily nested objects using a majority of Python syntax. Using GPUMap, certain types of Python programs can be accelerated up to 100 times over normal Python code. There are also software packages that provide simplified GPU acceleration to distributed computing frameworks such as MapReduce and Spark. Unfortunately, these packages do not provide a completely abstracted GPU programming experience, which conflicts with the purpose of the distributed computing frameworks: to abstract the underlying distributed system.
This thesis also presents GPU-accelerated RDD (GPURDD), which is a type of Spark Resilient Distributed Dataset (RDD) that incorporates GPUMap into its map, filter, and foreach methods in order to allow Spark applications to make use of the abstracted GPU acceleration provided by GPUMap. gpu gpgpu map spark python parallel Other Computer Sciences Programming Languages and Compilers
178	Simulace šíření ultrazvuku v kostech / Simulation of Ultrasound Propagation in Bones Kadlubiak, Kristián January 2017 An estimated 14.1 million new cases of cancer occurred worldwide in 2012 alone. This number is alarming. Although a healthy lifestyle may reduce the risk of developing cancer, there is always some probability that cancer will develop even in a perfectly fit individual. There are two main conditions for successful treatment of cancer. Firstly, early diagnosis is absolutely crucial. Secondly, there is a need for suitable surgical methods for removing the affected tissue. Ultrasound has great potential to be used for both purposes as a non-invasive method. Photoacoustic spectroscopy is an imaging method with excellent properties for tumor detection that makes use of ultrasound, while High-Intensity Focused Ultrasound (HIFU) is a non-invasive surgical method. These methods would be impossible without precise ultrasound propagation simulations. The k-Wave is an open source MATLAB toolbox implementing such simulations. So why are these methods not already deployed in treatment? Unfortunately, the simulation of ultrasound propagation is a very time-consuming task, which makes it ineffective for medical purposes. However, there are a few options for accelerating these simulations. The use of GPUs is a very promising one. The main topic of this thesis is the acceleration of the simulation of sound wave propagation in bones and hard tissue. The implementation developed as a part of this thesis was benchmarked on various supercomputers, including Anselm in Ostrava and Piz Daint in Lugano. The implemented solution provides remarkable acceleration compared to the original MATLAB prototype. It was able to accelerate the simulation around 160 times in the best case. This means that a simulation which would otherwise last for 6.5 days can now be computed in one hour.
This acceleration was achieved using an NVIDIA Tesla P100 to run the simulation with a domain size of 416x416x416 grid points. The thesis includes performance benchmarks on different GPUs to give a comprehensive picture of the acceleration capabilities of the developed implementation, and provides a discussion of memory usage and numerical accuracy. Thanks to the implemented solution harnessing the power of modern GPUs, doctors and researchers all around the world have a powerful tool at hand.
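As a quick consistency check of the figures quoted in this abstract (a best-case speedup of about 160 and a 6.5-day baseline, both taken from the text), the accelerated run time works out to just under one hour:

```latex
\[
t_{\text{GPU}} \approx \frac{6.5\ \text{days} \times 24\ \text{h/day}}{160}
             = \frac{156\ \text{h}}{160}
             \approx 0.98\ \text{h}
\]
```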
179	Zpracování obrazu s velkými datovými toky - využití CUDA/OpenCL / High data rate image processing using CUDA/OpenCL Sedláček, Filip January 2018 The main objective of this research is to propose an optimization of the defect detection algorithm used in the production of nonwoven textile. The algorithm was developed by CAMEA spol. s.r.o. As a consequence of upgrading the current camera system to a more powerful one, it will be necessary to optimize the current algorithm and choose hardware with an appropriate architecture on which the calculations will be performed. This work describes useful programming techniques for the CUDA software architecture and the OpenCL framework in detail. Using these tools, we propose a parallel implementation equivalent to the current algorithm, describe various optimization methods, and design a GUI to test these methods.
180	Efektivní komunikace v multi-GPU systémech / Efficient Communication in Multi-GPU Systems Špeťko, Matej January 2018 After the introduction of CUDA by Nvidia, GPUs became devices capable of accelerating any general purpose computation. GPUs are designed as parallel processors which possess huge computation power. Modern supercomputers are often equipped with GPU accelerators. Sometimes the performance or the memory capacity of a single GPU is not enough for a scientific application, and the application needs to be scaled to multiple GPUs. During the computation, the GPUs need to exchange partial results. This communication represents computation overhead. For this reason, it is important to research methods of efficient communication between GPUs, which means less CPU involvement, lower latency, and shared system buffers. Inter-node and intra-node communication is examined. The main focus is on GPUDirect technologies from Nvidia and CUDA-Aware MPI. Subsequently, the k-Wave toolbox for simulating the propagation of acoustic waves is introduced. This application is accelerated by using CUDA-Aware MPI.
