Global ETD Search

11	Neural Network on Compute Shader : Running and Training a Neural Network using GPGPU Åström, Fredrik January 2011 (has links) In this thesis I look into how one can train and run an artificial neural network using Compute Shader and what kind of performance can be expected. An artificial neural network is a computational model that is inspired by biological neural networks, e.g. a brain. The expected performance was determined by creating an implementation that uses Compute Shader and then comparing it to the FANN library, a fast artificial neural network library written in C. The conclusion is that you can improve performance by training an artificial neural network on the compute shader as long as you are using non-trivial datasets and neural network configurations. Artificial Neural Network GPGPU Compute Shader Computer Sciences Datavetenskap (datalogi)
12	Predikce hodnot tíhových veličin na základě terestrických měření a digitálního modelu terénu / Prediction of gravity quantities values based on the terrestrial measurements and digital elevation model Letko, Ivan January 2013 (has links) The main objective of this master's thesis is to densify the measured gravimetric points in the area of interest with randomly, evenly distributed points, based on a digital terrain model. The Remove-Compute-Restore method was used for this purpose. The normal acceleration of gravity, the topographic effect and the Faye anomaly were subtracted from the measured gravity. The result is the Bouguer anomaly with general topographic effect, which is interpolated at the densification points. The predicted gravity values are obtained after restoring the subtracted effects. The main result of the thesis is a map of actual gravity and an evaluation of the precision of the method used. Furthermore, the gravity reductions, the interpolation methods in the ArcGIS program, the Remove-Compute-Restore method and the concept of a digital terrain model are explained in the thesis.
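Illustrative sketch, not taken from the thesis: the Remove-Compute-Restore steps described in the abstract above can be outlined as follows. The `model` callable (the sum of the effects that are removed and later restored) and the inverse-distance-weighting interpolator are assumptions standing in for the actual gravity reductions and the ArcGIS interpolation methods.

```python
import math


def idw(points, values, target, power=2.0):
    """Inverse-distance-weighted interpolation (assumed stand-in interpolator)."""
    num = 0.0
    den = 0.0
    for (x, y), v in zip(points, values):
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return v  # target coincides with a measured point
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den


def remove_compute_restore(points, g_measured, model, target):
    """Predict gravity at `target` from measured gravity at `points`.

    `model(p)` is a hypothetical callable returning the modelled effects
    at position p (e.g. normal gravity plus topographic effect).
    """
    # Remove: subtract the modelled effects at every measured point.
    residuals = [g - model(p) for p, g in zip(points, g_measured)]
    # Compute: interpolate the smooth residual anomaly at the new point.
    residual_at_target = idw(points, residuals, target)
    # Restore: add the modelled effects back at the new point.
    return residual_at_target + model(target)
```

The interpolation works on the residual because it varies more smoothly than raw gravity, which is the usual motivation for the remove and restore steps.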
13	Radar and sea clutter simulation with Unity 3D game engine / Simulering av radar och sjöklotter med Unity 3D-spelmotor Johnsson, Mikael, Bergman, Linus January 2023 (has links) Game engines are well known for their use in the gaming industry but are starting to have an impact in other areas as well. Architecture, automotive, and the defence industry are today using these engines to visualise and, to some extent, test their products. In this thesis, we have examined how the game engine Unity could be used for simulating a radar with the purpose of detecting and measuring sea clutter. Following a pre-study examining different implementation approaches, it was decided to use ray tracing. The radar itself is simulated by using the camera to emit rays and having a plane object directly behind it act as a receiver. Rays are then individually traced for each pixel, propagating throughout the scene and saving information such as hit coordinates, distance travelled, and direction. By using the total travel distance of each ray that returned to the receiver, the phase of each ray is calculated. This is then used to compute the total amplitude, which represents the returned signal strength. Using a compute shader, most of the computations are done in parallel on the GPU, enabling millions of rays to be traced. As measuring sea clutter was an objective of the study, tests measuring the ocean were carried out. These used ocean surfaces with two different sea states, using the Phillips spectrum to generate realistic waves. A ship object was then tested in free space and on two different ocean surfaces. The calculated amplitude and the number of rays returned were used to determine the signal strength returned and the RCS of the object. The purpose of this was to compare with other results of sea clutter studied, observed both in the real world and in simulated scenarios, and determine if our approach could be a valid choice for the industry. 
The amplitudes and the number of rays returned varied depending on the angles and ocean surfaces used. Some results matched the findings of a similar study that used a professional radar simulation tool called OKTAL. Other sea clutter results were found to be unrealistic due to certain limitations. The current main limitation of our implementation is that it cannot trace a large enough ocean surface with the finer details needed for realistic results. However, this could be solved by creating a better implementation. These findings suggest that simulating radar and sea clutter in Unity is a feasible approach worth continuing to explore. radar simulation sea clutter simulation game engine Unity ray tracing compute shader graphics programming GPU radarsimulering sjöklottersimulering spelmotor Unity strålspårning compute shader grafisk programmering GPU Computer Engineering Datorteknik
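Illustrative sketch, not taken from the thesis: the abstract for this record states that the phase of each returned ray is derived from its total travel distance and that the phases are combined into a total amplitude. One common way to do this is a coherent phasor sum. The unit weight per ray, the wavelength parameter and the sign convention are assumptions.

```python
import cmath
import math


def coherent_amplitude(path_lengths_m, wavelength_m):
    """Magnitude of the coherent sum of unit-amplitude rays.

    Each ray contributes exp(i * k * d), where d is its total travel
    distance and k = 2*pi / wavelength is the wavenumber.
    """
    k = 2.0 * math.pi / wavelength_m
    total = sum(cmath.exp(1j * k * d) for d in path_lengths_m)
    return abs(total)
```

Two rays of equal length add constructively, while two rays whose lengths differ by half a wavelength cancel, which is why the path length must be tracked precisely for every ray.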
14	High performance bioinformatics and computational biology on general-purpose graphics processing units Ling, Cheng January 2012 (has links) Bioinformatics and Computational Biology (BCB) is a relatively new multidisciplinary field which brings together many aspects of the fields of biology, computer science, statistics, and engineering. Bioinformatics extracts useful information from biological data and makes these more intuitive and understandable by applying principles of information sciences, while computational biology harnesses computational approaches and technologies to answer biological questions conveniently. Recent years have seen an explosion of the size of biological data at a rate which outpaces the rate of increases in the computational power of mainstream computer technologies, namely general purpose processors (GPPs). The aim of this thesis is to explore the use of off-the-shelf Graphics Processing Unit (GPU) technology in the high performance and efficient implementation of BCB applications in order to meet the demands of biological data increases at affordable cost. The thesis presents detailed design and implementations of GPU solutions for a number of BCB algorithms in two widely used BCB applications, namely biological sequence alignment and phylogenetic analysis. Biological sequence alignment can be used to determine the potential information about a newly discovered biological sequence from other well-known sequences through similarity comparison. On the other hand, phylogenetic analysis is concerned with the investigation of the evolution and relationships among organisms, and has many uses in the fields of system biology and comparative genomics. In molecular-based phylogenetic analysis, the relationship between species is estimated by inferring the common history of their genes and then phylogenetic trees are constructed to illustrate evolutionary relationships among genes and organisms. 
However, both biological sequence alignment and phylogenetic analysis are computationally expensive applications as their computing and memory requirements grow polynomially or even worse with the size of sequence databases. The thesis firstly presents a multi-threaded parallel design of the Smith-Waterman (SW) algorithm alongside an implementation on NVIDIA GPUs. A novel technique is put forward to solve the restriction on the length of the query sequence in previous GPU-based implementations of the SW algorithm. Based on this implementation, the difference between two main task parallelization approaches (Inter-task and Intra-task parallelization) is presented. The resulting GPU implementation matches the speed of existing GPU implementations while providing more flexibility, i.e. flexible length of sequences in real world applications. It also outperforms an equivalent GPP-based implementation by 15x-20x. After this, the thesis presents the first reported multi-threaded design and GPU implementation of the Gapped BLAST with Two-Hit method algorithm, which is widely used for aligning biological sequences heuristically. This achieved up to 3x speed-up improvements compared to the most optimised GPP implementations. The thesis then presents a multi-threaded design and GPU implementation of a Neighbor-Joining (NJ)-based method for phylogenetic tree construction and multiple sequence alignment (MSA). This achieves 8x-20x speed up compared to an equivalent GPP implementation based on the widely used ClustalW software. The NJ method however only gives one possible tree which strongly depends on the evolutionary model used. A more advanced method uses maximum likelihood (ML) for scoring phylogenies with Markov Chain Monte Carlo (MCMC)-based Bayesian inference. 
The latter was the subject of another multi-threaded design and GPU implementation presented in this thesis, which achieved 4x-8x speed up compared to an equivalent GPP implementation based on the widely used MrBayes software. Finally, the thesis presents a general evaluation of the designs and implementations achieved in this work as a step towards the evaluation of GPU technology in BCB computing, in the context of other computer technologies including GPPs and Field Programmable Gate Arrays (FPGA) technology. 572.8
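Illustrative sketch, not taken from the thesis: the Smith-Waterman algorithm named in this record computes the best local alignment score with a dynamic-programming recurrence. The version below is a plain sequential form with a linear gap penalty. The scoring values are assumptions, and the GPU design from the thesis is not reproduced here.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of sequences a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous row of the score matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                # start a new local alignment
                prev[j - 1] + s,  # align a[i-1] with b[j-1]
                prev[j] + gap,    # gap in b
                curr[j - 1] + gap # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best
```

Each cell depends only on its left, upper and upper-left neighbours, so all cells on one anti-diagonal can be computed independently. That independence is what parallel implementations typically exploit within a single alignment.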
15	Real-time Terrain Deformation with Isosurface Algorithms Nässén, Olle, Leiborn, Edvard January 2019 (has links) Background. Being able to modify virtual environments can create immersive experiences for video-game players. Storing data as volumetric scalar fields allows for highly modifiable 3D environments that can be converted into GPU-friendly triangles with isosurface algorithms. Using scalar fields and isosurface algorithms can be more computationally expensive and require more data than the more commonly used polygonal models. Objectives. The aim of this thesis is to explore solutions to modifying real-time 3D environments with isosurface algorithms. This will be done in two parts. First in terms of observing how modern games deal with storing scalar fields, researching which isosurface algorithms are being used and how they are being used in games. The second part is to create an application and limit the data storage required while still running at a real-time speed. Methods. There are two methods to achieve the aim. The first is to research and see which data structures and isosurface algorithms are being used in modern games and how they are utilized. The second method will be done by implementation. The implementation will use the GPU through compute shaders and use marching cubes as isosurface algorithm. It will utilize Christopher Dyken’s Histogram Pyramids for stream compaction. Two different versions will be implemented that differ in terms of what data types will be used for storage. The first using the data type char and the second int. Between these two versions, the runtime speed will be measured and compared on two different hardware configurations. Results. Finding good data on what algorithms games use is difficult. Modern games are using scalar fields in many different ways: Some allow almost complete modification of terrain, others only use it for a 3D environment. 
For data storage, octrees and chunks are two common ways to store the fields. Dual Contouring appears to be the primary isosurface algorithm being used based on the researched games. The results of the implementation were very fast and usable in real-time environments for destruction of terrain on a large scale. The less storage-intensive variation of this implementation (char) gave faster results on modern hardware but the opposite (int) was true on older hardware. Conclusions. Modifying scalar field terrain is done at a very large scale in modern games. The choice of using Dual Contouring or Marching Cubes depends on the use-case. For areas where sharp features can be important Dual Contouring is the preferred choice. Likely for these reasons Dual Contouring was found to be a popular choice in the studied games. For other areas, like many types of terrain, Marching Cubes is very fast, as can be seen in the implementation. By using the char version of the implementation, interacting with the environment in real-time is possible at high frame-rates. Isosurface Marching Cubes Terrain Deformation OpenGL Compute Shader Computer Sciences Datavetenskap (datalogi)
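Illustrative sketch, not taken from the thesis: stream compaction with a histogram pyramid, as mentioned in this record, can be outlined in one dimension. Each input cell reports how many output elements it emits (for marching cubes, for example, its triangle count), the pyramid sums these counts pairwise, and a top-down traversal maps an output index back to its source cell. The 1D layout with 2:1 reduction and the function names are simplifying assumptions.

```python
def build_pyramid(counts):
    """Build levels of pairwise sums; levels[-1][0] is the total output count."""
    n = 1
    while n < len(counts):
        n *= 2  # pad the base level to a power of two
    levels = [list(counts) + [0] * (n - len(counts))]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([prev[i] + prev[i + 1] for i in range(0, len(prev), 2)])
    return levels


def lookup(levels, k):
    """Return (input_index, local_index) for output element k."""
    idx = 0
    for level in reversed(levels[:-1]):  # walk from below the top to the base
        idx *= 2
        left = level[idx]
        if k >= left:  # output k lies in the right child
            k -= left
            idx += 1
    return idx, k
```

Every output element can run `lookup` independently, which suits a compute shader: the total at the top of the pyramid gives the number of threads to launch, and empty cells produce no work.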
16	Design and Evaluation of a Single Instruction Processor / Design och utveckling av en eninstruktions processor Mu, Rongzeng January 2003 (has links) A new approach to DSP processor design is described in this thesis through an example: the design of an FFT processor. It is an innovative concept for DSP processor design developed by the Electronic Systems Division of the Department of Electrical Engineering at Linköping University. The project described in this thesis is to design a Sande-Tukey FFT processor step by step. It goes through all steps from the simplest MATLAB specification to the final synthesizable VHDL specification. The steps should be as small as possible in order to avoid errors, and MATLAB should be used as far as possible. Electronics FFT processor specification algorithm input compute output Elektronik Electronics Elektronik
17	Design and Evaluation of a Single Instruction Processor / Design och utveckling av en eninstruktions processor Mu, Rongzeng January 2003 (has links) A new approach to DSP processor design is described in this thesis through an example: the design of an FFT processor. It is an innovative concept for DSP processor design developed by the Electronic Systems Division of the Department of Electrical Engineering at Linköping University. The project described in this thesis is to design a Sande-Tukey FFT processor step by step. It goes through all steps from the simplest MATLAB specification to the final synthesizable VHDL specification. The steps should be as small as possible in order to avoid errors, and MATLAB should be used as far as possible. Electronics FFT processor specification algorithm input compute output Elektronik Electronics Elektronik
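Illustrative sketch, not taken from the thesis: the Sande-Tukey FFT named in this record is the decimation-in-frequency form of the radix-2 FFT. A recursive reference version, of the kind that might serve as a first high-level specification before refinement toward hardware, could look like this. The function name and the recursive structure are assumptions.

```python
import cmath


def fft_dif(x):
    """Radix-2 decimation-in-frequency FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    half = n // 2
    # Butterflies come first: sums feed the even-indexed outputs,
    # twiddled differences feed the odd-indexed outputs.
    top = [x[i] + x[i + half] for i in range(half)]
    bot = [(x[i] - x[i + half]) * cmath.exp(-2j * cmath.pi * i / n)
           for i in range(half)]
    out = [0] * n
    out[0::2] = fft_dif(top)
    out[1::2] = fft_dif(bot)
    return out
```

In contrast to the decimation-in-time form, the twiddle multiplication follows the add and subtract step, and an in-place iterative version produces its outputs in bit-reversed order.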
18	A Novel Linear RF Transmitter Using High-Efficiency Power Amplifier Applied with Envelope Modulation Chen, Yu-An 26 July 2005 (has links) This thesis implements an RF transmitter with high efficiency and high linearity. A Cartesian-to-polar transformation was implemented with the CORDIC algorithm on an FPGA. By replacing the envelope detector and limiter of the traditional envelope elimination and restoration transmitter, this technique not only achieves more accurate modulation quality, but is also more suitable for single-chip systems. By applying first-order delta-sigma modulation and a highly efficient switching-mode DC converter, the envelope signal was amplified with high efficiency. Because the class-E power amplifier has a good linear relation between output voltage and supply voltage, the polar modulation transmitter can achieve high efficiency and high linearity simultaneously. Furthermore, this thesis proposes a new transmitter with two-terminal time-varying modulation. The IQ-modulated signal was fed to the input terminal of the class-E amplifier, while the envelope signal was used to amplitude-modulate the voltage supply terminal. With dynamic input power control, the conversion efficiency and linearity are independent of output power in the proposed architecture. According to the experimental results, while transmitting a QPSK-modulated CDMA2000 1x signal at a 1.2288 Msps data rate, the transmitter achieves 48 % drain efficiency, 47 dB ACPR, and 6 % EVM at output powers ranging from 10 to 22 dBm. Transmitter with Polar Modulation
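Illustrative sketch, not taken from the thesis: a Cartesian-to-polar conversion with CORDIC, as mentioned in this record, is usually done in vectoring mode, where the vector is rotated step by step onto the x-axis while the applied angles are accumulated. The floating-point arithmetic, the iteration count and the restriction to x > 0 are assumptions. An FPGA version would use fixed-point shifts, a precomputed arctangent table and a constant gain factor.

```python
import math


def cordic_to_polar(x, y, iterations=24):
    """Vectoring-mode CORDIC: return (magnitude, angle) for a vector with x > 0."""
    angle = 0.0
    gain = 1.0
    for i in range(iterations):
        step = 2.0 ** -i                # a right shift by i bits in hardware
        d = -1.0 if y > 0 else 1.0      # rotate so that y moves toward zero
        x, y = x - d * y * step, y + d * x * step
        angle -= d * math.atan(step)    # table lookup in hardware
        gain *= math.sqrt(1.0 + step * step)
    return x / gain, angle              # remove the accumulated CORDIC gain
```

For example, `cordic_to_polar(3.0, 4.0)` returns values close to a magnitude of 5 and an angle of `atan2(4, 3)`. In a polar transmitter the magnitude would correspond to the envelope signal and the angle to the phase signal.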
19	Αξιοποίηση υπολογιστικών πόρων / Exploitation of computing resources Σίψας, Κωνσταντίνος 13 December 2010 (has links) This diploma thesis examines whether the graphics processing unit (GPU) can be used to execute an algorithm for matrix-vector multiplication and three sorting algorithms, and to what extent their execution can be accelerated. The architecture examined and described in this thesis is Tesla, which was developed by Nvidia. The CUDA (Compute Unified Device Architecture) programming model and development environment were used to implement the algorithms. 005.741 Parallel sorting Tesla
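Illustrative sketch, not taken from the thesis: a common way to map matrix-vector multiplication onto a GPU is to give each thread one output row. The Python below only mimics that decomposition sequentially. The function names are hypothetical and the actual CUDA kernels of the thesis are not shown in this record.

```python
def matvec_thread(matrix, vector, row):
    """Work of one hypothetical GPU thread: the dot product for one output row."""
    return sum(a * b for a, b in zip(matrix[row], vector))


def matvec(matrix, vector):
    # On a GPU these iterations would run concurrently, one thread per row.
    return [matvec_thread(matrix, vector, r) for r in range(len(matrix))]
```

Because no row depends on another, the rows can be computed in any order, which is what makes the operation a natural candidate for GPU acceleration.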
20	Determinação de autovalores e autovetores de matrizes tridiagonais simétricas usando CUDA / Determination of eigenvalues and eigenvectors of symmetric tridiagonal matrices using CUDA Rocha, Lindomar José 04 August 2015 (has links) Master's dissertation, Universidade de Brasília, Universidade UnB de Planaltina, Programa de Pós-Graduação em Ciência de Materiais, 2015. Several branches of human knowledge make use of eigenvalues and eigenvectors, among them physics, engineering, and economics. These eigenvalues and eigenvectors can be determined with various computational routines, some faster than others. Where speed matters, parallel computing is an option. More specifically, Nvidia's CUDA offers a significant speed gain, since in this model the routines are executed on the GPU, which has many processing cores. Given the great importance of eigenvalues and eigenvectors, the objective of this work is to develop routines that compute them for real symmetric tridiagonal matrices more quickly and reliably through parallel computing with CUDA. This objective was achieved by combining several numerical methods to obtain the eigenvalues and by modifying the inverse iteration method used to determine the eigenvectors. LAPACK routines were used for comparison with the routines developed in CUDA. According to the results, the routine developed in CUDA has a clear speed advantage, in both single and double precision, over the state-of-the-art CPU routines from the LAPACK library. Matriz simétrica Autovalores Matriz tridiagonal Programação paralela (Computação) Iteração inversa
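Illustrative sketch, not taken from the dissertation: the inverse iteration method named in this record refines an eigenvector by repeatedly solving a shifted linear system. The version below is the textbook form for a symmetric tridiagonal matrix, using the Thomas algorithm without pivoting. It does not include the author's modification, and the function names, the starting vector and the fixed iteration count are assumptions.

```python
import math


def solve_tridiagonal(diag, off, rhs):
    """Thomas algorithm for a symmetric tridiagonal system (no pivoting)."""
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = off[0] / diag[0] if n > 1 else 0.0
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - off[i - 1] * c[i - 1]
        if i < n - 1:
            c[i] = off[i] / denom
        d[i] = (rhs[i] - off[i - 1] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def inverse_iteration(diag, off, shift, iterations=20):
    """Approximate the unit eigenvector whose eigenvalue is closest to `shift`."""
    n = len(diag)
    shifted = [a - shift for a in diag]
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = solve_tridiagonal(shifted, off, v)  # w = (T - shift*I)^-1 v
        norm = math.sqrt(sum(t * t for t in w))
        v = [t / norm for t in w]
    return v
```

The closer the shift is to the true eigenvalue, the faster the iteration converges, but the shifted system also becomes nearly singular. Production routines therefore add pivoting and safeguards that this sketch omits.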
