761

Building an Efficient Occupancy Grid Map Based on Lidar Data Fusion for Autonomous Driving Applications

Salem, Marwan January 2019 (has links)
The Localization and Map building module is a core building block for designing an autonomous vehicle. It describes the vehicle's ability to create an accurate model of its surroundings while simultaneously maintaining its position in the environment. In this thesis work, we contribute to the autonomous driving research area by providing a proof of concept for integrating SLAM solutions into commercial vehicles, improving the robustness of the Localization and Map building module. The proposed system applies Bayesian inference within the occupancy grid mapping framework and uses a Rao-Blackwellized Particle Filter (RBPF) to estimate the vehicle trajectory. The work was carried out at Scania CV, where a heavy-duty vehicle equipped with a multi-Lidar sensor architecture was used. Low-level sensor fusion of the different Lidars was performed, and a parallelized implementation of the algorithm was achieved on a GPU. When tested on datasets frequently used in the community, the implemented algorithm outperformed the scan-matching technique and showed acceptable performance compared to another state-of-the-art RBPF implementation that incorporates several improvements to the algorithm. The performance of the complete system was evaluated on a designed set of real scenarios. The proposed system showed a significant improvement in the estimated trajectory and provided accurate occupancy representations of the vehicle's surroundings. The fusion module was found to build more informative occupancy grids than the grids obtained from the individual sensors. / The module responsible for both localization and map building is one of the main components of a system for autonomous driving. It describes the vehicle's ability to create a model of its surroundings and to maintain a position relative to those surroundings. In this thesis we contribute to research on autonomous driving with a proof of concept for integrating SLAM solutions into commercial vehicles, which improves the robustness of the localization and map building module. The proposed system uses Bayesian statistics applied in a framework that creates a map consisting of a grid describing the degree of occupancy. To estimate the path the vehicle will travel, the framework uses an RBPF (Rao-Blackwellized particle filter). The thesis work was carried out at Scania CV, where a heavy-duty vehicle equipped with several Lidar sensors was used. Low-level sensor fusion was applied to the different Lidar sensors, and a parallelized implementation of the algorithm was realized on a GPU. When the algorithm was run on datasets commonly used in the community, the implemented algorithm produced considerably better results than the scan-matching technique and showed acceptable results compared to another high-performing RBPF implementation that adds several improvements to the algorithm. The performance of the complete system was evaluated on a number of purpose-designed realistic scenarios. The proposed system shows a clear improvement in the estimated trajectory and also provides an accurate representation of the surroundings. The sensor fusion yields a better and more informative representation than using the individual Lidar sensors alone.
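
As an illustration of the Bayesian update at the core of occupancy grid mapping, the sketch below shows a minimal log-odds cell update in Python. The inverse sensor model probabilities, clamping bounds, and grid size are assumptions chosen for clarity, not the parameters used in the thesis.

```python
import numpy as np

# Minimal log-odds occupancy grid update (illustrative sketch only; the
# thesis' actual sensor model, grid resolution, and fusion pipeline differ).
L_OCC = np.log(0.7 / 0.3)   # assumed inverse sensor model: P(occupied | hit) = 0.7
L_FREE = np.log(0.3 / 0.7)  # assumed P(occupied | miss) = 0.3
L_MIN, L_MAX = -5.0, 5.0    # clamp so cells can still change later

def update_cell(log_odds, hit):
    """Bayesian update of one cell's log-odds given a Lidar hit/miss."""
    log_odds += L_OCC if hit else L_FREE
    return np.clip(log_odds, L_MIN, L_MAX)

def occupancy_probability(log_odds):
    """Convert log-odds back to an occupancy probability."""
    return 1.0 - 1.0 / (1.0 + np.exp(log_odds))

grid = np.zeros((100, 100))                       # log-odds grid, 0 means unknown (p = 0.5)
grid[40, 60] = update_cell(grid[40, 60], hit=True)
print(occupancy_probability(grid[40, 60]))        # cell now considered likely occupied
```
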
762

GPU Network Processing

Yanggratoke, Rerngvit January 2010 (has links)
Networking technology connects more and more people around the world and has become an essential part of our daily lives. For this connectivity to be seamless, networks need to be fast. Nonetheless, rapid growth in network traffic and the variety of communication protocols overwhelm the Central Processing Units (CPUs) that process packets in the network. Existing solutions to this problem, such as ASICs, FPGAs, NPUs, and TOEs, are neither cost-effective nor easy to manage, because they require special hardware and custom configurations. This thesis approaches the problem differently by offloading the network processing to off-the-shelf Graphics Processing Units (GPUs). The thesis's primary goal is to find out how GPUs should be used for this offloading. The thesis follows a case-study approach; the selected case studies are layer-2 Bloom filter forwarding and flow lookup in an OpenFlow switch. Implementation alternatives and an evaluation methodology are proposed for both case studies. A prototype implementation is then developed and evaluated to compare the traditional CPU-only approach with the GPU-offloading approach. The primary findings of this work are criteria for network processing functions suitable for GPU offloading and the trade-offs involved. The criteria are: no inter-packet dependency, a similar processing flow for all packets, and opportunity for parallel processing within a packet. The offloading trades higher latency and memory consumption for higher throughput. / Networking technology connects more and more people around the world. It has become an important part of our daily life. For this connectivity to be seamless, the network must be fast. The rapid growth in network traffic and the variety of communication protocols place heavy demands on the processors that handle all the traffic. Existing solutions to this problem, e.g. ASIC, FPGA, NPU, and TOE, are neither cost-effective nor easy to manage, since they require special hardware and custom configurations. This thesis attacks the problem in a different way by offloading the network processing to graphics processors found in ordinary PC graphics cards. The thesis's primary goal is to find out how the GPU should be used for this. The thesis follows a case-study model, and the selected cases are layer-2 Bloom filter forwarding and flow lookup in an OpenFlow switch. Implementation alternatives and an evaluation methodology are proposed for both case studies. A prototype is then developed and evaluated to compare traditional CPU-only processing with GPU offload. The primary result of this work is a set of criteria for network processing functions suited to GPU offload and the trade-offs that must be made. The criteria are no inter-packet dependency, a similar processing flow for all packets, and the possibility of processing parts of one packet in parallel. GPU offloading gives increased latency and memory consumption in exchange for higher throughput.
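
The layer-2 Bloom filter forwarding case study rests on a simple probabilistic membership test. The Python sketch below illustrates the idea on the CPU; the bit-array size, number of hash functions, and SHA-256-based hashing are illustrative assumptions, not the thesis' GPU implementation.

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter for MAC-address membership tests (CPU sketch only)."""
    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, key: bytes):
        # Derive several bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:4], "big") % self.num_bits

    def add(self, key: bytes):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def maybe_contains(self, key: bytes) -> bool:
        # False positives are possible, false negatives are not.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))

# One filter per output port: forward a frame to every port whose filter
# (possibly) contains the destination MAC address.
port_filters = [BloomFilter() for _ in range(4)]
port_filters[2].add(bytes.fromhex("001122334455"))
dst = bytes.fromhex("001122334455")
print([p for p, f in enumerate(port_filters) if f.maybe_contains(dst)])  # -> [2]
```
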
763

Free Wake Potential Flow Vortex Wind Turbine Modeling: Advances in Parallel Processing and Integration of Ground Effects

Develder, Nathaniel B 01 January 2014 (has links) (PDF)
Potential flow simulations are a practical, middle-ground engineering approach to modeling complex aerodynamic systems, but they quickly become computationally unwieldy for large domains. As an N-body problem with N-squared interactions to calculate, this free wake vortex model of a wind turbine is well suited to parallel computation. This thesis discusses general trends in wind turbine modeling, a potential flow model of the rotor of the NREL 5MW reference turbine, various forms of parallel computing, current GPU hardware, and the application of ground effects to the model. In the vicinity of 200,000 points, current GPU hardware was found to be nearly 17 times faster than a 12-core OpenMP parallel CPU code, and over 280 times faster than serial MATLAB code. Convergence of the solution is found to depend on the direction in which the grid is refined. The "no entry" condition at the ground plane is found to have a measurable but small impact on the model outputs, with a periodicity driven by the blade's proximity to the ground plane. The effect of the ground panel method was found to converge to that of the "method of images" as the ground extent and number of panels increase.
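
The N-squared cost mentioned above comes from every vortex element inducing a velocity at every other element. The sketch below shows a direct Biot-Savart-style summation in Python; the regularization constant and sign convention are illustrative assumptions rather than the formulation used in the thesis, and the point is only the O(N^2) structure that makes the problem attractive for GPUs.

```python
import numpy as np

def induced_velocities(positions, strengths, eps=1e-3):
    """Direct O(N^2) Biot-Savart-style summation over vortex particles.
    positions: (N, 3) particle positions; strengths: (N, 3) vector strengths.
    Regularization and sign convention here are illustrative only."""
    n = positions.shape[0]
    vel = np.zeros_like(positions)
    for i in range(n):                      # every target point ...
        r = positions[i] - positions        # ... interacts with every source
        dist2 = np.sum(r * r, axis=1) + eps ** 2
        kernel = np.cross(strengths, r) / (4.0 * np.pi * dist2[:, None] ** 1.5)
        kernel[i] = 0.0                     # skip self-interaction
        vel[i] = kernel.sum(axis=0)
    return vel

rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 3))
g = rng.normal(size=(1000, 3)) * 0.01
print(induced_velocities(x, g).shape)       # (1000, 3); cost grows as N^2
```
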
764

Temporal contrast-dependent modeling of laser-driven solids: studying femtosecond-nanometer interactions and probing

Garten, Marco 03 May 2023 (has links)
Establishing precise control over the unique beam parameters of laser-accelerated ions from relativistic ultra-short pulse laser-solid interactions has been a major goal for the past 20 years. While the spatio-temporal coupling of laser-pulse and target parameters creates transient phenomena at femtosecond-nanometer scales that are decisive for the acceleration performance, these scales have also largely been inaccessible to experimental observation. Computer simulations of laser-driven plasmas provide valuable insight into the physics at play. Nevertheless, predictive capabilities are still lacking due to the massive computational cost of performing these simulations in 3D at high resolution for extended simulation times. This thesis investigates the optimal acceleration of protons from ultra-thin foils following the interaction with an ultra-short, ultra-high-intensity laser pulse, including realistic contrast conditions up to a picosecond before the main pulse. Advanced ionization methods implemented in the highly scalable, open-source particle-in-cell code PIConGPU enabled this study. Supporting two experimental campaigns, the new methods led to a deeper understanding of the physics of laser-wakefield acceleration and colloidal crystal melting, respectively, as they made it possible to explain experimental observations with simulated ionization and plasma dynamics. Subsequently, explorative 3D3V simulations of enhanced laser-ion acceleration were performed on the Swiss supercomputer Piz Daint. There, the inclusion of realistic laser contrast conditions altered the intra-pulse dynamics of the acceleration process significantly. In contrast to a perfect Gaussian pulse, a better spatio-temporal overlap of the protons with the origin of the electron sheath allowed full exploitation of the accelerating potential, leading to higher maximum energies. Adapting well-known analytic models made it possible to match the results qualitatively and, in chosen cases, quantitatively. Although the complex 3D plasma dynamics are not reflected in the 1D models, the upper limit of ion acceleration performance within the TNSA scenario can be predicted remarkably well. Radiation signatures obtained from synthetic diagnostics of electrons, protons, and bremsstrahlung photons show that the target state at maximum laser intensity is encoded in them, previewing how experiments may gain insight into this previously unobservable time frame. Furthermore, as X-ray Free-Electron Laser facilities have only recently begun to allow observations at femtosecond-nanometer scales, benchmarking the physics models for solid-density plasma simulations is now within reach. Finally, this thesis presents the first start-to-end simulations of optical-pump, X-ray-probe laser-solid interactions with the photon scattering code ParaTAXIS. The associated PIC simulations guided the planning and execution of an LCLS experiment, demonstrating the first observation of a solid-density plasma distribution driven by near-relativistic short laser pulses at femtosecond-nanometer resolution. / Establishing precise control over the unique beam parameters of laser-accelerated ions from relativistic ultra-short pulse laser-solid interactions has been an essential goal of the past 20 years. While the spatio-temporal coupling of laser pulse and target parameters creates transient phenomena on femtosecond and nanometer scales that are decisive for the acceleration process, these scales have so far been largely inaccessible to experimental observation. Computer simulations of laser-driven plasmas provide valuable insight into the underlying physics. Nevertheless, predictive capabilities are still lacking because of the massive computational effort required to perform parameter studies in 3D at high resolution for longer simulation times. This work investigates the optimal acceleration of protons from ultra-thin foils after interaction with an ultra-short, ultra-high-intensity laser pulse, taking into account realistic contrast conditions up to one picosecond before the main pulse. Newly implemented, advanced ionization methods for the highly scalable, open-source particle-in-cell code PIConGPU now make studies of this kind possible. In support of two experimental campaigns, these methods led to a deeper understanding of laser-wakefield acceleration and of the melting of colloidal crystals, respectively, since experimental observations could now be explained with simulated ionization and plasma dynamics. Subsequently, explorative 3D3V simulations of improved laser-ion acceleration are presented, carried out on the Swiss supercomputer Piz Daint. There, the inclusion of realistic laser contrast conditions significantly changed the intra-pulse dynamics of the acceleration process. In contrast to a perfect Gaussian pulse, a better spatio-temporal overlap of the protons with the origin of the electron sheath allowed full exploitation of the accelerating potential, leading to higher maximum energies. The adaptation of well-known analytic models made it possible to confirm the results qualitatively and, in selected cases, also quantitatively. Despite the complex 3D plasma dynamics not being reflected in the 1D models, the prediction captures remarkably well the upper limit of achievable ion energies in the TNSA scenario. Radiation signatures obtained from synthetic diagnostics of electrons, protons, and bremsstrahlung photons show that the target state at maximum laser intensity is encoded in them, giving an outlook on how experiments may gain insight into this previously unobservable time window. With new X-ray free-electron lasers, observations on femtosecond-nanometer scales have finally become accessible. Benchmarking of the physics models for solid-density plasma simulations is thus now within reach, but experiments are still rare, complex, and difficult to interpret. Therefore, this work finally presents the first start-to-end simulations of pump-probe interactions of an optical and an X-ray laser with solids using the photon scattering code ParaTAXIS. Furthermore, the associated PIC simulations served as the basis for planning and carrying out an LCLS experiment for the first observation of a solid-density plasma driven by near-relativistic short laser pulses, with a resolution reaching down to femtoseconds and nanometers.
765

System for Collision Detection Between Deformable Models Built on Axis Aligned Bounding Boxes and GPU Based Culling

Tuft, David Owen 12 January 2007 (has links) (PDF)
Collision detection between deforming models is a difficult problem for collision detection systems to handle. The problem becomes even harder when deformations are unconstrained, objects are in close proximity to one another, and the entity count is high. We propose a method to perform collision detection between multiple deforming objects with unconstrained deformations that gives good results at close proximity. Currently no systems exist that achieve good performance on both unconstrained triangle-level deformations and deformations that preserve edge connectivity. We propose a new system built as a combination of Graphics Processing Unit (GPU) based culling and Axis-Aligned Bounding Box (AABB) based culling. Techniques for performing hierarchy-less GPU-based culling are given. We then discuss how and when to switch between GPU-based culling and AABB-based techniques.
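
To make the AABB side of the culling concrete, the Python sketch below shows the axis-interval overlap test and a naive broad phase that returns candidate pairs for exact testing. It is a minimal standalone illustration; the quadratic pairing loop stands in for the hierarchy-less GPU culling and is not the thesis' implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class AABB:
    """Axis-aligned bounding box given by its min and max corners."""
    min_x: float
    min_y: float
    min_z: float
    max_x: float
    max_y: float
    max_z: float

def overlaps(a: AABB, b: AABB) -> bool:
    """Two AABBs intersect only if their intervals overlap on every axis."""
    return (a.min_x <= b.max_x and b.min_x <= a.max_x and
            a.min_y <= b.max_y and b.min_y <= a.max_y and
            a.min_z <= b.max_z and b.min_z <= a.max_z)

def broad_phase(boxes: List[AABB]) -> List[Tuple[int, int]]:
    """Naive O(N^2) broad phase: return index pairs whose boxes overlap and
    therefore need an exact (narrow-phase) triangle test."""
    pairs = []
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if overlaps(boxes[i], boxes[j]):
                pairs.append((i, j))
    return pairs

print(broad_phase([AABB(0, 0, 0, 1, 1, 1),
                   AABB(0.5, 0.5, 0.5, 2, 2, 2),
                   AABB(3, 3, 3, 4, 4, 4)]))   # -> [(0, 1)]
```
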
766

An Optical Flow Implementation Comparison Study

Bodily, John M. 12 March 2009 (has links) (PDF)
Optical flow is the apparent motion of brightness patterns within an image scene. Algorithms that calculate the optical flow for a sequence of images are useful in a variety of applications, including motion detection and obstacle avoidance. Typical optical flow algorithms are computationally intensive and run slowly when implemented in software, which is problematic since many potential applications require real-time calculation to be useful. To increase the performance of the calculation, optical flow has recently been implemented on FPGA and GPU platforms. These devices are able to process optical flow in real time, but are generally less accurate than software solutions. For this thesis, two different optical flow algorithms have been implemented to run on a GPU using NVIDIA's CUDA SDK. Previous FPGA implementations of the algorithms exist and are used to compare the FPGA and GPU devices for the optical flow calculation. The first algorithm calculates optical flow using 3D gradient tensors and is able to process 640x480 images at about 238 frames per second with an average angular error of 12.1 degrees when run on a GeForce 8800 GTX GPU. The second algorithm uses increased smoothing and a ridge regression calculation to produce a more accurate result. It reduces the average angular error by about 2.3x, but the additional computational complexity of the algorithm also reduces the frame rate by about 1.5x. Overall, the GPU outperforms the FPGA in frame rate and accuracy, but requires much more power and is not as flexible. The most significant advantage of the GPU is the reduced design time and effort needed to implement the algorithms, with the FPGA designs requiring 10x to 12x the effort.
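
The accuracy figures above are reported as average angular error; a minimal Python sketch of that metric (each flow vector treated as a 3D direction (u, v, 1), following the common Barron et al. definition) is given below. The synthetic flow fields in the usage example are made up purely for illustration.

```python
import numpy as np

def average_angular_error(u_est, v_est, u_gt, v_gt):
    """Average angular error (degrees) between estimated and ground-truth
    optical flow fields, comparing the 3D directions (u, v, 1)."""
    num = u_est * u_gt + v_est * v_gt + 1.0
    den = np.sqrt(u_est**2 + v_est**2 + 1.0) * np.sqrt(u_gt**2 + v_gt**2 + 1.0)
    angles = np.arccos(np.clip(num / den, -1.0, 1.0))
    return np.degrees(angles.mean())

# Synthetic 640x480 example: ground truth moves 1 px right, estimate is noisy.
rng = np.random.default_rng(0)
u_gt = np.full((480, 640), 1.0)
v_gt = np.zeros((480, 640))
u_est = u_gt + rng.normal(0, 0.2, u_gt.shape)
v_est = v_gt + rng.normal(0, 0.2, v_gt.shape)
print(average_angular_error(u_est, v_est, u_gt, v_gt))
```
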
767

Global Illumination on Modern GPUs

Zhang, Fan January 2022 (has links)
This thesis implements Monte Carlo path tracing and voxel cone tracing for global illumination on the GPU and compares their performance and visual results. The Monte Carlo path tracing algorithm is implemented in CUDA to perform the computation in parallel on the GPU and accelerate it. Voxel cone tracing, a global illumination algorithm for real-time rendering, runs in OpenGL through the GPU graphics pipeline. The results show that Monte Carlo path tracing takes over 10 hours on a single CPU core and around 4 hours with 4 cores; on the GPU it takes around 48 minutes, while voxel cone tracing on the same GPU takes 2 ms. The image generated by Monte Carlo path tracing contains far more transparency, reflection, and shadow detail than the one produced by the voxel cone tracing algorithm. / The thesis work was carried out at the Department of Science and Technology (ITN), Faculty of Science and Engineering, Linköping University.
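
To make the Monte Carlo side of the comparison concrete, the sketch below shows cosine-weighted hemisphere sampling and the resulting irradiance estimator, the kind of importance sampling a diffuse path tracer relies on. It is a minimal standalone illustration in Python, not code from the thesis; the sample count and the constant test environment are assumptions.

```python
import numpy as np

def cosine_weighted_hemisphere_sample(rng):
    """Sample a direction on the unit hemisphere around +z with pdf = cos(theta)/pi."""
    u1, u2 = rng.random(), rng.random()
    r, phi = np.sqrt(u1), 2.0 * np.pi * u2
    return np.array([r * np.cos(phi), r * np.sin(phi), np.sqrt(1.0 - u1)])

def estimate_irradiance(radiance_fn, n_samples=1024, seed=0):
    """Monte Carlo estimate of E = integral of L(w) * cos(theta) over the hemisphere.
    With cosine-weighted sampling the cos/pdf terms cancel to a factor of pi."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_samples):
        w = cosine_weighted_hemisphere_sample(rng)
        total += radiance_fn(w)
    return np.pi * total / n_samples

# A constant environment of radiance 1.0 has irradiance exactly pi,
# so the estimate should converge toward ~3.14159.
print(estimate_irradiance(lambda w: 1.0))
```
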
768

Establishing Effective Techniques for Increasing Deep Neural Networks Inference Speed / Etablering av effektiva tekniker för att öka inferenshastigheten i djupa neurala nätverk

Sunesson, Albin January 2017 (has links)
A recent trend in deep learning research is to build ever deeper networks (i.e., increase the number of layers) to solve real-world classification/optimization problems. This introduces challenges for applications with latency requirements. The problem arises from the amount of computation that needs to be performed for each evaluation, and it is addressed by reducing the inference time. In this study we analyze two different methods for speeding up the evaluation of deep neural networks. The first method reduces the number of weights in a convolutional layer by decomposing its convolutional kernel. The second method lets samples exit the network through early exit branches when classifications are certain. Both methods were evaluated on several network architectures with consistent results. Convolutional kernel decomposition shows a 20-70% speed-up with no more than a 1% loss in classification accuracy in the setups evaluated. Early exit branches show up to a 300% speed-up with no loss in classification accuracy when evaluated on CPUs. / The trend of recent years in deep learning has been to add more and more layers to neural networks. This introduces new challenges in latency-dependent applications. The problem arises from the amount of computation that must be performed at each evaluation, which is addressed by reducing the inference time. I analyze two different methods for speeding up the evaluation of deep neural networks. The first method reduces the number of weights in a convolutional layer via a tensor decomposition of its kernel. The second method lets samples leave the network via early branches when a classification is certain. Both methods are evaluated on several network architectures with consistent results. Decomposition of the convolutional kernel shows a 20-70% speed increase with less than 1% degradation of classification accuracy in the evaluated configurations. Early branches show up to a 300% speed increase with no degradation of classification accuracy when evaluated on a CPU.
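
A minimal sketch of the early-exit idea is shown below: an input propagates through the network stage by stage and returns as soon as an auxiliary classifier is confident enough. The stage shapes, confidence threshold, and random weights are assumptions for illustration only, not the architectures or settings studied in the thesis.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def early_exit_inference(x, branches, threshold=0.9):
    """Propagate x through (stage_fn, classifier_fn) pairs in order; return a
    prediction as soon as an intermediate classifier is confident enough.
    (Illustrative sketch: the 0.9 threshold is an assumed value.)"""
    h = x
    for stage_fn, classifier_fn in branches:
        h = stage_fn(h)                     # forward through this block of layers
        probs = softmax(classifier_fn(h))   # cheap auxiliary classifier
        if probs.max() >= threshold:        # confident enough: skip remaining layers
            return int(probs.argmax()), float(probs.max())
    return int(probs.argmax()), float(probs.max())  # fall through to the last exit

# Toy usage with random linear stages and 10-class classifiers (assumed shapes).
rng = np.random.default_rng(0)
stages = [(lambda h, W=rng.normal(size=(16, 16)): np.tanh(W @ h),
           lambda h, C=rng.normal(size=(10, 16)): C @ h)
          for _ in range(3)]
print(early_exit_inference(rng.normal(size=16), stages))
```
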
769

An Approach For Computing Intervisibility Using Graphical Processing Units

Tracy, Judd 01 January 2004 (has links)
In large-scale entity-level military force-on-force simulations it is essential to know when one entity can visibly see another entity. This visibility determination plays an important role in the simulation and can affect its outcome. When virtual Computer Generated Forces (CGF) are introduced into the simulation, these intervisibilities must be calculated by the virtual entities on the battlefield. But as the simulation size increases, so does the complexity of calculating visibility between entities. This thesis presents an algorithm for performing these visibility calculations using Graphical Processing Units (GPUs) instead of the Central Processing Units (CPUs) that have traditionally been used in CGF simulations. The algorithm can be distributed across multiple GPUs in a cluster, and its scalability exceeds that of CGF-based algorithms. The poor correlation between the two visibility algorithms is demonstrated, showing that the GPU algorithm provides a necessary condition for a "Fair Fight" when paired with visual simulations.
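
The per-pair question the simulation must answer can be illustrated with a simple heightfield line-of-sight check, sketched below in Python. This is a CPU illustration only; the sampling count, eye height, and grid-based terrain are assumptions, and the GPU algorithm developed in the thesis works differently.

```python
import numpy as np

def line_of_sight(height_map, a, b, eye_height=2.0, samples=200):
    """Return True if an entity at grid cell `a` can see an entity at `b`,
    i.e. the sight line stays above the terrain at every sampled point.
    (Illustrative CPU sketch; parameters are assumed values.)"""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    h_a = height_map[int(a[0]), int(a[1])] + eye_height
    h_b = height_map[int(b[0]), int(b[1])] + eye_height
    for t in np.linspace(0.0, 1.0, samples):
        p = (1.0 - t) * a + t * b              # point along the sight line
        ray_h = (1.0 - t) * h_a + t * h_b      # interpolated line height
        if height_map[int(p[0]), int(p[1])] > ray_h:
            return False                        # terrain blocks the view
    return True

terrain = np.random.default_rng(0).random((128, 128)) * 10.0
print(line_of_sight(terrain, (5, 5), (100, 120)))
```
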
770

Procedural Natural Texture Generation on a Global Scale

Pohl Lundgren, Anna January 2023 (has links)
This Master's thesis investigates the application of dynamically generated procedural terrain textures for texturing 3D representations of the Earth's surface. The study explores techniques to overcome limitations of the currently most common method – projecting satellite imagery onto the mesh – such as insufficient resolution for close-up views and challenges in accommodating external lighting models. Textures for sand, rock, and grass were generated procedurally on the GPU. Aliasing was prevented using a clamping technique that dynamically changes the level of detail when freely navigating across diverse landscapes. The general color of each terrain type was extracted from the satellite images, guided by land cover rasters, in a process where shadows were eliminated using HSV color space conversion and filtering. The procedurally generated textures provide significantly more detail than the satellite images in close-up views, but miss some information in medium- to far-distance views, because the satellite images contain information that the 3D mesh lacks. A qualitative analysis spanning six data sets from diverse global locations demonstrates that the proposed methods are applicable across a range of landscapes and climates.
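
As an illustration of the shadow-elimination step described above, the sketch below flags shadowed satellite pixels by their HSV value before averaging a terrain class' representative color. The value threshold and the per-pixel loop are assumptions chosen for clarity, not the thesis' tuned GPU pipeline.

```python
import colorsys
import numpy as np

def mean_color_without_shadows(rgb_pixels, value_threshold=0.35):
    """Estimate a terrain class' representative color from satellite pixels,
    ignoring shadowed ones. A pixel counts as shadow when its HSV value falls
    below the (assumed) threshold. rgb_pixels: (N, 3) floats in [0, 1]."""
    keep = []
    for r, g, b in rgb_pixels:
        h, s, v = colorsys.rgb_to_hsv(r, g, b)
        if v >= value_threshold:           # bright enough, so not shadow
            keep.append((r, g, b))
    if not keep:                           # everything shadowed: fall back to all pixels
        keep = rgb_pixels
    return np.mean(np.asarray(keep), axis=0)

pixels = np.random.default_rng(0).random((500, 3))
print(mean_color_without_shadows(pixels))
```
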
