121

Seismic Data Compression and GPU Memory Latency

Haugen, Daniel January 2009 (has links)
The gap between processing performance and memory bandwidth is still increasing. Various techniques have been used to compensate for this gap, such as a memory hierarchy with faster memory closer to the processing unit. Other techniques that have been tested include compressing data prior to a memory transfer. Bandwidth limitations exist not only at low levels within the memory hierarchy, but also between the central processing unit (CPU) and the graphics processing unit (GPU), suggesting the use of compression to mask the gap. Seismic datasets are often very large, e.g. several terabytes. This thesis explores compression of seismic data to hide the bandwidth limitation between the CPU and the GPU for seismic applications. The compression method considered is subband coding, with both run-length encoding (RLE) and Huffman encoding as compressors of the quantized data. These methods have been shown in CPU implementations to give very good compression ratios for seismic data. A proof-of-concept implementation for decompression of seismic data on GPUs is developed. It consists of three main components: first, the subband synthesis filter, which reconstructs the input data processed by the subband analysis filter; second, the inverse quantizer, which generates an output close to the input given to the quantizer; and finally, the decoders, which decompress the compressed data using Huffman and RLE. The results of our implementation show that the seismic data compression algorithm investigated is probably not suited to hiding the bandwidth limitation between CPU and GPU, because the steps taken to do the decompression are likely slower than a simple memory copy of the uncompressed seismic data. It is primarily the decompressors that are the limiting factor, but in our implementation the subband synthesis is also limiting. The sequential nature of the decompression algorithms used makes them difficult to parallelize to make efficient use of the processing units on the GPUs. Several suggestions for future work are given, as well as results showing how our GPU implementation can be very useful for compressing data to be sent over a network. Our compression results give a compression factor between 27 and 32, and an SNR of 24.67 dB for a cube of dimension 64^3. A speedup of 2.5 for the synthesis filter compared to the CPU implementation is achieved (2029.00/813.76 ≈ 2.5). Although not currently suited for GPU-CPU compression, our implementations indicate
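As an illustration of the run-length decoding stage described in the abstract (not the thesis' actual code), a minimal decoder might look like this in Python; the (value, run_length) pair format is an assumption:

```python
def rle_decode(pairs):
    """Expand (value, run_length) pairs produced by a run-length encoder
    back into the flat stream of quantized subband coefficients.
    The pair format is an illustrative assumption, not the thesis format."""
    out = []
    for value, run_length in pairs:
        # The output offset of each run depends on every preceding run
        # length: this sequential dependency is what makes the decoder
        # hard to parallelize efficiently on a GPU.
        out.extend([value] * run_length)
    return out

print(rle_decode([(0, 3), (7, 1)]))  # -> [0, 0, 0, 7]
```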
122

Simulation of Fluid Flow Through Porous Rocks on Modern GPUs

Aksnes, Eirik Ola January 2009 (has links)
It is important for the petroleum industry to investigate how fluids flow inside the complicated geometries of porous rocks, in order to improve oil production. The lattice Boltzmann method can be used to calculate a porous rock's ability to transport fluids (permeability). However, this method is computationally intensive and hence calls for High Performance Computing (HPC). Modern GPUs are becoming interesting and important platforms for HPC. In this thesis, we show how to implement the lattice Boltzmann method on modern GPUs using the NVIDIA CUDA programming environment. Our work is done in collaboration with Numerical Rocks AS and the Department of Petroleum Engineering at the Norwegian University of Science and Technology. To better evaluate our GPU implementation, a sequential CPU implementation is first prepared. We then develop our GPU implementation and test both implementations using three porous data sets with known permeabilities provided by Numerical Rocks AS. Our fluid flow simulations achieve high performance on modern GPUs, showing that it is possible to calculate the permeability of porous rocks at simulation sizes up to 368^3, which fit into the 4 GB memory of the NVIDIA Quadro FX 5800 card. The performance of the CPU and GPU implementations is measured in MLUPS (million lattice node updates per second). Both implementations achieve their highest performance using single floating-point precision, with maximum performances of 1.59 MLUPS and 184.30 MLUPS, respectively. Techniques for reducing round-off errors are also discussed and implemented.
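To make the MLUPS metric quoted above concrete, here is a minimal sketch of how it is computed; the lattice size, step count and timing below are placeholder numbers, not measurements from the thesis:

```python
def mlups(lattice_dims, time_steps, elapsed_seconds):
    """Million Lattice node Updates Per Second, the performance metric
    used to compare the CPU and GPU implementations."""
    nodes = 1
    for d in lattice_dims:
        nodes *= d
    return nodes * time_steps / (elapsed_seconds * 1e6)

# Hypothetical run: a 64^3 lattice advanced 1000 time steps in 10 seconds.
print(round(mlups((64, 64, 64), 1000, 10.0), 2))  # -> 26.21
```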
123

Tetrahedral mesh for needle insertion

Syvertsen, Rolf Anders January 2007 (has links)
This is a Master's thesis on how to make a tetrahedral mesh for use in a needle insertion simulator. It also describes how the simulator can be built, and how to improve it to make it as realistic as possible. The medical simulator uses a haptic device, a haptic scene graph and a FEM for realistic soft tissue deformation and interaction. In this project a tetrahedral mesh is created from a polygon model, and the mesh has then been loaded into the HaptX haptic scene graph. The objects in the mesh have been made as separate haptic objects, and each has been given a simple haptic surface so that it can be touched. No code has been implemented for the Hybrid Condensed FEM that is described.
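As a generic illustration of working with a tetrahedral mesh in this kind of simulator (not code from the thesis), barycentric coordinates can be used to test whether a point, such as a needle tip, lies inside a tetrahedral element; a minimal sketch, assuming NumPy arrays for the vertices:

```python
import numpy as np

def barycentric_coords(p, a, b, c, d):
    """Barycentric coordinates of point p with respect to the tetrahedron
    (a, b, c, d). Useful e.g. for locating a needle tip inside a mesh
    element; this is a generic sketch, not the thesis' implementation."""
    T = np.column_stack((b - a, c - a, d - a))
    l1, l2, l3 = np.linalg.solve(T, p - a)
    return np.array([1.0 - l1 - l2 - l3, l1, l2, l3])

def contains(p, a, b, c, d, eps=1e-9):
    """True if p lies inside (or on the boundary of) the tetrahedron."""
    return bool(np.all(barycentric_coords(p, a, b, c, d) >= -eps))
```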
124

Seismic processing using Parallel 3D FMM

Borlaug, Idar January 2007 (has links)
This thesis develops and tests a parallel 3D Fast Marching Method (FMM) algorithm and applies it to seismic simulations. The FMM is a general method for monotonically advancing fronts, originally developed by Sethian. It calculates the first arrival time for an advancing front or wave. FMM methods are used for a variety of applications, including fatigue cracks in materials, lymph node segmentation in CT images, computing skeletons and centerlines in 3D objects, and finding salt formations in seismic data. Finding salt formations in seismic data is important for the oil industry: oil often flows towards gaps in the soil below a salt formation. It is therefore important to map the edges of the salt formation, and for this the FMM can be used. The FMM creates a first-arrival-time map, which makes it easier to see the edges of the salt formation. Herrmann developed a parallel 3D FMM algorithm tested on waves of constant velocity. We implemented and tested his algorithm, but since seismic data typically cause a large variation in velocities, optimizations were needed to make this algorithm scale. By optimizing the border exchange and eliminating many of the rollbacks, we developed and implemented a much improved parallel 3D FMM which achieved close to theoretical performance for up to at least 256 nodes on the current supercomputer at NTNU. Other methods, such as different domain decompositions for better load balancing and running more FMM picks simultaneously, are also discussed.
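As a heavily simplified sketch of the first-arrival idea (the real FMM solves an upwind quadratic at each accepted grid point, and the thesis parallelizes the method across compute nodes), a Dijkstra-like single-node version on a 3D velocity grid might look like this:

```python
import heapq
import numpy as np

def first_arrival(velocity, source):
    """Compute a first-arrival-time map on a 3D grid by always accepting
    the node with the smallest tentative time. Relaxing along the six axis
    neighbours with unit spacing is a simplification of the true FMM
    eikonal update, used here only to illustrate the algorithm structure."""
    t = np.full(velocity.shape, np.inf)
    t[source] = 0.0
    heap = [(0.0, source)]
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
               (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    while heap:
        time, (i, j, k) = heapq.heappop(heap)
        if time > t[i, j, k]:
            continue  # stale heap entry, already accepted with smaller time
        for di, dj, dk in offsets:
            n = (i + di, j + dj, k + dk)
            if all(0 <= n[a] < velocity.shape[a] for a in range(3)):
                cand = time + 1.0 / velocity[n]  # travel time across one cell
                if cand < t[n]:
                    t[n] = cand
                    heapq.heappush(heap, (cand, n))
    return t

# Example: constant-velocity cube with the source in one corner.
times = first_arrival(np.ones((32, 32, 32)), (0, 0, 0))
```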
125

Modelling fibre orientation of the left ventricular human heart wall

Siem, Knut Vidar Løvøy January 2007 (has links)
The purpose of this thesis is to obtain and represent the orientation of the muscle fibres in the left ventricular wall of the human heart. The orientation of these fibres varies continuously through the wall. This report features an introduction to the human heart and medical imaging techniques. Attention is gradually drawn to concepts in computer science, and how they can help us get a “clearer picture” of the internals of perhaps the most important organ in the human body. A highly detailed Magnetic Resonance Imaging data set of the left ventricle cavity is used as a basis for the analysis with 3-D morphological transformations. In addition, a 3-D extension of the Hough transformation is developed; this does not seem to have been done before. An attempt is made to obtain the general trend of the trabeculae carneae, as it is believed that this is the orientation of the inner-most muscle fibres of the heart wall. Suggestions for further work include refinement of the proposed 3-D Hough transformation to yield lines that can be used as guides for parametric curves. A brief introduction to Diffusion Tensor Magnetic Resonance Imaging is also given.
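As a small illustration of the kind of 3-D morphological transformation mentioned above (not the thesis' own pipeline), a binary opening of a volume can be performed with scipy.ndimage; the random stand-in volume, threshold and structuring element are placeholder choices:

```python
import numpy as np
from scipy import ndimage

# Stand-in binary volume; in practice this would be a thresholded MRI volume.
volume = np.random.rand(64, 64, 64) > 0.7

# 6-connected structuring element, then an opening (erosion followed by
# dilation) to remove small protrusions while keeping larger structures.
structure = ndimage.generate_binary_structure(3, 1)
opened = ndimage.binary_opening(volume, structure=structure, iterations=2)
```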
126

Conversational CBR for Improved Patient Information Acquisition

Marthinsen, Tor Henrik Aasness January 2007 (has links)
In this thesis we describe our study of two knowledge-intensive Conversational Case-Based Reasoning (CCBR) systems and their methods. We look in particular at the way they have solved inferencing and question ranking. We then continue with a description of our own design for a CCBR system that will help patients share their experiences of drug side effects with other patients. We describe how we create cases, how our question selection methods work, and present an example of how the domain model will look. A simulation of how a dialogue would proceed for a patient is also included. The design we have created is a good basis for implementing a knowledge-intensive CCBR system. The system should work better than a normal CCBR system because of the inferencing and question ranking methods, which should lessen the cognitive load on the user and require fewer questions to be answered to reach a good solution.
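Question ranking in CCBR systems is often done with a measure such as information gain over the remaining candidate cases; the sketch below illustrates that generic idea and is not necessarily the ranking method used in the thesis (the case schema is an assumption):

```python
import math
from collections import Counter

def entropy(outcomes):
    """Shannon entropy of a list of case outcomes."""
    counts = Counter(outcomes)
    total = len(outcomes)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def rank_questions(cases, questions):
    """Rank unanswered questions by information gain over the candidate
    cases. Each case is assumed to be a dict mapping question -> answer
    plus an 'outcome' key; this schema is illustrative only."""
    base = entropy([c["outcome"] for c in cases])
    gains = {}
    for q in questions:
        split = {}
        for c in cases:
            split.setdefault(c.get(q), []).append(c["outcome"])
        remainder = sum(len(v) / len(cases) * entropy(v) for v in split.values())
        gains[q] = base - remainder
    # Questions that discriminate best between outcomes come first.
    return sorted(questions, key=lambda q: gains[q], reverse=True)
```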
127

Real-Time Simulation and Visualization of Large Sea Surfaces

Løset, Tarjei Kvamme January 2007 (has links)
The open ocean is the setting for enterprises that require extensive monitoring, planning and training. In the offshore industry, virtual environments have been embraced to improve such processes. The presented work focuses on real-time simulation and visualization of open seas. This implies very large water surfaces dominated by wind-driven waves, but also influenced by the presence of watercraft activity and offshore installations. The implemented system treats sea surfaces as periodic elevation fields, obtained by synthesis from statistically sampled frequency spectra. Apparent repeating structures across a surface, due to this periodic nature, are avoided by decomposing the elevation field synthesis, using two or more discrete spectra with different frequency scales. A GPU-based water solver is also included. Its implementation features a convenient input interface, which exploits hardware rasterization both for efficiency and to supply the algorithm with arbitrary data, e.g. smooth, connected deflective paths. Finally, polygonal representations of visible ocean regions are obtained using a GPU-accelerated tessellation scheme suitable for wave fields. The result is realistic, unbounded ocean surfaces with natural distributions of wind-driven waves, avoiding the artificial periodicity associated with previous similar techniques. Further, the simulation allows for superposed boat wakes and surface obstacles in regions of interest. With the proposed tessellation scheme, the visualization is economical with regard to data transfer, in keeping with the goal of delivering highly interactive rendering rates.
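As a minimal sketch of the spectral synthesis described above (filtering white noise with a wave spectrum and inverse-FFT'ing to a periodic elevation field), assuming the classic Phillips spectrum and placeholder wind and patch parameters rather than the thesis' actual settings:

```python
import numpy as np

def phillips(kx, ky, wind=(30.0, 0.0), amplitude=1.0, g=9.81):
    """Phillips spectrum, a standard statistical model for wind-driven
    waves. Parameter values here are placeholders."""
    k2 = kx**2 + ky**2
    k2 = np.where(k2 == 0, 1e-12, k2)           # avoid division by zero at DC
    L = (wind[0]**2 + wind[1]**2) / g           # largest wave from wind speed
    wdir = np.array(wind) / np.hypot(*wind)
    cos = (kx * wdir[0] + ky * wdir[1]) / np.sqrt(k2)
    return amplitude * np.exp(-1.0 / (k2 * L**2)) / k2**2 * cos**2

def height_field(n=256, patch=1000.0, seed=0):
    """Synthesize one periodic elevation field by shaping complex white
    noise with the spectrum and inverse-FFT'ing. A sketch of the idea,
    not the thesis code."""
    rng = np.random.default_rng(seed)
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=patch / n)
    kx, ky = np.meshgrid(k, k)
    noise = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    spectrum = noise * np.sqrt(phillips(kx, ky) / 2.0)
    return np.fft.ifft2(spectrum).real * n * n  # periodic height map
```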
128

Implementing LOD for physically-based real-time fire rendering

Tangvald, Lars January 2007 (has links)
In this paper, I present a framework for implementing level of detail (LOD) for a 3D physically based fire rendering running on the GPU. While realistic fire rendering that runs in real time exists, it is generally not used in real-time applications such as games, due to the high cost of running such a rendering. Most research into the rendering of fire is concerned only with the fire itself, and not with how it can best be included in larger scenes with a multitude of other complex objects. I present methods for increasing the efficiency of a physically based fire rendering without harming its visual quality, by dynamically adjusting the detail level of the fire according to its importance for the current view. I adapt and use methods created both for LOD and for other areas to alter the detail level of the visualization and simulation of a fire rendering. The desired detail level is calculated by evaluating conditions such as visibility and distance from the viewpoint, and is then used to adjust the detail level of the visualization and simulation of the fire. The implementation of the framework could not be completed in time, but a number of tests were run to determine the effect of the different methods used. These results indicate that by making adjustments to the simulation and visualization of the fire, large boosts in performance are gained without significantly harming the visual quality of the fire rendering.
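As an illustrative sketch of how a view-dependent detail level might be chosen from visibility, distance and screen coverage (the thresholds and the level scale are assumptions, not the framework's actual values):

```python
def fire_detail_level(distance, screen_coverage, visible, max_level=4):
    """Pick a level of detail for the fire simulation and visualization
    from simple view-dependent conditions, in the spirit of the framework
    described above. All thresholds are illustrative assumptions."""
    if not visible:
        return 0                      # off-screen: cheapest possible update
    # Close or screen-filling fire gets the full simulation resolution.
    if distance < 10.0 or screen_coverage > 0.25:
        return max_level
    if distance < 50.0:
        return max_level - 1
    if distance < 200.0:
        return 2
    return 1

# Example: a distant, barely visible fire gets a coarse detail level.
print(fire_detail_level(distance=300.0, screen_coverage=0.01, visible=True))
```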
129

Segmentation of Medical Images Using CBR

Rieck, Christian Marshall January 2007 (has links)
This paper describes a case-based reasoning system that is used to guide the parameters of a segmentation algorithm. Instead of using a fixed set of parameters that gives the best average result over all images, the parameters are tuned to maximize the score for each image separately. The system's foundation is a set of 20 cases, each containing one 3D MRI image and the parameters needed for its optimal segmentation. When a new image is presented to the system, a new case is generated and compared to the other cases based on image similarity. The parameters from the best matching case are then used to segment the new image. The key issue is the use of an iterative approach that lets the system adapt the parameters to suit the new image better, if necessary. Each iteration contains a segmentation and a revision of the result, and this is repeated until the system approves the result. The revision is based on metadata stored in each case to check whether the result has the expected properties as defined by the case. The results show that case-based reasoning and segmentation can be combined successfully within image processing, both for choosing a good set of starting parameters and for using case-specific knowledge to guide their adaptation. A set of challenges for future research is identified and discussed at length.
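The retrieve-segment-revise loop described above can be sketched as follows; the function arguments stand in for the system's actual similarity measure, segmentation algorithm and revision test, and are assumptions rather than the thesis' interfaces:

```python
def segment_with_cbr(image, case_base, similarity, segment, acceptable, adapt,
                     max_iterations=5):
    """Retrieve the parameters of the most similar stored case, segment the
    new image with them, and keep adapting the parameters until the result
    meets the case's expected properties. A sketch under the assumption
    that cases are dicts with 'image', 'parameters' and 'metadata' keys."""
    best_case = max(case_base, key=lambda c: similarity(image, c["image"]))
    params = dict(best_case["parameters"])
    result = segment(image, params)
    for _ in range(max_iterations):
        if acceptable(result, best_case["metadata"]):
            break  # revision step approves the result, stop iterating
        params = adapt(params, result, best_case["metadata"])
        result = segment(image, params)
    return result, params
```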
130

Improving the Performance of Parallel Applications in Chip Multiprocessors with Architectural Techniques

Jahre, Magnus January 2007 (has links)
Chip Multiprocessors (CMPs) or multi-core architectures are a new class of processor architectures in which multiple processing cores are placed on the same physical chip. To reach the performance potential of these architectures with a single application, the application must be multi-threaded. In such applications, the processing cores cooperate to solve a single task, and in many cases this requires a large amount of inter-processor communication. Consequently, CMPs need to support this communication in an efficient manner. To investigate inter-processor communication in CMPs, a good understanding of the state of the art in CMP design options, interconnect network design and cache coherence protocol solutions is required. Furthermore, a good computer architecture simulator is needed to evaluate both new and conventional architectural solutions. The M5 simulator is used for this purpose and has been extended with a generic split-transaction bus, a crossbar based on the IBM Power 5 crossbar, a butterfly network and an ideal interconnect. The unrealistic ideal interconnect provides an upper bound on the performance improvement available from enhancing the interconnect. In addition, a directory-based coherence protocol proposed by Stenström has been implemented. The performance of 2-, 4- and 8-core CMPs with crossbar and bus interconnects, private L1 caches and shared L2 caches is investigated. The bus and the crossbar are the conventional ways of implementing the L1 to L2 cache interconnect. These configurations have been evaluated with multiprogrammed workloads from the SPEC2000 benchmark suite and parallel, scientific benchmarks from the SPLASH-2 benchmark suite. With multiprogrammed workloads, the crossbar interconnect configurations perform nearly as well as a configuration with an ideal interconnect. However, the performance of the crossbar CMPs is similar to the performance of the bus CMPs when there is intensive L1 to L1 cache communication; the reason is limited L1 to L1 bandwidth. The bus CMPs experience a severe performance degradation with some benchmarks for all processor counts and workload classes. A butterfly interconnect is proposed to alleviate the L1 to L1 communication bottleneck. The butterfly CMP performs on average 3.9 times better than the bus CMP and 3.8 times better than the crossbar CMP when there are 8 processor cores. These numbers are based on the performance of the WaterNSquared, Raytrace, Radix and LUNoncontig benchmarks, because the other SPLASH-2 benchmarks had issues with the M5 thread implementation for these configurations. For the multiprogrammed workloads, the butterfly CMPs are a bit slower than the crossbar CMPs.
