91

Experimentation with inverted indexes for dynamic document collections

Bjørklund, Truls Amundsen January 2007 (has links)
This report aims to assess the efficiency of various inverted indexes when the indexed document collection is dynamic. To achieve this goal, we experiment with three different overall structures: Remerge, hierarchical indexes, and a naive B-tree index. An efficiency model is also developed, and the resulting estimates for each structure are compared to the actual results. We introduce two modifications to existing methods. The first is a new scheme for accumulating an index in memory during sort-based inversion. Even though the memory characteristics of this modified scheme are attractive, our experiments suggest that other proposed methods are more efficient. We also introduce a modification to the hierarchical indexes that makes them more flexible. Tf-idf is used as the ranking scheme in all tested methods, and approximations to it are suggested to make it more efficient in an updatable index. We conclude that in our implementation, the hierarchical index with our suggested modification performs best overall. We also conclude that the tf-idf ranking scheme is ill suited for updatable indexes: its major problem is that it becomes difficult to make documents searchable immediately without sacrificing update speed.
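The update problem with tf-idf stems from idf depending on collection-wide statistics. A minimal sketch below illustrates this; it is not the thesis code, and the particular weighting variant and names are our own assumptions.

    #include <cmath>
    #include <cstdio>
    #include <unordered_map>

    struct Postings {
        int documentFrequency = 0;              // number of docs containing the term
        std::unordered_map<int, int> termFreq;  // docId -> occurrences in that doc
    };

    // One common tf-idf variant; the thesis does not specify which it uses.
    double tfIdf(const Postings& p, int docId, int totalDocs) {
        auto it = p.termFreq.find(docId);
        if (it == p.termFreq.end()) return 0.0;
        double tf  = 1.0 + std::log((double)it->second);
        double idf = std::log((double)totalDocs / p.documentFrequency);
        return tf * idf;
    }

    int main() {
        Postings p;
        p.documentFrequency = 2;
        p.termFreq = {{0, 3}, {1, 1}};
        std::printf("score = %.3f\n", tfIdf(p, 0, 10));
        // Inserting one more matching document changes both the document
        // frequency and the collection size, so every cached score is stale:
        p.documentFrequency = 3;
        std::printf("score after update = %.3f\n", tfIdf(p, 0, 11));
    }

Every insertion invalidates previously computed scores, which is why approximations are needed before the scheme can work in an updatable index.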
92

Tracking for Outdoor Mobile Augmented Reality : Further development of the Zion Augmented Reality Application

Strand, Tor Egil Riegels January 2008 (has links)
This report deals with providing tracking for an outdoor mobile augmented reality system, the Zion Augmented Reality Application (ZionARA), which is meant to display a virtual recreation of a 13th-century castle on the site where it once stood through an augmented reality head-mounted display. Mobile outdoor augmented/mixed reality puts special demands on what kind of equipment is practical. After briefly evaluating the existing tracking methods, a solution based on GPS and an augmented inertial rotation tracker is evaluated further by trying it out in a real setting. While standard unaugmented GNSS trackers are unable to provide the necessary level of accuracy, a differential GPS receiver is found to be capable of delivering good enough coordinates. The final result is a new version of ZionARA that allows a person to walk around the site of the castle and see it as it most likely once stood. Source code and data files for ZionARA are provided on a supplemental disc.
93

Utilizing GPUs for Real-Time Visualization of Snow

Eidissen, Robin January 2009 (has links)
A real-time implementation is achieved, including a GPU-based fluid solver and particle simulation. Snow buildup is implemented on a height-mapped terrain.
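The abstract gives no implementation details, but the buildup step can be pictured as settled particles depositing into a height field. The following is a hypothetical CPU sketch only; all names and constants are ours, and the thesis performs this on the GPU.

    #include <cstdio>
    #include <vector>

    struct Particle { float x, y; };   // a snow particle that has settled

    int main() {
        const int W = 4, H = 4;
        std::vector<float> terrain(W * H, 0.0f);  // base height map
        std::vector<float> snow(W * H, 0.0f);     // accumulated snow depth
        const float deposit = 0.01f;              // assumed depth per particle

        std::vector<Particle> settled = {{0.5f, 0.5f}, {2.3f, 1.7f}};
        for (const Particle& p : settled) {
            int ix = (int)p.x, iy = (int)p.y;     // grid cell the particle hit
            if (ix >= 0 && ix < W && iy >= 0 && iy < H)
                snow[iy * W + ix] += deposit;     // raise the snow layer there
        }
        std::printf("surface height at (2,1): %.3f\n",
                    terrain[1 * W + 2] + snow[1 * W + 2]);
    }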
94

Techniques and Tools for Optimizing Codes on Modern Architectures: : A Low-Level Approach

Jensen, Rune Erlend January 2009 (has links)
This thesis describes novel techniques and test implementations for optimizing numerically intensive codes. Our main focus is on how given algorithms can be adapted to run efficiently on modern microprocessors, exploring several architectural features including instruction selection and the access patterns that arise from having several levels of cache. Our approach is also shown to be relevant for multicore architectures. Our primary target applications are linear algebra routines, in the form of matrix multiply with dense matrices. We analyze how current compilers, microprocessors, and common optimization techniques (like loop tiling and data relocation) interact. A tunable assembly code generator is developed, built, and tested on a basic BLAS level-3 routine to side-step some of the performance issues of modern compilers. Our generator has been tested on both Intel Pentium 4 and Intel Core 2 processors. For the Pentium 4, a 10.8% speed-up is achieved over ATLAS's rank2k, and a 17% speed-up over MKL's implementation for 4000-by-4032 matrices. On the Core 2 we optimize our code for 2000-by-2000 matrices and achieve 24% and 5% speed-ups over ATLAS and MKL, respectively, with our multi-threaded implementation. Decent speed-ups are also shown for other matrix sizes. Considering that our implementation is far from fully tuned, we consider these results very respectable.
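Of the techniques mentioned, loop tiling is the easiest to illustrate. Below is a minimal sketch of a tiled dense matrix multiply, assuming row-major storage and a cache-sized tile; this only illustrates the idea, not the thesis's generated assembly.

    #include <algorithm>
    #include <vector>

    // Tiled dense matrix multiply (C += A * B) so each block of A, B and C
    // stays resident in cache while it is reused.
    void matmulTiled(const std::vector<double>& A, const std::vector<double>& B,
                     std::vector<double>& C, int n, int tile = 64) {
        for (int ii = 0; ii < n; ii += tile)
            for (int kk = 0; kk < n; kk += tile)
                for (int jj = 0; jj < n; jj += tile)
                    // Update one cache-sized block of C from blocks of A and B.
                    for (int i = ii; i < std::min(ii + tile, n); ++i)
                        for (int k = kk; k < std::min(kk + tile, n); ++k) {
                            double a = A[i * n + k];
                            for (int j = jj; j < std::min(jj + tile, n); ++j)
                                C[i * n + j] += a * B[k * n + j];
                        }
    }

    int main() {
        int n = 128;
        std::vector<double> A(n * n, 1.0), B(n * n, 1.0), C(n * n, 0.0);
        matmulTiled(A, B, C, n);
        return C[0] == n ? 0 : 1;   // every entry of C should equal n
    }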
95

MicroRNAs and Transcriptional Control

Skaland, Even January 2009 (has links)
Background: MicroRNAs are small non-coding transcripts that have regulatory roles in the genome. Cis natural antisense transcripts (cis-NATs) are transcripts overlapping a sense transcript at the same locus in the genome, but on the opposite strand. Such antisense transcripts are thought to have regulatory roles, and the hypothesis is that miRNAs might bind to them and thus activate the overlapping sense transcript. Aim of study: Two aims were identified for this project: (1) investigate whether the non-coding transcripts of cis-NATs show significant enrichment for conserved miRNA seed sites, and (2) correlate miRNA expression with expression of the sense side of targeted cis-NAT pairs. Results: Seed sites within such antisense transcripts showed significant enrichment, suggesting that miRNAs might actually bind to them. There is a significant negative correlation between the expression of mir-28 and the expression of its targeted antisense transcripts, whereas the other miRNAs show no significant correlations. Also, the 3’UTRs of the sense side of cis-NAT pairs are longer and more conserved than those of random transcripts. Conclusion: This work has strengthened the hypothesis that miRNAs might bind to such antisense transcripts.
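Aim (2) is a standard correlation analysis. A small sketch of the Pearson correlation underlying it follows; the sample values are invented and merely mimic the reported negative correlation for mir-28, not the thesis's actual data or pipeline.

    #include <cmath>
    #include <cstdio>
    #include <vector>

    // Pearson correlation coefficient between two expression vectors.
    double pearson(const std::vector<double>& x, const std::vector<double>& y) {
        size_t n = x.size();
        double mx = 0, my = 0;
        for (size_t i = 0; i < n; ++i) { mx += x[i]; my += y[i]; }
        mx /= n; my /= n;
        double num = 0, dx2 = 0, dy2 = 0;
        for (size_t i = 0; i < n; ++i) {
            num += (x[i] - mx) * (y[i] - my);
            dx2 += (x[i] - mx) * (x[i] - mx);
            dy2 += (y[i] - my) * (y[i] - my);
        }
        return num / std::sqrt(dx2 * dy2);
    }

    int main() {
        std::vector<double> mir28 = {5.1, 4.2, 6.3, 3.9};  // invented values
        std::vector<double> sense = {2.0, 2.9, 1.4, 3.1};
        std::printf("r = %.3f\n", pearson(mir28, sense));  // negative r
    }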
96

Practical use of Block-Matching in 3D Speckle Tracking

Nielsen, Karl Espen January 2009 (has links)
In this thesis, optimizations for speckle tracking are integrated into an existing framework for real-time tracking of deformable subdivision surfaces, which is employed in the segmentation of the left ventricle (LV) in 3D echocardiography. The main purpose of the project was to optimize the efficiency of material point tracking, leading to a more robust estimation of the LV myocardial deformation field. Block matching is the most time-consuming part of speckle tracking, and the corresponding algorithms used in this thesis are optimized with a Single Instruction Multiple Data (SIMD) model in order to achieve data-level parallelism. The SIMD model is implemented using Streaming SIMD Extensions (SSE) to improve the processing time of the sum of absolute differences, one possible metric for block matching. Furthermore, a study is conducted to optimize parameters associated with speckle tracking with regard to both accuracy and computation time, tested on simulated data sets of infarcted ventricles in 3D echocardiography. More specifically, the tests examine how the size of kernel blocks and search windows affects the accuracy and processing time of the tracking, and compare the performance of kernel blocks specified in Cartesian and beamspace coordinates. Finally, tracking accuracy is measured and compared across different regions (apical, mid-level, and basal segments) of the LV.
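The SSE optimization described maps naturally onto the PSADBW instruction, exposed as the _mm_sad_epu8 intrinsic, which computes sums of absolute differences over 16 unsigned bytes at once. A sketch of that core trick (not the thesis code; the 16-byte-multiple restriction is a simplification):

    #include <cstdint>
    #include <cstdio>
    #include <emmintrin.h>   // SSE2 intrinsics

    // Sum of absolute differences between two byte buffers, 16 bytes per
    // iteration via _mm_sad_epu8. Simplification: n must be a multiple of 16.
    uint32_t sad(const uint8_t* a, const uint8_t* b, int n) {
        __m128i acc = _mm_setzero_si128();
        for (int i = 0; i < n; i += 16) {
            __m128i va = _mm_loadu_si128((const __m128i*)(a + i));
            __m128i vb = _mm_loadu_si128((const __m128i*)(b + i));
            acc = _mm_add_epi64(acc, _mm_sad_epu8(va, vb));  // two 64-bit partials
        }
        return (uint32_t)(_mm_cvtsi128_si64(acc) +                    // low lane
                          _mm_cvtsi128_si64(_mm_srli_si128(acc, 8))); // high lane
    }

    int main() {
        uint8_t a[16], b[16];
        for (int i = 0; i < 16; ++i) { a[i] = (uint8_t)i; b[i] = (uint8_t)(i + 2); }
        std::printf("SAD = %u\n", sad(a, b, 16));   // prints 32
    }

In block matching, this inner SAD is evaluated for every candidate displacement inside the search window, so speeding it up dominates the overall tracking time.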
97

Throughput Computing on Future GPUs

Hovland, Rune Johan January 2009 (has links)
The general-purpose computing capabilities of the Graphics Processing Unit (GPU) have recently been given a great deal of attention by the High-Performance Computing (HPC) community. By allowing massively parallel applications to run efficiently on commodity graphics cards, "personal supercomputers" are now available in desktop versions at a low price. For some applications, speedups of 70 times that of a single-CPU implementation have been achieved. Among the most popular GPUs are those based on the NVIDIA Tesla architecture, which allows relatively easy development of GPU applications using the NVIDIA CUDA programming environment. While the GPU is gaining interest in the HPC community, others are more reluctant to embrace it as a computational device. The focus on throughput and large data volumes separates Information Retrieval (IR) from HPC, since for IR it is critical to process large amounts of data efficiently, a task the GPU currently does not excel at. Only recently has the IR community begun to explore the possibilities, with an implementation of a search engine for the GPU published in April 2009. This thesis analyzes how GPUs can be improved to better suit large-data-volume applications. Current graphics cards have a bottleneck in the transfer of data between the host and the GPU. One approach to resolving this bottleneck is to include the host memory as part of the GPU's memory hierarchy. We develop a theoretical model, and based on it, the expected performance improvement for high-data-volume applications is shown for both computationally bound and data-transfer-bound applications. The performance improvement for an existing search engine is also derived from the theoretical model; for this case, the improvements would result in a speedup between 1.389 and 1.874 for the various query types supported by the search engine.
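The abstract does not give the model itself, but the distinction between computationally bound and transfer-bound applications can be sketched with a simple two-term estimate. All parameter values below are invented for illustration; the thesis derives its own model and numbers.

    #include <algorithm>
    #include <cstdio>

    int main() {
        double bytes     = 4e9;    // data volume moved host -> GPU (assumed)
        double bandwidth = 5e9;    // assumed host-to-GPU bandwidth, bytes/s
        double work      = 2e11;   // operations to perform on the GPU (assumed)
        double rate      = 1e12;   // assumed GPU throughput, operations/s

        double tTransfer = bytes / bandwidth;   // time spent moving data
        double tCompute  = work / rate;         // time spent computing
        // With perfect overlap, total time is the larger term; without, the sum.
        // A transfer-bound application gains most from removing the transfer
        // bottleneck, e.g. by treating host memory as part of the GPU hierarchy.
        double overlapped = std::max(tTransfer, tCompute);
        double serial     = tTransfer + tCompute;
        std::printf("transfer-bound: %s, overlap speedup: %.2fx\n",
                    tTransfer > tCompute ? "yes" : "no", serial / overlapped);
    }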
98

Seismic Shot Processing on GPU

Johansen, Owe January 2009 (has links)
Today's petroleum industry demands an ever increasing amount of computational resources. Seismic processing applications used by such companies have generally run on large clusters of compute nodes whose only computing resource was the CPU. However, using Graphics Processing Units (GPUs) for general-purpose programming is becoming increasingly popular in high-performance computing. Since NVIDIA launched its framework for developing GPU computational algorithms, the Compute Unified Device Architecture (CUDA), in 2007, a wide variety of research areas have adopted it. This thesis looks at the applicability of GPU techniques and CUDA for off-loading some of the computational workload of a seismic shot modeling application, provided by StatoilHydro, to modern GPUs. This work builds on our recent project that provided checkpoint restart for this MPI-enabled shot modeling application. In this thesis, we demonstrate that the inherent data parallelism in the core finite-difference computations also makes the application well suited for GPU acceleration. Using CUDA, we show that we could port our application efficiently and, through further refinements, achieve significant performance increases. Benchmarks done on two different systems in the NTNU IDI (Department of Computer and Information Science) HPC-lab are included. One system is an Intel Core2 Quad Q9550 @ 2.83 GHz with 4 GB of RAM and an NVIDIA GeForce GTX 280 and NVIDIA Tesla C1060 GPU. Our second testbed is an Intel Core i7 Extreme 965 @ 3.20 GHz with 12 GB of RAM hosting an NVIDIA Tesla S1070 (4x NVIDIA Tesla C1060). On this hardware, speedups of a factor of 8 to 14.79 over the original sequential code are achieved, confirming the potential of GPU computing in applications similar to the one used in this thesis.
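The data parallelism referred to is the usual property of finite-difference stencils: each grid point's update reads only a fixed set of neighbors, so all points can be updated independently, which is what maps well to CUDA. A generic CPU reference sketch of a second-order 2D acoustic wave step follows; the actual scheme and constants of the StatoilHydro application are not given in the abstract.

    #include <cstdio>
    #include <utility>
    #include <vector>

    // One leapfrog time step of the 2D acoustic wave equation with a
    // 5-point Laplacian. c2dt2_h2 = (c*dt/h)^2, assumed <= 0.5 for stability.
    void waveStep(const std::vector<float>& prev, const std::vector<float>& cur,
                  std::vector<float>& next, int nx, int nz, float c2dt2_h2) {
        for (int z = 1; z < nz - 1; ++z)
            for (int x = 1; x < nx - 1; ++x) {
                int i = z * nx + x;
                float lap = cur[i - 1] + cur[i + 1] + cur[i - nx] + cur[i + nx]
                            - 4.0f * cur[i];
                next[i] = 2.0f * cur[i] - prev[i] + c2dt2_h2 * lap;
            }
    }

    int main() {
        const int nx = 64, nz = 64;
        std::vector<float> prev(nx * nz), cur(nx * nz), next(nx * nz);
        cur[(nz / 2) * nx + nx / 2] = 1.0f;   // point source, i.e. the "shot"
        for (int t = 0; t < 100; ++t) {
            waveStep(prev, cur, next, nx, nz, 0.25f);
            std::swap(prev, cur);             // rotate the three time levels
            std::swap(cur, next);
        }
        std::printf("u at source after 100 steps: %f\n",
                    cur[(nz / 2) * nx + nx / 2]);
    }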
99

Seismic Data Compression and GPU Memory Latency

Haugen, Daniel January 2009 (has links)
The gap between processing performance and memory bandwidth is still increasing. Various techniques have been used to compensate for this gap, such as a memory hierarchy with faster memory closer to the processing unit. Other techniques that have been tested include compressing data prior to a memory transfer. Bandwidth limitations exist not only at low levels within the memory hierarchy, but also between the central processing unit (CPU) and the graphics processing unit (GPU), suggesting the use of compression to mask the gap. Seismic datasets are often very large, e.g. several terabytes. This thesis explores compression of seismic data to hide the bandwidth limitation between the CPU and the GPU for seismic applications. The compression method considered is subband coding, with both run-length encoding (RLE) and Huffman encoding as compressors of the quantized data. These methods have been shown in CPU implementations to give very good compression ratios for seismic data. A proof-of-concept implementation for decompression of seismic data on GPUs is developed. It consists of three main components: first, the subband synthesis filter, which reconstructs the input data processed by the subband analysis filter; second, the inverse quantizer, which generates an output close to the input given to the quantizer; and finally, the decoders, which decompress the compressed data using Huffman and RLE. The results of our implementation show that the seismic data compression algorithm investigated is probably not suited to hiding the bandwidth limitation between CPU and GPU, because the steps taken to do the decompression are likely slower than a simple memory copy of the uncompressed seismic data. The decompressors are the primary limiting factor, but in our implementation the subband synthesis is limiting as well. The sequential nature of the decompression algorithms used makes them difficult to parallelize in a way that uses the processing units of the GPU efficiently. Several suggestions for future work are given, along with results showing how our GPU implementation can be very useful for compressing data to be sent over a network. Our compression results give a compression factor between 27 and 32, and an SNR of 24.67 dB for a cube of dimension 64^3. A speedup of 2.5 over the CPU implementation is achieved for the synthesis filter (2029.00 / 813.76 ≈ 2.5). Although not currently suited for hiding the GPU-CPU transfer gap, our implementation indicates its potential for such network-oriented compression.
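The sequential dependency that hampers GPU parallelization is easiest to see in run-length decoding: where run k writes its output depends on the lengths of every run before it. A sketch, with the (value, count) encoding format assumed:

    #include <cstdio>
    #include <vector>

    struct Run { int value; int count; };   // assumed (value, count) format

    std::vector<int> rleDecode(const std::vector<Run>& runs) {
        std::vector<int> out;
        for (const Run& r : runs)
            // Each run can only be placed once all earlier runs are expanded,
            // so the loop carries a dependency from iteration to iteration.
            out.insert(out.end(), r.count, r.value);
        return out;
    }

    int main() {
        std::vector<Run> runs = {{0, 5}, {7, 2}, {0, 3}};
        std::printf("decoded %zu samples\n", rleDecode(runs).size());  // 10
    }

A parallel variant would first take a prefix sum over the run lengths to compute each run's output offset, at the cost of extra passes over the data.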
100

Simulation of Fluid Flow Through Porous Rocks on Modern GPUs

Aksnes, Eirik Ola January 2009 (has links)
It is important for the petroleum industry to investigate how fluids flow inside the complicated geometries of porous rocks in order to improve oil production. The lattice Boltzmann method can be used to calculate a porous rock's ability to transport fluids (its permeability). However, this method is computationally intensive and hence calls for High Performance Computing (HPC). Modern GPUs are becoming interesting and important platforms for HPC. In this thesis, we show how to implement the lattice Boltzmann method on modern GPUs using the NVIDIA CUDA programming environment. Our work is done in collaboration with Numerical Rocks AS and the Department of Petroleum Engineering at the Norwegian University of Science and Technology. To better evaluate our GPU implementation, a sequential CPU implementation is prepared first. We then develop our GPU implementation and test both implementations using three porous data sets with known permeabilities provided by Numerical Rocks AS. Our simulations of fluid flow achieve high performance on modern GPUs, showing that it is possible to calculate the permeability of porous rocks at simulation sizes up to 368^3, which fits into the 4 GB memory of the NVIDIA Quadro FX 5800 card. The performance of the CPU and GPU implementations is measured in MLUPS (million lattice node updates per second). Both implementations achieve their highest performance using single floating-point precision, with maximum performance of 1.59 MLUPS for the CPU and 184.30 MLUPS for the GPU. Techniques for reducing round-off errors are also discussed and implemented.
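MLUPS, the metric used above, is straightforward to compute from the lattice size, the number of time steps, and wall-clock time. A generic measurement sketch, with the lattice Boltzmann kernel replaced by a stand-in workload:

    #include <chrono>
    #include <cstdio>

    int main() {
        const long n = 368;          // lattice side length, as in the abstract
        const long steps = 10;
        volatile double sink = 0.0;  // stand-in workload so timing is nonzero

        auto t0 = std::chrono::steady_clock::now();
        for (long t = 0; t < steps; ++t)
            for (long i = 0; i < n * n * n; ++i)
                sink = sink + 1.0;   // replace with one lattice Boltzmann step
        auto t1 = std::chrono::steady_clock::now();

        double seconds = std::chrono::duration<double>(t1 - t0).count();
        double mlups = (double)n * n * n * steps / (seconds * 1e6);
        std::printf("%.2f MLUPS\n", mlups);   // lattice node updates per second / 1e6
    }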
