171. Structured data extraction: separating content from noise on news websites. Arizaleta, Mikel. January 2009 (has links)
In this thesis, we treat the problem of separating content from noise on news websites. We approach the problem using TiMBL, a memory-based learning tool, and study how the similarity of the training data to the target pages and the size of the training data affect extraction performance.
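As an illustration of the memory-based (k-nearest-neighbour) learning that TiMBL performs, the minimal sketch below labels page blocks as content or noise by comparing them to stored training examples. The features and example blocks are invented for illustration and are not the thesis's actual feature set or data:

```python
from collections import Counter
import math

def features(block):
    """Toy features for a page block: word count, link density, comma ratio."""
    text, n_links = block
    words = text.split()
    return (len(words),
            n_links / max(len(words), 1),
            text.count(",") / max(len(words), 1))

def knn_label(train, block, k=3):
    """Memory-based classification: keep all examples, let the k nearest vote."""
    dists = sorted((math.dist(features(block), features(b)), label) for b, label in train)
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

train = [
    (("The minister announced a new budget for schools today, citing rising costs.", 0), "content"),
    (("Researchers published the survey results on Tuesday after a two year study.", 0), "content"),
    (("Home News Sports Weather Contact", 5), "noise"),
    (("Subscribe Login Share Tweet", 4), "noise"),
]
print(knn_label(train, ("The committee will vote on the revised funding plan next week, officials said.", 0)))
```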
172. Seismic Shot Processing on GPU. Johansen, Owe. January 2009 (has links)
Today's petroleum industry demands an ever-increasing amount of computational resources. Seismic processing applications used by these companies have generally run on large clusters of compute nodes whose only computing resource has been the CPU. However, using Graphics Processing Units (GPUs) for general-purpose programming is becoming increasingly popular in high performance computing. In 2007, the NVIDIA Corporation launched its framework for developing GPU-accelerated computational algorithms, known as the Compute Unified Device Architecture (CUDA), and a wide variety of research areas have since adopted this framework for their algorithms. This thesis looks at the applicability of GPU techniques and CUDA for off-loading some of the computational workload in a seismic shot modeling application provided by StatoilHydro to modern GPUs. The work builds on our recent project that provided checkpoint restart for this MPI-enabled shot modeling application. In this thesis, we demonstrate that the inherent data parallelism in the core finite-difference computations also makes our application well suited for GPU acceleration. Using CUDA, we show that we could port our application efficiently and, through further refinements, achieve significant performance increases. Benchmarks done on two different systems in the NTNU IDI (Department of Computer and Information Science) HPC-lab are included. One system is an Intel Core2 Quad Q9550 @ 2.83 GHz with 4 GB of RAM, an NVIDIA GeForce GTX 280 and an NVIDIA Tesla C1060 GPU. Our second testbed is an Intel Core i7 Extreme 965 @ 3.20 GHz with 12 GB of RAM hosting an NVIDIA Tesla S1070 (4x NVIDIA Tesla C1060). On this hardware, speedups of up to a factor of 8 to 14.79 compared to the original sequential code are achieved, confirming the potential of GPU computing in applications similar to the one used in this thesis.
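To illustrate the data parallelism the thesis exploits, the sketch below runs a second-order finite-difference time step of the 2D acoustic wave equation: every grid point is updated independently of the others, which is exactly what maps well onto CUDA threads. It is a generic illustration, not the StatoilHydro modeling code:

```python
import numpy as np

def wave_step(p_prev, p_curr, c, dt, dx):
    """One leapfrog time step of the 2D acoustic wave equation (periodic edges)."""
    lap = (np.roll(p_curr, 1, 0) + np.roll(p_curr, -1, 0) +
           np.roll(p_curr, 1, 1) + np.roll(p_curr, -1, 1) - 4.0 * p_curr) / dx**2
    return 2.0 * p_curr - p_prev + (c * dt) ** 2 * lap

n = 256
p0, p1 = np.zeros((n, n)), np.zeros((n, n))
p1[n // 2, n // 2] = 1.0                      # point source, i.e. the "shot"
for _ in range(100):
    p0, p1 = p1, wave_step(p0, p1, c=1500.0, dt=1e-4, dx=1.0)
print(float(np.abs(p1).max()))
```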
173. Seismic Data Compression and GPU Memory Latency. Haugen, Daniel. January 2009 (has links)
The gap between processing performance and memory bandwidth is still increasing. To compensate for this gap, various techniques have been used, such as a memory hierarchy with faster memory closer to the processing unit. Other techniques that have been tested include compressing data prior to a memory transfer. Bandwidth limitations exist not only at low levels within the memory hierarchy, but also between the central processing unit (CPU) and the graphics processing unit (GPU), suggesting the use of compression to mask the gap. Seismic datasets are often very large, e.g. several terabytes. This thesis explores compression of seismic data to hide the bandwidth limitation between the CPU and the GPU for seismic applications. The compression method considered is subband coding, with both run-length encoding (RLE) and Huffman encoding as compressors of the quantized data. CPU implementations of these methods have been shown to give very good compression ratios for seismic data. A proof-of-concept implementation for decompression of seismic data on GPUs is developed. It consists of three main components: first, the subband synthesis filter, which reconstructs the input data processed by the subband analysis filter; second, the inverse quantizer, which generates an output close to the input given to the quantizer; and finally, the decoders, which decompress the compressed data using Huffman and RLE. The results of our implementation show that the seismic data compression algorithm investigated is probably not suited to hiding the bandwidth limitation between CPU and GPU, because the steps taken to do the decompression are likely slower than a simple memory copy of the uncompressed seismic data. It is primarily the decompressors that are the limiting factor, but in our implementation the subband synthesis is also limiting. The sequential nature of the decompression algorithms used makes them difficult to parallelize in a way that uses the processing units on the GPUs efficiently. Several suggestions for future work are given, along with results showing how our GPU implementation can be very useful for compressing data to be sent over a network. Our compression results give a compression factor between 27 and 32, and an SNR of 24.67 dB for a cube of dimension 64^3. A speedup of 2.5 for the synthesis filter compared to the CPU implementation is achieved (2029.00/813.76 ≈ 2.5). Although not currently suited for GPU-CPU compression, our implementations indicate
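A minimal run-length encoding example makes the sequential dependency mentioned above concrete: the output position of each decoded run depends on the lengths of all earlier runs, which is what makes a straightforward GPU parallelization of the decoder difficult. The data below are invented; the thesis applies RLE to quantized subband coefficients:

```python
def rle_encode(values):
    """Collapse consecutive repeats into (value, length) runs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs

def rle_decode(runs):
    """Inherently sequential: each run's output offset depends on all earlier runs."""
    out = []
    for value, length in runs:
        out.extend([value] * length)
    return out

quantized = [0, 0, 0, 3, 3, 0, 0, 0, 0, -1]   # e.g. quantized subband coefficients
runs = rle_encode(quantized)
assert rle_decode(runs) == quantized
print(runs, "compression factor:", len(quantized) / (2 * len(runs)))  # each run stores (value, length)
```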
174. Simulation of Fluid Flow Through Porous Rocks on Modern GPUs. Aksnes, Eirik Ola. January 2009 (has links)
It is important for the petroleum industry to investigate how fluids flow inside the complicated geometries of porous rocks, in order to improve oil production. The lattice Boltzmann method can be used to calculate a porous rock's ability to transport fluids (its permeability). However, this method is computationally intensive and hence calls for High Performance Computing (HPC). Modern GPUs are becoming interesting and important platforms for HPC. In this thesis, we show how to implement the lattice Boltzmann method on modern GPUs using the NVIDIA CUDA programming environment. Our work is done in collaboration with Numerical Rocks AS and the Department of Petroleum Engineering at the Norwegian University of Science and Technology. To better evaluate our GPU implementation, a sequential CPU implementation is first prepared. We then develop our GPU implementation and test both implementations using three porous data sets with known permeabilities provided by Numerical Rocks AS. Our fluid flow simulations achieve high performance on modern GPUs, showing that it is possible to calculate the permeability of porous rocks at simulation sizes up to 368^3, which fits into the 4 GB memory of the NVIDIA Quadro FX 5800 card. The performances of the CPU and GPU implementations are measured in MLUPS (million lattice node updates per second). Both implementations achieve their highest performance using single floating-point precision, with maximum performances of 1.59 MLUPS and 184.30 MLUPS, respectively. Techniques for reducing round-off errors are also discussed and implemented.
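The MLUPS figures above translate directly into time per lattice update. The helper below is a back-of-the-envelope sketch (not code from the thesis) that plugs the reported rates and the 368^3 lattice size into the metric's definition:

```python
def mlups(lattice_dims, iterations, seconds):
    """Million lattice node updates per second for a given run."""
    nodes = 1
    for d in lattice_dims:
        nodes *= d
    return nodes * iterations / (seconds * 1e6)

nodes_368 = 368 ** 3                                 # ~49.8 million lattice nodes
seconds_per_step_gpu = nodes_368 / (184.30 * 1e6)    # at the reported GPU rate
seconds_per_step_cpu = nodes_368 / (1.59 * 1e6)      # at the reported CPU rate
print(f"368^3 lattice: ~{seconds_per_step_gpu:.2f} s/step on GPU, "
      f"~{seconds_per_step_cpu:.1f} s/step on CPU")
print("check:", round(mlups((368, 368, 368), 1, seconds_per_step_gpu), 2))
```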
175. Dancing Robots. Tidemann, Axel. January 2006 (has links)
This Master's thesis implements a multiple paired models architecture that is used to control a simulated robot. The architecture consists of several modules, each holding a paired forward/inverse model. The inverse model takes as input the current and desired state of the system and outputs motor commands that will achieve the desired state. The forward model takes as input the current state and the motor commands acting on the environment and outputs the predicted next state. The models are paired because the output of the inverse model is fed into the forward model. A weighting mechanism based on how well the forward model predicts determines how much a module will influence the total motor control. The architecture is a slight tweak of the HAMMER and MOSAIC architectures of Demiris and Wolpert, respectively. The robot is to imitate dance moves that it sees. Three experiments are done: in the first two, the robot imitates another robot, whereas in the third the robot imitates a movement pattern gathered from human data. The pattern was obtained using a Pro Reflex tracking system. After training the multiple paired models architecture, the performance and self-organization of the different modules are analyzed. Shortcomings of the architecture are pointed out along with directions for future work. The main result of this thesis is that the architecture does not self-organize as intended; instead it finds its own way to separate the input space into different modules. This is most likely attributable to a problem with the learning of the responsibility predictor of the modules. This problem must be solved for the architecture to work as designed, and it is a good starting point for future work.
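A much-simplified sketch of the multiple paired models idea follows: each module's inverse model proposes a motor command, its forward model predicts the next state, and a softmax over the forward models' prediction errors yields the responsibility weights that blend the commands. The linear toy plant, the assumed gains and the gating temperature are placeholders, not the thesis's trained models:

```python
import numpy as np

class Module:
    """One paired forward/inverse model that assumes the plant has gain g."""
    def __init__(self, g):
        self.g = g
    def inverse(self, state, desired):
        return (desired - state) / self.g     # command that reaches desired if the gain were g
    def forward(self, state, command):
        return state + self.g * command       # predicted next state under the assumed gain

def step(modules, weights, state, desired, true_gain, temperature=0.05):
    # Blend the inverse models' proposals using the current responsibility weights.
    command = float(np.dot(weights, [m.inverse(state, desired) for m in modules]))
    next_state = state + true_gain * command  # what the toy plant actually does
    # New responsibilities: softmax over the forward models' prediction errors.
    errors = np.array([abs(m.forward(state, command) - next_state) for m in modules])
    weights = np.exp(-errors / temperature)
    return next_state, weights / weights.sum()

modules = [Module(0.5), Module(1.0), Module(2.0)]
weights, state = np.full(3, 1 / 3), 0.0
for target in [1.0, -0.5, 0.8, 0.2, 1.0]:
    state, weights = step(modules, weights, state, desired=target, true_gain=2.0)
print(round(state, 3), weights.round(2))      # the module assuming gain 2.0 dominates
```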
176. GeneTUC: Event extraction from TQL logic. Søvik, Harald. January 2006 (has links)
As Natural Language Processing systems converge on a high percentage of successfully deep-parsed text, parse success alone is an incomplete measure of the "intelligence" exhibited by a system. Because systems apply different grammars, dictionaries and programming languages, the internal representation of parsed text often differs from system to system, making it difficult to compare performance and exchange useful data such as tagged corpora or semantic interpretations. This report describes how semantically annotated corpora can be used to measure the quality of Natural Language Processing systems. A selected corpus produced by the GENIA project was used as the gold standard (event-annotated abstracts from MEDLINE). This corpus was sparse (19 abstracts), so manual methods were employed to produce a mapping from the native GeneTUC knowledge format (TQL). This mapping was used to evaluate events in GeneTUC. We were able to attain a recall of 67% and an average precision of 33% on the training data. These results suggest that the mapping is inadequate. On test data, the recall was 28% and the average precision 21%. Because events are a new "feature" in NLP applications, there are no large corpora that can be used for automated rule learning. The conclusion is that at least a partial mapping from TQL to GENIA events exists, and that larger corpora and AI methods should be applied to refine the mapping rules. In addition, we found that this mapping can be useful for extracting protein-protein interactions.
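The recall and precision figures above come from comparing extracted events against gold-standard annotations. The sketch below shows the computation on a toy example; the tuple representation of an event is a simplification for illustration, not the TQL or GENIA format:

```python
def precision_recall(extracted, gold):
    """Event-level precision and recall against a gold-standard annotation."""
    extracted, gold = set(extracted), set(gold)
    tp = len(extracted & gold)                      # correctly extracted events
    precision = tp / len(extracted) if extracted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall

gold = {("binding", "NF-kappaB", "IkB-alpha"), ("activation", "TNF", "NF-kappaB")}
extracted = {("binding", "NF-kappaB", "IkB-alpha"), ("binding", "p50", "p65")}
p, r = precision_recall(extracted, gold)
print(f"precision={p:.2f} recall={r:.2f}")          # precision=0.50 recall=0.50
```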
177. Automatic diagnosis of ultrasound images using standard view planes of fetal anatomy. Ødegård, Jan; Østen, Anders. January 2006 (has links)
The use of ultrasound has revolutionised clinical fetal examinations. The possibility of detecting congenital abnormalities at an early stage of the pregnancy is highly important for maximising the chances of correcting a defect before it becomes life-threatening. The problems with the routine procedure are its complexity and the fact that it requires extensive knowledge of fetal anatomy. Because of the lack of training among midwives, especially in less developed countries, the results of the examinations are often limited. In addition, the quality of the ultrasound equipment is often restricted. These limitations imply the need for a standardised examination procedure to decrease the amount of time required, as well as an automatic method for proposing a diagnosis of the fetus. This thesis proposes a solution for automatically making a diagnosis based on the contents of extracted ultrasound images. Based on the concept of standard view planes, a list of predefined images of the fetus is obtained during the routine ultrasonography. These images contain the most important organs to examine, and the most common congenital abnormalities are therefore detectable in this set. In order to analyse the images, medical domain knowledge must be obtained and stored to enable reasoning about the findings in the ultrasound images. The findings are extracted through segmentation, and each object is given a unique description. An organ database is developed to store descriptions of known organs so that the extracted objects can be recognised. Once the organs have been identified, a case-based reasoning (CBR) system is applied to analyse the total contents of one standard view plane. The CBR system uses knowledge from the medical domain as well as previously solved problems to identify possible abnormalities in the case describing the standard view plane. When a solution is obtained, it is stored for later retrieval. This increases the reliability of future examinations, because the knowledge base is constantly expanding. The foundation of standard view planes ensures an effective procedure, and the amount of training needed to learn the procedure is minimised due to the automatic extraction and analysis of the contents of the standard view plane. The midwife only has to learn which standard view planes to obtain, not how to analyse their contents.
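The retrieval step of such a CBR system can be sketched as matching the organs found in a new standard view plane against previously solved cases and reusing the best match's diagnosis. The case fields, similarity measure and example diagnoses below are illustrative assumptions, not the thesis's case model:

```python
def jaccard(a, b):
    """Set-overlap similarity between two organ descriptions."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

case_base = [
    {"organs": {"four-chamber heart", "left ventricle", "right ventricle"},
     "diagnosis": "normal cardiac view"},
    {"organs": {"four-chamber heart", "left ventricle"},
     "diagnosis": "possible right-heart abnormality"},
]

def propose_diagnosis(observed_organs):
    """Retrieve the most similar solved case and reuse its diagnosis."""
    best = max(case_base, key=lambda c: jaccard(c["organs"], observed_organs))
    return best["diagnosis"]

new_plane = {"four-chamber heart", "left ventricle"}
print(propose_diagnosis(new_plane))
# Once confirmed, the solved case would be added back to case_base (the retain step).
```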
178. A Shared Memory Structure for Cooperative Problem Solving. Røssland, Kari. January 2006 (has links)
The contribution of this thesis is a framework architecture for cooperative distributed problem solving in multiagent systems using a shared memory structure. Our shared memory structure, the TEAM SPACE, coordinates the problem-solving process, which is based on a plan in the form of a hierarchy of decomposed tasks.
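As a toy illustration of coordination through a shared memory structure, the sketch below lets several agents claim decomposed tasks from a common board. The abstract does not specify the TEAM SPACE interface, so the API here is an assumption made purely for illustration:

```python
import threading

class SharedTaskSpace:
    """A shared board of decomposed tasks that agents claim and report back to."""
    def __init__(self, tasks):
        self._tasks = list(tasks)
        self._lock = threading.Lock()
        self.done = []
    def claim(self):
        with self._lock:                  # two agents never claim the same task
            return self._tasks.pop() if self._tasks else None
    def report(self, agent, task):
        with self._lock:
            self.done.append((agent, task))

def agent(name, space):
    while (task := space.claim()) is not None:
        space.report(name, task)          # "solve" the task and record the result

space = SharedTaskSpace(["scout area", "map terrain", "plan route", "execute plan"])
workers = [threading.Thread(target=agent, args=(f"agent-{i}", space)) for i in range(3)]
for w in workers: w.start()
for w in workers: w.join()
print(sorted(space.done))
```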
179. Cloth Modelling on the GPU. Dencker, Kjartan. January 2006 (has links)
This project explores the possibility of using general-purpose programming on the GPU to simulate cloth in 3D. The goal is to implement a faster version of the method given in 'Large Steps in Cloth Simulation' by Baraff and Witkin (implicit Euler integration).
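For context, the implicit (backward) Euler step of Baraff and Witkin amounts to solving a linear system for the velocity update each frame. The sketch below applies it to a tiny 1D hanging chain of masses and linear springs; it is a minimal linear illustration, not the project's GPU cloth solver:

```python
import numpy as np

n, h = 5, 0.01                               # number of masses, time step
m, k, d, g, rest = 0.1, 500.0, 0.5, -9.81, 0.2

# Stiffness matrix K of a spring chain anchored at the top (anchor - m0 - m1 - ...).
K = np.zeros((n, n))
for i in range(n):
    K[i, i] = 2 * k if i < n - 1 else k
    if i > 0:
        K[i, i - 1] = K[i - 1, i] = -k
M, D = np.eye(n) * m, np.eye(n) * d

x_rest = -np.arange(1, n + 1) * rest         # positions where spring forces vanish
x, v = x_rest.copy(), np.zeros(n)

for _ in range(200):
    f = -K @ (x - x_rest) - D @ v + m * g            # springs + damping + gravity
    A = M + h * D + h ** 2 * K                       # (M - h*df/dv - h^2*df/dx)
    dv = np.linalg.solve(A, h * (f - h * (K @ v)))   # backward Euler velocity update
    v += dv
    x += h * v

print(x.round(3))   # the chain ends up slightly stretched below x_rest
```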
180. Implementation and evaluation of Norwegian Analyzer for use with DotLucene. Olsen, Bjørn Harald. January 2006 (has links)
This work focuses on improving the retrieval performance of search in Norwegian document collections. The initiator of the thesis, InfoFinder Norge, desired a Norwegian analyzer for DotLucene. The standard analyzer used previously did not support stopword elimination or stemming for the Norwegian language. The Norwegian analyzer and the standard analyzer were used in turn on the same document collection before indexing and querying, and the respective results were compared to measure efficiency improvements. An evaluation method based on Term Relevance Sets was investigated and used on DotLucene with the two analyzer approaches. The Term Relevance Sets methodology was also compared with common measurements for relevance judging, and was found useful for the evaluation of IR systems. The evaluation results for the Norwegian analyzer and the standard analyzer give clear indications that stopword elimination and stemming for Norwegian documents improve retrieval efficiency. The Term Relevance Set-based evaluation was found reliable by comparing its results with precision measurements. Precision increased by 16% with the Norwegian analyzer compared to the standard analyzer, which has no content preprocessing support for Norwegian. Term Relevance Set evaluation with 10 on-topic terms and 10 off-topic terms gave an increased tScore of 44%. The results show that counting term occurrences in the content of retrieved documents can be used to gain confidence that documents are either relevant or not relevant.
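The two analysis steps the Norwegian analyzer adds, stopword elimination and stemming, together with a Term Relevance Set style count of on-topic and off-topic terms, can be sketched as follows. NLTK's Snowball stemmer supports Norwegian; the stopword list is a tiny sample, and the counting function is an assumption for illustration, since the abstract does not give the exact tScore formula (DotLucene itself is a .NET library, so this Python sketch only mirrors the analysis steps):

```python
from nltk.stem.snowball import SnowballStemmer

stemmer = SnowballStemmer("norwegian")
stopwords = {"og", "i", "det", "som", "en", "til", "er", "av", "for"}   # tiny sample list

def analyze(text):
    """Lowercase, remove stopwords, and stem, mirroring the analyzer's steps."""
    tokens = [t.lower() for t in text.split()]
    return [stemmer.stem(t) for t in tokens if t not in stopwords]

def term_relevance_counts(doc_terms, ontopic, offtopic):
    """Count on-topic and off-topic term occurrences in a retrieved document."""
    on = sum(1 for t in doc_terms if t in ontopic)
    off = sum(1 for t in doc_terms if t in offtopic)
    return on, off     # a tScore would be derived from counts like these

doc = "Oljeprisen steg kraftig i dag og analytikere venter videre oppgang"
terms = analyze(doc)
ontopic = {stemmer.stem("oljepris"), stemmer.stem("analytiker")}
offtopic = {stemmer.stem("fotball")}
print(terms)
print(term_relevance_counts(terms, ontopic, offtopic))
```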