61

Parallelizing Particle-In-Cell Codes with OpenMP and MPI

Larsgård, Nils Magnus January 2007 (has links)
Today's supercomputers often consist of clusters of SMP nodes. Both OpenMP and MPI are programming paradigms that can be used to parallelize codes for such architectures. OpenMP uses shared memory and is hence viewed as a simpler programming paradigm than MPI, which is primarily a distributed-memory paradigm. However, OpenMP applications may not scale beyond one SMP node. On the other hand, if we use only MPI, we may introduce overhead in intra-node communication. In this thesis we explore the trade-offs between using OpenMP, MPI, and a mix of both paradigms for the same application. In particular, we take a physics simulation and parallelize it with both OpenMP and MPI for large-scale simulations on modern supercomputers. A parallel SOR solver with OpenMP and MPI is implemented and the effects of such hybrid code are measured. We also utilize the FFTW library, which includes both system-optimized serial implementations and a parallel OpenMP FFT implementation. These solvers are used to make our existing Particle-In-Cell codes more scalable and compatible with current programming paradigms and supercomputer architectures. We demonstrate that the overhead from communication in OpenMP loops on an SMP node is significant and, compared to equivalent MPI implementations, grows with the number of CPUs participating in the execution of the loop. To analyze this result, we also present a simple model for estimating the overhead from communication in OpenMP loops. Our results are surprising and should be of great interest to a large class of parallel applications.
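As context for the hybrid approach this abstract describes, here is a minimal sketch of an MPI+OpenMP SOR-style sweep on a 1-D domain: MPI ranks exchange halo cells between nodes while OpenMP threads share the loop inside each node. All names and parameters (N, SWEEPS, OMEGA) are illustrative assumptions, not taken from the thesis.

```c
/* Minimal hybrid MPI+OpenMP sketch: 1-D domain split across MPI ranks,
 * red-black SOR sweep shared by OpenMP threads inside each rank. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define N      1024   /* local interior points per rank (assumed) */
#define SWEEPS 100
#define OMEGA  1.5    /* SOR relaxation factor */

int main(int argc, char **argv)
{
    int provided, rank, size;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* u[0] and u[N+1] are halo cells shared with neighbouring ranks */
    double *u = calloc(N + 2, sizeof(double));
    if (rank == 0) u[0] = 1.0;           /* boundary condition */

    int up = rank + 1 < size ? rank + 1 : MPI_PROC_NULL;
    int dn = rank > 0        ? rank - 1 : MPI_PROC_NULL;

    for (int s = 0; s < SWEEPS; s++) {
        /* inter-node communication at the MPI level: halo exchange */
        MPI_Sendrecv(&u[N], 1, MPI_DOUBLE, up, 0,
                     &u[0], 1, MPI_DOUBLE, dn, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&u[1], 1, MPI_DOUBLE, dn, 1,
                     &u[N + 1], 1, MPI_DOUBLE, up, 1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        /* red-black ordering so the OpenMP loop has no data races */
        for (int colour = 0; colour < 2; colour++) {
            #pragma omp parallel for
            for (int i = 1 + colour; i <= N; i += 2)
                u[i] += OMEGA * (0.5 * (u[i - 1] + u[i + 1]) - u[i]);
        }
    }

    if (rank == 0) printf("u[1] = %f\n", u[1]);
    free(u);
    MPI_Finalize();
    return 0;
}
```

The `omp parallel for` is exactly the kind of loop whose fork/join and synchronization cost the thesis measures against an equivalent all-MPI decomposition.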
62

Fault-tolerance for MPI Codes on Computational Clusters

Hagen, Knut Imar January 2007 (has links)
This thesis focuses on fault-tolerance for MPI codes on computational clusters. When an application runs on a very large cluster with thousands of processors, it is likely that a process will crash due to a hardware or software failure. Fault-tolerance is the ability of a system to respond gracefully to an unexpected hardware or software failure. A test application which is meant to run for several weeks on several nodes is used in this thesis. The application is a seismic MPI application written in Fortran90, provided by Statoil, who wanted a fault-tolerant implementation. The original test application had no degree of fault-tolerance: if one process or one node crashed, the entire application crashed as well. In this thesis, a collection of fault-tolerance techniques is analysed, including checkpointing, MPI error handlers, extending MPI, replication, fault detection, atomic clocks and multiple simultaneous failures. Several MPI implementations are described, such as MPICH1, MPICH2, LAM/MPI and Open MPI. Next, some fault-tolerant products developed at other universities are described, such as FT-MPI, FEMPI, MPICH-V with its five protocols, the fault-tolerant functionality of Open MPI, and MPI error handlers. A fault-tolerant simulator which simulates the application's behaviour is developed. The simulator uses two fault-tolerance methods: FT-MPI and MPI error handlers. Next, the test application itself is made fault-tolerant with FT-MPI using three proposed approaches: MPI_Reduce(), MPI_Barrier(), and the final and current implementation, MPI Loop. Tests of the MPI Loop implementation are run on a small and a large cluster to verify the fault-tolerant behaviour. The seismic application survives a crash of n-2 nodes/processes: process number 0 must stay alive since it acts as an I/O server, and at least one process must be left to compute data. Processes can also be restarted rather than left out, but the test application needs to be modified to support this.
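One of the techniques surveyed, MPI error handlers, can be shown in a few lines. The sketch below is a generic illustration, not the thesis code: it replaces the default MPI_ERRORS_ARE_FATAL behaviour on MPI_COMM_WORLD so that a failing MPI call invokes a logging callback instead of aborting the whole job. What the callback then does (roll back to a checkpoint, shrink the communicator) is application-specific and only hinted at in a comment.

```c
#include <mpi.h>
#include <stdio.h>

/* custom handler: log the error instead of terminating every process */
static void on_error(MPI_Comm *comm, int *code, ...)
{
    char msg[MPI_MAX_ERROR_STRING];
    int len, rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Error_string(*code, msg, &len);
    fprintf(stderr, "rank %d caught MPI error: %s\n", rank, msg);
    /* here one could roll back to a checkpoint or exclude dead ranks */
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    MPI_Errhandler eh;
    MPI_Comm_create_errhandler(on_error, &eh);
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, eh);

    /* ... seismic computation: MPI calls that fail now invoke
       on_error instead of killing the job ... */

    MPI_Errhandler_free(&eh);
    MPI_Finalize();
    return 0;
}
```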
63

Evolution of Control System(s) for a Multi Joint Snake : Transformer <-> #13

Hatteland, Karl January 2007 (has links)
This thesis is about evolving a control system for a snake called Transformer <-> #13, a mechanical snake with several body parts. The chosen approach is a cellular genetic algorithm in which each body part is a cell. Each cell contains "DNA", one ruleset per degree of freedom in the joints, which decides how the part behaves in relation to its neighbouring body parts. Three different fitness functions have been implemented, each producing a distinct behaviour. The goals of the fitness functions are crawling far, rising high, and forming geometry. The crawling goal was successful, while the other two were much harder for the snake and did not produce great results. We conclude that the snake is suited to crawling around and giving an impression of different cubic forms, which is adequate for artistic purposes, but it fails at assuming specific shapes.
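A rough sketch of the cellular idea, with hypothetical names and sizes (STATES, SEGMENTS): each cell's rule table maps the discretised states of its neighbours to its own next joint command, and mutation rewrites single table entries. This is an illustration of the general scheme, not the thesis implementation.

```c
#include <stdlib.h>

#define STATES   4    /* discretised joint positions (assumed) */
#define SEGMENTS 13

typedef struct {
    /* one rule per (left-neighbour state, right-neighbour state) pair */
    unsigned char rule[STATES][STATES];
} Cell;

/* one cellular update: every joint reacts to its neighbours' states */
static void step(const Cell dna[SEGMENTS], unsigned char s[SEGMENTS])
{
    unsigned char next[SEGMENTS];
    for (int i = 0; i < SEGMENTS; i++) {
        /* end segments see themselves as the missing neighbour */
        unsigned char left  = s[i > 0 ? i - 1 : i];
        unsigned char right = s[i + 1 < SEGMENTS ? i + 1 : i];
        next[i] = dna[i].rule[left][right];
    }
    for (int i = 0; i < SEGMENTS; i++) s[i] = next[i];
}

/* GA mutation: flip one randomly chosen rule entry in one cell */
static void mutate(Cell *c)
{
    c->rule[rand() % STATES][rand() % STATES] = rand() % STATES;
}

int main(void)
{
    Cell dna[SEGMENTS];
    unsigned char state[SEGMENTS] = { 0 };
    srand(42);
    for (int i = 0; i < SEGMENTS; i++)       /* random initial DNA */
        for (int l = 0; l < STATES; l++)
            for (int r = 0; r < STATES; r++)
                dna[i].rule[l][r] = rand() % STATES;
    for (int t = 0; t < 100; t++) step(dna, state);
    /* a fitness function would now score the motion, e.g. distance
       crawled, and the GA would select and mutate the best DNA: */
    mutate(&dna[rand() % SEGMENTS]);
    return 0;
}
```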
64

Collaborative Filtering for Recommending Movies

Bøe, Cecilie January 2007 (has links)
There is a significant amount of ongoing research in the collaborative filtering field, much of it focusing on how to most accurately give item predictions to a user based on ratings given by other users with similar rating patterns. The objective of this project is to build movie rating prediction models with a simple and intuitive representation, based on previous work in the area. Important factors are the investigation of the predictive power of these models and the study of how content information can improve accuracy when the available data is sparse. We show that latent class models provide an expressive yet simple way to represent the movie rating scenario, and that the models have great potential when it comes to predictive accuracy. We conclude that including additional content features in the models can help improve accuracy when little data is available.
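As an illustration of how simple prediction becomes once a latent class model has been fitted, the sketch below computes an expected rating as P(r | user, movie) = sum over z of P(z | user) * P(r | z, movie). All numbers are made up, and the distributions are assumed to come from prior training (typically by EM); this is not the thesis's model.

```c
#include <stdio.h>

#define Z 2   /* latent classes (assumed) */
#define R 5   /* rating values 1..5 */

/* expected rating from the user's class mix and the movie's
 * per-class rating distributions */
static double expected_rating(const double pz_u[Z], const double pr_zm[Z][R])
{
    double e = 0.0;
    for (int z = 0; z < Z; z++)
        for (int r = 0; r < R; r++)
            e += (r + 1) * pz_u[z] * pr_zm[z][r];
    return e;
}

int main(void)
{
    double pz_u[Z] = { 0.7, 0.3 };            /* P(z | user)          */
    double pr_zm[Z][R] = {                    /* P(r | z, movie)      */
        { 0.05, 0.05, 0.10, 0.30, 0.50 },     /* class 0 likes it     */
        { 0.40, 0.30, 0.15, 0.10, 0.05 },     /* class 1 dislikes it  */
    };
    printf("predicted rating: %.2f\n", expected_rating(pz_u, pr_zm));
    return 0;
}
```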
65

Automatic Configuration for Collective Construction : Automatic parameter setting for response threshold agents in collective construction

Braseth, Jørgen January 2007 (has links)
NA
66

FPGA Framework for CMP

Østby, Kenneth January 2007 (has links)
The single-core processor stagnated due to four major factors: (1) the lack of instruction-level parallelism to exploit, (2) increased power consumption, (3) the complexity involved in designing a modern processor, and (4) the performance gap between memory and the processor. As the gate size has decreased, a natural solution has been to introduce several cores on the same die, creating a chip multicore processor. However, the introduction of chip multicore processors has brought a new set of challenges, such as power consumption and cache strategies. Although thoroughly researched in the context of supercomputers, the multiprocessor has decreased in physical size, so some of the old paradigms should be re-evaluated and new ones found. To research, simulate and experiment on new multicore architectures, the community needs simulators and methods of prototyping, which have traditionally been provided by software simulators. To decrease the time to results and increase productivity, a hardware-based method of prototyping is needed. This thesis contributes by presenting a novel multicore architecture with interchangeable and easily customizable units, allowing developers to extend the architecture by rewriting only the subsystem in question. The architecture is implemented in VHDL and has been tested on a Virtex FPGA, utilizing the MicroBlaze soft processor. Being based on FPGA technology, the platform is more accurate in nature than a software-based simulator. The thesis also shows that a hardware-based environment significantly decreases the time to results.
67

Framework for Polygonal Structures Computations on Clusters

Larsen, Leif Christian January 2007 (has links)
Seismological applications use a 3D grid to represent the subsea rock structure. Many computations, such as detecting layers of rock in the seismic, can be done using the 3D grid exclusively. However, some algorithms for detecting vertical dislocations in the seismic require computations over a discretized polygon surface imposed over the 3D grid to assist geophysicists in interpreting the seismic data. When seismological applications run on clusters, the 3D grid data is distributed between several cluster nodes. This thesis considers how algorithms involving discretized polygon surfaces can efficiently utilize the parallelism provided by clusters, and provides a general framework such algorithms can use. The framework consists of three main parts: 1) efficient caching and transfer of voxels between cluster nodes, 2) efficient discretization, or voxelization, of polygon surfaces, and 3) efficient load-balancing. First, three algorithms for caching and transferring voxels between nodes are introduced. The strategy which transfers only the necessary polygon voxels is shown to be superior in most cases for our workloads, obtaining a speedup of 24.28 over a strategy which caches the bounding volume of the polygon and a speedup of 2.66 over a strategy which transfers small blocks surrounding each polygon voxel. Second, a new voxelization algorithm which may be suitable for Graphics Processing Units (GPUs) and multi-core CPU implementations is presented. On the GPU, a speedup of 2.14 is obtained over the corresponding algorithm on the CPU. For multi-core architectures without shared memory buses, a speedup of 2.21 is obtained when using 8 cores. Finally, three algorithms for load-balancing the computations are introduced and future work is discussed. Our load-balancing algorithms achieve a speedup of 5.77 compared to not using any load-balancing for our workloads.
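To make the first contribution concrete, here is a hedged sketch of the core idea behind the winning transfer strategy: collect only the voxels the discretised polygon surface touches and bucket them by owning node, so each node receives one batched request instead of many small ones. The slab-based owner_of() decomposition and all names are assumptions for illustration, not the framework's actual API.

```c
#include <stdlib.h>

typedef struct { int x, y, z; } Voxel;

/* hypothetical: which cluster node owns a voxel of the distributed
 * grid, assuming the grid is sliced into slabs of 64 planes along z */
static int owner_of(Voxel v, int nodes)
{
    return (v.z / 64) % nodes;
}

/* group the polygon's voxels by owner so each node gets one batched
 * request for exactly the voxels it must send, and nothing more */
static void batch_requests(const Voxel *poly, size_t n, int nodes,
                           size_t *count_per_node)
{
    for (int i = 0; i < nodes; i++) count_per_node[i] = 0;
    for (size_t i = 0; i < n; i++)
        count_per_node[owner_of(poly[i], nodes)]++;
    /* a second pass would pack the indices into per-node send buffers
       and transfer them, e.g. with MPI point-to-point messages */
}
```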
68

Tetrahedral mesh for needle insertion

Syvertsen, Rolf Anders January 2007 (has links)
This Master's thesis describes how to create a tetrahedral mesh for use in a needle insertion simulator. It also describes how such a simulator can be built and how to make it as realistic as possible. The medical simulator uses a haptic device, a haptic scene graph and an FEM for realistic soft-tissue deformation and interaction. In this project a tetrahedral mesh is created from a polygon model and then loaded into the HaptX haptic scene graph. The objects in the mesh are made as separate haptic objects and given a simple haptic surface so that they can be touched. No code has been implemented for the Hybrid Condensed FEM that is described.
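As a small illustration of working with a tetrahedral mesh in such a simulator, the sketch below tests whether a point, say a needle tip, lies inside a tetrahedron using signed volumes. It is a generic geometric routine, not code from the thesis.

```c
#include <stdbool.h>

typedef struct { double x, y, z; } Vec3;

/* signed volume of tetrahedron (a,b,c,d): (b-a) . ((c-a) x (d-a)) / 6 */
static double signed_volume(Vec3 a, Vec3 b, Vec3 c, Vec3 d)
{
    double bx = b.x - a.x, by = b.y - a.y, bz = b.z - a.z;
    double cx = c.x - a.x, cy = c.y - a.y, cz = c.z - a.z;
    double dx = d.x - a.x, dy = d.y - a.y, dz = d.z - a.z;
    return (bx * (cy * dz - cz * dy)
          - by * (cx * dz - cz * dx)
          + bz * (cx * dy - cy * dx)) / 6.0;
}

/* p is inside (a,b,c,d) iff all five signed volumes share one sign */
static bool point_in_tet(Vec3 p, Vec3 a, Vec3 b, Vec3 c, Vec3 d)
{
    double v  = signed_volume(a, b, c, d);
    double v0 = signed_volume(p, b, c, d);
    double v1 = signed_volume(a, p, c, d);
    double v2 = signed_volume(a, b, p, d);
    double v3 = signed_volume(a, b, c, p);
    return (v >= 0) == (v0 >= 0) && (v >= 0) == (v1 >= 0)
        && (v >= 0) == (v2 >= 0) && (v >= 0) == (v3 >= 0);
}
```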
69

Seismic processing using Parallel 3D FMM

Borlaug, Idar January 2007 (has links)
This thesis develops and tests a 3D Fast Marching Method (FMM) algorithm and applies it to seismic simulations. The FMM is a general method for monotonically advancing fronts, originally developed by Sethian; it calculates the first arrival time for an advancing front or wave. FMM methods are used for a variety of applications, including fatigue cracks in materials, lymph node segmentation in CT images, computing skeletons and centerlines in 3D objects, and finding salt formations in seismic data. Finding salt formations in seismic data is important for the oil industry: oil often flows towards gaps in the soil below a salt formation, so it is important to map the edges of the formation, and for this the FMM can be used. The FMM creates a first-arrival-time map, which makes it easier to see the edges of the salt formation. Herrmann developed a parallel 3D FMM algorithm, tested on waves of constant velocity. We implemented and tested his algorithm, but since seismic data typically exhibits a large variation in velocities, optimizations were needed to make the algorithm scale. By optimizing the border exchange and eliminating many of the rollbacks, we developed and implemented a much-improved 3D FMM which achieved close to theoretical performance for up to at least 256 nodes on the current supercomputer at NTNU. Other methods, such as different domain decompositions for better load balancing and running more FMM picks simultaneously, are also discussed.
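The heart of any FMM is the local update that recomputes a grid point's arrival time from its upwind neighbours by solving the discretised eikonal equation. A compact sketch of that update for a 3-D grid with spacing h and local front speed F is given below; the heap-ordered narrow band and the parallel border exchange developed in the thesis are omitted, and finite upwind neighbour times are assumed.

```c
#include <math.h>

/* Solve sum_i (T - t_i)^2 = (h/F)^2 for the new arrival time T, using
 * the smallest upwind neighbour time along each axis (tx, ty, tz). */
static double fmm_update(double tx, double ty, double tz, double h, double F)
{
    double t[3] = { tx, ty, tz }, tmp;
    /* sort the three upwind times ascending (tiny network sort) */
    if (t[0] > t[1]) { tmp = t[0]; t[0] = t[1]; t[1] = tmp; }
    if (t[1] > t[2]) { tmp = t[1]; t[1] = t[2]; t[2] = tmp; }
    if (t[0] > t[1]) { tmp = t[0]; t[0] = t[1]; t[1] = tmp; }

    double rhs = h / F;
    /* try 1, 2, then 3 neighbours; keep the first consistent root */
    for (int m = 1; m <= 3; m++) {
        double a = m, b = 0.0, c = -rhs * rhs;
        for (int i = 0; i < m; i++) { b -= 2.0 * t[i]; c += t[i] * t[i]; }
        double disc = b * b - 4.0 * a * c;
        if (disc < 0.0) break;
        double T = (-b + sqrt(disc)) / (2.0 * a);
        if (m == 3 || T <= t[m])   /* root must not exceed the first */
            return T;              /* unused neighbour time          */
    }
    return t[0] + rhs;             /* fall back to the 1-D update */
}
```

In the parallel version, it is points like these near subdomain borders whose recomputation forces the neighbour exchanges and rollbacks that the thesis optimizes away.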
70

Modelling fibre orientation of the left ventricular human heart wall

Siem, Knut Vidar Løvøy January 2007 (has links)
The purpose of this thesis is to obtain and represent the orientation of the muscle fibres in the left ventricular wall of the human heart. The orientation of these fibres varies continuously through the wall. This report features an introduction to the human heart and medical imaging techniques. Attention is gradually drawn to concepts in computer science and how they can help us get a "clearer picture" of the internals of perhaps the most important organ in the human body. A highly detailed Magnetic Resonance Imaging data set of the left ventricle cavity is used as a base for the analysis with 3-D morphological transformations. In addition, a 3-D extension of the Hough transformation is developed, which does not seem to have been done before. An attempt is made to obtain the general trend of the trabeculae carneae, as it is believed that this is the orientation of the innermost muscle fibres of the heart wall. Suggestions for further work include refining the proposed 3-D Hough transformation to yield lines that can be used as guides for parametric curves. A brief introduction to Diffusion Tensor Magnetic Resonance Imaging is also given.
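To give a flavour of what a Hough transformation extended to 3-D involves, the sketch below lets every foreground voxel vote for 3-D lines passing through it. For brevity it uses only the three coordinate axes as candidate directions; a full implementation (and presumably the thesis's) would discretise many directions over the hemisphere and parametrise each line by its direction plus a 2-D footpoint. This is a simplified illustration, not the thesis algorithm.

```c
#include <string.h>

#define NDIR 3      /* axis-aligned directions only, to keep it tiny */
#define GRID 64     /* voxel grid side (assumed) */

/* accumulator: votes[d][u][v] counts support for the line along axis d
 * passing through cell (u,v) of the perpendicular plane */
static unsigned votes[NDIR][GRID][GRID];

/* cast votes for all foreground voxels; vol[x][y][z] != 0 is foreground */
static void hough3d(const unsigned char vol[GRID][GRID][GRID])
{
    memset(votes, 0, sizeof votes);
    for (int x = 0; x < GRID; x++)
        for (int y = 0; y < GRID; y++)
            for (int z = 0; z < GRID; z++) {
                if (!vol[x][y][z]) continue;
                votes[0][y][z]++;   /* line parallel to the x axis */
                votes[1][x][z]++;   /* line parallel to the y axis */
                votes[2][x][y]++;   /* line parallel to the z axis */
            }
    /* peaks in votes[][][] correspond to strongly supported lines,
       here candidate trabecular trends */
}
```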
