161 |
Robust Image Registration for Improved Clinical Efficiency : Using Local Structure Analysis and Model-Based Processing
Forsberg, Daniel January 2013 (has links)
Medical imaging plays an increasingly important role in modern healthcare. In medical imaging, it is often relevant to relate different images to each other, something which can prove challenging, since there rarely exists a pre-defined mapping between the pixels in different images. Hence, there is a need to find such a mapping/transformation, a procedure known as image registration. Over the years, image registration has proven useful in a number of clinical situations. Despite this, current use of image registration in clinical practice is rather limited and typically confined to image fusion. The limited use is, to a large extent, caused by excessive computation times, lack of established validation methods/metrics and a general skepticism toward the trustworthiness of the estimated transformations in deformable image registration. This thesis aims to overcome some of the issues limiting the use of image registration, by proposing a set of technical contributions and two clinical applications targeted at improved clinical efficiency. The contributions are made in the context of a generic framework for non-parametric image registration and using an image registration method known as the Morphon. In image registration, regularization of the estimated transformation forms an integral part in controlling the registration process, and in this thesis, two regularizers are proposed and their applicability demonstrated. Although the regularizers are similar in that they rely on local structure analysis, they differ in regard to implementation, where one is implemented as applying a set of filter kernels, and where the other is implemented as solving a global optimization problem. Furthermore, it is proposed to use a set of quadrature filters with parallel scales when estimating the phase difference that drives the registration, a proposal that brings both accuracy and robustness to the registration process, as shown on a set of challenging image sequences.
Computational complexity, in general, is addressed by porting the employed Morphon algorithm to the GPU, by which a performance improvement of 38-44x is achieved when compared to a single-threaded CPU implementation. The suggested clinical applications are based upon the concept of paint on priors, which was formulated in conjunction with the initial presentation of the Morphon, and which denotes the notion of assigning a model a set of properties (local operators), guiding the registration process. In this thesis, this is taken one step further: properties of a model are assigned to the patient data after completed registration. Based upon this, an application using the concept of anatomical transfer functions is presented, in which different organs can be visualized with separate transfer functions. This has been implemented for both 2D slice visualization and 3D volume rendering. A second application is proposed, in which landmarks, relevant for determining various measures describing the anatomy, are transferred to the patient data. In particular, this is applied to idiopathic scoliosis and used to obtain various measures relevant for assessing spinal deformity. In addition, a data analysis scheme is proposed, useful for quantifying the linear dependence between the different measures used to describe spinal deformities.
|
162 |
Programming Models and Runtimes for Heterogeneous Systems
Grossman, Max 16 September 2013 (has links)
With the plateauing of processor frequencies and increase in energy consumption in computing, application developers are seeking new sources of performance acceleration. Heterogeneous platforms with multiple processor architectures offer one possible avenue to address these challenges. However, modern heterogeneous programming models tend to be either so low-level as to severely hinder programmer productivity, or so high-level as to limit optimization opportunities. The novel systems presented in this thesis strike a better balance between abstraction and transparency, enabling programmers to be productive and produce high-performance applications on heterogeneous platforms.
This thesis starts by summarizing the strengths, weaknesses, and features of existing heterogeneous programming models. It then introduces and evaluates four novel heterogeneous programming models and runtime systems: JCUDA, CnC-CUDA, DyGR, and HadoopCL. It concludes by positioning the key contributions of each piece in this thesis relative to the state of the art, and outlines possible directions for future work.
|
163 |
High-speed View Matching using Region Descriptors / Vymatchning i realtid med region-deskriptorer
Lind, Anders January 2010 (has links)
This thesis treats topics within the area of object recognition. A real-time view matching method has been developed to compute the transformation between two different images of the same scene. This method uses a color-based region detector called MSCR and affine transformations of these regions to create affine-invariant patches that are used as input to the SIFT algorithm. A parallel method to compute the SIFT descriptor has been created with relaxed constraints so that the descriptor size and the number of histogram bins can be adjusted. Additionally, a matching step to deduce correspondences and a parallel RANSAC method have been created to estimate the transformation between these descriptors. To achieve real-time performance, the implementation exploits the parallel nature of the GPU, with CUDA as the programming language. Focus has been put on the architecture of the GPU to find the best way to parallelize the different processing steps. CUDA has also been combined with OpenGL to be able to use the hardware-accelerated anisotropic sampling method for affine transformations of regions. Parts of the implementation can also be used individually, either from Matlab or by using the provided C++ library directly. The method was also evaluated in terms of accuracy and speed. It was shown that our algorithm has similar or better accuracy at finding correspondences than SIFT when the 3D geometry changes are large, but we get a slightly worse result on images with flat surfaces.
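The RANSAC step mentioned in the abstract can be illustrated with a minimal sequential sketch. It assumes a pure 2D translation as the model, which is far simpler than the transformation estimated in the thesis, and it runs on the CPU rather than in parallel on the GPU. The function name `ransac_translation` and its parameters are hypothetical and chosen only for this illustration.

```python
import random


def ransac_translation(matches, iterations=200, tol=1.0, seed=0):
    """Estimate a 2D translation (dx, dy) from point matches that contain outliers.

    matches: list of ((x1, y1), (x2, y2)) correspondences (illustrative input format).
    """
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        # Minimal sample: a single correspondence fixes a translation hypothesis.
        (x1, y1), (x2, y2) = rng.choice(matches)
        dx, dy = x2 - x1, y2 - y1
        # Consensus set: all matches that agree with the hypothesis within tol.
        inliers = [
            m for m in matches
            if abs(m[0][0] + dx - m[1][0]) <= tol
            and abs(m[0][1] + dy - m[1][1]) <= tol
        ]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (dx, dy), inliers
    # Refit on the largest consensus set by averaging its displacements.
    if best_inliers:
        n = len(best_inliers)
        best_model = (
            sum(b[0] - a[0] for a, b in best_inliers) / n,
            sum(b[1] - a[1] for a, b in best_inliers) / n,
        )
    return best_model, best_inliers
```

Each hypothesis is scored independently of the others, which is presumably what makes the loop a natural candidate for parallel evaluation on a GPU.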
|
164 |
ParModelica : Extending the Algorithmic Subset of Modelica with Explicit Parallel Language Constructs for Multi-core Simulation
Gebremedhin, Mahder January 2011 (has links)
In today’s world of high-tech manufacturing and computer-aided design, simulation of models is at the heart of the whole manufacturing process. Trying to represent and study the variables of real-world models using simulation computer programs can turn out to be a very expensive and time-consuming task. On the other hand, advancements in modern multi-core CPUs and general-purpose GPUs promise remarkable computational power. Properly utilizing this computational power can provide reduced simulation time. To this end, modern modeling environments provide different optimization and parallelization options to take advantage of the available computational power. Some of these parallelization approaches are based on automatically extracting parallelism with the help of a compiler. Another approach is to provide the model programmers with the necessary language constructs to express any potential parallelism in their models. This second approach is taken in this thesis work. The OpenModelica modeling and simulation environment for the Modelica language has been extended with new language constructs for explicitly stating parallelism in algorithms. This slightly extended algorithmic subset of Modelica is called ParModelica. The new extensions allow models written in ParModelica to be translated to optimized OpenCL code which can take advantage of the computational power of available multi-core CPUs and general-purpose GPUs.
|
165 |
Rendering for Microlithography on GPU Hardware
Iwaniec, Michel January 2008 (has links)
Over the last decades, integrated circuits have changed our everyday lives in a number of ways. Many common devices today taken for granted would not have been possible without this industrial revolution. Central to the manufacturing of integrated circuits is the photomask used to expose the wafers. Additionally, such photomasks are also used for manufacturing of flat screen displays. Microlithography, the manufacturing technique of such photomasks, requires complex electronics equipment that excels in both speed and fidelity. Manufacture of such equipment requires competence in virtually all engineering disciplines, where the conversion of geometry into pixels is but one of these. Nevertheless, this single step in the photomask drawing process has a major impact on the throughput and quality of a photomask writer. Current high-end semiconductor writers from Micronic use a cluster of Field-Programmable Gate Array circuits (FPGA). FPGAs have for many years been able to replace Application Specific Integrated Circuits due to their flexibility and low initial development cost. For parallel computation, an FPGA can achieve throughput not possible with microprocessors alone. Nevertheless, high-performance FPGAs are expensive devices, and upgrading from one generation to the next often requires a major redesign. During the last decade, the computer games industry has taken the lead in parallel computation with graphics card for 3D gaming. While essentially being designed to render 3D polygons and lacking the flexibility of an FPGA, graphics cards have nevertheless started to rival FPGAs as the main workhorse of many parallel computing applications. This thesis covers an investigation on utilizing graphics cards for the task of rendering geometry into photomask patterns. It describes different strategies that were tried and the throughput and fidelity achieved with them, along with the problems encountered. 
It also describes the development of a suitable evaluation framework that was critical to the process.
|
166 |
Distributed Algorithms for SVD-based Least Squares Estimation
Peng, Yu-Ting 19 July 2011 (has links)
Singular value decomposition (SVD) is a popular decomposition method for solving least-squares estimation problems. However, for large datasets, obtaining least-squares solutions with SVD is very time-consuming and memory-demanding. In this paper, we propose a least-squares estimator based on an iterative divide-and-merge scheme for large-scale estimation problems. The estimator consists of several levels. At each level, the input matrices are subdivided into submatrices. Each submatrix is decomposed by SVD, and the results are merged into smaller matrices which become the input of the next level. The process is iterated until the resulting matrices are small enough to be solved directly and efficiently by the SVD algorithm. However, the iterative divide-and-merge algorithm executed on a single machine is still time-demanding on large-scale datasets. We propose two distributed algorithms to overcome this shortcoming by permitting several machines to perform the decomposition and merging of the submatrices in each level in parallel. The first one is implemented in MapReduce on the Hadoop distributed platform, which can run the tasks in parallel on a collection of computers. The second one is implemented in CUDA, which can run the tasks in parallel on Nvidia GPUs. Experimental results demonstrate that the proposed distributed algorithms can greatly reduce the time required to solve large least-squares problems.
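One way to see why a divide-and-merge step of this kind can preserve the least-squares solution is the identity below. The abstract does not give the exact merge rule, so this is a sketch under assumed notation: the rows of the system are split into blocks $A_i$, $b_i$, and each block has a thin SVD $A_i = U_i \Sigma_i V_i^{T}$ with orthonormal columns in $U_i$.

```latex
% Assumed notation (not from the abstract):
%   A = [A_1; ...; A_k],  b = [b_1; ...; b_k],  A_i = U_i \Sigma_i V_i^T (thin SVD)
\begin{align*}
\|Ax - b\|_2^2
  &= \sum_{i=1}^{k} \|A_i x - b_i\|_2^2 \\
  &= \sum_{i=1}^{k} \Big( \|\Sigma_i V_i^{T} x - U_i^{T} b_i\|_2^2
     + \|(I - U_i U_i^{T})\, b_i\|_2^2 \Big)
\end{align*}
```

The second term in each summand does not depend on $x$, so replacing every block $(A_i, b_i)$ by the smaller pair $(\Sigma_i V_i^{T},\; U_i^{T} b_i)$ leaves the minimizer unchanged. Stacking the reduced pairs would then give the input of the next level, and the blocks within one level can be reduced independently of each other.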
|
167 |
Massive Crowd Simulation With Parallel Processing
Yilmaz, Erdal 01 February 2010 (has links) (PDF)
This thesis analyzes how parallel processing on the Graphics Processing Unit (GPU) can be used for massive crowd simulation, not only in terms of rendering but also in terms of the computational power that is required for realistic simulation. The extreme population in massive crowd simulation introduces an extra computational load, which is quite difficult to meet by using Central Processing Unit (CPU) resources only. The thesis shows the specific methods and approaches that maximize the throughput of GPU parallel computing, while using the GPU as the main processor for massive crowd simulation.
The methodology introduced in this thesis makes it possible to simulate and visualize hundreds of thousands of virtual characters in real-time. In order to achieve two orders of magnitude speedups by using GPU parallel processing, various stream compaction and effective memory access approaches were employed.
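Stream compaction, named above as one of the techniques used, packs the surviving elements of an array into a dense prefix. A common formulation is flag, scan, scatter. The sketch below emulates those three passes with plain sequential loops; it is not the thesis implementation, and the names `exclusive_scan` and `compact` are illustrative.

```python
def exclusive_scan(flags):
    """Exclusive prefix sum: out[i] equals sum(flags[:i]). Also returns the total."""
    out, running = [], 0
    for f in flags:
        out.append(running)
        running += f
    return out, running


def compact(items, keep):
    """Pack the items for which keep(item) is true into a dense list, preserving order.

    On a GPU each numbered step would be a data-parallel pass; here they are loops.
    """
    flags = [1 if keep(x) else 0 for x in items]   # 1. flag the survivors
    offsets, total = exclusive_scan(flags)          # 2. scan gives output slots
    out = [None] * total
    for i, x in enumerate(items):                   # 3. scatter into those slots
        if flags[i]:
            out[offsets[i]] = x
    return out
```

In a crowd simulation this pattern could, for example, drop agents that are off-screen or inactive so that later passes only touch a dense array.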
To simulate crowd behavior, fuzzy logic functionality on the GPU has been implemented from scratch. This implementation is capable of computing more than half a billion fuzzy inferences per second.
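To give a sense of what a single fuzzy inference involves, here is a minimal sketch of a zero-order Sugeno-style rule base. The rules, membership functions, and the names `tri` and `infer_speed` are all hypothetical; the abstract does not describe the rule base or inference scheme used in the thesis.

```python
def tri(x, a, b, c):
    """Triangular membership function: 0 outside (a, c), rising to 1 at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def infer_speed(distance):
    """Map distance to the nearest agent (metres) to a walking speed (m/s).

    Hypothetical rule base; each rule pairs a firing strength with a crisp output.
    """
    rules = [
        (tri(distance, -1.0, 0.0, 2.0), 0.2),    # IF distance is near   THEN slow
        (tri(distance, 0.0, 2.0, 4.0), 1.0),     # IF distance is medium THEN normal
        (tri(distance, 2.0, 4.0, 100.0), 1.6),   # IF distance is far    THEN fast
    ]
    total = sum(w for w, _ in rules)
    if total == 0.0:
        return 0.0
    # Weighted average of the rule outputs (defuzzification).
    return sum(w * out for w, out in rules) / total
```

Every agent evaluates the same small rule base on its own inputs, so the work is independent per agent and maps naturally onto one GPU thread per inference.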
|
168 |
A Parallel Algorithm For Flight Route Planning On Gpu Using Cuda
Sanci, Seckin 01 May 2010 (has links) (PDF)
Aerial surveillance missions require a geographical region known as the area of interest to be inspected. The route that the aerial reconnaissance vehicle will follow is known as the flight route. The flight route planning operation has to be done before the actual mission is executed. A flight route may consist of hundreds of pre-defined geographical positions called waypoints. Optimal flight route planning finds a tour that passes through all of the waypoints while covering the minimum possible distance. Due to the combinatorial nature of the problem, it is impractical to devise a solution using brute-force approaches. This study presents a strategy to find a cost-effective and near-optimal solution to the flight route planning problem. The proposed approach is implemented on the GPU using CUDA.
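The abstract does not say which strategy the study uses. As a generic illustration of how a near-optimal tour can be found without brute force, the sketch below combines two textbook heuristics, nearest-neighbour construction and 2-opt improvement, on planar waypoints. It is a stand-in technique, not the method of the thesis, and all function names are chosen for this example.

```python
import math


def tour_length(points, order):
    """Length of the closed tour that visits points in the given order."""
    return sum(
        math.dist(points[order[i]], points[order[(i + 1) % len(order)]])
        for i in range(len(order))
    )


def nearest_neighbour(points, start=0):
    """Greedy construction: always fly to the closest unvisited waypoint."""
    unvisited = set(range(len(points))) - {start}
    order = [start]
    while unvisited:
        last = points[order[-1]]
        # Ties are broken by the lower waypoint index to keep the result deterministic.
        nxt = min(unvisited, key=lambda j: (math.dist(last, points[j]), j))
        order.append(nxt)
        unvisited.remove(nxt)
    return order


def two_opt(points, order):
    """Local search: reverse a segment whenever doing so shortens the tour."""
    order = order[:]
    improved = True
    while improved:
        improved = False
        for i in range(1, len(order) - 1):
            for j in range(i + 1, len(order)):
                candidate = order[:i] + order[i:j + 1][::-1] + order[j + 1:]
                if tour_length(points, candidate) < tour_length(points, order) - 1e-12:
                    order, improved = candidate, True
    return order
```

A typical use is `two_opt(points, nearest_neighbour(points))`. Candidate moves in the inner loops can be scored independently, which is one reason heuristics of this family are often considered for GPU implementation.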
|
169 |
Rendering for Microlithography on GPU Hardware
Iwaniec, Michel January 2008 (has links)
Over the last decades, integrated circuits have changed our everyday lives in a number of ways. Many common devices today taken for granted would not have been possible without this industrial revolution. Central to the manufacturing of integrated circuits is the photomask used to expose the wafers. Additionally, such photomasks are also used for manufacturing of flat screen displays. Microlithography, the manufacturing technique of such photomasks, requires complex electronics equipment that excels in both speed and fidelity. Manufacture of such equipment requires competence in virtually all engineering disciplines, where the conversion of geometry into pixels is but one of these. Nevertheless, this single step in the photomask drawing process has a major impact on the throughput and quality of a photomask writer. Current high-end semiconductor writers from Micronic use a cluster of Field-Programmable Gate Array circuits (FPGA). FPGAs have for many years been able to replace Application Specific Integrated Circuits due to their flexibility and low initial development cost. For parallel computation, an FPGA can achieve throughput not possible with microprocessors alone. Nevertheless, high-performance FPGAs are expensive devices, and upgrading from one generation to the next often requires a major redesign. During the last decade, the computer games industry has taken the lead in parallel computation with graphics card for 3D gaming. While essentially being designed to render 3D polygons and lacking the flexibility of an FPGA, graphics cards have nevertheless started to rival FPGAs as the main workhorse of many parallel computing applications. This thesis covers an investigation on utilizing graphics cards for the task of rendering geometry into photomask patterns. It describes different strategies that were tried and the throughput and fidelity achieved with them, along with the problems encountered.
It also describes the development of a suitable evaluation framework that was critical to the process.
|
170 |
Advanced Real-time Post-Processing using GPGPU techniques
Lönroth, Per, Unger, Mattias January 2008 (has links)
Post-processing techniques are used to change a rendered image as a last step before presentation and include, but are not limited to, operations such as change of saturation or contrast, as well as more advanced effects like depth-of-field and tone mapping. Depth-of-field effects are created by changing the focus in an image; the parts close to the focus point are perfectly sharp while the rest of the image has a variable amount of blurriness. The effect is widely used in photography and movies as a depth cue but has in recent years also been introduced into computer games. Today’s graphics hardware gives new possibilities when it comes to computation capacity. Shaders and GPGPU languages can be used to do massively parallel operations on graphics hardware and are well suited for game developers. This thesis presents the theoretical background of some of the recent and most valuable depth-of-field algorithms and describes the implementation of various solutions in the shader domain but also using GPGPU techniques. The main objective is to analyze various depth-of-field approaches and look at their visual quality and how the methods scale performance-wise when using different techniques.
|