Global ETD Search

51	GPU acceleration of matrix-based methods in computational electromagnetics Lezar, Evan 03 1900 Thesis (PhD (Electrical and Electronic Engineering))--University of Stellenbosch, 2011. / ENGLISH ABSTRACT: This work considers the acceleration of matrix-based computational electromagnetic (CEM) techniques using graphics processing units (GPUs). These massively parallel processors have gained much support since late 2006, with software tools such as CUDA and OpenCL greatly simplifying the process of harnessing the computational power of these devices. As with any advances in computation, the use of these devices enables the modelling of more complex problems, which in turn should give rise to better solutions to a number of global challenges faced at present. For the purpose of this dissertation, CUDA is used in an investigation of the acceleration of two methods in CEM that are used to tackle a variety of problems. The first of these is the Method of Moments (MOM), which is typically used to model radiation and scattering problems, with the latter being considered here. For the CUDA acceleration of the MOM presented here, the assembly and subsequent solution of the matrix equation associated with the method are considered. This is done for both single and double precision floating-point matrices. For the solution of the matrix equation, general dense linear algebra techniques are used, which allow for the use of a vast expanse of existing knowledge on the subject. This also means that implementations developed here along with the results presented are immediately applicable to the same wide array of applications where these methods are employed. Both the assembly and solution of the matrix equation implementations presented result in significant speedups over multi-core CPU implementations, with speedups of up to 300x and 10x, respectively, being measured.
The implementations presented also overcome one of the major limitations in the use of GPUs as accelerators (that of limited memory capacity), with problems up to 16 times larger than would normally be possible being solved. The second matrix-based technique considered is the Finite Element Method (FEM), which allows for the accurate modelling of complex geometric structures including non-uniform dielectric and magnetic properties of materials, and is particularly well suited to handling bounded structures such as waveguides. In this work the CUDA acceleration of the cutoff and dispersion analysis of three waveguide configurations is presented. The modelling of these problems using an open-source software package, FEniCS, is also discussed. Once again, the problem can be approached from a linear algebra perspective, with the formulation in this case resulting in a generalised eigenvalue (GEV) problem. For the problems considered, a total solution speedup of up to 7x is measured for the solution of the generalised eigenvalue problem, with up to 22x being attained for the solution of the standard eigenvalue problem that forms part of the GEV problem.
/ AFRIKAANS SUMMARY: This work considers the acceleration of matrix methods in computational electromagnetics (CEM) using graphics processing units (GPUs). The use of these processing units was made considerably easier in 2006 by software packages such as CUDA and OpenCL. These devices, like other improvements in processing capability, make it possible to solve more complex problems. This in turn enables scientists to tackle global challenges better. In this dissertation CUDA is used to investigate the acceleration of two methods in CEM, namely the Method of Moments (MOM) and the Finite Element Method (FEM). The MOM is typically used to solve radiation and scattering problems. Only scattering problems are considered here. CUDA is used to speed up the assembly of the MOM matrix and the subsequent solution of the matrix equation associated with the method. General dense linear algebra techniques are used to solve the matrix equations. This makes the wealth of existing knowledge in the field available for the solution, and also means that any implementations developed and results obtained apply to a wide variety of problems that use these linear algebra methods. Both the assembly of the matrix and the solution of the matrix equation were found to be considerably faster than multi-core CPU implementations. Speedups of up to 300x and 10x, respectively, were measured for the assembly and solution phases. The amount of memory available to the GPU is one of the major limitations on the use of GPUs for large problems. This limitation is overcome here, and problems up to 16 times larger than the GPU's available memory are accommodated and solved successfully. The Finite Element Method, in turn, is used to model complex geometries as well as non-uniform material properties. The FEM is also very well suited to handling bounded structures such as waveguides. Here CUDA is used to accelerate the cutoff and dispersion analysis of three waveguide configurations. These problems are implemented using a collection of open-source code known as FEniCS, which is also discussed here. The problems that arise in the FEM can once again be approached from a linear algebra point of view. In this case the formulation leads to a generalised eigenvalue problem. For the waveguide problems investigated, the generalised eigenvalue problem was found to be accelerated by up to 7x. The standard eigenvalue problem that is a step in the solution of the generalised eigenvalue problem was accelerated by up to 22x.
Keywords: Massively parallel processors; Acceleration of CEM techniques; CUDA; OpenCL; Dissertations -- Electronic engineering; Theses -- Electronic engineering; Electromagnetism; Graphics processing units; Computational electromagnetics (CEM)
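The abstract of entry 51 says that a standard eigenvalue problem forms part of the solution of the generalised eigenvalue (GEV) problem, but it does not say which reduction is used. As a hedged illustration only, two textbook reductions are written out below. The symbols A, B, x, λ, σ, ν, L and y are assumptions introduced here and do not come from the thesis.

```latex
% Generalised eigenvalue problem (illustrative symbols, not from the thesis)
A x = \lambda B x

% (a) Shift-and-invert about a shift \sigma, assuming A - \sigma B is invertible:
(A - \sigma B)^{-1} B \, x = \nu x, \qquad \nu = \frac{1}{\lambda - \sigma}

% (b) Cholesky reduction, assuming B is symmetric positive definite, B = L L^{T}:
L^{-1} A L^{-T} y = \lambda y, \qquad y = L^{T} x
```

In both cases the eigenvalues of the original problem are recovered from those of a standard problem: from λ = σ + 1/ν in (a), and directly in (b).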
52	Modeling Multi-factor Financial Derivatives by a Partial Differential Equation Approach with Efficient Implementation on Graphics Processing Units Dang, Duy Minh 15 November 2013 This thesis develops efficient modeling frameworks via a Partial Differential Equation (PDE) approach for multi-factor financial derivatives, with emphasis on three-factor models, and studies highly efficient implementations of the numerical methods on novel high-performance computer architectures, with particular focus on Graphics Processing Units (GPUs) and multi-GPU platforms/clusters of GPUs. Two important classes of multi-factor financial instruments are considered: cross-currency/foreign exchange (FX) interest rate derivatives and multi-asset options. For cross-currency interest rate derivatives, the focus of the thesis is on Power Reverse Dual Currency (PRDC) swaps with three of the most popular exotic features, namely Bermudan cancelability, knockout, and FX Target Redemption. The modeling of PRDC swaps using one-factor Gaussian models for the domestic and foreign interest short rates, and a one-factor skew model for the spot FX rate, results in a time-dependent parabolic PDE in three space dimensions. Our proposed PDE pricing framework is based on partitioning the pricing problem into several independent pricing subproblems over each time period of the swap's tenor structure, with possible communication at the end of the time period. Each of these subproblems requires a solution of the model PDE. We then develop a highly efficient GPU-based parallelization of the Alternating Direction Implicit (ADI) timestepping methods for solving the model PDE. To further handle the substantially increased computational requirements due to the exotic features, we extend the pricing procedures to multi-GPU platforms/clusters of GPUs to solve each of these independent subproblems on a separate GPU.
Numerical results indicate that the proposed GPU-based parallel numerical methods are highly efficient and provide a significant increase in performance over CPU-based methods when pricing PRDC swaps. An analysis of the impact of the FX volatility skew on the price of PRDC swaps is provided. In the second part of the thesis, we develop efficient pricing algorithms for multi-asset options under the Black-Scholes-Merton framework, with strong emphasis on multi-asset American options. Our proposed pricing approach is built upon a combination of (i) a discrete penalty approach for the linear complementarity problem arising due to the free boundary and (ii) a GPU-based parallel ADI Approximate Factorization technique for the solution of the linear algebraic system arising from each penalty iteration. A timestep size selector implemented efficiently on GPUs is used to further increase the efficiency of the methods. We demonstrate the efficiency and accuracy of the proposed GPU-based parallel numerical methods by pricing American options written on three assets.
Keywords: multi-currency swaps; multi-currency options; Power Reverse-Dual Currency; PRDC; Partial Differential Equation; PDE; Alternating Direction Implicit; ADI; Graphics Processing Units; GPU; parallel computing; finite difference; 0984
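ADI timestepping, as named in the abstract of entry 52, splits each time step into one-dimensional implicit sweeps. With standard central differences every grid line in a sweep is an independent tridiagonal solve, which is what makes the method a natural fit for parallel hardware. The following is a minimal sketch under strong simplifying assumptions: it uses the 2D heat equation u_t = u_xx + u_yy on the unit square with zero boundary values rather than the three-dimensional pricing PDE, it uses the Peaceman-Rachford variant of ADI, and it runs serially on the CPU. The names `thomas`, `adi_step` and `solve_heat` and all grid parameters are illustrative and not taken from the thesis.

```python
import math


def thomas(a, b, c, d):
    """Solve a tridiagonal system (a: sub-, b: main, c: super-diagonal)."""
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def adi_step(u, r):
    """One Peaceman-Rachford step on interior values u; r = dt / (2 h^2)."""
    m = len(u)
    a, b, c = [-r] * m, [1 + 2 * r] * m, [-r] * m

    def get(v, i, j):
        # Values outside the interior grid are the zero boundary values.
        return v[i][j] if 0 <= i < m and 0 <= j < m else 0.0

    # Half step 1: implicit along x (index i), explicit along y (index j).
    half = [[0.0] * m for _ in range(m)]
    for j in range(m):  # each j is an independent tridiagonal solve
        rhs = [u[i][j] + r * (get(u, i, j - 1) - 2 * u[i][j] + get(u, i, j + 1))
               for i in range(m)]
        line = thomas(a, b, c, rhs)
        for i in range(m):
            half[i][j] = line[i]

    # Half step 2: implicit along y, explicit along x.
    new = [[0.0] * m for _ in range(m)]
    for i in range(m):  # each i is an independent tridiagonal solve
        rhs = [half[i][j]
               + r * (get(half, i - 1, j) - 2 * half[i][j] + get(half, i + 1, j))
               for j in range(m)]
        new[i] = thomas(a, b, c, rhs)
    return new


def solve_heat(n=20, dt=0.01, steps=10):
    """Advance u0 = sin(pi x) sin(pi y) on an n-by-n cell grid."""
    h = 1.0 / n
    m = n - 1
    u = [[math.sin(math.pi * (i + 1) * h) * math.sin(math.pi * (j + 1) * h)
          for j in range(m)] for i in range(m)]
    for _ in range(steps):
        u = adi_step(u, dt / (2 * h * h))
    return u
```

For this initial condition the exact solution at the centre of the square is exp(-2π²t), so the value `solve_heat()[9][9]` can be compared against `math.exp(-2 * math.pi**2 * 0.1)`. In a GPU version the two `for` loops over grid lines are the candidates for parallel execution.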
53	Accélérateurs logiciels et matériels pour l'algèbre linéaire creuse sur les corps finis / Hardware and Software Accelerators for Sparse Linear Algebra over Finite Fields Jeljeli, Hamza 16 July 2015 The primitives of public-key cryptography rely on the presumed difficulty of solving certain mathematical problems. This work addresses the cryptanalysis of the discrete logarithm problem in the multiplicative subgroups of finite fields. The index calculus algorithms used in this context require solving large sparse linear systems defined over finite fields of large characteristic. In many cases this linear algebra is the bottleneck that prevents larger field sizes from being targeted. The aim of this thesis is to explore the elements that make it possible to accelerate this linear algebra on architectures designed for parallel computing. This involves exploiting the parallelism that arises at different algorithmic and arithmetic levels, and adapting the classical algorithms to the characteristics of the architectures used and to the specifics of the problem. The first part of the manuscript reviews the discrete logarithm context and the software and hardware architectures used. The second part of the manuscript is devoted to accelerating the linear algebra. This work led to two implementations of linear system solvers based on the block Wiedemann algorithm: one adapted to a cluster of NVIDIA GPUs and one adapted to a cluster of multi-core CPUs. These implementations contributed to record discrete logarithm computations in the binary fields GF(2^{619}) and GF(2^{809}) and in the prime field GF(p_{180}). / The security of public-key cryptographic primitives relies on the computational difficulty of solving some mathematical problems.
In this work, we are interested in the cryptanalysis of the discrete logarithm problem over the multiplicative subgroups of finite fields. The index calculus algorithms, which are used in this context, require solving large sparse systems of linear equations over finite fields. This linear algebra represents a serious limiting factor when targeting larger fields. The objective of this thesis is to explore all the elements that accelerate this linear algebra over parallel architectures. We need to exploit the different levels of parallelism provided by these computations and to adapt the state-of-the-art algorithms to the characteristics of the considered architectures and to the specificities of the problem. In the first part of the manuscript, we present an overview of the discrete logarithm context and an overview of the considered software and hardware architectures. The second part deals with accelerating the linear algebra. We developed two implementations of linear system solvers based on the block Wiedemann algorithm: an NVIDIA-GPU-based implementation and an implementation adapted to a cluster of multi-core CPUs. These implementations contributed to solving the discrete logarithm problem in the binary fields GF(2^{619}) and GF(2^{809}) and in the prime field GF(p_{180}).
Keywords: High-performance computing; Sparse linear algebra solvers; Finite field arithmetic; Residue Number System; Multi-core processors; Graphics Processing Units (GPU); 004.35
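Wiedemann-type solvers, such as the block Wiedemann algorithm named in entry 53, spend most of their time on repeated sparse matrix-vector products over the finite field. The sketch below illustrates only that kernel and the scalar sequence u^T A^i v built from it, for a small prime modulus. It is a plain serial illustration, not the block variant and not the thesis implementation. The names `spmv_mod` and `krylov_scalars`, the row-list storage format and the toy matrix are assumptions made for this example.

```python
def spmv_mod(rows, x, p):
    """Multiply a sparse matrix by a vector modulo the prime p.

    rows[i] is a list of (column, value) pairs holding the non-zeros of row i.
    """
    return [sum(v * x[j] for j, v in row) % p for row in rows]


def krylov_scalars(rows, u, v, p, count):
    """Return the scalars u^T A^i v mod p for i = 0 .. count - 1."""
    seq = []
    w = list(v)
    for _ in range(count):
        seq.append(sum(a * b for a, b in zip(u, w)) % p)
        w = spmv_mod(rows, w, p)  # w becomes A^(i+1) v
    return seq


# Toy example over GF(7): A = [[2, 0, 1], [0, 3, 0], [4, 0, 5]].
A = [[(0, 2), (2, 1)], [(1, 3)], [(0, 4), (2, 5)]]
sequence = krylov_scalars(A, [1, 0, 0], [1, 1, 1], 7, 4)  # [1, 3, 1, 3]
```

In the full method the minimal polynomial of such a sequence is then computed (for example with the Berlekamp-Massey algorithm) and used to build a solution. The block variant replaces the vectors u and v with blocks of vectors so that several sequences can be generated in parallel.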
54	Microrganismos patogênicos veiculados por formigas "andarilhas" em unidades de alimentação / Pathogens vectored by "tramp" ants in Food Distribution Units Schuller, Lucia 29 April 2004 Tramp ants have gained scientific notoriety thanks to studies carried out since the 1970s, when pathogens were found in samples of ants collected from hospital environments. Studies conducted since then have reported the presence of the genera Salmonella, Staphylococcus, Klebsiella and Enterobacter in these environments, along with other important pathogenic microorganisms. However, little knowledge has been produced about their presence in environments where food for human consumption is handled and produced. Tramp ants have frequently been observed in homes and in food handling and manufacturing areas, as well as in Food Distribution Units, and are one of the main complaints of consumers. The present study sought to determine which pathogens of importance to the food industry are found in ants collected in Food Distribution Units. Collections were made on blood agar culture medium, and isolations on the following culture media: Baird-Parker for Staphylococcus, Bismuth Sulfite for Salmonella and MacConkey agar for enterobacteria. The results demonstrated the presence of S. aureus and enterobacteria in samples of ants collected in Food Distribution Units in the Greater São Paulo region, suggesting that tramp ants may be important vectors of relevant microorganisms that compromise food safety. / Tramp ants have gained scientific recognition through investigations conducted since the 1970s, when pathogens were first found in ant samples collected from hospital environments.
The surveys conducted since then report the presence of important microorganisms such as Salmonella, Staphylococcus, Klebsiella and Enterobacter. However, very little knowledge has been produced regarding food preparation areas. Tramp ants have frequently been detected in houses and food preparation areas, as well as in industrial and commercial restaurants, and are considered one of the most important pests nowadays. The present survey investigated the pathogens of importance to the food industry found in tramp ant species collected from industrial restaurants. The samples were obtained with blood agar plate baits and were later incubated on several culture media: Baird-Parker for Staphylococcus, Bismuth Sulfite for Salmonella and MacConkey for enterobacteria. The results show the presence of S. aureus and enterobacteria in the ant samples collected from the industrial restaurant environment in São Paulo, Brazil. These results indicate that tramp ants can be important mechanical vectors of relevant pathogens for the food industry.
Keywords: Food; Bacteria; Food processing units; Ants; Tramp ants; Urban ants; Pathogenic microorganisms; Pathogens; Vectors
56	Efficient Compilation Of Stream Programs Onto Multi-cores With Accelerators Udupa, Abhishek 07 1900 Over the past two decades, microprocessor manufacturers have typically relied on wider issue widths and deeper pipelines to obtain performance improvements for single threaded applications. However, in recent years, with power dissipation and wire delays becoming primary design constraints, this approach can no longer be effectively used to yield performance improvements. Thus processor designers and vendors are universally moving towards multi-core designs. Examples of these are the commodity general purpose multi-core processors, the CellBE accelerator from IBM and the Graphics Processing Units from NVIDIA and ATI. Although these many-core and multi-core architectures can provide enormous performance benefits, it is difficult to program for them due to the complexity of writing explicitly parallel code. The ubiquity of computationally intensive media processing applications makes it imperative to consider new programming frameworks and languages that can express parallelism in an easy, portable manner. The StreamIt programming language has been proposed to efficiently exploit parallelism at various levels on general purpose multi-core architectures and stream processors, and to allow media processing and DSP applications to be developed in an easy and portable fashion. The StreamIt model allows programmers to specify a program as a set of filters connected by FIFO communication channels. The graphs thus specified by StreamIt programs describe task, data and pipeline parallelism, which can potentially be exploited on modern Graphics Processing Units (GPUs), which have emerged as powerful commodity stream processors that support abundant parallelism in hardware. The first part of this thesis deals with the challenges in mapping StreamIt programs to GPUs and proposes an efficient technique to software pipeline the execution of stream programs on GPUs.
We formulate this problem (both scheduling and assignment of filters to processors) as an efficient Integer Linear Program (ILP), which is then solved using ILP solvers. We also describe a novel buffer layout technique for GPUs which facilitates exploiting the high memory bandwidth available in GPUs. The proposed scheduling utilizes both the scalar units in the GPU, to exploit data parallelism, and the multiprocessors, to exploit task and pipeline parallelism. We have evaluated our approach on a platform equipped with an NVIDIA GeForce 8800 GTS 512 GPU, and our approach yields a (geometric) mean speedup of 5.02X, with a maximum speedup of 36.83X across a set of StreamIt benchmarks, with the speedup measured relative to an optimized single threaded CPU execution. While the approach of software pipelining the execution of stream programs on GPUs is efficient and performs well, it does not utilize the CPU cores to perform useful computation. Further, it does not support programs with stateful filters, which are essentially filters that are not data parallel owing to a dependence between each successive firing that is carried through the implicit state of the filter. The second part of the thesis aims at addressing these issues and describes a novel method to orchestrate the execution of a StreamIt program on the multiple cores of a system and GPUs in a synergistic manner. The proposed approach identifies, using profiling, the relative benefits of executing a task on the superscalar CPU cores and the accelerator. We formulate the problem of partitioning the work between the CPU cores and the GPU, taking into account the latencies for data transfers, the limited DMA bandwidth available and the required buffer layout transformations associated with the partitioning, as an integrated Integer Linear Program (ILP) which can then be solved by an ILP solver.
Since solving an ILP is NP-Hard in the general case and may thus require a large amount of time, we also propose an efficient heuristic algorithm for the work partitioning between the CPU and the GPU, which provides solutions that are, on average across the benchmark suite, within 9.05% of the optimal solutions to the ILP formulation, while requiring 2–3 orders of magnitude less time than the ILP approach. The partitioned tasks are then software pipelined to execute on the multiple CPU cores and the Streaming Multiprocessors (SMs) of the GPU. The software pipelining algorithm orchestrates the execution between CPU cores and the GPU by emitting the code for the CPU and the GPU, and the code for the required data transfers. Our experiments on a platform with eight CPU cores, out of which four were used, and a GeForce 8800 GTS 512 GPU show a (geometric) mean speedup of 6.84X with a maximum of 51.96X over a single threaded CPU execution across a set of StreamIt benchmarks.
Keywords: Compilers; Stream Programs; Partitioning Algorithm; Stream Programs - Execution; Stream Programs - Compilation; Graphics Processing Units (GPUs); Streaming Multiprocessors (SMs); Accelerators; Computer Science
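The stream model described in entry 56, a set of filters connected by FIFO channels, can be illustrated with a small sketch. This is plain Python and not StreamIt syntax, and it uses a naive sequential schedule rather than the software-pipelined, ILP-based schedule of the thesis. The class `Filter`, the function `run_pipeline` and the two example filters are hypothetical names introduced only for this illustration.

```python
from collections import deque


class Filter:
    """A stateless filter: each firing pops `pop` items and pushes work(items)."""

    def __init__(self, work, pop):
        self.work = work
        self.pop = pop  # number of items consumed per firing (must be >= 1)

    def can_fire(self, channel):
        return len(channel) >= self.pop

    def fire(self, inp, out):
        items = [inp.popleft() for _ in range(self.pop)]
        out.extend(self.work(items))


def run_pipeline(filters, source):
    """Fire the filters in order until none of them has enough input left."""
    channels = [deque(source)] + [deque() for _ in filters]
    fired = True
    while fired:
        fired = False
        for k, f in enumerate(filters):
            if f.can_fire(channels[k]):
                f.fire(channels[k], channels[k + 1])
                fired = True
    return list(channels[-1])


# Two filters: double every item, then sum items in pairs.
scale = Filter(lambda xs: [2 * x for x in xs], pop=1)
pair_sum = Filter(lambda xs: [xs[0] + xs[1]], pop=2)
result = run_pipeline([scale, pair_sum], [1, 2, 3, 4])  # [6, 14]
```

Because neither filter keeps state between firings, separate firings of the same filter could run in parallel on different data, which is the data parallelism the abstract refers to. A stateful filter, for instance a running total, would carry a dependence from one firing to the next and could not be parallelised this way.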
57	CUDA performance analyzer Dasgupta, Aniruddha 05 April 2011 GPGPU computing using CUDA is rapidly gaining ground today. GPGPU has been brought to the masses through the ease of use of CUDA and the ubiquity of graphics cards supporting it. Although CUDA has a low learning curve for programmers familiar with standard programming languages like C, extracting optimum performance from it through optimizations and hand tuning is not a trivial task. This is because, in the case of GPGPU, an optimization strategy rarely affects the functioning in an isolated manner. Many optimizations affect different aspects for better or worse, establishing a tradeoff between them which needs to be carefully handled to achieve good performance. Thus optimizing an application for CUDA is tough, and the performance gain might not be commensurate with the coding effort put in. I propose to simplify the process of optimizing CUDA programs using a CUDA Performance Analyzer. The analyzer is based on analytical modeling of CUDA compatible GPUs. The model characterizes the different aspects of the GPU compute unified architecture and can make predictions about the expected performance of a CUDA program. It would also give an insight into the performance bottlenecks of the CUDA implementation. This would indicate which optimizations need to be applied to improve performance. Based on the model, one would also be able to make a prediction about the performance of the application if the optimizations are applied to the CUDA implementation. This enables a CUDA programmer to test out different optimization strategies without putting in a lot of coding effort.
Keywords: GPU; CUDA; Analytical modeling; GPGPU; Optimization; Performance prediction; Fast multipole method; Performance analysis; Ocelot; Graphics processing units; Computer graphics; Application software
59	A Multidimensional Filtering Framework with Applications to Local Structure Analysis and Image Enhancement Svensson, Björn January 2008 Filtering is a fundamental operation in image science in general and in medical image science in particular. The most central applications are image enhancement, registration, segmentation and feature extraction. Even though these applications involve non-linear processing, a majority of the methodologies available rely on initial estimates using linear filters. Linear filtering is a well established cornerstone of signal processing, which is reflected by the overwhelming amount of literature on finite impulse response filters and their design. Standard techniques for multidimensional filtering are computationally intense. This leads to either a long computation time or a performance loss caused by approximations made in order to increase the computational efficiency. This dissertation presents a framework for realization of efficient multidimensional filters. A weighted least squares design criterion ensures preservation of the performance, and two techniques called filter networks and sub-filter sequences significantly reduce the computational demand. A filter network is a realization of a set of filters, which are decomposed into a structure of sparse sub-filters, each with a low number of coefficients. Sparsity is here a key property to reduce the number of floating point operations required for filtering. Also, the network structure is important for efficiency, since it determines how the sub-filters contribute to several output nodes, allowing reduction or elimination of redundant computations. Filter networks, the main contribution of this dissertation, have many potential applications. The primary target of the research presented here has been local structure analysis and image enhancement.
A filter network realization for local structure analysis in 3D shows a computational gain, in terms of multiplications required, which can exceed a factor of 70 compared to standard convolution. For comparison, this filter network requires approximately the same number of multiplications per signal sample as a single 2D filter. These results are purely algorithmic and are not in conflict with the use of hardware acceleration techniques such as parallel processing or graphics processing units (GPU). To get a flavor of the computation time required, a prototype implementation which makes use of filter networks carries out image enhancement in 3D, involving the computation of 16 filter responses, at an approximate speed of 1 MVoxel/s on a standard PC.
Keywords: Medical image science; multidimensional filtering; image enhancement; image registration; image segmentation; filter networks; graphics processing units (GPU); Medical engineering
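The idea in entry 59 of replacing one dense filter with a sequence of sparse sub-filters can be shown with a one-dimensional toy case. The sketch below is an assumption-laden illustration: it uses an exact factorisation of an 8-tap box filter, whereas the dissertation designs its sub-filters with a weighted least squares criterion and works in several dimensions. The function names and the example signal are invented for this example.

```python
def convolve(signal, kernel):
    """Full 1D convolution that skips the zero coefficients of the kernel."""
    out = [0] * (len(signal) + len(kernel) - 1)
    for k, c in enumerate(kernel):
        if c == 0:
            continue  # zero coefficients of a sparse sub-filter cost nothing
        for n, s in enumerate(signal):
            out[n + k] += c * s
    return out


def nonzeros(kernel):
    """Number of coefficients that require a multiplication."""
    return sum(1 for c in kernel if c != 0)


dense = [1] * 8                              # 8-tap box filter
subs = [[1, 1], [1, 0, 1], [1, 0, 0, 0, 1]]  # three sparse sub-filters

signal = [3, 1, 4, 1, 5, 9, 2, 6]

# Apply the sub-filters one after another.
cascaded = signal
for sub in subs:
    cascaded = convolve(cascaded, sub)

# Apply the dense filter directly.
direct = convolve(signal, dense)
```

Here `cascaded` equals `direct`, because (1 + z)(1 + z^2)(1 + z^4) expands to 1 + z + ... + z^7, while the cascade uses 6 non-zero coefficients instead of 8. The saving grows with filter size and dimension, which is consistent in spirit with the large gains reported in the abstract, although the numbers there come from designed 3D filter networks and not from this toy factorisation.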
60	On continuous maximum flow image segmentation algorithm Marak, Laszlo 28 March 2012 In recent years, with the advance of computing equipment and image acquisition techniques, the sizes, dimensions and content of acquired images have increased considerably. Unfortunately, as time passes there is a steadily increasing gap between the classical and parallel programming paradigms and their actual performance on modern computer hardware. In this thesis we consider in depth one particular algorithm, the continuous maximum flow computation. We review in detail why this algorithm is useful and interesting, and we propose efficient and portable implementations on various architectures. We also examine how it performs in terms of segmentation quality on some recent problems of materials science and nano-scale biology.
Keywords: Computer Science/Other; Total variation; Image analysis; Continuous optimization; Transmission electron tomography; Image segmentation
