821

Hybrid Spectral Ray Tracing Method for Multi-scale Millimeter-wave and Photonic Propagation Problems

Hailu, Daniel 30 September 2011 (has links)
This thesis presents an efficient self-consistent Hybrid Spectral Ray Tracing (HSRT) technique for the analysis and design of multi-scale sub-millimeter wave problems, where sub-wavelength features are modeled using rigorous methods and complex structures with dimensions on the order of tens or even hundreds of wavelengths are modeled by asymptotic methods. Quasi-optical devices are used in imaging arrays for sub-millimeter and terahertz applications, THz time-domain spectroscopy (THz-TDS), high-speed wireless communications, and space applications to couple terahertz radiation from space to a hot electron bolometer. These devices and structures, as physically small as they have become, are very large in terms of the wavelength of the driving quasi-optical sources and may have dimensions in the tens or even hundreds of wavelengths. Simulation and design optimization of these devices and structures is an extremely challenging electromagnetic problem. The analysis of complex electrically large unbounded wave structures using rigorous methods such as the method of moments (MoM), the finite element method (FEM), and the finite difference time domain (FDTD) method can become almost impossible due to the need for large computational resources. Asymptotic high-frequency techniques are used for the analysis of electrically large quasi-optical systems, and hybrid methods for solving multi-scale problems. Spectral Ray Tracing (SRT) has a number of unique advantages as a candidate for hybridization. The SRT method inherits the advantages of the Spectral Theory of Diffraction (STD): STD can model reflection, refraction, and diffraction of an arbitrary wave incident on a complex structure, which is not the case for diffraction theories such as the Geometrical Theory of Diffraction (GTD), the Uniform Theory of Diffraction (UTD), and the Uniform Asymptotic Theory (UAT). By including complex rays, SRT can analyze both near fields and far fields accurately with minimal approximations. In this thesis, a novel matrix representation of SRT is presented that uses only one spectral integration per observation point; it is applied to modeling hemispherical and hyper-hemispherical lenses. The hybridization of SRT with commercially available FEM and MoM software is proposed in this work to address the complexity of multi-scale analysis, yielding a computationally efficient self-consistent HSRT algorithm. Various arrangements of the hybrid SRT method, such as FEM-SRT and MoM-SRT, are investigated and validated through comparison of radiation patterns with Ansoft HFSS for FEM, FEKO for MoM, the Multi-level Fast Multipole Method (MLFMM), and physical optics. For this purpose, a bow-tie terahertz antenna backed by a hyper-hemispherical silicon lens, an on-chip planar dipole fabricated in SiGe:C BiCMOS technology and attached to a hyper-hemispherical silicon lens, and a double-slot antenna backed by a silica lens are used as sample structures analyzed with the HSRT. The computational performance (memory requirement, CPU/GPU time) of the developed algorithm is compared to that of other methods in commercially available software. It is shown that MoM-SRT, in its present implementation, is more accurate than MoM-PO and comparable in speed; moreover, as shown in this thesis, MoM-SRT can take advantage of parallel processing and GPUs. The HSRT algorithm is applied to the simulation of an on-chip dipole antenna backed by a silicon lens and integrated with a 180-GHz VCO, and the simulated radiation pattern is compared with measurements.
In addition, it is shown that the matrix formulation of SRT and the HSRT are promising approaches for solving complex electrically large problems with high accuracy. This thesis also expounds on a new measurement setup developed specifically for measuring integrated antennas and the radiation pattern and gain of embedded on-chip antennas in the mmW/terahertz range. In this method, the radiation pattern is first measured in a quasi-optical configuration using a power detector. The radiated power is then estimated from integration over the radiation pattern. Finally, the antenna gain is obtained from the measurement of a two-antenna system.
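
The spectral evaluation at the heart of STD/SRT, a field expressed as an integral over its plane-wave spectrum, can be illustrated with a generic angular-spectrum propagation step. The sketch below is not the author's HSRT code; it is a minimal FFT-based stand-in in which all names and parameter values (wavelength, grid, aperture) are assumptions, and the complex kz for evanescent components loosely mirrors the role of complex rays.

```python
import numpy as np

# Minimal 1D angular-spectrum sketch: propagate an aperture field a
# distance z by (1) FFT to the plane-wave spectrum, (2) multiplying by
# the propagation kernel exp(j*kz*z), (3) inverse FFT. Evanescent
# components (|kx| > k) get imaginary kz and decay exponentially.
wavelength = 1e-3            # 1 mm (~300 GHz), illustrative value
k = 2 * np.pi / wavelength
n, dx = 1024, wavelength / 4
x = (np.arange(n) - n // 2) * dx
aperture = np.exp(-(x / (20 * wavelength))**2)   # Gaussian aperture field

kx = 2 * np.pi * np.fft.fftfreq(n, d=dx)
kz = np.sqrt((k**2 - kx**2).astype(complex))     # imaginary for evanescent waves

def propagate(field, z):
    """One spectral integration gives the field at all observation points at distance z."""
    spectrum = np.fft.fft(field)
    return np.fft.ifft(spectrum * np.exp(1j * kz * z))

far_field = propagate(aperture, 200 * wavelength)
print(np.abs(far_field).max())
```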
822

CUDA performance analyzer

Dasgupta, Aniruddha 05 April 2011 (has links)
GPGPU computing using CUDA is rapidly gaining ground today. GPGPU has been brought to the masses through the ease of use of CUDA and the ubiquity of graphics cards supporting it. Although CUDA has a low learning curve for programmers familiar with standard programming languages like C, extracting optimum performance from it through optimizations and hand tuning is not a trivial task. This is because, in GPGPU programming, an optimization strategy rarely affects the program in isolation: many optimizations affect different aspects of performance for better or worse, establishing tradeoffs that must be handled carefully to achieve good performance. Optimizing an application for CUDA is therefore hard, and the performance gain might not be commensurate with the coding effort put in. I propose to simplify the process of optimizing CUDA programs using a CUDA Performance Analyzer. The analyzer is based on analytical modeling of CUDA-compatible GPUs. The model characterizes the different aspects of the GPU compute unified architecture and can predict the expected performance of a CUDA program. It also gives insight into the performance bottlenecks of a CUDA implementation, hinting at which optimizations need to be applied to improve performance. Based on the model, one can also predict the performance of the application if those optimizations are applied to the CUDA implementation. This enables a CUDA programmer to test out different optimization strategies without putting in a lot of coding effort.
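
As a rough illustration of what an analytical GPU model can report, the toy roofline-style sketch below predicts whether a kernel is compute- or memory-bound. It is not the analyzer described in the abstract; the function name and all peak-throughput numbers are invented for illustration.

```python
# Toy analytical GPU performance model: estimate kernel time as the max of
# a compute-bound and a memory-bound estimate and report which one wins.
# All parameters are illustrative assumptions, not measurements of any GPU.
def predict_kernel_time(n_threads, flops_per_thread, bytes_per_thread,
                        peak_gflops=500.0, peak_bw_gbs=150.0):
    compute_s = n_threads * flops_per_thread / (peak_gflops * 1e9)
    memory_s = n_threads * bytes_per_thread / (peak_bw_gbs * 1e9)
    bottleneck = "memory" if memory_s > compute_s else "compute"
    return max(compute_s, memory_s), bottleneck

# A low arithmetic-intensity kernel is predicted memory-bound, hinting that
# coalescing and data-reuse optimizations would pay off most.
t, b = predict_kernel_time(n_threads=1 << 20, flops_per_thread=20, bytes_per_thread=64)
print(f"predicted {t * 1e3:.2f} ms, {b}-bound")
```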
823

High-performance computer system architectures for embedded computing

Lee, Dongwon 26 August 2011 (has links)
The main objective of this thesis is to propose new methods for designing high-performance embedded computer system architectures. To achieve this goal, three major components of multi-processor embedded systems - multi-core processing elements (PEs), DRAM main memory systems, and on/off-chip interconnection networks - are examined in turn, one per section. The first section presents architectural enhancements to graphics processing units (GPUs), one type of multi- or many-core PE, for improving the performance of embedded applications. An embedded application is first mapped onto GPUs to explore the design space, and architectural enhancements to existing GPUs are then proposed to improve its throughput. The second section proposes high-performance buffer mapping methods that exploit useful features of DRAM main memory systems in multi-processor DSP systems. The memory wall problem becomes increasingly severe in multiprocessor environments because of communication and synchronization overheads. To alleviate it, this section exploits the bank concurrency and page-mode access of DRAM main memory systems to increase the performance of multiprocessor DSP systems. The final section presents a network-centric Turbo decoder and network-centric FFT processors. In the era of multi-processor systems, the interconnection network is another performance bottleneck. To handle heavy communication traffic, this section applies a crossbar switch - one of the indirect networks - to the parallel Turbo decoder, and a mesh topology to the parallel FFT processors. When designing the mesh FFT processors, a very different approach is taken to improve performance: an optical fiber is used as a new interconnection medium.
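
The bank-concurrency and page-mode ideas from the second section can be caricatured as a placement policy: spread communication buffers across banks so their accesses can overlap, and row-align each buffer so accesses within it hit an open page. The sketch below is a hypothetical illustration, not the thesis's mapping method; the DRAM geometry constants are assumptions.

```python
# Hypothetical sketch of bank-aware buffer placement: spread inter-processor
# communication buffers across DRAM banks (bank concurrency) and keep each
# buffer inside as few rows as possible (page-mode hits).
NUM_BANKS = 8
ROW_BYTES = 2048  # assumed bytes per DRAM row (page)

def place_buffers(buffer_sizes):
    """Assign each buffer a (bank, row-aligned offset), round-robin over banks."""
    next_free_row = [0] * NUM_BANKS            # next free row index per bank
    placement = {}
    for i, size in enumerate(buffer_sizes):
        bank = i % NUM_BANKS                   # spread buffers across banks
        rows = -(-size // ROW_BYTES)           # ceil: rows this buffer occupies
        placement[i] = (bank, next_free_row[bank] * ROW_BYTES)
        next_free_row[bank] += rows            # row-align the next buffer
    return placement

for buf, (bank, offset) in place_buffers([4096, 1024, 8192, 2048]).items():
    print(f"buffer {buf}: bank {bank}, offset {offset}")
```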
824

Multilayer background modeling under occlusions for spatio-temporal scene analysis

Azmat, Shoaib 21 September 2015 (has links)
This dissertation presents an efficient multilayer background modeling approach for distinguishing midground objects: objects whose existence spans varying time scales between the extremes of short-term ephemeral appearances (foreground) and long-term stationary persistence (background). Traditional background modeling separates a given scene into foreground and background regions. The real world, however, can be much more complex than this simple classification, and object appearance events often occur over varying time scales. There are situations in which objects appear in the scene at different points in time and become stationary; these objects can occlude one another, change positions, or be removed from the scene. Inability to deal with such midground scenarios results in errors such as ghost objects, misdetection of occluding objects, aliasing caused by objects that have left the scene but were not removed from the model, and spurious new-object detections when existing objects are displaced. Modeling temporal layers of multiple objects allows us to overcome these errors, and enables the surveillance and summarization of scenes containing multiple midground objects.
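
A minimal way to see the foreground/midground/background distinction is a per-pixel persistence rule: short-lived changes are foreground, changes that persist over intermediate time scales are midground, and very long-lived changes are absorbed into the background. The sketch below illustrates only this rule, not the dissertation's multilayer, occlusion-aware model; the class name and frame thresholds are assumptions.

```python
import numpy as np

# Per-pixel persistence sketch: label pixels by how long their deviation
# from the background has persisted. Thresholds (in frames) are invented.
FG_MAX, BG_MIN = 15, 600

class PersistenceModel:
    def __init__(self, shape):
        self.background = None
        self.age = np.zeros(shape, dtype=np.int32)  # frames a deviation has persisted

    def update(self, frame, diff_thresh=20):
        if self.background is None:
            self.background = frame.astype(np.int32)
            return np.full(frame.shape, "background", dtype=object)
        changed = np.abs(frame.astype(np.int32) - self.background) > diff_thresh
        self.age = np.where(changed, self.age + 1, 0)
        labels = np.full(frame.shape, "background", dtype=object)
        labels[(self.age > 0) & (self.age <= FG_MAX)] = "foreground"
        labels[(self.age > FG_MAX) & (self.age < BG_MIN)] = "midground"
        absorb = self.age >= BG_MIN                 # long-stationary changes are
        self.background[absorb] = frame[absorb]     # folded into the background
        self.age[absorb] = 0
        return labels
```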
825

Hardware Acceleration of a Monte Carlo Simulation for Photodynamic Therapy Treatment Planning

Lo, William Chun Yip 15 February 2010 (has links)
Monte Carlo (MC) simulations are widely used in the field of medical biophysics, particularly for modelling light propagation in biological tissue. The iterative nature of MC simulations and their high computation time currently limit their use to solving the forward problem for a given set of source characteristics and tissue optical properties. However, applications such as photodynamic therapy treatment planning or image reconstruction in diffuse optical tomography require solving the inverse problem given a desired light dose distribution or absorber distribution, respectively. A faster means of performing MC simulations would enable the use of MC-based models for such tasks. In this thesis, a gold-standard MC code called MCML was accelerated using two distinct hardware-based approaches, namely designing custom hardware on field-programmable gate arrays (FPGAs) and programming commodity graphics processing units (GPUs). Currently, the GPU-based approach is the more promising, offering approximately 1000-fold speedup with 4 GPUs compared to an Intel Xeon CPU.
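
For a flavor of the computation being accelerated, the sketch below implements the bare photon random walk at the core of MCML-style codes: exponentially distributed step lengths, partial weight deposition at each interaction, and rescattering. It is a single-layer, isotropic-scattering simplification with invented coefficients, not the MCML or GPU code from the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Minimal MCML-style photon random walk in a single infinite medium. Real
# MCML adds layered tissue, anisotropic (Henyey-Greenstein) scattering,
# boundary refraction, and a roulette; coefficients here are illustrative.
mu_a, mu_s = 0.1, 10.0          # absorption / scattering coefficients (1/cm)
mu_t = mu_a + mu_s

def simulate_photon(max_steps=1000):
    pos = np.zeros(3)
    direction = np.array([0.0, 0.0, 1.0])
    weight, deposits = 1.0, []
    for _ in range(max_steps):
        step = -np.log(rng.random()) / mu_t    # sampled free path length
        pos = pos + step * direction
        absorbed = weight * mu_a / mu_t        # weight fraction absorbed here
        deposits.append((pos[2], absorbed))
        weight -= absorbed
        if weight < 1e-4:                      # terminate low-weight photon
            break
        v = rng.normal(size=3)                 # isotropic new direction
        direction = v / np.linalg.norm(v)
    return deposits

print(len(simulate_photon()))
```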
826

Development of algorithms for the NCTR function - Application of parallel computing on GPU processors.

Boulay, Thomas 22 October 2013 (has links) (PDF)
The main topic of this thesis is the study of non-cooperative target recognition (NCTR) algorithms. The aim is to perform recognition within the "fighter" class using range profiles. We study four algorithms: one based on the k-nearest-neighbours (KPPV/k-NN) algorithm, one on probabilistic methods, and two on fuzzy logic. A major constraint on NCTR algorithms is controlling the error rate while maximizing the success rate. We showed that the first two algorithms do not satisfy this constraint. We then proposed two fuzzy-logic algorithms that do satisfy it, although for the first of the two this comes at the expense of the success rate (particularly on real data). The second version of the algorithm, however, increases the success rate considerably while keeping the error rate under control. The principle of this algorithm is to characterize class membership range cell by range cell, notably by introducing data acquired in an anechoic chamber. We also propose a procedure for adapting anechoic-chamber data acquired for a given class to other target classes. The second strong constraint on NCTR algorithms is real-time operation. An in-depth study of the parallelization of the k-NN-based algorithm was carried out at the beginning of the thesis. This study brought out the points to take into account when parallelizing NCTR algorithms on GPUs. The conclusions drawn from it will allow future NCTR algorithms, in particular those proposed in this thesis, to be parallelized efficiently on GPUs.
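
The error-rate-control idea, accepting a class only when the fuzzy membership aggregated range cell by range cell is both high and unambiguous, can be sketched as a classifier with a reject option. The toy below illustrates only that principle; the templates, tolerance, and thresholds are invented and do not reflect the thesis's algorithms or data.

```python
import numpy as np

# Toy fuzzy classifier with a reject option: rejecting ambiguous range
# profiles trades success rate for control of the error rate.
def membership(profile, template, tol=0.5):
    # Triangular fuzzy membership per range cell, averaged over cells.
    cell_mu = np.clip(1.0 - np.abs(profile - template) / tol, 0.0, 1.0)
    return cell_mu.mean()

def classify(profile, templates, accept=0.7, margin=0.1):
    scores = {name: membership(profile, t) for name, t in templates.items()}
    best, second = sorted(scores.values(), reverse=True)[:2]
    name = max(scores, key=scores.get)
    if best < accept or best - second < margin:
        return "unknown", scores                # reject rather than risk an error
    return name, scores

templates = {"fighter_A": np.array([1.0, 0.2, 0.8, 0.1]),
             "fighter_B": np.array([0.3, 0.9, 0.2, 0.7])}
print(classify(np.array([0.9, 0.25, 0.7, 0.15]), templates))
```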
827

In Situ Real-time Visualization and Corrosion Testing of Stainless Steel 316LVM with Emphasis on Digital In-line Holographic Microscopy

Klages, Peter E. 17 August 2012 (has links)
Digital in-line holographic microscopy (DIHM) has been incorporated as an additional simultaneous in situ optical technique, alongside ellipsomicroscopy for surface imaging and microscopy, to study metastable pitting corrosion on stainless steel 316LVM in simulated biological solutions. DIHM adds microscopic volume imaging, allows one to detect local changes of the index of refraction in the vicinity of a pitting event, and allows one to track tracer particles and/or material ejected from the pitting sites. To improve the pitting corrosion resistance of stainless steel 316LVM, a simple surface treatment was tested, and the aforementioned imaging techniques were used to verify that pitting occurred only on the wire face. Treatments consisted of polishing the samples to remove the passive layer, then immersing the wires in 90 °C nanopure water for several hours. Treated wires show a marked increase in pitting corrosion resistance over untreated wires: the pit initiation potential increases by a minimum of 200 mV. Additional testing with scanning electron microscopy and energy dispersive X-ray spectroscopy indicates that the removal of sulphide inclusions from the surface is the most probable cause of this enhancement. To increase holographic reconstruction performance, Graphics Processing Units (GPUs) have been used; 4-Mpixel holograms are reconstructed using the dot-product approximation of the Kirchhoff-Fresnel integral in 60 ms on a Tesla C1060 GPU. Errors in sizes and positions can easily be as large as 5 to 10 % in regions where the dot-product approximation is not valid, so algorithms with fewer or no approximations are also required. Reconstructions for arbitrary holographic geometries using the full Kirchhoff-Fresnel integral take approximately 1 hour on the GPU (compared to 1 week on a quad-core CPU), and reconstructions using convolution methods, in which the results of 256 reconstructions at 4096 x 4096 pixels in one plane are combined, take 17 s. This method is almost exact, with approximations only in the obliquity factor.
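
Numerical refocusing of a hologram is commonly done by FFT-based convolution with a propagation transfer function; the sketch below uses the paraxial Fresnel transfer function as a simple stand-in for the full Kirchhoff-Fresnel integral evaluated in the thesis. The wavelength, pixel pitch, and random placeholder hologram are assumptions.

```python
import numpy as np

# Convolution-method sketch: refocus a hologram to depth z by multiplying
# its 2D spectrum with the Fresnel transfer function H(fx, fy; z).
wavelength = 532e-9          # illustrative laser wavelength (m)
k = 2 * np.pi / wavelength
n, dx = 1024, 2e-6           # hologram samples per side and pixel pitch (m)

def reconstruct(hologram, z):
    fx = np.fft.fftfreq(n, d=dx)
    FX, FY = np.meshgrid(fx, fx)
    # Paraxial Fresnel transfer function (the full integral avoids this approximation).
    H = np.exp(1j * k * z) * np.exp(-1j * np.pi * wavelength * z * (FX**2 + FY**2))
    return np.fft.ifft2(np.fft.fft2(hologram) * H)

hologram = np.random.default_rng(1).random((n, n))   # placeholder hologram
image = np.abs(reconstruct(hologram, z=5e-3))**2     # refocused intensity
print(image.shape)
```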
828

Modeling Multi-factor Financial Derivatives by a Partial Differential Equation Approach with Efficient Implementation on Graphics Processing Units

Dang, Duy Minh 15 November 2013 (has links)
This thesis develops efficient modeling frameworks via a Partial Differential Equation (PDE) approach for multi-factor financial derivatives, with emphasis on three-factor models, and studies highly efficient implementations of the numerical methods on novel high-performance computer architectures, with particular focus on Graphics Processing Units (GPUs) and multi-GPU platforms/clusters of GPUs. Two important classes of multi-factor financial instruments are considered: cross-currency/foreign exchange (FX) interest rate derivatives and multi-asset options. For cross-currency interest rate derivatives, the focus of the thesis is on Power Reverse Dual Currency (PRDC) swaps with three of the most popular exotic features, namely Bermudan cancelability, knockout, and FX Target Redemption. The modeling of PRDC swaps using one-factor Gaussian models for the domestic and foreign interest short rates, and a one-factor skew model for the spot FX rate, results in a time-dependent parabolic PDE in three space dimensions. Our proposed PDE pricing framework is based on partitioning the pricing problem into several independent pricing subproblems over each time period of the swap's tenor structure, with possible communication at the end of the time period. Each of these subproblems requires a solution of the model PDE. We then develop a highly efficient GPU-based parallelization of the Alternating Direction Implicit (ADI) timestepping methods for solving the model PDE. To further handle the substantially increased computational requirements due to the exotic features, we extend the pricing procedures to multi-GPU platforms/clusters of GPUs, solving each of these independent subproblems on a separate GPU. Numerical results indicate that the proposed GPU-based parallel numerical methods are highly efficient and provide a significant performance increase over CPU-based methods when pricing PRDC swaps. An analysis of the impact of the FX volatility skew on the price of PRDC swaps is provided. In the second part of the thesis, we develop efficient pricing algorithms for multi-asset options under the Black-Scholes-Merton framework, with strong emphasis on multi-asset American options. Our proposed pricing approach is built upon a combination of (i) a discrete penalty approach for the linear complementarity problem arising due to the free boundary and (ii) a GPU-based parallel ADI Approximate Factorization technique for the solution of the linear algebraic system arising from each penalty iteration. A timestep size selector implemented efficiently on GPUs is used to further increase the efficiency of the methods. We demonstrate the efficiency and accuracy of the proposed GPU-based parallel numerical methods by pricing American options written on three assets.
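
The discrete penalty approach mentioned above can be shown in its simplest setting, a one-dimensional American put with implicit timestepping: each timestep adds a large penalty wherever the value drops below the payoff and iterates until the active set settles. The sketch below is that textbook one-factor version with invented parameters, not the thesis's multi-asset ADI/GPU implementation.

```python
import numpy as np

# Penalty iteration for a 1D American put with fully implicit finite
# differences. All numerical parameters are illustrative assumptions.
sigma, r, K, T = 0.3, 0.05, 100.0, 1.0
ns, nt, s_max, rho = 200, 100, 300.0, 1e6      # rho: large penalty parameter
ds, dt = s_max / ns, T / nt
s = np.linspace(0.0, s_max, ns + 1)
payoff = np.maximum(K - s, 0.0)

i = np.arange(1, ns)                           # interior node indices, S_i = i*ds
a = 0.5 * dt * (sigma**2 * i**2 - r * i)       # sub-diagonal magnitude
b = 1.0 + dt * (sigma**2 * i**2 + r)           # diagonal
c = 0.5 * dt * (sigma**2 * i**2 + r * i)       # super-diagonal magnitude

v = payoff.copy()
for _ in range(nt):                            # march backwards in time
    rhs_base = v[1:-1].copy()
    rhs_base[0] += a[0] * K                    # boundary V(0, t) = K folded into rhs
    v_new = v.copy()
    for _ in range(10):                        # penalty iterations per timestep
        active = v_new[1:-1] < payoff[1:-1]    # penalty switched on where V < payoff
        A = (np.diag(b + rho * dt * active)
             + np.diag(-a[1:], -1) + np.diag(-c[:-1], 1))
        rhs = rhs_base + rho * dt * active * payoff[1:-1]
        prev = v_new[1:-1].copy()
        v_new[1:-1] = np.linalg.solve(A, rhs)
        if np.max(np.abs(v_new[1:-1] - prev)) < 1e-8:
            break                              # active set has settled
    v_new[0], v_new[-1] = K, 0.0
    v = v_new
print(f"American put value at S=K: {np.interp(K, s, v):.4f}")
```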
829

Comparison of the use of GPGPU and multicore clusters in problems with high computational demand

Montes de Oca, Erica January 2012 (has links)
The objective of this undergraduate thesis is the investigation and study of the GPU shared-memory platform and Multicore clusters for solving problems with high computational demand. Solutions to the stated problem are presented in order to compare performance across sequential, shared-memory parallel, message-passing parallel, hybrid parallel, and GPU parallel versions. The quality of the solutions is analyzed in terms of execution time and speedup, and an analysis of energy consumption is introduced.
830

A Multidimensional Filtering Framework with Applications to Local Structure Analysis and Image Enhancement

Svensson, Björn January 2008 (has links)
Filtering is a fundamental operation in image science in general and in medical image science in particular. The most central applications are image enhancement, registration, segmentation and feature extraction. Even though these applications involve non-linear processing, a majority of the available methodologies rely on initial estimates produced by linear filters. Linear filtering is a well-established cornerstone of signal processing, reflected by the overwhelming amount of literature on finite impulse response filters and their design. Standard techniques for multidimensional filtering are computationally intense. This leads to either long computation times or a performance loss caused by approximations made to increase computational efficiency. This dissertation presents a framework for the realization of efficient multidimensional filters. A weighted least squares design criterion ensures preservation of performance, and two techniques, called filter networks and sub-filter sequences, significantly reduce the computational demand. A filter network is a realization of a set of filters, decomposed into a structure of sparse sub-filters, each with a low number of coefficients. Sparsity is here a key property for reducing the number of floating-point operations required for filtering. The network structure is also important for efficiency, since it determines how the sub-filters contribute to several output nodes, allowing reduction or elimination of redundant computations. Filter networks, the main contribution of this dissertation, have many potential applications. The primary target of the research presented here has been local structure analysis and image enhancement. A filter network realization for local structure analysis in 3D shows a computational gain, in terms of the multiplications required, that can exceed a factor of 70 compared to standard convolution. For comparison, this filter network requires approximately the same number of multiplications per signal sample as a single 2D filter. These results are purely algorithmic and are not in conflict with the use of hardware acceleration techniques such as parallel processing or graphics processing units (GPUs). To get a flavor of the computation time required, a prototype implementation using filter networks carries out image enhancement in 3D, involving the computation of 16 filter responses, at an approximate speed of 1 MVoxel/s on a standard PC.
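
The computational-gain argument can be made concrete in the simplest case of a sub-filter sequence, a separable Gaussian realized as a sequence of 1D sub-filters: the multiplication count per sample drops from the full kernel volume to the sum of the sub-filter lengths. The sketch below (using SciPy, with invented sizes) illustrates only this counting argument, not the dissertation's filter-network design.

```python
import numpy as np
from scipy.ndimage import convolve1d

# Sub-filter-sequence sketch: an 11x11x11 separable 3D Gaussian realized as
# three 1D passes. Direct 3D convolution costs 11**3 = 1331 multiplications
# per voxel; the sequence of shorter 1D sub-filters costs 3 * 11 = 33, a
# ~40x reduction. Filter networks generalize this by sharing sub-filter
# outputs across an entire filter set.
g = np.exp(-0.5 * (np.arange(-5, 6) / 2.0) ** 2)
g /= g.sum()

volume = np.random.default_rng(0).random((64, 64, 64))
out = volume
for axis in range(3):                  # one 1D sub-filter per axis
    out = convolve1d(out, g, axis=axis, mode="reflect")

print(f"direct: {11**3} mults/voxel, sequence: {3 * 11} mults/voxel")
```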
