241

Acceleration of Computer Based Simulation, Image Processing, and Data Analysis Using Computer Clusters with Heterogeneous Accelerators

Chen, Chong January 2016
No description available.
242

Distribution-based Exploration and Visualization of Large-scale Vector and Multivariate Fields

Lu, Kewei 08 August 2017
No description available.
243

Numerical simulations of unsteady flows in a pulse detonation engine by the conservation element and solution element method

He, Hao 13 March 2006
No description available.
244

Load-Balancing Spatially Located Computations using Rectangular Partitions

Bas, Erdeniz Ozgun 29 July 2011
No description available.
245

Parallel Computation of the Meddis MATLAB Auditory Periphery Model

Sanghvi, Niraj D. 18 July 2012
No description available.
246

Hardware-based Parallel Computing for Real-time Simulation of Soft-object Deformation

Mafi, Ramin 06 1900
In the last two decades there has been increasing interest in the field of haptics science. Real-time simulation of haptic interaction with a non-rigid deformable object or tissue is computationally demanding. The computational bottleneck in finite-element (FE) modeling of deformable objects is solving a large but sparse linear system of equations at each time step of the simulation. Depending on the mechanical properties of the object, high-fidelity stable haptic simulations require an update rate on the order of 100-1000 Hz. Direct software-based implementations that use conventional computers are fairly limited in the size of the model that they can process at such high rates. In this thesis, a new hardware-based parallel implementation of the iterative Conjugate Gradient (CG) algorithm for solving linear systems of equations is proposed. Sparse matrix-vector multiplication (SpMxV) is the main computational kernel in iterative solution methods such as the CG algorithm. Modern microprocessors exhibit poor performance in executing memory-bound tasks such as SpMxV. In the proposed hardware architecture, a novel organization of on-chip memory resources enables concurrent utilization of a large number of fixed-point computing units on an FPGA device for performing the calculations. The result is a powerful parallel computing platform that can iteratively solve the system of equations arising from FE models of object deformation within the timing constraints of real-time haptics applications. Numerical accuracy of the fixed-point implementation, the hardware architecture design, and issues pertaining to the degree of parallelism and scalability of the solution are discussed in detail. The computing platform proposed in this thesis is successfully employed in a set of haptic interaction experiments using static and dynamic linear FE-based models. / Master of Applied Science (MASc)
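For readers unfamiliar with the algorithm being accelerated, the following is a minimal floating-point conjugate gradient solver in C over a CSR sparse matrix. It is a sketch of the standard method, not the thesis's fixed-point FPGA design; all names and the tolerance are illustrative assumptions. Note how each iteration is dominated by a single SpMxV call, exactly the memory-bound kernel the abstract identifies.

```c
#include <stdlib.h>
#include <math.h>

/* Sparse matrix in compressed sparse row (CSR) form. */
typedef struct {
    int n;               /* dimension */
    const int *rowptr;   /* length n+1 */
    const int *col;      /* column index of each nonzero */
    const double *val;   /* nonzero values */
} csr_t;

/* y = A*x : the kernel the thesis parallelizes in hardware. Each nonzero
 * is read exactly once, so the loop is memory-bound, not compute-bound. */
static void spmv(const csr_t *A, const double *x, double *y) {
    for (int i = 0; i < A->n; ++i) {
        double s = 0.0;
        for (int k = A->rowptr[i]; k < A->rowptr[i + 1]; ++k)
            s += A->val[k] * x[A->col[k]];
        y[i] = s;
    }
}

static double dot(int n, const double *a, const double *b) {
    double s = 0.0;
    for (int i = 0; i < n; ++i) s += a[i] * b[i];
    return s;
}

/* Conjugate gradient for a symmetric positive definite A.
 * x holds the initial guess on entry and the solution on return. */
int cg_solve(const csr_t *A, const double *b, double *x, int maxit, double tol) {
    int n = A->n;
    double *r = malloc(n * sizeof *r), *p = malloc(n * sizeof *p),
           *Ap = malloc(n * sizeof *Ap);
    spmv(A, x, Ap);
    for (int i = 0; i < n; ++i) { r[i] = b[i] - Ap[i]; p[i] = r[i]; }
    double rr = dot(n, r, r);
    int it = 0;
    while (it < maxit && sqrt(rr) > tol) {
        spmv(A, p, Ap);                           /* one SpMxV per iteration */
        double alpha = rr / dot(n, p, Ap);
        for (int i = 0; i < n; ++i) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
        double rr_new = dot(n, r, r);
        double beta = rr_new / rr;
        for (int i = 0; i < n; ++i) p[i] = r[i] + beta * p[i];
        rr = rr_new;
        ++it;
    }
    free(r); free(p); free(Ap);
    return it;                                    /* iterations used */
}
```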
247

Bayesian Modeling of Complex High-Dimensional Data

Huo, Shuning 07 December 2020
With the rapid development of modern high-throughput technologies, scientists can now collect high-dimensional complex data in different forms, such as medical images and genomics measurements. However, acquiring more data does not automatically lead to better knowledge discovery. One needs efficient and reliable analytical tools to extract useful information from complex datasets. The main objective of this dissertation is to develop innovative Bayesian methodologies that enable effective and efficient knowledge discovery from complex high-dimensional data. It contains two parts: the development of computationally efficient functional mixed models, and the modeling of data heterogeneity via Dirichlet Diffusion Trees. The first part tackles the computational bottleneck in Bayesian functional mixed models. We propose a computational framework called the variational functional mixed model (VFMM). This new method facilitates efficient data compression and high-performance computing in basis space. We also propose a new multiple-testing procedure in basis space, which can be used to detect significant local regions. The effectiveness of the proposed model is demonstrated on two datasets: a mass spectrometry dataset from a cancer study and a neuroimaging dataset from an Alzheimer's disease study. The second part models data heterogeneity using Dirichlet Diffusion Trees. We propose a Bayesian latent tree model that incorporates subject covariates to characterize heterogeneity and uncover the latent tree structure underlying the data. This model may reveal the hierarchical evolution process through branch structures and estimate systematic differences between groups of samples. We demonstrate its effectiveness through a simulation study and a real brain tumor dataset. / Doctor of Philosophy / With the rapid development of modern high-throughput technologies, scientists can now collect high-dimensional data in different forms, such as engineering signals, medical images, and genomics measurements. However, acquiring such data does not automatically lead to efficient knowledge discovery. The main objective of this dissertation is to develop novel Bayesian methods to extract useful knowledge from complex high-dimensional data. It has two parts: the development of an ultra-fast functional mixed model, and the modeling of data heterogeneity via Dirichlet Diffusion Trees. The first part develops approximate Bayesian methods for functional mixed models to estimate parameters and detect significant regions. Two datasets demonstrate the effectiveness of the proposed method: a mass spectrometry dataset from a cancer study and a neuroimaging dataset from an Alzheimer's disease study. The second part models data heterogeneity via Dirichlet Diffusion Trees. The method helps uncover underlying hierarchical tree structures and estimate systematic differences between groups of samples. We demonstrate its effectiveness on brain tumor imaging data.
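The abstract does not spell out the model, but functional mixed models of this kind build on a standard form (e.g., the wavelet-based functional mixed models of Morris and Carroll). The sketch below, with assumed notation, shows why working in basis space compresses the computation:

```latex
% Functional mixed model for curves Y_i(t) observed on a common grid:
% fixed functional effects B_a(t), random functional effects U_m(t).
Y_i(t) = \sum_a X_{ia}\, B_a(t) + \sum_m Z_{im}\, U_m(t) + E_i(t)

% Expanding every functional term in a common basis \{\phi_k(t)\},
% e.g. B_a(t) = \sum_k b_{ak}\,\phi_k(t), yields one small mixed model
% per basis coefficient k, fit (nearly) independently:
y_{ik} = \sum_a X_{ia}\, b_{ak} + \sum_m Z_{im}\, u_{mk} + e_{ik}
```

Truncating the basis expansion compresses the data, and the resulting per-coefficient models are what make parallel, high-performance computation in basis space natural.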
248

Robust Online Trajectory Prediction for Non-cooperative Small Unmanned Aerial Vehicles

Badve, Prathamesh Mahesh 21 January 2022
In recent years, unmanned aerial vehicles (UAVs) have seen rapidly growing use in civilian areas such as aerial photography, agriculture, and communication. Increasing research effort is being devoted to developing sophisticated trajectory prediction methods for UAVs for collision detection and trajectory planning. Existing techniques suffer from problems such as inadequate uncertainty quantification of the predicted trajectories. This work adopts particle filters together with the Löwner-John ellipsoid, used to approximate the highest posterior density region, for trajectory prediction and uncertainty quantification. The particle filter is tuned and tested on real-world and simulated datasets and compared with the Kalman filter. A parallel computing approach for the particle filter is further proposed. This parallel implementation makes the particle filter faster and more suitable for real-time online applications. / Master of Science / In recent years, unmanned aerial vehicles (UAVs) have seen rapidly growing use in civilian areas such as aerial photography, agriculture, and communication. Over the coming years, the number of UAVs will increase rapidly. As a result, the risk of mid-air collisions grows, leading to property damage and possible loss of life if a UAV collides with a manned aircraft. Increasing research effort has been devoted to developing sophisticated trajectory prediction methods for UAVs for collision detection and trajectory planning. Existing techniques suffer from problems such as inadequate uncertainty quantification of the predicted trajectories. This work adopts particle filters, a Bayesian inference technique, for trajectory prediction. The use of the minimum-volume enclosing ellipsoid to approximate the highest posterior density region for quantifying prediction uncertainty is also investigated. The particle filter is tuned and tested on real-world and simulated datasets and compared with the Kalman filter. A parallel computing approach for the particle filter is further proposed. This parallel implementation makes the particle filter faster and more suitable for real-time online applications.
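As a rough sketch of the core machinery, the C code below performs one bootstrap particle filter step for a 1-D constant-velocity motion model with a Gaussian position measurement. None of this is specified in the abstract: the model, noise parameters, and particle count are all placeholders.

```c
#include <stdlib.h>
#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

#define NP 1000   /* number of particles (placeholder) */

typedef struct { double pos, vel, w; } particle_t;

/* Standard normal draw via Box-Muller. */
static double randn(void) {
    double u1 = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
    double u2 = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
    return sqrt(-2.0 * log(u1)) * cos(2.0 * M_PI * u2);
}

/* One bootstrap filter step: predict with process noise q, weight by a
 * Gaussian likelihood of the position measurement z (noise r), then
 * systematic resampling. */
void pf_step(particle_t p[NP], double z, double dt, double q, double r) {
    double wsum = 0.0;
    for (int i = 0; i < NP; ++i) {           /* independent per particle */
        p[i].vel += q * randn();             /* predict */
        p[i].pos += p[i].vel * dt;
        double e = z - p[i].pos;             /* weight by likelihood */
        p[i].w = exp(-0.5 * e * e / (r * r));
        wsum += p[i].w;
    }
    static particle_t out[NP];               /* systematic resampling */
    double u = (rand() / ((double)RAND_MAX + 1.0)) / NP;
    double c = p[0].w / wsum;
    int j = 0;
    for (int i = 0; i < NP; ++i) {
        double t = u + (double)i / NP;
        while (t > c && j < NP - 1) { ++j; c += p[j].w / wsum; }
        out[i] = p[j];
        out[i].w = 1.0 / NP;
    }
    for (int i = 0; i < NP; ++i) p[i] = out[i];
}
```

The predict-and-weight loop is independent across particles, which is what a parallel implementation like the one proposed here can exploit; only the resampling step requires coordination.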
249

Multi-level Parallelism with MPI and OpenACC for CFD Applications

McCall, Andrew James 14 June 2017
High-level parallel programming approaches such as OpenACC have recently become popular in complex fluid dynamics research because they are cross-platform and easy to implement. OpenACC is a directive-based programming model that, unlike low-level programming models, abstracts the details of implementation on the GPU. Although OpenACC generally limits the attainable GPU performance, it significantly reduces the work required to port an existing code to any accelerator platform, including GPUs. The purpose of this research is twofold: to investigate the effectiveness of OpenACC in developing a portable and maintainable GPU-accelerated code, and to determine the capability of OpenACC to accelerate large, complex programs on the GPU. In both studies, the OpenACC implementation is optimized and extended to a multi-GPU implementation while maintaining a unified code base. OpenACC is shown to be a viable option for GPU computing with CFD problems. In the first study, a CFD code that solves incompressible cavity flows is accelerated using OpenACC. Overlapping communication with computation improves performance of the multi-GPU implementation by up to 21%, achieving up to 400 times the performance of a single CPU and 99% weak-scaling efficiency on 32 GPUs. The second study ports a more complex CFD research code to the GPU using OpenACC. Challenges in using OpenACC with modern Fortran are discussed. Three test cases are used to evaluate performance and scalability. The multi-GPU performance with 27 GPUs is up to 100 times that of a single CPU, with a weak-scaling efficiency of 95%. / Master of Science
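To make the directive-based style concrete, here is a minimal OpenACC sketch in C of a Jacobi-type sweep, with a comment marking where a multi-GPU version would overlap halo communication with computation. The kernel and all names are illustrative, not taken from the thesis (whose second-study code is Fortran).

```c
/* One Jacobi sweep offloaded to the GPU with OpenACC. If the pragmas
 * are ignored, the same source runs on the CPU: the single-code-base
 * property that directive-based models provide. */
void jacobi_sweep(int n, const double *restrict u, double *restrict unew) {
    #pragma acc parallel loop present(u[0:n], unew[0:n])
    for (int i = 1; i < n - 1; ++i)
        unew[i] = 0.5 * (u[i - 1] + u[i + 1]);
}

void solve(int n, double *restrict u, double *restrict unew, int iters) {
    /* Keep both arrays resident on the GPU for the whole iteration loop. */
    #pragma acc data copy(u[0:n]) create(unew[0:n])
    for (int k = 0; k < iters; ++k) {
        jacobi_sweep(n, u, unew);
        /* A multi-GPU version would update boundary entries on the host
         * asynchronously (e.g. `#pragma acc update self(...) async(1)`),
         * exchange them with MPI_Isend/MPI_Irecv while interior work
         * proceeds, then `#pragma acc wait(1)`: the communication/
         * computation overlap the abstract credits with the 21% gain. */
        #pragma acc parallel loop present(u[0:n], unew[0:n])
        for (int i = 1; i < n - 1; ++i)
            u[i] = unew[i];   /* copy back for the next sweep */
    }
}
```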
250

Coupled-Cluster Methods for Large Molecular Systems Through Massive Parallelism and Reduced-Scaling Approaches

Peng, Chong 02 May 2018
Accurate correlated electronic structure methods involve a significant amount of computation and can only be applied to small molecular systems. For example, the coupled-cluster singles, doubles, and perturbative triples model (CCSD(T)), known as the "gold standard" of quantum chemistry for its accuracy, can usually treat molecules with 20-30 atoms. To extend the reach of accurate correlated electronic structure methods to larger molecular systems, we work in two directions: parallel computing and reduced-cost/scaling approaches. Parallel computing can bring more computational resources to bear on systems that demand substantial computational effort. Reduced-cost/scaling approaches, which introduce approximations into existing electronic structure methods, can significantly reduce computation and storage requirements. In this work, we introduce a new distributed-memory massively parallel implementation of standard and explicitly correlated (F12) coupled-cluster singles and doubles (CCSD) with canonical O(N^6) computational complexity (C. Peng, J. A. Calvin, F. Pavošević, J. Zhang, and E. F. Valeev, J. Phys. Chem. A 2016, 120, 10231), based on the TiledArray tensor framework. Excellent strong scaling is demonstrated on a multi-core shared-memory computer, a commodity distributed-memory computer, and a national-scale supercomputer. We also present a distributed-memory implementation of the density-fitting (DF) based CCSD(T) method (C. Peng, J. A. Calvin, and E. F. Valeev, in preparation for submission). An improved parallel DF-CCSD is presented that uses lazy evaluation for tensors with more than two unoccupied indices, which keeps the DF-CCSD storage requirements below those of the non-iterative triples correction (T). Excellent strong scaling is observed on both shared-memory and distributed-memory computers equipped with conventional Intel Xeon processors and Intel Xeon Phi (Knights Landing) processors. With the new implementation, CCSD(T) energies can be evaluated for systems containing 200 electrons and 1000 basis functions in a few days on a small commodity cluster, with even larger computations possible on leadership-class computing resources. Including the F12 correction makes the CCSD(T) method converge to the basis set limit much more rapidly. The large-scale parallel explicitly correlated coupled-cluster program makes accurate estimation of the coupled-cluster basis set limit routine for molecules with 20 or more atoms, so it can be used to rigorously test emerging reduced-scaling coupled-cluster approaches. Moreover, we extend the pair natural orbital (PNO) approach to excited states through the equation-of-motion coupled-cluster singles and doubles (EOM-CCSD) method (C. Peng, M. C. Clement, and E. F. Valeev, submitted). We simulate the PNO-EOM-CCSD method using an existing massively parallel canonical EOM-CCSD program. We propose the use of state-averaged PNOs, generated from the average of the pair densities of the excited states, to span the PNO space of all excited states. The doubles amplitudes of the CIS(D) method are used to compute the state-averaged pair density of the excited states. The issue of incorrect states entering the state-averaged pair density, caused by an energy reordering of excited states between CIS(D) and EOM-CCSD, is resolved by simply computing more states than desired. We find that with a truncation threshold of 10^-7, the truncation error in the excitation energy is already below 0.02 eV for the systems tested, while the average number of PNOs is reduced to 50-70 per pair. The accuracy of the PNO-EOM-CCSD method for local, Rydberg, and charge-transfer states is also investigated. / Ph. D.
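For readers outside the field, a standard textbook illustration (not drawn from the thesis) of where the canonical O(N^6) cost of CCSD comes from is the particle-particle ladder contraction in the doubles residual:

```latex
% Particle-particle ladder contribution to the CCSD doubles residual;
% i,j run over o occupied orbitals, a,b,c,d over v unoccupied ones.
t_{ij}^{ab} \leftarrow t_{ij}^{ab}
  + \tfrac{1}{2} \sum_{cd} \langle ab \| cd \rangle \, t_{ij}^{cd}
% Cost: O(o^2 v^4), i.e. O(N^6) in the system size N.
```

PNO approaches shrink the effective number of unoccupied orbitals per pair, which is why truncating to 50-70 PNOs per pair, as reported above, cuts the cost so sharply.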
