71

A Low Communication Condensation-based Linear System Solver Utilizing Cramer's Rule

Habgood, Kenneth C 01 August 2011 (has links)
Systems of linear equations are central to many science and engineering application domains. Given the abundance of low-cost parallel processing fabrics, the study of fast and accurate parallel algorithms for solving such systems is receiving attention. Fast linear solvers generally use a form of LU factorization. These methods face challenges with workload distribution and communication overhead that hinder their application in a true broadcast communication environment. Presented is an efficient framework for solving large-scale linear systems by means of a novel utilization of Cramer's rule. While the latter is often perceived to be impractical for large systems, it is shown that the proposed algorithm has order N^3 complexity with pragmatic forward and backward stability. To the best of our knowledge, this is the first time that Cramer's rule has been demonstrated to be an order N^3 process. Empirical results are provided to substantiate the stated accuracy and computational complexity, clearly demonstrating the efficacy of the approach taken. The unique utilization of Cramer's rule and matrix condensation techniques yields an elegant process that can be applied to parallel computing architectures that support a broadcast communication infrastructure. The regularity of the communication patterns and the send-ahead ability yield a viable framework for solving linear equations on conventional computing platforms. In addition, this dissertation demonstrates the algorithm's potential for solving large-scale sparse linear systems.
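
The abstract does not spell out the condensation scheme, so the following is only a minimal Python sketch of the general idea it gestures at: Chio's condensation reduces an N x N determinant to an (N-1) x (N-1) one using 2 x 2 minors, giving an O(N^3) determinant routine, and Cramer's rule then recovers each unknown as a ratio of determinants. The function names are hypothetical, and the naive loop over column determinants shown here is O(N^4) overall; the dissertation's contribution is precisely the reuse of condensation work that brings the whole solve down to O(N^3).

```python
import numpy as np

def det_by_condensation(a):
    """Determinant via Chio's condensation: repeatedly replace an n x n matrix
    by the (n-1) x (n-1) matrix of 2 x 2 minors taken against a pivot element,
    dividing out the accumulated pivot powers.  Each full condensation is
    O(N^3).  (No rescaling is done here, so very large N may overflow.)"""
    a = np.array(a, dtype=float)
    sign, scale = 1.0, 1.0
    while a.shape[0] > 1:
        n = a.shape[0]
        p = int(np.argmax(np.abs(a[:, 0])))      # partial pivoting for stability
        if a[p, 0] == 0.0:
            return 0.0                           # first column all zero -> singular
        if p != 0:
            a[[0, p]] = a[[p, 0]]                # row swap flips the sign
            sign = -sign
        pivot = a[0, 0]
        # 2 x 2 minors against the pivot: b[i, j] = pivot*a[i+1, j+1] - a[i+1, 0]*a[0, j+1]
        a = pivot * a[1:, 1:] - np.outer(a[1:, 0], a[0, 1:])
        scale *= pivot ** (n - 2)                # Chio: det(A) = det(B) / pivot^(n-2)
    return sign * a[0, 0] / scale

def cramer_solve(a, rhs):
    """Naive Cramer's rule: x_k = det(A with column k replaced by rhs) / det(A).
    With O(N^3) determinants this loop is O(N^4) overall; the dissertation's
    algorithm reuses the condensation work across columns to stay O(N^3)."""
    d = det_by_condensation(a)
    x = np.empty(len(rhs))
    for k in range(len(rhs)):
        a_k = np.array(a, dtype=float)
        a_k[:, k] = rhs
        x[k] = det_by_condensation(a_k) / d
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    a = rng.standard_normal((6, 6))
    b = rng.standard_normal(6)
    print(np.allclose(cramer_solve(a, b), np.linalg.solve(a, b)))   # True
```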
72

PERFORMANCE EVALUATION OF MEMORY AND COMPUTATIONALLY BOUND CHEMISTRY APPLICATIONS ON STREAMING GPGPUS AND MULTI-CORE X86 CPUS

Weber III, Frederick E 01 May 2010 (has links)
In recent years, multi-core processors have come to dominate the field in desktop and high-performance computing. Graphics processors, traditionally used in CAD, video games, and other 3-D applications, have become more programmable and are now suitable for general-purpose computing. This thesis explores multi-core processor and GPU performance and limitations in two computational chemistry applications: a memory-bound component of ab-initio modeling and a computationally bound Monte Carlo simulation. For the applications presented in this thesis, multiple processors are exploited using a variety of tools and languages, including OpenMP and MKL. The Brook+ and Compute Abstraction Layer streaming environments are used to accelerate applications on AMD GPUs. This thesis gives qualitative assessments of these languages and tools regarding ease of use and optimization, in addition to quantitative analyses of performance. GPUs can yield modest performance improvements with little effort in some applications, and even larger speedups with simple optimizations.
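
The OpenMP, MKL, Brook+, and CAL toolchains the thesis evaluates are not reproduced here; as a rough stand-in, the sketch below shows the same structural idea, splitting an independent, computationally bound Monte Carlo workload across CPU cores, using Python's multiprocessing. The kernel (a trivial dart-throwing estimate of pi) and all names are placeholders, not the thesis's chemistry code.

```python
import multiprocessing as mp
import random

def mc_chunk(args):
    """One worker's share of a toy Monte Carlo estimate (pi by dart throwing);
    this stands in for the computationally bound chemistry kernel."""
    n_samples, seed = args
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits

def parallel_estimate(total_samples, workers):
    """Split the sample budget across processes, analogous in spirit to the
    OpenMP-style splitting of independent Monte Carlo trials across cores."""
    per_worker = total_samples // workers
    jobs = [(per_worker, seed) for seed in range(workers)]
    with mp.Pool(workers) as pool:
        hits = sum(pool.map(mc_chunk, jobs))
    return 4.0 * hits / (per_worker * workers)

if __name__ == "__main__":
    print(parallel_estimate(1_000_000, workers=4))   # roughly 3.14
```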
73

Parallel Computing for Applications in Aeronautical CFD

Ytterström, Anders January 2001 (has links)
No description available.
74

Performance Optimizations for Software Transactional Memory

January 2011 (has links)
The transition from single-core processors to multi-core processors demands a change from sequential programming to concurrent programming for mainstream programmers. However, concurrent programming has long been widely recognized as notoriously difficult. A major reason for this difficulty is that existing concurrent programming constructs provide low-level programming abstractions. Using these constructs forces programmers to consider many low-level details. Locks, the dominant programming construct for mutual exclusion, suffer from several well-known problems, such as deadlock, priority inversion, and convoying, and are directly related to the difficulty of concurrent programming. The alternative to locks, i.e., non-blocking programming, is not only extremely error-prone but also does not produce consistently good performance. Better programming constructs are critical to reduce the complexity of concurrent programming, increase productivity, and expose the computing power of multi-core processors. Transactional memory has emerged recently as one promising programming construct for supporting atomic operations on shared data. By eliminating the need to consider a huge number of possible interactions among concurrent transactions, transactional memory greatly reduces the complexity of concurrent programming and vastly improves programming productivity. Software transactional memory (STM) systems implement a transactional memory abstraction in software. Unfortunately, existing STM designs incur significant performance overhead that could prevent them from being widely used. Reducing STM's overhead will be critical for mainstream programmers to improve productivity without suffering performance degradation. My thesis is that the performance of STM can be significantly improved by intelligently designing validation and commit protocols, by designing the time base, and by incorporating application-specific knowledge. I present four novel techniques for improving the performance of STM systems to support my thesis. First, I propose a time-based STM system based on a runtime tuning strategy that is able to deliver performance equal to or better than existing strategies. Second, I present several novel commit phase designs and evaluate their performance. Then I propose a new STM programming interface extension that enables transaction optimizations using fast shared memory reads while maintaining transaction composability. Next, I present a distributed time base design that outperforms existing time base designs for certain types of STM applications. Finally, I propose a novel programming construct to support multi-place isolation. Experimental results show that the techniques presented here can significantly improve STM performance. We expect these techniques to help STM gain acceptance among more programmers.
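
The dissertation's own validation, commit, and time base designs are not described above in enough detail to reproduce; the sketch below is only a minimal illustration of what a time-based STM looks like in principle: a global version clock stamps committed writes, reads are validated against the clock value sampled at transaction start, and commits are serialized and re-validated. The class and all of its internals are hypothetical, and a single commit lock is used for simplicity where real designs use per-location versioned locks.

```python
import threading

class _Conflict(Exception):
    """Raised internally when a read observes a location newer than the snapshot."""
    pass

class TimeBasedSTM:
    """Minimal time-based STM sketch: a global version clock stamps committed
    writes, reads are validated against the clock value sampled at transaction
    start, and commits are serialized by a single lock (real STMs use
    per-location locks and far more scalable commit protocols)."""

    def __init__(self):
        self.clock = 0          # global version clock
        self.commit_lock = threading.Lock()
        self.values = {}        # location -> committed value
        self.versions = {}      # location -> version stamp of the last commit

    def run(self, tx_body):
        """Execute tx_body(read, write) as a transaction, retrying on conflict."""
        while True:
            start_stamp = self.clock
            read_set, write_set = {}, {}

            def read(loc):
                if loc in write_set:            # read-your-own-writes
                    return write_set[loc]
                value = self.values.get(loc)
                if self.versions.get(loc, 0) > start_stamp:
                    raise _Conflict()           # location changed after our snapshot
                read_set[loc] = value
                return value

            def write(loc, value):
                write_set[loc] = value          # buffered until commit

            try:
                result = tx_body(read, write)
            except _Conflict:
                continue                        # abort and retry

            with self.commit_lock:
                # Validation: everything we read must still be unchanged.
                if any(self.versions.get(loc, 0) > start_stamp for loc in read_set):
                    continue                    # conflict detected, retry
                self.clock += 1
                for loc, value in write_set.items():
                    self.versions[loc] = self.clock   # stamp before publishing
                    self.values[loc] = value
            return result

if __name__ == "__main__":
    stm = TimeBasedSTM()

    def add_ten(read, write):
        write("balance", (read("balance") or 0) + 10)

    threads = [threading.Thread(target=stm.run, args=(add_ten,)) for _ in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(stm.values["balance"])    # 80: every increment committed atomically
```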
75

A novel approach to reduce the computation time for CFD; hybrid LES–RANS modelling on parallel computers

Turnbull, Julian January 2003 (has links)
Large Eddy Simulation is a method of obtaining high-accuracy computational results for modelling fluid flow. Unfortunately, it is computationally expensive, which limits its use to those with access to large parallel machines. However, it may be that the use of LES leads to an over-resolution of the problem, because the bulk of the computational domain could be adequately modelled using the Reynolds-averaged approach. A study has been undertaken to assess the feasibility, in terms of both accuracy and computational efficiency, of using a parallel computer to solve both LES and RANS-type turbulence models on the same domain, for the flow over a circular cylinder at Reynolds number 3,900. To do this, the domain has been created and then divided into two sub-domains, one for the LES model and one for the k-epsilon turbulence model. The hybrid model has been developed specifically for a parallel computing environment, and the user is able to allocate modelling techniques to processors in a way which enables expansion of the model to any number of processors. Computational experimentation has shown that the Smagorinsky model can be used to capture the vortex shedding from the cylinder and that this information can be successfully passed to the k-epsilon model for the dissipation of the vortices further downstream. The results have been compared with high-accuracy LES results and with both k-epsilon and Smagorinsky LES computations on the same domain. The hybrid models developed compare well with the Smagorinsky model, capturing the vortex shedding with the correct periodicity. Suggestions for future work have been made to develop this idea further and to investigate the possibility of using the technology for the modelling of mixing and fast chemical reactions, based on the more accurate prediction of the turbulence levels in the LES sub-domain.
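
No solver details are given in the abstract, so the following is purely an organizational sketch, not a CFD code: it shows one way the model-to-processor allocation and the one-way hand-off from an LES block to a downstream k-epsilon block could be structured. Every class, field, and function name here is hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class SubDomain:
    """One block of the split domain, owned by one processor, with a turbulence
    model assigned to it ("smagorinsky" around the cylinder, "k_epsilon" for
    the downstream region)."""
    name: str
    model: str
    fields: dict = field(default_factory=dict)

    def advance(self, dt: float):
        """Placeholder for one time step of the assigned model on this block."""
        pass

def exchange_interface(les_block: SubDomain, rans_block: SubDomain):
    """Hand the resolved LES quantities across the interface so the k-epsilon
    block receives the shed vortices as inlet/turbulence data."""
    rans_block.fields["inlet_velocity"] = les_block.fields.get("outlet_velocity")
    rans_block.fields["inlet_k"] = les_block.fields.get("resolved_tke")

def time_loop(blocks, steps, dt=1e-3):
    """User-chosen allocation of models to blocks; each block would run on its
    own processor, with only the interface data exchanged per step."""
    for _ in range(steps):
        for block in blocks:
            block.advance(dt)
        exchange_interface(blocks[0], blocks[1])

blocks = [SubDomain("cylinder_wake", "smagorinsky"),
          SubDomain("far_field", "k_epsilon")]
time_loop(blocks, steps=10)
```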
76

Local independence in computed tomography as a basis for parallel computing

Martin, Daniel Morris 14 September 2007 (has links)
Iterative CT reconstruction algorithms are superior to the standard convolution backprojection (CBP) methods when reconstructing from a small number of views (hence less radiation), but are computationally costly. To reduce the execution time, this work implements and tests a parallel approach to iterative algorithms using a cluster of workstations, a low-cost system found in many offices and non-academic sites. A previous implementation showed little speedup because of the significant cost of inter-processor communication. In this thesis, several data partitioning methods are examined, including some image tiling methods that exploit the spatial locality demonstrated by local CT. Using these methods, computation can proceed locally, without the need for inter-processor communication during every iteration. A relative speedup of up to 17 times is obtained using 25 processors, demonstrating that good performance can be obtained running computationally intensive CT reconstruction algorithms on distributed-memory hardware.
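
The thesis's specific iterative algorithm and tiling scheme are not given in the abstract; the sketch below only illustrates the communication pattern it describes, assuming each image tile comes with its own local system of projection equations so that every worker can iterate independently. A Landweber-style update stands in for the actual reconstruction algorithm, and all names and shapes are made up.

```python
import numpy as np
from multiprocessing import Pool

def reconstruct_tile(args):
    """Iteratively refine one image tile from its own local measurements.
    A Landweber-style update x <- x + lam * A^T (b - A x) stands in for the
    thesis's iterative reconstruction; the point is that, with local CT, the
    measurements relevant to a tile depend on that tile alone, so no
    inter-processor communication is needed between iterations."""
    a_local, b_local, n_iter, lam = args
    x = np.zeros(a_local.shape[1])
    for _ in range(n_iter):
        x += lam * (a_local.T @ (b_local - a_local @ x))
    return x

def parallel_reconstruction(tiles, workers=2):
    """tiles: one (A_tile, b_tile, n_iter, lam) tuple per image tile."""
    with Pool(workers) as pool:
        return pool.map(reconstruct_tile, tiles)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tiles, truths = [], []
    for _ in range(2):                       # two toy tiles with made-up projections
        a = rng.standard_normal((40, 16))
        x_true = rng.standard_normal(16)
        truths.append(x_true)
        tiles.append((a, a @ x_true, 300, 1.0 / np.linalg.norm(a, 2) ** 2))
    recon = parallel_reconstruction(tiles)
    print([float(np.max(np.abs(r - t))) for r, t in zip(recon, truths)])  # small errors
```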
77

Exploring the neural codes using parallel hardware

Baladron Pezoa, Javier 07 June 2013 (has links) (PDF)
The aim of this thesis is to understand the dynamics of large interconnected populations of neurons. The method we use to reach this objective is a mixture of mesoscopic modeling and high performance computing. The first allows us to reduce the complexity of the network and the second to perform large-scale simulations. In the first part of this thesis a new mean field approach for conductance-based neurons is used to study numerically the effects of noise on extremely large ensembles of neurons. Also, the same approach is used to create a model of one hypercolumn from the primary visual cortex where the basic computational units are large populations of neurons instead of simple cells. All of these simulations are done by solving a set of partial differential equations that describe the evolution of the probability density function of the network. In the second part of this thesis a numerical study of two neural field models of the primary visual cortex is presented. The main focus in both cases is to determine how edge selection and continuation can be computed in the primary visual cortex. The difference between the two models is in how they represent the orientation preference of neurons: in one this is a feature of the equations and the connectivity depends on it, while in the other there is an underlying map which defines an input function. All the simulations are performed on a Graphics Processing Unit cluster. The thesis proposes a set of techniques to simulate the models fast enough on this kind of hardware. The speedup obtained is equivalent to that of a huge standard cluster.
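
None of the thesis's actual models (the conductance-based mean-field equations or the two V1 neural field models) are reproduced here; as a toy illustration of the kind of uniform, array-wide arithmetic that makes such simulations map well onto GPUs, the sketch below integrates a simple 1-D neural field on a ring, with NumPy standing in for the data-parallel hardware. The equation, kernel, and parameters are all invented for the example.

```python
import numpy as np

def simulate_neural_field(n=512, steps=2000, dt=0.05, tau=1.0):
    """Toy 1-D neural field du/dt = (-u + w * S(u) + I) / tau on a ring, with a
    Mexican-hat connectivity kernel applied by FFT convolution.  The uniform,
    array-wide arithmetic in the time loop is the kind of workload that maps
    naturally onto a GPU; NumPy stands in for that hardware here."""
    x = np.linspace(-np.pi, np.pi, n, endpoint=False)
    w = 2.0 * np.exp(-x**2 / 0.1) - 1.0 * np.exp(-x**2 / 0.5)    # Mexican hat
    w_hat = np.fft.fft(np.fft.ifftshift(w)) * (2 * np.pi / n)    # kernel transform (with dx)

    def firing_rate(u):
        return 1.0 / (1.0 + np.exp(-10.0 * (u - 0.2)))           # sigmoidal nonlinearity

    stimulus = 0.3 * np.exp(-x**2 / 0.05)                        # localized input bump
    u = np.zeros(n)
    for _ in range(steps):                                       # explicit Euler steps
        conv = np.real(np.fft.ifft(w_hat * np.fft.fft(firing_rate(u))))
        u += dt * (-u + conv + stimulus) / tau
    return x, u

if __name__ == "__main__":
    x, u = simulate_neural_field()
    print("peak activity %.3f at x = %.2f" % (u.max(), x[int(np.argmax(u))]))
```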
78

Algorithms for VLSI Circuit Optimization and GPU-Based Parallelization

Liu, Yifang 2010 May 1900 (has links)
This research addresses some critical challenges in various problems of VLSI design automation, including sophisticated solution search on DAG topology, simultaneous multi-stage design optimization, optimization on multi-scenario and multi-core designs, and GPU-based parallel computing for runtime acceleration. Discrete optimization for VLSI design automation problems is often quite complex, due to the inconsistency and interference between solutions on reconvergent paths in a directed acyclic graph (DAG). This research proposes a systematic solution search guided by a global view of the solution space. The key idea of the proposal is joint relaxation and restriction (JRR), which is similar in spirit to mathematical relaxation techniques, such as Lagrangian relaxation. Here, the relaxation and restriction together provide a global view and iteratively improve the solution. Traditionally, circuit optimization is carried out in a sequence of separate optimization stages. The problem with sequential optimization is that the best solution in one stage may be worse for another. To overcome this difficulty, we take the approach of performing multiple optimization techniques simultaneously. By searching in the combined solution space of multiple optimization techniques, a broader view of the problem leads to an overall better optimization result. This research takes this approach on two problems, namely, simultaneous technology mapping and cell placement, and simultaneous gate sizing and threshold voltage assignment. Modern processors have multiple working modes, which trade off power consumption and performance, or maintain a certain performance level in a power-efficient way. As a result, the design of a circuit needs to accommodate different scenarios, such as different supply voltage settings. This research deals with this multi-scenario optimization problem with the Lagrangian relaxation technique. Multiple scenarios are taken care of simultaneously through the balance provided by Lagrangian multipliers. Similarly, multiple objectives and constraints are dealt with simultaneously by Lagrangian relaxation. This research proposes a new method to calculate the subgradients of the Lagrangian function and solve the Lagrangian dual problem more effectively. Multi-core architecture also poses new problems and challenges to design automation. For example, multiple cores on the same chip may have identical design in some parts, while differing from each other in the rest. In the case of buffer insertion, the identical part has to be carefully optimized for all the cores with different environmental parameters. This problem has much higher complexity compared to buffer insertion on single cores. This research proposes an algorithm that optimizes the buffering solution for multiple cores simultaneously, based on critical component analysis. Under the intensifying time-to-market pressure, circuit optimization not only needs to find high-quality solutions, but also has to come up with the result fast. Recent advances in general-purpose graphics processing unit (GPGPU) technology provide massive parallel computing power. This research turns the complex computation task of circuit optimization into many subtasks processed by parallel threads. The proposed task partitioning and scheduling methods take advantage of the GPU computing power and achieve significant speedup without sacrificing solution quality.
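
As a concrete illustration of the Lagrangian relaxation and subgradient machinery the abstract refers to (not the dissertation's actual formulation, which couples sizing, threshold voltages, and multiple scenarios), here is a toy continuous sizing problem: minimize total area subject to a delay budget, solving the relaxed inner problem in closed form and updating the multiplier with a projected subgradient step. All names and numbers are invented for the example.

```python
import math

def lagrangian_sizing(delays, t_max, iters=200, step=0.5):
    """Toy continuous 'gate sizing' by Lagrangian relaxation: minimize total
    area sum(s_i) subject to a delay budget sum(d_i / s_i) <= t_max.
    For a fixed multiplier lam the relaxed problem separates per gate and has
    the closed form s_i = sqrt(lam * d_i); lam is then updated by a projected
    subgradient step on the constraint violation."""
    lam = 1.0
    for _ in range(iters):
        sizes = [math.sqrt(lam * d) for d in delays]                    # inner minimization
        violation = sum(d / s for d, s in zip(delays, sizes)) - t_max   # subgradient w.r.t. lam
        lam = max(1e-9, lam + step * violation)                         # projected dual ascent
    return sizes, lam

if __name__ == "__main__":
    delays = [1.0, 2.0, 4.0]
    sizes, lam = lagrangian_sizing(delays, t_max=3.0)
    achieved = sum(d / s for d, s in zip(delays, sizes))
    print("sizes:", [round(s, 3) for s in sizes], "delay:", round(achieved, 3))  # delay -> 3.0
```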
79

Advance the DNA computing

Qiu, Zhiquan Frank 30 September 2004 (has links)
It has been previously shown that DNA computing can solve problems currently intractable on even the fastest electronic computers. The algorithm design for DNA computing, however, is not straightforward. A strong background in both the DNA molecule and computer engineering is required to develop efficient DNA computing algorithms. After Adleman solved the Hamiltonian Path Problem using a combinatorial molecular method, many other hard computational problems were investigated with the proposed DNA computer. The existing models from which a few DNA computing algorithms have been developed are not sufficiently powerful and robust, however, to attract potential users. This thesis describes research performed to build a new DNA computing model based on various new algorithms developed to solve the 3-Coloring problem. These new algorithms are presented as vehicles for demonstrating the advantages of the new model, and they can be expanded to solve other NP-complete problems. These new algorithms can significantly speed up computation and therefore achieve consistently better time performance. With the given resources, these algorithms can also solve problems of a much greater size, especially as compared to existing DNA computation algorithms. The error rate can also be greatly reduced by applying these new algorithms. Furthermore, they have the advantage of dynamic updating, so an answer can be changed based on modifications made to the initial condition. This new model makes use of the huge potential memory by generating a "lookup table" during the implementation of the algorithms. If the initial condition changes, the answer changes accordingly. In addition, the new model has the advantage of decoding all the strands in the final pool both quickly and efficiently. The advantages provided by the new model make DNA computing an efficient and attractive means of solving computationally intense problems.
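
The thesis's own algorithms are not spelled out in the abstract; the sketch below only mimics, in software, the classic generate-and-filter style of DNA computation for 3-Coloring that they improve upon: a pool encoding every coloring is built up front (what the test tube would hold massively in parallel) and each edge constraint filters out invalid strands. All names are illustrative.

```python
from itertools import product

def three_coloring_filter(vertices, edges):
    """Software mimic of a filtering-style DNA computation for 3-Coloring: the
    initial 'pool' holds every assignment of three colors to the vertices
    (what a tube of strands would encode massively in parallel), and each edge
    constraint 'extracts' the strands whose endpoints share a color.  The
    thesis's algorithms avoid this brute-force pool; the sketch only
    illustrates the generate-and-filter model they build on."""
    index = {v: i for i, v in enumerate(vertices)}
    pool = product(range(3), repeat=len(vertices))           # all 3^n candidate colorings
    for u, v in edges:                                        # one filtering pass per edge
        pool = [c for c in pool if c[index[u]] != c[index[v]]]
    return [dict(zip(vertices, c)) for c in pool]

if __name__ == "__main__":
    vertices = ["a", "b", "c", "d"]
    edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("a", "c")]
    solutions = three_coloring_filter(vertices, edges)
    print(len(solutions), "valid 3-colorings; e.g.", solutions[0] if solutions else None)
```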
