61

Advancement of Computing on Large Datasets via Parallel Computing and Cyberinfrastructure

Yildirim, Ahmet Artu 01 May 2015 (has links)
Large datasets require efficient processing, storage, and management to extract useful information for innovation and decision-making. This dissertation demonstrates novel approaches and algorithms using a virtual memory approach, parallel computing, and cyberinfrastructure. First, we introduce a tailored user-level virtual memory system for parallel algorithms that can process large raster data files in a desktop computer environment with limited memory. The application area for this portion of the study is parallel terrain analysis algorithms that use multi-threading to take advantage of common multi-core processors for greater efficiency. Second, we present two novel parallel WaveCluster algorithms that perform cluster analysis by taking advantage of the discrete wavelet transform to reduce large datasets to coarser representations that are smaller and more easily managed than the original data in size and complexity. Finally, this dissertation demonstrates an HPC gateway service that abstracts away many details and complexities involved in the use of HPC systems, including authentication, authorization, and data and job management.
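
A minimal sketch of the out-of-core pattern the first contribution targets: processing a raster too large to hold in RAM by memory-mapping it and handing tiles to worker threads. The file name, raster dimensions, tile size, and per-tile kernel below are illustrative assumptions, not the dissertation's system.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

ROWS, COLS, TILE = 4096, 4096, 512   # assumed raster dimensions and tile size

# Create a demo raster on disk; a real terrain grid would already exist.
demo = np.memmap("terrain.dat", dtype=np.float32, mode="w+", shape=(ROWS, COLS))
demo[:] = np.random.rand(ROWS, COLS).astype(np.float32)
demo.flush()

# Re-open read-only: only the tiles actually touched are paged into RAM,
# which is the effect a user-level virtual memory layer provides.
grid = np.memmap("terrain.dat", dtype=np.float32, mode="r", shape=(ROWS, COLS))

def tile_mean(origin):
    r, c = origin
    return float(grid[r:r + TILE, c:c + TILE].mean())  # stand-in terrain kernel

origins = [(r, c) for r in range(0, ROWS, TILE) for c in range(0, COLS, TILE)]
with ThreadPoolExecutor() as pool:   # threads share one mapping
    means = list(pool.map(tile_mean, origins))
print(f"global mean ~ {sum(means) / len(means):.4f}")
```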
62

MATLAB*P 2.0: A unified parallel MATLAB

Choy, Ron, Edelman, Alan 01 1900 (has links)
MATLAB is one of the most widely used mathematical computing environments in technical computing. It is an interactive environment that provides high-performance computational routines and an easy-to-use, C-like scripting language. MathWorks, the company that develops MATLAB, currently does not provide a version of MATLAB that can utilize parallel computing. This has led to academic and commercial efforts outside MathWorks to build a parallel MATLAB, using a variety of approaches. In a survey, 26 parallel MATLAB projects utilizing four different approaches have been identified. MATLAB*P is one of the 26 projects. It makes use of the backend support approach, which provides parallelism to MATLAB programs by relaying MATLAB commands to a parallel backend. The main difference between MATLAB*P and other projects that make use of the same approach is in its focus: MATLAB*P aims to provide a user-friendly supercomputing environment in which parallelism is achieved transparently through the use of object-oriented programming features in MATLAB. One advantage of this approach is that existing scripts can be run in parallel with little or no modification. This paper describes MATLAB*P 2.0, a complete rewrite of MATLAB*P. This new version brings together the backend support approach with embarrassingly parallel and MPI approaches to provide the first complete parallel MATLAB framework. / Singapore-MIT Alliance (SMA)
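
A toy sketch of the backend-relay idea in Python (the paper's system is MATLAB plus a distributed numerical server; the `ParallelBackend` class and its methods here are invented for illustration): a proxy object overloads operators so that ordinary-looking expressions are silently routed to a backend, which is what makes the parallelism transparent to existing scripts.

```python
import numpy as np

class ParallelBackend:
    """Stand-in for a remote parallel engine; here it just calls NumPy."""
    def add(self, a, b): return a + b
    def matmul(self, a, b): return a @ b

_backend = ParallelBackend()

class PMatrix:
    """Proxy object: arithmetic on it is relayed to the backend."""
    def __init__(self, data):
        self.data = np.asarray(data, dtype=float)
    def __add__(self, other):
        return PMatrix(_backend.add(self.data, other.data))
    def __matmul__(self, other):
        return PMatrix(_backend.matmul(self.data, other.data))

# An existing script needs (almost) no changes: only construction differs.
A = PMatrix(np.eye(3))
B = PMatrix(np.ones((3, 3)))
C = A @ B + B            # both operations silently routed through the backend
print(C.data)
```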
63

A Low Communication Condensation-based Linear System Solver Utilizing Cramer's Rule

Habgood, Kenneth C 01 August 2011 (has links)
Systems of linear equations are central to many science and engineering application domains. Given the abundance of low-cost parallel processing fabrics, the study of fast and accurate parallel algorithms for solving such systems is receiving attention. Fast linear solvers generally use a form of LU factorization. These methods face challenges with workload distribution and communication overhead that hinder their application in a true broadcast communication environment. Presented is an efficient framework for solving large-scale linear systems by means of a novel utilization of Cramer's rule. While the latter is often perceived to be impractical when considered for large systems, it is shown that the proposed algorithm has order N^3 complexity with pragmatic forward and backward stability. To the best of our knowledge, this is the first time that Cramer's rule has been demonstrated to be an order N^3 process. Empirical results are provided to substantiate the stated accuracy and computational complexity, clearly demonstrating the efficacy of the approach taken. The unique utilization of Cramer's rule and matrix condensation techniques yields an elegant process that can be applied to parallel computing architectures that support a broadcast communication infrastructure. The regularity of the communication patterns and the send-ahead ability yield a viable framework for solving linear equations on conventional computing platforms. In addition, this dissertation demonstrates the algorithm's potential for solving large-scale sparse linear systems.
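
For reference, a minimal illustration of Cramer's rule itself (not the thesis's condensation algorithm): x_i = det(A_i) / det(A), where A_i is A with column i replaced by the right-hand side b. Computed naively with LU-based determinants this costs O(N^4); the condensation approach described above is what brings it to O(N^3).

```python
import numpy as np

def cramer_solve(A, b):
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    det_A = np.linalg.det(A)
    if abs(det_A) < 1e-12:
        raise ValueError("matrix is (numerically) singular")
    x = np.empty(len(b))
    for i in range(len(b)):
        Ai = A.copy()
        Ai[:, i] = b               # replace column i with the right-hand side
        x[i] = np.linalg.det(Ai) / det_A
    return x

A = [[2.0, 1.0], [1.0, 3.0]]
b = [3.0, 5.0]
print(cramer_solve(A, b))          # [0.8, 1.4]
print(np.linalg.solve(A, b))       # matches
```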
64

Performance Evaluation of Memory and Computationally Bound Chemistry Applications on Streaming GPGPUs and Multi-core x86 CPUs

Weber III, Frederick E 01 May 2010 (has links)
In recent years, multi-core processors have come to dominate the field in desktop and high-performance computing. Graphics processors, traditionally used in CAD, video games, and other 3-D applications, have become more programmable and are now suitable for general-purpose computing. This thesis explores multi-core processor and GPU performance and limitations in two computational chemistry applications: a memory-bound component of ab initio modeling and a computationally bound Monte Carlo simulation. For the applications presented in this thesis, exploiting multiple processors is done using a variety of tools and languages, including OpenMP and MKL. The Brook+ and Compute Abstraction Layer streaming environments are used to accelerate applications on AMD GPUs. This thesis gives qualitative assessments of these languages and tools regarding ease of use and optimization, in addition to quantitative analyses of performance. GPUs can yield modest performance improvements with little effort in some applications, and even larger speedups with simple optimizations.
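
As a rough illustration of the computationally bound class above (not the thesis's chemistry kernels): a Monte Carlo estimate of pi that scales across CPU cores because each worker is pure computation with almost no memory traffic. Worker count and sample size are arbitrary assumptions.

```python
import random
from multiprocessing import Pool

def mc_hits(n):
    rng = random.Random()          # independent per-process RNG
    return sum(rng.random()**2 + rng.random()**2 <= 1.0 for _ in range(n))

if __name__ == "__main__":
    total, workers = 2_000_000, 4
    with Pool(workers) as pool:    # compute-bound: near-linear speedup
        hits = sum(pool.map(mc_hits, [total // workers] * workers))
    print("pi ~", 4.0 * hits / total)
```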
65

Parallel Computing for Applications in Aeronautical CFD

Ytterström, Anders January 2001 (has links)
No description available.
66

Performance Optimizations for Software Transactional Memory

January 2011 (has links)
The transition from single-core processors to multi-core processors demands a change from sequential programming to concurrent programming for mainstream programmers. However, concurrent programming has long been widely recognized as notoriously difficult. A major reason for its difficulty is that existing concurrent programming constructs provide low-level programming abstractions, forcing programmers to consider many low-level details. Locks, the dominant programming construct for mutual exclusion, suffer from several well-known problems, such as deadlock, priority inversion, and convoying, and are directly related to the difficulty of concurrent programming. The alternative to locks, non-blocking programming, is not only extremely error-prone but also does not produce consistently good performance. Better programming constructs are critical to reduce the complexity of concurrent programming, increase productivity, and expose the computing power of multi-core processors. Transactional memory has emerged recently as one promising programming construct for supporting atomic operations on shared data. By eliminating the need to consider a huge number of possible interactions among concurrent transactions, transactional memory greatly reduces the complexity of concurrent programming and vastly improves programming productivity. Software transactional memory (STM) systems implement the transactional memory abstraction in software. Unfortunately, existing STM designs incur significant performance overhead that could prevent STM from being widely used. Reducing STM's overhead is critical if mainstream programmers are to improve productivity without suffering performance degradation. My thesis is that the performance of STM can be significantly improved by intelligently designing validation and commit protocols, by designing the time base, and by incorporating application-specific knowledge. I present several novel techniques for improving the performance of STM systems to support my thesis. First, I propose a time-based STM system based on a runtime tuning strategy that is able to deliver performance equal to or better than existing strategies. Second, I present several novel commit-phase designs and evaluate their performance. Then I propose a new STM programming interface extension that enables transaction optimizations using fast shared-memory reads while maintaining transaction composability. Next, I present a distributed time-base design that outperforms existing time-base designs for certain types of STM applications. Finally, I propose a novel programming construct to support multi-place isolation. Experimental results show that the techniques presented here can significantly improve STM performance. We expect these techniques to help STM be accepted by more programmers.
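
A hedged, didactic sketch of the time-based (global version clock) design family this work builds on, in the spirit of TL2-style STMs, and not the dissertation's own tuned implementation. All names are invented; the goal is only to show where validation, the time base, and the commit protocol fit.

```python
import threading

_clock = 0
_clock_lock = threading.Lock()

class TVar:
    """A transactional variable: a value plus the version that last wrote it."""
    def __init__(self, value):
        self.value, self.version = value, 0
        self.lock = threading.Lock()

class Abort(Exception):
    pass

class Transaction:
    def __init__(self):
        self.read_version = _clock       # snapshot of the global version clock
        self.reads, self.writes = set(), {}

    def read(self, tv):
        if tv in self.writes:
            return self.writes[tv]
        with tv.lock:                    # read value and version atomically
            value, version = tv.value, tv.version
        if version > self.read_version:
            raise Abort                  # written after our snapshot
        self.reads.add(tv)
        return value

    def write(self, tv, value):
        self.writes[tv] = value

    def commit(self):
        global _clock
        locked = sorted(self.writes, key=id)       # fixed order avoids deadlock
        for tv in locked:
            tv.lock.acquire()
        try:
            for tv in self.reads | set(self.writes):   # revalidate at commit
                if tv.version > self.read_version:
                    raise Abort
            with _clock_lock:
                _clock += 1
                write_version = _clock
            for tv in locked:                      # publish with the new version
                tv.value = self.writes[tv]
                tv.version = write_version
        finally:
            for tv in locked:
                tv.lock.release()

def atomically(fn):
    while True:                                    # retry on conflict
        tx = Transaction()
        try:
            result = fn(tx)
            tx.commit()
            return result
        except Abort:
            pass

counter = TVar(0)

def bump(tx):
    tx.write(counter, tx.read(counter) + 1)

threads = [threading.Thread(target=lambda: [atomically(bump) for _ in range(500)])
           for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(counter.value)   # 2000: no increments lost despite the races
```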
67

Exploring the neural codes using parallel hardware

Baladron Pezoa, Javier 07 June 2013 (has links) (PDF)
The aim of this thesis is to understand the dynamics of large interconnected populations of neurons. The method we use to reach this objective is a mixture of mesoscopic modeling and high-performance computing. The first allows us to reduce the complexity of the network and the second to perform large-scale simulations. In the first part of this thesis a new mean-field approach for conductance-based neurons is used to study numerically the effects of noise on extremely large ensembles of neurons. Also, the same approach is used to create a model of one hypercolumn from the primary visual cortex where the basic computational units are large populations of neurons instead of simple cells. All of these simulations are done by solving a set of partial differential equations that describe the evolution of the probability density function of the network. In the second part of this thesis a numerical study of two neural field models of the primary visual cortex is presented. The main focus in both cases is to determine how edge selection and continuation can be computed in the primary visual cortex. The difference between the two models is in how they represent the orientation preference of neurons: in one this is a feature of the equations and the connectivity depends on it, while in the other there is an underlying map which defines an input function. All the simulations are performed on a Graphics Processing Unit cluster. The thesis proposes a set of techniques to simulate the models fast enough on this kind of hardware. The speedup obtained is equivalent to that of a large standard cluster.
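
A drastically simplified analogue of the density-based approach described above: a one-dimensional Fokker-Planck equation for a noisy leaky membrane potential, integrated with explicit finite differences. All parameter values are invented and the thesis's conductance-based, GPU-accelerated models are far richer; this only shows what evolving a probability density, rather than individual neurons, looks like.

```python
import numpy as np

tau, mu, sigma = 20.0, -60.0, 2.0            # ms, mV, mV/sqrt(ms): assumed
v = np.linspace(-80.0, -40.0, 201)           # membrane-potential grid
dv = v[1] - v[0]
dt = 0.2 * dv**2 / sigma**2                  # well under the stability limit

p = np.exp(-0.5 * ((v + 70.0) / 2.0) ** 2)   # initial density, centered off mu
p /= p.sum() * dv

drift = (v - mu) / tau                       # leak pulls V back toward mu
for _ in range(int(50.0 / dt)):              # integrate 50 ms
    flux_term = np.gradient(drift * p, dv)
    diff_term = (sigma**2 / 2.0) * np.gradient(np.gradient(p, dv), dv)
    p = p + dt * (flux_term + diff_term)
    p = np.clip(p, 0.0, None)
    p /= p.sum() * dv                        # renormalize (crude reflecting walls)

print("mean V after 50 ms ~", round(float((v * p).sum() * dv), 2), "mV")
```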
68

Algorithms for VLSI Circuit Optimization and GPU-Based Parallelization

Liu, Yifang 2010 May 1900 (has links)
This research addresses some critical challenges in various problems of VLSI design automation, including sophisticated solution search on DAG topologies, simultaneous multi-stage design optimization, optimization of multi-scenario and multi-core designs, and GPU-based parallel computing for runtime acceleration. Discrete optimization for VLSI design automation problems is often quite complex, due to the inconsistency and interference between solutions on reconvergent paths in a directed acyclic graph (DAG). This research proposes a systematic solution search guided by a global view of the solution space. The key idea of the proposal is joint relaxation and restriction (JRR), which is similar in spirit to mathematical relaxation techniques such as Lagrangian relaxation. Here, the relaxation and restriction together provide a global view and iteratively improve the solution. Traditionally, circuit optimization is carried out in a sequence of separate optimization stages. The problem with sequential optimization is that the best solution in one stage may be worse for another. To overcome this difficulty, we take the approach of performing multiple optimization techniques simultaneously. By searching in the combined solution space of multiple optimization techniques, a broader view of the problem leads to a better overall optimization result. This research takes this approach on two problems, namely, simultaneous technology mapping and cell placement, and simultaneous gate sizing and threshold voltage assignment. Modern processors have multiple working modes, which trade off power consumption against performance, or maintain a certain performance level in a power-efficient way. As a result, the design of a circuit needs to accommodate different scenarios, such as different supply voltage settings. This research deals with this multi-scenario optimization problem using the Lagrangian relaxation technique: multiple scenarios are taken care of simultaneously through the balance provided by Lagrangian multipliers, and, similarly, multiple objectives and constraints are dealt with simultaneously by Lagrangian relaxation. This research proposes a new method to calculate the subgradients of the Lagrangian function and solve the Lagrangian dual problem more effectively. Multi-core architectures also pose new problems and challenges to design automation. For example, multiple cores on the same chip may have identical designs in some parts while differing from each other in the rest. In the case of buffer insertion, the identical part has to be carefully optimized for all the cores with different environmental parameters. This problem has much higher complexity compared to buffer insertion on single cores. This research proposes an algorithm that optimizes the buffering solution for multiple cores simultaneously, based on critical component analysis. Under intensifying time-to-market pressure, circuit optimization not only needs to find high-quality solutions but also has to come up with the result fast. Recent advances in general-purpose graphics processing unit (GPGPU) technology provide massive parallel computing power. This research turns the complex computation task of circuit optimization into many subtasks processed by parallel threads. The proposed task partitioning and scheduling methods take advantage of the GPU's computing power, achieving significant speedup without sacrificing solution quality.
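
To make the Lagrangian machinery above concrete, here is a toy example in Python on an invented problem (minimize x^2 + y^2 subject to x + y >= 4), showing the multiplier update by projected subgradient ascent; the VLSI formulations in the research are, of course, vastly larger.

```python
def solve_toy_lagrangian(steps=200, step0=1.0):
    lam = 0.0                                 # Lagrangian multiplier
    for k in range(1, steps + 1):
        # Minimize x^2 + y^2 + lam*(4 - x - y): setting the gradient to zero
        # gives the unconstrained optimum x = y = lam / 2.
        x = y = lam / 2.0
        subgrad = 4.0 - x - y                 # subgradient of the dual at lam
        lam = max(0.0, lam + (step0 / k) * subgrad)   # projected ascent
    return x, y, lam

x, y, lam = solve_toy_lagrangian()
print(f"x={x:.3f} y={y:.3f} lambda={lam:.3f}")  # approaches x=y=2, lambda=4
```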
69

Advance the DNA computing

Qiu, Zhiquan Frank 30 September 2004 (has links)
It has been previously shown that DNA computing can solve problems that are currently intractable on even the fastest electronic computers. The algorithm design for DNA computing, however, is not straightforward. A strong background in both the DNA molecule and computer engineering is required to develop efficient DNA computing algorithms. After Adleman solved the Hamiltonian Path Problem using a combinatorial molecular method, many other hard computational problems were investigated with the proposed DNA computer. The existing models from which a few DNA computing algorithms have been developed are not sufficiently powerful and robust, however, to attract potential users. This thesis describes research performed to build a new DNA computing model based on various new algorithms developed to solve the 3-Coloring problem. These new algorithms are presented as vehicles for demonstrating the advantages of the new model, and they can be expanded to solve other NP-complete problems. They can significantly speed up computation and therefore achieve consistently better time performance. With the given resources, these algorithms can also solve problems of much greater size, especially as compared to existing DNA computation algorithms. The error rate can also be greatly reduced by applying these new algorithms. Furthermore, they have the advantage of dynamic updating, so an answer can be changed based on modifications made to the initial condition. The new model makes use of the huge potential memory by generating a "lookup table" during the implementation of the algorithms. If the initial condition changes, the answer changes accordingly. In addition, the new model has the advantage of decoding all the strands in the final pool both quickly and efficiently. The advantages provided by the new model make DNA computing an efficient and attractive means of solving computationally intense problems.
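
A silicon-side analogue of the generate-and-filter DNA paradigm described above: enumerate the full "pool" of candidate colorings (the strands), then discard those violating an edge constraint. The graph and encoding are invented for illustration; real DNA algorithms perform the filtering with massively parallel molecular operations rather than a sequential loop.

```python
from itertools import product

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]   # a small assumed graph
n = 4                                              # number of vertices

pool = product(range(3), repeat=n)                 # all 3^n candidate "strands"
solutions = [c for c in pool
             if all(c[u] != c[v] for u, v in edges)]

print(len(solutions), "valid 3-colorings, e.g.", solutions[0])
```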
