High Performance Computing Issues in Large-Scale Molecular Statics Simulations / Pulla, Gautam / 02 June 1999
Successful application of parallel high-performance computing to practical problems requires overcoming several challenges. These range from the need to make sequential and parallel improvements in programs to the implementation of software tools that create an environment for sharing high-performance hardware resources and for limiting the losses caused by hardware and software failures. In this thesis we describe our approach to meeting these challenges in the context of a Molecular Statics code. We describe sequential and parallel optimizations made to the code, as well as a suite of tools constructed to facilitate the execution of the Molecular Statics program on a network of parallel machines with the aim of increasing resource sharing, fault tolerance, and availability. / Master of Science
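Fault tolerance for a long-running simulation is commonly provided by checkpoint/restart. The Python sketch below is a minimal illustration of that idea, not the thesis' tool suite; the file name, state format, and step interface are all assumptions made for the example.

```python
import os
import pickle

CHECKPOINT = "md_state.pkl"  # hypothetical checkpoint file name

def run_with_checkpoints(step_fn, state, n_steps, every=100):
    """Run n_steps of step_fn, persisting state so a failed run can resume."""
    start = 0
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT, "rb") as f:
            start, state = pickle.load(f)  # resume from the last checkpoint
    for i in range(start, n_steps):
        state = step_fn(state)
        if (i + 1) % every == 0:
            with open(CHECKPOINT, "wb") as f:
                pickle.dump((i + 1, state), f)  # persist progress
    return state
```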
Scalability Analysis of Synchronous Data-Parallel Artificial Neural Network (ANN) Learners / Sun, Chang / 14 September 2018
Artificial Neural Networks (ANNs) have been established as one of the most important algorithmic tools in the Machine Learning (ML) toolbox over the past few decades. ANNs' recent rise to widespread acceptance can be attributed to two developments: (1) the availability of large-scale training and testing datasets; and (2) the availability of new computer architectures for which ANN implementations are orders of magnitude more efficient. In this thesis, I present research on two aspects of the second development. First, I present a portable, open source implementation of ANNs in OpenCL and MPI. Second, I present performance and scaling models for ANN algorithms on state-of-the-art Graphics Processing Unit (GPU) based parallel compute clusters. / Master of Science
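The synchronous data-parallel scheme named in the title is commonly realized by averaging gradients across workers before every update. The following sketch, assuming mpi4py and NumPy (the thesis implementation is in OpenCL and MPI, not this code), shows what one such training step looks like.

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD

def sync_sgd_step(weights, local_grad, lr=0.01):
    """One synchronous step: sum gradients over all ranks, average, update."""
    avg_grad = np.empty_like(local_grad)
    comm.Allreduce(local_grad, avg_grad, op=MPI.SUM)  # every rank receives the sum
    avg_grad /= comm.Get_size()                       # turn the sum into a mean
    return weights - lr * avg_grad                    # identical update on all ranks
```

Because every rank applies the same averaged gradient, the model replicas stay consistent without any further synchronization.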
Evaluating the Design and Performance of a Single-Chip Parallel Computer Using System-Level Models and Methodology / La Fratta, Patrick Anthony / 12 May 2005
As single-chip systems are predicted to soon contain over a billion transistors, design methodologies are evolving dramatically to keep pace with fast-moving technologies and product requirements. Novel methodologies feature the exploration of design alternatives early in development, support for reusable IP blocks, and early error detection, all under a shrinking time-to-market. To accommodate these product complexities and development needs, the modeling levels at which designers work have quickly changed: development at higher levels of abstraction allows faster simulation of system models and earlier estimates of system performance while design trade-offs are still being considered.
Recent design advances that exploit instruction-level parallelism on single-processor computer systems have become exceedingly complex, while modern applications present an increasing potential to be partitioned and parallelized at the thread level. The new Single-Chip, Message-Passing (SCMP) parallel computer is a tightly coupled mesh of processing nodes designed to exploit thread-level parallelism as efficiently as possible: by minimizing the latency of communication among processors, memory access time, and context-switching time, the design aims at an overall performance increase. This study presents in-depth evaluations and quantitative analyses of various design and performance aspects of SCMP through the development of abstract hardware models, following a formalized, well-defined methodology. The performance evaluations are obtained through benchmark simulation, taking into account system-level communication and synchronization among nodes as well as node-level timing and interaction among node components. Through the exploration of alternatives and the optimization of components within the SCMP models, maximum system performance in the hardware implementation can be achieved. / Master of Science
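A simple latency-charged speedup model illustrates why the abstract emphasizes communication and context-switch costs. The sketch below is a back-of-the-envelope illustration, not a model from the thesis; all parameter names and values are assumptions.

```python
def estimated_speedup(t_serial, p, msgs_per_node, t_msg):
    """Speedup = T(1) / T(P), charging a fixed latency per message."""
    t_parallel = t_serial / p + msgs_per_node * t_msg
    return t_serial / t_parallel

# Example: 10 s of perfectly divisible work on 64 nodes,
# each node sending 10,000 messages at 1 microsecond apiece.
print(estimated_speedup(10.0, 64, 10_000, 1e-6))  # ~60x instead of the ideal 64x
```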
Structural Modeling and Optimization of Aircraft Wings having Curvilinear Spars and Ribs (SpaRibs) / De, Shuvodeep / 22 September 2017
The aviation industry is growing at a steady rate, but it remains highly dependent on fossil fuel. With fossil fuels running out and wide-spread acceptance of climate change driven by carbon emissions, both governments and industry are devoting significant resources to research that reduces the weight, and hence the fuel consumption, of commercial aircraft. A commercial fixed-wing aircraft wing consists of spars, beams running in the span-wise direction that carry the flight loads, and ribs, panels with holes attached to the spars that preserve the outer airfoil shape of the wing. Kapania et al. at Virginia Tech proposed reducing the weight of an aircraft wing through an unconventional internal structure consisting of curvilinear spars and ribs (known as SpaRibs) for enhanced performance. A research code, EBF3GLWingOpt, was developed by the Kapania group at Virginia Tech to find the best SpaRibs configuration in terms of weight saving for given flight conditions. However, this software had a number of limitations and could create and analyze only a limited number of SpaRibs configurations. In this work, the limitations of the EBF3GLWingOpt code have been identified and new algorithms have been developed to make it robust and able to analyze a larger number of SpaRibs configurations. The code can also create cut-outs in the SpaRibs for the passage of fuel pipes and wiring. This new version of the code can be used to find the best SpaRibs configuration for multiple objectives, such as reducing weight and increasing flutter velocity. The code is written in Python and has parallel computation capabilities. The wing is modeled with the commercial FEA software MSC.PATRAN and analyzed with MSC.NASTRAN, both invoked from within EBF3GLWingOpt. Using this code, a significant weight reduction has been achieved for a transport aircraft wing. / Ph. D.
Blending Methods for Composite Laminate Optimization / Adams, David Bruce / 30 August 2002
Composite panel structure optimization is commonly decomposed into panel optimization subproblems with specified local loads, resulting in manufacturing incompatibilities between adjacent panel designs. Using genetic algorithms to optimize local panel stacking sequences allows populations of stacking sequences to evolve in parallel and send migrants to adjacent panels, so as to blend the local panel designs globally. The blending process is accomplished using the edit distance between individuals of a population and the set of migrants from adjacent panels; the objective function evaluating the fitness of designs is modified according to the severity of the mismatches detected between neighboring populations. This lays the groundwork for natural evolution toward a blended global solution without leaving the paradigm of genetic algorithms. An additional method proposed here for constructing globally blended panel designs uses a parallel decomposition antithetical to that of the earlier work: rather than performing concurrent panel genetic optimizations, a single genetic optimization is conducted for the entire structure, with the parallelism confined to the fitness evaluations. A guide-based genetic algorithm approach is introduced to generate and evaluate only valid, globally blended designs, using a simple master-slave parallel implementation; this implicitly reduces the size of the problem design space and increases the quality of the discovered local optima. / Master of Science
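A minimal Python sketch of the edit-distance penalty described above (an illustration, not the thesis code): stacking sequences are compared with Levenshtein distance, and a panel's fitness is reduced in proportion to its mismatch with migrants from neighboring panels. The penalty weight is an assumed parameter.

```python
def edit_distance(a, b):
    """Levenshtein distance between two stacking sequences via dynamic programming."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # cost of deleting i plies
    for j in range(n + 1):
        d[0][j] = j  # cost of inserting j plies
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[m][n]

def blended_fitness(raw_fitness, stacking, migrants, penalty=0.1):
    """Penalize a design by its total mismatch with neighboring migrants."""
    mismatch = sum(edit_distance(stacking, m) for m in migrants)
    return raw_fitness - penalty * mismatch

# Ply angles encoded as a sequence; migrants arrive from adjacent panels.
print(blended_fitness(1.0, [45, -45, 0, 90], [[45, -45, 0, 0], [45, 0, 90]]))
```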
An Efficient Parallel Three-Level Preconditioner for Linear Partial Differential Equations / Yao, Aixiang I Song / 26 February 1998
The primary motivation of this research is to develop and investigate parallel preconditioners for linear elliptic partial differential equations. Three preconditioners are studied: a block-Jacobi preconditioner (BJ), a two-level tangential preconditioner (D0), and a three-level preconditioner (D1). Performance and scalability on a distributed-memory parallel computer are considered, and communication cost and redundancy are explored as well.
Our experiments and analysis show that the three-level preconditioner D1 is the most efficient and scalable of the three: it reduces both the number of iterations and the computational time substantially. A new hybrid preconditioner is suggested that may combine the best features of D0 and D1. / Master of Science
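For reference, a block-Jacobi preconditioner of the kind compared above applies the inverse of each diagonal block of the system matrix to the corresponding slice of a residual. The NumPy sketch below is a generic illustration under an assumed dense-matrix interface, not the thesis implementation (which targets a distributed-memory machine).

```python
import numpy as np

def block_jacobi_apply(A, r, block_size):
    """Apply M^{-1} r, where M is the block diagonal of A."""
    n = A.shape[0]
    z = np.empty_like(r)
    for start in range(0, n, block_size):
        end = min(start + block_size, n)
        block = A[start:end, start:end]                # diagonal block of A
        z[start:end] = np.linalg.solve(block, r[start:end])
    return z

# Typical use: z = block_jacobi_apply(A, r, 64) as the preconditioning
# step inside a Krylov iteration such as conjugate gradients.
```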
Towards Enhancing Performance, Programmability, and Portability in Heterogeneous Computing / Krommydas, Konstantinos / 03 May 2017
The proliferation of diverse heterogeneous computing platforms, in conjunction with the plethora of programming languages and per-architecture optimization techniques, hinders widespread adoption of such platforms. This is especially true for novice programmers and the non-technically-savvy masses, who are largely precluded from enjoying the advantages of high-performance computing. Moreover, different groups within the heterogeneous computing community (e.g., hardware architects, tool developers, and programmers) face new challenges with respect to the performance, programmability, and portability (the three P's) of heterogeneous computing.
In this work we discuss such challenges and identify benchmarking techniques based on computation and communication patterns as an appropriate means for the systematic evaluation of heterogeneous computing with respect to the three P's. Our proposed approach is based on OpenCL implementations of the Berkeley dwarfs. We use our benchmark suite (OpenDwarfs) to characterize the performance of state-of-the-art parallel architectures, and as the main component of a methodology (Telescoping Architectures) for identifying trends in future heterogeneous architectures. Furthermore, we employ OpenDwarfs in a multi-faceted study of the gaps between the three P's in the context of the modern heterogeneous computing landscape. Our case study spans a variety of compilers, languages, optimizations, and target architectures, including the CPU, GPU, MIC, and FPGA. Based on our insights, and extending aspects of prior research (e.g., in compilers, programming languages, and auto-tuning), we propose grid-based data structures as the basis of programming frameworks and present a prototype unified framework (GLAF) that encompasses a novel visual programming environment with code generation, auto-parallelization, and auto-tuning capabilities. Our results, which span scientific domains, indicate that our holistic approach constitutes a viable alternative for enhancing the three P's and further democratizing heterogeneous, parallel computing for non-programming-savvy audiences, especially domain scientists. / Ph. D. / In the past decade computing has moved from single-core machines, that is, machines with a CPU that executes code serially, to multi-core ones, i.e., machines with CPUs that can execute code in parallel. Another paradigm shift of the past years is the move toward heterogeneous processing, as opposed to homogeneous processing. In the latter case a single type of processor (CPU) is responsible for executing a given program, whereas in the former different types of processors (such as CPUs, graphics processors, or other accelerators) collaborate to tackle computationally difficult problems in a fast, parallel manner.
The shift to multi-core, parallel, heterogeneous computing described above is accompanied by a shift in programming languages for such platforms, as well as in the techniques used to optimize programs for high performance (i.e., execution speed). The unique complexities of parallel and heterogeneous computing hinder widespread adoption of such platforms. This is especially true for novice programmers and the non-technically-savvy masses, who are largely excluded from the advantages of high-performance computing. The challenges include obtaining fast execution speeds (performance), ease of programming (programmability), and the ability to execute programs across different heterogeneous platforms (portability). Performance, programmability, and portability constitute the 3 P's of heterogeneous computing.
In this work we discuss the above challenges in detail and provide insights and solutions for different interest groups within the computing community, such as computer architects, tool developers, and programmers. We propose an approach for evaluating existing heterogeneous computing platforms based on the concept of dwarf-based benchmarks (i.e., applications characterized by certain computation and communication patterns). Furthermore, we propose a methodology for using the dwarf concept to evaluate potential future heterogeneous platforms. In our research we attempt to quantify the trade-offs between performance, programmability, and portability on a wide set of modern heterogeneous platforms. Based on the above, we seek to bridge the 3 P's by introducing a programming framework that democratizes parallel algorithm development on heterogeneous architectures for novice programmers and domain scientists. Specifically, our framework produces parallel, optimized code implementations in multiple languages with the potential of executing across different heterogeneous platforms.
AMIGO: A Contribution to the Convergence in the Area of Process Scheduling / Souza, Paulo Sergio Lopes de / 26 June 2000
This thesis proposes and describes in detail the design of AMIGO (DynAMical FlexIble SchedulinG EnvirOnment), a novel software tool that makes it possible to unite different scheduling algorithms in a way that is completely transparent to the user. AMIGO makes the scheduling activity flexible at run time, covering all the steps from its configuration up to its effective application. Besides dynamic flexibility and transparency, AMIGO is also modular: it is split into modules that, among other advantages, facilitate its execution on different platforms. This work also contributes a critical analysis of the process-scheduling literature, pointing out existing divergences and proposing important points of convergence. The literature survey thus acts as valuable introductory material: it gives beginners a broad view of the process-scheduling area and helps them move more quickly into deeper, more specific studies. The performance evaluation of AMIGO shows that expressive performance gains are possible while retaining total user transparency. By joining performance, flexibility, and transparency, this work aims to contribute to reducing the existing gap between theory and practice in process scheduling
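The run-time swapping of scheduling policies that AMIGO provides can be pictured with a small strategy-pattern sketch. The Python below is purely illustrative; the class, method, and policy names are assumptions for the example, not AMIGO's actual interface.

```python
class SchedulerRegistry:
    """Scheduling policies behind one interface, swappable at run time."""

    def __init__(self):
        self._policies = {}
        self._active = None

    def register(self, name, policy):
        self._policies[name] = policy

    def activate(self, name):
        self._active = self._policies[name]  # switch policy transparently

    def place(self, task, hosts):
        return self._active(task, hosts)     # delegate to the active policy

registry = SchedulerRegistry()
registry.register("round_robin", lambda task, hosts: hosts[task.id % len(hosts)])
registry.register("least_loaded", lambda task, hosts: min(hosts, key=lambda h: h.load))
registry.activate("round_robin")
# The application keeps calling registry.place(...) unchanged even if the
# active policy is switched to "least_loaded" in the middle of a run.
```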
High Performance Latent Dirichlet Allocation for Text Mining / Liu, Zelong / January 2013
Latent Dirichlet Allocation (LDA), a fully generative probabilistic model, is a three-level Bayesian model. LDA computes the latent topic structure of the data and thereby captures the significant information in documents. However, traditional LDA has several limitations in practical applications. LDA cannot be used directly for classification because it is an unsupervised learning model; it needs to be embedded into appropriate classification algorithms. Because LDA is a generative model, it may generate latent topics in categories to which the target documents do not belong, introducing deviations into the computation and reducing classification accuracy. The number of topics in LDA greatly influences the learning of model parameters. Noise samples in the training data also affect the final text classification result, and the quality of LDA-based classifiers depends to a great extent on the quality of the training samples. Although parallel LDA algorithms have been proposed to deal with huge amounts of data, balancing computing loads in a computer cluster poses another challenge. This thesis presents a text classification method that combines the LDA model and the Support Vector Machine (SVM) classification algorithm to improve classification accuracy while reducing the dimension of datasets. Based on Density-Based Spatial Clustering of Applications with Noise (DBSCAN), the algorithm automatically optimizes the number of topics to be selected, which reduces the number of iterations in computation. Furthermore, this thesis presents a noise-data reduction scheme; even when the noise ratio in the training data set is large, the scheme consistently produces a high level of classification accuracy. Finally, the thesis parallelizes LDA using the MapReduce model, the de facto computing standard for supporting data-intensive applications. A genetic-algorithm-based load balancing algorithm is designed to balance the workloads among computers in a heterogeneous MapReduce cluster whose computers vary in CPU speed, memory space, and hard disk space.
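The LDA-plus-SVM pipeline described above can be pictured with scikit-learn stand-ins. This is a minimal sketch under an assumed toy corpus and parameters, not the thesis' own (parallel, MapReduce-based) implementation; the DBSCAN-driven topic selection and noise-reduction steps are omitted.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

docs = ["stocks fell sharply today", "the team won the final match"]  # placeholder corpus
labels = [0, 1]                                                       # placeholder classes

pipeline = make_pipeline(
    CountVectorizer(),                           # documents -> term counts
    LatentDirichletAllocation(n_components=2),   # term counts -> topic mixtures (small for the toy corpus)
    SVC(kernel="rbf"),                           # classify in the reduced topic space
)
pipeline.fit(docs, labels)
print(pipeline.predict(["the match ended in a draw"]))
```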