Global ETD Search

41	Improving the throughput of novel cluster computing systems Wu, Jiadong 21 September 2015 (has links) Traditional cluster computing systems such as the supercomputers are equipped with specially designed high-performance hardware, which escalates the manufacturing cost and the energy cost of those systems. Due to such drawbacks and the diversified demand in computation, two new types of clusters are developed: the GPU clusters and the Hadoop clusters. The GPU cluster combines traditional CPU-only computing cluster with general purpose GPUs to accelerate the applications. Thanks to the massively-parallel architecture of the GPU, this type of system can deliver much higher performance-per-watt than the traditional computing clusters. The Hadoop cluster is another popular type of cluster computing system. It uses inexpensive off-the-shelf component and standard Ethernet to minimize manufacturing cost. The Hadoop systems are widely used throughout the industry. Alongside with the lowered cost, these new systems also bring their unique challenges. According to our study, the GPU clusters are prone to severe under-utilization due to the heterogeneous nature of its computation resources, and the Hadoop clusters are vulnerable to network congestion due to its limited network resources. In this research, we are trying to improve the throughput of these novel cluster computing systems by increasing the workload parallelism and network I/O parallelism. GPU Hadoop Computer cluster Parallel computing
42	Improving the efficiency of dynamic traffic assignment through computational methods based on combinatorial algorithm Nezamuddin 12 October 2011 (has links) Transportation planning and operation requires determining the state of the transportation system under different network supply and demand conditions. The most fundamental determinant of the state of a transportation system is time-varying traffic flow pattern on its roadway segments. It forms a basis for numerous engineering analyses which are used in operational- and planning-level decision-making process. Dynamic traffic assignment (DTA) models are the leading modeling tools employed to determine time-varying traffic flow pattern under changing network conditions. DTA models have matured over the past three decades, and are now being adopted by transportation planning agencies and traffic management centers. However, DTA models for large-scale regional networks require excessive computational resources. The problem becomes further compounded for other applications such as congestion pricing, capacity calibration, and network design for which DTA needs to be solved repeatedly as a sub-problem. This dissertation aims to improve the efficiency of the DTA models, and increase their viability for various planning and operational applications. To this end, a suite of computational methods based on the combinatorial approach for dynamic traffic assignment was developed in this dissertation. At first, a new polynomial run time combinatorial algorithm for DTA was developed. The combinatorial DTA (CDTA) model complements and aids simulation-based DTA models rather than replace them. This is because various policy measures and active traffic control strategies are best modeled using the simulation-based DTA models. Solution obtained from the CDTA model was provided as an initial feasible solution to a simulation-based DTA model to improve its efficiency – this process is called “warm starting” the simulation-based DTA model. To further improve the efficiency of the simulation-based DTA model, the warm start process is made more efficient through parallel computing. Parallel computing was applied to the CDTA model and the traffic simulator used for warm starting. Finally, another warm start method based on the static traffic assignment model was tested on the simulation-based DTA model. The computational methods developed in this dissertation were tested on the Anaheim, CA and Winnipeg, Canada networks. Models warm-started using the CDTA solution performed better than the purely simulation-based DTA models in terms of equilibrium convergence metrics and run time. Warm start methods using solutions from the static traffic assignment models showed similar improvements. Parallel computing was applied to the CDTA model, and it resulted in faster execution time by employing multiple computer processors. Parallel version of the traffic simulator can also be embedded into the simulation-assignment framework of the simulation-based DTA models and improve their efficiency. / text Dynamic traffic assignment Parallel computing Combinatorial algorithm
43	Dynamic peer-to-peer construction of clusters Kadaru, Pranith Reddy 13 January 2010 (has links) The use of parallel computing is increasing with the need to solve ever more complex problems. Unfortunately, while the cost of parallel systems (including clusters and small-scale shared memory machines) has decreased, such machines are still not within the reach of many users. This is particularly true if large numbers of processors are needed. A largely untapped resource for doing some, simpler, types of parallel computing are temporarily idle machines in distributed environments. Such environments range from the simple (identical machines connected via a LAN) to the complex (heterogeneous machines connected via the Internet). In this thesis I describe a system for dynamically clustering together similar machines distributed across the Internet. This is done in a peer-to-peer (P2P) fashion with the goal of ultimately forming useful compute clusters without the need for a heavily centralized software system overseeing the process. In this sense my work builds on so-called "volunteer computing" efforts, such as SETI@Home but with the goal of supporting a #11;different class of compute problems. I #12;first consider the characteristics that are necessary to form good clusters of shared machines that can be used together effectively. Second, I exploit simple clustering algorithms to group together appropriate machines using the identified#12;ed characteristics. My system assembles workstations into clusters which are, in some sense, "close" in terms of bandwidth, latency and/or number of network hops and that are also computationally similar in terms of processor speed, memory capacity and available hard disk space. Finally, I assess the conditions under which my proposed system might be effective via simulation using generated network topologies that are intended to reflect real-world characteristics. The results of these simulations suggest that my system is tunable to different conditions and that the algorithms presented can #11;effectively group together appropriate machines to form clusters and can also manage those clusters #11;effectively as the constituent machines join and leave the system. Clusters Peer-to-Peer parallel computing
44	Dynamic peer-to-peer construction of clusters Kadaru, Pranith Reddy 13 January 2010 (has links) The use of parallel computing is increasing with the need to solve ever more complex problems. Unfortunately, while the cost of parallel systems (including clusters and small-scale shared memory machines) has decreased, such machines are still not within the reach of many users. This is particularly true if large numbers of processors are needed. A largely untapped resource for doing some, simpler, types of parallel computing are temporarily idle machines in distributed environments. Such environments range from the simple (identical machines connected via a LAN) to the complex (heterogeneous machines connected via the Internet). In this thesis I describe a system for dynamically clustering together similar machines distributed across the Internet. This is done in a peer-to-peer (P2P) fashion with the goal of ultimately forming useful compute clusters without the need for a heavily centralized software system overseeing the process. In this sense my work builds on so-called "volunteer computing" efforts, such as SETI@Home but with the goal of supporting a #11;different class of compute problems. I #12;first consider the characteristics that are necessary to form good clusters of shared machines that can be used together effectively. Second, I exploit simple clustering algorithms to group together appropriate machines using the identified#12;ed characteristics. My system assembles workstations into clusters which are, in some sense, "close" in terms of bandwidth, latency and/or number of network hops and that are also computationally similar in terms of processor speed, memory capacity and available hard disk space. Finally, I assess the conditions under which my proposed system might be effective via simulation using generated network topologies that are intended to reflect real-world characteristics. The results of these simulations suggest that my system is tunable to different conditions and that the algorithms presented can #11;effectively group together appropriate machines to form clusters and can also manage those clusters #11;effectively as the constituent machines join and leave the system. Clusters Peer-to-Peer parallel computing
45	Parallel training algorithms for analogue hardware neural nets Zhang, Liang January 2007 (has links) Feedforward neural networks are massively parallel computing structures that have the capability of universal function approximation. The most prevalent realisation of neural nets is in the form of an algorithm implemented in a computer program. Neural networks as computer programs lose the inher- ent parallism. Parallism can only be recovered by executing the program on an expensive parallel digital computer. Achievement of the inherent massive parallelism at a lower cost requires direct hardware realisation of the neural net. Such hardware has been developed jointly by QUT and the Heinz Nixdorf Institute (Germany) called the Local Cluster Neural Network (LCNN) chip. But this neural net chip lacks the capability of in-circuit learning or on-chip training. The weights for the analogue LCNN network have to be computed o® chip on a digital computer. Based on the previous work, this research focuses on the Local Cluster Neu- ral Network and its analogue chip. The characteristic of the LCNN chip was measured exhaustively and its behaviours were compared to the theoretical functionality of the LCNN. To overcome the manufacturing °uctuations and deviations presented in analogue circuits, we used chip-in-the-loop strategy for training of the LCNN chip. A new training algorithm: Probabilistic Random Weight Change for the chip-in-the-loop training for function approximation. In order to implement the LCNN analogue chip with on-chip training, two training algorithms are studied in on-line training mode in simulations: the Probabilistic Random Weight Change (PRWC) algorithm and the modified Gradient Descent (GD) algorithm. The circuits design for the PRWC on-chip training and the GD on-chip training are outlined. These two methods are compared for their training performance and the complexity of their circuits. This research provides the foundation for the next version of LCNN analogue hardware implementation. neural networks parallism parallel computing LCNN
46	A HIGH PERFORMANCE GIBBS-SAMPLING ALGORITHM FOR ITEM RESPONSE THEORY MODELS Patsias, Kyriakos 01 January 2009 (has links) Item response theory (IRT) is a newer and improved theory compared to the classical measurement theory. The fully Bayesian approach shows promise for IRT models. However, it is computationally expensive, and therefore is limited in various applications. It is important to seek ways to reduce the execution time and a suitable solution is the use of high performance computing (HPC). HPC offers considerably high computational power and can handle applications with high computation and memory requirements. In this work, we have modified the existing fully Bayesian algorithm for 2PNO IRT models so that it can be run on a high performance parallel machine. With this parallel version of the algorithm, the empirical results show that a speedup was achieved and the execution time was reduced considerably. Gibbs sampling IRT MPI parallel computing
47	A PARALLEL IMPLEMENTATION OF GIBBS SAMPLING ALGORITHM FOR 2PNO IRT MODELS Rahimi, Mona 01 August 2011 (has links) Item response theory (IRT) is a newer and improved theory compared to the classical measurement theory. The fully Bayesian approach shows promise for IRT models. However, it is computationally expensive, and therefore is limited in various applications. It is important to seek ways to reduce the execution time and a suitable solution is the use of high performance computing (HPC). HPC offers considerably high computational power and can handle applications with high computation and memory requirements. In this work, we have applied two different parallelism methods to the existing fully Bayesian algorithm for 2PNO IRT models so that it can be run on a high performance parallel machine with less communication load. With our parallel version of the algorithm, the empirical results show that a speedup was achieved and the execution time was considerably reduced. 2PNO IRT Models Gibbs Sampling Parallel computing
48	Algoritmos de alinhamento múltiplo e técnicas de otimização para esses algoritmos utilizando Ant Colony / Zafalon, Geraldo Francisco Donega. January 2009 (has links) Orientador: José Márcio Machado / Banca: Liria Matsumoto Sato / Banca: Renata Spolon Lobato / Resumo: A biologia, como uma ciência bastante desenvolvida, foi dividida em diversas areas, dentre elas, a genética. Esta area passou a crescer em importância nos ultimos cinquenta anos devido aos in umeros benefícios que ela pode trazer, principalmente, aos seres humanos. Como a gen etica passou a apresentar problemas com grande complexidade de resolução estratégias computacionais foram agregadas a ela, surgindo assim a bioinform atica. A bioinformática desenvolveu-se de forma bastante signi cativa nos ultimos anos e esse desenvolvimento vem se acentuando a cada dia, devido ao aumento da complexidade dos problemas genômicos propostos pelos biólogos. Assim, os cientistas da computação têm se empenhado no desenvolvimento de novas técnicas computacionais para os biólogos, principalmente no que diz respeito as estrat egias para alinhamentos m ultiplos de sequências. Quando as sequências estão alinhadas, os biólogos podem realizar mais inferências sobre elas, principalmente no reconhecimento de padrões que e uma outra area interessante da bioinformática. Atrav es do reconhecimento de padrãoes, os bi ologos podem identicar pontos de alta signi cância (hot spots) entre as sequências e, consequentemente, pesquisar curas para doençass, melhoramentos genéticos na agricultura, entre outras possibilidades. Este trabalho traz o desenvolvimento e a comparação entre duas técnicas computacionais para o alinhamento m ultiplo de sequências. Uma e baseada na técnica de alinhamento múltiplo de sequências progressivas pura e a outra, e uma técnica de alinhamento múltiplo de sequências otimizada a partir da heurística de colônia de formigas. Ambas as técnicas adotam em algumas de suas fases estratégias de paralelismo, focando na redu c~ao do tempo de execução dos algoritmos. Os testes de desempenho e qualidade dos alinhamentos que foram conduzidos com as duas estrat egias... (Resumo completo, clicar acesso eletrônico abaixo) / Abstract: Biology as an enough developed science was divided in some areas, and genetics is one of them. This area has improved its relevance in last fty years due to the several bene ts that it can mainly bring to the humans. As genetics starts to show problems with hard resolution complexity, computational strategies were aggregated to it, leading to the start of the bioinformatics. The bioinformatics has been developed in a signi cant way in the last years and this development is accentuating everyday due to the increase of the complexity of the genomic problems proposed by biologists. Thus, the computer scientists have committed in the development of new computational techniques to the biologists, mainly related to the strategies to multiple sequence alignments. When the sequences are aligned, the biologists can do more inferences about them mainly in the pattern recognition that is another interesting area of the bioinformatics. Through the pattern recognition, the biologists can nd hot spots among the sequences and consequently contribute for the cure of diseases, genetics improvements in the agriculture and many other possibilities. This work brings the development and the comparison between two computational techniques for the multiple sequence alignments. One is based on the pure progressive multiple sequence alignment technique and the other one is an optimized multiple sequence alignment technique based on the ant colony heuristics. Both techniques take on some of its stages of parallel strategies, focusing on reducing the execution time of algorithms. Performance and quality tests of the alignments were conducted with both strategies and showed that the optimized approach presents better results when it is compared with the pure progressive approach. Biology as an enough developed science was divided in some areas, and genetics is one of them. This area has improved... (Complete abstract click electronic access below) / Mestre Processamento paralelo (Computadores) Parallel computing. eng
49	Hardware Architecture Impact on Manycore Programming Model Stubbfält, Erik January 2021 (has links) This work investigates how certain processor architectures can affectthe implementation and performance of a parallel programming model.The Ericsson Many-Core Architecture (EMCA) is compared and contrastedto general-purpose multicore processors, highlighting differencesin their memory systems and processor cores. A proof-of-conceptimplementation of the Concurrency Building Blocks (CBB) programmingmodel is developed for x86-64 using MPI. Benchmark tests showhow CBB on EMCA handles compute-intensive and memory-intensivescenarios, compared to a high-end x86-64 machine running the proofof-concept implementation. EMCA shows its strengths in heavy computationswhile x86-64 performs at its best with high degrees of datareuse. Both systems are able to utilize locality in their memory systemsto achieve great performance benefits. Computer Architecture Parallel Computing Computer Engineering Datorteknik
50	Large-Scale Dynamic Optimization Under Uncertainty using Parallel Computing Washington, Ian D. January 2016 (has links) This research focuses on the development of a solution strategy for the optimization of large-scale dynamic systems under uncertainty. Uncertainty resides naturally within the external forces posed to the system or from within the system itself. For example, in chemical process systems, external inputs include flow rates, temperatures or compositions; while internal sources include kinetic or mass transport parameters; and empirical parameters used within thermodynamic correlations and expressions. The goal in devising a dynamic optimization approach which explicitly accounts for uncertainty is to do so in a manner which is computationally tractable and is general enough to handle various types and sources of uncertainty. The approach developed in this thesis follows a so-called multiperiod technique whereby the infinite dimensional uncertainty space is discretized at numerous points (known as periods or scenarios) which creates different possible realizations of the uncertain parameters. The resulting optimization formulation encompasses an approximated expected value of a chosen objective functional subject to a dynamic model for all the generated realizations of the uncertain parameters. The dynamic model can be solved, using an appropriate numerical method, in an embedded manner for which the solution is used to construct the optimization formulation constraints; or alternatively the model could be completely discretized over the temporal domain and posed directly as part of the optimization formulation. Our approach in this thesis has mainly focused on the embedded model technique for dynamic optimization which can either follow a single- or multiple-shooting solution method. The first contribution of the thesis investigates a combined multiperiod multiple-shooting dynamic optimization approach for the design of dynamic systems using ordinary differential equation (ODE) or differential-algebraic equation (DAE) process models. A major aspect of this approach is the analysis of the parallel solution of the embedded model within the optimization formulation. As part of this analysis, we further consider the application of the dynamic optimization approach to several design and operation applications. Another vmajor contribution of the thesis is the development of a nonlinear programming (NLP) solver based on an approach that combines sequential quadratic programming (SQP) with an interior-point method (IPM) for the quadratic programming subproblem. A unique aspect of the approach is that the inherent structure (and parallelism) of the multiperiod formulation is exploited at the linear algebra level within the SQP-IPM nonlinear programming algorithm using an explicit Schur-complement decomposition. Our NLP solution approach is further assessed using several static and dynamic optimization benchmark examples. / Thesis / Doctor of Philosophy (PhD)

Search results