41 
Improving the throughput of novel cluster computing systems. Wu, Jiadong. 21 September 2015 (has links)
Traditional cluster computing systems such as supercomputers are equipped with specially designed high-performance hardware, which escalates both the manufacturing cost and the energy cost of those systems. Due to such drawbacks and the diversified demand for computation, two new types of clusters have been developed: GPU clusters and Hadoop clusters.
The GPU cluster combines a traditional CPU-only computing cluster with general-purpose GPUs to accelerate applications. Thanks to the massively parallel architecture of the GPU, this type of system can deliver much higher performance per watt than traditional computing clusters. The Hadoop cluster is another popular type of cluster computing system. It uses inexpensive off-the-shelf components and standard Ethernet to minimize manufacturing cost. Hadoop systems are widely used throughout the industry.
Alongside the lowered cost, these new systems also bring their own unique challenges. According to our study, GPU clusters are prone to severe underutilization due to the heterogeneous nature of their computation resources, and Hadoop clusters are vulnerable to network congestion due to their limited network resources. In this research, we try to improve the throughput of these novel cluster computing systems by increasing workload parallelism and network I/O parallelism.
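The network I/O parallelism idea can be sketched with a toy example. This is illustrative only: `fetch_block` is a hypothetical stand-in for a network read (such as an HDFS block fetch), not code from the thesis; the point is that keeping several reads in flight hides per-request latency.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch_block(block_id: int) -> int:
    # Hypothetical stand-in for a network read (e.g. an HDFS block fetch).
    time.sleep(0.1)
    return block_id * 2

def serial_fetch(blocks):
    # One request at a time: total latency is the sum of per-request latencies.
    return [fetch_block(b) for b in blocks]

def parallel_fetch(blocks, workers=4):
    # Keeping several reads in flight raises network I/O parallelism.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch_block, blocks))

start = time.perf_counter()
result = parallel_fetch(range(4))
elapsed = time.perf_counter() - start
print(f"fetched {result} in {elapsed:.2f}s")  # roughly 0.1s rather than ~0.4s
```

The same overlap applies whether the latency comes from the network or from a co-processor transfer; threads suffice here because the workers spend their time blocked on I/O.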

42 
Improving the efficiency of dynamic traffic assignment through computational methods based on combinatorial algorithm. Nezamuddin. 12 October 2011 (has links)
Transportation planning and operation require determining the state of the transportation system under different network supply and demand conditions. The most fundamental determinant of the state of a transportation system is the time-varying traffic flow pattern on its roadway segments. It forms the basis for numerous engineering analyses used in operational and planning-level decision-making. Dynamic traffic assignment (DTA) models are the leading modeling tools for determining time-varying traffic flow patterns under changing network conditions. DTA models have matured over the past three decades and are now being adopted by transportation planning agencies and traffic management centers. However, DTA models for large-scale regional networks require excessive computational resources. The problem is further compounded for applications such as congestion pricing, capacity calibration, and network design, for which DTA must be solved repeatedly as a subproblem. This dissertation aims to improve the efficiency of DTA models and increase their viability for various planning and operational applications.
To this end, a suite of computational methods based on a combinatorial approach to dynamic traffic assignment was developed in this dissertation. First, a new polynomial-runtime combinatorial algorithm for DTA was developed. The combinatorial DTA (CDTA) model complements and aids simulation-based DTA models rather than replacing them, because various policy measures and active traffic control strategies are best modeled with simulation-based DTA models. The solution obtained from the CDTA model was provided as an initial feasible solution to a simulation-based DTA model to improve its efficiency; this process is called "warm starting" the simulation-based DTA model. To further improve the efficiency of the simulation-based DTA model, the warm-start process was made more efficient through parallel computing, applied both to the CDTA model and to the traffic simulator used for warm starting. Finally, another warm-start method, based on the static traffic assignment model, was tested on the simulation-based DTA model.
The computational methods developed in this dissertation were tested on the Anaheim, CA and Winnipeg, Canada networks. Models warm-started using the CDTA solution performed better than purely simulation-based DTA models in terms of equilibrium convergence metrics and run time. Warm-start methods using solutions from static traffic assignment models showed similar improvements. Parallel computing applied to the CDTA model yielded faster execution times by employing multiple processors. The parallel version of the traffic simulator can also be embedded into the simulation-assignment framework of simulation-based DTA models to improve their efficiency. / text
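The warm-start idea can be illustrated on a toy fixed-point problem (not the thesis's DTA model): an iterative solver started from an approximate solution supplied by a cheaper method reaches the same answer in fewer iterations than one started cold. All numbers here are illustrative.

```python
import math

def fixed_point(start: float, tol: float = 1e-10, max_iter: int = 10_000):
    """Solve x = cos(x) by fixed-point iteration, counting iterations."""
    x, iters = start, 0
    while abs(math.cos(x) - x) > tol and iters < max_iter:
        x = math.cos(x)
        iters += 1
    return x, iters

cold, cold_iters = fixed_point(0.0)    # cold start: arbitrary initial point
warm, warm_iters = fixed_point(0.739)  # warm start: near the known solution
assert abs(cold - warm) < 1e-8         # both converge to the same equilibrium
assert warm_iters < cold_iters         # the warm start needs fewer iterations
print(cold_iters, warm_iters)
```

The analogy: the CDTA solution plays the role of the cheap approximate starting point, and the simulation-based DTA model plays the role of the expensive iteration whose cost per step dominates total run time.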

43 
Dynamic peer-to-peer construction of clusters. Kadaru, Pranith Reddy. 13 January 2010 (has links)
The use of parallel computing is increasing with the need to solve ever more complex problems. Unfortunately, while the cost of parallel systems (including clusters and small-scale shared-memory machines) has decreased, such machines are still not within the reach of many users, particularly when large numbers of processors are needed. A largely untapped resource for some simpler types of parallel computing is the pool of temporarily idle machines in distributed environments. Such environments range from the simple (identical machines connected via a LAN) to the complex (heterogeneous machines connected via the Internet).
In this thesis I describe a system for dynamically clustering together similar machines distributed across the Internet. This is done in a peer-to-peer (P2P) fashion, with the goal of ultimately forming useful compute clusters without the need for a heavily centralized software system overseeing the process. In this sense my work builds on so-called "volunteer computing" efforts, such as SETI@home, but with the goal of supporting a different class of compute problems.
I first consider the characteristics that are necessary to form good clusters of shared machines that can be used together effectively. Second, I exploit simple clustering algorithms to group together appropriate machines using the identified characteristics. My system assembles workstations into clusters which are, in some sense, "close" in terms of bandwidth, latency and/or number of network hops, and that are also computationally similar in terms of processor speed, memory capacity and available hard disk space. Finally, I assess the conditions under which my proposed system might be effective via simulation, using generated network topologies intended to reflect real-world characteristics. The results of these simulations suggest that my system is tunable to different conditions and that the algorithms presented can effectively group appropriate machines into clusters and manage those clusters effectively as the constituent machines join and leave the system.
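The grouping step can be sketched as follows. The attributes (latency to a reference point, CPU speed) and thresholds are illustrative stand-ins, not the thesis's actual metrics or clustering algorithm:

```python
from dataclasses import dataclass

@dataclass
class Machine:
    name: str
    latency_ms: float  # latency to a common reference point (illustrative)
    cpu_ghz: float     # stand-in for computational capability

def cluster(machines, max_latency_gap=5.0, max_cpu_gap=0.5):
    """Greedy one-pass grouping: join the first cluster whose representative
    is both network-close and computationally similar, else start a new one."""
    clusters = []
    for m in machines:
        for c in clusters:
            rep = c[0]  # compare against the cluster's first member
            if (abs(m.latency_ms - rep.latency_ms) <= max_latency_gap
                    and abs(m.cpu_ghz - rep.cpu_ghz) <= max_cpu_gap):
                c.append(m)
                break
        else:
            clusters.append([m])
    return clusters

machines = [Machine("a", 2, 3.0), Machine("b", 4, 3.2),
            Machine("c", 40, 1.6), Machine("d", 42, 1.8)]
groups = cluster(machines)
print([[m.name for m in g] for g in groups])  # [['a', 'b'], ['c', 'd']]
```

A real P2P version would compute these distances pairwise between peers rather than against a central reference, but the similarity test has the same shape.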


45 
Parallel training algorithms for analogue hardware neural nets. Zhang, Liang. January 2007 (has links)
Feedforward neural networks are massively parallel computing structures with the capability of universal function approximation. The most prevalent realisation of neural nets is as an algorithm implemented in a computer program. Neural networks as computer programs lose the inherent parallelism, which can only be recovered by executing the program on an expensive parallel digital computer. Achieving the inherent massive parallelism at a lower cost requires direct hardware realisation of the neural net. Such hardware, called the Local Cluster Neural Network (LCNN) chip, has been developed jointly by QUT and the Heinz Nixdorf Institute (Germany). But this neural net chip lacks the capability of in-circuit learning or on-chip training: the weights for the analogue LCNN network have to be computed off chip on a digital computer. Building on previous work, this research focuses on the Local Cluster Neural Network and its analogue chip. The characteristics of the LCNN chip were measured exhaustively and its behaviour compared to the theoretical functionality of the LCNN. To overcome the manufacturing fluctuations and deviations present in analogue circuits, we used a chip-in-the-loop strategy for training the LCNN chip, and developed a new training algorithm, Probabilistic Random Weight Change, for chip-in-the-loop function approximation. In order to implement the LCNN analogue chip with on-chip training, two training algorithms were studied in online training mode in simulations: the Probabilistic Random Weight Change (PRWC) algorithm and the modified Gradient Descent (GD) algorithm. The circuit designs for PRWC and GD on-chip training are outlined, and the two methods are compared for training performance and circuit complexity. This research provides the foundation for the next version of the LCNN analogue hardware implementation.
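A training loop in the spirit of random-weight-change methods can be sketched like this. The perturbation size, selection probability, and the toy linear model are all illustrative assumptions, not details taken from the thesis or the PRWC circuit:

```python
import random

def train_rwc(weights, loss, delta=0.05, p_change=0.5, steps=2000, seed=0):
    """Perturb a random subset of weights by ±delta; keep the change only if
    the measured loss drops. No gradients are needed, which is what makes
    this family of methods attractive for chip-in-the-loop training."""
    rng = random.Random(seed)
    best = loss(weights)
    for _ in range(steps):
        trial = [w + rng.choice((-delta, delta)) if rng.random() < p_change else w
                 for w in weights]
        trial_loss = loss(trial)
        if trial_loss < best:  # keep improvements, revert otherwise
            weights, best = trial, trial_loss
    return weights, best

# Fit y = 2x + 1 with a toy linear "network" y = w0*x + w1.
data = [(x, 2 * x + 1) for x in range(-5, 6)]
mse = lambda w: sum((w[0] * x + w[1] - y) ** 2 for x, y in data) / len(data)
w, err = train_rwc([0.0, 0.0], mse)
print(w, err)
```

On hardware, `loss` would be the error measured from the chip's actual analogue output, so the training automatically compensates for manufacturing fluctuations; that is the core appeal of chip-in-the-loop schemes.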

46 
A HIGH PERFORMANCE GIBBS SAMPLING ALGORITHM FOR ITEM RESPONSE THEORY MODELS. Patsias, Kyriakos. 01 January 2009 (has links)
Item response theory (IRT) is a newer and improved theory compared to classical measurement theory. The fully Bayesian approach shows promise for IRT models; however, it is computationally expensive, which limits its use in various applications. It is therefore important to seek ways to reduce the execution time, and a suitable solution is the use of high performance computing (HPC). HPC offers considerable computational power and can handle applications with high computation and memory requirements. In this work, we modified the existing fully Bayesian algorithm for 2PNO IRT models so that it can be run on a high performance parallel machine. With this parallel version of the algorithm, the empirical results show that a speedup was achieved and the execution time was reduced considerably.

47 
A PARALLEL IMPLEMENTATION OF GIBBS SAMPLING ALGORITHM FOR 2PNO IRT MODELS. Rahimi, Mona. 01 August 2011 (links)
Item response theory (IRT) is a newer and improved theory compared to classical measurement theory. The fully Bayesian approach shows promise for IRT models; however, it is computationally expensive, which limits its use in various applications. It is therefore important to seek ways to reduce the execution time, and a suitable solution is the use of high performance computing (HPC). HPC offers considerable computational power and can handle applications with high computation and memory requirements. In this work, we applied two different parallelism methods to the existing fully Bayesian algorithm for 2PNO IRT models so that it can be run on a high performance parallel machine with reduced communication load. With our parallel version of the algorithm, the empirical results show that a speedup was achieved and the execution time was considerably reduced.
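The structural reason such samplers parallelize can be sketched in a simplified form (this is not the thesis's 2PNO sampler): in many IRT Gibbs samplers, person abilities are conditionally independent given the current item parameters, so that update step can be farmed out to workers. The `draw_ability` body below is a hypothetical stand-in for the real conditional draw:

```python
import random
from concurrent.futures import ThreadPoolExecutor

def draw_ability(args):
    """Stand-in for sampling theta_i given item parameters and responses.
    A per-person seed keeps the parallel run reproducible."""
    person_id, item_params, seed = args
    rng = random.Random(seed)
    return person_id, rng.gauss(sum(item_params) / len(item_params), 1.0)

def gibbs_ability_step(n_persons, item_params, base_seed=0):
    # Conditional independence: every person's draw can proceed concurrently.
    tasks = [(i, item_params, base_seed + i) for i in range(n_persons)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        return dict(pool.map(draw_ability, tasks))

thetas = gibbs_ability_step(8, item_params=[0.5, 1.0, 1.5])
print(sorted(thetas))  # one draw per person: persons 0..7
```

The item-parameter update has the mirror-image structure (items conditionally independent given abilities), which is why alternating parallel sweeps map well onto a parallel machine, with communication needed only between the two half-steps.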

48 
Multiple alignment algorithms and optimization techniques for these algorithms using Ant Colony. Zafalon, Geraldo Francisco Donega. January 2009 (links)
Advisor: José Márcio Machado / Committee: Liria Matsumoto Sato / Committee: Renata Spolon Lobato / Abstract: Biology, as a highly developed science, was divided into several areas, genetics among them. This area has grown in importance over the last fifty years due to the many benefits it can bring, particularly to humans. As genetics began to present problems of high computational complexity, computational strategies were brought to bear on it, giving rise to bioinformatics. Bioinformatics has developed significantly in recent years, and this development accelerates every day as the genomic problems posed by biologists grow more complex. Computer scientists have thus committed to developing new computational techniques for biologists, mainly strategies for multiple sequence alignment. Once sequences are aligned, biologists can draw further inferences from them, especially in pattern recognition, another interesting area of bioinformatics. Through pattern recognition, biologists can find points of high significance (hot spots) among the sequences and, consequently, pursue cures for diseases, genetic improvements in agriculture, and other possibilities. This work presents the development and comparison of two computational techniques for multiple sequence alignment: one based on the pure progressive multiple sequence alignment technique, and the other an optimized multiple sequence alignment technique based on the ant colony heuristic. Both techniques adopt parallel strategies in some of their stages, focusing on reducing the execution time of the algorithms. Performance and alignment quality tests conducted with both strategies showed that the optimized approach presents better results when compared with the pure progressive approach. / (Complete abstract: click electronic access below) / Master's
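The ant colony heuristic itself can be sketched on a toy problem (not the thesis's alignment pipeline): ants repeatedly build candidate solutions, biased by pheromone trails that are reinforced in proportion to solution quality. The graph, parameters, and scale below are illustrative only:

```python
import random

# Edge costs between 4 nodes; ants build a path from node 0 to node 3.
cost = {(0, 1): 1, (0, 2): 5, (1, 3): 1, (2, 3): 1}
succ = {0: [1, 2], 1: [3], 2: [3]}

def aco_shortest_path(iterations=50, n_ants=10, evaporation=0.5, seed=0):
    rng = random.Random(seed)
    pheromone = {e: 1.0 for e in cost}
    best_path, best_cost = None, float("inf")
    for _ in range(iterations):
        paths = []
        for _ in range(n_ants):
            node, path = 0, []
            while node != 3:
                choices = [(node, n) for n in succ[node]]
                # Bias the walk by pheromone strength and edge desirability.
                weights = [pheromone[e] / cost[e] for e in choices]
                edge = rng.choices(choices, weights=weights)[0]
                path.append(edge)
                node = edge[1]
            c = sum(cost[e] for e in path)
            paths.append((path, c))
            if c < best_cost:
                best_path, best_cost = path, c
        # Evaporate old trails, then deposit pheromone inversely to cost.
        for e in pheromone:
            pheromone[e] *= evaporation
        for path, c in paths:
            for e in path:
                pheromone[e] += 1.0 / c
    return best_path, best_cost

path, c = aco_shortest_path()
print(path, c)
```

In the alignment setting the "path" would correspond to choices made while building the alignment, with alignment quality playing the role of path cost; the independent ants are also a natural unit for parallel execution.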

49 
Hardware Architecture Impact on Manycore Programming Model. Stubbfält, Erik. January 2021 (links)
This work investigates how certain processor architectures can affect the implementation and performance of a parallel programming model. The Ericsson ManyCore Architecture (EMCA) is compared and contrasted to general-purpose multicore processors, highlighting differences in their memory systems and processor cores. A proof-of-concept implementation of the Concurrency Building Blocks (CBB) programming model is developed for x86-64 using MPI. Benchmark tests show how CBB on EMCA handles compute-intensive and memory-intensive scenarios, compared to a high-end x86-64 machine running the proof-of-concept implementation. EMCA shows its strengths in heavy computations, while x86-64 performs at its best with high degrees of data reuse. Both systems are able to utilize locality in their memory systems to achieve great performance benefits.

50 
Large-Scale Dynamic Optimization Under Uncertainty using Parallel Computing. Washington, Ian D. January 2016 (links)
This research focuses on the development of a solution strategy for the optimization of large-scale dynamic systems under uncertainty. Uncertainty resides naturally within the external forces posed to the system or within the system itself. For example, in chemical process systems, external inputs include flow rates, temperatures or compositions, while internal sources include kinetic or mass transport parameters, and empirical parameters used within thermodynamic correlations and expressions. The goal in devising a dynamic optimization approach that explicitly accounts for uncertainty is to do so in a manner that is computationally tractable and general enough to handle various types and sources of uncertainty. The approach developed in this thesis follows a so-called multi-period technique, whereby the infinite-dimensional uncertainty space is discretized at numerous points (known as periods or scenarios), creating different possible realizations of the uncertain parameters. The resulting optimization formulation encompasses an approximated expected value of a chosen objective functional, subject to a dynamic model for all generated realizations of the uncertain parameters. The dynamic model can be solved, using an appropriate numerical method, in an embedded manner in which the solution is used to construct the optimization constraints; alternatively, the model can be completely discretized over the temporal domain and posed directly as part of the optimization formulation.
Our approach in this thesis has mainly focused on the embedded-model technique for dynamic optimization, which can follow either a single- or multiple-shooting solution method. The first contribution of the thesis investigates a combined multi-period multiple-shooting dynamic optimization approach for the design of dynamic systems using ordinary differential equation (ODE) or differential-algebraic equation (DAE) process models. A major aspect of this approach is the analysis of the parallel solution of the embedded model within the optimization formulation. As part of this analysis, we further consider the application of the dynamic optimization approach to several design and operation applications.
Another major contribution of the thesis is the development of a nonlinear programming (NLP) solver based on an approach that combines sequential quadratic programming (SQP) with an interior-point method (IPM) for the quadratic programming subproblem. A unique aspect of the approach is that the inherent structure (and parallelism) of the multi-period formulation is exploited at the linear algebra level within the SQP-IPM nonlinear programming algorithm using an explicit Schur-complement decomposition. Our NLP solution approach is further assessed using several static and dynamic optimization benchmark examples. / Thesis / Doctor of Philosophy (PhD)
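The multi-period approximation of the expected objective can be sketched on a toy model. Here the "embedded model solve" is the closed-form solution of dx/dt = -k·x with uncertain rate k, not the thesis's DAE process models; the scenario values are illustrative:

```python
import math
import statistics

def terminal_state(k: float, x0: float = 1.0, t_final: float = 1.0) -> float:
    # Embedded "model solve": closed-form solution of dx/dt = -k * x.
    # A real application would call a numerical ODE/DAE integrator here.
    return x0 * math.exp(-k * t_final)

def expected_objective(k_scenarios) -> float:
    # One model solve per scenario; the solves are mutually independent,
    # which is what makes the multi-period formulation parallelize well.
    return statistics.fmean(terminal_state(k) for k in k_scenarios)

scenarios = [0.8, 1.0, 1.2]  # discretized realizations of the uncertain k
obj = expected_objective(scenarios)
print(f"approximate E[x(t_f)] = {obj:.4f}")
```

An optimizer would wrap `expected_objective` as the function being minimized over design variables, and the per-scenario solves are the natural grain for distribution across processors, exactly the structure exploited at the linear algebra level by the Schur-complement decomposition described above.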
