41

Improving the throughput of novel cluster computing systems

Wu, Jiadong 21 September 2015
Traditional cluster computing systems such as supercomputers are equipped with specially designed high-performance hardware, which raises both the manufacturing cost and the energy cost of those systems. Because of these drawbacks and the diversified demand for computation, two new types of clusters have been developed: GPU clusters and Hadoop clusters. A GPU cluster combines a traditional CPU-only computing cluster with general-purpose GPUs to accelerate applications. Thanks to the massively parallel architecture of the GPU, this type of system can deliver much higher performance per watt than traditional computing clusters. The Hadoop cluster is another popular type of cluster computing system. It uses inexpensive off-the-shelf components and standard Ethernet to minimize manufacturing cost. Hadoop systems are widely used throughout the industry. Along with the lower cost, these new systems also bring their own challenges. According to our study, GPU clusters are prone to severe under-utilization due to the heterogeneous nature of their computation resources, and Hadoop clusters are vulnerable to network congestion due to their limited network resources. In this research, we try to improve the throughput of these novel cluster computing systems by increasing workload parallelism and network I/O parallelism.
42

Improving the efficiency of dynamic traffic assignment through computational methods based on combinatorial algorithm

Nezamuddin 12 October 2011
Transportation planning and operation require determining the state of the transportation system under different network supply and demand conditions. The most fundamental determinant of the state of a transportation system is the time-varying traffic flow pattern on its roadway segments. It forms the basis for numerous engineering analyses used in operational- and planning-level decision-making. Dynamic traffic assignment (DTA) models are the leading modeling tools employed to determine time-varying traffic flow patterns under changing network conditions. DTA models have matured over the past three decades and are now being adopted by transportation planning agencies and traffic management centers. However, DTA models for large-scale regional networks require excessive computational resources. The problem is compounded in applications such as congestion pricing, capacity calibration, and network design, for which DTA needs to be solved repeatedly as a sub-problem. This dissertation aims to improve the efficiency of DTA models and increase their viability for various planning and operational applications. To this end, a suite of computational methods based on the combinatorial approach to dynamic traffic assignment was developed. First, a new polynomial-run-time combinatorial algorithm for DTA was developed. The combinatorial DTA (CDTA) model complements and aids simulation-based DTA models rather than replacing them, because various policy measures and active traffic control strategies are best modeled using simulation-based DTA models. The solution obtained from the CDTA model was provided as an initial feasible solution to a simulation-based DTA model to improve its efficiency; this process is called “warm starting” the simulation-based DTA model. To further improve the efficiency of the simulation-based DTA model, the warm start process was made more efficient through parallel computing.
Parallel computing was applied to the CDTA model and the traffic simulator used for warm starting. Finally, another warm start method based on the static traffic assignment model was tested on the simulation-based DTA model. The computational methods developed in this dissertation were tested on the Anaheim, CA and Winnipeg, Canada networks. Models warm-started using the CDTA solution performed better than the purely simulation-based DTA models in terms of equilibrium convergence metrics and run time. Warm start methods using solutions from the static traffic assignment models showed similar improvements. Parallelizing the CDTA model reduced execution time by employing multiple computer processors. A parallel version of the traffic simulator can also be embedded into the simulation-assignment framework of simulation-based DTA models to improve their efficiency. / text
43

Dynamic peer-to-peer construction of clusters

Kadaru, Pranith Reddy 13 January 2010
The use of parallel computing is increasing with the need to solve ever more complex problems. Unfortunately, while the cost of parallel systems (including clusters and small-scale shared memory machines) has decreased, such machines are still not within the reach of many users. This is particularly true if large numbers of processors are needed. A largely untapped resource for some simpler types of parallel computing is temporarily idle machines in distributed environments. Such environments range from the simple (identical machines connected via a LAN) to the complex (heterogeneous machines connected via the Internet). In this thesis I describe a system for dynamically clustering together similar machines distributed across the Internet. This is done in a peer-to-peer (P2P) fashion with the goal of ultimately forming useful compute clusters without the need for a heavily centralized software system overseeing the process. In this sense my work builds on so-called "volunteer computing" efforts, such as SETI@Home, but with the goal of supporting a different class of compute problems. I first consider the characteristics that are necessary to form good clusters of shared machines that can be used together effectively. Second, I exploit simple clustering algorithms to group together appropriate machines using the identified characteristics. My system assembles workstations into clusters which are, in some sense, "close" in terms of bandwidth, latency and/or number of network hops and that are also computationally similar in terms of processor speed, memory capacity and available hard disk space. Finally, I assess the conditions under which my proposed system might be effective via simulation using generated network topologies that are intended to reflect real-world characteristics.
The results of these simulations suggest that my system is tunable to different conditions and that the algorithms presented can effectively group together appropriate machines to form clusters and can also manage those clusters effectively as the constituent machines join and leave the system.
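The abstract does not say which clustering algorithms the thesis uses, so the following is only a minimal sketch of the general idea of grouping machines that are both computationally similar and close on the network. It uses a greedy "leader" clustering. The `Machine` fields, the scale factors, the threshold, and the use of a single latency figure measured against a common reference point are all illustrative assumptions, not details from the thesis.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Machine:
    name: str
    cpu_ghz: float
    mem_gb: float
    latency_ms: float  # assumption: round-trip latency to a common reference point


def distance(a, b, scales=(1.0, 8.0, 50.0)):
    """Scaled Euclidean distance; the scale factors are illustrative assumptions."""
    diffs = (
        (a.cpu_ghz - b.cpu_ghz) / scales[0],
        (a.mem_gb - b.mem_gb) / scales[1],
        (a.latency_ms - b.latency_ms) / scales[2],
    )
    return sqrt(sum(d * d for d in diffs))


def leader_cluster(machines, threshold=0.5):
    """Greedy leader clustering: join the first cluster whose leader is near enough."""
    clusters = []  # each cluster is a list whose first element is its leader
    for m in machines:
        for cluster in clusters:
            if distance(m, cluster[0]) <= threshold:
                cluster.append(m)
                break
        else:
            # No existing leader is close enough, so this machine starts a new cluster.
            clusters.append([m])
    return clusters


machines = [
    Machine("a", 3.0, 16, 10),
    Machine("b", 3.1, 16, 12),
    Machine("c", 1.5, 4, 200),
    Machine("d", 1.6, 4, 190),
]
groups = [[m.name for m in c] for c in leader_cluster(machines)]
```

A single pass like this is cheap enough to rerun as machines join and leave, but the result depends on arrival order; a real peer-to-peer system would also need pairwise network measurements rather than one latency number per machine.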
45

Parallel training algorithms for analogue hardware neural nets

Zhang, Liang January 2007
Feedforward neural networks are massively parallel computing structures that have the capability of universal function approximation. The most prevalent realisation of neural nets is in the form of an algorithm implemented in a computer program. Neural networks as computer programs lose the inherent parallelism. Parallelism can only be recovered by executing the program on an expensive parallel digital computer. Achieving the inherent massive parallelism at a lower cost requires direct hardware realisation of the neural net. Such hardware, called the Local Cluster Neural Network (LCNN) chip, has been developed jointly by QUT and the Heinz Nixdorf Institute (Germany). But this neural net chip lacks the capability of in-circuit learning or on-chip training. The weights for the analogue LCNN network have to be computed off chip on a digital computer. Building on this previous work, this research focuses on the Local Cluster Neural Network and its analogue chip. The characteristics of the LCNN chip were measured exhaustively and its behaviour was compared to the theoretical functionality of the LCNN. To overcome the manufacturing fluctuations and deviations present in analogue circuits, we used a chip-in-the-loop strategy for training the LCNN chip. A new training algorithm, Probabilistic Random Weight Change, is proposed for chip-in-the-loop training for function approximation. In order to implement the LCNN analogue chip with on-chip training, two training algorithms are studied in on-line training mode in simulations: the Probabilistic Random Weight Change (PRWC) algorithm and the modified Gradient Descent (GD) algorithm. The circuit designs for PRWC on-chip training and GD on-chip training are outlined. These two methods are compared for their training performance and the complexity of their circuits. This research provides the foundation for the next version of the LCNN analogue hardware implementation.
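The abstract does not describe how PRWC works, so the sketch below is not that algorithm. It shows a generic random-perturbation training loop in the spirit of random weight change: the "chip" is treated as a black box that only reports an error, a perturbation that lowers the error is kept and reapplied, and one that does not is discarded and redrawn. The function name, the step size, the quadratic stand-in for the chip's error, and the choice to discard unhelpful perturbations are all assumptions made for illustration.

```python
import random


def train_random_weight_change(loss, weights, step=0.01, iterations=2000, seed=0):
    """Gradient-free training: only evaluations of loss(weights) are required."""
    rng = random.Random(seed)
    weights = list(weights)
    best = loss(weights)
    # Every weight is nudged by +step or -step, chosen at random.
    delta = [rng.choice((-step, step)) for _ in weights]
    for _ in range(iterations):
        candidate = [w + d for w, d in zip(weights, delta)]
        err = loss(candidate)
        if err < best:
            # The perturbation helped: accept it and try the same direction again.
            weights, best = candidate, err
        else:
            # It did not help: keep the old weights and draw a new direction.
            delta = [rng.choice((-step, step)) for _ in weights]
    return weights, best


# Hypothetical stand-in for a measured chip error: squared distance to a target.
target = [0.5, -0.3]


def chip_error(w):
    return sum((wi - ti) ** 2 for wi, ti in zip(w, target))


initial_error = chip_error([0.0, 0.0])
trained, final_error = train_random_weight_change(chip_error, [0.0, 0.0])
```

Because the loop needs no gradients, the same code works when `loss` is a measurement taken from real hardware, which is what makes this family of methods attractive for chip-in-the-loop training.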
46

A HIGH PERFORMANCE GIBBS-SAMPLING ALGORITHM FOR ITEM RESPONSE THEORY MODELS

Patsias, Kyriakos 01 January 2009
Item response theory (IRT) is a newer and improved theory compared to the classical measurement theory. The fully Bayesian approach shows promise for IRT models. However, it is computationally expensive, and therefore is limited in various applications. It is important to seek ways to reduce the execution time and a suitable solution is the use of high performance computing (HPC). HPC offers considerably high computational power and can handle applications with high computation and memory requirements. In this work, we have modified the existing fully Bayesian algorithm for 2PNO IRT models so that it can be run on a high performance parallel machine. With this parallel version of the algorithm, the empirical results show that a speedup was achieved and the execution time was reduced considerably.
47

A PARALLEL IMPLEMENTATION OF GIBBS SAMPLING ALGORITHM FOR 2PNO IRT MODELS

Rahimi, Mona 01 August 2011
Item response theory (IRT) is a newer and improved theory compared to the classical measurement theory. The fully Bayesian approach shows promise for IRT models. However, it is computationally expensive, and therefore is limited in various applications. It is important to seek ways to reduce the execution time and a suitable solution is the use of high performance computing (HPC). HPC offers considerably high computational power and can handle applications with high computation and memory requirements. In this work, we have applied two different parallelism methods to the existing fully Bayesian algorithm for 2PNO IRT models so that it can be run on a high performance parallel machine with less communication load. With our parallel version of the algorithm, the empirical results show that a speedup was achieved and the execution time was considerably reduced.
48

Multiple alignment algorithms and optimization techniques for these algorithms using Ant Colony [Algoritmos de alinhamento múltiplo e técnicas de otimização para esses algoritmos utilizando Ant Colony]

Zafalon, Geraldo Francisco Donega. January 2009
Advisor: José Márcio Machado / Committee: Liria Matsumoto Sato / Committee: Renata Spolon Lobato / Abstract: Biology, as a well-developed science, has been divided into several areas, and genetics is one of them. This area has grown in importance over the last fifty years because of the many benefits it can bring, mainly to humans. As genetics began to pose problems of great complexity, computational strategies were added to it, giving rise to bioinformatics. Bioinformatics has developed significantly in recent years, and this development accelerates every day because of the increasing complexity of the genomic problems posed by biologists. Computer scientists have therefore committed themselves to developing new computational techniques for biologists, mainly strategies for multiple sequence alignment. When the sequences are aligned, biologists can make more inferences about them, mainly in pattern recognition, which is another interesting area of bioinformatics. Through pattern recognition, biologists can find points of high significance (hot spots) among the sequences and consequently contribute to the cure of diseases, genetic improvements in agriculture, and many other possibilities. This work presents the development and comparison of two computational techniques for multiple sequence alignment. One is based on the pure progressive multiple sequence alignment technique, and the other is a multiple sequence alignment technique optimized with the ant colony heuristic. Both techniques adopt parallel strategies in some of their stages, focusing on reducing the execution time of the algorithms. Performance and quality tests of the alignments were conducted with both strategies and showed that the optimized approach gives better results than the pure progressive approach.
(Complete abstract: click electronic access below) / Master's
49

Hardware Architecture Impact on Manycore Programming Model

Stubbfält, Erik January 2021
This work investigates how certain processor architectures can affect the implementation and performance of a parallel programming model. The Ericsson Many-Core Architecture (EMCA) is compared and contrasted to general-purpose multicore processors, highlighting differences in their memory systems and processor cores. A proof-of-concept implementation of the Concurrency Building Blocks (CBB) programming model is developed for x86-64 using MPI. Benchmark tests show how CBB on EMCA handles compute-intensive and memory-intensive scenarios, compared to a high-end x86-64 machine running the proof-of-concept implementation. EMCA shows its strengths in heavy computations while x86-64 performs at its best with high degrees of data reuse. Both systems are able to utilize locality in their memory systems to achieve great performance benefits.
50

GPU Based Large Scale Multi-Agent Crowd Simulation and Path Planning

Gusukuma, Luke 13 May 2015
Crowd simulation is used for many applications including (but not limited to) videogames, building planning, training simulators, and various virtual environment applications. In particular, crowd simulation is most useful when real-life exercises would be impractical, such as repeatedly evacuating a building, testing the crowd flow for various building blueprints, or placing law enforcers in actual crowd suppression circumstances. In our work, we approach the fidelity-versus-scalability problem of crowd simulation from two angles, a programmability angle and a scalability angle, by creating a new methodology that builds on a struct-of-arrays approach and transforms it into an Object Oriented Struct of Arrays approach. While the design pattern itself is applied to crowd simulation in our work, crowd simulation exemplifies the variety of applications for which the design pattern can be used. / Master of Science
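The abstract names the pattern but does not show it, so the following is a rough sketch of what an object-oriented layer over struct-of-arrays storage can look like, not the thesis's actual design. Each attribute lives in its own contiguous array, and a small view object gives per-agent, object-style access by index. The class names `Crowd` and `AgentView` and the choice of attributes are hypothetical, and plain Python stands in for GPU code.

```python
from array import array


class Crowd:
    """Struct-of-arrays storage: one contiguous array per agent attribute."""

    def __init__(self):
        self.x = array("d")
        self.y = array("d")
        self.vx = array("d")
        self.vy = array("d")

    def add(self, x, y, vx, vy):
        self.x.append(x)
        self.y.append(y)
        self.vx.append(vx)
        self.vy.append(vy)
        return AgentView(self, len(self.x) - 1)

    def step(self, dt):
        # Bulk update walks each column in order, which is the access pattern
        # that a struct-of-arrays layout is meant to favour.
        for i in range(len(self.x)):
            self.x[i] += self.vx[i] * dt
            self.y[i] += self.vy[i] * dt


class AgentView:
    """Object-style handle onto one index of the shared arrays; holds no data itself."""

    __slots__ = ("_crowd", "_i")

    def __init__(self, crowd, i):
        self._crowd = crowd
        self._i = i

    @property
    def position(self):
        return (self._crowd.x[self._i], self._crowd.y[self._i])


crowd = Crowd()
agent = crowd.add(0.0, 0.0, 1.0, 2.0)
crowd.step(0.5)
```

The view keeps call sites readable (`agent.position`) while the data stays in columns. On a GPU, the column layout lets neighbouring threads read neighbouring memory addresses, which is the usual motivation for choosing struct of arrays over array of structs.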
