Global ETD Search

31	Branch-level scheduling in Aurora : the Dharma scheduler Sindaha, Raed Yousef Saba January 1995 (has links) No description available. 005
32	Efficient scheduling of parallel applications on workstation clusters Dantas, Mario A. R. January 1996 (has links) No description available. 005
33	FADI : a fault-tolerant environment for distributed processing systems Osman, Taha Mohammed January 1998 (has links) No description available. 005 Parallel computing; Error detection; PVM
34	Practical structured parallelism using BMF Crooke, David January 1998 (has links) This thesis concerns the use of the Bird-Meertens Formalism as a mechanism to control parallelism in an imperative programming language. One of the main reasons for the failure of parallelism to enter mainstream computing is the difficulty of developing software and the lack of the portability and performance predictability enjoyed by sequential systems. A key objective should be to minimise costs by abstracting much of the complexity away from the programmer. Criteria for a suitable parallel programming paradigm to meet this goal are defined. The Bird-Meertens Formalism, which has in the past been shown to be a suitable vehicle for expressing parallel algorithms, is used as the basis for a proposed imperative parallel programming paradigm which meets these criteria. A programming language is proposed which is an example of this paradigm, based on the BMF Theory of Lists and the sequential language C. A concurrent operational semantics is outlined, with the emphasis on its use as a practical tool for increasing confidence in program correctness, rather than on full and rigorous formality. A prototype implementation of a subset of this language for a distributed memory, massively parallel computer is produced in the form of a C subroutine library. Although not offering realistic absolute performance, it permits measurements of scalability and relative performance to be undertaken. A case study is undertaken which implements a simple but realistic algorithm in the language, and considers how well the criteria outlined at the start of the project are met. The prototype library implementation is used for performance measurements. A range of further possibilities is examined, in particular ways in which the paradigm language may be extended, and the possibility of using alternative BMF-like type theories. 
Pragmatic considerations for achieving performance in a production implementation are discussed. 004.01
35	Computational Structure of the N-body Problem Katzenelson, Jacob 01 April 1988 (has links) This work considers the organization and performance of computations on parallel computers of tree algorithms for the N-body problem where the number of particles is on the order of a million. The N-body problem is formulated as a set of recursive equations based on a few elementary functions, which leads to a computational structure in the form of a pyramid-like graph, where each vertex is a process, and each arc a communication link. The pyramid is mapped to three different processor configurations: (1) A pyramid of processors corresponding to the processes pyramid graph; (2) A hypercube of processors, e.g., a connection-machine like architecture; (3) A rather small array, e.g., $2 \times 2 \times 2$, of processors faster than the ones considered in (1) and (2) above. Simulations of this size can be performed on any of the three architectures in reasonable time. N-body problem parallel computing particle simulation
36	A Phase Based Dense Stereo Algorithm Implemented in CUDA Macomber, Brent David 2011 May 1900 (has links) Stereo imaging is routinely used in Simultaneous Localization and Mapping (SLAM) systems for the navigation and control of autonomous spacecraft proximity operations, advanced robotics, and robotic mapping and surveying applications. A key step (and generally the most computationally expensive step) in the generation of high fidelity geometric environment models from image data is the solution of the dense stereo correspondence problem. A novel method for solving the stereo correspondence problem to sub-pixel accuracy in the Fourier frequency domain by exploiting the Convolution Theorem is developed. The method is tailored to challenging aerospace applications by incorporation of correction factors for common error sources. Error-checking metrics verify correspondence matches to ensure high quality depth reconstructions are generated. The effect of geometric foreshortening caused by the baseline displacement of the cameras is modeled and corrected, drastically improving correspondence matching on highly off-normal surfaces. A metric for quantifying the strength of correspondence matches is developed and implemented to recognize and reject weak correspondences, and a separate cross-check verification provides a final defense against erroneous matches. The core components of this phase based dense stereo algorithm are implemented and optimized in the Compute Unified Device Architecture (CUDA) parallel computation environment onboard an NVIDIA Graphics Processing Unit (GPU). Accurate dense stereo correspondence matching is performed on stereo image pairs at a rate of nearly 10Hz. Dense Stereo Parallel Computing CUDA SLAM
37	Improving the throughput of novel cluster computing systems Wu, Jiadong 21 September 2015 (has links) Traditional cluster computing systems such as supercomputers are equipped with specially designed high-performance hardware, which escalates the manufacturing cost and the energy cost of those systems. Due to such drawbacks and the diversified demand in computation, two new types of clusters have been developed: GPU clusters and Hadoop clusters. The GPU cluster combines a traditional CPU-only computing cluster with general purpose GPUs to accelerate applications. Thanks to the massively parallel architecture of the GPU, this type of system can deliver much higher performance-per-watt than traditional computing clusters. The Hadoop cluster is another popular type of cluster computing system. It uses inexpensive off-the-shelf components and standard Ethernet to minimize manufacturing cost. Hadoop systems are widely used throughout the industry. Along with the lowered cost, these new systems also bring their own unique challenges. According to our study, GPU clusters are prone to severe under-utilization due to the heterogeneous nature of their computation resources, and Hadoop clusters are vulnerable to network congestion due to their limited network resources. In this research, we try to improve the throughput of these novel cluster computing systems by increasing the workload parallelism and network I/O parallelism. GPU Hadoop Computer cluster Parallel computing
38	Improving the efficiency of dynamic traffic assignment through computational methods based on combinatorial algorithm Nezamuddin 12 October 2011 (has links) Transportation planning and operation requires determining the state of the transportation system under different network supply and demand conditions. The most fundamental determinant of the state of a transportation system is time-varying traffic flow pattern on its roadway segments. It forms a basis for numerous engineering analyses which are used in operational- and planning-level decision-making process. Dynamic traffic assignment (DTA) models are the leading modeling tools employed to determine time-varying traffic flow pattern under changing network conditions. DTA models have matured over the past three decades, and are now being adopted by transportation planning agencies and traffic management centers. However, DTA models for large-scale regional networks require excessive computational resources. The problem becomes further compounded for other applications such as congestion pricing, capacity calibration, and network design for which DTA needs to be solved repeatedly as a sub-problem. This dissertation aims to improve the efficiency of the DTA models, and increase their viability for various planning and operational applications. To this end, a suite of computational methods based on the combinatorial approach for dynamic traffic assignment was developed in this dissertation. At first, a new polynomial run time combinatorial algorithm for DTA was developed. The combinatorial DTA (CDTA) model complements and aids simulation-based DTA models rather than replace them. This is because various policy measures and active traffic control strategies are best modeled using the simulation-based DTA models. 
The solution obtained from the CDTA model was provided as an initial feasible solution to a simulation-based DTA model to improve its efficiency – this process is called “warm starting” the simulation-based DTA model. To further improve the efficiency of the simulation-based DTA model, the warm start process is made more efficient through parallel computing. Parallel computing was applied to the CDTA model and the traffic simulator used for warm starting. Finally, another warm start method based on the static traffic assignment model was tested on the simulation-based DTA model. The computational methods developed in this dissertation were tested on the Anaheim, CA and Winnipeg, Canada networks. Models warm-started using the CDTA solution performed better than the purely simulation-based DTA models in terms of equilibrium convergence metrics and run time. Warm start methods using solutions from the static traffic assignment models showed similar improvements. Parallel computing was applied to the CDTA model, and it resulted in faster execution time by employing multiple computer processors. A parallel version of the traffic simulator can also be embedded into the simulation-assignment framework of the simulation-based DTA models and improve their efficiency. Dynamic traffic assignment Parallel computing Combinatorial algorithm
39	Dynamic peer-to-peer construction of clusters Kadaru, Pranith Reddy 13 January 2010 (has links) The use of parallel computing is increasing with the need to solve ever more complex problems. Unfortunately, while the cost of parallel systems (including clusters and small-scale shared memory machines) has decreased, such machines are still not within the reach of many users. This is particularly true if large numbers of processors are needed. A largely untapped resource for doing some, simpler, types of parallel computing are temporarily idle machines in distributed environments. Such environments range from the simple (identical machines connected via a LAN) to the complex (heterogeneous machines connected via the Internet). In this thesis I describe a system for dynamically clustering together similar machines distributed across the Internet. This is done in a peer-to-peer (P2P) fashion with the goal of ultimately forming useful compute clusters without the need for a heavily centralized software system overseeing the process. In this sense my work builds on so-called "volunteer computing" efforts, such as SETI@Home but with the goal of supporting a different class of compute problems. I first consider the characteristics that are necessary to form good clusters of shared machines that can be used together effectively. Second, I exploit simple clustering algorithms to group together appropriate machines using the identified characteristics. My system assembles workstations into clusters which are, in some sense, "close" in terms of bandwidth, latency and/or number of network hops and that are also computationally similar in terms of processor speed, memory capacity and available hard disk space. Finally, I assess the conditions under which my proposed system might be effective via simulation using generated network topologies that are intended to reflect real-world characteristics. 
The results of these simulations suggest that my system is tunable to different conditions and that the algorithms presented can effectively group together appropriate machines to form clusters and can also manage those clusters effectively as the constituent machines join and leave the system. Clusters Peer-to-Peer parallel computing

Search results