About
The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
161

Performance Analysis Of Stacked Generalization

Ozay, Mete 01 September 2008 (has links) (PDF)
Stacked Generalization (SG) is an ensemble learning technique which aims to increase the performance of individual classifiers by combining them under a hierarchical architecture. This study consists of two major parts. In the first part, the performance of the Stacked Generalization technique is analyzed with respect to the performance of the individual classifiers and the content of the training data. In the second part, based on the findings of the first, a new class of algorithms, called Meta-Fuzzified Yield Value (Meta-FYV), is introduced. The first part introduces and verifies two hypotheses by a set of controlled experiments to assure the performance gain for SG. The learning mechanisms by which SG achieves high performance are explored, and the relationship between the performance of the individual classifiers and that of SG is investigated. It is shown that if the samples in the training set are correctly classified by at least one base-layer classifier, then the generalization performance of SG increases compared to the performance of the individual classifiers. The second hypothesis concerns the effect of spurious samples, which are not correctly labeled by any of the base-layer classifiers. In the second part of the thesis, six theorems are constructed based on the analysis of the feature spaces and the stacked generalization architecture. Based on these theorems and hypotheses, a new class of SG algorithms is proposed. The experiments are performed on both Corel data and synthetically generated data, using parallel programming techniques on a high-performance cluster.
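The layered architecture described above can be made concrete with a minimal sketch. Everything here is a hypothetical stand-in: the base classifiers are toy threshold rules and the meta-classifier is a simple majority vote, whereas in stacked generalization the meta-classifier is itself trained on the base layer's outputs.

```python
def stacked_predict(base_classifiers, meta_classifier, x):
    # Layer 0: each base classifier predicts independently.
    level0 = [clf(x) for clf in base_classifiers]
    # Layer 1: the meta-classifier combines the base-layer outputs.
    return meta_classifier(level0)

# Hypothetical base classifiers: threshold rules on a scalar feature.
base = [lambda x: int(x > 0.3), lambda x: int(x > 0.5), lambda x: int(x > 0.7)]
# Hypothetical meta-classifier: majority vote (in real SG this layer is trained).
meta = lambda preds: int(sum(preds) >= 2)

print(stacked_predict(base, meta, 0.6))  # base outputs [1, 1, 0] -> 1
```

Note how the first hypothesis above maps onto this structure: if at least one base classifier is right on a sample, a well-trained combining layer has the information it needs to recover the correct label.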
162

Parallel CLOSET+ Algorithm For Finding Frequent Closed Itemsets

Sen, Tayfun 01 July 2009 (has links) (PDF)
Data mining is proving itself to be a very important field as the amount of available data increases exponentially, thanks first to computerization and now to the internet. Meanwhile, cluster computing systems built from commodity hardware are becoming widespread, along with multicore processor architectures. This high computing power is combined with data mining to process huge amounts of data and extract information and knowledge. Frequent itemset mining is a special subtopic of data mining because it is an integral part of many data mining tasks; it is often a prerequisite for many other data mining algorithms, most notably those in the association rule mining area. For this reason, it is studied heavily in the literature. In this thesis, a parallel implementation of CLOSET+, a frequent closed itemset mining algorithm, is presented. The CLOSET+ algorithm has been modified to run on multiple processors simultaneously in order to obtain results faster. Open MPI and Boost libraries have been used for communication between processes, and the program has been tested on different inputs and parameters. Experimental results show that the algorithm exhibits high speedup and efficiency on dense data when the support value is higher than a certain threshold. The proposed parallel algorithm could prove useful for application areas where fast response is needed for low to medium numbers of frequent closed itemsets. A particular application area is the Web, where online applications have similar requirements.
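To make the notion of a frequent closed itemset concrete, here is a brute-force enumeration sketch. It is purely illustrative: CLOSET+ avoids exactly this exponential candidate scan by using FP-tree projection, and the tiny transaction set below is made up.

```python
from itertools import combinations

def closed_itemsets(transactions, min_support):
    """Brute-force frequent closed itemsets: an itemset is closed when no
    superset has the same support (illustration only; CLOSET+ does this
    efficiently with FP-tree projection)."""
    items = sorted({i for t in transactions for i in t})
    support = {}
    for k in range(1, len(items) + 1):
        for cand in combinations(items, k):
            s = sum(1 for t in transactions if set(cand) <= t)
            if s >= min_support:
                support[frozenset(cand)] = s
    # Keep only itemsets with no frequent proper superset of equal support.
    return {iset: s for iset, s in support.items()
            if not any(iset < other and s == support[other] for other in support)}

tx = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}]
print(closed_itemsets(tx, 2))  # {a}:3, {a,b}:2, {a,c}:2
```

Here {b} is frequent but not closed, because its superset {a, b} has the same support of 2; the closed sets compress the frequent sets without losing support information.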
163

Massive Crowd Simulation With Parallel Processing

Yilmaz, Erdal 01 February 2010 (has links) (PDF)
This thesis analyzes how parallel processing on the Graphics Processing Unit (GPU) can be used for massive crowd simulation, not only for rendering but also for the computational power required for realistic simulation. The extreme population in massive crowd simulation introduces an extra computational load, which is quite difficult to meet using Central Processing Unit (CPU) resources only. The thesis shows specific methods and approaches that maximize the throughput of GPU parallel computing while using the GPU as the main processor for massive crowd simulation. The methodology introduced in this thesis makes it possible to simulate and visualize hundreds of thousands of virtual characters in real time. To achieve speedups of two orders of magnitude with GPU parallel processing, various stream compaction and effective memory access approaches were employed. To simulate crowd behavior, fuzzy logic functionality was implemented on the GPU from scratch. This implementation is capable of computing more than half a billion fuzzy inferences per second.
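The fuzzy-inference step can be illustrated with a small vectorized sketch, evaluating many agents in one pass in the spirit of data-parallel GPU execution. The rule base, membership functions, and defuzzification choice below are invented for illustration; the abstract does not describe the thesis's actual crowd-behavior rules.

```python
import numpy as np

def tri(x, a, b, c):
    # Triangular membership function with support [a, c] and peak at b.
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

# Hypothetical rule base on one steering input: neighbour density in [0, 1].
# Rule 1: IF density is LOW  THEN speed is FAST (output centroid 1.0)
# Rule 2: IF density is HIGH THEN speed is SLOW (output centroid 0.2)
def infer_speed(density):
    low = tri(density, -0.5, 0.0, 0.6)
    high = tri(density, 0.4, 1.0, 1.5)
    # Weighted-average (Sugeno-style) defuzzification.
    return (low * 1.0 + high * 0.2) / (low + high)

agents = np.array([0.1, 0.5, 0.9])  # densities for three agents, one vectorized pass
print(infer_speed(agents))
```

Because `infer_speed` is expressed as elementwise array arithmetic, the same formulation maps naturally onto one GPU thread per agent.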
164

Enabling collaborative behaviors among cubesats

Browne, Daniel C. 08 July 2011 (has links)
Future spacecraft missions are trending towards the use of distributed systems or fractionated spacecraft. Initiatives such as DARPA's System F6 are encouraging the satellite community to explore collaborative spacecraft teams as a way to achieve lower cost, lower risk, and greater data value than the conventional monoliths in LEO today. Extensive research has been and is being conducted indicating the advantages of distributed spacecraft systems in terms of both capability and cost. Enabling collaborative behaviors among teams or formations of pico-satellites requires technology development in several subsystem areas, including attitude determination and control, orbit determination and maintenance, and a means to maintain accurate knowledge of team members' position and attitude. All of these technology developments require decreases in mass and power requirements in order to fit on pico-satellite platforms such as the CubeSat. In this thesis a solution for the last of these technology areas is presented. Accurate knowledge of each spacecraft's state in a formation, beyond improving collision avoidance, provides a means to best schedule sensor data gathering, thereby increasing power budget efficiency. Our solution is composed of multiple software and hardware components. First, finely tuned flight software maintains state knowledge by propagating the equations of motion. Additional software, including an extended Kalman filter implementation, and commercially available hardware components provide a means for on-board determination of both orbit and attitude. Lastly, an inter-satellite communication message structure and protocol enable the updating of position and attitude, as required, among team members. This messaging structure additionally provides a means for payload sensor and telemetry data sharing.
In order to satisfy the needs of many different missions, the software has the flexibility to vary the limits of accuracy on the knowledge of team member position, velocity, and attitude. Such flexibility provides power savings for simpler applications while still enabling missions that need finer-accuracy knowledge of the distributed team's state. Simulation results are presented indicating the accuracy and efficiency of formation structure knowledge through incorporation of the described solution. More importantly, results indicate the collaborative module's ability to maintain formation knowledge within bounds prescribed by a user. Simulation has included hardware-in-the-loop setups utilizing an S-band transceiver. Two "satellites" (computers set up with S-band transceivers and running the software components of the collaborative module) are provided GPS inputs comparable to the outputs of commercial hardware; this partial hardware-in-the-loop setup demonstrates the overall capabilities of the collaborative module. Details on each component of the module are provided. Although the module is designed with the 3U CubeSat framework as the initial demonstration platform, it is easily extendable to other small satellite platforms. By using this collaborative module as a base, future work can build upon it with attitude control, orbit or formation control, and additional capabilities, with the end goal of achieving autonomous clusters of small spacecraft.
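The propagate-then-correct cycle behind the state-knowledge maintenance can be sketched with a minimal linear Kalman filter for a single axis. This is a stand-in, not the thesis's extended Kalman filter: the constant-velocity dynamics, noise covariances, and measurement values below are all invented for illustration.

```python
import numpy as np

# Minimal linear Kalman filter for one axis of position/velocity knowledge.
dt = 1.0
F = np.array([[1.0, dt], [0.0, 1.0]])   # constant-velocity state transition
H = np.array([[1.0, 0.0]])              # GPS-like position-only measurement
Q = 0.01 * np.eye(2)                    # assumed process noise
R = np.array([[0.25]])                  # assumed measurement noise

x = np.array([[0.0], [1.0]])            # initial state: position 0, velocity 1
P = np.eye(2)                           # initial covariance

for z in [1.1, 2.0, 2.9]:               # simulated position measurements
    # Predict: propagate the state and covariance through the dynamics.
    x, P = F @ x, F @ P @ F.T + Q
    # Update: correct the prediction with the measurement via the Kalman gain.
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
    x = x + K @ (np.array([[z]]) - H @ x)
    P = (np.eye(2) - K @ H) @ P

print(x.ravel())  # estimated position and velocity after three updates
```

Between measurement updates, the predict step alone plays the role of the equations-of-motion propagation described above; the covariance `P` quantifies how the prescribed knowledge bounds degrade until the next inter-satellite update arrives.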
165

On algorithm design and programming model for multi-threaded computing

He, Zhengyu 27 March 2012 (has links)
The objective of this work is to investigate algorithm design and programming models for multi-threaded computing. Designing multi-threaded algorithms is very challenging: when multiple threads need to communicate or coordinate with each other, efficient synchronization support is needed. However, synchronization is known to be expensive on emerging multi-/many-core processors, especially as the number of threads increases. To fully unleash the power of such processors, careful investigation is needed of both algorithm design and programming models for multi-threaded systems. In this dissertation, we first present an asynchronous multi-threaded algorithm for the maximum network flow problem. This algorithm is based on the classical push-relabel algorithm and completely removes the use of locks and barriers from its original parallel version. While this algorithmic method shows effectiveness, it is challenging to generalize the success to other multi-threaded problems. We next focus on improving transactional memory, a promising mechanism for constructing multi-threaded programs. A queuing-theory-based model is developed to analyze the performance of different transactional memory systems. Based on the results of the model, we focus on the contention management mechanism of transactional memory systems. A profiling-based adaptive contention management scheme is finally proposed to cope with the problem that no static contention management scheme can maintain good performance on all platforms for all types of workload. From this research, we show that it is necessary and worthwhile to explore both the algorithm design aspect and the programming model aspect of multi-threaded computing.
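For reference, the classical push-relabel method that the asynchronous algorithm builds on can be sketched sequentially. This FIFO variant is an illustration only; the dissertation's contribution is precisely removing the locks and barriers so that many threads can push and relabel concurrently.

```python
from collections import deque

def push_relabel_maxflow(cap, s, t):
    """Sequential FIFO push-relabel on an adjacency-matrix graph."""
    n = len(cap)
    flow = [[0] * n for _ in range(n)]
    height, excess = [0] * n, [0] * n
    height[s] = n
    q = deque()
    for v in range(n):                       # saturate all edges out of the source
        if cap[s][v] > 0:
            flow[s][v], flow[v][s] = cap[s][v], -cap[s][v]
            excess[v] = cap[s][v]
            if v != t:
                q.append(v)
    while q:
        u = q.popleft()
        while excess[u] > 0:
            pushed = False
            for v in range(n):
                # Push along admissible residual edges (height drops by exactly 1).
                if cap[u][v] - flow[u][v] > 0 and height[u] == height[v] + 1:
                    d = min(excess[u], cap[u][v] - flow[u][v])
                    flow[u][v] += d
                    flow[v][u] -= d
                    excess[u] -= d
                    excess[v] += d
                    if v not in (s, t) and excess[v] == d:
                        q.append(v)          # v just became active
                    pushed = True
                    if excess[u] == 0:
                        break
            if not pushed:
                # Relabel: lift u just above its lowest residual neighbour.
                height[u] = 1 + min(height[v] for v in range(n)
                                    if cap[u][v] - flow[u][v] > 0)
    return sum(flow[s][v] for v in range(n))

cap = [[0, 3, 2, 0], [0, 0, 1, 2], [0, 0, 0, 3], [0, 0, 0, 0]]
print(push_relabel_maxflow(cap, 0, 3))  # maximum flow from node 0 to node 3
```

The appeal for parallelization is visible in the structure: each push or relabel touches only one vertex and its neighbours, which is what makes a lock-free asynchronous version conceivable in the first place.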
166

Development and application of a parallel compositional reservoir simulator

Ghasemi Doroh, Mojtaba 06 November 2012 (has links)
Simulation of large-scale and complex reservoirs requires fine and detailed gridding, which involves a significant amount of memory and is computationally expensive. Nowadays, clusters of PCs and high-performance computing (HPC) centers are widely available. These systems allow parallel processing, which helps large-scale simulations run faster and more efficiently. In this research project, we developed a parallel version of The University of Texas Compositional Simulator (UTCOMP). The parallel UTCOMP is capable of running on both shared- and distributed-memory parallel computers. The parallelization preserved all physical features of the original code, such as higher-order finite differences, physical dispersion, and asphaltene precipitation, and was verified for several case studies using multiple processors. The parallel simulator produces the outputs required for visualizing simulation results with the S3graph visualization software. The efficiency of the parallel simulator was assessed in terms of speedup using various numbers of processors. Subsequently, we improved the coding and implementation of the simulator to minimize communication between processors and thereby improve parallel efficiency. To improve the efficiency of the linear solver in the simulator, we implemented three well-known high-performance parallel solver packages (SAMG, Hypre, and PETSc) in the parallel simulator. The performance of these solver packages was then tuned through their input parameters for solving large-scale reservoir simulation problems. The developed parallel simulator has expanded the capability of the original code for simulating large-scale reservoir case studies: with a sufficient number of processors, a field-scale simulation with a million grid cells can be performed in a few hours. Several case studies are presented to show the performance of the parallel simulator.
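The speedup and efficiency assessment mentioned above follows the standard definitions, which can be stated in a few lines (the timings below are hypothetical, not measurements from UTCOMP):

```python
def parallel_metrics(t_serial, t_parallel, n_procs):
    # Speedup: how many times faster the parallel run is than the serial run.
    speedup = t_serial / t_parallel
    # Efficiency: speedup per processor; 1.0 would be perfect scaling.
    efficiency = speedup / n_procs
    return speedup, efficiency

s, e = parallel_metrics(3600.0, 300.0, 16)  # hypothetical wall-clock times in seconds
print(s, e)  # 12.0 0.75
```

Falling efficiency as `n_procs` grows is exactly the symptom that motivated the communication-minimization work described in the abstract.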
167

Static guarantees for coordinated components : a statically typed composition model for stream-processing networks

Penczek, Frank January 2012 (has links)
Does your program do what it is supposed to be doing? Without running the program, providing an answer to this question is much harder if the language does not support static type checking. Of course, even if compile-time checks are in place, only certain errors will be detected: compilers can only second-guess the programmer’s intention. But type-based techniques go a long way in assisting programmers to detect errors in their computations earlier on. The question of whether a program behaves correctly is even harder to answer if the program consists of several parts that execute concurrently and need to communicate with each other. Compilers of standard programming languages are typically unable to infer information about how the parts of a concurrent program interact with each other, especially where explicit threading or message-passing techniques are used. Hence, correctness guarantees are often conspicuously absent. Concurrency management in an application is a complex problem. However, it is largely orthogonal to the actual computational functionality that a program realises. Because of this orthogonality, the problem can be considered in isolation. The largest possible separation between concurrency and functionality is achieved if a dedicated language is used for concurrency management, i.e. an additional program manages the concurrent execution and interaction of the computational tasks of the original program. Such an approach not only helps programmers to focus on the core functionality and on the exploitation of concurrency independently, it also allows for a specialised analysis mechanism geared towards concurrency-related properties. This dissertation shows how an approach that completely decouples coordination from computation is a very supportive substrate for inferring static guarantees of the correctness of concurrent programs.
Programs are described as streaming networks connecting independent components that implement the computations of the program, where the network describes the dependencies and interactions between components. A coordination program only requires an abstract notion of computation inside the components and may therefore be used as a generic and reusable design pattern for coordination. A type-based inference and checking mechanism analyses such streaming networks and provides comprehensive guarantees of the consistency and behaviour of coordination programs. Concrete implementations of components are deliberately left out of the scope of coordination programs: Components may be implemented in an external language, for example C, to provide the desired computational functionality. Based on this separation, a concise semantic framework allows for step-wise interpretation of coordination programs without requiring concrete implementations of their components. The framework also provides clear guidance for the implementation of the language. One such implementation is presented and hands-on examples demonstrate how the language is used in practice.
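The component-network idea can be sketched with queues standing in for streams. This is a loose Python analogy, not the coordination language itself: the coordination layer sees only the wiring between stages, while each stage's function is an opaque "component" that could just as well be implemented in C.

```python
from queue import Queue
from threading import Thread

def component(fn, inp, out):
    """Wrap an opaque computation fn as a stream component: read items from
    inp until the end-of-stream marker (None), write results to out."""
    def run():
        for item in iter(inp.get, None):
            out.put(fn(item))
        out.put(None)                 # propagate end-of-stream downstream
    return Thread(target=run)

# Wiring (the "coordination program"): a -> double -> b -> increment -> c
a, b, c = Queue(), Queue(), Queue()
stages = [component(lambda x: x * 2, a, b),
          component(lambda x: x + 1, b, c)]
for t in stages:
    t.start()
for item in [1, 2, 3]:
    a.put(item)
a.put(None)
results = list(iter(c.get, None))
for t in stages:
    t.join()
print(results)  # [3, 5, 7]
```

The wiring lines are the only place where the network topology appears, which mirrors the separation the dissertation argues for: a type checker for the coordination layer would analyse exactly that wiring, never the component bodies.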
168

Towards river flow computation at the continental scale

David, Cédric H., 1981- 22 March 2011 (has links)
The work presented in this dissertation addresses river network modeling at large scales using geographic information systems, parallel computing, and the latest advancements in atmospheric and land surface modeling. This work is motivated by the availability of a vector-based Geographic Information System dataset that describes the networks of streams and rivers in the United States and how they are connected. A land surface model called Noah-distributed is used to provide lateral inflow to an NHDPlus river network in the Guadalupe River Basin in Texas. Challenges related to the projection of gridded hydrographic data from one coordinate system to another are investigated. The different representations of the shape of the Earth used in atmospheric science (spherical) and hydrology (spheroidal) can lead to a significant North-South shift, on the order of 20 km at mid-latitudes. A river network model called RAPID is developed and applied in a four-year study of the Guadalupe and San Antonio River Basins in Texas using the river network of NHDPlus. Gage measurements are used to estimate flow wave celerities in a river network and to assess the quality of RAPID flow computations. The performance of RAPID in a massively parallel computing environment is tested; further investigation of its scalability is needed before using RAPID at the state or federal level. The replacement by RAPID of the river routing scheme used in SIM-France, a hydro-meteorological model, is investigated in a ten-year study of river flow in France. While the formulation of RAPID improves the functionality of SIM-France, the flow simulations are comparable in accuracy to those previously obtained by SIM-France. Sub-basin parameterization was found to improve model results.
A single criterion for quantifying the quality of river flow simulations over several river gages in a river network is developed; it normalizes the squared error of modeled flow to allow equal treatment of all gaging stations regardless of the magnitude of flow. Using this criterion as the cost function for parameter estimation in RAPID yields better results than increasing the degree of spatial variability in model parameters.
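One plausible form of such a magnitude-normalized criterion is sketched below; the dissertation's exact normalization may differ, and the gage data is invented. The point of the construction is that identical relative errors at a large river and a small creek contribute equally:

```python
def normalized_cost(observed_by_gage, modeled_by_gage):
    """Sum over gages of squared error normalized by the gage's mean observed
    flow, so large and small rivers weigh equally (one plausible sketch)."""
    total = 0.0
    for gage, obs in observed_by_gage.items():
        mod = modeled_by_gage[gage]
        mean_obs = sum(obs) / len(obs)
        sq_err = sum((m - o) ** 2 for m, o in zip(mod, obs))
        total += sq_err / (mean_obs ** 2 * len(obs))
    return total

# Hypothetical gages: same ~10% errors at very different flow magnitudes.
obs = {"big_river": [1000.0, 1100.0], "small_creek": [1.0, 1.1]}
mod = {"big_river": [900.0, 1200.0], "small_creek": [0.9, 1.2]}
print(normalized_cost(obs, mod))
```

With a raw squared-error cost, `big_river` would dominate by six orders of magnitude; after normalization both gages contribute the same amount, which is the "equal treatment" property described above.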
169

"Virtual malleability" applied to MPI jobs to improve their execution in a multiprogrammed environment

Utrera Iglesias, Gladys Miriam 10 December 2007 (has links)
This work focuses on the scheduling of MPI jobs executing on shared-memory multiprocessors (SMPs). The objective was to obtain the best response-time performance in multiprogrammed multiprocessor systems using batch systems, assuming all jobs have the same priority. To achieve that purpose, the benefits of supporting malleability in MPI jobs to reduce fragmentation, and consequently improve the performance of the system, were studied. The contributions of this work can be summarized as follows:
· Virtual malleability: a mechanism whereby a job is assigned a dynamic processor partition in which the number of processes is greater than the number of processors. The partition size is modified at runtime according to external requirements, such as the load of the system, by varying the multiprogramming level, making the job contend for resources with itself. In addition, a mechanism decides at runtime whether to apply local or global process queues to an application, depending on the load balance between its processes.
· A job scheduling policy that makes decisions such as how many processes to start with and the maximum multiprogramming degree, based on the type and number of applications running and queued. Moreover, as soon as a job finishes execution and there are queued jobs, this algorithm analyzes whether it is better to start executing another job immediately or to wait until more resources become available.
· A new alternative to backfilling strategies for the problem of the execution window expiring. Virtual malleability is applied to the backfilled job, reducing its partition size without aborting or suspending it as in traditional backfilling.
The evaluation of this thesis has been done using a practical approach. All the proposals were implemented, modifying the three scheduling levels: queuing system, processor scheduler, and runtime library.
The impact of the contributions was studied under several types of workloads, varying machine utilization, communication and balance degree of the applications, multiprogramming level, and job size. Results showed that it is possible to offer malleability over MPI jobs. An application obtained better performance when contending for resources with itself than with other applications, especially in workloads with high machine utilization. Load imbalance was taken into account, obtaining better performance by applying the right queue type to each application independently. The proposed job scheduling policy exploited virtual malleability by choosing, at the beginning of execution, parameters such as the number of processes and the maximum multiprogramming level. It performed well under bursty workloads with low to medium machine utilization. However, as the load increases, virtual malleability is not enough: when the machine is heavily loaded, jobs, once shrunk, are not able to expand, so they must execute the whole time with a partition smaller than the job size, degrading performance. At this point the job scheduling policy concentrated on moldability alone. Fragmentation was also alleviated by applying backfilling techniques to the job scheduling algorithm. Virtual malleability proved to be an interesting improvement for the window-expiring problem: backfilled jobs, even on a smaller partition, can continue execution, reducing the memory swapping generated by aborts or suspensions. In this way the queueing system is prevented from reinserting the backfilled job in the queue and re-executing it in the future.
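The core of virtual malleability, more processes than processors in a shrinkable partition, can be caricatured in a few lines (the class and values are illustrative, not the actual runtime library):

```python
class Job:
    """Toy model of virtual malleability: the process count is fixed at launch,
    but the processor partition assigned to the job changes at runtime."""
    def __init__(self, n_processes):
        self.n_processes = n_processes
        self.partition = n_processes     # start with one processor per process

    def resize(self, new_partition):
        # The scheduler shrinks/grows the partition; the job then contends
        # for resources with itself rather than with other jobs.
        self.partition = new_partition

    def multiprogramming_level(self):
        # Processes that must time-share each processor in the partition.
        return -(-self.n_processes // self.partition)  # ceiling division

job = Job(n_processes=16)
job.resize(4)                            # system load forces a smaller partition
print(job.multiprogramming_level())      # 4 processes per processor
```

The shrink-without-suspend behavior is what distinguishes this from traditional backfilling: the job keeps running at a higher multiprogramming level instead of being aborted or swapped out.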
170

A Domain Decomposition Approach for Large-Scale Simulations of Flow Processes in Hydrate-Bearing Geologic Media

Zhang, Keni, Moridis, George J., Wu, Yu-Shu, Pruess, Karsten 07 1900 (has links)
Simulation of the system behavior of hydrate-bearing geologic media involves solving fully coupled mass- and heat-balance equations. In this study, we develop a domain decomposition approach for large-scale gas hydrate simulations with coarse-granularity parallel computation. This approach partitions a simulation domain into small subdomains. The full model domain, consisting of discrete subdomains, is still simulated simultaneously using multiple processes/processors. Each processor is dedicated to the following tasks for its partitioned subdomain: updating thermophysical properties, assembling mass- and energy-balance equations, solving linear equation systems, and performing various other local computations. The linearized equation systems are solved in parallel with a parallel linear solver, using an efficient interprocess communication scheme. This new domain decomposition approach has been implemented in the TOUGH+HYDRATE code and has demonstrated excellent speedup and good scalability. In this paper, we demonstrate applications of the new approach in simulating field-scale models of gas production from gas-hydrate deposits.
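The partitioning step can be illustrated with a toy one-dimensional decomposition into near-equal contiguous subdomains. A real simulator partitions an unstructured 3-D mesh with load balance and communication volume in mind; this sketch only shows the basic load-balancing idea:

```python
def partition_1d(n_cells, n_subdomains):
    """Split a 1-D cell index range into near-equal contiguous subdomains,
    returning (start, end) half-open bounds per processor."""
    base, extra = divmod(n_cells, n_subdomains)
    bounds, start = [], 0
    for p in range(n_subdomains):
        size = base + (1 if p < extra else 0)  # spread the remainder cells
        bounds.append((start, start + size))
        start += size
    return bounds

print(partition_1d(10, 3))  # [(0, 4), (4, 7), (7, 10)]
```

Each processor would then assemble and solve only the equations for its own index range, exchanging boundary-cell values with neighbours through interprocess communication as described above.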
