831

Non-Invasive Permeability Assessment of High-Performance Concrete Bridge Deck Mixtures

Bryant, James William Jr. 27 April 2001 (has links)
Concrete construction methods and practices influence the final in-place quality of concrete. A low-permeability concrete mixture does not by itself ensure quality in-place concrete: if the concrete is not transported, placed, and cured properly, it may not exhibit the desired durability and mechanical properties. This study investigates the in-place permeation properties of low-permeability concrete bridge deck mixtures used in the Commonwealth of Virginia. Permeation properties were assessed in both the laboratory and the field using 4-point Wenner array electrical resistivity, surface air flow (SAF), and chloride ion penetrability (ASTM C 1202-97). Laboratory test specimens consisted of two concrete slabs with dimensions of 280 x 280 x 102-mm (11 x 11 x 4-in) and twelve 102 x 204-mm (4 x 8-in) cylinders per concrete mixture. Specimens were tested at 7, 28, and 91 days. Thirteen cylinder specimens per concrete mixture underwent standard curing in a saturated limewater bath. The simulated field-curing regimes used wet burlap and plastic sheeting for 3 days (3B) and 7 days (7B), respectively, and were applied to both slab and cylinder specimens. Slab specimens were tested on the finished surface using the SAF at 28 and 91 days, and with 4-point electrical resistivity measurements at 1, 3, 7, 14, 28, and 91 days. Compressive strength (CS) tests were conducted at 7 and 28 days. Chloride ion penetrability tests were performed at 7, 28, and 91 days. Statistical analyses were performed to assess the significance of the following relationships: total charge passed and initial current (ASTM C 1202-97); 3B resistivity and 7B resistivity; slab and cylinder resistivity; slab resistivity and ASTM C 1202-97 (total charge and initial current); and surface air flow and ASTM C 1202-97. Field-cast specimens (test slabs and cylinders) were cast on-site during concrete bridge deck construction. The slab dimensions were 30.5 x 40.6 x 10.2-cm (12 x 16 x 4-in), and the cylinders were 10.2 x 20.4-cm (4 x 8-in). In-situ SAF and resistivity measurements were taken on the bridge deck at 14, 42, and 91 days. In-place SAF and resistivity measurements on the field-cast laboratory slabs were taken at 7, 14, and 28 days. ASTM C 1202-97 specimens were prepared from field-cast cylinders and tested at 7, 28, and 42 days. In-place permeation measures from the field specimens were compared to the laboratory data. Results indicated no difference in chloride ion penetrability (Figures 7.4 and 7.5) or 28-day compressive strength (Figure 7.2) among the differing simulated field-curing regimes for same-age testing. There was no significant difference at the 95% confidence level between 3B resistivity and 7B resistivity specimens tested at the same age (Figures 7.9 and 7.10). A well-defined relationship was observed between total charge passed and initial current (Figure 7.6). An inverse power function was found to describe the relationship between charge passed/initial current and electrical resistivity for all laboratory mixtures used in this study (Figures 7.17-7.22). Field data were used to validate the laboratory-established models for charge passed/initial current and electrical resistivity; these models were able to predict 30 to 50% of the field data (Figures 7.31-7.34). Results indicate that the SAF lacked the sensitivity to classify the range of concretes used in this study (Figure 7.24). / Ph. D.
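The reported inverse power relationship between charge passed (or initial current) and electrical resistivity lends itself to a simple curve fit. The sketch below uses hypothetical resistivity/charge pairs rather than the dissertation's measurements, and only illustrates how such a model could be fitted:

```python
# Minimal curve-fit sketch (hypothetical data, not the dissertation's measurements):
# fit an inverse power model Q = a * rho**(-b) relating surface resistivity
# (kOhm-cm) to total charge passed (coulombs, ASTM C 1202-style).
import numpy as np
from scipy.optimize import curve_fit

def inverse_power(rho, a, b):
    """Predicted charge passed as an inverse power function of resistivity."""
    return a * rho ** (-b)

resistivity = np.array([8.0, 12.0, 18.0, 25.0, 40.0, 60.0])               # kOhm-cm (made up)
charge_passed = np.array([5200.0, 3400.0, 2100.0, 1500.0, 900.0, 600.0])  # coulombs (made up)

(a, b), _ = curve_fit(inverse_power, resistivity, charge_passed, p0=(2e4, 1.0))
print(f"fitted model: Q = {a:.0f} * rho^(-{b:.2f})")
```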
832

Enantiomeric separations by HPLC: temperature, mobile phase, flow rate and retention mechanism studies

Klute, Robert Cragg 06 June 2008 (has links)
The effects of changes in temperature, mobile phase composition, flow rate, and stationary phase upon the enantiomeric separation of several racemic mixtures are investigated. The changes in capacity factor (k'), selectivity (α), efficiency (N), and enantiomeric resolution (R) are explored. Resolution is then shown to be controlled by the specific combination of chiral stationary phase (CSP), solute, mobile phase, and temperature. The key to optimizing chiral resolution lies in understanding the retention mechanism(s) for a given CSP. The proposed retention mechanisms for the two CSPs used in the optimization studies are evaluated using chromatographic/thermodynamic data. Inferences are made which support the well-characterized "Pirkle"-type R-dinitrobenzoylphenylglycine retention mechanism, which depends solely on attractive-repulsive interactions to establish two diastereomeric complexes having unequal internal energy, and therefore eluting at different times from the chromatographic system. For comparison, a more complicated CSP composed of cellulose tris(3,5-dimethylphenylcarbamate) coated onto a silica gel support is also examined. For this CSP the proposed mechanisms, which include both attractive-repulsive interactions and inclusion complex formation, are evaluated according to the chromatographic optimization data and compared to similar data for the single-mechanism "Pirkle" CSP. In contrast to the above "macromolecular"-level inferences about retention mechanisms drawn from chromatographic data, a second study was initiated using a model CSP attached to a 1000-Å gold surface, probed with Fourier-transform infrared spectrometry in a reflectance-absorbance mode, to examine the specific molecular interactions which make the diastereomeric complex possible. This in situ experiment, in contrast to previous ex situ stationary-phase work, is designed to show that hydrogen bonding is, as predicted, a principal force holding the complexes together, and that a measurable difference exists between the weaker R-trifluoroanthrylethanol (TFAE) and the stronger S-TFAE complexes due to their different stereo-geometry. A further experiment to characterize the difference in mechanisms between the "Pirkle" and cellulose CSPs involves relating their chromatographic retention behavior to their structure, known as Qualitative Structure Retention Relationships (QSRR). Some structure-specific physical-organic chemistry parameters are determined using an EPA-developed computer program, and correlations are made between retention on a given CSP and some of the parameters. / Ph. D.
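For reference, the chromatographic figures of merit named in this abstract are conventionally defined as follows (textbook definitions, not equations reproduced from the dissertation), where t_R is the retention time of an enantiomer, t_0 the void time, and w the baseline peak width:

```latex
% Standard chromatographic figures of merit (textbook conventions, not the
% dissertation's own notation): retention (capacity) factor, selectivity,
% plate number, and resolution for two adjacent enantiomer peaks 1 and 2.
\begin{align}
  k' &= \frac{t_R - t_0}{t_0}, &
  \alpha &= \frac{k'_2}{k'_1}, &
  N &= 16\left(\frac{t_R}{w}\right)^2, &
  R_s &= \frac{2\,(t_{R,2} - t_{R,1})}{w_1 + w_2}.
\end{align}
```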
833

Prioritizing Residential High-Performance Resilient Building Technologies for Immediate and Future Climate Induced Natural Disaster Risks

Ladipo, Oluwateniola Eniola 14 June 2016 (has links)
Climate change is exacerbating natural disasters, and extreme weather events are increasing in intensity and frequency. This requires an in-depth evaluation of locations across the various U.S. climates, where natural hazards, vulnerabilities, and potentially damaging impacts will vary. At the local building level within the built environment, private residences are crucial shelter systems that protect against natural disasters and are a central component in the greater effort of creating comprehensive disaster-resilient environments. In light of recent disasters such as Superstorm Sandy, there is an increased awareness that residential buildings and communities need to become more resilient to the changing climates in which they are located, or they will face devastating consequences. There is great potential for specific high-performance building technologies to play a vital role in achieving disaster resilience on a local scale. The application of these technologies can not only provide immediate protection and reduced risk for buildings and their occupants, but can additionally alleviate disaster-recovery stressors on critical infrastructure and livelihoods by absorbing, adapting to, and rapidly recovering from extreme weather events, all while simultaneously promoting sustainable building development. However, few have evaluated the link between residential high-performance building technologies and natural disaster resilience with regard to identifying and prioritizing viable technologies to assist decision-makers with effective implementation. This research developed a framework for a process that prioritizes residential building technologies encompassing both high-performance and resilience qualities, which can be implemented for a variety of housing contexts to mitigate risks associated with climate-induced natural hazards. Decision-makers can utilize this process to evaluate a residential building for natural disaster risks and communicate strategies to improve building performance and resilience in response to such risks. / Ph. D.
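As a loose illustration of what such a prioritization step can look like, the sketch below ranks candidate technologies with a generic weighted-scoring scheme; the criteria, weights, and ratings are invented for illustration and are not the framework developed in the dissertation:

```python
# Generic weighted-scoring sketch. The criteria, weights, and ratings below are
# invented for illustration; they are not the dissertation's framework or data.
CRITERIA_WEIGHTS = {
    "hazard_protection": 0.4,   # how well the technology resists the hazard
    "energy_performance": 0.3,  # high-performance (efficiency) contribution
    "installation_cost": 0.2,   # higher rating = more affordable
    "recovery_speed": 0.1,      # contribution to rapid post-event recovery
}

def priority_score(ratings):
    """Weighted sum of 0-10 ratings for one candidate technology."""
    return sum(CRITERIA_WEIGHTS[c] * ratings[c] for c in CRITERIA_WEIGHTS)

candidates = {
    "impact-rated windows": {"hazard_protection": 9, "energy_performance": 6,
                             "installation_cost": 4, "recovery_speed": 7},
    "cool roof assembly":   {"hazard_protection": 5, "energy_performance": 8,
                             "installation_cost": 7, "recovery_speed": 6},
}

ranked = sorted(candidates, key=lambda t: priority_score(candidates[t]), reverse=True)
for tech in ranked:
    print(f"{tech}: {priority_score(candidates[tech]):.1f}")
```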
834

Advanced Sampling Methods for Solving Large-Scale Inverse Problems

Attia, Ahmed Mohamed Mohamed 19 September 2016 (has links)
Ensemble and variational techniques have gained wide popularity as the two main approaches for solving data assimilation and inverse problems. The majority of the methods in these two approaches are derived (at least implicitly) under the assumption that the underlying probability distributions are Gaussian. It is well accepted, however, that the Gaussianity assumption is too restrictive when applied to large nonlinear models, nonlinear observation operators, and large levels of uncertainty. This work develops a family of fully non-Gaussian data assimilation algorithms that work by directly sampling the posterior distribution. The sampling strategy is based on a Hybrid/Hamiltonian Monte Carlo (HMC) approach that can handle non-normal probability distributions. The first algorithm proposed in this work is the "HMC sampling filter", an ensemble-based data assimilation algorithm for solving the sequential filtering problem. Unlike traditional ensemble-based filters, such as the ensemble Kalman filter and the maximum likelihood ensemble filter, the proposed sampling filter naturally accommodates non-Gaussian errors and nonlinear model dynamics, as well as nonlinear observations. To test the capabilities of the HMC sampling filter, numerical experiments are carried out using the Lorenz-96 model and observation operators with different levels of nonlinearity and differentiability. The filter is also tested with a shallow water model on the sphere with a linear observation operator. Numerical results show that the sampling filter performs well even in highly nonlinear situations where the traditional filters diverge. Next, the HMC sampling approach is extended to the four-dimensional case, where several observations are assimilated simultaneously, resulting in the second member of the proposed family of algorithms. The new algorithm, named the "HMC sampling smoother", is an ensemble-based smoother for four-dimensional data assimilation that works by sampling from the posterior probability density of the solution at the initial time. The sampling smoother naturally accommodates non-Gaussian errors and nonlinear model dynamics and observation operators, and provides a full description of the posterior distribution. Numerical experiments for this algorithm are carried out using a shallow water model on the sphere with observation operators of different levels of nonlinearity. The numerical results demonstrate the advantages of the proposed method compared to traditional variational and ensemble-based smoothing methods. The HMC sampling smoother, in its original formulation, is computationally expensive due to the innate requirement of running the forward and adjoint models repeatedly. The proposed family of algorithms therefore continues with computationally efficient versions of the HMC sampling smoother based on reduced-order approximations of the underlying model dynamics. The reduced-order HMC sampling smoothers, developed as extensions of the original HMC smoother, are tested numerically using the shallow-water equations model in Cartesian coordinates. The results reveal that the reduced-order versions of the smoother are capable of accurately capturing the posterior probability density, while being significantly faster than the original full-order formulation. In the presence of nonlinear model dynamics, a nonlinear observation operator, or non-Gaussian errors, the prior distribution in the sequential data assimilation framework is not analytically tractable.
In the original formulation of the HMC sampling filter, the prior distribution is approximated by a Gaussian distribution whose parameters are inferred from the ensemble of forecasts. This Gaussian prior assumption is then relaxed: a clustering step is introduced after the forecast phase of the filter, and the prior density function is estimated by fitting a Gaussian Mixture Model (GMM) to the prior ensemble. The base filter developed following this strategy is named the cluster HMC sampling filter (ClHMC). A multi-chain version of the ClHMC filter, namely MC-ClHMC, is also proposed to guarantee that samples are taken from the vicinities of all probability modes of the formulated posterior. These methodologies are tested using a quasi-geostrophic (QG) model with double-gyre wind forcing and bi-harmonic friction. Numerical results demonstrate the usefulness of using GMMs to relax the Gaussian prior assumption in the HMC filtering paradigm. To provide a unified platform for data assimilation research, a flexible and highly extensible testing suite, named DATeS, is developed and described in this work. The core of DATeS is implemented in Python to provide object-oriented capabilities. The main components, such as the models, the data assimilation algorithms, the linear algebra solvers, and the time discretization routines, are independent of each other, so as to offer maximum flexibility in configuring data assimilation studies. / Ph. D.
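As a minimal illustration of the Hamiltonian Monte Carlo machinery underlying these filters and smoothers, the sketch below samples a toy standard-Gaussian target with leapfrog integration and a Metropolis accept/reject step; it is a generic sampler, not DATeS or the dissertation's assimilation code:

```python
# Minimal Hamiltonian Monte Carlo sketch for a toy target density (a standard
# Gaussian). Generic illustration only -- not DATeS or the dissertation's filter code.
import numpy as np

def neg_log_density(x):            # potential energy U(x) for the toy target
    return 0.5 * np.dot(x, x)

def grad_neg_log_density(x):       # gradient of U(x)
    return x

def hmc_step(x, step_size=0.1, n_leapfrog=20, rng=np.random.default_rng(0)):
    p = rng.standard_normal(x.shape)                 # auxiliary momentum
    x_new, p_new = x.copy(), p.copy()
    # Leapfrog integration of the Hamiltonian dynamics
    p_new -= 0.5 * step_size * grad_neg_log_density(x_new)
    for _ in range(n_leapfrog - 1):
        x_new += step_size * p_new
        p_new -= step_size * grad_neg_log_density(x_new)
    x_new += step_size * p_new
    p_new -= 0.5 * step_size * grad_neg_log_density(x_new)
    # Metropolis accept/reject on the total energy (Hamiltonian)
    h_old = neg_log_density(x) + 0.5 * np.dot(p, p)
    h_new = neg_log_density(x_new) + 0.5 * np.dot(p_new, p_new)
    return x_new if rng.random() < np.exp(h_old - h_new) else x

x, samples = np.zeros(3), []
for _ in range(1000):              # draw 1000 correlated samples from the target
    x = hmc_step(x)
    samples.append(x)
```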
835

Mechanical Properties and Durability of Sustainable UHPC Incorporated Industrial Waste Residues and Sea/Manufactured Sand

Ge, W., Zhu, S., Yang, J., Ashour, Ashraf, Zhang, Z., Li, W., Jiang, H., Cao, D., Shuai, H. 02 November 2023 (has links)
Yes / Considering the continuing advancement of the concepts of sustainable development, energy saving, and emission reduction, it is very important to reduce the cement content of concrete in order to improve its environmental impact. Using reactive admixtures to replace part of the cement in ultra-high performance concrete (UHPC) can effectively improve the overall performance of the concrete and reduce carbon dioxide (CO2) emissions, which is an important aspect of environmental protection. Here, industrial waste residues (fly ash and slag), sea sand (SS), and manufactured sand (MS) were used to produce UHPC under standard curing conditions, to reduce the material cost and make it more environmentally friendly and sustainable. The effects of water-binder ratio, contents of cementitious materials, types of sand, and content of steel fibers on the mechanical performance of UHPC under standard curing were investigated experimentally. In addition, the effects of various factors on the penetration depth under hydraulic pressure, the electric flux, and the mass loss, relative dynamic modulus of elasticity, and flexural and compressive strengths of UHPC specimens after freeze-thaw cycles were investigated to evaluate the impermeability, chloride resistance, and freeze-thaw resistance of the various UHPCs produced. The experimental results show that the SS-UHPC and MS-UHPC prepared by standard curing exhibit high strength, excellent impermeability, and chloride resistance. The frost-resistance grade of all groups of UHPC prepared by standard curing is greater than F500, indicating excellent freeze-thaw resistance, including for those produced with local tap water or artificial seawater. The investigation presented in this paper could contribute to the production of new low-cost, environmentally friendly UHPCs and accelerate the application of UHPC in engineering structures.
836

Exploring the Landscape of Big Data Analytics Through Domain-Aware Algorithm Design

Dash, Sajal 20 August 2020 (has links)
Experimental and observational data emerging from various scientific domains necessitate fast, accurate, and low-cost analysis of the data. While exploring the landscape of big data analytics, multiple challenges arise from three characteristics of big data: the volume, the variety, and the velocity. High volume and velocity of the data warrant a large amount of storage, memory, and compute power, while a large variety of data demands cognition across domains. Addressing domain-intrinsic properties of data can help us analyze the data efficiently through the frugal use of high-performance computing (HPC) resources. In this thesis, we present our exploration of the data analytics landscape with domain-aware approximate and incremental algorithm design. We propose three guidelines targeting three properties of big data for domain-aware big data analytics: (1) explore geometric and domain-specific properties of high dimensional data for succinct representation, which addresses the volume property, (2) design domain-aware algorithms through mapping of domain problems to computational problems, which addresses the variety property, and (3) leverage incremental arrival of data through incremental analysis and invention of problem-specific merging methodologies, which addresses the velocity property. We demonstrate these three guidelines through the solution approaches of three representative domain problems. We present Claret, a fast and portable parallel weighted multi-dimensional scaling (WMDS) tool, to demonstrate the application of the first guideline. It combines algorithmic concepts extended from stochastic force-based multi-dimensional scaling (SF-MDS) and Glimmer. Claret computes approximate weighted Euclidean distances by combining a novel data mapping called stretching with the Johnson-Lindenstrauss lemma to reduce the complexity of WMDS from O(f(n)d) to O(f(n) log d). In demonstrating the second guideline, we map the problem of identifying multi-hit combinations of genetic mutations responsible for cancers to the weighted set cover (WSC) problem by leveraging the semantics of cancer genomic data obtained from cancer biology. Solving the mapped WSC with an approximate algorithm, we identified a set of multi-hit combinations that differentiate between tumor and normal tissue samples. To identify three- and four-hit combinations, which require orders of magnitude more computational power, we scaled out the WSC algorithm on a hundred nodes of the Summit supercomputer. In demonstrating the third guideline, we developed a tool, iBLAST, to perform incremental sequence similarity search. Developing new statistics to combine search results over time makes incremental analysis feasible. iBLAST performs (1+δ)/δ times faster than NCBI BLAST, where δ represents the fraction of database growth. We also explored various approaches to mitigate catastrophic forgetting in incremental training of deep learning models. / Doctor of Philosophy / Experimental and observational data emerging from various scientific domains necessitate fast, accurate, and low-cost analysis of the data. While exploring the landscape of big data analytics, multiple challenges arise from three characteristics of big data: the volume, the variety, and the velocity. Here, volume represents the data's size, variety represents the various sources and formats of the data, and velocity represents the data arrival rate. High volume and velocity of the data warrant a large amount of storage, memory, and computational power.
In contrast, a large variety of data demands cognition across domains. Addressing domain-intrinsic properties of data can help us analyze the data efficiently through the frugal use of high-performance computing (HPC) resources. This thesis presents our exploration of the data analytics landscape with domain-aware approximate and incremental algorithm design. We propose three guidelines targeting three properties of big data for domain-aware big data analytics: (1) explore geometric (pair-wise distance and distribution-related) and domain-specific properties of high dimensional data for succinct representation, which addresses the volume property, (2) design domain-aware algorithms through mapping of domain problems to computational problems, which addresses the variety property, and (3) leverage incremental data arrival through incremental analysis and invention of problem-specific merging methodologies, which addresses the velocity property. We demonstrate these three guidelines through the solution approaches of three representative domain problems. We demonstrate the application of the first guideline through the design and development of Claret, a fast and portable parallel weighted multi-dimensional scaling (WMDS) tool that can reduce the dimension of high-dimensional data points. In demonstrating the second guideline, we identify combinations of cancer-causing gene mutations by mapping the problem to the well-known weighted set cover (WSC) problem. We scaled out the WSC algorithm on a hundred nodes of the Summit supercomputer to solve the problem in less than two hours instead of an estimated hundred years. In demonstrating the third guideline, we developed a tool, iBLAST, to perform incremental sequence similarity search. This analysis was made possible by developing new statistics to combine search results over time. We also explored various approaches to mitigate the catastrophic forgetting of deep learning models, where a model forgets how to perform machine learning tasks efficiently on older data in a streaming setting.
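The quoted (1+δ)/δ speedup has a simple reading: an incremental search only has to scan the newly added fraction δ of the database and merge the results, whereas a full re-search scans the whole (1+δ)-sized database. A small sketch of that arithmetic (illustrative only, not iBLAST code):

```python
# Back-of-the-envelope reading of the (1 + delta)/delta claim (illustration only,
# not iBLAST code): searching only the new delta fraction of the database and
# merging results, versus re-searching the whole (1 + delta)-sized database.
def incremental_speedup(delta):
    if delta <= 0:
        raise ValueError("delta must be a positive database-growth fraction")
    return (1 + delta) / delta

for delta in (0.05, 0.10, 0.25, 0.50):
    print(f"database grew by {delta:.0%}: incremental search ~{incremental_speedup(delta):.1f}x faster")
# e.g. 10% growth -> about 11x faster than re-running the full search
```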
837

Directive-Based Data Partitioning and Pipelining and Auto-Tuning for High-Performance GPU Computing

Cui, Xuewen 15 December 2020 (has links)
The computer science community needs simpler mechanisms to achieve the performance potential of accelerators, such as graphics processing units (GPUs), field-programmable gate arrays (FPGAs), and co-processors (e.g., Intel Xeon Phi), due to their increasing use in state-of-the-art supercomputers. Over the past 10 years, we have seen a significant improvement in both computing power and memory-connection bandwidth for accelerators. However, we also observe that computational power has grown significantly faster than the interconnect bandwidth between the central processing unit (CPU) and the accelerator. Given that accelerators generally have their own discrete memory space, data needs to be copied from the CPU host memory to the accelerator (device) memory before computation starts on the accelerator. Moreover, programming models like CUDA, OpenMP, OpenACC, and OpenCL can efficiently offload compute-intensive workloads to these accelerators. However, achieving the overlap of data transfers with kernel computation in these models is neither simple nor straightforward. Instead, codes either copy data to or from the device without overlap, or they require explicit user design and refactoring. Achieving performance can require extensive refactoring and hand-tuning to apply data transfer optimizations, and users must manually partition their dataset whenever its size is larger than device memory, which can be highly difficult when the device memory size is not exposed to the user. As systems become more and more heterogeneous, CPUs are responsible for handling many tasks related to the accelerators: computation and data movement tasks, task dependency checking, and task callbacks. Leaving all control logic to the CPU not only costs extra communication delay over the PCIe bus but also consumes CPU resources, which may affect the performance of other CPU tasks. This thesis work aims to provide efficient directive-based data pipelining approaches for GPUs that tackle these issues and improve performance, programmability, and memory management. / Doctor of Philosophy / Over the past decade, parallel accelerators have become increasingly prominent in this emerging era of "big data, big compute, and artificial intelligence." In more recent supercomputers and datacenter clusters, we find multi-core central processing units (CPUs), many-core graphics processing units (GPUs), field-programmable gate arrays (FPGAs), and co-processors (e.g., Intel Xeon Phi) being used to accelerate many kinds of computation tasks. While many new programming models have been proposed to support these accelerators, scientists or developers without domain knowledge usually find existing programming models not efficient enough to port their code to accelerators. Due to the limited accelerator on-chip memory size, the data array size is often too large to fit in the on-chip memory, especially when dealing with deep learning tasks. The data need to be partitioned and managed properly, which requires more hand-tuning effort. Moreover, it is difficult for developers to achieve high performance for specific applications due to a lack of domain knowledge. To handle these problems, this dissertation proposes a general approach to provide better programmability, performance, and data management for accelerators. Accelerator users often prefer to keep their existing verified C, C++, or Fortran code rather than grapple with unfamiliar code.
Since 2013, OpenMP has provided a straightforward way to adapt existing programs to accelerated systems. We propose multiple associated clauses to help developers easily partition and pipeline accelerated code. Specifically, the proposed extension can efficiently overlap kernel computation with data transfer between host and device. The extension supports memory over-subscription, meaning the memory required by the tasks can be larger than the GPU memory size; the internal scheduler guarantees that data is swapped out correctly and efficiently. Machine learning methods are also leveraged to help auto-tune accelerator performance.
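The overlap being automated can be pictured as a double-buffered pipeline: while one data partition is being computed on, the next one is already in flight to the device. The host-side sketch below is a generic, language-level illustration of that idea in plain Python with stand-in functions; it is not the proposed OpenMP clauses and not real GPU code:

```python
# Conceptual double-buffering sketch of the transfer/compute overlap that the
# proposed clauses aim to automate. Plain host-side Python with stand-in
# functions -- not the OpenMP extension and not real GPU code.
from concurrent.futures import ThreadPoolExecutor

def transfer_to_device(chunk):
    # stand-in for a host-to-device copy of one data partition
    return list(chunk)

def compute_on_device(device_chunk):
    # stand-in for a GPU kernel applied to one partition
    return [x * 2 for x in device_chunk]

def pipelined(chunks):
    """Compute on partition i while partition i+1 is being transferred."""
    results = []
    with ThreadPoolExecutor(max_workers=1) as copier:
        in_flight = copier.submit(transfer_to_device, chunks[0])
        for nxt in chunks[1:]:
            ready = in_flight.result()                          # wait for current partition
            in_flight = copier.submit(transfer_to_device, nxt)  # start copying the next one
            results.extend(compute_on_device(ready))            # overlap: compute now
        results.extend(compute_on_device(in_flight.result()))
    return results

print(pipelined([[1, 2], [3, 4], [5, 6]]))   # -> [2, 4, 6, 8, 10, 12]
```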
838

Scalable and Energy Efficient Execution Methods for Multicore Systems

Li, Dong 16 February 2011 (has links)
Multicore architectures impose great pressure on resource management. The exploration spaces available for resource management increase explosively, especially for large-scale, high-end computing systems. The availability of abundant parallelism causes scalability concerns at all levels. Multicore architectures also impose pressure on power management: growth in the number of cores causes continuous growth in power consumption. In this dissertation, we introduce methods and techniques to enable scalable and energy-efficient execution of parallel applications on multicore architectures. We study strategies and methodologies that combine dynamic concurrency throttling (DCT) and dynamic voltage and frequency scaling (DVFS) for the hybrid MPI/OpenMP programming model. Our algorithms yield substantial energy savings (8.74% on average and up to 13.8%) with either negligible performance loss or performance gain (up to 7.5%). To save additional energy for high-end computing systems, we propose a power-aware MPI task aggregation framework. The framework predicts the performance effect of task aggregation in both computation and communication phases and its impact in terms of execution time and energy of MPI programs. Our framework provides accurate predictions that lead to substantial energy savings through aggregation (64.87% on average and up to 70.03%) with tolerable performance loss (under 5%). As we aggregate multiple MPI tasks within the same node, we face the scalability concern of memory registration for high-performance networking. We propose a new memory registration/deregistration strategy that uses helper threads to reduce registered memory on multicore architectures. We investigate design policies and performance implications of the helper-thread approach. Our method efficiently reduces registered memory (23.62% on average and up to 49.39%) and avoids memory registration/deregistration costs for reused communication memory. Our system enables the execution of application input sets that could not run to completion under the memory registration limitation. / Ph. D.
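To make the DCT + DVFS combination concrete, the toy sketch below searches a small (thread count, frequency) space for the lowest-energy configuration that keeps predicted slowdown within a bound. The time and power models are invented stand-ins, not the dissertation's empirical predictors:

```python
# Toy configuration search in the spirit of combined DCT + DVFS: choose the
# (thread count, frequency) pair with the lowest predicted energy while keeping
# predicted slowdown within a bound. The models below are invented stand-ins.
def predicted_time(threads, freq_ghz, base_time=100.0, parallel_frac=0.9):
    serial = base_time * (1 - parallel_frac)
    parallel = base_time * parallel_frac / threads
    return (serial + parallel) * (2.4 / freq_ghz)      # assume time scales with 1/frequency

def predicted_energy(threads, freq_ghz, idle_w=20.0, per_core_w=8.0):
    power = idle_w + threads * per_core_w * (freq_ghz / 2.4) ** 3   # roughly cubic in frequency
    return power * predicted_time(threads, freq_ghz)

baseline = predicted_time(16, 2.4)                      # fastest configuration
configs = [(t, f) for t in (4, 8, 16) for f in (2.0, 2.2, 2.4)]
feasible = [c for c in configs if predicted_time(*c) <= 1.10 * baseline]   # <= 10% slowdown
threads, freq = min(feasible, key=lambda c: predicted_energy(*c))
print(f"chosen configuration: {threads} threads at {freq} GHz")
```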
839

Improving the Efficiency of Parallel Applications on Multithreaded and Multicore Systems

Curtis-Maury, Matthew 15 April 2008 (has links)
The scalability of parallel applications executing on multithreaded and multicore multiprocessors is often quite limited due to large degrees of contention over shared resources on these systems. In fact, negative scalability frequently occurs, such that a non-negligible performance loss is observed through the use of more processors and cores. In this dissertation, we present a prediction model for identifying efficient operating points of concurrency in multithreaded scientific applications, in terms of performance as the primary objective and power as a secondary one. We also present a runtime system that uses live analysis of hardware event rates through the prediction model to optimize applications dynamically. We discuss a dynamic, phase-aware performance prediction model (DPAPP), which combines statistical learning techniques, including multivariate linear regression and artificial neural networks, with runtime analysis of data collected from hardware event counters to locate optimal operating points of concurrency. We find that the scalability model achieves accuracy approaching 95%, sufficiently accurate to identify improved concurrency levels and thread placements from within real parallel scientific applications. Using DPAPP, we develop a prediction-driven runtime optimization scheme, called ACTOR, which throttles concurrency so that power consumption can be reduced and performance can be set at the knee of the scalability curve of each parallel execution phase in an application. ACTOR successfully identifies and exploits program phases where limited scalability results in a performance loss through the use of more processing elements, providing simultaneous reductions in execution time by 5%-18% and power consumption by 0%-11% across a variety of parallel applications and architectures. Further, we extend DPAPP and ACTOR to include support for runtime adaptation of DVFS, allowing for the synergistic exploitation of concurrency throttling and DVFS from within a single, autonomically-acting library, providing improved energy-efficiency compared to either approach in isolation. / Ph. D.
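The flavor of such counter-driven prediction can be sketched as follows; the event names, numbers, and simple least-squares model are synthetic illustrations, not DPAPP, ACTOR, or their training data:

```python
# Sketch of counter-driven concurrency prediction. Event names, numbers, and the
# least-squares model are synthetic illustrations -- not DPAPP, ACTOR, or their data.
import numpy as np

# Training set: event rates observed while phases ran at a base concurrency,
# paired with speedups later measured at 2, 4, and 8 threads.
# columns = [instructions_per_cycle, L2_misses_per_kinstr, stall_cycle_ratio]
base_event_rates = np.array([
    [1.40, 2.1, 0.15],
    [1.10, 4.9, 0.27],
    [0.80, 9.8, 0.52],
    [1.25, 3.3, 0.20],
    [0.95, 7.0, 0.40],
])
speedups_at = {2: np.array([1.9, 1.8, 1.4, 1.9, 1.6]),
               4: np.array([3.6, 3.1, 1.9, 3.4, 2.4]),
               8: np.array([6.8, 4.9, 2.1, 6.1, 2.9])}

X = np.column_stack([base_event_rates, np.ones(len(base_event_rates))])
models = {t: np.linalg.lstsq(X, y, rcond=None)[0] for t, y in speedups_at.items()}

def choose_threads(event_rates, min_marginal_gain=1.2):
    """Predict speedup at each thread count; stop where extra threads stop paying off."""
    feats = np.concatenate([event_rates, [1.0]])
    best_threads, best_speedup = 1, 1.0
    for t in sorted(models):
        predicted = float(feats @ models[t])
        if predicted > best_speedup * min_marginal_gain:   # knee test
            best_threads, best_speedup = t, predicted
    return best_threads

print(choose_threads(np.array([1.20, 3.8, 0.22])))   # pick a thread count for a new phase
```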
840

Scheduling on Asymmetric Architectures

Blagojevic, Filip 22 July 2008 (has links)
We explore runtime mechanisms and policies for scheduling dynamic multi-grain parallelism on heterogeneous multi-core processors. Heterogeneous multi-core processors integrate conventional cores that run legacy codes with specialized cores that serve as computational accelerators. The term multi-grain parallelism refers to the exposure of multiple dimensions of parallelism from within the runtime system, so as to best exploit a parallel architecture with heterogeneous computational capabilities between its cores and execution units. To maximize performance on heterogeneous multi-core processors, programs need to expose multiple dimensions of parallelism simultaneously. Unfortunately, programming with multiple dimensions of parallelism is to date an ad hoc process, relying heavily on the intuition and skill of programmers. Formal techniques are needed to optimize multi-dimensional parallel program designs. We investigate user- and kernel-level schedulers that dynamically "rightsize" the dimensions and degrees of parallelism on asymmetric parallel platforms. The schedulers address the problem of mapping application-specific concurrency to an architecture with multiple hardware layers of parallelism, without requiring programmer intervention or sophisticated compiler support. Our runtime environment outperforms the native Linux and MPI scheduling environment by up to a factor of 2.7. We also present a model of multi-dimensional parallel computation for steering the parallelization process on heterogeneous multi-core processors. The model predicts with high accuracy the execution time and scalability of a program using conventional processors and accelerators simultaneously. More specifically, the model reveals optimal degrees of multi-dimensional, task-level, and data-level concurrency to maximize performance across cores. We evaluate our runtime policies, as well as the performance model we developed, on an IBM Cell BladeCenter and on a cluster composed of PlayStation 3 nodes, using two realistic bioinformatics applications. / Ph. D.
