  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
61

Benchmarking and Accelerating TensorFlow-based Deep Learning on Modern HPC Systems

Biswas, Rajarshi 12 October 2018 (has links)
No description available.
62

Profiling MPI Primitives in Real-time Using OSU INAM

Sankarapandian Dayala Ganesh R, Kamal Raj 07 October 2020 (has links)
No description available.
63

Representing Multi-Parent Organizational Structures for Use in High Performance Computing Resource Scheduling Algorithms

Brown, Lloyd T. 06 January 2010 (has links) (PDF)
Historically, organizational structures of many universities and corporations have followed a strictly tree-based, hierarchical model. These organizations are defined with no more than one parent organization, and typically resource requirements for the organization could be derived from the parent organization. In recent years, however, many institutions have created interdisciplinary research groups which incorporate multiple fields of research across multiple campus organizations. For example, at Brigham Young University, there exists a biophysics research group, a child organization of both the Department of Biology and the Department of Physics, making it unclear how to define its resource requirements in the context of multiple parents from diverse colleges. As computing resources are allocated to organizations, the requirements of those organizations must be taken into account. However, when organizations have multiple parent organizations, it is unclear which restrictions or allocations are appropriate for the organization, as shown with the biophysics research group described above. Extending the example, if a campus high-performance computing facility restricts resources on an organizational basis, and the Biology and Physics departments are allocated different resource levels, the newly formed biophysics group will need system administrators' intervention to ensure appropriate resource allocation. This document describes a versatile system for modeling organizational structure. The system supports multiple parent organizations, inheritance of arbitrary properties from parents to children, and an extensible mechanism for defining conflict resolution policies when inherited attributes conflict. It allows arbitrary parameters to be applied at any level of the organizational structure. The inherited information can then be used for resource allocation at the campus high-performance computing facility.
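The multi-parent inheritance described in this abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the `Org` class, the `resolve` method, the `max_cores` property, and the core counts are all hypothetical names and values chosen for illustration. A property set directly on an organization wins; otherwise the values inherited from all parents are collected, and a per-property policy is applied only when the parents disagree.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# A policy receives the conflicting inherited values and returns a single value.
Policy = Callable[[List[Any]], Any]


class UndefinedProperty(Exception):
    """Raised when neither an organization nor any ancestor defines a property."""


@dataclass
class Org:
    # Hypothetical model of an organization with any number of parents.
    name: str
    parents: List["Org"] = field(default_factory=list)
    props: Dict[str, Any] = field(default_factory=dict)  # values set directly here

    def resolve(self, key: str, policies: Dict[str, Policy]) -> Any:
        """Return the effective value of `key`, inheriting from parents if needed."""
        if key in self.props:  # a locally set value overrides inheritance
            return self.props[key]
        inherited = []
        for parent in self.parents:
            try:
                inherited.append(parent.resolve(key, policies))
            except UndefinedProperty:
                continue  # this branch of the hierarchy does not define the key
        if not inherited:
            raise UndefinedProperty(f"{key!r} is not defined for {self.name}")
        if all(value == inherited[0] for value in inherited):
            return inherited[0]  # parents agree, so there is no conflict
        if key not in policies:
            raise ValueError(f"conflicting values for {key!r} and no policy given")
        return policies[key](inherited)  # parents disagree: apply the policy


# Illustrative numbers only; the abstract does not give actual allocations.
biology = Org("Department of Biology", props={"max_cores": 256})
physics = Org("Department of Physics", props={"max_cores": 1024})
biophysics = Org("Biophysics research group", parents=[biology, physics])

print(biophysics.resolve("max_cores", {"max_cores": min}))  # most restrictive parent
```

Swapping `min` for `max` or `sum` changes the resolution policy without touching the hierarchy, which is one plausible reading of the "extensible mechanism" the abstract mentions.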
64

OpenFPM: A scalable environment for particle and particle-mesh codes on parallel computers

Incardona, Pietro 30 August 2022 (has links)
Scalable and efficient numerical simulations continue to gain importance, as computation is a firmly established tool of discovery, together with theory and experiment. Meanwhile, the performance of computing hardware grows with increasingly heterogeneous hardware, enabling simulations of ever more complex models. However, efficiently implementing scalable codes on heterogeneous, distributed hardware systems becomes the bottleneck. This bottleneck can be alleviated by intermediate software layers that provide higher-level abstractions closer to the problem domain, hence allowing the computational scientist to focus on the simulation. Here, we present OpenFPM, an open and scalable framework that provides an abstraction layer for numerical simulations using particles and/or meshes. OpenFPM provides transparent and scalable infrastructure for shared-memory and distributed-memory implementations of particles-only and hybrid particle-mesh simulations of both discrete and continuous models, as well as non-simulation codes. This infrastructure is complemented with frequently used numerical routines, as well as interfaces to third-party libraries. This thesis will present the architecture and design of OpenFPM, detail the underlying abstractions, and benchmark the framework in applications ranging from Smoothed-Particle Hydrodynamics (SPH) to Molecular Dynamics (MD), Discrete Element Methods (DEM), Vortex Methods, stencil codes, high-dimensional Monte Carlo sampling (CMA-ES), and Reaction-Diffusion solvers, comparing it to the current state of the art and existing software frameworks.
65

General Resource Management for Computationally Demanding Scientific Software

Xinchen Guo (13965024) 17 October 2022 (has links)
Many scientific problems contain nonlinear systems of equations that require multiple iterations to reach converged results. Such a software pattern follows the bulk synchronous parallel model. In that sense, an iteration is a superstep, which includes computation on local data, global communication to update data for the next iteration, and synchronization between iterations. In modern HPC environments, MPI is used to distribute data and OpenMP is used to accelerate the computation on each portion of the data. More MPI processes increase the cost of communication and synchronization, whereas more OpenMP threads increase the overhead of multithreading. A proper combination of MPI and OpenMP is critical to accelerate each superstep. Proper orchestration of MPI processes and OpenMP threads is also needed to efficiently use the underlying hardware resources.
Purdue’s multi-purpose nanodevice simulation tool NEMO5 distributes the computation of independent spectral points by MPI. The computation of each spectral point is accelerated with OpenMP threads. A few examples of resource utilization optimizations are presented. One type of simulation applies the non-equilibrium Green’s function method to accurately predict drug molecules. Our profiling results suggest the optimum combination has more MPI processes and fewer OpenMP threads. However, NEMO5's memory usage has large spikes for each spectral point. Such behavior limits the concurrency of spectral point calculation, because HPC nodes lack swap space to prevent out-of-memory failures.
A distributed resource management framework is proposed and developed to automatically and dynamically manage memory and CPU usage. The concurrent calculation of spectral points is pipelined to avoid simultaneous peak memory usage. This allows more MPI processes and fewer OpenMP threads for higher parallel efficiency. Automatic CPU usage adjustment also reduces the time needed to fill and drain the calculation pipeline. The resource management framework requires minimal code intrusion and successfully speeds up the calculation. It can also be generalized for other simulation software.
66

Cryopreservation and Hypothermal Storage of Hematopoietic Stem Cells

AlMulhem, Norah 11 September 2015 (has links)
No description available.
67

A Framework For Elastic Execution of Existing MPI Programs

Raveendran, Aarthi 08 September 2011 (has links)
No description available.
68

Optimizing All-to-All and Allgather Communications on GPGPU Clusters

Singh, Ashish Kumar 25 June 2012 (has links)
No description available.
69

FEASIBILITY STUDIES OF STATISTIC MULTIPLEXED COMPUTING

Celik, Yasin January 2018 (has links)
In 2012, when Professor Shi introduced me to the concept of Statistic Multiplexed Computing (SMC), I was skeptical. It contradicted everything I had learned and heard about distributed and parallel computing. However, I did believe that unhandled failures in any application will negatively impact its scalability. For that reason, I agreed to take on the feasibility study of SMC for practical applications. After more than six years of research and experimentation, it became clear to me that the most widely believed misconception is “either performance or reliability” when upscaling a distributed application. This misconception was the result of the direct use of hop-by-hop communication protocols in distributed application construction. Terminology: A hop-by-hop data protocol is a two-sided reliable lossless data communication protocol for transmitting data between a sender and a receiver. A crash of either the sender or the receiver will cause data loss. Examples: MPI, RPC, RMI, OpenMP. An end-to-end data protocol is a single-sided reliable lossless data communication protocol for transmitting data between application programs. All processors, networks and storage available at runtime are automatically dispatched in best-effort support of reliable communication, regardless of transient and permanent device failures. Examples: HDFS, Blockchain, Fabric and SMC. An active end-to-end data protocol is a single-sided reliable lossless data communication protocol for transmitting data and automatically synchronizing application programs. Example: SMC (AnkaCom, AnkaStore (this dissertation)). Unlike the hop-by-hop protocols, the use of an end-to-end protocol forms an application-dependent overlay network. An overlay network for distributed and parallel computing applications, such as Blockchain, has been proven to defy the “common wisdom” for two important distributed computing challenges: a) Extreme scale computing without single-point failures is practically feasible. Thus, all transaction or data losses can be eliminated. b) Extreme scale synchronized transaction replication is practically feasible. Thus, the CAP conjecture and theorem become irrelevant. Unlike passive overlay networks, such as HDFS and Blockchain, this dissertation study proves that an active overlay network can deliver higher performance, higher reliability and security at the same time as the application scales up. Although application-level security is not part of this dissertation, it is easy to see that application-level end-to-end protocols will fundamentally eliminate “man-in-the-middle” attacks. This will nullify many well-known attacks. With the zero-single-point failure and zero impact synchronous replication features, SMC applications are naturally resistant to DDoS and ransomware attacks. This dissertation explores practical implementations of the SMC concept for compute intensive (CI) and data intensive (DI) applications. This defense will disclose the details of CI and DI runtime implementations and results of inductive computational experiments. The computational environments include the NSF Chameleon bare-metal HPC cloud and Temple’s TCloud cluster. / Computer and Information Science
70

Scalable and Productive Data Management for High-Performance Analytics

Youssef, Karim Yasser Mohamed Yousri 07 November 2023 (has links)
Advancements in data acquisition technologies across different domains, from genome sequencing to satellite and telescope imaging to large-scale physics simulations, are leading to an exponential growth in dataset sizes. Extracting knowledge from this wealth of data enables scientific discoveries at unprecedented scales. However, the sheer volume of the gathered datasets is a bottleneck for knowledge discovery. High-performance computing (HPC) provides a scalable infrastructure to extract knowledge from these massive datasets. However, multiple data management performance gaps exist between big data analytics software and HPC systems. These gaps arise from multiple factors, including the tradeoff between performance and programming productivity, data growth at a faster rate than memory capacity, and the high storage footprints of data analytics workflows. This dissertation bridges these gaps by combining productive data management interfaces with application-specific optimizations of data parallelism, memory operation, and storage management. First, we address the performance-productivity tradeoff by leveraging Spark and optimizing input data partitioning. Our solution optimizes programming productivity while achieving comparable performance to the Message Passing Interface (MPI) for scalable bioinformatics. Second, we address the operating system's kernel limitations for out-of-core data processing by autotuning memory management parameters in userspace. Finally, we address I/O and storage efficiency bottlenecks in data analytics workflows that iteratively and incrementally create and reuse persistent data structures such as graphs, data frames, and key-value datastores. / Doctor of Philosophy / Advancements in various fields, like genetics, satellite imaging, and physics simulations, are generating massive amounts of data. Analyzing this data can lead to groundbreaking scientific discoveries. However, the sheer size of these datasets presents a challenge. 
High-performance computing (HPC) offers a solution to process and understand this data efficiently. Still, several issues hinder the performance of big data analytics software on HPC systems. These problems include finding the right balance between performance and ease of programming, dealing with the challenges of handling massive amounts of data, and optimizing storage usage. This dissertation focuses on three areas to improve high-performance data analytics (HPDA). Firstly, it demonstrates how using Spark and optimized data partitioning can improve programming productivity while achieving scalability similar to that of the Message Passing Interface (MPI) for scalable bioinformatics. Secondly, it addresses the limitations of the operating system's memory management for processing data that is too large to fit entirely in memory. Lastly, it tackles the efficiency issues related to input/output operations and storage when dealing with data structures like graphs, data frames, and key-value datastores in iterative and incremental workflows.
