  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

The Design of Cloud-Economical Computing Services for Program Trading

Hsu, Chi-Shin 26 August 2012 (has links)
Program trading has become more popular in recent years. According to the statistics, it accounted for about 53.6% of daily trading volume in the United States, rising to 73% in 2009. With the spread of program trading, more people have begun to research it. The purpose of this paper is to construct a development platform for program trading research and development. In addition to the development platform, we provide the run-time environment and three main functions: (1) a job scheduler, (2) high scalability, and (3) the development platform itself. In this paper, we use SLURM to implement an economical computing service for program trading. SLURM is resource-management software for large clusters; however, it lacks an easy interface for end users. We modify Xinetd to serve as the external interface to SLURM and implement the program trading development platform for research and development. According to the results, using our scheduler together with the external interface built from Xinetd is effective in controlling server resources and increasing availability.
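The abstract describes using xinetd as a thin external interface in front of SLURM. A minimal sketch of that idea is shown below: xinetd launches a handler with the client socket connected to stdin/stdout, and the handler submits the requested work with sbatch. The one-line request protocol and script path are assumptions for illustration, not the thesis's actual interface.

```python
#!/usr/bin/env python3
"""Sketch of an xinetd-launched handler that forwards a request to SLURM.

xinetd connects the client's TCP socket to stdin/stdout, so reading a line
here reads the client's request. The request format (a path to a trading
strategy batch script) is an assumption for illustration.
"""
import subprocess
import sys


def main() -> None:
    # One line from the client names the batch script to run.
    request = sys.stdin.readline().strip()
    if not request:
        sys.stdout.write("ERROR empty request\n")
        return

    # Hand the job to SLURM; with --parsable, sbatch prints the job id.
    result = subprocess.run(
        ["sbatch", "--parsable", request],
        capture_output=True, text=True,
    )
    if result.returncode == 0:
        sys.stdout.write(f"SUBMITTED {result.stdout.strip()}\n")
    else:
        sys.stdout.write(f"ERROR {result.stderr.strip()}\n")


if __name__ == "__main__":
    main()
```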
2

The Design of Fault Tolerance of Cluster Computing Platform

Liao, Yu-tien 29 August 2012 (has links)
If nodes fail in a distributed application service, it not only costs more to handle the missing results but also places additional load on the scheduler. To avoid recalculating all results when a fault occurs, only the data of the failed nodes is recalculated on backup machines. This paper therefore experiments with three methods: N + N nodes, N + 1 nodes, and N + 1 nodes with probability, and analyzes their pros and cons. The third method assigns each job a weight before dispatching it and converts that weight into a probability and a nice value (defined by SLURM [1]) to influence the scheduler's ordering of jobs. When a fault occurs, the results computed on healthy nodes are returned to the control node, and the failed node's jobs are either reassigned or not reassigned to a backup machine in order to obtain complete results. Finally, we analyze the advantages and disadvantages of these three approaches.
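The third method converts job weights into probabilities and SLURM nice values. A minimal sketch of that conversion is given below; the nice-value range and the weight-to-nice mapping are assumptions for illustration, not the thesis's scheme.

```python
"""Sketch of the weight-to-priority idea: job weights are normalized into
probabilities, and each probability is mapped to a SLURM nice value
(a lower nice value reduces priority less, so the job is scheduled earlier).
The mapping and the nice range used here are illustrative assumptions.
"""
import subprocess

NICE_RANGE = 10000  # assumed span of nice values to spread jobs across


def submit_with_weights(jobs: dict[str, float]) -> None:
    total = sum(jobs.values())
    for script, weight in jobs.items():
        probability = weight / total
        # Heavier (more probable) jobs get a lower nice value,
        # nudging the scheduler to run them first.
        nice = int((1.0 - probability) * NICE_RANGE)
        subprocess.run(["sbatch", f"--nice={nice}", script], check=True)


# Example: three jobs with weights 5, 3, and 2.
submit_with_weights({"job_a.sh": 5.0, "job_b.sh": 3.0, "job_c.sh": 2.0})
```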
3

Isolation of Temporary Storage in High Performance Computing via Linux Namespacing

Satchwell, Steven Tanner 01 June 2018 (has links)
Per-job isolation of temporary file storage in High Performance Computing (HPC) environments provides benefits in security, efficiency, and administration. HPC system administrators can use the mount_isolation Slurm task plugin to improve security by isolating temporary files where no isolation previously existed. The mount_isolation plugin also increases efficiency by removing obsolete temporary files immediately after each job terminates. This frees valuable disk space in the HPC environment to be used by other jobs. These two improvements reduce the amount of work system administrators must expend to ensure temporary files are removed in a timely manner. Previous temporary file removal solutions were removal on reboot, manual removal, or removal through a Slurm epilog script. The epilog script was the most effective of these, allowing files to be removed in a timely manner. However, HPC users can have multiple supercomputing jobs running concurrently. Temporary files generated by these concurrent or overlapping jobs are only deleted by the epilog script when all jobs run by that user on the compute node have completed. Even though the user may have only one running job, the temporary directory may still contain temporary files from many previously executed jobs, taking up valuable temporary storage on the compute node. The mount_isolation plugin isolates these temporary files on a per-job basis, allowing prompt removal of obsolete files regardless of job overlap.
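The underlying mechanism is Linux mount namespacing: each job gets its own private view of /tmp backed by a per-job directory. A minimal sketch of that idea follows; it is not the mount_isolation plugin itself, and the spool path is a hypothetical example. It must run with root privileges (or CAP_SYS_ADMIN).

```python
"""Sketch of per-job /tmp isolation via a Linux mount namespace,
roughly mirroring what a namespace-based Slurm task plugin would do.
The spool directory path is a hypothetical example.
"""
import ctypes
import os
import subprocess

CLONE_NEWNS = 0x00020000  # flag for a new mount namespace
libc = ctypes.CDLL("libc.so.6", use_errno=True)


def isolate_tmp(job_id: str) -> None:
    """Give the current process (and its children, i.e. the job's tasks)
    a private /tmp backed by a per-job directory."""
    private_dir = f"/var/spool/jobtmp/{job_id}"  # hypothetical path
    os.makedirs(private_dir, mode=0o700, exist_ok=True)

    # Enter a new mount namespace so the bind mount below is invisible
    # to other jobs running on the same compute node.
    if libc.unshare(CLONE_NEWNS) != 0:
        raise OSError(ctypes.get_errno(), "unshare(CLONE_NEWNS) failed")

    # Keep mount changes from propagating back to the host namespace,
    # then bind the per-job directory over /tmp.
    subprocess.run(["mount", "--make-rprivate", "/"], check=True)
    subprocess.run(["mount", "--bind", private_dir, "/tmp"], check=True)
```

After the job exits, the per-job directory can be deleted immediately, which is what makes prompt cleanup possible regardless of other jobs the same user still has running.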
4

Objective-Driven Strategies for HPC Job Scheduling

Goponenko, Alexander V 01 January 2024 (has links) (PDF)
As High-Performance Computing (HPC) becomes increasingly prevalent and resource-intensive, there is a growing need for the development of more efficient job schedulers, which play a crucial role in the performance of HPC clusters. This dissertation presents a comprehensive approach to this complex issue, contributing to three major components of the problem: (1) metrics of job packing efficiency and fairness, (2) advanced scheduling algorithms, and (3) job resource utilization prediction techniques. To ensure high relevance of the results, this study emphasizes scheduling objectives. Therefore, scheduling quality metrics are investigated first, yielding a set of metrics that allow comparing alternative schedules and evaluating scheduling goals trade-offs. The set of metrics enables the first comprehensive analysis of effects of different scheduling improvement approaches on several aspects of scheduling quality, covering a variety of list scheduling algorithms as well as constraint programming optimization schedulers. The contribution to the third research area covers techniques to measure and estimate resource usage data. It reports a first-of-a-kind evaluation of various job runtime prediction techniques in improving scheduling quality, demonstrates an approach capable of estimating job parameters beyond the runtime, and explores measuring resources consumed by a job in an HPC cluster. The dissertation concludes with a practical demonstration of these concepts through an I/O-aware scheduling prototype that measures real-time resource utilization, autonomously determines job resource requirements the scheduler needs, and implements full-featured multi-resource backfill scheduling that accounts for the specific properties of the parallel file system bandwidth resource. The study exhibits the advantages of further reducing I/O congestion—beyond the capability of generic I/O-aware scheduling—and presents the Workload-adaptive scheduling strategy that attains such improvement. This approach features a “two-group” approximation technique to maintain efficient performance regardless of zero-throughput job availability. An evaluation conducted on a real HPC cluster demonstrates the effectiveness of the novel strategy.
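The prototype described above treats parallel file system bandwidth as a schedulable resource alongside CPUs during backfill. A minimal sketch of that admission check is given below; the job fields, units, and numbers are illustrative assumptions, not the dissertation's implementation.

```python
"""Sketch of a multi-resource backfill check: a candidate job is backfilled
only if its CPU and parallel-file-system bandwidth requests both fit within
what is currently free, and it is expected to finish before the reservation
held for the highest-priority waiting job. All fields are illustrative.
"""
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    cpus: int
    io_bw: float    # requested PFS bandwidth, GB/s
    runtime: float  # estimated runtime, seconds


def can_backfill(job: Job, free_cpus: int, free_io_bw: float,
                 now: float, reservation_start: float) -> bool:
    fits = job.cpus <= free_cpus and job.io_bw <= free_io_bw
    finishes_in_time = now + job.runtime <= reservation_start
    return fits and finishes_in_time


# Example: 32 free CPUs, 4 GB/s of free PFS bandwidth, and the top-priority
# job reserved to start 3600 seconds from now.
candidate = Job("analysis", cpus=16, io_bw=2.5, runtime=1800)
print(can_backfill(candidate, free_cpus=32, free_io_bw=4.0,
                   now=0.0, reservation_start=3600.0))
```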