1 |
Condor - Job-Managementsystem
Grabner, Rene, 27 June 2002
This talk presents Condor as a job management system for compute clusters. Its operation is demonstrated and explained using an example. Particular attention is given to the checkpointing and migration of processes between different nodes.
|
2 |
GEMS: A Fault Tolerant Grid Job Management System
Tadepalli, Sriram Satish, 08 January 2004
Grid environments are inherently unstable: resources join and leave the environment without any prior notification. Application fault detection, checkpointing, and restart are therefore of foremost importance in Grid environments. The need for fault tolerance is especially acute for large parallel applications, since the failure rate grows with the number of processors and the duration of the computation.
A Grid job management system hides the heterogeneity of the Grid and the complexity of the Grid protocols from the user. The user submits a job to the system, which finds an appropriate resource, submits the job there, and transfers the output files back to the user upon job completion. However, current Grid job management systems do not detect application failures.
The goal of this research is to develop a Grid job management system that can efficiently detect application failures. A failed job is either restarted on the same resource or migrated to another resource and restarted there. The research also aims to identify the role of local resource managers in the fault detection and migration of Grid applications. / Master of Science
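The restart-or-migrate policy described in this abstract can be sketched as a small loop. This is only an illustration under stated assumptions, not the GEMS implementation: the names `Job`, `run_with_migration`, `run_on`, and `max_restarts_per_resource` are hypothetical, and failure detection is reduced to a callback that returns `True` or `False`.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Job:
    """Hypothetical job record; `attempts` logs every resource tried, in order."""
    name: str
    attempts: List[str] = field(default_factory=list)


def run_with_migration(
    job: Job,
    resources: List[str],
    run_on: Callable[[Job, str], bool],
    max_restarts_per_resource: int = 1,
) -> Optional[str]:
    """Try each resource in turn.

    On failure, restart on the same resource up to `max_restarts_per_resource`
    times, then migrate to the next resource. Returns the resource on which
    the job succeeded, or None if every attempt failed.
    """
    for resource in resources:
        # One initial run plus the allowed number of restarts.
        for _ in range(1 + max_restarts_per_resource):
            job.attempts.append(resource)
            if run_on(job, resource):  # assumed failure detector: False = failed
                return resource
    return None


if __name__ == "__main__":
    # Toy runner: the job fails on "cluster-a" and succeeds on "cluster-b".
    job = Job("demo")
    where = run_with_migration(
        job, ["cluster-a", "cluster-b"], lambda j, r: r == "cluster-b"
    )
    print(where, job.attempts)
```

A real system would replace the `run_on` callback with actual submission and monitoring through the local resource manager, and would restart from a checkpoint rather than from scratch where one is available.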
|
3 |
Portable Tools for Interoperable Grids: Modular Architectures and Software for Job and Workflow Management
Tordsson, Johan, January 2009
The emergence of Grid computing infrastructures enables researchers to share resources and collaborate in more efficient ways than before, despite belonging to different organizations and being geographically distributed. While the Grid computing paradigm offers new opportunities, it also gives rise to new difficulties. This thesis investigates methods, architectures, and algorithms for a range of topics in the area of Grid resource management. One studied topic is how to automate and improve resource selection, despite heterogeneity in Grid hardware, software, availability, ownership, and usage policies. Algorithmic difficulties here include characterization of jobs and resources, prediction of resource performance, and data placement considerations. Investigated Quality of Service aspects of resource selection include how to guarantee job start and/or completion times, as well as how to synchronize multiple resources for coordinated use through coallocation. Another explored research topic is architectural considerations for frameworks that simplify and automate submission, monitoring, and fault handling for large numbers of jobs. This thesis also investigates suitable Grid interaction patterns for scientific workflows, studies programming models that enable data parallelism for such workflows, and analyzes how workflow composition tools should be designed to increase flexibility and expressiveness.
Today we have the somewhat paradoxical situation where Grids, originally intended to federate resources and overcome interoperability problems between different computing platforms, themselves struggle with interoperability problems caused by the wide range of interfaces, protocols, and data formats used in different environments. This thesis demonstrates how proof-of-concept software tools for Grid resource management can be made independent of current Grid infrastructures by using (proposed) standard formats and protocols and by leveraging state-of-the-art principles from service-oriented architectures. Further interoperability contributions include an in-depth study that surveys issues related to the use of Grid resources in scientific workflows. This study improves our understanding of interoperability among scientific workflow systems by viewing the topic from three different perspectives: model of computation, workflow language, and execution environment.
A final contribution of this thesis is an investigation of how the design of Grid middleware tools can adopt principles and concepts from software engineering in order to improve, e.g., adaptability and interoperability.
|