• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 34
  • 6
  • 2
  • 1
  • 1
  • 1
  • Tagged with
  • 54
  • 15
  • 15
  • 14
  • 13
  • 12
  • 11
  • 10
  • 10
  • 9
  • 9
  • 9
  • 8
  • 8
  • 6
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
41

Implementation of Forward and Reverse Mode Automatic Differentiation for GNU Octave Applications

Kang, Yixiu 04 April 2003 (has links)
No description available.
42

Hardware assisted memory checkpointing and applications in debugging and reliability

Doudalis, Ioannis 25 July 2011 (has links)
The problems of software debugging and system reliability/availability are among the most challenging problems the computing industry is facing today, with direct impact on the development and operating costs of computing systems. A promising debugging technique that assists programmers identify and fix the causes of software bugs a lot more efficiently is bidirectional debugging, which enables the user to execute the program in "reverse", and a typical method used to recover a system after a fault is backwards error recovery, which restores the system to the last error-free state. Both reverse execution and backwards error recovery are enabled by creating memory checkpoints, which are used to restore the program/system to a prior point in time and re-execute until the point of interest. The checkpointing frequency is the primary factor that affects both the latency of reverse execution and the recovery time of the system; more frequent checkpoints reduce the necessary re-execution time. Frequent creation of checkpoints poses performance challenges, because of the increased number of memory reads and writes necessary for copying the modified system/program memory, and also because of software interventions, additional synchronization and I/O, etc., needed for creating a checkpoint. In this thesis I examine a number of different hardware accelerators, whose role is to create frequent memory checkpoints in the background, at minimal performance overheads. For the purpose of reverse execution, I propose the HARE and Euripus hardware checkpoint accelerators. HARE and Euripus create different types of checkpoints, and employ different methods for keeping track of the modified memory. As a result, HARE and Euripus have different hardware costs and provide different functionality which directly affects the latency of reverse execution. For improving the availability of the system, I propose the Kyma hardware accelerator. Kyma enables simultaneous creation of checkpoints at different frequencies, which allows the system to recover from multiple types of errors and tolerate variable error-detection latencies. The Kyma and Euripus hardware engines have similar architectures, but the functionality of the Kyma engine is optimized for further reducing the performance overheads and improving the reliability of the system. The functionality of the Kyma and Euripus engines can be combined into a unified accelerator that can serve the needs of both bidirectional debugging and system recovery.
43

Développements du modèle adjoint de la différentiation algorithmique destinés aux applications intensives en calcul / Extensions of algorithmic differentiation by source transformation inspired by modern scientific computing

Taftaf, Ala 17 January 2017 (has links)
Le mode adjoint de la Différentiation Algorithmique (DA) est particulièrement intéressant pour le calcul des gradients. Cependant, ce mode utilise les valeurs intermédiaires de la simulation d'origine dans l'ordre inverse à un coût qui augmente avec la longueur de la simulation. La DA cherche des stratégies pour réduire ce coût, par exemple en profitant de la structure du programme donné. Dans ce travail, nous considérons d'une part le cas des boucles à point-fixe pour lesquels plusieurs auteurs ont proposé des stratégies adjointes adaptées. Parmi ces stratégies, nous choisissons celle de B. Christianson. Nous spécifions la méthode choisie et nous décrivons la manière dont nous l'avons implémentée dans l'outil de DA Tapenade. Les expériences sur une application de taille moyenne montrent une réduction importante de la consommation de mémoire. D'autre part, nous étudions le checkpointing dans le cas de programmes parallèles MPI avec des communications point-à-point. Nous proposons des techniques pour appliquer le checkpointing à ces programmes. Nous fournissons des éléments de preuve de correction de nos techniques et nous les expérimentons sur des codes représentatifs. Ce travail a été effectué dans le cadre du projet européen ``AboutFlow'' / The adjoint mode of Algorithmic Differentiation (AD) is particularly attractive for computing gradients. However, this mode needs to use the intermediate values of the original simulation in reverse order at a cost that increases with the length of the simulation. AD research looks for strategies to reduce this cost, for instance by taking advantage of the structure of the given program. In this work, we consider on one hand the frequent case of Fixed-Point loops for which several authors have proposed adapted adjoint strategies. Among these strategies, we select the one introduced by B. Christianson. We specify further the selected method and we describe the way we implemented it inside the AD tool Tapenade. Experiments on a medium-size application shows a major reduction of the memory needed to store trajectories. On the other hand, we study checkpointing in the case of MPI parallel programs with point-to-point communications. We propose techniques to apply checkpointing to these programs. We provide proof of correctness of our techniques and we experiment them on representative CFD codes
44

Investigating programming language support for fault-tolerance

Demirkoparan, Ismail January 2023 (has links)
Dataflow systems have become the norm for developing data-intensive computing applications. These systems provide transparent scalability and fault tolerance. For fault tolerance, many dataflow-system adopt a snapshotting approach which persists the state of an operator once it has received a snapshot marker on all its input channels. This approach requires channels to be blocked for potentially prolonged durations until all other input channels have received their markers to guarantee that no events from the future make it into the operator’s present state snapshot. Alignment can for this reason have a severe performance impact. In particular, for black-box user-defined operators, the system has no knowledge about how events from different channels affect the operator’s state. Thus, the system must conservatively assume that all events affect the same state and align all channels. In this thesis, we argue that alignment between two channels is unnecessary if messages from those channels are not written to the same output channel. We propose a snapshotting approach for the fault tolerance and call it partial approach. The partial approach does not require alignment when an operator’s input channels are independent. Two input channels are independent if their events do not affect the same state and are never written to the same output channel. We propose the use of static code analysis to identify such dependencies. To enable this analysis, we translate operators into finite state machines that make the operator’s state explicit. As a proof of concept, we extend the implementation of Arc-Lang, an existing dataflow language, so that applications written in it transparently execute with fault tolerance. We evaluate our approach by comparing it to a baseline eager approach that always requires alignment between the input channels. The conducted experiments’ results show that the partial approach performs about 47 % better than the eager approach when the streaming sources are producing data at different velocities. / Dataflödessystem har blivit normen för utveckling av dataintensiva datorapplikationer. Dessa system erbjuder transparent skalbarhet och felhantering. För felhantering adopterar många dataflödessystem en snapshot-approach som sparar en operatörs tillstånd när den har fått en snapshot-markör på alla sina ingångskanaler. Denna metod kräver att kanalerna blockeras under möjligen förlängda tidsperioder tills alla andra ingångskanaler har fått sina markörer, vilket görs för att garantera att inga händelser från framtiden når operatörens nuvarande tillstånd. Synkronisering mellan kanaler kan därför ha en allvarlig prestandapåverkan. Särskilt för black-box användardefinierade operatörer där systemet inte har kunskap om hur händelser från olika kanaler påverkar operatörens tillstånd. Systemet måste därför konservativt anta att alla händelser påverkar samma tillstånd och synkronisera alla kanaler. I denna avhandling argumenterar vi för att synkroniseringen mellan två kanaler inte är nödvändig om meddelanden från de kanalerna inte skrivs till samma utgångskanal. Vi föreslår en snapshot-approach för felhantering och kallar den för partial-approach. Partial-approach kräver inte justering när en operatörs ingångskanaler är oberoende. Två ingångskanaler är oberoende om deras händelser inte påverkar samma tillstånd och aldrig skrivs till samma utgångskanal. Vi föreslår användning av statisk kodanalys för att identifiera sådana beroenden. För att möjliggöra denna analys översätter vi operatörer till finite state machines som gör operatörens tillstånd explicit. För att bevisa konceptet utökar vi implementeringen av Arc-Lang, vilket är en existerande dataflödesspråk, så att program skrivna i den transparent körs med felhantering. Vi utvärderar vår approach genom att jämföra den med en baseline eager-approach som alltid kräver justering mellan ingångskanalerna. Resultaten från de genomförda experimenten visar att partial-approach presterar cirka 47 % bättre än eager-approach när sourcestreams producerar data i otakt.
45

Elastic, Interoperable and Container-based Cloud Infrastructures for High Performance Computing

López Huguet, Sergio 02 September 2021 (has links)
Tesis por compendio / [ES] Las aplicaciones científicas implican generalmente una carga computacional variable y no predecible a la que las instituciones deben hacer frente variando dinámicamente la asignación de recursos en función de las distintas necesidades computacionales. Las aplicaciones científicas pueden necesitar grandes requisitos. Por ejemplo, una gran cantidad de recursos computacionales para el procesado de numerosos trabajos independientes (High Throughput Computing o HTC) o recursos de alto rendimiento para la resolución de un problema individual (High Performance Computing o HPC). Los recursos computacionales necesarios en este tipo de aplicaciones suelen acarrear un coste muy alto que puede exceder la disponibilidad de los recursos de la institución o estos pueden no adaptarse correctamente a las necesidades de las aplicaciones científicas, especialmente en el caso de infraestructuras preparadas para la ejecución de aplicaciones de HPC. De hecho, es posible que las diferentes partes de una aplicación necesiten distintos tipos de recursos computacionales. Actualmente las plataformas de servicios en la nube se han convertido en una solución eficiente para satisfacer la demanda de las aplicaciones HTC, ya que proporcionan un abanico de recursos computacionales accesibles bajo demanda. Por esta razón, se ha producido un incremento en la cantidad de clouds híbridos, los cuales son una combinación de infraestructuras alojadas en servicios en la nube y en las propias instituciones (on-premise). Dado que las aplicaciones pueden ser procesadas en distintas infraestructuras, actualmente la portabilidad de las aplicaciones se ha convertido en un aspecto clave. Probablemente, las tecnologías de contenedores son la tecnología más popular para la entrega de aplicaciones gracias a que permiten reproducibilidad, trazabilidad, versionado, aislamiento y portabilidad. El objetivo de la tesis es proporcionar una arquitectura y una serie de servicios para proveer infraestructuras elásticas híbridas de procesamiento que puedan dar respuesta a las diferentes cargas de trabajo. Para ello, se ha considerado la utilización de elasticidad vertical y horizontal desarrollando una prueba de concepto para proporcionar elasticidad vertical y se ha diseñado una arquitectura cloud elástica de procesamiento de Análisis de Datos. Después, se ha trabajo en una arquitectura cloud de recursos heterogéneos de procesamiento de imágenes médicas que proporciona distintas colas de procesamiento para trabajos con diferentes requisitos. Esta arquitectura ha estado enmarcada en una colaboración con la empresa QUIBIM. En la última parte de la tesis, se ha evolucionado esta arquitectura para diseñar e implementar un cloud elástico, multi-site y multi-tenant para el procesamiento de imágenes médicas en el marco del proyecto europeo PRIMAGE. Esta arquitectura utiliza un almacenamiento distribuido integrando servicios externos para la autenticación y la autorización basados en OpenID Connect (OIDC). Para ello, se ha desarrollado la herramienta kube-authorizer que, de manera automatizada y a partir de la información obtenida en el proceso de autenticación, proporciona el control de acceso a los recursos de la infraestructura de procesamiento mediante la creación de las políticas y roles. Finalmente, se ha desarrollado otra herramienta, hpc-connector, que permite la integración de infraestructuras de procesamiento HPC en infraestructuras cloud sin necesitar realizar cambios en la infraestructura HPC ni en la arquitectura cloud. Cabe destacar que, durante la realización de esta tesis, se han utilizado distintas tecnologías de gestión de trabajos y de contenedores de código abierto, se han desarrollado herramientas y componentes de código abierto y se han implementado recetas para la configuración automatizada de las distintas arquitecturas diseñadas desde la perspectiva DevOps. / [CA] Les aplicacions científiques impliquen generalment una càrrega computacional variable i no predictible a què les institucions han de fer front variant dinàmicament l'assignació de recursos en funció de les diferents necessitats computacionals. Les aplicacions científiques poden necessitar grans requisits. Per exemple, una gran quantitat de recursos computacionals per al processament de nombrosos treballs independents (High Throughput Computing o HTC) o recursos d'alt rendiment per a la resolució d'un problema individual (High Performance Computing o HPC). Els recursos computacionals necessaris en aquest tipus d'aplicacions solen comportar un cost molt elevat que pot excedir la disponibilitat dels recursos de la institució o aquests poden no adaptar-se correctament a les necessitats de les aplicacions científiques, especialment en el cas d'infraestructures preparades per a l'avaluació d'aplicacions d'HPC. De fet, és possible que les diferents parts d'una aplicació necessiten diferents tipus de recursos computacionals. Actualment les plataformes de servicis al núvol han esdevingut una solució eficient per satisfer la demanda de les aplicacions HTC, ja que proporcionen un ventall de recursos computacionals accessibles a demanda. Per aquest motiu, s'ha produït un increment de la quantitat de clouds híbrids, els quals són una combinació d'infraestructures allotjades a servicis en el núvol i a les mateixes institucions (on-premise). Donat que les aplicacions poden ser processades en diferents infraestructures, actualment la portabilitat de les aplicacions s'ha convertit en un aspecte clau. Probablement, les tecnologies de contenidors són la tecnologia més popular per a l'entrega d'aplicacions gràcies al fet que permeten reproductibilitat, traçabilitat, versionat, aïllament i portabilitat. L'objectiu de la tesi és proporcionar una arquitectura i una sèrie de servicis per proveir infraestructures elàstiques híbrides de processament que puguen donar resposta a les diferents càrregues de treball. Per a això, s'ha considerat la utilització d'elasticitat vertical i horitzontal desenvolupant una prova de concepte per proporcionar elasticitat vertical i s'ha dissenyat una arquitectura cloud elàstica de processament d'Anàlisi de Dades. Després, s'ha treballat en una arquitectura cloud de recursos heterogenis de processament d'imatges mèdiques que proporciona distintes cues de processament per a treballs amb diferents requisits. Aquesta arquitectura ha estat emmarcada en una col·laboració amb l'empresa QUIBIM. En l'última part de la tesi, s'ha evolucionat aquesta arquitectura per dissenyar i implementar un cloud elàstic, multi-site i multi-tenant per al processament d'imatges mèdiques en el marc del projecte europeu PRIMAGE. Aquesta arquitectura utilitza un emmagatzemament integrant servicis externs per a l'autenticació i autorització basats en OpenID Connect (OIDC). Per a això, s'ha desenvolupat la ferramenta kube-authorizer que, de manera automatitzada i a partir de la informació obtinguda en el procés d'autenticació, proporciona el control d'accés als recursos de la infraestructura de processament mitjançant la creació de les polítiques i rols. Finalment, s'ha desenvolupat una altra ferramenta, hpc-connector, que permet la integració d'infraestructures de processament HPC en infraestructures cloud sense necessitat de realitzar canvis en la infraestructura HPC ni en l'arquitectura cloud. Es pot destacar que, durant la realització d'aquesta tesi, s'han utilitzat diferents tecnologies de gestió de treballs i de contenidors de codi obert, s'han desenvolupat ferramentes i components de codi obert, i s'han implementat receptes per a la configuració automatitzada de les distintes arquitectures dissenyades des de la perspectiva DevOps. / [EN] Scientific applications generally imply a variable and an unpredictable computational workload that institutions must address by dynamically adjusting the allocation of resources to their different computational needs. Scientific applications could require a high capacity, e.g. the concurrent usage of computational resources for processing several independent jobs (High Throughput Computing or HTC) or a high capability by means of using high-performance resources for solving complex problems (High Performance Computing or HPC). The computational resources required in this type of applications usually have a very high cost that may exceed the availability of the institution's resources or they are may not be successfully adapted to the scientific applications, especially in the case of infrastructures prepared for the execution of HPC applications. Indeed, it is possible that the different parts that compose an application require different type of computational resources. Nowadays, cloud service platforms have become an efficient solution to meet the need of HTC applications as they provide a wide range of computing resources accessible on demand. For this reason, the number of hybrid computational infrastructures has increased during the last years. The hybrid computation infrastructures are the combination of infrastructures hosted in cloud platforms and the computation resources hosted in the institutions, which are named on-premise infrastructures. As scientific applications can be processed on different infrastructures, the application delivery has become a key issue. Nowadays, containers are probably the most popular technology for application delivery as they ease reproducibility, traceability, versioning, isolation, and portability. The main objective of this thesis is to provide an architecture and a set of services to build up hybrid processing infrastructures that fit the need of different workloads. Hence, the thesis considered aspects such as elasticity and federation. The use of vertical and horizontal elasticity by developing a proof of concept to provide vertical elasticity on top of an elastic cloud architecture for data analytics. Afterwards, an elastic cloud architecture comprising heterogeneous computational resources has been implemented for medical imaging processing using multiple processing queues for jobs with different requirements. The development of this architecture has been framed in a collaboration with a company called QUIBIM. In the last part of the thesis, the previous work has been evolved to design and implement an elastic, multi-site and multi-tenant cloud architecture for medical image processing has been designed in the framework of a European project PRIMAGE. This architecture uses a storage integrating external services for the authentication and authorization based on OpenID Connect (OIDC). The tool kube-authorizer has been developed to provide access control to the resources of the processing infrastructure in an automatic way from the information obtained in the authentication process, by creating policies and roles. Finally, another tool, hpc-connector, has been developed to enable the integration of HPC processing infrastructures into cloud infrastructures without requiring modifications in both infrastructures, cloud and HPC. It should be noted that, during the realization of this thesis, different contributions to open source container and job management technologies have been performed by developing open source tools and components and configuration recipes for the automated configuration of the different architectures designed from the DevOps perspective. The results obtained support the feasibility of the vertical elasticity combined with the horizontal elasticity to implement QoS policies based on a deadline, as well as the feasibility of the federated authentication model to combine public and on-premise clouds. / López Huguet, S. (2021). Elastic, Interoperable and Container-based Cloud Infrastructures for High Performance Computing [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/172327 / Compendio
46

High Performance Scientific Computing over Hybrid Cloud Platforms

Calatrava Arroyo, Amanda 16 December 2016 (has links)
Tesis por compendio / Scientific applications generally require large computational requirements, memory and data management for their execution. Such applications have traditionally used high-performance resources, such as shared memory supercomputers, clusters of PCs with distributed memory, or resources from Grid infrastructures on which the application needs to be adapted to run successfully. In recent years, the advent of virtualization techniques, together with the emergence of Cloud Computing, has caused a major shift in the way these applications are executed. However, the execution management of scientific applications on high performance elastic platforms is not a trivial task. In this doctoral thesis, Elastic Cloud Computing Cluster (EC3) has been developed. EC3 is an open-source tool able to execute high performance scientific applications by creating self-managed cost-efficient virtual hybrid elastic clusters on top of IaaS Clouds. These self-managed clusters have the capability to adapt the size of the cluster, i.e. the number of nodes, to the workload, thus creating the illusion of a real cluster without requiring an investment beyond the actual usage. They can be fully customized and migrated from one provider to another, in an automatically and transparent process for the users and jobs running in the cluster. EC3 can also deploy hybrid clusters across on-premises and public Cloud resources, where on-premises resources are supplemented with public Cloud resources to accelerate the execution process. Different instance types and the use of spot instances combined with on-demand resources are also cluster configurations supported by EC3. Moreover, using spot instances, together with checkpointing techniques, the tool can significantly reduce the total cost of executions while introducing automatic fault tolerance. EC3 is conceived to facilitate the use of virtual clusters to users, that might not have an extensive knowledge about these technologies, but they can benefit from them. Thus, the tool offers two different interfaces for its users, a web interface where EC3 is exposed as a service for non-experienced users and a powerful command line interface. Moreover, this thesis explores the field of light-weight virtualization using containers as an alternative to the traditional virtualization solution based on virtual machines. This study analyzes the suitable scenario for the use of containers and proposes an architecture for the deployment of elastic virtual clusters based on this technology. Finally, to demonstrate the functionality and advantages of the tools developed during this thesis, this document includes several use cases covering different scenarios and fields of knowledge, such as structural analysis of buildings, astrophysics or biodiversity. / Las aplicaciones científicas generalmente precisan grandes requisitos de cómputo, memoria y gestión de datos para su ejecución. Este tipo de aplicaciones tradicionalmente ha empleado recursos de altas prestaciones, como supercomputadores de memoria compartida, clústers de PCs de memoria distribuida, o recursos provenientes de infraestructuras Grid, sobre los que se adaptaba la aplicación para que se ejecutara satisfactoriamente. El auge que han tenido las técnicas de virtualización en los últimos años, propiciando la aparición de la computación en la nube (Cloud Computing), ha provocado un importante cambio en la forma de ejecutar este tipo de aplicaciones. Sin embargo, la gestión de la ejecución de aplicaciones científicas sobre plataformas de computación elásticas de altas prestaciones no es una tarea trivial. En esta tesis doctoral se ha desarrollado Elastic Cloud Computing Cluster (EC3), una herramienta de código abierto capaz de llevar a cabo la ejecución de aplicaciones científicas de altas prestaciones creando para ello clústers virtuales, híbridos y elásticos, autogestionados y eficientes en cuanto a costes, sobre plataformas Cloud de tipo Infraestructura como Servicio (IaaS). Estos clústers autogestionados tienen la capacidad de adaptar su tamaño, es decir, el número de nodos, a la carga de trabajo, creando así la ilusión de un clúster real sin requerir una inversión por encima del uso actual. Además, son completamente configurables y pueden ser migrados de un proveedor a otro de manera automática y transparente a los usuarios y trabajos en ejecución en el cluster. EC3 también permite desplegar clústers híbridos sobre recursos Cloud públicos y privados, donde los recursos privados son complementados con recursos Cloud públicos para acelerar el proceso de ejecución. Otras configuraciones híbridas, como el empleo de diferentes tipos de instancias y el uso de instancias puntuales combinado con instancias bajo demanda son también soportadas por EC3. Además, el uso de instancias puntuales junto con técnicas de checkpointing permite a EC3 reducir significantemente el coste total de las ejecuciones a la vez que proporciona tolerancia a fallos. EC3 está concebido para facilitar el uso de clústers virtuales a los usuarios, que, aunque no tengan un conocimiento extenso sobre este tipo de tecnologías, pueden beneficiarse fácilmente de ellas. Por ello, la herramienta ofrece dos interfaces diferentes a sus usuarios, una interfaz web donde se expone EC3 como servicio para usuarios no experimentados y una potente interfaz de línea de comandos. Además, esta tesis doctoral se adentra en el campo de la virtualización ligera, mediante el uso de contenedores como alternativa a la solución tradicional de virtualización basada en máquinas virtuales. Este estudio analiza el escenario propicio para el uso de contenedores y propone una arquitectura para el despliegue de clusters virtuales elásticos basados en esta tecnología. Finalmente, para demostrar la funcionalidad y ventajas de las herramientas desarrolladas durante esta tesis, esta memoria recoge varios casos de uso que abarcan diferentes escenarios y campos de conocimiento, como estudios estructurales de edificios, astrofísica o biodiversidad. / Les aplicacions científiques generalment precisen grans requisits de còmput, de memòria i de gestió de dades per a la seua execució. Este tipus d'aplicacions tradicionalment hi ha empleat recursos d'altes prestacions, com supercomputadors de memòria compartida, clústers de PCs de memòria distribuïda, o recursos provinents d'infraestructures Grid, sobre els quals s'adaptava l'aplicació perquè s'executara satisfactòriament. L'auge que han tingut les tècniques de virtualitzaciò en els últims anys, propiciant l'aparició de la computació en el núvol (Cloud Computing), ha provocat un important canvi en la forma d'executar este tipus d'aplicacions. No obstant això, la gestió de l'execució d'aplicacions científiques sobre plataformes de computació elàstiques d'altes prestacions no és una tasca trivial. En esta tesi doctoral s'ha desenvolupat Elastic Cloud Computing Cluster (EC3), una ferramenta de codi lliure capaç de dur a terme l'execució d'aplicacions científiques d'altes prestacions creant per a això clústers virtuals, híbrids i elàstics, autogestionats i eficients quant a costos, sobre plataformes Cloud de tipus Infraestructura com a Servici (IaaS). Estos clústers autogestionats tenen la capacitat d'adaptar la seua grandària, es dir, el nombre de nodes, a la càrrega de treball, creant així la il·lusió d'un cluster real sense requerir una inversió per damunt de l'ús actual. A més, són completament configurables i poden ser migrats d'un proveïdor a un altre de forma automàtica i transparent als usuaris i treballs en execució en el cluster. EC3 també permet desplegar clústers híbrids sobre recursos Cloud públics i privats, on els recursos privats són complementats amb recursos Cloud públics per a accelerar el procés d'execució. Altres configuracions híbrides, com l'us de diferents tipus d'instàncies i l'ús d'instàncies puntuals combinat amb instàncies baix demanda són també suportades per EC3. A més, l'ús d'instàncies puntuals junt amb tècniques de checkpointing permet a EC3 reduir significantment el cost total de les execucions al mateix temps que proporciona tolerància a fallades. EC3e stà concebut per a facilitar l'ús de clústers virtuals als usuaris, que, encara que no tinguen un coneixement extensiu sobre este tipus de tecnologies, poden beneficiar-se fàcilment d'elles. Per això, la ferramenta oferix dos interfícies diferents dels seus usuaris, una interfície web on s'exposa EC3 com a servici per a usuaris no experimentats i una potent interfície de línia d'ordres. A més, esta tesi doctoral s'endinsa en el camp de la virtualitzaciò lleugera, per mitjà de l'ús de contenidors com a alternativa a la solució tradicional de virtualitzaciò basada en màquines virtuals. Este estudi analitza l'escenari propici per a l'ús de contenidors i proposa una arquitectura per al desplegament de clusters virtuals elàstics basats en esta tecnologia. Finalment, per a demostrar la funcionalitat i avantatges de les ferramentes desenrotllades durant esta tesi, esta memòria arreplega diversos casos d'ús que comprenen diferents escenaris i camps de coneixement, com a estudis estructurals d'edificis, astrofísica o biodiversitat. / Calatrava Arroyo, A. (2016). High Performance Scientific Computing over Hybrid Cloud Platforms [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/75265 / Compendio
47

Schedules for Dynamic Bidirectional Simulations on Parallel Computers / Schemata für dynamische bidirektionale Simulationen auf Parallelrechnern

Lehmann, Uwe 30 April 2003 (has links) (PDF)
For adjoint calculations, parameter estimation, and similar purposes one may need to reverse the execution of a computer program. The simplest option is to record a complete execution log and then to read it backwards. This requires massive amounts of storage. Instead one may generate the execution log piecewise by restarting the ``forward'' calculation repeatedly from suitably placed checkpoints. This thesis extends the theoretical results of the parallel reversal schedules. First a algorithm was constructed which carries out the ``forward'' calculation and distributes checkpoints in a way, such that the reversal calculation can be started at any time. This approach provides adaptive parallel reversal schedules for simulations where the number of time steps is not known a-priori. The number of checkpoints and processors used is optimal at any time. Further, an algorithm was described which makes is possible to restart the initial computer program during the program reversal. Again, this can be done without any additional computation at any time. Hence, optimal parallel reversal schedules for the bidirectional simulation are provided by this thesis. / Bei der Berechnung von Adjungierten, zum Debuggen und für ähnliche Anwendungen kann man die Umkehr der entsprechenden Programmauswertung verwenden. Der einfachste Ansatz, nämlich das Erstellen einer kompletten Mitschrift der Vorwärtsrechnung, welche anschließend rückwärts gelesen wird, verursacht einen enormen Speicherplatzbedarf. Als Alternative dazu kann man die Mitschrift auch stückweise erzeugen, indem die Programmauswertung von passend gewählten Checkpoints wiederholt gestartet wird. In dieser Arbeit wird die Theorie der optimalen parallelen Umkehrschemata erweitert. Zum einen erfolgt die Konstruktion von adaptiven parallelen Umkehrschemata. Dafür wird ein Algorithmus beschrieben, der es durch die Nutzung von mehreren Prozessen ermöglicht, Checkpoints so zu verteilen, daß die Umkehrung des Programmes jederzeit ohne Zeitverlust erfolgen kann. Hierbei bleibt die Zahl der verwendeten Checkpoints und Prozesse innerhalb der bekannten Optimalitätsgrenzen. Zum anderen konnte für die adaptiven parallelen Umkehrschemata ein Algorithmus entwickelt werden, welcher ein Restart der eigentlichen Programmauswertung basierend auf der laufenden Programmumkehr erlaubt. Dieser Restart kann wieder jederzeit ohne Zeitverlust erfolgen und die entstehenden Checkpointverteilung erfüllen wieder sowohl Optimalitäts- als auch die Adaptivitätskriterien. Zusammenfassend wurden damit in dieser Arbeit Schemata konstruiert, die bidirektionale Simulationen ermöglichen.
48

Improving message logging protocols towards extreme-scale HPC systems / Amélioration des protocoles de journalisation des messages vers des systèmes HPC extrême-échelle

Martsinkevich, Tatiana V. 22 September 2015 (has links)
Les machines pétascale qui existent aujourd'hui ont un temps moyen entre pannes de plusieurs heures. Il est prévu que dans les futurs systèmes ce temps diminuera. Pour cette raison, les applications qui fonctionneront sur ces systèmes doivent être capables de tolérer des défaillances fréquentes. Aujourd'hui, le moyen le plus commun de le faire est d'utiliser le mécanisme de retour arrière global où l'application fait des sauvegardes périodiques à partir d’un point de reprise. Si un processus s'arrête à cause d'une défaillance, tous les processus reviennent en arrière et se relancent à partir du dernier point de reprise. Cependant, cette solution deviendra infaisable à grande échelle en raison des coûts de l'énergie et de l'utilisation inefficace des ressources. Dans le contexte des applications MPI, les protocoles de journalisation des messages offrent un meilleur confinement des défaillances car ils ne demandent que le redémarrage du processus qui a échoué, ou parfois d’un groupe de processus limité. Par contre, les protocoles existants ont souvent un surcoût important en l’absence de défaillances qui empêchent leur utilisation à grande échelle. Ce surcoût provient de la nécessité de sauvegarder de façon fiable tous les événements non-déterministes afin de pouvoir correctement restaurer l'état du processus en cas de défaillance. Ensuite, comme les journaux de messages sont généralement stockés dans la mémoire volatile, la journalisation risque de nécessiter une large utilisation de la mémoire. Une autre tendance importante dans le domaine des HPC est le passage des applications MPI simples aux nouveaux modèles de programmation hybrides tels que MPI + threads ou MPI + tâches en réponse au nombre croissant de cœurs par noeud. Cela offre l’opportunité de gérer les défaillances au niveau du thread / de la tâche contrairement à l'approche conventionnelle qui traite les défaillances au niveau du processus. Par conséquent, le travail de cette thèse se compose de trois parties. Tout d'abord, nous présentons un protocole de journalisation hiérarchique pour atténuer une défaillance de processus. Le protocole s'appelle Scalable Pattern-Based Checkpointing et il exploite un nouveau modèle déterministe appelé channel-determinism ainsi qu’une nouvelle relation always-happens-before utilisée pour mettre partiellement en ordre les événements de l'application. Le protocole est évolutif, son surcoût pendant l'exécution sans défaillance est limité, il n'exige l'enregistrement d'aucun évènement et, enfin, il a une reprise entièrement distribuée. Deuxièmement, afin de résoudre le problème de la limitation de la mémoire sur les nœuds de calcul, nous proposons d'utiliser des ressources dédiées supplémentaires, appelées logger nodes. Tous les messages qui ne rentrent pas dans la mémoire du nœud de calcul sont envoyés aux logger nodes et sauvegardés dans leur mémoire. À travers de nos expériences nous montrons que cette approche est réalisable et, associée avec un protocole de journalisation hiérarchique comme le SPBC, les logger nodes peuvent être une solution ultime au problème de mémoire limitée sur les nœuds de calcul. Troisièmement, nous présentons un protocole de tolérance aux défaillances pour des applications hybrides qui adoptent le modèle de programmation MPI + tâches. Ce protocole s'utilise pour tolérer des erreurs détectées non corrigées qui se produisent lors de l'exécution d'une tâche. Normalement, une telle erreur provoque une exception du système ce qui provoque un arrêt brutal de l'application. Dans ce cas, l'application doit redémarrer à partir du dernier point de reprise. Nous combinons la sauvegarde des données de la tâche avec une journalisation des messages afin d’aider à la reprise de la tâche qui a subi une défaillance. Ainsi, nous évitons le redémarrage au niveau du processus, plus coûteux. Nous démontrons les avantages de ce protocole avec l'exemple des applications hybrides MPI + OmpSs. / Existing petascale machines have a Mean Time Between Failures (MTBF) in the order of several hours. It is predicted that in the future systems the MTBF will decrease. Therefore, applications that will run on these systems need to be able to tolerate frequent failures. Currently, the most common way to do this is to use global application checkpoint/restart scheme: if some process fails the whole application rolls back the its last checkpointed state and re-executes from that point. This solution will become infeasible at large scale, due to its energy costs and inefficient resource usage. Therefore fine-grained failure containment is a strongly required feature for the fault tolerance techniques that target large-scale executions. In the context of message passing MPI applications, message logging fault tolerance protocols provide good failure containment as they require restart of only one process or, in some cases, a bounded number of processes. However, existing logging protocols experience a number of issues which prevent their usage at large scale. In particular, they tend to have high failure-free overhead because they usually need to store reliably any nondeterministic events happening during the execution of a process in order to correctly restore its state in recovery. Next, as message logs are usually stored in the volatile memory, logging may incur large memory footprint, especially in communication-intensive applications. This is particularly important because the future exascale systems expect to have less memory available per core. Another important trend in HPC is switching from MPI-only applications to hybrid programming models like MPI+threads and MPI+tasks in response to the increasing number of cores per node. This gives opportunities for employing fault tolerance solutions that handle faults on the level of threads/tasks. Such approach has even better failure containment compared to message logging protocols which handle failures on the level of processes. Thus, the work in these dissertation consists of three parts. First, we present a hierarchical log-based fault tolerance solution, called Scalable Pattern-Based Checkpointing (SPBC) for mitigating process fail-stop failures. The protocol leverages a new deterministic model called channel-determinism and a new always-happens-before relation for partial ordering of events in the application. The protocol is scalable, has low overhead in failure-free execution and does not require logging any events, provides perfect failure containment and has a fully distributed recovery. Second, to address the memory limitation problem on compute nodes, we propose to use additional dedicated resources, or logger nodes. All the logs that do not fit in the memory of compute nodes are sent to the logger nodes and kept in their memory. In a series of experiments we show that not only this approach is feasible, but, combined with a hierarchical logging scheme like the SPBC, logger nodes can be an ultimate solution to the problem of memory limitation for logging protocols. Third, we present a log-based fault tolerance protocol for hybrid applications adopting MPI+tasks programming model. The protocol is used to tolerate detected uncorrected errors (DUEs) that happen during execution of a task. Normally, a DUE caused the system to raise an exception which lead to an application crash. Then, the application has to restart from a checkpoint. In the proposed solution, we combine task checkpointing with message logging in order to support task re-execution. Such task-level failure containment can be beneficial in large-scale executions because it avoids the more expensive process-level restart. We demonstrate the advantages of this protocol on the example of hybrid MPI+OmpSs applications.
49

Schedules for Dynamic Bidirectional Simulations on Parallel Computers

Lehmann, Uwe 19 May 2003 (has links)
For adjoint calculations, parameter estimation, and similar purposes one may need to reverse the execution of a computer program. The simplest option is to record a complete execution log and then to read it backwards. This requires massive amounts of storage. Instead one may generate the execution log piecewise by restarting the ``forward'' calculation repeatedly from suitably placed checkpoints. This thesis extends the theoretical results of the parallel reversal schedules. First a algorithm was constructed which carries out the ``forward'' calculation and distributes checkpoints in a way, such that the reversal calculation can be started at any time. This approach provides adaptive parallel reversal schedules for simulations where the number of time steps is not known a-priori. The number of checkpoints and processors used is optimal at any time. Further, an algorithm was described which makes is possible to restart the initial computer program during the program reversal. Again, this can be done without any additional computation at any time. Hence, optimal parallel reversal schedules for the bidirectional simulation are provided by this thesis. / Bei der Berechnung von Adjungierten, zum Debuggen und für ähnliche Anwendungen kann man die Umkehr der entsprechenden Programmauswertung verwenden. Der einfachste Ansatz, nämlich das Erstellen einer kompletten Mitschrift der Vorwärtsrechnung, welche anschließend rückwärts gelesen wird, verursacht einen enormen Speicherplatzbedarf. Als Alternative dazu kann man die Mitschrift auch stückweise erzeugen, indem die Programmauswertung von passend gewählten Checkpoints wiederholt gestartet wird. In dieser Arbeit wird die Theorie der optimalen parallelen Umkehrschemata erweitert. Zum einen erfolgt die Konstruktion von adaptiven parallelen Umkehrschemata. Dafür wird ein Algorithmus beschrieben, der es durch die Nutzung von mehreren Prozessen ermöglicht, Checkpoints so zu verteilen, daß die Umkehrung des Programmes jederzeit ohne Zeitverlust erfolgen kann. Hierbei bleibt die Zahl der verwendeten Checkpoints und Prozesse innerhalb der bekannten Optimalitätsgrenzen. Zum anderen konnte für die adaptiven parallelen Umkehrschemata ein Algorithmus entwickelt werden, welcher ein Restart der eigentlichen Programmauswertung basierend auf der laufenden Programmumkehr erlaubt. Dieser Restart kann wieder jederzeit ohne Zeitverlust erfolgen und die entstehenden Checkpointverteilung erfüllen wieder sowohl Optimalitäts- als auch die Adaptivitätskriterien. Zusammenfassend wurden damit in dieser Arbeit Schemata konstruiert, die bidirektionale Simulationen ermöglichen.
50

Optimization of memory management on distributed machine

Ha, Viet Hai 05 October 2012 (has links) (PDF)
In order to explore further the capabilities of parallel computing architectures such as grids, clusters, multi-processors and more recently, clouds and multi-cores, an easy-to-use parallel language is an important challenging issue. From the programmer's point of view, OpenMP is very easy to use with its ability to support incremental parallelization, features for dynamically setting the number of threads and scheduling strategies. However, as initially designed for shared memory systems, OpenMP is usually limited on distributed memory systems to intra-nodes' computations. Many attempts have tried to port OpenMP on distributed systems. The most emerged approaches mainly focus on exploiting the capabilities of a special network architecture and therefore cannot provide an open solution. Others are based on an already available software solution such as DMS, MPI or Global Array and, as a consequence, they meet difficulties to become a fully-compliant and high-performance implementation of OpenMP. As yet another attempt to built an OpenMP compliant implementation for distributed memory systems, CAPE − which stands for Checkpointing Aide Parallel Execution − has been developed which with the following idea: when reaching a parallel section, the master thread is dumped and its image is sent to slaves; then, each slave executes a different thread; at the end of the parallel section, slave threads extract and return to the master thread the list of all modifications that has been locally performed; the master includes these modifications and resumes its execution. In order to prove the feasibility of this paradigm, the first version of CAPE was implemented using complete checkpoints. However, preliminary analysis showed that the large amount of data transferred between threads and the extraction of the list of modifications from complete checkpoints lead to weak performance. Furthermore, this version was restricted to parallel problems satisfying the Bernstein's conditions, i.e. it did not solve the requirements of shared data. This thesis aims at presenting the approaches we proposed to improve CAPE' performance and to overcome the restrictions on shared data. First, we developed DICKPT which stands for Discontinuous Incremental Checkpointing, an incremental checkpointing technique that supports the ability to save incremental checkpoints discontinuously during the execution of a process. Based on the DICKPT, the execution speed of the new version of CAPE was significantly increased. For example, the time to compute a large matrix-matrix product on a desktop cluster has become very similar to the execution time of the same optimized MPI program. Moreover, the speedup associated with this new version for various number of threads is quite linear for different problem sizes. In the side of shared data, we proposed UHLRC, which stands for Updated Home-based Lazy Release Consistency, a modified version of the Home-based Lazy Release Consistency (HLRC) memory model, to make it more appropriate to the characteristics of CAPE. Prototypes and algorithms to implement the synchronization and OpenMP data-sharing clauses and directives are also specified. These two works ensures the ability for CAPE to respect shared-data behavior

Page generated in 0.0771 seconds