21

Efficient in-situ workflows for time-critical applications on heterogeneous ecosystems

Feng Li (16627272) 21 July 2023 (has links)
In-situ workflows are a special class of scientific workflows in which different component applications (such as simulation, visualization, and analysis) run concurrently, and data flows continuously between components during the whole workflow lifetime. Traditionally, simulations write large amounts of output data to persistent storage, which are later read back for analysis and visualization. In comparison, in-situ workflows allow analysis/visualization components to consume simulation data while the simulations are still running, and thus reduce the I/O overhead. Recent research has focused on providing data transport libraries to help compose a group of applications into an integral in-situ workflow. However, only a few performance-oriented studies exist for in-situ workflows, and most of them focus on workflows with simple structures (e.g., a single producer and a single consumer), without considering heterogeneous environments. Being able to efficiently utilize heterogeneous computing resources such as multiple clouds and HPC systems can significantly accelerate real-world in-situ workflows and benefit applications that require both significant computation power and real-time outputs (e.g., identifying abnormal patterns in fluid dynamics). The goal of this dissertation is to provide resource planning algorithms and runtime support to improve in-situ workflow performance on heterogeneous environments.

This dissertation first investigates the emerging applications of in-situ workflows, which usually include parallel simulation, visualization, and analysis components. Two representative real-world in-situ workflows are studied in detail: a real-time CFD machine learning/visualization workflow and a wildfire spreading workflow. These workflows showcase the capabilities of in-situ workflows, such as decoupled and accelerated computation and fast near-real-time response; however, there is a lack of resource planning and runtime support for general in-situ workflows. For resource planning, I first formulate the optimization problem, and then design and implement a heuristic algorithm called SNL (Scheduled-Neighbor-Lookup). SNL considers the pipelined execution pattern of in-situ workflows and guides the resource planning of complex in-situ workflows to achieve higher workflow throughput. For runtime support, I design and implement INSTANT, a runtime framework to configure, plan, launch, and monitor in-situ workflows on distributed computing environments. INSTANT provides intuitive interfaces to compose abstract in-situ workflows, manages intra-site and cross-site data transfers with ADIOS2, and supports resource planning using profiled performance data. Experiments with the two use cases show that INSTANT can efficiently streamline the orchestration of complex in-situ workflows, and its resource planning capability allows INSTANT to plan and carry out fast workflow execution under different computing resource availabilities.
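The abstract does not spell out the SNL heuristic; the following Python sketch only illustrates the general idea of throughput-oriented resource planning for a pipelined in-situ workflow, where overall throughput is bounded by the slowest stage. Component names and per-node rates are hypothetical, not taken from the dissertation.

    # Illustrative sketch only: a greedy, throughput-oriented allocator for a
    # pipelined in-situ workflow (simulation -> analysis -> visualization).
    # Component names and per-node processing rates are hypothetical.

    def plan_resources(components, total_nodes):
        """Distribute compute nodes so the slowest pipeline stage is sped up first.

        components: dict mapping component name -> steps/second a single node achieves
        total_nodes: number of nodes to distribute
        """
        # Every component needs at least one node to keep the pipeline flowing.
        allocation = {name: 1 for name in components}
        for _ in range(total_nodes - len(components)):
            # Pipeline throughput is bounded by the slowest stage, so always
            # give the next node to the current bottleneck.
            bottleneck = min(components, key=lambda n: components[n] * allocation[n])
            allocation[bottleneck] += 1
        return allocation

    if __name__ == "__main__":
        rates = {"simulation": 2.0, "analysis": 5.0, "visualization": 8.0}  # hypothetical
        print(plan_resources(rates, 16))
        # {'simulation': 9, 'analysis': 4, 'visualization': 3}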
22

Designing scientific workflow following a structure and provenance-aware strategy

Chen, Jiuqiang 11 October 2013 (has links)
Bioinformatics experiments are usually performed using scientific workflows in which tasks are chained together, forming very intricate and nested graph structures. Scientific workflow systems have been developed to guide users in the design and execution of workflows. An advantage of these systems over traditional approaches is their ability to automatically record the provenance (or lineage) of intermediate and final data products generated during workflow execution. The provenance of a data product contains information about how the product was derived, and it is crucial for enabling scientists to easily understand, reproduce, and verify scientific results. For several reasons, the complexity of workflow and workflow execution structures is increasing over time, which has a clear impact on scientific workflow reuse.

The global aim of this thesis is to enhance workflow reuse by providing strategies to reduce the complexity of workflow structures while preserving provenance. Two strategies are introduced.

First, we propose an approach to rewrite the graph structure of any scientific workflow (classically represented as a directed acyclic graph (DAG)) into a simpler structure, namely a series-parallel (SP) structure, while preserving provenance. SP-graphs are simple and layered, making the main phases of a workflow easier to distinguish. Additionally, from a more formal point of view, polynomial-time algorithms for performing complex graph-based operations (e.g., comparing workflows, which is directly related to the problem of subgraph homomorphism) can be designed when workflows have SP structures, whereas such operations are related to NP-hard problems for DAGs without any restriction on their structure. The SPFlow provenance-preserving rewriting algorithm and its associated tool are thus introduced.

Second, we provide a methodology together with a technique able to reduce the redundancy present in workflows (by removing unnecessary occurrences of tasks). More precisely, we detect "anti-patterns", a term broadly used in program design to indicate the use of idiomatic forms that lead to over-complicated design, and which should therefore be avoided. We thus provide the DistillFlow algorithm, able to transform a workflow into a distilled, semantically equivalent workflow that is free or partly free of anti-patterns and has a more concise and simpler structure.

The two main approaches of this thesis (namely, SPFlow and DistillFlow) are based on a provenance model that we have introduced to represent the provenance structure of workflow executions. The notion of provenance-equivalence, which determines whether two workflows have the same meaning, is also at the center of our work. Our solutions have been systematically tested on large collections of real workflows, especially from the Taverna system. Our approaches are available for use at https://www.lri.fr/~chenj/.
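To make the series-parallel idea concrete, the sketch below represents workflows as SP terms (single tasks composed in series or in parallel) and compares two of them in polynomial time. This is not the SPFlow algorithm; task names are hypothetical, and the point is only that SP structures admit cheap structural operations, unlike arbitrary DAGs.

    # Illustrative sketch only: SP workflow terms and a polynomial-time comparison.

    from dataclasses import dataclass
    from typing import Union

    @dataclass(frozen=True)
    class Task:
        name: str

    @dataclass(frozen=True)
    class Series:            # run 'first', then 'second'
        first: "SP"
        second: "SP"

    @dataclass(frozen=True)
    class Parallel:          # run both branches independently
        left: "SP"
        right: "SP"

    SP = Union[Task, Series, Parallel]

    def same_structure(a: SP, b: SP) -> bool:
        """Polynomial-time structural comparison; parallel branches are unordered."""
        if type(a) is not type(b):
            return False
        if isinstance(a, Task):
            return a.name == b.name
        if isinstance(a, Series):
            return same_structure(a.first, b.first) and same_structure(a.second, b.second)
        return (same_structure(a.left, b.left) and same_structure(a.right, b.right)) or \
               (same_structure(a.left, b.right) and same_structure(a.right, b.left))

    # Hypothetical bioinformatics-style workflow: align, then filter and annotate in parallel.
    w1 = Series(Task("align"), Parallel(Task("filter"), Task("annotate")))
    w2 = Series(Task("align"), Parallel(Task("annotate"), Task("filter")))
    assert same_structure(w1, w2)   # parallel branches commute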
23

An ontology- and workflow-based approach for scientific software product lines

Costa, Gabriella Castro Barbosa 27 February 2013 (has links)
A way to improve the reusability and maintainability of a family of software products is through the Software Product Line (SPL) approach. In some situations, such as scientific applications for a given area, it is advantageous to develop a collection of related software products using an SPL approach. Scientific Software Product Lines (SSPL) differ from Software Product Lines in that an SSPL uses an abstract scientific workflow model. This workflow is defined according to the scientific domain and, using this abstract workflow model, the products are instantiated. Analyzing the difficulties in specifying scientific experiments, and considering the need to compose scientific applications for their implementation, more appropriate semantic support for the domain analysis phase is necessary. Therefore, this work proposes an approach based on the combination of feature models and ontologies, named PL-Science, to support the specification and conduction of scientific experiments. The PL-Science approach considers the context of SSPL and aims to assist scientists in defining a scientific experiment by specifying a workflow that encompasses the scientific applications of a given experiment. Using SPL concepts, scientists can reuse models that specify the scientific product line and make decisions according to their needs. This work also emphasizes the use of ontologies to facilitate the process of applying Software Product Lines to scientific domains. Through the use of an ontology as a domain model, we can provide additional information as well as add more semantics to the context of Scientific Software Product Lines.
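As a minimal illustration of the feature-model side of such an approach, the sketch below validates a product configuration against "requires" and "excludes" constraints. Feature names and constraints are hypothetical; this is not the PL-Science tooling, which additionally ties features to an ontology.

    # Illustrative sketch only: checking a product configuration against a tiny feature model.

    FEATURES = {"sequence_alignment", "phylogeny", "visualization", "local_blast", "remote_blast"}
    REQUIRES = {"phylogeny": {"sequence_alignment"}}   # phylogeny needs an alignment step
    EXCLUDES = {"local_blast": {"remote_blast"}}       # mutually exclusive alternatives

    def valid_configuration(selected):
        selected = set(selected)
        if not selected <= FEATURES:                   # unknown feature selected
            return False
        for feature, needed in REQUIRES.items():
            if feature in selected and not needed <= selected:
                return False
        for feature, forbidden in EXCLUDES.items():
            if feature in selected and selected & forbidden:
                return False
        return True

    print(valid_configuration({"sequence_alignment", "phylogeny", "local_blast"}))  # True
    print(valid_configuration({"phylogeny"}))  # False: required alignment feature missing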
24

Supporting reuse in a scientific software ecosystem platform through context and provenance management

Ambrósio, Lenita Martins 14 September 2018 (has links)
Considering the current scientific experimentation scenario and the increasing use of large-scale applications, the management of experiment data is becoming ever more complex. The scientific experimentation process requires support for collaborative and distributed activities. Managing contextual and provenance information plays a key role in this domain. Detailed recording of the steps taken to produce results, as well as of the contextual information of the experimentation environment, can allow scientists to reuse those results in future experiments and to reuse the experiment, or parts of it, in another context. The goal of this work is to present a provenance and context metadata management approach that supports researchers in reusing knowledge about scientific experiments conducted on a collaborative and distributed platform. First, the phases of the context and provenance management life cycle were analyzed, considering existing models. Then, a conceptual framework was proposed to support the analysis of contextual elements and provenance data of scientific experiments. An ontology capable of extracting implicit knowledge in this domain was specified. The approach was implemented in a scientific ecosystem platform. An evaluation conducted through case studies showed evidence that this architecture is able to help researchers during the reuse and reproduction of scientific experiments. Context elements and data provenance, combined with inference mechanisms, can be used to support reuse in the scientific experimentation process.
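The sketch below shows, in the simplest possible form, how recorded provenance and context metadata plus one inference rule (transitive derivation) can answer a reuse question such as "which inputs does this result ultimately depend on?". Entity names and attributes are hypothetical; the work described above uses an ontology and richer inference, not this toy store.

    # Illustrative sketch only: a minimal provenance/context store with transitive lineage.

    from collections import defaultdict

    derived_from = defaultdict(set)   # entity -> entities it was directly derived from
    context = {}                      # entity -> contextual metadata of its production

    def record(entity, sources, **ctx):
        derived_from[entity].update(sources)
        context[entity] = ctx

    def lineage(entity):
        """Transitive closure of 'derived from' (a simple inference over the records)."""
        seen, stack = set(), list(derived_from[entity])
        while stack:
            e = stack.pop()
            if e not in seen:
                seen.add(e)
                stack.extend(derived_from[e])
        return seen

    record("raw_reads.fastq", [], instrument="sequencer-A")
    record("aligned.bam", ["raw_reads.fastq"], tool="bwa", version="0.7.17")
    record("variants.vcf", ["aligned.bam"], tool="gatk", node="cluster-03")

    print(lineage("variants.vcf"))            # {'aligned.bam', 'raw_reads.fastq'}
    print(context["aligned.bam"]["version"])  # context a researcher can weigh before reusing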
25

Scientific Workflows for Hadoop

Bux, Marc Nicolas 07 August 2018 (has links)
Scientific workflows provide a means to model, execute, and exchange the increasingly complex analysis pipelines necessary for today's data-driven science. Over the last decades, scientific workflow management systems have emerged to facilitate the design, execution, and monitoring of such workflows. At the same time, the amounts of data generated in various areas of science have outpaced hardware advancements. Parallelization and distributed execution are generally proposed to deal with increasing amounts of data. However, the resources provided by distributed infrastructures are subject to heterogeneity, dynamic performance changes at runtime, and occasional failures. To leverage the scalability provided by these infrastructures despite the observed performance variability, workflow management systems have to progress: parallelization potential in scientific workflows has to be detected and exploited; simulation frameworks, which are commonly employed for the evaluation of scheduling mechanisms, have to consider the instability encountered on the infrastructures they emulate; adaptive scheduling mechanisms have to be employed to optimize resource utilization in the face of instability; and state-of-the-art systems for scalable distributed resource management and storage, such as Apache Hadoop, have to be supported. This dissertation presents novel solutions for these requirements. First, we introduce DynamicCloudSim, a cloud computing simulation framework that is able to adequately model the various aspects of variability encountered in computational clouds. Second, we outline ERA, an adaptive scheduling policy that optimizes workflow makespan by exploiting heterogeneity, replicating bottlenecks in workflow execution, and adapting to changes in the underlying infrastructure. Finally, we present Hi-WAY, an execution engine that integrates ERA and enables the highly scalable execution of scientific workflows written in a number of languages on Hadoop.
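The two decisions at the heart of an adaptive policy of this kind can be sketched very simply: place a task on the node with the best observed performance for that task type, and speculatively replicate a task that runs far beyond its expectation. The node names, runtimes, and threshold below are hypothetical; ERA itself is specified in the dissertation.

    # Illustrative sketch only: adaptive placement and speculative replication decisions.

    observed = {            # node -> mean runtime (s) observed for this task type
        "node-a": 120.0,
        "node-b": 95.0,
        "node-c": 210.0,    # slow or degraded node
    }

    def pick_node(idle_nodes):
        # Prefer nodes that historically ran this task type fastest.
        return min(idle_nodes, key=lambda n: observed.get(n, float("inf")))

    def should_replicate(elapsed, expected, factor=1.5):
        # Speculatively re-launch a copy once a task exceeds 'factor' x its expectation.
        return elapsed > factor * expected

    print(pick_node(["node-a", "node-c"]))                     # node-a
    print(should_replicate(elapsed=310.0, expected=150.0))     # True -> replicate elsewhere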
26

Dataflow in colored Petri nets and in actor-oriented workflow graphs

Borges, Grace Anne Pontes 11 September 2008 (has links)
Three decades ago, business information systems were designed to support the execution of individual tasks. Today's information systems also need to support an organization's workflows and business processes. In scientific communities composed of physicists, astronomers, biologists, geologists, and others, information systems have characteristics different from those found in business environments: repetitive procedures (such as re-execution of the same experiment), transformation of raw data into publishable results, and coordination of experiments across several different software and hardware environments. These differing characteristics of the business and scientific environments mean that existing tools and formalisms emphasize either control flow or dataflow. However, there are situations in which data transfer and control flow must be handled simultaneously. This work aims to characterize and delimit the representation and control of dataflow in business processes and scientific workflows. To this end, two tools are compared: CPN Tools and KEPLER, which are based on two formalisms, colored Petri nets and actor-oriented workflow graphs, respectively. The comparison is carried out through implementations of practical cases, using dataflow patterns as the basis for comparison.
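To give a feel for the colored Petri net formalism compared here, the sketch below implements one firing step in plain Python: places hold multisets of "colored" tokens, and a transition fires when its guard accepts a token, moving data downstream. Place names, token values, and the guard are hypothetical and are not taken from the CPN Tools or KEPLER models used in the dissertation.

    # Illustrative sketch only: one colored-Petri-net transition firing.

    from collections import Counter

    places = {
        "raw":       Counter({("exp1", 4.2): 1, ("exp2", -1.0): 1}),
        "validated": Counter(),
    }

    def fire(src, dst, guard):
        """Fire a transition, moving the first token of 'src' that satisfies 'guard'."""
        for token in list(places[src]):
            if guard(token):
                places[src][token] -= 1
                places[src] += Counter()          # drop zero-count entries
                places[dst][token] += 1
                return token
        return None                               # transition not enabled

    fire("raw", "validated", guard=lambda t: t[1] >= 0)   # only non-negative measurements pass
    print(dict(places["validated"]))                      # {('exp1', 4.2): 1}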
27

Distributed knowledge sharing and production through collaborative e-Science platforms

Gaignard, Alban 15 March 2013 (has links)
This thesis addresses the issues of coherent distributed knowledge production and sharing in the life-science area. In spite of the continuously increasing computing and storage capabilities of computing infrastructures, the management of massive scientific data through centralized approaches has become inappropriate, for several reasons: (i) they do not guarantee the autonomy of data providers, which are constrained, for either ethical or legal reasons, to keep control over the data they host; and (ii) they do not scale and adapt to the massive scientific data produced through e-Science platforms. In the context of the NeuroLOG and VIP life-science collaborative platforms, we address, on the one hand, the distribution and heterogeneity issues underlying the sharing of possibly sensitive resources and, on the other hand, automated knowledge production through the usage of these e-Science platforms, to ease the exploitation of the massively produced scientific data. We rely on an ontological approach for knowledge modeling and propose, based on Semantic Web technologies, (i) to extend these platforms with efficient, static and dynamic, transparent federated semantic querying strategies, and (ii) to extend their data processing environments, using both provenance information captured at run time and domain-specific inference rules, to automate the semantic annotation of "in silico" experiment results. The results of this thesis have been evaluated on the Grid'5000 distributed and controlled infrastructure. They contribute to addressing three of the main challenges faced by computational science platforms, through (i) a model for secured collaborations and a distributed access-control strategy allowing the setup of multi-centric studies while still considering competitive activities, (ii) semantic experiment summaries, meaningful from the end-user perspective, aimed at easing navigation in the massive scientific data resulting from large-scale experimental campaigns, and (iii) efficient distributed querying and reasoning strategies, relying on Semantic Web standards, aimed at sharing capitalized knowledge and providing connectivity towards the Web of Linked Data. Keywords: scientific workflows and dataflows, semantic web services, provenance, Linked Data, Semantic Web, knowledge base federation, distributed data integration, e-Science, e-Health.
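The basic shape of a federated query, where each site evaluates a pattern locally and only bindings are merged, can be sketched as follows. Site names and triples are hypothetical, and this toy pattern matcher stands in for the SPARQL-based federation strategies the platforms actually rely on.

    # Illustrative sketch only: fanning one triple pattern out to site-local stores.

    SITES = {
        "site-paris": [("subj:scan42", "prov:wasDerivedFrom", "subj:rawMRI7")],
        "site-nice":  [("subj:scan43", "prov:wasDerivedFrom", "subj:rawMRI9")],
    }

    def match(triple, pattern):
        # '?x'-style terms are variables; anything else must match exactly.
        binding = {}
        for term, pat in zip(triple, pattern):
            if pat.startswith("?"):
                binding[pat] = term
            elif term != pat:
                return None
        return binding

    def federated_query(pattern):
        results = []
        for site, store in SITES.items():          # each site evaluates locally
            for triple in store:
                b = match(triple, pattern)
                if b is not None:
                    results.append({**b, "_site": site})
        return results

    print(federated_query(("?result", "prov:wasDerivedFrom", "?source")))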
