  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Facilitating reproducible computing via scientific workflows – an integrated system approach

Cao, Yuan 04 May 2017 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / Reproducible computing and research are of great importance for scientific investigation in any discipline. This thesis presents a general approach to provenance in the context of workflows for widely used scripting languages. Our solution is based on system integration and is demonstrated by integrating MATLAB with VisTrails, an open-source scientific workflow system. The integrated VisTrails-MATLAB system supports reproducible computing with true prospective and retrospective provenance at whatever granularity scientists choose for their scripts, and at the same time remains very easy to use.
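Retrospective provenance of the kind described above can be approximated at the script level by wrapping each analysis step so that its inputs, outputs and timing are logged. The sketch below is a minimal Python illustration of that idea, not the VisTrails-MATLAB integration itself; the decorator name and the in-memory log are assumptions.

```python
import functools, json, time

PROVENANCE_LOG = []  # in-memory stand-in for a provenance store

def record_provenance(func):
    """Record inputs, outputs and timing of each call (retrospective provenance)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        result = func(*args, **kwargs)
        PROVENANCE_LOG.append({
            "task": func.__name__,
            "args": repr(args),
            "kwargs": repr(kwargs),
            "result": repr(result),
            "duration_s": round(time.time() - start, 6),
        })
        return result
    return wrapper

@record_provenance
def smooth(values, window=3):
    # toy analysis step standing in for one cell of a script
    return [sum(values[max(0, i - window + 1):i + 1]) / len(values[max(0, i - window + 1):i + 1])
            for i in range(len(values))]

if __name__ == "__main__":
    smooth([1.0, 2.0, 4.0, 8.0])
    print(json.dumps(PROVENANCE_LOG, indent=2))
```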
2

Computational analysis of CpG site DNA methylation

Ghorbani, Mohammadmersad January 2013 (has links)
Epigenetics is the study of factors that can change DNA and be passed to the next generation without altering the DNA sequence. DNA methylation is one category of epigenetic change: the attachment of a methyl group (CH3) to DNA. Most of the time it occurs at sequences in which a C is followed by a G, known as CpG sites, by addition of a methyl group to the cytosine residue. As science and technology progress, new data become available about individuals' DNA methylation profiles under different conditions, and new features are discovered that may play a role in DNA methylation. The availability of new data on DNA methylation and other features of DNA poses a challenge to bioinformatics and offers the opportunity to discover new knowledge from existing data. In this research, multiple data series were used to assign DNA methylation classes to CpG sites: a) never-methylated CpG sites, b) always-methylated CpG sites, c) CpG sites methylated in cancer/disease samples and unmethylated in normal samples, and d) CpG sites methylated in normal samples and unmethylated in cancer/disease samples. After identifying these sites and their classes, an analysis was carried out to find the features that best classify them. A matrix of features was generated using four applications in the EMBOSS software suite. The feature matrix was also generated using the gUse/WS-PGRADE portal workflow system: each of the four applications was grid-enabled and ported to the BOINC platform, and the gUse portal was connected to the BOINC project via the 3G-bridge. Each node in the workflow created a portion of the matrix, and these portions were then combined to create the final matrix. This final feature matrix was used in a hill-climbing workflow, whose hill-climbing node was a Java program ported to the BOINC platform. The hill-climbing search workflow searched for a subset of features that better classifies the CpG sites, using five different measurements and three different classification methods: support vector machine, naive Bayes and the J48 decision tree. Using this approach, the hill-climbing search found models that contain fewer than half the features yet give better classification results. It is also demonstrated that the gUse/WS-PGRADE workflow system provides a modular way of generating features, so a new feature-generator application can be added without changing other parts, and that grid-enabled applications can speed up both feature generation and feature subset selection. The approach used here for distributed, workflow-based feature generation is not restricted to this study and can be applied in other studies that involve feature generation; it only requires multiple binaries to generate portions of the features. The grid-enabled hill-climbing search application can likewise be used in other contexts, as it only requires the same feature-matrix format.
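The feature-subset search described above can be illustrated with a small hill-climbing loop. The sketch below is a hedged Python stand-in, using scikit-learn's naive Bayes and cross-validation in place of the thesis's grid-enabled SVM/naive Bayes/J48 setup; the synthetic data and the first-improvement strategy are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

def score(X, y, subset):
    """Mean cross-validated accuracy using only the selected feature columns."""
    if not subset:
        return 0.0
    return cross_val_score(GaussianNB(), X[:, sorted(subset)], y, cv=5).mean()

def hill_climb(X, y, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    current = set(rng.choice(X.shape[1], size=X.shape[1] // 2, replace=False))
    best = score(X, y, current)
    for _ in range(max_iter):
        improved = False
        for f in rng.permutation(X.shape[1]):
            candidate = current ^ {f}          # flip one feature in or out of the subset
            s = score(X, y, candidate)
            if s > best:
                current, best, improved = candidate, s, True
                break                          # first-improvement hill climbing
        if not improved:
            break                              # local optimum reached
    return sorted(current), best

if __name__ == "__main__":
    X, y = make_classification(n_samples=200, n_features=20, n_informative=5, random_state=0)
    subset, acc = hill_climb(X, y)
    print(f"selected {len(subset)} features, CV accuracy {acc:.3f}")
```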
3

[en] SCIENTIFIC APPLICATION: REENGINEERING TO ADD WORKFLOW CONCEPTS / [pt] REENGENHARIA DE UMA APLICAÇÃO CIENTÍFICA PARA INCLUSÃO DE CONCEITOS DE WORKFLOW

THIAGO MANHENTE DE CARVALHO MARQUES 17 January 2017 (has links)
[en] The use of workflow techniques in scientific computing is widely adopted for running experiments and building in silico models. By analysing some challenges faced by a scientific application in the geosciences domain, we noticed that workflows could be used to represent the geological models created with the application and so ease the development of features to meet those challenges. Most works and tools in the scientific-workflow domain, however, are designed for distributed computing contexts such as web services and grid computing, which makes them difficult to use or integrate within simpler scientific applications. In this dissertation, we discuss how to enable the composition and representation of workflows within an existing scientific application. We describe a conceptual architecture for a workflow engine designed to be used inside a stand-alone application, along with an implementation model of this architecture in a C++ application that uses Petri nets to model a workflow and C++ functions to represent tasks. As a proof of concept, we implemented this workflow model in an existing application and studied its impact on that application.
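The core idea, a Petri net whose transitions invoke task functions, can be sketched compactly. The following is a simplified Python analogue of that design rather than the dissertation's C++ engine; the place and transition names and the naive firing loop are illustrative assumptions.

```python
class Transition:
    def __init__(self, name, inputs, outputs, task):
        self.name, self.inputs, self.outputs, self.task = name, inputs, outputs, task

def run(marking, transitions):
    """Fire enabled transitions until none remain; each firing calls its task function."""
    fired = True
    while fired:
        fired = False
        for t in transitions:
            if all(marking.get(p, 0) > 0 for p in t.inputs):   # transition enabled?
                for p in t.inputs:
                    marking[p] -= 1                             # consume input tokens
                t.task()                                        # stand-in for the C++ task function
                for p in t.outputs:
                    marking[p] = marking.get(p, 0) + 1          # produce output tokens
                fired = True
    return marking

transitions = [
    Transition("load",     ["start"],     ["loaded"],    lambda: print("load model")),
    Transition("simulate", ["loaded"],    ["simulated"], lambda: print("run simulation")),
    Transition("report",   ["simulated"], ["done"],      lambda: print("write report")),
]
print(run({"start": 1}, transitions))
```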
4

Blockchain Use for Data Provenance in Scientific Workflow

Sigurjonsson, Sindri Már Kaldal January 2018 (has links)
In scientific workflows, data provenance plays a central role: through provenance, the execution of the workflow is documented and information about the data items involved is stored. This can be used to reproduce scientific experiments or to prove how the results of the workflow came to be. It is therefore vital that the provenance data stored in the provenance database is always synchronized with its corresponding workflow, so that it can be verified that the database has not been tampered with. Blockchain technology has gained a lot of attention since Satoshi Nakamoto released the Bitcoin paper in 2008: an append-only ledger is stored and replicated across a peer-to-peer network, offering high tamper resistance through consensus protocols. This thesis explores whether blockchain technology is a suitable solution for synchronizing a workflow with its provenance data. A system that generates a workflow from a definition written in a domain-specific language was extended to use a blockchain to synchronize the workflow itself with its results. Furthermore, the InterPlanetary File System was used to assist with versioning individual executions of the workflow, making it possible to compare executions in detail and discover how they differ. The solution was also analyzed with respect to the FDA's 21 CFR Part 11 regulations to see how it could assist in fulfilling their requirements. Analysis of the system shows that the blockchain extension can be used to verify whether the synchronization between a workflow and its results has been tampered with. Experiments revealed that the size of the workflow did not have a significant effect on the execution time of the extension, and the proposed solution incurs a constant cost in digital currency regardless of workflow size. However, even though the extension shows promise in assisting with the requirements of 21 CFR Part 11, the analysis revealed that it does not fully comply with the regulations due to their complexity.
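The synchronization check at the heart of the approach can be pictured as hashing a workflow definition together with its provenance and comparing the digest with what was anchored on the ledger. The Python sketch below assumes a stubbed ledger (a dict) in place of an actual blockchain transaction, and the helper names are illustrative.

```python
import hashlib, json

def digest(workflow_definition: str, provenance: dict) -> str:
    """Content hash binding a workflow to the provenance it produced."""
    payload = workflow_definition + json.dumps(provenance, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# At execution time: anchor the digest on an append-only ledger (stubbed here).
ledger = {}
workflow = "step1 -> step2 -> step3"
provenance = {"step1": "ok", "step2": "ok", "step3": {"output": "result.csv"}}
ledger["run-42"] = digest(workflow, provenance)

# Later: check whether the provenance database has been tampered with.
provenance["step3"]["output"] = "forged.csv"
print("tampered" if digest(workflow, provenance) != ledger["run-42"] else "verified")
```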
5

Resource-oriented architecture based scientific workflow modelling

Duan, Kewei January 2016 (has links)
This thesis studies the feasibility and methodology of applying state-of-the-art computer technology to scientific workflow modelling within a collaborative environment, one in which the people involved include scientists and engineers from disciplines other than computer science. The objective of this research is to provide a systematic, web-based methodology for lowering the barriers created by the heterogeneity of multiple institutions, multiple platforms and geographically distributed resources that such a collaborative scientific-workflow environment implies.
6

Data-intensive interactive workflows for visual analytics

Khemiri, Wael 12 December 2011 (has links) (PDF)
The increasing amount of electronic data of all forms, produced by humans (e.g. Web pages, structured content such as Wikipedia or the blogosphere, etc.) and/or automatic tools (loggers, sensors, Web services, scientific programs or analysis tools, etc.), leads to a situation of unprecedented potential for extracting new knowledge, finding new correlations, or simply making sense of the data. Visual analytics aims at combining interactive data visualization with data analysis tasks. Given the explosion in volume and complexity of scientific data, e.g. data associated with biological or physical processes or social networks, visual analytics is called to play an important role in scientific data management. Most visual analytics platforms, however, are memory-based and are therefore limited in the volume of data they can handle. Moreover, each new algorithm (e.g. for clustering) has to be integrated into the platform by hand. Finally, they lack the capability to define and deploy well-structured processes in which users with different roles interact in a coordinated way, sharing the same data and possibly the same visualizations. This work sits at the convergence of three research areas: information visualization, database query processing and optimization, and workflow modeling. It provides two main contributions: (i) we propose a generic architecture for deploying a visual analytics platform on top of a database management system (DBMS); (ii) we show how to propagate data changes to the DBMS and to the visualizations through the workflow process. Our approach has been implemented in a prototype called EdiFlow and validated through several applications. It clearly demonstrates that visual analytics applications can benefit from the robust storage and automatic process deployment provided by the DBMS while obtaining good performance, and thus scalability. Conversely, it could also be integrated into a data-intensive scientific workflow platform in order to extend its visualization features.
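The change-propagation contribution can be illustrated with a toy pipeline in which data lives in a DBMS and a visualization is refreshed only when the data changes. The sketch below uses SQLite, a change-log table and polling as stand-ins; it is not EdiFlow's actual mechanism, and the table and function names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE measurements (id INTEGER PRIMARY KEY, value REAL);
CREATE TABLE changes (table_name TEXT, row_id INTEGER);      -- pending change notifications
CREATE TRIGGER log_insert AFTER INSERT ON measurements
  BEGIN INSERT INTO changes VALUES ('measurements', NEW.id); END;
""")

def refresh_visualization():
    """Stand-in for re-rendering a chart: recompute the aggregate the view displays."""
    (avg,) = conn.execute("SELECT AVG(value) FROM measurements").fetchone()
    print(f"[viz] mean value is now {avg:.2f}")

def propagate_changes():
    """Push pending DBMS changes through the workflow to the visualization layer."""
    if conn.execute("SELECT COUNT(*) FROM changes").fetchone()[0]:
        conn.execute("DELETE FROM changes")
        refresh_visualization()

conn.executemany("INSERT INTO measurements(value) VALUES (?)", [(1.0,), (2.0,), (6.0,)])
propagate_changes()   # the visualization updates only because the data changed
```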
7

Revisiter les grilles de PCs avec des technologies du Web et le Cloud computing / Re-examining Desktop Grids with Web Technologies and Cloud Computing

Abidi, Leila 03 March 2015 (has links)
The context of this thesis lies at the intersection of grid computing, new Web technologies, and clouds and on-demand services. Since their advent in the 1990s, distributed platforms, and more precisely grid-computing systems, have evolved continuously and motivated a great deal of research. Desktop grids were proposed as an alternative to supercomputers by federating thousands of desktop computers, yet the details of implementing such an architecture, in terms of resource-sharing mechanisms, remain very hard to pin down. Meanwhile, the Web has completely changed the way we access information and has become an essential part of everyday life, while devices have evolved from desktops and laptops to tablets, media players, game consoles, smartphones and NetPCs. This evolution requires adapting and rethinking the desktop-grid applications and middleware developed over recent years. Our contribution is a desktop-grid middleware that we call RedisDG. In operation, RedisDG is similar to most grid middleware: it can execute bag-of-tasks applications in a distributed environment, monitor nodes, and validate and certify results. Its novelty lies in integrating formal modelling and verification into its design phase, which is unconventional but highly relevant in this domain. Our approach is to rethink desktop grids from a formal framework that allows them to be developed rigorously and to better accommodate future technological change. We reconsidered the interactions between the traditional components of a desktop grid on the basis of Web technology, giving rise to RedisDG, a new desktop-grid middleware able to operate on small devices (smartphones and tablets) as well as on more traditional devices (PCs). The system is built entirely on the publish/subscribe paradigm; RedisDG is developed in Python and uses Redis as an advanced key-value cache and store.
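The publish/subscribe coordination mentioned for RedisDG can be sketched with the redis-py client. The example below assumes a Redis server running on localhost and an illustrative channel layout ("tasks" and "results"); it shows the pattern, not RedisDG's actual protocol.

```python
import json
import redis  # pip install redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# A worker subscribes to a task channel (the desktop-grid node side).
worker = r.pubsub()
worker.subscribe("tasks")

# The coordinator publishes one item of a bag of tasks.
r.publish("tasks", json.dumps({"task_id": 1, "command": "render frame 1"}))

# The worker picks up the message and reports on a results channel.
for message in worker.listen():
    if message["type"] == "message":
        task = json.loads(message["data"])
        print("worker executing:", task["command"])
        r.publish("results", json.dumps({"task_id": task["task_id"], "status": "done"}))
        break
```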
8

Data-intensive interactive workflows for visual analytics / Données en masse et workflows interactifs pour la visualisation analytique

Khemiri, Wael 12 December 2011 (has links)
The increasing amount of electronic data of all forms, produced by humans (e.g. Web pages, structured content such as Wikipedia or the blogosphere, etc.) and/or automatic tools (loggers, sensors, Web services, scientific programs or analysis tools, etc.), leads to a situation of unprecedented potential for extracting new knowledge, finding new correlations, or simply making sense of the data. Visual analytics aims at combining interactive data visualization with data analysis tasks. Given the explosion in volume and complexity of scientific data, e.g. data associated with biological or physical processes or social networks, visual analytics is called to play an important role in scientific data management. Most visual analytics platforms, however, are memory-based and are therefore limited in the volume of data they can handle. Moreover, each new algorithm (e.g. for clustering) has to be integrated into the platform by hand. Finally, they lack the capability to define and deploy well-structured processes in which users with different roles interact in a coordinated way, sharing the same data and possibly the same visualizations. This work sits at the convergence of three research areas: information visualization, database query processing and optimization, and workflow modeling. It provides two main contributions: (i) we propose a generic architecture for deploying a visual analytics platform on top of a database management system (DBMS); (ii) we show how to propagate data changes to the DBMS and to the visualizations through the workflow process. Our approach has been implemented in a prototype called EdiFlow and validated through several applications. It clearly demonstrates that visual analytics applications can benefit from the robust storage and automatic process deployment provided by the DBMS while obtaining good performance, and thus scalability. Conversely, it could also be integrated into a data-intensive scientific workflow platform in order to extend its visualization features.
9

Methods for Modeling and Analyzing Concurrent Software

Zeng, Reng 02 July 2013 (has links)
Concurrent software executes multiple threads or processes to achieve high performance. However, concurrency results in a huge number of different system behaviors that are difficult to test and verify. The aim of this dissertation is to develop new methods and tools for modeling and analyzing concurrent software systems at the design and code levels. The dissertation consists of several related results. First, a formal Petri-net model of Mondex, an electronic purse system, is built from user requirements and formally verified using model checking. Second, Petri net models are automatically mined from event traces generated by scientific workflows. Third, partial-order models are automatically extracted from instrumented executions of concurrent programs, and potential atomicity-violation bugs are automatically verified against these models using model checking. Our formal specification and verification of Mondex contribute to the worldwide effort to develop a verified software repository. Our method for mining Petri net models automatically from provenance offers a new approach to building scientific workflows. Our dynamic prediction tool, named McPatom, can predict several known bugs in real-world systems, including one that evades several other existing tools. McPatom is efficient and scalable because it takes advantage of the nature of atomicity violations, considering only a pair of threads and accesses to a single shared variable at a time. Predictive tools, however, must weigh precision against coverage. Building on McPatom, this dissertation presents two methods for improving the coverage and precision of atomicity-violation prediction: 1) a post-prediction analysis that increases coverage while preserving precision, and 2) a follow-up replaying method that further increases coverage. Both methods are implemented in a fully automatic tool.
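The single-variable, two-thread atomicity violations that McPatom targets follow a check-then-act pattern that a short Python example can make concrete. The bank-balance scenario below is an assumption chosen for illustration, not an example from the dissertation.

```python
import threading, time

balance = 100          # the single shared variable the analysis focuses on
lock = threading.Lock()

def withdraw_unsafe(amount):
    """Check-then-act without holding a lock across both steps: an atomicity violation."""
    global balance
    if balance >= amount:          # access 1: read the shared variable
        time.sleep(0.01)           # widen the window so the bad interleaving is visible
        balance -= amount          # access 2: write (another thread may have run in between)

def withdraw_safe(amount):
    global balance
    with lock:                     # the check and the update form one atomic block
        if balance >= amount:
            balance -= amount

threads = [threading.Thread(target=withdraw_unsafe, args=(80,)) for _ in range(2)]
for t in threads: t.start()
for t in threads: t.join()
print("balance after two unsafe withdrawals of 80:", balance)   # typically -60: overdrawn
```

Replacing withdraw_unsafe with withdraw_safe keeps the balance non-negative, because the pair of accesses then executes atomically.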
10

Abordagem algébrica para seleção de clones ótimos em projetos genomas e metagenomas / Algebraic approach to optimal clone selection in genomics and metagenomics projects.

Cantão, Mauricio Egidio 01 December 2009 (has links)
Owing to the wide diversity of unknown microorganisms in the environment, 99% of them cannot be grown in the traditional culture media used in laboratories. Metagenomics projects are therefore proposed to study microbial communities present in the environment using molecular techniques, especially sequencing, and an accumulation of sequences produced by these projects is expected over the coming years. The sequences produced by genomics and metagenomics projects present several challenges for processing, storage and analysis, one example being the search for clones containing genes of interest. This work presents an algebraic approach, based on process algebra, that dynamically defines and manages the rules for selecting clones from genomic and metagenomic libraries. In addition, a web interface was developed to allow researchers to easily create and execute their own clone-selection rules over genomic and metagenomic sequence databases. The software was tested on genomic and metagenomic libraries and was able to select clones containing genes of interest.
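The rule-composition idea can be pictured as combining selection predicates over clone records. The sketch below uses plain boolean combinators in Python as a simplification of the process-algebra rules described above; the clone fields and rule names are assumptions.

```python
class Rule:
    """A selection rule over clone records, composable with & (and) and | (or)."""
    def __init__(self, predicate):
        self.predicate = predicate
    def __call__(self, clone):
        return self.predicate(clone)
    def __and__(self, other):
        return Rule(lambda c: self(c) and other(c))
    def __or__(self, other):
        return Rule(lambda c: self(c) or other(c))

has_gene = lambda name: Rule(lambda c: name in c["genes"])      # clone annotated with a gene
min_length = lambda n: Rule(lambda c: c["length"] >= n)          # minimum insert length

clones = [
    {"id": "c1", "genes": ["lipase"], "length": 950},
    {"id": "c2", "genes": ["cellulase", "lipase"], "length": 1400},
    {"id": "c3", "genes": ["cellulase"], "length": 700},
]

rule = (has_gene("lipase") | has_gene("cellulase")) & min_length(1000)
print([c["id"] for c in clones if rule(c)])   # -> ['c2']
```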
