91

The Multi-tiered Future of Storage: Understanding Cost and Performance Trade-offs in Modern Storage Systems

Iqbal, Muhammad Safdar 19 September 2017 (has links)
In the last decade, the landscape of storage hardware and software has changed considerably. Storage hardware has diversified from hard disk drives and solid state drives to include persistent memory (PMEM) devices such as phase change memory (PCM) and Flash-backed DRAM. On the software side, the increasing adoption of cloud services for building and deploying consumer and enterprise applications is driving the use of cloud storage services. Cloud providers have responded by offering a plethora of storage services, each of which has unique performance characteristics and pricing. We argue that this variety represents an opportunity for modern storage systems and can be leveraged to reduce their operational costs. We propose storage tiering as an effective technique for balancing operational or deployment costs and performance in such modern storage systems. We demonstrate this via three key techniques. First, THMCache leverages tiering to conserve the lifetime of PMEM devices, thereby saving hardware upgrade costs. Second, CAST leverages tiering between multiple types of cloud storage to deliver higher utility (i.e., performance per unit of cost) for cloud tenants. Third, we propose a dynamic pricing scheme for cloud storage services that leverages tiering to increase the cloud provider's profit or offset their management costs. / Master of Science
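
The notion of utility as performance per unit of cost lends itself to a simple selection rule. The sketch below is illustrative only and is not taken from the thesis; the tier names, prices, and latencies are invented.

```python
# Illustrative only: choose the cloud storage tier with the highest utility
# (performance per unit of cost) that still meets a latency requirement.
# Tier names, prices, and latencies below are made up for the example.

TIERS = {
    "premium_ssd": {"cost_per_gb_month": 0.30, "avg_latency_ms": 1.0},
    "standard_ssd": {"cost_per_gb_month": 0.10, "avg_latency_ms": 2.0},
    "standard_hdd": {"cost_per_gb_month": 0.045, "avg_latency_ms": 12.0},
}

def utility(tier: dict) -> float:
    """Performance per unit of cost: a throughput proxy (inverse latency)
    divided by the monthly cost per GB."""
    return (1.0 / tier["avg_latency_ms"]) / tier["cost_per_gb_month"]

def pick_tier(max_latency_ms: float) -> str:
    """Return the name of the admissible tier with the highest utility."""
    admissible = {name: t for name, t in TIERS.items()
                  if t["avg_latency_ms"] <= max_latency_ms}
    if not admissible:
        raise ValueError("no tier meets the latency requirement")
    return max(admissible, key=lambda name: utility(admissible[name]))

if __name__ == "__main__":
    # The fastest tier is not picked: the cheaper SSD has better utility.
    print(pick_tier(max_latency_ms=5.0))   # -> "standard_ssd"
```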
92

Model-driven development of information systems

Wang, Chen-Wei January 2012 (has links)
The research presented in this thesis is aimed at developing reliable information systems through the application of model-driven and formal techniques. These are techniques in which a precise, formal model of system behaviour is exploited as source code. As such a model may be more abstract, and more concise, than source code written in a conventional programming language, it should be easier and more economical to create, to analyse, and to change. The quality of the system model can be assured through certain kinds of formal analysis, and the model corrected accordingly if necessary. Most valuably, the model serves as the basis for the automated generation or configuration of a working system. This thesis provides four research contributions. The first involves the analysis of a proposed modelling language targeted at the model-driven development of information systems. Logical properties of the language are derived, as are properties of its compiled form, a guarded substitution notation. The second involves the extension of this language, and its semantics, to permit the description of workflows on information systems. Workflows described in this way may be analysed to determine, in advance of execution, the extent to which their concurrent execution may introduce the possibility of deadlock or blocking: a condition that, in this context, is synonymous with a failure to achieve the specified outcome. The third contribution concerns the validation of models written in this language by adapting existing techniques of software testing to the analysis of design models. A methodology is presented for checking model consistency, on the basis of a generated test suite, against the intended requirements. The fourth and final contribution is the presentation of an implementation strategy for the language, targeted at standard, relational databases, and an argument for its correctness, based on a simple, set-theoretic semantics for structure and operations.
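
To make the idea of a guarded model concrete, the toy sketch below (not the thesis's notation or language) models each workflow step as a guard on the system state plus a state update, and checks whether any ordering of the remaining steps can complete; a blocked workflow in this sense is one where no ordering satisfies every guard.

```python
# Toy illustration: a workflow step as a guarded substitution, i.e. a
# precondition on the state plus a state update. "Blocking" here means that
# no interleaving of the remaining steps can fire all guards.

from itertools import permutations

# Each step: (name, guard(state) -> bool, effect(state) -> new state)
STEPS = [
    ("reserve_stock", lambda s: s["stock"] > 0,
     lambda s: {**s, "stock": s["stock"] - 1, "reserved": True}),
    ("take_payment", lambda s: s["reserved"],
     lambda s: {**s, "paid": True}),
    ("ship_order",   lambda s: s["paid"],
     lambda s: {**s, "shipped": True}),
]

def can_complete(state: dict, steps) -> bool:
    """True if some ordering of the steps executes with every guard satisfied."""
    for order in permutations(steps):
        s, ok = dict(state), True
        for _, guard, effect in order:
            if not guard(s):
                ok = False
                break
            s = effect(s)
        if ok:
            return True
    return False

print(can_complete({"stock": 1, "reserved": False, "paid": False,
                    "shipped": False}, STEPS))   # True
print(can_complete({"stock": 0, "reserved": False, "paid": False,
                    "shipped": False}, STEPS))   # False: every ordering blocks
```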
93

Um estudo da aplicação de técnicas de inteligência computacional e de aprendizado em máquina de mineração de processos de negócio / A study of the application of computational intelligence and machine learning techniques in business process mining

Cárdenas Maita, Ana Rocío 04 December 2015 (has links)
Mineração de processos é uma área de pesquisa relativamente recente que se situa entre mineração de dados e aprendizado de máquina, de um lado, e modelagem e análise de processos de negócio, de outro lado. Mineração de processos visa descobrir, monitorar e aprimorar processos de negócio reais por meio da extração de conhecimento a partir de logs de eventos disponíveis em sistemas de informação orientados a processos. O principal objetivo deste trabalho foi avaliar o contexto de aplicação de técnicas provenientes das áreas de inteligência computacional e de aprendizado de máquina, incluindo redes neurais artificiais (para fins de simplificação, denominadas no restante deste texto apenas como "redes neurais") e máquinas de vetores de suporte, no contexto de mineração de processos. Considerando que essas técnicas são, atualmente, as mais aplicadas em tarefas de mineração de dados, seria esperado que elas também estivessem sendo majoritariamente aplicadas em mineração de processos, o que não tinha sido demonstrado na literatura recente e foi confirmado por este trabalho. Buscou-se compreender o amplo cenário envolvido na área de mineração de processos, incluindo as principais características que têm sido encontradas ao longo dos últimos dez anos em termos de: tipos de mineração de processos, tarefas de mineração de dados usadas, e técnicas usadas para resolver tais tarefas. O principal enfoque do trabalho foi identificar se as técnicas de inteligência computacional e de aprendizado de máquina realmente não estavam sendo amplamente usadas em mineração de processos, ao mesmo tempo que se buscou identificar os principais motivos para esse fenômeno. Isso foi realizado por meio de um estudo geral da área, que seguiu rigor científico e sistemático, seguido pela validação das lições aprendidas por meio de um exemplo de aplicação. Este estudo considera vários enfoques para delimitar a área: por um lado, as abordagens, técnicas, tarefas de mineração e ferramentas comumente mais usadas; e, por outro lado, veículos de publicação, universidades e pesquisadores interessados no desenvolvimento da área. Os resultados apresentam que 81% das publicações atuais seguem as abordagens tradicionais em mineração de dados. O tipo de mineração de processos com mais estudos é a descoberta, presente em 71% dos estudos primários. Os resultados deste trabalho são valiosos para profissionais e pesquisadores envolvidos no tema, e representam um grande aporte para a área. / Process mining is a relatively new research area that lies between data mining and machine learning, on one hand, and business process modeling and analysis, on the other hand. Process mining aims at discovering, monitoring and improving real business processes by extracting knowledge from event logs available in process-oriented information systems. The main objective of this master's project was to assess the application of computational intelligence and machine learning techniques, including, for example, neural networks and support vector machines, in process mining. Since these techniques are currently widely applied in data mining tasks, it would be expected that they were also widely applied in the process mining context, which had not been evidenced in recent literature and was confirmed by this work. We sought to understand the broad scenario involved in the process mining area, including the main features that have been found over the last ten years in terms of: types of process mining, data mining tasks used, and techniques applied to solving such tasks. 
The main focus of the study was to identify whether computational intelligence and machine learning techniques were indeed not being widely used in process mining, while also seeking to identify the main reasons for this phenomenon. This was accomplished through a general study of the area, conducted with scientific and systematic rigor, followed by validation of the lessons learned through an application example. This study considers various approaches to delimit the area: on the one hand, the most commonly used approaches, techniques, mining tasks, and tools; and, on the other hand, the publication vehicles, universities and researchers interested in the development of the area. The results show that 81% of current publications follow traditional approaches to data mining. The most studied type of process mining is discovery, addressed in 71% of the primary studies. These results are valuable for practitioners and researchers involved in the topic, and represent a major contribution to the area.
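
Process discovery, the most studied task in the surveyed literature, typically starts from an event log. The snippet below is a generic illustration of that starting point (a directly-follows graph built from traces) and is not drawn from the thesis; the event log is invented.

```python
# Generic illustration of process discovery: build a directly-follows graph
# from an event log, the starting point of many discovery algorithms.
# The traces below are invented.

from collections import Counter

event_log = [
    ["register", "check_credit", "approve", "ship"],
    ["register", "check_credit", "reject"],
    ["register", "check_credit", "approve", "ship"],
]

def directly_follows(log):
    """Count how often activity a is immediately followed by activity b."""
    dfg = Counter()
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg

for (a, b), n in sorted(directly_follows(event_log).items()):
    print(f"{a} -> {b}: {n}")
```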
94

Coordination fiable de services de données à base de politiques actives / Reliable coordination of data management services

Espinosa Oviedo, Javier Alfonso 28 October 2013 (has links)
Nous proposons une approche pour ajouter des propriétés non-fonctionnelles (traitement d'exceptions, atomicité, sécurité, persistance) à des coordinations de services. L'approche est basée sur un Modèle de Politiques Actives (AP Model) pour représenter les coordinations de services avec des propriétés non-fonctionnelles comme une collection de types. Dans notre modèle, une coordination de services est représentée comme un workflow composé d'un ensemble ordonné d'activités. Chaque activité est en charge d'implanter un appel à l'opération d'un service. Nous utilisons le type Activité pour représenter le workflow et ses composants (c-à-d, les activités du workflow et l'ordre entre elles). Une propriété non-fonctionnelle est représentée comme un ou plusieurs types de politiques actives, chaque politique étant composée d'un ensemble de règles événement-condition-action qui implantent un aspect d'une propriété. Les instances des entités du modèle, politique active et activité, peuvent être exécutées. Nous utilisons le type unité d'exécution pour les représenter comme des entités qui passent par différents états au cours de l'exécution. Lorsqu'une politique active est associée à une ou plusieurs unités d'exécution, les règles vérifient si l'unité d'exécution respecte la propriété non-fonctionnelle implantée en évaluant leurs conditions sur leurs états d'exécution. Lorsqu'une propriété n'est pas vérifiée, les règles exécutent leurs actions pour renforcer la propriété en cours d'exécution. Nous avons aussi proposé un Moteur d'exécution de politiques actives pour exécuter un workflow orienté politiques actives modélisé en utilisant notre AP Model. Le moteur implante un modèle d'exécution qui détermine comment les instances d'une AP, d'une règle et d'une activité interagissent entre elles pour ajouter des propriétés non-fonctionnelles (NFP) à un workflow en cours d'exécution. Nous avons validé le modèle AP et le moteur d'exécution de politiques actives en définissant des types de politiques actives pour adresser le traitement d'exceptions, l'atomicité, le traitement d'état, la persistance et l'authentification. Ces types de politiques actives ont été utilisés pour implanter des applications à base de services fiables, et pour intégrer les données fournies par des services à travers des mashups. / We propose an approach for adding non-functional properties (exception handling, atomicity, security, persistence) to services' coordinations. The approach is based on an Active Policy Model (AP Model) for representing services' coordinations with non-functional properties as a collection of types. In our model, a services' coordination is represented as a workflow composed of an ordered set of activities, each activity in charge of implementing a call to a service's operation. We use the type Activity for representing a workflow and its components (i.e., the workflow's activities and the order among them). A non-functional property is represented as one or several Active Policy types, each policy composed of a set of event-condition-action rules in charge of implementing an aspect of the property. Instances of active policy and activity types are considered in the model as entities that can be executed. We use the Execution Unit type for representing them as entities that go through a series of states at runtime. 
When an active policy is associated with one or several execution units, its rules verify whether each unit respects the implemented non-functional property by evaluating their conditions over the execution unit's state, and when the property is not verified, the rules execute their actions to enforce the property at runtime. We also proposed a proof-of-concept Active Policy Execution Engine for executing an active-policy-oriented workflow modelled using our AP Model. The engine implements an execution model that determines how AP, Rule and Activity instances interact with each other for adding non-functional properties (NFPs) to a workflow at execution time. We validated the AP Model and the Active Policy Execution Engine by defining active policy types for addressing exception handling, atomicity, state management, persistence and authentication properties. These active policy types were used for implementing reliable service-oriented applications, and mashups for integrating data from services.
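
The toy sketch below illustrates the event-condition-action pattern the abstract describes; it is not the AP Model's implementation, and the execution-unit states, retry limit, and rule are invented for the example.

```python
# Toy event-condition-action (ECA) rule attached to an execution unit: when an
# event occurs, check a condition on the unit's state and, if the property is
# violated, run an action to enforce it. All names and states are invented.

class ExecutionUnit:
    def __init__(self, name):
        self.name = name
        self.state = "created"      # created -> running -> (completed | failed)
        self.retries = 0

def on_event(event, unit):
    """One ECA rule enforcing a simple atomicity-like property: retry a failed
    activity up to twice, otherwise compensate the whole unit."""
    if event == "activity_failed":             # event
        if unit.retries < 2:                   # condition
            unit.retries += 1                  # action: re-execute the activity
            unit.state = "running"
            print(f"{unit.name}: retrying (attempt {unit.retries})")
        else:
            unit.state = "compensated"         # action: undo the work done so far
            print(f"{unit.name}: compensating after repeated failures")

unit = ExecutionUnit("book_flight")
for _ in range(3):
    on_event("activity_failed", unit)
print(unit.state)   # compensated
```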
95

Um modelo de workflow científico para o refinamento da estrutura 3D aproximada de proteínas / A scientific workflow model for refining the approximate 3D structure of proteins

Soletti, Leonardo Veronese 30 March 2016 (has links)
Conselho Nacional de Pesquisa e Desenvolvimento Científico e Tecnológico - CNPq / As a consequence of the post-genomic era, an explosion of information and numerous discoveries have made large amounts of biological data available. Even with technological advances in protein structure prediction techniques, it is still not possible to find a tool that predicts the exact three-dimensional structure of a given protein with precision. This brings new challenges, ranging from how to understand and organize these resources to the sharing and reuse of successful experiments, as well as how to provide interoperability between data from different sources, not to mention the diversity of tools and user profiles. This kind of data flow is regularly addressed with command-line scripts, which require users to have programming skills. Such scripts make it difficult to intervene in, collect, and store data during execution. Furthermore, these scripts can be very complex, leading to difficulties in implementation, maintenance and reuse. Another problem that arises when a set of tasks is carried out through scripts is the possibility of skipping a step in the process or running steps in the wrong order, leading to inconsistent results. Techniques and tools are therefore needed to organize this process as a sequence of steps characterized by a workflow, thus automating it. In this context, we sought to develop a scientific workflow model using bioinformatics tools and biology expertise to automate the refinement of the polypeptide structures predicted by the CReF method. Once the refinement scripts were automated, it was possible to increase the number of experiments while maintaining an acceptable quality criterion. Finally, a web interface was developed that facilitates the visualization of the results in an organized way. / Com o advento da era pós-genômica surge, como consequência, uma explosão de informações onde inúmeras descobertas geram grande quantidade de dados biológicos. Mesmo com o avanço da tecnologia nas técnicas de predição de estruturas de proteínas, não é possível ainda se encontrar uma ferramenta capaz de predizer com precisão exata a estrutura 3D de proteínas. Em decorrência disso, surgem novos desafios para entender e organizar esses recursos nas pesquisas, o compartilhamento e reuso de experimentos bem-sucedidos, assim como prover interoperabilidade entre dados e ferramentas de diferentes locais e utilizados por usuários com perfis distintos. As atividades de estudos do fluxo destes dados, inicialmente, baseiam-se em scripts que auxiliam na entrada, processamento e resultado final da análise, normalmente executados por linha de comando, o que obriga seus usuários a terem domínio de algoritmos e lógica de programação. Tais scripts apresentam problemas em interferir, coletar e armazenar dados ao longo de sua execução, e podem ser muito complexos, ocasionando dificuldades de implementação, manutenção e reuso. Outro problema é quando um conjunto de tarefas a serem realizadas através de scripts pode ter o risco de faltar algum passo no processo ou não ser executado na ordem certa, obtendo-se com isso resultados não satisfatórios. Tornam-se necessárias técnicas e ferramentas que facilitem esse processo, de maneira organizada, como uma sequência de etapas caracterizadas por um fluxo de execução, automatizando-se assim este processo. Neste contexto, buscou-se desenvolver um modelo de workflow científico utilizando-se ferramentas de bioinformática e conhecimentos da biologia para automatizar o processo de refinamento de proteínas, do polipeptídio predito pelo método CReF. Os scripts do processo de refinamento foram automatizados, com isso foi possível aumentar a quantidade de experimentos, mantendo um critério de qualidade aceitável. Para o resultado final do processo, desenvolveu-se uma interface web que facilita a visualização dos resultados de uma forma organizada.
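
As a purely hypothetical sketch of the kind of automation described above, the snippet below chains refinement steps into an ordered workflow instead of loose shell scripts; the step names and commands are invented and do not reproduce the CReF refinement pipeline.

```python
# Hypothetical sketch: chain refinement steps into an ordered workflow so no
# stage can be skipped or run out of order. Step names and commands are
# placeholders, not the actual CReF pipeline.

import subprocess
from dataclasses import dataclass

@dataclass
class Step:
    name: str
    command: list  # command line to run, e.g. ["minimize_energy", "in.pdb", "out.pdb"]

def run_workflow(steps):
    """Run each step in order, stopping at the first failure."""
    for step in steps:
        print(f"running {step.name}: {' '.join(step.command)}")
        result = subprocess.run(step.command, capture_output=True, text=True)
        if result.returncode != 0:
            raise RuntimeError(f"step '{step.name}' failed:\n{result.stderr}")

workflow = [
    Step("side_chain_repacking", ["repack_side_chains", "predicted.pdb", "repacked.pdb"]),
    Step("energy_minimization", ["minimize_energy", "repacked.pdb", "minimized.pdb"]),
    Step("quality_assessment", ["assess_model", "minimized.pdb", "report.txt"]),
]
# run_workflow(workflow)  # not executed here: the commands above are placeholders
```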
96

Designing scientific workflow following a structure and provenance-aware strategy / Conception de workflows scientifiques fondée sur la structure et la provenance

Chen, Jiuqiang 11 October 2013 (has links)
Les expériences bioinformatiques sont généralement effectuées à l'aide de workflows scientifiques dans lesquels les tâches sont enchaînées les unes aux autres pour former des structures de graphes très complexes et imbriquées. Les systèmes de workflows scientifiques ont ensuite été développés pour guider les utilisateurs dans la conception et l'exécution de workflows. Un avantage de ces systèmes par rapport aux approches traditionnelles est leur capacité à mémoriser automatiquement la provenance (ou lignage) des produits de données intermédiaires et finaux générés au cours de l'exécution du workflow. La provenance d'un produit de données contient des informations sur la façon dont le produit est dérivé, et est cruciale pour permettre aux scientifiques de comprendre, reproduire, et vérifier les résultats scientifiques facilement. Pour plusieurs raisons, la complexité du workflow et des structures d'exécution du workflow est en augmentation au fil du temps, ce qui a un impact évident sur la réutilisation des workflows scientifiques.L'objectif global de cette thèse est d'améliorer la réutilisation des workflows en fournissant des stratégies visant à réduire la complexité des structures de workflow tout en préservant la provenance. Deux stratégies sont introduites. Tout d'abord, nous proposons une approche de réécriture de la structure du graphe de n'importe quel workflow scientifique (classiquement représentée comme un graphe acyclique orienté (DAG)) dans une structure plus simple, à savoir une structure série-parallèle (SP) tout en préservant la provenance. Les SP-graphes sont simples et bien structurés, ce qui permet de mieux distinguer les principales étapes du workflow. En outre, d'un point de vue plus formel, on peut utiliser des algorithmes polynomiaux pour effectuer des opérations complexes fondées sur les graphiques (par exemple, la comparaison de workflows, ce qui est directement lié au problème d’homomorphisme de sous-graphes) lorsque les workflows ont des SP-structures alors que ces opérations sont reliées à des problèmes NP-hard pour des graphes qui sont des DAG sans aucune restriction sur leur structure. Nous avons introduit la notion de préservation de la provenance, conçu l’algorithme de réécriture SPFlow et réalisé l’outil associé.Deuxièmement, nous proposons une méthodologie avec une technique capable de réduire la redondance présente dans les workflow (en supprimant les occurrences inutiles de tâches). Plus précisément, nous détectons des « anti-modèles », un terme largement utilisé dans le domaine de la conception de programme, pour indiquer l'utilisation de formes idiomatiques qui mènent à une conception trop compliquée, et qui doit donc être évitée. Nous avons ainsi conçu l'algorithme DistillFlow qui est capable de transformer un workflow donné en un workflow sémantiquement équivalent «distillé», c’est-à-dire, qui est libre ou partiellement libre des anti-modèles et possède une structure plus concise et plus simple. Les deux principales approches de cette thèse (à savoir, SPFlow et DistillFlow) sont basées sur un modèle de provenance que nous avons introduit pour représenter la structure de la provenance des exécutions du workflowl. La notion de «provenance-équivalence» qui détermine si deux workflows ont la même signification est également au centre de notre travail. Nos solutions ont été testées systématiquement sur de grandes collections de workflows réels, en particulier avec le système Taverna. Nos outils sont disponibles à l'adresse: https://www.lri.fr/~chenj/. 
/ Bioinformatics experiments are usually performed using scientific workflows in which tasks are chained together forming very intricate and nested graph structures. Scientific workflow systems have then been developed to guide users in the design and execution of workflows. An advantage of these systems over traditional approaches is their ability to automatically record the provenance (or lineage) of intermediate and final data products generated during workflow execution. The provenance of a data product contains information about how the product was derived, and it is crucial for enabling scientists to easily understand, reproduce, and verify scientific results. For several reasons, the complexity of workflow and workflow execution structures is increasing over time, which has a clear impact on the reuse of scientific workflows. The global aim of this thesis is to enhance workflow reuse by providing strategies to reduce the complexity of workflow structures while preserving provenance. Two strategies are introduced. First, we propose an approach to rewrite the graph structure of any scientific workflow (classically represented as a directed acyclic graph (DAG)) into a simpler structure, namely, a series-parallel (SP) structure, while preserving provenance. SP-graphs are simple and layered, making the main phases of a workflow easier to distinguish. Additionally, from a more formal point of view, polynomial-time algorithms for performing complex graph-based operations (e.g., comparing workflows, which is directly related to the problem of subgraph homomorphism) can be designed when workflows have SP-structures, whereas such operations are related to an NP-hard problem for DAG structures without any restriction on their shape. The SPFlow rewriting and provenance-preserving algorithm and its associated tool are thus introduced. Second, we provide a methodology together with a technique able to reduce the redundancy present in workflows (by removing unnecessary occurrences of tasks). More precisely, we detect "anti-patterns", a term broadly used in program design to indicate the use of idiomatic forms that lead to over-complicated design, and which should therefore be avoided. We thus provide the DistillFlow algorithm able to transform a workflow into a distilled, semantically-equivalent workflow, which is free or partly free of anti-patterns and has a more concise and simpler structure. The two main approaches of this thesis (namely, SPFlow and DistillFlow) are based on a provenance model that we have introduced to represent the provenance structure of workflow executions. The notion of provenance-equivalence, which determines whether two workflows have the same meaning, is also at the center of our work. Our solutions have been systematically tested on large collections of real workflows, especially from the Taverna system. Our approaches are available for use at https://www.lri.fr/~chenj/.
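
To make the series-parallel notion concrete, the sketch below checks whether a small two-terminal DAG is series-parallel by exhaustively applying series and parallel reductions. It is only an illustration of the SP property, not the SPFlow rewriting algorithm, and the example graphs are invented.

```python
# Illustrative check: a two-terminal DAG is series-parallel iff repeated series
# and parallel reductions collapse it to a single source->sink edge.
# Edges are kept as a multiset (Counter) so parallel edges can be represented.

from collections import Counter

def is_series_parallel(edges, source, sink):
    g = Counter(edges)  # multiset of directed edges (u, v)
    changed = True
    while changed:
        changed = False
        # Parallel reduction: collapse duplicate edges between the same pair.
        for e, n in list(g.items()):
            if n > 1:
                g[e] = 1
                changed = True
        # Series reduction: splice out an internal vertex with exactly one
        # incoming and one outgoing edge.
        nodes = {u for u, _ in g} | {v for _, v in g}
        for x in nodes - {source, sink}:
            ins = [(u, v) for (u, v) in g if v == x]
            outs = [(u, v) for (u, v) in g if u == x]
            if len(ins) == 1 and len(outs) == 1 and g[ins[0]] == 1 and g[outs[0]] == 1:
                del g[ins[0]]
                del g[outs[0]]
                g[(ins[0][0], outs[0][1])] += 1
                changed = True
                break
    return set(g) == {(source, sink)} and g[(source, sink)] == 1

# Two parallel branches form an SP graph; adding a shortcut between the
# branches creates the smallest non-SP pattern.
sp = [("s", "a"), ("a", "t"), ("s", "b"), ("b", "t")]
non_sp = [("s", "a"), ("s", "b"), ("a", "b"), ("a", "t"), ("b", "t")]
print(is_series_parallel(sp, "s", "t"))      # True
print(is_series_parallel(non_sp, "s", "t"))  # False
```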
98

Portable Tools for Interoperable Grids: Modular Architectures and Software for Job and Workflow Management

Tordsson, Johan January 2009 (has links)
The emergence of Grid computing infrastructures enables researchers to share resources and collaborate in more efficient ways than before, despite belonging to different organizations and being geographically distributed. While the Grid computing paradigm offers new opportunities, it also gives rise to new difficulties. This thesis investigates methods, architectures, and algorithms for a range of topics in the area of Grid resource management. One studied topic is how to automate and improve resource selection, despite heterogeneity in Grid hardware, software, availability, ownership, and usage policies. Algorithmic difficulties for this are, e.g., characterization of jobs and resources, prediction of resource performance, and data placement considerations. Investigated Quality of Service aspects of resource selection include how to guarantee job start and/or completion times as well as how to synchronize multiple resources for coordinated use through coallocation. Another explored research topic is architectural considerations for frameworks that simplify and automate submission, monitoring, and fault handling for large amounts of jobs. This thesis also investigates suitable Grid interaction patterns for scientific workflows, studies programming models that enable data parallelism for such workflows, and analyzes how workflow composition tools should be designed to increase flexibility and expressiveness. We today have the somewhat paradoxical situation where Grids, originally aimed to federate resources and overcome interoperability problems between different computing platforms, themselves struggle with interoperability problems caused by the wide range of interfaces, protocols, and data formats that are used in different environments. This thesis demonstrates how proof-of-concept software tools for Grid resource management can, by using (proposed) standard formats and protocols as well as leveraging state-of-the-art principles from service-oriented architectures, be made independent of current Grid infrastructures. Further interoperability contributions include an in-depth study that surveys issues related to the use of Grid resources in scientific workflows. This study improves our understanding of interoperability among scientific workflow systems by viewing this topic from three different perspectives: model of computation, workflow language, and execution environment. A final contribution in this thesis is the investigation of how the design of Grid middleware tools can adopt principles and concepts from software engineering in order to improve, e.g., adaptability and interoperability.
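
As a purely illustrative sketch of the resource-selection problem described above (not the thesis's actual brokering algorithm), the snippet below ranks candidate resources for a job by a predicted completion time built from an estimated queue wait and a relative CPU speed; all names and numbers are invented.

```python
# Illustrative resource selection: pick the eligible Grid resource with the
# earliest predicted completion time (queue wait + runtime scaled by relative
# CPU speed). Resource names and figures are invented.

from dataclasses import dataclass

@dataclass
class Resource:
    name: str
    relative_speed: float      # >1.0 means faster than the reference machine
    queue_wait_s: float        # predicted wait before the job starts
    free_cpus: int

@dataclass
class Job:
    cpus: int
    runtime_s: float           # runtime on the reference machine

def predicted_completion(job: Job, r: Resource) -> float:
    return r.queue_wait_s + job.runtime_s / r.relative_speed

def select_resource(job: Job, resources) -> Resource:
    """Return the eligible resource with the earliest predicted completion."""
    eligible = [r for r in resources if r.free_cpus >= job.cpus]
    if not eligible:
        raise RuntimeError("no resource satisfies the job's CPU requirement")
    return min(eligible, key=lambda r: predicted_completion(job, r))

resources = [
    Resource("cluster-a", relative_speed=1.0, queue_wait_s=600, free_cpus=64),
    Resource("cluster-b", relative_speed=2.0, queue_wait_s=1800, free_cpus=128),
]
job = Job(cpus=32, runtime_s=3600)
print(select_resource(job, resources).name)   # cluster-b: 1800 + 1800 < 600 + 3600
```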
99

Inversion-based petrophysical interpretation of logging-while-drilling nuclear and resistivity measurements

Ijasan, Olabode 01 October 2013 (has links)
Undulating well trajectories are often drilled to improve length exposure to rock formations, target desirable hydrocarbon-saturated zones, and enhance resolution of borehole measurements. Despite these merits, undulating wells can introduce adverse conditions to the interpretation of borehole measurements that are seldom observed in vertical wells penetrating horizontal layers. Common examples are polarization horns observed across formation bed boundaries in borehole resistivity measurements acquired in highly deviated wells. Consequently, conventional interpretation practices developed for vertical wells can yield inaccurate results in high-angle and horizontal (HA/HZ) wells. A reliable approach to account for well trajectory and bed-boundary effects in the petrophysical interpretation of well logs is the application of forward and inverse modeling techniques because of their explicit use of measurement response functions. The main objective of this dissertation is to develop inversion-based petrophysical interpretation methods that quantitatively integrate logging-while-drilling (LWD) multi-sector nuclear (i.e., density, neutron porosity, photoelectric factor, natural gamma ray) and multi-array propagation resistivity measurements. Under the assumption of a multi-layer formation model, the inversion approach estimates formation properties specific to a given measurement domain by numerically reproducing the available measurements. Subsequently, compositional multi-mineral analysis of inverted layer-by-layer properties is implemented for volumetric estimation of rock and fluid constituents. The most important prerequisite for efficient petrophysical inversion is fast and accurate forward models that incorporate specific measurement response functions for numerical simulation of LWD measurements. In the nuclear measurement domain, first-order perturbation theory and flux sensitivity functions (FSFs) are reliable and accurate for rapid numerical simulation. Albeit efficient, these first-order approximations can be inaccurate when modeling neutron porosity logs, especially in the presence of borehole environmental effects (tool standoff and/or invasion) and across highly contrasting beds and complex formation geometries. Accordingly, a secondary thrust of this dissertation is the introduction of two new methods for improving the accuracy of rapid numerical simulation of LWD neutron porosity measurements. The two methods include: (1) a neutron-density petrophysical parameterization approach for describing formation macroscopic cross section, and (2) a one-group neutron diffusion flux-difference method for estimating perturbed spatial neutron porosity fluxes. Both methods are validated with full Monte Carlo (MC) calculations of spatial neutron detector FSFs and subsequent simulations of neutron porosity logs in the presence of LWD azimuthal standoff, invasion, and highly dipping beds. Analysis of field and synthetic verification examples with the combined resistivity-nuclear inversion method confirms that inversion-based estimation of hydrocarbon pore volume in HA/HZ wells is more accurate than conventional well-log analysis. Estimated hydrocarbon pore volume from conventional analysis can give rise to errors as high as 15% in undulating HA/HZ intervals.
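
The inversion loop the abstract outlines, adjusting layer properties until a forward model reproduces the measured logs, can be sketched in a few lines. The example below is a toy with an invented linear "tool response" and made-up density values; it is not the dissertation's simulator or its parameterization.

```python
# Toy illustration of inversion-based interpretation: estimate layer properties
# by iteratively adjusting them until a forward model reproduces the measured
# logs. The forward model here is a made-up linear blur across layer
# boundaries, standing in for a tool response function.

import numpy as np

rng = np.random.default_rng(0)

n_layers = 5
true_density = np.array([2.65, 2.40, 2.20, 2.55, 2.30])   # g/cc, invented

# Made-up "tool response": each measurement averages neighbouring layers,
# mimicking shoulder-bed effects across boundaries.
G = np.zeros((n_layers, n_layers))
for i in range(n_layers):
    for j in range(n_layers):
        G[i, j] = np.exp(-abs(i - j))
G /= G.sum(axis=1, keepdims=True)

measurements = G @ true_density + rng.normal(0, 0.005, n_layers)  # noisy logs

# For a linear forward model a Gauss-Newton update reduces to one least-squares
# solve; we iterate anyway to show the generic simulate-compare-update loop.
estimate = np.full(n_layers, 2.5)
for _ in range(20):
    residual = measurements - G @ estimate          # data misfit
    update = np.linalg.lstsq(G, residual, rcond=None)[0]
    estimate = estimate + update

print(np.round(estimate, 3))   # close to true_density, up to the noise level
```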
100

Δυναμική ανάθεση υπολογιστικών πόρων και συντονισμός εκτέλεσης πολύπλοκων διαδικασιών ανάλυσης δεδομένων σε υποδομή Cloud / Dynamic allocation of computational resources and workflow orchestration for data analysis in the Cloud

Σφήκα, Νίκη 10 June 2015 (has links)
Το Υπολογιστικό Νέφος (Cloud Computing) χαρακτηρίζεται ως το νέο μοντέλο ανάπτυξης λογισμικού και παροχής υπηρεσιών στον τομέα των Τεχνολογιών Πληροφορικής και Επικοινωνιών. Τα κύρια χαρακτηριστικά του είναι η κατά απαίτηση διάθεση υπολογιστικών πόρων, η απομακρυσμένη πρόσβαση σε αυτούς μέσω διαδικτύου και η ευελιξία των παρεχόμενων υπηρεσιών. Η ευελιξία επιτρέπει την αναβάθμιση ή υποβάθμιση των υπολογιστικών πόρων σύμφωνα με τις απαιτήσεις του τελικού χρήστη. Επιπλέον, η συνεχής αύξηση του μεγέθους της παραγόμενης από διάφορες πηγές πληροφορίας (διαδίκτυο, επιστημονικά πειράματα) έχει δημιουργήσει μία τεράστια ποσότητα πολύπλοκων και διάχυτων ψηφιακών δεδομένων . Η απόσπαση χρήσιμης γνώσης από μεγάλου όγκου ψηφιακά δεδομένα απαιτεί έξυπνες και ευκόλως επεκτάσιμες υπηρεσίες ανάλυσης, εργαλεία προγραμματισμού και εφαρμογές. Επομένως, η δυνατότητα της ελαστικότητας και της επεκτασιμότητας έχει κάνει το Υ-πολογιστικό Νέφος να είναι μια αναδυόμενη τεχνολογία αναφορικά με τις αναλύσεις μεγάλου όγκου δεδομένων οι οποίες απαιτούν παραλληλισμό, πολύπλοκες ροές ανάλυσης και υψηλό υπολογιστικό φόρτο εργασίας. Για την καλύτερη δυνατή διαχείριση πολύπλοκων αναλύσεων και ενορχήστρωση των απαιτούμενων διαδικασιών, είναι απαραίτητη η ένθεση ροών εργασιών. Μια ροή εργασίας είναι ένα οργανωμένο σύνολο ενεργειών που πρέπει να πραγματοποιηθούν για να επιτευχθεί μια εμπορική ή ερευνητική διεργασία, καθώς και οι μεταξύ τους εξαρτήσεις αφού κάθε ενέργεια αποτελείται από ορισμένα βήματα που πρέπει να εκτελεστούν σε συγκεκριμένη σειρά. Στην παρούσα μεταπτυχιακή διπλωματική εργασία δημιουργήθηκε ένα σύστημα για τη δυναμική διαχείριση των προσφερόμενων πόρων σε μια υποδομή Υπολογιστικού Νέφους και την εκτέλεση κατανεμημένων υλοποιήσεων υπολογιστικής ανάλυσης δεδομένων. Συγκεκριμένα, η εφαρμογή, αφού λάβει από το χρήστη τα δεδομένα εισόδου για την έναρξη μιας νέας διαδικασίας ανάλυσης, εξετάζει τα δεδομένα των επιστημονικών προβλημάτων καθώς και την πολυπλοκότητά τους και παρέχει δυναμικά και αυτόματα τους αντίστοιχους υπολογιστικούς πόρους για την εκτέλεση της αντίστοιχης λειτουργίας ανάλυσής τους. Επίσης, επιτρέπει την καταγραφή της ανάλυσης και αναθέτει τον συντονισμό της διαδικασίας σε αντίστοιχες ροές εργασιών ώστε να διευκολυνθεί η ενορχήστρωση των παρεχόμενων πόρων και η παρακολούθηση της εκτέλεσης της υπολογιστικής διαδικασίας. Η συγκεκριμένη μεταπτυχιακή εργασία, με τη χρήση τόσο των παρεχόμενων υπηρεσιών μιας υποδομής Υπολογιστικού Νέφους όσο και των δυνατοτήτων που παρέχουν οι ροές εργασιών στην διαχείριση των εργασιών, έχει σαν αποτέλεσμα να απλουστεύει την πρόσβαση, τον έλεγχο, την οργάνωση και την εκτέλεση πολύπλοκων και παράλληλων υλοποιήσεων ανάλυσης δεδομένων από την στιγμή εισαγωγής των δεδομένων από το χρήστη έως τον υπολογισμό του τελικού αποτελέσματος. Πιο αναλυτικά η διπλωματική εργασία επικεντρώθηκε στη πρόταση μιας ολοκληρωμένης λύσης για: 1. τη παροχή μιας εφαρμογής στην οποία ο χρήστης θα έχει τη δυνατότητα να εισάγεται και να ξεκινά μια σύνθετη ανάλυση δεδομένων, 2. τη δημιουργία της κατάλληλης υποδομής για τη δυναμική διάθεση πόρων από μια cloud υποδομή ανάλογα με τις ανάγκες του εκάστοτε προβλήματος και 3. την αυτοματοποιημένη εκτέλεση και συντονισμό της διαδικασίας της ανάλυσης με χρήση ροών εργασιών. Για την επικύρωση και αξιολόγηση της εφαρμογής, αναπτύχθηκε η πλατφόρμα IRaaS η οποία παρέχει στους χρήστες του τη δυνατότητα επίλυσης προβλημάτων πολλαπλών πεδίων / πολλαπλών φυσικών. 
Η πλατφόρμα IRaaS βασίστηκε πάνω στην προαναφερόμενη εφαρμογή για τη δυναμική ανάθεση υπολογιστικών πόρων και τον συντονισμό εκτέλεσης πολύπλοκων διαδικασιών ανάλυσης δεδομένων. Εκτελώντας μια σειρά αναλύσεων παρατηρήθηκε ότι η συγκεκριμένη εφαρμογή παρέχει καλύτερους χρόνους εκτέλεσης, μικρότερη δέσμευση υπολογιστικών πόρων και κατά συνέπεια μικρότερο κόστος για τις αναλύσεις. Η εγκατάσταση της πλατφόρμας IRaaS για την εκτέλεση των πειραμάτων έγινε στην υποδομή Υπολογιστικού Νέφους του εργαστηρίου Αναγνώρισης Προτύπων. Η υποδομή βασίστηκε στα λογισμικά XenServer και Cloudstack, τα οποία εγκαταστάθηκαν και παραμετροποιήθηκαν στα πλαίσια της παρούσας εργασίας. / Cloud Computing is the new software development and service provisioning model in the area of Information and Communication Technologies. The main aspects of Cloud Computing are the on-demand allocation of computational resources, the remote access to them via the Internet, and the elasticity of the provided services. Elasticity provides the capability to scale the computational resources depending on the computational needs. The continuous proliferation of data warehouses, webpages, audio and video streams, tweets, and blogs is generating a massive amount of complex and pervasive digital data. Extracting useful knowledge from huge digital datasets requires smart and scalable analytics services, programming tools, and applications. Due to its elasticity and scalability, Cloud Computing has become an emerging technology for big data analysis, which demands parallelization, complex workflow analysis and massive computational workload. In this respect, workflows have an important role in managing complex flows and orchestrating the required processes. A workflow is an orchestrated set of activities that are necessary in order to complete a commercial or scientific task, as well as any dependencies between these tasks, since each one of them can be further decomposed into finer tasks that need to be executed in a predefined order. In this thesis, a system is presented that dynamically allocates the available resources provided by a cloud infrastructure and orchestrates the execution of complex and distributed data analysis on these allocated resources. In particular, the system calculates the required computational resources (memory and CPU) based on the size of the input data and on the available resources of the cloud infrastructure, and dynamically allocates the most suitable resources. Moreover, the application offers the ability to coordinate the distributed analysis process, utilising workflows for the orchestration and monitoring of the different tasks of the computational flow execution. Taking advantage of the services provided by a cloud infrastructure as well as the functionality of workflows in task management, this thesis has resulted in simplifying access, control, coordination and execution of complex and parallel data analysis implementations, from the moment that a user enters a set of input data to the computation of the final result. In this context, this thesis focuses on a comprehensive and integrated solution that: 1. provides an application through which the user is able to log in and start a complex data analysis, 2. offers the necessary infrastructure for dynamically allocating cloud resources, based on the needs of the particular problem, and 3. executes and coordinates the analysis process automatically by leveraging workflows. 
In order to validate and evaluate the application, the IRaaS platform was developed, offering its users the ability to solve multi-domain/multi-physics problems. The IRaaS platform is based on the aforementioned system in order to enable the dynamic allocation of computational resources and to coordinate the execution of complex data analysis processes. By executing a series of experiments with different input data, we observed that the presented application resulted in improved execution times, better allocation of computational resources and, thus, lower cost. In order to perform the experiments, the IRaaS platform was set up on the cloud infrastructure of the Pattern Recognition laboratory. In the context of this thesis, a new infrastructure was installed and parameterized, based on XenServer as the virtualization hypervisor and the CloudStack platform for the creation of a private cloud infrastructure.
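
A hypothetical sketch of the sizing step described above: derive the memory and CPU to request from the size of the input data and check it against the infrastructure's free capacity. The scaling factors are invented and are not taken from the thesis or from CloudStack.

```python
# Hypothetical sizing rule: derive the memory and CPU to allocate for one
# analysis run from the input data size, capped by what the cloud
# infrastructure currently has free. The scaling factors are invented.

def required_resources(input_size_gb: float, free_cpus: int, free_ram_gb: int):
    """Return (cpus, ram_gb) to allocate for one analysis run."""
    ram_gb = max(2, int(input_size_gb * 1.5) + 1)   # assumed: ~1.5x data size in RAM
    cpus = max(1, int(input_size_gb // 4) + 1)      # assumed: ~1 core per 4 GB of data
    if ram_gb > free_ram_gb or cpus > free_cpus:
        raise RuntimeError("not enough free capacity; queue the analysis")
    return cpus, ram_gb

print(required_resources(input_size_gb=10, free_cpus=16, free_ram_gb=64))  # (3, 16)
```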
