Global ETD Search

51	In Pursuit of Optimal Workflow Within The Apache Software Foundation January 2017 abstract: The following is a case study composed of three workflow investigations at the Apache Software Foundation (Apache), an organization based on open source software development (OSSD). I start with an examination of workload inequality within Apache, particularly with regard to requirements writing. I established that the stronger a participant's experience indicators are, the more likely they are to propose a requirement that is not a defect, and the more likely that requirement is eventually implemented. Requirements at Apache are divided into work tickets (tickets). In the second investigation, I report many insights into the distribution patterns of these tickets. The participants who create tickets often had the best track records for determining who should participate in them. Tickets that were at some point volunteered for (self-assigned) had a lower incidence of neglect, but in some cases were also associated with severe delay. When a participant claims a ticket but postpones the work involved, the ticket remains without a solution five to ten times as long, depending on the circumstances. I make recommendations that may reduce the incidence of tickets that are claimed but not implemented in a timely manner. After giving an in-depth explanation of how I obtained this data set through web crawlers, I describe the pattern mining platform I developed to make my data mining efforts highly scalable and repeatable. Lastly, I used process mining techniques to show that workflow patterns vary greatly within teams at Apache. I investigated a variety of process choices and how they might influence the outcomes of OSSD projects. I report a moderately negative association between how often a team updates the specifics of a requirement and how often requirements are completed. 
I also verified that the prevalence of volunteerism indicators is positively associated with work completion; surprisingly, this correlation is stronger when the very large projects are excluded. I suggest that the largest projects at Apache may benefit from some level of traditional delegation in addition to the volunteerism normally associated with OSSD. / Dissertation/Thesis / Doctoral Dissertation Industrial Engineering 2017 Management Industrial engineering mining software repositories open source software process mining sequential pattern mining software analytics workflow mining
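Process mining of ticket workflows typically starts from a directly-follows relation: how often activity b immediately follows activity a within the same ticket. The sketch below is a minimal illustration under stated assumptions, not the dissertation's platform: it assumes events arrive as hypothetical `(ticket_id, timestamp, activity)` tuples, and the activity names are made up rather than taken from Apache's trackers.

```python
from collections import Counter, defaultdict

def directly_follows(events):
    """Count directly-follows pairs (a, b) across all tickets.

    events: iterable of (ticket_id, timestamp, activity) tuples (assumed format).
    """
    traces = defaultdict(list)
    for ticket_id, ts, activity in events:
        traces[ticket_id].append((ts, activity))
    dfg = Counter()
    for trace in traces.values():
        trace.sort()  # order each ticket's events by timestamp
        acts = [a for _, a in trace]
        for a, b in zip(acts, acts[1:]):
            dfg[(a, b)] += 1  # b happened right after a in this ticket
    return dfg

# Hypothetical ticket log: T2 has an extra "update" step before assignment
log = [
    ("T1", 1, "create"), ("T1", 2, "assign"), ("T1", 3, "resolve"),
    ("T2", 1, "create"), ("T2", 2, "update"), ("T2", 3, "assign"), ("T2", 4, "resolve"),
]
dfg = directly_follows(log)
```

Per-team counts such as how often `update` appears relative to `resolve` could then be related to completion rates, which is the kind of association the abstract reports.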
52	A Sequential Pattern Mining Driven Framework for Developing Construction Logic Knowledge Bases Le, Chau, Shrestha, Krishna J., Jeong, H. D., Damnjanovic, Ivan 01 January 2021 One vital task of a project's owner is to determine a reliable and reasonable construction time for the project. A U.S. highway agency typically uses the bar chart or critical path method to estimate project duration, which requires determining the construction logic. The current practice of activity sequencing is challenging, time-consuming, and heavily dependent on the agency schedulers' knowledge and experience. Several agencies have developed templates of repetitive projects based on expert input to save time and support schedulers in sequencing a new project. However, these templates are deterministic, dependent on expert judgment, and quickly become outdated. This study aims to improve current practice with a data-driven approach that leverages the readily available daily work report data of past projects to develop a knowledge base of construction sequence patterns. Through a novel application of sequential pattern mining, the proposed framework determines common sequential patterns among work items and proposes domain measures, such as the confidence level of applying a pattern to future projects under different project conditions. The framework also allows only the relevant sequential patterns to be extracted for future construction time estimation. construction sequence daily work report knowledge base precedence network scheduling sequential pattern mining
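To make "the confidence level of applying a pattern" concrete, the sketch below computes, for every ordered pair of work items, a support (share of projects where A precedes B) and a confidence (share of projects containing both where A precedes B). These definitions are illustrative assumptions, not the paper's exact measures, and the work-item names are hypothetical.

```python
from itertools import permutations

def precedence_stats(projects, min_support=0.5):
    """Support/confidence of 'A before B' across projects (assumed definitions).

    projects: list of lists of work-item names, in order of first appearance.
    Returns {(a, b): (support, confidence)} for pairs meeting min_support.
    """
    n = len(projects)
    items = sorted({it for p in projects for it in p})
    results = {}
    for a, b in permutations(items, 2):
        both = before = 0
        for seq in projects:
            if a in seq and b in seq:
                both += 1
                if seq.index(a) < seq.index(b):
                    before += 1
        support = before / n
        if both and support >= min_support:
            results[(a, b)] = (support, before / both)
    return results

projects = [
    ["clearing", "excavation", "base", "paving"],
    ["clearing", "base", "excavation", "paving"],
    ["clearing", "excavation", "paving"],
]
stats = precedence_stats(projects, min_support=0.6)
```

Here "excavation before base" is dropped because the two orders each occur once, which is the kind of unreliable sequence a knowledge base would want to flag rather than reuse.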
53	Data Mining and Mathematical Models for Direct Market Campaign Optimization for Fred Meyer Jewelers Lin, Lebin January 2016 No description available. Industrial Engineering Operations Research Customer Relationship Management Sequential Pattern Mining Data Mining Optimization Time Based Sequential Pattern
54	Mining numerical data with formal concept analysis and pattern structures (Traitement de données numériques par analyse formelle de concepts et structures de patrons) Kaytoue, Mehdi 22 April 2011 The main topic of this thesis is the mining of numerical data, and more particularly gene expression data. These data characterize the behaviour of thousands of genes in various biological situations (time, cell, etc.). An important problem is to establish groups of genes that share the same biological behaviour. This makes it possible to identify the genes active during a biological process, for example the genes active when an organism defends itself against an attack. The thesis is therefore set within the field of knowledge discovery from biological data. We study how the conceptual classification method known as Formal Concept Analysis (FCA) can address the problem of extracting families of genes. To this end, we have designed and experimented with several original methods based on a little-explored extension of FCA: pattern structures. More precisely, we show how to build a concept lattice that summarizes families of genes with similar behaviour. The originality of this work is (i) to build a concept lattice efficiently without prior discretization of the data, (ii) to introduce a similarity relation between genes, and (iii) to propose minimal sets of necessary and sufficient conditions that explain the groups formed. 
The results of this work also lead us to show how pattern structures can improve decision making about the hazards of agricultural practices, within the vast domain of information fusion. Knowledge discovery in databases Formal concept analysis Numerical pattern mining Biclustering Information fusion
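Building concepts "without prior discretization" is what interval pattern structures allow: each gene is described by a vector of values, the similarity (meet) of two descriptions is the per-condition convex hull of their intervals, and the extent of a description is every gene whose values fall inside all of its intervals. A minimal sketch of these two derivation operators follows; the gene names and expression values are made up, and this is not the thesis's optimized lattice construction.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]

def intent(genes: List[str], data: Dict[str, List[float]]) -> List[Interval]:
    """Meet of the genes' descriptions: per-condition [min, max] interval."""
    rows = [data[g] for g in genes]
    return [(min(col), max(col)) for col in zip(*rows)]

def extent(desc: List[Interval], data: Dict[str, List[float]]) -> List[str]:
    """All genes whose value lies inside every interval of the description."""
    return [g for g, vals in data.items()
            if all(lo <= v <= hi for v, (lo, hi) in zip(vals, desc))]

# Hypothetical expression values for three genes under three conditions
data = {
    "g1": [5.0, 7.0, 6.0],
    "g2": [6.0, 8.0, 4.0],
    "g3": [4.0, 8.0, 5.0],
}
d = intent(["g1", "g2"], data)   # shared behaviour of g1 and g2
closed = extent(d, data)          # genes matching that behaviour
```

Because `extent(intent({g1, g2}))` returns exactly `{g1, g2}`, the pair together with its interval vector forms a pattern concept; g3 falls outside the first interval and is excluded.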
55	VISUAL SEMANTIC SEGMENTATION AND ITS APPLICATIONS Gao, Jizhou 01 January 2013 This dissertation addresses the difficulties of semantic segmentation when dealing with an extensive collection of images and 3D point clouds. Due to the ubiquity of digital cameras that help capture the world around us, as well as advanced scanning techniques that can record 3D replicas of real cities, the sheer amount of visual data available presents many opportunities for both academic research and industrial applications. But the mere quantity of data also poses a tremendous challenge. In particular, the problem of distilling useful information from such a large repository of visual data has attracted ongoing interest in the fields of computer vision and data mining. Structural semantics are fundamental to understanding both natural and man-made objects. Buildings, for example, are like languages in that they are made up of repeated structures or patterns that can be captured in images. To find these recurring patterns in images, I present an unsupervised frequent visual pattern mining approach that goes beyond co-location to identify spatially coherent visual patterns, regardless of their shape, size, location and orientation. First, my approach categorizes visual items from scale-invariant image primitives with similar appearance, and then uses a suite of polynomial-time algorithms designed to identify consistent structural associations among visual items, which represent frequent visual patterns. After detecting repetitive image patterns, I use unsupervised and automatic segmentation of the identified patterns to generate more semantically meaningful representations. The underlying assumption is that pixels capturing the same portion of image patterns are visually consistent, while pixels that come from different backdrops are usually inconsistent. 
I further extend this approach to perform automatic segmentation of foreground objects from an Internet photo collection of landmark locations. New scanning technologies have successfully advanced the digital acquisition of large-scale urban landscapes. In addressing semantic segmentation and reconstruction of this data using LiDAR point clouds and geo-registered images of large-scale residential areas, I develop a complete system that simultaneously uses classification and segmentation methods to first identify different object categories and then apply category-specific reconstruction techniques to create visually pleasing and complete scene models. Automatic Segmentation Frequent Visual Pattern Mining Internet Photo Collections LiDAR Data Artificial Intelligence and Robotics Graphics and Human Computer Interfaces
56	Knowledge discovery using pattern taxonomy model in text mining Wu, Sheng-Tang January 2007 In the last decade, many data mining techniques have been proposed for various knowledge discovery tasks, with the goal of retrieving useful information for users. These techniques can generate various types of patterns, such as sequential patterns, frequent itemsets, and closed and maximal patterns. However, how to effectively exploit the discovered patterns is still an open research issue, especially in text mining. Most text mining methods adopt a keyword-based approach to construct text representations consisting of single words or terms, whereas other methods use phrases instead of keywords, based on the hypothesis that a phrase carries more information than a single term. Nevertheless, these phrase-based methods did not yield significant improvements, because high-frequency patterns (normally the shorter ones) usually score high on exhaustivity but low on specificity, while the more specific patterns suffer from low frequency. This thesis develops an effective Pattern Taxonomy Model (PTM) that overcomes this problem by deploying discovered patterns into a hypothesis space. PTM is a pattern-based method that adopts sequential pattern mining and uses closed patterns as features in the representation. A PTM-based information filtering system is implemented and evaluated through a series of experiments on the latest version of the Reuters dataset, RCV1. Pattern evolution schemes are also proposed, which use information from negative training examples to update the discovered knowledge. 
The results show that the PTM outperforms not only all up-to-date data mining-based methods, but also the traditional Rocchio and the state-of-the-art BM25 and Support Vector Machines (SVM) approaches. pattern taxonomy model information retrieval text mining data mining association rules sequential pattern mining closed sequential patterns pattern deploying pattern evolving
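The notion of closed sequential patterns used as PTM features can be illustrated with a brute-force sketch: count a term sequence's support as the number of paragraphs that contain it as a (not necessarily contiguous) subsequence, grow patterns depth-first, and keep a pattern only if no longer pattern containing it has the same support. Treating paragraphs as transactions and the example text are assumptions for illustration; this is not the thesis's mining algorithm.

```python
def is_subseq(p, s):
    """True if p occurs in s in order, gaps allowed."""
    it = iter(s)
    return all(term in it for term in p)

def frequent_sequences(paragraphs, min_sup):
    """Depth-first growth of term sequences with support >= min_sup (>= 1)."""
    terms = sorted({t for para in paragraphs for t in para})
    result = {}

    def grow(prefix):
        for t in terms:
            cand = prefix + (t,)
            sup = sum(is_subseq(cand, para) for para in paragraphs)
            if sup >= min_sup:
                result[cand] = sup
                grow(cand)

    grow(())
    return result

def closed_patterns(freq):
    """Drop patterns that have a longer super-pattern with equal support."""
    return {p: s for p, s in freq.items()
            if not any(len(q) > len(p) and sq == s and is_subseq(p, q)
                       for q, sq in freq.items())}

paras = [
    ["carbon", "emission", "tax"],
    ["carbon", "emission", "trade"],
    ["carbon", "tax"],
]
closed = closed_patterns(frequent_sequences(paras, min_sup=2))
```

The single term `emission` is frequent but not closed, since `(carbon, emission)` has the same support; discarding such patterns is what keeps the feature set compact.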
57	Rough set-based reasoning and pattern mining for information filtering Zhou, Xujuan January 2008 An information filtering (IF) system monitors an incoming document stream to find the documents that match the information needs specified by user profiles. Learning effective user profiles is one of the most challenging tasks in developing an IF system. With document selection criteria better defined by the users’ needs, filtering large streams of information can be more efficient and effective. To learn user profiles, term-based approaches have been widely used in the IF community because of their simplicity and directness, and they are relatively well established. However, these approaches have problems with polysemy and synonymy, which often lead to information overload. Recently, pattern-based approaches (or Pattern Taxonomy Models (PTM) [160]) have been proposed for IF by the data mining community. These approaches are better at capturing semantic information and have shown encouraging results for improving the effectiveness of IF systems. On the other hand, pattern discovery from large data streams is not computationally efficient, and these approaches also have to deal with low-frequency patterns. The measures used by data mining techniques (for example, “support” and “confidence”) to learn the profile have turned out to be unsuitable for filtering and can lead to a mismatch problem. This thesis uses rough set-based reasoning (term-based) and pattern mining as a unified framework for information filtering to overcome these problems. The system consists of two stages: topic filtering and pattern mining. The topic filtering stage is intended to minimize information overload by filtering out the information most likely to be irrelevant, based on the user profiles. 
A novel user-profile learning method and a theoretical model of threshold setting have been developed using rough set decision theory. The second stage (pattern mining) aims to solve the information mismatch problem and is precision-oriented. A new document-ranking function has been derived by exploiting the patterns in the pattern taxonomy; it assigns higher scores to the documents most likely to be relevant. Because relatively few documents remain after the first stage, the computational cost is markedly reduced, and at the same time pattern discovery yields more accurate results. The overall performance of the system was improved significantly. The new two-stage information filtering model has been evaluated through extensive experiments based on well-known IR benchmarking processes, using the latest version of the Reuters dataset, the Reuters Corpus Volume 1 (RCV1). The performance of the two-stage model was compared with both term-based and data mining-based IF models. The results demonstrate that the proposed information filtering system significantly outperforms other IF systems, such as the traditional Rocchio IF model, state-of-the-art term-based models including BM25 and Support Vector Machines (SVM), and the Pattern Taxonomy Model (PTM).
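The two-stage design can be sketched as a cheap term-based threshold filter followed by a more expensive pattern-based ranking applied only to the survivors. Everything here is a hypothetical simplification: the term weights, threshold and pattern weights are made-up values, the rough-set profile learning is not modeled, and patterns are matched as co-occurring term sets rather than sequences.

```python
def two_stage_filter(docs, term_weights, threshold, patterns):
    """docs: {doc_id: [terms]}; returns surviving doc ids, best first."""
    # Stage 1 (topic filtering): drop documents whose term score is too low
    survivors = {d: terms for d, terms in docs.items()
                 if sum(term_weights.get(t, 0.0) for t in set(terms)) >= threshold}

    # Stage 2 (pattern mining): score remaining docs by the patterns they contain
    def pattern_score(terms):
        present = set(terms)
        return sum(w for pat, w in patterns.items() if set(pat) <= present)

    return sorted(survivors, key=lambda d: pattern_score(survivors[d]), reverse=True)

docs = {"d1": ["oil", "price", "opec"], "d2": ["football", "score"], "d3": ["oil", "spill"]}
term_weights = {"oil": 0.6, "price": 0.3, "opec": 0.5, "spill": 0.2}
patterns = {("oil", "price"): 0.9, ("oil", "spill"): 0.4}
ranking = two_stage_filter(docs, term_weights, threshold=0.5, patterns=patterns)
```

Only `d1` and `d3` reach the pattern stage, which mirrors the abstract's point that the costlier second stage runs on a much smaller set of documents.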
58	Effective Characterization of Sequence Data through Frequent Episodes Ibrahim, A January 2015 Pattern discovery is an important area of data mining, referring to a class of techniques designed to extract interesting patterns from data. A pattern is a kind of local structure that captures correlations and dependencies among the elements of the data. In general, pattern discovery is about finding all patterns of 'interest' in the data, and a popular measure of interestingness for a pattern is its frequency of occurrence. Thus the problem of frequent pattern discovery is to find all patterns in the data whose frequency of occurrence exceeds a user-defined threshold. However, frequency is not the only measure of interest, and other measures and techniques for finding interesting patterns also exist. This thesis is concerned with the efficient discovery of inherent patterns in long sequence (temporally ordered) data. Mining such sequentially ordered data is called temporal data mining, and the temporal patterns discovered from large sequential data are called episodes. More specifically, this thesis explores efficient methods for finding small, relevant subsets of episodes that best characterize sequence data. The thesis also discusses methods for comparing datasets by comparing the sets of patterns that represent them. In the frequent episode discovery framework, the data is abstractly viewed as a single long sequence of events. Here, an event is a tuple (E_i, t_i), where E_i is referred to as an event-type (taking values from a finite alphabet) and t_i is its time of occurrence. The events are ordered by non-decreasing time of occurrence. The pattern of interest in such a sequence is called an episode, which is a collection of event-types with a partial order defined over it. 
In this thesis, the focus is on a special type of episode called a serial episode, where a total order is defined among the event-types representing the episode. An occurrence of an episode is essentially a subset of events from the data whose event-types match those of the episode and whose order of occurrence conforms to the episode's partial order. The frequency of an episode is some measure of how often it occurs in the event stream, and many different notions of frequency have been defined in the literature. Given a frequency definition, the goal of frequent episode discovery is to unearth all episodes whose frequency exceeds a user-defined threshold. The size of an episode is the number of event-types in it. An episode β is called a subepisode of another episode α if the collection of event-types of β is a subset of the corresponding collection of α and the event-types of β satisfy the same partial order relationships present among the corresponding event-types of α. The set of all episodes can be arranged in a lattice, where each level i contains the episodes of size i and the partial order is the subepisode relationship. In general, there are two approaches for mining frequent episodes, depending on how this lattice is traversed. The first traverses the lattice breadth-first and is called the Apriori approach; the other, the Pattern-growth approach, traverses it depth-first. Many Apriori-based algorithms have been proposed for mining frequent episodes under the different frequency notions, but Pattern-growth methods do not exist for many of these notions. 
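For serial episodes the subepisode relation reduces to an ordered subsequence test, and the level below an episode in the lattice consists of the episodes obtained by dropping one event-type. Apriori-style methods use that level for pruning: a size-k candidate is only worth counting if all of its size-(k-1) subepisodes are frequent. A small sketch, with episodes represented as tuples of event-type names (an assumed encoding):

```python
from itertools import combinations

def is_subepisode(beta, alpha):
    """Serial episodes: beta is a subepisode of alpha iff beta's event-types
    appear in alpha in the same order (an ordered subsequence)."""
    it = iter(alpha)
    return all(e in it for e in beta)

def immediate_subepisodes(alpha):
    """Episodes one level below alpha in the lattice (drop one event-type)."""
    return {tuple(alpha[i] for i in idx)
            for idx in combinations(range(len(alpha)), len(alpha) - 1)}

alpha = ("A", "B", "C")
below = immediate_subepisodes(alpha)
```

Every element of `below` passes `is_subepisode(…, alpha)`, whereas a reordering such as `("C", "A")` does not, because serial episodes respect the total order.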
The first part of the thesis proposes new Pattern-growth methods for discovering frequent serial episodes under two frequency notions, the non-overlapped frequency and the total frequency. Special cases in which additional conditions, called span and gap constraints, are imposed on the occurrences of the episodes are also considered. The proposed methods generally consist of two steps: candidate generation and counting. The candidate generation step finds potential frequent episodes by following the general Pattern-growth approach, that is, a depth-first traversal of the lattice of all episodes. The counting step then counts the frequencies of these episodes; the thesis presents efficient methods for counting the occurrences of serial episodes, under both the non-overlapped and the total frequency, using the occurrence windows of subepisodes. The relative advantages of Pattern-growth over Apriori approaches are also discussed. Detailed simulation results on a host of synthetic and real data sets show that the proposed methods are highly scalable and efficient in runtime compared with existing Apriori approaches. One of the main issues in frequent pattern mining is the huge number of frequent patterns returned by discovery methods, irrespective of the approach taken. The second part of this thesis addresses this issue and discusses methods for selecting a small subset of relevant episodes from event sequences. A few approaches for finding a small subset of patterns have been discussed in the literature. One set of methods is based on information theory and searches for the patterns that provide maximum information. Another is summarization schemes based on the Minimum Description Length (MDL) principle. 
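A minimal sketch of the two steps for the non-overlapped frequency: the counting step uses a single left-to-right automaton per episode that resets after each completed occurrence (a standard way to count non-overlapped occurrences of a serial episode), and the candidate step extends episodes depth-first, recursing only while the frequency stays above the threshold. It assumes distinct timestamps and recounts from scratch, so it is not the thesis's occurrence-window method and ignores span and gap constraints.

```python
def non_overlapped_count(episode, events):
    """events: (event_type, time) pairs sorted by time, distinct times assumed."""
    count, j = 0, 0
    for etype, _ in events:
        if etype == episode[j]:
            j += 1
            if j == len(episode):   # occurrence completed: count it and restart
                count += 1
                j = 0
    return count

def pattern_growth(events, threshold):
    """Depth-first mining of serial episodes with frequency >= threshold (>= 1)."""
    alphabet = sorted({e for e, _ in events})
    frequent = {}

    def grow(prefix):
        for e in alphabet:
            cand = prefix + (e,)
            f = non_overlapped_count(cand, events)
            if f >= threshold:      # frequency is anti-monotone, so prune otherwise
                frequent[cand] = f
                grow(cand)

    grow(())
    return frequent

events = [("A", 1), ("B", 2), ("C", 3), ("A", 4), ("C", 5), ("B", 6), ("A", 7), ("B", 8)]
frequent = pattern_growth(events, threshold=2)
```

Pruning is safe because an episode can never have more non-overlapped occurrences than any of its subepisodes, so once `("A", "B", "C")` drops below the threshold none of its extensions are explored.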
In these schemes, the data is encoded using a subset of patterns (which forms the model for the data) and their occurrences, and the subset of patterns that encodes the data most efficiently is the best representative model. The MDL principle takes into account both the encoding efficiency of the model and the model's complexity. A method called Constrained Serial episode Coding (CSC) is proposed based on the MDL principle; it returns a highly relevant, non-redundant and small subset of serial episodes. It also includes an encoding scheme in which both the model representation and the encoding of the data are efficient. An interesting feature of this algorithm is that it does not need a user-specified frequency threshold. The effectiveness of this method is shown on two types of data. The first is data obtained from a detailed simulator of a reconfigurable coupled conveyor system, which consists of intersecting paths through which packages flow. Mining such data can unearth the main paths of package flows, which can be useful for remote monitoring and visualization of the system. On this data, the proposed method returns highly consistent sub-paths, in the form of serial episodes, with greater encoding efficiency than other known sequence summarization schemes such as SQS and GoKrimp. The second type of data is a collection of multi-class sequence datasets, on which the episodes selected by the proposed method are shown to be good features for classification, achieving better classification results than the episodes selected by SQS and GoKrimp. The third and final part of the thesis discusses methods for comparing sets of patterns representing different datasets. 
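The MDL trade-off between model complexity and encoding efficiency is usually written as a two-part code: choose the set of patterns M that minimizes the bits needed to describe the model plus the bits needed to describe the data given the model. In code-table approaches such as SQS, each pattern's code length is typically derived from how often it is used in the encoding. The formulas below state this generic form; CSC's exact encoding scheme is not given in the abstract, so they should not be read as its definition.

```latex
M^{*} = \arg\min_{M \in \mathcal{M}} \big( L(M) + L(D \mid M) \big),
\qquad
L(\mathrm{code}(X)) = -\log_2 \frac{\mathrm{usage}(X)}{\sum_{Y \in M} \mathrm{usage}(Y)}
```

A frequently used episode thus gets a short code word, which is why a small set of highly consistent episodes, such as the main conveyor paths, can compress the data well without any frequency threshold.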
There are many situations in which one is interested in comparing datasets. For example, with streaming data, one wants to know whether the characteristics of the data have stayed the same or changed significantly; in other cases, one may simply want to compare two datasets and quantify their degree of similarity. Data are often characterized by a set of patterns, as described above, so comparing the sets of patterns representing datasets gives information about the similarity or dissimilarity between the datasets. However, few measures exist for comparing sets of patterns. This thesis proposes a similarity measure for comparing sets of patterns, which in turn aids the comparison of different datasets. First, a kernel for comparing two patterns, called the Pattern Kernel, is proposed for three types of patterns: serial episodes, sequential patterns and itemsets. Using this kernel, a Pattern Set Kernel is proposed for comparing sets of patterns. The effectiveness of this kernel is shown in classification and change detection. The thesis concludes with a summary of the main contributions and some suggestions for extending the work presented here. Data Mining Pattern Discovery Pattern Mining Sequential Pattern Episode Formalism Episode Discovery Pattern Set Kernel Episodes Pattern Kernel Frequent Episode Mining Electrical Engineering
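The abstract does not define the Pattern Kernel, so the sketch below uses a hypothetical stand-in. Each serial episode is mapped to the set of its event-types and ordered event-type pairs, and the pattern kernel counts shared features, which is a dot product and therefore a valid kernel. It then lifts this to sets with a common construction, the normalized sum over all cross pairs. Treat it as an illustration of kernels on pattern sets, not the thesis's definitions.

```python
from itertools import combinations
from math import sqrt

def features(pattern):
    # Hypothetical feature map: single event-types plus ordered pairs (i < j)
    return set(pattern) | set(combinations(pattern, 2))

def pattern_kernel(p, q):
    # Dot product of binary feature vectors, hence positive semi-definite
    return len(features(p) & features(q))

def pattern_set_kernel(P, Q):
    """Normalized sum of pairwise pattern kernels; 1.0 means identical sets."""
    def raw(X, Y):
        return sum(pattern_kernel(x, y) for x in X for y in Y)
    denom = sqrt(raw(P, P) * raw(Q, Q))
    return raw(P, Q) / denom if denom else 0.0

P = [("A", "B", "C")]
R = [("C", "B", "A")]
similarity = pattern_set_kernel(P, R)
```

The reversed episode shares its event-types but none of its ordered pairs, giving a similarity of 0.5; for change detection one would compare the pattern sets mined from successive windows of a stream and flag drops in this score.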
59	Pattern mining on social media: How do they affect web design? (Mönsteridentifikation på sociala medier: Hur påverkar de webbdesign?) Garnås, Amelie, Duné, Linnéa January 2014 This paper studies design patterns. Existing, recognized pattern libraries have been used as the foundation for a new design pattern template. Six design patterns have then been identified and created from social media with the help of pattern mining: the like button, hashtag, share, comment, posting pictures, and news feed. Furthermore, interviews were conducted with seven respondents from different Stockholm-based web design agencies to examine whether these six identified design patterns from social media affect their web design. A central part of the study was also to find out whether web design agencies use design patterns at all when building a website. 
The studies make clear that the majority of the web design agencies examined had not heard of design patterns in the sense this paper uses the term. Instead, they draw inspiration from other successful websites to find solutions to problems, and they follow trends in web development. Of the six design patterns from social media, only three are used regularly: the like button, share, and news feed (long start pages). Design patterns pattern language pattern library pattern mining interaction design web design social media Media and Communication Technology
60	Contribution of Discrete Event Systems paradigms for reducing industrial alarm flows (Apport des paradigmes des Systèmes à Evènements Discrets pour la réduction du flux d’alarmes industrielles) Laumonier, Yannick 28 November 2019 Alarm systems play a critical role in the safe and efficient operation of modern industrial plants. However, in most industrial alarm systems, operators cannot always handle alarms correctly, because there are regularly far too many to manage, particularly during alarm floods (sequences of numerous alarms occurring in a short period of time). To reduce the alarm flow, this work focuses on detecting redundant alarms that could be removed. 
This objective is met by first looking for frequent adjacencies between alarms in the alarm log, which are identified by adapting the frequent pattern mining algorithm AprioriAll. A second way to find potentially redundant alarms is to look for systematic precedences, which are discovered by identifying the domination and mutual dependency relations contained in the alarm log. To ease expert analysis, the discovered relations are depicted as Petri nets. Both methods are then tested against an industrial alarm log provided by General Electric. The results show that both methods reduce the overall alarm flow, with the largest reduction occurring during alarm floods. Industrial alarms Discrete event systems Petri nets Pattern mining Alarm filtering
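Both ideas can be illustrated on a flat alarm log. Frequent adjacencies are counted here only as length-2 contiguous pairs, which is a simplification and not the adapted AprioriAll algorithm. "Systematic predecessor" is approximated by an assumed rule: every occurrence of alarm B, over at least `min_occurrences` occurrences, is immediately preceded by the same alarm A. The thesis's domination and mutual-dependency relations are not reproduced, and the alarm tags are hypothetical.

```python
from collections import Counter

def frequent_adjacencies(log, min_count):
    """Contiguous alarm pairs (a, b) seen at least min_count times."""
    pairs = Counter(zip(log, log[1:]))
    return {p: c for p, c in pairs.items() if c >= min_count}

def systematic_predecessors(log, min_occurrences=2):
    """Assumed rule: B -> A if every occurrence of B directly follows A."""
    counts = Counter(log)
    preds = {}
    for i, b in enumerate(log):
        prev = log[i - 1] if i > 0 else None   # None: B opens the log
        preds.setdefault(b, set()).add(prev)
    return {b: next(iter(s)) for b, s in preds.items()
            if counts[b] >= min_occurrences and len(s) == 1 and None not in s}

log = ["HI_TEMP", "PUMP_TRIP", "LOW_FLOW", "HI_TEMP", "PUMP_TRIP", "LOW_FLOW", "VALVE"]
adjacent = frequent_adjacencies(log, min_count=2)
systematic = systematic_predecessors(log)
```

An alarm that always follows the same predecessor, such as `LOW_FLOW` after `PUMP_TRIP` here, is a candidate for suppression, although in practice an expert would confirm it, for example via the Petri net view the abstract describes.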
