21

In Pursuit of Optimal Workflow Within The Apache Software Foundation

January 2017 (has links)
abstract: The following is a case study composed of three workflow investigations at the open source software development (OSSD) based Apache Software Foundation (Apache). I start with an examination of workload inequality within Apache, particularly with regard to requirements writing. I establish that the stronger a participant's experience indicators are, the more likely they are to propose a requirement that is not a defect and the more likely that requirement is eventually implemented. Requirements at Apache are divided into work tickets (tickets). In the second investigation, I report many insights into the distribution patterns of these tickets. The participants who create tickets often have the best track records for determining who should participate in them. Tickets that were at some point volunteered for (self-assigned) had a lower incidence of neglect but in some cases were also associated with severe delay. When a participant claims a ticket but postpones the work involved, the ticket exists without a solution for five to ten times as long, depending on the circumstances. I make recommendations that may reduce the incidence of tickets that are claimed but not implemented in a timely manner. After giving an in-depth explanation of how I obtained this data set through web crawlers, I describe the pattern mining platform I developed to make my data mining efforts highly scalable and repeatable. Lastly, I use process mining techniques to show that workflow patterns vary greatly within teams at Apache. I investigate a variety of process choices and how they might influence the outcomes of OSSD projects. I report a moderately negative association between how often a team updates the specifics of a requirement and how often requirements are completed.
I also verify that the prevalence of volunteerism indicators is positively associated with work completion; surprisingly, this correlation is stronger when the very largest projects are excluded. I suggest that the largest projects at Apache may benefit from some level of traditional delegation in addition to the volunteerism that OSSD is normally associated with. / Dissertation/Thesis / Doctoral Dissertation Industrial Engineering 2017
22

A Sequential Pattern Mining Driven Framework for Developing Construction Logic Knowledge Bases

Le, Chau, Shrestha, Krishna J., Jeong, H. D., Damnjanovic, Ivan 01 January 2021 (has links)
One vital task of a project's owner is to determine a reliable and reasonable construction time for the project. A U.S. highway agency typically uses the bar chart or critical path method for estimating project duration, which requires the determination of construction logic. The current practice of activity sequencing is challenging, time-consuming, and heavily dependent upon the agency schedulers' knowledge and experience. Several agencies have developed templates of repetitive projects based on expert input to save time and support schedulers in sequencing a new project. However, these templates are deterministic, dependent on expert judgment, and quickly become outdated. This study aims to enhance current practice by proposing a data-driven approach that leverages the readily available daily work report data of past projects to develop a knowledge base of construction sequence patterns. Through a novel application of sequential pattern mining, the proposed framework allows for the determination of common sequential patterns among work items and proposes domain measures, such as the confidence level of applying a pattern to future projects under different project conditions. The framework also allows for the extraction of only the sequential patterns relevant to future construction time estimation.
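The support and confidence measures described in this abstract can be illustrated with a minimal sketch (the activity codes, data, and function names below are hypothetical, not the paper's implementation): for each ordered pair of work items, support counts the past projects in which the first item precedes the second, and confidence divides that count by the support of the first item alone.

```python
from collections import Counter

def mine_precedence_patterns(projects, min_support=2):
    """Count ordered pairs (a, b) where work item a occurs before b in a
    project's activity sequence, then derive a confidence score per pair."""
    pair_support = Counter()
    item_support = Counter()
    for seq in projects:
        pairs = set()  # count each pair at most once per project
        for i, a in enumerate(seq):
            for b in seq[i + 1:]:
                if a != b:
                    pairs.add((a, b))
        item_support.update(set(seq))
        pair_support.update(pairs)
    # confidence(a -> b) = support(a -> b) / support(a)
    return {(a, b): (n, n / item_support[a])
            for (a, b), n in pair_support.items() if n >= min_support}

# Hypothetical daily-work-report activity sequences, one per past project
projects = [
    ["clearing", "grading", "paving", "striping"],
    ["clearing", "grading", "drainage", "paving"],
    ["grading", "paving", "striping"],
]
patterns = mine_precedence_patterns(projects)
# "grading precedes paving" holds in all three projects (confidence 1.0)
```

A scheduler could then rank candidate predecessor relations for a new project by confidence, applying only those above a chosen threshold.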
23

Discovering Contiguous Sequential Patterns in Network-Constrained Movement

Yang, Can January 2017 (has links)
A large proportion of movement in urban areas, such as pedestrian, bicycle, and vehicle movement, is constrained to a road network. That movement information is commonly collected by Global Positioning System (GPS) sensors, which have generated large collections of trajectories. A contiguous sequential pattern (CSP) in these trajectories represents a certain number of objects traversing a sequence of spatially contiguous edges in the network, which is an intuitive way to study regularities in network-constrained movement. CSPs are closely related to route choices and traffic flows and can be useful in travel demand modeling and transportation planning. However, the efficient and scalable extraction of CSPs and the effective visualization of heavily overlapping CSPs remain challenges. To address these challenges, this thesis develops two algorithms and a visual analytics system. Firstly, a fast map matching (FMM) algorithm is designed to match a noisy trajectory to the sequence of edges traversed by the object with high performance. Secondly, an algorithm called bidirectional pruning based closed contiguous sequential pattern mining (BP-CCSM) is developed to extract sequential patterns with closeness and contiguity constraints from the map-matched trajectories. Finally, a visual analytics system called sequential pattern explorer for trajectories (SPET) is designed for interactive mining and visualization of CSPs in a large collection of trajectories. Extensive experiments were performed on a real-world taxi trip GPS dataset to evaluate the algorithms and the visual analytics system. The results demonstrate that FMM achieves superior performance by replacing repeated routing queries with hash table lookups. BP-CCSM considerably outperforms three state-of-the-art algorithms in terms of running time and memory consumption. SPET enables the user to efficiently and conveniently explore spatial and temporal variations of CSPs in network-constrained movement.
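The closeness and contiguity constraints that BP-CCSM enforces can be sketched naively (assumed names and data; the thesis's bidirectional pruning is not reproduced here): a contiguous pattern is an edge n-gram of a map-matched trajectory, and a frequent pattern is closed if no strictly longer frequent pattern containing it has the same support.

```python
from collections import Counter

def contiguous_patterns(trajectories, min_support=2):
    """Support of every contiguous edge subsequence (n-gram) over a set of
    map-matched trajectories; each trajectory contributes at most one count."""
    support = Counter()
    for traj in trajectories:
        grams = {tuple(traj[i:j])
                 for i in range(len(traj))
                 for j in range(i + 1, len(traj) + 1)}
        support.update(grams)
    return {p: n for p, n in support.items() if n >= min_support}

def closed_only(frequent):
    """Keep a pattern only if no strictly longer frequent pattern that
    contains it contiguously has the same support (closedness)."""
    def contains(longer, shorter):
        k = len(shorter)
        return any(longer[i:i + k] == shorter
                   for i in range(len(longer) - k + 1))
    return {p: n for p, n in frequent.items()
            if not any(len(q) > len(p) and m == n and contains(q, p)
                       for q, m in frequent.items())}

# Hypothetical trajectories as sequences of road-network edge ids
trips = [["e1", "e2", "e3"], ["e1", "e2", "e3"], ["e2", "e3", "e4"]]
closed = closed_only(contiguous_patterns(trips))
# only ("e2", "e3") and ("e1", "e2", "e3") survive the closedness check
```

This brute-force enumeration is quadratic in trajectory length per trip; the point of algorithms like BP-CCSM is to prune this search space rather than enumerate it.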
24

CHSPAM: a multi-domain model for pattern tracking in context histories

DUPONT, Daniel Ambrosi 21 March 2017 (has links)
Ubiquitous computing aims to integrate information technology seamlessly into people's daily lives, so that they are aided proactively by technological resources in the real world while performing everyday activities. One of the fundamental aspects of developing this type of application is Context Awareness, which allows an application to adapt its operation to the context in which the user finds themselves. With the development of systems that use previously stored context information, databases have emerged that store the Context Histories captured over time, and many researchers have studied different ways to analyze this data. This work addresses a specific type of analysis of context histories: the discovery and monitoring of sequential patterns. For this purpose, a model called CHSPAM (Context History Pattern Monitoring) is proposed that discovers sequential patterns in Context History databases, making use of existing data mining techniques, and keeps track of these patterns to monitor their evolution over time. The main contributions of this work are the use of a generic representation for storing contexts, which allows the model to be applied in multiple domains, and the monitoring of discovered patterns over time, storing a history of each pattern's evolution. A prototype was implemented and three experiments were carried out to validate it. The first evaluated the functionalities and services offered by CHSPAM and was based on synthetic data. In the second, the model was used in a prediction application, and the use of monitored sequential patterns improved prediction accuracy compared to the use of patterns without monitoring. Finally, in the third experiment, CHSPAM was used as a component of a learning object recommendation application, which was able to recommend objects related to students' interests based on monitored sequential patterns extracted from users' session history.
25

Knowledge discovery using pattern taxonomy model in text mining

Wu, Sheng-Tang January 2007 (has links)
In the last decade, many data mining techniques have been proposed to fulfil various knowledge discovery tasks with the goal of retrieving useful information for users. Various types of patterns can be generated using these techniques, such as sequential patterns, frequent itemsets, and closed and maximal patterns. However, how to effectively exploit the discovered patterns is still an open research issue, especially in the domain of text mining. Most text mining methods adopt a keyword-based approach to construct text representations consisting of single words or single terms, whereas other methods have tried to use phrases instead of keywords, based on the hypothesis that a phrase carries more information than a single term. Nevertheless, these phrase-based methods did not yield significant improvements because patterns with high frequency (normally the shorter patterns) usually score high on exhaustivity but low on specificity, so the specific patterns suffer from the low frequency problem. This thesis presents research on developing an effective Pattern Taxonomy Model (PTM) to overcome this problem by deploying discovered patterns into a hypothesis space. PTM is a pattern-based method that adopts sequential pattern mining and uses closed patterns as features in the representation. A PTM-based information filtering system is implemented and evaluated through a series of experiments on the latest version of the Reuters dataset, RCV1. Pattern evolution schemes are also proposed in this thesis, in an attempt to utilise information from negative training examples to update the discovered knowledge. The results show that PTM outperforms not only all up-to-date data mining-based methods, but also the traditional Rocchio approach and the state-of-the-art BM25 and Support Vector Machines (SVM) approaches.
26

Rough set-based reasoning and pattern mining for information filtering

Zhou, Xujuan January 2008 (has links)
An information filtering (IF) system monitors an incoming document stream to find the documents that match the information needs specified by user profiles. Learning to use user profiles effectively is one of the most challenging tasks in developing an IF system. With document selection criteria better defined on the basis of users' needs, filtering large streams of information can be more efficient and effective. To learn user profiles, term-based approaches have been widely used in the IF community because of their simplicity and directness, and they are relatively well established. However, these approaches have problems when dealing with polysemy and synonymy, which often lead to information overload. Recently, pattern-based approaches (or Pattern Taxonomy Models (PTM) [160]) have been proposed for IF by the data mining community. These approaches are better at capturing semantic information and have shown encouraging results for improving the effectiveness of IF systems. On the other hand, pattern discovery from large data streams is not computationally efficient, and these approaches must also deal with low frequency pattern issues. The measures used by data mining techniques to learn the profile (for example, "support" and "confidence") have turned out to be unsuitable for filtering and can lead to a mismatch problem. This thesis uses rough set-based (term-based) reasoning and pattern mining as a unified framework for information filtering to overcome the aforementioned problems. The system consists of two stages: a topic filtering stage and a pattern mining stage. The topic filtering stage is intended to minimize information overload by filtering out the most likely irrelevant information based on the user profiles. A novel user profile learning method and a theoretical model for threshold setting have been developed using rough set decision theory.
The second stage (pattern mining) aims to solve the problem of information mismatch and is precision-oriented. A new document-ranking function has been derived by exploiting the patterns in the pattern taxonomy; the most likely relevant documents are assigned higher scores by this function. Because relatively few documents remain after the first stage, the computational cost is markedly reduced and, at the same time, pattern discovery yields more accurate results. The overall performance of the system was improved significantly. The new two-stage information filtering model has been evaluated by extensive experiments based on well-known IR benchmarking processes, using the latest version of the Reuters dataset, the Reuters Corpus Volume 1 (RCV1). The performance of the new two-stage model was compared with both term-based and data mining-based IF models. The results demonstrate that the proposed system significantly outperforms other IF systems, such as the traditional Rocchio IF model, state-of-the-art term-based models including BM25 and Support Vector Machines (SVM), and the Pattern Taxonomy Model (PTM).
27

Pattern mining: between accessibility and robustness

Abboud, Yacine 28 November 2018 (has links)
Information now occupies a central place in our daily lives; it is both ubiquitous and easy to access. Yet extracting information from data is often an inaccessible process. Indeed, even though data mining methods are now accessible to all, their results are often complex to obtain and exploit for the user. Pattern mining combined with the use of constraints is a very promising direction in the literature, both for improving the efficiency of the mining and for making its results more comprehensible to the user. However, the combination of constraints desired by the user is often problematic because it does not always fit the characteristics of the mined data, such as noise. In this thesis, we propose two new constraints and an algorithm to overcome this issue. The robustness constraint allows noisy data to be mined while preserving the added value of the contiguity constraint. The extended closedness constraint improves the comprehensibility of the set of extracted patterns while being more noise-resistant than the conventional closedness constraint. The C3Ro algorithm is a generic sequential pattern mining algorithm that integrates many constraints, including the two new constraints we introduce, in order to offer the user the most efficient mining possible while minimizing the size of the set of extracted patterns. C3Ro competes with the best pattern mining algorithms in the literature in terms of execution time while consuming significantly less memory. C3Ro has been applied to extracting the competencies present in job postings on the Web.
28

Leveraging Sequential Nature of Conversations for Intent Classification

Gotteti, Shree January 2021 (has links)
No description available.
29

Obtaining sequential patterns in data streams while meeting Big Data requirements

Carvalho, Danilo Codeco 06 June 2016 (has links)
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) / The growing amount of data produced daily, by both businesses and individuals on the web, has increased the demand for analysis and knowledge extraction from this data. While for the last two decades the solution was to store the data and run data mining algorithms over it, this has now become unviable even for supercomputers. In addition, the requirements of the Big Data age go far beyond the large amount of data to analyze: response time requirements and the complexity of the data carry more weight in many real-world domains. New models have been researched and developed, often proposing distributed computing or different ways to handle data stream mining. Current research shows that one alternative for data stream mining is to join a real-time event handling mechanism with a classic algorithm for mining association rules or sequential patterns. This work presents a data stream mining approach that meets the Big Data response time requirement by linking the real-time event handling mechanism Esper with the Incremental Miner of Stretchy Time Sequences (IncMSTS) algorithm. The results show that it is possible to take a static data mining algorithm into a data stream environment and preserve the tendencies in the discovered patterns, even though it is not possible to continuously read all the data arriving in the stream.
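The coupling of an event-handling window with a classic miner can be sketched as follows (a minimal stand-in; neither Esper's query language nor IncMSTS itself is reproduced, and all names are assumptions): events are buffered into a sliding window, and the miner re-counts patterns each time the window is full, so the reported patterns track the most recent portion of the stream.

```python
from collections import Counter, deque

class WindowedPatternMiner:
    """Keep the last `window` event sequences and re-count frequent bigram
    patterns each time the window fills; a toy stand-in for an Esper-style
    window feeding a sequential pattern miner."""

    def __init__(self, window=3, min_support=2):
        self.buffer = deque(maxlen=window)
        self.min_support = min_support
        self.patterns = {}

    def on_event(self, sequence):
        self.buffer.append(sequence)  # deque drops the oldest sequence
        if len(self.buffer) == self.buffer.maxlen:
            self._mine()

    def _mine(self):
        support = Counter()
        for seq in self.buffer:
            bigrams = {tuple(seq[i:i + 2]) for i in range(len(seq) - 1)}
            support.update(bigrams)
        self.patterns = {p: n for p, n in support.items()
                         if n >= self.min_support}

miner = WindowedPatternMiner(window=3, min_support=2)
for seq in [["a", "b", "c"], ["a", "b"], ["b", "c"], ["a", "b", "d"]]:
    miner.on_event(seq)
# patterns reflect only the 3 most recent sequences in the window
```

The window bounds memory use at the cost of forgetting old data, which is the trade-off behind the abstract's observation that not all stream data can be read continuously.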
30

An algorithm for the incremental extraction of relevant sequences with windowing and post-processing, applied to hydrographic data

Silveira Junior, Carlos Roberto 07 June 2013 (has links)
The mining of sequential patterns in data from environmental sensors is a challenging task: the data may contain noise as well as sparse patterns that are difficult to detect. The knowledge extracted from environmental sensor data can be used to detect climate change, for example. However, there is a lack of methods that can handle this type of database. In order to reduce this gap, the algorithm Incremental Miner of Stretchy Time Sequences with Post-Processing (IncMSTS-PP) was proposed. IncMSTS-PP applies incremental extraction of sequential patterns with ontology-based post-processing for the generalization of the patterns, which makes them semantically richer. Generalized patterns synthesize the information and make it easier to interpret. IncMSTS-PP implements the Stretchy Time Window (STW), which allows stretchy time patterns (patterns with temporal intervals) to be mined from databases that contain noise. In comparison with the GSP algorithm, IncMSTS-PP can return 2.3 times more patterns, and patterns with 5 times more itemsets. The post-processing module is responsible for a 22.47% reduction in the number of patterns presented to the user, but the returned patterns are semantically richer. IncMSTS-PP thus showed good performance and mined relevant patterns, demonstrating that it is effective, efficient, and appropriate for the domain of environmental sensor data.
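The ontology-based generalization step can be illustrated with a minimal sketch (the ontology, item names, and supports below are invented for illustration): each item in a pattern is mapped to its parent concept, and patterns that become identical after generalization are merged, which is how a reduction in the number of patterns presented to the user arises.

```python
from collections import Counter

# Hypothetical sensor-item ontology mapping each item to a parent concept
ONTOLOGY = {
    "rain_light": "precipitation",
    "rain_heavy": "precipitation",
    "temp_high": "temperature",
    "temp_low": "temperature",
}

def generalize(patterns):
    """Replace each item by its parent concept and merge patterns that
    collapse to the same generalized form, summing their supports."""
    merged = Counter()
    for pattern, support in patterns.items():
        general = tuple(ONTOLOGY.get(item, item) for item in pattern)
        merged[general] += support
    return dict(merged)

# Invented mined patterns with their supports
mined = {
    ("rain_light", "temp_high"): 4,
    ("rain_heavy", "temp_high"): 3,
    ("temp_low", "rain_light"): 2,
}
generalized = generalize(mined)
# the first two patterns merge into ("precipitation", "temperature")
```

Fewer, more abstract patterns reach the user, at the cost of losing the distinction between the merged specializations.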
