281 |
Effective Characterization of Sequence Data through Frequent Episodes
Ibrahim, A (January 2015)
Pattern discovery is an important area of data mining that covers techniques for extracting interesting patterns from data. A pattern is a local structure that captures correlations and dependencies among the elements of the data. In general, pattern discovery is about finding all patterns of "interest" in the data, and a popular measure of interestingness is a pattern's frequency of occurrence. The problem of frequent pattern discovery is thus to find all patterns whose frequency of occurrence exceeds a user-defined threshold. Frequency is not the only measure, however; other measures and techniques for finding interesting patterns also exist.
This thesis is concerned with the efficient discovery of inherent patterns in long sequential (temporally ordered) data. Mining such sequentially ordered data is called temporal data mining, and the temporal patterns discovered in large sequential data are called episodes. More specifically, this thesis explores efficient methods for finding small, relevant subsets of episodes that best characterize the sequence data. The thesis also discusses methods for comparing datasets by comparing the sets of patterns that represent them.
The data in a frequent episode discovery framework is abstractly viewed as a single long sequence of events. Here, an event is a tuple (E_i, t_i), where E_i is the event-type (taking values from a finite alphabet) and t_i is the time of occurrence. The events are arranged in non-decreasing order of their times of occurrence. The pattern of interest in such a sequence is called an episode, which is a collection of event-types with a partial order defined over it. In this thesis, the focus is on a special type of episode called a serial episode, where a total order is defined among the event-types representing the episode. An occurrence of an episode is a subset of events from the data whose event-types match the set of event-types associated with the episode and whose order of occurrence conforms to the underlying partial order of the episode. The frequency of an episode is some measure of how often it occurs in the event stream. Many different notions of frequency have been defined in the literature. Given a frequency definition, the goal of frequent episode discovery is to unearth all episodes whose frequency is greater than a user-defined threshold. The size of an episode is the number of event-types in the episode. An episode β is called a subepisode of another episode α if the collection of event-types of β is a subset of the corresponding collection of α and the event-types of β satisfy the same partial order relationships present among the corresponding event-types of α.
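The definitions above can be made concrete with a small sketch. It assumes events are stored as (event_type, time) tuples already sorted by time and that a serial episode is a tuple of event-types; the function names `occurs` and `is_subepisode` are illustrative and are not taken from the thesis.

```python
from typing import Sequence, Tuple

Event = Tuple[str, float]  # (event_type, time of occurrence)


def occurs(episode: Sequence[str], events: Sequence[Event]) -> bool:
    """Return True if the serial episode has at least one occurrence.

    Assumes `events` is sorted in non-decreasing order of time, so list
    order stands in for temporal order.
    """
    pos = 0  # index of the next event-type of the episode still to be matched
    for event_type, _time in events:
        if pos < len(episode) and event_type == episode[pos]:
            pos += 1
    return pos == len(episode)


def is_subepisode(beta: Sequence[str], alpha: Sequence[str]) -> bool:
    """For serial episodes, beta is a subepisode of alpha if beta is an
    order-preserving subsequence of alpha."""
    remaining = iter(alpha)
    # `x in remaining` consumes the iterator, which enforces the ordering.
    return all(x in remaining for x in beta)


events = [("A", 1.0), ("B", 2.0), ("A", 3.0), ("C", 4.0)]
print(occurs(("A", "B", "C"), events))
print(is_subepisode(("A", "C"), ("A", "B", "C")))
```

Under these assumptions the size of an episode is simply `len(episode)`.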
The set of all episodes can be arranged in a partial order lattice, where level i contains the episodes of size i and the partial order is the subepisode relationship. In general, there are two approaches to mining frequent episodes, distinguished by how the lattice is traversed. The first traverses the lattice breadth-first and is called the Apriori approach. The other is the Pattern-growth approach, where the lattice is traversed depth-first. There exist different frequency notions for episodes, and many Apriori-based algorithms have been proposed for mining frequent episodes under them. However, Pattern-growth methods do not exist for many of these frequency notions.
The first part of the thesis proposes new Pattern-growth methods for discovering frequent serial episodes under two frequency notions, the non-overlapped frequency and the total frequency. Special cases in which additional conditions, called span and gap constraints, are imposed on the occurrences of the episodes are also considered. The proposed methods generally consist of two steps: candidate generation and counting. The candidate generation step finds potential frequent episodes by following the general Pattern-growth approach, that is, a depth-first traversal of the lattice of all episodes. The counting step counts the frequencies of the candidate episodes. The thesis presents efficient methods for counting the occurrences of serial episodes using occurrence windows of subepisodes, for both the non-overlapped and the total frequency. The relative advantages of Pattern-growth approaches over Apriori approaches are also discussed. Detailed simulation results on a host of synthetic and real datasets demonstrate the effectiveness of this approach. The proposed methods are shown to be highly scalable and more efficient in runtime than the existing Apriori approaches.
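The two-step scheme described above (depth-first candidate generation followed by counting) can be sketched as follows. This is a simplified illustration, not the thesis's algorithm: counting rescans the whole sequence for each candidate instead of using occurrence windows of subepisodes, span and gap constraints are ignored, and the names `non_overlapped_count`, `mine_depth_first`, `threshold` and `max_size` are assumptions.

```python
from typing import Dict, Sequence, Tuple


def non_overlapped_count(episode: Sequence[str], types: Sequence[str]) -> int:
    """Count non-overlapped occurrences of a non-empty serial episode.

    Greedy scan: track the earliest-completing occurrence, count it, then
    start afresh after its last event.
    """
    count = 0
    pos = 0  # next position of the episode waiting to be matched
    for event_type in types:
        if event_type == episode[pos]:
            pos += 1
            if pos == len(episode):
                count += 1
                pos = 0
    return count


def mine_depth_first(types: Sequence[str], threshold: int,
                     max_size: int) -> Dict[Tuple[str, ...], int]:
    """Depth-first (pattern-growth style) search for serial episodes whose
    non-overlapped frequency is greater than `threshold`."""
    alphabet = sorted(set(types))
    frequent: Dict[Tuple[str, ...], int] = {}

    def grow(prefix: Tuple[str, ...]) -> None:
        for event_type in alphabet:
            candidate = prefix + (event_type,)
            count = non_overlapped_count(candidate, types)
            # An infrequent episode cannot have a frequent extension,
            # so the branch is pruned here.
            if count > threshold:
                frequent[candidate] = count
                if len(candidate) < max_size:
                    grow(candidate)

    grow(())
    return frequent


types = list("ABCABCAB")
result = mine_depth_first(types, threshold=1, max_size=3)
print(result[("A", "B")], result[("A", "B", "C")])
```

The input here is only the sequence of event-types in temporal order; the time stamps would be needed once span or gap constraints are added.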
One of the main issues in frequent pattern mining is the huge number of frequent patterns returned by the discovery methods, irrespective of the approach taken. The second part of this thesis addresses this issue and discusses methods for selecting a small subset of relevant episodes from event sequences. A few approaches for finding a small subset of patterns have been discussed in the literature. One set of methods is information-theoretic: patterns that provide maximum information are searched for. Another approach is summarization based on the Minimum Description Length (MDL) principle. Here the data is encoded using a subset of patterns (which forms the model for the data) and their occurrences. The subset of patterns that encodes the data most efficiently is the best representative model for the data. The MDL principle takes into account both the encoding efficiency of the model and the model complexity. A method called Constrained Serial episode Coding (CSC), based on the MDL principle, is proposed; it returns a small, highly relevant and non-redundant subset of serial episodes. The method includes an encoding scheme in which both the model representation and the encoding of the data are efficient. An interesting feature of this algorithm is that it does not need a user-specified frequency threshold. The effectiveness of the method is shown on two types of data. The first is data obtained from a detailed simulator for a reconfigurable coupled conveyor system. The conveyor system consists of different intersecting paths, and packages flow through this network. Mining such data can unearth the main paths of package flows, which can be useful in remote monitoring and visualization of the system. On this data, it is shown that the proposed method returns highly consistent sub-paths, in the form of serial episodes, with great encoding efficiency compared to other known sequence summarization schemes such as SQS and GoKrimp. The second type of data consists of a collection of multi-class sequence datasets. It is shown that the episodes selected by the proposed method form good features for classification. The proposed method is compared with SQS and GoKrimp, and the episodes it selects achieve better classification results than those of the other methods.
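The MDL trade-off described above (model cost plus the cost of the data encoded with the model) can be illustrated with a toy two-part code. This is not the CSC encoding, whose details the abstract does not give: the sketch assumes fixed-length codes, matches episodes only as contiguous runs, and ignores delimiters; `description_length` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple


def description_length(seq: Sequence[str],
                       model: List[Tuple[str, ...]]) -> float:
    """Total bits = bits to list the model + bits to encode the data with it."""
    alphabet = set(seq)
    bits_per_symbol = math.log2(len(alphabet))
    # The code table holds every single event-type plus one entry per episode.
    bits_per_code = math.log2(len(alphabet) + len(model))

    # Model cost: each episode is written out symbol by symbol.
    model_bits = sum(len(ep) for ep in model) * bits_per_symbol

    # Data cost: greedy left-to-right cover, longest episode first.
    ordered = sorted(model, key=len, reverse=True)
    codes = 0
    i = 0
    while i < len(seq):
        step = 1  # fall back to encoding a single event-type
        for ep in ordered:
            if tuple(seq[i:i + len(ep)]) == ep:
                step = len(ep)
                break
        codes += 1
        i += step
    return model_bits + codes * bits_per_code


seq = list("ABCABCABCABC")
plain = description_length(seq, [])
good = description_length(seq, [("A", "B", "C")])
bad = description_length(seq, [("B", "A")])
print(plain, good, bad)
```

In this toy setting an episode that recurs often shortens the total description, while an episode that never matches only adds model cost, which is the intuition behind selecting episodes without a frequency threshold.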
The third and final part of the thesis discusses methods for comparing sets of patterns that represent different datasets. There are many situations in which one is interested in comparing datasets. For example, in streaming data, one wants to know whether the characteristics of the data have stayed the same or changed significantly. In other cases, one may simply want to compare two datasets and quantify the degree of similarity between them. Often, data are characterized by a set of patterns as described above. Comparing the sets of patterns representing the datasets gives information about the similarity or dissimilarity between the datasets. However, not many measures exist for comparing sets of patterns. This thesis proposes a similarity measure for comparing sets of patterns, which in turn aids the comparison of different datasets. First, a kernel for comparing two patterns, called the Pattern Kernel, is proposed. This kernel is defined for three types of patterns: serial episodes, sequential patterns and itemsets. Using this kernel, a Pattern Set Kernel is proposed for comparing different sets of patterns. The effectiveness of this kernel is shown in classification and change detection. The thesis concludes with a summary of the main contributions and some suggestions for extending the work presented here.
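The abstract does not define the Pattern Kernel or the Pattern Set Kernel, so the following is only a stand-in that shows the general shape of such a construction for serial episodes. It assumes a pattern-level kernel that counts the distinct sub-episodes two episodes share, and a set-level kernel that sums it over all pairs and normalizes; all function names are hypothetical.

```python
import math
from itertools import combinations
from typing import FrozenSet, Sequence, Tuple

Episode = Tuple[str, ...]


def sub_episodes(ep: Episode) -> FrozenSet[Episode]:
    """All distinct non-empty order-preserving subsequences of a serial episode."""
    subs = set()
    for k in range(1, len(ep) + 1):
        for idx in combinations(range(len(ep)), k):
            subs.add(tuple(ep[i] for i in idx))
    return frozenset(subs)


def pattern_kernel(p: Episode, q: Episode) -> int:
    """Number of distinct sub-episodes shared by p and q (a dot product of
    binary sub-episode indicator vectors)."""
    return len(sub_episodes(p) & sub_episodes(q))


def pattern_set_kernel(a: Sequence[Episode], b: Sequence[Episode]) -> float:
    """Sum of pattern kernels over all pairs, normalized to [0, 1]."""
    def raw(x: Sequence[Episode], y: Sequence[Episode]) -> int:
        return sum(pattern_kernel(p, q) for p in x for q in y)

    denom = math.sqrt(raw(a, a) * raw(b, b))
    return raw(a, b) / denom if denom else 0.0


set_a = [("A", "B", "C"), ("A", "C")]
set_b = [("A", "B"), ("B", "C")]
print(pattern_kernel(("A", "B", "C"), ("A", "C")))
print(pattern_set_kernel(set_a, set_b))
```

A normalized value near 1 would suggest that two datasets are characterized by similar episodes, which is the kind of signal a change detector on streaming data could threshold.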
|
282 |
Goal-oriented Pattern Family Framework for Business Process Modeling
Ahmadi Behnam, Saeed (26 October 2012)
While several approaches exist for modeling goals and business processes in organizations, the relationships between these two views are often not well defined. This inhibits the effective reuse of available knowledge in models. This thesis aims to address this issue through the introduction of a Goal-oriented Pattern Family (GoPF) framework that helps construct business process models from organization goals while expanding these goals, establishing traceability relationships between the goal and process views, and improving reusability. Methods for extracting domain knowledge as patterns, which are composed of goal model building blocks, process model building blocks, and their relationships, and for maintaining the patterns over time are also presented. The GoPF framework provides the infrastructure for defining pattern families, i.e., collections of related patterns for particular domains. The foundation of GoPF is formalized as a profile of the User Requirements Notation, a standard modeling language that supports goals, scenarios, and links between them. A method for the use of GoPF is defined and then illustrated through a case study that targets the improvement of patient safety in healthcare organizations. The framework and the extraction/maintenance methods are also validated against another case study involving aviation security in a regulatory environment. The GoPF framework is expected to have a positive impact on the scientific community through the formalization, evolution, and reuse of patterns in specific business domains. From an industrial viewpoint, this framework will also help intermediary organizations (such as consulting firms) that are required to repeatedly create and document goal and process models for other organizations in their business domain.
|
283 |
Audio-video based handwritten mathematical content recognition
Vemulapalli, Smita (12 November 2012)
Recognizing handwritten mathematical content is a challenging problem, and more so when such content appears in classroom videos. However, because the handwritten text and the accompanying audio in such videos refer to the same content, combining video- and audio-based recognizers has the potential to significantly improve recognition accuracy. This dissertation focuses on improving the recognition accuracy for handwritten mathematical content in such videos using a combination of video- and audio-based recognizers.
Our approach uses a video recognizer as the primary recognizer, and a multi-stage assembly, developed as part of this research, facilitates effective combination with an audio recognizer. Specifically, we address the following challenges related to audio-video based handwritten mathematical content recognition: (1) Video Preprocessing - generates a timestamped sequence of segmented characters from the classroom video in the face of occlusions and shadows caused by the instructor, (2) Ambiguity Detection - determines the subset of input characters that may have been incorrectly recognized by the video-based recognizer and forwards this subset for disambiguation, (3) A/V Synchronization - establishes correspondence between the handwritten character and the spoken content, (4) A/V Combination - combines the synchronized outputs from the video- and audio-based recognizers and generates the final recognized character, and (5) Grammar Assisted A/V Based Mathematical Content Recognition - utilizes a base mathematical speech grammar for both character and structure disambiguation. Experiments conducted using videos recorded in a classroom-like environment demonstrate the significant improvements in recognition accuracy that can be achieved using our techniques.
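Stages (2) and (4) above can be pictured with a small sketch. The abstract does not say how ambiguity is detected or how the recognizer outputs are combined, so everything here is assumed: per-character confidence scores, a margin test for ambiguity, and a weighted sum for combination. The names `combine`, `ambiguity_margin` and `audio_weight` are hypothetical.

```python
from typing import Dict


def combine(video_scores: Dict[str, float],
            audio_scores: Dict[str, float],
            ambiguity_margin: float = 0.2,
            audio_weight: float = 0.5) -> str:
    """Return the recognized character for one synchronized character slot."""
    ranked = sorted(video_scores.items(), key=lambda kv: kv[1], reverse=True)
    best_char, best_score = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0

    # Ambiguity detection: trust the video recognizer when its top
    # candidate clearly beats the runner-up.
    if best_score - runner_up >= ambiguity_margin:
        return best_char

    # A/V combination: re-score the video candidates with the audio evidence.
    combined = {
        char: (1 - audio_weight) * score
        + audio_weight * audio_scores.get(char, 0.0)
        for char, score in video_scores.items()
    }
    return max(combined, key=combined.get)


# A handwritten "2" that looks like "z" is resolved by the spoken "two".
print(combine({"z": 0.45, "2": 0.40, "7": 0.15}, {"2": 0.8, "z": 0.1}))
```

The sketch presumes the synchronization stage has already paired each handwritten character with the audio segment that mentions it.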
|
285 |
Statistical pattern recognition approaches for retrieval-based machine translation systems
Mansjur, Dwi Sianto (01 November 2011)
This dissertation addresses the problem of Machine Translation (MT), defined as the automated translation by a computer of a document written in one language (the source language) into another (the target language). The MT task requires various types of knowledge of both the source and target languages, e.g., linguistic rules and linguistic exceptions. Traditional MT systems rely on an extensive parsing strategy to decode the linguistic rules and use a knowledge base to encode the linguistic exceptions. However, constructing the knowledge base becomes an issue as the translation system grows. To overcome this difficulty, real translation examples are used instead of a manually crafted knowledge base. This design strategy is known as the Example-Based Machine Translation (EBMT) principle. Traditional EBMT systems utilize a database of word or phrase translation pairs. The main challenge of this approach is the difficulty of combining the word or phrase translation units into a meaningful and fluent target text. A novel Retrieval-Based Machine Translation (RBMT) system, which uses a sentence-level translation unit, is proposed in this study. An advantage of the sentence-level translation unit is that the boundary of a sentence is explicitly defined and its semantics, or meaning, is precise in both the source and target languages. The main challenge of using a sentential translation unit is limited coverage, i.e., the difficulty of finding an exact match between a user query and the sentences in the source database. Using an electronic dictionary and a topic modeling procedure, we develop a procedure to obtain clusters of sensible variations for each example in the source database. The coverage of our MT system improves because an input query text is matched against a cluster of sensible variations of translation examples instead of against a single original source example.
In addition, pattern recognition techniques are used to improve the matching procedure, namely the design of optimal pattern classifiers and the incorporation of subjective judgments. A high-performance statistical pattern classifier is used to identify the target sentences for an input query sentence in our MT system. The proposed classifier differs from a conventional classifier in the way it addresses generalization capability. A conventional classifier addresses the generalization issue using the parsimony principle and risks choosing an oversimplified statistical model. The proposed classifier addresses the generalization issue directly in terms of the training (empirical) data. Our classifier is expected to generalize better than conventional classifiers because it is less likely to use oversimplified statistical models given the available training data. We further improve the matching procedure by incorporating subjective judgments. We formulate a novel cost function that combines subjective judgments with the degree of matching between translation examples and an input query. In addition, we provide an optimization strategy for the novel cost function so that the statistical model can be optimized according to the subjective judgments.
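The retrieval idea (matching a query against clusters of source variations, then trading off match quality against subjective judgments) can be sketched as follows. The dissertation's classifier and cost function are not given in the abstract, so this is a stand-in: token-level Jaccard similarity as the degree of matching, a judgment score in [0, 1] per example, and a simple additive cost. The names `jaccard`, `retrieve` and `judgment_weight` are hypothetical.

```python
from typing import List, Sequence, Tuple

# (cluster of source-sentence variations, target sentence, judgment in [0, 1])
Example = Tuple[List[str], str, float]


def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two sentences."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def retrieve(query: str, examples: Sequence[Example],
             judgment_weight: float = 0.3) -> str:
    """Return the target sentence of the example with the lowest cost."""
    def cost(example: Example) -> float:
        variations, _target, judgment = example
        # Matching against the whole cluster is what improves coverage.
        match = max(jaccard(query, v) for v in variations)
        return (1.0 - match) + judgment_weight * (1.0 - judgment)

    return min(examples, key=cost)[1]


examples = [
    (["where is the station", "where is the train station"],
     "wo ist der Bahnhof", 0.9),
    (["where is the hotel"], "wo ist das Hotel", 0.9),
]
print(retrieve("where is the train station", examples))
```

Here the query matches a variation in the first cluster exactly, even though it differs from the original source sentence "where is the station".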
|
286 |
Spatial and temporal patterns of wildfire occurrence and susceptibility in Canada
Gralewicz, Nicholas John (31 August 2010)
Wildfire processes in Canada are expected to change as a result of climate change. Predictive modeling of wildfire occurrence and susceptibility requires knowledge of ignition expectations and of the landscape conditions that lead to burning. This research examines and quantifies the spatial and temporal patterns of wildfire across Canada, with a focus on wildfire occurrence and national-scale drivers of susceptibility. Baseline ignition expectations and trends are identified and used to create unique fire ignition regimes, assess anthropogenic influence on ignitions, and determine regions with anomalously high ignitions. The aspatial and spatial characteristics of land cover, including land cover composition, configuration, and abiotic covariates, are characterized for pre- and post-fire landscapes. Temporal trends in forest pattern following ignition are examined and national-scale drivers of wildfire susceptibility determined. Fire ignition regimes and anomalous ignition regions provide spatially explicit outputs for exploring ignition expectation in Canada. Wildfire was found to burn mainly in coniferous forests with little fragmentation. Fragmentation increased after wildfire, and regeneration of the pre-fire forest pattern took 20 years. Additionally, anthropogenic proximity positively influenced ignition expectation, ignition trend, and wildfire susceptibility. This research provides broad-scale methods to assess wildfire occurrence and susceptibility across Canada and will facilitate understanding of changing wildfire processes in the future. It also highlights the importance of anthropogenic activity for natural fire processes.
|
287 |
Genusmedveten mönsterdesign - med textil som budbärare [Gender-conscious pattern design - with textiles as the messenger]
LANDÉN, ERIKA (January 2017)
Our society is built on norms that tell us how to live, what to look like and how to act. Many of these norms imply how one should be in relation to one's gender, and the boundaries are often very narrow. Gender norms can be reflected in many ways, including in design and in what we create. This bachelor thesis examines how gender-theoretical perspectives can be applied in the making of norm-critical pattern design, and what happens in the relationship between the two. Through an external analysis and a discourse analysis of selected clothing stores in a small Swedish town, I investigate what gender in patterns can be and highlight how gender norms are reproduced in what we design. I also explore and reflect on the relationship between pattern design and social patterns, and how together they reproduce norms. With my own pattern designs, I investigate whether gender theory is a workable tool and whether norm criticism works as an approach in a design process for creating norm-critical patterns. The investigation shows that by identifying patterns we can develop a deeper understanding of the world we live in and why it looks the way it does, but also that it can be changed with the help of patterns.
|
288 |
Goal-oriented Pattern Family Framework for Business Process Modeling
Ahmadi Behnam, Saeed (January 2012)
While several approaches exist for modeling goals and business processes in organizations, the relationships between these two views are often not well defined. This inhibits the effective reuse of available knowledge in models. This thesis aims to address this issue through the introduction of a Goal-oriented Pattern Family (GoPF) framework that helps construct business process models from organization goals while expanding these goals, establishing traceability relationships between the goal and process views, and improving reusability. Methods for extracting domain knowledge as patterns, which are composed of goal model building blocks, process model building blocks, and their relationships, and for maintaining the patterns over time are also presented. The GoPF framework provides the infrastructure for defining pattern families, i.e., collections of related patterns for particular domains. The foundation of GoPF is formalized as a profile of the User Requirements Notation, a standard modeling language that supports goals, scenarios, and links between them. A method for the use of GoPF is defined and then illustrated through a case study that targets the improvement of patient safety in healthcare organizations. The framework and the extraction/maintenance methods are also validated against another case study involving aviation security in a regulatory environment. The GoPF framework is expected to have a positive impact on the scientific community through the formalization, evolution, and reuse of patterns in specific business domains. From an industrial viewpoint, this framework will also help intermediary organizations (such as consulting firms) that are required to repeatedly create and document goal and process models for other organizations in their business domain.
|
289 |
Discovering Contiguous Sequential Patterns in Network-Constrained Movement
Yang, Can (January 2017)
A large proportion of movement in urban areas, such as that of pedestrians, bicycles and vehicles, is constrained to a road network. This movement is commonly recorded by Global Positioning System (GPS) sensors, which has generated large collections of trajectories. A contiguous sequential pattern (CSP) in these trajectories represents a certain number of objects traversing a sequence of spatially contiguous edges in the network, which is an intuitive way to study regularities in network-constrained movement. CSPs are closely related to route choices and traffic flows and can be useful in travel demand modeling and transportation planning. However, the efficient and scalable extraction of CSPs and the effective visualization of heavily overlapping CSPs remain challenges. To address these challenges, the thesis develops two algorithms and a visual analytics system. Firstly, a fast map matching (FMM) algorithm is designed to match a noisy trajectory to the sequence of edges traversed by the object with high performance. Secondly, an algorithm called bidirectional pruning based closed contiguous sequential pattern mining (BP-CCSM) is developed to extract sequential patterns under closedness and contiguity constraints from the map-matched trajectories. Finally, a visual analytics system called sequential pattern explorer for trajectories (SPET) is designed for interactive mining and visualization of CSPs in a large collection of trajectories. Extensive experiments are performed on a real-world taxi trip GPS dataset to evaluate the algorithms and the visual analytics system. The results demonstrate that FMM achieves superior performance by replacing repeated routing queries with hash table lookups. BP-CCSM considerably outperforms three state-of-the-art algorithms in terms of running time and memory consumption. SPET enables the user to efficiently and conveniently explore spatial and temporal variations of CSPs in network-constrained movement.
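To make the notion of a closed contiguous sequential pattern concrete, here is a brute-force sketch over map-matched trajectories. It is not BP-CCSM (no bidirectional pruning is attempted); it assumes each trajectory is a list of edge identifiers, that support is the number of trajectories containing the edge sequence as a contiguous run, and that a pattern is closed when no longer pattern containing it has the same support. The function names and the `max_len` cap are assumptions.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]


def contiguous_patterns(trajectories: Sequence[Sequence[str]],
                        min_support: int, max_len: int) -> Dict[Pattern, int]:
    """Support of every contiguous edge sequence of up to max_len edges."""
    support: Counter = Counter()
    for traj in trajectories:
        seen = set()  # each trajectory counts at most once per pattern
        for i in range(len(traj)):
            for j in range(i + 1, min(i + max_len, len(traj)) + 1):
                seen.add(tuple(traj[i:j]))
        support.update(seen)
    return {p: s for p, s in support.items() if s >= min_support}


def closed_only(patterns: Dict[Pattern, int]) -> Dict[Pattern, int]:
    """Keep patterns with no longer containing pattern of equal support."""
    def contains(big: Pattern, small: Pattern) -> bool:
        n = len(small)
        return any(big[i:i + n] == small for i in range(len(big) - n + 1))

    return {
        p: s for p, s in patterns.items()
        if not any(len(q) > len(p) and t == s and contains(q, p)
                   for q, t in patterns.items())
    }


trips: List[List[str]] = [
    ["e1", "e2", "e3"],
    ["e1", "e2", "e3", "e4"],
    ["e2", "e3"],
]
frequent = contiguous_patterns(trips, min_support=2, max_len=3)
print(closed_only(frequent))
```

Because of the `max_len` cap, a pattern of maximal length is reported as closed even if a longer, uncounted pattern has the same support; a real miner would not need that cap.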
|
290 |
High Order Volumetric Directional Pattern for Robust Face Recognition
Essa, Almabrok Essa (28 August 2017)
No description available.
|