1 |
Contrasting sequence groups by emerging sequences. Deng, Kang. January 2009
Thesis (M. Sc.)--University of Alberta, 2009. / Title from PDF file main screen (viewed on Nov. 27, 2009). "A thesis submitted to the Faculty of Graduate Studies and Research in partial fulfillment of the requirements for the degree of Master of Science, Department of Computing Science, University of Alberta." Includes bibliographical references.
|
2 |
Pattern Mining and Concept Discovery for Multimodal Content Analysis. Li, Hongzhi. January 2016
With recent advances in computer vision, researchers have demonstrated near-human-level performance on difficult tasks such as image recognition. For example, for images taken under typical conditions, computer vision systems can now recognize whether a dog, cat, or car appears in an image. These advances are made possible by massive image datasets and label annotations, which include category labels and sometimes bounding boxes around the objects of interest within the image. However, one major limitation of current solutions is that when users apply recognition models to a new domain, they must manually define the target classes and label the training data to prepare the annotations required for training. Manually identifying the target classes and constructing the concept ontology for a new domain are time-consuming tasks: they require users to be familiar with the content of the image collection, and manually defining target classes is difficult to scale to a large number of classes. In addition, there has been significant interest in developing knowledge bases to improve content analysis and information retrieval. A knowledge base is an object model (ontology) with classes, subclasses, attributes, instances, and the relations among them. The knowledge base generation problem is to identify the (sub)classes and their structured relations for a given domain of interest. As with ontology construction, a knowledge base is usually generated manually by human experts, which is a time-consuming and difficult task.
Thus, it is important and necessary to find a way to explore the semantic concepts and their structural relations that are important for a target data collection or domain of interest, so that we can construct an ontology or knowledge base for visual data or multimodal content automatically or semi-automatically.
Visual patterns are the discriminative and representative image content found in objects or local image regions seen in an image collection. Visual patterns can also be used to summarize the major visual concepts in an image collection. Therefore, automatic discovery of visual patterns can help users understand the content and structure of a data collection and in turn help users construct the ontology and knowledge base mentioned earlier.
In this dissertation, we aim to answer the following question: given a new target domain and associated data corpora, how do we rapidly discover nameable content patterns that are semantically coherent, visually consistent, and can be automatically named with semantic concepts related to the events of interest in the target domains? We will develop pattern discovery methods that focus on visual content as well as multimodal data, including text and visual data.
Traditional visual pattern mining methods focus only on analyzing the visual content and cannot automatically name the patterns. To address this shortcoming, we propose a new multimodal visual pattern mining and naming method. The named visual patterns can be used as discovered semantic concepts relevant to the target data corpora. By combining information from multiple modalities, we can ensure that the discovered patterns are not only visually similar but also semantically consistent. The ability to name visual patterns accurately is also important for finding relevant classes or attributes in the knowledge base construction process mentioned earlier.
Our framework contains a visual model and a text model to jointly represent the text and visual content. We use the joint multimodal representation and the association rule mining technique to discover semantically coherent and visually consistent visual patterns. To discover better visual patterns, we further improve the visual model in the multimodal visual pattern mining pipeline by developing a convolutional neural network (CNN) architecture that allows for the discovery of scale-invariant patterns. In this dissertation, we use news as an example domain and image-caption pairs as example multimodal corpora to demonstrate the effectiveness of the proposed methods. However, the overall proposed framework is general and can be easily extended to other domains.
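The abstract pairs a joint multimodal representation with association rule mining. As a rough illustration of the rule-mining step only, the sketch below treats each image-caption pair as a transaction of items. Visual items are hypothetical labels such as `v17`, standing for a CNN feature that fired strongly, and text items are caption words. The sketch counts item pairs, keeps those above a minimum support, and reports one-to-one rules above a minimum confidence. The item encoding, the restriction to pairs, and the thresholds are all assumptions, not the dissertation's actual representation.

```python
from collections import Counter
from itertools import combinations


def mine_pair_rules(transactions, min_support=0.5, min_confidence=0.8):
    """Return sorted (antecedent, consequent, support, confidence) for 1->1 rules."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # A frequent pair yields up to two rules, one per direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Hypothetical transactions: visual items ("v*") mixed with caption words.
transactions = [
    {"v17", "v42", "protest", "police"},
    {"v17", "protest", "crowd"},
    {"v17", "v42", "protest"},
    {"v8", "flood", "rescue"},
]
rules = mine_pair_rules(transactions)
for lhs, rhs, s, c in rules:
    print(f"{lhs} -> {rhs}  support={s:.2f} confidence={c:.2f}")
```

A rule such as `v17 -> protest` with high confidence is the kind of link that would let a visual pattern be named by a co-occurring word.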
The problem of concept discovery becomes more challenging if the target application domain involves fine-grained object categories (e.g., closely related dog categories or consumer product categories). In such cases, the content of different classes can be quite similar, which makes automatic separation of classes difficult. In the proposed multimodal pattern mining framework, the representation models for visual and text data play an important role, because they shape the pool of candidates fed to the pattern mining process. General models such as CNNs trained on ImageNet have been shown to generalize to various domains, but they cannot capture the small differences in fine-grained datasets. To address this problem, we propose a new representation model that uses an end-to-end artificial neural network architecture to discover visual patterns. Because this model can be fine-tuned on a fine-grained dataset, its convolutional layers can be optimized to capture the features and patterns of fine-grained images, which allows it to discover visual patterns in such datasets. Finally, to demonstrate the advantage of the proposed multimodal visual pattern mining and naming framework, we apply the proposed technique to two applications. In the first application, we use visual pattern mining to find visual anchors that summarize video news events. In the second, we use the visual patterns as important cues to link video news events to social media events.
The contributions of this dissertation can be summarized as follows: (1) We develop a novel multimodal mining framework for discovering visual patterns and nameable concepts from a collection of multimodal data and automatically naming the discovered patterns, producing a large pool of semantic concepts specifically relevant to a high-level event. The framework combines a CNN-based visual representation with an embedding-based text representation. The named visual patterns can be used to construct the event schemas needed in the knowledge base construction process. (2) We propose a scale-invariant visual pattern mining model to improve the multimodal visual pattern mining framework. The improved visual model leads to better overall performance in discovering and naming concepts. To localize the visual patterns discovered in this framework, we propose a deconvolutional neural network model that localizes the visual patterns within the image. (3) To learn directly from data in the target domain, we propose a novel end-to-end neural network architecture called PatternNet for finding high-quality visual patterns, even for datasets that consist of fine-grained classes. (4) We demonstrate two novel applications of visual pattern mining: video news event summarization and video news event linking.
|
3 |
AV space for efficiently learning classification rules from large datasets. Wang, Linyan. January 2006
Thesis (M.Sc.)--York University, 2006. Graduate Programme in Computer Science. / Typescript. Includes bibliographical references (leaves 130-134). Also available on the Internet. MODE OF ACCESS via web browser by entering the following URL: http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&res_dat=xri:pqdiss&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&rft_dat=xri:pqdiss:MR19748
|
4 |
SNIF TOOL - Sniffing for patterns in continuous streams. Mukherji, Abhishek. January 2008
Thesis (M.S.)--Worcester Polytechnic Institute. / Keywords: continuous queries; streaming time-series; similarity queries; pattern matching. Includes bibliographical references (p. 58-61).
|
5 |
Cybersecurity Testing and Intrusion Detection for Cyber-Physical Power Systems. Pan, Shengyi. 13 December 2014
Power systems will increasingly rely on synchrophasor systems for reliable, high-performance wide area monitoring and control (WAMC). Synchrophasor systems rely heavily on information and communication technologies (ICT) for data exchange, and these technologies are vulnerable to cyber-attacks. Before a synchrophasor system is installed, a set of cyber security requirements must be developed, and new devices must undergo vulnerability testing to ensure that proper security controls are in place to protect the system from unauthorized access. This dissertation describes vulnerability analysis and testing performed on synchrophasor system components. Two network fuzzing frameworks are proposed: one for the IEEE C37.118 protocol and one for an energy management system (EMS). While fixing the identified vulnerabilities in information infrastructures is imperative to secure a power system, successful intrusions are still likely to occur. The ability to detect intrusions is therefore necessary to mitigate the negative effects of a successful attack. Synchrophasor systems provide real-time data with millisecond precision, which makes it feasible to observe a sequence of fast events. Different power system scenarios present different patterns in the observed fast event sequences. This dissertation proposes a data mining approach called mining common paths to accurately extract patterns for power system scenarios, including disturbances, control and protection actions, and cyber-attacks, from synchrophasor data and logs of system components. In this dissertation, such a pattern is called a common path, which is represented as a sequence of critical system states in temporal order. The process of automatically discovering common paths and building a state machine for detecting power system scenarios and attacks is introduced.
The classification results show that the proposed approach can accurately detect these scenarios even with variation in fault locations and load conditions. This dissertation also describes a hybrid intrusion detection framework that employs the mining common path algorithm to enable a systematic and automatic IDS construction process. An IDS prototype was validated on a 2-line 3-bus power transmission system protected by the distance protection scheme. The result shows the IDS prototype accurately classifies 25 power system scenarios including disturbances, normal control operations, and cyber-attacks.
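The abstract defines a common path as a sequence of critical system states in temporal order, mined per scenario and then turned into a state machine for detection. The sketch below is one simple way to illustrate that idea, not the dissertation's algorithm. It folds a pairwise longest common subsequence (LCS) over several recorded runs of a scenario to get a shared state sequence. This is a heuristic stand-in, and it is not guaranteed to find the optimal common subsequence across more than two runs. Detection then walks each common path as a linear state machine over an observed sequence and picks the longest path that completes. All state and scenario names are hypothetical.

```python
from functools import reduce


def lcs(a, b):
    """Longest common subsequence of two state sequences (dynamic programming)."""
    m, n = len(a), len(b)
    # dp[i][j] holds the LCS length of a[i:] and b[j:].
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = 1 + dp[i + 1][j + 1]
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table forward to recover one LCS.
    out, i, j = [], 0, 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i, j = i + 1, j + 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


def common_path(runs):
    """Fold pairwise LCS over all runs of one scenario (heuristic, not guaranteed optimal)."""
    return reduce(lcs, runs)


def contains_path(path, observed):
    """Linear state machine: advance one state each time the next expected state appears."""
    it = iter(observed)
    return all(state in it for state in path)


def classify(observed, paths):
    """Return the scenario whose matched common path is longest, or None if none match."""
    matches = [(len(p), name) for name, p in paths.items() if contains_path(p, observed)]
    return max(matches)[1] if matches else None


# Hypothetical critical-state labels for three scenarios.
scenario_runs = {
    "line_fault": [
        ["voltage_drop", "fault", "relay_trip", "breaker_open"],
        ["fault", "noise", "relay_trip", "breaker_open", "voltage_drop"],
    ],
    "manual_open": [
        ["cmd_open", "breaker_open"],
        ["cmd_open", "noise", "breaker_open"],
    ],
    "trip_injection": [
        ["relay_trip", "breaker_open"],
        ["relay_trip", "noise", "breaker_open"],
    ],
}
paths = {name: common_path(runs) for name, runs in scenario_runs.items()}
print(paths["line_fault"])  # ['fault', 'relay_trip', 'breaker_open']
print(classify(["relay_trip", "noise", "breaker_open"], paths))  # trip_injection
```

Preferring the longest matched path matters here. A genuine line fault also contains the shorter `relay_trip -> breaker_open` path, so it must not be reported as an injected trip.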
|
6 |
Feature extraction and similarity-based analysis for proteome and genome databases. Öztürk, Özgür. January 2007
Thesis (Ph. D.)--Ohio State University, 2007. / Title from first page of PDF file. Includes bibliographical references (p. 108-119).
|
7 |
Data Mining On Architecture Simulation. Maden, Engin. 01 March 2010
Data mining is the process of extracting patterns from large volumes of data. One branch of data mining is sequence data mining, in which the data can be viewed as a sequence of events, each with an associated time of occurrence. Sequence data is modelled using episodes, which are made up of events.
The aim of this thesis work is to analyse architecture simulation output data by applying episode mining techniques, to show previously known relationships between events in the architecture, and to provide an environment for predicting the performance of a program on an architecture before executing the code. One of the most important points here is the application area of episode mining techniques: architecture simulation data is a new domain for these techniques, and using their results to predict the performance of programs on an architecture before execution can be considered a new approach.
For this purpose, a data mining tool has been developed that implements three episode mining techniques: the WINEPI approach, the non-overlapping occurrence-based approach, and the MINEPI approach. The tool has three main components: a data pre-processor, an episode miner, and an output analyser.
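The WINEPI approach named above measures how often an episode occurs by sliding a fixed-width window over the event sequence. The sketch below computes the WINEPI frequency of a serial episode, meaning event types that must occur in this order at strictly increasing times. The frequency is the fraction of windows that contain the episode. It assumes integer timestamps and a sequence spanning `[t_start, t_end)`. Windows start at every point from `t_start - win + 1` to `t_end - 1`, so the first and last windows only partly overlap the sequence. The simulation event names are hypothetical, and this is an illustration rather than the tool described in the thesis.

```python
def occurs_serial(episode, window_events):
    """True if the episode's event types occur in strictly increasing time order."""
    k, last_t = 0, None
    for t, etype in window_events:  # assumed sorted by time
        if k == len(episode):
            break
        # Greedily match the earliest qualifying event for the next episode element.
        if etype == episode[k] and (last_t is None or t > last_t):
            k, last_t = k + 1, t
    return k == len(episode)


def winepi_frequency(episode, events, t_start, t_end, win):
    """Fraction of width-`win` windows over [t_start, t_end) that contain `episode`.

    There are t_end - t_start + win - 1 windows in total.
    """
    events = sorted(events)
    starts = range(t_start - win + 1, t_end)
    hits = 0
    for ts in starts:
        inside = [(t, e) for t, e in events if ts <= t < ts + win]
        if occurs_serial(episode, inside):
            hits += 1
    return hits / len(starts)


# Hypothetical architecture-simulation events as (time, type) pairs.
events = [
    (1, "cache_miss"), (2, "pipeline_stall"), (5, "branch_mispredict"),
    (6, "cache_miss"), (8, "pipeline_stall"),
]
freq = winepi_frequency(("cache_miss", "pipeline_stall"), events, t_start=1, t_end=10, win=3)
print(f"{freq:.3f}")  # 0.273
```

Here 3 of the 11 windows contain a cache miss followed by a pipeline stall. A WINEPI-style miner would keep the episode if this fraction meets a chosen minimum frequency.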
|
8 |
Using Differential Sequence Mining to Associate Patterns of Interactions in Concept Mapping Activity with Dimensions of Collaborative Process. January 2015
Abstract: Computer-supported collaborative learning (CSCL) has made great inroads in classroom teaching, marked by the use of tools and technologies to support and enhance collaborative learning. Computer-mediated learning environments produce large amounts of data capturing student interactions, which can be used to analyze students’ learning behaviors (Martinez-Maldonado et al., 2013a). The analysis of the process of collaboration is an active area of research in CSCL. Contributing to this area, Meier et al. (2007) defined nine dimensions and a rating scheme to assess the quality of collaboration. This thesis aims to extract and examine frequent patterns of students’ interactions that characterize strong and weak groups across these dimensions. To achieve this, an exploratory data mining technique, differential sequence mining, was applied to data from a collaborative concept mapping activity in which collaboration among students was facilitated by an interactive tabletop. The results associate frequent patterns in the collaborative concept mapping process with some of the dimensions that assess the quality of collaboration. The analysis of these associations is theoretically grounded and draws on aspects of collaborative learning, concept mapping, communication, group cognition, and information processing. The results are preliminary but demonstrate the potential of associating frequent interaction patterns with strong and weak groups across specific dimensions of collaboration. This is relevant for students, teachers, and researchers who monitor the process of collaborative learning. For dimensions related to the “communication” aspect of collaboration, the frequent patterns of strong groups reflected conformance to the process of conversation.
In terms of the concept mapping sub-processes, the frequent patterns for strong groups reflect the presentation phase of conversation. They include processes such as talking and sharing individual maps while constructing the group’s concept map, followed by short utterances that represent the acceptance phase. For the “joint information processing” aspect of collaboration, the frequent patterns for strong groups were marked by learners contributing more to each other’s work. In terms of the concept mapping sub-processes, these patterns were marked by learners adding links to each other’s concepts or working with each other’s concepts while revising the group concept map. / Dissertation/Thesis / Masters Thesis Computer Science 2015
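Differential sequence mining contrasts how common interaction patterns are in two groups, here strong and weak collaborative groups. The sketch below is a deliberately simplified stand-in. Patterns are contiguous n-grams of logged actions. A pattern's sequence support in a group is the fraction of that group's sequences containing it. Patterns whose support differs between the groups by at least `min_diff` are reported. Published variants of the method are richer; for example, they allow gaps within patterns and test whether support differences are statistically significant. The action labels and threshold are hypothetical.

```python
from collections import Counter


def ngrams(seq, n):
    """Set of contiguous n-grams in one action sequence."""
    return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}


def sequence_support(group, n):
    """Fraction of sequences in `group` containing each contiguous n-gram."""
    counts = Counter()
    for seq in group:
        counts.update(ngrams(seq, n))  # a set, so each sequence counts once
    return {p: c / len(group) for p, c in counts.items()}


def differential_patterns(strong, weak, n=2, min_diff=0.75):
    """Patterns whose support differs by at least `min_diff` (positive = more common in strong)."""
    s_sup, w_sup = sequence_support(strong, n), sequence_support(weak, n)
    result = []
    for p in set(s_sup) | set(w_sup):
        diff = s_sup.get(p, 0.0) - w_sup.get(p, 0.0)
        if abs(diff) >= min_diff:
            result.append((p, diff))
    return sorted(result, key=lambda x: (-x[1], x[0]))


# Hypothetical tabletop action logs, one sequence per group session.
strong = [
    ["talk", "share_map", "short_utterance", "add_link"],
    ["talk", "share_map", "short_utterance"],
]
weak = [
    ["add_link", "add_link", "talk"],
    ["talk", "add_link", "add_link"],
]
for pattern, diff in differential_patterns(strong, weak):
    print(pattern, f"{diff:+.2f}")
```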
|
9 |
Sequential Pattern Mining on Electronic Medical Records for Finding Optimal Clinical Pathways. Edman, Henrik. January 2018
Electronic Medical Records (EMRs) are digital versions of paper charts, used to record the treatment of patients in hospitals. Clinical pathways are guidelines for how to treat different diseases, determined by observing the outcomes of previous treatments. Sequential pattern mining is a form of data mining in which the mined data is organized in sequences. It is a common research topic in data mining, and new variations on existing algorithms are introduced frequently. In a previous report, the sequential pattern mining algorithm PrefixSpan was used to mine patterns in EMRs to verify or suggest new clinical pathways, but it could verify pathways only partially. One reason given for this was that PrefixSpan was too inefficient to mine at a minimum support low enough to capture some items. In this report, CSpan is used instead, because it is reported to outperform PrefixSpan by up to two orders of magnitude. The aim is to improve runtime and thereby address the problems identified in the previous work. The results show that CSpan did improve the runtime and was able to mine at a lower minimum support. However, the output improved only marginally.
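The report contrasts PrefixSpan with CSpan. The sketch below is a minimal PrefixSpan for sequences whose elements are single items, with `min_support` given as an absolute count of sequences. It grows patterns one item at a time and recurses on projected suffixes. It is followed by a naive post-filter that keeps only closed patterns, meaning patterns with no longer super-pattern of equal support. CSpan is generally described as a closed sequential pattern miner that prunes non-closed candidates during the search. The post-filter here only reproduces the kind of output such a miner targets, not CSpan's algorithm. The treatment event names are hypothetical.

```python
def prefixspan(db, min_support):
    """Frequent sequential patterns (single-item elements) as {pattern_tuple: support}."""
    results = {}

    def mine(prefix, projected):
        # Count each item at most once per projected suffix.
        counts = {}
        for seq, start in projected:
            for item in set(seq[start:]):
                counts[item] = counts.get(item, 0) + 1
        for item, count in counts.items():
            if count < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = count
            # Project: keep the suffix after the first occurrence of `item`.
            new_projected = []
            for seq, start in projected:
                for i in range(start, len(seq)):
                    if seq[i] == item:
                        new_projected.append((seq, i + 1))
                        break
            mine(pattern, new_projected)

    mine((), [(seq, 0) for seq in db])
    return results


def is_subsequence(a, b):
    it = iter(b)
    return all(x in it for x in a)


def closed_patterns(patterns):
    """Keep patterns with no longer super-pattern of equal support (naive O(n^2) filter)."""
    return {
        p: s for p, s in patterns.items()
        if not any(len(q) > len(p) and t == s and is_subsequence(p, q)
                   for q, t in patterns.items())
    }


# Hypothetical treatment sequences, one per patient record.
db = [
    ["admit", "xray", "surgery", "discharge"],
    ["admit", "surgery", "discharge"],
    ["admit", "xray", "discharge"],
]
frequent = prefixspan(db, min_support=2)
closed = closed_patterns(frequent)
print(len(frequent), len(closed))  # 11 3
```

Lowering `min_support` makes the frequent set grow quickly. This growth is why the choice of a more efficient miner, and of closed output, matters when rare treatment steps need to be captured.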
|
10 |
Migration Motif: A Spatial-Temporal Pattern Mining Approach for Financial Markets. Du, Xiaoxi. 08 April 2009
No description available.
|