Global ETD Search

11	Learning predictive models from graph data using pattern mining Karunaratne, Thashmee M. January 2014 (has links) Learning from graphs has become a popular research area due to the ubiquity of graph data representing web pages, molecules, social networks, protein interaction networks etc. However, standard graph learning approaches are often challenged by the computational cost involved in the learning process, due to the richness of the representation. Attempts made to improve their efficiency are often associated with the risk of degrading the performance of the predictive models, creating tradeoffs between the efficiency and effectiveness of the learning. Such a situation is analogous to an optimization problem with two objectives, efficiency and effectiveness, where improving one objective without the other objective being worse off is a better solution, called a Pareto improvement. In this thesis, it is investigated how to improve the efficiency and effectiveness of learning from graph data using pattern mining methods. Two objectives are set where one concerns how to improve the efficiency of pattern mining without reducing the predictive performance of the learning models, and the other objective concerns how to improve predictive performance without increasing the complexity of pattern mining. The employed research method mainly follows a design science approach, including the development and evaluation of artifacts. The contributions of this thesis include a data representation language that can be characterized as a form in between sequences and itemsets, where the graph information is embedded within items. Several studies, each of which look for Pareto improvements in efficiency and effectiveness are conducted using sets of small graphs. Summarizing the findings, some of the proposed methods, namely maximal frequent itemset mining and constraint based itemset mining, result in a dramatically increased efficiency of learning, without decreasing the predictive performance of the resulting models. It is also shown that additional background knowledge can be used to enhance the performance of the predictive models, without increasing the complexity of the graphs. Machine Learning Graph Data Pattern Mining Classification Regression Predictive Models
12	Discovering Co-Location Patterns and Rules in Uncertain Spatial Datasets Adilmagambetov, Aibek Unknown Date No description available. co-location mining frequent pattern mining uncertain spatial datasets
13	AV space for efficiently learning classification rules from large datasets / Wang, Linyan. January 2006 (has links) Thesis (M.Sc.)--York University, 2006. Graduate Programme in Computer Science. / Typescript. Includes bibliographical references (leaves 130-134). Also available on the Internet. MODE OF ACCESS via web browser by entering the following URL: http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&res_dat=xri:pqdiss&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&rft_dat=xri:pqdiss:MR19748
14	SNIF TOOL - Sniffing for patterns in continuous streams Mukherji, Abhishek. January 2008 (has links) Thesis (M.S.)--Worcester Polytechnic Institute. / Keywords: continuous queries; streaming time-series; similarity queries; pattern matching. Includes bibliographical references (p. 58-61).
15	Interesting Association Rules Mining Based on Improved Rarity Algorithm Xiang, Lan January 2018 (has links) With the rapid development of science and technology, our society has been in the big data era. In human activities, we produce a lot of data in every second and every minute, what contain much information. Then, how to select the useful information from those complicated data is a significant issue. So the association rules mining, a technique of mining patterns or associations between itemsets, comes into being. And this technique aims to find some important associations in data to get useful knowledge. Nowadays, most scholars at home and abroad focus on the frequent pattern mining. However, it is undeniable that the rare pattern mining also plays an important role in many areas, such as the medical, financial, and scientific field. Comparing with frequent pattern mining, studying rare pattern mining is more valuable, because it tends to find unknown, unexpected, and more interesting rules. But the study of rare pattern mining is little difficult because of the scarcity of data used for verifying rules. In the frequent pattern mining, there are two general algorithms of discovering frequent itemsets, i.e., Apriori, the earliest algorithm which is proposed by R.Agrawal in 1994, and FP-Tree, the improved algorithm which reduced the time complexity. And in rare pattern mining, there are also two algorithms, Arima and Rarity, what are similar to Apriori and FP-Tree algorithms, but they still exist some problems, for example, Arima is time-consuming because of repeatedly scanning the large database, and Rarity is space-consuming because of the establishment of the full-combination tree. Therefore, based on the Rarity algorithm, this report presents an improved method to efficiently discover interesting association rules among rare itemsets and aims to get a balance between time and space. It is a top-down strategy which uses the graph structure to indicate all combinations of existing items, defines pattern matrix to record itemsets, and combines the hash table to accelerate calculation process. This method decreases both the time cost and the space cost when comparing with Arima, and reduces the space waste to solve the problem of Rarity, but its searching time of mining rare itemsets is more than Rarity, and we verified the feasibility of this algorithm only on abstract and small databases. Thus in the future, on the one hand, we will continue improving our method to explore how to decrease the searching time in the process and adjust the hash function to optimize the space utilization. And on the other hand, we will apply our method to actual large databases, such as the clinical database of the diabetic patients to mine association rules in diabetic complications. Association rules mining Rare pattern mining Computer Systems Datorsystem
16	Cybersecurity Testing and Intrusion Detection for Cyber-Physical Power Systems Pan, Shengyi 13 December 2014 (has links) Power systems will increasingly rely on synchrophasor systems for reliable and high-performance wide area monitoring and control (WAMC). Synchrophasor systems greatly use information communication technologies (ICT) for data exchange which are vulnerable to cyber-attacks. Prior to installation of a synchrophasor system a set of cyber security requirements must be developed and new devices must undergo vulnerability testing to ensure that proper security controls are in place to protect the synchrophasor system from unauthorized access. This dissertation describes vulnerability analysis and testing performed on synchrophasor system components. Two network fuzzing frameworks are proposed; for the I C37.118 protocol and for an energy management system (EMS). While fixing the identified vulnerabilities in information infrastructures is imperative to secure a power system, it is likely that successful intrusions will still occur. The ability to detect intrusions is necessary to mitigate the negative effects from a successful attacks. The emergence of synchrophasor systems provides real-time data with millisecond precision which makes the observation of a sequence of fast events feasible. Different power system scenarios present different patterns in the observed fast event sequences. This dissertation proposes a data mining approach called mining common paths to accurately extract patterns for power system scenarios including disturbances, control and protection actions and cyber-attacks from synchrophasor data and logs of system components. In this dissertation, such a pattern is called a common path, which is represented as a sequence of critical system states in temporal order. The process of automatically discovering common paths and building a state machine for detecting power system scenarios and attacks is introduced. The classification results show that the proposed approach can accurately detect these scenarios even with variation in fault locations and load conditions. This dissertation also describes a hybrid intrusion detection framework that employs the mining common path algorithm to enable a systematic and automatic IDS construction process. An IDS prototype was validated on a 2-line 3-bus power transmission system protected by the distance protection scheme. The result shows the IDS prototype accurately classifies 25 power system scenarios including disturbances, normal control operations, and cyber-attacks. Fuzzing Common Paths Sequential Pattern Mining Test Bed
17	Frequent Pattern Mining among Weighted and Directed Graphs Cederquist, Aaron January 2009 (has links) No description available. Computer Science data mining graph canonical form frequent pattern mining
18	Discovering Contiguous Sequential Patterns in Network-Constrained Movement Yang, Can January 2017 (has links) A large proportion of movement in urban area is constrained to a road network such as pedestrian, bicycle and vehicle. That movement information is commonly collected by Global Positioning System (GPS) sensor, which has generated large collections of trajectories. A contiguous sequential pattern (CSP) in these trajectories represents a certain number of objects traversing a sequence of spatially contiguous edges in the network, which is an intuitive way to study regularities in network-constrained movement. CSPs are closely related to route choices and traffic flows and can be useful in travel demand modeling and transportation planning. However, the efficient and scalable extraction of CSPs and effective visualization of the heavily overlapping CSPs are remaining challenges. To address these challenges, the thesis develops two algorithms and a visual analytics system. Firstly, a fast map matching (FMM) algorithm is designed for matching a noisy trajectory to a sequence of edges traversed by the object with a high performance. Secondly, an algorithm called bidirectional pruning based closed contiguous sequential pattern mining (BP-CCSM) is developed to extract sequential patterns with closeness and contiguity constraint from the map matched trajectories. Finally, a visual analytics system called sequential pattern explorer for trajectories (SPET) is designed for interactive mining and visualization of CSPs in a large collection of trajectories. Extensive experiments are performed on a real-world taxi trip GPS dataset to evaluate the algorithms and visual analytics system. The results demonstrate that FMM achieves a superior performance by replacing repeated routing queries with hash table lookups. BP-CCSM considerably outperforms three state-of-the-art algorithms in terms of running time and memory consumption. SPET enables the user to efficiently and conveniently explore spatial and temporal variations of CSPs in network-constrained movement. / <p>QC 20171122</p> map matching trajectory pattern mining trajectory pattern visualization Other Computer and Information Science Annan data- och informationsvetenskap
19	Fouille de motifs : entre accessibilité et robustesse / Pattern mining : between accessibility and robustness Abboud, Yacine 28 November 2018 (has links) L'information occupe désormais une place centrale dans notre vie quotidienne, elle est à la fois omniprésente et facile d'accès. Pourtant, l'extraction de l'information à partir des données est un processus souvent inaccessible. En effet, même si les méthodes de fouilles de données sont maintenant accessibles à tous, les résultats de ces fouilles sont souvent complexes à obtenir et à exploiter pour l'utilisateur. La fouille de motifs combinée à l'utilisation de contraintes est une direction très prometteuse de la littérature pour à la fois améliorer l'efficience de la fouille et rendre ses résultats plus appréhendables par l'utilisateur. Cependant, la combinaison de contraintes désirée par l'utilisateur est souvent problématique car, elle n'est pas toujours adaptable aux caractéristiques des données fouillées tel que le bruit. Dans cette thèse, nous proposons deux nouvelles contraintes et un algorithme pour pallier ce problème. La contrainte de robustesse permet de fouiller des données bruitées en conservant la valeur ajoutée de la contrainte de contiguïté. La contrainte de clôture allégée améliore l'appréhendabilité de la fouille de motifs tout en étant plus résistante au bruit que la contrainte de clôture classique. L'algorithme C3Ro est un algorithme générique de fouille de motifs séquentiels intégrant de nombreuses contraintes, notamment les deux nouvelles contraintes que nous avons introduites, afin de proposer à l'utilisateur la fouille la plus efficiente possible tout en réduisant au maximum la taille de l'ensemble des motifs extraits. C3Ro rivalise avec les meilleurs algorithmes de fouille de motifs de la littérature en termes de temps d'exécution tout en consommant significativement moins de mémoire. C3Ro a été expérimenté dans le cadre de l’extraction de compétences présentes dans les offres d'emploi sur le Web / Information now occupies a central place in our daily lives, it is both ubiquitous and easy to access. Yet extracting information from data is often an inaccessible process. Indeed, even though data mining methods are now accessible to all, the results of these mining are often complex to obtain and exploit for the user. Pattern mining combined with the use of constraints is a very promising direction of the literature to both improve the efficiency of the mining and make its results more apprehensible to the user. However, the combination of constraints desired by the user is often problematic because it does not always fit with the characteristics of the searched data such as noise. In this thesis, we propose two new constraints and an algorithm to overcome this issue. The robustness constraint allows to mine noisy data while preserving the added value of the contiguity constraint. The extended closedness constraint improves the apprehensibility of the set of extracted patterns while being more noise-resistant than the conventional closedness constraint. The C3Ro algorithm is a generic sequential pattern mining algorithm that integrates many constraints, including the two new constraints that we have introduced, to provide the user the most efficient mining possible while reducing the size of the set of extracted patterns. C3Ro competes with the best pattern mining algorithms in the literature in terms of execution time while consuming significantly less memory. C3Ro has been experienced in extracting competencies from web-based job postings Fouille de données Fouille de motifs Contraintes Résistance au bruit Data mining Pattern mining Constraints Noise-resistant 006.312
20	Pattern Mining and Sense-Making Support for Enhancing the User Experience Mukherji, Abhishek 07 December 2018 (has links) While data mining techniques such as frequent itemset and sequence mining are well established as powerful pattern discovery tools in domains from science, medicine to business, a detriment is the lack of support for interactive exploration of high numbers of patterns generated with diverse parameter settings and the relationships among the mined patterns. To enhance the user experience, real-time query turnaround times and improved support for interactive mining are desired. There is also an increasing interest in applying data mining solutions for mobile data. Patterns mined over mobile data may enable context-aware applications ranging from automating frequently repeated tasks to providing personalized recommendations. Overall, this dissertation addresses three problems that limit the utility of data mining, namely, (a.) lack of interactive exploration tools for mined patterns, (b.) insufficient support for mining localized patterns, and (c.) high computational mining requirements prohibiting mining of patterns on smaller compute units such as a smartphone. This dissertation develops interactive frameworks for the guided exploration of mined patterns and their relationships. Contributions include the PARAS pre- processing and indexing framework; enabling analysts to gain key insights into rule relationships in a parameter space view due to the compact storage of rules that enables query-time reconstruction of complete rulesets. Contributions also include the visual rule exploration framework FIRE that presents an interactive dual view of the parameter space and the rule space, that together enable enhanced sense-making of rule relationships. This dissertation also supports the online mining of localized association rules computed on data subsets by selectively deploying alternative execution strategies that leverage multidimensional itemset-based data partitioning index. Finally, we designed OLAPH, an on-device context-aware service that learns phone usage patterns over mobile context data such as app usage, location, call and SMS logs to provide device intelligence. Concepts introduced for modeling mobile data as sequences include compressing context logs to intervaled context events, adding generalized time features, and identifying meaningful sequences via filter expressions.

Search results