Global ETD Search

1	Clustering, dimensionality reduction, and side information / Law, Hiu Chung. January 2006. Thesis (Ph. D.)--Michigan State University. Dept. of Computer Science & Engineering, 2006. / Title from PDF t.p. (viewed on June 19, 2009). Includes bibliographical references (p. 296-317). Also issued in print.
2	MACHINE LEARNING BASED IDS LOG ANALYSIS / Tianshuai Guan (10710258). 06 May 2021. With the rapid development of information technology, network traffic is also increasing dramatically. However, many cyber-attack records are buried in this large volume of network traffic. Therefore, many Intrusion Detection Systems (IDS) that can extract those malicious activities have been developed. Zeek is one of them, and due to its powerful functions and open-source environment, Zeek has been adopted by many organizations. Information Technology at Purdue (ITaP), which uses Zeek as its IDS, captures netflow logs for all network activity across the whole campus but has not delved into effective use of the information. This thesis examines ways to help increase the performance of anomaly detection. To that end, this project combines basic database concepts with several different machine learning algorithms and compares the results from different combinations to better find potential attack activities in log files. Pattern Recognition and Data Mining; Clustering analysis; IDS
3	Exploring Node Attributes for Data Mining in Attributed Graphs / Jihwan Lee (6639122). 10 June 2019. Graphs have attracted researchers in various fields because many different kinds of real-world entities and the relationships between them can be represented and analyzed effectively and efficiently using graphs. In particular, researchers in data mining and machine learning have developed algorithms and models to understand complex graph data better and perform various data mining tasks. While a large body of work exists on graph mining, most existing work does not fully exploit attributes attached to graph nodes or edges. In this dissertation, we exploit node attributes to generate better solutions to several graph data mining problems addressed in the literature. First, we introduce the notion of statistically significant attribute associations in attributed graphs and propose an effective and efficient algorithm to discover those associations. The effectiveness analysis of the results shows that our proposed algorithm can reveal insightful attribute associations that cannot be identified using earlier methods focused solely on frequency. Second, we build a probabilistic generative model for observed attributed graphs. Under the assumption that there exist hidden communities behind nodes in a graph, we adopt the idea of latent topic distributions to model a generative process of node attribute values and link structure more precisely. This model can be used to detect hidden communities and profile missing attribute values. Lastly, we investigate how to employ node attributes to learn latent representations of nodes in lower dimensional embedding spaces and use the learned representations to improve the performance of data mining tasks over attributed graphs. Pattern Recognition and Data Mining; Attributed Graphs; Data Mining; Machine Learning
4	Mining simple and complex patterns efficiently using binary decision diagrams / Loekito, Elsa. January 2009. Thesis (Ph.D.)--University of Melbourne, Dept. of Computer Science and Software Engineering, 2009. / Typescript. Includes bibliographical references (p. 209-228).
5	Automated Discovery of Real-Time Network Camera Data from Heterogeneous Web Pages / Ryan Merrill Dailey (8086355). 14 January 2021. Reduction in the cost of Network Cameras along with a rise in connectivity enables entities all around the world to deploy vast arrays of camera networks. Network cameras offer real-time visual data that can be used for studying traffic patterns, emergency response, security, and other applications. Although many sources of Network Camera data are available, collecting the data remains difficult due to variations in programming interfaces and website structures. Previous solutions rely on manually parsing the target website, taking many hours to complete. We create a general and automated solution for indexing Network Camera data spread across thousands of uniquely structured webpages. We analyze heterogeneous webpage structures and identify common characteristics among 73 sample Network Camera websites (each website has multiple web pages). These characteristics are then used to build an automated camera discovery module that crawls and indexes Network Camera data. Our system successfully extracts 57,364 Network Cameras from 237,257 unique web pages. Computer Engineering; Pattern Recognition and Data Mining; Web scraping; Network Cameras; Web Content Differentiation; Automated Data Aggregation
6	HYBRID FEATURE SELECTION IN NETWORK INTRUSION DETECTION USING DECISION TREE / Chenxi Xiong (9028061). 27 June 2020. The intrusion detection system has been widely studied and deployed by researchers to provide better security for computer networks. The increase in attack volume and the dramatic advancement of machine learning make the combination of intrusion detection systems and machine learning a hot topic and a promising solution for cybersecurity. Machine learning usually involves a training process that uses a huge amount of sample data. Since huge input data may negatively affect the training and detection performance of the machine learning model, feature selection becomes a crucial technique for ruling out irrelevant and redundant features from the dataset. This study applied a feature selection approach that combines advanced feature selection algorithms with attack characteristic features to produce the optimal feature subset for the machine learning model in network intrusion detection. The optimal feature subset was created using the CSE-CIC-IDS2018 dataset, which is the most up-to-date benchmark dataset with comprehensive attack diversity and features. The experimental results were produced using machine learning models with a decision tree classifier and analyzed with respect to accuracy, precision, recall, and F1 score. Pattern Recognition and Data Mining; Networking and Communications; Network intrusion detection; machine learning; Feature selection
7	A Machine Learning Approach for Uniform Intrusion Detection / Saurabh Devulapalli (11167824). 23 July 2021. Intrusion Detection Systems are vital for computer networks as they protect against attacks that lead to privacy breaches and data leaks. Over the years, researchers have formulated intrusion detection systems (IDS) using machine learning and/or deep learning to detect network anomalies and identify four main attacks, namely Denial of Service (DoS), Probe, Remote to Local (R2L), and User to Root (U2R). However, the existing models are efficient in detecting only a few of the aforementioned attacks while having inadequate detection rates for the rest. This deficiency makes it difficult to choose an appropriate IDS model when a user does not know what attacks to expect. Thus, there is a need for an IDS model that can detect, with uniform efficiency, all four main classes of network intrusions. This research is aimed at exploring a machine learning approach to an intrusion detection model that can detect the DoS, Probe, R2L, and U2R attack classes with uniform and high efficiency. A multilayer perceptron was trained in an ensemble with a J48 decision tree. The resultant ensemble learning model achieved detection rates of over 85% for each of the DoS, Probe, R2L, and U2R attacks. Pattern Recognition and Data Mining; Computer System Security; Machine Learning; Intrusion Detection
8	Neural Representation Learning for Semi-Supervised Node Classification and Explainability / Hogun Park (9179561). 28 July 2020. Many real-world domains are relational, consisting of objects (e.g., users and papers) linked to each other in various ways. Because class labels in graphs are often only available for a subset of the nodes, semi-supervised learning for graphs has been studied extensively to predict the unobserved class labels. For example, we can predict political views in a partially labeled social graph dataset and estimate the expected gross incomes of movies in an actor/movie graph with a few labels. Recently, advances in representation learning for graph data have made great strides in semi-supervised node classification. However, most methods have focused on learning node representations by considering simple relational properties (e.g., random walk) or aggregating nearby attributes, and it is still challenging to learn complex interaction patterns in partially labeled graphs and to provide explanations of the learned representations. In this dissertation, multiple methods are proposed to address both challenges for semi-supervised node classification. First, we propose a graph neural network architecture, REGNN, that leverages local inferences for unlabeled nodes. REGNN performs graph convolution to enable label propagation via high-order paths and predicts class labels for unlabeled nodes. In particular, our proposed attention layer of REGNN measures the role equivalence among nodes and effectively reduces the noise generated during the aggregation of observed labels from distant neighbors at various distances. Second, we propose a neural network architecture that jointly captures both temporal and static interaction patterns, which we call Temporal-Static-Graph-Net (TSGNet). The architecture learns a latent representation of each node in order to encode complex interaction patterns. Our key insight is that leveraging both a static neighbor encoder, which learns aggregate neighbor patterns, and a graph neural network-based recurrent unit, which captures complex interaction patterns, improves the performance of node classification. Lastly, in spite of the better performance of representation learning on node classification tasks, neural network-based representation learning models are still less interpretable than previous relational learning models due to the lack of explanation methods. To address this problem, we show that nodes with high bridgeness scores have larger impacts on node embeddings such as DeepWalk, LINE, Struc2Vec, and PTE under perturbation. However, bridgeness scores are computationally expensive to obtain, so we propose a novel gradient-based explanation method, GRAPH-wGD, to find nodes with high bridgeness efficiently. In our evaluations, our proposed architectures (REGNN and TSGNet) for semi-supervised node classification consistently improve predictive performance on real-world datasets. Our GRAPH-wGD also identifies important nodes as global explanations: perturbing the highly ranked nodes and re-learning low-dimensional node representations for the DeepWalk and LINE embedding methods significantly changes both the predicted probabilities on node classification tasks and the k-nearest neighbors in the embedding space. Applied Computer Science; Pattern Recognition and Data Mining; Neural Representation Learning; Semi-Supervised Node Classification; Explainability
9	TOWARDS TIME-AWARE COLLABORATIVE FILTERING RECOMMENDATION SYSTEM / Dawei Wang (9216029). 12 October 2021. As the technological capacity to store and exchange information progresses, the amount of available data grows explosively, which can lead to information overload. The difficulty of making decisions effectively increases when one has too much information about an issue. Recommendation systems are a subclass of information filtering systems that aim to predict a user’s opinion of or preference for a topic or item, thereby providing personalized recommendations to users by exploiting historic data. They are widely used in e-commerce such as Amazon.com, online movie streaming companies such as Netflix, and social media networks such as Facebook. Memory-based collaborative filtering (CF) is one of the recommendation system methods used to predict a user’s rating or preference by exploring historic ratings, but without incorporating any content information about users or items. Many studies have been conducted on memory-based CFs to improve prediction accuracy, but none of them have achieved better prediction accuracy than state-of-the-art model-based CFs. Furthermore, a product or service is not judged only by its own characteristics but also by the characteristics of other products or services offered concurrently. It can also be judged by anchoring based on users’ memories. Rating or satisfaction is viewed as a function of the discrepancy or contrast between expected and obtained outcomes, documented as contrast effects. Thus, a rating given to an item by a user is a comparative opinion based on the user’s past experiences. Therefore, rating scores can be affected by the sequence and time of ratings. However, in traditional CFs, pairwise similarities measured between items do not consider time factors such as the sequence of ratings, which could introduce biases caused by contrast effects. In this research, we proposed a new approach that combines both structural and rating-based similarity measurement used in memory-based CFs. We found that memory-based CF using the combined similarity measurement can achieve better prediction accuracy than model-based CFs in terms of lower MAE, and can reduce memory and time requirements by using fewer neighbors than traditional memory-based CFs on the MovieLens and Netflix datasets. We also proposed techniques to reduce the biases caused by users’ comparing, anchoring, and adjustment behaviors by introducing time-aware similarity measurements into memory-based CFs. Lastly, we introduced novel techniques to identify, quantify, and visualize user preference dynamics and showed how they could be used to generate dynamic recommendation lists that fit each user’s current preferences. Operations Research; Pattern Recognition and Data Mining; machine Learning; recommendation system; collaborative filtering; Similarity measurement
10	Bottom-up, Context-Driven Visual Object Understanding / Sepehr Farhand (11799710). 20 December 2021. Recent developments in the computer vision field achieve state-of-the-art performance by utilizing large-scale training datasets and, in their absence, by generating synthetic datasets of comparable magnitude. Yet, for certain applications, it is not feasible to synthesize high-fidelity training data (e.g., in the biomedical computer vision domain) or to achieve detailed explainability for the program's decisions. Formulating a part-based approach can help alleviate these challenges, as (i) a scene can naturally be decomposed into a hierarchical part-based structure, and (ii) using domain knowledge by incorporating the object parts' topological and geometrical constraints reduces the complexity of learning and inference, benefiting methods in terms of data efficiency and computational resources. This dissertation investigates multiple applications that benefit from a part-based solution with respect to the applications' performance metrics and/or computational efficiency. We develop part-based methods for registration, segmentation, unsupervised object discovery in large-scale image collections, and unsupervised unknown foreground discovery in streaming scenarios. Computer Vision; Image Processing; Pattern Recognition and Data Mining; segmentation; registration; Dynamic environments; part-based image analysis; unsupervised object discovery; unknown foreground detection
