Global ETD Search

11	RECOMMENDATION SYSTEMS IN SOCIAL NETWORKS Behafarid Mohammad Jafari (15348268) 18 May 2023 (has links) <p> The dramatic improvement in information and communication technology (ICT) has driven an evolution in learning management systems (LMS). The rapid growth in LMSs has caused users to demand more advanced, automated, and intelligent services. CourseNetworking is a next-generation LMS adopting machine learning to add personalization, gamification, and more dynamics to the system. This work aims to develop two recommender systems that can help improve CourseNetworking services. The first is a social recommender system that helps CourseNetworking track user interests and give more relevant recommendations. Recently, graph neural network (GNN) techniques have been employed in social recommender systems due to their high success in graph representation learning, including on social network graphs. Despite the rapid advances in recommender system performance, dealing with the dynamic property of social network data is one of the key challenges that remains to be addressed. In this research, a novel method is presented that provides social recommendations by incorporating the dynamic property of social network data in a heterogeneous graph, supplementing the graph with time span nodes that are used to define users' long-term and short-term preferences over time. The second service proposed for addition to Rumi services is a hashtag recommendation system that can help users label their posts quickly, resulting in improved searchability of content. In recent years, several hashtag recommendation methods have been proposed and developed to speed up text processing and quickly identify critical phrases. These methods use different approaches and techniques to obtain critical information from a large amount of data. 
This work investigates the efficiency of unsupervised keyword extraction methods for hashtag recommendation and recommends the one with the best performance to use in a hashtag recommender system. </p> Knowledge representation and reasoning Natural language processing Data mining and knowledge discovery Graph, social and multimedia data Recommender systems Recommender Systems Graph neural networks (GNNs) Natural Language Processing (NLP) Machine Learning
12	Models and Representation Learning Mechanisms for Graph Data Susheel Suresh (14228138) 15 December 2022 (has links) <p>Graph representation learning (GRL) has been increasingly used to model and understand data from a wide variety of complex systems spanning social, technological, bio-chemical and physical domains. GRL consists of two main components: (1) a parametrized encoder that provides representations of graph data and (2) a learning process to train the encoder parameters. Designing flexible encoders that capture the underlying invariances and characteristics of graph data is crucial to the success of GRL. On the other hand, the learning process drives the quality of the encoder representations, and developing principled learning mechanisms is vital for a number of growing applications in self-supervised, transfer and federated learning settings. To this end, we propose a suite of models and learning algorithms for GRL which form the two main thrusts of this dissertation.</p> <p><br></p> <p>In Thrust I, we propose two novel encoders which build upon a widely popular GRL encoder class called graph neural networks (GNNs). First, we empirically study the prediction performance of current GNN based encoders when applied to graphs with heterogeneous node mixing patterns using our proposed notion of local assortativity. We find that GNN performance in node prediction tasks strongly correlates with our local assortativity metric, thereby introducing a limit. We propose to transform the input graph into a computation graph with proximity and structural information as distinct types of edges. We then propose a novel GNN based encoder that operates on this computation graph and adaptively chooses between structure and proximity information. 
Empirically, adopting our transformation and encoder framework leads to improved node classification performance compared to baselines in real-world graphs that exhibit diverse mixing.</p> <p>Secondly, we study the trade-off between expressivity and efficiency of GNNs when applied to temporal graphs for the task of link ranking. We develop an encoder that incorporates a labeling approach designed to allow for efficient inference over the candidate set jointly, while provably boosting expressivity. We also propose to optimize a list-wise loss for improved ranking. With extensive evaluation on real-world temporal graphs, we demonstrate its improved performance and efficiency compared to baselines.</p> <p><br></p> <p>In Thrust II, we propose two principled encoder learning mechanisms for challenging and realistic graph data settings. First, we consider a scenario where only limited or even no labelled data is available for GRL. Recent research has converged on graph contrastive learning (GCL), where GNNs are trained to maximize the correspondence between representations of the same graph in its different augmented forms. However, we find that GNNs trained by traditional GCL often risk capturing redundant graph features and thus may be brittle and provide sub-par performance in downstream tasks. We then propose a novel principle, termed adversarial-GCL (AD-GCL), which enables GNNs to avoid capturing redundant information during the training by optimizing adversarial graph augmentation strategies used in GCL. We pair AD-GCL with theoretical explanations and design a practical instantiation based on trainable edge-dropping graph augmentation. We experimentally validate AD-GCL by comparing with state-of-the-art GCL methods and achieve performance gains in semi-supervised, unsupervised and transfer learning settings using benchmark chemical and biological molecule datasets. </p> <p>Secondly, we consider a scenario where graph data is silo-ed across clients for GRL. 
We focus on two unique challenges encountered when applying distributed training to GRL: (i) client task heterogeneity and (ii) label scarcity. We propose a novel learning framework called federated self-supervised graph learning (FedSGL), which first utilizes a self-supervised objective to train GNNs in a federated fashion across clients; each client then fine-tunes the obtained GNNs based on its local task and available labels. Our framework enables the federated GNN model to extract patterns from the common feature (attribute and graph topology) space without the need for labels and without being biased by heterogeneous local tasks. An extensive empirical study of FedSGL on both node and graph classification tasks yields fruitful insights into how the level of feature / task heterogeneity, the adopted federated algorithm and the level of label scarcity affect the clients’ performance in their tasks.</p> Data mining and knowledge discovery Graph, social and multimedia data Deep learning Neural networks Semi- and unsupervised learning Graph Neural Networks (GNNs) Deep Learning Self Supervised Learning Federated Learning frameworks
13	Node Centric Community Detection and Evolutional Prediction in Dynamic Networks Oluwafolake A Ayano (13161288) 27 July 2022 (has links) <p>Advances in technology have led to the availability of data from different platforms such as the web and social media platforms. Much of this data can be represented in the form of a network consisting of a set of nodes connected by edges. The nodes represent the items in the network while the edges represent the interactions between the nodes. Community detection methods have been used extensively in analyzing these networks. However, community detection in evolving networks has been a significant challenge because of the frequent changes to the networks and the need for real-time analysis. Static community detection methods are not appropriate for analyzing dynamic networks because they do not retain a network’s history and cannot provide real-time information about the communities in the network.</p> <p>Existing incremental methods treat changes to the network as a sequence of edge additions and/or removals; however, in many real-world networks, changes occur when a node is added with all its edges connecting simultaneously. </p> <p>For efficient processing of such large networks in a timely manner, there is a need for an adaptive analytical method that can process large networks without recomputing the entire network after its evolution and treat all the edges involved with a node equally. </p> <p>We propose a node-centric community detection method that incrementally updates the community structure in the network using the already known structure of the network, avoiding recomputation of the entire network from scratch and consequently achieving a high-quality community structure. The results from our experiments suggest that our approach is efficient for incremental community detection of node-centric evolving networks. 
</p> Data engineering and data science Data mining and knowledge discovery Graph, social and multimedia data Community Detection Dynamic Networks IP Networks Clustering Big Data Analytics
14	E-model: event-based graph data model theory and implementation Kim, Pilho 06 July 2009 (has links) The necessity of managing disparate data models is increasing within all IT areas. Emerging hybrid relational-XML systems are under development in this context to support both relational and XML data models. However, there are ever-growing needs for adequate data models for texts and multimedia, which are applications that require proper storage, and their capability to coexist and collaborate with other data models is as important as that of a relational-XML hybrid model. This work proposes a new data model named E-model that supports rich relations and reflects the dynamic nature of information. This E-model introduces abstract data typing objects and rules of relation that support: (1) the notion of time in object definition and relation, (2) multiple-type relations, (3) complex schema modeling methods using a relational directed acyclic graph, and (4) interoperation with popular data models. To implement the E-model prototype, extensive data operation APIs have been developed on top of relational databases. In processing dynamic queries, our prototype achieves an order of magnitude improvement in speed compared with popular data models. Based on extensive E-model APIs, a new language named EML is proposed. EML extends the SQL-89 standard with various E-model features: (1) unstructured queries, (2) unified object namespaces, (3) temporal queries, (4) ranking orders, (5) path queries, and (6) semantic expansions. The E-model system can interoperate with popular data models with its rich relations and flexible structure to support complex data models. It can act as a stand-alone database server or it can also provide materialized views for interoperation with other data models. It can also co-exist with established database systems as a centralized online archive or as a proxy database server. 
The current E-model prototype system was implemented on top of a relational database. This allows significant benefits from established database engines in application development. In addition to extensive features added to SQL, our EML prototype achieves an order of magnitude speed improvement in dynamic queries compared to popular database models. Availability Release the entire work immediately for access worldwide after my graduation. Database architectures Multimedia databases Modeling structured Textual and multimedia data Graphs and networks Linked representations Modeling and management Data models Database models Schema and subschema Data translation Database design Data structures (Computer science) Databases Multimedia systems
15	Automatické strojové metody získávání znalostí z multimediálních dat / Automatic Machine Learning Methods for Multimedia Data Analysis Mašek, Jan January 2016 (has links) High-quality, efficient processing of the growing amount of multimedia data is increasingly needed to extract knowledge from this data. The thesis deals with the research, implementation, optimization and experimental verification of automatic machine learning methods for multimedia data analysis. The resulting approach achieves higher accuracy than common methods when applied to selected examples. Selected results were published in journals with impact factor [1, 2]. For these reasons, special parallel computing methods were created in this work. These methods use massively parallel hardware to save electric energy and computing time and to achieve better results. Computations which usually take days can be completed in minutes using the new optimized methods. The functionality of the created methods was verified on selected problems: artery detection from ultrasound images with subsequent classification of artery disease, building detection from aerial images for obtaining geographical coordinates, detection of the materials contained in a meteorite from CT images, processing of huge databases of structured data, classification of metallurgical materials using laser-induced breakdown spectroscopy, and automatic classification of emotions from texts.
16	TEMPORAL EVENT MODELING OF SOCIAL HARM WITH HIGH DIMENSIONAL AND LATENT COVARIATES Xueying Liu (13118850) 09 September 2022 (has links) <p>The counting process is fundamental to many real-world problems with event data. The Poisson process, used as the background intensity of the Hawkes process, is the most commonly used point process. The Hawkes process, a self-exciting point process, fits temporal event data, spatial-temporal event data, and event data with covariates. We study a Hawkes process fitted to heterogeneous drug overdose data via a novel semi-parametric approach. The counting process is also related to survival data, since both concern the occurrence of events over time. We fit a Cox model to temporal event data with a large corpus that is processed into high dimensional covariates. We study the significant features that influence the intensity of events. </p> Data mining and knowledge discovery Graph, social and multimedia data Information extraction and fusion Deep learning Semi- and unsupervised learning Hawke Process Latent Covariates Social Harms Spatial-Temporal Data Cox Proportional Hazard Model Temporal Event Sequence Counting Process
17	Dynamic Network Modeling from Temporal Motifs and Attributed Node Activity Giselle Zeno (16675878) 26 July 2023 (has links) <p>The most important networks from different domains (such as Computing, Organization, Economic, Social, Academic, and Biology) are networks that change over time. For example, in an organization there are email and collaboration networks (e.g., different people or teams working on a document). Apart from the connectivity of the networks changing over time, they can contain attributes such as the topic of an email or message, contents of a document, or the interests of a person in an academic citation or a social network. Analyzing these dynamic networks can be critical in decision-making processes. For instance, in an organization, getting insight into how people from different teams collaborate provides important information that can be used to optimize workflows.</p> <p><br></p> <p>Network generative models provide a way to study and analyze networks. For example, benchmarking model performance and generalization in tasks like node classification can be done by evaluating models on synthetic networks generated with varying structure and attribute correlation. In this work, we begin by presenting our systematic study of the impact that graph structure and attribute auto-correlation have on the task of node classification using collective inference. This is the first time such an extensive study has been done. We take advantage of a recently developed method that samples attributed networks (although static) with varying network structure jointly with correlated attributes. 
We find that the graph connectivity that contributes to the network auto-correlation (i.e., the local relationships of nodes) and density have the highest impact on the performance of collective inference methods.</p> <p><br></p> <p>Most of the literature to date has focused on static representations of networks, partially due to the difficulty of finding readily-available datasets of dynamic networks. Dynamic network generative models can bridge this gap by generating synthetic graphs similar to observed real-world networks. Given that motifs have been established as building blocks for the structure of real-world networks, modeling them can help to generate the graph structure seen and capture correlations in node connections and activity. Therefore, we continue with a study of motif evolution in <em>dynamic</em> temporal graphs. Our key insight is that motifs rarely change configurations in fast-changing dynamic networks (e.g. wedges into triangles, and vice-versa), but rather keep reappearing at different times while keeping the same configuration. This finding motivates the generative process of our proposed models, using temporal motifs as building blocks, that generates dynamic graphs with links that appear and disappear over time.</p> <p><br></p> <p>Our first proposed model generates dynamic networks based on motif-activity and the roles that nodes play in a motif. For example, a wedge is sampled based on the likelihood of one node having the role of hub with the two other nodes being the spokes. Our model learns all parameters from observed data, with the goal of producing synthetic graphs with similar graph structure and node behavior. 
We find that using motifs and node roles helps our model generate the more complex structures and the temporal node behavior seen in real-world dynamic networks.</p> <p><br></p> <p>After observing that using motif node-roles helps to capture the changing local structure and behavior of nodes, we extend our work to also consider the attributes generated by nodes’ activities. We propose a second generative model for attributed dynamic networks that (i) captures network structure dynamics through temporal motifs, and (ii) extends the structural roles of nodes in motifs to roles that generate content embeddings. Our new proposed model is the first to generate synthetic dynamic networks and sample content embeddings based on motif node roles. To the best of our knowledge, it is the only attributed dynamic network model that can generate <em>new</em> content embeddings, ones not observed in the input graph but still similar to those of the input graph. Our results show that modeling the network attributes with higher-order structures (e.g., motifs) improves the quality of the networks generated.</p> <p><br></p> <p>The generative models proposed address the difficulty of finding readily-available datasets of dynamic networks, attributed or not. This work will also allow others to: (i) generate networks that they can share without divulging individuals’ private data, (ii) benchmark model performance, and (iii) explore model generalization on a broader range of conditions, among other uses. 
Finally, the evaluation measures proposed will elucidate models, allowing fellow researchers to push forward in these domains.</p> Modelling and simulation Data mining and knowledge discovery Graph, social and multimedia data Neural networks Graph Machine Learning network evolution model temporal graph model Dynamic Networks, Attributed Graphs Social network analysis tools convolutional neural network (CNN) graph convolutional network (GCN) node embeddings language model bert Collective classification Collective inference Node classification model evaluation techniques synthetic networks BERT models pre-trained language models
18	Scalable Parallel Machine Learning on High Performance Computing Systems–Clustering and Reinforcement Learning Weijian Zheng (14226626) 08 December 2022 (has links) <p>High-performance computing (HPC) and machine learning (ML) have been widely adopted by both academia and industries to address enormous data problems at extreme scales. While research has reported on the interactions of HPC and ML, achieving high performance and scalability for parallel and distributed ML algorithms is still a challenging task. This dissertation first summarizes the major challenges for applying HPC to ML applications: 1) poor performance and scalability, 2) loss of the convergence rate, 3) lower quality of the trained model, and 4) a lack of performance optimization techniques designed for specific applications. Researchers can address the four challenges in new ML applications. This dissertation shows how to solve them for two specific applications: 1) a clustering algorithm and 2) graph optimization algorithms that use reinforcement learning (RL).</p> <p>As to the clustering algorithm, we first propose an algorithm called the simulated-annealing clustering algorithm. By combining a blocked data layout and asynchronous local optimization within each thread, the simulated-annealing enhanced clustering algorithm has a convergence rate that is comparable to the K-means algorithm but with much higher performance. Experiments with synthetic and real-world datasets show that the simulated-annealing enhanced clustering algorithm is significantly faster than the MPI K-means library using up to 1024 cores. However, the optimization costs (Sum of Square Error (SSE)) of the simulated-annealing enhanced clustering algorithm became higher than the original costs. To tackle this problem, we devise a new algorithm called the full-step feel-the-way clustering algorithm. In the full-step feel-the-way algorithm, there are L local steps within each block of data points. 
We use the first local step’s results to compute accurate global optimization costs. Our results show that the full-step algorithm can significantly reduce the global number of iterations needed to converge while obtaining low SSE costs. However, the time spent on the local steps is greater than the benefits of the saved iterations. To improve this performance, we next optimize the local step time by incorporating a sampling-based method called reassignment-history-aware sampling. Extensive experiments with various synthetic and real world datasets (e.g., MNIST, CIFAR-10, ENRON, and PLACES-2) show that our parallel algorithms can outperform the fastest open-source MPI K-means implementation by up to 110% on 4,096 CPU cores with comparable SSE costs.</p> <p>Our evaluations of the sampling-based feel-the-way algorithm establish the effectiveness of the local optimization strategy, the blocked data layout, and the sampling methods for addressing the challenges of applying HPC to ML applications. To explore more parallel strategies and optimization techniques, we focus on a more complex application: graph optimization problems using reinforcement learning (RL). RL has proved successful for automatically learning good heuristics to solve graph optimization problems. However, the existing RL systems either do not support graph RL environments or do not support multiple or many GPUs in a distributed setting. This has compromised RL’s ability to solve large scale graph optimization problems due to the lack of parallelization and high scalability. To address the challenges of parallelization and scalability, we develop OpenGraphGym-MG, a high performance distributed-GPU RL framework for solving graph optimization problems. OpenGraphGym-MG focuses on a class of computationally demanding RL problems in which both the RL environment and the policy model are highly computation intensive. 
In this work, we distribute large-scale graphs across distributed GPUs and use spatial parallelism and data parallelism to achieve scalable performance. We compare and analyze the performance of spatial and data parallelism and highlight their differences. To support graph neural network (GNN) layers that take data samples partitioned across distributed GPUs as input, we design new parallel mathematical kernels to perform operations on distributed 3D sparse and 3D dense tensors. To handle costly RL environments, we design new parallel graph environments to scale up all RL-environment-related operations. By combining the scalable GNN layers with the scalable RL environment, we are able to develop high performance OpenGraphGym-MG training and inference algorithms in parallel.</p> <p>To summarize, after proposing the major challenges for applying HPC to ML applications, this thesis explores several parallel strategies and performance optimization techniques using two ML applications. Specifically, we propose a local optimization strategy, a blocked data layout, and sampling methods for accelerating the clustering algorithm, and we create a spatial parallelism strategy, a parallel graph environment, agent, and policy model, and an optimized replay buffer, and multi-node selection strategy for solving large optimization problems over graphs. Our evaluations prove the effectiveness of these strategies and demonstrate that our accelerations can significantly outperform the state-of-the-art ML libraries and frameworks without loss of quality in trained models.</p> Graph, social and multimedia data Distributed systems and algorithms High performance computing Reinforcement learning High Performance Computing (HPC) Clustering Algorithm Reinforcement Learning combinatorial optimization problems graph problems Travelling salesperson problem Minimum Vertex Cover Problem Distributed processing of data NP-Hard optimization problems model parallelism data parallelism
19	EXPLORING GRAPH NEURAL NETWORKS FOR CLUSTERING AND CLASSIFICATION Fattah Muhammad Tahabi (14160375) 03 February 2023 (has links) <p><strong>Graph Neural Networks</strong> (GNNs) have become exceedingly popular and prominent deep learning techniques for analyzing structural graph data, owing to their ability to solve complex real-world problems. Because graphs provide an efficient approach to contriving abstract hypothetical concepts, modern research overcomes the limitations of classical graph theory, which requires prior knowledge of the graph structure before traditional algorithms can be employed. GNNs, an impressive framework for representation learning of graphs, have already produced many state-of-the-art techniques to solve node classification, link prediction, and graph classification tasks. GNNs can learn meaningful representations of graphs incorporating topological structure, node attributes, and neighborhood aggregation to solve supervised, semi-supervised, and unsupervised graph-based problems. In this study, the usefulness of GNNs has been analyzed primarily from two aspects: <strong>clustering and classification</strong>. 
We focus on these two techniques, as they are the most popular strategies in data mining to discern collected data and employ predictive analysis.</p> Biomechanical engineering Neural engineering Health promotion Preventative health care Applications in health Spatial data and applications Evolutionary computation Natural language processing Planning and decision making Data engineering and data science Data mining and knowledge discovery Graph, social and multimedia data Information retrieval and web search Knowledge and information management Context learning Deep learning Neural networks Semi- and unsupervised learning Data structures and algorithms Graph neural network Node classification Graph clustering Temporal graphs dynamic graphs NODE2VEC Graph Attention Mechanism Hunting BiLSTM model EHR data colorectal Cancer Cancers Cancer symptoms symptom Symptom cluster studies Coauthorship networks network analysis Word2vec Hierarchical Clustering method Dunn index semantic analysis text mining Natural Language Processing Tool UMLS identifiers umls Clinical Data Management
