1 |
RiboFSM: Frequent Subgraph Mining for the Discovery of RNA Structures and Interactions
Gawronski, Alexander (05 November 2013)
Frequent subgraph mining is a useful method for extracting biologically relevant patterns from a set of graphs or a single large graph. Here, the graph represents all possible RNA structures and interactions. Patterns that are significantly more frequent in this graph than in a random graph are extracted. We hypothesize that these patterns are the most likely to represent biological mechanisms. The graph representation used is a directed dual graph, extended to handle intermolecular interactions. The graph is sampled for subgraphs, which are labeled using a canonical labeling method and counted. The resulting patterns are compared to those created from a randomized dataset and scored. The algorithm was applied to the mitochondrial genome of the kinetoplastid species Trypanosoma brucei. This species has a unique RNA editing mechanism that has been well studied, making it a good model organism to test RiboFSM. The most significant patterns contain two stem-loops, indicative of gRNA, and represent interactions of these structures with target mRNA.
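The "label canonically, then count" step described in this abstract can be illustrated with a minimal sketch. It is not the RiboFSM implementation: it enumerates every small node subset instead of sampling, and it uses a brute-force canonical label (the smallest adjacency string over all node orderings). The function names `canonical_label`, `is_weakly_connected`, and `count_patterns` are hypothetical.

```python
from collections import Counter
from itertools import combinations, permutations


def canonical_label(nodes, edges):
    """Return the smallest adjacency-matrix string over all node orderings.

    Brute force (factorial time), so only suitable for very small subgraphs.
    Isomorphic directed subgraphs receive the same label.
    """
    nodes = list(nodes)
    n = len(nodes)
    best = None
    for order in permutations(nodes):
        index = {v: i for i, v in enumerate(order)}
        bits = ["0"] * (n * n)
        for u, v in edges:
            bits[index[u] * n + index[v]] = "1"
        label = "".join(bits)
        if best is None or label < best:
            best = label
    return best


def is_weakly_connected(nodes, edges):
    """Check connectivity while ignoring edge direction."""
    nodes = set(nodes)
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen == nodes


def count_patterns(edges, size=3):
    """Count connected induced subgraphs of a given size by canonical label.

    Assumption: exhaustive enumeration stands in for the sampling step.
    """
    edge_set = set(edges)
    vertices = sorted({v for e in edge_set for v in e})
    counts = Counter()
    for subset in combinations(vertices, size):
        chosen = set(subset)
        induced = [(u, v) for u, v in edge_set if u in chosen and v in chosen]
        if induced and is_weakly_connected(subset, induced):
            counts[canonical_label(subset, induced)] += 1
    return counts
```

Running the same count on a randomized graph and comparing frequencies per label would correspond to the scoring step, which is omitted here.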
|
2 |
RiboFSM: Frequent Subgraph Mining for the Discovery of RNA Structures and Interactions
Gawronski, Alexander (January 2013)
Frequent subgraph mining is a useful method for extracting biologically relevant patterns from a set of graphs or a single large graph. Here, the graph represents all possible RNA structures and interactions. Patterns that are significantly more frequent in this graph than in a random graph are extracted. We hypothesize that these patterns are the most likely to represent biological mechanisms. The graph representation used is a directed dual graph, extended to handle intermolecular interactions. The graph is sampled for subgraphs, which are labeled using a canonical labeling method and counted. The resulting patterns are compared to those created from a randomized dataset and scored. The algorithm was applied to the mitochondrial genome of the kinetoplastid species Trypanosoma brucei. This species has a unique RNA editing mechanism that has been well studied, making it a good model organism to test RiboFSM. The most significant patterns contain two stem-loops, indicative of gRNA, and represent interactions of these structures with target mRNA.
|
3 |
Mining Approximate Frequent Dense Modules from Multiple Gene Expression Datasets
Seo, San Ha (January 2021)
A large amount of gene expression data has been collected for various environmental and biological conditions. Extracting dense modules that recur in multiple gene coexpression networks has been shown to be promising for functional gene annotation and biomarker discovery. In this thesis, we propose a biclustering-based approach for mining approximate frequent dense modules. This approach reports a large number of modules, many of them duplicates. We therefore build on it and propose two extended approaches that mine a set of representative patterns using post-processing and on-line pattern summarization methods. The extended approaches report fewer modules and fewer duplicates. Experiments on real gene coexpression networks show that frequent dense modules are biologically interesting, as evidenced by the large percentage of biologically enriched frequent dense modules.
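To make "frequent dense module" concrete, the sketch below checks in how many coexpression networks a candidate gene set is dense. It is only an illustration under assumed definitions (edge density of the induced subgraph, and a user-chosen threshold); the thesis's biclustering-based mining is not reproduced, and the names `density` and `support` are hypothetical.

```python
from itertools import combinations


def density(nodes, edges):
    """Fraction of possible node pairs in `nodes` that are joined by an edge."""
    nodes = list(set(nodes))
    possible = len(nodes) * (len(nodes) - 1) // 2
    if possible == 0:
        return 0.0
    edge_set = {frozenset(e) for e in edges}
    present = sum(
        1 for pair in combinations(nodes, 2) if frozenset(pair) in edge_set
    )
    return present / possible


def support(nodes, networks, min_density):
    """Number of networks in which the gene set reaches the density threshold."""
    return sum(1 for edges in networks if density(nodes, edges) >= min_density)
```

A gene set would then count as an approximate frequent dense module when its support reaches a chosen minimum number of networks.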
|
4 |
Truss decomposition in large probabilistic graphs
Daneshmandmehrabani, Mahsa (24 December 2019)
Truss decomposition is an essential problem in graph mining, which focuses on discovering dense subgraphs of a graph. Detecting trusses in deterministic graphs has been studied extensively in the literature. Because most real-world graphs, such as social, biological, and communication networks, are associated with uncertainty, it is of great importance to study truss decomposition in a probabilistic context. However, the problem has received much less attention in a probabilistic framework. Furthermore, due to the computational challenges of truss decomposition in probabilistic graphs, state-of-the-art approaches do not scale to large graphs. Formally, given a user-defined threshold k (for truss denseness), we are interested in finding all the maximal subgraphs that are a k-truss with high probability. In this thesis, we introduce a novel approach based on an asynchronous h-index updating process, which offers significant improvement over the state of the art. Our extensive experimental results confirm the scalability and efficiency of our approach. / Graduate
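For readers unfamiliar with the term, a k-truss of a deterministic graph is a maximal subgraph in which every edge lies in at least k-2 triangles. The sketch below computes it by repeatedly peeling edges with too little triangle support. It covers only the deterministic case, not the probabilistic setting or the h-index updating process of the thesis, and the function name `k_truss` is hypothetical.

```python
def k_truss(edges, k):
    """Return the edges of the k-truss of an undirected, deterministic graph.

    An edge survives only if it is part of at least k - 2 triangles made of
    surviving edges. Edges are removed until no violation remains.
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    changed = True
    while changed:
        changed = False
        for u in list(adj):
            for v in list(adj[u]):
                if v not in adj[u]:
                    continue  # already removed during this pass
                # Triangle support = number of common neighbours of u and v.
                support = len(adj[u] & adj[v])
                if support < k - 2:
                    adj[u].discard(v)
                    adj[v].discard(u)
                    changed = True

    return {frozenset((u, v)) for u in adj for v in adj[u]}
```

For a 4-clique with one extra pendant edge, the 4-truss keeps the six clique edges and drops the pendant edge.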
|
5 |
Finding Antagonistic Communities in Signed Uncertain Graphs
Zhang, Qiqi (January 2023)
Uncertain graph analysis plays a crucial role in many real-world applications, where the presence of uncertain information poses challenges for traditional graph mining algorithms. In this paper, we propose a novel method to find antagonistic communities in signed uncertain graphs, where vertices in the same community have a large expectation of positive edge weights and vertices in different communities have a large expectation of negative edge weights. By restricting all computations to small local parts of the signed uncertain graph, our method can efficiently find significant groups of antagonistic communities. We also provide theoretical foundations for the method. Extensive experiments on five real-world datasets and a synthetic dataset demonstrate the effectiveness and efficiency of the proposed method. / Thesis / Master of Science (MSc)
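The notion of "expected edge weight" within and between communities can be sketched as follows. This assumes a simple model in which each edge carries a signed weight and an independent existence probability, so its expected weight is their product. The thesis may define the model differently, and the function name `expected_weights` is hypothetical.

```python
def expected_weights(edges, group_a, group_b):
    """Sum expected edge weights inside and between two candidate communities.

    edges: iterable of (u, v, weight, probability) tuples, where weight is
    signed and probability is the chance that the edge exists (assumed model).
    Returns (within, between); antagonistic communities would show a large
    positive `within` and a large negative `between`.
    """
    a, b = set(group_a), set(group_b)
    within = 0.0
    between = 0.0
    for u, v, weight, probability in edges:
        expected = weight * probability
        if (u in a and v in a) or (u in b and v in b):
            within += expected
        elif (u in a and v in b) or (u in b and v in a):
            between += expected
    return within, between
```

Edges touching vertices outside both groups are ignored, which mirrors the idea of working only on a local part of the graph.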
|
6 |
Ontology Slice Generation and Alignment for Enhanced Life Science Literature Search
Bergman Laurila, Jonas (January 2009)
Query composition is often a complicated and cumbersome task for persons performing a literature search. This thesis is part of a project which aims to present possible queries to the user in the form of natural language expressions. The thesis presents methods of ontology slice generation. Slices are parts of ontologies connecting two concepts along all possible paths between them. Those slices hence represent all relevant queries connecting the concepts, and the paths can in a later step be translated into natural language expressions. Methods of slice alignment, connecting slices that originate from different ontologies, are also presented. The thesis concludes with some example scenarios and comparisons to related work.
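The definition of a slice (the part of an ontology connecting two concepts along all possible paths) can be sketched as an enumeration of simple paths. This is an illustration only: the ontology is assumed to be given as an undirected adjacency mapping, relation labels are ignored, and the names `all_simple_paths` and `ontology_slice` are hypothetical.

```python
def all_simple_paths(graph, source, target):
    """Enumerate all cycle-free paths from source to target by depth-first search.

    graph: dict mapping each concept to an iterable of neighbouring concepts.
    """
    paths = []

    def extend(path):
        node = path[-1]
        if node == target:
            paths.append(list(path))
            return
        for neighbour in sorted(graph.get(node, ())):
            if neighbour not in path:
                path.append(neighbour)
                extend(path)
                path.pop()

    extend([source])
    return paths


def ontology_slice(graph, source, target):
    """Return the connecting paths and the set of concepts they cover."""
    paths = all_simple_paths(graph, source, target)
    concepts = {concept for path in paths for concept in path}
    return paths, concepts
```

Each returned path is a candidate query that a later step could render as a natural language expression. Concepts that lie on no connecting path are left out of the slice.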
|
7 |
Ontology Slice Generation and Alignment for Enhanced Life Science Literature Search
Bergman Laurila, Jonas (January 2009)
Query composition is often a complicated and cumbersome task for persons performing a literature search. This thesis is part of a project which aims to present possible queries to the user in the form of natural language expressions. The thesis presents methods of ontology slice generation. Slices are parts of ontologies connecting two concepts along all possible paths between them. Those slices hence represent all relevant queries connecting the concepts, and the paths can in a later step be translated into natural language expressions. Methods of slice alignment, connecting slices that originate from different ontologies, are also presented. The thesis concludes with some example scenarios and comparisons to related work.
|
8 |
Pattern-Based Vulnerability Discovery
Yamaguchi, Fabian (30 October 2015)
No description available.
|
9 |
Temporal Graph Mining and Distributed Processing
Kumar, Rohit (19 June 2018)
With the recent growth of social media platforms and the human desire to interact with the digital world, a large volume of human-human and human-device interaction data is generated every second. With the boom in Internet of Things (IoT) devices, device-device interactions are also on the rise. All these interactions represent how the underlying network connects different entities over time. When modeled as an interaction network, these interactions present many unique opportunities to uncover interesting patterns and to understand the dynamics of the network. Understanding these dynamics is very important because they encapsulate the way we communicate, socialize, consume information, and get influenced. To this end, in this PhD thesis, we focus on analyzing an interaction network to understand how the underlying network is being used. We define an interaction network as a sequence of time-stamped interactions E over edges of a static graph G=(V, E). Interaction networks can be used to model many real-world networks. For example, in a social network or a communication network, each interaction over an edge represents an interaction between two users (e.g., emailing, making a call, or re-tweeting), while in a financial network an interaction between two accounts represents a transaction.
We analyze interaction networks under two settings. In the first setting, we study the interaction network under a sliding window model. We assume a node can pass information to other nodes if it is connected to them by edges present in a time window. In this model, we study how the importance, or centrality, of a node evolves over time. In the second setting, we put additional constraints on how information flows between nodes. We assume a node can pass information to other nodes only if there is a temporal path between them. To restrict the length of the temporal paths, we consider a time window in this approach as well. We apply this model to solve the time-constrained influence maximization problem. By analyzing the interaction network data under our model, we find the top-k most influential nodes. We test our model both on human-human interaction, using social network data, and on location-location interaction, using location-based social network (LBSN) data. In the same setting, we also mine temporal cyclic paths to understand the communication patterns in a network. Temporal cycles have many applications and appear naturally in communication networks, where one person posts a message and after a while reacts to a thread of reactions from peers on the post. In financial networks, on the other hand, the presence of a temporal cycle could be indicative of certain types of fraud. We provide efficient algorithms for all our analyses and test their efficiency and effectiveness on real-world data.
Finally, given that many of the algorithms we study have huge computational demands, we also studied distributed graph processing algorithms. An important aspect of these algorithms is correctly partitioning the graph data between different machines. A lot of research has been done on efficient graph partitioning strategies, but no single partitioning strategy is good for all kinds of graphs and algorithms. Choosing the best partitioning strategy is nontrivial and is mostly a trial-and-error exercise. To address this problem, we provide a cost-model-based approach that gives a better understanding of how a given partitioning strategy performs for a given graph and algorithm. / Doctorate in Engineering Sciences and Technology
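The second setting in this abstract (information flows only along temporal paths inside a time window) can be sketched with a single pass over time-sorted interactions. This is a minimal illustration under stated assumptions: interactions are directed `(sender, receiver, timestamp)` triples, timestamps along a path must strictly increase, and the function name `reachable_in_window` is hypothetical. It does not reproduce the thesis's influence maximization or cycle mining algorithms.

```python
def reachable_in_window(interactions, source, start, window):
    """Find nodes reachable from `source` via time-respecting paths.

    interactions: iterable of (sender, receiver, timestamp) triples.
    Only interactions with start <= timestamp <= start + window are used,
    and each hop must happen strictly after the information arrived.
    Returns a dict mapping each reached node to its earliest arrival time.
    """
    end = start + window
    # The source holds the information before any interaction in the window.
    arrival = {source: float("-inf")}
    for sender, receiver, timestamp in sorted(interactions, key=lambda x: x[2]):
        if timestamp < start or timestamp > end:
            continue
        if sender in arrival and arrival[sender] < timestamp:
            # Interactions are processed in time order, so the first arrival
            # recorded for a node is also its earliest one.
            if receiver not in arrival:
                arrival[receiver] = timestamp
    return {node: t for node, t in arrival.items() if node != source}
```

Counting the reached nodes for each candidate source gives a crude influence score. A static reachability check would overestimate it, because it ignores the order in which interactions occur.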
|
10 |
A Graph-based Approach for Semantic Data Mining
Liu, Haishan (January 2012)
Data mining is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. It is widely acknowledged that the role of domain knowledge in the discovery process is essential. However, the synergy between domain knowledge and data mining is still at a rudimentary level. This motivates me to develop a framework for explicit incorporation of domain knowledge in a data mining system so that insights can be drawn from both data and domain knowledge. I call such technology "semantic data mining."
Recent research in knowledge representation has led to mature standards such as the Web Ontology Language (OWL) by the W3C's Semantic Web initiative. Semantic Web ontologies have become a key technology for knowledge representation and processing. The OWL ontology language is built on the W3C's Resource Description Framework (RDF) that provides a simple model to describe information resources as a graph. On the other hand, there has been a surge of interest in tackling data mining problems where objects of interest can be best described as a graph of interrelated nodes. I notice that the interface between domain knowledge and data mining can be achieved by using graph representations. Therefore I explore a graph-based approach for modeling both knowledge and data and for analyzing the combined information source from which insight can be drawn systematically.
In summary, I make three main contributions in this dissertation to achieve semantic data mining. First, I develop an information integration solution based on metaheuristic optimization for cases where data mining tasks require access to heterogeneous data sources. Second, I describe how a graph interface for both domain knowledge and data can be structured by employing the RDF model and its graph representations. Finally, I describe several graph-theoretic analysis approaches for mining the combined information source. I showcase the utility of the proposed methods on finding semantically associated itemsets, a particular case of frequent pattern mining. I believe these contributions in semantic data mining can provide a novel and useful way to incorporate domain knowledge.
This dissertation includes published and unpublished coauthored material.
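The idea that RDF describes resources as a graph, so that data and domain knowledge can be analyzed together, can be sketched with plain subject-predicate-object triples. The example below is an assumption-laden illustration, not the dissertation's method: triples are treated as undirected edges, a shortest connecting path stands in for a "semantic association", and the names `triples_to_graph` and `shortest_association` as well as the `ex:` identifiers are hypothetical.

```python
from collections import deque


def triples_to_graph(triples):
    """Build an adjacency mapping from (subject, predicate, object) triples."""
    graph = {}
    for subject, predicate, obj in triples:
        graph.setdefault(subject, []).append((predicate, obj))
        # Also store the reverse direction so paths can be followed both ways.
        graph.setdefault(obj, []).append((predicate, subject))
    return graph


def shortest_association(triples, start, goal):
    """Return the shortest chain of resources linking start to goal, or None."""
    graph = triples_to_graph(triples)
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for _predicate, neighbour in graph[path[-1]]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None
```

Because data items and ontology concepts live in the same triple set, a path may pass through both, which is the kind of combined view the abstract describes.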
|