Spelling suggestions: "subject:"graph data"" "subject:"raph data""
11 |
Modeling and Querying Graph DataYang, Hong 12 March 2009 (has links)
Databases are used in many applications, spanning virtually the entire range of data processing services industry. The data in many database applications can be most naturally represented in the form of a graph structure consisting of various types of nodes and edges with several properties. These graph data can be classified into four categories: social networks describing the relationships between individual person and/or groups of people (e.g. genealogy, network of coauthorship among academics, etc); information networks in which the structure of the network reflects the structure of the information stored in the nodes (e.g. citation network among academic papers, etc); geographic networks, providing geographic information about public transport systems, airline routes, etc; and biological networks (e.g. biochemical networks, neuron network, etc). In order to analyze such networks and obtain desired information that users are interested in, some typical queries must be conducted. It can be seen that many of the query patterns are across multiple categories described above, such as finding nodes with certain properties in a path or graph, finding the distance between nodes, finding sub-graphs, paths enumeration, etc. However, the classical query languages like SQL, OQL are inept dealing with these types of queries needed to be performed in the above applications. Therefore, a data model that can effectively represent the graph objects and their properties, and a query language which empowers users to answer queries across multiple categories are needed. In this research work, a graph data model and a query language are proposed to resolve the issues existing in the current database applications. The proposed graph data model is an object-oriented graph data model which aims to represent the graph objects and their properties for various applications. The graph query language empowers users to query graph objects and their properties in a graph with specified conditions. The capability to specify the relationships among the entities composing the queried sub-graph makes the language more flexible than others.
|
12 |
On Pattern Mining in Graph Data to Support Decision-MakingPetermann, André 14 February 2019 (has links)
In recent years graph data models became increasingly important in both research and industry. Their core is a generic data structure of things (vertices) and connections among those things (edges). Rich graph models such as the property graph model promise an extraordinary analytical power because relationships can be evaluated without knowledge about a domain-specific database schema. This dissertation studies the usage of graph models for data integration and data mining of business data. Although a typical company's business data implicitly describes a graph it is usually stored in multiple relational databases. Therefore, we propose the first semi-automated approach to transform data from multiple relational databases into a single graph whose vertices represent domain objects and whose edges represent their mutual relationships. This transformation is the base of our conceptual framework BIIIG (Business Intelligence with Integrated Instance Graphs). We further proposed a graph-based approach to data integration. The process is executed after the transformation. In established data mining approaches interrelated input data is mostly represented by tuples of measure values and dimension values. In the context of graphs these values must be attached to the graph structure and aggregated measure values are graph attributes. Since the latter was not supported by any existing model, we proposed the use of collections of property graphs. They act as data structure of the novel Extended Property Graph Model (EPGM). The model supports vertices and edges that may appear in different graphs as well as graph properties. Further on, we proposed some operators that benefit from this data structure, for example, graph-based aggregation of measure values. A primitive operation of graph pattern mining is frequent subgraph mining (FSM). However, existing algorithms provided no support for directed multigraphs. We extended the popular gSpan algorithm to overcome this limitation. Some patterns might not be frequent while their generalizations are. Generalized graph patterns can be mined by attaching vertices to taxonomies. We proposed a novel approach to Generalized Multidimensional Frequent Subgraph Mining (GM-FSM), in particular the first solution to generalized FSM that supports not only directed multigraphs but also multiple dimensional taxonomies. In scenarios that compare patterns of different categories, e.g., fraud or not, FSM is not sufficient since pattern frequencies may differ by category. Further on, determining all pattern frequencies without frequency pruning is not an option due to the computational complexity of FSM. Thus, we developed an FSM extension to extract patterns that are characteristic for a specific category according to a user-defined interestingness function called Characteristic Subgraph Mining (CSM). Parts of this work were done in the context of GRADOOP, a framework for distributed graph analytics. To make the primitive operation of frequent subgraph mining available to this framework, we developed Distributed In-Memory gSpan (DIMSpan), a frequent subgraph miner that is tailored to the characteristics of shared-nothing clusters and distributed dataflow systems. Finally, the results of use case evaluations in cooperation with a large scale enterprise will be presented. This includes a report of practical experiences gained in implementation and application of the proposed algorithms.
|
13 |
Systematic Digitized Treatment of Engineering Line-DiagramsSui, T.Z., Qi, Hong Sheng, Qi, Q., Wang, L., Sun, J.W. 05 1900 (has links)
Yes / In engineering design, there are many functional relationships which are difficult to express into a simple and exact mathematical formula. Instead they are documented within a form of line graphs (or plot charts or curve diagrams) in engineering handbooks or text books. Because the information in such a form cannot be used directly in the modern computer aided design (CAD) process, it is necessary to find a way to numerically represent the information. In this paper, a data processing system for numerical representation of line graphs in mechanical design is developed, which incorporates the process cycle from the initial data acquisition to the final output of required information. As well as containing the capability for curve fitting through Cubic spline and Neural network techniques, the system also adapts a novel methodology for use in this application: Grey Models. Grey theory have been used in various applications, normally involved with time-series data, and have the characteristic of being able to handle sparse data sets and data forecasting. Two case studies were then utilized to investigate the feasibility of Grey models for curve fitting. Furthermore, comparisons with the other two established techniques show that the accuracy was better than the Cubic spline function method, but slightly less accurate than the Neural network method. These results are highly encouraging and future work to fully investigate the capability of Grey theory, as well as exploiting its sparse data handling capabilities is recommended.
|
14 |
Graph Processing in Main-Memory Column StoresParadies, Marcus 29 May 2017 (has links) (PDF)
Evermore, novel and traditional business applications leverage the advantages of a graph data model, such as the offered schema flexibility and an explicit representation of relationships between entities. As a consequence, companies are confronted with the challenge of storing, manipulating, and querying terabytes of graph data for enterprise-critical applications. Although these business applications operate on graph-structured data, they still require direct access to the relational data and typically rely on an RDBMS to keep a single source of truth and access.
Existing solutions performing graph operations on business-critical data either use a combination of SQL and application logic or employ a graph data management system. For the first approach, relying solely on SQL results in poor execution performance caused by the functional mismatch between typical graph operations and the relational algebra. To the worse, graph algorithms expose a tremendous variety in structure and functionality caused by their often domain-specific implementations and therefore can be hardly integrated into a database management system other than with custom coding. Since the majority of these enterprise-critical applications exclusively run on relational DBMSs, employing a specialized system for storing and processing graph data is typically not sensible. Besides the maintenance overhead for keeping the systems in sync, combining graph and relational operations is hard to realize as it requires data transfer across system boundaries.
A basic ingredient of graph queries and algorithms are traversal operations and are a fundamental component of any database management system that aims at storing, manipulating, and querying graph data. Well-established graph traversal algorithms are standalone implementations relying on optimized data structures. The integration of graph traversals as an operator into a database management system requires a tight integration into the existing database environment and a development of new components, such as a graph topology-aware optimizer and accompanying graph statistics, graph-specific secondary index structures to speedup traversals, and an accompanying graph query language.
In this thesis, we introduce and describe GRAPHITE, a hybrid graph-relational data management system. GRAPHITE is a performance-oriented graph data management system as part of an RDBMS allowing to seamlessly combine processing of graph data with relational data in the same system. We propose a columnar storage representation for graph data to leverage the already existing and mature data management and query processing infrastructure of relational database management systems. At the core of GRAPHITE we propose an execution engine solely based on set operations and graph traversals.
Our design is driven by the observation that different graph topologies expose different algorithmic requirements to the design of a graph traversal operator. We derive two graph traversal implementations targeting the most common graph topologies and demonstrate how graph-specific statistics can be leveraged to select the optimal physical traversal operator. To accelerate graph traversals, we devise a set of graph-specific, updateable secondary index structures to improve the performance of vertex neighborhood expansion. Finally, we introduce a domain-specific language with an intuitive programming model to extend graph traversals with custom application logic at runtime. We use the LLVM compiler framework to generate efficient code that tightly integrates the user-specified application logic with our highly optimized built-in graph traversal operators.
Our experimental evaluation shows that GRAPHITE can outperform native graph management systems by several orders of magnitude while providing all the features of an RDBMS, such as transaction support, backup and recovery, security and user management, effectively providing a promising alternative to specialized graph management systems that lack many of these features and require expensive data replication and maintenance processes.
|
15 |
Generalized Statistical Tolerance Analysis and Three Dimensional Model for Manufacturing Tolerance Transfer in Manufacturing Process PlanningJanuary 2011 (has links)
abstract: Mostly, manufacturing tolerance charts are used these days for manufacturing tolerance transfer but these have the limitation of being one dimensional only. Some research has been undertaken for the three dimensional geometric tolerances but it is too theoretical and yet to be ready for operator level usage. In this research, a new three dimensional model for tolerance transfer in manufacturing process planning is presented that is user friendly in the sense that it is built upon the Coordinate Measuring Machine (CMM) readings that are readily available in any decent manufacturing facility. This model can take care of datum reference change between non orthogonal datums (squeezed datums), non-linearly oriented datums (twisted datums) etc. Graph theoretic approach based upon ACIS, C++ and MFC is laid out to facilitate its implementation for automation of the model. A totally new approach to determining dimensions and tolerances for the manufacturing process plan is also presented. Secondly, a new statistical model for the statistical tolerance analysis based upon joint probability distribution of the trivariate normal distributed variables is presented. 4-D probability Maps have been developed in which the probability value of a point in space is represented by the size of the marker and the associated color. Points inside the part map represent the pass percentage for parts manufactured. The effect of refinement with form and orientation tolerance is highlighted by calculating the change in pass percentage with the pass percentage for size tolerance only. Delaunay triangulation and ray tracing algorithms have been used to automate the process of identifying the points inside and outside the part map. Proof of concept software has been implemented to demonstrate this model and to determine pass percentages for various cases. The model is further extended to assemblies by employing convolution algorithms on two trivariate statistical distributions to arrive at the statistical distribution of the assembly. Map generated by using Minkowski Sum techniques on the individual part maps is superimposed on the probability point cloud resulting from convolution. Delaunay triangulation and ray tracing algorithms are employed to determine the assembleability percentages for the assembly. / Dissertation/Thesis / Ph.D. Mechanical Engineering 2011
|
16 |
Towards effective analysis of big graphs : from scalability to qualityTian, Chao January 2017 (has links)
This thesis investigates the central issues underlying graph analysis, namely, scalability and quality. We first study the incremental problems for graph queries, which aim to compute the changes to the old query answer, in response to the updates to the input graph. The incremental problem is called bounded if its cost is decided by the sizes of the query and the changes only. No matter how desirable, however, our first results are negative: for common graph queries such as graph traversal, connectivity, keyword search and pattern matching, their incremental problems are unbounded. In light of the negative results, we propose two new characterizations for the effectiveness of incremental computation, and show that the incremental computations above can still be effectively conducted, by either reducing the computations on big graphs to small data, or incrementalizing batch algorithms by minimizing unnecessary recomputation. We next study the problems with regards to improving the quality of the graphs. To uniquely identify entities represented by vertices in a graph, we propose a class of keys that are recursively defined in terms of graph patterns, and are interpreted with subgraph isomorphism. As an application, we study the entity matching problem, which is to find all pairs of entities in a graph that are identified by a given set of keys. Although the problem is proved to be intractable, and cannot be parallelized in logarithmic rounds, we provide two parallel scalable algorithms for it. In addition, to catch numeric inconsistencies in real-life graphs, we extend graph functional dependencies with linear arithmetic expressions and comparison predicates, referred to as NGDs. Indeed, NGDs strike a balance between expressivity and complexity, since if we allow non-linear arithmetic expressions, even of degree at most 2, the satisfiability and implication problems become undecidable. A localizable incremental algorithm is developed to detect errors using NGDs, where the cost is determined by small neighbors of nodes in the updates instead of the entire graph. Finally, a rule-based method to clean graphs is proposed. We extend graph entity dependencies (GEDs) as data quality rules. Given a graph, a set of GEDs and a block of ground truth, we fix violations of GEDs in the graph by combining data repairing and object identification. The method finds certain fixes to errors detected by GEDs, i.e., as long as the GEDs and the ground truth are correct, the fixes are assured correct as their logical consequences. Several fundamental results underlying the method are established, and an algorithm is developed to implement the method. We also parallelize the method and guarantee to reduce its running time with the increase of processors.
|
17 |
Graph Processing in Main-Memory Column StoresParadies, Marcus 03 February 2017 (has links)
Evermore, novel and traditional business applications leverage the advantages of a graph data model, such as the offered schema flexibility and an explicit representation of relationships between entities. As a consequence, companies are confronted with the challenge of storing, manipulating, and querying terabytes of graph data for enterprise-critical applications. Although these business applications operate on graph-structured data, they still require direct access to the relational data and typically rely on an RDBMS to keep a single source of truth and access.
Existing solutions performing graph operations on business-critical data either use a combination of SQL and application logic or employ a graph data management system. For the first approach, relying solely on SQL results in poor execution performance caused by the functional mismatch between typical graph operations and the relational algebra. To the worse, graph algorithms expose a tremendous variety in structure and functionality caused by their often domain-specific implementations and therefore can be hardly integrated into a database management system other than with custom coding. Since the majority of these enterprise-critical applications exclusively run on relational DBMSs, employing a specialized system for storing and processing graph data is typically not sensible. Besides the maintenance overhead for keeping the systems in sync, combining graph and relational operations is hard to realize as it requires data transfer across system boundaries.
A basic ingredient of graph queries and algorithms are traversal operations and are a fundamental component of any database management system that aims at storing, manipulating, and querying graph data. Well-established graph traversal algorithms are standalone implementations relying on optimized data structures. The integration of graph traversals as an operator into a database management system requires a tight integration into the existing database environment and a development of new components, such as a graph topology-aware optimizer and accompanying graph statistics, graph-specific secondary index structures to speedup traversals, and an accompanying graph query language.
In this thesis, we introduce and describe GRAPHITE, a hybrid graph-relational data management system. GRAPHITE is a performance-oriented graph data management system as part of an RDBMS allowing to seamlessly combine processing of graph data with relational data in the same system. We propose a columnar storage representation for graph data to leverage the already existing and mature data management and query processing infrastructure of relational database management systems. At the core of GRAPHITE we propose an execution engine solely based on set operations and graph traversals.
Our design is driven by the observation that different graph topologies expose different algorithmic requirements to the design of a graph traversal operator. We derive two graph traversal implementations targeting the most common graph topologies and demonstrate how graph-specific statistics can be leveraged to select the optimal physical traversal operator. To accelerate graph traversals, we devise a set of graph-specific, updateable secondary index structures to improve the performance of vertex neighborhood expansion. Finally, we introduce a domain-specific language with an intuitive programming model to extend graph traversals with custom application logic at runtime. We use the LLVM compiler framework to generate efficient code that tightly integrates the user-specified application logic with our highly optimized built-in graph traversal operators.
Our experimental evaluation shows that GRAPHITE can outperform native graph management systems by several orders of magnitude while providing all the features of an RDBMS, such as transaction support, backup and recovery, security and user management, effectively providing a promising alternative to specialized graph management systems that lack many of these features and require expensive data replication and maintenance processes.
|
18 |
Predikce spojení v odvozených sociálních sítích / Link Prediction in Inferred Social NetworksMěkota, Ondřej January 2021 (has links)
Social networks can be helpful for the analysis of behaviour of people. An existing social network is rarely available, and its nodes and edges have to be inferred from not necessarily graph data. Link prediction can be used to either correct inaccuracies or to forecast links about to appear in the future. In this work, we study the prediction of miss- ing links in a social network inferred from real-world bank data. We review and compare both verified and modern approaches to link prediction. Following the advancements of deep learning in recent years, we primarily focus on graph neural networks, and their ability to scale to large networks. We propose an adjustment to an existing graph neural network method and show that its performance is either comparable with or outperform- ing the original method. The comparison is performed on two social networks inferred from the same data. We show that it is relatively hard to outperform the verified link prediction methods with graph neural networks. 1
|
19 |
Learning from Structured Data: Scalability, Stability and Temporal AwarenessPavlovski, Martin, 0000-0003-1495-2128 January 2021 (has links)
A plethora of high-impact applications involve predictive modeling of structured data. In various domains, from hospital readmission prediction in the medical realm, though weather forecasting and event detection in power systems, up to conversion prediction in online businesses, the data holds a certain underlying structure. Building predictive models from such data calls for leveraging the structure as an additional source of information. Thus, a broad range of structure-aware approaches have been introduced, yet certain common challenges in many structured learning scenarios remain unresolved. This dissertation revolves around addressing the challenges of scalability, algorithmic stability and temporal awareness in several scenarios of learning from either graphically or sequentially structured data.
Initially, the first two challenges are discussed from a structured regression standpoint. The studies addressing these challenges aim at designing scalable and algorithmically stable models for structured data, without compromising their prediction performance. It is further inspected whether such models can be applied to both static and dynamic (time-varying) graph data. To that end, a structured ensemble model is proposed to scale with the size of temporal graphs, while making stable and reliable yet accurate predictions on a real-world application involving gene expression prediction. In the case of static graphs, a theoretical insight is provided on the relationship between algorithmic stability and generalization in a structured regression setting. A stability-based objective function is designed to indirectly control the stability of a collaborative ensemble regressor, yielding generalization performance improvements on structured regression applications as diverse as predicting housing prices based on real-estate transactions and readmission prediction from hospital records.
Modeling data that holds a sequential rather than a graphical structure requires addressing temporal awareness as one of the major challenges. In that regard, a model is proposed to generate time-aware representations of user activity sequences, intended to be seamlessly applicable across different user-related tasks, while sidestepping the burden of task-driven feature engineering. The quality and effectiveness of the time-aware user representations led to predictive performance improvements over state-of-the-art models on multiple large-scale conversion prediction tasks.
Sequential data is also analyzed from the perspective of a high-impact application in the realm of power systems. Namely, detecting and further classifying disturbance events, as an important aspect of risk mitigation in power systems, is typically centered on the challenges of capturing structural characteristics in sequential synchrophasor recordings. Therefore, a thorough comparative analysis was conducted by assessing various traditional as well as more sophisticated event classification models under different domain-expert-assisted labeling scenarios. The experimental findings provide evidence that hierarchical convolutional neural networks (HCNNs), capable of automatically learning time-invariant feature transformations that preserve the structural characteristics of the synchrophasor signals, consistently outperform traditional model variants. Their performance is observed to further improve as more data are inspected by a domain expert, while smaller fractions of solely expert-inspected signals are already sufficient for HCNNs to achieve satisfactory event classification accuracy. Finally, insights into the impact of the domain expertise on the downstream classification performance are also discussed. / Computer and Information Science
|
20 |
Anti-Money Laundering with Unreliable LabelsHovstadius, David January 2024 (has links)
This thesis examines the effectiveness of Graph Neural Networks (GNNs) in detecting money laundering activities using transaction data with unreliable labels. It analyses how weakly supervised learning, specifically with GNNs, manages the challenges posed by incomplete and inaccurate labels in anti-money laundering (AML) detection. The thesis utilizes simulated transaction data to compare the performance of GNNs against statistical models. This was done by generating various datasets with the AMLSim tool, and evaluating the node classification performance of different statistical machine learning models and GNNs. The findings indicate that GNNs, due to their ability to find relationships in graph structures, demonstrate superior performance in scenarios with incomplete and inaccurate labels. The findings also indicate that inaccurate positive labels has a great negative effect on the performance, showing the label importance of money launderers in graph data. This research provides possible improvements for anti-money laundering detection by employing GNNs to manage challenges in real-world data.
|
Page generated in 0.0627 seconds