Global ETD Search

21	ANNIS: A graph-based query system for deeply annotated text corpora. Krause, Thomas. 11 January 2019.
This dissertation describes the design and implementation of an efficient system for querying linguistic corpora. The existing system ANNIS is based on a relational database, is designed to support corpora with very different kinds of annotations, and uses graphs as a unified representation of these annotations. For this dissertation, a purely graph-based, main-memory successor of ANNIS was developed. Corpora are partitioned into edge components, and different implementations for representing and searching these components are used for different types of subgraphs. Operations of the query language AQL (ANNIS Query Language) are implemented as combinations of reachability queries on the different components, and each component implementation provides functions optimized for this type of query. This approach exploits the different structures of the various kinds of annotations without losing the common representation as a graph. Additional optimizations, such as parallel execution of parts of a query, were also implemented and evaluated. Because AQL already has an implementation that is openly available to researchers as a web-based service, real-life AQL queries could be recorded and used as the basis for benchmarking the new implementation. More than 4000 queries over 18 corpora (most of which are available under an open-access license) were compiled into a realistic workload that covers very different types of corpora and queries with a wide range of complexity. The new graph-based implementation was compared against the existing one, which uses a relational database. It executes the workload about 10 times faster than this baseline, and the experiments show that the different edge-component implementations account for a large share of this improvement.
Keywords: In-memory database; Graph database; Corpus linguistics; Search engine. Classification: 004 (Data processing, Computer science); ST 306; ddc:004
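The idea of partitioning a corpus graph into edge components, each stored in a way that suits its shape and each answering reachability queries, can be illustrated with a minimal sketch. It is not the ANNIS implementation: the class names `AdjacencyListComponent` and `PrePostOrderComponent`, the use of pre/post-order numbering for tree-shaped components, and the example annotations are all assumptions chosen for illustration.

```python
from collections import deque


class AdjacencyListComponent:
    """Generic edge component: adjacency lists, reachability answered by breadth-first search."""

    def __init__(self, edges):
        self.out = {}
        for src, dst in edges:
            self.out.setdefault(src, []).append(dst)

    def is_reachable(self, source, target):
        # True if a path of length >= 1 leads from source to target.
        queue = deque([source])
        seen = {source}
        while queue:
            node = queue.popleft()
            for nxt in self.out.get(node, []):
                if nxt == target:
                    return True
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return False


class PrePostOrderComponent:
    """Tree-shaped edge component: u reaches v iff u's pre/post interval encloses v's."""

    def __init__(self, edges, root):
        children = {}
        for parent, child in edges:
            children.setdefault(parent, []).append(child)
        self.order = {}
        pre = {}
        counter = 0
        stack = [(root, False)]
        # Iterative depth-first traversal: number each node on entry and on exit.
        while stack:
            node, leaving = stack.pop()
            if leaving:
                self.order[node] = (pre[node], counter)
            else:
                pre[node] = counter
                stack.append((node, True))
                for child in reversed(children.get(node, [])):
                    stack.append((child, False))
            counter += 1

    def is_reachable(self, source, target):
        # Constant-time ancestor test instead of a graph traversal.
        if source not in self.order or target not in self.order:
            return False
        s_pre, s_post = self.order[source]
        t_pre, t_post = self.order[target]
        return s_pre < t_pre and t_post < s_post


# Hypothetical corpus fragment: a syntax tree and a pointing (coreference) relation.
components = {
    "dominance": PrePostOrderComponent(
        [("S", "NP"), ("S", "VP"), ("VP", "V"), ("VP", "NP2")], root="S"),
    "pointing": AdjacencyListComponent([("he", "John"), ("him", "he")]),
}
print(components["dominance"].is_reachable("S", "V"))      # True
print(components["dominance"].is_reachable("NP", "V"))    # False
print(components["pointing"].is_reachable("him", "John"))  # True
```

Both classes expose the same `is_reachable` method, so a query operator can stay agnostic of how a component is stored while each storage strategy answers the query in its own way.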
22	Querying and Mining Multigraphs / Requêtes et fouille de multigraphes. Ingalalli, Vijay. 27 February 2017.
With the ever-increasing growth of data and information, extracting the right knowledge has become a real challenge. Furthermore, advanced applications demand the analysis of complex, interrelated data that cannot be adequately described using a propositional representation. The graph representation is of great interest to the knowledge extraction community, since graphs are versatile data structures and one of the most general forms of data representation. Among the several classes of graphs, multigraphs have attracted attention in recent times because they represent entities succinctly while allowing rich and complex relations (many types of relationships) between them. This thesis focuses on two themes of knowledge extraction: knowledge retrieval, where we study subgraph query matching in multigraphs, and knowledge discovery, where we study frequent pattern mining in multigraphs. It makes three main contributions in the fields of query matching and data mining. The first, generic contribution addresses subgraph querying in multigraphs with isomorphic matches; this problem has potential applications in remote sensing (for example, satellite image analysis), social networks, bioinformatics, and chemical informatics. The second contribution, focused on knowledge graphs, addresses subgraph querying in RDF multigraphs with homomorphic matches. In both contributions, we introduce efficient indexing structures that capture multi-edge information. The query matching processes have been carefully optimized for running time, and the heuristics employed ensure robust performance. The third contribution is in data mining: an efficient frequent pattern mining algorithm for multigraphs. Given the size of the search space, frequent pattern mining in graphs is a hard data mining problem, and the search space is even larger for multigraphs; we therefore introduce novel optimization techniques and heuristic search methods to traverse it quickly, notably by pruning candidates and detecting infrequent patterns early. For each proposed approach, we perform an extensive experimental analysis against existing state-of-the-art approaches to validate the performance and correctness of our approaches. Finally, we present a case study on a remote sensing dataset of satellite images modelled as a multigraph, in which the mining and query matching processes are used to discover useful knowledge.
Keywords: Data mining; Graph mining; Graph theory; Graph database.
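What an isomorphic (injective, non-induced) subgraph match means in a multigraph can be shown with a naive backtracking matcher: each query edge carries a set of edge labels, and it matches only if the corresponding data vertices are connected by at least those labels. This is a minimal sketch for illustration, with exponential worst-case cost and no index; it is not the indexing-based algorithm of the thesis. The function name `match_multigraph`, the representation of multi-edges as a dict from vertex pairs to label sets, and the region data are hypothetical.

```python
def match_multigraph(query_nodes, query_edges, data_nodes, data_edges):
    """Enumerate injective mappings (query vertex -> data vertex) under which every
    query multi-edge, given as a set of edge labels, is contained in the data multi-edge."""
    results = []

    def consistent(mapping, qnode, dnode):
        # Check every query edge whose endpoints are both mapped once qnode -> dnode is added.
        for (qa, qb), labels in query_edges.items():
            if qnode not in (qa, qb):
                continue
            a = dnode if qa == qnode else mapping.get(qa)
            b = dnode if qb == qnode else mapping.get(qb)
            if a is None or b is None:
                continue
            if not labels <= data_edges.get((a, b), set()):
                return False
        return True

    def backtrack(mapping, used):
        if len(mapping) == len(query_nodes):
            results.append(dict(mapping))
            return
        qnode = query_nodes[len(mapping)]
        for dnode in data_nodes:
            if dnode not in used and consistent(mapping, qnode, dnode):
                mapping[qnode] = dnode
                used.add(dnode)
                backtrack(mapping, used)
                del mapping[qnode]
                used.remove(dnode)

    backtrack({}, set())
    return results


# Hypothetical image regions connected by several kinds of spatial relations.
data_nodes = ["r1", "r2", "r3"]
data_edges = {
    ("r1", "r2"): {"adjacent", "north_of"},
    ("r2", "r3"): {"adjacent"},
    ("r1", "r3"): {"north_of"},
}
print(match_multigraph(["x", "y"], {("x", "y"): {"adjacent", "north_of"}},
                       data_nodes, data_edges))
# [{'x': 'r1', 'y': 'r2'}]
```

The subset test `labels <= ...` is what distinguishes multigraph matching from simple-graph matching: a single query edge can demand several relation types at once.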
23	Complex graph algorithms using relational database. Ahmed, Aly. 24 August 2021.
Big Data processing plays a vital role for decision-makers in organizations and government, enhances the user experience, and provides quality results in predictive analysis. However, many modern data processing solutions, such as Hadoop and Spark, require significant investment in hardware and maintenance, and they often neglect the well-established and widely used relational database management systems (RDBMSs). In this dissertation, we study three fundamental graph problems in an RDBMS. The first problem is computing shortest paths (SP) from a source to a target in large network graphs. We explore SQL-based solutions and leverage the intelligent scheduling that an RDBMS performs when executing set-at-a-time expansions of graph vertices, in contrast to the vertex-at-a-time expansions of classical SP algorithms. Our algorithms run orders of magnitude faster than the baselines and outperform their counterparts in native graph databases. Second, we study the PageRank problem, which is vital in Google Search and social network analysis for ranking search results and identifying important nodes in a graph. PageRank is an iterative algorithm, which makes it challenging to implement over large graphs. We compute PageRank in an RDBMS for very large graphs on a consumer-grade machine and compare the results with a dedicated graph database. We show that our RDBMS solution can process graphs with more than a billion edges in a few minutes, whereas native graph databases fail to handle much smaller graphs. Last, we present a carefully engineered RDBMS solution to triangle enumeration for very large graphs. We show that RDBMSs are suitable tools for enumerating billions of triangles in billion-scale networks on a consumer-grade machine. We also compare our RDBMS solution's performance with a native graph database and show that our solution outperforms it by orders of magnitude.
Graduate. Keywords: Shortest path; PageRank; RDBMS; Matrix partitioning; Big Data; Triangle enumeration; Graph database; PTE; Compact Forward; Table partitioning; Billion-scale graph.
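The contrast between set-at-a-time and vertex-at-a-time expansion can be sketched with Python's built-in `sqlite3` module: each SQL statement expands the whole current frontier at once, and the database engine decides how to execute the join. Triangle enumeration can likewise be written as a single self-join over edges oriented from the smaller to the larger vertex id. This is a simplified sketch for small, unweighted graphs; the table names `edges` and `visited` and both function names are hypothetical, and the dissertation's algorithms target billion-edge graphs with further engineering (its keywords mention, for example, table partitioning).

```python
import sqlite3


def sql_bfs_distance(conn, source, target):
    """Hop distance from source to target; each step expands the whole frontier in one statement."""
    cur = conn.cursor()
    cur.execute("DROP TABLE IF EXISTS visited")
    cur.execute("CREATE TEMP TABLE visited (node INTEGER PRIMARY KEY, dist INTEGER)")
    cur.execute("INSERT INTO visited VALUES (?, 0)", (source,))
    level = 0
    while True:
        found = cur.execute("SELECT dist FROM visited WHERE node = ?", (target,)).fetchone()
        if found:
            return found[0]
        # Set-at-a-time step: all unvisited neighbours of every frontier vertex at once.
        # OR IGNORE skips vertices already in `visited`, so each keeps its first (shortest) distance.
        cur.execute(
            """INSERT OR IGNORE INTO visited (node, dist)
               SELECT DISTINCT e.dst, ? FROM edges e
               JOIN visited v ON e.src = v.node
               WHERE v.dist = ?""",
            (level + 1, level),
        )
        if cur.rowcount == 0:
            return None  # no new vertices: target is unreachable
        level += 1


def sql_triangle_count(conn):
    """Count each undirected triangle u < v < w exactly once with a single query."""
    return conn.execute(
        """WITH o AS (SELECT DISTINCT MIN(src, dst) AS a, MAX(src, dst) AS b
                      FROM edges WHERE src <> dst)
           SELECT COUNT(*) FROM o e1
           JOIN o e2 ON e1.b = e2.a
           JOIN o e3 ON e1.a = e3.a AND e2.b = e3.b"""
    ).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src INTEGER, dst INTEGER)")
conn.execute("CREATE INDEX edges_src ON edges (src)")
undirected = [(1, 2), (2, 3), (3, 1), (3, 4)]
conn.executemany("INSERT INTO edges VALUES (?, ?)",
                 undirected + [(b, a) for a, b in undirected])
print(sql_bfs_distance(conn, 1, 4))  # 2
print(sql_triangle_count(conn))      # 1
```

The loop in Python runs once per BFS level, not once per vertex; all per-vertex work happens inside the database.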
24	Jazyk pro dotazování Java AST / Java AST Query Language. Bílek, Jiří. January 2015.
The purpose of this thesis is to design a query language for Java abstract syntax trees (ASTs) and to implement a tool that uses this query language. The work surveys graph databases and their libraries, focusing on Neo4j and Titan, as well as tools for Java bytecode analysis; the Procyon and BCEL libraries are described in detail. The work includes a proposal of the query language and a detailed description of the tool's implementation, including how Java entities are stored in the graph databases. Finally, it presents experiments and an evaluation of the time complexity of the library.
25	A Survey Of Persistent Graph Databases. Liu, Yufan. 23 April 2014.
No description available.
Keywords: Computer Engineering; Computer Science; graph database; graph mining; transactional database; benchmark; graph algorithm; GDB; distributed graph processing framework; NoSQL.
26	A graph database management system for a logistics-related service. Walldén, Marcus; Özkan, Aylin. January 2016.
Higher demands on database systems have led to an increased popularity of certain types of database systems in some niche areas. One such niche area is graph networks, such as social networks or logistics networks. Analyses of such networks often focus on complex relational patterns that sometimes cannot be handled efficiently by traditional relational databases, which has led to the adoption of some specialized non-relational database systems. Among the systems that have seen a surge in popularity in this area are graph database systems. This thesis presents a prototype of a logistics-network-related service using Neo4j, which is currently the most popular graph database management system. The logistics network covered by the service is based on existing data from PostNord, Sweden's biggest provider of logistics solutions, and the service primarily focuses on customer support and business-to-business analysis. By creating a prototype of the service, this thesis aims to identify some of the positive and negative aspects of a graph database system and to indicate how a service using a graph database system could be built. The results indicate that Neo4j is very intuitive and easy to use, which would make it optimal for prototyping and smaller systems; however, because of the evaluation method used, more research in this area is needed to confirm these conclusions.
Keywords: Graph database; Relational database; Prototype; Logistics; Graph analysis; NoSQL; Communication Systems.
27	Leveraging Flexible Data Management with Graph Databases. Vasilyeva, Elena; Thiele, Maik; Bornhövd, Christof; Lehner, Wolfgang. 01 September 2022.
Integrating up-to-date information from different heterogeneous data sources into databases is still a time-consuming and mostly manual job that can only be accomplished by skilled experts. For this reason, enterprises often lack information about the current market situation, which prevents the holistic view needed for sound data analysis and market predictions. Ironically, the Web contains a huge and growing amount of valuable information from diverse organizations and data providers, such as the Linked Open Data cloud, common knowledge sources like Freebase, and social networks. One desirable usage scenario for this kind of data is to integrate it into a single database in order to apply data analytics. However, today's business intelligence tools clearly lack support for so-called situational or ad-hoc data integration. What we need is a system that 1) provides flexible, ad-hoc storage of heterogeneous information with different degrees of structure, and 2) supports mass data operations suited for data analytics. In this paper, we present our vision of such a system and describe an extension of the well-studied property graph model that makes it possible to "integrate and analyze as you go" external data exposed in the RDF format in a seamless manner. The proposed integration approach extends the internal graph model with external data from the Linked Open Data cloud, which stores over 31 billion RDF triples (September 2011) from a variety of domains.
Classification: info:eu-repo/classification/ddc/004; ddc:004
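One common way to bring RDF data into a property graph, in the spirit of the "integrate and analyze as you go" extension described above, is to turn triples with literal objects into vertex properties and triples with IRI objects into labelled edges. The sketch below assumes exactly that mapping and a simplified triple format (IRIs as strings in angle brackets, literals as plain Python values); it is not the authors' model extension, and the `PropertyGraph` class, the `is_iri` helper, and the `ex:` data are hypothetical.

```python
def is_iri(term):
    # Assumed convention for this sketch: IRIs are strings written as "<...>".
    return isinstance(term, str) and term.startswith("<") and term.endswith(">")


class PropertyGraph:
    """Minimal property graph: vertices with key/value properties and labelled directed edges."""

    def __init__(self):
        self.props = {}   # vertex id -> {property name: [values]}
        self.edges = []   # (source vertex, edge label, target vertex)

    def add_rdf_triples(self, triples):
        for subj, pred, obj in triples:
            self.props.setdefault(subj, {})
            if is_iri(obj):
                # Resource object: becomes (or reuses) a vertex connected by an edge.
                self.props.setdefault(obj, {})
                self.edges.append((subj, pred, obj))
            else:
                # Literal object: stored as a (possibly multi-valued) vertex property.
                self.props[subj].setdefault(pred, []).append(obj)

    def neighbours(self, vertex, label):
        return [dst for src, lbl, dst in self.edges if src == vertex and lbl == label]


g = PropertyGraph()
g.add_rdf_triples([
    ("<ex:acme>", "<ex:name>", "ACME Corp"),
    ("<ex:acme>", "<ex:locatedIn>", "<ex:berlin>"),
    ("<ex:berlin>", "<ex:label>", "Berlin"),
])
print(g.neighbours("<ex:acme>", "<ex:locatedIn>"))  # ['<ex:berlin>']
print(g.props["<ex:acme>"])                          # {'<ex:name>': ['ACME Corp']}
```

Because new triples only add vertices, properties, or edges, data can be merged incrementally without a predefined schema, which is the flexibility the abstract argues for.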
28	Graphs enriched by Cubes (GreC) : a new approach for OLAP on information networks / Graphes enrichis par des Cubes (GreC) : une nouvelle approche pour l’OLAP sur des réseaux d’information. Jakawat, Wararat. 27 September 2016.
Online Analytical Processing (OLAP) is one of the most important technologies in data warehouse systems; it enables multidimensional analysis of data. It is a very powerful analysis tool that remains flexible to use for navigating data at more or less detailed levels. OLAP has been improved and extended repeatedly as new problems arise with new domains and data, for instance multimedia, spatial data, and sequence data. OLAP was originally introduced to analyze classical structured data. However, information networks are another interesting domain to explore. Extracting knowledge from large networks is a complex, non-trivial task, so OLAP analysis can be a good way to look at the data in a more compressed form and from particular points of view. Many kinds of information networks can help users with various activities in different domains. Here, we focus on bibliographic information networks built from bibliographic databases. These data allow analyzing not only scientific production but also the collaborations between authors. Several works propose using OLAP technologies for information networks, an approach called Graph OLAP, and many Graph OLAP techniques are based on a cube of graphs. In this thesis, we propose a new approach for Graph OLAP: graphs enriched by cubes (GreC). In a different and complementary way, our proposal consists in enriching graphs with cubes rather than building cubes of graphs: the nodes and/or edges of the considered network are described by data cubes. This allows interesting analyses for users, who can navigate within a graph enriched by cubes at different granularity levels, with dedicated operators. GreC has four main aspects. First, GreC takes the structure of the network into account in order to support topological OLAP operations, not only classical or informational OLAP operations. Second, GreC provides a global view of the network with multidimensional information. Third, the slowly changing dimension problem is handled when exploring the network. Lastly, GreC supports analyzing the evolution of a network, because our approach allows observing its dynamics through the time dimension that can be present in the cubes describing nodes and/or edges. To evaluate GreC, we implemented our approach and performed an experimental study on a real bibliographic dataset to show the interest of our proposal. The GreC approach includes different algorithms, whose relevance and performance we also validated experimentally.
Keywords: Online Analytical Processing (OLAP); Information networks; Bibliographic data; Data cube; Graph database.
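To make the "graph enriched by cubes" idea concrete, the sketch below attaches a small data cube to each author node of a co-authorship graph, with hypothetical dimensions (year, venue) and a publication-count measure. It shows two classical OLAP operations on a node's cube (roll-up and slice) and one simple topological operation that merges the cubes of a node's neighbourhood. The data, dimension layout, and function names are invented for illustration; the thesis defines its own operators and algorithms.

```python
from collections import Counter

# Each author node carries a cube: (year, venue) -> number of publications.
node_cubes = {
    "alice": Counter({(2014, "VLDB"): 2, (2015, "VLDB"): 1, (2015, "EDBT"): 3}),
    "bob": Counter({(2014, "EDBT"): 1, (2015, "VLDB"): 2}),
    "carol": Counter({(2015, "EDBT"): 4}),
}
# Undirected co-authorship edges.
edges = {("alice", "bob"), ("bob", "carol")}


def roll_up(cube, keep):
    """Aggregate the measure onto the dimensions whose positions are listed in `keep`."""
    result = Counter()
    for cell, measure in cube.items():
        result[tuple(cell[i] for i in keep)] += measure
    return result


def slice_cube(cube, dim, value):
    """Keep only the cells whose dimension at position `dim` equals `value`."""
    return Counter({cell: m for cell, m in cube.items() if cell[dim] == value})


def neighbourhood_cube(node):
    """Topological operation: sum the cubes of a node and its direct neighbours."""
    members = {node}
    members |= {b for a, b in edges if a == node}
    members |= {a for a, b in edges if b == node}
    total = Counter()
    for m in members:
        total.update(node_cubes[m])  # Counter.update adds counts cell by cell
    return total


print(sorted(roll_up(node_cubes["alice"], keep=(0,)).items()))
# [((2014,), 2), ((2015,), 4)]
print(sorted(roll_up(neighbourhood_cube("bob"), keep=(0,)).items()))
# [((2014,), 3), ((2015,), 10)]
```

Keeping the cubes on the nodes, rather than building one graph per cube cell, lets structural operations (here, the neighbourhood) and multidimensional operations (roll-up, slice) be combined freely.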
29	Grafdatabas: Från data till förståelse / Graph Database: From Data to Wisdom. Thiel, Mattias; Brandt, Pontus. January 2015.
This thesis, written in Swedish, was carried out for Imano AB and deals with databases. Huge amounts of data are stored in databases worldwide, but only a fraction of all the data is used. Data can exist in many different forms, and various types of databases have emerged as complements to traditional relational databases. In social networking, logistics systems, e-commerce, and many other contexts, relationships between data items are often as interesting as the data content itself. When this is the case, graph databases provide solutions to problems that other databases cannot handle. In a graph database, relationships between individual data records are stored as objects in their own right, which makes it easier to ask how data relate to other data. To exploit the features of a graph database effectively, an accessible and useful tool is needed. The purpose of the project is to create a tool that combines the graph database Neo4j's ability to manage relationships between individual data items with visual presentation of data in a web application. The study examines whether this tool allows users to gain new insights from existing data more easily. The study is essentially a software development effort that follows the principles of Design Science Research. The method consists of a multi-stage development process in which the empirical data is the knowledge obtained during the work. The development process also includes qualitative research methods to collect data during the demonstration and evaluation of the artifact. The report compares graph databases with relational databases, but it only points out differences in certain properties and does not make a complete comparison of, for example, performance. According to the developers, the study shows that Neo4j has properties that make it suitable for use where relationships between individual data items are important sources of knowledge. The result of the research is that new understanding can emerge from existing data by using a graph database, especially when it is combined with visualization.
Keywords: Graph Database; Neo4j; Cypher; Design Science Research; Databases; Web Application; User Interface; Visualization; Computer and Information Sciences.
30	Výhody a nevýhody relačních a nerelačních (noSQL) databází pro analytické úlohy / Advantages and disadvantages of relational and non-relational (NoSQL) databases for analytical tasks. Klapač, Milan. January 2015.
This work focuses on NoSQL databases, their use for analytical tasks, and their comparison with relational and OLAP databases. The aim is to analyse the benefits of NoSQL databases and their use for analytical purposes. The first part presents the basic principles of Business Intelligence, Data Warehousing, and Big Data. The second part deals with the key features of relational and NoSQL databases. The last part of the thesis describes the properties of the four basic types of NoSQL databases and analyses their advantages, disadvantages, and areas of application. The end of this part includes specific examples of the use of NoSQL databases, together with the reasons for selecting those solutions.
