Global ETD Search

1	A constraint programming approach to subgraph isomorphism Zampelli, Stéphane 24 June 2008 (has links) This thesis proposes an expressive yet efficient declarative framework for graph matching in constraint programming (CP), and focuses on efficient algorithms to solve the subgraph isomorphism problem. The framework is based on graph and map variables, and specific graph morphism constraints. This allows to model and solve various graph matching problems, avoiding the tedious development of dedicated and specific algorithms. A specialized filtering algorithm is proposed for the subgraph isomorphism problem, which uses the semantic of the problem as well as the global structure of the two input graphs. It is shown that it is the state-of-the-art filtering algorithm, compared to dedicated algorithms and other CP approaches. Various search techniques from CP are also extended to the subgraph isomorphism problem. An automatic detection and exploitation of symmetries for the subgraph isomorphism problem is proposed, together with a decomposition approach of the search. The significance of this thesis lies in the fact that, even though the framework is expressive, CP can be considered as the state-of-the-art for subgraph isomorphism, outperforming the dedicated known algorithms on current benchmarks. Isomorphism Matching Graph Constraint programming Subgraph isomorphism Graph matching
2	Learning by Failing to Explain Hall, Robert Joseph 01 May 1986 (has links) Explanation-based Generalization requires that the learner obtain an explanation of why a precedent exemplifies a concept. It is, therefore, useless if the system fails to find this explanation. However, it is not necessary to give up and resort to purely empirical generalization methods. In fact, the system may already know almost everything it needs to explain the precedent. Learning by Failing to Explain is a method which is able to exploit current knowledge to prune complex precedents, isolating the mysterious parts of the precedent. The idea has two parts: the notion of partially analyzing a precedent to get rid of the parts which are already explainable, and the notion of re-analyzing old rules in terms of new ones, so that more general rules are obtained. learning explanation heuristic parsing design sgraph grammars subgraph isomorphism
3	Subgraph Isomorphism Search In Massive Graph Data / Isomorphisme de Sous-Graphes dans les graphes de données massifs Nabti, Chems Eddine 15 December 2017 (has links) L'interrogation de graphes de données est un problème fondamental qui connait un grand intérêt, en particulier pour les données structurées massives où les graphes constituent une alternative prometteuse aux bases de données relationnelles pour la modélisation des grandes masses de données. Cependant, l'interrogation des graphes de données est différente et plus complexe que l'interrogation des données relationnelles à base de tables. La tâche principale impliquée dans l'interrogation de graphes de données est la recherche d'isomorphisme de sous-graphes qui est un problème NP-complet.La recherche d'isomorphisme de sous-graphes est un problème très important impliqué dans divers domaines comme la reconnaissance de formes, l'analyse des réseaux sociaux, la biologie, etc. Il consiste à énumérer les sous-graphes d'un graphe de données qui correspondent à un graphe requête. Les solutions les plus connues de ce problème sont basées sur le retour arrière (backtracking). Elles explorent un grand espace de recherche, ce qui entraîne un coût de traitement élevé, notamment dans le cas de données massives.Pour réduire le temps et la complexité en espace mémoire dans la recherche d'isomorphisme de sous-graphes, nous proposons d'utiliser des graphes compressés. Dans notre approche, la recherche d'isomorphisme de sous-graphes est réalisée sur une représentation compressée des graphes sans les décompresser. La compression des graphes s'effectue en regroupant les sommets en super-sommets. Ce concept est connu dans la théorie des graphes par la décomposition modulaire. Il sert à générer une représentation en arbre d'un graphe qui met en évidence des groupes de sommets qui ont les mêmes voisins. Avec cette compression, nous obtenons une réduction substantielle de l'espace de recherche et par conséquent, une économie significative dans le temps de traitement.Nous proposons également une nouvelle représentation des sommets du graphe, qui simplifie le filtrage de l'espace de recherche. Ce nouveau mécanisme appelé compact neighborhood Index (CNI) encode l'information de voisinage autour d'un sommet en un seul entier. Cet encodage du voisinage réduit la complexité du temps de filtrage de cubique à quadratique. Ce qui est considérable pour les données massifs.Nous proposons également un algorithme de filtrage itératif qui repose sur les caractéristiques des CNIs pour assurer un élagage global de l'espace de recherche.Nous avons évalué nos approches sur plusieurs datasets et nous les avons comparées avec les algorithmes de l’état de l’art / Querying graph data is a fundamental problem that witnesses an increasing interest especially for massive structured data where graphs come as a promising alternative to relational databases for big data modeling. However, querying graph data is different and more complex than querying relational table-based data. The main task involved in querying graph data is subgraph isomorphism search which is an NP-complete problem. Subgraph isomorphism search, is an important problem which is involved in various domains such as pattern recognition, social network analysis, biology, etc. It consists to enumerate the subgraphs of a data graph that match a query graph. The most known solutions of this problem are backtracking-based. They explore a large search space which results in a high computational cost when we deal with massive graph data. To reduce time and memory space complexity of subgraph isomorphism search. We propose to use compressed graphs. In our approach, subgraph isomorphism search is achieved on compressed representations of graphs without decompressing them. Graph compression is performed by grouping vertices into super vertices. This concept is known, in graph theory, as modular decomposition. It is used to generate a tree representation of a graph that highlights groups of vertices that have the same neighbors. With this compression we obtain a substantial reduction of the search space and consequently a significant saving in the processing time. We also propose a novel encoding of vertices that simplifies the filtering of the search space. This new mechanism is called compact neighborhood Index (CNI). A CNI distills all the information around a vertex in a single integer. This simple neighborhood encoding reduces the time complexity of vertex filtering from cubic to quadratic which is considerable for big graphs. We propose also an iterative local global filtering algorithm that relies on the characteristics of CNIs to ensure a global pruning of the search space.We evaluated our approaches on several real-word datasets and compared them with the state of the art algorithms Isomorphisme de sous-graphe Graphes de données massifs Subgraph Isomorphism Search Massive Graph Data 004
4	Analysis of cross-system porting and porting errors in software projects Ray, Baishakhi 11 November 2013 (has links) Software forking---creating a variant product by copying and modifying an existing project---is often considered an ad hoc, low cost alternative to principled product line development. To maintain forked projects, developers need to manually port existing features or bug-fixes from one project to another. Such manual porting is not only tedious but also error-prone. When the contexts of the ported code vary, developers often have to adapt the ported code to fit its surroundings. Faulty adaptations or inconsistent updates of the ported code could potentially introduce subtle inconsistencies in the codebase. To build a deeper understanding to cross-system porting and porting related errors, this dissertation investigates: (1) How can we identify ported code from software version histories? (2) What is the overhead of cross-system porting required to maintain forked projects? (3) What is the extent and characteristics of porting errors that occur in practice? and (4) How can we detect and characterize potential porting errors? As a first step towards assessing the overhead of cross-system porting, we implement REPERTOIRE, a tool to analyze repeated work of cross-system porting across peer projects. REPERTOIRE can detect ported edits between program patches with high accuracy of 94% precision and 84% recall. Using REPERTOIRE, we study the temporal, spatial, and developer dimensions of cross-system porting using 18 years of parallel evolution history of the BSD product family. Our study finds that cross-system porting happens periodically and the porting rate does not necessarily decrease over time. The upkeep work of porting changes from peer projects is significant and currently, porting practice seems to heavily depend on developers doing their porting job on time. Analyzing version histories of Linux and FreeBSD, we derive five categories of porting errors, including incorrect control- and data-flow, code redundancy, and inconsistent identifier and token renamings. Leveraging this categorization, we design a static control- and data-dependence analysis technique, SPA, to detect and characterize porting inconsistencies. SPA detects porting inconsistencies with 65% to 73% precision and 90% recall, and identify inconsistency types with 58% to 63% precision and 92% recall on average. In a comparison with two existing error detection tools, SPA outperforms them with 14% to 17% better precision. / text Software evolution Forking Porting Repetitive changes Code clones Static analysis Subgraph isomorphism Bug Error detection Copy-paste error
5	Towards effective analysis of big graphs : from scalability to quality Tian, Chao January 2017 (has links) This thesis investigates the central issues underlying graph analysis, namely, scalability and quality. We first study the incremental problems for graph queries, which aim to compute the changes to the old query answer, in response to the updates to the input graph. The incremental problem is called bounded if its cost is decided by the sizes of the query and the changes only. No matter how desirable, however, our first results are negative: for common graph queries such as graph traversal, connectivity, keyword search and pattern matching, their incremental problems are unbounded. In light of the negative results, we propose two new characterizations for the effectiveness of incremental computation, and show that the incremental computations above can still be effectively conducted, by either reducing the computations on big graphs to small data, or incrementalizing batch algorithms by minimizing unnecessary recomputation. We next study the problems with regards to improving the quality of the graphs. To uniquely identify entities represented by vertices in a graph, we propose a class of keys that are recursively defined in terms of graph patterns, and are interpreted with subgraph isomorphism. As an application, we study the entity matching problem, which is to find all pairs of entities in a graph that are identified by a given set of keys. Although the problem is proved to be intractable, and cannot be parallelized in logarithmic rounds, we provide two parallel scalable algorithms for it. In addition, to catch numeric inconsistencies in real-life graphs, we extend graph functional dependencies with linear arithmetic expressions and comparison predicates, referred to as NGDs. Indeed, NGDs strike a balance between expressivity and complexity, since if we allow non-linear arithmetic expressions, even of degree at most 2, the satisfiability and implication problems become undecidable. A localizable incremental algorithm is developed to detect errors using NGDs, where the cost is determined by small neighbors of nodes in the updates instead of the entire graph. Finally, a rule-based method to clean graphs is proposed. We extend graph entity dependencies (GEDs) as data quality rules. Given a graph, a set of GEDs and a block of ground truth, we fix violations of GEDs in the graph by combining data repairing and object identification. The method finds certain fixes to errors detected by GEDs, i.e., as long as the GEDs and the ground truth are correct, the fixes are assured correct as their logical consequences. Several fundamental results underlying the method are established, and an algorithm is developed to implement the method. We also parallelize the method and guarantee to reduce its running time with the increase of processors.
6	Systém pro vyhledávání chemických struktur / System for Searching of Chemical Structures Ševčík, Ivan January 2018 (has links) This thesis deals with the problem of searching of structures in large chemical compounds databases. The aim is to design and implement an efficient system that supports two basic types of search, which are identity and substructure search. This task is complicated not only by the large number of entries in databases but also by graph representation of chemical structures, for which many algorithms are hard to solve. The thesis will introduce concepts which will prove useful in solving these problems. A web service is also created as a part of the thesis in order to make the database searching available to the users.
7	Porovnání přístupů k dotazování chemických sloučenin / Comparison of Approaches for Querying of Chemical Compounds Šípek, Vojtěch January 2019 (has links) The purpose of this thesis is to perform an analysis of approaches to querying chemical databases and to validate or invalidate its results. Currently, there exists no work which would compare the performance and memory usage of the best performing approaches on the same data set. In this thesis, we address this lack of information and we create an un-biased benchmark of the most popular index building methods for subgraph querying of chemical databases. Also, we compare the results of such benchmark with the performance results of an SQL and a graph database. 1
8	Modélisation spatio-temporelle du trafic routier en milieu urbain / Spatio-temporal modeling of urban road traffic Oberoi, Kamaldeep Singh 18 November 2019 (has links) Le domaine de la modélisation du trafic routier vise à comprendre son évolution. Dans les dernières années, plusieurs modèles du trafic ont été proposés dans l’objectif de géolocaliser les embouteillages au sein du trafic, détecter des motifs dans le trafic routier, estimer l’état du trafic etc. La plupart des modèles proposés considèrent le trafic routier en termes de ses constituants ou comme une entité agrégée en fonction de l’échelle choisie et expliquent l’évolution du trafic quantitativement en tenant compte des relations entre les variables de trafic comme le flot, la densité et la vitesse. Ces modèles décrivent le trafic en utilisant des données très précises acquises par différents capteurs. La précision des données rend son calcul coûteux en termes de ressources requises. Une des solutions à ce problème est la représentation qualitative du trafic routier qui réduit le nombre de ressources de traitement nécessaires. Puisque le trafic routier est un phénomène spatio-temporel, les modèles proposés pour représenter ce type de phénomène pourraient être appliqués dans le cas du trafic routier. Les modèles spatio-temporels, proposés par la communauté de l’Analyse Spatio-Temporelle, ont comme objectif la représentation d’un phénomène tant du point de vue qualitatif que quantitatif. Certains de ces modèles proposent une discrétisation des phénomènes modélisés en considérant un phénomène comme constitué d’entités. Appliquée au trafic routier, cette notion permet d’identifier différentes entités, comme les véhicules, les piétons, les bâtiments etc., qui le constituent. Ces entités influent sur l’évolution du trafic. Les modèles spatio-temporels qualitatifs définissent l’effet des différentes entités les unes sur les autres en terme de relations spatiales. L’évolution spatio-temporelle du phénomène modélisé est représenté par la variation temporelle de ces relations. La prise en compte des entités du trafic et des relations spatiales formalise une structure qui peut être représentée en utilisant un graphe, où les nœuds modélisent des entités et les arcs des relations spatiales. Par conséquent, l’évolution du trafic, modélisée via ce graphe, devient l’évolution du graphe et peut être représenté en terme de la variation de la structure du graphe ainsi que celle des attributs de ses nœuds et de ses arcs. Dans cette thèse, nous proposons une modélisation du trafic routier de ce type basée sur la théorie des graphes. Une des applications à la modélisation du trafic routier est la détection des motifs pertinents au sein du trafic. Dans les modèles du trafic existants, les motifs détectés sont statistiques et sont représentés en utilisant des caractéristiques numériques. Le modèle que nous pro posons dans cette thèse met en avant la structure représentant le trafic routier et peut donc être utilisé pour définir des motifs structurels du trafic qui prennent en compte des différentes entités du trafic et leurs relations. Ces motifs structurels sont sous-jacents à une modélisation sous forme de graphe dynamique. Dans cette thèse, nous proposons un algorithme pour détecter ces motifs structurels du trafic dans le graphe spatio-temporel représentant le trafic routier. Ce problème est formalisé comme celui de l’isomorphisme de sous-graphe pour des graphes dynamiques. L’algorithme proposé est évalué en fonction desdifférents paramètres de graphes. / For past several decades, researchers have been interested in understanding traffic evolution, hence, have proposed various traffic models to identify bottleneck locations where traffic congestion occurs, to detect traffic patterns, to predict traffic states etc. Most of the existing models consider traffic as many-particle system, describe it using different scales of representation and explain its evolution quantitatively by deducing relations between traffic variables like flow, density and speed. Such models are mainly focused on computing precise information about traffic using acquired traffic data. However, computation of such precise information requires more processing resources. A way to remedy this problem is to consider traffic evolution in qualitative terms which reduces the required number of processing resources. Since traffic is spatio-temporal in nature, the models which deal with spatio-temporal phenomenon can be applied in case of traffic. Such models represent spatio-temporal phenomenon from qualitative as well as quantitative standpoints. Depending on the intended application, some models are able to differentiate between various entities taking part in the phenomenon, which proves useful in case of traffic since different objects like vehicles, buildings, pedestrians, bicycles etc., directly affecting traffic evolution, can be included in traffic models. Qualitative spatio-temporal models consider the effects of different entities on each other in terms of spatial relations between them and spatio-temporal evolution of the modeled phenomenon is described in terms of variation in such relations over time. Considering different traffic constituents and spatial relations between them leads to the formation of a structure which can be abstracted using graph, whose nodes represent individual constituents and edges represent the corresponding spatial relations. As a result, the evolution of traffic, represented using graph, is described in terms of evolution of the graph itself, i. e. change in graph structure and attributes of nodes and edges, with time. In this thesis, we propose such a graph model to represent traffic. As mentioned above, one of the applications of existing traffic models is in detecting traffic patterns. However, since such models consider traffic quantitatively, in terms of acquired traffic data, the patterns detected using such models are statistical (a term employed by Pattern Recognition researchers) in the sense that they are represented using numerical description. Since graph-based traffic model proposed in this thesis represents the structure of traffic, it can be employed to redefine the meaning of traffic patterns from statistical to structural (also a term from Pattern Recognition community). Structural traffic patterns include different traffic constituents and their inter-links and are represented using time-varying graphs. An algorithm to detect a given structural traffic pattern in the spatio-temporal graph representing traffic is proposed in this thesis. It formalizes this problem as subgraph isomorphism for time-varying graphs. In the end, the performance of the algorithm is tested using various graph parameters. Modélisation du trafic routier Modélisation qualitative Modélisation spatio-temporelle Théorie des graphes Graphes dynamiques Détection de motifs Motifs structurels Isomorphisme de sous-graphe Traffic modeling Qualitative modeling Spatio-temporal modeling Graph theory Time-varying graphs Pattern detection Structural patterns Subgraph isomorphism 006.4

Search results