461

Gestion des incohérences pour l'accès aux données en présence d'ontologies / Inconsistency Handling in Ontology-Mediated Query Answering

Bourgaux, Camille 29 September 2016 (has links)
The problem of querying description logic knowledge bases using database-style queries (in particular, conjunctive queries) has been a major focus of recent description logic research. An important issue that arises in this context is how to handle the case in which the data is inconsistent with the ontology. Indeed, since in classical logic an inconsistent theory implies every formula, inconsistency-tolerant semantics are needed to obtain meaningful answers. This thesis aims to develop methods for dealing with inconsistent description logic knowledge bases using three natural semantics (AR, IAR, and brave) previously proposed in the literature, which rely on the notion of a repair: an inclusion-maximal subset of the data consistent with the ontology. In our framework, these three semantics are used conjointly to identify answers with different levels of confidence. In addition to developing efficient algorithms for query answering over inconsistent DL-Lite knowledge bases, we address three problems that should support the adoption of this framework: (i) query result explanation, to help the user understand why a given answer was (not) obtained under one of the three semantics, (ii) query-driven repairing, to exploit user feedback about errors or omissions in the query results to improve the data quality, and (iii) preferred repair semantics, to take into account the reliability of the data.
For each of these three topics, we developed a formal framework, analyzed the complexity of the relevant reasoning problems, and proposed and implemented algorithms, which we empirically studied over an inconsistent DL-Lite benchmark we built. Our results indicate that even if the problems related to dealing with inconsistent DL-Lite knowledge bases are theoretically hard, they can often be solved efficiently in practice by using tractable approximations and features of modern SAT solvers.
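To make the three semantics concrete, here is a minimal, self-contained Python sketch on hypothetical facts with a single disjointness constraint (entailment is reduced to fact membership here, whereas the thesis works with full DL-Lite reasoning):

```python
from itertools import combinations

# Hypothetical data: alice cannot be both a professor and a student.
facts = {"Prof(alice)", "Student(alice)", "Teaches(alice,db)"}

def consistent(subset):
    # Single disjointness constraint standing in for the ontology.
    return not ({"Prof(alice)", "Student(alice)"} <= subset)

def repairs(facts):
    # A repair is an inclusion-maximal consistent subset of the facts.
    result = []
    for size in range(len(facts), -1, -1):  # largest subsets first
        for subset in map(set, combinations(facts, size)):
            if consistent(subset) and not any(subset < r for r in result):
                result.append(subset)
    return result

reps = repairs(facts)
query = "Teaches(alice,db)"

print("AR   :", all(query in r for r in reps))    # holds in every repair
print("IAR  :", query in set.intersection(*reps)) # holds in the intersection
print("brave:", any(query in r for r in reps))    # holds in some repair
```

Here the two repairs both keep the Teaches fact, so it is entailed under all three semantics, while Prof(alice) survives in only one repair and thus holds only bravely.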
462

Identifying, Relating, Consisting and Querying Large Heterogeneous RDF Sources

VALDESTILHAS, ANDRE 12 January 2021 (has links)
The Linked Data concept relies on a collection of best practices for publishing and linking structured web-based data. The number of available datasets has grown significantly over the last decades. These interconnected datasets now form the well-known Web of Data, an extensive collection of concise, detailed, and interlinked datasets from multiple domains. Linking entries across heterogeneous data sources such as databases or knowledge bases is therefore an increasing challenge, yet connections between datasets play a leading role in significant activities such as cross-ontology question answering, large-scale inference, and data integration. In Linked Data, Linksets perform the task of generating links between datasets. Because each dataset's uniqueness is reflected in its structure, it is hard to find relations among datasets, i.e., to identify how similar they are. Linked Data thus involves datasets and Linksets, and those Linksets need to be maintained. This lack of information leads to the central issue addressed in this thesis: how to identify and query datasets from a huge heterogeneous collection of RDF (Resource Description Framework) datasets. To address this issue, we need to assure consistency and to know how the datasets are related and how similar they are. To deal with the need for identifying LOD (Linked Open Data) datasets, we created WIMU, a regularly updated database index of more than 660K datasets from LODStats and LOD Laundromat: an efficient, low-cost, and scalable web service that shows which dataset most likely defines a URI, together with various statistics of the indexed datasets. To integrate and query LOD datasets, we provide a hybrid SPARQL query processing engine that can retrieve results from 559 active SPARQL endpoints (163.23 billion triples in total) and 668,166 datasets (58.49 billion triples in total) from LODStats and LOD Laundromat. To assure the consistency of the semantic web repositories where these LOD datasets are located, we created an approach for mitigating the identifier heterogeneity problem, implemented a prototype in which the user can evaluate existing links and suggest new links to be rated, and developed a time-efficient algorithm for detecting erroneous links in large-scale link repositories without computing all the closures required by the property axiom. To know how the datasets are related and how similar they are, we provide a string similarity algorithm called Most Frequent K Characters, based on two nested filters, (1) a first-frequency filter and (2) a hash-intersection filter, which discard candidates before the actual similarity value is calculated, giving a considerable performance gain and allowing us to build a LOD Dataset Relation Index that describes how similar the datasets of the LOD cloud are, including statistics about their current state. The work in this thesis shows that to identify and query LOD datasets, we need to know how those datasets are related while assuring consistency.
Our analysis demonstrates that most of the datasets are disconnected from the others and need to pass through a consistency and linking process before they can be integrated, which then provides a way to query a large number of datasets simultaneously. This is a considerable step towards fully queryable LOD datasets, and the work in this thesis is an essential step towards identifying, relating, and querying datasets on the Web of Data.

Table of contents:
1 Introduction and motivation
  1.1 The need for identifying and querying LOD datasets
  1.2 The need for consistency of semantic web Linked repositories
  1.3 The need for relation and integration of LOD datasets
  1.4 Research Questions and Contributions
  1.5 Methodology and Contributions
  1.6 General Use Cases
    1.6.1 The Heloise project
  1.7 Chapter overview
2 Preliminaries
  2.1 Semantic Web
    2.1.1 URIs and URLs
    2.1.2 Linked Data
    2.1.3 Resource Description Framework
    2.1.4 Ontologies
  2.2 RDF graph
  2.3 Transitive property
  2.4 Equivalence
  2.5 Linkset
  2.6 RDF graph partitioning
  2.7 Basic Graph Pattern
  2.8 RDF Dataset
  2.9 SPARQL
  2.10 Federated Queries
3 State of the art
  3.1 Identifying Datasets in Large Heterogeneous RDF Sources
  3.2 Relating Large Amount of RDF Datasets
    3.2.1 Obtaining Similar Resources using String Similarity
  3.3 Consistency on Large Amount of RDF Sources
    3.3.1 Heterogeneity in DBpedia Identifiers
    3.3.2 Detection of Erroneous Links in Large-Scale RDF Datasets
  3.4 Querying Large Heterogeneous RDF Datasets
4 Relation among large amount of RDF sources
  4.1 Identifying Datasets in Large Heterogeneous RDF Sources
    4.1.1 The WIMU approach
    4.1.2 The approach
    4.1.3 Use cases
    4.1.4 Evaluation: Statistics about the Datasets
  4.2 Relating RDF sources
    4.2.1 The ReLOD approach
    4.2.2 The approach
    4.2.3 Evaluation
  4.3 Relating Similar Resources using String Similarity
    4.3.1 The MFKC approach
    4.3.2 Approach
    4.3.3 Correctness and Completeness
    4.3.4 Evaluation
5 Consistency in large amount of RDF sources
  5.1 Consistency in Heterogeneous DBpedia Identifiers
    5.1.1 The DBpediaSameAs approach
    5.1.2 Representation of the idea
    5.1.3 The work-flow
    5.1.4 Methodology
    5.1.5 Evaluation
    5.1.6 Normalization on DBpedia URIs
    5.1.7 Rate the links
    5.1.8 Results
    5.1.9 Discussion
  5.2 Consistency in Large-Scale RDF Sources: Detection of Erroneous Links
    5.2.1 The CEDAL approach
    5.2.2 Method
    5.2.3 Error Types and Quality Measure for Linkset Repositories
    5.2.4 Evaluation
    5.2.5 Experimental setup
  5.3 Detecting Erroneous Link Candidates in Educational Link Repositories
    5.3.1 The CEDAL education approach
    5.3.2 Research questions
    5.3.3 Our contributions
    5.3.4 Evaluation
6 Querying large amount of heterogeneous RDF datasets
  6.1 Introduction
  6.2 Definitions
  6.3 The WimuQ
  7.1 Identifying Datasets in Large Heterogeneous RDF Sources
  7.2 Relating Large Amount of RDF Datasets
  7.3 Obtaining Similar Resources Using String Similarity
  7.4 Heterogeneity in DBpedia Identifiers
  7.5 Detection of Erroneous Links in Large-Scale RDF Datasets
  7.7 Querying Large Heterogeneous RDF Datasets
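The nested-filter idea behind Most Frequent K Characters can be sketched in a few lines of Python (a simplified illustration; the exact hash construction, filter conditions, and scoring function of the thesis' MFKC may differ):

```python
from collections import Counter

def mfk_hash(s, k=2):
    # The k most frequent characters of s with their frequencies.
    # Relies on dict insertion order (Python >= 3.7) to keep the
    # most frequent character first.
    return dict(Counter(s).most_common(k))

def mfk_similarity(s1, s2, k=2):
    h1, h2 = mfk_hash(s1, k), mfk_hash(s2, k)

    # Filter 1 (first-frequency filter): if the single most frequent
    # character differs, discard the pair before any further work.
    if next(iter(h1)) != next(iter(h2)):
        return 0.0

    # Filter 2 (hash-intersection filter): discard pairs whose
    # k-character hashes share no character at all.
    shared = h1.keys() & h2.keys()
    if not shared:
        return 0.0

    # Only surviving candidates pay for the actual similarity value:
    # here, shared frequency mass normalized by total string length.
    score = sum(min(h1[c], h2[c]) for c in shared)
    return 2 * score / (len(s1) + len(s2))

print(mfk_similarity("research", "researcher"))  # passes both filters
print(mfk_similarity("research", "zoology"))     # discarded by filter 1
```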
463

Lokalizace stanic v Internetu pomocí systému King / Localization of nodes in Internet using King system

Exler, Michal January 2011 (has links)
This thesis focuses on the problem of localizing nodes in the Internet. It describes methods for predicting latency using artificial coordinate systems and using direct measurement. The thesis focuses primarily on the method named King, which estimates the latency between arbitrary end hosts by issuing recursive DNS queries to the domain name translation system.
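The arithmetic behind King's estimate can be sketched as follows (an illustration of the idea only; the real tool times actual recursive DNS queries and relies on end hosts being close to their name servers):

```python
# Hedged sketch of King's estimation arithmetic (not the original tool).

def king_estimate(rtt_client_to_ns_a, rtt_client_via_ns_a_to_ns_b):
    # A recursive query sent to A's name server (nsA) for a name that
    # only B's name server (nsB) can resolve travels
    #   client -> nsA -> nsB -> nsA -> client.
    # Subtracting the plain client <-> nsA round trip leaves the
    # nsA <-> nsB round trip, which King uses as the estimate of the
    # latency between end hosts A and B.
    return rtt_client_via_ns_a_to_ns_b - rtt_client_to_ns_a

# Assumed example measurements in milliseconds:
print(king_estimate(30.0, 110.0))  # -> 80.0 ms estimated latency A <-> B
```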
464

Implementace manažerského informačního systému na bázi SAP Business Objects pro mezinárodní výrobní společnosti holdingového typu / Implementation of a Management Information System Based on SAP Business Objects for International Manufacturing Companies of a Holding Type

Tkačin, Michal January 2015 (has links)
This master's thesis proposes a data and communication model for ŽOS Trnava, a.s. and GOŠA FŠV. It proposes a model for the customer that generates reports for the individual management areas.
465

Dotazovací jazyk pro databáze biologických dat / Query Language for Biological Databases

Bahurek, Tomáš January 2015 (has links)
With the rising amount of biological data, biological databases are becoming more important every day. Knowledge discovery (the identification of connections that were unknown at the time of data entry) is an essential aspect of these databases. To gain knowledge from them, one has to construct complicated SQL queries, which requires advanced knowledge of the SQL language and of the database schema in use. Biologists usually do not have this knowledge, which creates the need for a tool offering a more intuitive interface for querying biological databases. This work proposes ChQL, an intuitive query language for the biological database Chado. ChQL allows biologists to assemble queries using terms they are familiar with, without knowledge of the SQL language or the Chado database schema. The work also implements an application for querying a Chado database using ChQL: a web interface guides the user through the process of assembling a sentence in ChQL; the application translates this sentence into an SQL query, sends it to the Chado database, and displays the returned data in a table. The results are evaluated by testing queries on real data.
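As an illustration of the idea (the sentence pattern and translation rules below are invented; the actual ChQL grammar is defined in the thesis), a controlled sentence can be mapped onto the standard Chado feature/cvterm/organism tables like this:

```python
import re

def chql_to_sql(sentence):
    # Recognize sentences like: "get all genes from Homo sapiens".
    m = re.match(r"get all (\w+)s from (\w+) (\w+)", sentence, re.IGNORECASE)
    if not m:
        raise ValueError("sentence not in the supported ChQL pattern")
    feature_type, genus, species = m.groups()
    # Naive string interpolation for readability; real code should use
    # parameterized queries to avoid SQL injection.
    return (
        "SELECT f.uniquename, f.name\n"
        "FROM feature f\n"
        "JOIN cvterm t ON f.type_id = t.cvterm_id\n"
        "JOIN organism o ON f.organism_id = o.organism_id\n"
        f"WHERE t.name = '{feature_type}'\n"
        f"  AND o.genus = '{genus}' AND o.species = '{species}';"
    )

print(chql_to_sql("get all genes from Homo sapiens"))
```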
466

Analysis of parallel scan processing in Shared Disk database systems

Rahm, Erhard, Stöhr, Thomas 23 October 2018 (has links)
Shared Disk database systems offer a high flexibility for parallel transaction and query processing, because every node has access to the entire database and can therefore process any transaction, query, or subquery. Compared to Shared Nothing database systems, this is particularly advantageous for scan queries, for which the degree of intra-query parallelism as well as the scan processors themselves can be chosen dynamically. On the other hand, there is the danger of disk contention between subqueries, in particular for index scans. We present a detailed simulation study to analyze the effectiveness of parallel scan processing in Shared Disk database systems. In particular, we investigate the relationship between the degree of declustering and the degree of scan parallelism for relation scans, clustered index scans, and non-clustered index scans. Furthermore, we study the usefulness of disk caches and prefetching for limiting disk contention. Finally, we show that disk contention in multi-user mode can be limited for Shared Disk database systems by dynamically choosing the degree of scan parallelism.
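A toy cost model (ours, not the paper's simulation system) shows why the degree of declustering caps the useful degree of scan parallelism:

```python
def scan_time(pages, disks, scan_processes, page_time_ms=1.0):
    """Idealized relation-scan time: pages are declustered evenly across
    `disks`; each of `scan_processes` subqueries scans an equal share of
    the data, but at most `disks` disks can be read in parallel."""
    effective_parallelism = min(scan_processes, disks)
    return pages * page_time_ms / effective_parallelism

# With declustering over 8 disks, raising parallelism beyond 8 no longer
# helps: the disks, not the processors, become the bottleneck.
for p in (1, 4, 8, 16):
    print(f"P={p:2d}: {scan_time(pages=10_000, disks=8, scan_processes=p):.0f} ms")
```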
467

Controlling Disk Contention for Parallel Query Processing in Shared Disk Database Systems

Rahm, Erhard, Stöhr, Thomas 08 July 2019 (has links)
Shared Disk database systems offer a high flexibility for parallel transaction and query processing, because every node has access to the entire database and can therefore process any transaction, query, or subquery. Compared to Shared Nothing, this is particularly advantageous for scan queries, for which the degree of intra-query parallelism as well as the scan processors themselves can be chosen dynamically. On the other hand, there is the danger of disk contention between subqueries, in particular for index scans. We present a detailed simulation study to analyze the effectiveness of parallel scan processing in Shared Disk database systems. In particular, we investigate the relationship between the degree of declustering and the degree of scan parallelism for relation scans, clustered index scans, and non-clustered index scans. Furthermore, we study the usefulness of disk caches and prefetching for limiting disk contention. Finally, we show the importance of dynamically choosing the degree of scan parallelism to control disk contention in multi-user mode.
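A toy heuristic (ours, not the paper's policy) illustrates what dynamically choosing the degree of scan parallelism from observed disk utilization might look like:

```python
def choose_scan_parallelism(declustering_degree, disk_utilization,
                            max_parallelism=None):
    """disk_utilization: observed fraction (0..1) of busy time on the
    disks storing the relation. The freer the disks, the more subqueries
    we schedule, never exceeding the declustering degree."""
    if max_parallelism is None:
        max_parallelism = declustering_degree
    free_share = max(0.0, 1.0 - disk_utilization)
    degree = max(1, round(free_share * declustering_degree))
    return min(degree, max_parallelism)

print(choose_scan_parallelism(16, disk_utilization=0.1))  # idle disks -> 14
print(choose_scan_parallelism(16, disk_utilization=0.8))  # busy disks -> 3
```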
468

System Identification in Automatic Database Memory Tuning

Burrell, Tiffany 25 March 2010 (has links)
Databases are very complex systems that require database system administrators to perform system tuning in order to achieve optimal performance. Memory tuning is vital to the performance of a database system: when the database workload exceeds the available memory, the queries running on the system are delayed, which can cause substantial user dissatisfaction. To solve this problem, this thesis presents a platform modeled after a closed feedback control loop to control the level of multi-query processing. This platform provides two key assets. First, it acquires the system identification, one of the two crucial steps in developing a closed feedback loop. Second, it provides a means to experimentally study the database tuning problem and to verify the effectiveness of research ideas related to database performance.
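As a sketch of the closed-loop idea (ours, not the thesis' platform; the controller gain below is assumed, and obtaining such a gain is precisely what system identification is for), a proportional controller over a memory grant might look like:

```python
def control_step(current_memory_mb, measured_response_ms,
                 target_response_ms, gain_mb_per_ms=2.0,
                 min_mb=64, max_mb=4096):
    # Positive error (too slow) -> grant more memory; negative -> reclaim.
    error = measured_response_ms - target_response_ms
    new_memory = current_memory_mb + gain_mb_per_ms * error
    return max(min_mb, min(max_mb, new_memory))

memory = 256.0
for measured in (950, 720, 540, 480):  # assumed measurements (ms)
    memory = control_step(memory, measured, target_response_ms=500)
    print(f"measured={measured} ms -> memory grant={memory:.0f} MB")
```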
469

Audio Moment Retrieval based on Natural Language Query

Shevchuk, Danylo January 2020 (has links)
Background. Users spend a lot of time searching through media content to find a desired fragment. Most of the time people can describe verbally what they are looking for, but little use is made of that today. Using such a verbal description as a query to search for the right interval in a given audio sample would save people a lot of time. Objectives. The aim of this thesis is to compare the performance of methods suitable for retrieving desired intervals from audio of arbitrary length using a natural language query. There are two objectives: first, to train models that match a natural language input to a specific interval of a given soundtrack; second, to evaluate the models' performance using conventional metrics. Methods. The research method is mixed. Literature on existing methods suitable for audio classification was reviewed, and three models were selected for the experiments: YamNet, AlexNet, and ResNet-50. Two experiments were conducted. The goal of the first was to measure the models' performance on classifying audio samples; the goal of the second was to measure the same models' performance on the audio interval retrieval problem, which uses classification as part of the approach. The steps taken to conduct the experiments (data collection, data preprocessing, model training, and performance evaluation) are reported, together with the statistical data obtained. Results. The two tests show how each model performs on the two separate problems: audio classification and interval retrieval based on a natural language query. The degree (performance-wise) to which a natural language query can be matched to the corresponding interval of audio of arbitrary length was calculated for each of the selected models. The aggregated performance of the models is mostly comparable, with YamNet occasionally outperforming the other two. The average Area Under the Curve and Accuracy for the studied models are (67, 71.62), (68.99, 67.72), and (66.59, 71.93) for YamNet, AlexNet, and ResNet-50, respectively. Conclusions. We discovered that the tested models were not capable of retrieving intervals from audio of arbitrary length based on a natural language query; however, the degree to which the models are able to retrieve intervals varies with the queried keyword and with hyperparameters such as the threshold used to filter out audio patches that yield too low a probability for the queried class.
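The retrieval step on top of classification can be sketched as follows (our simplification; window length, hop, and threshold are assumptions, and the stand-in classifier replaces YamNet/AlexNet/ResNet-50 patch scoring):

```python
import numpy as np

def retrieve_intervals(audio, sr, classify, keyword, win_s=1.0,
                       hop_s=0.5, threshold=0.5):
    """Slide a window over `audio`, keep windows whose predicted
    probability for `keyword` exceeds `threshold`, and merge the
    surviving overlapping windows into (start, end) intervals."""
    win, hop = int(win_s * sr), int(hop_s * sr)
    hits = []
    for start in range(0, max(1, len(audio) - win + 1), hop):
        prob = classify(audio[start:start + win], keyword)
        if prob >= threshold:  # discard low-probability patches
            hits.append((start / sr, (start + win) / sr))
    merged = []
    for s, e in hits:          # merge overlapping/adjacent windows
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))
    return merged

# Toy demonstration: an energy-based stand-in classifier that fires where
# the signal is loud (a real system would use a trained model here).
sr = 1_000
audio = np.concatenate([np.zeros(3 * sr), np.ones(2 * sr), np.zeros(5 * sr)])
loudness = lambda patch, kw: float(np.mean(np.abs(patch)))
print(retrieve_intervals(audio, sr, loudness, "event"))  # [(2.5, 5.5)]
```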
470

Efficient Query Processing over Spatial-Social Networks

Al-Baghdadi, Ahmed 05 April 2022 (has links)
No description available.
