11 |
Utilising semantic technologies for intelligent indexing and retrieval of digital images. Osman, T., Thakker, Dhaval, Schaefer, G. 15 October 2013 (has links)
The proliferation of digital media has led to a huge interest in classifying and indexing media objects for generic search and usage. In particular, we are witnessing a colossal growth in digital image repositories that are difficult to navigate using free-text search mechanisms, which often return inaccurate matches as they in principle rely on statistical analysis of query keyword recurrence in the image annotation or surrounding text. In this paper we present a semantically-enabled image annotation and retrieval engine that is designed to satisfy the requirements of the commercial image collections market in terms of both accuracy and efficiency of the retrieval process. Our search engine relies on methodically structured ontologies for image annotation, thus allowing for more intelligent reasoning about the image content and subsequently obtaining a more accurate set of results and a richer set of alternatives matching the original query. We also show how our well-analysed and designed domain ontology contributes to the implicit expansion of user queries as well as the exploitation of lexical databases for explicit semantic-based query expansion.
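A minimal sketch (not taken from the paper) of the kind of ontology-driven query expansion described above: query keywords are expanded with subclasses from a toy ontology and with synonyms from a small lexical table, which stand in for the full domain ontology and lexical database the engine would use.

```python
# Illustrative sketch of semantic query expansion for image retrieval.
# The toy ontology and synonym table below are hypothetical stand-ins for the
# domain ontology and lexical database described in the paper.

SUBCLASSES = {                       # ontology: class -> narrower classes
    "vehicle": ["car", "bicycle", "truck"],
    "animal": ["dog", "cat", "horse"],
}
SYNONYMS = {                         # lexical database: term -> synonyms
    "car": ["automobile"],
    "dog": ["hound"],
}

def expand_query(keywords):
    """Expand each keyword with ontology subclasses and lexical synonyms."""
    expanded = set()
    for kw in keywords:
        expanded.add(kw)
        expanded.update(SUBCLASSES.get(kw, []))   # implicit expansion via the ontology
        expanded.update(SYNONYMS.get(kw, []))     # explicit expansion via the lexicon
    return expanded

def search(annotations, keywords):
    """Return images whose annotations overlap the expanded query."""
    terms = expand_query(keywords)
    return [img for img, tags in annotations.items() if terms & set(tags)]

annotations = {"img1.jpg": {"car", "street"}, "img2.jpg": {"horse", "field"}}
print(search(annotations, ["vehicle"]))   # ['img1.jpg'], matched via vehicle -> car
```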
|
12 |
Towards Secure Outsourced Data Services in the Public Cloud. Sun, Wenhai 25 July 2018 (has links)
The past few years have witnessed a dramatic shift of IT infrastructures from a self-sustained model to a centralized, multi-tenant elastic computing paradigm, Cloud Computing, which significantly reshapes the landscape of existing data utilization services. Public cloud service providers (CSPs), e.g., Google and Amazon, offer unprecedented benefits such as ubiquitous and flexible access, considerable capital expenditure savings and on-demand resource allocation. The cloud has also become the virtual “brain” that supports and propels many important applications and system designs, for example, artificial intelligence and the Internet of Things; on the flip side, security and privacy are among the primary concerns with the adoption of cloud-based data services, in that the user loses control of her/his outsourced data. Encrypting sensitive user information certainly ensures confidentiality. However, encryption adds an extra layer of ambiguity, and its direct use may be at odds with practical requirements and defeat the purpose of cloud computing technology. We believe that security should not, by nature, be in contravention of the cloud outsourcing model. Rather, it is expected to complement current achievements and further fuel the wide adoption of public cloud services. This, in turn, requires us not to decouple security from the system design from the very beginning. Drawing on the successes and failures of both academia and industry, we attempt to answer the challenges of realizing efficient and useful secure data services in the public cloud. In particular, we pay attention to security and privacy in two essential functions of the cloud “brain”: data storage and processing. Our first work centers on secure chunk-based deduplication of encrypted data for cloud backup; it achieves performance comparable to plaintext cloud storage deduplication while effectively mitigating information leakage from low-entropy chunks. We then comprehensively study the promising yet challenging issue of search over encrypted data in the cloud environment, which allows a user to delegate her/his search task to a CSP server that hosts a collection of encrypted files while still guaranteeing some measure of query privacy. To accomplish this vision, we explore both software-based secure computation, which often relies on cryptography and concentrates on algorithmic design and theoretical proofs, and trusted execution solutions, which depend on hardware-based isolation and trusted computing. We hope that, through the lens of our efforts, insights can be furnished into future research in related areas. / Ph. D. / The past few years have witnessed a dramatic shift of IT infrastructures from a self-sustained model to a centralized, multi-tenant elastic computing paradigm, Cloud Computing, which significantly reshapes the landscape of existing data utilization services. Public cloud service providers (CSPs), e.g., Google and Amazon, offer unprecedented benefits such as ubiquitous and flexible access, considerable capital expenditure savings and on-demand resource allocation. The cloud has also become the virtual “brain” that supports and propels many important applications and system designs, for example, artificial intelligence and the Internet of Things; on the flip side, security and privacy are among the primary concerns with the adoption of cloud-based data services, in that the user loses control of her/his outsourced data. Encryption certainly provides strong protection for users' sensitive data, but it also disables the direct use of cloud data services and may defeat the purpose of cloud computing technology. We believe that security should not, by nature, be in contravention of the cloud outsourcing model. Rather, it is expected to complement current achievements and further fuel the wide adoption of public cloud services. This, in turn, requires us not to decouple security from the system design from the very beginning. Drawing on the successes and failures of both academia and industry, we attempt to answer the challenges of realizing efficient and useful secure data services in the public cloud. In particular, we pay attention to security and privacy in two essential functions of the cloud “brain”: data storage and processing. The first part of this research aims to provide a privacy-preserving data deduplication scheme with performance comparable to existing cloud backup storage deduplication. The second part attempts to secure fundamental information retrieval functions and offer effective solutions in various contexts of cloud data services.
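As a rough, hypothetical illustration of the "search over encrypted data" idea discussed in this entry (not the author's scheme), the sketch below builds an encrypted inverted index keyed by HMAC search tokens, so a server can match a query trapdoor to file identifiers without seeing the plaintext keywords.

```python
# Hypothetical sketch of keyword search over encrypted data using HMAC tokens.
# Real searchable-encryption schemes add protections (randomized encryption of
# postings, leakage suppression) that are omitted here for brevity.
import hmac, hashlib

KEY = b"client-secret-key"   # held by the data owner, never sent to the cloud

def token(keyword: str) -> bytes:
    """Deterministic search token (trapdoor) derived from a keyword."""
    return hmac.new(KEY, keyword.encode(), hashlib.sha256).digest()

def build_index(files):
    """Client-side: map each keyword token to the ids of files containing it."""
    index = {}
    for file_id, keywords in files.items():
        for kw in keywords:
            index.setdefault(token(kw), []).append(file_id)
    return index

def server_search(index, trapdoor):
    """Server-side: look up the trapdoor without learning the keyword itself."""
    return index.get(trapdoor, [])

index = build_index({"doc1": ["cloud", "security"], "doc2": ["privacy"]})
print(server_search(index, token("privacy")))   # ['doc2']
```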
|
13 |
Secure and Reliable Data Outsourcing in Cloud Computing. Cao, Ning 31 July 2012 (has links)
The many advantages of cloud computing are increasingly attracting individuals and organizations to outsource their data from local to remote cloud servers. In addition to cloud infrastructure and platform providers, such as Amazon, Google, and Microsoft, more and more cloud application providers are emerging which are dedicated to offering more accessible and user-friendly data storage services to cloud customers. It is a clear trend that cloud data outsourcing is becoming a pervasive service. Along with the widespread enthusiasm for cloud computing, however, concerns about the reliability and privacy of cloud data storage are arising, and they stand as the primary obstacles to the adoption of the cloud. To address these challenging issues, this dissertation explores the problem of secure and reliable data outsourcing in cloud computing. We focus on deploying the most fundamental data services, e.g., data management and data utilization, while considering reliability and privacy assurance. The first part of this dissertation discusses secure and reliable cloud data management to guarantee data correctness and availability, given the difficulty that data are no longer locally possessed by data owners. We design a secure cloud storage service which addresses the reliability issue with near-optimal overall performance. By allowing a third party to perform public integrity verification, data owners are significantly relieved of the onerous work of periodically checking data integrity. To completely free the data owner from the burden of being online after data outsourcing, we propose an exact repair solution so that no metadata needs to be generated on the fly for the repaired data. The second part presents our privacy-preserving data utilization solutions supporting two categories of semantics: keyword search and graph query. To protect data privacy, sensitive data has to be encrypted before outsourcing, which obsoletes traditional data utilization based on plaintext keyword search. We define and solve the challenging problem of privacy-preserving multi-keyword ranked search over encrypted data in cloud computing. We establish a set of strict privacy requirements for such a secure cloud data utilization system to become a reality. We first propose a basic idea for keyword search based on secure inner product computation, and then give two improved schemes to achieve various stringent privacy requirements in two different threat models. We also investigate further enhancements of our ranked search mechanism, including support for richer search semantics, i.e., TF × IDF, and dynamic data operations. As a general data structure to describe the relation between entities, the graph has been increasingly used to model complicated structures and schemaless data, such as personal social networks, relational databases, XML documents and chemical compounds. When these data contain sensitive information and need to be encrypted before outsourcing to the cloud, it is a very challenging task to effectively utilize such graph-structured data after encryption. We define and solve the problem of privacy-preserving query over encrypted graph-structured data in cloud computing. By utilizing the principle of filtering-and-verification, we pre-build a feature-based index to provide feature-related information about each encrypted data graph, and then choose the efficient inner product as the pruning tool to carry out the filtering procedure.
|
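The preceding entry ranks results by a secure inner product; below is a plaintext analogue (not the dissertation's encrypted construction) of that ranking idea, in which documents and the query are encoded as keyword vectors and ordered by their inner product. The actual scheme would evaluate the same score over encrypted, randomized vectors.

```python
# Sketch of multi-keyword ranked search via inner products, shown on plaintext
# vectors; a secure scheme would compute the same score on encrypted vectors so
# the cloud never sees the keywords or their weights.

VOCAB = ["cloud", "security", "graph", "privacy"]

def to_vector(keywords):
    """Binary keyword vector over the fixed vocabulary."""
    kws = set(keywords)
    return [1 if term in kws else 0 for term in VOCAB]

def inner_product(u, v):
    return sum(a * b for a, b in zip(u, v))

def ranked_search(docs, query_keywords, top_k=2):
    """Rank documents by inner product with the query vector."""
    q = to_vector(query_keywords)
    scores = {doc_id: inner_product(to_vector(kws), q) for doc_id, kws in docs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

docs = {
    "d1": ["cloud", "security", "privacy"],
    "d2": ["graph", "privacy"],
    "d3": ["cloud"],
}
print(ranked_search(docs, ["privacy", "cloud"]))   # d1 scores 2, d2 and d3 score 1
```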
14 |
Benchmark para métodos de consultas por palavras-chave a bancos de dados relacionais / Benchmark for query methods by keywords to relational databases. Oliveira Filho, Audir da Costa 21 June 2018 (has links)
Keyword query techniques have proven to be very effective due to their user-friendliness on the Web. However, much of the data is stored in relational databases, and knowledge of a structured language is necessary to access this data. In this sense, during the last decade some works have been proposed with the intention of performing keyword queries over relational databases. However, systems that implement this approach have been validated using ad hoc methods that may not reflect real-world workloads. The present work proposes a benchmark for the evaluation of keyword query methods over relational databases, defining a standardized form with workloads that are consistent with the real world. This proposal assists in assessing the effectiveness of current and future systems. The results obtained with the application of the benchmark suggest that there are still many gaps to be addressed by keyword query techniques. / Técnicas de consultas por palavras-chave se mostraram muito eficazes devido à sua facilidade de utilização por usuários na Web. Contudo, grande parte dos dados estão armazenados em bancos de dados relacionais, sendo necessário conhecimento de uma linguagem estruturada para acesso a esses dados. Nesse sentido, durante a última década alguns trabalhos foram propostos com intuito de realizar consultas por palavras-chave a bancos de dados relacionais. No entanto, os sistemas que implementam essa abordagem foram validados utilizando métodos ad hoc com bancos de dados que podem não refletir as cargas utilizadas no mundo real. O presente trabalho propõe um benchmark para avaliação dos métodos de consultas por palavras-chave a bancos de dados relacionais, definindo uma forma padronizada com cargas de trabalho condizentes com as do mundo real. Esta proposta auxilia na avaliação de eficácia dos sistemas atuais e futuros. Os resultados obtidos com a aplicação do benchmark sugerem que ainda existem muitas lacunas a serem tratadas pelas técnicas de consultas por palavras-chave.
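To make the kind of benchmark evaluation described in this entry concrete, here is a small, hypothetical sketch of how a keyword-search system's ranked answers could be scored against a gold-standard answer set using precision, recall and reciprocal rank; the actual workloads and metrics of the proposed benchmark are those defined in the dissertation.

```python
# Hypothetical sketch of benchmark scoring for keyword queries over a database:
# compare the ranked tuples returned by a system with a gold-standard answer set.

def precision_recall(retrieved, relevant):
    retrieved, relevant = list(retrieved), set(relevant)
    hits = sum(1 for r in retrieved if r in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def reciprocal_rank(retrieved, relevant):
    for rank, r in enumerate(retrieved, start=1):
        if r in relevant:
            return 1.0 / rank
    return 0.0

# One benchmark query: expected tuples vs. the system's ranked output.
gold = {"q1": {"t42", "t17"}}
system_output = {"q1": ["t99", "t42", "t17", "t3"]}

p, r = precision_recall(system_output["q1"], gold["q1"])
rr = reciprocal_rank(system_output["q1"], gold["q1"])
print(f"P={p:.2f} R={r:.2f} RR={rr:.2f}")   # P=0.50 R=1.00 RR=0.50
```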
|
15 |
Semantisk eller keywords? : En studie av interna sökfunktioner och användarens upplevelse. Strand, Charlotte January 2023 (has links)
The idea for this study is based on a collaboration with Södra Skogsägarna Ekonomisk Förening, one of Sweden's leading forest-industry companies, which wanted to investigate the possibilities of a new internal search function on its public website, primarily with the help of Azure Cognitive Search. Before and in connection with the implementation of a new search function, the study aimed to answer the following questions: • RQ1: How does semantic search differ from keyword search? What are the limitations of semantic search today? • RQ2: In what ways does the user experience of the new search function differ from that of the old search function? To answer these questions, a literature study was conducted, along with case studies consisting of a survey among the website's visitors and two user surveys. The literature study aimed to answer RQ1 and form a knowledge base for the design of the new search function by examining the history of the search engine, the difference between a keyword-based search function and a semantic search function, and how today's smart search functions are expected to develop. The survey included questions about visitors' use of the existing search function and their perception of it. User survey 1 was conducted with a selected group of participants and consisted of a number of tasks to be performed using the existing search function, in order to get a better picture of the user experience and help answer RQ2. When the new search function was ready for testing, User survey 2 was conducted, in which participants compared the old and the new search function by performing the same tasks with both solutions open in parallel windows. The study showed that the majority of the survey participants perceived the old search function as effective enough to make them satisfied. User survey 1 suggested that relevant results came too far down the results list, or that no relevant results were obtained at all. After implementing Azure Cognitive Search with a semantic feature enabled, test participants were able to ask questions in the search box and get answers directly at the top of the results list, which made the new search function preferred over the old one. The literature study showed how keyword-based search is based on the principle of keywords and their occurrence in the searchable index, while a semantic search function instead tries to interpret the meaning behind the search term. / Idén till denna studie grundar sig i ett samarbete med Södra Skogsägarna Ekonomisk Förening, en av Sveriges ledande skogsindustrier, som ville undersöka möjligheterna med en ny intern sökfunktion på sin publika webbplats, främst med hjälp av Azure Cognitive Search. Inför och i samband med implementeringen av en ny sökfunktion ville man besvara följande frågeställningar: · RQ1: Hur skiljer sig semantisk sökning i jämförelse med sökning mot nyckelord (keywords)? Vilka begränsningar finns det med semantisk sökning idag? · RQ2: På vilka sätt skiljer sig användarupplevelsen av den nya sökfunktionen med semantisk funktion i jämförelse med den gamla, nyckelordsbaserade sökfunktionen? För att söka svar på frågeställningarna gjordes en litteraturstudie samt fallstudier bestående av en enkät bland webbplatsens besökare och två olika användarundersökningar. Litteraturstudien ämnade besvara RQ1 och utgöra en kunskapsgrund inför utformningen av den nya sökfunktionen genom att undersöka sökmotorns historia, skillnaden mellan en nyckelordsbaserad sökfunktion och en semantisk sökfunktion samt se på hur man förväntar sig att dagens smarta sökfunktioner kommer att utvecklas. Enkäten innehöll frågor om besökarnas användande av den befintliga sökfunktionen och uppfattningen om den. Användarundersökning 1 utfördes med en utvald skara deltagare. Undersökningen bestod av ett antal uppgifter som skulle utföras med hjälp av den befintliga sökfunktionen för att få en bättre bild av användarupplevelsen och hjälpa till att besvara RQ2. När den nya sökfunktionen var klar för test gjordes Användarundersökning 2 där man lät deltagarna jämföra den gamla och den nya sökfunktionen genom att utföra samma uppgifter med båda lösningarna parallellt. Studien visade att majoriteten av deltagarna i enkäten upplevde den gamla sökfunktionen som tillräckligt effektiv för att göra dem nöjda. Användarundersökning 1 antydde att relevanta resultat kom för långt ner i resultatlistan eller så fick man inga relevanta resultat alls. Efter implementering av Azure Cognitive Search med en semantisk funktion påkopplad kunde testdeltagarna ställa frågor i sökrutan och få svar direkt högst upp i resultatlistan, vilket gjorde att den nya sökfunktionen föredrogs framför den gamla. Litteraturstudien visade på hur nyckelordsbaserat sök grundar sig på principen om nyckelord, keywords, och dess förekomst i det sökbara indexet medan en semantisk sökfunktion försöker tolka meningen bakom söktermen i stället.
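The contrast this entry draws between keyword matching and semantic matching can be illustrated with a small, library-free sketch (not the Azure Cognitive Search implementation): the keyword ranker counts term overlap, while the "semantic" ranker compares toy meaning vectors with cosine similarity, so a query can match a document that shares no literal terms with it.

```python
# Toy illustration of keyword vs. semantic ranking; the vectors are hand-made
# stand-ins for the learned representations a real semantic search service uses.
import math

DOCS = {
    "a": "forest owners sell timber",
    "b": "how to plant spruce seedlings",
}
# Hypothetical "meaning" vectors for the documents and the query.
EMBEDDINGS = {
    "a": [0.9, 0.1],
    "b": [0.1, 0.9],
    "query": [0.2, 0.95],   # query: "growing new trees"
}

def keyword_score(query: str, text: str) -> int:
    """Number of query terms that literally occur in the text."""
    return len(set(query.lower().split()) & set(text.lower().split()))

def cosine(u, v) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

query = "growing new trees"
print({d: keyword_score(query, t) for d, t in DOCS.items()})                     # all 0: no overlap
print({d: round(cosine(EMBEDDINGS[d], EMBEDDINGS["query"]), 2) for d in DOCS})   # 'b' ranks first
```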
|
16 |
[pt] NOVAS MEDIDAS DE IMPORTÂNCIA DE VÉRTICES PARA APERFEIÇOAR A BUSCA POR PALAVRAS-CHAVE EM GRAFOS RDF / [en] NOVEL NODE IMPORTANCE MEASURES TO IMPROVE KEYWORD SEARCH OVER RDF GRAPHS. ELISA SOUZA MENENDEZ 15 April 2019 (has links)
[pt] Um ponto importante para o sucesso de sistemas de busca por palavras-chave é um mecanismo de ranqueamento que considera a importância dos documentos recuperados. A noção de importância em grafos é tipicamente computada usando medidas de centralidade, que dependem amplamente do grau dos nós, como o PageRank. Porém, em grafos RDF, a noção de importância não é necessariamente relacionada com o grau do nó. Sendo assim, esta tese aborda dois problemas: (1) como definir uma medida de importância em grafos RDF; (2) como usar essas medidas para ajudar a compilar e ranquear respostas a consultas por palavras-chave sobre grafos RDF. Para resolver estes problemas, esta tese propõe uma nova família de medidas, chamada de InfoRank, e um sistema de busca por palavras-chave, chamado QUIRA, para grafos RDF. Esta tese é concluída com experimentos que mostram que a solução proposta melhora a qualidade dos resultados em benchmarks de busca por palavras-chave. / [en] A key contributor to the success of keyword search systems is a ranking mechanism that considers the importance of the retrieved documents. The notion of importance in graphs is typically computed using centrality measures that highly depend on the degree of the nodes, such as PageRank. However, in RDF graphs, the notion of importance is not necessarily related to the node degree. Therefore, this thesis addresses two problems: (1) how to define importance measures for RDF graphs; (2) how to use these measures to help compile and rank results of keyword queries over RDF graphs. To solve these problems, the thesis proposes a novel family of measures, called InfoRank, and a keyword search system, called QUIRA, for RDF graphs. Finally, this thesis concludes with experiments showing that the proposed solution improves the quality of the results in two keyword search benchmarks.
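For contrast with the degree-driven centrality measures this thesis argues are insufficient for RDF, here is a compact PageRank-style sketch over a toy RDF-like graph. InfoRank itself is defined differently in the thesis; this only illustrates the baseline notion of importance that it improves upon.

```python
# Baseline PageRank sketch on a small directed graph; shown only as the kind of
# degree-dependent centrality measure that node-importance work like InfoRank revisits.

def pagerank(edges, damping=0.85, iterations=50):
    nodes = {n for e in edges for n in e}
    out = {n: [t for (s, t) in edges if s == n] for n in nodes}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            targets = out[n] or list(nodes)            # dangling nodes spread rank evenly
            share = damping * rank[n] / len(targets)
            for t in targets:
                new_rank[t] += share
        rank = new_rank
    return rank

# Toy graph: many resources pointing at ':Brazil' make it look central by degree alone.
edges = [(":Rio", ":Brazil"), (":Sao_Paulo", ":Brazil"), (":Brazil", ":South_America")]
for node, score in sorted(pagerank(edges).items(), key=lambda kv: -kv[1]):
    print(node, round(score, 3))
```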
|
17 |
Exploration et interrogation de données RDF intégrant de la connaissance métier / Integrating domain knowledge for RDF dataset exploration and interrogation. Ouksili, Hanane 21 October 2016 (has links)
Un nombre croissant de sources de données est publié sur le Web, décrites dans les langages proposés par le W3C tels que RDF, RDF(S) et OWL. Une quantité de données sans précédent est ainsi disponible pour les utilisateurs et les applications, mais l'exploitation pertinente de ces sources constitue encore un défi : l'interrogation des sources est en effet limitée d'abord car elle suppose la maîtrise d'un langage de requêtes tel que SPARQL, mais surtout car elle suppose une certaine connaissance de la source de données qui permet de cibler les ressources et les propriétés pertinentes pour les besoins spécifiques des applications. Le travail présenté ici s'intéresse à l'exploration de sources de données RDF, et ce selon deux axes complémentaires : découvrir d'une part les thèmes sur lesquels porte la source de données, fournir d'autre part un support pour l'interrogation d'une source sans l'utilisation de langage de requêtes, mais au moyen de mots clés. L'approche d'exploration proposée se compose ainsi de deux stratégies complémentaires : l'exploration thématique et la recherche par mots clés. La découverte de thèmes dans une source de données RDF consiste à identifier un ensemble de sous-graphes, non nécessairement disjoints, chacun représentant un ensemble cohérent de ressources sémantiquement liées et définissant un thème selon le point de vue de l'utilisateur. Ces thèmes peuvent être utilisés pour permettre une exploration thématique de la source, où les utilisateurs pourront cibler les thèmes pertinents pour leurs besoins et limiter l'exploration aux seules ressources composant les thèmes sélectionnés. La recherche par mots clés est une façon simple et intuitive d'interroger les sources de données. Dans le cas des sources de données RDF, cette recherche pose un certain nombre de problèmes, comme l'indexation des éléments du graphe, l'identification des fragments du graphe pertinents pour une requête spécifique, l'agrégation de ces fragments pour former un résultat, et le classement des résultats obtenus. Nous abordons dans cette thèse ces différents problèmes, et nous proposons une approche qui permet, en réponse à une requête mots clés, de construire une liste de sous-graphes et de les classer, chaque sous-graphe correspondant à un résultat pertinent pour la requête. Pour chacune des deux stratégies d'exploration d'une source RDF, nous nous sommes intéressés à prendre en compte de la connaissance externe, permettant de mieux répondre aux besoins des utilisateurs. Cette connaissance externe peut représenter des connaissances du domaine, qui permettent de préciser le besoin exprimé dans le cas d'une requête, ou de prendre en compte des connaissances permettant d'affiner la définition des thèmes. Dans notre travail, nous nous sommes intéressés à formaliser cette connaissance externe et nous avons pour cela introduit la notion de pattern. Ces patterns représentent des équivalences de propriétés et de chemins dans le graphe représentant la source. Ils sont évalués et intégrés dans le processus d'exploration pour améliorer la qualité des résultats. / An increasing number of datasets is published on the Web, expressed in languages proposed by the W3C to describe Web data such as RDF, RDF(S) and OWL. The Web has become an unprecedented source of information available for users and applications, but the meaningful usage of this information source is still a challenge.
Querying these data sources requires knowledge of a formal query language such as SPARQL, but it mainly suffers from the lack of knowledge about the source itself, which is required in order to target the resources and properties relevant for the specific needs of the application. The work described in this thesis addresses the exploration of RDF data sources. This exploration is carried out in two complementary ways: discovering the themes or topics representing the content of the data source, and providing support for an alternative way of querying the data sources by using keywords instead of a query formulated in SPARQL. The proposed exploration approach combines two complementary strategies: thematic-based exploration and keyword search. Theme discovery from an RDF dataset consists in identifying a set of sub-graphs which are not necessarily disjoint, and such that each one represents a coherent set of semantically related resources defining a theme according to the point of view of the user. These themes can be used to enable a thematic exploration of the data source where users can target the relevant theme and limit their exploration to the resources composing this theme. Keyword search is a simple and intuitive way of querying data sources. In the case of RDF datasets, this search raises several problems, such as indexing graph elements, identifying the relevant graph fragments for a specific query, aggregating these relevant fragments to build the query results, and ranking these results. In our work, we address these different problems and we propose an approach which takes as input a keyword query and provides a list of sub-graphs, each one representing a candidate result for the query. These sub-graphs are ordered according to their relevance to the query. For both keyword search and theme identification in RDF data sources, we have taken into account some external knowledge in order to capture the users' needs, or to bridge the gap between the concepts invoked in a query and those of the data source. This external knowledge could be domain knowledge that helps refine the user's need expressed in a query, or knowledge that refines the definition of themes. In our work, we have proposed a formalization of this external knowledge and we have introduced the notion of pattern to this end. These patterns represent equivalences between properties and paths in the dataset. They are evaluated and integrated in the exploration process to improve the quality of the results.
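A minimal, hypothetical sketch of the keyword-search side of the approach described above: triples whose literals contain a query keyword are collected as candidate fragments, and a small table of equivalent properties (a stand-in for the thesis's patterns) widens the match.

```python
# Sketch of keyword matching over an RDF-like triple set, with a small table of
# property equivalences standing in for the "patterns" the thesis formalizes.

TRIPLES = [
    (":p1", ":name", "Marie Curie"),
    (":p1", ":field", "Physics"),
    (":p2", ":label", "Curie Institute"),
]
# Hypothetical pattern: ':name' and ':label' are considered equivalent properties.
EQUIVALENT = {":name": {":name", ":label"}, ":label": {":name", ":label"}}

def keyword_fragments(keyword, properties=None):
    """Return triples whose literal mentions the keyword, optionally widened
    to any property declared equivalent to the requested ones."""
    props = set()
    for p in (properties or []):
        props |= EQUIVALENT.get(p, {p})
    return [t for t in TRIPLES
            if keyword.lower() in t[2].lower() and (not props or t[1] in props)]

print(keyword_fragments("curie"))                        # both matching triples
print(keyword_fragments("curie", properties=[":name"]))  # widened to ':label' via the pattern
```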
|
18 |
[en] A KEYWORD-BASED QUERY PROCESSING METHOD FOR DATASETS WITH SCHEMAS / [pt] MÉTODO PARA O PROCESSAMENTO DE CONSULTAS POR PALAVRAS-CHAVES PARA BASES DE DADOS COM ESQUEMAS. GRETTEL MONTEAGUDO GARCÍA 23 June 2020 (has links)
[pt] Usuários atualmente esperam consultar dados de maneira semelhante ao Google, digitando alguns termos, chamados palavras-chave, e deixando para o sistema recuperar os dados que melhor correspondem ao conjunto de palavras-chave. O cenário é bem diferente em sistemas de gerenciamento de banco de dados em que os usuários precisam conhecer linguagens de consulta sofisticadas para recuperar dados, ou em aplicações de banco de dados em que as interfaces de usuário são projetadas como inúmeras caixas que o usuário deve preencher com seus parâmetros de pesquisa. Esta tese descreve um algoritmo e um framework projetados para processar consultas baseadas em palavras-chave para bases de dados com esquema, especificamente bancos relacionais e bases de dados em RDF. O algoritmo primeiro converte uma consulta baseada em palavras-chave em uma consulta abstrata e, em seguida, compila a consulta abstrata em uma consulta SPARQL ou SQL, de modo que cada resultado da consulta SPARQL (resp. SQL) seja uma resposta para a consulta baseada em palavras-chave. O algoritmo explora o esquema para evitar a intervenção do usuário durante o processo de busca e oferece um mecanismo de feedback para gerar novas respostas. A tese termina com experimentos nas bases de dados Mondial, IMDb e Musicbrainz. O algoritmo proposto obtém resultados satisfatórios para os benchmarks. Como parte dos experimentos, a tese também compara os resultados e o desempenho obtidos com bases de dados em RDF e bancos de dados relacionais. / [en] Users currently expect to query data in a Google-like style, by simply typing some terms, called keywords, and leaving it to the system to retrieve the data that best match the set of keywords. The scenario is quite different in database management systems, where users need to know sophisticated query languages to retrieve data, and in database applications, where the user interfaces are designed as a stack of pages with numerous boxes that the user must fill with his search parameters. This thesis describes an algorithm and a framework designed to support keyword-based queries for datasets with schema, specifically RDF datasets and relational databases. The algorithm first translates a keyword-based query into an abstract query, and then compiles the abstract query into a SPARQL or a SQL query such that each result of the SPARQL (resp. SQL) query is an answer for the keyword-based query. It explores the schema to avoid user intervention during the translation process and offers a feedback mechanism to generate new answers. The thesis concludes with experiments over the Mondial, IMDb, and Musicbrainz databases. The proposed translation algorithm achieves satisfactory results and good performance for the benchmarks. The experiments also compare the RDF and the relational alternatives.
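A highly simplified, hypothetical sketch of the compilation idea described in this entry: keywords are matched against a toy schema to pick classes, properties and literals, and a SPARQL query is assembled from that match. The dissertation's algorithm additionally handles joins, ranking of interpretations, feedback and a SQL target, none of which is shown here.

```python
# Sketch: translate a keyword query into a SPARQL query using schema metadata.
# The schema dictionary and the generated query shape are illustrative only.

SCHEMA = {
    "classes": {"country": ":Country", "city": ":City"},
    "properties": {"population": ":population", "capital": ":capital"},
}

def keywords_to_sparql(keywords):
    classes = [SCHEMA["classes"][k] for k in keywords if k in SCHEMA["classes"]]
    props = [SCHEMA["properties"][k] for k in keywords if k in SCHEMA["properties"]]
    literals = [k for k in keywords
                if k not in SCHEMA["classes"] and k not in SCHEMA["properties"]]
    lines = ["SELECT * WHERE {"]
    if classes:
        lines.append(f"  ?x a {classes[0]} .")                 # keyword matched a class
    for i, p in enumerate(props):
        lines.append(f"  ?x {p} ?v{i} .")                      # keyword matched a property
    for i, lit in enumerate(literals):
        lines.append(f'  ?x ?p{i} ?lit{i} . FILTER(CONTAINS(LCASE(STR(?lit{i})), "{lit}"))')
    lines.append("}")
    return "\n".join(lines)

print(keywords_to_sparql(["country", "population", "brazil"]))
```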
|
19 |
[pt] BUSCA POR PALAVRAS-CHAVE SOBRE GRAFOS RDF FEDERADOS EXPLORANDO SEUS ESQUEMAS / [en] KEYWORD SEARCH OVER FEDERATED RDF GRAPHS BY EXPLORING THEIR SCHEMAS. YENIER TORRES IZQUIERDO 28 July 2017 (has links)
[pt] O Resource Description Framework (RDF) foi adotado como uma recomendação do W3C em 1999 e hoje é um padrão para troca de dados na Web. De fato, uma grande quantidade de dados foi convertida em RDF, muitas vezes em vários conjuntos de dados fisicamente distribuídos ao longo de diferentes localizações. A linguagem de consulta SPARQL (sigla do inglês de SPARQL Protocol and RDF Query Language) foi oficialmente introduzido em 2008 para recuperar dados RDF e fornecer endpoints para consultar fontes distribuídas. Uma maneira alternativa de acessar conjuntos de dados RDF é usar consultas baseadas em palavras-chave, uma área que tem sido extensivamente pesquisada, com foco recente no conteúdo da Web. Esta dissertação descreve uma estratégia para compilar consultas baseadas em palavras-chave em consultas SPARQL federadas sobre conjuntos de dados RDF distribuídos, assumindo que cada conjunto de dados RDF tem um esquema e que a federação tem um esquema mediado. O processo de compilação da consulta SPARQL federada é explicado em detalhe, incluindo como computar o conjunto de joins externos entre as subconsultas locais geradas, como combinar, com a ajuda de cláusulas UNION, os resultados de consultas locais que não têm joins entre elas, e como construir a cláusula TARGET, de acordo com a composição da cláusula WHERE. Finalmente, a dissertação cobre experimentos com dados do mundo real para validar a implementação. / [en] The Resource Description Framework (RDF) was adopted as a W3C recommendation in 1999 and today is a standard for exchanging data in the Web. Indeed, a large amount of data has been converted to RDF, often as multiple datasets physically distributed over different locations. The SPARQL Protocol and RDF Query Language (SPARQL) was officially introduced in 2008 to retrieve RDF datasets and provide endpoints to query distributed sources. An alternative way to access RDF datasets is to use keyword-based queries, an area that has been extensively researched, with a recent focus on Web content. This dissertation describes a strategy to compile keyword-based queries into federated SPARQL queries over distributed RDF datasets, under the assumption that each RDF dataset has a schema and that the federation has a mediated schema. The compilation process of the federated SPARQL query is explained in detail, including how to compute a set of external joins between the local subqueries, how to combine, with the help of the UNION clauses, the results of local queries which have no external joins between them, and how to construct the TARGET clause, according to the structure of the WHERE clause. Finally, the dissertation covers experiments with real-world data to validate the implementation.
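To illustrate the shape of the federated queries described in this entry (not the dissertation's actual output), the sketch below assembles a SPARQL 1.1 query in which each local subquery is wrapped in a SERVICE clause against a hypothetical endpoint and the subqueries are joined on a shared variable.

```python
# Sketch: build a federated SPARQL 1.1 query with one SERVICE clause per
# (hypothetical) endpoint; the join between the subqueries happens on ?person.

ENDPOINTS = {
    "http://example.org/people/sparql": "?person :name ?name .",
    "http://example.org/movies/sparql": "?person :directed ?movie .",
}

def build_federated_query(endpoints):
    blocks = [f"  SERVICE <{url}> {{ {pattern} }}" for url, pattern in endpoints.items()]
    return "SELECT ?name ?movie WHERE {\n" + "\n".join(blocks) + "\n}"

print(build_federated_query(ENDPOINTS))
```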
|
20 |
Leyline : a provenance-based desktop search system using graphical sketchpad user interface. Ghorashi, Seyed Soroush 07 December 2011 (has links)
While there are powerful keyword search systems that index all kinds of resources, including emails and web pages, people have trouble recalling semantic facts such as the name, location, edit dates and keywords that uniquely identify resources in their personal repositories. Reusing information exacerbates this problem. A rarely used approach is to leverage episodic memory of file provenance. Provenance is traditionally defined as "the history of ownership of a valued object". In terms of documents, we consider not only the ownership, but also the operations performed on the document, especially those that relate it to other people, events, or resources. This thesis investigates the potential advantages of using provenance data in desktop search, and consists of two manuscripts. First, a numerical analysis using field data from a longitudinal study shows that provenance information can effectively be used to identify files and resources in realistic repositories. We introduce the Leyline, the first provenance-based search system that supports dynamic relations between files and resources such as copy/paste, save as, and file rename. The Leyline allows users to search by drawing search queries as graphs in a sketchpad. The Leyline overlays provenance information that may help users identify targets or explore information flow. A limited controlled experiment showed that this approach is feasible in terms of time and effort. Second, we explore the design of the Leyline and compare it to previous provenance-based desktop search systems, including their underlying assumptions and focus, search coverage and flexibility, and features and limitations. / Graduation date: 2012
|