Spelling suggestions: "subject:"[een] KEYWORD SEARCH"" "subject:"[enn] KEYWORD SEARCH""
1 |
Query Expansion For Handling Exploratory And Ambiguous Keyword QueriesJanuary 2011 (has links)
abstract: Query Expansion is a functionality of search engines that suggest a set of related queries for a user issued keyword query. In case of exploratory or ambiguous keyword queries, the main goal of the user would be to identify and select a specific category of query results among different categorical options, in order to narrow down the search and reach the desired result. Typical corpus-driven keyword query expansion approaches return popular words in the results as expanded queries. These empirical methods fail to cover all semantics of categories present in the query results. More importantly these methods do not consider the semantic relationship between the keywords featured in an expanded query. Contrary to a normal keyword search setting, these factors are non-trivial in an exploratory and ambiguous query setting where the user's precise discernment of different categories present in the query results is more important for making subsequent search decisions. In this thesis, I propose a new framework for keyword query expansion: generating a set of queries that correspond to the categorization of original query results, which is referred as Categorizing query expansion. Two approaches of algorithms are proposed, one that performs clustering as pre-processing step and then generates categorizing expanded queries based on the clusters. The other category of algorithms handle the case of generating quality expanded queries in the presence of imperfect clusters. / Dissertation/Thesis / M.S. Computer Science 2011
|
2 |
Enhancing the Usability of Complex Structured Data by Supporting Keyword SearchesJanuary 2011 (has links)
abstract: As pointed out in the keynote speech by H. V. Jagadish in SIGMOD'07, and also commonly agreed in the database community, the usability of structured data by casual users is as important as the data management systems' functionalities. A major hardness of using structured data is the problem of easily retrieving information from them given a user's information needs. Learning and using a structured query language (e.g., SQL and XQuery) is overwhelmingly burdensome for most users, as not only are these languages sophisticated, but the users need to know the data schema. Keyword search provides us with opportunities to conveniently access structured data and potentially significantly enhances the usability of structured data. However, processing keyword search on structured data is challenging due to various types of ambiguities such as structural ambiguity (keyword queries have no structure), keyword ambiguity (the keywords may not be accurate), user preference ambiguity (the user may have implicit preferences that are not indicated in the query), as well as the efficiency challenges due to large search space. This dissertation performs an expansive study on keyword search processing techniques as a gateway for users to access structured data and retrieve desired information. The key issues addressed include: (1) Resolving structural ambiguities in keyword queries by generating meaningful query results, which involves identifying relevant keyword matches, identifying return information, composing query results based on relevant matches and return information. (2) Resolving structural, keyword and user preference ambiguities through result analysis, including snippet generation, result differentiation, result clustering, result summarization/query expansion, etc. (3) Resolving the efficiency challenge in processing keyword search on structured data by utilizing and efficiently maintaining materialized views. These works deliver significant technical contributions towards building a full-fledged search engine for structured data. / Dissertation/Thesis / Ph.D. Computer Science 2011
|
3 |
Semantic Keyword Search on Large-Scale Semi-Structured DataJanuary 2016 (has links)
abstract: Keyword search provides a simple and user-friendly mechanism for information search, and has become increasingly popular for accessing structured or semi-structured data. However, there are two open issues of keyword search on semi/structured data which are not well addressed by existing work yet.
First, while an increasing amount of investigation has been done in this important area, most existing work concentrates on efficiency instead of search quality and may fail to deliver high quality results from semantic perspectives. Majority of the existing work generates minimal sub-graph results that are oblivious to the entity and relationship semantics embedded in the data and in the user query. There are also studies that define results to be subtrees or subgraphs that contain all query keywords but are not necessarily ``minimal''. However, such result construction method suffers from the same problem of semantic mis-alignment between data and user query. In this work the semantics of how to {\em define} results that can capture users' search intention and then the generation of search intention aware results is studied.
Second, most existing research is incapable of handling large-scale structured data. However, as data volume has seen rapid growth in recent years, the problem of how to efficiently process keyword queries on large-scale structured data becomes important. MapReduce is widely acknowledged as an effective programming model to process big data. For keyword query processing on data graph, first graph algorithms which can efficiently return query results that are consistent with users' search intention are proposed. Then these algorithms are migrated to MapReduce to support big data. For keyword query processing on schema graph, it first transforms a keyword query into multiple SQL queries, then all generated SQL queries are run on the structured data. Therefore it is crucial to find the optimal way to execute a SQL query using MapReduce, which can minimize the processing time. In this work, a system called SOSQL is developed which generates the optimal query execution plan using MapReduce for a SQL query $Q$ with time complexity $O(n^2)$, where $n$ is the number of input tables of $Q$. / Dissertation/Thesis / Doctoral Dissertation Computer Science 2016
|
4 |
Resource-dependent acoustic and language modeling for spoken keyword searchChen, I-Fan 27 May 2016 (has links)
In this dissertation, three research directions were explored to alleviate two major issues, i.e., the use of incorrect models and training/test condition mismatches, in the modeling frameworks of modern spoken keyword search (KWS) systems. Each of the three research directions, which include (i) data-efficient training processes, (ii) system optimization objectives, and (iii) data augmentation, utilizes different types and amounts of training resources in different ways to ameliorate the two issues of acoustic and language modeling in modern KWS systems. To be more specific, resource-dependent keyword modeling, keyword-boosted sMBR (state-level minimum Bayes risk) training, and multilingual acoustic modeling are proposed and investigated for acoustic modeling in this research. For language modeling, keyword-aware language modeling, discriminative keyword-aware language modeling, and web text augmented language modeling are presented and discussed. The dissertation provides a comprehensive collection of solutions and strategies to the acoustic and language modeling problems in KWS. It also offers insights into the realization of good-performance KWS systems. Experimental results show that the data-efficient training process and data augmentation are the two directions providing the most prominent performance improvement for KWS systems. While modifying system optimization objectives provides smaller yet consistent performance enhancement in KWS systems with different configurations. The effects of the proposed acoustic and language modeling approaches in the three directions are also shown to be additive and can be combined to further improve the overall KWS system performance.
|
5 |
[en] DISTRIBUTED RDF GRAPH KEYWORD SEARCH / [pt] BUSCA DISTRIBUÍDA EM GRAFO RDF POR PALAVRA-CHAVEDANILO MORET RODRIGUES 26 December 2014 (has links)
[pt] O objetivo desta dissertação é melhorar a busca por palavra-chave em
formato RDF. Propomos uma abordagem escalável, baseada numa representação
tensorial, que permite o armazenamento distribuído e, como consequência, o uso
de técnicas de paralelismo para agilizar a busca sobre grandes bases de RDF, em
particular, as publicadas como Linked Data. Um volume sem precedentes de
informação está sendo disponibilizado seguindo os princípios de Linked Data,
formando o que chamamos de Web of Data. Esta informação, tipicamente
codificada como triplas RDF, costuma ser representada como um grafo, onde
sujeitos e objetos são vértices, e predicados são arestas ligando os vértices. Em
consequência da ampla adoção de mecanismos de busca na World Wide Web,
usuários estão familiarizados com a busca por palavra-chave. No caso de grafos
RDF, no entanto, a extração de uma partição coerente de grafos para enriquecer os
resultados da busca é uma tarefa cara, demorada, e cuja expectativa do usuário é
de que seja executada em tempo real. Este trabalho tem como objetivo o
tratamento deste problema. Parte de uma solução proposta recentemente prega a
indexação do grafo RDF como uma matriz esparsa, que contém um conjunto de
informações pré-computadas para agilizar a extração de seções do grafo, e o uso
de consultas baseadas em tensores sobre a matriz esparsa. Esta abordagem
baseada em tensores permite que se tome vantagem de técnicas modernas de
programação distribuída, e.g., a utilização de bases de dados não-relacionais
fracionadas e o modelo de MapReduce. Nesta dissertação, propomos o desenho e
exploramos a viabilidade da abordagem baseada em tensores, com o objetivo de
construir um depósito de dados distribuído e agilizar a busca por palavras-chave
com uma abordagem paralela. / [en] The goal of this dissertation is to improve RDF keyword search. We
propose a scalable approach, based on a tensor representation that allows for
distributed storage, and thus the use of parallel techniques to speed up the search
over large linked data sets, in particular those published as Linked Data. An
unprecedented amount of information is becoming available following the
principles of Linked Data, forming what is called the Web of Data. This
information, typically codified as RDF subject-predicate-object triples, is
commonly abstracted as a graph which subjects and objects are nodes, and
predicates are edges connecting them. As a consequence of the widespread
adoption of search engines on the World Wide Web, users are familiar with
keyword search. For RDF graphs, however, extracting a coherent subset of data
graphs to enrich search results is a time consuming and expensive task, and it is
expected to be executed on-the-fly at user prompt. The dissertation s goal is to
handle this problem. A recent proposal has been made to index RDF graphs as a
sparse matrix with the pre-computed information necessary for faster retrieval of
sub-graphs, and the use of tensor-based queries over the sparse matrix. The tensor
approach can leverage modern distributed computing techniques, e.g., nonrelational
database sharding and the MapReduce model. In this dissertation, we
propose a design and explore the viability of the tensor-based approach to build a
distributed datastore and speed up keyword search with a parallel approach.
|
6 |
Suporte a consultas temporais por palavras-chave em documentos XML / Supporting temporal keyword queries on XML documentsManica, Edimar January 2010 (has links)
Consultas por palavras-chave permitem o acesso fácil a dados XML, uma vez que não exigem que o usuário aprenda uma linguagem de consulta estruturada nem estude possíveis esquemas de dados complexos. Com isso, vários motores de busca XML foram propostos para permitir a extração de fragmentos XML relevantes para consultas por palavras-chave. No entanto, esses motores de busca tratam as expressões temporais da mesma forma que qualquer outra palavra-chave. Essa abordagem ocasiona inúmeros problemas, como por exemplo, considerar como casamentos para uma expressão temporal nodos do domínio preço ou código. Este trabalho descreve TPI (Two Phase Interception), uma abordagem que permite o suporte a consultas temporais por palavras-chave em documentos XML orientados a dados. O suporte a consultas temporais é realizado através de uma camada adicional de software que executa duas interceptações no processamento de consultas, realizado por um motor de busca XML. Esta camada adicional de software é responsável pelo tratamento adequado das informações temporais presentes na consulta e no conteúdo dos documentos XML. O trabalho ainda especifica TKC (Temporal Keyword Classification), uma classificação de consultas temporais que serve de guia para qualquer mecanismo de consulta por palavras-chave, inclusive TPI. São apresentados os algoritmos de mapeamento das diferentes formas de predicados temporais por palavras-chave, especificadas em TKC, para expressões relacionais a fim de orientar a implementação do processamento das consultas temporais. É proposto um índice temporal e definidas estratégias para identificação de caminhos temporais, desambiguação de formatos de valores temporais, identificação de datas representadas por vários elementos e identificação de intervalos temporais. São demonstrados experimentos que comparam a qualidade, o tempo de processamento e a escalabilidade de um motor de busca XML com e sem a utilização de TPI. A principal contribuição desse trabalho é melhorar significativamente a qualidade dos resultados de consultas temporais por palavras-chave em documentos XML. / Keyword queries enable users to easily access XML data, since the user does not need to learn a structured query language or study possibly complex data schemas. Therewith, several XML search engines have been proposed to extract relevant XML fragments in response to keyword queries. However, these search engines treat the temporal expressions as any other keyword. This approach may lead to several problems. It could, for example, consider prices and codes as matches to a temporal expression. This work describes TPI (Two Phase Interception), an approach that supports temporal keyword queries on data-centric XML documents. The temporal query support is performed by adding an additional software layer that executes two interceptions in the query processing performed by a XML search engine. This additional software layer is responsible for the adequate treatment of the temporal expressions contained in the query and in the contents of the XML documents. This work also specifies TKC (Temporal Keyword Classification), a temporal query classification to be used as guidance for any keyword query mechanism, including TPI. We present the algorithms for mapping different temporal predicates expressed by keywords to relational expressions in order to guide the implementation of the temporal query processing. We propose a temporal index together with strategies to perform temporal path identification, format disambiguation, identification of dates represented by many elements and detection of temporal intervals. This work also reports on experiments which evaluate quality, processing time and scalability of an XML search engine with TPI and without TPI. The main contribution of this work is the significant improvement in the quality of the results of temporal keyword queries on XML documents.
|
7 |
Suporte a consultas temporais por palavras-chave em documentos XML / Supporting temporal keyword queries on XML documentsManica, Edimar January 2010 (has links)
Consultas por palavras-chave permitem o acesso fácil a dados XML, uma vez que não exigem que o usuário aprenda uma linguagem de consulta estruturada nem estude possíveis esquemas de dados complexos. Com isso, vários motores de busca XML foram propostos para permitir a extração de fragmentos XML relevantes para consultas por palavras-chave. No entanto, esses motores de busca tratam as expressões temporais da mesma forma que qualquer outra palavra-chave. Essa abordagem ocasiona inúmeros problemas, como por exemplo, considerar como casamentos para uma expressão temporal nodos do domínio preço ou código. Este trabalho descreve TPI (Two Phase Interception), uma abordagem que permite o suporte a consultas temporais por palavras-chave em documentos XML orientados a dados. O suporte a consultas temporais é realizado através de uma camada adicional de software que executa duas interceptações no processamento de consultas, realizado por um motor de busca XML. Esta camada adicional de software é responsável pelo tratamento adequado das informações temporais presentes na consulta e no conteúdo dos documentos XML. O trabalho ainda especifica TKC (Temporal Keyword Classification), uma classificação de consultas temporais que serve de guia para qualquer mecanismo de consulta por palavras-chave, inclusive TPI. São apresentados os algoritmos de mapeamento das diferentes formas de predicados temporais por palavras-chave, especificadas em TKC, para expressões relacionais a fim de orientar a implementação do processamento das consultas temporais. É proposto um índice temporal e definidas estratégias para identificação de caminhos temporais, desambiguação de formatos de valores temporais, identificação de datas representadas por vários elementos e identificação de intervalos temporais. São demonstrados experimentos que comparam a qualidade, o tempo de processamento e a escalabilidade de um motor de busca XML com e sem a utilização de TPI. A principal contribuição desse trabalho é melhorar significativamente a qualidade dos resultados de consultas temporais por palavras-chave em documentos XML. / Keyword queries enable users to easily access XML data, since the user does not need to learn a structured query language or study possibly complex data schemas. Therewith, several XML search engines have been proposed to extract relevant XML fragments in response to keyword queries. However, these search engines treat the temporal expressions as any other keyword. This approach may lead to several problems. It could, for example, consider prices and codes as matches to a temporal expression. This work describes TPI (Two Phase Interception), an approach that supports temporal keyword queries on data-centric XML documents. The temporal query support is performed by adding an additional software layer that executes two interceptions in the query processing performed by a XML search engine. This additional software layer is responsible for the adequate treatment of the temporal expressions contained in the query and in the contents of the XML documents. This work also specifies TKC (Temporal Keyword Classification), a temporal query classification to be used as guidance for any keyword query mechanism, including TPI. We present the algorithms for mapping different temporal predicates expressed by keywords to relational expressions in order to guide the implementation of the temporal query processing. We propose a temporal index together with strategies to perform temporal path identification, format disambiguation, identification of dates represented by many elements and detection of temporal intervals. This work also reports on experiments which evaluate quality, processing time and scalability of an XML search engine with TPI and without TPI. The main contribution of this work is the significant improvement in the quality of the results of temporal keyword queries on XML documents.
|
8 |
Suporte a consultas temporais por palavras-chave em documentos XML / Supporting temporal keyword queries on XML documentsManica, Edimar January 2010 (has links)
Consultas por palavras-chave permitem o acesso fácil a dados XML, uma vez que não exigem que o usuário aprenda uma linguagem de consulta estruturada nem estude possíveis esquemas de dados complexos. Com isso, vários motores de busca XML foram propostos para permitir a extração de fragmentos XML relevantes para consultas por palavras-chave. No entanto, esses motores de busca tratam as expressões temporais da mesma forma que qualquer outra palavra-chave. Essa abordagem ocasiona inúmeros problemas, como por exemplo, considerar como casamentos para uma expressão temporal nodos do domínio preço ou código. Este trabalho descreve TPI (Two Phase Interception), uma abordagem que permite o suporte a consultas temporais por palavras-chave em documentos XML orientados a dados. O suporte a consultas temporais é realizado através de uma camada adicional de software que executa duas interceptações no processamento de consultas, realizado por um motor de busca XML. Esta camada adicional de software é responsável pelo tratamento adequado das informações temporais presentes na consulta e no conteúdo dos documentos XML. O trabalho ainda especifica TKC (Temporal Keyword Classification), uma classificação de consultas temporais que serve de guia para qualquer mecanismo de consulta por palavras-chave, inclusive TPI. São apresentados os algoritmos de mapeamento das diferentes formas de predicados temporais por palavras-chave, especificadas em TKC, para expressões relacionais a fim de orientar a implementação do processamento das consultas temporais. É proposto um índice temporal e definidas estratégias para identificação de caminhos temporais, desambiguação de formatos de valores temporais, identificação de datas representadas por vários elementos e identificação de intervalos temporais. São demonstrados experimentos que comparam a qualidade, o tempo de processamento e a escalabilidade de um motor de busca XML com e sem a utilização de TPI. A principal contribuição desse trabalho é melhorar significativamente a qualidade dos resultados de consultas temporais por palavras-chave em documentos XML. / Keyword queries enable users to easily access XML data, since the user does not need to learn a structured query language or study possibly complex data schemas. Therewith, several XML search engines have been proposed to extract relevant XML fragments in response to keyword queries. However, these search engines treat the temporal expressions as any other keyword. This approach may lead to several problems. It could, for example, consider prices and codes as matches to a temporal expression. This work describes TPI (Two Phase Interception), an approach that supports temporal keyword queries on data-centric XML documents. The temporal query support is performed by adding an additional software layer that executes two interceptions in the query processing performed by a XML search engine. This additional software layer is responsible for the adequate treatment of the temporal expressions contained in the query and in the contents of the XML documents. This work also specifies TKC (Temporal Keyword Classification), a temporal query classification to be used as guidance for any keyword query mechanism, including TPI. We present the algorithms for mapping different temporal predicates expressed by keywords to relational expressions in order to guide the implementation of the temporal query processing. We propose a temporal index together with strategies to perform temporal path identification, format disambiguation, identification of dates represented by many elements and detection of temporal intervals. This work also reports on experiments which evaluate quality, processing time and scalability of an XML search engine with TPI and without TPI. The main contribution of this work is the significant improvement in the quality of the results of temporal keyword queries on XML documents.
|
9 |
Utilising semantic technologies for intelligent indexing and retrieval of digital imagesOsman, T., Thakker, Dhaval, Schaefer, G. 15 October 2013 (has links)
Yes / Yes / The proliferation of digital media has led to a huge interest in classifying and indexing media objects for generic search and usage. In particular, we are witnessing a colossal growth in digital image repositories that are difficult to navigate using free-text search mechanisms, which often return inaccurate matches as they in principle rely on statistical analysis of query keyword recurrence in the image annotation or surrounding text. In this paper we present a semantically-enabled image annotation and retrieval engine that is designed to satisfy the requirements of the commercial image collections market in terms of both accuracy and efficiency of the retrieval process. Our search engine relies on methodically structured ontologies for image annotation, thus allowing for more intelligent reasoning about the image content and subsequently obtaining a more accurate set of results and a richer set of alternatives matchmaking the original query. We also show how our well-analysed and designed domain ontology contributes to the implicit expansion of user queries as well as the exploitation of lexical databases for explicit semantic-based query expansion.
|
10 |
Towards Secure Outsourced Data Services in the Public CloudSun, Wenhai 25 July 2018 (has links)
Past few years have witnessed a dramatic shift for IT infrastructures from a self-sustained model to a centralized and multi-tenant elastic computing paradigm -- Cloud Computing, which significantly reshapes the landscape of existing data utilization services. In truth, public cloud service providers (CSPs), e.g. Google, Amazon, offer us unprecedented benefits, such as ubiquitous and flexible access, considerable capital expenditure savings and on-demand resource allocation. Cloud has become the virtual ``brain" as well to support and propel many important applications and system designs, for example, artificial intelligence, Internet of Things, and so forth; on the flip side, security and privacy are among the primary concerns with the adoption of cloud-based data services in that the user loses control of her/his outsourced data. Encrypting the sensitive user information certainly ensures the confidentiality. However, encryption places an extra layer of ambiguity and its direct use may be at odds with the practical requirements and defeat the purpose of cloud computing technology. We believe that security in nature should not be in contravention of the cloud outsourcing model. Rather, it is expected to complement the current achievements to further fuel the wide adoption of the public cloud service. This, in turn, requires us not to decouple them from the very beginning of the system design. Drawing the successes and failures from both academia and industry, we attempt to answer the challenges of realizing efficient and useful secure data services in the public cloud. In particular, we pay attention to security and privacy in two essential functions of the cloud ``brain", i.e. data storage and processing. Our first work centers on the secure chunk-based deduplication of encrypted data for cloud backup and achieves the performance comparable to the plaintext cloud storage deduplication while effectively mitigating the information leakage from the low-entropy chunks. On the other hand, we comprehensively study the promising yet challenging issue of search over encrypted data in the cloud environment, which allows a user to delegate her/his search task to a CSP server that hosts a collection of encrypted files while still guaranteeing some measure of query privacy. In order to accomplish this grand vision, we explore both software-based secure computation research that often relies on cryptography and concentrates on algorithmic design and theoretical proof, and trusted execution solutions that depend on hardware-based isolation and trusted computing. Hopefully, through the lens of our efforts, insights could be furnished into future research in the related areas. / Ph. D. / Past few years have witnessed a dramatic shift for IT infrastructures from a self-sustained model to a centralized and multi-tenant elastic computing paradigm – Cloud Computing, which significantly reshapes the landscape of existing data utilization services. In truth, public cloud service providers (CSPs), e.g. Google, Amazon, offer us unprecedented benefits, such as ubiquitous and flexible access, considerable capital expenditure savings and on-demand resource allocation. Cloud has become the virtual “brain” as well to support and propel many important applications and system designs, for example, artificial intelligence, Internet of Things, and so forth; on the flip side, security and privacy are among the primary concerns with the adoption of cloud-based data services in that the user loses control of her/his outsourced data. Encryption definitely provides strong protection to user sensitive data, but it also disables the direct use of cloud data services and may defeat the purpose of cloud computing technology. We believe that security in nature should not be in contravention of the cloud outsourcing model. Rather, it is expected to complement the current achievements to further fuel the wide adoption of the public cloud service. This, in turn, requires us not to decouple them from the very beginning of the system design. Drawing the successes and failures from both academia and industry, we attempt to answer the challenges of realizing efficient and useful secure data services in the public cloud. In particular, we pay attention to security and privacy in two essential functions of the cloud “brain”, i.e. data storage and processing. The first part of this research aims to provide a privacy-preserving data deduplication scheme with the performance comparable to the existing cloud backup storage deduplication. In the second part, we attempt to secure the fundamental information retrieval functions and offer effective solutions in various contexts of cloud data services.
|
Page generated in 0.0376 seconds