Spelling suggestions: "subject:" keyword search"" "subject:" ikeyword search""
1 |
Query Expansion For Handling Exploratory And Ambiguous Keyword QueriesJanuary 2011 (has links)
abstract: Query Expansion is a functionality of search engines that suggest a set of related queries for a user issued keyword query. In case of exploratory or ambiguous keyword queries, the main goal of the user would be to identify and select a specific category of query results among different categorical options, in order to narrow down the search and reach the desired result. Typical corpus-driven keyword query expansion approaches return popular words in the results as expanded queries. These empirical methods fail to cover all semantics of categories present in the query results. More importantly these methods do not consider the semantic relationship between the keywords featured in an expanded query. Contrary to a normal keyword search setting, these factors are non-trivial in an exploratory and ambiguous query setting where the user's precise discernment of different categories present in the query results is more important for making subsequent search decisions. In this thesis, I propose a new framework for keyword query expansion: generating a set of queries that correspond to the categorization of original query results, which is referred as Categorizing query expansion. Two approaches of algorithms are proposed, one that performs clustering as pre-processing step and then generates categorizing expanded queries based on the clusters. The other category of algorithms handle the case of generating quality expanded queries in the presence of imperfect clusters. / Dissertation/Thesis / M.S. Computer Science 2011
|
2 |
Enhancing the Usability of Complex Structured Data by Supporting Keyword SearchesJanuary 2011 (has links)
abstract: As pointed out in the keynote speech by H. V. Jagadish in SIGMOD'07, and also commonly agreed in the database community, the usability of structured data by casual users is as important as the data management systems' functionalities. A major hardness of using structured data is the problem of easily retrieving information from them given a user's information needs. Learning and using a structured query language (e.g., SQL and XQuery) is overwhelmingly burdensome for most users, as not only are these languages sophisticated, but the users need to know the data schema. Keyword search provides us with opportunities to conveniently access structured data and potentially significantly enhances the usability of structured data. However, processing keyword search on structured data is challenging due to various types of ambiguities such as structural ambiguity (keyword queries have no structure), keyword ambiguity (the keywords may not be accurate), user preference ambiguity (the user may have implicit preferences that are not indicated in the query), as well as the efficiency challenges due to large search space. This dissertation performs an expansive study on keyword search processing techniques as a gateway for users to access structured data and retrieve desired information. The key issues addressed include: (1) Resolving structural ambiguities in keyword queries by generating meaningful query results, which involves identifying relevant keyword matches, identifying return information, composing query results based on relevant matches and return information. (2) Resolving structural, keyword and user preference ambiguities through result analysis, including snippet generation, result differentiation, result clustering, result summarization/query expansion, etc. (3) Resolving the efficiency challenge in processing keyword search on structured data by utilizing and efficiently maintaining materialized views. These works deliver significant technical contributions towards building a full-fledged search engine for structured data. / Dissertation/Thesis / Ph.D. Computer Science 2011
|
3 |
Semantic Keyword Search on Large-Scale Semi-Structured DataJanuary 2016 (has links)
abstract: Keyword search provides a simple and user-friendly mechanism for information search, and has become increasingly popular for accessing structured or semi-structured data. However, there are two open issues of keyword search on semi/structured data which are not well addressed by existing work yet.
First, while an increasing amount of investigation has been done in this important area, most existing work concentrates on efficiency instead of search quality and may fail to deliver high quality results from semantic perspectives. Majority of the existing work generates minimal sub-graph results that are oblivious to the entity and relationship semantics embedded in the data and in the user query. There are also studies that define results to be subtrees or subgraphs that contain all query keywords but are not necessarily ``minimal''. However, such result construction method suffers from the same problem of semantic mis-alignment between data and user query. In this work the semantics of how to {\em define} results that can capture users' search intention and then the generation of search intention aware results is studied.
Second, most existing research is incapable of handling large-scale structured data. However, as data volume has seen rapid growth in recent years, the problem of how to efficiently process keyword queries on large-scale structured data becomes important. MapReduce is widely acknowledged as an effective programming model to process big data. For keyword query processing on data graph, first graph algorithms which can efficiently return query results that are consistent with users' search intention are proposed. Then these algorithms are migrated to MapReduce to support big data. For keyword query processing on schema graph, it first transforms a keyword query into multiple SQL queries, then all generated SQL queries are run on the structured data. Therefore it is crucial to find the optimal way to execute a SQL query using MapReduce, which can minimize the processing time. In this work, a system called SOSQL is developed which generates the optimal query execution plan using MapReduce for a SQL query $Q$ with time complexity $O(n^2)$, where $n$ is the number of input tables of $Q$. / Dissertation/Thesis / Doctoral Dissertation Computer Science 2016
|
4 |
Design and Development of a Metadata-Driven Search Tool for use with Digital RecordingsRadke, Annemarie Katherine 19 June 2019 (has links)
It is becoming more common for researchers to use existing recordings as a source for data rather than to generate new media for research. Prior to the examination of recordings, data must be extracted from the recordings and the recordings must be described with metadata to allow users to search for the recordings and to search information within the recordings. The purpose of this small-scale study was to develop a web based search tool that will permit a comprehensive search of spoken information within a collection of existing digital recordings archived in an open-access digital repository. The study is significant to the field of instructional design and technology (IDT) as the digital recordings used in this study are interviews, which contain personal histories and insight from leaders and scholars who have influenced and advanced the field of IDT. This study explored and used design and development research methods for the development of a search tool for use with digital video interviews. The study applied speech recognition technology, tool prototypes, usability testing, expert review, and the skills of a program developer. Results from the study determined that the produced tool provided a more comprehensive and flexible search for users to locate content from within AECT Legends and Legacies Project video interviews. / Doctor of Philosophy / It is becoming more common for researchers to use existing recordings in studies. Prior to examination, the information about the recordings and within the recordings must be determined to allow users the ability to search information. The purpose of this small-scale study was to develop an online search tool that allows users to locate spoken words within a video interview. The study is important to the field of instructional design and technology (IDT) as the video interviews used in this study contain experience and insight from people who have advanced the field of IDT. Using current and free technology, this study developed a practical search tool to search information from AECT Legends and Legacies Project video interviews.
|
5 |
Resource-dependent acoustic and language modeling for spoken keyword searchChen, I-Fan 27 May 2016 (has links)
In this dissertation, three research directions were explored to alleviate two major issues, i.e., the use of incorrect models and training/test condition mismatches, in the modeling frameworks of modern spoken keyword search (KWS) systems. Each of the three research directions, which include (i) data-efficient training processes, (ii) system optimization objectives, and (iii) data augmentation, utilizes different types and amounts of training resources in different ways to ameliorate the two issues of acoustic and language modeling in modern KWS systems. To be more specific, resource-dependent keyword modeling, keyword-boosted sMBR (state-level minimum Bayes risk) training, and multilingual acoustic modeling are proposed and investigated for acoustic modeling in this research. For language modeling, keyword-aware language modeling, discriminative keyword-aware language modeling, and web text augmented language modeling are presented and discussed. The dissertation provides a comprehensive collection of solutions and strategies to the acoustic and language modeling problems in KWS. It also offers insights into the realization of good-performance KWS systems. Experimental results show that the data-efficient training process and data augmentation are the two directions providing the most prominent performance improvement for KWS systems. While modifying system optimization objectives provides smaller yet consistent performance enhancement in KWS systems with different configurations. The effects of the proposed acoustic and language modeling approaches in the three directions are also shown to be additive and can be combined to further improve the overall KWS system performance.
|
6 |
[en] DISTRIBUTED RDF GRAPH KEYWORD SEARCH / [pt] BUSCA DISTRIBUÍDA EM GRAFO RDF POR PALAVRA-CHAVEDANILO MORET RODRIGUES 26 December 2014 (has links)
[pt] O objetivo desta dissertação é melhorar a busca por palavra-chave em
formato RDF. Propomos uma abordagem escalável, baseada numa representação
tensorial, que permite o armazenamento distribuído e, como consequência, o uso
de técnicas de paralelismo para agilizar a busca sobre grandes bases de RDF, em
particular, as publicadas como Linked Data. Um volume sem precedentes de
informação está sendo disponibilizado seguindo os princípios de Linked Data,
formando o que chamamos de Web of Data. Esta informação, tipicamente
codificada como triplas RDF, costuma ser representada como um grafo, onde
sujeitos e objetos são vértices, e predicados são arestas ligando os vértices. Em
consequência da ampla adoção de mecanismos de busca na World Wide Web,
usuários estão familiarizados com a busca por palavra-chave. No caso de grafos
RDF, no entanto, a extração de uma partição coerente de grafos para enriquecer os
resultados da busca é uma tarefa cara, demorada, e cuja expectativa do usuário é
de que seja executada em tempo real. Este trabalho tem como objetivo o
tratamento deste problema. Parte de uma solução proposta recentemente prega a
indexação do grafo RDF como uma matriz esparsa, que contém um conjunto de
informações pré-computadas para agilizar a extração de seções do grafo, e o uso
de consultas baseadas em tensores sobre a matriz esparsa. Esta abordagem
baseada em tensores permite que se tome vantagem de técnicas modernas de
programação distribuída, e.g., a utilização de bases de dados não-relacionais
fracionadas e o modelo de MapReduce. Nesta dissertação, propomos o desenho e
exploramos a viabilidade da abordagem baseada em tensores, com o objetivo de
construir um depósito de dados distribuído e agilizar a busca por palavras-chave
com uma abordagem paralela. / [en] The goal of this dissertation is to improve RDF keyword search. We
propose a scalable approach, based on a tensor representation that allows for
distributed storage, and thus the use of parallel techniques to speed up the search
over large linked data sets, in particular those published as Linked Data. An
unprecedented amount of information is becoming available following the
principles of Linked Data, forming what is called the Web of Data. This
information, typically codified as RDF subject-predicate-object triples, is
commonly abstracted as a graph which subjects and objects are nodes, and
predicates are edges connecting them. As a consequence of the widespread
adoption of search engines on the World Wide Web, users are familiar with
keyword search. For RDF graphs, however, extracting a coherent subset of data
graphs to enrich search results is a time consuming and expensive task, and it is
expected to be executed on-the-fly at user prompt. The dissertation s goal is to
handle this problem. A recent proposal has been made to index RDF graphs as a
sparse matrix with the pre-computed information necessary for faster retrieval of
sub-graphs, and the use of tensor-based queries over the sparse matrix. The tensor
approach can leverage modern distributed computing techniques, e.g., nonrelational
database sharding and the MapReduce model. In this dissertation, we
propose a design and explore the viability of the tensor-based approach to build a
distributed datastore and speed up keyword search with a parallel approach.
|
7 |
Suporte a consultas temporais por palavras-chave em documentos XML / Supporting temporal keyword queries on XML documentsManica, Edimar January 2010 (has links)
Consultas por palavras-chave permitem o acesso fácil a dados XML, uma vez que não exigem que o usuário aprenda uma linguagem de consulta estruturada nem estude possíveis esquemas de dados complexos. Com isso, vários motores de busca XML foram propostos para permitir a extração de fragmentos XML relevantes para consultas por palavras-chave. No entanto, esses motores de busca tratam as expressões temporais da mesma forma que qualquer outra palavra-chave. Essa abordagem ocasiona inúmeros problemas, como por exemplo, considerar como casamentos para uma expressão temporal nodos do domínio preço ou código. Este trabalho descreve TPI (Two Phase Interception), uma abordagem que permite o suporte a consultas temporais por palavras-chave em documentos XML orientados a dados. O suporte a consultas temporais é realizado através de uma camada adicional de software que executa duas interceptações no processamento de consultas, realizado por um motor de busca XML. Esta camada adicional de software é responsável pelo tratamento adequado das informações temporais presentes na consulta e no conteúdo dos documentos XML. O trabalho ainda especifica TKC (Temporal Keyword Classification), uma classificação de consultas temporais que serve de guia para qualquer mecanismo de consulta por palavras-chave, inclusive TPI. São apresentados os algoritmos de mapeamento das diferentes formas de predicados temporais por palavras-chave, especificadas em TKC, para expressões relacionais a fim de orientar a implementação do processamento das consultas temporais. É proposto um índice temporal e definidas estratégias para identificação de caminhos temporais, desambiguação de formatos de valores temporais, identificação de datas representadas por vários elementos e identificação de intervalos temporais. São demonstrados experimentos que comparam a qualidade, o tempo de processamento e a escalabilidade de um motor de busca XML com e sem a utilização de TPI. A principal contribuição desse trabalho é melhorar significativamente a qualidade dos resultados de consultas temporais por palavras-chave em documentos XML. / Keyword queries enable users to easily access XML data, since the user does not need to learn a structured query language or study possibly complex data schemas. Therewith, several XML search engines have been proposed to extract relevant XML fragments in response to keyword queries. However, these search engines treat the temporal expressions as any other keyword. This approach may lead to several problems. It could, for example, consider prices and codes as matches to a temporal expression. This work describes TPI (Two Phase Interception), an approach that supports temporal keyword queries on data-centric XML documents. The temporal query support is performed by adding an additional software layer that executes two interceptions in the query processing performed by a XML search engine. This additional software layer is responsible for the adequate treatment of the temporal expressions contained in the query and in the contents of the XML documents. This work also specifies TKC (Temporal Keyword Classification), a temporal query classification to be used as guidance for any keyword query mechanism, including TPI. We present the algorithms for mapping different temporal predicates expressed by keywords to relational expressions in order to guide the implementation of the temporal query processing. We propose a temporal index together with strategies to perform temporal path identification, format disambiguation, identification of dates represented by many elements and detection of temporal intervals. This work also reports on experiments which evaluate quality, processing time and scalability of an XML search engine with TPI and without TPI. The main contribution of this work is the significant improvement in the quality of the results of temporal keyword queries on XML documents.
|
8 |
Suporte a consultas temporais por palavras-chave em documentos XML / Supporting temporal keyword queries on XML documentsManica, Edimar January 2010 (has links)
Consultas por palavras-chave permitem o acesso fácil a dados XML, uma vez que não exigem que o usuário aprenda uma linguagem de consulta estruturada nem estude possíveis esquemas de dados complexos. Com isso, vários motores de busca XML foram propostos para permitir a extração de fragmentos XML relevantes para consultas por palavras-chave. No entanto, esses motores de busca tratam as expressões temporais da mesma forma que qualquer outra palavra-chave. Essa abordagem ocasiona inúmeros problemas, como por exemplo, considerar como casamentos para uma expressão temporal nodos do domínio preço ou código. Este trabalho descreve TPI (Two Phase Interception), uma abordagem que permite o suporte a consultas temporais por palavras-chave em documentos XML orientados a dados. O suporte a consultas temporais é realizado através de uma camada adicional de software que executa duas interceptações no processamento de consultas, realizado por um motor de busca XML. Esta camada adicional de software é responsável pelo tratamento adequado das informações temporais presentes na consulta e no conteúdo dos documentos XML. O trabalho ainda especifica TKC (Temporal Keyword Classification), uma classificação de consultas temporais que serve de guia para qualquer mecanismo de consulta por palavras-chave, inclusive TPI. São apresentados os algoritmos de mapeamento das diferentes formas de predicados temporais por palavras-chave, especificadas em TKC, para expressões relacionais a fim de orientar a implementação do processamento das consultas temporais. É proposto um índice temporal e definidas estratégias para identificação de caminhos temporais, desambiguação de formatos de valores temporais, identificação de datas representadas por vários elementos e identificação de intervalos temporais. São demonstrados experimentos que comparam a qualidade, o tempo de processamento e a escalabilidade de um motor de busca XML com e sem a utilização de TPI. A principal contribuição desse trabalho é melhorar significativamente a qualidade dos resultados de consultas temporais por palavras-chave em documentos XML. / Keyword queries enable users to easily access XML data, since the user does not need to learn a structured query language or study possibly complex data schemas. Therewith, several XML search engines have been proposed to extract relevant XML fragments in response to keyword queries. However, these search engines treat the temporal expressions as any other keyword. This approach may lead to several problems. It could, for example, consider prices and codes as matches to a temporal expression. This work describes TPI (Two Phase Interception), an approach that supports temporal keyword queries on data-centric XML documents. The temporal query support is performed by adding an additional software layer that executes two interceptions in the query processing performed by a XML search engine. This additional software layer is responsible for the adequate treatment of the temporal expressions contained in the query and in the contents of the XML documents. This work also specifies TKC (Temporal Keyword Classification), a temporal query classification to be used as guidance for any keyword query mechanism, including TPI. We present the algorithms for mapping different temporal predicates expressed by keywords to relational expressions in order to guide the implementation of the temporal query processing. We propose a temporal index together with strategies to perform temporal path identification, format disambiguation, identification of dates represented by many elements and detection of temporal intervals. This work also reports on experiments which evaluate quality, processing time and scalability of an XML search engine with TPI and without TPI. The main contribution of this work is the significant improvement in the quality of the results of temporal keyword queries on XML documents.
|
9 |
Suporte a consultas temporais por palavras-chave em documentos XML / Supporting temporal keyword queries on XML documentsManica, Edimar January 2010 (has links)
Consultas por palavras-chave permitem o acesso fácil a dados XML, uma vez que não exigem que o usuário aprenda uma linguagem de consulta estruturada nem estude possíveis esquemas de dados complexos. Com isso, vários motores de busca XML foram propostos para permitir a extração de fragmentos XML relevantes para consultas por palavras-chave. No entanto, esses motores de busca tratam as expressões temporais da mesma forma que qualquer outra palavra-chave. Essa abordagem ocasiona inúmeros problemas, como por exemplo, considerar como casamentos para uma expressão temporal nodos do domínio preço ou código. Este trabalho descreve TPI (Two Phase Interception), uma abordagem que permite o suporte a consultas temporais por palavras-chave em documentos XML orientados a dados. O suporte a consultas temporais é realizado através de uma camada adicional de software que executa duas interceptações no processamento de consultas, realizado por um motor de busca XML. Esta camada adicional de software é responsável pelo tratamento adequado das informações temporais presentes na consulta e no conteúdo dos documentos XML. O trabalho ainda especifica TKC (Temporal Keyword Classification), uma classificação de consultas temporais que serve de guia para qualquer mecanismo de consulta por palavras-chave, inclusive TPI. São apresentados os algoritmos de mapeamento das diferentes formas de predicados temporais por palavras-chave, especificadas em TKC, para expressões relacionais a fim de orientar a implementação do processamento das consultas temporais. É proposto um índice temporal e definidas estratégias para identificação de caminhos temporais, desambiguação de formatos de valores temporais, identificação de datas representadas por vários elementos e identificação de intervalos temporais. São demonstrados experimentos que comparam a qualidade, o tempo de processamento e a escalabilidade de um motor de busca XML com e sem a utilização de TPI. A principal contribuição desse trabalho é melhorar significativamente a qualidade dos resultados de consultas temporais por palavras-chave em documentos XML. / Keyword queries enable users to easily access XML data, since the user does not need to learn a structured query language or study possibly complex data schemas. Therewith, several XML search engines have been proposed to extract relevant XML fragments in response to keyword queries. However, these search engines treat the temporal expressions as any other keyword. This approach may lead to several problems. It could, for example, consider prices and codes as matches to a temporal expression. This work describes TPI (Two Phase Interception), an approach that supports temporal keyword queries on data-centric XML documents. The temporal query support is performed by adding an additional software layer that executes two interceptions in the query processing performed by a XML search engine. This additional software layer is responsible for the adequate treatment of the temporal expressions contained in the query and in the contents of the XML documents. This work also specifies TKC (Temporal Keyword Classification), a temporal query classification to be used as guidance for any keyword query mechanism, including TPI. We present the algorithms for mapping different temporal predicates expressed by keywords to relational expressions in order to guide the implementation of the temporal query processing. We propose a temporal index together with strategies to perform temporal path identification, format disambiguation, identification of dates represented by many elements and detection of temporal intervals. This work also reports on experiments which evaluate quality, processing time and scalability of an XML search engine with TPI and without TPI. The main contribution of this work is the significant improvement in the quality of the results of temporal keyword queries on XML documents.
|
10 |
Generation and Ranking of Candidate Networks of Relations for Keyword Search over Relational DatabasesOliveira, Péricles Silva de, 21-98498-9543 28 April 2017 (has links)
Submitted by Divisão de Documentação/BC Biblioteca Central (ddbc@ufam.edu.br) on 2017-08-22T19:40:10Z
No. of bitstreams: 2
Tese - Péricles Silva de Oliveira.pdf: 1875380 bytes, checksum: 014ba89b7fe1929a1461c9d8d3959416 (MD5)
license_rdf: 0 bytes, checksum: d41d8cd98f00b204e9800998ecf8427e (MD5) / Approved for entry into archive by Divisão de Documentação/BC Biblioteca Central (ddbc@ufam.edu.br) on 2017-08-22T19:40:26Z (GMT) No. of bitstreams: 2
Tese - Péricles Silva de Oliveira.pdf: 1875380 bytes, checksum: 014ba89b7fe1929a1461c9d8d3959416 (MD5)
license_rdf: 0 bytes, checksum: d41d8cd98f00b204e9800998ecf8427e (MD5) / Approved for entry into archive by Divisão de Documentação/BC Biblioteca Central (ddbc@ufam.edu.br) on 2017-08-22T19:40:44Z (GMT) No. of bitstreams: 2
Tese - Péricles Silva de Oliveira.pdf: 1875380 bytes, checksum: 014ba89b7fe1929a1461c9d8d3959416 (MD5)
license_rdf: 0 bytes, checksum: d41d8cd98f00b204e9800998ecf8427e (MD5) / Made available in DSpace on 2017-08-22T19:40:44Z (GMT). No. of bitstreams: 2
Tese - Péricles Silva de Oliveira.pdf: 1875380 bytes, checksum: 014ba89b7fe1929a1461c9d8d3959416 (MD5)
license_rdf: 0 bytes, checksum: d41d8cd98f00b204e9800998ecf8427e (MD5)
Previous issue date: 2017-04-28 / Several systems proposed for processing keyword queries over relational databases rely on the
generation and evaluation of Candidate Networks (CNs), i.e., networks of joined database relations
that, when processed as SQL queries, provide a relevant answer to the input keyword
query. Although the evaluation of CNs has been extensively addressed in the literature, problems
related to efficiently generating meaningful CNs have received much less attention. To
generate useful CNs is necessary to automatically locating, given a handful of keywords, relations
in the database that may contain relevant pieces of information, and determining suitable
ways of joining these relations to satisfy the implicit information need expressed by a user when
formulating her query. In this thesis, we present two main contributions related to the processing
of Candidate Networks. As our first contribution, we present a novel approach for generating
CNs, in which possible matchings of the query in database are efficiently enumerated at first.
These query matches are then used to guide the CN generation process, avoiding the exhaustive
search procedure used by current state-of-art approaches. We show that our approach allows
the generation of a compact set of CNs that leads to superior quality answers, and that demands
less resources in terms of processing time and memory. As our second contribution, we initially
argue that the number of possible Candidate Networks that can be generated by any algorithm
is usually very high, but that, in fact, only very few of them produce answers relevant to the
user and are indeed worth processing. Thus, there is no point in wasting resources processing
useless CNs. Then, based on such an argument, we present an algorithm for ranking CNs, based
on their probability of producing relevant answers to the user. This relevance is estimated based
on the current state of the underlying database using a probabilistic Bayesian model we have
developed. By doing so we are able do discard a large number of CNs, ultimately leading to
better results in terms of quality and performance. Our claims and proposals are supported by a
comprehensive set of experiments we carried out using several query sets and datasets used in
previous related work and whose results we report and analyse here. / Sem resumo.
|
Page generated in 0.0303 seconds