Global ETD Search

421	[en] HIPERBOLIC PROGRAMMING IN 0-1 VARIABLES AND BIBLIOGRAPHIC DATABASES SEARCH OPTIMIZATION / [pt] PROGRAMAÇÃO HIPERBÓLICA EM VARIÁVEIS 0-1 E OTIMIZAÇÃO DE CONSULTAS A BANCOS DE DADOS BIBLIOGRAFICOS MARCUS VINICIUS SOLEDADE POGGI DE ARAGAO 31 August 2009 (has links) [pt] Neste trabalho estuda-se a resolução de problemas de otimização e síntese de consultas para recuperação de informações de bancos de dados bibliográficos, através da sua formulação como problemas de programação matemática em variáveis 0-1. Primeiramente é estudado o problema de programação hiperbólica, para o qual foram desenvolvidos algoritmos de complexidade linear. O segundo problema estudado trata de uma extensão do anterior, sendo chamado neste texto de problema de soma hiperbólica. Para este problema são desenvolvidas heurísticas dos tipos simulated annealing e steepest ascent mildest descent (tabu search), além de algoritmos exatos do tipo pesquisa arborescente. Todos os métodos descritos acima foram implementados e são apresentados resultados numéricos. Quanto à otimização de consultas, foram estudados dois problemas básicos: consultas periódicas e síntese de novas, que são formulados como problemas de programação hiperbólica e soma hiperbólica, respectivamente. Foram feitas aplicações considerando-se um banco de dados do Centro de Informações Nucleares da CNEN (Comissão Nacional de Energia Nuclear). / [en] In this work we study the solution of problems arising in the field of queries optimization in information retrieval from classical databases, through their formulation as mathematical problems in 0-1 variables. The first problem studied is the hyperbolic programming problem in 0-1 variables, for which we developed exact linear-time algorithms. The second problem studied is an extension of the former, here named as hyperbolic sum problem. 
For this problem we developed simulated annealing and steepest ascent-mildest descent (tabu search) heuristics, as well as exact branch-and-bound algorithms. All these methods were implemented and numerical results are presented. Concerning query optimization, two basic problems were studied: periodic queries and the synthesis of new queries, which are formulated respectively as a hyperbolic programming problem and a hyperbolic sum problem. We also applied these methods to real data gathered from a database of the Center of Nuclear Information of CNEN (Brazilian National Commission of Nuclear Energy). [pt] RECUPERACAO DE INFORMACAO [en] INFORMATION RETRIEVAL [pt] OTIMIZACAO DE CONSULTAS [en] QUERY OPTIMIZATION [pt] PROGRAMACAO MULTIMIDIA
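Illustrative note on record 421: the abstract does not describe its linear-time algorithms, so the sketch below is not the author's method. It uses Dinkelbach's classical parametric iteration for the unconstrained hyperbolic 0-1 problem, maximising (a0 + Σ a_i x_i) / (b0 + Σ b_i x_i). The function name, the integer inputs and the assumption that b0 > 0 and every b_i >= 0 (so the denominator stays positive) are all choices made here for illustration.

```python
from fractions import Fraction


def hyperbolic_01(a0, a, b0, b):
    """Maximise (a0 + sum a_i x_i) / (b0 + sum b_i x_i) over x in {0,1}^n.

    Assumption (hypothetical setup): integer data, b0 > 0 and b_i >= 0,
    so the denominator is positive for every 0-1 vector x.
    """
    if b0 <= 0 or any(bi < 0 for bi in b):
        raise ValueError("denominator must be positive for all x")
    lam = Fraction(a0, b0)  # ratio obtained with x = 0
    while True:
        # For a fixed lam, the parametric problem max sum (a_i - lam*b_i) x_i
        # is solved by switching on exactly the positive coefficients.
        x = [1 if ai - lam * bi > 0 else 0 for ai, bi in zip(a, b)]
        num = a0 + sum(ai for ai, xi in zip(a, x) if xi)
        den = b0 + sum(bi for bi, xi in zip(b, x) if xi)
        new_lam = Fraction(num, den)
        if new_lam == lam:  # no further improvement: lam is the optimal ratio
            return x, lam
        lam = new_lam


x, ratio = hyperbolic_01(1, [3, 1, 4], 2, [1, 2, 1])
print(x, ratio)
```

Exact rational arithmetic is used so that the stopping test is an equality rather than a floating-point tolerance.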
422	Modelo de consulta de dados relacionais baseada em contexto para sistemas ubíquos / Model of relational data querying based on context modelling for ubiquitous systems Maran, Vinícius January 2016 (has links) A computação ubíqua define que a computação deve estar presente em ambientes para auxiliar o usuário na realização de suas tarefas diárias de forma eficiente. Para que isto aconteça, sistemas considerados ubíquos devem ser conhecedores do contexto e devem adaptar seu funcionamento em relação aos contextos capturados do ambiente. Informações de contexto podem ser representadas de diversas formas em sistemas computacionais e pesquisas recentes demonstram que a representação destas informações baseada em ontologias apresenta vantagens importantes se comparada à outras soluções, destacando-se principalmente o alto nível de expressividade e a padronização de linguagens para a representação de ontologias. Informações consideradas específicas de domínio são frequentemente representadas em bancos de dados relacionais. Esta diferença em relação a modelos de representação, com o uso de ontologias para representação de contexto e representação relacional para informações de domínio, implica em uma série de problemas no que se refere à adaptação e distribuição de conteúdo em arquiteturas ubíquas. Dentre os principais problemas pode-se destacar a dificuldade de alinhamento entre as informações de domínio e de contexto, a dificuldade na distribuição destas informações entre arquiteturas ubíquas e as diferenças entre modelagens de contexto e de domínio (o conhecimento sobre os objetos do domínio). Este trabalho apresenta um framework de consulta entre informações de contexto e informações de domínio. 
Com a aplicação deste framework, a recuperação contextualizada de informações se tornou possível, utilizando a expressividade necessária para a modelagem de contexto através de ontologias e utilizando esquemas relacionais previamente definidos e utilizados por sistemas de informação. Para realizar a avaliação do framework, o mesmo foi aplicado em um ambiente baseado no cenário motivador de pesquisa, que descreve possíveis situações de utilização de tecnologias ubíquas. Através da aplicação do framework no cenário motivador, foi possível verificar que a proposta foi capaz de realizar a integração entre contexto e domínio e permitiu estender a filtragem de consultas relacionais. / Ubiquitous computing holds that computing must be present in environments to help users perform their daily tasks efficiently. Thus, ubiquitous systems must be aware of context and should adapt their operation to the contexts captured from the environment. Context information can be represented in different ways in computer systems, and recent research shows that representing context with ontologies offers important advantages over other solutions, in particular a high level of expressiveness and standardized languages for representing ontologies. Domain-specific information is frequently maintained in relational databases. This difference in representation models, with ontologies for context and relational representation for domain information, causes a number of problems for the adaptation and distribution of content in ubiquitous architectures. These include the difficulty of aligning domain and context information, the difficulty of distributing this information across ubiquitous architectures, and the differences between context modeling and domain modeling (knowledge about the domain objects). This PhD thesis presents a framework for querying across context information and domain information. 
With this framework, contextualized information retrieval becomes possible, combining the expressiveness that ontologies provide for context modeling with relational schemas already defined and used by information systems. To evaluate the framework, it was applied in an environment based on the motivating research scenario, which describes possible situations of use of ubiquitous technologies. The evaluation showed that the framework integrates context and domain information and extends the filtering of relational queries. Banco : Dados relacionais Ontologias Computação pervasiva Context-awareness Ontology Query Ubiquitous computing Information systems Database
423	Estudo sobre o impacto da adição de vocabulários estruturados da área de ciências da saúde no Currículo Lattes / Study on the impact of adding structured health sciences vocabularies to the Currículo Lattes Araújo, Charles Henrique de January 2016 (has links) A busca de informações em bases de dados de instituições que possuem grande volume de dados necessita cada vez mais de processos mais eficientes para realização dessa tarefa. Problemas de grafia, idioma, sinonímia, abreviação de termos e a falta de padronização dos termos, tanto nos argumentos de busca, quanto na indexação dos documentos, interferem diretamente nos resultados. Diante disso, este estudo teve como objetivo avaliar o impacto da adição de vocabulários estruturados da área de Ciências da Saúde no Currículo Lattes, na recuperação de perfis similares de pesquisadores das áreas de Ciências Biológicas e Ciências da Saúde, utilizando técnicas de mineração de dados, expansão de consultas, modelos vetoriais de consultas e utilização de algoritmo de trigramas. Foram realizados cruzamentos de informações entre as palavras-chaves de artigos publicados registrados no Currículo Lattes e as informações contidas no Medical Subject Headings (MeSH) e nos Descritores em Ciências da Saúde (DeCS), bem como comparações entre os resultados das consultas, utilizando as palavras-chaves originais e adicionando-lhes os termos resultantes do processo de expansão de consultas. Os resultados mostram que a metodologia adotada neste estudo pode incrementar qualitativamente o universo de perfis recuperados, podendo dessa forma contribuir para a melhoria dos Sistemas de Informações do Conselho Nacional de Desenvolvimento Científico e Tecnológico - CNPq. / Searching for information in the databases of institutions that hold large volumes of data increasingly requires more efficient processes. Problems of spelling, language, synonymy, abbreviation of terms and lack of standardization of terms, both in the search arguments and in the indexing of documents, directly interfere with the results. 
Thus, this study aimed to evaluate the impact of adding structured vocabularies from the Health Sciences area to the Lattes Database on the retrieval of similar profiles of researchers working in Biological Sciences and Health Sciences, using query expansion, data mining procedures, vector models and a Trigram Phrase Matching algorithm. Keywords of articles registered in the Lattes Database were cross-checked against Medical Subject Headings (MeSH) and Health Sciences Descriptors (DeCS) terms, and the results of queries using the original keywords were compared with those of queries that added the terms produced by query expansion. The results show that the methodology used in this study can qualitatively improve the set of retrieved profiles, contributing to the improvement of CNPq Information Systems. Vocabulário controlado Sistemas de recomendação Recuperação da informação Ciências da saúde Query expansion Data mining Recommendation systems
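Illustrative note on record 423: the abstract names a trigram matching algorithm but gives no details, so the following is only a generic sketch of character-trigram similarity (Jaccard overlap of trigram sets), not the specific algorithm used in the study. The function names and the padding convention are assumptions made for this example.

```python
def trigrams(term):
    """Return the set of character trigrams of a term.

    Assumption: lower-case the term and pad it with two leading spaces and
    one trailing space so that word boundaries contribute trigrams too.
    """
    padded = f"  {term.lower().strip()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def trigram_similarity(a, b):
    """Jaccard similarity between the trigram sets of two terms (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


# A keyword and a spelling variant score high; unrelated terms score low.
print(trigram_similarity("neoplasm", "neoplasms"))
print(trigram_similarity("neoplasm", "cardiology"))
```

A measure of this kind tolerates small spelling and inflection differences between an author keyword and a controlled-vocabulary term, which is the kind of mismatch the abstract describes.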
424	Implementação de consultas para um modelo de dados temporal orientado a objetos / Implementation of queries for a temporal object data model Carvalho, Tanisi Pereira de January 1997 (has links) O modelo TF-ORM (Temporal Functionality in Objects With Roles Model) é um modelo de dados temporal orientado a objetos que utiliza o conceito de papeis para representar os diferentes comportamentos dos objetos. 0 modelo permite a modelagem dos aspectos estáticos e dinâmicos da aplicação pois considera todos os estados dos objetos ao longo de sua evolução. Sua linguagem de consulta e baseada na linguagem SQL e possibilita a recuperação de diferentes histórias do banco de dados. Este trabalho apresenta um sistema visual de consulta para o modelo TFORM. O VQS TF-ORM (Visual Query System TF-ORM) é um ambiente para recuperação de informações temporais. O sistema permite que as consultas sejam elaboradas de três formas alternativas: textual, gráfica ou por formulários. A linguagem gráfica possui o mesmo poder de expressão da linguagem textual, permitindo que a consulta seja elaborada diretamente sobre o esquema conceitual gráfico do modelo com o auxilio de um conjunto de janelas e elementos visuais. A recuperação de informações utilizando-se formulários não possui o mesmo poder de expressão da linguagem textual, mas possibilita a recuperação dos valores das propriedades de um determinado objeto através de uma hierarquia de janelas. A recuperação de informações através do sistema visual de consulta do modelo apresenta algumas facilidades tais como: representação visual dos operadores temporais do modelo, definição de níveis de detalhe e navegação sobre o esquema gráfico, armazenamento das consultas para posterior utilizando, possibilidade de representar uma consulta textual na forma visual e vice-versa, entre outras. 
Além da preocupação com a definição de restrições temporais, o ambiente considera ainda as diferentes formas de apresentação do resultado da consulta que podem ser selecionadas pelo usuário. No sistema apresentado neste trabalho, o modelo TF-ORM é implementado em um banco de dados relacional que utiliza a linguagem SQL para recuperação de informações. Para a implementação do modelo em um banco de dados relacional foi feito um mapeamento, que determina como os conceitos de orientação a objetos, papel e tempo devem ser mapeados para tabelas e atributos no modelo relacional. As consultas realizadas na linguagem TF-ORM são então traduzidas para a linguagem de consulta do banco de dados relacional. O ambiente foi implementado utilizando a ferramenta para desenvolvimento de aplicações Delphi e o banco de dados Watcom, um banco de dados relacional que permite a recuperação de informações no padrão SQL/ANSI. / The TF-ORM model (Temporal Functionality in Objects with Roles Model) is an object-oriented temporal data model that uses the concept of roles to represent the different behaviors of objects. The model allows the static and dynamic aspects of an application to be modelled, since it considers all the states of the objects throughout their evolution. The TF-ORM query language is based on SQL and enables the retrieval of different database histories. This work presents a visual query system for the TF-ORM model. The VQS TF-ORM (Visual Query System TF-ORM) is an environment for retrieving temporal information. The system allows queries to be formulated in three alternative ways: textual, graphical or by forms. The graphical language has the same expressive power as the textual language, allowing the query to be formulated directly on the graphical conceptual schema of the model with the support of a set of windows and visual elements. 
Information retrieval using forms does not have the same expressive power as the textual language, but enables the retrieval of the property values of a given object through a hierarchy of windows. Information retrieval through the visual query system offers several facilities, such as: visual representation of the model's temporal operators, definition of levels of detail and navigation over the graphical schema, storage of queries for later use, and the possibility of representing a textual query in visual form and vice versa. Besides supporting the definition of temporal constraints, the environment also lets the user select among different forms of presenting the query result. In the system presented, the TF-ORM model is implemented in a relational database that uses SQL for information retrieval. To implement the model in a relational database, a mapping was defined that determines how the concepts of object orientation, roles and time are mapped to tables and attributes in the relational model. Queries written in the TF-ORM language are then translated into the query language of the relational database. The environment was implemented using the Delphi application development tool and the Watcom database, a relational database that supports information retrieval in the SQL/ANSI standard. Banco : Dados Banco : Dados temporais Orientacao : Objetos Database Information recovery Visual query language Temporal model
425	O estudo e desenvolvimento do protótipo de uma ferramenta de apoio a formulação de consultas a bases de dados na área da saúde / The study and development of the prototype of a tool for supporting query formulation to databases in the health area Webber, Carine Geltrudes January 1997 (has links) O objetivo deste trabalho é, através do estudo de diversas tecnologias, desenvolver o protótipo de uma ferramenta capaz de oferecer suporte ao usuário na formulacdo de uma consulta a MEDLINE (Medical Literature Analysis and Retrieval System On Line). A MEDLINE é um sistema de recuperação de informações bibliográficas, na área da biomedicina, desenvolvida pela National Library of Medicine. Ela é uma ferramenta cuja utilizando tem sido ampliada nesta área em decorrência do aumento da utilizando de literatura, disponível eletronicamente, por profissionais da área da saúde. As pessoas, em geral, buscam informação e esperam encontrá-la exatamente de acordo com as suas expectativas, de forma ágil e utilizando todas as fontes de recursos disponíveis. Foi com este propósito que surgiram os primeiros Sistema de Recuperação de Informação (SRI) onde, de forma simplificada, um usuário constrói uma consulta, a qual expressa sua necessidade de informação, em seguida o sistema a processa e os resultados obtidas através dela retornam ao usuário. Grande parte dos usuários encontram dificuldades em representar a sua necessidade de informação de forma a obter resultados satisfatórios em um SRI. Os termos que o usuário escolhe para compor a consulta nem sempre são os mesmos que o sistema reconhece. A fim de que um usuário seja bem sucedido na definição dos termos que compõem a sua consulta é aconselhável que ele conheça a terminologia que foi empregada na indexação dos itens que ele deseja recuperar ou que possa contar com um intermediário que possua esse conhecimento. 
Em situações em que nenhuma dessas possibilidades seja verdadeira recursos que viabilizem uma consulta bem sucedida se fazem necessários. Este trabalho, inicialmente, apresenta um estudo geral sobre os Sistemas de Recuperação de Informações (SRI), enfocando todos os processos envolvidos e relacionados ao armazenamento, organização e a própria recuperação. Posteriormente, são destacados aspectos relacionados aos vocabulários e classificações médicas em uso, os quais serão úteis para uma maior compreensão das dificuldades encontradas pelos usuários durante a interação com um sistema com esta finalidade. E, finalmente, é apresentado o protótipo do Sistema para Formulação de Consultas a MEDLINE, bem como seus componentes e funcionalidades. O Sistema para Formulação de Consultas a MEDLINE foi desenvolvido com o intuito de permitir que o usuário utilize qualquer termo na formulação de uma consulta destinada a MEDLINE. Ele possibilita a integração de diferentes terminologias médicas, originárias de vocabulários e classificações disponíveis em língua portuguesa e atualmente em uso. Esta abordagem permite a criação de uma terminologia biomédica mais completa, sendo que cada termo mantém relacionamentos, os quais descrevem a sua semântica, com outros. / The goal of this work is, through the study of several technologies, to develop the prototype of a tool that supports the user in formulating a query to MEDLINE (Medical Literature Analysis and Retrieval System On Line). MEDLINE is a bibliographic information retrieval system in the area of biomedicine, developed by the National Library of Medicine. Its use has grown in this area as health care professionals make increasing use of electronically available literature. People, in general, look for information and expect to find it exactly in line with their expectations, quickly and using every available source. 
It was for this purpose that the first Information Retrieval Systems (IRS) emerged, in which, put simply, a user builds a query that expresses an information need, the system processes it, and the results are returned to the user. Most users find it difficult to express their information need in a way that yields satisfactory results from an IRS. The terms that the user chooses to compose the query are not always the ones that the system recognizes. To succeed in defining the terms of a query, the user should know the terminology employed in indexing the items to be retrieved, or be able to rely on an intermediary who has that knowledge. When neither is possible, resources that make a successful query possible are needed. This work first presents a general study of IRS, covering all the processes involved in storage, organization and retrieval. Next, it highlights aspects of the medical vocabularies and classifications in use, which help to understand the difficulties users face when interacting with such a system. Finally, it presents the prototype of the Query Formulation System for MEDLINE, along with its components and functionalities. The Query Formulation System for MEDLINE was developed to allow the user to use any term when formulating a query to MEDLINE. It integrates different medical terminologies, drawn from vocabularies and classifications available in Portuguese and currently in use. This approach permits the creation of a more complete biomedical terminology, in which each term maintains relationships with other terms that describe its semantics. 
Armazenamento : Dados Recuperacao : Informacao Formulacao : Consulta Tesauro Informática médica Information retrieval Query formulation Medical terminology Thesaurus
426	Suporte a consultas temporais por palavras-chave em documentos XML / Supporting temporal keyword queries on XML documents Manica, Edimar January 2010 (has links) Consultas por palavras-chave permitem o acesso fácil a dados XML, uma vez que não exigem que o usuário aprenda uma linguagem de consulta estruturada nem estude possíveis esquemas de dados complexos. Com isso, vários motores de busca XML foram propostos para permitir a extração de fragmentos XML relevantes para consultas por palavras-chave. No entanto, esses motores de busca tratam as expressões temporais da mesma forma que qualquer outra palavra-chave. Essa abordagem ocasiona inúmeros problemas, como por exemplo, considerar como casamentos para uma expressão temporal nodos do domínio preço ou código. Este trabalho descreve TPI (Two Phase Interception), uma abordagem que permite o suporte a consultas temporais por palavras-chave em documentos XML orientados a dados. O suporte a consultas temporais é realizado através de uma camada adicional de software que executa duas interceptações no processamento de consultas, realizado por um motor de busca XML. Esta camada adicional de software é responsável pelo tratamento adequado das informações temporais presentes na consulta e no conteúdo dos documentos XML. O trabalho ainda especifica TKC (Temporal Keyword Classification), uma classificação de consultas temporais que serve de guia para qualquer mecanismo de consulta por palavras-chave, inclusive TPI. São apresentados os algoritmos de mapeamento das diferentes formas de predicados temporais por palavras-chave, especificadas em TKC, para expressões relacionais a fim de orientar a implementação do processamento das consultas temporais. É proposto um índice temporal e definidas estratégias para identificação de caminhos temporais, desambiguação de formatos de valores temporais, identificação de datas representadas por vários elementos e identificação de intervalos temporais. 
São demonstrados experimentos que comparam a qualidade, o tempo de processamento e a escalabilidade de um motor de busca XML com e sem a utilização de TPI. A principal contribuição desse trabalho é melhorar significativamente a qualidade dos resultados de consultas temporais por palavras-chave em documentos XML. / Keyword queries give users easy access to XML data, since the user does not need to learn a structured query language or study possibly complex data schemas. For this reason, several XML search engines have been proposed to extract relevant XML fragments in response to keyword queries. However, these search engines treat temporal expressions like any other keyword. This approach may lead to several problems. It could, for example, consider prices and codes as matches to a temporal expression. This work describes TPI (Two Phase Interception), an approach that supports temporal keyword queries on data-centric XML documents. Temporal query support is provided by an additional software layer that executes two interceptions in the query processing performed by an XML search engine. This additional software layer is responsible for the adequate treatment of the temporal expressions contained in the query and in the contents of the XML documents. This work also specifies TKC (Temporal Keyword Classification), a temporal query classification to be used as guidance for any keyword query mechanism, including TPI. We present the algorithms for mapping different temporal predicates expressed by keywords to relational expressions in order to guide the implementation of the temporal query processing. We propose a temporal index together with strategies to perform temporal path identification, format disambiguation, identification of dates represented by many elements and detection of temporal intervals. This work also reports on experiments which evaluate quality, processing time and scalability of an XML search engine with TPI and without TPI. 
The main contribution of this work is the significant improvement in the quality of the results of temporal keyword queries on XML documents. Recuperacao : Informacao XML (Linguagem de marcação) Banco : Dados Temporal query Keyword search XML
427	Indexing and querying dataspaces Mergen, Sérgio Luis Sardi January 2011 (has links) Over the Web, distributed and heterogeneous sources with structured and related content form rich repositories of information commonly referred to as dataspaces. To provide access to this heterogeneous data, information integration systems have traditionally relied on the availability of a mediated schema, along with mappings between this schema and the schemas of the sources. On dataspaces, where sources are plentiful, autonomous and extremely volatile, a system based on the existence of a pre-defined mediated schema and mapping information presents several drawbacks. Notably, the cost of keeping the mappings up to date as new sources are found or existing sources change can be prohibitively high. We propose a novel querying architecture that requires neither a mediated schema nor source mappings, and which is based mainly on indexing mechanisms and on-the-fly rewriting algorithms. Our indexes are designed for data that is represented as relations, and are able to capture the structure of the sources, their instances and the connections between them. In the absence of a mediated schema, the user formulates structured queries based on what she expects to find. These queries are rewritten using a best-effort approach: the proposed rewriting algorithms compare a user query against the source schemas and produce a set of rewritings based on the matches found. Based on this architecture, two different querying approaches are tested. Experiments show that the indexing and rewriting algorithms are scalable, i.e., able to handle a very large number of structured Web sources, and that they support simple yet expressive queries that exploit the inherent structure of the data. Recuperacao : Informacao Banco : Dados Dataspaces Data integration Search engine Indexing Query rewriting
428	Unsupervised Bayesian Data Cleaning Techniques for Structured Data January 2014 (has links) abstract: Recent efforts in data cleaning have focused mostly on problems like data deduplication, record matching, and data standardization; few of these focus on fixing incorrect attribute values in tuples. Correcting values in tuples is typically performed by a minimum cost repair of tuples that violate static constraints like CFDs (which have to be provided by domain experts, or learned from a clean sample of the database). In this thesis, I provide a method for correcting individual attribute values in a structured database using a Bayesian generative model and a statistical error model learned from the noisy database directly. I thus avoid the necessity for a domain expert or master data. I also show how to efficiently perform consistent query answering using this model over a dirty database, in case write permissions to the database are unavailable. A Map-Reduce architecture to perform this computation in a distributed manner is also shown. I evaluate these methods over both synthetic and real data. / Dissertation/Thesis / Doctoral Dissertation Computer Science 2014 Computer science Consistent Query Answering Databases Data Cleaning Information Retrieval Probabilistic Databases
429	Efficient Processing of Skyline Queries on Static Data Sources, Data Streams and Incomplete Datasets January 2014 (has links) abstract: Skyline queries extract interesting points that are non-dominated and help paint the bigger picture of the data in question. They are valuable in many multi-criteria decision applications and are becoming a staple of decision support systems. An assumption commonly made by many skyline algorithms is that a skyline query is applied to a single static data source or data stream. Unfortunately, this assumption does not hold in many applications in which a skyline query may involve attributes belonging to multiple data sources and requires a join operation to be performed before the skyline can be produced. Recently, various skyline-join algorithms have been proposed to address this problem in the context of static data sources. However, these algorithms suffer from several drawbacks: they often need to scan the data sources exhaustively to obtain the skyline-join results; moreover, the pruning techniques employed to eliminate tuples are largely based on expensive tuple-to-tuple comparisons. On the other hand, most data stream techniques focus on single stream skyline queries, thus rendering them unsuitable for skyline-join queries. Another assumption typically made by most of the earlier skyline algorithms is that the data is complete and all skyline attribute values are available. Due to this constraint, these algorithms cannot be applied to incomplete data sources in which some of the attribute values are missing and are represented by NULL values. There exists a definition of dominance for incomplete data, but this leads to undesirable consequences such as non-transitive and cyclic dominance relations both of which are detrimental to skyline processing. 
Based on the aforementioned observations, the main goal of the research described in this dissertation is the design and development of a framework of skyline operators that effectively handles three distinct types of skyline queries: 1) skyline-join queries on static data sources, 2) skyline-window-join queries over data streams, and 3) strata-skyline queries on incomplete datasets. This dissertation presents the unique challenges posed by these skyline queries and addresses the shortcomings of current skyline techniques by proposing efficient methods to tackle the added overhead in processing skyline queries on static data sources, data streams, and incomplete datasets. / Dissertation/Thesis / Doctoral Dissertation Computer Science 2014 Computer science Data Streams Incomplete Data Multiple Datasets Skyline-Join Skyline Query Processing Skyline-Window-Join
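Illustrative note on record 429: the abstract relies on the notion of dominance without defining it. The sketch below shows the standard definition on complete data (a point dominates another if it is no worse in every attribute and strictly better in at least one) with a naive quadratic skyline computation. It is not any of the skyline-join, stream or incomplete-data algorithms proposed in the dissertation, and the "smaller is better" convention and function names are assumptions made here.

```python
def dominates(p, q):
    """True if p dominates q: p is <= q in every attribute and < in at least one.

    Assumption: smaller values are preferred in every dimension.
    """
    return (all(a <= b for a, b in zip(p, q))
            and any(a < b for a, b in zip(p, q)))


def skyline(points):
    """Naive O(n^2) skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (price, distance) pairs.
hotels = [(1, 9), (3, 3), (2, 8), (4, 4), (9, 1), (5, 5)]
print(skyline(hotels))
```

The tuple-to-tuple comparisons in `skyline` are exactly the cost that the abstract says existing pruning techniques pay, which is why more selective algorithms are of interest.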
430	[en] EXPERIMENTAL STUDY OF CONJUNCTIVE QUERIES OPTIMIZATION WITH EXPENSIVE PREDICATES / [pt] ESTUDO EXPERIMENTAL DE ALGORITMOS PARA OTIMIZAÇÃO DE CONSULTAS CONJUNTIVAS COM PREDICADOS CAROS RODRIGO SILVA GUARINO 12 July 2004 (has links) [pt] As técnicas tradicionais de otimização de consultas em banco de dados possuem como heurística fundamental a organização dos predicados de uma consulta em dois tipos principais: predicados simples e predicados envolvendo junção(join) de tabelas. Como príncipio geral considera-se a priori os predicados envolvendo junção bem mais caros do que os predicados simples, e também que não existam diferenças significativas entre os tempos de processamento dos predicados simples, o que leva o otimizador a executar primeiro os predicados simples(em uma ordem qualquer), a fim de se diminuir a quantidade de tuplas que seriam necessárias à execução da junção. Essa consideração que se aplica bem à maioria das aplicações convencionais de banco de dados, passou a não se aplicar mais à novas aplicações que envolviam o preprocessamento de dados e/ou funções complexas nos predicados que não envolviam junções. Dessa forma esses novos predicados simples passaram a ter um tempo de processamento não mais desprezível em relação aos predicados que envolviam junções e também em relação a outros predicados simples. Dessa forma a heurística principal de otimização não se aplicava mais e tornou-se necessário o desenvolvimento de novas técnicas para resolver consultas que envolvessem esse novo tipo de predicado, que passou a ser chamado de predicado caro. O presente trabalho tem dois objetivos principais: apresentar um framework que possibilite o desenvolvimento, teste e análise integrada de algoritmos para o processamento de predicados caros, e analisar o desempenho de quatro implementações de algoritmos baseados na abordagem Cherry Picking, cujo o objetivo é explorar a dependência entre os dados que compõem as consultas. 
Os experimentos são conduzidos em consultas envolvendo predicados conjuntivos (AND) e a idéia geral é tentar avaliar os atributos em uma ordem que minimize o custo de avaliação geral das tuplas. / [en] Traditional database query optimization techniques rely on a main heuristic that organizes the predicates of a query into two types: selection predicates and join predicates. Join predicates are assumed a priori to be much more expensive than selection predicates. In addition, the costs of different selection predicates are assumed not to differ significantly, which leads the optimizer to execute the selection predicates first, in any order, to reduce the number of tuples that reach the joins. This assumption, which holds for most conventional database applications, no longer holds for newer applications that apply data preprocessing and/or complex functions in selection predicates. In these cases, the cost of a selection predicate is no longer negligible relative to join predicates or to other selection predicates. The main heuristic of pushing down selections therefore no longer applies to this new kind of predicate, called an expensive predicate, and new optimization techniques are needed. This work has two main objectives: to present a framework for the integrated development, testing and analysis of algorithms for evaluating expensive predicates, and to analyse the performance of four implementations of algorithms based on the Cherry Picking approach, which aims to exploit the dependency between the input values of expensive predicates. The experiments consider conjunctive (AND) queries, and the general idea is to evaluate the attributes in an order that minimizes the overall cost of evaluating the tuples. [pt] OTIMIZACAO DE CONSULTAS [en] QUERY OPTIMIZATION [pt] ALGORITMO [en] ALGORITHM [pt] PREDICADOS CAROS [en] EXPENSIVE PREDICATES
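Illustrative note on record 430: to make the idea of ordering expensive predicates concrete, the sketch below applies the well-known rank heuristic for a conjunction of independent predicates, sorting by cost / (1 - selectivity). This is not the Cherry Picking approach studied in the thesis, which exploits dependency between data values; independence, the (cost, selectivity) representation and the function names are assumptions made for this example.

```python
def expected_cost(preds):
    """Expected per-tuple cost of evaluating predicates in the given order.

    Each predicate is a (cost, selectivity) pair; a tuple reaches predicate i
    only if it passed all earlier ones (independence is assumed).
    """
    total, pass_prob = 0.0, 1.0
    for cost, selectivity in preds:
        total += pass_prob * cost
        pass_prob *= selectivity
    return total


def order_by_rank(preds):
    """Sort predicates by cost / (1 - selectivity), cheapest filtering first."""
    def rank(pred):
        cost, selectivity = pred
        # A predicate that filters nothing goes last.
        return cost / (1.0 - selectivity) if selectivity < 1.0 else float("inf")
    return sorted(preds, key=rank)


# Hypothetical predicates: (cost per tuple, fraction of tuples that pass).
preds = [(10.0, 0.9), (1.0, 0.5), (5.0, 0.1)]
plan = order_by_rank(preds)
print(plan, expected_cost(plan))
```

With these numbers the costly but highly selective predicate is evaluated before the very costly, barely selective one, which is the kind of trade-off that a "selections first, in any order" heuristic ignores.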
