211

Term selection in information retrieval

Maxwell, Kylie Tamsin January 2016 (has links)
Systems trained on linguistically annotated data achieve strong performance for many language processing tasks. This encourages the idea that annotations can improve any language processing task if applied in the right way. However, despite widespread acceptance and availability of highly accurate parsing software, it is not clear that ad hoc information retrieval (IR) techniques using annotated documents and requests consistently improve search performance compared to techniques that use no linguistic knowledge. In many cases, retrieval gains made using language processing components, such as part-of-speech tagging and head-dependent relations, are offset by significant negative effects. This results in a minimal positive, or even negative, overall impact for linguistically motivated approaches compared to approaches that do not use any syntactic or domain knowledge. In some cases, it may be that syntax does not reveal anything of practical importance about document relevance. Yet without a convincing explanation for why linguistic annotations fail in IR, the intuitive appeal of search systems that ‘understand’ text can result in the repeated application, and mis-application, of language processing to enhance search performance. This dissertation investigates whether linguistics can improve the selection of query terms by better modelling the alignment process between natural language requests and search queries. It is the most comprehensive work on the utility of linguistic methods in IR to date. Term selection in this work focuses on the identification of informative query terms of 1-3 words that both represent the semantics of a request and discriminate between relevant and non-relevant documents. Approaches to word association are discussed with respect to linguistic principles, and evaluated with respect to semantic characterization and discriminative ability. Analysis is organised around three theories of language that emphasize different structures for the identification of terms: phrase structure theory, dependency theory and lexicalism. The structures identified by these theories play distinctive roles in the organisation of language. Evidence is presented regarding the value of different methods of word association based on these structures, and the effect of method and term combinations. Two highly effective, novel methods for the selection of terms from verbose queries are also proposed and evaluated. The first method focuses on the semantic phenomenon of ellipsis with a discriminative filter that leverages diverse text features. The second method exploits a term ranking algorithm, PhRank, that uses no linguistic information and relies on a network model of query context. The latter focuses queries so that 1-5 terms in an unweighted model achieve better retrieval effectiveness than weighted IR models that use up to 30 terms. In addition, unlike models that use a weighted distribution of terms or subqueries, the concise terms identified by PhRank are interpretable by users. Evaluation with newswire and web collections demonstrates that PhRank-based query reformulation significantly improves the performance of verbose queries by up to 14% compared to highly competitive IR models, and is at least as good for short keyword queries with the same models. Results illustrate that linguistic processing may help with the selection of word associations but does not necessarily translate into improved IR performance.
Statistical methods are necessary to overcome the limits of syntactic parsing and word adjacency measures for ad hoc IR. As a result, probabilistic frameworks that discover, and make use of, many forms of linguistic evidence may deliver small improvements in IR effectiveness, but methods that use simple features can be substantially more efficient and equally, or more, effective. Various explanations for this finding are suggested, including the probabilistic nature of grammatical categories, a lack of homomorphism between syntax and semantics, the impact of lexical relations, variability in collection data, and systemic effects in language systems.
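The abstract describes PhRank only at the level of its inputs and outputs: no linguistic information, a network model of query context, and ranked terms of 1-3 words. As a hedged illustration of that general idea (not PhRank's actual formulation, whose features and weighting are defined in the dissertation), the Python sketch below builds a word co-occurrence graph from context sentences, scores words with a plain PageRank-style power iteration, and ranks candidate 1-3 word query substrings by the scores of their words; rank_terms and the sentence-window co-occurrence are assumptions made for the sketch.

    from collections import defaultdict
    from itertools import combinations

    def rank_terms(context_sentences, query_words, damping=0.85, iters=50):
        """Toy network-based term ranker; context_sentences are token lists."""
        # Undirected co-occurrence graph: words are nodes, co-occurrence
        # within a sentence creates an edge.
        neighbours = defaultdict(set)
        for sent in context_sentences:
            for a, b in combinations(set(sent), 2):
                neighbours[a].add(b)
                neighbours[b].add(a)

        nodes = list(neighbours)
        score = {w: 1.0 / len(nodes) for w in nodes}
        for _ in range(iters):  # plain PageRank power iteration
            score = {w: (1 - damping) / len(nodes)
                     + damping * sum(score[v] / len(neighbours[v])
                                     for v in neighbours[w])
                     for w in nodes}

        # Candidate terms: 1-3 word substrings of the query, scored by
        # the summed scores of their constituent words.
        candidates = [tuple(query_words[i:j])
                      for i in range(len(query_words))
                      for j in range(i + 1, min(i + 4, len(query_words) + 1))]
        return sorted(candidates,
                      key=lambda t: sum(score.get(w, 0.0) for w in t),
                      reverse=True)

    sents = [["term", "selection", "query"], ["query", "term", "ranking"]]
    print(rank_terms(sents, ["ranking", "query", "terms"])[:3])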
212

Application of a Temporal Database Framework for Processing Event Queries

January 2012 (has links)
abstract: This dissertation presents the Temporal Event Query Language (TEQL), a new language for querying event streams. Event stream processing enables online querying of streams of events to extract relevant data in a timely manner. TEQL enables querying of interval-based event streams using temporal database operators. Temporal databases and temporal query languages have been a subject of research for more than 30 years and are a natural fit for expressing queries that involve a temporal dimension; however, operators developed in that context cannot be directly applied to event streams. This research extends a preexisting relational framework for event stream processing to support temporal queries, identifying the language features and formal semantic extensions required. The extended framework supports continuous, step-wise evaluation of temporal queries, and the incremental evaluation of TEQL operators is formalized to avoid re-computation of previous results. The research includes the development of a prototype that supports the integrated event and temporal query processing framework, with support for incremental evaluation and materialization of intermediate results. TEQL enables reporting temporal data in the output, direct specification of conditions over timestamps, and specification of temporal relational operators. By integrating temporal database operators with event languages, a new class of temporal queries over event streams becomes possible, including semantic aggregation, extraction of temporal patterns using set operators, and more accurate specification of event co-occurrence. / Dissertation/Thesis / Ph.D. Computer Science 2012
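TEQL's operators and formal semantics are developed in the dissertation itself; purely as a flavour of what "interval-based events plus incremental evaluation" means, here is a small hypothetical Python sketch of an overlap join between two event streams. Each arriving event is matched only against the retained state of the opposite stream, so earlier join results are never recomputed, which is the property the abstract calls incremental evaluation.

    from dataclasses import dataclass

    @dataclass
    class Event:
        name: str
        start: int  # interval endpoints, e.g. in seconds
        end: int

    def overlap_join(left_state, right_state, incoming, side):
        """Incrementally join two interval event streams on overlap."""
        own, other = ((left_state, right_state) if side == "L"
                      else (right_state, left_state))
        own.append(incoming)  # retain for matching future arrivals
        # Strict interval overlap: the two intervals share some instant.
        return [(incoming, e) if side == "L" else (e, incoming)
                for e in other
                if incoming.start < e.end and e.start < incoming.end]

    L, R = [], []
    overlap_join(L, R, Event("alarm", 0, 10), "L")        # nothing to match yet
    print(overlap_join(L, R, Event("login", 5, 8), "R"))  # one co-occurrence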
213

Formalização do processo de tradução de consultas em ambientes de integração de dados XML / Formalization of a query translation process in XML data integration

Alves, Willian Bruno Gomes January 2008 (has links)
In order to search for the same information in heterogeneous XML data sources, it would be desirable to state a single query against a global conceptual schema and then translate it automatically into an XML query for each specific data source. CXPath (for Conceptual XPath) has been proposed as a language for querying XML sources at the conceptual level. The language was designed to simplify the translation of queries at the conceptual level into queries at the XML level, while keeping its syntax easy to learn; for this reason its syntax closely resembles that of XPath, the language used for querying XML documents. This dissertation formally defines the mechanism for translating queries at the conceptual level, written in CXPath, into queries at the XML level, written in XPath. The handling of inheritance relationships in the translation mechanism is presented, and the relation between the expressivity of the conceptual model and the translation mechanism is discussed. In some cases, the plain translation of a CXPath query misses some answers because the data sources may be incomplete. In this work, the conceptual model underlying the data integration system's global schema is extended with inclusion dependencies, and the query answering mechanism is modified to deal with this kind of dependency; more specifically, mechanisms for query rewriting and redundancy elimination are presented. This increase in the expressivity of the global schema makes it possible to infer results, from the data available in the integration system, that a simple query translation would not return. The data integration approach used in this dissertation is also placed within the formal framework for data integration proposed by (LENZERINI, 2002), which, according to the author, is general enough to capture all data integration approaches in the literature, including the one considered here.
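As a hedged illustration of the shape of such a translation (not CXPath's actual translation rules), the Python sketch below maps conceptual path steps to per-source XPath fragments through a mapping table; MAPPINGS, translate, and all element names are invented for the example. Note how inheritance can be accommodated in this style, as discussed in the dissertation, by mapping a concept to the union of the paths of its subtypes, as "src2" does for Publication.

    # Hypothetical mapping from conceptual steps to source-level XPath
    # fragments; in CXPath the real translation is driven by the schema
    # mappings of the data integration system.
    MAPPINGS = {
        "src1": {"Publication": "//doc", "title": "title"},
        "src2": {"Publication": "(//paper|//thesis)", "title": "heading/title"},
    }

    def translate(conceptual_steps, source):
        """Translate a conceptual path (a list of steps) into one source's XPath."""
        table = MAPPINGS[source]
        return "/".join(table[step] for step in conceptual_steps)

    print(translate(["Publication", "title"], "src2"))
    # (//paper|//thesis)/heading/title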
214

L'interaction au service de l'optimisation à grande échelle des entrepôts de données relationnels / Interaction for the large-scale optimization of relational data warehouses

Kerkad, Amira 11 December 2013 (has links)
Database technology is a natural setting for interaction, which may involve several components of the DBMS: (a) the data, (b) the queries, (c) the optimization techniques and (d) the storage devices. At the data level, correlations between attributes are extremely common in real-world relational data and have been exploited to define materialized views and indexes. At the query level, interaction has been studied extensively as the multi-query optimization problem. Data warehouses, with their star-join queries, increase the rate of interaction. Query interaction has been used to select optimization techniques such as indexes, and it also contributes to the combined selection of multiple optimization techniques such as materialized views, indexes, data partitioning and clustering. Existing studies consider interaction within a single component. This thesis considers multi-component interaction with three optimization techniques, each concerning one component: query scheduling (query level), horizontal data partitioning (data level) and buffer management (device level). Query scheduling (QS) defines an optimal execution order for queries so that some queries can benefit from already processed data. Horizontal data partitioning (HDP) divides the instances of each relation into disjoint subsets. Buffer management (BM) allocates and replaces data in the available buffer pool to reduce the cost of the workload. These problems are usually treated either in isolation or in pairs, such as BM and QS; however, they are similar and complementary. A thorough formalization of both the off-line and online versions of the problems is provided, and a set of advanced algorithms inspired by the natural behaviour of bees is proposed. The proposals are validated using a simulator and a real DBMS (Oracle) on the Star Schema Benchmark at large scale.
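The thesis's algorithms are bee-inspired and address the joint problem; as a much simpler stand-in that shows how the three components interact, the following Python sketch greedily schedules queries so that each next query shares as many horizontal fragments as possible with what the buffer currently holds. The fragment-set representation of queries and the all-or-nothing buffer policy are simplifying assumptions for the sketch.

    def schedule(queries):
        """Greedy query scheduling over fragment sets: at each step run the
        pending query that reuses the most currently buffered fragments."""
        pending = dict(queries)  # query name -> set of fragment ids it reads
        buffer, order = set(), []
        while pending:
            name = max(pending, key=lambda q: len(pending[q] & buffer))
            order.append(name)
            buffer = pending.pop(name)  # naive policy: buffer the last query's fragments
            # A real buffer manager would evict selectively within a fixed quota.
        return order

    workload = {"Q1": {1, 2}, "Q2": {2, 3}, "Q3": {7, 8}}
    print(schedule(workload))  # ['Q1', 'Q2', 'Q3']: Q2 reuses fragment 2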
215

Diseño e implementación de un lenguaje de consulta para bases de datos de grafos / Design and implementation of a query language for graph databases

Ríos Díaz, Gonzalo Andrés January 2013 (has links)
Magíster en Ciencias, Mención Computación / Ingeniero Civil Matemático / Graph databases are a data model that has gained ground in recent years, driven by the need to model complex situations for which the relational model is not sufficient. This work introduces the topic with a review of the state of the art in graph databases, explaining some of their real-world applications, defining several of the theoretical models and analysing the most important existing implementations. From this analysis we conclude that the major gap in the field is the absence of a formal query language with a clear syntax and semantics and a good balance between expressiveness and complexity. Our proposal to close this gap is to use Converse-PDL as a query language for graph databases, defining it formally and proving that its theoretical complexity is optimal. We also show that the expressiveness of the language is sufficient for a wide range of applications. Once the language is defined, we design an efficient implementation, specifying the necessary algorithms and data structures while respecting the constraints of our computational model. We then describe the implementation itself, detailing the internal representations of its elements and backing the design decisions with experimental results, together with the improvements and optimizations made to obtain the best possible efficiency. To ease future development, we document the source files and the implemented user interface. To validate the implementation, we design an experiment that quantitatively evaluates its performance and run it against three of the most competitive existing implementations for an objective comparison. We close with the conclusions drawn from the experiments, highlighting the most important aspects of our implementation and outlining ideas for future work.
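To make the choice of Converse-PDL concrete, here is a hedged toy evaluator in Python for PDL-style path expressions over an edge-labelled graph, computing each expression's meaning as a set of node pairs. The tuple-based expression syntax is invented for the sketch, and the thesis's actual implementation uses far more careful algorithms and data structures than this quadratic set algebra.

    def eval_path(expr, edges, nodes):
        """Evaluate a PDL-style path expression to a set of node pairs.
        edges: set of (source, label, target); nodes: set of node ids.
        Toy grammar: ('label', a), ('conv', e), ('seq', e1, e2),
        ('union', e1, e2), ('star', e)."""
        op = expr[0]
        if op == "label":      # all edges carrying this label
            return {(u, v) for (u, lbl, v) in edges if lbl == expr[1]}
        if op == "conv":       # converse: traverse edges backwards
            return {(v, u) for (u, v) in eval_path(expr[1], edges, nodes)}
        if op == "seq":        # relational composition
            r1 = eval_path(expr[1], edges, nodes)
            r2 = eval_path(expr[2], edges, nodes)
            return {(u, w) for (u, v) in r1 for (x, w) in r2 if v == x}
        if op == "union":
            return eval_path(expr[1], edges, nodes) | eval_path(expr[2], edges, nodes)
        if op == "star":       # reflexive-transitive closure, to fixpoint
            result = {(n, n) for n in nodes}
            step = eval_path(expr[1], edges, nodes)
            while True:
                grown = result | {(u, w) for (u, v) in result
                                  for (x, w) in step if v == x}
                if grown == result:
                    return result
                result = grown
        raise ValueError(f"unknown operator: {op}")

    g = {(1, "knows", 2), (2, "knows", 3)}
    print(eval_path(("star", ("conv", ("label", "knows"))), g, {1, 2, 3}))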
216

Semantos: a semantically smart information query language

Crous, Theodorus 29 November 2009 (has links)
Enterprise Information Integration (EII) is rapidly becoming one of the pillars of modern corporate information systems. Given the spread and diversity of information sources in an enterprise, it has become increasingly difficult for decision makers to have access to relevant and accurate information at the opportune time. It has therefore become critical to seamlessly integrate the diverse information stores found in an organization into a single coherent data source. This is the job of EII, and one of the keys to making it work is harnessing the implied meaning, or semantics, hidden within data sources. Modern EII systems are capable of harnessing semantic information and ontologies to make integration across data stores possible. These systems do not, however, allow a consumer of the integration service to build queries with semantic meaning, because most EII systems use XQuery, SQL, or both as query languages, and neither has the capability to build semantically rich queries. In this thesis Semantos (from the Greek word sema for “sign or token”) is proposed as a viable alternative: an information query language based in XML, which is capable of exploiting ontologies, enabling consumers to build semantically enriched queries. The characteristics and requirements that Semantos needs to satisfy as a semantically smart information query language are explored, and from these requirements a software implementation is designed and developed. The benefit of Semantos is a query structure that allows automated processes to decompose and restructure queries without human intervention. We demonstrate the applicability of Semantos using two realistic examples, a query enhancement service and a query translation service, both of which show how a Semantos query can be manipulated by automated services to achieve information integration goals. / Dissertation (MSc)--University of Pretoria, 2009. / Computer Science / unrestricted
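Since the thesis's contribution is a query language whose XML structure lets automated services decompose and restructure queries, a tiny hypothetical example may help. The Python sketch below invents a Semantos-like query document (the element and attribute names are not the actual Semantos schema) and shows a toy enhancement service that rewrites the query XML to add ontology subclasses of a selected concept, with no human intervention.

    import xml.etree.ElementTree as ET

    # Invented Semantos-like query; the real schema is defined in the thesis.
    query = ET.fromstring(
        "<query><select concept='Vehicle'/>"
        "<where property='colour' value='red'/></query>")

    # Toy ontology fragment an enhancement service might consult.
    ONTOLOGY = {"Vehicle": ["Car", "Truck"]}

    def enhance(q):
        """Expand each selected concept with its ontology subclasses by
        rewriting the query document itself."""
        for sel in list(q.iter("select")):  # list(): don't iterate while mutating
            for sub in ONTOLOGY.get(sel.get("concept"), []):
                ET.SubElement(q, "select", {"concept": sub})
        return q

    print(ET.tostring(enhance(query), encoding="unicode"))
    # <query>...<select concept="Car" /><select concept="Truck" /></query>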
217

Plan Bouquets: An Exploratory Approach to Robust Query Processing

Dutt, Anshuman January 2016 (has links) (PDF)
Over the last four decades, relational database systems, with their mathematical basis in first-order logic, have provided a congenial and efficient environment to handle enterprise data during its entire life cycle of generation, storage, maintenance and processing. An organic reason for their pervasive popularity is intrinsic support for declarative user queries, wherein the user only specifies the end objectives, and the system takes on the responsibility of identifying the most efficient means, called “plans”, to achieve these objectives. A crucial input to generating efficient query execution plans is the set of compile-time estimates of the data volumes output by the operators implementing the algebraic predicates present in the query. These volume estimates are typically computed using the “selectivities” of the predicates. Unfortunately, a pervasive problem encountered in practice is that these selectivities often differ significantly from the values actually encountered during query execution, leading to poor plan choices and grossly inflated response times. While the database research community has spent considerable effort addressing this challenge, the prior techniques all suffer from a systemic limitation: the inability to provide any guarantees on execution performance. In this thesis, we materially address this long-standing open problem by developing a radically different query processing strategy that lends itself to attractive guarantees on run-time performance. Specifically, in our approach, the compile-time estimation process is completely eschewed for error-prone selectivities. Instead, from the set of optimal plans in the query’s selectivity error space, a limited subset called the “plan bouquet” is selected such that at least one of the bouquet plans is 2-optimal at each location in the space. Then, at run time, an exploratory sequence of cost-budgeted executions from the plan bouquet is carried out, eventually finding a plan that executes to completion within its assigned budget. The duration and switching of these executions is controlled by a graded progression of isosurfaces projected onto the optimal performance profile. We prove that this construction provides viable guarantees on the worst-case performance relative to an oracular system that magically possesses accurate apriori knowledge of all selectivities. Moreover, it ensures repeatable execution strategies across different invocations of a query, an extremely desirable feature in industrial settings. Our second contribution is a suite of techniques that substantively improve the performance guarantees offered by the basic bouquet algorithm. First, we present an algorithm that skips carefully chosen executions from the basic plan bouquet sequence, leveraging the observation that an expensive execution may provide better coverage than a series of cheaper siblings, thereby reducing the aggregate exploratory overheads. Next, we explore randomized variants with regard to both the sequence of plan executions and the constitution of the plan bouquet, and show that the resulting guarantees are markedly superior, in expectation, to the corresponding worst-case values. From a deployment perspective, these techniques are appealing since they are completely “black-box”, that is, non-invasive with regard to the database engine, implementable using only API features that are commonly available in modern systems.
As a proof of concept, the bouquet approach has been fully prototyped in QUEST, a Java-based tool that provides a visual and interactive demonstration of the bouquet identification and execution phases. In a similar spirit, we propose an efficient isosurface identification algorithm that avoids exploring large portions of the error space and drastically reduces the effort involved in bouquet construction. The plan bouquet approach is ideally suited for “canned” query environments, where the computational investment in bouquet identification is amortized over multiple query invocations. The final contribution of this thesis extends the advantage of compile-time sub-optimality guarantees to ad hoc query environments, where the overheads of off-line bouquet identification may turn out to be impractical. Specifically, we propose a completely revamped bouquet algorithm that constructs the cost-budgeted execution sequence “on the fly”. This is achieved through a “white-box” interaction style with the engine, whereby the plan output cardinalities exposed by the engine are used to compute lower bounds on the error-prone selectivities during plan executions. For this algorithm, the sub-optimality guarantees take the form of a low-order polynomial in the number of error-prone selectivities in the query. The plan bouquet approach has been empirically evaluated on both PostgreSQL and a commercial engine, ComOpt, over the TPC-H and TPC-DS benchmark environments. Our experimental results indicate that it delivers orders-of-magnitude improvements in worst-case behavior, without impairing average-case performance, compared to the native optimizers of these systems. In absolute terms, the worst-case sub-optimality is upper bounded by 20 across the suite of queries, and the average performance is empirically found to be within a factor of 4 with respect to the optimal. Even with the on-the-fly bouquet algorithm, the guarantees are within a factor of 3 of those achievable in the corresponding canned query environment. Overall, the plan bouquet approach provides novel performance guarantees that open up exciting possibilities for robust query processing.
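The run-time half of the approach is easy to state as a skeleton, and the hedged Python sketch below renders it: budgets grow geometrically from one cost isosurface to the next, and at each budget every bouquet plan is tried until one completes within its allotment. Here run_with_budget is an assumed engine hook (returning None for an execution killed at its budget); the actual budget progression and plan ordering follow the isosurface construction described above.

    def bouquet_execute(bouquet, run_with_budget, start_budget=1.0):
        """Cost-budgeted exploratory execution of a plan bouquet.
        bouquet: plans ordered cheap-to-expensive; run_with_budget(plan, b)
        returns the result if the plan finishes within cost b, else None."""
        budget = start_budget
        while True:
            for plan in bouquet:
                result = run_with_budget(plan, budget)
                if result is not None:        # finished within its allotment
                    return plan, budget, result
            budget *= 2.0                     # advance to the next isosurface

The geometric doubling is what makes the guarantee work: the total cost of the killed executions stays within a factor (proportional to the bouquet size) of the final successful execution.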
218

KNN Query Processing in Wireless Sensor and Robot Networks

Xie, Wei January 2014 (has links)
In Wireless Sensor and Robot Networks (WSRNs), static sensors report event information to one of the robots. In the k nearest neighbour (KNN) query processing problem in WSRNs, the robot that receives the event report needs to find the exact k nearest robots, among those connected to it, to react to the event. We are interested in localized solutions, which avoid flooding messages to the whole network. Several existing methods restrict the search to a predetermined boundary. Some network-density-based estimation algorithms have been proposed, but they either generate heavy message traffic or require the density information of the whole network in advance, which is complex to implement and lacks robustness. Algorithms built on tree structures incur excessive energy consumption and large latency from structure construction, while itinerary-based approaches suffer from large latency or unsatisfactory accuracy. In this thesis, we propose a new method to estimate a search boundary, a circle centred at the query point. Two algorithms are presented to disseminate the query message to the robots of interest and aggregate their data (e.g. the distance to the query point). Multiple Auction Aggregation (MAA) is based on an auction protocol, with multiple copies of the query message disseminated into the network to collect the best bid from each robot. Partial Depth First Search (PDFS) traverses all the robots of interest with a query message, gathering their data by depth-first search. The thesis also optimizes a traditional itinerary-based KNN query processing method, IKNN, and compares it with the proposed MAA and PDFS algorithms. The experimental results indicate that the overall performance of MAA and PDFS surpasses that of IKNN in WSRNs.
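As a hedged, centralised illustration of the boundary idea only (in a real WSRN the robots are reached by message dissemination via MAA or traversal via PDFS, not via a global list), the Python sketch below estimates a circular search boundary around the query point from an expected robot density and then selects the k nearest robots inside it; the density argument and the slack factor are assumptions of the sketch.

    import math

    def knn_boundary(query_pt, robots, k, density, slack=1.5):
        """Estimate a circular search boundary expected to hold >= k robots
        (density = robots per unit area) and pick the k nearest inside it."""
        radius = slack * math.sqrt(k / (math.pi * density))
        qx, qy = query_pt
        inside = sorted((d, r) for r, (x, y) in robots.items()
                        if (d := math.hypot(x - qx, y - qy)) <= radius)
        return [r for _, r in inside[:k]], radius

    robots = {"r1": (1, 0), "r2": (5, 5), "r3": (0, 2)}
    print(knn_boundary((0, 0), robots, k=2, density=0.1))
    # (['r1', 'r3'], 3.78...)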
219

Provenance of visual interpretations in the exploration of data

Al-Naser, Aqeel January 2015 (has links)
The thesis addresses the problem of capturing and tracking multi-user interpretations of 3D spatial datasets. These interpretations are made after the end of the visualization pipeline to identify and extract features of interest, and depend on subjective human intuition and knowledge. Users may also assess regions of these interpretations. Consequently, the thesis proposes a provenance-enabled interpretation pipeline. It adopts and extends the W3C PROV data model, producing a provenance model for visual interpretations, which was implemented for seismic imaging interpretation in a proof-of-concept prototype architecture and application. Users' accumulated interpretations and annotations are captured by the provenance model in fine-grained form, and the captured provenance information can be used to filter data. The work of this thesis was evaluated in three parts. First, a usability evaluation was conducted with postgraduate students in geoscience to illustrate the system's ability to let users amend others' interpretations and trace the history of amendments. Second, a conceptual evaluation was carried out through interviews with domain experts, which confirmed the importance of this research to industry; interviewees suggested potential uses of the work in the seismic interpretation workflow and highlighted its limitations and concerns. Third, a performance evaluation illustrated the behaviour of the architecture on commodity machines as well as on a multi-node parallel database, showing that fine-grained provenance can be implemented simply yet perform acceptably in realistic visualization tasks; the measurements indicate performance comparable to conventional methods. The proposed provenance model for an interpretation pipeline is believed to be a promising shift in methods of data management and storage, one that can record and preserve the interpretations users produce through visualization. The approach and software developed in this thesis represent a step in this direction.
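To give a feel for what a fine-grained provenance record of an amendment chain might look like, here is a minimal hypothetical Python sketch using PROV-style relation names (wasGeneratedBy, wasDerivedFrom, wasAssociatedWith); the ProvRecord class, its naming scheme and the flat triple store are inventions of the sketch, not the thesis's extended PROV model.

    from dataclasses import dataclass, field

    @dataclass
    class ProvRecord:
        entities: list = field(default_factory=list)
        relations: list = field(default_factory=list)  # (relation, subject, object)

        def amend(self, old_entity, user, note):
            """Record that `user` derived a new interpretation from an old one."""
            new_entity = f"{old_entity}/r{len(self.entities) + 1}"
            activity = f"amend:{user}:{len(self.relations)}"
            self.entities.append(new_entity)
            self.relations += [("wasGeneratedBy", new_entity, activity),
                               ("wasDerivedFrom", new_entity, old_entity),
                               ("wasAssociatedWith", activity, user),
                               ("note", activity, note)]
            return new_entity

        def history(self, entity):
            """Trace an interpretation back through its chain of amendments."""
            chain = [entity]
            while True:
                prev = [o for (rel, s, o) in self.relations
                        if rel == "wasDerivedFrom" and s == chain[-1]]
                if not prev:
                    return chain
                chain.append(prev[0])

    prov = ProvRecord(entities=["horizon-7"])
    v2 = prov.amend("horizon-7", "alice", "extended the fault pick")
    v3 = prov.amend(v2, "bob", "smoothed the boundary")
    print(prov.history(v3))  # newest to oldest, back to 'horizon-7'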
220

Towards automatic grading of SQL queries

Venkatamuniyappa, Vijay Kumar January 1900 (has links)
Master of Science / Department of Computer Science / Doina Caragea / An Introduction to Databases course involves learning the concepts of data storage, manipulation, and retrieval. Relational databases provide an ideal learning path for understanding database concepts, and the Structured Query Language (SQL) is the standard language for interacting with relational databases. Each database vendor implements a variation of the SQL standard, and a particular question that asks for some data can be written in many ways, using somewhat similar or structurally different SQL queries. Evaluating SQL queries for correctness involves verifying the SQL syntax and semantics, the output of the queries, and the usage of correct clauses. An evaluation tool should be independent of the specific database queried and of the nature of the queries, and should allow multiple ways of providing input and retrieving output. In this report, we have developed an evaluation tool for SQL queries, which checks the correctness of MySQL and PostgreSQL queries with the help of a parser that can identify SQL clauses. The tool acts as a portal for students to test and improve their queries, and finally to submit them for grading. It minimizes the manual effort required while grading by taking advantage of the SQL parser to check queries for correctness, provide feedback, and allow submission.
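The abstract's grading recipe (check syntax, compare outputs, inspect clauses) can be sketched in a few lines; the toy below uses SQLite purely for a self-contained demonstration, whereas the actual tool targets MySQL and PostgreSQL and also inspects clause usage via its SQL parser. The grade function and its scoring are inventions of the sketch.

    import sqlite3

    def grade(conn, student_sql, reference_sql, ordered=False):
        """Score a student's query by comparing its output with a reference
        query: 1.0 for a matching result set, 0.0 otherwise."""
        try:
            got = conn.execute(student_sql).fetchall()
        except sqlite3.Error as e:          # syntax or semantic error
            return 0.0, f"query error: {e}"
        want = conn.execute(reference_sql).fetchall()
        same = (got == want if ordered      # ORDER BY questions compare as-is
                else sorted(got) == sorted(want))
        return (1.0, "correct") if same else (0.0, "wrong result set")

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emp(name TEXT, dept TEXT)")
    conn.executemany("INSERT INTO emp VALUES (?, ?)",
                     [("ann", "cs"), ("bo", "ee")])
    print(grade(conn, "SELECT name FROM emp WHERE dept='cs'",
                "SELECT name FROM emp WHERE dept = 'cs'"))  # (1.0, 'correct')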
