21

Towards automatic grading of SQL queries

Venkatamuniyappa, Vijay Kumar January 1900
Master of Science / Department of Computer Science / Doina Caragea / An Introduction to Databases course involves learning the concepts of data storage, manipulation, and retrieval. Relational databases provide an ideal learning path for understanding database concepts. The Structured Query Language (SQL) is a standard language for interacting with relational databases. Each database vendor implements a variation of the SQL standard. Furthermore, a particular question that asks for some data can be answered in many ways, using somewhat similar or structurally different SQL queries. Evaluating SQL queries for correctness involves verifying the SQL syntax and semantics, as well as verifying the output of the queries and the usage of the correct clauses. An evaluation tool should be independent of the specific database queried and of the nature of the queries, and should allow multiple ways of providing input and retrieving output. In this report, we have developed an evaluation tool for SQL queries, which checks MySQL and PostgreSQL queries for correctness with the help of a parser that can identify SQL clauses. The tool acts as a portal for students to test and improve their queries, and finally to submit them for grading. It minimizes the manual effort required for grading by taking advantage of the SQL parser to check queries for correctness, provide feedback, and allow submission.
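A minimal sketch of the output-comparison step such a grader performs, assuming an in-memory SQLite database in place of the MySQL/PostgreSQL back ends the report targets; the function and table names are hypothetical:

```python
import sqlite3

# Hypothetical simplification: run the student query and a reference
# solution against the same test database and compare their result sets.
def grade_query(student_sql, reference_sql, setup_sql):
    conn = sqlite3.connect(":memory:")
    conn.executescript(setup_sql)
    try:
        student_rows = conn.execute(student_sql).fetchall()
    except sqlite3.Error as e:
        return 0.0, f"Syntax/semantic error: {e}"
    reference_rows = conn.execute(reference_sql).fetchall()
    # Compare as multisets so row order does not affect the grade unless
    # the assignment explicitly requires an ORDER BY.
    if sorted(student_rows) == sorted(reference_rows):
        return 1.0, "Output matches the reference solution."
    return 0.5, "Query runs but output differs; feedback shown to student."

setup = """
CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, gpa REAL);
INSERT INTO students VALUES (1, 'Ada', 3.9), (2, 'Alan', 3.4);
"""
score, feedback = grade_query(
    "SELECT name FROM students WHERE gpa > 3.5",
    "SELECT name FROM students WHERE gpa > 3.5",
    setup,
)
print(score, feedback)
```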
22

Word Embeddings in Database Systems

Günther, Michael 18 November 2021
Recent research in natural language processing (NLP) focuses on the development of learned language models, called word embedding models, such as word2vec, fastText, and BERT. Pre-trained on large amounts of unstructured text in natural language, these embedding models constitute a rich source of common knowledge in the domain of the text used for training. In the NLP community, significant improvements are achieved by using these models together with deep neural network models. To enable applications to benefit from word embeddings, we extend the capabilities of traditional relational database systems, which are still by far the most common DBMSs but provide only limited text analysis features. Therefore, we implement (a) novel database operations involving embedding representations that allow a database user to exploit the knowledge encoded in word embedding models for advanced text analysis operations. The integration of these operations into the database query language enables users to construct queries that combine novel word embedding operations with the traditional query capabilities of SQL. To allow efficient retrieval of embedding representations and fast execution of the operations, we implement (b) novel search algorithms and index structures for approximated kNN-joins and integrate them into a relational database management system. Moreover, we investigate techniques to optimize embedding representations of text values in database systems. To that end, we design (c) a novel context adaptation algorithm, which utilizes the structured data present in the database to enrich the embedding representations of text values and model their context-specific semantics in the database. Besides, we provide (d) support for selecting a word embedding model suitable for a user's application, for which we develop a data processing pipeline that constructs a dataset for domain-specific word embedding evaluation. Finally, we propose (e) novel embedding techniques for pre-training on tabular data to support applications working with text values in tables. Our proposed embedding techniques model semantic relations arising from the alignment of words in tabular layouts, relations that can hardly be derived from text documents, e.g., relations between the table schema and the table body.
In this way, many applications can profit from the proposed embedding techniques, whether they employ embeddings in supervised machine learning models (e.g., to classify cells in spreadsheets) or apply arithmetic operations to them (e.g., table discovery applications).

Table of contents:
1 INTRODUCTION
1.1 Contribution
1.2 Outline
2 REPRESENTATION OF TEXT FOR NATURAL LANGUAGE PROCESSING
2.1 Natural Language Processing Systems
2.2 Word Embedding Models
2.2.1 Matrix Factorization Methods
2.2.2 Learned Distributed Representations
2.2.3 Contextualized Word Embeddings
2.2.4 Advantages of Contextualized and Static Word Embeddings
2.2.5 Properties of Static Word Embeddings
2.2.6 Node Embeddings
2.2.7 Non-Euclidean Embedding Techniques
2.3 Evaluation of Word Embeddings
2.3.1 Similarity Evaluation
2.3.2 Analogy Evaluation
2.3.3 Cluster-based Evaluation
2.4 Application for Tabular Data
2.4.1 Semantic Search
2.4.2 Data Curation
2.4.3 Data Discovery
3 SYSTEM OVERVIEW
3.1 Opportunities of an Integration
3.2 Characteristics of Word Vectors
3.3 Objectives and Challenges
3.4 Word Embedding Operations
3.5 Performance Optimization of Operations
3.6 Context Adaptation
3.7 Requirements for Model Recommendation
3.8 Tabular Embedding Models
4 MANAGEMENT OF EMBEDDING REPRESENTATIONS IN DATABASE SYSTEMS
4.1 Integration of Operations in an RDBMS
4.1.1 System Architecture
4.1.2 Storage Formats
4.1.3 User-Defined Functions
4.1.4 Web Application
4.2 Nearest Neighbor Search
4.2.1 Tree-based Methods
4.2.2 Proximity Graphs
4.2.3 Locality-Sensitive Hashing
4.2.4 Quantization Techniques
4.3 Applicability of ANN Techniques for Word Embedding kNN-Joins
4.4 Related Work on kNN Search in Database Systems
4.5 ANN-Joins for Relational Database Systems
4.5.1 Index Architecture
4.5.2 Search Algorithm
4.5.3 Distance Calculation
4.5.4 Optimization Capabilities
4.5.5 Estimation of the Number of Targets
4.5.6 Flexible Product Quantization
4.5.7 Further Optimizations
4.5.8 Parameter Tuning
4.5.9 kNN-Joins for Word2Bits
4.6 Evaluation
4.6.1 Experimental Setup
4.6.2 Influence of Index Parameters on Precision and Execution Time
4.6.3 Performance of Subroutines
4.6.4 Flexible Product Quantization
4.6.5 Accuracy of the Target Size Estimation
4.6.6 Performance of Word2Bits kNN-Join
4.7 Summary
5 CONTEXT ADAPTATION FOR WORD EMBEDDING OPTIMIZATION
5.1 Related Work
5.1.1 Graph and Text Joint Embedding Methods
5.1.2 Retrofitting Approaches
5.1.3 Table Embedding Models
5.2 Relational Retrofitting Approach
5.2.1 Data Preparation
5.2.2 Relational Retrofitting Problem
5.2.3 Relational Retrofitting Algorithm
5.2.4 Online-RETRO
5.3 Evaluation Platform: Retro Live
5.3.1 Functionality
5.3.2 Interface
5.4 Evaluation
5.4.1 Datasets
5.4.2 Training of Embeddings
5.4.3 Machine Learning Models
5.4.4 Evaluation of ML Models
5.4.5 Run-time Measurements
5.4.6 Online Retrofitting
5.5 Summary
6 MODEL RECOMMENDATION
6.1 Related Work
6.1.1 Extrinsic Evaluation
6.1.2 Intrinsic Evaluation
6.2 Architecture of FacetE
6.3 Evaluation Dataset Construction Pipeline
6.3.1 Web Table Filtering and Facet Candidate Generation
6.3.2 Check Soft Functional Dependencies
6.3.3 Post-Filtering
6.3.4 Categorization
6.4 Evaluation of Popular Word Embedding Models
6.4.1 Domain-Agnostic Evaluation
6.4.2 Evaluation of a Single Facet
6.4.3 Evaluation of an Object Set
6.5 Summary
7 TABULAR TEXT EMBEDDINGS
7.1 Related Work
7.1.1 Static Table Embedding Models
7.1.2 Contextualized Table Embedding Models
7.2 Web Table Embedding Model
7.2.1 Preprocessing
7.2.2 Text Serialization
7.2.3 Encoding Model
7.2.4 Embedding Training
7.3 Applications for Table Embeddings
7.3.1 Table Union Search
7.3.2 Classification Tasks
7.4 Evaluation
7.4.1 Intrinsic Evaluation
7.4.2 Table Union Search Evaluation
7.4.3 Table Layout Classification
7.4.4 Spreadsheet Cell Classification
7.5 Summary
8 CONCLUSION
8.1 Summary
8.2 Directions for Future Work
BIBLIOGRAPHY
LIST OF FIGURES
LIST OF TABLES
A CONVEXITY OF RELATIONAL RETROFITTING
B EVALUATION OF THE RELATIONAL RETROFITTING HYPERPARAMETERS
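As a rough illustration of contribution (a), the sketch below registers a cosine-similarity function as a SQL UDF so that an embedding operation can be combined with ordinary query clauses. It uses SQLite and JSON-serialized vectors purely for self-containment; the dissertation's system integrates with a full RDBMS and relies on the index structures of contribution (b) for efficiency:

```python
import json
import math
import sqlite3

# Cosine similarity over two JSON-serialized vectors stored with the rows.
def cosine_similarity(a_json, b_json):
    a, b = json.loads(a_json), json.loads(b_json)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

conn = sqlite3.connect(":memory:")
conn.create_function("cosine_sim", 2, cosine_similarity)
conn.executescript("""
CREATE TABLE products (name TEXT, embedding TEXT);
INSERT INTO products VALUES
  ('laptop',   '[0.9, 0.1, 0.0]'),
  ('notebook', '[0.8, 0.2, 0.1]'),
  ('banana',   '[0.0, 0.1, 0.9]');
""")
# The embedding operation used in conjunction with ordinary SQL:
# a kNN-style query ranking rows by similarity to a probe vector.
for row in conn.execute(
    "SELECT name, cosine_sim(embedding, ?) AS sim "
    "FROM products ORDER BY sim DESC LIMIT 2",
    ('[0.85, 0.15, 0.05]',),
):
    print(row)
```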
23

Using ontologies to semantify a Web information portal

Chimamiwa, Gibson 01 1900
Ontology, an explicit specification of a shared conceptualisation, captures knowledge about a specific domain of interest. The realisation of ontologies revolutionised the way data stored in relational databases is accessed and manipulated, through ontology and database integration. When integrating ontologies with relational databases, several choices exist regarding aspects such as database implementation, ontology language features, and mappings. However, it is unclear which aspects are relevant and when they affect specific choices. This makes it difficult to decide which choices to make and to understand their implications for ontology and database integration solutions. Within this study, a decision-making tool is developed that guides users when selecting a technology and developing a solution that integrates ontologies with relational databases. A theory analysis is conducted to determine the current status of technologies that integrate ontologies with databases. Furthermore, a theoretical study is conducted to determine the important features affecting ontology and database integration, the ontology language features, and the choices one needs to make for each technology. Based on these building blocks, an artifact-building approach is used to develop the decision-making tool, which is verified through a proof-of-concept to demonstrate its usefulness. Key terms: Ontology, semantics, relational database, ontology and database integration, mapping, Web information portal. / Information Science / M. Sc. (Information Systems)
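As one illustration of the kind of integration choice the tool covers, the following sketch performs a direct, R2RML-style mapping of relational rows to ontology individuals using the rdflib library; the ontology, table, and property names are hypothetical:

```python
from rdflib import Graph, Literal, Namespace, RDF, URIRef

EX = Namespace("http://example.org/ontology#")

rows = [  # stand-in for a SQL result set: (id, name, supervisor)
    (1, "Alice", "Prof. X"),
    (2, "Bob", "Prof. Y"),
]

g = Graph()
g.bind("ex", EX)
for student_id, name, supervisor in rows:
    s = URIRef(f"http://example.org/student/{student_id}")
    g.add((s, RDF.type, EX.Student))        # row -> class instance
    g.add((s, EX.name, Literal(name)))      # column -> datatype property
    g.add((s, EX.supervisedBy, Literal(supervisor)))

print(g.serialize(format="turtle"))
```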
24

Integration of relational database metadata and XML technology to develop an abstract framework to generate automatic and dynamic web entry forms

Elsheh, Mohammed Mosbah January 2009
Developing interactive Web application systems requires a large amount of effort on designing the database, the system logic, and the user interface. These tasks are expensive and error-prone. Web application systems are accessed and used by many different sets of people with different backgrounds and numerous demands. Meeting these demands requires frequent updates to Web application systems, which makes maintenance a very costly process. Thus, many attempts have been made to automate, to some degree, the construction of Web user interfaces. Three main directions have been cited for this purpose. The first direction suggested generating user interfaces from the application's data model. This path was able to generate the static layout of user interfaces, with dynamic behaviour specified programmatically. The second suggested deploying the domain model to generate both the layout of a user interface and its dynamic behaviour. Web applications built on this approach are most useful for domain-specific interfaces with a relatively fixed user dialogue. The last direction adopted the notion of deploying database metadata to develop dynamic user interfaces. Although the notion was quite valuable, its deployment did not present a generic solution for generating a variety of types of dynamic Web user interface targeting several platforms and electronic devices. This thesis inherits the latter direction and presents significant improvements on its current deployment. It aims to contribute towards the development of an abstract framework that generates abstract and dynamic Web user interfaces not targeted at any particular domain or platform. To achieve this target, the thesis proposes and evaluates a general notion for implementing a prototype system that uses an internal model (i.e. database metadata) in conjunction with XML technology. Database metadata is richer than any external model and provides the information needed to build dynamic user interfaces. In addition, XML technology has become the mainstream way of presenting and storing data in an abstract structure. It is widely adopted in the Web development community because of its ability to be transformed into many different formats with little effort. This thesis finds that only Java provides a generalised framework based on database metadata; other programming languages place restrictions on accessing and extracting database metadata from numerous database management systems. Consequently, Java Servlets and a relational database were used to implement the proposed framework, with Java Database Connectivity (JDBC) bridging the two technologies. The implementation of the proposed approach shows that it is possible and very straightforward to produce different automatic and dynamic Web entry forms that are not targeted at any particular platform. In addition, the approach can be applied to a particular domain without affecting the main notion or framework architecture. The implemented approach demonstrates a number of advantages over approaches based on other external or internal models.
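A minimal sketch of the metadata-to-XML idea, assuming SQLite's PRAGMA introspection in place of the JDBC DatabaseMetaData interface the thesis uses; the emitted abstract XML form description could then be transformed (e.g., by XSLT) for different platforms:

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, "
             "name TEXT NOT NULL, salary REAL)")

# Read column metadata and emit an abstract form description:
# each column becomes a form field with its type and a required flag.
form = ET.Element("form", table="employee")
for cid, name, col_type, notnull, default, pk in conn.execute(
        "PRAGMA table_info(employee)"):
    ET.SubElement(form, "field",
                  name=name, type=col_type,
                  required="true" if notnull or pk else "false")
print(ET.tostring(form, encoding="unicode"))
```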
25

Semantinės informacijos išrinkimo iš reliacinių duomenų bazių metodas taikant ontologijas / Method of Semantic Information Retrieval from Relational Databases Using Ontologies

Šukys, Algirdas 26 August 2010
Ontologies are becoming increasingly popular because they allow organisations to describe their problem domains more flexibly, to search for information from multiple sources, and to give semantically more precise results to users. However, as the amount of information in an ontology grows, storing it in a text file becomes inefficient. The aim of this research is to improve the possibilities of querying large ontologies when these are kept in relational databases. To this end, a method was created for executing SPARQL queries over an ontology stored in a relational database according to the OWL2RDB algorithm. Experiments confirmed that, for ontologies with many individuals, the method executes queries faster against the relational store than against the same ontology stored in a text file, and improves query performance in comparison with an existing query engine.
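A minimal sketch of the querying side, using rdflib's in-memory SPARQL engine as a stand-in for the thesis's relational OWL2RDB store; the data and namespace are illustrative:

```python
from rdflib import Graph, Namespace, RDF

EX = Namespace("http://example.org/uni#")
g = Graph()
g.add((EX.algirdas, RDF.type, EX.Student))
g.add((EX.algirdas, EX.enrolledIn, EX.databases))
g.add((EX.databases, RDF.type, EX.Course))

# SPARQL query; the thesis's method would translate this to SQL over
# the relational tables produced by the OWL2RDB algorithm.
results = g.query("""
    PREFIX ex: <http://example.org/uni#>
    SELECT ?student ?course
    WHERE { ?student a ex:Student ; ex:enrolledIn ?course . }
""")
for student, course in results:
    print(student, course)
```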
26

'n Ondersoek na en bydraes tot navraaghantering en -optimering deur databasisbestuurstelsels / An investigation of and contributions to query handling and optimisation by database management systems / L. Muller

Muller, Leslie January 2006
The problems associated with the effective design and use of databases are increasing. The information contained in a database is becoming more complex, and the size of the data is causing space problems. Technology must continually develop to accommodate this growing need. An inquiry was conducted in order to find effective guidelines that could support queries in general in terms of performance and productivity. Two database management systems were researched to compare the theoretical aspects with the techniques implemented in practice. Microsoft SQL Server and MySQL were chosen as the candidates and both were put under close scrutiny. The systems were researched to uncover the methods employed by each to manage queries. The query optimizer forms the basis of each of these systems and manages the parsing and execution of any query. The methods employed by each system for storing data were researched, as were the ways each system manages table joins, uses indexes, and chooses optimal execution plans. Adjusted algorithms were introduced for various index processes like B+ trees and hash indexes. Guidelines were compiled that are independent of the database management systems and help to optimize relational databases. Practical implementations of queries were used to acquire and analyse the execution plan for both MySQL and SQL Server. This plan, along with a few other variables such as execution time, is discussed for each system. A model is used for both database management systems in this experiment. / Thesis (M.Sc. (Computer Science))--North-West University, Potchefstroom Campus, 2007.
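A minimal sketch of the kind of execution-plan comparison described above, using SQLite's EXPLAIN QUERY PLAN for self-containment instead of the SQL Server and MySQL tooling the study used:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders "
             "(id INTEGER PRIMARY KEY, customer TEXT, total REAL)")

query = "SELECT * FROM orders WHERE customer = 'Smith'"

print("Without index:")
for row in conn.execute("EXPLAIN QUERY PLAN " + query):
    print(" ", row)          # expect a full table scan

conn.execute("CREATE INDEX idx_customer ON orders(customer)")
print("With index:")
for row in conn.execute("EXPLAIN QUERY PLAN " + query):
    print(" ", row)          # expect a search using idx_customer
```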
27

The Use of Relation Valued Attributes in Support of Fuzzy Data

Williams, Larry Ritchie, Jr. 03 May 2013
In his paper introducing fuzzy sets, L.A. Zadeh describes the difficulty of assigning some real-world objects to a particular class when the notion of class membership is ambiguous. If exact classification is not obvious, most people approximate using intuition and may reach agreement by placing an object in more than one class. Numbers, or ‘degrees of membership’, within these classes are used to provide an approximation that supports this intuitive process. The result is a ‘fuzzy set’, which consists of any number of ordered pairs representing each class together with its degree of membership, giving a formal representation that can be used to model this process. Although the fuzzy approach to reasoning and classification makes sense, it does not comply with two of the basic principles of classical logic: the laws of contradiction and the excluded middle. While these play a significant role in logic, it is the violation of these principles that gives fuzzy logic its useful characteristics. The problem with this representation within a database system, however, is that the class and its degree of membership are represented by two separate but indivisible attributes. Further, the representation may contain any number of such pairs of attributes. While the data for class and membership are maintained in individual attributes, neither attribute may exist without the other without sacrificing meaning, and maintaining a variable number of such pairs within the representation is problematic. C. J. Date suggested the relation valued attribute (RVA), which can not only encapsulate the attributes associated with the fuzzy set and impose constraints on their use, but also provide a relation that may contain any number of such pairs. The goal of this dissertation is to establish a context in which the relational database model can be extended through the implementation of an RVA to support fuzzy data on an actual system. This goal represents an opportunity to study, through application and observation, the use of fuzzy sets to support imprecise and uncertain data using database queries that appropriately adhere to the relational model. The intent is to create a pathway that may extend the support of database applications that need fuzzy logic and/or fuzzy data.
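A minimal sketch of an RVA holding a fuzzy set, in Python rather than an extended relational DBMS; the attribute names and the alpha-cut selection are illustrative:

```python
# Each tuple's height_class attribute is itself a small relation of
# (class, membership) pairs, so a pair stays indivisible and any number
# of pairs can be stored per tuple.
people = [
    {
        "name": "Pat",
        "height_cm": 178,
        "height_class": frozenset({("average", 0.4), ("tall", 0.6)}),
    },
    {
        "name": "Lee",
        "height_cm": 150,
        "height_class": frozenset({("short", 0.9), ("average", 0.1)}),
    },
]

def members_of(relation, fuzzy_class, alpha):
    """Select tuples whose fuzzy set contains fuzzy_class with
    membership >= alpha (an alpha-cut over the RVA)."""
    return [t["name"] for t in relation
            if any(c == fuzzy_class and mu >= alpha
                   for c, mu in t["height_class"])]

print(members_of(people, "tall", 0.5))   # ['Pat']
```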
28

Attribute-Level Versioning: A Relational Mechanism for Version Storage and Retrieval

Bell, Charles Andrew 01 January 2005
Data analysts today have at their disposal a seemingly endless supply of data repositories and, hence, datasets from which to draw. New datasets become available daily, making the choice of which dataset to use difficult. Furthermore, traditional data analysis has been conducted using structured data repositories such as relational database management systems (RDBMS). These systems, by their nature and design, prohibit duplication in indexed collections, forcing analysts to choose one value for each of the available attributes for an item in the collection. Often analysts discover two or more datasets with information about the same entity. When combining this data and transforming it into a form that is usable in an RDBMS, analysts are forced to deconflict the collisions and choose a single value for each duplicated attribute containing differing values. This deconfliction is the source of a considerable amount of guesswork and speculation on the part of the analyst in the absence of professional intuition. One must consider what is lost by discarding those alternative values. Are there relationships between the conflicting datasets that have meaning? Is each dataset presenting a different and valid view of the entity, or are the alternate values erroneous? If so, which values are erroneous? Is there historical significance in the variances? The analysis of modern datasets requires the use of specialized algorithms and storage and retrieval mechanisms to identify, deconflict, and assimilate variances of attributes for each entity encountered. These variances, or versions of attribute values, contribute meaning to the evolution and analysis of the entity and its relationship to other entities. A new, distinct storage and retrieval mechanism will enable analysts to efficiently store, analyze, and retrieve attribute versions without unnecessary complexity or additional alterations of the original or derived dataset schemas. This paper presents technologies and innovations that assist data analysts in discovering meaning within their data while preserving all of the original data for every entity in the RDBMS.
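A minimal sketch of an attribute-level versioning schema of this general shape (the paper's actual mechanism may differ); the tables and data are illustrative:

```python
import sqlite3

# Instead of one value per attribute, each (entity, attribute) pair may
# hold several versioned values, so conflicting source datasets need not
# be deconflicted by discarding data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entity (id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE attribute_version (
    entity_id INTEGER REFERENCES entity(id),
    attribute TEXT,
    value     TEXT,
    source    TEXT,     -- which dataset contributed the value
    version   INTEGER,  -- ordering of versions per attribute
    PRIMARY KEY (entity_id, attribute, version)
);
INSERT INTO entity VALUES (1, 'ACME Corp');
INSERT INTO attribute_version VALUES
  (1, 'headquarters', 'Springfield', 'dataset_a', 1),
  (1, 'headquarters', 'Shelbyville', 'dataset_b', 2);
""")

# All versions are preserved and queryable, e.g. the conflict history:
for row in conn.execute("""
    SELECT attribute, value, source FROM attribute_version
    WHERE entity_id = 1 ORDER BY version"""):
    print(row)
```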
29

Fuzzy databáze založená na E-R schématu / Fuzzy database based on an E-R schema

Plachý, Milan January 2012
This text is intended especially for those who are interested in fuzzy logic and its application in relational databases. It is mainly focused on the concept of a fuzzified relational database and the implementation of such a database. The text consists of two parts: the theoretical aspects of fuzzification and an implementation part. The selected extension is based on a fuzzy E-R model, so that the requirements of the real world can be better met. The paper also describes existing solutions at different levels of fuzzification. Part of the work is the design and implementation of simple software for querying a fuzzified relational database. The work should also serve as a guide for the design and implementation of a fuzzy database.
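A minimal sketch of the kind of fuzzy querying such software performs, with an assumed trapezoidal membership function for the linguistic term 'young'; the parameters and threshold are illustrative:

```python
def young(age):
    """Trapezoidal membership: fully young up to 25, not young from 40."""
    if age <= 25:
        return 1.0
    if age >= 40:
        return 0.0
    return (40 - age) / 15.0

employees = [("Jan", 23), ("Petr", 31), ("Eva", 45)]

# Fuzzy selection with an alpha-cut of 0.5, results ranked by membership.
matches = sorted(
    ((name, young(age)) for name, age in employees if young(age) >= 0.5),
    key=lambda pair: pair[1], reverse=True)
print(matches)   # [('Jan', 1.0), ('Petr', 0.6)]
```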
30

Aktualizace XML dat / Updating XML data

Mikuš, Tomáš January 2012
Updating XML data is a very broad area that poses a number of difficult problems, from designing a language with sufficient expressive power to building an XML data repository able to apply the changes, and approaches that address them all are few. This work is therefore dedicated solely to the XQuery language, specifically to its update extension, for which a W3C candidate recommendation was published only recently. A further specialisation of this work is its focus on XML data stored in an object-relational database, where the repository enforces the validity of documents against a schema described in XML Schema. This requirement, combined with the possibility of updating data in the repository, leads to partly contradictory demands. In this thesis, a language based on XQuery is designed; the evaluation of update queries in this language over the store is designed and implemented; and the store itself, in an object-relational database, is described and implemented.
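A minimal sketch of the central validity concern, using lxml to apply an XQuery-Update-style insertion and re-validate against an XML Schema; the schema and document are illustrative, and the thesis's store is object-relational rather than in-memory:

```python
from lxml import etree

schema_doc = etree.XML(b"""
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="library">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="book" type="xs:string" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>""")
schema = etree.XMLSchema(schema_doc)

doc = etree.XML(b"<library><book>XQuery</book></library>")

# 'insert node <book>...</book> into /library' expressed imperatively:
new_book = etree.SubElement(doc, "book")
new_book.text = "Relational Databases"

# The store must re-check validity before committing the update.
print(schema.validate(doc))   # True -> update may be applied
```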
