• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 8
  • 2
  • 2
  • Tagged with
  • 15
  • 15
  • 15
  • 11
  • 7
  • 6
  • 4
  • 4
  • 4
  • 4
  • 3
  • 3
  • 3
  • 3
  • 3
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
11

DataCalc: Ad-hoc Analyses on Heterogeneous Data Sources

Luong, Johannes, Habich, Dirk, Lehner, Wolfgang 19 July 2023 (has links)
Storing and processing data at different locations using a heterogeneous set of formats and data managements systems is state-of-the-art in many organizations. However, data analyses can often provide better insight when data from several sources is integrated into a combined perspective. In this paper we present an overview of our data integration system DataCalc. DataCalc is an extensible integration platform that executes adhoc analytical queries on a set of heterogeneous data processors. Our novel platform uses an expressive function shipping interface that promotes local computation and reduces data movement between processors. In this paper, we provide a discussion of the overall architecture and the main components of DataCalc. Moreover, we discuss the cost of integrating additional processors and evaluate the overall performance of the platform.
12

A Technical Perspective of DataCalc: Ad-hoc Analyses on Heterogeneous Data Sources

Luong, Johannes, Habich, Dirk, Lehner, Wolfgang 19 July 2023 (has links)
Many organizations store and process data at different locations using a heterogeneous set of formats and data management systems. However, data analyses can often provide better insight when data from several sources is integrated into a combined perspective. DataCalc is an extensible data integration platform that executes ad-hoc analytical queries on a set of heterogeneous data processors. The platform uses an expressive function shipping interface that promotes local computation and reduces data movement between processors. In this paper, we provide a detailed discussion of the architecture and implementation of DataCalc. We introduce data processors for plain files, JDBC, the MongoDB document store, and a custom in memory system. Finally, we discuss the cost of integrating additional processors and evaluate the overall performance of the platform. Our main contribution is the specification and evaluation of the DataCalc code delegation interface.
13

Intelligent Data Layer: : An approach to generating data layer from normalized database model.

Buzo, Amir January 2012 (has links)
Model View Controller (MVC) software architecture is widely spread and commonly used in application’s development. Therefore generation of data layer for the database model is able to reduce cost and time. After research on current Object Relational Mapping (ORM) tools, it was discovered that there are generating tools like Data Access Object (DAO) and Hibernate, however their usage causes problems like inefficiency and slow performance due to many connections with database and set up time. Most of these tools are trying to solve specific problems rather than generating a data layer which is an important component and the bottom layer of database centred applications. The proposed solution to the problem is an engineering approach where we have designed a tool named Generated Intelligent Data Layer (GIDL). GIDL tool generates small models which create the main data layer of the system according to the Database Model. The goal of this tool is to enable and allow software developers to work only with object without deep knowledge in SQL. The problem of transaction and commit is solved by the tool. Also filter objects are constructed for filtering the database. GIDL tool reduced the number of connections and also have a cache where to store object lists and modify them. The tool is compared under the same environment with Hibernate and showed a better performance in terms of time evaluations for the same functions. GIDL tool is beneficial for software developers, because it generates the entire data layer.
14

Bridging Language & Data : Optimizing Text-to-SQL Generation in Large Language Models / Från ord till SQL : Optimering av text-till-SQL-generering i stora språkmodeller

Wretblad, Niklas, Gordh Riseby, Fredrik January 2024 (has links)
Text-to-SQL, which involves translating natural language into Structured Query Language (SQL), is crucial for enabling broad access to structured databases without expert knowledge. However, designing models for such tasks is challenging due to numerous factors, including the presence of ’noise,’ such as ambiguous questions and syntactical errors. This thesis provides an in-depth analysis of the distribution and types of noise in the widely used BIRD-Bench benchmark and the impact of noise on models. While BIRD-Bench was created to model dirty and noisy database values, it was not created to contain noise and errors in the questions and gold queries. We found after a manual evaluation that noise in questions and gold queries are highly prevalent in the financial domain of the dataset, and a further analysis of the other domains indicate the presence of noise in other parts as well. The presence of incorrect gold SQL queries, which then generate incorrect gold answers, has a significant impact on the benchmark’s reliability. Surprisingly, when evaluating models on corrected SQL queries, zero-shot baselines surpassed the performance of state-of-the-art prompting methods. The thesis then introduces the concept of classifying noise in natural language questions, aiming to prevent the entry of noisy questions into text-to-SQL models and to annotate noise in existing datasets. Experiments using GPT-3.5 and GPT-4 on a manually annotated dataset demonstrated the viability of this approach, with classifiers achieving up to 0.81 recall and 80% accuracy. Additionally, the thesis explored the use of LLMs for automatically correcting faulty SQL queries. This showed a 100% success rate for specific query corrections, highlighting the potential for LLMs in improving dataset quality. We conclude that informative noise labels and reliable benchmarks are crucial to developing new Text-to-SQL methods that can handle varying types of noise.
15

Publikace dat ze sítě meteostanic ve formátu DATEX II / Implementation of Datex II standard for road transport weather stations

Partika, Marek January 2016 (has links)
Master’s thesis deals with implementation of a European standard DATEX II. This standard specifies the data format for information transmission in road transport. The road traffic is flowing streams of current information. For the work was selected network of meteorological stations, which will publish the measured data, ie weather conditions of road transport. Measured data will be available to consumers in the format DATEX II. Implementation will be operational in its entirety meteorological station from design to the actual web service that will produce data information for consumers.

Page generated in 0.056 seconds