611

Design And Implementation Of An OODBMS For VLSI Interconnect Parasitic Analysis

Arun, N S 07 1900 (has links) (PDF)
No description available.
612

Developing an XML-based, exploitable linguistic database of the Hebrew text of Gen. 1:1-2:3

Kroeze, J.H. (Jan Hendrik) 28 July 2008 (has links)
The thesis discusses a series of related techniques that prepare and transform raw linguistic data for advanced processing in order to unveil hidden grammatical patterns. A three-dimensional array is identified as a suitable data structure to build a data cube to capture multidimensional linguistic data in a computer's temporary storage facility. It also enables online analytical processing, like slicing, to be executed on this data cube in order to reveal various subsets and presentations of the data. XML is investigated as a suitable mark-up language to permanently store such an exploitable databank of Biblical Hebrew linguistic data. This concept is illustrated by tagging a phonetic transcription of Genesis 1:1-2:3 on various linguistic levels and manipulating this databank. Transferring the data set between an XML file and a three-dimensional array creates a stable environment allowing editing and advanced processing of the data in order to confirm existing knowledge or to mine for new, yet undiscovered, linguistic features. Two experiments are executed to demonstrate possible text-mining procedures. Finally, visualisation is discussed as a technique that enhances interaction between the human researcher and the computerised technologies supporting the process of knowledge creation. Although the data set is very small, there are exciting indications that the compilation and analysis of aggregate linguistic data may assist linguists to perform rigorous research, for example regarding the definitions of semantic functions and the mapping of these functions onto the syntactic module. / Thesis (PhD (Information Technology))--University of Pretoria, 2008. / Information Science / unrestricted
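As a rough, hypothetical illustration of the data-cube idea described in this abstract (the sketch below is ours, not the thesis author's; the transliterations and annotation labels are invented), a three-dimensional array can hold word-level annotations indexed by clause, word position and linguistic level, and an OLAP-style slice then extracts one annotation level across the whole text:

```python
import numpy as np

# Hypothetical mini-cube: 1 clause x 3 word slots x 3 annotation levels
# (levels: 0 = transliteration, 1 = part of speech, 2 = semantic function).
cube = np.empty((1, 3, 3), dtype=object)
cube[0, 0] = ["bereshit", "noun", "time"]
cube[0, 1] = ["bara", "verb", "action"]
cube[0, 2] = ["elohim", "noun", "agent"]

# OLAP-style "slice": fix the annotation level (part of speech) and view it
# across all clauses and word positions.
print(cube[:, :, 1])   # [['noun' 'verb' 'noun']]
```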
613

Návrh systému pro účely administrativy fotbalového svazu / Design of a Football Association System for Administration Purposes

Vařacha, Jan January 2015 (has links)
This master’s thesis aims to design a suitable system based on a relational database for the purposes of administrative activities of the District Football Association. The created relational database should be managed primarily by the association secretary and, to a lesser extent, by members of the association's specialist committees. The database should be able to contain all the information and records which have so far been handled in paper form (match fixtures, awarding of fines, clubs’ fees, players’ punishments, etc.). Routine administrative work, such as reading, inserting, deleting and updating the data, will be carried out through the web interface and should not place any special demands on the level of users’ computer skills.
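A minimal sketch of the kind of relational schema such a system might use is shown below; the table and column names are our own illustration, not taken from the thesis:

```python
import sqlite3

# Hypothetical, heavily simplified schema for association administration
# (tables and columns are illustrative only).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE club    (club_id    INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE player  (player_id  INTEGER PRIMARY KEY, club_id INTEGER REFERENCES club, name TEXT NOT NULL);
CREATE TABLE fixture (fixture_id INTEGER PRIMARY KEY,
                      home_club  INTEGER REFERENCES club,
                      away_club  INTEGER REFERENCES club,
                      kickoff    TEXT);
CREATE TABLE fine    (fine_id    INTEGER PRIMARY KEY, club_id INTEGER REFERENCES club, amount REAL, reason TEXT);
CREATE TABLE penalty (penalty_id INTEGER PRIMARY KEY, player_id INTEGER REFERENCES player, matches_banned INTEGER, reason TEXT);
""")

# The web interface would issue ordinary CRUD statements such as:
con.execute("INSERT INTO club (name) VALUES (?)", ("FC Example",))
print(con.execute("SELECT club_id, name FROM club").fetchall())
```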
614

Performance benchmarking of data-at-rest encryption in relational databases

Istifan, Stewart, Makovac, Mattias January 2022 (has links)
This thesis measures, through a controlled experiment, how data-at-rest encryption with varying AES key lengths affects the performance of Relational Database Management Systems in terms of transaction throughput. By measuring the effect through a series of load tests followed by statistical analysis, the impact of adopting a specific data-at-rest encryption algorithm could be displayed. The results gathered from this experiment were measured as the average transactional throughput of SQL operations. An OLTP workload in the benchmarking tool HammerDB was used to generate a transactional workload. This, in turn, was used to perform load tests on SQL databases encrypted with different AES key lengths. The data gathered from these tests then underwent statistical analysis to either keep or reject the stated hypotheses. The statistical analysis performed on the different versions of the AES algorithm showed no significant difference in transaction throughput in the results gathered from the load tests on MariaDB. However, statistically significant differences were shown to exist when running the same tests on MySQL. These results answered our research question, "Is there a significant difference in transaction throughput between the AES-128, AES-192, and AES-256 algorithms used to encrypt data-at-rest in MySQL and MariaDB?". The conclusion is that the statistical evidence suggests a significant difference in transactional throughput between AES algorithms in MySQL but not in MariaDB. This conclusion led us to investigate transactional database performance between MySQL and MariaDB further, measuring a specific type of transaction to determine whether there was a difference in performance between the databases themselves when using the same encryption algorithm. The statistical evidence confirmed that MariaDB vastly outperformed MySQL in transactional throughput.
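The thesis's exact statistical procedure is not reproduced in this abstract, but the sketch below illustrates the general kind of analysis such an experiment involves: comparing throughput samples from two configurations with a significance test (the numbers are made up):

```python
from scipy import stats

# Hypothetical transactions-per-minute samples from repeated HammerDB runs
# (made-up numbers; the thesis's actual measurements and test may differ).
aes128 = [41200, 41850, 40950, 41500, 41330]
aes256 = [40100, 39800, 40450, 39950, 40200]

# One way to test whether mean throughput differs between key lengths:
t_stat, p_value = stats.ttest_ind(aes128, aes256, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Reject the null hypothesis: throughput differs between key lengths.")
else:
    print("No statistically significant difference detected.")
```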
615

A Plan for OLAP

Jaecksch, Bernhard, Lehner, Wolfgang, Faerber, Franz 30 May 2022 (has links)
So far, data warehousing has often been discussed in the light of complex OLAP queries and as reporting facility for operative data. We argue that business planning as a means to generate plan data is an equally important cornerstone of a data warehouse system, and we propose it to be a first-class citizen within an OLAP engine. We introduce an abstract model describing relevant aspects of the planning process in general and the requirements it poses to a planning engine. Furthermore, we show that business planning lends itself well to parallelization and benefits from a column-store much like traditional OLAP does. We then develop a physical model specifically targeted at a highly parallel column-store, and with our implementation, we show nearly linear scaling behavior.
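The paper's abstract planning model is not reproduced here, but a toy example of one common planning operation, top-down disaggregation of a plan figure across a column of reference values, shows why column-at-a-time arithmetic (and hence a parallel column store) fits plan-data generation well; the figures are invented:

```python
import numpy as np

# Toy top-down disaggregation (our illustration, not the paper's model):
# a yearly revenue target is broken down onto products proportionally to a
# column of last year's actuals.
actuals = np.array([120.0, 80.0, 50.0, 250.0])   # last year's revenue per product
target = 600.0                                    # planned total for next year

plan = target * actuals / actuals.sum()           # proportional breakdown, column-at-a-time
print(plan)        # [144.  96.  60. 300.]
print(plan.sum())  # 600.0
```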
616

The Use of Big Data in Process Management : A Literature Study and Survey Investigation

Ephraim, Ekow Esson, Sehic, Sanel January 2021 (has links)
In recent years there has been increasing interest in understanding how organizations can utilize big data in their process management to create value and improve their processes. This interest stems from new challenges for process management arising from increasing competition and from the complexity of large data sets produced by technological advancements. These large data sets have been described by scholars as big data: data so complex that traditional data analysis software is not sufficient to manage or analyze them. Because of the complexity of handling such great volumes of data, there is a large gap in practical examples where organizations have incorporated big data in their process management. Therefore, in order to fill relevant gaps and contribute to advancements in this field, this thesis explores how big data can contribute to improved process management. Hence, the aim of this thesis was to investigate how, why and to what extent big data is used in process management, as well as to outline the purpose and challenges of using big data in process management. This was accomplished through a literature review and a survey, respectively, in order to understand how big data had previously been used to create value and improve processes in organizations. From the extensive literature review, an analysis matrix of how big data is used in process management is provided through the intersections between big data and process management dimensions. The analysis matrix showed that most of the instances in which big data was used in process management fell within process analysis & improvement and process control & agility. Simply put, organizations used big data in specific activities involved in process management but not in a holistic manner. Furthermore, the limited findings from the survey indicate that the main challenge of big data use in Swedish organizations is the complexity of handling data, while the main purpose is making statistically better decisions.
617

Sample synopses for approximate answering of group-by queries

Lehner, Wolfgang, Rösch, Philipp 22 April 2022 (has links)
With the amount of data in current data warehouse databases growing steadily, random sampling is continuously gaining in importance. In particular, interactive analyses of large datasets can greatly benefit from the significantly shorter response times of approximate query processing. Typically, those analytical queries partition the data into groups and aggregate the values within the groups. Further, with the commonly used roll-up and drill-down operations, a broad range of group-by queries is posed to the system, which makes the construction of highly specialized synopses difficult. In this paper, we propose a general-purpose sampling scheme that is biased in order to answer group-by queries with high accuracy. While existing techniques focus on the size of a group when computing its sample size, our technique is based on its standard deviation. The basic idea is that the more homogeneous a group is, the fewer representatives are required in order to give a good estimate. With an extensive set of experiments, we show that our approach reduces both the estimation error and the construction cost compared to existing techniques.
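As a rough sketch of the idea (not the paper's exact allocation scheme), a fixed sample budget can be split across groups in proportion to each group's standard deviation, so that homogeneous groups receive fewer representatives; the groups and budget below are invented:

```python
import statistics

groups = {
    "A": [10, 11, 10, 12, 11, 10],   # very homogeneous
    "B": [5, 40, 22, 90, 7, 61],     # highly varied
}
budget = 8  # total number of sample rows to allocate

# Allocate proportionally to the per-group standard deviation.
stdevs = {g: statistics.pstdev(vals) for g, vals in groups.items()}
total = sum(stdevs.values())
allocation = {g: max(1, round(budget * s / total)) for g, s in stdevs.items()}
print(allocation)   # the heterogeneous group B receives most of the budget
```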
618

Compression Selection for Columnar Data using Machine-Learning and Feature Engineering

Persson, Douglas, Juelsson Larsen, Ludvig January 2023 (has links)
There is a continuously growing demand for improved solutions that provide both efficient storage and efficient retrieval of big data for analytical purposes. This thesis researches the use of machine learning together with feature engineering to recommend the most cost-effective compression algorithm and encoding combination for columns in a columnar database management system (DBMS). The framework consists of a cost function calculated from compression time, decompression time, and compression ratio. An XGBoost machine-learning model is trained on labels provided by the cost function to recommend the most cost-effective combination for columnar data within a column- or vector-oriented DBMS. While the methods are applied to ClickHouse, one of the most popular open-source column-oriented DBMSs on the market, the results are broadly applicable to column-oriented data that share data types and characteristics with IoT telemetry data. Using billions of available rows of numeric real business data obtained at Axis Communications in Lund, Sweden, a set of features is engineered to accurately describe the characteristics of a given column. The proposed framework allows the business interests (compression time, decompression time, and compression ratio) to be weighted in order to determine the individually optimal cost-effective solution. The model reaches an accuracy of 99% on the test dataset and an accuracy of 90.1% on unseen data by leveraging data features that are predictive of compression algorithm and encoding performance. Following ClickHouse strategies and the most suitable practices in the field, combinations of general-purpose compression algorithms and data encodings are analysed to find those that together yield the best results in efficiently compressing the data of certain columns. Applying the unweighted recommended combinations to all columns increased the average compression speed by 95.46%, reducing the time to compress the columns from 31.17 seconds to 13.17 seconds. Additionally, the decompression speed increased by 59.87%, reducing the time to decompress the columns from 2.63 seconds to 2.02 seconds, at the cost of decreasing the compression ratio by 66.05% and increasing the storage requirements by 94.9 MB. In column and vector databases, chunks of data belonging to a certain column are often stored together on disk. Therefore, choosing the right compression algorithm can lower the storage requirements and boost database throughput.
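The thesis's actual cost function and feature set are not given in this abstract, so the sketch below only illustrates the general shape of the approach: a weighted cost over measured compression time, decompression time and compression ratio produces a per-column label that a classifier (XGBoost in the thesis) could then learn to predict from column features. The candidate combinations and measurements are invented:

```python
# Illustrative weighted cost: lower times are better, a higher ratio is better,
# so the ratio enters inversely (the thesis's exact formulation may differ).
def cost(comp_time, decomp_time, ratio, w_comp=1.0, w_decomp=1.0, w_ratio=1.0):
    return w_comp * comp_time + w_decomp * decomp_time + w_ratio / ratio

# Hypothetical measurements for one column:
# (codec + encoding, compression s, decompression s, compression ratio)
candidates = [
    ("LZ4 + Delta",        0.8, 0.3, 2.1),
    ("ZSTD + Gorilla",     2.4, 0.9, 4.8),
    ("None + DoubleDelta", 0.1, 0.1, 1.3),
]

best = min(candidates, key=lambda c: cost(*c[1:]))
print("label for this column:", best[0])
# One such label per column, paired with engineered column features,
# would form the training set for the classifier.
```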
619

GRAPHITE: An Extensible Graph Traversal Framework for Relational Database Management Systems

Paradies, Marcus, Lehner, Wolfgang, Bornhövd, Christof 25 August 2022 (has links)
Graph traversals are a basic but fundamental ingredient for a variety of graph algorithms and graph-oriented queries. To achieve the best possible query performance, they need to be implemented at the core of a database management system that aims at storing, manipulating, and querying graph data. Increasingly, modern business applications demand native graph query and processing capabilities for enterprise-critical operations on data stored in relational database management systems. In this paper we propose an extensible graph traversal framework (GRAPHITE) as a central graph processing component on a common storage engine inside a relational database management system. We study the influence of the graph topology on the execution time of graph traversals and derive two traversal algorithm implementations specialized for different graph topologies and traversal queries. We conduct extensive experiments on GRAPHITE for a large variety of real-world graph data sets and input configurations. Our experiments show that the proposed traversal algorithms differ by up to two orders of magnitude for different input configurations and therefore demonstrate the need for a versatile framework to efficiently process graph traversals on a wide range of different graph topologies and types of queries. Finally, we highlight that the query performance of our traversal implementations is competitive with those of two native graph database management systems.
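As an illustration of what a traversal over relationally stored graph data involves (this is a generic level-synchronous expansion, not GRAPHITE's algorithms), consider an edge list kept as two parallel columns; the sample graph is invented:

```python
# Edge list stored as two parallel "columns", as it might sit in a relational engine.
src = [0, 0, 1, 2, 3, 3]
dst = [1, 2, 3, 3, 4, 5]

def traverse(start, hops):
    """Level-synchronous expansion: one pass over the edge columns per level."""
    frontier, visited = {start}, {start}
    for _ in range(hops):
        nxt = {d for s, d in zip(src, dst) if s in frontier and d not in visited}
        if not nxt:
            break
        visited |= nxt
        frontier = nxt
    return visited

print(traverse(0, hops=2))   # {0, 1, 2, 3}
```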
620

GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores

Paradies, Marcus, Rudolf, Michael, Bornhövd, Christof, Lehner, Wolfgang 25 August 2022 (has links)
Native graph query and processing capabilities have become indispensable for modern business applications performing enterprise-critical operations on data that is stored in relational database management systems. Traversal operations are a basic ingredient of graph algorithms and graph queries. As a consequence, they are fundamental for querying graph data in a relational database management system. In this paper we present gratin, a concise secondary index structure to speed up graph traversals in main-memory column stores. Conventional approaches to graph traversals rely on repeated full column scans, which makes them inefficient for deep traversals on very large graphs. To tackle this challenge, we devise a novel and adaptive block-based index to handle graphs efficiently. Most importantly, gratin is updateable in constant time and supports evolving graphs with frequent updates to the graph topology. We conducted an extensive evaluation on real-world data sets from different domains for a large variety of traversal queries. Our experiments show improvements of up to an order of magnitude compared to a scan-based traversal algorithm.
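The actual gratin structure is not described in this abstract, so the sketch below only conveys the general block-summary idea that motivates such an index: splitting a sorted source-vertex column into blocks and recording each block's min/max source id lets a traversal skip blocks that cannot contain the vertex, instead of scanning the full column. The layout and data are our own simplification:

```python
src = [0, 0, 1, 2, 5, 5, 7, 9]   # sorted source-vertex column
dst = [1, 2, 3, 3, 6, 7, 8, 4]   # corresponding target-vertex column
BLOCK = 4

# Per-block summary: (start offset, min source id, max source id).
blocks = [(i, min(src[i:i + BLOCK]), max(src[i:i + BLOCK]))
          for i in range(0, len(src), BLOCK)]

def neighbours(v):
    out = []
    for start, lo, hi in blocks:
        if lo <= v <= hi:            # only touch blocks that may contain v
            for s, d in zip(src[start:start + BLOCK], dst[start:start + BLOCK]):
                if s == v:
                    out.append(d)
    return out

print(neighbours(5))   # [6, 7]  (only the second block is scanned)
```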
