1

Dimensional Analysis of Data Flow Programs

Shennat, Abdulmonem Ibrahim 24 May 2022
Our main objective is to design Dimensional Analysis (DA) algorithms for PyLucid, a multidimensional dialect of Lucid, the equational data flow language. DA is significant because it is indispensable for an efficient implementation of multidimensional Lucid and should aid the implementation of other data flow systems, such as Google’s TensorFlow. Data flow is a form of computation in which components of multidimensional datasets (MDDs) travel on communication lines in a network of processing stations. Each processing station incrementally transforms its input MDDs into its output, another (possibly very different) MDD. MDDs are very common in Health Information Systems and in data science in general. An important concept is that of a relevant dimension. A dimension is relevant if the coordinate of that dimension is required to extract a value. In calculating with MDDs it is very important to avoid non-relevant dimensions; otherwise we duplicate entries (say, in a cache) and waste time and space. Suppose, for example, that we are measuring rainfall in a region. Each individual measurement (say, of an hour’s worth of rain) is determined by location (one dimension), day (a second dimension), and time of day (a third dimension). All three dimensions are a priori relevant. Now suppose we want the total rainfall for each day. In this MDD (call it N) the relevant dimensions are location and day, but time of day is no longer relevant and must be removed. Normally this is done manually. Can this process be automated? We answer this question affirmatively by devising and testing algorithms that produce useful and reliable approximations (specifically, upper bounds) for the dimensionalities of the variables in a program. By dimensionality we mean the set of relevant dimensions. For example, if M is the MDD of raw rain measurements, its dimensionality is {location, day, hour}, and that of N is {location, day}.
Note that the dimensionality is more than just the rank, which is simply the number of dimensions. There is extensive prior research on dataflow itself, which we summarize. However, an exhaustive literature search uncovered no relevant previous DA work other than that of the GLU (Granular Lucid) project in the 1990s. Unfortunately, the GLU project was funded privately and remains proprietary; not even the author has access to it. We proceeded incrementally, solving increasingly difficult instances of DA corresponding to increasingly sophisticated language features. We solved the case of one dimension (time), two dimensions (time and space), and multiple dimensions. We also solved the difficult problem (which the GLU team never solved) of determining the dimensionality of programs that include user-defined functions, including recursively defined functions. We do this by adapting the PyLucid interpreter (producing the DAM interpreter) to evaluate the entire program over the (finite) domain of dimensionalities. As a result, the experimentally validated algorithms in our dissertation can produce useful upper bounds for the dimensionalities of the variables in multidimensional PyLucid programs, including those with user-defined functions.
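
The idea of evaluating a program over the finite domain of dimensionalities can be illustrated with a small sketch. This is not the dissertation's algorithm and not PyLucid syntax: the tuple-based expression forms (`const`, `input`, `var`, `op`, `sum_over`) and the functions `bound` and `analyze` are hypothetical names chosen for illustration. The sketch assumes that a pointwise operation may depend on every dimension of its operands, and that aggregating along a dimension removes it.

```python
from typing import Dict, FrozenSet

Dims = FrozenSet[str]

# Hypothetical expression forms, for illustration only:
#   ("const",)            a constant; no relevant dimensions
#   ("input", dims)       a raw dataset with declared dimensions
#   ("var", name)         a reference to another equation in the program
#   ("op", e1, e2, ...)   a pointwise operation on its operands
#   ("sum_over", dim, e)  an aggregate of e along dimension dim


def bound(expr: tuple, env: Dict[str, Dims]) -> Dims:
    """Return an upper bound on the dimensionality of expr under env."""
    kind = expr[0]
    if kind == "const":
        return frozenset()
    if kind == "input":
        return frozenset(expr[1])
    if kind == "var":
        return env[expr[1]]
    if kind == "op":
        result: Dims = frozenset()
        for operand in expr[1:]:
            result |= bound(operand, env)  # union of operand dimensionalities
        return result
    if kind == "sum_over":
        return bound(expr[2], env) - {expr[1]}  # aggregated dimension drops out
    raise ValueError(f"unknown expression form: {kind!r}")


def analyze(program: Dict[str, tuple]) -> Dict[str, Dims]:
    """Iterate to a fixed point, starting every variable at the empty set.

    Bounds only ever grow and the set of dimensions is finite, so the
    loop terminates even for recursively defined variables.
    """
    env: Dict[str, Dims] = {name: frozenset() for name in program}
    changed = True
    while changed:
        changed = False
        for name, expr in program.items():
            new = env[name] | bound(expr, env)
            if new != env[name]:
                env[name] = new
                changed = True
    return env


# The rainfall example: M holds raw measurements, N the daily totals,
# and T is a recursively defined running value built from N.
program = {
    "M": ("input", {"location", "day", "hour"}),
    "N": ("sum_over", "hour", ("var", "M")),
    "T": ("op", ("var", "T"), ("var", "N"), ("const",)),
}
result = analyze(program)
```

Here `result["M"]` is {location, day, hour}, while `result["N"]` and `result["T"]` are both {location, day}. The bounds are upper bounds in the abstract's sense: a variable may depend on fewer dimensions than reported, but not on more.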
2

A Study of Partitioning and Parallel UDF Execution with the SAP HANA Database

Große, Philipp, May, Norman, Lehner, Wolfgang 08 July 2014
Large-scale data analysis relies on custom code both for preparing the data and for the core analysis algorithms. The map-reduce framework offers a simple model to parallelize custom code, but it does not integrate well with relational databases. Likewise, the literature on optimizing queries in relational databases has largely ignored user-defined functions (UDFs). In this paper, we discuss annotations for user-defined functions that facilitate optimizations that consider both relational operators and UDFs. We believe this approach to be superior to simply linking map-reduce evaluation to a relational database because it enables a broader range of optimizations. We focus on optimizations that enable the parallel execution of relational operators and UDFs for a number of typical patterns. A study on real-world data investigates the opportunities for parallelization of complex data flows containing both relational operators and UDFs.
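
As a rough illustration of how an annotation on a UDF can license parallel execution, the sketch below uses plain Python rather than SAP HANA. The decorator `partitionable_by`, the UDF `total_sales`, and the driver `run_udf` are all hypothetical names, not part of the paper or of any SAP API. The assumption is that an annotated UDF produces correct results when each partition, keyed by the named column, is processed independently.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partitionable_by(key):
    """Hypothetical annotation: the UDF may run independently per value of key."""
    def mark(fn):
        fn.partition_key = key
        return fn
    return mark


@partitionable_by("region")
def total_sales(rows):
    # Example UDF: receives all rows of one region, returns one summary row.
    return [{"region": rows[0]["region"],
             "total": sum(r["amount"] for r in rows)}]


def run_udf(udf, rows, max_workers=4):
    """Run udf over rows, in parallel per partition if its annotation allows."""
    key = getattr(udf, "partition_key", None)
    if key is None:
        return udf(rows)  # no annotation: fall back to a single serial call
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key]].append(row)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the order the partitions were submitted
        results = pool.map(udf, partitions.values())
    return [out for part in results for out in part]


rows = [
    {"region": "north", "amount": 10},
    {"region": "south", "amount": 5},
    {"region": "north", "amount": 7},
]
summary = run_udf(total_sales, rows)
```

Without the annotation, the driver has to treat the UDF as a black box and call it once on the full input. With it, the driver knows that partitioning by `region` is safe, which is the kind of information an optimizer needs in order to combine UDFs with partitioned relational operators.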
3

A Methodology for Domain-Specific Conceptual Data Modeling and Querying

Tian, Hao 02 May 2007
Traditional data management technologies originating in the business domain currently face many challenges from other domains such as scientific research. Data structures in databases are becoming more and more complex, and data query functions are moving from the back-end database level towards the front-end user-interface level. Traditional query languages such as SQL and OQL, and form-based query interfaces, cannot fully meet today's needs. This research is motivated by the data management issues in life science applications. I propose a methodology for domain-specific conceptual data modeling and querying. The methodology can be applied to any domain to capture more domain semantics and to empower end-users to formulate a query at the conceptual level with terminologies and functions familiar to them. The query system resulting from the methodology is designed to work on all major types of database management systems (DBMS) and to let end-users dynamically define and add new domain-specific functions. That is, user-defined functions can be pre-defined by domain experts or data model creators at the time of system creation, or dynamically defined by end-users from the client side at any time. The methodology has a domain-specific conceptual data model (DSC-DM) and a domain-specific conceptual query language (DSC-QL). DSC-QL uses only the abstract concepts, relationships, and functions defined in DSC-DM. It is a user-oriented high-level query language, intentionally designed to be flexible, extensible, and readily usable. DSC-QL queries are much simpler than the corresponding SQL or OQL queries because of advanced features such as user-defined functions, composite and set attributes, dot-path expressions, and super-classes. DSC-QL can be translated into SQL and OQL through a dynamic mapping function, and automatically updated when the underlying database schema evolves.
The operational and declarative semantics of DSC-QL are formally defined in terms of graphs. A normal form for DSC-QL is also defined, as a standard format for the mappings from flexible conceptual expressions to restricted SQL or OQL statements. Two translation algorithms from normalized DSC-QL to SQL and OQL are introduced. Through comparison, DSC-QL is shown to have a very good balance between simplicity and expressive power and to be suitable for end-users. Implementation details of the query system are reported as well. Two prototypes have been built. One is for the neuroscience domain and is built on an object-oriented DBMS. The other is for the traditional business domain and is built on a relational DBMS.
4

Návrh databáze pro připojení systému SAP jako zdroje dat pro webovou aplikaci / Database design for connecting SAP as a data source for a Web application

MARHOUN, Lukáš January 2016
The thesis deals with connecting the SAP ERP system through a local MS SQL Server database using SAP BI tools, with data synchronization between the systems, and with advanced use of the T-SQL language to prepare data for web applications and reports written in PHP. The thesis contains a brief overview of the SAP system and the options for connecting to it. The general principles of the described solution can be used with other systems and programming languages.
