1 |
Designing Conventional, Spatial, and Temporal Data Warehouses: Concepts and Methodological FrameworkMalinowski Gajda, Elzbieta 02 October 2006 (has links)
Decision support systems are interactive, computer-based information systems that provide data and analysis tools in order to better assist managers on different levels of organization in the process of decision making. Data warehouses (DWs) have been developed and deployed as an integral part of decision support systems.
A data warehouse is a database that allows to store high volume of historical data required for analytical purposes. This data is extracted from operational databases, transformed into a coherent whole, and loaded into a DW during the extraction-transformation-loading (ETL) process.
DW data can be dynamically manipulated using on-line analytical processing (OLAP) systems. DW and OLAP systems rely on a multidimensional model that includes measures, dimensions, and hierarchies. Measures are usually numeric additive values that are used for quantitative evaluation of different aspects about organization. Dimensions provide different analysis perspectives while hierarchies allow to analyze measures on different levels of detail.
Nevertheless, currently, designers as well as users find difficult to specify multidimensional elements required for analysis. One reason for that is the lack of conceptual models for DW and OLAP system design, which would allow to express data requirements on an abstract level without considering implementation details. Another problem is that many kinds of complex hierarchies arising in real-world situations are not addressed by current DW and OLAP systems.
In order to help designers to build conceptual models for decision-support systems and to help users in better understanding the data to be analyzed, in this thesis we propose the MultiDimER model - a conceptual model used for representing multidimensional data for DW and OLAP applications. Our model is mainly based on the existing ER constructs, for example, entity types, attributes, relationship types with their usual semantics, allowing to represent the common concepts of dimensions, hierarchies, and measures. It also includes a conceptual classification of different kinds of hierarchies existing in real-world situations and proposes graphical notations for them.
On the other hand, currently users of DW and OLAP systems demand also the inclusion of spatial data, visualization of which allows to reveal patterns that are difficult to discover otherwise. The advantage of using spatial data in the analysis process is widely recognized since it allows to reveal patterns that are difficult to discover otherwise.
However, although DWs typically include a spatial or a location dimension, this dimension is usually represented in an alphanumeric format. Furthermore, there is still a lack of a systematic study that analyze the inclusion as well as the management of hierarchies and measures that are represented using spatial data.
With the aim of satisfying the growing requirements of decision-making users, we extend the MultiDimER model by allowing to include spatial data in the different elements composing the multidimensional model. The novelty of our contribution lays in the fact that a multidimensional model is seldom used for representing spatial data. To succeed with our proposal, we applied the research achievements in the field of spatial databases to the specific features of a multidimensional model. The spatial extension of a multidimensional model raises several issues, to which we refer in this thesis, such as the influence of different topological relationships between spatial objects forming a hierarchy on the procedures required for measure aggregations, aggregations of spatial measures, the inclusion of spatial measures without the presence of spatial dimensions, among others.
Moreover, one of the important characteristics of multidimensional models is the presence of a time dimension for keeping track of changes in measures. However, this dimension cannot be used to model changes in other dimensions.
Therefore, usual multidimensional models are not symmetric in the way of representing changes for measures and dimensions. Further, there is still a lack of analysis indicating which concepts already developed for providing temporal support in conventional databases can be applied and be useful for different elements composing a multidimensional model.
In order to handle in a similar manner temporal changes to all elements of a multidimensional model, we introduce a temporal extension for the MultiDimER model. This extension is based on the research in the area of temporal databases, which have been successfully used for modeling time-varying information for several decades. We propose the inclusion of different temporal types, such as valid and transaction time, which are obtained from source systems, in addition to the DW loading time generated in DWs. We use this temporal support for a conceptual representation of time-varying dimensions, hierarchies, and measures. We also refer to specific constraints that should be imposed on time-varying hierarchies and to the problem of handling multiple time granularities between source systems and DWs.
Furthermore, the design of DWs is not an easy task. It requires to consider all phases from the requirements specification to the final implementation including the ETL process. It should also take into account that the inclusion of different data items in a DW depends on both, users' needs and data availability in source systems. However, currently, designers must rely on their experience due to the lack of a methodological framework that considers above-mentioned aspects.
In order to assist developers during the DW design process, we propose a methodology for the design of conventional, spatial, and temporal DWs. We refer to different phases, such as requirements specification, conceptual, logical, and physical modeling. We include three different methods for requirements specification depending on whether users, operational data sources, or both are the driving force in the process of requirement gathering. We show how each method leads to the creation of a conceptual multidimensional model. We also present logical and physical design phases that refer to DW structures and the ETL process.
To ensure the correctness of the proposed conceptual models, i.e., with conventional data, with the spatial data, and with time-varying data, we formally define them providing their syntax and semantics. With the aim of assessing the usability of our conceptual model including representation of different kinds of hierarchies as well as spatial and temporal support, we present real-world examples. Pursuing the goal that the proposed conceptual solutions can be implemented, we include their logical representations using relational and object-relational databases.
|
2 |
Designing conventional, spatial, and temporal data warehouses: concepts and methodological frameworkMalinowski Gajda, Elzbieta 02 October 2006 (has links)
Decision support systems are interactive, computer-based information systems that provide data and analysis tools in order to better assist managers on different levels of organization in the process of decision making. Data warehouses (DWs) have been developed and deployed as an integral part of decision support systems. <p><p>A data warehouse is a database that allows to store high volume of historical data required for analytical purposes. This data is extracted from operational databases, transformed into a coherent whole, and loaded into a DW during the extraction-transformation-loading (ETL) process. <p><p>DW data can be dynamically manipulated using on-line analytical processing (OLAP) systems. DW and OLAP systems rely on a multidimensional model that includes measures, dimensions, and hierarchies. Measures are usually numeric additive values that are used for quantitative evaluation of different aspects about organization. Dimensions provide different analysis perspectives while hierarchies allow to analyze measures on different levels of detail. <p><p>Nevertheless, currently, designers as well as users find difficult to specify multidimensional elements required for analysis. One reason for that is the lack of conceptual models for DW and OLAP system design, which would allow to express data requirements on an abstract level without considering implementation details. Another problem is that many kinds of complex hierarchies arising in real-world situations are not addressed by current DW and OLAP systems.<p><p>In order to help designers to build conceptual models for decision-support systems and to help users in better understanding the data to be analyzed, in this thesis we propose the MultiDimER model - a conceptual model used for representing multidimensional data for DW and OLAP applications. Our model is mainly based on the existing ER constructs, for example, entity types, attributes, relationship types with their usual semantics, allowing to represent the common concepts of dimensions, hierarchies, and measures. It also includes a conceptual classification of different kinds of hierarchies existing in real-world situations and proposes graphical notations for them.<p><p>On the other hand, currently users of DW and OLAP systems demand also the inclusion of spatial data, visualization of which allows to reveal patterns that are difficult to discover otherwise. The advantage of using spatial data in the analysis process is widely recognized since it allows to reveal patterns that are difficult to discover otherwise. <p><p>However, although DWs typically include a spatial or a location dimension, this dimension is usually represented in an alphanumeric format. Furthermore, there is still a lack of a systematic study that analyze the inclusion as well as the management of hierarchies and measures that are represented using spatial data. <p><p>With the aim of satisfying the growing requirements of decision-making users, we extend the MultiDimER model by allowing to include spatial data in the different elements composing the multidimensional model. The novelty of our contribution lays in the fact that a multidimensional model is seldom used for representing spatial data. To succeed with our proposal, we applied the research achievements in the field of spatial databases to the specific features of a multidimensional model. The spatial extension of a multidimensional model raises several issues, to which we refer in this thesis, such as the influence of different topological relationships between spatial objects forming a hierarchy on the procedures required for measure aggregations, aggregations of spatial measures, the inclusion of spatial measures without the presence of spatial dimensions, among others. <p><p>Moreover, one of the important characteristics of multidimensional models is the presence of a time dimension for keeping track of changes in measures. However, this dimension cannot be used to model changes in other dimensions. <p>Therefore, usual multidimensional models are not symmetric in the way of representing changes for measures and dimensions. Further, there is still a lack of analysis indicating which concepts already developed for providing temporal support in conventional databases can be applied and be useful for different elements composing a multidimensional model. <p><p>In order to handle in a similar manner temporal changes to all elements of a multidimensional model, we introduce a temporal extension for the MultiDimER model. This extension is based on the research in the area of temporal databases, which have been successfully used for modeling time-varying information for several decades. We propose the inclusion of different temporal types, such as valid and transaction time, which are obtained from source systems, in addition to the DW loading time generated in DWs. We use this temporal support for a conceptual representation of time-varying dimensions, hierarchies, and measures. We also refer to specific constraints that should be imposed on time-varying hierarchies and to the problem of handling multiple time granularities between source systems and DWs. <p><p>Furthermore, the design of DWs is not an easy task. It requires to consider all phases from the requirements specification to the final implementation including the ETL process. It should also take into account that the inclusion of different data items in a DW depends on both, users' needs and data availability in source systems. However, currently, designers must rely on their experience due to the lack of a methodological framework that considers above-mentioned aspects. <p><p>In order to assist developers during the DW design process, we propose a methodology for the design of conventional, spatial, and temporal DWs. We refer to different phases, such as requirements specification, conceptual, logical, and physical modeling. We include three different methods for requirements specification depending on whether users, operational data sources, or both are the driving force in the process of requirement gathering. We show how each method leads to the creation of a conceptual multidimensional model. We also present logical and physical design phases that refer to DW structures and the ETL process.<p><p>To ensure the correctness of the proposed conceptual models, i.e. with conventional data, with the spatial data, and with time-varying data, we formally define them providing their syntax and semantics. With the aim of assessing the usability of our conceptual model including representation of different kinds of hierarchies as well as spatial and temporal support, we present real-world examples. Pursuing the goal that the proposed conceptual solutions can be implemented, we include their logical representations using relational and object-relational databases.<p> / Doctorat en sciences appliquées / info:eu-repo/semantics/nonPublished
|
Page generated in 0.1506 seconds