Global ETD Search

1	An agent-based adaptive join algorithm for building data warehouses Yu, Qicheng January 2012 Making better business decisions efficiently is the key to succeeding in today's competitive world. Organisations seeking to improve their decision-making process can be overwhelmed by the sheer volume and complexity of data available from their various operational information systems. Many organisations have responded to this challenge by employing data warehousing technologies to make full use of the information in their systems and address real-world business problems. As organisations move their operations to the Internet to take advantage of new technologies, their data warehouse environments become more distributed and dynamic. Meanwhile, applications of a data warehouse have evolved from reporting and decision support systems to mission-critical decision-making systems, which require data warehouses to combine both historical and current data from operational systems. This presents both challenges and opportunities in designing and developing new data warehouse systems that support decision-making processes and can deliver the right information, to the right people, at the right time, interactively and securely. In typical distributed data warehouse architectures, both the logical layer and the physical layer of the data warehouse are used to map physical tables in distributed data marts. The physical layer contains historical data materialised over a longer time period, while the most recent data is only available from the logical layer. Extracting knowledge from this data is often expensive, as it usually requires complex queries involving a series of joins and aggregations. Many commercial data warehouse systems place limits on such operations at runtime or sacrifice precision by using approximate replication.
The join operation is one of the most expensive operations in query processing, as it combines, compares and merges potentially large data sets. Joining large tables can consume a significant amount of system resources, including CPU, disk, buffer and network bandwidth. Consequently, join performance has a considerable impact on overall system performance, especially in a distributed data warehouse environment. The traditional 'optimise-then-execute' query processing paradigm is inadequate in this case. This thesis investigates the evolution of data warehouses to identify an architecture suitable for highly distributed data warehouses, and studies the feasibility and effectiveness of utilising software agent technology for distributed information systems. A novel agent-based adaptive join algorithm called AJoin is proposed for effective and efficient online join operations in distributed data warehouses. It seamlessly integrates a dynamic integration approach with traditional data warehousing technologies to address the issues arising from distributed and dynamic data warehouse environments. Taking data warehouse features into consideration, AJoin utilises intelligent agents for dynamic optimisation and coordination of join processing at run time. Key aspects of the AJoin algorithm have been implemented and evaluated against other modern adaptive join algorithms. The experimental results demonstrate that AJoin consistently outperforms the other adaptive join algorithms under the various distributed and dynamic data warehouse environments considered in this study. Compared with other join algorithms in a distributed and dynamic data warehouse environment, the average performance of AJoin in matching the first 50 tuples improved by as much as 67%, and overall join performance improved by more than 35%. 005.745
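The abstract above does not describe AJoin's internals, so the following is not AJoin. It is a minimal sketch of a symmetric hash join, a well-known non-blocking join, included only to illustrate why a pipelined join can return its first matching tuples before either input has been fully read. The function name `symmetric_hash_join`, the key extractor parameters, and the strict alternation between inputs are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import zip_longest

_MISSING = object()  # sentinel marking an exhausted input


def symmetric_hash_join(left, right, key_left, key_right):
    """Yield (left_row, right_row) pairs as soon as both rows have arrived.

    Illustrative sketch only: inputs are consumed in strict alternation to
    mimic tuples streaming in from two remote sources.
    """
    left_table = defaultdict(list)   # rows seen so far from the left input
    right_table = defaultdict(list)  # rows seen so far from the right input

    for l_row, r_row in zip_longest(left, right, fillvalue=_MISSING):
        if l_row is not _MISSING:
            k = key_left(l_row)
            left_table[k].append(l_row)
            # Probe the opposite table: emit matches that arrived earlier.
            for match in right_table.get(k, ()):
                yield (l_row, match)
        if r_row is not _MISSING:
            k = key_right(r_row)
            right_table[k].append(r_row)
            for match in left_table.get(k, ()):
                yield (match, r_row)


if __name__ == "__main__":
    orders = [(1, "a"), (2, "b"), (1, "c")]
    customers = [(2, "X"), (1, "Y")]
    for pair in symmetric_hash_join(orders, customers,
                                    lambda o: o[0], lambda c: c[0]):
        print(pair)
```

Each arriving row is stored in its own hash table and immediately probed against the other side's table, so every matching pair is produced exactly once, at the moment its later-arriving row shows up. A blocking hash join, by contrast, has to finish building one table before it can emit anything.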
2	Analytische Bestimmung einer Datenallokation für Parallele Data Warehouses [Analytical Determination of a Data Allocation for Parallel Data Warehouses] Stöhr, Thomas 16 October 2018 The rapidly growing importance of analysing data warehouse contents, together with more convenient query interfaces for end users, significantly increases the volume of OLAP queries. To reduce the amount of work and achieve short response times for these complex queries, an adequate data allocation is, alongside the use of processing and I/O parallelism, the key to good performance. However, determining a suitable fragmentation and allocation for large data volumes, such as fact tables or index structures in relational star schemas, is a difficult problem. There is currently practically no tool support for this. We therefore present an approach for analytically determining a suitable multi-dimensional, hierarchical data allocation. Our approach should be fairly easy to integrate into a tool that automatically supports the allocation problem. ddc:005.745
3	An analysis of semantic data quality deficiencies in a national data warehouse: a data mining approach Barth, Kirstin 07 1900 This research determines whether data quality mining can be used to describe, monitor and evaluate the scope and impact of semantic data quality problems in the learner enrolment data on the National Learners’ Records Database. Previous data quality mining work has focused on anomaly detection and has assumed that the data quality aspect being measured exists as a data value in the data set being mined. The method for this research is quantitative in that the data mining techniques and model that are best suited for semantic data quality deficiencies are identified and then applied to the data. The research determines that unsupervised data mining techniques that allow for weighted analysis of the data would be most suitable for mining semantic data quality deficiencies. Further, the academic Knowledge Discovery in Databases model needs to be amended when applied to mining semantic data quality deficiencies. / School of Computing / M. Tech. (Information Technology) Data warehouse; Data mining; Data quality mining; Exploratory data mining; Cluster analysis; Association rule; Knowledge discovery in databases; National Learners’ Records Database; Learner enrolment data; Semantic data quality deficiencies; 005.745; Data warehousing; Data mining; Cluster analysis; Association rule mining
