1 |
Rizikové chování ETL procesů v prostředí datového skladu / Risk Behaviour of ETL Processes in a Data Warehouse
Košinová, Kateřina, January 2015
This thesis deals with the risk behaviour of ETL processes in a data warehouse. The first part defines ETL processes and states the aim of the thesis. The second part covers the theoretical background needed to build a data warehouse, define ETL processes, and identify potential risks. The third part identifies the potential risks of ETL processes through risk analysis and assessment, and also covers controls for those risks. The fourth part concentrates on modifying the ETL processes to prevent potential risks. An important component of this part is an emergency plan containing the procedures that must be applied if a risk materializes. The fifth part summarizes the knowledge gained during the analysis and development.
|
2 |
Utilization of ETL Processes for Geographical Data Migration: A Case Study at Metria AB
Sihvola, Toni, January 2024
This study investigated whether ETL processes can be used safely to migrate geographical data between heterogeneous data sources, and whether certain data structures are more prone to integrity loss during such migrations. Geographical data in various vector structures was migrated with the ETL software FME from a legacy data source (Oracle 11g with integrated Esri geodatabases) to another (PostgreSQL 14.10 with the PostGIS extension) in order to explore these questions. Data integrity after the migration was assessed by comparing the geodata housed in Oracle 11g (the source) with that in PostgreSQL 14.10 (the destination), using ArcGIS Pro's built-in tools and a Python script. To further evaluate the role of ETL processes in geographical data migration, interviews were conducted with specialists in databases, data migration, and FME, both before and after the migration. The study concludes that different vector structures are affected differently. Whereas points and lines maintained 100% data integrity across all datasets, polygons achieved 99.95% accuracy in one of the three tested datasets. This issue can be addressed by implementing a repair process during the Transform stage of an ETL process. However, such a process does not guarantee an entirely successful outcome: although the affected area was significantly reduced after the repair, the polygons contained a higher number of mismatches.
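The abstract does not describe the Python script used for the post-migration comparison, so the following is only a minimal sketch of one way such a check could look. It assumes features are keyed by a shared id and represented as plain coordinate rings, and it uses polygon area as a crude integrity proxy; the function names `shoelace_area` and `integrity_report` and the tolerance value are hypothetical, not taken from the study.

```python
from typing import Dict, List, Tuple

# A polygon ring as a list of (x, y) vertices; an assumed, simplified representation.
Ring = List[Tuple[float, float]]


def shoelace_area(ring: Ring) -> float:
    """Unsigned area of a simple polygon ring (shoelace formula).

    Works for both open rings and rings whose last vertex repeats the first.
    """
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def integrity_report(source: Dict[int, Ring], target: Dict[int, Ring],
                     tol: float = 1e-9) -> Tuple[float, List[int]]:
    """Compare features by id.

    Returns the share of source features whose area is preserved in the
    target (within `tol`) and the ids of features that are missing or differ.
    """
    mismatched = []
    for fid, src_ring in source.items():
        dst_ring = target.get(fid)
        if dst_ring is None or abs(shoelace_area(src_ring) - shoelace_area(dst_ring)) > tol:
            mismatched.append(fid)
    share = (len(source) - len(mismatched)) / len(source) if source else 1.0
    return share, mismatched


# Hypothetical data: feature 2 lost a vertex during migration.
source = {1: [(0, 0), (1, 0), (1, 1), (0, 1)], 2: [(0, 0), (2, 0), (2, 2), (0, 2)]}
target = {1: [(0, 0), (1, 0), (1, 1), (0, 1)], 2: [(0, 0), (2, 0), (2, 2)]}
share, mismatched = integrity_report(source, target)
```

Equal area does not prove equal geometry, so a real check would more likely compare vertex sequences or use a spatial library's equality test; the area comparison is kept here only because it needs no dependencies.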
|
3 |
A BPMN-based conceptual language for designing ETL processes
El Akkaoui, Zineb, 27 June 2014
Business Intelligence (BI) is the set of techniques and technologies that support decision-making by providing aggregated insight into an organization's data. Because the events and applications running in an organization hold a large amount of potentially useful data, the BI market calls for new technologies able to exploit that data for analysis wherever it is available. In particular, Extract, Transform, and Load (ETL) processes, the fundamental BI technology responsible for integrating and cleansing organizational data, must respond to these requirements.

However, the development of ETL processes is still considered very complex and time-consuming, to the point that roughly 80% of BI project effort is dedicated to ETL development. Among the phases of the ETL development life cycle, ETL modeling is a critical and laborious task. This phase produces the first effective formal representation of the ETL process, i.e. the ETL model, which is reused and refined in the subsequent phases of development.

Typically, ETL processes are modeled using vendor-specific ETL tools from the very beginning of development. However, these tools are unsuitable for business users because they produce overwhelmingly fine-grained models.

In an attempt to provide more appropriate tools to business users, vendor-independent ETL modeling languages have been proposed in the literature. Nevertheless, they remain immature. To get a precise view of these languages, we conduct a survey which: i) defines a set of criteria associated with the major ETL requirements identified in the literature; ii) compares the surveyed conceptual languages, which come from research work, with the physical languages, which come from prominent ETL tools; and iii) studies the complete ETL development methodologies associated with these modeling languages.

The analysis of our survey reveals several drawbacks in responding to the ETL requirements. In particular, the conceptual languages have incomplete elements for ETL modeling, with little or no formalization. Several languages are only descriptive: they can neither be automatically implemented as executable code nor be automatically maintained as changes occur over time.

To address these shortcomings, we present in this thesis a novel approach that tackles the whole development life cycle of ETL processes.

First, we propose a new vendor-independent language for modeling ETL processes in the same way as typical business processes, i.e. the processes responsible for managing the operations in an organization. The rationale behind this proposal is to provide ETL processes with better access to data in the events and applications of the organization, including fresh data, and with better design capabilities, such as analysis available to any user. By using the standard representation mechanism BPMN (Business Process Model and Notation) and a classification of ETL elements resulting from a study of the most widely used commercial and open-source ETL tools, the language enables building agile and full-fledged ETL processes. We name our language BPMN4ETL, to refer to BPMN for ETL processes.

Second, we build a model-driven framework that provides automatic code generation and improves maintenance support for our ETL language. We use Model-Driven Development (MDD) technology because it helps in developing software, particularly by automating the transformation from one phase of software development to another. We present a set of model-to-text transformations able to produce code for different business process engines and ETL engines. We also describe the model-to-model transformations that automatically update the ETL models, with the aim of supporting the maintenance of the generated code as data sources evolve. A demonstration using a case study is conducted as an initial validation to show that the framework, covering modeling, implementation and maintenance, could be used in practice.

To illustrate the new concepts introduced in the thesis, mainly the BPMN4ETL language and the implementation and maintenance framework, we use a case study from the fictitious Northwind Traders company, a retailer that imports and exports foods from around the world. / Doctorate in Engineering Sciences / info:eu-repo/semantics/nonPublished
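To make the idea of a model-to-text transformation concrete, here is a small sketch that renders a toy ETL model as SQL text. It is not the BPMN4ETL metamodel or the thesis's transformation rules: the classes `Extract`, `Filter`, and `Load`, the function `to_sql`, and the table and column names are all hypothetical, chosen only to illustrate turning a model into executable code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Extract:
    """Hypothetical model element: read columns from a source table."""
    table: str
    columns: List[str]


@dataclass
class Filter:
    """Hypothetical model element: keep rows satisfying a SQL condition."""
    condition: str


@dataclass
class Load:
    """Hypothetical model element: write rows into a target table."""
    table: str


def to_sql(extract: Extract, filters: List[Filter], load: Load) -> str:
    """Model-to-text step: render the model as one INSERT ... SELECT statement."""
    cols = ", ".join(extract.columns)
    sql = f"INSERT INTO {load.table} ({cols})\nSELECT {cols}\nFROM {extract.table}"
    if filters:
        sql += "\nWHERE " + " AND ".join(f"({f.condition})" for f in filters)
    return sql + ";"


# Assumed example tables; the thesis's Northwind case study may use other names.
statement = to_sql(
    Extract("orders", ["order_id", "freight"]),
    [Filter("freight > 0")],
    Load("dw_orders"),
)
```

Because the model is kept separate from the generated text, targeting a different engine would mean writing another rendering function over the same model elements, which is the property the abstract attributes to its model-to-text transformations.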
|
4 |
Řízení podnikové výkonnosti a její implementace v rámci personálních informačních systémů / Corporate Performance Management and Its Implementation in Human Resources Information Systems
Scholz, Martin, January 2014
The thesis addresses the development of indicators for measuring human capital, which will serve as reporting output from a data warehouse. The goal is to propose a set of indicators that covers the overall picture of corporate human resources. I focused mainly on building sets of indicators for measuring human resources and human capital.
|