Global ETD Search

1	Návrh datového skladu v SaaS společnosti / Design of Data Warehouse in SaaS Company Zetocha, Adam January 2020 This diploma thesis covers the design of a data warehouse, and the steps leading to its construction, at a startup developing a SaaS product. Theoretical background on data warehouses and business intelligence is applied to the design and the subsequent development of the data warehouse, mainly for marketing data. The import of data into the warehouse and the reporting are fully automated.
2	Post-Processing National Water Model Long-Range Forecasts with Random Forest Regression in the Cloud to Improve Forecast Accuracy for Decision-Makers and Water Managers Anderson, Jacob Matthew 19 December 2024 Post-processing bias correction of streamflow forecasts can be useful in the hydrologic modeling workflow to fine-tune forecasts for operations, water management, and decision-making. Hydrologic model runoff simulations include errors, uncertainties, and biases, which reduce their accuracy and precision in real-world applications. We used random forest regression to correct biases and errors in the U.S. National Water Model (NWM) long-range streamflow forecasts, treating U.S. Geological Survey (USGS) gauge station measurements as a proxy for true streamflow. We included other features in model training, namely watershed characteristics, time fraction of year, and lagged streamflow values, to help the model perform better in gauged and ungauged areas. We assessed the effectiveness of the bias correction technique by comparing the difference between forecast and actual streamflow before and after the bias correction model was applied. We also explored advances in hydroinformatics and cloud computing by creating and testing this bias correction capability within the Google Cloud Console environment, which avoids slow and unnecessary data downloads to local devices and keeps data processing and storage within the cloud. This demonstrates the possibility of integrating our method into the NWM real-time forecasting workflow. Results indicate that reasonable bias correction is possible using the random forest regression machine learning technique. After the forecasts are run through the random forest model, the differences between USGS discharge and NWM forecasts are smaller than the original differences.
The main issue with the NWM forecasts is that the error increases with distance from the reference time, that is, the start of the forecast period. The model we created shows significant improvement in the streamflow forecasts at time steps further from the reference time. The error is reduced and is more uniform across all time steps of the 30-day long-range forecasts.
streamflow forecasts, bias correction, machine learning, random forest regression, Google BigQuery, cloud, Engineering
3	En jämförelse av metoder och verktyg för datahantering och analys inom datalager / A comparison of methods and tools for data management and analysis within data warehouses Aziz, Adeeba January 2024 This bachelor’s thesis presents a comparative analysis of methods and tools for data management and analysis within data warehouses. With the rapidly increasing volume of data and the development of cloud technologies, companies face the challenge of choosing, among various methods, the one most suitable for their specific data management and analysis needs.
The report highlights the One Big Table (OBT) method and the Data Build Tool (dbt), examining their advantages and disadvantages in data warehouse environments. To gain a deeper understanding of their functionality and efficiency, they are compared in different use cases through performance tests of latency and concurrency using the Hyperfine tool. OBT is implemented in Google BigQuery as well as in Google Cloud SQL for PostgreSQL, and latency and concurrency for analytical purposes are evaluated using Python scripts with SQL queries and using dbt models, respectively. The scripts and the dbt models are run against both BigQuery and PostgreSQL, and both implement OBT. The results show that the SQL scripts exhibited lower latency than the dbt models when executed against both BigQuery and PostgreSQL. Another finding is that the latency of the SQL scripts was lower against PostgreSQL than against BigQuery, while the dbt models showed higher latency against PostgreSQL than against BigQuery. The SQL scripts also performed better than the dbt models in concurrent executions in both BigQuery and PostgreSQL.
Data Analysis, Data Build Tool (dbt), Data Management, Data Warehouse, Google BigQuery, One Big Table (OBT), Performance, PostgreSQL, SQL, Computer Engineering

