  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
71

Forecasting checking account balance : Using supervised machine learning

Dannelind, Martin January 2022
The introduction of open banking has made it possible for companies to build the next generation of applications based on transactional data, enabling economic forecasts that private individuals can use to make responsible financial decisions. This project investigated forecasting account balances using supervised learning. Seven different regression models were run on transactional data from 377 anonymised checking accounts split into subgroups. The results showed that multivariate XGBoost optimised with feature selection was the best-performing forecasting model and that the subgroup with recurring income transactions was the easiest to forecast. Based on these results, a viable option for forecasting account balances is to split the transactional data into subgroups and forecast them separately, which minimises the errors caused by certain random, infrequent and large types of transactions.
72

Employee Turnover Prediction - A Comparative Study of Supervised Machine Learning Models

Kovvuri, Suvoj Reddy, Dommeti, Lydia Sri Divya January 2022
Background: In every organization, employees are an essential resource. For several reasons, employees are neglected by organizations, which leads to employee turnover. Employee turnover causes considerable losses to the organization. Using machine learning algorithms and the data in hand, a prediction of an employee's future in an organization is made. Objectives: The aim of this thesis is to conduct a comparison study using supervised machine learning algorithms (Logistic Regression, Naive Bayes Classifier, Random Forest Classifier, and XGBoost) to predict an employee's future in a company. The models are assessed with evaluation metrics in order to find the most effective model for the data in hand. Methods: A quantitative research approach is used in this thesis, and data is analyzed using statistical analysis. The labeled data set comes from Kaggle and includes information on employees at a company. The data set is used to train the algorithms. The resulting models are evaluated on the test set using Accuracy, Precision, Recall, F1 Score, and the ROC curve to determine which model performs best at predicting employee turnover. Results: Among the studied features in the data set, no feature has a significant impact on turnover. The XGBoost classifier has the best mean accuracy at 85.3%, followed by the Random Forest classifier at 83%; both outperform the other two algorithms. The XGBoost classifier has the highest precision at 0.88, followed by the Random Forest classifier at 0.82. Both the Random Forest classifier and the XGBoost classifier showed a Recall score of 0.69. The XGBoost classifier had the highest F1 Score at 0.77, followed by the Random Forest classifier at 0.75. In the ROC curve, the XGBoost classifier had the highest area under the curve (AUC) at 0.88.
Conclusions: Among the four machine learning algorithms studied (Logistic Regression, Naive Bayes Classifier, Random Forest Classifier, and XGBoost), the XGBoost classifier performs best across the tested performance metrics. No feature was found to have a major effect on employee turnover.
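The F1 scores quoted in the abstract above can be checked against the reported precision and recall, since F1 is their harmonic mean. The following is a minimal sketch; the function name `f1_score` is illustrative and is not taken from the thesis.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall values as reported in the abstract.
xgboost_f1 = f1_score(0.88, 0.69)
random_forest_f1 = f1_score(0.82, 0.69)

print(round(xgboost_f1, 2))        # 0.77
print(round(random_forest_f1, 2))  # 0.75
```

Both values agree with the reported F1 scores after rounding to two decimals.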
73

Restaurant Daily Revenue Prediction : Utilizing Synthetic Time Series Data for Improved Model Performance

Jarlöv, Stella, Svensson Dahl, Anton January 2023
This study aims to enhance the accuracy of a demand forecasting model, XGBoost, by incorporating synthetic multivariate restaurant time series data during the training process. The research addresses the limited availability of training data by generating synthetic data using TimeGAN, a generative adversarial deep neural network tailored for time series data. A one-year daily time series dataset, comprising numerical and categorical features based on a real restaurant's sales history, supplemented by relevant external data, serves as the original data. TimeGAN learns from this dataset to create synthetic data that closely resembles the original data in terms of temporal and distributional dynamics. Statistical and visual analyses demonstrate a strong similarity between the synthetic and original data. To evaluate the usefulness of the synthetic data, an experiment is conducted where varying lengths of synthetic data are iteratively combined with the one-year real dataset. Each iteration involves retraining the XGBoost model and assessing its accuracy for a one-week forecast using the Root Mean Square Error (RMSE). The results indicate that incorporating 6 years of synthetic data improves the model's performance by 65%. The hyperparameter configurations suggest that deeper tree structures benefit the XGBoost model when synthetic data is added. Furthermore, the model exhibits improved feature selection with an increased amount of training data. This study demonstrates that incorporating synthetic data closely resembling the original data can effectively enhance the accuracy of predictive models, particularly when training data is limited.
74

Neonatal Sepsis Detection Using Decision Tree Ensemble Methods: Random Forest and XGBoost

Al-Bardaji, Marwan, Danho, Nahir January 2022
Neonatal sepsis is a potentially fatal medical condition due to an infection and is attributed to about 200 000 annual deaths globally. With healthcare systems that are facing constant challenges, there exists a potential for introducing machine learning models as a diagnostic tool that can be automated within existing workflows and would not entail more work for healthcare personnel. The Herlenius Research Team at Karolinska Institutet has collected neonatal sepsis data that has been used for the development of many machine learning models across several papers. However, none have tried to study decision tree ensemble methods. In this paper, random forest and XGBoost models are developed and evaluated in order to assess their feasibility for clinical practice. The data contained 24 features of vital parameters that are easily collected through a patient monitoring system. The validation and evaluation procedure needed special consideration because the data was grouped at the patient level and was imbalanced. The proposed methods developed in this paper have the potential to be generalized to other similar applications. Finally, using the measure receiver-operating-characteristic area-under-curve (ROC AUC), both models achieved around ROC AUC = 0.84. Such results suggest that the random forest and XGBoost models are potentially feasible for clinical practice. Another insight gained was that both methods seemed to perform better with simpler models, suggesting that future work could create a more explainable model. / Neonatal sepsis is a potentially fatal medical condition resulting from an infection and is reported to cause 200 000 deaths globally each year. With healthcare systems constantly under strain, there is potential for machine learning models as diagnostic tools automated within existing workflows without entailing more work for healthcare staff.
The Herlenius research team at Karolinska Institutet has collected neonatal sepsis data that has been used to develop many machine learning models across several studies. However, no one has tried to investigate decision tree ensemble methods. The aim of this study is to develop and evaluate random forest and XGBoost models in order to assess their potential in clinical practice. The data contains 24 features of vital parameters that are easily collected through patient monitoring systems. The validation and evaluation procedure required special consideration given that the data was grouped at the patient level and was imbalanced. The proposed method has the potential to be generalized to other similar applications. Finally, using the receiver-operating-characteristic area-under-curve (ROC AUC) measure, we showed that both models achieved a result of ROC AUC = 0.84. Such results suggest that both the random forest and XGBoost models could potentially be used in clinical practice. Another insight was that both models seemed to perform better with simpler models, which suggests that future work could be to create a more explainable machine learning model. / Bachelor's thesis in electrical engineering 2022, KTH, Stockholm
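The abstract above notes that validation needed special care because the records are grouped by patient. One common way to respect such grouping is to keep all records of a patient in the same fold, so that no patient appears in both the training and the test split. The sketch below illustrates that idea only; the function `grouped_folds`, the round-robin assignment, and the example patient IDs are assumptions for illustration, not the procedure used in the thesis.

```python
def grouped_folds(patient_ids, n_folds):
    """Split record indices into folds so that each patient's records stay together.

    Assumption: patients are assigned to folds round-robin in sorted order.
    """
    unique_patients = sorted(set(patient_ids))
    fold_of = {pid: i % n_folds for i, pid in enumerate(unique_patients)}
    folds = []
    for k in range(n_folds):
        test = [i for i, pid in enumerate(patient_ids) if fold_of[pid] == k]
        train = [i for i, pid in enumerate(patient_ids) if fold_of[pid] != k]
        folds.append((train, test))
    return folds


# Hypothetical example: seven records belonging to four patients.
patient_ids = ["p1", "p1", "p2", "p3", "p3", "p3", "p4"]
folds = grouped_folds(patient_ids, n_folds=2)
for train, test in folds:
    print("train:", train, "test:", test)
```

Because the split is made on patients instead of individual records, a model is never evaluated on a patient it has already seen during training. Class imbalance would still need separate handling.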
75

Geochemical investigation of the co-evolution of life and environment in the Neoproterozoic Era

Kang, Junyao 19 February 2024
The co-evolution of life and the environment stands as a cornerstone in Earth's 4.5-billion-year history. Environmental fluctuations have wielded substantial influence over biological evolution, while life forms have, in turn, reshaped Earth's surface and climate. This dissertation centers on a critical period in Earth's history—the Neoproterozoic Era—when profound environmental shifts potentially catalyzed pivotal eukaryotic evolutionary events. By delving deeper into Neoproterozoic paleoenvironments, I aim at a clearer understanding of life-environment co-evolution in this crucial era. The first chapter focuses on an important juncture—the transition from prokaryote to eukaryote dominance in marine ecosystems during the Tonian Period (1000 Ma to 720 Ma). To assess whether the availability of nitrate, an important macro-nutrient, played a critical role in this evolutionary event, nitrogen isotope compositions (δ<sup>15</sup>N) of marine carbonates from the early Tonian (ca. 1000 Ma to ca. 800 Ma) Huaibei Group in North China were measured. The data indicate nitrate limitation in early Neoproterozoic oceans. Further, a compilation of Proterozoic sedimentary δ<sup>15</sup>N data, together with box model simulations, suggest a ~50% increase in marine nitrate availability at ~800 Ma. Limited nitrate availability in early Neoproterozoic oceans may have delayed the ecological rise of eukaryotes until ~800 Ma when increased nitrate supply, together with other environmental and ecological factors, may have contributed to the transition from prokaryote-dominant to eukaryote-dominant marine ecosystems. Recognizing the spatial and temporal variations in Neoproterozoic oceanic environments, the second chapter lays the groundwork for a robust stratigraphic framework for the early Tonian Period. 
Employing the dynamic time warping algorithm, I constructed a global stratigraphic framework for the early Tonian Period using δ<sup>13</sup>C<sub>carb</sub> data from the North China, São Francisco, and Congo cratons. This exercise confirms the generally narrow range of δ<sup>13</sup>C<sub>carb</sub> fluctuations in the early Tonian, but also confirms the presence of a negative δ<sup>13</sup>C<sub>carb</sub> excursion of notable magnitude (~9 ‰) at ca. 920 Ma in multiple records, suggesting that it was global in scope. This negative excursion, known as the Majiatun excursion, is likely the oldest negative excursion in the Neoproterozoic Era and marks the onset of the dynamic Neoproterozoic carbon cycle. Shifting focus to the late Neoproterozoic, the third chapter delves into the origins of Neoproterozoic superheavy pyrite, whose bulk-sample δ<sup>34</sup>S values are greater than those of contemporaneous seawater sulfate and whose origins remain controversial. Two supervised machine learning algorithms were trained on a large LA-ICP-MS pyrite trace element database to distinguish pyrite of different origins. The analysis validates that two models built on the co-behavior of 12 trace elements (Co, Ni, Cu, Zn, As, Mo, Ag, Sb, Te, Au, Tl, and Pb) can be used to accurately predict pyrite origins. This novel approach was then used to identify the origins of pyrite from two Neoproterozoic sedimentary successions in South China. The first set of samples contains isotopically superheavy pyrite from the Cryogenian Tiesi'ao and Datangpo formations. The second set of samples contains pyritic rims from the Ediacaran Doushantuo Formation; these pyrite rims are associated with fossiliferous chert nodules and do not have superheavy sulfur isotopes. For the superheavy pyrite, the models consistently show high confidence levels in identifying its genesis type, and three out of four samples were inferred to be of sedimentary origins. 
For the pyritic nodule rims, the models suggest that early diagenetic pyrite was subsequently altered by hydrothermal fluids and therefore shows mixed signals. The third chapter highlights the importance of pyrite trace elements in deciphering and distinguishing the origins of pyrite in sedimentary strata. / Doctor of Philosophy / Understanding how life and the environment have shaped our planet's story over 4.5 billion years is like piecing together an intricate puzzle. On the one hand, changes in the environment kickstarted big shifts in how life evolved. On the other hand, living creatures have also left their mark on Earth's landscapes and climate. This dissertation focuses on unraveling the mysterious Neoproterozoic Era (1 billion to 538 million years ago), a time when Earth saw some of its most dramatic changes. A significant aspect of my investigation delves into the evolutionary dynamics within ancient marine ecosystems. Specifically, I'm exploring a critical juncture when organisms with more complex cellular structures, known as eukaryotes, became ecologically more important than prokaryotic life forms in many aspects of Earth systems. By examining ancient rock formations from China, I have found evidence suggesting that nitrate, a vital nutrient, was scarce in the Neoproterozoic oceans. However, around 800 million years ago, there appears to have been a significant surge in nitrate availability. This surge potentially catalyzed a pivotal phase in evolution, possibly driving the shift from prokaryote to eukaryote dominance in these ancient waters. Second, there is a challenge to delineate a robust timeline for the early Neoproterozoic Era. Imagine trying to piece together a story from a time when there were no calendars or clear dates. Employing advanced statistical methods and comparing chemical signals preserved in carbonate rocks from disparate global locations, I endeavor to craft a coherent timeline for this crucial period. 
Within this timeline, a noteworthy anomaly in the carbon cycle emerged around 920 million years ago known as the Majiatun excursion. This anomaly represents a significant shift in the Neoproterozoic carbon cycle. Furthermore, my investigation plunges into the geochemistry of sulfur, an important element in shaping ancient marine environments. Certain sedimentary rocks harbor anomalous sulfur isotope signatures in the mineral pyrite (also known as fool's gold), hinting at dramatic environmental transformations during the late Neoproterozoic. Employing advanced analytical techniques and machine learning methodologies, I seek to discern the origins and implications of these anomalous sulfur isotope signals found in pyrite, unraveling their significance in reconstructing the environmental dynamics of ancient oceans.
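The dissertation abstract above mentions dynamic time warping (DTW) as the tool used to align carbon isotope records from different cratons. As a rough illustration of what DTW computes, here is the textbook dynamic-programming form on two short numeric sequences. This is a generic sketch, not the author's implementation; the function name and the sample values are hypothetical.

```python
def dtw_distance(a, b):
    """Return the classic DTW alignment cost between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the best alignment cost of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats its sample
                cost[i][j - 1],      # b advances, a repeats its sample
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Hypothetical records: the second repeats one sample, as if deposited more slowly.
record_a = [1.0, 2.0, 3.0]
record_b = [1.0, 2.0, 2.0, 3.0]
print(dtw_distance(record_a, record_b))  # 0.0
```

The zero cost shows why DTW suits records with different sampling or sedimentation rates: stretching one sequence in time is not penalized, only differences in the values themselves.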
76

Viewership forecast on a Twitch broadcast : Using machine learning to predict viewers on sponsored Twitch streams

Malm, Jonas, Friberg, Martin January 2022
Today, the video game industry is larger than the sports and film industries combined, and the largest streaming platform, Twitch, with an average of 2.8 million concurrent viewers, offers gaming and non-gaming brands the possibility to market their products. Estimating streamers' viewership is central in these marketing campaigns, but no large-scale studies have previously been conducted to predict viewership. This paper evaluates three different machine learning algorithms with regard to the three error metrics MAE, MAPE and RMSE, and presents novel features for predicting viewership. Different models are chosen through recursive feature elimination using k-fold cross-validation with respect to MAE and MAPE separately. The models are evaluated on an independent test set and show promising results, on par with manual expert predictions. None of the models can be said to be significantly better than another. XGBoost optimized for MAPE obtained the lowest MAE of 282.54 and the lowest MAPE of 41.36% on the test set, compared with expert predictions at 288.06 MAE and 83.05% MAPE. Furthermore, the study illustrates the importance of past viewership and streamer variety for predicting future viewership.
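The three error metrics named in the abstract above have standard definitions. The sketch below shows them on a small made-up example; the function names and the viewer counts are hypothetical and are not data from the thesis.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical viewer counts for three streams.
actual_viewers = [100, 200, 400]
predicted_viewers = [110, 180, 440]

print(mae(actual_viewers, predicted_viewers))
print(mape(actual_viewers, predicted_viewers))
print(rmse(actual_viewers, predicted_viewers))
```

In this example every prediction is off by 10% of the actual value, so MAPE weights the three streams equally, while MAE and RMSE are dominated by the largest stream. That difference is one reason a model selected for MAPE can rank differently from one selected for MAE.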
77

Prognostic Stratification in Patients with Left Heart Disease : A Machine Learning Approach / Prognostisk stratifiering hos patienter med vänstersidig hjärtsvikt : En maskininlärningsmetod

Saleh, Mariam January 2024
Left heart disease often results in left heart failure and right ventricular dysfunction, which is challenging to diagnose with traditional diagnostic approaches. To overcome the limitations of single variables for diagnosing right ventricular dysfunction, a novel empirical 4-point right ventricular dysfunction score was created at Sahlgrenska University Hospital. In this study, we used machine learning, more specifically XGBoost coupled with interactive machine learning, to develop four different models for predicting death or receiving a left ventricular assist device in patients with left heart disease (n=486). Features were selected from the dataset using recursive feature elimination with the default number of features. The initial model with 29 features, called the baseline model, served as the foundation of the three additional models, each adjusted based on feedback from a clinician. The first step of feedback was to remove features due to high correlation, creating a modified model with 12 features; the second step was to use 12 well-known characteristics of left and right ventricular dysfunction, creating an empirical model, and to adjust the prediction threshold from 50% to 60%. The third step was to reduce the number of features to 5 on empirical grounds. The models were compared to the right ventricular dysfunction score using the metrics area under the curve, F1 score, positive likelihood ratio, and negative likelihood ratio. The predictive efficacy of the machine learning models was superior to that of the right ventricular dysfunction score. The results also indicated that the models neither improved nor deteriorated when the number of features was reduced. However, insufficient accuracy indicates that none of the machine learning models are clinically viable.
These results show the potential of machine learning in enhancing prognostic stratification in patients with left heart disease, although further refinement is necessary for clinical use. / Left heart disease often results in left heart failure and right ventricular failure, which is challenging to diagnose with traditional diagnostic methods. To overcome the limitation of single variables for diagnosing right ventricular failure, a 4-point right ventricular failure score was created at Sahlgrenska University Hospital. In this study, an XGBoost algorithm combined with interactive machine learning was used to develop four different prediction models for predicting mortality or the risk of receiving a mechanical heart pump for the left ventricle in patients with left heart failure (n=486). Variables were selected from the dataset using recursive feature elimination with a default number of variables. The initial model with 29 variables was called the baseline model and served as the foundation for the three additional models, which were adjusted based on the clinician's feedback. The first step included removing variables with high mutual correlation, and we created a modified model with 12 variables. In the second step, the empirical model, we used 12 known characteristics of left and right ventricular failure, and for both the prediction threshold was adjusted from 50% to 60%. In a third step, we created a simplified model with 5 variables on clinical grounds. The models were compared with the 4-point right heart failure scale using the metrics area under the curve, F1 score, positive likelihood ratio, and negative likelihood ratio. This revealed that the machine learning models had better predictive ability than the 4-point right ventricular failure score. In addition, the results showed that the models neither deteriorated nor improved when variables were removed or when new models were created on clinical grounds.
However, the machine learning models had insufficient accuracy for clinical use.
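The abstract above compares models using positive and negative likelihood ratios and mentions moving the prediction threshold from 50% to 60%. The sketch below shows how those two ratios follow from sensitivity and specificity at a given threshold. The labels and probabilities are invented for illustration, and treating a probability equal to the threshold as positive is an assumption.

```python
def likelihood_ratios(labels, probabilities, threshold):
    """Return (LR+, LR-) for binary labels and predicted probabilities."""
    tp = fp = tn = fn = 0
    for label, prob in zip(labels, probabilities):
        predicted_positive = prob >= threshold  # assumption: ties count as positive
        if label and predicted_positive:
            tp += 1
        elif label:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    lr_pos = sensitivity / (1 - specificity) if specificity < 1 else float("inf")
    lr_neg = (1 - sensitivity) / specificity if specificity > 0 else float("inf")
    return lr_pos, lr_neg


# Hypothetical outcomes (1 = event) and model probabilities for eight patients.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
probabilities = [0.90, 0.70, 0.65, 0.55, 0.62, 0.40, 0.30, 0.20]

print(likelihood_ratios(labels, probabilities, threshold=0.5))
print(likelihood_ratios(labels, probabilities, threshold=0.6))
```

In this toy data, raising the threshold turns one true positive into a false negative, which lowers sensitivity and changes both ratios. It illustrates why a threshold change has to be reported alongside the metrics.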
78

Efficient Resource Management : A Comparison of Predictive Scaling Algorithms in Cloud-Based Applications

Dahl, Johanna, Strömbäck, Elsa January 2024
This study aims to explore predictive scaling algorithms used to predict and manage workloads in a containerized system. The goal is to identify which predictive scaling approach delivers the most effective results, contributing to research on cloud elasticity and resource management. This potentially leads to reduced infrastructure costs while maintaining efficient performance, enabling a more sustainable cloud-computing technology. The work involved the development and comparison of three different autoscaling algorithms with an interchangeable prediction component. For the predictive part, three different time-series analysis methods were used: XGBoost, ARIMA, and Prophet. A simulation system with the necessary modules was developed, as well as a designated target service to experience the load. Each algorithm's scaling accuracy was evaluated by comparing its suggested number of instances to the optimal number, with each instance representing a simulated CPU core. The results showed varying efficiency: XGBoost and Prophet excelled with richer datasets, while ARIMA performed better with limited data. Although XGBoost and Prophet maintained 100% uptime, this could lead to resource wastage, whereas ARIMA's lower uptime percentage possibly suggested a more resource-efficient, though less reliable, approach. Further analysis, particularly experimental investigation, is required to deepen the understanding of these predictors' influence on resource allocation.
79

Comparative Analysis of Machine Learning Algorithms for Cryptocurrency Price Prediction

Kurtagic, Leila January 2024
As the cryptocurrency markets continuously grow, so does the need for reliable analytical tools for price prediction. This study conducted a comparative analysis of machine learning (ML) algorithms for cryptocurrency price prediction. Through a literature review, three common and reliable ML algorithms for cryptocurrency price prediction were identified: Long Short-Term Memory (LSTM), Random Forest (RF), and eXtreme Gradient Boosting (XGBoost). Utilizing the Bitcoin All Time History dataset from TradingView, the study assessed both the individual performance of each algorithm and the potential of ensemble methods to enhance predictive accuracy. The results reveal that the LSTM algorithm outperformed RF and XGBoost in terms of predictive accuracy according to the metrics Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Squared Error (RMSE). Additionally, two ensemble approaches were tested: Ensemble 1, which enhanced the LSTM model with the combined predictions from RF and XGBoost, and Ensemble 2, which integrated predictions from all three models. Ensemble 2 demonstrated the highest predictive performance among all models, highlighting the advantages of using ensemble approaches for more robust predictions.
80

Money Laundering Detection using Tree Boosting and Graph Learning Algorithms / Detektion av Penningtvätt med hjälp av Trädalgoritmer och Grafinlärningsalgoritmer

Frumerie, Rickard January 2021
In this master's thesis we focused on using machine learning methods for detecting money laundering in financial transaction networks, in order to demonstrate that they can be used as a complement to or instead of the more commonly used rule-based systems. The graph learning method graph convolutional networks (GCN) has been a hot topic in the field since it was shown to scale well with data size back in 2018. However, typical GCN models cannot use edge features, which is why this thesis combines the GCN model with a node and edge neural network (NENN) in order to solve this problem. This new method is compared with an already established machine learning method for financial transactions, namely the tree boosting method (XGBoost). Because of confidentiality concerns around financial transaction data, the machine learning algorithms are tested on two carefully constructed synthetic data sets, generated by agent-based simulations to resemble real financial data. The results showed the viability and superiority of the new implementation of the GCN model, making it a preferable method for connectively structured data, meaning that a transaction or account is analyzed in the context of its financial environment. On the other hand, the XGBoost method showed better results when examining transactions independently. Hence it was able to find fraudulent and non-fraudulent patterns more accurately from the transactional features themselves. / In this degree project we focus on the use of machine learning methods to detect money laundering in financial transaction networks, with the aim of demonstrating that they can be used as a complement to or instead of the more commonly used rule-based systems. The graph learning method graph convolutional networks (GCN) has been a hot topic in the field since the method was shown in 2018 to work well for large amounts of data.
However, an ordinary GCN model cannot use edge information, which is why this thesis combines the GCN model with node and edge neural networks (NENN) to detect money laundering more effectively. This new method is compared with an already established machine learning method for financial transactions, namely tree boosting (XGBoost). For reasons of confidentiality around financial transaction data, the machine learning algorithms were tested on two carefully constructed, synthetically generated data sets that, based on agent-based simulations, resemble real financial data. The results showed the applicability and superiority of the new implementation of the GCN model, which is preferable for relationally structured data, that is, when transactions and accounts are analyzed in the context of their financial environment. On the other hand, XGBoost shows better results when examining transactions individually, since this method can more precisely identify fraudulent and non-fraudulent patterns from the transactional features.
