Global ETD Search

51	Advanced Algorithms for Classification and Anomaly Detection on Log File Data : Comparative study of different Machine Learning Approaches Wessman, Filip January 2021 Background: A problematic area in today’s large-scale distributed systems is the exponentially growing volume of log data. Finding anomalies by manually inspecting and monitoring this data is becoming progressively more challenging, complex and time-consuming, yet detecting them is vital for keeping these systems available around the clock. Aim: The main objective of this study is to determine which Machine Learning (ML) algorithms are most suitable and whether they can meet the needs and requirements for optimization and efficiency in log data monitoring. This includes identifying which specific steps of the overall problem can be improved by using these algorithms for anomaly detection and classification on real log data. Approach: An initial pre-study was conducted, after which logs were collected and preprocessed with the log parsing tool Drain and regular expressions. Two approaches were used: K-Means + XGBoost, and Principal Component Analysis (PCA) + K-Means + XGBoost. Both were trained, tested and individually evaluated with different metrics against two datasets, a server data log and an HTTP access log. Results: Both approaches performed very well on both datasets, classifying, detecting and predicting log data events with high accuracy and precision and low calculation time. Without the dimensionality reduction step (PCA), the results of the prediction model were slightly better, by a few percent. Prediction time showed marginal to no difference with and without PCA. Conclusions: Overall, the differences between the results with and without PCA are very small.
In essence, however, it is better not to use PCA and instead to apply the ML models to the original data. Model performance generally depends heavily on the data being applied, its initial preprocessing steps, its size and its structure, which affect the calculation time the most. Machine Learning (ML) K-Means Principal Component Analysis (PCA) XGBoost Log data Anomaly Detection Outlier Detection Clustering. Computer Engineering Datorteknik
52	Predicting profitability of new customers using gradient boosting tree models : Evaluating the predictive capabilities of the XGBoost, LightGBM and CatBoost algorithms Kinnander, Mathias January 2020 In the context of providing credit online to customers in retail shops, the provider must perform risk assessments quickly and often based on scarce historical data. This can be achieved by automating the process with Machine Learning algorithms. Gradient Boosting Tree algorithms have proven capable in a wide range of application scenarios. However, they have yet to be applied to predicting the profitability of new customers based solely on the customers’ first purchases. This study aims to evaluate the predictive performance of the XGBoost, LightGBM, and CatBoost algorithms in this context. The Recall and Precision metrics were used as the basis for assessing the models’ performance. The experiment implemented for this study shows that the models display similar capabilities while also being biased towards the majority class. Gradient tree boosting XGBoost LightGBM CatBoost prediction profitability online retail Information Systems, Social aspects
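As an illustration of the Recall and Precision metrics named in the abstract above, the following minimal sketch shows how a classifier biased towards the majority class can keep a high accuracy while its recall on the minority class drops. The labels, the choice of which class is the minority, and the function name `precision_recall` are assumptions made for this example; they are not taken from the thesis.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels: 1 is the minority class, 0 the majority class.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
# A model biased towards the majority class predicts 0 almost everywhere.
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision, recall = precision_recall(y_true, y_pred)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

On this toy data, accuracy alone would hide the fact that half of the minority-class cases are missed, which is one reason to report Precision and Recall per class.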
53	Club Head Tracking : Visualizing the Golf Swing with Machine Learning Herbai, Fredrik January 2023 During the broadcast of a golf tournament, one way to show the audience what a player's swing looks like is to draw a trace following the movement of the club head. A computer vision model can be trained to identify the position of the club head in an image, but because of the high speed at which professional players swing their clubs, coupled with the low frame rate of a typical broadcast camera, the club head is not discernible at all in most frames. This means that the computer vision model can deliver only a few sparse detections of the club head. This thesis project aims to develop a machine learning model that can predict the complete motion of the club head, in the form of a swing trace, based on the sparse club head detections. Slow-motion videos of golf swings are collected, and the club head's position is annotated manually in each frame. From these annotations, relevant data describing the club head's motion, such as position and time parameters, is extracted and used to train the machine learning models. The dataset contains 256 annotated swings by professional and competent amateur golfers. The two models implemented in this project are XGBoost and a feed-forward neural network. The input given to the models contains information only from specific parts of the swing, to mimic the pattern of the sparse detections. Both models learned the underlying physics of the golf swing, and the quality of the predicted traces depends heavily on the amount of information provided in the input. To produce good predictions with only the amount of input information that can be expected from the computer vision model, considerably more training data is required. The traces predicted by the neural network are significantly smoother, and thus look more realistic, than the predictions made by the XGBoost model.
Golf Machine learning Neural network XGBoost Interpolation Deep learning Data collection Data augmentation Computer Sciences Datavetenskap (datalogi)
54	Arctic Persistent Fire Identification: A Machine Learning Approach to Fire Source Attribution for the Improvement of Arctic Fire Emission Estimates Fain, Justin 06 December 2022 No description available. Earth Environmental Science Forestry Geography Geographic Information Science remote sensing Arctic fire energy flaring xgboost machine learning geography emissions
55	Predictive Study of Flame status inside a combustor of a gas turbine using binary classification Sasikumar, Sreenand January 2022 Quick and accurate detection of flame inside a gas turbine is crucial to mitigate risks in power generation. Failure of flame detection increases downtime and maintenance costs, and on rare occasions it may cause explosions due to buildup of unburned fuel inside the combustion chamber. The aim of this thesis is to investigate the applicability of machine learning methods to detect the presence of flame within a gas turbine. Traditionally, this is done using an optical flame detector, which converts the infrared radiation to a differential reading, which is in turn converted to a digital signal for the control system and gives the flame status (1 for flame ON and 0 for flame OFF). The primary purpose of this alternative flame detection method is to reduce the instrument cost per gas turbine. A machine learning model is trained with data collected over several runs of the turbine engine and estimates whether a flame is present, to decide if the machine should be ON or OFF. To reduce the instrumentation cost, the presented flame prediction method, based on deep learning methods, takes standard data such as dynamic pressure and temperature values as input. These variables are observed to have a high correlation with the flame status. The pressure is measured using a piezocryst sensor and the temperature is measured using a thermocouple.
A study is performed by training several machine learning models and determining which of them works best on this data. Logistic regression is used as a baseline and is compared with other models such as KNN, SVM, Naïve Bayes, Random Forest and XGBoost, all trained with the data collected over several runs of the turbine and tested on predicting the flame status inside the gas turbine. It was observed that KNN and Random Forest performed exceptionally well compared to the baseline model. The minimum time recorded for estimation of the flame status by the machine is 0.6 seconds; if the implemented model can give a high accuracy within the same time, the proposed method can be an effective alternative flame detection method. Binary Classification Random Forest SVM XGBoost Logistic Regression Extra-trees Gas Turbine Flame Detection Probability Theory and Statistics Sannolikhetsteori och statistik
56	Optimizing Flight Ranking: A Machine Learning Approach : Applying Machine Learning to Upgrade Flight Sorting and User Experience / Optimering av flygsortering: En approach med maskininlärning Jabeli, Habib January 2024 Flygresor.se, a leading flight comparison platform, uses machine learning to rank flights based on their likelihood of being clicked. The main goal of this project was to improve this flight sorting to obtain a better user experience. The platform's existing model is based on a neural network approach and a limited set of features. The solution involved developing and comparing two machine learning models, Random Forest and XGBoost, as well as using a set of existing and newly created features. The XGBoost model demonstrated superior performance, improving the prediction of clicked flights by 4.18% while also achieving a remarkable increase in efficiency by being 125 times faster than the existing model. / Flygresor.se, a leading platform for comparing flights, uses machine learning to rank flights based on their probability of being clicked. The main goal of this project was to improve this flight sorting in order to obtain a better user experience. The platform's existing model is based on a neural network and a limited number of features. The solution involved developing and comparing two machine learning models, Random Forest and XGBoost, in addition to using a set of existing and newly created features. The XGBoost model showed better performance by improving the prediction of clicked flights by 4.18% while also achieving a higher level of efficiency by being 125 times faster than the existing model. Machine Learning Flight Comparison Flygresor.se Neural Networks Flight Ranking Random Forest XGBoost Computer and Information Sciences Data- och informationsvetenskap
57	Housing Price Prediction over Countrywide Data : A comparison of XGBoost and Random Forest regressor models Henriksson, Erik, Werlinder, Kristopher January 2021 The aim of this research project is to investigate how an XGBoost regressor compares to a Random Forest regressor in predicting housing prices, using two data sets. The comparison considers training time, inference time and the three evaluation metrics R², RMSE and MAPE. The data sets are described in detail together with background on the regressor models that are used. The method involves substantial cleaning of the two data sets, hyperparameter tuning to find optimal parameters, and 5-fold cross-validation to obtain good performance estimates. The finding of this research project is that XGBoost performs better on both small and large data sets. While the Random Forest model can achieve results similar to those of the XGBoost model, it needs a much longer training time, between 2 and 50 times as long, and has a longer inference time, around 40 times as long. This makes XGBoost especially superior when used on larger sets of data. / The goal of this study is to compare and investigate how an XGBoost regressor and a Random Forest regressor perform at predicting house prices. This is done with the help of two data sets. The comparison takes into account the models' training time, inference time and the three evaluation metrics R², RMSE and MAPE. The data sets are described in detail together with background on the regression models. The method comprises cleaning the data sets, searching for optimal hyperparameters for the models, and 5-fold cross-validation to achieve good predictions. The result of the study is that the XGBoost regressor performs better on both small and large data sets, but that it is superior when it comes to large data sets.
While the Random Forest model can achieve results similar to those of the XGBoost model, its training takes between 2 and 50 times as long and its inference time is around 40 times longer. This makes XGBoost especially superior when used on large data sets. Random Forest XGBoost predicting housing prices feature engineering ensemble learning boosting data cleansing 5-fold cross-validation. Computer Sciences Datavetenskap (datalogi)
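The three evaluation metrics and the 5-fold cross-validation mentioned in the abstract above can be sketched in plain Python as follows. This is a minimal illustration using the standard textbook definitions, not the thesis's implementation; the function names, the contiguous (unshuffled) fold layout and the example prices are all assumptions made for this sketch.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent; assumes no true value is zero."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) for k contiguous folds (no shuffling)."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size


# Hypothetical prices and predictions, for illustration only.
actual = [200.0, 250.0, 300.0, 400.0]
predicted = [210.0, 240.0, 310.0, 380.0]
print(f"R2={r2(actual, predicted):.3f} "
      f"RMSE={rmse(actual, predicted):.1f} "
      f"MAPE={mape(actual, predicted):.1f}%")

# Each sample appears in exactly one test fold.
for train_idx, test_idx in k_fold_indices(10, k=5):
    print("test fold:", test_idx)
```

In practice a model would be fitted on each training fold and the metrics averaged over the five test folds, which gives a steadier performance estimate than a single train/test split.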
58	Credit Scoring Based on Behavioural Data / Kreditvärdering baserat på beteendedata Bouvin, Daniel, Hamberg, Erik January 2022 Credit modelling has traditionally been done by credit institutes based on financial data about the individuals requesting the credit. While this has been sufficient for lowering risk in developed economies with plenty of financial data, it is inefficient in developing economies and fails to reach the unbanked population. As this both prevents many responsible consumers from getting access to credit and prevents companies from reaching paying customers, it is evident that new strategies for credit modelling are needed. This paper explores the use of behavioural data, gathered from users of Klarna’s app, for credit modelling. The models are based on the machine learning algorithms logistic regression, random forests, neural networks, and gradient boosted decision trees. In this study, models were trained on Swedish data in multiple timespans and tested in different timespans and countries. The results show that modelling on the data points developed in this study is effective and suggest that, in certain cases, such models can be used for prediction in new and unknown markets by training on similar markets. / Credit assessments have traditionally been carried out by credit institutions based on existing financial data about the person applying for credit. This method has been successful in minimising risk in developed economies where financial data has been available. The method has been less successful in developing economies and fails to assess populations that lack financial services. Because this problem prevents many reliable consumers from gaining access to credit and at the same time prevents companies from reaching potential paying customers, it becomes important to develop new strategies for assessing credit.
This thesis explores the possibility of modelling creditworthiness based on user behaviour using data from Klarna's shopping app. The models are based on the machine learning algorithms logistic regression, Random Forests, neural networks and gradient boosted decision trees. In this study, the models are trained on different timespans within the Swedish market and tested on different timespans and markets. The results of the study show that, with the help of behavioural data from Klarna's app, it is possible under various circumstances to predict creditworthiness in the future and in different markets. Banking Behavior Behaviour Credit Modelling Klarna Logistic Regression Machine Learning Neural Networks Random Forests XGBoost Computer Sciences Datavetenskap (datalogi)
59	Machine Learning Methods for Predicting Trading Behaviour of an Actively Managed Mutual Fund Forslund, Herman, Johnson, Marcus January 2021 This paper aims to reverse engineer the trading strategy of an actively managed mutual fund by identifying technical patterns in its trading. Investment strategies for many institutional investors consist of both fundamental and technical analysis. The purpose of the paper is to explore to what extent the latter can be used to predict the fund's trading actions, by taking some commonly used technical indicators as input to various machine learning algorithms to assess patterns between them and the trading of the fund. Furthermore, the technical indicators’ ability to predict future prices is analysed using the same methods. The results are not sufficiently clear to suggest that the fund uses technical indicators to begin with, let alone which ones. As for the prediction of future prices, the technical indicators appear to have some predictive ability. / The purpose of this report is to predict the trading of an actively managed equity fund using four machine learning methods. Investment strategies generally combine two methods of analysis, fundamental and technical analysis. The intention of the report is to explore whether the latter can be used to predict the fund's trading by using a number of commonly used technical indicators and, by means of machine learning methods, searching for patterns between these and the trading. The study also includes an analysis of how well technical indicators predict rises and falls in share prices. As for the investment strategies, no clear relationships were found between the selected indicators and the transactions. The results of the second part of the study indicate some predictive ability of technical indicators for market movements.
/ Bachelor's degree project in electrical engineering 2021, KTH, Stockholm Machine Learning Random Forest XGBoost Long Short-Term Memory AdaBoost Allocation Strategies Elektroteknik och elektronik
60	Applying Multivariate Time Series Data and Deep Learning to Probability of Default Estimation / Kreditriskbedömning Baserat på Multivariat Tidsseriedata och Djupinlärning Vävinggren, David, Säll, Emil January 2024 The problem of determining the probability of default, or credit risk, for companies is crucial when providing financial services. This problem is often modeled based on snapshot data that does not take the time dimension into account. Instead, we approach the problem with enterprise resource planning data in time series. Given the added complexity the time series introduce, we posit that deep learning models could be suitable for the task. A fully convolutional network and a transformer encoder were compared with the current state-of-the-art model for the probability of default problem, XGBoost. The comparison showed that XGBoost generalized very well to the time series domain, well enough to beat the deep learning models across all evaluation metrics. Furthermore, time series data with monthly, quarterly and yearly timestamps over three years was tested. Public features that could be extracted from quarterly and annual financial reports were also compared with internal enterprise resource planning data. We found that introducing time series to the problem improves performance and that models based on internal data outperform those based on public data. More precisely, we argue that because the dataset is based on small to medium-sized companies, the impact of highly granular data is lessened and the selection of which features to include becomes more important. XGBoost takes advantage of this very efficiently, especially when extracting features that capture the behavior of the time series, causing it to beat the deep learning competitors even though it does not pick up on the sequential aspect of the data.
Deep Learning Machine Learning Credit Risk Probability of Default Transformer XGBoost Fully Convolutional Network Engineering and Technology Teknik och teknologier
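The abstract above notes that XGBoost benefits from features that capture the behavior of a time series without modelling the sequence itself. A minimal sketch of that general idea follows: each series is collapsed into a fixed-length row of summary statistics that a tabular model could consume. The specific features, the function name `summarize_series` and the sample values are hypothetical; the abstract does not state which features the thesis extracted.

```python
from statistics import mean, pstdev


def summarize_series(values):
    """Collapse one time series into a few fixed-length summary features."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    # Least-squares slope of the values against their time index (trend).
    denom = sum((x - x_mean) ** 2 for x in range(n))
    slope = (
        sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values)) / denom
        if denom
        else 0.0
    )
    return {
        "mean": y_mean,
        "std": pstdev(values),
        "min": min(values),
        "max": max(values),
        "last": values[-1],
        "slope": slope,
    }


# Hypothetical monthly values for one company, for illustration only.
monthly_revenue = [120.0, 115.0, 110.0, 100.0, 95.0, 80.0]
features = summarize_series(monthly_revenue)
print(features)
```

A row like this per company can be fed to any tabular model; the trend feature carries some information about ordering even though the model never sees the raw sequence.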
