61 |
Optimizing Flight Ranking: A Machine Learning Approach : Applying Machine Learning to Upgrade Flight Sorting and User Experience / Optimering av flygsortering: En approach med maskininlärning
Jabeli, Habib January 2024
Flygresor.se, a leading flight comparison platform, uses machine learning to rank flights based on their likelihood of being clicked. The main goal of this project was to improve this flight sorting to obtain a better user experience. The platform's existing model is based on a neural network approach and a limited set of features. The solution involved developing and comparing two machine learning models, Random Forest and XGBoost, using a set of existing and newly created features. The XGBoost model demonstrated superior performance, improving the prediction of clicked flights by 4.18% while also being 125 times faster than the existing model. / Flygresor.se, a leading platform for comparing flights, uses machine learning to rank flights by their probability of being clicked. The main goal of this project was to improve this flight sorting in order to provide a better user experience. The platform's existing model is based on a neural network and a limited number of features. The solution involved developing and comparing two machine learning models, Random Forest and XGBoost, in addition to using a set of existing and newly created features. The XGBoost model showed better performance by improving the prediction of clicked flights by 4.18% while reaching a higher level of efficiency by being 125 times faster than the existing model.
|
62 |
Housing Price Prediction over Countrywide Data : A comparison of XGBoost and Random Forest regressor models
Henriksson, Erik, Werlinder, Kristopher January 2021
The aim of this research project is to investigate how an XGBoost regressor compares to a Random Forest regressor in predicting housing prices, using two data sets. The comparison considers training time, inference time and the three evaluation metrics R², RMSE and MAPE. The data sets are described in detail together with background about the regressor models that are used. The method involves substantial cleaning of the two data sets, hyperparameter tuning to find optimal parameters, and 5-fold cross-validation in order to achieve good performance estimates. The finding of this research project is that XGBoost performs better on both small and large data sets. While the Random Forest model can achieve results similar to those of the XGBoost model, it needs a much longer training time, between 2 and 50 times as long, and has a longer inference time, around 40 times as long. This makes XGBoost especially superior when used on larger sets of data. / The goal of this study is to compare and examine how an XGBoost regressor and a Random Forest regressor perform in predicting house prices. This is done using two data sets. The comparison takes into account the models' training time, inference time and the three evaluation metrics R², RMSE and MAPE. The data sets are described in detail together with background on the regression models. The method includes cleaning the data sets, searching for optimal hyperparameters for the models, and 5-fold cross-validation to obtain good predictions. The result of the study is that the XGBoost regressor performs better on both small and large data sets, and that it is clearly superior on large data sets. While the Random Forest model can achieve results similar to those of the XGBoost model, its training takes between 2 and 50 times as long and its inference time is about 40 times longer. This makes XGBoost especially superior when used on large data sets.
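The evaluation setup named in this abstract (R², RMSE, MAPE and 5-fold cross-validation) can be sketched in plain Python. This is only an illustration of the standard definitions, not the authors' code; the function names and the sample prices are assumptions made for the example.

```python
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (true values must be nonzero)."""
    ratios = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * ratios / len(y_true)


def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    indices = list(range(n_samples))
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# Hypothetical prices, used only to exercise the metric functions.
prices = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
print(r2_score(prices, predicted), rmse(prices, predicted), mape(prices, predicted))
```

In a comparison like the one described, each model would be fitted on the training indices of every fold and scored on the held-out indices, with the three metrics averaged over the five folds.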
|
63 |
Credit Scoring Based on Behavioural Data / Kreditvärdering baserat på beteendedata
Bouvin, Daniel, Hamberg, Erik January 2022
Credit modelling has traditionally been done by credit institutes based on financial data about the individuals requesting credit. While this has been sufficient for lowering risk in developed economies with plenty of financial data, it is inefficient in developing economies and fails to reach the unbanked population. As this both prevents many responsible consumers from getting access to credit and prevents companies from reaching paying customers, it is evident that new strategies for credit modelling are needed. This paper explores the use of behavioural data, gathered from users of Klarna’s app, for credit modelling. The models are based on the machine learning algorithms logistic regression, random forests, neural networks, and gradient boosted decision trees. In this study, models were trained on Swedish data in multiple timespans and tested in different timespans and countries. The results show that modelling on the data points developed in this study is effective and suggest that, in certain cases, such models can be used to predict new and unknown markets by training on similar markets. / Credit assessments have traditionally been carried out by credit institutes based on existing financial data about the person applying for credit. This method has been successful in minimising risk in developed economies where financial data has been available. It has been less successful in developing economies and fails to assess populations that lack financial services. Since this problem prevents many reliable consumers from getting access to credit and at the same time prevents companies from reaching potential paying customers, it becomes important to develop new strategies for assessing credit. This thesis explores the possibility of modelling creditworthiness based on user behaviour, using data from Klarna’s shopping app. The models are based on the machine learning algorithms logistic regression, Random Forests, neural networks and gradient boosted decision trees. In this study, the models are trained on different timespans within the Swedish market and tested on different timespans and markets. The results of the study show that behavioural data from Klarna’s app can, under various circumstances, be used to predict creditworthiness in the future and in different markets.
|
64 |
Machine Learning Methods for Predicting Trading Behaviour of an Actively Managed Mutual Fund
Forslund, Herman, Johnson, Marcus January 2021
This paper aims to reverse engineer the trading strategy of an actively managed mutual fund by identifying technical patterns in its trading. Investment strategies for many institutional investors consist of both fundamental and technical analysis. The purpose of the paper is to explore to what extent the latter can be used to predict the fund's trading actions, by taking some commonly used technical indicators as input to various machine learning algorithms to assess patterns between them and the trading of the fund. Furthermore, the technical indicators’ ability to predict future prices is analysed using the same methods. The results are not sufficiently clear to suggest that the fund uses technical indicators to begin with, let alone which ones. As for the prediction of future prices, the technical indicators appear to have some predictive ability. / The purpose of this report is to predict the trading of an actively managed equity fund using four machine learning methods. Investment strategies generally combine two methods of analysis, fundamental and technical analysis. The intention of the report is to explore whether the latter can be used to predict the fund's trading, by taking a number of commonly used technical indicators and, by means of machine learning methods, searching for patterns between these and the trading. The study also includes an analysis of how well technical indicators predict rises and falls in share prices. As for the investment strategies, no clear relationships were found between the selected indicators and the transactions. The results of the second part of the study indicate some predictive ability of technical indicators for market movements. / Bachelor's thesis in electrical engineering 2021, KTH, Stockholm
|
65 |
Applying Multivariate Time Series Data and Deep Learning to Probability of Default Estimation / Kreditriskbedömning Baserat på Multivariat Tidsseriedata och Djupinlärning
Vävinggren, David, Säll, Emil January 2024
The problem of determining the probability of default or credit risk for companies is crucial when providing financial services. This problem is often modeled based on snapshot data that does not take the time dimension into account. Instead, we approach the problem with enterprise resource planning data in time series. Given the added complexity that the time series introduce, we posit that deep learning models could be suitable for the task. A fully convolutional network and a transformer encoder were compared to the current state-of-the-art model for the probability of default problem, XGBoost. The comparison showed that XGBoost generalized very well to the time series domain, well enough to beat the deep learning models across all evaluation metrics. Furthermore, time series data with monthly, quarterly and yearly timestamps over three years was tested. Also, public features that could be extracted from quarterly and annual financial reports were compared with internal enterprise resource planning data. We found that the introduction of time series to the problem improves the performance and that models based on internal data outperform the ones based on public data. To be more precise, we argue that the dataset being based on small to medium-sized companies lessens the impact of highly granular data and makes the selection of which features to include more important. This is something XGBoost takes advantage of in a very efficient way, especially when extracting features that capture the behavior of the time series, causing it to beat the deep learning competitors even though it does not pick up on the sequential aspect of the data.
|
66 |
Uplift Modeling : Identifying Optimal Treatment Group Allocation and Whom to Contact to Maximize Return on Investment
Karlsson, Henrik January 2019
This report investigates the possibilities of modeling the causal effect of treatment within the insurance domain to increase the return on investment of sales through telemarketing. In order to capture the causal effect, two or more subgroups are required, where one group receives control treatment. Two different uplift models are used to model the causal effect of treatment: the Class Transformation Method and Modeling Uplift Directly with Random Forests. Both methods are evaluated by the Qini curve and the Qini coefficient. To model the causal effect of treatment, the comparison with a control group is a necessity. The report attempts to find the optimal treatment group allocation in order to maximize the precision in the difference between the treatment group and the control group. Further, the report provides a rule of thumb that ensures that the control group is of sufficient size to be able to model the causal effect. If has provided the data material used to model uplift, and it consists of approximately 630,000 customer interactions and 60 features. The total uplift in the data set, the difference in purchase rate between the treatment group and control group, is approximately 3%. Uplift by random forest with a Euclidean distance splitting criterion, which tries to maximize the distributional divergence between treatment group and control group, performs best and captures 15% of the theoretical best model. The same model manages to capture 77% of the total amount of purchases in the treatment group by only giving treatment to half of the treatment group. With the purchase rates in the data set, the optimal treatment group allocation is approximately 58%-70%, but the study could be performed with as much as approximately 97% treatment group allocation.
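Two quantities from this abstract can be illustrated with a small sketch: the total uplift (difference in purchase rate between treatment and control) and the label used by the class transformation approach. The sketch follows the commonly described form of that method, where the new label is 1 for treated buyers and for control non-buyers; it is not the report's implementation, and the function names and toy data are assumptions.

```python
def total_uplift(treated, purchased):
    """Purchase rate in the treatment group minus purchase rate in the control group."""
    n_treated = sum(treated)
    n_control = len(treated) - n_treated
    buys_treated = sum(y for t, y in zip(treated, purchased) if t == 1)
    buys_control = sum(y for t, y in zip(treated, purchased) if t == 0)
    return buys_treated / n_treated - buys_control / n_control


def class_transform(treated, purchased):
    """New label z = 1 if (treated and bought) or (control and did not buy), else 0."""
    return [1 if t == y else 0 for t, y in zip(treated, purchased)]


# Toy data (hypothetical): 4 treated and 4 control customers.
treated = [1, 1, 1, 1, 0, 0, 0, 0]
purchased = [1, 1, 0, 0, 1, 0, 0, 0]

uplift = total_uplift(treated, purchased)
z = class_transform(treated, purchased)

# With equally sized groups, 2 * P(z = 1) - 1 equals the uplift.
uplift_from_z = 2 * sum(z) / len(z) - 1
print(uplift, z, uplift_from_z)
```

The identity in the last step relies on a 50/50 split between treatment and control; with other allocations, such as the 58%-70% range discussed in the abstract, the transformed label would need reweighting, which this sketch does not attempt.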
|
67 |
Lavinprognoser och maskininlärning : Att prediktera lavinprognoser med maskininlärning och väderdata
Pettersson, Gustav, Almqvist, John January 2019
This research project examines the feasibility of predicting avalanche danger using machine learning, in the form of XGBoost, and weather data. Avalanche forecasts and meteorological weather model data have been collected for the six Swedish mountain areas where Naturvårdsverket publishes avalanche forecasts through lavinprognoser.se. The avalanche forecasts were retrieved from lavinprognoser.se, and the weather model data comes from the forecast model MESAN, which is produced and provided by the Swedish Meteorological and Hydrological Institute. 40 XGBoost models were then trained on this data set, with the aim of predicting different aspects of an avalanche forecast and the overall avalanche danger. The results show that it is possible to predict the daily avalanche danger during the 2018/19 season in Södra Jämtlandsfjällen with an accuracy of 71% and a mean average error of 0.295 by applying machine learning to the weather for that area. The value of XGBoost in this context has been confirmed by comparing these results with those of the simpler method logistic regression, which showed a lower accuracy of 56% and a mean average error of 0.459. The contribution of the research project is a proof of concept showing the feasibility of predicting avalanche forecasts with the help of machine learning and weather data. / This research project examines the feasibility of using machine learning to predict avalanche danger by using XGBoost and openly available weather data. Avalanche forecasts and meteorological modelled weather data have been gathered for the six areas in Sweden where Naturvårdsverket, through lavinprognoser.se, issues avalanche forecasts. The avalanche forecasts are collected from lavinprognoser.se and the modelled weather data is collected from the MESAN model, which is produced and provided by the Swedish Meteorological and Hydrological Institute. 40 machine learning models, in the form of XGBoost, have been trained on this data set, with the goal of assessing the main aspects of an avalanche forecast and the overall avalanche danger. The results show it is possible to predict the day-to-day avalanche danger for the 2018/19 season in Södra Jämtlandsfjällen with an accuracy of 71% and a Mean Average Error of 0.256, by applying machine learning to the weather data for that region. The contribution of XGBoost in this context is demonstrated by applying the simpler method of Logistic Regression to the data set and comparing the results. The logistic regression performs worse, with an accuracy of 56% and a Mean Average Error of 0.459. The contribution of this research is a proof of concept, showing feasibility in predicting avalanche danger in Sweden with the help of machine learning and weather data.
|
68 |
Marketing Mix Modelling: A comparative study of statistical models / En jämförelsestudie av statistiska modeller i en Marketing Mix Modelling-kontext
Wigren, Richard, Cornell, Filip January 2019
Deciding the optimal media advertisement spending is a complex issue that many companies today are facing. With the rise of new ways to market products, the choices can appear infinite. One methodical way to make this decision is to use Marketing Mix Modelling (MMM), in which statistical modelling is used to attribute sales to media spending. However, many problems arise during the modelling. Modelling and mitigation of uncertainty, time-dependencies of sales, incorporation of expert information and interpretation of models are all issues that need to be addressed. This thesis aims to investigate the effectiveness of eight different statistical and machine learning methods in terms of prediction accuracy and certainty, each one addressing one of the previously mentioned issues. It is concluded that while Shapley Value Regression has the highest certainty in terms of coefficient estimation, it sacrifices some prediction accuracy. The overall highest performing model is the Bayesian hierarchical model, achieving both high prediction accuracy and high certainty.
|
69 |
Diferenční analýza multilingválního řečového korpusu pacientů s neurodegenerativními onemocněními / Differential analysis of multilingual corpus in patients with neurodegenerative diseases
Kováč, Daniel January 2020
This diploma thesis focuses on the automated diagnosis, in a multilingual speech corpus, of hypokinetic dysarthria, a motor speech disorder that occurs in patients with neurodegenerative diseases such as Parkinson’s disease. The automatic speech recognition approach to diagnosis is based on the acoustic analysis of speech and subsequent use of mathematical models. The popularity of this method is on the rise due to its objectivity and the possibility of working simultaneously on different languages. The aim of this work is to find out which acoustic parameters have high discriminative power and are universal for multiple languages. To achieve this, a statistical analysis of parameterized speech tasks and subsequent modelling by machine learning methods were used. The analyses were performed for Czech, American English, Hungarian and all languages together. It was found that only some parameters enable the diagnosis of the hypokinetic disorder and are, at the same time, universal for multiple languages. The relF2SD parameter shows the best results, followed by the NST parameter. When classifying speakers of all the languages together, the model achieves accuracy of 59 % and sensitivity of 72 %.
|
70 |
Predicting Multimodal Rehabilitation Outcomes using Machine Learning
Cheltuitor, Alexandru, Jones-Quartey, Niklas January 2020
Chronic pain is a complex health issue and a major cause of disability worldwide. Although multimodal rehabilitation (MMR) has been recognized as an effective form of treatment for chronic pain, some patients do not benefit from it. If treatment outcomes could be reliably predicted, then patients who would benefit more from MMR could be prioritized over others. Machine learning has been proven capable of accurately predicting outcomes in other healthcare related domains. Therefore, this study aims to investigate the use of it to predict outcomes of MMR, using data from the Swedish Quality Registry for Pain Rehabilitation (SQRP). XGBoost regression was used for this purpose, and its predictive performance was compared to Ridge regression. 12 models were trained on SQRP data for each algorithm, in order to predict pain and quality of life related outcomes. The results show similar performances for both algorithms, with mean cross-validated R² values of 0.323 and 0.321 for the XGBoost and Ridge models respectively. The average root mean squared errors of 6.744 for XGBoost and 6.743 for Ridge were similar as well. Since XGBoost performed similarly to a less computationally expensive method, the use of this method for MMR outcome prediction was not supported by the results of this study. However, machine learning has the potential to be more effective for this purpose, through the use of different hyperparameter values, correlation-based feature selection or other machine learning algorithms.
|