Global ETD Search

11	A Machine Learning Assessment to Predict the Sediment Transport Rate Under Oscillating Sheet Flow Conditions Vu, Huy 01 December 2019 The two-phase flow approach has been the conventional method for studying the sediment transport rate. Because sediment transport is complex, the precise numerical models computed with that approach require initial assumptions and, as a result, may not yield accurate output for all conditions. This research proposes that machine learning algorithms can be an alternative way to predict sediment transport processes in two dimensions under oscillating sheet flow conditions, using the available dataset from the SedFoam multidimensional two-phase model. The assessment used linear regression and gradient boosting to find the lowest average mean squared error in each case and to search for the best partition method based on the domain height of the simulation setup. Computer Sciences
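The comparison this abstract describes, linear regression versus gradient boosting scored by mean squared error, can be illustrated with a minimal pure-Python sketch. It assumes squared-error loss and depth-1 trees (stumps) as the weak learners; the helper names (`fit_stump`, `gradient_boost`, `fit_linear`, `mse`) and the toy step-shaped data are hypothetical and are not taken from the thesis or the SedFoam dataset.

```python
def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree: one threshold on x, a constant on each side."""
    distinct = sorted(set(xs))
    if len(distinct) < 2:
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    best = None
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2  # candidate threshold halfway between neighbouring x values
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    _, t, lv, rv = best
    return lambda x: lv if x <= t else rv


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Squared-error gradient boosting: each stump is fit to the current residuals."""
    base = sum(ys) / len(ys)  # initial model: the mean target value
    stumps = []

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    for _ in range(n_rounds):
        # For squared error, the negative gradient w.r.t. the prediction is the residual.
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict


def fit_linear(xs, ys):
    """Ordinary least squares with one feature, used as the linear baseline."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


xs = list(range(10))
ys = [0.0] * 5 + [10.0] * 5  # hypothetical step-shaped target
boosted = gradient_boost(xs, ys)
linear = fit_linear(xs, ys)
print(mse(ys, [boosted(x) for x in xs]), mse(ys, [linear(x) for x in xs]))
```

On a step-shaped target, each boosting round fits a stump to the current residuals and moves the prediction a fraction (`learning_rate`) of the way towards it, so the boosted model's MSE shrinks geometrically, while a single straight line cannot follow the jump.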
12	Essays on Reinforcement Learning with Decision Trees and Accelerated Boosting of Partially Linear Additive Models Dinger, Steven 01 October 2019 No description available. Statistics Fitted Q-Iteration Gradient Boosting Online Random Forest Q-Learning Twin Boosting Variable Selection
13	Automation of price prediction using machine learning in a large furniture company Ghorbanali, Mojtaba January 2022 Accurate prediction of product prices can be highly beneficial for procurers, from both a business and a production perspective. Many companies today, across various fields of operation and sizes, have access to vast amounts of data from which valuable information can be extracted. In this master's thesis, several large databases of products in different categories were analyzed. For confidentiality, the database labels used in this thesis are replaced with general titles and the real titles are not mentioned. The company is also not named, but all of the work was carried out on the real product data set. As a real-world data set, the data was messy and full of nulls and missing values, so data wrangling took additional time. The modeling approaches used were regression methods and gradient boosting models. The main purpose of this master's thesis was to build price prediction models based on the features of each item, to assist with the initial positioning of a product and its initial price. The best result was achieved with an XGBoost model, at about 96% accuracy, which could help the producer accelerate its pricing strategies. Price Prediction Machine learning Regression analysis Gradient boosting algorithms LightGBM XGBoost Computer Sciences
14	Machine Learning and Telematics for Risk Assessment in Auto Insurance Ekström, Frithiof, Chen, Anton January 2020 Pricing models for car insurance traditionally use variables related to the policyholder and the insured vehicle (e.g. car brand and driver age) to determine the premium. This can lead to situations where policyholders belonging to a group seen as carrying a higher accident risk are wrongfully charged a higher premium, even though the higher risk does not necessarily apply on a per-individual basis. Telematics data offers an opportunity to look at driving behavior during individual trips, enabling a pricing model that can be customized to each policyholder. While these additional variables can be used in a generalized linear model (GLM) similar to the traditional pricing models, machine learning methods may be able to uncover non-linear relationships between the variables. Using telematics data, we build a gradient boosting model (GBM) and a neural network (NN) to predict the monthly claim frequency of policyholders. We find that both GBMs and NNs offer predictive power that generalizes to data not used in training the models. The results also show that telematics data plays a considerable role in the model predictions, and that the frequency and distance of trips are important factors in determining risk with these models. Telematics for car insurance Gradient Boosting Machine Neural Network Machine Learning Computer and Information Sciences
15	Data Analytics using Regression Models for Health Insurance Market place Data Killada, Parimala January 2017 No description available. Computer Science
16	A Comprehensive Experimental and Computational Investigation on Estimation of Scour Depth at Bridge Abutment: Emerging Ensemble Intelligent Systems Pandey, M., Karbasi, M., Jamei, M., Malik, A., Pu, Jaan H. 12 October 2024 Several bridges have failed because of scouring and erosion around bridge elements. Hence, precise prediction of abutment scour is necessary for the safe design of bridges. In this research, experimental and computational investigations were carried out based on 45 flume experiments conducted at NIT Warangal, India. Three innovative ensemble-based data intelligence paradigms, namely categorical boosting (CatBoost) in conjunction with extra tree regression (ETR) and K-nearest neighbor (KNN), are used to accurately predict the scour depth around the bridge abutment. A total of 308 laboratory data series (263 from a wide range of existing abutment scour depth datasets and 45 from the flume experiments), covering various sediment and hydraulic conditions, were used to develop the models. Four dimensionless variables were used to calculate scour depth: the approach densimetric Froude number (Fd50), the ratio of upstream depth to abutment transverse length (y/L), the ratio of abutment transverse length to sediment mean diameter (L/d50), and the ratio of mean velocity to critical velocity (V/Vcr). The gradient boosting decision tree (GBDT) method was used to select the features with higher importance. Based on the feature selection results, two combinations of input variables were used: Comb1 (all variables as model input) and Comb2 (all variables except Fd50). The CatBoost model with Comb1 input (RMSE = 0.1784, R = 0.9685, MAPE = 10.4724) provided better accuracy than the other machine learning models. Abutment Scour depth Extra tree regression CatBoost Gradient boosting decision tree Feature selection
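The four dimensionless inputs and the two input combinations can be sketched as follows. The abstract does not give a formula for Fd50; the code assumes the common definition Fd50 = V / sqrt(g (s - 1) d50), where s is the sediment-to-water density ratio (2.65, typical of quartz sand, is an assumed default). The function names and the flume-scale values are hypothetical, not the study's data.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def dimensionless_inputs(V, Vcr, y, L, d50, s=2.65):
    """Compute the four dimensionless scour predictors named in the abstract.

    Assumption: Fd50 = V / sqrt(g * (s - 1) * d50), a common definition of the
    densimetric particle Froude number. All inputs in SI units.
    """
    return {
        "Fd50": V / math.sqrt(G * (s - 1) * d50),
        "y/L": y / L,      # upstream depth / abutment transverse length
        "L/d50": L / d50,  # abutment transverse length / sediment mean diameter
        "V/Vcr": V / Vcr,  # mean velocity / critical velocity
    }


def comb2(features):
    """Comb2 in the abstract: all variables except Fd50 (Comb1 is the full dict)."""
    return {k: v for k, v in features.items() if k != "Fd50"}


# Hypothetical flume-scale values, not taken from the study.
f = dimensionless_inputs(V=0.3, Vcr=0.35, y=0.15, L=0.1, d50=0.001)
print(f)
print(comb2(f))
```

Working in dimensionless groups lets laboratory runs at different scales (the 45 new flume runs and the 263 literature records) share one input space.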
17	Using Gradient Boosting to Identify Pricing Errors in GLM-Based Tariffs for Non-life Insurance Greberg, Felix, Rylander, Andreas January 2022 Most non-life insurers and many creditors use regressions, more specifically generalized linear models (GLMs), to price their liabilities. One limitation of GLMs is that interactions between predictors must be handled manually, which makes finding interactions a tedious and time-consuming task. This increases the cost of rate making and, more importantly, actuaries can miss important interactions, resulting in sub-optimal customer prices. Several papers have shown that gradient tree boosting can outperform GLMs in insurance pricing because it handles interactions automatically. Insurers and creditors are, however, reluctant to use so-called "black-box" solutions for both regulatory and technical reasons. Tree-based methods have been used to identify pricing errors in regressions, albeit only as ad hoc solutions. The authors instead propose a systematic approach to automatically identify and evaluate interactions between predictors before adding them to a traditional GLM. The model can be used in three ways. First, it can create a table of statistically significant candidate interactions to add to a GLM. Second, it can automatically and iteratively add new interactions to an existing GLM until no more statistically significant interactions can be found. Third, it can automatically create a new GLM when no pricing model exists. All approaches are tested on two motor insurance data sets from a Nordic P&C insurer, and the results show that all methods outperform the original GLMs. Although the two iterative modes perform better than the first, insurers are recommended to mainly use the first mode, since it offers a reasonable trade-off between automating processes and leveraging actuaries' professional judgment. GLM Gradient Boosting XGBoost Non-life insurance Property & Casualty Rate making Insurance Tariff MTPL insurance Machine learning Regression trees Tweedie regression Credit risk Other Mathematics
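What "an interaction missed by a GLM" means can be seen with a simple actual-versus-expected (A/E) check against a log-link, main-effects-only tariff, where the expected cost is base × relativity(a) × relativity(b). This is a standard actuarial diagnostic shown as a minimal sketch, not the gradient-boosting method the thesis proposes, and it omits the statistical significance testing the abstract mentions. All factor names, relativities, and records are hypothetical.

```python
from collections import defaultdict


def main_effects_rate(base, rel_a, rel_b, a, b):
    """Expected cost per unit exposure under a log-link GLM with main effects only:
    log(mu) = log(base) + log(rel_a[a]) + log(rel_b[b])."""
    return base * rel_a[a] * rel_b[b]


def actual_vs_expected(records, base, rel_a, rel_b):
    """A/E ratio per (a, b) cell; records are (a, b, exposure, claim_cost) tuples."""
    actual, expected = defaultdict(float), defaultdict(float)
    for a, b, exposure, cost in records:
        actual[(a, b)] += cost
        expected[(a, b)] += exposure * main_effects_rate(base, rel_a, rel_b, a, b)
    return {cell: actual[cell] / expected[cell] for cell in actual}


def flag_cells(ae, tolerance=0.1):
    """Cells whose A/E deviates from 1 by more than `tolerance`: candidate interactions."""
    return sorted(cell for cell, ratio in ae.items() if abs(ratio - 1.0) > tolerance)


# Hypothetical tariff and portfolio (illustrative numbers only).
rel_driver = {"young": 1.5, "older": 1.0}
rel_vehicle = {"sports": 1.4, "family": 1.0}
records = [
    ("young", "sports", 1.0, 300.0),
    ("young", "family", 2.0, 300.0),
    ("older", "sports", 1.0, 140.0),
    ("older", "family", 2.0, 200.0),
]
ae = actual_vs_expected(records, 100.0, rel_driver, rel_vehicle)
print(flag_cells(ae))
```

Here only the young-driver/sports-car cell is mispriced by the multiplicative structure; adding an interaction term for that combination would give it its own relativity, which is the kind of adjustment the thesis automates.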
18	Modeling Melodic Accents in Jazz Solos Berrios Salas, Misael January 2023 This thesis examines how accurately accents in jazz solos, more specifically their sound level, can be modeled. A better understanding of the structure of jazz solos could offer a way to present, pedagogically, the differences between music styles and even between performers. Some studies have tried to model perceived accents in different music styles, that is, how listeners perceive some tones as accentuated and more important than others. Other studies have looked at how sound level correlates with other attributes of a tone. To our knowledge, however, no other study has modeled actual accents within jazz solos, nor has any used as much training data. The training data is a set of 456 solos from the Weimar Jazz Database, which contains tone data and metadata from monophonic solos performed on multiple instruments. The features used by the learning algorithms come from three sources: the software Director Musices, created at the Royal Institute of Technology in Sweden; the software "melfeature", created at the University of Music Franz Liszt Weimar in Germany; and features built from the tone data or solo metadata in the Weimar Jazz Database. These feature sets are compared. Three learning algorithms are used: Multiple Linear Regression (MLR), Support Vector Regression (SVR), and eXtreme Gradient Boosting (XGBoost). The first two are simpler regression models, while the last is an award-winning tree boosting algorithm. XGBoost achieved the highest accuracy when combining all available features, minus some features that were removed because they did not improve accuracy. The accuracy was around 27%, with a high standard deviation. This indicates large differences in how well individual solos were predicted: some reached an accuracy of about 67%, while for others not a single tone was predicted correctly. As a general model, however, the accuracy is too low for practical use. Either the methods were not optimal, or jazz solos differ too much for a general pattern to be found. Accents Jazz Solo Support Vector Regression (SVR) eXtreme Gradient Boosting (XGBoost) Multiple Linear Regression (MLR) Dynamic Computer and Information Sciences
19	Identifying Optimal Throw-in Strategy in Football Using Logistic Regression Nieto, Stephan January 2023 Set pieces such as free kicks and corners have been thoroughly examined in football analytics studies in recent years. However, little focus has been put on the most frequently occurring set piece: the throw-in. This project investigates how football teams can optimize their throw-in tactics to improve the chance of taking a successful throw-in. Two definitions of a successful throw-in are considered: first, whether the ball is kept in possession, and second, whether a goal chance is created after the throw-in. The analysis is conducted using logistic regression, as this model is highly interpretable, making it easier for players and coaches to gain direct insights from the results. Substantial focus is put on investigating the logistic regression assumptions, with the greatest emphasis on the linearity assumption (that the log-odds are linear in the predictors). The results suggest that long throws directed towards the opposition's goal are the most effective for creating goal-scoring opportunities from throw-ins taken in the attacking third of the pitch. However, if the throw-in is taken in the middle or defensive regions of the pitch, the results interestingly indicate that throwing the ball backwards increases the chance of scoring. For retaining possession, the results suggest that throwing the ball backwards is an effective strategy regardless of pitch position. Moreover, the project outlines how feature transformations can be used to improve the fit of the logistic regression model. However, the most significant improvement in the accuracy of logistic regression comes from incorporating additional relevant features into the model. In that case, the logistic regression model achieves predictive power comparable to more advanced machine learning methods. Set-piece throw-in football analytics optimal strategy logistic regression model assumptions feature importance feature transformations gradient boosting Other Mathematics
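The interpretability argument can be made concrete: in logistic regression, exp(coefficient) is the odds ratio for that feature. Below is a minimal sketch that fits P(y = 1 | x) = sigmoid(b0 + b1 x) by Newton-Raphson on hypothetical throw-in data, where x = 1 means the ball was thrown backwards and y = 1 means possession was kept. The data and function names are illustrative assumptions, not the thesis's results.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_1d(xs, ys, n_iter=25):
    """Fit P(y = 1 | x) = sigmoid(b0 + b1 * x) by Newton-Raphson on the log-likelihood."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            g0 += y - p          # d loglik / d b0
            g1 += (y - p) * x    # d loglik / d b1
            w = p * (1.0 - p)    # weights forming the observed information matrix
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve [[h00, h01], [h01, h11]] @ delta = [g0, g1]
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical throw-ins: x = 1 if thrown backwards, y = 1 if possession was kept.
xs = [1] * 10 + [0] * 10
ys = [1] * 8 + [0] * 2 + [1] * 5 + [0] * 5
b0, b1 = fit_logistic_1d(xs, ys)
print(f"odds ratio for backwards throws: {math.exp(b1):.2f}")
```

With a single binary predictor the fitted model reproduces the group proportions, so the odds ratio equals (8/2) / (5/5) = 4: in this toy data, backwards throws have four times the odds of keeping possession.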
20	Predicting House Prices on the Countryside using Boosted Decision Trees Revend, War January 2020 This thesis evaluates the feasibility of supervised learning models for predicting house prices in the countryside of southern Sweden. Mortgage lenders need accurate housing valuation algorithms, and the current model offered by Booli is not accurate enough when valuing residences in the countryside. Different types of boosted decision trees were implemented to address this issue, and their performance was compared with traditional machine learning methods, in order to find the best model with regard to relevant evaluation metrics such as root-mean-squared error (RMSE) and mean absolute percentage error (MAPE). The implemented models were ridge regression, lasso regression, random forest, AdaBoost, gradient boosting, CatBoost, XGBoost, and LightGBM. All these models were benchmarked against Booli's current housing valuation algorithm, which is based on a k-NN model. The results indicate that LightGBM is the best choice, as it had the best overall performance with respect to the chosen evaluation metrics. Compared with the benchmark, LightGBM performed better overall, with an RMSE of 0.330 versus 0.358 for the Booli model, indicating that boosted decision trees have the potential to improve the predictive accuracy of countryside residence prices. Machine Learning Predicting House Prices Shrinkage Methods Random Forest Decision Tree AdaBoost Gradient Boosting LightGBM CatBoost XGBoost Probability Theory and Statistics
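The two evaluation metrics named in this abstract have standard definitions, sketched below in plain Python. The abstract does not state the scale on which RMSE was computed (for example raw or log-transformed prices), so this only illustrates the formulas; the price values are hypothetical. For scale, the reported drop from 0.358 to 0.330 is a relative RMSE reduction of about 7.8% (0.028 / 0.358).

```python
import math


def rmse(actual, predicted):
    """Root-mean-squared error: squares errors, so large misses dominate."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent; undefined if any actual value is 0."""
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n


# Hypothetical prices (e.g. in thousands of SEK), not data from the thesis.
actual = [1200.0, 950.0, 2100.0]
predicted = [1100.0, 1000.0, 2300.0]
print(rmse(actual, predicted), mape(actual, predicted))
```

RMSE is in the units of the target and is sensitive to a few large errors, while MAPE is scale-free, which is why valuation studies often report both.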
